# install.packages("remotes")
::install_git("https://code.europa.eu/digclass/digclass.git") remotes
Class (still) matters
Class is a key concept in the social and political sciences, and it explains many important phenomena, from party preferences and voting over social attitudes to health outcomes (e.g., Elo 2009; Gingrich 2017; Schwander and Häusermann 2013; Häusermann et al. 2022; Evans 2000). Therefore, it is important for every empirical social and political researcher to know how to measure people’s positions in class structures.
Sociologists have spent a lot of time on developing class schemes that make the abstract concept of “class” empirically measurable. The probably most famous class scheme is the Erikson-Goldthorpe-Portocarero (EGP) class scheme that was developed in the 1970s (Erikson, Goldthorpe, and Portocarero 1979), but there are also more recent schemes that take into account the fact that, as a result of technological change, increased educational attainment, and other factors, societies and labor markets in the 21st century look quite different than they did in the 1970s or 1980s. Daniel Oesch’s (2006) scheme is an important modern class scheme.
The basis for class schemes is generally occupation – what job does someone have? For example, someone who is a medical doctor would typically be seen as a “higher-skilled professional”, whereas a welder would usually be classified as a “skilled manual worker”. People’s occupations are usually measured with occupational classification schemes, the most widely used is the International Labour Organization’s (ILO) International Standard Classification of Occupations (ISCO) scheme.1 This scheme comes in different versions reflecting the years they were adopted: ISCO-68, ISCO-88, and ISCO-08.
There are of course some people who’s occupation is being self-employed – they run their own businesses, which can be a small one-person business (e.g., a shop) but it can also be a medium-sized company with 500 employees. Obviously, this has effects on their class membership: A small shop owner would often be considered to be a member of the “petite bourgeoisie”, while someone who owns a larger company might be considered a “capital owner”.
What information do you need, and where do you get it?
Class is an individual-level variable: A person can be a member of the working class, but a country cannot. This means that we use individual-level data – survey data – to measure class. Such survey data need to contain three pieces of information (variables) that reflect people’s class membership:
- Their occupation. This needs to be measured at the highest level of detail, meaning with the four-digit ISCO-88 or ISCO-08 scheme.
- Whether or not they are self-employed.
- If they are self-employed, how many employees they have.
Many survey datasets contain this information in some form, but it is usually easiest to use either data from large and well-known comparative social survey projects like the International Social Survey Project (ISSP) or the European Social Survey (ESS).2 Both are free to use (but you do need to register as a user). Many national survey projects also contain that information, but occupation is often coded based on the ISCO scheme but based on national occupational classification schemes (e.g., ANZSCO for Australia and New Zealand or SOC for the United States). These can be translated to the ISCO scheme with specific conversion tables, but this often takes quite a bit of time and effort.
Technically speaking, applying a class scheme to survey data is quite a bit of work because you need to go over a long list of occupations – the four-digit ISCO08 scheme contains 473 different occupations – and decide which class they belong to. Following this, you have to write code to group all the different observations in your dataset into their classes. Obviously, this would take a lot of time.
Fortunately, people have written packages for R
that make this a quick and (normally) easy thing to do. Two relevant packages are the DIGCLASS
package, which was developed by researchers at the EU, and the occupar
package.3
The rest of this tutorial shows how you can measure people’s class with data from the ISSP and using the DIGCLASS
package for R
. Most of this also applies if you work with data from the ESS, but some data import and cleaning steps might be different. Below is an example of how your dataset needs to look like that you can use to guide your data cleaning and preparation when you work with the ESS.
Installing the DIGCLASS
package
The DIGCLASS
package is not on CRAN (the official R
“app store”), but you can install it with the remotes
-package (which you need to have installed first, of course):
Next, we load the package with library()
, in addition to the tidyverse
package:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DIGCLASS)
theme_set(theme_classic())
Getting ISSP data
In this tutorial, we will work with data from the 2016 Role of Government round of the ISSP (v. 2.0.0; 19.09.2018), which you can download from the GESIS data repository: https://www.gesis.org/en/issp/data-and-documentation/role-of-government/2016#c127852. As mentioned earlier, you need to register as a user, but this is free – and also gives you access to many other survey datasets like the Eurobarometer or the European Values Study.
Make sure that you download the data in SPSS (.sav
) format and that you store them in the folder that you are working in (ideally your RStudio Project folder).
Importing the dataset
To import the dataset, you can use the read_sav()
function from the haven
package. Important: Simply import the dataset for now, do not yet convert it with labelled::unlabelled()
! I have stored the dataset as issp16.sav
, so I need to specify this in my code – you obviously need to use the name that you gave your dataset file:
<- haven::read_sav("issp16.sav") issp
As other large survey datasets, the ISSP dataset is very large and contains almost 400 variables:
dim(issp)
[1] 48720 395
To make things easier for now, we trim the data to the variables we actually need plus one variable (v19
) that we can later use as a dependent variable in an example analysis:
%>%
issp select(studyno,country, # good to keep these in
# ISCO-08 occupational codes
ISCO08, # Employment relationship, to identify self-employed
EMPREL, # number of employees if self-employed
NEMPLOY, # a variable measuring respondents' views on whether the government should spend more on the unemployed
v19 -> issp )
Data preparation
The DIGCLASS
package expects the data it works with to be in a specific format. If you for example call up the help file for the DIGCLASS::isco08_to_oesch()
function with ?DIGCLASS::isco08_to_oesch
and scroll down a bit, you see that the function needs three main inputs:
x
, which is the four-digit ISCO-08 scores. They need to be stored as text (character
)self_employed
, which needs to be a “numeric vector indicating whether each individual is self-employed (1) or an employee (0).”n_employees
, which needs to be a “numeric vector indicating the number of employees under each respondent.”
This means we need to have three variables that correspond exactly to this: ISCO-08 scores as text, a 0/1 dummy indicating whether someone is self-employed, and a variable containing the number of employees for those who are self-employed.
Preparing the ISCO-08 scores
Let’s start the data preparation with ISCO08
, and let’s first take a closer look at how it is stored now:
class(issp$ISCO08)
[1] "haven_labelled" "vctrs_vctr" "double"
From the result of class()
, we see that the variable is stored in a labelled
-type format – which is because the dataset was imported with haven
– and this is also the case for all the other variables (see the Environment).
To see a bit more clearly how the ISCO08
variable looks like, let’s look at the first few observations:
%>%
issp select(ISCO08) %>%
slice_head(n = 10) # to get first ten observations
# A tibble: 10 × 1
ISCO08
<dbl+lbl>
1 2611 [Lawyers]
2 2512 [Software developers]
3 1212 [Human resource managers]
4 1439 [Services managers not elsewhere classified]
5 4419 [Clerical support workers not elsewhere classified]
6 1345 [Education managers]
7 3230 [Traditional and complementary medicine associate professionals]
8 2654 [Film, stage and related directors and producers]
9 2611 [Lawyers]
10 5131 [Waiters]
You see that the first observation is a lawyer, which has the ISCO-08 code 2611, the next is a software developer (ISCO-08 code 2512), and so on.
Now comes an important step: We need to convert the ISCO08
variable to a character-type variable – for some reason, the DIGCLASS
package expects that the ISCO codes are stored as text (e.g., “2611”, “2512”), and that is what we need to deliver for the package to work.
To do that, we simply use as.character()
:
%>%
issp mutate(isco_nums_as_text = as.character(ISCO08)) -> issp
The new variable should now be a character-type variable:
class(issp$isco_nums_as_text)
[1] "character"
This means that the ISCO-08 scores are taken care off and we can move on to the next piece of information that we need: a 0/1 variable that tells us if people are self-employed.
Self-employment
Information about how people earn their living in general is contained in the EMPREL
variable. To see how this looks like, we can quickly tabulate the individual categories:
table(issp$EMPREL)
1 2 3 4
33504 4169 1797 1185
Unfortunately, we only get numbers. This is because the dataset is still stored in the labelled
format, and we can quickly fix this by using unlabelled()
:
<- labelled::unlabelled(issp) issp
Now the tabulation should work as intended:
table(issp$EMPREL)
NAP (Code 3 in WORK; NZ: Code 2-9 MAINSTAT)
0
Employee
33504
Self-employed without employees
4169
Self-employed with employees
1797
Working for own family's business
1185
No answer
0
You see that most respondents fall into the “Employee” category, but there are also people who are self-employed with and without employees. Some also work in a family business. Finally, there are some empty categories that have no observations, but we ignore them for now.
All we really need to do is to re-code this variable into a 0/1 dummy that is equal to 1 if people are self-employed and 0 otherwise. Here, we can use the case_match()
function, which is simply put a more advanced version of if_else()
:
%>%
issp mutate(selfemp = case_match(EMPREL,
c("Self-employed without employees",
"Self-employed with employees") ~ 1,
c("Employee","Working for own family's business") ~ 0,
.default = NA)) -> issp
Maybe you can already see that we are here telling R
to create a new variable called selfemp
that is 1 if the EMPREL
variable is either “Self-employed without employees” or “Self-employed with employees” and 0 otherwise. To make sure that observations that do not fit either of these conditions are excluded, we specify .default = NA
.
We can do a quick cross-tabulation to see if the re-coding worked as intended:
table(issp$EMPREL,issp$selfemp)
0 1
NAP (Code 3 in WORK; NZ: Code 2-9 MAINSTAT) 0 0
Employee 33504 0
Self-employed without employees 0 4169
Self-employed with employees 0 1797
Working for own family's business 1185 0
No answer 0 0
It looks like things did work: the self-employed are coded as 1, all others are 0.
Number of employees
The final variable we need is how many employees those respondents who are self-employed have. Here, we can use the NEMPLOY
variable from the ISSP dataset, but let’s again begin by simply checking what type this variable is:
class(issp$NEMPLOY)
[1] "numeric"
The variable is already numeric, which means we do not really have to do anything with it – it is good to go. But we can nevertheless quickly visualize it to see how it is distributed:
%>%
issp ggplot(aes(x = NEMPLOY)) +
geom_histogram(color = "white")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 46996 rows containing non-finite outside the scale range
(`stat_bin()`).
There are a few extreme outliers which make it difficult to see anything. We can get a clearer picture by removing those with more than 100 employees from the graph (obviously, we only do this for the graph!):
%>%
issp filter(NEMPLOY<100) %>%
ggplot(aes(x = NEMPLOY)) +
geom_histogram(color = "white")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Most respondents who have employees have only relatively small businesses with less than 25 employees.
Generating a class variable
We now have all pieces of information we need and can get to the class variable. Let’s start by generating two of Daniel Oesch’s (2006) class schemes, the very simple one with five classes and the more advanced one with eight classes. Each can be generated with the isco08_to_oesch()
function. The following code shows how to create both class schemes at once:
%>%
issp mutate(oesch_5 = DIGCLASS::isco08_to_oesch(x = isco_nums_as_text,
self_employed = selfemp,
n_employees = NEMPLOY,
n_classes = 5,
label = T,
to_factor = F),
oesch_8 = DIGCLASS::isco08_to_oesch(x = isco_nums_as_text,
self_employed = selfemp,
n_employees = NEMPLOY,
n_classes = 8,
label = T,
to_factor = F)) -> issp
ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
• Converted `110` to `0110`
• Converted `310` to `0310`
• Converted `210` to `0210`
ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
• Converted `110` to `0110`
• Converted `310` to `0310`
• Converted `210` to `0210`
Let’s have a look at the results:
table(issp$oesch_5)
'Higher-grade service class' 'Lower-grade service class'
6374 7058
'Skilled workers' 'Small business owners'
12478 1117
'Unskilled workers'
7558
table(issp$oesch_8)
'(Associate) managers'
5541
'Clerks'
4043
'Production workers'
8309
'Self-employed professionals and large employers'
493
'Service workers'
7684
'Small business owners'
1117
'Socio-cultural (semi-)professionals'
4667
'Technical (semi-)professionals'
2731
And we have what we want: Two class schemes, one simpler and the other a bit more detailed. The second one is used by for example Gingrich (2017) or Schwander & Häusermann (2013).4
Example analysis
Let’s say we wanted to find out if people’s class has an effect on how they think about the welfare state, specifically whether the government should do more to support the unemployed. As mentioned earlier, the ISSP includes a variable that measures these attitudes and which looks like this:
class(issp$v19)
[1] "factor"
table(issp$v19)
NAV (PH) Spend much more Spend more
0 7209 12422
Spend the same as now Spend less Spend much less
16760 6546 2390
Can't choose No answer
0 0
The variable is stored as a factor (i.e., as a categorical variable), but it has five categories – so we can, sort of, get away with treating it as if it were numeric (this is what Thewissen and Rueda 2019 also do). To be able to do that, we first have to check how it looks internally and then convert it:
::visfactor(dataset = issp,
bst290variable = "v19")
values labels
1 NAV (PH)
2 Spend much more
3 Spend more
4 Spend the same as now
5 Spend less
6 Spend much less
7 Can't choose
8 No answer
There is a bit of a divergence between values and labels – the NAV (PH)
category is empty (see above), which means the lowest actual category has the value of 2 and so on. We can fix this by simply using droplevels()
to get rid of empty categories and then as.numeric()
.
One thing we need to pay attention to is that, right now, lower scores correspond to more support for government aid to the unemployed. This is a bit strange to work with, so we reverse the scale of the new variable by subtracting it from 6 (so that the score of 1 becomes 6-1 = 5, 2 becomes 6-2 = 4, and so on:)
%>%
issp mutate(v19 = droplevels(v19),
unemspend = 6 - as.numeric(v19)) -> issp
The new numeric variable has values from 1 to 5, which is what we want:
table(issp$unemspend)
1 2 3 4 5
2390 6546 16760 12422 7209
Let’s now see to class influences attitudes toward help for the unemployed in Sweden (it is important to focus on one country alone, otherwise a simple linear regression model will give wrong results!):
%>%
issp filter(country == "SE-Sweden") -> swe_data
<- lm(unemspend ~ oesch_5,
mod1 data = swe_data)
summary(mod1)
Call:
lm(formula = unemspend ~ oesch_5, data = swe_data)
Residuals:
Min 1Q Median 3Q Max
-2.38053 -0.38053 0.03929 0.43478 2.43478
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.80315 0.05211 53.793 < 2e-16 ***
oesch_5'Lower-grade service class' 0.15756 0.07196 2.190 0.0288 *
oesch_5'Skilled workers' 0.50707 0.07234 7.010 4.56e-12 ***
oesch_5'Small business owners' -0.23793 0.18084 -1.316 0.1886
oesch_5'Unskilled workers' 0.57738 0.09391 6.148 1.16e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8305 on 939 degrees of freedom
(196 observations deleted due to missingness)
Multiple R-squared: 0.07685, Adjusted R-squared: 0.07291
F-statistic: 19.54 on 4 and 939 DF, p-value: 1.842e-15
As always, one category (here: “Higher-grade service class”) is omitted from the model and the coefficients show us the difference from each other class to the omitted one. This means that all classes except for small business owners are significantly more supportive of government help for the unemployed than the higher-grade service class.
To get a better sense, we can use prediction::prediction_summary()
to get predicted support scores per class based on the model:
::prediction_summary(model = mod1,
predictionat = list(oesch_5 = unique(na.omit(swe_data$oesch_5))))
at(oesch_5) Prediction SE z p lower upper
'Skilled workers' 3.310 0.05017 65.98 0.000e+00 3.212 3.409
'Higher-grade service class' 2.803 0.05211 53.79 0.000e+00 2.701 2.905
'Lower-grade service class' 2.961 0.04963 59.65 0.000e+00 2.863 3.058
'Unskilled workers' 3.381 0.07813 43.27 0.000e+00 3.227 3.534
'Small business owners' 2.565 0.17317 14.81 1.202e-49 2.226 2.905
We can get an ever better picture of the results if we just visualize the result:
::prediction_summary(model = mod1,
predictionat = list(oesch_5 = unique(na.omit(swe_data$oesch_5)))) %>%
ggplot(aes(x = Prediction,
y = reorder(`at(oesch_5)`,Prediction),
xmin = lower, xmax = upper)) +
geom_point(stat = "identity") +
geom_linerange() +
scale_x_continuous(breaks = seq(1,5,1),
limits = c(1,5)) +
labs(x = "Predicted support for government aid to the unemployed",
y = "Class", caption = "95% confidence intervals")
Note that we use reorder()
to arrange the classes from highest to lowest support. Clearly, small business owners in Sweden are least supportive of government help for the unemployed, while unskilled and skilled workers (i.e., the “working class”) are most supportive. Looks like class does still matter in Sweden!
References
Footnotes
See also https://isco-ilo.netlify.app/en/isco-08/#download-isco-08-material↩︎
See https://issp.org/ and https://www.europeansocialsurvey.org/.↩︎
See https://code.europa.eu/digclass/digclass and https://github.com/DiogoFerrari/occupar.↩︎
Gingrich calls “socio-cultural (semi-) professionals” the “new middle class”, “technical (semi-) professionals”, “clerks”, and “(Associate) managers” are the “old middle class”, “service workers” are the “new working class”, and “Production workers” are the “old working class”. If you wanted, you could use
case_match()
to re-code theoesch_5
variable into a new and simpler class scheme that corresponds to what Gingrich is using.↩︎