Measuring class with survey data

Survey data
Class
ISSP
Political science
Sociology
Author

Carlo Knotz

Published

March 8, 2025

Class (still) matters

Class is a key concept in the social and political sciences, and it helps explain many important phenomena, from party preferences and voting to social attitudes and health outcomes (e.g., Elo 2009; Gingrich 2017; Schwander and Häusermann 2013; Häusermann et al. 2022; Evans 2000). It is therefore important for every empirical social and political researcher to know how to measure people’s positions in class structures.

Sociologists have spent a lot of time developing class schemes that make the abstract concept of “class” empirically measurable. Probably the most famous is the Erikson-Goldthorpe-Portocarero (EGP) class scheme, which was developed in the 1970s (Erikson, Goldthorpe, and Portocarero 1979). There are also more recent schemes that take into account that, as a result of technological change, increased educational attainment, and other factors, societies and labor markets in the 21st century look quite different from how they did in the 1970s or 1980s. Daniel Oesch’s (2006) scheme is an important modern class scheme.

The basis for class schemes is generally occupation: what job does someone have? For example, a medical doctor would typically be classified as a “higher-skilled professional”, whereas a welder would usually be classified as a “skilled manual worker”. People’s occupations are usually measured with occupational classification schemes, the most widely used of which is the International Labour Organization’s (ILO) International Standard Classification of Occupations (ISCO).[1] ISCO comes in different versions named after the years in which they were adopted: ISCO-68, ISCO-88, and ISCO-08.

Some people are self-employed: they run their own businesses, which can be a small one-person business (e.g., a shop) but also a medium-sized company with 500 employees. This obviously affects their class position: a small shop owner would often be considered a member of the “petite bourgeoisie”, while someone who owns a larger company might be considered a “capital owner”.

What information do you need, and where do you get it?

Class is an individual-level variable: A person can be a member of the working class, but a country cannot. This means that we use individual-level data – survey data – to measure class. Such survey data need to contain three pieces of information (variables) that reflect people’s class membership:

  1. Their occupation. This needs to be measured at the highest level of detail, meaning with the four-digit ISCO-88 or ISCO-08 scheme.
  2. Whether or not they are self-employed.
  3. If they are self-employed, how many employees they have.

Many survey datasets contain this information in some form, but it is usually easiest to use data from large and well-known comparative survey projects like the International Social Survey Programme (ISSP) or the European Social Survey (ESS).[2] Both are free to use (but you do need to register as a user). Many national survey projects also contain this information, but there occupation is often coded not with the ISCO scheme but with national occupational classification schemes (e.g., ANZSCO for Australia and New Zealand or SOC for the United States). These can be translated to ISCO with specific conversion tables, but this often takes quite a bit of time and effort.

Technically speaking, applying a class scheme to survey data is quite a bit of work: you need to go through a long list of occupations (the four-digit ISCO-08 scheme contains 436 unit groups) and decide which class each of them belongs to. Then you have to write code that assigns every observation in your dataset to its class. Doing all of this by hand would take a lot of time.
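ISCO codes are hierarchical: the first digit of a four-digit code gives the major group. A real class scheme assigns each four-digit code to a class individually, but the mechanics of such a lookup can be sketched in base R. The sketch below only maps codes to the ten ISCO-08 major groups, not to any class scheme, and the names `isco_major_groups` and `isco_major_group()` are hypothetical.

```r
# ISCO-08 major groups, keyed by the first digit of the four-digit code
isco_major_groups <- c(
  "0" = "Armed forces occupations",
  "1" = "Managers",
  "2" = "Professionals",
  "3" = "Technicians and associate professionals",
  "4" = "Clerical support workers",
  "5" = "Service and sales workers",
  "6" = "Skilled agricultural, forestry and fishery workers",
  "7" = "Craft and related trades workers",
  "8" = "Plant and machine operators, and assemblers",
  "9" = "Elementary occupations"
)

# Hypothetical helper: look up the major group of each four-digit code.
# The codes must be text with leading zeros ("0110", not "110"),
# otherwise the first digit points to the wrong group.
isco_major_group <- function(codes) {
  unname(isco_major_groups[substr(codes, 1, 1)])
}

isco_major_group(c("2611", "7212", "0110"))
```

A class scheme works the same way in principle, only with a lookup table covering every four-digit code plus extra rules for the self-employed.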

Fortunately, people have written R packages that make this quick and (normally) easy. Two relevant packages are the DIGCLASS package, which was developed by researchers at the EU, and the occupar package.[3]

The rest of this tutorial shows how you can measure people’s class with data from the ISSP and the DIGCLASS package for R. Most of this also applies if you work with data from the ESS, but some data import and cleaning steps might be different. If you work with the ESS, you can use the data preparation steps below as a guide to what your dataset needs to look like.

Installing the DIGCLASS package

The DIGCLASS package is not on CRAN (the official R “app store”), but you can install it with the remotes package (which you need to have installed first, of course):

# install.packages("remotes")
remotes::install_git("https://code.europa.eu/digclass/digclass.git")

Next, we load the package with library(), in addition to the tidyverse package:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DIGCLASS)
theme_set(theme_classic())

Getting ISSP data

In this tutorial, we will work with data from the 2016 Role of Government round of the ISSP (v. 2.0.0; 19.09.2018), which you can download from the GESIS data repository: https://www.gesis.org/en/issp/data-and-documentation/role-of-government/2016#c127852. As mentioned earlier, you need to register as a user, but this is free – and also gives you access to many other survey datasets like the Eurobarometer or the European Values Study.

Make sure that you download the data in SPSS (.sav) format and that you store them in the folder that you are working in (ideally your RStudio Project folder).

Importing the dataset

To import the dataset, you can use the read_sav() function from the haven package. Important: for now, simply import the dataset and do not convert it with labelled::unlabelled() yet! I have stored the dataset as issp16.sav, so I need to specify this in my code; you need to use the name of your own dataset file:

issp <- haven::read_sav("issp16.sav")

Like other large survey datasets, the ISSP dataset is very large and contains almost 400 variables:

dim(issp)
[1] 48720   395

To make things easier for now, we trim the data to the variables we actually need plus one variable (v19) that we can later use as a dependent variable in an example analysis:

issp %>% 
  select(studyno,country, # good to keep these in 
         ISCO08, # ISCO-08 occupational codes
         EMPREL, # Employment relationship, to identify self-employed
         NEMPLOY, # number of employees if self-employed
         v19 # a variable measuring respondents' views on whether the government should spend more on the unemployed
         ) -> issp

Data preparation

The DIGCLASS package expects its input data to be in a specific format. If you, for example, open the help file for the DIGCLASS::isco08_to_oesch() function with ?DIGCLASS::isco08_to_oesch and scroll down a bit, you see that the function needs three main inputs:

  1. x, the four-digit ISCO-08 codes, which need to be stored as text (character).
  2. self_employed, which needs to be a “numeric vector indicating whether each individual is self-employed (1) or an employee (0).”
  3. n_employees, which needs to be a “numeric vector indicating the number of employees under each respondent.”

This means we need to have three variables that correspond exactly to this: ISCO-08 scores as text, a 0/1 dummy indicating whether someone is self-employed, and a variable containing the number of employees for those who are self-employed.
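As an illustration of the target format, here is a hypothetical toy dataset with the three variables, using the same names as in the rest of the tutorial, plus a few base R checks. The four ISCO-08 codes stand for a generalist medical practitioner, a welder, a shopkeeper, and a managing director; all values are made up.

```r
# Hypothetical toy data in the format DIGCLASS expects (one row per respondent)
toy <- data.frame(
  isco_nums_as_text = c("2211", "7212", "5221", "1120"), # ISCO-08 codes as text
  selfemp = c(0, 0, 1, 1),                                # 1 = self-employed, 0 = employee
  NEMPLOY = c(NA, NA, 0, 40),                             # employees, self-employed only
  stringsAsFactors = FALSE
)

# Simple checks that the three inputs have the expected types and values
stopifnot(
  is.character(toy$isco_nums_as_text),
  all(nchar(toy$isco_nums_as_text) == 4),
  is.numeric(toy$selfemp),
  all(toy$selfemp %in% c(0, 1)),
  is.numeric(toy$NEMPLOY)
)
str(toy)
```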

Preparing the ISCO-08 scores

Let’s start the data preparation with ISCO08, and let’s first take a closer look at how it is stored now:

class(issp$ISCO08)
[1] "haven_labelled" "vctrs_vctr"     "double"        

From the result of class(), we see that the variable is stored in a labelled-type format – which is because the dataset was imported with haven – and this is also the case for all the other variables (see the Environment).

To see more clearly what the ISCO08 variable looks like, let’s look at the first few observations:

issp %>% 
  select(ISCO08) %>% 
  slice_head(n = 10) # to get first ten observations
# A tibble: 10 × 1
   ISCO08                                                               
   <dbl+lbl>                                                            
 1 2611 [Lawyers]                                                       
 2 2512 [Software developers]                                           
 3 1212 [Human resource managers]                                       
 4 1439 [Services managers not elsewhere classified]                    
 5 4419 [Clerical support workers not elsewhere classified]             
 6 1345 [Education managers]                                            
 7 3230 [Traditional and complementary medicine associate professionals]
 8 2654 [Film, stage and related directors and producers]               
 9 2611 [Lawyers]                                                       
10 5131 [Waiters]                                                       

You see that the first observation is a lawyer, which has the ISCO-08 code 2611, the next is a software developer (ISCO-08 code 2512), and so on.

Now comes an important step: We need to convert the ISCO08 variable to a character-type variable – for some reason, the DIGCLASS package expects that the ISCO codes are stored as text (e.g., “2611”, “2512”), and that is what we need to deliver for the package to work.

To do that, we simply use as.character():

issp %>% 
  mutate(isco_nums_as_text = as.character(ISCO08)) -> issp

The new variable should now be a character-type variable:

class(issp$isco_nums_as_text)
[1] "character"

This means that the ISCO-08 codes are taken care of, and we can move on to the next piece of information that we need: a 0/1 variable that tells us if people are self-employed.

Self-employment

Information about how people earn their living is contained in the EMPREL variable. To see what it looks like, we can quickly tabulate its categories:

table(issp$EMPREL)

    1     2     3     4 
33504  4169  1797  1185 

Unfortunately, we only get numbers. This is because the dataset is still stored in the labelled format. We can quickly fix this with labelled::unlabelled():

issp <- labelled::unlabelled(issp)

Now the tabulation should work as intended:

table(issp$EMPREL)

NAP (Code 3 in WORK; NZ: Code 2-9 MAINSTAT) 
                                          0 
                                   Employee 
                                      33504 
            Self-employed without employees 
                                       4169 
               Self-employed with employees 
                                       1797 
          Working for own family's business 
                                       1185 
                                  No answer 
                                          0 

You see that most respondents fall into the “Employee” category, but there are also people who are self-employed with and without employees. Some also work in a family business. Finally, there are some empty categories that have no observations, but we ignore them for now.
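The table above suggests that unlabelled() turned EMPREL into a factor whose levels are the value labels stored in the .sav file. The same mapping from codes to labels can be sketched in base R with factor(). Judging from the counts in the two tables, code 1 corresponds to “Employee”, 2 to “Self-employed without employees”, 3 to “Self-employed with employees”, and 4 to “Working for own family's business”; this mapping is inferred, the toy codes are made up, and the empty NAP and No answer categories are left out because their codes are not shown.

```r
# Made-up numeric EMPREL codes for six respondents
emprel_codes <- c(1, 1, 2, 3, 4, 1)

# Attach labels to codes 1-4 (mapping inferred from the counts in the ISSP tables)
emprel_labelled <- factor(
  emprel_codes,
  levels = 1:4,
  labels = c("Employee",
             "Self-employed without employees",
             "Self-employed with employees",
             "Working for own family's business")
)
table(emprel_labelled)
```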

All we really need to do is recode this variable into a 0/1 dummy that is equal to 1 if people are self-employed and 0 otherwise. Here, we can use the case_match() function, which, simply put, is a more flexible version of if_else():

issp %>% 
  mutate(selfemp = case_match(EMPREL,
                              c("Self-employed without employees",
                                "Self-employed with employees") ~ 1,
                              c("Employee","Working for own family's business") ~ 0,
                              .default = NA)) -> issp

Here, we tell R to create a new variable called selfemp that is 1 if EMPREL is either “Self-employed without employees” or “Self-employed with employees”, and 0 if it is “Employee” or “Working for own family's business”. All other categories (and missing values) fit neither condition, and .default = NA makes sure they are set to missing.

We can do a quick cross-tabulation to see if the re-coding worked as intended:

table(issp$EMPREL,issp$selfemp)
                                             
                                                  0     1
  NAP (Code 3 in WORK; NZ: Code 2-9 MAINSTAT)     0     0
  Employee                                    33504     0
  Self-employed without employees                 0  4169
  Self-employed with employees                    0  1797
  Working for own family's business            1185     0
  No answer                                       0     0

The recoding worked: the self-employed are coded as 1, and employees and people working in their family's business are coded as 0.
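For comparison, the same recode can be written in base R with nested ifelse() and %in%. This is a sketch on a made-up vector of answers; %in% also works on factors such as the unlabelled EMPREL, because match() compares factors as text. Since NA %in% ... is FALSE, missing values fall through both conditions and end up as NA, just like .default = NA above.

```r
# Made-up answers, including a "No answer" and a missing value
emprel <- c("Employee", "Self-employed without employees",
            "Self-employed with employees", "Working for own family's business",
            "No answer", NA)

# 1 = self-employed, 0 = employee or family business, NA = everything else
selfemp <- ifelse(
  emprel %in% c("Self-employed without employees", "Self-employed with employees"), 1,
  ifelse(emprel %in% c("Employee", "Working for own family's business"), 0, NA)
)
selfemp
```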

Number of employees

The final variable we need is how many employees those respondents who are self-employed have. Here, we can use the NEMPLOY variable from the ISSP dataset, but let’s again begin by simply checking what type this variable is:

class(issp$NEMPLOY)
[1] "numeric"

The variable is already numeric, which means we do not really have to do anything with it – it is good to go. But we can nevertheless quickly visualize it to see how it is distributed:

issp %>% 
  ggplot(aes(x = NEMPLOY)) +
    geom_histogram(color = "white")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 46996 rows containing non-finite outside the scale range
(`stat_bin()`).

There are a few extreme outliers that make it difficult to see anything. We get a clearer picture if we keep only respondents with fewer than 100 employees in the graph (obviously, we do this only for the graph!):

issp %>% 
  filter(NEMPLOY<100) %>% 
  ggplot(aes(x = NEMPLOY)) +
    geom_histogram(color = "white")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Most respondents who have employees run relatively small businesses with fewer than 25 employees.
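The visual impression can be backed up with a quick count. Here is a sketch with cut(), using made-up employee numbers rather than the ISSP data (with the real data you would pass issp$NEMPLOY instead); the size bands are an arbitrary choice for illustration.

```r
# Made-up employee counts (0 = self-employed without employees)
n_employees <- c(0, 1, 3, 12, 30, 250, NA)

# Keep only respondents who actually have employees
employers <- n_employees[!is.na(n_employees) & n_employees > 0]

# Group them into size bands: (0,9], (9,24], (24,99], (99,Inf]
size_band <- cut(employers, breaks = c(0, 9, 24, 99, Inf),
                 labels = c("1-9", "10-24", "25-99", "100+"))
table(size_band)

# Share of employers with fewer than 25 employees
mean(employers < 25)
```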

Generating a class variable

We now have all the information we need to create the class variable. Let’s start by generating two of Daniel Oesch’s (2006) class schemes: the very simple one with five classes and the more detailed one with eight classes. Both can be generated with the isco08_to_oesch() function, and the following code creates them at once:

issp %>% 
  mutate(oesch_5 = DIGCLASS::isco08_to_oesch(x = isco_nums_as_text,
                                             self_employed = selfemp,
                                             n_employees = NEMPLOY,
                                             n_classes = 5, # five-class scheme
                                             label = TRUE,
                                             to_factor = FALSE),
         oesch_8 = DIGCLASS::isco08_to_oesch(x = isco_nums_as_text,
                                             self_employed = selfemp,
                                             n_employees = NEMPLOY,
                                             n_classes = 8, # eight-class scheme
                                             label = TRUE,
                                             to_factor = FALSE)) -> issp
ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
• Converted `110` to `0110`
• Converted `310` to `0310`
• Converted `210` to `0210`
ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
• Converted `110` to `0110`
• Converted `310` to `0310`
• Converted `210` to `0210`
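The messages above show that DIGCLASS pads three-digit armed forces codes such as 110 to four digits. The leading zero is lost because the codes were stored as numbers before as.character() turned them into text. If you prefer to pad the codes yourself before calling DIGCLASS, a base R sketch could look like this; pad_isco() is a hypothetical helper meant for a plain numeric or character vector of codes.

```r
# Hypothetical helper: pad ISCO-08 codes to four digits, keeping NA as NA
pad_isco <- function(x) {
  x <- as.integer(x)
  ifelse(is.na(x), NA_character_, sprintf("%04d", x))
}

pad_isco(c(110, 310, 2611, NA))
```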

Let’s have a look at the results:

table(issp$oesch_5)

'Higher-grade service class'  'Lower-grade service class' 
                        6374                         7058 
           'Skilled workers'      'Small business owners' 
                       12478                         1117 
         'Unskilled workers' 
                        7558 
table(issp$oesch_8)

                           '(Associate) managers' 
                                             5541 
                                         'Clerks' 
                                             4043 
                             'Production workers' 
                                             8309 
'Self-employed professionals and large employers' 
                                              493 
                                'Service workers' 
                                             7684 
                          'Small business owners' 
                                             1117 
            'Socio-cultural (semi-)professionals' 
                                             4667 
                 'Technical (semi-)professionals' 
                                             2731 

And we have what we want: two class schemes, a simpler one and a more detailed one. The second one is used, for example, by Gingrich (2017) and Schwander & Häusermann (2013).[4]
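Following Gingrich's terminology, the eight Oesch classes can be collapsed into broader groups: socio-cultural (semi-)professionals form the new middle class; technical (semi-)professionals, clerks, and (associate) managers the old middle class; service workers the new working class; and production workers the old working class. The sketch below uses a base R lookup table instead of case_match(). Two assumptions: the labels printed above appear to contain literal single quotes, which the hypothetical recode_gingrich() helper strips before matching, and the two classes not covered by this grouping (small business owners, self-employed professionals and large employers) become NA.

```r
# Hypothetical lookup table: Oesch 8-class labels -> Gingrich-style groups
gingrich_lookup <- c(
  "Socio-cultural (semi-)professionals" = "New middle class",
  "Technical (semi-)professionals"      = "Old middle class",
  "Clerks"                              = "Old middle class",
  "(Associate) managers"                = "Old middle class",
  "Service workers"                     = "New working class",
  "Production workers"                  = "Old working class"
)

# Hypothetical helper: strip surrounding single quotes from the labels,
# then look each label up (labels not in the table, and NA, give NA)
recode_gingrich <- function(oesch8) {
  clean <- gsub("^'|'$", "", as.character(oesch8))
  unname(gingrich_lookup[clean])
}

recode_gingrich(c("'Clerks'", "'Service workers'", "'Small business owners'", NA))
```

With the tutorial's data, this could be applied as issp$gingrich <- recode_gingrich(issp$oesch_8).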

Example analysis

Let’s say we wanted to find out if people’s class has an effect on how they think about the welfare state, specifically whether the government should do more to support the unemployed. As mentioned earlier, the ISSP includes a variable that measures these attitudes. It looks like this:

class(issp$v19)
[1] "factor"
table(issp$v19)

             NAV (PH)       Spend much more            Spend more 
                    0                  7209                 12422 
Spend the same as now            Spend less       Spend much less 
                16760                  6546                  2390 
         Can't choose             No answer 
                    0                     0 

The variable is stored as a factor (i.e., as a categorical variable), but it has five ordered answer categories, so we can, sort of, get away with treating it as if it were numeric (this is also what Thewissen and Rueda 2019 do). To do that, we first check how the factor is coded internally and then convert it:

bst290::visfactor(dataset = issp,
                  variable = "v19")
 values                labels
      1              NAV (PH)
      2       Spend much more
      3            Spend more
      4 Spend the same as now
      5            Spend less
      6       Spend much less
      7          Can't choose
      8             No answer

There is a mismatch between values and labels: the NAV (PH) category is empty (see above), which means that the lowest substantive category (“Spend much more”) has the value 2, and so on. We can fix this by first using droplevels() to get rid of the empty categories and then converting the variable with as.numeric().

One thing we need to pay attention to is that, right now, lower scores correspond to more support for government aid to the unemployed. This is counterintuitive, so we reverse the scale of the new variable by subtracting it from 6 (so that a score of 1 becomes 6 - 1 = 5, 2 becomes 6 - 2 = 4, and so on):

issp %>% 
  mutate(v19 = droplevels(v19),
         unemspend = 6 - as.numeric(v19)) -> issp

The new numeric variable has values from 1 to 5, which is what we want:

table(issp$unemspend)

    1     2     3     4     5 
 2390  6546 16760 12422  7209 
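The two conversion steps can be illustrated on a made-up toy factor with the same level order as v19. One caveat: droplevels() removes every level that does not occur, so the codes only run from 1 to 5 if all five substantive answers are present (as they are in the ISSP table above); otherwise later categories would shift down.

```r
# Made-up toy factor with the same level order as v19 in the ISSP data
answers <- c("NAV (PH)", "Spend much more", "Spend more",
             "Spend the same as now", "Spend less", "Spend much less",
             "Can't choose", "No answer")
v19_toy <- factor(answers[2:6], levels = answers)

as.numeric(v19_toy)              # 2 3 4 5 6: the empty "NAV (PH)" level still counts
as.numeric(droplevels(v19_toy))  # 1 2 3 4 5 after dropping the empty levels

# Reverse the scale so that higher values mean more support for spending
unemspend_toy <- 6 - as.numeric(droplevels(v19_toy))
unemspend_toy                    # 5 4 3 2 1
```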

Let’s now see how class influences attitudes toward help for the unemployed in Sweden. We focus on one country only: pooling all countries in a simple linear regression would ignore differences between countries and could easily produce misleading results.

issp %>% 
  filter(country == "SE-Sweden") -> swe_data

mod1 <- lm(unemspend ~ oesch_5,
           data = swe_data)
summary(mod1)

Call:
lm(formula = unemspend ~ oesch_5, data = swe_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.38053 -0.38053  0.03929  0.43478  2.43478 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         2.80315    0.05211  53.793  < 2e-16 ***
oesch_5'Lower-grade service class'  0.15756    0.07196   2.190   0.0288 *  
oesch_5'Skilled workers'            0.50707    0.07234   7.010 4.56e-12 ***
oesch_5'Small business owners'     -0.23793    0.18084  -1.316   0.1886    
oesch_5'Unskilled workers'          0.57738    0.09391   6.148 1.16e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8305 on 939 degrees of freedom
  (196 observations deleted due to missingness)
Multiple R-squared:  0.07685,   Adjusted R-squared:  0.07291 
F-statistic: 19.54 on 4 and 939 DF,  p-value: 1.842e-15

As always, one category (here: “Higher-grade service class”) is omitted from the model, and each coefficient shows the difference between that class and the omitted one. All classes except small business owners are significantly (at the 5% level) more supportive of government help for the unemployed than the higher-grade service class; small business owners are somewhat less supportive, but that difference is not statistically significant.
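To see why the coefficients are differences from the omitted class, and how to pick a different reference category, here is a sketch with made-up toy data. Presumably the higher-grade service class is omitted because oesch_5 was not converted to a factor (see the to_factor argument above), so lm() turns the text into a factor with alphabetically sorted levels and uses the first one as the reference. relevel() sets another reference category.

```r
# Made-up toy data: group means are A = 2, B = 5, C = 2
toy <- data.frame(
  y = c(1, 3, 4, 6, 2, 2),
  g = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Text predictor: lm() uses alphabetically sorted levels, so "A" is the reference
fit_a <- lm(y ~ g, data = toy)
coef(fit_a)  # intercept = mean of A (2); gB = 5 - 2 = 3; gC = 2 - 2 = 0 (up to rounding)

# relevel() makes "B" the reference category instead
toy$g_b <- relevel(factor(toy$g), ref = "B")
fit_b <- lm(y ~ g_b, data = toy)
coef(fit_b)  # intercept = mean of B (5); both other groups are 3 lower
```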

To get a better sense, we can use prediction::prediction_summary() to get predicted support scores per class based on the model:

prediction::prediction_summary(model = mod1,
                               at = list(oesch_5 = unique(na.omit(swe_data$oesch_5))))
                  at(oesch_5) Prediction      SE     z         p lower upper
            'Skilled workers'      3.310 0.05017 65.98 0.000e+00 3.212 3.409
 'Higher-grade service class'      2.803 0.05211 53.79 0.000e+00 2.701 2.905
  'Lower-grade service class'      2.961 0.04963 59.65 0.000e+00 2.863 3.058
          'Unskilled workers'      3.381 0.07813 43.27 0.000e+00 3.227 3.534
      'Small business owners'      2.565 0.17317 14.81 1.202e-49 2.226 2.905
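Because the model contains only the class dummies, each predicted score is simply the intercept plus the respective coefficient, and the reference class gets the intercept alone. The arithmetic can be checked in a few lines, using the estimates copied from the summary(mod1) output above:

```r
# Estimates copied from the summary(mod1) output
intercept <- 2.80315
class_coefs <- c(
  "Higher-grade service class" = 0,  # reference category
  "Lower-grade service class"  = 0.15756,
  "Skilled workers"            = 0.50707,
  "Small business owners"      = -0.23793,
  "Unskilled workers"          = 0.57738
)

# Predicted score per class = intercept + class coefficient
predicted <- round(intercept + class_coefs, 3)
predicted
```

The results match the Prediction column of prediction_summary().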

We get an even better picture of the results if we visualize them:

prediction::prediction_summary(model = mod1,
                               at = list(oesch_5 = unique(na.omit(swe_data$oesch_5)))) %>% 
  ggplot(aes(x = Prediction, 
             y = reorder(`at(oesch_5)`,Prediction), 
             xmin = lower, xmax = upper)) +
    geom_point(stat = "identity") +
    geom_linerange() +
    scale_x_continuous(breaks = seq(1,5,1),
                       limits = c(1,5)) +
    labs(x = "Predicted support for government aid to the unemployed",
         y = "Class", caption = "95% confidence intervals")

Note that we use reorder() to sort the classes by their predicted support. Small business owners in Sweden have the lowest predicted support for government help for the unemployed (although with a wide confidence interval), while unskilled and skilled workers (i.e., the “working class”) are most supportive. Looks like class does still matter in Sweden!

References

Elo, Irma T. 2009. “Social Class Differentials in Health and Mortality: Patterns and Explanations in Comparative Perspective.” Annual Review of Sociology 35 (1): 553–72.
Erikson, Robert, John H. Goldthorpe, and Lucienne Portocarero. 1979. “Intergenerational Class Mobility in Three Western European Societies: England, France and Sweden.” The British Journal of Sociology 30 (4): 415–41.
Evans, Geoffrey. 2000. “The Continued Significance of Class Voting.” Annual Review of Political Science 3 (1): 401–17.
Gingrich, Jane. 2017. “A New Progressive Coalition? The European Left in a Time of Change.” The Political Quarterly 88 (1): 39–51.
Häusermann, Silja, Michael Pinggera, Macarena Ares, and Matthias Enggist. 2022. “Class and Social Policy in the Knowledge Economy.” European Journal of Political Research 61 (2): 462–84.
Oesch, Daniel. 2006. Redrawing the Class Map. Stratification and Institutions in Britain, Germany, Sweden and Switzerland. Basingstoke: Palgrave Macmillan.
Schwander, Hanna, and Silja Häusermann. 2013. “Who Is in and Who Is Out? A Risk-Based Conceptualization of Insiders and Outsiders.” Journal of European Social Policy 23 (3): 248–69.
Thewissen, Stefan, and David Rueda. 2019. “Automation and the Welfare State: Technological Change as a Determinant of Redistribution Preferences.” Comparative Political Studies 52 (2): 171–208.

Footnotes

  1. See also https://isco-ilo.netlify.app/en/isco-08/#download-isco-08-material

  2. See https://issp.org/ and https://www.europeansocialsurvey.org/.

  3. See https://code.europa.eu/digclass/digclass and https://github.com/DiogoFerrari/occupar.

  4. Gingrich calls “socio-cultural (semi-)professionals” the “new middle class”; “technical (semi-)professionals”, “clerks”, and “(associate) managers” the “old middle class”; “service workers” the “new working class”; and “production workers” the “old working class”. If you wanted, you could use case_match() to recode the oesch_8 variable into a new and simpler class scheme that corresponds to what Gingrich uses.