library(tidyverse)
library(texreg)
Are people influenced by the environments they live in?
How much are people’s opinions, behaviors, or perceptions affected by the environments (or “macro-level contexts”) they live in? Many sociologists argue that societal norms shape people’s behavior (e.g., that men do more of the housework in societies with egalitarian gender norms), and political scientists similarly suggest that political institutions influence political attitudes and behavior (e.g., that electoral participation differs between proportional and majoritarian electoral systems).
To test these types of theories, one needs to compare people’s opinions or behavior across contexts with different social norms, political institutions, or other macro-level factors that might have an influence on people. This is usually done with comparative survey data such as data from the European Social Survey, the International Social Survey Program, the Eurobarometer, the OECD Risks that Matter survey, or the World Values Survey [1]. The big advantage that comparative survey data offer is that they are standardized: The same survey with the exact same questions is conducted in multiple countries at the same time, so that people’s responses to the questions – i.e., their attitudes or behavior – can be directly compared [2]. This means that one can use these survey data to find out whether macro-level environmental factors influence (or at least are correlated with) people’s individual behaviors and attitudes.
This type of analysis can seem daunting to students but – as always in life – there are easier and more complicated ways of doing this. This post shows you how to do it in the easiest way possible, using R and techniques that undergraduate students usually learn in their introductory statistics courses: descriptive statistics and linear regression models (taking inspiration from Blekesaune and Bjørkhaug 2021).
Simply put (TL;DR), you pick a comparative survey dataset that contains relevant variables and covers countries that differ in relevant ways. For example, to study the effects of gender norms on the household division of work, you would find a survey dataset that contains questions about how much time people spend on housework and which covers countries with very egalitarian gender norms and countries with very inegalitarian gender norms. Then you pick one country that has, based on other studies or datasets, egalitarian norms and one that has very inegalitarian norms and you analyze how couples divide household chores between them (see e.g., Iversen and Rosenbluth 2006). Ideally, the countries you pick are otherwise as similar as possible so that you can be more sure that any differences you find are really the result of gender norms and not other factors (see e.g., Ringdal 2018, chap. 9; King, Keohane, and Verba 1994; Landman 2003, chaps. 2–3 for an explanation of good case selection strategies).
Thematically, we stick to the general topic of gender and gender differences, but we do not look at household work. Instead, we look at the political gender gap: How men and women differ in their political opinions (Inglehart and Norris 2000; Iversen and Rosenbluth 2010, 2006). More specifically, we do a simple re-test of the “household bargaining theory” of political gender differences by Iversen & Rosenbluth (2006, 2010).
The Iversen/Rosenbluth hypothesis in the smallest of nutshells
Very (very) simply put, Iversen & Rosenbluth (2006, 2010) argue that women are politically to the left of men, other things equal, but also that this depends on macro-level factors – specifically on how countries’ economies are structured. In countries that have economies that rely strongly on specific skills (think: highly trained craftsmen and -women that are really good at a few specific tasks), this gap should be particularly large. In contrast, in countries that rely more on general skills (think: flexible professionals that can quickly switch between jobs), women and men should be more equal in their political opinions.
Re-analysis using ESS data
We do a new test of this hypothesis using data from the tenth round of the European Social Survey (ESS), which was fielded in 2020–22.
Out of all the countries covered by this round of the ESS, we select the following two countries based on information we have from Iversen & Rosenbluth (2006), but also other studies (Hall and Soskice 2001; Iversen and Soskice 2001):
- Ireland, which is known to rely strongly on general skills. Here, we expect a small gender gap.
- Norway, which relies on specific skills. Here, we expect a large gender gap.
We use the following micro-level variables from the ESS:
- Left-right ideology (lrscale). This measures people’s general political orientation and is the dependent variable.
- Gender (gndr; male/female). This is the central independent variable here.
- Household income (hinctnta): This is a relevant control variable.
- Age (agea; years): Also a control variable.
- Education (eduyrs): A final variable we want to control for.
Packages
We use the tidyverse for data management & visualization and texreg to present regression results; both are loaded with library() at the top of this post.
Set theme for graphs
The classic theme just looks better…
theme_set(theme_classic())
Data import
You can download the data for free (after a registration) from https://www.europeansocialsurvey.org/. I use the .dta (Stata) version and saved the dataset as ESS10.dta on my computer. I use the haven package to import the dataset, and then immediately convert the dataset to the traditional R format with labelled::unlabelled (to do this, you need to have both of these packages installed; loading them with library() is not necessary).
ess <- labelled::unlabelled(haven::read_dta("ESS10.dta"))
Trimming
The entire ESS is massive. To make things easier to handle, we select only the relevant variables (plus some useful “administrative” ones such as idno, essround, and cntry):
ess %>%
  select(idno, essround, cntry, lrscale, gndr, agea, eduyrs, hinctnta) -> ess
Data cleaning
Household income (hinctnta) and left-right self-placement (lrscale) are factors and need to be correctly converted to numeric before we can use them in a regression analysis:
class(ess$hinctnta)
[1] "factor"
class(ess$lrscale)
[1] "factor"
bst290::visfactor(dataset = ess,
                  variable = "hinctnta") # no label/value divergence, no adjustment needed
values labels
1 J - 1st decile
2 R - 2nd decile
3 C - 3rd decile
4 M - 4th decile
5 F - 5th decile
6 S - 6th decile
7 K - 7th decile
8 P - 8th decile
9 D - 9th decile
10 H - 10th decile
bst290::visfactor(dataset = ess,
                  variable = "lrscale") # labels/values are off by 1, needs to be adjusted
values labels
1 Left
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
10 9
11 Right
ess %>%
  mutate(hhinc = as.numeric(hinctnta),
         lrscale = as.numeric(lrscale) - 1) -> ess
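Why the "- 1" matters: as.numeric() applied to a factor returns the position of each level, not the text of its label. The following is a minimal sketch with a made-up toy factor (the names lr_levels and toy are hypothetical and not part of the ESS data) that mimics the structure of lrscale shown above:

```r
# Toy factor mimicking lrscale: 11 levels, first labelled "Left", last "Right"
lr_levels <- c("Left", as.character(1:9), "Right")
toy <- factor(c("Left", "5", "Right", NA), levels = lr_levels)

# as.numeric() returns the internal level codes (1 to 11), not the 0-10 scale
codes <- as.numeric(toy)
codes

# Subtracting 1 restores the original 0-10 coding; missing values stay NA
lr_numeric <- codes - 1
lr_numeric
```

For hinctnta no such shift is needed because the level codes (1 to 10) already match the deciles.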
Country selection
The final “trimming” operation we need to do is to select only the two countries we want to compare. This is easy to do with filter(), and we create separate datasets for each of the two countries:
unique(ess$cntry)
[1] "BE" "BG" "CH" "CZ" "EE" "FI" "FR" "GB" "GR" "HR" "HU" "IE" "IS" "IT" "LT"
[16] "ME" "MK" "NL" "NO" "PT" "SI" "SK"
ess %>%
  filter(cntry == "NO") -> norway
ess %>%
  filter(cntry == "IE") -> ireland
Descriptive analysis of political gender gaps by country
It is good practice to first do a bit of visual analysis to get a sense of how the data look before moving to more complicated statistical analyses. Here, we use a bit of dplyr (group_by() & summarize()) to calculate the political gender gap in each country – how men and women differ, on average, in their ideology – and then visualize the result with a ggplot() bar graph.
ireland %>%
  group_by(gndr) %>%
  summarise(avg_lr = mean(lrscale, na.rm = TRUE)) %>%
  ggplot(aes(x = gndr, y = avg_lr)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(avg_lr, digits = 1)), vjust = -.5) +
  scale_y_continuous(limits = c(0,6)) +
  labs(x = "Gender", y = "Average left-right placement",
       caption = "Higher scores = more conservative",
       title = "Ireland")
norway %>%
  group_by(gndr) %>%
  summarise(avg_lr = mean(lrscale, na.rm = TRUE)) %>%
  ggplot(aes(x = gndr, y = avg_lr)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(avg_lr, digits = 1)), vjust = -.5) +
  scale_y_continuous(limits = c(0,6)) +
  labs(x = "Gender", y = "Average left-right placement",
       caption = "Higher scores = more conservative",
       title = "Norway")
It looks like the data support the hypothesis. We expected a small ideological gap between men and women in Ireland, and that is what we find: Men and women hardly differ on average in their left-right orientation (5.3 - 5.2 = 0.1). In contrast, the difference in Norway is four times as large (5.2 - 4.8 = 0.4), which is also what we expected.
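If you prefer to have the gender gap as a number rather than reading it off the graphs, it can be computed directly. The following is a minimal sketch in base R (so it runs without extra packages) using a small made-up dataset; the object toy and its values are invented for illustration and are not real ESS results:

```r
# Made-up toy data, NOT real ESS values
toy <- data.frame(
  cntry   = c("IE", "IE", "IE", "IE", "NO", "NO", "NO", "NO"),
  gndr    = c("Male", "Female", "Male", "Female",
              "Male", "Female", "Male", "Female"),
  lrscale = c(5, 5, 6, 5, 6, 4, 5, NA)
)

# Mean left-right placement by country (rows) and gender (columns),
# ignoring missing values
means <- tapply(toy$lrscale, list(toy$cntry, toy$gndr), mean, na.rm = TRUE)
means

# Gender gap per country: average for men minus average for women
gap <- means[, "Male"] - means[, "Female"]
gap
```

With the real data, the same logic would be applied to the trimmed ess dataset after filtering to the two countries of interest.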
Regression analysis
While the visual analysis is useful, we also need to do a more thorough test where we control for other variables. To do that, we do a simple linear (OLS) regression analysis separately for each country:
# Norway: baseline model
no_mod1 <- lm(lrscale ~ gndr,
              data = norway)
# Norway: with controls
no_mod2 <- lm(lrscale ~ gndr + agea + eduyrs + hhinc,
              data = norway)
# Ireland: baseline model
ie_mod1 <- lm(lrscale ~ gndr,
              data = ireland)
# Ireland: with controls
ie_mod2 <- lm(lrscale ~ gndr + agea + eduyrs + hhinc,
              data = ireland)
We use screenreg() from the texreg package to show the results directly next to each other so that we can spot differences between the two countries more easily:
screenreg(list(no_mod1,no_mod2,ie_mod1,ie_mod2),
stars = 0.05,
custom.header = list("Norway" = 1:2, "Ireland" = 3:4),
custom.model.names = c("No controls","Controls",
"No controls","Controls"),
custom.coef.map = list("(Intercept)" = "Intercept",
"gndrFemale" = "Female",
"agea" = "Age",
"eduyrs" = "Education (years)",
"hhinc" = "Household income (deciles)"))
=========================================================================
                            Norway                  Ireland
                            ----------------------  ---------------------
                            No controls  Controls   No controls  Controls
-------------------------------------------------------------------------
Intercept                    5.21 *       5.41 *     5.26 *       4.61 *
                            (0.09)       (0.35)     (0.08)       (0.38)
Female                      -0.41 *      -0.38 *    -0.02        -0.03
                            (0.13)       (0.13)     (0.11)       (0.13)
Age                                       0.01 *                  0.02 *
                                         (0.00)                  (0.00)
Education (years)                        -0.10 *                 -0.01
                                         (0.02)                  (0.02)
Household income (deciles)                0.12 *                 -0.03
                                         (0.03)                  (0.03)
-------------------------------------------------------------------------
R^2                          0.01         0.04       0.00         0.03
Adj. R^2                     0.01         0.04      -0.00         0.02
Num. obs.                    1375         1300       1516          993
=========================================================================
* p < 0.05
Women are again significantly more to the left than men in Norway but not in Ireland – which is what Iversen & Rosenbluth would have predicted. These effects are barely affected by the inclusion of controls for age, education, and household income.
Overall, this relatively simple re-test supports the Iversen/Rosenbluth theory of gender differences.
Next steps
You have now seen how you can do a simple cross-country comparative analysis of survey data with R. Obviously, you can adapt this type of analysis to many different questions so long as you have relevant data. For example, if you have macro-level indicators of what countries’ electoral systems look like (which you do: https://cpds-data.org/) and comparative survey data on people’s electoral behavior (which you can get via the ESS), you can test if rates of participation in elections differ between types of electoral systems. The same applies to any combination of macro-level factor and micro-level behavior you can think of and have data for.
Importantly, you may also have noticed that we did not use any form of quantitative data to measure macro-level factors or to pick countries – we simply relied on findings from other studies to select relevant countries.
Finally, there are obviously ways to make this type of analysis more sophisticated. One additional step one can take is to test statistically if the coefficients from regression models are statistically significantly different from each other. Paternoster et al. (1998) have developed a simple formula for this that works basically like a standard two-sample t-test.
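To illustrate, the test described by Paternoster et al. (1998) divides the difference between two coefficients by the square root of the sum of their squared standard errors, z = (b1 - b2) / sqrt(se1^2 + se2^2), and compares the result to the standard normal distribution. The following is a minimal sketch; the helper function coef_diff_test() is a hypothetical name made up for this example, and the inputs are the rounded “Female” coefficients from the models with controls in the table above, so the result is only approximate:

```r
# z-test for the difference between two coefficients from separate models:
# z = (b1 - b2) / sqrt(se1^2 + se2^2)
# NOTE: coef_diff_test() is a hypothetical helper written for this example
coef_diff_test <- function(b1, se1, b2, se2) {
  z <- (b1 - b2) / sqrt(se1^2 + se2^2)
  p <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal
  c(z = z, p = p)
}

# Rounded "Female" coefficients and standard errors (models with controls)
coef_diff_test(b1 = -0.38, se1 = 0.13,   # Norway
               b2 = -0.03, se2 = 0.13)   # Ireland
```

In a real analysis you would extract the unrounded estimates and standard errors from the model objects (e.g., via coef(summary(no_mod2))) instead of typing them in by hand.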
The most advanced way to compare survey data from different countries is a multi-level or hierarchical regression analysis. This is what academic researchers usually use because multi-level regression models make it possible to use all available data from a comparative survey dataset instead of picking only a small number of countries. This makes it possible to estimate more complicated models and to get more accurate and reliable results. If you want to learn more about this, there is a series of articles that explains these models in a very intuitive and easy fashion (Merlo, Yang, et al. 2005; Merlo, Chaix, et al. 2005a, 2005b; Merlo et al. 2006; see also Steenbergen and Jones 2002), and the book by Finch et al. (2014) explains how you implement these models in R.
Footnotes
1. See https://github.com/erikgahner/PolData?tab=readme-ov-file#cross-sectional for a list of comparative survey data projects.
2. Obviously, questionnaires are translated where languages differ, and there are sometimes cases where some questions are only asked in a subset of all countries that are included in a given survey.