12 HT6 Regression: Interactions
library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(car) # needed for the linear hypothesis testing after a regression
library(margins) # to ease interpretation
12.2 Run a simple regression of thermimmigrant with gender
To run a simple regression, we again use the lm()
command for a linear regression (lm stands for linear regression). As with most datasets in this course, we can take the gender
variable just as it comes from Stata - where the variable is labelled and has a numeric value for its labels. Doing so gives us the correct model:
df %>%
lm(thermimmigrant ~ gender, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender, data = .)
Residuals:
Min 1Q Median 3Q Max
-45.885 -15.885 4.115 17.679 57.679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.3206 0.9067 46.677 < 2e-16 ***
gender 3.5647 1.2052 2.958 0.00314 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 27.03 on 2046 degrees of freedom
(54 observations deleted due to missingness)
Multiple R-squared: 0.004257, Adjusted R-squared: 0.003771
F-statistic: 8.748 on 1 and 2046 DF, p-value: 0.003136
But now we don’t really know what 0
and 1
correspond to! This is why I’d recommend always converting a categorical variable to a factor
using the as_factor()
command from the haven
package, which we have used to import Stata datasets each week.
Now if we run the same command, then we get the same coefficient but are certain that 0
= Male and 1
= Female. Indeed, naming the variable female
to start with would resolve that problem altogether.
Call:
lm(formula = thermimmigrant ~ gender, data = .)
Residuals:
Min 1Q Median 3Q Max
-45.885 -15.885 4.115 17.679 57.679
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 42.3206 0.9067 46.677 < 2e-16 ***
genderFemale 3.5647 1.2052 2.958 0.00314 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 27.03 on 2046 degrees of freedom
(54 observations deleted due to missingness)
Multiple R-squared: 0.004257, Adjusted R-squared: 0.003771
F-statistic: 8.748 on 1 and 2046 DF, p-value: 0.003136
12.3 Run a simple regression of thermimmigrant with gunown
The problem above becomes clearer yet again when we use the gunown
variable:
df %>%
pull(gunown) %>% # we use pull to extract the column as a vector, which print_labels needs - equiv. to print_labels(df$gunown)
print_labels()
Labels:
value label
-9 Refused
-8 DK
1 Yes
5 No
Now if we were to use the variable “as is,” we’d still get a result - but it would be completely non-sensical because the No
variable has a numeric value of 5
where it should be 0
and -8
and -9
should not have a numeric interpretation at all:
df %>%
lm(thermimmigrant ~ gunown, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gunown, data = .)
Residuals:
Min 1Q Median 3Q Max
-47.421 -17.421 2.579 12.657 62.657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.8234 1.3805 25.226 < 2e-16 ***
gunown 2.5195 0.3264 7.718 1.84e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.71 on 2005 degrees of freedom
(95 observations deleted due to missingness)
Multiple R-squared: 0.02886, Adjusted R-squared: 0.02837
F-statistic: 59.57 on 1 and 2005 DF, p-value: 1.845e-14
We therefore first convert the variable to a factor and the quickly use the distinct()
command to check which observations are actually present:
distinct: removed 2,099 rows (>99%), 3 rows remaining
# A tibble: 3 x 1
gunown
<fct>
1 Yes
2 No
3 <NA>
Now we only have Yes
and No
observations, alongside the missing values! This allows us to estimate a useful model now:
Call:
lm(formula = thermimmigrant ~ gunown, data = .)
Residuals:
Min 1Q Median 3Q Max
-47.421 -17.421 2.579 12.657 62.657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.343 1.095 34.098 < 2e-16 ***
gunownNo 10.078 1.306 7.718 1.84e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.71 on 2005 degrees of freedom
(95 observations deleted due to missingness)
Multiple R-squared: 0.02886, Adjusted R-squared: 0.02837
F-statistic: 59.57 on 1 and 2005 DF, p-value: 1.845e-14
If you wanted to switch the dummy base around, you could use the relevel()
command - you can see that both results are the same!
df %>%
mutate(gunown = as_factor(gunown),
gunown = relevel(gunown, "No")) %>%
lm(thermimmigrant ~ gunown, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gunown, data = .)
Residuals:
Min 1Q Median 3Q Max
-47.421 -17.421 2.579 12.657 62.657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 47.4207 0.7109 66.703 < 2e-16 ***
gunownYes -10.0778 1.3057 -7.718 1.84e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 26.71 on 2005 degrees of freedom
(95 observations deleted due to missingness)
Multiple R-squared: 0.02886, Adjusted R-squared: 0.02837
F-statistic: 59.57 on 1 and 2005 DF, p-value: 1.845e-14
12.4 Add socio-economic controls
As we now want to add socio-economic controls, we want to make sure we deal with all dummy variables as well! We therefore convert them to factors again!
The income
and education
variables are not ideal - education is maxed out at 17
years (all the hard years of a PhD hidden) and the income
variable is not divided into equal intervals. At this stage, we don’t deal with them further, but we just leave them as their numerical values as both are ordered in an ordinal sense (the numeric values are ordered) - but ideally we would use the full data, if we had it available.
df %>%
mutate(gunown = as_factor(gunown),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ gender + gunown + orientself + children + education + income, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender + gunown + orientself +
children + education + income, data = .)
Residuals:
Min 1Q Median 3Q Max
-55.446 -17.556 2.042 16.036 63.912
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 35.0699 4.3373 8.086 1.47e-15 ***
genderFemale 3.5826 1.4808 2.419 0.01569 *
gunownNo 9.3904 1.6082 5.839 6.71e-09 ***
orientselfHomosexual 2.8213 5.2348 0.539 0.59002
orientselfBisexual 4.7749 5.6770 0.841 0.40046
children 1.5955 0.5857 2.724 0.00654 **
education 0.3235 0.3321 0.974 0.33006
income -0.2955 0.1482 -1.993 0.04646 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.73 on 1230 degrees of freedom
(864 observations deleted due to missingness)
Multiple R-squared: 0.0514, Adjusted R-squared: 0.046
F-statistic: 9.52 on 7 and 1230 DF, p-value: 1.483e-11
We see that sexual orientation and education do not seem to explain variation in the feeling towards immigrants. However, gender, owning a gun, having children and higher incomes seem to explain some variation.
12.5 We now add Religion
Because we already looked at the religion variable a few weeks ago, we know that they are not ordinal - so we need to convert them to a factor (just like all others we did above):
df %>%
mutate(denom = as_factor(denom)) %>%
# same as above
mutate(gunown = as_factor(gunown),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ gender + gunown + orientself + children + education + income + denom, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender + gunown + orientself +
children + education + income + denom, data = .)
Residuals:
Min 1Q Median 3Q Max
-64.448 -16.167 1.902 15.383 64.825
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.3971 4.3364 7.010 3.94e-12 ***
genderFemale 3.5636 1.4554 2.448 0.01449 *
gunownNo 7.9780 1.5841 5.036 5.46e-07 ***
orientselfHomosexual 2.0767 5.1345 0.404 0.68595
orientselfBisexual 5.4529 5.5565 0.981 0.32661
children 1.4905 0.5740 2.597 0.00953 **
education 0.5165 0.3279 1.575 0.11549
income -0.3722 0.1451 -2.564 0.01046 *
denomCatholic 14.2423 1.7694 8.049 1.96e-15 ***
denomJewish 7.4824 6.4638 1.158 0.24726
denomOther 5.6566 3.6556 1.547 0.12203
denomNone 3.5929 2.0145 1.783 0.07475 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.1 on 1224 degrees of freedom
(866 observations deleted due to missingness)
Multiple R-squared: 0.09946, Adjusted R-squared: 0.09137
F-statistic: 12.29 on 11 and 1224 DF, p-value: < 2.2e-16
So we see that compared to our reference cateogry protestant
, only catholics seem to have a clearly more positive view towards immigrants.
We can again use relevel()
to change that reference category e.g. to None
:
df %>%
mutate(denom = as_factor(denom),
denom = relevel(denom, "None")) %>%
# same as above
mutate(gunown = as_factor(gunown),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ gender + gunown + orientself + children + education + income + denom, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender + gunown + orientself +
children + education + income + denom, data = .)
Residuals:
Min 1Q Median 3Q Max
-64.448 -16.167 1.902 15.383 64.825
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.9900 4.5565 7.460 1.64e-13 ***
genderFemale 3.5636 1.4554 2.448 0.01449 *
gunownNo 7.9780 1.5841 5.036 5.46e-07 ***
orientselfHomosexual 2.0767 5.1345 0.404 0.68595
orientselfBisexual 5.4529 5.5565 0.981 0.32661
children 1.4905 0.5740 2.597 0.00953 **
education 0.5165 0.3279 1.575 0.11549
income -0.3722 0.1451 -2.564 0.01046 *
denomProtestant -3.5929 2.0145 -1.783 0.07475 .
denomCatholic 10.6494 2.2978 4.635 3.96e-06 ***
denomJewish 3.8895 6.6115 0.588 0.55644
denomOther 2.0637 3.9344 0.525 0.60000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.1 on 1224 degrees of freedom
(866 observations deleted due to missingness)
Multiple R-squared: 0.09946, Adjusted R-squared: 0.09137
F-statistic: 12.29 on 11 and 1224 DF, p-value: < 2.2e-16
12.6 What if we add ethnicity?
Do you think being catholic really makes are more favourable to immigration? What if you add ethnicity to the model?
We keep our base as having no denomimation and now add ethnicity and define it’s base as “White” (default is “Black”):
df %>%
mutate(denom = as_factor(denom),
denom = relevel(denom, "None"),
ethnic = as_factor(ethnic),
ethnic = relevel(ethnic, "White")) %>%
# same as above
mutate(gunown = as_factor(gunown),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ gender + gunown + orientself + children + education + income + denom + ethnic, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender + gunown + orientself +
children + education + income + denom + ethnic, data = .)
Residuals:
Min 1Q Median 3Q Max
-74.012 -15.065 1.249 15.746 69.618
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.2940 4.3705 4.415 1.10e-05 ***
genderFemale 3.5302 1.3516 2.612 0.009117 **
gunownNo 3.7857 1.5099 2.507 0.012295 *
orientselfHomosexual 1.8841 4.8679 0.387 0.698793
orientselfBisexual 8.3188 5.1677 1.610 0.107709
children 0.3179 0.5384 0.590 0.555013
education 1.2398 0.3102 3.996 6.83e-05 ***
income -0.1870 0.1372 -1.363 0.173058
denomProtestant -5.7132 1.9295 -2.961 0.003127 **
denomCatholic 1.7981 2.2444 0.801 0.423186
denomJewish 4.6333 6.1296 0.756 0.449856
denomOther -2.6382 3.7085 -0.711 0.476974
ethnicBlack 15.2430 1.9371 7.869 7.88e-15 ***
ethnicAsian 7.9853 5.6650 1.410 0.158918
ethnicNative American 17.7684 8.2807 2.146 0.032090 *
ethnicHispanic or Latino 26.6278 1.9417 13.714 < 2e-16 ***
ethnicOther, specify 18.6619 5.1948 3.592 0.000341 ***
ethnicMultiracial 11.9962 4.3617 2.750 0.006042 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.24 on 1214 degrees of freedom
(870 observations deleted due to missingness)
Multiple R-squared: 0.2333, Adjusted R-squared: 0.2226
F-statistic: 21.74 on 17 and 1214 DF, p-value: < 2.2e-16
This shows a bit more clearly, that catholics do not seem to have a better view towards immigrants - only protestants have a generally more negative view of immigrants than people without a denomination.
With regard to ethnicity, however, we see that Hispanics, Other ethnicities, African Americans and people of multiracial ethnicity have all more positive views of immigrants.
12.7 Interaction of Female and gunowner
We start out as we always do, but note that in the regression equation in lm()
, we add a term gender:gunown
, which creates our interaction variable! We could keep the gender + gunown
terms as well, but we don’t have to - so let’s remove them because then we get a bit more output for each possible case (Note: one will be NA
because this is going to be perfectly collinear with the intercept).
Because we will need the model below for a hypothesis test, we save the model first as interaction_model
!
df %>%
mutate(denom = as_factor(denom),
ethnic = as_factor(ethnic)) %>%
# same as above
mutate(gunown = as_factor(gunown),
gunwon = relevel(gunown, "No"),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ orientself + children + education +
income + denom + ethnic + gender:gunown, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ orientself + children + education +
income + denom + ethnic + gender:gunown, data = .)
Residuals:
Min 1Q Median 3Q Max
-74.219 -14.960 1.268 15.648 69.182
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 36.3433 3.9489 9.203 < 2e-16 ***
orientselfHomosexual 2.0051 4.8770 0.411 0.681043
orientselfBisexual 8.3287 5.1695 1.611 0.107412
children 0.3183 0.5385 0.591 0.554655
education 1.2395 0.3103 3.994 6.89e-05 ***
income -0.1892 0.1373 -1.378 0.168457
denomCatholic 7.5161 1.8508 4.061 5.20e-05 ***
denomJewish 10.3125 6.0137 1.715 0.086633 .
denomOther 3.0904 3.4604 0.893 0.371992
denomNone 5.6750 1.9321 2.937 0.003374 **
ethnicAsian -7.2551 5.8369 -1.243 0.214118
ethnicNative American 2.4607 8.4027 0.293 0.769687
ethnicHispanic or Latino 11.4041 2.3540 4.844 1.43e-06 ***
ethnicWhite -15.2127 1.9390 -7.846 9.39e-15 ***
ethnicOther, specify 3.4573 5.3298 0.649 0.516671
ethnicMultiracial -3.1859 4.5985 -0.693 0.488562
genderMale:gunownYes -7.0825 2.0568 -3.443 0.000594 ***
genderFemale:gunownYes -4.4006 2.0390 -2.158 0.031111 *
genderMale:gunownNo -3.9520 1.6466 -2.400 0.016539 *
genderFemale:gunownNo NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.25 on 1213 degrees of freedom
(870 observations deleted due to missingness)
Multiple R-squared: 0.2335, Adjusted R-squared: 0.2221
F-statistic: 20.53 on 18 and 1213 DF, p-value: < 2.2e-16
The interpretation goes as follows: Our base term is always genderFemale:gunownNo
, so we compare each other coefficient to Women who do not own guns.
- Women who own guns have a
-4.4
unit less favourable view towards immigrants than Women who do not own guns. - Men who own guns have a
-7
unit less favourable view towards immigrants than Women who do not own guns. - Men who do not own guns have a
3.95
unit less favourable view towards immigrants than Women who do not own guns.
All of these terms are significant at the 5% significance level, but only one is significant at the 1% significance level.
12.7.1 Testing this
We can now formally test some hypotheses using an F-test. We can do this using the linearHypothesis()
command from the car
package, which we loaded above.
The first option for this command is the model we want to test - here we can simply pipe %>%
our model to the linearHypothesis()
command. Now we need to insert some arguments to the hypothesis.matrix
.
To illustrate two different ways we could use those hypothesis tests, we will save two models quickly: the interaction_model
and the interaction_model_full
. Both models are identical, apart from how they present their results.
The interaction_model
uses the variable logic of \(x_1 + x_2 + x_1*x_2\) while interaction_model_full uses the special :
operator and adds all different combinations of interactions to the model. Compare the following two outputs:
interaction_model
summary(interaction_model)
Call:
lm(formula = thermimmigrant ~ gender + gunown + orientself +
children + education + income + denom + ethnic + gender:gunown,
data = .)
Residuals:
Min 1Q Median 3Q Max
-74.219 -14.960 1.268 15.648 69.182
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 32.3913 3.9603 8.179 7.15e-16 ***
genderFemale 3.9520 1.6466 2.400 0.01654 *
gunownYes -3.1305 2.1004 -1.490 0.13637
orientselfHomosexual 2.0051 4.8770 0.411 0.68104
orientselfBisexual 8.3287 5.1695 1.611 0.10741
children 0.3183 0.5385 0.591 0.55466
education 1.2395 0.3103 3.994 6.89e-05 ***
income -0.1892 0.1373 -1.378 0.16846
denomCatholic 7.5161 1.8508 4.061 5.20e-05 ***
denomJewish 10.3125 6.0137 1.715 0.08663 .
denomOther 3.0904 3.4604 0.893 0.37199
denomNone 5.6750 1.9321 2.937 0.00337 **
ethnicAsian -7.2551 5.8369 -1.243 0.21412
ethnicNative American 2.4607 8.4027 0.293 0.76969
ethnicHispanic or Latino 11.4041 2.3540 4.844 1.43e-06 ***
ethnicWhite -15.2127 1.9390 -7.846 9.39e-15 ***
ethnicOther, specify 3.4573 5.3298 0.649 0.51667
ethnicMultiracial -3.1859 4.5985 -0.693 0.48856
genderFemale:gunownYes -1.2701 2.8295 -0.449 0.65360
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.25 on 1213 degrees of freedom
(870 observations deleted due to missingness)
Multiple R-squared: 0.2335, Adjusted R-squared: 0.2221
F-statistic: 20.53 on 18 and 1213 DF, p-value: < 2.2e-16
interaction_model_full
summary(interaction_model_full)
Call:
lm(formula = thermimmigrant ~ orientself + children + education +
income + denom + ethnic + gender:gunown, data = .)
Residuals:
Min 1Q Median 3Q Max
-74.219 -14.960 1.268 15.648 69.182
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.9427 4.2872 7.451 1.76e-13 ***
orientselfHomosexual 2.0051 4.8770 0.411 0.68104
orientselfBisexual 8.3287 5.1695 1.611 0.10741
children 0.3183 0.5385 0.591 0.55466
education 1.2395 0.3103 3.994 6.89e-05 ***
income -0.1892 0.1373 -1.378 0.16846
denomCatholic 7.5161 1.8508 4.061 5.20e-05 ***
denomJewish 10.3125 6.0137 1.715 0.08663 .
denomOther 3.0904 3.4604 0.893 0.37199
denomNone 5.6750 1.9321 2.937 0.00337 **
ethnicAsian -7.2551 5.8369 -1.243 0.21412
ethnicNative American 2.4607 8.4027 0.293 0.76969
ethnicHispanic or Latino 11.4041 2.3540 4.844 1.43e-06 ***
ethnicWhite -15.2127 1.9390 -7.846 9.39e-15 ***
ethnicOther, specify 3.4573 5.3298 0.649 0.51667
ethnicMultiracial -3.1859 4.5985 -0.693 0.48856
genderMale:gunownNo 0.4486 2.1079 0.213 0.83150
genderFemale:gunownNo 4.4006 2.0390 2.158 0.03111 *
genderMale:gunownYes -2.6819 2.3236 -1.154 0.24864
genderFemale:gunownYes NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.25 on 1213 degrees of freedom
(870 observations deleted due to missingness)
Multiple R-squared: 0.2335, Adjusted R-squared: 0.2221
F-statistic: 20.53 on 18 and 1213 DF, p-value: < 2.2e-16
Personally, I find the second output of the interaction_model_full
more easy to interpret.
Now we can test all different effects:
12.7.2 What is the total effect for a female gunowner?
Now we need to insert some arguments to the hypothesis.matrix
.
We want to test for a difference between e.g. a woman who owns a gun with e.g. a man who does not own a gun (our base).
To do this using our interaction_model
, we need to use a vector that describes the situation of being a female and owning a gun. We then set this equation to 0 using = 0
because the case where all of these effects are 0 would coincide with a man not owning a gun (our base). This would look like this: "1.genderFemale + 1.gunownYes + 1.genderFemale:gunownYes = 0"
where now we set the genderFemale
to 1 to indicate a woman, the gunown
to 1 to indicate her owning a gun. The interaction then of course will also be 1:
interaction_model %>%
linearHypothesis(hypothesis.matrix = "1.genderFemale + 1.gunownYes + 1.genderFemale:gunownYes = 0")
Linear hypothesis test
Hypothesis:
genderFemale + gunownYes + genderFemale:gunownYes = 0
Model 1: restricted model
Model 2: thermimmigrant ~ gender + gunown + orientself + children + education +
income + denom + ethnic + gender:gunown
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1214 655705
2 1213 655680 1 24.482 0.0453 0.8315
Alternatively, we can use the interaction_model_full
model and test explicitly whether the genderMale:gunownNo
is different to genderFemale:gunownYes
, which is equivalent to the intercept (i.e. identical to 1). This gives us a much easier formula: "genderMale:gunownNo = 0"
. Note: we need to use the option singluar.ok=TRUE
, because one coefficient in interaction_model_full
is not defined (as it is collinear with the intercept):
interaction_model_full %>%
linearHypothesis(singular.ok = TRUE,
hypothesis.matrix = "genderMale:gunownNo = 0")
Linear hypothesis test
Hypothesis:
genderMale:gunownNo = 0
Model 1: restricted model
Model 2: thermimmigrant ~ orientself + children + education + income +
denom + ethnic + gender:gunown
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1214 655705
2 1213 655680 1 24.482 0.0453 0.8315
As you can see both outcomes are equivalent: Our linear hypothesis test now tests a Model 1, which is our restricted model, to the model we calculated just before using an F-test (using the F-distribution). Becase our p-value (here called Pr(>F)
) is \(p>0.05\), we cannot reject the \(H_0\) of a difference between those two models. We therefore conclude that Women with a gun do not express different values towards immigration than men without a gun.
12.7.3 What is the total effect for male gunowners?
Again we compare this with our base category: Men who do not own guns.
interaction_model %>%
linearHypothesis(hypothesis.matrix = "0.genderFemale + 1.gunownYes + 0.genderFemale:gunownYes = 0")
Linear hypothesis test
Hypothesis:
gunownYes = 0
Model 1: restricted model
Model 2: thermimmigrant ~ gender + gunown + orientself + children + education +
income + denom + ethnic + gender:gunown
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1214 656881
2 1213 655680 1 1200.8 2.2214 0.1364
Or we could write this more succintly, using just "1.gunownYes = 0"
beacuse everything else is 0 by default (and Male
is 0).
Of course using interaction_model_full
we can use this:
interaction_model_full %>%
linearHypothesis(singular.ok = TRUE,
hypothesis.matrix = "genderMale:gunownYes = genderMale:gunownNo")
Linear hypothesis test
Hypothesis:
- genderMale:gunownNo + genderMale:gunownYes = 0
Model 1: restricted model
Model 2: thermimmigrant ~ orientself + children + education + income +
denom + ethnic + gender:gunown
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1214 656881
2 1213 655680 1 1200.8 2.2214 0.1364
12.7.4 What is the total effect for female non-gunowner?
Again we compare this with our base category: Men who do not own guns. Now using this more succintly:
interaction_model %>%
linearHypothesis(hypothesis.matrix = "1.genderFemale = 0")
Linear hypothesis test
Hypothesis:
genderFemale = 0
Model 1: restricted model
Model 2: thermimmigrant ~ gender + gunown + orientself + children + education +
income + denom + ethnic + gender:gunown
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1214 658794
2 1213 655680 1 3113.9 5.7606 0.01654 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
12.8 Interaction between education and gunowners
Despite what we mentioned above, we treat education as continuous here.
We follow the same logic as before:
df %>%
mutate(denom = as_factor(denom),
ethnic = as_factor(ethnic)) %>%
# same as above
mutate(gunown = as_factor(gunown),
gunown = relevel(gunown, "No"),
gender = as_factor(gender),
orientself = as_factor(orientself)) %>%
lm(thermimmigrant ~ gender + orientself + children + education + gunown +
income + denom + ethnic + education:gunown, data = .) %>%
summary
Call:
lm(formula = thermimmigrant ~ gender + orientself + children +
education + gunown + income + denom + ethnic + education:gunown,
data = .)
Residuals:
Min 1Q Median 3Q Max
-73.701 -15.181 1.362 15.878 69.247
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.7645 4.4578 7.574 7.13e-14 ***
genderFemale 3.5374 1.3520 2.616 0.00900 **
orientselfHomosexual 1.9894 4.8731 0.408 0.68317
orientselfBisexual 8.3078 5.1692 1.607 0.10828
children 0.3217 0.5386 0.597 0.55043
education 1.1498 0.3509 3.277 0.00108 **
gunownYes -8.4467 8.6223 -0.980 0.32746
income -0.1886 0.1372 -1.374 0.16957
denomCatholic 7.5464 1.8518 4.075 4.90e-05 ***
denomJewish 10.6087 6.0319 1.759 0.07887 .
denomOther 3.1743 3.4648 0.916 0.35976
denomNone 5.7339 1.9305 2.970 0.00303 **
ethnicAsian -7.0251 5.8520 -1.200 0.23020
ethnicNative American 2.5568 8.4013 0.304 0.76092
ethnicHispanic or Latino 11.3214 2.3564 4.805 1.74e-06 ***
ethnicWhite -15.2009 1.9392 -7.839 9.90e-15 ***
ethnicOther, specify 3.4081 5.3290 0.640 0.52259
ethnicMultiracial -3.1475 4.5998 -0.684 0.49394
education:gunownYes 0.3465 0.6311 0.549 0.58306
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 23.25 on 1213 degrees of freedom
(870 observations deleted due to missingness)
Multiple R-squared: 0.2335, Adjusted R-squared: 0.2222
F-statistic: 20.53 on 18 and 1213 DF, p-value: < 2.2e-16
We can now interpret this using the margins
package (loaded above), which is actually very similar to the margins
command in Stata. For this, we need to save the modified data and the model separately:
df %>%
mutate(denom = as_factor(denom),
ethnic = as_factor(ethnic)) %>%
# same as above
mutate(gunown = as_factor(gunown),
gunown = relevel(gunown, "No"),
gender = as_factor(gender),
orientself = as_factor(orientself)) -> data_ready
lm(thermimmigrant ~ gender + orientself + children + education + gunown +
income + denom + ethnic + education:gunown, data = data_ready) -> final_model
Now we can use both in the margins()
command. Let’s first get the marginal effect for education:
factor AME SE z p lower upper
education 1.2640 0.3135 4.0325 0.0001 0.6497 1.8784
Now let’s tease out the marginal effect with the interaction (look a the AME
column for Average Marginal Effects):
margins(model = final_model,
data = data_ready,
variables = "education",
at = list(gunown = c("No","Yes"))) %>%
summary()
factor gunown AME SE z p lower upper
education 1.0000 1.1498 0.3509 3.2769 0.0010 0.4621 1.8376
education 2.0000 1.4963 0.5610 2.6675 0.0076 0.3969 2.5958
Here the gunown = 1
and gunown = 2
refers to the order of the list - so 1
here is No
while 2
is Yes. If you remove the summary
command, it makes more sense:
margins(model = final_model,
data = data_ready,
variables = "education",
at = list(gunown = c("No","Yes")))
at(gunown) education
No 1.150
Yes 1.496