9 HT3 Hypothesis Testing: t-tests
For more information on t-tests, see here.
library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(skimr) # allows us to get an overview over the data quickly
9.2 Check the data - what is it about?
df %>%
skim()
Name | Piped data |
Number of rows | 2102 |
Number of columns | 94 |
_______________________ | |
Column type frequency: | |
character | 90 |
numeric | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
region | 0 | 1.00 | 1 | 1 | 0 | 4 | 0 |
gender | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
relguide | 15 | 0.99 | 1 | 1 | 0 | 4 | 0 |
pray | 11 | 0.99 | 1 | 1 | 0 | 6 | 0 |
relattend | 5 | 1.00 | 1 | 1 | 0 | 5 | 0 |
denom | 7 | 1.00 | 1 | 1 | 0 | 5 | 0 |
orientself | 39 | 0.98 | 1 | 1 | 0 | 3 | 0 |
orientknow | 42 | 0.98 | 1 | 1 | 0 | 2 | 0 |
age | 38 | 0.98 | 2 | 2 | 0 | 75 | 0 |
marstat | 14 | 0.99 | 1 | 1 | 0 | 6 | 0 |
education | 10 | 1.00 | 1 | 2 | 0 | 18 | 0 |
union | 11 | 0.99 | 1 | 1 | 0 | 2 | 0 |
income | 806 | 0.62 | 1 | 2 | 0 | 25 | 0 |
class | 677 | 0.68 | 1 | 1 | 0 | 3 | 0 |
ethnic | 13 | 0.99 | 2 | 3 | 0 | 7 | 0 |
gunown | 45 | 0.98 | 1 | 1 | 0 | 2 | 0 |
efficacy1a | 1058 | 0.50 | 1 | 1 | 0 | 5 | 0 |
efficacy1b | 1061 | 0.50 | 1 | 1 | 0 | 5 | 0 |
efficacy1c | 1063 | 0.49 | 1 | 1 | 0 | 5 | 0 |
efficacy1d | 1062 | 0.49 | 1 | 1 | 0 | 5 | 0 |
efficacy2a | 1045 | 0.50 | 1 | 1 | 0 | 5 | 0 |
efficacy2b | 1048 | 0.50 | 1 | 1 | 0 | 5 | 0 |
efficacy2c | 1051 | 0.50 | 1 | 1 | 0 | 5 | 0 |
efficacy2d | 1050 | 0.50 | 1 | 1 | 0 | 5 | 0 |
ideology | 617 | 0.71 | 1 | 1 | 0 | 7 | 0 |
partyid3 | 29 | 0.99 | 1 | 1 | 0 | 3 | 0 |
partystrength | 824 | 0.61 | 1 | 1 | 0 | 2 | 0 |
partylean | 1313 | 0.38 | 1 | 1 | 0 | 3 | 0 |
partyid7 | 48 | 0.98 | 1 | 1 | 0 | 7 | 0 |
taxes | 18 | 0.99 | 1 | 1 | 0 | 7 | 0 |
milspend | 24 | 0.99 | 1 | 1 | 0 | 7 | 0 |
otherspend | 41 | 0.98 | 1 | 1 | 0 | 7 | 0 |
socialsec | 23 | 0.99 | 1 | 1 | 0 | 7 | 0 |
gradtax | 44 | 0.98 | 1 | 1 | 0 | 3 | 0 |
servespend | 202 | 0.90 | 1 | 1 | 0 | 7 | 0 |
biggov | 27 | 0.99 | 1 | 1 | 0 | 2 | 0 |
govmarket | 45 | 0.98 | 1 | 1 | 0 | 2 | 0 |
govsize | 42 | 0.98 | 1 | 1 | 0 | 2 | 0 |
cappun | 169 | 0.92 | 1 | 1 | 0 | 4 | 0 |
gunbuy | 23 | 0.99 | 1 | 1 | 0 | 3 | 0 |
gaymarriage | 55 | 0.97 | 1 | 1 | 0 | 4 | 0 |
immigration | 5 | 1.00 | 1 | 1 | 0 | 3 | 0 |
immjobs | 20 | 0.99 | 1 | 1 | 0 | 4 | 0 |
abortion | 1065 | 0.49 | 1 | 1 | 0 | 5 | 0 |
equalopp | 1 | 1.00 | 1 | 1 | 0 | 5 | 0 |
isolationism | 59 | 0.97 | 1 | 1 | 0 | 2 | 0 |
iraq | 61 | 0.97 | 1 | 1 | 0 | 2 | 0 |
torture | 34 | 0.98 | 1 | 1 | 0 | 7 | 0 |
thermrush | 512 | 0.76 | 1 | 3 | 0 | 27 | 0 |
thermdem | 52 | 0.98 | 1 | 3 | 0 | 30 | 0 |
thermgop | 59 | 0.97 | 1 | 3 | 0 | 27 | 0 |
thermbush | 9 | 1.00 | 1 | 3 | 0 | 27 | 0 |
thermobama | 12 | 0.99 | 1 | 3 | 0 | 22 | 0 |
thermmccain | 9 | 1.00 | 1 | 3 | 0 | 27 | 0 |
thermbiden | 239 | 0.89 | 1 | 3 | 0 | 24 | 0 |
thermpalin | 120 | 0.94 | 1 | 3 | 0 | 28 | 0 |
thermclinton | 17 | 0.99 | 1 | 3 | 0 | 22 | 0 |
thermhispanic | 52 | 0.98 | 1 | 3 | 0 | 25 | 0 |
thermfund | 216 | 0.90 | 1 | 3 | 0 | 28 | 0 |
thermcatholic | 56 | 0.97 | 1 | 3 | 0 | 18 | 0 |
thermfem | 130 | 0.94 | 1 | 3 | 0 | 25 | 0 |
thermfed | 42 | 0.98 | 1 | 3 | 0 | 26 | 0 |
thermjews | 102 | 0.95 | 1 | 3 | 0 | 20 | 0 |
thermliberal | 147 | 0.93 | 1 | 3 | 0 | 24 | 0 |
thermmiddle | 37 | 0.98 | 1 | 3 | 0 | 22 | 0 |
thermunion | 81 | 0.96 | 1 | 3 | 0 | 25 | 0 |
thermpoor | 46 | 0.98 | 1 | 3 | 0 | 21 | 0 |
thermmilitary | 25 | 0.99 | 1 | 3 | 0 | 25 | 0 |
thermbig | 39 | 0.98 | 1 | 3 | 0 | 26 | 0 |
thermwelfare | 56 | 0.97 | 1 | 3 | 0 | 25 | 0 |
thermconserv | 112 | 0.95 | 1 | 3 | 0 | 25 | 0 |
thermworking | 16 | 0.99 | 1 | 3 | 0 | 22 | 0 |
thermenviron | 91 | 0.96 | 1 | 3 | 0 | 26 | 0 |
thermscotus | 63 | 0.97 | 1 | 3 | 0 | 26 | 0 |
thermgay | 61 | 0.97 | 1 | 3 | 0 | 26 | 0 |
thermasian | 91 | 0.96 | 1 | 3 | 0 | 19 | 0 |
thermcongress | 48 | 0.98 | 1 | 3 | 0 | 23 | 0 |
thermblack | 47 | 0.98 | 1 | 3 | 0 | 21 | 0 |
thermsouth | 99 | 0.95 | 1 | 3 | 0 | 24 | 0 |
thermimmigrant | 54 | 0.97 | 1 | 3 | 0 | 25 | 0 |
thermrich | 59 | 0.97 | 1 | 3 | 0 | 23 | 0 |
thermwhite | 50 | 0.98 | 1 | 3 | 0 | 18 | 0 |
thermisrael | 126 | 0.94 | 1 | 3 | 0 | 22 | 0 |
thermmuslim | 130 | 0.94 | 1 | 3 | 0 | 21 | 0 |
thermhindu | 260 | 0.88 | 1 | 3 | 0 | 24 | 0 |
thermchristian | 33 | 0.98 | 1 | 3 | 0 | 23 | 0 |
thermatheist | 133 | 0.94 | 1 | 3 | 0 | 24 | 0 |
turnout | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
presvote | 538 | 0.74 | 1 | 1 | 0 | 3 | 0 |
housevote | 791 | 0.62 | 1 | 1 | 0 | 4 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
CASE | 0 | 1 | 1165.05 | 672.43 | 1.00 | 578.25 | 1166.50 | 1747.75 | 2323.0 | ▇▇▇▇▇ |
weight | 0 | 1 | 1.00 | 0.75 | 0.17 | 0.42 | 0.74 | 1.31 | 3.7 | ▇▃▂▁▁ |
state | 0 | 1 | 25.35 | 15.28 | 1.00 | 10.00 | 25.00 | 41.00 | 50.0 | ▇▃▃▆▇ |
children | 5 | 1 | 0.78 | 1.21 | 0.00 | 0.00 | 0.00 | 1.00 | 11.0 | ▇▁▁▁▁ |
9.3 Perform a one-sample t-test: test whether the mean of thermmilitary is higher than 50
df %>%
pull(thermmilitary) %>%
t.test(mu = 50)
One Sample t-test
data: .
t = 65.007, df = 2076, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
79.08470 80.89411
sample estimates:
mean of x
79.98941
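We reject the null \(H_0: \mu = 50\) (a neutral opinion of the military) at the 5% level: the sample mean of about 80 is far above 50, and the 95% confidence interval (79.1 to 80.9) excludes it. So yes, people have, on average, a good opinion of the military (clearly in favour, i.e. above 50).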
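The heading asks whether the mean is higher than 50, whereas t.test() runs a two-sided test by default. A minimal sketch of the one-sided version follows. It uses simulated ratings (the vector `ratings` is made up for illustration and is not the survey data) and also recomputes the t-statistic by hand:

```r
# Simulated 0-100 ratings; NOT the survey data, for illustration only
set.seed(1)
ratings <- pmin(pmax(rnorm(200, mean = 80, sd = 20), 0), 100)

# One-sided test of H0: mu <= 50 against H1: mu > 50
res <- t.test(ratings, mu = 50, alternative = "greater")

# The t-statistic by hand: (sample mean - mu0) / (sd / sqrt(n))
t_manual <- (mean(ratings) - 50) / (sd(ratings) / sqrt(length(ratings)))

res$statistic  # t-statistic reported by t.test()
t_manual       # the same value, computed by hand
res$p.value    # one-sided p-value
```

With a sample mean this far above 50, the one-sided and two-sided tests lead to the same decision; the choice matters mostly when the statistic is close to the critical value.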
9.3.1 Plotting t-tests
As a little extra, I will now also show you how to plot the result of a t-test. We first need to install an additional package:
install.packages("webr")
With this new package, we can simply use the plot function:

The blue dot here shows us where the t-statistic value of our sample is, compared to our assumed distribution.
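To make clear what such a plot contains, here is a rough base-R sketch of the same idea. It does not use webr, and the data (`sample_x`) are simulated rather than taken from the survey. It draws the t-distribution assumed under the null, marks the two-sided 5% critical values, and places a blue dot at the observed t-statistic:

```r
# Simulated data; NOT the survey data
set.seed(2)
sample_x <- rnorm(50, mean = 52, sd = 10)
res <- t.test(sample_x, mu = 50)

t_obs  <- unname(res$statistic)  # observed t-statistic
dof    <- unname(res$parameter)  # degrees of freedom (n - 1)
t_crit <- qt(0.975, df = dof)    # two-sided 5% critical value

# Density of the t-distribution under the null hypothesis
lim <- max(4, abs(t_obs) + 1)
curve(dt(x, df = dof), from = -lim, to = lim, xlab = "t", ylab = "density")

# Dashed lines: beyond these values we reject at the 5% level
abline(v = c(-t_crit, t_crit), lty = 2)

# Blue dot: where our sample falls on the assumed distribution
points(t_obs, dt(t_obs, df = dof), pch = 19, col = "blue")
```

If the dot lies outside the dashed lines, the test rejects the null at the 5% level.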
9.4 Perform the same test, checking that it is higher than 80, what does it say?
df %>%
pull(thermmilitary) %>%
t.test(mu = 80)
One Sample t-test
data: .
t = -0.02296, df = 2076, p-value = 0.9817
alternative hypothesis: true mean is not equal to 80
95 percent confidence interval:
79.08470 80.89411
sample estimates:
mean of x
79.98941
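Now we fail to reject the null \(H_0: \mu=80\): the p-value of 0.98 is far above 0.05, and 80 lies inside the 95% confidence interval (79.1 to 80.9).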
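Both outputs report the same confidence interval, because the interval does not depend on the value of mu. A two-sided test at the 5% level rejects exactly those values of mu that fall outside the 95% interval. The sketch below checks this on simulated ratings (again not the survey data):

```r
# Simulated ratings; NOT the survey data
set.seed(3)
ratings <- rnorm(500, mean = 80, sd = 20)

ci <- t.test(ratings)$conf.int  # 95% CI; the same for every choice of mu

# For each candidate mu0, compare "mu0 inside the CI" with "p-value > 0.05"
check <- sapply(c(50, 80), function(mu0) {
  p <- t.test(ratings, mu = mu0)$p.value
  c(mu0 = mu0,
    inside_ci = mu0 >= ci[1] && mu0 <= ci[2],
    p_above_5pct = p > 0.05)
})

t(check)  # the two logical columns (printed as 0/1) agree in every row
```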
Again we plot it:

9.5 Perform a two-sample t-test: Thermfed and Gunown
Is there a difference between the two groups? Admittedly, this is a bit easier without using the pipe logic:
t.test(df$thermfed, df$gunown, paired = FALSE)
We could also use the exposition operator %$% from the magrittr package (but this is rarely used):
library(magrittr)
Attaching package: 'magrittr'
The following object is masked from 'package:purrr':
set_names
The following object is masked from 'package:tidyr':
extract
df %$%
t.test(thermfed, gunown, paired = FALSE)
Welch Two Sample t-test
data: thermfed and gunown
t = 93.545, df = 2084.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
47.16457 49.18446
sample estimates:
mean of x mean of y
52.007767 3.833252
But we can still do it with our known tools:
df %>%
summarise(ttest = list(t.test(thermfed, gunown, paired = FALSE))) %>%
pull(ttest)
summarise: now one row and one column, ungrouped
[[1]]
Welch Two Sample t-test
data: thermfed and gunown
t = 93.545, df = 2084.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
47.16457 49.18446
sample estimates:
mean of x mean of y
52.007767 3.833252
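The test rejects the null of equal means. Note, however, that this call compares the mean of thermfed (52.0) with the mean of the numeric gunown codes (3.8); it does not compare the thermfed ratings of gun owners and non-owners. To answer whether people who own guns have a lower opinion of the federal government, thermfed has to be compared across the gunown groups, for example with the formula interface t.test(thermfed ~ gunown).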
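A minimal sketch of a group comparison with the formula interface of t.test() is shown below. The data frame `toy` and its columns `owner` and `rating` are hypothetical stand-ins for gunown and thermfed; with the real data, the grouping variable would first need to be reduced to exactly two groups (for example by dropping missing-value codes).

```r
# Hypothetical toy data; NOT the survey data
set.seed(4)
toy <- data.frame(
  owner  = rep(c("yes", "no"), each = 100),
  rating = c(rnorm(100, mean = 48, sd = 20),   # simulated ratings of owners
             rnorm(100, mean = 55, sd = 20))   # simulated ratings of non-owners
)

# rating ~ owner: compare the mean rating between the two owner groups
res <- t.test(rating ~ owner, data = toy)

res$estimate  # one mean per group, not one mean per variable
res$p.value
```

Here the two reported means are the average ratings within each group, which is the quantity the question about gun owners refers to.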
df %>%
summarise(ttest = list(t.test(thermfed, gunown, paired = FALSE))) %>%
pull(ttest) %>%
`[[`(1) %>% # this is needed to extract the ttest result from the list that is outputted
plot()

9.6 Perform a two-sample test: Democratic and Republican Party
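Test whether the mean support for the Democratic Party is the same as the mean support for the Republican Party:
df %>%
summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>%
pull(ttest)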
summarise: now one row and one column, ungrouped
[[1]]
Welch Two Sample t-test
data: thermdem and thermgop
t = 22.029, df = 4090.9, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
16.32829 19.51864
sample estimates:
mean of x mean of y
62.76634 44.84288
We reject the null of no difference in support. Average support is higher for the Democratic Party (62.8) than for the Republican Party (44.8).
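Both thermometers are answered by the same respondents, so a paired test might also be worth considering. The sketch below uses simulated ratings (the vectors `dem` and `gop` are made up, not the survey data). It contrasts paired = FALSE with paired = TRUE and shows that the paired test is a one-sample t-test on the within-respondent differences:

```r
# Simulated paired ratings: each position is one hypothetical respondent
set.seed(5)
n   <- 300
dem <- rnorm(n, mean = 60, sd = 20)
gop <- 100 - dem + rnorm(n, mean = 5, sd = 15)  # negatively related to dem

welch  <- t.test(dem, gop, paired = FALSE)  # treats the columns as independent samples
paired <- t.test(dem, gop, paired = TRUE)   # uses within-respondent differences

# A paired t-test is a one-sample t-test on the differences
on_diff <- t.test(dem - gop, mu = 0)

c(welch   = unname(welch$statistic),
  paired  = unname(paired$statistic),
  on_diff = unname(on_diff$statistic))
```

Which version is appropriate depends on the question; the chapter's results all use paired = FALSE.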
df %>%
summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>%
pull(ttest) %>%
`[[`(1) %>% # this is needed to extract the ttest result from the list that is outputted
plot()

9.7 Now restrict this test to Catholics only, what do you observe?
Find out where we have the information regarding faith:
lapply(df, function(x) attributes(x)$label)
$CASE
[1] "Case ID"
$weight
NULL
$state
NULL
$region
[1] "Region of country"
$gender
[1] "Gender"
$relguide
[1] "Does religion provide guidance in life?"
$pray
[1] "How often does respondent pray?"
$relattend
[1] "How often does respondent attend religious services?"
$denom
[1] "Religious affiliation"
$orientself
[1] "Sexual orientation"
$orientknow
[1] "Know gay, lesbian, bisexual family or friends?"
$age
[1] "Age"
$marstat
[1] "Marital status"
$children
[1] "Number of children under 18 in household"
$education
[1] "Highest grade of school or year of college completed"
$union
[1] "Anyone in household belong to a labor union?"
$income
[1] "Household income"
$class
[1] "Self-identification as working or middle class"
$ethnic
[1] "Racial or ethnic identification"
$gunown
[1] "Does respondent have a gun in his or her home or garage?"
$efficacy1a
[1] "Politics/govt too complicated to understand"
$efficacy1b
[1] "Good understanding of political issues"
$efficacy1c
[1] "Public officials don't care what people like me think"
$efficacy1d
[1] "Have no say about what govt does"
$efficacy2a
[1] "Politics/govt too complicated to understand"
$efficacy2b
[1] "Good understanding of political issues"
$efficacy2c
[1] "How much do public officials care what people like me think"
$efficacy2d
[1] "How much can people like you affect what the government does"
$ideology
[1] "Liberal-conservative self-placement"
$partyid3
[1] "Party self-identification - 3 point scale"
$partystrength
[1] "Strength of party identification"
$partylean
[1] "Party leanings"
$partyid7
[1] "Party self-identification - 7 point scale)"
$taxes
[1] "Reduce deficit by raising taxes"
$milspend
[1] "Reduce deficit by cutting military spending"
$otherspend
[1] "Reduce deficit by cutting nonmilitary spending"
$socialsec
[1] "Invest social security in stocks and bonds"
$gradtax
[1] "Statement best agrees with respondent about graduated tax"
$servespend
[1] "Position on services vs. spending"
$biggov
[1] "Govt bigger because too involved OR bigger problems?"
$govmarket
[1] "Need strong govt for complex problems OR free market?"
$govsize
[1] "Less govt better OR more that govt should be doing"
$cappun
[1] "Favor/oppose death penalty"
$gunbuy
[1] "Should fed govt make it more difficult to buy a gun?"
$gaymarriage
[1] "Position on gay marriage"
$immigration
[1] "How important is controlling illegal immigration?"
$immjobs
[1] "How likely that immigration will take away jobs?"
$abortion
[1] "Abortion - self placement"
$equalopp
[1] "Society should make sure everyone has equal opportunity"
$isolationism
[1] "This country would be better off if we just stayed home"
$iraq
[1] "Was Iraq war worth the cost"
$torture
[1] "Favor-oppose torture for suspected terrorists"
$thermrush
[1] "Feeling Thermometer: Rush Limbaugh"
$thermdem
[1] "Feeling Thermometer: Democratic party"
$thermgop
[1] "Feeling Thermometer: Republican party"
$thermbush
[1] "Feeling Thermometer: George W. Bush"
$thermobama
[1] "Feeling Thermometer: Barack Obama"
$thermmccain
[1] "Feeling Thermometer: John McCain"
$thermbiden
[1] "Feeling Thermometer: Joe Biden"
$thermpalin
[1] "Feeling Thermometer: Sarah Palin"
$thermclinton
[1] "Feeling Thermometer: Hillary Clinton"
$thermhispanic
[1] "Feeling Thermometer: Hispanics"
$thermfund
[1] "Feeling Thermometer: Christian fundamentalists"
$thermcatholic
[1] "Feeling Thermometer: Catholics"
$thermfem
[1] "Feeling Thermometer: Feminists"
$thermfed
[1] "Feeling Thermometer: Federal Government in Washington"
$thermjews
[1] "Feeling Thermometer: Jews"
$thermliberal
[1] "Feeling Thermometer: Liberals"
$thermmiddle
[1] "Feeling Thermometer: Middle class people"
$thermunion
[1] "Feeling Thermometer: Labor unions"
$thermpoor
[1] "Feeling Thermometer: Poor people"
$thermmilitary
[1] "Feeling Thermometer: The military"
$thermbig
[1] "Feeling Thermometer: Big business"
$thermwelfare
[1] "Feeling Thermometer: People on welfare"
$thermconserv
[1] "Feeling Thermometer: Conservatives"
$thermworking
[1] "Feeling Thermometer: Working class people"
$thermenviron
[1] "Feeling Thermometer: Environmentalists"
$thermscotus
[1] "Feeling Thermometer: The Supreme Court of the United States"
$thermgay
[1] "Feeling Thermometer: Gays and lesbians (homosexuals)"
$thermasian
[1] "Feeling Thermometer: Asian Americans"
$thermcongress
[1] "Feeling Thermometer: Congress"
$thermblack
[1] "Feeling Thermometer: Blacks"
$thermsouth
[1] "Feeling Thermometer: Southerners"
$thermimmigrant
[1] "Feeling Thermometer: Illegal immigrants"
$thermrich
[1] "Feeling Thermometer: Rich people"
$thermwhite
[1] "Feeling Thermometer: White people"
$thermisrael
[1] "Feeling Thermometer: Israel"
$thermmuslim
[1] "Feeling Thermometer: Muslims"
$thermhindu
[1] "Feeling Thermometer: Hindus"
$thermchristian
[1] "Feeling Thermometer: Christians"
$thermatheist
[1] "Feeling Thermometer: Atheists"
$turnout
[1] "Did respondent vote in November 2008"
$presvote
[1] "Vote in 2008 presidential election"
$housevote
[1] "Party of respondent's vote for House"
So we see that the variable denom is “Religious affiliation.” Let’s check which numeric value represents Catholics:
print_labels(df$denom)
Labels:
value label
-9 Refused
-8 DK
-1 Not asked
1 Protestant
2 Catholic
3 Jewish
7 Other
9 None
So we know that Catholics are denom == 2. Now we can use this:
df %>%
filter(denom == 2) %>%
summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>%
pull(ttest)
filter: removed 1,625 rows (77%), 477 rows remaining
summarise: now one row and one column, ungrouped
[[1]]
Welch Two Sample t-test
data: thermdem and thermgop
t = 11.11, df = 929, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
14.78362 21.12716
sample estimates:
mean of x mean of y
65.21075 47.25536
We again reject the null of no difference in support: among Catholics, average support is higher for the Democratic Party (65.2) than for the Republican Party (47.3). Both means are slightly higher than in the full sample (62.8 and 44.8), while the difference of about 18 points is essentially unchanged.

9.8 Additional restrictions
We now restrict the sample further to respondents who own a gun (gunown), attend church more than twice a month (relattend), and are white (ethnic). What do you find?
df %>%
filter(denom == 2 & gunown == 1 & ethnic == 50 & relattend == 1) %>%
summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>%
pull(ttest)
filter: removed 2,089 rows (99%), 13 rows remaining
summarise: now one row and one column, ungrouped
[[1]]
Welch Two Sample t-test
data: thermdem and thermgop
t = 1.1508, df = 20.02, p-value = 0.2634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-9.479355 32.812688
sample estimates:
mean of x mean of y
56.66667 45.00000
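We now fail to reject the null that mean support is the same for both parties. This is mainly due to the small number of observations (only 13 rows remain after filtering), which makes the estimate very imprecise: the 95% confidence interval runs from about -9.5 to 32.8 and includes zero.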
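The effect of sample size on the confidence interval can be illustrated with a small simulation. This is only a sketch: the means and standard deviations below are invented, and `ci_width` is a helper defined here for illustration.

```r
# Width of the 95% CI for a difference in means, for n observations per group.
# Simulated data with an invented true difference of 12 points.
set.seed(6)
ci_width <- function(n) {
  a <- rnorm(n, mean = 57, sd = 25)
  b <- rnorm(n, mean = 45, sd = 25)
  as.numeric(diff(t.test(a, b)$conf.int))  # upper bound minus lower bound
}

width_small <- ci_width(13)    # roughly the size of the restricted sample
width_large <- ci_width(2000)  # roughly the size of the full sample

c(n_13 = width_small, n_2000 = width_large)
```

The same true difference produces a much wider interval with 13 observations per group, so a real difference can easily go undetected in a sample that small.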
