9 HT3 Hypothesis Testing: t-tests

For more information on t-tests, see here.

library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse 
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(skimr) # allows us to get an overview over the data quickly

9.1 Load the Data

Load the dataset:

df <- read_dta(here("data", "anes08.dta"))

9.2 Check the data - what is it about?

df %>% 
  skim()
Table 9.1: Data summary
Name                    Piped data
Number of rows          2102
Number of columns       94
Column type frequency:
  character             90
  numeric               4
Group variables         None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
region 0 1.00 1 1 0 4 0
gender 0 1.00 1 1 0 2 0
relguide 15 0.99 1 1 0 4 0
pray 11 0.99 1 1 0 6 0
relattend 5 1.00 1 1 0 5 0
denom 7 1.00 1 1 0 5 0
orientself 39 0.98 1 1 0 3 0
orientknow 42 0.98 1 1 0 2 0
age 38 0.98 2 2 0 75 0
marstat 14 0.99 1 1 0 6 0
education 10 1.00 1 2 0 18 0
union 11 0.99 1 1 0 2 0
income 806 0.62 1 2 0 25 0
class 677 0.68 1 1 0 3 0
ethnic 13 0.99 2 3 0 7 0
gunown 45 0.98 1 1 0 2 0
efficacy1a 1058 0.50 1 1 0 5 0
efficacy1b 1061 0.50 1 1 0 5 0
efficacy1c 1063 0.49 1 1 0 5 0
efficacy1d 1062 0.49 1 1 0 5 0
efficacy2a 1045 0.50 1 1 0 5 0
efficacy2b 1048 0.50 1 1 0 5 0
efficacy2c 1051 0.50 1 1 0 5 0
efficacy2d 1050 0.50 1 1 0 5 0
ideology 617 0.71 1 1 0 7 0
partyid3 29 0.99 1 1 0 3 0
partystrength 824 0.61 1 1 0 2 0
partylean 1313 0.38 1 1 0 3 0
partyid7 48 0.98 1 1 0 7 0
taxes 18 0.99 1 1 0 7 0
milspend 24 0.99 1 1 0 7 0
otherspend 41 0.98 1 1 0 7 0
socialsec 23 0.99 1 1 0 7 0
gradtax 44 0.98 1 1 0 3 0
servespend 202 0.90 1 1 0 7 0
biggov 27 0.99 1 1 0 2 0
govmarket 45 0.98 1 1 0 2 0
govsize 42 0.98 1 1 0 2 0
cappun 169 0.92 1 1 0 4 0
gunbuy 23 0.99 1 1 0 3 0
gaymarriage 55 0.97 1 1 0 4 0
immigration 5 1.00 1 1 0 3 0
immjobs 20 0.99 1 1 0 4 0
abortion 1065 0.49 1 1 0 5 0
equalopp 1 1.00 1 1 0 5 0
isolationism 59 0.97 1 1 0 2 0
iraq 61 0.97 1 1 0 2 0
torture 34 0.98 1 1 0 7 0
thermrush 512 0.76 1 3 0 27 0
thermdem 52 0.98 1 3 0 30 0
thermgop 59 0.97 1 3 0 27 0
thermbush 9 1.00 1 3 0 27 0
thermobama 12 0.99 1 3 0 22 0
thermmccain 9 1.00 1 3 0 27 0
thermbiden 239 0.89 1 3 0 24 0
thermpalin 120 0.94 1 3 0 28 0
thermclinton 17 0.99 1 3 0 22 0
thermhispanic 52 0.98 1 3 0 25 0
thermfund 216 0.90 1 3 0 28 0
thermcatholic 56 0.97 1 3 0 18 0
thermfem 130 0.94 1 3 0 25 0
thermfed 42 0.98 1 3 0 26 0
thermjews 102 0.95 1 3 0 20 0
thermliberal 147 0.93 1 3 0 24 0
thermmiddle 37 0.98 1 3 0 22 0
thermunion 81 0.96 1 3 0 25 0
thermpoor 46 0.98 1 3 0 21 0
thermmilitary 25 0.99 1 3 0 25 0
thermbig 39 0.98 1 3 0 26 0
thermwelfare 56 0.97 1 3 0 25 0
thermconserv 112 0.95 1 3 0 25 0
thermworking 16 0.99 1 3 0 22 0
thermenviron 91 0.96 1 3 0 26 0
thermscotus 63 0.97 1 3 0 26 0
thermgay 61 0.97 1 3 0 26 0
thermasian 91 0.96 1 3 0 19 0
thermcongress 48 0.98 1 3 0 23 0
thermblack 47 0.98 1 3 0 21 0
thermsouth 99 0.95 1 3 0 24 0
thermimmigrant 54 0.97 1 3 0 25 0
thermrich 59 0.97 1 3 0 23 0
thermwhite 50 0.98 1 3 0 18 0
thermisrael 126 0.94 1 3 0 22 0
thermmuslim 130 0.94 1 3 0 21 0
thermhindu 260 0.88 1 3 0 24 0
thermchristian 33 0.98 1 3 0 23 0
thermatheist 133 0.94 1 3 0 24 0
turnout 0 1.00 1 1 0 2 0
presvote 538 0.74 1 1 0 3 0
housevote 791 0.62 1 1 0 4 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
CASE 0 1 1165.05 672.43 1.00 578.25 1166.50 1747.75 2323.0 ▇▇▇▇▇
weight 0 1 1.00 0.75 0.17 0.42 0.74 1.31 3.7 ▇▃▂▁▁
state 0 1 25.35 15.28 1.00 10.00 25.00 41.00 50.0 ▇▃▃▆▇
children 5 1 0.78 1.21 0.00 0.00 0.00 1.00 11.0 ▇▁▁▁▁

9.3 Perform a one-sample t-test: test whether the mean of thermmilitary is higher than 50

df %>% 
  pull(thermmilitary) %>% 
  t.test(mu = 50)

    One Sample t-test

data:  .
t = 65.007, df = 2076, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 50
95 percent confidence interval:
 79.08470 80.89411
sample estimates:
mean of x 
 79.98941 

We reject the null hypothesis that the mean rating of the military equals 50 (a neutral opinion) at the 5% level. The sample mean is about 80, so on average people have a good opinion of the military (well above 50).
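The test above is two-sided (the alternative is "not equal to 50"), while the question asks whether the mean is higher than 50. A minimal sketch of the one-sided version follows, using the alternative argument of t.test. The vector rating is simulated and only stands in for thermmilitary; with the real data the same argument could be added to the piped call, i.e. t.test(mu = 50, alternative = "greater").

```r
# Simulated stand-in for a 0-100 thermometer rating (NOT the ANES data)
set.seed(1)
rating <- pmin(pmax(rnorm(500, mean = 80, sd = 20), 0), 100)

two_sided <- t.test(rating, mu = 50)                           # H1: mean != 50
one_sided <- t.test(rating, mu = 50, alternative = "greater")  # H1: mean > 50

# The t-statistic is identical; only the p-value and the interval change
two_sided$statistic
one_sided$statistic
one_sided$p.value
one_sided$conf.int   # one-sided interval: from a lower bound up to Inf
```

When the sample mean lies above the hypothesised value, the one-sided p-value is half the two-sided one, so the conclusion here would not change.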

9.3.1 Plotting t-tests

As a little extra, I will now also show you how to plot the result of a t-test. We first need to install an additional package:

With this new package, we can simply use the plot function:

df %>% 
  pull(thermmilitary) %>% 
  t.test(mu = 50) %>% 
  plot()

The blue dot shows where the t-statistic of our sample lies relative to the distribution we assume under the null hypothesis.

9.4 Perform the same test, checking whether the mean is higher than 80. What does it say?

df %>% 
  pull(thermmilitary) %>% 
  t.test(mu = 80)

    One Sample t-test

data:  .
t = -0.02296, df = 2076, p-value = 0.9817
alternative hypothesis: true mean is not equal to 80
95 percent confidence interval:
 79.08470 80.89411
sample estimates:
mean of x 
 79.98941 

Now we fail to reject the null hypothesis \(H_0: \mu=80\): the sample mean of 79.99 is statistically indistinguishable from 80 (p = 0.98).
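The two outputs are linked by the same standard error, since the t-statistic is (sample mean - hypothesised mean) / standard error. As a rough check, the sketch below takes the rounded numbers printed above, backs out the standard error from the test against 50, and recomputes the statistic, p-value and confidence interval for the test against 80. The variable names are only illustrative, and the results agree with the printed output only up to rounding.

```r
# Values copied from the printed outputs for thermmilitary
xbar <- 79.98941   # sample mean
t_50 <- 65.007     # t-statistic for H0: mu = 50
dof  <- 2076       # degrees of freedom (n - 1)

se   <- (xbar - 50) / t_50               # standard error implied by the first test
t_80 <- (xbar - 80) / se                 # t-statistic for H0: mu = 80
p_80 <- 2 * pt(-abs(t_80), dof)          # two-sided p-value
ci   <- xbar + c(-1, 1) * qt(0.975, dof) * se   # 95% confidence interval

round(t_80, 5)
round(p_80, 4)
round(ci, 4)
```

The confidence interval does not depend on the hypothesised mean, which is why both outputs report the same interval: 50 lies far outside it, 80 lies inside it.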

Again we plot it:

df %>% 
  pull(thermmilitary) %>% 
  t.test(mu = 80) %>% 
  plot()

9.5 Perform a two-sample t-test: Thermfed and Gunown

Is there a difference between the two groups? Admittedly, this is a bit easier without using the pipe logic:

t.test(df$thermfed, df$gunown, paired = FALSE)

We could also use the exposition operator %$% from the magrittr package (although it is rarely used):

library(magrittr) # provides the %$% exposition operator

Attaching package: 'magrittr'
The following object is masked from 'package:purrr':

    set_names
The following object is masked from 'package:tidyr':

    extract
df %$%
  t.test(thermfed, gunown, paired = FALSE)

    Welch Two Sample t-test

data:  thermfed and gunown
t = 93.545, df = 2084.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 47.16457 49.18446
sample estimates:
mean of x mean of y 
52.007767  3.833252 

But we can still do it with our known tools:

df %>% 
  summarise(ttest = list(t.test(thermfed, gunown, paired = FALSE))) %>% 
  pull(ttest)
summarise: now one row and one column, ungrouped
[[1]]

    Welch Two Sample t-test

data:  thermfed and gunown
t = 93.545, df = 2084.1, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 47.16457 49.18446
sample estimates:
mean of x mean of y 
52.007767  3.833252 

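Yes, the two means differ. Note, however, that this call compares the mean of thermfed (52.0) with the mean of the numeric codes of gunown (3.8). It does not compare thermfed between gun owners and non-owners, so on its own it cannot show that people who own guns have a lower opinion of the federal government.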
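To compare thermfed between respondents with and without a gun, t.test also accepts a formula of the form outcome ~ group, where the grouping variable must take exactly two values. The sketch below contrasts the two calls on simulated data. The data frame toy, the group means and the 1/5 coding of gunown are assumptions made for illustration, not properties of the ANES file.

```r
# Simulated data (NOT the ANES file); the 1/5 coding of gunown is an assumption
set.seed(42)
toy <- data.frame(
  gunown   = rep(c(1, 5), times = c(120, 180)),   # 1 = yes, 5 = no (assumed)
  thermfed = c(rnorm(120, mean = 48, sd = 20),    # ratings of gun owners
               rnorm(180, mean = 54, sd = 20))    # ratings of non-owners
)

# Two columns passed as x and y: compares mean(thermfed) with mean(gunown)
cols <- t.test(toy$thermfed, toy$gunown)

# Formula interface: compares mean(thermfed) between the two gunown groups
groups <- t.test(thermfed ~ gunown, data = toy)

cols$estimate     # mean of x, mean of y
groups$estimate   # mean in group 1, mean in group 5
```

With the real data, the same formula call would presumably be t.test(thermfed ~ gunown, data = df), provided gunown contains only the two substantive answer codes.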

df %>% 
  summarise(ttest = list(t.test(thermfed, gunown, paired = FALSE))) %>% 
  pull(ttest) %>% 
  `[[`(1) %>% # this is needed to extract the ttest result from the list that is outputted
  plot()
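Because summarise() stores the test in a list column, pull() returns a list of length one, and the `[[` call extracts the test object itself. That object has class "htest" and its components can be read directly, which is handy when only the p-value or the confidence interval is needed. A small sketch on simulated vectors (x, y and res_list are illustrative names, not part of the dataset):

```r
# Simulated ratings (NOT the ANES data)
set.seed(7)
x <- rnorm(200, mean = 55, sd = 15)
y <- rnorm(200, mean = 50, sd = 15)

res_list <- list(t.test(x, y))   # same shape as the list returned by pull(ttest)
res <- res_list[[1]]             # equivalent to `[[`(res_list, 1)

class(res)       # "htest"
res$statistic    # t-statistic
res$parameter    # degrees of freedom
res$p.value      # p-value
res$conf.int     # confidence interval for the difference in means
res$estimate     # the two sample means
```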

9.6 Perform a two-sample test: Democratic and Republican Party

Test whether the mean support for the Democratic Party is the same as the mean support for the Republican Party.

df %>%
  summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>% 
  pull(ttest)
summarise: now one row and one column, ungrouped
[[1]]

    Welch Two Sample t-test

data:  thermdem and thermgop
t = 22.029, df = 4090.9, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 16.32829 19.51864
sample estimates:
mean of x mean of y 
 62.76634  44.84288 

We reject the null of no difference in support: average support is higher for the Democrats (62.8 versus 44.8).
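One caveat: both thermometers are answered by the same respondents, so the two columns are not independent samples. A paired test, which works on the within-respondent differences and uses only respondents with both ratings, may be the more natural choice. The sketch below uses simulated ratings in which the two scores are positively correlated; that correlation is an assumption, and in real data the paired standard error can be smaller or larger than the unpaired one depending on its sign.

```r
# Simulated ratings from the same 300 respondents (NOT the ANES data)
set.seed(123)
base_mood <- rnorm(300, mean = 0, sd = 15)   # respondent-level tendency (assumed)
dem <- 60 + base_mood + rnorm(300, sd = 10)
gop <- 45 + base_mood + rnorm(300, sd = 10)

welch  <- t.test(dem, gop, paired = FALSE)   # treats the columns as independent samples
paired <- t.test(dem, gop, paired = TRUE)    # uses within-respondent differences
diffs  <- t.test(dem - gop, mu = 0)          # one-sample test on the differences

c(welch  = unname(welch$statistic),
  paired = unname(paired$statistic),
  diffs  = unname(diffs$statistic))
```

The paired test and the one-sample test on the differences give the same t-statistic, which shows what paired = TRUE does under the hood.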

df %>%
  summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>% 
  pull(ttest)  %>% 
  `[[`(1) %>% # this is needed to extract the ttest result from the list that is outputted
  plot()

9.7 Now restrict this test to Catholics only, what do you observe?

Find out where we have the information regarding faith:

lapply(df, function(x) attributes(x)$label)
$CASE
[1] "Case ID"

$weight
NULL

$state
NULL

$region
[1] "Region of country"

$gender
[1] "Gender"

$relguide
[1] "Does religion provide guidance in life?"

$pray
[1] "How often does respondent pray?"

$relattend
[1] "How often does respondent attend religious services?"

$denom
[1] "Religious affiliation"

$orientself
[1] "Sexual orientation"

$orientknow
[1] "Know gay, lesbian, bisexual family or friends?"

$age
[1] "Age"

$marstat
[1] "Marital status"

$children
[1] "Number of children under 18 in household"

$education
[1] "Highest grade of school or year of college completed"

$union
[1] "Anyone in household belong to a labor union?"

$income
[1] "Household income"

$class
[1] "Self-identification as working or middle class"

$ethnic
[1] "Racial or ethnic identification"

$gunown
[1] "Does respondent have a gun in his or her home or garage?"

$efficacy1a
[1] "Politics/govt too complicated to understand"

$efficacy1b
[1] "Good understanding of political issues"

$efficacy1c
[1] "Public officials don't care what people like me think"

$efficacy1d
[1] "Have no say about what govt does"

$efficacy2a
[1] "Politics/govt too complicated to understand"

$efficacy2b
[1] "Good understanding of political issues"

$efficacy2c
[1] "How much do public officials care what people like me think"

$efficacy2d
[1] "How much can people like you affect what the government does"

$ideology
[1] "Liberal-conservative self-placement"

$partyid3
[1] "Party self-identification - 3 point scale"

$partystrength
[1] "Strength of party identification"

$partylean
[1] "Party leanings"

$partyid7
[1] "Party self-identification - 7 point scale)"

$taxes
[1] "Reduce deficit by raising taxes"

$milspend
[1] "Reduce deficit by cutting military spending"

$otherspend
[1] "Reduce deficit by cutting nonmilitary spending"

$socialsec
[1] "Invest social security in stocks and bonds"

$gradtax
[1] "Statement best agrees with respondent about graduated tax"

$servespend
[1] "Position on services vs. spending"

$biggov
[1] "Govt bigger because too involved OR bigger problems?"

$govmarket
[1] "Need strong govt for complex problems OR free market?"

$govsize
[1] "Less govt better OR more that govt should be doing"

$cappun
[1] "Favor/oppose death penalty"

$gunbuy
[1] "Should fed govt make it more difficult to buy a gun?"

$gaymarriage
[1] "Position on gay marriage"

$immigration
[1] "How important is controlling illegal immigration?"

$immjobs
[1] "How likely that immigration will take away jobs?"

$abortion
[1] "Abortion - self placement"

$equalopp
[1] "Society should make sure everyone has equal opportunity"

$isolationism
[1] "This country would be better off if we just stayed home"

$iraq
[1] "Was Iraq war worth the cost"

$torture
[1] "Favor-oppose torture for suspected terrorists"

$thermrush
[1] "Feeling Thermometer: Rush Limbaugh"

$thermdem
[1] "Feeling Thermometer: Democratic party"

$thermgop
[1] "Feeling Thermometer: Republican party"

$thermbush
[1] "Feeling Thermometer: George W. Bush"

$thermobama
[1] "Feeling Thermometer: Barack Obama"

$thermmccain
[1] "Feeling Thermometer: John McCain"

$thermbiden
[1] "Feeling Thermometer: Joe Biden"

$thermpalin
[1] "Feeling Thermometer: Sarah Palin"

$thermclinton
[1] "Feeling Thermometer: Hillary Clinton"

$thermhispanic
[1] "Feeling Thermometer: Hispanics"

$thermfund
[1] "Feeling Thermometer: Christian fundamentalists"

$thermcatholic
[1] "Feeling Thermometer: Catholics"

$thermfem
[1] "Feeling Thermometer: Feminists"

$thermfed
[1] "Feeling Thermometer: Federal Government in Washington"

$thermjews
[1] "Feeling Thermometer: Jews"

$thermliberal
[1] "Feeling Thermometer: Liberals"

$thermmiddle
[1] "Feeling Thermometer: Middle class people"

$thermunion
[1] "Feeling Thermometer: Labor unions"

$thermpoor
[1] "Feeling Thermometer: Poor people"

$thermmilitary
[1] "Feeling Thermometer: The military"

$thermbig
[1] "Feeling Thermometer: Big business"

$thermwelfare
[1] "Feeling Thermometer: People on welfare"

$thermconserv
[1] "Feeling Thermometer: Conservatives"

$thermworking
[1] "Feeling Thermometer: Working class people"

$thermenviron
[1] "Feeling Thermometer: Environmentalists"

$thermscotus
[1] "Feeling Thermometer: The Supreme Court of the United States"

$thermgay
[1] "Feeling Thermometer:  Gays and lesbians (homosexuals)"

$thermasian
[1] "Feeling Thermometer: Asian Americans"

$thermcongress
[1] "Feeling Thermometer: Congress"

$thermblack
[1] "Feeling Thermometer: Blacks"

$thermsouth
[1] "Feeling Thermometer: Southerners"

$thermimmigrant
[1] "Feeling Thermometer: Illegal immigrants"

$thermrich
[1] "Feeling Thermometer: Rich people"

$thermwhite
[1] "Feeling Thermometer: White people"

$thermisrael
[1] "Feeling Thermometer: Israel"

$thermmuslim
[1] "Feeling Thermometer: Muslims"

$thermhindu
[1] "Feeling Thermometer: Hindus"

$thermchristian
[1] "Feeling Thermometer: Christians"

$thermatheist
[1] "Feeling Thermometer: Atheists"

$turnout
[1] "Did respondent vote in November 2008"

$presvote
[1] "Vote in 2008 presidential election"

$housevote
[1] "Party of respondent's vote for House"

So we see that the variable denom is “Religious affiliation.” Let’s check which numeric value represents Catholics:

print_labels(df$denom)

Labels:
 value      label
    -9    Refused
    -8         DK
    -1  Not asked
     1 Protestant
     2   Catholic
     3     Jewish
     7      Other
     9       None

So we know that Catholics are denom == 2. Now we can use this:

df %>%
  filter(denom == 2) %>% 
  summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>% 
  pull(ttest)
filter: removed 1,625 rows (77%), 477 rows remaining
summarise: now one row and one column, ungrouped
[[1]]

    Welch Two Sample t-test

data:  thermdem and thermgop
t = 11.11, df = 929, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 14.78362 21.12716
sample estimates:
mean of x mean of y 
 65.21075  47.25536 

We again reject the null of no difference in support: average support is higher for the Democrats. Looking at the individual means, Catholics rate both parties slightly higher (65.2 and 47.3) than the full sample does (62.8 and 44.8).

9.8 Additional restrictions

We now additionally restrict the sample to respondents who own a gun, attend church more than twice a month (relattend), and are white (ethnic). What do you find?

df %>%
  filter(denom == 2 & gunown == 1 & ethnic == 50 & relattend == 1) %>% 
  summarise(ttest = list(t.test(x = thermdem, y = thermgop, paired = FALSE))) %>% 
  pull(ttest)
filter: removed 2,089 rows (99%), 13 rows remaining
summarise: now one row and one column, ungrouped
[[1]]

    Welch Two Sample t-test

data:  thermdem and thermgop
t = 1.1508, df = 20.02, p-value = 0.2634
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -9.479355 32.812688
sample estimates:
mean of x mean of y 
 56.66667  45.00000 

We now fail to reject the null that mean support is the same for both parties. This is mainly because only 13 rows remain after filtering, which makes the estimates very imprecise (see the wide confidence interval).
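To see where the wide interval comes from, the Welch statistic can be computed by hand: the standard error of the difference is the square root of var(x)/n_x + var(y)/n_y, so it grows as the number of observations shrinks. The sketch below does this for two small simulated samples and compares the result with t.test. The sample sizes, means and standard deviations are made up for illustration.

```r
# Two small simulated samples (NOT the ANES data)
set.seed(2008)
x <- rnorm(12, mean = 57, sd = 25)
y <- rnorm(11, mean = 45, sd = 25)

vx <- var(x) / length(x)     # squared standard error of mean(x)
vy <- var(y) / length(y)     # squared standard error of mean(y)
se <- sqrt(vx + vy)          # standard error of the difference in means
t_stat <- (mean(x) - mean(y)) / se
dof <- (vx + vy)^2 /         # Welch-Satterthwaite degrees of freedom
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, dof) * se

res <- t.test(x, y)          # Welch test is the default (var.equal = FALSE)
c(by_hand = t_stat, t.test = unname(res$statistic))
c(by_hand = dof, t.test = unname(res$parameter))
rbind(by_hand = ci, t.test = as.numeric(res$conf.int))
```

With the same spread in the ratings, quadrupling the number of observations would roughly halve the standard error and therefore the width of the interval.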