10 HT4 Hypothesis Testing: \(\chi^2\) tests

library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse 
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(webr) # allows us to plot the outcome of the chi-squared test
library(magrittr) # to use the exposition operator

10.1 Load the Data

Load the dataset:

df <- read_dta(here("data","PS4.dta"))

10.2 \(\chi^2\) test: Democracy and Institutions

Rather than the normal pipe, we again use the exposition operator %$% from the magrittr package. We need it because chisq.test() has no data argument: it expects the variables themselves as vectors, not the names of columns inside a data frame. The %$% operator exposes the columns of df by name to the function on its right-hand side.

To replicate the result from Stata, we need to turn off the continuity correction that R applies by default to 2x2 tables. We do this with the option correct = FALSE. We will not go into this mechanism here.

df %$% 
  chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE)

    Pearson's Chi-squared test

data:  chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
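For readers curious about what correct = FALSE changes, here is a minimal sketch. The 2x2 table of counts is hypothetical (it is not the PS4 data) and only serves to show that the uncorrected statistic is the plain Pearson sum, while the default setting shrinks each |observed - expected| difference before squaring.

```r
# Hypothetical 2x2 table of counts (NOT the PS4 data), for illustration only
tab <- matrix(c(12, 5, 7, 14), nrow = 2,
              dimnames = list(demo = c("0", "1"), icrg = c("low", "high")))

with_corr    <- chisq.test(tab)                   # default: correct = TRUE
without_corr <- chisq.test(tab, correct = FALSE)  # plain Pearson statistic

# Pearson statistic by hand: sum((observed - expected)^2 / expected)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
by_hand  <- sum((tab - expected)^2 / expected)

# The hand-computed value matches the uncorrected test;
# the corrected statistic is smaller
c(corrected   = unname(with_corr$statistic),
  uncorrected = unname(without_corr$statistic),
  by_hand     = by_hand)
```

Because the correction only makes the statistic smaller, the corrected test is the more conservative of the two.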

We could also stick with our standard pipe, but then the command looks a bit messier:

df %>% 
  summarise(chisq_test = list(chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE))) %>% 
  pull(chisq_test)
summarise: now one row and one column, ungrouped
[[1]]

    Pearson's Chi-squared test

data:  chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
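If you prefer to avoid both pipes, base R offers equivalent ways to hand the columns to chisq.test(). The sketch below uses a small hypothetical data frame called toy (not PS4.dta) whose column names simply mirror the ones used above. All three calls should give the same test.

```r
# Hypothetical toy data (NOT PS4.dta); column names mirror the ones above
toy <- data.frame(
  chga_demo = rep(c(0, 1), times = c(19, 19)),
  icrg_cat  = c(rep(c(0, 1), times = c(12, 7)),
                rep(c(0, 1), times = c(5, 14)))
)

# 1. Extract the columns as vectors with $
res_dollar <- chisq.test(x = toy$chga_demo, y = toy$icrg_cat, correct = FALSE)

# 2. with() evaluates the call with the columns visible by name,
#    which is essentially what %$% does
res_with <- with(toy, chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE))

# 3. Build the contingency table first and test the table
res_table <- chisq.test(table(toy$chga_demo, toy$icrg_cat), correct = FALSE)

res_with
```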

Let’s plot this again. We pass the result of chisq.test() on to plot() with the normal pipe:

df %$% 
  chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE) %>% 
  plot()

Because our test statistic lies in the red-shaded rejection region (it is larger than the critical value at \(\alpha = 0.05\)), we reject the \(H_0\) of no association.
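The comparison with the critical value can also be done numerically. This is a minimal sketch that takes the test statistic and the degrees of freedom from the output above; the variable names are our own.

```r
alpha   <- 0.05
df_test <- 1       # degrees of freedom reported in the output above
x_sq    <- 5.235   # test statistic reported in the output above

# Critical value: the 95th percentile of the chi-squared distribution
crit <- qchisq(1 - alpha, df = df_test)   # about 3.84

# Reject H0 if the test statistic exceeds the critical value
reject <- x_sq > crit
reject
```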

10.3 \(\chi^2\) test: Democracy and Inequality

We just use a different variable now:

df %$% 
  chisq.test(x = chga_demo, y = gini_cat, correct = FALSE)
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect

    Pearson's Chi-squared test

data:  chga_demo and gini_cat
X-squared = 0.1069, df = 2, p-value = 0.948

This time the command warns us that the \(\chi^2\) approximation may be incorrect. R issues this warning when at least one expected cell count is smaller than 5, because the \(\chi^2\) distribution is then a poor approximation and the p-value may be inaccurate. We can simply accept this, but we can also make the p-value more reliable by choosing the option simulate.p.value = TRUE. R then estimates the p-value with a Monte Carlo simulation based on 2000 replicates (don’t worry about this though).

df %$% 
  chisq.test(x = chga_demo, y = gini_cat, correct = FALSE, simulate.p.value = TRUE)

    Pearson's Chi-squared test with simulated p-value (based on 2000
    replicates)

data:  chga_demo and gini_cat
X-squared = 0.1069, df = NA, p-value = 0.947

Now we don’t get a warning anymore. This method does not report degrees of freedom, however, because the p-value no longer comes from the \(\chi^2\) distribution.
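To see where the warning comes from, we can look at the expected counts stored in the test result. The sketch below uses a hypothetical 2x3 table with small counts (not the PS4 data). It also sets a seed, which is our own addition, so that the simulated p-value is the same on every run.

```r
# Hypothetical 2x3 table with small counts (NOT the PS4 data)
tab <- matrix(c(3, 4, 6, 7, 2, 3), nrow = 2,
              dimnames = list(demo = c("0", "1"),
                              gini = c("low", "mid", "high")))

# suppressWarnings() only hides the warning; the test itself is unchanged
res <- suppressWarnings(chisq.test(tab, correct = FALSE))

res$expected           # expected counts under H0
any(res$expected < 5)  # TRUE here, which is what triggers the warning

# Monte Carlo p-value; B = 2000 is the default number of replicates
set.seed(123)          # assumption: any fixed seed gives reproducible results
res_sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

res_sim$parameter      # NA: no degrees of freedom are reported
res_sim$p.value
```

The test statistic is the same in both versions. Only the way the p-value is obtained differs.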

To stay consistent with our Stata estimates, though, we plot the original test. We again pass the result of chisq.test() on to plot() with the normal pipe:

df %$% 
  chisq.test(x = chga_demo, y = gini_cat, correct = FALSE) %>% 
  plot()
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect

With a p-value of 0.948, far above 0.05, we clearly fail to reject the \(H_0\) of no association.
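The reported p-value can be reproduced from the test statistic and the degrees of freedom. This sketch assumes a 2x3 table, which is consistent with df = 2 in the output above; the variable names are our own.

```r
x_sq    <- 0.1069              # test statistic reported in the output above
df_test <- (2 - 1) * (3 - 1)   # (rows - 1) * (columns - 1), assuming a 2x3 table

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_sq, df = df_test, lower.tail = FALSE)
round(p_value, 3)   # 0.948, as in the output above

# Fail to reject H0 at the 5% level
p_value > 0.05
```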