10 HT4 Hypothesis Testing: \(\chi^2\) tests
library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(webr) # allows us to plot the outcome of the chi-squared test
library(magrittr) # to use the exposition operator
10.2 \(\chi^2\) test: Democracy and Institutions
Rather than the normal pipe command, we again use the exposition operator %$% from the magrittr package. We need to do this because chisq.test() expects its inputs as vectors rather than as column names of a data frame (yes, I know this is annoying in R).
To replicate the result from Stata, we need to turn off the continuity correction that R applies by default to \(2 \times 2\) tables. We do this with the option correct = FALSE. We will not go into this mechanism here.
df %$%
chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE)
Pearson's Chi-squared test
data: chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
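To get a feel for what correct = FALSE changes, here is a minimal sketch on a made-up \(2 \times 2\) table. The counts and the labels in `tab` are hypothetical and are not taken from our data set. The only point is that the corrected statistic is smaller than the plain Pearson statistic.

```r
# Hypothetical 2x2 table of counts (NOT the course data)
tab <- matrix(c(12, 5, 7, 14), nrow = 2,
              dimnames = list(democracy = c("no", "yes"),
                              institutions = c("low", "high")))

# Default behaviour: continuity correction is applied to 2x2 tables
with_correction <- chisq.test(tab)

# Plain Pearson statistic, the setting we use in this chapter
without_correction <- chisq.test(tab, correct = FALSE)

with_correction$statistic
without_correction$statistic
```

The correction shrinks each difference between observed and expected counts before squaring it, so it always makes the test a little more conservative.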
We could also stick with our standard pipe, but then the command looks a bit messier:
df %>%
summarise(chisq_test = list(chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE))) %>%
pull(chisq_test)
summarise: now one row and one column, ungrouped
[[1]]
Pearson's Chi-squared test
data: chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
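If you prefer to avoid pipes altogether, base R offers two further ways of handing the columns to chisq.test() as vectors. The sketch below uses a small hypothetical data frame called `toy`. Its column names mirror ours, but the values are invented for illustration.

```r
# Hypothetical toy data (NOT the course data); column names mirror the ones above
toy <- data.frame(
  chga_demo = rep(c(0, 1), each = 20),
  icrg_cat  = c(rep(c("low", "high"), times = c(14, 6)),
                rep(c("low", "high"), times = c(7, 13)))
)

# Option 1: extract each column as a vector with $
res_dollar <- chisq.test(x = toy$chga_demo, y = toy$icrg_cat, correct = FALSE)

# Option 2: with() evaluates the call inside the data frame, much like %$%
res_with <- with(toy, chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE))

# Both routes give the same test statistic
all.equal(res_dollar$statistic, res_with$statistic)
```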
Let’s plot this again. We again use the normal pipe command after the chisq.test() command:
df %$%
chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE) %>%
plot()

Because our test statistic lies in the red-shaded area (its value of 5.235 is larger than the critical value at \(\alpha = 0.05\), which is about 3.84 for one degree of freedom), we reject the \(H_0\) of no association.
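The comparison shown in the plot can also be checked by hand. This is a small sketch that plugs the test statistic and degrees of freedom reported above into base R's \(\chi^2\) distribution functions. The variable names (`x_squared`, `dof`, and so on) are just illustrative choices.

```r
alpha     <- 0.05
x_squared <- 5.235  # test statistic reported in the output above
dof       <- 1      # degrees of freedom reported in the output above

# Critical value: the point that cuts off the upper 5% of the distribution
critical_value <- qchisq(1 - alpha, df = dof)

# p-value: the area to the right of our test statistic
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

critical_value
p_value
x_squared > critical_value  # TRUE means we reject H0
```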
10.3 \(\chi^2\) test: Democracy and Inequality
We just use a different variable now:
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE)
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect
Pearson's Chi-squared test
data: chga_demo and gini_cat
X-squared = 0.1069, df = 2, p-value = 0.948
Note that the command gives us a warning that the \(\chi^2\) approximation may be incorrect in this case. The warning appears because some of the expected cell counts are small (R warns when any of them is below 5), so the approximated p-value may not be accurate. We can simply accept this, but we can also take steps to make our estimate more accurate by choosing the option simulate.p.value = TRUE. This now uses a Monte Carlo simulation with 2000 replications (the default) to compute the p-value (don’t worry about this though).
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE, simulate.p.value = TRUE)
Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)
data: chga_demo and gini_cat
X-squared = 0.1069, df = NA, p-value = 0.947
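Now we don’t get a warning anymore! This method, however, does not output the degrees of freedom, because the p-value comes from the simulation rather than from the \(\chi^2\) distribution. Since the p-value is simulated, it can also differ slightly from one run to the next.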
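If you want to see where the warning comes from and make the simulated p-value reproducible, something along the following lines should work. The table `tab` is hypothetical (not our data), and the seed value is an arbitrary choice.

```r
# Hypothetical 2x3 table with one sparse column (NOT the course data)
tab <- matrix(c(10, 12, 9, 11, 2, 1), nrow = 2,
              dimnames = list(democracy = c("no", "yes"),
                              gini = c("low", "medium", "high")))

# Expected counts under independence; values below 5 trigger the warning
expected <- suppressWarnings(chisq.test(tab))$expected
expected

# Fix the seed so the simulation gives the same p-value every time;
# B sets the number of Monte Carlo replicates (2000 is the default)
set.seed(123)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
sim$p.value
sim$parameter  # NA: no degrees of freedom are reported
```

Increasing B makes the simulated p-value more stable, at the cost of a slightly longer run time.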
We’ll plot the original estimate now, though, to stay consistent with our Stata estimates. We again use the normal pipe command after the chisq.test() command:
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE) %>%
plot()
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect

The test statistic is nowhere near the rejection region, so we clearly fail to reject the \(H_0\).
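As a rough check on this conclusion, we can recompute the numbers by hand. The sketch assumes a table with 2 rows and 3 columns. That is an inference from the reported df = 2, not something the output states directly.

```r
# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
n_rows <- 2  # categories of chga_demo
n_cols <- 3  # assumed number of categories of gini_cat (inferred from df = 2)
dof <- (n_rows - 1) * (n_cols - 1)

x_squared <- 0.1069  # test statistic reported in the output above

critical_value <- qchisq(0.95, df = dof)
p_value <- pchisq(x_squared, df = dof, lower.tail = FALSE)

critical_value
p_value
x_squared > critical_value  # FALSE means we fail to reject H0
```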