# 10 HT4 Hypothesis Testing: \(\chi^2\) tests

```
library(tidyverse) # our main collection of functions
library(tidylog) # prints additional output from the tidyverse commands - load after tidyverse
library(haven) # allows us to load .dta (Stata specific) files
library(here) # needed to navigate to folders and files in a project
library(webr) # allows us to plot the outcome of the chi-squared test
library(magrittr) # to use the exposition operator
```

## 10.2 \(\chi^2\) test: Democracy and Institutions

Rather than the normal pipe, we again use the exposition operator `%$%` from the `magrittr` package. We need to do this because `chisq.test()` expects vectors as input rather than columns of a data frame (yes, I know this is annoying in R).
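To see what `%$%` does, here is a minimal sketch on a small made-up data frame (`toy`, `demo` and `inst` are hypothetical names, not the data used in this chapter). The exposition operator makes the columns of the data frame visible by name to the function on its right, much like base R's `with()`, so both calls below should run the same test.

```r
library(magrittr)

# Hypothetical toy data (60 made-up cases), not the data set used in this chapter
toy <- data.frame(
  demo = rep(c(1, 0), times = c(30, 30)),
  inst = c(rep("high", 20), rep("low", 10),
           rep("high", 12), rep("low", 18))
)

# %$% exposes the columns of `toy` by name to chisq.test()
res_expo <- toy %$% chisq.test(x = demo, y = inst, correct = FALSE)

# Base R equivalent without magrittr
res_with <- with(toy, chisq.test(x = demo, y = inst, correct = FALSE))

# Both approaches give the same test statistic and p-value
all.equal(res_expo$statistic, res_with$statistic)
all.equal(res_expo$p.value, res_with$p.value)
```

If you prefer not to load `magrittr`, the `with()` form is a reasonable substitute.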

To replicate the result from Stata, we need to turn off the continuity correction that R applies by default to 2 × 2 tables, using the option `correct = FALSE`. We will not go into this mechanism here.

```
df %$%
chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE)
```

```
Pearson's Chi-squared test
data: chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
```
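Although the mechanism is not covered here, the effect of `correct = FALSE` is easy to see on a small made-up 2 × 2 table (the counts below are hypothetical). With the correction switched on, the test statistic is smaller and the p-value larger than without it.

```r
# Hypothetical 2 x 2 table of counts, not the chapter's data
tab <- matrix(c(20, 12, 10, 18), nrow = 2,
              dimnames = list(demo = c("yes", "no"),
                              inst = c("high", "low")))

with_correction    <- chisq.test(tab)                   # R default: correct = TRUE
without_correction <- chisq.test(tab, correct = FALSE)  # correction switched off, as in this chapter

# Compare the two test statistics and p-values side by side
c(corrected   = unname(with_correction$statistic),
  uncorrected = unname(without_correction$statistic))
c(corrected   = with_correction$p.value,
  uncorrected = without_correction$p.value)
```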

We could also stick with our standard pipe, but then the command looks a bit messier:

```
df %>%
summarise(chisq_test = list(chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE))) %>%
pull(chisq_test)
```

`summarise: now one row and one column, ungrouped`

```
[[1]]
Pearson's Chi-squared test
data: chga_demo and icrg_cat
X-squared = 5.235, df = 1, p-value = 0.02214
```
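Whichever pipe is used, `chisq.test()` returns an object of class `htest`, so individual pieces of the result can be pulled out by name if they are needed later, for instance for a table. A minimal sketch on a made-up table (the counts are hypothetical):

```r
# Hypothetical 2 x 2 table of counts, not the chapter's data
tab <- matrix(c(20, 12, 10, 18), nrow = 2)
res <- chisq.test(tab, correct = FALSE)

class(res)      # the result is an "htest" object
res$statistic   # the X-squared test statistic
res$parameter   # the degrees of freedom
res$p.value     # the p-value
res$observed    # observed cell counts
res$expected    # cell counts expected under H0
```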

Let’s plot this again. We use the normal pipe after the `chisq.test()` command again:

```
df %$%
chisq.test(x = chga_demo, y = icrg_cat, correct = FALSE) %>%
plot()
```

Because our estimate is in the red-shaded area (our test statistic is larger than the critical value at \(\alpha = 0.05\)), we reject the \(H_0\) of no association.
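The comparison shown in the plot can also be done by hand. The sketch below copies the rounded test statistic and degrees of freedom from the output above and looks up the critical value (about 3.84 for one degree of freedom) with base R's `qchisq()`; the variable names are only illustrative.

```r
# Test statistic and degrees of freedom copied from the output above
x_squared <- 5.235
df_test   <- 1

# Critical value of the chi-squared distribution at alpha = 0.05
critical_value <- qchisq(0.95, df = df_test)
critical_value

# We reject H0 if the statistic exceeds the critical value
x_squared > critical_value

# Upper-tail p-value for the (rounded) statistic
pchisq(x_squared, df = df_test, lower.tail = FALSE)
```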

## 10.3 \(\chi^2\) test: Democracy and Inequality

We just use a different variable now:

```
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE)
```

```
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect
```

```
Pearson's Chi-squared test
data: chga_demo and gini_cat
X-squared = 0.1069, df = 2, p-value = 0.948
```

The command actually gives us a warning that the \(\chi^2\) approximation may be incorrect in this case. It does so because the expected values in some cells are very small, and therefore the approximation of the p-value may not be accurate. We can simply accept this, but we can also take steps to make our estimate more accurate by choosing the option `simulate.p.value = TRUE`. This uses a Monte Carlo simulation with 2000 replications to estimate the p-value (don’t worry about this though).

```
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE, simulate.p.value = TRUE)
```

```
Pearson's Chi-squared test with simulated p-value (based on 2000
replicates)
data: chga_demo and gini_cat
X-squared = 0.1069, df = NA, p-value = 0.947
```

Now we don’t get a warning anymore! This method, however, does not report the degrees of freedom.
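Two details are worth a quick look: where the warning comes from, and why a simulated p-value can differ slightly from one run to the next. The sketch below uses a made-up sparse table (hypothetical counts, not the chapter's data). It first inspects the expected counts, since `chisq.test()` warns when any of them is below 5, and then fixes the random seed so that the simulation gives the same p-value each time.

```r
# Hypothetical sparse 2 x 3 table of counts, not the chapter's data
tab <- matrix(c(3, 2, 4, 3, 2, 4), nrow = 2)

# Expected counts under H0; the warning is raised when any of them is below 5
expected <- suppressWarnings(chisq.test(tab))$expected
expected
any(expected < 5)

# Fixing the seed makes the simulated p-value reproducible;
# B sets the number of replicates (2000 is the default)
set.seed(123)
sim_1 <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
set.seed(123)
sim_2 <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
identical(sim_1$p.value, sim_2$p.value)
```

Without `set.seed()`, repeated runs will usually give p-values that differ in the last digits, which is expected for a simulation.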

We’ll plot the original estimate now, though, to stay consistent with our Stata estimates. We use the normal pipe after the `chisq.test()` command again:

```
df %$%
chisq.test(x = chga_demo, y = gini_cat, correct = FALSE) %>%
plot()
```

```
Warning in stats::chisq.test(x, y, ...): Chi-squared approximation may be
incorrect
```

We clearly see that the test fails to reject the \(H_0\).
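The same conclusion can be checked numerically. The sketch below copies the rounded statistic and the two degrees of freedom from the output of the original (non-simulated) test and compares the statistic with the critical value at the 5% level, which is about 5.99 for two degrees of freedom.

```r
# Test statistic and degrees of freedom copied from the output above
x_squared <- 0.1069
df_test   <- 2

# Critical value at alpha = 0.05
critical_value <- qchisq(0.95, df = df_test)
critical_value

# The statistic is far below the critical value, so we fail to reject H0
x_squared > critical_value

# Upper-tail p-value for the (rounded) statistic
pchisq(x_squared, df = df_test, lower.tail = FALSE)
```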