SOCI8015 Lab 9: Crosstab & Chi-square Test

This ninth lab shows how to produce a cross-tabulation and how to conduct a chi-square test of independence.

We will use three packages for this lab. Load them using the following code:

library(sjlabelled)
library(sjmisc)
library(sjPlot)

This lab uses the 2012 AuSSA dataset. You can download the data file from the course website (iLearn). Put the file in your working directory, then run the following code:

aus2012 <- readRDS("aussa2012.rds")

The dataset is loaded as aus2012.

A Simple Example

A frequency table is the typical way to describe a single categorical variable. When you want to describe two categorical variables simultaneously, and especially the relationship between them, you need a special type of table called a cross-tabulation (or crosstab for short). In a crosstab, the categories of one variable define the rows of the table, and the categories of the other variable define the columns. Each cell contains the number of cases in which that particular combination of categories occurred.
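
As a minimal sketch of this idea, the base R function table() can build a crosstab from two vectors. The six responses below are made up purely for illustration and are not taken from the AuSSA data.

```r
# Made-up responses for six hypothetical people (illustration only)
gender   <- c("Male", "Female", "Female", "Male", "Female", "Male")
attitude <- c("Agree", "Agree", "Disagree", "Disagree", "Agree", "Disagree")

# The first argument defines the rows, the second defines the columns
toy_tab <- table(attitude, gender)
print(toy_tab)

# addmargins() appends the row and column totals
print(addmargins(toy_tab))
```

Each cell counts how many people fall into one combination of attitude and gender, which is the same information the larger crosstabs in this lab display.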

Suppose that we are investigating whether there is an association between gender (sex) and attitudes toward single parenthood (singlpar). singlpar measures the extent to which respondents agree or disagree with the statement that one parent can raise children as well as two parents together. We assume that gender may influence attitudes. Therefore, we treat gender as the independent variable and attitudes toward single parenthood as the dependent variable.

To generate a crosstab and conduct a chi-square test, we use ‘sjt.xtab()’ from the ‘sjPlot’ package, in the form ‘sjt.xtab(data name$name of dependent, data name$name of independent, show.col.prc = TRUE)’. The first argument supplies the rows of the table and the second supplies the columns; ‘show.col.prc = TRUE’ adds column percentages to the crosstab. MAKE SURE that the DEPENDENT variable comes first, so that its categories run down the first column, and that the INDEPENDENT variable comes second, so that its categories run across the top row. Otherwise, the column percentages will not be calculated within the categories of the independent variable, and you will not be able to interpret the result properly. The following code creates the crosstab of gender (sex) and attitudes toward single parenthood (singlpar).

sjt.xtab(aus2012$singlpar, aus2012$sex, show.col.prc = TRUE)

Q5a Single parent can raise     Sex of Respondent
child as well                   Male            Female          Total
Strongly agree                  35 (5.1 %)      114 (13.4 %)    149 (9.7 %)
Agree                           173 (25.4 %)    375 (44 %)      548 (35.7 %)
Neither agree nor disagree      81 (11.9 %)     130 (15.2 %)    211 (13.8 %)
Disagree                        303 (44.5 %)    208 (24.4 %)    511 (33.3 %)
Strongly disagree               89 (13.1 %)     26 (3 %)        115 (7.5 %)
Total                           681 (100 %)     853 (100 %)     1534 (100 %)
χ² = 162.659 · df = 4 · Cramer’s V = 0.326 · p = 0.000

The output shows the crosstab and its associated chi-square statistics. The categories of the independent variable (sex) run across the top row, and the categories of the dependent variable (singlpar) run down the first column. The crosstab shows column percentages, so you can easily compare the attitudes of men and women. For instance, women (44%) are more likely than men (25.4%) to agree with the statement. The chi-square statistic and p-value are displayed below the table. The chi-square statistic is 162.659 with 4 degrees of freedom, and the p-value is reported as 0.000 (that is, less than .001). Since the p-value is less than .05, you can conclude that gender is significantly associated with attitudes toward single parenthood at alpha = .05.
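
The column percentages and the chi-square statistic can also be checked by hand in base R. The sketch below assumes the cell counts are typed in exactly as they appear in the crosstab above; the object names (sex_tab, col_pct, sex_test) are only illustrative.

```r
# Cell counts copied from the crosstab above (rows: attitude, columns: sex)
sex_tab <- matrix(
  c( 35, 114,
    173, 375,
     81, 130,
    303, 208,
     89,  26),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    attitude = c("Strongly agree", "Agree", "Neither agree nor disagree",
                 "Disagree", "Strongly disagree"),
    sex = c("Male", "Female")
  )
)

# Column percentages: margin = 2 divides each cell by its column total
col_pct <- round(100 * prop.table(sex_tab, margin = 2), 1)
print(col_pct)

# Pearson chi-square test of independence on the table of counts
sex_test <- chisq.test(sex_tab)
print(sex_test)
```

The statistic and degrees of freedom returned by chisq.test() should agree with the values printed beneath the sjt.xtab() table.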

More Complex Examples

Examining a bivariate association with a crosstab becomes very difficult if a categorical variable has too many categories or if you are using a continuous variable. In that case, you need to recode the variable so that it has a reduced number of categories (normally fewer than five). The reduced categories should still be theoretically meaningful. In this lab, we examine how three independent variables (education, age, and class) are associated with attitudes toward single parenthood. If you look at these independent variables, you will notice that they have many categories. Education (degree) is a categorical variable with seven categories, but we do not need that many to examine the association. Age (age) is continuous and class (topbot) is measured on a 10-point scale, so categorising these two variables is a must for creating crosstabs.

Recoding Variables

First, let’s recode age into a variable with three categories: “40 or less = 1”, “41 to 60 = 2” and “61 or more = 3”. The following code performs this task.

aus2012 <- rec(aus2012, age, rec = "min:40=1; 41:60=2; 61:max=3", append = TRUE)
aus2012$age_r <- set_label(aus2012$age_r, label = "Age Category")
aus2012$age_r <- set_labels(aus2012$age_r, 
                            labels = c("40 or less" = 1, "41 to 60" = 2, "61 or more" = 3))
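
For comparison, the same three age bands can be sketched with the base R function cut(). This is not the method used in this lab: cut() returns a factor rather than a labelled numeric variable, and the ages below are made up for illustration.

```r
# Made-up ages (illustration only), including one missing value
age <- c(18, 40, 41, 60, 61, 85, NA)

# Intervals are closed on the right by default:
# (-Inf, 40], (40, 60], (60, Inf)
age_cat <- cut(age,
               breaks = c(-Inf, 40, 60, Inf),
               labels = c("40 or less", "41 to 60", "61 or more"))

# Show each age next to its category, then the category counts
print(data.frame(age, age_cat))
print(table(age_cat, useNA = "ifany"))
```

Note that the boundary ages 40 and 60 fall into the lower band, and the missing age stays missing.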

Second, let’s make a new education variable that simplifies the categories of degree. “Did not complete High School to Year 10 (1)”, “Completed High School to Year 10 (2)” and “Completed High School to Year 12 (3)” are collapsed into “High School or less (1)”. “Trade qualification or apprenticeship (4)” and “Certificate or Diploma (5)” are collapsed into “Vocational Education & Training (2)”. “Bachelor Degree (6)” and “Postgraduate Degree or Postgraduate Diploma (7)” are collapsed into “University or more (3)”. The following code performs this task.

aus2012 <- rec(aus2012, degree, rec = "1:3=1; 4:5=2; 6:7=3", append = TRUE)
aus2012$degree_r <- set_label(aus2012$degree_r, label = "Education")
aus2012$degree_r <- set_labels(aus2012$degree_r, 
                               labels = c("High school or less" = 1, 
                                          "Vocational Education & Training" = 2, 
                                          "University or more" = 3))

Lastly, topbot, a social position variable measured on a 10-point scale, is recoded into a class variable consisting of lower, middle, and upper class. Values from 1 to 5 are collapsed into “lower class (1)”, 6 to 8 into “middle class (2)”, and 9 to 10 into “upper class (3)”. The following code performs this task.

aus2012 <- rec(aus2012, topbot, rec = "1:5=1; 6:8=2; 9:10=3", append = TRUE)
aus2012$topbot_r <- set_label(aus2012$topbot_r, label = "class")
aus2012$topbot_r <- set_labels(aus2012$topbot_r, 
                               labels = c("lower" = 1, "middle" = 2, "upper" = 3))
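
After recoding, it is worth checking that every original value landed in the intended category. One way, sketched here with made-up scores and a base R stand-in for the recode (findInterval() rather than rec()), is to cross-tabulate the original values against the recoded ones.

```r
# Made-up 1-10 self-placement scores (illustration only)
topbot <- c(1, 3, 5, 6, 7, 8, 9, 10, 5, 6)

# Stand-in for the recode above: 1-5 -> 1, 6-8 -> 2, 9-10 -> 3
topbot_r <- findInterval(topbot, c(1, 6, 9))

# Each original value (row) should appear in exactly one recoded column
check <- table(original = topbot, recoded = topbot_r, useNA = "ifany")
print(check)
```

With the lab data, the same kind of check would be table(aus2012$topbot, aus2012$topbot_r, useNA = "ifany").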

Crosstab and Chi-square Test

Now we are ready to examine the bivariate associations. The following code generates a crosstab of attitudes toward single parenthood (singlpar) and age (age_r).

sjt.xtab(aus2012$singlpar, aus2012$age_r, show.col.prc = TRUE)

Q5a Single parent can raise     Age Category
child as well                   40 or less      41 to 60        61 or more      Total
Strongly agree                  64 (17.5 %)     61 (9.9 %)      23 (4.3 %)      148 (9.7 %)
Agree                           152 (41.6 %)    205 (33.4 %)    187 (34.6 %)    544 (35.8 %)
Neither agree nor disagree      53 (14.5 %)     83 (13.5 %)     73 (13.5 %)     209 (13.8 %)
Disagree                        81 (22.2 %)     213 (34.7 %)    211 (39 %)      505 (33.2 %)
Strongly disagree               15 (4.1 %)      52 (8.5 %)      47 (8.7 %)      114 (7.5 %)
Total                           365 (100 %)     614 (100 %)     541 (100 %)     1520 (100 %)
χ² = 71.039 · df = 8 · Cramer’s V = 0.153 · p = 0.000

In the crosstab, you can see that younger people are more likely than older people to be in favour of single parenthood. For instance, 17.5% of those aged 40 or less strongly agree with the statement, compared with 4.3% of those aged 61 or more. The chi-square statistic is 71.039, and the p-value is reported as 0.000 (that is, less than .001), which is much less than .05. Thus, we conclude that age is significantly associated with attitudes toward single parenthood at alpha = .05.

The following code generates a crosstab of attitudes toward single parenthood (singlpar) and education (degree_r).
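
The degrees of freedom and Cramer's V reported under the table follow from the table's dimensions and the chi-square statistic: df = (rows - 1) x (columns - 1), and V = sqrt(chi-square / (N x min(rows - 1, columns - 1))). The sketch below applies these standard formulas to the counts in the age crosstab; cramers_v() is a hypothetical helper written for this illustration, not a function from sjPlot.

```r
# Cell counts copied from the age crosstab above
# (rows: attitude categories; columns: 40 or less, 41 to 60, 61 or more)
age_tab <- matrix(
  c( 64,  61,  23,
    152, 205, 187,
     53,  83,  73,
     81, 213, 211,
     15,  52,  47),
  ncol = 3, byrow = TRUE
)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df_age <- (nrow(age_tab) - 1) * (ncol(age_tab) - 1)

# Hypothetical helper: Cramer's V from a table of counts
cramers_v <- function(tab) {
  chi2 <- unname(chisq.test(tab)$statistic)  # Pearson chi-square
  n    <- sum(tab)                           # total number of cases
  k    <- min(nrow(tab), ncol(tab)) - 1      # smaller dimension minus 1
  sqrt(chi2 / (n * k))
}

print(df_age)
print(round(cramers_v(age_tab), 3))
```

Cramer's V ranges from 0 to 1, so it gives a sense of the strength of an association in addition to the significance test.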

sjt.xtab(aus2012$singlpar, aus2012$degree_r, show.col.prc = TRUE)

Q5a Single parent can raise     Education
child as well                   High school     Vocational Education    University
                                or less         & Training              or more         Total
Strongly agree                  35 (7.9 %)      58 (10.7 %)             56 (11.4 %)     149 (10.1 %)
Agree                           164 (36.9 %)    179 (32.9 %)            172 (35 %)      515 (34.8 %)
Neither agree nor disagree      63 (14.2 %)     84 (15.4 %)             60 (12.2 %)     207 (14 %)
Disagree                        149 (33.6 %)    184 (33.8 %)            164 (33.3 %)    497 (33.6 %)
Strongly disagree               33 (7.4 %)      39 (7.2 %)              40 (8.1 %)      112 (7.6 %)
Total                           444 (100 %)     544 (100 %)             492 (100 %)     1480 (100 %)
χ² = 6.602 · df = 8 · Cramer’s V = 0.047 · p = 0.580

The crosstab does not show a clear pattern of association between the two variables. The chi-square statistic is 6.602, and the p-value is 0.580, which is greater than .05. Thus, we fail to reject the null hypothesis of independence: there is no statistically significant association between education and attitudes toward single parenthood at alpha = .05.

The following code generates a crosstab of attitudes toward single parenthood (singlpar) and class (topbot_r).
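
One way to see why the chi-square statistic is so small here is to compare the observed counts with the counts expected if education and attitudes were unrelated. The sketch below uses base R and assumes the counts are copied correctly from the education crosstab; the object names are illustrative.

```r
# Cell counts copied from the education crosstab above
edu_tab <- matrix(
  c( 35,  58,  56,
    164, 179, 172,
     63,  84,  60,
    149, 184, 164,
     33,  39,  40),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    attitude  = c("Strongly agree", "Agree", "Neither agree nor disagree",
                  "Disagree", "Strongly disagree"),
    education = c("High school or less", "Vocational Education & Training",
                  "University or more")
  )
)

edu_test <- chisq.test(edu_tab)

# Expected counts under independence: row total * column total / N
print(round(edu_test$expected, 1))

# Pearson residuals: (observed - expected) / sqrt(expected)
print(round(edu_test$residuals, 2))
```

Every observed count sits close to its expected count, so no cell contributes much to the statistic.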

sjt.xtab(aus2012$singlpar, aus2012$topbot_r, show.col.prc = TRUE)

Q5a Single parent can raise     class
child as well                   lower           middle          upper           Total
Strongly agree                  45 (11.1 %)     79 (8.7 %)      13 (13.3 %)     137 (9.7 %)
Agree                           151 (37.3 %)    329 (36.4 %)    28 (28.6 %)     508 (36.1 %)
Neither agree nor disagree      53 (13.1 %)     123 (13.6 %)    14 (14.3 %)     190 (13.5 %)
Disagree                        121 (29.9 %)    319 (35.3 %)    31 (31.6 %)     471 (33.5 %)
Strongly disagree               35 (8.6 %)      54 (6 %)        12 (12.2 %)     101 (7.2 %)
Total                           405 (100 %)     904 (100 %)     98 (100 %)      1407 (100 %)
χ² = 13.879 · df = 8 · Cramer’s V = 0.070 · p = 0.085

Again, the crosstab does not show a clear pattern of association between the two variables. The chi-square statistic is 13.879, and the p-value is 0.085, which is greater than .05. Thus, we fail to reject the null hypothesis of independence: there is no statistically significant association between class and attitudes toward single parenthood at alpha = .05.
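
The p-value is the upper-tail probability of the chi-square distribution at the observed statistic. As a small sketch, it can be recovered in base R from the statistic and degrees of freedom reported under the class crosstab; the variable names are illustrative.

```r
# Values reported beneath the class crosstab
chi2     <- 13.879
df_class <- 8

# p-value: probability of a statistic at least this large under independence
p_value <- pchisq(chi2, df = df_class, lower.tail = FALSE)
print(round(p_value, 3))

# Critical value at alpha = .05: reject independence only if chi2 exceeds it
crit <- qchisq(0.95, df = df_class)
print(round(crit, 3))
print(chi2 > crit)
```

Comparing the statistic with the critical value and comparing the p-value with alpha are two ways of making the same decision.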

Lab 9 Participation Activity

No Lab Participation Activity this week. Completing R Analysis Task 3 will contribute to your participation mark.


The R code you have written so far should look like this:

################################################################################
# Lab 9: Crosstab and Chi-square Test
# 17/05/2021
# SOCI8015 & SOCX8015
################################################################################

# Load packages
library(sjlabelled)
library(sjmisc)
library(sjPlot)

# Import the 2012 AuSSA dataset
aus2012 <- readRDS("aussa2012.rds")

# A Simple Example
sjt.xtab(aus2012$singlpar, aus2012$sex, show.col.prc = TRUE)

# More Complex Examples
# Recode independent variables
## Age
aus2012 <- rec(aus2012, age, rec = "min:40=1; 41:60=2; 61:max=3", append = TRUE)
aus2012$age_r <- set_label(aus2012$age_r, label = "Age Category")
aus2012$age_r <- set_labels(aus2012$age_r, 
                            labels = c("40 or less" = 1, "41 to 60" = 2, "61 or more" = 3))

## Education
aus2012 <- rec(aus2012, degree, rec = "1:3=1; 4:5=2; 6:7=3", append = TRUE)
aus2012$degree_r <- set_label(aus2012$degree_r, label = "Education")
aus2012$degree_r <- set_labels(aus2012$degree_r, 
                               labels = c("High school or less" = 1, 
                                          "Vocational Education & Training" = 2, 
                                          "University or more" = 3))

## Social Class
aus2012 <- rec(aus2012, topbot, rec = "1:5=1; 6:8=2; 9:10=3", append = TRUE)
aus2012$topbot_r <- set_label(aus2012$topbot_r, label = "class")
aus2012$topbot_r <- set_labels(aus2012$topbot_r, 
                               labels = c("lower" = 1, "middle" = 2, "upper" = 3))

# Crosstab & Chi-square test
sjt.xtab(aus2012$singlpar, aus2012$age_r, show.col.prc = TRUE)
sjt.xtab(aus2012$singlpar, aus2012$degree_r, show.col.prc = TRUE)
sjt.xtab(aus2012$singlpar, aus2012$topbot_r, show.col.prc = TRUE)

Last updated on 16 May, 2021 by Dr Hang Young Lee (hangyoung.lee@mq.edu.au)