# SSCI2020 Workshop 8: Confidence Intervals and Hypothesis Testing

In the workshop 8, we continue using the 2009 Australian Survey of Social Attitudes(2009 AuSSA). This workshop introduces how to estimate confidence intervals and how to conduct a t-test of sample means.

# Compute Confidence Intervals

## Confidence Intervals for Continuous Variables

Suppose that we want to estimate the 95% confidence intervals for Australians’ mean age using the 2009 AuSSA. To create a confidence interval, 1) go to Analyze > Descriptive Statistics > Explore. In the box of Explore, 2) select age (the variable should be a continuous variable) and move it to the box of Dependent List. 3) Click Statistics (See <Figure 1>). Figure 1: <Figure 1>

Then, in the box of Explore: Statistics, 4) tick Descriptives and 5) type a value of confidence level (in this case, 95) in the box of Confidence Interval for Mean. 6) Click Continue (See <Figure 2>). Figure 2: <Figure 2>

Now, you are back to the previous box. Click OK at the bottom. It will show the 95% confidence interval of Australians’ mean age (See <Figure 3>). Figure 3: <Figure 3>

## Confidence Intervals by Groups

Suppose that we want to estimate the 95% confidence intervals of Australians’ mean age for men and women, respectively. Go to Analyze > Descriptive Statistics > Explore. In the box of Explore, put age in the Dependent List and sex in the Factor List (see <Figure 4>). Note that the variable in the section of Dependent List should be a continuous variable, and the variable in the section of Factor List should be a nominal variable (Check the icon of variables, which shows the level of measurement). Figure 4: <Figure 4>

Click Statistics. Do the same thing as you did in the previous section. It will show the 95% confidence intervals of mean age for Australian men and women, respectively (see <Figure 5>). Figure 5: <Figure 5>

## Confidence Intervals for Proportions

When you want to compute the confidence interval for a categorical (nominal or ordinal) variable, you first need to dichotomise the variable so that you can compute a proportion of a category. Suppose that we are estimating the 95% confidence interval for the proportion of people who think that it is important to come from a wealthy family (opwlth).

First, we need to dichotomise opwlth using Recode into Different Variables. <Table 1> shows the recoding scheme of the new dichotomised variable (newopwlth) Make this new variable (newopwlth) using opwlth.

Table 1: Recoding scheme for new opwlth variable
Old variable(opwlth)
New variable(newopwlth)
Values Labels Values Labels
System- or user-missing System- or user-missing System-missing System-missing
1 Essential 1 Important
2 Very important
3 Fairly important
4 Not very important 0 Not important
5 Not important at all

Then, follow the same procedure as you did for getting the confidence interval for the mean age of Australians. <Figure 6> would be helpful. Figure 6: <Figure 6>

Then, you will see the ouput as in <Figure 7>. Figure 7: <Figure 7>

Now, we get the 95% confidence interval for the proportion of people who think that it is important to come from a wealthy family. But what if we want to get this proportion for men and women, respectively?

Put a variable of sex in the box of Factor List in <Figure 6>. Then, you will see <Figure 8>, which shows the 95% confidence intervals for the proportion for men and women, respectively. Figure 8: <Figure 8>

## Visualising Confidence Intervals

Now, we are going to visualise the confidence intervals for the proportion of people who think that it is important to come from a wealthy family by gender. 1) Go to Graph > Legacy Dialogs > Error Bar. In the box of Error Bar, 2) select Simple and Summaries for groups of cases. 3) Click Define (see <Figure 9>). Figure 9: <Figure 9>

In the box of Define Simple Error Bar: Summaries for Groups of Cases , 4) move newopwlth to the section of Variable. 5) move sex in the section of Category Axis and then 6) Click OK (see <Figure 10>). Figure 10: <Figure 10>

It will produce <Figure 11>. After looking at <Figure 11>, adjudicate whether men and women assess the importance of wealthy family backgrounds differently. If you are not sure how to do this, please see the week 8 lecture slides. Figure 11: <Figure 11>

# Hypothesis Testing

## One-Sample t-Test

The standard working hours per week in Australia is currently 38. Let’s test whether Australians work, on average, 38 hours per week or not, using a variable, wrkhrs. We will test whether the average working hours of Australians is significantly different from 38 hours. Thus, our research hypothesis is that Australians work, on average, either more than or less than 38 hours per week. The null hypothesis is that Australians work, on average, 38 hours per week.

To conduct one-sample t-test, 1) go to Analyze > Compare Mean > One-Sample T Test. In the box of One-Sample T Test, 2) move a variable of your interest (in this case, wrkhrs) and 3) specify a test value which the null hypothesis assumes (in this case 38) (see <Figure 12>). Figure 12: <Figure 12>

Then, 4) click OK. You will see <Figure 13>, which provides the sample mean, the difference from the value that the null hypothesis assumes, t-statistic and its associated p-value. Based on <Figure 13>, do you think Australians work, on average, 38 hours per week? If you are not sure, see the week 9 lecture slides. Figure 13: <Figure 13>

NOTE: The p-value in SPSS is based on a two-tailed t-test. If you want to get p-values for a one-tailed t-test, divide the p-value in SPSS by two. For example, if the p-value in SPSS outputs is .025, the p-value for a one-tailed test will be .0125

## Two-Sample t-Test

Suppose that we would like to test whether the average working hours are different between men and women. Thus, the null hypothesis is that there is no gender difference in the average working hours. We will use a two-sample t-test because we compare the average working hours between two different groups (men and women).

To conduct two-sample t-test, 1) go to Analyze> Compare Means > Independent-Samples T Test. 2) Move a variable of your interest (in this case wrkhrs) to the box of Test Variable(s) and 3) a variable of Groups (in this case sex) to the box of Grouping Variable. 4) Click Define Groups (see <Figure 14>). Figure 14: <Figure 14>

In the box of Define Groups, 5) select Use specified values. And 6) input 1 (male) for Group 1 and 2 (female) for Group 2. 7) Click Continue. In the previous box, 8) click OK (see <Figure 15>). Figure 15: <Figure 15>

Then, you will see <Figure 16>. Figure 16: <Figure 16>

Group Statistics in <Figure 16> show the average working hours for males and females, respectively. Independent Samples Test in <Figure 16> examines whether the difference (9.1375 = 45.014 – 35.876) is statistically significant.

First, look at the column of “Levene’s Test for Equality of Variance”. This is a test for equal variances. The null hypothesis is that the two distributions (for males and females) have equal variances. Let’s set the significance level .05. <Figure 16> shows that p-value (denoted as Sig.) for Levene’s Test is .259, which is much greater than the significance level (.05). Thus, we cannot reject the null hypothesis that the two distributions have the same variance. Therefore, we conclude that the two distributions have equal variances.

For this reason, we will use the t-test for “Equal variances are assumed” (The first row). T statistic is 9.611, and its p-value is .000. And this p-value is computed based on two-tailed t-test. Again, let’s set the significance level (α) at .05. The p-value is much less than α, and thus we reject the null hypothesis that there is no gender difference in working hours per week. Therefore, we conclude that there is a significant difference in working hours per week between men and women.

## t-Test of Proportion Differences

Suppose that we would like to test whether there is a gender (sex) difference in the proportion of people who think that it is important to come from a wealthy family(newopwlth). Repeat the same procedure as you did in the previous section, but with newopwlth in the box of Test Variable(s) and sex in the box of Grouping Variable (See <Figure 17>). Figure 17: <Figure 17>

Then, you will see <Figure 18>. Figure 18: <Figure 18>

Levene’s Test in <Figure 18> shows the p-value is less than the alpha (α=.05). Therefore, the two distributions have a different variance, which enforces you to use “Equal variances not assumed”. The t-statistic is 2.951, and its p-value (.003) is smaller than .05. So, we can reject the null hypothesis that there is no difference in the proportion of people who think that it is important to come from a wealthy family between men and women at α = .05 and even α = 01.

# Workshop Activity 8: Confidence Intervals and Hypothesis Testing

1. Public polls often report the percentage of people who support a political party. We are going to do a similar task using a variable, prtyaff. This variable asks which political party, respondents think, is affiliated with (See <Table 2> for more details). Using prtyaff, complete the following tasks.

A. Compute the 90% confidence interval of the percentage of people who identify themselves as Liberal party supporters. Make sure that you need to make a new variable (named “liberal”) which differentiates Liberal party supporters (recoded into 1) from all the other party supporters (recoded into 0). Please see <Figure 19> for recoding.

B. Compute the 90% confidence interval of the percentage of people who identify themselves as Labour party supporters. Make sure that you need to make a new variable (named “labour”) which differentiates Labour party supporters (recoded into 1) from all the other party supporters (recoded into 0).

C. Based on the two 90% confidence intervals you computed, which party has a higher percentage of supporters? Also, do you think there is a significant difference in terms of the percentage of supporters between Liberal and Labour party?

Table 2: Values and Labels for prtyaff
Values Labels
1 Liberal
2 Labour
3 National
4 Australian Democrats
5 Greens
6 One Nation
7 Family First
95 Other parties
96 Would not vote; No party preference Figure 19: <Figure 19>

1. Compute the 95% confidence intervals of years of schooling (educyrs) for males and females, respectively. Based on the outcome, do you think there is a significant difference in the years of schooling between men and women?

1. Investigate whether people have, on average, more or less than 12 years of schooling. Conduct a one-sample t-test using educyrs. Set the significance level at α = .05. What is your conclusion? Do people have 12 years of schooling, or more or less?

1. Investigate whether men and women view the importance of race (oprace) in getting ahead in society differently. You need to recode oprace in the same way as you did for opwlth (see <Table 1>). For this question, you are required to conduct a two-sample t-test. Based on the SPSS output, what is your conclusion? Set the significance level at α = .05 for your conclusion.

Last updated on 01 October, 2020 by Dr Hang Young Lee(hangyoung.lee@mq.edu.au)