SOCI832: Overview: Week 6

Week 6: Scales & Indices, and Dimension Reduction

Reading: Chapter 17, Exploratory Factor Analysis, in Field, Miles, and Field (2012), Discovering Statistics Using R.

Learning Objectives

By the end of this class, students should be able to (1) define, (2) know when to use, (3) interpret R output for, and (4) run the R commands for (with the assistance of methods101.com and Google) the following types of statistical analysis:

  • factor analysis
  • Cronbach’s alpha

Lecture

The lecture is broken into four parts:

  • 6.1: Factor analysis
  • 6.2: Example
  • 6.3: Cronbach’s alpha
  • 6.4: Tables and Graphs

Exercise

Using the dataset for your project, complete the following tasks.

For each task, please post screenshots of your results (e.g. figures or tables) to the Google Doc, along with two or three sentences explaining what is important, surprising, or interesting about the results.

  • Task 1: Run a factor analysis of a set of variables in your dataset. What does this tell you about the underlying structure of these variables?
  • Task 2: Construct a scale or index, and test its reliability by calculating Cronbach’s alpha for the scale/index. What do the results tell you about the scale, and the items in the scale? Does the test suggest that the scale is reliable? Could it be improved? How do you know?
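
A minimal sketch of how the two tasks might look in R, using simulated data rather than a real project dataset. The item names (q1 to q6), the two-factor structure, and the settings fm = "minres" and rotate = "varimax" are illustrative assumptions, not prescriptions; substitute your own variables and choices.

```r
library(psych)

set.seed(832)
n <- 300

# Simulate two latent traits and six observed items (hypothetical names):
# q1-q3 are driven by the first trait, q4-q6 by the second
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  q1 = f1 + rnorm(n, sd = 0.6),
  q2 = f1 + rnorm(n, sd = 0.6),
  q3 = f1 + rnorm(n, sd = 0.6),
  q4 = f2 + rnorm(n, sd = 0.6),
  q5 = f2 + rnorm(n, sd = 0.6),
  q6 = f2 + rnorm(n, sd = 0.6)
)

# Task 1: exploratory factor analysis, extracting two factors
efa <- psych::fa(items, nfactors = 2, rotate = "varimax", fm = "minres")
print(efa$loadings, cutoff = 0.3)  # hide small loadings to make the pattern easier to read

# Task 2: Cronbach's alpha for a candidate scale made of q1-q3
scale_items <- items[, c("q1", "q2", "q3")]
rel <- psych::alpha(scale_items)
rel$total$raw_alpha  # overall alpha for the scale
rel$alpha.drop       # alpha if each item in turn were dropped

# The scale score itself can be built as the row mean of its items
items$scale1 <- rowMeans(scale_items)
```

Items that load strongly on the same factor are candidates for a single scale. In the alpha.drop table, an item whose removal would raise alpha is a candidate for dropping, which is one way to answer "could it be improved?".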

External students and those who miss class, please post your answers to the blog on iLearn. Please post your code and images within the blog, not as attachments. Please do attach your dataset/s so that we can all follow along.


Some handy code for cleaning data

Cheatsheet for Data Manipulation and Data Cleaning

Note useful functions like:

  • dplyr::filter() - select rows that meet a criterion
  • dplyr::distinct() - remove duplicate rows
  • dplyr::select() - select columns by name
  • dplyr::mutate() - make new variable
  • dplyr::left_join() - joins matching rows on specified column
  • dplyr::bind_rows() - binds rows to bottom of dataset
  • dplyr::bind_cols() - binds columns to right side of dataset
  • dplyr::group_by() - group rows that share the same values; useful for computing summary statistics, or fitting regressions, separately for each group
  • tidyr::drop_na() - drop cases that have missing values in one or more variables
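
As a rough illustration of how these functions chain together, here is a small sketch on made-up data. The data frames areas, crime, and extra, and all of their values, are hypothetical and exist only for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per area, with a duplicate row and a missing value
areas <- data.frame(
  area   = c("A", "B", "B", "C", "D"),
  region = c("north", "north", "north", "south", "south"),
  unemp  = c(4.0, 6.5, 6.5, NA, 8.0)
)
crime <- data.frame(
  area    = c("A", "B", "D"),
  robbery = c(10, 25, 40)
)

clean <- areas %>%
  dplyr::distinct() %>%                      # remove the duplicated "B" row
  tidyr::drop_na(unemp) %>%                  # drop "C", which has no unemp value
  dplyr::left_join(crime, by = "area") %>%   # add robbery by matching on area
  dplyr::mutate(high_unemp = unemp > 5) %>%  # make a new logical variable
  dplyr::select(area, region, unemp, robbery, high_unemp)

# Keep only rows that meet a criterion
high <- clean %>% dplyr::filter(high_unemp)

# Summary statistics by group
by_region <- clean %>%
  dplyr::group_by(region) %>%
  dplyr::summarise(mean_robbery = mean(robbery), n = dplyr::n())

# Append a new row to the bottom of the dataset
extra <- data.frame(area = "E", region = "south", unemp = 3.2,
                    robbery = 5, high_unemp = FALSE)
combined <- dplyr::bind_rows(clean, extra)
```

Each step takes a data frame and returns a data frame, which is why the steps can be piped together with %>%.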

# Load packages into memory
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(sjstats)
library(sjPlot)
library(summarytools)
library(ggplot2)
library(ggthemes)
library(GPArotation) 
library(psych)
library(ggrepel)

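# Read the cleaned NSW LGA crime dataset from methods101.com
lga <- readRDS(url("https://methods101.com/data/nsw-lga-crime-clean.RDS"))

# Mean of unemploy across all LGAs, used below to split the sample
mean_unemp <- mean(lga$unemploy, na.rm = TRUE)

# Regress robbery on giniinc and unemploy, using all complete cases
lga %>%
  dplyr::select(giniinc, unemploy, robbery) %>%
  tidyr::drop_na() %>%
  stats::lm(robbery ~ giniinc + unemploy, data = .) %>%
  base::summary()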
## 
## Call:
## stats::lm(formula = robbery ~ giniinc + unemploy, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -29.369 -14.717  -5.500   8.512 185.460 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -30.468     30.242  -1.007  0.31654   
## giniinc       60.649     53.466   1.134  0.25980   
## unemploy       4.994      1.838   2.716  0.00798 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.5 on 86 degrees of freedom
## Multiple R-squared:  0.08177,    Adjusted R-squared:  0.06042 
## F-statistic: 3.829 on 2 and 86 DF,  p-value: 0.02552

# Regress sexoff on giniinc and unemploy, only for LGAs with above-mean unemploy
lga %>%
  dplyr::select(giniinc, unemploy, sexoff) %>%
  tidyr::drop_na() %>%
  dplyr::filter(unemploy > mean_unemp) %>%
  stats::lm(sexoff ~ giniinc + unemploy, data = .) %>%
  base::summary()
## 
## Call:
## stats::lm(formula = sexoff ~ giniinc + unemploy, data = .)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -177.661  -58.942   -4.406   63.487  253.247 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -129.59     169.50  -0.765  0.44808   
## giniinc       860.64     300.39   2.865  0.00604 **
## unemploy       -6.05      11.00  -0.550  0.58466   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 95.01 on 51 degrees of freedom
## Multiple R-squared:  0.1479, Adjusted R-squared:  0.1144 
## F-statistic: 4.425 on 2 and 51 DF,  p-value: 0.01691

# Regress robbery on giniinc and unemploy, only for LGAs with below-mean unemploy
lga %>%
  dplyr::select(giniinc, unemploy, robbery) %>%
  tidyr::drop_na() %>%
  dplyr::filter(unemploy < mean_unemp) %>%
  stats::lm(robbery ~ giniinc + unemploy, data = .) %>%
  base::summary()
## 
## Call:
## stats::lm(formula = robbery ~ giniinc + unemploy, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -22.186  -8.022  -2.730   6.177  32.350 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    6.763     24.842   0.272    0.787  
## giniinc      -43.777     31.227  -1.402    0.170  
## unemploy       7.479      3.150   2.374    0.023 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.93 on 36 degrees of freedom
## Multiple R-squared:  0.2645, Adjusted R-squared:  0.2237 
## F-statistic: 6.474 on 2 and 36 DF,  p-value: 0.003964
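
The subset regressions above repeat the same pipeline with a different filter each time. One possible alternative, sketched here on simulated data, is to create a grouping variable and fit the model once per group. The data frame sim and all of its values are made up as a stand-in for lga, and this sketch uses base R's split() and lapply() rather than anything shown in the original code.

```r
library(dplyr)

set.seed(1)

# Simulated stand-in for an area-level dataset (all values are made up)
sim <- data.frame(
  giniinc  = runif(100, 0.3, 0.6),
  unemploy = runif(100, 2, 12)
)
sim$robbery <- 5 + 3 * sim$unemploy + rnorm(100, sd = 5)

# Flag each row as above or below the mean of unemploy
sim <- sim %>%
  dplyr::mutate(
    unemp_group = ifelse(unemploy > mean(unemploy), "above mean", "below mean")
  )

# Split the data by group and fit the same model to each piece
models <- sim %>%
  split(.$unemp_group) %>%
  lapply(function(d) stats::lm(robbery ~ giniinc + unemploy, data = d))

# One coefficient table per group
coef_tables <- lapply(models, function(m) summary(m)$coefficients)
coef_tables
```

The result is a named list with one fitted model per group, so the two sets of estimates can be compared side by side.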

Last updated on 02 September, 2019 by Dr Nicholas Harrigan (nicholas.harrigan@mq.edu.au)