SOCI832: Overview: Week 5

Reminder: Week 7 Presentation + Report: What to submit? What are the marking criteria?

From Overview: Lessons

Week 7 Presentation and report (0% - Practice)

Instructions:

  1. Replicate a published study with a public dataset: Students are to:
  2. Find an article: Find a social science (or closely related discipline) study that has been published as a peer-reviewed academic article and that uses a publicly accessible dataset, and
  3. Replicate in R: Replicate the analysis presented in the article using R.
  4. R code should not already exist: The article and dataset should NOT already have publicly available R code (this would make the exercise pointless).
  5. By Week 7: By Week 7, students should have identified the article, downloaded the dataset, and conducted a preliminary analysis (i.e. univariate and bivariate analysis).
  6. Presentation: In class in Week 7, students will present for a maximum of 12 minutes, providing:
  • a brief introduction to the article and the dataset
  • their preliminary analysis, including tables and figures.
  7. Report: In class in Week 7, students will submit their written report (printed out, and also submitted through iLearn), which shall consist of:
  • A copy of the article to be replicated
  • A link to the dataset
  • A copy of their R code (script file), with brief annotations explaining what they have done
  • A short report of approximately 600 to 1,000 words, with no more than five tables and figures, presenting a preliminary analysis of the dataset.
  8. Consultation: Students are expected to consult with the lecturer (Nick) before class (4pm to 6pm), in class (6pm to 9pm), and outside class (Facebook Messenger, WhatsApp) to (1) confirm their choice of article and (2) discuss any issues or problems they are having with the analysis.

Marking criteria:

  1. Motivates the interest of the audience: The presentation and report should motivate the interest of the audience by identifying both what is intellectually curious about the topic and why it is substantively important for the public, policy makers, or other non-academic audiences.
  2. Clear writing style: Straightforward, clear, easy-to-read writing. This generally means using short sentences, having a single coherent and easy-to-understand argument, and using paragraphs with topic sentences.
  3. Professional Tables and Figures: Tables and figures should be presented as they would appear in an academic article. At the least, this means (1) that tables are not just cut and pasted from R output, (2) that tables and figures include only the necessary information, (3) that tables and figures include all appropriate information, (4) that they can be interpreted on their own (without the text), and (5) that all tables and figures are referred to by number in the text.
  4. Analysis: The analysis should be a high-quality replication of the analysis in the original academic article. It may briefly extend the analysis in the article, if space and time permit.
  5. Explanation: The explanation of the analysis should be simple and clear, and should use terminology correctly. It should also point out the substantive significance of the results in a way that a person with a Master's degree, but not in this area of research, could understand.
  6. R code: R code should be as simple and tidy as possible, with brief comments (after # symbols) that explain the purpose of each section of code. The R code should be in a form that the lecturer can run on their own computer to replicate the analysis.
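The criteria above ask for R code that someone else can run, with comments for each section, and for tables that are not raw R output. The sketch below shows one possible layout for a replication script. It uses R's built-in `mtcars` data as a stand-in for a project dataset; the section names, the chosen variables, and the `table1.csv` file name are illustrative assumptions, not a required template.

```r
# ---- 0. About this script ---------------------------------------------
# Replication of: <article citation>   (placeholder: fill in)
# Dataset:        <dataset link>       (placeholder: fill in)
# Suggestion: load data from a URL or a relative path rather than a personal
# setwd() path, so the script also runs on another person's computer.

# ---- 1. Load data -----------------------------------------------------
# mtcars is built into R; it stands in for the project dataset here
dat <- mtcars

# ---- 2. Univariate analysis -------------------------------------------
summary(dat$mpg)  # distribution of a continuous variable (miles per gallon)
table(dat$am)     # frequencies of a categorical variable (0 = automatic, 1 = manual)

# ---- 3. Bivariate analysis --------------------------------------------
tapply(dat$mpg, dat$am, mean)  # mean mpg for each transmission type
cor(dat$mpg, dat$wt)           # correlation between mpg and weight

# ---- 4. Table for the report ------------------------------------------
# Build a small table with readable labels and rounded values,
# instead of pasting raw R output into the report
table1 <- data.frame(
  Transmission = c("Automatic", "Manual"),
  N            = as.vector(table(dat$am)),
  `Mean mpg`   = round(as.vector(tapply(dat$mpg, dat$am, mean)), 1),
  check.names  = FALSE
)
table1

# Save as CSV so it can be formatted in Word or Excel
# (tempdir() is used here; change the path to your own project folder)
write.csv(table1, file.path(tempdir(), "table1.csv"), row.names = FALSE)
```

Keeping each numbered section short and labelled makes it easier for a marker to match the script to the tables and figures in the report.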


Tip: Recoding Dummy Variables

# Install packages (only those that are not already installed)
if(!require(dplyr)) {install.packages("dplyr", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjlabelled)) {install.packages("sjlabelled", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjmisc)) {install.packages("sjmisc", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjstats)) {install.packages("sjstats", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjPlot)) {install.packages("sjPlot", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(summarytools)) {install.packages("summarytools", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(ggplot2)) {install.packages("ggplot2", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(ggthemes)) {install.packages("ggthemes", repos='https://cran.csiro.au/', dependencies=TRUE)}


# Load packages into memory
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(sjstats)
library(sjPlot)
library(summarytools)
library(ggplot2)
library(ggthemes)

# Print 5 significant digits and strongly prefer fixed over scientific notation
options(digits=5, scipen=15)

# Stop View() from overloading memory with large datasets:
# data frames are opened showing only their first 500 rows
RStudioView <- View
View <- function(x) {
  if (inherits(x, "data.frame")) { RStudioView(head(x, 500)) } else { RStudioView(x) }
}
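# Load the elect_2013 dataset from methods101.com
elect_2013 <- read.csv(url("https://methods101.com/data/elect_2013.csv"))
# Frequency table of country of birth (numeric codes 1 to 12; NA = missing)
frq(elect_2013, country_birth)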
## 
## country_birth <integer>
## # total N=3955  valid N=3795  mean=2.65  sd=3.52
## 
##   val  frq raw.prc valid.prc cum.prc
##     1 2819   71.28     74.28   74.28
##     2   70    1.77      1.84   76.13
##     3  307    7.76      8.09   84.22
##     4   10    0.25      0.26   84.48
##     5   51    1.29      1.34   85.82
##     6   27    0.68      0.71   86.53
##     7   17    0.43      0.45   86.98
##     8   39    0.99      1.03   88.01
##     9   26    0.66      0.69   88.70
##    10   28    0.71      0.74   89.43
##    11   26    0.66      0.69   90.12
##    12  375    9.48      9.88  100.00
##  <NA>  160    4.05        NA      NA
# Copy country_birth into one new column per dummy variable
elect_2013$d_aust <- elect_2013$country_birth
elect_2013$d_nz <- elect_2013$country_birth
elect_2013$d_uk <- elect_2013$country_birth
elect_2013$d_ireland <- elect_2013$country_birth

# Recode each copy: that country's code becomes 1, every other code 0, and NA stays NA
# (with append = TRUE and suffix = "", the recoded values overwrite the copies)
elect_2013 <- rec(elect_2013, d_aust, rec = "1=1; NA=NA; else=0", append = TRUE, suffix = "")
elect_2013 <- rec(elect_2013, d_nz, rec = "2=1; NA=NA; else=0", append = TRUE, suffix = "")
elect_2013 <- rec(elect_2013, d_uk, rec = "3=1; NA=NA; else=0", append = TRUE, suffix = "")
elect_2013 <- rec(elect_2013, d_ireland, rec = "4=1; NA=NA; else=0", append = TRUE, suffix = "")

frq(elect_2013, d_aust)
## 
## d_aust <numeric>
## # total N=3955  valid N=3795  mean=0.74  sd=0.44
## 
##   val  frq raw.prc valid.prc cum.prc
##     0  976   24.68     25.72   25.72
##     1 2819   71.28     74.28  100.00
##  <NA>  160    4.05        NA      NA
frq(elect_2013, d_ireland)
## 
## d_ireland <numeric>
## # total N=3955  valid N=3795  mean=0.00  sd=0.05
## 
##   val  frq raw.prc valid.prc cum.prc
##     0 3785   95.70     99.74   99.74
##     1   10    0.25      0.26  100.00
##  <NA>  160    4.05        NA      NA
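The same dummy variables can also be built in base R, without `sjmisc`. The sketch below is an alternative technique, not the method used above. It runs on a small made-up vector standing in for `country_birth`, and relies on the fact that `==` returns `NA` when a value is missing, so `as.integer()` gives 1, 0, or `NA`.

```r
# Made-up values standing in for elect_2013$country_birth (hypothetical data)
country_birth <- c(1, 2, 3, 4, 12, NA, 1)

# country_birth == 1 is TRUE, FALSE, or NA (when country_birth is missing);
# as.integer() converts TRUE/FALSE to 1/0 and keeps NA as NA
d_aust    <- as.integer(country_birth == 1)
d_nz      <- as.integer(country_birth == 2)
d_uk      <- as.integer(country_birth == 3)
d_ireland <- as.integer(country_birth == 4)

d_aust  # 1 0 0 0 0 NA 1

# Cross-check the dummy against the original codes, showing missing values
table(country_birth, d_aust, useNA = "ifany")
```

With the real data, the equivalent line would be `elect_2013$d_aust <- as.integer(elect_2013$country_birth == 1)`.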


Week 5: Bivariate Analysis

Learning Objectives

By the end of this class, students should be able to (1) define, (2) know when to use, (3) interpret R output for, and (4), with the assistance of methods101.com and Google, run the R commands for the following types of statistical analysis:

  • cross tabulations
  • comparison of means
  • correlation coefficient
  • visualisation of a matrix of correlation coefficients
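As a rough illustration of the first two types of analysis, the sketch below uses base R functions on the built-in `mtcars` data; the dataset and the choice of variables are assumptions standing in for a project dataset. `table()` and `prop.table()` give a cross tabulation with column percentages, and `aggregate()` and `t.test()` compare the mean of a continuous variable across two groups.

```r
data(mtcars)  # built-in data: 32 cars

# Cross tabulation: number of cylinders by transmission (am: 0 = automatic, 1 = manual)
xtab <- table(cylinders = mtcars$cyl, manual = mtcars$am)
xtab

# Column percentages: within each transmission type, the % with 4, 6, or 8 cylinders
round(100 * prop.table(xtab, margin = 2), 1)

# Chi-squared test of independence
# (some expected counts are below 5 here, so R warns the result may be inaccurate)
chisq.test(xtab)

# Comparison of means: average mpg for automatic vs manual cars
mpg_by_am <- aggregate(mpg ~ am, data = mtcars, FUN = mean)
mpg_by_am

# Welch two-sample t-test of the difference in mean mpg
t.test(mpg ~ am, data = mtcars)
```

Column percentages (`margin = 2`) answer "of the manual cars, what share have 4 cylinders?"; use `margin = 1` for row percentages if the row variable is the explanatory one.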

Lecture

The lecture is broken into four parts:

  1. cross tabs
  2. comparison of means
  3. simple correlation
  4. correlation tables and plots
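For simple correlation, base R's `cor()` returns Pearson's r, and `cor.test()` adds a confidence interval and p-value. A minimal sketch, again assuming the built-in `mtcars` data as a stand-in: car weight and fuel economy are strongly negatively correlated.

```r
data(mtcars)

# Pearson correlation between car weight (wt) and fuel economy (mpg)
r <- cor(mtcars$wt, mtcars$mpg)
round(r, 2)  # -0.87

# Test whether the correlation differs from zero; also reports a 95% CI
cor.test(mtcars$wt, mtcars$mpg)

# Spearman's rank correlation: an option for ordinal or skewed variables
cor(mtcars$wt, mtcars$mpg, method = "spearman")

# With missing values, cor() returns NA unless told which cases to use
cor(c(1, 2, 3, NA), c(2, 4, 7, 8), use = "complete.obs")
```

Survey data usually contain missing values, so the `use` argument matters in practice: `"complete.obs"` drops any case with a missing value before computing r.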

Exercise

Using the dataset for your project, complete the following tasks.

Please post screenshots of your results (e.g. figures or tables) to the class Google Doc, together with two or three sentences explaining what is important, surprising, or interesting about them.

  • Task 1: Generate a table and a plot of the correlation matrix for at least five variables. Identify one or more important, surprising, or interesting aspects of these results.
  • Task 2: Experiment with changing the graphical settings to make your plot better for communicating with the reader.
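One possible base R approach to both tasks is sketched below: `cor()` on several columns gives the correlation matrix (the table), and `image()` draws it as a colour-coded grid (the plot). The `mtcars` variables and the blue-white-red palette are assumptions for illustration; swap in at least five variables from your own dataset. Dedicated packages such as `corrplot` (via its `corrplot()` function) produce similar plots with less code.

```r
data(mtcars)

# Task 1 (table): correlation matrix of five variables, rounded for reading
vars <- c("mpg", "wt", "hp", "disp", "qsec")
cmat <- cor(mtcars[, vars], use = "pairwise.complete.obs")  # pairwise handles NAs
round(cmat, 2)

# Task 1 (plot): colour each cell by r on a fixed -1 to 1 scale
pal <- colorRampPalette(c("steelblue", "white", "firebrick"))(21)  # blue = negative, red = positive
k <- length(vars)
old_par <- par(mar = c(5, 5, 3, 2))  # Task 2: margins are one setting to experiment with

# image() puts z[i, j] at (x = i, y = j); reversing the columns puts vars[1] on the top row
image(1:k, 1:k, cmat[, k:1], col = pal, zlim = c(-1, 1),
      axes = FALSE, xlab = "", ylab = "", main = "Correlation matrix (mtcars)")
axis(1, at = 1:k, labels = vars, las = 2)       # x axis: variables left to right
axis(2, at = 1:k, labels = rev(vars), las = 1)  # y axis: variables top to bottom

# Print each coefficient in its cell (Task 2: try cex = 0.8 or 1.2 for text size)
text(rep(1:k, times = k), rep(1:k, each = k),
     labels = sprintf("%.2f", cmat[, k:1]))
par(old_par)  # restore the previous graphical settings
```

Fixing `zlim = c(-1, 1)` keeps white at r = 0 whatever the data, so colours mean the same thing across plots; the palette, `las`, `mar`, and `cex` are natural settings to vary for Task 2.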


Last updated on 26 August, 2019 by Dr Nicholas Harrigan (nicholas.harrigan@mq.edu.au)