SOCI832: Lesson 4.4: Plan Your Data Cleaning

1. Run the standard set up code

# Install Packages
if(!require(dplyr)) {install.packages("dplyr", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjlabelled)) {install.packages("sjlabelled", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjmisc)) {install.packages("sjmisc", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjPlot)) {install.packages("sjPlot", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(summarytools)) {install.packages("summarytools", repos='https://cran.csiro.au/', dependencies=TRUE)}

# Load packages into memory
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(sjPlot)
library(summarytools)

# Turn off scientific notation
options(digits=5, scipen=15) 

# Stop View from overloading memory with large datasets
RStudioView <- View
View <- function(x) {
  # head() avoids the rows of NA that x[1:500,] adds when x has fewer than 500 rows
  if (is.data.frame(x)) { RStudioView(head(x, 500)) } else { RStudioView(x) }
}

2. Compare the coding of your key variables in (1) the article; (2) the official codebook; (3) your r_codebook

You now need to closely compare the article, the official codebook, and your own r_codebook to check that every variable is coded correctly and, where it is not, decide how you are going to recode it.

1. Political Knowledge

  1. Variable name in article: Political Knowledge
  2. Variable name in R dataset: f10, f11, f12, f13, f16p1, f16p2, f16p3, f16p4, f16p5, f16p6
  3. Issues/problems/changes:
    1. Recode don’t know -> 0 (treated as wrong)
    2. Make a variable for each item in which 1 = correct, 0 = wrong/don’t know
      • Federation = TRUE
      • Prop Rep Senate = TRUE
      • Const High Court = FALSE
      • Deposit to run = TRUE
      • Four years between elections = FALSE
      • 75 members of HoR = FALSE
      • Treasurer = Bowen
      • Unemployment = 5.7
      • Second in seats = ALP
      • UN Sec General = Ban Ki-moon
    3. Add the ten items together into one scale (0-10)
  4. Final variable name (label): pol_knowledge “Political Knowledge”
  5. Variable value labels? No

Variable: f10

  • 3 -> 1
  • 1, 2, 4, 5 -> 0
  • -1 -> NA

Variable: f11

  • 2 -> 1
  • 1, 3, 4, 5 -> 0
  • -1 -> NA

And so on for f12 to f16p6.
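
The recoding plan above could be sketched in R roughly as follows. This is a minimal sketch that assumes the codes listed for f10 and f11 (correct answer 3 and 2 respectively, -1 for missing). The toy data and the helper score_item() are hypothetical and stand in for the real dataset; only two of the ten items are shown.

```r
# Toy data: hypothetical responses, not real survey values
ds <- data.frame(
  f10 = c(3, 1, 5, -1, 3),
  f11 = c(2, 2, 4, 3, -1)
)

# Hypothetical helper: 1 = correct, 0 = wrong or don't know, NA = missing (-1)
score_item <- function(x, correct) {
  ifelse(x == -1, NA_real_, as.numeric(x == correct))
}

ds$f10_correct <- score_item(ds$f10, correct = 3)
ds$f11_correct <- score_item(ds$f11, correct = 2)

# Add the scored items together; any missing item makes the total NA
ds$pol_knowledge <- rowSums(ds[, c("f10_correct", "f11_correct")])

# Check the recode against the original variable
table(ds$f10, ds$f10_correct, useNA = "ifany")
```

With all ten items scored the same way, the sum would give the 0-10 scale. Leaving the total as NA when any item is missing is one possible choice; whether to do that or to treat missing answers as wrong is a decision to record in your plan.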

2. Likelihood Vote

  1. Variable name: Likelihood Vote
  2. Name in R dataset: a12
  3. Issues/problems/changes
    • need to reverse code
    • ds$likely_vote <- 6 - ds$a12
  4. Final name (label): likely_vote (Likelihood Vote)
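
A minimal sketch of the reverse coding, assuming a12 is measured on a 1-5 scale (which the "6 minus" formula implies). The toy data are hypothetical.

```r
# Toy data: hypothetical values on a 1-5 scale, plus one missing value
ds <- data.frame(a12 = c(1, 2, 3, 4, 5, NA))

# Reverse code: 1 becomes 5, 2 becomes 4, and so on; NA stays NA
ds$likely_vote <- 6 - ds$a12

# Cross-tabulate old against new to confirm the mapping
table(ds$a12, ds$likely_vote, useNA = "ifany")
```

If a12 contains a missing-value code such as -1, it would need to be set to NA before this step, because 6 - (-1) would otherwise produce a spurious value of 7.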

3. General Internet

4. Election Internet

5. Internet_skills

6. Weight

7. Income => Quintiles
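
One possible way to build quintiles, sketched in base R. It assumes income is stored as a numeric variable; the variable names income and income_quintile and the toy values are hypothetical, and the real dataset may record income differently (for example in brackets).

```r
# Toy data: hypothetical incomes, not real survey values
ds <- data.frame(income = c(12, 25, 31, 47, 52, 68, 74, 89, 95, 120))

# Cut points at the 0th, 20th, 40th, 60th, 80th and 100th percentiles
breaks <- quantile(ds$income, probs = seq(0, 1, by = 0.2), na.rm = TRUE)

# Assign each respondent to a quintile (1 = lowest, 5 = highest)
ds$income_quintile <- cut(ds$income, breaks = breaks,
                          include.lowest = TRUE, labels = FALSE)

# Check that the groups are roughly equal in size
table(ds$income_quintile, useNA = "ifany")
```

If many respondents share the same income value, two cut points can coincide and cut() will stop with an error about non-unique breaks; in that case the groups need to be defined another way.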

8. Highest_qual => Bachelor or not

9. Gender

Last updated on 24 August, 2019 by Dr Nicholas Harrigan (nicholas.harrigan@mq.edu.au)