SOCI832: Overview: Introduction to Statistics and Univariate Analysis

Reading

Field, A., Miles, J., and Field, Z. (2012). Discovering statistics using R. Sage publications.

  • Chapter 2: Everything you ever wanted to know about statistics (well, sort of)


Concepts

  • Descriptive statistics
    • Mean/Median/Mode (Central tendency)
    • Standard deviation/Interquartile range (Variation/dispersion)
    • Minimum/Maximum
    • Percentile
    • N (number of non-missing cases)
  • Inferential statistics
    • Population
      • Population parameter
    • Sample
      • Sample statistic
    • Hypothesis testing
      • Null hypothesis
      • Sampling distribution
      • Central limit theorem
    • Standard error
      • Confidence interval
      • p-value
      • Effect size
    • Bivariate inferential statistics
      • Correlation, comparison of means, chi-squared
    • Multivariate inferential statistics
      • Linear and logistic regression
    • Dimension reduction and finding categories
      • Factor analysis, Cluster analysis


Lesson 3.1: An introduction to statistics

Learning Objectives

By the end of this class, students should be able to:

  1. Explain the different purposes of descriptive and inferential statistics.
  2. Identify a particular statistic as descriptive or inferential.
  3. Explain the differences between the measures of central tendency for a set of values, and know when to use each measure.
  4. In everyday language, explain the meaning and purpose of measures of variation, such as standard deviation and interquartile range.
  5. Explain the difference between a population parameter and a sample statistic.
  6. Identify the null hypothesis and the alternative hypothesis.
  7. In everyday language, explain the meaning of standard error, and the purpose of measuring it.
  8. Explain the purpose and meaning of standard error, confidence interval, and p-value, and the relationship between them.
  9. Identify measures of effect size, and explain what they mean and why they matter.
  10. Identify the types of research questions we would answer with bivariate inferential statistics, regression models, and dimension reduction.

Questions

  • What is the difference between descriptive and inferential statistics?
  • How can I meaningfully describe a large dataset with just a few numbers?
  • How can I use a small sample of the real world to draw conclusions about the larger world we can’t see?
  • How can I draw conclusions about a world with so much (often random) variation?
  • How can I express certainty and uncertainty in my measurements?
  • How can I express the strength of the relationship between two variables?
  • How do I know when to use the major statistical tests, such as correlation, regression, and factor analysis?
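
As one illustration of the questions about describing data and expressing uncertainty, the following sketch computes a mean, standard deviation, standard error, and 95% confidence interval in base R. It is a minimal example under stated assumptions: the built-in mtcars data stands in for a sample, and the interval uses the t distribution, which is one common choice rather than the only one.

```r
# Assumption: mtcars (built into R) stands in for a sample from a larger population
x <- mtcars$mpg

n  <- sum(!is.na(x))          # N: number of non-missing cases
m  <- mean(x, na.rm = TRUE)   # sample mean (central tendency)
s  <- sd(x, na.rm = TRUE)     # sample standard deviation (dispersion)
se <- s / sqrt(n)             # standard error of the mean

# 95% confidence interval for the mean, based on the t distribution
t_crit <- qt(0.975, df = n - 1)
ci <- c(lower = m - t_crit * se, upper = m + t_crit * se)

print(c(n = n, mean = m, sd = s, se = se))
print(ci)
```

The standard deviation describes how spread out the individual values are, while the standard error describes how much the sample mean itself would be expected to vary from sample to sample.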


Lesson 3.2: Exercise: Identifying concepts in an academic article

Practice Exercise

Identify examples of the following concepts in the model/demonstration article:

  • Descriptive statistics
    • Mean/Median/Mode (Central tendency)
    • Standard deviation/Interquartile range (Variation/dispersion)
    • Minimum/Maximum
    • Percentile
    • N (number of non-missing cases)
  • Inferential statistics
    • Population
      • Population parameter
    • Sample
      • Sample statistic
    • Hypothesis testing
      • Null hypothesis
      • Sampling distribution
      • Central limit theorem
    • Standard error
      • Confidence interval
      • p-value
      • Effect size
    • Bivariate inferential statistics
      • Correlation, comparison of means, chi-squared
    • Multivariate inferential statistics
      • Linear and logistic regression
    • Dimension reduction and finding categories
      • Factor analysis, Cluster analysis

Paste examples of the concepts you find into the Google Doc here.


Lesson 3.3: R Commands for Univariate Statistics

Learning Objectives

By the end of this class, students should be able to run the following R commands, with the assistance of (and constant reference to) methods101.com, R Help, and the internet (e.g. Google Search):

  • set up commands
  • %>% - piping
  • select()
  • descr() from summarytools package
  • print()
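
The commands listed above might be combined roughly as follows. This is a sketch, not the course's own script: it assumes the dplyr and summarytools packages are installed, and it uses the built-in mtcars data and the variables mpg, hp, and wt purely as stand-ins for the course dataset.

```r
# Set-up commands: load the packages (install once with install.packages() if needed)
library(dplyr)         # provides %>% and select()
library(summarytools)  # provides descr()

# Assumption: mtcars is only a stand-in for the course dataset
desc_table <- mtcars %>%
  select(mpg, hp, wt) %>%                                  # keep three numeric variables
  descr(stats = c("n.valid", "mean", "sd", "min", "max"))  # choose the statistics to report

# Display the descriptive statistics table
print(desc_table)
```

The pipe passes the result of each step to the next as its first argument, so the code reads top to bottom in the order the operations happen.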


Lesson 3.4: R Exercises

Practice Exercise

Choose three variables - one dependent variable, and two independent variables - from the practice dataset provided. Write R code that generates the following analysis of your three variables:

  1. A descriptive statistics table with N, mean, sd, min, and max
  2. A plot of the univariate statistics of one or more of the variables.
  3. A bivariate statistical test of the relationship between each pair of the three variables (i.e. A-B, B-C, C-A)
  4. A regression model of the influence of the two independent variables on the dependent variable.
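
One possible shape for the four steps above, sketched in base R on the built-in mtcars data rather than the practice dataset. The variable choices (mpg as the dependent variable, wt and hp as the independent variables) and the use of Pearson correlation for step 3 are assumptions for illustration only; categorical variables would call for a comparison of means or a chi-squared test instead.

```r
# Assumption: mtcars stands in for the practice dataset
# mpg = dependent variable; wt and hp = independent variables
vars <- mtcars[, c("mpg", "wt", "hp")]

# 1. Descriptive statistics table: N, mean, sd, min, max
desc <- data.frame(
  n    = sapply(vars, function(x) sum(!is.na(x))),
  mean = sapply(vars, mean, na.rm = TRUE),
  sd   = sapply(vars, sd,   na.rm = TRUE),
  min  = sapply(vars, min,  na.rm = TRUE),
  max  = sapply(vars, max,  na.rm = TRUE)
)
print(round(desc, 2))

# 2. Univariate plot: histogram of the dependent variable
hist(vars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# 3. Bivariate tests for each pair of variables (Pearson correlation)
print(cor.test(vars$mpg, vars$wt))
print(cor.test(vars$wt,  vars$hp))
print(cor.test(vars$hp,  vars$mpg))

# 4. Linear regression of the dependent variable on both independent variables
model <- lm(mpg ~ wt + hp, data = vars)
print(summary(model))
```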

Paste your results and your code into the Google Doc here.