SOCI832: Lesson 11.2: Other Regression Models

0. Code to run to set up your computer.
1. Other types of regression models
2. How regression models vary.
3. References

0. Code to run to set up your computer.

# Update Packages
update.packages(ask = FALSE, repos='https://cran.csiro.au/', dependencies = TRUE)

# Install Packages
if(!require(dplyr)) {install.packages("dplyr", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjlabelled)) {install.packages("sjlabelled", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjmisc)) {install.packages("sjmisc", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjstats)) {install.packages("sjstats", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(sjPlot)) {install.packages("sjPlot", repos='https://cran.csiro.au/', dependencies=TRUE)}
if(!require(lm.beta)) {install.packages("lm.beta", repos='https://cran.csiro.au/', dependencies=TRUE)}

# Load packages into memory
base::library(dplyr)
base::library(sjlabelled)
base::library(sjmisc)
base::library(sjstats)
base::library(sjPlot)
base::library(lm.beta)

# Print 3 significant digits and discourage scientific notation
options(digits=3, scipen=8) 

# Stop View() from overloading memory with large datasets:
# data frames are shown only up to their first 500 rows
RStudioView <- View
View <- function(x) {
  if ("data.frame" %in% class(x)) { RStudioView(utils::head(x, 500)) } else { RStudioView(x) }
}

# Datasets
# Example 1: Crime Dataset
lga <- readRDS(url("https://methods101.com/data/nsw-lga-crime-clean.RDS"))

# Example 2: AuSSA Dataset
aus2012 <- readRDS(url("https://mqsociology.github.io/learn-r/soci832/aussa2012.RDS"))

# Example 3: Australian Electoral Survey
aes_full <- readRDS(gzcon(url("https://mqsociology.github.io/learn-r/soci832/aes_full.rds")))

# Example 4: AES 2013, reduced
elect_2013 <- read.csv(url("https://methods101.com/data/elect_2013.csv"))

1. Other types of regression models

There is an almost infinite number of regression models available for data analysis.

As a researcher, it is impossible to know all of them.

What I want to do in this lesson is introduce you to:

The main ways that models vary, such as their dependent variable, their assumptions about the distribution of the dependent variable, and the method of estimation.
The basic commands for running the most common models you are likely to come across.

2. How regression models vary.

I think we can identify four main ways that regression models systematically differ from each other, even if this is an oversimplification:

The measurement of the dependent variable: is it continuous/interval, binary, ordinal, a set of unordered choices, or something else?
The (assumed) statistical distribution of the dependent variable: is it normally distributed, is it a count (e.g. following a Poisson or negative binomial distribution), or, for a binary DV, is the underlying probability best modelled with a logistic (logit) or normal (probit) distribution?
Dependencies between the cases (units of analysis): are these repeated measurements on the same cases (such as in time-series data)? Are the cases nested within larger organisational units (e.g. classes, schools, states, nations)?
The method of estimation: there are many different ways of calculating the best-fitting model. Some involve direct calculation (e.g. OLS, which has a closed-form solution), while others use iterative search or simulation to maximise or minimise a certain ‘fit’ statistic (e.g. maximum likelihood).
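To make the last point (method of estimation) more concrete, here is a minimal sketch on simulated data. The variables `x` and `y` and their values are invented for illustration and do not come from the lesson datasets. The sketch estimates the same linear model in two ways. The first is direct calculation, which solves the normal equations. The second is iterative: `optim()` searches for the coefficients that minimise the sum of squared residuals. Both results should agree with `lm()` up to small numerical error.

```r
# Sketch: two ways of estimating the same linear regression.
# x and y are simulated purely for illustration.
set.seed(832)
n <- 200
x <- rnorm(n)
y <- 2 + 0.5 * x + rnorm(n)

# Design matrix: a column of 1s (intercept) and x
X <- cbind(1, x)

# 1. Direct (closed-form) calculation: solve X'X b = X'y
b_direct <- solve(crossprod(X), crossprod(X, y))

# 2. Iterative search: optim() finds the coefficients that
#    minimise the sum of squared residuals (the 'fit' statistic)
ssr <- function(b) sum((y - X %*% b)^2)
b_iter <- optim(c(0, 0), ssr, method = "BFGS")$par

# Compare both with lm()
b_lm <- coef(lm(y ~ x))
print(cbind(direct = drop(b_direct), iterative = b_iter, lm = b_lm))
```

Models such as logit, probit and Poisson regression have no closed-form solution like this. That is why `glm()` always estimates them iteratively, by maximum likelihood.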

In the table below we list a number of the most important regression models, and their characteristics.

Model name	DV type	When to use?	Command in R
Linear regression (ordinary least squares, OLS)	Continuous or interval	DV is continuous or interval, e.g. mark out of 100 in an exam.	`stats::lm(...)`
Logistic regression (logit)	Binary	DV is binary, e.g. pass (1) / fail (0). An alternative to probit; follow the convention of your discipline.	`stats::glm(..., family = binomial)`
Probit regression (probit)	Binary	DV is binary, e.g. pass (1) / fail (0). An alternative to logit; follow the convention of your discipline.	`stats::glm(..., family = binomial(link = "probit"))`
Conditional logit	Choices	DV is three or more unordered nominal choices, e.g. brand of phone; favourite colour. Independent variables (IVs) are characteristics of the choices.	`survival::clogit(...)`
Multinomial logit	Choices	DV is three or more unordered nominal choices, e.g. brand of phone; favourite colour. IVs are characteristics of the individuals.	`mlogit::mlogit(...)`
Ordinal logistic regression (ordered logit)	Ordinal	DV is an ordinal variable with few categories, e.g. Agree/Neutral/Disagree that Trump is a good President.	`MASS::polr(...)` or `ordinal::clm(...)`
Poisson regression	Count	DV is a count variable (assumes variance = mean), e.g. number of students who fail in each class.	`stats::glm(..., family = "poisson")`
Negative binomial regression	Count	DV is a count variable (allows variance > mean, i.e. overdispersion), e.g. number of students who fail in each class.	`MASS::glm.nb(...)`
Zero-inflated negative binomial regression	Count	DV is a count variable with an excess of zeros, e.g. number of students who fail in each class.	`pscl::zeroinfl(..., dist = "negbin")`
Multilevel models	Any	Cases are clustered into groups, which means they are not independent, e.g. students in classes, classes in schools, schools in states.	`lme4::lmer(...)` (continuous DV) or `lme4::glmer(...)` (other DVs)
Tobit regression	Continuous but censored	DV is censored, i.e. you cannot observe the DV above or below a certain value, e.g. ‘ATAR less than 30’; ‘survival longer than 5 years’.	`VGAM::vglm(..., family = tobit(Upper = ...))` or `AER::tobit(...)`
Survival analysis (Cox regression)	Time	DV is time until an event, e.g. years of survival after diagnosis; years from starting a PhD until graduation.	`survival::coxph(...)`
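As a minimal sketch of some of the commands in the table, the code below fits the binary and count models to simulated data. The variables (`study_hours`, `pass`, `late`) are hypothetical and generated in the code, so it runs without any of the lesson datasets. The count variable is deliberately simulated with a variance larger than its mean. Comparing the Poisson and negative binomial fits therefore shows why the ‘variance = mean’ assumption matters.

```r
# Sketch: binary and count regression models on simulated data.
# All variables are hypothetical and generated here for illustration.
set.seed(832)
n <- 500
study_hours <- runif(n, min = 0, max = 20)

# Binary DV: pass (1) / fail (0), more likely with more study
pass <- rbinom(n, size = 1, prob = plogis(-3 + 0.3 * study_hours))

# Count DV: number of late assignments, simulated with variance > mean
late <- rnbinom(n, mu = exp(1.5 - 0.05 * study_hours), size = 1.5)

# Logit and probit: same binary DV, different link functions
m_logit  <- stats::glm(pass ~ study_hours, family = binomial)
m_probit <- stats::glm(pass ~ study_hours,
                       family = binomial(link = "probit"))
summary(m_logit)
summary(m_probit)

# Check the Poisson assumption that variance = mean
c(mean = mean(late), variance = var(late))

# Poisson assumes variance = mean; negative binomial relaxes it
m_pois <- stats::glm(late ~ study_hours, family = "poisson")
m_nb   <- MASS::glm.nb(late ~ study_hours)

# Lower AIC = better fit, after penalising the number of parameters
AIC(m_pois, m_nb)
```

I call `MASS::glm.nb()` with `::` rather than loading MASS with `library()`. MASS has its own `select()` function, which would mask `dplyr::select()`.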
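A similar sketch covers the ordered logit and Cox regression rows of the table, again on simulated, hypothetical data. The names `age`, `agree`, `years` and `event` are invented for illustration. The ordinal DV is built by cutting a latent continuous score with logistic noise into three ordered categories, which is the idea behind ordered logit. The survival times are right-censored at 5 years, so `event` records whether the event was actually observed.

```r
# Sketch: ordered logit and Cox regression on simulated data.
# All variables are hypothetical and generated here for illustration.
library(survival)   # Surv() and coxph(); a recommended package shipped with R

set.seed(832)
n <- 400
age <- round(runif(n, min = 18, max = 80))

# Ordinal DV: a latent score with logistic noise, cut at two thresholds
latent <- 0.03 * age + rlogis(n)
agree <- cut(latent, breaks = c(-Inf, 1, 2.5, Inf),
             labels = c("Disagree", "Neutral", "Agree"),
             ordered_result = TRUE)
table(agree)

# Ordered logit; Hess = TRUE stores the Hessian used by summary()
m_ologit <- MASS::polr(agree ~ age, Hess = TRUE)
summary(m_ologit)

# Survival DV: time until an event, censored after 5 years of follow-up
true_time <- rexp(n, rate = exp(-3 + 0.02 * age))
years <- pmin(true_time, 5)
event <- as.integer(true_time <= 5)  # 1 = event observed, 0 = censored

# Cox proportional hazards model
m_cox <- coxph(Surv(years, event) ~ age)
summary(m_cox)
```

For the ordered logit, `summary()` reports one slope for `age` plus two thresholds (intercepts) that separate the three categories. For the Cox model, `exp(coef)` is the hazard ratio for a one-year increase in age.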