SOCI832: Lesson 2.3: Finding an article and data for your assignment

Learning Objectives

By the end of this class, students should be able to:

  1. Explain what type of article and dataset is required for their main assignment
  2. Search for articles and datasets for their main assignment using multiple methods, including different search engines, websites, and citation tracing.
  3. Identify datasets and articles that are likely to be GOOD and BAD candidates to replicate


Questions

  • What type of article and dataset is required for the main assignment?
  • How to find articles and datasets for the main assignment?
  • How to identify good and bad candidates for articles/datasets?


1. What type of article and dataset is required for the main assignment?

As stated in the unit guide, the requirements for this assignment are to:

Replicate a published study with public dataset:

  • Step 1: Find an article and dataset: Find a social science (or closely related discipline) study that has been published as a peer-reviewed academic article, and that uses a publicly accessible dataset, and
  • Step 2: Replicate analysis in R: Replicate the analysis presented in the paper using R.
  • NOTE: R code should not already exist: Article and dataset should NOT already have publicly available R code (this would make the exercise pointless).

What makes a good article and dataset?

Essential characteristics:

  • Article uses a dataset you can access
  • Article conducts univariate, bivariate, and multivariate analysis (or at least two of the three, and you can do all three)
  • Dataset needs to be in the format of standard social science data (i.e. rows are units of analysis, columns are variables). The data cannot be summary tables (e.g. means, max, SD, etc.) or results tables (e.g. correlation matrix)
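To make these characteristics concrete, here is a minimal sketch in R. The data frame `survey` and all of its values are made up for illustration (they are not from any real study). It shows what unit-level data looks like (one row per respondent, one column per variable), what the three levels of analysis might look like on such data, and why a summary table on its own is not enough.

```r
# Hypothetical unit-level data: one row per respondent, one column per variable.
survey <- data.frame(
  id        = 1:8,
  age       = c(23, 35, 47, 52, 29, 61, 44, 38),
  education = c(12, 16, 12, 18, 14, 10, 16, 12),  # years of schooling
  income    = c(31, 58, 45, 82, 40, 36, 67, 42)   # thousands of dollars
)

# Univariate analysis: describe one variable at a time.
income_mean <- mean(survey$income)
income_sd   <- sd(survey$income)

# Bivariate analysis: the relationship between two variables.
edu_income_cor <- cor(survey$education, survey$income)

# Multivariate analysis: model an outcome using several predictors at once.
model       <- lm(income ~ education + age, data = survey)
model_coefs <- coef(model)

# A summary table like this one is NOT suitable as your dataset:
# the individual rows cannot be recovered from it, so the bivariate
# and multivariate analyses above could not be re-run.
analysis_vars <- c("age", "education", "income")
summary_table <- data.frame(
  variable = analysis_vars,
  mean     = sapply(survey[analysis_vars], mean),
  sd       = sapply(survey[analysis_vars], sd)
)
```

Notice that the summary table can always be computed from the unit-level data, but not the other way around. That is why the assignment requires the rows-by-columns format.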

Non-essential characteristics:

  • Article and dataset are on a topic you find interesting and which you can convince others is interesting.
  • Dataset is easy to access, and doesn’t require applying for permission to an authority (such as a repository or the author).

You have a dataset and article for your assignment when…

  • You have downloaded the dataset, opened it both in R and in a spreadsheet (such as Excel), briefly explored it, and the data appear to be healthy
  • You have downloaded the codebook for the dataset
  • You have downloaded the article
  • You have confirmed the article contains univariate, bivariate, and multivariate analysis
  • You have confirmed that the variables in the article’s analysis are in the dataset you have downloaded (i.e. that the dataset you have downloaded is not a reduced version of the dataset, e.g. missing confidential data)
  • You have shown this to Nick, and got his approval to move ahead (before the end of Week 4).
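Parts of this checklist can be done with a few lines of R. The sketch below is only an illustration: the file, the variable names, and the `required_vars` list are all hypothetical, and a small CSV is written to a temporary file so that the code runs as-is. In practice you would read your own downloaded file and list the variables named in your article.

```r
# Stand-in for a downloaded dataset (hypothetical values), written to a
# temporary CSV so this sketch runs anywhere. Replace with your own file.
data_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    respondent_id = 1:5,
    age           = c(34, 51, NA, 27, 45),
    gender        = c("F", "M", "F", "F", "M"),
    vote_choice   = c("A", "B", "B", "A", "A")
  ),
  data_path,
  row.names = FALSE
)

# Open the dataset in R.
survey <- read.csv(data_path, stringsAsFactors = FALSE)

# Briefly explore: number of rows and columns, and the type of each variable.
dim(survey)
str(survey)

# Hypothetical list of variables used in the article's analysis.
required_vars <- c("age", "gender", "vote_choice", "party_id")

# Which of the article's variables are NOT in the downloaded dataset?
missing_vars <- setdiff(required_vars, names(survey))

# For the variables that are present, what share of values is missing?
present_vars  <- intersect(required_vars, names(survey))
missing_share <- colMeans(is.na(survey[present_vars]))

if (length(missing_vars) > 0) {
  message("Not in the dataset: ", paste(missing_vars, collapse = ", "))
}
```

If `missing_vars` is not empty, you may have downloaded a reduced version of the dataset, which is worth sorting out before you ask for approval.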


2. How to find articles and datasets for the main assignment?

I recommend using one or more of these five strategies:

  • Strategy 1: Search academic data repositories
  • Strategy 2: Search open data websites
  • Strategy 3: Google Scholar search
  • Strategy 4: Google Scholar search for articles that use public datasets
  • Strategy 5: Google Dataset Search (Beta)

In my opinion a combination of Strategy 1 (to familiarise yourself with good, relevant datasets), and Strategy 4 (to find articles that use these datasets) is likely to be the most productive and efficient.

Strategy 1: Search academic data repositories

Summary: This method involves going to a dedicated academic data repository, such as ICPSR or Dataverse/Australian Data Archive, and finding a dataset on a topic you are interested in. On these websites, you should be able to find links to articles that have used these datasets. Alternatively, you can search on scholar.google.com or google.com for academic articles that use these datasets.

What to do?

Step 1: Start at one of the main academic data repository websites, such as:

Or, start at a page which has links to academic data repository websites, such as:

Step 2: Go to a repository and search for datasets that look relevant to your interests.

Step 3: Some repositories, such as ICPSR, will include a list of publications which have used the dataset. Find these publications, and see if any fit the necessary criteria for your project.

Step 4: Check you can download the datasets (some datasets have restrictions). If you find a dataset that looks particularly interesting and useful, write it down, and then start conducting scholar.google.com searches for articles that use this dataset.

Pros of strategy 1

  • ICPSR contains a lot of high quality data, and it is very easy to find links to articles that have been published using the data
  • Many of the datasets are very high quality
  • Many of the datasets are able to be downloaded in full (e.g. Australian Election Study)

Cons of strategy 1

  • Many articles that are linked to data only use the data superficially - they are not necessarily statistical analyses or social science models of outcomes. For example, they could use a few statistics from the dataset for a law journal article.
  • Some of the datasets are restricted (e.g. only limited amounts of Add Health are available to the public), or require you to apply for access (e.g. AuSSA)


Strategy 2: Search Open Data Websites

What to do?

Step 1: Go to one of the open dataset websites, listed below, and search for datasets that interest you.

Australian Bureau of Statistics

Open Data of various governments

Useful online lists of open data

Step 2: When you find an interesting dataset, do a Google Scholar (or similar) search for articles that use the dataset.

Pros of strategy 2

  • Lots of interesting, public data.

Cons of strategy 2

  • Much of the data was collected for other purposes, so is not useful for academic research
  • Most datasets will not have academic articles written about them.


Strategy 3: Google Scholar search

What to do?

Step 1: Search for articles on topics that interest you on scholar.google.com or similar academic databases.

Step 2: When you find interesting articles, look at the ‘Data and Methods’ section, and see if they are using a publicly available dataset.

Step 3: If they are using a publicly available dataset, search for it and check you can download it with all the required variables.

Pros of strategy 3

  • You will find lots of interesting articles on your topic

Cons of strategy 3

  • Most academic articles won’t use publicly available datasets.


Strategy 4: Google Scholar search for articles that use public datasets

What to do?

Step 1: Restrict your search to one or two very famous public datasets you know you can get access to, such as the General Social Survey, or the World Values Survey, or the Australian Election Study.

Step 2: Check which years you can download the full dataset for, and double check you can import it into R

Step 3: Search for articles in scholar.google.com but restrict to articles that include the name of your dataset, e.g. “general social survey”

Step 4: When you find interesting articles, check they are of the type appropriate for your project (univariate, bivariate, multivariate analysis)
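The search in Step 3 can also be built in R, if you want to keep a record of the exact queries you tried. This is a minimal sketch: `topic_terms` is a made-up example, and the URL pattern (`https://scholar.google.com/scholar?q=...`) is an assumption about how Google Scholar currently accepts queries, so check that the link opens the search you expect.

```r
# Name of a public dataset you know you can access (see Step 1).
dataset_name <- "general social survey"

# Hypothetical topic keywords; replace with your own interests.
topic_terms <- "religion attitudes"

# Put the dataset name in double quotes so it is searched as an exact phrase.
query <- paste0('"', dataset_name, '" ', topic_terms)

# Encode the query so that spaces and quotes are safe to use in a URL.
search_url <- paste0(
  "https://scholar.google.com/scholar?q=",
  URLencode(query, reserved = TRUE)
)

# Print the link; paste it into your browser, or open it with browseURL().
print(search_url)
```

Quoting the dataset name matters: without the quotes, the search may match articles that merely contain the words "general", "social", and "survey" somewhere in the text.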

Pros of strategy 4

  • You should find articles that are highly relevant quite quickly.
  • You will not waste time searching articles that don’t use public datasets
  • You will be able to use scholar.google.com to search for articles, which is sometimes easier than looking through a bibliography/references on ICPSR.

Cons of strategy 4

  • You will be limited to studies that use famous datasets
  • You will still probably find a lot of articles that only use the datasets peripherally.


Strategy 5: Google Dataset Search (Beta)

What to do?

Step 1: Go to Google Dataset Search (Beta)

Step 2: Search for topics or datasets that interest you.

Step 3: When you find an interesting dataset, look for the words under the title that say “X scholarly articles cite this dataset (View in Google Scholar)”. This will help you identify articles that use these datasets. See Figure 1.

Figure 1: Google Dataset Search (Beta). Note the link to 7 scholarly articles that cite this dataset (red box)

Pros and Cons

  • I have only recently become aware of this database.
  • I would love to know what you think the pros and cons of Google Dataset Search are.
  • Send me your feedback, and I will try to include it on this website (methods101.com)


Last updated on 05 August, 2019 by Dr Nicholas Harrigan (nicholas.harrigan@mq.edu.au)