Prepping Data – Tips

Once you have your data, check out these free online tipsheets and tutorials for advice on how to inspect and clean it before you start analyzing.

This story is a great example of what to do when official bodies leave gaps in the available data (2021).

Data Biographies: How to Get to Know Your Data (2017) is a blog post by Heather Krause of the Canadian data journalism consulting website idatassist that walks through the process of interrogating the contents and collection process (as well as potential shortcomings) of a dataset before analyzing the data. 

The Quartz Guide to Bad Data (2018) is a file on GitHub that discusses the most common problems found in datasets and how to solve them. It has been translated into Chinese, Japanese, Portuguese, and Spanish.

ProPublica’s Guide to Bulletproofing Data (updated 2018), put together by Jennifer LaFleur with many contributions, lays out best practices for checking your data. It’s a work in progress, so add your suggestions.

This tutorial by Belgian journalist Stijn Debrouwere explains how to find common flaws in data and avoid misinterpreting datasets. It is available with a free subscription to the datajournalism.com site.

Get Started with OpenRefine (2017) is a quick tutorial with screenshots that walks through the basic features of the data cleaning tool OpenRefine. It was created by UCLA professor Miriam Posner.

Cleaning Data in OpenRefine (2018) is a detailed online guide with hands-on examples and video tutorials that walks users through the process of cleaning and standardizing data in OpenRefine. It was created by John Little, a data science librarian at Duke University.

This tutorial taught by Belgian data journalist Maarten Lambrechts is an introduction to using Excel to clean and standardize messy data. The training requires a free account with Datajournalism.com.
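
For readers who prefer code to a spreadsheet, the kind of first-pass inspection these resources describe can be sketched in a few lines of Python. This is only an illustration, not a method taken from any of the guides above: the sample data, the column names, and the helper `inspect_rows` are all hypothetical, and it checks just three common problems (blank cells, exact duplicate rows, and inconsistent spellings in one text column).

```python
import csv
import io
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """city,year,population
Accra,2020,2500000
accra ,2020,2500000
Lagos,2020,
Nairobi,2020,4400000
Nairobi,2020,4400000
"""


def inspect_rows(rows, text_column):
    """Count blank cells, exact duplicate rows, and spelling variants.

    `rows` is a list of dicts (as produced by csv.DictReader) and
    `text_column` is the column to check for inconsistent spellings.
    """
    missing = Counter()   # blank cells per column
    seen = set()          # rows already encountered
    duplicate_rows = 0
    variants = {}         # normalized value -> set of raw spellings

    for row in rows:
        # A cell counts as missing if it is absent or only whitespace.
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1

        # Exact duplicates: every column matches an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicate_rows += 1
        else:
            seen.add(key)

        # Group spellings that differ only by case or stray spaces.
        raw = row.get(text_column) or ""
        normalized = raw.strip().lower()
        if normalized:
            variants.setdefault(normalized, set()).add(raw)

    inconsistent = {
        name: sorted(spellings)
        for name, spellings in variants.items()
        if len(spellings) > 1
    }
    return {
        "missing": dict(missing),
        "duplicate_rows": duplicate_rows,
        "inconsistent": inconsistent,
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = inspect_rows(rows, "city")
print(report)
```

On the sample above, the report flags one blank population value, one duplicated row, and two spellings of the same city. A check like this only surfaces candidates for a closer look; deciding whether a blank is truly missing or a duplicate is truly an error still means going back to the source of the data.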

Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
