How Univision Revealed Flaws in Costa Rica’s Judicial System

Four years of work and 8,000 judicial rulings later, the Univision Data team shows how, in Costa Rica, a person is more likely to be convicted of a crime if they are assigned a public defense attorney than if they have a private one. Their methodology combined web scraping, R and logistic regression — a statistical method common in the social sciences but practically unexplored in newsrooms.
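Univision's actual model and variables are not described here, but the core idea of logistic regression can be sketched with made-up data: fit a model of conviction (1/0) against a single predictor for defender type, then read the coefficient as an odds ratio. All numbers below are illustrative, not Univision's findings; the model is fit by plain gradient descent so the sketch needs nothing beyond the standard library.

```python
import math

# Minimal logistic-regression sketch on made-up data.
# xi = 1 if the defendant had a public defender, 0 if private;
# yi = 1 if convicted. These values are illustrative only.
X = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = 0.0, 0.0      # intercept and public-defender coefficient
lr = 0.5               # learning rate
for _ in range(5000):
    g0 = g1 = 0.0
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))  # predicted P(convicted)
        g0 += p - yi                             # gradient w.r.t. intercept
        g1 += (p - yi) * xi                      # gradient w.r.t. coefficient
    b0 -= lr * g0 / len(X)
    b1 -= lr * g1 / len(X)

# exp(b1) is the odds ratio: how a public defender multiplies
# the odds of conviction in this toy dataset.
odds_ratio = math.exp(b1)
print(round(odds_ratio, 2))
```

In practice a newsroom would use R's `glm(..., family = binomial)` or a similar library rather than hand-rolled gradient descent, and would control for many more variables (offense type, court, prior record) than this single-predictor toy.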

The Quartz Guide to Bad Data

This guide was written to help Quartz staff identify problems with data they report on. After publishing it on GitHub, we heard from folks in many other industries who also found it helpful, so we’re republishing it here for the benefit of all Quartz readers. The most up-to-date version of this guide can always be found on GitHub. An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them. As a reporter your world is full of data.

Investigative Reporting in 2015: GIJN’s Top 12 Stories

As 2015 nears an end, we’d like to share our top 12 stories of the year — the stories that you, our dear readers, found most compelling. The list ranges from free data tools and crowdfunding to the secrets of the Wayback Machine. Please join us in taking a look at The Best of GIJN.org this year.

On the Ethics of Web Scraping and Data Journalism

Web scraping is a way to extract information presented on websites. As I explained in the first installment of this article, web scraping is used by many companies. It's also a great tool for reporters who know how to code, since more and more public institutions publish their data on their websites.
With web scrapers, which are also called “bots,” it’s possible to gather large amounts of data for stories. But what are the ethical rules that reporters have to follow while web scraping?
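Two of the most commonly cited courtesy rules for scraping bots are: respect the site's robots.txt, and pause between requests so the bot doesn't hammer the server. A minimal sketch of both, using only Python's standard library; the robots.txt content and URLs are hypothetical examples, not any real site's policy.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything under /private/ is off-limits to bots.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

pages = [
    "https://example.org/rulings/1",
    "https://example.org/private/notes",
]

fetched = []
for url in pages:
    # Rule 1: skip anything robots.txt disallows for our user agent.
    if not rp.can_fetch("newsroom-bot", url):
        print("skipping (disallowed):", url)
        continue
    # Rule 2: wait between requests; the actual HTTP fetch would go here.
    time.sleep(1)
    fetched.append(url)
    print("fetched:", url)
```

In a live scraper, `rp.set_url(".../robots.txt")` plus `rp.read()` would load the policy over HTTP, and the fetch itself would identify the bot honestly via its User-Agent header.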

Web Scraping: A Journalist’s Guide

$8 billion in just a few hours earlier this year? It was because of a web scraper, a tool companies use — as do many data reporters. A web scraper is simply a computer program that reads the HTML code of webpages and analyzes it. With such a program, or "bot," it's possible to extract data and information from websites.
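That definition — a program that reads HTML and analyzes it — can be shown in a few lines with Python's built-in `html.parser`. The sample page below stands in for what an HTTP request (e.g. with `urllib.request`) would return from a real site; the URLs in it are invented for illustration.

```python
from html.parser import HTMLParser

# The analysis step of a scraper: walk the HTML and collect every link.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href attribute of each <a> tag we encounter.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

# Stand-in for a downloaded page; a real bot would fetch this over HTTP.
sample_html = """
<html><body>
  <a href="/ruling/2015-001">Ruling 2015-001</a>
  <a href="/ruling/2015-002">Ruling 2015-002</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(sample_html)
print(parser.links)  # the list of hrefs found on the page
```

A real newsroom scraper would loop over the collected links, download each ruling, and save the extracted fields to a CSV or database for analysis.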