Newsrooms don’t need large budgets for analyzing data; they can easily access basic data tools that are free or inexpensive. The summary below is based on a five-day training session at Delo, the leading daily newspaper in Slovenia. Anuška Delić, journalist and project leader of DeloData at the paper, initiated the training with the aim of getting her team to work on data stories with easily available tools and a lot of new data.
“At first it seemed that not all of the 11 participants, who had no or almost no prior knowledge of this exciting field of journalism, would ‘catch the bug’ of data-driven thinking about stories, but soon it became obvious” that they would once the training commenced, said Delić.
Introducing Data Tools
In addition to demonstrating basic Internet searches (see below), advanced Excel, Google Fusion, OpenRefine, and Helium Scraper, all of which I also included in trainings at the European Data Journalism Conference (Data Harvest), I offered training in PDF extraction with CometDocs, as well as in DocumentCloud, Datawrapper, and CartoDB.
It turns out there is a lot of good data in Slovenia that can be used for stories, from the statistical office, for example. Such data can even be sorted by municipality, which is also the case in other European Union countries.
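For readers who prefer a script to a spreadsheet, here is a minimal sketch of ranking municipalities by one column of a CSV file. The column names (`municipality`, `rate`) and the sample rows are placeholders invented for illustration; they are not real figures from any statistical office.

```python
import csv
import io

# Placeholder sample standing in for a downloaded statistics file.
# Names and numbers are invented for illustration only.
SAMPLE_CSV = """municipality,rate
Town A,3.0
Town B,7.5
Town C,5.2
"""


def rank_by_column(csv_text, column, descending=True):
    """Parse CSV text and return its rows sorted by a numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


ranked = rank_by_column(SAMPLE_CSV, "rate")
for row in ranked:
    print(row["municipality"], row["rate"])
```

The same ranking can of course be done with a sort in Excel; a script mainly helps when the file is updated regularly and the ranking has to be repeated.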
Internet Search Tips
We extracted data from PDFs, using CometDocs and OnlineOCR.net. See also this overview of good tools for importing PDFs. CometDocs will solve most needs of PDF extraction while also recognizing special characters in alphabets of different countries. For members of Investigative Reporters and Editors (IRE), CometDocs is free.
DocumentCloud is free to use. It’s a good tool for embedding notes in a document, giving readers an opportunity to review the entire document.
The basic version of Helium Scraper, which is a good tool, costs US$100. It is the easiest way to begin scraping, I think. It works on PCs, but not on Macs.
Here you can also find other tools for scraping data from the web.
Google Fusion is a great mapping tool and free to use in most cases. It’s important to try to get the right version of the map of municipalities in your country and import it as a standard-map into Google Fusion. Below are some good links for working with Fusion:
Datawrapper is a very easy tool for making good interactive graphs, but embedding the graphs from the company’s server requires payment.
The server can also be used for maps created via Google Fusion, but remember to structure your drives.
CartoDB is a great alternative to Google Fusion with a lot of possibilities to make maps in new ways.
In the free version, it’s possible to upload an unlimited number of maps and tables; however, the total data limit is 50 MB, which is enough in most cases. The free version also offers only limited access to geocoding, so either the geocoding needs to be done with another tool or the newsroom has to acquire at least one paid account.
TimelineJS is a free, open-source tool that enables users to build visually rich interactive timelines. It’s available in 40 languages. You can easily build the content in a Google Spreadsheet and then import it to TimelineJS.
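If the events for a timeline already sit in a script or database, a sketch like the one below could produce CSV text to import into the Google Spreadsheet. The column headers are an assumed subset of the TimelineJS spreadsheet template and should be checked against the current template before use; the events themselves are placeholders.

```python
import csv
import io

# Assumed subset of the TimelineJS spreadsheet columns; verify against
# the official template before importing.
COLUMNS = ["Year", "Month", "Day", "Headline", "Text"]


def events_to_csv(events):
    """Return CSV text with one row per event, in chronological order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    ordered = sorted(events, key=lambda e: (e["Year"], e["Month"], e["Day"]))
    for event in ordered:
        writer.writerow(event)
    return buffer.getvalue()


# Placeholder events, invented for illustration.
events = [
    {"Year": 2014, "Month": 5, "Day": 2,
     "Headline": "Second placeholder event", "Text": "Details go here."},
    {"Year": 2014, "Month": 1, "Day": 15,
     "Headline": "First placeholder event", "Text": "Details go here."},
]

timeline_csv = events_to_csv(events)
print(timeline_csv)
```

The resulting text can be saved as a .csv file and imported into a copy of the spreadsheet, keeping the header row in place.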
Happy data drilling!
Nils Mulvad is a co-founder and board member of the Global Investigative Journalism Network, as well as Investigative Reporting Denmark. He is also editor at Kaas & Mulvad, a data journalism consulting firm, and associate professor at The Danish School of Media and Journalism. He was CEO of the Danish International Center for Analytical Reporting from 2001 to 2006 and European journalist of the year in 2006.