Journalists looking for their next data-driven story might find an API just as useful as a traditional spreadsheet, if not more so. What are APIs, and how can they help in reporting? Paul Bradshaw explains. To watch a video tutorial by Bradshaw on working with APIs, go to the end of this article.
If you're looking for data as a journalist, there's a good chance you will find it in an API. What those three letters stand for (Application Programming Interface) isn't that important; what matters to journalists is that an API is a way of asking questions and getting answers.
The Basics — How Do APIs Differ from Data and How to Use Them
So is an API just a type of database? Well, yes and no. Some APIs work like a database, where you can specify a keyword or category, select parameters for your search, and get results in a data format like CSV, JSON, or XML.
OpenCorporates is one example of a website that offers an API like this: supply a name to one part of the API, and it will give you data on company directors who match (this API was used for Global Witness's investigation into the jade trade). OpenCorporates offers free accounts to journalists, and also publishes videos explaining how to use the platform for investigations.
The UK police API, for example, will provide data on crimes near any given location. To get that data you need to form a URL; one example given in the API's documentation is https://data.police.uk/api/crimes-at-location?date=2017-02&lat=52.629729&lng=-1.131592
That URL encodes a particular query, and it breaks down into a number of parts:
- The base URL, which points to the API itself. In this case: https://data.police.uk/api/
- The type of question that is being asked (an API may support a number of different types of questions). In this case: crimes-at-location
- A question mark followed by any ingredients that are used in your question. In this case the example indicates that the API expects a date, a latitude and a longitude: date=2017-02&lat=52.629729&lng=-1.131592
Split up at each ampersand, it becomes easier to see what is being asked for:
date=2017-02 & lat=52.629729 & lng=-1.131592
The elements after each equals sign (and before any ampersand that follows) are the values specific to your query. In this case you might change the values after 'lat=' and 'lng=' to get data on a different location, or change the date (in YYYY-MM format) to get data for a different period. The API no longer returns data from 2017, so in this particular example you will need to change the date regardless.
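To see how those parts fit together, here is a minimal Python sketch using only the standard library's `urllib.parse`. It rebuilds the example URL from its base URL, method and arguments, then splits the query back into name=value pairs. The helper names `build_crime_url` and `split_query` are hypothetical, and you would still need to swap in a recent month before loading the result.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL and method name taken from the documented example request.
BASE_URL = "https://data.police.uk/api/"
METHOD = "crimes-at-location"

def build_crime_url(date: str, lat: float, lng: float) -> str:
    """Hypothetical helper: base URL + method + '?' + arguments joined by '&'."""
    params = {"date": date, "lat": lat, "lng": lng}
    return BASE_URL + METHOD + "?" + urlencode(params)

def split_query(url: str) -> dict:
    """Hypothetical helper: break a URL's query string into name=value pairs."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}

url = build_crime_url("2017-02", 52.629729, -1.131592)
print(url)               # the same URL as the documentation example
print(split_query(url))  # {'date': '2017-02', 'lat': '52.629729', 'lng': '-1.131592'}
```

Changing the location or period then means changing one argument, rather than editing the URL by hand.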
When you load that URL (Firefox or Chrome give the best results), you will be given data in response to that query. It won't look like a normal webpage because the data will most likely be in JSON format, which uses square and curly brackets to structure the data (some APIs let you specify which format you prefer, and some provide data in XML).
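If you do want to work with the response in a script, a minimal Python sketch with the standard library might look like this. The sample below is hypothetical and heavily trimmed, and the `category` field name is an assumption based on the shape of the police API's crime records; check the documentation for the fields a given API actually returns.

```python
import json
from collections import Counter
from urllib.request import urlopen

def fetch_json(url: str):
    """Download a URL and decode its JSON body into Python lists and dicts."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)

def count_by_category(records: list) -> dict:
    """Tally records by their 'category' field (an assumed field name)."""
    return dict(Counter(record.get("category", "unknown") for record in records))

# Hypothetical, trimmed sample shaped like a crimes response:
# square brackets hold the list, curly brackets hold each record's fields.
sample = """[
  {"category": "burglary", "month": "2017-02"},
  {"category": "burglary", "month": "2017-02"},
  {"category": "shoplifting", "month": "2017-02"}
]"""
print(count_by_category(json.loads(sample)))  # {'burglary': 2, 'shoplifting': 1}

# With network access you would instead do something like:
# records = fetch_json("https://data.police.uk/api/crimes-at-location"
#                      "?date=YYYY-MM&lat=52.629729&lng=-1.131592")
```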
If you just want data from one query, you can save the JSON or XML (using the browser's File > Save option) and then search for 'JSON to CSV converter' or 'XML to CSV converter' to find one of the many free tools that will convert the data to a spreadsheet (such as the ones on ConvertCSV).
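If you would rather not rely on an online converter, the same conversion takes a few lines of standard-library Python. This sketch assumes the saved file holds a JSON list of records; nested fields (such as a location inside each crime) become dotted column names like `location.latitude`, which is a choice made for this example rather than any standard.

```python
import csv
import io
import json

def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested dicts into flat 'parent.child' columns; lists are left as-is."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def json_to_csv(json_text: str) -> str:
    """Convert a JSON list of records into CSV text with a header row."""
    rows = [flatten(record) for record in json.loads(json_text)]
    # Collect every column name, in first-seen order, so no field is dropped.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)  # missing fields become ""
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# Usage (file names are hypothetical):
# with open("crimes.json", encoding="utf-8") as f:
#     csv_text = json_to_csv(f.read())
# with open("crimes.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```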
Using APIs for Searching Images or Other Media
Other APIs allow you to ask a question in the form of a picture: Google's Cloud Vision API will, when given an image, return any text it identifies in the image, a list of objects, and ratings against concepts such as 'racy' or 'violence'. Periscopic, for example, used the Microsoft Emotion API to produce an analysis of Donald Trump's facial expressions during major speeches.
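As a rough illustration of "asking a question with a picture", this Python sketch only builds the JSON body for a Cloud Vision `images:annotate` request: the image is base64-encoded and the features requested map to text, objects and safe-search ratings. Sending it (an HTTP POST to `https://vision.googleapis.com/v1/images:annotate` with your credentials) is left out, and the placeholder bytes are not a real image; check Google's reference documentation before relying on the exact field names.

```python
import base64
import json

def build_vision_request(image_bytes: bytes) -> str:
    """Build the JSON body for a Cloud Vision images:annotate request."""
    return json.dumps({
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "TEXT_DETECTION"},         # text found in the image
                {"type": "OBJECT_LOCALIZATION"},    # objects in the image
                {"type": "SAFE_SEARCH_DETECTION"},  # likelihoods such as 'racy', 'violence'
            ],
        }]
    })

# Placeholder bytes stand in for the contents of an image file.
body = build_vision_request(b"placeholder bytes, not a real image")
print(json.loads(body)["requests"][0]["features"])
```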
Another big difference between a database and an API is that some APIs don't return data in answer to a query; they return other media such as images and sounds. QuickChart is just one of a number of APIs that will generate a chart when given the data. Other APIs will return audio (music or speech), and the Google Maps API will give you a map when queried about a latitude-longitude coordinate.
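To show what "a chart when given the data" looks like in practice, this Python sketch encodes a Chart.js bar-chart configuration into a QuickChart image URL, assuming QuickChart's `c` parameter, which takes a Chart.js config. Loading the resulting URL in a browser should return the chart as an image. The labels and values in the usage line are hypothetical.

```python
import json
from urllib.parse import urlencode

def quickchart_url(labels, values, label="Series"):
    """Encode a Chart.js bar-chart config as a QuickChart image URL."""
    config = {
        "type": "bar",
        "data": {"labels": labels, "datasets": [{"label": label, "data": values}]},
    }
    # The whole config travels in the URL, so it must be URL-encoded.
    return "https://quickchart.io/chart?" + urlencode({"c": json.dumps(config)})

print(quickchart_url(["2021", "2022", "2023"], [12, 18, 9], label="Hypothetical counts"))
```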
A final difference, crucial for journalists, is that APIs are often updated much more frequently than published datasets. Some APIs provide access to live data: Twitter's API, for example, tells you what's on the site right now. This means APIs can power interactives and visualizations that update whenever the data does. There's no need to download the latest spreadsheet, repeat the analysis and create a new chart; instead, you can write a script that updates the charts whenever the data changes.
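One simple way to build such a script, sketched here in Python as one approach among many, is to fetch the data on a schedule, fingerprint it with a hash, and redraw only when the fingerprint changes. `fetch` and `render` are hypothetical placeholders for your own API call and chart code; scheduling (for example with cron) is left out.

```python
import hashlib
from typing import Callable, Optional

def refresh_if_changed(fetch: Callable[[], bytes],
                       render: Callable[[bytes], None],
                       last_hash: Optional[str]) -> str:
    """Fetch the latest data and re-render only if it differs from last time.

    Returns the hash of the current data, to be stored and passed in next run.
    """
    data = fetch()
    current = hashlib.sha256(data).hexdigest()
    if current != last_hash:
        render(data)  # e.g. rebuild the chart from the new data
    return current

# Example with stand-in functions: the second call sees identical data,
# so nothing is redrawn.
h = refresh_if_changed(lambda: b"v1", lambda d: print("redraw", d), None)
h = refresh_if_changed(lambda: b"v1", lambda d: print("redraw", d), h)
```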
Building Stories with APIs
APIs open up storytelling possibilities that aren't always available with traditional datasets. The Spotify API provides data on music and listening behavior that isn't accessible in any other way. At the BBC Data Unit, I used it as part of a report on gender balance at music festivals, while for a story on the most expensive train tickets I used Google's Distance Matrix API to find the distance of each train route, something the Google Maps website itself doesn't give you.
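For a sense of what a Distance Matrix query involves, here is a hedged Python sketch; it is not necessarily how the BBC analysis was done. It assumes the API's `mode=transit` and `transit_mode=rail` arguments and a response in which each origin-destination pair has a `distance.value` in metres. The station names and key are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

def distance_matrix_url(origin: str, destination: str, api_key: str) -> str:
    """Build a Distance Matrix request for a journey by rail."""
    params = {
        "origins": origin,
        "destinations": destination,
        "mode": "transit",
        "transit_mode": "rail",
        "key": api_key,  # an API key is required
    }
    return "https://maps.googleapis.com/maps/api/distancematrix/json?" + urlencode(params)

def route_distance_km(response: dict) -> float:
    """Pull the first origin/destination distance (given in metres) as km."""
    element = response["rows"][0]["elements"][0]
    if element["status"] != "OK":
        raise ValueError(f"No route found: {element['status']}")
    return element["distance"]["value"] / 1000

print(distance_matrix_url("Birmingham New Street", "London Euston", "YOUR_KEY"))
```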
APIs can be especially useful for augmenting existing data. A number of APIs will, when given a list of names, return a likely gender for each, alongside a confidence level. The Parserator API will split an address into its separate components, and there are multiple APIs that will give you a list of administrative bodies for any given postcode or zip code, helping you to categorize granular data by area.
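The augmenting pattern is the same whichever API you use: look up one column, add the answer as a new column. This Python sketch uses postcodes.io, a free UK postcode API, as one example service, assuming its `https://api.postcodes.io/postcodes/<postcode>` URL and the `admin_district` field in its result. The `augment` helper is hypothetical and takes any lookup function, so a name-to-gender or address-parsing API could be swapped in.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

def postcodes_io_district(postcode: str) -> str:
    """Look up a UK postcode's local authority via postcodes.io.

    urlopen raises HTTPError if the API answers with an error (e.g. 404).
    """
    url = "https://api.postcodes.io/postcodes/" + quote(postcode)
    with urlopen(url, timeout=30) as response:
        return json.load(response)["result"]["admin_district"]

def augment(rows, key, new_column, lookup):
    """Return copies of rows with new_column filled in by lookup(row[key]).

    Each distinct value is looked up only once, to save on API quotas.
    """
    cache = {}
    out = []
    for row in rows:
        value = row[key]
        if value not in cache:
            cache[value] = lookup(value)
        out.append({**row, new_column: cache[value]})
    return out

# Usage (needs network access):
# rows = augment(rows, "postcode", "district", postcodes_io_district)
```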
Where to Find APIs
Many data sources will provide an API alongside traditional spreadsheets — it’s always worth looking out for this, and checking whether it offers additional functionality or information. But it’s also worth including the word ‘API’ when searching for data more generally.
Once you find an API, look for its "documentation": a page or collection of pages explaining how the API works. It is usually aimed at people familiar with programming, so there will be some jargon to come to grips with, including:
- Functions and methods: These are the questions that you can ask of an API. The OpenCorporates API, for example, has a "method call" for searching company officers, and another for searching by company number, among others.
- Arguments: These are parameters that you might include when querying an API. For example, you might need to include an argument specifying the file format you want the results in, or the zip code you want data about.
- API keys: A “key” is a password that some APIs might require you to include in your query. To get a key you’ll normally have to register with the site, and to use it you normally include it as one of the “arguments” (see above).
- Limits and quotas: Some APIs will limit the number of queries (or “requests”) you can make each hour or day or month.
- Endpoints: An endpoint is a URL you generate in order to ask your question. When loaded, the URL should provide the response to your question, in the form of data (often in JSON format).
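To tie those terms together, here is a hedged Python sketch built around OpenCorporates. It assumes a base URL of `https://api.opencorporates.com/v0.4/`, a method called `officers/search` taking a `q` argument, and a key passed as the `api_token` argument; check the current documentation, as versions and names change. The helper `build_endpoint` and the `Throttle` class are hypothetical: the first combines method, arguments and key into an endpoint, the second spaces out requests to respect a quota.

```python
import time
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.opencorporates.com/v0.4/"  # assumed version; check the docs

def build_endpoint(method: str, arguments: dict, api_key: Optional[str] = None) -> str:
    """Combine base URL, method and arguments (plus any key) into an endpoint."""
    args = dict(arguments)  # copy so the caller's dict is not modified
    if api_key:
        args["api_token"] = api_key  # the key travels as just another argument
    return BASE_URL + method + "?" + urlencode(args)

class Throttle:
    """Space out requests evenly to stay within a per-period quota."""

    def __init__(self, max_requests: int, per_seconds: float):
        self.interval = per_seconds / max_requests
        self.last = None

    def wait(self):
        """Call before each request; sleeps if the previous one was too recent."""
        now = time.monotonic()
        if self.last is not None:
            delay = self.interval - (now - self.last)
            if delay > 0:
                time.sleep(delay)
        self.last = time.monotonic()

print(build_endpoint("officers/search", {"q": "john smith"}, api_key="YOUR_KEY"))
```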
Not all APIs are created equal: some are poorly documented, and some may even be broken. Don't be afraid to abandon one API and try others until you find one that makes sense.
There’s a certain learning curve involved in using APIs — it helps if you already know some programming, or can get help from a developer. So if you’re just getting started, look for tutorials that take you through the process of working with a particular API, and try to pick APIs that provide examples of URLs that you can adapt (the UK police API, for example, offers an ‘example request’ for each method) or interactive tools that allow you to generate a URL.
Paul Bradshaw leads both the MA in Data Journalism and the MA in Multiplatform and Mobile Journalism at Birmingham City University in the UK. He also works as a consultant data journalist in the BBC England data unit.