How One Mexican Data Team Uncovered the Story of 4,000 Missing Women
Mexican newspaper El Universal has put a face to the 4,534 women who have gone missing in Mexico City and the State of Mexico over the last decade. Its project, Ausencias Ignoradas (Ignored Absences), aims to put pressure on the government and put an end to the situation.
Daniela Guazo, from the data journalism team, explains how they gathered the data and presented the information not as numbers but as people readers can feel close to.
Breaking Down the Numbers
The Mexican government reported 4,281 missing women from 2005 to 2014, of whom 2,000 are still being sought. The number was there, but nobody had broken it down.
“The Mexican government declares reports and statistics without uploading the data. Therefore, when you want to check the information, there isn’t any document to follow or refer to.”
Scraping the Data
El Universal Data worked with Morlan, a company specializing in data analysis and programming, to gather the information from Odisea and Capea. Both are official websites that hold information on missing people but don’t offer it in a downloadable format.
They were able to scrape 1,480 records (pictures and text) from Odisea in JSON format before the website was shut down in November last year.
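The article does not describe how the Odisea pages were built or which tools Morlan used, so the following is only a minimal sketch of the general idea: turning HTML records into JSON. The markup, class names and field names (`record`, `name`, `age`, `photo`) are all hypothetical.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: the real Odisea page structure is not described in the article.
SAMPLE_HTML = """
<div class="record">
  <span class="name">Persona Ejemplo</span>
  <span class="age">17</span>
  <img src="/fotos/0001.jpg">
</div>
<div class="record">
  <span class="name">Otra Persona</span>
  <span class="age">23</span>
  <img src="/fotos/0002.jpg">
</div>
"""


class RecordParser(HTMLParser):
    """Collects one dict per <div class="record"> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the <span> field currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "record":
            self.records.append({})  # start a new record
        elif tag == "span" and self.records:
            self._field = attrs.get("class")
        elif tag == "img" and self.records:
            self.records[-1]["photo"] = attrs.get("src")

    def handle_data(self, data):
        # Only keep text that sits inside a labelled <span>.
        if self._field and data.strip():
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_records(html):
    parser = RecordParser()
    parser.feed(html)
    return parser.records


records = scrape_records(SAMPLE_HTML)
print(json.dumps(records, ensure_ascii=False, indent=2))
```

In practice the HTML would come from the live pages rather than a string, and saving the JSON promptly matters: in this case the source site later went offline.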
However, they could not scrape Capea: the site was so poorly structured that journalists had to transcribe the information into Excel by hand.
By February 2016 the website had 6,787 records, of which 3,054 could be systematized:
“We started reading record by record and filtered them by gender. Once we got all the missing women, we followed the structure from Odisea and started building the dataset for Mexico City.”
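The team did this filtering by reading each record, but once rows are transcribed the same step can be sketched in a few lines. The field names and gender labels below are assumptions for illustration, not the actual Capea columns.

```python
# Hypothetical hand-transcribed rows; field names and values are illustrative only.
capea_rows = [
    {"id": 1, "gender": "Femenino ", "age": "16"},
    {"id": 2, "gender": "masculino", "age": "34"},
    {"id": 3, "gender": "F", "age": "22"},
    {"id": 4, "gender": "", "age": "40"},
]

# Hand-typed data tends to be inconsistent, so accept several spellings.
FEMALE_LABELS = {"f", "femenino", "mujer"}


def is_female(row):
    """Return True when the row's gender label matches a known female label."""
    return row.get("gender", "").strip().lower() in FEMALE_LABELS


women = [row for row in capea_rows if is_female(row)]
print([row["id"] for row in women])  # prints [1, 3]
```

Rows with an empty or unrecognized label (like row 4 here) are left out, so they would still need a manual check.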
Once this process was completed, they matched and cleaned both data sets. This left 4,534 faces with some shared patterns (such as age, build, height or eye color), which they took to the Mexican authorities.
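The article does not say how the two data sets were matched. One common approach, shown here purely as a sketch, is to normalize names and drop rows that share the same key. The key used (`name` plus `date_missing`) and all sample rows are made up for illustration.

```python
import unicodedata


def normalize_name(name):
    """Strip accents, collapse whitespace and lowercase a name."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def merge_datasets(*datasets):
    """Combine datasets, keeping the first row seen for each matching key."""
    merged = {}
    for dataset in datasets:
        for row in dataset:
            # Assumed matching key: normalized name plus date of disappearance.
            key = (normalize_name(row["name"]), row["date_missing"])
            merged.setdefault(key, row)
    return list(merged.values())


# Placeholder rows, not real records.
odisea = [
    {"name": "Ana  Ejemplo Pérez", "date_missing": "2012-03-01", "source": "odisea"},
]
capea = [
    {"name": "ANA EJEMPLO PEREZ", "date_missing": "2012-03-01", "source": "capea"},
    {"name": "Otra Persona López", "date_missing": "2014-07-15", "source": "capea"},
]

combined = merge_datasets(odisea, capea)
print(len(combined))  # prints 2
```

An exact-match key like this is conservative: it misses misspelled names and can wrongly merge two different people who share a name and date, so borderline cases still call for a human review.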
“When it comes to missing people, there isn’t open data. Authorities don’t want to upload databases with all these details and all you have online is messy data in non readable formats such as JPGs that have to be scraped or copied by hand.”
Families Waiting for Their Daughters
Although they presented the story using one case as the backbone, they spoke to at least ten families in Mexico City. All complained about the same things:
- Unhelpful authorities
- Their daughters would have called to say goodbye and to let the family know they were safe
- They did not pack their suitcases
- Mobile phones were disconnected on the same day
- Families are the ones who look for the missing people because the government mainly categorizes these cases as “not located”, “lost” or “absent”, meaning that there isn’t a crime.
Data Journalism in Latin America
Daniela has worked as a data journalist for the last six years. She says that several countries, such as Peru and Argentina, are expanding open data and improving their data journalism skills.
However, Mexico isn’t part of that:
“They are now understanding that data journalism is not only about graphics, numbers or statistics. It has a very strong journalism component. But resources are very scarce.”
The El Universal Data team, comprising Lilia Saúl and Daniela Guazo, was able to create Ausencias Ignoradas thanks to the Mike O’Connor Scholarship from the International Center for Journalists (ICFJ).
The story took six months including planning, gathering and analysing data, taking pictures, talking to families, writing the article, programming and designing — and it received a strong response from the audience and major organisations.
The next step is to update the information from 2016 and create another database for missing men in the city.
This story originally appeared on the Online Journalism Blog and is reprinted with permission.
Maria Crosas is a journalist interested in data journalism and visualizations. She has worked as a visual journalist in Spain and she’s currently finishing an Online Journalism MA at Birmingham City University, where she has been experimenting with the use of virtual reality and bots in journalism. On her blog, she writes about data, journalism, and visualizations inside and outside newsrooms.