Inside a Pioneering Italian Data Journalism Collaboration

Editor’s Note: Confiscati Bene, released in mid-December in Europe, is a pioneering data journalism collaboration that digs into the $4 billion in goods confiscated from criminals by European authorities in the EU. An international team of journalists and their allies sought to create a European database of seized assets and answer troubling questions about the accountability of the process. Confiscati Bene (literally, Well Confiscated) received support from GIJN member JournalismFund.eu; the main project can be seen at http://eu.confiscatibene.it

In this post, Andrea Nelson Mauro, founder of project leader Dataninja.it (another GIJN member), tells us about organizing the project in Italy, which involved a diverse group of journalists, activists, and technologists. Mauro describes the investigation step by step (it has now been published in 19 Italian newspapers and across Europe) and explains the various tools used, including web scraping, content curation, data mining, and coding. We expect more great projects from this group.


On September 5 we had a Publication Day in Italy for our investigation into goods confiscated from the Mafia: one national newspaper (L’Espresso) and 18 websites of the same publisher (Repubblica-L’Espresso; see the map below) put our series online. It reveals how many buildings and companies have been seized, region by region, to whom they belonged, and what the government is doing to give these assets back to Italian citizens. It was a big opportunity and an amazing experience for those of us working on the investigation. We began the project in July 2015.

Meanwhile, a very interesting blog post by Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami, appeared on the Nieman Lab website. Titled “Data Journalism Needs To Up Its Own Standards,” the story discussed over-promising by FiveThirtyEight and Vox.com, projects that should “treat their data with more scientific rigor,” according to Cairo. In these and the other examples he cited you may find, in my opinion, a lot of interesting suggestions, especially if you’re doing journalism with data; they are the kind of issues and doubts we see every day in our jobs inside Dataninja’s pipeline. Until now data journalism, as I see it, has developed too much as descriptive statistics, data visualizations, predictive analysis, and special effects on the web (the “Wow! Effect,” as some friends say, or “map-itis,” for people who publish a map every minute without any news value).

So, I’d like to share what we did for the project “Confiscati Bene” (literally, “Well Confiscated”) with the aim of starting a dialogue and getting feedback on what we did well and what we need to improve.

Step 1: From Meeting the Open Data Project “Confiscati Bene” to Working Inside

Spaghetti Open Data (SOD) is a group of Italian citizens interested in the release of public data in open format.

The open data world gave me a great opportunity to refactor my skills, and some years back I joined Italy’s “Spaghetti Open Data” community. In March 2014, during a hackathon, we developed the first version of “Confiscati Bene,” an independent project powered by citizens to open up data on goods seized from the Mafia. As a first step, all data was scraped from the official website of the agency that maintains a database of confiscated goods. What a great opportunity! Not only for publishing the data, but for trying to improve the project with our journalistic and data skills. We joined the team and helped build an online platform with a data catalog on Mafia assets that needed to be updated. Working this way we learned a lot about confiscated goods (by reading Acts of Parliament and discovering various reports and documents); team members shared these documents on a project mailing list. How long would I have spent finding these resources on my own, instead of having a team that shared them quickly? How much could people help us (as journalists) do our jobs better, if only we gave them the opportunity? Doing it together, and not only with journalists, should work better.
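To give a feel for the scraping step, here is a minimal sketch in Python. It is not the code the Confiscati Bene team used: the markup, column names, and rows are invented for illustration, since this post does not describe the agency's pages. It only shows the general idea of turning an HTML table into structured records using the standard library.

```python
from html.parser import HTMLParser

# Invented sample markup: the real agency pages and their column layout
# are not described in this post.
SAMPLE_HTML = """
<table>
  <tr><th>Region</th><th>Type</th><th>Status</th></tr>
  <tr><td>Sicilia</td><td>Building</td><td>Confiscated</td></tr>
  <tr><td>Campania</td><td>Company</td><td>Seized</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
```

In a real scraper the HTML would be downloaded page by page and the records saved to a file, so that the whole team can work from the same open dataset.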

Step 2: From Starting the Investigation to Publishing in 19 Newspapers and on the Web

By the end of July we had started our investigation and built a team of three journalists (Andrea Nelson Mauro, that’s me; Alessio Cimarelli; and Gianluca De Martino). We read something like 3,000 pages of documents and reports by various institutions and observatories to better understand the data (we are not experts in this area). By matching results and leads, we created a kind of “content curation” from the documents, extracting what were, for us, the most important journalistic issues. For instance, we discovered that the Italian government (with the EU) provided €6 million to the public agency overseeing confiscated goods to build a big database to collect these data, but no one did anything, no one knows where the money went, and no one ever saw the project.

Regarding the skills and activities we developed:

  • Data (and Story) Mining: This was a big chapter of the investigation. We mined official documents and the web to find results and statistics that matched the data scraped from the public agency for confiscated goods. Sometimes you need to be determined to understand exactly which stage of the process an asset has reached. For example, has it been seized, confiscated, frozen by law, or ceded to an NGO?
  • Coding and Geo Issues: To show confiscated goods on a map, we needed to develop a visualization tool. This was created by Alessio Cimarelli using only open source tools (Leaflet, D3.js, OSM Nominatim, and others). Data are shown for the Italian regions as absolute values, not normalized by population or other dimensions, because we aimed to draft a kind of raw overview: to show where the Mafia spent its money, and the differences between big cities and small towns.
  • Content Curation: We reasoned that every confiscation should also have been reported by a newspaper. Starting from this idea, we aggregated stories from newspaper archives by region and by the most important bosses from whom goods were confiscated. Working this way (and after matching the results with the quantitative data) we could draw an overview of which mafia was involved (the Sicilian Mafia, the Camorra, the ’Ndrangheta), showing a kind of distribution by region.
  • The Review Process: Working in a team is very helpful for pointing out mistakes, but even better, in my honest opinion, was sharing article drafts with other members of the project.
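The choice between absolute and normalized values mentioned under “Coding and Geo Issues” is easy to see with a tiny example. The sketch below is only an illustration: the region names, asset records, and population figures are invented, and it is not the code behind the project's map.

```python
from collections import Counter

# Invented records and populations, for illustration only.
assets = [
    {"region": "Region A", "status": "confiscated"},
    {"region": "Region A", "status": "seized"},
    {"region": "Region A", "status": "ceded"},
    {"region": "Region B", "status": "confiscated"},
]
population = {"Region A": 300_000, "Region B": 50_000}


def absolute_counts(records):
    """Count assets per region (the raw overview the map shows)."""
    return Counter(record["region"] for record in records)


def per_100k(counts, population):
    """Normalize counts to assets per 100,000 inhabitants."""
    return {
        region: n * 100_000 / population[region]
        for region, n in counts.items()
    }


counts = absolute_counts(assets)
rates = per_100k(counts, population)
status_counts = Counter(record["status"] for record in assets)
```

With these made-up numbers, Region A leads in absolute terms (3 assets against 1), while Region B leads once the counts are divided by population. That difference is why it matters to say which of the two a map is showing.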

Step 3: Looking Forward, We’re Doing Database Journalism

After we published, we gave the data back to Confiscati Bene by uploading it to a data catalog built with DKAN (a Drupal-based open data platform similar to CKAN, provided for us by Twinbit). We’re part of this project’s team, so we’re interested in improving it, collecting other data, and developing other chapters (for instance, across Europe). With the release of the project in 19 newspapers, we have now successfully disseminated not only the news from the project but the data itself, and we’re continuing to update the data. I don’t know where we will end up, but I know we’re moving forward and trying to improve this, so maybe you will hear more about Confiscati Bene.


This post was originally published at DataNinja.it and is posted here with permission. Andrea Nelson Mauro is a data journalist and founder of Dataninja.it and Datamediahub.it, and is part of SpaghettiOpenData.org and OpenDataSicilia.it.

