January 11, 2016

Inside a Pioneering Italian Data Journalism Collaboration


Editor’s Note: Confiscati Bene, released in mid-December in Europe, is a pioneering data journalism collaboration that digs into the $4 billion in goods confiscated from criminals by authorities across the EU. An international team of journalists and their allies sought to create a European database of seized assets and answer troubling questions about the accountability of the process. Confiscati Bene (literally, Well Confiscated) received support from GIJN member JournalismFund.eu; the main project can be seen at http://eu.confiscatibene.it

In this post, Andrea Nelson Mauro, founder of project leader Dataninja.it (another GIJN member), tells us about organizing the project in Italy, which involved a diverse group of journalists, activists, and technologists. Mauro describes the investigation step by step (it has now been published in 19 Italian newspapers and across Europe) and explains the various tools used, including web scraping, content curation, data mining, and coding. We expect more great projects from this group.

On September 5 we had a Publication Day in Italy for our investigation into goods confiscated from the Mafia: one national newspaper (L’Espresso) and 18 websites of the same publisher (Repubblica-L’Espresso) put our series online (see the map below). The series reveals how many buildings and companies have been seized, region by region, to whom they belonged, and what the government is doing to give these assets back to Italian citizens. It was a big opportunity and an amazing experience for those of us working on the investigation. We began the project in July 2015.

Meanwhile, a very interesting blog post by Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami, appeared on the NiemanLab website. Titled “Data Journalism Needs To Up Its Own Standards,” the post discussed over-promising by projects such as FiveThirtyEight and Vox.com, which should “treat their data with more scientific rigor,” according to Cairo. In these and the other examples he cited you may find, IMHO, a lot of interesting suggestions, especially if you’re doing journalism with data; they are the kind of issues and doubts we see every day in our jobs inside Dataninja’s pipeline. Until now data journalism, as I see it, has leaned too heavily on descriptive statistics, data visualizations, predictive analysis, and special effects on the web (the “Wow! Effect,” as some friends say, or “map-itis,” for people who publish a map every minute without any news value).

So, I’d like to share what we did for the project “Confiscati Bene” (literally, “Well Confiscated”) with the aim of starting a dialogue and getting feedback on what we did well and what we need to improve.

Step 1: From Meeting the Open Data Project “Confiscati Bene” to Working Inside

Spaghetti Open Data (SOD) is a group of Italian citizens interested in the release of public data in open format.

The open data world gave me a great opportunity to refactor my skills, and some years back I joined Italy’s “Spaghetti Open Data” community. In March 2014, during a hackathon, we developed the first version of “Confiscati Bene,” an independent, citizen-powered project to open up data on goods seized from the Mafia. As a first step, all the data was scraped from the official website of the agency that maintains a database of confiscated goods. What a great opportunity! Not only to publish the data, but to try to improve the project with our journalistic and data skills. We joined the team and helped build an online platform with a data catalog on Mafia assets that needed to be updated. Working this way we learned a lot about confiscated goods (by reading Acts of Parliament and discovering various reports and documents); team members shared these documents on a project mailing list. How long would I have spent finding these resources on my own, instead of having a team that shared them quickly? How much could people help us (as journalists) do our jobs better, if only we gave them the opportunity? Doing it together, and not only with journalists, should work better.

Step 2: From Starting the Investigation to Publishing in 19 Newspapers and on the Web

By the end of July we had started our investigation and built a team of three journalists (Andrea Nelson Mauro, that’s me! 😉; Alessio Cimarelli; and Gianluca De Martino). We read something like 3,000 pages of documents and reports by various institutions and observatories to better understand the data (we are not experts in this area). By matching results and leads, we did a kind of “content curation” on the documents, extracting what were, for us, the most important journalistic issues. For instance, we discovered that the Italian Government (with the EU) provided €6 million to the public agency overseeing confiscated goods to build a big database to collect these data, but no one did anything, no one knows where the money went, and no one ever saw the project.

Regarding the skills and activities we developed:

  • Data (and Story) Mining: This was a big chapter of the investigation. We mined official documents and the web for results and statistics that matched the data scraped from the public agency for confiscated goods. Sometimes you need to be determined to understand exactly which stage of the process a good has reached. For example, has it been seized, confiscated, frozen by law, or ceded to an NGO?
  • Coding and Geo Issues: To show confiscated goods on a map, we needed to develop a visualization tool. This was created by Alessio Cimarelli, using only open source tools (Leaflet, D3.js, OSM Nominatim, and others). Data are shown for the Italian regions as absolute values, not normalized by population or other dimensions, because we aimed to draft a kind of raw overview: to show where the Mafia spent the money, and the differences between big cities and small towns.
  • Content Curation: We assumed that every confiscation would also have been reported by newspapers. Starting from this idea, we aggregated stories from newspaper archives by region and by the most important bosses from whom goods were confiscated. Working this way (and after matching the results with quantitative data) we could draw an overview of which mafia was involved (the Sicilian Mafia, the Camorra, the ’Ndrangheta), showing a kind of distribution by region.
  • The Review Process: Working in a team is very helpful for pointing out mistakes, but even better, in my honest opinion, was sharing article drafts with other members of the project.
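
For readers who want to see what the choice described under Coding and Geo Issues means in practice, here is a minimal Python sketch of the difference between absolute counts per region (what the map shows) and a population-normalized rate (what it deliberately does not show). Everything in it is an assumption made for illustration: the record fields, the sample records, the population figures, and the function names are hypothetical and are not taken from the Confiscati Bene dataset or code.

```python
from collections import Counter

# Hypothetical sample records: the real dataset's fields and values
# are not described in this post.
SAMPLE_ASSETS = [
    {"region": "Sicilia", "kind": "building", "status": "confiscated"},
    {"region": "Sicilia", "kind": "company", "status": "seized"},
    {"region": "Sicilia", "kind": "building", "status": "confiscated"},
    {"region": "Lombardia", "kind": "building", "status": "confiscated"},
    {"region": "Calabria", "kind": "company", "status": "confiscated"},
]

# Placeholder population figures, invented for illustration only.
SAMPLE_POPULATION = {
    "Sicilia": 5_000_000,
    "Lombardia": 10_000_000,
    "Calabria": 2_000_000,
}


def count_by_region(assets):
    """Absolute number of assets per region (the raw overview)."""
    return Counter(asset["region"] for asset in assets)


def per_100k(counts, population):
    """Assets per 100,000 residents, for regions with a known population."""
    return {
        region: n / population[region] * 100_000
        for region, n in counts.items()
        if region in population
    }


counts = count_by_region(SAMPLE_ASSETS)
rates = per_100k(counts, SAMPLE_POPULATION)

for region, n in counts.most_common():
    print(f"{region}: {n} assets, {rates[region]:.3f} per 100k residents")
```

With absolute values, regions with many confiscations stand out regardless of their size, which fits the stated aim of a raw overview; a per-capita rate would answer a different question and could change the ranking of regions.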

Step 3: Looking Forward, We’re Doing Database Journalism

After we published, we gave the data back to Confiscati Bene by uploading it to a data catalog developed with DKAN (a Drupal-based CMS similar to CKAN, powered for us by Twinbit). We’re part of this project’s team, so we’re interested in improving it, collecting other data, and developing other chapters (for instance, across Europe). With the release of the project in 19 newspapers, we’ve now successfully disseminated not only the news from the project but the data itself, and we’re continuing to update the data. I don’t know where we will end up, but I know we’re moving forward and trying to improve this, so maybe you will hear more about Confiscati Bene.


This post was originally published at DataNinja.it and is posted here with permission. Andrea Nelson Mauro is a data journalist and founder of Dataninja.it and Datamediahub.it, and is part of SpaghettiOpenData.org and OpenDataSicilia.it.
