Inside a Pioneering Italian Data Journalism Collaboration

Editor’s Note: Confiscati Bene, released in mid-December in Europe, is a pioneering data journalism collaboration that digs into the $4 billion of goods in the EU confiscated from criminals by European authorities. An international team of journalists and their allies sought to create a European database of seized assets and answer troubling questions about the accountability of the process. Confiscati Bene (literally, Well Confiscated) received support from GIJN member; the main project can be seen at

In this post, Andrea Nelson Mauro, founder of project leader (another GIJN member) tells us about organizing the project in Italy, which involved a diverse group of journalists, activists, and technologists. Mauro describes the investigation step by step (it is now published in 19 Italian newspapers and across Europe) and explains the various tools used, including web scraping, content curation, data mining, and coding. We expect more great projects from this group.

On September 5 we had a Publication Day in Italy for our investigation into goods confiscated from the Mafia: one national newspaper (L’Espresso) and 18 websites of the same publisher (Repubblica-L’Espresso) put our series online (see the map below). The series reveals how many buildings and companies have been seized, region by region, to whom they belonged, and what the government is doing to give these assets back to Italian citizens. It was a big opportunity and an amazing experience for those of us working on the investigation. We began the project in July 2015.

Meanwhile, a very interesting blog post by Alberto Cairo, the Knight Chair in Visual Journalism at the University of Miami, appeared on the NiemanLab website. Titled “Data Journalism Needs To Up Its Own Standards,” the story talked about over-promises from FiveThirtyEight and Vox.com, projects that should “treat their data with more scientific rigor,” according to Cairo. In these and the other examples he cited you may find, IMHO, a lot of interesting suggestions, especially if you’re doing journalism with data, along with the kind of issues and doubts we see every day in our jobs inside Dataninja’s pipeline. Until now data journalism, as I have seen it, has developed too much as descriptive statistics, data visualizations, predictive analysis, and special effects on the web (the “Wow! Effect,” as some friends say, or “map-itis,” for people who publish a map every minute without any news value).

So, I’d like to share what we did for the project “Confiscati Bene” (literally, “Well Confiscated”) with the aim of starting a dialogue and getting feedback on what we did well and what we need to improve.

Step 1: From Meeting the Open Data Project “Confiscati Bene” to Working Inside

Spaghetti Open Data (SOD) is a group of Italian citizens interested in the release of public data in open format.

The open data world gave me a great opportunity to refactor my skills, and some years back I joined Italy’s “Spaghetti Open Data” community. In March 2014, during a hackathon, we developed the first version of “Confiscati Bene,” an independent project powered by citizens to open data on goods seized from the Mafia. As a first step, all the data was scraped from the official website of the agency that maintains a database of confiscated goods. What a great opportunity! Not only for publishing the data, but for trying to improve the project with our journalistic and data skills. We joined the team and helped build an online platform with a data catalog on Mafia assets that needed to be updated. Working this way we learned a lot about confiscated goods (by reading Acts of Parliament and discovering various reports and documents). Team members shared these documents on a project mailing list. How long would I have spent finding these resources on my own, instead of having a team that shared them quickly? How much could people help us (as journalists) do our jobs better, if only we gave them the opportunity? Doing it together, and not only with journalists, should work better.

Step 2: From Starting the Investigation to Publishing in 19 Newspapers and on the Web

By the end of July we had started our investigation and built a team of three journalists (Andrea Nelson Mauro, that’s me!; Alessio Cimarelli; and Gianluca De Martino). We read something like 3,000 pages of documents and reports by various institutions and observatories to better understand the data (we are not experts in this area). By matching results and leads, we created a kind of “content curation” from the documents, extracting the journalistic issues most important to us. For instance, we discovered that the Italian Government (with the EU) provided €6 million to the public agency overseeing confiscated goods to build a big database to collect these data, but no one did anything, no one knows where the money went, and no one ever saw the project.

Regarding the skills and activities we developed:

  • Data (and Story) Mining: This was a big chapter of the investigation. We did it on official documents and also on the web, looking for results and statistics that matched the data scraped from the public agency for confiscated goods. Sometimes you need to be determined to understand exactly which stage an asset has reached. For example, has it been seized, confiscated, frozen by law, or ceded to an NGO?
  • Coding and Geo Issues: To show confiscated goods on a map, we needed to develop a visualization tool. This was created by Alessio Cimarelli, using only open source tools (Leaflet, D3.js, OSM Nominatim, and others). Data are shown for the Italian regions as absolute values, not normalized by population or other dimensions, because we aimed to draft a kind of raw overview: to show where the Mafia spent the money, and the differences between big cities and small towns.
  • Content Curation: We assumed every confiscation would also have been reported in the newspapers. Starting from this idea, we aggregated stories from newspaper archives by region and by the most important bosses from whom goods were confiscated. Working this way (and after matching results with quantitative data) we could draw an overview showing which mafia was involved (the Sicilian Mafia, the Camorra, the Ndrangheta) and a kind of distribution by region.
  • The Review Process: Working in a team is very helpful for pointing out mistakes, but even better, in my honest opinion, was sharing article drafts with other members of the project.
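
As a rough illustration of the choice described under Coding and Geo Issues (absolute values per region instead of values normalized by population), here is a minimal Python sketch. It is not the project's actual code: the records, the field names (`region`, `status`) and the round population figures are all invented for the example, and the helper names `count_by_region` and `per_100k` are hypothetical.

```python
from collections import Counter

# Hypothetical records, one per asset, roughly as they might look after
# scraping. The regions and statuses here are made up for illustration.
assets = [
    {"region": "Sicilia", "status": "confiscated"},
    {"region": "Sicilia", "status": "seized"},
    {"region": "Sicilia", "status": "confiscated"},
    {"region": "Lombardia", "status": "confiscated"},
    {"region": "Lombardia", "status": "ceded"},
    {"region": "Calabria", "status": "confiscated"},
]

# Illustrative round numbers only, NOT official population statistics.
population = {
    "Sicilia": 5_000_000,
    "Lombardia": 10_000_000,
    "Calabria": 2_000_000,
}


def count_by_region(rows):
    """Absolute number of assets per region (what the map displays)."""
    return Counter(row["region"] for row in rows)


def per_100k(counts, population):
    """The same counts expressed per 100,000 residents."""
    return {
        region: counts[region] / population[region] * 100_000
        for region in counts
    }


absolute = count_by_region(assets)
normalized = per_100k(absolute, population)

# Print regions from most to fewest assets, with both measures side by side.
for region, n in absolute.most_common():
    print(f"{region}: {n} assets, {normalized[region]:.3f} per 100,000 residents")
```

With these invented numbers the two measures rank the regions differently, which is the trade-off behind choosing a raw overview: absolute counts show where the assets are, while normalized values show how dense they are relative to the people living there.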

Step 3: Looking Forward, We’re Doing Database Journalism

After we published, we gave the data back to Confiscati Bene by uploading it to a data catalog developed with DKAN (a Drupal-based CMS similar to CKAN, powered for us by Twinbit). We’re part of the team of this project, so we’re interested in improving it, collecting other data, and developing other chapters (for instance across Europe). With the release of the project in 19 newspapers, we’ve now successfully disseminated not only the news from the project but the data itself, and we’re continuing to update the data. I don’t know where we will end up, but I know we’re moving forward and trying to improve this, so maybe you will hear yet more about Confiscati Bene.


This post was originally published at and is posted here with permission. Andrea Nelson Mauro is a data journalist and founder of and, and is part of and
