My Favorite Tools with Russia’s Roman Anin

For GIJN’s My Favorite Tools series, we spoke with Roman Anin, the 33-year-old founder and editor-in-chief of IStories, a nonprofit Russian investigative news site.

Launched this year and based in Moscow, IStories — short for “Important Stories” — has 13 staff. So far it has investigated topics as varied as the persecution of opposition politician Alexei Navalny, nepotism within the country’s waste management market, and a dubious coronavirus antibody test.

Born and raised in Moldova, Anin was initially set on becoming a professional soccer player. But at age 17 he left his youth team when his family moved to Russia. Intent on retaining a link to the beautiful game, he studied journalism at Moscow State University — in the hope of becoming a sports commentator — before joining the renowned Moscow-based newspaper Novaya Gazeta as a sports writer in 2006. 

Novaya Gazeta, a beleaguered publication that has seen six of its staff murdered since 2000, has long had a strong investigative culture. As a result, rather than simply covering sporting events, Anin soon found himself digging into stories of corruption in soccer, including match fixing.

In August 2008, while most of his colleagues were away on holiday, the Russo-Georgian war broke out. Novaya Gazeta sent Anin to report from the front line. On his return, he joined the newspaper’s investigative team, where he remained until earlier this year. In that role he reported on the notorious fraud case uncovered by lawyer and tax auditor Sergei Magnitsky; the Panama Papers’ revelations regarding Sergey Roldugin, a billionaire cellist and lifelong friend of Vladimir Putin; construction contract corruption at the 2014 Winter Olympics in Sochi; and Aslan Gagiyev, accused of being the architect of a prolific murder squad. Some of these investigations he conducted in association with the Organized Crime and Corruption Reporting Project (OCCRP), an international consortium of which he has been a member since 2009.

Anin has received many reporting awards, among them the Knight International Journalism Award in 2013 for his work on Magnitsky, and three of the most prestigious awards in Russian investigative journalism: the Artem Borovik award, the Youlian Semenov award, and the Andrey Sakharov award. He was also awarded the International Center for Journalists’ 2020 Knight Trailblazer Award for having launched IStories. His reporting on Magnitsky prompted criminal investigations in several countries, while his reporting on the Sochi games led then-President Dmitry Medvedev to demand an inquiry. But as might be expected in a country ranked 149 out of 180 in Reporters Without Borders’ 2020 World Press Freedom Index, Russian authorities do not usually take kindly to his investigative work. (For his part, Anin acknowledges that Russia is “not the safest country” for journalists but adds that it is even worse elsewhere.)

Anin spent the 2018-19 academic year at Stanford University, as a John S. Knight Journalism Fellow, taking classes in coding and psychology — both useful skills for journalism, he says — and it was there that he conceived IStories.

Here, in Anin’s own words, are some of his favorite tools:

VeraCrypt

“VeraCrypt allows you to create encrypted folders in which you can keep data securely. Before VeraCrypt, I used TrueCrypt, which I learned about from Julian Assange while I worked on Cablegate, the United States’ diplomatic cables leak.

“I came to London to retrieve from WikiLeaks the cables from the US embassy in Russia. To transport the data, I couldn’t just keep it openly on my laptop, or even online; I had to keep it securely. To do that, I placed the data in encrypted folders. If anyone had checked my laptop, they would not have been able to find the folders; even if they had, they would not have been able to decrypt them. 

“Now I use VeraCrypt — an open source tool — which does the same job. You can create encrypted folders on your computer and, if you want, upload them online. It also allows you to camouflage the folder so that it doesn’t look like a data folder, and instead looks like an app or a movie. 

“I use VeraCrypt every day to encrypt all of my investigative work.”

LastPass

“LastPass stores encrypted passwords. It also allows you to synchronize your devices so that they all hold your passwords securely. You then access all your passwords on LastPass thanks to a master password. This tool allows you to use many complex passwords, and change them frequently, without having to remember them all. I use it on a daily basis.

“I know how important security is because I was hacked once. The hack was very sophisticated: The hackers blocked my SIM card and created a duplicate of it; then they requested a recovery password from my Gmail account to my phone number, which they received on the SIM they had issued. My advice, in corrupt and authoritarian countries such as Russia, is to never use a phone number as a recovery or even as a second factor, beyond your password, in a two-step authentication. (I instead use Google Authenticator as the second factor.)

“But my case is rare. The majority of people are being hacked because they use weak passwords and, worse still, use the same password across different accounts. LastPass helps you avoid that pitfall. You may wonder: ‘If LastPass is hacked, will all of my passwords be compromised?’ The answer is no. LastPass’s server has indeed been hacked, but none of the passwords were compromised because LastPass doesn’t store the passwords, only ‘hashed’ versions of the passwords, which are indecipherable.”
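To illustrate what storing a "hashed" password rather than the password itself can look like, here is a minimal sketch using Python's standard library. It is not LastPass's actual scheme, which the article does not describe; the function names, salt length, and iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative iteration count (an assumption, not a recommendation from the article).
ITERATIONS = 200_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values would be stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("a wrong guess", salt, stored))
```

Someone who steals only the salt and digest cannot read the password back out; they can only guess candidates and hash each one, which is why long, unique passwords matter.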

OpenRefine

“OpenRefine allows you to clean messy data, which sometimes is really hard. I use Python in the majority of cases, but for people who can’t program, OpenRefine is really a great tool.

“Imagine that you have a spreadsheet, with millions of rows, about state contracts. Of course, in such a huge amount of data there will be mistakes, for example in the names of the suppliers, or the dates will be mixed up, or some of the rows will be missing, or some of the prices will be written in different formats. How would you then sum it up? How would you calculate an average? You have to first put all of the data in the same format, which we call cleaning the data. OpenRefine allows you to do that easily.

“I used OpenRefine in the context of my story on waste management, because I had thousands of rows of data about different landfills in different regions. I wanted to analyze where the biggest landfills were.

“First I used a program called Tabula, which allowed me to extract tables from publicly available PDFs into Excel, then I uploaded those Excel tables into OpenRefine. Without OpenRefine it would have been virtually impossible to do. I would have had to go through each line of the spreadsheet to check whether everything was in the same format.

“I used to spend months cleaning data.”
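For readers who do program, the kind of cleaning described above (inconsistent supplier names, mixed date formats, prices written in different ways) might be sketched in Python roughly as follows. This is an illustration only, not Anin's code; the function names, the list of date formats, and the rule for a lone comma are assumptions.

```python
import re
from datetime import datetime
from typing import Optional

# Date layouts we assume may appear in the data (an illustrative, incomplete list).
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")
# Quote characters to drop from company names.
QUOTES = str.maketrans("", "", "\"'«»“”")


def clean_supplier(name: str) -> str:
    """Drop quotes, collapse whitespace and unify case so name variants match."""
    name = name.translate(QUOTES)
    return re.sub(r"\s+", " ", name).strip().upper()


def clean_date(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_price(text: str) -> Optional[float]:
    """Parse prices such as '1 200,50', '1,200.50' or '1200.5 руб.'."""
    digits = re.sub(r"[^\d,.]", "", text).strip(",.")
    if not digits:
        return None
    if "," in digits and "." in digits:
        # When both separators appear, treat the last one as the decimal mark.
        if digits.rfind(",") > digits.rfind("."):
            digits = digits.replace(".", "").replace(",", ".")
        else:
            digits = digits.replace(",", "")
    elif "," in digits:
        # Assumption: a lone comma is a decimal mark, as in '1200,50'.
        digits = digits.replace(",", ".")
    try:
        return float(digits)
    except ValueError:
        return None


print(clean_supplier('  "Acme"   Ltd '))
print(clean_date("05.08.2020"))
print(clean_price("1 200,50"))
```

Once every row passes through functions like these, sums and averages become meaningful, because each column holds values in a single format. Values that cannot be parsed come back as None so they can be reviewed by hand rather than silently guessed.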

Coding

“I code in two programming languages, Python and JavaScript. Python I mainly use to collect and analyze data, and to automate some of the tasks, while JavaScript — in particular the D3 library of JavaScript — I use to visualize data.

“One of our first pieces at IStories was about state contract procurement. In the context of the recent constitutional referendum, I decided to analyze how much money the state spent on it, and on what specifically.

“Without coding, it would have been impossible to do, because I had compiled 400,000 publicly available contracts. I needed to analyze them, find out which were the biggest ones, sort them, find patterns in them. I used Python to analyze the data, which led to one of the most popular stories on our website. What I found was that the state spent a lot of money on buying millions and millions of masks and protective equipment to be used by officials at polling stations, while doctors were suffering from a lack of protective gear during the coronavirus pandemic. That tells you what the government’s priorities were.

“I spent about a week on this story, which would have been impossible without coding. How else would I have analyzed 400,000 contracts?
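The analysis described here, finding the biggest contracts, sorting them, and looking for patterns in spending, can be sketched in a few lines of Python. The column names and the four sample rows below are invented for illustration; they are not taken from the procurement data Anin analyzed.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a large CSV export of contracts.
SAMPLE_CSV = """contract_id,supplier,subject,price
1,Supplier A,masks,1500000
2,Supplier B,printing,200000
3,Supplier A,masks,2500000
4,Supplier C,transport,750000
"""


def load_contracts(handle):
    """Read contracts from a CSV file object and convert prices to numbers."""
    rows = []
    for row in csv.DictReader(handle):
        row["price"] = float(row["price"])
        rows.append(row)
    return rows


def largest_contracts(rows, n=3):
    """Return the n most expensive contracts."""
    return sorted(rows, key=lambda r: r["price"], reverse=True)[:n]


def total_by(rows, field):
    """Sum contract prices per value of `field`, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[field]] += row["price"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


contracts = load_contracts(io.StringIO(SAMPLE_CSV))
print(largest_contracts(contracts, n=2))
print(total_by(contracts, "subject"))
```

The same functions work unchanged on hundreds of thousands of rows: replace the sample string with an opened file, for example `open("contracts.csv", newline="", encoding="utf-8")`, where the file name is again hypothetical.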

“That investigation inspired me to create a tool by coding in Python a small app that scrapes the information released every day on the official federal procurement website, and combines that data with information from the company registry. It then performs a superficial but very useful analysis about the suppliers mentioned in these contracts: When was the company established? How many people work at the company? What is the turnover? Who owns it, etc.? The app then compiles the answers in an HTML file, which it emails out to me and to my reporters every day. It saves us a lot of time.

“In the old days, I checked the procurement database from time to time, when I had a little spare time. Now everything is done automatically. It takes me mere seconds to read through the newsletter.”
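The pipeline Anin describes (combine new contracts with company registry details, compile an HTML file, email it out) could look something like the sketch below. It is a guess at the general shape, not his app: the scraping step is left out because it depends on the specific website, and every field name, address, and sample record here is hypothetical.

```python
from email.message import EmailMessage
from html import escape

COLUMNS = ["supplier_id", "subject", "price", "established", "employees", "owner"]


def enrich(contracts, registry):
    """Attach registry details to each contract, matched on the supplier's ID."""
    return [{**c, **registry.get(c["supplier_id"], {})} for c in contracts]


def render_html(rows):
    """Build a plain HTML table; details missing from the registry show as 'unknown'."""
    header = "".join(f"<th>{escape(col)}</th>" for col in COLUMNS)
    body = ""
    for row in rows:
        cells = "".join(f"<td>{escape(str(row.get(col, 'unknown')))}</td>" for col in COLUMNS)
        body += f"<tr>{cells}</tr>"
    return f"<table><tr>{header}</tr>{body}</table>"


def build_email(html, sender, recipients):
    """Wrap the digest in an email with a plain-text fallback."""
    message = EmailMessage()
    message["Subject"] = "Daily procurement digest"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message.set_content("This digest is best viewed as HTML.")
    message.add_alternative(html, subtype="html")
    return message


# Hypothetical inputs: one day's contracts and a registry lookup table.
contracts = [
    {"supplier_id": "001", "subject": "masks", "price": 1000000},
    {"supplier_id": "002", "subject": "printing", "price": 50000},
]
registry = {"001": {"established": "2019", "employees": 3, "owner": "Owner X"}}

digest = render_html(enrich(contracts, registry))
message = build_email(digest, "digest@example.org", ["a@example.org", "b@example.org"])
# Actually sending it would use smtplib.SMTP(...).send_message(message) with real server details.
```

Run once a day by a scheduler, a script of this shape turns an occasional manual check into a short newsletter that can be skimmed in seconds.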

Russian Commercial Court Database

“Russia has what is probably the best public database in the world when it comes to our commercial courts. It’s completely free, and what makes it unique is that you can search it for keywords. In the majority of court databases, you can only do a search for the names of the parties. But this database will search for any keyword in all the rulings delivered in Russia’s commercial courts, then give you the results in PDF format. 

“The database does not search through materials filed in court or through transcripts of hearings; only the judges’ rulings are accessible. But these rulings summarize the case and provide the judges’ final decision, so they are a very useful resource.

“When I am investigating a company, I search for its name in this database to see whether it has been involved in any disputes in court.

“Once, I decided on a whim to search the database for ‘fraud,’ ‘billions,’ and ‘Gazprom’ to see whether the Russian gas giant Gazprom had been involved in any fraud cases involving billions of rubles. I found a case in which tax officers were suing one of Gazprom’s subsidiaries for buying equipment at an inflated price through an offshore company and ended up writing a story about it.”
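The difference between searching by party name and searching by keyword can be shown with a small Python sketch. It does not use the court database itself, whose interface the article does not describe; the ruling records below are invented placeholders.

```python
def search_rulings(rulings, keywords):
    """Return rulings whose full text contains every keyword, ignoring case."""
    wanted = [k.casefold() for k in keywords]
    return [r for r in rulings if all(k in r["text"].casefold() for k in wanted)]


# Invented placeholder records, not real rulings.
rulings = [
    {"case": "A-1", "parties": "Tax office v. Company X",
     "text": "The tax office alleges fraud involving billions of rubles."},
    {"case": "A-2", "parties": "Company X v. Company Y",
     "text": "A dispute over late delivery of equipment."},
]

hits = search_rulings(rulings, ["fraud", "billions"])
print([r["case"] for r in hits])
```

A party-name search would match both records for "Company X", while the keyword search surfaces only the ruling whose text mentions all the terms of interest.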

UN Comtrade Database and Import Genius

“One of my favorite online databases is the United Nations Comtrade Database. It allows you to see the data about export and import flows between different countries. It’s pretty easy to use and allows you to search by specific importer and exporter country, by product traded, and by time period.

“After Russia imposed sanctions on different European countries — it stopped buying some products from the countries which imposed sanctions on Russia — reporters wanted to know how that would affect Russia’s imports. UN Comtrade allows you to do that, by inputting Russia as the importer, and the rest of the world as the exporter. It then tells you how much of that product is being imported and its cost. That is just one example of how it might be used. I use this database a lot, including to learn where Russia officially exports its arms. (Unofficial trade doesn’t make it into the database.) 

“The last time I used it was after the huge explosion of ammonium nitrate in the Lebanese capital, Beirut, on August 4. According to news reports, the ship carrying these goods was on its way to Mozambique when it was stopped in Beirut. I wondered from whom Mozambique was buying these explosives, and found that the majority of its ammonium imports were from Ukraine, so we thought the story might lead to Ukraine. Actually, it led to Georgia, another exporter of ammonium nitrate to Mozambique. The database didn’t reveal the origin of this specific shipment but taught us more generally about Mozambique’s imports of this product.

“To learn about the specific shipment, you can use another database, which I like very much but is very expensive: Import Genius. You have to subscribe to use it. [Monthly rates vary from $99 to $399, as of October 2020.]

“It gives you data on specific shipments and the parties involved in those exports and imports. You can search for the name of the company you’re interested in or by its trading registration number.”
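The question asked of UN Comtrade above, which countries supply most of one product to one importer in a given period, comes down to a filter and a sum per exporter. The Python sketch below works on records already downloaded from such a database; it does not call any real API, and the field names, country labels, and values are made up for illustration.

```python
from collections import Counter


def imports_by_exporter(records, importer, commodity, year):
    """Total import value per exporting country, largest first."""
    totals = Counter()
    for r in records:
        if (r["importer"], r["commodity"], r["year"]) == (importer, commodity, year):
            totals[r["exporter"]] += r["value_usd"]
    return totals.most_common()


# Made-up records; real figures would come from the database export.
records = [
    {"importer": "Country M", "exporter": "Country U", "commodity": "ammonium nitrate",
     "year": 2019, "value_usd": 900},
    {"importer": "Country M", "exporter": "Country G", "commodity": "ammonium nitrate",
     "year": 2019, "value_usd": 300},
    {"importer": "Country M", "exporter": "Country U", "commodity": "wheat",
     "year": 2019, "value_usd": 5000},
    {"importer": "Country M", "exporter": "Country U", "commodity": "ammonium nitrate",
     "year": 2019, "value_usd": 100},
]

print(imports_by_exporter(records, "Country M", "ammonium nitrate", 2019))
```

As the article notes, aggregate figures like these show the overall pattern of trade but cannot identify an individual shipment; that requires shipment-level data.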


Olivier Holmey is a French-British journalist and translator living in London. His work has appeared in The Times, The Independent, Private Eye, NiemanLab, The Africa Report, and Jeune Afrique, among other publications.

Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
