Reporters were given tips on the best tools and latest facial recognition techniques to help their investigations, at a panel at GIJC23. Image: Wiktoria Gruca for GIJN
Beyond Facial Recognition: State-of-the-Art Research Techniques
Investigative journalism thrives on a blend of high-tech tools and traditional research techniques, and in the field’s fast-evolving landscape, having the best tools and techniques can help newsrooms and reporters stay ahead of the curve and tell impactful stories.
At a panel on state-of-the-art research techniques at the 13th Global Investigative Journalism Conference (#GIJC23), moderated by dataLEADS CEO Syed Nazakat, OCCRP research head Karina Shedrofsky and ICIJ training manager Jelena Cosic discussed recent investigations they’ve worked on and their favorite methods and tools, from facial recognition services to document categorizing.
Beyond Facial Recognition
Shedrofsky cited one of her latest projects — investigating an alleged cryptocurrency scam among teachers in Russia — to demonstrate how she uses one of her favorite tools. She had a photograph of a potential subject, but nothing else, so she started with PimEyes, a reverse image search service with facial recognition capability.
PimEyes returned many results and links, including an individual’s name and another photograph, possibly of the same person. So she turned to Amazon’s Rekognition, which compares faces to estimate whether two images show the same person, and it returned a 98% match. But Shedrofsky stressed the importance of verifying facial recognition results, even those with very high confidence rates, because these services can sometimes fail.
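For readers who want to try the comparison step themselves, here is a minimal sketch in Python. It assumes the boto3 library and configured AWS credentials, and it uses Rekognition’s CompareFaces operation. The panel did not describe how Shedrofsky ran the comparison, so this is not her workflow; the function names and file paths are hypothetical.

```python
from typing import Any, Optional


def best_similarity(response: dict) -> Optional[float]:
    """Return the highest Similarity score in a CompareFaces response, or None if no match."""
    matches = response.get("FaceMatches", [])
    if not matches:
        return None
    return max(match["Similarity"] for match in matches)


def compare_photos(client: Any, source_bytes: bytes, target_bytes: bytes,
                   threshold: float = 80.0) -> Optional[float]:
    """Compare the largest face in the source image against faces in the target image."""
    response = client.compare_faces(
        SourceImage={"Bytes": source_bytes},
        TargetImage={"Bytes": target_bytes},
        SimilarityThreshold=threshold,  # matches scoring below this are not returned
    )
    return best_similarity(response)


def run_example(path_a: str, path_b: str) -> Optional[float]:
    """Hypothetical entry point: needs boto3, AWS credentials, and a configured region."""
    import boto3  # third-party library, imported here so the helpers above work without it

    client = boto3.client("rekognition")
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return compare_photos(client, a.read(), b.read())
```

A high score is only a lead. As Shedrofsky cautioned, it still needs to be confirmed through other reporting.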
When in another case PimEyes yielded no results, she turned to search4faces, a service that indexes VK (VKontakte), a popular Russian social media platform. Here, she found what she needed.
As a third example, she cited a story on a businessman said to be acting as a proxy for a sanctioned Russian oligarch. A reporter asked Shedrofsky for help proving the identity of the businessman’s son — when the reporter had spoken to the son, he had denied that the man in question was his father.
What Shedrofsky knew were their full names, the son’s date of birth, and the son’s place of work: a majority state-owned Russian bank. This time, she used Pipl, a tool that’s very good at linking a person’s physical presence to their online presence; entering an email or a phone number can yield a person’s social media accounts or physical address.
Because the son has a fairly common name, the search returned many results, but one of them had an email with the bank’s domain. Then, she found a Facebook account she thought belonged to him, but it was a private account, with no information. This seemed like a dead end, but she knew an important trick: on private Facebook accounts, you can still search by clicking the three horizontal dots “(…)” in the right corner of the profile page — anything publicly posted on a timeline is searchable.
Shedrofsky searched for birthdays, relevant names, anything that came to mind. She saw that this profile had received birthday messages on the date she knew to be the son’s birthday.
At that point, she was pretty confident that this was the correct profile. Searching for the word “love,” she found out the name of his wife. She then ran custom Google searches with their first names and surname, and found the website of the photography company that shot their wedding. To her very welcome surprise, the company had posted the entire wedding album. When she found a photo in the album of the man she suspected was the father, she ran it through PimEyes, which found a match.
Shedrofsky said that her favorite tools lately — besides OCCRP’s Aleph database — are OpenCorporates, and contact information apps such as Truecaller and Rocketreach.
Old-School Techniques, Cutting Edge Tools
Cosic, from the International Consortium of Investigative Journalists, said she uses a combination of old-school techniques and modern tools. She showcased one of ICIJ’s latest projects: Deforestation Inc., a cross-border investigation that exposed companies that use environmental certifications to brand themselves as “sustainable” but are accused of contributing to forest destruction and committing human rights violations. This wasn’t an easy project, and the team had to build its own database from a variety of sources:
- Certification bodies and auditors.
- The EU Timber Regulation (EUTR) list of violations by country.
- Reports on environmental violations from NGOs and country reports.
- Trade data from ImportGenius.
- FOIs, corporate documents, marketing materials, and court filings.
- Parent company data from Orbis and Factiva.
All these datasets had to be harmonized so that the information could be accessed from a single master database. Cosic highlighted the importance of identifying research methodology before starting investigations of this kind.
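To give a rough idea of what harmonizing can involve, the sketch below merges rows from differently shaped sources into one record per company by normalizing company names. It is an illustration only, not ICIJ’s method: the suffix list, the field name `company`, and the function names are all assumptions, and real projects usually need fuzzier matching plus manual review.

```python
import re
from collections import defaultdict

# Legal-form suffixes to drop when matching names (an illustrative, incomplete list).
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "gmbh", "sa", "co", "corp"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation and trailing legal suffixes so spelling variants match."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def build_master(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge rows from several sources into one entry per normalized company name."""
    master: dict[str, dict] = defaultdict(lambda: {"names": set(), "records": []})
    for source_name, rows in sources.items():
        for row in rows:
            key = normalize_name(row["company"])
            master[key]["names"].add(row["company"])
            # Keep the original row and remember which source it came from.
            master[key]["records"].append({"source": source_name, **row})
    return dict(master)


# Invented sample rows, for illustration only.
sample = {
    "certifiers": [{"company": "Acme Timber Ltd.", "certificate": "X-1"}],
    "trade_data": [{"company": "ACME TIMBER LIMITED", "shipments": 3}],
}
master = build_master(sample)
```

Recording the source on every row keeps each claim traceable to where it came from, which matters when the master database feeds fact-checking.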
Cosic said her all-time favorite tool is ICIJ’s Datashare — which runs OCR (optical character recognition) technology on uploaded documents to make them searchable. It also automatically detects and filters documents by people, organizations, and locations, making searches more efficient.
‘More Art than Science’
Shedrofsky and Cosic both acknowledged investigative journalism’s many challenges, with the former observing that the field is “more art than science,” and is constantly changing. “Staying on top of evolving crimes is a continuous challenge,” Shedrofsky cautioned.
For her part, Cosic highlighted the difficulty of obtaining information from China, the limitations imposed by GDPR (the EU’s data protection regulation), and the need to navigate offshore data and domain registration complexities.
But, the pair pointed out, there are ways to stay ahead of the curve. Here are some of their tips for investigative journalists:
- Spreadsheets are your best friends — use them to organize your data.
- Seek guidance from data experts for effective data management.
- Label and organize downloaded documents into folders.
- Explore Chrome add-ons for capturing entire web pages and use Wayback Machine add-ons for search history preservation.
- Structure and tag documents for effective categorization.
- Collaborate securely using double-encrypted open-source platforms.
- Recognize the value of diverse skills and backgrounds in investigative journalism.
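Several of these tips (spreadsheets, labeled folders, tagged documents) can be combined in a small script. The sketch below copies a downloaded file into a named folder and logs its path, a SHA-256 checksum, tags, and source URL in a CSV manifest that opens in any spreadsheet program. The folder layout, column names, and function name are assumptions made for illustration, not something the panelists prescribed.

```python
import csv
import hashlib
import shutil
from pathlib import Path


def file_document(src: Path, archive_root: Path, folder: str, tags: list[str],
                  source_url: str = "") -> Path:
    """Copy a downloaded file into archive_root/folder and log it in manifest.csv."""
    dest_dir = archive_root / folder
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves the file's timestamps

    # A checksum lets you show later that the archived copy has not changed.
    digest = hashlib.sha256(dest.read_bytes()).hexdigest()

    manifest = archive_root / "manifest.csv"
    is_new = not manifest.exists()
    with manifest.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["path", "sha256", "tags", "source_url"])
        writer.writerow([
            dest.relative_to(archive_root).as_posix(),
            digest,
            ";".join(tags),
            source_url,
        ])
    return dest
```

For example, `file_document(Path("filing.pdf"), Path("archive"), "court_filings", ["court", "2023"])` would place the file under `archive/court_filings/` and append one row to `archive/manifest.csv`.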