GIJC25 - Uncovering Wrongdoing Using AI

Image: GIJN, YouTube


Tips for Using AI as a Reporting Tool to Uncover Wrongdoing

Investigative reporters often face a common dilemma: the data exists, sometimes in vast quantities, but it is inaccessible, unstructured, or too large to examine manually. The problem lies not only in discovering wrongdoing, but also in building systems that make patterns visible and tips actionable.

At the 14th Global Investigative Journalism Conference (GIJC25) in Kuala Lumpur, the session “Uncovering Wrongdoing Using AI: Methods, Techniques, and Challenges” brought together newsroom leaders and academics who have built those systems. The panel centered on replicable methods that journalists can apply across industries and countries.

Three approaches stood out: building searchable ownership and financial databases; creating AI-powered “digital democracy” systems that generate reporting tips; and designing AI agent and OCR (optical character recognition) workflows to process massive document archives.

Building a Searchable Fishing and Ownership Database

Fabrizio Palumbo, associate professor of data journalism at OsloMet and founder of the AI Journalism Resource Center, described how his team partnered with Norwegian newsrooms to work with official fisheries data.

Norway publishes detailed records of every registered fishing trip. Each entry includes the vessel, species caught, weight, time, and fishing area. The initial idea was straightforward: cross-check reported catches against quotas to detect underreporting or overreporting.
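The cross-check described above can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual code: the vessel names, the record layout, and the quota figures are all invented, and real fisheries data would need far more cleaning before a comparison like this is meaningful.

```python
from collections import defaultdict

# Hypothetical landing records: (vessel, species, weight in kg).
landings = [
    ("Vessel A", "cod", 12_000.0),
    ("Vessel A", "cod", 9_500.0),
    ("Vessel B", "cod", 4_000.0),
    ("Vessel B", "haddock", 1_200.0),
]

# Hypothetical annual quotas in kg, keyed by (vessel, species).
quotas = {
    ("Vessel A", "cod"): 20_000.0,
    ("Vessel B", "cod"): 10_000.0,
}

def quota_usage(landings, quotas):
    """Sum reported catch per (vessel, species) and compare it with the quota."""
    totals = defaultdict(float)
    for vessel, species, kg in landings:
        totals[(vessel, species)] += kg

    report = []
    for key, caught in sorted(totals.items()):
        quota = quotas.get(key)
        if quota is None:
            status = "no quota on file"
        elif caught > quota:
            status = "over quota"
        else:
            status = "within quota"
        report.append({
            "vessel": key[0],
            "species": key[1],
            "caught_kg": caught,
            "quota_kg": quota,
            "status": status,
        })
    return report

report = quota_usage(landings, quotas)
for row in report:
    print(row)
```

A flag from a check like this is a starting point for questions, not a finding: a missing quota or an apparent overshoot may simply reflect a gap in the data.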

After two years of analysis, the team found no clear discrepancies.

Instead of abandoning the project, they pivoted. “We spent more than two years trying to figure it out… and we couldn’t get anything out of it informatively,” Palumbo said. So they built an infrastructure that journalists could use directly.

  • Collect and Normalize the Data

The team gathered gigabytes of fisheries data and standardized the records into a machine-readable database. The emphasis was on open source, free tools and GDPR-compliant workflows. Palumbo stressed that any tool must meet “ethical requirements” and be explainable to journalists.
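As a rough sketch of what standardizing records can involve, the Python below cleans a few raw rows and loads them into SQLite, which ships with Python and is free and open source. The field names, the decimal-comma weights, and the day.month.year date format are assumptions made for illustration; the panel did not describe the actual schema or tools.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from a CSV export.
raw_rows = [
    {"vessel": " nordhav ", "species": "COD", "weight": "1 250,5", "date": "03.02.2024"},
    {"vessel": "Nordhav", "species": "Cod ", "weight": "980", "date": "10.02.2024"},
]

def normalize(row):
    """Trim whitespace, unify case, and convert numbers and dates to standard forms."""
    day, month, year = row["date"].split(".")
    # Assumed input style: space as thousands separator, comma as decimal mark.
    weight = float(row["weight"].replace(" ", "").replace(",", "."))
    return (
        row["vessel"].strip().title(),
        row["species"].strip().lower(),
        weight,
        f"{year}-{month}-{day}",  # ISO 8601 dates sort correctly as text
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE landings (vessel TEXT, species TEXT, weight_kg REAL, landed_on TEXT)"
)
conn.executemany(
    "INSERT INTO landings VALUES (?, ?, ?, ?)",
    [normalize(r) for r in raw_rows],
)

# Once names and units agree, the two rows aggregate as one vessel and species.
totals = conn.execute(
    "SELECT vessel, species, SUM(weight_kg) FROM landings GROUP BY vessel, species"
).fetchall()
print(totals)
```

Without the normalization step, " nordhav " and "Nordhav" would be counted as two different vessels, which is the kind of silent error that makes later searches unreliable.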

  • Add Search and Ownership Layers

The database allows reporters to search for individual vessels or companies. Once extracted, the data is mapped into a graph network to visualize ownership links and quota allocations.

This enables reporters to answer questions such as:

  • Who owns this vessel?
  • Which companies share ownership?
  • Are quotas concentrated among related entities?
  • How does money flow between firms?
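To make the ownership questions above concrete, here is a minimal Python sketch that treats ownership as a graph, groups entities that are linked to one another, and measures how much quota each group holds. Every name and number is hypothetical, and the plain connected-components approach is an assumption chosen for simplicity rather than the method the OsloMet team used.

```python
from collections import defaultdict

# Hypothetical ownership edges: (owner, owned entity).
ownership = [
    ("Holding X", "Company A"),
    ("Holding X", "Company B"),
    ("Company A", "Vessel 1"),
    ("Company B", "Vessel 2"),
    ("Company C", "Vessel 3"),
]

# Hypothetical quota per vessel, in tonnes.
quota_tonnes = {"Vessel 1": 500, "Vessel 2": 300, "Vessel 3": 200}

def build_graph(edges):
    """Store ownership as an undirected graph so links can be followed both ways."""
    graph = defaultdict(set)
    for owner, owned in edges:
        graph[owner].add(owned)
        graph[owned].add(owner)
    return graph

def connected_groups(graph):
    """Return sets of entities that are linked directly or indirectly."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

def quota_share_by_group(groups, quotas):
    """Compute each group's share of the total quota, largest first."""
    total = sum(quotas.values())
    shares = []
    for group in groups:
        held = sum(q for vessel, q in quotas.items() if vessel in group)
        shares.append((sorted(group), held / total))
    return sorted(shares, key=lambda item: item[1], reverse=True)

groups = connected_groups(build_graph(ownership))
shares = quota_share_by_group(groups, quota_tonnes)
for members, share in shares:
    print(f"{share:.0%} of quota: {', '.join(members)}")
```

In this toy data, two companies that look independent turn out to share a parent, so their vessels' quotas are counted together. The same structure works for permits or concessions in place of quotas.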

The model is replicable across other sectors. Any industry with licensing regimes, quotas, permits, or regulatory filings can be structured in the same way: energy concessions, mining permits, pharmaceutical licenses, or public procurement contracts.

  • Automate Personalized Newsletters

One of the most practical outputs was an automated weekly newsletter powered by a large language model (LLM). Journalists can predefine topics such as cod catches in northern Norway, new vessel registrations, or regulatory changes. The system generates customized updates from the database.
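The sketch below shows one way the selection step behind such a newsletter might work: predefined topics are expressed as filters over database rows, and only the matching rows from the past week are passed on. The record fields, topic names, and prompt wording are invented for illustration, and the call to a language model is deliberately left out, since the panel did not say which model or service was used.

```python
from datetime import date, timedelta

# Hypothetical database rows.
records = [
    {"date": date(2024, 3, 4), "kind": "landing", "species": "cod",
     "region": "north", "summary": "Vessel 1 landed 12 t of cod"},
    {"date": date(2024, 3, 5), "kind": "registration", "species": None,
     "region": "west", "summary": "Vessel 9 newly registered"},
    {"date": date(2024, 2, 1), "kind": "landing", "species": "cod",
     "region": "north", "summary": "Vessel 2 landed 8 t of cod"},
]

# Hypothetical subscriber profile: each topic is a set of field filters.
topics = {
    "Cod catches in the north": {"kind": "landing", "species": "cod", "region": "north"},
    "New vessel registrations": {"kind": "registration"},
}

def matches(record, filters):
    """A record matches when every filtered field has the requested value."""
    return all(record.get(field) == value for field, value in filters.items())

def weekly_digest(records, topics, week_end):
    """Collect summaries from the seven days ending on week_end, grouped by topic."""
    week_start = week_end - timedelta(days=6)
    recent = [r for r in records if week_start <= r["date"] <= week_end]
    digest = {}
    for title, filters in topics.items():
        hits = [r["summary"] for r in recent if matches(r, filters)]
        if hits:
            digest[title] = hits
    return digest

def render_prompt(digest):
    """Turn the digest into text that could be handed to a language model."""
    lines = ["Write a short newsletter using only the facts listed below."]
    for title, items in digest.items():
        lines.append(f"\n{title}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)

digest = weekly_digest(records, topics, week_end=date(2024, 3, 10))
print(render_prompt(digest))
```

Keeping the filtering in ordinary code means the facts in each newsletter can be traced back to specific rows, whatever model writes the final text.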

This idea travels easily. A newsroom covering extractive industries could create weekly alerts on new drilling permits. A health reporter could receive alerts on pharmaceutical company filings. The key is not the generative AI itself, but the structured, regularly updated database behind it.

As Palumbo noted, the tool must be understandable and trustworthy: journalists must “be able to understand what you’re doing and… trust so that they will publish what you actually find out.”

Creating AI-Powered ‘Digital Democracy’ Tip Systems


The CalMatters site allows users to filter by key topic, and to dive into data on different state politicians. Image: Screenshot, CalMatters

Sisi Wei, chief impact officer at CalMatters, described Digital Democracy, a system that tracks every word spoken by any California state legislator, every vote, and every campaign contribution.

The project was born from a reporting gap. Many local newsrooms no longer send reporters to the statehouse. Lawmakers were operating with limited scrutiny. “We start with the why,” Wei said. “AI is not always our answer.”

  • Build a Comprehensive Data Backbone

Digital Democracy ingests:

  • Transcripts of legislative hearings (via AI transcription)
  • Speaker identities (via facial recognition)
  • Bill sponsorship and voting records
  • Campaign donations
  • Gift disclosures

Importantly, generative AI is used only for transcription. Human staff review transcripts daily to correct speaker names and entities.

The heavier analytical work uses machine learning models built and controlled internally. Wei emphasized the difference: with in-house models, “we control all the inputs.”

  • Train the Model to Generate Story Leads

The core innovation is not public-facing dashboards. It is a password-protected section for journalists that generates story leads from the database. The model weighs variables such as:

  • Did a legislator vote against the interests of their top donors?
  • Are there patterns of abstention?
  • Are financial interests aligned with specific votes?
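As a simplified illustration of weighing variables like these, the Python below assigns each legislator a score from two signals: how often they voted against their top donor's position and how often they abstained. This is not CalMatters' model. The data layout, the weights, and the scoring formula are all assumptions, chosen so that every score can be explained by hand.

```python
from collections import Counter

# Hypothetical votes: (legislator, bill, vote, top donor's position on the bill).
votes = [
    ("Legislator A", "Bill 1", "aye", "oppose"),
    ("Legislator A", "Bill 2", "no", "oppose"),
    ("Legislator A", "Bill 3", "abstain", "support"),
    ("Legislator B", "Bill 1", "no", "oppose"),
    ("Legislator B", "Bill 2", "no", "oppose"),
    ("Legislator B", "Bill 3", "aye", "support"),
]

# Hypothetical weights; in practice these would be tuned with editors' feedback.
WEIGHTS = {"against_donor": 2.0, "abstained": 1.0}

def lead_scores(votes, weights):
    """Score each legislator by the weighted rate of each signal across their votes."""
    counts = {}
    for legislator, _bill, vote, donor_position in votes:
        c = counts.setdefault(legislator, Counter())
        c["total"] += 1
        if vote == "abstain":
            c["abstained"] += 1
        elif (vote == "aye") != (donor_position == "support"):
            # The vote and the donor's position point in opposite directions.
            c["against_donor"] += 1
    scores = {
        legislator: sum(weights[k] * c[k] / c["total"] for k in weights)
        for legislator, c in counts.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = lead_scores(votes, WEIGHTS)
for legislator, score in ranked:
    print(f"{legislator}: {score:.2f}")
```

A high score says only that a record is unusual on these measures. Whether it is a story is for a reporter to establish.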

Initially, the system produced “boring” leads. Journalists provided feedback. The model was retrained based on what editors considered newsworthy.

One example revealed legislators who avoided voting “no” on controversial bills by refusing to vote at all, allowing bills to fail without public accountability. The model detected this pattern, flagged it, and reporters investigated it for months. The resulting story changed legislative behavior.
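A pattern like this can be searched for with a simple rule once roll-call data is structured. The sketch below flags legislators who repeatedly did not vote on bills that failed while never casting an explicit "no" on those bills. The roll calls, the vote labels, and the threshold are hypothetical, and the rule is a hand-written stand-in for whatever the trained model actually learned.

```python
from collections import defaultdict

# Hypothetical roll calls: bill -> (outcome, {legislator: vote}).
roll_calls = {
    "Bill 1": ("failed", {"Legislator A": "not voting",
                          "Legislator B": "no",
                          "Legislator C": "aye"}),
    "Bill 2": ("failed", {"Legislator A": "not voting",
                          "Legislator B": "aye",
                          "Legislator C": "not voting"}),
    "Bill 3": ("passed", {"Legislator A": "aye",
                          "Legislator B": "aye",
                          "Legislator C": "aye"}),
}

def silent_opposition(roll_calls, min_bills=2):
    """Flag legislators with repeated non-votes and no 'no' votes on failed bills."""
    tally = defaultdict(lambda: {"not voting": 0, "no": 0})
    for outcome, votes in roll_calls.values():
        if outcome != "failed":
            continue
        for legislator, vote in votes.items():
            if vote in ("not voting", "no"):
                tally[legislator][vote] += 1

    flagged = []
    for legislator, t in tally.items():
        if t["not voting"] >= min_bills and t["no"] == 0:
            flagged.append((legislator, t["not voting"]))
    return sorted(flagged)

flagged = silent_opposition(roll_calls)
print(flagged)
```

Absences have innocent explanations too, such as illness or scheduling conflicts, so a flag of this kind is a prompt for reporting rather than evidence of intent.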

  • Offer Personalized Newsletters

Digital Democracy also generates weekly personalized newsletters for citizens, summarizing what their representatives did.

This model is globally transferable. Any country with parliamentary transcripts, voting records, and campaign finance disclosures can replicate it. The technical stack may vary, but the method remains:

  1. Centralize legislative data.
  2. Build explainable machine learning models.
  3. Use journalist feedback loops.
  4. Deliver actionable leads, not abstract dashboards.

Wei cautioned that AI is not always the best tool. In one project digitizing disclosure forms, generative AI failed to extract reliable structured data. The newsroom reverted to PDF Plumber, an open-source Python library built by a journalist. The lesson: use the simplest tool that solves the problem.
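For readers unfamiliar with the library, here is a minimal sketch of table extraction with pdfplumber, followed by a cleaning step that turns raw rows into records. The file name in the usage note, the choice of the first page, and the example column names are assumptions; real disclosure forms vary, and the panel did not describe how CalMatters configured the tool.

```python
def extract_first_table(pdf_path):
    """Return the first table pdfplumber finds on page one, as a list of rows."""
    import pdfplumber  # third-party package: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_table() returns None when no table is detected.
        return pdf.pages[0].extract_table() or []

def rows_to_records(rows):
    """Use the first row as the header and clean every cell."""
    if not rows:
        return []
    header = [(cell or "").strip().lower().replace(" ", "_") for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip rows that are entirely empty
            records.append(dict(zip(header, cells)))
    return records

# Hypothetical rows in the shape extract_table() produces (cells may be None).
sample_rows = [
    ["Donor Name", "Gift Value"],
    [" Acme Corp ", "250"],
    [None, None],
    ["Beta LLC", " 90 "],
]
records = rows_to_records(sample_rows)
print(records)
```

With a real file, the two steps chain together as `rows_to_records(extract_first_table("disclosure.pdf"))`, where the file name is a placeholder. Because the extraction is rule-based, the same PDF always yields the same rows, which makes errors easier to find and fix than with a generative model.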

Practical Takeaways

These two cases share several principles:

  • Start with the reporting problem, not the tool.
  • Build structured, searchable databases first.
  • Use machine learning for pattern detection, not final conclusions.
  • Integrate feedback loops from journalists.
  • Keep detailed documentation of every step.
  • Publish methodology when possible.
  • Maintain human oversight at every stage.

As Wei put it, “AI is simply a tool.”

The most transferable lesson is that using AI effectively in investigative journalism is not about adopting the newest model. It is about building systems that turn complex data into reproducible reporting leads. Whether tracking fishing quotas or legislative votes, the workflow remains the same: collect, structure, test, verify, and report.


Serdar Vardar is an investigative journalist at Deutsche Welle’s Environment Desk, specializing in cross-border environmental crimes, climate crisis coverage, corruption, and tax evasion. Winner of the EU Investigative Journalism Awards in the Western Balkans and Turkey, he has uncovered significant stories including the Qatargate scandal, Turkish corporate propaganda in the Balkans, and China’s Belt and Road environmental impacts in Peru and Colombia. Vardar has also worked on major global investigations including ICIJ’s Pandora Papers, Shadow Diplomats, and Deforestation Inc., with his work appearing in Deutsche Welle, Al Jazeera, and through collaborations with ICIJ and OCCRP.



Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
