Image: GIJN, YouTube
Tips for Using AI as a Reporting Tool to Uncover Wrongdoing
Investigative reporters often face a common dilemma: the data exists, sometimes in vast quantities, but it is inaccessible, unstructured, or too large to examine manually. The problem lies not only in discovering wrongdoing, but also in building systems that make patterns visible and tips actionable.
At the 14th Global Investigative Journalism Conference (GIJC25) in Kuala Lumpur, the session “Uncovering Wrongdoing Using AI: Methods, Techniques, and Challenges” brought together newsroom leaders and academics who have built those systems. The panel centered on replicable methods that journalists can apply across industries and countries.
Three approaches stood out: building searchable ownership and financial databases; creating AI-powered “digital democracy” systems that generate reporting tips; and designing AI agent and OCR (optical character recognition) workflows to process massive document archives.
Building a Searchable Fishing and Ownership Database
Fabrizio Palumbo, associate professor of data journalism at OsloMet and founder of the AI Journalism Resource Center, described how his team partnered with Norwegian newsrooms to work with official fisheries data.
Norway publishes detailed records of every registered fishing trip. Each entry includes the vessel, species caught, weight, time, and fishing area. The initial idea was straightforward: cross-check reported catches against quotas to detect underreporting or overreporting.
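The cross-check described above can be sketched in a few lines of Python. This is a minimal illustration, not the team’s actual pipeline: the field names (`vessel`, `species`, `weight_kg`), the vessel identifiers, the quota table, and the 5% tolerance are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical landing records; field names in the real data may differ.
landings = [
    {"vessel": "V-101", "species": "cod", "weight_kg": 12000.0},
    {"vessel": "V-101", "species": "cod", "weight_kg": 9500.0},
    {"vessel": "V-202", "species": "cod", "weight_kg": 4000.0},
    {"vessel": "V-202", "species": "haddock", "weight_kg": 1500.0},
]

# Hypothetical annual quotas in kilograms, keyed by (vessel, species).
quotas = {
    ("V-101", "cod"): 20000.0,
    ("V-202", "cod"): 5000.0,
    ("V-202", "haddock"): 2000.0,
}


def total_catch(records):
    """Sum reported weight per (vessel, species) pair."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["vessel"], r["species"])] += r["weight_kg"]
    return dict(totals)


def flag_overruns(records, quotas, tolerance=0.05):
    """Return pairs whose reported catch exceeds quota by more than `tolerance`.

    Pairs with no quota on file are skipped here; a real workflow would
    probably route them to manual review instead.
    """
    flagged = []
    for (vessel, species), caught in sorted(total_catch(records).items()):
        quota = quotas.get((vessel, species))
        if quota is not None and caught > quota * (1 + tolerance):
            flagged.append(
                {"vessel": vessel, "species": species,
                 "caught_kg": caught, "quota_kg": quota}
            )
    return flagged


overruns = flag_overruns(landings, quotas)
for row in overruns:
    print(row)
```

A flagged row is only a starting point for reporting. Quota transfers, bycatch rules, or reporting lags can all explain an apparent overrun.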
After two years of analysis, the team found no clear discrepancies.
Instead of abandoning the project, they pivoted. “We spent more than two years trying to figure it out… and we couldn’t get anything out of it informatively,” Palumbo said. So they built an infrastructure that journalists could use directly.
- Collect and Normalize the Data
The team gathered gigabytes of fisheries data and standardized the records into a machine-readable database. The emphasis was on free, open-source tools and GDPR-compliant workflows. Palumbo stressed that any tool must meet “ethical requirements” and be explainable to journalists.
- Add Search and Ownership Layers
The database allows reporters to search for individual vessels or companies. Once extracted, the data is mapped into a graph network to visualize ownership links and quota allocations.
This enables reporters to answer questions such as:
- Who owns this vessel?
- Which companies share ownership?
- Are quotas concentrated among related entities?
- How does money flow between firms?
The model is replicable across other sectors. Any industry with licensing regimes, quotas, permits, or regulatory filings can be structured in the same way: energy concessions, mining permits, pharmaceutical licenses, or public procurement contracts.
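As a rough sketch of the ownership layer, the questions above can be answered with a small graph built from ownership records. Everything here is illustrative: the company and vessel names, the ownership shares, and the quota figures are invented, and the article does not say which graph tooling the team used, so the example relies only on the Python standard library.

```python
from collections import defaultdict

# Hypothetical ownership records: (owner, vessel, ownership share).
ownership = [
    ("Company A", "Vessel 1", 0.6),
    ("Company B", "Vessel 1", 0.4),
    ("Company B", "Vessel 2", 1.0),
    ("Company C", "Vessel 3", 1.0),
]
# Hypothetical quota allocations per vessel, in tonnes.
vessel_quota = {"Vessel 1": 500, "Vessel 2": 300, "Vessel 3": 200}


def build_graph(records):
    """Undirected graph linking each owner to the vessels it holds shares in."""
    graph = defaultdict(set)
    for owner, vessel, _share in records:
        graph[owner].add(vessel)
        graph[vessel].add(owner)
    return graph


def co_owned_vessels(records):
    """Vessels with more than one owner, mapped to their sorted owner lists."""
    owners = defaultdict(set)
    for owner, vessel, _share in records:
        owners[vessel].add(owner)
    return {v: sorted(o) for v, o in owners.items() if len(o) > 1}


def connected_groups(graph):
    """Group nodes linked, directly or indirectly, through shared ownership."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def quota_by_group(groups, vessel_quota):
    """For each group, list its owners and its share of the total quota."""
    total = sum(vessel_quota.values())
    result = []
    for group in groups:
        owners = sorted(n for n in group if n not in vessel_quota)
        share = sum(vessel_quota.get(n, 0) for n in group) / total
        result.append((owners, share))
    return result


shared = co_owned_vessels(ownership)
groups = connected_groups(build_graph(ownership))
concentration = quota_by_group(groups, vessel_quota)
print(shared)
print(concentration)
```

In this toy data, Company A and Company B look like separate firms but are tied together through Vessel 1, so their combined group controls most of the quota. The same structure would apply to permits or concessions in other sectors.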
- Automate Personalized Newsletters
One of the most practical outputs was an automated weekly newsletter powered by a large language model (LLM). Journalists can predefine topics such as cod catches in northern Norway, new vessel registrations, or regulatory changes. The system generates customized updates from the database.
This idea can travel easily. A newsroom covering extractive industries could create weekly alerts on new drilling permits. A health reporter could receive alerts on pharmaceutical company filings. The key is not the generative AI itself, but the structured database and automated updates behind it.
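To make the point about the structured database concrete, here is a minimal sketch of the selection step behind a personalized weekly update. The record fields, topic labels, and subscription format are assumptions for illustration, and the LLM step is deliberately left out: the plain-text draft produced here is what a language model might later be asked to rewrite.

```python
from datetime import date, timedelta

# Hypothetical structured records, as they might come out of the database.
records = [
    {"date": date(2025, 11, 18), "topic": "cod_catch", "region": "north",
     "summary": "Vessel 1 landed 12 t of cod"},
    {"date": date(2025, 11, 19), "topic": "new_registration", "region": "west",
     "summary": "Vessel 4 registered"},
    {"date": date(2025, 11, 2), "topic": "cod_catch", "region": "north",
     "summary": "Vessel 2 landed 8 t of cod"},
]

# Hypothetical subscriptions: each journalist predefines topics and,
# optionally, a region (None means any region).
subscriptions = {
    "reporter_a": {"topics": {"cod_catch"}, "region": "north"},
    "reporter_b": {"topics": {"new_registration", "regulation"}, "region": None},
}


def weekly_items(records, sub, today):
    """Select records from the last seven days that match one subscription."""
    cutoff = today - timedelta(days=7)
    return [
        r for r in records
        if cutoff < r["date"] <= today
        and r["topic"] in sub["topics"]
        and (sub["region"] is None or r["region"] == sub["region"])
    ]


def build_digest(name, items):
    """Render matched records as plain text; an LLM could rewrite this draft."""
    if not items:
        return f"Weekly update for {name}: nothing new this week."
    lines = [f"Weekly update for {name}:"]
    lines += [f"- {r['date'].isoformat()}: {r['summary']}" for r in items]
    return "\n".join(lines)


today = date(2025, 11, 21)  # fixed date so the example is reproducible
digests = {
    name: build_digest(name, weekly_items(records, sub, today))
    for name, sub in subscriptions.items()
}
for text in digests.values():
    print(text)
```

Because the filtering happens on structured fields before any text generation, every line in the digest can be traced back to a specific database record.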
As Palumbo noted, the tool must be understandable and trustworthy: journalists must “be able to understand what you’re doing and… trust so that they will publish what you actually find out.”
Creating AI-Powered ‘Digital Democracy’ Tip Systems

The CalMatters site allows users to filter by key topic, and to dive into data on different state politicians. Image: Screenshot, CalMatters
Sisi Wei, chief impact officer at CalMatters, described Digital Democracy, a system that tracks every word spoken by any California state legislator, every vote, and every campaign contribution.
The project was born from a reporting gap. Many local newsrooms no longer send reporters to the statehouse. Lawmakers were operating with limited scrutiny. “We start with the why,” Wei said. “AI is not always our answer.”
- Build a Comprehensive Data Backbone
Digital Democracy ingests:
- Transcripts of legislative hearings (via AI transcription)
- Speaker identities (via facial recognition)
- Bill sponsorship and voting records
- Campaign donations
- Gift disclosures
Importantly, generative AI is used only for transcription. Human staff review transcripts daily to correct speaker names and entities.
The heavier analytical work uses machine learning models built and controlled internally. Wei emphasized the difference: with in-house models, “we control all the inputs.”
- Train the Model to Generate Story Leads
The core innovation is not public-facing dashboards. It is a password-protected section for journalists that generates story leads from the database. The model weighs variables such as:
- Did a legislator vote against the interests of their top donors?
- Are there patterns of abstention?
- Are financial interests aligned with specific votes?
Initially, the system produced “boring” leads. Journalists provided feedback. The model was retrained based on what editors considered newsworthy.
One example revealed legislators who avoided voting “no” on controversial bills by refusing to vote at all, allowing bills to fail without public accountability. The model detected this pattern, flagged it, and reporters investigated it for months. The resulting story changed legislative behavior.
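The abstention pattern lends itself to a simple illustration. The sketch below is a rule-based stand-in, not the machine learning model CalMatters built: the vote labels, the legislator and bill names, and the thresholds (`min_votes`, `min_rate`) are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical roll-call records: (legislator, bill, vote), where vote is
# "aye", "no", or "abstain" (no vote recorded).
votes = [
    ("Legislator A", "Bill 1", "aye"),
    ("Legislator A", "Bill 2", "abstain"),
    ("Legislator A", "Bill 3", "abstain"),
    ("Legislator A", "Bill 4", "abstain"),
    ("Legislator B", "Bill 1", "aye"),
    ("Legislator B", "Bill 2", "no"),
    ("Legislator B", "Bill 3", "no"),
    ("Legislator B", "Bill 4", "aye"),
]
failed_bills = {"Bill 2", "Bill 3"}  # hypothetical outcomes


def abstention_leads(votes, failed_bills, min_votes=3, min_rate=0.5):
    """Flag legislators who abstain on at least `min_rate` of their roll calls."""
    total = Counter()
    abstained = Counter()
    abstained_on_failed = Counter()
    for legislator, bill, vote in votes:
        total[legislator] += 1
        if vote == "abstain":
            abstained[legislator] += 1
            if bill in failed_bills:
                abstained_on_failed[legislator] += 1

    leads = []
    for legislator in sorted(total):
        rate = abstained[legislator] / total[legislator]
        if total[legislator] >= min_votes and rate >= min_rate:
            leads.append({
                "legislator": legislator,
                "abstention_rate": rate,
                "abstentions_on_failed_bills": abstained_on_failed[legislator],
            })
    return leads


leads = abstention_leads(votes, failed_bills)
for lead in leads:
    print(lead)
```

A rule like this only surfaces candidates. As in the case described above, the reporting work of establishing why a legislator did not vote still falls to journalists.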
- Offer Personalized Newsletters
Digital Democracy also generates weekly personalized newsletters for citizens, summarizing what their representatives did.
This model is globally transferable. Any country with parliamentary transcripts, voting records, and campaign finance disclosures can replicate it. The technical stack may vary, but the method remains:
- Centralize legislative data.
- Build explainable machine learning models.
- Use journalist feedback loops.
- Deliver actionable leads, not abstract dashboards.
Wei cautioned that AI is not always the best tool. In one project digitizing disclosure forms, generative AI failed to extract reliable structured data. The newsroom reverted to PDF Plumber, an open-source Python library built by a journalist. The lesson: use the simplest tool that solves the problem.
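For readers who have not used the library, published on PyPI as `pdfplumber`, the sketch below shows the general shape of a rule-based table extraction. It is not CalMatters’ code: the file path, the column names, and the assumption that every extracted table starts with its own header row are all hypothetical. The library is imported inside the function so that the row-cleaning helper can be run and tested without it installed.

```python
def rows_to_records(table):
    """Turn a table (list of rows, first row as header) into a list of dicts.

    Assumes the first row holds column names; cells may be None or padded
    with whitespace, as is common in text extracted from PDFs.
    """
    if not table:
        return []
    header = [(cell or "").strip().lower().replace(" ", "_") for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if not any(cells):
            continue  # skip fully blank rows
        records.append(dict(zip(header, cells)))
    return records


def extract_disclosures(pdf_path):
    """Collect records from every table on every page of one PDF."""
    import pdfplumber  # third-party: pip install pdfplumber

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(rows_to_records(table))
    return records


# A hand-made table in the shape that page.extract_tables() returns.
sample_table = [
    ["Filer Name", "Gift Value"],
    ["Legislator A", " 120 "],
    [None, ""],
    ["Legislator B", "75"],
]
sample_records = rows_to_records(sample_table)
print(sample_records)

# With a real file (hypothetical path):
# records = extract_disclosures("disclosure_form.pdf")
```

The appeal of this approach is that it is deterministic: the same PDF always yields the same rows, which makes errors easier to spot and fix than with generative extraction.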
Practical Takeaways
These two cases share several principles:
- Start with the reporting problem, not the tool.
- Build structured, searchable databases first.
- Use machine learning for pattern detection, not final conclusions.
- Integrate feedback loops from journalists.
- Keep detailed documentation of every step.
- Publish methodology when possible.
- Maintain human oversight at every stage.
As Wei put it, “AI is simply a tool.”
The most transferable lesson is that using AI effectively in investigative journalism is not about adopting the newest model. It is about building systems that turn complex data into reproducible reporting leads. Whether tracking fishing quotas or legislative votes, the workflow remains the same: collect, structure, test, verify, and report.
Serdar Vardar is an investigative journalist at Deutsche Welle’s Environment Desk, specializing in cross-border environmental crimes, climate crisis coverage, corruption, and tax evasion. Winner of the EU Investigative Journalism Awards in the Western Balkans and Turkey, he has uncovered significant stories including the Qatargate scandal, Turkish corporate propaganda in the Balkans, and China’s Belt and Road environmental impacts in Peru and Colombia. Vardar has also worked on major global investigations including ICIJ’s Pandora Papers, Shadow Diplomats, and Deforestation Inc., with his work appearing in Deutsche Welle, Al Jazeera, and through collaborations with ICIJ and OCCRP.