

Reporting Tools & Tips

A Poor Journalist’s Text-Mining Toolkit

How can you search and analyze collections of documents on your own computer with simple tools? At DataHarvest, Robert Gebeloff and I ran a workshop to answer that question. Since people seemed interested, here’s a write-up of the two key tools we worked with: Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.
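
To give a rough idea of what regular expressions add over a plain keyword search, here is a minimal sketch in Python rather than Sublime Text. It assumes the documents have already been converted to plain text (the step Tika handles in the workshop). The pattern, the function name `find_matches`, and the sample text are all made up for illustration and do not come from the workshop material.

```python
import re

# Hypothetical pattern: dollar amounts such as "$1,200" or "$3.5 million".
AMOUNT = re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?")


def find_matches(text, pattern=AMOUNT):
    """Return (line_number, matched_text) pairs for every match in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Made-up sample standing in for text extracted from a document.
sample = (
    "The contract was worth $3.5 million.\n"
    "No figures here.\n"
    "Fees: $1,200 and $75."
)

hits = find_matches(sample)
```

A keyword search would need to know each amount in advance. The pattern instead describes the shape of what you are looking for, which is the same idea behind regex search in a text editor.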