Image: Joanna Demarco for GIJN

Tipsheet for Reporters: How to Get the Best from OCCRP’s Aleph

In investigative journalism, connecting data points is often key to uncovering the truth. With an increasing amount of information available online and offline, journalists need effective ways to navigate vast collections of documents, records, and datasets.

Aleph, the data platform hosted by the Organized Crime and Corruption Reporting Project (OCCRP), is designed to help investigative reporters efficiently search, organize, and analyze data, making it easier to trace hidden connections in complex investigations, especially “follow the money” stories that involve corporate crime and corruption. It also comes pre-loaded with datasets that shed light on shell companies in the Cayman Islands, bank accounts in Switzerland, jet ownership in the Isle of Man, and other bad-guy favorites.

While Aleph is widely used by investigative reporters, many journalists may not be familiar with the full range of features or know how to begin using it. This tipsheet aims to provide an overview of how to access the platform, use its core tools, and take advantage of more advanced features like cross-referencing entities and visualizing networks. Whether you’re new to Aleph or an experienced user, this article offers practical tips to help make the most of it.

Why Aleph Was Created

Aleph, named after a short story by Jorge Luis Borges, was developed as part of OCCRP’s broader mission to support transparency and accountability in reporting. It consolidates data from millions of documents, including corporate registries, financial records, leaks, and legal filings, and makes them searchable and cross-referenceable.

The platform arose from the challenges investigative reporters face in managing large datasets, particularly in cross-border investigations. Journalists needed a centralized tool to sift through this information, find patterns, and identify connections that might otherwise be missed. Aleph simplifies this process, allowing reporters to focus on analyzing the data and drawing insights from it.

Since its creation, Aleph has grown significantly. It now holds over 400 million documents or entities from more than 200 datasets — and this is only the public part. Many more are shared exclusively among journalists working on specific investigations. These datasets have been key to uncovering issues such as cross-border financial crimes and high-level corruption.

Real-World Examples: Investigations Using Aleph

Aleph has supported several major investigative projects. In the Azerbaijani Laundromat, journalists traced billions of dollars funneled through shell companies to bribe European officials. The Suisse Secrets project exposed hidden Swiss bank accounts used by criminals and corrupt officials. More recently, Dubai Unlocked revealed how corrupt politicians and criminals use Dubai’s real estate market to hide their wealth. In all these investigations, Aleph’s cross-referencing tools enabled reporters to uncover connections that would otherwise have remained hidden.

How to Access Aleph

Access to Aleph is free. Journalists, researchers, and members of civil society can request access through the OCCRP website. A staff member reviews each request to make sure the applicant’s intended use fits Aleph’s investigative purpose. Access comes in different levels: some datasets are restricted to a small number of vetted reporters, while others are public and can be browsed directly on the website without signing up.

Once you’re granted access, you can explore a wide range of datasets, from corporate registries to leaks. Aleph also lets you upload your own documents and cross-reference them with the platform’s extensive database. Importantly, you retain control over your uploaded data: you decide who can access it, or whether to keep it in your own private workspace.

Getting Started: First Steps with Aleph

Aleph’s interface is designed to make document searches easy. You can search, filter, and tag documents to quickly locate relevant information. For beginners, the document search feature is a good starting point. Search by keywords, names, or specific identifiers, and use filters to narrow down results by date, document type, or source, which helps manage large volumes of information.


Clicking the three-bar button next to the search field in Aleph lets users run an advanced search for keywords, names, or other specific identifiers. Image: Screenshot, Aleph
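For readers who prefer scripting to clicking, the keyword-plus-filters workflow described above can also be expressed as a query URL. The sketch below is a minimal illustration, assuming that OCCRP’s public instance at aleph.occrp.org exposes a search endpoint at `/api/2/entities` that accepts a `q` keyword parameter and `filter:<field>` parameters. The endpoint path, the parameter names, and the `build_search_url` helper are all assumptions to check against Aleph’s API documentation. The code only builds the URL and sends no request; searching restricted datasets would also require authentication, which is left out here.

```python
from urllib.parse import urlencode

# Assumption: search endpoint of OCCRP's public Aleph instance; verify the
# host and path against the Aleph API documentation before relying on it.
SEARCH_URL = "https://aleph.occrp.org/api/2/entities"


def build_search_url(query, schema=None, countries=(), limit=20):
    """Combine a keyword query with optional filters into one search URL.

    The "filter:<field>" parameter names are assumptions modelled on
    Aleph's search API, not a confirmed specification.
    """
    params = [("q", query), ("limit", str(limit))]
    if schema:
        # Narrow results to one entity type, e.g. "Company" or "Person".
        params.append(("filter:schema", schema))
    for code in countries:
        # One parameter per country code, so several can be combined.
        params.append(("filter:countries", code))
    # urlencode percent-encodes the colon and turns spaces into "+".
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url("Putin", schema="Company", countries=["ru", "cy"]))
```

Keeping the filters as separate parameters mirrors the interface, where each filter narrows the result set independently.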

Multilingual support is another useful feature. Aleph can process and search documents in various languages and also supports optical character recognition (OCR), making scanned images or PDFs searchable. In cross-border investigations, Aleph can transliterate search terms across languages, such as returning results for the Cyrillic string “Путин” when searching for “Putin.” The interface itself is available in six languages: English, German, French, Russian, Arabic, and Spanish.


Aleph’s interface is available in six languages, and users can search for names using transliteration. Image: Screenshot, Aleph
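To see why a search for “Putin” can surface “Путин”, picture both names being reduced to a common Latin spelling before they are compared. The sketch below does exactly that with a deliberately tiny, hypothetical Cyrillic-to-Latin table. It is not Aleph’s implementation, which covers far more scripts and spelling variants, and `to_latin` and `names_match` are illustrative names only.

```python
# Hypothetical, deliberately small Cyrillic-to-Latin table for illustration.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ы": "y", "э": "e", "ю": "yu", "я": "ya",
    "ь": "", "ъ": "",
}


def to_latin(text: str) -> str:
    """Lowercase the text and map each Cyrillic letter to Latin letters."""
    # Characters missing from the table (e.g. Latin letters) pass through.
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())


def names_match(a: str, b: str) -> bool:
    """Treat two names as equal if their Latin forms are identical."""
    return to_latin(a) == to_latin(b)


print(to_latin("Путин"))              # putin
print(names_match("Putin", "Путин"))  # True
```

Real transliteration is messier than a one-to-one table (the same Cyrillic name can be romanized in several ways), which is why search tools usually compare several normalized forms rather than one.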

What Data Is in Aleph?

Aleph contains datasets from over 180 countries, with a particular focus on Eastern Europe and former Soviet states. OCCRP runs automated scrapers to keep datasets up to date. However, data availability varies by country — some regions have more detailed datasets, while others may have limited or incomplete records.

OCCRP’s research and data team collaborates with local partner organizations to gather relevant information, including company registries and government filings. Leaked or hacked datasets are also included, though each case first undergoes an ethical review to assess the public interest.

Crucially, your own data makes Aleph more powerful. By uploading documents such as FOI records or leaks, you can cross-reference your findings with existing datasets and uncover connections that might otherwise go unnoticed.

For more training on how to search, use filters, and set alerts in Aleph, watch the video below.

Advanced Features

For more experienced users, Aleph offers advanced features that enhance investigative work. Cross-referencing allows you to connect people, companies, and addresses across datasets, making it easier to track networks that operate across borders.
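At its simplest, cross-referencing means deciding when a record in one dataset refers to the same person or company as a record in another. The sketch below is a simplified stand-in, not Aleph’s own matching logic, which is fuzzier. It normalizes names (case, accents, punctuation, and a hypothetical list of legal suffixes such as “Ltd”) and pairs entities of the same type whose normalized names are identical. The dictionary fields `schema`, `name`, and `country`, the `LEGAL_SUFFIXES` set, and the sample records are invented for illustration.

```python
import re
import unicodedata

# Hypothetical legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "llc", "inc", "sa", "ag", "gmbh"}


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop legal suffixes."""
    # Decompose accented letters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", stripped.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_SUFFIXES)


def cross_reference(ours, theirs):
    """Yield (ours, theirs) pairs with the same schema and normalized name."""
    index = {}
    for entity in theirs:
        key = (entity["schema"], normalize(entity["name"]))
        index.setdefault(key, []).append(entity)
    for entity in ours:
        key = (entity["schema"], normalize(entity["name"]))
        for match in index.get(key, []):
            yield entity, match


# Invented sample data: names from a leak versus a company registry.
LEAK = [
    {"schema": "Company", "name": "Blue Harbour Holdings Ltd."},
    {"schema": "Person", "name": "José Álvarez"},
]
REGISTRY = [
    {"schema": "Company", "name": "BLUE HARBOUR HOLDINGS LIMITED", "country": "cy"},
    {"schema": "Company", "name": "Red Harbour Holdings"},
    {"schema": "Person", "name": "Jose Alvarez", "country": "pa"},
]

for mine, found in cross_reference(LEAK, REGISTRY):
    print(mine["name"], "->", found["name"])
# Blue Harbour Holdings Ltd. -> BLUE HARBOUR HOLDINGS LIMITED
# José Álvarez -> Jose Alvarez
```

Exact matching after normalization is easy to reason about but misses misspellings and reordered names, which is one reason dedicated tools typically score candidate matches instead of requiring identical strings.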

Aleph also includes a visualization tool that lets users map relationships between entities, creating network diagrams or timelines that show connections and key findings. These features help reporters navigate complex datasets more efficiently and spot the relationships that matter in their investigations.


Aleph also includes a visualization tool to allow users to diagram networks between corporate entities and shell companies. Image: Screenshot, Aleph
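Behind a network diagram sits a graph: entities are nodes, and relationships such as ownership or directorships are edges. Tracing a hidden connection then amounts to finding a path between two nodes. The sketch below uses breadth-first search to find the shortest chain linking a person to an asset. The entities, the `LINKS` list, and the `build_graph` and `shortest_chain` helpers are hypothetical, and the code illustrates the idea rather than how Aleph’s visualization tool is built.

```python
from collections import deque

# Hypothetical relationships, as (source, relation, target) triples.
LINKS = [
    ("Jane Doe", "owns", "Alpha Holdings Ltd"),
    ("Alpha Holdings Ltd", "owns", "Beta Trading FZE"),
    ("Beta Trading FZE", "owns", "Apartment 41, Example Tower"),
    ("John Roe", "director of", "Alpha Holdings Ltd"),
]


def build_graph(links):
    """Build an undirected adjacency list; relation labels are ignored here."""
    graph = {}
    for source, _relation, target in links:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, []).append(source)
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search: shortest list of entities from start to goal, or None."""
    previous = {start: None}  # remembers how each entity was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers from goal to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


graph = build_graph(LINKS)
chain = shortest_chain(graph, "Jane Doe", "Apartment 41, Example Tower")
print(" -> ".join(chain))
# Jane Doe -> Alpha Holdings Ltd -> Beta Trading FZE -> Apartment 41, Example Tower
```

Breadth-first search returns the chain with the fewest intermediaries, which is often the most useful starting point when checking how a person connects to a company or property.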

And for the engineers out there: Aleph’s source code is available for free under an open source license. A number of organizations host their own instance, and there’s an active community of developers who work on new features and help maintain the code base.

OCCRP regularly offers virtual training sessions on Aleph’s advanced features. Journalists interested in deepening their understanding of the platform can participate in these sessions to learn more.

Conclusion: Making the Most of Aleph

Aleph is a resource that helps journalists uncover connections in vast datasets. By combining Aleph’s existing data with your own materials, you can conduct deeper investigations and track complex networks more effectively. Whether you are starting out or looking to use advanced features, Aleph is a practical tool designed to support investigative reporting across borders.


Jan Strozyk is the chief data editor at OCCRP, based in Germany. He co-leads OCCRP’s research and data team alongside head of research Karina Shedrofsky. He works closely with the editorial and the data teams, coordinating data analysis and working on data-driven investigations. Previously, he was a reporter with the German public news broadcaster NDR, where he worked on the Luxembourg Leaks, the Panama Papers, the Paradise Papers, the FinCEN Files, and other cross-border investigations.

Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
