The OCCRP's Jan Strozyk addressing the audience at the Working with Hackers panel at GIJC23. Image: Rocky Kistner for GIJN
Working With Hackers: Where — and How — Journalists Should Use These Sources
The use of hacked data is an increasingly common ethical challenge for investigative journalists.
Terabytes of hacked datasets about corporations and government institutions are now available to reporters, already obtained by trusted nonprofit groups and curated for verification, source protection, privacy, and public interest value. The careful and responsible use of this information can lead to key public interest revelations.
However, the prospect of using purloined data, even when verified, brings up core ethical questions: Is the story sufficiently important, and difficult to prove, to justify the use of hacked data — and how much detail about its origins should reporters disclose to their audiences?
Questions like these remain a case-by-case challenge, because hackers can have all manner of motives (noble, partisan, and criminal) and range from brave whistleblowers to exiled hacktivists carrying out cyberattacks on autocratic institutions to, yes, even ransomware extortionists.
In a session on Working with Hackers at the 13th Global Investigative Journalism Conference (#GIJC23), three veteran data journalists shared their insights on how to work with hacked information, and how to deal with sources who use cyberattacks to get it.
The panel included Lorax Horne, editor of Distributed Denial of Secrets (DDoSecrets), Jan Strozyk, chief data editor at the Organized Crime and Corruption Reporting Project (OCCRP), and Alena Prykhodzka, an exiled Belarusian reporter at Partisan Wave and a collaborator with the Cyberpartisans hacktivist group.
Look for Trusted Gatekeepers
Raw hacked datasets can present various legal, privacy, and accuracy perils for newsrooms. They might be peppered with personally identifiable information (PII) that would be highly inappropriate or even dangerous to publish. The data might have no public value beyond gossip or commercial competition. And these leaks can sometimes contain disinformation, hate speech, or subtle tells that identify the leaker, potentially putting them in jeopardy.
As a result, the panel agreed that newsrooms should look for this kind of information in trusted archives that filter leaks for these issues before posting the files to their databases.
Launched in 2018, DDoSecrets is a journalism collective that fills this role. It curates and publishes leaked public interest information on both open and limited-access platforms, and has worked with newsrooms such as ICIJ, Der Spiegel, Forbidden Stories, and the European Investigative Collaborations network.
Meanwhile, OCCRP’s Aleph archive includes a collection of over 1,100 leaked datasets — including 260 internally classified as “hacked” data — which journalists at both OCCRP partner organizations and independent newsrooms around the world frequently mine for major investigations.
At the session, Horne and Strozyk confirmed that both of these archives only post leaks after the data has been verified, evaluated for public interest value, checked for inappropriate PII, and selected for either open access or limited access for verified journalists and researchers.
Public Interest Is Paramount
Horne said that DDoSecrets seeks important data that is not otherwise publicly available, and that their team evaluates leaks in terms of “a very broad and historical perspective.”
“If it doesn’t have public interest value, we will discard it,” Horne said. “Then, we’ll look to see if it contains passport scans or national ID numbers or personal addresses, especially from non-public persons. If it doesn’t contain PII, we can publish the data via the censorship-resistant ‘torrents’ publishing method. If it does, we’ll see if the PII can be stripped or redacted.”
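For newsrooms that want to document a similar screening process, the sequence Horne describes can be sketched as a short decision function. This is only an illustration of the order of the checks, not DDoSecrets' actual tooling: the `Leak` fields, the `Decision` labels, and the `triage` function are hypothetical names, and the final fallback (limited access when PII cannot be removed) is an assumption drawn from the archives' general practice described earlier, not from Horne's quote.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DISCARD = "discard"
    PUBLISH_OPEN = "publish openly, e.g. via torrents"
    REDACT_FIRST = "strip or redact PII before publishing"
    LIMITED_ACCESS = "restrict to verified journalists and researchers"


@dataclass
class Leak:
    """Hypothetical record of an editor's answers about one dataset."""
    name: str
    has_public_interest: bool
    contains_pii: bool
    pii_can_be_redacted: bool = False


def triage(leak: Leak) -> Decision:
    # Step 1: no public interest value means the leak is discarded.
    if not leak.has_public_interest:
        return Decision.DISCARD
    # Step 2: with no PII, open publication is possible.
    if not leak.contains_pii:
        return Decision.PUBLISH_OPEN
    # Step 3: with PII, check whether it can be stripped or redacted.
    if leak.pii_can_be_redacted:
        return Decision.REDACT_FIRST
    # Assumption (not in Horne's quote): fall back to limited access.
    return Decision.LIMITED_ACCESS


if __name__ == "__main__":
    example = Leak("example-registry", has_public_interest=True,
                   contains_pii=True, pii_can_be_redacted=True)
    print(triage(example).value)
```

The answers to each question still come from human editorial judgment; the function only records the order in which the questions are asked.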
Last year, a hacked leak published by DDoSecrets revealed that numerous members of the far-right Oath Keepers militia in the United States were also elected officials or members of the military, and detailed the membership totals by US state. However, Horne explained that the organization redacted membership lists because the thousands of names could not all be independently verified. The team also obtained and published a massive hacked database from the Bahamas corporate registry, as well as 489 gigabytes of meeting notes, audio files, and emails from Russia’s censorship agency, Roskomnadzor Moscow.
A new, cross-border collaborative investigation into a sharp change in global drug trafficking routes, NarcoFiles: The New Criminal Order, offers a great illustration of the public interest benefits, security hazards, and criminal threats involved with hacking, all in one story.
Led by OCCRP and Centro Latinoamericano de Investigación Periodística (CLIP) — both of which are GIJN members — and including several other groups, the investigation originated with the hacking of highly sensitive emails from the Colombian Prosecutor’s Office by a local hacktivist group in 2022.
Despite this group’s claims of noble, anti-corruption intent, OCCRP pointed out that the raw hacked dataset could expose Drug Enforcement Administration (DEA) agents, informants, and witnesses “to severe danger.” So, DDoSecrets and hacking news site Enlace Hacktivista obtained and carefully curated five terabytes of data from the hackers — including seven million emails — to mitigate these security risks, and worked with OCCRP and its partners to offer public interest leads.
In a detailed disclosure, OCCRP explained how the leak was verified, and noted that “measures were also taken to protect third parties and to avoid disrupting ongoing investigations.”
Working With Data Thieves: Where to Draw the Line
In addition to datasets submitted by more altruistic sources like whistleblowers and ideological hacktivists, journalists sometimes encounter leak archives originally posted by extortionists. In these cases, mercenary hackers have carried out their ransomware threat to publish non-public data in some dark corner of the internet.
While fishing for new sources online, Strozyk said he often encounters links to potentially newsworthy ransomware files published in places like the dark web. “Part of my job is to also go out into the internet and look for datasets that our reporters might be interested to work with,” said Strozyk. But finding this data doesn’t automatically mean a journalist should accept it.
James Ball — a trustee at the Ethical Journalism Network (EJN) and the former global editor of the Bureau of Investigative Journalism — stressed that journalists should consider public interest values at every step when dealing with hacked documents.
“Should we even look at a [hacked] dataset? Do we have a reason to believe it might contain information in the public interest? We should reassess — is it living up to what we thought? Is it more intrusive than we first realized?” Ball said, presenting examples of the questions news organizations should be asking when considering hacked or extorted data.
“So, not just as a tack-on at the end,” Ball added. “This can often be helpful to evidence in the case of legal complaints, but is often good practice too.”
He pointed to the blockbuster whistleblower leak from Edward Snowden, which involved massive amounts of data illicitly shared from the US National Security Agency. The Guardian and The Washington Post decided that the public interest value of the information, which they vetted, outweighed the potential legal risks, and that Snowden was technically a whistleblower rather than a hacker because he was entitled to access the data. Ball said he shared that opinion as well.
“The hack/leak distinction is never a clean one,” he explained. “Most countries’ laws define a hack (or at least computer misuse) as accessing data that you’re not entitled to access, or accessing for purposes for which you’re not supposed to access it.”
When it comes to information sourced from unrepentant bad actors, like extortionist hackers, Ball said media outlets should prominently disclose to their audiences the nature of any ransomware data that they end up using. However, he recommended that the press avoid naming the groups behind it.
“Ransomed data increases the public interest threshold at which publication is merited, and you probably owe it to the audience to say it was the result of ransomware,” he said. “I personally would not name the ‘gang’ behind it, as that might help them claim future ransoms.”
Ransomware Data Rules of Engagement
Both Horne and Strozyk said they use the following guidelines for dealing with information illicitly published by ransomware groups.
- They never pay for the use of this data.
- They never publish a previously leaked dataset if the source has current access to the same institution’s data. “We don’t engage with anyone who is in a system live,” Strozyk explained. “This is partly for legal reasons — otherwise, you might accidentally instruct someone to hack something for you, or be accused of collusion.”
- They take care to avoid being used as leverage to increase future ransom demands. “We don’t let ourselves be used as a negotiating tactic with extortion attempts,” said Horne.
- They don’t provide public access to ransomware data. Instead, they grant access only on request to verified journalists and researchers.
- They retain full, independent editorial control of the dataset. “We remain in control of the use of the data; we could never take a dataset along with any set of instructions, or spin, from the source,” said Horne.
Strozyk finds leads and links to some of these datasets in Telegram groups — but cautioned that it’s important to be open about your role as a journalist in your profile.
“Even on Telegram groups, I’m in there with my real name, and that I’m with OCCRP, so everyone knows I’m a journalist. This can also immediately scare people away who don’t want to follow our rules, which is good,” he noted.
Strozyk also monitors sites like ransomwatch and RansomLook for datasets of potential interest to investigative reporters, and closely follows the vx-underground Twitter channel for tips about new illicit leaks.
“I don’t find hackers very different from other types of sources — every source has an agenda,” Horne noted. “We do mirror ransomware datasets, but we don’t pay for data. These are datasets that the ransomware groups themselves have already published. We don’t deal with those groups directly.”
Later, Strozyk told GIJN: “We don’t provide access to ransomware datasets to the public, but we do to our ‘Friends of OCCRP’ user groups, which are open to anyone who can demonstrate that they are working as a journalist. These datasets come with a label that they’ve been originally published by a ransomware/hacker group, and that the user needs to take appropriate measures.”
Source of Last Resort
Why risk all of these ethical pitfalls in the first place? Because, the speakers said, these leaks are sometimes the only way to find leads that could expose an issue of overwhelming public interest.
These ethically fraught cases are comparable to some notable — and rare — journalistic uses of black market data. In 2020, Bellingcat researchers made small payments to data brokers to investigate the attempted murder of Russian opposition leader Alexey Navalny. That data helped them find individuals whose personal travel patterns matched Navalny’s air travel, which led them to identify the state agents they believe were behind the poisoning attempt. (Bellingcat Executive Director Christo Grozev told GIJN that this difficult ethical decision was made because no law enforcement unit was willing to investigate, and because suspects included professional spies skilled at covering their tracks.)
“The way we see it is that this ransomware data is out there anyway, and it’s best if there is a central place where it is accurately described and labeled for journalists, rather than someone else out there selling or abusing it,” said Strozyk.
This ends-justify-the-means approach is an everyday reality for the exiled hacktivist groups trusted by independent media.
“The independent media landscape in Belarus was absolutely destroyed in 2020,” exiled reporter Alena Prykhodzka explained. “Independent journalists became refugees and now work in exile. This was the precondition for creating hacktivist groups, because there was only one way to survive against the regimes” of Belarusian dictator Alexander Lukashenko and Russia’s Vladimir Putin. “There was no other option,” she said.
Prykhodzka said exiled journalists created a hacktivist group called Cyberpartisans, which, she said, has since hacked dozens of repressive institutions in Belarus and Russia. She said they had curated and republished hacked datasets on, for instance, rail transport, internal affairs, passport documentation, passenger traffic, and law enforcement conduct complaints made via the Belarus “102” police contact line.
Prykhodzka noted that the group — which works within the guidelines of an ethical code of conduct — has even obtained more than 100 hours of wiretapped phone calls made by that autocratic country’s security services.
Prykhodzka said verified journalists could request access to any of these databases via the @probivator_bot Telegram chat, or by contacting the group at cyberpartisans@protonmail.com.
“You will be asked to verify who you are and to explain clearly why you want to use the data,” she explained.
“I do think the motivation of the source matters here,” Ball said. “A principled whistleblower, versus a hacker seeking revenge, versus a company discrediting a rival, all would have different implications. But if a story is important enough, you could potentially use the information whether it came from any of those three. The more you can share what you know about your source’s motivations, or how you got the information (without revealing a confidential source) the better, and it is not appropriate ever to lie about the source of information.”
Rowan Philp is GIJN’s senior reporter. He was formerly chief reporter for South Africa’s Sunday Times. As a foreign correspondent, he has reported on news, politics, corruption, and conflict from more than two dozen countries around the world.