
Image: Shutterstock


How to Become a Deep Web Super Sleuth



Most journalists use Google daily, but search engines surface only about 4% of the content that is actually available online, according to Albrecht Ude, a German journalist, researcher, and trainer who spoke at the 11th Global Investigative Journalism Conference.

“Search engines are completely useless for finding any content on the deep web,” said Ude.

So how can journalists harvest the deep web?

Think abstract, said Ude. Don’t think about the specific content you want to find; think about where such content might exist, and then find related databases. Search engines won’t type your search terms into a database’s own search form, so you’ll have to do that yourself.

For example, if you need contact information for a specific architect and know where he or she lives, check if there’s a regional professional association database. That’s how Ude tracked down a source whose email address didn’t seem to exist online.

Here are four tips on how to identify databases that can give you information that Google won’t.

1. Who Runs the Database?

Who is likely to invest time and money to create and maintain a database with the kind of information you’re looking for? “This problem will not be solved by a search engine, but by your head,” Ude said.

2. Hack Search Engines

Find databases by searching for your topic together with “database OR directory OR catalogue OR registry” on a search engine. If you want some privacy, the Dutch company Startpage (www.startpage.com) runs your searches on Google for you without passing your personal information to the tech giant.
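To make the tip concrete, here is a minimal Python sketch that appends Ude’s database keywords to a topic and turns the result into a search URL. The helper names, the sample topic, and the use of Google’s `/search?q=` results URL are illustrative assumptions; the same query string can be pasted into Startpage or any other engine.

```python
from urllib.parse import quote_plus

# Keywords from Ude's tip; OR lets any one of them match.
DATABASE_TERMS = ["database", "directory", "catalogue", "registry"]


def database_query(topic: str) -> str:
    """Hypothetical helper: combine a topic with the OR-joined database keywords."""
    return f"{topic} " + " OR ".join(DATABASE_TERMS)


def search_url(query: str, base: str = "https://www.google.com/search?q=") -> str:
    """URL-encode the query (spaces become '+') and append it to a results URL."""
    return base + quote_plus(query)


if __name__ == "__main__":
    q = database_query("architects Hamburg")
    print(q)
    print(search_url(q))
```

Swapping `base` for another engine’s results URL is the only change needed to reuse the same query elsewhere.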

3. Use Wikipedia

Look up the topic on Wikipedia and check the “External links” section at the bottom of the page. According to Ude, those links are generally of higher quality than search engine results.

Follow Wikipedia’s category pages and the keyword links within articles, and search the local-language editions of Wikipedia too.

You can also find lists of databases on Wikipedia, for example lists of academic databases and a list of online databases.

4. Search for Database Lists

If you are searching in English, type the phrase “a * z database” (with the quotation marks) into a search engine. The asterisk acts as a wildcard, so the search returns “A to Z” lists of databases.

Use a university library in your city. It can give you access to thousands of scientific databases that usually require a paid subscription. Some universities charge non-students an annual fee to use their facilities, but this is much cheaper than paying for database subscriptions.

German speakers can use the “database of databases” run by the University of Regensburg, which lists more than 10,000 databases.

Be sure to search in other languages if relevant.

Bonus

Ude also shared several databases that he says you absolutely must know:

Web archives are among the best tools for finding records, especially deleted pages. For example, you can find information that a company removed or changed after a news event. Search the Wayback Machine for archived pages, or use Archive.today to archive a page you want to save.
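For checking many pages at once, the Internet Archive offers a public Wayback Machine availability endpoint (`https://archive.org/wayback/available`) that returns, as JSON, the archived snapshot closest to a date you choose. The minimal sketch below assumes that endpoint and its response shape (an `archived_snapshots` object containing a `closest` entry); the function names are hypothetical, and the network call is kept separate so the parsing can be tested offline.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL; timestamp is YYYYMMDD[hhmmss], the date to get closest to."""
    params = {"url": page}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_API}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived copy's URL, or None if no snapshot is reported."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the live API (requires network access)."""
    with urlopen(availability_url(page, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


if __name__ == "__main__":
    print(lookup("example.com", "20150101"))
```

Passing the date of a news event as `timestamp` is one way to find the version of a company page that existed just before it was changed.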

The IANA Root Zone Database shows who operates every valid, usable top-level domain. Newer ownership information is not available in the EU because of recent privacy laws, but there are ongoing efforts to negotiate access for journalists.
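IANA also serves its top-level-domain records over the WHOIS protocol (RFC 3912) at whois.iana.org, which makes quick lookups scriptable. Below is a minimal Python sketch, assuming that server and its plain `key: value` reply format; the function names are hypothetical, and the reply describes the top-level domain itself, not individual domain registrations.

```python
import socket

IANA_WHOIS = ("whois.iana.org", 43)  # WHOIS runs over plain TCP port 43 (RFC 3912)


def whois_raw(query: str, server: tuple[str, int] = IANA_WHOIS) -> str:
    """Send one WHOIS query and read the reply until the server closes the connection."""
    with socket.create_connection(server, timeout=30) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(reply: str) -> dict[str, list[str]]:
    """Collect 'key: value' lines; keys such as 'address' can repeat."""
    fields: dict[str, list[str]] = {}
    for line in reply.splitlines():
        if line.startswith("%") or ":" not in line:
            continue  # skip comment lines and blank lines
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            fields.setdefault(key.strip(), []).append(value)
    return fields


if __name__ == "__main__":
    info = parse_fields(whois_raw("de"))
    print(info.get("organisation"))
```

The field names in IANA’s replies can change, so print the raw reply if an expected key is missing.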

WorldCat is a global library catalog that can find almost any published book. WorldCat Identities is a useful name search engine.

Common Vulnerabilities and Exposures is a great database to investigate internet fraud and has “every known security leak on the net,” according to Ude.
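When a breach report or tip names specific vulnerabilities, it can help to pull the CVE identifiers out of the text and look each one up. The sketch below swaps in the US National Vulnerability Database (NVD) REST API, version 2.0, which republishes CVE records with extra analysis; the endpoint and its `vulnerabilities`, `cve`, and `descriptions` response fields are assumptions based on NVD’s public documentation, and the helper names are hypothetical. NVD rate-limits requests made without an API key, so keep lookups slow.

```python
import json
import re
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# CVE IDs look like CVE-<year>-<4 or more digits>.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def find_cve_ids(text: str) -> list[str]:
    """Extract CVE IDs, upper-cased and de-duplicated in order of appearance."""
    seen: list[str] = []
    for match in CVE_ID.findall(text):
        cve = match.upper()
        if cve not in seen:
            seen.append(cve)
    return seen


def nvd_url(cve_id: str) -> str:
    """Build an NVD query URL for a single CVE ID."""
    return f"{NVD_API}?{urlencode({'cveId': cve_id})}"


def english_description(payload: dict) -> Optional[str]:
    """Return the first English description in an NVD response, if any."""
    for item in payload.get("vulnerabilities", []):
        for desc in item.get("cve", {}).get("descriptions", []):
            if desc.get("lang") == "en":
                return desc.get("value")
    return None


def describe(cve_id: str) -> Optional[str]:
    """Query the live NVD API (requires network access)."""
    with urlopen(nvd_url(cve_id), timeout=30) as resp:
        return english_description(json.load(resp))


if __name__ == "__main__":
    for cve in find_cve_ids("The attackers exploited CVE-2014-0160."):
        print(cve, describe(cve))
```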

Tenders Electronic Daily shows exactly where the European Union is spending its money. Designed for investors, it is updated daily.

Directory of Open Access Journals indexes peer-reviewed scientific journals whose articles are available for free.

National libraries can be excellent resources to find databases. Wikipedia has a list of national and state libraries.

German news outlet Zeit Online compiled all German street names with the help of OpenStreetMap. If you only know a street name but not the city, this can be a useful resource.


Leonie Kijewski is a freelance reporter based in Cambodia. Her work has appeared in The Guardian, Al Jazeera, Voice of America, and various other publications. She speaks German, English, French, and Dutch. She previously worked for the Phnom Penh Post in Cambodia as a sub-editor and reporter. Leonie holds a master’s degree in International Law.

Republish our articles for free, online or in print, under a Creative Commons license.



Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
