Image: Shutterstock

How to Become a Deep Web Super Sleuth

by Leonie Kijewski • October 14, 2019

Most journalists use Google daily, but search engines show only about 4% of the content that is actually available online, according to Albrecht Ude, a German journalist, researcher, and trainer who spoke at the 11th Global Investigative Journalism Conference.

“Search engines are completely useless for finding any content on the deep web,” said Ude.

So how can journalists harvest the deep web?

Think abstractly, said Ude. Don't think about the specific content you want to find, but rather about where such content might exist; then find related databases. Search engines won't type names into a database's search form for you, so you'll have to do that yourself.

For example, if you need contact information for a specific architect and know where he or she lives, check if there’s a regional professional association database. That’s how Ude tracked down a source whose email address didn’t seem to exist online.
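Searching a database yourself usually means filling in its search form. Many simple forms send the query as parameters in the URL (a GET request), so checking a list of names can be scripted. The Python sketch below assumes such a form: the address directory.example.org and the field names "name" and "region" are hypothetical placeholders, so read the real form's HTML to find its action URL and input names. Forms that use POST, logins, or CAPTCHAs won't work this way, and a site's terms of use still apply.

```python
from urllib.parse import urlencode

# Hypothetical search form: replace with the real form's "action" URL.
DIRECTORY_SEARCH = "https://directory.example.org/search"


def form_query_urls(names, base=DIRECTORY_SEARCH, field="name", extra=None):
    """Build one GET query URL per name, as if typing each into the form.

    `field` and `extra` stand in for the form's input names (hypothetical).
    """
    urls = []
    for name in names:
        params = {field: name}
        if extra:
            params.update(extra)  # e.g. a region filter the form might offer
        urls.append(f"{base}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in form_query_urls(["Jane Doe", "Max Müller"], extra={"region": "Bavaria"}):
        print(url)
```

Each printed URL can be opened in a browser to check the result by hand before relying on it.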

Here are four tips on how to identify databases that can give you information that Google won’t.

1. Who Runs the Database?

Who is likely to invest time and money to create and maintain a database with the kind of information you’re looking for? “This problem will not be solved by a search engine, but by your head,” Ude said.

2. Hack Search Engines

Find databases by searching for your topic plus "database OR directory OR catalogue OR registry" on a search engine. If you want some privacy, the Dutch company Startpage (www.startpage.com) runs your searches on Google without passing your information to the tech giant.
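The keyword trick is easy to reuse across topics. This Python sketch assembles the query and a search URL. It assumes the engine reads an uppercase OR between words as "either of these," as Google does, and it uses Google's standard https://www.google.com/search?q= address only as a stand-in; swap in the search address of whichever engine you prefer.

```python
from urllib.parse import urlencode

# Keywords from Ude's tip. Assumes the engine reads an uppercase OR
# between words as "either of these" (Google does).
DATABASE_WORDS = ("database", "directory", "catalogue", "registry")


def database_query(topic, words=DATABASE_WORDS):
    """Append the OR-group of database keywords to a topic."""
    return f"{topic} {' OR '.join(words)}"


def search_url(query, base="https://www.google.com/search"):
    """Encode the query into a search URL; `base` is a stand-in engine."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(search_url(database_query("architects Bavaria")))
```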

3. Use Wikipedia

Look up the topic on Wikipedia and check the "External links" section at the bottom of the page. According to Ude, those links are generally of higher quality than search engine results.

Follow Wikipedia's category pages and keyword links, and search in local languages, too.

You can also find lists of databases on Wikipedia, for example the list of academic databases and search engines and the list of online databases.
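Wikipedia pages can also be read programmatically through the MediaWiki Action API. This Python sketch asks for a page's external links (prop=extlinks) and its links to other language editions (prop=langlinks), which helps when you want to search in local languages. Two caveats: extlinks returns every external link on the page, including those in the references, not only the "External links" section; and very long lists may need the API's continuation parameters, which this sketch ignores. The User-Agent string is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://{lang}.wikipedia.org/w/api.php"
# Placeholder: Wikimedia asks API clients to identify themselves.
USER_AGENT = "deep-web-sleuth-sketch/0.1 (contact: you@example.org)"


def links_query_url(title, lang="en"):
    """Action API query for a page's external and interlanguage links."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "extlinks|langlinks",
        "ellimit": "max",
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",  # pages as a list, links with named fields
    }
    return API.format(lang=lang) + "?" + urlencode(params)


def parse_links(data):
    """Return (external URLs, {language code: title in that language})."""
    page = data["query"]["pages"][0]
    external = [link["url"] for link in page.get("extlinks", [])]
    languages = {link["lang"]: link["title"] for link in page.get("langlinks", [])}
    return external, languages


def fetch_links(title, lang="en"):
    """Fetch one page's links (ignores API continuation for long lists)."""
    request = Request(links_query_url(title, lang), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return parse_links(json.load(response))
```

The language titles returned by fetch_links can be fed back in with a different `lang` value to read the same topic's links in another edition.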

4. Search for Database Lists

If searching in English, type the phrase "a * z database" into a search engine. The asterisk works as a wildcard, so this turns up "A to Z" lists of databases.

Use a university library in your city. It can give you access to thousands of scientific databases that normally require a paid subscription. Some universities charge non-students an annual fee to use their facilities, but this is much cheaper than paying for database subscriptions.

German speakers can use the University of Regensburg's "database of databases," which lists more than 10,000 databases.

Be sure to search in other languages if relevant.

Bonus

Here are databases that Ude says you absolutely must know:

Web archives are among the best tools for finding records, especially deleted pages. For example, you can find information that a company may have removed or changed following a news event. Search the Wayback Machine for archived pages, or save a copy of a page you want to preserve on Archive.today.
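For the Wayback Machine, the Internet Archive offers a simple availability API that returns the archived snapshot closest to a given date. This Python sketch assumes the API's JSON shape (an "archived_snapshots" object holding a "closest" entry with "url" and "timestamp"). The closest snapshot can fall on either side of the requested date, so the sketch returns the timestamp too, for checking.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(page, timestamp):
    """`timestamp` is YYYYMMDD, or more precise up to YYYYMMDDhhmmss."""
    return f"{AVAILABILITY_API}?{urlencode({'url': page, 'timestamp': timestamp})}"


def closest_snapshot(data):
    """Return (archived URL, snapshot timestamp), or None if nothing is archived."""
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def snapshot_near(page, timestamp):
    """Ask the Wayback Machine for the snapshot nearest to `timestamp`."""
    with urlopen(availability_url(page, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

To see what a company changed around a news event, compare the results of snapshot_near for a date shortly before the event and a date shortly after it, then open both archived copies side by side.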

IANA Root Zone Database has information on who runs each valid, usable top-level domain. Newer ownership information is not available in the EU due to new privacy laws, but there are ongoing efforts to negotiate access for journalists.
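IANA also publishes the full list of top-level domains as a plain-text file at data.iana.org/TLD/tlds-alpha-by-domain.txt, with one uppercase domain per line after a "# Version" comment line. This Python sketch loads that list, checks whether a domain's TLD is valid, and builds a link to the TLD's Root Zone Database entry. The per-TLD page pattern (iana.org/domains/root/db/<tld>.html) is an assumption based on how the site is laid out, and internationalized TLDs appear in the list in their "xn--" form.

```python
from urllib.request import urlopen

TLD_LIST = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"


def parse_tld_list(text):
    """Lower-cased TLDs from IANA's list; lines starting with '#' are comments."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def fetch_tlds():
    with urlopen(TLD_LIST, timeout=30) as response:
        return parse_tld_list(response.read().decode("ascii"))


def tld_of(domain):
    """Last label of a domain name, e.g. 'uk' for 'news.example.co.uk'."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def root_db_page(tld):
    """Root Zone Database entry for a TLD (URL pattern is an assumption)."""
    return f"https://www.iana.org/domains/root/db/{tld.lower().lstrip('.')}.html"
```

For example, `tld_of("example.berlin") in fetch_tlds()` checks that the domain ends in a delegated TLD before you look up who runs it.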

WorldCat is a global library catalog that can find virtually any published book. WorldCat Identities is a useful name search engine.

Common Vulnerabilities and Exposures (CVE), a catalog of publicly disclosed software security flaws, is a great database for investigating internet fraud. It has "every known security leak on the net," according to Ude.
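CVE entries are identified by IDs such as CVE-2014-0160: the prefix, a four-digit year, and a sequence number of four or more digits. When a leaked document, court filing, or forum post mentions vulnerabilities, a regular expression can pull out the IDs for lookup. This Python sketch assumes the cve.org record URL format (https://www.cve.org/CVERecord?id=...).

```python
import re

# A CVE ID is "CVE-", a four-digit year, and a sequence number of 4+ digits.
CVE_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def find_cve_ids(text):
    """Unique CVE IDs in order of first appearance, normalized to upper case."""
    found = []
    for match in CVE_ID.finditer(text):
        cve_id = match.group().upper()
        if cve_id not in found:
            found.append(cve_id)
    return found


def cve_record_url(cve_id):
    """Record page on cve.org (URL format assumed)."""
    return f"https://www.cve.org/CVERecord?id={cve_id}"
```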

Tenders Electronic Daily lists exactly where the European Union is spending its money. Designed for investors, it's updated daily.

Directory of Open Access Journals indexes peer-reviewed scientific journals whose articles are available for free.

National libraries can be excellent resources for finding databases. Wikipedia has a list of national and state libraries.

The German news outlet Zeit Online used OpenStreetMap data to compile all German street names. If you know a street name but not the city, this can be a useful resource.
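If you'd rather query OpenStreetMap directly, its Nominatim geocoder accepts a structured search by street and country. This Python sketch lists the cities, towns, or villages where a street name appears. Treat it as a rough tool: Nominatim returns at most 40 results per request, so common names such as Hauptstraße will be cut off (Zeit Online's complete list is better for those), and the public server's usage policy asks for an identifying User-Agent (the one below is a placeholder) and no more than one request per second.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM = "https://nominatim.openstreetmap.org/search"
# Placeholder: the public server's usage policy requires an identifying User-Agent.
USER_AGENT = "deep-web-sleuth-sketch/0.1 (contact: you@example.org)"


def street_search_url(street, country="Germany", limit=40):
    """Structured Nominatim search; 40 is the server's maximum result count."""
    params = {
        "street": street,
        "country": country,
        "format": "jsonv2",
        "addressdetails": 1,
        "limit": limit,
    }
    return f"{NOMINATIM}?{urlencode(params)}"


def places_from_results(results):
    """Settlement names for each matching street, without duplicates."""
    places = []
    for result in results:
        address = result.get("address", {})
        # Settlements appear as city, town or village depending on size;
        # other keys (hamlet, suburb) are ignored in this sketch.
        place = address.get("city") or address.get("town") or address.get("village")
        if place and place not in places:
            places.append(place)
    return places


def find_street(street, country="Germany"):
    request = Request(street_search_url(street, country), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return places_from_results(json.load(response))
```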

Leonie Kijewski is a freelance reporter based in Cambodia. Her work has appeared in The Guardian, Al Jazeera, Voice of America, and various other publications. She speaks German, English, French, and Dutch. She previously worked for the Phnom Penh Post in Cambodia as a sub-editor and reporter. Leonie holds a master’s degree in International Law.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License

This article first appeared on Global Investigative Journalism Network (GIJN).
