
Illustration: Smaranda Tolosano for GIJN


Introduction to Investigative Journalism: Deep Internet Research: Leveraging Open-Source Research and Verification Techniques


In today’s world, where information is both abundant and scarce, becoming skilled in deep internet research is like navigating a maze of hidden paths. This chapter guides you through advanced research techniques, including the use of open source tools and verification methods. You’ll also learn how to use data effectively and piece together the vast puzzles of the internet. By the end, you’ll be better equipped to conduct research, including open source intelligence (OSINT), and to take the precision of your analysis to new heights.

Definition of Deep Online Research: How It Works for Investigations

Deep research means navigating below the surface of the internet to find information that is often more thorough, more extensive, and occasionally more sensitive. It calls for specific tools and methods, but it also requires a commitment to ethical principles.

Advanced internet research significantly improves any investigative process by providing access to information that isn’t easily found through standard search engines. Investigators can efficiently retrieve large amounts of data using web crawlers and data mining tools, which index and extract information from websites that are beyond the reach of conventional search engines.

To gain a comprehensive understanding, data on a particular subject from various sources should be cross-referenced and connected. This process verifies the information, highlights inconsistencies, and uncovers deeper insights.

Open source intelligence (OSINT) isn’t new: intelligence services have drawn on open sources since the 19th century. It has become popular more recently among journalists and internet users, who find it a vital resource because of the depth and breadth of available information and the new ways to reach and analyze it.

A forensic journalist is a specialized investigative journalist who uses these techniques to uncover, verify, and analyze information. These journalists apply forensic precision to public data, often using advanced tools and methodologies to trace the origins of information, expose hidden patterns, and verify facts in complex investigations. With open source intelligence, a forensic journalist leverages publicly available data, such as social media activity, satellite images, and online databases, to piece together narratives that reveal truths about events, individuals, or organizations that might otherwise remain hidden. This approach is crucial in an era where misinformation is rampant, and transparency is key to holding power accountable.

Tools such as Maltego and Gephi allow investigators to create visual representations of data, enabling them to identify patterns, trends, and connections that might not be immediately visible.
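Maltego and Gephi are desktop applications, but the data you feed them usually starts as a simple table of relationships. As a minimal sketch, assuming your relationships are already collected as (person, company, role) tuples (the names below are hypothetical), the Python code writes them as a CSV edge list with `Source` and `Target` headers, which, as far as we know, Gephi's spreadsheet import recognizes, and counts how many links each entity has so that well-connected nodes stand out.

```python
import csv
import io
from collections import Counter

def write_edge_list(relationships, out):
    """Write (source, target, label) tuples as a CSV edge list for graph tools."""
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Label"])
    for source, target, label in relationships:
        writer.writerow([source, target, label])

# Hypothetical relationships gathered from filings and social media.
links = [
    ("Jane Doe", "Acme Holdings Ltd", "director"),
    ("John Roe", "Acme Holdings Ltd", "shareholder"),
    ("Jane Doe", "Beta Trading LLC", "director"),
]

buffer = io.StringIO()
write_edge_list(links, buffer)
print(buffer.getvalue())

# Count connections per entity: people linked to several companies stand out.
degree = Counter()
for source, target, _ in links:
    degree[source] += 1
    degree[target] += 1
print(degree.most_common(2))
```

In practice you would write to a file opened with `newline=""` and import it into Gephi; the in-memory buffer just keeps the sketch self-contained.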

Planning and Conducting Your Investigation

Research and investigation are, first and foremost, mental processes. Tools and techniques continually evolve and adapt depending on the research target. To develop a strong investigative mindset, it’s crucial to master and practice planning your research by following these steps:

Step 1: Specify the Goals of the Research

Clearly state the objectives of your investigation, such as conducting background checks, verifying identities, or investigating potential threats. Identifying your goals from the start will streamline your research process, focusing on the specific information you’re seeking and the intended outcome.

Step 2: Determine Data Sources

Identify the sources that can supply the data and information you need. These might include official documents, social media sites, court records, and other relevant sources.

Step 3: Put Together the Required Methods

Selecting the right methods and tools to obtain your data is key to achieving your goals. These might include open source research tools, web crawling and data mining software such as Scrapy, metadata extraction tools such as ExifTool and, where justified, browsers that allow access to the dark web.
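Scrapy handles scheduling, retries, and politeness rules for you, but the core step of any crawler is simple: fetch a page, pull out its links, and queue them for the next visit. The sketch below illustrates only the link-extraction step, using Python's standard-library `html.parser` rather than Scrapy; the page and URLs are hypothetical, and a real crawler should respect each site's robots.txt and terms of use.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))

# Hypothetical page content; a crawler would download this first.
page = (
    '<html><body>'
    '<a href="/reports/2023.pdf">Annual report</a> '
    '<a href="https://example.org/about">About</a>'
    '</body></html>'
)
parser = LinkExtractor("https://example.com/index.html")
parser.feed(page)
print(parser.links)
```

Each extracted link would then go into a queue of pages to visit, which is the bookkeeping a framework like Scrapy takes off your hands.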

Step 4: Design Your Own Search Plan 

Each investigation requires a unique plan to achieve your goals. This plan will vary based on the information and tools at your disposal, as well as the clues you uncover along the way. The stages of the plan may need to be adjusted as new evidence or insights are gathered, allowing you to refine your approach and stay on track toward your objectives.

In an investigation, you might begin with social media research to identify individuals or locations, which can then lead you to databases, subscription services, and other hidden content. Eventually, you might explore dark web sources using the Tor Browser, which helps conceal your identity, to uncover information not available on the surface or deep web. Before exploring dark web sources, discuss it with your editor or get legal advice.

In another investigation, you might start with public records of a company and its employees, which could then direct you to crime reports and court orders. Each investigation’s path is shaped by the unique clues and resources available at each stage.

An example of a business investigation plan. Image: Courtesy of the author

Step 5: Cross-Reference the Collected Data

Gathering data and information is not the last step in your investigation; you are only halfway there. Next, you need to cross-reference the data to ensure the accuracy and reliability of what you have gathered. This step involves comparing details such as dates, names, locations, and events across multiple sources. By checking for consistency and identifying discrepancies, investigators can uncover errors or areas that need further investigation.

This process also includes finding supporting evidence, assessing the credibility of sources, and analyzing metadata and contextual information. For instance, in a financial fraud investigation, bank transaction records should be verified against company records and third-party confirmations to validate details and identify inconsistencies.
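Cross-referencing can be partly automated once records from two sources share a common key. This is a minimal sketch, assuming hypothetical bank and company records keyed by a transaction ID: it flags records that appear in only one source and fields whose values disagree, leaving the follow-up reporting to you.

```python
def cross_reference(source_a, source_b, fields):
    """Compare two dicts of records keyed by ID; return (key, problem) tuples."""
    issues = []
    for key in sorted(source_a.keys() | source_b.keys()):
        if key not in source_a:
            issues.append((key, "missing from source_a"))
            continue
        if key not in source_b:
            issues.append((key, "missing from source_b"))
            continue
        for name in fields:
            a_val = source_a[key].get(name)
            b_val = source_b[key].get(name)
            if a_val != b_val:
                issues.append((key, f"{name}: {a_val!r} vs {b_val!r}"))
    return issues

# Hypothetical records: bank statements vs. the company's own ledger.
bank_records = {
    "TX-001": {"date": "2023-03-01", "amount": 5000, "payee": "Acme Ltd"},
    "TX-002": {"date": "2023-03-15", "amount": 12000, "payee": "Beta LLC"},
    "TX-003": {"date": "2023-04-02", "amount": 700, "payee": "Gamma Inc"},
}
company_ledger = {
    "TX-001": {"date": "2023-03-01", "amount": 5000, "payee": "Acme Ltd"},
    "TX-002": {"date": "2023-03-15", "amount": 21000, "payee": "Beta LLC"},
}

issues = cross_reference(bank_records, company_ledger, ("date", "amount", "payee"))
for issue in issues:
    print(issue)
```

Each flagged item is a lead, not a conclusion: a mismatch may be a typo, a timing difference, or something worth reporting.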

Step 6: Analyze the Data and Connect the Dots

Once you have gathered the information and confirmed its accuracy, it is time to analyze it and connect the dots, transforming raw data into meaningful insights. This begins with organizing and cleaning the data so that it is structured. By identifying key patterns and trends, investigators can start to uncover recurring themes and significant behaviors within the data. This process enables them to piece together different elements, revealing connections that might not be immediately obvious.
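Cleaning often starts with making names comparable, because the same person may appear with different spellings, accents, or capitalization across sources. The sketch below, using hypothetical mentions, normalizes names with Python's standard `unicodedata` module and counts how often each one recurs. Be aware that normalization can also merge different people who share a name, so every match still needs confirming.

```python
import re
import unicodedata
from collections import Counter

def normalize_name(raw):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    # Drop the combining marks that NFKD splits off accented letters.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

# Hypothetical mentions collected from several documents.
mentions = ["José  Álvarez", "JOSE ALVAREZ", "José Álvarez.", "Maria Chen", "Jose Alvarez"]
counts = Counter(normalize_name(m) for m in mentions)
print(counts.most_common())
```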

Step 7: Identify Findings

This process starts with reviewing all analyzed data to pinpoint key aspects relevant to the investigation’s objectives. Investigators then place these findings within the broader context of the investigation, ensuring they are both relevant and valuable.

Communicating the findings effectively to your audience is also essential. This involves presenting the results clearly and concisely, often using visual aids like charts, graphs, and tables to enhance understanding. Simplify the findings to be understandable to your audience and organize them logically, typically starting with the most significant insights and progressing to more detailed observations.

Example of visualizing open source data: showing links between individuals using graph visualization by data visualization company Cambridge Intelligence. Image: Screenshot, Cambridge Intelligence

Example of visualizing open source data: showing links between businesses using graph visualization. Image: Screenshot, Cambridge Intelligence

Internet Deep Search

Let’s do a little exercise. Put on your investigator hat and write down a few places you’d go to gather intelligence on a target.

What did you write down? When I did the exercise, I wrote down Google, LinkedIn, the target’s website, and X/Twitter. Others might write down government websites for company registrations or court records.

There isn’t only one right place to find information.

Open source intelligence refers to information about individuals or organizations that can be legally collected from free, public sources. It also includes public information from books, public library reports, newspaper articles, press releases, and social media.

Public records, including property records, court records, criminal records, and voting records, are valuable sources. Some paid sources provide more comprehensive information, and certain tools not explicitly designed for OSINT can still be advantageous. Forums, classifieds, and blog posts can offer detailed insights into the target.

Techniques and Tools

Tools for conducting investigations on the internet are constantly evolving, so the most powerful assets in any deep research are your techniques and search plan. These techniques differ depending on your target, whether you are researching individuals or businesses, or uncovering the truth behind false claims.

People Search

Finding information about individuals can take many routes. Public records, databases, court records, social media accounts, and their online activities are key sources. Let’s dive deeper into each one.

Social Media 

Social media platforms are treasure troves of historical data, often revealing personal details about potential targets, including job information, photos, videos, family details, fitness tracker data, check-ins, and much more.

A username can help identify a person and reveal which social media platforms they use. Websites such as Namechk.com and User-Searcher let us check which platforms a username appears on and whether it is linked to our target.

It is important to confirm that an account really belongs to your target, as there may be other people with the same name, as well as fake accounts.
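Checking a username across platforms starts with knowing what each profile address would be. As a rough sketch, the Python below builds candidate profile URLs for one username; the URL patterns are our assumptions about current formats and can change at any time. A URL that loads does not prove the account belongs to your target, and many platforms block automated checks, which is one reason services like Namechk exist.

```python
from urllib.parse import quote

# Assumed profile URL patterns; verify each one manually before relying on it.
PROFILE_PATTERNS = {
    "GitHub": "https://github.com/{}",
    "Instagram": "https://www.instagram.com/{}/",
    "X/Twitter": "https://x.com/{}",
    "Reddit": "https://www.reddit.com/user/{}",
}

def candidate_profiles(username):
    """Return the URL each platform would use for this username."""
    safe = quote(username, safe="")  # percent-encode anything unusual
    return {site: pattern.format(safe) for site, pattern in PROFILE_PATTERNS.items()}

# "jane_doe84" is a hypothetical username.
for site, url in candidate_profiles("jane_doe84").items():
    print(f"{site:10} {url}")
```

Opening each URL by hand and comparing profile photos, bios, and posting history is still the step that tells you whether the accounts are the same person.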

Image: Screenshot, User Searcher

Social media posts from family and friends can provide valuable information for investigations about the person of interest, such as their location, pet’s name, children’s names, home address, email address, phone number, banking information, and more. Regardless of the platform, you’ll typically encounter several categories of social media content that may be useful, including the author’s posts or comments, replies from the author or friends, social interactions (such as likes or connections), videos or images, and metadata, like timestamps.

For example, once, while investigating a company’s director, I came across a forum where he was communicating with friends about fantasy football, a game in which players create fantasy teams based on real players and compete with friends. I was able to gain valuable insights into where the director goes every Thursday night and, even more importantly, details about his friends. The images, videos, and documents shared publicly contained multiple sources of information that were useful for my investigation.

When doing social media OSINT, remember to dig deep into threads. You’ll often find more relevant information by examining the entire conversation rather than just the original post. Social interactions can consist of many different activities.

For example, a woman posted a photo on social media showing her new car and its license plate, with the caption “Goodbye, California. Hello, New Hampshire.” She intended to share her excitement with her social network and show off her first car. However, my OSINT brain kicked into high gear at the wealth of information she had unintentionally shared with the world. Without knowing it, she publicly gave away:

  • Two states where public records could potentially provide additional details about her.
  • The answer to a common password reset question: What’s the make, model, or color of your first car?
  • A license plate to cross-reference against DMV records.
  • Geolocation data in the photo pointing to her new home.
  • Information that could be used for impersonation.

Not only can images contain metadata with GPS coordinates; small details in the photos can also be valuable. Most modern cameras, including those on smartphones, store information about each photo using a standard called EXIF (Exchangeable Image File Format). When examining EXIF data, you’ll typically find details like shutter speed, exposure compensation, timestamps, and other technical information about how the picture was taken. If the camera is equipped with GPS, like most smartphones, you may also find highly accurate latitude and longitude information.
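Once you have extracted EXIF data (for example with ExifTool, whose `-n` option prints GPS coordinates as plain numbers, or with the Pillow library in Python), GPS positions are usually stored as degrees, minutes, and seconds plus an N/S or E/W reference, and the capture time as `YYYY:MM:DD HH:MM:SS` with no time zone. The sketch below converts such values into decimal coordinates you can paste into a map, and into a Python `datetime`; the input values are hypothetical. Many social media platforms strip EXIF data on upload, so original files sent directly by a source are more likely to contain it.

```python
from datetime import datetime
from fractions import Fraction

def _to_float(value):
    """Accept numbers, rational objects, or 'num/den' strings."""
    return float(Fraction(value)) if isinstance(value, str) else float(value)

def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (_to_float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if ref in ("S", "W") else value

def parse_exif_datetime(raw):
    """Parse the EXIF DateTimeOriginal format (no time zone information)."""
    return datetime.strptime(raw, "%Y:%m:%d %H:%M:%S")

# Hypothetical values as they might appear in a photo's GPS and EXIF fields.
lat = dms_to_decimal(("48", "51", "2964/100"), "N")
lon = dms_to_decimal(("2", "17", "4020/100"), "E")
print(round(lat, 6), round(lon, 6))
print(parse_exif_datetime("2023:06:14 18:42:07"))
```

Because the timestamp has no time zone, treat it as the camera's local clock setting, which may itself be wrong.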

Data Aggregator Websites

Some websites, such as Pipl and Spokeo, aggregate personal data from various sources and, for a fee, present it in a format that aids investigations into individuals. Radaris and ZabaSearch offer background checks, contact information, and other details about U.S. residents drawn from public and private databases.

Image: Screenshot, Pipl

Image: Screenshot, Radaris

Image: Screenshot, ZabaSearch

Researching Companies and Organizations 

Many laws and regulations require organizations to file paperwork with their local authority to stay compliant. In highly regulated industries, such as finance, there are additional filing requirements, and the resulting documents are often accessible to the public free of charge. These documents provide details about the founders or executives that can be followed up in investigations.

Image: Screenshot, Companies House / Gov.UK

Different countries have different websites that publish such documents, including Companies House in the UK and DNB in Switzerland. Resources covering companies in multiple countries include the OCCRP Investigative Dashboard and the free service OpenCorporates.

Image: Screenshot, OCCRP

Image: Screenshot, OpenCorporates
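Companies House also offers a free public API. As a minimal sketch based on our reading of its documentation (treat the endpoint paths and field names such as `company_status` and `items` as assumptions to verify), the Python below builds URLs for a company search, a company profile, which includes whether the company is active, and the list of persons with significant control, and shows how a request authenticates with an API key you register for separately. The network calls are left commented out because they need a real key and company number.

```python
import base64
import json
import urllib.request
from urllib.parse import quote, urlencode

API_ROOT = "https://api.company-information.service.gov.uk"

def search_url(company_name):
    """Search by name to find a company's registration number."""
    return f"{API_ROOT}/search/companies?" + urlencode({"q": company_name})

def company_url(company_number):
    """Company profile, which includes its status (e.g. active or dissolved)."""
    return f"{API_ROOT}/company/{quote(company_number)}"

def psc_url(company_number):
    """Persons with significant control registered for the company."""
    return company_url(company_number) + "/persons-with-significant-control"

def fetch_json(url, api_key):
    """GET an endpoint, sending the API key as the Basic-auth username."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)

print(search_url("Bros Brothers International Limited"))
# Replace API_KEY and COMPANY_NUMBER with real values before running:
# profile = fetch_json(company_url("COMPANY_NUMBER"), "API_KEY")
# print(profile.get("company_status"))
# for person in fetch_json(psc_url("COMPANY_NUMBER"), "API_KEY").get("items", []):
#     print(person.get("name"), person.get("natures_of_control"))
```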

Social media is also an excellent resource for researching any business, as most companies maintain an official presence on popular social networks. If you’re uncertain about the authenticity of a social media account, visit the company’s website to look for links to their official social media pages. Examine the videos, images, and posts, but pay particular attention to the comment threads and reviews for additional insights.

Exercise: Use Companies House to find the individuals with significant control over Bros Brothers International Limited, and check whether it is an active company.

In my experience, business records from early in an organization’s history typically hold the most valuable open source data, because founders often use their personal phone numbers, physical addresses, and email addresses. Security is often not a priority until later in an organization’s development, which can work to our advantage.

Employee Search

Another useful place to look for organizational open source data is employer review sites like Glassdoor, along with business information sources such as Dun & Bradstreet, Hoovers, the US Patent and Trademark Office, and the Better Business Bureau.

Image: Screenshot, Glassdoor

LinkedIn has a powerful search engine and is well suited to employee research. The search form allows you to search by name, location, university, industry, and mutual contacts.

Image: Screenshot, LinkedIn

Exercise: Use social media to find doctors connected to UNHCR in Uganda.

Public Records and Databases 

Public records and databases are collections of information that anyone can access, usually managed by government bodies, universities, or other organizations. They include documents such as court records, property deeds, company financial reports, and academic research. These sources are key for investigative journalism, helping to check facts, discover hidden details, and build solid stories.

For instance, academic databases such as JSTOR, ProQuest, and Google Scholar provide research papers that offer historical insights and deep analysis. News archives like LexisNexis and Factiva help journalists explore the history of people, companies, or events by sifting through years of media coverage.

In the US, federal court documents are available on PACER. PACER charges per page, but a site called CourtListener offers millions of PACER records for free.

Filings with the US Securities and Exchange Commission (SEC) provide essential information on company finances and on legal and corporate activities. Some countries will give you access to such databases upon official request. For instance, in the UK, you can apply for access through the government’s official website. Local government sites also offer important records such as property transactions and business licenses. These tools are invaluable for journalists aiming to uncover the full story.
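EDGAR, the SEC's filing system, exposes each company's filing history as JSON. As a minimal sketch, assuming the `data.sec.gov/submissions` endpoint and its `filings.recent` structure work as we understand them, the code below builds the URL from a company's Central Index Key (CIK) and lists recent filings. The SEC asks automated clients to identify themselves in the User-Agent header, so the fetch is left commented out until you supply your own contact details.

```python
import json
import urllib.request

def submissions_url(cik):
    """EDGAR submissions endpoint; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"

def recent_filings(cik, user_agent):
    """Return (form type, filing date, accession number) for recent filings."""
    # Identify yourself (name and email) as the SEC requests for automated access.
    request = urllib.request.Request(submissions_url(cik), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    recent = data["filings"]["recent"]
    return list(zip(recent["form"], recent["filingDate"], recent["accessionNumber"]))

print(submissions_url("320193"))  # Apple Inc.'s CIK, used only as an example
# for form, date, accession in recent_filings("320193", "Jane Reporter jane@example.com")[:10]:
#     print(date, form, accession)
```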

Exercise: Can you find out which company was named in a US court case filed in 2022 by Nate Rahn?

Google Dorking

Google dorking, or Google hacking, refers to advanced searching on Google to reach information that standard searches don’t surface. It involves writing precise search queries to locate hidden or poorly secured information and gather details that can be crucial to your investigative reporting. Dorking itself is not illegal.

Google dorking relies on search operators that make your search more precise, for example:

  • site: site:example.com "confidential" finds pages on example.com that contain the word “confidential”.
  • filetype: filetype:pdf "financial report" finds PDF files that include the phrase “financial report”.
  • inurl: inurl:admin finds pages with “admin” in the URL, often related to administrative sections of websites.
  • intitle: intitle:"index of" finds directory listings of files.

Google Dorking is a powerful technique for investigative journalism, allowing reporters to uncover data that would otherwise remain hidden. However, it’s crucial to use these tools responsibly, ensuring that the information obtained is used in a legal and ethical manner.
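If you run many variations of the same search, it can help to assemble dorks programmatically so they stay consistent. This small Python sketch combines the operators listed above with keywords, quotes multi-word phrases, and builds a search URL to open in a browser; it does not fetch or scrape any results.

```python
from urllib.parse import urlencode

def _quoted(value):
    """Wrap multi-word values in double quotes so Google treats them as a phrase."""
    return f'"{value}"' if " " in value else value

def build_dork(terms=(), site=None, filetype=None, inurl=None, intitle=None):
    """Combine search operators and keywords into one query string."""
    operators = [("site", site), ("filetype", filetype), ("inurl", inurl), ("intitle", intitle)]
    parts = [f"{name}:{_quoted(value)}" for name, value in operators if value]
    parts.extend(_quoted(term) for term in terms)
    return " ".join(parts)

def google_search_url(query):
    """Percent-encode the query into a Google search URL for your browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})

report_query = build_dork(terms=["financial report"], site="example.com", filetype="pdf")
listing_query = build_dork(intitle="index of")
print(report_query)
print(listing_query)
print(google_search_url(report_query))
```

For the LinkedIn exercise above, `build_dork(terms=["UNHCR", "Uganda", "doctor"], site="linkedin.com/in")` produces `site:linkedin.com/in UNHCR Uganda doctor`, a common way to search public LinkedIn profiles through Google.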

Dark Web

The dark web is a goldmine of information, often containing breached data, passwords, and sites full of details that were never meant to be public. While it’s notorious for being a hub for cybercriminals, it’s also increasingly accessible for those conducting OSINT investigations. That said, it’s essential to be extremely cautious when diving into the dark web because it is filled with illegal activities, and even a wrong click can lead to serious legal issues or put your cybersecurity at risk.

To navigate the dark web safely, you’ll need a browser like Tor, which routes your traffic through several relays around the world, making it very difficult to trace.

For example, say you’re an investigative journalist digging into a political corruption case. You might stumble upon a dark web forum where whistleblowers are sharing confidential files related to the case you are investigating.

But remember, getting this kind of information isn’t without its dangers. One option is to subscribe to the free Hunchly Report, which gives you a daily list of active hidden services and links, so you don’t have to spend hours navigating the dark web yourself. To be successful, you need to be open-minded, get creative, and use your investigative skills.

Before going this route, remember to check with a lawyer or your editor.

Verification Skills

In the age of social media, we are surrounded by misinformation and disinformation, which can mislead the public by distorting facts and, in some cases, lead to violence. Journalism today focuses on finding the truth and verifying news. In this section, we explore how to confidently assess the quality of the material you encounter.

Verification involves four main aspects:

Illustration: Shereen Sherif Fahmy Youssef

  • Source: Can we trust it?
  • Authenticity: Is the content genuine?
  • Location: Is the location accurate?
  • Time: Is the claimed date accurate?
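Keeping a structured record for each item you verify makes it easy to see which of the four questions are still open. The sketch below is one hypothetical way to do that in Python, with a field per question where `None` means not yet checked; the field names are our own and not part of any standard tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class VerificationRecord:
    """Tracks one piece of content against the four verification questions.

    None means "not checked yet"; False means the check failed.
    """
    content_url: str
    source: Optional[bool] = None        # Can we trust the source?
    authenticity: Optional[bool] = None  # Is the content genuine?
    location: Optional[bool] = None      # Is the location accurate?
    time: Optional[bool] = None          # Is the claimed date accurate?
    notes: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        """Return the checks that are unresolved or failed."""
        checks = {"source": self.source, "authenticity": self.authenticity,
                  "location": self.location, "time": self.time}
        return [name for name, result in checks.items() if result is not True]

    def verified(self) -> bool:
        return not self.open_questions()

# Hypothetical item partway through verification.
record = VerificationRecord("https://example.com/viral-video", source=True, authenticity=True)
record.notes.append("Reverse image search found no earlier copies.")
print(record.open_questions())
print(record.verified())
```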

Verifying Images

Step 1: Reverse Image Searching

To find out whether an image is new or has been published online before, you can upload it to run a “reverse image search” on Google Images, Bing, and other search engines. This shows whether the image is new or old, and whether it relates to the incident it claims to depict or to a different incident or location. Reverse image search only finds matches if the photo, or a version of it, has already been published online.

Step 2: Where Is This?

Try to locate where the photo was taken by noticing and searching for visual clues. Translating signs in the image can also help in this process.
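When you have candidate coordinates, whether from EXIF data or from geolocating visual clues, you can measure how far they are from the place the post claims. This sketch uses the standard haversine formula for great-circle distance; the coordinates in the example are hypothetical, and how much distance matters depends on the claim (a neighborhood versus a whole city).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))

# Hypothetical values: the place named in a caption vs. the spot you geolocated.
claimed = (48.8566, 2.3522)
geolocated = (48.8584, 2.2945)
print(f"{haversine_km(*claimed, *geolocated):.1f} km apart")
```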

Exercise: Below you can see a photo of a statue. Can you find its exact location? Identify the coordinates where the photo was taken.

Image: Shutterstock

Step 3: Is It Authentic, Photoshopped, or an AI Product?

With current technology, any image can be manipulated or entirely AI-generated. Read about one example, when an image of the Eiffel Tower on fire went viral.

To check whether an image has been manipulated, upload it to FotoForensics (https://fotoforensics.com/) or to Fake Image Detector (https://www.fakeimagedetector.com/).

Image: Screenshot, Fake Image Detector

In the example below, the hand has six fingers, which suggests the image is probably not authentic, since very few people are born with six fingers. The Palestinian flag on the shirt is also inaccurate.

Facial Recognition

When dealing with faces, you may need to verify that the photo you have really shows the person it claims to show. For this purpose, you can perform a biometric facial search using tools like PimEyes. Free alternatives include search4faces.com and facecheck.id.

Example: searching for the name of this woman, the author of this chapter.

Image: Shereen Sherif Fahmy Youssef

Image: Screenshot, PimEyes

With biometric searching, try using different photos and adjust the brightness, contrast, and sharpness. Also, seek out better-quality versions from the original source.

Exercise: Who is this man?

Save his photo to your desktop and use facecheck.id to find out.

Facial Comparison

In some investigations, you may need to compare two faces to determine if they are of the same person. For this, you can use tools like Amazon Rekognition, FacePlusPlus, or MXFace, though these services may involve fees.

If the similarity percentage between the two photos is above 80%, you can be fairly confident in using the results.

An example of using this software in journalistic work is described in this BBC article.

The article discusses a claim about a group of men in fishing gear, who appear to have been photographed with Russian President Vladimir Putin on a boat in 2016.

It was alleged that the same people were also photographed at a church service in 2017. Comparing the faces on the boat with those at the church using facial recognition software yielded similarity scores of over 99% for all four men and the woman in the group. This evidence supports the claim that the same actors were used at events with Putin on different occasions.

Image: Screenshot, BBC

Verifying Videos

Verifying videos involves answering the following questions: Who? What? When? Where? Why? How?

Some tools, like YouTube DataViewer, could analyze YouTube videos automatically. However, as has happened with many OSINT tools, it has been discontinued. Fortunately, a similar tool is available as a browser plugin: the InVID Verification Plugin. It can run reverse searches on videos and show more information, such as the video’s metadata, and it can also be used for images.

Another method is to take screenshots of key frames from the video and use Google Images to perform a reverse image search. This can help you determine when the video first surfaced and whether it is linked to different locations or incidents.
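If the video file is available locally, a tool such as ffmpeg can pull out still frames at regular intervals for reverse image searching. This sketch only builds and runs an ffmpeg command from Python; it assumes ffmpeg is installed and on your PATH, and the file name and the one-frame-every-two-seconds rate are arbitrary examples. The InVID plugin mentioned above can also extract key frames for you.

```python
import shlex
import subprocess

def frame_extraction_command(video_path, every_seconds=1, out_pattern="frame_%04d.png"):
    """Build an ffmpeg command that saves one still image every `every_seconds` seconds."""
    # The fps filter accepts a fraction: fps=1/2 means one frame per two seconds.
    return ["ffmpeg", "-i", video_path, "-vf", f"fps=1/{every_seconds}", out_pattern]

def extract_frames(video_path, every_seconds=1):
    """Run ffmpeg; raises CalledProcessError if it fails."""
    subprocess.run(frame_extraction_command(video_path, every_seconds), check=True)

# "clip.mp4" is a hypothetical file name; printing shows the shell equivalent.
print(shlex.join(frame_extraction_command("clip.mp4", every_seconds=2)))
```

A fixed interval is the simplest choice; for fast-moving footage, a shorter interval catches more distinct scenes at the cost of more images to search.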

After gathering this information, you should begin looking for clues in the video itself that can help confirm the location and time.

For example, at the BBC, we were asked to verify a video showing recent flooding in Libya and to confirm the exact location.


What clues do we have?

The first clue, mentioned in the post, is the Al Maghar Neighbourhood in Derna, so we started by looking for this location on the map.

After that, we looked for businesses that might have at least 31 security cameras, because the footage shows it comes from camera number 31.

Could it be the Derna Medical Centre?

The centre has security cameras, as photos on its Facebook page show.

We then matched other clues, such as the stairs, the white and black marble, and the stair railing.

Exercise: Verify the video in this link.

Note that developing verification skills takes practice; the more you use these techniques, the more familiar they become.

Further readings about using these tools and how they have been used in journalism (you may need to use Google Translate to translate some articles into English):

Palestinian recounts being stripped and driven away by the Israeli army: https://www.bbc.co.uk/news/world-middle-east-67666270

Trump falsely claims Harris’s crowd was faked.

Strikes on south Gaza: BBC verifies attacks in areas of “safety.”

BBC Verify analyzes footage of Ukraine’s incursion into Russia.

Fact Check: How True Are the Claims Surrounding the Killing of an Israeli Businessman in Alexandria? (in Arabic)

What is known about the mass graves found in a Gaza hospital? (in Portuguese)

What is the truth behind the photos of prisoners in Al-Shifa Complex and the bombing of the surgical building? (in Arabic)

As this last article shows, the photos were claimed to show doctors and nurses at Al-Shifa Hospital in Gaza after they had been arrested and stripped by Israeli forces. But this is not true: the photos were taken by an Israeli soldier inside the Kuwait High School for Girls, near the Indonesian Hospital in Beit Lahia, north of Gaza.

We identified the features of the place by comparing them with another photo of the same incident.

Final Tips

  • Always double- and triple-check the information you find against other sources.
  • Don’t get discouraged if a company or individual doesn’t have a large online presence. If your target is an organization, someone within the company will likely make a mistake and post information that you can use to your advantage. You just have to be persistent. When researching individuals, there have been many times in my career when an online presence was so small it felt like I was chasing a ghost. Even then, you can typically find small bits of information if you look hard enough.
  • Keep asking yourself: What does this data tell you about the target, and how do the pieces of information fit into the puzzle? Every person is different and has their own story to tell, if you let the data speak to you.
  • As you start using open source intelligence, it’s a great idea to keep a log or build a list of sources you find useful. You can also start with those compiled by BBC Africa Eye.

Shereen Sherif Fahmy Youssef is a BBC senior investigative journalist with expertise in using OSINT tools and data analysis to uncover stories. She holds a master’s degree in communication, journalism, and related programs from the University of Sheffield. She currently serves as a Senior Forensics Broadcast Journalist at BBC News Arabic. Prior to her current position, Shereen served as a Senior Producer, leading a team responsible for digital and broadcast content focusing on the war in Sudan.


Material from GIJN’s website is generally available for republication under a Creative Commons Attribution-NonCommercial 4.0 International license. Images usually are published under a different license, so we advise you to use alternatives or contact us regarding permission. Here are our full terms for republication. You must credit the author, link to the original story, and name GIJN as the first publisher. For any queries or to send us a courtesy republication note, write to hello@gijn.org.
