Digging Up Hidden Data with the Web Inspector

Many reporters never notice the “inspect element” option below the “copy” and “save as” functions in the right-click menu on any webpage related to their investigation.

But it turns out that this little-used web inspector tool can dig up a wealth of hidden information from a site’s source code, reveal the raw data behind graphics, and download images and videos that supposedly cannot be saved.

A simple understanding of this tool and HTML basics can also help reporters scrape data from any web page, with no background in computer science needed.

At IRE21, the Investigative Reporters & Editors’ annual conference, journalist and educator Samantha Sunne shared tips for journalists with little to no coding experience on how to retrieve and analyze data from any web page using two simple tools: the Web Inspector and Google Sheets.

Here are five ways you can use these tools to extract and analyze data from any web page:

1. “Inspect” a website’s source code to extract links, photos, and embedded content.

Every browser offers a version of the Web Inspector in its Developer Tools or Develop tab.

“Browsers are reading the ‘source code’ – the code that makes up the webpage – and displaying it to the user,” explained Sunne.

In her tutorial, Sunne detailed the ways the inspection tool appears in different browsers. In Safari, for example, you can right-click on the area of a page you want to inspect and select “Inspect Element.”

With this, you’ll be able to find any hyperlinks and the source of any other materials embedded on the web page. You’ll also be able to read alt text — used to describe the function or content of an image or element on a page — and image captions, which may include the names of the people shown, where a photo was taken, and more.

You can refer to an HTML reference guide to find the code identifying embedded photos (<img src="url">), as well as links (<a href="url">), and other elements.
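If you are comfortable with a few lines of scripting, this kind of extraction can also be automated. Below is a minimal Python sketch using the requests and BeautifulSoup libraries — both my additions, not tools from Sunne’s talk, and the URL is a placeholder — that lists every hyperlink and image source on a page:

# A rough sketch: print every hyperlink and image source on a page.
# Assumes the requests and beautifulsoup4 packages are installed;
# the URL is a placeholder, not a real example from the article.
import requests
from bs4 import BeautifulSoup

page = requests.get("https://example.com/some-article")
soup = BeautifulSoup(page.text, "html.parser")

# <a href="url"> elements hold the hyperlinks
for link in soup.find_all("a"):
    print(link.get("href"))

# <img src="url"> elements hold images; alt text often describes them
for image in soup.find_all("img"):
    print(image.get("src"), image.get("alt"))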

2. Save images and videos from any website (even Instagram).

“Getting ahold of hard-to-get files is one great way to use the Web Inspector,” according to Sunne.

One key advantage is the ability to retrieve original files, even from websites such as Instagram, which otherwise prevent you from saving the photos or videos they host. It takes just three easy steps:

  • Right-click on the photo or video you want to download and choose “Inspect.” Perform a page search (control or command + F) for “<video>” tags, which bracket the video’s source code.
  • The Web Inspector will highlight every instance of “<video>” in the source code. Hover over the highlighted links to find the source link preceded by “src=”, or step through the matching tags one by one.
  • Finally, click on the source link to open up the photo or video in a separate browser tab and download it with a simple right click.
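That final download step can also be scripted. Here is a minimal Python sketch, assuming you have already copied the “src=” link out of the Web Inspector (the requests library, the URL, and the filename are all placeholders of mine, not part of Sunne’s tutorial):

# Download a media file once you have its "src" link from the Web Inspector.
# The URL and output filename below are placeholders.
import requests

src_url = "https://example.com/media/clip.mp4"
response = requests.get(src_url, stream=True)
response.raise_for_status()  # stop early if the server refuses the request

# Stream the file to disk in chunks so large videos don't exhaust memory
with open("clip.mp4", "wb") as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)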

3. Collect data in an automatically updating spreadsheet.

You’ve gotten your hands on a great dataset crucial to your investigation, but it’s located on a webpage and you can’t download the data as a spreadsheet. So what do you do now?

“Copying and pasting from the webpage works,” Sunne noted. “But the information won’t stay updated, or show me additional information, like the websites the links are leading to.”

Here again, the Web Inspector comes in handy. With it, you can identify the type of data stored on the web page, import it into a Google Sheet, then analyze or visualize it in different ways.

In the example below, we used the Web Inspector to scrape COVID-19 rates from the European Centre for Disease Prevention and Control.

To retrieve the table from this website, we followed these steps:

  • Right-click on the table or other data set you want to copy and select Inspect to find out what kind of HTML element it is – common elements are “table”, bullet lists (“ul”), and links (“a”).
  • The Web Inspector will highlight elements on the web page and show the corresponding source code. This is how you can identify HTML elements such as this table.
  • Fill in the following formula in a new Google Sheet with the element you want extracted — in this case, “table.” The last argument is the element’s position on the page, counting from 1: if there is only one table on the page you are scraping, the index is 1; the second table’s index is 2, and so on.

=IMPORTHTML("url", "table", index)

  • When you enter the =IMPORTHTML formula, Google Sheets provides you with an example and explanation of how the formula functions and the kinds of data it can retrieve.

=IMPORTHTML("https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases", "table", 1)


  • Google Sheets will automatically fill the spreadsheet with data scraped from the web page. You can then organize, filter, and visualize the data according to your needs.

“The formula is using the source code to pull in the HTML element ‘table,’” Sunne said. She goes into more detail in her Scraping without Programming tutorial.
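If you would rather work outside of Google Sheets, the pandas library offers a close scripted equivalent of =IMPORTHTML: its read_html function parses every <table> element on a page into a data frame. A minimal sketch under two assumptions of mine — that pandas and lxml are installed, and that the page still serves its data as a plain HTML table rather than rendering it with JavaScript:

# Rough scripted equivalent of =IMPORTHTML("url", "table", 1).
# Assumes the pandas and lxml packages are installed, and that the
# page serves a static HTML <table> (JavaScript-rendered tables won't appear).
import pandas as pd

url = "https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases"
tables = pd.read_html(url)  # one DataFrame per <table>, in page order

print(tables[0].head())  # Python lists count from 0, so this is the first table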

4. Extract only a specific type of data.

Downloading all of a table or page’s data can be useful in your investigation, but what if you’re searching for all the images on a page or all the links to sources in a report?

Google Sheets allows you to perform this kind of scraping too, using the =IMPORTXML("url", "xpath_query") formula.

“An XPATH is basically like an address to a bit of data on a page,” Sunne explained. It allows you to retrieve data even when it isn’t formatted into a neat table on the webpage.

In the panel, Sunne showed examples of useful XPATHs, such as for scraping all headers containing a specific country name.

If you want to keep track of interesting clips on your investigation topic, you could also scrape URLs and headlines from any news site using this formula:

=IMPORTXML("url", "//TAG[contains(., 'country')]")

=IMPORTXML("https://www.nytimes.com/section/world", "//h2") will scrape all of the “h2” elements from the page into the Google Sheet.

=IMPORTXML("https://www.nytimes.com/section/world", "//h2[contains(., 'China')]") will scrape only the “h2” elements that include the word “China.”

That is how we scraped all headlines containing the word “China” from The New York Times’ world section. To adapt the formula to your own investigation:

  • Inspect the webpage to identify the tag (i.e., text type) you’re looking for — “p” for paragraphs, “h1” for main headers, “h2” for subheadings, and so on — and use it in place of “TAG”
  • Insert the word you are looking for in the formula (replace “country”)
  • Let the data load into your Google Sheet automatically, refreshing once a day — or run the same query in a script, as sketched below
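The same XPath query also works outside of Google Sheets. Here is a minimal Python sketch of the “China” example using the requests and lxml libraries (both my additions, not tools from Sunne’s talk; some news sites may refuse scripted requests without extra headers):

# Rough scripted equivalent of
# =IMPORTXML("https://www.nytimes.com/section/world", "//h2[contains(., 'China')]").
# Assumes the requests and lxml packages are installed; some sites may
# require a User-Agent header before answering a scripted request.
import requests
from lxml import html

url = "https://www.nytimes.com/section/world"
tree = html.fromstring(requests.get(url).content)

# text_content() flattens any markup nested inside each matching <h2>
for heading in tree.xpath("//h2[contains(., 'China')]"):
    print(heading.text_content().strip())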

5. Free apps (if you’re terrified of code)

If all of this hasn’t convinced you to learn a bit of HTML, you might still try browser extensions or free apps. They give you less control over how the data is collected and formatted, but they will save you the trouble of writing lines of code and spreadsheet formulas.

Sunne recommended the following:

  • Parsehub: A desktop application that can scrape data from any website, including interactive content and pages built with JavaScript or AJAX. Its user-friendly interface requires no coding knowledge, and it lets you export data to Excel and JSON as well as import it into Google Sheets and the Tableau analytics platform.
  • Outwit: In addition to its web scraper, Outwit offers services to build a custom scraper, automate scraping, and even extract the data for you.
  • WebScraper: An easy point-and-click solution for those who prefer not to deal with code, WebScraper is able to build “site maps” based on the website’s structure and data points you want to extract.


Smaranda Tolosano manages translations and partnerships for GIJN. She previously reported for the Thomson Reuters Foundation in Morocco, covering the government’s use of spyware to target regime dissidents and the emergence of feminist movements on social media.
