4 More Essential Tips for Using the Wayback Machine

by Craig Silverman, Digital Investigations • May 11, 2023

ProPublica’s Craig Silverman explains how to bulk archive pages, compare changes, and see when elements of a page were archived.

The previous edition of Digital Investigations offered advice for getting the most out of the Wayback Machine. Now I’m back with even more tips, thanks to an interview with Mark Graham, director of the Wayback Machine.

He pointed to a few features I forgot to mention along with one I wasn’t aware of. We also talked about the challenge of archiving social media content.

The Wayback Machine is run by the Internet Archive, a 27-year-old nonprofit dedicated to providing universal access to all knowledge. “We are a digital library,” Graham said.

As a library, it has patrons instead of users, he said. Let’s look at some useful features for journalist and researcher patrons.

1. View and Compare Changes

The Changes feature lets you compare different versions of the same archived page and see the differences.

“Maybe a journalist is writing a story showing how a content material on a webpage has changed over time,” Graham said. “In that case, they would need to know about the Changes feature of the Wayback Machine, where you can compare the material on one URL on two different points of time.”

The Changes feature is accessible from the top menu of any archived page you’re browsing in the Wayback Machine:

Image: Screenshot

You can also load it directly with this URL format: https://web.archive.org/web/changes/https://www.nytco.com/journalism/

Place the URL you want to compare after https://web.archive.org/web/changes/ and it will bring up a page that shows year by year archive grids:

Image: Screenshot, Wayback Machine

Each shaded square corresponds to a page capture, and the color legend indicates which days may have significant changes. Select two captures and then click the “Compare” button at the top of the page. You get a side-by-side view of the captures.
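The direct-link format described above is easy to script if you have many pages to check. This is a minimal sketch that only prepends the prefix quoted in the article; the function name `changes_url` is made up for illustration and is not part of any Internet Archive tool.

```python
# Prefix taken from the URL format shown in the article.
CHANGES_PREFIX = "https://web.archive.org/web/changes/"


def changes_url(page_url: str) -> str:
    """Return the Changes page address for a given page URL."""
    page_url = page_url.strip()
    if not page_url:
        raise ValueError("page_url must not be empty")
    return CHANGES_PREFIX + page_url


print(changes_url("https://www.nytco.com/journalism/"))
```

Opening the printed address in a browser should bring up the same year-by-year grid you would reach through the top menu.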

I chose a page from early March 2023 (left) and one from early January 2022 (right). The comparison showed that the New York Times corporate page about its journalism had updated the footer menu options and text:

Image: Screenshot, Wayback Machine
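The comparison described above also has its own address, which can be handy for citing a specific diff in your notes. The sketch below assumes the pattern seen in the single comparison link from this example (two 14-digit capture timestamps followed by the page address without a scheme); that pattern is inferred from one link, not from documentation, and `compare_url` is a hypothetical helper name.

```python
import re

# Pattern inferred from one comparison link; treat it as an assumption.
DIFF_PREFIX = "https://web.archive.org/web/diff/"
TIMESTAMP = re.compile(r"[0-9]{14}")  # YYYYMMDDhhmmss


def compare_url(ts_left: str, ts_right: str, page: str) -> str:
    """Build a side-by-side comparison address from two capture timestamps."""
    for ts in (ts_left, ts_right):
        if not TIMESTAMP.fullmatch(ts):
            raise ValueError(f"expected a 14-digit timestamp, got {ts!r}")
    return f"{DIFF_PREFIX}{ts_left}/{ts_right}/{page}"


# Timestamps from the March 2023 and January 2022 captures compared above.
print(compare_url("20230305234120", "20220108205806", "nytco.com/journalism/"))
```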

2. Use ‘About this Capture’ to Verify Page Elements

The basic description of the Wayback Machine is that it captures and stores archives of webpages. The reality is a little more nuanced.

“The web is messy, the web is constantly changing,” Graham said. “And when I say constantly changing, it can also be dynamic.”

I asked him how confident we can be that the archive shows exactly what was on a page at the date and time listed in the Wayback Machine. The short answer is that, yes, you can have confidence. But elements of an archived page can be drawn from different archived material, each with its own timestamp. This is where the nuance comes in.

The Wayback Machine has a feature that lets you view the timestamps of different elements on a page. You access it by clicking on the “About this capture” button in the upper right-hand corner of a page capture:

Image: Screenshot

Using https://www.nytco.com/journalism/ as an example, here’s what we get:

Image: Screenshot, Wayback Machine

Even though the page was archived on Oct. 20, 2021, the capture pulls some elements from more recent archives. Most of the page elements listed above are images that make up the page template. A couple of the files are JavaScript and CSS. Graham explained that the Wayback Machine pulls from different images, JavaScript, and CSS files to make the page when you view it.

“When we replay a page, we actually take and gather together each of those page requisites represented by its own URL with its own archive, and we put them together,” he said. “One of the challenges is that each of those objects could be archived at a different time in date.”

For example, the main photo at the top of the page (“17XP-PULITZERS2-superJumbo-article.jpg”) was pulled from a capture taken eight days before I loaded the archive. If that photo or file is important to your investigation, you’d want to check its archive page to see if it’s changed over time, or to look for a capture closer to the target date. But as long as that file has remained the same over time, you’re OK.
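One way to check whether a file has changed over time, beyond eyeballing its archive page, is to list its captures along with a content fingerprint. The sketch below assumes the Internet Archive's public CDX server API, which is not covered in this article, and its `timestamp` and `digest` fields; as far as I understand, captures with identical content share the same digest. The sample rows are made up for illustration and are not real capture data.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Internet Archive's CDX server API.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(file_url: str) -> str:
    """Build a query asking for the timestamp and digest of every capture."""
    params = {"url": file_url, "output": "json", "fl": "timestamp,digest"}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def distinct_versions(rows):
    """From CDX JSON rows (header row first), keep the first capture
    timestamp seen for each distinct digest, in capture order."""
    if not rows:
        return []
    header, *captures = rows
    ts_i, dg_i = header.index("timestamp"), header.index("digest")
    first_seen = {}
    for row in captures:
        first_seen.setdefault(row[dg_i], row[ts_i])
    return [(ts, digest) for digest, ts in first_seen.items()]


# Hypothetical response rows, shaped like the API's JSON output.
sample = [
    ["timestamp", "digest"],
    ["20211020120000", "AAAA"],
    ["20220108205806", "AAAA"],
    ["20230305234120", "BBBB"],
]
print(distinct_versions(sample))
```

If only one digest comes back for the period you care about, the file most likely did not change between those captures. More than one digest is a cue to open the captures on either side of your target date and look at them directly.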

As a general but not absolute rule, the body text on a typical webpage is not pulled from a separate page or file. It’s therefore less likely to be affected by this dynamic. But the safest option is to check “About this capture” and make sure that the text, images, or other elements on the page capture you’re citing are consistent with the date you’re interested in.
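To put a number on how far an element's capture sits from the page's capture, you can compare the 14-digit timestamps that appear in Wayback Machine addresses right after "/web/". This is a small sketch under that assumption; the two example addresses use example.org and invented timestamps, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Capture timestamps appear as 14 digits (YYYYMMDDhhmmss) after "/web/".
# Element addresses may carry a short suffix such as "im_" after the digits.
CAPTURE = re.compile(r"/web/([0-9]{14})")


def capture_time(wayback_url: str) -> datetime:
    """Extract the capture time embedded in a Wayback Machine address."""
    match = CAPTURE.search(wayback_url)
    if match is None:
        raise ValueError("no 14-digit capture timestamp found")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def gap_in_days(page_capture: str, element_capture: str) -> float:
    """Days between the page capture and the element capture.
    Positive means the element was archived after the page."""
    delta = capture_time(element_capture) - capture_time(page_capture)
    return delta.total_seconds() / 86400


# Invented example addresses, for illustration only.
page = "https://web.archive.org/web/20211020000000/https://example.org/"
image = "https://web.archive.org/web/20211028000000im_/https://example.org/photo.jpg"
print(gap_in_days(page, image))  # 8.0
```

A large gap is not proof that anything changed, only a prompt to check that element's own capture history before you cite it.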

3. Bulk Archive URLs Using Google Sheets

Graham reminded me that you can bulk upload URLs for archiving using Google Sheets. The process is pretty simple. First, create a Google Sheet with a single column that lists the URLs you want to archive. Then go to https://archive.org/services/wayback-gsheets/ to connect your Google account to your archive.org account.

Image: Screenshot, Internet Archive

Once that’s completed, you’ll see this screen. Click on “Archive URLs.”

Image: Screenshot, Internet Archive

Now you can insert a link to your Google Sheet containing URLs you want to archive.

Image: Screenshot, Internet Archive

Since you connected your Google and archive.org accounts, all of the captures will be stored in your archive.org account for easy retrieval.

“That feature came about because my wife once asked me, ‘Mark, how can I easily archive a bunch of URLs?’” he said.

Graham worked with engineers at the Internet Archive to make it happen.
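If your URLs start out in notes or a scraped list, the single-column sheet from the first step can be prepared with a few lines of code before importing it into Google Sheets. This is only a sketch of that preparation step, not part of the Internet Archive's service; the function names and example URLs are made up, and the article does not say whether the service tolerates a header row, so none is written.

```python
import csv
import io
from urllib.parse import urlsplit


def clean_urls(raw_urls):
    """Strip whitespace and drop blanks, non-http(s) entries, and
    duplicates, keeping the original order."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned


def write_single_column(urls, handle):
    """Write one URL per row to an open text file or buffer."""
    writer = csv.writer(handle)
    for url in urls:
        writer.writerow([url])


# Invented input, including a duplicate and a non-URL entry.
urls = clean_urls([
    " https://example.org/a ",
    "https://example.org/a",
    "not a url",
    "https://example.org/b",
])
buffer = io.StringIO()
write_single_column(urls, buffer)
print(buffer.getvalue())
```

To produce a file instead of an in-memory buffer, pass a handle opened with `open("urls.csv", "w", newline="", encoding="utf-8")` and import the result into a new Google Sheet.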

4. Email Your Feedback and Requests

“Many, many, many features of the Wayback Machine exists today because a patron asked for them, a patron asked a question, or made a suggestion or recommendation,” Graham said. “We really appreciate requests and questions.”

He encouraged people to email info@archive.org.

“We receive hundreds of emails a day and we have a team of people that review them and respond to them,” Graham said. “I personally respond to the ones about the Wayback Machine that can’t be handled by the first level of response.”

He especially encouraged journalists to reach out if they have questions or requests.

Bonus Info: Archiving Social Media

Power users of the Wayback Machine know it ranges from difficult to impossible to archive social media content there. This has less to do with its functionality and limitations, and more to do with how companies like Meta try to thwart scraping.

Here’s what Graham said about why it’s hard to archive content from social media:

Just as some other websites are more challenging to archive than other websites, in particular Facebook and Instagram represent challenges. They take active measures to try to prevent various kinds of automation, including scraping. If you go to the Facebook site, for example, there’s a section about web scraping where they talk about the staffing they have dedicated to efforts to prevent web scraping and web archiving.

We work respectfully with the web. This isn’t our material. As a library we work to make material generally available. So in the case of Facebook and Instagram, we do try. And we think it’s completely appropriate for us to archive publicly accessible information. So this would be say, for example, the public Facebook pages of the communications departments of the country of Ukraine or China.

One piece of encouraging news: Graham said the Wayback Machine is “actively working with several media organizations” to try to improve social media archiving. Hopefully things improve soon.

This post was originally published in Craig Silverman’s Digital Investigations Substack newsletter and is reprinted here with permission.

Additional Resources

My Favorite Tools with BuzzFeed’s Craig Silverman

5 Online Search Tools to Make Journalists’ Lives Easier

Tips for Using the Internet Archive’s Wayback Machine in Your Next Investigation

Craig Silverman is a national reporter for ProPublica, covering voting, platforms, disinformation, and online manipulation. He was previously media editor of BuzzFeed News, where he pioneered coverage of digital disinformation.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License


