January 27, 2015

Introducing The Research Desk: Secrets of the Wayback Machine

GIJN is delighted to introduce The Research Desk, a new feature by noted librarian and consultant Gary Price. Gary, who writes the INFOdocket blog for Library Journal, will offer GIJN readers items on the latest databases, digital tools, and documents around the world.

Before launching INFOdocket (@infodocket), Gary was co-founder and senior editor at ResourceShelf and DocuTicker for 10 years. He’s also served as director of online information services at Ask.com and contributing editor at Search Engine Land, and is co-author of The Invisible Web. Welcome, Gary!

The Wayback Machine, one of the many services from the Internet Archive, is among the most useful and important tools for anyone who uses the Internet for research.


It’s by far the largest archive of web pages available to the public (more than 435 billion pages dating back to 1996 as of January 2015). Without it, trying to find and access old web content would be nearly impossible, short of attempting to contact the webmaster and asking them if they have a copy of the page or pages from the date you need.

Of course, using The Wayback Machine is not without its challenges. You cannot keyword search the material in the database, and the web crawler that archives content does not discover, visit, or revisit every available page all day, every day.

Being able to keyword search the 435 billion pages would be wonderful, and it is something we would love to see at some point. In the meantime, it is VERY POSSIBLE and VERY EASY for any user to have The Wayback Machine archive any web page or PDF at any time they select.

This feature became available a couple of years ago but is unknown to many users.

In other words, you can use The Wayback Machine to archive whatever content YOU select easily and for free. At the same time, you’re helping make the database more complete.

How It’s Done

To permanently archive any web page or PDF and then get a direct link to the archived copy, follow these steps:

1. Visit The Wayback Machine’s homepage at: http://web.archive.org.

2. Locate the box on the lower right side of the page labeled “Save Page Now.”


3. In another browser tab or window, copy the URL of the web page or PDF you want to archive.

4. Head back to the “Save Page Now” box and paste in the URL. Click “save page.” That’s it.

5. In a matter of seconds a box will appear providing you with a direct link to this specific archived version of the page or PDF.


6. You’ll also be able to see the time and date the copy was archived along with other copies (if available) on the Wayback page for the specific url.
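For readers who prefer a script to a browser, the steps above can be roughly sketched in Python. This is a minimal sketch, not an official client: it assumes that requesting https://web.archive.org/save/ followed by the page's URL triggers the same “Save Page Now” capture as the box on the homepage. That address, the function names, and the User-Agent string are not from the article and are assumptions for illustration.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: this endpoint mirrors the "Save Page Now" box on the homepage.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Return the address that asks Wayback to capture page_url."""
    # Leave URL punctuation alone; escape only unsafe characters such as spaces.
    return SAVE_ENDPOINT + quote(page_url, safe=":/?&=%")


def save_page(page_url: str, timeout: float = 60.0) -> str:
    """Request a capture and return the address the service finally answers from.

    Hypothetical helper: it needs network access, and what the service returns
    may differ from what the browser box shows.
    """
    request = Request(
        build_save_url(page_url),
        headers={"User-Agent": "research-desk-example/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return response.geturl()
```

Calling save_page("http://gijn.org") would send the request; build_save_url alone is enough if you only want a link to paste into a browser.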


Making the Process Even Easier

A free bookmarklet is available that lets you skip the process listed above, so that archiving requires only one click. You can grab the bookmarklet here.

What’s Not Archivable Using Wayback?

The Wayback Machine observes the robots.txt standard and will not crawl pages, documents, or servers that use it to block crawlers. This FAQ from the Internet Archive explains more. A number of other issues can also keep pages from being archived, including JavaScript and password-protected pages. More about some of them here.
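If you want a rough idea of whether a site's robots.txt might stand in the way, Python's standard library can read the file for you. The sketch below is only an approximation under stated assumptions: the sample robots.txt is made up, the function name is hypothetical, and the crawler name "ia_archiver" is our assumption about which user agent the Internet Archive's rules are written for. It does not tell you how the Wayback Machine itself will decide.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for illustration; "ia_archiver" is an assumed crawler name.
SAMPLE_ROBOTS = """\
User-agent: ia_archiver
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the sample file, a page under /private/ is reported as blocked for that crawler while the rest of the site is reported as allowed. To check a live site, you would first download its robots.txt and pass the text in.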

In future posts we will share other web archiving tools, services, and strategies.

Bonus! How to Read a Wayback Machine URL

Every Wayback Machine URL includes the date and exact time (to the second) that the page was crawled, captured, and archived.

Let’s break down the URL of the page we captured in the example above.


  • 2015: Year
  • 01: Month
  • 19: Day
  • 21:59:11: Time the page was crawled and archived. Wayback uses UTC.
  • /GIJN.org: Page that was archived.
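The breakdown above can also be done in code. Here is a minimal Python sketch, assuming the common Wayback layout of a 14-digit timestamp (year, month, day, hour, minute, second) followed by the original address. The function and class names are hypothetical, and the sample URL in the usage note is rebuilt from the values in the list rather than copied from the capture itself.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class Capture(NamedTuple):
    captured_at: datetime  # capture time, in UTC
    original_url: str      # the page that was archived


# Assumed layout: https://web.archive.org/web/<14 digits>/<original URL>
_WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url: str) -> Capture:
    """Split a Wayback Machine URL into its capture time and original URL."""
    match = _WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"Not a recognized Wayback Machine URL: {url!r}")
    stamp, original = match.groups()
    # The timestamp carries no zone marker; Wayback uses UTC, so attach it.
    captured_at = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return Capture(captured_at, original)
```

For example, parse_wayback_url("https://web.archive.org/web/20150119215911/http://gijn.org/") gives a capture time of January 19, 2015 at 21:59:11 UTC and the original address http://gijn.org/.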

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He is the author of INFOdocket (@infodocket) for Library Journal, and was a co-founder and senior editor at ResourceShelf and DocuTicker. Previously, Price served as a contributing editor to Search Engine Land and director of Online Information Services at Ask.com. 

2 thoughts on “Introducing The Research Desk: Secrets of the Wayback Machine”

  1. For those wishing to conduct systematic and extensive Web site preservation, you can command an army of Web crawlers to repeatedly crawl sites, in whole or part, and to have those pages poured into a full-text searchable database, with the Internet Archive’s subscription service, Archive-It https://archive-it.org/

  2. I use the Wayback Machine all the time when I have broken links on Libguides. I am responsible for guides that I did not create, and I do not always know what the linked pages contained. If the page has been archived, I can see the type of information it had and can find a suitable replacement. It is an incredibly useful tool.
