GIJN is delighted to introduce The Research Desk, a new feature by noted librarian and consultant Gary Price. Gary, who writes the INFOdocket blog for Library Journal, will offer GIJN readers items on the latest databases, digital tools, and documents around the world.
Before launching INFOdocket (@infodocket), Gary was co-founder and senior editor at ResourceShelf and DocuTicker for 10 years. He’s also served as director of online information services at Ask.com and contributing editor at Search Engine Land, and is co-author of The Invisible Web. Welcome, Gary!
The Internet Archive’s Wayback Machine is by far the largest archive of web pages available to the public (more than 435 billion pages dating back to 1996, as of January 2015). Without it, finding and accessing old web content would be nearly impossible, short of contacting a site’s webmaster and asking whether they have a copy of the page or pages from the date you need.
Of course, using The Wayback Machine is not without its challenges. You can’t keyword-search the material in the database, and the web crawler that archives content can’t discover, visit, or revisit every available page around the clock.
Being able to keyword-search all 435 billion pages would be wonderful, and it’s something we would love to see at some point. In the meantime, it is already possible, and very easy, for any user to have The Wayback Machine archive any web page or PDF whenever they choose.
This feature became available a couple of years ago but remains unknown to many users.
In other words, you can use The Wayback Machine to archive whatever content YOU select easily and for free. At the same time, you’re helping make the database more complete.
Hear a 2018 audio presentation by Mark Graham, the director of the Wayback Machine.
How It’s Done
To permanently archive any web page or PDF and get a direct link to the archived copy, follow these steps:
1. Visit The Wayback Machine’s homepage at: http://web.archive.org.
2. Locate the box on the lower right side of the page labeled “Save Page Now.”
3. In another browser tab or window, copy the URL of the web page or PDF you want to archive.
4. Head back to the “Save Page Now” box and paste in the URL. Click “save page.” That’s it.
5. In a matter of seconds, a box will appear providing you with a direct link to this specific archived version of the page or PDF.
6. You’ll also be able to see the time and date the copy was archived along with other copies (if available) on the Wayback page for the specific url.
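For readers who want to script this, a capture can also be requested by visiting a specially formed address. The minimal Python sketch below assumes the commonly used patterns https://web.archive.org/save/<url> for Save Page Now and https://web.archive.org/web/*/<url> for the list of existing copies; the function names are hypothetical, and the code only builds the addresses without contacting the Internet Archive.

```python
from urllib.parse import urlsplit

# Assumed URL patterns (not taken from this article): Save Page Now
# requests and the per-URL list of captures.
SAVE_ENDPOINT = "https://web.archive.org/save/"
LISTING_PREFIX = "https://web.archive.org/web/*/"


def save_page_now_url(target: str) -> str:
    """Build the (assumed) Save Page Now address for an absolute http(s) URL."""
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {target!r}")
    return SAVE_ENDPOINT + target


def captures_listing_url(target: str) -> str:
    """Build the (assumed) address of the page listing all captures of target."""
    return LISTING_PREFIX + target


print(save_page_now_url("https://gijn.org/"))
print(captures_listing_url("https://gijn.org/"))
```

Opening the first address in a browser plays the same role as pasting the URL into the “Save Page Now” box; the second corresponds to the Wayback page described in step 6.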
Making the Process Even Easier
A free bookmarklet lets you skip the process listed above, so archiving requires only one click. You can grab the bookmarklet here.
What’s Not Archivable Using Wayback?
In future posts we will share other web archiving tools, services, and strategies.
Bonus! How to Read a Wayback Machine URL
Every Wayback Machine URL includes the date and exact time (to the second) that the page was crawled, captured, and archived.
Let’s break down the URL of the page we captured in the example above.
- 21:59:11 is the time the page was crawled and archived. Wayback uses UTC.
- /GIJN.org is the page that was archived.
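The same breakdown can be done programmatically. This Python sketch assumes the common layout https://web.archive.org/web/<timestamp>/<original URL>, where the 14-digit timestamp is YYYYMMDDhhmmss in UTC. The function name and the sample URL are hypothetical; the sample is not the GIJN capture discussed above.

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Assumed layout: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url: str) -> Tuple[datetime, str]:
    """Split a Wayback capture URL into its UTC capture time and archived URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {url!r}")
    stamp, original = match.groups()
    # The timestamp runs year, month, day, hour, minute, second; Wayback uses UTC.
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


captured, original = parse_wayback_url(
    "https://web.archive.org/web/20150105083000/http://example.com/"
)
print(captured.isoformat(), original)  # prints "2015-01-05T08:30:00+00:00 http://example.com/"
```

Reading the timestamp this way makes it easy to confirm exactly when a cited copy was made.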
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He is the author of INFOdocket (@infodocket) for Library Journal, and was a co-founder and senior editor at ResourceShelf and DocuTicker. Previously, Price served as a contributing editor to Search Engine Land and director of Online Information Services at Ask.com.