How to use archived versions of websites for SEO research?
In 2001, a non-profit organization called the Internet Archive launched a new tool called the Wayback Machine at archive.org.
The mission of the Internet Archive was to create a digital library of Internet history.
Because web pages change constantly, the Wayback Machine's crawlers visit pages frequently and cache them for the archive.
Their goal was to make this content available to future generations of researchers, historians and scientists. But this data is also valuable for marketers and SEO specialists.
Whenever we work on a project that involves a dramatic change in traffic to a site, this is one of the first places we look: we compare the cached pages from before and after the traffic change.
Even if you do not conduct forensic analysis of the site, access to the site’s change log can be a valuable tool.
You can find old content or even recall a promotion that was held last year.
Troubleshooting with Wayback Machine
Just as when you browse a live website, the cached pages contain the on-page information that might explain a traffic change.
Each cached page, including all of its HTML, is stored in the archive, which makes it quite easy to identify obvious structural or technical changes.
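As a rough illustration of spotting such structural changes, the sketch below pulls a few SEO-relevant tags out of two HTML documents and reports the ones that differ. The choice of tags (title, meta description, meta robots, canonical link) and the helper names `extract_head_tags` and `diff_head_tags` are assumptions made for this example, not something the Wayback Machine provides.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collects a few SEO-relevant tags (an illustrative selection)."""

    def __init__(self):
        super().__init__()
        self.tags = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        rel = (attrs.get("rel") or "").lower()
        if tag == "title":
            self._in_title = True
            self.tags.setdefault("title", "")
        elif tag == "meta" and name in ("description", "robots"):
            self.tags["meta:" + name] = attrs.get("content") or ""
        elif tag == "link" and rel == "canonical":
            self.tags["canonical"] = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Accumulate title text; it may arrive in more than one chunk.
        if self._in_title:
            self.tags["title"] += data


def extract_head_tags(html):
    """Return a dict of the collected tags with whitespace trimmed."""
    parser = HeadTagParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.tags.items()}


def diff_head_tags(old_html, new_html):
    """Map each changed tag to an (old value, new value) pair."""
    old, new = extract_head_tags(old_html), extract_head_tags(new_html)
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes
```

A value of `None` in a pair means the tag was absent from that version. Archived pages carry the Archive.org header and are likely to have rewritten links, so some reported differences may be artifacts of archiving rather than real site changes.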
Here are the steps to use the Wayback Machine for troubleshooting.
1. Put your URL in the Archive.org search box
It does not have to be the home page; any URL on the site will work.
2. Select the date when you think the code may have changed
Pay attention to the color coding of dates:
- Red indicates that an error occurred.
- Green indicates a redirect.
- Blue means the page was cached successfully.
You may have to try many dates and dig into each version until you find something worth a closer look.
For larger sites, you will find that home pages are cached several times a day, while pages on other sites are cached only a few times a year.
3. Load the cached page
The cached page from archive.org will load in your browser like any other website, except that it will have a header from archive.org.
Look for obvious changes in structure and content that could lead to a change in search visibility.
4. Open the page source and find:
5. Compare everything that differs from the current site, and look for cause-and-effect relationships or correlations with the traffic change.
Look at things like cross-links, the words used on pages, and even evidence that the site may have been hacked for a certain period of time.
The Wayback Machine even saves snapshots of robots.txt files, so if crawling permissions have changed, the evidence will be readily available.
This feature has been surprisingly useful to me when sites mysteriously dropped out of the index with no apparent penalty, spam, or currently visible problem in the robots.txt file.
To find the history of the robots file, simply paste the URL of the robots.txt file into the search box. After that, select a date, and then analyze the differences from the current robots file. There are a number of free tools on the Internet that allow you to compare two versions of a text.
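If you would rather not use an online comparison tool, a few lines of Python can do the same job. This is a minimal sketch using the standard `difflib` module. The function names are hypothetical, and the `id_` suffix in `raw_snapshot_url` is assumed to request the capture as originally served, without the Archive.org header.

```python
import difflib


def raw_snapshot_url(timestamp, original):
    """URL of a capture; the id_ suffix (an assumption) asks for unmodified content."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def diff_robots(old_text, new_text):
    """Return a unified diff between an archived and a current robots.txt."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="robots.txt (archived)",
            tofile="robots.txt (current)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up example contents, for illustration only.
    archived = "User-agent: *\nDisallow: /admin/\n"
    current = "User-agent: *\nDisallow: /\n"
    print("\n".join(diff_robots(archived, current)))
```

In the output, lines starting with `-` exist only in the archived version and lines starting with `+` only in the current one, so a newly added `Disallow: /` stands out immediately.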
Another, less obvious use of the Wayback Machine is to find out how competitors built backlinks in the past.
In addition to these useful ways of troubleshooting SEO problems with the Wayback Machine, there are other, more inventive ways to use this data.
For those who create private blog networks (PBN) for backlinks, an archived site is a great way to restore the contents of a recently purchased expired domain.
The restored site is then populated with links to other sites on the network.
Shady use cases aside, the Wayback Machine is one of the best free tools you can have in your digital marketing arsenal. There is simply no other tool that has an 18-year history of almost every website in the world.