An Idiot’s Guide to the PageRank Algorithm
The original PageRank algorithm was first outlined in the fabled ‘Anatomy of a Large-Scale Hypertextual Web Search Engine’ – the academic white paper produced by Sergey Brin and Lawrence Page that gave Google its genesis.
While the search algorithm itself has evolved far beyond this point, and has become the stuff of myth amongst SEO professionals, it is occasionally worth revisiting this renowned document.
By understanding the PageRank algorithm, we can understand how the foundations of one of the world’s biggest businesses were formed. Although PageRank sculpting is not an advisable tactic in the searchable web ecosystem, we can also get a good idea of how PageRank flows internally through a website.
With this in mind, I wanted to write an analysis of the original algorithm to help people familiarise themselves with what is an incredibly complex formula. So, brace yourselves people, this is going to get heavy.
An Analysis of the Original PageRank Algorithm
What I’m about to show you is often looked at with a sense of fear, confusion and inertia that is perfectly captured by the ‘Michael Bluth’ character from Arrested Development, when his son describes to him the concept of a ‘Mayonegg’.
Well, I hope you’re ready – in the original Stanford paper, the PageRank of page (A) is calculated using the following formula:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
Truly frightening, isn’t it? Stick with me and use the definitions below to guide you through the labyrinthine complexities of this algorithm.
- d – ‘d’ is a damping factor between ‘0’ and ‘1’, and in the original Google paper it is stated that this value is normally set at 0.85.
- PR(Tn) – Each page has a notion of its own self-importance. For the pages that link to page A, this is written “PR(T1)” for the first linking page, right the way to “PR(Tn)” for the last.
- PR(Tn)/C(Tn) – “C(Tn)” is the number of outgoing links on page Tn, so each page splits its vote evenly between the pages it links to. If ‘Page A’ has a backlink from page Tn, then the share of the vote page A will get is defined by “PR(Tn)/C(Tn)”.
- d(… - All these fractions of votes are added together but, to stop the other pages having too much influence, this total vote is “damped down” by multiplying it by 0.85 (the factor “d” as mentioned in point 1).
- (1-d) - The (1 – d) bit at the beginning is a bit of probability math magic, so the “sum of all web pages’ PageRanks will be one” (as is stated in the original Google paper). It adds in the bit lost by the “d(….” and it also means that if a page has no links to it (no backlinks) even then it will still get a small PR of 0.15 (i.e. 1 – 0.85). (Aside: the Google paper says “the sum of all pages” but they mean the “the normalised sum”, otherwise known as “the average” to you and me).
In essence, what this means is that the PageRank of a webpage is calculated by taking the PageRank of each page pointing to it (aka its incoming links), dividing each one by the number of outgoing links on that page, adding the results together and damping the total by ‘d’, before adding the (1 – d) bit on top.
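If it helps to see the formula as code, here is a minimal Python sketch of that single-page calculation. The function name `page_rank_of` and the backlink figures are my own made-up illustration, not anything taken from the Google paper.

```python
def page_rank_of(backlinks, d=0.85):
    """Apply PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    backlinks is a list of (PR(T), C(T)) pairs, one for each page T
    that links to page A. The name and input shape are hypothetical.
    """
    # Each linking page passes on its PR divided by its outgoing link count.
    votes = sum(pr / outlinks for pr, outlinks in backlinks)
    # Damp the total vote by d, then add the (1 - d) baseline.
    return (1 - d) + d * votes


# Made-up example: page A has two backlinks.
# One comes from a page with PR 1.0 and 2 outgoing links,
# the other from a page with PR 0.5 and 1 outgoing link.
pr_a = page_rank_of([(1.0, 2), (0.5, 1)])

# A page with no backlinks at all still gets the (1 - d) baseline.
pr_orphan = page_rank_of([])
```

Here the two votes are 0.5 each, so page A ends up with roughly 0.15 + 0.85 × 1.0 = 1.0, while the orphan page gets roughly 0.15.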
How does PageRank flow within a website?
This is where it gets interesting for inbound marketers like you and me. Most importantly, this formula shows that PR does not consider a website as a whole, but is determined for each page individually. Just to make things even more complicated, the PR of a webpage is calculated recursively from the PR of the pages linking to it. Again, this tells us that PageRank is likely to be in a state of constant flux, rather than just a static number.
Google uses what is known as a ‘rolling index’, which means that webpages float in and out of it all the time. As a result, PageRank value is being constantly recalculated as webpages pass in and out of the index, meaning links are lost and gained all the time. While this analysis is based on the original algorithm, this could be a potential contributor to what is known as the ‘Google Dance’.
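The recursive bit can look like a chicken-and-egg problem: you need the PR of page B to work out page A, and vice versa. A common way around it is to start from a guess and keep re-applying the formula until the numbers settle. The sketch below assumes a made-up pair of pages, A and B, that link only to each other; the function name `iterate_two_pages` is mine.

```python
def iterate_two_pages(start, d=0.85, rounds=100):
    """Repeatedly apply the PageRank formula to two hypothetical pages,
    A and B, where each page's only outgoing link points at the other."""
    pr_a = pr_b = start
    for _ in range(rounds):
        # Each page has exactly one outgoing link, so C(T) is 1.
        pr_a = (1 - d) + d * (pr_b / 1)
        pr_b = (1 - d) + d * (pr_a / 1)
    return pr_a, pr_b


# Two very different starting guesses.
from_zero = iterate_two_pages(0.0)
from_forty = iterate_two_pages(40.0)
```

Both starting guesses settle at a PR of about 1.0 for each page, which suggests the starting value does not matter as long as you repeat the calculation enough times.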
Why does this help us?
As a final part of this PageRank analysis, let me show you an example of how PR would flow internally within a minute four-page website. I’ll be using the figures given in the original academic document to calculate this, so bear in mind that these are unlikely to still be accurate.
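To make this concrete, here is a rough Python sketch of how PR might settle across a four-page site. The page names and link structure are entirely my own assumption, chosen for illustration, and the only figure taken from the original paper is the damping factor of 0.85.

```python
def pagerank(links, d=0.85, rounds=100):
    """Iterate the original PageRank formula over a small link graph.

    links maps each page to the list of pages it links out to.
    Every page is assumed to have at least one outgoing link.
    """
    pr = {page: 1.0 for page in links}  # starting guess for every page
    for _ in range(rounds):
        new_pr = {}
        for page in links:
            # Add up PR(T)/C(T) for every page T that links to this page.
            votes = sum(
                pr[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * votes
        pr = new_pr
    return pr


# Hypothetical four-page site: the home page links to everything,
# every page links back to home, and products also links to contact.
site = {
    "home": ["about", "products", "contact"],
    "about": ["home"],
    "products": ["home", "contact"],
    "contact": ["home"],
}

ranks = pagerank(site)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this made-up structure the home page comes out on top because every other page links to it, and the contact page edges ahead of about and products thanks to its extra internal link. The four values add up to 4, i.e. an average of 1 per page.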
This example is hypothetical, but I am still relatively confident that Google use a similar concept to calculate the PageRank of a webpage. The same logic can be applied on a larger scale and calculated recursively to help you understand how PR is distributed within your own website.
While PR can be used as a very general indicator of a webpage’s quality, it now tells us almost nothing about why that page ranks where it does. However, I still think it is important to understand the concept, and I hope you have found this idiot’s guide to the PageRank algorithm useful.
Please leave a comment if you have any questions about what I’ve written and I’ll be sure to provide a swift response (or defence!).