Tag Archives: Algorthim

Exploring the PageRank Algorithm

INTRODUCTION & BRIEF HISTORY

At the self-proclaimed “heart” of Google’s software (2014) lies their remarkable PageRank algorithm. PageRank added an extra hypertextual layer of abstraction to earlier keyword driven search models used by companies such as AltaVista and Infoseek (Chater & Oaksford, 2008). PageRank utilised a weighted ranking algorithm that took into account the number of hyperlinks pointing towards a connected page and their interrelation with other popular websites (Brin & Page, 1998). Named after its inventor Google founder Larry Page (not just a simple reference to the “pages” being ranked) (Page, 2001) PageRank was based upon previous conceptual ideas of academic citation using hypertext technology. These original principles were pioneered through the earlier works of Ellen Spertus (1997) and Massimo Marchiori (1997). These early concepts around hypertext and academic citation where then modified and applied directly to ranking webpages popularity and their relevance to web user’s search queries.

ALGORITHM BREAKDOWN

The PageRank Algorithm uses a probabilistic distribution to calculate the rank of a website. This rank is then updated every time Google’s web-crawler scours the web. This follows the Random Surfer Model which models user behaviour based on the probability that they will follow a hyperlink to a new site or randomly leave and request a new page.

There have been two public versions of the PageRank algorithm released which were famously mixed up by the Google founder Serge Brin and Larry Page in their original paper The Anatomy of a Large-Scale Hypertextual Web Search Engine (1998). 

Figure 1. PageRank without the damping factor being relative to the number of pages

PR(A) = (1-d) + d (PR(W1)/C(W1) + … + PR(Wn)/C(Wn))

Figure 2. PageRank with the damping factor relative to the number of pages

PR(A) = (1-d) / N + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

ALGORITHM KEY FOR FIGURE 1: from http://interestingwebs.blogspot.co.uk/2009/05/simple-explain-of-google-pagerank.html

PR(A) – Page Rank of page A
PR(Wi) – Page Rank of pages Wi which link to page A
C(Wi) – number of outbound links on page Wi
d – damping factor which can be set between 0 and 1.

PageRank is calculated as a recursive function, meaning the accuracy of a page’s ranks increases after every iteration. If we look at the following diagram below in figure 3.

Figure 3. PageRank Diagram: from http://codispatch.blogspot.com/2015/12/java-program-implement-google-page-rank-algorithm.html
Simple PageRank Algorithm

We can see the probabilistic distribution using eigenvector centrality. For example on the first or ‘zero’ iteration the PageRank value is calculated as an even distribution of 1 divided by N number of pages. Which would give a value of 0.2 for each of the 5 pages (1/n). As you can see in table 1 below.

Table 1: PageRank Values: from http://codispatch.blogspot.com/2015/12/java-program-implement-google-page-rank-algorithm.html
PageRank ValuesOnce the initial zero iteration value has been calculated as seen in figure 4 above in the 0 row. The next iteration will then divide each page or node’s current rank by the number of outbound links. For example node C has 4 outgoing links so its initial value of 0.2 is divided by 4 to give a value of 0.05. As node A has 1 inbound link from node C it now has a value of 0.05. This process is repeated for all of the inbound and outbound links for each node and repeated for each iteration as show in the table in table 1.

After this initial calculation has been made a dampening factor taking into account the probability of a user leaving the site is incurred. Note – Google employees have alluded to the fact that the dampening factor of the random surfer model lies somewhere between 0.7 and 0.85 (Levy, 2011).

POTENTIAL INTEGRATION INTO BRITIZEN

The concept of if eigenvector centrality utilised by Google’s PageRank algorithm can be applied successfully to the Britizen’s scoring system If the dampening factor induced by the random surfer model of the PageRank algorithm is discarded due to its irrelevance in peer to peer scoring. This is due to the fact that the hypertextual links are not hypothetically traversed as in Google’s model and are therefore superfluous in our design.  Utilising the dispersal of scores via eigenvector centrality based on the peer to peer links on the site where a user’s number of friends represents even bidirectional links could induce certain desirable social behaviours. For example if a user had a small closely-nit group of friends with high Britizen points this would be more beneficial than having lots of friends with mediocre scores as the division of the users overall score would be dispersed equally to their peers. The implementation would of course only utilise eigenvector centrality as a weighted proportion or metric to calculate the user’s overall score, depending on how much social engineering is the focus of the scoring system. For more radical effects I believe a weighting of 50% of a user’s overall score would statistically show the greatest results as the user’s own background and whom they befriend would be calculated in equal measure, allowing us to observe the social effects of our scoring algorithm more clearly.

REFERENCES

Brin, S. & Page, L., 1998. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Seventh International World-Wide Web Conference (WWW 1998), 14-18 April.

Chater, N. & Oaksford, M., 2008. The Probabilistic Mind: Prospects for Bayesian Cognitive Science. New York: Oxford University Press, USA.

Chitpady , S., 2017. Java Program to Implement Simple PageRank Algorithm. [Online]
Available at: http://codispatch.blogspot.com/2015/12/java-program-implement-google-page-rank-algorithm.html
[Accessed 23 April 2017].

Google, 2014. Our history in depth. [Online]
Available at: https://www.google.com/about/company/history
[Accessed 16 11 2016].

Levy, S., 2011. In The Plex: How Google Thinks, Works, and Shapes Our Lives. New York: Simon & Schuster.

Marchiori, M., 1997. The Quest for Correct Information on the Web: Hyper Search Engines. The Sixth International WWW Conference (WWW 97), 7-11 April.

Page, L., 2001. Method for Node Ranking in a Linked Database. USA, Patent No. US 6285999 B1.

Spertus, E., 1997. ParaSite: Mining Structural Information on the Web. The Sixth International WWW Conference (WWW 97), 7-11 April.

Webs, I., 2009. Simple Explanation of Google PageRank. [Online] Available at: http://interestingwebs.blogspot.co.uk/2009/05/simple-explain-of-google-pagerank.html

 

 

Britizen Score

A user’s Britizen score is made up of several factors, these factors include:

  • Your friend’s scores and groups scores
  • Normal credit check
  • Criminal Record
  • Post history
  • Employment

Your friend’s scores and groups scores

This part of one’s score are the networks in which the user is a part of, the higher score of the networks such as friend’s and group’s the higher their own personal score. Having this as part of a personal score will encourage users to apply pressure on their networks to improve their scores as it directly affects one’s own score. This could potentially lead to ostracising friend’s with low scores in the most extreme cases.

Normal credit check

This credit check will be performed via a third party company such as Experian[1] and will be a measure of how likely a person is able to pay their bills. This will promote users trusting other users with high scores as they are more confident when lending money or favours.

Criminal Record

Having one’s criminal record be a part of their score will encourage users not to commit crimes as this will have an even greater damaging affect on their reputation if it is on show via their Britizen score.

Post History

Users who make posts on Britizen which promote British values will be given higher scores. Users who post material which do not agree with British values will be penalised in their score.

Employment History

A user’s employment history will affect one’s score, this will encourage users to obtain better jobs as the bonus for employment can be tailored specific to jobs which the country currently needs, for example the NHS is currently understaffed, therefore doctors and nurses could receive larger Britizen score bonuses, hopefully encouraging more users to become doctors or nurses.

 

 

[1] http://www.experian.co.uk/