How google Pagerank works

how does google pagerank algorithm work and how pagerank algorithm works
WilliamsMcmahon Profile Pic
WilliamsMcmahon,United States,Professional
Published Date:20-07-2017
Your Website URL(Optional)
Introduction to Information Retrieval Introduction to Information Retrieval Web Search 1Introduction to Information Retrieval Indexing anchor text  Anchor text is often a better description of a page’s content than the page itself.  Anchor text can be weighted more highly than the text on the page.  A Google bomb is a search with “bad” results due to maliciously manipulated anchor text.  dangerous cult on Google, Bing, Yahoo 4 4Introduction to Information Retrieval PageRank  Model: a web surfer doing a random walk on the web  Formalization: Markov chain  PageRank is the long-term visit rate of the random surfer or the steady-state distribution.  Need teleportation to ensure well-defined PageRank  Power method to compute PageRank  PageRank is the principal left eigenvector of the transition probability matrix. 5 5Introduction to Information Retrieval Computing PageRank: Power method  PageRank vector = π = (π1, π2) = (0.25, 0.75) P (d ) = P (d ) ∗ P + P (d ) ∗ P t 1 t−1 1 11 t−1 2 21 P (d ) = P (d ) ∗ P + P (d ) ∗ P t 2 t−1 1 12 t−1 2 22 6 6Introduction to Information Retrieval HITS: Hubs and authorities 7 7Introduction to Information Retrieval HITS update rules  A: link matrix   h: vector of hub scores   a: vector of authority scores  HITS algorithm:    Compute h = Aa   T  Compute a = A h  Iterate until convergence  Output (i) list of hubs ranked according to hub score and (ii) list of authorities ranked according to authority score 8 8Introduction to Information Retrieval Web search overview 10 10Introduction to Information Retrieval Search is the top activity on the web 11 11Introduction to Information Retrieval Without search engines, the web wouldn’t work  Without search, content is hard to find.  → Without search, there is no incentive to create content.  Why publish something if nobody will read it?  Why publish something if I don’t get ad revenue from it?  Somebody needs to pay for the web.  Servers, web infrastructure, content creation  A large part today is paid by search ads.  Search pays for the web. 12 12Introduction to Information Retrieval Interest aggregation  Unique feature of the web: A small number of geographically dispersed people with similar interests can find each other.  Elementary school kids with hemophilia  People interested in translating R5R5 Scheme into relatively portable C (open source project)  Search engines are a key enabler for interest aggregation. 13 13Introduction to Information Retrieval IR on the web vs. IR in general  On the web, search is not just a nice feature.  Search is a key enabler of the web: . . .  . . . financing, content creation, interest aggregation etc. → look at search ads  The web is a chaotic und uncoordinated collection. → lots of duplicates – need to detect duplicates  No control / restrictions on who can author content → lots of spam – need to detect spam  The web is very large. → need to know how big it is 14 14Introduction to Information Retrieval Take-away today  Big picture  Ads – they pay for the web  Duplicate detection – addresses one aspect of chaotic content creation  Spam detection – addresses one aspect of lack of central access control  Probably won’t get to today  Web information retrieval  Size of the web 15 15Introduction to Information Retrieval Outline ❶ Recap ❷ Big picture ❸ Ads ❹ Duplicate detection 16Introduction to Information Retrieval First generation of search ads: Goto (1996) 17 17Introduction to Information Retrieval First generation of search ads: Goto (1996)  Buddy Blake bid the maximum (0.38) for this search.  He paid 0.38 to Goto every time somebody clicked on the link.  Pages were simply ranked according to bid – revenue maximization for Goto.  No separation of ads/docs. Only one result list  Upfront and honest. No relevance ranking, . . .  . . . but Goto did not pretend there was any. 18 18Introduction to Information Retrieval Second generation of search ads: Google (2000/2001)  Strict separation of search results and search ads 19 19Introduction to Information Retrieval Two ranked lists: web pages (left) and ads (right) SogoTrade appears in search results. SogoTrade appears in ads. Do search engines rank advertisers higher than non-advertisers? All major search engines claim no. 20 20Introduction to Information Retrieval Do ads influence editorial content?  Similar problem at newspapers / TV channels  A newspaper is reluctant to publish harsh criticism of its major advertisers.  The line often gets blurred at newspapers / on TV.  No known case of this happening with search engines yet? 21 21Introduction to Information Retrieval How are the ads on the right ranked? 22 22Introduction to Information Retrieval How are ads ranked?  Advertisers bid for keywords – sale by auction.  Open system: Anybody can participate and bid on keywords.  Advertisers are only charged when somebody clicks on your ad.  How does the auction determine an ad’s rank and the price paid for the ad?  Basis is a second price auction, but with twists  For the bottom line, this is perhaps the most important research area for search engines – computational advertising.  Squeezing an additional fraction of a cent from each ad means billions of additional revenue for the search engine. 23 23