Language models for information retrieval

Introduction to Information Retrieval
Language Models for IR

Overview
❶ Recap
❷ Language models
❸ Language Models for IR
❹ Discussion

Indexing anchor text
▪ Anchor text is often a better description of a page's content than the page itself.
▪ Anchor text can be weighted more highly than the text of the page.
▪ A Google bomb is a search with "bad" results due to maliciously manipulated anchor text.
▪ Example: the query dangerous cult on Google, Bing, Yahoo

PageRank
▪ Model: a web surfer doing a random walk on the web
▪ Formalization: Markov chain
▪ PageRank is the long-term visit rate of the random surfer, i.e., the steady-state distribution.
▪ We need teleportation to ensure that PageRank is well defined.
▪ We use the power method to compute PageRank.
▪ PageRank is the principal left eigenvector of the transition probability matrix.

Computing PageRank: Power method
Transition probabilities: P11 = 0.1, P12 = 0.9, P21 = 0.3, P22 = 0.7.
The surfer starts at d2, i.e., x̄0 = (0, 1); each step multiplies the current distribution by P:

        x̄_t = (P_t(d1), P_t(d2))      x̄_t P
  t0    (0, 1)                        (0.3, 0.7)
  t1    (0.3, 0.7)                    (0.24, 0.76)
  t2    (0.24, 0.76)                  (0.252, 0.748)
  t3    (0.252, 0.748)                (0.2496, 0.7504)
  ...
  t∞    (0.25, 0.75)                  (0.25, 0.75)

Update rule:
  P_t(d1) = P_{t-1}(d1) · P11 + P_{t-1}(d2) · P21
  P_t(d2) = P_{t-1}(d1) · P12 + P_{t-1}(d2) · P22
PageRank vector = π̄ = (π1, π2) = (0.25, 0.75)

HITS: Hubs and authorities
(figure)

HITS update rules
▪ A: link matrix
▪ h̄: vector of hub scores
▪ ā: vector of authority scores
▪ HITS algorithm:
  ▪ Compute h̄ = A ā
  ▪ Compute ā = A^T h̄
  ▪ Iterate until convergence
▪ Output (i) the list of hubs ranked according to hub score and (ii) the list of authorities ranked according to authority score.
(A code sketch of both iterations follows below.)
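The recap's two iterative computations are easy to make concrete. Below is a minimal sketch, not from the slides: the transition matrix P and the start vector x0 = (0, 1) reproduce the 2 × 2 power-method example above, while the three-page link matrix A for HITS is invented for illustration, and the function names are just for this sketch.

```python
# Minimal sketch (not from the slides): power method for PageRank and the
# HITS hub/authority updates, in plain Python.

def power_method(P, x, steps=50):
    """Repeatedly compute x := x P for a row-stochastic transition matrix P."""
    n = len(P)
    for _ in range(steps):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x

# The 2x2 example above: P11 = 0.1, P12 = 0.9, P21 = 0.3, P22 = 0.7;
# the random surfer starts at d2, i.e. x0 = (0, 1).
P = [[0.1, 0.9],
     [0.3, 0.7]]
print(power_method(P, [0.0, 1.0]))  # -> approximately (0.25, 0.75), as in the table

def hits(A, steps=50):
    """HITS update rules: h = A a, then a = A^T h; iterate until convergence."""
    n = len(A)
    h, a = [1.0] * n, [1.0] * n
    for _ in range(steps):
        h = [sum(A[i][j] * a[j] for j in range(n)) for i in range(n)]   # h = A a
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(n)]   # a = A^T h
        # Normalize so the scores stay bounded; only the resulting ranking matters.
        sh, sa = sum(h), sum(a)
        h = [x / sh for x in h]
        a = [x / sa for x in a]
    return h, a

# Invented 3-page link graph: A[i][j] = 1 iff page i links to page j.
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
hubs, authorities = hits(A)
print(hubs, authorities)
```

Teleportation is omitted here for brevity; a full PageRank computation would first mix P with a teleportation matrix so that the steady state is well defined, as noted on the PageRank slide.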
Outline
❶ Recap
❷ Language models
❸ Language Models for IR
❹ Discussion

Recall: Naive Bayes generative model
(figure)

Naive Bayes and LM generative models
▪ Naive Bayes: we want to classify a document d. LM: we want to "classify" a query q.
▪ Naive Bayes: the classes are, e.g., geographical regions like China, UK, Kenya. LM: each document in the collection is a different class.
▪ Naive Bayes: assume that d was generated by the class's generative model. LM: assume that q was generated by a document's generative model.
▪ Key question in Naive Bayes: which of the classes is most likely to have generated the document? In the LM approach: which document (= class) is most likely to have generated the query q?
▪ Or: for which class (Naive Bayes), or for which document as the source of the query (LM), do we have the most evidence?

Using language models (LMs) for IR
❶ LM = language model
❷ We view the document as a generative model that generates the query.
❸ What we need to do:
❹ Define the precise generative model we want to use
❺ Estimate parameters (different parameters for each document's model)
❻ Smooth to avoid zeros
❼ Apply to the query and find the document most likely to have generated the query
❽ Present the most likely document(s) to the user
❾ Note that steps ❹–❼ are pretty much what we did in Naive Bayes.

What is a language model?
We can view a finite state automaton as a deterministic language model.
(figure: an automaton generating strings of the form "I wish I wish I wish I wish . . .")
It cannot generate "wish I wish" or "I wish I".
Our basic model: each document was generated by a different automaton like this, except that these automata are probabilistic.

A probabilistic language model
This is a one-state probabilistic finite-state automaton – a unigram language model – and the state emission distribution for its one state q1. STOP is not a word, but a special symbol indicating that the automaton stops.
Example string: frog said that toad likes frog STOP
P(string) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.02 = 4.8 · 10^-13

A different language model for each document
Same string: frog said that toad likes frog STOP
P(string|M_d1) = 0.01 · 0.03 · 0.04 · 0.01 · 0.02 · 0.01 · 0.02 = 4.8 · 10^-13
P(string|M_d2) = 0.01 · 0.03 · 0.05 · 0.02 · 0.02 · 0.01 · 0.02 = 1.2 · 10^-12
P(string|M_d1) < P(string|M_d2)
Thus, document d2 is "more relevant" to the string "frog said that toad likes frog STOP" than d1 is.

Outline
❶ Recap
❷ Language models
❸ Language Models for IR
❹ Discussion

Using language models in IR
▪ Each document is treated as (the basis for) a language model.
▪ Given a query q, rank documents based on P(d|q):
  P(d|q) = P(q|d) P(d) / P(q)
▪ P(q) is the same for all documents, so we can ignore it.
▪ P(d) is the prior – often treated as the same for all d.
▪ But we could give a higher prior to "high-quality" documents, e.g., those with high PageRank.
▪ P(q|d) is the probability of q given d.
▪ So, to rank documents according to relevance to q, ranking according to P(q|d) and P(d|q) is equivalent.

Where we are
▪ In the LM approach to IR, we attempt to model the query generation process.
▪ Then we rank documents by the probability that the query would be observed as a random sample from the respective document model.
▪ That is, we rank according to P(q|d).
▪ Next: how do we compute P(q|d)?

How to compute P(q|d)
▪ We make the same conditional independence assumption as for Naive Bayes:
  P(q|M_d) = ∏_{1 ≤ k ≤ |q|} P(t_k|M_d)
  (|q|: length of q; t_k: the token occurring at position k in q)
▪ This is equivalent to:
  P(q|M_d) = ∏_{distinct term t in q} P(t|M_d)^tf_{t,q}
  (tf_{t,q}: term frequency, i.e., number of occurrences, of t in q)
▪ This is a multinomial model (omitting the constant multinomial coefficient, which is the same for every document and does not affect the ranking).

Parameter estimation
▪ Missing piece: where do the parameters P(t|M_d) come from?
▪ Start with maximum likelihood estimates (as we did for Naive Bayes):
  P̂(t|M_d) = tf_{t,d} / |d|
  (|d|: length of d; tf_{t,d}: number of occurrences of t in d)
▪ As in Naive Bayes, we have a problem with zeros.
▪ A single term t with P(t|M_d) = 0 makes the whole product P(q|M_d) zero.
▪ That would give a single term "veto power".
▪ For example, for the query Michael Jackson top hits, a document about "top songs" (but not containing the word "hits") would get P(t|M_d) = 0 for "hits" – and that's bad.
▪ We need to smooth the estimates to avoid zeros. (A code sketch of the unsmoothed ranking, and of the zero problem it runs into, follows.)
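To make the scoring pipeline of this section concrete, here is a minimal sketch, not from the slides, of unsmoothed query-likelihood ranking: maximum likelihood estimates P(t|M_d) = tf_{t,d} / |d| and the query score P(q|M_d) = ∏_k P(t_k|M_d). The two toy documents and their wording are invented for illustration (the query echoes the Michael Jackson top hits example), and the function names are just for this sketch.

```python
# Minimal sketch (not from the slides): unsmoothed query-likelihood ranking.
from collections import Counter

# Invented toy documents; d1 talks about "top songs" but never uses the word "hits".
docs = {
    "d1": "michael jackson list of top songs and more songs".split(),
    "d2": "top hits by michael jackson".split(),
}

def p_mle(term, doc_tokens):
    """Maximum likelihood estimate: P(t|M_d) = tf_{t,d} / |d| (no smoothing)."""
    return Counter(doc_tokens)[term] / len(doc_tokens)

def query_likelihood(query_tokens, doc_tokens):
    """P(q|M_d) = product over the query's tokens of P(t_k|M_d)."""
    p = 1.0
    for t in query_tokens:
        p *= p_mle(t, doc_tokens)
    return p

query = "michael jackson top hits".split()
for name, tokens in docs.items():
    print(name, query_likelihood(query, tokens))
# d2 gets a positive score (0.2^4), while d1 scores exactly 0 because
# P(hits|M_d1) = 0: a single unseen query term has "veto power",
# which is why the estimates need to be smoothed.
```

Recomputing the term counts per query term keeps the sketch close to the formulas; a real implementation would precompute per-document term frequencies, score in log space to avoid underflow, and smooth the estimates so that unseen query terms do not zero out the score.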