Cosine similarity formula

Why ranking is so important, with a cosine similarity calculation example and cosine similarity in information retrieval
WilliamsMcmahon, United States, Professional
Published Date: 20-07-2017
Introduction to Information Retrieval
Scores in a Complete Search System

Overview
❶ Recap
❷ Why rank?
❸ More on cosine
❹ Implementation of ranking
❺ The complete search system

Term frequency weight
• The log frequency weight of term t in d is defined as follows:
  $w_{t,d} = 1 + \log_{10} \mathrm{tf}_{t,d}$ if $\mathrm{tf}_{t,d} > 0$, and $w_{t,d} = 0$ otherwise.

idf weight
• The document frequency $\mathrm{df}_t$ is defined as the number of documents that t occurs in.
• We define the idf weight of term t as follows:
  $\mathrm{idf}_t = \log_{10} (N / \mathrm{df}_t)$, where N is the number of documents in the collection.
• idf is a measure of the informativeness of the term.

tf-idf weight
• The tf-idf weight of a term is the product of its tf weight and its idf weight:
  $w_{t,d} = (1 + \log_{10} \mathrm{tf}_{t,d}) \cdot \log_{10} (N / \mathrm{df}_t)$

Cosine similarity between query and document
  $\cos(\vec{q}, \vec{d}) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}|\,|\vec{d}|} = \frac{\sum_i q_i d_i}{\sqrt{\sum_i q_i^2}\,\sqrt{\sum_i d_i^2}}$
• $q_i$ is the tf-idf weight of term i in the query.
• $d_i$ is the tf-idf weight of term i in the document.
• $|\vec{q}|$ and $|\vec{d}|$ are the lengths of $\vec{q}$ and $\vec{d}$.
• $\vec{q}/|\vec{q}|$ and $\vec{d}/|\vec{d}|$ are length-1 vectors (= normalized).

Cosine similarity illustrated
[Figure: cosine similarity illustrated]

tf-idf example: lnc.ltn
Query: “best car insurance”. Document: “car insurance auto insurance”.
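The lnc.ltn scoring of this query and document can be sketched in a few lines of Python. In SMART notation, lnc means the document uses log tf, no idf, and cosine normalization, while ltn means the query uses log tf, idf, and no normalization. The collection size `N` and the document frequencies in `DF` are assumptions made for illustration. They are not given in the text and are chosen only so that idf(car) = 2.0 and idf(insurance) = 3.0, which matches the products 1.04 and 2.04 reported for this example. The function names are likewise hypothetical.

```python
import math
from collections import Counter

# Assumed collection statistics (NOT from the text): N and DF are chosen so that
# idf(car) = 2.0 and idf(insurance) = 3.0, consistent with the products
# 1.04 and 2.04 reported for this example.
N = 1_000_000
DF = {"auto": 5_000, "best": 50_000, "car": 10_000, "insurance": 1_000}


def log_tf(tf):
    """Log frequency weight: 1 + log10(tf) for tf > 0, else 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


def idf(term):
    """Inverse document frequency: log10(N / df)."""
    return math.log10(N / DF[term])


def lnc_document(tokens):
    """Document side (lnc): log tf, no idf, cosine normalization."""
    weights = {t: log_tf(c) for t, c in Counter(tokens).items()}
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / length for t, w in weights.items()}


def ltn_query(tokens):
    """Query side (ltn): log tf times idf, no normalization."""
    return {t: log_tf(c) * idf(t) for t, c in Counter(tokens).items()}


def score(query, doc):
    """Dot product over the terms shared by the query and the document."""
    return sum(w * doc[t] for t, w in query.items() if t in doc)


doc = lnc_document("car insurance auto insurance".split())
query = ltn_query("best car insurance".split())
print(round(score(query, doc), 2))  # prints 3.07
```

Without intermediate rounding the score is about 3.07. The value 3.08 in the example comes from rounding the normalized document weights to 0.52 and 0.68 before multiplying.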
Column key for the example: tf: term frequency; df: document frequency; idf: inverse document frequency; weight: the final weight of the term in the query or document; n’lized: document weights after cosine normalization; product: the product of final query weight and final document weight.

The document's log-weighted term frequencies are 1 (auto), 1 (car), and 1.3 (insurance), so the document vector has length $\sqrt{1^2 + 1^2 + 1.3^2} \approx 1.92$. The normalized document weights are therefore $1/1.92 \approx 0.52$ and $1.3/1.92 \approx 0.68$.

Final similarity score between query and document: $\sum_i w_{qi} \cdot w_{di} = 0 + 0 + 1.04 + 2.04 = 3.08$

Take-away today
• The importance of ranking: User studies at Google
• Length normalization: Pivot normalization
• Implementation of ranking
• The complete search system

Why is ranking so important?
• Last lecture: Problems with unranked retrieval
  • Users want to look at a few results – not thousands.
  • It’s very hard to write queries that produce a few results.
  • Even for expert searchers
  • → Ranking is important because it effectively reduces a large set of results to a very small one.
• Next: More data on “users only look at a few results”
• Actually, in the vast majority of cases they only examine 1, 2, or 3 results.

Empirical investigation of the effect of ranking
• How can we measure how important ranking is?
• Observe what searchers do when they are searching in a controlled setting
  • Videotape them
  • Ask them to “think aloud”
  • Interview them
  • Eye-track them
  • Time them
  • Record and count their clicks
• The following slides are from Dan Russell’s JCDL talk
• Dan Russell is the “Über Tech Lead for Search Quality & User Happiness” at Google.
[Figures from Dan Russell’s talk are not included.]

Importance of ranking: Summary
• Viewing abstracts: Users are a lot more likely to read the abstracts of the top-ranked pages (1, 2, 3, 4) than the abstracts of the lower ranked pages (7, 8, 9, 10).
• Clicking: Distribution is even more skewed for clicking
  • In 1 out of 2 cases, users click on the top-ranked page.
  • Even if the top-ranked page is not relevant, 30% of users will click on it.
• → Getting the ranking right is very important.
• → Getting the top-ranked page right is most important.
