Recall information retrieval

How can we improve recall in search? These lecture slides cover two approaches: relevance feedback and query expansion.
WilliamsMcmahon, United States, Professional
Published Date: 20-07-2017
Introduction to Information Retrieval
Relevance Feedback & Query Expansion

How can we improve recall in search?
• Main topic today: two ways of improving recall: relevance feedback and query expansion
• As an example, consider query q: aircraft . . .
• . . . and document d containing “plane”, but not containing “aircraft”
• A simple IR system will not return d for q.
• Even if d is the most relevant document for q.
• We want to change this:
• Return relevant documents even if there is no term match with the (original) query

Recall
• Loose definition of recall in this lecture: “increasing the number of relevant documents returned to user”
• This may actually decrease recall on some measures, e.g., when expanding “jaguar” with “panthera” . . .
• . . . which eliminates some relevant documents, but increases relevant documents returned on top pages

Options for improving recall
• Local: Do a “local”, on-demand analysis for a user query
  • Main local method: relevance feedback
  • Part 1
• Global: Do a global analysis once (e.g., of collection) to produce thesaurus
  • Use thesaurus for query expansion
  • Part 2

Google examples for query expansion
• One that works well: ~flights -flight
• One that doesn’t work so well: ~hospitals -hospital

Relevance feedback: Basic idea
• The user issues a (short, simple) query.
• The search engine returns a set of documents.
• User marks some docs as relevant, some as nonrelevant.
• Search engine computes a new representation of the information need. Hope: better than the initial query.
• Search engine runs new query and returns new results.
• New results have (hopefully) better recall.

Relevance feedback
• We can iterate this: several rounds of relevance feedback.
• We will use the term ad hoc retrieval to refer to regular retrieval without relevance feedback.
• We will now look at three different examples of relevance feedback that highlight different aspects of the process.

User feedback: Select relevant documents
[Figure. Source: Fernando Díaz]

Results after relevance feedback
[Figure. Source: Fernando Díaz]

Example 3: A real (non-image) example
Initial query: new space satellite applications
Results for initial query (r = rank):

     r  score  title
  +  1  0.539  NASA Hasn’t Scrapped Imaging Spectrometer
  +  2  0.533  NASA Scratches Environment Gear From Satellite Plan
     3  0.528  Science Panel Backs NASA Satellite Plan, But Urges Launches of Smaller Probes
     4  0.526  A NASA Satellite Project Accomplishes Incredible Feat: Staying Within Budget
     5  0.525  Scientist Who Exposed Global Warming Proposes Satellites for Climate Research
     6  0.524  Report Provides Support for the Critics Of Using Big Satellites to Study Climate
     7  0.516  Arianespace Receives Satellite Launch Pact From Telesat Canada
  +  8  0.509  Telecommunications Tale of Two Companies

User then marks relevant documents with “+”.

Expanded query after relevance feedback

   weight  term           weight  term
    2.074  new            15.106  space
   30.816  satellite       5.660  application
    5.991  nasa            5.196  eos
    4.196  launch          3.972  aster
    3.516  instrument      3.446  arianespace
    3.004  bundespost      2.806  ss
    2.790  rocket          2.053  scientist
    2.003  broadcast       1.172  earth
    0.836  oil             0.646  measure

Compare to original query: new space satellite applications

Results for expanded query (r = rank):

     r  score  title
     1  0.513  NASA Scratches Environment Gear From Satellite Plan
     2  0.500  NASA Hasn’t Scrapped Imaging Spectrometer
     3  0.493  When the Pentagon Launches a Secret Satellite, Space Sleuths Do Some Spy Work of Their Own
     4  0.493  NASA Uses ‘Warm’ Superconductors For Fast Circuit
     5  0.492  Telecommunications Tale of Two Companies
     6  0.491  Soviets May Adapt Parts of SS-20 Missile For Commercial Use
     7  0.490  Gaping Gap: Pentagon Lags in Race To Match the Soviets In Rocket Launchers
     8  0.490  Rescue of Satellite By Space Agency To Cost 90 Million

Key concept for relevance feedback: Centroid
• The centroid is the center of mass of a set of points.
• Recall that we represent documents as points in a high-dimensional space.
• Thus: we can compute centroids of documents.
• Definition:
  $$\vec{\mu}(D) = \frac{1}{|D|} \sum_{d \in D} \vec{v}(d)$$
  where $D$ is a set of documents and $\vec{v}(d)$ is the vector we use to represent document $d$.

Rocchio’ algorithm
• The Rocchio’ algorithm implements relevance feedback in the vector space model.
• Rocchio’ chooses the query $\vec{q}_{opt}$ that maximizes
  $$\vec{q}_{opt} = \arg\max_{\vec{q}} \left[ \mathrm{sim}(\vec{q}, \vec{\mu}(D_r)) - \mathrm{sim}(\vec{q}, \vec{\mu}(D_{nr})) \right]$$
  $D_r$: set of relevant docs; $D_{nr}$: set of nonrelevant docs
• Intent: $\vec{q}_{opt}$ is the vector that separates relevant and nonrelevant docs maximally.
• Making some additional assumptions, we can rewrite $\vec{q}_{opt}$ as:
  $$\vec{q}_{opt} = \vec{\mu}(D_r) + \left[ \vec{\mu}(D_r) - \vec{\mu}(D_{nr}) \right]$$

Terminology
• We use the name Rocchio’ for the theoretically better motivated original version of Rocchio.
• The implementation that is actually used in most cases is the SMART implementation; we use the name Rocchio (without prime) for that.

Rocchio 1971 algorithm (SMART)
Used in practice:
$$\vec{q}_m = \alpha \vec{q}_0 + \beta \frac{1}{|D_r|} \sum_{d \in D_r} \vec{v}(d) - \gamma \frac{1}{|D_{nr}|} \sum_{d \in D_{nr}} \vec{v}(d)$$
$\vec{q}_m$: modified query vector; $\vec{q}_0$: original query vector; $D_r$ and $D_{nr}$: sets of known relevant and nonrelevant documents respectively; $\vec{v}(d)$: vector representing document $d$; α, β, and γ: weights
• New query moves towards relevant documents and away from nonrelevant documents.
• Tradeoff α vs. β/γ: If we have a lot of judged documents, we want a higher β/γ.
• Set negative term weights to 0.
• “Negative weight” for a term doesn’t make sense in the vector space model.

Positive vs. negative relevance feedback
• Positive feedback is more valuable than negative feedback.
• For example, set β = 0.75, γ = 0.25 to give higher weight to positive feedback.
• Many systems only allow positive feedback.

Relevance feedback: Assumptions
• When can relevance feedback enhance recall?
• Assumption A1: The user knows the terms in the collection well enough for an initial query.
• Assumption A2: Relevant documents contain similar terms (so I can “hop” from one relevant document to a different one when giving relevance feedback).

Violation of A1
• Assumption A1: The user knows the terms in the collection well enough for an initial query.
• Violation: Mismatch of searcher’s vocabulary and collection vocabulary
• Example: cosmonaut / astronaut

Violation of A2
• Assumption A2: Relevant documents are similar.
• Example for violation: contradictory government policies
• Several unrelated “prototypes”
  • Subsidies for tobacco farmers vs. anti-smoking campaigns
  • Aid for developing countries vs. high tariffs on imports from developing countries
• Relevance feedback on tobacco docs will not help with finding docs on developing countries.
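
Sketch: the Rocchio (SMART) update in code
The update described above (weight the original query by α, add β times the centroid of the relevant documents, subtract γ times the centroid of the nonrelevant documents, then set negative term weights to 0) can be sketched in a few lines of Python. This is a minimal illustration, not the SMART implementation itself. The function names (centroid, rocchio, dot), the sparse dict representation of vectors, the default α = 1.0, and the toy term weights are all assumptions made for the example; only β = 0.75 and γ = 0.25 come from the slides. The toy data reuses the aircraft/plane example from the start of the lecture.

```python
from collections import defaultdict


def centroid(docs):
    """Mean of a non-empty list of sparse term-weight vectors (dicts)."""
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """q_m = alpha*q0 + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    alpha=1.0 is an assumed default; beta and gamma follow the slide's example.
    """
    q_m = defaultdict(float)
    for term, weight in q0.items():
        q_m[term] += alpha * weight
    if relevant:  # skip the term entirely when no relevant docs were judged
        for term, weight in centroid(relevant).items():
            q_m[term] += beta * weight
    if nonrelevant:  # many systems only allow positive feedback
        for term, weight in centroid(nonrelevant).items():
            q_m[term] -= gamma * weight
    # "Set negative term weights to 0": here such terms are simply dropped.
    return {term: weight for term, weight in q_m.items() if weight > 0}


def dot(q, d):
    """Unnormalized dot-product score between two sparse vectors."""
    return sum(weight * d.get(term, 0.0) for term, weight in q.items())


# Hypothetical toy vectors (weights are made up for illustration).
q0 = {"aircraft": 1.0}                            # original query
relevant = [{"aircraft": 1.0, "plane": 1.0}]      # doc the user marked "+"
nonrelevant = [{"aircraft": 1.0, "carrier": 2.0}]  # doc judged nonrelevant
d = {"plane": 1.0}                                # relevant doc with no term match

q_m = rocchio(q0, relevant, nonrelevant)
print(q_m)          # {'aircraft': 1.5, 'plane': 0.75}
print(dot(q0, d))   # 0.0  -> d is not retrieved by the original query
print(dot(q_m, d))  # 0.75 -> d now matches the modified query
```

In this toy run the modified query picks up “plane” from the relevant document, so d is now scored above zero even though it shares no term with the original query, while “carrier” ends up with a negative weight and is removed.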