Summarization in Question Answering (Dan Jurafsky)

Text Summarization
• Goal: produce an abridged version of a text that contains the information that is important or relevant to a user.
• Applications:
  • outlines or abstracts of any document, article, etc.
  • summaries of email threads
  • action items from a meeting
  • simplifying text by compressing sentences

What to summarize? Single vs. multiple documents
• Single-document summarization: given a single document, produce an abstract, an outline, or a headline.
• Multiple-document summarization: given a group of documents, produce a gist of the content, e.g. a series of news stories on the same event, or a set of web pages about some topic or question.

Generic vs. query-focused summarization
• Generic summarization: summarize the content of a document.
• Query-focused summarization: summarize a document with respect to an information need expressed in a user query. This is a kind of complex question answering: answer a question by summarizing a document that has the information needed to construct the answer.

Summarization for question answering: snippets
• Create snippets summarizing a web page for a query.
• Google: 156 characters (about 26 words) plus title and link.

Summarization for question answering: multiple documents
• Create answers to complex questions by summarizing multiple documents.
• Instead of giving a snippet for each document, create a cohesive answer that combines information from each document.

Extractive vs. abstractive summarization
• Extractive summarization: create the summary from phrases or sentences in the source document(s).
• Abstractive summarization: express the ideas in the source documents using (at least in part) different words.

Simple baseline: take the first sentence.

Generating Snippets and Other Single-Document Answers

Snippets: query-focused summaries

Summarization: three stages
1. Content selection: choose sentences to extract from the document.
2. Information ordering: choose an order to place them in the summary.
3. Sentence realization: clean up the sentences.
[Pipeline: document → sentence segmentation → content selection (sentence extraction, sentence simplification) → information ordering → sentence realization → summary]

Basic summarization algorithm
1. Content selection: choose sentences to extract from the document.
2. Information ordering: just use document order.
3. Sentence realization: keep the original sentences.
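To make the basic algorithm concrete, here is a minimal Python sketch of the three-stage pipeline with document-order selection. The regex sentence splitter and all names are illustrative stand-ins, not from the slides:

```python
import re

def segment(document: str) -> list[str]:
    # Naive sentence segmentation: split after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def summarize(document: str, n: int = 1) -> str:
    sentences = segment(document)
    # 1. Content selection: here, just the first n sentences
    #    (the "take the first sentence" baseline when n == 1).
    selected = sentences[:n]
    # 2. Information ordering: keep document order (already true here).
    # 3. Sentence realization: keep the original sentences verbatim.
    return " ".join(selected)

doc = ("Water spinach is a leaf vegetable. It is commonly eaten in Asia. "
       "It grows in tropical climates.")
print(summarize(doc))  # -> "Water spinach is a leaf vegetable."
```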
Unsupervised content selection
H. P. Luhn. 1958. The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development 2:2, 159-165.
• Intuition dating back to Luhn (1958): choose sentences that have salient or informative words.
• Two approaches to defining salient words:
  1. tf-idf: weigh each word w_i in document j by tf-idf:
     weight(w_i) = tf_{ij} \times idf_i
  2. Topic signature: choose a smaller set of salient words by mutual information or log-likelihood ratio (LLR) (Dunning 1993; Lin and Hovy 2000):
     weight(w_i) = \begin{cases} 1 & \text{if } -2 \log \lambda(w_i) > 10 \\ 0 & \text{otherwise} \end{cases}

Topic-signature-based content selection with queries (Conroy, Schlesinger, and O'Leary 2006)
• Choose words that are informative either by log-likelihood ratio (LLR) or by appearing in the query (one could also learn more complex weights):
  weight(w_i) = \begin{cases} 1 & \text{if } -2 \log \lambda(w_i) > 10 \\ 1 & \text{if } w_i \in \text{question} \\ 0 & \text{otherwise} \end{cases}
• Weigh a sentence (or window) S by the average weight of its words:
  weight(S) = \frac{1}{|S|} \sum_{w \in S} weight(w)

Supervised content selection
• Given a labeled training set of good summaries for each document: align the sentences in the document with sentences in the summary, extract features (position, e.g. first sentence; sentence length; word informativeness and cue phrases; cohesion), and train a binary classifier (put sentence in summary? yes or no).
• Problems: labeled training data is hard to get, alignment is difficult, and performance is not better than unsupervised algorithms.
• So in practice, unsupervised content selection is more common.

Evaluating Summaries: ROUGE

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) (Lin and Hovy 2003)
• An intrinsic metric for automatically evaluating summaries, based on BLEU (a metric used for machine translation).
• Not as good as human evaluation ("Did this answer the user's question?"), but much more convenient.
• Given a document D and an automatic summary X:
  1. Have N humans produce a set of reference summaries of D.
  2. Run the system, giving automatic summary X.
  3. What percentage of the bigrams from the reference summaries appear in X?
  ROUGE-2 = \frac{\sum_{S \in \text{RefSummaries}} \sum_{\text{bigrams } i \in S} \min(count(i, X), count(i, S))}{\sum_{S \in \text{RefSummaries}} \sum_{\text{bigrams } i \in S} count(i, S)}

A ROUGE example
Q: "What is water spinach?"
Human 1: Water spinach is a green leafy vegetable grown in the tropics. (10 bigrams)
Human 2: Water spinach is a semi-aquatic tropical plant grown as a vegetable. (9 bigrams)
Human 3: Water spinach is a commonly eaten leaf vegetable of Asia. (9 bigrams)
System answer: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
ROUGE-2 = (3 + 3 + 6) / (10 + 9 + 9) = 12/28 = .43
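The ROUGE-2 formula can be computed directly. A minimal sketch, assuming simple lowercase word tokenization; tokenization details (hyphens, punctuation) affect the exact bigram counts, so the score lands near, not exactly at, the slide's .43:

```python
from collections import Counter
import re

def bigrams(text: str) -> Counter:
    # Lowercase alphabetic tokens; "semi-aquatic" splits into two tokens here.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

def rouge_2(system: str, references: list[str]) -> float:
    sys_counts = bigrams(system)
    overlap = total = 0
    for ref in references:
        ref_counts = bigrams(ref)
        # Numerator: clipped bigram matches; denominator: all reference bigrams.
        overlap += sum(min(c, sys_counts[bg]) for bg, c in ref_counts.items())
        total += sum(ref_counts.values())
    return overlap / total

refs = [
    "Water spinach is a green leafy vegetable grown in the tropics.",
    "Water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "Water spinach is a commonly eaten leaf vegetable of Asia.",
]
answer = "Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."
print(round(rouge_2(answer, refs), 2))  # ~0.40 with this tokenizer; slide counts give 12/28 = .43
```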
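Returning to the content-selection formulas above, a hedged sketch of Conroy-style query-focused sentence scoring. The LLR values are assumed to be precomputed elsewhere (building them needs a background corpus), and all names and numbers here are hypothetical:

```python
def word_weight(w: str, llr: dict, query: set, threshold: float = 10.0) -> float:
    # weight(w) = 1 if -2 log lambda(w) > 10, or if w appears in the query;
    # llr maps words to precomputed -2 log lambda values (assumed given).
    if llr.get(w, 0.0) > threshold or w in query:
        return 1.0
    return 0.0

def sentence_weight(sentence: list, llr: dict, query: set) -> float:
    # weight(S) = (1/|S|) * sum over w in S of weight(w).
    return sum(word_weight(w, llr, query) for w in sentence) / len(sentence)

llr = {"spinach": 14.2, "tropics": 11.3}   # hypothetical precomputed LLR values
query = {"water", "spinach"}
sentence = "water spinach grows in the tropics".split()
print(sentence_weight(sentence, llr, query))  # 3 salient words / 6 words = 0.5
```

Swapping word_weight for a real-valued tf-idf score turns this into the first (tf-idf) approach; in either case, content selection extracts the top-scoring sentences.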
