NATURAL LANGUAGE PROCESSING FOR COMMUNICATION

Sections 23.1 to 23.3 (not covering 23.2.1 and 23.2.2)
CS 3243 - NLP for Communication

Last Time
• Introduction to Learning
• Supervised Learning: induction from observations
• Trading model fit for simplicity
• Algorithms
  - KNN
  - Naïve Bayes
  - Decision Trees / Information Gain

Outline
• Formal Grammar
• Parsing: Syntactic Analysis
• Augmented Grammars
• The larger context
  - Communication as Action
  - Semantic Interpretation
  - Ambiguity and Disambiguation
  - Discourse Understanding

Language
• Formal language: a (possibly infinite) set of strings
• Grammar: a finite set of rules that specifies a language
• Rewrite rules
  - Non-terminal symbols are not observed (S, NP, etc.)
  - Terminal symbols are observed (“he”)
  - Convention: uppercase for non-terminals, lowercase for terminals

    S → NP VP
    NP → Pronoun
    Pronoun → “he”

Generative Capacity
Noam Chomsky described four grammatical formalisms:
• Recursively enumerable grammars
  - Unrestricted rules: both sides of the rewrite rules can have any number of terminal and non-terminal symbols; equivalent to full Turing machines
  - Example: ABd → CaE
• Context-sensitive grammars
  - The RHS must contain at least as many symbols as the LHS
  - Example: ASB → AXB
• Context-free grammars (CFG)
  - The LHS is a single non-terminal symbol
  - Example: S → XYa
• Regular grammars
  - The LHS is a single non-terminal; the RHS is a terminal plus an optional non-terminal
  - Examples: X → a, X → aY

Formal Grammar
• The lexicon for ε_0:

    Noun → stench | breeze | glitter | wumpus | pit | pits | gold | …
    Verb → is | see | smell | shoot | stinks | go | grab | turn | …
    Adjective → right | left | east | dead | back | smelly | …
    Adverb → here | there | nearby | ahead | right | left | east | …
    Pronoun → me | you | I | it | …
    Name → John | Mary | Boston | Aristotle | …
    Article → the | a | an | …
    Preposition → to | in | on | near | …
    Conjunction → and | or | but | …
    Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
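To make the idea of rewrite rules concrete, here is a minimal Python sketch that stores a tiny grammar as a dictionary and enumerates every string it generates. The rules for S, NP and Pronoun → “he” come from the Language slide; the VP and Verb rules and the extra pronoun “it” are assumptions borrowed from the lexicon so that S can be fully expanded. The function name `expand` is hypothetical, and the enumeration only terminates because this toy grammar has no recursive rules.

```python
from itertools import product

# Non-terminals are the dictionary keys; anything else is a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"]],
    "Pronoun": [["he"], ["it"]],   # "it" is an assumption taken from the lexicon
    "VP": [["Verb"]],              # assumption: VP rule not shown on the Language slide
    "Verb": [["stinks"]],
}

def expand(symbol):
    """Return every terminal string (as a list of words) derivable from symbol."""
    if symbol not in GRAMMAR:      # terminal symbol: nothing left to rewrite
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives in order.
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):
            results.append([word for words in combo for word in words])
    return results

sentences = [" ".join(words) for words in expand("S")]
print(sentences)  # ['he stinks', 'it stinks']
```

A grammar with a recursive rule such as S → S Conjunction S describes an infinite language, so a real generator would need a depth limit or random sampling instead of exhaustive enumeration.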
Formal Grammar
• The grammar for ε_0:

    Rule                        Example
    S  → NP VP                  I + feel a breeze
       | S Conjunction S        I feel a breeze + and + I smell a wumpus
    NP → Pronoun                I
       | Name                   John
       | Noun                   pits
       | Article Noun           the + wumpus
       | Digit Digit            3 4
       | NP PP                  the wumpus + to the east
       | NP RelClause           the wumpus + that is smelly

  We’ll deal with probabilistic rules later.

Formal Grammar
• Parts of speech
  - Open class: noun, verb, adjective, adverb
  - Closed class: pronoun, article, preposition, conjunction, …
• Shortcomings of our grammar
  - It overgenerates: “Me go Boston”
  - It undergenerates: “I think the wumpus is smelly”

Parse Tree

    [S [NP [Article the] [Noun wumpus]]
       [VP [VP [Verb is]] [Adjective dead]]]

Syntactic Analysis (Parsing)
• Parsing: the process of finding a parse tree for a given input string
• Top-down parsing
  - Start with the S symbol and search for a tree that has the words as its leaves
• Bottom-up parsing
  - Start with the words and search for a tree with root S

Trace of Bottom-up Parsing

    List of nodes               Subsequence     Rule
    the wumpus is dead          the             Article → the
    Article wumpus is dead      wumpus          Noun → wumpus
    Article Noun is dead        Article Noun    NP → Article Noun
    NP is dead                  is              Verb → is
    NP Verb dead                dead            Adjective → dead
    NP Verb Adjective           Verb            VP → Verb
    NP VP Adjective             VP Adjective    VP → VP Adjective
    NP VP                       NP VP           S → NP VP
    S

• Left-to-right processing, one token at a time

Intrasentence ambiguity
(Blank spaces to fill in on this slide.)
    “Have the students of CS 3243 take the exam”
    Artificial Intelligence … taken the exam”

Probabilistic Grammars
• Probabilistic lexicon

    Noun → stench [.05] | breeze [.1] | wumpus [.15] | pits [.05] | …
    Verb → is [.1] | feel [.1] | stinks [.05] | …
    …
    Digit → 0 [.1] | 1 [.1] | 2 [.1] | …

• Probabilistic grammar

    Rule                  Probability   Example
    VP → Verb             .4            stinks
       | VP NP            .35           feel + a breeze
       | VP Adjective     .05           is + smelly
       | VP PP            .1            turn + to the east
       | VP Adverb        .1            go + ahead

  The probabilities of the rules for each category (e.g., VP) sum to one.

CYK Chart Parsing
• Uses dynamic programming to memoize intermediate results, saving them to a chart.
• Bottom-up, iterative processing
• Converts the context-free grammar into a special form, Chomsky Normal Form:
  - X → “a”
  - X → Y Z
• Uses space O(n^2 m) ≅ O(n^3), despite O(2^n) possible parses, where n is the number of words and m is the number of non-terminal symbols.
• Suitable for probabilistic CFGs (PCFGs).

CYK Parsing
[figure]

An Ambiguous Example
(Blank spaces to fill in on this slide.)

    S → NP VP
    S → NP VP PP
    PP → P NP
    NP → A NP
    NP → NP PP
    NP → N
    VP → V PP
    VP → V NP
    VP → V

  Chart for “British left waffles on Falkland Islands”, longest span first:

    British … Islands (6 words)     S, S, S
    left … Islands (5 words)        S, VP
    waffles … Islands (4 words)     NP, VP
    British left waffles            S
    on Falkland Islands             PP
    British left                    S, NP
    left waffles                    S, VP
    Falkland Islands                NP
    Words                           British: N, A   left: N, V   waffles: N, V   on: P   Falkland: A   Islands: N

  See whether you understand what each of the “S”s stands for.

Dealing with Probabilities
(Blank spaces to fill in on this slide.)

    S → NP VP       .3
    S → NP VP PP    .7
    PP → P NP       1
    NP → A NP       .4
    NP → NP PP      .2
    NP → N          .4
    VP → V PP       .4
    VP → V NP       .3
    VP → V          .4

  Chart for “British left waffles on Falkland Islands”, longest span first:

    British … Islands (6 words)     S 3.6864E-4, S 4.608E-4, S 5.37E-3
    left … Islands (5 words)        S .00576, VP .00384
    waffles … Islands (4 words)     VP .048, NP .0128
    British left waffles            S .0144
    on Falkland Islands             PP .16
    British left                    S .048, NP .16
    left waffles                    S .048, VP .12
    Falkland Islands                NP .16
    Words                           British: N, A   left: N, V   waffles: N, V   on: P   Falkland: A   Islands: N

  Two chart entries are marked “?” on the slide as blanks to fill in.
  Note: we ignored the probabilistic lexicon for this example; it should also be used.

Subjective & Objective Cases
• Overgeneration:
  - S → NP VP → NP VP NP → NP Verb NP → Pronoun Verb NP → Pronoun Verb Pronoun
  - “She loves him” (grammatical) vs. “her loves he” (ungrammatical, but generated)
  - “She ran towards him” (grammatical) vs. “She ran towards he” (ungrammatical, but generated)

Handling Subjective & Objective Cases

    S → NP_s VP | …
    NP_s → Pronoun_s | Name | Noun | …
    NP_o → Pronoun_o | Name | Noun | …
    VP → VP NP_o | …
    PP → Preposition NP_o
    Pronoun_s → I | you | he | she | it | …
    Pronoun_o → me | you | him | her | it | …

• Disadvantage: grammar size grows exponentially

Augmented Grammars
• Handling case, agreement, etc.
• Augment grammar rules to allow parameters on non-terminal categories
  - NP(Subjective)
  - NP(Objective)
  - NP(case)

Definite Clause Grammar (DCG)
• The grammar for ε_1:

    S → NP(Subjective) VP | …
    NP(case) → Pronoun(case) | Name | Noun | …
    VP → VP NP(Objective) | …
    PP → Preposition NP(Objective)
    Pronoun(Subjective) → I | you | he | she | it | …
    Pronoun(Objective) → me | you | him | her | it | …

  Definite clauses: where did we hear that before?

Definite Clause Grammar (DCG)
• Each grammar rule is a definite clause in logic:

    Grammar rule                    Definite clause
    S → NP VP                       NP(s1) ∧ VP(s2) ⇒ S(s1 + s2)
    NP(case) → Pronoun(case)        Pronoun(case, s1) ⇒ NP(case, s1)

• Prolog is a suitable language for deterministic NLP
• DCG enables parsing as logical inference:
  - Top-down parsing is backward chaining
  - Bottom-up parsing is forward chaining

Verb Subcategorization

    Verb       Subcats       Example Verb Phrase
    give       NP, PP        give the gold to me
               NP, NP        give me the gold
    smell      NP            smell a wumpus
               Adjective     smell awful
               PP            smell like a wumpus
    is         Adjective     is smelly
               PP            is in 2 2
               NP            is a pit
    died       (none)        died
    believe    S             believe the wumpus is dead

Verb Subcategorization

    S → NP(Subjective) VP( )
    VP(subcat) → Verb(subcat)
               | VP(subcat + NP) NP(Objective)
               | VP(subcat + Adjective) Adjective
               | VP(subcat + PP) PP
    VP(subcat) → VP(subcat) PP
               | VP(subcat) Adverb
    Verb(NP,NP) → give | hand | …

Parsing Using Verb Subcategorization

    [S [NP(Subjective) [Pronoun(Subjective) you]]
       [VP( ) [VP(NP) [VP(NP,NP) [Verb(NP,NP) give]]
                      [NP(Objective) [Pronoun(Objective) me]]]
              [NP(Objective) [Article the] [Noun gold]]]]

The larger context: communication
• Communication
  - Intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs
• Humans use language to communicate most of what is known about the world
• The Turing test is based on language

Communication as Action
• Speech act
  - Language production viewed as an action
  - Speaker, hearer, utterance
• Examples:
  - Query: “Have you smelled the wumpus anywhere?”
  - Inform: “There’s a breeze here in 3 4.”
  - Request: “Please help me carry the gold.” “I could use some help carrying this.”
  - Acknowledge: “OK”
  - Promise: “I’ll shoot the wumpus.”
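Returning to the CYK chart parsing slides above, the following is a minimal Python sketch of a CYK recognizer for “the wumpus is dead”. It is not the algorithm as given in the lecture figure: the grammar is a toy one in Chomsky Normal Form, and folding the slide's unary rule VP → Verb into VP → Verb Adjective is an assumption made here to keep every rule in the X → “a” or X → Y Z shape. The names `LEXICAL`, `BINARY` and `cyk` are hypothetical.

```python
# Rules of the form X → "a", keyed by the word.
LEXICAL = {
    "the": {"Article"},
    "wumpus": {"Noun"},
    "is": {"Verb"},
    "dead": {"Adjective"},
}

# Rules of the form X → Y Z, keyed by (Y, Z).
BINARY = {
    ("Article", "Noun"): {"NP"},
    ("Verb", "Adjective"): {"VP"},   # assumption: replaces VP → Verb, VP → VP Adjective
    ("NP", "VP"): {"S"},
}

def cyk(words):
    """Fill the chart bottom-up; chart[i][j] holds the categories spanning words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1 come straight from the lexical rules.
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(word, ()))
    # Longer spans are built from two shorter, already-filled spans.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= BINARY.get((left, right), set())
    return chart

chart = cyk("the wumpus is dead".split())
print(chart[0][2])         # {'NP'}
print(chart[2][4])         # {'VP'}
print("S" in chart[0][4])  # True
```

The chart has O(n^2) cells, each holding at most m categories, which is where the O(n^2 m) space bound comes from. A PCFG version would store the best probability per category in each cell instead of a plain set.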
Component Steps of Communication
SPEAKER:
• Intention: Know(H, ¬Alive(Wumpus, S_3))
• Generation: “The wumpus is dead”
• Synthesis: thaxwahmpaxsihzdehd

Component Steps of Communication
HEARER:
• Perception: “The wumpus is dead”
• Analysis
  - Parsing:

      [S [NP [Article The] [Noun wumpus]]
         [VP [Verb is] [Adjective dead]]]

  - Semantic interpretation: ¬Alive(Wumpus, Now) or Tired(Wumpus, Now)
  - Pragmatic interpretation: ¬Alive(Wumpus_1, S_3) or Tired(Wumpus_1, S_3)

Component Steps of Communication
HEARER:
• Disambiguation: ¬Alive(Wumpus_1, S_3)
• Incorporation: TELL(KB, ¬Alive(Wumpus_1, S_3))

Semantic Interpretation
• Semantics: the meaning of utterances
• First-order logic as the representation language
• Compositional semantics: the meaning of a phrase is composed of the meanings of its constituent parts

Semantic Interpretation

    Exp(x) → Exp(x_1) Operator(op) Exp(x_2)     {x = Apply(op, x_1, x_2)}
    Exp(x) → ( Exp(x) )
    Exp(x) → Number(x)
    Number(x) → Digit(x)
    Number(x) → Number(x_1) Digit(x_2)          {x = 10 × x_1 + x_2}
    Digit(x) → x                                {0 ≤ x ≤ 9}
    Operator(x) → x                             {x ∈ {+, -, ×, ÷}}

Semantic Interpretation
[figure]

Semantic Interpretation
• “John loves Mary” means Loves(John, Mary)

    (λy λx Loves(x, y)) (Mary) ≡ λx Loves(x, Mary)
    (λx Loves(x, Mary)) (John) ≡ Loves(John, Mary)

    S(rel(obj)) → NP(obj) VP(rel)
    VP(rel(obj)) → Verb(rel) NP(obj)
    NP(obj) → Name(obj)
    Name(John) → John
    Name(Mary) → Mary
    Verb(λy λx Loves(x, y)) → loves

Semantic Interpretation

    [S(Loves(John, Mary))
       [NP(John) [Name(John) John]]
       [VP(λx Loves(x, Mary))
          [Verb(λy λx Loves(x, y)) loves]
          [NP(Mary) [Name(Mary) Mary]]]]

Pragmatic Interpretation
• Adding context-dependent information about the current situation to each candidate semantic interpretation
• Indexicals: phrases that refer directly to the current situation
  - “I am in Boston today” (“I” refers to the speaker and “today” refers to now)

Language Generation
The same DCG can be used for parsing and generation.
• Parsing:
  - Given: S(sem, [John, loves, Mary])
  - Return: sem = Loves(John, Mary)
• Generation:
  - Given: S(Loves(John, Mary), words)
  - Return: words = [John, loves, Mary]

Ambiguity
• Lexical ambiguity
  - “the back of the room” vs. “back up your files”
  - “In the interest of stimulating the economy, the government lowered the interest rate.”
• Syntactic ambiguity (structural ambiguity)
  - “I smelled a wumpus in 2,2”
• Semantic ambiguity
  - “the IBM lecture”
• Pragmatic ambiguity
  - “I’ll meet you next Friday”

Metonymy
Denotes a concept by naming some other concept closely related to it.
• Examples:
  - Company for company’s spokesperson (“IBM announced a new model”)
  - Author for author’s works (“I read Shakespeare”)
  - Producer for producer’s product (“I drive a Honda”)

Metonymy
• Representation of “IBM announced”
[figure]

Metaphor
Refers to concepts using words whose meanings are appropriate to other, completely different kinds of concepts.
• Example: the corporation-as-person metaphor
  - Speak of a corporation as if it were a person that can experience emotions, has a mind, etc.
  - “That doesn’t scare Digital, which has grown to be the world’s second-largest computer maker.”
  - “But if the company changed its mind, however, it would do so for investment reasons, the filing said.”

Disambiguation
• Disambiguation is like diagnosis
• The speaker’s intent to communicate is an unobserved cause of the words in the utterance
• The hearer’s job is to work backwards from the words and from knowledge of the situation to recover the most likely intent of the speaker

Discourse Understanding
• Discourse: multiple sentences
• Reference resolution: the interpretation of a pronoun or a definite noun phrase that refers to an object in the world
  - “John flagged down the waiter. He ordered a ham sandwich.”
    “He” refers to “John”
  - “After John proposed to Mary, they found a preacher and got married. For the honeymoon, they went to Hawaii.”
    “they”? “the honeymoon”?

Discourse Understanding
• Structure of coherent discourse: sentences are joined by coherence relations
• Examples of coherence relations between S1 and S2:
  - Enable or cause: S1 brings about a change of state that causes or enables S2
    “I went outside. I drove to school.”
  - Explanation: the reverse of enablement; S2 causes or enables S1 and is an explanation for S1
    “I was late for school. I overslept.”
  - Exemplification: S2 is an example of the general principle in S1
    “This algorithm reverses a list. The input A,B,C is mapped to C,B,A.”
  - Etc.

Summary
• NLP is full of ambiguity
• Natural languages are a testament to human intelligence and creativity
• NLP largely processes one utterance at a time
• Deterministic methods can follow context-free grammars (DCGs, hence Prolog)
• PCFGs add a probabilistic interpretation
• Both are parsable using dynamic programming (the CYK algorithm)
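The compositional semantics of “John loves Mary” from the Semantic Interpretation slides can be sketched with ordinary Python closures. This is only an illustration under stated assumptions: logical forms are represented as plain strings, only sentences of the shape Name Verb Name are handled, and the names `LEXICON` and `interpret` are hypothetical.

```python
# The verb's meaning is curried, mirroring Verb(λy λx Loves(x, y)) → loves.
# Names simply denote themselves, as in Name(John) → John.
LEXICON = {
    "John": "John",
    "Mary": "Mary",
    "loves": lambda y: lambda x: f"Loves({x}, {y})",
}

def interpret(sentence):
    """Compose the meaning of a 'Name Verb Name' sentence.

    Follows VP(rel(obj)) → Verb(rel) NP(obj), then S(rel(obj)) → NP(obj) VP(rel).
    """
    subject, verb, obj = sentence.split()
    vp = LEXICON[verb](LEXICON[obj])   # apply the verb to the object: λx Loves(x, Mary)
    return vp(LEXICON[subject])        # apply the VP to the subject: Loves(John, Mary)

print(interpret("John loves Mary"))  # Loves(John, Mary)
```

The order of application matters: the verb consumes its object first and its subject second, which is why swapping the two names yields Loves(Mary, John).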