NATURAL LANGUAGE PROCESSING FOR COMMUNICATION

Sections 23.1 – 23.3 (not covering 23.2.12)
CS 3243 NLP for Communication
Please set your mobile devices to silent.

Last Time
• Introduction to Learning
• Supervised Learning
  – Induction from observations
  – Trading model fit for simplicity
• Algorithms
  – KNN
  – Naïve Bayes
  – Decision Trees / Information Gain

Outline
• Formal Grammar
• Parsing: Syntactic Analysis
• Augmented Grammars
• The larger context
  – Communication as Action
  – Semantic Interpretation
  – Ambiguity and Disambiguation
  – Discourse Understanding

Language
• Formal language: a (possibly infinite) set of strings
• Grammar: a finite set of rules that specifies a language
• Rewrite rules
  – Nonterminal symbols are not observed (S, NP, etc.)
  – Terminal symbols are observed (“he”)
  – Convention: uppercase for nonterminals, lowercase for terminals
      S → NP VP
      NP → Pronoun
      Pronoun → “he”

Generative Capacity
Noam Chomsky described four grammatical formalisms, listed here in decreasing order of generative power:
• Recursively enumerable grammars
  – Unrestricted rules: both sides of a rewrite rule can have any number of terminal and nonterminal symbols; as powerful as full Turing machines
  – Example: A B d → C a E
• Context-sensitive grammars
  – The RHS must contain at least as many symbols as the LHS
  – Example: A S B → A X B
• Context-free grammars (CFG)
  – The LHS is a single nonterminal symbol
  – Example: S → X Y a
• Regular grammars
  – The LHS is a single nonterminal; the RHS is a terminal, optionally followed by a nonterminal
  – Examples: X → a, X → a Y

Formal Grammar
The lexicon for ε0:
  Noun → stench | breeze | glitter | wumpus | pit | pits | gold | …
  Verb → is | see | smell | shoot | stinks | go | grab | turn | …
  Adjective → right | left | east | dead | back | smelly | …
  Adverb → here | there | nearby | ahead | right | left | east | …
  Pronoun → me | you | I | it | …
  Name → John | Mary | Boston | Aristotle | …
  Article → the | a | an | …
  Preposition → to | in | on | near | …
  Conjunction → and | or | but | …
  Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
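Rewrite rules can be applied mechanically to list every string a grammar generates. The Python sketch below is an illustration added here, not code from the course. It encodes S → NP VP, NP → Pronoun | Article Noun and Pronoun → “he” from these slides, plus a few ε0 lexicon entries, and it assumes VP → Verb | Verb Adjective as a non-recursive stand-in for VP → VP Adjective so that the enumeration terminates. The names GRAMMAR and generate are hypothetical.

```python
from itertools import product

# Hypothetical toy grammar: keys are nonterminals, each mapped to its list of
# right-hand sides; any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Article", "Noun"]],
    "VP": [["Verb"], ["Verb", "Adjective"]],  # assumed, non-recursive stand-in for VP -> VP Adjective
    "Pronoun": [["he"]],
    "Article": [["the"]],
    "Noun": [["wumpus"], ["breeze"]],
    "Verb": [["is"], ["stinks"]],
    "Adjective": [["dead"], ["smelly"]],
}


def generate(symbol):
    """Return every word sequence derivable from symbol.

    Only works for non-recursive grammars; a recursive rule would recurse
    until Python raises RecursionError.
    """
    if symbol not in GRAMMAR:  # terminal: derives just itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then join every combination.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in generate("S")]
print(len(sentences))  # 18
print(sentences[:3])   # ['he is', 'he stinks', 'he is dead']
```

Strings such as “he is” and “the breeze stinks smelly” are generated too, a small instance of the overgeneration problem that even a tiny grammar runs into.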
Formal Grammar
The grammar for ε0, with an example of each rule (“+” separates the constituents):
  S → NP VP                   I + feel a breeze
    | S Conjunction S         I feel a breeze + and + I smell a wumpus
  NP → Pronoun                I
     | Name                   John
     | Noun                   pits
     | Article Noun           the + wumpus
     | Digit Digit            3 4
     | NP PP                  the wumpus + to the east
     | NP RelClause           the wumpus + that is smelly
We’ll deal with probabilistic rules later.

Formal Grammar
• Parts of speech
  – Open class: noun, verb, adjective, adverb
  – Closed class: pronoun, article, preposition, conjunction, …
• Shortcomings of our grammar
  – Overgenerates: “Me go Boston”
  – Undergenerates: “I think the wumpus is smelly”

Parse Tree
  S
    NP
      Article: the
      Noun: wumpus
    VP
      VP
        Verb: is
      Adjective: dead

Syntactic Analysis (Parsing)
• Parsing: the process of finding a parse tree for a given input string
• Top-down parsing
  – Start with the S symbol and search for a tree that has the words as its leaves
• Bottom-up parsing
  – Start with the words and search for a tree with root S

Trace of Bottom-up Parsing
  List of nodes            Subsequence     Rule
  the wumpus is dead       the             Article → the
  Article wumpus is dead   wumpus          Noun → wumpus
  Article Noun is dead     Article Noun    NP → Article Noun
  NP is dead               is              Verb → is
  NP Verb dead             dead            Adjective → dead
  NP Verb Adjective        Verb            VP → Verb
  NP VP Adjective          VP Adjective    VP → VP Adjective
  NP VP                    NP VP           S → NP VP
  S
• Tokens are processed left to right.

Intra-sentence Ambiguity (blank spaces to fill in on this slide)
• “Have the students of CS 3243 take the exam”
• “Have the students of CS 3243 Artificial Intelligence … taken the exam”

Probabilistic Grammars
• Probabilistic lexicon
    Noun → stench [.05] | breeze [.1] | wumpus [.15] | pits [.05] | …
    Verb → is [.1] | feel [.1] | stinks [.05] | …
    …
    Digit → 0 [.1] | 1 [.1] | 2 [.1] | …
• Probabilistic grammar
    VP → Verb           [.4]    stinks
       | VP NP          [.35]   feel + a breeze
       | VP Adjective   [.05]   is + smelly
       | VP PP          [.1]    turn + to the east
       | VP Adverb      [.1]    go + ahead
• The rules for each category (e.g., VP) have probabilities that sum to one.

CYK Chart Parsing
• Uses dynamic programming to memoize intermediate results, saving them to a chart
• Bottom-up, iterative processing
• Requires the context-free grammar to be converted into a special form, Chomsky Normal Form (CNF), in which every rule is either
    X → “a”   or   X → Y Z
• Uses space O(n²m) and time O(n³m) for n words and m nonterminal symbols, even though there can be O(2ⁿ) possible parses
• Suitable for probabilistic CFGs (PCFGs)

CYK Parsing
[Figure: CYK parsing example]

An Ambiguous Example (blank spaces to fill in on this slide)
  S → NP VP          NP → A NP          VP → V PP
  S → NP VP PP       NP → NP PP         VP → V NP
  PP → P NP          NP → N             VP → V
  Sentence and candidate word categories:
    British   left   waffles   on   Falkland   Islands
    N, A      N, V   N, V      P    A          N
  [Figure: CYK chart over this sentence]
• See whether you understand what each of the “S”s stands for.

Dealing with Probabilities (blank spaces to fill in on this slide)
  S → NP VP        .3       NP → A NP        .4       VP → V PP        .4
  S → NP VP PP     .7       NP → NP PP       .2       VP → V NP        .3
  PP → P NP        1        NP → N           .4       VP → V           .4
  [Figure: probabilistic CYK chart for “British left waffles on Falkland Islands”]
• We ignored the probabilistic lexicon for this example; it should also be used.

Subjective and Objective Cases
• Overgeneration:
    S ⇒ NP VP ⇒ NP VP NP ⇒ NP Verb NP ⇒ Pronoun Verb NP ⇒ Pronoun Verb Pronoun
  – “She loves him” is derived, but so is “Her loves he”
  – “She ran towards him” is derived, but so is “She ran towards he”

Handling Subjective and Objective Cases
  S → NPₛ VP | …
  NPₛ → Pronounₛ | Name | Noun | …
  NPₒ → Pronounₒ | Name | Noun | …
  VP → VP NPₒ | …
  PP → Preposition NPₒ
  Pronounₛ → I | you | he | she | it | …
  Pronounₒ → me | you | him | her | it | …
• Disadvantage: the grammar size grows exponentially as more such distinctions are added

Augmented Grammars
• Handle case, agreement, etc.
• Augment grammar rules to allow parameters on nonterminal categories
  – NP(Subjective)
  – NP(Objective)
  – NP(case)

Definite Clause Grammar (DCG)
• The grammar for ε1:
    S → NP(Subjective) VP | …
    NP(case) → Pronoun(case) | Name | Noun | …
    VP → VP NP(Objective) | …
    PP → Preposition NP(Objective)
    Pronoun(Subjective) → I | you | he | she | it | …
    Pronoun(Objective) → me | you | him | her | it | …
• Definite clauses: where did we hear that before?

Definite Clause Grammar (DCG)
• Each grammar rule is a definite clause in logic (s1 + s2 denotes string concatenation):
  – S → NP VP becomes NP(s1) ∧ VP(s2) ⇒ S(s1 + s2)
  – NP(case) → Pronoun(case) becomes Pronoun(case, s1) ⇒ NP(case, s1)
• DCG enables parsing as logical inference:
  – Top-down parsing is backward chaining
  – Bottom-up parsing is forward chaining
• Prolog is a suitable language for deterministic NLP

Verb Subcategorization
  Verb      Subcats        Example verb phrase
  give      [NP, PP]       give the gold to me
            [NP, NP]       give me the gold
  smell     [NP]           smell a wumpus
            [Adjective]    smell awful
            [PP]           smell like a wumpus
  is        [Adjective]    is smelly
            [PP]           is in 2,2
            [NP]           is a pit
  died      [ ]            died
  believe   [S]            believe the wumpus is dead

Verb Subcategorization
  S → NP(Subjective) VP([ ])
  VP(subcat) → Verb(subcat)
             | VP(subcat + [NP]) NP(Objective)
             | VP(subcat + [Adjective]) Adjective
             | VP(subcat + [PP]) PP
  VP(subcat) → VP(subcat) PP
             | VP(subcat) Adverb
  Verb([NP, NP]) → give | hand | …

Parsing Using Verb Subcategorization
  S
    NP(Subjective)
      Pronoun(Subjective): you
    VP([ ])
      VP([NP])
        VP([NP, NP])
          Verb([NP, NP]): give
        NP(Objective)
          Pronoun(Objective): me
      NP(Objective)
        Article: the
        Noun: gold

The Larger Context: Communication
• Communication: the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs
• Humans use language to communicate most of what is known about the world
• The Turing test is based on language

Communication as Action
• Speech act
  – Language production viewed as an action
  – Speaker, hearer, utterance
• Examples:
  – Query: “Have you smelled the wumpus anywhere?”
  – Inform: “There’s a breeze here in 3,4.”
  – Request: “Please help me carry the gold.” / “I could use some help carrying this.”
  – Acknowledge: “OK”
  – Promise: “I’ll shoot the wumpus.”

Component Steps of Communication
SPEAKER:
• Intention: Know(H, ¬Alive(Wumpus, S₃))
• Generation: “The wumpus is dead”
• Synthesis: thaxwahmpaxsihzdehd

Component Steps of Communication
HEARER:
• Perception: “The wumpus is dead”
• Analysis
  – Parsing: [S [NP [Article The] [Noun wumpus]] [VP [Verb is] [Adjective dead]]]
  – Semantic interpretation (two candidate readings): ¬Alive(Wumpus, Now), Tired(Wumpus, Now)
  – Pragmatic interpretation: ¬Alive(Wumpus₁, S₃), Tired(Wumpus₁, S₃)

Component Steps of Communication
HEARER (continued):
• Disambiguation: ¬Alive(Wumpus₁, S₃)
• Incorporation: TELL(KB, ¬Alive(Wumpus₁, S₃))

Semantic Interpretation
• Semantics: the meaning of utterances
• First-order logic as the representation language
• Compositional semantics: the meaning of a phrase is composed of the meanings of its constituent parts
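Compositional semantics can be tried out on the arithmetic grammar that follows, where each rule carries an equation computing the parent's value from its children's values. The Python sketch below is an illustration, not the method used in the course: it hand-writes a recursive-descent parser for that grammar and computes each constituent's meaning from its parts while parsing. Because Exp → Exp Operator Exp is ambiguous, the sketch assumes left-to-right grouping with no operator precedence (rewritten as a loop to avoid left recursion), and it uses Fraction so that ÷ stays exact; both are assumptions. The names APPLY and evaluate are hypothetical.

```python
import operator
from fractions import Fraction

# Semantics of Operator(x) -> x: each operator symbol denotes a function.
APPLY = {
    "+": operator.add,
    "-": operator.sub,
    "−": operator.sub,  # Unicode minus sign
    "×": operator.mul,
    "÷": operator.truediv,
}


def evaluate(text):
    """Parse an arithmetic expression and return its meaning (its value)."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def number():
        # Number(x) -> Digit(x) | Number(x1) Digit(x2), with x = 10 * x1 + x2
        nonlocal pos
        if peek() is None or peek() not in "0123456789":
            raise SyntaxError(f"digit expected at position {pos}")
        x = 0
        while peek() is not None and peek() in "0123456789":
            x = 10 * x + int(peek())  # Digit(x) -> x, 0 <= x <= 9
            pos += 1
        return Fraction(x)

    def primary():
        # Exp(x) -> ( Exp(x) ) | Number(x)
        nonlocal pos
        if peek() == "(":
            pos += 1
            x = exp()
            if peek() != ")":
                raise SyntaxError(f"')' expected at position {pos}")
            pos += 1
            return x
        return number()

    def exp():
        # Exp(x) -> Exp(x1) Operator(op) Exp(x2), with x = Apply(op, x1, x2).
        # Assumption: the ambiguity is resolved by grouping left to right.
        nonlocal pos
        x = primary()
        while peek() in APPLY:
            op = APPLY[peek()]
            pos += 1
            x = op(x, primary())
        return x

    value = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return value


print(evaluate("3 + (4 ÷ 2)"))  # 5
print(evaluate("2 + 3 × 4"))    # 20, grouped as (2 + 3) × 4
```

Left-to-right grouping is only one way to resolve the ambiguity; a grammar that encoded the usual precedence would give 14 for the last example.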
Semantic Interpretation
  Exp(x) → Exp(x₁) Operator(op) Exp(x₂)      x = Apply(op, x₁, x₂)
  Exp(x) → ( Exp(x) )
  Exp(x) → Number(x)
  Number(x) → Digit(x)
  Number(x) → Number(x₁) Digit(x₂)           x = 10 × x₁ + x₂
  Digit(x) → x                               0 ≤ x ≤ 9
  Operator(x) → x                            x ∈ {+, −, ×, ÷}

Semantic Interpretation
[Figure: semantic interpretation example]

Semantic Interpretation
• “John loves Mary” means Loves(John, Mary)
    (λy λx Loves(x, y))(Mary) ≡ λx Loves(x, Mary)
    (λx Loves(x, Mary))(John) ≡ Loves(John, Mary)
• Grammar with semantics:
    S(rel(obj)) → NP(obj) VP(rel)
    VP(rel(obj)) → Verb(rel) NP(obj)
    NP(obj) → Name(obj)
    Name(John) → John
    Name(Mary) → Mary
    Verb(λy λx Loves(x, y)) → loves

Semantic Interpretation
  S(Loves(John, Mary))
    NP(John)
      Name(John): John
    VP(λx Loves(x, Mary))
      Verb(λy λx Loves(x, y)): loves
      NP(Mary)
        Name(Mary): Mary

Pragmatic Interpretation
• Adding context-dependent information about the current situation to each candidate semantic interpretation
• Indexicals: phrases that refer directly to the current situation
  – “I am in Boston today” (“I” refers to the speaker and “today” refers to now)

Language Generation
• The same DCG can be used for parsing and generation
• Parsing:
  – Given: S(sem, [John, loves, Mary])
  – Return: sem = Loves(John, Mary)
• Generation:
  – Given: S(Loves(John, Mary), words)
  – Return: words = [John, loves, Mary]

Ambiguity
• Lexical ambiguity
  – “the back of the room” vs. “back up your files”
  – “In the interest of stimulating the economy, the government lowered the interest rate.”
• Syntactic (structural) ambiguity
  – “I smelled a wumpus in 2,2”
• Semantic ambiguity
  – “the IBM lecture”
• Pragmatic ambiguity
  – “I’ll meet you next Friday”

Metonymy
Denotes a concept by naming some other concept closely related to it.
• Examples:
  – Company for the company’s spokesperson (“IBM announced a new model”)
  – Author for the author’s works (“I read Shakespeare”)
  – Producer for the producer’s product (“I drive a Honda”)

Metonymy
[Figure: representation of “IBM announced”]

Metaphor
Refers to concepts using words whose meanings are appropriate to other, completely different kinds of concepts.
• Example: the corporation-as-person metaphor
  – Speak of a corporation as if it were a person that can experience emotions, has a mind, etc.
  – “That doesn’t scare Digital, which has grown to be the world’s second-largest computer maker.”
  – “But if the company changed its mind, however, it would do so for investment reasons, the filing said.”

Disambiguation
• Disambiguation is like diagnosis
• The speaker’s intent to communicate is an unobserved cause of the words in the utterance
• The hearer’s job is to work backwards from the words and from knowledge of the situation to recover the most likely intent of the speaker

Discourse Understanding
• Discourse: multiple sentences
• Reference resolution: the interpretation of a pronoun or a definite noun phrase that refers to an object in the world
  – “John flagged down the waiter. He ordered a ham sandwich.”
    “He” refers to “John”
  – “After John proposed to Mary, they found a preacher and got married. For the honeymoon, they went to Hawaii.”
    To resolve: “they”, “the honeymoon”

Discourse Understanding
• Structure of coherent discourse: sentences are joined by coherence relations
• Examples of coherence relations between S1 and S2:
  – Enable or cause: S1 brings about a change of state that causes or enables S2
      “I went outside. I drove to school.”
  – Explanation: the reverse of enablement; S2 causes or enables S1 and is an explanation for S1
      “I was late for school. I overslept.”
  – Exemplification: S2 is an example of the general principle in S1
      “This algorithm reverses a list. The input [A, B, C] is mapped to [C, B, A].”
  – Etc.

Summary
• NLP is full of ambiguity
• Natural languages are a testament to human intelligence and creativity
• NLP largely processes one utterance at a time
• Deterministic methods can follow context-free grammars (DCGs, hence Prolog)
• PCFGs add a probabilistic interpretation
• Both can be parsed using dynamic programming (the CYK algorithm)
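The last point of the summary, that PCFGs can be parsed bottom-up with dynamic programming, can be illustrated with a compact probabilistic (Viterbi-style) CYK parser. This is a Python sketch under stated assumptions, not the course's reference implementation: the grammar is a small hypothetical PCFG already in Chomsky Normal Form (the lecture's “British left waffles” grammar would first need converting, since it has the ternary rule S → NP VP PP and unary rules such as NP → N), and the sentence is a made-up PP-attachment example in the spirit of “I smelled a wumpus in 2,2”. The chart maps each span (i, j) to the most probable analysis for every category, with a backpointer for rebuilding the tree. BINARY, LEXICAL, cyk and tree are hypothetical names.

```python
from collections import defaultdict

# Hypothetical PCFG in Chomsky Normal Form (not the lecture's grammar).
# Binary rules A -> B C with their probabilities.
BINARY = [
    ("S", "NP", "VP", 1.0),
    ("VP", "Verb", "NP", 0.6),
    ("VP", "VP", "PP", 0.4),
    ("NP", "Article", "Noun", 0.5),
    ("NP", "NP", "PP", 0.2),
    ("PP", "Prep", "NP", 1.0),
]
# Lexical rules A -> "word"; NP -> "she" avoids a unary NP -> Pronoun rule.
LEXICAL = {
    "she": [("NP", 0.3)],
    "saw": [("Verb", 1.0)],
    "the": [("Article", 1.0)],
    "wumpus": [("Noun", 0.5)],
    "pit": [("Noun", 0.5)],
    "in": [("Prep", 1.0)],
}


def cyk(words):
    """Probabilistic CYK: chart[(i, j)][A] = (best probability, backpointer)
    for category A spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(dict)
    for i, word in enumerate(words):  # spans of length 1: lexical rules
        for cat, p in LEXICAL.get(word, []):
            chart[(i, i + 1)][cat] = (p, word)
    for length in range(2, n + 1):  # shorter spans are always filled first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point: words[i:k] + words[k:j]
                left, right = chart[(i, k)], chart[(k, j)]
                for a, b, c, p in BINARY:
                    if b in left and c in right:
                        prob = p * left[b][0] * right[c][0]
                        if prob > chart[(i, j)].get(a, (0.0, None))[0]:
                            chart[(i, j)][a] = (prob, (k, b, c))
    return chart


def tree(chart, cat, i, j):
    """Rebuild the most probable tree for cat over words[i:j] from backpointers."""
    _, back = chart[(i, j)][cat]
    if isinstance(back, str):  # leaf: the backpointer is the word itself
        return (cat, back)
    k, b, c = back
    return (cat, tree(chart, b, i, k), tree(chart, c, k, j))


words = "she saw the wumpus in the pit".split()
chart = cyk(words)
print(chart[(0, len(words))]["S"][0])  # about 0.0045
print(tree(chart, "S", 0, len(words)))
```

With these made-up probabilities the VP-attachment reading (0.0045) beats the NP-attachment reading (0.3 × 0.6 × 0.2 × 0.25 × 0.25 = 0.00225), so each chart cell keeps only the former; a parser that had to return every parse would store all analyses per cell instead of the single best one.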
Document Information
Category: Presentations
User Name: Dr.BenjaminClark
User Type: Teacher
Country: United States
Uploaded Date: 21-07-2017