Question? Leave a message!




Top-Down Parsing

Top-Down Parsing
MIT MIT 6 6 .035 035 TopDown Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Orientation • Language specification • Lexical structure – regular expressions • Syntactic structure – grammar • • This This Lecture Lecture recursive recursive d descent escent parsers parsers • Code parser as set of mutually recursive procedures • Structure Structure o of f p program rogram matches matches s structure tructure of of grammar grammar Starting Point • Assume lexical analysis has produced a sequence of of tokens tokens • Each token has a type and value • • Types Types c correspond orrespond t to o t terminals erminals • Values to contents of token read in • • Examples Examples • Int 549 – integer token with value 549 read in • if if if if keyword, keyword, no no need need for for a a value value • AddOp + add operator, value + Basic Approach • Start with Start symbol • B B uild ild a ll eft f tmost t d d erii vati tion • If leftmost symbol is nonterminal, choose a production production and and apply apply it it • If leftmost symbol is terminal, match against inp put • If all terminals match, have found a parse •Key y: find correct p productions for nonterminals Graphical Illustration of Leftmost Derivation Derivation Sentential Form NT T T T NT NT 1 1 2 3 2 3 Apply Production Not Here Her Her e e Grammar for Parsing g Examp p le Start→ Expr • Set Set o of f t tok okens ens iis s Expr→ Expr + Term +, , , /, Int , where Expr→ Expr Term Int = 0909 Expr Expr→→ Te Ter rm m • • F Fo or r c con onv ve enience nience , ma may y r repr epre esent sent Term→ Term Int each Int n token by n Term→ Term / Int Term → Int Parsing Example Parse Remaining Input Tr Tre ee e Start Start 222 Sentential Form Start Start Current Position in Parse Tree Parsing Example Parse Remaining Input Tr Tre ee e Start Start 222 Expr p Sentential Form Expr Expr App pplied Production Start→ Expr Current Position in Parse Tree Parsing Example Parse Remaining Input Tr Tre ee e Start Start 222 Expr p Sentential Form Expr Term Expr Expr Te Ter rm m Expr→ Expr + Term App pplied Production Expr→ Expr Term Expr→ Expr Term Expr→ TermParsing Example Parse Remaining Input Tr Tre ee e Start Start 222 Expr p Sentential Form Expr Term Te Ter rm m Te Ter rm m Term App pplied Production Expr p →→ Expr p + Te erm Expr→ Expr Term Expr→ Term Expr→ Term Expr→ Term Parsing Example Parse Remaining Input Tr Tre ee e Start Start 222 Expr p Sentential Form Expr Term Int Int Te Ter rm m Term App pplied Production Int Term→ IntParsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 222 Input Token Expr p Sentential Form Expr Term 2 2 Te Ter rm m Term Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 22 Input Token Expr p Sentential Form Expr Term 2 2 Te Ter rm m Term Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 22 Input Token Expr p Sentential Form Expr Term 2 2 Te Ter rm m Term Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Start Start 22 Expr p Sentential Form Expr Term 2 2 Te Ter rm mInt Int Term Term Int App pplied Production Int 2 Term→ Term IntParsing Example Parse Remaining Input Tr Tre ee e Start Start 22 Expr p Sentential Form Expr Term 2 2 Int Int Int Int Term Term Int App pplied Production Int 2 Term→ Int Int Parsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 22 Input Token Expr p Sentential Form Expr Term 2 2 2 2 Int Int Term Term Int Int 2 Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 2 Input Token Expr p Sentential Form Expr Term 2 2 2 2 Int Int Term Term Int Int 2 Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Match Match Start Start 2 Input Token Expr p Sentential Form Expr Term 2 2 2 2 Int Int Term Term Int Int 2 Int 2 Parsing Example Parse Remaining Input Tr Tre ee e Pa Par rs se e Start Start 2 Complete Expr p Sentential Form Expr Term 2 2 2 2 2 2 Term Term Int 2 Int 2 Int 2 Summary • Three Actions (Mechanisms) • AAl pply prodd ucti ti on t to expandd current t nonterminal in parse tree • • Match Match c current urrent terminal terminal (consuming (consuming input) input) • Accept the parse as correct • • Parser Parser generates generates p preorder reorder t traversal raversal of of parse parse t tree ree • visit parents before children • visit visit s siblings iblings f from rom lleft eft t to o r right ightPolicy Problem • Which production to use for each nonterminal • Classical Sep paration of Policy y and Mechanism • One Approach: Backtracking • Treat it as a search problem • At each choice point, try next alternative • If it is clear that current try fails, go back to previous previous choice choice and and try try something something d different ifferent • General technique for searching • Used a lot in classical AI and natural langg guage processing (parsing, speech recognition) Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Sentential Form Start Start Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr Expr App pplied Production Start → Expr Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr + Term Expr Expr + Te Ter rm m App pplied Production Expr → Expr + Term Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr + Term Te Ter rm m + Te Ter rm m Term App pplied Production Expr → Term Backtracking g Examp p le Parse Remaining Input Match Match Tr Tre ee e Start Start 222 Input Exp pr T Token oken Sentential Form Expr + Term Int Int + Te Ter rm m Term App pplied Production Int Term → IntBacktracking g Examp p le Parse Remaining Input Can Can ’t t Tr Tre ee e Start Start 22 Match Exp pr Input Input Sentential Form Token Expr + Term 2 2 Te Ter rm m Term App pplied Production Int 2 Term → IntBacktracking g Examp p le Parse Remaining Input So So Tr Tre ee e Start Start 222 Backtrack Exp pr Sentential Form Expr Expr App pplied Production Start → Expr Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr Term Expr Expr Te Ter rm m App pplied Production Expr → Expr Term Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr Term Te Ter rm m Te Ter rm m Term App pplied Production Expr → Term Backtracking g Examp p le Parse Remaining Input Tr Tre ee e Start Start 222 Exp pr Sentential Form Expr Term Int Int Te Ter rm m Term App pplied Production Int Term → IntBacktracking g Examp p le Parse Remaining Input Match Match Tr Tre ee e Start Start 22 Input Exp pr T Token oken Sentential Form Expr Term 2 2 Te Ter rm m Term Int 2 Backtracking g Examp p le Parse Remaining Input Match Match Tr Tre ee e Start Start 22 Input Exp pr T Token oken Sentential Form Expr Term 2 2 Te Ter rm m Term Int 2 Left Recursion + TopDown Parsing = Infinite Infinite Loop Loop • Example Production: Term→ TermNum • PPt otent tii al l parsiing t teps: s Term Term Term Num Num Te Ter rm m Num Num Te Ter rm m Term Num General Search Issues • Three components • Search space (parse trees) • • Search Search algorithm algorithm (parsing (parsing algorithm) algorithm) • Goal to find (parse tree for input program) • Would like to (but can’t always) ensure that • Find goal (hopefully quickly) if it exists • Search terminates if it does not • Handled Handled in in various various ways ways in in various various contexts contexts • Finite search space makes it easy • Exploration strategies for infinite search space • Sometimes Sometimes one one goal goal more more important important (model (model checking) checking) • For parsing, hack grammar to remove left recursion Eliminating Left Recursion • Start with productions of form •A →A α • A →β • α, β sequences of terminals and nonterminals that do do not not start start with with A A • Repeated application of A →A α A builds builds parse parse tree tree like like t this: his: α A A α ββ αEliminating Left Recursion • Replacement productions –A →A α A →β R R is a new nonterminal –A →β R →α R – R →ε New New Parse Parse T Tr ree ee A Old Parse Tree A A R R β R αα A A αα R α β α ε Hacked Grammar Original Grammar New Grammar Fragment Fr Fragment agment Te Ter rm m→→ In Int t Te Ter rm m ’ ’ Term→ Term Int Term’→ Int Term’ Term→ Term / Int Term’→ / / Int Term’ Term→ Int Term’→ ε Parse Tree Comp parisons Original Original Gr Grammar ammar New New G Gr ra ammar mmar Term Te Ter rm m Int Term’ Int Term Term’ Int Int Int Term’ Int εε Eliminating Left Recursion • Changes search space exploration algorithm • E Elili mii nat tes didi rect t iif nfiii nit te recursiion • But grammar less intuitive • • Sets Sets things things up up for for predictive predictive p parsing arsingPredictive Parsing • Alternative to backtracking • UUf seful l ff or programmi ing ll anguages, whihi chh can bbe designed to make parsing easier • • Basic Basic iidea dea • Look ahead in input stream • • Decide Decide which which p production roduction t to o a apply pply based based on on next tokens in input stream • We will use one token of lookahead Predictive Parsing Example Grammar Start Start →→ Expr Expr Te Ter rm m →→ In In t t Te Te r rm m’ Expr → Term Expr’ Term’ → In t Term’ Expr’ → + Expr’ Term’ → / In t Term’ Expr’ → Expr’ Term’ → ε Expr’ → ε Choice Points • Assume Term’ is current position in parse tree • • Have Have three three possible possible productions productions to to apply apply Term’→ Int Term’ Te erm’→→ / / Int t Te erm’ Term’ →ε • Use next token to decide • If next token is , apply Term’→ Int Term’ • If next token is /, apply Term’→ / Int Term’ • • Otherwise Otherwise, a apply pply Term Term ’→→εε Predictive Parsing + Hand Coding = Recursive Recursive Descent Descent Parser Parser • One procedure per nonterminal NT • Pd Productitions NT NT→ ββ , …, NT NT→ ββ 1 n • Procedure examines the current input symbol T to determine which p production to app pply y •If T∈First(β ) k • Apply production k • C Consume t termiinalls iin ββ ((h check k ffor correct t k terminal) • Recursively call procedures for nonterminals in ββ k k • Current input symbol stored in global variable token • Procedures return • true true if if parse parse succeeds succeeds • false if parse fails Example Boolean Term() if (token = Int n) token = NextToken(); return(TermPrime()) else return(false) Boolean Boolean T TermPrime() ermPrime() if (token = ) token = NextToken(); if if (t (tok ken = I Int t n) ) t tok ken = N Next tT Tok ken() (); ret turn( (T TermP Priime()) ()) else return(false) else if (token = /) token = NextToken(); if (token = Int n) token = NextToken(); return(TermPrime()) else else return(false) return(false) else return(true) Term→ Int Term’ Term’→ Int Term’ Term’→ / Int Term’ Term’ →ε Multiple Productions With Same Prefix Prefix in in RHS RHS • Example Grammar NT NT → if if then then NT→ if then else • • Assume Assume NT NT is is current current position position in in parse parse tree t ree, a and nd if is the next token • • Unclear Unclear which which production production to to apply apply • Multiple k such that T∈First(β ) k • if if ∈∈ First(if First(if then) then) •if ∈ First(if then else) Solution: Left Factor the Grammar • New Grammar Factors Common Prefix Into Single Single Production Production NT→ if then NT’ NT NT’→→ else else NT’ →ε • • No No choice choice when when next next token token iis s iif f • All choices have been unified in one production. Nonterminals • What about productions with nonterminals NT NT→ NT NTα 1 1 NT→ NTα 2 2 • Must choose based on possible first terminals that NT and NT can generate 1 2 • Wh What t iif f NT NT or NT NT can generat te ε 1 2 • Must choose based on α and α 1 2 NT derives ε • Two rules • NT NT →ε iimpli lies NT NT ddi erives ε • NT→ NT ... NT and for all 1≤i≤n NT 1 n i ddi erives εi implili es NT NT ddi erives ε Fixed Point Algorithm for Derives ε for all nonterminals NT set et NT NT de deri ive es ε to to be be f fallse e for all productions of the form NT →ε set set NT NT derives derives ε to to be be true true while (some NT derives ε changed in last iteration) for for a all ll productions productions o of f t the he form form NT NT→→ NT NT ... NT NT 1 n if (for all 1≤i≤n NT derives ε) i set set NT NT derives derives εε to to be be true true First(β) •T∈ First(β ) if T can appear as the first symbol symbol in in a a derivation derivation starting starting from from ββ 1) T∈First(T ) 2) ) First( ( S ) ) ⊆⊆ First( ( Sββ) ) 3) NT derives ε implies First(β) ⊆ First(NTβ) 4) NT→ Sβ implies First(S β) ⊆ First(NT ) •Notation • T T iis a t termiinall, NT NT iis a nont termiinall, S S iis a terminal or nonterminal, and β is a sequence of terminals or nonterminals Rules + Request Generate System of Subset Inclusion Inclusion Constr Constra aints ints Grammar Request: What is First(Term’ ) Term’→ Int Term’ Term’→ / Int Term’ Constraints Term’ →ε First( Num Term’ ) ⊆ First(Term’ ) First(/ Num Term’ ) ⊆ First(Term’ ) Rules First() ⊆ First( Num Term’ ) 1) ) T∈First( ( T ) ) First( First(/) /) ⊆⊆ First( First(//N Num umT Te er rm m ’ ) ) 2) First(S) ⊆ First(Sβ) ∈First() 3) NT derives ε implies / ∈First(/) First(β) ⊆ First(NTβ) 4) NT→ Sβ implies First(S First(S ββ) ) ⊆ First( First(NT NT ) ) Constraint Propg pagation Algg orithm Constraints Solution Solution First( Num Term’ ) ⊆ First(Term’ ) First(Term’ )= First(/ Num Term’ ) ⊆ First(Term’ ) First( Num Term’ ) = First( First( ) ) ⊆⊆ First( First( Num Num T Te erm rm ’ ) ) First(/ erm’ ) = Num T First(/) ⊆ First(/ Num Term’ ) First() = ∈First() First(/) First(/) = = / / / ∈First(/) Initialize Sets to Propagate Constraints Until Fixed Point Constraint Proppg agation Algg orithm Constraints Solution Solution First( Num Term’ ) ⊆ First(Term’ ) First(Term’ )= First(/ Num Term’ ) ⊆ First(Term’ ) First( Num Term’ ) = First( First() ) ⊆⊆ First( First( Num Num T Te erm rm ’ ) ) First(/ erm’ ) = Num T First(/) ⊆ First(/ Num Term’ ) First() = ∈First() First(/) First(/) = = / / / ∈First(/) Grammar Term’→ Int Term’ Term’→ / Int Term’ Te Ter rm m’→→ εε Constraint Proppg agation Algg orithm Constraints Solution Solution First( Num Term’ ) ⊆ First(Term’ ) First(Term’ )= First(/ Num Term’ ) ⊆ First(Term’ ) First( Num Term’ ) = First( First() ) ⊆⊆ First( First( Num Num T Te erm rm ’ ) ) First(/ erm’ ) = / Num T First(/) ⊆ First(/ Num Term’ ) First() = ∈First() First(/) First(/) = = / / / ∈First(/) Grammar Term’→ Int Term’ Term’→ / Int Term’ Te Ter rm m’→→ εε Constraint Proppg agation Algg orithm Constraints Solution Solution First( Num Term’ ) ⊆ First(Term’ ) First(Term’ ) = ,/ First(/ Num Term’ ) ⊆ First(Term’ ) First( Num Term’ ) = First( First() ) ⊆⊆ First( First( Num Num T Te erm rm ’ ) ) First(/ erm’ ) = / Num T First(/) ⊆ First(/ Num Term’ ) First() = ∈First() First(/) First(/) = = / / / ∈First(/) Grammar Term’→ Int Term’ Term’→ / Int Term’ Te Ter rm m’→→ εε Constraint Proppg agation Algg orithm Constraints Solution Solution First( Num Term’ ) ⊆ First(Term’ ) First(Term’ ) = ,/ First(/ Num Term’ ) ⊆ First(Term’ ) First( Num Term’ ) = First( First() ) ⊆⊆ First( First( Num Num T Te erm rm ’ ) ) First(/ erm’ ) = / Num T First(/) ⊆ First(/ Num Term’ ) First() = ∈First() First(/) First(/) = = / / / ∈First(/) Grammar Term’→ Int Term’ Term’→ / Int Term’ Te Ter rm m’→→ εε Building A Parse Tree • Have each procedure return the section of the parse parse tree tree f for or the the p part art o of f t the he string string it it parsed parsed • Use exceptions to make code structure clean Building Parse Tree In Example Term() if (token = Int n) oldToken = token; ; token = NextToken(); (); node = TermPrime(); if (node == NULL) return oldToken; else else return(new return(new T Te ermNode(oldT rmNode(oldToken oken , node); node); else throw SyntaxError TermPrime() if (token = ) (token = /) first = token; next = NextToken(); if ( (next = Int n) ) token = NextToken(); return(new TermPrimeNode(first, next, TermPrime()) else else throw throw S SyntaxError yntaxError else return(NULL) Parse Tree for 234 Desired Concrete Abstract Parse Tree Parse Tree Term Int Int Te Ter rm m’ Te Ter rm m 2 Int Term Te Ter rm m’ Int Int 4 3 Int Int Term’ Int Int 2 2 3 3 4 ε Why Use HandCoded Parser • Why not use parser generator • • What What do do you you d do o if if your your parser parser d doesn oesn’t tw work ork • Recursive descent parser – write more code •Pas arseer g geene era ato tor • Hack grammar • But if parser generator doesn’t work, noth hing you can d do • If you have complicated grammar • • Increase Increase chance chance of of going going outside outside comfort comfort z zone one of parser generator • Your p parser may y NEVER work Bottom Line • Recursive descent parser properties • • Probably Probably more more work work • But less risk of a disaster you can almost always make a recursive descent p parser work • May have easier time dealing with resulting code • Single language system • No need to deal with potentially flaky parser generator • • No No integration integration issues issues with with automatically automatically generated code •If y your p parser develop pment time is small comp pared to rest of project, or you have a really complicated language, use handcoded recursive descent parser Summary • TopDown Parsing • UU se LL ookk ahh ead d t to AA void id BB acktk trackiki ng •Parser is • • Hand HandCoded Coded • Set of Mutually Recursive Procedures Direct Generation of Abstract Tree • TermPrime builds an incomplete tree • Missing leftmost child • • Returns Returns root root and and incomplete incomplete node node • (root, incomplete) = TermPrime() • Called with token = • Remaining tokens = 3 4 root Term Int Int incomplete incomplete Te Ter rm m 4 Missing Left child Int Missing Missingleft leftchild child to be filled in by to be filled in by 3 caller callerCode for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Int token 2 2 Code for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Int token 2 2 Code for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Int token 2 2 Code for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Term root Int Term incomplete 4 4 Int Int leftmostInt 2 2 3 3 Code for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Term root Int Term incomplete 4 4 Int Int leftmostInt 2 2 3 3 Code for Term Inp put to Term() parse if (token = Int n) 234 leftmostInt = token; ; token = NextToken(); (); (root, incomplete) = TermPrime(); if (root == NULL) return leftmostInt; incomplete incomplete.leftChild leftChild = = leftmostI leftmostI n nt; t; return root; else throw SyntaxError Term root Int Term incomplete 4 4 Int Int leftmostInt 2 2 3 3 Code for TermPrime TermPrime() if (token = ) (token = /) Missing left child op = ttk oken; next t = N Next tTTok ken( ()); to be filled in by if (next = Int n) caller token = NextToken(); (root, (root, incomplete) incomplete) = TermPrime(); TermPrime(); if (root == NULL) root = new ExprNode(NULL, op, next); return (root, root); else newChild = new ExprNode(NULL, op, next); incomplete.leftChild = newChild; return(root, newChild); else else throw throw S SyntaxError yntaxError else return(NULL,NULL) MIT OpenCourseWare http://ocw.mit.edu 6.035 Computer Language Engineering Spring 2010 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
Dr.MasonHanks
User Type:
Teacher
Country:
Germany
Uploaded Date:
23-07-2017