Question? Leave a message!




Context free grammars

Context free grammars
Syntax Analysis www.ThesisScientist.comOutline  Role of parser  Context free grammars  Top down parsing  Bottom up parsing  Parser generators www.ThesisScientist.comThe role of parser token Source Parse tree Intermediate Lexical Rest of Parser program representation Analyzer Front End getNext Token Symbol table www.ThesisScientist.comUses of grammars E E + T T T T F F F (E) id E TE’ E’ +TE’ Ɛ T FT’ T’ FT’ Ɛ F (E) id www.ThesisScientist.comError handling  Common programming errors  Lexical errors  Syntactic errors  Semantic errors  Lexical errors  Error handler goals  Report the presence of errors clearly and accurately  Recover from each error quickly enough to detect subsequent errors  Add minimal overhead to the processing of correct progrms www.ThesisScientist.comErrorrecover strategies  Panic mode recovery  Discard input symbol one at a time until one of designated set of synchronization tokens is found  Phrase level recovery  Replacing a prefix of remaining input by some string that allows the parser to continue  Error productions  Augment the grammar with productions that generate the erroneous constructs  Global correction  Choosing minimal sequence of changes to obtain a globally leastcost correction www.ThesisScientist.comContext free grammars  Terminals  Nonterminals expression expression + term  Start symbol expression expression – term expression term  productions term term factor term term / factor term factor factor (expression) factor id www.ThesisScientist.comDerivations  Productions are treated as rewriting rules to generate a string  Rightmost and leftmost derivations  E E + E E E E (E) id  Derivations for –(id+id)  E = E = (E) = (E+E) = (id+E)=(id+id) www.ThesisScientist.comParse trees  (id+id)  E = E = (E) = (E+E) = (id+E)=(id+id) www.ThesisScientist.comAmbiguity  For some strings there exist more than one parse tree  Or more than one leftmost derivation  Or more than one rightmost derivation  Example: id+idid www.ThesisScientist.comElimination of ambiguity www.ThesisScientist.comElimination of ambiguity (cont.)  Idea:  A statement appearing between a then and an else must be matched www.ThesisScientist.comElimination of left recursion  A grammar is left recursive if it has a nonterminal A + such that there is a derivation A= Aα  Top down parsing methods cant handle left recursive grammars  A simple rule for direct left recursion elimination:  For a rule like:  A Aαβ  We may replace it with  A β A’  A’ α A’ ɛ www.ThesisScientist.comLeft recursion elimination (cont.)  There are cases like following  S Aa b  A Ac Sd ɛ  Left recursion elimination algorithm:  Arrange the nonterminals in some order A1,A2,…,An.  For (each i from 1 to n)  For (each j from 1 to i1)  Replace each production of the form Ai Aj γ by the production Ai δ1 γ δ2 γ … δk γ where Aj δ1 δ2 … δk are all current Aj productions   Eliminate left recursion among the Aiproductions  www.ThesisScientist.comLeft factoring  Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive or topdown parsing.  Consider following grammar:  Stmt if expr then stmt else stmt  if expr then stmt  On seeing input if it is not clear for the parser which production to use  We can easily perform left factoring:  If we have Aαβ1 αβ2 then we replace it with  A αA’  A’ β1 β2 www.ThesisScientist.comLeft factoring (cont.)  Algorithm  For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ɛ, then replace all of Aproductions Aαβ1 αβ2 … αβn γ by  A αA’ γ  A’ β1 β2 … βn  Example:  S I E t S i E t S e S a  E b www.ThesisScientist.comwww.ThesisScientist.comIntroduction  A Topdown parser tries to create a parse tree from the root towards the leafs scanning input from left to right  It can be also viewed as finding a leftmost derivation for an input string  Example: id+idid E E E E E E E TE’ lm lm lm lm lm E’ +TE’ Ɛ T E’ T E’ T E’ T E’ T E’ T FT’ F T’ F T’ F T’ F T’ + T E’ T’ FT’ Ɛ F (E) id id id id ƐƐ www.ThesisScientist.comRecursive descent parsing  Consists of a set of procedures, one for each nonterminal  Execution begins with the procedure for start symbol  A typical procedure for a nonterminal void A() choose an Aproduction, AX1X2..Xk for (i=1 to k) if (Xi is a nonterminal call procedure Xi(); else if (Xi equals the current input symbol a) advance the input to the next symbol; else / an error has occurred / www.ThesisScientist.comRecursive descent parsing (cont)  General recursive descent may require backtracking  The previous code needs to be modified to allow backtracking  In general form it cant choose an Aproduction easily.  So we need to try all alternatives  If one failed the input pointer needs to be reset and another alternative should be tried  Recursive descent parsers cant be used for left recursive grammars www.ThesisScientist.comExample ScAd Input: cad Aab a S S S c A d c A d c A d a b a www.ThesisScientist.comFirst and Follow  First() is set of terminals that begins strings derived from  If α=ɛ then is also in First(ɛ)  In predictive parsing when we have A αβ, if First(α) and First(β) are disjoint sets then we can select appropriate Aproduction by looking at the next input  Follow(A), for any nonterminal A, is set of terminals a that can appear immediately after A in some sentential form  If we have S = αAaβ for some αand βthen a is in Follow(A)  If A can be the rightmost symbol in some sentential form, then is in Follow(A) www.ThesisScientist.comComputing First  To compute First(X) for all grammar symbols X, apply following rules until no more terminals or ɛ can be added to any First set: 1. If X is a terminal then First(X) = X. 2. If X is a nonterminal and XY1Y2…Yk is a production for some k=1, then place a in First(X) if for some i a is in First(Yi) and ɛ is in all of First(Y1),…,First(Yi1) that is Y1…Yi1 = ɛ. if ɛ is in First(Yj) for j=1,…,k then add ɛ to First(X). 3. If X ɛ is a production then add ɛ to First(X)  Example www.ThesisScientist.comComputing follow  To compute First(A) for all nonterminals A, apply following rules until nothing can be added to any follow set: 1. Place in Follow(S) where S is the start symbol 2. If there is a production A αBβ then everything in First(β) except ɛ is in Follow(B). 3. If there is a production AB or a production AαBβ where First(β) contains ɛ, then everything in Follow(A) is in Follow(B)  Example www.ThesisScientist.comLL(1) Grammars  Predictive parsers are those recursive descent parsers needing no backtracking  Grammars for which we can create predictive parsers are called LL(1)  The first L means scanning input from left to right  The second L means leftmost derivation  And 1 stands for using one input symbol for lookahead  A grammar G is LL(1) if and only if whenever A αβare two distinct productions of G, the following conditions hold:  For no terminal a do αandβ both derive strings beginning with a  At most one of α or βcan derive empty string  If α= ɛ then βdoes not derive any string beginning with a terminal in Follow(A). www.ThesisScientist.comConstruction of predictive parsing table  For each production Aα in grammar do the following: 1. For each terminal a in First(α) add A in MA,a 2. If ɛ is in First(α), then for each terminal b in Follow(A) add A ɛ to MA,b. If ɛ is in First(α) and is in Follow(A), add A ɛ to MA, as well  If after performing the above, there is no production in MA,a then set MA,a to error www.ThesisScientist.comFirst Follow Example (,id +, , ), F E TE’ (,id +, ), T E’ +TE’ Ɛ (,id ), E T FT’ +,ɛ ), E’ T’ FT’ Ɛ +, ), ,ɛ T’ F (E) id Input Symbol Non id + ( ) terminal E TE’ E TE’ E E’ Ɛ E’ Ɛ E’ +TE’ E’ T FT’ T FT’ T T’ Ɛ T’ Ɛ T’ FT’ T’ Ɛ T’ www.ThesisScientist.com F F id F (E) Another example S iEtSS’ a S’ eS Ɛ E b Input Symbol Non a b i t e terminal S a S iEtSS’ S S’ Ɛ S’ Ɛ S’ S’ eS E b E www.ThesisScientist.comNonrecursive predicting parsing a + b Predictive output parsing X stack program Y Z Parsing Table M www.ThesisScientist.comPredictive parsing algorithm Set ip point to the first symbol of w; Set X to the top stack symbol; While (X) / stack is not empty / if (X is a) pop the stack and advance ip; else if (X is a terminal) error(); else if (MX,a is an error entry) error(); else if (MX,a = XY1Y2..Yk) output the production XY1Y2..Yk; pop the stack; push Yk,…,Y2,Y1 on to the stack with Y1 on top; set X to the top stack symbol; www.ThesisScientist.comExample  id+idid Stack Input Action Matched E id+idid www.ThesisScientist.comError recovery in predictive parsing  Panic mode  Place all symbols in Follow(A) into synchronization set for nonterminal A: skip tokens until an element of Follow(A) is seen and pop A from stack.  Add to the synchronization set of lower level construct the symbols that begin higher level constructs  Add symbols in First(A) to the synchronization set of nonterminal A  If a nonterminal can generate the empty string then the production deriving can be used as a default  If a terminal on top of the stack cannot be matched, pop the terminal, issue a message saying that the terminal was insterted www.ThesisScientist.comNon Input Symbol terminal id + ( ) synch E E TE’ E TE’ synch Example E’ Ɛ E’ Ɛ E’ +TE’ E’ synch synch synch T FT’ T FT’ T T’ Ɛ T’ Ɛ T’ FT’ T’ Ɛ T’ synch synch synch synch F id F (E) F Stack Input Action E )id+id Error, Skip ) E id+id id is in First(E) TE’ id+id id+id FT’E’ id+id idT’E’ +id T’E’ FT’E’ +id FT’E’ +id Error, MF,+=synch T’E’ +id F has been poped www.ThesisScientist.comwww.ThesisScientist.comIntroduction  Constructs parse tree for an input string beginning at the leaves (the bottom) and working towards the root (the top)  Example: idid F E E E + T T idid F id T id T F T T F F T F F id id F F F (E) id T F id F id id id F id id www.ThesisScientist.comShiftreduce parser  The general idea is to shift some symbols of input to the stack until a reduction can be applied  At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production  The key decisions during bottomup parsing are about when to reduce and about what production to apply  A reduction is a reverse of a step in a derivation  The goal of a bottomup parser is to construct a derivation in reverse:  E=T=TF=Tid=Fid=idid www.ThesisScientist.comHandle pruning  A Handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation Right sentential form Handle Reducing production id idid Fid Fid F TF Fid Tid id TF TF ETF www.ThesisScientist.comShift reduce parsing  A stack is used to hold grammar symbols  Handle always appear on top of the stack  Initial configuration: Stack Input w  Acceptance configuration Stack Input S www.ThesisScientist.comShift reduce parsing (cont.)  Basic operations:  Shift Stack Input Action  Reduce idid shift  Accept id reduce by Fid id F id reduce by TF  Error T id shift  Example: idid T id shift reduce by Fid Tid reduce by TTF TF reduce by ET T accept E www.ThesisScientist.comHandle will appear on top of the stack S S A B B A α α β γ z γ z y x y Stack Input Stack Input αβγ yz αγ xyz αβB yz αBxy z αβBy z www.ThesisScientist.comConflicts during shit reduce parsing  Two kind of conflicts  Shift/reduce conflict  Reduce/reduce conflict  Example: Input Stack … if expr then stmt else … www.ThesisScientist.comReduce/reduce conflict stmt id(parameterlist) stmt expr:=expr parameterlistparameterlist, parameter parameterlistparameter parameterid exprid(exprlist) exprid exprlistexprlist, expr Input Stack exprlistexpr … id(id ,id) … www.ThesisScientist.comLR Parsing  The most prevalent type of bottomup parsers  LR(k), mostly interested on parsers with k=1  Why LR parsers  Table driven  Can be constructed to recognize all programming language constructs  Most general nonbacktracking shiftreduce parsing method  Can detect a syntactic error as soon as it is possible to do so  Class of grammars for which we can construct LR parsers are superset of those which we can construct LL parsers www.ThesisScientist.comStates of an LR parser  States represent set of items  An LR(0) item of G is a production of G with the dot at some position of the body:  For AXYZ we have following items  A.XYZ  AX.YZ  AXY.Z  AXYZ.  In a state having A.XYZ we hope to see a string derivable from XYZ next on the input.  What about AX.YZ www.ThesisScientist.comConstructing canonical LR(0) item sets  Augmented grammar:  G with addition of a production: S’S  Closure of item sets:  If I is a set of items, closure(I) is a set of items constructed from I by the following rules:  Add every item in I to closure(I)  If Aα.Bβ is in closure(I) and Bγ is a production then add the item B.γ to clsoure(I). I0=closure(E’.E  Example: E’E E’.E E.E+T E E + T T E.T T T F F T.TF F (E) id T.F F.(E) www.ThesisScientist.com F.idConstructing canonical LR(0) item sets (cont.)  Goto (I,X) where I is an item set and X is a grammar symbol is closure of set of all items A αX. β where A α.X β is in I I1  Example E’E. E EE.+T I0=closure(E’.E E’.E I2 T E.E+T E’T. E.T TT.F T.TF I4 T.F F(.E) ( E.E+T F.(E) E.T F.id T.TF T.F F.(E) F.id www.ThesisScientist.comClosure algorithm SetOfItems CLOSURE(I) J=I; repeat for (each item A α.Bβ in J) for (each prodcution Bγ of G) if (B.γ is not in J) add B.γ to J; until no more items are added to J on one round; return J; www.ThesisScientist.comGOTO algorithm SetOfItems GOTO(I,X) J=empty; if (A α.X β is in I) add CLOSURE(A αX. β ) to J; return J; www.ThesisScientist.comCanonical LR(0) items Void items(G’) C= CLOSURE(S’.S); repeat for (each set of items I in C) for (each grammar symbol X) if (GOTO(I,X) is not empty and not in C) add GOTO(I,X) to C; until no new set of items are added to C on a round; www.ThesisScientist.comE’E E E + T T T T F F acc Example F (E) id I6 I9 EE+.T I1 T.TF T EE+T. + T.F E’E. TT.F F.(E) E EE.+T F.id I0=closure(E’.E I2 I7 I10 T E’.E F TT.F E’T. E.E+T F.(E) TTF. TT.F E.T F.id id T.TF id T.F I5 F.(E) Fid. + F.id ( I4 F(.E) I8 I11 E.E+T E ) EE.+T E.T T.TF F(E.) F(E). T.F F.(E) F.id I3 TF. www.ThesisScientist.comUse of LR(0) automaton  Example: idid Line Stack Symbols Input Action (1) 0 idid Shift to 5 (2) 05 id id Reduce by Fid (3) 03 F id Reduce by TF (4) 02 T id Shift to 7 (5) 027 T id Shift to 5 (6) 0275 Tid Reduce by Fid (7) 02710 TF Reduce by TTF (8) 02 T Reduce by ET (9) 01 E accept www.ThesisScientist.comLRParsing model a1… ai… an INPUT LR Parsing Output Sm Program Sm1 … ACTION GOTO www.ThesisScientist.comLR parsing algorithm let a be the first symbol of w; while(1) /repeat forever / let s be the state on top of the stack; if (ACTIONs,a = shift t) push t onto the stack; let a be the next input symbol; else if (ACTIONs,a = reduce Aβ) pop β symbols of the stack; let state t now be on top of the stack; push GOTOt,A onto the stack; output the production Aβ; else if (ACTIONs,a=accept) break; / parsing is done / else call errorrecovery routine; www.ThesisScientist.com(0) E’E Example (1) E E + T (2) E T STATE ACTON GOTO (3) T T F id + ( ) E T F (4) T F (5) F (E) 0 S5 S4 1 2 3 idid+id (6) Fid 1 S6 Acc Line Stac Symbol Input Action 2 R2 S7 R2 R2 k s 3 R R7 R4 R4 (1) 0 idid+id Shift to 5 4 (2) 05 id id+id Reduce by Fid 4 S5 S4 8 2 3 (3) 03 F id+id Reduce by TF 5 R R R6 R6 (4) 02 T id+id Shift to 7 6 6 (5) 027 T id+id Shift to 5 6 S5 S4 9 3 (6) 0275 Tid +id Reduce by Fid 7 S5 S4 10 (7) 02710 TF +id Reduce by T TF 8 S6 S11 (8) 02 T +id Reduce by ET 9 R1 S7 R1 R1 (9) 01 E +id Shift 10 R3 R3 R3 R3 (10) 016 E+ id Shift 11 R5 R5 R5 R5 (11) 0165 E+id Reduce by Fid (12) 0163 E+F Reduce by TF (13) 0169 E+T` Reduce by E E+T (14) 01 E accept www.ThesisScientist.comConstructing SLR parsing table  Method  Construct C=I0,I1, … , In, the collection of LR(0) items for G’  State i is constructed from state Ii:  If Aα.aβ is in Ii and Goto(Ii,a)=Ij, then set ACTIONi,a to “shift j”  If Aα. is in Ii, then set ACTIONi,a to “reduce Aα” for all a in follow(A)  If S’.S is in Ii, then set ACTIONI, to “Accept”  If any conflicts appears then we say that the grammar is not SLR(1).  If GOTO(Ii,A) = Ij then GOTOi,A=j  All entries not defined by above rules are made “error”  The initial state of the parser is the one constructed from the set of items containing S’.S www.ThesisScientist.comExample grammar which is not S L=R R SLR(1) L R id R L I0 I1 I3 I5 I7 S’.S S’S. S R. L id. L R. S .L=R I4 I6 S.R I2 I8 L.R SL=.R L .R S L.=R R L. R.L R.L L.id R L. L.R L.R R . L I9 L.id L.id S L=R. Action = Shift 6 2 Reduce RL www.ThesisScientist.comMore powerful LR parsers  CanonicalLR or just LR method  Use lookahead symbols for items: LR(1) items  Results in a large collection of items  LALR: lookaheads are introduced in LR(0) items www.ThesisScientist.comCanonical LR(1) items  In LR(1) items each item is in the form: Aα.β,a  An LR(1) item Aα.β,a is valid for a viable prefix γ if there is a derivation S=δAw=δαβw, where rm  Γ= δα  Either a is the first symbol of w, or w is ε and a is  Example: S=aaBab=aaaBab  SBB rm Item Ba.B,a is valid for γ=aaa  BaBb and w=ab www.ThesisScientist.comConstructing LR(1) sets of items SetOfItems Closure(I) repeat for (each item Aα.Bβ,a in I) for (each production Bγ in G’) for (each terminal b in First(βa)) add B.γ, b to set I; until no more items are added to I; return I; SetOfItems Goto(I,X) initialize J to be the empty set; for (each item Aα.Xβ,a in I) add item AαX.β,a to set J; return closure(J); void items(G’) initialize C to Closure(S’.S,); repeat for (each set of items I in C) for (each grammar symbol X) if (Goto(I,X) is not empty and not in C) add Goto(I,X) to C; until no new sets of items are added to C; www.ThesisScientist.comExample S’S SCC CcC Cd www.ThesisScientist.comCanonical LR(1) parsing table  Method  Construct C=I0,I1, … , In, the collection of LR(1) items for G’  State i is constructed from state Ii:  If Aα.aβ, b is in Ii and Goto(Ii,a)=Ij, then set ACTIONi,a to “shift j”  If Aα., a is in Ii, then set ACTIONi,a to “reduce Aα”  If S’.S, is in Ii, then set ACTIONI, to “Accept”  If any conflicts appears then we say that the grammar is not LR(1).  If GOTO(Ii,A) = Ij then GOTOi,A=j  All entries not defined by above rules are made “error”  The initial state of the parser is the one constructed from the set of items containing S’.S, www.ThesisScientist.comExample S’S SCC CcC Cd www.ThesisScientist.comLALR Parsing Table  For the previous example we had: I4 Cd. , c/d I47 Cd. , c/d/ I7 Cd. ,  State merges cant produce ShiftReduce conflicts. Why  But it may produce reducereduce conflict www.ThesisScientist.comExample of RR conflict in state merging S’S S aAd bBd aBe bAe A c B c www.ThesisScientist.comAn easy but spaceconsuming LALR table construction  Method: 1. Construct C=I0,I1,…,In the collection of LR(1) items. 2. For each core among the set of LR(1) items, find all sets having that core, and replace these sets by their union. 3. Let C’=J0,J1,…,Jm be the resulting sets. The parsing actions for state i, is constructed from Ji as before. If there is a conflict grammar is not LALR(1). 4. If J is the union of one or more sets of LR(1) items, that is J = I1 UI2…IIk then the cores of Goto(I1,X), …, Goto(Ik,X) are the same and is a state like K, then we set Goto(J,X) =k.  This method is not efficient, a more efficient one is discussed in the book www.ThesisScientist.comCompaction of LR parsing table  Many rows of action tables are identical  Store those rows separately and have pointers to them from different states  Make lists of (terminalsymbol, action) for each state  Implement Goto table by having a link list for each nonterinal in the form (current state, next state) www.ThesisScientist.comUsing ambiguous grammars STATE ACTON GO TO EE+E id + ( ) E 0 S3 S2 1 EEE 1 S4 S5 Acc E(E) 2 S3 S2 6 3 R4 R4 R4 R4 Eid 4 S3 S2 7 5 S3 S2 8 6 S4 S5 I0: E’.E I1: E’E. I2: E(.E) E.E+E EE.+E E.E+E 7 R1 S5 R1 R1 E.EE EE.E E.EE 8 R2 R2 R2 R2 E.(E) E.(E) 9 R3 R3 R3 R3 E.id E.id I4: EE+.E I5: EE.E I6: E(E.) I7: EE+E. I3: E.id E.E+E E(.E) EE.+E EE.+E E.EE E.E+E EE.E EE.E E.(E) E.EE I8: EEE. I9: E(E). E.id E.(E) EE.+E E.id www.ThesisScientist.com EE.EReadings  Chapter 4 of the book www.ThesisScientist.com