Automata Theory Grammars and languages

automata grammars and computability and automata grammar tutorial and finite automata and grammars and automata theory grammar examples
Dr.NaveenBansal Profile Pic
Dr.NaveenBansal,India,Teacher
Published Date:25-10-2017
Your Website URL(Optional)
Comment
CHAPTER 12 Languages, Automata, Grammars 12.1 INTRODUCTION This chapter discusses three topics, languages, automata, and grammars. These three topics are closely related to each other. Our languages will use the letters a, b,... to code data rather than the digits 0 and 1 used by some other texts. 12.2 ALPHABET, WORDS, FREE SEMIGROUP Consider a nonempty set A of symbols. A word or string w on the set A is a finite sequence of its elements. For example, suppose A=a,b,c. Then the following sequences are words on A: u= ababb and v = accbaaa When discussing words on A, we frequently call A the alphabet, and its elements are called letters. We will also 2 2 3 abbreviate our notation and write a for aa, a for aaa, and so on. Thus, for the above words, u = abab and 3 2 v = ac ba . The empty sequence of letters, denoted by l (Greek letter lambda) or  (Greek letter epsilon), or 1, is also ∗ considered to be a word on A, called the empty word. The set of all words on A is denoted by A (read: “A star”). The length of a word u, written u or l(u), is the number of elements in its sequence of letters. For the above words u and v, we have l(u)= 5 and l(v)= 7. Also, l(l)= 0, wherel is the empty word. Remark: Unless otherwise stated, the alphabet A will be finite, the symbols u, v, w will be reserved for words on A, and the elements of A will come from the letters a,b,c. Concatenation Consider two words u and v on the alphabet A. The concatenation of u and v, written uv, is the word obtained by writing down the letters of u followed by the letters of v. For example, for the above words u and v, we have 2 2 3 uv = ababbaccbaaa = abab ac ba 2 3 n+1 n As with letters, for any word u, we define u = uu, u = uuu, and, in general, u = uu . 303 Copyright © 2007, 1997, 1976 by The McGraw-Hill Companies, Inc. Click here for terms of use. 304 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 Clearly, for any words u, v, w, the words (uv)w and u(vw) are identical, they simply consist of the letters of u, v, w written down one after the other. Also, adjoining the empty word before or after a word u does not change the word u. That is: Theorem 12.1: The concatenation operation for words on an alphabet A is associative. The empty wordl is an identity element for the operation. (Generally speaking, the operation is not commutative, e.g., uv = vu for the above words u and v.) Subwords, Initial Segments Consider any word u= a a ...a on an alphabet A. Any sequence w = a a ...a is called a subword 1 2 n j j+1 k of u. In particular, the subword w = a a ...a beginning with the first letter of u, is called an initial segment 1 2 k of u. In other words, w is a subword of u if u= v wv and w is an initial segment of u if u = wv. Observe that 1 2 l and u are both subwords of uv since u= lu. Consider the word u= abca. The subwords and initial segments of u follow: (1) Subwords: l,a,b,c,ab,bc,ca,abc,bca,abca = u (2) Initial segments: l,a,ab,abc,abca = u. Observe that the subword w = a appears in two places in u. The word ac is not a subword of u even though all its letters belong to u. Free Semigroup, Free Monoid Let F denote the set of all nonempty words from an alphabet A with the operation of concatenation. As noted above, the operation is associative. Thus F is a semigroup; it is called the free semigroup over A or the free semigroup generated by A. One can easily show that F satisfies the right and left cancellation laws. However, F is not commutative when A has more than one element. We will write F for the free semigroup over A when A we want to specify the set A. Now let M = A be the set of all words from A including the empty wordl. Sincel is an identity element for the operation of concatenation, M is a monoid, called the free monoid over A. 12.3 LANGUAGES A language L over an alphabet A is a collection of words on A. Recall that A denotes the set of all words on A. Thus a language L is simply a subset of A. EXAMPLE 12.1 Let A=a,b. The following are languages over A. 2 m m (a)L =a, ab, ab ,... (c)L =a b m 0 1 3 m n m n (b)L =a b m 0,n 0 (d)L = b ab m≥ 0,n≥ 0 2 4 One may verbally describe these languages as follows. (a) L consists of all words beginning with an a and followed by zero or more b’s. 1 (b) L consists of all words beginning with one or more as followed by one or more b’s. 2 (c) L consists of all words beginning with one or more as and followed by the same number of b’s. 3 (d) L consists of all words with exactly one a. 4CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 305 Operations on Languages Suppose L and M are languages over an alphabet A. Then the “concatenation” of L and M, denoted by LM, is the language defined as follows: LM=uv u ∈ L, v ∈ V That is, LM denotes the set of all words which come from the concatenation of a word from L with a word from M. For example, suppose 2 2 3 2 4 6 L =a, b ,L =a ,ab,b ,L =a ,a ,a ,... 1 2 3 Then: 2 3 2 2 4 3 2 2 2 2 5 L L =a ,ab ,b a, b ,L L =a ,a b,ab ,b a ,b ab, b 1 1 1 2 3 5 7 2 2 2 4 2 6 L L =a ,a ,a ,..., b a ,b a ,b a ,... 1 3 Clearly, the concatenation of languages is associative since the concatenation of words is associative. Powers of a language L are defined as follows: 0 1 2 m+1 m L =l,L = L, L = LL, L = L L for m 1. The unary operation L (read “L star”) of a language L, called the Kleene closure of L because Kleene proved Theorem 12.2, is defined as the infinite union: ∞  ∗ 0 1 2 k L = L ∪ L ∪ L ∪···= L k=0 + The definition of L agrees with the notation A which consists of all words over A. Some texts define L to be 1 2 + the union of L ,L ,... that is, L is the same as L but without the empty word λ. 12.4 REGULAR EXPRESSIONS, REGULAR LANGUAGES Let A be a (nonempty) alphabet. This section defines a regular expression r over A and a language L(r) over A associated with the regular expression r. The expression r and its corresponding language L(r) are defined inductively as follows. Definition 12.1: Each of the following is a regular expression over an alphabet A. (1) The symbol “l” (empty word) and the pair “( )” (empty expression) are regular expressions. (2) Each letter a in A is a regular expression. ∗ (3) If r is a regular expression, then (r ) is a regular expression. (4) If r and r are regular expressions, then (r ∨ r ) is a regular expression. 1 2 1 2 (5) If r and r are regular expressions, then (r r ) is a regular expression. 1 2 1 2 All regular expressions are formed in this way. Observe that a regular expression r is a special kind of a word (string) which uses the letters of A and the five symbols: () ∗∨ l We emphasize that no other symbols are used for regular expressions. Definition 12.2: The language L(r) over A defined by a regular expression r over A is as follows: (1) L(l)=l and L(( )) =, the empty set. (2) L(a)=a, where a is a letter in A.306 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 ∗ (3) L(r )= (L(r)) (the Kleene closure of L(r)). (4) L(r ∨ r ) = L(r )∪ L(r ) (the union of the languages). 1 2 1 2 (5) L(r r )= L(r )L(r ) (the concatenation of the languages). 1 2 1 2 Remark: Parentheses will be omitted from regular expressions when possible. Since the concatenation of languages and the union of languages are associative, many of the parentheses may be omitted. Also, by adopting the convention that “” takes precedence over concatenation, and concatenation takes precedence over “∨,” other parentheses may be omitted. Definition 12.3: Let L be a language over A. Then L is called a regular language over A if there exists a regular expression r over A such that L = L(r). EXAMPLE12.2 Let A=a,b. Each of the following is an expression r and its corresponding language L(r): (a) Let r = a. Then L(r) consists of all powers of a including the empty wordl. (b) Let r = aa. Then L(r) consists of all positive powers of a excluding the empty word. ∗ 2 (c) Let r = a∨ b . Then L(r) consists of a or any word in b, that is, L(r)=a,l,b,b ,···. ∗ ∗ (d) Let r = (a∨ b) . Note L(a∨ b)=a∪b= A; hence L(r)= A , all words over A. ∗ (e) Let r = (a ∨ b) bb. Then L(r) consists of the concatenation of any word in A with bb, that is, all words 2 ending in b . ∗ (f) Let r = a ∧ b . L(r) does not exist since r is not a regular expression. (Specifically, ∧ is not one of the symbols used for regular expressions.) EXAMPLE 12.3 Consider the following languages over A=a,b: n m n m m m (a)L =a b m 0,n 0; (b)L =b ab m 0,n 0; (c)L =a b m 0. 1 2 3 Find a regular expression r over A=a,b such that L = L(r) for i = 1,2,3. i (a) L consists of those words beginning with one or more a’s followed by one or more b’s. Thus we can set 1 ∗ ∗ ∗ ∗ r = aa bb . Note that r is not unique; for example, r = a abb is another solution. (b) L consists of all words which begin with one or more b’s followed by a single a which is then followed 2 by one or more b’s, that is, all words with exactly one a which is neither the first nor the last letter. Hence ∗ ∗ r = bb abb is a solution. (c) L consists of all words beginning with one or more a’s followed by the same number of b’s. There exists 3 no regular expression r such that L = L(r); that is, L is not a regular language. The proof of this fact 3 3 appears in Example 12.8. 12.5 FINITE STATE AUTOMATA A finite state automaton (FSA) or, simply, an automaton M, consists of five parts: (1) A finite set (alphabet) A of inputs. (2) A finite set S of (internal) states. (3) A subset Y of S (called accepting or “yes” states). (4) An initial state s in S. 0 (5) A next-state function F from S × A into S.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 307 Such an automaton M is denoted by M = (A,S,Y,s ,F) when we want to indicate its five parts. (The 0 plural of automaton is automata.) Some texts define the next-state function F : S × A → S in (5) by means of a collection of functions f : S → S, one for each a ∈ A. Setting F(s,a)= f (s) shows that both definitions are equivalent. a a EXAMPLE 12.4 The following defines an automaton M with two input symbols and three states: (1) A=a,b, input symbols. (2) S=s ,s ,s , internal states. 0 1 2 (3) Y =s ,s , “yes” states. 0 1 (4) s , initial state. 0 (5) Next-state function F : S × A→ S defined explicitly in Fig.12-1(a) or by the table in Fig.12-1(b). Fig. 12-1 State Diagram of an Automaton M An automaton M is usually defined by means of its state diagram D = D(M) rather than by listing its five parts. The state diagram D = D(M) is a labeled directed graph as follows. (1) The vertices of D(M) are the states in S and an accepting state is denoted by means of a double circle. (2) There is an arrow (directed edge) in D(M) from state s to state s labeled by an input a ifF(s ,a)= s or, j k j k equivalently, if f (s ) = s . a j k (3) The initial state s is indicated by means of a special arrow which terminates at s but has no initial vertex. 0 0 For each vertex s and each letter a in the alphabet A, there will be an arrow leaving s which is labeled j j by a; hence the outdegree of each vertex is equal to number of elements in A. For notational convenience, we label a single arrow by all the inputs which cause the same change of state rather than having an arrow for each such input. The state diagram D = D(M) of the automaton M in Example 12.4 appears in Fig.12-2. Note that both a and b label the arrow from s to s since F(s ,a) = s and F(s ,b) = s . Note also that the outdegree of each 2 2 2 2 2 2 vertex is 2, the number of elements in A.308 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 Fig. 12-2 Language L(M) Determined by an Automaton M Each automaton M with input alphabet A defines a language over A, denoted by L(M), as follows. Let w = a a ··· a be a word on A. Then w determines the following path in the state diagram graph 1 2 m D(M) where s is the initial state and F(s , a )= s for i ≥ 1: 0 i−1 i i P = (s ,a ,s ,a ,s ,··· ,a ,s ) 0 1 1 2 2 m m We say that M recognizes the word w if the final state s is an accepting state in Y. The language L(M) of M is m the collection of all words from A which are accepted by M. EXAMPLE 12.5 Determine whether or not the automaton M in Fig.12-2 accepts the words: w = ababba; w = baab; w = l the empty word. 1 2 3 Use Fig.12-2 and the words w and w to obtain the following respective paths: 1 2 a b a b b a b a a b P = s → s → s → s → s → s → s and P = s → s → s → s → s 1 0 0 1 0 1 2 2 2 0 1 0 0 1 The final state in P is s which is not in Y; hence w is not accepted by M. On the other hand, the final state in 1 2 1 P is s which is in Y; hence w is accepted by M. The final state determined by w is the initial state s since 2 1 2 3 0 w = l is the empty word. Thus w is accepted by M since s ∈ Y. 3 3 0 EXAMPLE 12.6 Describe the language L(M) of the automaton M in Fig.12-2. L(M) will consist of all words w on A which do not have two successive b’s. This comes from the following facts: (1) We can enter the state s if and only if there are two successive b’s. 2 (2) We can never leave s . 2 (3) The state s is the only rejecting (nonaccepting) state. 2 The fundamental relationship between regular languages and automata is contained in the following theorem (whose proof lies beyond the scope of this text). Theorem 12.2(Kleene): Alanguage Loveranalphabet Aisregularifandonlyifthereisafinitestateautomaton M such that L = L(M). The star operation L on a language L is sometimes called the Kleene closure of L since Kleene first proved the above basic result. EXAMPLE 12.7 Let A=a,b. Construct an automaton M which will accept precisely those words from A which end in two b’s.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 309 Fig. 12-3 2 Since b is accepted, but not l or b, we need three states, s , the initial state, and s and s with an arrow 0 1 2 labeled b going from s to s and one from s to s . Also, s is an accepting state, but not s nor s . This gives 0 1 1 2 2 0 1 the graph in Fig.12-3(a). On the other hand, if there is an a, then we want to go back to s , and if we are in s 0 2 and there is a b, then we want to stay in s . These additional conditions give the required automaton M which is 2 shown in Fig.12-3(b). Pumping Lemma Let M be an automaton over A with k states. Suppose w = a a ··· a is a word over A accepted by M and 1 2 n supposew=nk, the number of states. Let P = (s ,s ,...,s ) 0 1 n be the corresponding sequence of states determined by the word w. Sincenk, two of the states in P must be equal, say s = s whereij. Let w be divided into subwords x, y, z as follows: i j x = a a ··· a,y = a ··· a,z = a ··· a 1 2 i i+1 j j+1 n m m As shown in Fig.12-4, xy ends in s = s ; hence xy also ends in s . Thus, for every m, w = xy z ends i j i m in s , which is an accepting state. n Fig. 12-4 The above discussion proves the following important result. Theorem 12.3 (Pumping Lemma): Suppose M is an automaton over A such that: (i) M has k states. (ii) M accepts a word w from A wherewk. m Then w = xyz where, for every positive m, w = xy z is accepted by M. m The next example gives an application of the Pumping Lemma. m m EXAMPLE 12.8 Show that the language L=a b m is positive is not regular. Suppose L is regular. Then, by Theorem 12.2, there exists a finite state automaton M which accepts L. k k Suppose M has k states. Let w = a b . Then w k. By the Pumping Lemma (Theorem 12.3), w = xyz 2 2 where y is not empty and w = xy z is also accepted by M.If y consists of only a’s or only b’s, then w will not 2 have the same number of a’s as b’s. If y contains both a’s and b’s, then w will have a’s following b’s. In either 2 case w does not belong to L, which is a contradiction. Thus L is not regular. 2310 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 12.6 GRAMMARS Figure12-5 shows the grammatical construction of a specific sentence. Observe that there are: (1) various variables, e.g., (sentence), (noun phrase),···; (2) various terminal words, e.g., “The”, “boy,”···; (3) a beginning variable (sentence); (4) various substitutions or productions, e.g. )sentence→)noun phrase)verb phrase )object phrase→)article)noun )noun→ apple The final sentence only contains terminals, although both variables and terminals appear in its construction by the productions. This intuitive description is given in order to motivate the following definition of a grammar and the language it generates. Fig. 12-5 Definition 12.4: A phrase structure grammar or, simply, a grammar G consists of four parts: (1) A finite set (vocabulary) V. (2) A subset T of V whose elements are called terminals; the elements of N = V\T are called non-terminals or variables. (3) A nonterminal symbol S called the start symbol. (4) A finite set P of productions. (A production is an ordered pair (α, β), usually written α → β, where α and β are words in V, and the production must contain at least one nonterminal on its left side α.) Such a grammar G is denoted by G= G(V,T,S,P) when we want to indicate its four parts. The following notation, unless otherwise stated or implied, will be used for our grammars. Terminals will be denoted by italic lower case Latin letters, a,b,c,···, and nonterminals will be denoted by italic capital Latin letters, A,B,C,···, with S as the start symbol. Also, Greek letters, α, β,···, will denote words in V, that is, words in terminals and nonterminals. Furthermore, we will write α → (β ,β ,···,β ) instead of α → β ,α → β ,···,α → β 1 2 k 1 2 k Remark: Frequently, we will define a grammar G by only giving its productions, assuming implicitly that S is the start symbol and that the terminals and nonterminals of G are only those appearing in the productions.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 311 EXAMPLE 12.9 The following defines a grammar G with S as the start symbol: 1 2 3 4 5 V =A,B,S,a,b,T =a,b,P =S→ AB, A→ Aa, B→Bb, A→a, B→ b The productions may be abbreviated as follows: S → AB, A→ (Aa, a), B → (Bb,b) Language L(G) of a Grammar G Suppose w and w are words over the vocabulary set V of a grammar G. We write w ⇒ w if w can be obtained from w by using one of the productions; that is, if there exists words u and v such that w = uαv and w = uβv and there is a production α → β. Furthermore, we write ∗ w ⇒⇒ w or w⇒ w if w can be obtained from w using a finite number of productions. Now let G be a grammar with terminal set T . The language of G, denoted by L(G), consists of all words in T that can be obtained from the start symbol S by the above process; that is, ∗ L(G)=w ∈ T S ⇒⇒ w 2 4 EXAMPLE 12.10 Consider the grammar G in Example 12.9. Observe that w = a b can be obtained from the start symbol S as follows: 2 4 S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aaBb ⇒ aaBbb ⇒ aaBbbb ⇒ aabbbb = a b 2 4 2 4 Here we used the productions 1, 2, 4, 3, 3, 3, 5, respectively. Thus we can write S ⇒⇒ a b . Hence w = a b belongs to L(G). More generally, the production sequence: 1, 2 (r times), 4, 3 (s times), 5 s r will produce the word w = a ab b where r and s are nonnegative integers. On the other hand, no sequence of productions can produce an a after a b. Accordingly, m n L(G)=a b m and n are positive integers That is, the language L(G) of the grammar G consists of all words which begin with one or more a’s followed by one or more b’s. EXAMPLE 12.11 Find the language L(G) overa,b,c generated by the grammar G: S → aSb, aS → Aa, Aab → c n n First we must apply the first production one or more times to obtain the word w = a Sb wheren 0. To m m eliminate S, we must apply the second production to obtain the word w = a Aabb where m = n− 1 ≥ 0. m m Now we can only apply the third production to finally obtain the word w = a cb where m≥ 0. Accordingly, m m L(G)=a cb m nonnegative That is, L(G) consists of all words with the same nonnegative number of a’s and b’s separated byac.312 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 Types of Grammars Grammars are classified according to the kinds of production which are allowed. The following grammar classification is due to Noam Chomsky. AType 0 grammar has no restrictions on its productions. Types 1, 2, and 3 are defined as follows: (1) A grammar G is said to be of Type 1 if every production is of the form α → β where α≤β or of the form α → l. (2) A grammar G is said to be of Type 2 if every production is of the form A → β where the left side A is a nonterminal. (3) A grammar G is said to be of Type 3 if every production is of the form A → a or A → aB, that is, where the left side A is a single nonterminal and the right side is a single terminal or a terminal followed by a nonterminal, or of the form S → l. Observe that the grammars form a hierarchy; that is, every Type 3 grammar is a Type 2 grammar, every Type 2 grammar is a Type 1 grammar, and every Type 1 grammar is a Type 0 grammar. Grammars are also classified in terms of context-sensitive, context-free, and regular as follows. (a) A grammar G is said to be context-sensitive if the productions are of the form αAα → αβα The name “context-sensitive” comes from the fact that we can replace the variable A by β in a word only when A lies between α and α . (b) A grammar G is said to be context-free if the productions are of the form A→ β The name “context-free” comes from the fact that we can now replace the variable A by β regardless of where A appears. (c) A grammar G is said to be regular if the productions are of the form A→ a, A→ aB, S → l Observe that a context-free grammar is the same as a Type 2 grammar, and a regular grammar is the same as a Type 3 grammar. A fundamental relationship between regular grammars and finite automata follows. Theorem 12.4: A language L can be generated by a Type 3 (regular) grammar G, if and only if there exists a finite automaton M which accepts L . Thus a language L is regular iff L = L(r) where r is a regular expression iff L= L(M) where M is a finite automaton iff L = L(G) where G is a regular grammar. (Recall that “iff” is an abbreviation for “if and only if.”) n n EXAMPLE 12.12 Consider the language L=a b n 0. (a) Find a context-free grammar G which generates L. Clearly the grammar G with the following productions will generate L: S → ab, S → aSb Note that G is context-free (b) Find a regular grammar G which generates L. By Example 12.8, L is not a regular language. Thus L cannot be generated by a regular grammar.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 313 Derivation Trees of Context-Free Grammars Consider a context-free (Type 2) grammar G. Any derivation of a word w in L(G) can be represented graphically by means of an ordered, rooted tree T , called a derivation tree or parse tree. We illustrate such a derivation tree below. Let G be the context-free grammar with the following productions: S → aAB, A→ Bba, B → bB, B → c The word w = acbabc can be derived from S as follows: S ⇒ aAB ⇒ a(Bba)B ⇒ acbaB ⇒ acba(bB)⇒ acbabc One can draw a derivation tree T of the word w as indicated by Fig.12-6. Specifically, we begin with S as the root and then add branches to the tree according to the production used in the derivation of w. This yields the completed tree T which is shown in Fig.12-6(e). The sequence of leaves from left to right in T is the derived word w. Also, any non-leaf in T is a variable, say A, and the immediate successors (children) of A form a word α where A → α is the production of G used in the derivation of w. Fig. 12-6 Backus-Naur Form There is another notation, called the Backus-Naur form, which is sometimes used for describing the produc- tions of a context-free (Type 2) grammar. Specifically: (i) “::=” is used instead of “→.” (ii) Every nonterminal is enclosed in brackets). (iii) All productions with the same nonterminal left-hand side are combined into one statement with all the right-hand sides listed on the right of :: = separated by vertical bars. For instance, the productions A→ aB, A→ b, A→ BC are combined into the one statement: )A ::= a)Bb)B)C314 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 Machines and Grammars Theorem 12.4 tells us that the regular languages correspond to the finite state automata (FSA). There are also machines, more powerful than the FSA, which correspond to the other grammars. (a) Pushdown Automata: A pushdown automaton P is similar to a FSA except that P has an auxiliary stack whichprovidesanunlimitedamountofmemoryfor P. Alanguage Lisrecognizedbyapushdownautomaton P if and only if L is context-free. (b) Linear BoundedAutomata: Alinear bounded automaton B is more powerful than a pushdown automaton. Such an automaton B uses a tape which is linearly bounded by the length of the input word w. A language L is recognized by a linear bounded automaton B if and only if L is context-sensitive. (c) Turing Machine: ATuring machine M, named after the British mathematicianAlan Turing, uses an infinite tape; it is able to recognize every language L that can be generated by any phase-structure grammar G. In fact, a Turing machine M is one of a number of equivalent ways to define the notion of a “computable” function. The discussion of the pushdown automata and the linear bounded automata lies beyond the scope of this text. We will discuss Turing machines in Chapter 13. Solved Problems WORDS 3 2 2 2 2 2 12.1. Consider the words u= a ba b and v = bab . Find: (a) uv;uv;(b) vu,vu;(c) v ,v . Write the letters of the first word followed by the letters of the second word, and then count the number of letters in the resulting word. 3 2 3 2 2 2 2 3 (a) uv = (a ba b )(bab )= a ba b ab;uv= 12 2 2 3 2 2 2 3 2 (b) vu= (bab )(a ba b )= bab a ba b;vu= 12 2 2 3 2 2 2 (c) v = vv = (bab )(bab )= bab ab;v = 8 2 3 12.2. Suppose u = a b and v = b ab. Find: (a) uvu;(b)lu, ul, ulv. 2 2 4 (a) Write down the letters in u, then v, and finally u to obtain uvu = a b aba b. 2 2 4 (b) Sincel is the empty word,lu= ul= u= a b and ulv = uv = a b ab. 12.3. Let w = abcd.(a) Find all subwords of w.(b) Which of them are initial segments? (a) The subwords are: l, a, b, c, d, ab, bc, cd, abc, bcd, w = abcd. (We emphasize that v = acd is not a subword of w even though all its letters belong to w.) (b) The initial segments arel, a, ab, abc, w = abcd. 12.4. For any words u and v, show that: (a)uv=u+v; (b)uv=vu. (a) Suppose u= r and v= s. Then uv will consist of the r letters of u followed by the s letters of v; hence uv= r + s=u+v. (b) Using (a) yieldsuv=u+v=v+u=vu. 12.5. State the difference between the free semigroup on an alphabet A and the free monoid on A. The free semigroup on A is the set of all nonempty words in A under the operation of concatenation; it does not include the empty wordl. On the other hand, the free monoid on A does include the empty word λ.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 315 LANGUAGES 12.6. Let A=a,b. Describe verbally the following languages over A (which are subsets of A): s t m r 2 m 3 (a) L =(ab) m 0;(b) L =a ba ba r, s, t ≥ 0;(c) L =a b a m 0. 1 2 3 (a) L consists of words w = ababab···ab, that is, beginning with a, alternating with b, and ending with b. 1 (b) L consists of all words with exactly two b’s. 2 2 3 (c) L consists of all words beginning with a and ending with a with one or more b’s between them. 3 2 2 12.7. Let K=a,ab,a and L=b ,aba be languages over A=a,b. Find: (a) KL;(b) LL. 2 2 3 2 2 3 (a) Concatenate words in K with words in L to obtain KL=ab ,a ba,ab ,ababa,a b ,a ba. 4 2 2 2 (b) Concatenate words in L with words in L to obtain LL=b ,b aba,abab ,aba ba. 0 3 −2 12.8. Consider the language L=ab,c over A=a,b,c. Find: (a) L ;(b) L ;(c) L . 0 (a) L =l, by definition. (b) Form all three-word sequences from L to obtain: 3 2 2 3 L =ababab,ababc,abcab,abc ,cabab,cabc,c ab,c (c) The negative power of a language is not defined. ∗ 2 3 12.9. Let A=a,b,c. Find L where: (a) L=b ;(b) L=a,b;(c) L=a,b,c . ∗ n (a) L consists of all words b where n is even (including the empty wordl). ∗ (b) L consists of words in a and b. ∗ (c) L consists of all words from A with the property that the length of each maximal subword composed entirely of c’s is divisible by 3. 12.10. Consider a countable alphabet A=a ,a ,.... Let L be the language over A consisting of those words 1 2 k w such that the sum of the subscripts of the letters in w is equal to k. (For example, w = a a a a a 2 3 3 6 4 belongs to L .) (a) Find L.(b) Show that L is finite. (c) Show that A is countable. (d) Show 18 4 k that any language over A is countable. (a) No word in L can have more than four letters, and no letter a withn 4 can be used. 4 n Thus we obtain the following list: a a a a,a a a,a a a,a a a,a a,a a,a a,a 1 1 1 1 1 1 2 1 2 1 2 1 1 1 3 3 1 2 2 4 (b) Only a finite number of the a’s, that is, a , a ,...,a , can be used in L and no word in L can have 1 2 k k k more than k letters. Thus L is finite. k (c) A is the countable union of the finite sets L ; hence A is countable. k (d) L is a subset of the countable set A; hence L is also countable. REGULAR EXPRESSIONS, REGULAR LANGUAGES 12.11. Let A=a,b. Describe the language L(r) where: ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ (a) r = abb a; (b) r = b ab ab ; (c) r = a ∨ b ; (d) r = ab ∧ a . (a) L(r) consists of all words beginning and ending in a and enclosing one or more b’s. (b) L(r) consists of all words with exactly two a’s. 2 2 (c) L(r) consists of all words only in a or only in b, that is, L(r)=l,a,a ,··· ,b,b ,··· (d) Here r is not a regular expression since∧ is not one of the symbols used in forming regular expressions.316 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 12.12. Let A=a,b,c and let w = abc. State whether or not w belongs to L(r) where: ∗ ∗ ∗ ∗ (a) r = a ∨ (b∨ c) ; (b) r = a (b∨ c) . (a) No. Here L(r) consists of word in a or words in b and c. ∗ ∗ (b) Yes, since a ∈ L(a) and bc ∈ (b∨ c) . 12.13. Let A=a,b. Find a regular expression r such that L(r) consists of all words w where: 2 2 (a) w begins with a and ends with b;(b) w contains an even number of a’s. 2 ∗ 2 ∗ (a Let r = a (a∨ b) b . (Note (a∨ b) consists of all words on A.) ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ (b) Note s = b ab ab consists of all words with exactly two a’s. Then let r = s = (b ab ab ) . FINITE AUTOMATA 12.14. Let M be the automaton with the following input set A, state set S with initial state s , and accepting 0 (“yes”) state set Y: A=a,b,S=s ,s ,s ,Y =s 0 1 2 2 Suppose next state function F of M is given by the table in Fig.12-7(a). (a) Draw the state diagram D = D(M) of M. (b) Describe the language L = L(M) accepted by M. (a) The state diagram D appears in Fig.12.7(b). The vertices of D are the states, and a double circle indicates an accepting state. IfF(s ,x)= s , then there is a directed edge from s to s labeled by the input symbol x. Also, j k j k there is a special arrow which terminates at the initial state s . 0 (b) L(M) consists of all words w with exactly one b. Specifically, if an input word w has no b’s, then it terminates in s and if w has two or more b’s then it terminates in s . Otherwise w terminates in s , which is the only accepting 0 2 1 state. Fig. 12-7 12.15. Let A=a,b. Construct an automaton M which will accept precisely those words from A which have an even number of a’s. For example, aababbab, aa, bbb, ababaa will be accepted by M, but ababa, aaa, bbabb will be rejected by M. We need only two states, s and s . We assume that M is in state s or s according as the number of a’s up to 0 1 0 1 the given step is even or odd. (Thus s is an accepting state, but s is a rejecting state.) Then only a will change the 0 1 state. Also, s is the initial state. The state diagram of M is shown in Fig.12-8(a). 0 12.16. Let A=a,b. Construct an automaton M which will accept those words from A which begin with an a followed by (zero or more) b’s. The automaton M appears in Fig.12-8(b).CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 317 Fig. 12-8 12.17. Describe the words w in the language L accepted by the automaton M in Fig.12-9(a). The system can reach the accepting state s only when there exists an a in w which follows a b. 2 Fig. 12-9 12.18. Describe the words w in the language L accepted by the automaton M in Fig.12-9(b). Each a in w does not change the state of the system, whereas each b in w changes the state from R,to s i i+1 (modulo 4). Thus w is accepted by M if the number n of b’s in w is congruent to 3 modulo 4, that is, where n= 3,7,11,···. 12.19. Suppose L is a language over A which is accepted by the automaton M = (A,S,Y,s ,F). Find an 0 C automaton N which accepts L , that is, those words from A which do not belong to L. SimplyinterchangetheacceptingandrejectingstatesinM toobtainN. Thenw willbeacceptedinthenewmachine C N if and only if w is rejected in M, that is, if and only if w belongs to L . Formally, N = (A,S,S\Y,s ,F). 0 12.20. Let M = (A,S,Y,s ,F) and M = (A, S ,Y ,S ,F ) be automata over the same alphabet A which 0 0 accept the languages L(M) and L(M ) over A, respectively. Construct an automaton N over A which accepts precisely L(M)∩ L(M ). Let S× S be the set of states of N. Let (s, s ) be an accepting state of N if both s and s are accepting states in M andM ,respectively. Let(s , s )betheinitialstateofN. Letthenext-statefunctionofN, G: (S×S )×A → (S×S ), 0 0 be defined by: G((s, s ), a) = (F(s, a), F (s ,a)) Then N will accept precisely those words in L(M)∩ L(M ).318 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 12.21. Repeat Problem 12.20 except now let N accept precisely L(M)∪ L(M ). Again, let S× S be the set of states of N and let (s , s ) be the initial state of N. Now let (S× Y )∪ (Y × S ) be 0 0 the accepting states in N. The next-state function G is again defined by G((s, s ), a)= (F(s, a), F (s ,a)) Then N will accept precisely those words in L(M)∪ L(M ). GRAMMARS 12.22. Define: (a) context-free grammar; (b) regular grammar. (a) A context-free grammar is the same as a Type 2 grammar, that is, every production is of the form A → β, that is, the left side is a single variable and the right side is a word in one or more symbols. (b) A regular grammar is the same as a Type 3 grammar, that is, every production is of the form A → a or of the form A→ aB, that is, the left side is a single variable and the right side is either a single terminal or a terminal followed by a variable. 12.23. Find the language L(G) generated by the grammar G with variables S, A, B, terminals a, b, and produc- tions S → aB, B → b, B → bA, A→ aB. Observe that we can only use the first production once since the start symbol S does not appear anywhere else. Also, we can only obtain a terminal word by finally using the second production. Otherwise we alternately add a’s and b’s using the third and fourth productions. Accordingly, n L(G)=(ab) = ababab···ab n∈ N 12.24. Let L be the language on A=a,b which consists of all words w with exactly one b, that is, r s r s L=b,a b,ba ,a ba r 0,s 0 (a) Find a regular expression r such that L= L(r). (b) Find a regular grammar G which generates the language L. ∗ ∗ (a) Let r = a ba . Then L(r)= L. (b) The regular grammar G with the following productions generates L: S → (b, aA), A→ (b, aA, bB), B → (a, aB) That is, the letter b can only appear once in any word derived from S. G is regular since it has the required form. 12.25. Let G be the regular grammar with productions: S → aA, A→ aB, B → bB, B → A. (a) Find the derivation tree of the word w = aaba. (b) Describe all words w in the language L generated by G. (a) Note first that w can be derived from S as follows: S ⇒ aA⇒ a(aB) ⇒ aa(bB)⇒ aaba Figure 12-10(a) shows the corresponding derivation tree. r (b) Using the production 1, then 2, then 3, r times, and finally 4 will derive the word w = aab a where r ≥ 0. No other word can be derived from S. 12.26. Figure 12-10(b) is the derivation tree of a word w in the language L of a context-free grammar G. (a) Find w.(b) Which terminals, variables, and productions must lie in G? (a) The sequence of leaves from left to right yields the word w= ababbbba.CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 319 Fig. 12-10 (b) The leaves show that a and b must be terminals, and the internal vertices show that S and A must be variables with S the starting variable. The children of each variable show that S → AbS, A → aS, S → ba, and A → b must be productions. 12.27. Does a derivation tree exist for any word w derived from the start symbol S in a grammar G? No. Derivation trees only exist for Type 2 and 3 grammars, that is, for context-free and regular grammars. 12.28. Determine the type of grammar G which consists of the following productions: (a) S → aA, A→ aAB, B → b, A→ a (b) S → aAB, AB → bB, B → b, A→ aB (c) S → aAB, AB → a, A→ b, B → AB (d) S → aB, B → bA, B → b, B → a, A→ aB, A→ a (a) Each production is of the form A→ α; hence G is a context-free or Type 2 grammar. (b) The length of the left side of each production does not exceed the length of the right side; hence G is a Type 1 grammar. (c) The production AB → a means G is a Type 0 grammar. (d) G is a regular or Type 3 grammar since each production has the form A→ a or A→ aB. 12.29. Rewrite each grammar G in Problem 12.28 in Backus-Naur form. The Backus-Naur form only applies to context-free grammars (which includes regular grammars). Thus only (a) and (d) can be written in Backus-Naur form. The form is obtained as follows: (i) Replace→ by :: =. (ii) Enclose nonterminals in brackets). (iii) All productions with the same left-hand side are combined in one statement with all the right-hand sides listed on the right of :: = separated by vertical bars. Accordingly: (a) )S::= a)A, )A::= a)A)B a, )B::= b (b) )S::= a)B, )B::= b)A b a, )A::= a)B a Supplementary Problems WORDS 2 2 3 2 2 12.30. Consider the words u = ab a and v = aba b . Find: (a) uv; (b) vu; (c) u ; (d)lu; (e) vlv. 2 3 2 2 2 12.31. For the words u= ab a and v = aba b , find: u,v,uv,vu, andv . 12.32. Let w = abcde. (a) Find all subwords of w. (b) Which of them are initial segments? 12.33. Suppose u= a a ··· a and the a are distinct. Find the number n of subwords of u. 1 2 r k320 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 LANGUAGES 2 2 12.34. Let L=a ,ab and K=a,ab,b . Find: (a) LK; (b) KL; (c) L∨ K; (d) K ∧ L. 2 0 2 3 12.35. Let L=a ,ab. Find: (a) L ; (b) L ; (c) L . 2 2 2 3 12.36. Let A=a,b,c. Describe L if: (a) L=a ; (b) L=a,b ; (c) L=a,b ,c . 2 ∗ ∗ 2 12.37. Does (L ) = (L ) ? If not, how are they related? 12.38. Consider a countable alphabet A=a ,a ,···. Let L the language over A consisting of those words w such that 1 2 k the sum of the subscripts of the letters in w is equal to k. (See Problem 12.10.) Find: (a) L;(b) L . 3 5 REGULAR EXPRESSIONS, REGULAR LANGUAGES 12.39. Let A=a,b,c. Describe the language L(r) for each of the following regular expressions: ∗ ∗ ∗ (a) r = ab c; (b) r = (ab∨ c) ; (c) r = ab∨ c . 12.40. Let A=a,b. Find a regular expression r such that L(r) consists of all words w where: (a) w contains exactly three a’s. (b) The number of a’s is divisible by 3. 12.41. Let A=a,b,c and let w = ac. State whether or not w belongs to L(r) where: ∗ ∗ ∗ ∗ ∗ (a)r = a bc ; (b)r = a b c; (c)r = (ab∨ c) 12.42. Let A=a,b,c and let w = abc. State whether or not w belongs to L(r) where: ∗ ∗ ∗ ∗ ∗ 2 ∗ (a)r = ab (bc) ; (b)r = a ∨ (b∨ c) ; (c) r = a b(bc∨ c ) . FINITE AUTOMATA 12.43. Let A=a,b. Construct an automaton M such that L(M) will consist of those words w where: (a) the number of b’s is divisible by 3. (b) w begins with a and ends in b. 12.44. Let A=a,b. Construct an automaton M which will accept the language: s r r s (a) L(M)=b ab r 0,s 0; (b) L(M)=a b r 0,s 0. 12.45. Let A=a,b. Construct an automaton M such that L(M) will consist of those words where the number of a’s is divisible by 2 and the number of b’s is divisible by 3. (Hint: Use Problems 12.15, 12.43(a), and 12.20.) 12.46. Find the language L(M) accepted by the automaton M in Fig.12-11. Fig. 12-11CHAP. 12 LANGUAGES, AUTOMATA, GRAMMARS 321 GRAMMARS 12.47. Determine the type of grammar G which consists of the productions: (a)S → aAB; S → AB; A → a; B → b (b)S → aB; B → AB; aA→ b; A→ a; B → b (c)S → aB; B → bB; B → bA; A→ a; B → b 12.48. Find a regular grammar G which generates the language L which consists of all words in a and b such that no two a’s appear next to each other. 12.49. Find a context-free grammar G which generates the language L which consists of all words in a and b with twice as many a’s as b’s. 12.50. Find a grammar G which generates the language L which consists of all words in a and b with an even number of a’s. n n 12.51. Find a grammar G which generates the language L which consists of all words of the form a ba with n≥ 0. 12.52. Show that the language G in Problem 12.51 is not regular. (Hint: Use the Pumping Lemma.) 12.53. Describe the language L= L(G) where G has the productions S → aA, A → bbA, A → c. 12.54. Describe the language L= L(G) where G has the productions S → aSb, Sb → bA, abA→ c. 12.55. Write each grammar G in Problem 12.47 in Backus-Naur form. 12.56. Let G be the context-free grammar with productions S → (a, aAS) and A→ bS. (a) Write G in Backus-Naur form. (b) Find the derivation tree of the word w= abaa. 12.57. Figure 12-12 is the derivation tree of a word w in a language L of a context-free grammar G. (a) Find w. (b) Which terminals, variables, and productions must belong to G? Fig. 12-12 Answers to Supplementary Problems (c)Allwordsina,b,c witheachpowerof b evenand each power of c a multiple of 3. 2 2 2 2 4 2 2 3 12.30. (a) uv = ab a ba b ; (b) vu = aba b ab a ; 2 ∗ ∗ 2 2 2 4 2 3 12.37. No. (L ) ⊆ (L ) . (c) u = ab a b a ; (d)lu= u; 2 2 2 2 2 (e) vlv = v = aba b aba b . 12.38. (a) a a a , a a , a a a (b) a a a a a , a a a a , 1 1 1 1 2 2 1 3 1 1 1 1 1 1 1 1 2 12.31. 6, 6, 12, 12, 12. a a a a , a a a a , a a a a , a a a , a a a , 1 1 2 1 1 2 1 1 2 1 1 1 1 1 3 1 3 1 a a a , a a , a a , a a , a a , a 12.32. (a) l,a, b, c, d, e, ab, bc, cd, de, abc, bcd, cde, 3 1 1 2 3 3 2 1 4 4 1 5 n abcd, bcde, w= abcde. 12.39. (a) L(r)=ab c n ≥ 0. (b) All words in x and c n (b)l,a,ab,abc,abcd,w = abcde. where x = ab. (c) L(r)= ab∪c n≥ 0. ∗ ∗ ∗ ∗ ∗ ∗ 12.33. If u = l then n = 1; otherwise, n = 1+r + ∗ ∗ ∗ 12.40. (a) r = b ab ab ab ; (b) r = (b ab ab ab ) . (r − 1)+···+ 2+ 1= 1+ r(r + 1)/2. 12.41. (a) No; (b) yes; (c) no. 3 3 2 2 3 12.34. (a) LK=a ,a b, a b , aba, abab, ab ; 3 2 2 2 2 2 12.42. (a) Yes; (b) no; (c) no. (b) KL=a ,a b, aba , abab, b a ,b ab; 2 2 (c)L∨K=a ,ab,a,b ;(d)K∧L not defined. 12.43. See: (a) Fig.12-13(a); (b) Fig.12-13(b). 2 0 2 4 3 12.35. (a) L =l; (b) L =a ,a b, aba ,abab; 12.44. See: (a) Fig.12-14; (b) Fig.12-15(a). 2 4 3 3 6 5 3 3 (c) L =a ,a b, a ba ,a bab, aba ,aba b, 2 12.45. See: Fig.12-15(b). ababa , ababab. n 12.36. (a) L∗=a n is even. (b)All words w in a and b 12.46. L(M) consists of all words w which contain aabb as with only even powers of b. a subword.322 LANGUAGES, AUTOMATA, GRAMMARS CHAP. 12 Fig. 12-13 Fig. 12-14 Fig. 12-15 12.47. (a) Type 2; (b) Type 0; (c) Type 3. (c))S::= a)B,)B::= b)B b)A,)A::= a b. 12.48. S → (a,b,aB,bA), A → (bA, ab, a, b), B → 12.56. (a) )S ::= a a)A)S, )A ::= b)S; (b) See (b, bA). Fig. 12-16. 12.49. S → (AAB, ABA, BAA), A → (a, BAAA, ABAA, AABA, AAAB), B → (b, BBAA, BABA, aBAAB, ABAB, AABB). 12.50. S → (aA, bB), A → (aB, bA, a), B → (bB, aA, b) 12.51. S → (aSa, b). 2n 12.53. L=ab c n≥ 0 n n Fig. 12-16 12.54. L=a cb n 0 12.55. (a))S::= a)A)B)A)B,)A::= a, )B::= b. (b) Not defined for Type 0 language. 12.57. (a)w = aababa; (b)S → aAB, A → aB, B → ba.

Advise: Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.