Lab Manual for Compiler Design

lab manual for principles of compiler design and lab manual for compiler design for anna university
KomalMittal Profile Pic
KomalMittal,India,Teacher
Published Date:13-07-2017
Your Website URL(Optional)
Comment
Laboratory Manual for Compiler Design Robb T. KoetherLaboratory 1 Getting Started Key Concepts  The Cygwin window  Environment variables  The Java compiler  The JLex lexical analyzer generator  The CUP parser generator  The WinZip program Before the Lab Read Chapter 1 of Compilers: Principles, Techniques, and Tools. Preliminary In your folder in //hams-acad-fs/Students, create a folder named Coms 480. Keep all of your work for this course in this folder. Copy the folder Lab 01 from the Compiler Design CD to your folder. 1314 LABORATORY 1. GETTING STARTED 1.1 Introduction In this lab we will download and install a number of programs. The purpose of this is twofold. You will be more aware of the setup that we will be using and you will be able to set up the same software on your own computer. 1.2 The Cygwin Window Throughout this course we will use the Cygwin command window. You may use the DOS command window if you want, but I think it would be better if you gained experience with a UNIX-type system. An exception to this will be editing source text. The UNIX editors are truly awful. Even though there is some bene t in learning how to use them, I recommend that you use CodeWarrior to edit your Java source les in this course. Cygwin creates a UNIX-type environment for Windows. A large number of the standard UNIX commands are available. Cygwin has already been installed on your computer. You will see the Cygwin icon on the desktop, so we will not download it now. However, to download Cygwin, go to the web site http://www.cygwin.com/ If you download Cygwin in your room, you should go to this web page and click on the install icon. The program setup.exe will be downloaded. When you run the setup program, one of the familiar installer programs will start up, asking youseveral questions. Generally, you should go with the defaults. However, when you choose which packages to install, be sure to select \install" for the development (Devel) package. You will get a minimal install plus the development tools. The program will go through three stages: downloading, installing, and executing. These stages should take roughly 25 minutes, 10 minutes, and 1 minute, respectively. If the installation fails, then try again. After Cygwin is installed, double-click on the Cygwin icon on the desktop to start the Cygwin window. Then right-click on the title bar and select Properties. You may change the font, the size of the window, and the colors. My preference is to choose a small font (12 pt) and then make the window as wide and as tall as possible. Type the command pwd (print working directory) to see the pathname of the current directory.15 Type the command dir (directory) to see the names of all the les in the current directory. Choose the name of one of the subdirectories in the current directory and type the command cd directory-name where directory-name is the name of the subdirectory that you chose. Now repeat the commands pwd and dir. Type the command cd .. to return to the previous directory. A single period (.) refers to the current directory and two periods (..) refers to the parent directory. For example, to move up two levels and then down to a directory called programs, you could type cd ../../programs 1.3 The HOME Environment Variable The Cygwin window has opened to a default directory. Use pwd to see what it is. You will nd it very convenient to set the HOME environment variable to your Coms 480 folder. Then Cygwin will always start there when you open a Cygwin window and you can always return there by typing cd To make this the home directory, bring up the System control panel. Click on the Advanced tab and then click on Environment Variables. In the top section, named User Variables, click New. Enter HOME as the Variable name and type the exact pathname of your Coms 480 directory as the Variable value. If you open a window to that directory, then you should be able to copy and paste the pathname. Be sure to use the backslash as a separator between directories. Then click OK (on all three windows) to save the settings. Now close the Cygwin window and open a new one. (This is necessary in order to reinitialize the environment variables.) This window should have opened to your folder. You can con rm that by typing pwd. From now on, Cygwin will begin in this folder.16 LABORATORY 1. GETTING STARTED 1.4 The Java Development Kit The latest version of the Java Development Kit (JDK) is update 1.6, release 11. Go to the website http://java.sun.com/javase/downloads/index.jsp Next to the title \Java SE Development Kit (JDK) 6 Update 11," click on Download. Follow the instructions through the next two web pages. (Do not download the Sun Download Manager.) When you are nished there should be a Java folder in the Program Files folder of the C drive. Inside one of the subfolders is the javac.exe compiler. 1.5 Running Java Programs Change the current directory to Lab 01 if you are not already there. I have written a Java program that prints "Hello, World" to standard output. It is in the Lab 01 folder that you downloaded. Type the command javac Hello.java You probably will get an error message, because the computer could not nd the Java compiler javac.exe. Open the System program on the Control Panel and go the Environment Variables window again. This time we must add a PATH variable. The PATH variable tells the computer where to nd executable les, including the Java compiler. You may have to search for the Java compiler. Indeed, there may be more than one on the computer. We will use the one in the folder named C:nProgram FilesnJavanjdk1.6.0 11nbin. Once you know where it is, then create the PATH variable with this pathname as its value, as you did the HOME variable earlier. Close the Cygwin window and open a new one. Now try again to compile the program. This time the program should compile. Type dir to see that the le Hello.class is in the directory. This is the compiled Java program. Now run the program by typing java Hello The program should print "Hello, world" This con rms that you are set up to run Java programs in the Cygwin window.17 1.6 The Java API Our programs will be written Java. If you know C++, then you should have no trouble picking up Java since it is quite similar. The two languages use mostly the same keywords, same constructs, and the same syntax. One major di erence, however, is that Java is heavily object-oriented. Every function must be a member function of some class. Another di erence is that Java comes with an extensive library of classes. Open Internet Explorer and go to the website http://java.sun.com/j2se/1.4.2/docs/api/ Save this as a bookmark. You will want to return here many times later in the course. This web site contains the documentation for all Java classes. For example, in the upper left frame, scroll down and click on java.lang. In the window below, the names of all the classes in the java.lang package appear. Click on Integer. To the right you see the information about the Integer class.  In what ways can an Integer be constructed?  How does one convert an Integer to a double? Another di erence between Java and C++ is that Java is weak on operators. Oper- ators are de ned only for the primitive objects: int, float, double, char, etc.  How do you compare one Integer to another?  How do you add two Integers and represent the result as an Integer? Remember, there are no operators + or for the Integer class 1.7 The JLex Java-based Lexical Analyzer Generator Be sure you are now in the Coms 480 folder (not Lab 01). Open another instance of Internet Explorer and go to the web site http://www.cs.princeton.edu/appel/modern/java/JLex/18 LABORATORY 1. GETTING STARTED This is the web site for JLex, the Java lexical analyzer generator. Click on Installa- tion Instructions. Read these instructions carefully. The directory \J" will be your Coms 480 folder. Create a subfolder named JLex. You can either do this in Windows or you can type the command mkdir JLex When you are ready, click on Source Code and save it in your JLex folder. The Java code for JLex (Main.java) should now be in the folder JLex. Move to the JLex directory and compile JLex by typing javac Main.java Now we will test our installation by using a test program available from the JLex web page. On the web page, click on Sample Input. Copy and paste the contents of the page into a new le in CodeWarrior and save it in Lab 01 as sample.lex. This le must be edited slightly in order to work on our PCs. In DOS les, each line ends with a return character \r followed by a newline character \n. In UNIX les, each line ends with only a newline character. Go to line 116 in sample.lex: YYINITIAL,COMMENT \n Immediately below this line, add a similar line, replacing \n with \r. Now type java JLex.Main sample.lex It probably did not work. The notation JLex.Main means the Main.class le in the JLex folder. The program javac could not nd the le Main.java. To solve this problem, we must de ne the CLASSPATH variable. The CLASSPATH variable tells the Java compiler where to look for Java source code les. Set this variable as before, setting the value of CLASSPATH to .;\\hams-acad-fs\Students\your-name\Coms 480 and restart the Cygwin window. Be sure to replace your-name with the name of your workspace in the Students directory. Note the initial dot (.). This refers to the current directory. Thus, Cygwin will search the current directory rst. The semicolon is a separator. Cygwin will next search19 \\hams-acad-fs\Students\your-name\Coms 480 Now type java JLex.Main sample.lex again. It should work. This creates the le sample.lex.java, which contains Java source code for the classes Sample, Utility, Yytoken, and Yylex. Then type javac sample.lex.java to compile this, creating the les Sample.class, Utility.class, Yytoken.class, and Yylex.class. You will get an error message (7 of them) about the assert keyword. This program creates an assert() function, but the latest versions of Java use assert as a keyword. Go back to the le and change assert to myAssert. Then recreate the le sample.lex.java and compile it with javac. Now we can test our program. Type java Sample Enter various lines of C code and see what the output is. Enter a line that contains an embedded /-style comment. When you are satis ed, type CTRL-Z (as many times as necessary) to indicate end of le. The program should terminate. 1.8 The CUP Java-based Parser Generator Open one more instance of Internet Explorer and go to the web site http://www2.cs.tum.edu/projects/cup/ This is the web site for CUP, the Java parser generator. CUP stands for Constructor of Useful Parsers. It also is a play on the java theme. Get it? CUP? Java? Cup of java? We will save the CUP les in a CUP directory. Create a CUP directory now as a subdirectory of your Coms 480 directory. On the web page, in the section \Archived versions," click on CUP 10k sourcecode release. This is the most recent stable version. Then click on Save. Save it in your CUP folder. This will download a zip le to be unzipped. All the les needed for CUP will be extracted. Double-click on20 LABORATORY 1. GETTING STARTED the le java cup v10k.tar.gz. Follow the Zip instructions. You should save the extracted les in your CUP folder. The Java classes in CUP are already compiled. We will now test CUP using a slightly modi ed version of the sample program that appears in the CUP User's Manual. This example creates a program that evaluates integer expressions involving +, -, , /, %, and parentheses. In the Lab 01 folder, there are the les scanner.java, grammar.cup, and Evaluator.java. Type the command java java_cup.Main grammar.cup Again, this probably did not work. The lename java cup.Main refers to the le Main.class in the subdirectory java cup of the CUP directory. Java does not know to look in the CUP directory for Java class les. Therefore, we must add the pathname of the CUP directory to the paths to be searched in the CLASSPATH environment variable. Make this change, close the Cygwin window, and open a new one. Try again to execute the command java java_cup.Main grammar.cup It should work. This will create the Java source les parser.java and sym.java from the grammar le grammar.cup. Compile these two les and then compile the les scanner.java and Evaluator.java. Run the evaluator program by typing java Evaluator The program accepts keyboard input. Type in an integer expression such as 2 + 3 4; Be sure to end the expression with a semicolon. The program will print the value of the expression. When you are nished, type CTRL-Z.21 1.9 Zipping Files Most of your assignment will involve a large number of source les. Rather than drag each one individually to the dropbox, you will place your work in a folder, zip the folder into a zip le, and drop the zip le in the dropbox. I will unzip it and test it. Let's test WinZip. We will zip the les used in the last two examples, namely, the les Sample.class, Utility.class, Yylex.class, Yytoken.class, scanner.java, parser.java, sym.java, and Evaluator.java, save the le as Lab 01.zip, and then unzip them into a test folder. To do this, start up WinZip and follow the instructions. Use the Wizard version of WinZip. Select Create a new Zip le. Give it the name Lab 01. Add the speci ed les by repeatedly clicking Add les. . . and selecting the les. Have the output directed to your Coms 480 folder. Then click Zip Now and exit WinZip. Next create a folder named Test in which to put the extracted les. Double-click on the zip le and follow the instructions to extract the les. Direct the output to folder Test. Open Test to verify that the original les are there. Now test the results by recompiling the les (be sure to change the directory to Test in Cygwin) and running sample and Evaluator again. 1.10 Assignment Turn in the le Lab 01.zip.22 LABORATORY 1. GETTING STARTEDPart II Lexical Analysis 23Laboratory 2 Writing a Lexical Analyzer Key Concepts  Lexical Analyzers  Redirected input and output  Make les Before the Lab Read Sections 3.1 - 3.4 of Compilers: Principles, Techniques, and Tools. Preliminary Copy the folder Lab 02 from the Compiler Design CD to your Coms 480 folder. 2.1 Introduction In this lab we will create a lexical analyzer that will return the tokens that occur in statements of the form id =expr; where expr is an in x expression consisting of integers, identi ers, parentheses, and the operators +, -, , and /. Once we understand how this is done, we will be able to create a lexer for a larger set of C tokens. 2526 LABORATORY 2. WRITING A LEXICAL ANALYZER 2.2 Make les Open the le Makefile. A make le consists mainly of a list of dependencies and actions. A dependency is written in the form target: sources action where target is a le name and sources is a list of le names. (The tab before the action part is mandatory.) This means that the target le depends on the source les. Whenever any of the source les is updated, then the target le will be updated by performing the action. This is all governed by the timestamps on the les. The make le for Lab 2 contains the dependencies among the les used by this program. In this case, it is very simple: there are only four les and two depen- dencies. The le MyLexer.class depends on the le MyLexer.java and the le Token.class depends (in the same way) on the le Token.java. That means that whenever MyLexer.java is updated, i.e., modi ed, then MyLexer.class should also be updated and whenever Token.java is updated, then Token.class should also be updated. In the make le, the line MyLexer.class: MyLexer.java expresses that dependency. The line below that, javac MyLexer.java describes the action to be taken whenever the timestamp of MyLexer.java is more recent than the timestamp of MyLexer.class. Note that this line begins with a mandatory tab. A similar pair of lines appears for Token.class and Token.java. As our programs become more and more complicated, you will come to appreciate the make les more and more. In Lab 3 the make le will be more sophisticated. To invoke the make le, type the command make Try this now. You should see that the Java compiler is invoked and MyLexer and Token are compiled. Type the command again and you will see that it says that MyLexer is up to date, so it does not recompile it. Now we will delete the le MyLexer.class and rebuild it. Type the command27 rm MyLexer.class This removes the le. Now execute the make command. Let's do it one more time. This time we will not remove MyLexer.class, but merely change the timestamp of MyLexer.java. The touch command will change the timestamp of a le to the current time. Type touch MyLexer.java and then type the make command again. 2.3 Running the Lexer Open the leMyLexer.java in CodeWarrior. This is a Java program that nds certain tokens in the input stream. Currently if nds only positive integers, plus signs (+), and times signs (). Look in the Lab 02 folder and see that there is now a le named MyLexer.class. This is the compiled bytecode version of MyLexer.java. Now we will run MyLexer. Type java MyLexer The program expects input from the keyboard, so type 123 + 456 789 The program should nd the ve tokens in this expression. Then type CTRL-Z and press return again. CTRL-Z is interpreted as end-of- le. This program does not recognize any grammar rules; those will come later. There- fore, any string of legal tokens will be processed correctly. For example, try 123456++++ It also skips white space, so it will accept the input 1 23 45 6 +++ + We wish to expand the lexer so that it will recognize input such as avg_grade = (3test + 2exam)/5; Run MyLexer again, using this input to see what the output is.28 LABORATORY 2. WRITING A LEXICAL ANALYZER 2.4 Understanding the Lexer Before improving the program, let's take a look at how it works. First we will look at the tokens. Open the le Token.java. The purpose of the Token class is to provide a list of symbolic constants to be used by the lexer. The Token class also provides a set of strings so that we can print the name of the token in an readable form. Now look in the le MyLexer.java. The program creates a BufferedReader object named source. Look at the function getNextChar(). It gets an integer iVal from source and then converts it to a character cVal. Look at the function advance(). Once a character has been processed, we place it in a character bu er and read another character. The purpose of the character bu er is to store the \value" of the token as a character string for use in the cases when the token is an identi er or a number. For example, if the token is radius, then the type is ID and the value is "radius", and if the token is 123, then the type is NUM and the value is "123". To clear the bu er, we simply set charCnt to 0. The main function initializes the lexer and then processes tokens by repeatedly calling next token() until it returns the EOF token. Note the use of the expression new String(buffer, 0, charCnt) to convert the contents of the bu er to a String. Go to the Java web page, access the String page, and look at the String constructors. The URL is http://java.sun.com/j2se/1.4.2/docs/api/ Find the constructor that is used here. You should develop the habit of referring to the Java API pages as often as necessary, i.e., often. You will nd the answers to many of your Java questions there. The heart of the MyLexer class is the next token() function. By looking at the current character cVal, next token() is able to decide which type of token is being read. It processes the token and returns the token type. 2.5 Assignment For the \toy" lexer that we are building in this lab, the complete set of tokens is shown in Table 2.1.29 Token Symbolic Name Description identi er ID a letter followed by zero or more letters, digits, and underscores number NUM one or more digits + PLUS - MINUS TIMES / DIVIDE ( LPAREN ) RPAREN = ASSIGN ; SEMI Table 2.1: Tokens used in Lab 2 Complete the MyLexer class by adding code that will recognize each of these types of token. After you have nished the lexer, test it on the le testfile. The lexer uses standard input (keyboard) and standard output (monitor), but you may redirect them to les. To read input from the le testfile, type java MyLexer testfile To redirect output to a le named, say, outfile.txt, type java MyLexer outfile.txt Or you can do both at the same time. java MyLexer testfile outfile.txt Type this command and then open outfile.txt in CodeWarrior or Notepad and inspect it. When the output is complicated, this method allows you to inspect it at your leisure. Or you can print it and inspect it later. Zip the les MyLexer.java, Token.java, and Makefile in a folder named Lab 02 and drop it in the dropbox.30 LABORATORY 2. WRITING A LEXICAL ANALYZERLaboratory 3 A Lexical Analyzer using JLex Key Concepts  Lexical analyzer generators  JLex  The JLex interface  Regular expressions Before the Lab Read Sections 3.5 - 3.9 of Compilers: Principles, Techniques, and Tools. Also read the JLex User's Manual. Preliminary Copy the folder Lab 03 from the Compiler Design CD to your folder. 3.1 Introduction In this lab we will create the same lexical analyzer as in Lab 2, but by using JLex and a le tokens.lex. The le tokens.lex contains the rules for recognizing tokens and the actions to take for each kind of token. JLex will use these rules to build a Java program that will be a lexical analyzer. The rules in the le tokens.lex are regular expressions and the lexical analyzer that JLex builds is a DFA. By inspecting 31