Basics of Compiler Design
Anniversary edition

Torben Ægidius Mogensen

DEPARTMENT OF COMPUTER SCIENCE
UNIVERSITY OF COPENHAGEN

Published through lulu.com.
© Torben Ægidius Mogensen 2000 – 2010
torbenm@diku.dk
Department of Computer Science, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen, DENMARK
Book homepage: http://www.diku.dk/torbenm/Basics
First published 2000. This edition: August 20, 2010.
ISBN 978-87-993154-0-6

Contents

1  Introduction
   1.1  What is a compiler?
   1.2  The phases of a compiler
   1.3  Interpreters
   1.4  Why learn about compilers?
   1.5  The structure of this book
   1.6  To the lecturer
   1.7  Acknowledgements
   1.8  Permission to use

2  Lexical Analysis
   2.1  Introduction
   2.2  Regular expressions
        2.2.1  Shorthands
        2.2.2  Examples
   2.3  Nondeterministic finite automata
   2.4  Converting a regular expression to an NFA
        2.4.1  Optimisations
   2.5  Deterministic finite automata
   2.6  Converting an NFA to a DFA
        2.6.1  Solving set equations
        2.6.2  The subset construction
   2.7  Size versus speed
   2.8  Minimisation of DFAs
        2.8.1  Example
        2.8.2  Dead states
   2.9  Lexers and lexer generators
        2.9.1  Lexer generators
   2.10 Properties of regular languages
        2.10.1  Relative expressive power
        2.10.2  Limits to expressive power
        2.10.3  Closure properties
   2.11 Further reading
   Exercises

3  Syntax Analysis
   3.1  Introduction
   3.2  Context-free grammars
        3.2.1  How to write context free grammars
   3.3  Derivation
        3.3.1  Syntax trees and ambiguity
   3.4  Operator precedence
        3.4.1  Rewriting ambiguous expression grammars
   3.5  Other sources of ambiguity
   3.6  Syntax analysis
   3.7  Predictive parsing
   3.8  Nullable and FIRST
   3.9  Predictive parsing revisited
   3.10 FOLLOW
   3.11 A larger example
   3.12 LL(1) parsing
        3.12.1  Recursive descent
        3.12.2  Table-driven LL(1) parsing
        3.12.3  Conflicts
   3.13 Rewriting a grammar for LL(1) parsing
        3.13.1  Eliminating left-recursion
        3.13.2  Left-factorisation
        3.13.3  Construction of LL(1) parsers summarized
   3.14 SLR parsing
   3.15 Constructing SLR parse tables
        3.15.1  Conflicts in SLR parse-tables
   3.16 Using precedence rules in LR parse tables
   3.17 Using LR-parser generators
        3.17.1  Declarations and actions
        3.17.2  Abstract syntax
        3.17.3  Conflict handling in parser generators
   3.18 Properties of context-free languages
   3.19 Further reading
   Exercises

4  Scopes and Symbol Tables
   4.1  Introduction
   4.2  Symbol tables
        4.2.1  Implementation of symbol tables
        4.2.2  Simple persistent symbol tables
        4.2.3  A simple imperative symbol table
        4.2.4  Efficiency issues
        4.2.5  Shared or separate name spaces
   4.3  Further reading
   Exercises

5  Interpretation
   5.1  Introduction
   5.2  The structure of an interpreter
   5.3  A small example language
   5.4  An interpreter for the example language
        5.4.1  Evaluating expressions
        5.4.2  Interpreting function calls
        5.4.3  Interpreting a program
   5.5  Advantages and disadvantages of interpretation
   5.6  Further reading
   Exercises

6  Type Checking
   6.1  Introduction
   6.2  The design space of types
   6.3  Attributes
   6.4  Environments for type checking
   6.5  Type checking expressions
   6.6  Type checking of function declarations
   6.7  Type checking a program
   6.8  Advanced type checking
   6.9  Further reading
   Exercises

7  Intermediate-Code Generation
   7.1  Introduction
   7.2  Choosing an intermediate language
   7.3  The intermediate language
   7.4  Syntax-directed translation
   7.5  Generating code from expressions
        7.5.1  Examples of translation
   7.6  Translating statements
   7.7  Logical operators
        7.7.1  Sequential logical operators
   7.8  Advanced control statements
   7.9  Translating structured data
        7.9.1  Floating-point values
        7.9.2  Arrays
        7.9.3  Strings
        7.9.4  Records/structs and unions
   7.10 Translating declarations
        7.10.1  Example: Simple local declarations
   7.11 Further reading
   Exercises

8  Machine-Code Generation
   8.1  Introduction
   8.2  Conditional jumps
   8.3  Constants
   8.4  Exploiting complex instructions
        8.4.1  Two-address instructions
   8.5  Optimisations
   8.6  Further reading
   Exercises

9  Register Allocation
   9.1  Introduction
   9.2  Liveness
   9.3  Liveness analysis
   9.4  Interference
   9.5  Register allocation by graph colouring
   9.6  Spilling
   9.7  Heuristics
        9.7.1  Removing redundant moves
        9.7.2  Using explicit register numbers
   9.8  Further reading
   Exercises

10 Function calls
   10.1 Introduction
        10.1.1  The call stack
   10.2 Activation records
   10.3 Prologues, epilogues and call-sequences
   10.4 Caller-saves versus callee-saves
   10.5 Using registers to pass parameters
   10.6 Interaction with the register allocator
   10.7 Accessing non-local variables
        10.7.1  Global variables
        10.7.2  Call-by-reference parameters
        10.7.3  Nested scopes
   10.8 Variants
        10.8.1  Variable-sized frames
        10.8.2  Variable number of parameters
        10.8.3  Direction of stack-growth and position of FP
        10.8.4  Register stacks
        10.8.5  Functions as values
   10.9 Further reading
   Exercises

11 Analysis and optimisation
   11.1 Data-flow analysis
   11.2 Common subexpression elimination
        11.2.1  Available assignments
        11.2.2  Example of available-assignments analysis
        11.2.3  Using available assignment analysis for common subexpression elimination
   11.3 Jump-to-jump elimination
   11.4 Index-check elimination
   11.5 Limitations of data-flow analyses
   11.6 Loop optimisations
        11.6.1  Code hoisting
        11.6.2  Memory prefetching
   11.7 Optimisations for function calls
        11.7.1  Inlining
        11.7.2  Tail-call optimisation
   11.8 Specialisation
   11.9 Further reading
   Exercises

12 Memory management
   12.1 Introduction
   12.2 Static allocation
        12.2.1  Limitations
   12.3 Stack allocation
   12.4 Heap allocation
   12.5 Manual memory management
        12.5.1  A simple implementation of malloc() and free()
        12.5.2  Joining freed blocks
        12.5.3  Sorting by block size
        12.5.4  Summary of manual memory management
   12.6 Automatic memory management
   12.7 Reference counting
   12.8 Tracing garbage collectors
        12.8.1  Scan-sweep collection
        12.8.2  Two-space collection
        12.8.3  Generational and concurrent collectors
   12.9 Summary of automatic memory management
   12.10 Further reading
   Exercises

13 Bootstrapping a compiler
   13.1 Introduction
   13.2 Notation
   13.3 Compiling compilers
        13.3.1  Full bootstrap
   13.4 Further reading
   Exercises

A  Set notation and concepts
   A.1  Basic concepts and notation
        A.1.1  Operations and predicates
        A.1.2  Properties of set operations
   A.2  Set-builder notation
   A.3  Sets of sets
   A.4  Set equations
        A.4.1  Monotonic set functions
        A.4.2  Distributive functions
        A.4.3  Simultaneous equations
   Exercises

List of Figures

2.1  Regular expressions
2.2  Some algebraic properties of regular expressions
2.3  Example of an NFA
2.4  Constructing NFA fragments from regular expressions
2.5  NFA for the regular expression (a|b)*ac
2.6  Optimised NFA construction for regular expression shorthands
2.7  Optimised NFA for [0-9]+
2.8  Example of a DFA
2.9  DFA constructed from the NFA in figure 2.5
2.10 Non-minimal DFA
2.11 Minimal DFA
2.12 Combined NFA for several tokens
2.13 Combined DFA for several tokens
2.14 A 4-state NFA that gives 15 DFA states
3.1  From regular expressions to context free grammars
3.2  Simple expression grammar
3.3  Simple statement grammar
3.4  Example grammar
3.5  Derivation of the string aabbbcc using grammar 3.4
3.6  Leftmost derivation of the string aabbbcc using grammar 3.4
3.7  Syntax tree for the string aabbbcc using grammar 3.4
3.8  Alternative syntax tree for the string aabbbcc using grammar 3.4
3.9  Unambiguous version of grammar 3.4
3.10 Preferred syntax tree for 2+3*4 using grammar 3.2
3.11 Unambiguous expression grammar
3.12 Syntax tree for 2+3*4 using grammar 3.11
3.13 Unambiguous grammar for statements
3.14 Fixed-point iteration for calculation of Nullable
3.15 Fixed-point iteration for calculation of FIRST
3.16 Recursive descent parser for grammar 3.9
3.17 LL(1) table for grammar 3.9
3.18 Program for table-driven LL(1) parsing
3.19 Input and stack during table-driven LL(1) parsing
3.20 Removing left-recursion from grammar 3.11
3.21 Left-factorised grammar for conditionals
3.22 SLR table for grammar 3.9
3.23 Algorithm for SLR parsing
3.24 Example SLR parsing
3.25 Example grammar for SLR-table construction
3.26 NFAs for the productions in grammar 3.25
3.27 Epsilon-transitions added to figure 3.26
3.28 SLR DFA for grammar 3.9
3.29 Summary of SLR parse-table construction
3.30 Textual representation of NFA states
5.1  Example language for interpretation
5.2  Evaluating expressions
5.3  Evaluating a function call
5.4  Interpreting a program
6.1  The design space of types
6.2  Type checking of expressions
6.3  Type checking a function declaration
6.4  Type checking a program
7.1  The intermediate language
7.2  A simple expression language
7.3  Translating an expression
7.4  Statement language
7.5  Translation of statements
7.6  Translation of simple conditions
7.7  Example language with logical operators
7.8  Translation of sequential logical operators
7.9  Translation for one-dimensional arrays
7.10 A two-dimensional array
7.11 Translation of multi-dimensional arrays
7.12 Translation of simple declarations
8.1  Pattern/replacement pairs for a subset of the MIPS instruction set
9.1  Gen and kill sets
9.2  Example program for liveness analysis and register allocation
9.3  succ, gen and kill for the program in figure 9.2
9.4  Fixed-point iteration for liveness analysis
9.5  Interference graph for the program in figure 9.2
9.6  Algorithm 9.3 applied to the graph in figure 9.5
9.7  Program from figure 9.2 after spilling variable a
9.8  Interference graph for the program in figure 9.7
9.9  Colouring of the graph in figure 9.8
10.1 Simple activation record layout
10.2 Prologue and epilogue for the frame layout shown in figure 10.1
10.3 Call sequence for x := CALL f(a1,...,an) using the frame layout shown in figure 10.1
10.4 Activation record layout for callee-saves
10.5 Prologue and epilogue for callee-saves
10.6 Call sequence for x := CALL f(a1,...,an) for callee-saves
10.7 Possible division of registers for 16-register architecture
10.8 Activation record layout for the register division shown in figure 10.7
10.9 Prologue and epilogue for the register division shown in figure 10.7
10.10 Call sequence for x := CALL f(a1,...,an) for the register division shown in figure 10.7
10.11 Example of nested scopes in Pascal
10.12 Adding an explicit frame-pointer to the program from figure 10.11
10.13 Activation record with static link
10.14 Activation records for f and g from figure 10.11
11.1 Gen and kill sets for available assignments
11.2 Example program for available-assignments analysis
11.3 pred, gen and kill for the program in figure 11.2
11.4 Fixed-point iteration for available-assignment analysis
11.5 The program in figure 11.2 after common subexpression elimination
11.6 Equations for index-check elimination
11.7 Intermediate code for for-loop with index check
12.1 Operations on a free list

Chapter 1  Introduction

1.1 What is a compiler?

In order to reduce the complexity of designing and building computers, nearly all of these are made to execute relatively simple commands (but do so very quickly).
A program for a computer must be built by combining these very simple commands into a program in what is called machine language. Since this is a tedious and error-prone process, most programming is instead done using a high-level programming language. This language can be very different from the machine language that the computer can execute, so some means of bridging the gap is required. This is where the compiler comes in.

A compiler translates (or compiles) a program written in a high-level programming language that is suitable for human programmers into the low-level machine language that is required by computers. During this process, the compiler will also attempt to spot and report obvious programmer mistakes.

Using a high-level language for programming has a large impact on how fast programs can be developed. The main reasons for this are:

- Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
- The compiler can spot some obvious programming mistakes.
- Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.

Another advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, be brought to run on many different machines.

On the other hand, programs that are written in a high-level language and automatically translated to machine language may run somewhat slower than programs that are hand-coded in machine language. Hence, some time-critical programs are still written partly in machine language. A good compiler will, however, be able to get very close to the speed of hand-written machine code when translating well-structured programs.

1.2 The phases of a compiler

Since writing a compiler is a nontrivial task, it is a good idea to structure the work.
A typical way of doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these phases operate in sequence (though in practice, they are often interleaved), each phase (except the first) taking the output from the previous phase as its input. It is common to let each phase be handled by a separate module. Some of these modules are written by hand, while others may be generated from specifications. Often, some of the modules can be shared between several compilers.

A common division into phases is described below. In some compilers, the ordering of phases may differ slightly, some phases may be combined or split into several phases, or some extra phases may be inserted between those mentioned below.

Lexical analysis: This is the initial part of reading and analysing the program text: The text is read and divided into tokens, each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or number.

Syntax analysis: This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree-structure (called the syntax tree) that reflects the structure of the program. This phase is often called parsing.

Type checking: This phase analyses the syntax tree to determine if the program violates certain consistency requirements, e.g., if a variable is used but not declared, or if it is used in a context that does not make sense given the type of the variable, such as trying to use a boolean value as a function pointer.

Intermediate code generation: The program is translated to a simple machine-independent intermediate language.

Register allocation: The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.
Machine code generation: The intermediate language is translated to assembly language (a textual representation of machine code) for a specific machine architecture.

Assembly and linking: The assembly-language code is translated into binary representation, and addresses of variables, functions, etc., are determined.

The first three phases are collectively called the frontend of the compiler and the last three phases are collectively called the backend. The middle part of the compiler is in this context only the intermediate code generation, but this often includes various optimisations and transformations on the intermediate code.

Each phase, through checking and transformation, establishes stronger invariants on the things it passes on to the next, so that writing each subsequent phase is easier than if it had to take all the preceding phases into account. For example, the type checker can assume absence of syntax errors and the code generation can assume absence of type errors.

Assembly and linking are typically done by programs supplied by the machine or operating system vendor, and are hence not part of the compiler itself, so we will not discuss these phases further in this book.

1.3 Interpreters

An interpreter is another way of implementing a programming language. Interpretation shares many aspects with compiling: lexing, parsing and type-checking are done in an interpreter just as in a compiler. But instead of generating code from the syntax tree, the syntax tree is processed directly to evaluate expressions, execute statements, and so on. An interpreter may need to process the same piece of the syntax tree (for example, the body of a loop) many times and, hence, interpretation is typically slower than executing a compiled program.
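To make this idea concrete, here is a minimal sketch of such direct evaluation, written in Python with a hypothetical tuple representation of syntax trees (this is an illustration only, not the representation used later in this book):

```python
# Minimal tree-walking evaluator for arithmetic expressions.
# The tuple shapes ("num", n) and (op, left, right) are a hypothetical
# illustration of a syntax tree, invented for this sketch.

def eval_exp(tree):
    """Evaluate a tree built from ("num", n) leaves and
    (op, left, right) nodes with op in {"+", "-", "*"}."""
    if tree[0] == "num":
        return tree[1]
    op, left, right = tree
    a, b = eval_exp(left), eval_exp(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")

# The tree for 2 + 3 * 4 (precedence already resolved by the parser):
tree = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))
print(eval_exp(tree))  # prints 14
```

Note that the evaluator re-inspects node tags on every visit, so a loop body is re-analysed on every iteration; this per-node overhead is exactly why interpretation tends to be slower than running compiled code.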
But writing an interpreter is often simpler than writing a compiler, and an interpreter is easier to move to a different machine (see chapter 13), so for applications where speed is not of the essence, interpreters are often used.

Compilation and interpretation may be combined to implement a programming language: The compiler may produce intermediate-level code which is then interpreted rather than compiled to machine code. In some systems, there may even be parts of a program that are compiled to machine code, some parts that are compiled to intermediate code that is interpreted at runtime, while other parts may be kept as a syntax tree and interpreted directly. Each choice is a compromise between speed and space: Compiled code tends to be bigger than intermediate code, which tends to be bigger than syntax trees, but each step of translation improves running speed.

Using an interpreter is also useful during program development, where it is more important to be able to test a program modification quickly than to run the program efficiently. And since interpreters do less work on the program before execution starts, they are able to start running the program more quickly. Furthermore, since an interpreter works on a representation that is closer to the source code than compiled code is, error messages can be more precise and informative.

We will discuss interpreters briefly in chapters 5 and 13, but they are not the main focus of this book.

1.4 Why learn about compilers?

Few people will ever be required to write a compiler for a general-purpose language like C, Pascal or SML. So why do most computer science institutions offer compiler courses and often make these mandatory? Some typical reasons are:

a) It is considered a topic that you should know in order to be “well-cultured” in computer science.

b) A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.
c) The techniques used for constructing a compiler are useful for other purposes as well.

d) There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language.

The first of these reasons is somewhat dubious, though something can be said for “knowing your roots”, even in such a rapidly changing field as computer science.

Reason “b” is more convincing: Understanding how a compiler is built allows programmers to get an intuition about what their high-level programs will look like when compiled, and to use this intuition to tune programs for better efficiency. Furthermore, the error reports that compilers provide are often easier to understand when one knows about the different phases of compilation, such as knowing the difference between lexical errors, syntax errors, type errors and so on.

The third reason is also quite valid. In particular, the techniques used for reading (lexing and parsing) the text of a program and converting it into a form (abstract syntax) that is easily manipulated by a computer can be used to read and manipulate any kind of structured text, such as XML documents, address lists, etc.

Reason “d” is becoming more and more important as domain-specific languages (DSLs) gain in popularity. A DSL is a (typically small) language designed for a narrow class of problems. Examples are database query languages, text-formatting languages, scene description languages for ray-tracers and languages for setting up economic simulations. The target language for a compiler for a DSL may be traditional machine code, but it can also be another high-level language for which compilers already exist, a sequence of control signals for a machine, or formatted text and graphics in some printer-control language (e.g. PostScript). Even so, all DSL compilers will share similar front-ends for reading and analysing the program text.
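As a small illustration of how lexing techniques carry over to reading any structured text, the following Python sketch divides an input string into tokens using regular expressions; the token names and patterns here are invented for this example, not a lexer from this book:

```python
import re

# A tiny lexer sketch. Token kinds and their regular expressions are
# hypothetical illustrations chosen for this example.
TOKEN_SPEC = [
    ("NUM",   r"[0-9]+"),                  # integer literals
    ("IDENT", r"[a-zA-Z_][a-zA-Z0-9_]*"),  # variable names, keywords
    ("OP",    r"[+\-*/=]"),                # single-character operators
    ("SKIP",  r"[ \t]+"),                  # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Divide the input text into (kind, lexeme) tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":  # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("x = 2 + 34"))
# [('IDENT', 'x'), ('OP', '='), ('NUM', '2'), ('OP', '+'), ('NUM', '34')]
```

The same pattern-per-token-kind approach underlies generated lexers (chapter 2 shows how such specifications are compiled into finite automata rather than tried one by one at runtime).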
Hence, the methods needed to make a compiler front-end are more widely applicable than the methods needed to make a compiler back-end, but the latter is more important for understanding how a program is executed on a machine.

1.5 The structure of this book

The first part of the book describes the methods and tools required to read program text and convert it into a form suitable for computer manipulation. This process is done in two stages: A lexical analysis stage that basically divides the input text into a list of “words”. This is followed by a syntax analysis (or parsing) stage that analyses the way these words form structures and converts the text into a data structure that reflects the textual structure. Lexical analysis is covered in chapter 2 and syntactical analysis in chapter 3.

The second part of the book (chapters 4–10) covers the middle part and back-end of interpreters and compilers. Chapter 4 covers how definitions and uses of names (identifiers) are connected through symbol tables. Chapter 5 shows how you can implement a simple programming language by writing an interpreter, and notes that this gives a considerable overhead that can be reduced by doing more things before executing the program, which leads to the following chapters about static type checking (chapter 6) and compilation (chapters 7–10). In chapter 7, it is shown how expressions and statements can be compiled into an intermediate language, a language that is close to machine language but hides machine-specific details. In chapter 8, it is discussed how the intermediate language can be converted into “real” machine code. Doing this well requires that the registers in the processor are used to store the values of variables, which is achieved by a register allocation process, as described in chapter 9. Up to this point, a “program” has been what corresponds to the body of a single procedure.
Procedure calls and nested procedure declarations add some issues, which are discussed in chapter 10. Chapter 11 deals with analysis and optimisation, and chapter 12 is about allocating and freeing memory. Finally, chapter 13 will discuss the process of bootstrapping a compiler, i.e., using a compiler to compile itself.

The book uses standard set notation and equations over sets. Appendix A contains a short summary of these, which may be helpful to those that need these concepts refreshed.

Chapter 11 (on analysis and optimisation) was added in 2008 and chapter 5 (about interpreters) was added in 2009, which is why editions after April 2008 are called “extended”. In the 2010 edition, further additions (including chapter 12 and appendix A) were made. Since ten years have passed since the first edition was printed as lecture notes, the 2010 edition is labeled “anniversary edition”.

1.6 To the lecturer

This book was written for use in the introductory compiler course at DIKU, the department of computer science at the University of Copenhagen, Denmark. At DIKU, the compiler course was previously taught right after the introductory programming course, which is earlier than in most other universities. Hence, existing textbooks tended either to be too advanced for the level of the course or to be too simplistic in their approach, for example only describing a single very simple compiler without bothering too much with the general picture.

This book was written as a response to this and aims at bridging the gap: It is intended to convey the general picture without going into extreme detail about such things as efficient implementation or the newest techniques. It should give the students an understanding of how compilers work and the ability to make simple (but not simplistic) compilers for simple languages. It will also lay a foundation that can be used for studying more advanced compilation techniques, as found e.g. in [35].
The compiler course at DIKU was later moved to the second year, so additions to the original text have been made. At times, standard techniques from compiler construction have been simplified for presentation in this book. In such cases, references are made to books or articles where the full version of the techniques can be found.

The book aims at being “language neutral”. This means two things:

- Little detail is given about how the methods in the book can be implemented in any specific language. Rather, the description of the methods is given in the form of algorithm sketches and textual suggestions of how these can be implemented in various types of languages, in particular imperative and functional languages.

- There is no single through-going example of a language to be compiled. Instead, different small (sub-)languages are used in various places to cover exactly the points that the text needs. This is done to avoid drowning in detail, hopefully allowing the readers to “see the wood for the trees”.

Each chapter (except this) has a section on further reading, which suggests additional reading material for interested students. All chapters (also except this) have a set of exercises. Few of these require access to a computer; most can be solved on paper or blackboard. In fact, many of the exercises are based on exercises that have been used in written exams at DIKU.

After some of the sections in the book, a few easy exercises are listed. It is suggested that the student attempts to solve these exercises before continuing reading, as the exercises support understanding of the previous sections.

Teaching with this book can be supplemented with project work, where students write simple compilers. Since the book is language neutral, no specific project is given. Instead, the teacher must choose relevant tools and select a project that fits the level of the students and the time available.
Depending on how much of the book is used and the amount of project work, the book can support course sizes ranging from 5 to 15 ECTS points.

1.7 Acknowledgements

The author wishes to thank all the people who have been helpful in making this book a reality. This includes the students who have been exposed to draft versions of the book at the compiler courses “Dat 1E” and “Oversættere” at DIKU, and who have found numerous typos and other errors in the earlier versions. I would also like to thank the instructors at Dat 1E and Oversættere, who have pointed out places where things were not as clear as they could be. I am extremely grateful to the people who in 2000 read parts of or all of the first draft and made helpful suggestions.

1.8 Permission to use

Permission to copy and print for personal use is granted. If you, as a lecturer, want to print the book and sell it to your students, you can do so if you only charge the printing cost. If you want to print the book and sell it at a profit, please contact the author at torbenm@diku.dk and we will find a suitable arrangement.

In all cases, if you find any misprints or other errors, please contact the author at torbenm@diku.dk.

See also the book homepage: http://www.diku.dk/torbenm/Basics.