Compiler design lexical analysis

compiler lexical analysis and compiler phases lexical analysis
Dr.MasonHanks Profile Pic
Dr.MasonHanks,Germany,Teacher
Published Date:23-07-2017
Your Website URL(Optional)
Comment
U Unopti ti mi i zed C d C od de Generation From From the the intermediate intermediate representation to the machine code code5 Outline • Introduction • Machine Language • Overview of a modern p processor • Memory Layout • Procedure Procedure A Abstraction bstraction • Procedure Linkage • • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Anatomy of a compiler Program (character stream) Lexical Analy( yzer (Scanner) ) Token Stream Syyy ntax Analyzer ((Parser)) Parse Tree Semantic Analy yzer Intermediate Representation Intermediate Code Op ptimizer Optimized Intermediate Representation Code Generator Assembly codeAnatomy of a compiler Program (character stream) Lexical Analy( yzer (Scanner) ) Token Stream Syy yntax Analyzer ((Parser)) Parse Tree Semantic Analy yzer High-level IR Low Low-level level IR IR Intermediate Representation Code Generator Assembly codeComponents of a High Level Language Language CODE DATA Global Static Variables Procedures Global Global Dynamic Dynamic Data Data Control Flow Local Variables Temporaries Statements Parameter Parameter Passing Passing Data Access Read-only DataMachine Code Generator Should Should... • Translate all the instructions in the intermediate representation to assembly language language • Allocate space for the variables, arrays etc. • Adhere to calling conventions • Create the necessaryyy symbolic information7 Outline • Introduction • Machine Language • Overview of a modern p processor • Memory Layout • Procedure Procedure A Abstraction bstraction • Procedure Linkage • • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Machines understand... LOCATION DATA 0046 8B45FC 0049 4863F0 004c 8B45FC 004f 4863D0 0052 0052 8B45FC 8B45FC 0055 4898 0057 8B048500 000000 005e 8B149500 000000 000000 0065 01C2 0067 8B45FC 006a 4898 006c 89D7 006e 033C8500 000000 0075 8B45FC 0078 4863C8 007b 007b 8B45F8 8B45F8 007e 4898 0080 8B148500 000000 Machines understand... LOCATION DATA ASSEMBLY INSTRUCTION 0046 8B45FC -4(%rbp), %eax 0049 4863F0 %eax,%rsi 004c 8B45FC -4(%rbp), %eax 004f 4863D0 %eax,%rdx 0052 0052 8B45FC 8B45FC movl movl -4(%rbp), 4(%rbp), %eax %eax 0055 4898 0057 8B048500 B(,%rax,4), %eax 000000 movl 005e 8B149500 A(,%rdx,4), %edx movslq 000000 000000 movl 0065 01C2 %eax, %edx movslq 0067 8B45FC -4(%rbp), %eax 006a 4898 movl 006c 89D7 addl ,, %edi cltq 006e 033C8500 C(,%rax,4), %edi movl %edx 000000 0075 8B45FC movl -4(%rbp), %eax 0078 4863C8 %eax,%rcx 007b 007b 8B45F8 8B45F8 movl movl -8(%rbp) 8(%rbp), %eax %eax movl 007e 4898 0080 8B148500 B(,%rax,4), %edx cltqaddl movl movslq movl cltqProgram (character stream) Lexical Analyzer (Scanner) Token Stream Syntax Syntax Analyzer Analyzer (Parser) (Parser) Parse Tree Intermediate Code Generator High-level IR Low-level IR Intermediate Representation Code Generator Assembly Assembly code codeProgram (character stream) Lexical Analyzer (Scanner) Token Stream Syntax Syntax Analyzer Analyzer (Parser) (Parser) Parse Tree Intermediate Code Generator High-level IR Low-level IR Intermediate Representation Code Generator Assembly Assembly code code Assembler & linker Binary executable ProcessorAssembly y lang g uag g e • Advantages – Simplifies Simplifies code code generation generation due due to to use use o of f symbolic symbolic instructions and symbolic names – Logical Logical abstraction abstraction llayer ayer – Multiple Architectures can describe by a single assembly assembly language language ⇒ can modify the implementation • macro assembly y instructions • Disadvantages – Additional Additional process process o of f a assembling ssembling and and linking linking – Assembler adds overhead Assembly y lang guag g e • Relocatable machine language (object modules) – all locations(addresses) represented by symbols – Mapped to memory addresses at link and load time – Flexibility of separate compilation • Absolute machine language – add ddresses are h hard d-codd ded – simple and straightforward implementation – inflexible inflexible hard hard to to reload reload generated generated code code – Used in interrupt handlers and device drivers 11 Assembly example .section .LC0 LC0 : 0000 6572726F7200 .string "error" .text .globl fact .rodata fact: 0000 55 p pushq q %rbpp 0001 4889E5 %rsp, %rbp 0004 4883EC10 16, %rsp 0008 897DFC %edi, -4(%rbp) 000b 837DFC00 000f 7911 0, -4(%rbp) 0011 0011 BF00000000 BF00000000 movl movl .LC0, .LC0, %edi %edi 0016 B800000000 .L2 001b E800000000 jns 0, %eax 0020 EB22 printf .L2: subq movq 0022 837DFC00 .L3 0026 0026 7509 7509 jne jne L4 L4 cmpl movl jmp 0, -4(%rbp) 0028 C745F801000000 movl . 002f EB13 call 1, -8(%rbp) .L4: 0031 8B7DFC -4(%rbp), %edi .L3 0034 FFCF movl jmp 0036 0036 E800000000 E800000000 ll ll f factt %edi 003b 0FAF45FC -4(%rbp), %eax cmpl 003f 8945F8 %eax, -8(%rbp) 0042 EB00 decl .L3: 0044 8B45F8 -8(( %rbp), p), %eax .L1 0047 C9 ca jmp 0048 C3 movl imull movl movl leave ret Comp position of an Objj ect File .file • We use the ELF file format "test2.c" .section .LC0: .LC0: .string "error %d" • The object file has: .rodata .section .text – Multiple Segments .globl fact fact: – – Symbol Symbol Information Information pushq %rbp movq %rsp, %rbp – Relocation Information subq 16, %rsp movl -8(%rbp), %eax leave • Segments Segments ret ret . – Global Offset Table .comm bar,4,4 .comm a,1,1 – Procedure Linkage Table .comm b,1,1 – Text Text (code) (code) .section –Data .long .LECIE1-.LSCIE1 .long 0x0 .eh_frame,"a",progbits – Read Only Data .byte 0x1 .string "" .uleb128 0x1 11 Outline • Introduction • Machine Language • Overview of a modern p processor • Memory Layout • Procedure Procedure A Abstraction bstraction • Procedure Linkage • • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Overview of a modern processor processor • ALU Memory Memory • Control Registers ALU • Memory • Registers Registers ControlArithmetic and Log gic Unit • Performs most of the data Memory Memory operations operations • Has the form: OP oprnd , oprnd 1 2 – oprnd oprnd = oprnd oprnd OP OP oprnd oprnd 2 2 1 1 2 2 Registers ALU Or OP oprnd 1 Control • Operands Operands are: are: – Immediate Value 25 –Register %rax – Memory Memory 4(%rbp) 4(%rbp) • Operations are: – Arithmetic operations (add, sub, imul) – Logical Logical operations operations (and (and, sal) sal) – Unitary operations (inc, dec)Arithmetic and Log gic Unit • Many arithmetic operations can cause an exception Memory Memory – overflow and underflow • Can operate on different data types Registers ALU – addb 8 bits – addw 16 bits Control – addl 32 bits – addq 64 bits (Decaf is all 64 bit) – signed signed and and unsigned unsigned arithmetic arithmetic – Floating-point operations ((p separate ALU))Control • Handles the instruction seqqg uencing Memory Memory • Executing instructions – All All instructions instructions are are in in memory memory Registers ALU – Fetch the instruction pointed by the PC PC and and execute execute it it Control – For general instructions, increment the PC to p point to the next location in memory