compiler lexical analysis and compiler phases lexical analysis
Dr.MasonHanks,Germany,Teacher
Published Date:23-07-2017
Your Website URL(Optional)
Comment
U Unopti ti mi i zed C d C od de
Generation
From From the the intermediate intermediate
representation to the machine
code code5
Outline
• Introduction
• Machine Language
• Overview of a modern p processor
• Memory Layout
• Procedure Procedure A Abstraction bstraction
• Procedure Linkage
• • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Anatomy of a compiler
Program (character stream)
Lexical Analy( yzer (Scanner) )
Token Stream
Syyy ntax Analyzer ((Parser))
Parse Tree
Semantic Analy yzer
Intermediate Representation
Intermediate Code Op ptimizer
Optimized Intermediate Representation
Code Generator
Assembly codeAnatomy of a compiler
Program (character stream)
Lexical Analy( yzer (Scanner) )
Token Stream
Syy yntax Analyzer ((Parser))
Parse Tree
Semantic Analy yzer
High-level IR
Low Low-level level IR IR
Intermediate Representation
Code Generator
Assembly codeComponents of a High Level
Language Language
CODE DATA
Global Static Variables
Procedures
Global Global Dynamic Dynamic Data Data
Control Flow
Local Variables
Temporaries
Statements
Parameter Parameter Passing Passing
Data Access
Read-only DataMachine Code Generator
Should Should...
• Translate all the instructions in the
intermediate representation to assembly
language language
• Allocate space for the variables, arrays etc.
• Adhere to calling conventions
• Create the necessaryyy symbolic information7
Outline
• Introduction
• Machine Language
• Overview of a modern p processor
• Memory Layout
• Procedure Procedure A Abstraction bstraction
• Procedure Linkage
• • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Machines understand...
LOCATION DATA
0046 8B45FC
0049 4863F0
004c 8B45FC
004f 4863D0
0052 0052 8B45FC 8B45FC
0055 4898
0057 8B048500
000000
005e 8B149500
000000 000000
0065 01C2
0067 8B45FC
006a 4898
006c 89D7
006e 033C8500
000000
0075 8B45FC
0078 4863C8
007b 007b 8B45F8 8B45F8
007e 4898
0080 8B148500
000000 Machines understand...
LOCATION DATA ASSEMBLY INSTRUCTION
0046 8B45FC -4(%rbp), %eax
0049 4863F0
%eax,%rsi
004c 8B45FC -4(%rbp), %eax
004f 4863D0
%eax,%rdx
0052 0052 8B45FC 8B45FC movl movl -4(%rbp), 4(%rbp), %eax %eax
0055 4898
0057 8B048500 B(,%rax,4), %eax
000000
movl
005e 8B149500 A(,%rdx,4), %edx
movslq
000000 000000
movl
0065 01C2 %eax, %edx
movslq
0067 8B45FC -4(%rbp), %eax
006a 4898
movl
006c 89D7 addl ,, %edi
cltq
006e 033C8500 C(,%rax,4), %edi
movl %edx
000000
0075 8B45FC movl -4(%rbp), %eax
0078 4863C8
%eax,%rcx
007b 007b 8B45F8 8B45F8 movl movl -8(%rbp) 8(%rbp), %eax %eax
movl
007e 4898
0080 8B148500 B(,%rax,4), %edx
cltqaddl
movl
movslq
movl
cltqProgram (character stream)
Lexical Analyzer (Scanner)
Token Stream
Syntax Syntax Analyzer Analyzer (Parser) (Parser)
Parse Tree
Intermediate Code Generator
High-level IR
Low-level IR
Intermediate Representation
Code Generator
Assembly Assembly code codeProgram (character stream)
Lexical Analyzer (Scanner)
Token Stream
Syntax Syntax Analyzer Analyzer (Parser) (Parser)
Parse Tree
Intermediate Code Generator
High-level IR
Low-level IR
Intermediate Representation
Code Generator
Assembly Assembly code code
Assembler & linker
Binary executable
ProcessorAssembly y lang g uag g e
• Advantages
– Simplifies Simplifies code code generation generation due due to to use use o of f symbolic symbolic
instructions and symbolic names
– Logical Logical abstraction abstraction llayer ayer
– Multiple Architectures can describe by a single
assembly assembly language language
⇒ can modify the implementation
• macro assembly y instructions
• Disadvantages
– Additional Additional process process o of f a assembling ssembling and and linking linking
– Assembler adds overhead Assembly y lang guag g e
• Relocatable machine language (object modules)
– all locations(addresses) represented by symbols
– Mapped to memory addresses at link and load time
– Flexibility of separate compilation
• Absolute machine language
– add ddresses are h hard d-codd ded
– simple and straightforward implementation
– inflexible inflexible hard hard to to reload reload generated generated code code
– Used in interrupt handlers and device drivers 11
Assembly example
.section
.LC0 LC0 :
0000 6572726F7200 .string "error"
.text
.globl fact
.rodata
fact:
0000 55 p pushq q %rbpp
0001 4889E5 %rsp, %rbp
0004 4883EC10 16, %rsp
0008 897DFC %edi, -4(%rbp)
000b 837DFC00
000f 7911
0, -4(%rbp)
0011 0011 BF00000000 BF00000000 movl movl .LC0, .LC0, %edi %edi
0016 B800000000
.L2
001b E800000000
jns
0, %eax
0020 EB22
printf
.L2:
subq
movq
0022 837DFC00
.L3
0026 0026 7509 7509 jne jne L4 L4
cmpl
movl
jmp
0, -4(%rbp)
0028 C745F801000000
movl .
002f EB13
call 1, -8(%rbp)
.L4:
0031 8B7DFC -4(%rbp), %edi
.L3
0034 FFCF
movl
jmp
0036 0036 E800000000 E800000000 ll ll f factt
%edi
003b 0FAF45FC -4(%rbp), %eax
cmpl
003f 8945F8 %eax, -8(%rbp)
0042 EB00
decl
.L3:
0044 8B45F8 -8(( %rbp), p), %eax
.L1
0047 C9
ca
jmp
0048 C3
movl
imull
movl
movl
leave
ret Comp position of an Objj ect File
.file
• We use the ELF file format
"test2.c" .section
.LC0: .LC0:
.string "error %d"
• The object file has:
.rodata
.section .text
– Multiple Segments .globl fact
fact:
– – Symbol Symbol Information Information
pushq %rbp
movq %rsp, %rbp
– Relocation Information
subq 16, %rsp
movl -8(%rbp), %eax
leave
• Segments Segments
ret ret
.
– Global Offset Table
.comm bar,4,4
.comm a,1,1
– Procedure Linkage Table
.comm b,1,1
– Text Text (code) (code)
.section
–Data
.long .LECIE1-.LSCIE1
.long 0x0 .eh_frame,"a",progbits
– Read Only Data
.byte 0x1
.string ""
.uleb128 0x1 11
Outline
• Introduction
• Machine Language
• Overview of a modern p processor
• Memory Layout
• Procedure Procedure A Abstraction bstraction
• Procedure Linkage
• • Guidelines Guidelines in in Creating Creating a a C Code ode Generator Generator Overview of a modern
processor processor
• ALU
Memory Memory
• Control
Registers ALU
• Memory
• Registers Registers
ControlArithmetic and Log gic Unit
• Performs most of the data
Memory Memory
operations operations
• Has the form:
OP oprnd , oprnd
1 2
– oprnd oprnd = oprnd oprnd OP OP oprnd oprnd
2 2 1 1 2 2
Registers ALU
Or
OP oprnd
1
Control
• Operands Operands are: are:
– Immediate Value 25
–Register %rax
– Memory Memory 4(%rbp) 4(%rbp)
• Operations are:
– Arithmetic operations (add, sub, imul)
– Logical Logical operations operations (and (and, sal) sal)
– Unitary operations (inc, dec)Arithmetic and Log gic Unit
• Many arithmetic operations can
cause an exception
Memory Memory
– overflow and underflow
• Can operate on different data types
Registers ALU
– addb 8 bits
– addw 16 bits
Control
– addl 32 bits
– addq 64 bits (Decaf is all 64 bit)
– signed signed and and unsigned unsigned arithmetic arithmetic
– Floating-point operations
((p separate ALU))Control
• Handles the instruction seqqg uencing
Memory Memory
• Executing instructions
– All All instructions instructions are are in in memory memory
Registers ALU
– Fetch the instruction pointed by the
PC PC and and execute execute it it
Control
– For general instructions, increment
the PC to p point to the next location in
memory
Advise:Why You Wasting Money in Costly SEO Tools, Use World's Best Free SEO Tool Ubersuggest.