IKI10230 Pengantar Organisasi Komputer (Introduction to Computer Organization), Lecture no. 09: Compiling-Assembling-Linking
1
IKI10230 Pengantar Organisasi Komputer
Lecture no. 09: Compiling-Assembling-Linking
Sources:
1. Paul Carter, PC Assembly Language
2. Hamacher, Computer Organization, 5th ed.
3. Lecture materials from CS61C/2000 & CS152/1997, UCB
21 April 2004
L. Yohanes Stefanus
Bobby Nazief
Course materials: http://www.cs.ui.ac.id/kuliah/POK/
2
Steps to Starting a Program
C program: foo.c
→ Compiler →
Assembly program: foo.s
→ Assembler →
Object (machine language module): foo.o
→ Linker (also takes lib.o) →
Executable (machine language program): foo.exe
→ Loader →
Memory
3
Example: C → Asm → Obj → Exe → Run
#include <stdio.h>
int main (int argc, char *argv[]) {
    int i;
    int sum = 0;
    for (i = 0; i <= 100; i = i + 1)
        sum = sum + i * i;
    printf ("The sum from 0 .. 100 is %d\n", sum);
}
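Despite the wording of the message, the loop accumulates i * i, so the program prints the sum of the squares of 0 through 100. As a quick sanity check (a sketch that is not part of the original slides; both function names are chosen here for illustration), the loop can be compared against the closed form n(n+1)(2n+1)/6:

```c
#include <assert.h>

/* Same loop as in main above, wrapped in a function (illustrative name). */
static int sum_of_squares_loop(int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i <= n; i = i + 1)
        sum = sum + i * i;
    return sum;
}

/* Closed form n(n+1)(2n+1)/6; the product fits in a 32-bit int for n = 100. */
static int sum_of_squares_closed(int n) {
    assert(n >= 0);
    return n * (n + 1) * (2 * n + 1) / 6;
}
```

For n = 100 both functions return 338350, which is the value the example program prints.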
4
Compiler
° Input: High-Level Language Code (e.g., C, Java)
° Output: Assembly Language Code (e.g., Intel x86)
° Note: Output may contain directives & pseudoinstructions
5
Example: C → Asm → Obj → Exe → Run
segment .text
LC0:    db "The sum from 0 .. 100 is %d",0xa,0
_main:
        push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0     ; sum = 0
        mov dword [ebp-4],0     ; i = 0
L3:
        cmp dword [ebp-4],100   ; i <= 100 ?
        jle L6
        jmp L4
L6:
        mov eax,[ebp-4]
        imul eax,[ebp-4]        ; eax = i * i
        add [ebp-8],eax         ; sum = sum + i * i
L5:
        inc dword [ebp-4]       ; i = i + 1
        jmp L3
L4:
        add esp,-8
        mov eax,[ebp-8]
        push eax                ; 2nd argument: sum
        push dword LC0          ; 1st argument: address of the format string
        call _printf
        add esp,16
L2:
        mov esp,ebp
        pop ebp
        ret
6
Where Are We Now?
C program: foo.c
→ Compiler →
Assembly program: foo.s
→ Assembler →
Object (machine language module): foo.o
→ Linker (also takes lib.o) →
Executable (machine language program): a.out
→ Loader →
Memory
7
Assembler
° Reads and Uses Directives
° Replaces Pseudoinstructions
° Produces Machine Language
° Creates Object File
8
Producing Machine Language
° Simple Case
• Arithmetic, Logical, Shifts, and so on.
• All necessary info is within the instruction already.
° What about Branches?
• PC-Relative
• So once pseudoinstructions are replaced by real ones, we know how far to branch (on x86, a displacement in bytes).
° What about jumps?
• Some require an absolute address.
° What about references to data?
• These will require the full 32-bit address of the data.
° Addresses can’t be determined yet, so we create two tables…
9
Symbol Table
° List of “items” in this file that may be used by other files.
° What are they?
• Labels: function calling
• Data: anything in the .data section; variables which may be accessed across files
° First Pass: record label-address pairs
° Second Pass: produce machine code
• Result: can jump to a later label without first declaring it
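The two passes can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular assembler is implemented: the `Line` and `Symbol` types, the function names, and the `loop_fragment` table are all hypothetical, and the instruction sizes are assumed x86 encodings for the loop of the example program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified source line (hypothetical representation): a line may define
   a label, emit `size` bytes, and refer to a target label. */
typedef struct {
    const char *label;   /* label defined on this line, or NULL */
    uint32_t    size;    /* bytes of machine code emitted by this line */
    const char *target;  /* label this instruction refers to, or NULL */
} Line;

typedef struct {
    const char *name;
    uint32_t    addr;
} Symbol;

/* First pass: advance a location counter and record label-address pairs. */
static size_t first_pass(const Line *src, size_t n, Symbol *table) {
    assert(src != NULL && table != NULL);
    uint32_t loc = 0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i].label != NULL) {
            table[count].name = src[i].label;
            table[count].addr = loc;
            count++;
        }
        loc += src[i].size;
    }
    return count;
}

/* Look a label up; returns 1 and stores its address on success, else 0. */
static int lookup(const Symbol *table, size_t count,
                  const char *name, uint32_t *addr) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *addr = table[i].addr;
            return 1;
        }
    }
    return 0;
}

/* Second pass: every label is known now, so forward references resolve too.
   out[i] receives the target address of line i (0 if it has no target).
   Returns the number of references that could not be resolved. */
static size_t second_pass(const Line *src, size_t n,
                          const Symbol *table, size_t count, uint32_t *out) {
    size_t unresolved = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = 0;
        if (src[i].target != NULL &&
            !lookup(table, count, src[i].target, &out[i]))
            unresolved++;
    }
    return unresolved;
}

/* Loop of the example, addresses counted from L3 (sizes are assumptions). */
static const Line loop_fragment[] = {
    { "L3", 7, NULL },   /* cmp dword [ebp-4],100 */
    { NULL, 2, "L6" },   /* jle L6  (forward reference) */
    { NULL, 5, "L4" },   /* jmp L4  (forward reference) */
    { "L6", 3, NULL },   /* mov eax,[ebp-4] */
    { NULL, 4, NULL },   /* imul eax,[ebp-4] */
    { NULL, 3, NULL },   /* add [ebp-8],eax */
    { "L5", 3, NULL },   /* inc dword [ebp-4] */
    { NULL, 5, "L3" },   /* jmp L3  (backward reference) */
    { "L4", 0, NULL },   /* label only */
};
```

Running `first_pass` and then `second_pass` over `loop_fragment` resolves `jle L6` and `jmp L4` even though both labels are defined after the instructions that use them; that is the point of keeping the passes separate.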
10
Relocation Table
° List of “items” for which this file needs the address.
° What are they?
• Any label jumped to: jmp or call
- internal
- external (including lib files)
• Any piece of data
11
Object File Format
° object file header: size and position of the other pieces of the object file
° text segment: the machine code
° data segment: binary representation of the data in the source file
° relocation information: identifies lines of code that need to be “handled”
° symbol table: list of this file’s labels and data that can be referenced
° debugging information
12
Example: C → Asm → Obj → Exe → Run
segment .text
0x0:
        db "The sum from 0 .. 100 is %d",0xa,0
0x1d:
        push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0
        mov dword [ebp-4],0
0x34:
        cmp dword [ebp-4],100
        jle 0x05 (0x42)
        jmp 0x00000012 (0x54)
0x42:
        mov eax,[ebp-4]
        imul eax,[ebp-4]
        add [ebp-8],eax
0x4c:
        inc dword [ebp-4]
        jmp 0xffffffe0 (0x34)
0x54:
        add esp,-8
        mov eax,[ebp-8]
        push eax
        push 0x0
        call 0x0
        add esp,16
0x6e:
        mov esp,ebp
        pop ebp
        ret
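The jump displacements in this listing can be checked by hand: a PC-relative displacement is the target address minus the address of the instruction that follows the jump. The sketch below is illustrative only. The function name is made up, and the instruction addresses used in the note after it (0x3b, 0x3d, 0x4f) are assumptions derived from the encodings, namely a 7-byte cmp, a 2-byte jle, a 3-byte inc, and a 5-byte jmp.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative displacement: distance from the end of the jump instruction
   (the address of the next instruction) to the target. The subtraction is
   done in unsigned arithmetic and reinterpreted as signed, which assumes
   the usual two's-complement representation. */
static int32_t rel_disp(uint32_t instr_addr, uint32_t instr_len,
                        uint32_t target) {
    assert(instr_len > 0);
    uint32_t next = instr_addr + instr_len;
    return (int32_t)(target - next);
}
```

Under those assumptions, `rel_disp(0x3b, 2, 0x42)` gives 0x05 for the jle, `rel_disp(0x3d, 5, 0x54)` gives 0x12 for the forward jmp, and `rel_disp(0x4f, 5, 0x34)` gives -0x20, which is 0xffffffe0 as a 32-bit value, for the backward jmp.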
13
Symbol Table Entries
° Symbol Table
Label    Address
LC0:     0x00000000
_main:   0x0000001d
L3:      0x00000034
L6:      0x00000042
L5:      0x0000004c
L4:      0x00000054
L2:      0x0000006e
° Relocation Information
Offset       Type     Value
0x0000005f   dir32    .text   (LC0: offset 0 of .text segment)
0x00000064   DISP32   _printf
14
Where Are We Now?
C program: foo.c
→ Compiler →
Assembly program: foo.s
→ Assembler →
Object (machine language module): foo.o
→ Linker (also takes lib.o) →
Executable (machine language program): a.out
→ Loader →
Memory
15
Link Editor/Linker
° Step 1: Take text segment from each .o file and put them together.
° Step 2: Take data segment from each .o file, put them together, and concatenate this onto end of text segments.
° Step 3: Resolve References
• Go through Relocation Table and handle each entry
• That is, fill in all absolute addresses
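Steps 1 and 2 amount to assigning a base address to every segment. A minimal sketch of that bookkeeping follows, assuming segments are packed back to back with no alignment padding (real linkers usually align segments); the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment sizes of one object file (hypothetical, simplified). */
typedef struct {
    uint32_t text_size;
    uint32_t data_size;
} ObjSizes;

/* Where the linker placed that object's segments in the executable. */
typedef struct {
    uint32_t text_base;
    uint32_t data_base;
} Placement;

/* Lay out all text segments back to back starting at `origin`, then all
   data segments right after the last text segment.
   Returns the address just past the last data segment. */
static uint32_t layout(const ObjSizes *objs, size_t n, uint32_t origin,
                       Placement *out) {
    assert(objs != NULL && out != NULL);
    uint32_t addr = origin;
    for (size_t i = 0; i < n; i++) {   /* Step 1: text segments */
        out[i].text_base = addr;
        addr += objs[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {   /* Step 2: data segments */
        out[i].data_base = addr;
        addr += objs[i].data_size;
    }
    return addr;
}
```

The resulting base addresses are what Step 3 needs: once each segment's final position is known, the absolute address of every label and data item follows from its offset within the segment.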
16
Four Types of Addresses
° PC-Relative Addressing (conditional branches, e.g. MIPS beq, bne; x86 jle): never relocate
° Absolute Address (jmp, call): always relocate
° External Reference (usually call): always relocate
° Data Reference: always relocate
17
Resolving References
° Linker assumes first word of first text segment is at address 0x00000000.
° Linker knows:
• length of each text and data segment
• ordering of text and data segments
° Linker calculates:
• absolute address of each label to be jumped to (internal or external) and each piece of data being referenced
° To resolve references:
• search for reference (data or label) in all symbol tables
• if not found, search library files (for example, for printf)
• once absolute address is determined, fill in the machine code appropriately
° Output of linker: executable file containing text and data (plus header)
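Filling in the machine code means patching a 32-bit field at each offset named in the relocation information. The sketch below is a simplification and not the full COFF procedure: the type and function names are illustrative, and it handles only the two relocation types that appear in the example (dir32 and DISP32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read and write a 32-bit little-endian field inside the text image. */
static uint32_t get32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

typedef enum { REL_DIR32, REL_DISP32 } RelocType;

/* Patch one relocation entry.
   text      : bytes of this object's text segment
   text_base : final address of text[0] chosen by the linker
   offset    : offset of the 32-bit field to patch
   sym_addr  : final absolute address of the referenced symbol or segment */
static void apply_reloc(uint8_t *text, uint32_t text_base, uint32_t offset,
                        RelocType type, uint32_t sym_addr) {
    assert(text != NULL);
    uint8_t *field = text + offset;
    if (type == REL_DIR32) {
        /* Absolute reference: the field already holds an offset within the
           segment, so add the segment's final address to it. */
        put32(field, get32(field) + sym_addr);
    } else {
        /* PC-relative reference: displacement measured from the end of the
           4-byte field, i.e. from the next instruction. */
        uint32_t next = text_base + offset + 4;
        put32(field, sym_addr - next);
    }
}
```

With the example's numbers (text placed at 0x15c0, _printf at 0x2da0), the dir32 entry at offset 0x5f turns the pushed operand into 0x000015c0, and the DISP32 entry at offset 0x64 yields 0x2da0 - 0x1628 = 0x1778.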
18
Example: C → Asm → Obj → Exe → Run
segment .text
0x15c0:
        db "The sum from 0 .. 100 is %d",0xa,0
0x15dd:
        push ebp
        mov ebp,esp
        sub esp,24
        mov dword [ebp-8],0
        mov dword [ebp-4],0
0x15f4:
        cmp dword [ebp-4],100
        jle 0x05 (0x1602)
        jmp 0x00000012 (0x1614)
0x1602:
        mov eax,[ebp-4]
        imul eax,[ebp-4]
        add [ebp-8],eax
0x160c:
        inc dword [ebp-4]
        jmp 0xffffffe0 (0x15f4)
0x1614:
        add esp,-8
        mov eax,[ebp-8]
        push eax
        push 0x000015c0
        call 0x00001778 (0x2da0)*
        add esp,16
0x162e:
        mov esp,ebp
        pop ebp
        ret
* 0x1628 (address of the instruction after the call) + 0x1778 = 0x2da0
19
Memory Map of the .EXE
[Figure: memory map of the executable. Address marks: 00000000, 000015C0, 00001631, 0000B000, 0000BB04. Region labels: other objects; Foo.o; other objects (..., _printf, ...); .text; .data.]
20
Where Are We Now?
C program: foo.c
→ Compiler →
Assembly program: foo.s
→ Assembler →
Object (machine language module): foo.o
→ Linker (also takes lib.o) →
Executable (machine language program): a.out
→ Loader →
Memory
21
Loader (1/3)
° Executable files are stored on disk.
° When one is run, the loader’s job is to load it into memory and start it running.
° In reality, the loader is part of the operating system (OS)
• loading is one of the OS tasks
22
Loader (2/3)
° So what does a loader do?
° Reads executable file’s header to determine size of text and data segments
° Creates new address space for program large enough to hold text and data segments, along with a stack segment
° Copies instructions and data from executable file into the new address space (this may be anywhere in memory)
23
Loader (3/3)
° Copies arguments passed to the program onto the stack
° Initializes machine registers
• Most registers cleared, but stack pointer assigned address of 1st free stack location
° Jumps to start-up routine that copies program’s arguments from stack to registers and sets the PC
• If main routine returns, start-up routine terminates program with the exit system call
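The loader's steps can be mimicked in a small simulation. Everything here is an assumption made for illustration: the `ExeHeader` layout is invented (real COFF, ELF, or PE headers carry much more), the "address space" is a single heap block, the stack is assumed to grow downward from the top of that block, and copying the program's arguments onto the stack is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical executable header, followed in the file by text, then data. */
typedef struct {
    uint32_t text_size;   /* bytes of machine code */
    uint32_t data_size;   /* bytes of initialized data */
    uint32_t entry;       /* offset of the first instruction to execute */
} ExeHeader;

/* Simulated process: one flat block of memory plus two registers. */
typedef struct {
    uint8_t *mem;
    uint32_t mem_size;
    uint32_t pc;          /* program counter */
    uint32_t sp;          /* stack pointer */
} Process;

/* Returns 0 on success, -1 if the file is malformed or memory runs out.
   On success the caller owns p->mem and releases it with free(). */
static int load(const uint8_t *file, size_t file_len, uint32_t stack_size,
                Process *p) {
    assert(file != NULL && p != NULL);
    ExeHeader h;
    if (file_len < sizeof h)
        return -1;
    memcpy(&h, file, sizeof h);               /* read the header */
    size_t body = (size_t)h.text_size + h.data_size;
    if (file_len - sizeof h < body || h.entry >= h.text_size)
        return -1;

    p->mem_size = h.text_size + h.data_size + stack_size;
    p->mem = calloc(p->mem_size, 1);          /* create the address space */
    if (p->mem == NULL)
        return -1;
    memcpy(p->mem, file + sizeof h, body);    /* copy text and data */

    p->sp = p->mem_size;   /* empty stack: starts at the top, grows down */
    p->pc = h.entry;       /* execution begins at the entry point */
    return 0;
}
```

A real loader would then transfer control to the start-up routine at `pc`; in this sketch the two register values are simply recorded in the `Process` structure.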
24
Example: C → Asm → Obj → Exe → Run
0x000015c0: 0x20656854 0x206d7573 0x6d6f7266 0x2e203020
0x000015d0: 0x3031202e 0x73692030 0x0a642520 0xe5895500
0x000015e0: 0x0018ec81 0x45c70000 0x000000f8 0xfc45c700
0x000015f0: 0x00000000 0x64fc7d81 0x7e000000 0x0012e905
0x00001600: 0x458b0000 0x45af0ffc 0xf84501fc 0xe9fc45ff
0x00001610: 0xffffffe0 0xfff8c481 0x458bffff 0xc06850f8
0x00001620: 0xe8000015 0x00001778 0x0010c481 0xec890000
0x00001630: 0x0000c35d
0x000015c0: 54 68 65 20 73 75 6d 20 66 72 6f 6d 20 30 20 2e
            T  h  e     s  u  m     f  r  o  m     0     .
000015dd: 55                push ebp
000015de: 89e5              mov ebp,esp
000015e0: 81ec18000000      sub esp,0x18
000015e6: c745f800000000    mov [ebp-8],0
000015ed: c745fc00000000    mov [ebp-4],0
000015f4: 817dfc64000000    cmp [ebp-4],0x64
000015fb: 7e05              jle 0x1602
000015fd: e912000000        jmp 0x1614
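The word dump and the byte dump show the same memory; they differ because x86 is little-endian, so the byte at the lowest address becomes the least significant byte of the 32-bit word. A small sketch of that conversion (the function name is chosen here for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble four consecutive bytes into a 32-bit word, lowest address first
   as the least significant byte (little-endian order). */
static uint32_t le32(const uint8_t *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The bytes 54 68 65 20 ("The ") at 0x15c0 therefore appear as the word 0x20656854, and the bytes 05 e9 12 00 at 0x15fc (the end of the jle followed by the start of the jmp) appear as 0x0012e905.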
25
.ASM, .O, & .EXE
(COFF FORMAT)
26
Example: C → Asm → Obj → Exe → Run
.text
LC0:
        .ascii "The sum from 0 .. 100 is %d\12\0"
main:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
L3:
        cmpl $100,-4(%ebp)
        jle L6
        jmp L4
L6:
        movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
L5:
        incl -4(%ebp)
        jmp L3
L4:
        addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $LC0
        call _printf
        addl $16,%esp
L2:
        movl %ebp,%esp
        popl %ebp
        ret
27
Example: C → Asm → Obj → Exe → Run
.text
0x0:
        .ascii "The sum from 0 .. 100 is %d\12\0"
0x20:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
0x34:
        cmpl $100,-4(%ebp)
        jle 6 (0x40)
        jmp 0x14 (0x50)
0x40:
        movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
0x4a:
        incl -4(%ebp)
        jmp -0x1b (0x34)
0x50:
        addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $0x0
        call 0x0 (undefined)
        addl $16,%esp
0x64:
        movl %ebp,%esp
        popl %ebp
        ret
28
Symbol Table Entries
° Symbol Table
Label    Address
LC0:     0x00000000
L2:      0x00000064
L3:      0x00000034
L4:      0x00000050
L5:      0x0000004a
L6:      0x00000040
main:    0x00000020
° Relocation Information
Address      Instr. Type   Dependency
0x0000005c   call          printf
29
Example: C → Asm → Obj → Exe → Run
.text
0x15c0:
        .ascii "The sum from 0 .. 100 is %d\12\0"
0x15e0:
        pushl %ebp
        movl %esp,%ebp
        subl $24,%esp
        movl $0,-8(%ebp)
        movl $0,-4(%ebp)
0x15f4:
        cmpl $100,-4(%ebp)
        jle 6 (0x1600)
        jmp 0x14 (0x1610)
0x1600:
        movl -4(%ebp),%eax
        imull -4(%ebp),%eax
        addl %eax,-8(%ebp)
0x160a:
        incl -4(%ebp)
        jmp -0x1b (0x15f4)
0x1610:
        addl $-8,%esp
        movl -8(%ebp),%eax
        pushl %eax
        pushl $0x15c0
        call 0x2d90
        addl $16,%esp
0x1624:
        movl %ebp,%esp
        popl %ebp
        ret