Lec2 5 MachineLevelProgramming Unions Alignment
-
Upload
sudarshan-suresh -
Category
Documents
-
view
215 -
download
0
Transcript of Lec2 5 MachineLevelProgramming Unions Alignment
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
1/39
Carnegie Mellon
1
Machine-Level Programming V:
Advanced Topics
15-213: Introduction to Computer Systems8thLecture, Sep. 16, 2010
Instructors:
Randy Bryant & Dave OHallaron
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
2/39
Carnegie Mellon
2
Today
Structures Alignment
Unions
Memory Layout
Procedures (x86-64)
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
3/39
Carnegie Mellon
3
Structures & Alignment
Unaligned Data
Aligned Data Primitive data type requires Kbytes
Address must be multiple of K
c i[0] i[1] vbytes 4 bytesp+0 p+4 p+8 p+16 p+24
Multiple of 4 Multiple of 8
Multiple of 8 Multiple of 8
c i[0] i[1] v
p p+1 p+5 p+9 p+17
struct S1 {char c;int i[2];double v;
} *p;
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
4/39
Carnegie Mellon
4
Alignment Principles
Aligned Data Primitive data type requires Kbytes
Address must be multiple of K
Required on some machines; advised on IA32
treated differently by IA32 Linux, x86-64 Linux, and Windows! Motivation for Aligning Data
Memory accessed by (aligned) chunks of 4 or 8 bytes (system
dependent)
Inefficient to load or store datum that spans quad word
boundaries
Virtual memory very tricky when datum spans 2 pages
Compiler
Inserts gaps in structure to ensure correct alignment of fields
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
5/39
Carnegie Mellon
5
Specific Cases of Alignment (IA32) 1 byte: char,
no restrictions on address
2 bytes: short,
lowest 1 bit of address must be 02
4 bytes: int, float, char *,
lowest 2 bits of address must be 002
8 bytes: double,
Windows (and most other OSs & instruction sets):
lowest 3 bits of address must be 0002
Linux:
lowest 2 bits of address must be 002
i.e., treated the same as a 4-byte primitive data type
12 bytes: long double
Windows, Linux:
lowest 2 bits of address must be 002
i.e., treated the same as a 4-byte primitive data type
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
6/39
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
7/39
Carnegie Mellon
7
struct S1 {char c;int i[2];double v;
} *p;
Satisfying Alignment with Structures Within structure:
Must satisfy each elements alignment requirement
Overall structure placement
Each structure has alignment requirement K
K= Largest alignment of any element
Initial address & structure length must be multiples of K
Example (under Windows or x86-64):
K = 8, due to doubleelement
c i[0] i[1] vbytes 4 bytesp+0 p+4 p+8 p+16 p+24
Multiple of 4 Multiple of 8
Multiple of 8 Multiple of 8
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
8/39
Carnegie Mellon
8
Different Alignment Conventions
x86-64 or IA32 Windows: K = 8, due to doubleelement
IA32 Linux
K = 4; doubletreated like a 4-byte data type
struct S1 {
char c;int i[2];double v;
} *p;
c3 bytes
i[0] i[1]4 bytes
vp+0 p+4 p+8 p+16 p+24
c 3 bytes i[0] i[1] v
p+0 p+4 p+8 p+12 p+20
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
9/39
Carnegie Mellon
9
Meeting Overall Alignment Requirement
For largest alignment requirement K
Overall structure must be multiple of K
struct S2 {double v;int i[2];char c;
} *p;
v i[0] i[1] c 7 bytes
p+0 p+8 p+16 p+24
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
10/39
Carnegie Mellon
10
Arrays of Structures
Overall structure lengthmultiple of K
Satisfy alignment requirement
for every element
struct S2 {
double v;int i[2];char c;
} a[10];
v i[0] i[1] c 7 bytes
a+24 a+32 a+40 a+48
a[0] a[1] a[2]
a+0 a+24 a+48 a+72
C i M ll
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
11/39
Carnegie Mellon
11
Accessing Array Elements
Compute array offset 12i sizeof(S3), including alignment spacers
Element jis at offset 8 within structure
Assembler gives offset a+8
Resolved during linking
struct S3 {short i;float v;
short j;} a[10];
short get_j(int idx){return a[idx].j;
}
# %eax = idxleal (%eax,%eax,2),%eax # 3*idx
movswl a+8(,%eax,4),%eax
a[0] a[i]
a+0 a+12 a+12i
i 2 bytes v j 2 bytesa+12i a+12i+8
C i M ll
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
12/39
Carnegie Mellon
12
Saving Space
Put large data types first
Effect (K=4)
struct S4 {char c;int i;char d;
} *p;
struct S5 {int i;char c;char d;
} *p;
c ibytes d 3 bytes
ci d bytes
C i M ll
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
13/39
Carnegie Mellon
13
Today
Structures Alignment
Unions
Memory Layout
Procedures (x86-64)
C i M ll
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
14/39
Carnegie Mellon
14
Union Allocation
Allocate according to largest element
Can only use one field at a time
union U1 {char c;int i[2];
double v;} *up;
struct S1 {char c;int i[2];
double v;} *sp;
c 3 bytes i[0] i[1] 4 bytes v
sp+0 sp+4 sp+8 sp+16 sp+24
c
i[0] i[1]
v
up+0 up+4 up+8
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
15/39
Carnegie Mellon
15
typedef union {float f;unsigned u;
} bit_float_t;
float bit2float(unsigned u){bit_float_t arg;arg.u = u;return arg.f;
}
unsigned float2bit(float f){bit_float_t arg;arg.f = f;return arg.u;
}
Using Union to Access Bit Patterns
Same as (float) u? Same as (unsigned) f?
u
f
0 4
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
16/39
Carnegie Mellon
16
Byte Ordering Revisited
Idea
Short/long/quad words stored in memory as 2/4/8 consecutive bytes
Which is most (least) significant?
Can cause problems when exchanging binary data between machines
Big Endian
Most significant byte has lowest address
Sparc
Little Endian
Least significant byte has lowest address
Intel x86
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
17/39
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
18/39
Carnegie Mellon
18
Byte Ordering Example (Cont).int j;for (j = 0; j < 8; j++)
dw.c[j] = 0xf0 + j;
printf("Characters 0-7 ==[0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
dw.c[0], dw.c[1], dw.c[2], dw.c[3],dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 == [0x%x,0x%x]\n",
dw.i[0], dw.i[1]);
printf("Long 0 == [0x%lx]\n",dw.l[0]);
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
19/39
Carnegie Mellon
19
Byte Ordering on IA32
Little Endian
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]Long 0 == [0xf3f2f1f0]
Output:
f0 f1 f2 f3 f4 f5 f6 f7
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
s[0] s[1] s[2] s[3]
i[0] i[1]l[0]
LSB MSB LSB MSB
Print
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
20/39
Carnegie Mellon
20
Byte Ordering on Sun
Big Endian
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]Long 0 == [0xf0f1f2f3]
Output on Sun:
f0 f1 f2 f3 f4 f5 f6 f7
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
s[0] s[1] s[2] s[3]
i[0] i[1]l[0]
MSB LSB MSB LSB
Print
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
21/39
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
22/39
Carnegie Mellon
22
Summary
Arrays in C Contiguous allocation of memory
Aligned to satisfy every elements alignment requirement
Pointer to first element
No bounds checking
Structures
Allocate bytes in order declared
Pad in middle and at end to satisfy alignment
Unions
Overlay declarations
Way to circumvent type system
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
23/39
g
23
Today
Structures Alignment
Unions
Memory Layout
Procedures (x86-64)
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
24/39
g
24
IA32 Linux Memory Layout
Stack Runtime stack (8MB limit)
E. g., local variables
Heap
Dynamically allocated storage When call malloc(), calloc(), new()
Data
Statically allocated data
E.g., arrays & strings declared in code
Text
Executable machine instructions
Read-onlyUpper 2 hex digits
= 8 bits of address
FF
00
Stack
Text
Data
Heap
08
8MB
not drawn to scale
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
25/39
g
25
Memory Allocation Example
char big_array[1
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
26/39
g
26
IA32 Example Addresses
$esp 0xffffbcd0p3 0x65586008p1 0x55585008p4 0x1904a110p2 0x1904a008&p2 0x18049760&beyond 0x08049744
big_array 0x18049780huge_array 0x08049760
main() 0x080483c6useless() 0x08049744finalmalloc() 0x006be166
address range ~232
FF
00
Stack
Text
Data
Heap
08
80
not drawn to scale
malloc() is dynamically linkedaddress determined at runtime
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
27/39
27
x86-64 Example Addresses
$rsp 0x00007ffffff8d1f8p3 0x00002aaabaadd010p1 0x00002aaaaaadc010p4 0x0000000011501120p2 0x0000000011501010&p2 0x0000000010500a60&beyond 0x0000000000500a44
big_array 0x0000000010500a80huge_array 0x0000000000500a50
main() 0x0000000000400510useless() 0x0000000000400500finalmalloc() 0x000000386ae6a170
address range ~247
00007F
000000
Stack
Text
Data
Heap
000030
not drawn to scale
malloc() is dynamically linkedaddress determined at runtime
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
28/39
28
Today
Structures Alignment
Unions
Memory Layout
Procedures (x86-64)
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
29/39
29
%rax
%rbx
%rcx
%rdx
%rsi
%rdi
%rsp
%rbp
x86-64 Integer Registers
Twice the number of registers
Accessible as 8, 16, 32, 64 bits
%eax
%ebx
%ecx
%edx
%esi
%edi
%esp
%ebp
%r8
%r9
%r10
%r11
%r12
%r13
%r14
%r15
%r8d
%r9d
%r10d
%r11d
%r12d
%r13d
%r14d
%r15d
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
30/39
30
%rax
%rbx
%rcx
%rdx
%rsi
%rdi%rsp
%rbp
x86-64 Integer Registers:
Usage Conventions
%r8
%r9
%r10
%r11
%r12
%r13%r14
%r15Callee saved Callee saved
Callee saved
Callee saved
Callee saved
Caller saved
Callee saved
Stack pointer
Caller Saved
Return value
Argument #4
Argument #1
Argument #3
Argument #2
Argument #6
Argument #5
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
31/39
31
x86-64 Registers
Arguments passed to functions via registers If more than 6 integral parameters, then pass rest on stack
These registers can be used as caller-saved as well
All references to stack frame via stack pointer Eliminates need to update %ebp/%rbp
Other Registers
6 callee saved
2 caller saved
1 return value (also usable as caller saved)
1 special (stack pointer)
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
32/39
32
x86-64 Long Swap
Operands passed in registers
First (xp) in %rdi, second (yp) in %rsi
64-bit pointers
No stack operations required (except ret)
Avoiding stack
Can hold all local information in registers
void swap_l(long *xp, long *yp)
{long t0 = *xp;long t1 = *yp;*xp = t1;*yp = t0;
}
swap:
movq (%rdi), %rdxmovq (%rsi), %raxmovq %rax, (%rdi)movq %rdx, (%rsi)ret
rtn Ptr %rsp
No stack
frame
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
33/39
33
x86-64 Locals in the Red Zone
Avoiding Stack Pointer Change Can hold all information within small
window beyond stack pointer
/* Swap, using local array */void swap_a(long *xp, long *yp){
volatile long loc[2];loc[0] = *xp;loc[1] = *yp;
*xp = loc[1];*yp = loc[0];
}
swap_a:movq (%rdi), %raxmovq %rax, -24(%rsp)movq (%rsi), %raxmovq %rax, -16(%rsp)movq -16(%rsp), %rax
movq %rax, (%rdi)movq -24(%rsp), %raxmovq %rax, (%rsi)ret
rtn Ptr
unused
%rsp
8
loc[1]
loc[0]
16
24
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
34/39
34
x86-64 NonLeaf without Stack Frame
No values held while swap beinginvoked
No callee save registers needed
repinstruction inserted as no-op Based on recommendation from AMD
/* Swap a[i] & a[i+1] */void swap_ele(long a[], int i){
swap(&a[i], &a[i+1]);}
swap_ele:movslq %esi,%rsi # Sign extend ileaq 8(%rdi,%rsi,8), %rax # &a[i+1]
leaq (%rdi,%rsi,8), %rdi # &a[i] (1starg) movq %rax, %rsi # (2ndarg)
call swaprep # No-opret
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
35/39
35
x86-64 Stack Frame Example
Keeps values of &a[i]and&a[i+1]in callee save
registers
Must set up stack frame to
save these registers
long sum = 0;/* Swap a[i] & a[i+1] */void swap_ele_su(long a[], int i)
{swap(&a[i], &a[i+1]);
sum += (a[i]*a[i+1]);}
swap_ele_su:movq %rbx, -16(%rsp)movq %rbp, -8(%rsp)subq $16, %rsp
movslq %esi,%raxleaq 8(%rdi,%rax,8), %rbx
leaq (%rdi,%rax,8), %rbpmovq %rbx, %rsimovq %rbp, %rdicall swap
movq (%rbx), %raximulq (%rbp), %rax
addq %rax, sum(%rip)movq (%rsp), %rbxmovq 8(%rsp), %rbpaddq $16, %rspret
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
36/39
36
Understanding x86-64 Stack Frameswap_ele_su:
movq %rbx, -16(%rsp) # Save %rbxmovq %rbp, -8(%rsp) # Save %rbpsubq $16, %rsp # Allocate stack frame
movslq %esi,%rax # Extend ileaq 8(%rdi,%rax,8), %rbx # &a[i+1] (callee save)leaq (%rdi,%rax,8), %rbp # &a[i] (callee save)
movq %rbx, %rsi # 2ndargumentmovq %rbp, %rdi # 1stargumentcall swap
movq (%rbx), %rax # Get a[i+1]imulq (%rbp), %rax # Multiply by a[i]
addq %rax, sum(%rip) # Add to summovq (%rsp), %rbx # Restore %rbxmovq 8(%rsp), %rbp # Restore %rbpaddq $16, %rsp # Deallocate frameret
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
37/39
37
Understanding x86-64 Stack Frame
rtn addr%rbp
%rsp8
%rbx16
rtn addr
%rbp
%rsp
+8
%rbx
movq %rbx, -16(%rsp) # Save %rbx
movq %rbp, -8(%rsp) # Save %rbp
subq $16, %rsp # Allocate stack frame
movq (%rsp), %rbx # Restore %rbxmovq 8(%rsp), %rbp # Restore %rbp
addq $16, %rsp # Deallocate frame
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
38/39
38
Interesting Features of Stack Frame
Allocate entire frame at once All stack accesses can be relative to %rsp
Do by decrementing stack pointer
Can delay allocation, since safe to temporarily use red zone
Simple deallocation
Increment stack pointer
No base/frame pointer needed
Carnegie Mellon
-
8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment
39/39
x86-64 Procedure Summary
Heavy use of registers Parameter passing
More temporaries since more registers
Minimal use of stack Sometimes none
Allocate/deallocate entire block
Many tricky optimizations What kind of stack frame to use
Various allocation techniques