Lec2 5 MachineLevelProgramming Unions Alignment

download Lec2 5 MachineLevelProgramming Unions Alignment

of 39

Transcript of Lec2 5 MachineLevelProgramming Unions Alignment

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    1/39

    Carnegie Mellon

    1

    Machine-Level Programming V:

    Advanced Topics

    15-213: Introduction to Computer Systems8thLecture, Sep. 16, 2010

    Instructors:

    Randy Bryant & Dave OHallaron

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    2/39

    Carnegie Mellon

    2

    Today

    Structures Alignment

    Unions

    Memory Layout

    Procedures (x86-64)

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    3/39

    Carnegie Mellon

    3

    Structures & Alignment

    Unaligned Data

    Aligned Data Primitive data type requires Kbytes

    Address must be multiple of K

    c i[0] i[1] vbytes 4 bytesp+0 p+4 p+8 p+16 p+24

    Multiple of 4 Multiple of 8

    Multiple of 8 Multiple of 8

    c i[0] i[1] v

    p p+1 p+5 p+9 p+17

    struct S1 {char c;int i[2];double v;

    } *p;

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    4/39

    Carnegie Mellon

    4

    Alignment Principles

    Aligned Data Primitive data type requires Kbytes

    Address must be multiple of K

    Required on some machines; advised on IA32

    treated differently by IA32 Linux, x86-64 Linux, and Windows! Motivation for Aligning Data

    Memory accessed by (aligned) chunks of 4 or 8 bytes (system

    dependent)

    Inefficient to load or store datum that spans quad word

    boundaries

    Virtual memory very tricky when datum spans 2 pages

    Compiler

    Inserts gaps in structure to ensure correct alignment of fields

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    5/39

    Carnegie Mellon

    5

    Specific Cases of Alignment (IA32) 1 byte: char,

    no restrictions on address

    2 bytes: short,

    lowest 1 bit of address must be 02

    4 bytes: int, float, char *,

    lowest 2 bits of address must be 002

    8 bytes: double,

    Windows (and most other OSs & instruction sets):

    lowest 3 bits of address must be 0002

    Linux:

    lowest 2 bits of address must be 002

    i.e., treated the same as a 4-byte primitive data type

    12 bytes: long double

    Windows, Linux:

    lowest 2 bits of address must be 002

    i.e., treated the same as a 4-byte primitive data type

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    6/39

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    7/39

    Carnegie Mellon

    7

    struct S1 {char c;int i[2];double v;

    } *p;

    Satisfying Alignment with Structures Within structure:

    Must satisfy each elements alignment requirement

    Overall structure placement

    Each structure has alignment requirement K

    K= Largest alignment of any element

    Initial address & structure length must be multiples of K

    Example (under Windows or x86-64):

    K = 8, due to doubleelement

    c i[0] i[1] vbytes 4 bytesp+0 p+4 p+8 p+16 p+24

    Multiple of 4 Multiple of 8

    Multiple of 8 Multiple of 8

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    8/39

    Carnegie Mellon

    8

    Different Alignment Conventions

    x86-64 or IA32 Windows: K = 8, due to doubleelement

    IA32 Linux

    K = 4; doubletreated like a 4-byte data type

    struct S1 {

    char c;int i[2];double v;

    } *p;

    c3 bytes

    i[0] i[1]4 bytes

    vp+0 p+4 p+8 p+16 p+24

    c 3 bytes i[0] i[1] v

    p+0 p+4 p+8 p+12 p+20

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    9/39

    Carnegie Mellon

    9

    Meeting Overall Alignment Requirement

    For largest alignment requirement K

    Overall structure must be multiple of K

    struct S2 {double v;int i[2];char c;

    } *p;

    v i[0] i[1] c 7 bytes

    p+0 p+8 p+16 p+24

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    10/39

    Carnegie Mellon

    10

    Arrays of Structures

    Overall structure lengthmultiple of K

    Satisfy alignment requirement

    for every element

    struct S2 {

    double v;int i[2];char c;

    } a[10];

    v i[0] i[1] c 7 bytes

    a+24 a+32 a+40 a+48

    a[0] a[1] a[2]

    a+0 a+24 a+48 a+72

    C i M ll

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    11/39

    Carnegie Mellon

    11

    Accessing Array Elements

    Compute array offset 12i sizeof(S3), including alignment spacers

    Element jis at offset 8 within structure

    Assembler gives offset a+8

    Resolved during linking

    struct S3 {short i;float v;

    short j;} a[10];

    short get_j(int idx){return a[idx].j;

    }

    # %eax = idxleal (%eax,%eax,2),%eax # 3*idx

    movswl a+8(,%eax,4),%eax

    a[0] a[i]

    a+0 a+12 a+12i

    i 2 bytes v j 2 bytesa+12i a+12i+8

    C i M ll

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    12/39

    Carnegie Mellon

    12

    Saving Space

    Put large data types first

    Effect (K=4)

    struct S4 {char c;int i;char d;

    } *p;

    struct S5 {int i;char c;char d;

    } *p;

    c ibytes d 3 bytes

    ci d bytes

    C i M ll

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    13/39

    Carnegie Mellon

    13

    Today

    Structures Alignment

    Unions

    Memory Layout

    Procedures (x86-64)

    C i M ll

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    14/39

    Carnegie Mellon

    14

    Union Allocation

    Allocate according to largest element

    Can only use one field at a time

    union U1 {char c;int i[2];

    double v;} *up;

    struct S1 {char c;int i[2];

    double v;} *sp;

    c 3 bytes i[0] i[1] 4 bytes v

    sp+0 sp+4 sp+8 sp+16 sp+24

    c

    i[0] i[1]

    v

    up+0 up+4 up+8

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    15/39

    Carnegie Mellon

    15

    typedef union {float f;unsigned u;

    } bit_float_t;

    float bit2float(unsigned u){bit_float_t arg;arg.u = u;return arg.f;

    }

    unsigned float2bit(float f){bit_float_t arg;arg.f = f;return arg.u;

    }

    Using Union to Access Bit Patterns

    Same as (float) u? Same as (unsigned) f?

    u

    f

    0 4

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    16/39

    Carnegie Mellon

    16

    Byte Ordering Revisited

    Idea

    Short/long/quad words stored in memory as 2/4/8 consecutive bytes

    Which is most (least) significant?

    Can cause problems when exchanging binary data between machines

    Big Endian

    Most significant byte has lowest address

    Sparc

    Little Endian

    Least significant byte has lowest address

    Intel x86

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    17/39

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    18/39

    Carnegie Mellon

    18

    Byte Ordering Example (Cont).int j;for (j = 0; j < 8; j++)

    dw.c[j] = 0xf0 + j;

    printf("Characters 0-7 ==[0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",

    dw.c[0], dw.c[1], dw.c[2], dw.c[3],dw.c[4], dw.c[5], dw.c[6], dw.c[7]);

    printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",dw.s[0], dw.s[1], dw.s[2], dw.s[3]);

    printf("Ints 0-1 == [0x%x,0x%x]\n",

    dw.i[0], dw.i[1]);

    printf("Long 0 == [0x%lx]\n",dw.l[0]);

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    19/39

    Carnegie Mellon

    19

    Byte Ordering on IA32

    Little Endian

    Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]Long 0 == [0xf3f2f1f0]

    Output:

    f0 f1 f2 f3 f4 f5 f6 f7

    c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]

    s[0] s[1] s[2] s[3]

    i[0] i[1]l[0]

    LSB MSB LSB MSB

    Print

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    20/39

    Carnegie Mellon

    20

    Byte Ordering on Sun

    Big Endian

    Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]Long 0 == [0xf0f1f2f3]

    Output on Sun:

    f0 f1 f2 f3 f4 f5 f6 f7

    c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]

    s[0] s[1] s[2] s[3]

    i[0] i[1]l[0]

    MSB LSB MSB LSB

    Print

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    21/39

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    22/39

    Carnegie Mellon

    22

    Summary

    Arrays in C Contiguous allocation of memory

    Aligned to satisfy every elements alignment requirement

    Pointer to first element

    No bounds checking

    Structures

    Allocate bytes in order declared

    Pad in middle and at end to satisfy alignment

    Unions

    Overlay declarations

    Way to circumvent type system

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    23/39

    g

    23

    Today

    Structures Alignment

    Unions

    Memory Layout

    Procedures (x86-64)

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    24/39

    g

    24

    IA32 Linux Memory Layout

    Stack Runtime stack (8MB limit)

    E. g., local variables

    Heap

    Dynamically allocated storage When call malloc(), calloc(), new()

    Data

    Statically allocated data

    E.g., arrays & strings declared in code

    Text

    Executable machine instructions

    Read-onlyUpper 2 hex digits

    = 8 bits of address

    FF

    00

    Stack

    Text

    Data

    Heap

    08

    8MB

    not drawn to scale

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    25/39

    g

    25

    Memory Allocation Example

    char big_array[1

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    26/39

    g

    26

    IA32 Example Addresses

    $esp 0xffffbcd0p3 0x65586008p1 0x55585008p4 0x1904a110p2 0x1904a008&p2 0x18049760&beyond 0x08049744

    big_array 0x18049780huge_array 0x08049760

    main() 0x080483c6useless() 0x08049744finalmalloc() 0x006be166

    address range ~232

    FF

    00

    Stack

    Text

    Data

    Heap

    08

    80

    not drawn to scale

    malloc() is dynamically linkedaddress determined at runtime

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    27/39

    27

    x86-64 Example Addresses

    $rsp 0x00007ffffff8d1f8p3 0x00002aaabaadd010p1 0x00002aaaaaadc010p4 0x0000000011501120p2 0x0000000011501010&p2 0x0000000010500a60&beyond 0x0000000000500a44

    big_array 0x0000000010500a80huge_array 0x0000000000500a50

    main() 0x0000000000400510useless() 0x0000000000400500finalmalloc() 0x000000386ae6a170

    address range ~247

    00007F

    000000

    Stack

    Text

    Data

    Heap

    000030

    not drawn to scale

    malloc() is dynamically linkedaddress determined at runtime

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    28/39

    28

    Today

    Structures Alignment

    Unions

    Memory Layout

    Procedures (x86-64)

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    29/39

    29

    %rax

    %rbx

    %rcx

    %rdx

    %rsi

    %rdi

    %rsp

    %rbp

    x86-64 Integer Registers

    Twice the number of registers

    Accessible as 8, 16, 32, 64 bits

    %eax

    %ebx

    %ecx

    %edx

    %esi

    %edi

    %esp

    %ebp

    %r8

    %r9

    %r10

    %r11

    %r12

    %r13

    %r14

    %r15

    %r8d

    %r9d

    %r10d

    %r11d

    %r12d

    %r13d

    %r14d

    %r15d

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    30/39

    30

    %rax

    %rbx

    %rcx

    %rdx

    %rsi

    %rdi%rsp

    %rbp

    x86-64 Integer Registers:

    Usage Conventions

    %r8

    %r9

    %r10

    %r11

    %r12

    %r13%r14

    %r15Callee saved Callee saved

    Callee saved

    Callee saved

    Callee saved

    Caller saved

    Callee saved

    Stack pointer

    Caller Saved

    Return value

    Argument #4

    Argument #1

    Argument #3

    Argument #2

    Argument #6

    Argument #5

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    31/39

    31

    x86-64 Registers

    Arguments passed to functions via registers If more than 6 integral parameters, then pass rest on stack

    These registers can be used as caller-saved as well

    All references to stack frame via stack pointer Eliminates need to update %ebp/%rbp

    Other Registers

    6 callee saved

    2 caller saved

    1 return value (also usable as caller saved)

    1 special (stack pointer)

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    32/39

    32

    x86-64 Long Swap

    Operands passed in registers

    First (xp) in %rdi, second (yp) in %rsi

    64-bit pointers

    No stack operations required (except ret)

    Avoiding stack

    Can hold all local information in registers

    void swap_l(long *xp, long *yp)

    {long t0 = *xp;long t1 = *yp;*xp = t1;*yp = t0;

    }

    swap:

    movq (%rdi), %rdxmovq (%rsi), %raxmovq %rax, (%rdi)movq %rdx, (%rsi)ret

    rtn Ptr %rsp

    No stack

    frame

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    33/39

    33

    x86-64 Locals in the Red Zone

    Avoiding Stack Pointer Change Can hold all information within small

    window beyond stack pointer

    /* Swap, using local array */void swap_a(long *xp, long *yp){

    volatile long loc[2];loc[0] = *xp;loc[1] = *yp;

    *xp = loc[1];*yp = loc[0];

    }

    swap_a:movq (%rdi), %raxmovq %rax, -24(%rsp)movq (%rsi), %raxmovq %rax, -16(%rsp)movq -16(%rsp), %rax

    movq %rax, (%rdi)movq -24(%rsp), %raxmovq %rax, (%rsi)ret

    rtn Ptr

    unused

    %rsp

    8

    loc[1]

    loc[0]

    16

    24

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    34/39

    34

    x86-64 NonLeaf without Stack Frame

    No values held while swap beinginvoked

    No callee save registers needed

    repinstruction inserted as no-op Based on recommendation from AMD

    /* Swap a[i] & a[i+1] */void swap_ele(long a[], int i){

    swap(&a[i], &a[i+1]);}

    swap_ele:movslq %esi,%rsi # Sign extend ileaq 8(%rdi,%rsi,8), %rax # &a[i+1]

    leaq (%rdi,%rsi,8), %rdi # &a[i] (1starg) movq %rax, %rsi # (2ndarg)

    call swaprep # No-opret

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    35/39

    35

    x86-64 Stack Frame Example

    Keeps values of &a[i]and&a[i+1]in callee save

    registers

    Must set up stack frame to

    save these registers

    long sum = 0;/* Swap a[i] & a[i+1] */void swap_ele_su(long a[], int i)

    {swap(&a[i], &a[i+1]);

    sum += (a[i]*a[i+1]);}

    swap_ele_su:movq %rbx, -16(%rsp)movq %rbp, -8(%rsp)subq $16, %rsp

    movslq %esi,%raxleaq 8(%rdi,%rax,8), %rbx

    leaq (%rdi,%rax,8), %rbpmovq %rbx, %rsimovq %rbp, %rdicall swap

    movq (%rbx), %raximulq (%rbp), %rax

    addq %rax, sum(%rip)movq (%rsp), %rbxmovq 8(%rsp), %rbpaddq $16, %rspret

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    36/39

    36

    Understanding x86-64 Stack Frameswap_ele_su:

    movq %rbx, -16(%rsp) # Save %rbxmovq %rbp, -8(%rsp) # Save %rbpsubq $16, %rsp # Allocate stack frame

    movslq %esi,%rax # Extend ileaq 8(%rdi,%rax,8), %rbx # &a[i+1] (callee save)leaq (%rdi,%rax,8), %rbp # &a[i] (callee save)

    movq %rbx, %rsi # 2ndargumentmovq %rbp, %rdi # 1stargumentcall swap

    movq (%rbx), %rax # Get a[i+1]imulq (%rbp), %rax # Multiply by a[i]

    addq %rax, sum(%rip) # Add to summovq (%rsp), %rbx # Restore %rbxmovq 8(%rsp), %rbp # Restore %rbpaddq $16, %rsp # Deallocate frameret

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    37/39

    37

    Understanding x86-64 Stack Frame

    rtn addr%rbp

    %rsp8

    %rbx16

    rtn addr

    %rbp

    %rsp

    +8

    %rbx

    movq %rbx, -16(%rsp) # Save %rbx

    movq %rbp, -8(%rsp) # Save %rbp

    subq $16, %rsp # Allocate stack frame

    movq (%rsp), %rbx # Restore %rbxmovq 8(%rsp), %rbp # Restore %rbp

    addq $16, %rsp # Deallocate frame

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    38/39

    38

    Interesting Features of Stack Frame

    Allocate entire frame at once All stack accesses can be relative to %rsp

    Do by decrementing stack pointer

    Can delay allocation, since safe to temporarily use red zone

    Simple deallocation

    Increment stack pointer

    No base/frame pointer needed

    Carnegie Mellon

  • 8/11/2019 Lec2 5 MachineLevelProgramming Unions Alignment

    39/39

    x86-64 Procedure Summary

    Heavy use of registers Parameter passing

    More temporaries since more registers

    Minimal use of stack Sometimes none

    Allocate/deallocate entire block

    Many tricky optimizations What kind of stack frame to use

    Various allocation techniques