Post on 09-Mar-2018
Systems Programming and Computer Architecture
(252-0061-00)
Timothy Roscoe
Herbstsemester 2016
© Systems Group | Department of Computer Science | ETH Zürich
AS 2016 1 Basic x86 Architecture
7: Basic x86 architecture
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 2
7.1: What is an instruction set architecture?
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 4
Definitions
• Architecture: (also instruction set architecture: ISA) The parts of a processor design that one needs to understand to write assembly code. Examples: – instruction set specification, registers.
• Microarchitecture: Implementation of the architecture. • Examples:
– cache sizes and core frequency.
• Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc.
AS 2016 Basic x86 Architecture 5
Instruction Set Architecture
• Assembly Language View – Processor state
• Registers, memory, …
– Instructions • addl, movq, leal, … • How instructions are encoded as bytes
• Layer of Abstraction – Above: how to program machine
• Processor executes instructions in a sequence
– Below: what needs to be built • Use variety of tricks to make it run fast • E.g., execute multiple instructions
simultaneously
ISA
Compiler OS
CPU Design
Circuit Design
Chip Layout
Application Program
AS 2016 Basic x86 Architecture 6
There are many architectures…
• You’ve already seen MIPS 2000 → MIPS 3000 → … – Workstations, minicomputers, now mostly embedded networking
• IBM S/360 → S/370 → … → zSeries – First to separate architecture from (many) implementations
• ARM (several variants) – Very common in embedded systems, basis for Advanced OS course at ETHZ
• IBM POWER → PowerPC (→ Cell, sort of) – Basis for all 3 last-gen games console systems
• DEC Alpha – Personal favorite; killed by Compaq, team left for Intel to work on…
• Intel Itanium – First 64-bit Intel product; very fast (esp. FP), hot, and expensive – Mostly overtaken by 64-bit x86 designs
• etc.
AS 2016 Basic x86 Architecture 7
CISC: Complex Instruction Set
• Dominant style through mid-80’s • Stack-oriented instruction set
– Use stack to pass arguments, save program counter – Explicit push and pop instructions
• Arithmetic instructions can access memory – addl %eax, 12(%rbx,%rcx,4)
• requires memory read and write • Complex address calculation
• Condition codes – Set as side effect of arithmetic and logical instructions
• Philosophy – Add instructions to perform “typical” programming tasks
AS 2016 Basic x86 Architecture 8
RISC: Reduced Instruction Set
• Internal project at IBM – Popularized by Hennessy (Stanford) and Patterson (Berkeley)
• Fewer, simpler instructions – Might need more to get given task done – Can execute them with small and fast hardware
• Register-oriented instruction set – Many more (typically 32) registers – Use for arguments, return pointer, temporaries
• Load-Store architecture – Only load and store instructions can access memory
• No condition codes – Test instructions return 0/1 in register
AS 2016 Basic x86 Architecture 9
Contrast MIPS with x86 / 64-bit
• Operations are highly uniform – All encoded in exactly 32 bits
– All take the same time to execute (mostly)
– All operate between registers, or only load/store
– Operate on 64 or 32 bit quantities (nothing smaller)
• No condition codes: use registers
• Lots of registers, including zero – All registers are uniform
AS 2016 Basic x86 Architecture 10
Other RISC features (not always seen)
• Explicit delay slots (e.g. MIPS)
– E.g. can’t use a value until 2 instructions after the load
• Make most instructions conditional (e.g. ARM)
– Needs condition codes (why?)
– Reduce branches
– Increase code density
• Key message: x86 is not like this!
AS 2016 Basic x86 Architecture 11
CISC vs. RISC
• An old debate with strong opinions!
• CISC proponents:
– Easy for compiler
– Smaller code size
• RISC proponents
– Better for optimizing compilers
– Run fast with simple chip design
AS 2016 Basic x86 Architecture 12
CISC vs. RISC today
• Desktops and servers:
– Choice of ISA not a technical issue
– Enough hardware can make anything run fast
• Embedded processors:
– RISC still makes sense
– Smaller, cheaper, less power – but for how long?
• Code compatibility more important…
AS 2016 Basic x86 Architecture 13
Summary
• Architecture vs. Microarchitecture
• Instruction set architectures
• RISC vs. CISC
• x86: comparison with MIPS
AS 2016 Basic x86 Architecture 14
7.2: A bit of x86 history
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 15
Intel x86 Processors
• The x86 Architecture still dominates the computer market – modulo ARM…
• Evolutionary design
– Backwards compatible up until 8086, introduced in 1978 – Added more features as time goes on
• Complex instruction set computer (CISC) – Not the most CISC, but definitely not RISC! – Performance matches or exceeds RISC – why?
AS 2016 Basic x86 Architecture 16
Intel x86 Evolution: Milestones
Name Date Transistors MHz • 8086 1978 29K 5-10
– First 16-bit processor. Basis for IBM PC & DOS – 1MB address space
• 80386 1985 275K 16-33 – First 32 bit processor , referred to as IA32 – Added “flat addressing” – Capable of running Unix – 32-bit Linux/gcc uses no instructions introduced in later models
• Pentium 4F 2005 230M 2800-3800 – First 64-bit [x86] processor – Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of
“Core” line
AS 2016 Basic x86 Architecture 17
Intel x86 Processors: Overview
X86-64 / EM64t
X86-32/IA32
X86-16 8086 286
386 486 Pentium Pentium MMX
Pentium III
Pentium 4
Pentium 4E
Pentium 4F Core 2 Duo Core i7
IA: often redefined as latest Intel architecture
time
Architectures Processors
MMX
SSE
SSE2
SSE3
SSE4, AVX512
AS 2016 18 Basic x86 Architecture
Intel x86 Processors, contd.
• Machine evolution, examples: 486 1989 1.9M Pentium 1993 3.1M Pent./MMX 1997 74.5M PentiumPro 1995 6.5M Pentium III 1999 8.2M Pentium 4 2001 42M Core 2 Duo 2006 291M Xeon 7400 2008 1.9B Xeon i7 2012 4.3B
• Added Features – Instructions to support multimedia operations
• Parallel operations on 1, 2, and 4-byte data, both integer & FP
– Instructions to enable more efficient conditional operations – Virtualization extensions – Multiprocessor synchronization
AS 2016 Basic x86 Architecture 19
x86 clones: e.g. Advanced Micro Devices (AMD) • Historically
– AMD has followed just behind Intel – A little bit slower, a lot cheaper
• Then – Recruited top circuit designers from Digital Equipment
Corp. and other downward trending companies – Built Opteron: tough competitor to Pentium 4 – Developed x86-64, their own extension to 64 bits
• Recently – Intel much quicker with multicore core design – Intel currently far ahead in performance
AS 2016 Basic x86 Architecture 20
Move from 32 to 64 bits
• Intel attempted radical shift from ia32 to “ia64”
– Totally different architecture (“Itanium”)
– Executes IA32 code only as legacy - slowly
• AMD stepped in with evolutionary solution
– x86-64 (now called “AMD64”)
– 2004: Intel announces EM64T extension to ia32
– Almost identical to x86-64!
We'll use 64-bit x86 in this course AS 2016 21
Other extensions
• SGX: Software Guard Extensions
– Execute code in trusted enclaves
• TSX-NI: Transactional Memory
– Automatic lock elision
– Hardware memory transactions
• VT-x / VT-d
– Support for processor and I/O virtualization
• and several others…
AS 2016 Basic x86 Architecture 22
Curiosities: Intel Single-Chip Cloud Computer - 2010
• Experimental processor (only a few 100 made) – Designed for research – Working version in our Lab
• 48 Pentium/MMX cores • Very fast interconnection
network – Hardware support for
messaging between cores – Variable speed of network
• Non-cache coherent – Sharing memory between
cores won’t work with a conventional OS!
AS 2016 Basic x86 Architecture 23
Curiosities: Intel Xeon Phi - 2012
• PCIe-based card
• 22nm, 5 billion transistors
• 62 cores – Pentium/MMX (!)
– 64 bit extensions
– 512 bit AVX floating point
• Single “shared” L2 cache
AS 2016 Basic x86 Architecture 24
Current: Intel Knights Landing
• CPU socket
– 14nm process
• 16GB on-chip fast RAM
• 72 Atom cores
– 4 threads/core!
AS 2016 Basic x86 Architecture 25
7.3: Basics of machine code
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 26
A quick note on syntax
There are two common ways to write x86 Assembler:
• AT&T syntax
– What we'll use in this course, common on Unix
• Intel syntax
– Generally used for Windows machines
AS 2016 Basic x86 Architecture 27
CPU
Assembly programmer’s view
Programmer-Visible State – PC: Program counter
• Address of next instruction • Called “EIP” (IA32) or “RIP” (x86-64)
– Register file • Heavily used program data
– Condition codes • Store status information about most
recent arithmetic operation • Used for conditional branching
Memory • Byte addressable array • Code, user data, (some) OS data • Includes stack used to support
procedures
PC Registers
Memory
Object Code Program Data OS Data
Addresses
Data
Instructions
Stack
Condition Codes
AS 2016 Basic x86 Architecture 28
Compiling into assembly
int sum(int x, int y) { int t = x+y; return t; }
Generated x86 assembly sum: pushq %rbp movq %rsp, %rbp movl %edi, -20(%rbp) movl %esi, -24(%rbp) movl -24(%rbp), %eax movl -20(%rbp), %edx addl %edx, %eax movl %eax, -4(%rbp) movl -4(%rbp), %eax popq %rbp ret
Obtain with command
gcc -O -S code.c
Produces file code.s
Some compilers use single instruction “leave”
C code
AS 2016 29
Assembly data types
• “Integer” data of 1, 2, 4, or 8 bytes
– Data values
– Addresses (untyped pointers)
• Floating point data of 4, 8, or 10 bytes
– See later in the course
• No aggregate types (arrays, structures, …)
– Just contiguously allocated bytes in memory
AS 2016 Basic x86 Architecture 30
Assembly code operations
• Perform arithmetic function on register or memory data
• Transfer data between memory and register – Load data from memory into register – Store register data into memory
• Transfer control
– Unconditional jumps to/from procedures – Conditional branches
AS 2016 Basic x86 Architecture 31
Code for sum 0x401040 <sum>: 0: 55 1: 48 89 e5 4: 89 7d ec 7: 89 75 e8 a: 8b 45 e8 d: 8b 55 ec 10: 01 d0 12: 89 45 fc 15: 8b 45 fc 18: 5d 19: c3
Object code
• Assembler – Translates .s into .o – Binary encoding of each instruction – Nearly-complete image of
executable code – Missing linkages between code in
different files
• Linker – Resolves references between files – Combines with static run-time
libraries • E.g., code for malloc, printf
– Some libraries are dynamically linked • Linking occurs when program begins
execution
• Total of 26 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040
AS 2016 32
Machine instruction example
• C Code – Add two signed integers
• Assembly – Add 2 4-byte integers
• “Long” words in GCC parlance • Same instruction whether
signed or unsigned
– Operands: • x: Register %eax • y: Memory M[%rbp+8] • t: Register %eax
– Return function value in %eax
• Object Code – 3-byte instruction – Stored at address 0x401046
int t = x+y;
addl 8(%rbp),%eax
0x401046: 03 45 08
Similar to expression:
x += y
More precisely:
int eax;
int *rbp;
eax += rbp[2]
AS 2016 Basic x86 Architecture 33
Disassembled
Disassembling object code
• Disassembler – objdump -d p – Useful tool for examining object code – Analyzes bit pattern of series of instructions – Produces approximate rendition of assembly code – Can be run on either a.out (complete executable) or .o file
AS 2016 Basic x86 Architecture 34
0000000000000000 <sum>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 89 7d ec mov %edi,-0x14(%rbp) 7: 89 75 e8 mov %esi,-0x18(%rbp) a: 8b 45 e8 mov -0x18(%rbp),%eax d: 8b 55 ec mov -0x14(%rbp),%edx 10: 01 d0 add %edx,%eax 12: 89 45 fc mov %eax,-0x4(%rbp) 15: 8b 45 fc mov -0x4(%rbp),%eax 18: 5d pop %rbp 19: c3 retq
(gdb) disassemble sum Dump of assembler code for function sum: 0x0000000000000000 <+0>: push %rbp 0x0000000000000001 <+1>: mov %rsp,%rbp 0x0000000000000004 <+4>: mov %edi,-0x14(%rbp) 0x0000000000000007 <+7>: mov %esi,-0x18(%rbp) 0x000000000000000a <+10>: mov -0x18(%rbp),%eax 0x000000000000000d <+13>: mov -0x14(%rbp),%edx 0x0000000000000010 <+16>: add %edx,%eax 0x0000000000000012 <+18>: mov %eax,-0x4(%rbp) 0x0000000000000015 <+21>: mov -0x4(%rbp),%eax 0x0000000000000018 <+24>: pop %rbp 0x0000000000000019 <+25>: retq End of assembler dump. (gdb)
Alternate disassembly
AS 2016 Basic x86 Architecture 35
Within gdb debugger:
Examining bytes
AS 2016 Basic x86 Architecture 36
(gdb) x/26xb sum 0x0 <sum>: 0x55 0x48 0x89 0xe5 0x89 0x7d 0xec 0x89 0x8 <sum+8>: 0x75 0xe8 0x8b 0x45 0xe8 0x8b 0x55 0xec 0x10 <sum+16>: 0x01 0xd0 0x89 0x45 0xfc 0x8b 0x45 0xfc 0x18 <sum+24>: 0x5d 0xc3 (gdb)
(gdb) x/26bx sum
eXamine memory
26 bytes… .. in heX
What can be disassembled?
• Anything that can be interpreted as executable code • Disassembler examines bytes and reconstructs assembly source
% objdump -d WINWORD.EXE WINWORD.EXE: file format pei-i386 No symbols in "WINWORD.EXE". Disassembly of section .text: 30001000 <.text>: 30001000: 55 push %ebp 30001001: 8b ec mov %esp,%ebp 30001003: 6a ff push $0xffffffff 30001005: 68 90 10 00 30 push $0x30001090 3000100a: 68 91 dc 4c 30 push $0x304cdc91
AS 2016 Basic x86 Architecture 37
Summary
• Compiling into assembly
• Data types in assembly
• Assembly code operations
• Object code, and disassembling it
AS 2016 Basic x86 Architecture 38
7.4: x86 architecture
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 39
8086 registers
gen
eral
pu
rpo
se
%ax %ah %al accumulate
%cx %ch %cl counter
%dx %dh %dl data
%bx %bh %bl base
%si source index
%di dest. index
%sp stack pointer
%bp base pointer
AS 2016 40 16 bits
%ip instruction pointer
%sr status (flags)
80386 (ia32) registers ge
ner
al p
urp
ose
%ax %ah %al accumulate
%cx %ch %cl counter
%dx %dh %dl data
%bx %bh %bl base
%si source index
%di dest. index
%sp stack pointer
%bp base pointer
AS 2016 41 16 bits
%ip instruction pointer
%sr status (flags)
%eax
%ecx
%edx
%ebx
%esi
%edi
%esp
%ebp
%eip
%esr
x86-64 integer registers %rax
%rbx
%rcx
%rdx
%rsi
%rdi
%rsp
%rbp
%eax
%ebx
%ecx
%edx
%esi
%edi
%esp
%ebp
%r8
%r9
%r10
%r11
%r12
%r13
%r14
%r15
%r8d
%r9d
%r10d
%r11d
%r12d
%r13d
%r14d
%r15d
AS 2016 42
%rip %eip %rsr %esr
gen
eral
pu
rpo
se
Moving data
• movx Source, Dest – x in {b, w, l,q}
– movq Source, Dest: Move 8-byte “quad word”
– movl Source, Dest: Move 4-byte “long word”
– movw Source, Dest: Move 2-byte “word”
– movb Source, Dest: Move 1-byte “byte”
• Lots of these in typical code
AS 2016 Basic x86 Architecture 43
%rax
%rbx
%rcx
%rdx
%rsi
%rdi
%rsp
%rbp
%eax
%ebx
%ecx
%edx
%esi
%edi
%esp
%ebp
%r8
%r9
%r10
%r11
%r12
%r13
%r14
%r15
%r8d
%r9d
%r10d
%r11d
%r12d
%r13d
%r14d
%r15d
Moving data
movx Source, Dest:
• Operand Types – Immediate: Constant integer data
• Example: $0x400, $-533 • Like C constant, but prefixed with ‘$’ • Encoded with 1, 2, 4, 8 bytes
– Register: One of 16 integer registers • Example: %eax, %r14d • Note some (e.g. %rsp, %rbp) reserved for special use • Others have special uses for particular instructions
– Memory: 1,2,4, or 8 consecutive bytes of memory at address given by register • Simplest example: (%rax) • Various other “address modes”
AS 2016 Basic x86 Architecture 44
%rax
%rbx
%rcx
%rdx
%rsi
%rdi
%rsp
%rbp
%eax
%ebx
%ecx
%edx
%esi
%edi
%esp
%ebp
%r8
%r9
%r10
%r11
%r12
%r13
%r14
%r15
%r8d
%r9d
%r10d
%r11d
%r12d
%r13d
%r14d
%r15d
movl operand combinations
Cannot do memory-memory transfer with a single instruction
movl
Imm
Reg
Mem
Reg
Mem
Reg
Mem
Reg
Source Dest C Analog
movl $0x4,%eax temp = 0x4;
movl $-147,(%rax) *p = -147;
movl %eax,%edx temp2 = temp1;
movl %eax,(%rdx) *p = temp;
movl (%rax),%edx temp = *p;
Src,Dest
AS 2016 Basic x86 Architecture 45
Simple memory addressing modes
• Normal (R) Mem[Reg[R]] – Register R specifies memory address
movq (%rcx),%rax
• Displacement D(R) Mem[Reg[R]+D] – Register R specifies start of memory region
– Constant displacement D specifies offset movl 8(%ebp),%edx
AS 2016 Basic x86 Architecture 46
On x86_64, can also be %rip
Using simple addressing modes swap: pushq %rbp movq %rsp, %rbp movq %rdi, -24(%rbp) movq %rsi, -32(%rbp) movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax) popq %rbp ret AS 2016 47
The optimizer
is off!
void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }
Using simple addressing modes swap: pushq %rbp movq %rsp, %rbp movq %rdi, -24(%rbp) movq %rsi, -32(%rbp) movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax) popq %rbp ret
Body
Set Up
Finish AS 2016
void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }
First 2 arguments passed in
%rdi, %rsi
Understanding swap void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }
swap: pushq %rbp movq %rsp, %rbp movq %rdi, -24(%rbp) movq %rsi, -32(%rbp) movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax) popq %rbp ret
AS 2016 49
t1 t0
old %rbp
rtn addr.
…
?
xp
yp
%rbp 0
8
-4
-24
-32
-8
t0 = *xp
t1 = *yp
*xp = t1
*yp = t0
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 50
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
123 0x1050 0x1050
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 51
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
123 0x1050 123
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
Basic x86 Architecture
123
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 52
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
123 0x1050 0x1040
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
123
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 53
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
123 0x1050 456
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
456 123
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 54
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
123 0x1050 01050
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
456 123
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 55
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
456 0x1050 01050
456
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
456 123
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 56
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
456 0x1040
0x1048
456 0x1050 0x1040
456
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
456 123
Basic x86 Architecture
Understanding swap
movq -24(%rbp), %rax movl (%rax), %eax movl %eax, -8(%rbp) movq -32(%rbp), %rax movl (%rax), %eax movl %eax, -4(%rbp) movq -24(%rbp), %rax movl -4(%rbp), %edx movl %edx, (%rax) movq -32(%rbp), %rax movl -8(%rbp), %edx movl %edx, (%rax)
AS 2016 57
0x1040 0x1000
0x1050 0x1008
0x1010
0x1018
old %rbp 0x1020
rtn addr 0x1028
0x1030
0x1038
123 0x1040
0x1048
456 0x1050 0x1040
123
0x1020
%rax (%eax)
%rdx (%edx)
%rbp
registers
memory
%rbp
xp
yp
456 123
Basic x86 Architecture
With the optimizer on…
• Operands passed in registers – First (xp) in %rdi, second (yp) in %rsi
– 32-bit integers, 64-bit pointers
• No stack operations required
void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; }
swap: movl (%rdi), %edx movl (%rsi), %eax movl %eax, (%rdi) movl %edx, (%rsi) retq
AS 2016 Basic x86 Architecture 58
Complete memory addressing modes
• Most General Form:
– D: Constant “displacement” 1, 2, or 4 bytes (not 8!) – Rb: Base register: Any of 16 integer registers – Ri: Index register: Any, except for %rsp
• Unlikely you’d use %rbp, either
– S: Scale: 1, 2, 4, or 8 (why these numbers?)
• Special Cases (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D] (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D] (Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
AS 2016 Basic x86 Architecture 59
Address computation examples
%rdx
%rcx
0xf000
0x100
Expression Address Computation Address
0x8(%rdx) 0xf000 + 0x8 0xf008
(%rdx,%rcx) 0xf000 + 0x100 0xf100
(%rdx,%rcx,4) 0xf000 + 4*0x100 0xf400
0x80(,%rdx,2) 2*0xf000 + 0x80 0x1e080
AS 2016 Basic x86 Architecture 60
Address computation instruction
• lea Src,Dest
– Src is address mode expression
– Set Dest to address denoted by expression
• Uses
– Computing addresses without a memory reference
• E.g., translation of p = &x[i];
– Computing arithmetic expressions of the form x + k*y
• k = 1, 2, 4, or 8
AS 2016 Basic x86 Architecture 61
Summary
• 64-bit x86 registers (and 32, and 16…)
• mov instruction: loads and stores
• memory addressing modes
– Example: swap()
• lea: address computation
AS 2016 Basic x86 Architecture 62
7.5: x86 integer arithmetic
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 63
Some arithmetic operations
• Two-operand instructions (longword variants): Format Computation addl Src,Dest Dest ← Dest + Src subl Src,Dest Dest ← Dest - Src imull Src,Dest Dest ← Dest * Src sall Src,Dest Dest ← Dest << Src Also called shll sarl Src,Dest Dest ← Dest >> Src Arithmetic shrl Src,Dest Dest ← Dest >> Src Logical xorl Src,Dest Dest ← Dest ^ Src andl Src,Dest Dest ← Dest & Src orl Src,Dest Dest ← Dest | Src
• No distinction between signed and unsigned int (why?)
AS 2016 Basic x86 Architecture 64
Some arithmetic operations
• One operand instructions
Format Computation
incl Dest Dest ← Dest + 1
decl Dest Dest ← Dest - 1
negl Dest Dest ← -Dest
notl Dest Dest ← ~Dest
• See book for more instructions
AS 2016 Basic x86 Architecture 65
Using leal for arithmetic expressions
int arith (int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; }
arith: leal (%rdi,%rsi), %eax addl %edx, %eax leal (%rsi,%rsi,2), %edx sall $4, %edx leal 4(%rdi,%rdx), %ecx imull %ecx, %eax ret
AS 2016 Basic x86 Architecture 66
Understanding arith int arith (int x, int y, int z) { int t1 = x+y; int t2 = z+t1; int t3 = x+4; int t4 = y * 48; int t5 = t3 + t4; int rval = t2 * t5; return rval; }
leal (%rdi,%rsi), %eax # eax = x + y addl %edx, %eax # edx = z + eax leal (%rsi,%rsi,2), %edx # edx = y * 3 sall $4, %edx # edx *= 16 leal 4(%rdi,%rdx), %ecx # ecx = x + 4 + edx imull %ecx, %eax # eax *= ecx ret
AS 2016 Basic x86 Architecture 67
Another example
int logical(int x, int y) { int t1 = x^y; int t2 = t1 >> 17; int mask = (1<<13) - 7; int rval = t2 & mask; return rval; }
logical: xorl %esi, %edi sarl $17, %edi movl %edi, %eax andl $8185, %eax ret
xorl %esi, %edi # edi = x^y (t1) sarl $17, %edi # edi = t1>>17 (t2) movl %edi, %eax andl $8185,%eax # eax = t2 & 8185
213 = 8192, 213 – 7 = 8185
AS 2016 Basic x86 Architecture 68
7.6: Condition codes
Computer Architecture and Systems Programming
252-0061-00, Herbstsemester 2016
Timothy Roscoe
AS 2016 Basic x86 Architecture 69
Condition codes (implicit setting)
• Single bit registers CF Carry Flag (for unsigned) SF Sign Flag (for signed) ZF Zero Flag OF Overflow Flag (for signed)
• Implicitly set (think of it as side effect) by arithmetic operations
Example: addl/addq Src,Dest ↔ t = a+b – CF set if carry out from most significant bit (unsigned overflow) – ZF set if t == 0 – SF set if t < 0 (as signed) – OF set if two’s complement (signed) overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)
• Not set by lea instruction • Full documentation link on course website
AS 2016 Basic x86 Architecture 70
Condition Codes (Explicit Setting: Compare)
• Explicit Setting by Compare Instruction cmpl/cmpq Src2,Src1 cmpl b,a like computing a-b without setting destination
CF set if carry out from most significant bit (used for unsigned comparisons) ZF set if a == b SF set if (a-b) < 0 (as signed) OF set if two’s complement (signed) overflow: (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
AS 2016 Basic x86 Architecture 71
Condition Codes (Explicit Setting: Test)
• Explicit Setting by Test instruction
testl/testq Src2,Src1
testl b,a like computing a&b w/o setting destination
– Sets condition codes based on value of Src1 & Src2
– Useful to have one of the operands be a mask
ZF set when a&b == 0
SF set when a&b < 0
AS 2016 Basic x86 Architecture 72
Reading Condition Codes
• SetX Instructions
– Set single byte based on combinations of condition codes
SetX Condition Description
sete ZF Equal / Zero
setne ~ZF Not Equal / Not Zero
sets SF Negative
setns ~SF Nonnegative
setg ~(SF^OF)&~ZF Greater (Signed) setge ~(SF^OF) Greater or Equal (Signed) setl (SF^OF) Less (Signed) setle (SF^OF)|ZF Less or Equal (Signed) seta ~CF&~ZF Above (unsigned) setb CF Below (unsigned)
AS 2016 73 Basic x86 Architecture
Reading Condition Codes
• setx Instructions: – Set single byte based on combination of condition codes
– Does not alter remaining 7 bytes
AS 2016 74
int gt (long x, long y) { return x > y; }
xorl %eax, %eax # eax = 0 cmpq %rsi, %rdi # Compare x and y setg %al # al = x > y
Body (same for both)
long lgt (long x, long y) { return x > y; }
Is %rax now zero? Yes: 32-bit instructions set high-order 32 bits to 0
Basic x86 Architecture
Reading Condition Codes
• setX does not alter remaining 7 bytes
• Typically use movzbl to finish job
AS 2016 Basic x86 Architecture 75
int gt (int x, int y) { return x > y; }
cmpl %esi, %edi # Compare x : y setg %al # al = x > y movzbl %al,%eax # Zero rest of %eax
Body
%rax %eax %al %ah
movzbl expands a byte to 32 bits with leading
zeros (c.f. movsbl)
Jumping
jX Instructions: Jump to different part of code depending on condition codes
jX Condition Description jmp 1 Unconditional
je ZF Equal / Zero
jne ~ZF Not Equal / Not Zero
js SF Negative
jns ~SF Non-negative
jg ~(SF^OF)&~ZF Greater (Signed)
jge ~(SF^OF) Greater or Equal (Signed)
jl (SF^OF) Less (Signed)
jle (SF^OF)|ZF Less or Equal (Signed)
ja ~CF&~ZF Above (unsigned)
jb CF Below (unsigned)
AS 2016 76 Basic x86 Architecture