Systems Programming and Computer Architecture

(252-0061-00)

Timothy Roscoe

Herbstsemester 2016

© Systems Group | Department of Computer Science | ETH Zürich

AS 2016 1 Basic x86 Architecture

7: Basic x86 architecture

Computer Architecture and Systems Programming

252-0061-00, Herbstsemester 2016

Timothy Roscoe

Full disclosure…

I used to work for Intel.

(Employee nr. 10668584)

7.1: What is an instruction set architecture?

Definitions

• Architecture (also instruction set architecture, ISA): the parts of a processor design that one needs to understand to write assembly code.
  – Examples: instruction set specification, registers.
• Microarchitecture: implementation of the architecture.
  – Examples: cache sizes and core frequency.
• Example ISAs: x86, MIPS, ia64, VAX, Alpha, ARM, etc.

Instruction Set Architecture

• Assembly language view
  – Processor state
    • Registers, memory, …
  – Instructions
    • addl, movq, leal, …
    • How instructions are encoded as bytes
• Layer of abstraction
  – Above: how to program machine
    • Processor executes instructions in a sequence
  – Below: what needs to be built
    • Use variety of tricks to make it run fast
    • E.g., execute multiple instructions simultaneously

[Figure: the ISA as the boundary between software (Application Program, Compiler, OS) and hardware (CPU Design, Circuit Design, Chip Layout)]

There are many architectures…

• You’ve already seen MIPS 2000 → MIPS 3000 → …
  – Workstations, minicomputers, now mostly embedded networking
• IBM S/360 → S/370 → … → zSeries
  – First to separate architecture from (many) implementations
• ARM (several variants)
  – Very common in embedded systems, basis for Advanced OS course at ETHZ
• IBM POWER → PowerPC (→ Cell, sort of)
  – Basis for all 3 last-gen games console systems
• DEC Alpha
  – Personal favorite; killed by Compaq, team left for Intel to work on…
• Intel Itanium
  – First 64-bit Intel product; very fast (esp. FP), hot, and expensive
  – Mostly overtaken by 64-bit x86 designs
• etc.

CISC: Complex Instruction Set

• Dominant style through mid-80’s
• Stack-oriented instruction set
  – Use stack to pass arguments, save program counter
  – Explicit push and pop instructions
• Arithmetic instructions can access memory
  – addl %eax, 12(%rbx,%rcx,4)
    • Requires memory read and write
    • Complex address calculation
• Condition codes
  – Set as side effect of arithmetic and logical instructions
• Philosophy
  – Add instructions to perform “typical” programming tasks

RISC: Reduced Instruction Set

• Internal project at IBM
  – Popularized by Hennessy (Stanford) and Patterson (Berkeley)
• Fewer, simpler instructions
  – Might need more to get given task done
  – Can execute them with small and fast hardware
• Register-oriented instruction set
  – Many more (typically 32) registers
  – Use for arguments, return pointer, temporaries
• Load-Store architecture
  – Only load and store instructions can access memory
• No condition codes
  – Test instructions return 0/1 in register

Contrast MIPS with x86 / 64-bit

• MIPS operations are highly uniform
  – All encoded in exactly 32 bits
  – All take the same time to execute (mostly)
  – All operate between registers, or only load/store
  – Operate on 64 or 32 bit quantities (nothing smaller)
• No condition codes: use registers
• Lots of registers, including zero
  – All registers are uniform

Other RISC features (not always seen)

• Explicit delay slots (e.g. MIPS)

– E.g. can’t use a value until 2 instructions after the load

• Make most instructions conditional (e.g. ARM)

– Needs condition codes (why?)

– Reduce branches

– Increase code density

• Key message: x86 is not like this!

CISC vs. RISC

• An old debate with strong opinions!

• CISC proponents:

– Easy for compiler

– Smaller code size

• RISC proponents

– Better for optimizing compilers

– Run fast with simple chip design

CISC vs. RISC today

• Desktops and servers:

– Choice of ISA not a technical issue

– Enough hardware can make anything run fast

• Embedded processors:

– RISC still makes sense

– Smaller, cheaper, less power – but for how long?

• Code compatibility more important…

Summary

• Architecture vs. Microarchitecture

• Instruction set architectures

• RISC vs. CISC

• x86: comparison with MIPS

7.2: A bit of x86 history

Intel x86 Processors

• The x86 architecture still dominates the computer market (modulo ARM…)
• Evolutionary design
  – Backwards compatible all the way back to the 8086, introduced in 1978
  – Added more features as time goes on
• Complex instruction set computer (CISC)
  – Not the most CISC, but definitely not RISC!
  – Performance matches or exceeds RISC (why?)

Intel x86 Evolution: Milestones

  Name         Date   Transistors   MHz
  8086         1978   29K           5-10
  80386        1985   275K          16-33
  Pentium 4F   2005   230M          2800-3800

• 8086
  – First 16-bit x86 processor. Basis for IBM PC & DOS
  – 1MB address space
• 80386
  – First 32-bit x86 processor, referred to as IA32
  – Added “flat addressing”
  – Capable of running Unix
  – 32-bit Linux/gcc uses no instructions introduced in later models
• Pentium 4F
  – First 64-bit x86 processor
  – Meanwhile, Pentium 4s (Netburst arch.) phased out in favor of “Core” line

Intel x86 Processors: Overview

[Figure: timeline of Intel processors grouped by architecture, with instruction set extensions added over time]

• X86-16: 8086, 286
• X86-32/IA32: 386, 486, Pentium, Pentium MMX, Pentium III, Pentium 4, Pentium 4E
• X86-64 / EM64t: Pentium 4F, Core 2 Duo, Core i7
• Extensions over time: MMX, SSE, SSE2, SSE3, SSE4, AVX512

IA: often redefined as latest Intel architecture

Intel x86 Processors, contd.

• Machine evolution, examples:

  Name          Date   Transistors
  486           1989   1.9M
  Pentium       1993   3.1M
  Pent./MMX     1997   74.5M
  PentiumPro    1995   6.5M
  Pentium III   1999   8.2M
  Pentium 4     2001   42M
  Core 2 Duo    2006   291M
  Xeon 7400     2008   1.9B
  Xeon i7       2012   4.3B

• Added features
  – Instructions to support multimedia operations
    • Parallel operations on 1, 2, and 4-byte data, both integer & FP
  – Instructions to enable more efficient conditional operations
  – Virtualization extensions
  – Multiprocessor synchronization

x86 clones: e.g. Advanced Micro Devices (AMD)

• Historically
  – AMD has followed just behind Intel
  – A little bit slower, a lot cheaper
• Then
  – Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
  – Built Opteron: tough competitor to Pentium 4
  – Developed x86-64, their own extension to 64 bits
• Recently
  – Intel much quicker with multicore design
  – Intel currently far ahead in performance

Move from 32 to 64 bits

• Intel attempted radical shift from ia32 to “ia64”

– Totally different architecture (“Itanium”)

– Executes IA32 code only as legacy - slowly

• AMD stepped in with evolutionary solution

– x86-64 (now called “AMD64”)

– 2004: Intel announces EM64T extension to ia32

– Almost identical to x86-64!

We'll use 64-bit x86 in this course.

Other extensions

• SGX: Software Guard Extensions

– Execute code in trusted enclaves

• TSX-NI: Transactional Memory

– Automatic lock elision

– Hardware memory transactions

• VT-x / VT-d

– Support for processor and I/O virtualization

• and several others…

Curiosities: Intel Single-Chip Cloud Computer - 2010

• Experimental processor (only a few 100 made)
  – Designed for research
  – Working version in our Lab
• 48 Pentium/MMX cores
• Very fast interconnection network
  – Hardware support for messaging between cores
  – Variable speed of network
• Non-cache coherent
  – Sharing memory between cores won’t work with a conventional OS!

Curiosities: Intel Xeon Phi - 2012

• PCIe-based card

• 22nm, 5 billion transistors

• 62 cores – Pentium/MMX (!)

– 64 bit extensions

– 512 bit AVX floating point

• Single “shared” L2 cache

Current: Intel Knights Landing

• CPU socket

– 14nm process

• 16GB on-chip fast RAM

• 72 Atom cores

– 4 threads/core!

7.3: Basics of machine code

A quick note on syntax

There are two common ways to write x86 Assembler:

• AT&T syntax

– What we'll use in this course, common on Unix

• Intel syntax

– Generally used for Windows machines

Assembly programmer’s view

Programmer-visible state
• PC: Program counter
  – Address of next instruction
  – Called “EIP” (IA32) or “RIP” (x86-64)
• Register file
  – Heavily used program data
• Condition codes
  – Store status information about most recent arithmetic operation
  – Used for conditional branching
• Memory
  – Byte addressable array
  – Code, user data, (some) OS data
  – Includes stack used to support procedures

[Figure: the CPU (PC, registers, condition codes) sends addresses to memory and exchanges data with it; memory supplies instructions and holds object code, program data, OS data, and the stack]

Compiling into assembly

C code

    int sum(int x, int y)
    {
        int t = x+y;
        return t;
    }

Generated x86 assembly

    sum:
        pushq %rbp
        movq  %rsp, %rbp
        movl  %edi, -20(%rbp)
        movl  %esi, -24(%rbp)
        movl  -24(%rbp), %eax
        movl  -20(%rbp), %edx
        addl  %edx, %eax
        movl  %eax, -4(%rbp)
        movl  -4(%rbp), %eax
        popq  %rbp
        ret

Obtain with command

    gcc -S code.c

Produces file code.s. The listing above keeps every variable on the stack, which is what gcc emits without optimization; adding -O produces shorter code.

Some compilers use single instruction “leave”

Assembly data types

• “Integer” data of 1, 2, 4, or 8 bytes

– Data values

– Addresses (untyped pointers)

• Floating point data of 4, 8, or 10 bytes

– See later in the course

• No aggregate types (arrays, structures, …)

– Just contiguously allocated bytes in memory

Assembly code operations

• Perform arithmetic function on register or memory data

• Transfer data between memory and register – Load data from memory into register – Store register data into memory

• Transfer control

– Unconditional jumps to/from procedures – Conditional branches

Object code

Code for sum

    0x401040 <sum>:
       0: 55
       1: 48 89 e5
       4: 89 7d ec
       7: 89 75 e8
       a: 8b 45 e8
       d: 8b 55 ec
      10: 01 d0
      12: 89 45 fc
      15: 8b 45 fc
      18: 5d
      19: c3

• Total of 26 bytes
• Each instruction 1, 2, or 3 bytes
• Starts at address 0x401040

• Assembler
  – Translates .s into .o
  – Binary encoding of each instruction
  – Nearly-complete image of executable code
  – Missing linkages between code in different files
• Linker
  – Resolves references between files
  – Combines with static run-time libraries
    • E.g., code for malloc, printf
  – Some libraries are dynamically linked
    • Linking occurs when program begins execution

Machine instruction example

• C Code
  – Add two signed integers

      int t = x+y;

• Assembly
  – Add 2 4-byte integers
    • “Long” words in GCC parlance
    • Same instruction whether signed or unsigned
  – Operands:
    • x: Register %eax
    • y: Memory M[%rbp+8]
    • t: Register %eax
  – Return function value in %eax

      addl 8(%rbp),%eax

    Similar to expression: x += y
    More precisely:

      int eax;
      int *rbp;
      eax += rbp[2]

• Object Code
  – 3-byte instruction
  – Stored at address 0x401046

      0x401046: 03 45 08
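The “more precisely” model of addl 8(%rbp),%eax can be written as a small runnable C function. This is only a sketch: the name add_from_frame is hypothetical, and it assumes int is 4 bytes, so that index 2 corresponds to a byte offset of 8.

```c
/* Hypothetical model of: addl 8(%rbp),%eax
 * rbp points at 4-byte ints, so rbp[2] is the int stored 8 bytes past rbp. */
static int add_from_frame(int eax, const int *rbp)
{
    eax += rbp[2];   /* read M[%rbp+8] and add it into %eax */
    return eax;      /* the result stays in %eax */
}
```

Calling add_from_frame(12, frame) with frame[2] == 30 gives 42; the memory operand is read, the register is both source and destination.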

Disassembling object code

• Disassembler
  – objdump -d p
  – Useful tool for examining object code
  – Analyzes bit pattern of series of instructions
  – Produces approximate rendition of assembly code
  – Can be run on either a.out (complete executable) or .o file

Disassembled

    0000000000000000 <sum>:
       0: 55          push %rbp
       1: 48 89 e5    mov  %rsp,%rbp
       4: 89 7d ec    mov  %edi,-0x14(%rbp)
       7: 89 75 e8    mov  %esi,-0x18(%rbp)
       a: 8b 45 e8    mov  -0x18(%rbp),%eax
       d: 8b 55 ec    mov  -0x14(%rbp),%edx
      10: 01 d0       add  %edx,%eax
      12: 89 45 fc    mov  %eax,-0x4(%rbp)
      15: 8b 45 fc    mov  -0x4(%rbp),%eax
      18: 5d          pop  %rbp
      19: c3          retq

Alternate disassembly

Within gdb debugger:

    (gdb) disassemble sum
    Dump of assembler code for function sum:
       0x0000000000000000 <+0>:   push %rbp
       0x0000000000000001 <+1>:   mov  %rsp,%rbp
       0x0000000000000004 <+4>:   mov  %edi,-0x14(%rbp)
       0x0000000000000007 <+7>:   mov  %esi,-0x18(%rbp)
       0x000000000000000a <+10>:  mov  -0x18(%rbp),%eax
       0x000000000000000d <+13>:  mov  -0x14(%rbp),%edx
       0x0000000000000010 <+16>:  add  %edx,%eax
       0x0000000000000012 <+18>:  mov  %eax,-0x4(%rbp)
       0x0000000000000015 <+21>:  mov  -0x4(%rbp),%eax
       0x0000000000000018 <+24>:  pop  %rbp
       0x0000000000000019 <+25>:  retq
    End of assembler dump.
    (gdb)

Examining bytes

    (gdb) x/26xb sum
    0x0 <sum>:     0x55 0x48 0x89 0xe5 0x89 0x7d 0xec 0x89
    0x8 <sum+8>:   0x75 0xe8 0x8b 0x45 0xe8 0x8b 0x55 0xec
    0x10 <sum+16>: 0x01 0xd0 0x89 0x45 0xfc 0x8b 0x45 0xfc
    0x18 <sum+24>: 0x5d 0xc3
    (gdb)

(gdb) x/26bx sum means: eXamine memory, 26 bytes, in heX

What can be disassembled?

• Anything that can be interpreted as executable code
• Disassembler examines bytes and reconstructs assembly source

    % objdump -d WINWORD.EXE

    WINWORD.EXE: file format pei-i386

    No symbols in "WINWORD.EXE".
    Disassembly of section .text:

    30001000 <.text>:
    30001000: 55               push %ebp
    30001001: 8b ec            mov  %esp,%ebp
    30001003: 6a ff            push $0xffffffff
    30001005: 68 90 10 00 30   push $0x30001090
    3000100a: 68 91 dc 4c 30   push $0x304cdc91

Summary

• Compiling into assembly

• Data types in assembly

• Assembly code operations

• Object code, and disassembling it

7.4: x86 architecture

8086 registers

General purpose registers (16 bits each):

  16-bit   High byte   Low byte   Role
  %ax      %ah         %al        accumulate
  %cx      %ch         %cl        counter
  %dx      %dh         %dl        data
  %bx      %bh         %bl        base
  %si                             source index
  %di                             dest. index
  %sp                             stack pointer
  %bp                             base pointer

Other registers:

  %ip      instruction pointer
  %sr      status (flags)

80386 (ia32) registers

General purpose registers (each 16-bit register is the low half of a 32-bit register):

  32-bit   16-bit   High byte   Low byte   Role
  %eax     %ax      %ah         %al        accumulate
  %ecx     %cx      %ch         %cl        counter
  %edx     %dx      %dh         %dl        data
  %ebx     %bx      %bh         %bl        base
  %esi     %si                             source index
  %edi     %di                             dest. index
  %esp     %sp                             stack pointer
  %ebp     %bp                             base pointer

Other registers:

  %eip     %ip      instruction pointer
  %esr     %sr      status (flags)

x86-64 integer registers

General purpose registers (64-bit name, followed by the name of its low 32 bits):

  64-bit   Low 32 bits      64-bit   Low 32 bits
  %rax     %eax             %r8      %r8d
  %rbx     %ebx             %r9      %r9d
  %rcx     %ecx             %r10     %r10d
  %rdx     %edx             %r11     %r11d
  %rsi     %esi             %r12     %r12d
  %rdi     %edi             %r13     %r13d
  %rsp     %esp             %r14     %r14d
  %rbp     %ebp             %r15     %r15d

Other registers:

  %rip     %eip
  %rsr     %esr

Moving data

• movx Source, Dest

– x in {b, w, l, q}

– movq Source, Dest: Move 8-byte “quad word”

– movl Source, Dest: Move 4-byte “long word”

– movw Source, Dest: Move 2-byte “word”

– movb Source, Dest: Move 1-byte “byte”

• Lots of these in typical code
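As a rough illustration of the suffixes, the sketch below maps each one to the size of a fixed-width C integer type. The helper name suffix_size is hypothetical; the sizes themselves are the ones listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Operand size in bytes selected by each AT&T suffix; 0 for anything else. */
static size_t suffix_size(char suffix)
{
    switch (suffix) {
    case 'b': return sizeof(int8_t);   /* movb: 1-byte "byte"      */
    case 'w': return sizeof(int16_t);  /* movw: 2-byte "word"      */
    case 'l': return sizeof(int32_t);  /* movl: 4-byte "long word" */
    case 'q': return sizeof(int64_t);  /* movq: 8-byte "quad word" */
    default:  return 0;
    }
}
```

Note that “long word” is assembler terminology: on x86-64 Linux a C long is 8 bytes and is moved with movq, not movl.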

Moving data

movx Source, Dest:

• Operand types
  – Immediate: constant integer data
    • Example: $0x400, $-533
    • Like C constant, but prefixed with ‘$’
    • Encoded with 1, 2, 4, 8 bytes
  – Register: one of 16 integer registers
    • Example: %eax, %r14d
    • Note some (e.g. %rsp, %rbp) reserved for special use
    • Others have special uses for particular instructions
  – Memory: 1, 2, 4, or 8 consecutive bytes of memory at address given by register
    • Simplest example: (%rax)
    • Various other “address modes”

movl operand combinations

  Source   Dest   Src,Dest             C analog
  Imm      Reg    movl $0x4,%eax       temp = 0x4;
  Imm      Mem    movl $-147,(%rax)    *p = -147;
  Reg      Reg    movl %eax,%edx       temp2 = temp1;
  Reg      Mem    movl %eax,(%rdx)     *p = temp;
  Mem      Reg    movl (%rax),%edx     temp = *p;

Cannot do memory-memory transfer with a single instruction

Simple memory addressing modes

• Normal    (R)    Mem[Reg[R]]
  – Register R specifies memory address
  – Example: movq (%rcx),%rax
• Displacement    D(R)    Mem[Reg[R]+D]
  – Register R specifies start of memory region (on x86_64, R can also be %rip)
  – Constant displacement D specifies offset
  – Example: movl 8(%rbp),%edx

Using simple addressing modes

    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

The optimizer is off! First 2 arguments passed in %rdi, %rsi.

    swap:
        pushq %rbp                # Set Up
        movq  %rsp, %rbp
        movq  %rdi, -24(%rbp)
        movq  %rsi, -32(%rbp)
        movq  -24(%rbp), %rax     # Body
        movl  (%rax), %eax
        movl  %eax, -8(%rbp)
        movq  -32(%rbp), %rax
        movl  (%rax), %eax
        movl  %eax, -4(%rbp)
        movq  -24(%rbp), %rax
        movl  -4(%rbp), %edx
        movl  %edx, (%rax)
        movq  -32(%rbp), %rax
        movl  -8(%rbp), %edx
        movl  %edx, (%rax)
        popq  %rbp                # Finish
        ret

Understanding swap

Stack frame, as offsets from %rbp:

  Offset   Contents
  8        rtn addr.
  0        old %rbp     (%rbp points here)
  -4       t1
  -8       t0
           ?            (slot between t0 and xp)
  -24      xp
  -32      yp

Body, annotated with the C statement each group implements:

    movq -24(%rbp), %rax     # t0 = *xp
    movl (%rax), %eax
    movl %eax, -8(%rbp)
    movq -32(%rbp), %rax     # t1 = *yp
    movl (%rax), %eax
    movl %eax, -4(%rbp)
    movq -24(%rbp), %rax     # *xp = t1
    movl -4(%rbp), %edx
    movl %edx, (%rax)
    movq -32(%rbp), %rax     # *yp = t0
    movl -8(%rbp), %edx
    movl %edx, (%rax)

Understanding swap: step-by-step trace

Initial state:

  %rbp = 0x1020
  Mem[0x1000] = 0x1040    (yp, at -32(%rbp))
  Mem[0x1008] = 0x1050    (xp, at -24(%rbp))
  Mem[0x1020] = old %rbp
  Mem[0x1028] = rtn addr
  Mem[0x1040] = 456       (*yp)
  Mem[0x1050] = 123       (*xp)

  Instruction                Effect
  movq -24(%rbp), %rax       %rax = 0x1050 (xp)
  movl (%rax), %eax          %eax = 123
  movl %eax, -8(%rbp)        t0 = 123
  movq -32(%rbp), %rax       %rax = 0x1040 (yp)
  movl (%rax), %eax          %eax = 456
  movl %eax, -4(%rbp)        t1 = 456
  movq -24(%rbp), %rax       %rax = 0x1050 (xp)
  movl -4(%rbp), %edx        %edx = 456
  movl %edx, (%rax)          Mem[0x1050] = 456
  movq -32(%rbp), %rax       %rax = 0x1040 (yp)
  movl -8(%rbp), %edx        %edx = 123
  movl %edx, (%rax)          Mem[0x1040] = 123

Final state: Mem[0x1040] = 123, Mem[0x1050] = 456; the two values have been swapped.

With the optimizer on…

• Operands passed in registers
  – First (xp) in %rdi, second (yp) in %rsi
  – 32-bit integers, 64-bit pointers
• No stack operations required

    void swap(int *xp, int *yp)
    {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        movl (%rdi), %edx
        movl (%rsi), %eax
        movl %eax, (%rdi)
        movl %edx, (%rsi)
        retq
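To relate the four mov instructions back to C, here is the same swap with each statement annotated by the instruction that the optimized listing uses for it. The pairing in the comments is a reading of that listing, not compiler output.

```c
/* swap from the slide; comments pair each statement with one instruction. */
void swap(int *xp, int *yp)
{
    int t0 = *xp;   /* movl (%rdi), %edx : load through the first pointer  */
    int t1 = *yp;   /* movl (%rsi), %eax : load through the second pointer */
    *xp = t1;       /* movl %eax, (%rdi) : store t1 where xp points        */
    *yp = t0;       /* movl %edx, (%rsi) : store t0 where yp points        */
}
```

A caller passes addresses, for example swap(&a, &b); the temporaries t0 and t1 live only in %edx and %eax, which is why no stack traffic is needed.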

Complete memory addressing modes

• Most general form:

    D(Rb,Ri,S)    Mem[Reg[Rb]+S*Reg[Ri]+D]

  – D: Constant “displacement” 1, 2, or 4 bytes (not 8!)
  – Rb: Base register: any of 16 integer registers
  – Ri: Index register: any, except for %rsp
    • Unlikely you’d use %rbp, either
  – S: Scale: 1, 2, 4, or 8 (why these numbers?)

• Special cases

    (Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]
    D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]
    (Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]

Address computation examples

Register values: %rdx = 0xf000, %rcx = 0x100

  Expression        Address computation   Address
  0x8(%rdx)         0xf000 + 0x8          0xf008
  (%rdx,%rcx)       0xf000 + 0x100        0xf100
  (%rdx,%rcx,4)     0xf000 + 4*0x100      0xf400
  0x80(,%rdx,2)     2*0xf000 + 0x80       0x1e080
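The address computations in the table can be checked with a few lines of C. This is a sketch of the formula Reg[Rb] + S*Reg[Ri] + D only; the function name effective_address is hypothetical, and a missing base or index register is modeled by passing 0 for it.

```c
#include <stdint.h>

/* Address denoted by D(Rb,Ri,S): Reg[Rb] + S*Reg[Ri] + D.
 * rb = 0 models an omitted base; ri = 0 with s = 1 models an omitted index. */
static uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return rb + s * ri + (uint64_t)d;
}
```

With rdx = 0xf000 and rcx = 0x100, effective_address(0x80, 0, 0xf000, 2) reproduces the last row of the table, 0x1e080.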

Address computation instruction

• lea Src,Dest

– Src is address mode expression

– Set Dest to address denoted by expression

• Uses

– Computing addresses without a memory reference

• E.g., translation of p = &x[i];

– Computing arithmetic expressions of the form x + k*y

• k = 1, 2, 4, or 8
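Both uses of lea have direct C counterparts, sketched below. The function names are hypothetical; whether a compiler actually picks lea for them depends on the compiler and its flags.

```c
#include <stddef.h>

/* p = &x[i]: pure address arithmetic, x[i] itself is never read.
 * With 4-byte ints this has the shape of lea (x,i,4). */
static int *element_address(int *x, size_t i)
{
    return &x[i];
}

/* x + k*y with k in {1, 2, 4, 8}: the shape one lea can compute
 * without touching memory. */
static long scaled_sum(long x, long y, long k)
{
    return x + k * y;
}
```

For example, scaled_sum(4, 3, 8) is 28, matching what lea (x,y,8) would place in its destination register.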

Summary

• 64-bit x86 registers (and 32, and 16…)

• mov instruction: loads and stores

• memory addressing modes

– Example: swap()

• lea: address computation

7.5: x86 integer arithmetic

Some arithmetic operations

• Two-operand instructions (longword variants):

  Format            Computation
  addl Src,Dest     Dest ← Dest + Src
  subl Src,Dest     Dest ← Dest - Src
  imull Src,Dest    Dest ← Dest * Src
  sall Src,Dest     Dest ← Dest << Src     Also called shll
  sarl Src,Dest     Dest ← Dest >> Src     Arithmetic
  shrl Src,Dest     Dest ← Dest >> Src     Logical
  xorl Src,Dest     Dest ← Dest ^ Src
  andl Src,Dest     Dest ← Dest & Src
  orl Src,Dest      Dest ← Dest | Src

• No distinction between signed and unsigned int (why?)
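One way to see why addl needs no signed/unsigned variants, while right shifts do, is to model the instructions on raw 32-bit patterns. The helper names below are hypothetical; sar32 is written with unsigned operations only, so it does not rely on how a C compiler shifts negative values.

```c
#include <stdint.h>

/* addl: two's complement addition yields the same bit pattern whether the
 * operands are read as signed or unsigned. */
static uint32_t add_bits(uint32_t a, uint32_t b)
{
    return a + b;
}

/* shrl: logical right shift, vacated bits are filled with zeros. */
static uint32_t shr32(uint32_t x, unsigned n)
{
    return x >> n;
}

/* sarl: arithmetic right shift, vacated bits copy the sign bit.
 * Requires n < 32. */
static uint32_t sar32(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0)
        shifted |= ~(0xFFFFFFFFu >> n);   /* set the top n bits */
    return shifted;
}
```

Adding the pattern for -1 to 1 gives 0 under either interpretation, but shifting 0x80000000 right by 4 gives 0x08000000 with shrl and 0xF8000000 with sarl, so the shift has to know which reading is intended.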

Some arithmetic operations

• One operand instructions

Format Computation

incl Dest Dest ← Dest + 1

decl Dest Dest ← Dest - 1

negl Dest Dest ← -Dest

notl Dest Dest ← ~Dest

• See book for more instructions

Using leal for arithmetic expressions

    int arith (int x, int y, int z)
    {
        int t1 = x+y;
        int t2 = z+t1;
        int t3 = x+4;
        int t4 = y * 48;
        int t5 = t3 + t4;
        int rval = t2 * t5;
        return rval;
    }

    arith:
        leal  (%rdi,%rsi), %eax
        addl  %edx, %eax
        leal  (%rsi,%rsi,2), %edx
        sall  $4, %edx
        leal  4(%rdi,%rdx), %ecx
        imull %ecx, %eax
        ret

Understanding arith

    leal  (%rdi,%rsi), %eax      # eax = x + y          (t1)
    addl  %edx, %eax             # eax = z + eax        (t2)
    leal  (%rsi,%rsi,2), %edx    # edx = y * 3
    sall  $4, %edx               # edx *= 16            (t4 = y * 48)
    leal  4(%rdi,%rdx), %ecx     # ecx = x + 4 + edx    (t5)
    imull %ecx, %eax             # eax *= ecx           (rval)
    ret
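The instruction sequence for arith can be replayed in C, one statement per instruction, and compared with the original function. The name arith_steps and the use of unsigned variables (so that wraparound is well defined, as it is in hardware) are choices made for this sketch.

```c
/* The function from the slide. */
static int arith(int x, int y, int z)
{
    int t1 = x + y;
    int t2 = z + t1;
    int t3 = x + 4;
    int t4 = y * 48;
    int t5 = t3 + t4;
    return t2 * t5;
}

/* One C statement per instruction; x is in %edi, y in %esi, z in %edx. */
static unsigned arith_steps(unsigned x, unsigned y, unsigned z)
{
    unsigned eax = x + y;         /* leal (%rdi,%rsi), %eax   */
    eax += z;                     /* addl %edx, %eax          */
    unsigned edx = y + y * 2;     /* leal (%rsi,%rsi,2), %edx */
    edx <<= 4;                    /* sall $4, %edx            */
    unsigned ecx = 4 + x + edx;   /* leal 4(%rdi,%rdx), %ecx  */
    eax *= ecx;                   /* imull %ecx, %eax         */
    return eax;
}
```

The multiplication by 48 never appears as a multiply: it is split into y*3 (one lea) followed by a shift left by 4.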

Another example

    int logical(int x, int y)
    {
        int t1 = x^y;
        int t2 = t1 >> 17;
        int mask = (1<<13) - 7;
        int rval = t2 & mask;
        return rval;
    }

    logical:
        xorl %esi, %edi     # edi = x^y (t1)
        sarl $17, %edi      # edi = t1>>17 (t2)
        movl %edi, %eax
        andl $8185, %eax    # eax = t2 & 8185
        ret

2^13 = 8192, 2^13 – 7 = 8185
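The constant 8185 in the andl comes from the compiler evaluating the mask expression at compile time. A small sketch that checks the arithmetic; the inputs in the usage note are kept non-negative because right-shifting a negative int is implementation-defined in C.

```c
/* The function from the slide; mask is a compile-time constant. */
static int logical(int x, int y)
{
    int t1 = x ^ y;
    int t2 = t1 >> 17;           /* sarl $17 */
    int mask = (1 << 13) - 7;    /* 8192 - 7 = 8185 = 0x1FF9 */
    return t2 & mask;            /* andl $8185 */
}
```

For instance, logical(0x7FFE0000, 0) shifts down to 0x3FFF, and masking with 0x1FF9 leaves 0x1FF9, which is 8185.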

7.6: Condition codes

Condition codes (implicit setting)

• Single bit registers

    CF   Carry Flag (for unsigned)
    SF   Sign Flag (for signed)
    ZF   Zero Flag
    OF   Overflow Flag (for signed)

• Implicitly set (think of it as side effect) by arithmetic operations

  Example: addl/addq Src,Dest ↔ t = a+b
  – CF set if carry out from most significant bit (unsigned overflow)
  – ZF set if t == 0
  – SF set if t < 0 (as signed)
  – OF set if two’s complement (signed) overflow:
    (a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0)

• Not set by lea instruction
• Full documentation link on course website
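The four rules for addl can be modeled in C on 32-bit patterns. This is an illustrative sketch, not a description of the hardware: struct flags and add_flags are hypothetical names, and OF is computed with the usual bit trick (both operands have a sign that differs from the sign of the result), which is equivalent to the condition above.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags a 32-bit addl would leave behind for t = a + b. */
static struct flags add_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a + b;                        /* wraps modulo 2^32 */
    struct flags f;
    f.cf = t < a;                              /* carry out of bit 31 */
    f.zf = (t == 0);
    f.sf = (t >> 31) != 0;                     /* sign bit of the result */
    f.of = (((a ^ t) & (b ^ t)) >> 31) != 0;   /* signed overflow */
    return f;
}
```

Two cases worth trying: 0xFFFFFFFF + 1 sets CF and ZF but not OF (as signed it is -1 + 1 = 0), while 0x7FFFFFFF + 1 sets SF and OF but not CF.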

Condition Codes (Explicit Setting: Compare)

• Explicit setting by compare instruction

    cmpl/cmpq Src2,Src1

  cmpl b,a is like computing a-b without setting destination
  – CF set if carry out from most significant bit (used for unsigned comparisons)
  – ZF set if a == b
  – SF set if (a-b) < 0 (as signed)
  – OF set if two’s complement (signed) overflow:
    (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)

Condition Codes (Explicit Setting: Test)

• Explicit Setting by Test instruction

testl/testq Src2,Src1

testl b,a like computing a&b w/o setting destination

– Sets condition codes based on value of Src1 & Src2

– Useful to have one of the operands be a mask

ZF set when a&b == 0

SF set when a&b < 0
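A short sketch of the two flags listed for testl, again on 32-bit patterns; test_zf and test_sf are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* ZF after testl b,a: set when a & b has no bits set. */
static bool test_zf(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}

/* SF after testl b,a: set when the top bit of a & b is set. */
static bool test_sf(uint32_t a, uint32_t b)
{
    return ((a & b) >> 31) != 0;
}
```

With a mask as one operand, ZF answers "is this bit clear?": test_zf(0x10, 0x01) is true because bit 0 of 0x10 is clear. Testing a value against itself gives ZF exactly when the value is zero and SF exactly when it is negative.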

Reading Condition Codes

• SetX instructions
  – Set single byte based on combinations of condition codes

  SetX     Condition        Description
  sete     ZF               Equal / Zero
  setne    ~ZF              Not Equal / Not Zero
  sets     SF               Negative
  setns    ~SF              Nonnegative
  setg     ~(SF^OF)&~ZF     Greater (Signed)
  setge    ~(SF^OF)         Greater or Equal (Signed)
  setl     (SF^OF)          Less (Signed)
  setle    (SF^OF)|ZF       Less or Equal (Signed)
  seta     ~CF&~ZF          Above (unsigned)
  setb     CF               Below (unsigned)
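To see why the signed conditions use SF^OF rather than SF alone, the flag combinations can be evaluated in C after a modeled cmpl b,a. Everything here is a sketch: struct flags, cmp_flags and the C functions named after the instructions are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags left by cmpl b,a, i.e. by computing a - b on 32 bits. */
static struct flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t t = a - b;
    struct flags f;
    f.cf = a < b;                              /* borrow: a below b, unsigned */
    f.zf = (t == 0);
    f.sf = (t >> 31) != 0;
    f.of = (((a ^ b) & (a ^ t)) >> 31) != 0;   /* signed overflow of a - b */
    return f;
}

static bool setg(struct flags f) { return !(f.sf ^ f.of) && !f.zf; }
static bool setl(struct flags f) { return (f.sf ^ f.of) != 0; }
static bool seta(struct flags f) { return !f.cf && !f.zf; }
static bool setb(struct flags f) { return f.cf; }
```

The pattern 0xFFFFFFFF compared with 1 is "less" as signed (-1 < 1) but "above" as unsigned, so setl and seta are both true. Comparing 0x80000000 with 1 overflows, leaving SF clear; OF corrects for that, and setl is still true.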

Reading Condition Codes

• setX instructions:
  – Set single byte based on combination of condition codes
  – Does not alter remaining 7 bytes

    int gt (long x, long y)
    {
        return x > y;
    }

    long lgt (long x, long y)
    {
        return x > y;
    }

Body (same for both)

    xorl %eax, %eax    # eax = 0
    cmpq %rsi, %rdi    # Compare x and y
    setg %al           # al = x > y

Is all of %rax zero after the xorl? Yes: 32-bit instructions set high-order 32 bits to 0

Reading Condition Codes

• setX does not alter remaining 7 bytes
• Typically use movzbl to finish job

    int gt (int x, int y)
    {
        return x > y;
    }

Body

    cmpl   %esi, %edi    # Compare x : y
    setg   %al           # al = x > y
    movzbl %al,%eax      # Zero rest of %eax

[Figure: %al and %ah are the two low-order bytes of %eax, which is the low-order half of %rax]

movzbl expands a byte to 32 bits with leading zeros (cf. movsbl)
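The difference between movzbl and movsbl corresponds to widening an unsigned versus a signed byte in C. A minimal sketch; the C functions simply reuse the instruction names for illustration.

```c
#include <stdint.h>

/* movzbl: widen a byte to 32 bits, filling the upper 24 bits with zeros. */
static uint32_t movzbl(uint8_t al)
{
    return (uint32_t)al;
}

/* movsbl: widen a byte to 32 bits, copying its sign bit into the upper 24 bits. */
static int32_t movsbl(int8_t al)
{
    return (int32_t)al;
}
```

The byte 0xFF becomes 0x000000FF under movzbl, while the same pattern read as the signed byte -1 stays -1 (0xFFFFFFFF) under movsbl. After setg the byte is only ever 0 or 1, so zero extension is the right choice.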

Jumping

jX instructions: jump to different part of code depending on condition codes

  jX     Condition        Description
  jmp    1                Unconditional
  je     ZF               Equal / Zero
  jne    ~ZF              Not Equal / Not Zero
  js     SF               Negative
  jns    ~SF              Non-negative
  jg     ~(SF^OF)&~ZF     Greater (Signed)
  jge    ~(SF^OF)         Greater or Equal (Signed)
  jl     (SF^OF)          Less (Signed)
  jle    (SF^OF)|ZF       Less or Equal (Signed)
  ja     ~CF&~ZF          Above (unsigned)
  jb     CF               Below (unsigned)

Summary

• Condition codes (C, Z, S, O)

• Explicit setting of condition codes

– Compare

– Test

• Reading condition codes

– setX

• Jumps
