Lecture 2 - Assembly Language
Computer and Network Security
8th of October 2018
Computer Science and Engineering Department
CSE Dep, ACS, UPB Lecture 2, Assembly Language 1/38
Recap: Exploration Tools
- assembly and C language
- scripting languages (Bash, Python, Perl)
- hexadecimal
- executable exploration: strings, xxd, objdump, IDA
- process exploration: strace, ltrace, lsof, pmap
- Capture the Flag (CTF) contests: http://ctftime.org
More Info on this Lecture
- https://ocw.cs.pub.ro/courses/iocla
Outline
Introduction to Assembly Language
Assembly Language Basics
x86 Assembly
Dealing with Binary Files
Summary
Evolution of Programming Languages
- machine code (punch cards)
- assembly language (architecture dependent)
- high-level languages (portable, compilers and interpreters)
Current Need for Assembly Language
- low-level optimizations
- features unavailable in the C language
- security (binary analysis, offensive security)
- learning how the machine works
Mnemonics
- building blocks of assembly language
- keywords for assembly instructions
- direct mapping to machine code
Sample Instruction
Assembly to machine code mapping
NASM syntax: add dword [0xdeadbeef], 42
hex: 83 05 ef be ad de 2a
binary: [1000 0011] [0000 0101] [1110 1111 1011 1110 1010 1101 1101 1110] [0010 1010]
- 83: opcode: add a sign-extended 8-bit immediate to a 32-bit register or memory operand
- 05: opcode modifiers (ModR/M byte):
  - 2 bits = addressing mode
  - 3 bits = register/opcode modifier
  - 3 bits = r/m field
- ef be ad de: memory address: 0xdeadbeef (note the endianness)
- 2a: immediate: 42
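The field breakdown of the sample instruction can be checked with a few lines of Python. This is only a sketch that takes the seven bytes from the slide apart by hand; it is not a general x86 decoder, and the variable names are chosen here for illustration.

```python
import struct

# Machine code of: add dword [0xdeadbeef], 42 (bytes taken from the slide)
code = bytes.fromhex("8305efbeadde2a")

opcode = code[0]
modrm = code[1]

# Split the ModR/M byte into its three bit fields.
mod = modrm >> 6             # 2 bits: addressing mode
reg = (modrm >> 3) & 0b111   # 3 bits: register/opcode modifier
rm = modrm & 0b111           # 3 bits: r/m field

# The 32-bit address is stored little-endian; the immediate is one signed byte.
(address,) = struct.unpack_from("<I", code, 2)
(immediate,) = struct.unpack_from("<b", code, 6)

print(f"opcode=0x{opcode:02x} mod={mod:02b} reg={reg:03b} rm={rm:03b}")
print(f"address=0x{address:08x} immediate={immediate}")
```

Reading the address with the `<I` (little-endian) format is what turns the byte sequence ef be ad de back into 0xdeadbeef.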
Computer Architecture
- instruction set architecture (ISA)
- register set
- addressing methods
Instruction Set Architecture (ISA)
- the types of assembly instructions
  - addressing
  - moving data
  - control flow
- multiple processors may implement the same instruction set
- x86, x86_64, ARM, ARM64, MIPS, PowerPC
The Memory Hierarchy
- registers (used in assembly)
- cache memory (controlled by hardware)
- RAM (used in assembly)
- flash/USB, hard drive
- tape backup
Simple Assembly Program
extern puts

section .data
helloStr: db 'Hello, world!', 0    ; NUL-terminated string

section .text
global main
main:
    push helloStr                  ; argument for puts
    call puts
Assembling
Using nasm for assembling
$ nasm -f elf32 hello.asm
Using objdump for inspecting
$ objdump -M intel -d hello.o
[...]
Disassembly of section .text:
00000000 <main>:
0: 68 00 00 00 00 push 0x0
5: e8 fc ff ff ff call 6 <main+0x6>
$ objdump -M intel -r hello.o
[...]
RELOCATION RECORDS FOR [.text]:
OFFSET TYPE VALUE
00000001 R_386_32 .data
00000006 R_386_PC32 puts
Linking
Using ld for linking
$ ld -s -lc -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 -e main hello.o -o hello-min
Using objdump for inspecting
$ objdump -M intel -d hello-min
[...]
Disassembly of section .plt:
08048170 <puts@plt-0x10>:
8048170: ff 35 40 92 04 08 push DWORD PTR ds:0x8049240
8048176: ff 25 44 92 04 08 jmp DWORD PTR ds:0x8049244
804817c: 00 00 add BYTE PTR [eax],al
...
08048180 <puts@plt>:
8048180: ff 25 48 92 04 08 jmp DWORD PTR ds:0x8049248
8048186: 68 00 00 00 00 push 0x0
804818b: e9 e0 ff ff ff jmp 8048170 <puts@plt-0x10>
Disassembly of section .text:
08048190 <.text>:
8048190: 68 4c 92 04 08 push 0x804924c
8048195: e8 e6 ff ff ff call 8048180 <puts@plt>
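The call and jmp instructions in the listings above encode their destination as a signed 32-bit offset relative to the address of the next instruction. The sketch below recomputes the targets that objdump prints; `branch_target` is a helper name made up for this example and handles only the 5-byte e8/e9 forms.

```python
import struct


def branch_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Return the destination of a 5-byte call (e8) or jmp (e9) rel32."""
    if len(insn_bytes) != 5 or insn_bytes[0] not in (0xE8, 0xE9):
        raise ValueError("expected a 5-byte e8/e9 instruction")
    (rel32,) = struct.unpack("<i", insn_bytes[1:])  # signed, little-endian
    next_insn = insn_addr + len(insn_bytes)
    return (next_insn + rel32) & 0xFFFFFFFF


# hello.o: offset 5, placeholder offset -4 (still to be fixed up by the linker)
print(hex(branch_target(0x5, bytes.fromhex("e8fcffffff"))))
# hello-min: call puts@plt
print(hex(branch_target(0x8048195, bytes.fromhex("e8e6ffffff"))))
# hello-min: jmp back to the first PLT entry
print(hex(branch_target(0x804818B, bytes.fromhex("e9e0ffffff"))))
```

In the object file the offset is only a placeholder, which is why objdump shows the meaningless target `main+0x6`; the R_386_PC32 relocation record tells the linker to patch it so that it reaches puts.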
Another Program
extern printf

section .data
sum_str: db 'Sum is %d.', 10, 0

section .text
global main
main:
    xor eax, eax            ; Initialize sum register to 0.

    mov ecx, 100            ; Start from value and decrement.
add_number:
    add eax, ecx            ; Add value to sum register.
    dec ecx                 ; Decrement value.
    test ecx, ecx           ; Test if value is 0.
    jnz add_number          ; If value is 0, quit loop; otherwise jump to label.
    ;loopnz add_number      ; Does what the above three instructions do.

    ; Print value.
    push eax
    push sum_str
    call printf
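For readers more at home in a high-level language, the loop above can be mirrored step by step in Python. This is a sketch that models eax and ecx as 32-bit values; the function name `sum_down_from` is invented for the example.

```python
MASK32 = 0xFFFFFFFF


def sum_down_from(start: int) -> int:
    """Mirror the add_number loop: add ecx to eax while counting ecx down."""
    eax = 0                            # xor eax, eax
    ecx = start & MASK32               # mov ecx, 100
    while True:
        eax = (eax + ecx) & MASK32     # add eax, ecx
        ecx = (ecx - 1) & MASK32       # dec ecx
        if ecx == 0:                   # test ecx, ecx / jnz add_number
            break
    return eax


print("Sum is %d." % sum_down_from(100))  # prints "Sum is 5050."
```

Like the assembly version, the body runs before the counter is tested, so a start value of 0 would wrap around instead of skipping the loop.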
Computer Registers
- used for storing and managing data
- CPU/assembly instructions deal with registers
- the register size indicates the architecture's word size (e.g. 32-bit or 64-bit)
- may be general-purpose (orthogonal), or may have specific roles
- are referenced by names: eax, ebp, eflags (x86) or r0, r1, r2 (ARM)
CPU Instructions
- instruction mnemonic: what the instruction does
- instruction operands: what the instruction uses
Addressing Modes
- ways for instructions to identify operands
- for code
  - absolute addressing: in the instruction
  - relative addressing: in the instruction (+ current offset)
  - register indirect: address in the register
- for data
  - register: data in register
  - base plus offset: add offset to base value
  - immediate: in the instruction
CISC vs. RISC Architectures
- Complex/Reduced Instruction Set Computing
- CISC: variable instruction size, multi-clock complex instructions, memory-to-memory
- RISC: load-store architecture, focus on software
Assembly Language Syntax
Intel Syntax
xor eax,eax
mov ecx,0x64
add eax,ecx
dec ecx
test ecx,ecx
jne 7 <add_number>
push eax
push 0x0
call 15 <add_number+0xe>
AT&T Syntax
xor %eax,%eax
mov $0x64,%ecx
add %ecx,%eax
dec %ecx
test %ecx,%ecx
jne 7 <add_number>
push %eax
push $0x0
call 15 <add_number+0xe>
Tools of the Trade
- NASM: assembler (Intel syntax)
- GCC (gas): assembler (AT&T syntax by default)
- GCC (gcc, ld): compiler/linker
- objdump: disassembler (multiple syntaxes)
x86 Registers
- eax: accumulator, used in arithmetic operations
- ebx: base pointer in memory operations (e.g. arrays)
- ecx: loop counters
- edx: also used in arithmetic operations
- esi: source addresses in memory operations
- edi: destination addresses in memory operations
- ebp: frame base pointer
- esp: stack pointer
- named rax, rbx etc. in x86_64
Addressing
x86 Addressing Modes
mov eax, [0xcafebab3] ; direct (displacement)
mov eax, [esi] ; register indirect (base)
mov eax, [ebp-8] ; based (base + displacement)
mov eax, [ebx*4 + 0xdeadbeef] ; indexed (index*scale + displacement)
mov eax, [edx + ebx + 12] ; based-indexed w/o scale (base + index + displacement)
mov eax, [edx + ebx*4 + 42] ; based-indexed w/ scale (base + index*scale + displacement)
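All of the addressing modes above are special cases of one formula: base + index*scale + displacement, with unused parts set to zero. The sketch below computes the effective address for three of the forms; the register values are hypothetical and `effective_address` is a name made up for this example.

```python
MASK32 = 0xFFFFFFFF


def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32


# Hypothetical register contents, chosen only for illustration.
regs = {"ebx": 3, "edx": 0x1000, "ebp": 0x7FFF0010}

# [ebp-8]: based
print(hex(effective_address(base=regs["ebp"], disp=-8)))
# [ebx*4 + 0xdeadbeef]: indexed
print(hex(effective_address(index=regs["ebx"], scale=4, disp=0xDEADBEEF)))
# [edx + ebx*4 + 42]: based-indexed with scale
print(hex(effective_address(base=regs["edx"], index=regs["ebx"], scale=4, disp=42)))
```

The scale factors 1, 2, 4 and 8 match common element sizes, which is why the indexed forms are convenient for array access.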
Data Transfer
- mov 〈dest〉, 〈src〉: move
- xchg 〈dest〉, 〈src〉: exchange (swap)
- movzx 〈dest〉, 〈src〉: move with zero extend
- movsx 〈dest〉, 〈src〉: move with sign extend
- movsb: move a byte from the location pointed to by esi to the location pointed to by edi
- movsw: similar, move word (2 bytes)
- lea 〈dest〉, 〈src〉: load effective address (calculate the address of 〈src〉 and load it into 〈dest〉)
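The difference between movzx and movsx only shows when the top bit of the source is set. A minimal sketch for the 8-bit to 32-bit case follows; the helper names `movzx8` and `movsx8` are invented here.

```python
def movzx8(value: int) -> int:
    """Zero-extend an 8-bit value to 32 bits: upper bits become 0."""
    return value & 0xFF


def movsx8(value: int) -> int:
    """Sign-extend an 8-bit value to 32 bits: upper bits copy bit 7."""
    value &= 0xFF
    if value & 0x80:
        value |= 0xFFFFFF00
    return value


for byte in (0x7F, 0xF0):
    print(f"0x{byte:02x}: movzx=0x{movzx8(byte):08x} movsx=0x{movsx8(byte):08x}")
```

Use the zero-extending form for unsigned data and the sign-extending form for signed data, so the widened value keeps the same numeric meaning.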
Control Flow
- Control Instructions:
  - jmp 〈addr〉: loads 〈addr〉 into eip
  - call 〈addr〉: pushes the return address (the eip of the next instruction) on the stack, and loads 〈addr〉 into eip
  - ret 〈val〉: pops the top of the stack into eip, then pops 〈val〉 more bytes off the stack
  - loop 〈addr〉: decrements ecx, and jumps to 〈addr〉 if ecx != 0
- Conditional Jump Flags:
  - ZF (zero flag): previous arithmetic operation resulted in zero
  - SF (sign flag): previous result's most significant bit
  - CF (carry flag): previous result produced a carry or borrow out of the most significant bit (unsigned overflow)
  - OF (overflow flag): previous result does not fit the register as a signed value (signed overflow)
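The four flags listed above can be derived from the operands and the result of a 32-bit addition. The following is a sketch of that derivation for add only (other instructions set flags differently); `add32_flags` is a name made up for this example.

```python
MASK32 = 0xFFFFFFFF


def add32_flags(a: int, b: int):
    """Return (result, flags) for a 32-bit add, modelling ZF, SF, CF and OF."""
    a &= MASK32
    b &= MASK32
    full = a + b                 # unbounded sum, may need 33 bits
    result = full & MASK32
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    flags = {
        "ZF": result == 0,
        "SF": sign_r == 1,
        "CF": full > MASK32,                          # carry out of bit 31
        "OF": sign_a == sign_b and sign_r != sign_a,  # signed overflow
    }
    return result, flags


print(add32_flags(0xFFFFFFFF, 1))   # -1 + 1: wraps to 0, carry but no overflow
print(add32_flags(0x7FFFFFFF, 1))   # largest positive + 1: overflow but no carry
```

The two calls show that CF and OF are independent: the first is an unsigned wrap-around, the second a signed one.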
Arithmetic/Logical
- Arithmetic Instructions:
  - add 〈dest〉, 〈src〉: addition
  - sub 〈dest〉, 〈src〉: subtraction
  - mul 〈arg〉: unsigned multiplication of 〈arg〉 by the accumulator part of the same size (e.g. 〈arg〉 = dh computes ax = al * dh)
  - imul 〈arg〉: signed multiplication
  - imul 〈dest〉, 〈src〉: signed multiplication (dest = dest * src)
  - imul 〈dest〉, 〈src〉, 〈aux〉: signed multiplication (dest = src * aux)
  - div 〈arg〉: unsigned division
  - idiv 〈arg〉: signed division
  - neg 〈arg〉: 2's complement negation
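The one-operand mul writes a result twice as wide as its operand, which is easy to overlook. The sketch below models the 8-bit and 32-bit forms, plus neg; the helper names are invented for the example and the flags are not modelled.

```python
MASK32 = 0xFFFFFFFF


def mul8(al: int, operand: int) -> int:
    """mul r/m8: ax = al * operand (unsigned, 16-bit result)."""
    return ((al & 0xFF) * (operand & 0xFF)) & 0xFFFF


def mul32(eax: int, operand: int):
    """mul r/m32: edx:eax = eax * operand (unsigned, 64-bit result)."""
    product = (eax & MASK32) * (operand & MASK32)
    return product >> 32, product & MASK32   # (edx, eax)


def neg32(value: int) -> int:
    """neg: two's complement negation in 32 bits."""
    return (-value) & MASK32


print(hex(mul8(0x10, 0x20)))
edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))
print(hex(neg32(1)))
```

Because the 32-bit form overwrites edx with the high half of the product, any value kept in edx must be saved before a mul.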
Arithmetic/Logical (2)
- Shifts and Rotations:
  - shr, shl (logical shift right/left)
  - sar, sal (arithmetic shift right/left)
  - shld, shrd (double-shift)
  - ror, rol (rotate)
  - rcr, rcl (rotate with carry)
- Logical Instructions:
  - and, or, xor, not
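Logical and arithmetic right shifts differ only in what is shifted in at the top, while a rotation loses no bits at all. The sketch below models 32-bit shr, sar and rol, assuming shift counts from 0 to 31 (the carry flag is not modelled); the function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def shr32(value: int, count: int) -> int:
    """Logical shift right: zeros are shifted in from the left."""
    return (value & MASK32) >> count


def sar32(value: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is replicated from the left."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32          # reinterpret as a negative Python int
    return (value >> count) & MASK32


def rol32(value: int, count: int) -> int:
    """Rotate left: bits leaving on the left re-enter on the right."""
    value &= MASK32
    count %= 32
    return ((value << count) | (value >> (32 - count))) & MASK32


print(hex(shr32(0x80000000, 4)))
print(hex(sar32(0x80000000, 4)))
print(hex(rol32(0x80000001, 1)))
```

For a negative signed value, sar keeps the sign while dividing by a power of two, whereas shr turns it into a large positive number.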
Function Calls
More in Lecture 4: The Stack. Buffer Management
System Calls
- the interface that allows user applications to request services from the OS kernel
- mechanism is invoked by triggering an interrupt (int 0x80)
- conventions for invoking a syscall on Linux:
  - eax contains the syscall ID
  - parameters are passed in ebx, ecx, edx, esi, edi, ebp (in this order)
  - the syscall is responsible for saving and restoring all registers (except eax, which holds the return value)
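The register convention above amounts to a fixed mapping from argument position to register. The sketch below only builds that mapping as a dictionary and does not issue a real system call; `syscall_registers` is a hypothetical helper, and the example values assume the 32-bit x86 Linux syscall number 4 for write and the string address from the linked hello-min listing.

```python
# Argument registers for int 0x80 on 32-bit x86 Linux, in order.
SYSCALL_ARG_REGISTERS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")


def syscall_registers(number: int, *args: int) -> dict:
    """Return the register contents to set up before executing int 0x80."""
    if len(args) > len(SYSCALL_ARG_REGISTERS):
        raise ValueError("at most 6 syscall arguments are supported")
    regs = {"eax": number}
    regs.update(zip(SYSCALL_ARG_REGISTERS, args))
    return regs


# write(1, 0x804924c, 13): file descriptor, buffer address, byte count
for name, value in syscall_registers(4, 1, 0x804924C, 13).items():
    print(f"mov {name}, {value:#x}")
print("int 0x80")
```

The printed lines have the shape of the instruction sequence a hand-written assembly program would use before triggering the interrupt.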
Disassembling
- checking the assembly code in object/executable files
- use disassemblers; no need for source code
- useful for reverse engineering
- objdump, IDA, GDB, radare2, Hopper, ImmunityDbg
Disassembling (2)
- disassemble machine code in raw (non-object) files
- objdump -D -b binary -m i386 binary-file
Using NOPs
- when altering binary machine code
- you can't remove bytes, as that would break the offsets
- use a hex editor (hexedit, bless) and replace code with NOP instructions (0x90 in x86 assembly)
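The same patching can be scripted instead of done in a hex editor. The sketch below overwrites the 5-byte call from the hello-min listing with five NOPs, so the code keeps its length and all later offsets stay valid; `nop_out` is a helper name invented for this example, and it works on an in-memory byte string rather than a real file.

```python
NOP = 0x90  # single-byte x86 NOP instruction


def nop_out(code: bytes, offset: int, length: int) -> bytes:
    """Return a copy of code with length bytes at offset replaced by NOPs."""
    if offset < 0 or length < 0 or offset + length > len(code):
        raise ValueError("patch range is outside the code")
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)


# push 0x804924c ; call puts@plt  (bytes from the hello-min listing)
text = bytes.fromhex("684c920408" "e8e6ffffff")
patched = nop_out(text, 5, 5)

print(text.hex(" "))
print(patched.hex(" "))
```

Patching whole instructions matters: overwriting only part of the call would leave stray operand bytes that the CPU would decode as different instructions.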
Keywords
- assembly
- mnemonics
- instructions
- architecture
- ISA
- registers
- addressing modes
- CISC and RISC
- memory-to-memory
- load-store
- assembling
- linking
- control flow
- arithmetic/logical
- data transfer
- function calls
- system calls
- disassembling
- objdump
- NOP
Useful Links
- https://ocw.cs.pub.ro/courses/iocla
- http://en.wikibooks.org/wiki/X86_Assembly
- http://www.nasm.us/xdoc/2.11.05/html/nasmdoc0.html
- http://timelessname.com/elfbin/
- http://www.cs.umd.edu/~jkatz/security/s12/lecture22.ppt
- http://gcc.godbolt.org/