MIPS Instruction Set Architecture II
Source: csl.skku.edu/uploads/ICE3003F09/3-mips2.pdf

Jin-Soo Kim ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu

Making Decisions (1)
Conditional operations
• Branch to a labeled instruction if a condition is true
  – Otherwise, continue sequentially

    beq rs, rt, L1
  – if (rs == rt) branch to instruction labeled L1;

    bne rs, rt, L1
  – if (rs != rt) branch to instruction labeled L1;

    j L1
  – unconditional jump to instruction labeled L1;
ICE3003: Computer Architecture | Fall 2009 | Jin-Soo Kim ([email protected])

Making Decisions (2)
Compiling if statements
• C code:

    if (i == j) f = g + h;
    else f = g - h;

  – f, g, … in $s0, $s1, …
• Compiled MIPS code:

          bne $s3, $s4, Else
          add $s0, $s1, $s2
          j   Exit
    Else: sub $s0, $s1, $s2
    Exit: ...

  – Assembler calculates addresses

Making Decisions (3)
Compiling loop statements
• C code:

    while (save[i] == k) i += 1;

  – i in $s3, k in $s5, address of save in $s6
• Compiled MIPS code:

    Loop: sll  $t1, $s3, 2
          add  $t1, $t1, $s6
          lw   $t0, 0($t1)
          bne  $t0, $s5, Exit
          addi $s3, $s3, 1
          j    Loop
    Exit: ...
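
The loop above can be mirrored line by line in C with explicit labels and gotos, which makes the branch structure of the compiled code easier to follow. This is only a sketch: the function name skip_equal and its parameters are hypothetical, and it assumes, as the original while loop does, that some element differs from k.

```c
#include <stddef.h>

/* Returns the first index i >= start at which save[i] != k.
 * Hypothetical helper: the caller must guarantee such an element exists. */
size_t skip_equal(const int *save, size_t start, int k)
{
    size_t i = start;            /* i lives in $s3 */
    const int *addr;             /* plays the role of $t1 */
    int value;                   /* plays the role of $t0 */

loop:
    addr = save + i;             /* sll + add: base address + i * 4 */
    value = *addr;               /* lw   $t0, 0($t1) */
    if (value != k) goto done;   /* bne  $t0, $s5, Exit */
    i += 1;                      /* addi $s3, $s3, 1 */
    goto loop;                   /* j    Loop */
done:
    return i;
}
```

Each pass through the label loop corresponds to one basic block ending in the conditional branch, followed by a second block ending in the unconditional jump.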

Making Decisions (4)
Basic blocks
• A basic block is a sequence of instructions with
  – No embedded branches (except at end)
  – No branch targets (except at beginning)
• A compiler identifies basic blocks for optimization
• An advanced processor can accelerate execution of basic blocks

Making Decisions (5)
More conditional operations
• Set result to 1 if a condition is true
  – Otherwise, set to 0

    slt rd, rs, rt
  – if (rs < rt) rd = 1; else rd = 0;

    slti rt, rs, constant
  – if (rs < constant) rt = 1; else rt = 0;

• Use in combination with beq, bne

    slt $t0, $s1, $s2    # if ($s1 < $s2)
    bne $t0, $zero, L1   #   branch to L1

Making Decisions (6)
Branch instruction design
• Why not blt, bge, etc.?
• Hardware for <, ≥, … is slower than for =, ≠
  – Combining with branch involves more work per instruction, requiring a slower clock
  – All instructions are penalized!
• beq and bne are the common case
• This is a good design compromise

Making Decisions (7)
Signed vs. Unsigned
• Signed comparison: slt, slti
• Unsigned comparison: sltu, sltui
• Example

    $s0 = 1111 1111 1111 1111 1111 1111 1111 1111
    $s1 = 0000 0000 0000 0000 0000 0000 0000 0001

    slt  $t0, $s0, $s1   # signed:   –1 < +1             ⇒ $t0 = 1
    sltu $t0, $s0, $s1   # unsigned: +4,294,967,295 > +1 ⇒ $t0 = 0
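
The signed/unsigned example can be checked with a small C sketch. The function names slt and sltu are chosen here to match the instructions and are not from any library. The cast from uint32_t to int32_t assumes the usual two's-complement behaviour, which is what common compilers provide.

```c
#include <stdint.h>

/* slt: compare the two 32-bit patterns as two's-complement signed values. */
int slt(uint32_t rs, uint32_t rt)
{
    return (int32_t)rs < (int32_t)rt;
}

/* sltu: compare the same bit patterns as unsigned values. */
int sltu(uint32_t rs, uint32_t rt)
{
    return rs < rt;
}
```

With rs = 0xFFFFFFFF and rt = 1, slt yields 1 and sltu yields 0, matching the $t0 values on the slide.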

Procedure Calls (1)
Steps required for procedure calling
  1. Place parameters in registers
  2. Transfer control to procedure
  3. Acquire storage for procedure
  4. Perform procedure's operations
  5. Place result in register for caller
  6. Return to place of call

Procedure Calls (2)
Register usage

    Registers    Usage
    $a0 – $a3    arguments (reg's 4 – 7)
    $v0, $v1     result values (reg's 2 – 3)
    $t0 – $t9    temporaries (caller-save reg's: can be overwritten by callee)
    $s0 – $s7    saved (callee-save reg's: must be saved/restored by callee)
    $gp          global pointer for static data (reg 28)
    $sp          stack pointer (reg 29)
    $fp          frame pointer (reg 30)
    $ra          return address (reg 31)

Procedure Calls (3)
Procedure call: jump and link

    jal ProcedureLabel

• Address of following instruction put in $ra
• Jumps to target address

Procedure return: jump register

    jr $ra

• Copies $ra to program counter
• Can also be used for computed jumps
  – e.g., for case/switch statements

Procedure Calls (4)
Leaf procedure example
• C code:

    int leaf_example(int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }

  – Arguments g, …, j in $a0, …, $a3
  – f in $s0 (hence, need to save $s0 on stack)
  – Result in $v0

Procedure Calls (5)
Leaf procedure example (cont'd)
• MIPS code:

    leaf_example:
        addi $sp, $sp, -4      # save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1     # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero   # result
        lw   $s0, 0($sp)       # restore $s0
        addi $sp, $sp, 4
        jr   $ra               # return

Procedure Calls (6)
Non-leaf procedures
• Procedures that call other procedures
• For a nested call, the caller needs to save on the stack:
  – Its return address
  – Any arguments and temporaries needed after the call
• Restore from the stack after the call

Procedure Calls (7)
Non-leaf procedure example
• C code:

    int fact(int n)
    {
        if (n < 1) return 1;
        else return n * fact(n - 1);
    }

  – Argument n in $a0
  – Result in $v0

Procedure Calls (8)
Non-leaf procedure example (cont'd)
• MIPS code:

    fact:
        addi $sp, $sp, -8      # adjust stack for 2 items
        sw   $ra, 4($sp)       # save return address
        sw   $a0, 0($sp)       # save argument
        slti $t0, $a0, 1       # test for n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1     # if so, result is 1
        addi $sp, $sp, 8       #   pop 2 items from stack
        jr   $ra               #   and return
    L1: addi $a0, $a0, -1      # else decrement n
        jal  fact              # recursive call
        lw   $a0, 0($sp)       # restore original n
        lw   $ra, 4($sp)       #   and return address
        addi $sp, $sp, 8       # pop 2 items from stack
        mul  $v0, $a0, $v0     # multiply to get result
        jr   $ra               # and return
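
To see what the saved copies of n are used for, the recursion can be unrolled into a C sketch with an explicit stack. This is an illustration under assumptions, not a translation of the MIPS code: the name fact_explicit_stack and the 64-entry stack size are hypothetical, and the return address is left out because C handles it implicitly.

```c
#include <assert.h>

#define STACK_WORDS 64   /* hypothetical stack size, enough for small n */

int fact_explicit_stack(int n)
{
    int stack[STACK_WORDS];
    int sp = STACK_WORDS;          /* stack grows toward lower indices, like $sp */
    int result;

    /* "Call" phase: each nested call saves its own n, then decrements it. */
    while (n >= 1) {
        assert(sp > 0);            /* guard against overflowing the toy stack */
        stack[--sp] = n;           /* sw $a0, 0($sp) */
        n -= 1;                    /* addi $a0, $a0, -1 */
    }

    result = 1;                    /* base case: n < 1 returns 1 */

    /* "Return" phase: each call restores its n and multiplies. */
    while (sp < STACK_WORDS) {
        n = stack[sp++];           /* lw $a0, 0($sp), then pop */
        result = n * result;       /* mul $v0, $a0, $v0 */
    }
    return result;
}
```

The first loop corresponds to the chain of jal fact calls, and the second to the chain of returns that multiply on the way back up.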

Procedure Calls (9)
Local data on the stack
• Local data allocated by callee
  – e.g., C automatic variables
• Procedure frame (activation record)
  – Used by some compilers to manage stack storage

Procedure Calls (10)
Memory layout
• Text: program code
• Static data: global variables
  – e.g., static variables in C, constant arrays and strings
  – $gp initialized to address allowing ±offsets into this segment
• Dynamic data: heap
  – e.g., malloc in C, new in Java
• Stack: automatic storage

Strings (1)
Character data
• Byte-encoded character sets
  – ASCII: 128 characters
    » 95 graphic, 33 control
  – Latin-1: 256 characters
    » ASCII + 96 more graphic characters
• Unicode: 32-bit character set
  – Used in Java, C++ wide characters, …
  – Most of the world's alphabets, plus symbols
  – UTF-8, UTF-16: variable-length encodings

Strings (2)
Byte/halfword operations
• Could use bitwise operations
• MIPS byte/halfword load/store
  – String processing is a common case

    lb rt, offset(rs)     lh rt, offset(rs)
  – Sign extend to 32 bits in rt

    lbu rt, offset(rs)    lhu rt, offset(rs)
  – Zero extend to 32 bits in rt

    sb rt, offset(rs)     sh rt, offset(rs)
  – Store just rightmost byte/halfword
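
The difference between sign extension and zero extension can be sketched in C. The function names and the byte-array memory model are assumptions made for illustration. The cast through int8_t relies on two's-complement conversion, which common compilers provide.

```c
#include <stdint.h>

/* lb: load one byte and sign-extend it to 32 bits. */
int32_t load_byte(const uint8_t *mem, uint32_t addr)
{
    return (int32_t)(int8_t)mem[addr];
}

/* lbu: load one byte and zero-extend it to 32 bits. */
uint32_t load_byte_unsigned(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr];
}

/* sb: store only the rightmost byte of the register value. */
void store_byte(uint8_t *mem, uint32_t addr, uint32_t rt)
{
    mem[addr] = (uint8_t)(rt & 0xFFu);
}
```

Storing 0x12345680 with store_byte leaves 0x80 in memory. load_byte then returns -128, while load_byte_unsigned returns 128.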

Strings (3)
String copy example
• C code: null-terminated string

    void strcpy(char x[], char y[])
    {
        int i;
        i = 0;
        while ((x[i] = y[i]) != '\0')
            i += 1;
    }

  – Addresses of x, y in $a0, $a1
  – i in $s0

Strings (4)
String copy example (cont'd)
• MIPS code:

    strcpy:
        addi $sp, $sp, -4       # adjust stack for 1 item
        sw   $s0, 0($sp)        # save $s0
        add  $s0, $zero, $zero  # i = 0
    L1: add  $t1, $s0, $a1      # addr of y[i] in $t1
        lbu  $t2, 0($t1)        # $t2 = y[i]
        add  $t3, $s0, $a0      # addr of x[i] in $t3
        sb   $t2, 0($t3)        # x[i] = y[i]
        beq  $t2, $zero, L2     # exit loop if y[i] == 0
        addi $s0, $s0, 1        # i = i + 1
        j    L1                 # next iteration of loop
    L2: lw   $s0, 0($sp)        # restore saved $s0
        addi $sp, $sp, 4        # pop 1 item from stack
        jr   $ra                # and return

Arrays vs. Pointers (1)
Array indexing involves
• Multiplying index by element size
• Adding to array base address

Pointers correspond directly to memory addresses
• Can avoid indexing complexity

Arrays vs. Pointers (2)
Example: Clearing an array

Array version:

    void clear1(int array[], int size) {
        int i;
        for (i = 0; i < size; i += 1)
            array[i] = 0;
    }

           move $t0,$zero         # i = 0
    loop1: sll  $t1,$t0,2         # $t1 = i * 4
           add  $t2,$a0,$t1       # $t2 = &array[i]
           sw   $zero, 0($t2)     # array[i] = 0
           addi $t0,$t0,1         # i = i + 1
           slt  $t3,$t0,$a1       # $t3 = (i < size)
           bne  $t3,$zero,loop1   # if (…) goto loop1

Pointer version:

    void clear2(int *array, int size) {
        int *p;
        for (p = &array[0]; p < &array[size]; p = p + 1)
            *p = 0;
    }

           move $t0,$a0           # p = &array[0]
           sll  $t1,$a1,2         # $t1 = size * 4
           add  $t2,$a0,$t1       # $t2 = &array[size]
    loop2: sw   $zero,0($t0)      # Memory[p] = 0
           addi $t0,$t0,4         # p = p + 4
           slt  $t3,$t0,$t2       # $t3 = (p < &array[size])
           bne  $t3,$zero,loop2   # if (…) goto loop2

Arrays vs. Pointers (3)
Comparison
• Multiply "strength reduced" to shift
• Array version requires shift to be inside loop
  – Part of index calculation for incremented i
  – cf. incrementing pointer
• Compiler can achieve same effect as manual use of pointers
  – Induction variable elimination
  – Better to make program clearer and safer

Addressing (1)
32-bit constants
• Most constants are small
  – 16-bit immediate is sufficient
• For the occasional 32-bit constant

    lui rt, constant

  – Copies 16-bit constant to left 16 bits of rt
  – Clears right 16 bits of rt to 0

    lui $s0, 61          0000 0000 0011 1101 0000 0000 0000 0000
    ori $s0, $s0, 2304   0000 0000 0011 1101 0000 1001 0000 0000
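
The lui/ori pair can be modelled in C to check the arithmetic. The helper names lui, ori, and load_constant32 are hypothetical and only mirror the instructions. The upper half 61 and lower half 2304 combine to 61 × 65536 + 2304 = 4,000,000.

```c
#include <stdint.h>

/* lui: place the 16-bit immediate in the upper half, clear the lower half. */
uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* ori: OR a zero-extended 16-bit immediate into the register value. */
uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* Build any 32-bit constant from its two 16-bit halves. */
uint32_t load_constant32(uint32_t value)
{
    uint16_t upper = (uint16_t)(value >> 16);
    uint16_t lower = (uint16_t)(value & 0xFFFFu);
    return ori(lui(upper), lower);
}
```

Here lui(61) gives 0x003D0000, and ori(lui(61), 2304) gives 0x003D0900, which is 4,000,000 in decimal.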

Addressing (2)
Branch addressing
• Branch instructions specify
  – Opcode, two registers, target address

    op       rs       rt       constant or address
    6 bits   5 bits   5 bits   16 bits

• Most branch targets are near branch
  – Forward or backward
• PC-relative addressing
  – Target address = PC + offset × 4
  – PC already incremented by 4 by this time
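
The PC-relative rule can be written as a small C function. This is a sketch: the name branch_target is hypothetical, and branch_pc is taken to be the address of the branch instruction itself, so the function adds 4 first to model the already-incremented PC.

```c
#include <stdint.h>

/* Target address = (PC + 4) + sign-extended offset * 4. */
uint32_t branch_target(uint32_t branch_pc, int16_t offset)
{
    uint32_t next_pc = branch_pc + 4u;             /* PC already incremented */
    return next_pc + (uint32_t)((int32_t)offset * 4);
}
```

For a branch at address 80012 with offset 2, the target is 80016 + 8 = 80024. A negative offset reaches backward in the same way.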

Addressing (3)
Jump addressing
• Jump (j and jal) targets could be anywhere in text segment
  – Encode full address in instruction

    op       address
    6 bits   26 bits

• (Pseudo) Direct jump addressing
  – Target address = PC[31…28] : (address × 4)
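
Pseudo-direct addressing concatenates the top four bits of the PC with the 26-bit field shifted left by two. A minimal C sketch follows. It assumes the PC used is the already-incremented one, and the name jump_target is hypothetical.

```c
#include <stdint.h>

/* Target address = PC[31..28] : (address26 * 4). */
uint32_t jump_target(uint32_t jump_pc, uint32_t address26)
{
    uint32_t next_pc = jump_pc + 4u;                    /* incremented PC */
    uint32_t upper = next_pc & 0xF0000000u;             /* keep bits 31..28 */
    uint32_t lower = (address26 & 0x03FFFFFFu) << 2;    /* word address to byte address */
    return upper | lower;
}
```

A j instruction at address 80020 with address field 20000 jumps to 20000 × 4 = 80000.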

Addressing (4)
Target addressing example
• Loop code from earlier example
  – Assume loop at location 80000

    Instruction                  Address   Fields
    Loop: sll  $t1, $s3, 2       80000      0    0   19    9    2    0
          add  $t1, $t1, $s6     80004      0    9   22    9    0   32
          lw   $t0, 0($t1)       80008     35    9    8         0
          bne  $t0, $s5, Exit    80012      5    8   21         2
          addi $s3, $s3, 1       80016      8   19   19         1
          j    Loop              80020      2           20000
    Exit: …                      80024
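
The field values in the table can be packed into 32-bit instruction words with a few shifts. The sketch below assumes the standard MIPS R-, I-, and J-type layouts. The function names are hypothetical.

```c
#include <stdint.h>

/* R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct)
{
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}

/* I-type: op(6) rs(5) rt(5) immediate(16) */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, uint16_t imm)
{
    return (op << 26) | (rs << 21) | (rt << 16) | (uint32_t)imm;
}

/* J-type: op(6) address(26) */
uint32_t encode_j(uint32_t op, uint32_t address26)
{
    return (op << 26) | (address26 & 0x03FFFFFFu);
}
```

For example, the add row (0, 9, 22, 9, 0, 32) packs to 0x01364820, the lw row (35, 9, 8, 0) to 0x8D280000, and the j row (2, 20000) to 0x08004E20.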

Addressing (5)
Branching far away
• If branch target is too far to encode with 16-bit offset, assembler rewrites the code
• Example:

        beq $s0, $s1, L1
              ↓
        bne $s0, $s1, L2
        j   L1
    L2: …
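
Whether a rewrite is needed comes down to whether the word offset fits in a signed 16-bit field. The following C sketch shows one way an assembler could make that check. The name branch_in_range is hypothetical, and both addresses are assumed to be word-aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the target can be reached with a 16-bit signed word offset
 * relative to the instruction following the branch. */
bool branch_in_range(uint32_t branch_pc, uint32_t target)
{
    int64_t byte_distance = (int64_t)target - ((int64_t)branch_pc + 4);
    int64_t word_offset = byte_distance / 4;    /* addresses assumed aligned */
    return word_offset >= INT16_MIN && word_offset <= INT16_MAX;
}
```

When the function returns false, the beq would be replaced by the inverted bne plus j sequence shown above.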

Addressing (6)
Summary

Synchronization (1)
Synchronization instructions
• Two processors sharing an area of memory
  – P1 writes, then P2 reads
  – Data race if P1 and P2 don't synchronize
    » Result depends on order of accesses
• Hardware support required
  – Atomic read/write memory operation
  – No other access to the location allowed between the read and write
• Could be a single instruction
  – e.g., atomic swap of register ↔ memory
  – Or an atomic pair of instructions

Synchronization (2)
Synchronization in MIPS
• Load linked: ll rt, offset(rs)
• Store conditional: sc rt, offset(rs)
  – Succeeds if location not changed since the ll
    » Returns 1 in rt
  – Fails if location is changed
    » Returns 0 in rt
• Example: atomic swap (to test/set lock variable)

    try: add $t0, $zero, $s4   # copy exchange value
         ll  $t1, 0($s1)       # load linked
         sc  $t0, 0($s1)       # store conditional
         beq $t0, $zero, try   # branch if store fails
         add $s4, $zero, $t1   # put load value in $s4
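
The same retry pattern can be sketched in portable C11 with <stdatomic.h>. Note the substitution: C has no direct ll/sc, so this sketch uses a compare-and-exchange loop instead, which is similar in shape but not identical in semantics. The function name atomic_swap is hypothetical. In practice atomic_exchange does the same job in one call.

```c
#include <stdatomic.h>

/* Atomically store new_value into *lock and return the previous value. */
int atomic_swap(atomic_int *lock, int new_value)
{
    int observed = atomic_load(lock);          /* like ll: read current value */

    /* Like sc + beq: the store succeeds only if *lock still equals observed.
     * On failure, observed is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(lock, &observed, new_value)) {
        /* retry */
    }
    return observed;                           /* like the final add into $s4 */
}
```

To acquire a lock, a thread would call atomic_swap(&lock, 1) and proceed only if the returned value is 0.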

Translation and Startup (1)
Overall flow
• Many compilers produce object modules directly
• Static linking

Translation and Startup (2)
Assembler pseudoinstructions
• Most assembler instructions represent machine instructions one-to-one
• Pseudoinstructions: figments of the assembler's imagination

    move $t0, $t1      →   add $t0, $zero, $t1
    blt  $t0, $t1, L   →   slt $at, $t0, $t1
                           bne $at, $zero, L

  – $at (register 1): assembler temporary

Translation and Startup (3)
Producing an object module
• Assembler (or compiler) translates program into machine instructions
• Provides information for building a complete program from the pieces
  – Header: describes contents of object module
  – Text segment: translated instructions
  – Static data segment: data allocated for the life of the program
  – Relocation info: for contents that depend on absolute location of loaded program
  – Symbol table: global definitions and external references
  – Debug info: for associating with source code

Translation and Startup (4)
Linking object modules
• Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external references

Could leave location dependencies for fixing by a relocating loader
• But with virtual memory, no need to do this
• Program can be loaded into absolute location in virtual memory space

Translation and Startup (5)
Loading a program
• Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory (or set page table entries so they can be faulted in)
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
     » Copies arguments to $a0, … and calls main
     » When main returns, do exit syscall

Translation and Startup (6)
Dynamic linking
• Only link/load library procedure when it is called
  – Requires procedure code to be relocatable
  – Avoids image bloat caused by static linking of all (transitively) referenced libraries
  – Automatically picks up new library versions

Translation and Startup (7)
Lazy linkage
• Indirection table
• Stub: loads routine ID, jumps to linker/loader
• Linker/loader code
• Dynamically mapped code
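
The roles of the indirection table and the stub can be imitated in plain C with a function pointer. This is a toy model under assumptions: all names are hypothetical, and the routine square stands in for code that a real linker/loader would map in on demand.

```c
typedef int (*routine_fn)(int);

int resolve_count = 0;                 /* how many times the "loader" ran */

/* Stands in for the dynamically mapped library routine. */
static int square(int x)
{
    return x * x;
}

static int square_stub(int x);

/* Indirection table: the entry initially points at the stub. */
static routine_fn indirection_table[1] = { square_stub };

/* Stub: asks the "linker/loader" to resolve the routine, patches the
 * table entry, then forwards the original call. */
static int square_stub(int x)
{
    resolve_count += 1;
    indirection_table[0] = square;     /* later calls bypass the stub */
    return indirection_table[0](x);
}

/* Callers always go through the table. */
int call_square(int x)
{
    return indirection_table[0](x);
}
```

Only the first call pays the resolution cost. After the table entry is patched, every later call reaches the routine through a single indirect jump.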

Translation and Startup (8)
Starting Java applications
• Simple portable instruction set for the JVM
• Interprets bytecodes
• Compiles bytecodes of "hot" methods into native code for host machine