INTRODUCTION & INSTRUCTIONS - Santa Clara Universityxyi/coen210/notes/1__Intro_Instructi… · ·...
1
INTRODUCTION & INSTRUCTIONS
Dr. Bill Yi
Santa Clara University
(Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd Ed., Morgan Kaufmann, 2007)
(Also based on presentation: Dr. Nam Ling, COEN210 Lecture Notes)
2
COURSE CONTENTS
Introduction
Instructions
Computer Arithmetic
Processor: Datapath
Processor: Control
Pipelining Techniques
Memory
Input/Output Devices
3
INTRODUCTION
Overview of Computer Systems
Evolution of Memory and Processor
Historical Perspective
Levels of Representation
4
A Desktop Computer
A desktop computer (left figure)
Motherboard, I/O interface board, board for memory chips, power supply, disk drives (right figure)
6
PC Motherboard
Intel Pentium 4 processor: upper left, covered by metal fins (heat sink)
Main memory DRAM: middle, small boards perpendicular to the motherboard (DIMMs)
The rest: mostly connectors for external I/O devices
7
Processor Chip - 1
Earlier Intel Pentium chip
(Figure: chip with labeled regions: data cache, instruction cache, bus, integer datapath, floating-point datapath, branch, control)
8
Processor Chip - 2
Intel Pentium 4: die photo (Hennessy & Patterson, Morgan Kaufmann 2003)
Intel Pentium 4 at 3 GHz: package (Intel 2003)
10
Hardware / Software
Hardware: physical components
System software: operating system, compiler, ...
Application software: PowerPoint, spreadsheet, ...
(Figure labels: application software, system software, hardware)
11
Five Classic Components of a Computer + Network
Datapath: performs arithmetic & logic operations
Control: tells datapath, memory, I/O what to do according to instructions
Memory: stores programs + data
  cache (SRAM): small & fast
  DRAM: main memory
  optical disk (CD, DVD), magnetic disk, FLASH, magnetic tapes: secondary, nonvolatile
Input: inputs instructions, data, etc.; e.g. keyboard, mouse (electromechanical or optical), disk, ...
Output: outputs results, information, etc.; e.g. monitor (flat-panel LCD or CRT), printer, disk, ...
Network: communicates with other computers, resource sharing, non-local accesses; e.g. LAN, Internet, ...
(Figure labels: CPU, Control, Datapath, Memory, Input, Output, Network)
12
A Historical Perspective
1946: J. Presper Eckert & John Mauchly (U. Penn.) announced ENIAC (Electronic Numerical Integrator and Computer). It used vacuum tubes and performed 1900 adds/sec
John von Neumann joined Eckert & Mauchly and built EDVAC (Electronic Discrete Variable Automatic Computer), a stored-program computer
1948: U. Manchester built Mark-I, first operational stored-program computer
1949: Maurice Wilkes (Cambridge U.) built EDSAC (Electronic Delay Storage Automatic Calculator), first full-scale, operational, stored-program computer
1940s: Other pioneers include Konrad Zuse (Germany), Alan Turing (UK)
1940s: Howard Aiken (Harvard) built Mark-III & Mark-IV, with separate memories for instructions & data, hence Harvard Architecture
1947: Whirlwind started at MIT, using magnetic core memory
1951: 1st successful commercial computer, UNIVAC I (Universal Automatic Computer), built and sold (Remington-Rand / Eckert-Mauchly Computer Corp.)
1952: IBM shipped IBM 701
13
A Historical Perspective
1964: IBM System/360. IBM/360 architectures dominated the large computer market
1965: DEC unveiled PDP-8, 1st commercial minicomputer
1971: Intel invented the 1st microprocessor, Intel 4004
1963: Seymour Cray at CDC announced CDC 6600, 1st supercomputer
1976: Cray announced Cray-1, then the fastest supercomputer
No single fountainhead for the personal computer
1977: Apple II by Steve Jobs & Steve Wozniak set standards for low cost, high volume
1981: IBM announced the IBM PC, which became the best-selling computer of any kind; its success gave Intel the most popular microprocessor and Microsoft the most popular operating system
1990s: Multimedia, networks, Internet, embedded processors, graphics, etc.
2000 - : Wireless & mobile (e.g. cell phone), 3-D graphics, multimedia (e.g. video), Internet, GHz processors, embedded, dual-core, quad-core, multi-core, etc.
1990s, 2000 - : Architectural techniques: superscalar, dynamic pipelining, speculative execution, VLIW, multithreading, multi-core architectures, etc.
14
Intel 80x86 History
1978: Intel announced 8086 16-bit architecture (an extension to 8080 8-bit)
1980: Intel announced 8087 floating-point co-processor
1982: Intel announced 80286, with address space extended to 24 bits
1985: Intel announced 80386, a 32-bit architecture
1989: Intel 80486, with improved performance, pipelining
1992: Intel Pentium, improved performance
1995: Intel Pentium Pro, improved performance (> 100 MHz)
1997: MMX extension, set of instructions to accelerate multimedia & communication applications
1998: Intel Pentium II
1999: Intel Pentium III
2000: Intel Pentium III > 1 GHz, competition from AMD, Pentium 4 (11/00)
2002: Intel Pentium 4 > 3 GHz (3.06 GHz) with multithreading and 0.13 micron technology
2005: Intel Pentium D (dual-core version of Pentium 4 Extreme): 2 independent execution units on the same processor
2006-07: Intel Quad-Core, 65 nm technology
16
Technology Trends - 2
Moore’s law: transistor capacity doubles every 18-24 months
17
Multithreading & Multi-core CPUs
Threads (threads of execution): a program forks itself into 2 or more simultaneously (or pseudo-simultaneously) running tasks
Multiple threads can be executed in parallel on many computers:
  Single processor: by time slicing, where a single processor switches between different threads so fast as to give the illusion of simultaneity
  Multiprocessor or multi-core system: achieved via multiprocessing; different threads & processes run simultaneously on different processors or cores
Multi-core CPUs:
  Multi-chip approach: cores are made on different chips that are put together in a single package. Cores communicate using the front side bus. L2 cache is separate
  Monolithic approach: cores are manufactured on a single chip and do not need to use the front side bus. Memory cache is shared between the two cores. Better performance
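The forking of a program into several threads described above can be sketched as follows. This is only an illustration, not part of the original slides: the names run_workers and worker are hypothetical, and the shared counter is an assumed workload. In the standard CPython interpreter, threads running Python code are usually interleaved rather than run truly in parallel, which resembles the single-processor time-slicing case.

```python
import threading


def run_workers(num_threads=2, increments=1000):
    """Start num_threads threads that each add to one shared counter."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:  # serialize the read-modify-write on shared data
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()  # fork: each thread now runs worker() concurrently
    for t in threads:
        t.join()   # wait for every thread to finish
    return counter["value"]


if __name__ == "__main__":
    print(run_workers())
```

The lock is there because the threads share memory; without it, two threads could interleave in the middle of an update.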
18
Levels of Representation
High-level language program:
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
Compiler
Assembly language program:
  lw $15, 0($2)
  lw $16, 4($2)
  sw $16, 0($2)
  sw $15, 4($2)
Assembler
Object: machine language module
  00000000101000010000000000011000
Object: library routine (machine language)
Linker
Executable: machine language program
Loader
Memory
20
Introduction
Instruction: a word of the machine's language
Instruction set: the set of instructions
RISC (Reduced Instruction Set Computer) design principles:
  Principle 1: Simplicity favors regularity
  Principle 2: Smaller is faster
  Principle 3: Good design demands good compromises
  Principle 4: Make the common case fast
We'll be working with the MIPS architecture
  Used by NEC, Nintendo, Cisco, Silicon Graphics, Sony, ...
21
MIPS Instruction Set Arch.: Registers
Registers: 32 general-purpose registers and 3 special-purpose registers, each 32 bits
General-purpose registers $0 - $31:
  $zero (0): constant 0
  $at (1): reserved for assembler
  $v0-v1 (2-3): values for results & expression evaluation
  $a0-a3 (4-7): arguments
  $t0-t7 (8-15): temporaries
  $s0-s7 (16-23): saved
  $t8-t9 (24-25): more temporaries
  $gp (28): global pointer
  $sp (29): stack pointer
  $fp (30): frame pointer
  $ra (31): return address
3 special-purpose registers:
  PC: program counter
  Hi, Lo: for multiply and divide
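The name-to-number mapping in the register list above can be sketched as a small lookup table. This is an illustrative sketch only: build_register_table and register_number are hypothetical names, and the table covers just the names that appear in the list.

```python
def build_register_table():
    """Map the symbolic register names from the list to their numbers."""
    table = {"$zero": 0, "$at": 1, "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}
    # (prefix, first register number, how many registers in the group)
    groups = [("v", 2, 2), ("a", 4, 4), ("t", 8, 8), ("s", 16, 8)]
    for prefix, first, count in groups:
        for i in range(count):
            table[f"${prefix}{i}"] = first + i
    table["$t8"] = 24  # the "more temporaries" group continues the $t names
    table["$t9"] = 25
    return table


REGISTERS = build_register_table()


def register_number(name):
    """Accept a symbolic name such as "$s2" or a numeric one such as "$18"."""
    if name[1:].isdigit():
        number = int(name[1:])
        if not 0 <= number <= 31:
            raise ValueError(f"no such register: {name}")
        return number
    return REGISTERS[name]


if __name__ == "__main__":
    for reg in ("$zero", "$t0", "$s2", "$ra", "$15"):
        print(reg, register_number(reg))
```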
22
MIPS Instruction Set Arch.: Memory
Word length = 32 bits
Memory: byte addressable, big-endian
  1 word = 4 bytes
  Each address refers to a byte
The register file is much smaller than memory, but has a faster access time
Note:
  Word: unit of access in a computer
  Big-endian: uses the address of the leftmost ("big end") byte as the word address
  Little-endian: uses the address of the rightmost ("little end") byte as the word address
(Figure labels: Memory, Register, 32 bits, 8 bits)
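To make the byte-ordering note concrete, the sketch below lays out one 32-bit word as four bytes in both orders. The helper names word_to_bytes and byte_address_of_word and the sample value 0x0A0B0C0D are assumptions made for illustration.

```python
def word_to_bytes(word, byteorder):
    """Return the 4 bytes of a 32-bit word in increasing address order."""
    return list(word.to_bytes(4, byteorder))


def byte_address_of_word(index):
    """Byte address of word number `index` when 1 word = 4 bytes."""
    return 4 * index


if __name__ == "__main__":
    word = 0x0A0B0C0D
    # Big-endian: the most significant ("big end") byte sits at the word address.
    print([hex(b) for b in word_to_bytes(word, "big")])
    # Little-endian: the least significant byte sits at the word address.
    print([hex(b) for b in word_to_bytes(word, "little")])
    print(byte_address_of_word(3))
```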
23
Registers vs. Memory
Arithmetic instruction operands must be in registers
  Only 32 registers are provided
Compiler associates variables with registers
What about programs with lots of variables?
(Figure labels: Processor, Control, Datapath, Memory, I/O, Input, Output)
24
Instructions
Load and store instructions
Example:
  C code: A[12] = h + A[8];
  MIPS code:
    lw $t0, 32($s3)
    add $t0, $s2, $t0
    sw $t0, 48($s3)
  (h is in $s2 and the base address of A is in $s3; offsets are in bytes, so A[8] is at offset 32 and A[12] at offset 48)
Can refer to registers by name (e.g., $s2, $t2) instead of number
Store word has destination last
Remember: arithmetic operands are registers, not memory!
  Can't write: add 48($s3), $s2, 32($s3)
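The three-instruction sequence for A[12] = h + A[8] can be traced with a tiny simulation. This is a sketch under stated assumptions: run_example is a hypothetical helper, the base address 1000 is arbitrary, memory is modeled as a dictionary keyed by byte address, and 32-bit overflow is ignored.

```python
def run_example(h, array, base=1000):
    """Simulate lw / add / sw for A[12] = h + A[8] on an array of words."""
    # Byte-addressed memory: word i of A lives at address base + 4*i.
    memory = {base + 4 * i: value for i, value in enumerate(array)}
    regs = {"$s2": h, "$s3": base, "$t0": 0}

    regs["$t0"] = memory[regs["$s3"] + 32]    # lw  $t0, 32($s3)
    regs["$t0"] = regs["$s2"] + regs["$t0"]   # add $t0, $s2, $t0
    memory[regs["$s3"] + 48] = regs["$t0"]    # sw  $t0, 48($s3)

    return [memory[base + 4 * i] for i in range(len(array))]


if __name__ == "__main__":
    a = list(range(13))          # A[0..12], with A[i] = i
    print(run_example(5, a))     # only A[12] changes
```

Only the load and the store touch memory; the add works purely on registers, which is the point of the "can't write" line.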
25
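Our First Example
Can we figure out the code?
C code:
  void swap(int v[], int k)
  {
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  }
MIPS code:
  swap:
    sll $2, $5, 2      # $2 ← k × 4 (byte offset of v[k])
    add $2, $4, $2     # $2 ← address of v[k]
    lw $15, 0($2)      # $15 ← v[k]
    lw $16, 4($2)      # $16 ← v[k+1]
    sw $16, 0($2)      # v[k] ← $16
    sw $15, 4($2)      # v[k+1] ← $15
    jr $31             # return to caller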
26
MIPS Instruction Types
Arithmetic & logic (AL)
  add $s1, $s2, $s3     # $s1 ← $s2 + $s3
  sub $s1, $s2, $s3     # $s1 ← $s2 - $s3
  each AL inst. has exactly 3 operands, all in registers
  addi $s1, $s2, 100    # $s1 ← $s2 + 100
  the constant is kept in the instruction itself
Data transfer (load & store)
  lw $s1, 100($s2)      # $s1 ← memory[$s2+100] (load word)
  sw $s1, 100($s2)      # memory[$s2+100] ← $s1 (store word)
  lb $s1, 100($s2)      # $s1 ← memory[$s2+100] (load byte)
  sb $s1, 100($s2)      # memory[$s2+100] ← $s1 (store byte)
  load/store bytes commonly used for moving characters (ASCII)
27
MIPS Instruction Types
Conditional Branch
  beq $s2, $s3, L1      # branch to L1 if $s2 = $s3
  bne $s2, $s3, L1      # branch to L1 if $s2 ≠ $s3
  beq $s1, $s2, 25      # branch to PC + 4 + 100 (=4x25) if $s1 = $s2
  slt $s2, $s3, $s4     # if ($s3) < ($s4) then $s2 ← 1; else $s2 ← 0 (set on less than)
Unconditional Branch
  j Loop                # go to Loop (jump)
  j 2500                # go to 4x2500=10000 (jump)
  jr $t1                # go to the address in $t1 (jump register)
  jal Proc1             # $ra ← PC + 4; go to Proc1 (jump & link)
28
Compiling a High-Level Language
Assignment statement (operands in registers, operands in memory)
Assignment statement (operands with variable array index)
If-then-else statement
Loop with variable array index
While loop
Case / switch statement
Procedure that doesn't call another procedure
Nested procedures
Using strings
Using constants
Putting things together
29
Compiling a High-Level Language
Arithmetic instructions: useful for assignment statements
Data transfer instructions: useful for arrays or structures
Conditional branches: useful for if-then-else statements & loops
Unconditional branches: useful for case / switch statements, procedure calls and returns
30
Basic Blocks
A basic block is a sequence of instructions
  without branches, except possibly at the end, and
  without branch targets or branch labels, except possibly at the beginning
One of the first phases of compilation is breaking the program into basic blocks
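The definition above translates fairly directly into a splitting pass. The sketch below is one possible way to do it, not the method of any particular compiler: split_basic_blocks and BRANCH_OPS are hypothetical names, instructions are assumed to be plain strings without comments, a label is assumed to be written as "Name:" at the start of a line, and jal is treated as block-ending because the slides list it with the unconditional branches.

```python
BRANCH_OPS = {"beq", "bne", "j", "jr", "jal"}


def split_basic_blocks(lines):
    """Split a list of assembly lines into basic blocks (lists of lines)."""
    blocks, current = [], []
    for line in lines:
        has_label = ":" in line
        body = line.split(":", 1)[1] if has_label else line
        fields = body.split()
        opcode = fields[0] if fields else ""
        if has_label and current:   # a branch target may only start a block
            blocks.append(current)
            current = []
        current.append(line)
        if opcode in BRANCH_OPS:    # a branch may only end a block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    program = [
        "add $t0, $zero, $zero",
        "Loop: beq $t0, $s1, Exit",
        "addi $t0, $t0, 1",
        "j Loop",
        "Exit: jr $ra",
    ]
    for block in split_basic_blocks(program):
        print(block)
```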
31
Procedure Call
Use the following registers
  $a0-a3: to pass parameters
  $v0-v1: to return values for results & expression evaluation
  $ra: return address
  $sp: stack pointer (points to top of stack)
  $fp: frame pointer
Use the following instructions
  jal ProcedureAddress    # jumps to the procedure address and saves the return address (PC + 4) in register $ra
  jr $ra                  # return jump; jumps to the address stored in register $ra
Use the stack, a part of memory, to save the registers needed by the callee
32
Nested Procedures
Use stack to preserve values ($a0-a3, $s0-s7, $sp, $ra, stack above $sp, and $fp & $gp if they are used)
No need to preserve $t0-t9, $v0-v1, stack below $sp
Frame pointer serves as a stable base register within the procedure for local references
Procedure frame (activation record):
(Figure: three views of the stack, each marking $fp and $sp, drawn between High address and Low address; the frame holds argument registers, return address, saved registers, and local arrays & structures)
33
Instruction Format
All instructions are 32 bits
3 types of formats:
  R-type (Register)
  I-type (Immediate)
  J-type (Jump)
Fields (# of bits):
  op (6): opcode (basic operation)
  rs (5): 1st register source operand
  rt (5): 2nd register source operand
  rd (5): register destination operand
  shamt (5): shift amount
  funct (6): function (selects the specific variant of the operation in the op field)
  address/immediate (16)
  target address (26)
Layouts:
  R-type: op | rs | rt | rd | shamt | funct
  I-type: op | rs | rt | address/immediate
  J-type: op | target address
34
Instruction Format (Examples) - 1
R-type Examples:
  add $t0, $s2, $t0
  sub $s1, $s2, $s3
  slt $s1, $s2, $s3
  jr $ra                # 0s in rt, rd, and shamt fields
I-type Examples:
  lw $s1, 100($s2)      # 100 appears in address/immediate field
  sw $s1, 100($s2)      # 100 appears in address/immediate field
  beq $s1, $s2, 25      # 25 appears in address/immediate field (eqv. to 100)
J-type Examples:
  j 2500                # 2500 appears in target address field (eqv. to 4x2500=10000)
  jal 2500              # 2500 appears in target address field (eqv. to 4x2500=10000)
35
Instruction Format (Examples) - 2
R-type Example: add $t0, $s2, $t0
  Op=0 rs=18 rt=8 rd=8 shamt=0 funct=32
  000000 10010 01000 01000 00000 100000
I-type Example: lw $s1, 100($s2)
  Op=35 rs=18 rt=17 100
J-type Example: j 2500
  Op=2 2500
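The field values in these examples can be packed into 32-bit words with shifts and ORs. The sketch below assumes the field widths given earlier (6/5/5/5/5/6, 6/5/5/16, 6/26); encode_r, encode_i, encode_j and format_fields are hypothetical helper names, and no range checking is done on register numbers.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, immediate):
    """Pack I-type fields: op(6) rs(5) rt(5) address/immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """Pack J-type fields: op(6) target address(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def format_fields(word, widths):
    """Show a 32-bit word as binary, split into fields of the given widths."""
    bits = format(word, "032b")
    parts, start = [], 0
    for width in widths:
        parts.append(bits[start:start + width])
        start += width
    return " ".join(parts)


if __name__ == "__main__":
    add = encode_r(0, 18, 8, 8, 0, 32)    # add $t0, $s2, $t0
    lw = encode_i(35, 18, 17, 100)        # lw $s1, 100($s2)
    j = encode_j(2, 2500)                 # j 2500
    print(format_fields(add, (6, 5, 5, 5, 5, 6)))
    print(format_fields(lw, (6, 5, 5, 16)))
    print(format_fields(j, (6, 26)))
```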
36
Motivation for I-type Instructions
For many operations, one operand is a constant
  C compiler gcc: 52%
  Spice: 69%
Design principle: Make the common case fast
37
J-Type Instructions
Example:
  j 200                 # go to location 800 (=200*4)
Other J-type instruction:
  jal 200               # jump & link, go to location 800 (=200*4); $31 (ra) ← PC + 4
38
Assembly Language vs. Machine Language
Assembly provides convenient symbolic representation
  much easier than writing down numbers
  e.g., destination first
Machine language is the underlying reality
  e.g., destination is no longer first
Assembly can provide 'pseudoinstructions'
  e.g., "move $t0, $t1" exists only in assembly
  would be implemented using "add $t0, $t1, $zero"
When considering performance you should count real instructions
39
Overview of MIPS
Simple instructions, all 32 bits wide
Very structured, no unnecessary baggage
Only three instruction formats:
  R: op | rs | rt | rd | shamt | funct
  I: op | rs | rt | 16-bit address
  J: op | 26-bit address
Address fields are not 32 bits wide
  How do we handle this with load and store instructions?
40
Addresses in Branches and Jumps
Instructions:
  bne $t4, $t5, Label   # next instruction is at Label if $t4 ≠ $t5
  beq $t4, $t5, Label   # next instruction is at Label if $t4 = $t5
  j Label               # next instruction is at Label
Formats:
  I: op | rs | rt | 16-bit address
  J: op | 26-bit address
41
Addresses in Branches
Instructions:
  bne $t4, $t5, Label   # next instruction is at Label if $t4 ≠ $t5
  beq $t4, $t5, Label   # next instruction is at Label if $t4 = $t5
Formats:
  I: op | rs | rt | 16-bit address
Could specify a register (like lw and sw) and add it to the address
  Use Instruction Address Register (PC = program counter)
  Most branches are local (principle of locality)
Jump instructions just use the high-order bits of the PC
  Address boundaries of 256 MB
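How the 16-bit and 26-bit fields turn into full addresses can be checked with a little arithmetic. The sketch below is illustrative: branch_target and jump_target are hypothetical names, and the jump is assumed to take its upper 4 bits from PC + 4, as in the usual MIPS definition. The 26-bit field shifted left by 2 covers 28 bits, which is where the 256 MB boundary comes from.

```python
def branch_target(pc, offset_words):
    """PC-relative: target = PC + 4 + 4 * (signed word offset)."""
    return (pc + 4 + 4 * offset_words) & 0xFFFFFFFF


def jump_target(pc, target26):
    """Pseudodirect: upper 4 bits of PC + 4, then the 26-bit field times 4."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


if __name__ == "__main__":
    print(branch_target(0, 25))              # beq ..., 25 from address 0
    print(jump_target(0, 2500))              # j 2500 in the lowest 256 MB region
    print(hex(jump_target(0x10000000, 2500)))
    print(2 ** 28 // 2 ** 20, "MB reachable by a jump")
```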
42
Addressing Modes
Register addressing
  operand is in a register, e.g. add $s1, $s2, $s3
Base or displacement addressing
  operand is at memory location [register (base) + constant (displacement)]
  e.g. 2nd operand in lw $t0, 200($s1)
Immediate addressing
  operand is a constant within the instruction
  e.g. 3rd operand in addi $s1, $s2, 10
PC-relative addressing
  address = PC (+4) + constant in instruction (*4)
  e.g. 3rd operand in bne $s0, $s1, Exit
Pseudodirect addressing
  address = PC upper bits concatenated with 26-bit address in inst.
43
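Addressing Modes
(Figure, modes 1-3)
1. Immediate addressing: fields op, rs, rt, Immediate; the operand is the Immediate field
2. Register addressing: fields op, rs, rt, rd, ..., funct; the operand is in a register
3. Base addressing: fields op, rs, rt, Address; Register + Address selects a byte, halfword, or word in memory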
44
Addressing Modes
(Figure, modes 4-5)
4. PC-relative addressing: fields op, rs, rt, Address; PC + Address selects a word in memory
5. Pseudodirect addressing: fields op, Address; Address combined with PC selects a word in memory
45
Other Issues
MIPS assembler accepts this pseudoinstruction even though it is not found in MIPS architecture:
  move $t0, $t1         # $t0 ← $t1
  it translates it to: add $t0, $zero, $t1
Other pseudoinstructions: mult, blt, bge, etc.
Assembler keeps track of addresses of labels in symbol table
Details of assembler, linker, & loader are given in Appendix A
Details of MIPS instruction set & architecture in Appendix A
% frequency of instruction execution:
  Instruction class     gcc frequency   spice frequency
  Arithmetic            48%             50%
  Data transfer         33%             41%
  Conditional branch    17%             8%
  Jump & proc. call     2%              1%
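The translation of pseudoinstructions into real instructions mentioned above can be sketched as a rewriting step. This is an illustration only: expand_pseudo is a hypothetical function, the blt and bge expansions shown (via slt and the assembler-reserved $at register) are the commonly taught ones, and a real assembler may choose different sequences.

```python
def expand_pseudo(instruction):
    """Return the list of real instructions for one assembly line."""
    op, _, rest = instruction.partition(" ")
    args = [a.strip() for a in rest.split(",")] if rest else []
    if op == "move" and len(args) == 2:
        return [f"add {args[0]}, $zero, {args[1]}"]
    if op == "blt" and len(args) == 3:
        # branch if args[0] < args[1]: slt sets $at to 1 when that holds
        return [f"slt $at, {args[0]}, {args[1]}", f"bne $at, $zero, {args[2]}"]
    if op == "bge" and len(args) == 3:
        # branch if args[0] >= args[1]: slt leaves $at at 0 when that holds
        return [f"slt $at, {args[0]}, {args[1]}", f"beq $at, $zero, {args[2]}"]
    return [instruction]  # already a real instruction


if __name__ == "__main__":
    for line in ("move $t0, $t1", "blt $s0, $s1, L1", "add $t0, $t1, $t2"):
        print(line, "->", expand_pseudo(line))
```

Counting the length of each returned list gives the number of real instructions, which is the count that matters for performance.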
46
Instruction Set Architecture Classes
Use of accumulator (a default register):
  1-address instruction; e.g. add A: acc ← acc + mem[A]
  e.g. EDSAC, IBM 701, DEC PDP-8, MC 6800, Intel 8008
Use of stack:
  0-address instruction; e.g. add: top(stack) ← top(stack) + next_top(stack)
Use of general-purpose registers:
  2-address instruction; e.g. add A, B: A ← A + B
  3-address instruction; e.g. add A, B, C: A ← B + C
  load/store (reg/reg): e.g. MIPS, Sun's SPARC, MC PowerPC, DEC Alpha
  memory/memory: e.g. DEC VAX
  memory/register: e.g. DEC VAX, IBM 360, DEC PDP-11, MC 68000, Intel 80386
47
RISC vs. CISC
RISC (Reduced Instruction Set Computer) philosophy (instruction sets measured by how well compilers used them):
  Emphasis on software
  Single-clock, reduced instructions only
  Register to register: "LOAD" and "STORE" are independent instructions
  Low cycles per instruction
  Large code sizes
  Spends more transistors on memory registers
CISC (Complex Instruction Set Computer):
  Emphasis on hardware
  Includes multi-clock complex instructions
  Memory-to-memory: "LOAD" and "STORE" incorporated in instructions
  Small code sizes, high cycles per instruction
  Transistors used for storing complex instructions