EE382A Advanced Processor Architecture
![Page 1: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/1.jpg)
EE382A – Spring 2009 Christos Kozyrakis Lecture 1 - 1
Department of Electrical Engineering
Stanford University
http://eeclass.stanford.edu/ee382a
EE382A
Advanced Processor Architecture
Christos Kozyrakis & John Shen
![Page 2: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/2.jpg)
A Few Words About Christos
• Associate professor of EE & CS
– Ph.D. from U.C. Berkeley
– B.Sc. from University of Crete
• Current research
– Parallel systems (scheduling, TM)
– Energy efficient data-centers
– Security systems
– More info at http://csl.stanford.edu/~christos
• Systems I have worked on
– Networking chips: ATLAS & Telegraphos switches
– Processor chips: VIRAM media-processor
• 125 million transistors, 9.6 billion ops/sec
– FPGA prototypes: Raksha & Atlas
– Server prototypes: CoolSort
[Figure: chip and system photos: VIRAM media-processor, IRAM test chip, Telegraphos DSM switch, ATLAS ATM switch, Raksha security system, ATLAS TM system]
![Page 3: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/3.jpg)
A Few Words About John
• Head of Nokia Research Center in Palo Alto
– Ph.D. from USC
– B.Sc. from University of Michigan
• Prior to Nokia
– Director of the Microarchitecture Research Lab (MRL) at Intel
• Superscalar architecture, speculative multithreading and memory prefetching,
3D die-stacking technology, and heterogeneous multi-sequencer architectures
– Professor of Computer Engineering at CMU
• Author of the main textbook for EE382a
![Page 4: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/4.jpg)
EE382a Team
• Instructors: Christos Kozyrakis & John Shen
• Teaching assistant: David Signiorelli
• Guest lectures: Ben Lee + one more
• Administrative support: Teresa Lynn
• Contact info & office hours: up-to-date info on class webpage
– http://eeclass.stanford.edu/ee382a
– Check frequently
![Page 5: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/5.jpg)
You…
• Class participation is EXTREMELY important in EE382a
• Your goals
– Ask questions
– Offer answers
– Suggest discussion topics
– Make us learn your name
• Will take and post photos of everyone next week
![Page 6: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/6.jpg)
Class Basics
• Lectures: Mo & We, 11am-12.15pm, Hewlett 101
– There will also be some discussion sessions on Fridays
• Friday 2-3pm, Gates Hall 498
• Discussion sessions will be explicitly announced
– The class is not available on SCPD this quarter
• Web page: http://eeclass.stanford.edu/ee382a
– Announcements, handouts, office hours, latest schedule, bulletin board
– Check frequently
– Sign up on the webpage for on-line access to grades
• We will let you know when registration is open…
![Page 7: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/7.jpg)
The Bulletin Board
• The preferred way to ask class-related questions
– We promise to check & answer often, especially close to deadlines
– We encourage you to contribute to answers & have on-line discussions on
class material
• The bulletin rules
– Before posting a new question
• Check if question has already been asked or even answered
– Use the search capabilities of your web browser
• Check the FAQ page for the assignment
– Choose an appropriate subject for your question
• E.g. “HW2, problem 3, definition of memory latency”
• For questions not appropriate for the public: send us an email
![Page 8: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/8.jpg)
EE382a Topics
• Pipelining overview and analysis
• Architectures for instruction level parallelism
– Superscalar: instruction fetch, branch prediction, dynamic scheduling &
register renaming, memory disambiguation
– VLIW and dynamic binary translation
• Architecture for task and data level parallelism
– Multithreading, multi-core architectures, vector processing, GPUs, tradeoffs
in designing multi-core chips, memory hierarchy for multi-core
• Cross-cutting issues
– Checkpointed processors, phase-change memory, …
![Page 9: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/9.jpg)
Textbooks and Papers
• Textbooks
– Required: "Modern Processor Design: Fundamentals of Superscalar
Processors", J.P. Shen and M. Lipasti, 1st edition, McGraw-Hill
• Do not use/buy the beta edition!
– Reference: “Computer Architecture: A Quantitative Approach”, J. Hennessy
& D. Patterson, 4th edition, Morgan Kaufmann
– Reference: “Computer Organization and Design: The Hardware/Software
Interface”, D. Patterson & J. Hennessy, 4th edition, Morgan Kaufmann
• Papers (check handouts link on the webpage)
– A few required papers
• These papers are included in the exam materials
• Have to submit a 1-page paper summary by the next lecture
– Several optional papers
• Further in-depth information, references for projects, …
![Page 10: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/10.jpg)
Assignments, Exams, and Class Load
• Single exam and 1+2 homework assignments
• Large research project
– On an open question in computer architecture
– Work in groups of up to 3 students
– See topic suggestions on-line or suggest your own project
– Milestones: proposal, halfway review/status, presentation, paper…
• Grade breakdown (tentative)
– Exam 40%, Project 40%, HW + summaries + participation 20%
– All deadlines are final, no extensions, no exceptions
– Remember the honor code (more info on web page)
• Warnings
– This will be a loaded class!!
– This class will be as good as your participation…
![Page 11: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/11.jpg)
Prerequisites and Registration
• Prerequisites: EE108B or equivalent
– Expected to know: simple pipelines, basic caching, virtual memory, main memory
• EE282 is not a required prerequisite
• Class registration:
– Limited to 30 students; all students must receive instructor’s approval
• Homework 1: prerequisite assessment
– Due in class on Monday
– Work on it on your own
– Will send you email about your registration by Wednesday
![Page 12: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/12.jpg)
Should I Take EE382A?
• Good reasons to take EE382A
– Prepare for research in computer architecture
– Broaden your Ph.D. research perspective
– Become a digital systems architect in industry
– Honest curiosity (how do Intel/AMD/… processors work?)
– Want to take a class with a research project
• Not good reasons to take EE382A
– Prepare for quals, comps, etc…
– Need another course for your degree program
• “EE382A is supposed to be an easy A, right?”
– Learn about digital circuits and CAD tools
![Page 13: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/13.jpg)
On Reading & Summarizing Papers
• Look for the following
– The issue or problem addressed by the paper
– The original contributions (real or claimed, you have to check)
– Critique: what are the major strengths and weaknesses of the paper?
• Look at the claims and assumptions, the methodology, the analysis of data, and the presentation style
– Future work: what are the natural extensions or improvements to this work?
• Or, can we apply a similar methodology to other problems of interest
• Do not submit the paper abstract as your summary :)
• Helpful tips
– Read the abstract, introduction, and conclusions sections first.
– Read the rest of the paper twice
• First a quick pass to get rough idea of details, then a detailed reading
– Underline/highlight the important parts of the paper
– Keep notes on the paper margins about comments or questions
• Important insights, questionable claims, relevance to other topics, ways to improve some technique etc.
– Look up references that seem to be important or missing
• In some cases, you may also want to check who cites this paper and how
![Page 14: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/14.jpg)
EE382A Lecture 1:
Introduction to Advanced Processor Architecture
![Page 15: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/15.jpg)
Historical Perspectives on Processors
• The Decade of the 1970’s: “Birth of Microprocessors”
– Programmable Controller
– Single-Chip Microprocessors
– Personal Computers (PC)
• The Decade of the 1980’s: “Quantitative Architecture”
– Instruction Pipelining
– Fast Cache Memories
– Compiler Considerations
– Workstations
• The Decade of the 1990’s: “Instruction-Level Parallelism”
– Superscalar, Speculative Microarchitectures
– Aggressive Compiler Optimizations
– Low-Cost Desktop Supercomputing
![Page 16: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/16.jpg)
Performance Growth
• Doubling every 18 months (1982-2000):
– total of 3,200X
– Cars travel at 176,000 MPH; get 64,000 miles/gal.
– Air travel: L.A. to N.Y. in 5.5 seconds (MACH 3200)
– Wheat yield: 320,000 bushels per acre
• Doubling every 24 months (1971-2001):
– total of 36,000X
– Cars travel at 2,400,000 MPH; get 600,000 miles/gal.
– Air travel: L.A. to N.Y. in 0.5 seconds (MACH 36,000)
– Wheat yield: 3,600,000 bushels per acre
Unmatched by any other industry!!
[John Crawford, Intel, 1993]
![Page 17: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/17.jpg)
Convergence of Key Enabling Technologies
• CMOS VLSI:
– Submicron feature sizes: 0.3µm → 0.25µm → 0.18µm → 0.13µm → 90nm → 65nm → 45nm …
– Metal layers: 3 → 4 → 5 → 6 → 7 (copper) → 12 …
– Power supply voltage: 5V → 3.3V → 2.4V → 1.8V → 1.3V → 1.1V …
• CAD Tools:
– Interconnect simulation and critical path analysis
– Clock signal propagation analysis
– Process simulation and yield analysis/learning
• Architecture & Microarchitecture:
– Superpipelined and superscalar machines
– Speculative and dynamic microarchitectures
– Simulation tools and emulation systems
• Compilers:
– Extraction of instruction-level parallelism
– Aggressive and speculative code scheduling
– Object code translation and optimization
![Page 18: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/18.jpg)
Evolution of Single-Chip Processors
| | 1970’s | 1980’s | 1990’s | 2010 |
|---|---|---|---|---|
| Transistor Count | 10K-100K | 100K-1M | 1M-100M | 0.5-1B |
| Clock Frequency | 0.2-2MHz | 2-20MHz | 20MHz-1GHz | 1-5GHz |
| Instructions/Cycle | < 0.1 | 0.1-0.9 | 0.9-2.0 | 10 |
| MIPS or MFLOPS | < 0.2 | 0.2-20 | 20-2,000 | 100,000 |
| Watts | < 2 | < 10 | < 40 | 1-100+ (?) |
| CPUs/chip | 1 | 1 | 1 | 4-10 |
![Page 19: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/19.jpg)
Aspects of Computer Architecture
• ARCHITECTURE (instruction set architecture)
– programmer/compiler view - “Functional appearance to its immediate user/
system programmer”
• IMPLEMENTATION (microarchitecture)
– processor designer view - “Logical structure or organization that
implements the instruction set”
• DESIGN (chip realization)
– chip/system designer view - “Physical structure that embodies the
implementation”
![Page 20: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/20.jpg)
Our Objective for this Quarter
• The “What’s-How’s-Why’s” of Processor Design
1. Knowledge (“what’s”)
- Technology
- Techniques
2. Design Skills (“how’s”)
- Critical Issues
- Trade-off Intuitions
3. Understanding (“why’s”)
- Deeper Insights
- Fundamental Principles
![Page 21: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/21.jpg)
Basic Tools and Principles for Architects
![Page 22: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/22.jpg)
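Amdahl’s Law
• Speedup = $time_{\text{without enhancement}} / time_{\text{with enhancement}}$
• Suppose an enhancement speeds up a fraction f of a task by a
factor of S
$time_{new} = time_{old} \cdot \left( (1-f) + f/S \right)$
$S_{overall} = 1 / \left( (1-f) + f/S \right)$
[Figure: bar diagram showing time_old split into (1 - f) and f, and time_new split into (1 - f) and f/S]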
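As a rough illustration of the formula above, the following Python sketch evaluates the overall speedup for a few values of f and S. The function names `amdahl_speedup` and `speedup_limit` are illustrative choices, not part of the lecture.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the task is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


def speedup_limit(f: float) -> float:
    """Upper bound on the overall speedup as s grows without bound."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99):
        print(f, amdahl_speedup(f, 10.0), speedup_limit(f))
```

Even with S very large, the untouched (1 - f) part bounds the overall speedup at 1 / (1 - f), which is why the fraction f matters as much as S.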
![Page 23: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/23.jpg)
Amdahl’s Law (continued)
• Real life analogy: After driving through 60 minutes of traffic jam, how
much time can you make up by speeding in the final mile?
• Applications in Computer Architecture
– RISC - Reduced Instruction Set Computer
– Optimized to execute frequently used instructions quickly
– Infrequently used instructions take longer, or are even emulated in software
We should concentrate efforts on improving frequently occurring events or
frequently used mechanisms
![Page 24: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/24.jpg)
Pipelining
• Latency : Elapsed time from start to completion of a particular task
• Throughput : How many tasks can be completed per unit of time
• A pipeline is like an assembly line!
• Pipelining only improves throughput
– Latency: each job still takes 5 cycles to complete
– Throughput: 1 job per cycle if pipelined vs. 1 job per 5 cycles if not pipelined
[Figure: five-stage pipeline: start → stage1 → stage2 → stage3 → stage4 → stage5 → finish]
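The latency and throughput claims can be checked with a small cycle count. This is a minimal sketch assuming an ideal pipeline (one cycle per stage, no stalls); the function names are hypothetical.

```python
def unpipelined_cycles(n_jobs: int, n_stages: int) -> int:
    """Each job occupies the whole datapath for n_stages cycles."""
    return n_jobs * n_stages


def pipelined_cycles(n_jobs: int, n_stages: int) -> int:
    """The first job takes n_stages cycles; each later job finishes one cycle after the previous one."""
    if n_jobs <= 0:
        return 0
    return n_stages + (n_jobs - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, unpipelined_cycles(n, 5), pipelined_cycles(n, 5))
```

For a single job both versions take 5 cycles (latency is unchanged), while for many jobs the pipelined cycle count approaches one cycle per job.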
![Page 25: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/25.jpg)
Pipelining (continued)
• Real life analogy: Henry Ford’s automobile assembly line.
• Example in computer architecture:
– 5-stage Instruction Execution Pipeline
– Fetch-Decode-Execute-Memory-Writeback
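| Stages \ time | t0 | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 | … |
|---|---|---|---|---|---|---|---|---|---|---|
| Fetch | I1 | I2 | I3 | I4 | I5 | | | | | |
| Decode | | I1 | I2 | I3 | I4 | I5 | | | | |
| Execute | | | I1 | I2 | I3 | I4 | I5 | | | |
| Memory | | | | I1 | I2 | I3 | I4 | I5 | | |
| Writeback | | | | | I1 | I2 | I3 | I4 | I5 | |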
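The staggered occupancy shown in the diagram can be generated programmatically. The sketch below assumes an ideal pipeline in which instruction i enters stage s in cycle s + i; `pipeline_schedule` is an illustrative name.

```python
STAGES = ["Fetch", "Decode", "Execute", "Memory", "Writeback"]


def pipeline_schedule(n_instr, stages=STAGES):
    """Return {stage name: list of per-cycle occupants} for an ideal, stall-free pipeline."""
    n_cycles = len(stages) + n_instr - 1
    grid = {}
    for s, name in enumerate(stages):
        row = [""] * n_cycles
        for i in range(n_instr):
            # Instruction i (0-based) reaches stage s in cycle s + i.
            row[s + i] = f"I{i + 1}"
        grid[name] = row
    return grid


if __name__ == "__main__":
    for name, row in pipeline_schedule(5).items():
        print(f"{name:<10}" + " ".join(f"{cell:<3}" for cell in row))
```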
![Page 26: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/26.jpg)
Parallel Processing
• Parallelism - the amount of independent sub-tasks available
• If sub-tasks are independent, the order that they are carried out does
not matter
• Thus by executing the independent subtasks concurrently, we can
finish the entire task faster
Improve Speedup!!!
![Page 27: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/27.jpg)
Parallel Processing
• Real life analogy: collaboration on problem sets
(although not always encouraged)
• Examples in computer architecture:
– Parallel computers
– Superscalar processors
– Multi-core processors
![Page 28: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/28.jpg)
Out-of-order Execution
• Specification (or Program) Order vs Dataflow Order
• Dataflow: Data-driven scheduling of events
– The start of an event should be enabled by the availability of its required
input (data dependency)
– The completion of an event will produce an output that will enable the start
of other events
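x = a + b; y = b * 2; z = (x - y) * (x + y)
[Figure: dataflow graph of the code above: inputs a and b feed + (producing x) and *2 (producing y); x and y feed - and +, whose results feed * (producing z)]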
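Data-driven scheduling for the example above can be sketched as a loop that fires every operation whose inputs are available. This is a toy model, not a description of any real processor; `t_diff` and `t_sum` are hypothetical names for the intermediate results (x - y) and (x + y).

```python
import operator

# (destination, function, sources), listed in program order.
PROGRAM = [
    ("x", operator.add, ("a", "b")),
    ("y", lambda b: b * 2, ("b",)),
    ("t_diff", operator.sub, ("x", "y")),
    ("t_sum", operator.add, ("x", "y")),
    ("z", operator.mul, ("t_diff", "t_sum")),
]


def dataflow_execute(program, inputs):
    """Fire operations as soon as their inputs exist; return final values and per-step firings."""
    values = dict(inputs)
    pending = list(program)
    steps = []
    while pending:
        # An operation is ready once all of its source values have been produced.
        ready = [op for op in pending if all(src in values for src in op[2])]
        if not ready:
            raise ValueError("some operation can never obtain its inputs")
        results = {dest: fn(*(values[s] for s in srcs)) for dest, fn, srcs in ready}
        values.update(results)
        steps.append(sorted(results))
        pending = [op for op in pending if op[0] not in results]
    return values, steps


if __name__ == "__main__":
    print(dataflow_execute(PROGRAM, {"a": 3, "b": 4}))
```

With unlimited functional units, x and y fire together in the first step, the subtraction and addition in the second, and the final multiply in the third.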
![Page 29: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/29.jpg)
Out-of-order Execution
• Real life analogy:
– A tip on taking tests: work on the questions you know first
• Examples in computer architecture
– Most modern microprocessors (Intel P4, Opteron, etc.) schedule
instruction execution in dataflow order
![Page 30: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/30.jpg)
Work and Critical Path
• Work
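$T_1$ - time to complete a computation on a
sequential system
• Critical Path
$T_\infty$ - time to complete the same computation
on an infinitely-parallel system
• Average Parallelism
$P_{avg} = T_1 / T_\infty$
• For a p-wide system
$T_p \ge \max\{ T_1/p,\ T_\infty \}$
$P_{avg} \gg p \Rightarrow T_p \approx T_1/p$
x = a + b; y = b * 2; z = (x - y) * (x + y)
[Figure: dataflow graph of the code above: inputs a and b feed + (producing x) and *2 (producing y); x and y feed - and +, whose results feed * (producing z)]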
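For the example expression, work and critical path can be computed directly from the dependence graph. The sketch assumes every operation takes one time unit and that the inputs a and b are available at time zero; `t_diff` and `t_sum` are hypothetical names for (x - y) and (x + y).

```python
from functools import lru_cache

# Each operation maps to the operations it depends on.
DEPS = {
    "x": [],
    "y": [],
    "t_diff": ["x", "y"],
    "t_sum": ["x", "y"],
    "z": ["t_diff", "t_sum"],
}


def work(deps):
    """T_1: total number of unit-time operations."""
    return len(deps)


def critical_path(deps):
    """T_inf: length of the longest dependence chain."""
    @lru_cache(maxsize=None)
    def depth(node):
        return 1 + max((depth(d) for d in deps[node]), default=0)

    return max(depth(node) for node in deps)


def lower_bound_tp(deps, p):
    """Lower bound max{T_1 / p, T_inf} on the time with p-wide execution."""
    return max(work(deps) / p, critical_path(deps))


if __name__ == "__main__":
    print(work(DEPS), critical_path(DEPS), work(DEPS) / critical_path(DEPS))
```

Here the work is 5, the critical path is 3, and the average parallelism is 5/3, so going wider than two units cannot help this particular computation.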
![Page 31: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/31.jpg)
Work and Critical Path
• Real life analogy: undergraduate degree requirements
– Work = unit requirement
– Critical Path to graduation is determined by course sequences and their
prerequisites
• Added constraints: classes are only available in specific quarters…
• Applications to computer architecture
– Parallel job scheduling
– Given a collection of inter-dependent tasks:
• How many resources should be allocated?
• Which sequence of tasks should be given priority?
![Page 32: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/32.jpg)
Speculation
Is it possible to parallelize the critical path?
i.e. violate data dependence?
• Guess the outcome of an operation from its inputs without performing
the operation
• Even better, guess the outcome of an operation before the inputs to
the operation are even known
• Speculation techniques must also include mechanisms for
1. Checking if the guesses are correct
2. Undoing “speculative execution” after wrong guesses
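The two required mechanisms (checking and undoing) can be sketched in software. This toy runs sequentially, so it only illustrates the control structure, not any performance benefit; all names (`speculate`, `predict_even`, `is_even`, `pick_path`) are hypothetical.

```python
def speculate(state, predict, resolve, dependent_work):
    """Run dependent_work on a guessed outcome, then check the guess and undo if it was wrong."""
    checkpoint = dict(state)                  # saved state for a possible undo
    guess = predict(checkpoint)               # guess before the real outcome is known
    speculative = dependent_work(dict(checkpoint), guess)
    actual = resolve(checkpoint)              # the real (slow) operation
    if guess == actual:                       # 1. check whether the guess was correct
        return speculative, True
    # 2. undo: discard the speculative state and redo the work from the checkpoint
    return dependent_work(dict(checkpoint), actual), False


def predict_even(state):
    return True                               # always guess "even"


def is_even(state):
    return state["n"] % 2 == 0


def pick_path(state, even):
    state["path"] = "even" if even else "odd"
    return state


if __name__ == "__main__":
    print(speculate({"n": 4}, predict_even, is_even, pick_path))
    print(speculate({"n": 3}, predict_even, is_even, pick_path))
```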
![Page 33: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/33.jpg)
Speculation (continued)
• Real life analogy:
– Another tip on taking tests: You can often guess what is going to be on an
exam by looking at lectures and HWs.
• Examples in computer architecture
– Circuit-level speculations: Carry Select Adder
– Architectural-level speculations
• Branch target predictions
• Load value predictions
• Speculative loop execution
![Page 34: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/34.jpg)
Locality Principle
• One’s recent past is a very good indication of his near future
– Temporal Locality: If you just did something, it is very likely that you will do
the same thing again soon
– Spatial Locality: If you just did something, it is very likely you will do
something related or similar next
• Locality == Patterns == Predictability
– Converse:
• Anti-locality : If you haven’t done something for a very long time, it is very likely
you won’t do it in the near future either
![Page 35: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/35.jpg)
Locality Principle (continued)
• Real life analogy:
– spatial locality - where you choose to sit in a room
– temporal locality - will you be here again next week?
• Examples in computer architecture:
– Execution of program loops
• Spatial locality - after you execute an instruction, with very good probability, you
will execute the next instruction
• Temporal locality - you are very likely to repeat the same instructions many
times
![Page 36: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/36.jpg)
Memoization
• If something is expensive to compute, you might want to remember the
answer for a while, just in case you will need the same answer again
Why does memoization work??
• Real life analogy:
– Keeping a list of frequently used phone numbers by your telephone
• Examples in computer architecture
– ?
![Page 37: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/37.jpg)
Amortization
• Overhead cost : one-time cost to set something up
• Per-unit cost : cost per unit of operation
total cost = overhead + per-unit cost × N
• It is often okay to have a high overhead cost if the cost can be
distributed over a large number of units
⇒ lower average cost
average cost = total cost / N
= ( overhead / N ) + per-unit cost
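The effect of N on the average cost is easy to see numerically. A minimal sketch, with hypothetical function names and made-up cost values:

```python
def total_cost(overhead: float, per_unit: float, n: int) -> float:
    """total cost = overhead + per-unit cost x N"""
    return overhead + per_unit * n


def average_cost(overhead: float, per_unit: float, n: int) -> float:
    """average cost = total cost / N = (overhead / N) + per-unit cost"""
    if n <= 0:
        raise ValueError("n must be positive")
    return total_cost(overhead, per_unit, n) / n


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, average_cost(100.0, 1.0, n))
```

As N grows, the overhead term shrinks toward zero and the average cost approaches the per-unit cost.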
![Page 38: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/38.jpg)
Amortization (continued)
• Real life analogy: economy of scale
– Why is pasta sauce cheaper when bought by the gallon?
• Examples in computer architecture:
Cache Access Latency
$T_{miss}$ = 50 cycles
$T_{hit}$ = 1 cycle
If on average a cache line is reused n times before being ejected
$T_{avg} = ( T_{miss} + (n-1) T_{hit} ) / n \approx T_{miss}/n + T_{hit}$
n = 50 ⇒ $T_{avg} \approx 2$
n = 2 ⇒ $T_{avg} \approx 25$
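The slide's numbers can be reproduced with a few lines of Python. The sketch evaluates both the exact expression and the approximation; the function names are illustrative.

```python
def t_avg_exact(n: int, t_miss: float = 50.0, t_hit: float = 1.0) -> float:
    """One miss followed by (n - 1) hits, averaged over n accesses."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (t_miss + (n - 1) * t_hit) / n


def t_avg_approx(n: int, t_miss: float = 50.0, t_hit: float = 1.0) -> float:
    """Approximation T_miss / n + T_hit."""
    if n <= 0:
        raise ValueError("n must be positive")
    return t_miss / n + t_hit


if __name__ == "__main__":
    for n in (2, 50):
        print(n, t_avg_exact(n), t_avg_approx(n))
```

The exact values are 1.98 cycles for n = 50 and 25.5 cycles for n = 2, which the slide rounds to 2 and 25.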
![Page 39: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/39.jpg)
Basic Equations and Metrics
• Performance
– CPU time = Instruction Count × CPI × Clock Cycle Time
– AMAT = Hit Time + Miss Rate × Miss Penalty
– Amdahl’s law, amortization
• Cost
– Processor cost = $f(\text{die area}^4)$
• Power Consumption
– $Power = C \cdot V_{dd}^2 \cdot F + V_{dd} \cdot I_{shortcircuit} \cdot F + V_{dd} \cdot I_{leakage}$
– Energy = Power × Time
– $E \cdot D$, $E \cdot D^2$, $E \cdot D^3$, …
• Fault tolerance: MTTF, MTTR, …
• Design complexity: ?
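The performance and power equations above translate directly into code. The sketch below is a literal transcription of the formulas as written on the slide; the function and parameter names, and the sample values in the demo, are assumptions made for illustration.

```python
def cpu_time(instruction_count: float, cpi: float, clock_cycle_time: float) -> float:
    """CPU time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * clock_cycle_time


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = Hit Time + Miss Rate x Miss Penalty."""
    return hit_time + miss_rate * miss_penalty


def power(c: float, vdd: float, f: float, i_short: float, i_leak: float) -> float:
    """Switching + short-circuit + leakage terms, exactly as listed in the slide's formula."""
    return c * vdd ** 2 * f + vdd * i_short * f + vdd * i_leak


def energy(power_value: float, time: float) -> float:
    """Energy = Power x Time."""
    return power_value * time


if __name__ == "__main__":
    # Made-up sample values: 1e9 instructions, CPI of 1.5, 1 ns cycle time.
    print(cpu_time(1e9, 1.5, 1e-9))
    print(amat(1.0, 0.05, 50.0))
```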
![Page 40: EE382A Advanced Processor Architectureacs.pub.ro/~cpop/SMPA/L01-intro 382a.pdf · • Architectures for instruction level parallelism – Supersalar: instruction fetch, branch prediction,](https://reader033.fdocuments.in/reader033/viewer/2022041500/5e2118f9f38cfe49af0fe4dd/html5/thumbnails/40.jpg)
Ready to Learn More?