1
Computer Architecture Research Overview
Rajeev Balasubramonian
School of Computing, University of Utah
http://www.cs.utah.edu/~rajeev
4
What is Computer Architecture?
• If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?
[Figure: timelines for Case 1 and Case 2 along a time axis, marking clock ticks and completing instructions]
5
What is Computer Architecture?
To a large extent, computer architecture determines:
• the number of instructions used to execute a program
• the time each instruction takes to execute
• the idle cycles when no work gets done
• the number of instructions that can execute in parallel
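The factors listed above can be combined into a rough execution-time estimate. The sketch below is only illustrative: the function name, the two processor labels, and all numbers are assumptions chosen for the example, not measurements of the Pentium4 or the Power4. It shows how a processor with a faster clock can still finish later if it needs more cycles per instruction.

```python
def execution_time_s(instructions, cycles_per_instruction, clock_hz):
    """Estimated run time: instruction count x average cycles per instruction / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical processors running the same one-billion-instruction program.
# The cycles-per-instruction figure folds in idle cycles and parallel execution.
faster_clock = execution_time_s(1_000_000_000, 2.0, 3.0e9)  # 3 GHz, 2 cycles per instruction
slower_clock = execution_time_s(1_000_000_000, 1.0, 2.0e9)  # 2 GHz, 1 cycle per instruction

print(f"3 GHz processor: {faster_clock:.3f} s")
print(f"2 GHz processor: {slower_clock:.3f} s")
```

With these assumed numbers the 2 GHz design finishes first, so clock speed alone does not settle the question.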
6
A Typical Microprocessor
[Figure: block diagram of a processor with a branch predictor, L1 instruction cache, decode & rename, issue logic, register file, four ALUs, L1 data cache, and L2 cache]
7
Architecture Trends in the 90s
• Performance was the ultimate metric
• Transistors were a limiting factor
As on-chip transistors became available in the 90s, more functionality and more complex circuitry were added to boost performance; most of the low-hanging fruit has now been picked
8
Hitting the Wall
We have now hit the following walls:
• Single core performance
• Memory
• Complexity
• Power, temperature
9
Hitting the Power Wall
Power is as important a metric today as performance
From Shekhar Borkar, MICRO’99
10
The Advent of Multi-Core Chips
• In the past, performance magically increased by 50% every year
• In the future, this improvement will be only ~20% every year … unless … the application is multi-threaded!
[Figure: multi-core chip layout made up of cores and cache banks]
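To see how much the drop from 50% to ~20% annual improvement matters, the rates can be compounded over several years. This is a minimal sketch; the function name and the choice of time spans are assumptions made for illustration.

```python
def cumulative_speedup(annual_improvement, years):
    """Compound a fixed yearly performance improvement over a number of years."""
    return (1.0 + annual_improvement) ** years


for years in (1, 5, 10):
    past = cumulative_speedup(0.50, years)    # 50% per year
    future = cumulative_speedup(0.20, years)  # ~20% per year
    print(f"{years:2d} years: {past:6.2f}x at 50%/yr vs {future:5.2f}x at 20%/yr")
```

Over five years the gap is already roughly a factor of three, which is the shortfall a multi-threaded application would have to recover through parallelism.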
11
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
For publications, see http://www.cs.utah.edu/~rajeev/research.html
12
Interconnects as a Bottleneck
• In the past, on-chip data transmission on wires cost almost nothing
• Interconnect speed and power have been improving, but not at the same rate as transistor speeds
Hence, relative to computation, communication is much more expensive
• In the near future, it will take 100 cycles to travel across the chip
• 50% of chip power can be attributed to interconnects
13
Interconnects in Multi-Core Chips
[Figure: CPU 1, CPU 2, and CPU 3 with L1 caches, an L2 cache, and L2 controllers; the label A marks several locations in the diagram]
14
Not all Wires are Created Equal
Wire type              B-Wires   L-Wires   W-Wires   PW-Wires
Relative latency       1x        0.5x      1.6x      3.2x
Relative area          1x        4x        0.5x      0.5x
Dynamic power (W/m)    2.65      1.46      2.9       0.87
Static power (W/m)     1.02      0.57      1.16      0.31
15
Data Transfers have Varying Needs
• Example of a cache coherence transaction: Read exclusive request for a shared block
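One way to act on varying transfer needs is to steer each message onto the wire type that suits it. The sketch below is a hypothetical policy, not the scheme evaluated in the talk: the wire numbers are copied from the "Not all Wires are Created Equal" slide, while `pick_wire` and the two example message descriptions are assumptions made for illustration.

```python
# Wire properties from the "Not all Wires are Created Equal" table.
WIRES = {
    "B-Wires":  {"latency": 1.0, "area": 1.0, "dynamic": 2.65, "static": 1.02},
    "L-Wires":  {"latency": 0.5, "area": 4.0, "dynamic": 1.46, "static": 0.57},
    "W-Wires":  {"latency": 1.6, "area": 0.5, "dynamic": 2.90, "static": 1.16},
    "PW-Wires": {"latency": 3.2, "area": 0.5, "dynamic": 0.87, "static": 0.31},
}


def pick_wire(latency_critical):
    """Hypothetical policy: fastest wire if the message is on the critical path,
    otherwise the wire with the lowest total (dynamic + static) power."""
    if latency_critical:
        return min(WIRES, key=lambda name: WIRES[name]["latency"])
    return min(WIRES, key=lambda name: WIRES[name]["dynamic"] + WIRES[name]["static"])


# Assumed classification of two messages; real criticality depends on the protocol.
print("critical control message ->", pick_wire(latency_critical=True))
print("non-critical data transfer ->", pick_wire(latency_critical=False))
```

A fuller policy would also weigh message width against the area cost of each wire type, which this sketch ignores.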
16
Other Interconnect Choices
• Optical interconnects: signals travel at the speed of light, but converting between the optical and electrical domains has a cost
• 3D chips: reduce communication distances and offer low-cost vertical signal transmission, but increase power density
17
3D Layouts
[Figure: three two-die layouts (Die 0 and Die 1): (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered). Legend: cluster, cache bank, intra-die horizontal wire, inter-die vertical wire]
18
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Clustered architectures: relatively low complexity, scalable solution, easily handles multiple threads
19
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Heterogeneous perf/power
Cores that execute the OS
Cores that verify results
20
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Hardware to support transactional memory
21
Upcoming Architecture Challenges
• Improving single core performance
• Functionalities in multi-core chips
• Simplifying the programmer’s task
• Efficient interconnects
• Power and temperature-efficient designs
• Designs tolerant of errors
Faults are caused by high-energy particles that deposit enough charge to toggle bits
Variations in conditions may cause a circuit to not produce its result in time
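A toggled bit can be caught by computing the same result twice and comparing. The sketch below assumes simple redundant execution with a fault-free checker; the function names and the fault-injection parameter are hypothetical and only illustrate the idea, not a specific design from the talk.

```python
def flip_bit(value, bit):
    """Model a particle strike by toggling one bit of an integer result."""
    return value ^ (1 << bit)


def checked_add(a, b, inject_fault_bit=None):
    """Add twice: once on a 'primary' unit that may suffer a bit flip,
    once on a 'checker' assumed to be fault free. Returns (result, results_match)."""
    primary = a + b
    if inject_fault_bit is not None:
        primary = flip_bit(primary, inject_fault_bit)
    checker = a + b
    return primary, primary == checker


print(checked_add(2, 3))                      # no fault injected
print(checked_add(2, 3, inject_fault_bit=1))  # bit 1 of the result is toggled
```

When the two results disagree, the hardware knows an error occurred and can re-execute; deciding which copy is correct needs additional redundancy.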
22
Research Methodologies
It’s all about the simulators!
• SimpleScalar, Wattch, and HotSpot: about 10,000 lines of C code that model the flow of instructions through a modern processor
• Inputs: configuration file that specifies processor parameters, benchmark program (say, gzip)
• Outputs: how long the program runs on the simulated processor (SimpleScalar), how much power is consumed (Wattch), and the peak temperature (HotSpot)
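The input/output shape of such a simulator can be sketched with a toy model. This is not SimpleScalar and does not use its configuration format: `ProcessorConfig`, `simulate`, the trace encoding, and the 100-cycle miss penalty are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ProcessorConfig:
    issue_width: int   # instructions that can start per cycle
    miss_penalty: int  # stall cycles charged per cache miss


def simulate(config, trace):
    """Toy timing model: cycles to issue every instruction plus stalls for misses.
    The trace is a list of 'alu', 'load_hit', and 'load_miss' strings."""
    issue_cycles = math.ceil(len(trace) / config.issue_width)
    stall_cycles = trace.count("load_miss") * config.miss_penalty
    return issue_cycles + stall_cycles


# Stand-in for a benchmark: 1000 instructions, one in four misses the cache.
trace = ["alu", "load_hit", "alu", "load_miss"] * 250

narrow = ProcessorConfig(issue_width=1, miss_penalty=100)
wide = ProcessorConfig(issue_width=4, miss_penalty=100)

print("narrow core cycles:", simulate(narrow, trace))
print("wide core cycles:  ", simulate(wide, trace))
```

In this toy setup, quadrupling the issue width barely changes the cycle count because memory stalls dominate; a real simulator models the pipeline, caches, and overlap between them in far more detail.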
23
Evaluating a New Idea
• Lots of reading (it’s better than waiting for divine inspiration)
• Identify bottlenecks, identify problems, develop an idea, repeatedly question that idea
• Understand simulator
• Engineer a solution, modify simulator code (perhaps, write fewer than 1000 lines of C code)
• Analyze data (things never work the first time), engineer/optimize/debug your solution
• Write papers
• Implement in silicon?
24
To Learn More…
• CS/EE 3810: Computer Organization
• CS/EE 6810: Computer Architecture
• CS/EE 7810: Advanced Computer Architecture
• CS/EE 7820: Parallel Computer Architecture
• CS 7937 / 7940: Architecture Reading Seminar