Post on 25-Jun-2018
Overview
• Teaching Staff
• Introduction to Computer Architecture
  • History
  • Future / Trends
  • Significance
• The course
  • Content
  • Workload
  • Administrative Matters
2[ CS5222 Adv. Comp. Arch. AY1415S2 ]
Who am I?
• Dr. Soo Yuen Jien
• Contact Information:
  • Room: COM2 #02-61
  • Consultation Hour: Friday 3pm-5pm, Wednesday after lecture; email me for other timing
  • Email: sooyj@comp.nus.edu.sg
• Comments / Suggestions welcome

Computer Architecture: Definition
• Architecture (in Computing): the organization of the components and functionalities of a system
• Computer Architecture: the study of computer (processor) architecture
  • To maximize performance within constraints
  • Typically classified into 3 categories: Instruction Set, MicroArchitecture, System Design

The 3 Categories
• Instruction Set
  • The hardware/software interface
  • Exposes the functionalities to the programmer
• MicroArchitecture
  • Organization of components
  • Techniques / mechanisms for performance
• System Design
  • Interconnection, data path
  • Memory hierarchy

Computer Architecture vs Hardware Engineering
• Computer Architecture:
  • Describes the behavior of the processor
  • Describes the high-level mechanisms / techniques for better performance
• Hardware Engineering:
  • Concerned with the actual implementation of the architecture
  • Logic / circuit implementation, packaging, cooling, transistor process technology, etc.

Computer System: The Brief History
• Let's review the progress of computer systems in the past:
  1. Follow the thread of the "Personal" Computer
  2. Another thread on high-end supercomputers
• Observe the progress in terms of:
  • Speed (operations / second)
  • Size
  • Availability and cost

The Brief History: 1946 - ENIAC
• ENIAC: world’s first programmable electronic digital computer
• 1,900 additions per second
• 18,000 vacuum tubes
• 30 tons, 80 by 8.5 feet

The Brief History: 1951 - UNIVAC
• UNIVAC: first commercial computer in the US
• Uses the von Neumann design
• 2,000 additions per second for $1 million
• Sold 48 copies

The Brief History: 1964 – IBM 360
• IBM System/360: six implementations with varying price and performance
• An example: 2 MHz, 128KB-256KB memory, 500K operations/sec for $1M
• All binary compatible, redefines the industry!

The Brief History: 1965 – PDP-8
• DEC PDP-8: first minicomputer
• 4K of 12-bit words
• 4 registers
• 330K operations per second for $16,000
• Sold 50,000 copies!

The Brief History: 1971 – Intel 4004
• Intel 4004: first microprocessor (single-chip CPU)
• 4-bit processor for calculators
• 1KB data + 4KB program memory
• Only 2,300 transistors
• 16-pin package
• 740 kHz
• 100K operations per second

The Brief History: 1977 – Apple II
• Apple II: first personal computer
• 1 MHz clock, 4 kB of RAM, $1300
• ~200K operations per second

The Brief History: 1981 – IBM PC
• IBM PC: the system that shaped the IT industry as we know it
• Intel 8088 processor
• 4.77 MHz, 16-256 kB RAM
• 240K operations per second for $3000!

The Brief History: 2003 – Pentium 4
• Intel Pentium 4 processor
• Clock speed 3.0 GHz for around $300
• 169 million transistors
• 6000M operations/sec

The Brief History: 2011 – Intel i7
• Intel Core i7 processor
• Clock speed 3.2 GHz for around $500
• ~120 GFLOPS

The Brief History: Supercomputer
Linpack Performance (teraflops):
• Nov 2008: Road-Runner (US), 1,105.0
• Nov 2009: Jaguar (US), 1,759.0
• Nov 2010: TianHe (China), 2,566.0
• Nov 2011: K-Computer (Japan), 10,510.0
• Nov 2012: Titan (US), 17,590.0
• Nov 2013: TianHe-2 (China), 33,826.7

Summary: From a few to many
Transistors have been the building block of CPUs since the 1960s
Transistors per CPU chip:
• 1970 - 1980: 2K – 100K
• 1980 - 1990: 100K – 1M
• 1990 - 2000: 1M – 100M
• 2000 - 2011: 100M – 2.2B
Current world population = 7 billion: about the number of transistors in 3 CPU chips!

Summary: From BIG to small
Process size = minimum length of a transistor
• 80286 (1982): 1.5 µm
• Pentium (1993): 0.80 µm - 0.25 µm
• Pentium 4 (2000): 0.180 µm - 0.065 µm
• Core i7 (2010): 0.045 µm - 0.032 µm
Wavelength of visible light = 350nm (violet) to 780nm (red)
Process size is now smaller than the wavelength of violet light!

Summary: From S-L-O-W to fast
FLOPS = FLoating-point Operations Per Second
• 80286 (1982): 1.8 MIPS*
• Pentium (1993): 200 MFLOPS#
• Pentium 4 (2000): 4 GFLOPS#
• Core i7 (2011): 120 GFLOPS#

Summary: The Brief History
• Unprecedented progress since the late 1940s
• Performance doubling every ~2 years (1971-2005): total of 36,000X improvement!
  • If the transportation industry had matched this improvement, we could travel from Singapore to Shanghai, China in about a second for roughly a few cents!
• Incredible amount of innovation to revolutionize the computing industry again and again

Moore’s Law
• Intel co-founder Gordon Moore "predicted" in 1965 that transistor density would double every year, and revised this in 1975 to every 2 years (the often-quoted "18 months" is a later popularization)

Growth in Processor Performance
• Prior to mid-80s
  • Largely technology driven
  • Average 25% performance gain per year
• Mid-80s to 2002
  • Driven by technology, instruction set (RISC), and organization
  • Average 52% performance gain per year
  • Factor of seven gain from organization
• 2002 onwards
  • Average 20% performance gain per year
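
To get a feel for what these yearly gains mean, the sketch below compounds them. It is only an illustration: the function names are made up for this example, and the 10-year window is an arbitrary assumption, not a figure from the slide.

```python
import math


def cumulative_gain(rate_per_year: float, years: int) -> float:
    """Total speedup after compounding a constant yearly gain."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year: float) -> float:
    """Years needed for performance to double at a constant yearly gain."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


# Yearly gains quoted on the slide; the 10-year window is an assumption.
eras = [("prior to mid-80s", 0.25), ("mid-80s to 2002", 0.52), ("2002 onwards", 0.20)]
for label, rate in eras:
    print(f"{label}: doubles every {doubling_time(rate):.1f} years, "
          f"{cumulative_gain(rate, 10):.1f}x over 10 years")
```

The gap between 52% and 20% per year looks small on paper but compounds into roughly an order of magnitude over a decade.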

The Three Walls
Three major reasons why growth in uniprocessor performance is unsustainable:
1. The Memory Wall: increasing gap between CPU and main memory speed
2. The ILP Wall: decreasing amount of "work" (instruction-level parallelism) for the processor
3. The Power Wall: increasing power consumption of the processor

The Memory Wall
• Memory access speed increases at about 10% / yr
• Processor speed increases at about 50% / yr
• Memory is now an order of magnitude slower than the processor
  • E.g. Intel Core i7 has a 0.3ns cycle, DDR3 SDRAM latency is ~10ns
• Increasing amount of chip area dedicated to on-chip cache
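
The two growth rates and the Core i7 / DDR3 numbers above can be turned into a quick back-of-the-envelope calculation. This is a rough sketch: the helper names are hypothetical, and the 10-year horizon is an assumption chosen only for illustration.

```python
def gap_growth(cpu_rate: float, mem_rate: float, years: int) -> float:
    """Factor by which the processor/memory speed gap widens over `years`."""
    return ((1.0 + cpu_rate) / (1.0 + mem_rate)) ** years


def cycles_per_access(mem_latency_ns: float, cycle_ns: float) -> float:
    """Processor cycles that elapse during one memory access."""
    return mem_latency_ns / cycle_ns


# Slide figures: 0.3 ns cycle time, ~10 ns DDR3 latency.
print(f"cycles lost per memory access: {cycles_per_access(10.0, 0.3):.0f}")

# Slide figures: 50%/yr processor vs 10%/yr memory; 10 years is an assumption.
print(f"gap growth over 10 years: {gap_growth(0.50, 0.10, 10):.1f}x")
```

Every uncached memory access therefore costs the processor tens of cycles, which is why more and more chip area goes to on-chip cache.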

The ILP Wall
• Instruction Level Parallelism (ILP) is the number of instructions that can be executed in parallel
  • The main source of performance for superscalar processors
• Implicit ILP, discovered on-the-fly by the processor, is very limited
  • Average ~3 instructions (depends!!)
• Move to explicit ILP
  • Parallel programming and execution

The Power Wall
• We can now cram more transistors into a chip than we have the ability (power) to turn on!

Power Consumption: A comparison
• ~10 megawatts
• 1 HDB block: ~50 kilowatts
• ~500 watts
• Fridge: ~600 watts

The Power Wall: Challenges
• Mobile / Portable (cell phone, laptop, PDA)
  • Battery life is critical
• Desktop
  • 400 million computers in the world
  • 0.16PW (PetaWatt = 10^15 Watt) of power dissipation
  • Equivalent to 26 nuclear power plants
• Data centers
  • A single server rack draws between 5 and 20 kW
  • 100s of those racks in a single room

Meeting the challenge
• Hyper-Threading Technology (HTT) in Xeon and Pentium 4
  • Allows one physical processor to appear and behave as two virtual processors to the operating system
  • Two independent threads give more ILP!
• Intel dual-core (Pentium D)
  • Multiple microprocessor cores on a single chip
Copyright © 2005 Intel

Parallelism saves Power
• Dynamic Power = C × V^2 × f
  • C = capacitance, V = voltage, f = clock frequency
  • Performance is proportional to clock frequency
• Exploit explicit parallelism to reduce power using additional cores
  • Increased density (= more transistors = more capacitance)
  • Can increase cores (2x) and performance (2x)
  • Or increase cores (2x) but decrease frequency (f/2)
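
The trade-off in the dynamic power formula can be checked numerically. The sketch below uses normalized units (a single core is C = V = f = 1). The last case assumes the supply voltage can be lowered to 0.8 once the clock is halved; that value is a hypothetical figure chosen for illustration, not something stated on the slide.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic power = C * V^2 * f (normalized units)."""
    return capacitance * voltage ** 2 * frequency


# Baseline: one core, normalized to C = V = f = 1.
one_core = dynamic_power(1.0, 1.0, 1.0)

# Two cores at full frequency: 2x capacitance, 2x performance, 2x power.
two_cores_full = dynamic_power(2.0, 1.0, 1.0)

# Two cores at f/2: same aggregate performance and, at the same voltage, same power.
two_cores_half = dynamic_power(2.0, 1.0, 0.5)

# Assumption: the slower clock tolerates a lower supply voltage (0.8 is hypothetical).
two_cores_half_low_v = dynamic_power(2.0, 0.8, 0.5)

print(one_core, two_cores_full, two_cores_half, two_cores_half_low_v)
```

Because voltage enters squared, any voltage reduction made possible by the lower clock is where the actual power saving comes from.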

Multicore Revolution
• Chip density is continuing to increase ~2x every 2 years
  • Clock speed is not
  • Number of processor cores may double instead

Multicore Revolution: Industry
• All microprocessor companies switch to MP (2X CPUs / 2 yrs)
  • Procrastination results in 2X sequential perf. / 5 yrs
• Current state:
  • Intel i7 has 6 cores
  • The STI Cell processor (PS3) has 8 cores
  • nVidia Tesla GPU has up to 512 cores
  • Intel MIC has > 50 cores
“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
Paul Otellini, President, Intel (2005)

Multicore/Manycore Roadmap
• Multicore: 2X / 2 yrs ≈ 64 cores in 8 years
• Manycore: 8X to 16X multicore
[Chart: number of cores (log scale, 1 to 1000) by year, 2003 to 2015; data labels 1, 2, 4, 8, 16, 32, 64 and 64, 128, 256, 512]
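
The "2X / 2 yrs" rule is easy to project forward. The sketch below is illustrative only: the function name is made up, and the starting points (4 cores for the 8-year case, 1 core in 2003 for the chart range) are assumptions that happen to reproduce the 64-core figure.

```python
def projected_cores(start_cores: int, years: int, doubling_period: int = 2) -> int:
    """Core count after `years`, doubling once every `doubling_period` years."""
    return start_cores * 2 ** (years // doubling_period)


# Assumption: starting from 4 cores, 8 years of doubling every 2 years.
print(projected_cores(4, 8))

# Assumption: 1 core in 2003, projected across the 2003-2015 range.
for year in range(2003, 2016, 2):
    print(year, projected_cores(1, year - 2003))
```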

Architecture Outlook
• Expect modestly pipelined processors
  • Small cores not much slower than large cores
• Parallelism is the energy-efficient path to performance
  • Lower threshold and supply voltages lower the energy per operation
• Small, regular processing elements are easier to verify
• Heterogeneous processors
  • Special function units to accelerate popular functions

Multicore: Impacts
• All major processor vendors are producing multicore chips
  • Every machine will soon be a parallel machine
  • All programmers will be parallel programmers???
• Complexity may eventually be hidden in libraries, compilers, and high-level languages
  • But a lot of work is needed to get there
• Big open questions:
  • What will be the killer apps for multicore machines?
  • How should the chips be designed, and how will they be programmed?
  • Many others…

Parallel Revolution May Fail
“…when we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.”
John Hennessy, President, Stanford University, 1/07

• 100% failure rate of Parallel Computer Companies
  • Convex, Encore, MasPar, NCUBE, Kendall Square Research, Sequent, (Silicon Graphics), Transputer, Thinking Machines, …
• What if IT goes from a growth industry to a replacement industry?
  • If SW can’t effectively use multiple cores per chip
  • SW no faster on new computer
  • Only buy if computer wears out

Parallel Computing: A view from Berkeley
• Applications
  1. What are the applications?
  2. What are common kernels of the applications?
• Architecture and Hardware
  3. What are the HW building blocks?
  4. How to connect them?
• Programming Model and Systems Software
  5. How to describe applications and kernels?
  6. How to program the hardware?
• Evaluation
  7. How to measure success?

Compiler Challenges
• Heterogeneous processors
• Increase in the design space for code optimization
• Auto-tuners: optimizing code at runtime
• Software-controlled memory management
  • Example: Cell processor

Parallel Programming Challenges
• Finding enough parallelism (Amdahl’s Law)
• Granularity
• Locality
• Load balance
• Coordination and synchronization
• Debugging
• Performance modeling
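
Amdahl's Law, mentioned in the first challenge above, bounds the speedup of a program whose serial part cannot be parallelized: speedup = 1 / ((1 - p) + p / n) for parallel fraction p on n cores. A minimal sketch follows; the function name and the 90% parallel fraction are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's Law for a given parallel fraction and core count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumption: 90% of the program is parallelizable.
for n in (2, 8, 64, 512):
    print(f"{n:4d} cores: {amdahl_speedup(0.9, n):.2f}x")
```

With a 10% serial part the speedup can never reach 10x, no matter how many cores are added, which is why finding enough parallelism heads the list.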

What will we learn in CS5222?
• Instruction-Level Parallelism (ILP)
  • Pipelining
  • Dynamic scheduling (superscalar out-of-order)
  • Static scheduling (VLIW processors)
  • Branch prediction
• Multi-threaded processors
• Multiprocessors
  • Symmetric shared-memory architectures
  • Synchronization
  • Memory consistency
• Memory Hierarchy Design

Where can CS5222 take you?
• Advanced Compiler
• System Software
• Operating System
• High Performance Computing
• Parallel Computing

We expect you to know
• Computer Organization (CS2100)
• Multi-Core Architecture (CS4223)
  • Significant overlap in topics, but more in depth
• Instruction set concepts:
  • RISC instruction set design philosophy
  • Registers, instructions, etc.
• Simple pipelining
• Basic caches, main memory
• Low-level programming experience
  • C is very likely to be needed

Reference
• Computer Architecture: A Quantitative Approach, 4th Edition
• Hennessy & Patterson
• Published by Morgan Kaufmann

Resources
• Primary and only information source is IVLE
• Workbin:
  • Lecture notes
  • Assignment submissions
• Forum:
  • Ask course-related technical questions in the forum
  • Email is only for your personal concerns