21st Century Computer Architecture
A community white paper

http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf

Technion, Haifa Israel, June 2013
• Information & Commun. Tech’s Impact
• Semiconductor Technology’s Challenges
• Computer Architecture’s Future
• Example: Bypassing Paged Virtual Memory

White Paper Participants

Sarita Adve, U Illinois *
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington *
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mark D. Hill, U Wisconsin *,**
Mary Jane Irwin, Penn State U *
David Kaeli, Northeastern U *
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U *
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois *
Thomas F. Wenisch, U Michigan *
David Wood, U Wisconsin *
Katherine Yelick, UC Berkeley/LBNL *

“*” contributed prose; “**” effort coordinator
Thanks to CCC, Erwin Gianchandani & Ed Lazowska for guidance and Jim Larus & Jeannette Wing for feedback

20th Century ICT Set Up
• Information & Communication Technology (ICT) Has Changed Our World
  o <long list omitted>
• Required innovations in algorithms, applications, programming languages, … , & system software
• Key (invisible) enablers of (cost-)performance gains
  o Semiconductor technology (“Moore’s Law”)
  o Computer architecture (~80x per Danowitz et al.)

Enablers: Technology + Architecture
[Figure: Danowitz et al., CACM 04/2012, Figure 1, with regions labeled Technology and Architecture]

21st Century Promise
• ICT Promises Much More
  o Data-centric personalized health care
  o Computation-driven scientific discovery
  o Human network analysis
  o Much more: known & unknown
• Characterized by
  o Big Data
  o Always Online
  o Secure/Private
  o …

Whither enablers of future (cost-)performance gains?

Technology’s Challenges 1/2

Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip

Classic CMOS Dennard Scaling: the Science behind Moore’s Law (Finding 2)

Scaling:
  Voltage: V/a
  Oxide: tOX/a
Results:
  Power/ckt: 1/a^2
  Power Density: ~Constant

Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011
National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)

Post-classic CMOS Dennard Scaling

Scaling:
  Voltage: V/a → V (voltage no longer scales)
  Oxide: tOX/a
Results:
  Power/ckt: 1/a^2 → 1
  Power Density: ~Constant → a^2

Post Dennard CMOS Scaling Rule TODO: Chips w/ higher power (no), smaller (), dark silicon (), or other (?)

National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
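The arithmetic behind the classic and post-classic scaling results can be checked with a short sketch. It assumes the standard dynamic-power model P = C·V²·f, with capacitance shrinking by 1/a and frequency rising by a per generation; neither assumption is stated on the slides, and the value of a below is hypothetical.

```python
# Sketch of Dennard scaling arithmetic under the assumed model P = C * V^2 * f.
# All quantities are relative to the previous generation (1.0 = unchanged).

def scale_generation(a, voltage_scales):
    """Return relative (power_per_circuit, power_density) after scaling by a."""
    capacitance = 1.0 / a                 # assumed: C shrinks by 1/a
    frequency = a                         # assumed: circuits switch a times faster
    voltage = 1.0 / a if voltage_scales else 1.0
    area = 1.0 / a**2                     # each circuit occupies 1/a^2 of the area
    power_per_circuit = capacitance * voltage**2 * frequency
    power_density = power_per_circuit / area
    return power_per_circuit, power_density

a = 1.4  # hypothetical linear scaling factor per generation
for label, scales in (("classic", True), ("post-classic", False)):
    power, density = scale_generation(a, voltage_scales=scales)
    print(f"{label}: power/ckt = {power:.3f}, power density = {density:.3f}")
```

With voltage scaling, power per circuit falls as 1/a^2 and density stays constant. With voltage fixed, power per circuit stays at 1 and density grows as a^2, which is why power per chip cannot keep doubling.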

Technology’s Challenges 2/2

Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability | Increasing transistor unreliability can’t be hidden
Focus on computation over communication | Communication (energy) more expensive than computation
1-time costs amortized via mass market | One-time cost much worse & want specialized platforms

How should architects step up as technology falters?

21st Century Comp Architecture

20th Century | 21st Century
Single-chip in generic computer | Architecture as Infrastructure: Spanning sensors to clouds. Performance plus security, privacy, availability, programmability, …
Performance via invisible instr.-level parallelism | Energy First ● Parallelism ● Specialization ● Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …). Rethink: memory & storage, reliability, communication

Cross-Cutting: Break current layers with new interfaces

What Research Exactly?
• Research areas in white paper (& backup slides)
  1. Architecture as Infrastructure: Spanning Sensors to Clouds
  2. Energy First
  3. Technology Impacts on Architecture
  4. Cross-Cutting Issues & Interfaces
• Much more research developed by future PIs!
• E.g.: Efficient Virtual Memory for Big Memory Servers
  o Basu, Gandhi, Chang, Hill, & Swift [ISCA 2013]
  o Big Memory: graph500, memcached, databases
  o Self-manage most memory (e.g., bufferpool)

Execution Time Overhead: TLB Misses
[Figure: execution time overhead from TLB misses; lower is better]
1. Significant waste
2. Larger memory?
3. Byte-addr NVM?
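One way to see why larger memory raises the concern is TLB reach, the amount of memory a TLB can map at once (entries × page size). The sketch below is illustrative only: the entry count, page size, and memory sizes are hypothetical and are not taken from the slides or the measured systems.

```python
# Sketch: TLB reach as a fraction of physical memory, with hypothetical sizes.

KIB = 1024
GIB = 1024**3

def tlb_reach_bytes(entries, page_size_bytes):
    """Memory covered by a fully populated TLB."""
    return entries * page_size_bytes

def reach_fraction(entries, page_size_bytes, memory_bytes):
    """Fraction of memory the TLB can map without a miss."""
    return tlb_reach_bytes(entries, page_size_bytes) / memory_bytes

entries = 64          # hypothetical TLB size
page_size = 4 * KIB   # hypothetical page size
for memory_gib in (1, 16, 256):
    fraction = reach_fraction(entries, page_size, memory_gib * GIB)
    print(f"{memory_gib:>4} GiB memory: TLB covers {fraction:.6%}")
```

With a fixed TLB, the covered fraction shrinks in proportion as memory grows, so workloads with large, randomly accessed footprints miss more often.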

Hardware: Direct Segment
[Diagram: a virtual address (VA) maps to a physical address (PA) via (1) Conventional Paging or (2) Direct Segment, which uses BASE, LIMIT, and OFFSET]

Why Direct Segment?
• Matches Big Memory Workload needs
• NO Paging => NO TLB Miss
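A minimal functional sketch of the two translation paths follows. It assumes the segment covers virtual addresses in [BASE, LIMIT) and that the physical address there is VA + OFFSET. The `page_table` dict and `PAGE_SIZE` are hypothetical stand-ins for the TLB and page walk, not the actual hardware design.

```python
# Sketch: direct-segment translation with a conventional-paging fallback.

PAGE_SIZE = 4096  # assumed page size for the paging path

def translate(va, base, limit, offset, page_table):
    """Translate virtual address va to a physical address.

    base, limit, offset model the BASE, LIMIT, and OFFSET registers.
    page_table is a hypothetical dict {virtual page number: physical frame}.
    """
    if base <= va < limit:
        # (2) Direct segment: one addition, no TLB lookup, so no TLB miss.
        return va + offset
    # (1) Conventional paging for addresses outside the segment.
    vpn, page_offset = divmod(va, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + page_offset

# Hypothetical register values and a one-entry page table.
BASE, LIMIT, OFFSET = 0x100000, 0x200000, 0x4000000
page_table = {0: 7}
print(hex(translate(0x100010, BASE, LIMIT, OFFSET, page_table)))  # segment path
print(hex(translate(0x10, BASE, LIMIT, OFFSET, page_table)))      # paging path
```

Addresses inside the segment never consult the page table, which is the sense in which no paging means no TLB miss. Everything else keeps ordinary paged behavior.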

Execution Time Overhead: TLB Misses
92-100% TLB “misses” to direct segment
Requires: Both small SW + small HW changes


Back Up Slides
• Detailed research areas in white paper
  1. Architecture as Infrastructure: Spanning Sensors to Clouds
  2. Energy First
  3. Technology Impacts on Architecture
  4. Cross-Cutting Issues & Interfaces
  http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
• Findings on National Academy “Game Over” Study
• Glimpse at DARPA/ISAT Workshop “Advancing Computer Systems without Technology Progress”

1. Architecture as Infrastructure: Spanning Sensors to Clouds
• Beyond a chip in a generic computer
• To pillar of 21st century societal infrastructure.
  o Computation in context (sensor, mobile, …, data center)
  o Systems often large & distributed
  o Communication issues can dominate computation
  o Goals beyond performance (battery life, form factor)
• Opportunities (not exhaustive)
  o Reliable sensors harvesting (intermittent) energy
  o Smart phones to Star Trek’s medical “tricorder”
  o Cloud infrastructure suitable for both “Big Data” streams & low-latency quality-of-service with stragglers
  o Analysis & design tools that scale

2. Energy First
• Beyond single-core performance
• To (cost-)performance per watt/joule
• Energy across the layers
  o Circuit/technology (near-threshold CMOS, 3D stacking)
  o Architecture (reducing unnecessary data movement)
  o Software (communication-reducing algorithms)
• Parallelism to save energy
  o Vast (fine-grained) homogeneous & heterogeneous
  o Improved SW stack
  o Applications focus (beyond graphics processing units)
• Specialization for performance & energy efficiency
  o Abstractions for specialization (reducing 1-time cost)
  o Energy-efficient memory hierarchies
  o Reconfigurable logic structures

3. Technology Impacts on Architecture
• Beyond CMOS, DRAM, & Disks of last 3+ decades to
• Using replacement circuit technologies
  o Sub/near-threshold CMOS, QWFETs, TFETs, and QCAs
• Non-volatile storage
  o Beyond flash memory to STT-RAM, PCRAM, & memristor
• 3D die stacking & interposers
  o logic, cache, small main memory
• Photonic interconnects
  o Inter- & even intra-chip
• Design automation
  o from circuit-design w/ new technologies to
  o pre-RTL functional, performance, power, area modeling of heterogeneous chips & systems

4. Cross-Cutting Issues & Interfaces
• Beyond performance w/ stable interfaces to
• New design goals (for pillar of societal infrastructure)
  o Verifiability (bugs kill)
  o Reliability (“dependability” computing base?)
  o Security/Privacy (w/ non-volatile memory?)
  o Programmability (time to correct-performant solution)
• Better Interfaces
  o High-level information (quality of service, provenance)
  o Parallelism ((in)dependence, (lack of) side-effects)
  o Orchestrating communication ((recursive) locality)
  o Security/Reliability (fine-grain protection)

Executive summary (Added to National Academy Slides)

Highlights of National Academy Findings
(F1) Computer hardware has transitioned to multicore
(F2) Dennard scaling of CMOS has broken down
(F3) Parallelism and locality must be exploited by software
(F4) Chip power will soon limit multicore scaling

Eight recommendations from algorithms to education

We know all of this at some level, BUT:
Are we all acting on this knowledge or hoping for business as usual?
Thinking beyond next paper to where future value will be created?
– Questions Asked but Not Answered Embedded in NA Talk
– Briefly Close with Kübler-Ross Stages of Grief: Denial … Acceptance

Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011
Mark Hill talk (http://www.cs.wisc.edu/~markhill/NRCgameover_wisconsin_2011_05.pptx)

The Graph
[Figure: System Capability (log) by decade, 80s through 50s, with labels CMOS, Fallow Period, New Technology, and Our Focus]

Source: Advancing Computer Systems without Technology Progress, ISAT Outbrief (http://www.cs.wisc.edu/~markhill/papers/isat2012_ACSWTP.pdf), Mark D. Hill and Christos Kozyrakis, DARPA/ISAT Workshop, March 26-27, 2012.

Approved for Public Release, Distribution Unlimited. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

Surprise 1 of 2

• Can Harvest in the “Fallow” Period!

• 2 decades of Moore’s Law-like perf./energy gains

• Wring out inefficiencies used to harvest Moore’s Law

HW/SW Specialization/Co-design (3-100x)

Reduce SW Bloat (2-1000x)

Approximate Computing (2-500x)

---------------------------------------------------

~1000x = 2 decades of Moore’s Law!
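The "~1000x = 2 decades" equivalence can be checked with one line of arithmetic. The sketch assumes a doubling every two years, a common statement of Moore's Law; the slide itself does not give a doubling period.

```python
# Sketch: cumulative gain from repeated doubling (assumed 2-year period).

def moore_gain(years, doubling_period_years=2.0):
    """Relative gain after `years` if capability doubles every period."""
    return 2 ** (years / doubling_period_years)

print(moore_gain(20))  # 10 doublings: prints 1024.0
```

Ten doublings give 1024x, which is the ~1000x the slide equates with two decades of Moore's Law-like gains.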


“Surprise” 2 of 2

• Systems must exploit LOCALITY-AWARE parallelism

• Parallelism Necessary, but not Sufficient

• As communication’s energy costs dominate

• Shouldn’t be a surprise, but many are in denial

• Both surprises hard, requiring “vertical cut” thru SW/HW
