Avinoam Kolodny Power dissipation! · 8085 NMOS to CMOS Pentium 3 transition 2 i iDD allnodes P CV...

Avinoam Kolodny

The VLSI Interconnect Challenge

Electrical Engineering Department

Technion – Israel Institute of Technology

VLSI Challenges

�System complexity

�Performance

�Tolerance to digital noise and faults

�More challenges…

��

The Dominant Challenge is

Power dissipation!

Interconnect is the crux of the problem

��

Interconnect is the crux of the problem

“Old view” of VLSI:

�Speed and power are dominated by logic gates

�Wires are “ideal”

“New view ” : �Logic is fast and

virtually free

�Speed and power are limited by wires

��

Outline of this talk

�Background of the VLSI interconnect challenge

�Implications for energy-efficient computing

�Research directions�

��

2001 - Extrapolation Towards A Power Crisis

1970 1980 1990 2000 2010 2020

Hot Plate

Nuclear Reactor

Rocket Nozzle

Pentium 4

Pentium Pro

i486 8080

i286 Pentium

Pentium 3

NMOS to CMOS transition

��

VLSI hits the power wall

1970 1980 1990 2000 2010 2020

Pentium 4

Pentium Pro

i486 8080

i286 Pentium

Pentium 3 NMOS to CMOS transition

� ��

��

� Intel’s Pentium-M, low-power microprocessor, 0.13 micron CMOS

�Bit-Transportation energy is larger than computation energy!

Interconnect Power: A case study - 2004

��

� ��

Chips are Like Cities: Complexity is Shown in Connectivity

�In each generation of technology:

More transistors

More interconect wires

Technology Scaling: Faster Transistors, Slower Wires

��

Technology Scaling: Faster Transistors, Slower Wires

��

Trying to Keep Wire Resistance in Check Leads to Larger Capacitances

��

2 1 3 k

��

If Bits Were Cars…. The Nature of Design for Low-Power

�No critical root cause Because power is cumulative

Need power-saving efforts at all levels!

� ��

3 Types of Improvement

�1 Reduce waste of energy �2 (Optimal) Tradeoff power with delay (or other metric) �3 Change the algorithm or computational task

� ��

��

�� iccad��

��

Wire layout optimization:

Wire Widths and Spaces in a Wire Bundle

��

� ��

��

� ��

��

� � ��

��

Finding Optimal Wire Widths and Spaces under Delay Constraints

��

�� Integration, 2014.�

��

Optimal Ordering Theorem for Power

�Given an interconnect channel with wires of uniform width W, use ‘Symmetric Hill’ ordering according to activity factors of the signals

��

�� INTEGRATION, 2007��

*R. J. Gutmann et al., “Three-Dimensional (3D) ICs: A Technology Platform for Integrated Systems and Opportunities for New Polymeric Adhesives,” Proceedings of the Conference on Polymers and Adhesives in Microelectronics and Photonics, pp. 173-180, October 2001

Bulk CMOS

Substrate

Substrate 2nd plane

3rd plane

1st plane

Through silicon vias (TSV)

��

3-D Integrated Circuit Technology

��

Circuit optimization:

Optimal Power-Delay Tradeoff for Logic Paths

� � � ��

��

� ��

��

� ��

� ��For 2.5% delay increase, get 12x2.5=30% energy

reduction!

For 20% delay increase, get 2x20=40% energy

reduction

Most Power Savings Can be Made at High Abstraction Levels

� ��

��

Pollack’s Rule on Power Efficiency of Uniprocessors

��

� ��

��

�� = 𝜶𝟏 𝑨𝒓𝒆𝒂 �� 𝐰𝐞𝐫 = 𝜶𝟐 (𝑨𝒓𝒆𝒂)�

� � ��

��

Architecture optimization:

Processor System Evolution to CMP

��

�� = 𝜶𝟏 𝑨𝒓𝒆𝒂 �� 𝐰𝐞𝐫 = 𝜶𝟐 (𝑨𝒓𝒆𝒂)�

��

Chip Multi-

Processors� ��

Processor Architectures: Uni-core, Symmetric multicore, Asymmetric (Heterogeneous)

��

� � � �

Asymmetric Multi-Core Performance ACCMP Performance Vs. Power

� ��

� �

� � �

� � � �Relative Power

� ��

��

(25% serial code)

��

Classes of Replicated cores Standard modules (Processors,

Accelerators, Cache banks, ...)

Network on Chip (NoC)

Power management Different clocks

Different operating voltages

“dark silicon”

� ��

��

If a chip is like a city, Network-on-Chip (NoC) is like a subway system

Network on-Chip (NoC)

� ��

��

� ��

��

� ��

��

Issues Addressed by NoC

�Multi-Core Processor Systems (key to power-efficient computing)

��Global wire design (delay, power, noise, scalability, reliability issues)

��System integration productivity

(key to modular design)

��

Interconnect-aware and NoC-aware Architectural Research

��

Accessing On-Chip Cache Banks through a NoC

��

� �

��

� �

��

Where to Store the Shared Data?

What can be done better?

Bring shared data closer to all processors

Preserve vicinity of private data

��

A small number of lines, shared by many processors, is accessed numerous times

��

� ��

��

� ��

��

Memory Bottleneck

��

Memory

Control Unit

Arithmetic/ Logic Unit

Reducing Distances by Embedding Memory in Execution Units

��

Memristor Devices

� ��

� �

��

� ��

Sea of Memory

�Dense and fast

��

Towards Memory-Intensive Machines

� Constant-throughput-curves

� increase on-die-memory!

��

𝑩𝒂𝒏𝒅𝒘𝒊𝒅𝒕𝒉�

𝑻𝒉𝒓𝒐𝒖𝒈𝒉𝒑𝒖𝒕

𝑩𝒂𝒏𝒅𝒘𝒊𝒅𝒕𝒉�

� �

� ��

� �

Logic within the Memory Beyond von Neumann Architecture

��

Memory

Control Unit

Arithmetic/ Logic Unit

� ��

��

Summary �VLSI power is dominated by interconnect!

�New architectures are driven by interconnect distances/latencies/power

��

Avinoam Kolodny Power dissipation! · 8085 NMOS to CMOS Pentium 3 transition 2 i iDD allnodes P CV...

Documents

Transcript of Avinoam Kolodny Power dissipation! · 8085 NMOS to CMOS Pentium 3 transition 2 i iDD allnodes P CV...

Past & Present - Orith Kolodny

Andrew Kolodny

Avinoam Kolodny Technion – Israel Institute of Technology Intel PVPD Symposium July 2006

Desert Vegetation of Israel by Prof. Avinoam Danin

Avinoam Danin © Flora of the Shroud of Turin Avinoam Danin.

The 20th ACM Symposium on Parallelism in Algorithms and ...spaa/2008/SPAA-Program.pdfZvika Guz, Idit Keidar, Avinoam Kolodny and Uri C. Weiser Utilizing Shared Data in Chip Multiprocessors

1 Modeling and Optimization of VLSI Interconnect 049031 Lecture 6: Interconnect power Avinoam Kolodny Konstantin Moiseev.

Topics (cont.) Goals - Chula · Topics ¾Introduction ¾ ... Pentium 4 processor 2000 42,000,000 Pentium III processor 1999 24,000,000 Pentium II processor 1997 7,500,000 Pentium®

In-memory Accelerators with Memristors Yuval Cassuto Koby Crammer Avinoam Kolodny Technion – EE ICRI-CI Retreat May 8, 2013 PU MEM NVM.

Science 2011 Avinoam 589 92

Setup for AMIBIOS8 - American Megatrends Inc. - Home · Intel, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4, and Xeon are registered trademarks of Intel Corporation. MS-DOS,

Dr. Andrew Kolodny: "Reporting on America’s Opioid Drug Crisis" 4.11.17

1 Evgeny Bolotin – Efficient Routing, DATE 2007 Routing Table Minimization for Irregular Mesh NoCs Evgeny Bolotin, Israel Cidon, Ran Ginosar, Avinoam Kolodny.

Pentium I, Pentium II, Ranuras de expansión

High Performance Processor Architectureneeraj/doc/pentium/pentium.pdf · Intel Pentium processor 1993 3,100,000 Intel Pentium II processor 1997 7,500,000 Intel Pentium III processor

SLIP-2008, Newcastle upon Tyne, UKApril 5, 2008 Parallel vs. Serial On-Chip Communication Rostislav (Reuven) Dobkin Arkadiy Morgenshtein Avinoam Kolodny.

· THE INTEL MICROPROCESSORS 8086/8088,80186/80188,80286,80386,80 486,PENTIUM,PENTIUM PRO PROCESSOR, PENTIUM II PENTIUM III PENTIUM 4 AND CORE 2 WITH 64 BIT EXTENSIONS ...

Journal of Philosophy, Inc. Author(s): NIKO KOLODNY and ...

The Pentium Processor. Pentium family history Pentium processor details Pentium registers –Data –Pointer and index –Control –Segment Real mode memory.

1 Asynchronous Bit-stream Compression (ABC) IEEE 2006 ABC Asynchronous Bit-stream Compression Arkadiy Morgenshtein, Avinoam Kolodny, Ran Ginosar Technion.