Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar...

23
1 Multi-core Architectures Rakesh Kumar [email protected] Progress of processor technology/architecture Specint2000 1.00 10.00 100.00 1000.00 10000.00 85 86 87 88 89 9091 92 93 94 95 96 97 98 99 0001 02 03 04 05 Intel Alpha Sparc Mips HP PA Pow er PC AMD

Transcript of Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar...

Page 1: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

1

Multi-core Architectures

Rakesh [email protected]

Progress of processor technology/architecture

Specint2000

1.00

10.00

100.00

1000.00

10000.00

85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 00 01 02 03 04 05

Intel

Alpha

Sparc

Mips

HP PA

Pow er PC

AMD

Page 2: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

2

Price being paid

0.01

0.1

1

1 10 100 1000 10000

Spec2000

Wat

ts/S

pec

Intel

Alpha

Sparc

Mips

HP PA

Pow er PC

AMD

Lessons learned

� Marginal utility of transistors decreasing� If n be the number of transistors

�Power and Area are O(n)�Performance is O(sqrt(n))

• Wrong side of square law

� Increasingly difficult to squeeze performance�Not enough exploitable ILP in programs �Easy ILP already extracted

� More transistors available than we know to how make use of when applied to a single processor

Clearly, we have a problem!

Page 3: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

3

One way of handling a problem is….

� ..instead of confronting the problem try skipping to a simpler one

�Change the focus from single-thread performance to throughput

�Don’t have increasingly complex uniprocessors

� Have multiple simple processors on the same die instead [Olukotun et al, ASPLOS96]

�Each on-chip processor (called core) can execute a program now

We can now jump to the right side of the square law

� If n be the number of transistors on a die:� Area = O(n)� Performance = O(n1-x)

� Roughly O(sqrt(n))

� More aggregate performance (throughput) can be had using large number of small cores than small number of large cores� At the expense of single-thread performance

� For example,� In terms of area:

� 1 EV6�5 EV5 cores� In terms of throughput:

� 1 EV6�2.0-2.2 EV5 cores� 5EV5 cores >=2 EV6 cores

• Performance doubled just by having multiple cores!

The main motivation for having multi-core architecures

Page 4: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

4

Multi-core Architecture: Definition

A multi-core architecture (or a chip multiprocessor) is a general-purpose processor that consists of multiple cores on the same die and can execute programs

simultaneously

Multi-core architecture: Advantages

� (Relatively) High performance/watt

� (Relatively) High performance/area

� Simpler core�Possibility of lower cycle time, better optimisation etc.

�Ease of design, verification etc.

Page 5: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

5

So, the next question to ask obviously is…

How should one design a multi-core architecture?

This is the question I address in my thesis research

���� ���� ���� ����

���� ���� ��� ���

A Naive methodology for Multi-core Design

�� �� �� ��

�� �� � �

� ���������������� ��� ����������������

��� �!�� �"�# "����������

Page 6: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

6

Goals of my thesis research

� Demonstrate that the prior methodology is highly inefficient in terms of area and power

� Demonstrate the need to do holistic design of multi-core architectures�Subsystem design should be aware of the multi-core

architecture it is going to be a part of

� Propose and evaluate novel and efficient multi-core architecture design methodologies that follow a holistic approach

Assumptions inherent to the naïve approach

�All cores have to be the same

�Each core is distinct

�Core/memory and interconnect can be designed in isolation

I will talk about the first assumption today

Page 7: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

7

Before scrutinizing the “identical cores” assumption...

…let’s consider characteristics of typical workloads

There is enormous diversity among applications

Page 8: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

8

Implication of diversity on multi-core design

� If all cores are to be identical, then can’t address diverse workload demands�E.g. need to decide beforehand if the core targets gcc or

mcf

� Either way one application loses�Underutilization or low performance

An example multi-core architecture

���� �$��

Page 9: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

9

%&

%&�

%&�

%&�

%&

%&�

%&�

%&�

%&

%&�

%&�

%&�

%&

%&�

%&�

%&�

%&�%&�%&�

%&�%&�%&�

An example multi-core architecture

���� �$��

%&�

%&�

Processors and Program diversity

� Some applications will run much faster on an EV6 than on an EV5

� Others will take little advantage of the larger processor and run at the same speed on either

� With a homogeneous architecture, � you either have the former running very slowly on small

processors, � or the latter unnecessarily wasting the capabilities of the large

processor.

Page 10: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

10

An alternate multi-core architecture

%& %& %& %&

%&�

���� �$��

%&�

%&�

%&�%&�

An alternate multi-core architecture

'��� ��(������ ��"�$ ������$��)����� ������ ��������������%&��(����!�����$���( ����*���%&���

Page 11: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

11

Single-ISA Heterogeneous Multi-core Architectures

� Have multiple heterogeneous cores on the same die

�Each core-type represents a different point in the power performance space�i.e. while one core-type might be small low-

performance, low-power, some other core-type might be big high performance, high power

�Each core capable of executing the same ISA�Unlike SoCs/embedded heterogeneous multi-core

architecturesSuch an architecture will be highly efficient on workloads with diverse applications

Another Performance Advantage: Adjusts to varying TLP

Page 12: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

12

Another Performance Advantage: Adjusts to varying TLP

%& %& %& %&

%&�

���� �$�

%&�

%&�

%&�%&�

+,��$����$�� �$( ���$ �� ����������� ���$�(�� ���(��������

+-�����(��������$��$���������$�� ��(��*��� ���

Comparing Single-ISA Heterogeneous Architectures against Conventional CMPs

0

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of threads

Wei

ght

ed S

pee

dup

4EV6

Page 13: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

13

Comparing Single-ISA Heterogeneous Architectures against Conventional CMPs

0

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of threads

Wei

gh

ted

Sp

eed

up

4EV6 20EV5

A choice has to be made between throughput and ST performance

Comparing Single-ISA Heterogeneous Architectures against Conventional CMPs

0

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Number of threads

Wei

gh

ted

Sp

eed

up

4EV63EV6 & 5EV5 (static best)20EV5

+.(�����/���������$�� �$( ���$ ���%&

+0�� ����������%&�!� (�����1�����������������������$�� ��(��*��� ���

Best of both the worlds!

Page 14: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

14

Then there is intra-program diversity as well!

0

0.4

0.8

1.2

1.6

2

1 201 401 601 801

Committed instructions (in millions)

IPS

EV8-

EV6

EV5

EV4

Dynamic scheduling results

0

1

2

3

4

5

6

7

1 2 3 4 5 6 7 8

N umber o f threads

4EV6

3EV6 & 5EV5 (random)

3EV6 & 5EV5 (stat ic best)

3EV6 & 5EV5 (bounded-global-event )

Page 15: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

15

To sum up….

� Single-ISA Heterogeneous architectures a good design point for throughput as well as performance:

�Efficient use of die-area for a given thread-level parallelism� Provides low-latency for few application on powerful cores� A large number of applications can be hosted at once on simple cores

�Efficient adaptation to application diversity� Enables it approach the performance of an architecture with a large

number of complex cores� Provides higher performance in the same area than a conventional chip

multiprocessor

Talk Outline

�All cores have to be the same�Single-ISA heterogeneous multi-core

architectures�Performance Benefits�Power Benefits

Page 16: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

16

Reducing power for a conventional multi-core architecture

� Done at the core-level �Each core optimised for power and then replicated

multiple times�Multi-core oblivious

� Processor power reduction typically involves V/f scaling, gating etc for the core

� Power reduction techniques applied at single-core level have limited effectiveness

����2�3 ����

Page 17: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

17

����2�3 ����

�����������

�����

����2�3 ����

Page 18: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

18

4���"� ����(�#��!������ ����(�#��

�����������!��������(�#��

� Have multiple heterogeneous cores on the same die

� Match workload (or workload phase) to core that achieves best efficiency according to some objective function

� Power down the unused cores completely

������������� ���������������� � �������� ���� � ������

���� ���������

Page 19: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

19

An example Single-ISA heterogeneous multi-core architecture

26092.88EV8-

2417.80EV6

59.83EV5

34.97EV4

Core-area (in mm^2)Peak-power (in W)Processor

+��������� �(���������#��$�*� ������� ������*�5�($ �����������$��� ������

+5��������� �� ������������(�������������1�������� �������2��� ���1�3,6!���( ������ ����1�&

The processor only marginally bigger than EV8- !

7$����)��������������#���$���� ����(��������� ������������#��$���(��*��� ������ ���

Page 20: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

20

Choosing Dynamically the Core with Least Energy (perf. loss<10%)

0

0.4

0.8

1.2

1.6

2

1 201 401 601 801

Committed instructions (in millions)

IPS

EV8-

EV6

EV5

EV4

Choosing Dynamically the Core with Least Energy (perf. loss<10%)

5������������� ���

0

0.4

0.8

1.2

1.6

2

1 201 401 601 801

Committed instructions (in millions)

IPS

EV8-EV6EV5EV4Best-path

Page 21: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

21

Choosing Dynamically the Core with Least Energy (perf. loss<10%)

[Summary of results]

3.438.5Mean0.10.1Minimum8.577.3Maximum

Performance Degradation(%)

Energy Savings(%)

Results “verified” by other researchers using real prototypes

[Grochowski ICCD2004, Ghiasi CF2005]

Realistic heuristics

0

0.2

0.4

0.6

0.8

1

neighbour neighbor-global

random all Dynamicoracle

Nor

mal

ized

Val

ue (

wrt

EV

8-)

EnergyPerformance(1/execution-time)Energy-delay

5�$������ (����8�/��*��$���� ���

Page 22: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

22

To sum up…

� A single-ISA heterogeneous multi-core architecture offers enormous potential for even power-savings

� Realistic heuristics can achieve much of the savings potential

� Beats chip-wide voltage scaling handsomely (50.6% ED2

improvement)� Subsequent research has shown this technique to better than

dynamic V/f scaling, gating, adaptive optimizations etc. [Grochowski et al ICCD2004]

Bottomline

All cores do not have to be the same

In fact, should not be same

Page 23: Multi-core Architectures - University of California, San … Multi-core Architectures Rakesh Kumar rakumar@cs.ucsd.edu Progress of processor technology/architecture Specint2000 1.00

23

Summary of talk

�Decreasing marginal utility of transistors is leading us to multi-core architectures

�Conventional multi-core architectures have identical cores

�Having heterogeneous architectures lead to higher performance and lower power