Reducing Datapath Energy Through the Isolation of Short-Lived Operands
Dmitry Ponomarev, Gurhan Kucuk, Oguz Ergin, Kanad Ghose
Department of Computer Science, State University of New York, Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower



Outline

– Introduction
– Motivations
– Contributions
  Basic idea: isolate short-lived operands in a small dedicated register file and avoid their writes to the ROB and the ARF
  Resources impacted: ROB, ARF
  Power savings: 21% with a 32-entry additional RF
– Results
– Conclusions
– Future work


A P6-like Superscalar Datapath

[Figure: block diagram with Fetch (F1, F2), Decode/Dispatch (D1, D2), instruction dispatch into the issue queue (IQ), instruction issue to the function units FU1, FU2, …, FUm (EX), result/status forwarding buses, LSQ, D-cache, ROB, and the Architectural Register File (ARF)]


Out-of-Order Execution and In-Order Retirement

[Figure: pipeline with an in-order front end (F, R, D), an out-of-order core (instruction queue, execution), and in-order retirement through the ROB into the ARF]


Energy-dissipating Events

[Figure: the same pipeline (in-order front end, out-of-order core, in-order retirement) annotated with the energy-dissipating events: writes into the ROB and the ARF, and reads]


The Idea: Isolating Short-Lived Values

[Figure: the pipeline (in-order front end F, R, D; out-of-order core with the instruction queue and execution; in-order retirement through the ROB into the ARF) with a small SRF added]

Write short-lived values into a small dedicated RF (SRF).


Register Renaming

– Used to avoid false data dependencies.
– A new physical register is allocated for EVERY new result.
– P6 style: ROB slots serve as physical registers.

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, P2, 100
SUB P32, P31, P3
ADD P33, P32, P4


Register Renaming: the Implementation

– The Register Alias Table (RAT) maintains the mappings between logical and physical registers.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            1            1
2            2            1
3            3            1
4            4            1
5            5            1

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4


Register Renaming: the Implementation

– The Register Alias Table (RAT) maintains the mappings between logical and physical registers.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            5            1

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100


Register Renaming: the Implementation

– The Register Alias Table (RAT) maintains the mappings between logical and physical registers.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3


Register Renaming: the Implementation

– The Register Alias Table (RAT) maintains the mappings between logical and physical registers.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4
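The renaming walkthrough above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' hardware: the tuple instruction encoding and the names `RatEntry`, `make_rat`, `operand` and `rename` are hypothetical, ROB slots are handed out sequentially without wraparound, and, as in the walkthrough, operands still held in the ARF keep their R names.

```python
from dataclasses import dataclass

ROB, ARF = 0, 1  # Location field encoding: 0 = ROB, 1 = ARF


@dataclass
class RatEntry:
    phys: int      # ROB slot if location == ROB, else the architectural register
    location: int  # ROB or ARF


def make_rat(num_arch_regs):
    # Initially every architectural register lives in the ARF.
    return [RatEntry(r, ARF) for r in range(num_arch_regs)]


def operand(rat, src):
    """Name a source: P<slot> if its latest producer is in the ROB, else unchanged."""
    if isinstance(src, str):
        entry = rat[int(src[1:])]
        return f"P{entry.phys}" if entry.location == ROB else src
    return str(src)  # immediate operand


def rename(rat, instrs, first_rob_slot):
    """Rename (op, dest, *srcs) tuples; each result gets the next ROB slot (no wraparound)."""
    renamed = []
    for slot, (op, dest, *srcs) in enumerate(instrs, start=first_rob_slot):
        names = [operand(rat, s) for s in srcs]   # read the sources first
        rat[int(dest[1:])] = RatEntry(slot, ROB)  # then remap the destination
        renamed.append(f"{op} P{slot}, " + ", ".join(names))
    return renamed
```

Renaming the three-instruction example starting at ROB slot 31 reproduces the renamed code and the final table shown above; reading sources before remapping the destination is what lets `ADD R1, R5, R4` see the old mapping of R5 while giving R1 a fresh slot.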


Short-Lived Values

– Our definition: a value is short-lived if its destination register has been renamed by a later instruction by the time the result is generated.
– Short-lived values are identified one cycle before the result writeback.

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4   (renamer of R1)


The Good News: 80%+ of the Values Are Short-Lived

[Figure: percentage of short-lived values (0 to 100%); 96-entry ROB, 4-way processor]

As rename-to-writeback latency increases in future datapaths, the percentage of short-lived values will also go up.


The Idea: Isolating Short-Lived Values

[Figure: the pipeline (in-order front end F, R, D; out-of-order core with the instruction queue and execution; in-order retirement through the ROB into the ARF) with a small SRF added]

Write short-lived values into a small dedicated RF (SRF).

LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4


Why Do We Need the SRF?

– We need to hang on to the short-lived values to:
  recover from branch mispredictions;
  reconstruct a precise state.

LOAD R1, R2, 100
BEQ R5, R1, #100
ADD R1, R5, R4


Identifying Short-Lived Values

– Maintain the bit-vector Renamed (one bit per ROB entry).
– A bit is set by the renamer at the time of renaming.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

Renamed[31] = 1


Identifying Short-Lived Values

– Maintain the bit-vector Renamed (one bit per ROB entry).
– A bit is set by the renamer at the time of renaming.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

Renamed[31] = 1


Identifying Short-Lived Values

– The Renamed bit is checked one cycle before writeback.
– The value produced by LOAD is short-lived because Renamed[31] = 1.

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

Renamed[31] = 1
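A minimal sketch of how the Renamed bit-vector could flag short-lived values. It assumes sequential ROB slot allocation and does not model commitment or the RAT falling back to the ARF on commit; the class `ShortLivedDetector` and its methods are hypothetical names for illustration only.

```python
ROB_SIZE = 96  # the 96-entry ROB used in the evaluation


class ShortLivedDetector:
    """Hypothetical model of the Renamed bit-vector (commit is not modeled)."""

    def __init__(self, num_arch_regs, first_slot=0):
        self.rat = [None] * num_arch_regs  # ROB slot of the latest producer, or None (ARF)
        self.renamed = [False] * ROB_SIZE  # one bit per ROB entry
        self.next_slot = first_slot

    def rename_dest(self, arch_reg):
        """Allocate a ROB slot for a new result of `arch_reg` and return it."""
        prev = self.rat[arch_reg]
        if prev is not None:
            self.renamed[prev] = True      # the older instance now has a renamer
        slot = self.next_slot
        self.next_slot = (slot + 1) % ROB_SIZE
        self.renamed[slot] = False         # a freshly allocated slot is not renamed yet
        self.rat[arch_reg] = slot
        return slot

    def is_short_lived(self, slot):
        # Checked one cycle before the result in `slot` is written back.
        return self.renamed[slot]
```

Renaming LOAD, SUB and ADD from slot 31 leaves Renamed[31] set, so LOAD's result is classified as short-lived, while SUB's and ADD's are not. If LOAD had written back before ADD was renamed, the same check would return False.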


Managing the SRF: the Issues

– When do we write short-lived values into the SRF?
– When and how are the short-lived values removed from the SRF?
– What happens on a branch misprediction?
– How do we reconstruct a precise state?


Format of an SRF Entry

Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest. Arch. Reg.

– Branch Tag 1: branch identifier for the renamer; used to remove this entry if the renamer gets squashed.
– Branch Tag 2: branch identifier for this instruction; used to remove this entry if this instruction gets squashed.
– Branch identifier of an instruction = id/tag of the immediately preceding conditional branch.


Writing to the SRF: the Conditions

– An instruction writes a short-lived result value into the SRF if:
  a free entry exists in the SRF, and
  no SRF entry keyed with the same ROB slot is already established.
– Bit-vector Allocated_in_SRF is maintained:
  one bit for each ROB entry;
  set at the time of writeback if the value is written into the SRF;
  reset at the time of removing the value from the SRF.

SRF entry: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest. reg
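The two write conditions and the Allocated_in_SRF bit-vector can be modeled as follows. This sketch assumes a fully associative SRF searched sequentially (hardware would search all entries in parallel); `SrfEntry`, `Srf`, `try_write` and `remove` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass
class SrfEntry:
    valid: bool
    rob_idx: int
    data: int
    bt1: int   # Branch Tag 1: branch preceding the renamer
    bt2: int   # Branch Tag 2: branch preceding this instruction
    dest: int  # destination architectural register


class Srf:
    def __init__(self, num_entries, rob_size):
        self.entries = [SrfEntry(False, 0, 0, 0, 0, 0) for _ in range(num_entries)]
        self.allocated_in_srf = [False] * rob_size  # one bit per ROB entry

    def try_write(self, rob_idx, data, bt1, bt2, dest):
        """Place a short-lived result; return False if it must go to the ROB instead."""
        if self.allocated_in_srf[rob_idx]:
            return False  # an entry keyed with this ROB slot is already established
        for e in self.entries:
            if not e.valid:  # first free entry
                e.valid, e.rob_idx, e.data = True, rob_idx, data
                e.bt1, e.bt2, e.dest = bt1, bt2, dest
                self.allocated_in_srf[rob_idx] = True  # set at writeback
                return True
        return False  # SRF full

    def remove(self, rob_idx):
        """Invalidate the entry keyed with `rob_idx`; return a copy of it (or None)."""
        for e in self.entries:
            if e.valid and e.rob_idx == rob_idx:
                removed = replace(e, valid=False)
                e.valid = False
                self.allocated_in_srf[rob_idx] = False  # reset on removal
                return removed
        return None
```

A failed `try_write` simply means the result is written back to the ROB as in the baseline datapath, so correctness never depends on the SRF having room.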


Scenarios for Removing the Values from the SRF

Scenario 1: normal commitment of the renamer
Scenario 2: the renamer gets squashed
Scenario 3: the instruction generating the short-lived value itself gets squashed


Removing the Values from the SRF: Scenario 1

– Values are removed by the renamer in a 2-step process:
  mark the instruction whose value is to be removed from the SRF (done at the time of renaming);
  remove the marked value from the SRF IF NEED BE (done at the time of commitment).
– When ADD commits, it removes the value written by LOAD.

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4   (renamer)


Marking the Values for Removal

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

[Figure: ROB slots 31 (LOAD), 32 (SUB) and 33]


Marking the Values for Removal

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

[Figure: ROB slots 31 (LOAD), 32 (SUB) and 33 (ADD); the FS (Flush SRF) field of ADD's ROB entry holds 31]


Removing the Values (B Is the Renamer for A)

– The FS field of B must match the ROB idx field of an SRF entry.
– This SRF entry must belong to A.

LOAD R1, R2, 100   (A)
SUB R5, R1, R3
ADD R1, R5, R4     (B)

[Figure: ROB slots 31 (LOAD, A), 32 (SUB) and 33 (ADD, B); the FS field of B holds 31. The SRF holds the entry "1 31 1 load", keyed with ROB idx 31. SRF entry format: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest]
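The two-step removal of Scenario 1 might be modeled like this, assuming the FS field simply stores the ROB index of the superseded value. `RemovalMarks` and its methods are hypothetical, the SRF is reduced to a dictionary keyed by ROB index, and the check that the matched entry really belongs to A is deliberately omitted here; the Uncommitted_Write bit-vector is what supplies it.

```python
class RemovalMarks:
    """Hypothetical model of Scenario 1: mark at rename, remove at commit."""

    def __init__(self, rob_size=96):
        self.fs = [None] * rob_size  # FS (Flush SRF) field of each ROB entry
        self.srf = {}                # ROB idx -> value (simplified SRF)

    def mark_on_rename(self, renamer_slot, prev_slot):
        # Step 1: the renamer records the ROB slot holding the value it supersedes.
        self.fs[renamer_slot] = prev_slot

    def remove_on_commit(self, renamer_slot):
        # Step 2: at commit, drop the SRF entry whose ROB idx matches the FS field,
        # if one exists ("IF NEED BE": the value may never have entered the SRF).
        target = self.fs[renamer_slot]
        if target is not None:
            self.srf.pop(target, None)
        self.fs[renamer_slot] = None
```

For the example above, ADD (slot 33) records FS = 31 when it is renamed and removes LOAD's entry when it commits.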


Another Example (LOAD Could Not Write to the SRF)

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            2            1
3            3            1
4            4            1
5            32           0

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

The SRF was full! Renamed[31] = 1


Another Example

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            2            1
3            3            1
4            4            1
5            5            1

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4
…
MUL R2, R3, R4
DIV R2, R2, R5

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4

LOAD and SUB have committed. Renamed[31] = 0


Another Example

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            31           0
3            3            1
4            4            1
5            5            1

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4
…
MUL R2, R3, R4
DIV R2, R2, R5

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4
…
MUL P31, R3, R4

LOAD and SUB have committed. Renamed[31] = 0


Another Example

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            33           0
2            32           0
3            3            1
4            4            1
5            5            1

Original code:
LOAD R1, R2, 100
SUB R5, R1, R3
ADD R1, R5, R4
…
MUL R2, R3, R4
DIV R2, R2, R5

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4
…
MUL P31, R3, R4
DIV P32, P31, R5

LOAD and SUB have committed. Renamed[31] = 1


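Another Example (A’s ROB Slot Is Assigned to C)

Renamed code:
LOAD P31, R2, 100   (A)
SUB P32, P31, R3
ADD P33, P32, R4    (B)

[Figure: ROB slots 31 (LOAD, A), 32 (SUB) and 33 (ADD, B); the FS field of B holds 31. The SRF holds no valid entry for the LOAD (Valid = 0). SRF entry format: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest]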


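Another Example (A’s ROB Slot Is Assigned to C)

Renamed code:
LOAD P31, R2, 100
SUB P32, P31, R3
ADD P33, P32, R4    (B)
…
MUL P31, R3, R4     (C)
DIV P32, P31, R5    (D)

[Figure: ROB slots 31 (MUL, C), 32 (DIV, D) and 33 (ADD, B); the FS field of B still holds 31. The SRF now holds the entry "1 31 2 mul", keyed with ROB idx 31 and holding C's result. SRF entry format: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest]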


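Ensuring That the Right Values Are Removed

– Bit-vector Uncommitted_Write is maintained:
  one bit for each ROB entry;
  set at the time of establishing an SRF entry;
  reset at the time of commitment.
– Instruction B removes the value written by A (allocated to ROB slot i) only if:
  Allocated_in_SRF[i] = 1, and
  Uncommitted_Write[i] = 0.
– Because instructions commit in order, A has already committed when B commits, so an entry established by A has Uncommitted_Write[i] = 0. If slot i has since been reassigned to an instruction C that follows B and has established its own SRF entry, Uncommitted_Write[i] = 1 and B leaves C's entry in place.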


Avoiding Unnecessary Commitments

– When an instruction allocated to ROB slot i commits and Allocated_in_SRF[i] = 1, its data is not copied to the ARF.

[Figure: the pipeline with the SRF added; short-lived results are written into the SRF, and the commit-time copy into the ARF is skipped for them]
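Putting the commit-time rules together, a hypothetical software model shows why the renamer must also check Uncommitted_Write: in the slot-reuse example, ADD's FS field still points at slot 31, which by then holds C's SRF entry. `CommitLogic` and its methods are illustrative names, values and registers are plain integers, and the ROB is reduced to a dictionary of pending results.

```python
class CommitLogic:
    """Hypothetical model of commit-time SRF bookkeeping."""

    def __init__(self, rob_size=96):
        self.allocated_in_srf = [False] * rob_size
        self.uncommitted_write = [False] * rob_size
        self.srf = {}        # ROB idx -> (dest arch reg, value)
        self.rob_value = {}  # ROB idx -> (dest arch reg, value) for results kept in the ROB
        self.arf = {}        # arch reg -> committed value

    def write_srf(self, slot, dest, value):
        self.srf[slot] = (dest, value)
        self.allocated_in_srf[slot] = True
        self.uncommitted_write[slot] = True  # set when the SRF entry is established

    def write_rob(self, slot, dest, value):
        self.rob_value[slot] = (dest, value)

    def commit(self, slot, fs=None):
        """Commit the instruction in ROB `slot`; `fs` is its FS field, if any."""
        if self.allocated_in_srf[slot]:
            # The value stays in the SRF: skip the copy into the ARF.
            self.uncommitted_write[slot] = False  # reset at commitment
        elif slot in self.rob_value:
            dest, value = self.rob_value.pop(slot)
            self.arf[dest] = value
        # As a renamer, remove the superseded value only if its producer has committed.
        if fs is not None and self.allocated_in_srf[fs] and not self.uncommitted_write[fs]:
            del self.srf[fs]
            self.allocated_in_srf[fs] = False
```

In the normal case, LOAD's value never reaches the ARF and ADD's commit frees the entry. In the slot-reuse case, ADD's commit leaves C's entry alone because Uncommitted_Write[31] is still set, and DIV later removes it.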


Handling Branch Mispredictions: Scenario 2

– Problem: the renamer can get squashed, leaving stale entries in the SRF if nothing is done.
– Example:

[Figure: ROB slots 31 (LOAD), 32 (BR), 33 (SUB) and 34 (ADD); the FS field of ADD holds 31; the SRF holds the entry "1 31 1 load"]


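Handling Branch Mispredictions

– Problem: the renamer can get squashed, leaving stale entries in the SRF if nothing is done.
– Example:

[Figure: after the squash, ROB slots 31 (LOAD) and 32 (BR) remain and slots 33 and 34 are empty; the SRF still holds the entry "1 31 1 load"]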


Handling Branch Mispredictions

– Solution: tag each entry in the SRF with the id of the branch preceding the renamer (BT1). When the renamer is squashed, the value is removed from the SRF and written to either the ROB or the ARF, based on the value of the Uncommitted_Write bit.
– Multiplex the ports to reduce complexity.

SRF entry: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest


Obtaining Branch Tag BT1

– Maintain the array Branch_Tags, with one entry for each ROB slot.

Arch. Reg.   Phys. Reg.   Location (0 = ROB, 1 = ARF)
0            0            1
1            31           0
2            2            1
3            3            1
4            4            1
5            33           0

Original code:
LOAD R1, R2, 100
BEQ R6, R7, 200
SUB R5, R1, R3
ADD R1, R5, R4

Renamed code:
LOAD P31, P2, 100
BEQ P6, P7, 200
SUB P33, P31, P3
ADD P34, P33, P4

Branch_Tags[31] = 7


Handling Branch Mispredictions: Scenario 3

– Problem: the instruction whose value was inserted into the SRF can itself be squashed.
– Example:

[Figure: ROB slots 30 (BR), 31 (LOAD), 32 (SUB) and 33 (ADD); the FS field of ADD holds 31; the SRF holds the entry "1 31 1 load"]


Handling Branch Mispredictions

– Problem: the instruction whose value was inserted into the SRF can itself be squashed.
– Example:

[Figure: after the squash, only ROB slot 30 (BR) remains and slots 31 to 33 are empty; the SRF still holds the entry "1 31 1 load"]


Handling Branch Mispredictions

– Solution: tag each entry in the SRF with the id of the branch preceding the instruction itself (BT2). Simply remove the value from the SRF if such a branch is mispredicted.

SRF entry: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest
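The two misprediction scenarios can be combined into one recovery pass over the SRF. This is a sketch under stated assumptions: `squashed_tags` stands for the set of branch tags of all squashed instructions (how the hardware derives that set is not shown), clearing Allocated_in_SRF when a value leaves the SRF is our assumption so that a value returned to the ROB is later committed to the ARF normally, and `SrfEntry` and `recover_srf` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SrfEntry:
    rob_idx: int
    data: int
    bt1: int   # Branch Tag 1: branch preceding the renamer
    bt2: int   # Branch Tag 2: branch preceding the producer itself
    dest: int  # destination architectural register


def recover_srf(srf, squashed_tags, uncommitted_write, allocated_in_srf, rob_data, arf):
    """Return the SRF entries that survive a misprediction."""
    survivors = []
    for e in srf:
        if e.bt2 in squashed_tags:
            # Scenario 3: the producer itself is squashed; simply drop the value.
            allocated_in_srf[e.rob_idx] = False
        elif e.bt1 in squashed_tags:
            # Scenario 2: only the renamer is squashed, so the value is still live.
            allocated_in_srf[e.rob_idx] = False
            if uncommitted_write[e.rob_idx]:
                rob_data[e.rob_idx] = e.data  # producer not committed: back to the ROB
            else:
                arf[e.dest] = e.data          # producer committed without an ARF copy
        else:
            survivors.append(e)
    return survivors
```

Checking BT2 first matters: if the producer is squashed, its renamer (a younger instruction) is squashed too, and the value must be discarded rather than restored.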


Supporting Precise Interrupts

– Allow all instructions preceding the faulting instruction to commit.
– Squash all instructions following the faulting instruction.
– Copy the values of ALL valid SRF entries to the ARF.

SRF entry: Valid | ROB idx | Data | Branch Tag 1 | Branch Tag 2 | Dest


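Experimental Setup

[Figure: simulation toolchain. Compiled SPEC benchmarks and datapath specs feed a microarchitectural simulator that reports performance stats; transition counts and context information pass through inter-thread buffers to a data analyzer (intra-stream analysis) and an energy/power estimator that reports power/energy stats; VLSI layout data yield SPICE decks, and SPICE measures of energy per transition feed the estimator. The simulator and the estimator run as two separate threads.]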


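Results: Percentage of Values Written into the SRF

[Figure: per-benchmark percentage of values written into the SRF for 8-, 16-, 32- and 48-entry SRFs, compared with the percentage of short-lived results; benchmarks: bzip2, gap, gcc, gzip, mcf, pars, perl, twolf, vort, vpr, applu, apsi, art, eq, mesa, mgrid, swim, wupw]

Averages: 40.5% (8 entries), 60.1% (16 entries), 77.5% (32 entries), 82.3% (48 entries); short-lived results: 86.7%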


Results: Average Time Spent by a Value in the SRF

[Figure: per-benchmark average time (cycles) a value spends in the SRF for 8-, 16-, 32- and 48-entry SRFs; benchmarks: bzip2, gap, gcc, gzip, mcf, pars, perl, twolf, vort, vpr, applu, apsi, art, eq, mesa, mgrid, swim, wupw]

Average: 12-15 cycles


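Results: Percentage of Values Not Copied into the ARF

[Figure: per-benchmark percentage of values not copied into the ARF for 8-, 16-, 32- and 48-entry SRFs, compared with the percentage of short-lived results; benchmarks: bzip2, gap, gcc, gzip, mcf, pars, perl, twolf, vort, vpr, applu, apsi, art, eq, mesa, mgrid, swim, wupw]

Averages: 42.2% (8 entries), 61.9% (16 entries), 79.3% (32 entries), 84.1% (48 entries); short-lived results: 86.7%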


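Results: Net Energy Reduction

[Figure: energy (pJ, 0 to 800) of the ROB plus additional logic, the ARF and the SRF for the baseline and for 8-, 16-, 32- and 48-entry SRFs]

Energy reduction relative to the baseline: 9% (8 entries), 16% (16 entries), 21% (32 entries), 23% (48 entries)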



Related Work

– Register Traffic Analysis (Franklin and Sohi, MICRO’92)
  Studied the useful lifetime of register instances.
  Delaying the writes until 30 more instructions are dispatched can eliminate 80% of the writes (if perfect knowledge of the last use is available).
  Buffering the 30 most recently generated results avoids 80% of writebacks.
– Lozano and Gao (MICRO’95)
  90% of all result values are short-lived (consumed while in the ROB).
  A mechanism is proposed to avoid the commitment of these values and also to avoid register allocation for them.
  ROB slots are exposed to the compiler in the form of symbolic registers.
– Lazy Retirement (Savransky, Ronen, Gonzalez, WCED’02)
  Hardware-based scheme to avoid unnecessary commitments.
  Copying from the ROB to the ARF is delayed until the ROB slot is reused; in many cases, the register is invalidated by a newer instruction.
  An additional rename table is needed. About 75% of commits are avoided.


Conclusions

– Significant power savings and negligible impact on performance.
– Sources of power savings:
  the majority of generated results are written into a small, lightly-ported SRF;
  unnecessary commitments are avoided;
  the additional logic/storage needed to do this is simple.
– For a 32-entry SRF, more than 77% of writebacks and more than 79% of commitments can be avoided.
– This results in energy savings of 21% on the ROB and the ARF.


THANK YOU!

This work was supported in part by DARPA through the PAC-C program and NSF.

LOW POWER RESEARCH GROUP
Department of Computer Science
State University of New York
Binghamton, NY 13902-6000
http://www.cs.binghamton.edu/~lowpower

Parallel Architectures and Compilation Techniques (PACT’03)
October 1st 2003


Complexity of the Solution

– SRF
– Three bit-vectors (same size as the ROB): Renamed, Allocated_in_SRF, Uncommitted_Write
– 4-bit array Branch_Tags (same size as the ROB)
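The bookkeeping listed above is small enough to count by hand. The arithmetic below assumes the 96-entry ROB from the evaluation; the SRF entry width additionally assumes 64-bit data and 32 architectural registers, which are illustrative choices and not figures from the slides. The function names are hypothetical.

```python
import math


def bookkeeping_bits(rob_size, branch_tag_bits=4):
    # Three ROB-sized bit-vectors: Renamed, Allocated_in_SRF, Uncommitted_Write.
    bit_vectors = 3 * rob_size
    # Branch_Tags: one 4-bit tag per ROB slot.
    branch_tags = branch_tag_bits * rob_size
    return bit_vectors + branch_tags


def srf_entry_bits(rob_size, data_bits, arch_regs, branch_tag_bits=4):
    # Valid + ROB idx + Data + Branch Tag 1 + Branch Tag 2 + Dest. Arch. Reg.
    return (1 + math.ceil(math.log2(rob_size)) + data_bits
            + 2 * branch_tag_bits + math.ceil(math.log2(arch_regs)))
```

Under these assumptions the bit-vectors and Branch_Tags add 3 × 96 + 4 × 96 = 672 bits, and each SRF entry needs 1 + 7 + 64 + 8 + 5 = 85 bits, so a 32-entry SRF holds 2,720 bits.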