Energy Efficient Data Encoding in DRAM channels...

Multimedia VLSI Lab.

The 43rd International Symposium on Computer Architecture (Session 10B: Memory 2)

Energy Efficient Data Encoding in DRAM Channels Exploiting Data Value Similarity

Hoseok Seol, Wongyu Shin, Jaemin Jang,

Jungwhan Choi, Jinwoong Suh, Lee-Sup Kim

Department of Electrical Engineering

Outline

1. Introduction

2. BD-Encoding

3. Evaluation Results

4. Conclusion

Modern DRAM Interface

DRAM off-chip data bus consumes significant energy.

Data Bus Energy: Switching + Termination (dominant)

Modern DRAMs introduce asymmetric termination.

⇒ Pseudo Open Drain (POD): DDR4, GDDR4/5

⇒ Low Voltage Swing Terminated Logic (LVSTL): LPDDR4

< Figure: termination schemes (Center Tapped, POD, LVSTL), annotated with the bit 0 / bit 1 signal levels >

Hamming Weight & Interface Energy

Hamming Weight: number of 1’s in a string of bits.

Decreasing Hamming Weight reduces both the

termination and switching energy.

We propose a novel data encoding to reduce data bus energy.

Ex) LVSTL interface

Before encoding: Data "11101010" (Hamming Weight: 5, Switching Activity: 6)

After encoding: Data "00000010" (Hamming Weight: 1, Switching Activity: 2)
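The two metrics in this example can be reproduced with a short Python sketch. It assumes the eight bits are sent one after another on a single line that rests at 0 before the first bit; that reading matches the numbers on the slide but is not stated there, and the function names are illustrative.

```python
def hamming_weight(bits: str) -> int:
    """Number of 1s in a bit string such as "11101010"."""
    return bits.count("1")


def switching_activity(bits: str, idle: str = "0") -> int:
    """Count level changes when the bits are sent serially on one line.

    Assumption: the line sits at `idle` before the first bit is driven.
    """
    sequence = idle + bits
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


for word in ("11101010", "00000010"):
    print(word, hamming_weight(word), switching_activity(word))
```

Under these assumptions the script reports 5 and 6 for the original word and 1 and 2 for the encoded word.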

Bitwise Difference (BD) Encoding

Observation: Similar data words are sent over the DRAM data bus.

Key Idea: Transfer the bit-wise difference between the current data word and the most similar recently transferred data word.

Energy Reduction: 58.3% of termination energy and 45.3% of switching energy.

Motivation

< Energy dissipation in DRAM sub-system >

(Micron DDR4 Power Calculator, DDR4-2133)

Component      Proportion [%]
Stand-by       43.1
ACT/PRE        21.5
RD/WR          14.2
Termination    14.1
Switching       7.0

Energy dissipated in the DDR4 data bus:

Termination (14.1%) + Switching Activity (7.0%) = 21.1% of DRAM sub-system energy

Observation: Data Value Similarity

Transfer   libquantum                 mcf
1          38 ad b3 00 18 83 24 00    18 67 df aa aa 2a 00 00
2          58 ad b3 00 18 83 24 00    01 00 00 00 00 00 00 00
3          78 ad b3 00 18 83 24 00    98 53 b8 aa aa 2a 00 00
4          98 ad b3 00 18 83 24 00    08 63 b8 aa aa 2a 00 00
5          a8 ad b3 00 18 83 24 00    00 00 00 00 00 00 00 00
6          c8 ad b3 00 18 83 24 00    00 27 bd aa aa 2a 00 00

Observation: Similar data words are sent over the DRAM

data bus.

Observation: Data Value Similarity

All the workloads in SPEC 2006 exhibit Data Value Similarity.

Across SPEC 2006 workloads, the probability that a data word is similar (90% of its bits matching) to one of the 64 most recent data words is 72%.

< Probability of 90% data matching among 64 recent data words > (y-axis: Probability [%])
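One way to measure this probability on a trace is sketched below in Python. The function name, the 64-bit word width, and the choice to count the very first word (which has no history) as a miss are assumptions made for illustration; the sample input is the libquantum column from the previous slide.

```python
from collections import deque


def similar_fraction(words, width=64, window=64, threshold=0.9):
    """Fraction of words with >= threshold of their bits matching a recent word."""
    recent = deque(maxlen=window)  # the `window` most recent data words
    hits = 0
    total = 0
    for word in words:
        if recent:
            # Matching bits = width minus the Hamming distance to the closest word.
            best_match = max(width - bin(word ^ r).count("1") for r in recent)
            if best_match >= threshold * width:
                hits += 1
        total += 1
        recent.append(word)
    return hits / total if total else 0.0


# libquantum transfers 1-6 from the slide: only the first byte changes.
LIBQUANTUM_WORDS = [int(first + "adb30018832400", 16)
                    for first in ("38", "58", "78", "98", "a8", "c8")]

print(similar_fraction(LIBQUANTUM_WORDS))
```

On this tiny sample every word after the first differs from an earlier one by at most three bits, so five of the six transfers count as similar.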

Bitwise Difference Coder

< Overall Structure of BD-coder >

Recent data words are stored in a table in both the encoder and the decoder.

When transferring data, the encoder searches its table for the most similar data word.

If a similar data word exists, transfer 1) the bitwise difference and 2) the table index.

If not, transfer the original data.
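The flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' hardware design: the class name `BDCoder`, FIFO table replacement, and the rule "send the difference only if it has fewer 1s than the raw word" are all hypothetical. The slides only state that the most similar recent word is searched and that a difference plus an index is sent on a hit.

```python
from collections import deque


def popcount(x: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(x).count("1")


class BDCoder:
    """One end of the link; encoder and decoder each hold their own instance."""

    def __init__(self, entries: int = 64):
        # 64 entries as on the slides; FIFO replacement is an assumption.
        self.table = deque(maxlen=entries)

    def encode(self, word: int):
        """Return (hit, index, payload) and record `word` in the table."""
        hit, index, payload = False, 0, word
        if self.table:
            best = min(range(len(self.table)),
                       key=lambda i: popcount(word ^ self.table[i]))
            diff = word ^ self.table[best]
            # Assumed rule: use the difference only if it lowers the Hamming weight.
            if popcount(diff) < popcount(word):
                hit, index, payload = True, best, diff
        self.table.append(word)
        return hit, index, payload

    def decode(self, hit: bool, index: int, payload: int) -> int:
        """Rebuild the original word and mirror the encoder's table update."""
        word = payload ^ self.table[index] if hit else payload
        self.table.append(word)
        return word


encoder, decoder = BDCoder(), BDCoder()
for word in (0b00011001, 0b11101000, 0b11101010):
    hit, index, payload = encoder.encode(word)
    assert decoder.decode(hit, index, payload) == word
    print(f"{word:08b} -> hit={hit} index={index} payload={payload:08b}")
```

Because both ends update their tables with the same word in the same order, the index sent on a hit refers to the same entry on each side.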

Example of BD-encoding

Data table (recent data words): 00011001, 11101000

Data to transfer: 11101010

W/O encoding: transfer 1 1 1 0 1 0 1 0

BD-encoding (XOR with table entry 11101000): transfer 0 0 0 0 0 0 1 0

                      W/O encoding   BD-encoding
Hamming Weight        5              1
Switching Activity    6              2

Hardware Overheads

Coder (data table: 64 entries)

Area: 0.044% of commodity DDR4

Latency: 2.3 ns (Transmitter), 0.7 ns (Receiver)

Energy: 7 pJ (Transmitter), 2 pJ (Receiver)

Designed in a 65 nm logic process

Index Line

A single extra line per 8 data lines.

Can be shared with the DBI / DM pins in DDR4.

Methodology

Component            Parameters
Processor            Gem5, x86, 3.3 GHz
Caches               L1 I-cache: 32 KB, 4-way
                     L1 D-cache: 64 KB, 4-way
                     L2 cache: 2 MB, 8-way
DRAM                 DDR4-2133, 8 GB
Interface            Pseudo Open Drain (DDR4)
Termination Energy   Calculated with the Micron DDR4 Power Calculator
Switching Energy     Calculated as E = CV², channel capacitance: 15 pF
Workloads            SPEC CPU 2006
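As a rough illustration of the switching-energy row, the Python sketch below applies E = CV² once per signal transition. The 15 pF capacitance comes from the table; the 1.2 V swing (the nominal DDR4 supply voltage) and the per-transition accounting are assumptions made here, since the slides do not give the voltage used.

```python
CHANNEL_CAPACITANCE_F = 15e-12  # 15 pF, from the methodology table
SWING_V = 1.2                   # assumption: full swing at the nominal DDR4 supply


def switching_energy_j(transitions: int,
                       capacitance_f: float = CHANNEL_CAPACITANCE_F,
                       swing_v: float = SWING_V) -> float:
    """Energy in joules, charging E = C * V^2 for every transition (assumed)."""
    return transitions * capacitance_f * swing_v ** 2


for transitions in (6, 2):  # switching activity before and after BD-encoding
    energy_pj = switching_energy_j(transitions) * 1e12
    print(f"{transitions} transitions: {energy_pj:.1f} pJ")
```

With these assumed values each transition costs about 21.6 pJ, so the energy scales directly with the switching activity an encoding removes.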

Comparison Points

Data Bus Inversion (DBI) [M. Stan, TVLSI '95]

⇒ Transfer the inverted data if its Hamming weight is smaller than that of the original data.

⇒ Adopted in commodity DRAMs (GDDR4/5, DDR4, LPDDR4)

Power Protocol (PP) [K. Basu, MICRO '02], Frequent Value Encoding [J. Yang, ISLPED '01]

⇒ Transfer the table index instead of the data when the current data is the same as recently transferred data.

Variable Length Value Encoder (VALVE) [D. Suresh, ICCD '05]

⇒ Transfer the table index instead of the data when the current data partly matches recently transferred data.
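For comparison with BD-encoding, the Data Bus Inversion rule as worded on this slide can be sketched in Python for one 8-bit group. The function names are illustrative, the invert flag is assumed to travel on a separate line, and the sketch does not attempt to reproduce the exact rule of any particular DRAM standard.

```python
def dbi_encode(byte: int):
    """Return (payload, invert_flag) for one 8-bit group of data lines."""
    inverted = byte ^ 0xFF
    # Send the inverted byte only when it carries fewer 1s than the original.
    if bin(inverted).count("1") < bin(byte).count("1"):
        return inverted, 1
    return byte, 0


def dbi_decode(payload: int, invert_flag: int) -> int:
    """Undo the inversion when the flag is set."""
    return payload ^ 0xFF if invert_flag else payload


for byte in (0b11101010, 0b00000010):
    payload, flag = dbi_encode(byte)
    assert dbi_decode(payload, flag) == byte
    print(f"{byte:08b} -> payload={payload:08b} flag={flag}")
```

For the word 11101010, inversion lowers the Hamming weight only from 5 to 3, whereas the BD-encoding example reaches 1 by exploiting a similar recent word.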

Hamming Weight Reduction

BD-Encoding decreases the Hamming weight in all workloads (the smallest reduction is in bzip: 29%).

The reduction grows with the number of table entries (from 28% with 1 entry to 58% with 64 entries).

< Hamming Weight Reduction Rate of BD-Encoding > (y-axis: Reduction Rate [%]; x-axis: workloads; series: 1, 8, and 64 table entries)

Comparison to Prior Works

BD-encoding reduces termination energy by 58.3% and switching energy by 45.3%.

Similar data occurs much more often than identical data ⇒ BD-encoding shows better results than Power Protocol and VALVE.

< Energy Reduction Rate >

Scheme              Termination Energy [%]   Switching Energy [%]
DBI                 12.4                     10.9
PP_64               25.5                     20.7
VALVE_64            34.8                     25.8
Proposed Work_64    58.3                     45.3

DBI: Data Bus Inversion

PP: Power Protocol

VALVE: Variable Length Value Encoder

Interface Energy Reduction

BD-encoding reduces the overall interface energy, including the coder hardware energy, by 24% with 1 entry and by up to 47.6% with 32 entries.

An optimal number of entries exists (32) because the overhead of the index line and the coder hardware grows with the table size.

< Interface Energy including Coder Hardware > (y-axis: Relative Energy [%] against the baseline; stacked components: Data Bus, Coder, Index line)

Number of entries      1      2      4      8      16     32     64
Energy reduction [%]   24     30.4   36.8   41.9   45.6   47.6   47.1

Conclusion

Reducing the Hamming weight decreases both the termination and the switching energy.

Data Value Similarity: Similar data words are sent over the DRAM data bus.

Bitwise Difference Encoding: Transfer the bit-wise difference between the current data word and the most similar data word recently transferred.

Evaluation Results: Reduces termination energy by 58.3% and switching energy by 45.3%.