ROSS: A Design of Read-Oriented STT-MRAM Storage for Energy-Efficient Non-Uniform Cache Architecture
Jie Zhang, Miryeong Kwon, Changyoung Park, Myoungsoo Jung, Songkuk Kim
Computer Architecture and Memory System Lab,
Yonsei University
Summary
Motivation: There is a huge demand to increase the last-level cache size. However, SRAM suffers from high leakage power and low density. STT-MRAM is a good candidate to replace SRAM, due to its small cell size and low leakage power.
Challenge: Conventional STT-MRAM cannot directly substitute for SRAM, because of
- High write latency
- High write energy
Solution: We observe a read-oriented data pattern in real applications. We propose ROSS, a read-oriented STT-MRAM storage that can
- dynamically detect the read-oriented data pattern
- reduce the write frequency in STT-MRAM
- accommodate as many read misses as possible in LLC
Motivation
Data set size increases explosively.
- Videos: 300 hours of new video per minute
- Graphs: 30 billion photos
- Scientific computing: 10-month spacecraft simulation
- Webs: 1 billion web pages
Motivation
Huge demand to increase on-chip last-level cache size in future
Challenges
Traditional SRAM
- Read Latency: 397ps (fast speed)
- Write Latency: 397ps (fast speed)
- Read Energy: 35pJ (reasonable energy)
- Write Energy: 35pJ (reasonable energy)
- Cell Area (F^2): 50~120 (low density)
- Leakage: 75.7mW (high leakage)
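Challenges
Traditional SRAM vs. STT-MRAM
- Read Latency: SRAM 397ps, STT-MRAM 238ps
- Write Latency: SRAM 397ps, STT-MRAM 6932ps
- Read Energy: SRAM 35pJ, STT-MRAM 13pJ
- Write Energy: SRAM 35pJ, STT-MRAM 90pJ
- Cell Area (F^2): SRAM 50~120, STT-MRAM 6~40 (high density)
- Leakage: SRAM 75.7mW, STT-MRAM 6.6mW (low leakage)
STT-MRAM: high density and low leakage, but high write latency and write energy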
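To make the trade-off concrete, the following minimal sketch computes the STT-MRAM to SRAM ratio for each metric, using only the figures quoted on this slide (SRAM: 397ps read and write latency, 35pJ read and write energy, 75.7mW leakage; STT-MRAM: 238ps read and 6932ps write latency, 13pJ read and 90pJ write energy, 6.6mW leakage). The dictionary keys and the `ratios` helper are illustrative names, not part of the original material.

```python
# Figures copied from the SRAM vs. STT-MRAM comparison on the slide.
SRAM = {"read_latency_ps": 397, "write_latency_ps": 397,
        "read_energy_pj": 35, "write_energy_pj": 35, "leakage_mw": 75.7}
STT_MRAM = {"read_latency_ps": 238, "write_latency_ps": 6932,
            "read_energy_pj": 13, "write_energy_pj": 90, "leakage_mw": 6.6}


def ratios(base, other):
    """Return other/base for every metric present in both dicts."""
    return {k: other[k] / base[k] for k in base if k in other}


stt_vs_sram = ratios(SRAM, STT_MRAM)
for metric, r in stt_vs_sram.items():
    print(f"{metric}: STT-MRAM is {r:.2f}x SRAM")
```

With these numbers, an STT-MRAM write is roughly 17x slower and about 2.6x more energy-hungry than an SRAM write, while reads are cheaper and leakage is under a tenth of SRAM's.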
Prior work
Write Request Rerouting: accommodate write requests with idle blocks
- Parallelizes write requests to hide long write latency
- Cannot reduce the write energy
Read Preemption: suspend the write operation and give priority to the read operation
- Mitigates the impact of long write latency on read access
- Cannot reduce the write latency
Deadwrite bypass: predict and bypass the “deadwrite” (single write, no read)
- Effectively avoids unnecessary write accesses on STT-MRAM
- Bypassing deadwrites is not sufficient to avoid massive writes on STT-MRAM
ROSS-key observation
Last Level Cache eviction trend
Read-miss distribution
How can application behavior impact LLC data access?
Large working sets result in frequent cache eviction
What if we can detect the singular write blocks and put them in STT-MRAM?
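As an illustration of what detecting such blocks could look like offline, the sketch below classifies the blocks of a memory trace by their read and write counts. It is not the ROSS hardware mechanism. The "deadwrite" label follows the definition given under Prior work (single write, no read); treating a "singular write" block as one that is written exactly once and read at least once is an assumption made for this example, and the function name, trace format, and remaining labels are hypothetical.

```python
from collections import defaultdict


def classify_blocks(trace):
    """Classify blocks in a trace of (op, block_addr), op in {"R", "W"}.

    Labels (illustrative, partly assumed):
      "deadwrite"       - exactly one write, never read
      "singular-write"  - exactly one write, read at least once (assumed meaning)
      "write-intensive" - more than one write
      "read-only"       - never written within the trace
    """
    reads = defaultdict(int)
    writes = defaultdict(int)
    for op, addr in trace:
        if op == "R":
            reads[addr] += 1
        elif op == "W":
            writes[addr] += 1
        else:
            raise ValueError(f"unknown op {op!r}")

    labels = {}
    for addr in set(reads) | set(writes):
        w, r = writes[addr], reads[addr]
        if w == 0:
            labels[addr] = "read-only"
        elif w > 1:
            labels[addr] = "write-intensive"
        elif r == 0:
            labels[addr] = "deadwrite"
        else:
            labels[addr] = "singular-write"
    return labels


example_trace = [("W", 0xA), ("W", 0xB), ("R", 0xB), ("R", 0xB),
                 ("W", 0xC), ("W", 0xC), ("R", 0xD)]
example_labels = classify_blocks(example_trace)
```

Under these assumptions, blocks labelled "singular-write" or "read-only" would be the natural candidates for STT-MRAM, since they incur few or no costly writes there.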
ROSS
Target: build a hybrid last-level cache that
i. avoids many cache read misses with a larger STT-MRAM;
ii. accommodates write updates in SRAM;
iii. migrates data with an intelligent algorithm.
Hybrid-NUCA
Conventional cache hierarchy
Uniform latency
Different wire latency
Non-uniform cache architecture
Severe wire delay
Hybrid-NUCA
[Figure: Hybrid-NUCA organization. Labels: LLC Controller; Cache Bank Set; SRAM banks; STT-MRAM banks; 1-way; 3-way; SRAM-SRAM migration; SRAM->STT-MRAM migration]
Migration policy
Address map policy
Less flexibility, but simplified design
Hybrid-NUCA
Migration policy
Address map policy
Data search policy
[Figure: bank lookup path. Labels: address fields Index, Tag, Bank set; Decoder; Tag Array; Data Array; S/A; Comparator (Match?); 1-way; 3-way]
Request routes to a specific cache bank set
Tag is broadcast to each bank for searching
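The two steps above (route the request to one bank set, then broadcast the tag to every bank in that set) can be sketched as follows. All field widths, the bit ordering of the address, the direct-mapped banks, and the names `split_address`, `BankSet`, and `search` are assumptions chosen for illustration; the slides only name the Index, Tag, and Bank set fields.

```python
BLOCK_OFFSET_BITS = 6   # 64-byte cache lines (assumed)
BANK_SET_BITS = 4       # 16 bank sets (assumed)
INDEX_BITS = 8          # 256 entries per bank (assumed)


def split_address(addr):
    """Split an address into (tag, index, bank_set); layout is assumed."""
    line = addr >> BLOCK_OFFSET_BITS
    bank_set = line & ((1 << BANK_SET_BITS) - 1)
    index = (line >> BANK_SET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = line >> (BANK_SET_BITS + INDEX_BITS)
    return tag, index, bank_set


class BankSet:
    """A group of banks; each bank is modelled as a dict index -> tag."""

    def __init__(self, num_banks):
        self.banks = [dict() for _ in range(num_banks)]

    def fill(self, bank_id, tag, index):
        self.banks[bank_id][index] = tag

    def lookup(self, tag, index):
        """Broadcast the tag to every bank; return the hit bank or None."""
        for bank_id, bank in enumerate(self.banks):
            if bank.get(index) == tag:
                return bank_id
        return None


def search(bank_sets, addr):
    """Route to one bank set, then search all of its banks."""
    tag, index, bank_set = split_address(addr)
    return bank_set, bank_sets[bank_set].lookup(tag, index)


bank_sets = [BankSet(num_banks=4) for _ in range(1 << BANK_SET_BITS)]
tag, index, bank_set = split_address(0x12345678)
bank_sets[bank_set].fill(2, tag, index)
hit = search(bank_sets, 0x12345678)
miss = search(bank_sets, 0x12345678 + (1 << 18))  # same set and index, different tag
```

Fixing the bank set by address bits is what gives the "less flexibility, but simplified design" trade-off: a block can only live in one bank set, so a lookup never has to search the whole cache.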
Data Flow
[Figure: data flow among L1 Cache, L2 Cache (SRAM and STT-MRAM), and MEM]
1. Block Fill
2. Write-Back Miss
3. Read Hit
4. Migration (HB: migrate one cache line; ER: migrate multiple cache lines)
5. Write Hit
6. Evict Block
Singular write detection
[Figure: singular write detection in the LLC Controller. Labels: 1-bit status reg.; C (one per entry); 8-bit counter. Steps shown: Initialize, Poll, Write hit, Evict]
Experiment Setup
Evaluation model:
- ND-NUCA: traditional SRAM NUCA model
- DB-NUCA: deadblock-aware STT-MRAM NUCA model
- HB-ROSS: Hybrid Read-Oriented STT-MRAM Storage
- ER-ROSS: Early-Retirement-aware Read-Oriented STT-MRAM Storage
Gem5 simulation parameters and workloads:
Evaluation
DB-NUCA: suffers from long write latency
HB-ROSS: larger capacity and fewer STT-MRAM writes
ER-ROSS: benefits from early retirement
Evaluation
DB-NUCA: suffers from evicting singular write blocks
ROSS: STT-MRAM accommodates read-intensive data
Evaluation
ND-NUCA: consumes more leakage energy
ROSS: reduces the write energy overhead in STT-MRAM
Thank You