Loop Instruction Caching for Energy-Efficient Embedded Processors
Ji Gu
Department of Communications & Computer Engineering
Graduate School of Informatics
Kyoto University
Outline
1. Background
2. Research overview
3. DLIC: a single-task based approach
4. PLIC: a multi-task based approach
5. Conclusions
Background
Processors in data centers consume 1.5% of the global energy
Where does the processor energy go?
• Caches are energy-consuming due to instruction/data supply
[Figure: breakdown of processor power and of instruction supply power]
Research Problem (1/2)
Observed behavior of embedded applications [1]
• 77% of execution time spent in loops
• 47% of execution time spent in loops of size 64 or less
• 46% of execution time spent in loops that iterate 5 times or more
Loop behavior can be exploited for low-energy design
[1] J. Villarreal et al. A Study on the Loop Behavior of Embedded Programs. University of California, Riverside. Technical Report UCR-CSE-01-03, 2001.
Research Problem (2/2)
Caching decoded instructions for most loops, including large, complicated, and nested loops
• to avoid repeated instruction fetching and decoding operations as much as possible
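As a rough illustration of why caching decoded loop instructions helps, the Python sketch below counts fetch/decode operations for a single loop with and without a decoded-instruction cache. It is a toy model, not the design from the slides: the function names, the assumption that the whole loop body fits in the cache, and the assumption that the first iteration fills the cache are all hypothetical.

```python
def fetch_decode_ops(body_size: int, iterations: int, cached: bool) -> int:
    """Count fetch/decode operations for one loop.

    Toy assumptions (not from the slides): the loop body fits entirely in
    the cache, the first iteration fetches and decodes every instruction
    while filling the cache, and later iterations hit the cache only.
    """
    if body_size < 0 or iterations < 0:
        raise ValueError("body_size and iterations must be non-negative")
    if iterations == 0:
        return 0
    if cached:
        return body_size  # only the first iteration reaches fetch/decode
    return body_size * iterations


def reduction(body_size: int, iterations: int) -> float:
    """Fraction of fetch/decode operations avoided by the cache."""
    baseline = fetch_decode_ops(body_size, iterations, cached=False)
    if baseline == 0:
        return 0.0
    with_cache = fetch_decode_ops(body_size, iterations, cached=True)
    return 1.0 - with_cache / baseline


if __name__ == "__main__":
    # A 64-instruction loop that iterates 5 times.
    print(fetch_decode_ops(64, 5, cached=False))
    print(fetch_decode_ops(64, 5, cached=True))
    print(f"{reduction(64, 5):.0%}")
```

Under this model the saving is 1 - 1/k for a loop that iterates k times, so loops with more iterations benefit more.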
[Figure: control-flow graph with basic blocks A to I and loops L1 to L5]
Design Overview
[Figure: processor pipeline (IF, ID, EXE, MEM, WB) with the DLIC attached; signal labels: IF/ID stall, load, EXE stall, EXE src]
DLIC: Decoded Instruction Loop Cache
Hardware/Software Co-design
• Using customized hardware design
• Using software to control the operation of DLIC
Software Design
Four special instructions: slp, brb, brf, elp
• Inserted into program code at design time (statically)
• Controlling DLIC operations at run time (dynamically)
[Figure: basic blocks H, E, C, F forming Loop 1 and Loop 2, shown with the slp, brb, brf, and elp instructions inserted]
Hardware Design: Hierarchical Cache Table
[Figure: hierarchical cache table. Formats shown: Instruction Format (opcode, operand, operand) and Decoded Instruction Word Format. Tables shown: DLIC Index Table, Control Word Dictionary Table, Branch Cache Target Table. Field labels: flag, c_index, opcode, control word, dlic_index, branch memory target address, branch cache target address]
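One way to read the hierarchy above is as dictionary compression: identical control words are stored once, and each cached instruction keeps only a small index. The Python sketch below illustrates that idea only. The class name, the field layout, and the lookup order are assumptions made for illustration; the slide does not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class ControlWordDictionary:
    """Stores each distinct control word once (hypothetical layout)."""
    words: list[str] = field(default_factory=list)
    _index: dict[str, int] = field(default_factory=dict)

    def intern(self, control_word: str) -> int:
        """Return the index of control_word, adding it if it is new."""
        if control_word not in self._index:
            self._index[control_word] = len(self.words)
            self.words.append(control_word)
        return self._index[control_word]

    def lookup(self, c_index: int) -> str:
        """Recover the full control word from its dictionary index."""
        return self.words[c_index]


def build_index_table(decoded: list[str],
                      dictionary: ControlWordDictionary) -> list[int]:
    """Map a loop's decoded control words to dictionary indices."""
    return [dictionary.intern(word) for word in decoded]


if __name__ == "__main__":
    # Strings stand in for control words; real ones would be bit vectors.
    loop = ["add", "load", "add", "branch", "add"]
    dictionary = ControlWordDictionary()
    index_table = build_index_table(loop, dictionary)
    print(index_table)
    print(dictionary.words)
```

In this sketch a loop of five instructions needs only three dictionary entries, which is the kind of storage saving a hierarchical table aims for when loop bodies reuse the same control words.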
Results
• 77% reduction in instruction fetch and decode
• 66% energy saving
• 1.4% performance overhead
[Figure: normalized energy consumption (y-axis, 0 to 1.2) of DIB and DLIC for the benchmarks adpcm, bcnt, blowfish, crc32, des, jpeg, qsort, rawcaudio, rawdaudio, rc4, rijndael, salsa, sha, stringsearch, and their average (AVG)]
Ji Gu, Hui Guo and Tohru Ishihara. DLIC: Decoded Loop Instructions Caching for Energy-Aware Embedded Processors. To appear in ACM TECS, accepted March 2012.
Loop Caching for Multitasking
Processors increasingly used in multitasking systems
• Several tasks running on a single processor
• Tasks executed in time-interleaved fashion
• Inter-task interference in cache memories
• High energy consumption
Loop caching: reduce the inter-task interference in the I-cache by reducing the I-cache accesses
Hardware Design for Context Switch
[Figure: PLIC hardware for context switch. Labels: Task ID (from OS/task scheduler), Task State Table (task ID, partition ID), PLIC partitions P0 ... Pi ... Pn-1, L-PC, instruction, tagless I-cache]
Partitioned Loop Instruction Cache (PLIC):
• Tasks allocated to different partitions: no interference
• Task State Table for context switch
Conventional context switch by OS
Updating task state table during context switch
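The role of the Task State Table can be sketched in Python as follows. This is a software model under stated assumptions, not the hardware from the slides: the class and method names are hypothetical, each task is assumed to own exactly one partition, and the saved per-task state is reduced to a single loop program counter (the L-PC label in the figure).

```python
from typing import Optional


class TaskStateTable:
    """Maps task IDs to PLIC partitions and saved loop state (toy model)."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.partition_of = {}  # task ID -> partition ID
        self.saved_lpc = {}     # task ID -> saved L-PC
        self.current_task: Optional[int] = None
        self.lpc = 0            # L-PC of the running task

    def allocate(self, task_id: int) -> int:
        """Give task_id its own partition so tasks cannot evict each other."""
        if task_id in self.partition_of:
            return self.partition_of[task_id]
        if len(self.partition_of) >= self.num_partitions:
            raise RuntimeError("no free PLIC partition")
        partition = len(self.partition_of)
        self.partition_of[task_id] = partition
        self.saved_lpc[task_id] = 0
        return partition

    def context_switch(self, next_task: int) -> int:
        """Save the outgoing task's L-PC and restore the incoming one.

        Returns the partition ID selected for next_task.
        """
        if self.current_task is not None:
            self.saved_lpc[self.current_task] = self.lpc
        partition = self.allocate(next_task)
        self.lpc = self.saved_lpc[next_task]
        self.current_task = next_task
        return partition


if __name__ == "__main__":
    table = TaskStateTable(num_partitions=2)
    print(table.context_switch(7))  # task 7 gets the first partition
    table.lpc = 40                  # task 7 advances inside its loop
    print(table.context_switch(9))  # task 9 gets a separate partition
    print(table.context_switch(7), table.lpc)  # task 7 resumes where it left off
```

Because every task keeps its own partition in this model, switching tasks changes only which partition is selected and never overwrites another task's cached loop instructions.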
Case Study
A case study of a multitasking application of 5 tasks:
• adpcm, jpeg, rawdaudio, sha, stringsearch
Processor specified at RTL level for simulation (ISS)
1KB PLIC, 8KB I-cache
• CACTI, Design Compiler used for energy/area evaluation
Round-robin task scheduling, with switching intervals of 5K, 10K, and 20K cycles
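The scheduling policy used in the case study can be sketched as a fixed-interval round-robin schedule. The Python below is a minimal illustration; the function name and the idea of listing (start cycle, task) pairs are assumptions, and the 30,000-cycle horizon is an arbitrary example value, not a figure from the slides.

```python
from itertools import cycle


def round_robin_schedule(tasks, interval, total_cycles):
    """Return (start_cycle, task) pairs for fixed-interval round-robin."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not tasks:
        return []
    order = cycle(tasks)  # repeats the task list in the same order forever
    return [(start, next(order)) for start in range(0, total_cycles, interval)]


if __name__ == "__main__":
    tasks = ["adpcm", "jpeg", "rawdaudio", "sha", "stringsearch"]
    for start, task in round_robin_schedule(tasks, 5_000, 30_000):
        print(start, task)
```

A shorter switching interval produces more context switches over the same horizon, which is why the interval is varied in the experiment.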
Results
Reduction: 50% I-cache access, 6~18% I-cache miss, 36% I-cache energy
Conclusions
Loops are common in applications of most embedded systems
DLIC: reduce instruction fetch/decode power
• Software-controlled SPM-like structure for decoded instructions
• 66% (up to 87%) energy saved with performance overhead of 1.4%
PLIC: reduce I-cache access/miss for multitasking system
• A low-cost Task State Table for context switch at hardware level
• Reduction: 50% I-cache access, 6~18% I-cache miss, 36% I-cache energy
Thank you!
DLIC Overall Architecture
PLIC Overall Architecture
Experimental Setup
[Figure: HW/SW co-design evaluation flow for DLIC. Labels: Application, ISA (PISA), GCC, object code, SimpleScalar, profiling, execution trace, SW eval. (performance), ASIPmeister, VHDL (Syn.), VHDL (Sim.), Synopsys Design Compiler, ModelSim, CACTI (I-cache, memory), HW eval. (area, energy, delay)]