Energy-efficient and High Throughput Sparse Distributed Memory Architecture
Mingu Kang, Eric P. Kim, Min-Sun Keel, Naresh R. Shanbhag
May 27, 2015
Brain-inspired Computing
• Energy efficient: 100X over von Neumann architecture
• Associates stored data with information
• Extracts information from noisy and incomplete data to achieve cognition and decision making
Sparse Distributed Memory (SDM)
• Mathematical model inspired by human long-term memory (Kanerva '98)
• Used as auto-/hetero-associative memory
• Hamming-distance-based address decoder + counter array
• Tolerance to noisy data translates into tolerance to hardware non-idealities
Read/Write Operation of SDM
[Figure: write operation and read operation of sparse distributed memory (SDM), each showing the address decoder and the counter array; labels A(i), S, C(i), P, D, sums]
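The write/read flow above can be sketched in software. This is a minimal behavioral model of the textbook Kanerva SDM, assuming random binary hard-location addresses A(i), a fixed Hamming radius for row selection, and one signed counter per data bit in each row C(i); the class `SDM`, its methods, and the `radius` parameter are hypothetical names for illustration, not the hardware described in these slides.

```python
import random

class SDM:
    """Behavioral SDM sketch (hypothetical API, not the proposed hardware)."""

    def __init__(self, num_rows, addr_bits, data_bits, radius, seed=0):
        rng = random.Random(seed)
        # A(i): fixed random binary address of hard location i
        self.addresses = [[rng.randint(0, 1) for _ in range(addr_bits)]
                          for _ in range(num_rows)]
        # C(i): one signed counter per data bit for every hard location
        self.counters = [[0] * data_bits for _ in range(num_rows)]
        self.radius = radius

    def select(self, p):
        """Address decoder: rows whose Hamming distance to p is <= radius."""
        return [i for i, a in enumerate(self.addresses)
                if sum(aj ^ pj for aj, pj in zip(a, p)) <= self.radius]

    def write(self, p, d):
        """In every selected row, add +1 for each 1 bit of d and -1 for each 0 bit."""
        for i in self.select(p):
            row = self.counters[i]
            for k, bit in enumerate(d):
                row[k] += 1 if bit else -1

    def read(self, p):
        """Sum counters of the selected rows per bit, then threshold at 0."""
        rows = self.select(p)
        sums = [sum(self.counters[i][k] for i in rows)
                for k in range(len(self.counters[0]))]
        return [1 if s > 0 else 0 for s in sums]
```

Used as an auto-associative memory, a pattern is written at its own address and then read back from a corrupted copy of that address; rows selected by both addresses carry the stored counters, which is what lets recall tolerate noise.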
SDM Implementation Challenges
• The address decoder is the throughput and energy bottleneck
  – Memory read accounts for over 90% of the delay and 70% of the energy
• The counter array requires high-data-rate communication between memory banks
  – Large numbers of routing lines toggle (i.e., 30K lines toggled per read)
• Proposed solution
  – Compute Memory-based address decoder
  – Hierarchical Binary Decision (HBD)-based counter array
Compute Memory
• In-memory computing platform
• Multi-row (MR) READ
• Embedded analog processing w/ low voltage swing
• Prior applications: pattern matching (ICASSP '14), convolutional network (ICASSP '15)
Conventional System
• Separate memory (low-swing) and processor (high-swing)
• Memory-processor interface bottleneck → severe for data-rich applications
• Memory hierarchy and latency
Compute Memory-based Address Decoder
[Figure: CM-based address decoder with capacitive adder]
• All columns read at a time
• No bottleneck from IO bus width
  – Higher throughput → less leakage w/ power gating
• Embedded analog signal processing
  – No digital blocks → energy & area savings
$H(i) = \sum_{j} a(i,j) \oplus p(j)$ (Hamming distance between stored address A(i) and input pattern P)
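The role of the capacitive adder can be illustrated with a behavioral model. This sketch assumes that each column supplies its mismatch bit a(i,j) ⊕ p(j) as a 0 V or VDD level, that equal capacitors charge-share so the output is VDD times the mean of those bits, and that row selection is an analog comparison against a threshold; `VDD`, `capacitive_adder`, `cm_decode`, and the Gaussian `noise_sigma` term are hypothetical stand-ins, not the circuit in this work.

```python
import random

VDD = 1.0  # hypothetical supply voltage (V)

def hamming(a, p):
    """Digital reference: H(i) = sum_j a(i,j) XOR p(j)."""
    return sum(aj ^ pj for aj, pj in zip(a, p))

def capacitive_adder(bits, vdd=VDD):
    """Equal capacitors charge-shared: V_out = sum(C * V_k) / sum(C) = vdd * mean(bits)."""
    return vdd * sum(bits) / len(bits)

def cm_decode(addresses, p, radius, noise_sigma=0.0, rng=None):
    """Select rows whose analog Hamming voltage (plus optional noise) is below threshold."""
    rng = rng or random.Random(0)
    n = len(p)
    # Threshold halfway between `radius` and `radius + 1` mismatches
    v_th = VDD * (radius + 0.5) / n
    selected = []
    for i, a in enumerate(addresses):
        mismatches = [aj ^ pj for aj, pj in zip(a, p)]
        # Gaussian term is a crude stand-in for analog non-idealities
        v = capacitive_adder(mismatches) + rng.gauss(0.0, noise_sigma)
        if v < v_th:
            selected.append(i)
    return selected
```

With `noise_sigma=0` the model reproduces the digital decoder exactly. Raising it shows rows near the radius boundary being occasionally mis-selected, which is the kind of hardware error that SDM's noise tolerance is expected to absorb.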
Counter Array using Hierarchical Binary Decision (HBD)
[Figure: multi-block counter array w/ HBD]
• Local binary decision from each block
  – Reduced global routing lines and toggling energy
• Final decision based on a summation of the local decisions, each weighted by the number of chosen rows in the m-th block ($N_m$)
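One reading of the HBD scheme above, for a single data bit, is sketched below. It assumes each block m forms a local counter sum $S_m$ over its chosen rows, sends only the one-bit decision $d_m = \pm 1$, and that the global decision thresholds $\sum_m N_m d_m$ at 0. `exact_bit` and `hbd_bit` are hypothetical names, and the exact weighting used in the hardware may differ.

```python
def exact_bit(block_sums):
    """Reference: threshold the full counter sum across all blocks at 0."""
    return 1 if sum(block_sums) > 0 else 0

def hbd_bit(block_sums, block_counts):
    """HBD sketch: one bit per block, weighted by N_m (chosen rows in block m)."""
    total = 0
    for s_m, n_m in zip(block_sums, block_counts):
        if n_m == 0:
            continue  # block has no chosen rows, so it contributes nothing
        d_m = 1 if s_m > 0 else -1  # local binary decision sent over one line
        total += n_m * d_m
    return 1 if total > 0 else 0
```

For block sums `[5, -2, 3]` with `N_m = [4, 3, 2]`, both functions return 1. For `[1, -9]` with `[5, 1]`, the exact sum gives 0 but HBD gives 1. HBD is therefore an approximation: each block replaces a multi-bit partial sum with a single bit, and occasional disagreements like this one are the price paid for fewer global routing lines.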
Auto-associative Memory Recall
Example of proposed SDM operation (auto-associative memory)
[Figure: recalled images for the input and successive iterations]
Error rate: input 25%, 1st iteration 21%, 2nd iteration 4%, 3rd iteration 0%
• 45nm SOI process technology
• Read/Write operation
  – Nine gray-scale patterns (P) shaped as the digits 1 to 9, with 25% of bits randomly reversed
  – 225 noisy copies written for each P → 2025 training data in total
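The training-set construction and the error-rate metric above can be sketched as follows. This assumes binary pixels, a hypothetical 16×16 image size, and exactly 25% of the bits reversed in each noisy copy. `flip_bits`, `error_rate`, and `make_training_set` are hypothetical helpers, and random vectors stand in for the actual digit images.

```python
import random

def flip_bits(pattern, fraction, rng):
    """Return a copy with exactly round(fraction * len) randomly chosen bits reversed."""
    n_flip = round(fraction * len(pattern))
    noisy = list(pattern)
    for k in rng.sample(range(len(pattern)), n_flip):
        noisy[k] ^= 1
    return noisy

def error_rate(recalled, original):
    """Percentage of incorrect pixels."""
    wrong = sum(r != o for r, o in zip(recalled, original))
    return 100.0 * wrong / len(original)

def make_training_set(patterns, copies=225, fraction=0.25, seed=0):
    """Noisy copies of every pattern, each tagged with its pattern index."""
    rng = random.Random(seed)
    return [(flip_bits(p, fraction, rng), label)
            for label, p in enumerate(patterns)
            for _ in range(copies)]
```

With nine patterns and 225 copies each, the set holds 9 × 225 = 2025 entries. Every copy starts at the 25% input error rate that the recall iterations then drive down.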
Error Rate vs. Energy
[Figure panels: bit error rate (% of incorrect pixels); energy consumption]
• 14.5X delay reduction
• 2.7X energy saving
• Negligible error rate degradation (< 0.4%)
Conclusion
• In-memory computing platform (Compute Memory) applied to the address decoder
• Hierarchical binary decision (HBD) used to reduce hardware in the counter array
• Delay reduction: 14.5X
• Energy saving: 2.7X
• Energy-delay product reduction: 39X
• Negligible accuracy degradation
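Because the energy-delay product (EDP) is energy multiplied by delay, the energy and delay improvement factors multiply. The following check, taken against the same baseline as the reported 14.5X and 2.7X figures, confirms the 39X value:

```latex
\frac{\mathrm{EDP}_{\text{baseline}}}{\mathrm{EDP}_{\text{proposed}}}
  = \frac{E_{\text{baseline}}}{E_{\text{proposed}}} \times \frac{D_{\text{baseline}}}{D_{\text{proposed}}}
  = 2.7 \times 14.5 = 39.15 \approx 39
```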
Q & A