Nano-Tera Poster 2014, 2nd Prize: ObeSense
Hardware / Software Optimizations for Efficient Embedded Digital Signal Processing
in Wireless Body Sensor Nodes
ObeSense RTD 2013
Rubén Braojos, Giovanni Ansaloni, David Atienza
Embedded Systems Laboratory, EPFL
Overhead    3L-MF    3L-MMD    RP-CLASS
Code size   2.6 %    0.9 %     0.7 %
Run-time    1.7 %    1.0 %     0.6 %
Experimental Results
[Figure: power consumption (µW, axis from 0 to 100) of the single-core (SC) and multi-core (MC) systems for 3L-MF, 3L-MMD and RP-CLASS, broken down into Clock Tree, D-Crossbar, I-Crossbar, Cores & logic, Data Mem. and Prog. Mem.]
• Multi-core WBSNs are more energy efficient than single-core equivalents if they are properly synchronized
  → Up to 40% less power consumption (3L-MF)
  → Very low overhead (< 3%) due to synchronization
• Synchronization + Broadcasting = Memory energy efficiency → 40% fewer accesses to IM and 3.7% fewer accesses to DM
• SIMD + Workload division → Lower clock constraint → VFS → 15% reduction of system VDD
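The effect of the 15% VDD reduction on power can be estimated with the standard first-order CMOS model, in which dynamic power scales as C · VDD² · f. The sketch below applies only that textbook approximation; it is not the poster's post-layout power model, and the function name and the constant-capacitance assumption are illustrative.

```python
def dynamic_power_ratio(vdd_scale: float, freq_scale: float = 1.0) -> float:
    """Ratio of dynamic power after scaling to dynamic power before scaling.

    Assumes the first-order CMOS model P_dyn ~ C * VDD^2 * f, with the
    switched capacitance and activity factor unchanged. This is an
    assumption for illustration, not a figure taken from the poster.
    """
    if vdd_scale <= 0 or freq_scale <= 0:
        raise ValueError("scale factors must be positive")
    return vdd_scale ** 2 * freq_scale


# 15% lower VDD at an unchanged clock frequency.
ratio = dynamic_power_ratio(0.85)
saving_percent = (1.0 - ratio) * 100.0
print(f"dynamic power ratio: {ratio:.4f}, saving: {saving_percent:.2f}%")
```

Under this model, a 15% lower VDD alone reduces dynamic power by roughly 28% at equal frequency. The measured totals on the poster also reflect the reduced memory accesses and other effects that this estimate ignores.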
Wireless Body Sensor Nodes (WBSNs) in Healthcare
[Figure: processing chain from input bio-signals to output diagnosis, with Filter, Analysis and Fusion blocks. Consecutive phases → Parallelism? Multiple inputs → Parallelism?]
• WBSNs are miniaturized devices able to acquire, process and transmit bio-signals (ECG, EMG, blood pressure, movements…)
  ↑ Wearable, unobtrusive, inexpensive
  ↓ Limited resources and autonomy
• On-board Digital Signal Processing (DSP) is employed to improve energy efficiency.
• Can algorithmic parallelism be exploited?
New Trend: Multi-core WBSN for Efficient DSP
[Figure: Multi-core WBSN with instruction memory (IM), data memory (DM), cores and peripherals, shown alongside the application partition and its parallelism.]
• Workload is divided into subtasks by pipelining phases and performing SIMD execution over several bio-signals
  → Each subtask is executed in a core
  → Voltage frequency scaling (VFS)
• Challenges
  • Data-dependent branches (SIMD)
  • Data and control flow among phases
• Lack of efficient mechanism for
  • Re-synchronization to maximize SIMD
  • Producer-consumer notification
REFERENCES:
[1] Dogan et al., “Low-power processor architecture exploration for online biomedical signal analysis,” IET-CDS, 2012.
[2] Rahimi et al., “A fully-synthesizable single-cycle interconnection network for Shared-L1 processor clusters,” DATE’11.
[3] Y. Sun et al., “ECG signal conditioning by morphological filtering,” Computers in Biology and Medicine, 2002.
[4] F. Rincon et al., “Development and evaluation of multilead wavelet-based ECG delineation algorithms for embedded wireless sensor nodes,” Information Tech. in Biomedicine, 2011.
[5] Braojos et al., “A methodology for embedded classification of heartbeats using random projections,” in DATE’13.
Conclusions

Two considered systems:
• Synchronized multi-core WBSN (MC)
• Equivalent single-core WBSN (SC)

Evaluation Framework
• RTL implementation
• SystemC cycle-accurate simulator
• Custom compilation tool-chain
• Power model derived from post-layout simulations
Proposed Solution: Lightweight HW / SW Mechanism for Code Synchronization
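[Figure: proposed architecture. Cores #0 … #N are connected through two interconnects to the IM and to the DM; the DM holds private sections (Private #0, Private #1, …) and a Shared section. A Synchronizer unit is part of the design.]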
Steps to adapt existing bio-medical applications
1. Partitioning: Identify algorithmic phases (producer-consumer) and potential parallel computations (SIMD)
2. Instruction insertion: Implement re-synchronization and producer-consumer relations with the specific ISE
3. Code mapping: Cores performing SIMD share the same IM bank
Synchronizer unit:
• Stalls and wakes up cores to orchestrate execution
• Guarantees re-synchronization
Synchronization points:
• Store run-time information for each synchronization event
• Reserved space in shared memory
Instruction Set Extension (ISE):
• SLEEP, SINC, SDEC and SNOP
• Mark synchronization events and modify synchronization points
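The interplay of the synchronizer unit, the synchronization points and the four instructions can be illustrated with a small software model. The poster names the instructions but does not define their semantics, so everything below is an assumption made for illustration: each synchronization point is modeled as a counter plus a set of flagged cores, SINC and SDEC increment and decrement the counter, SNOP flags a core without changing the counter, and SLEEP stalls a core until the counter of a point it is flagged on returns to zero. The class and method names are hypothetical.

```python
class Synchronizer:
    """Toy model of a synchronizer unit with counter-based sync points.

    ASSUMED semantics (not specified on the poster):
      SINC  - increment the point's counter and flag the issuing core
      SDEC  - decrement the counter and flag the issuing core
      SNOP  - flag the issuing core without touching the counter
      SLEEP - stall the core while it is flagged on a pending point
    When a counter returns to zero, every core flagged on that point is
    woken up and the flags are cleared.
    """

    def __init__(self, num_points: int) -> None:
        self.counter = [0] * num_points
        self.flagged = [set() for _ in range(num_points)]
        self.asleep = set()

    def sinc(self, core: int, point: int) -> None:
        self.counter[point] += 1
        self.flagged[point].add(core)

    def sdec(self, core: int, point: int) -> None:
        self.counter[point] -= 1
        self.flagged[point].add(core)
        if self.counter[point] == 0:
            self._wake(point)

    def snop(self, core: int, point: int) -> None:
        self.flagged[point].add(core)

    def sleep(self, core: int) -> None:
        # Only stall if the core still waits on some pending point.
        if any(core in cores for cores in self.flagged):
            self.asleep.add(core)

    def _wake(self, point: int) -> None:
        self.asleep -= self.flagged[point]
        self.flagged[point].clear()


# Re-synchronization after a data-dependent branch (barrier-like use).
sync = Synchronizer(num_points=1)
for core in (0, 1, 2):
    sync.sinc(core, 0)          # all cores enter the divergent region
for core in (0, 2):
    sync.sdec(core, 0)          # cores 0 and 2 finish early...
    sync.sleep(core)            # ...and are stalled
print(sorted(sync.asleep))      # [0, 2]
sync.sdec(1, 0)                 # the last core arrives, counter hits zero
sync.sleep(1)                   # no-op: nothing is pending any more
print(sorted(sync.asleep))      # []
```

In this model the same primitives also cover producer-consumer notification: a producer issues SINC before producing and SDEC when its data is ready, while a consumer issues SNOP followed by SLEEP and is woken by the producer's SDEC.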
Multi-core WBSN (based on [1])
• RISC ultra-low power processors
• Multi-banked IM and DM + shared and private DM
  → Minimize memory conflicts
• Logarithmic combinational interconnect [2] with arbitration capabilities
• Broadcasting: Simultaneous requests of the same address are merged into a single memory access
  → Memory energy efficiency
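The benefit of broadcasting can be shown by counting the memory accesses needed to serve one cycle of requests. This is a minimal sketch that ignores bank conflicts and arbitration; the function name and the example addresses are hypothetical.

```python
from typing import Iterable


def memory_accesses(requests: Iterable[int], broadcast: bool) -> int:
    """Count memory accesses needed to serve one cycle of read requests.

    `requests` holds the address requested by each core in this cycle.
    With broadcasting, requests to the same address are merged into a
    single access; without it, every request costs one access.
    """
    requests = list(requests)
    if broadcast:
        return len(set(requests))
    return len(requests)


# Three cores in lockstep (SIMD) fetch the same instruction address.
lockstep = [0x100, 0x100, 0x100]
# After a data-dependent branch the cores fetch different addresses.
diverged = [0x100, 0x104, 0x120]

print(memory_accesses(lockstep, broadcast=False))  # 3
print(memory_accesses(lockstep, broadcast=True))   # 1
print(memory_accesses(diverged, broadcast=True))   # 3
```

The diverged case shows why re-synchronization matters: merging only saves accesses while the cores request identical addresses.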
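[Figure: task graphs of the three benchmarks. One shows three parallel filtering stages; one shows three filters followed by Combination and Delineation; one shows three filters, Combination and Delineation together with a classification stage.]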
Benchmarks: Embedded Electrocardiogram (ECG) Applications for WBSNs
Multi-lead ECG filtering (3L-MF) [3]
• Removes unwanted artifacts (perspiration, muscular activity, …) from multiple ECG signals
Multi-lead ECG delineation (3L-MMD) [4]
• Combines multiple filtered signals and finds onset, peak and end of the main ECG waves
Heart-beat classifier (RP-CLASS) [5]
• Performs multi-lead delineation only in the presence of abnormalities
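As an illustration of the kind of processing behind morphological ECG filtering, the sketch below removes a slowly varying baseline with a generic opening followed by closing. It is a textbook form with a flat structuring element, not necessarily the exact filter of [3] or of 3L-MF; the function names, the window width and the sample values are assumptions.

```python
from typing import Callable, List, Sequence


def _sliding(signal: Sequence[float], width: int,
             pick: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply `pick` (min or max) over a centered window, clipped at the edges."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    half = width // 2
    n = len(signal)
    return [pick(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def erode(signal: Sequence[float], width: int) -> List[float]:
    return _sliding(signal, width, min)


def dilate(signal: Sequence[float], width: int) -> List[float]:
    return _sliding(signal, width, max)


def opening(signal: Sequence[float], width: int) -> List[float]:
    # Erosion then dilation: suppresses peaks narrower than the window.
    return dilate(erode(signal, width), width)


def closing(signal: Sequence[float], width: int) -> List[float]:
    # Dilation then erosion: suppresses pits narrower than the window.
    return erode(dilate(signal, width), width)


def remove_baseline(signal: Sequence[float], width: int) -> List[float]:
    """Estimate the baseline by opening then closing, and subtract it."""
    baseline = closing(opening(signal, width), width)
    return [s - b for s, b in zip(signal, baseline)]


# A narrow peak on top of a constant offset of 5 (made-up samples).
samples = [5, 5, 5, 9, 5, 5, 5]
print(remove_baseline(samples, width=3))  # [0, 0, 0, 4, 0, 0, 0]
```

Each lead can be filtered independently with the same code, which is what makes this kind of benchmark a natural fit for SIMD execution across cores.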
[Figure labels: Only SIMD execution; SIMD execution + Producer-consumer; SIMD execution + Producer-consumer + complex control; activation]