Nano-Tera Poster 2014 2nd PrizeObeSense

Post on 14-Apr-2017

216 views 0 download

Transcript of Nano-Tera Poster 2014 2nd PrizeObeSense

Hardware / Software Optimizations for Efficient Embedded Digital Signal Processing

in Wireless Body Sensor Nodes

ObeSense RTD 2013

Rubén Braojos, Giovanni Ansaloni, David Atienza Embedded Systems Laboratory, EPFL

Overhead 3L-MF 3LMMD RPCLASS Code size 2,6 % 0,9 % 0,7 % Run-time 1,7 % 1,0 % 0,6 %

Experimental  Results  3L-MF

SC MC 0

25

50

75

100 3L-MMD RP-CLASS

Pow

er c

onsu

mpt

ion

(µW

)

Clock Tree D-Crossbar I-Crossbar Cores & logic Data Mem. Prog. Mem.

SC MC SC MC

•  Mul$-­‐core  WBSN  are  more  energy  efficient  than  single-­‐core  equivalents  if  they  are  properly  synchronized    !Up  to  40%  less  power  consump:on  (3L-­‐MF)    !Very  low  overhead  (<  3%)  due  to  synchroniza$on  

 

•  Synchroniza$on  +  Broadcas:ng  =  Memory  energy  efficiency    à  40%  less  accesses  to  IM  and  3.7%    less  accesses  to  DM  

 

•  SIMD  +  Workload  division  à  Lower  clock  constraint  à  VFS    !  15%  Reduc:on  of  System  VDD  

Wireless  Body  Sensor  Nodes  (WBSNs)  in  Healthcare  

Input Bio-signals

Output Diagnosis Fusion   Analysis  Filter  

Consecutive phasesà Parallelism?

Multiple inputs à Parallelism?

•  WBSNs  are  miniaturized  devices  able  to  acquire,  process  and  transmit  bio-­‐signals  (ECG,  EMG,  blood  pressure  movements…)  é  Wearable,  unobtrusive,  inexpensive  ê  Limited  resources  and  autonomy  

•  On-­‐board  Digital  Signal  Processing  (DSP)  is  employed  to  improve  energy  efficiency.  

•  Can  algorithmic  parallelism  be  exploited?  

New  Trend:  Mul:-­‐core  WBSN  for  Efficient  DSP  

Multi-core WBSN

Inst.  memory  (IM)  

Data  memory  (DM)  

Core   Core  

Core   Core  

Core   Core  

Peripherals  

Application partition

Parallelism

Parallelism

•  Workload  is  divided  into  subtasks  by  pipelining  phases  and  performing        SIMD  execu$on  over  several  bio-­‐signals    à  Each  subtask  is  executed  in  a  core    

       à  Voltage  frequency  scaling  (VFS)  •  Challenges  

•  Data-­‐dependent  branches  (SIMD)  •  Data  and  control  flow  among  phases  

•  Lack  of  efficient  mechanism  for  •  Re-­‐synchroniza$on  to  maximize    SIMD  •   Producer-­‐consumer  no$fica$on  

REFERENCES:    [1]  Dogan,  et  al.,  “Low-­‐  power  processor  architecture  explora$on  for  online  biomedical  signal  analysis,”  IET-­‐CDS  2012.  [2]  Rahimi,  et  al.  “A  fully-­‐synthesizable  single-­‐cycle  interconnec$on  network  for  Shared-­‐L1  processor  clusters”  DATE’11  [3]  Y.  Sun  et  al.,  “ECG  signal  condi$oning  by  morphological  filtering,”  Computers  in  Biology  and  Medicine,  2002.  [4]  F.  Rincon  et  al.,  “Development  and  evalua$on  of  mul$lead  wavelet-­‐based  ECG  delinea$on  algorithms  for  embedded  wireless  sensor  nodes,”  Informa$on  Tech.  in  Biomedicine,  2011.    [5]  Braojos  et  al.,  “A  methodology  for  embedded  classifica$on  of  heartbeats  using  random  projec$ons,”  in  DATE‘13  

 

Conclusions  Two  considered  systems:  •  Synchronized  mul$-­‐core  

WBSN  (MC)  •  Equivalent  single-­‐core  

WBSN  (SC)  

Evalua$on  Framework    •  RTL  implementa$on  •  SystemC  cycle-­‐accurate  

simulator  •  Custom  compila$on              tool-­‐chain  •  Power  model  derived  from  

post-­‐layout  simula$ons  

Proposed  Solu:on:  Lightweigth  HW  /  SW  Mechanism  for  Code  Synchroniza:on  

Cores

INTERC

ONNEC

T  

INTERC

ONNEC

T  

#0  

#1  

#2  

#3  

#4  

#N  

IM DM

Private  #0  

Private  #1  

Private  #2  

Private  #3  

Private  #4  

Shared  

Synchronizer  Steps  to  adapt  exis$ng  bio-­‐medical  applica$ons    

1.   Par::oning:  Iden$fy  algorithmic  phases  (producer-­‐consumer)    and  poten$al  parallel  computa$ons  (SIMD)  

2.   Instruc:on  inser:on:  Implement  re-­‐synchroniza$on  and  producer-­‐consumer  rela$ons  with  the  specific  ISE  

3.   Code  mapping:  Cores  performing  SIMD  share  the  shame  IM  bank  

Synchronizer unit: •  Stalls and wakes-up cores to

orchestrate execution •  Guaranties re-synchronization

Synchronization points: •  Store run-time information for each

synchronization event •  Reserved space in shared memory

Instruction Set Extension (ISE): •  SLEEP, SINC, SDEC and SNOP •  Mark synchronization events and

modify synchronization points

Mul$-­‐core  WBSN  (based  on  [1])  •  RISC  ultra-­‐low  power  

processors  

•  Mul:-­‐banked  IM  and  DM  +  shared  and  private  DM    

       à  Minimize  memory  conflicts  

•  Logarithmic  combina$onal  interconnect  [3]  with  arbitra$on  capabili$es  

•  Broadcas:ng:  Simultaneous  request  of  the  same  address  are  merged  into  a  single  memory  access    

     à    Memory  energy  efficiency  

filtering  

filtering  

filtering  

filt.  

filt.  

filt.  

Combina$on   Delinea$on   filt.  

filt.  

filt.  

Combina$on   Delinea$on  

classifica$on  

Mul:-­‐lead  ECG  filtering  (3L-­‐MF)    [3]  •  Remove  unwanted  ar$facts  (perspira$on,  

muscular  ac$vity,…)  from  mul$ple  ECG  signals  

Mul:-­‐lead  ECG  delinea:on  (3L-­‐MMD)  [4]  •  Combines  mul$ple  filtered  signals  and  finds  

onset,  peak  and  end  of  the  main  ECG  waves  

Heart-­‐beat  classifier  (RP-­‐CLASS)  [5]  •  Performs  mul$-­‐lead  delinea$on  only  in  

the  presence  of  abnormali$es  

Benchmarks:  Embedded  Electrocardiogram  (ECG)  Applica:ons  for  WBSNs  

Only SIMD execution

SIMD execution + Producer-consumer

SIMD execution + Producer-consumer + complex control

activation