Atomix: A Framework for Deploying
Signal Processing Applications on Wireless Infrastructure
Manu Bansal, Aaron Schulman, Sachin Katti
Stanford University
NSDI ‘15
Software-defined base stations
int main() { //PHY & MAC … }
Wireless infrastructure is programmable
TI 6670 multicore DSP SoC
DSP0 DSP1
Shared SRAM
Local SRAM
Local SRAM
We could be deploying apps into infrastructure
RF localization (SecureArray)
Video-optimized wireless stack (APEX)
BER feedback (SoftRate)
What modifications are needed to deploy apps?
Primitives needed for deploying apps
Base-station software needs to provide a modular interface to tap, tweak, and insert
CSI
EQ VITDEC
Tap
OFDM SLICER
(BER)
Insert
SYNC
Tweak
Apps require high throughput and low latency
Software must deliver hardware-like performance
Time 250us 16us
Data frame ACK frame
… 20Msps == 640Mbps
High processing throughput
Low processing latency
WiFi stack example (LTE has easier latency constraints)
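The throughput figure on this slide can be checked with a line of arithmetic. The sketch below assumes each complex baseband sample is stored as a 16-bit I component plus a 16-bit Q component (32 bits in total); the slide states only the 20 Msps and 640 Mbps figures, so the sample format and the names used here are assumptions for illustration.

```python
# Figures from the slide: 20 Msps of input samples, 640 Mbps of input data.
SAMPLE_RATE_SPS = 20_000_000

# Assumption (not stated on the slide): 16-bit I + 16-bit Q per complex sample.
BITS_PER_SAMPLE = 32


def input_rate_mbps(sample_rate_sps: int, bits_per_sample: int) -> float:
    """Raw input data rate in megabits per second."""
    return sample_rate_sps * bits_per_sample / 1e6


rate = input_rate_mbps(SAMPLE_RATE_SPS, BITS_PER_SAMPLE)
print(f"{rate:.0f} Mbps")
```

Under that assumption the product matches the 640 Mbps on the slide.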
Getting hardware-like performance
DSP0 DSP1
Shared SRAM
Local SRAM
Local SRAM
Pipeline programming
Memory management
Inter-core data transfers
Customized hardware
Hand-optimized software
Modular changes can have unpredictable effects on timing
The Atom Abstraction
Atom A: tA (e.g. 200 cycles)
Atom: A unit of execution with fixed, known timing
Atom A, Atom B: tA + tB
Composability: Composition of atoms is also an atom
Base-station software in Atomix…
1) Can be built entirely out of atoms
2) Achieves hardware-like performance
3) Enables modular modifications
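The composability property (tA + tB) can be sketched in a few lines. This is an illustrative model only, not the Atomix API: the `Atom` class, the `compose` helper, and the 300-cycle value for atom B are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical model of an atom: a unit with fixed, known timing."""
    name: str
    cycles: int


def compose(*atoms: Atom) -> Atom:
    """Run atoms back to back; the result is itself an atom whose
    timing is the sum of the parts."""
    return Atom(
        name="+".join(a.name for a in atoms),
        cycles=sum(a.cycles for a in atoms),
    )


atom_a = Atom("A", 200)  # 200 cycles, as in the slide's example
atom_b = Atom("B", 300)  # hypothetical value for illustration
ab = compose(atom_a, atom_b)
print(ab.name, ab.cycles)
```

Because `compose` returns an `Atom`, a composed atom can be composed again, which is what lets a whole flowgraph be treated as one atom with known timing.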
Base-station software can be built with atoms
Atomix WiFi chain can meet throughput and latency
Apps are easily added to Atomix WiFi
WiFi signal processing chain
Time
Data frame Data frame
…
CSI EQ
OFDM SLICER SYNC
Data flowgraph over signal processing blocks
(BPSK or QPSK)
(BPSK) (QPSK)
Implementing blocks as atoms
Slicer (BPSK or QPSK): N? M?
Constellation? BPSK: 200cy, QPSK: 400cy
Input data length? 1 OFDM symbol: 200cy, 2 OFDM symbols: 400cy
1) Split out branches, 2) Fix data lengths
Make branching explicit
BPSK48: 48 cplx, 48 bits (tBPSK48 = 200cy)
QPSK48: 48 cplx, 96 bits
CSI EQ
OFDM SLICER SYNC
(BPSK or QPSK)
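The two steps above (split out branches, fix data lengths) can be sketched as two separate fixed-size functions in place of one generic slicer. This is a minimal hard-decision sketch, not the Atomix implementation: the function names and the bit mapping (non-negative component maps to 1) are assumptions made for illustration.

```python
from typing import List, Sequence

N_SUBCARRIERS = 48  # fixed input length: 48 complex values per OFDM symbol


def _check_length(symbols: Sequence[complex]) -> None:
    # A fixed input length is what lets each block have one known cost.
    if len(symbols) != N_SUBCARRIERS:
        raise ValueError(f"expected {N_SUBCARRIERS} symbols, got {len(symbols)}")


def bpsk48(symbols: Sequence[complex]) -> List[int]:
    """48 complex values in, 48 bits out (assumed mapping: real >= 0 -> 1)."""
    _check_length(symbols)
    return [1 if s.real >= 0 else 0 for s in symbols]


def qpsk48(symbols: Sequence[complex]) -> List[int]:
    """48 complex values in, 96 bits out (assumed mapping: one bit from
    the sign of the real part, one from the sign of the imaginary part)."""
    _check_length(symbols)
    bits: List[int] = []
    for s in symbols:
        bits.append(1 if s.real >= 0 else 0)
        bits.append(1 if s.imag >= 0 else 0)
    return bits


print(len(bpsk48([1 + 0j] * 48)), len(qpsk48([1 + 1j] * 48)))
```

Neither function branches on the constellation or loops over a variable number of symbols, so each one does the same amount of work on every call.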
Implementing flowgraphs as atoms
Make branching explicit
CSI EQ
OFDM SLICER SYNC
CSI EQ
OFDM BPSK48 SYNC
CSI EQ
OFDM QPSK48 SYNC
(BPSK or QPSK)
Implementing flowgraphs as atoms
Explicitly model data access cost using FIFO atoms
BPSK Atom
CSI EQ
OFDM BPSK48 SYNC
CSI
EQ
OFDM
BPSK SYNC
[Figure: FIFO atoms (F) placed between the blocks]
Parallelizing flowgraphs with atoms
[Figure: the flowgraph with FIFO atoms (CSI, EQ, OFDM, SYNC, and the BPSK atom) partitioned across Core 0 and Core 1]
Parallelizing flowgraphs with atoms
[Figure: the same flowgraph across Core 0 and Core 1; the EQUALIZER atom runs on Core 0, the BPSK atom (BPSK48 with its FIFOs) runs on Core 1, and a Transfer atom with its own FIFOs moves data between them]
Explicitly model data transfer cost as an atom
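Once the transfer is an atom with its own cost, the timing of a multi-core pipeline becomes simple bookkeeping. The sketch below is a hedged illustration: all cycle counts are hypothetical, the function names are invented, and charging the Transfer atom to the sending core is an assumption rather than something the slide specifies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (atom name, core, cycles). All cycle counts are hypothetical.
Schedule = List[Tuple[str, int, int]]

schedule: Schedule = [
    ("EQUALIZER", 0, 300),
    ("Transfer", 0, 50),   # assumption: transfer cost charged to the sender
    ("BPSK", 1, 200),
]


def per_core_cycles(sched: Schedule) -> Dict[int, int]:
    """Total cycles of work assigned to each core per input item."""
    totals: Dict[int, int] = defaultdict(int)
    for _name, core, cycles in sched:
        totals[core] += cycles
    return dict(totals)


def pipeline_interval(sched: Schedule) -> int:
    """Steady-state cycles between outputs: the busiest core sets the pace."""
    return max(per_core_cycles(sched).values())


def end_to_end_latency(sched: Schedule) -> int:
    """Cycles for one item to pass through every atom in sequence."""
    return sum(cycles for _name, _core, cycles in sched)


print(per_core_cycles(schedule), pipeline_interval(schedule),
      end_to_end_latency(schedule))
```

With these made-up numbers, Core 0 is the bottleneck at 350 cycles per item, while a single item takes 550 cycles end to end.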
Implementing decisions with atoms
Make branch explicit, push to the top level
DISPATCHER Header atom
BPSK atom
QPSK atom
BPSK
QPSK
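A top-level dispatcher of this kind can be sketched as a table lookup: the header atom produces a decision, and the dispatcher picks which fixed-timing atom runs next. The names `header_atom`, `ATOMS`, and `dispatch`, and the dictionary-shaped header, are hypothetical; the 200 and 400 cycle figures reuse the BPSK and QPSK costs quoted earlier in the talk.

```python
from typing import Dict, Tuple

# Decision -> (atom to run, its fixed cost in cycles).
ATOMS: Dict[str, Tuple[str, int]] = {
    "BPSK": ("BPSK atom", 200),
    "QPSK": ("QPSK atom", 400),
}


def header_atom(header: Dict[str, str]) -> str:
    """Hypothetical header atom: decodes the header and returns a decision."""
    return header["modulation"]


def dispatch(header: Dict[str, str]) -> Tuple[str, int]:
    """Top-level dispatcher: the only place where control flow branches."""
    decision = header_atom(header)
    if decision not in ATOMS:
        raise ValueError(f"unsupported modulation: {decision}")
    return ATOMS[decision]


print(dispatch({"modulation": "QPSK"}))
```

Keeping the branch in one place means every atom below the dispatcher is branch-free, so its timing stays fixed regardless of which path was chosen.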
Atomix framework in one slide
• Everything as an atom
  – Signal processing components
  – Hardware management components
• Atoms for blocks, flowgraphs, states
• Simple control flow makes atoms composable
• Declarative language allows easy modification
Atoms enable modularity, precise timing, efficient pipelines
Fine-grained pipeline parallelism
[Figure: WiFi receive pipeline mapped onto the multicore DSP]
Blocks: CSI, EQ, OFDM, QAM64, SYNC, DEINTERLEAVER, DEPUNCTURER, DECODE-SCATTERER, VITERBI-ISSUE, DECODE-GATHERER, DESCRAMBLER, CRC32
DSP cores: DSP0, DSP1, DSP2, DSP3
VCP0, VCP1, VCP2, VCP3: VITERBI DECODING on each
Input: 80-sample buffers (OFDM symbols)
Output: decoded bits … 010110
Packet Energy
CRC Toggle
Fine-grained pipeline parallelism
Atomix WiFi decodes 10 MHz with resources to spare
Tight packet decode latency
WiFi highest-MCS 1000-byte packets, CDF of decode latency (deadline = 64us at 5 MHz, 32us at 10 MHz, 16us at 20 MHz)
Atomix WiFi decodes 10 MHz with low latency and predictable timing
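The three deadlines quoted on this slide are consistent with a deadline that scales inversely with channel bandwidth, anchored at 16us for 20 MHz. The sketch below encodes that observation and checks a set of latencies against it; the function names are invented and the latency samples are hypothetical, not measurements from the talk.

```python
from typing import Sequence

DEADLINE_AT_20MHZ_US = 16.0  # from the slide


def deadline_us(bandwidth_mhz: float) -> float:
    """Decode deadline, assuming it scales inversely with bandwidth."""
    return DEADLINE_AT_20MHZ_US * 20.0 / bandwidth_mhz


def fraction_meeting_deadline(latencies_us: Sequence[float],
                              bandwidth_mhz: float) -> float:
    """Share of packets decoded within the deadline (one point on the CDF)."""
    limit = deadline_us(bandwidth_mhz)
    return sum(1 for lat in latencies_us if lat <= limit) / len(latencies_us)


# Hypothetical latency samples in microseconds, for illustration only.
samples = [20.0, 25.0, 31.0, 33.0]
print(deadline_us(5), deadline_us(10), deadline_us(20))
print(fraction_meeting_deadline(samples, 10))
```

Reading the CDF at the deadline gives the same quantity as `fraction_meeting_deadline`: the fraction of packets whose decode finished in time.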
Experience with WiFi in Atomix
Low-level code (C)
Signal processing functions (C)
Parallelized Atoms (Ax)
Schedule, Resource Assignment
Atomix compiler
Atomix runtime libraries
Native app binary
Native compiler
3,000 LoC (with Atomix)
30,000 LoC (w/o Atomix)
Location-signature app
No change in WiFi packet decode latency
CSI EQ
OFDM
SLICER
PH
SYNC
Rxx EIG SPT
4 new signal processing blocks
30 lines of code to add app
Predictable change in SYNC atom
2 new signal processing blocks
20 lines of code to add app
Predictable change in SYNC, DATA
CSI EQ
EVM
OFDM
SLICER
CIR
SYNC
ED
CNR
Related work
• Modular frameworks for GPPs and FPGAs
  – SORA: works on GPPs, no clear mapping to DSPs
  – AirBlue: targets FPGAs, different challenges than DSPs
  – Ziria: complementary to Atomix
• Embedded real-time operating systems (Neutrino, VxWorks, TI SYS/BIOS)
  – Typically for low sample rate apps (e.g., anti-lock brakes)
  – Misfits for expressing modular signal-processing apps
  – No abstractions for blocks, flowgraphs, state machines
Conclusion
Atomix: a new programming framework
• Everything is an atom
• Hardware-like performance
• Modularity to tap, tweak, insert
Future work:
• Automated resource scheduling
• Static program checking
• Extending to L2-L7 NFV packet processing