End-to-End Signal Processing and Deep Learning
![Page 1: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/1.jpg)
Daniel B. Bryant
End-to-End Signal Processing and Deep Learning
(using Embedded GPUs)
![Page 2: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/2.jpg)
Deepwave Digital
• Full stack solutions for deep learning and GPU enabled signal processing systems
• Edge compute hardware
• Custom Applications
• Tight coupling of hardware and software for performance
• Radio embedded with FPGA, CPU, GPU
• GPU-based signal processing algorithms
• Pruned neural networks for inference on edge RF systems
• Testing and deployment platform for customer developed applications
• AIR-T open platform for custom applications
• Streamlines development, testing, and deployment
• Many open source software tools
Enabling the Incorporation of Deep Learning and Radio Frequency (RF) Systems
Artificial Intelligence Radio Transceiver
![Page 3: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/3.jpg)
AI to Solve Complex Problems
Artificial Networks Using Deep Learning
Simple Example: Image Recognition
![Page 4: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/4.jpg)
![Page 5: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/5.jpg)
AI to Solve Complex Problems
Artificial Networks Using Deep Learning
Simple Example: Image Recognition
Deep Learning identifies intricate patterns that are too obscure and subtle to be implemented into a human-engineered algorithm
Not a cat
Not a cat
![Page 6: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/6.jpg)
AI to Solve Complex Problems
Artificial Networks Using Deep Learning
Simple Example: Image Recognition
Congested Wireless Spectrum
[Figure: congested wireless spectrum; axes: Frequency (MHz), Time, Power]
Deep Learning identifies intricate patterns that are too obscure and subtle to be implemented into a human-engineered algorithm
Not a cat
Not a cat
![Page 7: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/7.jpg)
Deep Learning and Radio Frequency (RF) Systems
Deep Learning is Emerging
Cyber
• Intrusion Detection
• Threat classification
• Facial recognition
• Imagery analysis
Medicine
• Tumor Detection
• Medical data analysis
• Diagnosis
• Drug discovery
Autonomy
• Pedestrian / obstacle detection
• Navigation
• Street sign reading
• Speech recognition
Internet
• Image classification
• Speech recognition
• Language translation
• Document / database searching
Deep learning technology enabled and accelerated by GPU processors - Only beginning to impact design and applications in wireless and radio frequency systems
[Figure: electromagnetic spectrum with bands Gamma, X-Ray, UV, Visible, Infrared, and Radio Frequencies; wavelength ticks: 1 pm, 10 pm, 10 nm, 400 nm, 700 nm, 1 mm, 100 km]
• Radar
• Satellite Communications
• Electronic Warfare
• Telecommunications
• Military Communications
• Navigation
• UAV Wireless Control
• Wireless Networking
• Internet of Things
• RF Ablation (Medical)
Radio Frequency Technology is Pervasive
Enabled by low-cost, highly capable general purpose graphics processing units (GPUs)
![Page 8: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/8.jpg)
Where to Use Deep Learning in RF Systems
[Diagram, transmit chain: User Apps, Modulate (1 0 1 1 0 0 1), Frequency Convert, Transmit]
[Diagram, receive chain: Receive, Frequency Convert, Demodulate (1 0 1 1 0 0 1), User Apps]
![Page 9: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/9.jpg)
Where to Use Deep Learning in RF Systems
Spectrum / Network Centric Applications
• Spectrum monitoring (threats)
• Intelligent spectrum usage
• Electronic protection (anti-jam)
• Cognitive system control
Device / Basestation Centric Applications
• Advanced modulation techniques
• Adaptive waveforms
• Encryption and security
User App Centric Applications
• Voice / image recognition
• Multi-sensor fusion
• Decision making and data reduction
[Diagram, transmit chain: User Apps, Modulate (1 0 1 1 0 0 1), Frequency Convert, Transmit]
[Diagram, receive chain: Receive, Frequency Convert, Demodulate (1 0 1 1 0 0 1), User Apps]
![Page 10: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/10.jpg)
Deep Learning Comparison
Image and Video
• Multiple channels (RGB)
• x, y spatial dependence
• Temporal dependence (video)
Audio and Language
• Single channel
• Frequency, phase, amplitude
• Temporal dependence
RF Systems and Signals
• Multiple channels
• Frequency, phase, amplitude
• Temporal dependence
• Complex data (I/Q)
• Large Bandwidths
• Human engineered
Existing deep learning potentially adaptable to systems and signals
• Must contend with wideband signals and complex data types
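One common way to handle complex (I/Q) data in frameworks built around real-valued tensors is to split each complex sample into two real channels. The sketch below is a minimal illustration of that convention only; the function name `iq_to_channels` is hypothetical and does not come from this deck.

```python
import numpy as np


def iq_to_channels(iq):
    """Split complex I/Q samples into a (2, N) float32 array: row 0 = I, row 1 = Q."""
    iq = np.asarray(iq)
    return np.stack([iq.real, iq.imag]).astype(np.float32)


# Four samples of a complex tone at one quarter of the sample rate.
n = np.arange(4)
tone = np.exp(2j * np.pi * 0.25 * n).astype(np.complex64)
channels = iq_to_channels(tone)
print(channels.shape)  # (2, 4)
```

The two-row layout mirrors how RGB images are presented as multiple channels, which is one reason image-oriented network architectures can often be reused for signals.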
![Page 11: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/11.jpg)
Why Has Deep Learning in RF Not Been Addressed
Backhaul Bandwidth
• Insufficient bandwidth to upload to data center for processing
• Applications are latency sensitive
Edge Compute Resources
• Insufficient resources for AI/RF applications
• No RF agile AI / RF radio systems
Disjointed Software
• Complications of AI / signal processing software merger
• No existing unifying framework
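The backhaul limitation can be made concrete with rough arithmetic. The sketch below assumes a 125 MSPS complex stream with 16-bit I and 16-bit Q per sample and an ideal 1 Gbit/s link. The sample format and the link rate are illustrative assumptions, not figures stated on this slide.

```python
def raw_rate_bytes_per_s(sample_rate_hz, bytes_per_sample, channels=1):
    """Raw data rate of an uncompressed sample stream, in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


# Assumed: 125 MSPS, complex samples stored as 2 x int16 (4 bytes), one channel.
stream = raw_rate_bytes_per_s(125e6, 4)
link = 1e9 / 8  # ideal 1 Gbit/s link, ignoring protocol overhead
print(f"stream: {stream / 1e6:.0f} MB/s, link: {link / 1e6:.0f} MB/s")
print(f"stream is {stream / link:.0f}x the link capacity")
```

Under these assumptions a single channel already exceeds the link several times over, which is the motivation for processing at the edge and sending only results upstream.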
![Page 12: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/12.jpg)
Hardware for Deep Learning in RF Systems
CPU
Training pros:
• Supported by ML Frameworks
• Lower power consumption
Training cons:
• Slower than GPU
• Fewer software architectures
Inference pros:
• Adaptable architecture
• Software programmable
• Medium latency
Inference cons:
• Low parallelism
• Limited real-time bandwidth
• Medium power requirements
GPU
Training pros:
• Supported by ML Frameworks
• Widely utilized
• Highly parallel / adaptable
• Good throughput vs power
Training cons:
• Overall power consumption
• Requires highly parallel algorithms
Inference pros:
• Adaptable architecture
• High real-time bandwidth
• Software programmable
Inference cons:
• Medium power requirements
• Not well integrated into RF
• Higher latency
FPGA
Training: Not widely utilized, not well suited (yet)
Inference pros:
• High power efficiency
• High real-time bandwidth
• Low latency
Inference cons:
• Long development / upgrades
• Limited reprogrammability
• Requires special expertise
ASIC
Training: Not widely utilized, not well suited
Inference pros:
• Extremely power efficient
• High real-time bandwidth
• Highly reliable
• Low latency
Inference cons:
• Extremely expensive
• Long development time
• No reprogrammability
• Requires special expertise
![Page 13: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/13.jpg)
Critical Performance Parameters for Deep Learning in RF Systems
[Figure: comparison of CPU, GPU, FPGA, and ASIC across six parameters: Adaptability / Upgradability, Deployment Time, Lifecycle Cost, Real Time Bandwidth, Compute / Watt, Latency]
GPU signal processing can provide wideband capability and software upgradability at lower cost and development time
- Must contend with increased latency (~2 microseconds)
![Page 14: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/14.jpg)
Artificial Intelligence Radio Transceiver (AIR-T)
• 2x2 MIMO Transceiver
• Tunable from 300 MHz to 6 GHz
• 125 MSPS (100 MHz bandwidth per channel)
• Digital Signal / Deep Learning Processors
• Xilinx FPGA
• NVIDIA Jetson TX2
• 6 CPU cores
• 256 Core GPU
• Shared GPU/CPU memory (zero-copy)
• AirStack Software Suite
• Ubuntu Linux w/ Deepwave hardware drivers
• All common AI software frameworks supported
• Python or C++
AIR-T Platform System Specifications
Mini ITX Form Factor
The only software defined radio with built-in deep learning processors
Versatile embedded RF transceiver for cellular, communications, and defense applications
Embedded deep learning processor
![Page 15: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/15.jpg)
Artificial Intelligence Radio Transceiver (AIR-T) Block Diagram
[Block diagram: RF Transceiver (Analog Devices 9371, 2x2 MIMO); FPGA (Xilinx Artix 7); NVIDIA Jetson with GPU (NVIDIA Parker, 256 Core), CPU (Arm A57, 4 Core), and GPU/CPU Shared Memory; links: JESD, PCIe; interfaces: 1 GigE, USB 3.0, GPIO, HDMI, SATA; Clock with REF and TIME inputs]
Incorporation of GPU in RF system allows for wideband processing of signal data in software environment
- Reduces development time and cost
![Page 16: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/16.jpg)
Shared Memory Architecture for Embedded GPUs
Reducing data copies and latency
[Diagram, Traditional GPU (PCIe): Signal Receive / Signal Transmit, CPU with System Memory, PCIe, GPU with GPU Memory]
[Diagram, Jetson GPU (Embedded): Signal Receive / Signal Transmit, CPU and GPU with Shared Memory]
Jetson Embedded GPUs eliminate extra data copy with GPU/CPU shared memory
• Enables signal processing stream applications with latency driven requirements
![Page 17: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/17.jpg)
Inference at the Edge with AirStack Software
[Workflow: Train Neural Network, Optimize Neural Network (TensorRT Optimizer), Deploy Application (TensorRT Runtime)]
Streamlined workflow for deploying deep learning in software defined radio
![Page 18: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/18.jpg)
Inference Pipeline for Signals
[Diagram: Signal Input, "signal processing" (Magic), Inference Engine]
![Page 19: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/19.jpg)
GNU Radio – Software Defined Radio (SDR) Framework
• Popular open source software defined radio (SDR) toolkit:
• RF Hardware optional
• Can run full software simulations
• Python API
• C++ under the hood
• Easily create DSP algorithms
• Custom user blocks
• Primarily uses CPU
• Advanced parallel instructions
• Deepwave is integrating GPU support for both DSP and ML
![Page 20: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/20.jpg)
Tying it Together in GNU Radio
GNU Radio: Open-source framework for rapid prototyping of signal processing
Signal Input
• gr-soapy (SoapySDR)
"signal processing"
• [CPU] Built-in signal processing blocks
• [CPU] Custom blocks (C++ or numpy)
• [GPU] Custom kernels using gr-cuda
Inference Engine
• gr-wavelearner (TensorRT)
![Page 21: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/21.jpg)
GPU Custom Signal Processing: GR-CUDA
Custom GPU Signal Processing with GNU Radio on the AIR-T
• Deepwave provides a simple example for wrapping a custom CUDA kernel with a GNU Radio block
• Uses pyCUDA under the hood
• Can place a series of operations into one block with a simple interface
• Output can be routed to gr-wavelearner for inference
• Source code available on GitHub
• Full tutorial on Deepwave website
Deepwave Tutorial: https://deepwavedigital.com/tutorials/custom-gpu-signal-processing-with-gnu-radio-on-the-air-t
GR-CUDA GitHub: https://github.com/deepwavedigital/gr-cuda
![Page 22: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/22.jpg)
GR-Wavelearner
• Three blocks currently:
• Inference – wraps a serialized TensorRT neural network
• Terminal Sink – Python module for displaying classifier output
• FFT – cuFFT wrapper
• Open source module for GNU Radio
• C++ and Python API
• GPLv3 license
• README with instructions to get started quickly
Easily Incorporate Inference into Signal Processing
![Page 23: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/23.jpg)
What About Training?
(some assembly required)
Where do I put my training wheels?
![Page 24: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/24.jpg)
Python Stack for Training and Inference
Signal Input
• SoapySDR python bindings (SoapySDR)
"signal processing"
• ARM CPU: scipy.signal, numpy
• NVIDIA GPU (CUDA): cuSignal, cupy, PyCUDA, numba, custom CUDA kernels
Deep Learning Model
• TensorRT python bindings (TensorRT)
![Page 25: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/25.jpg)
Example: A Simple Radio Interface
![Page 26: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/26.jpg)
Example: A Simple Radio Interface
Data is transferred directly from the radio hardware into memory accessible by the GPU
![Page 27: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/27.jpg)
Example: Power Estimation / Energy Detection
Processing data on the ARM CPU
Processing on the GPU and reading back a single float
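A minimal CPU-side sketch of power estimation and energy detection with numpy follows. The function names, the test signals, and the -10 dB threshold are hypothetical choices for illustration. Because cupy mirrors much of the numpy API, an expression like this is the kind of code that could be moved to the GPU, where only the final scalar would be read back.

```python
import numpy as np


def average_power_db(samples):
    """Mean power of complex samples, in dB relative to unit power."""
    power = np.mean(np.abs(samples) ** 2)
    return 10.0 * np.log10(power)


def energy_detect(samples, threshold_db):
    """Return True when the average power exceeds the threshold."""
    return bool(average_power_db(samples) > threshold_db)


rng = np.random.default_rng(0)
n = 16384
# Complex white noise with total power 0.01 (about -20 dB).
noise = 0.1 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
# Unit-amplitude complex tone (0 dB) plus the same noise.
tone = np.exp(2j * np.pi * 0.1 * np.arange(n)) + noise

print(energy_detect(noise, threshold_db=-10.0))
print(energy_detect(tone, threshold_db=-10.0))
```

The detector reduces a whole block of samples to one number, which is why the GPU variant only needs to transfer a single float back to the CPU.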
![Page 28: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/28.jpg)
AIR-T: numpy vs. cupy vs. a custom kernel
Processing on the GPU is not always better! Benchmark: good tools are available and easy to use
![Page 29: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/29.jpg)
Deep Dive: Why did we see this performance?
• Annotate each signal processing function to capture a profile using nvprof
• cupy makes this simple, and you can analyze mixed CPU/GPU code just as easily
• Use pytest-benchmark to run each function many times...
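The slide names nvprof and pytest-benchmark. As a dependency-free stand-in, the sketch below uses the standard-library `timeit` module to run a function many times and report the best per-call time. `power_numpy` and `best_time_s` are hypothetical names, and this measures CPU code only.

```python
import timeit

import numpy as np


def power_numpy(samples):
    """Average power of a block of complex samples, computed on the CPU."""
    return float(np.mean(samples.real ** 2 + samples.imag ** 2))


def best_time_s(func, arg, number=20, repeat=5):
    """Best observed per-call time in seconds over several repeated runs."""
    totals = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(totals) / number


for size in (1024, 16384, 262144):
    block = np.ones(size, dtype=np.complex64)
    t = best_time_s(power_numpy, block)
    print(f"{size:>7} samples: {t * 1e6:8.1f} us per call, {size / t / 1e6:8.1f} MSPS")
```

Sweeping the block size matters because fixed per-call overhead dominates small blocks, which is one way a GPU version can lose to the CPU.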
![Page 30: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/30.jpg)
Deep Dive: Profiling Results
Profile of 16384 samples case, GPU (cupy) and GPU (custom kernel)
Profiler immediately shows each cupy kernel and the overhead of kernel launch.
Note the lack of any memory copying calls in either case!
![Page 31: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/31.jpg)
Example: Performing Inference
![Page 32: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/32.jpg)
Signal Processing Recap
Python signal processing code can be shared between training and inference pipelines
Rich open-source libraries exist today to make this easy on the GPU
Profiling support is mature and will help you optimize
Algorithms can be easily wrapped in GNU Radio blocks or python to integrate deep learning with a larger signal processing system
![Page 33: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/33.jpg)
Deep Learning Wireless Deployment Scenario
Goal: Detect and classify signals in congested environment using AIR-T
![Page 34: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/34.jpg)
Radar Signal Detector Model
Example Classifier
[Diagram: Signal stream in, Process with deep learning, Signal identification out]
![Page 35: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/35.jpg)
Radar Signal Detector Model: Transmitted Signals
Radar Waveform
Linear Pulse X X X X X X
Non-Linear Pulse X X X
Phase Coded Pulse X
Pulsed Doppler X X X
Technique demonstration shown with nominal radar signals
• Method applicable to communications, cellular, and other RF protocols
![Page 36: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/36.jpg)
Real Time Deep Neural Network (DNN) Classifier
Monitors 125 MHz of instantaneous bandwidth
• 125 MSPS GPU channelizer
• Modular design
• Trained on 10,000+ signal segments
• Hardened through channel models and analog distortions
[Block diagram: Input Signal, RF Receiver, DMA, AIR-T GPU Signal Processing (pyCUDA) with Channelizer (N-channels), Detector, and DNN Classifier; ID output over 1 GigE; Server and Client]
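The deck does not show the channelizer internals. As one illustration of the idea, the sketch below implements the simplest FFT-based channelizer in numpy: the stream is cut into frames of `n_channels` samples and each frame is transformed, so every FFT bin becomes one decimated channel. A production channelizer would normally add a polyphase prototype filter, and the AIR-T version runs on the GPU; both are outside this sketch. The function name is hypothetical.

```python
import numpy as np


def fft_channelize(samples, n_channels):
    """Split a complex stream into n_channels sub-bands with a framed FFT.

    Returns an array of shape (n_channels, n_frames); row k holds the
    decimated output of the channel centered at k / n_channels cycles/sample.
    """
    n_frames = len(samples) // n_channels
    frames = np.reshape(samples[: n_frames * n_channels], (n_frames, n_channels))
    return np.fft.fft(frames, axis=1).T


n_channels = 8
n = np.arange(n_channels * 64)
# Complex tone centered exactly on channel 3.
tone = np.exp(2j * np.pi * (3 / n_channels) * n)
channels = fft_channelize(tone, n_channels)
energy = np.sum(np.abs(channels) ** 2, axis=1)
print(int(np.argmax(energy)))  # 3
```

Each output row can then be passed to a detector or classifier independently, which is what makes the per-channel design modular.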
![Page 37: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/37.jpg)
Pre-processing Overview and Methodology
• Significantly improves signal classification performance
• Increases SNR and PCC
• Signal isolation
• Provide separate path for negative SNR (signal less likely to be present) cases
[Diagram: Signal Pulses (full BW), Signal PSD, Channelized Signal, Signal Pulses]
• Synthetic Data: Prove model with synthetic data (TRL-3)
• Laboratory Data: Validation in controlled environment (TRL-6)
• Over the Air Data: Validation in deployed environment (TRL-9)
![Page 38: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/38.jpg)
Processing Utilization on AIR-T
[Block diagram: Input Signal, RF Receiver, DMA, AIR-T GPU with Channelizer, Detector, and DNN Classifier; 1 GigE; Server and Client]
• CPU utilization:
• 40% of one ARM core for processing
• 30% of one ARM core for network I/O, data logging, and other management tasks
• GPU utilization:
• 85% includes both signal processing and inference tasks
![Page 39: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/39.jpg)
Performance Benchmarking Test Setup
[Workflow: Define Model Structure, Train Model, Measure Sensitivity, Measure Real Time Throughput; repeat for multiple models]
• Stream unthrottled data to network
• Measure data rate at two locations:
1. Aggregate data rate for entire process
• Number of bytes processed / wall time
2. Computation data rate in work() function
• Number of bytes processed / computation time
Data Rate Benchmark for AIR-T (Jetson TX2)
• Tested 91 different CNN classifier models with 8 batch sizes
• 728 models tested
• Able to achieve 200 MSPS (real samples) with AIR-T
[Figure: data rate benchmark plot, AIR-T]
![Page 41: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/41.jpg)
Model Accuracy Benchmarks
![Page 42: End-to-End Signal Processing and Deep Learning](https://reader034.fdocuments.in/reader034/viewer/2022052301/6288fa448111d701a97ecab0/html5/thumbnails/42.jpg)
Summary
• Deep learning within signal processing is emerging
• High bandwidth requirements driving edge solutions
• Embedded GPUs now suitable for signal processing
• Deepwave developed AIR-T
• Edge-compute inference engine with MIMO transceiver
• FPGA, CPU, GPU
• Open-source Python ecosystem for deep learning with signals is improving daily: give it a try!
• Benchmarking demonstrates real-time signal classification inference at rates approaching 200 MSPS with AIR-T
More info at www.deepwavedigital.com/sdr