A New DSP Approach for 5G and AI...• New Short Range Wireless, 802.11af, ay, bb (LiFi) • Both...
Transcript of A New DSP Approach for 5G and AI...• New Short Range Wireless, 802.11af, ay, bb (LiFi) • Both...
A New DSP Approach for 5G and AI
Albert CamilleriVP Business Development North AmericaVSORA Inc.
• Company founded in 2015
• Headquarters: France
• Each founder has more than 10 years experience in Digital Signal
Processor (DSP) design, working in global consumer markets
• Previous founders’ designs widely used in successful consumer,
automotive and industrial high volume products
Paris
SanDiego
Taipei
Shenzhen
Tokyo
Company Background
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Reinventing ‘Digital Signal Processing’ (DSP)
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Wireless
CommsRF
Baseband
ChannelEncoder
/ Decoder
AppProcessorDSP
DataDSPINFERENCE
Trained Model Result
AppProcessor
Artificial Intelligence
Artificial Intelligence (Terminals / Edge)
• Neural Networks
• Image / video
• Speech recognition / Audio
• Language Translation
5G Wireless Communications
• mmWave, MiMo, Beamforming, Carrier Aggregation
• Enhanced 1Gbps+ Mobile Broadband
• Massive Machine Type Comms, Smart Home / Cities
• Ultra reliable low latency comms (< 1ms), IoT
• New Short Range Wireless, 802.11af, ay, bb (LiFi)
• Both terminals and infrastructure
Traditional Architecture Limits Flexibility
• Single threaded processors falling
further behind 1 Gbps+ demand
• Bespoke, fixed algorithm, co-processors
increase the well known ASIC problems
• Inflexible, hard to mature quickly,
inappropriate in the new world of rapid
standards evolutions
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Host CPU+
Memory
SignalsIn / Out
The Memory Bottleneck Problem
Signal Memory bottleneck will stall and limit the promise of 5G and AI
• Need for ever greater symbol word length and depth
• Signal Memory (Cache) I/O bandwidth explosion
• 5G modems and Massively Parallel Neural Network Processors are predominantly built on the same DSP type architectures today
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
• Completely configurable: • Number of ALUs• Memory size• Quantization (IEEE754 like), i.e.
number of exponent/mantissa bits
• Liberates the “Bottleneck”• Signal (cache) memory more tightly
coupled
• Signals manager pre-configures signal data
• DSP is tightly controlled by the host processor
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
VSORA MPU
ALUs
High BW Signal Memory
Signals Manager
+ / x
ACC
a1 b1 c1
1 2 43
+ / x
ACC
a2 b2 c2
+ / x
ACC
a3 b3 c3
+ / x
ACC
a4 b4 c4
Host Processor
Introducing the Matrix Processor Unit (MPU)
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Single-core / Multi-core Architecture
• MPUs are programmed at an algorithm level in C++ with a MATLAB like API
• High-level simulation methodology provides performance/power/area trade-off data• Can be modified and iterated at
the algorithmic level to attempt 100% DSP utilization
• Algorithm code compiled directly to DSP via modified LLVM compiler• No low level code required
• Engineering productivity enhancer
Completely configurable in terms of: • The number of cores (single/multi-core)• The number of DMAs/core
Multi-Core MPU
VsoraMPU-1
VsoraMPU-4
VsoraMPU-2
VsoraMPU-3
SignalsIn / Out
Host CPU+
Memory
Ability to map complex systems onto multiple cores, and dimension optimal solutions.
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
AI Supported Frameworks
VSORA AI Framework Load ToolGraph OptimizationModel Quantization
VSORA Library
VSORA AI-DSP
Compiler
ALUs
High BW Signal Memory
Signals Manager
+ / x
ACC
a1 b1 c1
1 2 43
+ / x
ACC
a2 b2 c2
+ / x
ACC
a3 b3 c3
+ / x
ACC
a4 b4 c4
• Fully programmable Solution
• TensorFlow, PyTorch, …, supported frameworks
• Configurable:
• Number of MACs: 256, 1024, 2304, 4096, 6400, 9216, 12544, 16384, …, 65536
• IEEE754 Quantization: number of bits (sign/exponent/mantissa)
• Number of DMAs
• High MPU processing efficiency
• Does not suffer memory bandwidth bottleneck to load large numbers of MACs
VSORA AI Solution
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Reinvented Development Flow
Drawbacks• Four different, large engineering teams•Very slow process, exceedingly expensive
AlgorithmDefinition
Implementation• Wired logic
(DSP Co-Pro)• DSP
Link Layer SoftwareDevelopment
ASIC HardwareIntegration
months
High-Level Code or Specifications
Algo Engineers[MATLAB]
DSP Engineers[C/C++/Verilog/VHDL]
Binary Code
API Definition
HW Engineers[Verilog, System Verilog]
SW Engineers[C/C++]
Simulation Platform
Algorithms definition
MSP dimensioning
min
Algo Engineers [MATLAB Like]
Link Layer SoftwareDevelopment
ASIC HardwareIntegration
HW Engineers[Verilog, System Verilog]
SW Engineers[C/C++]
MSPConfiguration
Benefits• Reduced personnel• Fast algorithm definition and DSP
dimensioning• Easy integration of Signal Processing &
Embedded SW code
High Level Code
Algorithms ReworkRequired
Signal ProcessingRelated HW
Summary
The information contained in this document is confidential and shall not be disclosed to third parties without a written consent from Vsora
Highly configurable “tiled” solution
• “Unlimited” number of Cores
• Scalable memory/DMA bandwidth avoids bottlenecks
Eliminates need for inflexible co-processors
• Flexible coding: mix signal processing and link-layer/neural-processing SW
Implementation independent, high-level programmability
• Supports design flexibility to facilitate market evolution
Tiered simulation platforms
• MATLAB/Tensorflow level, FPGA (Cloud) platform, IP/RTL simulation
Compiler technology empowers 100% DSP utilization
• Optimizes engineering efficiency
• Facilitates performance/area/power tradeoffs
Thank You