4GWE Fall 2009: Development Tools for 4G Hardware and Software
9/2/2009
© Tensilica, Steepest Ascent, Synopsys 1
Development Tools for 4G Hardware and Software
Frank Schirrmeister, Synopsys
Frank Vince, Steepest Ascent
Chris Rowen, Tensilica
So it’s the Software…
“Phone differentiation used to be about radios and antennas and things like that. We think, going forward,
the phone of the future will be differentiated by software.”
Steve Jobs, CEO, Apple, August 11, 2008
… Unless it’s the Hardware
“PA Semi is going to do system-on-chips for iPhones...”
Steve Jobs, CEO, Apple
June 10, 2008
“…a strategic choice…ensuring Apple can continue to differentiate its flagship phone...”
Forbes.com, April 23, 2008
Some 4G Development Challenges
Algorithm (MIMO-OFDM transmitter/receiver chain)
Decide on the right algorithm
Optimize the algorithm
HW/SW Implementation
Find the best implementation architecture
Optimize algorithm in the architecture context
Optimize HW/SW partitioning
HW/SW Chip and System Integration
Optimize HW/SW integration
Start HW/SW integration early
[Diagram: Mobile Chipset Hardware with a Modem Subsystem (MPU, DSP, Modem; HAL; Device Drivers; RTOS; Comms Layers 1 to 3) and an Application Subsystem (MPU, DSP, Multi-Media; HAL; Device Drivers; Operating System; Middleware; Applications), connected by IPC]
Frank Vince, Steepest Ascent
STEEPEST ASCENT
Steepest Ascent: Optimizing the 4G Algorithm
Presentation Overview
LTE Library for System Studio introduction
Features and capabilities
Why the LTE Library? Productivity increase
LTE physical resources overview
Efficient use of matrices in LTE Library
LTE channel coding overview
LTE Library capabilities
Physical channels and modulation overview
Implementation example with LTE Library
Summary
Introduction: LTE Library for System Studio
Part of Steepest Ascent’s 3G Evolution Lab product family
Comprehensive PHY simulation library
Release 8 of E-UTRA standard
Library applications
Golden reference verification
Custom T&M waveform generation
Algorithm design – don’t sweat the math
LTE System Complexity
LTE: High system complexity
Advanced algorithms required
Extensive testing required
Simplify the Design Process
[Diagram labels: System Studio; LTE Library; simulation, test, verification; test models & RMCs; custom waveforms; reference receivers; synchronisation; ch. estimation; MIMO receiver & equalisation]
Library Overview
Physical layer blocks offered
Channel coding/decoding
Modulation/demodulation
Transport and physical channels supported
Downlink
Tr. channels & control information: DL-SCH, BCH, PCH, DCI, CFI, HI
Physical channels & signals: PDSCH, PDCCH, PBCH, PHICH, PCFICH, PSS, SSS
Uplink
Tr. channels & control information: UL-SCH, UCI
Physical channels & signals: PUSCH, PUCCH, DRS, SRS
[Diagram: OSI stack (Application, Presentation, Session, Transport, Network, Data Link, Physical) with the LTE Library for System Studio at the Physical Layer]
Library Features
Downlink and uplink FDD duplexing
Transmit and receive processing chain
Transport channel coding/decoding
Scrambling/descrambling
Symbol modulation/demapping
Layer mapping & precoding
Resource element mapping
OFDM & SC-FDMA modulation
DCI message creation
Why the LTE Library for System Studio?
Productivity Increase
Prebuilt models: get standard waveforms fast
Test models & RMCs
Speed of execution: assess your designs fast
Efficient use of matrices to represent resource grid
Minimises data passing overhead between library blocks
Exploit MKL library available in System Studio
Tried and tested functionality
3G Evolution Lab – LTE Library
Physical Resources
LTE Introduction
This section describes how physical resources of time and frequency are quantised and used in LTE
Time is quantised as
Frame, subframe and slot
Frequency is quantised in the following way
Subcarriers for OFDM modulation
Resource blocks and resource allocations
Frame Structure Type 1 (FDD)
Type 1 (for FDD) frame structure
Frame: 10 msec (20 slots, numbered 0 to 19)
Subframe: 1 msec (2 slots)
Slot: 0.5 msec
Resource Grid
2 dimensional structure: OFDM symbol (time) and subcarrier (freq)
[Figure: resource grid for one slot; horizontal axis: OFDM symbols (time); vertical axis: subcarriers (bandwidth, freq); each cell is a resource element]
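The frame timing above can be sketched in a few lines of Python. This is only an illustration of the 10 msec / 1 msec / 0.5 msec quantisation; the function name `locate` is a hypothetical helper and not part of the LTE Library.

```python
# Type 1 (FDD) timing constants as given on the slide.
FRAME_MS = 10.0
SUBFRAME_MS = 1.0
SLOT_MS = 0.5
SLOTS_PER_FRAME = int(FRAME_MS / SLOT_MS)        # 20 slots per frame
SLOTS_PER_SUBFRAME = int(SUBFRAME_MS / SLOT_MS)  # 2 slots per subframe


def locate(t_ms):
    """Return (frame, subframe, slot) for a time offset in milliseconds.

    Hypothetical helper: slot is numbered 0..19 within the frame,
    subframe is numbered 0..9 within the frame.
    """
    total_slots = int(t_ms // SLOT_MS)
    frame, slot = divmod(total_slots, SLOTS_PER_FRAME)
    subframe = slot // SLOTS_PER_SUBFRAME
    return frame, subframe, slot
```

For example, an offset of 10.5 msec falls in the second slot of the first subframe of frame 1.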
LTE Library Resource Grid
Represented using multidimensional arrays
[Figure: resource grid array; axes: subcarriers and OFDM symbols]
Efficient when passing between blocks
Efficient use of MKL matrix operations in System Studio
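A minimal sketch of a resource grid held as a multidimensional array, using plain Python lists so that it runs without extra packages. The layout (subcarrier, OFDM symbol, antenna port) and the names `make_grid` and `grid_shape` are assumptions for illustration, not the LTE Library's own data type. The constants 12 subcarriers per resource block and 7 OFDM symbols per slot are the standard LTE values for normal cyclic prefix.

```python
SUBCARRIERS_PER_RB = 12  # subcarriers in one LTE resource block
SYMBOLS_PER_SLOT = 7     # OFDM symbols per slot, normal cyclic prefix


def make_grid(n_rb, n_slots=2, n_ports=1):
    """Allocate an empty grid indexed as grid[subcarrier][symbol][port]."""
    n_sc = n_rb * SUBCARRIERS_PER_RB
    n_sym = n_slots * SYMBOLS_PER_SLOT
    return [[[0j for _ in range(n_ports)] for _ in range(n_sym)]
            for _ in range(n_sc)]


def grid_shape(grid):
    """Return (subcarriers, OFDM symbols, antenna ports)."""
    return (len(grid), len(grid[0]), len(grid[0][0]))


# One subframe (2 slots) of a 6 resource block carrier, single antenna port.
grid = make_grid(n_rb=6)
grid[0][0][0] = 1 + 0j  # write one resource element
```

Passing the whole grid between blocks as one object, rather than element by element, is the efficiency point the slide makes.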
3G Evolution Lab – LTE Library
Channel Coding
Channel Coding in LTE
LTE makes use of the following techniques
CRC bits
Code block segmentation
Turbo and convolutional coding
Rate matching
Code block concatenation
Repetition and block codes
Don’t sweat the math
Steepest Ascent LTE library takes care of the math behind the physical layer
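To give a feel for the first technique in the list, here is a small sketch of CRC attachment using the 24-bit generator polynomial gCRC24A from TS 36.212. It is a plain bitwise illustration, not the LTE Library's block; the function names `crc24a` and `attach_crc` are hypothetical.

```python
# gCRC24A(D) = D^24 + D^23 + D^18 + D^17 + D^14 + D^11 + D^10
#              + D^7 + D^6 + D^5 + D^4 + D^3 + D + 1
# Low 24 bits of the polynomial; the D^24 term is implicit.
CRC24A_POLY = 0x864CFB


def crc24a(bits):
    """Return the 24 parity bits (MSB first) for a list of 0/1 values."""
    reg = 0
    for b in bits:
        top = (reg >> 23) & 1
        reg = (reg << 1) & 0xFFFFFF
        if top ^ b:
            reg ^= CRC24A_POLY
    return [(reg >> i) & 1 for i in range(23, -1, -1)]


def attach_crc(bits):
    """Append the CRC parity bits to the payload."""
    return list(bits) + crc24a(bits)
```

A receiver can run the same routine over payload plus parity; an all-zero result indicates that no error was detected.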
Downlink Transport Channel Coding
[Figure: coding chains for DL-SCH, PCH & MCH, and for BCH]
Downlink Control Information Coding
[Figure: coding chains for CFI & HI, and for DCI]
Data & Control on PUSCH
Data and control information can be transmitted on the uplink shared channel
[Figure: coding chains for data and control]
Control Signalling on PUSCH
Control information can be transmitted on the uplink shared channel
Control Information on PUCCH
Control information can be transmitted on the uplink control channel
3G Evolution Lab – Channel Coding
Full channel coding capability as per TS 36.212
Channel coding operations grouped in blocks
3G Evolution Lab – Channel Coding
Fully parameterisable blocks
3G Evolution Lab – LTE Library
Physical Channels & Modulation
Physical Channels & Modulation in LTE
Review
Physical channels and signals in LTE
Processing chain
Mapping to resource grid
LTE Library for System Studio
Supported physical channels and signals
PDSCH and Ref Signal example
Physical Channels & Signals
Physical channels
Set of resource elements carrying information originating at higher layers
Physical signals
Set of resource elements carrying information not originating at higher layers
Physical Channels & Signals
Different physical channels and signals have
Different processing chain requirements
Different mapping to the resource grid
PDSCH Processing Chain
PDSCH (data channel) processing chain
Other physical channels have similar chains
Code words are the result of channel coding stages
Layer mapping & precoding: multi-antenna processing
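The first two stages of such a chain, scrambling and symbol modulation, can be sketched as follows. This assumes the length-31 Gold sequence and the QPSK mapping defined in TS 36.211; the function names are hypothetical, and `c_init` is treated as a free parameter here (in the standard it is derived from identifiers such as the cell ID and slot number).

```python
import math

NC = 1600  # offset applied to both m-sequences in the LTE Gold sequence


def gold_sequence(c_init, length):
    """Return `length` pseudo-random bits for a 31-bit initial value."""
    x1 = [1] + [0] * 30
    x2 = [(c_init >> i) & 1 for i in range(31)]
    for n in range(NC + length - 31):
        x1.append((x1[n + 3] + x1[n]) % 2)
        x2.append((x2[n + 3] + x2[n + 2] + x2[n + 1] + x2[n]) % 2)
    return [(x1[n + NC] + x2[n + NC]) % 2 for n in range(length)]


def scramble(bits, c_init):
    """XOR the code word with the Gold sequence (the same call descrambles)."""
    seq = gold_sequence(c_init, len(bits))
    return [b ^ c for b, c in zip(bits, seq)]


def qpsk(bits):
    """Map bit pairs to unit-energy QPSK symbols."""
    scale = 1 / math.sqrt(2)
    return [complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
            for i in range(0, len(bits) - 1, 2)]
```

Because scrambling is an XOR with a known sequence, applying `scramble` twice with the same `c_init` returns the original bits.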
PUSCH Processing Chain
PUSCH (data channel) processing chain
Other physical channels have similar chains
Main differences with DL processing
No multi-antenna processing
SC-FDMA used instead of OFDM
Precoding stage: DFT for SC-FDMA
Physical Signals Mapping Example
[Figure: resource grid over two slots (horizontal axis: slots; vertical axis: subcarriers) showing cell specific reference signals, primary synch signals and secondary synch signals]
Physical Channels Mapping Example
[Figure: resource grid over two slots (horizontal axis: slots; vertical axis: subcarriers) showing PDCCH, PHICH, PCFICH, PBCH and PDSCH]
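Mapping to Physical Resources
Equations describing mapping operations can be complex
$$
m = \begin{cases}
N_{\mathrm{RB}}^{(2)} & \text{if } n_{\mathrm{PUCCH}}^{(1)} < c \cdot N_{\mathrm{cs}}^{(1)} / \Delta_{\mathrm{shift}}^{\mathrm{PUCCH}} \\[4pt]
\left\lfloor \dfrac{n_{\mathrm{PUCCH}}^{(1)} - c \cdot N_{\mathrm{cs}}^{(1)} / \Delta_{\mathrm{shift}}^{\mathrm{PUCCH}}}{c \cdot N_{\mathrm{sc}}^{\mathrm{RB}} / \Delta_{\mathrm{shift}}^{\mathrm{PUCCH}}} \right\rfloor + N_{\mathrm{RB}}^{(2)} + \left\lceil \dfrac{N_{\mathrm{cs}}^{(1)}}{8} \right\rceil & \text{otherwise}
\end{cases}
$$
$$
c = \begin{cases}
3 & \text{normal cyclic prefix} \\
2 & \text{extended cyclic prefix}
\end{cases}
$$
$$
n_{\mathrm{PRB}} = \begin{cases}
\left\lfloor \dfrac{m}{2} \right\rfloor & \text{if } (m + n_{\mathrm{s}} \bmod 2) \bmod 2 = 0 \\[4pt]
N_{\mathrm{RB}}^{\mathrm{UL}} - 1 - \left\lfloor \dfrac{m}{2} \right\rfloor & \text{if } (m + n_{\mathrm{s}} \bmod 2) \bmod 2 = 1
\end{cases}
$$
Mapping is made easy by using the LTE Library
Mapping blocks are controlled by simple parameters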
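As an illustration of what a mapping block has to evaluate, the following sketch computes the PUCCH mapping quantities m and n_PRB. It assumes the slide's equations are the PUCCH format 1/1a/1b resource mapping from TS 36.211; the function and parameter names are hypothetical, and integer arithmetic is used by multiplying the comparison and the quotient through by the shift parameter.

```python
def pucch_m(n_pucch1, n_rb2, n_cs1, delta_shift, extended_cp=False, n_sc_rb=12):
    """Return m for a PUCCH resource index (assumed format 1/1a/1b mapping)."""
    c = 2 if extended_cp else 3
    # n_pucch1 < c * n_cs1 / delta_shift, written without division
    if n_pucch1 * delta_shift < c * n_cs1:
        return n_rb2
    quotient = (n_pucch1 * delta_shift - c * n_cs1) // (c * n_sc_rb)
    ceil_ncs_over_8 = -(-n_cs1 // 8)
    return quotient + n_rb2 + ceil_ncs_over_8


def pucch_prb(m, n_s, n_rb_ul):
    """Return the physical resource block index for slot number n_s."""
    if (m + n_s % 2) % 2 == 0:
        return m // 2
    return n_rb_ul - 1 - m // 2
```

With a 50 resource block uplink, m = 0 maps to block 0 in even slots and block 49 in odd slots, which shows the hopping between the band edges that the second equation encodes.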
3G Evolution Lab – DL Phy. Channels
Supported downlink physical channels
PDSCH
PBCH
PDCCH
PCFICH
PHICH
3G Evolution Lab – DL Phy. Signals
Supported downlink physical signals
Primary synch signals
Secondary synch signals
Cell specific reference signal
3G Evolution Lab – UL Phy. Channels
Supported uplink physical channels
PUSCH
PUCCH
3G Evolution Lab – UL Phy. Signals
Supported uplink physical signals
Demodulation reference signals (DRS)
Sounding reference signals (SRS)
[Figure: DRS and SRS positions]
PDSCH & Reference Signal Example
Generate a waveform with
PDSCH
Reference signal
Processing chain to implement
[Figure: processing chains for the reference signal and PDSCH]
3G Evolution Lab – PDSCH Example
PDSCH modulation operations
Scrambling
Symbol modulation
Layer mapping & precoding
Reference signal generation
Mapping to resource grid (PDSCH & ref signal)
OFDM modulation
IFFT modulation
Cyclic prefix insertion
Windowing
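The OFDM modulation step (IFFT followed by cyclic prefix insertion) can be sketched with a direct inverse DFT so that the example runs without extra packages. Windowing and the mapping of subcarriers around DC are left out; the function names are hypothetical and this is not the LTE Library's implementation.

```python
import cmath


def idft(freq):
    """Direct inverse DFT; an IFFT would compute the same result faster."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def ofdm_symbol(subcarriers, cp_len):
    """Convert one column of the resource grid to time-domain samples.

    The cyclic prefix is a copy of the last cp_len samples, placed in front.
    """
    time = idft(subcarriers)
    if cp_len <= 0:
        return time
    return time[-cp_len:] + time
```

The output has `len(subcarriers) + cp_len` samples, and its first `cp_len` samples repeat its last `cp_len` samples.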
PDSCH Modulation
Processing chain
System Studio implementation with LTE Library
Ref Signal Generation & Mapping
Map PDSCH and reference signal to the resource grid
Ref Signal Generation & Mapping
System Studio implementation with LTE Library
OFDM Modulation
Apply OFDM modulation
OFDM Modulation
System Studio implementation with LTE Library
IFFT modulation
Cyclic prefix insertion
Windowing
Fully Parameterisable Blocks
Cell specific reference signal generator block parameters
PDSCH mapper block parameters
Efficient Use of Matrices
Matrices passed between blocks
High efficiency
Multi-dimensional matrices
Matrix sizes can change during simulation (unlike other simulators)
3G Evolution Lab – LTE Library for System Studio
Summary
Summary
The LTE Library for System Studio enables design and validation of the physical layer early in the product life cycle
Full channel coding/decoding capability provides authentic data
Physical channel and signal generation and mapping of data to stimulate subsequent design stages
Block based model: Don’t sweat the math
Prebuilt systems available
Test models
RMCs
Why the LTE Library for System Studio?
Comprehensive set of blocks takes care of the math behind the physical layer
Increase productivity:
Prebuilt models: Test Models & RMCs
Exploit System Studio speed of execution
Efficient use of data types
Exploit MKL library available in System Studio
Chris Rowen, Tensilica
TENSILICA
Tensilica: Optimizing HW/SW Implementation
Outline
• Baseband Trends
• A Fresh Look at Dataplane Processors
• Reaching for Performance:
– Integrating RTL Accelerators
– Special-purpose Instruction Extension for Acceleration
– Advanced Baseband DSPs
• LTE Reference Architecture
• Drill Down on Turbo Processor
• Methodology Flow for Flexible Baseband
• Tensilica Integration with Synopsys Tools
Next-Generation Baseband Standards Drive Fundamental Change in Market
[Chart: peak performance (GOPS, log scale 1 to 1000) versus power (Watts, log scale 0.1 to 100); regions marked for embedded DSPs, high end DSPs and general purpose processors; points marked for 2G, 3G and 4G]
Drive towards multi-standard receivers requires programmable solutions
Emerging standards (LTE, WiMAX) require processing power exceeding the capabilities of today’s DSPs
Push towards low-cost green infrastructure requires high performance at very low power
Complex Cellular Standards
[Timeline figure. Generations: 1G (Voice), 2G (Capacity/Coverage), 3G (Data), Beyond 3G (Mobile broadband), Future. Years marked: 1983, 1993, 1995, 2005, 2010. Standards shown: AMPS; GSM, GPRS, EDGE, WCDMA, HSDPA, HSUPA, HSPA+; CDMA, RTT, EVDO Rev. 0, EVDO Rev. A, EVDO Rev. B; TD-SCDMA; LTE; 4G (LTE-A)]
• LTE (Long Term Evolution) is the next generation cellular standard: supports data rates of 150Mbps
• LTE uses 2 new fundamental technologies
– OFDM: Orthogonal Frequency Division Multiplexing
  • Has been used in WLAN, DSL, DVB
  • Requires high computation: Fast Fourier Transforms
– MIMO: Multiple Input Multiple Output
  • Uses multiple antennas both for transmit and receive
  • Proven successful in WLAN
• Traditional DSP based radio designs are not sufficient for LTE/4G basebands
Components of digital baseband
Main data-path
• Computationally very intensive
• Configurability requirements for the data-path are not trivial (many different modes)
[Diagram: feed-forward receive data path from ADC to MAC: Filters, FFT, MIMO, Demod., FEC. Control elements: freq/phase offsets and gain control; sync., decoding and control channel processing; channel estimation; baseband master control]
Control Elements
• Require a programmable solution
• Cat 4/5 LTE data-rates (150Mbps) have high computational requirements even for control.
Typical Baseband Designer Problems
• Have solutions for current radio standards, but need latest generation (e.g. LTE)
– May reuse existing [certified] blocks or may want to cover all with programmable solution
• Have historically used hardwired accelerator blocks, controlled by RISC, but the number and service demands of blocks have grown beyond the capacity of the current structure, especially for multi-standard platforms
• Have historically used DSP cores, but new designs require >10x higher throughput
• As baseband algorithms grow larger and more complex, customers are outgrowing assembly-level tools and simple DSP architectures.
New Wireless Standards Drive Performance and Efficiency
Evolving from 2G to 4G:
• >100x increase in operation rate
• Baseband power budget reduced by more than 2-3x over previous generations
Preferred Implementation:
• 2G (GSM): DSP
• 3G (UMTS): DSP + function-specific coprocessors
• 3.9G/4G (LTE/LTE-A): Dataplane Processing Units (DPUs)
[Chart: peak performance (GOPS, log scale 1 to 1000) versus power (Watts, log scale 0.1 to 10); regions marked for embedded DSPs, high end DSPs and general purpose processors; points marked for 2G, 3G and 4G]
Typical 3G Design
DSP + multiple big RTL accelerators
[Figure: Infineon X-GOLD, with 2/2.5G accelerators and 3G accelerators marked]
Tensilica Focus: Dataplane Processing Units (DPUs)
DPUs: A unique blend of CPU + DSP that delivers programmability and improved power, performance & cost
[Diagram: Main Applications CPU; Embedded Controller for Dataplane Processing. Tensilica focus: Dataplane Processors]
What is Automated Dataplane Processor Generation?
Processor Configuration
1. Select from menu
2. Add instruction description (TIE)
3. Automatic instruction (XPRES)
Xtensa Processor Generator
Build SW + hardware estimates: <5 minutes
Build full hardware: <2 hours
Complete Hardware Design
Source pre-verified RTL, EDA scripts, test suite
Cores as small as 0.02 mm2 (45nm)
Customized Software Tools
C/C++ compiler, debuggers, simulators, RTOSes, system models
Processor Extensions
Chips – Correct the First Time
Anatomy of an Extensible Processor: Xtensa LX2 Block Diagram
[Block diagram of the Xtensa LX2.
Legend: Base ISA Feature; Configurable Function; Optional Function; Optional & Configurable; Designer Defined Extensions; External RTL & Peripherals; Memories & Caches.
Processor core: Instruction Fetch and Decode; Base Execution Pipeline (ALU, Register File); N-issue FLIX parallel pipelines; User-defined Execution Units, Register Files and Interfaces; FPU; MUL16/32, MAC16; NSA, MIN, MAX, etc.; Zero overhead loop; Vectra LX DSP; HiFi2 Audio Engine; LD/ST1 and LD/ST2; Instr Mem MMU and Data Mem MMU with ECC/Parity; I-Fetch Buffer; Write Buffer.
Control and debug: Interrupt Control; Timers 0 to 3; Exception Support; On-Chip Debug; JTAG Tap; Trace Port; TRAX PC Trace.
Memories: Inst Cache, IROM, IRAM0; Data Cache, DROM, DRAM0; XLMI.
External interfaces: PIF with AMBA AHB/AXI bridges to the System Bus (Master Interface, Slave Interface; DMA, SDRAM, DDR, Device A, Device B); TIE Queues (FIFOs), TIE Ports and TIE Lookup interface (RTL or lookup table) to RTL, memories, CPUs, coprocessors and shared RAM]
DPU Architecture Choices
Key Tradeoff: Adaptability vs. Efficiency
[Chart: energy efficiency versus adaptability; points marked: Hard-wired, Traditional DSP]
Combine ultra-efficient task engines with existing or new (programmable) accelerators. Example: LTE Handsets
Use state-of-the-art programmable DSPs optimized for baseband. Example: LTE Basestations
Integration of Existing Accelerators
• RTL accelerator blocks have many interface types and widths
– Data input stream
– Data output stream
– Data command inputs
– Data output flags
– Configuration registers
– Mode control
– Status outputs
• Extensible processor matches RTL interface type and width (to 1024b)
– Output queues
– Input queues
– Read only lookups
– Read/write lookups
– Import wires
– Export states
• Full software support for interfaces:
– Mapped to instructions and compiler
– Modeling in high-level and RTL tools
– Visible to source debugger
• Multiple RTL blocks controlled by one processor
• Processor performs “smart DMA” for RTL data transfers
[Diagram: accelerator control processor with control memory, data memory and system memory bus interface, connected to an RTL accelerator block (datapath elements) through Data In, Data Out, Command, Mode Control, Status, Data Flags, Flags Out and Config Regs interfaces; additional RTL accelerators shown alongside]
Direct Control of Multiple RTL Blocks
[Diagram: processor with instruction memory and data memories on the on-chip bus, directly controlling RTL blocks A, B and C]
5-slot VLIW instruction streams data from memory, through 3 RTL data-paths and back to memory:
Load Reg[] | Store Reg[] | RTL_A(Cmd) Reg[] | RTL_B(Cmd) Reg[] | RTL_C(Cmd) Reg[]
Interface per RTL block: input data 128b, output data 128b, control word 32b, cmd word 3b
Interface and Instruction Set Declaration:
    // 128-bit wide, 16-entry data register file
    regfile DR 128 16 d
    // One lookup interface and one 32-bit mode state per RTL block:
    // data + mode + command go out, 128 bits come back three stages later
    lookup LUA {`128+32+8`, Mstage} {`128`, Mstage+3}
    state ModeA 32 add_read_write
    lookup LUB {`128+32+8`, Mstage} {`128`, Mstage+3}
    state ModeB 32 add_read_write
    lookup LUC {`128+32+8`, Mstage} {`128`, Mstage+3}
    state ModeC 32 add_read_write
    // 64-bit instruction format with five slots
    format f64 64 {l_slot,s_slot,a_slot,b_slot,c_slot}
    table cmdA 8 8 {0, 1, 2, 3, 4, 5, 6, 7}
    table cmdB 8 8 {0, 1, 2, 3, 4, 5, 6, 7}
    table cmdC 8 8 {0, 1, 2, 3, 4, 5, 6, 7}
    slot_opcodes l_slot {LDIU}
    slot_opcodes s_slot {SDIU}
    slot_opcodes a_slot {LUOpA}
    slot_opcodes b_slot {LUOpB}
    slot_opcodes c_slot {LUOpC}
    operation LUOpA {out DR do, in DR di, in cmdA cmd} {in ModeA, out LUA_Out, in LUA_In} {
        assign LUA_Out = {cmd,ModeA,di};
        assign do = LUA_In;
    }
    operation LUOpB {out DR do, in DR di, in cmdB cmd} {in ModeB, out LUB_Out, in LUB_In} {
        assign LUB_Out = {cmd,ModeB,di};
        assign do = LUB_In;
    }
    operation LUOpC {out DR do, in DR di, in cmdC cmd} {in ModeC, out LUC_Out, in LUC_In} {
        assign LUC_Out = {cmd,ModeC,di};
        assign do = LUC_In;
    }
Typical operations per cycle:
1 128b read from memory
1 128b operation through RTL A
1 128b operation through RTL B
1 128b operation through RTL C
1 128b write to memory
Full Accelerator Integration
• For new functions, integrated acceleration is easy and efficient
• Your proprietary accelerators are fully integrated into the instruction set and software tools for each processor
• Add any number of new data pipelines, registers, memories, inter-processor channels – up to 100s of ops per cycle
• Tensilica Instruction Extension (TIE) format is typically 10x more concise than Verilog
• The cycle-by-cycle behavior of each accelerator is written in standard C and modeled in a fast cycle-accurate simulator
• Use multiple small processors for additional throughput on complex sets of tasks
[Diagram: accelerator control processor with control memory, private memory or data memory and system memory bus interface, extended with register files, wide datapaths, special function registers and dedicated communication channels]
Tensilica DSPs for LTE/4G baseband design
[Chart of the DSP product range. MAC counts shown: Single MAC, Dual MAC, Quad MAC, 8 MAC, 16 MAC, 16 MAC and more. Products shown: MAC16, ConnX D2, ConnX Vectra LX, ConnX 545CK, ConnX BBE, Xtensa TIE. Categories: Custom DSPs; Comms DSP (16 / 32-bit)]
Anatomy of a DSP: ConnX Baseband Engine Architecture
[Block diagram.
Register files: VR vector register bank (16 x 4 x 40b); YR vector register bank (8 x 4 x 40b); AR general registers (16 x 32 bits); UR alignment registers (4 x 128 bits); vector selection registers (4 x 32 bits); ACC registers.
Vector element format: 40-bit (32 + 8 guard bits), holding 20-bit real and 20-bit imaginary parts.
Memory: local memory and/or cache; load/store units (32b/128b to 160b).
Datapath: multipliers, adders, ALU, shift/saturation, rounding (40b, 36b); arith, logical, shift ops; Xtensa 32b base ops]
Addressing Modes
• Immediate
• Immediate updating
• Indexed
• Indexed updating
• Aligning updating
• Circular
• Bit-reversed
Anatomy of a DSP: ConnX Baseband Engine Instruction Set
Rich baseline instruction set: up to 153 operations
DSP instruction set: 285 operations in 3 VLIW slots
Load/Store ops:
• Addressing modes: offset, offset-update, index, index-update, circular, bit-reversed
• Load 16b/32b scalars and vectors
• Store 16b/32b scalar, vectors, transposed
• Load/store unaligned and masked delivers full bandwidth loads and stores with unaligned data
Multiply ops:
• Complex and scalar 18bx18b multiplies
• Multiply, multiply-round, multiply-add, multiply-subtract
• Multiply complex conjugate
• Magnitude-squared of complex
• Full precision and saturated/rounded outputs
• Up to 16 multiplies per operation
• FIR-optimized multiply-add
ALU ops:
• 20b/40b extended precision in 160b vectors
• Full arithmetic, logical and shift with saturation operations
• SIMD boolean setting for compares
• Ops: ABS, ADD, AND, ASUB, CLAMPS, EQ, XOR, LE, MAX, MAXB, MAXU, MIN, MINB, NAND, NEG, NSA, NSAU, OR, PACK, SLL, SLLI, SLLV, SRA, SRAI, RADD, SUB
Other ops:
• Direct support for single-cycle radix-2 and radix-4 butterfly operations
• 8-way SIMD integer and fractional divide [optional]
• 4-way SIMD reciprocal square root [optional]
• Arbitrary permutation and selection from vector pairs
• Zero-overhead looping
• Conditional vector moves
Anatomy of a DSP: Baseband Engine FFT and FIR performance
• Multiple wide memories and parallel execution units for high performance:
– Native support for complex arithmetic
– VLIW instructions and vector register files support complex code with minimum load-store
– Rich addressing modes minimize data reference overhead
– Advanced compilers for register allocation, code scheduling, software pipelining and vectorization.
Performance includes cache and local-data-memory modeling
Complex points            512     1024    2048    4096    8192
FFT (incl bit reversal)   853     1,812   3,630   7,930   16,247
FIR - 8 tap               1,100   2,200   4,350   8,700   17,400
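For readers unfamiliar with the two operations the FFT row depends on, here is a small reference sketch of an in-place radix-2 FFT with bit-reversed reordering. It is a textbook decimation-in-time formulation in Python for illustration only; it says nothing about how the ConnX instruction set schedules the work, and the function names are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    # Bit-reversed reordering of the input
    a = [x[bit_reverse(i, bits)] for i in range(n)]
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # Radix-2 butterfly: one complex multiply, one add, one subtract
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```

Each inner-loop iteration is one butterfly, which is the unit of work the slide describes as having direct single-cycle support, and the reordering step is what bit-reversed addressing removes from software.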
2-8 ConnX Baseband Engine Cluster
2-8 Baseband Engines form powerful shared memory baseband processor platform
8 engine cluster:
• 128 MACs/cycle
• Up to 640 ops/cycle
• 880K 2048pt complex FFTs per second
Distributed DataRAM space visible to all engines accessed across 128b pipelined interconnect
Write-buffered interface allows aggregate 120GB/s processor load/store data bandwidth and 60GB/s inter-engine data bandwidth (at 500MHz)
Native SystemC modeling of multi-engine processors, including cycle-accurate and fast “Turbo” mode bit-accurate simulation
[Diagram: typical 4-engine configuration. Each BaseBand Engine (1 to 4) has 160b vector registers, computation units, 32b scalar registers, a 64b instruction cache, 128b DataRAM 0 and DataRAM 1, and a Processor InterFace (PIF); engines are connected by 128b links]
A continuum of design solutions
• Single Baseband Engine
• Multiple Baseband Engines
• Baseband Engine plus RTL or TIE function-specific accelerators
• Lighter DSP (ConnX Vectra/D2) plus RTL or TIE function-specific accelerators
• One Xtensa Controller plus RTL or TIE function-specific accelerators
• Multiple Xtensa Task Engines (TE) with RTL or TIE function-specific accelerators
[Diagram: block sketch of each configuration, built from ConnX BBE, ConnX Vectra/D2, Xtensa Controller, TE and RTL/TIE Accel blocks]
Putting it all together: LTE-Terminal Baseband PHY Development Architecture
• Purpose
– Demonstrate ease of design of world-class LTE terminal PHY
– Evaluate performance, cost, power, and flexibility
– Support customers and partners in delivering production solutions
• Functionality
– LTE Category-4 Solution, FDD
– Functional Domain: Transport Blocks -- Time Domain Frames @Nyquist
• Features
– Configurable
– Modular-Distributed Processing
– Resource Sharing
  • Signal-BBE: NCO, FFT, Channel Estimation, Synchronization
  • Channel-BBE: MIMO decode: SM and TD
  • Signal-BBE: Layer Mapping, Pre-coding, DFT, FFT
– Block-Adaptive Pipelining
  • Low pipeline delay
  • Low memory requirement
  • Fine-grain power management
– Ease of Verification and Bringup
– Partitioned into four domains: Signal, Matrix, Bit, Control
• Physical Parameters: 65LP process @ <300MHz
Domain Partitioning of Baseband PHY
1. Signal Domain (Time -- Frequency)
– Signal (I-Q Data) Operations
– FFT, DFT, Synchronization, Channel Estimation
– Two BBEs with Extensions
2. Matrix Domain (Frequency -- Soft Bits)
– Matrix Operations (I-Q Data)
– MIMO Decoding
3. Bit Domain (Soft Bits -- Bits)
– FEC Encoding, Bit Scrambling, Interleaving, Rate Matching
– Soft Descrambling, Deinterleaving, HARQ Recombining, Turbo Decoding
– Turbo Processor
– HARQ Processor: Controller 64-bit
– PDCCH Processor: Vectra LX Class Processor + Viterbi Acc
– Tx Bit Processor: Vectra LX Class Processor
4. Control Domain
– Configure and Control System
– Communicate with MAC and Host
Tensilica LTE PHY Development Architecture
ConnX Baseband DSPs + Accelerators for Minimum Power / Area
[Block diagram: Tx and Rx processing chains]
Specialized Processor as Efficient as RTL
Tensilica Turbo Processor Development Architecture
• LTE requires high-data-rate Turbo decoding: ~6000 ops per bit
• Xtensa’s instruction extensions, wide data-paths and multiple wide memories enable efficient programmable Turbo Engine
• Method:
  • 8 parallel windows per block, two bits per cycle
  • 0.5 cycle for each forward and backward pass (2 cycles per iteration) per bit
  • Log correction term for improved bit error rate
• Implementation:
  • 325K gates + 80KB memory (2mm^2 in 65LP) achieves 154Mbps at 350MHz
[Chart: Size of Turbo Decoders (scaled to 154Mbps). Vertical axis: K gates to achieve 154Mbps @ 350MHz (8 iterations) for published Turbo blocks, 0 to 1,600. Designs compared: [Lin], [Bickerstaff 1], [Bickerstaff 2], [Salmela 1], [Agarwala, Wolf], [Vogt], [Thul], [Shin], [Benkeser], [Xtensa], [Salmela 2]]
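The throughput figure on this slide can be sanity-checked with a short C sketch. It rests on a reading the slide does not spell out: that "2 cycles per iteration per bit" applies to each window, that the 8 windows run concurrently, and that the decoder runs the 8 iterations named on the chart axis. The function name and parameters are illustrative only.

```c
#include <assert.h>

/* Peak decoder throughput in Mbps under the stated assumptions:
 * each window spends cycles_per_bit_iter cycles on one bit per
 * iteration, and `windows` windows are processed in parallel. */
double turbo_peak_mbps(double clock_mhz, int windows,
                       double cycles_per_bit_iter, int iterations)
{
    assert(clock_mhz > 0.0 && windows > 0);
    assert(cycles_per_bit_iter > 0.0 && iterations > 0);
    return clock_mhz * (double)windows
           / (cycles_per_bit_iter * (double)iterations);
}
```

With the slide's numbers, turbo_peak_mbps(350.0, 8, 2.0, 8) evaluates to 175 Mbps, so the quoted 154Mbps would be roughly 88% of that peak. The slide does not say what accounts for the difference.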
Turbo Processor Architecture
• Highly specialized 5-slot VLIW instruction set
• Six memory references per cycle
  • Single 640b wide state memory
  • Single 30b interleave address memory
  • Dual 64b interleaved apriori memory
  • Dual 128b load/store interface for main memory
• Massive SIMD:
  • Each state update: 21 8b operations
  • 8 states per window
  • 2 successive bits per window
  • 8 windows in parallel
  • Ops per cycle: >2600
[Diagram: VLIW slots LS0, LS1, Addr, RdWrSt, αβ. Operations shown: LD_SYS128.I, LD_SYS128.IU, LD_PAR128.I, LD_PAR128.IU, RD_INTERADDR, RD_DEINTERADDR, RD_STATE_U, ALPHA, BETA, DECISION. Datapath unit: MAP αβ]
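The "Ops per cycle: >2600" figure appears to be the product of the four SIMD figures listed above. The sketch below multiplies them out, assuming one state update per state, per bit, per window in every cycle (the slide does not state this explicitly). The function name is illustrative.

```c
#include <assert.h>

/* Operations issued per cycle by the SIMD datapath, assuming every
 * state of every bit in every window is updated each cycle. */
unsigned turbo_ops_per_cycle(unsigned ops_per_state_update,
                             unsigned states_per_window,
                             unsigned bits_per_window,
                             unsigned windows)
{
    assert(ops_per_state_update > 0 && states_per_window > 0);
    assert(bits_per_window > 0 && windows > 0);
    return ops_per_state_update * states_per_window
           * bits_per_window * windows;
}
```

For the slide's values, turbo_ops_per_cycle(21, 8, 2, 8) is 2688, which is consistent with the ">2600" claim.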
Turbo Processor inner loop C code example
    for (k = kq/2 - 1; k >= 0; k--) {
        InterleaveAddr = RD_INTERADDR(DEC);
        StateWord = RD_STATEU();
        BETA(StateWord, InterleaveAddr, *(SystemP), *(--ParityP));
    }
    }

    650: rd_state st1, a15, 0
    653: addmi a2, a1, 0x600
    656: addmi a2, a2, 0x7f00
    659: wr_state st1, a15
    65c: rd_state st0, a9, 2
    65f: l32i a2, a2, 208
    662: state_statem1st0
    665: { ld_sys128.iu sy0, a0, -16; ld_par128.iu pa0, a2, -16; rd_interaddr ia0, 0; rd_stateu st0; nop }
    66d: { ld_sys128.iu sy1, a0, -16; ld_par128.iu pa1, a2, -16; rd_interaddr ia1, 0; rd_stateu st1; nop }
    675: loopgtz a3, 688 <main+0x688>
    678: { ld_sys128.iu sy0, a0, -16; ld_par128.iu pa0, a2, -16; rd_interaddr ia0, 0; rd_stateu st0; beta st0, ia0, sy0, pa0 }
    680: { ld_sys128.iu sy1, a0, -16; ld_par128.iu pa1, a2, -16; rd_interaddr ia1, 0; rd_stateu st1; beta st1, ia1, sy1, pa1 }
    688: movi a3, 191
Top-Down Baseband Design Methodology
• Algorithm Design (functions f1 to f8)
– Tools: Synopsys System Studio, Matlab
• Fixed Point Algorithm Refinement
– Tools: Visual C++, Synopsys System Studio, Xtensa Xplorer, Tensilica cstubs
• Partitioning to DSP C and Accelerators: Simulation of sub-systems
– Tools: Tensilica fast ISS, Tensilica cycle-accurate ISS, Tensilica TIE Compiler
• Integration of sub-system models
– Tools: Tensilica XTSC, CoWare Platform Architect, Carbon SOC Designer
• Mapping of system to FPGA and ASIC RTL
– Tools: Tensilica FPGA netlist, Tensilica RTL generation, Tensilica pin-level XTSC, Verilog simulators, FPGA synthesis and mapping, Tensilica RTL Testbench
[Diagram: functions f1 to f8 are split between DSP C code and accelerators, grouped into sub-systems, joined by interconnect and peripherals, then mapped to RTL blocks]
Top-Down Baseband Design Methodology
• MIMO-OFDM Transmitter Receiver Chain
– Decide on the right algorithm
– Optimize the algorithm
• HW/SW Implementation
– Find the best implementation architecture
– Optimize algorithm in the architecture context
– Optimize HW/SW Partitioning
• HW/SW Chip and System Integration
– Optimize HW/SW Integration
– Start HW/SW integration early
[Diagram: Mobile Chipset Hardware with a Modem Subsystem (MPU, DSP, Modem; HAL, Device Drivers, RTOS, Comms Layers 1 to 3) and an Application Subsystem (MPU, DSP, Multi-Media; HAL, Device Drivers, Operating System, Middleware, Applications), connected by IPC]
Algorithm ↔ Processor Optimization
[Flow diagram boxes: Reference Algorithm; Host-based Cstubs; Host-based functional verification; Profile Algorithm on ISS; Optimized algorithm; (optional) Instruction extensions; Validate performance on ISS; Tensilica DPU; Processor subsystem RTL generation; RTL integration verification; Validate functionality and performance in SystemC model; Validate in FPGA and SOC system implementation]
Tensilica DPU ISS in Synopsys Innovator
[Diagram: Tensilica core ISS with Wire-to-Pin Adaptor, PIF-to-TLM-2.0 Adaptor and Scripting Engine]
Tensilica DPU in Synopsys Innovator: Debugger attach
[Screenshot callout: Information for debugger attachment]
Tensilica DPU in Synopsys Innovator: Debugging
[Screenshot callouts: Xtensa Xplorer Debug Environment; Xtensa Xplorer console window]
Wrap-up
• Baseband Trends
• A Fresh Look at Dataplane Processors
• Reaching for Performance:
– Integrating RTL
– Special-purpose Instruction Extension
– Advanced Baseband DSPs
• LTE Reference Architecture
• Drill Down on Turbo Processor
• Methodology Flow for Flexible Baseband
• Tensilica Integration with Synopsys Tools
Frank Schirrmeister, Synopsys
SYNOPSYS
Synopsys: Enabling Optimization and Software Development
• MIMO-OFDM Transmitter Receiver Chain
– Decide on the right algorithm
– Optimize the algorithm
• HW/SW Implementation
– Find the best implementation architecture
– Optimize algorithm in the architecture context
– Optimize HW/SW Partitioning
• HW/SW Chip and System Integration
– Optimize HW/SW Integration
– Start HW/SW integration early
[Diagram: Mobile Chipset Hardware with a Modem Subsystem (MPU, DSP, Modem; HAL, Device Drivers, RTOS, Comms Layers 1 to 3) and an Application Subsystem (MPU, DSP, Multi-Media; HAL, Device Drivers, Operating System, Middleware, Applications), connected by IPC]
Outline
• Challenges
• Algorithm Optimization
– Performance
– Productivity
– Flow Integration
• HW/SW Optimization
– Software Development
– Verification Integration
– Architecture Analysis
– System Prototyping
• Ecosystem
Some System-Level Challenges
Source: EETimes, 02/05/2007
• How do I develop the right signal processing algorithms, implement them and export them to HDL and Verification flows?
• How do I get a virtual model of my platform to the programmers for software development early?
ALGORITHM OPTIMIZATION
Algorithm Design & Analysis
• Designer’s Task: Create algorithm and model meeting key requirements
– BER
– SNR
– Word length
– Image/audio quality
– Sync speed/quality
– . . .
• Designer’s Challenge: Growing complexity ⇒ efficiency is key
– Modeling efficiency
– Simulation performance
– Analysis capabilities
– Design & verification flow integration
• Designer’s Wish: Model-based design, ultrafast simulation, and analysis of signal processing algorithms
Handling Complexity
Users need to handle complexity
• Performance
– Highest simulation performance through Stream-Driven Simulation
– Fastest fixed-point simulation (20x-200x faster than OSCI SystemC reference)
• Productivity
– Model-based design
– Signal-processing specific analysis
– Workgroup / data management
– Simulation management
• Flow Integration
– Verification: Export to SystemC (FIFO & RTL interface)
– Reference model
– Verification testbench
– Implementation Code Generation
Modeling Efficiency: Getting to Simulations Earlier
• Writing a model
– “Text book” modeling style
– Focus on your algorithm
– Design checks
• Developing a design
– Block based design
– Standard interfaces
– Instantiate models via drag & drop
• Managing your design & team
– Compiler/linker flags, Makefile generation, build process
– Parameters & scripting
– Link to revision control system

Model Libraries: From Simple to Most Complex
• Signal-processing models
– Data sources
– Channel models
– Display models
– Analysis models (e.g. BER)
– Filters
– Coder / Decoder
• PLUS: Reference design kits (RDKs)
– Jump-start with standard compliant reference models of advanced wireless, multimedia and telecom technical standards
Example LTE PHY Library
Complete physical layer simulation chain designed to accelerate your physical layer and algorithm development
[Diagram: Transmitter example. Processing chain: Channel coding → Scrambling → Modulation mapper → Layer mapper → Pre-coding → Resource mapper → OFDM modulation]

LTE PHY Library Applications
• Design and verify communications algorithms
• Create custom test and measurement waveforms
• Generate test models and reference measurement channels (RMCs)
• Supports Golden Reference verification for both hardware and software
• Generate Bit Error Rate (BER) and Block Error Rate (BLER) link level curves
Developed by and available from Steepest Ascent, dedicated experts on the LTE standard
Beta availability: Immediately
General availability: October 2009
LTE PHY Library Highlights
• 3GPP Release 8 E-UTRA Physical Layer implementation conforming to TS36.211, TS36.212 and TS36.213
• Tracking compliance with Release 8 into Release 9
• Full support for Downlink Reference and Synchronization Signals and Uplink Reference Signals
• Complete support for 1, 2 and 4 antenna transmissions including all MIMO layering and precoding options
• Encode/decode data channels: DL-SCH, UL-SCH, PCH, MCH & BCH
• Encode/decode control region channels: DCI, HI and CFI
LTE Library Model List
Simulation Speed: Getting to Results Faster
• High-speed dataflow simulation engine
– Enabling industry’s fastest simulation and analysis of signal-processing algorithms
– Highly efficient stream-driven simulation technology
• Fastest fixed-point simulation
– Speed-up SystemC fixed-point by 20x to 200x
– Simulate bit-true models almost at floating-point speed
• Distributed Simulations
– Automatically distribute simulations on your compute cluster
– Get to results faster
Complexity Demands FASTEST Simulation
[Plot: Bit Error Rate (10^-3 down to 10^-6) versus SNR. Callout: Simulate 100 million data samples for one performance point]
• Performance is judged by simulating billions of input data
• Mandatory to deploy most efficient simulation paradigm
• Compiled C/C++ simulations
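The "100 million data samples" callout can be related to the 10^-6 end of the BER axis with a small calculation. The sketch below assumes a common rule of thumb, not stated on the slide, that about 100 observed errors are wanted for a trustworthy BER estimate. The function name and the error target are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of samples to simulate so that, on average, target_errors
 * errors are observed at the given bit error rate (rounded to the
 * nearest whole sample). */
uint64_t samples_for_ber_point(double ber, unsigned target_errors)
{
    assert(ber > 0.0 && ber < 1.0 && target_errors > 0);
    return (uint64_t)((double)target_errors / ber + 0.5);
}
```

Under that assumption, samples_for_ber_point(1e-6, 100) comes out at about 100 million samples for a single point on the curve, and a full curve multiplies this by the number of SNR points.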
Is FASTEST not Fast Enough? Parallel Iterations
[Diagram: SNR points distributed across CPU cores 1 … N, showing the time saving in time to results]
• Use of job scheduler significantly reduces time to results
• Utilize the power of your entire compute farm
• Get to results faster!
Fast Fixed-Point Simulation: Get The Best of Two Worlds
[Chart: simulation efficiency versus modeling efficiency. Native data types and fixed-point libraries (parameters wl, iwl, fwl) are compared with the System Studio objective; the scale labels 1x, 20x and 30x appear on the chart]
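To make the wl / iwl / fwl labels concrete, here is a minimal sketch of bit-true quantization done with native integer types. It is not System Studio's implementation. It assumes a signed two's-complement format in which iwl counts the integer bits including the sign bit and fwl counts the fractional bits, so wl = iwl + fwl, with round-to-nearest and saturation chosen for illustration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize x to a signed fixed-point number with iwl integer bits
 * (sign included) and fwl fractional bits. Returns the raw integer
 * (value scaled by 2^fwl), rounded to nearest and saturated. */
int64_t fx_quantize(double x, int iwl, int fwl)
{
    assert(iwl >= 1 && fwl >= 0 && iwl + fwl <= 32);
    const int wl = iwl + fwl;
    const int64_t max_raw = ((int64_t)1 << (wl - 1)) - 1;
    const int64_t min_raw = -((int64_t)1 << (wl - 1));
    const double scaled = x * (double)((int64_t)1 << fwl);
    if (scaled >= (double)max_raw) return max_raw;  /* saturate high */
    if (scaled <= (double)min_raw) return min_raw;  /* saturate low  */
    return (int64_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Convert a raw fixed-point integer back to a real value. */
double fx_to_double(int64_t raw, int fwl)
{
    assert(fwl >= 0 && fwl <= 32);
    return (double)raw / (double)((int64_t)1 << fwl);
}
```

For an 8-bit word with iwl = 2 and fwl = 6, fx_quantize(0.75, 2, 6) stores the raw value 48, and inputs outside the representable range saturate at 127 or -128. Keeping the value in a plain integer is what lets a bit-true model run close to native speed.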
Flow Integration
• Re-use your algorithm
• HW design & verification
– Golden reference
– Functional testbench
• Embedded SW dev.
– Use in virtual platforms
– RTL not available or too slow
• De-risk your flow
[Diagram: System Studio feeding HW Design & Verification and Embedded Software]

SystemC/C++ Export: Integration into design/verification flow
• Automatic generation of SystemC/C++ wrapper...
– Function call interface
– SystemC FIFO interface
– SystemC signal interface
• ... and utilities
– Makefile
– Existence testbench
• Plug&play: Export into HDL simulation
• No System Studio knowledge required
HDL Import
• Automatic interface and wrapper generation
– Dataflow interface for HDL model
• Automatic synchronization of simulation engines
• Support for all major HDL simulators for Verilog and VHDL
– VCS-MX
– MTI
– NC-SIM

RTL Code Generation
• Synthesizable RTL from dataflow description
• Two options
– Library based: full control over HDL implementation
– Single-source concept: fastest route to RTL
– Both options can be mixed
• Automatic generation of HDL co-simulation
Integration into HW Verification Flow: Exporting Algorithmic Models
• Automatic generation of wrappers
– C/C++ interface
– SystemC FIFO or signal
• Replaces need for paper spec & I/O files
• No System Studio knowledge required by HW verification engineer
• Optional encryption
[Diagram: Algorithmic Model inside a SystemC Wrapper with clock and reset inputs, parameters param1 and param2=256, and ports port0, port1 and port2, each carrying data, valid and ready signals]
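The data / valid / ready signals on the wrapper ports suggest a handshake in which a sample moves only when the sender asserts valid and the receiver asserts ready. The slide does not define the protocol, so the C sketch below is an assumption about that common convention rather than the generated wrapper's behavior. The type name, function names and FIFO depth are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 4

typedef struct {
    int    data[FIFO_DEPTH];
    size_t head;   /* index of the oldest stored sample */
    size_t count;  /* number of samples currently stored */
} port_fifo;

/* ready is asserted while the FIFO can accept another sample. */
bool port_ready(const port_fifo *f)
{
    assert(f != NULL);
    return f->count < FIFO_DEPTH;
}

/* Producer side: the transfer happens only when valid and ready
 * both hold. Returns true if the sample was accepted. */
bool port_push(port_fifo *f, bool valid, int data)
{
    assert(f != NULL);
    if (!valid || !port_ready(f)) return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = data;
    f->count++;
    return true;
}

/* Consumer side: pops the oldest sample if one is available. */
bool port_pop(port_fifo *f, int *out)
{
    assert(f != NULL && out != NULL);
    if (f->count == 0) return false;
    *out = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

A full FIFO deasserts ready and stalls the producer without losing data, which is the property that lets the same algorithmic model sit behind either a FIFO or a signal-level interface.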
Hardware-in-the-Loop Simulation
• Execute RTL code on CHIPit box (FPGA prototyping)
• Reuse algorithm simulation setup for stimuli generation and analysis
Integration into Embedded SW Flow: Model Reuse in Virtual Platforms
[Diagram: SoC virtual platform with ARM920, DMA, Memory controller, RAM, System Bus, Peripheral Bus, VT100, LCD, Picture Generator and Rendering blocks. Callouts: Stimuli generation, Functional model, Analysis]

Algorithm Design & Analysis Needs
[Diagram: Model-based Design; Analysis and Debugging; Models (Source Code); Highest Simulation Performance; Verification Flow Integration, arranged along Performance and Productivity axes]
Synopsys’ DSP Algorithm Portfolio
• System Studio Libraries: 2000+ models (signal gen, basic DSP, analog and digital)
• System Studio: Environment for design and analysis of signal processing algorithms
• Fastest way to functional specification through model based design
• Shortest time to results through highest simulation performance
• Increase overall verification productivity through HDL import and SystemC export capabilities
HW/SW OPTIMIZATION
Why Virtual Platforms? A Chip Design Project P&L Without Virtual Platforms
• Prototype for software development available late in the design cycle
• Tests for post-silicon validation are developed late as well

Virtual Platforms: Pre-Silicon Models of the Hardware
• Early Availability
• Enhanced Debugging
• Easy Deployment
• Fully functional software model of SoC, board, I/O, user interface
• Executes unmodified production code
• Runs at almost real-time
• High system visibility and control incl. multi-core debug
Virtual Platform Business Impact: Get to market early and increase profit
• Start software development pre-silicon, long before hardware is available!
• Start pre-silicon verification and post-silicon validation using virtual platform as “DUT”

Virtual Platform Usage Example: Pre- and Post-Silicon
[Chart: # of runtime licenses over time, with milestones Early (Pre-Si) Software Development, Hardware Availability and Deliver to OEM]
Virtual Platforms: From Functional Specification to RTL Implementation
[Diagram elements: Functional Specification (defining the overall hardware / software system); Platform Creation & Analysis; Platform Deployment; Model Creation; Algorithms; IP-XACT; IP Libraries (SystemC, Your-lib-1); 3rd Party Software Debuggers; RTL to GDSII Implementation Flow]
Virtual Platform Value: Three Major Use Cases of Virtual Platforms
Software Development (Prior to RTL)
• High-performance simulation of complete systems
• Professional IDE for enhanced debug & visibility
• Open “Framework” based on SystemC TLM-2.0
• Scalable environment that enables internal & external users
• Integration with all major embedded software tool chains
Verification & Validation (Post RTL / Silicon)
• Simulation of complete systems including board-level test harnesses
• Innovator GUI enables test engineer productivity, emulating test harnesses
• Links to VCS / VMM “System to Silicon” verification flow
• Proven links to emulation solutions: EVE, Palladium
• Proven links to Synopsys CHIPit
Performance Profiling (Architecture Analysis)
• Hybrid simulation of “loosely” and “accurately” timed models
• Innovator “Platform Analyzer” add-on enables customizable analysis
• Innovator extensions to SystemC for profiling & instrumentation
• Proven integration with 3rd party models
• Developing “save & restore” capability to enable more visibility
Virtual Platform Development Needs
• Powerful IDE
– Graphical design capture
– Model creation wizard
– Debugging support
– Domain-specific visualization
• Native SystemC support
• Run Time
– User’s seat
– Enabling execution of Virtual platform

Library Needs
Example: DesignWare® System-Level Library
• Accelerate virtual platform creation, through a portfolio of transaction-level models (TLMs)
– High performance & quality
– 100+ titles & growing …
• Written in SystemC™
• Migrating to TLM-2.0 API
• Tool independent: works with any IEEE-1666 compliant SystemC simulator
• Supported on Windows & Linux
• Delivered in binary format
• Model Authoring Libraries
[Diagram: library contents: Processors, DesignWare Cores, DW AMBA Models, CoreConnect Models, PrimeCell Infrastructure (NEW), Model Authoring Libraries, Pre-assembled Platforms (NEW)]
Example: Texas Instruments
• Virtual platform available 9-12 months before HW is available, allowing 85-90% of all software to be developed pre-silicon
• 2-5x SW development productivity improvement
• 1st-day HW/SW integration
• Deployed to TI customers
[Chart: TI device roadmap with Cellular and Consumer segments. Labels: OMAP1®, OMAP1510, 1610, 1623, OMAP2®, OMAP2420, OMAP2430, OMAP3®, OMAP3430, OMAP-Vox®, OMAPV1030, OMAPV1230, OMAPV2230, OMAPV2320, TCS2310 (LoCosto), DRP, DA295S, DM420, DaVinci Family, Unannounced devices]
“TI used the VPOM-2430 Virtual Platform successfully to accelerate software development for the OMAP2430 processor device. Because of the effective simulation environment provided by SNPS, our teams were able to immediately run and test our software when the OMAP2430 processor became available. SNPS is a key element of TI's plan to reduce the time needed to provide software after silicon becomes available.”
Avner Goren, Worldwide Marketing Director, Cellular Systems, TI's Wireless Terminal Business Unit.
Virtual Platform Value: Three Major Use Cases of Virtual Platforms
Software Development (Prior to RTL)
• High-performance simulation of complete systems
• Professional “Innovator” IDE for enhanced debug & visibility
• Open “Framework” based on SystemC TLM-2.0
• Scalable environment that enables internal & external users
• Integration with all major embedded software tool chains
Verification & Validation (Post RTL / Silicon)
• Simulation of complete systems including board-level test harnesses
• Innovator GUI enables test engineer productivity, emulating test harnesses
• Links to VCS / VMM “System to Silicon” verification flow
• Proven links to emulation solutions: EVE, Palladium
• New links to Synplicity HAPS available soon
Performance Profiling (Architecture Analysis)
• Hybrid simulation of “loosely” and “accurately” timed models
• Innovator “Platform Analyzer” add-on enables customizable analysis
• Innovator extensions to SystemC for profiling & instrumentation
• Proven integration with 3rd party models
• Developing “save & restore” capability to enable more visibility
RTL Verification in the Presence of SW: Mixing TLM Model, RTL & Assertions
Key Values & Usage
– Software/Hardware Integration Validation
  • Validate HW/SW integration on actual hardware (RTL)
– System Validation
  • Scales verification to system context
  • Software becomes part of the verification test bench
  • Verification confidence increases with “real” system scenarios
– RTL Simulation Speed-up
  • Maintain TLM level where possible
  • TLM model used to generate system stimuli
[Diagram: System-on-Chip with CPU(s) on an Instruction Set Simulator running (System) Software, TLM Bus, Periph, Mem Ctrl, System I/O, Camera Ctrl, Flash Memory and USB I/O, plus a Transactor linking to the RTL (sub)system in VCS/VMM; System/Device]
Early Testbench Creation & Integration: Integration with VMM Methodology
Key Values & Usage
– Early Testbench Development
  • Develop all TB infrastructure with TLM platform
– Early Test-case / Scenario Development
  • VMM scenarios / test-cases
  • “Embedded directed software” tests used for system (integration) testing can be efficiently developed on TLM model
– Higher Test-case (software) productivity
  • Faster turnaround & better visibility into TLM platform
Technology / Methodology
– SystemC support in VMM
– Layered VMM testbench approach
– VCS TLI interface
[Diagram: Testbench with Test-case / Scenario, Driver, Monitor and Coverage / Self-check around a DUT, shown once with the Virtual Platform and once with the RTL as DUT; caption: Early testbench creation & test development]
Co-Verification: Mixing TLM Model, RTL & Assertions
[Screenshot callouts: Assertion Specification; DesignWare™ Virtual Platform; Software Debugger; VCS / DVE]

System Prototyping: Virtual & Hardware Prototypes
Combined Virtual And FPGA Prototype: Best Of Both Worlds
Use Models
– Offload software to workstation
– Virtual Platform re-using existing RTL in FPGA Prototype
– Virtual Platform as test bench for FPGA Prototype
– Joint virtual / real system environment connections (USB, SATA, …)
– Virtual ICE in virtual platform connected to FPGA prototype
System Prototyping: From pre-RTL virtual prototype to hardware prototype
[Diagram: End to End Prototyping Design Flow. Semiconductor House: Chip Architecture, RTL, Netlist, GDSII, Si Proto; Firmware, OS & Driver and Middleware developed on Previous Chip (for derivative), Virtual Prototype, Hybrid, FPGA Prototype and SI Prototype; SDK usage. System House: Product Architecture, Device Proto; OS & Driver, Middleware and Application SW developed on Previous Chip, Virtual Prototype, FPGA Prototype and SI Prototype; SDK usage. Callout: Schedule Improvement]
Transaction-level Modeling Abstraction Levels
[Chart: abstraction levels ordered by simulation speed. Speed labels: 80+ MIPS, 40-60 MIPS, 1-10 MIPS, 1-100 KIPS. Levels shown: App View TLM (AV); Prog. View TLM (LT); PV with Timing TLM (AT); C-translated-RTL Models, Co-Emulation, RTL co-simulation. Accuracy labels: Functionally Accurate, Cycle Approximate, Cycle Accurate. Use cases: Pre-silicon Software Development & Integration; Architectural Exploration, System Verification & Real-Time SW Development]
SW Centric Architecture Analysis: TLM-2.0 “AT” Simulation
• Data visualization and processing
• Flexible and extensible
– Support for user-defined types
– Field configurable through plug-in API
• Rich set of data views
• High performance logger and loader
[Diagram: logfile feeding interactive data processing]

Display Options (Examples)
Synopsys’ Virtual Platform Portfolio
• DesignWare® System-Level Library: High-performance models to build virtual platforms
– SystemC™ Transaction Level Models: Processors, Pre-Assembled Platforms, DesignWare Cores, DesignWare AMBA Components
• Innovator: Environment for developing, running & debugging virtual platforms
• Services: Expert services for model creation, virtual platform assembly & customization
• Start SW development early and shrink time-to-market using high-performance Virtual Platforms
• Enhance design quality through SystemC executable specification
• Increase design confidence through complete HW/SW system verification
SYSTEM-LEVEL ECO-SYSTEM
Example: Virtual Platforms
Solution for Early and Productive Software Development!
[Diagram: two timelines compared. Traditional Design Flow: the Semiconductor House moves from Chip Architecture through RTL, Netlist and GDSII to Si Proto, with Firmware, OS & Driver and Middleware developed on the SI Prototype; the System House moves from Product Architecture to Device Proto, with OS & Driver, Middleware and Application SW developed on the Device Prototype. End to End Prototyping Design Flow: the same steps, with software developed earlier on Previous Chip (for derivative), Virtual Prototype, FPGA Prototype and SI Prototype, plus SDK usage. Callout: Schedule Improvement]
Multiple Players Needed
• Models from multiple vendors
• Different Simulator Options
• Different Options for Software Debug
• Driver Software for Specific IP Blocks
System-Level Eco-system
Steepest Ascent and Synopsys
[Diagram: LTE Library (Test models & RMCs; Custom waveforms; Reference receivers: synchronisation, ch. estimation, MIMO receiver & equalisation) used with System Studio for test, simulation and verification]
Tensilica DPU ISS in Synopsys Innovator
[Diagram: Tensilica core ISS with Wire-to-Pin Adaptor, PIF-to-TLM-2.0 Adaptor and Scripting Engine]
SUMMARY
System-Level Challenges
Source: EETimes, 02/05/2007
• How do I develop the right signal processing algorithms, implement them and export them to HDL and Verification flows?
– System Studio: Design and analysis of signal processing algorithms
– System Studio Model Libraries
• How do I get a virtual model of my platform to the programmers for software development early?
– Innovator: Develop, execute & analyze virtual platforms
– DesignWare® System-Level Library
• Services
– Model & virtual platform creation & support
– Training & methodology transfer
Predictable Success