The Digital Video Revolution Transition from Analog to Digital Video
Navigate, store, retrieve and share digital programs as well as access to new interactive services and connectivity possibility
Home entertainment systems will be implemented with a number of Digital Video appliances Digital Televisions (DTVs) DVD Players Digital Video Recorders Set-top Boxes
The Philips Nexperia Platform Approach Philips Semiconductors decided to serve the application
domains of digital video and mobile with a platform approach : Nexperia
Nexperia : ‘next experience’
Nexperia platform properties Flexibility Innovation Future-proof Using architecture framework and IP blocks
Nexperia-DVP Platform Conceps Nexperia-DVP is a Reference Architecture
A set of documents the describes how the products of the Digital Video Product Family will be partitioned into subsystems and how functionality will be split over theses subsystems.
1) Nexperia-DVP Soc Reference Architecture2) Nexperia-DVP Software Reference Architecture3) Nexperia-DVP System Reference Architecture
Nexperia-DVP SoC Reference Architecture
.
Scalable VLIW Media Processor: 100 to 300+ MHz 32-bit or 64-bit
Nexperia™
System Buses 32-128 bit
General-purpose Scalable RISC Processor50 to 300+ MHz32-bit or 64-bit
Library of DeviceIP Blocks Image coprocessors DSPs UART 1394 USB…and more
TM-xxxxTM-xxxxTM-xxxxTM-xxxx D$D$
D$D$
I$I$I$I$
TriMedia CPUTriMedia CPU
DEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCK
DEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCK
DEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCKDEVICE IP BLOCK
.. .. ..
DVP SYSTEM SILICONP
I B
US
SDRAMSDRAM
MMIMMI
DV
P M
EM
OR
Y
BU
S
DEVICE IP BLOCKDEVICE IP BLOCK
PRxxxxPRxxxxD$
D$
I$I$
MIPS CPU
DEVICE IP BLOCKDEVICE IP BLOCK. . .
DEVICE IP BLOCKDEVICE IP BLOCK
PI
BU
S
TriMediaTriMedia™™MIPSMIPS™™
Three Levels of AbstractionLevel 1 : DVP Software-Hardware Rules
Deals with the software view of the hardware Unified Memory Architecture Rules for Data Movement Endianess Ordering & Coherency Interrupts Data formats, including Pixel Formats Trimedia-MIPS Communication Protection Boot
Three Levels of AbstractionLevel 2 : Device Transaction Level (DTL)
Deals with point-to-point transfers between Device IPs and the Connection Network, specifying a Device IP partition and architecture that is compatible with Level 1
Two main types of ports : Typical of Device IP Device Control & Status Ports DMA ports
Three Levels of AbstractionLevel 3 : Connection Network
Deals with the more traditional Bus Hierarchy & Bus Level
In second generation Nexperia-DVP The Device Control & Status (DCS) Network - Primarily a low latency communication path for the CPUs and other initiat
ors to access the control & status register in the Device IPs The Memory Connection Network - It will automate the generation of structures - current generation is implemented with two key elements : Memory Transaction Level (MTL) ports and protocol & Pipelined Memory Access Network
Chiplet-based Design Chiplet : partitioning divid
es the logic hierarchy into manageable sized blocks
A chiplet is a group of modules which are placed together because either they are synchronous to each other or they are not timing critical.
<PNX8550 chip layout>
Chiplet-based Design (Cont.) The partitioning of the top
-level netlist among chiplets followed these guidelines There should be as few synchr
onous signal crossings between chiplets as possible
The clock module is placed into a separate chiplet because of its complexity
Cross-chiplet scan chains are not allowed
<PNX8550 floorplan>
Nexperia-DVP Software Architecture Scalable from low-end to high-end Consistent API (on MIPS or TriMedia) Single Streaming Architecture for MIPS and
TriMedia Aligned to Nexperia-DVP hardware architecture
andIP blocks
Operating system independent software layers Re-use of software components on any instance
of the platform
Nexperia-DVP System Reference Architecture Performance Characteristics
The system must work under normal conditions The system must have a given (short) maximum input/output
delay The system must behave gracefully under exceptional
condition The system must be as cost effective as possible
Critical Resources Memory Bandwidth CPU Cycles (RAM) Memory Size
Effect of bandwidth on transaction latency The memory transaction latency experienced by
a CPU is influenced by four different factors Minimum transaction cost Transactions of other blocks that take precedence Pending transactions of blocks that do not take precedence Other transactions of the CPU itself
Effect of transaction latencyon CPU performance Typical average behavior of two types of code on a Nexperia-DVP system
DSP code Control Code
CharacteristicsLong expressionsHighly repetitive loops
Lots of switch/if statementsMany function calls
Working Set SizeTypically smaller than CPUs I-cache size
Typically much larger than CPUs I-cache size
Typical instruction repetition rate before cache line invalidate
40 or more Between 1 and 3
Typical CPU stall cycles due to cache misses on moderately loaded system
20% and below on TM
80% and above on TM60% and above on MIPS
Predominant Cache Misses D-Cache I-Cache
Typical bandwidth consumed for each effective CPU Mcycle
400KB/s on TM 6.4MB/s on TM1.5MB/s on MIPS
Memory Arbitration To regulate the distribution of available bandwidth over th
e requesting blocks : Utilize an advanced DDR arbitration scheme Fair distribution of bandwidth over requesting blocks Shortest possible transaction latency for CPUs
Nexperia-DVP memory arbiter deploys a sophisticated algorithm for distribution of bandwidth over requesting blocks Guaranteed bandwidth Priority based distribution of remainder
Scheduling Techniques Scheduling of Hardware Accelerators
time
A1 A2 A1 A2 A1 A2
<Sequencing>
“Perform A2 after A1 is finished”
<Slicing>
“Run slice of A2 when A1 finished a slice”
<Staggering>
“Start A2 after a slice of A1, and guarantee that A2 will not overtake A1”
Scheduling Techniques (Cont.) Properties of Scheduling Techniques
Sequencing Slicing Staggering
Software effort LowestHigher, depends on slice size
Low
Buffer memory Largest Smallest Middle
Input/output delay
Longest Shorter Shorter
Requires of processors
NothingMust support slicing
Must be sequential and localized
Top Related