Signalling in the Heterogeneous Architecture Multiprocessor Paradigm
SPIE Gran Canaria 2003 A. Nunez 1
Signalling in the Heterogeneous Architecture Multiprocessor Paradigm
Antonio Núñez, Victor Reyes, Tomás Bautista
Keynote
IUMA, Institute for Applied Microelectronics, ULPGC
Index
- MPSoC Architectures -> Hetero MPSoC
- Communication Architectures -> Split Transport and Signalling Networks
- Previous and Related Work
- Our SystemC Based Modelling Approach
- Experiments
- Conclusions
Technological Forecasts
Moore's Law: the number of transistors per chip doubles every two years
ITRS:
Year of 1st shipment    1997   1999   2002   2005   2008   2011   2014
Local Clock (GHz)       0.75   1.25   2.1    3.5    6      10     16.9
Across Chip (GHz)       0.75   1.2    1.6    2      2.5    3      3.674
Chip Size (mm²)         300    340    430    520    620    750    901
Dense Lines (nm)        250    180    130    100    70     50     35
Number of chip I/O      1515   1867   2553   3492   4776   6532   8935
Transistors per chip    11M    21M    76M    200M   520M   1.4B   3.62B
SoC, MPSoC, GALS, NoC
Processor to DRAM Performance Gap
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU performance ("Moore's Law") improves by 60% per year, DRAM performance by 7% per year. The processor-memory performance gap grows by 50% per year.]
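The 50% figure is consistent with the two growth rates quoted on the slide if the gap is read as the ratio of CPU performance to DRAM performance (an interpretation; the slide does not define the gap explicitly). The symbols below are introduced only for this check:

```latex
\[
\frac{P_{\mathrm{CPU}}(t+1)\,/\,P_{\mathrm{DRAM}}(t+1)}
     {P_{\mathrm{CPU}}(t)\,/\,P_{\mathrm{DRAM}}(t)}
= \frac{1.60}{1.07} \approx 1.495
\]
```

That is, the ratio grows by roughly 49.5% per year, which rounds to the 50% stated on the slide.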
Processor Architecture Paradigms
Cf. Ungerer et al., Patterson et al., Tenhunen et al., Computer special issue
Processor/Memory/Switch: processor-, memory- and communications-dominated systems; communications architecture
- Processor-Mono: speed-up of a single-threaded application
  - Advanced superscalar
  - Trace cache
  - Superspeculative
  - Multiscalar processors
- Processor-Multi: speed-up of multi-threaded applications
  - Simultaneous multithreading (SMT)
  - Chip multiprocessors (CMPs)
- Memory: Processor-in-Memory, IRAM, others
- Network on Chip: Homo, Hetero
Names cited on the slide: Patt, Sohi…; Patterson; Mihal, Tenhunen, Goossens; many..
Monoprocessor: Superflow Processor
Fine granularity, data word
The Superflow processor speculates on:
- instruction flow: two-phase branch predictor combined with trace cache
- register data flow:
  - dependence prediction: predict the register value dependence between instructions
  - source operand value prediction
  - constant value prediction
  - value stride prediction: speculate on constant, incremental increases in operand values
  - dependence prediction predicts inter-instruction dependences
- memory data flow: prediction of load values, of load addresses, and alias prediction
Multiscalar Processors
A program is represented as a control flow graph (CFG), where basic blocks are nodes, and arcs represent flow of control.
A multiscalar processor walks through the CFG speculatively, taking task-sized steps, without pausing to inspect any of the instructions within a task.
The tasks are distributed to a number of parallel PEs within a processor.
Each PE fetches and executes instructions belonging to its assigned task.
The primary constraint: it must preserve the sequential program semantics.
Multiscalar mode of execution
[Figure: a control flow graph with nodes A, B, C, D and E. Task A runs on PE 0, Task B on PE 1, Task D on PE 2 and Task E on PE 3, with data values passed between the PEs.]
Multiscalar, Trace and Speculative Multithreaded Processors
Multiscalar: a program is statically partitioned into tasks, which are marked by annotations of the CFG.
Trace processor: tasks are generated from traces of the trace cache.
Speculative multithreading: tasks are otherwise dynamically constructed.
Common target: increase single-thread program performance by dynamically exploiting thread-level speculation in addition to instruction-level parallelism.
Here a "thread" means a "HW thread".
Multis: Additional utilization of more coarse-grained parallelism
CMPs: chip multiprocessors, or multiprocessor chips, integrate two or more complete processors on a single chip; every functional unit of a processor is duplicated.
SMTs: simultaneous multithreaded processors store multiple contexts in different register sets on the chip; the functional units are multiplexed between the threads, and instructions of different contexts are executed simultaneously.
CMPs-Homo: Com-arch by shared global memory
[Figure: four processors sharing a global memory. Block labels: Primary Cache, Secondary Cache, Global Memory. Panel title: Shared global memory, no caches.]
CMPs-Homo: Com-arch by shared primary cache
[Figure: four processors sharing a primary cache, backed by a secondary cache and a global memory. Panel title: Shared primary cache.]
CMPs-Homo: Com-arch by global memory, caches
[Figure: two panels, each with four processors and one primary cache per processor. One panel shows four secondary caches in front of a global memory, the other a single secondary cache in front of a global memory. Panel titles: Shared caches and memory; Shared secondary cache.]
Com-arch in Hydra: A Single-Chip Multiprocessor
[Figure: a single chip with four CPUs (CPU 0 to CPU 3), each with a primary I-cache, a primary D-cache and a memory controller, connected through centralized bus arbitration mechanisms to an on-chip secondary cache, a Rambus memory interface, an off-chip L3 interface and an I/O bus interface with DMA. Off-chip blocks: cache SRAM array, DRAM main memory, I/O device.]
CMPs-Hetero: Communications Architecture
Architectures found in today's heterogeneous processors for platform-based design
E.g. CPU cores, AMBA buses, internal/external shared memories
[Figure: a RISC core, engines, external I/O and internal/external memory connected through an AMBA bus and a shared bus.]
Multithreaded Processors
Aim: Latency tolerance
What is the problem? Load access latencies measured on an Alpha Server 4100 SMP with four Alpha 21164 processors are:
- 7 cycles for a primary cache miss which hits in the on-chip L2 cache of the 21164 processor,
- 21 cycles for an L2 cache miss which hits in the L3 (board-level) cache,
- 80 cycles for a miss that is served by the memory, and
- 125 cycles for a dirty miss, i.e., a miss that has to be served from another processor's cache memory.
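To see how these latencies add up, the following minimal sketch computes the expected stall cycles per load from the four figures above. The fractions of loads served at each level are hypothetical inputs (the slide gives no miss rates), and primary-cache hits are assumed to cost no extra cycles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Latencies in cycles quoted on the slide for the Alpha Server 4100:
// L2 hit, L3 hit, served by memory, dirty miss (another processor's cache).
constexpr std::array<double, 4> kMissLatencyCycles = {7.0, 21.0, 80.0, 125.0};

// Expected stall cycles per load. fraction_served[i] is the fraction of all
// loads served at level i; these fractions are HYPOTHETICAL inputs.
// Loads that hit in the primary cache are the remainder and are assumed
// to add zero stall cycles.
double expected_stall_cycles(const std::array<double, 4>& fraction_served) {
    double total_fraction = 0.0;
    double cycles = 0.0;
    for (std::size_t i = 0; i < fraction_served.size(); ++i) {
        assert(fraction_served[i] >= 0.0);
        total_fraction += fraction_served[i];
        cycles += fraction_served[i] * kMissLatencyCycles[i];
    }
    assert(total_fraction <= 1.0);  // fractions cannot exceed all loads
    return cycles;
}
```

For instance, if a quarter of all loads hit in L2 and another quarter hit in L3, the function returns 7 stall cycles per load on average, which is the latency a multithreaded processor tries to overlap with work from other threads.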
Multithreading
The ability to pursue two or more threads of control in parallel within a processor pipeline.
Advantage: The latencies that arise in the computation of a single instruction stream are filled by computations of another thread.
Multithreaded processors are able to bridge latencies by switching to another thread of control - in contrast to chip multiprocessors.
Approaches of Multithreaded Processors
- Cycle-by-cycle interleaving: an instruction of another thread is fetched and fed into the execution pipeline at each processor cycle.
- Block interleaving: the instructions of a thread are executed successively until an event occurs that may cause latency. This event induces a context switch.
- Simultaneous multithreading (SMT): instructions are simultaneously issued from multiple threads to the FUs of a superscalar processor. It combines wide-issue superscalar instruction issue with multithreading.
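The difference between the first two approaches comes down to when the fetch stage changes thread. The sketch below shows only that selection rule; the function names and the round-robin order are assumptions made for illustration, and SMT is not modelled because it issues from several threads in the same cycle rather than picking one.

```cpp
#include <cassert>
#include <cstddef>

// Cycle-by-cycle interleaving: a different thread is selected every cycle.
// Round-robin order is an assumption; the slide does not fix a policy.
std::size_t next_thread_cycle_interleaved(std::size_t current,
                                          std::size_t num_threads) {
    assert(num_threads > 0);
    return (current + 1) % num_threads;
}

// Block interleaving: stay on the current thread until an event that may
// cause latency occurs (e.g. a cache miss), then context-switch.
std::size_t next_thread_block_interleaved(std::size_t current,
                                          std::size_t num_threads,
                                          bool latency_event) {
    assert(num_threads > 0);
    return latency_event ? (current + 1) % num_threads : current;
}
```

With four threads, the cycle-interleaved selector visits 0, 1, 2, 3, 0, … regardless of events, while the block-interleaved selector keeps returning the same thread until `latency_event` is true.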
Multithreading versus Non-Multithreading Approaches
[Figure: execution over time (processor cycles) for (a) single-threaded scalar, (b) cycle-by-cycle interleaving multithreaded scalar and (c) block interleaving multithreaded scalar. Context switches are marked in panels (b) and (c).]
Simultaneous Multithreading (SMT) and Chip Multiprocessors (CMP)
[Figure: issue slots versus time (processor cycles) for (a) SMT and (b) CMP.]
Combining SMT and Multimedia
Start with a wide-issue superscalar general-purpose processor
Enhance by simultaneous multithreading
Enhance by multimedia unit(s)
Enhance by on-chip RAM memory for constants and local variables
The SMT Multimedia Processor
[Figure: block diagram. Pipeline stage labels: IF, ID, RI, RT, WB. Execution units: Branch, Complex Integer, Simple Integer, Global L/S, Local L/S, Thread Control. Other blocks: ICache, BTAC, DCache, Rename Register, Local Memory, I/O, Memory interface (to memory).]
IPC of Maximum Processor Models
[Figure: bar chart of IPC (axis 0 to 7) versus Issue and Threads. The plotted values range from 0.96 to 6.33.]
Combining CMP-hetero and Multimedia
Start with a general-purpose processor
Enhance by hierarchical-bus com-arch
Enhance by hardware accelerators and coprocessors including multimedia unit(s)
Enhance by on-chip RAM memories for constants, local variables, frames…
Real implementation example: Philips Eclipse architecture instance for video coding
CMP or SMT?
The performance race between SMT and CMP is not yet decided.
CMP is easier to implement, but only SMT has the ability to hide latencies.
A functional partitioning is not easily reached within an SMT processor due to the centralized instruction issue.
A separation of the thread queues is a possible solution, although it does not remove the central instruction issue.
A combination of simultaneous multithreading with the CMP may be superior.
Research: combine SMT or CMP organization with the ability to create threads with compiler support or fully dynamically out of a single thread
- thread-level speculation
- close to multiscalar
Processor-in-Memory
Technological trends have produced a large and growing gap between processor speed and DRAM access latency.
Today, it takes dozens of cycles for data to travel between the CPU and main memory.
CPU-centric design philosophy has led to very complex superscalar processors with deep pipelines.
Much of this complexity is devoted to hiding memory access latency.
Memory wall: the phenomenon that access times are increasingly limiting system performance.
Memory-centric design is envisioned for the future
PIM or Intelligent RAM (IRAM)
PIM (processor-in-memory) or IRAM (intelligent RAM) approaches couple processor execution with large, high-bandwidth, on-chip DRAM banks.
PIM or IRAM merge processor and memory into a single chip.
Advantages:
- The processor-DRAM gap in access speed will keep increasing. PIM provides higher bandwidth and lower latency for (on-chip) memory accesses.
- DRAM can accommodate 30 to 50 times more data than the same chip area devoted to caches.
- On-chip memory may be treated as main memory, in contrast to a cache, which is just a redundant memory copy.
- PIM decreases energy consumption in the memory system due to the reduction of off-chip accesses.
VIRAM, CODE
V-IRAM-2: 0.13 µm, Fast Logic, 1 GHz, 16 GFLOPS (64b) / 64 GOPS (16b) / 128 MB
[Figure: block diagram. A 2-way superscalar processor with 8K I-cache and 8K D-cache, a vector processor with instruction queue, vector registers, arithmetic units (+, x, ÷) and load/store units, organized as 8 x 64, 16 x 32 or 32 x 16 bits, connected through a memory crossbar switch to banks of memory modules (M). I/O and serial I/O ports are also shown.]
NoC Processor Architecture
Network-on-chip, specialized PEs, advanced interconnect technologies
Will use packet network architectures in 2010
[Figure: PEs, controller PEs, a DSP PE array, on-chip memory, external memory and external I/O attached to the switch nodes of a packet network.]
NoC Mescal Communication Architecture General Paradigm
Mescal Communication Architecture is a general, coarse-grained on-chip interconnection scheme for various system components such as Processing Elements, memory and other communicating elements.
[Figure: processing elements (PE) with caches ($) and memories (MEM), interconnected through switches and a bridge.]
NoC Mescal Abstract System Architecture
[Figure: two processing elements, each issuing communication instructions (send/recv) to a communication assist, which performs on-chip-network operations over the on-chip network. Protocol stack shown alongside: Physical, Data Link, Network, Transport, Session, Presentation and Application layers.]
NoC Communication Architecture
Translation of network operations to packet switch operations
[Figure: on-chip-network operations pass through a packet assembler and a packet deassembler and are carried as packet switch network operations over a packet switching network with nodes N0 to N7. Corresponding protocol stack: Physical Layer, Data Link Layer, Network Layer.]
NoC: Example for a bus
Translation of network operations to bus operations
[Figure: on-chip-network operations pass through bus interface adapters and are carried as bus operations over an on-chip bus. Corresponding protocol stack: Physical Layer, Data Link Layer.]
Today's Communication Architecture Paradigms: Topology
Single and shared transport and signalling channel
- p2p
- Bus
- Hierarchical bus
- Switch: crossbar, multistage…
- Ring
- Trees
- Network: circuit switched, packet switched without connection, packet switched with connection..
Today's Communication Architecture Paradigms: Topology
Split transport and signalling
- Transport
  - Topology (bus, h-bus, switch, ring, network…)
- Signalling (addresses and routing, services, synchronisms)
  - Associated channel: topology
  - Common channel: topology…
- Protocol layer stack: software and process view of the generation of hardware signalling; requires mapping onto actual interfaces
Today's Communications Architecture Paradigms: Bandwidth
- Application granularity
- Transport granularity
  - Fine grain, medium grain, coarse grain
  - Bus sizes, transfer sizes
- Traffic characterization
  - E.g. streaming, burstiness, interval requests, space-time distribution
Today's Communications Architecture Paradigms: Protocols
- High-level signalling primitives mapping
- Communications to architecture mapping
- Access policies mapping: priorities, static, dynamic
- Traffic and flow control: burstiness, request intervals, concurrency
Today's Communications Architecture Paradigms: Signalling
- Addressing, routing info
- Service info
- Handshake and command sync strobes
- High-level signalling primitives mapping
- Communications to architecture mapping
- Access policies mapping: priorities, static, dynamic
- Traffic and flow control: burstiness, request intervals, concurrency, streaming ...
Com-arch Modelling: Ptolemy-Mescal
UC Berkeley Ptolemy I & II, Mescal, UCSD-Dey, PR-Vissers, Goossens, Lippen.., TIMA-Jerraya..
Components for channels:
- Synchronous digital bus (shared or point-to-point)
- ARM AMBA bus
- IBM CoreConnect bus
- Analog channel
Actors encapsulate the physical layer
Each actor has a common interface to make experimentation possible
The Ptolemy actor interface is at a higher level than the channel's actual electrical interface
Com-arch Modelling: Ptolemy-Mescal
Components for CommAssists
- Queues
- Arbitrators
- PE interfaces
- Bus interfaces
- External memory or I/O cycle generators
- Switches
- Small memories
Parameterizable components
Programmable components
Designing a CA is very similar to designing a PE
Com-arch Modelling: Ptolemy-Mescal
- Encapsulate a PE model as a composite actor
- Combine with CA components to make a Communicator
- Encapsulate the Communicator model as a composite actor
- Combine multiple Communicators with Channel components to make a complete system
Case study: Communication architecture in HA-MPSoC
Mapping communicating processes and threads on HA-MPSoC requires efficient ways of implementing the on-chip communication
Previous work: comparative performance of different classes of data communication architectures (San Diego)
But: the communication architecture can be split into the data communication architecture, and the signalling and synchronization architecture
The impact of different signalling and synchronization architectural options on the overall performance has not been sufficiently studied
Our focus: Signalling in the HA-MPSoC paradigm, split sync, SystemC modelling
New solutions for signalling and synchronization in the HA-MPSoC paradigm
Based on a technique for modelling the communication and synchronization architectures using SystemC
High-abstraction modelling based on the Kahn Process Network (KPN) model of computation
Here: variations on Dey's simple communication architecture (bus)
Previous related work: UCSD-Dey
Analysis of the performance of various SoC communication architectures under different classes of on-chip communication traffic
Identifying parts of the application's "communication traffic space" for which different communication architectures are well suited
Methodology based on POLIS/PTOLEMY
Previous related work: Dey’s communication architectures
- Static Priority Based Shared Bus Architecture
- Two-level TDMA Based Architecture
- Hierarchical Bus Architecture
- Ring Based Architecture
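The first two architectures in this list differ mainly in their bus arbitration. The sketch below is one common reading of the two policies, not code from Dey's work: the names, the "lower index means higher priority" rule and the round-robin second level are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoGrant = -1;  // returned when no master is requesting

// Static priority: grant the requesting master with the highest priority.
// Assumption: a lower index means a higher priority.
int arbitrate_static_priority(const std::vector<bool>& request) {
    for (std::size_t m = 0; m < request.size(); ++m) {
        if (request[m]) return static_cast<int>(m);
    }
    return kNoGrant;
}

// Two-level TDMA (assumed behaviour): level 1 is a timing wheel whose slots
// are statically reserved; if the slot owner is idle, level 2 hands the slot
// to another requesting master in round-robin order.
struct TwoLevelTdmaArbiter {
    std::vector<std::size_t> wheel;  // wheel[slot] = master owning that slot
    std::size_t slot = 0;            // current position on the wheel
    std::size_t rr_next = 0;         // next candidate for the second level

    int arbitrate(const std::vector<bool>& request) {
        assert(!wheel.empty());
        const std::size_t owner = wheel[slot];
        slot = (slot + 1) % wheel.size();  // the wheel advances every slot
        assert(owner < request.size());
        if (request[owner]) return static_cast<int>(owner);
        for (std::size_t i = 0; i < request.size(); ++i) {
            const std::size_t m = (rr_next + i) % request.size();
            if (request[m]) {
                rr_next = (m + 1) % request.size();
                return static_cast<int>(m);
            }
        }
        return kNoGrant;
    }
};
```

Under static priority a low-priority master can starve when higher-priority masters keep requesting, whereas the timing wheel guarantees each slot owner a share of the bus and the second level reclaims slots that would otherwise go unused.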
SPIE Gran Canaria 2003 A. Nunez 58
Index
MPSoC Architectures -> Hetero MPSoCCommunication Architectures -> Split Transport and Signalling NetworksPrevious and Related workOur SystemC Based Modelling ApproachExperimentsConclusions
SPIE Gran Canaria 2003 A. Nunez 59
Abstracting high level communication
KPN: concurrent tasks interconnected by channels (FIFOs)
Processes have to share service administrative information related to the FIFOs
Administrative information is divided into two parts: static and dynamic information
The update of the dynamic information of the FIFO is the synchronization aspect of the complete signalling function
A simple KPN example
[Figure: a Producer and a Consumer connected by a FIFO. Each side holds administrative information: base address in memory, FIFO size, number of data items in the FIFO.]
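The administrative information in this example can be sketched as a small data structure. The split below (base address and size as static, item count as dynamic) is an assumption based on which fields can change at run time; the type and member names are hypothetical. For brevity the sketch keeps a single copy, whereas the figure shows one copy on each side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Static administrative information: fixed when the channel is set up.
struct FifoStaticInfo {
    std::uintptr_t base_address;  // base address of the FIFO buffer in memory
    std::size_t size;             // FIFO capacity, in data items
};

// Dynamic administrative information: changes as data moves through.
struct FifoDynamicInfo {
    std::size_t count;  // number of data items currently in the FIFO
};

struct FifoAdmin {
    FifoStaticInfo fixed;
    FifoDynamicInfo dynamic;

    bool can_write(std::size_t n) const { return n <= fixed.size - dynamic.count; }
    bool can_read(std::size_t n) const { return n <= dynamic.count; }

    // Synchronization step: the producer publishes n newly written items.
    void commit_write(std::size_t n) {
        assert(can_write(n));
        dynamic.count += n;
    }
    // Synchronization step: the consumer releases n items it has read.
    void commit_read(std::size_t n) {
        assert(can_read(n));
        dynamic.count -= n;
    }
};
```

Only `commit_write` and `commit_read` touch the dynamic part, which is the update the slides identify as the synchronization aspect of signalling; the data transfer itself is a separate concern.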
SPIE Gran Canaria 2003 A. Nunez 61
Signalling Primitives in MPSoC
Flexiblity and scalability, a protocol for communicating tasks is neededSet of primitives for data communication and synchronization. The Eclipse (Philips Research) example:- Primitives for data communication:
void Read(int port_id, int offset, int n_bytes, Bytes *bytevector)
void Write(int port_id, int offset, int n_bytes, Bytes *bytevector)
- Primitives for data synchronization:
bool GetSpace(int port_id, int n_bytes)
void PutSpace(int port_id, int n_bytes)
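A toy model can show how the four primitives fit together. Everything below other than the four signatures is an assumption: `Bytes` is taken to be a byte type, a single channel with one output port (0) and one input port (1) is modelled over a 16-byte circular buffer, and the semantics (GetSpace checks for a window of space or data, PutSpace commits data or releases space) are a reading of the primitive names, not the Eclipse specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char Bytes;  // assumption: Bytes is a byte type

// Hypothetical single-channel model: port 0 is the producer's output port,
// port 1 is the consumer's input port. Both share one circular buffer.
namespace {
const int kOutPort = 0;
const int kInPort = 1;
std::vector<Bytes> buffer(16);
std::size_t write_pos = 0;  // start of the producer's window
std::size_t read_pos = 0;   // start of the consumer's window
std::size_t filled = 0;     // bytes committed and not yet released
}

// Is a window of n_bytes available? Space on the output port, data on the
// input port.
bool GetSpace(int port_id, int n_bytes) {
    const std::size_t n = static_cast<std::size_t>(n_bytes);
    return port_id == kOutPort ? n <= buffer.size() - filled : n <= filled;
}

// Copy n_bytes into the producer's window, starting at offset (>= 0).
void Write(int port_id, int offset, int n_bytes, Bytes *bytevector) {
    assert(port_id == kOutPort);
    for (int i = 0; i < n_bytes; ++i)
        buffer[(write_pos + offset + i) % buffer.size()] = bytevector[i];
}

// Copy n_bytes out of the consumer's window, starting at offset (>= 0).
void Read(int port_id, int offset, int n_bytes, Bytes *bytevector) {
    assert(port_id == kInPort);
    for (int i = 0; i < n_bytes; ++i)
        bytevector[i] = buffer[(read_pos + offset + i) % buffer.size()];
}

// Output port: commit n_bytes of data. Input port: release n_bytes of space.
void PutSpace(int port_id, int n_bytes) {
    const std::size_t n = static_cast<std::size_t>(n_bytes);
    if (port_id == kOutPort) {
        write_pos = (write_pos + n) % buffer.size();
        filled += n;
    } else {
        read_pos = (read_pos + n) % buffer.size();
        filled -= n;
    }
}
```

A producer would call `GetSpace(0, n)`, then `Write`, then `PutSpace(0, n)`; the consumer mirrors this with `GetSpace(1, n)`, `Read` and `PutSpace(1, n)`. Data movement (Read/Write) and synchronization (GetSpace/PutSpace) stay separate, which is what allows them to be mapped onto separate networks.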
SPIE Gran Canaria 2003 A. Nunez 62
Our SystemC-based Modelling
Executable specification of a system described in different abstraction levels (functional untimed, timed, transaction level and cycle-true)TLM is a natural method to perform system level performance simulationSystemC Master/Slave library hides the more complex details of C++ programming and fits well for TLM developmentThe design time of complex MPSoC models can be greatly shortened using the SystemC Master/Slave library
SPIE Gran Canaria 2003 A. Nunez 63
Application modelling
Chain of P processors interconnected through FIFOs
Simulation parameters: number of processes (P), token size (data granularity), request intervals, waiting cycles, transfer cycles, execution time, total simulation time
[Figure: chain P_in -> P_1 -> … -> P_(P-2) -> P_out, connected through FIFO_1 … FIFO_(P-1).]
SPIE Gran Canaria 2003 A. Nunez 64
Index
MPSoC Architectures -> Hetero MPSoCCommunication Architectures -> Split Transport and Signalling NetworksPrevious and Related workOur SystemC Based Modelling ApproachExperimentsConclusions
SPIE Gran Canaria 2003 A. Nunez 65
Static Priority Based Shared Bus Architecture
[Figure: average communication rate (axis 0 to 250) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500 and 1000.]
Two-level TDMA Based Architecture
[Figure: average communication rate (axis 0 to 250) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500 and 1000.]
Hierarchical Bus Architecture
[Figure: average communication rate (axis 0 to 350) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500 and 1000.]
Ring Based Architecture
[Figure: average communication rate (axis 0 to 350) versus token size (1, 10, 50, 100), one series per inter-request interval: 10, 100, 500 and 1000.]
Reminder of Dey’s communication architectures
- Static Priority Based Shared Bus Architecture
- Two-level TDMA Based Architecture
- Hierarchical Bus Architecture
- Ring Based Architecture
Experiments: Additional models of communication architectures
[Figure: overview of five architectures, each with processors P1 to P4 (interfaces labelled Wd and, in some variants, Ws or Wd-Ws), an arbiter (ARB) and a memory (MEM); one variant adds a SYNC module and one a second arbiter.]
Centralized architecture using shared memory (Mem)
[Figure: processors P1 to P4 (interfaces labelled Wd) with an arbiter (ARB) and a memory (MEM); a sync label is also shown.]
Centralized architecture using a central synchronization module (Central)
[Figure: processors P1 to P4 (interfaces labelled Wd) with an arbiter (ARB), a memory (MEM) and a central SYNC module.]
Distributed architecture, same bus for data transport and synchronization (Single-Bus)
[Figure: processors P1 to P4 (interfaces labelled Wd-Ws) with an arbiter (ARB) and a memory (MEM).]
Distributed architecture, splitting data transport bus and sync bus (2-Busses)
[Figure: processors P1 to P4 (interfaces labelled Wd and Ws) with a memory (MEM) and two arbiters (ARB).]
Distributed architecture with ring topology for synchronization (Ring)
[Figure: processors P1 to P4 (interfaces labelled Wd and Ws) with an arbiter (ARB) and a memory (MEM).]
Implementation example: Philips Eclipse architecture instance for video coding
SPIE Gran Canaria 2003 A. Nunez 77
Additional measurements
Quantify which synchronization topology allows the shortest execution time for an application, i.e. which is the most efficient from the performance point of view
The Coprocessor Usage percentage figure (Ucop):
%Ucop = (Texec/Tsim) · 100
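As a small worked instance of this formula, the helper below computes the percentage directly. It assumes that Texec is the time spent executing and Tsim the total simulation time, both in the same unit (for example cycles); the function name is hypothetical.

```cpp
#include <cassert>

// %Ucop = (Texec / Tsim) * 100.
// Assumption: t_exec and t_sim are expressed in the same unit.
double coprocessor_usage_percent(double t_exec, double t_sim) {
    assert(t_sim > 0.0);
    assert(t_exec >= 0.0 && t_exec <= t_sim);
    return (t_exec / t_sim) * 100.0;
}
```

For example, 250 cycles of execution in a 1000-cycle simulation gives a usage of 25%; the remaining time is spent waiting on communication and synchronization.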
SPIE Gran Canaria 2003 A. Nunez 78
.
Coprocessor Usage, P = 4
0123456789
10
1 4 8 16
Token size
%
Ring
Single-bus
Mem
Central
2-busses
SPIE Gran Canaria 2003 A. Nunez 79
Coprocessor Usage, P = 8
[Figure: coprocessor usage (%, axis 0 to 5) versus token size (1, 4, 8, 16) for the Ring, Single-bus, Mem, Central and 2-busses architectures.]
Conclusions
- Increasing importance of communication architecture, MPSoCs <-> NoCs
- Design space exploration extended with communication architectures
- The SystemC master/slave library is a powerful modelling tool
- Large performance spread found due to communication topologies, signalling protocols, and traffic characteristics
- Need for more qualitative and quantitative modelling, analysis, studies, tools
- Consider splitting transport and signalling
- Hierarchical buses, rings, plus splitting ++