Anshul Kumar, CSE IITD
Other Architectures & Examples
Multithreaded architectures
Dataflow architectures
Multiprocessor examples
1st May, 2006

Context switching
• Delays and poor resource utilization due to
  – data/control hazards
  – cache misses
  – waiting for some event
• Solution
  – context switch to another thread
• Context switch mechanism
  – operating system: slow
  – hardware: fast
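
To make the slow/fast distinction concrete, here is a minimal sketch of a toy utilization model. Everything in it is an assumption for illustration: each thread runs for a fixed burst and then waits on a miss of fixed latency, and the function name `utilization` and its parameters are hypothetical, not taken from the slides.

```python
def utilization(num_threads, run_cycles, miss_latency, switch_cost,
                total_cycles=10_000):
    """Fraction of cycles spent on useful work in a toy model.

    Assumptions (illustrative only): every thread runs for `run_cycles`,
    then stalls for `miss_latency`; a context switch costing `switch_cost`
    cycles is charged after every run burst.
    """
    ready_at = [0] * num_threads  # cycle at which each thread can run again
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        runnable = [t for t in range(num_threads) if ready_at[t] <= cycle]
        if not runnable:
            # Every thread is waiting: the processor idles until one is ready.
            cycle = min(ready_at)
            continue
        t = runnable[0]
        busy += run_cycles
        cycle += run_cycles
        ready_at[t] = cycle + miss_latency
        cycle += switch_cost  # overhead of switching to another thread
    return busy / cycle


one_thread = utilization(1, run_cycles=10, miss_latency=40, switch_cost=0)
hw_switch = utilization(5, run_cycles=10, miss_latency=40, switch_cost=1)
os_switch = utilization(5, run_cycles=10, miss_latency=40, switch_cost=200)
print(one_thread, hw_switch, os_switch)
```

In this toy model a cheap switch hides most of the miss latency, while a switch that costs more than the miss itself leaves the processor worse off than not switching at all.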

Multithreaded architecture
• Hardware context switching
• Models
  – control flow or hybrid (control flow, data flow)
• Granularity
  – fine grain or coarse grain
• Memory organization
  – shared?, distributed?, cache coherent?
• No. of threads
  – small, medium, large

ILP and Multithreading
[Figure: comparison of ILP, Coarse MT, Fine MT and SMT. Source: Hennessy and Patterson]

Chip level multithreading
Executing instructions from multiple threads within one processor chip at the same time.
• Multithreading: interleaved issue of multiple instructions from different threads
• Simultaneous multithreading (SMT): issue multiple instructions from multiple threads in one cycle
• Chip-level multiprocessing (CMP or multicore): integrate two or more superscalar processors into one chip, each executing one thread independently
• Any combination of multithreading/SMT/CMP
(Source: Wikipedia)
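
The difference between interleaved multithreading and SMT can be sketched by counting filled issue slots. This is a toy model under stated assumptions: `ILP[t][c]` is a made-up number of instructions thread `t` could issue in cycle `c` (0 means stalled), the issue width is 4, and the function names are hypothetical.

```python
def slots_single_thread(ilp, width, thread=0):
    """Issue slots filled when only one thread runs."""
    return sum(min(n, width) for n in ilp[thread])


def slots_fine_grained(ilp, width):
    """Interleaved multithreading: one thread per cycle, round robin."""
    num_threads = len(ilp)
    cycles = len(ilp[0])
    return sum(min(ilp[c % num_threads][c], width) for c in range(cycles))


def slots_smt(ilp, width):
    """SMT: instructions from all threads compete for the same cycle's slots."""
    cycles = len(ilp[0])
    return sum(min(sum(thread[c] for thread in ilp), width)
               for c in range(cycles))


# Hypothetical per-cycle issue opportunities for two threads over four cycles.
ILP = [[2, 0, 1, 3],
       [1, 2, 0, 2]]
WIDTH = 4

print(slots_single_thread(ILP, WIDTH))  # 6 of 16 slots
print(slots_fine_grained(ILP, WIDTH))   # 7 of 16 slots
print(slots_smt(ILP, WIDTH))            # 10 of 16 slots
```

A CMP would correspond to running `slots_single_thread` once per core, each core with its own issue slots.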

Historical Examples
Machine         Granularity  Procs              Threads/proc        Memory               Year
HEP (Denelcor)  fine         max 16             8 active, 64 max    shared, centralized  1978
Tera            fine         max 256            128                 distributed shared   1990
Alewife (MIT)   coarse       max 512 (Sparcle)  1 active, 3 loaded  CC                   1990

Modern examples
• Pentium 4 Hyperthreading
• MIPS MT 8 cores with 4 threads each
• IBM Power 5 dual core, 2 threads each
• UltraSPARC T1 fine grained multithreading

HEP
[Figure: HEP block diagram. Labels: function units FU1, FU2, ..., FUn; operand fetch; matching unit; registers; program memory; increment control; PSW queue; SFU (scheduler function unit), to/from data memory; control loop; 8-stage pipeline.]

Control Flow & Data Flow models
• Control Flow (von Neumann)
  – control flows through a sequence of instructions, branches can alter the flow
  – instructions get data from or put data in memory
  – explicit parallelism through control operators (fork/join)
• Data Flow
  – instructions are triggered by availability of data
  – data flows from instruction to instruction
  – explicit parallelism

Dataflow Model
[Figure: dataflow graph for R=(A-B)*(B+1). A "-" node takes A and B and produces A-B; a "+" node takes B and 1 and produces B+1; a "*" node takes both results and produces R.]
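
The firing rule for R=(A-B)*(B+1) can be sketched as follows: a node fires as soon as all of its operands are available, and nodes that become ready together fire in the same wave. The graph encoding, the name `run_dataflow`, and the input values are assumptions made for this illustration.

```python
import operator


def run_dataflow(graph, inputs):
    """Evaluate a dataflow graph.

    graph:  node name -> (operation, list of operand names)
    inputs: name -> value for the initially available data
    Returns the computed values and the list of firing waves.
    """
    values = dict(inputs)
    pending = dict(graph)
    waves = []
    while pending:
        # A node is enabled once every operand value is available.
        ready = [name for name, (_, srcs) in pending.items()
                 if all(s in values for s in srcs)]
        if not ready:
            raise ValueError("no node can fire: missing input data")
        for name in ready:
            op, srcs = pending.pop(name)
            values[name] = op(*(values[s] for s in srcs))
        waves.append(sorted(ready))
    return values, waves


GRAPH = {
    "A-B": (operator.sub, ["A", "B"]),
    "B+1": (operator.add, ["B", "one"]),
    "R": (operator.mul, ["A-B", "B+1"]),
}

values, waves = run_dataflow(GRAPH, {"A": 7, "B": 3, "one": 1})
print(values["R"])  # 16
print(waves)        # [['A-B', 'B+1'], ['R']]
```

The subtraction and the addition fire in the same wave because neither depends on the other; the multiplication waits for both results.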
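
Dataflow Program
[Figure: dataflow program for R=(A-B)*(B+1). Each instruction has a label and names the operand slots that receive its result:
L1: Compute B, result to L2/2 and L3/1
L2: -, operands A and B, result A-B to L4/1
L3: +, operands B and 1, result B+1 to L4/2
L4: *, result R to L6/1]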
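
A destination such as L4/1 reads as "operand slot 1 of instruction L4". A minimal sketch of executing such a program by delivering tokens to slots is given below. The dictionary layout, the function names, and the input values are assumptions; L1 ("Compute B") is modelled simply by injecting the value of B as tokens for L2/2 and L3/1, and L6 is treated as the program output because it is not part of the fragment.

```python
import operator
from collections import deque


def make_program():
    # Each instruction: operation, operand slots (None = not yet arrived),
    # and the destination slots that receive its result.
    return {
        "L2": {"op": operator.sub, "slots": [None, None], "dests": [("L4", 1)]},
        "L3": {"op": operator.add, "slots": [None, 1], "dests": [("L4", 2)]},
        "L4": {"op": operator.mul, "slots": [None, None], "dests": [("L6", 1)]},
    }


def run(program, initial_tokens):
    """Deliver tokens (label, slot, value) until none remain."""
    tokens = deque(initial_tokens)
    outputs = {}
    while tokens:
        label, slot, value = tokens.popleft()
        if label not in program:
            # Destination outside this fragment: record it as an output.
            outputs[(label, slot)] = value
            continue
        instr = program[label]
        instr["slots"][slot - 1] = value
        if all(v is not None for v in instr["slots"]):
            # All operands present: the instruction fires.
            result = instr["op"](*instr["slots"])
            for dest_label, dest_slot in instr["dests"]:
                tokens.append((dest_label, dest_slot, result))
    return outputs


A, B = 7, 3
outputs = run(make_program(), [("L2", 1, A), ("L2", 2, B), ("L3", 1, B)])
print(outputs)  # {('L6', 1): 16}
```

The constant 1 is preloaded in slot 2 of L3, so that instruction needs only the token carrying B before it can fire.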

Static Dataflow Architecture
[Figure: static dataflow processing element. Labels: function units FU1, FU2, ..., FUn; fetch unit; update unit; activity store; instruction queue; to/from other PEs.]

Tagged-token dataflow architecture
[Figure: tagged-token dataflow processing element. Labels: function units FU1, FU2, ..., FUn; fetch unit; form token unit; instruction/data memory; token queue; matching unit; matching store; to/from other PEs.]
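
The role of the matching unit and matching store can be sketched as follows. The token layout `(context, instr, port, value)` is an assumption for illustration, and only two-operand instructions are modelled: a token waits in the store until its partner with the same tag arrives, and then the pair is released as an enabled instruction.

```python
def match(tokens):
    """Pair up operand tokens that carry the same tag.

    tokens: iterable of (context, instr, port, value), with port 1 or 2.
    Returns the enabled instructions as (context, instr, operand1, operand2)
    and whatever is left waiting in the matching store.
    """
    store = {}    # matching store: (context, instr) -> (port, value)
    enabled = []
    for context, instr, port, value in tokens:
        key = (context, instr)
        if key in store:
            # Partner already waiting: remove it and enable the instruction.
            other_port, other_value = store.pop(key)
            operands = {port: value, other_port: other_value}
            enabled.append((context, instr, operands[1], operands[2]))
        else:
            # First operand to arrive: park it until the partner shows up.
            store[key] = (port, value)
    return enabled, store


# Tokens for two contexts (for example two loop iterations) arrive interleaved.
arrivals = [
    (0, "sub", 1, 7),
    (1, "sub", 1, 10),
    (1, "sub", 2, 4),
    (0, "sub", 2, 3),
]
enabled, waiting = match(arrivals)
print(enabled)  # [(1, 'sub', 10, 4), (0, 'sub', 7, 3)]
print(waiting)  # {}
```

Because the tag includes the context, several activations of the same instruction can be in flight at once without their operands being mixed up.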

UMA Examples
• Earlier approach: large number of processors (e.g. Denelcor HEP, NYU Ultracomputer)
• Now realized: good only for a small number of processors (e.g. Encore Multimax in the 1980s, SGI Power Challenge in the 1990s)

SGI Power Challenge
• 18 MIPS R8000 processors
• 16 GB RAM, 8-way interleaved
• 4 POWERchannel-2 I/O buses, each 320 MB/s
• POWERpath-2: split-transaction shared bus (256-bit data, 40-bit address)
• Snoopy cache coherence protocol
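
A little arithmetic illustrates the 8-way interleaving and the I/O figures. This is a sketch only: the 128-byte interleaving unit and the low-order bank selection are assumptions for illustration, not taken from the slide, and `bank_of` is a hypothetical helper.

```python
NUM_BANKS = 8       # 8-way interleaved memory (from the slide)
BLOCK_BYTES = 128   # assumed interleaving unit, not stated on the slide


def bank_of(address):
    """Bank that holds a byte address under low-order block interleaving."""
    return (address // BLOCK_BYTES) % NUM_BANKS


# Consecutive blocks fall in consecutive banks, so a sequential sweep
# keeps all eight banks busy.
banks = [bank_of(i * BLOCK_BYTES) for i in range(10)]
print(banks)  # [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]

# Aggregate peak I/O bandwidth of four 320 MB/s buses.
IO_BUSES = 4
MB_PER_S_EACH = 320
total_io = IO_BUSES * MB_PER_S_EACH
print(total_io)  # 1280 (MB/s)
```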

NUMA Examples
• BBN TC2000
• IBM RP3
• Hector
• Cray T3D

Hector
• Hierarchical structure
  – global ring
  – local rings
  – stations
    – Proc module (P+C+M)
    – I/O module

Hector
[Figure: Hector structure. A global ring links local rings, and each local ring links stations. Inset "Station": processor modules, an I/O module and a station controller on a station bus.]

Cray T3D
• Alpha 21064 processors, Cray Y-MP host
• up to 128 GB memory
• 4x4x4 3D torus, configurations up to 8x8x8
• 2 PEs in each node
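
The torus dimensions and the PE counts they imply can be worked out with a short sketch. The helper names are hypothetical; only the torus sizes and the 2 PEs per node come from the slide.

```python
def torus_neighbors(node, dims):
    """Neighbors of a node in a torus: one step each way along every axis,
    with wrap-around at the edges."""
    result = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size
            result.add(tuple(coords))
    return sorted(result)


def pe_count(dims, pes_per_node=2):
    """Number of processing elements for a given torus size."""
    nodes = 1
    for size in dims:
        nodes *= size
    return nodes * pes_per_node


print(torus_neighbors((0, 0, 0), (4, 4, 4)))
# [(0, 0, 1), (0, 0, 3), (0, 1, 0), (0, 3, 0), (1, 0, 0), (3, 0, 0)]
print(pe_count((4, 4, 4)))  # 128
print(pe_count((8, 8, 8)))  # 1024
```

Thanks to the wrap-around links, even a corner node such as (0, 0, 0) has six neighbors.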

CC-NUMA examples
Machine              Nodes                           Mem             Cache               Net
Wisconsin Multicube  single proc                     per col bus     snoopy              bus grid
Aquarius Multimulti  single proc                     per node        snoopy + directory  bus grid
Stanford Dash        cluster (4 R3000 + FPU on bus)  per cluster     snoopy + directory  pair of meshes
Stanford Flash       single proc (T5 + magic chip)   per node        directory           2D mesh
Convex Exemplar      hyper node (8 PA-RISC)          per hyper node  SCI                 X bar (hyper node), multi rings

Magic chip: memory + I/O + network controller

COMA examples
• DDM (Data Diffusion Machine)
  – single bus (split transaction)
  – can be made hierarchical
• KSR 1
  – hierarchical rings
  – distributed directory is a matrix: rows for pages, columns for caches
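
The "rows for pages, columns for caches" layout can be pictured as a presence matrix. The sketch below models only that layout; the class and method names are hypothetical, and nothing here reflects how the KSR 1 actually distributes or updates its directory.

```python
class DirectoryMatrix:
    """Presence matrix: one row per page, one column per cache."""

    def __init__(self, num_pages, num_caches):
        self.present = [[False] * num_caches for _ in range(num_pages)]

    def record(self, page, cache):
        """Note that `cache` now holds a copy of `page`."""
        self.present[page][cache] = True

    def evict(self, page, cache):
        """Note that `cache` no longer holds `page`."""
        self.present[page][cache] = False

    def holders(self, page):
        """Scan the page's row for the caches that hold it."""
        return [c for c, held in enumerate(self.present[page]) if held]


directory = DirectoryMatrix(num_pages=4, num_caches=3)
directory.record(page=2, cache=0)
directory.record(page=2, cache=2)
print(directory.holders(2))  # [0, 2]
directory.evict(page=2, cache=0)
print(directory.holders(2))  # [2]
```

In a COMA machine there is no fixed home memory for a page, so a lookup of this kind is what locates a copy.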
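
Distributed Memory Architecture Examples
Machine            Comp. proc  Comm. proc  Vec. proc  Switch       Topology
nCUBE2             custom      custom      -          -            hyper cube
iPSC2              i386        yes         yes        -            hyper cube
Intel Paragon      i860        i860        -          custom       2D mesh
Genesis            i870        i870        -          custom       2 level X bar
Manna              i860        i860        -          16x16 X bar  hierarch.
Parsytec           P.PC601     T805        -          C004         3D mesh
Transtech Paramid  i860        T805        -          C004         variable
IBM SP2            Power2      i860        -          custom       fat tree
Meiko CS-2         SPARC       custom      Fujitsu    custom       fat tree
Parsys SN9800      T900        T900        -          C104         hierarch sw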
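
For the hypercube entries, the topology itself is easy to sketch: node numbers are bit strings, and two nodes are linked when they differ in exactly one bit. The routing function below uses generic dimension-order routing as an illustration; it is not claimed to be the routing scheme of any machine in the table, and the function names are hypothetical.

```python
def neighbors(node, dim):
    """Nodes directly linked to `node` in a hypercube of dimension `dim`."""
    return [node ^ (1 << i) for i in range(dim)]


def route(src, dst, dim):
    """Dimension-order route: fix differing bits from lowest to highest."""
    path = [src]
    current = src
    for i in range(dim):
        if (current ^ dst) & (1 << i):
            current ^= 1 << i
            path.append(current)
    return path


print(neighbors(0b000, 3))     # [1, 2, 4]
print(route(0b000, 0b101, 3))  # [0, 1, 5]
```

The number of hops equals the number of bits in which source and destination differ, so no route in a hypercube of dimension d is longer than d hops.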
proc proc procnCUBE2 custom custom hyper cubeiPSC2 i386 yes yes hyper cubeIntel i860 i860 custom 2D mesh ParagonGenesis i870 i870 custom 2 level X barManna i860 i860 16x16 X bar hierarch.Parsytec P.PC601 T805 C004 3D meshTranstech i860 T805 C004 variable ParamidIBM SP2 Power2 i860 custom fat treeMeiko SPARC custom Fujitsu custom fat tree C32Parsys T900 T900 C104 hierarch sw SN9800