MIMD Shared Memory

Multiprocessors

MIMD -- Shared Memory

Each processor has a full CPU
Each processor runs its own code
– can be the same program as other processors, or different
All processors access the same memory
– Same address space for all processors
– UMA: Uniform Memory Access
» all memory is accessible in the same time for every processor
– NUMA: Non-Uniform Memory Access
» memory is localized
» each processor can access some memory faster than other memory
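The UMA/NUMA distinction can be made concrete with a weighted average of access times. This is only a sketch: the function name and the latency figures (100 and 300 time units) are hypothetical and are not taken from any machine in these slides.

```python
def average_access_time(local_fraction, t_local, t_remote):
    """Mean access time when local_fraction of accesses hit nearby memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


# UMA is the special case in which every access costs the same.
uma = average_access_time(0.5, 100.0, 100.0)

# NUMA with made-up latencies: data placement now changes the average.
numa_good_placement = average_access_time(0.9, 100.0, 300.0)
numa_poor_placement = average_access_time(0.1, 100.0, 300.0)

print(uma, numa_good_placement, numa_poor_placement)
```

Under these assumed numbers, keeping most accesses local brings the NUMA average close to the local latency, which is why data placement matters on such machines.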

MIMD - SM - UMA

[Figure: processors and memory modules joined by a connection.]

Options for Connection -- UMA

Bus
– Sequential: can be used for one message at a time
Switching Network
– Can send many messages at once
» how many depends on the connection scheme
– Crossbar
» Maximal connections
» expensive
– Omega (closely related to the Butterfly and Banyan networks)
» several permutations of processor-to-memory connections possible

Bus

Needs smart local cache schemes to reduce bus traffic
Works for a small number of processors
– Depending on the technology, 20-50 processors overload the bus and performance degrades
Common on 4- and 8-processor SMP servers
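Why a bus stops scaling can be sketched with a very rough saturation model. The function `bus_speedup` and the parameter `bus_fraction` (the share of time each processor needs the bus after its cache has filtered most accesses) are assumptions made for illustration. The model ignores queueing delays, so real degradation starts earlier than it predicts.

```python
def bus_speedup(num_processors, bus_fraction):
    """Rough speedup when each processor needs the bus bus_fraction of the time."""
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    if not 0.0 < bus_fraction <= 1.0:
        raise ValueError("bus_fraction must be in (0, 1]")
    demand = num_processors * bus_fraction
    if demand <= 1.0:
        # The bus can serve everyone: speedup grows with the processor count.
        return float(num_processors)
    # The bus is saturated: extra processors only wait for it.
    return 1.0 / bus_fraction


# Hypothetical workload: each processor needs the bus 4% of the time.
for n in (4, 8, 16, 32, 64):
    print(n, bus_speedup(n, 0.04))
```

In this model a better cache scheme lowers `bus_fraction`, which raises the processor count at which the bus saturates.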

[Figure: Bus. Processors, each with a cache, share one bus to memory.]

Crossbar switch

Every permutation of processor-to-memory connections can work
Expensive: N*M switches, where
– N = number of processors
– M = number of memory modules

[Figure: Crossbar switch, showing processors, memory, and switches.]

Omega Network

Every processor connects to every memory
Many, but not all, permutations are possible
An extra stage adds redundancy and more permutations
Number of switches = (N/2) log2 N, using 2x2 switches
» for N processors and N memory modules
Number of stages = log2 N (determines latency)
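The cost formulas for the crossbar and the Omega network can be compared directly. The sketch below assumes N is a power of two and that the Omega network is built from 2x2 switches; the function names are illustrative only.

```python
def crossbar_switches(n_processors, n_memories):
    """A crossbar needs one switch per processor/memory pair."""
    return n_processors * n_memories


def omega_stages(n):
    """Number of stages, log2(n), for n a power of two."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    return n.bit_length() - 1


def omega_switches(n):
    """Each stage holds n/2 two-by-two switches."""
    return (n // 2) * omega_stages(n)


for n in (8, 64, 1024):
    print(n, crossbar_switches(n, n), omega_switches(n), omega_stages(n))
# n = 8 gives 64 crossbar switches, 12 Omega switches, 3 stages
```

The crossbar count grows with N squared while the Omega count grows with N log N. The price is that a message crosses log2 N switches instead of one, and not every permutation can be routed at once.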

[Figure: Omega Network connecting processors to memory.]

Omega Network

[Figure: Omega network with eight inputs and eight outputs, lines labeled 000 through 111, routing a message to Destination = 101.]
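Routing a message to destination 101 can be sketched with destination-tag routing, the usual scheme for an Omega network. The code assumes a perfect shuffle before each stage and 2x2 switches that read one destination bit per stage, most significant bit first. The function `route` and the source used in the example are illustrative and are not read off the slide's figure.

```python
def route(source, destination, n_bits=3):
    """Return the line number a message occupies after each stage."""
    size = 1 << n_bits
    if not (0 <= source < size and 0 <= destination < size):
        raise ValueError("source and destination must fit in n_bits")
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the line address left by one bit.
        position = ((position << 1) | (position >> (n_bits - 1))) & (size - 1)
        # The switch reads one destination bit, most significant first:
        # 0 selects the upper output, 1 selects the lower output.
        bit = (destination >> (n_bits - 1 - stage)) & 1
        position = (position & ~1) | bit
        path.append(position)
    return path


# Hypothetical source 010 sending to destination 101.
path = route(0b010, 0b101)
print([format(p, "03b") for p in path])
```

Whatever the source, the last entry of the path equals the destination, because each stage shifts one more destination bit into the line address.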

Omega Network -- A Permutation

[Figure: Omega network with eight inputs and eight outputs, lines labeled 000 through 111, carrying a permutation. Destination = 101.]
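Whether a whole permutation passes through the network at once can be checked by routing every message with destination tags and looking for two messages that need the same switch output. This is a sketch under the same assumptions as before (perfect shuffle before each stage, 2x2 switches); `is_routable` is a hypothetical helper name.

```python
def is_routable(permutation):
    """True if source i can reach permutation[i] for all i simultaneously."""
    size = len(permutation)
    n_bits = size.bit_length() - 1
    if size < 2 or size != 1 << n_bits:
        raise ValueError("length must be a power of two, at least 2")
    if sorted(permutation) != list(range(size)):
        raise ValueError("input must be a permutation of 0..N-1")
    positions = list(range(size))
    for stage in range(n_bits):
        next_positions = []
        for source, position in enumerate(positions):
            # Perfect shuffle, then pick the output named by the destination bit.
            shuffled = ((position << 1) | (position >> (n_bits - 1))) & (size - 1)
            bit = (permutation[source] >> (n_bits - 1 - stage)) & 1
            next_positions.append((shuffled & ~1) | bit)
        # Two messages on the same line means they wanted the same switch output.
        if len(set(next_positions)) != size:
            return False
        positions = next_positions
    return True


identity = list(range(8))
bit_reversal = [int(format(i, "03b")[::-1], 2) for i in range(8)]
print(is_routable(identity), is_routable(bit_reversal))
```

For eight lines the network has 12 two-state switches, so it can realize at most 2^12 = 4096 of the 8! = 40320 permutations. The identity passes, while bit reversal blocks in the first stage.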

Omega Network with combining

Smart switches
– combine two requests with the same destination
– make memory accesses equivalent to a serial sequence
– split return values appropriately
Time trade-off: combining adds work in each switch
Used in the NYU Ultracomputer
– also in the IBM RP3 experimental machine
Example: Fetch and Increment
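Combining can be sketched for fetch-and-add, of which fetch-and-increment is the case where every amount is 1. In this simplified model each level of switches merges pairs of requests to the same address and remembers the first amount of each pair, and memory sees a single request whose reply is split on the way back. The class and function names are hypothetical, and the model does not describe the actual Ultracomputer or RP3 hardware.

```python
class Memory:
    """Shared memory that counts how many requests reach it."""

    def __init__(self):
        self.cells = {}
        self.access_count = 0

    def fetch_and_add(self, address, amount):
        self.access_count += 1
        old = self.cells.get(address, 0)
        self.cells[address] = old + amount
        return old


def combined_fetch_and_add(memory, address, amounts):
    """Serve a power-of-two number of requests to one address by combining."""
    count = len(amounts)
    if count < 1 or count & (count - 1):
        raise ValueError("number of requests must be a power of two")
    # Forward path: each switch adds the two amounts it receives.
    levels = [list(amounts)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        levels.append([current[i] + current[i + 1]
                       for i in range(0, len(current), 2)])
    # Memory performs one fetch-and-add for the whole group.
    replies = [memory.fetch_and_add(address, levels[-1][0])]
    # Return path: the first requester gets the old value, the second
    # gets the old value plus the first requester's amount.
    for current in reversed(levels[:-1]):
        split = []
        for i, old in enumerate(replies):
            split.append(old)
            split.append(old + current[2 * i])
        replies = split
    return replies


memory = Memory()
replies = combined_fetch_and_add(memory, 0b101, [1, 1, 1, 1])
print(replies, memory.cells[0b101], memory.access_count)
```

Four processors incrementing the same cell receive 0, 1, 2, and 3, exactly what some serial order of the four operations would give, while memory handles only one request.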

Omega Network

[Figure: Omega network with eight inputs and eight outputs, lines labeled 000 through 111, illustrating the example. Destination = 101.]

Options for Connection -- NUMA

Each processor has a segment of memory closer than the others
– Could be several different levels of access
All processors still use the same address space
Omega network with wrap-around
– BBN Butterfly
Hierarchy of rings (or other switches)
– Kendall Square Research KSR-1
– SGI Origin series

Hierarchical Rings

[Figure: a ring with compute nodes and directory nodes, with a link to a higher-level ring.]

Issues for MIMD Shared Memory

Memory access
– Can reads be simultaneous?
– How to control multiple writes?
Synchronization mechanism needed
– semaphores
– monitors
Local caches need to be coordinated
– cache coherency protocols
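The need to control multiple writes can be illustrated with a binary semaphore guarding a shared counter. This is a minimal sketch using Python threads as stand-ins for processors; the function `run_workers` and its parameters are made up for the example.

```python
import threading


def run_workers(num_workers=4, increments=10_000):
    """Have several threads increment one shared counter under a semaphore."""
    counter = {"value": 0}
    mutex = threading.Semaphore(1)  # binary semaphore: one holder at a time

    def worker():
        for _ in range(increments):
            # Acquire on entry (P), release on exit (V), so the
            # read-modify-write below is never interleaved with another.
            with mutex:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


total = run_workers(4, 1000)
print(total)
```

With the semaphore in place the final count always equals the number of workers times the number of increments. Without it, concurrent read-modify-write sequences on shared memory can lose updates.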
