Distributed Shared Memory

CSS434 Distributed Shared Memory. Textbook Ch18. Professor: Munehiro Fukuda

Basic Concepts of Distributed Shared Memory

    CSS434 DSM

  • Basic Concept

    [Figure: computers connected by a communication network share a distributed shared memory, which exists only virtually.]

    Access primitives:
        data = read(address);
        write(address, data);

    A cache line or a page containing the address is transferred to and cached in the requesting computer.

  • Writer Process on DSM

    Program Writer:

        #include "world.h"

        struct shared { int a, b; };

        int main(void)
        {
            struct shared *p;

            methersetup();                     /* initialize the Mether run-time */
            p = (struct shared *)METHERBASE;   /* overlay structure on the Mether segment */
            p->a = p->b = 0;                   /* initialize fields to zero */
            while (TRUE) {                     /* continuously update structure fields */
                p->a = p->a + 1;
                p->b = p->b - 1;
            }
        }

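  • Reader Process on DSM

    Program Reader:

        #include <stdio.h>    /* printf */
        #include <unistd.h>   /* sleep */
        #include "world.h"

        struct shared { int a, b; };

        int main(void)
        {
            struct shared *p;

            methersetup();                     /* initialize the Mether run-time */
            p = (struct shared *)METHERBASE;   /* overlay structure on the Mether segment */
            while (TRUE) {                     /* read the fields once every second */
                printf("a = %d, b = %d\n", p->a, p->b);
                sleep(1);
            }
        }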

  • Why DSM?

    - Simpler abstraction: the underlying tedious communication primitives are all shielded by memory accesses.
    - Better portability of distributed application programs: natural transition from sequential to distributed applications.
    - Better performance of some applications: data locality, on-demand data movement, and large memory space reduce network traffic and paging/swapping activities.
    - Flexible communication environment: sender and receiver have no need to know each other. They need not even coexist.
    - Ease of process migration: migration is completed only by transferring the corresponding PCB to the destination.

  • Main Issues

    - Granularity, from fine (less false sharing but more network traffic) to coarse (more false sharing but less network traffic): cache line (e.g. Dash and Alewife), object (e.g. Orca and Linda), page (e.g. Ivy).
    - Memory coherence and access synchronization: Strict, Sequential, Causal, Weak, and Release Consistency models.
    - Data location and access: broadcasting, centralized data locator, fixed distributed data locator, and dynamic distributed data locator.
    - Replacement strategy: LRU or FIFO (the same issue as OS virtual memory).
    - Thrashing: how to prevent a block from being exchanged back and forth between two nodes.
    - Heterogeneity
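    The granularity trade-off can be illustrated with a small sketch. It assumes the shared segment is split into fixed-size blocks; the function names and the block sizes used below are hypothetical and are not taken from any of the systems named on the slide. Two unrelated variables that land in the same block are transferred and invalidated together, which is false sharing.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the DSM block that holds a given byte offset of the shared
 * segment. block_size is the sharing granularity (an assumed parameter). */
static size_t dsm_block_of(size_t offset, size_t block_size)
{
    assert(block_size > 0);
    return offset / block_size;
}

/* Nonzero when two offsets live in the same block, i.e. when unrelated
 * variables at those offsets would move between nodes together. */
static int dsm_same_block(size_t off_a, size_t off_b, size_t block_size)
{
    return dsm_block_of(off_a, block_size) == dsm_block_of(off_b, block_size);
}
```

    For example, offsets 0 and 100 share a block when the granularity is a 4096-byte page (dsm_same_block(0, 100, 4096) is nonzero) but not when it is a 64-byte cache line (dsm_same_block(0, 100, 64) is zero).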

  • Consistency Models

    Two processes access shared variables. At the beginning, a = b = 0.
    DSM needs a consistency model that defines which values their reads may return.

  • Consistency Models: Strict Consistency

    Notation:
        Wi(x, a): Processor i writes a on variable x (i.e., x = a;).
        bRi(x): Processor i reads b from variable x (i.e., y = x; and then y == b).

    Any read on x must return the value of the most recent write on x.

    [Figure: timelines of P1, P2, and P3 in two panels, Strict Consistency and Not Strict Consistency. Events in transcript order: W2(x, a), aR1(x), aR3(x); W2(x, a), nilR1(x), aR3(x), aR1(x), aR1(x).]

  • Consistency Models: Linearizability and Sequential Consistency

    Linearizability: operations of each individual process appear to all processes in the same order as they actually happen (real-time order).
    Sequential Consistency: operations of each individual process appear in the same order to all processes, but that order need not match real time.

    [Figure: timelines of P1 to P4 in two panels, Linearizability and Sequential Consistency. Events in transcript order: W2(x, a), aR1(x), bR1(x), aR4(x), W3(x, b), bR4(x); W2(x, a), bR1(x), aR1(x), bR4(x), W3(x, b), aR4(x), Nil.]

  • Consistency Models: FIFO and Processor Consistency

    FIFO Consistency: writes by a single process are visible to all other processes in the order in which they were issued.
    Processor Consistency: FIFO Consistency, plus all writes to the same memory location must be visible in the same order.

    [Figure: timelines of P1 to P4 in two panels, FIFO Consistency and Processor Consistency, with writes by P2 and P3 to x, y, and z and the reads that observe them. The order of events is not recoverable from the transcript.]

  • Consistency Models: Causal Consistency

    Causally related writes must be visible to all processes in the same order. Concurrent writes may be propagated in a different order.

    [Figure: timelines of P1 to P4 in two panels, Causal Consistency and Not Causal Consistency. Events in transcript order: bR4(x), cR1(x); W2(x, a), aR3(x), W3(x, b), bR1(x), cR4(x), W2(x, c), aR4(x), aR3(x), aR1(x), W2(x, a), aR3(x), W3(x, b), bR1(x), bR4(x), aR4(x).]

  • Consistency Models: Weak Consistency

    - Accesses to synchronization variables must obey sequential consistency.
    - All previous writes must be completed before an access to a synchronization variable.
    - All previous accesses to synchronization variables must be completed before an access to a non-synchronization variable.

    [Figure: timelines of P1, P2, and P3 in two panels, Weak Consistency and Not Weak Consistency. Events in transcript order: W2(x, a), W2(x, b), W2(y, c), S2, S1, S3, bR4(x), cR4(y), cR4(y), bR4(x); W2(x, a), W2(x, b), W2(y, c), S2, S1, S3, aR4(x), cR4(y), bR4(x), cR4(y), aR4(x), NilR4(y), bR4(x).]

  • Consistency Models: Release Consistency

    - Accesses to acquire and release variables obey processor consistency.
    - Previous acquires requested by a process must be completed before the process performs a data access.
    - All previous data accesses performed by a process must be completed before the process performs a release.

    [Figure: timelines of P1, P2, and P3. Events in transcript order: aR3(x), Acq1(L), W1(x, a), W1(x, b), Rel1(L), Acq2(L), Rel2(L), bR2(x), bR2(x).]

  • Consistency Models: Release Consistency (Example)

    Process 1:
        acquireLock();      // enter critical section
        a := a + 1;
        b := b + 1;
        releaseLock();      // leave critical section

    Process 2:
        acquireLock();      // enter critical section
        print("The values of a and b are: ", a, b);
        releaseLock();      // leave critical section
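    The effect of the acquire and release rules on the example above can be sketched as a single-process simulation. This is only an illustration under stated assumptions: rc_home stands in for the lock-protected home copy, each rc_node is one process's local cache, and every name here is hypothetical. Writes stay local until release pushes them out, and acquire pulls the latest values before any data access.

```c
#include <assert.h>
#include <string.h>

#define RC_VARS 2                 /* index 0 = a, index 1 = b */

static int rc_home[RC_VARS];      /* assumed home copy guarded by the lock */

struct rc_node {
    int local[RC_VARS];           /* this node's cached copy */
    int dirty[RC_VARS];           /* 1 if local[i] was written since the last release */
};

/* Acquire: complete before any data access, so refresh the local copy. */
static void rc_acquire(struct rc_node *n)
{
    memcpy(n->local, rc_home, sizeof rc_home);
    memset(n->dirty, 0, sizeof n->dirty);
}

/* Data write inside the critical section: local only, not yet visible. */
static void rc_write(struct rc_node *n, int var, int value)
{
    assert(var >= 0 && var < RC_VARS);
    n->local[var] = value;
    n->dirty[var] = 1;
}

static int rc_read(const struct rc_node *n, int var)
{
    assert(var >= 0 && var < RC_VARS);
    return n->local[var];
}

/* Release: all previous data accesses must complete, so flush dirty data. */
static void rc_release(struct rc_node *n)
{
    for (int i = 0; i < RC_VARS; i++) {
        if (n->dirty[i]) {
            rc_home[i] = n->local[i];
            n->dirty[i] = 0;
        }
    }
}
```

    With two nodes p1 and p2, p1's increments of a and b are invisible to p2 until p1 calls rc_release and p2 then calls rc_acquire, after which p2 reads both new values together.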

  • Implementing Sequential Consistency: Replicated and Migrating Data Blocks

    [Figure: Node 1 and Node 3 hold a block containing x.]

    Then what if Node 2 updates x?

  • Implementing Sequential Consistency: Write Invalidation

    [Figure: a block and its copies on other nodes. A client that wants to write receives a new copy of the block, and the other copies are invalidated.]

  • Implementing Sequential Consistency: Write Update

    [Figure: a block and its copies on other nodes. When a client wants to write, the update is propagated to the block and to every copy.]
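    The difference between write invalidation and write update can be sketched on a replicated block. The sketch assumes three nodes and a block reduced to a single int; the struct and function names are hypothetical. Under write invalidation only the writer keeps a valid copy, while under write update every valid copy receives the new value.

```c
#include <assert.h>

#define WR_NODES 3                /* assumed number of nodes */

struct wr_block {
    int value[WR_NODES];          /* each node's copy of the block */
    int valid[WR_NODES];          /* 1 if that node's copy may be read */
};

/* Write invalidation: invalidate every other copy, then write locally. */
static void wr_write_invalidate(struct wr_block *b, int writer, int v)
{
    assert(writer >= 0 && writer < WR_NODES);
    for (int i = 0; i < WR_NODES; i++)
        b->valid[i] = (i == writer);
    b->value[writer] = v;
}

/* Write update: propagate the new value to every node holding a valid copy. */
static void wr_write_update(struct wr_block *b, int writer, int v)
{
    assert(writer >= 0 && writer < WR_NODES);
    b->valid[writer] = 1;
    for (int i = 0; i < WR_NODES; i++)
        if (b->valid[i])
            b->value[i] = v;
}
```

    Invalidation costs one message per copy only on the first write, but later readers must fetch the block again. Update keeps readers current at the price of a message per copy on every write.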

  • Implementing Sequential Consistency: Read/Write Request

    [Figure: state-transition diagram of a block.]

    States: Unused, Nil, Read only, Read-owned, Writable.

    Transition labels:
        - Read (read a copy from the owner)
        - Read (read from memory and get an ownership)
        - Write (invalidate others if they have a copy and get an ownership), on two arrows
        - Write (invalidate others if they have a copy)
        - Write invalidate, on three arrows
        - Replacement, on four arrows
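    The block states can be sketched as a small state machine. The transcript keeps the transition labels but not the arrow endpoints, so the endpoints below are an assumption chosen to match the label counts: a read moves Unused to Read-owned and Nil to Read only, a write moves Nil, Read only, or Read-owned to Writable, a received write invalidation moves Read only, Read-owned, or Writable to Nil, and replacement moves any other state to Unused. All identifiers are hypothetical.

```c
#include <assert.h>

enum blk_state { BLK_UNUSED, BLK_NIL, BLK_READ_ONLY, BLK_READ_OWNED, BLK_WRITABLE };
enum blk_event { EV_READ, EV_WRITE, EV_WRITE_INVALIDATE, EV_REPLACEMENT };

/* Next state of one node's copy of a block (assumed arrow endpoints). */
static enum blk_state blk_next(enum blk_state s, enum blk_event e)
{
    switch (e) {
    case EV_READ:
        if (s == BLK_UNUSED) return BLK_READ_OWNED;  /* read from memory, get ownership */
        if (s == BLK_NIL)    return BLK_READ_ONLY;   /* read a copy from the owner */
        return s;                                    /* already readable */
    case EV_WRITE:
        /* Invalidate other copies; Nil and Read only also obtain ownership. */
        if (s == BLK_NIL || s == BLK_READ_ONLY || s == BLK_READ_OWNED)
            return BLK_WRITABLE;
        return s;  /* Writable stays; no write arrow out of Unused is assumed */
    case EV_WRITE_INVALIDATE:
        /* Another node wrote the block: the local copy becomes invalid. */
        if (s == BLK_READ_ONLY || s == BLK_READ_OWNED || s == BLK_WRITABLE)
            return BLK_NIL;
        return s;
    case EV_REPLACEMENT:
        return BLK_UNUSED;  /* the frame is freed (no change if already Unused) */
    }
    assert(0 && "unknown event");
    return s;
}
```

    For instance, a node that holds a Read only copy and then writes ends up Writable, and a later invalidation from another writer takes it to Nil.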

  • Implementing Sequential Consistency: Locating Data, Fixed Distributed-Server Algorithms

    [Figure: Processors 0, 1, and 2 each hold blocks and a part of the ownership table. Block labels in transcript order: Addr0 writable, Addr1 read owned, Addr5 writable, Addr3 read owned, Addr7 writable, Addr2 read owned, Addr6 writable, Addr8 read owned, Addr4 read owned. A "Read addr2" request creates an "Addr2 read only" copy.]

    Address  Owner
    0        P0
    1        P0
    2        P2

    Address  Owner
    3        P1
    4        P2
    5        P0

    Address  Owner
    6        P2
    7        P1
    8        P2
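    A fixed distributed-server lookup can be sketched as follows. The owner values come from the three Address/Owner tables on this slide; the mapping of addresses 0 to 2, 3 to 5, and 6 to 8 onto Processors 0, 1, and 2 as managers is an assumption, as are all identifiers. The point of the scheme is that the manager of an address never changes, so any node can compute where to send its query.

```c
#include <assert.h>

#define FX_PROCS 3                /* processors 0..2 */
#define FX_ADDRS_PER_PROC 3       /* assumed: each manager serves 3 addresses */

/* fx_owner[m][i]: owner of address m * FX_ADDRS_PER_PROC + i, kept by manager m. */
static int fx_owner[FX_PROCS][FX_ADDRS_PER_PROC] = {
    {0, 0, 2},                    /* addresses 0, 1, 2 */
    {1, 2, 0},                    /* addresses 3, 4, 5 */
    {2, 1, 2},                    /* addresses 6, 7, 8 */
};

/* The manager is a fixed function of the address. */
static int fx_manager_of(int addr)
{
    assert(addr >= 0 && addr < FX_PROCS * FX_ADDRS_PER_PROC);
    return addr / FX_ADDRS_PER_PROC;
}

/* Step 1 of a fault: ask the manager which processor currently owns addr. */
static int fx_owner_of(int addr)
{
    return fx_owner[fx_manager_of(addr)][addr % FX_ADDRS_PER_PROC];
}

/* After a write fault, the manager records the new owner. */
static void fx_set_owner(int addr, int new_owner)
{
    assert(new_owner >= 0 && new_owner < FX_PROCS);
    fx_owner[fx_manager_of(addr)][addr % FX_ADDRS_PER_PROC] = new_owner;
}
```

    Under these assumptions, the "Read addr2" request in the figure first goes to Processor 0, the manager of address 2, which reports Processor 2 as the owner.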

  • Implementing Sequential Consistency: Locating Data, Dynamic Distributed-Server Algorithms

    [Figure: Processors 0, 1, and 2 each hold blocks and a probable-owner table. Block labels in transcript order: Addr0 writable, Addr1 read owned, Addr5 writable, Addr3 read owned, Addr7 writable, Addr2 read owned, Addr8 read owned, Addr4 read owned. A "Read addr2" request follows the probable owners to "Addr2 read owned" and creates an "Addr2 read only" copy.]

    Breaking the chain of nodes: the node points to a new owner
        - when the node receives an invalidation,
        - when the node relinquishes ownership,
        - when the node forwards a fault request.

    Address  Probable
    0        P0
    1        P0
    2        P2

    Address  Probable
    2        P2
    7        P1
    8        P2

    Address  Probable
    3        P1
    4        P2
    5        P0
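    The probable-owner chain for a single block can be sketched as below. The array layout and function names are hypothetical, the true owner is assumed to point to itself, and only a write fault is modeled. Each node that forwards the fault request repoints its hint at the requester, which shortens the chain for later requests.

```c
#include <assert.h>

/* Follow probable-owner hints from `start` until the true owner is reached.
 * prob_owner[p] is the node that p believes owns the block. */
static int dyn_find_owner(const int prob_owner[], int start, int *hops)
{
    int cur = start;
    assert(hops != 0);
    *hops = 0;
    while (prob_owner[cur] != cur) {
        cur = prob_owner[cur];
        (*hops)++;
    }
    return cur;
}

/* Write fault by `requester`: forward along the chain, make every node on
 * the way point to the requester, and transfer ownership. Returns the old owner. */
static int dyn_write_fault(int prob_owner[], int requester)
{
    int cur = requester;
    assert(requester >= 0);
    while (prob_owner[cur] != cur) {
        int next = prob_owner[cur];
        prob_owner[cur] = requester;   /* forwarding node points to the new owner */
        cur = next;
    }
    prob_owner[cur] = requester;       /* old owner relinquishes ownership */
    prob_owner[requester] = requester; /* requester is now the owner */
    return cur;
}
```

    Starting from hints {1, 2, 2}, a write fault on node 0 is forwarded through node 1 to owner 2, and afterwards every hint is 0, so node 1 finds the new owner in a single hop.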

  • Replacement Strategy

    Which block to replace:
        - Non-usage based (e.g. FIFO)
        - Usage based (e.g. LRU)
        - A mix of those (e.g. Ivy):
            Unused/Nil: replaced with the highest priority
            Read-only: the second priority
            Read-owned: the third priority
            Writable: the lowest priority, and LRU is used

    Where to place a replaced block:
        - Invalidating a block if other nodes have a copy
        - Using secondary store
        - Using the memory space of other nodes
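    The mixed priority scheme can be sketched as a victim-selection function. The slide states that LRU is used for Writable blocks; applying least-recently-used as the tie-breaker inside every class is a simplifying assumption of this sketch, and the struct, the logical timestamp, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller value = replaced first, following the priority list above. */
enum rep_state {
    REP_UNUSED_OR_NIL = 0,
    REP_READ_ONLY     = 1,
    REP_READ_OWNED    = 2,
    REP_WRITABLE      = 3
};

struct rep_block {
    enum rep_state state;
    unsigned long last_used;   /* assumed logical time of the latest access */
};

/* Index of the block to evict, or -1 if there are no blocks.
 * Lowest state rank wins; among equals, the least recently used wins. */
static int rep_choose_victim(const struct rep_block *blocks, size_t n)
{
    int victim = -1;
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (victim < 0 ||
            blocks[i].state < blocks[victim].state ||
            (blocks[i].state == blocks[victim].state &&
             blocks[i].last_used < blocks[victim].last_used)) {
            victim = (int)i;
        }
    }
    return victim;
}
```

    A Read-only block is therefore evicted before any Read-owned or Writable block, no matter how recently it was used.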

  • Thrashing

    Thrashing occurs when:
        - Two or more processes try to write the same shared block.
        - An owner keeps writing its block, which is shared by two or more reader processes.
        - The larger a block, the more chances of false sharing, which causes thrashing.

    Solutions:
        - Allow a process to prevent a block from being accessed by the others, using a lock.
        - Allow a process to hold a block for a certain amount of time.
        - Apply a different coherence algorithm to each block.

    What do those solutions require users to do?
    Are there any perfect solutions?
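    The first two solutions can be sketched as a check that a node runs before giving up a block. This is only an illustration: the struct fields, the logical clock, and the function name are hypothetical, and a real system would queue or retry the denied request.

```c
#include <assert.h>

struct thr_block {
    int locked;                 /* 1 while the holder has locked the block */
    unsigned long acquired_at;  /* assumed logical time the holder obtained the block */
    unsigned long min_hold;     /* minimum time the holder may keep the block */
};

/* Nonzero if the block may be handed to another node at time `now`. */
static int thr_may_transfer(const struct thr_block *b, unsigned long now)
{
    assert(now >= b->acquired_at);
    if (b->locked)
        return 0;                              /* solution 1: explicit lock */
    return now - b->acquired_at >= b->min_hold; /* solution 2: minimum hold time */
}
```

    Both mechanisms trade latency for stability: a competing writer waits longer, but the block stops bouncing between nodes on every access.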

  • Paper Review by Students

    Systems:
        - IVY
        - Dash
        - Munin
        - Linda/Jini/JavaSpace

    Discussions:
        - Classify which system is based on sequential consistency, release consistency, and lazy release consistency.
        - Classify the shared data granularity of these systems: cache-line based, page-based, and object-based.
        - Classify the implementation of these systems: hardware implementation, OS implementation, and user-level implementation.

  • Non-Turn-In Exercises

    - Is the memory underlying the following execution of two processes sequentially consistent (assuming that, initially, all variables are set to zero)?
          P1: R(x)1; R(x)2; W(y)1
          P2: W(x)1; R(y)1; W(x)2
    - Show that the following history is not causally consistent.
          P1: W(a)0; W(a)1
          P2: R(a)1; W(b)2
          P3: R(b)2; R(a)0
    - Explain the relationship between false sharing and data granularity in DSM.

  • Non-Turn-In Exercises

    There is a DSM system that is based on the write-invalidation protocol, uses a fixed distributed-server algorithm for locating a given data item, and consists of three processors: 1, 2, and 3. Each processor has the following data items and an ownership/sharing-processor table.

    Processor 1
        ownership table:   addr     0    1    2
                           owner    P0   P0   P3
                           shared   P3 (column not recoverable from the transcript)
        data items:        addr0, addr1

    Processor 2
        ownership table:   addr     3    4    5
                           owner    P2   P3   P0
                           shared
        data items:        addr3, addr7, addr8

    Processor 3
        ownership table:   addr     6    7    8
                           owner    P3   P2   P2
                           shared
        data items:        addr2, addr4, addr6

    [Figure labels whose positions are not recoverable from the transcript: "event", "copy", "addr14".]

  • Non-Turn-In Exercises

    Given the following sequence of memory accesses, draw additional arrows and circles in the above figure as instructed. To distinguish which arrow corresponds to which operation, add the operation number (1 to 8) to each arrow. Also, update the corresponding ownership table entries.

    (1) Memory access #1: Processor 2 reads data from address 2.
        Add arrows in the above figure to indicate the operations required for memory access #1:
            1. Send a query to search for address 2.
            2. Send a request to read from address 2.
            3. Read data from address 2 to Processor 2.
        Update the corresponding ownership table entry. (Just add P2 in the shared field.)
        Draw a circle to indicate that a copy of address 2 was created on Processor 2.

    (2) Memory access #2: Processor 1 reads data from address 2.
        Add arrows in the above figure to indicate the operations required for memory access #2:
            4. Send a query to search for address 2.
            5. Send a request to read from address 2.
            6. Read data from address 2 to Processor 1.
        Update the corresponding ownership table entry. (Just add P1 in the shared field.)
        Draw a circle to indicate that a copy of address 2 was created on Processor 1.

    (3) Memory access #3: Processor 2 writes data to address 2.
        Add arrows in the above figure to indicate the operations required for memory access #3:
            7. Send a request to update the ownership information on address 2.
            8. Send a write invalidation to all non-owner processors sharing address 2.
        Update the corresponding ownership table entry. (Make Processor 2 the new owner of address 2 and cross out all other processor IDs in the entry.)
        Cross out all circles to indicate that the old copies of address 2 were all invalidated.