Data/Thread Level Speculation (TLS) in the Stanford...


Source: meseec.ce.rit.edu/eecc722-fall2005/722-10-24-2005.pdf (2005/10/24)

EECC722 - Shaaban, lec # 10, Fall 2005, 10-24-2005

Data/Thread Level Speculation (TLS) in the Stanford Hydra Chip Multiprocessor (CMP)

A 4-core Chip Multiprocessor (CMP) based microarchitecture/compiler effort at Stanford that provides hardware/software support for Data/Thread Level Speculation (TLS) to extract parallel speculated threads from sequential code (single thread) augmented with software thread speculation handlers

(Primary papers: 4, 6)


Motivation for Chip Multiprocessors (CMPs)

• A CMP offers implementation benefits:
  – High-speed signals are localized in individual CPUs.
  – A proven CPU design is replicated across the die (including SMT processors, e.g. IBM Power 5).
• Overcomes the diminishing performance/transistor return problem in uniprocessors (similar motivation for SMT):
  – Transistors are used today mostly for ILP extraction.
  – MPs use transistors to run multiple threads (exploit thread level parallelism, TLP):
    • On parallelized programs
    • With multiprogrammed workloads
      – A number of single-threaded applications executing on different CPUs
• Fast inter-processor communication eases parallelization of code (using the shared L2 cache).
• Potential drawback of CMPs: high power/heat issues using current VLSI processes.


Stanford Hydra CMP Approach Goals

• Exploit all levels of program parallelism.
• Develop a single-chip multiprocessor architecture that simplifies microprocessor design and achieves high performance.
• Make the multiprocessor transparent to the average user.
• Integrate use of parallelizing compiler technology in the design of microarchitecture that supports data/thread level speculation (TLS).

[Figure annotations: within a single CPU core; on multiple CPU cores within a single CMP or multiple CMPs; on multiple CPU cores within a single CMP using Thread Level Speculation (TLS)]


Hydra Prototype Overview

• 4 CPU cores with modified private L1 caches.
• Speculative coprocessor (for each processor core):
  – Speculative memory reference controller
  – Speculative interrupt screening mechanism
  – Statistics mechanisms for performance evaluation and to allow feedback for code tuning
• Memory system:
  – Read and write buses
  – Controllers for all resources
  – On-chip shared L2 cache
  – L2 speculation write buffers
  – Simple off-chip main memory controller
  – I/O and debugging interface


The Basic Hydra CMP

• 4 processors and secondary cache on a chip
• 2 buses connect processors and memory
• Coherence: writes are broadcast on the write bus


Hydra Memory Hierarchy Characteristics


Hydra Prototype Layout

250 MHz clock rate target

[Figure labels: Shared L2; L2 Speculation Write Buffers]


CMP Parallel Performance

• Varying levels of performance:
  – Multiprogrammed workloads work well.
  – Very parallel apps (matrix-based FP and multimedia) are excellent.
  – Acceptable only with a few less parallel (i.e. integer) applications.

[Figure annotations: "Without Thread Level Speculation (TLS)"; "Thread Level Speculation (TLS) Target Applications"]


The Parallelization Problem

• Current automated parallelization software (parallel compilers) is limited:
  – Parallel compilers are generally successful for scientific applications with statically known dependencies (e.g. dense matrix computations).
  – Automated parallelization of general-purpose applications provides poor parallel performance, especially for integer applications, due to ambiguous dependencies resulting from:
    • Significant pointer use: pointer aliasing (pointer disambiguation problem)
    • Dynamic loop limits
    • Complex control flow
    • Irregular array accesses
    • Inter-procedural dependencies
  – Ambiguous dependencies limit extracted parallelism/performance:
    • Complicate static dependency analysis
    • Introduce imprecision into dependence relations
    • Force conservative performance-degrading synchronization to safely handle potential dependencies. Parallelism may exist in the algorithm, but the code hides it.
• Manual parallelization can provide good performance on a much wider range of applications:
  – Requires different initial program design/data structures/algorithms.
  – Requires programmers with additional skills.
  – Handling ambiguous dependencies present in general-purpose applications may still force conservative synchronization, greatly limiting parallel performance.
• Can hardware help the situation?
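
To make the ambiguity concrete, here is a minimal Python sketch. The function name `cross_iteration_raw` and the index arrays are hypothetical illustrations, not part of the lecture. It models a loop whose body is `a[dst[i]] = a[src[i]] + 1`: whether iterations depend on each other is decided entirely by the runtime contents of `dst` and `src`, which a static compiler generally cannot know.

```python
def cross_iteration_raw(dst, src):
    """Return (writer, reader) iteration pairs that carry a true (RAW) dependence.

    Models the loop:  for i: a[dst[i]] = a[src[i]] + 1
    Iteration j depends on the latest earlier iteration that wrote a[src[j]].
    """
    last_writer = {}   # array index -> latest iteration that wrote it
    deps = []
    for j in range(len(dst)):
        if src[j] in last_writer:           # j reads a value produced inside the loop
            deps.append((last_writer[src[j]], j))
        last_writer[dst[j]] = j
    return deps


# Same loop code, different runtime data (hypothetical inputs):
independent = cross_iteration_raw(dst=[0, 1, 2, 3], src=[4, 5, 6, 7])
chained = cross_iteration_raw(dst=[1, 2, 3, 4], src=[0, 1, 2, 3])
```

With the first input no iteration reads what another wrote, so all iterations could run in parallel; with the second, every iteration depends on the one before it. A compiler that cannot tell the two cases apart has to assume the second.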


Possible Limited Parallel Software Solution: Data Speculation & Thread Level Speculation (TLS)

• Data speculation and Thread Level Speculation (TLS) enable parallelization without regard for data dependencies:
  – Normal sequential program is broken up into speculative threads.
  – Speculative threads are now run in parallel on multiple physical CPUs (e.g. CMP) and/or logical CPUs (e.g. SMT).
  – Speculation hardware (TLS processor) architecture ensures correctness.
• Parallel software implications:
  – Loop parallelization is now easily automated.
  – Ambiguous dependencies are resolved dynamically without conservative synchronization.
  – More “arbitrary” threads are possible (subroutines).
  – Add synchronization only for performance.
• Thread Level Speculation (TLS) hardware support mechanisms:
  – Speculative thread control mechanism
  – Five basic speculation hardware/memory system requirements for correct data/thread speculation


Subroutine Thread Speculation

Speculated Thread


Loop Iteration Speculative Threads

A simple example of a speculatively executed loop using Data/Thread Level Speculation (TLS)

[Figure: the original sequential (single thread) loop and the resulting speculated threads. Annotation: most common application of TLS.]


Overview of Loop-Iteration Thread Speculation

• Parallel regions (loop iterations) are annotated by the compiler.
  – e.g. Begin_Speculation … End_Speculation
• The hardware uses these annotations to run loop iterations in parallel as speculated threads on a number of CPUs.
• Each CPU knows which loop iteration it is running.
• CPUs dynamically prevent data/name dependency violations:
  – “later” iterations can’t use data before write by “earlier” iterations (RAW)
  – “earlier” iterations never see writes by “later” iterations (WAW, WAR hazards prevented):
    • Multiple views of memory are created by TLS hardware
• If a “later” iteration has used data that an “earlier” iteration writes (RAW hazard), it is restarted:
  – All following iterations are halted and restarted, also.
  – All writes by the later iteration are discarded (undo speculated work).
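
The restart rule above can be illustrated with a small software model. This is only a sketch under stated assumptions, not Hydra's mechanism: the names `run_iteration` and `run_speculative` are hypothetical, all iterations in a window of `n_cpus` read committed memory before any of them commits (a worst case with no forwarding), and each thread keeps a private write buffer plus a set of addresses it read.

```python
def run_iteration(body, i, memory):
    """Run loop iteration i speculatively; return (addresses read, buffered writes)."""
    reads, writes = set(), {}

    def load(addr):
        if addr in writes:           # a thread always sees its own writes
            return writes[addr]
        reads.add(addr)              # remember reads that could be violated
        return memory[addr]

    def store(addr, value):
        writes[addr] = value         # buffered, not yet visible to memory

    body(i, load, store)
    return reads, writes


def run_speculative(body, n_iters, memory, n_cpus=4):
    """Execute n_iters iterations with speculation; return the number of restarts."""
    restarts = 0
    head = 0                                     # oldest uncommitted iteration
    while head < n_iters:
        end = min(head + n_cpus, n_iters)
        spec = {i: run_iteration(body, i, memory) for i in range(head, end)}
        i = head
        while i < end:
            _, writes = spec[i]
            memory.update(writes)                # commit in program order
            violators = [j for j in range(i + 1, end)
                         if not spec[j][0].isdisjoint(writes)]
            if violators:                        # RAW hazard: a later thread read too early
                restarts += end - violators[0]   # it and all following threads restart
                end = violators[0]               # their buffered writes are discarded
            i += 1
        head = end
    return restarts


def chained(i, load, store):
    """Hypothetical loop body with a dependence between consecutive iterations."""
    store(i + 1, load(i) + 1)


memory = {addr: 0 for addr in range(5)}
restarts = run_speculative(chained, 4, memory)
```

After the call, `memory` holds the same values a sequential run would produce; the price of the dependence chain shows up only in `restarts`. A loop body whose iterations touch disjoint addresses completes with zero restarts.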


Hydra’s Data & Thread Speculation Operations


Hydra Loop Compiling for Speculation

Speculated Threads


Loop Execution with Thread Speculation

Data Dependency Violation (RAW hazard)


Speculative Thread Creation in Hydra

Register Passing Buffer (RPB)


Speculative Data Access in Speculated Threads

[Figure: memory accesses by thread i (less speculated) and thread i+1 (more speculated), with the WAR, RAW and WAW cases marked. Annotation: write by i+1 not seen by i.]


Speculative Data Access in Speculated Threads

To provide the desired memory behavior, the data/thread speculation hardware must provide:

1. A method for detecting true memory dependencies, in order to determine when a dependency has been violated (RAW hazard).
2. A method for backing up and re-executing speculative loads and any instructions that may be dependent upon them when the load causes a violation.
3. A method for buffering any data written during a speculative region of a program so that it may be discarded when a violation occurs or permanently committed at the right time.


Five Basic Speculation Hardware Requirements For Correct Data/Thread Speculation

1. Forward data between parallel threads (RAW). A speculative system must be able to forward shared data quickly and efficiently from an earlier thread running on one processor to a later thread running on another.

2. Detect when reads occur too early (RAW hazards). If a data value is read by a later thread and subsequently written by an earlier thread, the hardware must notice that the read retrieved incorrect data since a true dependence violation has occurred.

3. Safely discard speculative state after violations. All speculative changes to the machine state must be discarded after a violation, while no permanent machine state may be lost in the process.

4. Retire speculative writes in the correct order (WAW hazards). Once speculative threads have completed successfully, their state must be added to the permanent state of the machine in the correct program order, considering the original sequencing of the threads.

5. Provide memory renaming (WAR hazards). The speculative hardware must ensure that the older thread cannot “see” any changes made by later threads, as these would not have occurred yet in the original sequential program (i.e. multiple views of memory).
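
As a rough illustration of which hazard each requirement targets, the following Python sketch classifies out-of-order accesses in a trace. Everything here is an assumption made for illustration: the trace format `(thread, op, addr)`, the function name `classify_hazards`, and the convention that a lower thread number means an earlier (less speculated) thread. The trace is listed in the order the accesses actually happen in time.

```python
def classify_hazards(trace):
    """Find accesses where a later thread touched an address before an earlier thread.

    trace: list of (thread, op, addr) in actual execution order, op is "R" or "W".
    Returns tuples (hazard, addr, earlier_thread, later_thread).
    """
    hazards = []
    seen = []
    for thread, op, addr in trace:
        for p_thread, p_op, p_addr in seen:
            if p_addr != addr or p_thread <= thread:
                continue             # only care about a later thread that ran ahead
            if p_op == "R" and op == "W":
                kind = "RAW"         # later thread read too early (requirements 1-3)
            elif p_op == "W" and op == "R":
                kind = "WAR"         # earlier thread must not see this write (requirement 5)
            elif p_op == "W" and op == "W":
                kind = "WAW"         # writes must retire in program order (requirement 4)
            else:
                continue             # read after read is harmless
            hazards.append((kind, addr, thread, p_thread))
        seen.append((thread, op, addr))
    return hazards


# Hypothetical trace: thread 1 (more speculated) runs ahead of thread 0.
trace = [(1, "R", "X"), (0, "W", "X"),
         (1, "W", "Y"), (0, "R", "Y"),
         (1, "W", "Z"), (0, "W", "Z")]
found = classify_hazards(trace)
```

In this trace address X shows a RAW violation, Y a potential WAR hazard, and Z a potential WAW hazard, one for each class the requirements above are meant to handle.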


Speculative Hardware/Memory Requirements 1-2

[Figure: requirement 1 (RAW) and requirement 2 (RAW hazard or violation), shown for a more speculated thread.]


Speculative Hardware/Memory Requirements 3-4

[Figure: requirement 3 (restart of the more speculated thread after a RAW hazard) and requirement 4 (prevent WAW hazards).]


Speculative Hardware/Memory Requirement 5

Memory renaming to prevent WAR hazards.

[Figure: less speculated thread i, more speculated thread i + 1, and even more speculated thread i + 2. Annotation: write X by i+1 is not visible to less speculated threads (thread i here), i.e. no WAR hazard.]


Hydra Thread Level Speculation (TLS) Hardware


Hydra Thread Level Speculation (TLS) Support


L1 Cache Tag Details

- Record writes of more speculated threads


L2 Speculation Buffer Details


The Operation of Speculative Loads

[Figure: operation of speculative loads, drawn from less speculative to more speculative. Annotations: "Check First"; "Check Last"; "Do Not Check: More Speculated" (later writes not visible, otherwise WAR).]


Reading L2 Cache Speculative Buffers

Similar to last slide


The Operation of Speculative Stores

[Figure: operation of speculative stores, drawn from less speculated to more speculated. Annotations: "Similar to invalidate cache coherency protocols"; "RAW Detection".]


Hydra’s Handling of Five Basic Speculation Hardware Requirements For Correct Data/Thread Speculation

1. Forward data between parallel threads (RAW).
  – When a speculative thread writes data over the write bus, all more-speculative threads that may need the data have their current copy of that cache line invalidated.
  – This is similar to the way the system works during non-speculative operation (invalidate cache coherency protocol).
  – If any of the threads subsequently need the new speculative data forwarded to them, they will miss in their primary cache and access the secondary cache.
    • The speculative data contained in the write buffers of the current or older threads replaces data returned from the secondary cache on a byte-by-byte basis just before the composite line is returned to the processor and primary cache.

Slide annotation: Speculative Load
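
The byte-by-byte merge described above can be sketched as follows. This is an illustrative model only: the function name `speculative_load_line` and the representation of each write buffer as a dict from byte offset to value are assumptions, and thread numbers are taken to run from least speculated (0) to most speculated.

```python
def speculative_load_line(l2_line, write_buffers, reader):
    """Build the composite cache line returned to thread `reader`.

    l2_line:       bytes holding the line as stored in the secondary cache
    write_buffers: list indexed by thread; each entry maps byte offset -> value
                   for speculative writes to this line
    reader:        index of the thread performing the load
    """
    line = bytearray(l2_line)
    # Overlay buffers from the oldest thread up to the reader itself, so the
    # most recent write that the reader is allowed to see wins for each byte.
    for thread in range(reader + 1):
        for offset, value in write_buffers[thread].items():
            line[offset] = value
    # Buffers of more speculated threads (index > reader) are never consulted.
    return bytes(line)


# Hypothetical 4-byte line and four per-thread buffers:
l2_line = bytes([0, 0, 0, 0])
buffers = [{0: 1}, {0: 2, 1: 2}, {2: 3}, {3: 4}]
seen_by_thread_1 = speculative_load_line(l2_line, buffers, reader=1)
```

Thread 1 sees its own bytes layered over thread 0's, but none of the bytes written by threads 2 and 3.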


Hydra’s Handling of Five Basic Speculation Hardware Requirements For Correct Data/Thread Speculation (continued)

2. Detect when reads occur too early (RAW hazards).
  – Primary cache bits are set to mark any reads that may cause violations.
  – Subsequently, if a write to that address from an earlier thread (less speculated) invalidates the address, a violation is detected, and the thread is restarted.
3. Safely discard speculative state after violations.
  – Since all permanent machine state in Hydra is always maintained within the secondary cache, anything in the primary caches and secondary cache speculation buffers may be invalidated at any time without risking a loss of permanent state.
    • As a result, any lines in the primary cache containing speculative data (marked with a special modified bit) may simply be invalidated all at once to clear any speculative state from a primary cache.
    • In parallel with this operation, the secondary cache buffer for the thread may be emptied to discard any speculative data written by the thread.
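
A minimal Python model of requirements 2 and 3 is given below. It is a sketch under simplifying assumptions, not Hydra's actual tag layout: the classes `L1Line` and `SpecCPU` are hypothetical, every cache line holds a single word, and `thread_id` gives the thread's position in program order.

```python
class L1Line:
    def __init__(self):
        self.valid = True
        self.read = False        # speculative read that may cause a violation
        self.modified = False    # line holds speculative data written by this thread


class SpecCPU:
    def __init__(self, thread_id):
        self.thread_id = thread_id
        self.l1 = {}             # addr -> L1Line
        self.l2_buffer = {}      # this thread's secondary cache speculation buffer
        self.restarts = 0

    def load(self, addr):
        line = self.l1.setdefault(addr, L1Line())
        line.valid = True        # a miss would refill the line
        if not line.modified:    # reading our own write cannot be violated
            line.read = True

    def store(self, addr, value):
        line = self.l1.setdefault(addr, L1Line())
        line.valid = True
        line.modified = True
        self.l2_buffer[addr] = value

    def snoop_write(self, writer_id, addr):
        """Observe a write on the write bus; return True if it causes a violation."""
        if writer_id >= self.thread_id:
            return False         # writes by later threads are not handled here
        line = self.l1.get(addr)
        if line is None or not line.valid:
            return False
        line.valid = False       # invalidate, as in the coherence protocol
        if line.read:            # we read this address too early
            self.discard()
            return True
        return False

    def discard(self):
        """Drop all speculative state; permanent state lives in the L2 cache."""
        for line in self.l1.values():
            if line.modified:
                line.valid = False
            line.read = line.modified = False
        self.l2_buffer.clear()
        self.restarts += 1


cpu = SpecCPU(thread_id=2)
cpu.load(0x10)
cpu.store(0x20, 7)
violated = cpu.snoop_write(writer_id=1, addr=0x10)
```

Here the write from earlier thread 1 hits an address thread 2 has already read, so the thread's speculative lines and its buffer are discarded and it would restart.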


Hydra’s Handling of Five Basic Speculation Hardware Requirements For Correct Data/Thread Speculation (continued)

4. Retire speculative writes in the correct order (WAW hazards).
  – Separate secondary cache speculation buffers are maintained for each thread. As long as these are drained into the secondary cache in the original program sequence of the threads, they will reorder speculative memory references correctly.
5. Provide memory renaming (WAR hazards).
  – Each processor can only read data written by itself or earlier threads (less speculated threads) when reading its own primary cache or the secondary cache speculation buffers.
  – Writes from later threads don’t cause immediate invalidations in the primary cache, since these writes should not be visible to earlier (less speculative) threads.
  – However, these “ignored” invalidations are recorded using an additional pre-invalidate primary cache bit associated with each line. This is because they must be processed before a different speculative or non-speculative thread executes on this processor.
  – If future threads have written to a particular line in the primary cache, the pre-invalidate bit for that line is set. When the current thread completes, these bits allow the processor to quickly simulate the effect of all stored invalidations caused by all writes from later processors all at once, before a new thread begins execution on this processor.

Slide annotation: more speculative writes not visible
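
Both mechanisms can be sketched in a few lines of Python. The names `drain_in_order` and `L1Cache` are hypothetical, caches are modeled as sets of valid line addresses, and buffers as dicts; this is an illustration of the idea rather than Hydra's implementation.

```python
def drain_in_order(l2, buffers):
    """Commit per-thread speculation buffers into the L2 in program order.

    buffers: dict mapping thread number -> {addr: value}. Draining in ascending
    thread order lets the write of the latest thread win (no WAW hazard).
    """
    for thread_id in sorted(buffers):
        l2.update(buffers[thread_id])
    return l2


class L1Cache:
    def __init__(self, thread_id, lines):
        self.thread_id = thread_id
        self.valid = set(lines)          # addresses of valid lines
        self.pre_invalidate = set()      # lines written by later threads

    def snoop_write(self, writer_id, addr):
        if addr not in self.valid:
            return
        if writer_id > self.thread_id:
            # A later thread's write must stay invisible to the current thread,
            # so only record it for when the next thread starts here.
            self.pre_invalidate.add(addr)
        elif writer_id < self.thread_id:
            self.valid.discard(addr)     # ordinary invalidation

    def start_new_thread(self, thread_id):
        # Apply all deferred invalidations at once before the new thread runs.
        self.valid -= self.pre_invalidate
        self.pre_invalidate.clear()
        self.thread_id = thread_id


l2 = drain_in_order({"X": 0}, {2: {"X": 30}, 1: {"X": 20, "Y": 21}})

cache = L1Cache(thread_id=1, lines=["X", "Y"])
cache.snoop_write(writer_id=2, addr="X")   # deferred: thread 1 keeps its view of X
still_visible = "X" in cache.valid
cache.start_new_thread(thread_id=5)        # X is dropped before the new thread runs
```

The same line stays readable for the current thread and is invalidated only at the thread switch, which is the effect the pre-invalidate bit is described as providing.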


Thread Speculation Performance

• Results representative of entire uniprocessor applications
• Simulated with accurate modeling of Hydra’s memory and hardware speculation support.


Hydra Conclusions

• Hydra offers a number of advantages:
  – Good performance on parallel applications
  – Promising performance on difficult-to-parallelize sequential (single-threaded) applications using data/Thread Level Speculation (TLS) mechanisms
  – Scalable, modular design
  – Low-overhead hardware support for speculative thread parallelism, which nevertheless greatly increases the number of applications that can run in parallel


Other Thread Level Speculation (TLS) Efforts: Wisconsin Multiscalar (1995)

• This CMP-based design proposed the first reasonable hardware to implement TLS.
• Unlike Hydra, Multiscalar implements a ring-like network between all of the processors to allow direct register-to-register communication.
  – Along with hardware-based thread sequencing, this type of communication allows much smaller threads to be exploited at the expense of more complex processor cores.
• The designers proposed two different speculative memory systems to support the Multiscalar core.
  – The first was a unified primary cache, or address resolution buffer (ARB). Unfortunately, the ARB has most of the complexity of Hydra’s secondary cache buffers at the primary cache level, making it difficult to implement.
  – Later, they proposed the speculative versioning cache (SVC).
    • The SVC uses write-back primary caches to buffer speculative writes in the primary caches, using a sophisticated coherence scheme.


Other Thread Level Speculation (TLS) Efforts: Carnegie-Mellon Stampede

• This CMP-with-TLS proposal is very similar to Hydra,
  – Including the use of software speculation handlers.
• However, the hardware is simpler than Hydra’s.
• The design uses write-back primary caches to buffer writes (similar to those in the SVC) and sophisticated compiler technology to explicitly mark all memory references that require forwarding to another speculative thread.
• Their simplified SVC must drain its speculative contents as each thread completes, unfortunately resulting in heavy bursts of bus activity.


Other Thread Level Speculation (TLS) Efforts: MIT M-machine

• This CMP design has three processors that share a primary cache and can communicate register-to-register through a crossbar.
• Each processor can also switch dynamically among several threads. (TLS & SMT??)
• As a result, the hardware connecting processors together is quite complex and slow.
• However, programs executed on the M-machine can be parallelized using very fine-grain mechanisms that are impossible on an architecture that shares outside of the processor cores, like Hydra.
• Performance results show that on typical applications extremely fine-grained parallelization is often not as effective as parallelism at the levels that Hydra can exploit. The overhead incurred by frequent synchronizations reduces the effectiveness.