hyperthreadtechnology-130831120940-phpapp02
Transcript of hyperthreadtechnology-130831120940-phpapp02
8/10/2019 hyperthreadtechnology-130831120940-phpapp02
http://slidepdf.com/reader/full/hyperthreadtechnology-130831120940-phpapp02 1/16
Introduction:
Throughout the evolution of the IA-32 Intel Architecture, Intel has continuously added innovations to the architecture to improve processor performance and to address specific needs of compute-intensive applications. The latest of these innovations is Hyper-Threading Technology, which Intel has developed to improve the performance of IA-32 processors when executing multiple-processor (MP) capable operating systems and multi-threaded applications.
Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. Hyper-Threading Technology makes a single physical processor appear as two logical processors; the physical execution resources are shared and the architecture state is duplicated for the two logical processors. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources.
What is Hyper Thread Technology?
Hyper-Threading Technology allows a single physical processor to execute multiple threads (instruction streams) simultaneously, potentially providing greater throughput and improved performance.
Hyper Thread Technology Architecture:
Hyper-Threading Technology makes a single physical processor appear as multiple logical processors. To do this, there is one copy of the architecture state for each logical processor, and the logical processors share a single set of physical execution resources. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on conventional physical processors in a multi-processor system. From a microarchitecture perspective, this means that instructions from logical processors will persist and execute simultaneously on shared execution resources.
Diagram: The above figure shows a multiprocessor system with two physical processors that are not Hyper-Threading Technology capable.
Diagram: The above figure shows a multiprocessor system with two physical processors that are Hyper-Threading Technology capable. With two copies of the architectural state on each physical processor, the system appears to have four logical processors.
The first implementation of Hyper-Threading Technology is being made available on the Intel® Xeon™ processor family for dual and multiprocessor servers, with two logical processors per physical processor. By more efficiently using existing processor resources, the Intel Xeon processor family can significantly improve performance at virtually the same system cost. This implementation of Hyper-Threading Technology added less than 5% to the relative chip size and maximum power requirements, but can provide performance benefits much greater than that.
Each logical processor maintains a complete set of the architecture state. The architecture state consists of registers including the general-purpose registers, the control registers, the advanced programmable interrupt controller (APIC) registers, and some machine state registers. From a software perspective, once the architecture state is duplicated, the processor appears to be two processors. The number of transistors to store the architecture state is an extremely small fraction of the total. Logical processors share nearly all other resources on the physical processor, such as caches, execution units, branch predictors, control logic, and buses.
Each logical processor has its own interrupt controller, or APIC. Interrupts sent to a specific logical processor are handled only by that logical processor.
FIRST IMPLEMENTATION ON THE INTEL XEON PROCESSOR FAMILY
Several goals were at the heart of the microarchitecture design choices made for the Intel® Xeon™ processor MP implementation of Hyper-Threading Technology. One goal was to minimize the die area cost of implementing Hyper-Threading Technology. Since the logical processors share the vast majority of microarchitecture resources and only a few small structures were replicated, the die area cost of the first implementation was less than 5% of the total die area.
A second goal was to ensure that when one logical processor is stalled the other logical processor could continue to make forward progress. A logical processor may be temporarily stalled for a variety of reasons, including servicing cache misses, handling branch mispredictions, or waiting for the results of previous instructions. Independent forward progress was ensured by managing buffering queues such that no logical processor can use all the entries when two active software threads were executing. This is accomplished by either partitioning or limiting the number of active entries each thread can have.
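The partitioning-or-limiting scheme described above can be illustrated with a small sketch. This is a toy model with invented names, not the actual hardware design: a queue refuses entries from a logical processor that has reached its share, so a stalled thread can never occupy every slot.

```python
class BufferingQueue:
    """Toy model of a buffering queue whose entries are limited per
    logical processor, so one stalled thread cannot occupy every slot."""

    def __init__(self, total_entries, limit_per_lp):
        self.total_entries = total_entries
        self.limit_per_lp = limit_per_lp   # max active entries per logical processor
        self.entries = []                  # list of (lp_id, payload)

    def try_enqueue(self, lp_id, payload):
        in_use = sum(1 for lp, _ in self.entries if lp == lp_id)
        if len(self.entries) >= self.total_entries or in_use >= self.limit_per_lp:
            return False                   # this logical processor must stall
        self.entries.append((lp_id, payload))
        return True

# With two active threads, each may use at most half of the eight entries:
q = BufferingQueue(total_entries=8, limit_per_lp=4)
accepted = [q.try_enqueue(0, n) for n in range(6)]   # LP0 is capped at 4
```

Even after logical processor 0 hits its limit, logical processor 1 can still claim its four entries, which is exactly the forward-progress guarantee the text describes.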
A third goal was to allow a processor running only one active software thread to run at the same speed on a processor with Hyper-Threading Technology as on a processor without this capability. This means that partitioned resources should be recombined when only one software thread is active. A high-level view of the microarchitecture pipeline is shown in Figure 4. As shown, buffering queues separate major pipeline logic blocks. The buffering queues are either partitioned or duplicated to ensure independent forward progress through each logic block.
(Active software threads include the operating system idle loop, because it runs a sequence of code that continuously checks the work queue(s). The operating system idle loop can consume considerable execution resources.)
Instruction Scheduling:
The schedulers are at the heart of the out-of-order execution engine. Five uop schedulers are used to schedule different types of uops for the various execution units. Collectively, they can dispatch up to six uops each clock cycle. The schedulers determine when uops are ready to execute based on the readiness of their dependent input register operands and the availability of the execution unit resources.
The memory instruction queue and general instruction queues send uops to the five scheduler queues as fast as they can, alternating between uops for the two logical processors every clock cycle, as needed.
Each scheduler has its own scheduler queue of eight to twelve entries from which it selects uops to send to the execution units. The schedulers choose uops regardless of whether they belong to one logical processor or the other. The schedulers are effectively oblivious to logical processor distinctions. The uops are simply evaluated based on dependent inputs and availability of execution resources. For example, the schedulers could dispatch two uops from one logical processor and two uops from the other logical processor in the same clock cycle. To avoid deadlock and ensure fairness, there is a limit on the number of active entries that a logical processor can have in each scheduler's queue. This limit is dependent on the size of the scheduler queue.
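The two rules above — selection is oblivious to logical processors, but occupancy per logical processor is capped — can be sketched as follows (a simplified illustration; field names and the queue representation are invented):

```python
def dispatch_ready(queue, width):
    """Pick up to `width` ready uops, oblivious to which logical
    processor owns them (toy model of scheduler selection)."""
    picked = [u for u in queue if u["ready"]][:width]
    for u in picked:
        queue.remove(u)
    return picked

def can_insert(queue, lp_id, per_lp_limit):
    """A logical processor may not exceed its share of scheduler entries."""
    return sum(1 for u in queue if u["lp"] == lp_id) < per_lp_limit

queue = [
    {"lp": 0, "op": "add", "ready": True},
    {"lp": 1, "op": "mul", "ready": False},
    {"lp": 1, "op": "sub", "ready": True},
]
picked = dispatch_ready(queue, width=2)   # takes "add" (LP0) and "sub" (LP1)
```

Dispatch freely mixes uops from both logical processors in one cycle, while the insertion check enforces the per-logical-processor entry limit that prevents deadlock.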
FRONT END:
The front end of the pipeline is responsible for delivering instructions to the later pipe stages. Instructions generally come from the Execution Trace Cache (TC), which is the primary or Level 1 (L1) instruction cache. Figure 5b shows that only when there is a TC miss does the machine fetch and decode instructions from the integrated Level 2 (L2) cache. Near the TC is the Microcode ROM, which stores decoded instructions for the longer and more complex IA-32 instructions.
Diagram
Execution Trace Cache (TC):
The TC stores decoded instructions, called micro-operations or "uops." Most instructions in a program are fetched and executed from the TC. Two sets of next-instruction-pointers independently track the progress of the two software threads executing. The two logical processors arbitrate access to the TC every clock cycle. If both logical processors want access to the TC at the same time, access is granted to one then the other in alternating clock cycles. For example, if one cycle is used to fetch a line for one logical processor, the next cycle would be used to fetch a line for the other logical processor, provided that both logical processors requested access to the trace cache. If one logical processor is stalled or is unable to use the TC, the other logical processor can use the full bandwidth of the trace cache, every cycle.
The TC entries are tagged with thread information and are dynamically allocated as needed. The TC is 8-way set associative, and entries are replaced based on a least-recently-used (LRU) algorithm that is based on the full 8 ways. The shared nature of the TC allows one logical processor to have more entries than the other if needed.
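The thread-tagged, LRU-over-all-ways sharing policy can be modeled in a few lines. This is a toy model of one set, not the real trace-cache design:

```python
class TraceCacheSet:
    """Toy model of one 8-way set of the Trace Cache: entries are tagged
    with the thread that allocated them, shared dynamically, and evicted
    by least-recently-used across the full 8 ways."""

    def __init__(self, ways=8):
        self.ways = ways
        self.lines = []            # LRU order: oldest first, as (lp_id, tag)

    def access(self, lp_id, tag):
        for i, (lp, t) in enumerate(self.lines):
            if lp == lp_id and t == tag:
                self.lines.append(self.lines.pop(i))   # refresh LRU position
                return True                            # hit
        if len(self.lines) >= self.ways:
            self.lines.pop(0)                          # evict true LRU, any thread
        self.lines.append((lp_id, tag))
        return False                                   # miss; line allocated

tc = TraceCacheSet()
for t in range(8):
    tc.access(0, t)        # one thread may fill all 8 ways if the other is idle
```

When the second thread then allocates a line, the least-recently-used entry is evicted regardless of which thread owns it, which is what lets one logical processor hold more entries than the other.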
Microcode ROM:
When a complex instruction is encountered, the TC sends a microcode-instruction pointer to the Microcode ROM. The Microcode ROM controller then fetches the uops needed and returns control to the TC. Two microcode instruction pointers are used to control the flows independently if both logical processors are executing complex IA-32 instructions.
Both logical processors share the Microcode ROM entries. Access to the Microcode ROM alternates between logical processors just as in the TC.
ITLB and Branch Prediction:
If there is a TC miss, then instruction bytes need to be fetched from the L2 cache and decoded into uops to be placed in the TC. The Instruction Translation Lookaside Buffer (ITLB) receives the request from the TC to deliver new instructions, and it translates the next-instruction pointer address to a physical address. A request is sent to the L2 cache, and instruction bytes are returned. These bytes are placed into streaming buffers, which hold the bytes until they can be decoded.
The ITLBs are duplicated. Each logical processor has its own ITLB and its own set of instruction pointers to track the progress of instruction fetch for the two logical processors. The instruction fetch logic in charge of sending requests to the L2 cache arbitrates on a first-come first-served basis, while always reserving at least one request slot for each logical processor. In this way, both logical processors can have fetches pending simultaneously.
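The "first-come first-served, but one slot always reserved per logical processor" rule can be sketched as below. The slot count and function names are invented for illustration; the real fetch logic is far more involved:

```python
def try_issue_fetch(pending, lp_id, total_slots=2):
    """First-come first-served issue of L2 fetch requests, but at least
    one slot is always reserved for the other logical processor."""
    mine = pending.count(lp_id)
    # Taking another slot is allowed only if it would not consume the
    # slot reserved for the other logical processor.
    if len(pending) < total_slots and mine < total_slots - 1:
        pending.append(lp_id)
        return True
    return False

pending = []
first = try_issue_fetch(pending, 0)    # granted
second = try_issue_fetch(pending, 0)   # denied: the last slot is reserved for LP1
third = try_issue_fetch(pending, 1)    # granted, so both LPs have fetches pending
```

Even if logical processor 0 floods the fetch logic, logical processor 1 is always able to get a request in, so both can have fetches in flight.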
Each logical processor has its own set of two 64-byte streaming buffers to hold instruction bytes in preparation for the instruction decode stage. The ITLBs and the streaming buffers are small structures, so the die size cost of duplicating these structures is very low.
The branch prediction structures are either duplicated or shared. The return stack buffer, which predicts the target of return instructions, is duplicated because it is a very small structure and the call/return pairs are better predicted for software threads independently. The branch history buffer used to look up the global history array is also tracked independently for each logical processor. However, the large global history array is a shared structure with entries that are tagged with a logical processor ID.
SUPPORTING IA-32 PROCESSORS:
An IA-32 processor with Hyper-Threading Technology will appear to software as two independent IA-32 processors, similar to two physical processors in a traditional DP platform. This configuration allows operating system and application software that is already designed to run on a traditional DP or MP system to run unmodified on a platform that uses one or more IA-32 processors with Hyper-Threading Technology. Here, the multiple threads that would be dispatched to two or more physical processors are now dispatched to the logical processors in one or more IA-32 processors with Hyper-Threading Technology. At the firmware (BIOS) level, the basic procedures to initialize multiple processors with Hyper-Threading Technology in an MP platform closely resemble those for a traditional MP platform. An operating system designed to run on a traditional DP or MP platform can use the CPUID instruction to detect the presence of IA-32 processors with Hyper-Threading Technology. The same mechanisms that are described in the multiprocessor specification version 1.4 to wake physical processors apply to the logical processors in an IA-32 processor with Hyper-Threading Technology. Although existing operating system and application code will run correctly on a processor with Hyper-Threading Technology, some relatively simple code modifications are recommended to get the optimum benefit from Hyper-Threading Technology.
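The CPUID-based detection mentioned above relies on two documented fields of CPUID leaf 1: the HTT feature flag in EDX bit 28, and the count of logical processors per physical package in EBX bits 23:16. A decoding sketch, using illustrative register constants rather than values read from real hardware:

```python
def decode_cpuid_leaf1(ebx, edx):
    """Decode Hyper-Threading information from CPUID leaf 1 register
    values: EDX bit 28 is the HTT feature flag, and EBX bits 23:16 give
    the number of logical processors per physical package."""
    htt_supported = bool(edx & (1 << 28))
    logical_per_package = (ebx >> 16) & 0xFF
    return htt_supported, logical_per_package

# Example register values resembling an HT-capable processor that
# reports two logical processors per physical package:
supported, count = decode_cpuid_leaf1(ebx=0x00020800, edx=1 << 28)
```

An operating system would issue the real CPUID instruction to obtain these register values; here they are hard-coded so the decoding logic stands alone.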
IMPLEMENTATION OF IA-32 PROCESSORS:
IA-32 Instruction Decode:
IA-32 instructions are cumbersome to decode because the instructions have a variable number of bytes and have many different options. A significant amount of logic and intermediate state is needed to decode these instructions. Fortunately, the TC provides most of the uops, and decoding is only needed for instructions that miss the TC.
The decode logic takes instruction bytes from the streaming buffers and decodes them into uops. When both threads are decoding instructions simultaneously, the streaming buffers alternate between threads so that both threads share the same decoder logic. The decode logic has to keep two copies of all the state needed to decode IA-32 instructions for the two logical processors even though it only decodes instructions for one logical processor at a time. In general, several instructions are decoded for one logical processor before switching to the other logical processor. The decision to use a coarser level of granularity in switching between logical processors was made in the interest of die size and to reduce complexity. Of course, if only one logical processor needs the decode logic, the full decode bandwidth is dedicated to that logical processor. The decoded instructions are written into the TC and forwarded to the uop queue.
Uop Queue:
After uops are fetched from the trace cache or the Microcode ROM, or forwarded from the instruction decode logic, they are placed in a "uop queue." This queue decouples the Front End from the Out-of-Order Execution Engine in the pipeline flow. The uop queue is partitioned such that each logical processor has half the entries. This partitioning allows both logical processors to make independent forward progress regardless of front-end stalls (e.g., TC miss) and execution stalls.
OUT-OF-ORDER EXECUTION ENGINE:
The out-of-order execution engine consists of the allocation, register renaming, scheduling, and execution functions, as shown in the figure. This part of the machine re-orders instructions and executes them as quickly as their inputs are ready, without regard to the original program order.
Allocator:
The out-of-order execution engine has several buffers to perform its re-ordering, tracing, and sequencing operations. The allocator logic takes uops from the uop queue and allocates many of the key machine buffers needed to execute each uop, including the 126 re-order buffer entries, 128 integer and 128 floating-point physical registers, and 48 load and 24 store buffer entries. Some of these key buffers are partitioned such that each logical processor can use at most half the entries. Specifically, each logical processor can use up to a maximum of 63 re-order buffer entries, 24 load buffers, and 12 store buffer entries.
If there are uops for both logical processors in the uop queue, the allocator will alternate selecting uops from the logical processors every clock cycle to assign resources. If a logical processor has used its limit of a needed resource, such as store buffer entries, the allocator will signal "stall" for that logical processor and continue to assign resources for the other logical processor. In addition, if the uop queue only contains uops for one logical processor, the allocator will try to assign resources for that logical processor every cycle to optimize allocation bandwidth, though the resource limits would still be enforced.
By limiting the maximum resource usage of key buffers, the machine helps enforce fairness and prevents deadlocks.
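The allocator behavior above — alternate between logical processors each cycle, stall a thread at its resource limit, and keep serving the other — can be sketched as a toy loop (resource bookkeeping is deliberately simplified; nothing here reflects the real pipeline timing):

```python
def allocate(uop_queue, per_lp_limit, in_use):
    """Toy allocator: alternate between logical processors when both have
    uops; a logical processor at its resource limit is 'stalled' and the
    other keeps receiving resources."""
    order = []
    turn = 0
    while True:
        waiting = {lp for lp, _ in uop_queue}
        runnable = sorted(lp for lp in waiting if in_use[lp] < per_lp_limit)
        if not runnable:
            break                       # every waiting thread is stalled
        if turn not in runnable:
            turn = runnable[0]          # other thread gets full bandwidth
        i = next(i for i, (lp, _) in enumerate(uop_queue) if lp == turn)
        lp, uop = uop_queue.pop(i)
        in_use[lp] += 1                 # consume one unit of a key buffer
        order.append((lp, uop))
        turn = 1 - turn                 # alternate every cycle
    return order

uops = [(0, "a0"), (0, "a1"), (0, "a2"), (1, "b0"), (1, "b1")]
order = allocate(uops, per_lp_limit=2, in_use={0: 0, 1: 0})
```

With a limit of two entries per logical processor, allocation alternates a0, b0, a1, b1 and then stalls logical processor 0 on its third uop, which mirrors the fairness and deadlock-avoidance argument in the text.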
Register Rename:
The register rename logic renames the architectural IA-32 registers onto the machine's physical registers. This allows the 8 general-purpose IA-32 integer registers to be dynamically expanded to use the available 128 physical registers. The renaming logic uses a Register Alias Table (RAT) to track the latest version of each architectural register to tell the next instruction(s) where to get its input operands.
Since each logical processor must maintain and track its own complete architecture state, there are two RATs, one for each logical processor. The register renaming process is done in parallel to the allocator logic described above, so the register rename logic works on the same uops to which the allocator is assigning resources.
Once uops have completed the allocation and register rename processes, they are placed into two sets of queues, one for memory operations (loads and stores) and another for all other operations. The two sets of queues are called the memory instruction queue and the general instruction queue, respectively. The two sets of queues are also partitioned such that uops from each logical processor can use at most half the entries.
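The RAT mechanism can be made concrete with a short sketch. The register names and free-list representation are invented for illustration; each logical processor would own its own RAT, passed in by the caller here:

```python
def rename(rat, free_regs, uop):
    """Toy register rename: map architectural sources through the RAT,
    then allocate a fresh physical register for the destination so that
    later readers see the newest version."""
    srcs = [rat[s] for s in uop["src"]]        # read latest mappings
    dest_phys = free_regs.pop()                # allocate from the shared pool
    rat[uop["dest"]] = dest_phys               # subsequent uops read this version
    return {"op": uop["op"], "src": srcs, "dest": dest_phys}

rat = {"eax": 3, "ebx": 7}                     # architectural -> physical
free = [10, 11]
r1 = rename(rat, free, {"op": "add", "src": ["eax", "ebx"], "dest": "eax"})
r2 = rename(rat, free, {"op": "sub", "src": ["eax"], "dest": "ebx"})
```

Note that the second uop's source for eax resolves to the physical register the first uop just wrote, which is exactly the "latest version" tracking the RAT provides.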
Execution Units:
The execution core and memory hierarchy are also largely oblivious to logical processors. Since the source and destination registers were renamed earlier to physical registers in a shared physical register pool, uops merely access the physical register file to get their destinations, and they write results back to the physical register file. Comparing physical register numbers enables the forwarding logic to forward results to other executing uops without having to understand logical processors.
After execution, the uops are placed in the re-order buffer. The re-order buffer decouples the execution stage from the retirement stage. The re-order buffer is partitioned such that each logical processor can use half the entries.
Retirement:
Uop retirement logic commits the architecture state in program order. The retirement logic tracks when uops from the two logical processors are ready to be retired, then retires the uops in program order for each logical processor by alternating between the two logical processors. Retirement logic will retire uops for one logical processor, then the other, alternating back and forth. If one logical processor is not ready to retire any uops then all retirement bandwidth is dedicated to the other logical processor.
Once stores have retired, the store data needs to be written into the level-one data cache. Selection logic alternates between the two logical processors to commit store data to the cache.
MEMORY SUBSYSTEM:
The memory subsystem includes the DTLB, the low-latency Level 1 (L1) data cache, the Level 2 (L2) unified cache, and the Level 3 (L3) unified cache (the Level 3 cache is only available on the Intel® Xeon™ processor MP). Access to the memory subsystem is also largely oblivious to logical processors. The schedulers send load or store uops without regard to logical processors and the memory subsystem handles them as they come.
DTLB:
The DTLB translates addresses to physical addresses. It has 64 fully associative entries; each entry can map either a 4K or a 4M page. Although the DTLB is a shared structure between the two logical processors, each entry includes a logical processor ID tag. Each logical processor also has a reservation register to ensure fairness and forward progress in processing DTLB misses.
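The effect of the logical-processor ID tag is that the same virtual page translated by the two threads occupies separate entries. A toy model (sizes illustrative, eviction deliberately simplistic):

```python
class SharedDTLB:
    """Toy fully-associative DTLB shared by both logical processors,
    with each entry tagged by a logical-processor ID."""

    def __init__(self, entries=64):
        self.entries = entries
        self.map = {}                            # (lp_id, vpage) -> ppage

    def fill(self, lp_id, vpage, ppage):
        if len(self.map) >= self.entries:
            self.map.pop(next(iter(self.map)))   # simplistic eviction
        self.map[(lp_id, vpage)] = ppage

    def translate(self, lp_id, vpage):
        return self.map.get((lp_id, vpage))      # miss -> None

tlb = SharedDTLB()
tlb.fill(0, vpage=0x1234, ppage=0xABCD)
hit = tlb.translate(0, 0x1234)    # same LP: hit
miss = tlb.translate(1, 0x1234)   # same page, other LP: a miss
```

Tagging entries by logical processor keeps the two threads' address spaces from aliasing each other while still letting them share the one physical structure.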
L1 Data Cache, L2 Cache, L3 Cache:
The L1 data cache is 4-way set associative with 64-byte lines. It is a write-through cache, meaning that writes are always copied to the L2 cache. The L1 data cache is virtually addressed and physically tagged.
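The write-through policy can be sketched in a few lines. This is a toy model using plain dictionaries for the caches, ignoring lines, sets, and tags:

```python
def store(addr, value, l1, l2):
    """Write-through: every store to the L1 data cache is also copied to
    the L2 cache, so L2 always holds the latest data."""
    l1[addr] = value
    l2[addr] = value

def load(addr, l1, l2):
    """Look in L1 first; on a miss, fill the data in from L2."""
    if addr in l1:
        return l1[addr]
    value = l2[addr]
    l1[addr] = value      # refill L1 with the line
    return value

l1, l2 = {}, {}
store(0x40, 7, l1, l2)
l1.clear()                   # even if the L1 copy is later evicted...
value = load(0x40, l1, l2)   # ...L2 still holds the current data
```

Because L2 always has the current data, an evicted (or invalidated) L1 line never loses a store, which is the point of the write-through design.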
The L2 and L3 caches are 8-way set associative with 128-byte lines. The L2 and L3 caches are physically addressed. Both logical processors, without regard to which logical processor's uops may have initially brought the data into the cache, can share all entries in all three levels of cache.
Because logical processors can share data in the cache, there is the potential for cache conflicts, which can result in lower observed performance. However, there is also the possibility of sharing data in the cache. For example, one logical processor may prefetch instructions or data, needed by the other, into the cache; this is common in server application code. In a producer-consumer usage model, one logical processor may produce data that the other logical processor wants to use. In such cases, there is the potential for good performance benefits.
BUS:
Logical processor memory requests not satisfied by the cache hierarchy are serviced by the bus logic. The bus logic includes the local APIC interrupt controller, as well as off-chip system memory and I/O space. Bus logic also deals with cacheable address coherency (snooping) of requests originated by other external bus agents, plus incoming interrupt request delivery via the local APICs.
From a service perspective, requests from the logical processors are treated on a first-come basis, with queue and buffering space appearing shared. Priority is not given to one logical processor above the other.
Distinctions between requests from the logical processors are reliably maintained in the bus queues nonetheless. Requests to the local APIC and interrupt delivery resources are unique and separate per logical processor. Bus logic also carries out portions of barrier fence and memory ordering operations, which are applied to the bus request queues on a per logical processor basis.
For debug purposes, and as an aid to forward progress mechanisms in clustered multiprocessor implementations, the logical processor ID is visibly sent onto the processor external bus in the request phase portion of a transaction. Other bus transactions, such as cache line eviction or prefetch transactions, inherit the logical processor ID of the request that generated the transaction.
SINGLE-TASK AND MULTI-TASK MODES:
To optimize performance when there is one software thread to execute, there are two modes of operation referred to as single-task (ST) or multi-task (MT). In MT-mode, there are two active logical processors and some of the resources are partitioned as described earlier. There are two flavors of ST-mode: single-task logical processor 0 (ST0) and single-task logical processor 1 (ST1). In ST0- or ST1-mode, only one logical processor is active, and resources that were partitioned in MT-mode are re-combined to give the single active logical processor use of all of the resources. The IA-32 Intel Architecture has an instruction called HALT that stops processor execution and normally allows the processor to go into a lower-power mode. HALT is a privileged instruction, meaning that only the operating system or other ring-0 processes may execute this instruction. User-level applications cannot execute HALT.
On a processor with Hyper-Threading Technology, executing HALT transitions the processor from MT-mode to ST0- or ST1-mode, depending on which logical processor executed the HALT. For example, if logical processor 0 executes HALT, only logical processor 1 would be active; the physical processor would be in ST1-mode and partitioned resources would be recombined, giving logical processor 1 full use of all processor resources. If the remaining active logical processor also executes HALT, the physical processor would then be able to go to a lower-power mode.
In ST0- or ST1-modes, an interrupt sent to the halted logical processor would cause a transition to MT-mode. The operating system is responsible for managing MT-mode transitions (described in the next section).
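The mode transitions described above form a small state machine, which can be written out directly. The state and event names are invented labels for the behavior in the text:

```python
def next_mode(mode, event):
    """Toy transition table for single-task/multi-task modes: HALT on one
    logical processor drops to the other's single-task mode, HALT on the
    remaining one allows a lower-power state, and an interrupt to a
    halted logical processor returns the processor to MT-mode."""
    transitions = {
        ("MT", "halt_lp0"): "ST1",       # only LP1 stays active
        ("MT", "halt_lp1"): "ST0",       # only LP0 stays active
        ("ST0", "halt_lp0"): "low_power",
        ("ST1", "halt_lp1"): "low_power",
        ("ST0", "interrupt_lp1"): "MT",
        ("ST1", "interrupt_lp0"): "MT",
    }
    return transitions.get((mode, event), mode)   # other events: no change

mode = next_mode("MT", "halt_lp0")       # LP0 halts -> ST1-mode
mode = next_mode(mode, "interrupt_lp0")  # interrupt wakes LP0 -> MT-mode
```

Walking the table reproduces the sequence in the text: a HALT partitions down to one active logical processor (recombining resources), and an interrupt to the halted one re-partitions back to MT-mode.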
The Intel® Xeon™ processor family delivers the highest server system performance of any IA-32 Intel architecture processor introduced to date. Initial benchmark tests show up to a 65% performance increase on high-end server applications when compared to the previous-generation Pentium® III Xeon™ processor on 4-way server platforms. A significant portion of those gains can be attributed to Hyper-Threading Technology.
Diagram
The online transaction processing performance scales from a single-processor configuration through to a 4-processor system with Hyper-Threading Technology enabled. This graph is normalized to the performance of the single-processor system. It can be seen that there is a significant overall performance gain attributable to Hyper-Threading Technology, 21% in the cases of the single and dual-processor systems.
Other server-centric benchmarks also show the benefit of Hyper-Threading Technology. The workloads chosen were two different benchmarks that are designed to exercise data and Web server characteristics, and a workload that focuses on exercising a server-side Java environment. In these cases the performance benefit ranged from 16 to 28%.
All the performance results quoted above are normalized to ensure that readers focus on the relative performance and not the absolute performance.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing.
Hyper Thread Technology and Windows Streams:
Windows-based servers receive processor information from the BIOS. Each server vendor creates its own BIOS using specifications provided by Intel. Assuming the BIOS is written according to Intel specifications, it begins counting processors using the first logical processor on each physical processor. Once it has counted a logical processor on all of the physical processors, it will count the second logical processor on each physical processor, and so on.
Diagram: Numbers indicate the order in which logical processors are recognized by the BIOS when written according to the Intel specifications. This diagram shows a four-way system enabled with Hyper-Threading Technology.
It is critical that the BIOS count logical processors in the manner described; otherwise, Windows 2000 or its applications may use logical processors when they should be using physical processors instead. For example, consider an application that is licensed to use two processors on the system diagrammed above. Such an application will achieve better performance using two separate physical processors than it would using two logical processors on the same physical processor.
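The counting order described above — one logical processor from every physical package first, then the second from each — can be generated with a short sketch (function name invented for illustration):

```python
def bios_count_order(physical_packages, logical_per_package):
    """Enumeration order described in the text: the first logical
    processor on each physical package is counted first, then the
    second on each, and so on."""
    order = []
    for logical in range(logical_per_package):
        for package in range(physical_packages):
            order.append((package, logical))
    return order

# A four-way Hyper-Threading system: packages 0-3 are counted first,
# then their second logical processors.
order = bios_count_order(physical_packages=4, logical_per_package=2)
```

With this ordering, an application licensed for two processors lands on two different physical packages rather than on two logical processors of the same one, which is the performance point the text makes.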
CONCLUSION:
Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. This is a significant new technology direction for Intel's future processors. It will become increasingly important going forward as it adds a new technique for obtaining additional performance for lower transistor and power costs. The first implementation of Hyper-Threading Technology was done on the Intel® Xeon™ processor MP. In this implementation there are two logical processors on each physical processor. The logical processors have their own independent architecture state, but they share nearly all the physical execution and hardware resources of the processor. The goal was to implement the technology at minimum cost while ensuring forward progress on logical processors, even if the other is stalled, and to deliver full
performance even when there is only one active logical processor. These goals were achieved through efficient logical processor selection algorithms and the creative partitioning and recombining algorithms of many key resources. Measured performance on the Intel Xeon processor MP with Hyper-Threading Technology shows performance gains of up to 30% on common server application benchmarks for this technology.
The potential of Hyper-Threading Technology is tremendous; our current implementation has only just begun to tap into this potential. Hyper-Threading Technology is expected to be viable from mobile processors to servers; its introduction into market segments other than servers is only gated by the availability and prevalence of threaded applications and workloads in those markets.