Hyper Threading (HT) and OPs (Micro-Operations)
Department of Computer Science
Southern Illinois University Edwardsville
Summer, 2015
Dr. Hiroshi Fujinoki
E-mail: [email protected]
New_Technologies/001
CS 312 Computer Organization and Architecture
New_Technologies/002
Technologies in the recent processors
New_Technologies/003
Hyper-Threading (HT)
• A technology that makes one processor look as if it were multiple processors
• Uses otherwise unutilized function units in a pipeline datapath
• Introduced by Intel and first used in the Pentium 4 (3.0 GHz or faster)
New_Technologies/004
The problem in multi function-unit and super-scalar pipeline processors
Super-Scalar
  Datapath #1:  IF  ID  EX  ME  WB
  Datapath #2:  IF  ID  EX  ME  WB
Multi Function-Unit
  IF  ID  EX  ME  WB
          EX  ME  WB
          EX  ME  WB
Problem
• Number of pipes increased (e.g., 6 pipes)
• Resource utilization is low (“up to 35%” according to Intel)
Why?
• “Depth” of the pipeline increased (20 stages in the Pentium 4), which was needed to increase the clock rate
• Pipeline flushes caused by branches
• Data dependencies
New_Technologies/005
The problem in multi function-unit and super-scalar pipeline processors
However, low resource utilization really does not make sense
• A large number of processes are running (more than 50 processes)
• Resource utilization is low even though a large number of processes need those resources!
New_Technologies/006
Concept of HT
[Figure: time chart of function units FU-1 to FU-4 executing Process A, Process B, Process C and Process D]
• Original schedule: Utilization = 35/96 ≈ 36.5%
• With HT: New Utilization = 35/48 ≈ 72.9% (all processes completed)
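The two utilization figures above can be reproduced with a small calculation. The following is a minimal sketch, assuming that the denominators 96 and 48 come from 4 function units observed over 24 and 12 time steps respectively (the slide only gives the totals); the function name `utilization` is made up for illustration.

```c
#include <assert.h>

/*
 * Fraction of function-unit time slots that did useful work.
 * busy_slots : number of (unit, time step) slots occupied by some process
 * num_units  : number of function units (FU-1 to FU-4 gives 4)
 * time_steps : length of the schedule in time steps (assumed: 24, then 12)
 */
double utilization(int busy_slots, int num_units, int time_steps) {
    assert(num_units > 0 && time_steps > 0);
    return (double)busy_slots / (double)(num_units * time_steps);
}
```

Calling `utilization(35, 4, 24)` corresponds to the original schedule (35/96) and `utilization(35, 4, 12)` to the HT schedule (35/48): the same 35 busy slots are packed into half as many time steps, so the utilization doubles.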
New_Technologies/007
Concept of HT
[Figure: one Physical Processor (FU-1 to FU-4) running Process A, Process B, Process C and Process D appears as two (virtual) processors, each with FU-1 to FU-4, from the OS point of view]
Why is this technology not called “Hyper Processing”?
New_Technologies/008
Hardware Implementation in HT
[Figure: two separate processors, each with its own Bus, L1 Cache and Processor Core, compared with one HT Processor Core that has a single Bus and L1 Cache and presents two Virtual Processors running Process A and Process B]
L1 Cache is shared!
New_Technologies/010
The problem in multi function-unit and super-scalar pipeline processors
[Figure: two processes, each with its own Memory Address Space holding Code and Data, compared with a single process whose Thread 1, Thread 2, Thread 3 and Thread 4 use the same Data]
New_Technologies/011
Concept of HT
[Figure: one Physical Processor (FU-1 to FU-4) appears as two virtual processors, each with FU-1 to FU-4, now running Thread 1, Thread 2, Thread 3 and Thread 4, which use the same Data]
Why is this technology not called “Hyper Processing”?
New_Technologies/012
Hardware Implementation in HT
[Figure: two separate processors, each with its own Bus, L1 Cache and Processor Core, compared with one HT Processor Core that has a single Bus and L1 Cache and presents two Virtual Processors; Process A and Process B are replaced by Thread A and Thread B]
L1 Cache is shared!
New_Technologies/013
Problems in HT
• Low performance gain
- After HT is used, only 5 ~ 30% improvement
- Intel explained that this is still a good improvement relative to the cost of implementing HT (HT requires only 5% more transistors)
- HT requires a new chip set (i.e., a new motherboard) and faster main memory modules (Intel doesn’t have to pay for this cost, but you do)
• Security is still a problem
- Some network applications use a separate thread to process each client (multithreaded network server)
New_Technologies/014
Problems in HT
Multithreaded web servers (e.g., “Apache”)
[Figure: three Browsers connect to one Web Server; server threads T1, T2 and T3 use the same Data]
    int main(void) {
        while (TRUE) {
            accept( ……. );        /* wait for a connection from a browser */
            beginthread( …… );    /* start a thread (T1, T2, T3, ...) to serve that client */
        }
    }
New_Technologies/015
The problem in multi function-unit and super-scalar pipeline processors
• A thread can monitor the access frequency to memory addresses owned by a process executing SSL encryption
• It is not easy to decode this information to actually crack the encryption
• However, it has been proven to be logically possible
• At the least, it reveals what is going on in your neighbor threads
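The monitoring idea above can be illustrated with a toy model. The following is only a sketch and not a real attack: it assumes a hypothetical 8-set direct-mapped cache shared by a spy thread and a victim thread, and it replaces the timing measurement a real spy would need with an explicit hit/miss result. All names (`cache_access`, `spy_prime`, `victim_run`, `spy_probe`, `recover_secret_set`) are invented for this example.

```c
#include <assert.h>

#define NUM_SETS 8   /* assumption: tiny direct-mapped cache shared by both threads */
#define SPY      1
#define VICTIM   2

/* Which thread's line currently occupies each cache set (0 = empty). */
static int cache_owner[NUM_SETS];

/* Access a line that maps to `set`. Returns 1 on a hit, 0 on a miss.
 * On a miss the line is loaded and evicts whatever was there before. */
static int cache_access(int owner, int set) {
    int hit = (cache_owner[set] == owner);
    cache_owner[set] = owner;
    return hit;
}

/* Step 1: the spy fills every set with its own lines. */
static void spy_prime(void) {
    for (int s = 0; s < NUM_SETS; s++)
        cache_access(SPY, s);
}

/* Step 2: the victim touches one line chosen by a secret value
 * (standing in for a key-dependent table lookup). */
static void victim_run(int secret) {
    assert(secret >= 0);
    cache_access(VICTIM, secret % NUM_SETS);
}

/* Step 3: the spy touches its lines again. A miss means the victim
 * evicted that line. Returns the evicted set, or -1 if none. */
static int spy_probe(void) {
    int evicted = -1;
    for (int s = 0; s < NUM_SETS; s++)
        if (!cache_access(SPY, s))
            evicted = s;
    return evicted;
}

/* Runs one prime / victim / probe round and returns what the spy learns. */
int recover_secret_set(int secret) {
    spy_prime();
    victim_run(secret);
    return spy_probe();
}
```

In this model the spy never reads the victim's memory; it only observes which of its own lines were evicted. That matches the point of the slide: the shared cache leaks which addresses a neighbor thread is using, even though turning that into a recovered key is much harder.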
New_Technologies/016
Two other technologies used in Intel’s processors
• SIMD (Single Instruction stream over Multiple Data streams) parallel instructions
- MMX (unofficially read as MultiMedia, Multiple Math or Matrix Math eXtension) (first introduced in the Pentium processor)
- SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)
• UMA multiprocessor architecture and MESI Cache Coherence protocol
[Figure: dual-processor motherboard; Processor 1 and Processor 2, each with its own L1 cache, connected to Main Memory]
Uniform Memory Access (UMA) parallel architecture
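The SIMD idea above (one instruction applied to several data elements at once) can be modeled in portable C. The following is a conceptual sketch only: it does not use real MMX or SSE instructions, and it assumes four 32-bit lanes per register, as in a 128-bit SSE register. The type `vec4` and the functions `vec4_add` and `add_arrays` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4   /* assumption: four 32-bit floats per 128-bit register */

/* Hypothetical stand-in for one SIMD register. */
typedef struct { float lane[LANES]; } vec4;

/* Models a single SIMD add: the same operation on every lane.
 * Real hardware performs all lanes in one instruction; the loop
 * here only imitates that behavior. */
static vec4 vec4_add(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Adds two arrays LANES elements at a time.
 * n must be a multiple of LANES.
 * Returns the number of vector add operations issued. */
size_t add_arrays(const float *a, const float *b, float *out, size_t n) {
    assert(n % LANES == 0);
    size_t vector_ops = 0;
    for (size_t i = 0; i < n; i += LANES) {
        vec4 va, vb;
        for (int k = 0; k < LANES; k++) {   /* load one register's worth */
            va.lane[k] = a[i + k];
            vb.lane[k] = b[i + k];
        }
        vec4 vr = vec4_add(va, vb);         /* one "instruction", four results */
        for (int k = 0; k < LANES; k++)     /* store the results */
            out[i + k] = vr.lane[k];
        vector_ops++;
    }
    return vector_ops;
}
```

For 8 elements the model issues 2 vector adds where a scalar loop would issue 8 individual adds, which is the source of the SIMD speedup.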
New_Technologies/018
Two other technologies used in Intel’s processors
• UMA multiprocessor architecture and MESI Cache Coherence protocol
[Figure: dual-processor motherboard; Processor 1 and Processor 2, each with its own L1 cache, connected to Main Memory; one L1 cache holds a copy that was Read from Main Memory while the other holds a Modified copy of the same data]
Cache Coherency Problem
- The MESI cache coherence protocol is a solution to this problem
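The slides name MESI without listing its states. As a rough sketch, the standard textbook state transitions for one cache line in one processor's cache can be written as a small state machine. The type and function names (`mesi_state`, `mesi_event`, `mesi_next`) are invented for this example, and bus details such as write-backs are only mentioned in comments.

```c
#include <assert.h>

/* The four MESI states of a cache line. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

typedef enum {
    LOCAL_READ,    /* this processor reads the line */
    LOCAL_WRITE,   /* this processor writes the line */
    REMOTE_READ,   /* another processor's read is observed on the bus */
    REMOTE_WRITE   /* another processor's write is observed on the bus */
} mesi_event;

/* Returns the next state of the line.
 * others_have_copy is only consulted for a LOCAL_READ of an INVALID line. */
mesi_state mesi_next(mesi_state s, mesi_event e, int others_have_copy) {
    assert(s >= INVALID && s <= MODIFIED);
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)                    /* read miss: fetch the line */
            return others_have_copy ? SHARED : EXCLUSIVE;
        return s;                            /* read hit: no change */
    case LOCAL_WRITE:
        return MODIFIED;                     /* other copies are invalidated */
    case REMOTE_READ:
        if (s == MODIFIED || s == EXCLUSIVE) /* MODIFIED data is written back first */
            return SHARED;
        return s;
    case REMOTE_WRITE:
        return INVALID;                      /* our copy is now stale */
    }
    return s;
}
```

In terms of the figure: when Processor 2 writes a line that Processor 1 has already read, Processor 1's copy receives REMOTE_WRITE and becomes INVALID, so Processor 1 cannot keep using the stale value.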
New_Technologies/019
SIMD Vector Computer: Cray (multiple parallel processors on a motherboard)