Hyper-Threading (HT) and μOPs (Micro-Operations)

Department of Computer Science
Southern Illinois University Edwardsville

Summer, 2015

Dr. Hiroshi Fujinoki
E-mail: [email protected]

New_Technologies/001

CS 312 Computer Organization and Architecture

New_Technologies/002

Technologies in recent processors

New_Technologies/003

Hyper-Threading (HT)

• A technology that makes one processor look as if it were multiple processors

• It uses otherwise idle function-units in a pipelined datapath

• Introduced by Intel; its first desktop use was in the Pentium 4 (3.0 GHz or faster)

New_Technologies/004

The problem in multi function-unit and super-scalar pipeline processors

[Figure: two ways to widen a pipeline]

Super-Scalar (two complete datapaths):
    Datapath #1: IF ID EX ME WB
    Datapath #2: IF ID EX ME WB

Multi Function-Unit (one IF/ID front end feeding several execution pipes):
    IF ID  EX ME WB
           EX ME WB
           EX ME WB

Problem

• The number of pipes has increased (e.g., 6 pipes)

• Resource utilization is low (“up to 35%”, according to Intel)

Why?

• The “depth” of the pipeline has increased (20 stages in the Pentium 4); this was needed to increase the clock rate

• Pipeline flushes caused by branches

• Data dependencies

New_Technologies/005

The problem in multi function-unit and super-scalar pipeline processors

However, low resource utilization does not really make sense:

• Resource utilization is low even though a large number of processes need the processor!

• A large number of processes are running (more than 50)

New_Technologies/006

Concept of HT

[Figure: occupancy of function-units FU-1 to FU-4 over time by Processes A, B, C, and D, without HT and with HT]

Without HT: Utilization = 35/96 ≈ 36.5%

With HT: New utilization = 35/48 ≈ 72.9%

With HT, all processes have completed in the shorter schedule.
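The two utilization figures above are the same ratio taken over different schedule lengths: busy function-unit slots divided by all available slots. The following is a minimal sketch in C. It assumes (the slide gives only the totals) that 96 is 4 function-units times 24 time slots and 48 is 4 function-units times 12 time slots; the function names are hypothetical.

```c
#include <stdio.h>

/* Utilization (%) = busy slots / (function-units * time slots) * 100.
 * Assumption: 96 = 4 FUs x 24 time slots and 48 = 4 FUs x 12 time slots;
 * the slide states only the totals 35/96 and 35/48. */
double utilization_percent(int busy_slots, int num_fus, int time_slots)
{
    return 100.0 * busy_slots / (num_fus * time_slots);
}

/* Print both cases from the slide, rounded to one decimal place. */
void print_ht_utilization(void)
{
    printf("without HT: %.1f%%\n", utilization_percent(35, 4, 24));
    printf("with HT:    %.1f%%\n", utilization_percent(35, 4, 12));
}
```

The amount of useful work (35 busy slots) is the same in both cases; utilization doubles only because the same work fits into half as many time slots.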

New_Technologies/007

Concept of HT

[Figure: one physical processor with function-units FU-1 to FU-4 is presented as two (virtual) processors, each appearing to have its own FU-1 to FU-4. Processes A, B, C, and D run on the two virtual processors.]

• From the OS viewpoint, there are two (virtual) processors

• Why is this technology not called “Hyper Processing”?

New_Technologies/008

Hardware Implementation in HT

[Figure: processor cores, each with its own L1 cache and bus connection, compared with one HT processor core that presents two virtual processors running Process A and Process B]

• The L1 cache is shared by the two virtual processors!


New_Technologies/010

The problem in multi function-unit and super-scalar pipeline processors

[Figure: memory address spaces. Each process has its own memory address space holding its code and data. Within one process, Thread 1 to Thread 4 run in that process's address space.]

New_Technologies/011

Concept of HT

[Figure: one physical processor with function-units FU-1 to FU-4 is presented as two (virtual) processors. Thread 1 to Thread 4, which share the same data, run on the two virtual processors.]

• Why is this technology not called “Hyper Processing”?

New_Technologies/012

Hardware Implementation in HT

[Figure: processor cores, each with its own L1 cache and bus connection, running Process A and Process B, compared with one HT processor core that presents two virtual processors running Thread A and Thread B]

• The L1 cache is shared by the two virtual processors!

New_Technologies/013

Problems in HT

• Low performance gain

- With HT enabled, the improvement is only 5 ~ 30%

- Intel explained that this is still a good improvement relative to the cost of implementing HT (HT requires only 5% more transistors)

- HT requires a new chipset (i.e., a new motherboard) and faster main memory modules (Intel doesn’t have to pay this cost, but you do)

• Security is still a problem

- Some network applications use a separate thread to process each client (multithreaded network server)

New_Technologies/014

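Problems in HT

Multithreaded web servers (e.g., “Apache”)

[Figure: three browsers connect to one web server; the server runs threads T1, T2, and T3, which share the same data]

    int main(void) {
        while (TRUE) {
            accept(…….);        /* wait for a connection from a browser */
            beginthread(……);    /* start a thread (T1, T2, T3, ...) for that client */
        }
    }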

New_Technologies/015

The problem in multi function-unit and super-scalar pipeline processors

• A thread can monitor the access frequency to memory addresses owned by a process executing SSL encryption

• It is not easy to decode this information well enough to actually crack the encryption

• However, the attack has been proven to be logically possible

• At the very least, it reveals what is going on in neighboring threads

New_Technologies/016

Two other technologies used in Intel’s processors

• SIMD (Single Instruction stream, Multiple Data streams) parallel instructions

- MMX (commonly expanded as MultiMedia, Multiple Math, or Matrix Math eXtension) (first introduced in the Pentium processor)

- SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)

• UMA multiprocessor architecture and MESI cache coherence protocol

[Figure: Uniform Memory Access (UMA) parallel architecture. A dual-processor motherboard carries Processor 1 and Processor 2, each with its own L1 cache, sharing one main memory.]
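The SIMD idea above, one instruction operating on several data elements at once, can be sketched in C. This is an illustration, not code from the lecture, and the function name `add4` is hypothetical. When the compiler targets SSE (the `__SSE__` macro is defined, as it is by default with GCC and Clang on x86-64), the sketch uses the intrinsics `_mm_loadu_ps`, `_mm_add_ps`, and `_mm_storeu_ps` from `<xmmintrin.h>`; otherwise it falls back to a scalar loop that does the same work one element at a time.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics */
#endif

/* Add four pairs of floats: out[i] = a[i] + b[i] for i = 0..3. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* SIMD path: a single packed add handles all four lanes at once. */
    __m128 va = _mm_loadu_ps(a);   /* load 4 floats, no alignment required */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    /* Scalar fallback: four separate additions. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

Both paths produce the same result; the difference is that the SSE path issues one add for four elements where the scalar path issues four.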


New_Technologies/018

Two other technologies used in Intel’s processors

• SIMD (Single Instruction stream, Multiple Data streams) parallel instructions

- MMX (commonly expanded as MultiMedia, Multiple Math, or Matrix Math eXtension) (first introduced in the Pentium processor)

- SSE (Streaming SIMD Extensions) parallel instructions (improved from MMX, first introduced in the Pentium III)

• UMA multiprocessor architecture and MESI cache coherence protocol

[Figure: a dual-processor motherboard carries Processor 1 and Processor 2, each with its own L1 cache, sharing one main memory. Arrows labeled “Read” and “Modified” mark a block that is read into an L1 cache and then modified there.]

• Cache coherency problem

- The MESI cache coherence protocol is a solution to this problem
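The MESI protocol named above tracks each cache line in one of four states: Modified, Exclusive, Shared, or Invalid. The following is a simplified textbook-style sketch in C of the state transitions for one line in one processor's L1 cache. It is not taken from the lecture or from Intel's implementation; it ignores the bus transactions and write-backs themselves, and the type, event, and function names are hypothetical.

```c
/* The four MESI states of one cache line in one processor's L1 cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

/* Events seen by that cache (hypothetical names):
 * LOCAL_READ / LOCAL_WRITE come from its own processor,
 * BUS_READ / BUS_WRITE are snooped requests from the other processor. */
typedef enum { LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_WRITE } mesi_event;

/* Return the next state of the line. other_has_copy matters only on a
 * read miss (INVALID + LOCAL_READ): does another cache hold the line? */
mesi_state mesi_next(mesi_state s, mesi_event e, int other_has_copy)
{
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID)
            return other_has_copy ? SHARED : EXCLUSIVE;
        return s;            /* read hit: state unchanged */
    case LOCAL_WRITE:
        return MODIFIED;     /* writer ends with the only valid, dirty copy */
    case BUS_READ:
        if (s == INVALID)
            return INVALID;  /* nothing cached, nothing to do */
        return SHARED;       /* M (after write-back) and E are downgraded */
    case BUS_WRITE:
        return INVALID;      /* the other processor is writing: drop our copy */
    }
    return s;                /* not reached for valid events */
}
```

Applied to the slide's scenario: Processor 1 reads a block alone (Exclusive), Processor 2 then reads it (both Shared), and Processor 1 writes it. Processor 1's line becomes Modified, while Processor 2 snoops the write and its line becomes Invalid, so it cannot keep using the stale copy.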

New_Technologies/019

SIMD vector computer: Cray (multiple parallel processors on a motherboard)