CS 6354 by WeiKeng Qin, Jian Xiang, & Ren Xu December 8, 2009.

Introduction

Motivation
• A multi-core on our desks
• A new microarchitecture to replace NetBurst

Intel Core 2 Duo
• A dual-core CPU
• ISA with SIMD extensions
• Intel Core microarchitecture
• Memory hierarchy system

Instruction Set Architecture
• Base: x86-64
• No VLIW (Itanium)
• SIMD extensions: MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1

• Pentium MMX, 1996: 8 new registers, packed data types, integer operations
• Pentium III, SSE, 1999: 8 new registers, floating-point operations
• Pentium 4, SSE2, 2001: double precision, 128-bit register support
• Prescott, SSE3, 2004: DSP-oriented math, process management
• Core 2, SSSE3, July 2006: e.g. permuting bytes in a word
• Wolfdale, SSE4.1, Sep 2006

Streaming SIMD Extensions (SSE) 4.1
• Beginning with the 45 nm processors
• 47 instructions that improve performance of media data manipulation
• e.g. fast and efficient bit-width conversions

Convert single byte values to word (16-bit) values.

SSE2 Code
    MOVDQU XMM0, M64
    PXOR XMM1, XMM1
    PUNPCKLBW XMM0, XMM1

SSE4.1 Code
    PMOVZXBW XMM0, M64

    DEST[15:0] <-- ZeroExtend(SRC[7:0]);
    DEST[31:16] <-- ZeroExtend(SRC[15:8]);
    DEST[47:32] <-- ZeroExtend(SRC[23:16]);
    DEST[63:48] <-- ZeroExtend(SRC[31:24]);
    DEST[79:64] <-- ZeroExtend(SRC[39:32]);
    DEST[95:80] <-- ZeroExtend(SRC[47:40]);
    DEST[111:96] <-- ZeroExtend(SRC[55:48]);
    DEST[127:112] <-- ZeroExtend(SRC[63:56]);
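The byte-to-word conversion above can be modeled in a few lines of Python. This is only a sketch of the semantics, treating a register as a plain integer; the function names `pmovzxbw` and `punpcklbw` are borrowed from the instruction mnemonics for illustration and are not a real library API.

```python
def pmovzxbw(src: int) -> int:
    """Zero-extend the low 8 bytes of src into eight 16-bit words (128-bit result)."""
    dest = 0
    for i in range(8):
        byte = (src >> (8 * i)) & 0xFF   # SRC[8i+7 : 8i]
        dest |= byte << (16 * i)         # DEST[16i+15 : 16i], upper byte stays zero
    return dest


def punpcklbw(a: int, b: int) -> int:
    """Interleave the low 8 bytes of a and b as a0, b0, a1, b1, ..."""
    dest = 0
    for i in range(8):
        lo = (a >> (8 * i)) & 0xFF
        hi = (b >> (8 * i)) & 0xFF
        dest |= lo << (16 * i)
        dest |= hi << (16 * i + 8)
    return dest


src = 0x0807060504030201
widened = pmovzxbw(src)
# The SSE2 sequence gets the same result by interleaving with a zeroed register.
assert widened == punpcklbw(src, 0)
print(f"{widened:032x}")  # 00080007000600050004000300020001
```

Interleaving with an all-zero register is what forces the SSE2 version to spend a second register and an extra PXOR.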

Benefits
• Fewer instructions (3 to 1)
• Better performance (~40% speedup per loop)
• Reduced register pressure (2 to 1)

Microarchitecture

The Cores
• Single die (107 mm²)
• Two identical cores (L1 cache 64K x 2)
• Shared L2 cache, 6 MB
• No Hyper-Threading, no L3 cache
• Keeps the front-side bus
• Larger L2 cache

Microarchitecture
• 14-stage pipeline
• 4-wide decode
• 4-wide retire
• Macro-fusion
• Enhanced ALUs
• Deeper buffers

Another View

Decode Hardware
• 128-bit fetch bandwidth
• 18-entry IQ
• Complex decode
  - produces 1-4 micro-ops
• Micro-code sequencer

Macro-fusion

New Micro-op
• Represents an instruction pair as a single micro-op

Enhanced ALUs
• Execute the new compare-and-jump (CMPJCC) micro-op in one clock
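A toy decoder makes the idea concrete. The sketch below is a simplification under stated assumptions: instructions are plain strings, only a CMP immediately followed by a conditional jump is fused, and the `decode` helper and tuple layout are hypothetical rather than a description of the real decoders.

```python
def is_conditional_jump(instr: str) -> bool:
    """True for Jcc mnemonics such as JNE or JL, but not for the unconditional JMP."""
    mnemonic = instr.split()[0].upper()
    return mnemonic.startswith("J") and mnemonic != "JMP"


def decode(instrs):
    """Turn a list of instruction strings into micro-ops, fusing CMP + Jcc pairs."""
    uops = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if cur.split()[0].upper() == "CMP" and nxt is not None and is_conditional_jump(nxt):
            # The pair travels through the pipeline as one fused micro-op.
            uops.append(("CMPJCC", cur, nxt))
            i += 2
        else:
            uops.append(("UOP", cur))
            i += 1
    return uops


program = ["MOV EAX, 1", "CMP EAX, EBX", "JNE loop", "ADD EAX, 2"]
for uop in decode(program):
    print(uop)
```

Four instructions decode into three micro-ops here, which is where the savings in decode, buffer, and retire bandwidth come from.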

Out-of-Order Execution
• 96-entry ROB
• 32-entry reservation station

Execution Units
• 6 dispatch ports (1 load, 2 store, 3 universal ports)
• 3 integer ALUs, 2 floating-point ALUs

Branch Predictor
• Loop detector
  - Tracks the number of loop iterations for future reference
• The branch prediction unit (BPU) selects among the following for every branch:
  - bimodal predictor
  - global predictor
  - loop detector
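The loop detector's job of remembering a loop's iteration count can be sketched as follows. This is an illustrative model only: the `LoopDetector` class, its dictionaries, and the "predict taken until the learned count is reached" policy are assumptions, not the documented hardware design.

```python
class LoopDetector:
    """Learns how many times a loop branch is taken before it falls through."""

    def __init__(self):
        self.trip_count = {}  # branch IP -> taken iterations observed in the last full run
        self.current = {}     # branch IP -> taken iterations seen so far in this run

    def predict(self, ip: int) -> bool:
        learned = self.trip_count.get(ip)
        if learned is None:
            return True  # no history yet: assume the loop keeps going
        return self.current.get(ip, 0) < learned

    def update(self, ip: int, taken: bool) -> None:
        if taken:
            self.current[ip] = self.current.get(ip, 0) + 1
        else:
            # Loop exit: remember the count and start over for the next run.
            self.trip_count[ip] = self.current.get(ip, 0)
            self.current[ip] = 0


def run(detector: LoopDetector, ip: int, outcomes) -> int:
    """Feed branch outcomes to the detector and return the number of correct predictions."""
    hits = 0
    for taken in outcomes:
        hits += detector.predict(ip) == taken
        detector.update(ip, taken)
    return hits


detector = LoopDetector()
pattern = [True, True, True, False]        # taken three times, then the loop exits
first = run(detector, 0x400100, pattern)   # the exit is mispredicted while learning
second = run(detector, 0x400100, pattern)  # the exit is now predicted as well
print(first, second)
```

A bimodal counter would keep mispredicting the exit of such a loop on every run, which is the case the loop detector is meant to cover.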

Cache Organization
• Private L1 DCache and ICache: 32 KB/core, 8-way, 64 B line size, write-back (directory-based coherence)
• Shared L2 cache: 8-way, 64 B line size (E8xxx)
  - pros: could mean less bus traffic
  - cons: longer access latency than a private L2 cache; potential conflicts between threads
• FSB: 1333 MHz (E8xxx)
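The L1 parameters listed above (32 KB, 8-way, 64 B lines) determine how an address splits into tag, set index, and line offset. The sketch below works that arithmetic out; the `split_address` helper is hypothetical and it assumes a simple power-of-two set-indexed cache.

```python
CACHE_BYTES = 32 * 1024  # 32 KB L1 cache
WAYS = 8                 # 8-way set associative
LINE_BYTES = 64          # 64 B line size

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # 64 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # 6 bits select a byte within a line
INDEX_BITS = NUM_SETS.bit_length() - 1         # 6 bits select a set


def split_address(addr: int):
    """Return (tag, set index, line offset) for an address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS)  # 64 6 6
print(split_address(0x12345))             # (18, 13, 5)
```

With 6 offset bits and 6 index bits, the set is selected entirely by the low 12 bits of the address.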

Memory Disambiguation
• Aggressive memory dependence speculation based on a hash table indexed by the load's EIP address
• Watchdog mechanism

Prediction Implementation
• History table indexed by instruction pointer
• Each entry in the history array has a saturating counter
• Once the counter saturates, disambiguation is possible on this load (taking effect from the next iteration)
  - the load is allowed to go even if it meets unknown store addresses
• When a particular load fails disambiguation: reset its counter
• Each time a particular load is correctly disambiguated: increment its counter
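The counter rules above can be sketched as a small predictor table. The table size (256 entries), the counter limit (15), and the modulo hash are assumptions chosen for illustration; the slides do not give the real values, and the class name is hypothetical.

```python
class DisambiguationPredictor:
    """IP-indexed table of saturating counters that gates speculative loads."""

    def __init__(self, entries: int = 256, max_count: int = 15):
        self.entries = entries      # assumed table size
        self.max_count = max_count  # assumed saturation value
        self.counters = [0] * entries

    def _index(self, ip: int) -> int:
        return ip % self.entries    # stand-in for the hardware hash

    def may_bypass(self, ip: int) -> bool:
        """A load may pass unknown store addresses only once its counter is saturated."""
        return self.counters[self._index(ip)] == self.max_count

    def update(self, ip: int, conflicted: bool) -> None:
        i = self._index(ip)
        if conflicted:
            self.counters[i] = 0    # failed disambiguation: reset
        else:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)


predictor = DisambiguationPredictor()
load_ip = 0x401000
for _ in range(15):
    predictor.update(load_ip, conflicted=False)
print(predictor.may_bypass(load_ip))   # True: the counter has saturated
predictor.update(load_ip, conflicted=True)
print(predictor.may_bypass(load_ip))   # False: one conflict resets the counter
```

Resetting on a single conflict while requiring a full run of successes keeps the predictor conservative, since a wrong "go" costs a pipeline flush.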

When a load is sent from the RS, set its disambiguation bit.

If it meets an older unknown store address, set "update".

If the prediction is "go", dispatch and set "done".

Else the load is blocked.

A store scans all previous loads in the Load Buffer; if a match is found, the "reset" bit is set.

When the load commits, update the history.

Predictor Lookup

Prediction Verification

Load Dispatch

Execute Disable Bit Support
• AMD Enhanced Virus Protection; ARM eXecute Never
• Helps prevent buffer overflow attacks
• No need for software patches against buffer overflow attacks
• Segregates memory by whether it stores code or data
• The processor disables code execution when malicious worms try to insert code into data buffers (with OS support)

Instruction Pointer Based Prefetcher
• L1 DCache: 2 IP prefetchers/core
• L1 ICache: 1 traditional prefetcher
• L2 cache: 2 IP prefetchers
• Predicts what memory address will be used and delivers it in time
• Records every load's history using the instruction pointer
• IP history array
• Parameters for prefetch traffic control fine-tuned for different platforms
• Prefetch monitor
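One common way to use a per-IP history is stride detection: if the same load instruction keeps advancing by a constant distance, the next address can be fetched early. The sketch below assumes that policy; the `IPStridePrefetcher` class, the 256-entry table, and the "two equal strides" confirmation rule are illustrative assumptions, not details taken from the slides.

```python
class IPStridePrefetcher:
    """Per-load-IP history of (last address, last stride) used to predict the next address."""

    def __init__(self, entries: int = 256):
        self.entries = entries  # assumed size of the IP history array
        self.table = {}         # table index -> (last address, last stride)

    def access(self, ip: int, addr: int):
        """Record a load and return an address to prefetch, or None."""
        idx = ip % self.entries
        prefetch = None
        if idx in self.table:
            last_addr, last_stride = self.table[idx]
            stride = addr - last_addr
            # Prefetch only after the same nonzero stride is seen twice in a row.
            if stride != 0 and stride == last_stride:
                prefetch = addr + stride
            self.table[idx] = (addr, stride)
        else:
            self.table[idx] = (addr, 0)
        return prefetch


prefetcher = IPStridePrefetcher()
load_ip = 0x400800
for addr in (0x1000, 0x1040, 0x1080, 0x10C0):
    target = prefetcher.access(load_ip, addr)
    print(hex(addr), "->", hex(target) if target is not None else None)
```

The first two accesses only build history; from the third access on, the sketch requests the line one stride ahead of the current load.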

Questions?