AMD Microprocessor Technologies Ben Sander AMD Principal Member of Technical Staff 06/21/06 2006.

Page 1

AMD Microprocessor Technologies

Ben Sander, AMD Principal Member of Technical Staff

06/21/06

2006

Page 2

Motivation : PC Jargon Demystified

• “AMD Athlon™ 64 4200+* dual-core processor with 64-bit platform, Direct Connect Architecture and HyperTransport™ Technology for increased multitasking performance; improved security with Enhanced Virus Protection**; Cool'n'Quiet™ Technology to minimize heat and noise”

Page 3

Talk Outline

• Motivation
• Recent innovations
– Dual-core processors
– Direct Connect Architecture™ and HyperTransport™
– Power-efficient design (and Cool’n’Quiet™)
– AMD64 Architecture
• What’s next?
– Direct Connect Architecture™ enhancements
– HTX “Accelerators”
– Core enhancements
– Virtualization and AMD-V
• Summary and Conclusion

Page 4

Dual-Core AMD Opteron™ Processor Design

[Figure: block diagram with CPU0 and CPU1 (each with 1MB L2 Cache), System Request Interface, Crossbar Switch, Memory Controller, and HyperTransport™ links 0, 1, 2; a callout marks the existing AMD Opteron™ processor design.]

• Two AMD Opteron™ processor cores on a single die
– Each with 1MB L2 cache
• Shared Northbridge
– Three HyperTransport™ technology links
– Dual-channel (128 bit) DDR interface
• AMD Opteron processor designed as CMP from the start
– 2nd port on SRI, request management, 2 APICs, clocking microcode
• Two complete CPUs
– Symmetric multiprocessor programming (SMP) model
– Simpler, less restrictive programming model than ‘virtual CPU’ approach

Page 5

MPF 2004 - AMD Dual-Core Processor Chip

Integration:
• Two 64-bit CPU cores
• 2MB L2 cache
• On-chip Northbridge & Memory Controller

Bandwidth:
• Dedicated 64-bit L2 busses for each core
• Dual channel DDR (128-bit) memory bus
• 3 HT links (16-bit each x 2 GT/sec x 2)

Usability and Scalability:
• Socket compatible: Platform and TDP!
• Glueless SMP up to 4 sockets
• Memory capacity & BW scale w/ CPUs

Power Efficiency:
• PowerNow! Optimized power management
• Leadership system level power attributes
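
The HyperTransport figure on this slide ("16-bit each x 2 GT/sec x 2") can be turned into a bandwidth number with a little arithmetic. The sketch below is only an illustration: it assumes the trailing "x 2" counts the two directions of a link and that a 16-bit link moves 2 bytes per transfer. The function name is made up for this example.

```python
def ht_link_bandwidth_gb_s(width_bits: int, gigatransfers_per_s: float,
                           directions: int = 2) -> float:
    """Peak bandwidth of one HyperTransport link in decimal GB/s.

    Assumption: each transfer carries width_bits / 8 bytes, and the
    link is counted in both directions (directions=2).
    """
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * gigatransfers_per_s * directions


# One link as described on the slide: 16 bits x 2 GT/s x 2 directions.
per_link = ht_link_bandwidth_gb_s(16, 2.0)

# The chip has three such links.
all_links = 3 * per_link

print(f"per link: {per_link} GB/s, three links: {all_links} GB/s")
```

Under these assumptions one link peaks at 8 GB/s, which matches the "8 GB/S" labels in the Direct Connect diagrams later in the deck.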

Page 6

AMD64 Dual-Core Physical Design

• 90nm

– Approximately same die size as 130nm single-core AMD Opteron™ processor

– ~205 million transistors

• 68/95 watt power envelope

– Fits into 90nm power infrastructure

• 939/940 Socket compatible

– Fits into existing sockets

Page 7

Dual-Core : Customer Value

• What is it?
– Two processing cores on the same die
• AMD: Clean single-core to multi-core upgrade path
– Same pinout
– Same power envelope!
• Server customers
– Server apps scale extremely well with increasing processors
Transaction processing, web serving
– Doubles compute density
More compute power from the same motherboard
More compute power in a server rack
– More efficient software licensing
• Consumers
– Efficiently run multiple programs at the same time
Operating system + background application
Virus checker + photo-editing software
– Significantly improves performance of threaded applications
Video editing, MP3 encoding

Page 8

Dual-Core AMD Opteron™ Processor Design

[Figure: block diagram with CPU0 and CPU1 (each with 1MB L2 Cache), System Request Interface, Crossbar Switch, Memory Controller, and HyperTransport™ links 0, 1, 2; a callout marks the existing AMD Opteron™ processor design.]

• Two AMD Opteron™ processor cores on a single die
– Each with 1MB L2 cache
• Shared Northbridge
– Three HyperTransport™ technology links
– Dual-channel (128 bit) DDR interface
• AMD Opteron processor designed as CMP from the start
– 2nd port on SRI, request management, 2 APICs, clocking microcode
• Two complete CPUs
– Symmetric multiprocessor programming (SMP) model
– Simpler, less restrictive programming model than ‘virtual CPU’ approach
• AMD Direct Connect Architecture
– Everything connected directly to CPU
– Reduces system architecture bottlenecks
– Further reduces latency by directly connecting two cores on same die

Page 9

Direct Connect : Advantages of good plumbing

[Figure: block diagrams of the two systems compared below. Labels in the figure: Chip, MCP, Memory Controller Hub, XMB, PCI-E Bridge, I/O Hub; SRQ, Crossbar, Mem. Ctrlr, HT, PCIe™ Bridge, I/O Hub, USB, PCI, and 8 GB/s links.]

Legacy x86 Architecture
• 20-year old front-side bus (FSB) architecture
• CPUs, Memory, I/O all share a bus
• Major bottleneck to performance
• Faster CPUs or more cores ≠ performance

AMD64’s Direct Connect Architecture
• Industry-standard technology
• Direct Connect eliminates the FSB bottleneck
• HyperTransport™ interconnect offers scalable high bandwidth and low latency

Page 10

AMD Direct Connect : Customer Value

• What is it?
– Direct connection of CPU to the DRAM/memory
– And CPU-to-CPU for multi-processor systems
• Increased performance
– Reduced memory latency
– Reduced chip communication latency
• Reduced power
– Reduced chip-count in system
– Reduced external pin switching
• Scalability
– Unlocks the potential of faster CPUs and additional cores

Page 11

What’s Consuming all the Power?

• Computer Room Air Conditioner power consumption: 23% - 54%
• Battery Backup power consumption: 6% - 13%
• Lighting power consumption: 1% - 2%
• Server power consumption: 38% - 63%

Server Power Consumption Impacts Power throughout the Datacenter

Page 12

System-level Power Consumption – Present Day

[Figure: block diagrams of the two four-processor systems compared below, using the same components as the Direct Connect diagram. Power labels in the figure: 692 watts, 380 watts, 14 watts, and 8.5 watts (four times).]

Dual-Core Packages with legacy technology
• 692 watts for processors (173w each)
• 48 watts for external memory controller
• Total: 740 watts (95% More Power)

Dual-Core AMD Opteron™ processors
• 380 watts for processors (95w each)
• Integrated memory controllers
• Total: 380 watts

Source: Mixture of publicly available data sheets and AMD internal estimates. Actual system power measurements may vary based on configuration and components used.
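
The "95% More Power" claim follows directly from the numbers on this slide. A small sketch of the arithmetic, using only the slide's figures (the variable names are illustrative, and the processor count of four is inferred from 692 / 173 and 380 / 95):

```python
# Figures from the slide; names are illustrative only.
legacy_cpu_w = 4 * 173       # four dual-core packages at 173 W each
legacy_mem_ctrl_w = 48       # external memory controller
opteron_cpu_w = 4 * 95       # four dual-core AMD Opteron processors at 95 W each

legacy_total_w = legacy_cpu_w + legacy_mem_ctrl_w
opteron_total_w = opteron_cpu_w   # memory controllers are integrated on-die

# Extra power of the legacy system relative to the Opteron system, in percent.
extra_pct = round(100 * (legacy_total_w - opteron_total_w) / opteron_total_w)

print(legacy_total_w, opteron_total_w, extra_pct)
```

The totals come out to 740 watts and 380 watts, and the difference rounds to 95% of the Opteron figure.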

Page 13

Reducing Power and Cooling Requirements with Processor Performance States

P-state   Frequency   Voltage   Power
P0        2600 MHz    1.40 V    ~95 watts
P1        2400 MHz    1.35 V    ~90 watts
P2        2200 MHz    1.30 V    ~76 watts
P3        2000 MHz    1.25 V    ~65 watts
P4        1800 MHz    1.20 V    ~55 watts
P5        1000 MHz    1.10 V    ~32 watts

(P-states are ordered from HIGH to LOW processor utilization.)

Up to 75% power savings!

[Chart: Average CPU Core Power (measured at CPU). Y axis: Power (W), 0 to 25. Workloads: 10500 Connections (~62% CPU Utilization), 5000 Connections (~40% CPU Utilization), Idle (in OS). Series: PowerNow! DISABLED and PowerNow! ENABLED. Labeled reductions: -33%, -62%, -75%.]
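
To see why lowering frequency and voltage together pays off, the P-state table can be compared with the textbook CMOS dynamic-power relation, power proportional to f x V^2. That relation is an assumption brought in for illustration; it is not stated on the slide, and it ignores leakage and other static power, so the listed wattages are expected to sit above the estimate at low P-states.

```python
# P-state table from the slide: (name, MHz, volts, approximate watts).
P_STATES = [
    ("P0", 2600, 1.40, 95),
    ("P1", 2400, 1.35, 90),
    ("P2", 2200, 1.30, 76),
    ("P3", 2000, 1.25, 65),
    ("P4", 1800, 1.20, 55),
    ("P5", 1000, 1.10, 32),
]


def dynamic_power_estimate(mhz, volts, ref=P_STATES[0]):
    """Scale the reference (P0) power by f * V^2.

    Assumed model: dynamic power only, no leakage term.
    """
    _, ref_mhz, ref_volts, ref_watts = ref
    return ref_watts * (mhz / ref_mhz) * (volts / ref_volts) ** 2


def saving_vs_p0(watts):
    """Percentage saving of a listed wattage relative to the P0 wattage."""
    return 100 * (1 - watts / P_STATES[0][3])


for name, mhz, volts, watts in P_STATES:
    est = dynamic_power_estimate(mhz, volts)
    print(f"{name}: listed ~{watts} W, f*V^2 estimate {est:.0f} W, "
          f"saving vs P0 {saving_vs_p0(watts):.0f}%")
```

From the table alone, P5 saves roughly two thirds of the P0 power; the "up to 75%" figure comes from the measured chart rather than from the table.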

Page 14

Power-efficient design : Customer Value

• What is it?
– PowerNow! Technology changes frequency in response to workload
At lower frequencies, voltage is reduced as well
– Power efficiency “designed-in”
Appropriate frequency targets
Integrate external chipset logic (aka Direct Connect)
“Fine gating” and other design-for-power techniques
• Customer value
– Server: Save $$$ on server power and air conditioning
– Desktop: Quieter operation via “Cool’n’Quiet™” technology
– Notebook: Longer battery life

Page 15

AMD64 : Evolutionary 64-bit ISA

• What is it?
– Evolutionary extension to support “64-bits” on x86 processors
– Now an industry standard supported by other processor vendors
• Why 64 bits?
– Driven by apps needing large amounts of memory
CAD tools, large databases, simulations
– 64-bit integer arithmetic
Security and encryption applications
• Why extend x86 to 64 bits?
– x86 is the most widely installed instruction set in the world
– Delivers 64-bit advantages while providing full x86 compatibility
– Doesn’t require a completely new tool chain
• User benefits from 64 bits:
– Large-memory applications
Some applications see 10x speedup from additional memory.
64-bit flat programming model massively easier for software developers
– Some performance improvement from additional registers and wider data operations
– AMD64: Backwards compatibility allows migration on customer’s timeframe

Page 16

Design Goals for AMD64 Technology

• Processor is fully compatible with existing x86 modes
• Straightforward extensions for 64 bits
– Minimize architectural divergences
Maintain consistency with existing architecture
– Minimize instruction set encoding changes
– Straightforward implementation & verification
• Double the number of Integer and SSE registers
• Architectural support for 64 bits of virtual address space and 52 bits of physical address space
– Implementations may support less
• 64-bit integer operations
• Eliminate unused/underutilized arcane x86 features within the context of 64-bit mode
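
To put the address widths above in perspective, the sketch below converts an address width in bits into a byte count. The helper names are made up for this example; the 32-bit row is included only as the familiar legacy x86 reference point.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << bits


def human(n: int) -> str:
    """Format a byte count using binary (IEC) units, rounding down."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    i = 0
    while n >= 1024 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


widths = [
    ("32-bit x86 address", 32),
    ("AMD64 physical address (architectural limit)", 52),
    ("AMD64 virtual address (architectural limit)", 64),
]
for label, bits in widths:
    print(f"{label}: {human(address_space_bytes(bits))}")
```

As the slide notes, these are architectural limits; a given implementation may support fewer bits.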

Page 17

AMD64 Programmer’s Model

[Figure: AMD64 programmer's model register diagram (RAX shown).]

Page 18

REX prefix byte

• Additional registers encoded without altering existing instruction format

• Optional REX prefix specifies 64-bit operation size override
– Plus 3 additional register encoding bits
• REX is actually a family of 16 prefixes (hex 40-4F)
• Average instruction length in 64-bit mode increased by 0.4 bytes

Instruction format: Optional Instruction Prefixes | REX Prefix Byte | Opcode | MODRM | SIB | Displacement | Immediate

REX prefix byte layout:
Bit:    7 6 5 4 3 2 1 0
Value:  0 1 0 0 W R X B
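
The bit layout above (0100WRXB) is simple enough to decode by hand. The following is a minimal sketch of such a decoder; the `Rex` type and `decode_rex` function are names invented for this example, and the comments describing what R, X and B extend follow the usual AMD64 manual convention rather than anything stated on the slide.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: int  # 1 selects 64-bit operand size
    r: int  # extra bit for the ModRM reg field
    x: int  # extra bit for the SIB index field
    b: int  # extra bit for ModRM r/m, SIB base, or opcode register field


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the W/R/X/B bits if byte is a REX prefix (0x40-0x4F), else None."""
    if (byte & 0xF0) != 0x40:
        return None
    return Rex(w=(byte >> 3) & 1, r=(byte >> 2) & 1,
               x=(byte >> 1) & 1, b=byte & 1)


# 48 89 C3 is "mov rbx, rax": REX.W set, no register-extension bits.
print(decode_rex(0x48))
# 41 50 is "push r8": REX.B extends the register field in the opcode.
print(decode_rex(0x41))
# 0x90 (NOP) is not a REX prefix.
print(decode_rex(0x90))
```

Counting the byte values that decode successfully gives the "family of 16 prefixes" mentioned on the slide.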

Page 19

Talk Outline

• Motivation
• Recent innovations
– Dual-core processors
– Direct Connect Architecture™ and HyperTransport™
– Power-efficient design (and Cool’n’Quiet™)
– AMD64 Architecture
• What’s next?
– Direct Connect Architecture™ enhancements
– HTX “Accelerators”
– Core enhancements
– Virtualization and AMD-V
• Summary and Conclusion

Page 20

Co-processors and Accelerators

Promising Concept

Excellent way to get power-efficient performance boosts
Special-purpose, tuned solutions for common functions
Drop to low-power states when not in use

Enabled by Modern APIs

Aligns with modularity imperative
Co-processor becomes another (optional) “IP block”
Micro-architecture: Command delivery, Synchronization, Streaming

Many possible opportunities now, and/or in the future
Media processing
JVM/CLR runtime hosting
NIC integration (TOE, XML, SSL, etc)

Page 21

HyperTransport HTX™ Enables System-level Coprocessing Today

Page 22

AMD’s Next Generation Processor Technology

• Scalable performance and balance
– Faster HyperTransport links (up to 5.2 GT/sec)
– Additional bandwidth enhancements
– On-chip shared L3 cache
• Maintain performance per watt leadership
– Independent NB and CPU power management
– Independent CPU P-state and C-state controls
• Performance on diverse workloads
– Enhanced IPC CPU core; >2X FPU performance
– 48-bit virtual and physical address space
– 1GB large page support
– Platform support for co-processors
• Compatibility
– DDR2 memory support with migration to DDR3
– FBDIMM Gen1 and Gen2 at the appropriate time
– HT-1 backwards compatibility
• Enhanced Virtualization
– I/O Virtualization
– Nested paging support
• Enhanced RAS
– Memory mirroring
– Data poisoning support
– HT retry protocol support

Page 23

AMD’s Next Generation Processor Technology

Native quad core die
– Optimized for 65nm SOI and beyond

Expandable shared L3 cache

IPC enhanced CPU cores
– 32B instruction fetch
– Improved branch prediction
– Out-of-order load execution
– Up to 4 DP FLOPS/cycle
– Dual 128-bit SSE dataflow
– Dual 128-bit loads per cycle
– Improved core and Northbridge prefetchers
– Bit Manipulation extensions (LZCNT/POPCNT)
– SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)

Enhanced Direct Connect Architecture and Northbridge
– HT-3 links (5.2GT/sec)
– Enhanced crossbar
– DDR2 with migration path to DDR3
– FBDIMM when appropriate
– Enhanced power management
– Enhanced RAS
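
For readers unfamiliar with the bit manipulation extensions named above, the sketch below models in software what LZCNT and POPCNT compute. It is an illustration of the results only, not of how the hardware implements them; the function names and the `width` parameter are choices made for this example.

```python
def popcnt(value: int, width: int = 64) -> int:
    """Count the set bits in the low `width` bits of value."""
    return bin(value & ((1 << width) - 1)).count("1")


def lzcnt(value: int, width: int = 64) -> int:
    """Count leading zero bits in a `width`-bit operand.

    A zero operand yields `width`, since every bit is a leading zero.
    """
    value &= (1 << width) - 1
    return width - value.bit_length()


print(popcnt(0xFF))          # eight low bits set
print(lzcnt(1))              # only bit 0 set in a 64-bit operand
print(lzcnt(0, width=32))    # zero operand
```

Having these as single instructions replaces the loops or lookup tables that software would otherwise need for the same results.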

Page 24

Virtualization

Virtualization is the pooling and abstraction of resources in a way that masks the physical nature and boundaries of those resources from the resource users

Page 25

Virtualization: Customer Value

• What is it?
– Allows a single computer to efficiently run multiple guest Operating Systems and associated applications
– AMD-V provides hardware acceleration for virtualization
And simplifies the development process.
• Benefits:
– Consolidation
More efficient use of compute resources
Eliminate “single-application” servers
Consolidate old unsupported servers onto newer hardware
– Migration/reliability
If a server fails, can easily move app to another server
– Allows developers to easily test multiple OS environments on a single machine.
– Upgrades can be tested on hardware before deployment

Page 26

Virtualization Methods

• Software-only virtualization
– Software acts as a translator between OS and hardware
– No need to modify the operating system
– Available today
– Can be slow
• OS-enabled virtualization
– Host OS and virtualization software tightly integrated
Offers improved performance
But requires changes to OS
• Processor-supported virtualization
– Processor protects memory locations so that only virtualization software can access them
– Processor provides hooks on all system-level instructions
– Accelerated performance and better security
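
The idea of "hooks on all system-level instructions" can be pictured with a toy model: ordinary guest instructions run untouched, while system-level ones are handed to the virtualization software. Everything in the sketch below is hypothetical, including the instruction names and the `ToyHypervisor` class; it is not AMD-V's actual interface.

```python
# Hypothetical set of instructions that the "processor" intercepts.
SYSTEM_LEVEL = {"write_cr3", "hlt", "out"}


class ToyHypervisor:
    def __init__(self):
        self.exits = []        # log of intercepted instructions
        self.guest_cr3 = None  # the guest's view of its page-table base

    def handle_exit(self, op, arg):
        """Emulate an intercepted instruction; return False to stop the guest."""
        self.exits.append(op)
        if op == "write_cr3":
            self.guest_cr3 = arg   # record it instead of touching real hardware
        return op != "hlt"


def run_guest(program, hv):
    """Run (op, arg) pairs; system-level ops are routed to the hypervisor."""
    executed_natively = 0
    for op, arg in program:
        if op in SYSTEM_LEVEL:
            if not hv.handle_exit(op, arg):
                break
        else:
            executed_natively += 1   # ordinary instruction, no interception
    return executed_natively


hv = ToyHypervisor()
program = [("add", 1), ("write_cr3", 0x1000), ("add", 2), ("hlt", None), ("add", 3)]
native_count = run_guest(program, hv)
print(native_count, hv.exits, hex(hv.guest_cr3))
```

The point of the model is the split: most instructions never involve the virtualization software, which is where the performance benefit over software-only translation is expected to come from.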

Page 27

AMD-V: Overview

• Virtualization is being used in several server scenarios today

• AMD expects that virtualization will prove valuable for PC clients too

• There are ways to modify the x86 architecture so that virtualization is easier to accomplish, performs better, and provides more security

• AMD’s AMD-V technology is being developed for future AMD64 CPUs for servers and clients

• Key technologies include adding new instructions, supporting different methods of handling page tables, handling host and guest interrupts (including SMI/SMM), and providing DMA protection

Page 28

Summary and Conclusion

AMD is focused on customer-centric innovation and value
– Dual-core processors
– Direct Connect Architecture and HyperTransport
– Power-efficient design
– AMD64 Architecture
– And more!

AMD is investing heavily in extending our leadership
– Next generation Direct Connect Architecture technology
– Next generation CPU technology
– AMD-V and hardware virtualization
– Developing a fundamental understanding of important emerging trends

Page 29

Thank you !

© 2006 Advanced Micro Devices, Inc. All rights reserved.

AMD, the AMD Arrow, AMD Athlon, AMD Opteron and combinations thereof, are trademarks of Advanced Micro Devices, Inc.

HyperTransport is a trademark of the HyperTransport Consortium. PCI-X, PCIe and PCI Express are trademarks of PCI-SIG.

Other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

www.amd.com/power

Page 30

Backup

Page 31

AMD Architectural Generations

Now                         | Coming Soon                 | Future
AMD64 Architecture          | Extensions to AMD64         | FPU Extensions to AMD64
Dual Core Architecture      | Multi-core Architecture     | Throughput Architecture
Direct Connect Architecture | Scalable SMP Architecture   | On-chip Coprocessors
Enhanced Virus Protection   | AMD-V Virtualization        | Secure Execution
HyperTransport™ v1.0, v2.0  | HyperTransport v3.0         | HyperTransport v4.0
DDR, DDR2                   | DDR3, FBDIMM                | DDR4, FBD2
AMD PowerNow!™ Technology   | Partitioned PowerNow!       | System Resource Mgmnt
High Reliability RAS        | Mainframe-class reliability | Best-in-class Reliability
System Performance          | System Perf. / Watt         | Throughput / Watt / $$