AMD Microprocessor Technologies Ben Sander AMD Principal Member of Technical Staff 06/21/06 2006.
AMD Microprocessor Technologies
Ben Sander, AMD Principal Member of Technical Staff
06/21/06
2006
Motivation : PC Jargon Demystified
• “AMD Athlon™ 64 4200+* dual-core processor with 64-bit platform, Direct Connect Architecture and HyperTransport™ Technology for increased multitasking performance; improved security with Enhanced Virus Protection**; Cool'n'Quiet™ Technology to minimize heat and noise”
Talk Outline
• Motivation
• Recent innovations
– Dual-core processors
– Direct Connect Architecture™ and HyperTransport™
– Power-efficient design (and Cool’n’Quiet™)
– AMD64 Architecture
• What’s next?
– Direct Connect Architecture™ enhancements
– HTX “Accelerators”
– Core enhancements
– Virtualization and AMD-V
• Summary and Conclusion
Dual-Core AMD Opteron™ Processor Design
[Block diagram. Recoverable labels: CPU0, CPU1, 1MB L2 Cache (one per core), System Request Interface, Crossbar Switch, Memory Controller, HyperTransport™ links 0, 1, 2, and "Existing AMD Opteron™ Processor Design".]
• Two AMD Opteron™ processor cores on a single die
– Each with 1MB L2 cache
• Shared Northbridge
– Three HyperTransport™ technology links
– Dual-channel (128 bit) DDR interface
• AMD Opteron processor designed as CMP from the start
– 2nd port on SRI, request management, 2 APICs, clocking microcode
• Two complete CPUs
– Symmetric multiprocessor programming (SMP) model
– Simpler, less restrictive programming model than ‘virtual CPU’ approach
MPF 2004 - AMD Dual-Core Processor Chip
Integration:
• Two 64-bit CPU cores
• 2MB L2 cache
• On-chip Northbridge & Memory Controller
Bandwidth:
• Dedicated 64-bit L2 busses for each core
• Dual channel DDR (128-bit) memory bus
• 3 HT links (16-bit each x 2 GT/sec x 2)
Usability and Scalability:
• Socket compatible: Platform and TDP!
• Glueless SMP up to 4 sockets
• Memory capacity & BW scale w/ CPUs
Power Efficiency:
• PowerNow! Optimized power management
• Leadership system level power attributes
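The HT link figure on this slide (16-bit each x 2 GT/sec x 2) can be turned into a peak-bandwidth number. This is a minimal sketch, assuming the final "x 2" counts the two directions of a link and that 1 GB means 10^9 bytes; the function and variable names are illustrative and do not come from the talk.

```python
def ht_link_peak_gb_per_s(width_bits: int = 16,
                          transfers_gt_per_s: float = 2.0,
                          directions: int = 2) -> float:
    """Peak bandwidth of one link in GB/s, summed over all directions."""
    bytes_per_transfer = width_bits / 8          # 16 bits -> 2 bytes
    return bytes_per_transfer * transfers_gt_per_s * directions


per_link = ht_link_peak_gb_per_s()               # one 16-bit link
all_links = 3 * per_link                         # three links per chip
print(f"per link: {per_link} GB/s, three links: {all_links} GB/s")
```

Under those assumptions each link peaks at 8 GB/s and the three links together at 24 GB/s.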
AMD64 Dual-Core Physical Design
• 90nm
– Approximately same die size as 130nm single-core AMD Opteron™ processor
– ~205 million transistors
• 68/95 watt power envelope
– Fits into 90nm power infrastructure
• 939/940 Socket compatible
– Fits into existing sockets
Dual-Core : Customer Value
• What is it?
– Two processing cores on the same die
• AMD: Clean single-core to multi-core upgrade path
– Same pinout
– Same power envelope!
• Server customers
– Server apps scale extremely well with increasing processors
    - Transaction processing, web serving
– Doubles compute density
    - More compute power from the same motherboard
    - More compute power in a server rack
– More efficient software licensing
• Consumers
– Efficiently run multiple programs at the same time
    - Operating system + background application
    - Virus checker + photo-editing software
– Significantly improves performance of threaded applications
    - Video editing, MP3 encoding
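How much a threaded application gains from a second core depends on how much of its work can run in parallel. The sketch below uses Amdahl's law, which the talk does not mention, as one common way to estimate this; the parallel fractions are made-up illustration values, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when only part of the work can be spread across cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: half, 90%, and all of the work is parallel.
for fraction in (0.5, 0.9, 1.0):
    print(f"{fraction:.0%} parallel on 2 cores: "
          f"{amdahl_speedup(fraction, 2):.2f}x")
```

The model ignores memory and cache effects, so it is an upper bound on what the second core can add rather than a prediction.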
Dual-Core AMD Opteron™ Processor Design
• AMD Direct Connect Architecture
– Everything connected directly to CPU
– Reduces system architecture bottlenecks
– Further reduces latency by directly connecting two cores on same die
Direct Connect: Advantages of good plumbing
[Figure: side-by-side system diagrams for the two architectures. Recoverable labels: Chip X, MCP, Memory Controller Hub, XMB, SRQ, Crossbar, Mem. Ctrlr, HT, 8 GB/s links, PCIe™ Bridge, I/O Hub, USB, PCI.]
Legacy x86 Architecture
• 20-year old front-side bus (FSB) architecture
• CPUs, Memory, I/O all share a bus
• Major bottleneck to performance
• Faster CPUs or more cores ≠ performance
AMD64’s Direct Connect Architecture
• Industry-standard technology
• Direct Connect eliminates the FSB bottleneck
• HyperTransport™ interconnect offers scalable high bandwidth and low latency
AMD Direct Connect : Customer Value
• What is it?
– Direct connection of CPU to the DRAM/memory
– And CPU-to-CPU for multi-processor systems
• Increased performance
– Reduced memory latency
– Reduced chip communication latency
• Reduced power
– Reduced chip-count in system
– Reduced external pin switching
• Scalability
– Unlocks the potential of faster CPUs and additional cores
What’s Consuming all the Power?
• Computer Room Air Conditioner power consumption: 23% - 54%
• Battery Backup power consumption: 6% - 13%
• Lighting power consumption: 1% - 2%
• Server power consumption: 38% - 63%
Server Power Consumption Impacts Power throughout the Datacenter
System-level Power Consumption – Present Day
[Figure: side-by-side system diagrams with power labels. Recoverable labels: 692 watts and 380 watts for the two processor groups, four labels of 8.5 watts, one label of 14 watts, 8 GB/s links, Memory Controller Hub, XMB, MCP, Chip X, SRQ, Crossbar, Mem. Ctrlr, HT, PCIe™ Bridge, I/O Hub, USB, PCI.]
Dual-Core Packages with legacy technology
• 692 watts for processors (173w each)
• 48 watts for external memory controller
• 740 watts total: 95% More Power
Dual-Core AMD Opteron™ processors
• 380 watts for processors (95w each)
• Integrated memory controllers
• 380 watts total
Source: Mixture of publicly available data sheets and AMD internal estimates. Actual system power measurements may vary based on configuration and components used
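The totals on this slide can be checked with a few lines of arithmetic. The sketch assumes four processors per system (692 / 173 and 380 / 95 both give four) and treats the 8.5 watt and 14 watt figure labels as the pieces of the external memory controller budget, which is an inference from the numbers rather than something the slide states.

```python
SOCKETS = 4                                   # inferred: 692 / 173 = 380 / 95 = 4

legacy_cpu_w = SOCKETS * 173                  # processors, legacy system
legacy_mem_ctrl_w = 48                        # external memory controller
legacy_total_w = legacy_cpu_w + legacy_mem_ctrl_w

amd_total_w = SOCKETS * 95                    # memory controllers are on-chip

extra_percent = 100 * (legacy_total_w - amd_total_w) / amd_total_w
figure_labels_w = 4 * 8.5 + 14                # labels found in the figure

print(f"legacy: {legacy_total_w} W, AMD: {amd_total_w} W, "
      f"extra: {extra_percent:.1f}%")
print(f"figure labels sum to {figure_labels_w} W")
```

The legacy system comes out at 740 W against 380 W, about 95% more, and the figure labels add up to the 48 W quoted for the external memory controller.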
Reducing Power and Cooling Requirements with Processor Performance States
P-State   Frequency   Voltage   Power
P0        2600MHz     1.40V     ~95 watts   (HIGH processor utilization)
P1        2400MHz     1.35V     ~90 watts
P2        2200MHz     1.30V     ~76 watts
P3        2000MHz     1.25V     ~65 watts
P4        1800MHz     1.20V     ~55 watts
P5        1000MHz     1.10V     ~32 watts   (LOW processor utilization)
Up to 75% power savings!
[Chart: Average CPU Core Power in watts (measured at CPU), axis from 0 to 25, comparing PowerNow! DISABLED and PowerNow! ENABLED at three load points: 10500 connections (~62% CPU utilization), 5000 connections (~40% CPU utilization), and idle (in OS). Labeled reductions: -33%, -62%, -75%.]
Power-efficient design : Customer Value
• What is it?
– PowerNow! Technology changes frequency in response to workload
    - At lower frequencies, voltage is reduced as well
– Power efficiency “designed-in”
    - Appropriate frequency targets
    - Integrate external chipset logic (aka Direct Connect)
    - “Fine gating” and other design-for-power techniques
• Customer value
– Server: Save $$$ on server power and air conditioning
– Desktop: Quieter operation via “Cool’n’Quiet™” technology
– Notebook: Longer battery life
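Lowering voltage along with frequency matters because switching power in CMOS logic scales roughly with frequency times voltage squared. The sketch below applies that textbook approximation, which the talk does not state, to the frequency and voltage pairs from the P-state slide and prints the result next to the wattage quoted there.

```python
# (name, frequency in MHz, voltage in V, watts quoted on the P-state slide)
P_STATES = [
    ("P0", 2600, 1.40, 95),
    ("P1", 2400, 1.35, 90),
    ("P2", 2200, 1.30, 76),
    ("P3", 2000, 1.25, 65),
    ("P4", 1800, 1.20, 55),
    ("P5", 1000, 1.10, 32),
]


def relative_dynamic_power(freq_mhz: float, volts: float,
                           ref_freq_mhz: float, ref_volts: float) -> float:
    """Dynamic power relative to a reference point, using P ~ f * V^2."""
    return (freq_mhz / ref_freq_mhz) * (volts / ref_volts) ** 2


_, f0, v0, w0 = P_STATES[0]                   # P0 is the reference state
for name, freq, volts, quoted_w in P_STATES:
    scale = relative_dynamic_power(freq, volts, f0, v0)
    print(f"{name}: model {scale * w0:5.1f} W, slide ~{quoted_w} W")
```

For P5 the model gives roughly 23 W against the ~32 W on the slide. One plausible reason for the gap is power that does not scale with clock and voltage in this simple way, such as leakage, but the talk does not break the numbers down.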
AMD64 : Evolutionary 64-bit ISA
• What is it?
– Evolutionary extension to support “64-bits” on x86 processors
– Now an industry standard supported by other processor vendors
• Why 64 bits?
– Driven by apps needing large amounts of memory
    - CAD tools, large databases, simulations
– 64-bit integer arithmetic
    - Security and encryption applications
• Why extend x86 to 64 bits?
– x86 is the most widely installed instruction set in the world
– Delivers 64-bit advantages while providing full x86 compatibility
– Doesn’t require a completely new tool chain
• User benefits from 64 bits:
– Large-memory applications
    - Some applications see 10x speedup from additional memory.
    - 64-bit flat programming model massively easier for software developers
– Some performance improvement from additional registers and wider data operations
– AMD64: Backwards compatibility allows migration on customer’s timeframe
Design Goals for AMD64 Technology
• Processor is fully compatible with existing x86 modes
• Straightforward extensions for 64 bits
– Minimize architectural divergences
    - Maintain consistency with existing architecture
– Minimize instruction set encoding changes
– Straightforward implementation & verification
• Double the number of Integer and SSE registers
• Architectural support for 64 bits of virtual address space and 52 bits of physical address space
– Implementations may support less
• 64-bit integer operations
• Eliminate unused/underutilized arcane x86 features within the context of 64-bit mode
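To put the address widths above in perspective, this short sketch converts an address width in bits into a byte-addressable capacity. The 32-bit row is added only for comparison, and the helper names are illustrative.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def address_space_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def human_readable(num_bytes: int) -> str:
    """Format an exact power-of-two byte count using binary units."""
    unit = 0
    while num_bytes >= 1024 and unit < len(UNITS) - 1:
        num_bytes //= 1024
        unit += 1
    return f"{num_bytes} {UNITS[unit]}"


for label, bits in (("32-bit x86", 32),
                    ("AMD64 physical", 52),
                    ("AMD64 virtual", 64)):
    print(f"{label}: {human_readable(address_space_bytes(bits))}")
```

The three rows come out as 4 GiB, 4 PiB, and 16 EiB.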
AMD64 Programmer’s Model
[Figure: AMD64 register diagram; only the label RAX is recoverable.]
REX prefix byte
• Additional registers encoded without altering existing instruction format
• Optional REX prefix specifies 64-bit operation size override
– Plus 3 additional register encoding bits
• REX is actually a family of 16 prefixes (40-4F)
• Average instruction length in 64-bit mode increased by 0.4 bytes
Instruction format: Optional Instruction Prefixes | REX Prefix Byte | Opcode | MODRM | SIB | Displacement | Immediate
REX prefix byte layout:
Bit:    7 6 5 4 3 2 1 0
Value:  0 1 0 0 W R X B
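The bit layout above can be decoded mechanically. This is a minimal sketch in Python rather than a real disassembler: it only recognizes the REX byte itself, assumes the byte is being interpreted in 64-bit mode, and uses class and function names invented for the example.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: bool  # 1 = 64-bit operand size
    r: bool  # extra high bit for the ModRM reg field
    x: bool  # extra high bit for the SIB index field
    b: bool  # extra high bit for ModRM r/m, SIB base, or opcode register


def decode_rex(byte: int) -> Optional[Rex]:
    """Return the REX fields, or None if the byte is not in 40h-4Fh."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    if byte >> 4 != 0b0100:                   # upper nibble must be 0100
        return None
    return Rex(w=bool(byte & 0b1000), r=bool(byte & 0b0100),
               x=bool(byte & 0b0010), b=bool(byte & 0b0001))


def extended_register(extension_bit: bool, field3: int) -> int:
    """Combine a REX extension bit with a 3-bit field into a 4-bit number."""
    if not 0 <= field3 <= 7:
        raise ValueError("field must fit in 3 bits")
    return (int(extension_bit) << 3) | field3


valid = [b for b in range(256) if decode_rex(b) is not None]
print(decode_rex(0x48))                       # W set, R/X/B clear
print(len(valid), hex(valid[0]), hex(valid[-1]))
print(extended_register(True, 0))             # register number 8
```

The four low bits are why the slide calls REX a family of 16 prefixes, and the extra bit per register field is what doubles the register count from 8 to 16.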
Co-processors and Accelerators
• Promising Concept
– Excellent way to get power-efficient performance boosts
– Special-purpose, tuned solutions for common functions
– Drop to low-power states when not in use
• Enabled by Modern APIs
• Aligns with modularity imperative
– Co-processor becomes another (optional) “IP block”
– Micro-architecture: Command delivery, Synchronization, Streaming
• Many possible opportunities now, and/or in the future
– Media processing
– JVM/CLR runtime hosting
– NIC integration (TOE, XML, SSL, etc)
HyperTransport HTXTM Enables System-level Coprocessing Today
AMD’s Next Generation Processor Technology
• Scalable performance and balance
– Faster HyperTransport links (up to 5.2 GT/sec)
– Additional bandwidth enhancements
– On-chip shared L3 cache
• Maintain performance per watt leadership
– Independent NB and CPU power management
– Independent CPU P-state and C-state controls
• Performance on diverse workloads
– Enhanced IPC CPU core; >2X FPU performance
– 48-bit virtual and physical address space
– 1GB large page support
– Platform support for co-processors
• Compatibility
– DDR2 memory support with migration to DDR3
– FBDIMM Gen1 and Gen2 at the appropriate time
– HT-1 backwards compatibility
• Enhanced Virtualization
– I/O Virtualization
– Nested paging support
• Enhanced RAS
– Memory mirroring
– Data poisoning support
– HT retry protocol support
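One way to see why 1GB large pages help large-memory workloads is to count how many page mappings a region needs at each page size. The sketch below compares the new 1 GiB size with the 4 KiB and 2 MiB pages already available on x86-64; the 64 GiB region is a hypothetical example, not a figure from the talk.

```python
KIB, MIB, GIB = 2 ** 10, 2 ** 20, 2 ** 30

PAGE_SIZES = {"4 KiB": 4 * KIB, "2 MiB": 2 * MIB, "1 GiB": 1 * GIB}


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Number of pages required to cover a region (ceiling division)."""
    return -(-region_bytes // page_bytes)


region = 64 * GIB                             # hypothetical in-memory data set
for name, size in PAGE_SIZES.items():
    print(f"{name} pages: {pages_needed(region, size):,}")
```

Fewer mappings for the same memory means each translation entry covers more of the working set.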
AMD’s Next Generation Processor Technology
• Native quad core die
– Optimized for 65nm SOI and beyond
• Expandable shared L3 cache
• IPC enhanced CPU cores
– 32B instruction fetch
– Improved branch prediction
– Out-of-order load execution
– Up to 4 DP FLOPS/cycle
– Dual 128-bit SSE dataflow
– Dual 128-bit loads per cycle
– Improved core and Northbridge prefetchers
– Bit Manipulation extensions (LZCNT/POPCNT)
– SSE extensions (EXTRQ/INSERTQ, MOVNTSD/MOVNTSS)
• Enhanced Direct Connect Architecture and Northbridge
– HT-3 links (5.2GT/sec)
– Enhanced crossbar
– DDR2 with migration path to DDR3
– FBDIMM when appropriate
– Enhanced power management
– Enhanced RAS
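For readers unfamiliar with the two bit-manipulation instructions named above, here is a software model of what they compute: POPCNT counts the set bits of its operand and LZCNT counts the leading zero bits, returning the operand width when the input is zero. The Python function names and the default 64-bit width are choices made for this sketch.

```python
def popcnt(value: int, width: int = 64) -> int:
    """Count the 1 bits in the low `width` bits of value."""
    mask = (1 << width) - 1
    return bin(value & mask).count("1")


def lzcnt(value: int, width: int = 64) -> int:
    """Count leading zero bits; returns `width` when the input is zero."""
    mask = (1 << width) - 1
    return width - (value & mask).bit_length()


print(popcnt(0xFF))        # eight bits set
print(lzcnt(1))            # only bit 0 set in a 64-bit operand
print(lzcnt(0))            # zero input yields the operand width
```

Doing this in a single instruction replaces the loops or lookup tables that software otherwise needs for the same result.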
Virtualization
Virtualization is the pooling and abstraction of resources in a way that masks the physical nature and boundaries of those resources from the resource users
Virtualization: Customer Value
• What is it?
– Allows a single computer to efficiently run multiple guest Operating Systems and associated applications
– AMD-V provides hardware acceleration for virtualization
    - And simplifies the development process.
• Benefits:
– Consolidation
    - More efficient use of compute resources
    - Eliminate “single-application” servers
    - Consolidate old unsupported servers onto newer hardware
– Migration/reliability
    - If a server fails, can easily move app to another server
– Allows developers to easily test multiple OS environments on a single machine.
– Upgrades can be tested on hardware before deployment
Virtualization Methods
• Software-only virtualization
– Software acts as a translator between OS and hardware
– No need to modify the operating system
– Available today
– Can be slow
• OS-enabled virtualization
– Host OS and virtualization software tightly integrated
    - Offers improved performance
    - But requires changes to OS
• Processor-supported virtualization
– Processor protects memory locations so that only virtualization software can access them
– Processor provides hooks on all system-level instructions
– Accelerated performance and better security
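The idea of "hooks on all system-level instructions" can be pictured with a toy model: ordinary guest instructions run directly, while system-level ones are handed to the virtualization software. Everything in this sketch is hypothetical, including the instruction names and the class; it illustrates the control flow only and is not how AMD-V is programmed.

```python
# Illustrative stand-ins for instructions a hypervisor would want to intercept.
SYSTEM_LEVEL = {"HLT", "CPUID", "WRITE_PAGE_TABLE_BASE"}


class ToyHypervisor:
    """Records every intercepted instruction and returns an emulated result."""

    def __init__(self) -> None:
        self.intercepts = []

    def on_intercept(self, guest_id: int, instruction: str) -> str:
        self.intercepts.append((guest_id, instruction))
        return f"emulated {instruction}"


def run_guest(guest_id: int, instructions, hypervisor: ToyHypervisor):
    """Run a guest's instruction list, diverting system-level ones."""
    results = []
    for instruction in instructions:
        if instruction in SYSTEM_LEVEL:
            # The "hook": control passes to the virtualization software.
            results.append(hypervisor.on_intercept(guest_id, instruction))
        else:
            results.append(f"native {instruction}")
    return results


hv = ToyHypervisor()
print(run_guest(0, ["ADD", "CPUID", "MOV", "HLT"], hv))
print(hv.intercepts)
```

The guest never needs to know the hook exists, which is the property that lets an unmodified operating system run as a guest.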
AMD-V: Overview
• Virtualization is being used in several server scenarios today
• AMD expects that virtualization will prove valuable for PC clients too
• There are ways to modify the x86 architecture so that virtualization is easier to accomplish, performs better, and provides more security
• AMD’s AMD-V technology is being developed for future AMD64 CPUs for servers and clients
• Key technologies include adding new instructions, supporting different methods of handling page tables, handling host and guest interrupts (including SMI/SMM), and providing DMA protection
Summary and Conclusion
AMD is focused on customer-centric innovation and value
– Dual-core processors
– Direct Connect Architecture and HyperTransport
– Power-efficient design
– AMD64 Architecture
– And more!
AMD is investing heavily in extending our leadership
– Next generation Direct Connect Architecture technology
– Next generation CPU technology
– AMD-V and hardware virtualization
– Developing a fundamental understanding of important emerging trends
Thank you !
© 2006 Advanced Micro Devices, Inc. All rights reserved.
AMD, the AMD Arrow, AMD Athlon, AMD Opteron and combinations thereof, are trademarks of Advanced Micro Devices, Inc.
HyperTransport is a trademark of the HyperTransport Consortium.
PCI-X, PCIe and PCI Express are trademarks of PCI-SIG.
Other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.
www.amd.com/power
Backup
AMD Architectural Generations
Now                          | Coming Soon                 | Future
AMD64 Architecture           | Extensions to AMD64         | FPU Extensions to AMD64
Dual Core Architecture       | Multi-core Architecture     | Throughput Architecture
Direct Connect Architecture  | Scalable SMP Architecture   | On-chip Coprocessors
Enhanced Virus Protection    | AMD-V Virtualization        | Secure Execution
HyperTransport™ v1.0, v2.0   | HyperTransport v3.0         | HyperTransport v4.0
DDR, DDR2                    | DDR3, FBDIMM                | DDR4, FBD2
AMD PowerNow!™ Technology    | Partitioned PowerNow!       | System Resource Mgmnt
High Reliability RAS         | Mainframe-class reliability | Best-in-class Reliability
System Performance           | System Perf. / Watt         | Throughput / Watt / $$