Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next...

22
1 AMD Athlon TM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck, Steve Polzin, Sridhar Subramanian, Fred Weber Advanced Micro Devices, Sunnyvale, CA * AMD Athlon was formerly code-named AMD-K7

Transcript of Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next...

Page 1: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

1

AMD AthlonTM Northbridge with 4x AGP andNext Generation Memory Subsystem

Chetana Keltcher, Jim Kelly, Ramani Krishnan,

John Peck, Steve Polzin, Sridhar Subramanian,

Fred Weber

Advanced Micro Devices, Sunnyvale, CA

* AMD Athlon was formerly code-named AMD-K7

Page 2: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

2

Outline of the Talk

• Introduction

• Architecture

• Clocking and Gearbox

• Performance

• Silicon Statistics

• Conclusion

Page 3: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

3

Introduction

• NorthBridge: “Electronic traffic cop” that directs data flowbetween the main memory and the rest of the system

• Bridge the gap between processor speed and memory speed– Higher bandwidth busses

• Example: AGP 2.0, EV6 and AMD Athlon system bus

– Better memory technology• Example: Double data rate SDRAM, RDRAM

Page 4: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

4

CPU

CPU

200 MHz, 64 bitsScaleable to 400 MHz

DRAM

GraphicsDevice

SouthBridge

PCIDevices

IDE USB PrinterPort

SerialPort

66MHz, 32 bits(1x,2x,4x)

ISA Bus

33MHz, 32 bits

PCI Bus

AGP Bus

NorthBridge

100MHz for PC-100 SDRAM200MHz for DDR SDRAM533-800MHz for RDRAM

AMD AthlonSystem Bus

System Block Diagram

Page 5: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

5

• Can support one or two AMD Athlon or EV6 processors

• 200MHz data rate (scaleable to 400MHz), 64-bit processor interface

• 33MHz, 32-bit PCI 2.2 compliant interface

• 66MHz, 32-bit AGP 2.0 compliant interface supports 1x, 2x and 4xdata transfer modes

• Versions for SDRAM, DDR SDRAM and RDRAM memory

• Single bit error correction and multiple bit error detection (ECC)

• Distributed Graphics Aperture Remapping Table (GART)

• Power management features including powerdown self-refresh ofSDRAM and PCI master grant suspend

Features of the AMD Athlon Northbridge

Page 6: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

6

L2Cache

BIU0

BIU1

Optional

PLLs

MRO

PCI

AGP

APC

MCT

Memory

PCI Slots

AGPMaster

CFG

ControlData

GTW

L2Cache Different versions for

PC100 SDRAM, DDRSDRAM and RDRAM

Architecture

Page 7: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

7

Bus Interface Unit (BIU)

• Processor interface; one per processor

• The AMD Athlon Front side bus:– high performance point-to-point bus

– non-multiplexed command, address, data and reply/snoopinterfaces

– double-data rate transfers on address and data busses

– split transaction bus: up to 24 outstanding operations per processor

– source synchronized clocking to compensate for PC boardpropagation delays

Page 8: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

8

ProbeResponse

CommandQueue

Rcv

ReplyRead Q

ReplyMemoryWrite Q

ReplyPCI / APCWrite Q

TransactionAgent

Xmt Mem ReadFIFO

PCI / APCRead FIFO

DRAM

PCI / APC

MRO, PCI & APC

Probe Q

MemoryWrite FIFO

Probe DataFIFO

PCI / APCWrite FIFO

I/O

DRAMPCI / APC

AM

D A

thlo

nS

yste

m B

us

Other BIU, PCI & APC

Other BIU

Other BIU orPCI or APC

Done MRO, PCI & APC

BIU Block Diagram

Page 9: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

9

Accelerated Graphics Port (AGP)

• 1x, 2x and 4x data transfer rates in pipelined and sideband addressing (SBA) modes

• Fast Write implementation for CPU to AGP master writetransfers

• Distributed GART mechanism using 3 fully associativetranslation lookaside buffers (TLBs)

• Dynamically compensated AGP 2.0 compliant I/Obuffers

• Independent R/W data buffers– Reordering of high priority over low priority transactions

Page 10: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

10

AGP Block Diagram

A/DRecv

SBARecv

A/DXmit

Addr

SBASeq

GARTXlat

Seq

SDI

Arb

WbT

Write Fifo8x144

Read Fifo32x128

TLB Miss

AGPAddress

AGP I/O AGP DataFIFOs

AGP Read Data

AGP Write Data

Mem ReadData

MemReq

AXQ

WrQ

RdQ

Mem WriteData

AG

P B

us

AG

P B

us

AGPQueues

Page 11: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

11

PCI Controller

• The PCI arbiter supports five external PCI masters plusthe SouthBridge

• PCI to memory traffic is coherent w.r.t. the processorcaches

• Consecutive processor cycles to sequential PCIaddresses are chained together into one burst PCI cycle

• Same controller block is instantiated to handle the AGPPCI Protocol (APC block)

Page 12: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

12

Memory Request Organizer (MRO)

• Request crossbar responsible for scheduling memoryread and write requests from CPU, PCI and AGP

• Serves as the coherence point

• Requests are reordered to minimize page conflicts andmaximize page hits

• Anti-starvation mechanism by aging of entries

• Arbitration bypassed during idle conditions to improvelatency

Page 13: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

13

SDRAM Memory Controller• Controls up to 3 PC100 SDRAM or 4 DDR SDRAM

DIMMs. 16, 64, 128 and 256Mb densities supported

• Open page policy with 4 banks open

• Peak bandwidth = 800MB/sec (PC100 SDRAM),1.6GB/sec (DDR)

MemoryRequestArbiter

Memorycontroller RAS, CAS, WE, CS

72 bitsECC logic

BIU read data

Read data

Write data

Mem WriteMem Read

AGP R/W

GART

Page 14: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

14

Rambus Memory Controller

• One 16-bit RDRAM channel with upto 32 devices distributedacross 3 RIMMs. 64, 128 and 256Mb densities supported

• Rambus RMC uses a closed page policy, but can keep banksopen with special chained commands

• Memory address mapped to reduce adjacent bank conflicts

• Peak bandwidth = 1.6GB/sec using 800MHz RDRAMs

RMC

RDRAM channel533-800 MHz

RAC

RMD

RMH

144 bits

18 bits

MRO

Page 15: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

15

Clocking and Gear Boxes

• Many clock domains have to be supported– 66 MHz peripheral (PCI and AGP) logic clock

– 66 / 133 / 266 MHz 1x / 2x / 4x AGP clock

– 66 / 100 / 133 MHz Core logic clock, AMD Athlon bus clockand SDRAM clock

• Gear box logic is used to synchronize data transfersbetween two clock domains

• The gearbox logic is correct by design for holdtimeacross clock domains

• The Rambus RAC synchronizes with the RMC using adifferent gearbox design

Page 16: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

16

Flop

D Q

SlowCLK

CombinationalLogic Latch

D Q

Ctlmask

CombinationalLogic

G

FastCLK

LatchEn

Gear Box Module

Flop

D Q

FastCLK

Slow clock domain Fast clock domain

1

0

Slow to Fast Clock Domain

SlowCLK(66 MHz)

FastCLK(100 MHz)

Phase 2 Phase 2Phase 3 Phase 1

D0Data D1 D2

LatchEn

D0Latched Data D1 D2

CtlMask

Phase 1

Sample points on theFastCLK domain

Page 17: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

17

Fast to Slow Clock Domain

SlowCLK(66 MHz)

FastCLK (100 MHz)

Phase 2 Phase 2Phase 3 Phase 1

D0Data D1

Phase 3

LatchEn

D0LatchedData D1 D4

D2 D3 D4

D3

Sample points on theFastCLK domain

Flop

D Q

FastCLK

CombinationalLogic Latch

D Q CombinationalLogic

G

FastCLK

LatchEn

Gear Box Module

Flop

D Q

SlowCLK

Fast clock domain Slow clock domain

0

1

Page 18: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

18

Performance

• Bypassing:– Can bypass MRO and BIU under low system loads for 25%

reduction in latency of CPU reads from main memory

– Overall performance boost equivalent to ~1/2 CPU speed binon Ziff-Davis WinBench® benchmark

• SPECint95 and SPECfp95 benchmarks on AMD Athlonand Pentium® III– 512K L2 cache, Single PC100 128MB DIMM

– http://www.amd.com/products/cpg/athlon/benchmarks.html

Page 19: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

19

Performance

AMD AthlonTM 600MHz[512K]

AMD Athlon 550MHz[512K]

III 550MHz[512K]

Pentium

146%

153%

Specfp_base95

109%

118%

Specint_base95

AMD AthlonTM 600MHz[512K]

AMD Athlon 550MHz[512K]

III 550MHz[512K]

Pentium100% 100%

Normalized Pentium III Performance = 1

RR

R

Benchmark System Configuration: Diamond 770 using nVidia TNT2 Ultra 150MHz core, 183MHz memory clock 32MB, WesternDigi tal Expert 41800, Single PC-100 128MB DIMM, SoundBlaster Live (Value) Audio, LinkSys HPN100 Ethernet card, Toshiba 6XDVD SD-M1212, Dual Boot Windows® 98 & Windows NT® 4.0 using Norton System Commander. Windows NT 4.0 is installed withSP4, Windows 98 with DX 6.1A build 2150, and nVidia TNT2 Ultra Driver Rev 1.81 under Windows 98 and Windows NT.

AMD AthlonTM processor-based system: Reference Motherboard Rev. B*, Bios Rev AFTB00-2, Bus Mastering EIDE Driver v1.03,AGP miniport v4.41.

Pentium® III processor-based system: ASUS P2B Rev 1.02, BIOS Rev 1008 beta 4, EIDE-BM Driver 5/11/98, AGP miniport 5/11/98.

* This motherboard is not commercially available at this time.

Page 20: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

20

Silicon Statistics

ChipVersion

Tech &Voltage

Max CoreSpeed

Die Size(pad limited)

No. ofPins

SDRAM,1P, 2xAGP

0.35 µ,3.3V

100 MHz 107 mm2 492

SDRAM,2P, 2xAGP

0.35 µ,3.3V

100 MHz 130 mm2 656

DDR,1P, 4xAGP

0.25 µ,2.5V

133 MHz 133 mm2 553

DDR,2P, 4xAGP

0.25 µ,2.5V

133 MHz

RDRAM,1P, 4xAGP

0.25 µ,2.5V

133 MHz 107 mm2 492

Page 21: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

21

Die photo: SDRAM, 2P, 2xAGP

MRO

PCI/APC

AGP

CFG

MCT

BIU0

BIU1

FY0

FY1

PLLApprox. 500K gates11.43x11.43mm2

Page 22: Next Generation Memory Subsystem - Hot Chips · AMD AthlonTM Northbridge with 4x AGP and Next Generation Memory Subsystem Chetana Keltcher, Jim Kelly, Ramani Krishnan, John Peck,

22

Conclusions

• Low cost, high performance system solution for theAMD Athlon processor - EV6 in a PC

• Multiprocessing architecture for workstation and servermarkets

• Design provides for a high degree of concurrency whichoptimizes throughput under heavy system loads

• Support three memory sub-systems:– PC-100 SDRAM

– Double Data Rate SDRAM

– Direct Rambus SDRAM