Loopback Architecture for Wafer-Level At-Speed Testing of ...

Loopback Architecture for Wafer-Level At-Speed Testing of Embedded HyperTransport™ Processor Links

Alvin Loke, Bruce Doyle, Michael Oshima¹, Wade Williams², Robert Lewis², Charles Wang¹, Audie Hanpachern³, Karen Tucker, Prashanth Gurunath¹, Gladney Asada¹, Chad Lackey, Tin Tin Wee, and Emerson Fang¹

AMD, Fort Collins, Colorado, USA
¹ AMD, Sunnyvale, California, USA
² AMD, Austin, Texas, USA
³ Cortina Systems, Sunnyvale, CA

Custom Integrated Circuits Conference, September 16, 2009

Transcript of Loopback Architecture for Wafer-Level At-Speed Testing of ...

Page 1: Loopback Architecture for Wafer-Level At-Speed Testing of ...


Page 2: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Outline

Motivation

HyperTransport™ Overview

Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver

Silicon Results

Conclusion

Page 3: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Motivation

Processor dies now talk with each other using full-duplex, bidirectional point-to-point links
• High-bandwidth, low-latency communication
• Scalable vs. common FSB architecture
• e.g., HyperTransport™ (HT) in AMD products

The number of I/O ports per die is increasing
• Higher socket counts → more board connectivity
• MCM embedded links → more package connectivity

The cost benefit of sorting for functional I/O before packaging is increasing, especially for MCMs

Implement on-chip I/O loopback for low-cost at-speed wafer-level testing

Page 4: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Die-to-Die Processor Communication

PCB: max 30" trace + 2 connectors
MCM substrate: 4" trace

[Figure: die-to-die link between two Embedded NorthBridges (NB). Labels: Transmitters, Receivers, Data, CDR, Forwarded Clock, PLL, 200MHz SSC.]

Page 5: Loopback Architecture for Wafer-Level At-Speed Testing of ...

HyperTransport™ (HT) Overview

Source synchronous
• Forward half-rate clock for RX data retiming
• Common-mode jitter rejection, low latency

0.4 to 6.4Gb/s (0.4Gb/s steps), NRZ PAM-2

20 lanes per direction (split into 2 sublinks)
• 1 CLK & 9 data (CAD/CTL) lanes per sublink

HT1 (0.4–2.0Gb/s)
• CDR bypassed, data RX simply retimed by CLK RX

HT3 (2.4–6.4Gb/s)
• DLL-based CDR aligns received forwarded CLK to received data transitions for lower BER retiming
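As a rough illustration of the numbers on this slide, the Python sketch below enumerates the HT data rates in 0.4Gb/s steps, labels each as HT1 or HT3, and derives the half-rate forwarded clock frequency and the unit interval. The function and field names are illustrative assumptions and do not come from the HT specification or any AMD tool.

```python
from fractions import Fraction


def ht_rates():
    """Enumerate HT data rates from 0.4 to 6.4 Gb/s in 0.4 Gb/s steps."""
    rows = []
    for mbps in range(400, 6401, 400):
        rows.append({
            "rate_gbps": mbps / 1000,
            # HT1 covers 0.4-2.0 Gb/s (CDR bypassed); HT3 covers 2.4-6.4 Gb/s.
            "gen": "HT1" if mbps <= 2000 else "HT3",
            # The forwarded clock runs at half the data rate.
            "clk_mhz": mbps // 2,
            # Unit interval in picoseconds: 1e12 / (mbps * 1e6).
            "ui_ps": float(Fraction(10**6, mbps)),
        })
    return rows


def lanes_per_direction(sublinks=2, clk_per_sublink=1, data_per_sublink=9):
    """Total lanes in one direction: 2 sublinks x (1 CLK + 9 CAD/CTL)."""
    return sublinks * (clk_per_sublink + data_per_sublink)
```

At the top rate of 6.4Gb/s this gives a 3200MHz forwarded clock and a 156.25ps unit interval, which is the timing budget the CDR has to work within.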

Page 6: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Opteron™ 6000 Processor (G34 MCM)

[Figure: MCM with Die 0 (Master) and Die 1 (Slave). Labels per die: Core 0 to Core 5, NorthBridge, DDR3, DDCL, HT-Port0 and HT-Port3 with Sublink 0 and Sublink 1. The dies are joined by an embedded full HT link and an embedded half HT link.]

Page 7: Loopback Architecture for Wafer-Level At-Speed Testing of ...

HT Link Training (Handshaking)

Coordinated by NB-IOC in both dies

Each NB-IOC sends predefined training pattern to the other die

Training arms CDR to align clock to data & signals start of data transfer

Number of data lanes enabled depends on link traffic

[Figure: link datapath between Die1 HT Port and Die2 HT Port over the Board or Package Channel. TX-side labels: NorthBridge I/O Controller (NB-IOC), Training Pattern, Encoder, FIFO, 4:1 Serializer, TX, TX PLL Clock. RX-side labels: RX, CDR, 1:4 Deserializer, FIFO, Decoder, RX Clock, CLK Lane. Other labels: Processor Cores, Shared Memory, NorthBridge Core, NB Clock, Data Lane.]

Page 8: Loopback Architecture for Wafer-Level At-Speed Testing of ...

HT Data Transfer

Data transfer starts immediately after last bit of training

Once data transfer is completed, HT port is disabled into one of several possible sleep states for power saving

Data is scrambled by XOR or by 8b/10b to reduce ISI

[Figure: same link datapath as on the link training slide, between Die1 HT Port and Die2 HT Port over the Board or Package Channel. Labels: NorthBridge I/O Controller (NB-IOC), Training Pattern, Encoder, FIFO, 4:1 Serializer, TX, TX PLL Clock, RX, CDR, 1:4 Deserializer, Decoder, RX Clock, CLK Lane, Processor Cores, Shared Memory, NorthBridge Core, NB Clock, Data Lane.]
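The XOR scrambling mentioned on this slide can be sketched as an additive scrambler: the data is XORed with a pseudo-random keystream, and the receiver XORs with the same keystream to recover it. The slides do not give the polynomial or seed used by HT, so the PRBS7 generator (x^7 + x^6 + 1) and the seed below are assumptions chosen only for illustration.

```python
def prbs7_keystream(seed=0x7F):
    """Yield bits from a 7-bit Fibonacci LFSR, polynomial x^7 + x^6 + 1.

    The polynomial and seed are illustrative assumptions, not HT's actual scrambler.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield (state >> 6) & 1
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F


def xor_scramble(bits, seed=0x7F):
    """XOR each data bit with the keystream; applying it twice restores the data."""
    return [b ^ k for b, k in zip(bits, prbs7_keystream(seed))]
```

Scrambling a long run of identical bits yields a mix of ones and zeros on the wire, and running `xor_scramble` on the result with the same seed returns the original data.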

Page 9: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Outline

Motivation

HyperTransport™ Links in AMD Processors

Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver

Silicon Results

Conclusion

Page 10: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Enabling Internal Serial Loopback

TX → RX serial loopback via on-chip channel

No external channel required, hence test can be performed at wafer-level sort

NB-IOC initiates link by sending training bits, then user-specified test pattern

RX is self-trained using bits sent by own TX

Controlled by JTAG

[Figure: single-die loopback datapath with START at the TX side and FINISH at the RX side. Labels: NorthBridge I/O Controller (NB-IOC), Training Pattern, Test Pattern, Bit Error Counter, Encoder, FIFO, 4:1 Serializer, TX, HT Serial Loopback, RX, CDR, 1:4 Deserializer, Decoder, TX PLL Clock, RX Clock, CLK Lane, HT Port, Data Lane, NB Clock, Processor Cores, Shared Memory, NorthBridge Core.]
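The test sequence described on this slide (training bits, then a user-specified test pattern, with errors counted at the receiver) can be modelled in a few lines of Python. This is a behavioural sketch only: the helper names are hypothetical, the channel is an arbitrary function, and JTAG control and CDR training are not modelled.

```python
def run_loopback(test_bits, channel, training_bits=None):
    """Send training bits, then the test pattern, and count test-pattern mismatches.

    Training bits pass through the channel but are not checked for errors.
    """
    if training_bits is None:
        training_bits = [1, 0] * 16  # placeholder training sequence (assumption)
    received = channel(list(training_bits) + list(test_bits))
    rx_test = received[len(training_bits):]
    return sum(1 for tx, rx in zip(test_bits, rx_test) if tx != rx)


def ideal_channel(bits):
    """Error-free loopback channel."""
    return list(bits)


def flaky_channel(flip_positions):
    """Build a channel that inverts the bits at the given stream positions."""
    def channel(bits):
        out = list(bits)
        for pos in flip_positions:
            out[pos] ^= 1
        return out
    return channel
```

For example, `run_loopback(pattern, flaky_channel([35, 42]))` reports two errors with the default 32-bit training sequence, because both flipped positions fall inside the test pattern.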

Page 11: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Sublink0 Loopback

[Figure: HT Port with TX Sublink Select, On-Chip Loopback Channels, and RX Sublink Select. Labels: Training Pattern & Test Pattern (NB-IOC) feeding the TX lanes, Bit Error Counters (NB-IOC) on the RX lanes. Lanes shown on both TX and RX sides: CLK0, CTL0, CAD0 to CAD7 and CLK1, CTL1, CAD8 to CAD15.]

Page 12: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Sublink1 Loopback

[Figure: HT Port with TX Sublink Select, On-Chip Loopback Channels, and RX Sublink Select. Labels: Training Pattern & Test Pattern (NB-IOC) feeding the TX lanes, Bit Error Counters (NB-IOC) on the RX lanes. Lanes shown on both TX and RX sides: CLK0, CTL0, CAD0 to CAD7 and CLK1, CTL1, CAD8 to CAD15.]

Page 13: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Transceiver Loopback Floorplan

Horizontal HT Port shown

[Figure: floorplan of RX lanes and TX lanes joined by the On-Chip Loopback Channel. Lane order as listed for both RX and TX: CAD0, CAD8, CAD1, CAD9, CAD2, CAD10, CAD3, CAD11, CLK0, CLK1, CAD4, CAD12, CAD5, CAD13, CAD6, CAD14, CAD7, CAD15, CTL0, CTL1.]

Sublink 0 (CLK0, CTL0, CAD0–7)
Sublink 1 (CLK1, CTL1, CAD8–15)
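The lane grouping and floorplan order on this slide can be written down as a small lookup, which makes the interleaving visible: neighbouring lanes in the floorplan alternate between sublink 0 and sublink 1. The function names below are hypothetical; the lane names and order are taken from the slide.

```python
def sublink_lanes(sublink):
    """Lanes of one sublink: its CLK, its CTL, and eight CAD lanes."""
    if sublink not in (0, 1):
        raise ValueError("sublink must be 0 or 1")
    first = 8 * sublink
    cads = [f"CAD{i}" for i in range(first, first + 8)]
    return [f"CLK{sublink}", f"CTL{sublink}"] + cads


def floorplan_order():
    """Lane order as listed on the floorplan slide (same for RX and TX)."""
    order = []
    for i in range(4):
        order += [f"CAD{i}", f"CAD{i + 8}"]
    order += ["CLK0", "CLK1"]
    for i in range(4, 8):
        order += [f"CAD{i}", f"CAD{i + 8}"]
    order += ["CTL0", "CTL1"]
    return order


def sublink_of(lane):
    """Return the sublink (0 or 1) that a lane name belongs to."""
    for sublink in (0, 1):
        if lane in sublink_lanes(sublink):
            return sublink
    raise ValueError(f"unknown lane: {lane}")
```

Mapping `sublink_of` over `floorplan_order()` gives a strictly alternating 0, 1, 0, 1, ... sequence across all 20 lanes.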

Page 14: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Wafer-Level Testing

[Photo labels: probe tower, probe card, pogo interposer, contact bumped wafer]

LTX Sapphire platform

Page 15: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Wafer-Level Test Supply Noise

[Figure: simulated bump supply voltage (V) waveforms, with probe card pin model.]

Supply noise comes primarily from TX drivers switching high currents through probe card pin inductance

Can disable any TX driver per sublink during loopback

Simulated with 1, 8 & 16 TX drivers enabled
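To first order, the noise mechanism named on this slide is the inductive drop V = L · di/dt across the probe card pins, so it scales with the number of TX drivers switching at once. The sketch below shows only that scaling. The inductance, per-driver current step, edge time, and supply pin count are placeholder assumptions, not values from the slides, and the actual simulation used a probe card pin model that this does not reproduce.

```python
def supply_droop_v(n_drivers, pin_inductance_h, di_per_driver_a,
                   edge_time_s, n_supply_pins=1):
    """First-order inductive supply noise estimate, V = L_eff * di/dt.

    Assumes all enabled drivers switch simultaneously and that the supply
    pins act as identical inductors in parallel.
    """
    l_eff = pin_inductance_h / n_supply_pins
    di_dt = n_drivers * di_per_driver_a / edge_time_s
    return l_eff * di_dt


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    for n in (1, 8, 16):
        v = supply_droop_v(n, pin_inductance_h=2e-9, di_per_driver_a=0.01,
                           edge_time_s=100e-12, n_supply_pins=20)
        print(f"{n:2d} drivers enabled: {v * 1000:.0f} mV")
```

Under this simple model the noise grows linearly with the number of enabled drivers, which is why being able to disable individual TX drivers during loopback is useful at wafer sort.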

Page 16: Loopback Architecture for Wafer-Level At-Speed Testing of ...

TX Loopback Implementation

4-tap FFE
Hybrid V-/I-mode output driver

[Figure: TX datapath. Labels: Data from NB-IOC, FIFO, staging flops, Serializer, TX PLL, FFE taps (pre-cursor, cursor, post-cursor 1, post-cursor 2), Output Driver, TX.]
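A 4-tap FFE with one pre-cursor, the cursor, and two post-cursor taps behaves like a short FIR filter on the transmitted symbols. The Python sketch below shows that behaviour on NRZ symbols. The tap weights used in the usage note are hypothetical, since the slides do not give the coefficients, and the hybrid V-/I-mode driver itself is not modelled.

```python
def ffe_4tap(bits, taps):
    """Apply a 4-tap feed-forward equalizer to a bit sequence.

    taps = (pre_cursor, cursor, post_cursor_1, post_cursor_2)
    y[n] = pre*x[n+1] + cur*x[n] + post1*x[n-1] + post2*x[n-2],
    with bits mapped to NRZ symbols +1/-1 and zero outside the sequence.
    """
    symbols = [1 if b else -1 for b in bits]
    pre, cur, post1, post2 = taps

    def x(i):
        return symbols[i] if 0 <= i < len(symbols) else 0

    return [
        pre * x(n + 1) + cur * x(n) + post1 * x(n - 1) + post2 * x(n - 2)
        for n in range(len(symbols))
    ]
```

With illustrative taps such as `(-0.1, 0.7, -0.2, 0.0)`, the first bit after a transition is driven at a larger amplitude than later bits in the same run, which is the de-emphasis effect an FFE uses to counter channel loss.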

Page 17: Loopback Architecture for Wafer-Level At-Speed Testing of ...

RX Loopback Implementation

Full-rate architecture
Equalization: 1-bit speculative DFE + analog DFR filter

[Figure: RX datapath with main and auxiliary paths. Labels: Delay-Locked Loop, Alexander Phase Detector, Serial Loopback Signal, Single-Ended to Differential Converter.]
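The Alexander phase detector named in the figure is a bang-bang detector: it compares two consecutive data samples with the edge sample taken between them and reports only whether the clock is early or late. The sketch below captures that decision logic. The sign convention (-1 for early, +1 for late) and the function names are assumptions made for this example, and the slides do not describe how the DLL consumes the decisions.

```python
def alexander_pd(prev_data, edge, curr_data):
    """Bang-bang phase decision from three binary samples.

    Returns 0 when there is no data transition (no timing information),
    -1 when the edge sample still equals the previous bit (clock early),
    +1 when the edge sample already equals the current bit (clock late).
    """
    if prev_data == curr_data:
        return 0
    return -1 if edge == prev_data else 1


def vote(samples):
    """Sum the early/late decisions over a list of (prev, edge, curr) triples."""
    return sum(alexander_pd(p, e, c) for p, e, c in samples)
```

A loop filter would typically accumulate such votes and shift the sampling phase in the direction that drives the sum toward zero.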

Page 18: Loopback Architecture for Wafer-Level At-Speed Testing of ...

External Serial Loopback

Package-level sort test

Provides test coverage not exercised by internal serial loopback
• TX output driver
• RX analog front end
• TX & RX equalization

Can inject jitter into external channel for eye margining

[Figure: single-die loopback datapath with START at the TX side, FINISH at the RX side, and Jitter injected in the loopback path. Labels: NorthBridge I/O Controller (NB-IOC), Training Pattern, Test Pattern, Bit Error Counter, Encoder, FIFO, 4:1 Serializer, TX, HT Serial Loopback, RX, CDR, 1:4 Deserializer, Decoder, TX PLL Clock, RX Clock, CLK Lane, HT Port, Data Lane, NB Clock, Processor Cores, Shared Memory, NorthBridge Core.]

Page 19: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Parallel Loopback Modes

Package-level sort test

RX → TX parallel loopback in HT or in NB-IOC

Requires another HT port or BERT to initialize link & provide test pattern to RX

Enables fault isolation

Page 20: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Outline

Motivation

HyperTransport™ Links in AMD Processors

Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver

Silicon Results

Conclusion

Page 21: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Loopback Test Description

Wafers
• 12" bumped AMD Opteron™ 6000 processors (45nm SOI-CMOS)

Test conditions
• 1.1V, 1.3V
• 5.2Gb/s, 6.4Gb/s

Test pattern
• 10⁸ cycles of alternating +K28.5 & −K28.5
• Passing test → BER < 5 × 10⁻¹⁰

Conway et al., Hot Chips 2009
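The test pattern and pass criterion above can be worked through numerically. The sketch below builds the alternating K28.5 pattern from the two standard 10-bit 8b/10b encodings of K28.5 and computes the largest error count that still satisfies BER < 5 × 10⁻¹⁰. Two points are assumptions: which encoding the slide calls "+K28.5", and that one "cycle" is one 10-bit symbol.

```python
import math
from fractions import Fraction

# The two 10-bit 8b/10b encodings of K28.5. Which one the slide labels
# "+K28.5" is an assumption here.
K28_5_A = "1100000101"
K28_5_B = "0011111010"


def k28_5_pattern(n_symbols):
    """Alternate the two K28.5 encodings for n_symbols symbols."""
    return "".join(K28_5_A if i % 2 == 0 else K28_5_B for i in range(n_symbols))


def max_passing_errors(n_cycles, ber_limit=Fraction(5, 10**10), bits_per_cycle=10):
    """Largest integer error count with errors / total_bits strictly below ber_limit."""
    limit = ber_limit * n_cycles * bits_per_cycle
    return math.ceil(limit) - 1
```

With 10⁸ cycles of 10 bits, the limit works out to 0.5 errors, so only an error-free run passes. The same holds if a cycle is read as a 20-bit pair of symbols (`bits_per_cycle=20`), where the limit is exactly 1 error and the inequality is strict.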

Page 22: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Early Example of Test Sort Results

Die No.   HT Loopback Fail Description
1         Port3 Sublink0 @ 6.4Gb/s, 1.1V: CAD2 bit error count = 63 (saturated)
2         Port0 Sublink0 @ 6.4Gb/s, 1.1V: CAD2 bit error count = 2
3         Port0 Sublink0/Sublink1 @ 5.2,6.4Gb/s, 1.1,1.3V: Training failure in all CTL/CAD lanes
4         Port0 Sublink0 @ 6.4Gb/s, 1.1,1.3V: CAD2 bit error count = 63 (saturated)
5         Port3 Sublink0 @ 6.4Gb/s, 1.1V: CAD4 bit error count = 63 (saturated)
6         Port0 Sublink0 @ 6.4Gb/s, 1.1V: CAD0 bit error count = 2
7         Port2 Sublink0 @ 6.4Gb/s, 1.1V: CAD4 bit error count = 63 (saturated)
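The "63 (saturated)" entries suggest a per-lane error counter that stops at its maximum value instead of wrapping around. A minimal Python sketch of such a saturating counter follows. The 6-bit width is inferred from 63 = 2⁶ − 1 and is an assumption, as are the class and method names.

```python
class SaturatingCounter:
    """Error counter that clamps at its maximum instead of wrapping to zero."""

    def __init__(self, width_bits=6):
        # width_bits=6 is inferred from the saturated value 63 (assumption).
        self.max_value = (1 << width_bits) - 1
        self.value = 0

    def count(self, n=1):
        """Add n errors, clamping at max_value, and return the new count."""
        self.value = min(self.max_value, self.value + n)
        return self.value

    @property
    def saturated(self):
        """True once the counter has reached its maximum value."""
        return self.value == self.max_value
```

A reading of 63 therefore means "at least 63 errors", while a small reading such as 2 is an exact count.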

Page 23: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Conclusion

Transceiver loopback enables wafer-level at-speed testing of HyperTransport I/O

Demonstrated 6.4Gb/s test functionality

Entirely digital architecture for simple implementation & verification

Significantly improves package-level yield, especially for more expensive MCM packages

Adds no extra sort infrastructure cost

Established test for wafer-level screen of AMD 45nm products

Page 24: Loopback Architecture for Wafer-Level At-Speed Testing of ...

Acknowledgments

AMD Product Engineering Organization

Michael Parker

Heidi Grande

Dennis Fischette

Dicai Yang

Dean Gonzales

Tim Kasper

Ari Shtulman