1
Loopback Architecture for Wafer-Level At-Speed Testing of Embedded HyperTransport™ Processor Links
Alvin Loke, Bruce Doyle, Michael Oshima¹, Wade Williams², Robert Lewis², Charles Wang¹, Audie Hanpachern³, Karen Tucker, Prashanth Gurunath¹, Gladney Asada¹, Chad Lackey, Tin Tin Wee, and Emerson Fang¹
AMD, Fort Collins, Colorado, USA
¹ AMD, Sunnyvale, California, USA
² AMD, Austin, Texas, USA
³ Cortina Systems, Sunnyvale, CA
Custom Integrated Circuits Conference
September 16, 2009
2
Outline
Motivation
HyperTransport™ Overview
Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver
Silicon Results
Conclusion
3
Motivation
Processor dies now talk with each other using full-duplex, bidirectional point-to-point links
• High-bandwidth, low-latency communication
• Scalable vs. common FSB architecture
• e.g., HyperTransport™ (HT) in AMD products
The number of I/O ports per die is increasing
• Higher socket counts → more board connectivity
• MCM embedded links → more package connectivity
The cost benefit of sorting for functional I/O before packaging is increasing, especially for MCMs
Implement on-chip I/O loopback for low-cost at-speed wafer-level testing
4
Die-to-Die Processor Communication
PCB – max 30” trace + 2 connectors
MCM substrate – 4” trace
[Figure: two dies, each with an embedded NorthBridge (NB), a PLL, transmitters, and receivers. Labels: Data, Forwarded Clock, CDR, 200MHz SSC.]
5
HyperTransport™ (HT) Overview
Source synchronous
• Forward half-rate clock for RX data retiming
• Common-mode jitter rejection, low latency
0.4 to 6.4Gb/s (0.4Gb/s steps) – NRZ PAM-2
20 lanes per direction (split into 2 sublinks)
• 1 CLK & 9 data (CAD/CTL) lanes per sublink
HT1 (0.4–2.0Gb/s)
• CDR bypassed, data RX simply retimed by CLK RX
HT3 (2.4–6.4Gb/s)
• DLL-based CDR aligns received forwarded CLK to received data transitions for lower BER retiming
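The rate and lane figures on this slide can be tabulated in a few lines. The following is a minimal sketch, not part of the presentation: the function and constant names are hypothetical, rates are kept in Mb/s to avoid floating-point steps, and the forwarded clock is taken as half the bit rate because the slide describes it as a half-rate clock.

```python
# Illustrative only: names are hypothetical; numbers are taken from the slide.
RATE_STEP_MBPS = 400        # 0.4 Gb/s steps
MIN_RATE_MBPS = 400         # 0.4 Gb/s
MAX_RATE_MBPS = 6400        # 6.4 Gb/s
HT1_MAX_RATE_MBPS = 2000    # HT1 covers 0.4-2.0 Gb/s; HT3 covers 2.4-6.4 Gb/s
SUBLINKS_PER_DIRECTION = 2
DATA_LANES_PER_SUBLINK = 9  # CAD/CTL lanes
CLK_LANES_PER_SUBLINK = 1


def supported_rates_mbps():
    """All per-lane data rates from 0.4 to 6.4 Gb/s in 0.4 Gb/s steps."""
    return list(range(MIN_RATE_MBPS, MAX_RATE_MBPS + 1, RATE_STEP_MBPS))


def describe_rate(rate_mbps):
    """Classify a rate as HT1 or HT3 and derive the forwarded clock frequency."""
    if rate_mbps not in supported_rates_mbps():
        raise ValueError(f"unsupported rate: {rate_mbps} Mb/s")
    is_ht1 = rate_mbps <= HT1_MAX_RATE_MBPS
    return {
        "generation": "HT1" if is_ht1 else "HT3",
        "cdr_enabled": not is_ht1,             # CDR is bypassed in HT1
        "forwarded_clock_mhz": rate_mbps // 2,  # half-rate forwarded clock
    }


def lanes_per_direction():
    """Total lanes per direction across both sublinks."""
    return SUBLINKS_PER_DIRECTION * (CLK_LANES_PER_SUBLINK + DATA_LANES_PER_SUBLINK)
```

For example, `describe_rate(6400)` reports an HT3 link with the CDR enabled and a 3200 MHz forwarded clock.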
6
Opteron™ 6000 Processor (G34 MCM)
[Figure: MCM with Die 0 (Master) and Die 1 (Slave). Each die shows Cores 0–5, a NorthBridge, DDR3 interfaces, and HT ports (HT-Port0, HT-Port3 and others) split into Sublink 0 and Sublink 1. Labels: embedded full HT link, embedded half HT link.]
7
HT Link Training (Handshaking)
Coordinated by NB-IOC in both dies
Each NB-IOC sends predefined training pattern to the other die
Training arms CDR to align clock to data & signals start of data transfer
Number of data lanes enabled depends on link traffic
[Figure: HT link between the Die1 and Die2 HT ports over a board or package channel. Blocks shown: processor cores, shared memory, NorthBridge core, NorthBridge I/O Controller (NB-IOC), NB clock, training pattern, encoder, FIFO, 4:1 serializer, TX, TX PLL clock, data lane, CLK lane, RX, RX clock, CDR, 1:4 deserializer, FIFO, decoder.]
8
HT Data Transfer
Data transfer starts immediately after last bit of training
Once data transfer is completed, HT port is disabled into one of several possible sleep states for power saving
Data is scrambled by XOR or by 8b/10b to reduce ISI
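XOR scrambling can be illustrated with an additive scrambler: the data bits are XORed with a pseudo-random keystream, which breaks up long runs of identical bits, and XORing again with the same keystream restores the data. The sketch below is an assumption-laden illustration, not the HT scrambler: the 7-bit LFSR polynomial (x^7 + x^6 + 1), the seed, and the function names are all hypothetical choices made for brevity.

```python
def lfsr_keystream(seed, nbits):
    """Yield nbits from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1).

    The polynomial and register width are illustrative assumptions,
    not the ones defined by the HyperTransport specification.
    """
    state = seed & 0x7F
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    for _ in range(nbits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at bits 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def scramble(bits, seed=0x5A):
    """XOR each data bit with the keystream; applying it twice restores the data."""
    return [b ^ k for b, k in zip(bits, lfsr_keystream(seed, len(bits)))]
```

Because XOR is its own inverse, the receiver descrambles by calling `scramble` again with the same seed; a run of 32 zeros, for instance, leaves the scrambler as a mix of ones and zeros.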
[Figure: same HT link block diagram as on the HT Link Training slide.]
9
Outline
Motivation
HyperTransport™ Links in AMD Processors
Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver
Silicon Results
Conclusion
10
Enabling Internal Serial Loopback
TX → RX serial loopback via on-chip channel
No external channel required, hence test can be performed at wafer-level sort
NB-IOC initiates link by sending training bits, then user-specified test pattern
RX is self-trained using bits sent by own TX
Controlled by JTAG
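The test flow described above (training bits, then a test pattern, looped from TX back to RX and checked by a bit error counter) can be mimicked in software. This is a behavioral sketch under stated assumptions rather than the hardware implementation: the function names are hypothetical, faults are modeled as bit flips at chosen positions, and the counter ceiling of 63 is assumed from the saturated counts reported in the sort results later in the deck.

```python
COUNTER_MAX = 63  # assumption: counter saturates at 63, as in the reported sort results


def loopback_channel(bits, flip_positions=()):
    """Model the on-chip loopback channel; flip bits at given indices to emulate faults."""
    flips = set(flip_positions)
    return [(b ^ 1) if i in flips else b for i, b in enumerate(bits)]


def run_internal_loopback(training, test_pattern, flip_positions=()):
    """Send training then test bits through the loopback and count test-pattern errors."""
    training = list(training)
    test_pattern = list(test_pattern)
    received = loopback_channel(training + test_pattern, flip_positions)

    # Self-training: the RX must recognize the training bits sent by its own TX.
    if received[:len(training)] != training:
        return {"trained": False, "bit_errors": None}

    # Compare the received test pattern with what was sent, saturating the counter.
    errors = 0
    for tx_bit, rx_bit in zip(test_pattern, received[len(training):]):
        if tx_bit != rx_bit and errors < COUNTER_MAX:
            errors += 1
    return {"trained": True, "bit_errors": errors}
```

A corrupted training bit is reported as a training failure instead of a bit error count, which mirrors the two kinds of failure listed in the sort results.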
[Figure: HT port in internal serial loopback, with the test path marked from START to FINISH. Blocks shown: processor cores, shared memory, NorthBridge core, NorthBridge I/O Controller (NB-IOC), NB clock, training pattern, test pattern, encoder, FIFO, 4:1 serializer, TX, TX PLL clock, HT serial loopback, data lane, CLK lane, RX, RX clock, CDR, 1:4 deserializer, FIFO, decoder, bit error counter.]
11
Sublink0 Loopback
[Figure: HT port with TX sublink select, RX sublink select, and on-chip loopback channels. The NB-IOC provides the training pattern and test pattern on the TX side and the bit error counters on the RX side. Lanes shown: TX-CLK0, TX-CTL0, TX-CAD0 to TX-CAD7, TX-CLK1, TX-CTL1, TX-CAD8 to TX-CAD15, and the matching RX lanes.]
12
Sublink1 Loopback
[Figure: HT port with TX sublink select, RX sublink select, and on-chip loopback channels. The NB-IOC provides the training pattern and test pattern on the TX side and the bit error counters on the RX side. Lanes shown: TX-CLK0, TX-CTL0, TX-CAD0 to TX-CAD7, TX-CLK1, TX-CTL1, TX-CAD8 to TX-CAD15, and the matching RX lanes.]
13
Transceiver Loopback Floorplan
Horizontal HT Port shown
[Figure: floorplan with a row of RX lanes and a row of TX lanes, each in the order CAD0, CAD8, CAD1, CAD9, CAD2, CAD10, CAD3, CAD11, CLK0, CLK1, CAD4, CAD12, CAD5, CAD13, CAD6, CAD14, CAD7, CAD15, CTL0, CTL1. Legend: Sublink 0 (CLK0, CTL0, CAD0–7), Sublink 1 (CLK1, CTL1, CAD8–15), On-Chip Loopback Channel.]
14
Wafer-Level Testing
probe tower
probe card
pogo interposer
contact bumped wafer
LTX Sapphire platform
15
Wafer-Level Test Supply Noise
[Figure: simulated Bump Supply Voltage (V) and the Probe Card Pin Model.]
Comes primarily from TX driver switching high currents through probe card pin inductance
Can disable any TX driver per sublink during loopback
Simulated with 1, 8 & 16 TX drivers enabled
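The dependence of supply noise on the number of enabled drivers follows from the inductor relation V = L · dI/dt. The sketch below is a first-order estimate only, assuming all enabled TX drivers switch at the same instant and share one effective probe card pin inductance; the function name and every numeric value in the usage note are hypothetical, not figures from the presentation, which relied on circuit simulation.

```python
def supply_noise_v(n_drivers, di_per_driver_a, dt_s, l_pin_h):
    """First-order inductive supply noise, V = L * dI/dt.

    Assumes (hypothetically) that all n_drivers switch simultaneously and
    draw their current step through one effective pin inductance l_pin_h.
    """
    if n_drivers < 0 or dt_s <= 0:
        raise ValueError("n_drivers must be >= 0 and dt_s must be > 0")
    return l_pin_h * n_drivers * di_per_driver_a / dt_s
```

With placeholder values such as `supply_noise_v(n, 0.005, 100e-12, 0.1e-9)`, the estimate grows linearly with `n`, which is why disabling TX drivers per sublink during loopback reduces the noise seen at the bump.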
16
TX Loopback Implementation
4-tap FFE
Hybrid V-/I-mode output driver
[Figure: TX data path. Labels: data from NB-IOC, FIFO, serializer, TX PLL, staging flops, FFE taps (pre-cursor, cursor, post-cursor 1, post-cursor 2), output driver, TX.]
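A 4-tap FFE with one pre-cursor, the cursor, and two post-cursor taps behaves like a short FIR filter on the transmitted symbol stream. The following is a minimal behavioral sketch under assumptions: symbols are modeled as +1/−1, symbols beyond the ends of the sequence are treated as 0, and the tap weights in the usage note are hypothetical values chosen only to show de-emphasis, not the weights used in the design.

```python
def ffe_output(symbols, taps):
    """Apply a 4-tap feed-forward equalizer to a list of +1/-1 symbols.

    taps = (pre_cursor, cursor, post_cursor_1, post_cursor_2).
    Symbols outside the sequence are treated as 0 (an assumption).
    """
    pre, cur, post1, post2 = taps
    n = len(symbols)
    out = []
    for k in range(n):
        next_sym = symbols[k + 1] if k + 1 < n else 0  # pre-cursor looks one bit ahead
        prev1 = symbols[k - 1] if k >= 1 else 0        # post-cursor 1: previous bit
        prev2 = symbols[k - 2] if k >= 2 else 0        # post-cursor 2: two bits back
        out.append(pre * next_sym + cur * symbols[k] + post1 * prev1 + post2 * prev2)
    return out
```

With illustrative taps such as `(-0.1, 0.7, -0.15, -0.05)`, a bit that follows a transition is driven with a larger amplitude than a bit in the middle of a run, which is the intended pre-compensation for channel loss.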
17
RX Loopback Implementation
Full-rate architecture
Equalization: 1-bit speculative DFE + analog DFR filter
[Figure: RX data path. Labels: main, auxiliary, delay-locked loop, Alexander phase detector, single-ended to differential converter, serial loopback signal.]
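The Alexander (bang-bang) phase detector named in the figure makes an early/late decision from three samples: the previous data bit, an edge sample taken between bits, and the current data bit. The sketch below shows only this textbook decision rule; the function name and the −1/0/+1 return convention are assumptions, and how the decision steers the delay-locked loop in this design is not described in the slides.

```python
def alexander_pd(prev_data, edge, curr_data):
    """Textbook Alexander phase detector decision.

    Returns -1 if the clock is early, +1 if it is late,
    and 0 when there is no data transition to measure against.
    """
    if prev_data == curr_data:
        return 0   # no transition: no phase information
    if edge == prev_data:
        return -1  # edge sample still sees the old bit: clock is early
    return +1      # edge sample already sees the new bit: clock is late


def net_phase_vote(samples):
    """Sum decisions over (prev_data, edge, curr_data) triples."""
    return sum(alexander_pd(p, e, c) for p, e, c in samples)
```

A persistently negative or positive vote indicates which way the sampling clock should be shifted relative to the data transitions.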
18
External Serial Loopback
Package-level sort test
Provides test coverage not exercised by internal serial loopback
• TX output driver
• RX analog front end
• TX & RX equalization
Can inject jitter into external channel for eye margining
[Figure: HT port in external serial loopback, with the test path marked from START to FINISH and a Jitter label on the channel. Blocks shown: processor cores, shared memory, NorthBridge core, NorthBridge I/O Controller (NB-IOC), NB clock, training pattern, test pattern, encoder, FIFO, 4:1 serializer, TX, TX PLL clock, HT serial loopback, data lane, CLK lane, RX, RX clock, CDR, 1:4 deserializer, FIFO, decoder, bit error counter.]
19
Parallel Loopback Modes
Package-level sort test
RX → TX parallel loopback in HT or in NB-IOC
Requires another HT port or BERT to initialize link & provide test pattern to RX
Enables fault isolation
20
Outline
Motivation
HyperTransport™ Links in AMD Processors
Loopback Implementation
• Architecture
• Loopback Channel
• Transmitter
• Receiver
Silicon Results
Conclusion
21
Loopback Test Description
Wafers
• 12” bumped AMD Opteron™ 6000 processors (45nm SOI-CMOS)
Test conditions
• 1.1V, 1.3V
• 5.2Gb/s, 6.4Gb/s
Test pattern
• 10⁸ cycles of alternating +K28.5 & −K28.5
• Passing test: BER < 5 × 10⁻¹⁰
Conway et al., Hot Chips 2009
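The pass criterion can be related to the pattern length with a short calculation. The sketch below rests on one assumption that the slide does not state: that one cycle consists of one +K28.5 followed by one −K28.5, i.e. two 10-bit 8b/10b symbols. Under that reading each lane sees 2 × 10⁹ bits, so a single bit error already equals the 5 × 10⁻¹⁰ limit. The names are hypothetical, and exact fractions are used to avoid rounding at the boundary.

```python
from fractions import Fraction

BITS_PER_SYMBOL = 10   # an 8b/10b K28.5 symbol is 10 bits long
SYMBOLS_PER_CYCLE = 2  # assumption: one cycle = +K28.5 followed by -K28.5
CYCLES = 10**8
BER_LIMIT = Fraction(5, 10**10)


def bits_per_lane():
    """Bits sent on one lane over the whole test pattern."""
    return CYCLES * SYMBOLS_PER_CYCLE * BITS_PER_SYMBOL


def measured_ber(bit_errors):
    """Bit error ratio for a given error count over the full pattern."""
    return Fraction(bit_errors, bits_per_lane())


def passes(bit_errors):
    """True if the measured BER is strictly below the limit."""
    return measured_ber(bit_errors) < BER_LIMIT
```

Under this assumption only an error-free run passes, since `measured_ber(1)` is exactly 5 × 10⁻¹⁰ and the criterion is a strict inequality.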
22
Early Example of Test Sort Results
Die No.   HT Loopback Fail Description
1         Port3 Sublink0 @ 6.4Gb/s – 1.1V: CAD2 bit error count = 63 (saturated)
2         Port0 Sublink0 @ 6.4Gb/s – 1.1V: CAD2 bit error count = 2
3         Port0 Sublink0/Sublink1 @ 5.2, 6.4Gb/s – 1.1, 1.3V: Training failure in all CTL/CAD lanes
4         Port0 Sublink0 @ 6.4Gb/s – 1.1, 1.3V: CAD2 bit error count = 63 (saturated)
5         Port3 Sublink0 @ 6.4Gb/s – 1.1V: CAD4 bit error count = 63 (saturated)
6         Port0 Sublink0 @ 6.4Gb/s – 1.1V: CAD0 bit error count = 2
7         Port2 Sublink0 @ 6.4Gb/s – 1.1V: CAD4 bit error count = 63 (saturated)
23
Conclusion
Transceiver loopback enables wafer-level at-speed testing of HyperTransport I/O
Demonstrated 6.4Gb/s test functionality
Entirely digital architecture for simple implementation & verification
Significantly improves package-level yield, especially for more expensive MCM packages
Adds no extra sort infrastructure cost
Established test for wafer-level screen of AMD 45nm products
24
Acknowledgments
AMD Product Engineering Organization
Michael Parker
Heidi Grande
Dennis Fischette
Dicai Yang
Dean Gonzales
Tim Kasper
Ari Shtulman