Collateralized Debt Obligation Pricing on the Cell/B.E. -- A Preliminary Result

IBM TJ Watson Research Center

Lurng-Kuo Liu, Virat Agarwal

© 2007 IBM Corporation


Outline

Objective

Collateralized Debt Obligation Basics

CDO on the Cell/B.E. – A preliminary result

Conclusion



Objective

Objective
– Demonstrate the competitive edge of the Cell/B.E. on CDO pricing using Monte Carlo simulation with a Gaussian copula.
– No intention to develop new models for CDO pricing.

Why CDO?
– The fastest-growing sector of the asset-backed securities market. According to SIFMA, global CDO issuance increased to $488.6 billion in 2006, nearly twice the $249.3 billion issued in 2005.
– CDOs are challenging to price. Monte Carlo simulation has been the most popular method for CDO valuation, but it can be very resource intensive for large CDOs.
– This workload seems to be a good fit for the Cell/B.E.



CDO Basics

A Collateralized Debt Obligation (CDO) is an asset-backed security backed by a diversified pool of defaultable instruments such as loans, junk bonds, and mortgages. If the portfolio contains only credit default swaps (CDS), it is called a synthetic CDO.

It is structured as multiple tranches and sold to investors. Each tranche has a different priority of claim on the principal.

Tranching separates out the risks by prioritizing the receipt of principal among the investors.

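[Figure: CDO structure. The originating bank sells assets to the SPV in exchange for cash funding; the SPV passes principal and interest to the tranches: Senior (30-70%), Mezzanine (5-30%), Equity (0-5%). Losses are allocated between attachment point a and detachment point d.]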


Distribution of Losses

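Loss given default amount of the ith reference obligation:

$$L_i = (1 - R_i)\, N_i$$

where $N_i$ is the notional amount and $R_i$ is the recovery rate.

The accumulated portfolio loss is

$$L(t) = \sum_{i=1}^{n} L_i\, \mathbf{1}_{\{\tau_i \le t\}}$$

where $\mathbf{1}_{\{\tau_i \le t\}}$ is a default indicator.

Cumulative loss on a given tranche with attachment point a and detachment point d:

$$L_{a,d}(t) = (L(t) - a)^{+} - (L(t) - d)^{+}, \quad \text{where } (x)^{+} \equiv \max(x, 0)$$

[Figure: tranche loss as a function of portfolio loss, with attachment point a and detachment point d marking the Equity, Mezzanine, and Senior regions.]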
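The loss definitions on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the three-name portfolio, and its notionals, recovery rates, and default times are hypothetical and do not come from the presentation.

```python
def portfolio_loss(notionals, recoveries, default_times, t):
    """Accumulated loss L(t): sum of (1 - R_i) * N_i over names with tau_i <= t."""
    return sum((1.0 - r) * n
               for n, r, tau in zip(notionals, recoveries, default_times)
               if tau <= t)


def tranche_loss(loss, a, d):
    """Tranche loss (L - a)^+ - (L - d)^+ for attachment a and detachment d."""
    return max(loss - a, 0.0) - max(loss - d, 0.0)


# Hypothetical portfolio: three names, notional 100 each, 40% recovery.
notionals = [100.0, 100.0, 100.0]
recoveries = [0.4, 0.4, 0.4]
default_times = [1.5, 3.0, 7.0]  # in years, illustrative only

loss_at_5y = portfolio_loss(notionals, recoveries, default_times, 5.0)
mezz_loss = tranche_loss(loss_at_5y, a=15.0, d=90.0)
```

Two names default before year 5, so the portfolio loss is 120; the tranche between 15 and 90 is then fully written down (loss of 75), while a loss below the attachment point would leave it untouched.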


CDO Pricing

Losses due to defaults (the issuer fails to satisfy the terms of the obligation) are the main source of risk.

Estimate the present value of tranche losses due to defaults – default leg (floating leg)

$$DL_{a,d} = E\left[\int_0^T e^{-\int_0^t r(u)\,du}\; dL_{a,d}(t)\right]$$

Calculate the present value of the premium payments weighted by the outstanding capital – premium leg (fixed leg)

$$PL_{a,d} = s\, E\left[\sum_{i=1}^{w} \delta_i\, e^{-\int_0^{T_i} r(u)\,du}\, \min\{\max[d - L(T_i),\, 0],\; d - a\}\right]$$

The fair price of the CDO tranche is defined to be the spread $s^*$ such that the expected values of both legs are equal.

$$s^*_{a,d} = \frac{E\left[\int_0^T e^{-\int_0^t r(u)\,du}\; dL_{a,d}(t)\right]}{E\left[\sum_{i=1}^{w} \delta_i\, e^{-\int_0^{T_i} r(u)\,du}\, \min\{\max[d - L(T_i),\, 0],\; d - a\}\right]}$$
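A discretized version of the two legs may make the fair-spread definition more concrete. The sketch below rests on assumptions that are not in the slides: a flat interest rate r, tranche losses recognized only on the premium payment dates, accrual fractions equal to the gap between payment dates, and hypothetical function names and sample loss paths.

```python
import math


def legs_for_path(losses, times, a, d, r):
    """Return (default leg, premium leg per unit spread) for one loss path.

    losses[i] is the portfolio loss at payment date times[i].
    Assumes a flat rate r and losses recognized on payment dates only.
    """
    default_leg = 0.0
    premium_leg = 0.0
    prev_t = 0.0
    prev_tranche = 0.0
    for t, loss in zip(times, losses):
        discount = math.exp(-r * t)
        tranche = max(loss - a, 0.0) - max(loss - d, 0.0)
        # Default leg: discounted increment of the tranche loss.
        default_leg += discount * (tranche - prev_tranche)
        # Premium leg: accrual fraction times discounted outstanding capital.
        outstanding = min(max(d - loss, 0.0), d - a)
        premium_leg += (t - prev_t) * discount * outstanding
        prev_t, prev_tranche = t, tranche
    return default_leg, premium_leg


def fair_spread(paths, times, a, d, r):
    """Spread that equates the path-averaged default and premium legs."""
    total_dl = 0.0
    total_pl = 0.0
    for losses in paths:
        dl, pl = legs_for_path(losses, times, a, d, r)
        total_dl += dl
        total_pl += pl
    return total_dl / total_pl


# Two hypothetical loss paths observed at years 1 and 2.
times = [1.0, 2.0]
paths = [[0.0, 10.0], [0.0, 0.0]]
spread = fair_spread(paths, times, a=0.0, d=20.0, r=0.0)
```

Because both expectations are averages over the same paths, the number of paths cancels in the ratio, so the sums can be divided directly.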


Modeling Default Times – Marginal Distributions

Default time τ for a single firm is modeled as the first jump in a Cox process.

$$p(\tau > t) = E\left[e^{-\int_0^t \lambda(u)\,du}\right] \;\Rightarrow\; p(\tau \le t) = 1 - E\left[e^{-\int_0^t \lambda(u)\,du}\right]$$

The default intensity or hazard rate λ of a given firm determines its default time.
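For intuition, consider the special case of a constant, deterministic hazard rate λ (an assumption made here for illustration; the slide allows a stochastic intensity). The expectation then drops out, the default probability is F(t) = 1 − exp(−λt), and a default time can be sampled by inverting F. The function names below are hypothetical.

```python
import math


def default_prob(lam, t):
    """p(tau <= t) = 1 - exp(-lam * t) for a constant hazard rate lam."""
    return 1.0 - math.exp(-lam * t)


def default_time_from_uniform(u, lam):
    """Invert F(t) = 1 - exp(-lam * t): tau = -ln(1 - u) / lam, for 0 <= u < 1."""
    return -math.log(1.0 - u) / lam


# Round trip with an illustrative 2% hazard rate and a 5-year horizon.
u = default_prob(0.02, 5.0)
tau = default_time_from_uniform(u, 0.02)
```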



Modeling Default Times – Joint Distributions

The primary driver of loss distributions is default co-dependence – correlation sensitivity.
– The higher the correlation, the more likely extreme loss events (multiple defaults) become, which increases the spread of a senior tranche.

Need to model the joint distribution of the default times (τ1, …, τN) of the obligations in the portfolio

Gaussian copula is one of the first to be used for modeling the dependence structure in a credit portfolio

$$p(\tau_1 \le t_1, \ldots, \tau_N \le t_N) = \Phi_{\Sigma}\left[\Phi^{-1}(F_1(t_1)), \ldots, \Phi^{-1}(F_N(t_N))\right]$$



Monte Carlo Simulation with Gaussian Copula

Draw a sample Z=(Z1,…,ZN) from an N-dimensional Gaussian distribution with correlation matrix R
– Generate independent uniform random numbers
– Convert them into normal random numbers (W) by using e.g. the Box-Muller transformation
– Perform Cholesky decomposition on the correlation matrix, R = C·C^T
– Generate correlated normal random numbers with X = CW

Convert this sample to a correlated N-dimensional uniform vector U=(U1,…,UN) = Φ(X)

Turn each of these uniforms into a default time sample by inversion: τi = Fi^-1(Ui)

Sort the N-dimensional vector of default times in ascending order and select the default times that happen before the maturity date.

Use the random default times to generate the cash flows for the fixed leg and floating leg

Discount these cash flows to get their present values

Repeat the process m times for the m-path Monte Carlo estimation

* For simplicity, the calibration process is not included in this work.
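The sampling steps above can be sketched end to end in plain Python. This is a minimal sketch, not the Cell/B.E. implementation: it assumes constant hazard rates per name, uses the standard library's `random.gauss` in place of a Box-Muller step, and stops at the sorted default times (cash flows and discounting are omitted). The function names and the two-name example are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(R):
    """Lower-triangular C with R = C * C^T, for a positive-definite matrix R."""
    n = len(R)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][i] = math.sqrt(R[i][i] - s)
            else:
                C[i][j] = (R[i][j] - s) / C[j][j]
    return C


def simulate_paths(R, hazards, maturity, m, seed=0):
    """Return m paths; each is the sorted list of default times <= maturity."""
    rng = random.Random(seed)
    n = len(hazards)
    C = cholesky(R)
    phi = NormalDist().cdf
    paths = []
    for _ in range(m):
        W = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent normals
        X = [sum(C[i][k] * W[k] for k in range(i + 1)) for i in range(n)]
        # U_i = Phi(X_i); inversion of F_i(t) = 1 - exp(-lam_i * t) gives
        # tau_i = -ln(1 - U_i) / lam_i, and 1 - Phi(x) = Phi(-x).
        taus = [-math.log(phi(-x)) / lam for x, lam in zip(X, hazards)]
        paths.append(sorted(t for t in taus if t <= maturity))
    return paths


# Illustrative two-name portfolio with 30% correlation and 2% hazard rates.
R = [[1.0, 0.3], [0.3, 1.0]]
paths = simulate_paths(R, hazards=[0.02, 0.02], maturity=5.0, m=20000, seed=42)
mean_defaults = sum(len(p) for p in paths) / len(paths)
```

Correlation changes how defaults cluster across names but not each name's marginal default probability, so the average number of defaults per path should stay close to 2 × (1 − exp(−0.1)) for this example.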


Introducing Cell/B.E. v1.0

Cell/B.E. is an accelerator extension to 64b Power
– Built on a Power ecosystem
– Used best known system practices for processor design

Sets a new performance standard
– Exploits parallelism while achieving high frequency
– Supercomputer attributes with extreme floating point capabilities
– Sustains high memory bandwidth with smart DMA controllers

Designed for natural human interaction
– Photo-realistic effects
– Predictable real-time response
– Virtualized resources for concurrent activities

Designed for flexibility
– Wide variety of application domains
– Highly abstracted to highly exploitable programming models
– Reconfigurable I/O interfaces
– Virtual trusted computing environment for security

Cell/B.E. is the chip powering the Sony PS3
– (Shipped in volume in the US in Nov ’06)

First Generation Cell/B.E.
– 90 nm
– 241M transistors
– 235 mm2
– 9 cores, 10 threads
– >200 GFlops (SP)
– >20 GFlops (DP)
– Up to 25 GB/s memory B/W
– Up to 75 GB/s I/O B/W
– >300 GB/s EIB
– Top frequency >4GHz (observed in lab)


Heterogeneous multi-core system architecture

Cell/B.E. Features
– Power Processor Element for control tasks
– Synergistic Processor Elements for data-intensive processing

Synergistic Processor Element (SPE) consists of
– Synergistic Processor Unit (SPU)
– Synergistic Memory Flow Control (MFC)
  • Data movement and synchronization
  • Interface to high-performance Element Interconnect Bus

[Figure: Cell/B.E. block diagram. Eight SPEs (each with SPU/SXU, LS, and MFC) and the Power Processor Element (PPU/PXU with L1 and L2; 64-bit Power Architecture with VMX) connect to the EIB (up to 96B/cycle) over 16B/cycle links, together with the MIC (Dual XDR) and the BIC (FlexIO).]


Profiling results of the CDO pricing algorithm

[Figure: running time of various stages in CDO pricing (Generate Normals, Cholesky Decomposition, Generate Correlated Random Numbers, Generate Default Times, Sorting, Calculate Payments, Sum Payments, Statistics), using 100 firms and 100,000 paths.]

Computational complexity of various stages (N firms, p paths):
– Generate Normals: O(Np)
– Cholesky Decomposition: O(N^3)
– Generate Correlated Random Numbers: O(N^2 p)
– Generate Default Times: O(Np)
– Sort: O(pN log N)
– Calculate Payments: O(Np)
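Plugging the profiled problem size (100 firms, 100,000 paths) into these complexity terms gives a rough sense of which stage dominates. The sketch below is back-of-the-envelope arithmetic only: it ignores constant factors, assumes a base-2 logarithm for the sort, and says nothing about the measured running times in the figure.

```python
import math

N, p = 100, 100_000  # firms and Monte Carlo paths, as in the profiling run

# Leading-order operation counts, ignoring constant factors.
ops = {
    "Generate Normals": N * p,
    "Cholesky Decomposition": N ** 3,
    "Generate Correlated Random Numbers": N ** 2 * p,
    "Generate Default Times": N * p,
    "Sort": p * N * math.log2(N),
    "Calculate Payments": N * p,
}

dominant = max(ops, key=ops.get)
for stage, count in sorted(ops.items(), key=lambda kv: -kv[1]):
    print(f"{stage:36s} ~{count:.1e}")
```

Under these assumptions the O(N^2 p) correlation step is about 10^9 operations, one to three orders of magnitude above every other stage, while the one-off Cholesky decomposition is the smallest term.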


Random Numbers: Mersenne Twister

Astronomical period of 2^19937 − 1, suitable for Monte Carlo

Algorithm
– A series of shift operations on x_{k+n} generates the output random number

2 different parallelization strategies
– Optimize for a single SPE, use different (random) seeds.
– Fine-grain parallelism for generating a single stream.
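For reference, here is a scalar sketch of the standard 32-bit Mersenne Twister (MT19937) as published by Matsumoto and Nishimura, with the usual state size of 624 words and offset 397. It is plain Python meant to show the recurrence and the shift-based tempering; it is not the vectorized SPE code discussed in this talk, and the class and method names are illustrative.

```python
class MT19937:
    """Scalar reference sketch of the 32-bit Mersenne Twister."""

    N, M = 624, 397

    def __init__(self, seed=5489):
        # Standard linear-congruential style state initialization.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # force a twist before the first output

    def _twist(self):
        """Regenerate all N state words in place."""
        mt, N, M = self.mt, self.N, self.M
        for k in range(N):
            # Upper bit of word k combined with lower 31 bits of word k+1.
            y = (mt[k] & 0x80000000) | (mt[(k + 1) % N] & 0x7FFFFFFF)
            mt[k] = mt[(k + M) % N] ^ (y >> 1) ^ (0x9908B0DF if y & 1 else 0)
        self.index = 0

    def next_u32(self):
        """Return the next 32-bit output after tempering shifts."""
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & 0xFFFFFFFF


rng = MT19937(5489)
first = rng.next_u32()
```

In the twist loop, word k reads word k+1 and word k+M. Once k reaches N − M, the k+M index wraps around to words that were already updated in the same pass. That read-after-write pattern is the kind of data dependence that complicates a straightforward vectorization.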



Optimization for the SPE

N = 624, M = 397

Vector starting from location (i+1) or (i+M) may not be quadword aligned.

Computation of the latter part of the array requires updated data from the first M entries
– Data dependence



Normalized Random Numbers: Polar Method

1. Generate random numbers a and b
2. V1 ← 2a−1, V2 ← 2b−1
3. R ← V1^2 + V2^2
4. If R > 1, continue from step 1; otherwise:
   - R1 ← sqrt(−2 log R / R)
   - X ← V1 R1
   - Y ← V2 R1

Optimization on SPE
– Use two random number vectors a & b
– Redo if the condition fails for any pair of random numbers
  • Overhead due to skipping of perfectly normal random numbers
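A scalar version of the polar method follows directly from the steps on this slide. The sketch below uses Python's built-in generator as the uniform source (an assumption; the talk uses Mersenne Twister on the SPE) and additionally rejects R = 0 to avoid taking the logarithm of zero, a guard the slide does not mention.

```python
import math
import random


def polar_pair(rng):
    """Return two independent standard normal samples via the polar method."""
    while True:
        a, b = rng.random(), rng.random()      # step 1: uniforms in [0, 1)
        v1, v2 = 2.0 * a - 1.0, 2.0 * b - 1.0  # step 2: map to [-1, 1)
        r = v1 * v1 + v2 * v2                  # step 3
        if r >= 1.0 or r == 0.0:               # step 4: reject and retry
            continue
        r1 = math.sqrt(-2.0 * math.log(r) / r)
        return v1 * r1, v2 * r1


rng = random.Random(12345)
samples = []
for _ in range(20000):
    samples.extend(polar_pair(rng))

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
```

A pair is accepted with probability π/4 (about 79%). A vectorized version that redoes a whole vector whenever any lane fails therefore throws away valid pairs, which is the overhead noted on the slide.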


Performance Comparison of RNG (MT) with other architectures

[Figure: Performance comparison of RNG (Mersenne Twister) on various architectures. Time (in seconds) to generate 100 million random numbers in sequential and block pattern on Intel_1.4, Intel_3.0, AMD_2.4, PPC_1.33, and Cell.]

* Source: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/speed.html



Performance compared with other Cell/B.E. implementations

[Figure: Performance comparison of our optimized RNG (Mersenne Twister) as compared with other Cell/B.E. implementations. Running time (seconds) for Another Cell RNG (MT), SDK RNG*, and Our RNG.]

[Figure: Performance comparison of our optimized RNG (with Normalization) as compared with other Cell/B.E. implementations. Running time (seconds), 32-bit and 64-bit, for Another Cell RNG w/N and Our RNG w/N.]

Time (in seconds) to generate 100 million normalized random numbers on a single SPE.

* Vectorized random number generation available with Cell SDK 2.1


Correlation Matrix: Cholesky Decomposition

Cholesky decomposition on correlation matrix
– C → LL^T, where L is an N×N lower triangular matrix
– Modified version of the Gauss algorithm

Initial optimized version for a single SPE
– Analyzing ways to further optimize and parallelize on the Cell.



Generating Correlated Random Numbers

Compute N (number of firms) normalized random numbers.
– Vector V[0 .. N-1].

Calculate V’ = LV, where L is a lower triangular matrix.

Cell Optimization:
– Branch mispredicts compromise performance for small N.
– 2 load instructions (6 cycles) for each madd (6 cycles), inefficient use of the even pipeline.
– Initial performance results.


Generating Correlated Random Numbers

Also working on utilizing the lower triangular property of the matrix L to achieve better performance.
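The benefit of the triangular structure is easy to see in a scalar sketch: row i of L has only i+1 potentially nonzero entries, so the product V’ = LV needs about N(N+1)/2 multiply-adds instead of N^2. The Python below illustrates that idea only; it is not the SPE code, and the function names and the 2×2 example are hypothetical.

```python
import math


def full_matvec(L, v):
    """Dense product: N multiply-adds per row, N^2 in total."""
    n = len(v)
    return [sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]


def lower_tri_matvec(L, v):
    """Triangular product: row i touches only columns 0..i."""
    return [sum(L[i][j] * v[j] for j in range(i + 1)) for i in range(len(v))]


# Cholesky factor of the correlation matrix [[1, 0.5], [0.5, 1]].
L = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]
v = [1.0, 2.0]  # two independent normal samples, illustrative values

correlated = lower_tri_matvec(L, v)
```

For N = 100 firms this roughly halves the arithmetic per path, from 10,000 to 5,050 multiply-adds.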



Conclusions

CDO pricing is computationally intensive rather than communications intensive.

We use Monte Carlo simulation
– Highly scalable among various SPEs

Initial performance results
– Show substantial speedup for Mersenne Twister and normalization as compared to other architectures
– Initial results for Cholesky decomposition and generating correlated random numbers.
– Cell is a good fit for financial workloads.

Double precision is essential for FSS workloads


Thank you

Questions?

