Xiang Hu 1 , Wenbo Zhao 2 , Peng Du 2 , Amirali Shayan 2 , Chung-Kuan Cheng 2

An Adaptive Parallel Flow for Power Distribution Network Simulation Using Discrete Fourier Transform Xiang Hu 1 , Wenbo Zhao 2 , Peng Du 2 , Amirali Shayan 2 , Chung-Kuan Cheng 2 1 ECE Dept., 2 CSE Dept. University of California, San Diego 01/19/2010


An Adaptive Parallel Flow for Power Distribution Network Simulation Using Discrete Fourier Transform

Xiang Hu1, Wenbo Zhao2, Peng Du2,

Amirali Shayan2, Chung-Kuan Cheng2

1ECE Dept., 2CSE Dept.

University of California, San Diego

01/19/2010


Page 2

Outline

Motivation

Adaptive simulation flow using discrete Fourier transform (DFT)

– Basic DFT flow

– Problems with the basic DFT flow

– Adaptive DFT flow

Experimental results

– Error analysis

– Time efficiency of the adaptive flow

– Time efficiency of parallel processing

Conclusions



Page 4

PDN Simulation: Why Frequency Domain?

Huge PDN netlists

– Time-domain simulation: serial, slow

– Frequency-domain simulation: parallel, fast

Frequency dependent parasitics

Simulation results

– Time-domain: voltage drops, simultaneous switching noise (SSN) – input dependent

– Frequency-domain: impedance, anti-resonance peaks – input independent


Page 5

Transform Operations: Why Discrete Fourier Transform

Laplace Transform [Wanping ’07]

– Input: Series of ramp functions

– Output: Rational expression via vector fitting

• Vector fitting may introduce large errors

– Choice of frequency samples is case dependent

Discrete Fourier Transform (DFT)

– Input: periodic signal

– Inverse DFT is straightforward: vector fitting is not needed

– Frequency sample points with uniform steps
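
The uniform frequency grid follows directly from the time step and the period. A minimal sketch, assuming NumPy and borrowing the time step and period reported in the experimental slides (20ps and 20.48ns); the variable names are illustrative only:

```python
import numpy as np

dt = 20e-12          # time step (s), value from the experimental slides
period = 20.48e-9    # simulated period T (s)
n = round(period / dt)             # number of time samples in one period
freqs = np.fft.rfftfreq(n, d=dt)   # non-negative DFT frequencies, uniform step 1/T

df = freqs[1] - freqs[0]
print(n, df, freqs[-1])  # sample count, frequency step (Hz), highest frequency (Hz)
```

The frequency step is 1/T and the highest frequency is 1/(2∆t), so a longer period refines the grid while a smaller time step extends it.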



Page 7

Basic DFT Simulation Flow
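
As a rough illustration of what a DFT-based flow involves (an assumption based on the surrounding slides, not a reproduction of this slide's diagram): transform the input current, solve the network independently at each frequency sample, and transform back. The names `basic_dft_flow`, `impedance`, and `z_resistive` are hypothetical, and the resistive network is a toy stand-in for a real PDN solve.

```python
import numpy as np

def basic_dft_flow(i_t, dt, impedance):
    """Return v(t) over one period for input current samples i_t.

    impedance(f) is a hypothetical stand-in for the per-frequency PDN solve.
    """
    n = len(i_t)
    freqs = np.fft.rfftfreq(n, d=dt)   # uniform frequency samples
    i_f = np.fft.rfft(i_t)             # DFT of the (periodic) input
    v_f = impedance(freqs) * i_f       # independent solve at each frequency
    return np.fft.irfft(v_f, n=n)      # inverse DFT back to the time domain

# Toy impedance: a purely resistive 2-ohm network (assumption for illustration).
def z_resistive(f):
    return 2.0 * np.ones_like(f)

i_t = np.zeros(1024)
i_t[10:60] = 5e-3                      # 5 mA current pulse
v_t = basic_dft_flow(i_t, 20e-12, z_resistive)
```

Because each frequency sample is solved on its own, the multiplication step is the part that can be spread across processors.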

Page 8

Problem with Basic DFT Flow

“Wrap-around effect” requires long zero padding at the end of the input

– Periodic nature of the DFT

Small uniform time steps are needed to cover the input frequency range

Large number of simulation points!

[Figure: current (A) and voltage (V) versus time (sec), with the time axis marked at T, 2T, 3T, 4T. Labels: “DFT repetition”, “Wrap-around”, output “Correct”, output “Distorted!”.]
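
The wrap-around effect can be reproduced on a toy signal. A minimal sketch, assuming NumPy and a hypothetical slowly decaying response `h` (not the PDN from the talk): when the period is too short, every later segment of the true output folds back into the first period.

```python
import numpy as np

n = 256                                        # samples in one short period T
k = np.arange(8 * n)
h = 0.99 ** k * np.cos(2 * np.pi * k / 64)     # hypothetical slowly decaying response
x = np.zeros(n)
x[:32] = 1.0                                   # input pulse, shorter than T

# Reference: input zero-padded to 8T, so the response dies out within the period.
h_f_long = np.fft.rfft(h)                      # response on the fine frequency grid
y_long = np.fft.irfft(np.fft.rfft(x, 8 * n) * h_f_long, n=8 * n)

# Short period T: same response, sampled on a frequency grid 8 times coarser.
y_short = np.fft.irfft(np.fft.rfft(x) * h_f_long[::8], n=n)

# The short-period output equals the true output plus all later segments folded in.
folded = y_long.reshape(8, n).sum(axis=0)
wrap_error = np.max(np.abs(y_short - y_long[:n]))
print(wrap_error)
```

The padded run is accurate but needs 8 times as many samples at the same time step, which is the cost the adaptive flow tries to avoid.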


Page 9

Adaptive DFT Simulation

Basic idea of the adaptive DFT flow: cancel the wrap-around effect by subtracting the tail from the main part of the output

– Main part of the output: obtained with small time step and small period; distorted by the wrap-around effect

– Tail of the output: low frequency oscillation; can be captured with large time steps

[Figure: voltage (V) versus time (sec), with the time axis marked at T, 2T, 3T, 4T. Panel labels: “Correct”, “Distorted”, “Correct!”.]

Total number of simulation points is reduced significantly!


Page 10

Adaptive DFT Flow

Period[i]: the input period at each iteration

Interval[i]: the simulation time step at each iteration

FreqUpBd[i]: the upper bound of the input frequency range at each iteration

vi(t): tentative time-domain output within the frequency range [0, FreqUpBd[i]] at iteration #i

Iteration #1: obtain the main part of the output

Iterations #2 to #k: capture the oscillations in the tail of the output (high, middle, and low resonant frequencies)

For each iteration #i, i=k, k-1, …, 2, subtract the captured tail from the outputs at iteration #j, j<i to eliminate the wrap-around effect
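
The subtraction step can be sketched with two iterations on a toy signal. This is a minimal illustration under stated assumptions, not the authors' implementation: the response `h`, the smooth input pulse, and the ratios `ratio` and `coarse` are all hypothetical, and `z_f` stands in for the per-frequency PDN solve.

```python
import numpy as np

n1 = 256                                   # iteration 1: fine samples per period T1
ratio, coarse = 8, 4                       # T2 = 8*T1, step2 = 4*step1 (illustrative)
n_ref = ratio * n1                         # fine samples in the long period T2
n2 = n_ref // coarse                       # iteration 2: coarse samples in T2

k = np.arange(n_ref)
h = 0.99 ** k * np.cos(2 * np.pi * k / 64) # hypothetical single-resonance response
x = np.zeros(n1)
x[:64] = np.hanning(64)                    # smooth input current pulse

x_f = np.fft.rfft(x, n_ref)                # input spectrum on the fine frequency grid
z_f = np.fft.rfft(h)                       # stand-in for the per-frequency PDN solve

# Iteration 1: full band, every 8th frequency (period T1) -> distorted main part.
v1 = np.fft.irfft((x_f * z_f)[::ratio], n=n1)

# Iteration 2: low band only (period T2, coarse step), then sinusoidal
# interpolation back to the fine grid by zero-padding the spectrum.
v2_f = (x_f * z_f)[: n2 // 2 + 1]
v2 = np.fft.irfft(v2_f, n=n_ref)

# Subtract the tail segments of v2 that wrapped into the first period of v1.
tail = v2.reshape(ratio, n1)[1:].sum(axis=0)
main = v1 - tail

reference = np.fft.irfft(x_f * z_f, n=n_ref)[:n1]   # non-adaptive long fine run
err_before = np.max(np.abs(v1 - reference))
err_after = np.max(np.abs(main - reference))
print(err_before, err_after)
```

In this toy setup the two iterations use 129 and 257 frequency samples, against 1025 for the single long run at the fine time step.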



Page 12

Experimental Results: Test Case & Input

Test case: 3D PDN

– One resonant peak in the impedance profile

Input current

– Time step: ∆t = 20ps

– Duration: T0 = 16.88ns

[Figure: left, “Impedance”: impedance (Ω) versus frequency (Hz), from 10^4 to 10^8; right, “Original Input”: current (A) versus time (sec).]

Page 13

Experimental Results: Adaptive Flow Process

Iteration #1: v1(t)

– ∆t1 = 20ps

– T1 = 20.48ns

Iteration #2: v2(t)

– ∆t2 = 32∆t1 = 640ps

– T2 = 8T1 = 163.84ns

Final output:

– Main part: $v_1(t) - \sum_{m=1}^{3} v_2\big(mT_1 : (m+1)T_1\big)$

– Tail: $v_2(T_1 : 8T_1)$

[Figure: voltage (V) versus time (sec). Panels: v1(t) with ∆t1=20ps, T1=20.48ns; v2(t) with ∆t2=640ps, T2=163.84ns, marked at T1, 2T1, 3T1, 4T1, ..., 8T1; final output, with legend entries “Final output”, “v1(t)”, and “Final output-v1(t)”.]
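
The segment notation v2(mT1 : (m+1)T1) maps onto array slices once the sample counts are worked out. A small sketch using the time steps and periods quoted on this slide; `tail_segment` is a hypothetical helper name.

```python
dt1, T1 = 20e-12, 20.48e-9       # iteration 1: time step and period (s)
dt2, T2 = 640e-12, 163.84e-9     # iteration 2: time step and period (s)

n1 = round(T1 / dt1)             # samples in v1
n2 = round(T2 / dt2)             # samples in v2
seg = round(T1 / dt2)            # coarse samples of v2 per segment of length T1

def tail_segment(m):
    """Index range of v2 covering the interval [m*T1, (m+1)*T1)."""
    return slice(m * seg, (m + 1) * seg)

print(n1, n2, seg, tail_segment(1))
```

Each segment of v2 holds far fewer samples than v1, so the tail has to be interpolated onto the fine grid before it is subtracted from the main part.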

Page 14

Experimental Results: DFT Flow vs. SPICE

[Figure: voltage (V) versus time (sec) for the DFT flow and SPICE transient simulation.]

Relative error: 0.25%
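
The slides do not state how the relative error is defined. One common choice, shown here purely as an assumption, is the RMS of the difference divided by the RMS of the reference waveform; `relative_error_percent` is a hypothetical helper name.

```python
import numpy as np

def relative_error_percent(v_test, v_ref):
    """RMS of the difference divided by RMS of the reference, in percent.

    This definition is an assumption; the slides do not specify the norm.
    """
    v_test = np.asarray(v_test, dtype=float)
    v_ref = np.asarray(v_ref, dtype=float)
    rms_diff = np.sqrt(np.mean((v_test - v_ref) ** 2))
    rms_ref = np.sqrt(np.mean(v_ref ** 2))
    return 100.0 * rms_diff / rms_ref

example = relative_error_percent([1.01, 2.02], [1.0, 2.0])
```

Both waveforms must be sampled at the same time points before they are compared.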

Page 15

Error Analysis: Error Caused by Wrap-around Effect

Theorem 1: Let be the initial value of the output voltage. Suppose for some , then the mean-square error, i.e., is bounded by .

[Figure: left, “Output comparison”: voltage (V) versus time (sec) for the DFT flow with T=20.48ns, the DFT flow with T=163.84ns, and SPICE transient simulation; right, “Error relative to SPICE”: difference between DFT and SPICE for T=20.48ns and for T=163.84ns.]

Relative error: 2.09%

Relative error: 0.12%


Page 16

Error Analysis: Error Caused by Different Interpolation Methods

SPICE: PWL interpolation

DFT: sinusoidal interpolation

[Figure: “Output comparison”: voltage (V) versus time (sec) for the DFT flow with ∆t=20ps, the DFT flow with ∆t=2.5ps, and SPICE transient simulation, with a zoomed view near 22.95ns; “Error relative to SPICE”: difference between DFT and SPICE for ∆t=20ps and for ∆t=2.5ps.]

Relative error: 0.12%

Relative error: 0.09%
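
The gap between the two interpolation methods can be seen on a toy waveform. A minimal sketch, assuming NumPy and a hypothetical band-limited periodic test signal `wave`: the same coarse samples are reconstructed once by piecewise-linear (PWL) interpolation and once by sinusoidal interpolation (zero-padding the spectrum).

```python
import numpy as np

n, up = 32, 8                                  # coarse samples per period, upsampling factor
t = np.arange(n) / n
t_fine = np.arange(n * up) / (n * up)

def wave(s):
    """Hypothetical band-limited periodic test signal."""
    return np.sin(2 * np.pi * 3 * s)

v = wave(t)

# PWL interpolation: straight lines between neighbouring time points.
v_pwl = np.interp(t_fine, t, v, period=1.0)

# Sinusoidal interpolation: zero-pad the spectrum, then inverse DFT.
v_sin = np.fft.irfft(np.fft.rfft(v), n=n * up) * up

err_pwl = np.max(np.abs(v_pwl - wave(t_fine)))
err_sin = np.max(np.abs(v_sin - wave(t_fine)))
print(err_pwl, err_sin)
```

For a band-limited signal the sinusoidal reconstruction is exact up to rounding, while PWL leaves an error that shrinks only as the time step is reduced.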

Page 17

Time Complexity Analysis: Adaptive vs. Non-adaptive

Adaptive flow time complexity: $O\left(\sum_{i=1}^{k} T_i / \Delta t_i\right)$

– Ti: simulation period at iteration #i, $T_1 < T_2 < \cdots < T_k$

– ∆ti: simulation time step at iteration #i, $\Delta t_1 < \Delta t_2 < \cdots < \Delta t_k$

Non-adaptive flow time complexity: $O(T_k / \Delta t_1)$

[Figure: relative error to Hspice (%) versus time (sec), for the non-adaptive method and the adaptive method.]
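
Plugging in the two-iteration parameters from the adaptive flow process slide (T1=20.48ns, ∆t1=20ps, T2=163.84ns, ∆t2=640ps) gives a feel for the two expressions. A short worked example; the variable names are illustrative.

```python
periods = [20.48e-9, 163.84e-9]   # T_1, T_2 (s)
steps = [20e-12, 640e-12]         # dt_1, dt_2 (s)

# Adaptive flow: sum of T_i / dt_i over the iterations.
adaptive_points = sum(round(T / dt) for T, dt in zip(periods, steps))

# Non-adaptive flow: longest period simulated at the smallest time step.
non_adaptive_points = round(periods[-1] / steps[0])

print(adaptive_points, non_adaptive_points)   # 1280 8192
```

For these parameters the adaptive flow needs 1280 simulation points against 8192 for the non-adaptive flow, a reduction of 6.4 times.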


Page 18

Parallel Processing

Test case: 3D PDN

– ~ 0.17 million nodes

The adaptive DFT flow has more advantage when the number of available processors is limited.

Successive iterations of the adaptive flow need to be processed serially

Simulation time with different numbers of processors (sec)

Processors        1      4      8     16     64    128    256
Hspice        21374     NA     NA     NA     NA     NA     NA
Basic DFT     15976   6635   3225   1647    425    218    143
Adaptive DFT   4947   2096    904    490    180    120    115

[Figure: simulation time (sec) versus number of processors, for basic DFT and adaptive DFT.]
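
The timings in the table can be turned into parallel speedups and into the ratio between the two flows at each processor count. A short sketch using only the numbers reported on this slide; the variable names are illustrative.

```python
procs = [1, 4, 8, 16, 64, 128, 256]
basic = [15976, 6635, 3225, 1647, 425, 218, 143]      # basic DFT time (sec)
adaptive = [4947, 2096, 904, 490, 180, 120, 115]      # adaptive DFT time (sec)

rows = []
for p, tb, ta in zip(procs, basic, adaptive):
    speedup_basic = basic[0] / tb        # parallel speedup of the basic flow
    speedup_adaptive = adaptive[0] / ta  # parallel speedup of the adaptive flow
    advantage = tb / ta                  # basic time divided by adaptive time
    rows.append((p, speedup_basic, speedup_adaptive, advantage))
    print(f"{p:4d}  {speedup_basic:7.1f}  {speedup_adaptive:7.1f}  {advantage:5.2f}")
```

The advantage of the adaptive flow falls from about 3.2 times on one processor to about 1.2 times on 256 processors, in line with the remark that it helps most when processors are limited.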

Page 19

Parallel Processing: DFT Flow vs. SPICE

[Figure: left, “Simulation result”: voltage (V) versus time (sec) for adaptive DFT and Hspice; right, “DFT error compared to Hspice”: voltage (V) versus time (sec).]



Page 21

Conclusions

Implemented an adaptive flow for large PDN simulation using DFT

Total number of simulation points is reduced significantly compared to the basic DFT flow

Achieved a relative error on the order of 0.1% compared to SPICE

10x speedup with a single processor compared to SPICE

Parallel processing is incorporated to reduce the simulation time further
