
IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 60, NO. 3, MARCH 2012 451

Longitudinal-Partitioning-Based Waveform Relaxation Algorithm for Efficient Analysis of Distributed Transmission-Line Networks

Sourajeet Roy, Student Member, IEEE, Anestis Dounavis, Member, IEEE, and Amir Beygi, Student Member, IEEE

Abstract—In this paper, a waveform relaxation algorithm is presented for efficient transient analysis of large transmission-line networks. The proposed methodology represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are modeled using the method of characteristics. Partitioning the transmission lines at the natural interfaces provided by the method of characteristics allows the resulting subcircuits to be weakly coupled by construction. The subcircuits are solved independently using a proposed hybrid iterative technique that combines the advantages of both traditional Gauss–Seidel and Gauss–Jacobi algorithms. The overall algorithm is highly parallelizable and exhibits good scaling with both the size of the network involved and the number of CPUs available. Numerical examples are presented to illustrate the validity and efficiency of the proposed work.

Index Terms—Convergence analysis, delay, longitudinal partitioning, transient simulation, signal integrity, transmission line, waveform relaxation.

I. INTRODUCTION

WITH the constant increase in operating frequencies, interconnects need to be modeled as distributed transmission lines for accurate signal integrity analysis of modern integrated circuits (ICs) [1]. Accurate modeling of large distributed networks using commercial circuit solvers with integrated circuit emphasis (like SPICE) requires significant central processing unit (CPU) time and memory, making such solvers computationally prohibitive for fast transient simulation. The waveform relaxation (WR) algorithm has emerged as an attractive technique to reduce the simulation costs of such large networks [2]–[23]. Typically, waveform relaxation attempts to break a large circuit into smaller subcircuits that can be solved iteratively in sequence or in parallel. Each iteration involves an exchange of voltage/current waveforms between the subcircuits for the response to converge to the actual solution.

Presently, two approaches exist for the application of waveform relaxation to transmission-line networks. One such approach is the transverse partitioning scheme [11]–[14], where multiconductor transmission lines (MTLs) are partitioned into single lines by assuming weak capacitive and inductive coupling between the lines. The coupling between the lines is represented as time-domain relaxation sources introduced into the circuit model of each line.

An alternative waveform relaxation algorithm is based on longitudinal partitioning of the network into repeated subcircuits [4]–[8], [10], [16]. While longitudinal partitioning schemes based on the generalized method of characteristics (MoC) have been reported in [4]–[8], more recent works [16] have focused on partitioning the line based on segmentation models such as the conventional resistive-inductive-conductive-capacitive (RLGC) lumped model [24]. Partitioning techniques based on segmentation models share a common limitation: since each segment directly feeds into the next segment, the adjacent segments are strongly coupled in physical space. This is reflected in the fact that blindly partitioning the conductor between segments requires resolving the stringent Dirichlet's transmission condition across the partition and consequently exhibits poor convergence [16]. The work of [16] accelerated the convergence of the WR algorithm by artificially exchanging additional voltage/current waveforms (i.e., increasing the overlap between subcircuits) followed by optimization routines.

More recently, in [25], a WR algorithm based on the delay extraction-based passive compact transmission-line (DEPACT) segmentation model [26], [27] was presented for two-conductor transmission-line networks. The DEPACT model represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are realized in the time domain using the MoC [24], [28]. The work of [25] exploited the inherent weak coupling across the natural interfaces provided by the MoC [4]–[8] to longitudinally partition the transmission line at these interfaces into smaller, disjoint subcircuits. The iterative solution of the subcircuits was performed using the sequential Gauss–Seidel (GS) technique and was shown to naturally achieve fast convergence without the need for any artificial exchange of waveforms or optimization techniques as proposed in [16].

This work extends the concepts of [25] to multiconductor transmission-line systems. Furthermore, the efficiency of the proposed algorithm for any general transmission-line network (two conductor or multiconductor) has been investigated on parallel processing-based platforms. To this end, two highly parallelizable iterative techniques have been implemented: the traditional Gauss–Jacobi (GJ) technique and a novel hybrid technique that combines the complementary features of GS and GJ. This hybrid technique exhibits superior convergence properties compared with the traditional GJ algorithm while maintaining its high parallelizability with respect to the number of CPUs available. In addition, a mathematical framework has been provided to demonstrate the scalability of the algorithm with respect to both the size of the network involved and the number of CPUs available for parallel processing. Numerical examples have been provided to illustrate the validity and efficiency of the proposed WR algorithm over full SPICE simulations.

The paper is organized as follows. Section II deals with the background of waveform relaxation algorithms and concludes with a review of the DEPACT model [26], [27]. Section III presents the details of the proposed algorithm, and Section IV describes the mathematical framework for analyzing the computational cost of the proposed work. The numerical examples and conclusions are presented in Sections V and VI, respectively.

Manuscript received September 26, 2011; accepted November 21, 2011. Date of publication January 18, 2012; date of current version March 02, 2012. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada, Canada Foundation for Innovation, Canadian Microelectronics Corporation and Ministry of Research and Innovation—Early Research Award.

The authors are with the Department of Electrical and Computer Engineering, University of Western Ontario, London, ON, Canada N6A 5B9 (e-mail: [email protected]; [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMTT.2011.2178261

0018-9480/$31.00 © 2012 IEEE

II. BACKGROUND AND DEPACT MODEL

In order to explain the contributions of the proposed work, here we briefly discuss the background of general waveform relaxation algorithms followed by a review of the DEPACT model.

A. Background of Waveform Relaxation Algorithms

Waveform relaxation, from its introduction in [2], has proven to be an attractive algorithm to address the exorbitant computational cost of solving large networks using traditional circuit solvers like SPICE. The algorithm is based on partitioning large networks into smaller subcircuits, where the coupling between the subcircuits is represented using time-domain relaxation sources introduced into each subcircuit. Assuming an initial guess for the waveforms of the relaxation sources, the subcircuits are solved independently. The present solution of the subcircuits is then used to update the relaxation sources for the next iteration. This process is repeated until the error between two successive iterations falls within a prescribed error tolerance. Solving the individual subcircuits on multiprocessor hardware has provided significant savings in memory and CPU time compared with traditional full circuit simulation [14]. It is noted that the main limitation of relaxation algorithms is the speed of convergence of the iterations. Several methods have been reported to speed up convergence, such as time windowing [3], overlapping subdomains [22], [23], and optimization [16], [22].

B. Review of DEPACT Model

A general coupled MTL system for the quasi-transverse electromagnetic (TEM) mode of propagation is described by the Telegrapher's partial differential equations [24]

$$\frac{\partial}{\partial x}\mathbf{V}(x,s) = -\left(\mathbf{R}(s) + s\mathbf{L}(s)\right)\mathbf{I}(x,s), \qquad \frac{\partial}{\partial x}\mathbf{I}(x,s) = -\left(\mathbf{G}(s) + s\mathbf{C}(s)\right)\mathbf{V}(x,s) \tag{1}$$

where $\mathbf{V}(x,s)$ and $\mathbf{I}(x,s)$ represent the spatial distribution of the voltage and current along the longitudinal direction $x$, and $\mathbf{R}(s)$, $\mathbf{L}(s)$, $\mathbf{G}(s)$, and $\mathbf{C}(s)$ are the frequency-dependent resistive, inductive, conductive, and capacitive per-unit-length (p. u. l.) parameters of the line, respectively. For a line of length $l$, the solution of the above equations can be written as an exponential matrix function [29], [30] as

$$\begin{bmatrix}\mathbf{V}(l,s)\\ \mathbf{I}(l,s)\end{bmatrix} = e^{(\mathbf{A}+s\mathbf{B})l}\begin{bmatrix}\mathbf{V}(0,s)\\ \mathbf{I}(0,s)\end{bmatrix} \tag{2}$$

where

$$\mathbf{A} = \begin{bmatrix}\mathbf{0} & -\mathbf{R}(s) - s\left(\mathbf{L}(s)-\mathbf{L}_\infty\right)\\ -\mathbf{G}(s) - s\left(\mathbf{C}(s)-\mathbf{C}_\infty\right) & \mathbf{0}\end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix}\mathbf{0} & -\mathbf{L}_\infty\\ -\mathbf{C}_\infty & \mathbf{0}\end{bmatrix} \tag{3}$$

and $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ are the p. u. l. inductive and capacitive parameters at the maximum frequency of interest $f_{\max}$. Typically, the solution of (2) does not have an exact time-domain counterpart, and hence segmentation-based modeling techniques [26], [27], [29]–[34] are generally used to derive an equivalent time-domain expression of (2). Of these segmentation algorithms, DEPACT is suitable for electrically long transmission lines because it explicitly extracts the delay of the network, leading to a smaller number of lumped segments.

However, extracting the delay terms from $e^{(\mathbf{A}+s\mathbf{B})l}$ is not a trivial task since the matrices $\mathbf{A}$ and $\mathbf{B}$ do not commute (i.e., $\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}$). To approximate $e^{(\mathbf{A}+s\mathbf{B})l}$ in terms of a product of exponentials, a modified Lie product [35] is used as

$$e^{(\mathbf{A}+s\mathbf{B})l} \approx \prod_{k=1}^{m}\mathbf{\Psi}_k, \qquad \mathbf{\Psi}_k = e^{\mathbf{A}l/(2m)}\, e^{s\mathbf{B}l/m}\, e^{\mathbf{A}l/(2m)} \tag{4}$$

where $m$ is the number of sections. The associated error of the approximation scales as $O(1/m^2)$ [34] (i.e., (4) quickly converges to the exponential matrix of (2) as the number of sections $m$ increases). Equation (4) provides a methodology of discretizing the transmission line into a cascade of alternating subsections with the individual stamps of $e^{\mathbf{A}l/(2m)}$ and $e^{s\mathbf{B}l/m}$, as illustrated in Fig. 1 (for single lines) and Fig. 2 (for MTLs).

The exponential matrix $e^{\mathbf{A}l/(2m)}$ represents the attenuation losses of the transmission line. Since $\mathbf{A}$ does not contain the delay terms $s\mathbf{L}_\infty$ and $s\mathbf{C}_\infty$, it can be approximated by a low-order rational function, which in turn can be realized in SPICE using either lumped RLC elements or lumped dependent sources [26], [27]. As a result, the subsections with stamps of $e^{\mathbf{A}l/(2m)}$ are replaced by a macromodel referred to as "lumped circuit elements" in Figs. 1 and 2. On the other hand, the matrix $e^{s\mathbf{B}l/m}$ contains only $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ and can be modeled as a lossless line using the MoC [24], [28]. As a result, the subsections with stamps of $e^{s\mathbf{B}l/m}$ are replaced by the equivalent MoC circuit [24], [28] in Figs. 1 and 2. More detailed derivations of a SPICE realization of the DEPACT model of (4) have been provided in [26] and [27]. The rational macromodel describing the lossy sections and the MoC equations describing the lossless sections both enjoy exact representations in the time domain and together approximate the frequency-domain solution of (2) as a set of delayed ordinary differential equations in the time domain, which can be solved by SPICE.

Fig. 1. SPICE equivalent circuit of a two conductor transmission line using DEPACT.

Section III discusses the development of the proposed WR algorithm based on the DEPACT model of (4).
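The second-order convergence of the modified Lie product in (4) can be checked numerically. The following is a minimal sketch, assuming a single line with scalar, frequency-independent p. u. l. parameters (so that $\mathbf{A}$ reduces to the anti-diagonal matrix with entries $-R$ and $-G$); the parameter values, the 5-cm length, the 1-GHz evaluation frequency, and all function names are illustrative assumptions and are not taken from the paper. It relies on the closed-form exponential of a 2×2 anti-diagonal matrix, so no external library is needed.

```python
import cmath

def exp_antidiag(a, b):
    """Exact exponential of M = [[0, a], [b, 0]]: cosh(g) I + (sinh(g)/g) M, g^2 = a*b."""
    g = cmath.sqrt(a * b)
    sh = cmath.sinh(g) / g if g != 0 else 1.0
    ch = cmath.cosh(g)
    return [[ch, a * sh], [b * sh, ch]]

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exact_chain_matrix(R, L, G, C, s, length):
    """exp((A + sB) l) for scalar, frequency-independent parameters."""
    return exp_antidiag(-(R + s * L) * length, -(G + s * C) * length)

def lie_product(R, L, G, C, s, length, m):
    """(exp(A l/2m) exp(sB l/m) exp(A l/2m))^m, i.e. m identical sections."""
    half_loss = exp_antidiag(-R * length / (2 * m), -G * length / (2 * m))
    delay = exp_antidiag(-s * L * length / m, -s * C * length / m)
    section = matmul(matmul(half_loss, delay), half_loss)
    total = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(m):
        total = matmul(total, section)
    return total

def splitting_error(R, L, G, C, s, length, m):
    """Largest entry-wise deviation between the exact and approximated matrices."""
    exact = exact_chain_matrix(R, L, G, C, s, length)
    approx = lie_product(R, L, G, C, s, length, m)
    return max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2))

# Assumed illustrative values: per-cm parameters, 5-cm line, s = j*2*pi*(1 GHz)
PARAMS = dict(R=0.25, L=4e-9, G=1e-4, C=1e-12, s=2j * cmath.pi * 1e9, length=5.0)

for m in (4, 8, 16, 32):
    print(m, splitting_error(m=m, **PARAMS))
```

Under these assumptions, doubling the number of sections should reduce the printed error by a factor close to four, which is the behavior expected from an $O(1/m^2)$ error term.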

III. DEVELOPMENT OF PROPOSED ALGORITHM

Here, we begin by describing the proposed longitudinal partitioning scheme for single lines and the methodology to iteratively solve the subcircuits. From this discussion, the algorithm is extended to MTLs.

A. Proposed Partitioning Scheme for Single Lines

The DEPACT model of (4) provides a methodology to discretize two-conductor transmission lines into an alternating cascade of lossy and lossless line segments (Fig. 1). To better explain the proposed partitioning methodology, consider the equations for the $k$th lossless line segment in Fig. 1, given as follows:

$$\begin{bmatrix}V_{k,2}(s)\\ -I_{k,2}(s)\end{bmatrix} = e^{s\mathbf{B}l/m}\begin{bmatrix}V_{k,1}(s)\\ I_{k,1}(s)\end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix}0 & -L_\infty\\ -C_\infty & 0\end{bmatrix} \tag{5}$$

where $V_{k,1}$, $V_{k,2}$ are the near and far end voltages, respectively, and $I_{k,1}$, $I_{k,2}$ are the near and far end currents, respectively, of the $k$th lossless line segment, with both currents taken as flowing into the segment. Using simple algebraic manipulations on (5) followed by converting the resultant equations into the time domain provides the following MoC relation [24], [26], [27]:

$$\begin{aligned} v_{k,1}(t) - Z_0\, i_{k,1}(t) &= w_{k,1}(t), & w_{k,1}(t) &= 2v_{k,2}(t-\tau) - w_{k,2}(t-\tau)\\ v_{k,2}(t) - Z_0\, i_{k,2}(t) &= w_{k,2}(t), & w_{k,2}(t) &= 2v_{k,1}(t-\tau) - w_{k,1}(t-\tau) \end{aligned} \tag{6}$$

where $Z_0 = \sqrt{L_\infty/C_\infty}$ and $\tau = (l/m)\sqrt{L_\infty C_\infty}$ are the characteristic impedance and the delay of each lossless section, respectively. The MoC equations of (6) can be realized by the simple circuit equivalent of Fig. 1. From Fig. 1, it is observed that the MoC provides natural interfaces across which information is exchanged using the time-delayed equations of (6) rather than the more stringent Dirichlet's transmission conditions. As a result, partitioning the transmission lines at these interfaces, as shown in Fig. 3, was found to yield reliably efficient convergence without the need for artificial overlap of subcircuits and optimization as in [16]. From (6), it can be further concluded that the delayed sources $w_{k,1}(t)$ and $w_{k,2}(t)$ serve as the relaxation sources responsible for ensuring the coupling between the subcircuits for the proposed WR algorithm. The next section describes the methodology to iteratively solve the subcircuits and update the relaxation sources.
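To make the role of the delayed sources concrete, the sketch below time-steps a single lossless line between a resistive Thevenin source and a resistive load using MoC relations of the standard two-port form, $v - Z_0 i = w$ at each end with $w$ built from the delayed opposite-end waveforms. It is an illustration only: the discrete-time treatment (delay equal to an integer number of samples), the resistive terminations, and the function name `simulate_moc_line` are assumptions made here, not part of the original work.

```python
def simulate_moc_line(vs, z0, rs, rl, delay_steps):
    """Time-step one lossless line (delay = delay_steps samples, at least 1)
    between a Thevenin source (vs, rs) and a resistive load rl.

    Port relations: v - z0*i = w at each end (current flowing into the line),
    with w(t) = 2*v_other(t - tau) - w_other(t - tau).
    """
    n = len(vs)
    v1, v2 = [0.0] * n, [0.0] * n      # near-end and far-end voltages
    w1, w2 = [0.0] * n, [0.0] * n      # delayed (relaxation) sources
    for t in range(n):
        p = t - delay_steps
        if p >= 0:                     # the line is at rest before the first delay
            w1[t] = 2.0 * v2[p] - w2[p]
            w2[t] = 2.0 * v1[p] - w1[p]
        # near end: v1 - z0*i1 = w1 with i1 = (vs - v1)/rs
        v1[t] = (rs * w1[t] + z0 * vs[t]) / (rs + z0)
        # far end: v2 - z0*i2 = w2 with i2 = -v2/rl
        v2[t] = rl * w2[t] / (rl + z0)
    return v1, v2

# Matched line driven by a unit step: the far end sees half the step after one delay
v_near, v_far = simulate_moc_line([1.0] * 20, z0=50.0, rs=50.0, rl=50.0, delay_steps=4)
print(v_near[0], v_far[3], v_far[4])
```

Each end of the line only ever sees the other end through waveforms delayed by $\tau$, which is the property the partitioning scheme exploits: at any instant the two ends are decoupled.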

B. Iterative Solution of Subcircuits for Single Lines

Typically, two techniques exist for the iterative solution of the subcircuits: the Gauss–Seidel (GS) and the Gauss–Jacobi (GJ) techniques. According to the GS technique, the $(r+1)$th iterative solution of any $k$th subcircuit requires the present ($(r+1)$th) solution of all of the preceding $k-1$ subcircuits as well. This translates to a sequential solution of the subcircuits where all of the relaxation sources are updated after the solution of each individual subcircuit [3]. On the other hand, according to the GJ iterative technique, the $(r+1)$th iterative solution of any $k$th subcircuit requires only the previous ($r$th) solution of all $N$ subcircuits. This corresponds to a possible parallel solution of the subcircuits where the relaxation sources are only updated when the solution of all subcircuits is complete [3]. The above discussion shows that the GS technique involves $N$ updates or exchanges of information per iteration, where $N$ is the number of subcircuits, compared with GJ, which involves only one exchange of information. Thus, GS exhibits better convergence than GJ [15]. However, a potential drawback of GS is that it does not naturally lend itself to parallel processing like the GJ technique, since the present solution of any $k$th subcircuit is dependent on the present solution of all previous subcircuits.

In [25], a sequential GS iterative technique to solve the subcircuits was implemented. In this work, with the focus being on highly parallelizable iterative techniques, two schemes are proposed: first, the traditional GJ technique, followed by a hybrid technique that combines the complementary features of GS and GJ.


Fig. 2. SPICE equivalent circuit of an MTL using DEPACT.

Fig. 3. Partitioning of single line into subcircuits for waveform relaxation.

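1) Gauss–Jacobi (GJ): This discussion begins by considering a general two-conductor transmission line discretized into $N$ subcircuits, as illustrated in Fig. 3. In what follows, $w_{k,1}(t)$ and $w_{k,2}(t)$ denote the relaxation sources at the near and far ends of the $k$th lossless segment, so that the $k$th subcircuit is excited by $w_{k-1,2}(t)$ and $w_{k,1}(t)$, and a superscript $(r)$ denotes the iteration count. Prior to beginning the $(r+1)$th iteration, it is assumed that the $r$th iteration has been completed for all $N$ subcircuits and the waveforms of all of the relaxation sources have been updated to $w^{(r+1)}_{k,1}(t)$, $w^{(r+1)}_{k,2}(t)$. For $r = 0$, the waveforms of the relaxation sources, $w^{(1)}_{k,1}(t)$, $w^{(1)}_{k,2}(t)$, are simply the initial guess.

For the $(r+1)$th iteration, considering the $k$th subcircuit of Fig. 3, the corresponding relaxation sources with known waveforms $w^{(r+1)}_{k-1,2}(t)$, $w^{(r+1)}_{k,1}(t)$ serve as the input excitation. This translates to the following terminal conditions for the $k$th subcircuit:

$$\begin{aligned} v^{(r+1)}_{k-1,2}(t) - Z_0\, i^{(r+1)}_{k-1,2}(t) &= w^{(r+1)}_{k-1,2}(t)\\ v^{(r+1)}_{k,1}(t) - Z_0\, i^{(r+1)}_{k,1}(t) &= w^{(r+1)}_{k,1}(t) \end{aligned} \tag{7}$$

The terminal conditions of (7), along with the equations of the corresponding lumped circuit elements, together form the set of ordinary differential equations describing the $k$th subcircuit, which can be solved for a self-consistent solution of the waveforms $v^{(r+1)}_{k-1,2}(t)$, $v^{(r+1)}_{k,1}(t)$. It is noted that the relaxation sources of (7) (i.e., $w^{(r+1)}_{k-1,2}(t)$, $w^{(r+1)}_{k,1}(t)$) of each $k$th subcircuit are assumed to be known beforehand and, hence, are considered independent of the present ($(r+1)$th) solution of the remaining $N-1$ subcircuits. This particular aspect allows the $N$ subcircuits to be solved in parallel on a multiprocessor machine.

Once all of the $N$ subcircuits are solved, the voltage waveforms $v^{(r+1)}_{k,1}(t)$, $v^{(r+1)}_{k,2}(t)$, determined from the present ($(r+1)$th) iteration, are used to update the relaxation sources for the future $(r+2)$th iteration using (6) as follows:

$$\begin{aligned} w^{(r+2)}_{k,1}(t) &= 2v^{(r+1)}_{k,2}(t-\tau) - w^{(r+1)}_{k,2}(t-\tau)\\ w^{(r+2)}_{k,2}(t) &= 2v^{(r+1)}_{k,1}(t-\tau) - w^{(r+1)}_{k,1}(t-\tau) \end{aligned} \qquad k = 1, \ldots, N-1 \tag{8}$$

The total $2(N-1)$ equations of (8) required to update all of the relaxation sources, being decoupled, can be solved in parallel as well. Using the updated values of (8) as the new source waveforms for the next $(r+2)$th iteration, the subcircuits are solved again. This iterative cycle continues until the absolute error satisfies a predefined tolerance expressed as

$$\varepsilon^{(r+1)} = \max_{k,\,p}\left\lVert v^{(r+1)}_{k,p}(t) - v^{(r)}_{k,p}(t)\right\rVert \le \eta, \qquad p = 1, 2 \tag{9}$$

where $\eta$ is the predefined error tolerance.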
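The GJ cycle described above (solve every subcircuit with frozen relaxation sources, then refresh all sources at once from the delayed waveforms) can be sketched on a toy network. Everything specific in the sketch is an assumption chosen to keep the example self-contained: the subcircuits are memoryless (a series resistor between two MoC ports, a resistive source, and a resistive load) so that each "subcircuit solve" is algebraic rather than a SPICE transient run, all segments share one delay equal to an integer number of time steps, and the class and method names are hypothetical.

```python
class ResistiveChain:
    """Toy network (illustrative assumption): n_sub subcircuits joined by identical
    lossless segments; interior subcircuits are series resistors."""

    def __init__(self, n_sub, steps, delay, z0=50.0, r=5.0, rs=25.0, rl=100.0):
        self.n_sub, self.steps, self.delay = n_sub, steps, delay
        self.z0, self.r, self.rs, self.rl = z0, r, rs, rl
        self.vs = [1.0] * steps                      # unit-step excitation

    def zeros(self):
        """One zero waveform per lossless segment."""
        return [[0.0] * self.steps for _ in range(self.n_sub - 1)]

    def solve_at(self, k, t, w_near, w_far, v_near, v_far):
        """Solve subcircuit k at step t with its relaxation sources held fixed."""
        z0 = self.z0
        if k == 0:                                   # source end drives segment 0
            v_near[0][t] = (self.rs * w_near[0][t] + z0 * self.vs[t]) / (self.rs + z0)
        elif k == self.n_sub - 1:                    # load end terminates last segment
            v_far[k - 1][t] = self.rl * w_far[k - 1][t] / (self.rl + z0)
        else:                                        # series resistor between two MoC ports
            wa, wb = w_far[k - 1][t], w_near[k][t]
            i = (wa - wb) / (2.0 * z0 + self.r)
            v_far[k - 1][t] = wa - z0 * i
            v_near[k][t] = wb + z0 * i

    def delayed(self, v, w):
        """w_new(t) = 2 v(t - tau) - w(t - tau); zero before the first delay."""
        d = self.delay
        return [0.0] * d + [2.0 * v[t] - w[t] for t in range(self.steps - d)]

    def reference(self):
        """Direct time stepping of the whole, unpartitioned network."""
        w_near, w_far, v_near, v_far = (self.zeros() for _ in range(4))
        for t in range(self.steps):
            p = t - self.delay
            if p >= 0:
                for s in range(self.n_sub - 1):
                    w_near[s][t] = 2.0 * v_far[s][p] - w_far[s][p]
                    w_far[s][t] = 2.0 * v_near[s][p] - w_near[s][p]
            for k in range(self.n_sub):
                self.solve_at(k, t, w_near, w_far, v_near, v_far)
        return v_near, v_far

    def gauss_jacobi(self, tol=1e-12, max_iters=500):
        """GJ waveform relaxation starting from an all-zero initial guess."""
        w_near, w_far, v_near, v_far = (self.zeros() for _ in range(4))
        segments = range(self.n_sub - 1)
        for it in range(1, max_iters + 1):
            old = [row[:] for row in v_near + v_far]
            for k in range(self.n_sub):              # independent, hence parallelizable
                for t in range(self.steps):
                    self.solve_at(k, t, w_near, w_far, v_near, v_far)
            # single exchange per iteration: refresh every source from the new waveforms
            w_near, w_far = ([self.delayed(v_far[s], w_far[s]) for s in segments],
                             [self.delayed(v_near[s], w_near[s]) for s in segments])
            err = max(abs(a - b) for x, y in zip(v_near + v_far, old)
                      for a, b in zip(x, y))
            if err <= tol:
                return v_near, v_far, it
        return v_near, v_far, max_iters

net = ResistiveChain(n_sub=6, steps=60, delay=5)
v_near, v_far, iterations = net.gauss_jacobi()
print("GJ iterations:", iterations)
```

In this toy setting each GJ iteration extends the time interval over which the waveforms agree with the direct solution by one segment delay, so the iteration count grows with the ratio of the simulation window to the delay.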


Fig. 4. Hybrid GS–GJ iterative technique.

2) Hybrid GS–GJ: To explain this contribution, the $N$ subcircuits of Fig. 3 are considered to be divided among two groups: group A containing the odd-numbered subcircuits and group B containing the even-numbered subcircuits, where the total number of subcircuits within each group is defined as

$$N_A = \frac{N + \operatorname{mod}(N,2)}{2}\ \ \text{(group A)}, \qquad N_B = \frac{N - \operatorname{mod}(N,2)}{2}\ \ \text{(group B)} \tag{10}$$

and $\operatorname{mod}(\cdot)$ represents the modulus function. Since, for the specific case of longitudinal partitioning, coupling exists between an odd-numbered and an even-numbered subcircuit only (and not between two odd-numbered or two even-numbered subcircuits themselves), the $(r+1)$th iterative solution of any subcircuit in any group is independent of the present ($(r+1)$th) solution of any other subcircuit within the same group and rather depends on the present ($(r+1)$th) solution of particular subcircuits within the opposite group. This coupling is addressed using a nested iterative technique. The outer iteration solves groups A and B in sequence (using GS), updating the relaxation sources after every group solution. The inner iteration solves the subcircuits within each group in parallel (using GJ). This forms the basis of the proposed hybrid iterative technique and is illustrated in Fig. 4.

In each iteration, the GS sequence begins with group A before proceeding to group B. Hence, prior to beginning the $(r+1)$th iteration, it is assumed that the $r$th iteration has been completed for all $N$ subcircuits and those relaxation sources responsible for exciting only the odd-numbered subcircuits (group A) in Fig. 3 have been updated to $w^{(r+1)}_{k-1,2}(t)$, $w^{(r+1)}_{k,1}(t)$ for odd $k$, where $w_{k,1}(t)$ and $w_{k,2}(t)$ denote the relaxation sources at the near and far ends of the $k$th lossless segment. If $r = 0$, the waveforms of the above relaxation sources are simply the initial guess. For the $(r+1)$th iteration, using the above relaxation sources with known waveforms as the input excitation to the corresponding subcircuits of group A, the $N_A$ subcircuits can be solved in parallel via the GJ technique explained in the previous section. Once the GJ is concluded, the voltage waveforms determined from the present ($(r+1)$th) iteration of group A are used to update the relaxation sources responsible for exciting only the even-numbered subcircuits (group B) of Fig. 3 as

$$\begin{aligned} w^{(r+1)}_{k-1,2}(t) &= 2v^{(r+1)}_{k-1,1}(t-\tau) - w^{(r+1)}_{k-1,1}(t-\tau)\\ w^{(r+1)}_{k,1}(t) &= 2v^{(r+1)}_{k,2}(t-\tau) - w^{(r+1)}_{k,2}(t-\tau) \end{aligned} \qquad k \text{ even} \tag{11}$$

The equations of (11) are decoupled and can be solved in parallel, similar to (8).

The relaxation sources of (11) serve as the input for the corresponding subcircuits of group B, and the $N_B$ subcircuits can also be solved in parallel using the GJ technique. The voltage waveforms determined from the present ($(r+1)$th) iteration of group B are used to update the relaxation sources responsible for exciting only the subcircuits of group A for the future $(r+2)$th iteration as

$$\begin{aligned} w^{(r+2)}_{k-1,2}(t) &= 2v^{(r+1)}_{k-1,1}(t-\tau) - w^{(r+1)}_{k-1,1}(t-\tau)\\ w^{(r+2)}_{k,1}(t) &= 2v^{(r+1)}_{k,2}(t-\tau) - w^{(r+1)}_{k,2}(t-\tau) \end{aligned} \qquad k \text{ odd} \tag{12}$$

The equations of (12) can be solved in parallel as well. The above iterative cycle continues until the absolute error of the iterations satisfies the error tolerance as in (9). It is noted that the hybrid technique provides a more frequent exchange of waveforms using (11) and (12) compared with traditional GJ, which allows only the single exchange of (8) per iteration. As a result, the hybrid technique exhibits better convergence than GJ. In Section III-C, the proposed algorithm is extended to MTLs.
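The effect of exchanging waveforms twice per iteration can be illustrated by running GJ and the hybrid scheme on the same toy network. The sketch below is only an illustration under stated assumptions: the subcircuits are memoryless resistive blocks (so each subcircuit solve is algebraic), all lossless segments share one delay equal to an integer number of time steps, and the class `ResistiveChain` and its method `relax` are hypothetical names introduced here. Group A holds the first, third, ... subcircuits and group B the remaining ones.

```python
class ResistiveChain:
    """Toy network (illustrative assumption): n_sub subcircuits joined by identical
    lossless segments; interior subcircuits are series resistors."""

    def __init__(self, n_sub, steps, delay, z0=50.0, r=5.0, rs=25.0, rl=100.0):
        self.n_sub, self.steps, self.delay = n_sub, steps, delay
        self.z0, self.r, self.rs, self.rl = z0, r, rs, rl
        self.vs = [1.0] * steps                      # unit-step excitation

    def zeros(self):
        return [[0.0] * self.steps for _ in range(self.n_sub - 1)]

    def solve_at(self, k, t, w_near, w_far, v_near, v_far):
        """Solve subcircuit k at step t with its relaxation sources held fixed."""
        z0 = self.z0
        if k == 0:                                   # source end
            v_near[0][t] = (self.rs * w_near[0][t] + z0 * self.vs[t]) / (self.rs + z0)
        elif k == self.n_sub - 1:                    # load end
            v_far[k - 1][t] = self.rl * w_far[k - 1][t] / (self.rl + z0)
        else:                                        # series resistor between two MoC ports
            wa, wb = w_far[k - 1][t], w_near[k][t]
            i = (wa - wb) / (2.0 * z0 + self.r)
            v_far[k - 1][t] = wa - z0 * i
            v_near[k][t] = wb + z0 * i

    def delayed(self, v, w):
        """w_new(t) = 2 v(t - tau) - w(t - tau); zero before the first delay."""
        d = self.delay
        return [0.0] * d + [2.0 * v[t] - w[t] for t in range(self.steps - d)]

    def relax(self, hybrid, tol=1e-12, max_iters=500):
        """Waveform relaxation: plain GJ, or GS over two groups with GJ inside each."""
        n = self.n_sub
        w_near, w_far, v_near, v_far = (self.zeros() for _ in range(4))
        groups = [range(0, n, 2), range(1, n, 2)] if hybrid else [range(n)]
        for it in range(1, max_iters + 1):
            old = [row[:] for row in v_near + v_far]
            for group in groups:
                for k in group:                      # independent solves within a group
                    for t in range(self.steps):
                        self.solve_at(k, t, w_near, w_far, v_near, v_far)
                # refresh only the sources exciting the subcircuits solved next
                targets = [k for k in range(n) if k not in group] if hybrid else range(n)
                new_far = {k - 1: self.delayed(v_near[k - 1], w_near[k - 1])
                           for k in targets if k > 0}
                new_near = {k: self.delayed(v_far[k], w_far[k])
                            for k in targets if k < n - 1}
                for s, w in new_far.items():
                    w_far[s] = w
                for s, w in new_near.items():
                    w_near[s] = w
            err = max(abs(a - b) for x, y in zip(v_near + v_far, old)
                      for a, b in zip(x, y))
            if err <= tol:
                return v_near, v_far, it
        return v_near, v_far, max_iters

net = ResistiveChain(n_sub=6, steps=60, delay=5)
print("GJ iterations:    ", net.relax(hybrid=False)[2])
print("hybrid iterations:", net.relax(hybrid=True)[2])
```

In this toy setting the hybrid scheme advances the converged time interval by two segment delays per iteration instead of one, so it is expected to need roughly half as many iterations as GJ while each group is still solved fully in parallel.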

C. Extension for Multiconductor Transmission Lines

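To better explain the partitioning methodology for MTLs, the equations for the $k$th lossless line segment in Fig. 2 are provided as

$$\begin{bmatrix}\mathbf{V}_{k,2}(s)\\ -\mathbf{I}_{k,2}(s)\end{bmatrix} = e^{s\mathbf{B}l/m}\begin{bmatrix}\mathbf{V}_{k,1}(s)\\ \mathbf{I}_{k,1}(s)\end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix}\mathbf{0} & -\mathbf{L}_\infty\\ -\mathbf{C}_\infty & \mathbf{0}\end{bmatrix} \tag{13}$$

where the subscripts 1 and 2 denote the near and far ends of the segment, with both current vectors taken as flowing into the segment. It is observed that (13) leads to coupled equations. However, the coupled lossless sections can be decoupled into $M$ single lossless lines, where $M$ is the number of coupled lines, using a linear transformation of modal voltages/currents as

$$\mathbf{V}_{k,p}(s) = \mathbf{T}_V \hat{\mathbf{V}}_{k,p}(s), \qquad \mathbf{I}_{k,p}(s) = \mathbf{T}_I \hat{\mathbf{I}}_{k,p}(s), \qquad p = 1, 2 \tag{14}$$

where $\mathbf{T}_V$ and $\mathbf{T}_I$ are constant matrices chosen to diagonalize $\mathbf{L}_\infty$ and $\mathbf{C}_\infty$ and have the following properties [24]:

$$\mathbf{T}_V^{-1}\mathbf{L}_\infty\mathbf{T}_I = \hat{\mathbf{L}}, \qquad \mathbf{T}_I^{-1}\mathbf{C}_\infty\mathbf{T}_V = \hat{\mathbf{C}}, \qquad \mathbf{T}_V^{t} = \mathbf{T}_I^{-1} \tag{15}$$

$\hat{\mathbf{L}}$ and $\hat{\mathbf{C}}$ are diagonal matrices, and the superscript $t$ denotes the transpose of the matrix. Replacing (14) and (15) in (13) and performing the same algebraic manipulations as in Section III-A, followed by converting the resultant equations into the time domain, the decoupled lossless sections can be represented using the MoC equations similar to (6) as

$$\begin{aligned} \hat{v}_{k,1,j}(t) - Z_j\, \hat{i}_{k,1,j}(t) &= \hat{w}_{k,1,j}(t), & \hat{w}_{k,1,j}(t) &= 2\hat{v}_{k,2,j}(t-\tau_j) - \hat{w}_{k,2,j}(t-\tau_j)\\ \hat{v}_{k,2,j}(t) - Z_j\, \hat{i}_{k,2,j}(t) &= \hat{w}_{k,2,j}(t), & \hat{w}_{k,2,j}(t) &= 2\hat{v}_{k,1,j}(t-\tau_j) - \hat{w}_{k,1,j}(t-\tau_j) \end{aligned} \tag{16}$$

where $j = 1, \ldots, M$ represents the line number, $Z_j = \sqrt{\hat{L}_{jj}/\hat{C}_{jj}}$ and $\tau_j = (l/m)\sqrt{\hat{L}_{jj}\hat{C}_{jj}}$ represent the characteristic impedance and delay of each lossless section, respectively, of the $j$th line, and

$$\hat{\mathbf{v}}_{k,p}(t) = \left[\hat{v}_{k,p,1}(t), \ldots, \hat{v}_{k,p,M}(t)\right]^{t}, \qquad \hat{\mathbf{i}}_{k,p}(t) = \left[\hat{i}_{k,p,1}(t), \ldots, \hat{i}_{k,p,M}(t)\right]^{t} \tag{17}$$

where $\hat{\mathbf{v}}_{k,p}(t)$, $\hat{\mathbf{i}}_{k,p}(t)$ are the time-domain counterparts of the vectors $\hat{\mathbf{V}}_{k,p}(s)$, $\hat{\mathbf{I}}_{k,p}(s)$, respectively, defined in (14). The MoC equations (16) for MTLs can be realized using the equivalent circuit of Fig. 2, where the matrices $\mathbf{T}_V$ and $\mathbf{T}_I$ arising from the similarity transformation of (14) are grouped with the lumped representation of the lossy section.

Fig. 5. Partitioning of MTLs into subcircuits for waveform relaxation.

It is observed that, similar to the single-line case of Fig. 1, the MoC provides natural interfaces for MTLs across which information is exchanged using the time-delayed equations of (16). Hence, longitudinally partitioning transmission lines at these interfaces, as shown in Fig. 5, is expected to yield efficient convergence of the proposed WR algorithm. The following section describes the iterative solution of the subcircuits of Fig. 5.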
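The modal decoupling step can be illustrated for the simplest coupled case. The sketch below assumes a symmetric two-line system, for which the even/odd-mode matrix $\frac{1}{\sqrt{2}}\begin{bmatrix}1 & 1\\ 1 & -1\end{bmatrix}$ can serve as both transformation matrices and satisfies the transpose/inverse property used for modal transformations of this kind; the p. u. l. values, the segment length, and the function names are illustrative assumptions and are not taken from the paper.

```python
import math

def matmul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(x):
    return [list(row) for row in zip(*x)]

def modal_parameters(L, C, seg_len):
    """Decouple a symmetric two-line lossless section into two modal lines.

    Returns the modal inductance and capacitance matrices together with the
    characteristic impedance and delay of each modal line.
    """
    t = 1.0 / math.sqrt(2.0)
    T_v = [[t, t], [t, -t]]            # even/odd-mode voltage transformation
    T_i = [[t, t], [t, -t]]            # even/odd-mode current transformation
    T_v_inv = transpose(T_i)           # follows from T_v^t = T_i^{-1}
    T_i_inv = transpose(T_v)
    L_hat = matmul(matmul(T_v_inv, L), T_i)
    C_hat = matmul(matmul(T_i_inv, C), T_v)
    z = [math.sqrt(L_hat[j][j] / C_hat[j][j]) for j in range(2)]
    tau = [seg_len * math.sqrt(L_hat[j][j] * C_hat[j][j]) for j in range(2)]
    return L_hat, C_hat, z, tau

# Assumed illustrative values in H/cm and F/cm, with a 0.5-cm lossless section
L = [[4e-9, 1e-9], [1e-9, 4e-9]]
C = [[1.2e-12, -0.2e-12], [-0.2e-12, 1.2e-12]]
L_hat, C_hat, z, tau = modal_parameters(L, C, seg_len=0.5)
print("modal impedances (ohm):", z)
print("modal delays (s):", tau)
```

Each modal line has its own impedance and delay, which is why the relaxation sources of an MTL section carry a line index and use a per-line delay.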

D. Iterative Solution of Subcircuits for MTLs

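Once the MTL network is partitioned using the above methodology, both the GJ and hybrid GS–GJ iterative techniques can be used to solve the subcircuits, as explained below. The iterative procedures (GJ and hybrid GS–GJ) for MTLs are similar to those for two-conductor lines, the main difference being that the MoC equations of (6) now have to be extended to consider the $M$ decoupled equations of (16), where $M$ is the number of coupled lines.

1) GJ for MTLs: This discussion begins by considering a general MTL discretized into $N$ subcircuits, as illustrated in Fig. 5. In what follows, $\hat{w}_{k,1,j}(t)$ and $\hat{w}_{k,2,j}(t)$ denote the modal relaxation sources of the $j$th line at the near and far ends of the $k$th lossless segment, and a superscript $(r)$ denotes the iteration count. Assuming that the waveforms of all of the relaxation sources $\hat{w}^{(r+1)}_{k,1,j}(t)$, $\hat{w}^{(r+1)}_{k,2,j}(t)$ are known from the previous $r$th iteration and are used as input excitations for the subcircuits of Fig. 5, the terminal conditions required for the $(r+1)$th iterative solution of the $k$th subcircuit are changed from (7) to include the effect of MTLs described by (16) as

$$\begin{aligned} \hat{v}^{(r+1)}_{k-1,2,j}(t) - Z_j\, \hat{i}^{(r+1)}_{k-1,2,j}(t) &= \hat{w}^{(r+1)}_{k-1,2,j}(t)\\ \hat{v}^{(r+1)}_{k,1,j}(t) - Z_j\, \hat{i}^{(r+1)}_{k,1,j}(t) &= \hat{w}^{(r+1)}_{k,1,j}(t) \end{aligned} \qquad j = 1, \ldots, M \tag{18}$$

Since the relaxation sources of (18) (i.e., $\hat{w}^{(r+1)}_{k-1,2,j}(t)$, $\hat{w}^{(r+1)}_{k,1,j}(t)$) of each $k$th subcircuit are assumed known beforehand and independent of the present ($(r+1)$th) solution of the remaining $N-1$ subcircuits, the $N$ subcircuits can be solved in parallel, similar to two-conductor lines. The $(r+1)$th iterative solution of all of the $N$ subcircuits provides the self-consistent solution of the waveforms $\hat{v}^{(r+1)}_{k,1,j}(t)$, $\hat{v}^{(r+1)}_{k,2,j}(t)$, which are thereafter used to update the relaxation sources for the future $(r+2)$th iteration using (16) as

$$\begin{aligned} \hat{w}^{(r+2)}_{k,1,j}(t) &= 2\hat{v}^{(r+1)}_{k,2,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k,2,j}(t-\tau_j)\\ \hat{w}^{(r+2)}_{k,2,j}(t) &= 2\hat{v}^{(r+1)}_{k,1,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k,1,j}(t-\tau_j) \end{aligned} \tag{19}$$

This iterative cycle continues until the absolute error satisfies a predefined tolerance $\eta$ as

$$\varepsilon^{(r+1)} = \max_{k,\,p,\,j}\left\lVert \hat{v}^{(r+1)}_{k,p,j}(t) - \hat{v}^{(r)}_{k,p,j}(t)\right\rVert \le \eta \tag{20}$$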


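2) Hybrid GS–GJ for MTLs: The characteristic of longitudinal partitioning, where couplings exist between an odd-numbered and an even-numbered subcircuit only (and not between two odd-numbered or two even-numbered subcircuits themselves), is applicable to MTLs as well. Hence, the hybrid iterative technique of Fig. 4 can be easily extended to MTLs.

Assuming that the waveforms of all of the relaxation sources responsible for exciting the subcircuits of group A, $\hat{w}^{(r+1)}_{k-1,2,j}(t)$ and $\hat{w}^{(r+1)}_{k,1,j}(t)$ for odd $k$, are known from the previous $r$th iteration, the $N_A$ subcircuits of group A can be solved in parallel via the GJ technique explained in the previous section. Here $\hat{w}_{k,1,j}(t)$ and $\hat{w}_{k,2,j}(t)$ denote the modal relaxation sources of the $j$th line at the near and far ends of the $k$th lossless segment. Once the GJ is concluded, the voltage waveforms determined from the present ($(r+1)$th) iteration of group A are used to update the relaxation sources responsible for exciting only the subcircuits of group B as

$$\begin{aligned} \hat{w}^{(r+1)}_{k-1,2,j}(t) &= 2\hat{v}^{(r+1)}_{k-1,1,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k-1,1,j}(t-\tau_j)\\ \hat{w}^{(r+1)}_{k,1,j}(t) &= 2\hat{v}^{(r+1)}_{k,2,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k,2,j}(t-\tau_j) \end{aligned} \qquad k \text{ even} \tag{21}$$

The relaxation sources of (21) now serve as the input for the corresponding subcircuits of group B, and the $N_B$ subcircuits can also be solved in parallel using the GJ technique. The voltage waveforms determined from the present ($(r+1)$th) iteration of group B are used to update the relaxation sources responsible for exciting only the subcircuits of group A for the future $(r+2)$th iteration as

$$\begin{aligned} \hat{w}^{(r+2)}_{k-1,2,j}(t) &= 2\hat{v}^{(r+1)}_{k-1,1,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k-1,1,j}(t-\tau_j)\\ \hat{w}^{(r+2)}_{k,1,j}(t) &= 2\hat{v}^{(r+1)}_{k,2,j}(t-\tau_j) - \hat{w}^{(r+1)}_{k,2,j}(t-\tau_j) \end{aligned} \qquad k \text{ odd} \tag{22}$$

The above iterative cycle continues until the absolute error of the iterations satisfies the error tolerance as in (20). Equations (21) and (22) provide twice the amount of waveform exchange compared with the single waveform exchange of (19), and hence the hybrid technique exhibits improved convergence compared with the GJ technique.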

IV. COMPUTATIONAL COMPLEXITY OF THE PROPOSED ALGORITHM

The analysis begins by considering a general MTL network of Fig. 2 discretized into $m$ DEPACT sections. Assuming each DEPACT section to be described using $n$ delayed ordinary differential equations, the size of the overall circuit matrix describing the original network is $mn \times mn$. The computational complexity of directly inverting the above matrix to perform time-domain analysis is $O(m^3 n^3)$ [36], [37]. However, the matrices obtained by traditional circuit simulators are sparse by nature and can be solved more efficiently using sparse matrix routines at a cost of $O((mn)^{\alpha})$, where the exponent $\alpha > 1$ depends on the sparsity of the matrix [11]. For large distributed networks, the interconnects have to be discretized into many segments to accurately capture the response at the output ports. For such cases, the superlinear scaling of the computational cost for traditional circuit simulators is a major factor limiting their applicability. To address the above issue in the proposed WR algorithm, the DEPACT sections are separated into $N$ subcircuits, each described using $n$ delayed differential equations, which can now be solved independently. The total computational cost of the proposed WR algorithm is mathematically quantified using the following lemmas.

Lemma 1: For $N$ subcircuits, the computational cost of the proposed WR algorithm using traditional GJ iterations is $O(r_{\mathrm{GJ}} N/P)$, where $r_{\mathrm{GJ}}$ is the number of iterations and $P$ is the number of CPUs available for parallel processing.

Proof: For typical WR algorithms, the total computational cost can be divided into two parts: the first part is to solve the subcircuits independently and the next is to update the relaxation sources.

It is assumed that the cost of solving one subcircuit scales as $O(n^{\alpha})$, where $\alpha$ is the scaling coefficient. Using a GJ iterative technique where the task of independently solving $N$ subcircuits can be distributed over $P$ CPUs, the total cost of solving the subcircuits per iteration is given by $O(N n^{\alpha}/P)$. The second stage of the algorithm involves updating the $2M(N-1)$ relaxation sources using (8) and (19), where $M$ is the number of coupled lines. This translates to the solution of $2M(N-1)$ linear algebraic equations in the time domain per iteration. Since the equations are all decoupled, they can be solved independently in parallel using $P$ CPUs for a cost of $O(2M(N-1)\beta/P)$, where $\beta$ is the scaling coefficient for the second part of the proposed WR algorithm. Since, within the context of this analysis, $M$ is a constant, the above cost can be rewritten as $O(\beta N/P)$.

The total cost of each iteration is the sum of the above costs, given as

$$C^{\mathrm{iter}}_{\mathrm{GJ}} = O\!\left(\frac{N n^{\alpha}}{P}\right) + O\!\left(\frac{\beta N}{P}\right) \tag{23}$$

where $C^{\mathrm{iter}}_{\mathrm{GJ}}$ is the cost of each GJ iteration. Since the above process needs to be repeated for $r_{\mathrm{GJ}}$ iterations, the total cost of the proposed algorithm using traditional GJ is

$$C_{\mathrm{GJ}} = r_{\mathrm{GJ}}\left[O\!\left(\frac{N n^{\alpha}}{P}\right) + O\!\left(\frac{\beta N}{P}\right)\right] \tag{24}$$

where $C_{\mathrm{GJ}}$ is the total cost of the proposed algorithm using GJ.

It is observed that the solution of the linear algebraic equations to update the relaxation sources of (8) and (19) does not involve any matrix inversion. On the other hand, the solution of each subcircuit involves the inversion of a matrix of size $n \times n$. As a result, the cost of solving the subcircuits (first part) is found to dominate over the cost of updating the relaxation sources (second part) [13] (i.e., $n^{\alpha} \gg \beta$). Hence, the result of (24) can be simplified to

$$C_{\mathrm{GJ}} \approx O\!\left(\frac{r_{\mathrm{GJ}}\, N n^{\alpha}}{P}\right) \tag{25}$$

where, within the context of this work, $n$ is a function of the number of MTLs $M$ and is treated as a constant. Equation (25) demonstrates that the proposed WR algorithm scales as $O(r_{\mathrm{GJ}} N/P)$ when using the traditional GJ. The following lemma extends the above analysis to the hybrid iterative technique.

Lemma 2: For $N$ subcircuits, the computational cost of the proposed WR algorithm using the hybrid GS–GJ iterations is $O(r_{\mathrm{H}} N/P)$, where $r_{\mathrm{H}}$ is the number of iterations.


Fig. 6. Circuit of Example 1.

Proof: The cost of the proposed WR algorithm using the hybrid iterative technique can be divided into two parts: the first part is to solve the $N_A$ subcircuits of group A and update the relaxation sources using (21). The second part is to solve the $N_B$ subcircuits of group B and update the relaxation sources using (22). Since updating the relaxation sources using (21) and (22) does not require any matrix inversion, the contribution of solving (21) and (22) is minimal compared with the cost of the solution of each subcircuit. As a result, the total cost of the hybrid iterative technique can be approximated as simply the cost of the independent solution of the $N_A$ and $N_B$ subcircuits.

The computational cost of solving the $N_A$ subcircuits per iteration using the GJ technique with $P$ parallel CPUs is given by $O(N_A n^{\alpha}/P)$ (from Lemma 1), where each subcircuit is described by $n$ equations and $\alpha$ is the scaling coefficient of a single subcircuit solve. Similarly, the cost of the $N_B$ subcircuits per iteration is approximated as $O(N_B n^{\alpha}/P)$. Since the solution of the $N_A$ and $N_B$ subcircuits proceeds in sequence, the total cost of the hybrid technique per iteration is the sum of the above two costs, given here as

$$C^{\mathrm{iter}}_{\mathrm{H}} = O\!\left(\frac{N_A n^{\alpha}}{P}\right) + O\!\left(\frac{N_B n^{\alpha}}{P}\right) \tag{26}$$

Multiplying the above cost by the number of iterations (in this case, $r_{\mathrm{H}}$) provides an estimate of the full computational cost of the proposed WR algorithm using the proposed GS–GJ hybrid iterative technique as follows:

$$C_{\mathrm{H}} = r_{\mathrm{H}}\left[O\!\left(\frac{N_A n^{\alpha}}{P}\right) + O\!\left(\frac{N_B n^{\alpha}}{P}\right)\right] \tag{27}$$

From the definition of $N_A$ and $N_B$ in (10), which gives $N_A + N_B = N$, (27) can be approximated to

$$C_{\mathrm{H}} \approx O\!\left(\frac{r_{\mathrm{H}}\, N n^{\alpha}}{P}\right) \tag{28}$$

Equation (28) demonstrates that the proposed WR algorithm scales as $O(r_{\mathrm{H}} N/P)$ when using the hybrid iterative technique.

Comparing the scaling of (25) and (28) with the number of available CPUs $P$, it is appreciated that the hybrid iterative technique retains the same high degree of parallelizability as the GJ technique. However, the hybrid technique has the added advantage of faster convergence over the GJ counterpart due to the greater exchange of waveforms using (11) and (12) and (21) and (22) compared with the single exchange of (8) and (19).

It is observed that the main reason behind the attractiveness of the proposed algorithm [whether using GJ as in (25) or the hybrid technique as in (28)] is the ability to solve the subcircuits independently. This translates to an almost linear scaling of the computational costs of the proposed algorithm with the number of DEPACT sections, unlike SPICE, which suffers from a superlinear scaling. In addition, using GJ and the hybrid technique provides an additional advantage over SPICE (and GS-based WR algorithms like [25]) of dividing the computational cost of the proposed algorithm over multiple CPUs ($P$). These results will be validated using the numerical examples in Section V.

V. NUMERICAL EXAMPLES

Three examples are presented here to demonstrate the validity and efficiency of the proposed algorithm. For a fair comparison of the proposed work with full SPICE simulations, all of the subcircuits of the WR iterations are also solved using SPICE. A customized C++ code is used to extract the waveforms of the th subcircuit and update the relaxation sources without any external communication between the user and the SPICE engine. The scheduling of each subcircuit solve (whether using the GJ or the GS–GJ technique) is automated using MATLAB 2010b. Within the context of this work, full SPICE simulations refer to the DEPACT algorithm of [26] and [27].

Example 1: The objective of this example is to demonstrate the accuracy of the proposed WR algorithm and the superior convergence of the hybrid iterative technique over the traditional GJ technique. For this example, a transmission-line network consisting of seven transmission-line segments, as shown in Fig. 6, is considered. The p.u.l. parameters of the network are 0.25 /cm, 4 nH/cm, pF/cm, mmho/cm and 5 /cm where represents the skin effect losses as a function of frequency [38], [39]. The network is excited by a trapezoidal voltage source of rise time 0.1 ns, pulsewidth 5 ns, and amplitude of 2 V, and is loaded with two SPICE level 49 CMOS inverters using 180-nm technology.

To illustrate the accuracy of the proposed algorithm, the line length of each segment is set to 30 cm. In this case, the number of subcircuits required is 420. The network is then solved using both the proposed work and the full SPICE simulation. The proposed work uses the hybrid iterative technique to solve the subcircuits on a sequential platform with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero. The transient responses at the far end of the network using the proposed WR algorithm and full SPICE simulations are shown in Fig. 7.

Fig. 7. Transient response for Example 1 using the proposed algorithm and full SPICE simulation. All line lengths are cm. (a) Transient response at output port . (b) Transient response at output port .

Fig. 8. Convergence properties of the proposed hybrid iterative technique compared to GJ. All line lengths are cm.

Next, the convergence properties of the proposed hybrid technique are compared with those of the traditional GJ technique. For each algorithm, the number of iterations is varied from 1 to 10 and the scaling of the associated error [ of (9)] is displayed in Fig. 8. It is observed that the proposed hybrid technique shows significantly faster convergence than the traditional GJ algorithm. This is because the proposed hybrid technique involves twice the amount of information exchange as the GJ technique for the same number of iterations (see Sections III-B and III-D).
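The effect of exchanging information twice per iteration can be illustrated on a toy problem. The sketch below is not the transmission-line WR algorithm: it applies a Gauss-Jacobi sweep and a two-stage alternating (red-black) sweep to a small tridiagonal linear system, where each unknown stands in for one subcircuit coupled only to its two neighbours. The red-black ordering is an assumed stand-in for the hybrid idea, and the system size, tolerance, and function names are hypothetical.

```python
import numpy as np


def jacobi_sweep(x, b):
    """One Gauss-Jacobi sweep for 2*x[i] - x[i-1] - x[i+1] = b[i] (zero boundary values).
    Every unknown is updated from the previous iterate only."""
    padded = np.concatenate(([0.0], x, [0.0]))
    return 0.5 * (b + padded[:-2] + padded[2:])


def hybrid_sweep(x, b):
    """One two-stage sweep: even-indexed unknowns are updated first, then odd-indexed
    unknowns are updated using the freshly computed even values.
    Updates within each stage are mutually independent, so each stage is parallelizable."""
    x = x.copy()
    for parity in (0, 1):
        padded = np.concatenate(([0.0], x, [0.0]))
        update = 0.5 * (b + padded[:-2] + padded[2:])
        x[parity::2] = update[parity::2]
    return x


def iterations_to_converge(sweep, b, tol=1e-6, max_iter=10000):
    """Count sweeps until the largest change between successive iterates drops below tol."""
    x = np.zeros_like(b)
    for k in range(1, max_iter + 1):
        x_new = sweep(x, b)
        if np.max(np.abs(x_new - x)) < tol:
            return k
        x = x_new
    return max_iter


if __name__ == "__main__":
    b = np.ones(20)
    print("Gauss-Jacobi sweeps:", iterations_to_converge(jacobi_sweep, b))
    print("two-stage sweeps:   ", iterations_to_converge(hybrid_sweep, b))
```

On this toy system the two-stage sweep needs noticeably fewer iterations than the Jacobi sweep, which mirrors the qualitative trend reported in Fig. 8 without reproducing its data.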

Example 2: The objective of this example is to illustrate the computational efficiency of the proposed work over full SPICE simulations for MTL structures. For this example, a seven-coupled-line network with the physical dimensions shown in Fig. 9(a) is considered. The p.u.l. parameters for this example are extracted from the HSPICE field solver [38] and include frequency-dependent parameters. For the following analyses, the MTL network topology is shown in Fig. 9(b), where lines 1, 3, 5, and 7 are excited with trapezoidal voltage sources of rise time 0.1 ns, pulsewidth 5 ns, and amplitude of 2 V.

This example begins with a demonstration of the performance of the proposed work compared with full SPICE simulations as the size of the network increases. The line length of the network in Fig. 9(b) is increased from 0 to 200 cm in steps of 10 cm.
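The trapezoidal excitation specified for this example (rise time 0.1 ns, pulsewidth 5 ns, amplitude 2 V) can be sketched as a piecewise-linear waveform. The fall time, the start delay, and the reading of "pulsewidth" as the duration of the flat top are not stated in the text and are assumptions here; the function name is hypothetical.

```python
import numpy as np


def trapezoidal_pulse(t, amplitude=2.0, rise_time=0.1e-9, pulse_width=5e-9,
                      fall_time=None, delay=0.0):
    """Trapezoidal voltage waveform evaluated at time(s) t, in seconds.

    Assumptions: fall_time defaults to rise_time, the pulse starts at `delay`,
    and pulse_width is the duration of the flat top.
    """
    if fall_time is None:
        fall_time = rise_time
    # Breakpoints: start of rise, end of rise, start of fall, end of fall.
    t_breaks = delay + np.array([0.0,
                                 rise_time,
                                 rise_time + pulse_width,
                                 rise_time + pulse_width + fall_time])
    v_breaks = np.array([0.0, amplitude, amplitude, 0.0])
    # np.interp holds the end values (0 V) outside the breakpoint range.
    return np.interp(t, t_breaks, v_breaks)


if __name__ == "__main__":
    t = np.linspace(0.0, 6e-9, 7)
    print(trapezoidal_pulse(t))
```

A waveform of this kind is normally described to a circuit simulator by its breakpoints, so the same four time/voltage pairs are the only data the sketch needs.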

To accurately model the network, the number of subcircuits is increased in steps of 16 for each 10-cm step and ranges from 0 to 320. For each case, the network is solved using both the proposed work and the full SPICE simulation. The proposed work uses both the hybrid technique and the traditional GJ technique on a sequential platform with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero. For this particular error tolerance, the number of iterations required for convergence is found to be consistently between 5 and 6. The accuracy of the proposed work (with the hybrid technique) compared to full SPICE simulation is illustrated in Fig. 10 for cm (i.e., for 80 subcircuits). The scaling of the computational cost of both the proposed work and full SPICE simulation with the line length is shown in Fig. 11(a). It is observed from Fig. 11(a) that the proposed work scales almost linearly for both GJ and the hybrid algorithm, as predicted in (25) and (28), respectively, while the full SPICE solution of the original network scales superlinearly as where for this example. In addition, the hybrid iterative technique converges twice as fast as the traditional GJ technique.

Fig. 9. Transmission line structure of Example 2.

Fig. 10. Transient response for Example 2 using the proposed WR algorithm and the full SPICE simulation. Line length of the network is cm. (a) Transient response at output port . (b) Transient response at output port .

Fig. 11. Scaling of computational cost for Example 2. (a) Scaling of computational cost with line length where number of CPUs . (b) Scaling of CPU speed up with number of CPUs where line length cm.

TABLE I. CPU TIME COMPARISON FOR EXAMPLE 2

Next, the performance of the proposed work is demonstrated on a parallel platform. The length of the network is fixed at the corner of our design space where cm and the network is solved using both the proposed work and full SPICE simulation. The proposed WR iterations are performed using both the hybrid technique and the traditional GJ technique, where the number of processors is varied from to for the same error tolerance as before. The CPU speed up offered by both iterative techniques over full SPICE simulations is shown in Fig. 11(b) and summarized in Table I. The speed up for either iterative technique scales almost linearly with the number of processors, thereby demonstrating the high parallelizability of both, as theoretically expected from (25) and (28). The minor deviation of Fig. 11(b) from the exactly linear scaling of (25) and (28) with respect to the number of CPUs is due to the communication overheads incurred between processors.

Example 3: For this example, a network consisting of a cascade of subnetworks as shown in Fig. 12 is considered. Each subnetwork consists of the three-coupled MTL structure of [40] with line length cm. For the following analysis, lines one and three of the network are excited with a trapezoidal voltage source of rise time ns, pulsewidth ns and amplitude of 5 V. Each subnetwork is modeled using eight subcircuits.

Fig. 12. Circuit of Example 3.

In this analysis, the number of subnetworks ( of Fig. 12) is increased from 0 to 50 in steps of 5 (i.e., the number of subcircuits is increased from 0 to 400 in steps of 40). For each case, the network is solved using both the proposed work and the full SPICE simulation. The WR iterations for the proposed work are performed using the hybrid technique on a sequential machine with the predefined error tolerance set to and an initial guess of the relaxation sources set to the dc solution of zero. For this particular error tolerance, the number of iterations required for convergence was found to be consistently between 6 and 7. The scaling of the computational cost of both the proposed work and full SPICE simulation with is demonstrated in Fig. 13(a). Similar to the previous example, the proposed WR algorithm shows linear scaling with the size of the network compared to the superlinear scaling of full SPICE ( where for this example).

Fig. 13. Scaling of computational cost for Example 3. (a) Scaling of computational cost with number of subnetworks where number of CPUs . (b) Scaling of computational cost with number of CPUs where number of subnetworks .

TABLE II. CPU TIME COMPARISON FOR EXAMPLE 3

Next, the performance of the proposed work is demonstrated on a parallel platform. The number of subnetworks is fixed at the corner of our design space where and the network is solved using both the proposed work and full SPICE simulation. The proposed WR iterations are performed on a parallel platform where the number of processors is varied from to and the same error tolerance of is used with an initial guess of the relaxation sources set to the dc solution of zero. The scaling of the CPU speed up offered by the proposed algorithm over full SPICE simulations as a function of the number of processors is shown in Fig. 13(b) and summarized in Table II. As expected, the speed up for the proposed WR algorithm scales almost linearly with the number of processors, similar to Example 2.
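Scaling statements of the kind made in Examples 2 and 3 are usually checked by fitting a power law to measured run times and by computing speed up relative to a reference run. The sketch below shows one way to do both; it is not the procedure used for Figs. 11 and 13, and the timing values in it are synthetic placeholders rather than data from Tables I or II.

```python
import numpy as np


def fit_power_law(sizes, times):
    """Least-squares fit of times ~ c * sizes**alpha in log-log space.
    Returns (alpha, c). Sizes and times must be strictly positive."""
    sizes = np.asarray(sizes, dtype=float)
    times = np.asarray(times, dtype=float)
    alpha, log_c = np.polyfit(np.log(sizes), np.log(times), 1)
    return alpha, np.exp(log_c)


def speed_up(reference_time, parallel_times):
    """Speed up of each parallel run relative to a single reference run time."""
    return reference_time / np.asarray(parallel_times, dtype=float)


if __name__ == "__main__":
    # Synthetic example: the zero-size case is excluded because log(0) is undefined.
    sizes = np.arange(40, 401, 40)
    synthetic_times = 0.02 * sizes ** 1.7
    alpha, c = fit_power_law(sizes, synthetic_times)
    print(f"fitted exponent: {alpha:.3f}")
    print(speed_up(100.0, [100.0, 52.0, 27.0, 14.0]))
```

A fitted exponent close to 1 indicates linear scaling with network size, while a value clearly above 1 indicates superlinear scaling.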

VI. CONCLUSION

In this paper, a longitudinal-partitioning-based waveform relaxation algorithm for efficient transient analysis of distributed transmission-line networks is presented. The proposed methodology represents lossy transmission lines as a cascade of lumped circuit elements alternating with lossless line segments, where the lossless line segments are modeled using the method of characteristics. Partitioning the transmission lines at the natural interfaces provided by the method of characteristics allows the resulting subcircuits to be weakly coupled by construction. The subcircuits are solved independently using a hybrid iterative technique that combines the fast convergence of the proposed GS technique with the parallelizability of the GJ technique. Numerical examples illustrate that the proposed algorithm exhibits good scaling with both the size of the network and the number of CPUs available for parallel processing, thereby providing significant savings in run time costs compared with full SPICE simulations.

REFERENCES

[1] R. Achar and M. Nakhla, “Simulation of high-speed interconnects,” Proc. IEEE, vol. 89, no. 5, pp. 693–728, May 2001.

[2] E. Lelarasmee, A. E. Ruehli, and A. L. Sangiovanni-Vincentelli, “The waveform relaxation method for time-domain analysis of large-scale integrated circuits,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. CAD-1, no. 3, pp. 131–145, Jul. 1982.

[3] J. White and A. L. Sangiovanni-Vincentelli, Relaxation Techniques for the Simulation of VLSI Circuits. Norwell, MA: Kluwer, 1987.

[4] F. Y. Chang, “The generalized method of characteristics for waveform relaxation analysis of lossy coupled transmission lines,” IEEE Trans. Microw. Theory Tech., vol. 37, no. 12, pp. 2028–2038, Dec. 1989.

[5] F. Y. Chang, “Waveform relaxation analysis of RLCG transmission lines,” IEEE Trans. Circuits Syst., vol. 37, no. 11, pp. 1394–1415, Nov. 1990.

[6] F. Y. Chang, “Relaxation simulation of transverse electromagnetic wave propagation in coupled transmission lines,” IEEE Trans. Circuits Syst., vol. 38, no. 8, pp. 916–936, Aug. 1991.

[7] F. Y. Chang, “Waveform relaxation analysis of nonuniform lossy transmission lines characterized with frequency dependent parameters,” IEEE Trans. Circuits Syst., vol. 38, no. 12, pp. 1484–1500, Dec. 1991.

[8] F. Y. Chang, “Transient simulation of nonuniform coupled lossy transmission lines characterized with frequency-dependent parameters, Part I: Waveform relaxation analysis,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 39, no. 8, pp. 585–603, Aug. 1992.

[9] J. Mao and Z. Li, “Waveform relaxation solution of ABCD matrices of nonuniform transmission lines for transient analysis,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. 13, no. 11, pp. 1409–1412, Nov. 1994.

[10] F. C. M. Lau and E. M. Deeley, “Transient analysis of lossy coupled transmission lines in a lossy medium using the waveform relaxation method,” IEEE Trans. Microw. Theory Tech., vol. 43, no. 3, pp. 692–697, Mar. 1995.

[11] N. M. Nakhla, A. E. Ruehli, R. Achar, and M. S. Nakhla, “Simulation of coupled interconnects using waveform relaxation and transverse partitioning,” IEEE Trans. Adv. Packag., vol. 29, no. 1, pp. 78–87, Feb. 2006.

[12] N. Nakhla, A. E. Ruehli, M. S. Nakhla, R. Achar, and C. Chen, “Waveform relaxation techniques for simulation of coupled interconnects with frequency-dependent parameters,” IEEE Trans. Adv. Packag., vol. 30, no. 2, pp. 257–269, May 2007.

[13] D. Paul, N. M. Nakhla, R. Achar, and M. S. Nakhla, “Parallel simulation of massively coupled interconnect networks,” IEEE Trans. Adv. Packag., vol. 33, no. 1, pp. 115–127, Feb. 2010.

[14] Y.-Z. Xie, F. G. Canavero, T. Maestri, and Z.-J. Wang, “Crosstalk analysis of multiconductor transmission lines based on distributed analytical representation and iterative technique,” IEEE Trans. Electromagn. Compatibil., vol. 52, no. 3, pp. 712–727, Aug. 2010.

[15] R. Achar, M. S. Nakhla, H. S. Dhindsa, A. R. Sridhar, D. Paul, and N. M. Nakhla, “Parallel and scalable transient simulator for power grids via waveform relaxation (PTS-PWR),” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 19, no. 2, pp. 319–332, Feb. 2011.

[16] M. Al-Khaleel, A. E. Ruehli, and M. J. Gander, “Optimized waveform relaxation methods for longitudinal partitioning of transmission lines,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 9, pp. 1732–1743, Aug. 2009.

[17] M. J. Gander and A. Stuart, “Space-time continuous analysis of waveform relaxation for the heat equation,” SIAM J. Sci. Comput., vol. 19, no. 6, pp. 2014–2031, Nov. 1998.

[18] E. Giladi and H. B. Keller, “Space time domain decomposition for parabolic problems,” Numer. Math., vol. 93, no. 2, pp. 279–313, 2002.

[19] W. T. Beyene, “Application of multilinear and waveform relaxation methods for efficient simulation of interconnect-dominated nonlinear networks,” IEEE Trans. Adv. Packag., vol. 31, no. 3, pp. 637–648, Aug. 2008.

[20] V. B. Dmitriev-Zdorov and B. Klaassen, “An improved relaxation approach for mixed system analysis with several simulation tools,” in Proc. EURO-DAC, 1995, pp. 274–279.

[21] V. B. Dmitriev-Zdorov, “Generalized coupling as a way to improve the convergence in relaxation-based solvers,” in Proc. EURO-DAC/EUROVHDL Exhib., Geneva, Switzerland, Sep. 1996.

[22] M. J. Gander and L. Halpern, “Optimized Schwarz waveform relaxation methods for advection reaction diffusion problems,” SIAM J. Numer. Anal., vol. 45, no. 2, pp. 666–697, Apr. 2007.

[23] M. J. Gander, “Overlapping Schwarz waveform relaxation methods for parabolic problems,” in Proc. Algoritmy, 1997, pp. 425–431.

[24] C. R. Paul, Analysis of Multiconductor Transmission Lines. New York: Wiley-Interscience, 2008.

[25] S. Roy and A. Dounavis, “Longitudinal partitioning based waveform relaxation algorithm for transient analysis of long delay transmission lines,” in IEEE MTT-S Int. Microw. Symp. Dig., Baltimore, Jun. 2011, pp. 1–4.

[26] N. Nakhla, A. Dounavis, R. Achar, and M. S. Nakhla, “DEPACT: Delay extraction-based passive compact transmission-line macromodeling algorithm,” IEEE Trans. Adv. Packag., vol. 28, no. 1, pp. 13–23, Feb. 2005.

[27] N. Nakhla, M. S. Nakhla, and R. Achar, “Simplified delay extraction-based passive transmission line macromodeling algorithm,” IEEE Trans. Adv. Packag., vol. 33, no. 2, pp. 498–509, May 2010.

[28] F. H. Branin, Jr., “Transient analysis of lossless transmission lines,” Proc. IEEE, vol. 55, no. 11, pp. 2012–2013, Nov. 1967.

[29] A. Odabasioglu, M. Celik, and L. T. Pileggi, “PRIMA: Passive reduced-order interconnect macromodeling algorithm,” IEEE Trans. Comput.-Aided Des. (CAD) Integr. Circuits Syst., vol. 17, no. 8, pp. 645–653, Aug. 1998.

[30] A. Dounavis, R. Achar, and M. Nakhla, “Efficient passive circuit models for distributed networks with frequency-dependent parameters,” IEEE Trans. Adv. Packag., vol. 23, no. 8, pp. 382–392, Aug. 2000.

[31] A. Dounavis, R. Achar, and M. Nakhla, “A general class of passive macromodels for lossy multiconductor transmission lines,” IEEE Trans. Microw. Theory Tech., vol. 49, no. 10, pp. 1686–1696, Oct. 2001.

[32] A. Cangellaris, S. Pasha, J. Prince, and M. Celik, “A new discrete transmission line model for passive model order reduction and macromodeling of high-speed interconnections,” IEEE Trans. Adv. Packag., vol. 22, no. 3, pp. 356–364, Aug. 1999.

[33] Q. Yu, J. M. L. Wang, and E. S. Kuh, “Passive multipoint moment matching model order reduction algorithm on multiport distributed interconnect networks,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 46, no. 1, pp. 140–160, Jan. 1999.

[34] E. Gad and M. Nakhla, “Efficient simulation of nonuniform transmission lines using integrated congruence transform,” IEEE Trans. Very Large-Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 1307–1320, May 2004.

[35] F. Fer, “Resolution de l’equation matricielle par produit infini d’exponentielles matricielles,” Acad. Roy. Belg. Cl. Sci., vol. 44, no. 5, pp. 818–829, 1958.

[36] J. D. Dixon, “Exact solution of linear equations using p-adic expansions,” Numerische Mathematik, vol. 40, no. 1, pp. 137–141, 1982.

[37] W. Eberly, M. Giesbrecht, P. Giorgi, A. Storjohann, and G. Villard, “Solving sparse integer linear systems,” in Proc. ISSAC’06, Genova, Italy, Jul. 2006, pp. 63–70.

[38] “HSPICE U-2008.09-RA,” Synopsys Inc.

[39] “HSPICE Signal Integrity User Guide,” Synopsys Inc., Sep. 2005.

[40] M. Celik and A. C. Cangellaris, “Efficient transient simulation of lossy packaging interconnects using moment-matching techniques,” IEEE Trans. Compon., Packag., Manuf. Technol. B, vol. 19, no. 1, pp. 64–73, Feb. 1996.

Sourajeet Roy (S’11) received the B.Tech. degree in electrical engineering from Sikkim Manipal University, India, in 2006, and the M.E.Sc. degree from University of Western Ontario, London, ON, Canada, in 2009, where he is currently working toward the Ph.D. degree.

His research interests include modeling and simulation of high-speed interconnects, signal and power integrity analysis of electronic packages, and design and implementation of parallel algorithms.

Mr. Roy was the recipient of the Vice-Chancellors Gold Medal for academic excellence at the undergraduate level.


Anestis Dounavis (S’00–M’03) received the B.Eng. degree from McGill University, Montreal, QC, Canada, in 1995, and the M.Sc. and Ph.D. degrees from Carleton University, Ottawa, ON, Canada, in 2000 and 2004, respectively, all in electrical engineering.

He currently serves as an Associate Professor with the Department of Computer and Electrical Engineering, University of Western Ontario, London, ON, Canada. His research interests are in electronic design automation, simulation of high-speed and microwave networks, signal integrity, and numerical algorithms.

Dr. Dounavis was the recipient of the Ottawa Centre for Research and Innovation (OCRI) Futures Award (Student Researcher of the Year) in 2004 and the INTEL Best Student Paper Award at the Electrical Performance of Electronic Packaging Conference in 2003. He also received the Carleton University Medal for outstanding graduate work at the M.Sc. and Ph.D. levels in 2000 and 2004, respectively. He was the recipient of the University Student Council Teaching Honour Roll Award at the University of Western Ontario in 2009 to 2010.

Amir Beygi (S’08) received the B.S. degree in electrical engineering from K.N. Toosi University of Technology, Tehran, Iran, in 2004, the M.S. degree in electrical engineering from Iran University of Science and Technology, Tehran, Iran, in 2007, and the Ph.D. in electrical and computer engineering from The University of Western Ontario, London, ON, Canada, in 2011.

His research interests include simulation and modeling algorithms for electromagnetic compatibility and signal integrity of high-speed interconnects.
