Copyright (c) 2013 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION 1
Abstract—In p-adaptive finite element analysis, the large,
sparse matrix that arises can be block structured according to the
hierarchical level of the unknowns. A multilevel preconditioner
for the matrix is a V-cycle that starts by applying Gauss-Seidel to
the highest level, then the next level down, and so on. On the
other side of the V, Gauss-Seidel is applied in the reverse order.
At the bottom of the V is the lowest order system, which typically
is solved exactly with a direct solver. However, for a complex
geometry even the lowest order system may be too large for
direct factorization. Here an alternative is proposed: to continue
the V-cycle downwards, first into a set of auxiliary, node-based
spaces, then through a series of progressively smaller matrices
generated by an algebraic multigrid method. The smallest matrix
is solved by factorization.
The method is applied to p-adaptive analysis of a five-resonator
iris filter, a split-ring resonator loaded waveguide, a
“buckyball” metallic frame surrounding a conducting sphere,
and a noncommensurate frequency selective surface. Tetrahedral
elements up to 4th order are used. The largest matrix has over 12
million rows and 0.6 billion nonzero entries.
Index Terms—Finite element methods, multigrid methods,
microwave propagation, scattering parameters.
I. INTRODUCTION
High order finite elements can be very effective in solving
scattering problems, particularly when different orders can be
used in different regions. P-adaption involves repeated
solution of the problem, starting at low order and
automatically increasing the order in the regions where it is
needed. It is a powerful technique in its own right [1] [2] and
even more effective in combination with h-adaption [3] [4].
Like all finite element methods (FEMs), as the problem size
increases the dominant computational cost of p-adaption
becomes the solution of the large, sparse matrix problem to
which the electromagnetic field problem is reduced. Given the
demands for solving ever larger and more complex problems
with greater accuracy, it is particularly important to consider
efficient techniques for solving the matrix equation. The most
efficient methods tend to be those that take into account as
much as possible the nature of the matrix. A salient feature of
the matrix in p-adaption is that it is constructed in a series of
levels, corresponding to the orders of the finite elements. This
feature was exploited in [5][6][7] for two element orders with
a method called p-type multiplicative Schwarz (pMUS). It
was extended to higher orders in [8][9]. In none of this work
were mixed-order meshes considered; rather, the element
order was assumed to be uniform throughout the mesh.
Manuscript received February 1, 2013. This work was supported by the
Natural Sciences and Engineering Research Council of Canada.
The authors are with the Department of Electrical and Computer
Engineering, McGill University, Montreal, QC H3A 0E9, Canada. (e-mail:
[email protected], [email protected]).
The multilevel approach adopted in pMUS works very well,
but stops at the lowest order (the Whitney element [10]). The
matrix problem that corresponds to all elements being of the
lowest order is solved either exactly by a direct solver or
approximately by application of incomplete Cholesky factors
[7]. For large, geometrically complex problems, the number of
tetrahedra is high and the Whitney problem itself can be
computationally challenging. On the other hand, there has
been progress in applying multilevel methods to the Whitney
mesh. Geometric multigrid methods (GMGs), though
powerful, require special, nested grids of tetrahedra [11].
Algebraic multigrid methods (AMGs) are not constrained in
this way. AMGs that apply directly to the Whitney element
have been devised [12][13][14], but a promising alternative,
adopted here, is first to transfer the Whitney field onto
auxiliary spaces, on each of which a scalar AMG can be
applied. This is the Auxiliary Space Preconditioning (ASP)
method [15]. Its application to wave scattering is reported in
[16].
We describe in the next section a multilevel algorithm that
solves the matrix problem for a large mesh consisting of
elements of different orders. It uses a Krylov subspace solver
with a preconditioner that applies a multilevel strategy all the
way from the highest order present, down through the ASP
projection to a series of scalar AMG coarsenings.
Since p-adaption requires the solution of a number of such
matrix problems, consideration was also given to the
possibility of obtaining information from the first solve, when
all the elements are Whitney elements, and using this to
accelerate the solution at subsequent adaptive steps. This is
discussed in Section III. Results for the new algorithms are
presented in Section IV, together with a comparison against some
standard iterative methods.
II. PMUS AND AUXILIARY SPACE PRECONDITIONING
The problem considered is finding the phasor electric field
inside a given volume of space, at a given frequency, in the
presence of materials with arbitrary permittivity and
permeability (symmetric, complex tensors in general). On the
boundary of the volume a number of different boundary
conditions may be specified: perfect electric conductor (PEC);
absorbing boundary conditions (ABCs) to represent waves
propagating outwards into free space; and port boundary
conditions where waveguides or transmission lines connect to
the volume. In this way, a large range of wave scattering
problems may be handled, both free-space and enclosed. The
mathematical details are summarized in [16].
Multilevel Methods for p-adaptive Finite
Element Analysis of Electromagnetic Scattering
A. Aghabarati, J. P. Webb, Member, IEEE
In the present case, the finite elements used are the
tetrahedral, hierarchical, incomplete-order, vector elements
described in [8]. The order 1 elements are the well-known
Whitney elements [10], with 6 unknowns, associated with 6
basis functions, which will be referred to as “first order”. The
element of order 2 contains these 6 basis functions, plus
another 14, which are called “second order”. The elements of
orders 3 and 4 add 25 “third order” and 39 “fourth order” basis
functions, respectively. Since the elements are hierarchical,
different orders can be mixed together in the same mesh, as
will typically happen after the first iteration of p-adaption.
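The basis-function counts quoted above can be accumulated to give the number of unknowns carried by a single element of each order. The short sketch below does only that arithmetic; the names `ADDED_PER_ORDER` and `unknowns_per_element` are introduced here for illustration and do not come from [8].

```python
# Basis functions added at each order of the hierarchical tetrahedral
# element, as quoted in the text for orders 1 to 4.
ADDED_PER_ORDER = {1: 6, 2: 14, 3: 25, 4: 39}


def unknowns_per_element(order):
    """Total number of basis functions in one element of the given order."""
    return sum(ADDED_PER_ORDER[k] for k in range(1, order + 1))


totals = [unknowns_per_element(p) for p in (1, 2, 3, 4)]
print(totals)  # [6, 20, 45, 84]
```

Because each order contains all the functions of the orders below it, raising the order of one element only adds unknowns and never renumbers the lower-order ones.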
After discretization, the problem reduces to a matrix
equation $A x = b$, where $A$ is complex and symmetric and $x$ is
a column vector of the unknowns, representing the electric
field. We number the unknowns so that those associated with
first order basis functions come first, then those of second
order, and so on. (Note that, for example, a first order
unknown may belong only to elements of order higher than 1.
A Whitney edge function may be for an edge that is shared
only by elements of order 2, since these elements contain both
first and second order basis functions.) With this numbering,
the matrix can be partitioned as follows:
$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1p} \\ A_{21} & A_{22} & \cdots & A_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ A_{p1} & A_{p2} & \cdots & A_{pp} \end{bmatrix} \qquad (1)$$
where $p$ is the highest order present in the mesh. We define
$A^{(n)}$ to be the square, upper left, submatrix of $A$ whose lower
right block is $A_{nn}$. We also define an associated series of
trivial, rectangular, prolongation matrices, $P^{(n)}$, by the equation
$$A^{(n-1)} = (P^{(n)})^T A^{(n)} P^{(n)} \quad \text{for } n = 2, \ldots, p. \qquad (2)$$
$(P^{(n)})^T$ “coarsens” the representation of the field by dropping
the entries corresponding to order $n$; $P^{(n)}$ “prolongates” the
representation by adding back these entries, setting their
values to zero.
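The function $\mathrm{pMUS}(A^{(n)}, b)$ returns an approximation to
$(A^{(n)})^{-1} b$, where $b$ is a vector of the appropriate length. It
involves primarily the application of forward and backward
Gauss-Seidel (GS):
$$\mathrm{GS}_F(M) = (D + L)^{-1}; \qquad \mathrm{GS}_B(M) = (D + U)^{-1} \qquad (3)$$
where $D$, $L$ and $U$ are the diagonal, strict lower triangle and
strict upper triangle, respectively, of square matrix $M$. $\mathrm{GS}_F(M)$
and $\mathrm{GS}_B(M)$ are inexpensive approximations to $M^{-1}$ and are
applied in a “V-cycle” to the diagonal blocks of (1), starting
with the highest block and proceeding down to the lowest,
then back up again to the highest. The algorithm $\mathrm{pMUS}(A^{(n)}, b)$ is given recursively as follows:
$x = \mathrm{pMUS}(A^{(n)}, b)$
If $n = 1$:
    Solve order 1: $x \approx (A^{(1)})^{-1} b$
Else:
    Backward GS: $x_n = \mathrm{GS}_B(A_{nn})\, b_n$ (all other entries of $x$ zero); Residual update: $b \leftarrow b - A^{(n)} x$
    Coarsen: $b_c = (P^{(n)})^T b$
    Solve coarse: $x_c = \mathrm{pMUS}(A^{(n-1)}, b_c)$
    Prolongate: $x \leftarrow x + P^{(n)} x_c$; Residual update: $b \leftarrow b - A^{(n)} P^{(n)} x_c$
    Forward GS: $x_n \leftarrow x_n + \mathrm{GS}_F(A_{nn})\, b_n$;
The vectors $x_n$ and $b_n$ are blocks of $x$ and $b$, respectively,
corresponding to the partitions in (1).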
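As an illustration only, the recursion above can be sketched with dense NumPy arrays. This is a minimal sketch and not the authors' Matlab implementation: the names `gs_forward`, `gs_backward` and `pmus`, and the argument `sizes` (the number of unknowns of each order, lowest first), are assumptions introduced here, and an exact solve stands in for the order-1 step.

```python
import numpy as np


def gs_forward(M, r):
    """Apply (D + L)^{-1} to r, i.e. one forward Gauss-Seidel sweep from zero."""
    return np.linalg.solve(np.tril(M), r)


def gs_backward(M, r):
    """Apply (D + U)^{-1} to r, i.e. one backward Gauss-Seidel sweep from zero."""
    return np.linalg.solve(np.triu(M), r)


def pmus(A, b, sizes):
    """Approximate A^{-1} b with a V-cycle over the order blocks of A.

    sizes[k] is the number of unknowns of order k + 1 (hypothetical argument).
    """
    n_low = sum(sizes[:-1])            # unknowns of all orders below the top one
    if n_low == 0:                     # order 1 only: exact solve as a stand-in
        return np.linalg.solve(A, b)
    x = np.zeros_like(b)
    A_nn = A[n_low:, n_low:]           # diagonal block of the highest order
    x[n_low:] = gs_backward(A_nn, b[n_low:])          # backward GS
    r = b - A @ x                                     # residual update
    # Coarsen by dropping the top-order entries, solve, prolongate with zeros.
    x[:n_low] = pmus(A[:n_low, :n_low], r[:n_low], sizes[:-1])
    r = b - A @ x                                     # residual update
    x[n_low:] += gs_forward(A_nn, r[n_low:])          # forward GS
    return x
```

Using the backward sweep on the way down and the forward sweep on the way up makes the post-smoother the transpose of the pre-smoother when the blocks are symmetric, which is what keeps the overall operator symmetric.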
In conventional pMUS, the order-1 solve is either a
direct solution of the order-1 system, or multiplication by the inverse of
the incomplete Cholesky factors of its matrix [7]. Here, instead, we
continue to apply a multilevel approach. The first step is to
transfer the Whitney field represented by to four auxiliary
spaces: the space of piecewise linear, scalar functions on the
same mesh and the three components of the space of nodal
vector functions. We will denote the transfer matrices as
( ) , where takes the values , , and . Details are in
[16], where is called . The algorithm ( ) is,
then:
( ) ( ) Backward GS: (
) ;
Residual update:
Initialize:
For each of 4 auxiliary spaces, :
Transfer: ( )
Solve: ( ) Transfer back:
;
Residual update:
Forward GS: ( ) ;
The function ( ) returns an approximation to
( ) , where
represents in auxiliary space . For
, it is defined as
( )
(4)
For , is obtained not directly from , but from
the version of that results when is replaced by (
) during matrix assembly, where is a real number
[16][17][18]. We call this matrix . Then:
( )
for . (5)
This shift in the frequency improves the quality of the
preconditioner. For all the results below, .
The approximation to the inverse of the auxiliary-space matrix, written $A_s$ for auxiliary space $s$, is obtained by a multilevel
process similar to pMUS, only using a series of prolongation
matrices, $P_s^{(l)}$, obtained by an algebraic coarsening process [19].
Each $(P_s^{(l)})^T$ maps a vector representing the field in auxiliary space $s$ at level $l-1$ to a shorter
vector representing a coarser version of the same field. From
$A_s^{(0)} = A_s$ we can generate a series of “coarse” matrices:
$$A_s^{(l)} = (P_s^{(l)})^T A_s^{(l-1)} P_s^{(l)} \quad \text{for } l = 1, \ldots, L \qquad (6)$$
where $L$ is the chosen number of AMG coarse levels. The
function $\mathrm{AMG}(A_s^{(l)}, b)$ returns an approximation to
$(A_s^{(l)})^{-1} b$ by applying GS and successive coarsenings and
prolongations in another V-cycle. With $\mathrm{GS}_B$ and $\mathrm{GS}_F$ denoting the backward and forward Gauss-Seidel operators of (3), it is given recursively as
follows:
$x = \mathrm{AMG}(A_s^{(l)}, b)$
If $l = L$:
    Solve exactly: $x = (A_s^{(L)})^{-1} b$
Else:
    Backward GS: $x = \mathrm{GS}_B(A_s^{(l)})\, b$;
    Residual update: $b \leftarrow b - A_s^{(l)} x$
    Coarsen: $b_c = (P_s^{(l+1)})^T b$
    Solve coarse: $x_c = \mathrm{AMG}(A_s^{(l+1)}, b_c)$
    Prolongate: $x \leftarrow x + P_s^{(l+1)} x_c$;
    Residual update: $b \leftarrow b - A_s^{(l)} P_s^{(l+1)} x_c$
    Forward GS: $x \leftarrow x + \mathrm{GS}_F(A_s^{(l)})\, b$;
At $l = L$, the matrix problem is small enough to be solved
exactly by a direct method.
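The triple product in (6) can be illustrated with a toy hierarchy. The sketch below is an assumption-laden example, not the coarsening of [19]: it uses a hypothetical piecewise-constant aggregation (`aggregation_prolongation`) to build each prolongation matrix, whereas a real AMG chooses the coarse unknowns and the interpolation weights from the matrix entries. A 1-D Laplacian stands in for an auxiliary-space matrix.

```python
import numpy as np


def aggregation_prolongation(n, agg_size):
    """Hypothetical P: column j is 1 on the fine unknowns of aggregate j."""
    n_coarse = -(-n // agg_size)                  # ceiling division
    P = np.zeros((n, n_coarse))
    P[np.arange(n), np.arange(n) // agg_size] = 1.0
    return P


def galerkin_hierarchy(A, levels, agg_size=2):
    """Return [A_0, ..., A_levels], each coarse matrix being P^T A_prev P."""
    mats = [A]
    for _ in range(levels):
        P = aggregation_prolongation(mats[-1].shape[0], agg_size)
        mats.append(P.T @ mats[-1] @ P)
    return mats


# 1-D Laplacian as a stand-in for a scalar auxiliary-space matrix.
A0 = 2 * np.eye(8) - np.eye(8, k=1) - np.eye(8, k=-1)
hierarchy = galerkin_hierarchy(A0, levels=2)
sizes = [M.shape[0] for M in hierarchy]
print(sizes)  # [8, 4, 2]
```

The Galerkin form preserves symmetry at every level, so the same backward and forward Gauss-Seidel pairing can be reused on each coarse matrix.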
A call to pMUS at the highest order $p$ present, $\mathrm{pMUS}(A^{(p)}, b)$, then, initiates the overall
V-cycle shown in Fig. 1, during which GS is applied to the
sequence of matrices shown: backward GS on the downward
portion, and forward GS on the upward portion. The complete
V-cycle approximates the inverse of the full system matrix, and constitutes a multilevel
preconditioner.
Since the most computationally intensive part of the
preconditioner is pMUS, it has been found beneficial to
execute the ASP/AMG part more than once in each cycle,
because this reduces the number of Krylov iterations at
relatively small additional cost. Hence the cycle shown in Fig.
2, which is called an extended W-cycle from its shape. Note
that the backward and forward GS have been arranged to
preserve the symmetry of the overall preconditioner. All of the
results below use this W-cycle version of $\mathrm{pMUS}(A^{(p)}, b)$.
III. DEFLATION
The multilevel/AMG cycle described in the previous section
captures error components successively by construction of a
multilevel hierarchy. This hierarchy is defined by transfer
operators between different polynomial groups of basis
functions at higher levels and continued with virtual grid
transfer operators below the Whitney level. The cycle aims to
combine the smoothing properties of Gauss-Seidel with
transfer to the next lowest (coarser) level of the error
components invariant to relaxation. Errors that remain after
the smoother has been applied and that must be reduced at the
next level are called “algebraically smooth”. Algebraically
smooth components belong to the space of eigenvectors of the matrix
that have small eigenvalues and, for rapid convergence, an
accurate representation of them on the coarser level is needed.
The nature of the algebraically smooth errors and the near
null space of the matrix affect the performance of the multilevel cycle.
The frequency-shifting mentioned above reduces the effect of
this space on the preconditioner. Here deflation is also
employed to improve the convergence rate. The goal of
deflation is to remove from the system the eigenspace
corresponding to troublesome, algebraically smooth,
eigenvectors. Deflation permits the incorporation of existing
knowledge about the space of troublesome eigenvectors
available from a previous calculation, or a nearby problem.
In the context of Krylov subspace methods, deflation
techniques appear in the paper of Nicolaides [20] and a
comparable approach is proposed in [21] and [23]. Let
𝑝 ( ) be the Cholesky factorization of the
multilevel preconditioner and assume that the columns of the
matrix are the “deflation vectors” that approximately span
the space of slow converging modes for preconditioned
system , where , and . The deflated Lanczos algorithm [20] starts with initial
Fig. 1. The V-cycle version of pMUS called at the highest order, $\mathrm{pMUS}(A^{(p)}, b)$. A solid box around a matrix
means that it is used in an application of backward GS; a dashed box means
that it is used in an application of forward GS. The circle means an exact
solution using this matrix. Dashed arrows imply a series of steps with
decreasing (downward) or increasing (upward) matrix superscripts.
[Diagram: a pMUS stage, followed by an ASP stage, followed by scalar AMG stages.]
Fig. 2. The W-cycle version of pMUS called at the highest order, $\mathrm{pMUS}(A^{(p)}, b)$.
[Diagram: a pMUS stage, three ASP stages, and scalar AMG stages.]
vector ‖ ‖ orthogonal to and builds a sequence of
vectors such that [21]:
{ } ; ‖ ‖ (7)
To obtain [ ] one can apply the standard
Lanczos procedure to the auxiliary matrix
(8)
where ( )
. If the matrix is singular, its
pseudo-inverse is considered. To satisfy , the residual
vector is associated with a special initial guess
( ) (9)
for some arbitrary . The resulting Krylov subspace at the
th step is denoted by ( ) and the algorithm seeks
an approximate solution ( ) by requiring
that the residual be orthogonal to ( ).
The change to the standard symmetric Lanczos algorithm is
minimal and can be summarized as follows:
1. Based on given , pre-compute ( ) and the
initial approximation vector using (8).
2. Construct a symmetric Lanczos space based on the
coefficient matrix instead of by modifying all the
matrix-vector products as
(10)
The resulting Lanczos vectors satisfy where
is a tridiagonal matrix.
Depending on different ways of factorizing , distinct
deflated Krylov solvers can be developed in a straightforward
way for complex symmetric coefficient systems [24]. Here we
use the complex orthogonal conjugate gradient method
(COCG) [25], which is based on Cholesky factorization. We
remark that since the solution is approximated in the space
generated by and , inclusion of deflation vectors in the
space is beneficial even when contains only poor
approximations of slow-converging eigenvectors.
A. Determination of Deflation Vectors
For the deflated Krylov solver, the number of desired
eigenvectors must be chosen, along with which eigenvalues
are to be targeted. In particular, small eigenvalues related to the
near-null space and large outlying eigenvalues are the most
important ones for deflation and for accelerating
convergence. There are distinct ways to exploit some
knowledge about approximate eigenvectors [21][23]. Here we
assume that important information is obtained during the
solution of the problem at the first adaptive step, when all
elements are at their lowest order ($p = 1$), and not from a priori
knowledge about the problem. For this first step, the solution
is obtained by a different Krylov subspace solver, MINRES
[26]. Standard MINRES is for Hermitian matrices; in the
present case, the matrix is complex and symmetric. Standard
MINRES is adapted to this case by replacing every instance of
the Hermitian operator (i.e., complex-conjugate transpose) by
a simple transpose [24]. Determination of approximate
eigenvalues to be applied to subsequent adaptive steps is
achieved by the following harmonic projection method
[21][22].
Given subspace and tridiagonal matrix from the
MINRES iterations, the method computes eigenpairs ( ) by solving the generalized eigenproblem
(11)
where
(12)
(
) (13)
Defining [ ], the problematic modes can then
be expressed in the form . Equation (11) can be
solved at a low cost by any technique for dense generalized
eigenvalue problems. In our experiments, the QZ algorithm is
used.
B. Deflated Krylov for 𝑝-Adaption
Consider now the matrix equation at the th adaptive step, , with unknowns. Each matrix is of the form (1), with the same upper left block
. For , the equation is solved by deflated COCG with
deflation vectors obtained from as follows:
[ ]
(14)
In summary, the procedure for solving 𝑝-type hierarchical
systems using a deflated Krylov solver
preconditioned with a multilevel cycle is as follows:
1. First-order solution: Set = and apply iterations of
MINRES, preconditioned with the ASP/AMG algorithm
described in the previous section, to solve . This computes the matrix that has the Lanczos vectors,
and the tridiagonal matrix .
2. Select , the number of eigenvectors to be used.
3. Eigenvector computation: Find and using (12) and
(13). Solve (11) for eigenvalues and let be the matrix
containing the corresponding vectors. Set and
compute (( ) )
.
4. Adaption iterations: for
a. Initialize =0 and then transfer into it values from
, with appropriate indexing. b. Set according to (14).
c. Solve by preconditioned deflated COCG
with and from initial guess
( ) ( ).
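In deflated COCG, the regular update of the descent
direction vector, $p_{j+1} = \beta_j p_j + z_{j+1}$, is changed to
$p_{j+1} = \beta_j p_j + z_{j+1} - W (W^T A W)^{-1} W^T A z_{j+1}$, where the columns of $W$ are the deflation vectors and $z_{j+1}$ is the
preconditioned residual vector: $z_{j+1} = \mathrm{pMUS}(A, r_{j+1})$. We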
remark that assuming , the extra memory required for
storing and the non-zeros of is moderate. The
computational cost of deflation is limited to finding just the
first values of and then multiplying by ( ) .
The cost is much less than that of the preconditioning and the
gain in decreasing the number of iterations makes it well
worthwhile.
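The modified direction update can be seen in context in the following sketch of a deflated, preconditioned conjugate-gradient loop. It is a minimal illustration under stated assumptions, not the authors' solver: the function name `deflated_pcg`, the callable `apply_prec` (standing in for the multilevel preconditioner) and the zero starting vector are introduced here. Inner products are written without complex conjugation, as in COCG, so the same code reduces to ordinary deflated CG for a real symmetric positive definite matrix.

```python
import numpy as np


def deflated_pcg(A, b, W, apply_prec, tol=1e-10, max_iter=200):
    """Deflated preconditioned CG with unconjugated inner products.

    W holds the deflation vectors as columns; apply_prec(r) returns the
    preconditioned residual. Both names are assumptions of this sketch.
    """
    AW = A @ W
    E = W.T @ AW                                  # small matrix W^T A W
    x = W @ np.linalg.solve(E, W.T @ b)           # initial guess with W^T r = 0
    r = b - A @ x
    z = apply_prec(r)
    p = z - W @ np.linalg.solve(E, AW.T @ z)      # deflated first direction
    rz = r @ z                                    # r^T z, no conjugation
    for k in range(max_iter):
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b, np.inf):
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_prec(r)
        rz_new = r @ z
        beta = rz_new / rz
        rz = rz_new
        # Deflated update: p = beta p + z - W (W^T A W)^{-1} W^T A z
        p = beta * p + z - W @ np.linalg.solve(E, AW.T @ z)
    return x, max_iter
```

The only extra work per iteration is one product with the stored block `AW` and one small dense solve, which is consistent with the remark that deflation costs much less than the preconditioner itself.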
IV. NUMERICAL RESULTS
In this section, we study the performance of the proposed
approach via several numerical experiments. In each example,
we adaptively increase the order of the elements and solve the
matrix problem at each adaptive step by either MINRES (for
the first step) or COCG (for subsequent steps), preconditioned
with variants of the W-cycle described in Section II, with or
without deflation. Depending on the method used for
approximating the solution to the Whitney system, , the preconditioners tested are:
- pMUS[ASP]: the proposed approach, i.e., ASP and AMG.
- pMUS[AV]: one backward GS step followed by one
forward GS step, both applied to the vector-scalar
potential (A-V) version of the Whitney system, in
compact form [27][16]. This amounts to an SSOR
preconditioner applied to the compact A-V version of the
entire matrix.
- pMUS[iLU]: sparse incomplete factorization of [28].
- pMUS[LU]: complete LU factorization of using
UMFPACK [29].
The Krylov iterations are terminated when the infinity
norm of the residual is reduced by a factor of . In the
examples below, this parameter is set to unless
otherwise stated. All simulations are done using Matlab [30]
and performed on a PC with a -bit, 4-core, Intel GHz
processor and GB of RAM. The geometries are modeled
and meshed with ElecNet [31].
For the first example, we compute the S-parameter of a
five-resonator iris filter [9][32]. The geometry of the two port
waveguide filter is depicted in Fig. 3 and its outer dimensions
are . The computational domain
is discretized with tetrahedra, resulting in a maximum
edge length of ⁄ at GHz. The number of DOFs is
increased from to in steps by the p-
adaption scheme of [2], increasing the polynomial order of
25% of the elements at each step. The Krylov termination
parameter, , for both MINRES and COCG, is set to . For pMUS[ASP], 2 AMG levels are used (i.e., ).
Deflation is also considered. The number of deflation
eigenvectors, , is set to 50% of the number of MINRES
steps, , used to solve the first-order problem.
The results in Table I show that at each adaption step there
is a reduction in the number of iterations by using
pMUS[ASP] compared to pMUS[AV] and pMUS[iLU], and also significant
acceleration, after the first step, by incorporation of deflation
eigenvectors. During the 6 adaption steps, the cumulative CPU
time related to the matrix solution phase for pMUS[ASP] with
(without) deflation is 128 s (240 s), while it is 377 s, 1840 s
and 124 s for pMUS[AV], pMUS[iLU] and pMUS[LU],
respectively. Note that an inappropriate treatment of the
lowest order block (such as iLU) can worsen the convergence
rate at higher orders and increasingly add to the overall
solution cost. In addition, the solution time for the auxiliary space
preconditioning plus deflation approach is very close to that of the
pMUS[LU] method, which has the highest memory
requirement due to the complete factorization.

TABLE I
SOLUTION STATISTICS FOR THE ADAPTIVE S-PARAMETER COMPUTATION OF WAVEGUIDE IRIS FILTER AT 3.63 GHZ
(each entry: iterations / CPU time in s / memory in MB)
Adaption Step       1               2               3               4               5               6
DOFs                56,632          127,126         214,814         308,726         391,569         481,752
pMUS[ASP]           71/8.66/64      97/16.6/114     101/27.9/182    103/44.8/294    108/60.7/374    110/81.1/547
pMUS[ASP]+Def[50%]  71/8.66/64      42/9.23/134     48/14.3/204     49/24.0/326     50/30.6/428     53/40.8/567
pMUS[AV]            197/14.4/57     207/20.6/116    207/40.4/168    203/72.7/287    203/97.4/382    204/131.3/520
pMUS[iLU]           594/84.1/205    636/188.0/289   639/245.9/316   604/338.9/363   687/459.7/511   623/523.6/635
pMUS[LU]            1/2.41/302      40/7.52/354     50/13.4/417     57/24.1/541     60/33.1/608     61/43.6/733

Fig. 3. Geometry of the iris waveguide filter.
Fig. 4. Convergence behaviour of the error abs(S11 - S11ref) for the uniform and adaptive p-refinement. [Plot: error on a logarithmic axis versus number of DOFs; curves for adaptive and uniform p-refinement, with data points labelled by memory use.]
Fig. 5. Magnitude of scattering parameters versus frequency for the iris waveguide filter. [Plot: abs(S11) in dB versus frequency in GHz; curves for 1st order (uniform), the 2nd, 4th and 6th adaption steps, 4th order (uniform), and mode-matching (Reiter et al.).]
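The cumulative CPU times quoted for the iris filter can be cross-checked against the per-step entries of Table I. The snippet below simply sums the six CPU times in each row; the dictionary name `cpu_time` is introduced here for illustration.

```python
# CPU times in seconds for the six adaption steps, copied from Table I.
cpu_time = {
    "pMUS[ASP]":          [8.66, 16.6, 27.9, 44.8, 60.7, 81.1],
    "pMUS[ASP]+Def[50%]": [8.66, 9.23, 14.3, 24.0, 30.6, 40.8],
    "pMUS[AV]":           [14.4, 20.6, 40.4, 72.7, 97.4, 131.3],
    "pMUS[iLU]":          [84.1, 188.0, 245.9, 338.9, 459.7, 523.6],
    "pMUS[LU]":           [2.41, 7.52, 13.4, 24.1, 33.1, 43.6],
}

# Cumulative time over all steps, rounded to the nearest second.
cumulative = {name: round(sum(times)) for name, times in cpu_time.items()}
for name, total in cumulative.items():
    print(f"{name:20s} {total:5d} s")
```

The rounded sums are 240 s, 128 s, 377 s, 1840 s and 124 s, in the order listed, which agrees with the figures given in the text.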
In Fig. 4 we show how the error $|S_{11} - S_{11}^{\mathrm{ref}}|$ depends on
the number of DOFs with uniform and adaptive increments of
element order. The computations are made at the frequency
of 3.63 GHz, but the behaviour is similar for all frequencies.
The reference solution is obtained by solving the same
problem with a highly refined mesh ( ) and using
4th order elements entirely, which results in 3.89 million
DOFs. Comparing the slopes of the curves, we see that adding
DOFs in an adaptive way gives a more rapid decrease of error.
The magnitude of $S_{11}$ computed over the frequency range
GHz to GHz is shown in Fig. 5 for some of the
adaption steps. The frequency response when uniform 4th
order elements are employed and results from [32] are also
given. As observed, the adaptive solver progressively reduces
the error in $S_{11}$.
The second example is free space scattering from a
conducting sphere surrounded by a polyhedral, 60-node
“buckyball”, acting as a metallic frame to reduce the
backscattered wave from the sphere at MHz [33]. The
problem is shown in Fig. 6. The diameter of the internal solid
sphere is m and the external polyhedral conducting frame
has diameter of m with an edge width and thickness of
m and m, respectively. An absorbing
boundary condition (ABC) is applied over a spherical surface
5 m away from the buckyball. The mesh obtained after
discretization consists of nodes and
elements with minimum and maximum edge lengths of
and , respectively. In this computation, the
number of unknowns increases from 810,595 to 5,710,953 in 4
adaptive steps.

TABLE II
SOLUTION DETAILS FOR THE BUCKYBALL SCATTERING PROBLEM
(each entry: iterations / CPU time in hh:mm:ss / memory in MB)
Adaption Step       1                       2                       3                       4
DOFs                810,595                 2,319,586               3,716,547               5,710,953
pMUS[ASP]+Def[50%]  108/00:03:58/928        124/00:08:39/2,635      130/00:12:50/3,621      157/00:25:30/5,895
pMUS[AV]            2,058/00:21:10/686      2,173/01:19:52/1,767    2,138/02:18:15/2,673    2,188/04:19:25/4,908

Fig. 6. Geometry of 6 m metal sphere surrounded with a 10 m spherical polyhedral conducting frame.
Fig. 7. Comparison of effects of deflation on iteration counts for the buckyball problem. [Plot: number of iterations versus adaption step, for no deflation (0%) and for 11 (10%), 32 (30%), 54 (50%), 76 (75%) and 108 (100%) deflation vectors.]
Fig. 8. Comparison of effects of deflation on CPU time for the buckyball problem. [Plot: CPU time in s versus adaption step, for the same six deflation settings.]
Fig. 9. Comparison of effects of deflation on memory usage for the buckyball problem. [Plot: memory in MB versus adaption step, for the same six deflation settings.]
The detailed computational information is listed in Table II.
For pMUS[ASP], 3 AMG levels are used (i.e., ), with
, and DOFs, respectively. In this case,
it was not possible to run pMUS[iLU] and pMUS[LU]
because of the high memory requirements in factorizing
(more than 50 GB). At the same time, the run time required by
pMUS[ASP] accelerated with 54 deflation vectors is much
less than with pMUS[AV] at each adaption step.
Next we consider the effect of varying the dimension, , of
the deflation subspace from 0 (no deflation) to
( deflation). Figs. 7-9 compare the number of iterations,
computation time and memory usage for several choices of .
Increasing leads to fewer iterations, but at the cost of more
memory usage. We also see that increasing beyond about
50% of does not bring much further reduction in
computation time. Towards the end of the adaption,
deflation leads to a reduction in cumulative computation
time and a reduction in Krylov iterations.
The next example is a rectangular metallic waveguide
loaded with split ring resonators (SRRs). This configuration
has attracted extensive attention [34][35], since it provides the
realization of negative index metamaterials and novel
methods to miniaturize waveguide-based devices. The
geometry is directly taken from [34], proposed for the
creation of a stopband above the ordinary waveguide cutoff
frequency. The geometry, showing the excitation port A and
absorption port B, is presented in Fig. 10. The rectangular
waveguide is a standard C-band type with dimensions
at the operating frequency of
GHz. The dielectric slabs (shown as green) on which the
four SRR arrays are printed have relative permittivity of
and thickness of mm. The dimensions of the
SRR elements can be found in [34], noting that printed
elements on the middle and left lateral wall slabs are a factor
of smaller than the elements on the right side. The entire
composite domain is discretized with tetrahedra
elements, resulting in edge lengths , and
Fig. 10. (a) The geometry of SRR loaded waveguide filter along with the excitation port (Port A) and absorption port (Port B). (b) The mesh discretization. (c) Intensity distribution of electric field strength within the loaded waveguide.

TABLE III
SOLUTION DETAILS FOR THE SRR LOADED WAVEGUIDE PROBLEM
(each entry: iterations / CPU time in hh:mm:ss / memory in MB)
Adaption Step       1                       2                       3                       4
DOFs                1,050,016               2,353,671               3,967,045               5,994,104
pMUS[ASP]+Def[50%]  947/00:36:25/955        916/01:43:21/8,803      1,551/03:57:10/10,416   2,084/08:11:28/13,023
pMUS[AV]            4,056/01:25:46/834      6,201/03:35:04/1,724    9,033/10:35:38/3,176    10,115/22:04:29/5,942

Fig. 11. Cumulative CPU time versus adaption steps for the SRR loaded waveguide filter. [Plot: cumulative CPU time in s, axis scaled by 10^4, versus adaption step, for pMUS[AV], pMUS[ASP] and pMUS[ASP]+Def[50%].]
. Adaption was used to compute the reflection
coefficient as the parameters of interest and the results are
given in Table III. The convergence of deflated Krylov
preconditioned with pMUS and 3 level AMG ( ) is
superior again compared to pMUS[AV]. The effect on the
cumulative CPU time of deflating troublesome eigenvectors
from the Krylov solver is shown in Fig. 11.
While the required time to accurately solve the problem in
adaption steps is around hours with pMUS[AV],
pMUS[ASP] reduces that to hours and with deflation it is
less than hours. The magnitude of the electric field
intensity at the mid-plane passing through the geometry is
plotted in Fig. 10(c). It is apparent that energy propagation
toward the output port is suppressed by the ring resonator
elements.
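For the two rows that Table III does list, the cumulative solution time over the four adaption steps can be obtained by converting the hh:mm:ss entries to seconds and summing. The helper `to_seconds` and the dictionary names are introduced here for illustration; pMUS[ASP] without deflation is not tabulated, so it is not included.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


# CPU times per adaption step, copied from Table III.
cpu_time_srr = {
    "pMUS[ASP]+Def[50%]": ["00:36:25", "01:43:21", "03:57:10", "08:11:28"],
    "pMUS[AV]":           ["01:25:46", "03:35:04", "10:35:38", "22:04:29"],
}

total_seconds = {
    name: sum(to_seconds(t) for t in times)
    for name, times in cpu_time_srr.items()
}
total_hours = {name: s / 3600 for name, s in total_seconds.items()}
print(total_seconds)  # {'pMUS[ASP]+Def[50%]': 52104, 'pMUS[AV]': 135657}
```

These totals correspond to roughly 14.5 hours with ASP and deflation and roughly 37.7 hours with pMUS[AV], a ratio of about 2.6.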
For the final experiment, results obtained from the
scattering analysis of a finite noncommensurate frequency
selective surface (FSS) are presented. The device, shown in
Fig. 12, consists of two dissimilar Jerusalem cross screens
(dimensions taken from [36]) printed on opposite sides of a
dielectric slab, where is the free
space wavelength at 3 GHz. The dielectric relative permittivity
is . Such composite and multi-layer structures
with different periodicities are difficult to handle rigorously
using periodic boundary conditions [36][37] and analysis of
the entire system is usually needed.
For the computational domain truncation, the geometry is
enclosed in a box at least away from any point on the
FSS and the ABC is applied. The structure is illuminated
normally ( direction) with an -polarized plane wave. The
mesh consists of tetrahedra. As observed in Fig.
12, the mesh is highly nonuniform, with an average element
size of and sizes ranging from around
the metallic regions to near the truncation boundary.
A summary of computational statistics for 4 steps of p-
adaption, using pMUS[ASP]+Def[50%], is given in Table IV.
In this case pMUS[AV] failed at each adaptive step to reduce
the residual by a factor of after iterations. The
column labeled “Nonzeros” gives the number of nonzero
entries in the matrix. Note that the matrices become denser as
the adaption proceeds, which is characteristic of p-adaption.
Here, as in the previous examples, the number of iterations is
relatively stable, despite the growth in the size of the matrix.
At step 4, a complex-valued system with about 12.3 million
unknowns and 0.63 billion nonzeros (Table IV) is solved on a PC in
Fig. 12. Geometry of a Jerusalem cross noncommensurate FSS.
(a) Side view. (b) Top view showing the 7 × 7 upper screen.
(c) Bottom view showing the 5 × 5 lower screen.
Fig. 14. Visualization of the electric field for the noncommensurate FSS
over two cross-sectional planes, (a) and (b).
Fig. 13. Residual history of the (deflated) preconditioned COCG method
for the noncommensurate FSS scattering problem at the 4th adaption step.
[Plot: relative residual (log scale, 10^-6 to 10^2) versus iterations (0 to 1,400). Curves: pMUS[AV], pMUS[ASP]+Def[0%], pMUS[ASP]+Def[50%].]
TABLE IV
SOLUTION DETAILS FOR THE NONCOMMENSURATE FSS SCATTERING PROBLEM
i   DOFs         Nonzeros      Iter.   CPU Time (hh:mm:ss)   Memory (MB)
1   2,261,319    36,824,307    533     00:54:28              2,314
2   5,482,526    188,871,828   626     02:07:01              11,241
3   8,909,294    370,550,000   683     03:18:42              14,347
4   12,264,629   626,152,411   594     03:45:56              17,424
TABLE V
MATRIX HIERARCHY DETAILS FOR THE NONCOMMENSURATE FSS SCATTERING PROBLEM AT THE 4TH ADAPTION STEP
Matrix   DOFs         Nonzeros
         12,264,629   626,152,411
         12,051,413   525,271,225
         10,794,234   416,843,750
         2,261,319    36,824,307
         259,089      3,487,890
         90,448       1,236,772
         34,162       480,396
         20,174       287,358
reasonable time.
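The remark that the matrices become denser as the adaption proceeds can be checked directly from Table IV by dividing the number of nonzeros by the number of unknowns. This is a small sketch; the variable names are illustrative and the figures are copied from Table IV.

```python
# (adaption step, DOFs, nonzeros) copied from Table IV.
table_iv = [
    (1, 2_261_319, 36_824_307),
    (2, 5_482_526, 188_871_828),
    (3, 8_909_294, 370_550_000),
    (4, 12_264_629, 626_152_411),
]

# Average number of nonzero entries per matrix row at each step.
nnz_per_row = {step: nnz / dofs for step, dofs, nnz in table_iv}

for step, value in nnz_per_row.items():
    print(f"step {step}: {value:.1f} nonzeros per row")
```

The average row population grows from roughly 16 entries at step 1 to roughly 51 at step 4, so the storage and matrix-vector cost grow faster than the number of unknowns.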
Table V lists the matrices used in the preconditioner of the
system matrix at the 4th adaptive step. Since some 4th order
elements are present at this step, the largest matrix in the
hierarchy is the system matrix itself, with 12,264,629 unknowns.
Four levels of AMG are used, so the smallest matrix, which is
solved by factorization, has 20,174 unknowns.
The residual history in solving the system is shown in
Fig. 13. Removing the accelerations provided by deflation
results in an increase of and in Krylov iterations
and run time respectively, as well as reduction in
memory usage. Fig. 14 shows the computed electric field over
two cross-sections, both passing through the center of the
structure.
V. CONCLUSION
The pMUS[ASP] method has been shown to be effective for
the analysis of complex geometries by hierarchical finite
elements. Incorporation of ASP and AMG into the
preconditioning V- or W-cycle allows pMUS to be applied
even when the matrix at lowest order is too big for direct
solution. This is especially useful for scatterers with fine
geometric details, requiring a very large mesh in which the
bulk of the elements remain at low order. pMUS[ASP] was
always faster than pMUS[AV] (SSOR preconditioning applied
to the A-V system). In one case it was over 10 times faster and
in another case, pMUS[AV] failed to converge.
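The structure of the multilevel V-cycle can be illustrated on a toy problem. The sketch below is not the authors' implementation: it uses dense NumPy arrays, a small real symmetric positive definite matrix, and a direct solve at the lowest level in place of the ASP and AMG stages. The function names `gauss_seidel_sweep` and `v_cycle` and the choice of index blocks are assumptions made for illustration only.

```python
import numpy as np


def gauss_seidel_sweep(A, r, forward=True):
    """One Gauss-Seidel sweep on A z = r, starting from z = 0."""
    tri = np.tril(A) if forward else np.triu(A)
    return np.linalg.solve(tri, r)


def v_cycle(A, b, blocks):
    """Approximate A^{-1} b with one multiplicative V-cycle.

    blocks[0] holds the indices of the lowest level (solved directly);
    blocks[1:] hold the indices of the higher hierarchical levels.
    """
    z = np.zeros_like(b)
    # Down the V: highest level first, forward Gauss-Seidel on each block.
    for idx in reversed(blocks[1:]):
        r = b - A @ z
        z[idx] += gauss_seidel_sweep(A[np.ix_(idx, idx)], r[idx], forward=True)
    # Bottom of the V: direct solve on the lowest level block.
    idx0 = blocks[0]
    r = b - A @ z
    z[idx0] += np.linalg.solve(A[np.ix_(idx0, idx0)], r[idx0])
    # Up the V: same blocks in reverse order, backward Gauss-Seidel.
    for idx in blocks[1:]:
        r = b - A @ z
        z[idx] += gauss_seidel_sweep(A[np.ix_(idx, idx)], r[idx], forward=False)
    return z


# Toy demonstration (assumed data): a 12 x 12 tridiagonal SPD matrix split
# into three "levels" of four unknowns each.
n = 12
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
blocks = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]

# Use the V-cycle as a stationary iteration to show that it reduces the residual.
x = np.zeros(n)
for _ in range(40):
    x += v_cycle(A, b - A @ x, blocks)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(f"relative residual after 40 cycles: {rel_res:.2e}")
```

In practice the cycle would be applied once per Krylov iteration as a preconditioner rather than iterated on its own, and the blocks would correspond to the hierarchical orders of the basis functions.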
In p-adaption, the use of a deflated Krylov solver, which
exploits information obtained from the first adaptive step,
provided some additional reduction in run time (20% to 40%),
at the expense of a greater memory requirement (10% to
35%).
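The effect of deflation can be illustrated with a small example. The sketch below follows the deflated conjugate gradient recurrences of [21] for a real symmetric positive definite matrix; the paper itself uses a deflated COCG solver on complex symmetric systems, so this is a simplified analogue and not the authors' code. The function name `deflated_cg`, the synthetic test matrix, and the use of exact eigenvectors as the deflation basis `W` are all assumptions.

```python
import numpy as np


def deflated_cg(A, b, W=None, tol=1e-10, max_iter=500):
    """Conjugate gradients for SPD A; the columns of W (if given) are deflated."""
    x = np.zeros_like(b)
    r = b.copy()
    if W is not None:
        AW = A @ W
        WtAW = W.T @ AW
        # Initial guess whose residual is orthogonal to span(W).
        x = W @ np.linalg.solve(WtAW, W.T @ r)
        r = b - A @ x
        # Initial search direction, A-orthogonal to span(W).
        p = r - W @ np.linalg.solve(WtAW, AW.T @ r)
    else:
        p = r.copy()
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * b_norm:
            return x, k
        beta = rr_new / rr
        p = beta * p + r
        if W is not None:
            # Keep the new direction A-orthogonal to span(W).
            p -= W @ np.linalg.solve(WtAW, AW.T @ r)
        rr = rr_new
    return x, max_iter


# Synthetic SPD matrix (assumed data) with three small outlying eigenvalues.
rng = np.random.default_rng(0)
n = 80
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate(([1e-3, 2e-3, 5e-3], np.linspace(1.0, 2.0, n - 3)))
A = (Q * eigs) @ Q.T
b = rng.standard_normal(n)

x_plain, it_plain = deflated_cg(A, b)
x_defl, it_defl = deflated_cg(A, b, W=Q[:, :3])
print(f"plain CG: {it_plain} iterations, deflated CG: {it_defl} iterations")
```

Storing `W` and `A @ W` is the source of the extra memory that deflation requires, which is the trade-off noted above.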
REFERENCES
[1] L. S. Andersen and J. L. Volakis, "Adaptive multiresolution antenna
modeling using hierarchical mixed-order tangential vector finite
elements," IEEE Transactions on Antennas and Propagation, vol. 49,
pp. 211-222, Feb 2001.
[2] D. Nair and J. P. Webb, “P-Adaptive Computation of the Scattering
Parameters of 3-D Microwave Devices”, IEEE Transactions On
Magnetics, vol. 40, no. 2, Mar 2004.
[3] M. M. Botha and J. M. Jin, "Adaptive finite element-boundary integral
analysis for electromagnetic fields in 3-D," IEEE Transactions on
Antennas and Propagation, vol. 53, pp. 1710-1720, May 2005.
[4] I. Gomez-Revuelto, L. E. Garcia-Castillo, and M. Salazar-Palma, "Goal-
Oriented Self-Adaptive Hp-Strategies for Finite Element Analysis of
Electromagnetic Scattering and Radiation Problems," Progress in
Electromagnetics Research-Pier, vol. 125, pp. 459-482, 2012.
[5] G. H. Peng, D. Dyczij-Edlinger, and J. F. Lee, "Hierarchical methods for
solving matrix equations from TVFEMs for microwave components,"
IEEE Transactions on Magnetics, vol. 35, pp. 1474-1477, May 1999.
[6] Y. Zhu and A. C. Cangellaris, "Hierarchical multilevel potential
preconditioner for fast finite-element analysis of microwave devices,"
IEEE Transactions on Microwave Theory and Techniques, vol. 50, pp.
1984-1989, Aug 2002.
[7] J. F. Lee and D. K. Sun, "p-Type multiplicative Schwarz (pMUS)
method with vector finite elements for modeling three-dimensional
waveguide discontinuities," IEEE Transactions on Microwave Theory
and Techniques, vol. 52, pp. 864-870, Mar 2004.
[8] P. Ingelstrom, "A new set of H(curl)-conforming hierarchical basis
functions for tetrahedral meshes," IEEE Transactions on Microwave
Theory and Techniques, vol. 54, pp. 106-114, Jan 2006.
[9] P. Ingelstrom, V. Hill, and R. Dyczij-Edlinger, "Comparison of
hierarchical basis functions for efficient multilevel solvers," IET Science
Measurement & Technology, vol. 1, pp. 48-52, Jan 2007.
[10] J. Jin, The finite element method in electromagnetics, 2nd edition, John
Wiley, 2002.
[11] R. Hiptmair, "Multigrid method for Maxwell's equations," SIAM Journal
on Numerical Analysis, vol. 36, pp. 204-225, Dec 1998.
[12] S. Reitzinger and J. Schoberl, "An algebraic multigrid method for finite
element discretizations with edge elements," Numerical Linear Algebra
with Applications, vol. 9, pp. 223-238, Apr-May 2002.
[13] P. B. Bochev, C. J. Garasi, J. J. Hu, A. C. Robinson, and R. S.
Tuminaro, "An improved algebraic multigrid method for solving
Maxwell's equations," SIAM Journal on Scientific Computing, vol. 25,
pp. 623-642, 2003.
[14] R. Perrussel, L. Nicolas, F. Musy, L. Krahenbuhl, M. Schatzman, and C.
Poignard, "Algebraic multilevel methods for edge elements," IEEE
Transactions on Magnetics, vol. 42, pp. 619-622, Apr 2006.
[15] J. Xu, "The auxiliary space method and optimal multigrid
preconditioning techniques for unstructured grids," Computing, vol. 56,
pp. 215-235, 1996.
[16] A. Aghabarati, J. P. Webb, “An algebraic multigrid method for the finite
element analysis of large scattering problems”, IEEE Trans. Antennas
and Propagation, vol. 61, no.2, February 2013.
[17] Y. Erlangga, C. Oosterlee, and C. Vuik, "A novel multigrid based
preconditioner for heterogeneous Helmholtz problems," SIAM Journal
on Scientific Computing, vol. 27, pp. 1471-1492, 2006.
[18] Y. A. Erlangga, "Advances in iterative methods and preconditioners for
the Helmholtz equation," Archives of Computational Methods in
Engineering, vol. 15, pp. 37-66, Mar 2008.
[19] Y. Notay, "An Aggregation-Based Algebraic Multigrid Method",
Electronic Transactions on Numerical Analysis, vol. 37, pp. 123-146,
2010.
[20] R. A. Nicolaides, “Deflation of Conjugate Gradients with Applications
to Boundary Value Problems”, SIAM J. Numer. Anal., 24 (1987), pp.
355–365.
[21] Y. Saad, M. Yeung, J. Erhel and F. Guyomarc'h, “Deflated Version of
the Conjugate Gradient Algorithm”, SIAM Journal on Scientific
Computing, vol. 21, no. 5, pp. 1909–1926, Apr 2000.
[22] A. Chapman and Y. Saad, "Deflated and augmented Krylov subspace
techniques," Numerical Linear Algebra with Applications, vol. 4, no. 1,
pp. 43-66, 1997.
[23] A. M. Abdel-Rehim, R. B. Morgan, D. A. Nicely and W. Wilcox,
“Deflated and Restarted Symmetric Lanczos Methods for Eigenvalues
and Linear Equations With Multiple Right-Hand Sides”, SIAM Journal
on Scientific Computing, vol 32, no 1, pp 129-149, 2010.
[24] R. Freund, “Conjugate Gradient-Type Methods for Linear Systems with
Complex Symmetric Coefficient Matrices”, SIAM Journal on Scientific
and Statistical Computing, 1992.
[25] H.A. van der Vorst, J. B. M. Melissen, "A Petrov–Galerkin type method
for solving Ax = b, where A is symmetric complex", IEEE Transactions
on Magnetics, vol. 26, no. 2, pp. 706-708, Mar 1990.
[26] C. C. Paige, and M. A. Saunders. "Solution of sparse indefinite systems
of linear equations." SIAM Journal on Numerical Analysis, vol 12, no 4,
pp. 617-629, Sep 1975.
[27] R. Dyczij-Edlinger, G. Peng and J. F. Lee, "A Fast Vector-Potential Method
Using Tangentially Continuous Vector Finite Elements", IEEE
Transactions on Microwave Theory and Techniques, vol. 46, pp. 863-868,
Jun 1998.
[28] Y. Saad, “Iterative Methods for Sparse Linear Systems”, PWS
Publishing Company, 1996, Chapter 10 - Preconditioning Techniques.
[29] T. A. Davis, “Algorithm 832: UMFPACK, an unsymmetric-pattern
multifrontal method”, ACM Transactions on Mathematical Software, vol
30, no. 2, June 2004, pp. 196-199.
[30] Matlab 7.10, The MathWorks, Inc., Natick, MA, 2010.
http://www.mathworks.com.
[31] ElecNet 7.3, 2012, Infolytica Corporation, Montreal, Canada,
http://www.infolytica.com.
[32] J. M. Reiter and F. Arndt; “Rigorous Analysis of Arbitrarily Shaped H-
and E-Plane Discontinuities in Rectangular Waveguides by a Full Wave
Boundary Contour Mode-Matching Method”, IEEE Transactions on
Microwave Theory and Techniques, vol. 43, no. 4, pp. 796-801, Apr
1995.
[33] P. A. Bernhardt, “Radar Backscatter from Conducting Polyhedral
Spheres”, IEEE Antennas and Propagation Magazine, vol. 52, No.5, Oct
2010.
[34] E. R. Iglesias, Ó. Q. Teruel, and M. M. Kehn, “Multiband SRR Loaded
Rectangular Waveguide”, IEEE Transactions on Antennas and
Propagation, vol. 57, no. 5, pp. 1570-1574, May 2009.
[35] F. Y. Meng, Q. Wu, D. Erni and L.W. Li, “Controllable Metamaterial-
Loaded Waveguides Supporting Backward and Forward Waves”, IEEE
Transactions on Antennas and Propagation, vol. 59, no. 9, Sep. 2011.
[36] Y. E. Erdemli, K. Sertel, R. A. Gilbert, D. E. Wright, and J. L. Volakis,
“Frequency-Selective Surfaces to Enhance Performance of Broad-Band
Reconfigurable Arrays”, IEEE Transactions on Antennas and
Propagation, vol. 50, no. 12, Dec 2002.
[37] R. Mittra, “A Look at Some Challenging Problems in Computational
Electromagnetics”, IEEE Antennas and Propagation Magazine,
vol. 46, no. 5, pp. 18-32, Oct 2004.
Ali Aghabarati received the B.Eng. degree in telecommunication engineering
from Iran University of Science and Technology, Tehran, Iran, in 2008, and
the M.Eng. degree in electrical engineering from Amirkabir University of
Technology, Tehran, Iran, in 2010. Currently, he is working toward the Ph.D.
degree at the Computational Electromagnetics Laboratory, McGill University,
Montreal, QC, Canada. His main research interests include numerical
techniques in electromagnetic computation, with focus on finite element
methods.
Jon P. Webb (M’83) received a Ph.D. from Cambridge University, England,
in 1981. Since 1982 he has been professor in the Department of Electrical and
Computer Engineering at McGill University, Montreal, Canada. His area of
research is computer methods in electromagnetics, especially the application
of the finite element method.