## Pushing the boundaries:

## Fast integral methods for solving boundary element equations.

### Bachelor Project Mathematics

july 2015

Student: T.W. Verstraaten

First supervisor: Dr. B. Carpentieri Second supervisor: Prof. Dr. H. Waalkens.

faculty of mathematics and natural sciences

Abstract

In this thesis various preconditioning techniques are discussed for solving boundary element equations arising in electromagnetic scattering. Preconditioning is a crucial component for iterative Kylov methods in this context to accelerate their convergence, and there is a host of possible choices of preconditioners. These all attempt to increase the sparsity of the system and give rise to a favourable eigenvalue distribution. First we discuss the finite difference and finite element approaches to modelling differential equations and then we present the boundary element method (BEM) approach for the integral equation formulation. Afterwards, we discuss Krylov subspace methods and we describe the fast multipole method, which is used to reduce the computational cost of the matrix vector product to O(n log n). Finally we discuss some efficient preconditioning techniques for this problem class and illustrate their performance on the solution of realistic scattering problems.

### Contents

1 Introduction 3

1.1 Some notation . . . 3

1.2 Some definitions and concepts . . . 3

2 Numerical modelling of differential equations 4 2.1 Finite difference method . . . 4

2.2 The finite element method . . . 8

2.2.1 Weak formulation . . . 8

2.2.2 Galerkin method . . . 9

3 Integral Equations 11 3.1 Integral equations . . . 11

3.2 An integral equation for the acoustic problem . . . 11

3.3 Discretizating our model problem . . . 14

4 Projection methods 15 4.1 General projection method . . . 17

4.2 Results about general projection methods . . . 18

5 Krylov subspace methods 19 5.1 Krylov subspaces . . . 19

5.2 Arnoldi method . . . 21

5.3 Full orthogonalization method . . . 21

5.4 Restarted FOM . . . 23

5.5 IOM and DIOM . . . 23

5.6 GMRES . . . 25

5.6.1 Restarted GMRES . . . 27

5.7 Notes on implementation . . . 27

5.8 Comparing FOM and GMRES . . . 30

5.9 Preconditioning . . . 31

5.10 Deflation . . . 31

5.10.1 DEFLGMRES . . . 32

6 Fast multipole method 33 6.1 Basic concepts of the FMM method . . . 33 6.2 MLFMA . . . 37

7 Electric field integral equation 38

8 Incomplete factorization methods 41

8.1 Level of fill techniques . . . 41 8.2 Problems with ILU preconditioners . . . 43

9 Multilevel preconditioning 44

9.1 Independent set ordering . . . 44

10 Approximate inverse preconditioning 46

10.1 Frobenius-norm minimization preconditioner . . . 46

11 MLFMA-based preconditioners 50

12 Conclusion 52

### 1 Introduction

The first chapters of this thesis will be dedicated to techniques for numerically solving differential equations, and discussing their strengths and weaknesses. The discretization of such systems results in linear systems of the form.

Ax = b,

We will outline some of the most common methods for solving such linear systems.

1.1 Some notation

We will adopt the following notation throughout the thesis:

• ∆ denotes the Laplacian operator ∆ = P^{n}

k=1

∂

∂xi.

• h·, ·i denotes the standard inner product on R^{n}.

• For a matrix A, a?i and ai? denote the i^{th} column and row respectively. Also, range(A) will
denote the column space of A, that is range(A) = span_{i}{a?i}.

• Vectors will be given in bold as x.

• ei will always denote the i^{th} canonical basis vector.

• If D ⊂ R^{n} is some region in an n-dimensional space, we denote by ∂D its boundary.

• We suppress zeros in sparse matrix representation, for example we will write

B =

−4 1

1 −4 1

. ..

1 −4 1

1 −4

.

1.2 Some definitions and concepts

Some important definitions that are used in this manuscript are given here.

• The condition number κ2(A) of a matrix A is defined as
κ2(A) =kAk2kA^{−1}k.

It is a measure of how numerically stable a matrix is with respect to inversion. A numerically stable matrix A, will not amplify the small uncertainties of the data in the solution x.

• If ¯x is an approximate solution to Ax = b, we define the residual vector r as r= b− A¯x.

### 2 Numerical modelling of differential equations

Differential equations are ubiquitous in computational science. Their numerical solution typically results in linear systems of the form

Ax = b.

We will present some of the most common methods for solving such systems. In our discussion we will refer to the model problem:

∆u + k^{2}u = 0. (2.0.1)

defined on some domain D.

2.1 Finite difference method

The finite difference method is the easiest method to numerically approximate solutions of ordinary and partial differential equations. We present here only an introductory look at the method which is extensively covered in many books. A more in depth analysis of the finite difference method can be found for instance in [11].

We will assume that D is a square domain [a, b]× [c, d] and that u satisfies a homogeneous Dirichlet boundary condition, i.e., that u|∂D = 0. In general these assumptions do not necessarily need to hold when using finite difference methods, but we are here concerned with an overview of the general idea of the method.

The finite difference method relies on the Taylor expansion of a function
f (x + h) = f (x) + hf^{0}(x) + h^{2}

2!f^{00}(x) + . . . +h^{n}

n!f^{(n)}(x) + . . . (2.1.1)
where h is taken to be small. Upon truncating this series, we can obtain the approximation

f^{0}(x)≈ f (x + h)− f(x)

h . (2.1.2)

The truncation error for this approximation is given by
h^{2}

2!f^{00}(x) + h^{3}

3!f^{(3)}(x) + . . . . (2.1.3)

Similarly to equation 2.1.1 we can set up an approximation for f (x− h). Then adding this to equation 2.1.1 gives

f (x + h) + f (x− h) = 2

f (x) + h^{2}

2!f^{00}(x) + h^{4}

4!f^{(4)}(x) + . . .

. Then we may approximate the second derivative of f at x by

f^{00}(x)≈ f (x + h)− 2f(x) + f(x − h)
h^{2}

with an error of

− 2 h^{4}

4!f^{(4)}(x) + h^{6}

6!f^{(6)}(x) + . . .

.

The next step is to discretize our domain D into a rectangular grid. We subdivide the domain into
nm rectangles with height h_{1} = ^{d−c}_{n} and width h_{2} = ^{b−a}_{m} . We denote

u_{ij} = u(a + ih_{1}, c + jh_{2}).

Now, using the previous results we can write

∆u_{ij} = ∂^{2}

∂x^{2}u_{ij}+ ∂^{2}

∂y^{2}u_{ij} (2.1.4)

≈ u_{i+1,j}+ u_{i−1,j}

h^{2}_{2} +u_{i,j+1}+ u_{i,j−1}
h^{2}_{1} − 2

h^{2}_{1} + 2
h^{2}_{2}

u_{i,j}. (2.1.5)
When we use this approximation into our model problem, we obtain the formulation

ui+1,j+ ui−1,j

h^{2}_{2} +ui,j+1+ ui,j−1

h^{2}_{1} − 2
h^{2}_{1} + 2

h^{2}_{2} + k^{2}

ui,j = 0,

which holds for all discrete points (x_{i}, y_{i}) on the domain D. On the boundary we have prescribed u
to be equal to zero, so some of these terms will not appear in the final matrix. Now for simplicity
we take h := h1= h2. If we now define the vector

u= [u_{1,1}, . . . , u_{1,n}, u_{2,1}, . . . , u_{2,n}, . . . , u_{n,n}]^{T}

u12 u13

u22 u23

u32 u33

u02 u03 u04

u14

u24

u34

x y

a b

c d

u11

u21

u31

u00 u01

u10

u20

u30

u40 u41 u42 u43 u44

Figure 2.1: The rectangle [a, b]×[c, d] divided into 16 subrectangles.

Then we can write the system in the form
(A + k^{2}I)u = 0, (2.1.6)
where A is a block matrix of the form

A = 1
h^{2}

B I

I B I

. ..

I B I

I B

with

B =

−4 1

1 −4 1

. ..

1 −4 1

1 −4

.

Equation 2.1.6 is an eigenvalue-eigenvector prob- lem. Solving it means finding the eigenvector cor-

responding to the eigenvalue −k^{2}. Note that our original problem is also an eigenvalue problem if
we write is as

(∆ + k^{2})u = 0. (2.1.7)

Again the solution is the eigenvector of the operator ∆ with eigenvalue−k^{2}. In general, if u does
not satisfy a homogeneous Dirichlet boundary condition and the right hand side of equation 2.1.7 is
not zero, the right hand side of equation 2.1.6 would be non zero. Then we would have a standard
linear system

(A + k^{2}I)u = b, (2.1.8)

that may be solved by any method of choice. Note that the matrix involved is very sparse and thus iterative methods like the class of Krylov methods based on matrix-vector operations are more efficient than direct methods. These methods will be discussed later on.

The finite difference method is very simple and easy to implement. One of the major issues with this method, however, is that the domain D may have any arbitrary shape, and some highly ir- regular shapes can be approximated very poorly using rectangles. Another problem is the tuning of the value for h. Of course h needs to be sufficiently small so that the truncations made in the Taylor series are reasonably accurate. But in making h very small, we can see that entries in the matrix A become exceedingly large compared to the entries in the vector b in equation 2.1.8. This presents problems due to poor scaling where small errors may be enlarged. Therefore we cannot expect that the accuracy of our result will always increase when taking a smaller value for h.

As a general example of the numerical problems that we may encounter, consider the following example.

0 0.2 0.4 0.6 0.8 1 0

5 10 15

x

u

n = 10

0 0.2 0.4 0.6 0.8 1

0 1 2 3 4 5

x

error

0 0.2 0.4 0.6 0.8 1

0 20 40 60 80 100

x

u

n = 100

0 0.2 0.4 0.6 0.8 1

0 10 20 30 40

x

error

0 0.2 0.4 0.6 0.8 1

0 200 400 600 800 1000

x

u

n = 1000

0 0.2 0.4 0.6 0.8 1

0 100 200 300 400

x

error

Figure 2.2: A simple finite difference approximation of u^{0} = _{(1−x)}^{1} 2 shows some of the problems
that may occur when using the finite difference method. Notice that as n increases, the error near
x = 1 grows while the error away from x = 1 seems to decrease.

Example 2.1. Consider the one-dimensional problem
u^{0} = 1

(1− x)^{2}, x∈ [0, 1),

with u(0) = 1. The exact solution to this problem is u(x) = _{1−x}^{1} . A finite difference approximation
would divide the interval into n sub intervals of length h = _{n+1}^{1} and to use equation 2.1.2. Now we
need to be careful here and look at how accurate this approximation is. Notice that

d^{n}
dx^{n}

1 1− x

= n!

(1− x)^{n}
So, at xn= nh = _{n+1}^{n} , we have

d^{n}
dx^{n}

1

1− x^{n}

= n!

h^{n}.
Equation 2.1.3 now gives a bound on the truncation error of

1 + 1 + 1 + 1 + . . .

This series diverges and based on this error estimate we may expect exceedingly large errors for the approximation in equation 2.1.2. So in this case the implicit assumption that the truncation errors are not too large is not valid. In figure 2.2 we show some output from a simple Matlab finite difference code.

Next we move on to the finite element method which is a slightly more complex method.

2.2 The finite element method

Originally conceived as a technique used in civil engineering, the finite element method is now a widely used tool for numerically solving ordinary and partial differential equations. It has the ad- vantage that it can handle complex geometries and boundary conditions in a more natural way than the finite difference method. We will see that the finite element method is a projection method, these methods will be discussed in more detail later on. This overview of the finite element method will follow [19]

We consider again equation 2.0.1, this time with the more general boundary condition

u|∂D1 = g_{D}, (2.2.1)

∂u

∂n ∂D2

= g_{N}. (2.2.2)

So on one part of the boundary we have a Dirichlet condition and on the other part a Neumann
condition with ∂D = ∂D_{1}∪ ∂D2 and ∂D_{1}∩ ∂D2 =∅.

There is some nomenclature that is important here. A solution to the DE with boundary condi- tions 2.2.1 and 2.2.2 will be known as a classical solution. Also, Dirichlet and Neumann boundary conditions are known as essential and natural boundary conditions, respectively.

We will first derive what is known as the weak formulation of this problem. To do, this we will need to use Green’s first identity. This states that if u is a twice continuously differentiable function defined on D and v is a once continuously differentiable function defined on D, then

Z

D

v∆u dA = I

∂D

v∂u

∂n dS− Z

D

∇u · ∇vdA. (2.2.3)

2.2.1 Weak formulation

To derive the weak formulation we first multiply equation 2.0.1 by some test function v, and we integrate over the entire domain D to obtain

Z

D

v∆u + vk^{2}u dA = 0.

Now we can employ equation 2.2.3 to obtain I

∂D

v∂u

∂n dS + Z

D

vk^{2}u dA =
Z

D

∇v · ∇udA. (2.2.4)

A solution to the original ODE must satisfy this equation for all v^{1}. However, it is possible that
there exist solutions u to equation 2.2.4 that are not sufficiently smooth to satisfy the original
problem. For instance solutions u of shock wave problems can not be classical solutions as they are
not differentiable.

We proceed by demanding that this equation is satisfied for all v in a certain function space. We
will need the notion of a Sobolev space H^{1}. This space is defined as

H^{1} =

u : D→ R : Z

D

u^{2}+∇u · ∇u dA < ∞

.

This choice of search space for v ensures that the integrals in 2.2.4 converge provided that u is bounded and the region D is bounded. We will now define two spaces.

Solution space: H^{1}S = w ∈ H^{1}: w|∂D1 = gD , (2.2.5)
Test space: H^{1}0 = w ∈ H^{1}: w|∂D1 = 0 . (2.2.6)
The solution space is the space in which we will look for u and the test space is the space from
which we will take our test function v. Note that the solution space is technically not a vector
space as it is not closed under addition unless g_{D} ≡ 0. It is clear now that any classical solution u
is also a solution to the weak formulation given by

Find uh ∈ H^{1}S such that
Z

∂D2

vhgN dS + k^{2}
Z

D

vhuh dA = Z

D

∇vh· ∇uhdA, for all vh∈ H^{1}0. (2.2.7)

1Provided that the integrals converge for this choice of v.

2.2.2 Galerkin method

We will now construct a finite subspace of the solution spaceKS ⊂ H^{1}S. The idea is to take a finite
dimensional subspace K0 ⊂ H^{1}0 with a basis {φ1, . . . , φ_{n}}. In order to ensure that the Dirichlet
boundary condition on ∂D1 is satisfied, we add an additional set of vectors {φn+1, . . . , φn+m}, so
that for fixed coefficients un+1, . . . , un+m we have

n+m

X

k=n+1

u_{k}φ_{k}

! ∂D1

= gD,

and n+m

X

k=n+1

u_{k}φ_{k}

! ∂D2

= 0.

This ensures that we can reconstruct the Dirichlet boundary condition on ∂D_{1}. Define nowKS ⊂
H^{1}S as

KS =
( _{n}

X

k=1

u_{k}φ_{k}+

n+m

X

k=n+1

u_{k}φ_{k}

(u_{1}, . . . , u_{n})∈ R^{n}
)

.

Notice that all functions inKS satisfy the essential boundary condition on ∂D1. Now we look for
the finite element approximation u_{h} in the function spaceKS. The functions φ_{1}, . . . , φ_{n} are often
called the trial functions or shape functions. In a similar way we construct a finite basis that forms
a subspace that will be our test space; denote thisL0. In the Galerkin method, test functions are
chosen to be in the space K^{0}, i.e., K^{0} = L^{0}. In the Petrov-Galerkin method, the space of test
functions is not equal to span{φ1, . . . , φ_{n}}.

Now we demand that equation 2.2.7 holds for each of the basis functions φi. The corresponding weak formulation writes then:

Find (u1, . . . , un) such that

n

X

k=1

u_{k}

Z

D

∇φj· ∇φkdA− k^{2}
Z

D

φ_{j}φ_{k} dA

= Z

∂D2

φ_{j}g_{N} dS−

n+m

X

k=n+1

u_{k}
Z

D

∇φi· ∇φkdA.

for all j = 1, . . . , n. Now we are ready to write this in matrix form as Au = f ,

with

A_{ij} =
Z

D

∇φi· ∇φj− k^{2}φ_{i}φ_{j} dA, (2.2.8)

e^{T}_{i} f =
Z

∂D2

φigNdS +

n+m

X

k=n+1

u_{k}
Z

D

∇φ^{i}· ∇φkdA. (2.2.9)

Historically, from development and use in engineering, the matrix A is often called the stiffness matrix. Clearly, A is symmetric. In the case that our original problem is the Laplace equation, i.e., when k = 0, we have that A is also positive definite if ∂D = ∂D1. To see this, note that

v^{T}Av =

n

X

i=1 n

X

j=1

viaijvj (2.2.10)

=

n

X

i=1 n

X

j=1

vi

Z

D

φi· φjdAvj (2.2.11)

= Z

D

vh· vhdA (2.2.12)

≥ 0. (2.2.13)

If ∂D = ∂D_{1}, then v_{h}|∂D= 0. Then by the continuity of v_{h} we have that v≡ 0. Now as φ1, . . . , φ_{n}
form a basis for K0 this implies that v = 0. In the case that ∂D 6= ∂D1 we have only that A is
positive semi definite.

This concludes the brief introductions into the two traditional methods of modelling differential equations; these are probably the first two approaches that students encounter. The next section will discuss integral equations and show how to derive an integral equation for the acoustic scattering problem.

### 3 Integral Equations

In many cases, differential equations have an equivalent integral equation representation on the boundary of the domain that is sometimes preferable to use. Here we will first introduce integral equations and then derive an integral equation for the acoustic problem.

3.1 Integral equations

One of the most basic kinds of integral equations is known as a Fredholm equation. This is expressed mathematically as

b

Z

a

K(x, t)y(t)dt = f (x). (3.1.1)

Here f and K are known functions and y(t) is the function that we want to determine. Integral equations are characterized by the following definitions:

• If the unknown function appears both inside and outside the integral, we call it an integral equation of the second kind. Otherwise, we call it an integral equation of the first kind.

• If the limits of integration are constant, as in equation 3.1.1, it is termed a Fredholm equation.

If one of the limits of integration is variable, the equation is known as a Volterra equation.

• If the right hand side is equal to zero, the equation is homogeneous, analogously to the case of differential equations. Otherwise, the integral equation is called inhomogeneous.

Now, an integral equation with two variable bounds can transformed by a suitable change of co- ordinates into a Volterra equation.

3.2 An integral equation for the acoustic problem

As an example of the derivation of an integral equation, we present here the case of the acoustic scattering problem

∆u + k^{2}u = 0,

defined in some domain D. We will specify the boundary condition for this problem later. It will prove very useful to find an analytic solution for the 3D acoustic field in some bounded region D due to a point source at x0 ∈ D [6]. Green’s second theorem will be instrumental in this derivation.

Green’s second theorem states that if f and g are twice continuously differentiable on D, then Z

∂D

f∂g

∂n− g∂f

∂n

dS =

Z

D

(g∆f − f∆g) dV.

Here _{∂n}^{∂g} = n(x)·∇g(x) is the directional derivative in the direction normal to ∂D pointing outward.

Here we are assuming that the boundary of D, denoted ∂D, is piecewise smooth.

To derive an integral equation for u we make the following assumptions on u:

1. u must satisfy the Helmholtz equation ∆u + k^{2}u = 0 in D\ {x0}.

2. Near x0, the solution is ’close’ to the free field solution. If we define
u^{s}(x) := u(x)− G(x, x0),

then this condition implies that u^{s} and ∇u^{s} are continuous in some neighbourhood of x0.

3. u is twice continuously differentiable in D\ {x0}.

At this stage, fix a point x in D and define

φ(y) := u(y), (3.2.1)

ψ(y) := G(y, x). (3.2.2)

Here u is the solution we are looking for, and

G(y, x) := e^{ik|y−x|}

|y − x|.
To avoid singularities at x and x_{0}, we define

V ={y ∈ D | y /∈ {x, x0}},

and notice that both φ and ψ satisfy the conditions for Green’s second theorem on V , as well as the Helmholtz equation. This implies that

ψ∆φ− φ∆ψ = ψ(−k^{2}φ)− φ(−k^{2}ψ) = 0,
so applying Green’s second theorem yields

Z

∂V

φ∂ψ

∂n− ψ∂φ

∂n

dS = 0.

Notice that this holds for x in the interior of D as well as on the boundary. We will first consider the case where x is contained in the interior of D. Taking sufficiently small and dividing ∂V into three surfaces, namely, ∂D, the sphere γ around x and the sphere Γ around x0, we may write

Z

∂D

+ Z

γ

+ Z

Γ

= 0. (3.2.3)

We analyse what happens to these integrals as we let → 0. We focus our attention first to R

γ. A simple computation shows that

∂ψ

∂n = n(y)· ∇ψ(y) = −e^{ik}(ik− 1)
4π^{2} .
Observing that on γ_{} the function ψ takes the simple form

ψ(y)|γ = G(y, x)|γ =−e^{ik}

4π

it follows that

Z

γ

=− 1 4π

e^{ik}(ik− 1)

^{2}

Z

γ

φdS−e^{ik}

Z

γ

∂φ

∂ndS

.

We are interested in the case when → 0. The function φ is defined at x, and thus we can use the

result Z

γ

φdS ∼ φ(x) Z

γ

dS = 4π^{2}φ(x). (3.2.4)

Similarly, as ^{∂φ}_{∂n} is bounded, we see that
Z

γ

∂φ

∂ndS =O(^{2}).

Using both these results we conclude that Z

γ

∼ −e^{ik}(ik− 1)φ(x) + O(), as → 0,
and this implies that lim

→0

R

γ = φ(x).

Similarly we can show that lim

→0

R

Γ =−ψ(x0).

Taking the limit, as tends to zero, of equation 3.2.3, and using the previous two results and recalling that φ(y) = u(y) and ψ(y) = G(y, x), we obtain

u(x) = G(x0, x)− Z

∂D

u(y)∂G(y, x)

∂n(y) − G(y, x)∂u(y)

∂n(y)

dS. (3.2.5)

This is the integral equation that u must satisfy inside the domain D\ {x0}. Now, a similar result can be derived if we instead look at x on the boundary of D. First, we define the quantity

Ω(x) = lim

→0

1

^{2}
Z

S

dS

where S is the part of the sphere around x of radius contained in D. If the boundary is smooth at x then Ω(x) = 2π, for instance. We then replace equation 3.2.4 by

Z

γ

φdS ∼ φ(x) Z

γ

dS∼ Ω(x)^{2}φ(x).

Thus we only need to change one of the terms in equation 3.2.5 to finally obtain Ω(x)

4π u(x) = G(x_{0}, x)−
Z

∂D

u(y)∂G(y, x)

∂n(y) − G(y, x)∂u(y)

∂n(y)

dS,

for x on the boundary of D. Now we have rewritten the Helmholtz equation to an integral equation, specifically an inhomogeneous Fredholm equation of the second kind. In deriving these equations, we have not used the boundary conditions and thus this integral representation holds irrespective of the boundary condition. In light of our particular problem, we can now use the boundary equation

∂u

∂n =−ikβu

in the derived integral equation. This leads to the integral equations u(x) = G(x0, x)−

Z

∂D

∂G(y, x)

∂n(y) + ikβ(y)G(y, x)

u(y)dS, (3.2.6) Ω(x)

4π u(x) = G(x_{0}, x)−
Z

∂D

∂G(y, x)

∂n(y) + ikβ(y)G(y, x)

u(y)dS, (3.2.7)

for x in D and ∂D, respectively. These equations are called boundary integral equations as the integration occurs over the boundary of D. Now, one advantage of this boundary integral formula- tion of our original acoustic problem becomes apparent if we solve equation 3.2.7 on the boundary

∂D and then can reconstruct u on the whole domain D using equation 3.2.6. We have effectively reduced a problem over D to a problem over its boundary only. However, we do have to evaluate the Green’s functions and their integrals, which may be costly. Also, in our integral formulation,

the value of u at one point (of the boundary) depends on the value of u at all other points of the boundary due to the integration. Upon discretization, this will lead to linear systems with fully dense matrices in contrast to numerical schemes where approximate the Laplace operator directly using the values of u in some small neighbourhood.

In order to make an integral equation problem amenable to numerical approximation, we need to discretize the problem in some way. A suitable method to use is the finite element method. Sub- sequently we have to solve the problem of numerically approximating the integrals in the equation.

In the next section we will discuss one simple method to discretize the problem.

3.3 Discretizating our model problem

We apply the ideas of FEM to our integral equation. The boundary is divided into N control
surfaces, usually triangles. We denote these surfaces as γj for j = 1, . . . , N . Now, in the simplest
case we can restrict u to be constant on each γ_{i}, i.e., u|γj = u_{j}. Then equation 3.2.7 takes the form

Ω(x)

4π u(x)≈ G(x0, x)−

N

X

j=1

u_{j}
Z

γj

∂G(y, x)

∂n(y) + ikβ(y)G(y, x)

dS. (3.3.1)

Using the collocation method, we pick a point xi in each γi and impose that equation 3.3.1 holds
exactly at these points. Usually xi is chosen as the centroid of γi. We can quite safely assume that
x_{i} lies on a smooth part of ∂D, so Ω(x_{i}) = 2π. We thus require u_{i} to satisfy

1

2ui= G(x0, xi)−

N

X

j=1

uj

Z

γj

∂G(y, xi)

∂n(y) + ikβ(y)G(y, xi)

dS

for i = 1, . . . , N . This is a set of N simultaneous equations in N variables and thus it writes as a linear system by setting

b_{i} = G(x_{0}, x_{i}), (3.3.2)

cij = Z

γj

∂G(y, xi)

∂n(y) + ikβ(y)G(y, xi)

dS. (3.3.3)

The linear system has the form

1

2u= b− Cu.

Equivalently, we can cast this in the more familiar form

Au = b (3.3.4)

if we set A = ^{1}_{2}I + C. An important question is how to evaluate the entries in b and C. Note
for instance that in the expression for c_{ii} we run into trouble due to the singularity of G(y, x_{i}) at
y = xi ∈ γi. Often some form of Gaussian quadrature is used. Now from the expression for cij

we can see that the problem mentioned earlier appears, namely that C (and thus A) is a dense
matrix. We can already see some possibilities for improvement, though. The elements c_{ij} describe

the influence on u_{i} due to to a point source at γ_{j}. We can simplify the problem by considering
a collection of point sources far away from γ_{i} as a single point source. We will explore this idea
further when we discuss the fast multipole method in section 6.

When the geometry of the problem is irregular, the size of the control surfaces may vary in size, leading to an ill-conditioned matrix A. This will make the solution of the linear system 3.3.4 poten- tially more difficult. In Figure 3.1 we plot the eigenvalue distribution for a small scattering problem from a typical geometry matrix. There is little clustering of the eigenvalues and many eigenvalues have large negative real part. Such a distribution is unfavourable for rapid convergence of Krylov methods for solving the linear system [5].

Figure 3.1: Eigenvalue distribution of the geometry matrix of a satellite. Source [5]

Now that we have introduced the notion of integral equations and BEM formulation of a differential equation we to briefly discuss projection methods before diving into Krylov subspace methods which are the focus of this thesis.

### 4 Projection methods

Projection methods are used when we want to extract some approximate solution to a problem from some spaceK. This space is called the search space, of dimension m. In this case we need m constraints to ensure that we will get a unique solution. Often this is realized by demanding that the residual b− Ax is orthogonal to some other m dimensional space L, also called the constraint space or left subspace [15]. These two conditions on the approximate solution are known as the

Petrov-Galerkin conditions.

L
K
b− Ax^{∗}

K = L

b− Ax^{∗}
Oblique Projection

Orthogonal projection

Figure 4.1: Oblique and orthogonal projec- tion.

There are roughly two classes of projection methods: orthogonal and oblique. In or- thogonal projection methods we set L = K while in oblique projection methods this is not the case and L will depend on the al- gorithm of choice. If the search space is the same as the constraint space the con- dition is often called the Galerkin condi- tion.

Now consider again the problem^{2}

Ax = b. (4.0.5)

If we want to impose the Petrov-Galerkin condi-
tions this means we want to find x^{∗} ∈ K such that
b− Ax^{∗} ⊥ L. We sometimes wish to incorporate
an initial guess x0, in which case we look for x^{∗}
in the affine space x0+K. The conditions then
are given by

find x^{∗}∈ x0+K such that b − Ax^{∗} ⊥ L.

The requirement that b− Ax^{∗} be orthogonal to
L suggests that we can also adopt a minimiza-
tion point of view on the projection technique.

Namely, we can use the alternate but equivalent conditions

find x^{∗} ∈ K such that d(b − Ax^{∗},L) is minimized.

This is again equivalent to

find x^{∗}∈ K such that hb − Ax^{∗}, wi = 0 for all w ∈ L.

It is often the case that the spacesK and L change at each iteration, often they will be enlarged. We will see such methods later on when we describe Krylov subspace methods which are an important example of projection methods.

4.1 General projection method

Now suppose we want to find an approximation to equation 4.0.5. Consider V = [v_{1}, . . . , v_{m}] and
W = [w1, . . . , wm] such that

span{v1, . . . , v_{m}} = K, (4.1.1)

span{w1, . . . , w_{m}} = L. (4.1.2)

2We present here the argument for a linear system, but these methods also exists for problems of the form Au = f where A is an operator and f and u are functions.

Then we can write the approximate solution as

x^{∗}= x0+ V y,
for some y∈ R^{m}. Then multiplying by W^{T}A yields

W^{T}Ax^{∗} = W^{T}Ax_{0}+ W^{T}AV y.

Noting that W^{T}A(x^{∗}− x0) = W^{T}(r− r0) = W^{T}r_{0} gives
W^{T}r0 = W^{T}AV y.

In the case that W^{T}AV is invertible we can solve this to obtain
y = (W^{T}AV )^{−1}W^{T}r_{0}.

A general algorithm for a projection method with Petrov-Galerkin conditions is given in algorithm 4.1.

Algorithm 4.1: General projection algorithm

1 repeat

2 Select two subspacesK and L

3 Find bases V and W for K and L

4 r = b− Ax

5 y = (W^{T}AV )^{−1}W^{T}r

6 x = x0+ V y

7 until convergence;

4.2 Results about general projection methods

The following proposition guarantees that W^{T}AV is invertible under some conditions.

Proposition 4.1. Let A, K and L satisfy one of the following two properties:

1. A is SPD and K = L, or 2. A is invertible and L = AK.

Then B = W^{T}AV is invertible, regardless of the bases V and W of K and L respectively.

Proof. Consider the first case. Let V and W be bases of L and K respectively. Then, as L = K, we can write W = V G for some nonsingular matrix G. Now

B = W^{T}AV = G^{T}V^{T}AV.

Because A is SPD we have that V^{T}AV is SPD, and hence invertible. Thus B is invertible.

Consider now the second case. By the condition that L = AK, it now follows that we can write W = AV G for some nonsingular matrix G. Thus

B = G^{T}(AV )^{T}AV.

Because A is nonsingular, the matrix AV has full rank^{3} and so (AV )^{T}AV is nonsingular. Thus B
is nonsingular.

For the same two cases, two slightly more complicated results can be derived about the approxim-
ation x^{∗} they deliver. consider the following two theorems.

Proposition 4.2. Let A be SPD andK = L. Then x^{∗} is result of the orthogonal projection method
onto K with initial vector x0 if and only if x^{∗} satisfies

E(x^{∗}) = min

x∈x0+KE(x), where E(x) =h¯x − x, ¯x − xi

1 2

A withx satisfying A¯¯ x = b.

Proof. By the properties of projections, x^{∗} minimizes E(x) if and only if
V^{T}A(¯x− x^{∗}) = 0

for any V whose columns span K. Then we write x^{∗} = x_{0}+ V y and rearrange to get
V^{T}A(¯x− x0) = V^{T}AV y.

Here A(¯x− x0) = r_{0}, so that

y = (V^{T}AV )^{−1}V^{T}r_{0}, (4.2.1)

and so

x^{∗}= x0+ (V^{T}AV )^{−1}V^{T}r0.

This is precisely the x^{∗} obtained from the orthogonal projection method as seen in section 4.1.

Note that the inverse in equation 4.2.1 exists by theorem 4.1.

Now we derive a similar result for the case that A is arbitrary and L = AK.

Proposition 4.3. Let A be invertible and L = AK. Then x^{∗} is result of the oblique projection
method onto K orthogonal to L with initial vector x0 if and only if x^{∗} satisfies

R(x^{∗}) = min

x∈x0+KR(x), where R(x) =kb − Axk2.

Proof. See [15]

The two cases just discussed turn out to be very important and give rise to two widely used algorithms that will be discussed later on.

3Suppose not, in that case there would exist a non zero vector y such that AV y = 0. But that means that Az = 0 with z = Ay, with z non-zero as V has full rank. This contradicts the nonsingularity of A.

### 5 Krylov subspace methods

In the last section we looked at projection techniques in a general setting. We will now look more concretely at a certain type of projection method whereK is what is known as a Krylov subspace.

5.1 Krylov subspaces

Aleksey Nikolaevich Krylov, 1863- 1945.

Given A ∈ R^{n×n} and v ∈ R^{n×1}, we define the corres-
ponding m dimensional Krylov space as

Km(A, v) := span{v, Av, A^{2}v, . . . , A^{m−1}v}.

Krylov subspace methods rely on finding an approx-
imation to the exact solution A^{−1}b in the space
Km(A, b). We can find an m for which we can be sure
that A^{−1}b ∈ Km(A, b). Recall the following defini-
tion.

The minimal polynomial corresponding to ann×n mat- rixA is the lowest degree monic polynomial q(·) such that

q(A) = 0.

The Cayley-Hamilton theorem tells us that p(A) =

n

X

k=0

α_{k}A^{k}= 0,

where p(·) is the characteristic polynomial of A. Thus the degree of the minimum polynomial of A does not exceed n. Suppose that the minimal polynomial of A has degree m. Then we can write

q(A) =

m

X

k=0

β_{k}A^{k}= 0.

Pre multiplying this equality with A^{−1} and rearranging shows us that we can solve for A^{−1} in terms
of A, A^{2}, . . . A^{m−1}. So it follows that

A^{−1}b∈ Km(A, b).

In practice it is almost never useful to first find the minimal polynomial and then use this inform- ation to restrict the maximum size of the Krylov space as this would require knowledge of the spectrum of A, which is not available in general. Instead, the Krylov space is enlarged at each iterations and a residual is calculated to be used in a stopping criterion.

As we increase the dimension of the Krylov space, the new vectors A^{k}bbecome increasingly linearly
dependent. To see this, decompose^{4}

b=

n

X

i=1

βizi,

4Such a decomposition is not always possible if A is singular, i.e., when the eigenvectors of A do not span R^{n}.

where z_{i} are the normalized eigenvectors of A. Then

p→∞lim
A^{p}b

kA^{p}bk → z^{m}

where zm is the eigenvector of A corresponding to the largest eigenvalue. This property is actually
exploited in the so-called power method to determine the largest eigenvalue and the corresponding
eigenvector of a matrix A [2]. Therefore we may expect that, due to numerical errors and round
off errors, after a certain amount of iterations the computed vectors will no longer be linearly
independent. To avoid this problem, we generate an orthonormal basis {v1, v_{2}, . . . , v_{m}} such that

span{v1, v_{2}, . . . , v_{m}} = Km(A, v).

A well known method for generating an orthonormal basis of a Krylov space is called the Arnoldi method.

5.2 Arnoldi method

The discussion of the Arnoldi method follows [15]. In essence, the Arnoldi method generates an or-

thonormal basis forKm(A, v) by using the Gram-Schmidt orthogonalization process on{v, Av, . . . , A^{m−1}v).

Algorithm 5.1: Arnoldi algorithm

1 v_{1} = v/kvk2
2 for j = 1,. . . ,m do

3 for i = 1,. . . ,j do

4 h_{ij} = v_{i}^{T}Av_{j}

5 end

6 w_{j} = Av_{j} −

j

P

i=1

h_{ij}v_{i}

7 h_{j+1,j}=kwjk2
8 if hj+1,j= 0 then

9 Stop

10 else

11 vj+1 = _{h}^{w}^{j}

j+1,j

12 end

13 end

Proposition 5.1. Let V_{m} be the n× m matrix with columns vectors v1, . . . , v_{m} and ˜H_{m} be the
(m+1)×m Hessenberg matrix whose non-zero entries are hij, as defined in algorithm 5.1. Further,
let H_{m}= ˜H_{m}(1 : m, 1 : m). Then the following relations hold:

AV_{m} = V_{m}H_{m}+ h_{m+1,m}v_{m+1}e^{T}_{m} (5.2.1)

= Vm+1H˜m, (5.2.2)

V_{m}^{T}AVm = Hm. (5.2.3)

where em= (0, 0, . . . , 0, 1)^{T} is them^{th} canonical basis vector for R^{m}.

Now that we have constructed a basis forKm(A, v) we can use this to find approximate solutions to linear systems. We will describe here some iterative methods exploiting the idea of the Arnoldi basis.

5.3 Full orthogonalization method

The full orthogonalization method (FOM) seeks an approximate solution x_{m}= x_{0}+ V_{m}ysuch that
b− Axm⊥ Km.

It is thus an orthogonal projection method. It is common to define β =kr0k2 and let v1 = r0/β.

Then proposition 5.1 states that

V_{m}^{T}AVm= Hm

and we have that

V_{m}^{T}r_{0} = V_{m}^{T}(βv_{1}) = βe_{1},

by the orthonormality of V_{m}. Then we can state the following proposition:

Proposition 5.2. The solution to the computed by the full orthogonalization method is
x_{m} = x_{0}+ V_{m}y,

with

y= H_{m}^{−1}(v1).

Proof. It is clear that x_{m} ∈ x0+Km. It remains to show that b− Axm ⊥ Km, i.e., that
Vm(b− Axm) = 0.

Now,

V_{m}^{T}(b− Axm) = V_{m}^{T}(b− A(x0+ V_{m}H_{m}^{−1}(βe_{1}))) (5.3.1)

= V_{m}^{T}(b− Ax0)− Vm^{T}AV_{m}H^{−1}(βe_{1}) (5.3.2)

= V_{m}^{T}r0− H^{m}H^{−1}(βe1) (5.3.3)

= βe1− βe1 (5.3.4)

= 0. (5.3.5)

Algorithm 5.2: Pseudo code FOM algorithm

1 Set r_{0} = b− Ax0, β =kr0k2 and v_{1}= r_{0}/b.

2 Compute H_{m} using the Arnoldi algorithm

3 Compute ym= H_{m}^{−1}(βe1)

4 Set xm = x0+ Vmym.

In practice is it not desirable to set a fixed m beforehand as in algorithm 5.2 but to choose it dynamically according to some criterion. The following useful result can be used.

Proposition 5.3. The residual b− Axm for the approximate solution x_{m} found by algorithm 5.2
satisfies

b− Axm=−hm+1e^{T}_{m}ymvm+1.
Therefore

kb − Axmk2 = h_{m+1}|e^{T}my_{m}|. (5.3.6)
This theorem allows us to evaluate the residual at each step so that we may design a stopping
criterion without having to explicitly determine x_{m} at each iteration.

Proof. We have

b− Axm = b− A(x0+ Vmym) (5.3.7)

= r0− AVmym (5.3.8)

= r_{0}− VmH_{m}y_{m}− hm+1e^{T}_{m}y_{m}v_{m+1} (5.3.9)

= r_{0}− VmH_{m}H_{m}^{−1}(βe_{1})− hm+1e^{T}_{m}y_{m}v_{m+1} (5.3.10)

= −h^{m+1}e^{T}_{m}ymvm+1 (5.3.11)

The second identity of proposition 5.3 follows by simply taking the norm.

Then we can use this result to calculate the residual at each step by simply performing one vector- vector product.

5.4 Restarted FOM

Due to the orthogonalization process required in the FOM method, after m iterations we need to store m vectors. Also, at the new iteration, orthogonalization with respect to the previously computed m basis vectors will be performed. Thus the amount of flops required for each iteration increases with m and may become prohibitive. There are several methods to alleviate this problem.

The first is the restarted FOM method.

In the restarted FOM method, k iterations of the FOM algorithm are performed and if at that point no sufficient convergence is achieved, the FOM method is restarted after setting x0 = xm. Using this method, no more than k vectors need to be stored in total and the orthogonalization runs over at most k vectors. So the time and storage complexity cannot grow unbounded as the number of iterations increases.

Algorithm 5.3: Pseudo code of restarted FOM algorithm

1 while not converged do

2 Set r0 = b− Ax0, β =kr0k2 and v1= r0/b

3 Compute Hm using the Arnoldi algorithm

4 Compute y_{m} = H_{m}^{−1}(βe_{1})

5 Set xm = x0+ Vmym 6 Set x0 = xm

7 end

5.5 IOM and DIOM

Another method of controlling the time and storage complexity of the FOM method is by using the incomplete orthogonalization method (IOM). In this method we select an integer k > 0 and we replace the orthogonalization process that uses all vector determined so far by an orthogonalization process that uses only the k last determined vectors. To do this we simply modify the Arnoldi process in algorithm 5.1 to the algorithm 5.4.

Algorithm 5.4: Incomplete Arnoldi algorithm

1 v_{1} = v/kvk2
2 for j = 1,. . . ,m do

3 for i = max(1, j− k + 1), . . . , j do

4 h_{ij} = v_{i}^{T}Av_{j}

5 end

6 wj = Avj −

j

P

i=max(1,j−k+1)

hijvi
7 h_{j+1,j}=kwjk2

8 if hj+1,j= 0 then

9 Stop

10 else

11 v_{j+1} = _{h}^{w}^{j}

j+1,j

12 end

13 end

Now we need to store and orthogonalize at most k vectors so we may bound the time and storage
complexity. One problem arises, however, in the construction of xm. To do this we require the
matrix Vm which the algorithm did not fully store if the number of iterations exceeded k. One
solution is to recompute the matrix V_{m} after the algorithm finishes but this essentially doubles the
computation time required and is thus no good. An alternative approach consists of updating the
solution xm at each iteration so that we include the contribution due to each new vector vi.
The Hessenberg matrix Hm that we obtain in the IOM method is has upper bandwidth k− 1 and
lower bandwidth 1 due to its Hessenberg structure. Then if LU = Hm is the LU factorization of
H_{m} it is a well known result that U will have upper bandwidth of k− 1 as well. The direct IOM
method (DIOM) exploits this property. The approximate solution x_{m} is, as before, given by

x_{m}= x_{0}+ V_{m}H_{m}^{−1}(βe_{1}) = x_{0}+ V_{m}U_{m}^{−1}L^{−1}_{m} (βe_{1}).

Now define

P_{m} = V_{m}U_{m}^{−1}, z_{m} = L^{−1}_{m} (βe_{1}).

Then

xm= x0+ Pmzm. (5.5.1)

From P_{m}U_{m}= V_{m} we have that

p_{m}= 1
u_{mm}

"

v_{m}−

m−1

X

i=m−k+1

u_{im}p_{i}

# ,

where p_{m} is the last column of P_{m}. Also, if we let ζ_{m} =−lm,m−1ζ_{m−1} we may write
z_{m} =zm−1

ζ_{m}

, with ζ1 = β. Now from equation 5.5.1 we get that

xm= x0+ Pm−1, pm

zm−1

ζ_{m}

= x0+ Pm−1zm−1+ ζmpm = xm−1+ ζpm.
So we can update x_{m} using x_{m−1}, the previous vectors p_{m−k+1}, . . . , p_{m−1} and ζ_{m−1}.
Algorithm 5.5: DIOM

1 Set r_{0} = b− Ax0, β =kr0k2, v_{1} = r_{0}/b and j = 1. while not converged do

2 for i = max(1, j− k + 1), . . . , j do

3 compute hij using the incomplete Arnoldi algorithm.

4 end

5 compute v_{j+1}

6 find the LU factorization of Hj by updating the LU factorization of Hj−1 7 Set ζm =

(β, if m = 1

−L^{j,j−1}ζj−1, else

8 Set pm= _{u}^{1}

mm

"

v_{m}− ^{m−1}P

i=m−k+1

uimp_{i}

# .

9 Set xm = xm−1+ ζmpm 10 end

5.6 GMRES

The GMRES method was proposed by Saad and Schultz in 1986 in [16]. It is a projection method where we choose K = Km

A,^{r}_{β}^{0}

, and L = AK. It is thus an oblique projection method. As
discussed in section 4 the residual b− Ax is minimized over all vectors in x^{0}+K^{m}, so the solution
can be written as

x0+ Vmy.

Define

J(y) =kb − Axk2 =kb − A(x0+ Vmy)k2. Then

b− Ax = b − Ax0+ AV_{m}y (5.6.1)

= r_{0}− AVmy (5.6.2)

= βv_{1}− Vm+1H¯_{m}y (5.6.3)

= Vm+1(be1− ¯Hy). (5.6.4)

By construction, V_{m+1} is an orthonormal matrix so

J(y) =kβe1− ¯Hmyk2.

We want to minimize this function. The solution x_{m} can be computed by first solving the minim-
ization problem

ym= argminkβe^{1}− ¯Hmyk^{2}, (5.6.5)
and then recovering xm via

xm= x0+ Vmym.

The minimizer in equation 5.6.5 can be cheaply computed given that m is not too large [15]. Now the algorithm is built upon successively increasing the size of the Krylov space, determining ym

and checking if some stopping criterion is met. The algorithm of the GMRES method is given in algorithm 5.6.

Algorithm 5.6: GMRES

1 Set r0 = b− Ax0, β =kr0k2, v1 = r0/b and j = 1.

2 while not converged do

3 for i = 1,. . . ,j do

4 hij = v_{i}^{T}Avj

5 end

6 wj = Avj −

j

P

i=1

hijvi 7 hj+1,j=kwjk2 8 if hj+1,j= 0 then

9 Stop

10 else

11 vj+1 = _{h}^{w}^{j}

j+1,j

12 end

13 y_{j} = argminkβe1− ¯H_{j}y_{j}k2
14 j = j + 1

15 end

16 Compute xm= xm+ Vmym.

The algorithm differs from the Arnoldi algorithm only by the fact that we use the orthonormal basis generated to explicitly find an approximate solution ym in each iteration. The algorithm breaks down only if the Arnoldi process breaks down, i.e., when hj+1,j= 0.

Proposition 5.4. [15] Given thatA is non-singular, then the GMRES method breaks down at step
j if and only if the solution x_{j} is the exact solution to Ax = b.

This condition is sometimes also known as the optimality principle of GMRES. It simply means that we will be assured to converge to the exact solution assuming exact arithmetic in at most n iterations. A proof can be found in [15].

Note that the GMRES method requires the storage of m orthonormal n× 1 vectors after m iter- ations. Furthermore, each iterations requires m inner products. This storage and computational requirement may become too large giving rise to the development of the so-called restarted GMRES method.

5.6.1 Restarted GMRES

The idea of the restarted GMRES method is to perform a certain amount of iterations, say k
iterations, of the standard GMRES method, to compute the current approximation xm and restart
the algorithm setting x0 = xm. There are some advantages to this method. The major advantage
of the restarting procedure is that the matrix V_{m} can be discarded from memory at every inner
cycle of k inner iterations. In this manner the memory storage is about k× n for Vk, and each
iteration requires at most k inner products.

Algorithm 5.7: Restarted GMRES.

1 while Not converged do

2 Set r0 = b− Ax^{0}, β =kr^{0}k^{2} and v1= r0/b.

3 Generate ¯H_{k} using the Arnoldi algorithm starting with v_{1}.

4 Determine yk and xk.

5 x0 = x_{k}.

6 end

Every restart cycle is called an outer loop and the k loops performed at each restart are referred to as the inner loops. Note that the optimality principle is lost, that is, after n inner loops there is no guarantee the exact solution was computed even using exact arithmetic.

5.7 Notes on implementation

One of the problems with the GMRES algorithm as described above is that the true residual b−Axm

is not known unless it is explicitly calculated. A possible remedy is to compute x_{m} after a given
number of iterations and evaluate the residual then to see if sufficient convergence is achieved. We
can do better, however.

The least squares problem in equation 5.6.5 is often solved using plane rotations to transform the Hessenberg matrix into an upper triangular form. This is done by using Givens rotation matrices of the form

Ω_{i}=

1

. ..

1

c_{i} s_{i}

−si ci

1 . ..

1

.