
Block-iterative methods in tomography

Michael Mo

July 13, 2017

Bachelor’s thesis mathematics and computer science

Supervisors: dr. H. Kohr, dr. W.J. Palenstijn, dr. ir. A.L. Varbanescu, prof. dr. K.J. Batenburg

Korteweg-de Vries Instituut voor Wiskunde, Instituut voor Informatica


Abstract

In this project we explain the main idea of tomography and show how we can build a mathematical model of an X-ray scanner setup. From the model we see that the considered object will be an unknown function f, and the measurements are samples of the Radon transform of f. The analytical method to reconstruct f is briefly discussed, by going through the projection slice theorem, from which it follows that f can be fully reconstructed if all projections are available. In the algebraic method, we use discretization to convert the reconstruction problem into a problem of solving a system of linear equations Ax = b. Beginning with the Kaczmarz method, we see that the method generalizes to the form of block-iterative methods for solving Ax = b. We continue by giving block-iterative versions of Cimmino's method and SART, and also present a new block-iterative method, based on a mathematical convergence criterion, to solve Ax = b specific to the current context. From experimental results, we see that the number of blocks per iteration and the block ordering matter for the convergence speed. After discussing efficient parallelizable implementations of the forward and backward projections, we use our GPU implementation to show that the number of blocks also has an impact on the computation time. We also find that positioning multiple sequential angles in the same GPU block can improve caching behaviour and runtime. For our new method, we identify a block size and block ordering that are optimal in the considered experiments and for the used hardware.

Title: Block-iterative methods in tomography

Author: Michael Mo, michael.mo@student.uva.nl, 10770518

Supervisors: dr. H. Kohr, dr. W.J. Palenstijn, dr. ir. A.L. Varbanescu, prof. dr. K.J. Batenburg

Second assessors: dr. I. Bethke, dr. C.C. Stolk
Date: July 13, 2017

Korteweg-de Vries Instituut voor Wiskunde, Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Contents

1 Introduction
1.1 Tomography
1.2 Research question
1.3 Thesis outline
2 Modelling of a standard setting: X-ray scanners
2.1 The X-ray setting
2.2 Analytical methods
2.3 Algebraic methods
3 Block-iterative methods
3.1 The Kaczmarz method
3.2 New method
3.3 General block-iterative methods
4 Numerical experiments on convergence dependence of block-iterative methods
4.1 Dependence on the number of blocks
4.2 Dependence on the block ordering
5 An implementation of the block-iterative method
5.1 Line and strip model coefficients
5.2 Forward projection
5.3 Backprojection
6 Performance analysis
6.1 Convergence dependence on number of blocks in wall-clock time
6.2 Improving kernel performance
7 Conclusion
8 Popular summary (in Dutch)
Appendix

1 Introduction

1.1 Tomography

Tomography is about creating images of the inside of an object by only making use of the physical interaction of matter with a wave, current, beam or other probing signal. Mathematically, this can be seen as using projections to solve a certain kind of inverse problem.

Loosely speaking, the main idea of tomography is to find out more about the inner structures of an object without physically opening or cutting it. This setting has many important real-world applications. For example, a doctor uses tomography to learn more about the inside of the brain without actually cutting into the patient's head. In technical industries we use tomography to see the inside of a complicated mechanical part, and an archeologist might use tomography to see what is inside an ancient tomb which is too fragile to be opened. So this field certainly has a lot of practical significance.

1.2 Research question

This thesis intends to study and present the models and methods used in tomography, and efficient implementations of the methods on computers. Part of the study also includes doing research on the performance of existing methods. Thus our research question is:

“How do we improve the performance of existing methods used in tomography?”, and we address the following questions:

1. How do we model a real world setting as a mathematical problem?
2. What are the current methods used in tomography?
3. Can we make use of a GPU in tomography?

1.3 Thesis outline

We begin by looking specifically at X-ray tomography in chapter 2. There we discuss how we can model an X-ray scanner as a mathematical problem, and show how it can be transformed into a problem of solving a linear system of equations Ax = b. In chapter 3 we consider iterative methods to solve such systems. Specifically, we discuss the Kaczmarz method, Cimmino's method and SART, generalize them to the form of a block-iterative method, and come up with a new method of similar form. Their generalizations all have multiple parameters, and with the numerical experiments shown in chapter 4 we try to identify and tune the most significant ones for convergence speed. The methods under consideration all consist of matrix and vector operations. Due to the large sizes of the matrices involved, the methods require a lot of computations. To reduce the computation time, the first step is often to make use of parallelism where possible. A discussion of an efficient and parallelizable implementation of our block-iterative methods is the subject of chapter 5. For parallel execution on the computer we make use of a GPU (Graphics Processing Unit). Apart from the fact that GPUs allow for parallel execution, they are also better suited than CPUs for performing (simple) operations on large amounts of data. In chapter 6 we investigate the performance of our GPU implementation and discuss optimization techniques.

2 Modelling of a standard setting: X-ray scanners

In applied tomography the goal is to learn about the inside of an object without opening or cutting it. The general idea to achieve that goal is to have some kind of beam penetrating the object from one side and then to measure the strength of the beam coming out the other side. Assuming that the beam penetrates more or less easily depending on the material it passes through, the measurement of the strength of the outcoming beam will then contain some information about the materials it went through. If we repeat this procedure by shooting beams at different starting locations and across different angles, it may then be possible to locate regions of the object where it was either difficult or easy for the beam to penetrate, and this gives us an idea of the different types of materials of the object in those regions.

As an example where this idea is applied, we look at one of the standard cases in tomography: the X-ray scanner. In most cases the object is stationary and small enough that it can be positioned between an X-ray emitter and an X-ray detector. We call this the X-ray setting. In this chapter we discuss some ways to model the X-ray setting as a mathematical problem.

2.1 The X-ray setting

We begin by describing the basic functionality of an X-ray scanner for a stationary object. The X-ray machine itself consists of at least two parts: one part which emits X-rays, and one part which detects the transmitted X-rays. The emitter and detector are positioned at some fixed chosen location with the object in between, and a measurement can then be made as follows. The X-ray emitter sends out an X-ray with a known starting intensity I0 towards the detector. The X-rays then travel in a straight line through the object until they hit the detector on the other side of the object. The detector then measures some intensity I1. This setup can be visualised as in Figure 2.1.

Figure 2.1: X-ray setup.

Each type of material has a property called the attenuation coefficient. The attenuation coefficient of a material tells how difficult it is for an X-ray to penetrate it (the higher the number, the more difficult the penetration). Since the attenuation coefficient is generally also characteristic for each type of material, knowing those coefficients at each location of the object thus tells us what type of material is at that location. The manner in which the intensity of an X-ray decreases depends on the total attenuation A the X-ray suffers along its path. This relation can be expressed as [7]:

I1 = I0 · e^{−A},

which is known as the Beer–Lambert law. Assuming that the materials of the object have different attenuation coefficients, the measured intensity will depend on the materials the X-ray went through, and thus multiple measurements at different locations and angles can give information about the inside of the object. More explicitly, by making one extra calculation, the information we can get from a measurement I1 is the total attenuation along the path the X-ray travelled:

A = −log(I1 / I0).

For convenience, from now on, whenever we use the word measurement in the context of an X-ray detector, we will not actually mean the intensity I1 which the detector measures, but rather the total attenuation of the X-ray as given by the formula above. Our goal is to find out what the values of the attenuation coefficients are at every location inside the object, using only the measurements.
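As a small illustration (a sketch under the assumption that the raw detector readings are available as an array; the function name is ours, not from the thesis code), converting measured intensities into the attenuation values that we call measurements amounts to applying the formula above element-wise:

#include <math.h>
#include <stddef.h>

/* Convert a row of detector intensities I1 into total attenuations
 * A = -log(I1 / I0), following the Beer-Lambert law I1 = I0 * exp(-A). */
void intensities_to_attenuations(const double *I1, double I0,
                                 double *A, size_t num_detectors) {
    for (size_t i = 0; i < num_detectors; i++)
        A[i] = -log(I1[i] / I0);
}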

In order to generate an image of the inside of the object, we have to transform the previous setting into a more concrete mathematical model. To keep it simple, we start by discretizing the 3D object as a 'finite stack of 2D slices', and then solve the problem per 2D slice. Before we give the explicit mathematical model for the X-ray setting in the 2D case, we discuss a more convenient notation for infinite lines in R² and line integrals, and introduce the Radon transform.

There are many ways to describe an infinite line in R². The most usual form is y = ax + b. We now introduce another description of infinite lines in R² which will be more convenient in this context. Given a line L in R², we can associate a pair (θ, s) with it by setting θ as the angle needed to rotate L clockwise around the origin such that it lies parallel to the y-axis, and by then setting s equal to the x-coordinate which all points on L have after rotation by θ. Note that s can be positive or negative, depending on how we choose θ. An example is given in Figure 2.2. In other words, we associate a pair (θ, s) with a line L if the following holds:

L = {(x, y) ∈ R² | x cos(θ) + y sin(θ) = s},

with again the important remark that if L = (θ, s), then also L = (θ + π, −s). Using this description of a line in R² as a pair (θ, s), we can now introduce the Radon transform.

Figure 2.2: Example of a pair (θ, s) corresponding to a line L.

Definition 2.1.1. Given a function f : R² → R, the Radon transform of f is denoted by R{f} and defined as a function on the space of all lines in R², which maps a line L to the line integral of f over that line L. We have seen that we can denote any line L in R² as L = L_{θ,s} for certain θ and s, and we will write (θ, s) as shorthand for L_{θ,s}. So, given a line L = L_{θ,s}, we have:

R\{f\}(L) = R\{f\}(\theta, s) := \int_{L_{\theta,s}} f \, ds

To write this line integral as a one-dimensional integral, note that we can parametrize a line L_{θ,s} = {(x, y) ∈ R² | x cos(θ) + y sin(θ) = s} as

(x(t), y(t)) = (s cos(θ), s sin(θ)) + t · (−sin(θ), cos(θ)) = (s cos(θ) − t sin(θ), s sin(θ) + t cos(θ)),

where t goes from −∞ to ∞. Therefore we have:

R\{f\}(\theta, s) = \int_{L_{\theta,s}} f \, ds = \int_{-\infty}^{\infty} f(x(t), y(t)) \cdot \sqrt{\left(\tfrac{dx}{dt}\right)^2 + \left(\tfrac{dy}{dt}\right)^2} \, dt = \int_{-\infty}^{\infty} f(s\cos(\theta) - t\sin(\theta),\, s\sin(\theta) + t\cos(\theta)) \, dt
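As a quick illustration of this parametrization (a minimal sketch, not part of the thesis code: the object f, the step size and the integration bounds are example choices), the Radon transform of a simple function can be approximated by sampling f along the parametrized line and summing, i.e. by a Riemann sum of the line integral:

#include <math.h>
#include <stdio.h>

/* Example object: the indicator function of a disk with radius 0.5. */
static double f(double x, double y) {
    return (x * x + y * y <= 0.25) ? 1.0 : 0.0;
}

/* Approximate R{f}(theta, s): sample f along the line parametrization
 * (x(t), y(t)) = (s cos(theta) - t sin(theta), s sin(theta) + t cos(theta))
 * for t in [-R, R] with step dt, where R bounds the support of f. */
static double radon(double theta, double s, double R, double dt) {
    double sum = 0.0;
    for (double t = -R; t <= R; t += dt) {
        double x = s * cos(theta) - t * sin(theta);
        double y = s * sin(theta) + t * cos(theta);
        sum += f(x, y) * dt;
    }
    return sum;
}

int main(void) {
    /* For theta = 0 the line is vertical at x = s, so the chord length of the
     * disk is 2 * sqrt(0.25 - s^2): approximately 1.0 and 0.8 below. */
    printf("R{f}(0, 0.0) ~ %f\n", radon(0.0, 0.0, 1.0, 1e-4));
    printf("R{f}(0, 0.3) ~ %f\n", radon(0.0, 0.3, 1.0, 1e-4));
    return 0;
}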

With the help of the Radon transform we can create a mathematical model of the X-ray setting. We begin with a 2D object and view the object as a two-dimensional function f : R² → R, where we interpret f(x, y) as the value of the attenuation coefficient of the object/air at position (x, y). Furthermore, in this scenario we assume that the object is of finite size and that the attenuation coefficient of air is zero. This basically means we assume that there is an R > 0 such that f(x, y) = 0 for all (x, y) outside the disk D(0, R) := {(x, y) | \sqrt{x^2 + y^2} \le R}. The X-ray is seen as travelling in a straight line L, which begins at the location of the X-ray emitter p_b and ends at the location of the X-ray detector p_e. The total attenuation along the path the X-ray travels (which the measurement gives us) is then simply the line integral \int_L f(x, y) \, ds. Adding to the assumption f(x, y) = 0 for all (x, y) ∉ D(0, R) the assumption that the whole object fits between the emitter and detector (meaning p_b, p_e are not in D(0, R)), we then see that \int_L f(x, y) \, ds = \int_{L_{\theta,s}} f(x, y) \, ds = R\{f\}(\theta, s), where L_{\theta,s} is the infinite line going through the points p_b and p_e. This gives us the following relation for the total attenuation A:

A = \int_{L_{\theta,s}} f(x, y) \, ds = R\{f\}(\theta, s)

So the conclusion is that we can associate each measurement with a line (θ, s), and the measurement itself is simply R{f}(θ, s). The objective is to know the attenuation coefficients at each location using the measurements, so mathematically the problem becomes: determine f given only some m samples R{f}(θi, si), i = 1, . . . , m. Before we continue to discuss how these data can help us in determining f, or whether that is even possible, we first introduce another definition, which in this context makes it easier to talk about how the Radon transform is connected to the measurements.

Definition 2.1.2. Given a two-dimensional function f , and an angle θ, we will call the function pθ, defined as pθ(s) := R{f }(θ, s), the projection of f along angle θ.

One way to visualise the two-dimensional Radon transform is to use the characterisation of a line as a pair (θ, s). We note that if the two-dimensional function f is zero outside a disk centered at the origin with radius R, then any line integral \int_{L_{\theta,s}} f(x, y) \, ds with |s| > R is always 0. Therefore, we can plot all non-zero values of the Radon transform of f in a bounded two-dimensional figure, where on the y-axis we have θ ∈ [0, 2π) and on the x-axis s ∈ [−R, R]. Such a figure is called a sinogram. An example of a function with its sinogram is given in Figure 2.3.

Figure 2.3: Two-dimensional function f (left) with its sinogram (right).

Remark 2.1.3. Note that in the sinogram in Figure 2.3 we have deliberately not displayed the values of R{f}(θ, s) where θ ≥ π, since it is easy to check that for all (θ, s) we have R{f}(θ, s) = R{f}(θ + π, −s), which can thus already be seen in the figure.

2.2 Analytical methods

So we now know that each measurement we have is just the evaluation of the Radon transform of the unknown function at a point (θ, s). In particular, if we make multiple measurements from the same angle, we get samples of the projection pθ(s) of f at multiple values of s. The goal is to find out what the two-dimensional function f is, so the first question to ask is whether these projections can help us in determining f.

What we show is that, under some minimal assumptions, the whole range of projections along every angle θ is actually enough to recover f completely. We will prove this with the help of the Fourier transform and a theorem known as the projection slice theorem (or Fourier slice theorem). We first state the definitions and notation we use here for the Fourier transform.

Definition 2.2.1. The Fourier transform of a one-dimensional function g is defined as

G(\omega) := \int_{-\infty}^{\infty} g(t) \cdot e^{-i\omega t} \, dt,

and the inverse Fourier transform as

g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega) \cdot e^{i\omega t} \, d\omega.

For a two-dimensional function g, the (two-dimensional) Fourier transform is defined as

G(\omega_1, \omega_2) := \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) \cdot e^{-i(\omega_1 x + \omega_2 y)} \, dx \, dy,

and the inverse (two-dimensional) Fourier transform as

g(x, y) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} G(\omega_1, \omega_2) \cdot e^{i(\omega_1 x + \omega_2 y)} \, d\omega_1 \, d\omega_2.

We now show the direct connection between the Fourier transform of pθ and the Fourier transform of f with the following theorem [3].

Theorem 2.2.2. (Projection slice theorem): Let f : R² → R, and let pθ be the projection of f along θ. Then we have

P_\theta(\omega) = F(\omega \cos(\theta), \omega \sin(\theta)),

where Pθ is the (one-dimensional) Fourier transform of pθ, and F the (two-dimensional) Fourier transform of f.

Proof.

P_\theta(\omega) = \int_{-\infty}^{\infty} p_\theta(s) \cdot e^{-i\omega s} \, ds
= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(s\cos(\theta) - t\sin(\theta),\, s\sin(\theta) + t\cos(\theta)) \, dt \right) e^{-i\omega s} \, ds
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(s\cos(\theta) - t\sin(\theta),\, s\sin(\theta) + t\cos(\theta)) \cdot e^{-i\omega s} \, dt \, ds.

With the change of variables x = s cos(θ) − t sin(θ), y = s sin(θ) + t cos(θ) (a rotation, so the Jacobian equals 1 and ωs = ω cos(θ)x + ω sin(θ)y) we get:

= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y) \cdot e^{-i(\omega\cos(\theta) x + \omega\sin(\theta) y)} \, dx \, dy = F(\omega \cos(\theta), \omega \sin(\theta)).

So what the projection slice theorem basically tells us, is that the Fourier transform of a projection pθ is the same as a slice from the two-dimensional Fourier transform of f

through the origin at angle θ. So if we have all the projections available, then we have all the slices from the two-dimensional Fourier transform, meaning we know the Fourier transform of f . Then with the inverse Fourier transform, we can get f back, and thus the reconstruction.

The projection slice theorem gives an idea of how to reconstruct f, namely by the following three steps:

1. For each projection, calculate the (one-dimensional) Fourier transform of the projection.

2. Calculate the (two-dimensional) Fourier transform of f using all those "slices" from step 1.

3. Calculate the inverse (two-dimensional) Fourier transform of f to get a reconstruction of f.

Note that for real-world applications we only make measurements across a finite set of angles, and for a given angle θ we also do not have the whole projection pθ available, but only some N samples pθ(s_k), k = 0, ..., N − 1. Therefore these steps all require some estimation and interpolation. This is a possible method, but it turns out that the interpolation needed in the Fourier domain can introduce quite significant errors. A more commonly used method is known as filtered backprojection (FBP). We now globally describe the steps in FBP [9]. There are two parts in FBP:

1. The filtering part.
2. The backprojection part.

We begin by describing what we mean by the backprojection part. The idea is that we simply project our projection data (the projections pθ of f) back onto our reconstruction

image along the same angle θ. We do this for each projection available. So for a point v of the reconstruction image, we compute the sum of all the projection values along lines going through v.

However, (with the projection slice theorem) it can be shown that to reconstruct the original f we must not backproject our projection data pθ, but rather the projection data

filtered with a ramp filter (also known as Ram-Lak filter). This filtering stage is exactly the first part of FBP. To reduce high-frequency noise in the resulting image (because of errors in the data), the ramp filter is often replaced by some other filter which dampens the higher frequencies.

We do not go into the details of these methods, but continue with another class of methods called algebraic methods.

2.3 Algebraic methods

A different approach to reconstructing f is known as the algebraic method. This method will translate the problem of reconstructing f into a problem of solving for x a system of linear equations Ax = b. We are able to do this because we will “discretize” the function f , which in this case means we will see f as being piecewise constant per pixel area. In this section, we describe this process.

First, as we have seen before, each measurement corresponds to a line (θ, s). Usually the measurements are made in some ordered manner, meaning that the lines (θi, si) are not random, but instead have some structure to them. Two common orderings in the two-dimensional case are known as a parallel beam geometry and a fan beam geometry. Before we describe how exactly the lines are ordered in those two cases, we first define what we mean by a projection geometry and a view/projection.

Definition 2.3.1. A view/projection is a set of lines V = {(θ1, s1), . . . , (θq, sq)}, and a

projection geometry is a set of views/projections G = {V1, . . . , Vp}.

With those new terms, we can make more explicit what the parallel beam and fan beam geometry mean:

Definition 2.3.2. A parallel beam geometry is a projection geometry G = {V1, . . . , Vp}

where for each view Vi, we have that all lines in that view have the same angle.

Definition 2.3.3. A fan beam geometry is a projection geometry G = {V1, . . . , Vp}

where for each view Vi, we have that there exists a point pi, such that all lines in that

view intersect the point pi.

Figure 2.4: Parallel beam geometry (left) and fan beam geometry (right).

The parallel beam geometry and fan beam geometry can be visualised as in Figure 2.4. We also make the following definitions to make clear what exactly we mean by pixels and a pixel grid.

Definition 2.3.4. A pixel π is a rectangle in R² which we can denote as π = (x1, y1, x2, y2), where (x1, y1) and (x2, y2) represent the coordinates of the bottom-left and top-right corners of the pixel, respectively.

The pixel region is then Rπ = {(x, y) | x1 ≤ x < x2 and y1 ≤ y < y2}. A pixel grid Π is a set of a finite number of pixels, so Π = {π1, . . . , πn}.

To simplify the discussion, we only use unit pixels (square pixels with side length 1) and assume that all the pixels make up a partition of a rectangle centred at the origin. Let M be the height and N the width of the rectangle. This means that the pixels make up a 2D grid of N columns and M rows, and all pixels can be indexed as π_{u,v} with 0 ≤ u < N and 0 ≤ v < M, as can be seen in Figure 2.5. For any pixel π_{u,v} = (x1, y1, x2, y2) it also follows that

N/2 + x1 = u,    N/2 + x2 = u + 1,    M/2 + y1 = M − v − 1,    M/2 + y2 = M − v,

which is the same as

x1 = −N/2 + u,    x2 = −N/2 + u + 1,    y1 = M/2 − v − 1,    y2 = M/2 − v.

Figure 2.5: Example of the 2D grid with M = 3 and N = 4.
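As a small sketch of these index conventions (the struct and function names are ours, not from the thesis code; u is counted left to right and v top to bottom, as in Figure 2.5):

#include <stdio.h>

typedef struct { double x1, y1, x2, y2; } pixel_rect;

/* Corner coordinates of unit pixel (u, v) on an N x M grid centred at the
 * origin, following the formulas above. */
static pixel_rect pixel_corners(int u, int v, int N, int M) {
    pixel_rect p;
    p.x1 = -0.5 * N + u;        /* x1 = -N/2 + u     */
    p.x2 = -0.5 * N + u + 1.0;  /* x2 = -N/2 + u + 1 */
    p.y1 = 0.5 * M - v - 1.0;   /* y1 =  M/2 - v - 1 */
    p.y2 = 0.5 * M - v;         /* y2 =  M/2 - v     */
    return p;
}

int main(void) {
    /* Example from Figure 2.5 (M = 3, N = 4): the top-left pixel (0, 0). */
    pixel_rect p = pixel_corners(0, 0, 4, 3);
    printf("(x1, y1) = (%.1f, %.1f), (x2, y2) = (%.1f, %.1f)\n",
           p.x1, p.y1, p.x2, p.y2);   /* prints (-2.0, 0.5), (-1.0, 1.5) */
    return 0;
}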

Beginning with such a pixel grid Π = {π1, . . . , πn}, n = MN, discretizing the problem means that we assume that the function f is constant on each pixel region Rπ, π ∈ Π, and that f is zero outside the pixel grid. Therefore, we can write f as

f = \sum_{\pi \in \Pi} x_\pi f_\pi,

where f_\pi can be seen as a pixel indicator function, meaning

f_\pi(x, y) = \begin{cases} 1 & \text{if } (x, y) \in R_\pi \\ 0 & \text{else,} \end{cases}

and where x_\pi is the value f takes on pixel π.

Using the linearity of the integral, we get that

R\{f\}(\theta, s) = \int_{L_{\theta,s}} f(x, y) \, ds = \int_{L_{\theta,s}} \left( \sum_{\pi \in \Pi} x_\pi f_\pi(x, y) \right) ds = \sum_{\pi \in \Pi} x_\pi \int_{L_{\theta,s}} f_\pi(x, y) \, ds = \sum_{\pi \in \Pi} x_\pi \cdot R\{f_\pi\}(\theta, s) = \sum_{\pi \in \Pi} c(\theta, s, \pi) \cdot x_\pi,

where c(θ, s, π) := R{f_π}(θ, s) is the intersection length of line (θ, s) with pixel π. Each measurement R{f}(θ, s) thus reduces to a linear combination of all the pixel values x_π, π ∈ Π.

Now let G = {V1, . . . , Vp} be a projection geometry. For each line (θ, s) which is in a view V_i, let a_{θ,s} be the row vector defined as

a_{\theta,s} = \begin{pmatrix} a(\theta, s, \pi_1) & a(\theta, s, \pi_2) & \cdots & a(\theta, s, \pi_n) \end{pmatrix}.

Taking the entries as a(θ, s, π_i) = c(θ, s, π_i) for i = 1, . . . , n, we then see

R\{f\}(\theta, s) = \sum_{\pi \in \Pi} c(\theta, s, \pi) \cdot x_\pi = a_{\theta,s} \cdot x,

where x is the vector x = (x_{\pi_1}, . . . , x_{\pi_n}).

So if we have V_i = \{(\theta_{i,1}, s_{i,1}), . . . , (\theta_{i,k_i}, s_{i,k_i})\}, then we get, with A_i as below:

A_i \cdot x = \begin{pmatrix} a_{\theta_{i,1}, s_{i,1}} \\ a_{\theta_{i,2}, s_{i,2}} \\ \vdots \\ a_{\theta_{i,k_i}, s_{i,k_i}} \end{pmatrix} \cdot \begin{pmatrix} x_{\pi_1} \\ x_{\pi_2} \\ \vdots \\ x_{\pi_n} \end{pmatrix}
= \begin{pmatrix} a(\theta_{i,1}, s_{i,1}, \pi_1) & a(\theta_{i,1}, s_{i,1}, \pi_2) & \cdots & a(\theta_{i,1}, s_{i,1}, \pi_n) \\ a(\theta_{i,2}, s_{i,2}, \pi_1) & a(\theta_{i,2}, s_{i,2}, \pi_2) & \cdots & a(\theta_{i,2}, s_{i,2}, \pi_n) \\ \vdots & \vdots & \ddots & \vdots \\ a(\theta_{i,k_i}, s_{i,k_i}, \pi_1) & a(\theta_{i,k_i}, s_{i,k_i}, \pi_2) & \cdots & a(\theta_{i,k_i}, s_{i,k_i}, \pi_n) \end{pmatrix} \cdot \begin{pmatrix} x_{\pi_1} \\ x_{\pi_2} \\ \vdots \\ x_{\pi_n} \end{pmatrix}
= \begin{pmatrix} R\{f\}(\theta_{i,1}, s_{i,1}) \\ R\{f\}(\theta_{i,2}, s_{i,2}) \\ \vdots \\ R\{f\}(\theta_{i,k_i}, s_{i,k_i}) \end{pmatrix} =: b_i

With A_i as above for each view V_i, we can write, for all lines together, Ax = b with

A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_p \end{pmatrix},

which is known as the system matrix. Note that each row of the system matrix consists mostly of zeros (since a line only intersects a small subset of all pixels), and that the number of rows is exactly equal to the number of measurements. The system matrix is thus a very large and sparse matrix.

Thus once we discretize, the problem is to determine the values x_{π_j}, j = 1, . . . , n. But we also see that these should satisfy Ax = b, a system of linear equations. Since each entry of A can be calculated without knowing x, and b is the vector containing the measurements, the only unknown is x, which means we just have to solve Ax = b. Note, however, that the measurements will in practice contain errors, so the system may be inconsistent; we then look for a least-squares solution x_LS, i.e. a vector minimizing ||Ax − b||.

This model, with its discretization process leading to the system Ax = b where the entries of A are intersection lengths of a line with a pixel and b is the vector of measurements, is a projection model called the line model. Here we have assumed that each measurement from a "detector-part" measures X-rays exactly along a single line. There are also other models: if we assume that a "detector-part" measures a bundle of X-rays with width d > 0, then we get a model called the strip model. Performing a similar kind of discretization, this gives us a system Ax = b where the entries of A are given by

a(\theta, s, \pi) = \int_{u = s - d/2}^{s + d/2} c(\theta, u, \pi) \, du

(which is the area of the intersection of the 2D strip with pixel π), and b again contains the detected values.

A third projection model is known as the interpolation model. This model starts with a pixel grid Π = {π1, . . . , πM N}. To calculate the coefficients a(θ, s, πk) for k = 1, . . . , M N ,

we first distinguish between "more vertical" and "more horizontal" lines. We call a line (θ, s) more vertical if θ ∈ [0, π/4) ∪ [3π/4, π), and we call a line more horizontal if θ ∈ [π/4, 3π/4). First consider a more vertical line (θ, s). Because it is more vertical, it will intersect at most two pixels in each row of the pixel grid. Consider a row v, and for a pixel π_{u,v} in row v write (x_c^{(u)}, y_c^{(u)}) for the coordinates of the centre of pixel π_{u,v}. It follows that the centres of all pixels in row v have the y-coordinate

y_c = M/2 − v − 1/2.

Now let x̃ be the x-coordinate of the point where the line (θ, s) intersects the horizontal line y = y_c. It follows that

x̃ = (s − y_c sin(θ)) / cos(θ) = (s − (M/2 − v − 1/2) sin(θ)) / cos(θ),

and that x̃ lies between the x-coordinates of the centres of some two pixels π_{u,v}, π_{u+1,v}. This can be seen in Figure 2.6. Formulas for u, x_c^{(u)} and x_c^{(u+1)} are given by

u = ⌊x̃ + N/2 − 1/2⌋,    x_c^{(u)} = −N/2 + u + 1/2,    x_c^{(u+1)} = x_c^{(u)} + 1.

Figure 2.6: Illustration of the interpolation model.

The coefficients a(θ, s, π_{k,v}) are then defined as

a(θ, s, π_{u,v}) := (1/cos(θ)) · (1 − |x̃ − x_c^{(u)}|),
a(θ, s, π_{u+1,v}) := (1/cos(θ)) · (1 − |x_c^{(u+1)} − x̃|)    ( = 1/cos(θ) − a(θ, s, π_{u,v}) ),

and a(θ, s, π_{k,v}) := 0 if k ∉ {u, u + 1}. The factor 1/cos(θ) is there because we want the coefficient to reflect the length of the line within the pixel row, which can be calculated to equal 1/cos(θ).
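A minimal sketch of these formulas (an illustration under our own assumptions, not the thesis implementation: only the "more vertical" case with θ ∈ [0, π/4) is exercised, and the grid dimensions are example values):

#include <math.h>
#include <stdio.h>

/* Interpolation-model coefficients of a "more vertical" line (theta, s) for
 * pixel row v on an N x M grid: returns the column index u and the two
 * non-zero coefficients for pixels (u, v) and (u + 1, v). */
static void interp_coeffs(double theta, double s, int v, int N, int M,
                          int *u, double *a_u, double *a_u1) {
    double yc = 0.5 * M - v - 0.5;                   /* centre y of row v          */
    double xt = (s - yc * sin(theta)) / cos(theta);  /* x~, intersection x         */
    double uf = floor(xt + 0.5 * N - 0.5);           /* u = floor(x~ + N/2 - 1/2)  */
    double xc = -0.5 * N + uf + 0.5;                 /* x_c^(u)                    */
    *u    = (int)uf;
    *a_u  = (1.0 - fabs(xt - xc)) / cos(theta);
    *a_u1 = 1.0 / cos(theta) - *a_u;
}

int main(void) {
    int u; double a, b;
    /* Example: the vertical line theta = 0, s = 0.25 on a 4x4 grid, row v = 1:
     * x~ = 0.25 lies between the centres of pixels u = 1 and u = 2, giving
     * weights 0.25 and 0.75. */
    interp_coeffs(0.0, 0.25, 1, 4, 4, &u, &a, &b);
    printf("u = %d, a(u,v) = %.2f, a(u+1,v) = %.2f\n", u, a, b);
    return 0;
}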

The case of a more "horizontal" line is handled in a very similar way. We have thus seen that there are multiple projection models. The choice could depend on which is most realistic for the real-world setting, or on the amount of computation needed. In any case, we now need to solve a system Ax = b where A is very large and sparse. This is the subject of the next chapter.


3 Block-iterative methods

In this chapter we mainly discuss numerical methods to solve large sparse linear sys-tems, and convergence guarantees. Although most of the methods and theorems we discuss here will also hold for complex-valued matrices or more general operators acting on Hilbert spaces instead of matrices, we limit our discussion to real-valued matrices. This is because we intend to use the theorems and methods only for our system of linear equations Ax = b with A the real-valued system matrix as given in chapter 2.

In the previous chapter we transformed the tomographic problem into a least-squares problem for the system Ax = b. Solving systems of linear equations does not seem too complicated, and one of the common methods to solve such a least-squares problem is to make use of the singular value decomposition (SVD) of a matrix. Factorizing A into its SVD A = UΣV^T with U and V orthogonal matrices, we see:

Ax = b ⟺ UΣV^T x = b ⟺ ΣV^T x = U^T b.

The general method to solve Ax = b with the SVD then goes as follows [8]:

1. Calculate the (reduced) SVD A = UΣV^T.
2. Solve Σy = U^T b for the vector y.
3. Set the solution as x = V y.

One remark is that in step 2 the diagonal system is very easy to solve by setting y = Σ⁺(U^T b), where (Σ⁺)_{ii} = σ_i^{−1} for the non-zero ith singular value, and 0 otherwise.

But note that if a singular value σ_i is very small, then the corresponding entry σ_i^{−1} in Σ⁺ is very large. This means that noise or numerical errors can cause large changes in the solution. Therefore, the truncated SVD (i.e. setting all singular values to 0 if they are below some ε > 0) is often used.
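A minimal sketch of steps 2 and 3 combined with this truncation (our own illustration, not the thesis code: it assumes the reduced SVD factors are already available as row-major arrays U of size m×r, sigma of length r and V of size n×r):

#include <stdlib.h>
#include <stddef.h>

/* Solve Ax = b in the least-squares sense given A = U * diag(sigma) * V^T:
 * y = Sigma^+ U^T b (singular values below eps are treated as zero),
 * then x = V y. Returns 0 on success, -1 on allocation failure. */
int truncated_svd_solve(size_t m, size_t n, size_t r,
                        const double *U, const double *sigma, const double *V,
                        const double *b, double eps, double *x) {
    double *y = malloc(r * sizeof(double));
    if (!y) return -1;
    for (size_t i = 0; i < r; i++) {
        double uTb = 0.0;
        for (size_t k = 0; k < m; k++)
            uTb += U[k * r + i] * b[k];              /* (U^T b)_i */
        y[i] = (sigma[i] > eps) ? uTb / sigma[i] : 0.0;
    }
    for (size_t j = 0; j < n; j++) {
        x[j] = 0.0;
        for (size_t i = 0; i < r; i++)
            x[j] += V[j * r + i] * y[i];             /* x = V y */
    }
    free(y);
    return 0;
}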

However, the computation of an SVD of a matrix is not an easy task. One of the standard methods to calculate the SVD is known as Golub-Kahan bidiagonalization, and most other methods are variants of this, or are based on methods for computing an eigenvalue decomposition. The big disadvantage is that these methods rely on making iterative modifications to the whole matrix A, and therefore the whole of matrix A needs to be stored. Matrices coming from a two-dimensional tomographic setting can easily be of size 10^6 by 10^6, and if we had looked at a three-dimensional tomographic setting, then the matrices could quickly reach dimensions of 10^9 by 10^9. These sizes are simply too big, and therefore we will need to rely on other methods to solve Ax = b.


3.1 The Kaczmarz method

A different method to solve a system of linear equations Ax = b is the Kaczmarz method. As notation, from here on we let A ∈ R^{m×n} and write

A = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}, where a_i^T ∈ R^n for i = 1, . . . , m,

and we assume all rows of A are non-zero. The Kaczmarz method is the iterative method given by:

x^0 = starting guess vector
for k = 0, 1, 2, ...
    i = (k mod m) + 1
    x^{k+1} = x^k + ((b_i − a_i · x^k) / ||a_i^T||^2) a_i^T
end for

Remark 3.1.1. When the starting guess x0 is chosen as x0 = 0, it can be shown that the Kaczmarz method is in fact a linear transformation operating on b [6].
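A minimal dense sketch of one sweep over all m rows (i.e. one whole iteration) is given below; this is our own illustration under the assumption that A fits in memory as a row-major array, which is not the case for the tomographic systems considered later:

#include <stddef.h>

/* One Kaczmarz sweep: for i = 1..m project the iterate x onto the hyperplane
 * a_i . x = b_i, i.e. x <- x + (b_i - a_i . x) / ||a_i||^2 * a_i^T. */
void kaczmarz_sweep(size_t m, size_t n, const double *A,
                    const double *b, double *x) {
    for (size_t i = 0; i < m; i++) {
        const double *ai = A + i * n;
        double dot = 0.0, norm2 = 0.0;
        for (size_t k = 0; k < n; k++) {   /* a_i . x and ||a_i||^2 */
            dot   += ai[k] * x[k];
            norm2 += ai[k] * ai[k];
        }
        double c = (b[i] - dot) / norm2;   /* projection coefficient */
        for (size_t k = 0; k < n; k++)
            x[k] += c * ai[k];
    }
}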

To see how the iteration formula can be derived, we first sketch the idea behind this method: The solution x needs to satisfy Ax = b, which is the same as x satisfies the highly underdetermined (one equation, n variables) system of equations ai · x = bi for

i = 1, . . . , m. Now the solution set for each of those systems of equations, is the set Hi = {z ∈ Rn | ai· z = bi} which forms an affine hyperplane in Rn. In the Kaczmarz

method we start with a chosen starting vector x^0. The next vector x^1 is then the orthogonal projection of the current iterate x^0 onto the solution set H_1. The next vector x^2 is the orthogonal projection of x^1 onto the next solution set H_2, etc. If, for

example, A ∈ R2×2, then each of the solution sets H1, H2 is a line in the plane and we

can visualize the Kaczmarz method as shown in Figure 3.1.


If n = 3, then those solution sets Hi will be 2D planes in R3, and in general Hi will

be a (n − 1)-dimensional affine hyperplane in Rn. Before we derive the formula of the Kaczmarz method, we first state some facts of orthogonal projectors we will need to derive the formula.

Suppose W is a k-dimensional linear subspace of R^n, so W = span{w_1, . . . , w_k} with the w_i's linearly independent. If M is the n×k matrix with columns w_1, . . . , w_k, then the orthogonal projector onto W is the orthogonal projector P_M onto range(M), which can be checked to be given by P_M = M(M^T M)^{−1} M^T. Also, for any projector P (which projects onto range(P)), the complementary projector of P is given by I − P (which projects onto range(P)^⊥ = ker(P)). So we have the relations P_M = I − P_M^⊥ and P_M^⊥ = I − P_M, where P_M^⊥ denotes the orthogonal projector onto range(M)^⊥.

Also note that the affine hyperplane H_i ⊂ R^n is just a translation of a linear subspace Y ⊂ R^n by some vector. Indeed, if we let h_0 ∈ H_i be any vector, then by defining Y := {y ∈ R^n | (y + h_0) ∈ H_i} = {y ∈ R^n | ⟨a_i, y⟩ = 0}, it is easy to check that Y is indeed a linear subspace of R^n, and it is easy to see that H_i = Y + h_0.

Going back to the Kaczmarz method, we need to project the current iterate x^k onto the set H_i, which is a translation of the linear subspace Y by the vector h_0 defined before. Note that projecting x^k onto H_i is the same as first translating x^k by −h_0, then projecting onto H_i − h_0 (which is a linear subspace), and then applying the inverse translation +h_0. In symbols this means: P_{H_i}(x^k) = P_Y(x^k − h_0) + h_0. Now it is easy to see that Y is (n−1)-dimensional and that Y^⊥ is the one-dimensional space Y^⊥ = {z ∈ R^n | ⟨y, z⟩ = 0 for all y ∈ Y} = span{a}, where we write a := a_i^T as a column vector. So we get P_{Y^⊥} = a(a^T a)^{−1} a^T = a a^T / ||a||^2, and we see:

x^{k+1} = P_Y(x^k − h_0) + h_0
        = (I − P_{Y^⊥})(x^k − h_0) + h_0
        = (I − a a^T / ||a||^2)(x^k − h_0) + h_0
        = x^k − h_0 − (a a^T / (a^T a))(x^k − h_0) + h_0
        = x^k + ((a^T h_0 − a^T x^k) / ||a||^2) a
        = x^k + ((b_i − ⟨a_i, x^k⟩) / ||a_i||^2) a_i^T   (using a^T h_0 = a_i · h_0 = b_i, since h_0 ∈ H_i),

which is the iteration formula of the Kaczmarz method. We continue with some remarks about the Kaczmarz method. First of all, as we will also see when we prove a more general version of the Kaczmarz method, we note that the ordering does not have to be sequential. With that we mean that it is not necessary to choose i = (k mod m) + 1, but we can choose any permutation σ : {1, 2, . . . , m} → {1, 2, . . . , m} of the numbers 1 to m, and then set i = σ((k mod m) + 1).

(21)

Figure 3.2: Illustration of the Kaczmarz method with different orderings.

As can be seen in Figure 3.2, we note that the closer the lines between successive steps are to being orthogonal to each other, the faster the method converges. Thus the ordering can be quite significant. We also note that we can use an extra constant λ ∈ (0, 2), which is known as a relaxation parameter. The meaning behind this parameter can be thought of as "overprojecting" or "underprojecting" the current iterate x^k towards the solution set H_i instead of projecting x^k exactly onto H_i. An example of the Kaczmarz method with "underprojection" can be seen in Figure 3.3.

Figure 3.3: Illustration of the Kaczmarz method with relaxation parameter λ = 0.5.

Lastly, we note that instead of projecting onto the solution set of a single equation, one can also consider multiple equations as a block and project onto their joint solution set. All in all, with those remarks in mind, we have the following generalized version of the Kaczmarz method with p blocks:

x^0 = starting vector,  x^{(k,0)} = x^k
x^{(k,j)} = x^{(k,j−1)} + λ A_j^T (A_j A_j^T)^{−1} (b_j − A_j x^{(k,j−1)}),   j = 1, . . . , p
x^{k+1} = x^{(k,p)},

where the rows of A_1, . . . , A_p together form a partition of the rows of A, and b_1, . . . , b_p are vectors each containing the corresponding entries of b. We indeed see that if every block consists of a single row and λ = 1, this reduces to the original Kaczmarz method.

The next step is to prove that this more general method converges. We first give the definition of positive definite and positive semi-definite matrices, since we will need to use them in the proof.

Definition 3.1.2. A real symmetric matrix A ∈ R^{n×n} is called positive definite, denoted A > 0, if x^T A x > 0 holds for all non-zero x ∈ R^n. If instead only x^T A x ≥ 0 holds, then A is called positive semi-definite, denoted A ≥ 0. For matrices A, B we also write A > B and A ≥ B, meaning (A − B) > 0 and (A − B) ≥ 0, respectively.

What we actually prove is an even more general version of the iteration method. We replace the matrix A_j A_j^T by a symmetric positive definite matrix C_j, changing the iteration to the following form:

x^0 = starting vector,  x^{(k,0)} = x^k
x^{(k,j)} = x^{(k,j−1)} + λ A_j^T C_j^{−1} (b_j − A_j x^{(k,j−1)}),   j = 1, . . . , p
x^{k+1} = x^{(k,p)}

We now prove that this general method converges to the solution, with the proof based on [4]. (The proof given in [4] actually proves the following theorem for general operators in a Hilbert space setting.)

Theorem 3.1.3. Let C_j > 0 and C_j ≥ A_j A_j^T for j = 1, . . . , p, and assume that the system of linear equations A_j x = b_j is consistent for j = 1, . . . , p. Then for a relaxation parameter λ ∈ (0, 2), the iteration

x^0 = starting vector,  x^{(k,0)} = x^k
x^{(k,j)} = x^{(k,j−1)} + λ A_j^T C_j^{−1} (b_j − A_j x^{(k,j−1)}),   j = 1, . . . , p
x^{k+1} = x^{(k,p)}

is convergent and x^k converges to P_A x^0 + A^+ b, where P_A is the orthogonal projection onto Ker(A), and A^+ is the generalized inverse of A.

Proof. For this theorem, we write as notation:

Q_j := I − λ A_j^T C_j^{−1} A_j,
h_j := λ A_j^T C_j^{−1} b_j,

so that the iteration becomes

x^{(k,j)} = Q_j x^{(k,j−1)} + h_j.

We also write Q := Q_p Q_{p−1} · · · Q_1 for the operator describing one whole iteration.

Now let x^+ be the minimum norm solution, meaning that x^+ satisfies A x^+ = b and x^+ ∈ Ker(A)^⊥. Also, for k = 0, 1, 2, . . ., define e^k := x^k − x^+. We first claim that e^{k+1} = Q e^k. To see why this is true, we expand x^{k+1} and Q x^+:

x^{k+1} = x^{(k,p)} = Q_p x^{(k,p−1)} + h_p
        = Q_p (Q_{p−1} x^{(k,p−2)} + h_{p−1}) + h_p
        = Q_p Q_{p−1} x^{(k,p−2)} + Q_p h_{p−1} + h_p
        = Q_p Q_{p−1} (Q_{p−2} x^{(k,p−3)} + h_{p−2}) + Q_p h_{p−1} + h_p
        ...
        = Q_p Q_{p−1} · · · Q_1 x^{(k,0)} + Q_p · · · Q_2 h_1 + ... + Q_p h_{p−1} + h_p
        = Q x^k + Q_p · · · Q_2 h_1 + ... + Q_p h_{p−1} + h_p.

Since x^+ satisfies A_j x^+ = b_j for j = 1, . . . , p, we also see:

Q_j x^+ = (I − λ A_j^T C_j^{−1} A_j) x^+ = x^+ − λ A_j^T C_j^{−1} b_j = x^+ − h_j,   j = 1, . . . , p.

Therefore we get:

Q x^+ = Q_p · · · Q_1 x^+
      = Q_p · · · Q_2 (x^+ − h_1)
      = Q_p · · · Q_2 x^+ − Q_p · · · Q_2 h_1
      = Q_p · · · Q_3 (x^+ − h_2) − Q_p · · · Q_2 h_1
      ...
      = x^+ − h_p − Q_p h_{p−1} − ... − Q_p · · · Q_2 h_1.

Thus we see from the above:

x^{k+1} + Q x^+ = Q x^k + x^+
⟹ e^{k+1} = x^{k+1} − x^+ = Q x^k − Q x^+ = Q e^k.

So we have e^{k+1} = Q e^k, and thus e^k = Q^k e^0 = Q^k x^0 − Q^k x^+. We also note that Q has the invariant subspaces ker(A) and ker(A)^⊥, and that Q = I on ker(A). To prove that x^k → P_A x^0 + A^+ b, we have to show that the part of x^0 which lies in ker(A)^⊥ converges to x^+. It thus remains to show that e^k → P_A x^0 (since then lim x^k = P_A x^0 + x^+ = P_A x^0 + A^+ b). Because x^+ ∈ ker(A)^⊥ and Q = I on ker(A), this follows once we show that Q^k restricted to ker(A)^⊥ tends to 0.

We see:

||Q_j x||^2 = ||x − λ A_j^T C_j^{−1} A_j x||^2
            = ⟨x − λ A_j^T C_j^{−1} A_j x, x − λ A_j^T C_j^{−1} A_j x⟩
            = ⟨x, x⟩ − 2λ ⟨x, A_j^T C_j^{−1} A_j x⟩ + λ^2 ⟨A_j^T C_j^{−1} A_j x, A_j^T C_j^{−1} A_j x⟩
            = ||x||^2 − 2λ ⟨A_j x, C_j^{−1} A_j x⟩ + λ^2 ⟨A_j A_j^T C_j^{−1} A_j x, C_j^{−1} A_j x⟩.

If S ≥ T, which means S − T is positive semi-definite, then for all x we have:

0 ≤ x^T (S − T) x = x^T S x − x^T T x = ⟨S x, x⟩ − ⟨T x, x⟩  ⟹  ⟨T x, x⟩ ≤ ⟨S x, x⟩.

So since C_j ≥ A_j A_j^T, we get for all x:

⟨A_j A_j^T C_j^{−1} A_j x, C_j^{−1} A_j x⟩ ≤ ⟨C_j C_j^{−1} A_j x, C_j^{−1} A_j x⟩ = ⟨A_j x, C_j^{−1} A_j x⟩.

Thus we see:

||Q_j x||^2 ≤ ||x||^2 − 2λ ⟨A_j x, C_j^{−1} A_j x⟩ + λ^2 ⟨A_j x, C_j^{−1} A_j x⟩
            = ||x||^2 − λ(2 − λ) ⟨A_j x, C_j^{−1} A_j x⟩.

Since C_j is a positive definite matrix, the matrix C_j^{−1} is also positive definite. Because λ ∈ (0, 2), we get ||Q_j x||^2 ≤ ||x||^2, and we see (since C_j^{−1} is positive definite) that equality holds only if A_j x = 0. It then follows that ||Q x|| = ||Q_p · · · Q_1 x|| ≤ ||x||, with equality only if x ∈ ker(A). So if x ∈ ker(A)^⊥ is non-zero, then ||Q x|| < ||x||, showing that ||Q|_{ker(A)^⊥}|| < 1, and thus that ||Q^k|_{ker(A)^⊥}|| ≤ ||Q|_{ker(A)^⊥}||^k → 0, which shows Q^k|_{ker(A)^⊥} → 0.

3.2 New method

In the previous section we saw a form of an iterative method for solving a system Ax = b with multiple parameters, namely the number of blocks A_j, the choice of partitioning A,

the matrices Cj, and a choice of relaxation parameter. We now look at the specific

case of solving Ax = b coming from the algebraic method from the X-ray setting. More specifically, we look at Ax = b, with the matrix A coming from the line model with a parallel beam geometry. We further simplify things and assume that each angle has the same number of lines, and that the distance between two consecutive lines is exactly 1.0. Note that each row of A corresponds with a line. When we need to partition A into blocks Aj we will simply speak of “assigning angles to a block” instead of assigning

rows to a block. With that we mean that if we assign an angle θ to a block A_j, all rows of A corresponding to lines with that angle are placed in A_j. Furthermore, we assume that all the blocks A_j contain the same number of angles.

We now present a new iterative method which has the form mentioned in Theorem 3.1.3, and which we prove to converge to the solution of the system Ax = b in the tomographic setting just described.

We define C_j as the diagonal matrix with entries (C_j)_{ii} defined as

(C_j)_{ii} := \sqrt{2} \cdot q \cdot \sum_{k=1}^{n} (A_j)_{ik},

where q is the number of angles that corresponds to a block A_j.

To prove that this new iterative method actually converges to the solution, we use Theorem 3.1.3. So it remains to show that the assumptions of the theorem are fulfilled. We begin by showing that for our C_j, we have C_j ≥ A_j A_j^T. First we need a well-known theorem on the relation between the diagonal entries and the eigenvalues of a matrix.

Theorem 3.2.1. (Gershgorin circle theorem): Let A ∈ C^{n×n}, and for i = 1, . . . , n let r_i = \sum_{j=1, j \neq i}^{n} |a_{ij}| be the sum of the absolute values of the non-diagonal entries in the ith row of A. Then each eigenvalue of A lies in at least one of the closed disks D(a_{ii}, r_i) = {z ∈ C | |a_{ii} − z| ≤ r_i}.

Proof. Let λ be any eigenvalue of A and let x be a corresponding eigenvector. Choose i such that |x_i| = max_{1≤k≤n} |x_k|. Now we see:

A x = λ x  ⟹  \sum_{k=1}^{n} a_{ik} x_k = λ x_i  ⟹  \sum_{k=1, k \neq i}^{n} a_{ik} x_k = λ x_i − a_{ii} x_i = (λ − a_{ii}) x_i.

Since x ≠ 0, we have |x_i| ≠ 0, so:

|λ − a_{ii}| = \frac{1}{|x_i|} \left| \sum_{k=1, k \neq i}^{n} a_{ik} x_k \right| ≤ \frac{1}{|x_i|} \sum_{k=1, k \neq i}^{n} |a_{ik}| |x_k| = \sum_{k=1, k \neq i}^{n} |a_{ik}| \frac{|x_k|}{|x_i|} ≤ \sum_{k=1, k \neq i}^{n} |a_{ik}|,

since |x_i| is the largest of all entries of x. So we indeed see |λ − a_{ii}| ≤ r_i, meaning that λ lies in the disk D(a_{ii}, r_i).

To show C_j ≥ A_j A_j^T, we have to show that (C_j − A_j A_j^T) is positive semi-definite, which is equivalent to showing that all eigenvalues of (C_j − A_j A_j^T) are non-negative. To show that those eigenvalues are non-negative, we use the Gershgorin circle theorem. In our context, we have that A_j is one of the blocks of the system matrix A ∈ R^{m×n} coming from the

line model. Let l be the number of rows of Aj, meaning

l = number of angles in a block × number of lines per angle.

For i = 1, . . . , l, let a_i = ((a_i)_1, . . . , (a_i)_n) ∈ R^n be the vector equal to the ith row of A_j. Then we see for any i ∈ {1, . . . , l}:

\sum_{j=1}^{l} ⟨a_i, a_j⟩ = \sum_{j=1}^{l} \sum_{k=1}^{n} (a_i)_k (a_j)_k = \sum_{k=1}^{n} (a_i)_k \left( \sum_{j=1}^{l} (a_j)_k \right) ≤ \sum_{k=1}^{n} (a_i)_k \cdot (\sqrt{2} \cdot q) = \sqrt{2} \cdot q \cdot \sum_{k=1}^{n} (a_i)_k,

where we use that in a block A_j we have exactly q different angles, and that for each angle θ and each pixel π_k the sum of the intersection lengths of all the lines with angle θ with pixel π_k is at most \sqrt{2} (since we assumed the distance between two consecutive lines to be 1.0); therefore \sum_{j=1}^{l} (a_j)_k ≤ \sqrt{2} \cdot q.

So:

r_i := \sum_{j=1, j \neq i}^{l} ⟨a_i, a_j⟩ ≤ \sqrt{2} \cdot q \cdot \sum_{k=1}^{n} (a_i)_k − ⟨a_i, a_i⟩ ≤ \left| \sqrt{2} \cdot q \cdot \sum_{k=1}^{n} (a_i)_k − ⟨a_i, a_i⟩ \right| = (C_j − A_j A_j^T)_{ii}.

We thus have that (C_j − A_j A_j^T)_{ii} − r_i ≥ 0 for all i, meaning D((C_j − A_j A_j^T)_{ii}, r_i) ⊂ R_{≥0}, so using the Gershgorin circle theorem we get that all eigenvalues are non-negative, meaning C_j ≥ A_j A_j^T.

Finally, we need to show that C_j is positive definite. Since C_j is diagonal and all its

entries are larger than 0, it follows immediately that Cj > 0. So our new proposed

method converges by Theorem 3.1.3.

3.3 General block-iterative methods

In this section we discuss general block-iterative methods. We introduce the already known methods, Cimmino's method and SART (Simultaneous Algebraic Reconstruction Technique), and then give block-iterative versions of those methods. All of them will have the form of the general block-iterative method we discuss below. In the next chapter, we will then be able to perform experiments to investigate how the convergence speed per whole iteration might depend on:

1. The number of blocks chosen per whole iteration.
2. The choice of angles in a block.

Firstly, the general version of a block-iterative method will have this form [2]:

x^0 = starting vector,  x^{(k,0)} = x^k
x^{(k,j)} = x^{(k,j−1)} + λ_{k,j} V_j^{−1} A_j^T W_j^{−1} (b_j − A_j x^{(k,j−1)}),   j = 1, . . . , p
x^{(k+1,0)} = x^{(k,p)}

Note that this is thus an even more general form than the one discussed in section 3.1. The idea of Cimmino's method is almost the same as that of the Kaczmarz method. The only difference is that instead of iterating row by row, we compute the projection onto each row's hyperplane and then update the current iterate with the average of all those projections. So the formula in Cimmino's method becomes (note that here a_i represents the ith row of A, and thus a_i^T ∈ R^n):

x^{k+1} = \frac{1}{m} \sum_{i=1}^{m} P_{H_i}(x^k) = \frac{1}{m} \sum_{i=1}^{m} \left( x^k + \frac{b_i − a_i \cdot x^k}{||a_i^T||^2} a_i^T \right) = x^k + \sum_{i=1}^{m} \frac{b_i − a_i \cdot x^k}{m ||a_i^T||^2} a_i^T.

We can rewrite the equation above more compactly as

x^{k+1} = x^k + \begin{pmatrix} a_1^T & \cdots & a_m^T \end{pmatrix} \begin{pmatrix} \frac{1}{m||a_1||^2} & & \\ & \ddots & \\ & & \frac{1}{m||a_m||^2} \end{pmatrix} \left( \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} − \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} x^k \right) = x^k + A^T W^{−1} (b − A x^k),

with W = diag(m ||a_i||^2).
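A dense sketch of one Cimmino step in this compact form (our own illustration, assuming A is stored explicitly in row-major order; the thesis implementation is matrix-free):

#include <stdlib.h>
#include <stddef.h>

/* One Cimmino step x <- x + A^T W^-1 (b - A x) with W = diag(m * ||a_i||^2).
 * All row residuals are computed against the same iterate x, so the update is
 * simultaneous. Returns 0 on success, -1 on allocation failure. */
int cimmino_step(size_t m, size_t n, const double *A,
                 const double *b, double *x) {
    double *delta = calloc(n, sizeof(double));
    if (!delta) return -1;
    for (size_t i = 0; i < m; i++) {
        const double *ai = A + i * n;
        double dot = 0.0, norm2 = 0.0;
        for (size_t k = 0; k < n; k++) {
            dot   += ai[k] * x[k];
            norm2 += ai[k] * ai[k];
        }
        double r = (b[i] - dot) / ((double)m * norm2);  /* weighted residual */
        for (size_t k = 0; k < n; k++)
            delta[k] += r * ai[k];                      /* accumulate A^T W^-1 (b - Ax) */
    }
    for (size_t k = 0; k < n; k++)
        x[k] += delta[k];                               /* apply the averaged update */
    free(delta);
    return 0;
}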

This is Cimmino’s method, and we can change it into the form of the general block-iterative version by setting

x^{(k,j)} = x^{(k,j−1)} + V_j^{−1} A_j^T W_j^{−1} (b_j − A_j x^{(k,j−1)}),

where W_j is the diagonal matrix with entries (W_j)_{ii} = l \cdot \sum_{k=1}^{n} (A_j)_{ik}^2 (l times the squared norm of the ith row of block A_j), with l the number of rows of A_j, and of course V_j = I.

Another well-known method, which we will here call block SART, is the following block-iterative method:

x^{(k,j)} = x^{(k,j−1)} + V_j^{−1} A_j^T W_j^{−1} (b_j − A_j x^{(k,j−1)}),

where V_j, W_j are diagonal matrices defined as (V_j)_{ii} = \sum_{k=1}^{l} (A_j)_{ki} (the sum of the ith column of A_j) and (W_j)_{ii} = \sum_{k=1}^{n} (A_j)_{ik} (the sum of the ith row of A_j).

Commonly, in the case where each of the blocks A_j contains exactly one single angle, this method is called SART. In the case that there is only one single block A_1 which contains all angles, it is known as SIRT (Simultaneous Iterative Reconstruction Technique) [3].

Lastly, we note that our new method as given in Section 3.2 also has the form of a block-iterative method, namely with V_j = I and W_j being the diagonal matrix defined as (W_j)_{ii} = \sqrt{2} \cdot q \cdot \sum_{k=1}^{n} (A_j)_{ik}.
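To make the different choices of V_j and W_j concrete, here is a small dense sketch (our own illustration; the thesis computes these quantities on the fly, without storing A_j) of the diagonal weights for block SART and for the new method of Section 3.2:

#include <math.h>
#include <stddef.h>

/* Block SART: (W_j)_ii = sum of the ith row of A_j (length l),
 *             (V_j)_ii = sum of the ith column of A_j (length n).
 * A_j is an l x n block stored row-major. */
void block_sart_weights(size_t l, size_t n, const double *Aj,
                        double *Wj, double *Vj) {
    for (size_t i = 0; i < l; i++) {
        Wj[i] = 0.0;
        for (size_t k = 0; k < n; k++) Wj[i] += Aj[i * n + k];
    }
    for (size_t i = 0; i < n; i++) {
        Vj[i] = 0.0;
        for (size_t k = 0; k < l; k++) Vj[i] += Aj[k * n + i];
    }
}

/* New method (Section 3.2): (W_j)_ii = sqrt(2) * q * (sum of the ith row),
 * with q the number of angles in the block, and V_j = I. */
void new_method_weights(size_t l, size_t n, const double *Aj,
                        size_t q, double *Wj) {
    for (size_t i = 0; i < l; i++) {
        double rowsum = 0.0;
        for (size_t k = 0; k < n; k++) rowsum += Aj[i * n + k];
        Wj[i] = sqrt(2.0) * (double)q * rowsum;
    }
}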


4 Numerical experiments on convergence dependence of block-iterative methods

In this chapter we perform numerical experiments to find out how the convergence of the three discussed block-iterative methods (Cimmino’s method, block SART and the new method described in section 3.2) can depend on:

1. The number of blocks chosen per whole iteration.

2. The choice of angles in a block. (Which we call a block ordering.)

We investigate the convergence with help of our own programs written in C, which we use to create synthetic data (of a simulation of the 2D X-ray setting with parallel beam geometry) and to reconstruct the image with the three block-iterative methods.

For the experiments described in this chapter, we start with a 200×200 sized image, and create synthetic data (without noise) based on a line model and parallel beam geometry. In total we use 200 angles spread equidistantly over the range [0, π), and for each angle we use 200 equidistant lines. This makes a total of 40000 measurements. The ground truth image and its corresponding sinogram are visible in Figure 4.1.

Figure 4.1: 200×200 ground truth image used in the experiment (left) and sinogram of the synthetic data (right).

The input data for the reconstruction program consists of those 40000 measurements. Furthermore, to be able to compare the effectiveness of the three different methods and the convergence dependence on the number of blocks and the block ordering, we base the effectiveness of the method on calculating:


1. The norm of the residual ||b − Axrec||.

2. The norm of the difference between the original and reconstructed image ||xorig − xrec||.

where xrec, xorig are the reconstructed and original image, respectively. Note that in

the real world situation, only the norm of the residual could be calculated and not the norm of the difference between the original and reconstructed image, since of course the original image is unknown.
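As a small sketch of these two measures (our own illustration with a dense matrix-vector product; the thesis computes Ax via the forward projection instead):

#include <math.h>
#include <stddef.h>

static double norm2(const double *v, size_t len) {
    double s = 0.0;
    for (size_t i = 0; i < len; i++) s += v[i] * v[i];
    return sqrt(s);
}

/* Relative residual ||b - A x_rec|| / ||b||. */
double relative_residual(size_t m, size_t n, const double *A,
                         const double *b, const double *x_rec) {
    double num = 0.0;
    for (size_t i = 0; i < m; i++) {
        double Ax_i = 0.0;
        for (size_t k = 0; k < n; k++) Ax_i += A[i * n + k] * x_rec[k];
        double r = b[i] - Ax_i;
        num += r * r;
    }
    return sqrt(num) / norm2(b, m);
}

/* Relative reconstruction error ||x_orig - x_rec|| / ||x_orig||. */
double relative_error(size_t n, const double *x_orig, const double *x_rec) {
    double num = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x_orig[i] - x_rec[i];
        num += d * d;
    }
    return sqrt(num) / norm2(x_orig, n);
}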

Before we present our experiments on convergence dependence, we first check how much the three methods actually differ from each other. In Figure 4.2, we see for the three block-iterative methods, on the left side a plot of the relative error ||b − Axrec||/||b||

against the number of iterations, and on the right side a plot of the relative error ||xorig − xrec||/||xorig|| against the number of iterations. Note that the y-axis shows the relative error (on a logarithmic scale), so the lower the graph, the better the reconstruction. In these kinds of "convergence figures" we mostly base our conclusions on the right side of the figure, since we want xrec to converge to xorig rather than merely making the residual ||b − Axrec|| go to zero. The number of blocks was set to 1 (which means that the single block contains all angles).

Figure 4.2: Relative error against the number of whole iterations for the three block-iterative methods.

What we notice is that the reconstructed image in Cimmino's method converges much more slowly to the original image than in the other two methods, and that the convergence speed decreases with growing iteration number.


4.1 Dependence on the number of blocks

We investigate first how the convergence of the three block-iterative methods will depend on the number of blocks we use. We have 200 angles in total and experiment with the number of blocks chosen as: 200, 100, 40, 20, 5 and 1. We again check both the error ratio ||b − Axrec||/||b||, as well as ||xorig − xrec||/||xorig||. Furthermore, we compute a

total of 10 total iterations and use a sequential block ordering, meaning the first block corresponds to angles 1, . . . , k, the second block corresponds to angles k + 1, . . . , 2k, etc. Figure 4.3 shows the results for Cimmino’s method. What we clearly see here, is that the more blocks we have, the faster the algorithm converges. The reconstructed image gained from Cimmino’s method with 40 blocks and 10 iterations can be seen in Figure 4.4.

Figure 4.3: Relative error against the number of whole iterations for different number of blocks used in Cimmino’s method.

Figure 4.4: Reconstructed image from Cimmino’s method.

We repeat the experiment for block SART, with the results shown in Figure 4.5. Contrary to what is the case in Cimmino's method, here we see that the graphs are not ordered according to the number of blocks. Furthermore, when we take either the largest or the smallest number of blocks, we still get a similar convergence speed, and the best number of blocks in this sequential ordering seems to be around 20.

Figure 4.5: Relative error against the number of whole iterations for different number of blocks used in block SART.

The convergence compared with Cimmino’s method does seem to be quicker in any case (see Figure 4.2), and the resulting reconstructed image (Figure 4.6) indeed looks better.

Figure 4.6: Reconstructed image from block SART.

Lastly, we repeat the experiment for our new method given in section 3.2 and present the results in Figure 4.7. Here we get results similar to those of block SART. We see that the best number of blocks in the sequential ordering is around 40, and again, compared with Cimmino's method, this method converges more quickly and results in a better reconstructed image (Figure 4.8).

Figure 4.7: Relative error against the number of whole iterations for different number of blocks used in new method.

Figure 4.8: Reconstructed image from new method.

4.2 Dependence on the block ordering

Next we investigate how the assignment of angles to blocks influences the speed of convergence. We analyze a total of four different choices of ordering of angles:

1. Sequential ordering.
2. Random ordering.
3. Split-cyclic ordering.
4. Block split-cyclic ordering (with step ratio 2/3).

To make clear what we mean by those four orderings, we explain them with the help of a small example: suppose we have in total 15 angles, 1, . . . , 15, ordered from small to large, and want to assign them to 5 blocks B1, . . . , B5, each containing 3 angles.

For the four orderings we then get:

Sequential ordering: We assign the angles sequentially, meaning B1 = [1, 2, 3], B2 =

[4, 5, 6], B3 = [7, 8, 9] etc.

Random ordering: We assign each Bi 3 randomly chosen angles. Of course, each angle

can only be chosen exactly once in total.

Split-cyclic ordering: The angles are assigned to a block Bi by the rule: Bi contains the

angles k = m · N + i, m = 0, . . . L − 1 where N is the number of blocks and L is the size of a block. So in our example N = 5, L = 3 and we get B1 = [1, 6, 11],

B2 = [2, 7, 12], B3 = [3, 8, 13] etc. (This can be visualised as splitting the angles

across the blocks by going in a cyclic manner through the blocks.)

Block split-cyclic ordering: We begin with a sequential ordering, giving us blocks S1=

[1, 2, 3], . . . , S5 = [13, 14, 15].

Then let N be the number of blocks and define the "block distance" d as d = [step ratio × N] ([x] means the nearest integer to x). So in our example we have N = 5 and d = [2/3 × 5] = 3.

The real block ordering B_i is a rearrangement of the sequential ordering S_j, and

the blocks Bi are inductively defined as follows:

• (For k = 1) B1= S1.

• (For k + 1) Let p be s.t. Bk = Sp, and define S0 = SN. Then we define

Bk+1 = S(p+d) mod N, and if S(p+d) mod N is already assigned to some

Bj, j < k, then we assign Bk+1 = Sl where Sl is the nearest unassigned block

to S(p+d) mod N.

So in our example: The first block is B1= S1. The next block must be d = 3 block

distances further, so B2 = S(1+3) mod 5= S4. Continuing in the same way we get

B3 = S(4+3) mod 5= S2, B4 = S(2+3) mod 5= S0 = S5 and B5 = S(5+3) mod 5= S3.

(The idea is thus that we select the blocks Sj in a cyclic manner such that the

distance between consecutive blocks is equal to the block distance d.)
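A small sketch of three of the orderings described above (our own illustration, not the thesis code; the random ordering is omitted since it is just a random permutation of the angles; block and angle indices are 0-based here):

#include <stdio.h>

#define NUM_BLOCKS 5
#define BLOCK_SIZE 3

/* Sequential ordering: block b gets angles b*L, b*L+1, ..., b*L+L-1. */
void sequential(int blocks[NUM_BLOCKS][BLOCK_SIZE]) {
    for (int b = 0; b < NUM_BLOCKS; b++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            blocks[b][j] = b * BLOCK_SIZE + j;
}

/* Split-cyclic ordering: block b gets angles b, b + N, b + 2N, ... */
void split_cyclic(int blocks[NUM_BLOCKS][BLOCK_SIZE]) {
    for (int b = 0; b < NUM_BLOCKS; b++)
        for (int j = 0; j < BLOCK_SIZE; j++)
            blocks[b][j] = j * NUM_BLOCKS + b;
}

/* Block split-cyclic ordering: the order in which the sequential blocks S_p
 * are visited, using block distance d; if a block was already used we take
 * the next unused one (a forward search; the thesis takes the nearest one). */
void block_split_cyclic_order(int order[NUM_BLOCKS], int d) {
    int used[NUM_BLOCKS] = {0};
    int p = 0;
    for (int k = 0; k < NUM_BLOCKS; k++) {
        while (used[p]) p = (p + 1) % NUM_BLOCKS;
        order[k] = p;
        used[p] = 1;
        p = (p + d) % NUM_BLOCKS;
    }
}

int main(void) {
    int order[NUM_BLOCKS];
    block_split_cyclic_order(order, 3);   /* d = [2/3 * 5] = 3, as in the example */
    for (int k = 0; k < NUM_BLOCKS; k++)
        printf("B%d = S%d\n", k + 1, order[k] + 1);   /* B1=S1, B2=S4, B3=S2, ... */
    return 0;
}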

To see how much these four choices can affect the convergence of the three methods, we perform an experiment with a fixed number of blocks equal to 40, and we again look at the convergence over 10 whole iterations for the four orderings. Note that the assignment of angles to the blocks is deterministic in the sequential, split-cyclic and block split-cyclic orderings, but not in the random ordering. Therefore (since we want to show the expected convergence speed) we performed the experiment once for the deterministic orderings, and three times for the random ordering to ensure that the convergence speed we present here is typical of a random ordering. All three runs of the random ordering were similar, and we have just picked one of the three arbitrarily.

We start with Cimmino's method. In Figure 4.9 we see that there is almost no difference between the four block orderings.

Figure 4.9: Relative error against the number of whole iterations for different block or-derings in Cimmino’s method.

In block SART (Figure 4.10) we see that the only ordering which stands out is the sequential ordering, which converges more slowly than the other three. We also see that the random ordering performs slightly better than the two remaining orderings.

Figure 4.10: Relative error against the number of whole iterations for different block orderings in block SART.

In the new method (Figure 4.11) we get results similar to block SART. Again the sequential ordering converges more slowly, but here the block split-cyclic ordering seems to be the best by a small margin.

Figure 4.11: Relative error against the number of whole iterations for different block orderings in new method.

So the number of blocks matters, and sometimes the block ordering is also relevant. We finish this section with an interesting observation about block SART. For that method there was only a big difference between the sequential ordering and the other three orderings. Redoing the experiment on the number of blocks in block SART, but now with a random ordering, gives us the following result (Figure 4.12): the more blocks we have, the smaller the error ratio becomes (which is the same conclusion as for Cimmino's method, as can be seen in Figure 4.3).

Figure 4.12: Relative error against the number of whole iterations for different number of blocks used in block SART with a random block ordering.

5 An implementation of the block-iterative method

In this chapter, we discuss an implementation of the block-iterative method in its general form x0 = starting vector, x(k,0)= xk x(k,j) = x(k,j−1)+ λk,jVj−1A T jWj−1(bj− Ajx(k,j−1)), j = 1, . . . , p x(k+1,0)= x(k,p)

where we have chosen a partition of A into blocks Aj, j = 1, . . . , p, beforehand.

Since we have not done any actual experiments with the relaxation parameter λk,j, we leave it out of the discussion. Note, however, that the change needed to include it is minimal, since it only involves one extra multiplication. The general structure of the calculating part of the program should then look roughly like this:

x = starting guess
calculate vectors Vj, Wj
for k = 1 to num_iterations
    for i = 1 to num_blocks
        calc_new_v(v, Wi, bi)   // Do: v = Wi^-1 * (bi - Ai * x)
        calc_new_x(x, Vi, v)    // Do: x = x + Vi^-1 * Ai^T * v
    end for
end for

Figure 5.1: General structure of our implementation of the block-iterative method.

As can be seen from the expressions that need to be calculated, the block-iterative method mostly consists of three parts:

1. Precalculate the constant (diagonal) matrices Vj and Wj for j = 1, . . . , p.

2. Calculate Ajw for any vector w, which we call the forward projection of Aj.

3. Calculate ATjz for any vector z, which we call the backward projection of Aj.

Note that the reason we precalculate Vj and Wj is that they are diagonal and thus small enough to be stored in memory. On the other hand, as seen before, any of the matrices Aj is not a diagonal matrix and would take too much space to store explicitly.


Therefore the functions we write cannot use any matrix Ai as if it was stored in memory.

To get the result of the matrix multiplication Ajw, instead of performing l · n multiplications, with l = (number of angles in a block) × (number of lines per angle) the number of rows in Aj, and n = the number of pixels, we will see that we can do it in a matrix-free way. This method will also be much more efficient than a standard dense matrix-vector multiplication, since Aj contains a lot of zero entries. The idea of this method is to make use of the meaning of the entries of Aj. For the backprojection ATjz, we will do and discuss something similar.
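To connect this back to Figure 5.1: assuming matrix-free routines for Aj and ATj are available (they are the subject of the next sections), the double loop could be written in NumPy roughly as follows. The names forward_proj, back_proj, Wd and Vd, and the choice to store only the diagonals of Vj and Wj as 1D arrays, are our own illustration and not part of the thesis code.

import numpy as np

def block_iteration(x0, blocks, num_iterations=10):
    """Host-side driver for the block-iterative method (illustrative sketch).

    `blocks` is a list with one entry per block j, each holding:
      - forward_proj(w): matrix-free evaluation of Aj * w
      - back_proj(z):    matrix-free evaluation of Aj^T * z
      - b:  the measured data restricted to the angles of block j
      - Wd: the diagonal of Wj as a 1D array (precalculated)
      - Vd: the diagonal of Vj as a 1D array (precalculated)
    The relaxation parameter is omitted, as in the text above.
    """
    x = np.array(x0, dtype=float)
    for _ in range(num_iterations):
        for blk in blocks:
            residual = blk['b'] - blk['forward_proj'](x)   # bj - Aj x
            v = residual / blk['Wd']                       # Wj^-1 (bj - Aj x)
            x += blk['back_proj'](v) / blk['Vd']           # x + Vj^-1 Aj^T v
    return x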

Recall from chapter 2 that the system matrix A is given as:

$$
A = \begin{pmatrix}
a(\theta_1, s_1, \pi_1) & a(\theta_1, s_1, \pi_2) & \dots & a(\theta_1, s_1, \pi_n) \\
a(\theta_2, s_2, \pi_1) & a(\theta_2, s_2, \pi_2) & \dots & a(\theta_2, s_2, \pi_n) \\
\vdots & \vdots & \ddots & \vdots \\
a(\theta_m, s_m, \pi_1) & a(\theta_m, s_m, \pi_2) & \dots & a(\theta_m, s_m, \pi_n)
\end{pmatrix}
$$

where each entry a(θi, si, πk) in the line model is the intersection length of line (θi, si) with the kth pixel.

We have not yet explicitly said how to calculate those intersection lengths. Since we will not store the matrix A, whenever we need the value of an entry of A we have to calculate it from scratch. Therefore, the computation of such an entry needs to be done as efficiently as possible and must not take a long time. This could also be an argument for why the line model might be better than the strip model, since in general it is easier to calculate an intersection length than an intersection area. So before we discuss implementations of the forward and backward projections, we explain how those matrix entries can be calculated. As in section 2.3, we assume that f is discretized on a rectangular grid of unit pixels.

5.1 Line and strip model coefficients

Suppose we are given a pixel (u, v) and a line (θ, s), and we need to calculate the intersection length. We note that for a given angle θ, if we let fline(s) be the function that gives the intersection length of line (θ, s) with pixel (u, v), then fline is actually a piecewise linear function (and even continuous, except for θ ∈ {0, π/2}), as can be seen in Figure 5.2.


Figure 5.2: For fixed θ, the corresponding function fline.

The function fline is thus piecewise linear, and also symmetric about the line x = 0. Thus, to fully determine fline, we only need to know the values of c and d as seen on the left side of Figure 5.2, and the maximum of fline, which can easily be calculated with some trigonometry. Any other pixel (u2, v2) is just a translation of the pixel (u, v), so the intersection length of the same line with pixel (u2, v2) is also just a translation of the function fline.
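To make this concrete, the following Python function computes such an intersection length directly, by clipping the line against the four pixel edges instead of evaluating the piecewise linear description above; the function name line_pixel_length and the convention that pixel (u, v) occupies the unit square [u, u+1] × [v, v+1] are our own assumptions for illustration.

import math

def line_pixel_length(theta, s, u, v):
    """Intersection length of the line (theta, s) with pixel (u, v).

    The line is the set of points p with p . (cos theta, sin theta) = s,
    parametrized as p(t) = s*(cos, sin) + t*(-sin, cos).  Pixel (u, v) is
    assumed to occupy the unit square [u, u+1] x [v, v+1].
    """
    c, sn = math.cos(theta), math.sin(theta)
    px, py = s * c, s * sn          # a point on the line
    dx, dy = -sn, c                 # unit direction of the line
    t_lo, t_hi = -math.inf, math.inf
    # Clip the parameter t against the two slabs of the pixel (Liang-Barsky).
    for p0, d, lo, hi in ((px, dx, u, u + 1.0), (py, dy, v, v + 1.0)):
        if abs(d) < 1e-12:
            if not (lo <= p0 <= hi):    # line parallel to and outside this slab
                return 0.0
        else:
            t0, t1 = (lo - p0) / d, (hi - p0) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_lo, t_hi = max(t_lo, t0), min(t_hi, t1)
    # The direction has unit length, so the parameter range equals the length.
    return max(0.0, t_hi - t_lo)

For example, for θ = 0 the line x = s is vertical, and the function returns 1 (the full height of the pixel) whenever u < s < u + 1.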

For a strip model with the parallel beam geometry, we note that we can use the same idea and figure, but now we need the intersection area of the pixel with the strip between two lines (θ, s1) and (θ, s2). Thus again, for a given angle θ, we could derive a function fstrip(s1, s2) which calculates the area of the intersection of the region between the lines (θ, s1) and (θ, s2) with pixel (u, v). One method to calculate this is to make use of the function fint(s) := ∫_{−∞}^{s} fline(t) dt, which is also quite simple since it will be piecewise linear or quadratic. We can then simply set fstrip(s1, s2) = fint(s2) − fint(s1).
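As a rough sanity check of the relation fstrip(s1, s2) = fint(s2) − fint(s1), one can also approximate the strip coefficient by numerically integrating the line coefficient over [s1, s2]; the sketch below does this with a simple midpoint rule, reusing the hypothetical line_pixel_length function from the previous sketch (the sample count is arbitrary, and a real implementation would evaluate the piecewise linear or quadratic fint analytically).

def strip_pixel_area(theta, s1, s2, u, v, num_samples=1000):
    """Approximate the strip coefficient fstrip(s1, s2) for pixel (u, v) by
    integrating fline over [s1, s2] with the midpoint rule (verification only).
    """
    if s2 < s1:
        s1, s2 = s2, s1
    h = (s2 - s1) / num_samples
    return h * sum(line_pixel_length(theta, s1 + (i + 0.5) * h, u, v)
                   for i in range(num_samples))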

5.2 Forward projection

In this section we discuss an efficient implementation to calculate the forward projection of an arbitrary vector w ∈ Rn. So we need to calculate the vector y = Aw, where A is the system matrix, in this case corresponding to the line model, and vector w is the flattened 2D image.

Note that in the block-iterative algorithm, we actually had to compute the forward projection Ajw, with Aj representing the jth block. But since every block just corresponds to a certain selection out of all the angles, and the explanation we give on how to implement the forward projection is based only on the set of angles we look at, we just discuss the implementation of the full forward projection y = Aw. The changes needed to calculate Ajw are minimal.

We first note that the ith coordinate of y = Aw is given by the inner product of the ith row of A with w. And as we have seen before, every row of A corresponds to a single line (θi, si), and that row looks like:

$$a_{\theta_i, s_i} = \begin{pmatrix} a(\theta_i, s_i, \pi_1) & a(\theta_i, s_i, \pi_2) & \dots & a(\theta_i, s_i, \pi_n) \end{pmatrix}$$

where a(θi, si, πk) is the intersection length of line (θi, si) with the kth pixel. So we get:

$$y_i = \sum_{k=1}^{n} a(\theta_i, s_i, \pi_k) \cdot w_k$$

Note that most of the terms in the summation are zero. The terms which could be nonzero are exactly the terms a(θi, si, πk) · wk where the line (θi, si) intersects pixel πk. These nonzero terms form a much smaller subset of all terms, and we thus only look at the pixels which the line (θi, si) intersects. The pseudocode to implement the forward projection then goes as follows:

// Pseudocode to calculate forward projection y = A*w,
// where A is a m x n matrix.

// Calculate the ith component of vector y.
for i = 0 to (m-1)
    // Calculate angle (theta) and dist (s) corresponding to (i+1)th row of A.
    theta, s = ...
    y[i] = 0
    // Make distinction between more ‘vertical’ or more ‘horizontal’ line.
    if (0 <= theta < PI/4 or 3PI/4 <= theta < PI)
        // Go through the pixel-rows v = 0, ..., PG_height-1.
        for v = 0 to (PG_height-1)
            // Find out between which two pixels (u1,v),(u2,v) the line passes.
            u1, u2 = ...
            // Calculate corresponding matrix entries.
            coef1 = matrix_coef(theta, s, u1, v)
            coef2 = matrix_coef(theta, s, u2, v)
            // Add only if pixels are within bounds.
            if (0 <= u1 < PG_width) then
                y[i] = y[i] + coef1 * w[v*PG_width + u1]
            end if
            if (0 <= u2 < PG_width) then
                y[i] = y[i] + coef2 * w[v*PG_width + u2]
            end if
        end for
    else
        ...(similar to above, but now with respect to columns)...
    end if
end for

Note that this method is easy to parallelize, since the calculation of each coordinate yi = (Aw)i is independent. In a CUDA implementation, we could then design the kernel such that each thread in a GPU block calculates exactly one coordinate of the vector y.
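For reference, a plain (non-optimized) Python version of this forward projection might look as follows. It reuses the line_pixel_length helper from the sketch in section 5.1 in place of matrix_coef, assumes the same unit-pixel coordinate convention, and computes the candidate pixel columns (or rows) per step from the line equation; the names and coordinate conventions are our own choices and not those of the thesis code.

import math
import numpy as np

def forward_projection(w, lines, PG_width, PG_height):
    """Matrix-free forward projection y = A*w for the line model (sketch).

    `w` is the flattened image of size PG_width * PG_height, and `lines` is a
    sequence of (theta, s) pairs, one per row of A.  Pixel (u, v) is assumed
    to occupy the unit square [u, u+1] x [v, v+1], and s is measured in the
    same pixel coordinate system.
    """
    y = np.zeros(len(lines))
    for i, (theta, s) in enumerate(lines):
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        total = 0.0
        if abs(cos_t) >= abs(sin_t):
            # More 'vertical' line: walk over the pixel rows.
            for v in range(PG_height):
                # x-coordinates where the line enters and leaves this row.
                xa = (s - v * sin_t) / cos_t
                xb = (s - (v + 1) * sin_t) / cos_t
                u1, u2 = math.floor(min(xa, xb)), math.floor(max(xa, xb))
                for u in range(u1, u2 + 1):        # at most two columns
                    if 0 <= u < PG_width:
                        coef = line_pixel_length(theta, s, u, v)
                        total += coef * w[v * PG_width + u]
        else:
            # More 'horizontal' line: walk over the pixel columns.
            for u in range(PG_width):
                ya = (s - u * cos_t) / sin_t
                yb = (s - (u + 1) * cos_t) / sin_t
                v1, v2 = math.floor(min(ya, yb)), math.floor(max(ya, yb))
                for v in range(v1, v2 + 1):        # at most two rows
                    if 0 <= v < PG_height:
                        coef = line_pixel_length(theta, s, u, v)
                        total += coef * w[v * PG_width + u]
        y[i] = total
    return y

The outer loop over i is exactly the work that a one-thread-per-row CUDA kernel would distribute over threads.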

5.3 Backprojection

In this section we discuss an efficient implementation to calculate the backprojection of an arbitrary vector z ∈ Rm, meaning the calculation of y = ATz. Again, since our discussion is based on just the set of angles we look at, the changes needed to calculate ATjz instead of ATz will be minimal.

We first note that the kth row of AT will look like:

$$\begin{pmatrix} a(\theta_1, s_1, \pi_k) & a(\theta_2, s_2, \pi_k) & \dots & a(\theta_m, s_m, \pi_k) \end{pmatrix}$$

Thus we see that for the kth coordinate of y = ATz we get:

$$y_k = \sum_{i=1}^{m} a(\theta_i, s_i, \pi_k) \cdot z_i$$

Similarly to what we have seen in the forward projection, most of the terms in the summation are zero. The nonzero terms are now the terms a(θi, si, πk) · zi where pixel πk is intersected by the line (θi, si). Therefore, the idea is to only look at the lines which intersect pixel πk. The pseudocode to implement the backward projection then goes as follows:

// Pseudocode to calculate backward projection y = A^T*z,
// where A^T is a n x m matrix.

// Calculate the kth component of vector y.
for k = 0 to (n-1)
    y[k] = 0
    // Calculate pixel coordinates (u,v) corresponding to kth pixel.
    u, v = ...
    // For each angle, find the dist's from that angle which intersect
    // pixel (u,v).
    for q = 0 to (number of angles - 1)
        theta = qth angle
        // Find bounds k1,k2 s.t. only the jth lines (k1 <= j <= k2) with
        // angle theta can intersect pixel (u,v).
        for j = k1 to k2
            s = distance corresponding to jth line
            // Find ith row corresponding to line (theta,s) and add result.
            i = ...
            y[k] = y[k] + matrix_coef(theta, s, u, v) * z[i]
        end for
    end for
end for

Again (by design), the computation of each coordinate yk = (ATz)k is independent, meaning it is also easily parallelizable. In a CUDA implementation, we again design the kernel such that each thread in a GPU block calculates exactly one coordinate of the vector y.
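Analogously, a simple Python sketch of the backprojection could look as follows. It again uses the hypothetical line_pixel_length in place of matrix_coef, and instead of precomputing the bounds k1, k2 per angle it filters lines by a conservative distance test against the pixel centre; this filter and all names are our own illustration.

import math
import numpy as np

def back_projection(z, lines, PG_width, PG_height):
    """Matrix-free backprojection y = A^T*z for the line model (sketch).

    `z` has one entry per line (row of A).  For every pixel, only lines whose
    signed distance to the pixel centre is at most half the pixel diagonal
    can intersect the pixel; this check is a conservative stand-in for the
    bounds k1, k2 used in the pseudocode above.
    """
    half_diag = math.sqrt(2.0) / 2.0
    y = np.zeros(PG_width * PG_height)
    for k in range(PG_width * PG_height):
        u, v = k % PG_width, k // PG_width
        cx, cy = u + 0.5, v + 0.5                 # pixel centre
        total = 0.0
        for i, (theta, s) in enumerate(lines):
            # Signed distance from the pixel centre to line (theta, s).
            dist = s - cx * math.cos(theta) - cy * math.sin(theta)
            if abs(dist) <= half_diag:
                total += line_pixel_length(theta, s, u, v) * z[i]
        y[k] = total
    return y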


6 Performance analysis

In this chapter, we talk about convergence speed in wall-clock time and about the performance of our GPU implementation of the block-iterative method. We also use specific tools to help understand the performance we observe on the GPU. A much broader explanation of many GPU-related terms and definitions can be found in [1].

In the introduction we have already mentioned that the GPU is expected to outperform the CPU. To demonstrate this, we propose the following experiment: we compare the time it takes to compute the double for-loop as seen in Figure 5.1 for our CPU and GPU implementations. The experiment setup used is the one described in chapter 4. The number of blocks is set to 40 and a sequential ordering is used. We have calculated the average duration over three runs. The average duration for the CPU implementation was 15.45 seconds, and for the GPU implementation 0.1090 seconds. This means a speedup of almost 142 times! Do note, however, that we have compared only the calculation of the double for-loop and not the whole program. Depending on how much "non-parallelizable" computation the rest of the program does, the speedup of the whole program will differ. Also note that this example is merely given to show the effect a GPU can have, and this speedup is not to be taken literally. The reason for that is that in this experiment the computations are not exactly the same, since for example we have used double precision floating point in the CPU implementation and single precision in the GPU implementation (most GPUs perform much better in single precision), and the problem size also has an effect.

6.1 Convergence dependence on number of blocks in wall-clock time

We saw in chapter 4 that the number of blocks matters for the convergence speed of the algorithm. However, in that experiment we compared the number of blocks against just the total number of iterations (which ran from 1 to 10). For practical uses, where we need to reconstruct the image as accurately and quickly as possible, what we really want is for the method to converge as quickly as possible in wall-clock time, and not in the number of whole iterations.

The reason why there is a difference is that the time taken to compute an iteration might depend on the number of blocks per iteration. This could especially be the case when we run our implementation on the GPU, since a higher number of blocks corresponds to less computational work per block, meaning
