Context Models of Lines and Contours

DOI: 10.6100/IR630035

Document status and date: Published: 01/01/2007

Document Version: Publisher's PDF, also known as Version of Record


Colophon

This thesis was typeset by the author using LaTeX 2ε. The main body of the text was set using 10-point Times Roman font. Figures were obtained using Mathematica, a product of Wolfram Research Inc., and Illustrator, a product of Adobe Systems Incorporated. Images were included in the Portable Document Format (a registered trademark of Adobe Systems Inc.). The LaTeX 2ε output was converted to PDF and transferred to film for printing.

Financial support for the publication of this thesis was kindly provided by the Technische Universiteit Eindhoven.

The cover has been designed by the author. The photography of the zebras was taken by Erika Jagau.

Printed by the Printservice of the Technische Universiteit Eindhoven, the Netherlands. A catalogue record is available from the Library of Eindhoven University of Technology. ISBN: 978-90-386-1117-4

© 2007 M. van Almsick, Essen, Germany. Unless stated otherwise on chapter front pages, all rights are reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the copyright owner.


Context Models of Lines and Contours

DISSERTATION

to obtain the degree of doctor at the Technische Universiteit Eindhoven, by the authority of the Rector Magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the Doctorate Board on Tuesday 11 September 2007 at 16.00

by

Markus van Almsick


This dissertation has been approved by the promotor: prof.dr.ir. B.M. ter Haar Romeny

Copromotor: dr. L.M.J. Florack


Contents

Colophon
Contents
1 Introduction
1.1 Gestalt Laws
1.2 The visual system
1.2.1 Visual Preprocessing in the Retina
1.2.2 The Visual Pathway
1.2.3 The Visual Cortex
1.2.4 The Euclidean Group Manifold of the Brain
1.2.5 Horizontal Connections in the Striate Cortex
1.3 Related Image Analysis Methodologies and Their Purpose
1.4 Outline
2 Orientation Scores
2.1 Correlation and Convolution
2.2 Translation Invariance
2.3 Transformations of Receptive Fields
2.4 Probing and Expressing Images by Test-Functions
2.5 Group Theory
2.5.1 Finite Groups
2.5.2 Lie Groups
2.5.3 Unitary Representations
2.5.4 Reducible and Irreducible Representations
2.5.4.1 Translation Group
2.5.4.2 Rotation Group
2.5.4.3 Euclidean Group
2.5.5 Orthogonality Theorem
2.6 Invertible Orientation Scores
2.7 Admissible Test-Functions
2.8 Summary
3 Line and Contour Models
3.1 Axioms for Line and Contour Models
3.2 Stochastic Models of Lines and Contours
3.2.2 Stochastic Line and Contour Processes
3.2.3 Fokker-Planck Equation of Lines and Contours
3.2.4 Time-Integration of Line and Contour Processes
3.2.5 Euclidean Symmetry Constraints
3.2.6 Mumford's Stochastic Equation
3.2.7 Approximation of Mumford's Stochastic Equation
3.2.7.1 Cartesian Spline Approximation
3.2.7.2 Polar Spline Approximation
3.3 Correlation of Line and Contour Features
3.3.1 Empiric Verification
3.3.1.1 Stochastic Properties of Retinal Blood Vessels
3.3.1.2 Stochastic Properties of MRI Contours
3.3.2 More Complex Stochastic Processes
4 Variational Line Models
4.1 The Lagrangian Density
4.2 The Extremal Path
4.3 Spline Approximations
4.4 Properties of Extremal Contours and Lines
4.5 Cocircularity and the Optimal Angle of Incidence
4.6 Summary
5 Green's Function of Lines and Contours
5.1 Numerical Solutions of Mumford's SDE
5.2 Analytic Solution of the Fourier-transformed Green's Function
5.2.1 From a Partial to an Ordinary Differential Equation
5.2.2 The Mathieu Equation
5.2.3 The Floquet Theorem
5.2.4 Periodic Mathieu Functions of the First Kind
5.2.5 Expansion into Periodic Mathieu Functions
5.2.6 Piecewise Assembly of Floquet Functions
5.2.7 Green's Function for º ¹ 0
5.2.8 Inverse Fourier-Transformed Green's Function
5.2.9 α-Marginal of the Green's Function
5.2.10 First Moments of the Green's Function
5.3 Line Kernel of Cartesian Spline Approximation
5.3.1 Assumption of Approximation
5.3.2 Green's Function of Cubic Splines
5.3.3 Completion Fields
5.3.4 Fourier-Transformed Approximate Green's Function
5.5 Summary
6 Steerable Filters in G-Convolutions
6.1 G-Convolution and G-Cross-Correlation
6.2 Euclidean Group
6.3 Euclidean G-Convolution and G-Correlation
6.3.1 2-Dimensional Case
6.3.2 3-Dimensional Case
6.4 Irreducible Unitary Representations of Translation and Rotation
6.4.1 Spatial Decomposition under Translation
6.4.2 Spatial and Angular Decomposition in 2 Dimensions under Rotation
6.4.3 Spatial Decomposition in 3 Dimensions
6.4.4 Angular Decomposition in 3 Dimensions
6.5 Steerable Kernels
6.5.1 2-Dimensional Case
6.5.1.1 Euclidean G-cross-correlation
6.5.1.2 Euclidean G-convolution
6.5.2 3-Dimensional Case
6.5.2.1 Euclidean G-cross-correlation
6.5.2.2 Euclidean G-convolution
6.6 Summary
7 The Framework and Its Application
7.1 Anisotropic Diffusion Equations
7.2 Tensor Voting
7.2.1 Tensors
7.2.2 Voting Fields
7.3 Application
7.4 Conclusion
8 MathVisionTools
8.1 Purpose and Strategy
8.2 Implementation Example: Gaussian Derivative
8.3 Application Example: Deblurring
8.4 MathVisionTools Commands
8.4.1 Hankel Transformation
8.4.2 Orientation Score
8.4.3 Line Kernel
A Stochastic Calculus
A.1 Stochastic Differential Equations
A.2 Description of Noise
A.2.1 Characteristics of Probability Distributions
A.2.2 Wiener Process
A.2.3 White Noise Process
A.3 Itô Calculus
A.4 Itô's Rules
A.5 Itô's Formula
A.6 Fokker-Planck Equations
A.7 Stratonovich Calculus
B Partial Differential Equations
C Variational Calculus
C.1 Euler-Lagrange Equation
C.2 Euler's Equations for Elastica
D Mathematical Notations and Functions
D.1 Mathematical Notation
D.2 Special Functions
D.2.1 Orthogonal Polynomials
D.2.2 Pochhammer Symbol
D.2.3 Hypergeometric Functions
D.2.4 Clebsch-Gordan Coefficients of SU(2)
D.2.5 Elliptic Functions
D.2.6 Mathieu Functions
Bibliography
Summary
Curriculum Vitae
Publications


The alternating black and white stripes of a zebra obscure the outline of the animal and may be an evolutionary adaptation that prevents zebras from being seen by predators such as lions or hyenas. This hypothesis goes back at least to Rudyard Kipling [60]. According to Thayer [100] and Marler and Hamilton [71], the zebra pattern acts as camouflage to blend with the background, while Cott [17] and Kruuk [63] voiced the assumption that the black and white lines confuse predators as to the distance of the fleeing animal. None of these hypotheses is easily confirmed. They are even contested by Waage [111], who believes that the stripes obliterate large single-colored regions that are favored by biting insects such as the tsetse fly. According to Vale [103], flies prefer large, dark, moving animals.

Figure 1.1: The outline of zebras is obstructed by their pattern of black and white stripes.

Whatever the correct assumption, the stripes are designed to confuse the visual system of a predator, be it a lion or be it a fly. And even a reader who rejects all of the above hypotheses may have to admit that it is hard to recognize the outline of a zebra in a herd (Figure 1.1). The pattern of the zebra challenges the visual system of the predator with a key issue in image recognition, termed perceptual grouping. On the one hand, an image is a field of spatially distributed light intensities. On the other hand, images represent discrete objects, e.g. zebras, rivers, trees, etc. Bridging the gap by assigning a collection of signals to distinct objects is at the heart of image recognition. Grouping the right collection of signals is a task not easily accomplished.

This thesis deals with the grouping of line and contour segments. We develop a mathematical framework for contextual line and contour detectors that captures at least some of the generic mechanisms found in the physiology of the human visual system, that utilizes the applicable principles revealed by psycho-physical experiments, and that incorporates the appropriate methodologies in image analysis. With respect to this objective, this thesis differs from most endeavors in image analysis. It does not promote a new method and prove its effectiveness; instead, it merges existing image analysis tools into a comprehensive mathematical framework, thereby providing a better understanding of the underlying assumptions, principles, and relations among existing techniques, and consequently revealing avenues for optimization and generalization.

1.1 Gestalt Laws

The results of psycho-physical experiments that address perceptual grouping have been summarized in general laws known as the Gestalt Principles of Perceptual Organization [115][62]. The human visual system groups incoming signals by the following criteria.

• Proximity

Signals in temporal and spatial proximity tend to belong to one and the same object (see Figure 1.2 left).

• Similarity

Signals with similar properties, such as color or texture, are grouped together (see Figure 1.2 right).

• Continuity

Signals of one object tend to change their properties continuously rather than abruptly (see Figure 1.3).

• Closure

We tend to see the closure of incomplete figures or shapes (see Figure 1.4).

• Common Fate

Signals with the same temporal pattern in motion and strength are perceived as one object.

One can apply all five Gestalt laws to the grouping of line and contour segments. In this thesis, however, we do not consider time-dependent images and video sequences, so that the law of common fate is beyond our scope.

An important aspect of the Gestalt laws, except for closure, is the fact that these grouping rules do not depend on the recognition of objects but build on the properties of incoming signals. Thus, they do not depend on the feedback of recognition results that occurs far down the image processing pipeline. Instead, they reflect the capabilities of the first stages of the human visual system without incorporating higher brain functions. This does not imply that higher brain functions are excluded from the perceptual grouping process. A prominent visual illusion that demonstrates the influence of face recognition on perceptual grouping is the young woman/old woman illusion (Figure 1.6), but the

(13)

Figure 1.2: The squares on the left are grouped by proximity and the squares and circles on the right are grouped according to similarity.

Figure 1.3: The line segments in the left image are grouped in the right image according to the Gestalt law of good continuations.

(14)

Figure 1.4: The seven corners in the graphic tend to be seen as a square and a triangle, which demonstrates closure.

Figure 1.5: Line segments depict a circle in the left and right image. The saliency of the circle on the left is higher than on the right due to the texture that hampers the perceptual grouping.

(15)

Figure 1.6: This visual illusion depicts a young or an old woman depending on how the visual system groups the contour lines that make up the woman’s face.

complexity of this perceptual grouping task is still beyond the scope of current image analysis tools.

1.2 The visual system

The physiology of vision has been burgeoning and still is an ever-growing field of research, with new articles published on a weekly or even daily basis. We, therefore, do not even attempt to give a superficial introduction but merely mention a few facts regarding the visual systems of humans, macaque monkeys, and tree shrews that are relevant for the mathematical models in this thesis. For an overall introduction, see [46][55].

1.2.1 Visual Preprocessing in the Retina

Visual processing begins in the retina of the eye. The photoreceptors (rods and cones) are located in the back of the retina and are responsible for the conversion of light energy into neural activity. Depending on the incoming light, the photoreceptors modulate the activity of bipolar cells, which in turn connect to ganglion cells. The axons of the ganglion cells form the optic nerve, which carries the information from the retina to the brain. Bipolar cells and ganglion cells integrate the signals of photoreceptors in such a way that they are tuned to respond to certain light distributions on small circular patches of the retina, called the cell's receptive fields (RF). Both bipolar cells and ganglion cells

Figure 1.7: Representations of receptive fields of ganglion cells, with light grey indicating areas of excitation and dark gray areas of inhibition. The left depicts an on-center/off-surround field, which belongs to a ganglion cell that is excited if light reaches the center portion of the field and inhibited if light falls on the surround. Ganglion cells with off-center/on-surround receptive fields, as depicted on the right, exhibit the opposite behavior.

have two basic types of retinal receptive fields: on-center/off-surround and off-center/on-surround (Figure 1.7) [64]. The center and its surround are always antagonistic and tend to cancel each other's activity. The receptive fields of ganglion cells incorporate three invariance principles of image analysis. The ganglion RF measures the luminosity difference between the center and the surround and is therefore sensitive to contrast and not to overall luminosity in an image. This invariance causes all subsequent image processing steps to work equally well on bright and dark images as long as enough contrast is ensured. Second, ganglion receptive fields are roughly circular and thereby invariant with respect to rotation. Last but not least, ganglion receptive fields come in different sizes, gathering image information at different scales. The receptive field size of ganglion cells varies from an angular section of 6′ by 6′ near the fovea to 3° by 3° in the periphery of the visual field. The diameter of the RF center can be as small as 2′ of arc, corresponding to the cross section of a 1 Euro coin at 40 meters distance. The concept of scale independence has given rise to a mathematical theory in image processing, termed scale-space [99][82][57][36][59][98][92][21].
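The antagonistic center-surround organization described above is commonly modeled as a difference of two concentric Gaussians. The following numpy sketch (illustrative only, not code from this thesis; grid size and widths are arbitrary choices) demonstrates two of the three invariances: zero response to uniform luminosity and invariance under rotation.

```python
import numpy as np

def dog_rf(size=65, sigma_c=2.0, sigma_s=4.0):
    """On-center/off-surround receptive field as a difference of Gaussians."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_c**2)) / (2 * np.pi * sigma_c**2)
    surround = np.exp(-r2 / (2 * sigma_s**2)) / (2 * np.pi * sigma_s**2)
    return center - surround

rf = dog_rf()

# Contrast sensitivity, not luminosity sensitivity: a uniform image yields a
# (near) zero response because center and surround carry equal total weight.
print(abs(np.sum(rf * np.ones_like(rf))))  # ~0

# Rotation invariance: the kernel depends only on r^2, so a 90-degree
# rotation of the grid leaves it unchanged.
print(np.allclose(rf, np.rot90(rf)))  # True
```

The third invariance, scale, corresponds to varying `sigma_c` and `sigma_s` jointly, which is exactly the parameter that scale-space theory makes explicit.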

1.2.2 The Visual Pathway

The 1.5 million axons of the ganglion cells form the optic nerve that leaves the eye and conveys the image information on the retina to the brain. Some 80% of the ganglion axons lead to the left and right lateral geniculate nucleus (LGN), which is the main relay halfway between the retina and the visual cortex in the posterior pole of the occipital cortex (Figure 1.8). Other pathways of ganglion axons lead to the suprachiasmatic nucleus of the hypothalamus to synchronize the internal clock with night and day. Some axons end in the pretectum of the midbrain, which controls the opening of the pupil and certain eye movements. About 10% of the axons emerging from the retina project to a part of the tectum (roof) of the midbrain called the superior colliculus. This relatively large pathway triggers movements of the eyes and head via the motor neurons of the brainstem, trying to bring salient image features onto the fovea. These are just some of the intricate side tracks, which we boldly neglect now that we have mentioned them.

Figure 1.8: The major pathway of visual information from the eye to the primary visual cortex.

We direct our attention to the major pathway that leads via the LGN to the primary visual cortex. The left side of the brain processes the input of the right field of vision, whereas the right side receives the input from the left field. Hence, axons coming from the nasal hemifields of the retina in the left (contralateral) and in the right (ipsilateral) eye have to cross to the opposite hemisphere via the optic chiasm. The axons of the retinal ganglion cells for the left visual field then terminate in the right LGN and vice versa. The 1.5 million axons in the optic nerves innervate about the same number of LGN neurons, which are stacked into six separate layers. It is well known that the LGN segregates in its layers the axons of the ipsi- and contralateral eye as well as different ganglion types. Furthermore, each of the six layers maintains a retinotopic map, which extends to the primary visual cortex, and which will be explained further below. Under a simplistic view the LGN appears to merely relay the retinal information one-by-one, as LGN neurons exhibit the same receptive fields as ganglion cells and as they convey these responses via the optic radiation to the primary visual cortex. However, it has been observed that, despite the great innervation from the retina, about 80% of the excitatory input to the LGN comes not from the retina but from the primary visual cortex. The primary visual cortex appears to exert a significant feedback effect on the LGN. However, these feedback mechanisms are not well understood and beyond our scope, so that we view the LGN in this thesis as a simple relay without significant data processing capabilities.

Figure 1.9: A diagram of the right visual field (on the right) and the left visual cortex V1 of the tree shrew. The projection (retinotopic map) of a cross hair and the letter G in the field of vision is shown on the striate cortex rotated and inverted.

1.2.3 The Visual Cortex

The next site in the visual pathway is the primary visual cortex, also referred to as the striate visual cortex, area 17, or V1. It is located in the occipital lobes of the cerebral cortex in both hemispheres. The left area V1 exhibits a retinotopic map of the right visual field, and the right area V1 receives input from the left visual field. A retinotopic map is a neighborhood preserving map of the 2-dimensional visual field onto the 2-dimensional surface area of the visual cortex (or LGN layer). A retinotopic map links signals from every section in the visual field to local patches of the cortex, the manifestations of these links being the neural connections from the retina via the LGN to the primary visual cortex. Figure 1.9 displays such a retinotopic map from the right eye of a tree shrew to its left striate cortex.
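To make the notion of a neighborhood-preserving retinotopic map concrete, one can look at a classic first-order model of primate retinotopy, the complex-logarithm map attributed to Schwartz. This is an illustration only, not a model used in this thesis, and the constants `k` and `a` below are arbitrary placeholders.

```python
import numpy as np

def cortical_map(x, y, k=15.0, a=0.7):
    """Complex-log model of retinotopy: visual-field point z = x + iy is sent
    to the cortical point w = k * log(z + a). The map is conformal away from
    its singularity, hence locally neighborhood-preserving, and it strongly
    magnifies the fovea (small eccentricities)."""
    z = x + 1j * y
    w = k * np.log(z + a)
    return w.real, w.imag

# Foveal magnification: a 0.1-degree step near the fovea covers far more
# cortical distance than the same step at 20 degrees eccentricity.
d_fovea = np.hypot(*np.subtract(cortical_map(0.1, 0.0), cortical_map(0.0, 0.0)))
d_periphery = np.hypot(*np.subtract(cortical_map(20.1, 0.0), cortical_map(20.0, 0.0)))
print(d_fovea > d_periphery)  # True
```

The same qualitative magnification of the fovea is visible in the tree-shrew map of Figure 1.9, where the central degrees of the visual field occupy a disproportionately large cortical area.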

The receptive fields of most neurons in the striate cortex are assemblies of ganglion receptive fields, as several of these ganglion cells merge their excitatory or inhibitory signals via the LGN into V1 neurons, as depicted in Figure 1.10. In 1958 Hubel and Wiesel [47] measured with a microelectrode the activity of cells in the striate cortex of a cat and revealed that these cells responded only to directed visual cues, such as bright lines in a specific section of the visual field, and, more importantly, that the cell responses were specific with respect to the orientations of these visual stimuli. In retrospect this finding is not surprising, as practically all combinations of ganglion RFs break the rotational symmetry, as shown in Figure 1.11. Ganglion RFs are isotropic. Receptive fields of so-called simple or complex neurons in the primary visual cortex are not. They are anisotropic and exhibit different excitatory responses to directed visual stimuli with different orientations.

Figure 1.10: Simple receptive fields of neurons in the primary visual cortex. The receptive field on the left acts as an edge detector and combines an off-center/on-surround field and an on-center/off-surround field shifted to the right. The receptive field on the right acts as a line detector and consists of two on-center/off-surround receptive fields.

Overall, the majority of the 200 million neurons in the primary visual cortex exhibit a specificity in direction. The ones with the simplest response pattern are called simple cells. Their receptive fields are determined by direct or indirect feedforward connections from LGN neurons [77], resulting in receptive fields of about 15′ near the fovea and up to 3° at more eccentric locations in the retinotopic map [46]. There are three different receptive field types for simple cells, detecting dark lines, bright lines, or edges. An idealized, but fairly realistic, receptive field for line detection is displayed in Figure 1.11. Usually a single simple cell integrates the signal of more than 10 ganglion RFs. The receptive field in Figure 1.11 stems from an ideally aligned and distributed configuration of ganglion cells feeding their excitation into a single simple cell. The resulting RF size is in agreement with the observed relation between ganglion and simple cell RFs [46]. Furthermore, the angular resolution of the directional specificity is 20° (see Figure 1.12).

Figure 1.11: An ideal receptive field composition of a simple cell via 10 ganglion cells with circular on-center/off-surround fields.

Figure 1.12: The convolution of a straight line with the receptive fields of 10 aligned ganglion cells, modeled by Laplacian of Gaussian kernels (as in Figure 1.11), results in the displayed response curve dependent on the angle α between line and RF. The distribution width around the optimal response at 0° is 20°.
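The orientation tuning of Figure 1.12 can be reproduced numerically. The sketch below (illustrative only, not the computation used for the figure; grid size, σ, and the spacing of the ganglion fields are assumed values) assembles a simple-cell RF from 10 Laplacian-of-Gaussian kernels aligned along the x-axis and correlates it with a rasterized straight line at angle α.

```python
import numpy as np

def log_kernel(size, sigma):
    """Negated Laplacian-of-Gaussian: positive (on-center) peak at the origin,
    inhibitory ring for r^2 > 2*sigma^2."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    return -(r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))

# Simple-cell RF: 10 ganglion-like LoG fields, centers spaced along the x-axis.
size, sigma, spacing = 129, 3.0, 6
rf = np.zeros((size, size))
for i in range(10):
    rf += np.roll(log_kernel(size, sigma), int((i - 4.5) * spacing), axis=1)

def line_response(alpha_deg):
    """Correlate the RF with a thin straight line through the center at angle alpha."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    a = np.deg2rad(alpha_deg)
    line = (np.abs(-np.sin(a) * xx + np.cos(a) * yy) < 0.5).astype(float)
    return np.sum(rf * line)

# An aligned line excites the cell far more strongly than a tilted one.
print(line_response(0) > max(line_response(a) for a in (20, 45, 90)))  # True
```

Sampling `line_response` over α and measuring the width of the central peak yields a tuning curve of the kind sketched in Figure 1.12.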

area V1 in the visual cortex of tree shrews, which is typically between 40° and 60° [15]. The most abundant cell in the striate cortex exhibiting directional selectivity is the complex cell. The receptive fields of complex cells are roughly twice as large as those of simple cells. Complex cells react to the same stimuli as simple cells, except that the stimuli have to move, preferably orthogonal to their alignment. It appears that complex cells are excited by a feedforward scheme of simple cells with delayed lateral connections causing a time-differential, as proposed by Barlow and Levick [7]. Obviously not only direction but also motion appears to be an important image feature. This is due to two reasons: lines or contours may move in the real world, or the eyes may move, causing the retina to shift relative to the image projection. The latter is constantly happening several times a second in the form of tiny movements called microsaccades. These movements are random in all directions and cause an excitation of complex cells even when an observer gazes at a still image.

In this thesis we address 2- and 3-dimensional still images or data sets, obtained via devices that in normal operation do not perform any saccade-like movements. We therefore assume that we can safely neglect the time-dependence of receptive fields from here on. The encoding and the use of line and contour directions, however, will be the paradigm of the next chapters.

Figure 1.13: The outline of the letter "G" is lifted into the third, angular dimension of the Euclidean group manifold that encodes the receptive field parameters location and orientation.

A further directionally selective cell type is the end-stopped cell. This cell type exhibits the same properties as the regular simple/complex cell, but it is sensitive to the length of the stimulating line segment to one or both sides. Thus, end-stopped cells are capable of detecting line endings or sudden turns in lines or edges.

1.2.4 The Euclidean Group Manifold of the Brain

The common property of simple, complex, and end-stopped cells is the directionality of their RFs. The neurons in the primary visual cortex are not only characterized by the location of their RF in the visual field, but also by the angle of orientation that their RF exhibits. Hence, direction is an additional degree of freedom in the RF parameter space. The RF parameter space is no longer given by the 2-dimensional visual field, as for retinal or LGN RFs, but by a 3-dimensional Euclidean group manifold, the third dimension being the RF angle, ranging from 0° to 180° for lines and from 0° to 360° for edges. An example of a Euclidean group manifold is given by Figure 1.13, where the segments of a "G"-outline are lifted into the third, angular dimension according to their orientation. Note that direction is a cyclic property, resulting in a cyclic boundary condition in the third, angular dimension of the Euclidean group manifold. After all, a line segment with a -90° alignment is equivalent to one with a +90° alignment.
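The lifting of Figure 1.13 is easy to make concrete. The numpy sketch below (illustrative; a circle stands in for the "G"-outline) lifts a parametrized contour into (x, y, θ) by attaching its tangent orientation, taken modulo π to implement the cyclic identification of -90° with +90°.

```python
import numpy as np

# A closed toy contour (x(t), y(t)): the unit circle, sampled periodically.
t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
x, y = np.cos(t), np.sin(t)

# Tangent orientation theta(t) = atan2(y', x') mod pi: line orientation is
# cyclic with period pi, so theta lives on [0, pi).
dx, dy = np.gradient(x, t), np.gradient(y, t)
theta = np.arctan2(dy, dx) % np.pi

# The lifted curve: points on the Euclidean group manifold R^2 x S^1.
lifted = np.stack([x, y, theta], axis=1)

# Antipodal points of the circle have opposite tangents and therefore carry
# the same orientation after the mod-pi identification.
d = np.abs(theta[:100] - theta[100:])
d = np.minimum(d, np.pi - d)          # circular distance on [0, pi)
print(d.max() < 0.05)  # True
```

This lifted representation is exactly the kind of object that the orientation scores of Chapter 2 formalize for grey-value images.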

We will dwell on the mathematics of functions on Euclidean group manifolds in the subsequent chapters. In this section about the physiology of the visual system, we look

Figure 1.14: The color-coded cortical orientation preference map of the striate cortex of a tree shrew, consisting of iso-azimuth and iso-elevation contours spaced at 1° intervals (black lines) indicating the layout of the map of the visual field. A redrawing of results by Bosking et al. [10].

at the realization of the Euclidean group manifold in the visual cortex of the brain. How does a 3-dimensional parameter space map into the 2-dimensional cortex that is merely 2 mm thick? The solution to this issue is easily explained by looking at the inverse cortical map, the mapping that projects the visual cortex into the Euclidean group manifold. Imagine having to cover a finite 3-dimensional volume (spanned by the visual field and the angle of RF orientation) with a 2-dimensional sheet of the V1 area. One could fold the cortex just like a 2-dimensional piece of cloth and stuff it into the 3-dimensional position/direction box. One can then assign every volume element uniquely to a nearby cloth/cortex patch. This assignment of cortex patches constitutes an invertible map, the inverse being the cortical map that projects the Euclidean group manifold onto the cortex. Figure 1.14 depicts such a cortical map in the primary visual cortex of a tree shrew [10]. The hue of the color represents the encoding of direction at every location.
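Orientation preference maps with the qualitative pinwheel layout of Figure 1.14 can be generated by a well-known toy model from the literature: band-pass filtering of complex white noise. This is purely an illustration of what such a map looks like, not the physiological mechanism and not a method from this thesis; the filter center frequency and width below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128

# Complex white noise: each pixel gets a random "orientation vector".
z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# An annular band-pass filter in the Fourier domain selects one spatial
# frequency, playing the role of the typical hypercolumn spacing.
fx = np.fft.fftfreq(n)
kx, ky = np.meshgrid(fx, fx)
band = np.exp(-((np.hypot(kx, ky) - 0.1) / 0.02) ** 2)

zmap = np.fft.ifft2(np.fft.fft2(z) * band)

# The phase, halved, yields an orientation preference in [0, pi) at every
# point; it varies smoothly except at pinwheel singularities.
pref = (np.angle(zmap) % (2 * np.pi)) / 2
print(pref.shape)  # (128, 128)
```

Rendering `pref` as hue reproduces the characteristic patchwork of iso-orientation domains and pinwheels seen in the tree-shrew data.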

The reader should note that a cortical map is not a true point-to-point map. It just determines the rough location of a neuron with given RF parameters. The cortical map in Figure 1.14, for example, does not provide a 90°-orientation in the visual field between an iso-azimuth and iso-elevation of 1° and 2°. Hence, a neuron in this section of the visual field with a 90°-oriented RF is projected to nearby blue areas, like the one between


Figure 1.15: A schematic drawing of a 1 mm by 1 mm patch of the striate visual cortex. Such a patch is called a hypercolumn. The encoding of ocularity and directionality is denoted by different colors. The gray columns in the cortex represent blobs, which are areas that contain color-processing neurons with no directional selectivity.

may deviate from its average location is about 0.5 mm [10][77][15], and one can verify in Figure 1.14 that every 1 mm² patch contains all colors. Hence, a circular patch of area V1 with a diameter of 1 mm encompasses all the neurons that encode information about a position in the visual field. These overlapping patches are called hypercolumns. Cortical maps only structure the visual cortex parallel to its surface, so that every patch consists of all cortical layers, which explains the term column. A schematic drawing of a hypercolumn is displayed in Figure 1.15. The cells in layer IV are innervated by the axons coming from the left and right LGN. This layer therefore displays a high ocular selectivity that is gradually lost in the other layers as signals from the left and right eye merge. On the other hand, the receptive fields of neurons in the LGN and those directly innervated by the LGN do not exhibit directionality¹. Orientation selectivity occurs a step later, in simple and complex cells located in layers II, III and IVB.

¹ This does not apply to cats, where the innervation of layer IV neurons by LGN axons already exhibits a directional bias, so that all cortical neurons of V1 respond to directed stimuli [48]. In monkeys and shrews the preference in orientation is due to anisotropic discharge fields from layer IV neurons into layers II and III [77].


Figure 1.16: The ocular dominance columns in area V1 of a macaque monkey, visualized by markers that were injected into one eye and transported via the LGN to the cortex. The light bands in this tangential section of the cortex show the places where the marker was located. They reveal the sites of ocular dominance of the injected eye. (Image taken from [83])

Obviously, orientation is not the only property that RFs may exhibit. As just mentioned, neurons may respond solely to signals from either the ipsi- or the contralateral eye, leading to the binary degree of freedom called ocularity. Specificity to scale, color, motion, direction of motion and spatial frequency adds even more dimensions to the RF parameter space. The cortical maps of these higher dimensional feature spaces work in the same manner as for orientation alone [8]. The exception to the rule seems to be scale, as neurons with different RF sizes appear to be stacked upon each other in a random fashion, so that scale varies perpendicular and not parallel to the cortex surface [46].

The physiological subtleties concerning the generation, the modeling, and the properties of orientation-encoding cortical maps have been investigated by, among others, [9][45][11][25].

Similar considerations regarding cortical maps also hold for the ocular dominance in V1-neurons. Figure 1.16 displays a tangential section of the striate visual cortex of a macaque monkey. The light stripes indicate neurons with innervation from one eye. The stripes are due to a marker that was injected into one eye and traveled along the axons via the LGN into the visual cortex.

The orientation encoding established in area V1 is maintained in subsequent areas of the visual system, albeit less and less prominently as one moves up to higher brain functions.

From a mathematical vantage point we learn from these physiological observations that the state space of image analysis should not just be the 2-dimensional visual field, but



Figure 1.17: A biocytin injection into horizontal neurons of the striate visual cortex of a tree shrew. The left image shows the biocytin injection site with labeled axons leaving. Several patches are formed by axon arbors. The top right image is a magnification of the injection site. Individual cells that have taken up the biocytin are visible. The bottom right image displays a magnification of axons with labeled boutons. (Image taken from [10]).

the Euclidean group manifold with three dimensions: two for position in the visual field, and one for the orientation of anisotropic image features. Knowing the ballpark, what are the rules of the game being played? How is the representation of direction put to use, and how is the perceptual grouping postulated by the Gestalt laws implemented? We again take a look at the physiology of the visual system to learn about the operations that act on the Euclidean group manifold, and return to the issue of signal grouping, which was the starting point of this chapter.

1.2.5 Horizontal Connections in the Striate Cortex

If perceptual grouping operations are taking place in area V1, neural connections between hypercolumns should exist. In recent years Fitzpatrick et al. at Duke University have performed experiments to visualize horizontal connections in the striate cortex of tree shrews. A biocytin-stained horizontal neuron is displayed in Figure 1.17. The axons of the horizontal neuron extend over 2 to 4 mm and are thus capable of connecting several mutually distinct hypercolumns [10][15].

More interesting than the mere existence of these horizontal neurons are the sites they connect to. The boutons on axons are the sites where synapses of other neurons attach and where links to other neurons in V1 are established. Identifying all boutons of a few biocytin injected horizontal cells and superimposing the orientation selectivity


Figure 1.18: Bouton distributions in the striate cortex of a tree shrew displayed over a color-coded map of orientation preference (color scale spanning 0° to 180°; scale bar 500 µm). The black dots in the left image represent biocytin-labeled boutons that indicate horizontal connections to the neurons with an 80° orientation preference at the biocytin injection site shown by white dots. Labeled boutons are found near the injection site with all orientation preferences, but also preferentially along the diagonal from top left to bottom right with an orientation preference around 80°. The right image displays the corresponding result for an injection site with a 160° orientation preference. (Image taken from [10]).

onto the bouton sites renders information about the functional connectivity in area V1. The results of such an experiment are displayed in Figure 1.18.

The morphology of the striate cortex suggests mutual interaction between aligned simple and complex neurons of similar orientation across a substantial part of the visual field. The nature of these interactions is the subject of ongoing research. Recent results by Chisum et al. [15] show a mutual excitation of neurons in layer II/III and an increase in their firing rate by 30% to 600% if their RF locations and RF orientations are aligned. A stimulus that triggers this mutual excitation is given in Figure 1.19A. All other distributions of Gabor stimuli in Figure 1.19 do not lead to mutual excitation. The mutual excitation of V1 neurons in this experiment has most likely been conveyed via the horizontal neurons displayed in Figures 1.17 and 1.18, since the Gabor stimuli were spaced up to 10° apart to avoid the excitation of overlapping hypercolumns. Looking at the wiring of horizontal connections in Figure 1.18, this result is not surprising. A few subtleties, however, are.


Figure 1.19: These collinear distributions of Gabor stimuli (arrangements A–D) were presented to tree shrews in the experiments by Chisum et al. [15]. Arrangement A results in strong mutual facilitation of V1 neurons, presumably via the horizontal neurons shown in Figure 1.17, since the stimuli were separated by up to 10° to avoid the excitation of overlapping hypercolumns in area V1. Arrangements B, C, and D do not lead to any increase in the neural response compared to that of a single Gabor stimulus.


limited to cells that receive input from layer IV. The gaps between the Gabor stimuli in Figure 1.19A are not filled in. No illusory contours or lines are formed, as simple and complex cells seem to be strongly gated by incoming signals from the LGN. Even the opposite appears to be the case: layer II/III cells between Gabor stimuli are slightly inhibited, which may be due to an overall isotropic antagonism between simple and complex cells. Physiological evidence for the latter has been gathered by van der Smagt et al. [107] in the striate cortex of rhesus monkeys. The loss of saliency of the circle in Figure 1.5 is also an indication of the inhibitory effect of neighboring oriented cues that are not aligned.

These findings are in agreement with experimental results by Ramsden et al. [90], who detected cells in visual cortex area V2 of macaque monkeys that do respond to illusory contours and line completion, and who reported the opposite behavior for neurons in V1. Hence, it seems that area V1 encodes the factual contours and lines, whereas the adjacent area V2 is the space for illusory lines and gap completion.

In any case, horizontal connections in area V1 favor the perceptual grouping of aligned cues with equal orientation, which is the physiological manifestation of the Gestalt law of good continuation for lines and contours.

In summary:

• Visual information is extracted from an image on the retina via receptive fields that capture local isotropic contrast changes.

• Signals from the isotropic receptive fields are merged by simple and complex cells in the striate cortex into directed receptive fields that are tuned to edges and lines.

• Anisotropic receptive fields exhibit, besides their location and among other properties, their orientation as an additional parameter, resulting in a cortical map of the Euclidean group manifold.

• Horizontal neurons in layer II/III of area V1 link simple and complex cells that react to different sites in the visual field. Neurons with aligned RF sites and RF orientations excite each other mutually, exceeding an overall isotropic and inhibitory masking of neighboring cells.

The physiological facts above will serve as beacons in the construction of corresponding mathematical models. Our models will be a priori and will be built on simple assumptions. Hence, the conception of these models will not be (image-)data driven, which may be viewed as a weak aspect. However, even the brain seems to apply a priori, genetically given models in the visual system. White et al. [116] have shown in ferrets that sensory experience is favorable but not the sole cause of orientation selectivity. Genetic factors also seem to drive the formation of cortical maps that project the Euclidean group manifold onto the striate cortex.


1.3 Related Image Analysis Methodologies and Their Purpose

Most of the physiological findings listed in the last section, even the recent ones, have been anticipated. Many related ideas in image analysis have already been put to use. Prominent examples with respect to orientation are non-classical receptive fields, tensor voting [74], stochastic completion fields [118], and Markov random fields [34]. We will introduce these methodologies in due time, as we establish a mathematical framework that is a common denominator for all these image analysis tools. The outline of the framework and the thesis is given in the next section, but what is the purpose? Why is grouping a big issue?

The benefit of successfully grouping contour and line segments is manifold. It is obviously an advantage to have a consistent drawing of a scene or image. It facilitates the segmentation task. It allows the tracking of simple lines, outlines or tree-like objects, and it can improve the signal-to-noise ratio and robustness in image processing.

The latter, less obvious point is easily explained. A single signal is more likely to be obscured by noise than an ensemble of signals, since the average of an ensemble is less susceptible to random influence. So, if it is possible to combine an ensemble of $n$ signals and to attribute them to the same source, one can obtain a better overall measurement by taking the mean. This reduces the non-systematic error due to noise by a factor of $1/\sqrt{n}$. The remaining, non-trivial issue in this scheme is the grouping. Only the grouping of signals belonging to the same source will yield robust measurements. A wrong grouping will produce bias and wrong results.
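The $1/\sqrt{n}$ error reduction is easy to verify numerically. The following sketch (illustrative only; NumPy assumed, all values hypothetical) averages n noisy copies of the same signal and compares the empirical error of the mean with the prediction.

```python
import numpy as np

# Averaging n measurements of the same source reduces the random error
# by a factor 1/sqrt(n).  Values below are hypothetical.
rng = np.random.default_rng(0)
sigma = 0.5                     # noise level of a single measurement
true_value = 1.0

for n in (1, 4, 16, 64):
    # 100000 repetitions of "average an ensemble of n noisy signals"
    trials = true_value + sigma * rng.standard_normal((100_000, n))
    means = trials.mean(axis=1)
    # empirical error of the ensemble mean vs. the 1/sqrt(n) prediction
    print(n, round(float(means.std()), 4), round(float(sigma / np.sqrt(n)), 4))
```

Note that the prediction only holds when all n signals really stem from the same source; mixing in signals from another source biases the mean, which is exactly the grouping pitfall described above.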

Inventions are driven by problem solving. The same holds true for this thesis, which was in part triggered by an assignment from Philips Medical Systems to design an algorithm that detects electrophysiology catheters in noisy fluoroscopy images [27]. Cardiac electrophysiology procedures are performed under the surveillance of X-ray fluoroscopy imaging systems. Fluoroscopy images generated by a low X-ray flux tend to be noisy and provide only poor image quality. An image analysis tool that can detect catheters in highly deteriorated fluoroscopy images is therefore desirable.

To overcome noise, we had to successfully group the dark pixels of catheter shadows, see Figure 1.20. To decide locally whether a single pixel or a small group of pixels is part of the elongated shadow of a catheter turned out to be practically impossible. The surroundings, the context of the local measurement, had to be taken into account. Figure 1.21 demonstrates this observation. Thus, we had to investigate the relation between line segments and their surroundings, which eventually led to the development of line and contour models. The master's thesis [27] of Eric Franken describes a first and quite successful attempt to solve the EP catheter problem.


Figure 1.20: The fluoroscopy image of an electrophysiology catheter intervention at the heart. The annotations mark the ECG stickers, the EP catheter tips, and the contour of the heart.

Figure 1.21: Both images are the same excerpt of a fluoroscopy image of an electrophysiology catheter. The "keyhole" view on the left obstructs the visual context so that the line shadow of the catheter is virtually invisible. With the context on the right one can easily detect the catheter.


1.4 Outline

It is not uncommon that the objective of a doctoral thesis changes over the years. The original assignment had been the development of a fast prototyping environment for biomedical image analysis based on the computer algebra system Mathematica. This ongoing project, a library of elaborate image analysis commands now called MathVisionTools, is a technical support project. Unfortunately, the collection and implementation of numerical as well as symbolic image analysis routines has not been suitable for scientific publications. To increase the academic impetus of this thesis, the objective had to change.

The main content is now an in-depth analysis of existing algorithms and methodologies and an emerging mathematical framework that encompasses several image analysis tools, most notably tensor voting by Medioni et al. [74], splines and elastica [26][80], steerable filters [31], and stochastic completion fields [80][118]. The result is an in-depth investigation into the underlying applied mathematics with several new findings, rather than an application-oriented, engineering approach.

The reader should also note that some of the mathematical concepts discussed here are too bulky and demanding to find their way directly into fast and efficient computer programs. Nevertheless, the mathematical framework helps to understand the basic concepts. Furthermore, it can be used for applications by extracting the essentials, by simplification, and by cutting corners, as has been done in some of the image analysis tools listed above.

Mathematical tools such as group theory, representation theory, and stochastic calculus do not belong to the standard repertoire of image analysis. We therefore provide a few short introductions to these elaborate theories, but limit them to the essentials needed here. As a result, the thesis is in some parts more detailed than is to be expected. The idea is to use these parts as didactic references. Furthermore, all routines discussed here have been implemented in MathVisionTools, which we introduce in the last chapter of this thesis.

The path to the last chapter about MathVisionTools follows the processing steps of the visual system as summarized in this introduction. Images on the retina are analyzed by neurons with receptive fields capable of detecting edges or lines. These receptive fields are parameterized by location, orientation and scale. Theories that deal with receptive fields varying in scale and location are scale space [49][61] and wavelet theory [66]. Here we focus on location and orientation, the latter parameter being important for contours and lines. The responses of filters that only differ in location and direction lead to real-valued response functions defined on the Euclidean group manifold. We call these orientation scores. The second chapter deals with orientation scores. Not all filters render orientation scores that are invertible and that allow the reconstruction of the original image. We therefore provide criteria for so-called admissible filters, which collect all image information and which do render invertible orientation scores. The admissibility


constraint requires some group theory and representation theory, which is presented in the course of the derivation. We thus gain a mathematical understanding of the encoding of image features by location and orientation as observed in the hypercolumns of V1.

The following three chapters deal with mathematical concepts that try to explain and mimic the horizontal connections of neurons in the visual striate cortex. The most prominent and simplest features with location and orientation are lines and contours. Local line and contour segments that are captured by filters extend along their orientation. This inevitably leads to highly correlated line and contour responses of neighboring filters. We believe that these correlations are exploited by the horizontal neural connections displayed in Figure 1.18. We therefore develop models for line and contour propagation that quantify the correlations of neighboring line or contour segments and relate these to simple line and contour properties.

In the third chapter we choose an axiomatic approach. Just a few a priori assumptions allow the derivation of a unique stochastic process on the Euclidean group manifold modeling the propagation of lines and contours with only three parameters (neglecting scale): elasticity, geodicity, and inherent angular drift. The proposed line model is independent of the source that causes or generates the line or contour in the image. Models that incorporate knowledge about the source rely on insight that the visual system simply does not have in its early processing phase. We therefore utilize only general principles, such as the Gestalt laws, to bootstrap the recognition process. Our axiomatic Ansatz does not rule out the existence of feed-back loops that incorporate more elaborate line and contour models, based for example on shapes as proposed by Pasupathy et al. [84]. Nevertheless, we consider these a posteriori. The result of our a priori axiomatic approach is a Markov process and its corresponding Fokker-Planck equation, which defines the Green's function, point spread function, or propagation kernel of lines and contours.

In the fourth chapter we derive the first tangible results from the stochastic Markov process of chapter three. We translate the Markov process into a Lagrangian density of variational calculus and solve for the extremal path. The extremal path represents the optimal connection of two line or contour segments. We investigate the properties of the extremal path to prove and to disprove some assumptions found in the image analysis literature. In particular, we furnish a formula for the optimal angle of incidence, with which a line may reach a fixed position after starting at a given point with a given direction. The fifth chapter deals with the solution of the Fokker-Planck equation. We manage to provide a closed, analytic solution in Fourier space, from which we can derive the symmetries and the statistical moments of the Green's function in position space. The exact solution allows the verification of approximations that are easier to solve and to evaluate. We thus derive and discuss the Green's function of cubic splines in position as well as in Fourier space. A slightly more accurate approximation is based on cubic splines in polar coordinates.

The Green's function of chapter five is applied in chapter six to orientation scores via Euclidean group convolutions. These convolutions require the translation and rotation of the kernel. To facilitate this task one can decompose the kernel into irreducible components under rotations. This mechanism is known in image analysis as steerable filters. We provide the equations for steerable kernels of Euclidean group convolutions in two and three dimensions.

Chapter seven demonstrates three simple application examples of all the tools introduced in the previous chapters and discusses the relation of tensor voting, stochastic completion fields and orientation scores.

The implementations of all the algorithms used in this thesis have been collected in the Mathematica add-on MathVisionTools. A short introduction to the concepts and principles of the add-on and a list of relevant commands is given in the last chapter.


2 Orientation Scores

In the introduction (Section 1.2.4) we outlined the encoding of anisotropic image features via hypercolumns in the visual cortex of the brain. This chapter is devoted to the corresponding mathematical construct: the mapping of anisotropic image features into the space of complex functions on the Euclidean Lie group manifold. In short, we call this function space the Orientation Score.

Just as legs do not resemble wheels, an orientation score does not resemble the cortical orientation preference map of the striate cortex, but the purpose and functionality are the same. The purpose is to establish a space for image features that exhibits location as well as orientation, and to provide a well-defined map that projects 2-dimensional images into the 3-dimensional orientation score. Operations within the 3-dimensional orientation score are treated in the following chapters.

Maps of one space into another are common tools in mathematics. The Fourier transformation $\mathcal{F}$, converting functions $f(x)$ on position space into functions $\hat{f}(\Omega)$ on frequency space, is a well-known example. In this chapter we establish a similar transformation that takes a 2-dimensional image luminosity function into a 3-dimensional feature response function on the Euclidean group manifold. We will investigate the oriented image features suitable for such a transformation and uncover the properties that such a transformation exhibits. In particular, we will verify whether it is measure preserving, Euclidean invariant, and invertible.

The subject of this chapter is rather technical and not easily accessible if treated with full mathematical rigor. Remco Duits, in collaboration with whom this work has been done, has already attended to most of the mathematical subtleties of orientation scores in his thesis [20][23]. Here we attempt to provide the reader with an easier, less technical introduction to the ideas and mathematical concepts of orientation scores.

The line of thought in this chapter is as follows. Image features are usually captured by test functions or filters that probe for a particular luminosity distribution. To do so everywhere, one moves the test function around to cover all locations within an image. These translations lead to correlations and convolutions of filters with an image luminosity function, and the resulting filter responses are parameterized by the parameters of the translation group. For an orientation score we consider anisotropic test functions and not only translate but also rotate these. We thereby extend the group of translations to the Euclidean group of translations and rotations.

To find out how much information is gathered by a test function and its translated and rotated siblings, we turn to linear algebra and treat the image as well as the test function as vectors in a high-, potentially infinite-dimensional vector space. To visualize the concept, we start with a 3-pixel image that is represented by a vector in a 3-dimensional vector space. We then explain how transformed copies of a test function can generate a complete basis of the vector space and thus capture all components of an image vector.

The generation of a complete, possibly even redundant set of basis vectors by transformed copies of a single test function is the key issue in this chapter, for which we have


to take a short plunge into group theory and explain the concept of irreducible group representations in a vector space. Transformations and their concatenations constitute a group. The unitary matrices of a vector space and their matrix product can be used as a representation for the group elements (transformations) and the group product (concatenation). The matrices that represent the transformations act on the vectors in the vector space and rotate these into one another. But a given group may not be large enough to rotate every vector into every other vector. Starting with an arbitrary vector, the representation matrices of a group may only rotate this initial vector within a linear subspace, so that all vectors perpendicular to that subspace are inaccessible. Thus, a vector space is divided by a group representation into invariant subspaces. Invariant subspaces that one cannot divide any further are called irreducible subspaces. Consequently, a group representation does not provide matrices that can take an element of one irreducible subspace into another irreducible subspace.

Thus, a test function that is a vector within an irreducible subspace will never give rise to transformed copies perpendicular to that irreducible subspace. A single test function and a group of transformations can only generate a set of basis vectors for the irreducible subspace in which the test function is located. Probing an image vector with such a set of transformed test functions can only reproduce the projection of the image vector onto that irreducible subspace. Every image component perpendicular to it will be lost. Hence, to construct an orientation score that captures an image completely, one has to ensure a test function that exhibits components in all irreducible subspaces. These are called admissible test functions. Furthermore, orientation scores of admissible test functions allow the complete reconstruction of images. They are invertible.

The invertibility of orientation scores is of great benefit for image analysis. It allows, very much like a Fourier transformation, the construction of image analysis operations in a space that is more suitable. It is now possible to process image features with an explicit orientation and location encoding. After processing one can take the results and convert them back into an image, a process that of course has no analogy in nature. After all, there is no need to project the images in your mind back onto the retina of your eye. But for us to see the computer's visual cortex at work, it is an indispensable tool.

This work on orientation scores was sparked by an article [54] by Kalitzin et al., which introduced a particular case of an orientation score. In this thesis we generalize this approach in a manner similar to the standard theory of wavelets [66], which one obtains when considering translations and dilations instead of translations and rotations.

2.1 Correlation and Convolution

In the visual system of humans and other mammals an image of the outer world is projected onto the retina, where it is sampled by receptive fields. In mathematics this translates to a


luminosity distribution $L(x)$ being sampled by a test function¹ $\Psi^*(x)$. Just as bipolar and ganglion cells accumulate the excitatory and inhibitory signals of photoreceptors, one integrates over the luminosity function $L(x)$ weighted according to the test function $\Psi^*(x)$:

$$ \mathrm{RF}_{\Psi}[L] := \int \Psi^*(x)\, L(x)\, dx \,. \tag{2.1} $$

Obviously, neither a single receptive field nor a single test function is sufficient to probe a complete image. Instead, an image needs to be sampled by a large number of test functions to extract all the necessary information. Usually one obtains these extra test functions by taking copies of the original test function and by moving them to other locations in the image. After all, the retina is also covered by look-alike receptive fields. In mathematical terms, we take copies of the test function and translate them by a translation vector $b$.

$$ \mathcal{T}_b[\Psi](x) := \Psi\!\left(\mathcal{T}_b^{-1}(x)\right) = \Psi(x - b) \,. \tag{2.2} $$

Inserting a translated test function $\mathcal{T}_b[\Psi](x)$ into equation (2.1) yields the well-known expressions for the correlation $\star$ and convolution $*$ of the kernel $\Psi$ (respectively $\Phi$) with the image $L$.

$$ \mathrm{RF}_{\mathcal{T}_b[\Psi]}[L] = \int \left(\mathcal{T}_b[\Psi]\right)^*(x)\, L(x)\, dx = \int \Psi^*(x - b)\, L(x)\, dx = (\Psi \star L)(b) \quad \text{for correlation,} \tag{2.3} $$

and by redefining the kernel/test function $\Phi(x) := \Psi^*(-x)$ we obtain

$$ = \int \Phi(b - x)\, L(x)\, dx = (\Phi * L)(b) \quad \text{for convolution.} \tag{2.4} $$

The sampling of image data by a test function² via convolution is omnipresent in image analysis, since any linear, translation-invariant operation on an image can be expressed as a convolution.
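The relation between correlation (2.3) and convolution (2.4), namely that convolving with the reflected kernel $\Phi(x) = \Psi^*(-x)$ equals correlating with $\Psi$, can be checked in a discrete 1-D sketch (illustrative only; NumPy assumed, signal and kernel values hypothetical).

```python
import numpy as np

# A discrete 1-D analogue of (2.3) and (2.4): probing a signal L with
# shifted copies of a real-valued test function psi.  Values hypothetical.
L = np.array([0., 0., 1., 2., 1., 0., 0.])
psi = np.array([1., 2., 0.5])          # deliberately asymmetric kernel

# correlation: slide psi over L without flipping it
corr = np.correlate(L, psi, mode='valid')
# convolution with the reflected kernel phi(x) = psi(-x) gives the same result
conv = np.convolve(L, psi[::-1], mode='valid')
print(corr)
assert np.allclose(corr, conv)
```

The asymmetric kernel is chosen on purpose: for a symmetric kernel the reflection is invisible and the distinction between correlation and convolution disappears.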

2.2 Translation Invariance

Translation invariance is a desirable property for most image analysis tools. It guarantees a process $\mathcal{P}$ with no bias for any location in an image $L$. Under translation invariance,

¹Usually real-valued test functions are sufficient. However, later in this chapter we will construct complex linear combinations of test functions due to group theory, so that we introduce the Hermitian product right at the start. The complex conjugation can be neglected for real-valued test functions.


processing an image at location A is equivalent to performing the same process at another location B by first shifting the image to B, processing it at site B, and shifting the result back to location A. The corresponding commutation diagram with translation vector $\vec{b} = \overrightarrow{AB}$ is

$$
\begin{array}{ccc}
L & \stackrel{\mathcal{T}_{\vec{b}}}{\longrightarrow} & \mathcal{T}_{\vec{b}}[L] \\[4pt]
\mathcal{P}\Big\downarrow & & \Big\downarrow\mathcal{P} \\[4pt]
\mathcal{P}[L] & \underset{\mathcal{T}_{\vec{b}}^{-1}}{\longleftarrow} & \mathcal{P}\!\left[\mathcal{T}_{\vec{b}}[L]\right]
\end{array}
\tag{2.5}
$$

Starting in the commutation diagram at the top left and proceeding clockwise, one performs the translation $\mathcal{T}_{\vec{b}}$, the translation-invariant process $\mathcal{P}$, and the translation back $\mathcal{T}_{\vec{b}}^{-1}$, which is the same as proceeding on the left side, straight down via process $\mathcal{P}$. Hence,

$$ \mathcal{T}_{\vec{b}}^{-1} \circ \mathcal{P} \circ \mathcal{T}_{\vec{b}} = \mathcal{P} \,. \tag{2.6} $$

Concatenating $\mathcal{T}_{\vec{b}}$ from the left on both sides of equation (2.6) renders the commutation relation (2.7), which is another form in which to express the invariance:

$$ \left[\mathcal{T}_{\vec{b}},\, \mathcal{P}\right] := \mathcal{T}_{\vec{b}} \circ \mathcal{P} - \mathcal{P} \circ \mathcal{T}_{\vec{b}} = 0 \,. \tag{2.7} $$

It is a simple exercise to show that correlation (2.3) and convolution (2.4) are translation invariant:

$$ \mathcal{T}_{\vec{b}}\!\left[\mathrm{RF}_{\mathcal{T}_b[\Psi]}[L]\right] = \mathrm{RF}_{\mathcal{T}_b[\Psi]}\!\left[\mathcal{T}_{\vec{b}}[L]\right] . \tag{2.8} $$

We will furnish a proof for a more general case further below.
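Commutation relation (2.7) can also be verified numerically for a discrete, circular convolution, where translations become cyclic shifts (a sketch under these assumptions, not a proof; NumPy assumed):

```python
import numpy as np

# Check the commutation relation (2.7) for a circular convolution:
# shifting then convolving equals convolving then shifting.
rng = np.random.default_rng(1)
L = rng.random(32)                 # a random 1-D "image"
psi = rng.random(5)                # an arbitrary kernel

def conv(signal, kernel):
    # circular convolution via the Fourier convolution theorem
    k = np.zeros_like(signal)
    k[:len(kernel)] = kernel
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k)))

b = 7                              # translation in pixels
lhs = np.roll(conv(L, psi), b)     # process P, then translation T_b
rhs = conv(np.roll(L, b), psi)     # translation T_b, then process P
assert np.allclose(lhs, rhs)
print("convolution commutes with translation")
```

The circular boundary condition keeps the discrete domain closed under translations; on a finite image with other boundary handling the relation only holds away from the borders.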

2.3 Transformations of Receptive Fields

The translation $\mathcal{T}_b$, with which we altered the location of the original test function $\Psi(x)$, is just one of many possible transformations. Translation $\mathcal{T}_b$ is sufficient to move a test function $\Psi(x)$ to every location $x$ in an image domain, but the test function may depend on more parameters than just position. Receptive fields of ganglion cells, for example, come in different sizes. Accordingly, one can change the size of a test function by a dilation $\mathcal{D}_a$, with $a > 0$ the dilation parameter and $n$ the dimension of the image data set:

$$ \mathcal{D}_a[\Psi(x)] := \frac{1}{\sqrt{a^{n}}}\, \Psi\!\left(\frac{x}{a}\right) . \tag{2.9} $$

This Ansatz leads to the theory of wavelets [3][66], which is well established and which we will not pursue here.

Simple and complex cells in the visual cortex exhibit anisotropic receptive fields. Hence, anisotropic test functions are relevant, their orientation $\alpha$ is an extra parameter, and the rotation of these anisotropic test functions becomes necessary. After all, one wants to use an edge-detecting test function to capture contours of all orientations $\alpha$


Figure 2.1: A convolution (2.12) of the image on the left with the kernel displayed in the middle renders an orientation score, depicted by 8 discrete layers on the right (axes: $b_x$, $b_y$, and the orientation angle $\alpha$).

and not just contour segments along an initially given direction. We therefore extend the group of translations $\mathcal{T}_b$ by rotations $\mathcal{R}_\alpha$ and obtain the group of Euclidean transformations $\mathcal{E} = \mathcal{T}_b \circ \mathcal{R}_\alpha$.

2-dimensional test functions are rotated and translated according to

$$ (\mathcal{T}_b \circ \mathcal{R}_\alpha)[\Psi(x)] := \Psi\!\left(R_\alpha^{-1}(x - b)\right), \quad \text{with } R_\alpha = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} . \tag{2.10} $$

Substituting the Euclidean transformations into equation (2.1), we obtain

$$ \mathrm{RF}_{\mathcal{E}_{b\alpha}[\Psi]}[L] = \int \mathcal{E}[\Psi(x)]\, L(x)\, dx = \int \Psi\!\left(R_\alpha^{-1}(x - b)\right) L(x)\, dx \quad \text{for correlation,} \tag{2.11} $$

and again by redefining the kernel/test function $\Phi(x) := \Psi^*(-x)$ we obtain

$$ = \int \Phi\!\left(R_\alpha^{-1}(b - x)\right) L(x)\, dx \quad \text{for convolution.} \tag{2.12} $$

Note that the receptive field response $\mathrm{RF}_{\mathcal{E}_{b\alpha}[\Psi]}[L]$ is now a function of the translation vector $b \in \mathbb{R}^2$ and the rotation angle $\alpha \in S$. The function domain is the Euclidean group manifold $\mathbb{R}^2 \times S$. In agreement with [20] we call functions $l : \mathbb{R}^2 \times S \rightarrow \mathbb{R}$ or $\mathbb{C}$ on the Euclidean group manifold orientation scores.


Figure 2.2: The vector space of a simple 3-pixel image $L$ depicted on the left. The luminosity values of the image are the components of the image vector $|L\rangle$ with respect to the "pixel basis" $|p_1\rangle$, $|p_2\rangle$, $|p_3\rangle$.

2.4 Probing and Expressing Images by Test-Functions

Before we can utilize orientation scores for image analysis, we have to verify what and how much of an image is captured by the transformation $\mathrm{RF}_{\mathcal{E}_{b\alpha}[\Psi]}$ in equations (2.11) and (2.12). How much of an image is gathered by a test function $\Psi$ and its transformed siblings $\mathcal{E}[\Psi]$? To answer this question we need to view the image luminosity function $L(x)$ and the test function $\Psi(x)$ as vectors³ $|L\rangle$ and $|\Psi\rangle$ of a high- or infinite-dimensional vector space $V$. Viewing images as continuous luminosity distributions or functions leads to a function space of infinite dimensions. Viewing a digital image as a finite array of pixel values results in a finite-dimensional vector space.

In equation (2.1) test function Ψ*(x) is not used as a vector but as a linear form, an element ⟨Ψ| of the dual vector space V*. Thus, the integral (or respective sum) in equation (2.1) represents the scalar product ⟨Ψ | L⟩ : V* × V → ℂ. We skip here the mathematical subtleties associated with infinite-dimensional vector spaces, since a proper account of this issue would be a substantial distraction from the answer to the posed question. For a thorough mathematical treatment see [20].

To visualize the problem at hand we start with an extremely simple image consisting of just three real-valued pixels. The luminosity values L₁, L₂, and L₃ of the three pixels are the coefficients of the image vector |L⟩ with respect to the orthonormal "pixel basis" vectors |p₁⟩, |p₂⟩, and |p₃⟩ of this 3-dimensional image vector space V = ℝ³:

    |L⟩ = L₁ |p₁⟩ + L₂ |p₂⟩ + L₃ |p₃⟩ .   (2.13)

³ Since we treat images, like quantum-mechanical wave-functions, as vectors of a Hilbert space, we adhere to Dirac's bra-ket notation [18]. A vector is given by the ket notation |·⟩ and a linear form, an element of the dual vector space, is given by the bra notation ⟨·|. The scalar product of the two is then intuitively given by a bra-ket ⟨·|·⟩.

A test function Ψ(x) is then a 3-dimensional linear form ⟨Ψ| and equation (2.1) amounts to the scalar product RF_{|Ψ⟩}[|L⟩] = Ψ₁L₁ + Ψ₂L₂ + Ψ₃L₃ = ⟨Ψ | L⟩. Finally, we have to introduce a group of transformations U_q with transformation parameter q that alters the test-vector |Ψ⟩. In this example we choose U_q to be the group of 30°-rotations around the p₃-axis:

    U_q[|Ψ⟩] = |U_qΨ⟩ = ( |p₁⟩  |p₂⟩  |p₃⟩ ) · ⎛  cos(qπ/6)   sin(qπ/6)   0 ⎞   ⎛ Ψ₁ ⎞
                                               ⎜ −sin(qπ/6)   cos(qπ/6)   0 ⎟ · ⎜ Ψ₂ ⎟ ,   with q ∈ {0, 1, 2, …, 11} ,   (2.14)
                                               ⎝      0            0      1 ⎠   ⎝ Ψ₃ ⎠

where the 3×3 matrix is U_q. Note that we have taken an orthogonal (U_qᵀ · U_q = 1, or in the case of complex matrices a unitary, U_q† · U_q = 1) rotation matrix as transformation U_q to preserve the norm ⟨Ψ | Ψ⟩ of test-vector |Ψ⟩, since ⟨U_qΨ | U_qΨ⟩ = ⟨Ψ | (U_q† · U_q)Ψ⟩ = ⟨Ψ | Ψ⟩.
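The twelve rotation matrices of (2.14) and their norm-preserving property can be verified directly; the test-vector components below are made-up numbers for illustration.

```python
# Sketch: the 30-degree rotations U_q around the p3-axis from (2.14),
# checked for orthogonality and norm preservation.
import numpy as np

def U(q):
    """Rotation by q * 30 degrees around the p3-axis."""
    c, s = np.cos(q * np.pi / 6), np.sin(q * np.pi / 6)
    return np.array([[c,  s, 0.0],
                     [-s, c, 0.0],
                     [0.0, 0.0, 1.0]])

psi = np.array([0.3, 0.5, 0.8])              # an arbitrary test-vector |psi>
for q in range(12):
    # Orthogonality U_q^T U_q = 1 ...
    assert np.allclose(U(q).T @ U(q), np.eye(3))
    # ... guarantees <U_q psi | U_q psi> = <psi | psi>.
    v = U(q) @ psi
    assert np.isclose(v @ v, psi @ psi)
```

The matrices also compose as a group: applying U_q twice equals U_{2q}, and twelve steps return to the identity.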

Probing an image |L⟩ with a test-form ⟨Ψ| as in equation (2.1) is nothing else but evaluating the scalar product ⟨Ψ | L⟩, the projection of |L⟩ onto test-vector |Ψ⟩. Expressing an arbitrary image vector |L⟩ in terms of projections onto transformed test-vectors, or rather transformed linear forms ⟨U_qΨ|, works well if these test-vectors form a complete set of basis vectors. Test-vectors that span a subspace can only capture the projection of |L⟩ onto that subspace. Any information about the component of |L⟩ orthogonal to that subspace is then lost.

With our example we can demonstrate that not only the transformations U_q determine the set of generated test-vectors, but also the original test-vector |Ψ⟩ itself. If we had taken arbitrary rotations around all possible axes for the transformation of test-vectors into account, we could have transformed any initial test-vector into any basis vector. The generation of a complete set of basis vectors would have been trivial. However, with a limited set of transformations like U_q in equation (2.14), only certain, so-called admissible test-vectors allow the generation of a complete basis. If we take |Ψ⟩ = |p₃⟩, we fail miserably, as no rotation around the p₃-axis will alter |Ψ⟩. If we take |Ψ⟩ = |p₂⟩, we do better. As shown in the left of Figure 2.3, we obtain a bouquet of test-vectors that spans the p₁-p₂-plane. This bouquet of test functions is a 6-fold orthonormal basis of the p₁-p₂-plane, so that one can expand at least the projection |L^{⊥p₃}⟩ of |L⟩ in that subspace as

    |L^{⊥p₃}⟩ = (1/6) ∑_{q=0}^{11} ⟨U_qΨ | L⟩ |U_qΨ⟩ ,   for |Ψ⟩ = |p₂⟩ .   (2.15)
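The loss of the p₃-component can be checked numerically; the image components below are made-up numbers for illustration, and U_q is restated so the sketch is self-contained.

```python
# Numerical check of (2.15): with |psi> = |p2>, the twelve rotated
# test-vectors only reconstruct the projection of |L> onto the p1-p2-plane.
import numpy as np

def U(q):
    c, s = np.cos(q * np.pi / 6), np.sin(q * np.pi / 6)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

L = np.array([0.7, -0.2, 0.9])       # an arbitrary 3-pixel image |L>
psi = np.array([0.0, 1.0, 0.0])      # test-vector |p2>

# (1/6) * sum_q <U_q psi | L> |U_q psi>
recon = sum((U(q) @ psi) @ L * (U(q) @ psi) for q in range(12)) / 6

# Only the in-plane components survive; the L3-component is lost.
assert np.allclose(recon, [0.7, -0.2, 0.0])
```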


Figure 2.3: The left figure shows the transformation of a test-vector |Ψ⟩ = |p₂⟩ via 30°-rotation around the p₃-axis. All transformed test-vectors stay within the p₁-p₂-plane and do not form a complete set of basis vectors. The right figure displays the same scenario for test-vector √(2/3) |p₂⟩ + √(1/3) |p₃⟩. Here the same transformations generate four times an orthogonal basis of the complete vector space.

A better choice for a test-vector is |Ψ⟩ = √(2/3) |p₂⟩ + √(1/3) |p₃⟩. The transformations of this test-vector result in a 4-fold set of orthonormal basis vectors as depicted in the right of Figure 2.3. A complete expansion of |L⟩ in this basis is

    |L⟩ = (1/4) ∑_{q=0}^{11} ⟨U_qΨ | L⟩ |U_qΨ⟩ ,   for |Ψ⟩ = √(2/3) |p₂⟩ + √(1/3) |p₃⟩ .   (2.16)
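The completeness of this expansion can likewise be verified numerically; again the image components are made-up numbers and U_q is restated for self-containment.

```python
# Numerical check of (2.16): the admissible test-vector
# |psi> = sqrt(2/3)|p2> + sqrt(1/3)|p3> reconstructs |L> completely.
import numpy as np

def U(q):
    c, s = np.cos(q * np.pi / 6), np.sin(q * np.pi / 6)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

L = np.array([0.7, -0.2, 0.9])                    # an arbitrary image |L>
psi = np.array([0.0, np.sqrt(2 / 3), np.sqrt(1 / 3)])

# (1/4) * sum_q <U_q psi | L> |U_q psi>
recon = sum((U(q) @ psi) @ L * (U(q) @ psi) for q in range(12)) / 4

# Every component of |L> is recovered, including L3.
assert np.allclose(recon, L)
```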

Decisive in the above example are the invariant linear subspaces under transformation U_q. These are, in the case of our real vector space V = ℝ³, the p₁-p₂-plane and the p₃-axis. U_q can rotate any vector such that it projects onto any other vector, but only within the respective invariant subspaces. So, if a test-vector |Ψ⟩ is orthogonal to an invariant subspace W, it will stay orthogonal under transformations U_q and it will never generate basis vectors that span W. Conversely, if a test-vector |Ψ⟩ exhibits a component in each U_q-invariant subspace of V, transformation U_q can take these components and generate a complete basis for each subspace, and consequently a complete basis for the entire vector space V.
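The two invariant subspaces themselves can be exhibited directly; U_q is restated once more so the sketch stands on its own.

```python
# Sketch: the p3-axis and the p1-p2-plane are invariant under every U_q.
import numpy as np

def U(q):
    c, s = np.cos(q * np.pi / 6), np.sin(q * np.pi / 6)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])

p3 = np.array([0.0, 0.0, 1.0])
# |p3> spans an invariant subspace: every U_q leaves it untouched, so it
# can never generate basis vectors for the p1-p2-plane.
assert all(np.allclose(U(q) @ p3, p3) for q in range(12))

p1 = np.array([1.0, 0.0, 0.0])
# A vector in the other invariant subspace (the plane) stays in the plane:
# its p3-component remains zero under every rotation.
assert all(np.isclose((U(q) @ p1)[2], 0.0) for q in range(12))
```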

This simple but instructive example has illustrated several aspects of the theory to come.

• The image luminosity function L and test function Ψ can be expressed as vectors |L⟩ and |Ψ⟩ in a function space V with a Hermitian inner product (2.1).
