Individual Tree Crown detection in UAV remote sensed rainforest RGB images through Mathematical Morphology


MSc Artificial Intelligence

Track: Intelligent Systems

Master Thesis


by

Eugenio Di Leo

10426191

July 28, 2017

42 ECTS 2016-2017

Supervisor:

dr ir L Dorst

Assessor:

dr J Van Gemert


Abstract

The aim of this thesis is to develop a way of assisting experts in performing Individual Tree Crown (ITC) detection on rainforest images taken by an Unmanned Aerial Vehicle (UAV) with a consumer RGB camera, operated by people from the environmental organization Conservation Drones. Automating ITC detection in this setting poses two different challenges. On the one hand, ITC detection in rainforests is still an open problem, due to the intrinsic characteristics that distinguish rainforests from more regular coniferous forests. On the other hand, ITC detection has generally been tackled only with the help of multispectral imagery and/or depth maps generated by active remote sensing. In this thesis, a way to obtain a fair segmentation of the tree crowns using only the RGB images is presented. First, the characteristics that make tree crowns separable by the human eye, using only visible light, are explored. Then the illumination model of the tree crowns is studied and the noise that affects this model is described. The noise has two different aspects, and is thus reduced with two different methods. First, a standard Gaussian convolution is carried out to reduce the leaf-level noise. Then the Mathematical Morphology framework is employed to reduce the more morphological branch-level noise.

The processed images are segmented using the watershed algorithm, also derived from the Mathematical Morphology framework. The results of the segmentations are encouraging, but heavily dependent on the choice of the scale parameters. Overall, once the set of optimal parameters is found, it can be employed to obtain fair results on all other images coming from the same flight. A possible automation of scale selection is investigated, without yielding satisfactory results.


List of Figures

1.1 Conservation Drones founders.

1.2 Example of remote sensed image from Conservation Drones.

2.1 Crown models examples (Pollock, 1996).

3.1 Example images from the two Datasets.

3.2 Spectral signatures of three different plants.

3.3 NDVI example image.

3.4 Forest patches representations obtained through LiDAR.

3.5 Two trees that present widely different scales.

3.6 Different trees with different textures.

3.7 Grayscale and inverted grayscale versions of a tree crown.

3.8 The RGB cube.

3.9 The HSV cylinder.

3.10 RGB and Hue comparison.

3.11 Hue channel of a Dataset 2 image.

3.12 RGB and Saturation comparison.

4.1 Oblate and prolate spheroids.

4.2 Scan lines of smooth spherical surfaces.

4.3 A typical tree crown scan line.

4.4 Scan lines of dented spherical surfaces.

5.1 Dilation of a square by a disk.

5.2 Erosion of a square by a disk.

5.3 Opening of a square by a disk.

5.4 Closing of a shape by a disk.

5.5 Umbra of a one dimensional signal.

5.6 Histogram of a patch of leaves, original and after Gaussian blur.

5.7 Gaussian smoothing, with σ = 10. Image and scan line.

5.8 Gaussian smoothing, with σ = 10. 3D topographic map.

5.9 Scan lines comparison between original image and subsequently Gaussian smoothed image.

5.10 Parabolic structuring element.

5.12 Morphological closing with parabolic structuring element. Image and scan line.

5.13 Morphological closing with parabolic structuring element. 3D topographic map.

5.14 Scan lines comparison between Gaussian smoothed image and subsequently morphologically closed image.

5.15 Morphological opening with flat disk shaped structuring element. Image and scan line.

5.16 Morphological opening with flat disk shaped structuring element. 3D topographic map.

5.17 Scan lines comparison between morphologically closed image and subsequently morphologically opened image.

6.1 Approximated decomposition of cylindrical structuring element.

6.2 Good segmentation of an image from Dataset 1.

6.3 Good segmentation of an image from Dataset 2.

6.4 Severe undersegmentation of an image from Dataset 2.

6.5 Severe oversegmentation of an image from Dataset 2.

6.6 Other two segmentations from Dataset 1.

6.7 Other two segmentations from Dataset 2.

6.8 Well segmented crown from Dataset 1.

6.9 Undersegmented crown from Dataset 1.

6.10 Well segmented crown from Dataset 2.

6.11 Undersegmented crown from Dataset 1.

7.1 Oversegmented crown, with highlighted patches that belong to it.


Chapter 1

Introduction

1.1 Remote sensing for forest surveying

In the last decades, forest management has become more and more important, as the green lungs of the planet are increasingly threatened. Technological advances in remote sensing have been a great help to forest management and surveying. Generally speaking, remote sensing techniques are those practices that allow information about an object to be acquired remotely. Remote sensing, which requires no physical contact with said object, is the opposite of on-site observation.

Nowadays, remote sensing usually refers to aerial information acquisition by means of aircraft or satellite sensors. Remote sensing can be divided into two main areas: active remote sensing, where a signal is emitted from the surveying machine and, after reflection off the object of interest, is picked up by the sensor, e.g. LiDAR (see Section 3.1.2); and passive remote sensing, where the signal detected by the sensor is just the reflection of the electromagnetic radiation coming from the Sun. Applications of remote sensing are varied, but most of them are related to Earth sciences such as geology, hydrology, ecology and oceanography.

Remote sensing is thus widely used to assist forest surveying. Forest surveying consists, as the name implies, of periodically gathering information about a forest while monitoring the changes it goes through. The information obtained usually concerns the number, height and size of trees, and their growth rate and age. Forest surveying also deals with wildlife conservation tasks: monitoring the health of the vegetation, the outbreak of pests or diseases, or the changes caused by illegal logging and poaching.


1.2 Conservation Drones

Conservation Drones is a non-profit environmental organization founded by conservation ecologist Lian Pin Koh and primate biologist Serge Wich (see Figure 1.1). The aim of the organization is to employ inexpensive Unmanned Aerial Vehicles (UAVs, or more simply drones) for applications related to eco-conservation. While in the beginning Conservation Drones operated only in the North Sumatra region in Indonesia (the area also used in this thesis), it has now grown into a worldwide organization, with tasks such as wildlife monitoring, tracking loggers and poachers, and geographical mapping.

Figure 1.1: Conservation Drones founders: Lian Pin Koh (left) and Serge Wich (right) with a drone.

Their goal of performing efficient forest surveying while keeping costs extremely low poses many challenges. Most of their equipment consists of handmade or consumer-grade technology, keeping the use of expensive professional tools to a minimum. One of their objectives, in the forest surveying context, is counting the number of trees that appear in the remote sensed images. This translates into performing good tree crown detection without state-of-the-art equipment. An example remote sensed image from Conservation Drones is shown in Figure 1.2.

Figure 1.2: An example of a remote sensed image from Conservation Drones.

1.3 Aim of the project

The aim of this project is to develop a method for automatically detecting and localizing individual tree crowns in images taken in a specific setting. The images, provided by the Conservation Drones environmental organization, are taken by a drone flying over the North Sumatra rainforest. The drone was equipped with an inexpensive consumer compact camera, with a standard RGB sensor.

The challenges of this problem are mostly related to: the scarce information provided by simple RGB images, which lack any kind of hyperspectral or depth information; the topology of the rainforest, completely different from that of the more regular (and already largely explored) plantations or orchards; and the shape and variety of the tree species in the rainforest, whose crown geometry is less exploitable than that of conical conifers. As the visible light reflection is the only real information the dataset provides, we attempt to model the tree crown geometry and illumination. We then aim to model the noise patterns, which turn out to have a double nature, both stochastic and morphological. To reduce the noise we then make use of tools taken from the Mathematical Morphology theory and framework.

1.4 Outline of the thesis

The thesis is organized as follows: in the remainder of this Chapter we introduce the basic concepts of remote sensing and we give the necessary background information on the aforementioned environmental organization; in Chapter 2 we present other methods and works related to individual tree crown delineation; in Chapter 3 we analyze the problem, pointing out the aspects that make the problem different from the ones already solved in literature; in Chapter 4 we model the problem; in Chapter 5 we propose a way to reduce the noise in the model; in Chapter 6 we show the results of the experiments; in Chapter 7 the final conclusions are drawn.


Chapter 2

Related Works

2.1 Introduction to Tree Crown Detection

In this Chapter we present previous approaches and methods related to the individual tree crown delineation problem. According to Brandtberg (1999), although the first aerial photograph ever taken dates back to 1856, it wasn’t until the First World War (1915) that fixed cameras on planes were used for remote sensing tasks, specifically for espionage. In the 1920s Sweden launched a pioneering project in remote sensing forest mapping, and in the 1940s the human interpretation of aerial images for forest surveying purposes started.

In the last two decades, with the increase in computational power and especially in the resolution and portability of photographic equipment, many techniques to automatically retrieve the position and the contours of individual tree crowns from aerial images have been developed.

2.2 Previous works on Tree Crown Detection

In this Section we describe the algorithms and methods that have been proposed to solve the problems of tree crown localization and delineation. We summarize the characteristics of each method and present the relevant literature. Although the algorithms can be grouped into four broad categories, most approaches employ hybrid methods. Sometimes the detection and the delineation parts are handled by two different algorithms; in other cases two algorithms are used sequentially, to refine the detections. The four categories chosen for the grouping are: Local maxima filtering (LM), Valley following (VF), Template matching (TM), and Region growing (RG). At the end of this Section, in Table 2.1, we present a summary of all the methods.


2.2.1 Local Maxima (LM)

The idea behind local maxima filtering for tree crown detection is that treetops, being closer to the illumination source, are the points in the tree crowns with maximum reflectance (Brandtberg and Walter, 1998; Wulder, Niemann, and Goodenough, 2000). The brightest pixels are thus considered to be high probability tree locations. Under this assumption, identifying a treetop translates to finding the local maxima in the remote sensed image, usually through a sliding window approach. Culvenor (2002), for example, uses a combination of four linear filters with different orientations to find radiometric peaks.

Naturally, when dealing with a local maxima finding problem, one of the main issues is that the detections are greatly affected by the size of the sliding window (Wulder, Niemann, and Goodenough, 2000; Pouliot, King, Bell, and Pitt, 2002). A window that is too large compared to the crown of a tree will fail to detect distinct treetops, resulting in the merging of different crowns. On the other hand, a window that is too small will create many false positives, by identifying multiple bright pixels that belong to the same crown (Wulder, Niemann, and Goodenough, 2000). A careful selection of the sliding window size is thus fundamental.

Tree crowns are far from precise geometrical shapes, as we will see in detail in Sections 2.2.3 and 4.2. This means that often, due to the topology of the trees themselves, high intensity pixels can occur outside the highest, and supposedly brightest, part of the crown. A possible solution is to smooth the image with a Gaussian filter, which acts as a low-pass filter and removes the noisy bright pixels (Pouliot, King, Bell, and Pitt, 2002).

This approach is particularly suitable for trees that have a clear treetop, like most conifers (Wulder, Niemann, and Goodenough, 2000). Geometrically, conifers are generally conical in shape, thus with a pointy treetop, while rainforest trees have a rounder tree crown, with a less defined treetop.
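The effect of the window size can be illustrated with a minimal sketch. It is not the implementation used in the cited works: the function name `local_maxima`, the synthetic image built by `two_crowns`, and the choice of rejecting ties are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_maxima(image, window):
    """Return (row, col) pixels that are the unique maximum of their window.

    `window` is the odd side length of the square sliding window
    (a hypothetical parameter name chosen for this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    image = np.asarray(image, dtype=float)
    pad = window // 2
    # Pad with -inf so that pixels near the border can still be peaks.
    padded = np.pad(image, pad, constant_values=-np.inf)
    views = sliding_window_view(padded, (window, window))
    window_max = views.max(axis=(2, 3))
    # Reject plateaus: a peak must be the only pixel reaching the maximum.
    ties = (views == window_max[..., None, None]).sum(axis=(2, 3))
    peaks = (image == window_max) & (ties == 1)
    return [(int(r), int(c)) for r, c in np.argwhere(peaks)]


def two_crowns():
    """Synthetic 11 x 20 image: two smooth bright bumps standing in for crowns."""
    rows, cols = np.mgrid[0:11, 0:20]

    def bump(r0, c0, height):
        return height * np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / 8.0)

    return bump(5, 5, 1.0) + bump(5, 14, 0.8)


image = two_crowns()
print(local_maxima(image, 5))   # both treetops: [(5, 5), (5, 14)]
print(local_maxima(image, 19))  # window too large, crowns merge: [(5, 5)]
```

On this noise-free image the small window finds both treetops, while the large one keeps only the brighter crown. On real images a small window would additionally pick up leaf-level bright spots, which is why smoothing is usually applied first.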

2.2.2 Valley Following (VF)

The valley following approach is conceptually similar to the local maxima one described in Section 2.2.1. If treetops are supposed to be the brightest pixels in the image, the areas separating the crowns would be the darkest (Gougeon, 1995). We can represent an image as a topographic map in three dimensions, where the height is proportional to the intensity of the pixels. In this topographic representation, the tree crowns would look like mountains, or peaks, and the regions between them would be the valleys.

This method is more suitable for forests that contain mostly conical trees lit by the sun at a low angle. Moreover, the forests should not be extremely dense, as overlapping crowns can impede the formation of the valleys. Extended areas without trees can also be a problem, as those clearings can result in many wrong detections. This problem can be overcome by using some tree/non-tree discrimination technique (see Sections 3.1.2 and 3.2). This method also works best with trees of similar size, as large trees with internal shadows are often divided into multiple parts, while small trees risk being grouped together (Gougeon and Leckie, 2006).

An algorithm closely related to the valley following approach is watershed segmentation, where the image is first inverted, so that the high intensity crowns become valleys, while the darker separators become ridges. This inverted topographic image is then filled with a virtual liquid, until the dams preventing two different basins from merging are evident. These lines become the segmentation lines (Wang, Gong, and Biging, 2004). A multispectral version of the watershed algorithm, based on the concept of spectral angle, is proposed by Jing, Hu, Noland, and Li (2012). The watershed algorithm, based on the more general mathematical morphology, is described in more detail in Section 5.2.3.
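To make the basin idea concrete, the sketch below labels a one-dimensional scan line. It is a deliberate simplification and not the flooding algorithm of the cited works: instead of raising a water level, every sample walks downhill on the inverted signal until it reaches a local minimum (a descent-based formulation). The function name `watershed_1d` is hypothetical, and flat plateaus are not handled specially, so they may produce extra basins.

```python
import numpy as np


def watershed_1d(intensity):
    """Label each sample of a scan line with the basin it drains into.

    The signal is inverted so that bright crowns become basins; each sample
    then moves to its lowest neighbour until no strictly lower one exists.
    """
    inverted = -np.asarray(intensity, dtype=float)
    n = len(inverted)
    labels = np.zeros(n, dtype=int)
    basin_of_minimum = {}
    for start in range(n):
        i = start
        while True:
            # Candidate positions: left neighbour, current, right neighbour.
            candidates = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
            lowest = min(candidates, key=lambda j: inverted[j])
            if inverted[lowest] >= inverted[i]:
                break  # i is a local minimum of the inverted signal
            i = lowest
        # Basins are numbered in the order their minima are first reached.
        labels[start] = basin_of_minimum.setdefault(i, len(basin_of_minimum) + 1)
    return labels


# Two bright crowns (peaks 5 and 9) separated by a dark gap.
scan_line = [1, 3, 5, 3, 1, 2, 6, 9, 6, 2]
print(watershed_1d(scan_line).tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
```

The boundary between labels 1 and 2 falls in the dark gap between the two crowns, which is where a segmentation line would be drawn.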

2.2.3 Template Matching (TM)

Template matching algorithms aim to localize a certain small image, called the template, within a bigger image. In the case of remote sensed forest images, especially if the lighting conditions and the topology of the forest are appropriate, the appearance of each tree crown would be highly similar to that of the others.

The template can be extracted from a reference image, or can be artificially generated. After the template is obtained, the image is probed with the template matrix, and some correlation measure is computed at each location. The template is then localized where this measure reaches a local maximum.
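As an illustration of the probing step, the following sketch uses normalized cross-correlation as the correlation measure. This is one common choice and an assumption on our part, since the text does not fix a specific measure; the function name `ncc_map` and the toy template are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ncc_map(image, template):
    """Normalized cross-correlation of `template` at every valid offset.

    Scores lie in [-1, 1]; a score of 1 means the patch equals the template
    up to a positive gain and an additive brightness offset.
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    patches = sliding_window_view(image, template.shape)
    p = patches - patches.mean(axis=(2, 3), keepdims=True)
    p_norm = np.sqrt((p ** 2).sum(axis=(2, 3)))
    denom = p_norm * t_norm
    scores = np.zeros(denom.shape)
    # Flat patches have zero variance and keep a score of 0.
    np.divide((p * t).sum(axis=(2, 3)), denom, out=scores, where=denom > 0)
    return scores


# Toy "crown" template, pasted into a dark image with different gain/offset.
template = np.array([[0, 1, 0], [1, 2, 1], [0, 1, 0]])
image = np.zeros((10, 12))
image[3:6, 6:9] = 0.5 * template + 0.2

scores = ncc_map(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best)  # top-left corner of the best match, here row 3, column 6
```

The normalization makes the score insensitive to global brightness changes, but not to changes in crown size or shape, which is consistent with the scale dependence discussed for this family of methods.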

The artificial model approach was first proposed by Pollock (1996), who models a tree crown in three dimensions as a generalized ellipsoid of revolution with equation:

\[
\frac{z^{n}}{a^{n}} + \frac{\left(x^{2} + y^{2}\right)^{n/2}}{b^{n}} = 1 \tag{2.1}
\]

In Equation 2.1 the parameters a, b and n can be used to substantially modify the shape of the ellipsoid, ranging from quasi-conical shapes typical of conifers to rounder shapes typical of deciduous trees. Some examples of tree crowns modeled from Equation 2.1 can be seen in Figure 2.1 (Pollock, 1996).
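The role of the three parameters can be checked numerically. Solving Equation 2.1 for the upper half of the surface gives z = a (1 - (r/b)^n)^(1/n), with r the horizontal distance from the crown axis, so a is the crown height at the axis and b the crown radius at the base. The sketch below evaluates this expression; the function name `crown_height` and the convention of returning 0 outside the crown are assumptions made for illustration.

```python
import numpy as np


def crown_height(x, y, a, b, n):
    """Height z >= 0 of the crown surface of Equation 2.1 above (x, y).

    a: height at the axis, b: radius at the base, n: shape exponent.
    Points farther than b from the axis get height 0 (assumed convention).
    """
    r = np.hypot(x, y)
    inside = np.clip(1.0 - (r / b) ** n, 0.0, None)
    return a * inside ** (1.0 / n)


# Height halfway between axis and border (r = b / 2) for several exponents.
for n in (1.0, 2.0, 4.0):
    print(n, crown_height(1.0, 0.0, a=3.0, b=2.0, n=n))
```

With n = 1 the profile is a straight line (a cone, height 1.5 at mid-radius), with n = 2 it is an ordinary ellipsoid, and larger exponents give a flatter top with steeper sides.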


Of course, given that these methods work best with trees that appear similar in shape and lighting conditions, they have been shown to excel at localizing trees in orchards, plantations and uniform coniferous forests, while they fail in less “regular” forests.

The parametrization of the lighting conditions can be used to generate artificial shadows on the model, as proposed by Larsen and Larsen (1997) and Larsen and Rudemo (1998), to better align the template to the actual tree crowns. Quackenbush, Hopkins, and Kinn (2000) performed a template matching based localization using sample tree crowns extracted from the full images as templates. More recently, Hung, Bryson, and Sukkarieh (2012) introduced a new model where the crowns and the shadows are parametrized separately, and then joined together.

2.2.4 Region Growing (RG)

Region growing techniques are widely used in computer vision to segment objects. The main idea behind these methods is that the pixels belonging to the objects to segment share some predefined property that is not shared by pixels outside the objects. This property could be anything from color intensity to spectral angle in a specific window (Erikson, 2004).

The algorithms start by selecting some seed points, either randomly or following some criteria, and then grow those points by progressively adding new neighboring pixels. If a pixel is similar enough (according to the predefined property mentioned earlier) to the seed pixel, it is added to the region. When it is no longer possible to add new neighboring pixels, the growth stops and the region should correspond to the object. Region growing algorithms can be used as a tree crown delineation tool after the localization task has been performed. Culvenor (2002), for example, uses the LM approach (see Section 2.2.1) to detect the possible treetop locations, feeding them as seed locations to an RG algorithm. This technique is also present in the work of Bunting and Lucas (2006).
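The growth procedure described above can be sketched as a breadth-first traversal. The similarity property used here (absolute intensity difference from the seed pixel, with a hypothetical `tolerance` parameter) and the 4-connected neighborhood are assumptions chosen for simplicity; the cited works use their own criteria.

```python
from collections import deque

import numpy as np


def grow_region(image, seed, tolerance):
    """Grow a 4-connected region from `seed` (row, col).

    A neighbouring pixel joins the region when its intensity differs from
    the seed intensity by at most `tolerance`.
    """
    image = np.asarray(image, dtype=float)
    rows, cols = image.shape
    seed_value = image[seed]
    region = np.zeros(image.shape, dtype=bool)
    region[seed] = True
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                if abs(image[nr, nc] - seed_value) <= tolerance:
                    region[nr, nc] = True
                    frontier.append((nr, nc))
    return region


# Two bright "crowns" separated by one dark column.
image = np.full((5, 7), 0.1)
image[1:4, 1:3] = 0.9
image[1:4, 4:6] = 0.85

print(grow_region(image, (2, 1), tolerance=0.2).sum())  # 6: first crown only
print(grow_region(image, (2, 1), tolerance=0.9).sum())  # 35: everything merges
```

The second call shows the undersegmentation risk listed in Table 2.1: once the tolerance is loose enough to cross the dark separator, the two crowns and the background end up in a single region.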

2.2.5 Summary of related research

As we have seen, the field of individual tree crown localization and delineation has been explored with many different approaches. Up until now however, as also reported by Larsen et al. (2011) and Ke and Quackenbush (2011), there has been no single algorithm or approach capable of dealing with every kind of forest. Moreover, we noticed that the only problems solved satisfactorily were those concerning forests with a topology substantially different from that of the rainforest (see Section 3.1.4). Special cases such as orchards, plantations, and coniferous forests with similar trees can be solved with high accuracy in both the localization and the delineation tasks. Dense deciduous forests such as the tropical rainforests are still open problems, especially when some information about the problem is missing (see Section 3.1.2). Table 2.1 contains the summary of this Section, with a brief description of the four main areas into which the previous research can be classified.


| Method | General idea | Pros & Cons | Literature |
|---|---|---|---|
| LM | Points with greatest brightness correspond to the treetops | + Good for conical conifers; + Simple implementation; − Scale dependent | Wulder, Niemann, and Goodenough, 2000; Culvenor, 2002; Pouliot, King, Bell, and Pitt, 2002; Park, Lee, and Kwak, 2011 |
| VF | Darker regions correspond to crown separation areas | + Good for conical conifers; + Good for separated trees; − Needs low-angle sun; − Bad for complex forests | Gougeon, 1995; Gougeon and Leckie, 2006; Wang, Gong, and Biging, 2004; Jing, Hu, Noland, and Li, 2012 |
| TM | Appearance of tree crowns is similar under certain conditions | + Shadows are in the model; + Handles both texture and spectrum; − Scale dependent; − Bad for complex forests | Pollock, 1996; Larsen and Larsen, 1997; Larsen and Rudemo, 1998; Quackenbush, Hopkins, and Kinn, 2000; Hung, Bryson, and Sukkarieh, 2012 |
| RG | Contiguous pixels belonging to a crown have similar characteristics | + Handles complex tree shapes; − Oversegmentation; − Undersegmentation | Culvenor, 2002; Erikson, 2004; Bunting and Lucas, 2006 |

Table 2.1: Summary of previous research on the individual tree crown delineation problem.


Chapter 3

Problem Analysis

3.1 Characteristics of the problem

This Chapter focuses on the analysis of the aspects that make this problem challenging and different from the ones described in Chapter 2. The suitability of the RGB color space is discussed, and a possible alternative color space is explored and analyzed.

While tree detection in aerial images is not a novel problem per se, as stated in Chapter 2, the datasets on which we plan to perform this kind of localization present many characteristics that make them fairly unique. A list of these features is presented, with a brief explanation of why they are important and how they could hinder the standard tree detection algorithms.

The acquired datasets did not include any ground truth information about the position and/or boundaries of the tree crowns. Due to this lack of precise information, the analysis carried out in this Chapter is more qualitative than quantitative, and is performed mostly by visual examination of the remote sensed images.

3.1.1 The datasets

The datasets consist of photos taken by a drone (Unmanned Aerial Vehicle, UAV) flying over the Indonesian rainforest in the North Sumatra region. The images were taken with a fairly standard consumer digital camera (Canon IXUS 220 HS). This is thus a passive remote sensing problem. The images have a very high resolution of 4000 × 3000 pixels. They were not resized, both to preserve the level of detail and to avoid introducing unwanted artifacts.

The pictures come from two different flights and were taken at different altitudes and under different lighting conditions. The first dataset (Dataset 1) consists of 178 images taken at an estimated altitude of around 80 m above ground in diffuse lighting conditions, while the second dataset (Dataset 2) consists of 64 images taken at a much higher altitude and with stronger direct illumination. The second dataset was missing GPS information, so it was impossible to assess the altitude precisely, but this doesn’t pose any real inconvenience for the task. Two example images taken from the two datasets can be seen in Figure 3.1.

Figure 3.1: Example images from the two Datasets at hand: (a) Dataset 1, (b) Dataset 2.

3.1.2 Missing multispectral and depth information

As mentioned in Chapter 2, almost all tree crown localization algorithms make use of some spectral information outside the visible range, or employ some kind of depth map. This includes Near Infrared (NIR) spectrography and the LiDAR surveying system. Both of these fundamental pieces of information are missing in our dataset, which consists of standard RGB images in JPEG format, because the camera is a standard one. The reason why NIR and LiDAR can help in this task is described in the rest of this Section. Some approaches that employ exclusively RGB images exist, but they are mostly aimed at localizing trees in coniferous forests, a problem partially different from the one at hand. This difference is explained in more detail in Section 3.1.4.

Near Infrared (NIR)

Most trees, thanks to the chlorophyll present in the leaves, reflect in the green part of the visible spectrum. Surprisingly though, most of the reflected electromagnetic radiation falls in different parts of the infrared spectrum (see Figure 3.2). The strongest reflectance occurs in the NIR spectrum, which includes electromagnetic radiation with wavelengths ranging from 700 nm to 2500 nm (Knipling, 1970). This “invisible” reflection is not always unique to each species, but it can be used as a spectral signature in classification tasks (Cochrane, 2000). It is worth noting that a standard consumer camera could be tweaked to acquire radiation in the NIR spectrum by sacrificing the Blue channel (Rabatel, Gorretta, and Labbé, 2011).


Figure 3.2: Spectral signatures of three different plants. Photo Credit: Eric Brown de Colstoun - NASA

Another use of the NIR spectrum is the Normalized Difference Vegetation Index (NDVI). If the camera is equipped with a NIR sensor, a standard RGB image can be turned into an NDVI image by a simple transformation that involves the Red and Near Infrared bands. The resulting image highlights the vegetation, as can be seen in Figure 3.3.
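For reference, the transformation is the standard per-pixel ratio NDVI = (NIR − Red) / (NIR + Red), which ranges from −1 to 1 and is high where NIR reflectance dominates, as it does for healthy vegetation. The sketch below computes it with NumPy. It cannot be applied to the datasets of this thesis, which lack a NIR band; the function name `ndvi` and the handling of pixels where both bands are zero are assumptions.

```python
import numpy as np


def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    Pixels where both bands are zero are set to 0 to avoid a division by
    zero (an assumed convention for this sketch).
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.zeros(total.shape)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


# One vegetation-like pixel, one neutral pixel, one empty pixel.
print(ndvi([[200, 50, 0]], [[50, 50, 0]]))  # values 0.6, 0.0 and 0.0
```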

LiDAR

A portmanteau of light and RADAR, the LiDAR surveying system uses infrared, visible, and ultraviolet radiation to create a depth map (see Figure 3.4). It is widely employed in aerial surveying and tree localization because it can detect trees at different heights, a typical situation in the rainforest ecosystem, and can help find the center of the crowns, which is typically higher than the other parts of the trees. The UAVs employed for the remote sensing used in this study were not equipped with LiDAR technology because, as stated in Chapter 1, one of the main concerns of the project was to keep the budget as low as possible.

3.1.3 Image compression

The dataset consists of JPEG images. JPEG is a lossy image compression method designed so that the human eye cannot perceive the quality reduction. Unfortunately this may generate unwanted artifacts, especially in other color spaces, as will be discussed in Section 3.3.2.

Figure 3.3: NDVI example image. Source: publiclab.org

Figure 3.4: Two forest patches representations obtained through LiDAR. Photo Credit: Sarah Frey - Oregon State University

3.1.4 Rainforest topology

Most tree crown localization algorithms are designed to work in ecosystems different from the rainforest. Coniferous forests, for example, are simpler than rainforests, in the sense that they exhibit less biodiversity. Usually they have one main species, with relatively evenly spaced trees. Moreover, coniferous plants often have a pointy treetop (e.g. spruces, firs) that can ease the detection of the center of the crown, and possibly be a good feature for a pattern matching approach.

Rainforests, on the contrary, have an enormous number of different species. It has been calculated that in the Malaysian Borneo rainforest (in the same tropical zone as Sumatra, where the dataset was recorded) the number of different tree species found in less than 7 hectares can be up to 711 (Krebs, 2008). As a comparison, in the whole European region north of the Alps there are just about 50 different species of trees (Krebs, 2008). This huge biodiversity, together with the possibility for trees of the same species to appear in different sizes, translates to a rather difficult localization task, confirming what has been reported by Larsen et al. (2011) and Ke and Quackenbush (2011). It is really hard to find a unique prototype tree in a rainforest, which hinders the naive Template Matching described in Section 2.2.3. The rainforest surveyed by the drone also has a really high degree of canopy cover, defined as the “proportion of the forest covered by the vertical projection of the tree crown” (Jennings, Brown, and Sheil, 1999). This implies that the separation between the trees is scarce, leading to many overlapping crowns.

Rainforests also have a rather peculiar characteristic, as many branches of tall trees support smaller plants called epiphytes. In a crowded forest, the competition for sunlight is very high and these plants have evolved so that they can reach the light by growing upon other trees. This can complicate the localization task even further, compared to a coniferous forest. Moreover, in a typical rainforest, the variety of trees is matched by a variety of other plants like bushes and shrubs. Without any depth information (e.g. LiDAR, see Section 3.1.2), this vegetation can easily be confused with the trees.

3.2 Salient features to address

After a thorough visual examination of the datasets, together with information gathered from previous research, we extracted the most important features that characterize our treetop localization problem. Since the ground truth was not available (see Section 3.1.1), the assessment of the characteristics of the problem is qualitative. In this Section we present those features and mention some possible techniques we can use to address them.

Tree/Non-tree distinction

An important prerequisite of every algorithm that deals with tree crown delineation is the ability to perform a simple classification of areas that contain a treetop and areas that don’t, i.e. background. Usually this task is performed in urbanized areas, where the trees are sparse and other objects could be mistakenly classified as trees. Our dataset is, for the most part, composed of images entirely filled by the rainforest canopy, although in a few images a river or a small building can be present. This means that, generally, the whole image contains trees, and no information is gained by performing a tree/non-tree classification. This step is, nonetheless, fairly easy to perform. Usually non-tree areas in an image can be excluded with simple considerations on the color (more precisely the hue, as we will see in Section 3.3.2), or on the saturation of the patch under examination.
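A minimal sketch of such a hue and saturation test is given below. The thresholds (a hue between 60° and 180°, a saturation of at least 0.2) and the function name `vegetation_mask` are hypothetical values chosen only for illustration, not values derived from the datasets. The per-pixel loop relies on the standard library `colorsys` module and would be too slow for full 4000 × 3000 images, where a vectorized conversion would be preferable.

```python
import colorsys

import numpy as np


def vegetation_mask(rgb, hue_range=(60.0, 180.0), min_saturation=0.2):
    """Boolean mask of pixels whose hue is greenish and not washed out.

    `rgb` is an (H, W, 3) array with 8-bit values; `hue_range` is given in
    degrees. Both thresholds are illustrative assumptions.
    """
    rgb = np.asarray(rgb, dtype=float) / 255.0
    mask = np.zeros(rgb.shape[:2], dtype=bool)
    for idx in np.ndindex(mask.shape):
        h, s, _ = colorsys.rgb_to_hsv(*rgb[idx])
        hue_deg = h * 360.0
        mask[idx] = hue_range[0] <= hue_deg <= hue_range[1] and s >= min_saturation
    return mask


# A leaf-green pixel, a brownish river pixel and a grey roof pixel.
pixels = [[[40, 120, 30], [120, 100, 80], [128, 128, 128]]]
print(vegetation_mask(pixels).tolist())  # [[True, False, False]]
```

The brownish pixel is rejected by its hue (about 30°) and the grey pixel by its zero saturation, which mirrors the two kinds of consideration mentioned above.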

Different tree crown sizes

As mentioned in Section 3.1.4, the huge biodiversity of the rainforest means that, contrary to coniferous forests or monoculture forests, the trees can appear at many different scales. This would, in principle, suggest the use of techniques or algorithms with some degree of scale invariance. Of the two datasets provided, though, one presents fewer scale variations than the other. An example that shows this difference in scales can be seen in Figure 3.5.

An analysis performed on the image channels, like saturation and hue (see Section 3.3.2) is by itself scale invariant. Template matching approaches, or mathematical morphology ones (Section 5.2), and thus watershed segmentation (Section 5.2.3) are not inherently scale invariant.

Different textures

Another interesting feature that can be used to discriminate between different treetops is the different patterns made by their leaves. Leaves, and thus textures, can be grouped using some main characteristics, namely:

• Size

Leaves can vary from being really small, i.e. ≈ 1 pixel in diameter (in our datasets), to fairly big, i.e. ≈ 10 pixels in diameter.

• Shape

Leaves can be approximated by small ellipses of different eccentricity. At the extremes, they can be almost circular or very elongated.

• Density of leaves

Leaves can be sparsely or densely distributed on the branches of a tree.

• Density of clumps

Groups of leaves, e.g. those on the same branch, can be barely distinguishable when they are close together or overlapping, or clearly visible when they are far from each other.

Figure 3.5: Two trees that present widely different scales.

These patterns result in differently textured areas in the images. Both trees from different species and trees from the same species but with different sizes can exhibit tree crowns with peculiar textures as can be seen in Figure 3.6.

Roundness

Most tree crowns appear to have a quasi-circular shape when photographed from above. This shape-related cue appears to be one of the most prominent when the human visual system has to locate different trees in an image. Unfortunately the borders of such blob-like regions are not well defined, and are somewhat fuzzy, so a simple edge detection or blob detection is infeasible. This problem could be tackled with a pattern matching approach. Another interesting option to exploit this feature is the Mathematical Morphology framework, which we will see in detail in Section 5.2.

Light reflection

As mentioned in Sections 2.2.3 and 2.2.2, about Template Matching and Valley Following, by examining the grayscale converted image (or the Value channel of the HSV image), we can make some considerations about the relation between the intensity of the pixels and their position within or outside a tree crown. From an entirely qualitative analysis we notice that higher tree crowns reflect more light than lower tree crowns, while the background (when present) reflects almost no light. This relation between height and luminosity is expected, as higher trees receive more light and have no shadows cast upon them.

Figure 3.6: Different trees often exhibit different textures.

We notice, in general, that the average intensity is very low in the few background areas, i.e. the actual floor of the rainforest. Unfortunately the real background is hardly ever present in the images, as the canopy is very dense, overlapping, and often intertwined. However, we notice this luminosity behavior also between bigger and smaller trees, or higher and lower trees. As a result, most trees present darker areas around the borders of their crowns, as can be observed in Figure 3.7.

As we will see later in Chapter 4, due to the availability of only the RGB pixel intensities, the illumination model plays the most important role in the proposed solution for the problem. To exploit this illumination model, as for the roundness described before, we could use a pattern matching algorithm, or one important derivation of Mathematical Morphology: the watershed transform segmentation (see Section 5.2.3). A more detailed discussion about the illumination model, together with a way to exploit its structure, is carried out in Chapter 4.

3.3 Color Spaces

3.3.1 RGB color space

In the standard Red, Green and Blue (RGB) colorspace, each color is encoded with three coordinates and can be represented with a point in a cube (see Figure 3.8). The RGB encoding is not particularly suitable for a task like automatic tree crown delineation, because its channels are not illumination invariant. Naively, one may expect differences in the Green channel to give clues about different tree species, as most trees appear to the human eye in slightly different shades of green. This greenness, though, is subject to differences in the illumination and doesn’t really represent the shade of green. In fact, contrary to other colorspaces (like HSV/HSL or YCbCr), RGB doesn’t separate the chroma, i.e. the color information, from the luma, i.e. the intensity of the image. The Green channel could still be employed for other sub-tasks like tree/non-tree distinction.

Figure 3.7: Detail of a tree in grayscale (a) and inverted grayscale (b). It can be noted how the boundary regions of the crown have a different average intensity than the interior of the crown.

3.3.2 HSV color space

We decide to perform the channel analysis in a different colorspace. One of the most potentially interesting choices is the HSV (Hue, Saturation, Value) colorspace. As in RGB, the HSV encoding assigns three coordinates to each color, but since one of these, the hue, represents an angle, the colors lie in a cylinder instead of a cube (see Figure 3.9). We now proceed to describe the three channels and their relation to the features presented in Section 3.2.

Hue

As stated above, the hue is an angular dimension. It starts at 0°, corresponding to primary red, and then passes through 120°, corresponding to primary green, and 240°, corresponding to primary blue. The lossy JPEG compression described in Section 3.1.3 results in a blotchy appearance of the hue channel when displayed. Rectangular patches of roughly the same hue are indeed compressed by the JPEG algorithm and this explains the “blocky” appearance that can be seen in Figure 3.10 (b) and 3.11.

Figure 3.8: The RGB cube.

While this channel was expected to be the best to discern the greenness of different trees, an analysis of the dataset shows that most trees appear to be of roughly the same hue, as can also be seen in Figure 3.10 (b). Moreover, its intrinsic illumination invariance is not as strong as expected. In Figure 3.11 we can see that the shadows cast by some trees are present also in the hue channel.

Figure 3.10: Dataset 1 detail in RGB (a) and corresponding Hue channel with adjusted brightness (b). Both the uniformity of hue and the “blocky” compression artifacts can be seen in (b).

Saturation

On the HSV cylinder, the saturation of a color represents the distance of its corresponding point from the axis. Roughly speaking, saturation encodes the intensity of a certain hue. From an analysis of the channel, saturation seems to be potentially useful to discern between different species of trees, as some species present more saturated leaves than other species, as can be seen in Figure 3.12.

Value

The value represents the height of the color along the axis of the HSV cylinder. The value is closely related to the RGB colorspace, as it is computed, for each pixel, as max(R, G, B), i.e. the maximum among the three RGB channels. The final result is similar to the grayscale version of the image. This channel emphasizes the lighting of the picture, so, together with the grayscale image, it is a good channel to use for the illumination model discussed in Section 3.2.
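As a small illustration of the three HSV channels described above, the following Python sketch converts two pixels with the standard-library colorsys module. The pixel values and the helper name describe_pixel are assumptions made for this example and are not taken from the datasets. The sketch shows the idealized behaviour only: scaling all three RGB values by the same factor leaves hue and saturation unchanged and lowers only the value. Real images, as noted for Figure 3.11, do not follow this ideal exactly.

```python
import colorsys


def describe_pixel(r, g, b):
    """Convert an 8-bit RGB pixel to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Two hypothetical leaf pixels: the same green, once sunlit and once at half intensity.
sunlit = describe_pixel(60, 180, 40)
shaded = describe_pixel(30, 90, 20)

print("sunlit (hue, saturation, value):", sunlit)
print("shaded (hue, saturation, value):", shaded)
```

In this idealized case only the third coordinate changes between the two pixels, and it equals max(R, G, B) rescaled to the range [0, 1].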

Figure 3.11: Hue channel of the image in Figure 3.1b. It can be noted that the hue is not always illumination invariant. The dark areas correspond to the light green hues, while the light areas correspond to the strong shadows caused by strong direct illumination.

Figure 3.12: Dataset 1 detail in RGB (a) and corresponding Saturation channel (b). It can be seen that the saturation can be noisy, but informative.

Chapter 4

Problem modeling

4.1 Towards a model of the tree crowns

In this Chapter we build a conceptual model of the tree crowns and of the way their appearance departs from said model, taking into account the characteristics of the problem described in Chapter 3, especially the considerations on the light reflection. We propose an illumination model similar to the one outlined in Section 2.2.3, and we attempt to model the noise inherent to the tree crowns. With these models in mind, we can then partially cancel the noise, and ease the tree localization task.

4.2 Modeling the tree crowns

Considering the limitations that we highlighted in Sections 3.1.2 and 3.1.4, we know that our problem is not easily solvable using “standard” approaches such as the ones presented in Chapter 2. In particular, summarizing the limitations, our problem presents:

• Crowns at multiple scales;

• Non-conical trees;

• Multiple species;

• No multispectral information;

• No depth information.

From these limitations we understand that the only information we can rely upon comes from the visible light reflected by the tree crowns according to their shape. Our approach will conceptually be similar to a valley following method (see Section 2.2.2), but due to the non-conical appearance of the tree crowns, and due to the high canopy cover (see Section 3.1.4), these valleys will be hard to detect without a preprocessing phase.

We note that, following the reasoning presented in Section 2.2.3 about the three dimensional modeling of trees, we can approximate most of the tree crowns present in the dataset images as ranging from moderately oblate spheroids to slightly prolate spheroids. An oblate spheroid is an ellipsoid where the polar radius is shorter than the equatorial radius, while a prolate spheroid has the polar radius longer than the equatorial one (see Figure 4.1). From a visual examination of the datasets, it is evident that most of the trees are neither extremely oblate, nor as prolate as a cypress or a fir could be. For the purpose of this discussion we can then assume the shape of the tree crowns to be close to a sphere.

Figure 4.1: Oblate spheroid (left) and prolate spheroid (right).

If the tree crowns were smooth spheres with a surface producing both diffuse (Lambertian) and specular reflection, when lit by a quasi-zenithal light source they would present a specific illumination profile. The area where the specular reflection occurs, i.e. the one at the same angle as the light source, would be more illuminated, and the other parts would get progressively darker. If the spherical surface only had a diffuse reflection, its appearance when illuminated would be the same as that of a solid disk. In Figure 4.2 we show two computer generated spheres¹, one with both specular and Lambertian reflections, and the other with only Lambertian reflection, both illuminated from above, and seen from above. Figure 4.2 also shows the intensity profile of a scan line² for both images. Scan lines will be used in the remainder of this chapter to better illustrate the reflection models, because they can be interpreted visually more easily than a tridimensional topographic map.

Summarizing, we see that, for a smooth spherical surface with both specular and diffuse reflection, the three dimensional topographic intensity image will have a convex shape, with a clear “bump” in proximity of the highest point. For a purely diffuse reflecting sphere instead, the three dimensional topographic map would look like a cylinder. Naturally, as we will see in Section 4.3, neither of these overly simplistic models can explain the actual behaviour of the more complicated tree crowns when illuminated.

¹ All the 3D spheres have been generated using the ray tracing software POV-Ray.

² Row of the image matrix.

Figure 4.2: Scan lines of a smooth spherical surface: specular and diffuse (a) and only diffuse (b).

4.3 Modeling the noise

In this Section we try to understand why and how the actual morphology of a tree crown differs from the ideal model described in Section 4.2, and we propose a method to reduce this noise.

The most evident difference between the real tree crowns and the “dome” model is the fact that the surface of the tree crowns is neither smooth nor homogeneous. What we perceive as a tree crown is actually a surface composed of small parts, the leaves, that, as already analyzed in Section 3.2, reflect the light and generate complicated patterns and textures. We can therefore assume that the main difference between the illumination of an ideal dome and the illumination of a dome-shaped tree crown lies in the presence of the leaves, which make the “surface” of the tree crown rough.

According to Grant (1987), “leaves are neither purely diffuse nor purely specular” reflectors, but present both kinds of reflection due to their structure and chemical composition. A diffuse reflection occurs when the light hitting the leaf surface is reflected at many different angles, while a specular reflection consists of light reflected at an angle equal to that of the incident light. Even though both reflections are present, we can assume that, because each leaf is oriented in a slightly different way, the specularly reflected light is scattered at many different angles. This would lead the crown to appear as if all, or most, of its reflection was diffuse. If we look at the scan line of a real crown (Figure 4.3), in fact, we see that the profile doesn’t seem to follow the roundness of the one seen in Figure 4.2a, but looks more “rectangular”, like the one of a purely Lambertian spherical surface (Figure 4.2b), albeit with noise.

If we add a dented texture to the spheres we can simulate the apparent roughness of the leaves and their “faceted” nature. In Figure 4.4 we show two more computer generated spheres, this time with a random pattern of small dents. As in the previous case, the left sphere presents a combined specular and Lambertian reflection, while the right sphere presents an exclusively Lambertian reflection. As we can see, the two intensity profiles are very similar, and the specularity doesn’t seem to play a big role when the surface is so dented.

From these considerations we understand that our reference model must be closer to a purely diffuse spherical reflector than to a specular one. This implies that approaches that involve local maxima filtering (see Section 2.2.1) are unlikely to work well. Valley following methods, on the other hand, can still be applied. The fundamental pre-processing step before attempting a valley following approach must be to identify the noise and reduce it. At this stage we can identify two fairly different kinds of noise:

• Light scattering: the leaves have many different orientations, thus the normal of the surface is often discontinuous. The leaves scatter the light in many different directions compared to a smooth dome.

• Light absorption: the tree crowns usually present branches arranged in a dome-like fashion. Branches usually have some separation between them, and this results in “holes” in the dome structure. These holes trap the light, and appear darker than the actual surface of the dome.

These two different kinds of reflection patterns induce two different kinds of noise in the intensity image. The first kind, generated by the scattering of light due to the random orientation of leaves, has a stochastic nature typical of Gaussian noise. The second kind instead, caused by the pits in the crowns, cannot be modeled in the same way. This latter noise has a more morphological connotation, as we will see in Chapter 5.

Figure 4.4: Scan lines of a dented spherical surface: specular and diffuse (a) and only diffuse (b).

Chapter 5

Noise reduction

5.1 Two kinds of noise

Following the discussion of the noise model of Section 4.3, in this Chapter we introduce the tools to deal with those two kinds of noise. One of them, generated by the reflection patterns of the leaves, is stochastic in nature, and can be reduced using standard Gaussian filtering. The other one, caused by the dark gaps within each crown (the intra-crown gaps between branches), is better dealt with using a framework called Mathematical Morphology.

5.2 Introduction to Mathematical Morphology

The word morphology comes from Ancient Greek, meaning “the study of shapes”, and, historically, describes a branch of biology that deals with the study of the structure, shape and form of living things. Mathematical morphology (MM), although unrelated to biology, deals with the study and processing of geometrical structures in images.

MM refers to a theory and technique developed by Matheron and Serra (Serra, 1983), mostly based on set theory and lattice theory. MM provides a set of useful tools for image analysis and processing. The original MM was designed for binary images, where pixels can only assume two values, i.e. 0 and 1, but many greyscale extensions that follow the same principles were formulated during the 1970s.

The basic idea behind MM is to “probe” an image with a certain shape, called structuring element. Depending on which morphological operator is used, this probe acts on the image in a different way. The two basic operators are dilation and erosion, and from them the opening and closing operators are derived (see Section 5.2.1). After a morphological operation is carried out, the image is transformed.

In Section 5.2.1 we introduce, in a qualitative fashion, the basic operations of MM, needed to better understand the morphological smoothing process, while in Section 5.2.2 the grayscale extension is explained. A rigorous discussion would be out of the scope of this Thesis, and can be found in many tutorials and textbooks. For a more technical and mathematically sound description of the theory behind MM the reader can refer, for example, to the work of Maintz (2005).

5.2.1 Binary Mathematical Morphology

Binary MM deals with binary images, i.e. images that are only composed of two “levels”: a background (black), and a foreground (white). A binary image can thus be represented by a matrix whose elements correspond to the pixels of the image. The elements corresponding to background pixels have value 0 and the ones corresponding to foreground pixels have value 1. We will call the black pixels OFF, and the white pixels ON. A structuring element is, in this framework, another binary image, smaller than the original image, with a certain pixel (usually the central one) defined as its origin.

In this Section we will give a geometrical interpretation of binary MM. In order to understand the basic morphological operations without dwelling on the equations related to set theory, we have to define the concepts of Hit and Fit. When we probe an image with a structuring element, we place its origin pixel on every pixel of the image in turn. If at least one ON pixel of the structuring element overlaps an ON pixel of the image, we say that the structuring element Hits. If all the ON pixels of the structuring element overlap ON pixels of the image, we say it Fits. We can now describe the four fundamental morphological operations in simple terms. The images describing the four operators were created by Renato Keshet¹.

Dilation

The dilation of an image I by a structuring element S is a new image, where the pixel located at (x, y) is ON if S, centred at the same pixel and mirrored², hits I (Equation 5.1). This results in a “thickening” of the foreground objects. Dilation makes small objects grow to at least the size of the structuring element, and can fill sufficiently small holes. We indicate the dilation operation with the symbol ⊕.

¹ https://en.wikipedia.org/wiki/User:Renatokeshet

² The mirroring is not relevant if the structuring element is symmetric, as in the case of a disk or a square, as long as its origin pixel is in the center.

$$
I(x, y) \oplus S =
\begin{cases}
1 & \text{if } S \text{ hits } I \\
0 & \text{otherwise}
\end{cases}
\tag{5.1}
$$

Figure 5.1: Dilation of a square by a disk.

Erosion

The erosion of an image I by a structuring element S is a new image, where the pixel at (x, y) is ON if S, centred at the same pixel, fits I (Equation 5.2). This results in a “thinning” of the foreground objects³. Generally speaking, erosion can remove structures smaller than the structuring element, and open gaps between objects. We indicate the erosion operation with the symbol ⊖.

$$
I(x, y) \ominus S =
\begin{cases}
1 & \text{if } S \text{ fits } I \\
0 & \text{otherwise}
\end{cases}
\tag{5.2}
$$

Figure 5.2: Erosion of a square by a disk.

³ Erosion and dilation are dual operations: it can be proved that an erosion of the foreground is equivalent to a dilation of the background (the complement of the image), and vice versa.

Opening

The opening operation consists of an erosion followed by a dilation with the same structuring element. Conceptually, an opening can be thought of as the union of all the placements of the structuring element that are entirely contained in the object (inpainting). The effects of an opening are a smoothing of the contours and an elimination of thin protrusions. We indicate the opening operation with the symbol ◦. The opening operation is idempotent, and thus subsequent openings with the same structuring element don’t change the resulting image.

$$
I \circ S = (I \ominus S) \oplus S \tag{5.3}
$$

Figure 5.3: Opening of a square by a disk.

Closing

The closing operation consists of a dilation followed by an erosion with the same structuring element. Conceptually, a closing can be thought of as the set of points that cannot be reached by sliding the structuring element outside the objects (outpainting). Closing, as the name implies, results in the closure of holes and thin gulfs of the objects. We indicate the closing operation with the symbol •. The closing operation is also idempotent.

$$
I \bullet S = (I \oplus S) \ominus S \tag{5.4}
$$

Figure 5.4: Closing of a shape by a disk.
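The Hit and Fit definitions above translate almost literally into code. The following Python sketch is an illustration only, not the implementation used in this thesis: the function names, the representation of the structuring element as a list of offsets from its origin, and the two toy images are assumptions of the example. Pixels outside the image are treated as OFF, so the objects are kept at least two pixels away from the border to avoid border effects.

```python
def is_on(img, y, x):
    """True if (y, x) lies inside the image and is ON; outside counts as OFF."""
    return 0 <= y < len(img) and 0 <= x < len(img[0]) and img[y][x] == 1


def dilate(img, se):
    """A pixel is ON if the mirrored structuring element, placed there, hits the image."""
    return [[int(any(is_on(img, y - dy, x - dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]


def erode(img, se):
    """A pixel is ON if the structuring element, placed there, fits the image."""
    return [[int(all(is_on(img, y + dy, x + dx) for dy, dx in se))
             for x in range(len(img[0]))] for y in range(len(img))]


def opening(img, se):
    return dilate(erode(img, se), se)


def closing(img, se):
    return erode(dilate(img, se), se)


# 3x3 square structuring element, given as offsets from its central origin.
SQUARE3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

# A solid 3x3 square of ON pixels inside a 7x9 image.
solid = [[int(2 <= y <= 4 and 2 <= x <= 4) for x in range(9)] for y in range(7)]

speck = [row[:] for row in solid]
speck[3][6] = 1  # add an isolated ON pixel next to the square

holed = [row[:] for row in solid]
holed[3][3] = 0  # punch a one-pixel hole in the square

print("opening removes the speck:", opening(speck, SQUARE3) == solid)
print("closing fills the hole:", closing(holed, SQUARE3) == solid)
```

The opening discards the isolated pixel because the structuring element never fits around it, while the closing fills the hole because the structuring element cannot be placed inside it.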

5.2.2 Grayscale extension

Morphological analysis was originally developed for binary images, but its formulation has been extended to deal with grayscale images, as well as grayscale structuring elements. In grayscale images every pixel can range from being black to being white, taking a finite number of values between the two extremes. Grayscale images can be viewed as scalar functions and, as said in Section 2.2.2, represented as topographic maps, where the plane defined by coordinates x and y is the image plane, and the third coordinate z represents the intensity, i.e. the gray value, of the pixels.

In this framework we can formulate the extensions of dilation and erosion operators, and thus the opening and closing ones. It is possible, in fact, to introduce grayscale morphology using exactly the same geometrical concepts from binary morphology.

The tridimensional topographic map representation of an image is a surface that partitions the space in two volumes. Everything that is lower or equal to the surface is called the umbra. Intuitively the umbra represents the portion of space below the surface generated by the image. Thinking about a grayscale image in this way, we can consider it to be a tridimensional binary set, where we can use the same operations defined for binary morphology.

To better understand the analogy, we can look at a one-dimensional signal (that can ideally represent a row of a grayscale image, as seen in Chapter 4). A one-dimensional signal can be used to define a two-dimensional binary set, where the umbra of the signal is black, and the complement of the umbra is white. This process is illustrated in Figure 5.5. It follows that to perform grayscale morphological operations we have to perform binary morphology on the umbra of the surface generated by the image.

Figure 5.5: A one dimensional signal can be transformed in a binary image using the concept of umbra.

When the signal is two-dimensional, as a grayscale image, the umbra will define a tridimensional binary set. We can now apply the same geometrical reasoning to perform the basic binary morphological operations. The main difference is that, going from a bidimensional set to a tridimensional set, also the structuring elements will be tridimensional. In this setting, structuring elements can be flat (like a cylinder, see Section 5.3.3), or non-flat (like a paraboloid, see Section 5.3.2).

An important thing to notice is that in this extension, the operations of opening and closing still have the same geometrical interpretation as in binary morphology. Opening represents an “inpainting” of the umbra with a tridimensional structuring element, as if the structuring element was “pushed” from below the topographic surface. Closing represents an “outpainting” instead, as if the structuring element was “pushed” from above the topographic surface.
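For a flat structuring element, operating on the umbra reduces to taking a running maximum (dilation) or a running minimum (erosion) of the signal. The following Python sketch illustrates this on a one-dimensional scan line. It is a simplified example rather than the thesis implementation: the function names and the toy profile are assumptions, and windows are simply truncated at the borders.

```python
def flat_dilate(signal, radius):
    """Grayscale dilation by a flat segment: running maximum."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def flat_erode(signal, radius):
    """Grayscale erosion by a flat segment: running minimum."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def flat_open(signal, radius):
    """Opening: the segment is pushed against the profile from below."""
    return flat_dilate(flat_erode(signal, radius), radius)


def flat_close(signal, radius):
    """Closing: the segment is pushed against the profile from above."""
    return flat_erode(flat_dilate(signal, radius), radius)


# Toy scan line: a plateau with one narrow pit (value 1) and one narrow peak (value 9).
profile = [5, 5, 5, 1, 5, 5, 9, 5, 5, 5]

print("profile:", profile)
print("closed: ", flat_close(profile, 1))  # the pit is filled, the peak survives
print("opened: ", flat_open(profile, 1))   # the peak is removed, the pit survives
```

The two outputs mirror the geometrical interpretation given above: the closing cannot enter the narrow pit from above, and the opening cannot enter the narrow peak from below.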

5.2.3 Watershed transform

The watershed transform, introduced by Digabel and Lantuéjoul (1978) and perfected by Beucher and Lantuéjoul (1979), is an image segmentation algorithm developed in the framework of grayscale mathematical morphology. As already stated, a grayscale image can be seen as a tridimensional topographic surface, with the intensity of each pixel mapped to the z axis. This surface will present ridges and valleys.

The intuitive explanation of the watershed algorithm is to progressively flood the topographic version of the image and mark the lines that prevent two different “catchment basins” from merging. These lines act as watersheds, hence the name of the algorithm.

This algorithm is clearly suitable to exploit the lighting model described in Section 4.2. If we take the negative version of a grayscale image of the dataset, as we can see in Figure 3.7 (b), we can identify the trees as “dark” catchment basins and the lighter areas between the crowns as ridges, or watersheds.
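The flooding idea can be sketched on a one-dimensional profile. The Python code below is a deliberately simplified illustration, not the algorithm of Beucher and Lantuéjoul nor the MATLAB routine used for the experiments: the function name, the level-by-level sweep and the handling of plateaus are assumptions of this example. Samples are flooded from the lowest level upwards. A sample reached by a single basin joins it, a sample reached by two different basins becomes a watershed (label 0), and samples reached by no basin start a new one.

```python
def watershed_1d(signal):
    """Label the catchment basins of a 1-D profile; label 0 marks a watershed sample."""
    n = len(signal)
    labels = [None] * n
    next_label = 1
    for level in sorted(set(signal)):
        pending = [i for i in range(n) if signal[i] == level]
        changed = True
        while changed:  # grow the basins flooded so far into this level
            changed = False
            for i in pending:
                if labels[i] is not None:
                    continue
                near = {labels[j] for j in (i - 1, i + 1) if 0 <= j < n and labels[j]}
                if near:
                    labels[i] = near.pop() if len(near) == 1 else 0
                    changed = True
        for i in pending:  # samples still unlabelled are new regional minima
            if labels[i] is None:
                if i > 0 and signal[i - 1] == level and labels[i - 1]:
                    labels[i] = labels[i - 1]  # same plateau as the left neighbour
                else:
                    labels[i] = next_label
                    next_label += 1
    return labels


# Hypothetical inverted scan line: two dark basins (crowns) separated by a bright ridge.
inverted = [3, 1, 2, 5, 2, 0, 4]
print(watershed_1d(inverted))
```

On this toy profile the ridge sample of height 5 is the only one reached by both basins, so it is the only one marked as a watershed.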

5.3 Smoothing the images

Now that we have described the mathematical morphology framework, we can outline a way to dampen the noise described in Section 4.3. According to that noise model, the noise patterns that hinder a proper segmentation of the tree crowns in the images are due to two different phenomena: the scattering of light due to the random orientation of the leaves, and the absorption of light due to the space between branches of the same crown. The aim of the noise removal, on a qualitative level, is to obtain images where the tree crowns are sufficiently uniform while keeping the valleys between them evident. In this way, segmentation techniques based on valley following will be able to better separate the tree crowns, ideally without segmenting each branch, or each clump of leaves, as a single crown. As stated in Section 4.3 we need to deal with these two kinds of noise in different ways.

5.3.1 Reducing the stochastic leaf-level noise

The light reflected from the leaves is scattered in a stochastic way. Leaves can appear slightly brighter or darker than they should, due to their specular reflection and the slightly random angle their surface makes with the light direction and the viewer direction. The amount of light reflected can vary gradually from a maximum, when the viewer is aligned with the specular reflection, to a minimum, when the viewer is orthogonal to the specular reflection (and only receives the diffuse one).

For this kind of noise, an averaging filter can help smooth the topographic surface, eliminating the spikes generated by the light scattering. For this task we convolve the image with a Gaussian filter with a relatively small sigma. The scale of the filter is set empirically by visually examining the average scale of the leaves. The result of applying a Gaussian smoothing to an image from the dataset can be seen in Figures 5.7 and 5.8. A comparison between a scan line of the original grayscale image, and the same scan line of the Gaussian smoothed image can be seen in Figure 5.9.

We can see the effect of the Gaussian smoothing on a patch of leaves in Figure 5.6. As we can see in Figure 5.6 (a) and 5.6 (b), the intensity distribution in a patch uniformly containing leaves is slightly skewed towards high intensities. This is due to most leaves being oriented perpendicular to the sun. The mean (and the mode) of the distribution, though, is not entirely pushed towards the highest intensities, as at least two other effects are present. On the one hand, leaves cast small shadows on neighbouring leaves, which will reflect less light even if in optimal position. On the other hand, small spaces between leaves result in slightly darker patches. Overall, even if not exactly zero-mean, the noise induced by the leaves can be reduced by a Gaussian filtering that, as shown in Figure 5.6 (c) and 5.6 (d), only slightly reduces the mean of the intensities, while greatly reducing the variance.

Figure 5.6: A patch of leaves (a) and its corresponding grayscale intensity histogram (b). The same patch after Gaussian smoothing (c) and its corresponding histogram (d).
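The variance reduction discussed for Figure 5.6 can be reproduced on a synthetic scan line. The Python sketch below is only an illustration under stated assumptions: the leaf-level noise is modelled as zero-mean Gaussian jitter around a constant crown brightness (the real distribution is slightly skewed, as noted above), the kernel is truncated at three sigma, border samples are replicated, and all names and parameter values are hypothetical.

```python
import math
import random


def gaussian_kernel(sigma, truncate=3.0):
    """Sampled Gaussian weights, normalised so that they sum to one."""
    radius = int(truncate * sigma + 0.5)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(signal, sigma):
    """Convolve a 1-D signal with a Gaussian kernel, replicating the border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


random.seed(0)
# Hypothetical scan line over a uniformly lit crown with leaf-level jitter.
crown = [100.0 + random.gauss(0.0, 10.0) for _ in range(200)]
smoothed = gaussian_smooth(crown, sigma=2.0)

print("variance before smoothing:", variance(crown))
print("variance after smoothing: ", variance(smoothed))
```

Because the kernel weights sum to one, a perfectly uniform patch is left unchanged, and only the fluctuations around the local mean are attenuated.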

5.3.2 Reducing the morphological branch-level noise

Figure 5.7: Gaussian smoothing, with σ = 10. Image and scan line.

Figure 5.9: Scan lines comparison between original image (light blue) and subsequently Gaussian smoothed image (red).

As stated in Section 4.3, the noise generated by the space between branches belonging to the same crown, or between clumps of leaves belonging to the same branch, has a different nature compared to the one generated by the reflection of the leaves. The “pits” generated by these intra-crown gaps appear as dark areas in the crowns. These holes are better dealt with using mathematical morphology, as they cannot simply be “averaged” out by a Gaussian filter. The absorption of light from these pits has a non-linear nature, and their intensity is, in fact, not related to the average intensity of the crown surface (contrary to what happens for the leaf-level noise). In order to “close” these gaps, we perform a closing of the Gaussian smoothed image with a sufficiently narrow parabolic structuring element (see Figure 5.10). The choice of a parabolic structuring element is motivated by the analytical and morphological properties of that function. Its fundamental properties that make it unique (Boomgaard, 1992) are:

• It is rotationally symmetric;

• It is dimensionally decomposable.

The first property ensures that the structuring element will not favor any direction when applied in morphological operations over an image. The second property allows a drastic reduction of the computational time of those operations. It is important to note that the decomposition of an n-dimensional paraboloid into the dilation of n one-dimensional parabolas is exact (see Figure 5.11). This is not the case for other structuring elements, which can only be decomposed approximately (see Section 6.2). The parabolic structuring element has been proven to be the morphological analogue of the Gaussian kernel for convolutions (Boomgaard, 1992). For this reason, the formula used to build the paraboloid is the one shown in Equation 5.5, where the spread of the curve is given by the parameter σp.

$$
z = \frac{x^2 + y^2}{2\sigma_p^2} \tag{5.5}
$$

As stated before, the closing operation in this framework can be geometrically described as lowering a paraboloid (the structuring element) over the topographic map representing the smoothed image. Intuitively, if the paraboloid is “narrow” enough, it will fit in the valleys separating the tree crowns, but it will not fit in the intra-crown gaps. This will effectively close the gaps, while keeping the valleys almost intact. Figures 5.12 and 5.13 show the effect of the closing operation using the parabolic structuring element depicted in Figure 5.10. Moreover, a comparison between a scan line of the Gaussian smoothed image and the same scan line of the closed image can be seen in Figure 5.14.


Figure 5.10: Parabolic structuring element.


Figure 5.12: Morphological closing with parabolic structuring element. Image and scan line.

Figure 5.13: Morphological closing with parabolic structuring element. 3D topographic map.


Figure 5.14: Scan lines comparison between Gaussian smoothed image (blue) and subsequently morphologically closed image (red).
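The behaviour of the parabolic closing can be illustrated on a one-dimensional profile, using the one-dimensional form of Equation 5.5 as structuring function. The Python sketch below is a brute-force illustration, not the implementation written for this thesis: the function names, the toy profile and the value of the spread are assumptions, and the parabola is evaluated over the whole signal instead of a truncated support.

```python
def parabola(offset, sigma_p):
    """Height of the parabolic structuring function (1-D form of Equation 5.5)."""
    return offset * offset / (2.0 * sigma_p * sigma_p)


def parabolic_dilate(signal, sigma_p):
    """Grayscale dilation: maximum of the signal lowered by the parabola."""
    n = len(signal)
    return [max(signal[j] - parabola(i - j, sigma_p) for j in range(n)) for i in range(n)]


def parabolic_erode(signal, sigma_p):
    """Grayscale erosion: minimum of the signal raised by the parabola."""
    n = len(signal)
    return [min(signal[j] + parabola(i - j, sigma_p) for j in range(n)) for i in range(n)]


def parabolic_close(signal, sigma_p):
    """Closing: dilation followed by erosion with the same parabola."""
    return parabolic_erode(parabolic_dilate(signal, sigma_p), sigma_p)


# Hypothetical scan line: a crown with a narrow pit (index 5), a wide valley
# (indices 11 to 17), and the beginning of a second crown.
profile = [10] * 5 + [4] + [10] * 5 + [6, 3, 1, 0, 1, 3, 6] + [10] * 5
closed = parabolic_close(profile, sigma_p=1.0)

print("pit before and after:   ", profile[5], closed[5])
print("valley before and after:", profile[14], closed[14])
```

With this choice of spread the one-sample pit is almost completely filled, while the wide valley keeps most of its depth, which is the behaviour sought between branches and between crowns respectively. A wider parabola would start filling the valley as well.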

5.3.3 An additional morphological operation

The watershed transform described in Section 5.2.3 is very sensitive to shallow local minima, and thus, even after the morphological closing, still fails to properly segment the image. What we aim for is an image where the tree crowns appear mostly “flat”. In order to generate a suitable topographic map for the segmentation, we then need an additional morphological step.

We perform a morphological opening, but this time with a flat, disk shaped, structuring element. In grayscale morphology though, a flat disk is actually the face of a cylindrical tridimensional structuring element. This way of picturing the structuring element used for the opening is in accordance with the almost flat illumination model of a tree crown described in Section 4.2. The radius of the structuring element is chosen to be close to the radius of the average tree crown in the dataset. This operation will lower the intensity of the tree crowns (due to the inpainting), but will eliminate most of the residual small peaks that survived the closing operation. Figures 5.15 and 5.16 show the effect of the opening operation using a flat disk shaped structuring element. As before, a comparison between a scan line of the closed image and the same scan line of the opened image can be seen in Figure 5.17.

Figure 5.15: Morphological opening with flat disk shaped structuring element. Image and scan line.

Figure 5.16: Morphological opening with flat disk shaped structuring element. 3D topographic map.

Figure 5.17: Scan lines comparison between morphologically closed image (blue) and subsequently morphologically opened image (red).

Chapter 6

Experiments

6.1 A summary of the pipeline

Before showing the results of the watershed segmentation, we present a brief summary of the pipeline outlined in Chapter 5:

• The chosen image is converted to grayscale;

• It is blurred by convolution with a Gaussian filter to reduce the leaf-level noise;

• It is morphologically closed with a parabolic structuring element to reduce the branch-level noise;

• It is morphologically opened with a flat, disk-shaped structuring element to eliminate the leftover shallow minima;

• A watershed segmentation is performed on the opened image.
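The first two steps of this list can be sketched on a single image row. The following Python sketch is only illustrative: the BT.601 luminance weights, the kernel truncation at three standard deviations, the border replication and the sample pixel values are all assumptions, since the thesis does not specify these details here.

```python
import math


def to_grayscale(rgb_row):
    """Convert a row of (R, G, B) tuples to luminance (BT.601 weights, assumed)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_row]


def gaussian_kernel(sigma):
    """Sampled Gaussian truncated at 3 * sigma and normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(signal, sigma):
    """Convolve a 1D signal with the Gaussian kernel, replicating border samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the signal
            acc += weight * signal[j]
        smoothed.append(acc)
    return smoothed


# Hypothetical row of foliage pixels with strong leaf-level variation.
rgb_row = [(40, 90, 30), (60, 140, 50), (35, 80, 25), (70, 150, 55), (45, 95, 35), (65, 145, 50)]
gray = to_grayscale(rgb_row)
smooth = gaussian_smooth(gray, sigma=1.0)
```

The smoothed row has a smaller intensity range than the grayscale row, which is the intended suppression of leaf-level noise before the morphological steps are applied.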

6.2 Technical remarks and optimizations

The pipeline is implemented in MATLAB, with morphological functions available from the Image Processing Toolbox. The parabolic structuring element is not directly available and has been implemented by the author. To keep the processing efficient and the computational times reasonable while working on the dataset at full resolution (see Section 3.1.1), two optimizations proved essential:

• The parabolic structuring element employed in the closing step is decomposed exactly into two orthogonal structuring elements. This drastically reduces the computational time, which can drop to as little as 4% of the original.

• The cylindrical structuring element is approximated by a composition of 10 smaller structuring elements (see Figure 6.1). This decomposition, provided by the MATLAB Image Processing Toolbox, reduces the computational time of the opening operation to as little as 5% of the original.

Figure 6.1: Approximated decomposition of cylindrical structuring element.
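The exact decomposition of the parabolic structuring element relies on the fact that a paraboloid is the sum of a parabola in x and a parabola in y, so a two-dimensional dilation can be computed as a row pass followed by a column pass. The Python sketch below checks this equivalence on a small grid. It is not the MATLAB implementation of the thesis; the parabola parameterization, the function names and the test image are assumptions, and for simplicity the structuring function spans the whole image.

```python
def parabola(offset, scale):
    """Parabolic structuring function (assumed form): 0 at the centre, negative elsewhere."""
    return -(offset ** 2) / (2.0 * scale ** 2)


def dilate_2d_direct(image, scale):
    """Brute-force dilation with the full two-dimensional paraboloid."""
    rows, cols = len(image), len(image[0])
    return [[max(image[y][x] + parabola(r - y, scale) + parabola(c - x, scale)
                 for y in range(rows) for x in range(cols))
             for c in range(cols)]
            for r in range(rows)]


def dilate_1d(line, scale):
    """Dilation of a single row or column with a one-dimensional parabola."""
    n = len(line)
    return [max(line[j] + parabola(i - j, scale) for j in range(n)) for i in range(n)]


def dilate_2d_separable(image, scale):
    """Row pass followed by column pass; equivalent to the direct dilation."""
    row_pass = [dilate_1d(row, scale) for row in image]
    column_pass = [dilate_1d(list(col), scale) for col in zip(*row_pass)]
    return [list(row) for row in zip(*column_pass)]  # transpose back to rows


# Small deterministic test image (hypothetical values).
image = [[(3 * r + 5 * c) % 11 for c in range(6)] for r in range(5)]
direct = dilate_2d_direct(image, scale=1.5)
separable = dilate_2d_separable(image, scale=1.5)
```

For a structuring element of width k, the direct form examines on the order of k² neighbours per pixel and the separable form on the order of 2k, which is the source of the speed-up. The same argument applies to the erosion, and hence to the closing.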

6.3 Parameters of the pipeline

Following the description of the pipeline in Section 5.3, we identify three main parameters of the pipeline, one relative to each step, but all related to the scale:

• Gaussian smoothing: the parameter σg of the Gaussian filter, i.e. the scale of the smoothing;

• Morphological closing: the parameter σp of the paraboloid (see Equation 5.5), i.e. the scale of the parabolic closing;

• Morphological opening: the radius r of the cylinder, i.e. the scale of the cylindrical opening.

We performed several experiments by varying these parameters on images taken from both datasets. Because no ground truth was provided for the problem (see Section 3.1.1), the results are evaluated qualitatively, through visual inspection of the segmentations.
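A parameter sweep of this kind can be organised as a grid over the three scale parameters. The snippet below is only a sketch of such bookkeeping: the variable names and all numeric values are hypothetical and are not the settings used in the thesis.

```python
from itertools import product

# Hypothetical candidate values, in pixels; not the values used in the thesis.
sigma_g_values = [2.0, 4.0]          # scale of the Gaussian smoothing
sigma_p_values = [5.0, 10.0, 20.0]   # scale of the parabolic closing
radius_values = [15, 30]             # radius of the cylindrical opening

# One dictionary per combination of the three scale parameters.
parameter_grid = [
    {"sigma_g": sigma_g, "sigma_p": sigma_p, "radius": radius}
    for sigma_g, sigma_p, radius in product(sigma_g_values, sigma_p_values, radius_values)
]
```

Each entry of `parameter_grid` would be passed to the pipeline in turn, and the resulting segmentation inspected visually, since no ground truth is available for a quantitative score.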

6.4 Watershed segmentations of processed images

After the opening with a cylindrical structuring element described in Section 5.3.3, we can proceed to use the watershed segmentation to approximately detect the

Contents

List of Figures

Chapter 1 Introduction

1.1 Remote sensing for forest surveying

1.2 Conservation Drones

1.3 Aim of the project

1.4 Outline of the thesis

Chapter 2 Related Works

2.1 Introduction to Tree Crown Detection

2.2 Previous works on Tree Crown Detection

2.2.1 Local Maxima (LM)

2.2.2 Valley Following (VF)

2.2.3 Template Matching (TM)

2.2.4 Region Growing (RG)

2.2.5 Summary of related research

Chapter 3 Problem Analysis

3.1 Characteristics of the problem

3.1.1 The datasets

3.1.2 Missing multispectral and depth information

3.1.3 Image compression

3.1.4 Rainforest topology

3.2 Salient features to address

3.3 Color Spaces

3.3.1 RGB color space

3.3.2 HSV color space

Chapter 4 Problem modeling

4.1 Towards a model of the tree crowns

4.2 Modeling the tree crowns

4.3 Modeling the noise

Chapter 5 Noise reduction

5.1 Two kinds of noise

5.2 Introduction to Mathematical Morphology

5.2.1 Binary Mathematical Morphology

5.2.2 Grayscale extension

5.2.3 Watershed transform

5.3