
Road Surface Damage Classification using LiDAR and Convolutional Neural Nets

submitted in partial fulfillment of the requirements for the degree of master of science

Sofia Tilon

10246673

master information studies

data science

faculty of science

university of amsterdam

2018-07-19

Internal Supervisor: Dr. Thomas Mensink (UvA, FNWI, IvI)
External Supervisor: Dr. Samuel Blake (Hal24K)


Road Surface Damage Classification using LiDAR and Convolutional Neural Nets

Sofia Tilon

University of Amsterdam
sofiatilon@gmail.com

Abstract

Faster road damage classification would decrease the time it takes to go from road inspection to road maintenance, improving road safety and decreasing maintenance costs. Road damages are characterized by geometric features, which can be derived from detailed point clouds sampled using a Light Detection and Ranging (LiDAR) sensor. It is expected that road damage shapes in point clouds can be learned using machine learning. Point cloud data is sampled from the A6 highway in The Netherlands and labeled segments are derived. Gridnet discretizes the segments and applies a 3D CNN for classification. The computational needs of Gridnet did not allow for extensive testing, and the conclusion is drawn that this method is therefore unfit for road damage classification. Pointnet++ consumes a raw unordered point cloud segment and is able to learn from local features while being robust to non-homogeneously distributed points. Its results improved on dummy classifiers and on results from preliminary studies using only high resolution images. The question remains whether Pointnet++ indeed learned from geometric shapes or whether it learned to recognize the shape of manually proposed damage regions. Further testing should answer this question, which in turn should lead towards the next step in faster road damage classification.

Keywords Road Damage, maintenance, Light Detection and Ranging, 3D, Convolutional Neural Nets, Gridnet, discretization, PointNet++

1 Introduction

Approximately 80% of Dutch highways are covered with Road Safety and Drain Asphalt (ZOAB). This type of porous asphalt consists of a base layer and a top layer composed of coarse-grain and fine-grain aggregates (Figure 1). The open space between aggregates makes it capable of efficient drainage, which reduces aquaplaning and therefore increases safety. In addition, it reduces noise pollution from tire and road surface contact [25]. Natural causes (erosion or weathering) and man-made wear and tear deteriorate the state of the asphalt. The most common type of road damage, accounting for almost 90% of observed damages, is raveling, where aggregates are lost over a long stretch of road surface, resulting in a shallow decrease of road height [1].

Figure 1. Schematic ZOAB cross-section [19]

Damages to the top layer should be repaired as soon as possible, as they can impair drivers' safety. However, repairing ZOAB asphalt is costly. Currently, it takes approximately six months to go from visual inspection to maintenance [8]. Therefore, detecting road damages, and especially raveling, earlier could save financial resources. Moreover, monitoring and maintenance resources can be better allocated if the time and location of maintenance can be precisely estimated.

Damage to asphalt can be characterized using geometric features [18]. Raveling, for example, is defined as the displacement of surface aggregates and is quantified in square meters. Potholes are defined as bowl-shaped holes of various sizes, also quantified in square meters. Longitudinal cracking is defined as cracking parallel to the pavement and transverse cracking as cracking perpendicular to it; both are quantified in meters. All damages can occur in different severities and may therefore be deeper or larger [18].

Pavemetrics has shown that the geometric features of road damages can be derived from laser scans sampled using a Light Detection and Ranging (LiDAR) sensor. LiDAR is an active sensor which can measure the distance to and location of an object's surface by measuring the time it takes to receive a return signal from an emitted laser pulse.

Recognizing and localizing objects in images using Region-based Convolutional Neural Nets (R-CNNs) is the new standard in machine learning when it comes to classifying images or objects in images [5]. In recent and ongoing research, the R-CNN concept has been applied to LiDAR data to detect and classify objects such as cars or pedestrians in 3D space, with each study improving on the scores of state-of-the-art 3D object detection benchmarks such as KITTI [3, 4, 13, 24, 27].

Considering these advances, it is expected that CNNs could recognize and localize road damages from LiDAR data, since these also possess geometric patterns. Classifying road surface damages using CNNs is expected to reduce the time and effort needed to detect and maintain road damages. Therefore, the main objective of this research is to investigate whether a combination of laser scanning and CNNs can classify road damage types.

The main research question is defined as follows: can road damage be classified using LiDAR data and CNNs? For this purpose, two CNNs based on different methodologies are applied and compared. Gridnet applies a 3D CNN on discretized point clouds, while Pointnet++ consumes raw unprocessed point clouds [22]. The main difference between Gridnet and Pointnet++ is that Gridnet requires extensive preprocessing while Pointnet++ does not. It is expected that less preprocessing will better preserve the geometric features of the damage shapes. Moreover, volumetric representations are notorious for being computationally expensive [22]. Therefore, it is expected that Pointnet++ will provide higher mean classification accuracies.

2 Light Detection and Ranging

LiDAR is an active remote sensing sensor. It actively emits laser pulses and measures the distance to an object's surface by measuring the return time of the reflected laser pulses [11]. The coordinates of the corresponding point can be derived from the distance, the laser pulse speed and the azimuth and zenith angle of the laser beam [7]. Therefore, this method is able to capture its surroundings in a highly detailed manner.
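As an illustration of this geometry (a standard formulation, not spelled out in [7]): the range follows from the round-trip time of the pulse, and a point's Cartesian coordinates follow from the range and the two beam angles,

r = c·Δt / 2,    x = r·sin(θ)·cos(φ),    y = r·sin(θ)·sin(φ),    z = r·cos(θ),

where c is the pulse propagation speed, Δt the measured return time, θ the zenith angle and φ the azimuth angle of the emitted beam.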

In Airborne Laser Scanning (ALS), a LiDAR sensor is attached to the belly of an airplane for large-scale surveying applications such as large-scale biomass estimation or terrain modeling [17]. In Terrestrial Laser Scanning (TLS), a LiDAR sensor is attached to a tripod, which allows for small-scale surveying and modeling such as the creation of 3D city models or small-scale biomass estimations [11, 23].

In Mobile Laser Scanning (MLS), a LiDAR sensor is attached to the top or back of a moving vehicle or on a hand-held device.

2.1 Infrastructure Applications

MLS is used for road inventory surveys to detect road furniture, street signs, pole-like objects and trees [6, 14]. In rail infrastructure it has been applied for asset inventory measurements [12] and in commercial applications for asset integrity monitoring purposes2. MLS has demonstrated to be much more convenient than traditional surveying methods [12]. With MLS, road surveys can be done from a car at traffic speed, meaning there is no need for road closures, resulting in fewer traffic jams and road accidents [6].

TNO3, in collaboration with Rijkswaterstaat4 and Pavemetrics5, has investigated the potential of raveling detection using LiDAR. They were able to detect different stages of raveling using the Laser Crack Measurement System (LCMS) of Pavemetrics [1]. LCMS reconstructs the road surface profile in high detail using laser triangulation, from which deviations in the road profile can be detected [10]. Unlike this reconstruction method, the methods proposed in this research are tasked not with damage detection but with damage classification. Moreover, before damage can be detected using the reconstruction method, several preprocessing tasks need to be performed: road marking detection, plane flattening, wheel lane detection, etc. [1]. Pointnet++ does not need any preprocessing, and should therefore allow for faster road damage classification (section 4).

3 Data

A truck with cameras and a LiDAR sensor attached to its rear sampled pictures and point cloud data of approximately 20 km of the surface of the A6 highway near Almere (Figure 2). High resolution geotagged pictures (4000 × 1600 pixels; 8.0 × 3.2 m) were sampled every nth second. The distance between images depends on driving speed, and the truck drove in both directions, meaning that pictures may overlap [8].

Damages within the pictures are localized and labeled by experts, resulting in approximately 18.5K labeled images. Five classes are defined: Raveling, Undamaged, Tear (i.e. transverse cracking), Hole (i.e. potholes) and Mechanical Damage (MD) (i.e. longitudinal cracking) [8].

A bounding box is placed around the damaged section (Figure 3: label overlay), meaning that multiple bounding boxes may exist in one image. The collected point cloud data is georeferenced to the geotagged images by constructing a transformation matrix from the known coordinate systems of the images and point cloud (Amersfoort and WGS84). To derive labeled point cloud segments, the points within the bounding boxes are extracted (Figure 4).
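As a minimal sketch of this step, assuming the pyproj library for the Amersfoort / RD New (EPSG:28992) to WGS84 (EPSG:4326) transformation and NumPy for the point-in-box test; the function and variable names are illustrative, not taken from the original pipeline:

import numpy as np
from pyproj import Transformer

# Project WGS84 image geotags into the metric Amersfoort / RD New grid
# so that image bounding boxes and LiDAR points share one coordinate system.
to_rd = Transformer.from_crs("EPSG:4326", "EPSG:28992", always_xy=True)
x, y = to_rd.transform(5.2647, 52.3508)  # (lon, lat) near Almere, illustrative values

def extract_segment(points, bbox):
    """Return the LiDAR points (N x 3 array of metric x, y, z) that fall
    inside a labeled 2D bounding box ((xmin, ymin), (xmax, ymax))."""
    (xmin, ymin), (xmax, ymax) = bbox
    mask = ((points[:, 0] >= xmin) & (points[:, 0] <= xmax) &
            (points[:, 1] >= ymin) & (points[:, 1] <= ymax))
    return points[mask]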

2 www.fugro.com/our-services/asset-integrity/raildata
3 Netherlands Organization for Applied Scientific Research
4 Executing body of public works and water management; part of the Ministry of Infrastructure and Environment
5 http://www.pavemetrics.com/applications/road-inspection/laser-crack-measurement-system/


Figure 2. Research Area. Damage types projected on A6 highway.

Figure 3. Road damage of class Raveling. From left to right and top down: damaged section in magenta on raw image, unlabeled image, damage mask (purple = undamaged), damage mask on raw image, raw point cloud, bounding boxes on interpolated LiDAR data on depth channel (brighter = higher in elevation).

No two bounding boxes are equally shaped, which implies that no two segments are of equal size and that the points within them are not homogeneously distributed.

The data set comprises approximately 26K labeled point cloud segments. The samples are split into a train, validation and test set (75%, 5%, 20%) (Table 1).

3.1 Class imbalance

The dataset is unbalanced, meaning that there are significantly more Undamaged samples than samples of any other class (Table 1). Class imbalance is treated differently for Pointnet++ and Gridnet.

Figure 4. Damage of class Hole in (a) raw image, (b) raw LiDAR (z) plot, (c) pixel occupancy plot (colored = occupied, white = unoccupied).

For Pointnet++, class imbalance is treated by oversampling the underrepresented classes. Pointnet's architecture originally samples the first 1024 points from the point cloud. This was adjusted such that a random subset of 1024 points is sampled using a random seed value. The number of times a sample has been oversampled is recorded in a hash table, so that the random seed value can be changed accordingly. This ensures that the same 1024 points are never sampled twice. Samples containing fewer than 1024 points were discarded (±2K). The class balance after oversampling can be seen in Figure 5.
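A minimal sketch of this oversampling scheme, assuming NumPy; the hash table maps a sample identifier to the number of times it has been drawn, and that count seeds the random generator (names are illustrative):

import numpy as np

def sample_subset(points, draw_counts, sample_id, n=1024):
    """Draw a random n-point subset; the per-sample draw count from the
    hash table seeds the generator, so each reuse of a sample yields a
    different subset (identical draws would require identical seeds)."""
    if len(points) < n:
        return None                        # samples with fewer than 1024 points are discarded
    draw = draw_counts.get(sample_id, 0)   # how often this sample was used already
    rng = np.random.default_rng(seed=draw)
    idx = rng.choice(len(points), size=n, replace=False)
    draw_counts[sample_id] = draw + 1
    return points[idx]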

For Gridnet, underrepresented classes were also oversampled. In this case, to ensure that the exact same subset of points is not used twice, a random number of points was dropped from the sample each time the sample was reused. The number of dropped points was never allowed to exceed 10% of the total number of points within the sample.
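A corresponding sketch for Gridnet's scheme, again with illustrative names; each reuse of a sample drops a random number of points, capped at 10% of the sample size:

import numpy as np

def drop_random_points(points, rng, max_frac=0.10):
    """Remove a random number of points from a reused sample, never more
    than max_frac of the total, so no two reuses share the exact subset."""
    n_drop = int(rng.integers(0, int(max_frac * len(points)) + 1))
    keep = rng.choice(len(points), size=len(points) - n_drop, replace=False)
    return points[keep]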

                   train   test   val   total
hole                   4      2     1       7
mechanical damage   1889    526   116    2531
raveling            2038    633   149    2820
tear                 102     25     8     135
undamaged          20056   5227  1278   26561

Table 1. Number of samples in the different data sets.

4 Methods

4.1 Gridnet

Gridnet is based on a recently developed framework of volumetric representations of point data for object detection and classification using deep learning [15, 16, 21]. In this framework, a point cloud is converted to a volumetric representation, which can be either an occupancy or a density grid.

(5)


Figure 5. Distribution of train sets for Pointnet++ experiments before and after class imbalance treatment (OS = oversampled).

The volumetric representation is then fed into an architecture that extends the concept of the 2D convolutional and max pooling layer to a 3D convolutional and 3D max pooling layer. The moving window now extends in three dimensions (x, y, z) instead of two (x, y). This framework has been applied to detect optimal landing zones for UAVs [15]. Density grid maps were constructed from large-scale point cloud data to detect vegetation-free zones. Results tested against simulated and real data performed better than a random forest classifier or a CNN with manually tuned features. Discretization to find optimal landing zones is different from road damage classification. When finding optimal landing zones, the goal is to classify each cell as suitable or not suitable, whereas in this research the goal is to classify a group of cells as damaged or not. Therefore, in this case neither a density map nor an occupancy grid is suited. In both cases it is expected that the group of cells will have a homogeneous dispersion of points around the same height (asphalt height) and that damages will not produce large variations in height, and therefore in density or occupancy values.

It is argued that volumetric representations are constrained by the sparsity issue and the cost of 3D convolutions [20]. The distribution of points is mostly non-homogeneous, meaning that placing a discrete grid on top of such a point cloud is non-trivial, as some cells remain empty.

Figure 6. Architecture of Gridnet.

The larger the cell size, the more likely it is that valuable spatial information is lost, since more points are summarized. A small cell size means the information is more likely to be retained; however, more missing values will be present that need to be treated. The treatment of missing values is again non-trivial, as all cells contribute to the weights in the hidden layers, calculated during backpropagation.

Considering the sensitivity of these characteristics of a discretized point cloud, a type of grid is defined that can be termed the minimum-height-grid (see 4.1.1). This grid is resized to a standard size of 50 × 50 × 10, which is the standard input size required by Gridnet.

The architecture of Gridnet is based on a minimal version of traditional, commonly used CNNs (Figure 6). It applies two 3D convolutional layers with a kernel size of 3 × 3 × 3 and 16 filters, one 3D max pooling layer and one dense layer. To decrease the probability of overfitting, several dropout layers are implemented. The optimization function is Adam [9], the loss function is categorical cross-entropy and the evaluation metric is mean classification accuracy.
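A minimal Keras sketch of such a model, following the layer description above; the dropout rates, dense width and exact layer ordering are assumptions, since Figure 6 is not reproduced here:

from tensorflow import keras
from tensorflow.keras import layers

def build_gridnet(num_classes):
    """Gridnet-like 3D CNN: two 3x3x3 Conv3D layers with 16 filters,
    one 3D max pooling layer, dropout, and a dense softmax output."""
    return keras.Sequential([
        keras.Input(shape=(50, 50, 10, 1)),   # minimum-height-grid plus a channel axis
        layers.Conv3D(16, (3, 3, 3), activation="relu"),
        layers.Dropout(0.25),                 # rate is an assumption
        layers.Conv3D(16, (3, 3, 3), activation="relu"),
        layers.MaxPooling3D((2, 2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_gridnet(num_classes=3)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])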

4.1.1 Minimum-Height-Grid

The minimum-height-grid is constructed such that each cell covers 0.02 square meters; this number is derived from the resolution of one pixel of the images. Each point within the point cloud is assigned to a grid cell based on its position in metric space (Figure 7). Gridnet requires input of equal size; therefore, the populated 3D grid is resized to the shape 50 × 50 × 10 (x, y, z). Cells that contain multiple points are assigned the lowest height value, since lower heights are assumed to be more representative of damages: by definition, aggregates are removed from the surface, so a lower height remains. Cells that contain no points are assigned the average height value of all points.
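A sketch of this construction, assuming NumPy; the fixed 50 × 50 × 10 resizing step is omitted and the binning is an interpretation of the description above:

import numpy as np

def minimum_height_grid(points, nx=50, ny=50):
    """Bin points into an nx-by-ny grid over the x/y extent, keep the
    lowest z per cell, and fill empty cells with the mean height."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    ix = np.clip(((x - x.min()) / (np.ptp(x) + 1e-9) * nx).astype(int), 0, nx - 1)
    iy = np.clip(((y - y.min()) / (np.ptp(y) + 1e-9) * ny).astype(int), 0, ny - 1)
    grid = np.full((nx, ny), np.inf)
    np.minimum.at(grid, (ix, iy), z)   # lowest point per cell represents damage depth
    grid[np.isinf(grid)] = z.mean()    # empty cells get the average height of all points
    return grid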


Figure 7. Example of a discretized point cloud where points are assigned to a grid cell (example in red).

4.2 PointNet++

Pointnet++ is based on its predecessor Pointnet [20]. Pointnet proposed one of the first frameworks that consumes an unordered set of points, whereas most prior work on point clouds and deep learning first converted the point cloud to a volumetric representation. Pointnet semantically encodes each point and aggregates all encodings within the point cloud into one signature. By nature, Pointnet does not consider local structures present in metric space. Pointnet++ aims to consider local structures by hierarchically iterating over partitions of the point cloud at different scales and applying Pointnet to each partition [22].

Figure 8 shows how Pointnet++ classifies a point cloud using features derived from hierarchical point set feature learning. During this process, features are summarized over partitions at different set abstraction levels and combined into a multi-resolution feature set. The optimization function, loss function and metric are equal to those of Gridnet.

Combining features from different abstraction layers is powerful and has been shown to improve on benchmark datasets such as ModelNet40 [26]. Moreover, Pointnet++ proved the least affected by non-uniform-density point clouds compared to other methods.

It is expected that Pointnet++ will be effective in classifying road damages, since the structures of damages are often local, showing small variations in height in a mostly flat point cloud, and occur at different scales; raveling, for example, shows variation in height over a larger area. Moreover, robustness against non-uniform point clouds is an advantage, since damage shapes are non-uniformly sampled due to the driving speed of the experimental truck. Figure 4 is a typical example of how a damage detected in the images is not completely or homogeneously covered by the point cloud.

Figure 8. Architecture of PointNet++ [22].

4.3 Experiments

Four different experiments are constructed that should show whether road damage shapes can be classified using either Gridnet or Pointnet++. Gridnet is only tested with experiment 2, since computational costs limited further exploration (section 5.1). Pointnet++ was tested using all experiments.

Experiment 1: classifies all classes without treating class imbalance. The results of this experiment are treated as a baseline for the following experiments.

Experiment 2: excludes the classes Hole and Tear from classification, as these classes are extremely underrepresented and could not be oversampled in a manner that ensures that the same subset of random points is not sampled twice. The remaining classes Undamaged, Mechanical Damage and Raveling are classified, while the underrepresented classes Mechanical Damage and Raveling are oversampled. This experiment should show whether oversampling helps classify road damages sufficiently well.

Experiment 3: is a binary classifier which attempts to distinguish Damaged (Raveling and Mechanical Damage) from Undamaged samples. The underrepresented Damaged class is oversampled. This experiment should show whether Undamaged samples are distinguishable from Damaged samples, which is important to show that the combination of LiDAR and Pointnet++ can be applied in this specific use case.

Experiment 4: is a two-class classifier which attempts to distinguish Mechanical Damage from Raveling. This experiment should show whether LiDAR in combination with Pointnet++ can classify damages in a more detailed manner. This would provide maintenance crews with detail such that weights can be given to damages that are deemed more important, and possibly such that priority can be assigned in their maintenance schedule.

4.3.1 Hyper-parameter tuning

The hyper-parameters decay rate, decay step, learning rate and momentum are tuned for Pointnet++ such that for each experiment the model behavior is representative of that specific experiment and of the LiDAR data.


method  decay rate  decay step  LR      momentum
a       0.7         200000      0.0100  0.9
b       0.0         200000      0.0100  0.9
c       0.7         200000      0.1000  0.9
d       0.5         200000      0.0100  0.9
e       0.7         100000      0.0100  0.9
f       0.5         100000      0.0100  0.9
g       0.7         100000      0.0010  0.9
h       0.7         100000      0.0100  0.5
i       0.5         100000      0.0010  0.9
j       0.7         100000      0.0001  0.9
k       0.5         100000      0.0001  0.9
l       0.7         100000      0.0001  0.5
m       0.5         100000      0.0001  0.5

Table 2. Combinations of hyper-parameter values tested for tuning.
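For reference, the decay-rate/decay-step pair in Table 2 reads naturally as a staircase exponential learning-rate schedule, as used in the public Pointnet++ code; the sketch below is that reading, not a quote of the thesis code:

def learning_rate(step, base_lr=0.01, decay_rate=0.7, decay_step=200_000):
    """Staircase exponential decay: the learning rate is multiplied by
    decay_rate once every decay_step training steps (method a in Table 2)."""
    return base_lr * decay_rate ** (step // decay_step)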

Figure 9. Behaviour of tuned Pointnet++ models for experiments 1, 2, 3 and 4.

The best methods are selected based on the behavior of the validation loss and the validation accuracy: the validation loss should converge to a minimum while the mean classification accuracy remains high. Methods f, d, j and i performed best for Pointnet++ experiments 1, 2, 3 and 4 respectively (Figure 9). For each experiment, Pointnet++ is trained from scratch for approximately 40 epochs and evaluated on the test set.

For Gridnet, the decision was made to pick, for experiment 2, the parameter set that performed reasonably at first glance, since it was computationally too expensive to test all parameter sets in Table 2. The learning rate was set at 0.01 and momentum at 0.05. This decision must have influenced the optimization, and therefore the results presented in the next section, since it cannot be guaranteed that the loss function converges to a local minimum.

5 Results

5.1 GridNet

Discretizing point clouds while simultaneously training Gridnet proved to be the main bottleneck for this method. Figure 10 shows the distribution of the number of points per sample within the train dataset. Within each bin, the average time needed to discretize 10 samples was calculated (on a device without a GPU and with 4 CPU cores). Processing one sample in the most common bin (0-5000 points) takes around 0.7 seconds. Gridnet is trained with a batch size of 16 on a device with 16 CPUs and 32 GB of RAM6. Training the model for 2 epochs proved to take approximately 10 hours. Compared against Pointnet++, which trained for 40 epochs with batches of size 32 within 6 hours, Gridnet is significantly slower and significantly more expensive in terms of computation time. Experiment 2 yielded a total class accuracy of 61%, which is even lower than the accuracy of 69% produced by a dummy classifier. A dummy classifier classifies all samples as the majority class, in this case Undamaged, and calculates the accuracy according to this classification.
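Such a baseline can be computed directly from the label frequencies; a small sketch assuming NumPy:

import numpy as np

def dummy_accuracy(y_true):
    """Accuracy of a majority-class classifier that predicts the most
    frequent label (here Undamaged) for every sample."""
    _, counts = np.unique(y_true, return_counts=True)
    return counts.max() / counts.sum()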

Considering these results, the computational cost, and the fact that the hyper-parameters were not tuned (so that the trained model could never be expected to perform reasonably well), it was decided to abandon further experimentation on Gridnet.

5.2 Pointnet++

Classifying individual classes in experiment 1 proved challenging for Pointnet++ (Figure 11). All Hole and Tear samples are classified incorrectly. This was expected, since insufficient samples were available during training. While Mechanical Damage and Raveling samples had similar proportions in the train set, they do not have similar classification averages: 80% of Raveling samples were classified as Undamaged, while over 90% of MD samples are classified correctly. The total class accuracy is 85%, which is higher than the 70% that would be obtained by random guessing.

Experiment 2 shows that Mechanical Damage is well distinguishable from Undamaged and Raveling samples, while Undamaged and Raveling samples are confused for each other in approximately 25% of cases.

6 https://userinfo.surfsara.nl/systems/lisa/description


Figure 10. Distribution of the number of points within the train dataset versus the average discretization time of 10 samples within the corresponding bin.

The total class accuracy is 77%, which is higher than the 71% that would be obtained by a dummy classifier. The Raveling class accuracy improved by roughly 66% compared to experiment 1, while the Undamaged class accuracy decreased by approximately 20%. This implies that it is relatively difficult to distinguish Raveling and Undamaged samples. Preliminary studies by Hal24K, in which Raveling and Mechanical Damage were classified in a two-way classifier against Undamaged samples using high resolution images and ResNet50, yielded classification accuracies of 63%, 73% and 61% for Raveling, Mechanical Damage and Undamaged samples respectively [8]. The Raveling accuracies are similar (61%-65%) for ResNet and Pointnet++. For Raveling, this may suggest that similar features are present in point clouds and images, and that these are being learned by both ResNet50 and Pointnet++. By the same reasoning, this may also suggest that characteristic features of Undamaged and Mechanical Damage samples are better distinguishable in point clouds, as higher class accuracies are obtained using Pointnet++. To confirm this line of reasoning, future work could implement a similar two-way classifier between a damage class and the Undamaged class using Pointnet++ and compare accuracies.

It is assumed that road maintainers want to be confident that a sample classified as undamaged is indeed undamaged. Therefore, a high precision (True Positives / (True Positives + False Positives)) is aimed for. The Precision-Recall curve in Figure 12 shows that for experiment 2, a maximum precision of approximately 0.78 can be obtained, implying that we could be 78% certain that a sample is classified correctly.
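A curve like Figure 12 can be produced from the softmax scores; a sketch assuming scikit-learn, with one-hot labels and toy score values standing in for the real model outputs:

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# y_true: one-hot labels, y_score: softmax outputs (illustrative toy values).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_score = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1],
                    [0.3, 0.3, 0.4], [0.6, 0.2, 0.2]])

# Micro-averaging flattens labels and scores over all classes.
precision, recall, _ = precision_recall_curve(y_true.ravel(), y_score.ravel())
ap = average_precision_score(y_true, y_score, average="micro")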


Figure 11. Confusion matrices for all Pointnet++ experiments.

Figure 12. Precision-Recall curves, Average Precision (AP) and F1 score for experiments 2 (micro-averaged over three classes), 3 and 4.

This number is quite low and would not suffice for confident road damage classification, as it would imply that maintainers still need to manually check the classification results to filter out wrongly classified samples.

Experiment 3 shows that when Undamaged and Damaged (Mechanical Damage and Raveling) samples are classified, it is relatively hard to distinguish the two classes. The large number of incorrectly classified Raveling samples, combined with the large number of correctly classified Mechanical Damage samples, yields a lower accuracy than when classifying Mechanical Damage alone.

Experiment 4 shows that when distinguishing Mechanical Damage from Raveling, the classification accuracy is much higher for Mechanical Damage than for Raveling. The total accuracy of 96% improves on the 73% yielded by a dummy classifier. These results imply that Mechanical Damage is consistently easier to classify, as it yields high accuracies in all experiments. Section 5.4 presents suggestions on why this may be the case.


Figure 13. Effect of oversampling on train loss and accuracy.

5.3 Effect of oversampling

Balancing the dataset is common practice in machine learning to prevent overfitting [2]. This section shows the effect of not balancing the dataset.

Figure 13 shows how the train and validation loss and accuracy differ when trained and validated on an unbalanced (ratio 1:5) versus a balanced (3:2) dataset for experiment 3. Both the train and validation accuracy and loss increase when using the unbalanced dataset. This trend is observed again when testing the model trained on the unbalanced dataset: the classification accuracy of the Undamaged class increased by 20%, whereas that of the Damaged class decreased by almost 25%.

Since more Undamaged samples are present while training Pointnet++, the model learns to recognize undamaged shapes better than damaged shapes. The overall accuracy is still high during training, since most Undamaged samples are classified correctly, even though Damaged samples are classified incorrectly half of the time. This shows that the "best" model should not be chosen merely on validation accuracy or loss; the balance within the dataset should also be regarded.

5.4 True Positives vs False Negatives

The True Positives (TP) and False Negatives (FN) of all Pointnet++ experiments are inspected to see whether they exhibit distinctive characteristics. The distribution of the metric range in the x, y and z dimensions within individual point clouds for the largest classes (Undamaged, Mechanical Damage and Raveling) is plotted in Figures 15 and 16.

Figure 14. Confusion matrix without oversampling

Figure 15. Normalized distribution of z-range across TP and FN

These show that the range of Mechanical Damage samples is distinctive from that of Undamaged or Raveling samples. Mechanical Damage samples appear narrower, as they show lower variation in y-range. This follows logically from the definition of Mechanical Damage: such damages are line shaped and measured in meters, so the bounding box, and therefore the extracted point cloud, is narrow. Figure 16 shows that most TPs fit this description and that FNs are wider than average. This raises the question whether Pointnet++ actually learned to recognize the shape of the damages or rather the shape of the bounding box, which would explain why Mechanical Damage yields high average class accuracies in all experiments. The distribution of x/y range for Raveling and Undamaged samples supports the suspicion that Pointnet++ is learning bounding boxes rather than shapes. These two classes are by definition similar in shape, since both are measured over square meters, but presumably display different height patterns. If Pointnet++ were indeed learning those height patterns, no clear distinction should be visible in the x/y ranges of Figure 16. A distinction is nevertheless visible.


Figure 16. Normalized distribution of x-range and y-range across TP and FN for Mechanical Damage, Raveling and Undamaged

Where the distributions in range for Undamaged samples do not overlap with those of the Raveling samples, Raveling samples are more often classified correctly (Figure 16: Raveling from x-range 0.8, and a cluster of correct values around y-range 0.2). Vice versa, Undamaged samples are more often classified correctly when there is less overlap in range with the Raveling samples (Figure 16: Undamaged, top-left TPs).
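The per-sample ranges behind Figures 15 and 16 amount to the metric extent of each segment along the three axes; a one-line NumPy sketch:

import numpy as np

def xyz_ranges(points):
    """Metric extent of one segment along x, y and z, used to compare
    true positives against false negatives."""
    return np.ptp(points, axis=0)   # (x_range, y_range, z_range)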

6 Discussion

The uncertainties raised in section 5.4 imply that this research has not shown beyond doubt that Pointnet++ is learning to classify damage shapes. To address these concerns, future work should ensure that the samples of Undamaged, Mechanical Damage and Raveling are all extracted from equally sized bounding boxes, and check whether similar class accuracies are still achieved for Mechanical Damage.

Future work could also consider the sequentiality of the point cloud samples. Because the images may overlap, the point clouds may also overlap. If the same damage is present in both the test set and the train or validation set, overfitting may occur. Additionally, most damages, and especially raveling, are stretched out over several meters, meaning that overlap occurs more often. The degree of overlap should be quantified to assess whether overfitting influences the results presented here. To prevent this issue, the dataset can be split into test, train and validation sets based on coordinates or distances between samples instead of splitting at random.
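A location-based split could look like the following sketch, assuming each segment has a metric centroid and that ordering along one road axis (here x) is meaningful; both are assumptions for illustration:

import numpy as np

def spatial_split(centroids, frac_train=0.75, frac_val=0.05):
    """Order segments along the road and cut contiguous blocks, so
    overlapping segments cannot straddle the train and test sets."""
    order = np.argsort(centroids[:, 0])          # sort by x of segment centroid
    n_train = int(frac_train * len(order))
    n_val = int(frac_val * len(order))
    return (order[:n_train],                     # train
            order[n_train:n_train + n_val],      # validation
            order[n_train + n_val:])             # test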

There are labeling errors within the dataset, which raises the question whether the data is reliable [8]. Some clear deviations in the images are not marked as damaged, for unknown reasons. Furthermore, the labeling procedure is unknown: there is no telling whether labels were assigned based on expert knowledge or by using objective classification techniques. This becomes even more interesting when the damages within the images are observed. Damages such as the Raveling in Figure 3 are often very subtle, and variations in the RGB channels are hard to detect. Therefore, even assuming that the labels are assigned correctly, it is difficult to see why samples are labeled as such. Looking at the Raveling in Figure 3, for example, no obvious differences in RGB values are found compared to the undamaged portion of the image.

Finally, it should be considered that Pointnet++ was designed to learn to recognize objects with a defined, distinguishable 3D shape, such as airplanes, cups, chairs or cats. These shapes have distinctive features (wings, ears, legs, tails) and are pronounced in 3D space, while road damage shapes are less pronounced, since they can be considered mostly flat.

7 Conclusion

This research has not shown that Gridnet can classify road damages, since the method could not be tested thoroughly due to high computational costs. It has, however, shown that Gridnet poses limitations that make the method unfit for road damage classification, assuming that most end-users who want to classify road damages do not have access to hardware resources that would significantly decrease computational costs. Pointnet++ was relatively unaffected by these limitations and yielded promising classification accuracies. It does, however, raise the question whether the features of road damage shapes or rather the features of their bounding boxes are learned.


Even if Pointnet++ was able to learn to recognize road damage shapes, meaning that the accuracies can be accepted, the process from data collection to maintenance is not yet faster or automated. Road damages first need to be localized and extracted before they can be classified. To proceed towards automation, unsupervised learning using Pointnet++ should be tested next. This would allow Pointnet++ to ingest point clouds of reasonable size (a couple of square meters) and to state whether they belong to a certain damage class or not.

References

[1] W.L.C. van Aalst, G.B. Derksen, P.P.M. Schackmann, F.G.M. Bouman, P. Paffen, and W. van Ooijen. 2015. Automated Raveling Inspection and Maintenance Planning on Porous Asphalt in The Netherlands. Technical Report. TNO. 1-26 pages.

[2] Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. 2014. Good Practice in Large-Scale Learning for Image Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (mar 2014), 507-520. http://ieeexplore.ieee.org/document/6574852/

[3] Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia. 2017. Multi-View 3D Object Detection Network for Autonomous Driving. In IEEE Conference on Computer Vision and Pattern Recognition.

[4] Andreas Geiger, Philip Lenz, and Raquel Urtasun. 2012. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2016. Region-based convolutional networks for accurate object detection and segmentation. In IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38. 142-158. arXiv:1311.2524

[6] Haiyan Guan, Jonathan Li, Shuang Cao, and Yongtao Yu. 2016. Use of mobile LiDAR in road information inventory: A review. International Journal of Image and Data Fusion 7, 3 (2016), 219-242.

[7] Jan Hackenberg, Christopher Morhart, Jonathan Sheppard, Heinrich Spiecker, and Mathias Disney. 2014. Highly accurate tree models derived from terrestrial laser scan data: A method description. Forests 5, 5 (2014), 1069-1105.

[8] Hal24K. 2018. Advin - Road damage detection. Technical Report. Hal24K, Amsterdam. 1-8 pages.

[9] Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. In International Conference for Learning Representations. arXiv:1412.6980 http://arxiv.org/abs/1412.6980

[10] John Laurent, Jean-François Hébert, Daniel Lefebvre, and Yves Savard. 2013. 3D laser road profiling for the automated measurement of road surface conditions and geometry. Technical Report. Pavemetrics. 1-11 pages.

[11] M.A. Lefsky, W.B. Cohen, G.G. Parker, and D.J. Harding. 2002. Lidar remote sensing for ecosystem studies. BioScience 52, 1 (2002), 19-30.

[12] Michael Leslar, Gordon Perry, and Keith Mcnease. 2010. Using mobile lidar to survey a railway line for asset inventory. In Proceedings of ASPRS Annual Conference. San Diego, 1-8.

[13] Bo Li, Tianlei Zhang, and Tian Xia. 2016. Vehicle Detection from 3D Lidar Using Fully Convolutional Network. CoRR abs/1608.07916 (2016). arXiv:1608.07916 http://arxiv.org/abs/1608.07916

[14] Fashuai Li, Sander Oude Elberink, and George Vosselman. 2018. Pole-Like Road Furniture Detection and Decomposition in Mobile Laser Scanning Data Based on Spatial Relations. Remote Sensing 10, 4 (mar 2018), 1-28. http://www.mdpi.com/2072-4292/10/4/531

[15] Daniel Maturana and Sebastian Scherer. 2015. 3D Convolutional Neural Networks for landing zone detection from LiDAR. In IEEE International Conference on Robotics and Automation, Vol. 4. 3471-3478.

[16] Daniel Maturana and Sebastian Scherer. 2015. VoxNet: A 3D Convolutional Neural Network for real-time object recognition. In IEEE International Conference on Intelligent Robots and Systems. IEEE, 922-928. http://ieeexplore.ieee.org/document/7353481/

[17] Andreas Mayr, Martin Rutzinger, Magnus Bremer, Sander Oude Elberink, Felix Stumpf, and Clemens Geitner. 2017. Object-based classification of terrestrial laser scanning point clouds for landslide monitoring. The Photogrammetric Record 32, 160 (dec 2017), 377-397. http://doi.wiley.com/10.1111/phor.12215

[18] J.S. Miller and W.Y. Bellinger. 2003. Distress Identification Manual for the Long-Term Pavement Performance Program (Fourth Revised Edition). Technical Report. Office of Infrastructure Research and Development, Springfield. 1-164 pages. https://www.fhwa.dot.gov/publications/research/infrastructure/pavements/ltpp/reports/03031/index.cfm

[19] A.T. Papagiannakis and Eyad Masad. 2008. Pavement design and materials. John Wiley. 542 pages.

[20] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas. 2016. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition. 77-85. arXiv:1612.00593 http://arxiv.org/abs/1612.00593

[21] Charles R. Qi, Hao Su, Matthias Niessner, Angela Dai, Mengyuan Yan, and Leonidas J. Guibas. 2016. Volumetric and Multi-View CNNs for Object Classification on 3D Data. (apr 2016). arXiv:1604.03265 http://arxiv.org/abs/1604.03265

[22] Charles R. Qi, Li Yi, Hao Su, and Leonidas J. Guibas. 2017. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv preprint arXiv:1706.02413 (2017).

[23] Shruthi Srinivasan, Sorin C. Popescu, Marian Eriksson, Ryan D. Sheridan, and Nian Wei Ku. 2014. Multi-temporal terrestrial laser scanning for modeling tree biomass change. Forest Ecology and Management 318 (2014), 304-317. http://dx.doi.org/10.1016/j.foreco.2014.01.038

[24] Taewan Kim and Joydeep Ghosh. 2016. Robust detection of non-motorized road users using deep learning on optical and LIDAR data. In IEEE International Conference on Intelligent Transportation Systems. 271-276. http://ieeexplore.ieee.org/document/7795566/

[25] J.P.M. Tromp. 1994. Road safety and drain asphalt (ZOAB). In Road Safety in Europe and Strategic Highway Research Program. Leidschendam, 26-28.

[26] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A deep representation for volumetric shapes. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 19. 1912-1920. arXiv:1406.5670

[27] Y. Zhou and O. Tuzel. 2017. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. CoRR abs/1711.06396 (2017), 1-42.
