Image recognition of rail insulation joints with convolutional neural networks

Master thesis G.W. Mol

MBA Big Data & Business Analytics

Amsterdam Business School, University of Amsterdam

MBABD2016 group, student number 11417773

Supervisor: Prof. Marcel Worring

25 September 2018

Table of contents

1 INTRODUCTION

1.1 THE PRORAIL ORGANIZATION

1.1.1 GENERAL OVERVIEW

1.1.2 INFRASTRUCTURE MANAGEMENT AT PRORAIL

1.2 FOCUS OF THIS STUDY

1.3 ASSET MANAGEMENT PROCESSES REQUIRE RELIABLE AND ACCURATE CONFIGURATION DATA

1.4 PREVENTIVE MAINTENANCE REQUIRES CONDITION MONITORING

1.5 OPPORTUNITIES FOR AUTOMATIC DATA COLLECTION AND PREDICTIVE MAINTENANCE

1.6 THESIS OVERVIEW

2 RAIL INSULATION JOINTS

2.1 TRACKS

2.2 TECHNICAL DESCRIPTION OF RAIL INSULATION JOINTS

2.3 EI-JOINT FAILURE TYPES AND NUMBERS

2.4 CHARACTERISTICS OF RAIL INSULATION JOINTS

3 BUSINESS UNDERSTANDING

3.1 STATEMENT OF THE BUSINESS PROBLEM

3.1.1 THE NEED FOR ACCURATE, RELIABLE AND COMPLETE REGISTRATION OF EI-JOINTS

3.1.2 THE INFLUENCE AT PUNCTUALITY OF FAILING EI-JOINTS

3.2 RESEARCH OBJECTIVES

4 LITERATURE REVIEW IMAGE RECOGNITION FOR RAIL INSULATION JOINTS

4.1 ABOUT THIS CHAPTER

4.2 LITERATURE REVIEW OF SENSORING AND ML IN RAIL APPLICATIONS

4.2.1 STUDIES USING NON-DESTRUCTIVE TESTING SIGNALS

4.2.2 LASER CAMERAS AND IMAGES

4.2.3 ELECTRICAL SIGNALS AND STRAIN GAUGES

4.2.4 AXLE BOX ACCELEROMETERS

4.2.5 IMAGE RECOGNITION STUDIES

4.2.6 COMBINING IMAGES AND OTHER SIGNALS

4.3 CONCLUSIONS ON RELATED WORK

5 DATA DESCRIPTION AND EXPLORATION

5.1 DATA SOURCES

5.1.1 SAP EAM

5.1.2 NAIADE

5.1.3 BBMS (MONITORING DATA)

5.1.4 TECHNICAL, SCHEMATIC DESIGNS OF TRAIN SIGNALING SYSTEMS

5.2 LOCATION REGISTRATION

5.4 IMAGES USED IN THIS STUDY

6 BRIEF INTRODUCTION IN DEEP LEARNING

6.1 DEEP LEARNING IN PERSPECTIVE

6.2 THE WORKING PRINCIPLES OF NEURAL NETWORKS

6.3 CONVOLUTIONAL NEURAL NETWORKS

6.3.1 CONVOLUTIONAL LAYERS

6.3.2 MAXPOOLING LAYERS

6.3.3 REGULARIZATION

6.3.4 SUMMARY

7 IMPLEMENTATION

7.1 DATA DESCRIPTION AND PREPARATION

7.2 CNN ARCHITECTURE

7.3 TRAINING, VALIDATION, TESTING AND EVALUATION APPROACH

7.4 EVALUATION CRITERIA

8 EXPERIMENTAL RESULTS

8.1 TRAINING AND VALIDATION RESULTS

8.2 TEST RESULTS AT THE BALANCED TEST SET

8.3 TEST RESULTS AT AN IMBALANCED TEST SET

8.4 EFFECTS OF CROPPING AND OVERLAP AT THE RESULTING METRICS

8.5 BRINGING THE RESULTS TO BUSINESS VALUE

8.6 DISCUSSION AND RECOMMENDATIONS

9 CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH

9.1 VERIFICATION FOR ACCURATE, RELIABLE AND COMPLETE REGISTRATION OF EI-JOINTS

9.2 CONDITION BASED MONITORING OF EI-JOINTS WITH MACHINE LEARNING

9.3 MISCELLANEOUS CONCLUSIONS

9.4 SUGGESTIONS FOR FURTHER RESEARCH

10 REFERENCES CITED

1 Introduction

1.1 The ProRail organization

1.1.1 General overview

The Netherlands is a densely populated country with a thriving economy, which leads to large numbers of daily commuters, travelers and freight movements. This traffic is divided over roads, waterways and railways. The Netherlands has one of the busiest railway networks in the world. ProRail is responsible for this railway network. Together with the carriers, its task is to ensure, 24 hours a day and 7 days a week, that passengers and goods reach their destination safely and on time. The primary concerns of ProRail are the safety, reliability, punctuality and sustainability of the rail network. It is a governmental organization that has been appointed to manage the infrastructure and to regulate and divide the rail capacity over the carriers of passengers and freight. The following key numbers of ProRail illustrate the size and complexity of the rail network:

Category                                  Quantity
Train rides / year                        3.3 million
Passenger rides / day                     1.1 million
Passenger kilometers / year               152 million
Freight kilometers / year                 6 million
Total track length (kilometers)           7,021
Number of switches                        7,071
Number of railway stations (transfers)    404
Total non-current assets                  € 18.1 billion

ProRail’s tasks are to construct, maintain and manage the railway network. To do so, it divides the rail capacity, controls all train traffic, builds and manages stations and creates new tracks. It also maintains existing tracks, switches, signals and level crossings. The total financial volume of ProRail is approximately € 1.3 billion per year in revenue and cost. The company is 100% state owned but has been placed at arm's length to avoid conflicts of interest in the division of railway capacity among carriers. In the years to come, the organization will be repositioned to bring governance closer to the Ministry of Infrastructure and Parliament.

ProRail reports Key Performance Indicators (KPIs) that indicate the level of performance externally and internally. This report focuses on two specific KPIs:

1. The quality of the data of its components and systems, a KPI that is expressed in a percentage of correctly registered components and systems. These data are called the configuration data.

2. The infrastructure availability, a KPI that is expressed as a percentage of the promised availability.

1.1.2 Infrastructure management at ProRail

In the previous paragraph we have seen that ProRail is an asset manager that builds and maintains physical objects and systems on a massive scale. This study focuses on small but important parts of the rail called Electrical Insulation (EI) joints. As the table above shows, ProRail has 7,021 km of track. A track has two rails, one on each side, so the total length of rail is 14,042 km, containing 43,000 EI-joints. To operate the rail network it is crucial that these EI-joints are carefully monitored. Yet the insulating gap of an EI-joint is only 6 mm wide, which makes monitoring a difficult task. The number of objects that the organization must accurately register, maintain and manage is therefore large and poses interesting challenges.

1.2 Focus of this study

This study focuses on the possibilities of Image Recognition for the improvement of the two above-mentioned KPIs:

- Quality of configuration data: what configuration data are and why they are important is explained in section 1.3. Image Recognition offers opportunities for large-scale verification of configuration data, resulting in better quality of the processes that use these data.

- Infrastructure availability: this study zooms in on possibilities to monitor the condition of critical components, so that unplanned downtime through unexpected failure can be prevented. This is explained in section 1.4. Image Recognition offers opportunities to monitor the condition of large numbers of objects and to predict and prevent failures that cause unexpected downtime of the rail network.

1.3 Asset management processes require reliable and accurate configuration data

Configuration data are of major importance, because they are the basis for many asset management processes. Examples are:

• They feed the replacement process: master data administer the objects with their characteristics such as birth date, and thus their age. The age triggers inspections and subsequent replacement planning.

• In the procurement of maintenance contracts, contractors need to know the numbers and characteristics of objects to enable them to make reliable cost calculations.

• Construction projects require accurate and detailed data of the as-is situation before starting design, cost calculations and construction.

These examples illustrate that the ProRail processes are data driven and that an up-to-date, reliable and complete administration of configuration data is paramount.

The reasons configuration data suffer from quality issues are:

• Large numbers of changes each year: the railway configuration is far from static. Maintenance contractors repair and replace smaller components on a daily basis. About 500 times a year, a construction or maintenance project replaces or (re)constructs parts of the railway network, railway stations (transfers) and/or the direct surroundings located on ProRail property. The size of the projects ranges from the replacement of a single switch or a railroad crossing to completely new railroad connections or railway stations. All these changes must be administered carefully in the configuration data. This leads to tens of thousands of changes of configuration data per year, most of them by manual entry.

• Manual entry: human error introduces more and more incorrect administrations of configuration items, polluting the data over time. The cumulative errors raise challenges in the business processes that use the configuration data. Corrective actions often require on-site inspection and (again) human entry of the corrections.

In spite of correction projects costing millions of euros, it has become apparent that the current way of working, in which configuration data are administered, checked and corrected manually on a large scale, has its limitations.

These factors cause many of the quality issues in the configuration data. They cause errors in the data over the years and subsequent inefficiencies and cost in the asset management processes that use the data as their primary input.

One of the biggest challenges is that registration errors are found regularly, while the only way to verify the registrations is visual inspection, which is itself error prone for the reasons mentioned above (visual inspection of objects and manual entry of vast amounts of data). As a result, the error rate in the object administration is currently unknown, which undermines trust in the current registrations.

As measurement trains collect images and signals, this report studies whether image recognition offers opportunities for automatic collection of object data and verification of existing registrations of these objects. If the configuration data contain fewer errors, inefficiencies and costs in the processes that use them will be avoided and, indirectly, the availability, reliability, sustainability and safety of the network will improve, thus creating business value.

1.4 Preventive maintenance requires condition monitoring

Currently, the lifetime of a component is based on (an estimate of) the MTBF: Mean Time Between Failures, the average period after which an object fails. The asset management process uses MTBF, often combined with the monitoring of usage (number of train passages and tons), but hardly ever the actual condition of a component. The vast number of components does not allow the condition of all components to be monitored at acceptable effort and cost in the traditional way, which involves labor such as visual inspections. As a result, failures may occur unexpectedly, sometimes causing a minor or even major disruption of the train service. Failure of critical railway components such as switches or signal systems at critical locations in the country can have a major impact on the passenger train service for hours or even an entire day, because it disrupts the logistics of having carriages and trains of the right capacity at the right place at the right time. Today’s asset management processes are descriptive rather than predictive (x% probability that this EI-joint will fail within a certain time), let alone prescriptive (take preventive action when the probability that the EI-joint will fail within a certain time reaches a certain level).

ProRail wishes to further improve train service punctuality, the second KPI as described in paragraph 1.2, by reducing failures that have significant impact on rail traffic. The ambition is to be able to predict failures and replace components before failure, thus avoiding disturbance of the train service. The top 4 network disturbances are:

1. Third party influences (people, traffic, cattle)

2. Malfunctions in switches

3. Problems in train detection (is there already a train in a section?)

4. Rail problems (rail surface defects, squats, subsidence, disconnected or missing fasteners, etc.)

This study focuses on the possibilities for condition-based monitoring with the use of image recognition. If this is possible, it would open opportunities to predict whether objects are close to failure and to take preventive action, thus having a positive impact on network availability. Clearly, if more passengers and freight arrive on time every day, this creates very high value for the organization.

1.5 Opportunities for automatic data collection and predictive maintenance

In recent years we have seen the rise of big data, machine learning and analytics on a massive scale. Big Data analysis has been explicitly addressed as potentially very interesting for rail [1], [2]. This is made possible by exponentially increasing computational power at affordable cost. At the same time, sensors are becoming cheap, robust and ubiquitous. Sensors come in many types: cameras (both for video and stills), accelerometers, temperature sensors, and laser scanners for point clouds. All produce usable digital data, static or streaming, and can be connected through the internet (IoT) if required. All these data carry the promise to detect and monitor individual components and to predict when they will fail.

The first important opportunity is data collection and verification: instead of solely relying on manual, human data collection, Image Recognition carries the opportunity to automatically detect and classify objects from signals and/or images, offering opportunities to verify existing registration.

The second important opportunity is the automatic estimation or classification of the condition of an object. In asset management this is called Condition Based Monitoring (CBM).

As we have seen, an asset manager can follow the current MTBF type of approach, but new opportunities arise in the field of CBM and predictive maintenance. In the MTBF approach all components are treated the same: beyond a certain average lifetime, a component has an average probability of failure at which, on average, the cost of failure exceeds the replacement cost. This inherently carries the risk of replacing good components that are not yet worn (or at least of spending money on inspection and decision making to extend their lifetime) and of missing components that fail before the average lifetime has been reached, incurring cost through unexpected disruption of the railway network. CBM offers the opportunity to determine correlations between characteristics of signals and images and the condition of a component. Once established, these correlations make it possible to assess the condition of individual components on a massive scale, which previously could only be done at unacceptable cost and with a human capacity that simply was (and is) not available.

1.6 Thesis overview

First, the reader must get acquainted with the function and characteristics of the component under consideration: the rail electrical insulation joint (EI-joint). Chapter 2 gives a technical description and the possible failure modes, numbers and an impression of the locations of rail insulation joints used in the Dutch railway network. This forms the basis for chapter 3, which presents the business problem and states the research objectives of this thesis. Chapter 4 gives an overview of scientific literature on the use of image recognition for classification and detection of rail surface defects and insulation joints. Next, chapter 5 describes the data available (administrative registrations, location data, images, etc.) and the data preparation of the images. Chapter 6 briefly introduces the reader to Convolutional Neural Networks, which form the basis for our image recognition solution, to a level that enables the reader to interpret the results further on. Chapter 7 describes the data preparation and the CNN design used in chapter 8, which gives the experimental results. Chapter 9 gives the final conclusions of this study.

Figure 2 The train detection system.

2 Rail insulation joints

This study focuses on image recognition of a specific component in the rail network, called Rail Insulation Joints or Electrical Insulation Joints. It is therefore necessary to understand what they are, what function they have, what problems may occur and what their characteristics are before continuing with the business understanding and the application of image recognition. This chapter therefore first explains EI-joints, to enable the reader to interpret them in the rest of this thesis.

2.1 Tracks

EI-joints are small components of tracks. A track is a railroad connection from a switch to a next switch, or from a switch to a buffer stop. At ProRail, tracks have a unique number, a physical length and a defined direction (a start and an end switch, or a starting switch and an ending buffer stop).

2.2 Technical description of Rail Insulation Joints

Electrical Insulation Joints (EI-joints) are components of the rails intended to electrically separate sections of rail. The separation consists of a 6 mm gap filled with a hard plastic material, called the End Post. The rail ends on both sides of the gap are connected to each other by two plates on the sides of the rail (often called fish plates), mechanically fixed by four or six bolts, as demonstrated in the pictures in Figure 1 below.

Figure 1 Examples of Rail Insulation Joints

The most important function of an EI-joint is to facilitate train detection. The train detection system uses the rails as part of an electrical circuit, as illustrated in Figure 2. If there is no train in the block (Figure 2, left hand side), there is an electrical circuit from (and to) the power source via the rails to a relay¹ that connects both rails. This track relay is part of the track circuit. The coil in the relay attracts the relay contacts, so that the signal at the beginning of the block shows a green light. After all, the block is free. If there is a train in the block (Figure 2, right hand side), the circuit is short-circuited by the wheels and axles of the train. No current flows through the coil of the relay and the relay drops off. As a result, the relay contacts return to their rest position, so that the signal at the start of the block shows a red light.

¹ A relay is an electromagnetic switch that holds up a ferromagnetic piece of material (switched to one side) inside a coil as long as there is an AC voltage at the coil. As soon as the voltage falls away, the weight drops and the switch is turned to the other side.

The electrical current runs through the coil of the relay, keeping its weight up and the signal at green. As soon as a train passes the EI-joint (red lines at the right of Figure 2), the circuit is short-circuited through the axles and wheels of the train, which connect the two rails of the track. The voltage at the relay falls to zero and its weight drops, switching the signal from green to red.

This tells an approaching train that it may not enter the section. As soon as the train leaves the block at the next EI-joint, the signal turns green again.
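The detection logic described above can be summarized in a few lines of Python. This is only a minimal sketch of the principle, not of any ProRail system: the class name `TrackBlock` and its attribute `axles_in_block` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrackBlock:
    """Simplified model of one train detection block bounded by EI-joints.

    Hypothetical illustration: names and structure are assumptions,
    not taken from the thesis or from any signaling system.
    """
    axles_in_block: int = 0  # axles currently connecting the two rails

    def relay_energized(self) -> bool:
        # Without axles in the block, current flows through the relay coil.
        # Any axle short-circuits the rails, so the coil loses its voltage.
        return self.axles_in_block == 0

    def signal_aspect(self) -> str:
        # An energized relay means the block is free, so the signal is green.
        return "green" if self.relay_energized() else "red"


block = TrackBlock()
print(block.signal_aspect())   # block is free

block.axles_in_block = 4       # a train enters the block
print(block.signal_aspect())   # block is occupied

block.axles_in_block = 0       # the train leaves at the next EI-joint
print(block.signal_aspect())   # block is free again
```

The sketch makes the fail-safe character of the design visible: the signal is green only while the relay is energized, so any loss of voltage at the relay results in a red signal.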

A second important function of an EI-joint is to delimit the traction current that passes from the overhead power lines through the train and rails back to the power supply station. This keeps the side effects of these currents under control. Failures in a joint disturb its electrical and mechanical functions.

EI-joints can be single sided (either at the right or at the left rail of the track) and double sided (both at the left and right rail of the track). Although insulation on both sides of the railway track (in the left and the right rail) is not strictly necessary, experience has shown that double sided EI-joints have lower incident rates (in other words, are more reliable) than single sided ones.

A special category of EI-joints is found within switches. A switch makes it possible to change from one track to another. As a consequence, the switch construction creates electrical contact between the left and right rails of the track. For that reason, switches contain EI-joints within the switch construction. Physically they are very much like the EI-joints found in point-to-point tracks. It is important to note that, due to switches, the number of EI-joints required is actually larger than would be expected at first sight. EI-joints can be related to three different subsystems:

1. Tracks (point-to-point connections)

2. Railroad crossings (regular traffic crossing a railroad)

3. Switches

2.3 EI-joint failure types and numbers

EI-joints can fail for various reasons. Two categories of failures occur:

1. An electrical failure, usually caused by a conductive bridge between one side of the joint and the other. Such a bridge can result from scraped-off material, vandalism (bridging the joint with a coin or another piece of conductive material) or rail head deformation (an effect called ‘overhang’). A purely electrical failure does not cause any mechanical effects that passing trains would notice.

2. A mechanical (physical) failure, for example broken fish plates, loose bolts or a rail head breaking out. A mechanical failure may or may not cause an electrical failure, but it can cause unwanted effects such as environmental noise, uncomfortable bumps for passengers and damage to train wheels. Ultimately, a complete mechanical failure can create unsafe situations (derailment).

Figure 3 gives the distribution over the two categories of failure types:

Figure 3 Electrical and mechanical failure distribution (derived from [3])

The following figures give examples of failure types of EI-joints:

Figure 5: end post damaged, rail end broken out

Several types of defects can cause failure of EI-joints. The most important ones are those that cause a short circuit between the metal on both sides of the 6 mm layer of insulation. A short circuit can be caused by pollution, such as iron splinters and material shaved off from the wheels of trains, the rails themselves or both. It can also be caused by deformation of the metal: over the years, passing trains roll out the material until it eventually makes contact with the other side, an effect called overhang. Part of the failures are caused by vandalism, usually at level crossings. If a coin or a piece of metal is put on the rails, an electrical contact is made, the system thinks a train is passing, the bells start ringing and the barriers come down.

One specific, extensive study [3] reports on the numbers of yearly failures per failure type over a population of 50,000. In the following table they give the absolute numbers per 50,000 EI-joints per year:

Nr   Failure cause                                Actual nr/50,000
1    Joint shorted: splinters                     261
2    Joint bypassed: overhang                     200
3    Joint shorted: foreign object (vandalism)    200
4    Joint shorted: shavings                      150
5    Broken fishplate                             83
6    Poor geometry                                48
7    Glue connection broken                       37
8    Rail head broken out                         30
9    Damage by maintenance                        18
10   Joint shorted: shaving from grinding         10
11   End post broken out                          9.4
12   Battered head                                5.5
13   Arc damage                                   3.4
14   Broken bolts                                 2.1

Figure 6 (also derived from [3]) gives the distribution of failure categories of EI-joints:

Figure 6: Distribution of failure types [3]

The table and figures above suggest that 1,057 EI-joints per 50,000 per year fail, in other words about 2.1%. ProRail has 43,000 active EI-joints, so we may expect around 900 failures per year.
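The arithmetic behind these numbers can be reproduced with a short Python sketch. The failure counts are copied from the table above; the variable names are chosen for illustration only, and the result is an expectation under the assumption that the failure rates from [3] also apply to the ProRail population.

```python
# Yearly failures per 50,000 EI-joints, copied from the table above [3].
failures_per_50k = {
    "Joint shorted: splinters": 261,
    "Joint bypassed: overhang": 200,
    "Joint shorted: foreign object (vandalism)": 200,
    "Joint shorted: shavings": 150,
    "Broken fishplate": 83,
    "Poor geometry": 48,
    "Glue connection broken": 37,
    "Rail head broken out": 30,
    "Damage by maintenance": 18,
    "Joint shorted: shaving from grinding": 10,
    "End post broken out": 9.4,
    "Battered head": 5.5,
    "Arc damage": 3.4,
    "Broken bolts": 2.1,
}

REFERENCE_POPULATION = 50_000   # population size used in [3]
PRORAIL_POPULATION = 43_000     # active EI-joints at ProRail

total_failures = sum(failures_per_50k.values())
failure_rate = total_failures / REFERENCE_POPULATION
expected_prorail_failures = failure_rate * PRORAIL_POPULATION

print(f"Total failures per 50,000 joints per year: {total_failures:.1f}")
print(f"Yearly failure rate: {failure_rate:.1%}")
print(f"Expected failures per year at ProRail: {expected_prorail_failures:.0f}")
```

The sum of the table is roughly 1,057 failures per 50,000 joints, a yearly rate of about 2.1%, which corresponds to roughly 900 failures per year for 43,000 joints.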

2.4 Characteristics of Rail Insulation Joints

Several configuration data registrations within the ProRail organization contain data on EI-joints. The details of these registrations will be described in chapter 5. The most relevant registrations are:

• SAP EAM, containing the information on the physical characteristics of objects.

• Naiade, containing the locations of section endings.

Of these two sources, SAP EAM contains slightly over 43,000 and Naiade 18,000 registrations of active EI-joints. The differences are explained by the following aspects:

• SAP EAM is intended to reflect individual, physical components. EI-joints can be single but also double sided and do not all have the function of a section ending for train detection. EI-joints in switches are often intended to prevent electrical contact between the right and left rails, without always being a section boundary.

• Naiade only contains the locations of section endings projected on the center line of the railroad track and does not administer whether a section ending is single sided (either at the left or right rail, thus only one joint) or double sided (at the left and the right rail, thus two joints). It also disregards extra EI-joints in switches that do not have a function as a section boundary.

Although no exact measurements were available for this research, specialists at ProRail indicate that there should be about 33,000 EI-joints that have a function as a train detection section ending and are most relevant for condition monitoring.
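As a rough consistency check, the three numbers above can be related to each other with simple arithmetic. The sketch below rests on two assumptions that are not stated in the source: that every Naiade section ending corresponds to either one (single sided) or two (double sided) physical joints, and that the 33,000 joints estimated by the specialists are exactly the joints at those section endings. All inputs are rounded estimates, so the outcome is indicative only.

```python
# Rounded figures quoted in the text.
sap_eam_joints = 43_000          # physical EI-joints registered in SAP EAM
naiade_section_endings = 18_000  # section endings registered in Naiade
detection_joints = 33_000        # joints acting as section endings (specialist estimate)

# Assumption: single + double = section endings
#             single + 2 * double = detection joints
double_sided_endings = detection_joints - naiade_section_endings
single_sided_endings = naiade_section_endings - double_sided_endings

# Joints without a section-ending function, for example extra joints in switches.
other_joints = sap_eam_joints - detection_joints

print(f"Implied double sided section endings: {double_sided_endings}")
print(f"Implied single sided section endings: {single_sided_endings}")
print(f"Joints without section-ending function: {other_joints}")
```

Under these assumptions the registrations are mutually consistent: most section endings would be double sided, and about 10,000 joints would have no function as a section boundary.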

3 Business understanding

3.1 Statement of the business problem

3.1.1 The need for accurate, reliable and complete registration of EI-joints

As described in paragraphs 1.3 and 1.4, the quality of EI-joint registration is variable and at times poor. Registration errors accumulate through the years, causing inefficiencies and cost in the primary processes that use the data as their input. This calls for affordable verification techniques.

This thesis uses automated verification of EI-joints as a case for verification of object registrations in general. About two to four times per year, pictures of the rails are taken by a camera that is mounted vertically under the train. One way to find the EI-joints is to inspect the pictures visually. Experience with manually selecting pictures has shown that human inspection would require an estimated 126 man-days just to inspect all pictures of a single measurement run of 14,000 km of rails. Including double checks and overhead, it is not unreasonable to assume that a single run of data would cost at least 150 man-days in total. It is clear that if one generalizes human verification to a variety of object types, cost and effort would become unacceptably high.

The alternative is automatic recognition and registration of objects. The first objective of this thesis is to establish whether EI-joints can be detected automatically from pictures of the rails taken by specialized measurement trains. If so, this would indicate that there are various opportunities for automatic verification for the ProRail organization.
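To give a feeling for the manual effort involved, the estimates above can be extrapolated to a full year. The sketch below uses only the figures mentioned in the text (126 and 150 man-days per run, 14,000 km of rails, two to four runs per year); the function name `yearly_man_days` is an illustrative assumption.

```python
RAIL_KM_PER_RUN = 14_000            # km of rails in one measurement run
INSPECTION_MAN_DAYS_PER_RUN = 126   # estimated pure inspection effort
TOTAL_MAN_DAYS_PER_RUN = 150        # including double checks and overhead


def yearly_man_days(runs_per_year: int) -> int:
    """Manual effort per year for one object type, given the number of runs."""
    return runs_per_year * TOTAL_MAN_DAYS_PER_RUN


# Implied inspection speed of a human inspector.
km_per_man_day = RAIL_KM_PER_RUN / INSPECTION_MAN_DAYS_PER_RUN
print(f"Implied inspection speed: {km_per_man_day:.0f} km of rail per man-day")

for runs in (2, 3, 4):
    print(f"{runs} runs per year: {yearly_man_days(runs)} man-days")
```

With two to four runs per year, manual verification of EI-joints alone would take between 300 and 600 man-days per year, before any other object type is considered.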

3.1.2 The influence at punctuality of failing EI-joints

A malfunction of the infrastructure causes so-called ‘train service affecting irregularities’, denoted with the acronym TAO². The goal of the asset manager ProRail in general is to reduce the number and impact of TAOs.

For this study it is relevant to note that a disturbance of a train service can have several causes:

1. A breakdown of the train itself

2. Logistics (personnel or trains not being at the right time at the right place)

3. An organizational or physical malfunction of the infrastructure, or external effects blocking the infrastructure

Categories 1 and 2 are the responsibility of the carriers and outside the influence of ProRail; they are therefore not considered in this report.

Category 3 is ProRail’s responsibility. In this report we focus on disturbances caused by malfunctions in the train detection systems of the infrastructure. This type of disturbance has its causes in the physical (as opposed to organizational) malfunctioning of the train detection system of the infrastructure.

² TAO stands for ‘treindienst-aantastende onregelmatigheid’, the Dutch term for ‘train service affecting irregularity’.


In this paragraph we take a further look at the relative importance of EI-joints compared to all types of disturbances that ProRail must manage. The object of this study concerns the #3 impact on train service: train detection as described in paragraph 2.1. The following table illustrates the relative number³ of trains hindered by electrical circuit problems in train detection, as a percentage of the total number of impacted trains. This appears to be 13.8%.

Although the fraction of electrical circuit malfunctions that is caused by EI-joint malfunctions is not exactly known, the majority of electrical circuit problems are attributed to a malfunction of EI-joints. One should realize that reducing train detection malfunctions by detecting degraded EI-joints will have a positive impact on 10-14% of all infrastructure-related hindrance. Still, it is the third TAO category, and therefore important.

The second objective of this thesis is to research the possibilities to classify the condition of EI-joints automatically. This would potentially have a positive impact on the KPI for punctuality of the train service.

3.2 Research objectives

The research objectives of this study are:

1. Can EI-joints be automatically recognized from specific images taken by measurement trains, for the purpose of verifying existing configuration data registrations of EI-joints? If so, how, and how well does it perform in terms of metrics (overall accuracy, precision and recall, confusion matrix)?

2. What are the requirements for condition-based monitoring of EI-joints using Image Recognition?

³ Note that 100% is all hindrance caused by the infrastructure, but that infrastructure-related hindrance is in fact only 15% of total hindrance, as explained in the introduction. Since this study is limited to the ProRail organization, I will use 100% for all infrastructure-related hindrance (all hindrance for which ProRail must manage the solution).
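The metrics named in the first research objective can be illustrated with a small, self-contained Python sketch. The labels below are made-up toy data (1 = image contains an EI-joint, 0 = it does not) and the function names are illustrative assumptions; they do not represent the results or the implementation of this study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1      # joint present and detected
        elif truth == 0 and pred == 1:
            fp += 1      # false alarm
        elif truth == 1 and pred == 0:
            fn += 1      # missed joint
        else:
            tn += 1      # correctly rejected image
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Derive overall accuracy, precision and recall from the counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy example, not data from this study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("Confusion matrix (rows = actual, columns = predicted):")
print(f"  joint:    TP={tp}  FN={fn}")
print(f"  no joint: FP={fp}  TN={tn}")
print(summarize(tp, fp, fn, tn))
```

For the verification of registrations, recall indicates how many of the existing EI-joints are found, while precision indicates how many of the reported detections are actual EI-joints.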

4 Literature review Image Recognition for Rail Insulation Joints

4.1 About this chapter

Traditionally, railways were inspected visually by inspectors walking along the tracks. In the last two decades measurement trains have appeared, recording video and still images and sensor data using both conventional non-destructive testing and modern techniques such as laser scanners. Initially, the images and data were used to aid inspectors zooming in visually where they had indications of defects. In the last decade, computer power and the general availability of machine learning algorithms have given rise to the use of Image Recognition to automate tedious, error prone and labor-intensive visual scanning of large amounts of images and signals. This chapter presents research results from scientific literature on the application of ML techniques for rail. Some parts of the text in this chapter refer to Neural Networks, Convolutional Neural Networks or Deep Learning. If the reader has no knowledge in these fields, it could be useful to read chapter 6 first, before continuing with the rest of this chapter.

4.2 Literature review of sensoring and ML in Rail applications

4.2.1 Studies using Non-Destructive Testing signals

Non-destructive testing (NDT) is the field of inspection of materials and constructions using sound waves (usually ultrasonic), eddy currents, X-rays, thermal radiation and others. Active methods in NDT use transducers that emit and sensors that receive signals. Examples are transducers that emit and receive ultrasonic sound waves [4], [5], or a coil that emits an electromagnetic field inducing currents and/or sound waves. Passive methods only look at received signals, such as spontaneous sounds, or thermal images giving surface temperature of materials or objects. Rail asset managers all use these techniques to detect anomalies in various materials and constructions, among which the rail itself. Several studies use non-destructive testing data for automatic detection and classification of anomalies in rail.

4.2.2 Laser cameras and images

Another study mentions the use of Deep Learning on 3D images from laser cameras [6]. Others use axle box accelerometers to detect anomalies in rail. An older study reports crack detection in EI-joints, mostly at the fish plates [7], using more classical ML algorithms rather than Neural Networks. They report 29% True Positives but still had 15% False Negatives. These studies indicate that the use of Deep Learning methods, as well as of signals (instead of images), is promising.

4.2.3 Electrical signals and strain gauges

An insulated joint with an electrical failure can sometimes be identified by a drop in the voltage between the two rails of the joint [8]. This voltage can be measured without disrupting the track circuit. Intermittent failures can be noticed by a system that constantly samples this voltage value and watches for sudden drops. The same authors reported that finite element modeling and laboratory testing have identified increases in the strain at the center of the joint bar and the elastic gap opening between rail ends as salient characteristics of debonding. This suggests that built-in strain gauges, if connected in some way (e.g. by IoT), can also be used to find indications of debonding of the EI-joint.


4.2.4 Axle Box Accelerometers

TU Delft has researched the use of axle box accelerometers (ABA) for monitoring of EI-joint condition [9]. Using time series and spectral analysis of ABA signals, they were able to detect EI-joints with a hit rate of 84% and a false alarm rate of 19%. They also studied detection of rail surface defects using ABA [10]. Severe and moderate rail defects were found with a 100% hit rate and no false alarms, and small defects with a 78% hit rate. In this paper, they use the scale-averaged wavelet power (SAWP) for the automatic detection of squats. This function captures the variation of the spectrum in a signal; the system triggers a detection when the power spectrum of the frequencies related to the defects is higher than a given threshold. This is a very simple algorithm that can be applied in real time. The results suggest that this technique has high potential for the detection of squats but, surprisingly, does not score extremely high for detecting EI-joints. A similar study uses energy density measurement of images processed with a Gabor filter [11]. The proposed method uses an algorithm to extract the rails from the background, which comes down to a contrast-based image processing method. By using energy thresholding they report over 89% Recall and Precision.

4.2.5 Image recognition studies

Other studies researched the recognition and/or classification of rail defects (called squats) or specific rail objects from images taken by a camera under or from a train [12], [13], [14], [15]. Research at TU Delft shows that rail defects can be detected with Convolutional Neural Networks (CNN) with up to 93% binary accuracy. They demonstrated that in a multi-class classification of six classes (normal, insulation joints, four types of squats), the accuracy for the classification of joints was 88.6%. Shang et al. [15] use a two-step approach: images are first automatically cropped at the rails, after which a CNN performs a binary classification between normal and defective rails. They also report an overall accuracy of up to 91% for the detection of rails containing defects such as squats. In general, the studies contain results of both binary and multi-class classification.

4.2.6 Combining images and other signals

The use of other (non-image) data in combination with image recognition could potentially be very interesting. Using axle box acceleration or eddy current for instance, ML algorithms could be able to detect EI-joints from their signals and confirm the presence of joints detected from images with CNN (or the other way around).

4.3 Conclusions on related work

The studies referred to suggest that signals other than images offer potential for classification of anomalies and joints in rail. Time series measured from ultrasonic equipment, eddy current inspection, laser cameras and axle box accelerometers all offer potential for classification. As for images, Neural Networks offer interesting opportunities in the classification of signals.

Research in image recognition shows promising results in the recognition of both rail defects and EI-joints [10], [12], [13], [15]. This literature will be used in the CNN architecture as explained in paragraph 7.2 and the results in chapter 8.


5 Data description and exploration

5.1 Data sources

ProRail has a number of data sources on EI-joints available, each of which serves a specific purpose. This section describes the primary data sources, to give the reader a global understanding of the systems and their data referred to in later sections. The reader should note that only the images contained in the BBMS system have been used for the image recognition results presented in chapter 8. It is nevertheless important to understand the other sources, since they provide information about EI-joints and are also the sources to be verified by the results of image recognition. The sources mentioned in this study are:

5.1.1 SAP EAM

SAP EAM is part of the SAP landscape of ProRail and contains the functionalities for the physical asset management (daily maintenance, incident management, replacement planning, financial planning, etc.) of objects. SAP EAM contains an administration of the master data of the objects and their physical characteristics. The object master data in SAP EAM is the basis for the physical asset management processes of both ProRail and its contractors. One of the object types contained is the EI-joint, including producer, type, location, etc. SAP EAM also contains notifications of incidents. Experts at ProRail analyze root causes of incidents. SAP EAM is a source to be verified, rather than used in this thesis.

5.1.2 Naiade

Naiade is a system that is used in the schematic-geographical domain of data. Its main purpose is to supply a solid data source for (re)design and publication of train communication, signaling and processing systems. Naiade contains a registration of all tracks of the railway network. It does not contain the individual EI-joints, but it does contain a registration of the boundaries of the train detection sections. The Naiade data have, indirectly, been collected automatically from technical drawings. The technical drawings themselves have been made by people and contain inaccuracies in locations due to the use of the km-ribbon way of representing the location. Like SAP EAM, the Naiade data remain to be verified rather than being used as data in this thesis.

5.1.3 BBMS (Monitoring data)

ProRail monitors the railway network regularly with specifically contracted measurement trains, recording data of ultrasonic and eddy current testing, and various images and videos. The BBMS system publishes the monitoring data from measurement trains, among which images of the rails. The images all have reference files, referring to the correct track and a position at the track at which the picture has been taken. Images stored in BBMS have been used in this study. They will be described in paragraph 5.4.

5.1.4 Technical, schematic designs of train signaling systems

ProRail manages the collection of technical designs of the train signaling system. The designs are created and stored as CAD files and not as structured data. These drawings contain the locations of section endings and individual EI-joints. The InfraAtlas system (a bespoke development of ProRail) extracts section endings from the design drawings projected on the railway center line, but not per side (left or right rail, or both). This source has occasionally been used to check EI-joint positions from designs, but does not have an important role in this study. As with the SAP and Naiade data, it is a source to be verified.

5.2 Location registration

The sources used contain several ways of determining the location of an object. They are:

• GEO-code/km: ProRail uses a system that divides the country into a number of GEO-codes (areas or regions) and a kilometer position along the track. This facilitates easy recognition by the train driver and communication with train traffic control and incident management. The main disadvantage is that it has considerable inaccuracies in absolute positions.

• Kilometer-ribbons: this system is closely related to the GEO-code/km location system. It has the same disadvantages as the GEO-code/km system.

• Coordinates in GPS or RD: this has the advantage of giving an exact location on the map.

• Length along a track in % of total track length.

The following table gives an overview of the location registration used per system and type of data:

Data from GEO-code/km Km-ribbon RD/GPS % of track

SAP EAM x x (RD)

Naiade x x (GPS)

BBMS x

It should be noted that the location results cannot be used immediately and require translation between different ways of capturing locations. ProRail already has undertaken steps to develop software components, but they have not been applied in this study yet. To bring the image recognition results to value, they must be translated to location


5.3 Data quality

SAP EAM contains 45,343 EI-joints in total. Figure 7 gives an indication of the types registered in SAP EAM:

Figure 7: Numbers of EI-joints per type of joint (from SAP EAM dd. May 28, 2018)

The largest category is of the type ‘Unknown’ (‘Onbekend’ in Dutch) and contains 43.5% of all registrations of EI-joints. The second category is the EI-joint of type ‘Edilon-NS’ and is found in large parts of the South-West, West and North-East of the country and contains 34.7% of the grand total.

Another example is the registration of the date of installation of joints, illustrated in Figure 8. A large number of the EI-joints have a date in the year 1800, long before any railroad track existed in the country.


Figure 8: Visualization of installation dates of EI-joints in SAP EAM

This raises doubts about the accuracy of the EI-joint registrations in SAP.

Naiade contains 18,302 EI-joints reflecting section endings. As indicated earlier, Naiade has been converted from technical drawings and a GIS system. The difference between the total number in SAP and in Naiade is explained by the following facts:

• Many joints are in both the left and right rail. SAP registers each individual EI-joint, while Naiade only registers section boundaries from EI-joints as a single section ending, reflecting the joint position(s) at the track center line, without specifying whether it is single or double sided.

• Extra joints occur in switches to avoid electrical contact between sections through the switch construction. These joints do not form section endings and are not registered in Naiade.

The absolute locations of the section endings may be inaccurate because of the km-ribbon location indications in the technical drawings that form the source of the Naiade data. It is also not exactly known whether all registrations are still current, actually exist or are obsolete. A general shortcoming of the Naiade and especially the SAP EAM data collections is that they are produced by people (directly or indirectly) and that, especially in large quantities, errors can be made in the administration of objects. This is a well-known problem in SAP EAM, a database of approximately 1,000,000 objects that have been collected over many years by a great variety of contractors and personnel. In practice, manual registrations of this size tend to degrade in quality over time. Keeping the administration accurate is very labor intensive.

The locations of the images from the measurement trains have inaccuracies due to measurement errors in GPS. The measurement train also uses the number of wheel revolutions to determine its position. Slippage of the wheels may cause inaccuracies in the determination of the location of the images.


All these aspects make it difficult to use a combination of the data sources effectively. Using the SAP EAM and Naiade collections for an initial estimate of location appeared to be too inaccurate and to involve too much work in translating between location systems. Image locations can be computed from data supplied by the contractor that preprocesses the images of the measurement train, but this required considerable programming. This study therefore focuses on researching if and how well joints can be detected from images.

5.4 Images used in this study

The images used in this study have been recorded by a line scanning camera mounted vertically under one of the measurement trains. This measurement train periodically (two to four times per year) takes pictures of all the rails of the network. They are all images of the top of the rail and not of the sides. It should be noted that, for the purpose of the second objective (condition monitoring), failure types that are not visible from the top will not show up in the pictures and cannot be detected and classified from the measurement train pictures alone. The following picture gives an example of an image containing an EI-joint:

Figure 9: Example of an original image containing an EI-joint

The image covers 2.5 m of rail in length and 0.56 m in width around the rail. The image contains 1293 x 289 pixels. The length of the images is not constant throughout the dataset; it varies by plus or minus 5 pixels. The full dataset contains 4.2 million images with a total size of 230 GB. The images do not cover all parts of the railway network. The total length covered by the images is about 10,400-10,800 km. The switches are not covered by the images. The dataset is imbalanced. The ratio of images with joints to images without joints is hard to measure exactly: the sheer number of pictures prevents counting the pictures that contain joints. Assuming that all section ending EI-joints are contained in the pictures, the ratio of images containing a joint to images not containing a joint would be 1:127 on average. This assumption will be used in the remainder of this study. However, some tracks use other signaling systems, such as axle counters. Without going into detailed descriptions of this type of system, it occasionally happens that tracks (and thus series of images) do not contain joints.


6 Brief introduction in Deep Learning

6.1 Deep Learning in perspective

During the twentieth century people have attempted to develop systems of Artificial Intelligence (AI), an area of computer science that attempts to create intelligent machines that work and react like humans. In the beginning researchers tried to develop machines with explicitly programmed rules. This type appeared not to be very successful in dealing with situations that had not been programmed into the system. Another subcategory of AI consists of methods and systems that learn from data. This is the field of Machine Learning (ML). ML algorithms can be divided into three major categories:

1. Supervised Learning algorithms: this category requires a labeled dataset as input to train the algorithm.

2. Unsupervised Learning algorithms: this category does not require labeled data but determines categories itself.

3. Reinforcement Learning algorithms: this category is a more recent development and will not be considered in this thesis.

It is important to note that the algorithm used in this study requires labeled input to be trained and therefore belongs to the category of supervised learning algorithms.

ML systems often require a set of characteristics in some tabular form as inputs, after which an algorithm learns to make a distinction between classes (classification) or predicts a value of some kind. Often, these algorithms are simple and easy to train and use, but they have the disadvantage that they require a set of characteristics (features) of the topic of interest. But how can one be sure that one has the optimal set of features? A specific subset of ML, nowadays referred to as Deep Learning, does not require the extraction of features but can learn from the data (such as images or signals) directly. Deep Learning is a term used for multi-layered Neural Networks. The word “Deep” refers to the use of many layers in these Neural Networks. Neural Networks are loosely inspired by the working of the visual cortex and the brain of animals and humans, which will be described further in this chapter. Thus, Deep Learning is a subset of ML, which in turn is a subset of AI, as illustrated in the next figure.


Figure 10: Deep Learning can be seen as a subset of ML, that is a subset of AI (source www.edureka.com)

The visual and auditory systems of animals and people are systems of biological learning with which a wide variety of visual, auditory and other complex sensory information can easily be classified and recognized. This has inspired the development of models of how learning happens or could happen in the brain. They usually model the brain function through interconnected cells or neurons that transfer values between cells through some mathematical function. Over time, this principle was called cybernetics in the 1940s-1960s, connectionism in the 1980s-1990s (when the term Artificial Neural Networks (ANNs) was also used) and has seen a resurgence from 2006 onwards, since then often called Deep Learning.

6.2 The working principles of Neural Networks

In this section the principles of Neural Networks (NN) are briefly explained. The intention is to give the reader a general idea of how a NN works and of some important characteristics that explain its effectiveness compared to other types of algorithms, so that the results presented later in this thesis can be interpreted more easily. This section does not go into the mathematical details, other than the basic principle of the way a NN is trained and computes output from an input. The interested reader can find these details in textbooks (e.g. [16]). NNs consist of layers of connected cells, or neurons, that transfer values from layers of sending cells to layers of receiving cells, as illustrated in Figure 11. These layers are also called densely connected or dense layers, a term we will use further on in this text. The input layer (red) contains the values from our image, signal or some set of characteristics that we wish to classify. The first yellow layer receives its inputs from the input layer, after which it passes its values on to the next layer, until the output layer (blue) is reached. In this example the output layer reflects four classes: by simply picking the output node with the highest value, the network indicates its estimate of the class that it finds most likely to reflect the class of the input. The signal or input goes from left to right. The first hidden layer builds up a representation from the input layer, the second hidden layer from the first hidden layer, and so on until the output layer is reached. This is called forward propagation.

The principle of forward propagation works as follows: let us take the blue neuron y1 in the next figure as an example. This is the top neuron in the first hidden layer. Its value is determined by the sum of a weight w1 multiplied by the value of the first neuron of the input layer, x1, and the weight w2 multiplied by x2. Note that this is a linear combination. Next, we apply a non-linear function to this sum. The reason to apply a non-linear transformation is that otherwise we would have nothing more than a set of linear equations with a large number of weights, and linear models are limited in the problems they can solve.

Figure 12: forward propagation from input cells x to hidden layer 1, to hidden layer 2, to the output layer y.

The equation for the determination of the first neuron in the first hidden layer, $y_{1,1}$, is:

$$y_{1,1} = f\left(w_{x_1,1}\,x_1 + w_{x_2,1}\,x_2 + b_{1,1}\right)$$

Next, the values in the rest of the cells of the first hidden layer are computed the same way. Note that transferring the input $x_1, x_2$ to the first hidden layer $y_{1,1}, y_{1,2}, y_{1,3}$ requires six weights in total. The last term, $b_{1,1}$, is a bias term to compensate for offsets in the data; each of the three cells in the hidden layer has its own bias. So, in total there are 9 parameters involved in transferring the input to the first hidden layer. The general formula to determine the number of parameters from one layer to another is

$$(\text{nr of input cells}) \times (\text{nr of output cells}) + (\text{nr of output cells})$$

So, in our example above, we would have a total number of

$$(2 \times 3 + 3) + (3 \times 2 + 2) + (2 \times 1 + 1) = 20$$

parameters. In today's applications of NNs it is not uncommon to have millions to tens of millions of parameters. The function $f$ is the activation function. Examples of activation functions are the Rectified Linear Unit and $\tanh(x)$. The Rectified Linear Unit, or ReLU, is defined by:

$$\mathrm{ReLU}(x) = \max(0, x)$$
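The forward propagation and the parameter count described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the network used in this thesis: the layer sizes (2, 3, 2, 1) follow the example, while the random weights, the seed, the function names and the use of ReLU on every layer (including the output) are assumptions made for the sketch.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: negative values become 0."""
    return np.maximum(0.0, x)

def init_layers(sizes, seed=0):
    """Create random (weights, biases) pairs for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Forward propagation: linear combination plus bias, then activation."""
    for weights, biases in layers:
        x = relu(x @ weights + biases)
    return x

def count_parameters(layers):
    """(inputs x outputs) weights plus one bias per output cell, per layer."""
    return sum(w.size + b.size for w, b in layers)

sizes = [2, 3, 2, 1]              # input, hidden layer 1, hidden layer 2, output
layers = init_layers(sizes)
output = forward(np.array([0.5, -1.0]), layers)
print(count_parameters(layers))   # 20
```

The count of 20 matches the sum worked out in the text: 9 parameters into the first hidden layer, 8 into the second and 3 into the output.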

Note that both ReLU(x) and tanh(x) are non-linear functions, which allow us to effectively describe complex and non-linear phenomena. There is a variety of activation functions, and numerous variations per type, but ReLU is reported to be an effective activation function in image recognition and will be used in the research for this thesis.

The next question is how to train the algorithm. This comes down to determining the set of parameters that optimally maps our input (e.g. images of cats and dogs) to the output that reflects the classes (this is a cat or a dog). This is done through back propagation. Without going into the mathematical details, this works as follows: at the start we have no idea what the parameters in our network should be, so we just start at some random value or, in some cases, at a best guess or values from previous work. Next, the network computes the output using the initial parameters and determines some measure of error between the output (reflecting the predictions) and the true values of a training set. Then, in a next step, called an epoch, the parameters are slightly adjusted and the output is determined again. If the error increases, we are obviously going in the wrong direction with our parameters. If the second output has a smaller error, the algorithm has found the direction in which to search further. In the further steps the algorithm uses the error difference between two consecutive epochs as a measure for the change in the weights. Again, this report will not explain the details mathematically, but it is important to realize that the error plays a role. The intention is that, from epoch to epoch, the error (called loss) becomes smaller and smaller and that this is done as computationally efficiently as possible. Searching for the minimum of a complex function with many parameters is a mathematical and computational field in itself. We will not go into any further detail here, but Deep Learning packages (TensorFlow, Theano, Torch) have implemented search algorithms that deal effectively with finding optimal parameters in a neural network. The above procedure shows that Neural Networks usually fall in the category of supervised learning. This is not always the case, but in the research described in this report, it is.
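To make the idea of a loss that shrinks from epoch to epoch concrete, the sketch below trains a single neuron with gradient descent, the kind of search that Deep Learning packages implement in more refined forms. It is an illustration under stated assumptions and not the training procedure of this thesis: the toy data, the sigmoid output, the cross-entropy loss, the learning rate and the number of epochs are all chosen for the example.

```python
import numpy as np

def sigmoid(z):
    """Squash a value into the range 0-1 so it can be read as a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def train(x, y, epochs=200, learning_rate=0.5, seed=0):
    """Fit one neuron by gradient descent and record the loss per epoch."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(scale=0.1, size=x.shape[1])  # random starting point
    bias = 0.0
    losses = []
    eps = 1e-12                                       # avoids log(0)
    for _ in range(epochs):
        pred = sigmoid(x @ weights + bias)            # forward propagation
        loss = -np.mean(y * np.log(pred + eps)
                        + (1 - y) * np.log(1 - pred + eps))
        losses.append(loss)
        error = pred - y                              # gradient of the loss
        weights -= learning_rate * (x.T @ error) / len(y)
        bias -= learning_rate * error.mean()
    return weights, bias, losses

# Toy labeled data (an assumption): class 1 when the two features sum to > 0.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
y = (x.sum(axis=1) > 0).astype(float)

weights, bias, losses = train(x, y)
accuracy = np.mean((sigmoid(x @ weights + bias) > 0.5) == (y == 1))
```

Rather than trying a change and checking whether the error went up or down, the update uses the gradient of the loss, which gives the direction of improvement directly.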

6.3 Convolutional Neural Networks

6.3.1 Convolutional layers

In recent years, Convolutional Neural Networks (CNNs) have been shown to be effective in image recognition [17], [18], [19], [20], [21], [22]. CNNs use convolutional layers in combination with the layers described in the previous section. Convolutions can be seen as filters that highlight features in an image, such as edges or certain characteristic shapes or textures. Instead of just determining weights and biases as described above, CNNs determine the set of convolutional filters that fits the classification best. In a convolutional layer, the values of the receiving cells are determined by a convolution with the input layer. Convolutions build feature maps of images. If, for example, we specify that a convolutional layer has 10 “nodes”, the layer builds 10 different images from 10 different convolutions with the input image. By using the principle of back propagation as illustrated in the previous section, the network finds the set of convolution filters whose feature maps best highlight the characteristics that discriminate between the classes the algorithm has learned from training.
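The way a convolution turns an image into a feature map can be sketched as follows. The two filters here are hand-picked edge detectors and the 6 x 8 toy image is synthetic; both are assumptions for illustration, since in a CNN the filter values are learned during training.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image without padding, as CNN layers do
    (strictly speaking a cross-correlation), and return the feature map."""
    k_rows, k_cols = kernel.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    feature_map = np.zeros((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            window = image[r:r + k_rows, c:c + k_cols]
            feature_map[r, c] = np.sum(window * kernel)
    return feature_map

# Toy image: dark left half, bright right half, so one vertical edge.
image = np.zeros((6, 8))
image[:, 4:] = 1.0

vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to left-right change
horizontal_edge = vertical_edge.T                   # responds to top-bottom change

# One feature map per filter, like the "nodes" of a convolutional layer.
feature_maps = [convolve2d(image, k) for k in (vertical_edge, horizontal_edge)]
```

The first feature map lights up around the vertical edge, while the second stays at zero because the toy image has no horizontal edge.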

6.3.2 Maxpooling layers

Although CNNs have been shown to be very effective in image recognition, they come at a computational cost. Without going into further detail, they tend to lead to strong increases in the number of parameters from layer to layer. A way to mitigate this is the use of pooling operations. Pooling is a technique to downsize images by summarizing groups of pixels. CNNs often use maxpooling layers that take the maximum in a box of pixels, usually 2x2, to create a summarized image. Maxpooling reduces the number of parameters that the convolutional layers produce. Another purpose of maxpooling layers is to consider the image at multiple levels of resolution and scale. Maxpooling also is a form of regularization, described in the next paragraph.

6.3.3 Regularization

CNN’s usually contain hundreds of thousands to millions or even more parameters. One way to look at the parameters is that they reflect what the network has learned from the training set. Having so many parameters always has the risk of overfitting, the tendency to have very detailed knowledge about the training set, but a much weaker ability to generalize to unseen data. Regularization is a way to suppress overfitting effects. Two strategies of regularization have proved to be effective in CNN’s:

1. The use of Dropouts

2. Batch normalization

Dropout is a technique that randomly turns off connections between cells of two connected layers [23]. This is illustrated by the figure at the left. In each step of training, the algorithm randomly chooses a number of connections to break. The user specifies the percentage of connections to be broken. Dropouts in a way introduce noise into the system, thus preventing the network from becoming too sensitive to detailed features in the training set. It can be seen as using a (slightly) different network in each step of training, because in each step another set of units is canceled out. Dropouts tend to suppress overfitting effects.

Batch normalization as proposed in [24] takes the output of a particular layer and rescales it to a mean of 0 and a standard deviation of 1 of a particular batch of training data. The algorithm solves the problem where different batches of input might produce wildly different distributions of output in any layer of the network. If this occurs the learning can progress very slowly. Batch normalization has the effect that it speeds up the learning progress in cases where these effects might occur.
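Both regularization techniques can be sketched in NumPy. This is a simplified illustration, not the implementation used in this thesis. Two assumptions should be noted: the dropout function zeroes outputs of units and rescales the survivors (the form commonly found in Deep Learning packages), and the batch normalization leaves out the learnable scale and shift of the full method in [24]. The function names and test data are hypothetical.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of the values; used during training only.

    The surviving values are scaled by 1 / (1 - rate) so that the expected
    sum stays the same (often called inverted dropout).
    """
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

def batch_normalize(batch, eps=1e-5):
    """Rescale each feature of a batch to mean 0 and standard deviation 1."""
    mean = batch.mean(axis=0)
    variance = batch.var(axis=0)
    return (batch - mean) / np.sqrt(variance + eps)

rng = np.random.default_rng(0)

# Dropout with a rate of 25% on a layer output of ones.
activations = np.ones((1000, 10))
dropped = dropout(activations, rate=0.25, rng=rng)

# A batch with an offset and a wide spread, before and after normalization.
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
normalized = batch_normalize(batch)
```

About a quarter of the values in `dropped` are zero, and each column of `normalized` has a mean close to 0 and a standard deviation close to 1.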


Finally, Maxpooling as described in paragraph 6.3.2 is also (next to the functions already described) a form of regularization.
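Maxpooling with a 2x2 box can be sketched as below. The function name and the toy feature map are assumptions for illustration; rows or columns that do not fill a complete box are simply dropped in this sketch.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Summarize each size x size box of pixels by its maximum value."""
    rows, cols = feature_map.shape
    rows -= rows % size            # drop incomplete boxes at the border
    cols -= cols % size
    blocks = feature_map[:rows, :cols].reshape(rows // size, size,
                                               cols // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool(feature_map)
print(pooled)   # [[ 5.  7.] [13. 15.]]
```

Each pooling step halves both dimensions, so a 28 x 280 array becomes 14 x 140 and later layers have fewer values to process.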

6.3.4 Summary

Neural Networks are inspired by the way the brain of humans and animals works. They connect layers of units more or less like brain cells are connected in the brain. Convolutional Neural Networks (CNNs) are a class of Neural Network architectures that are suited to image recognition problems. They use a combination of convolutional layers and fully connected neural layers. To avoid overfitting and slow learning, one can use maxpooling, dropout and batch normalization techniques.


7 Implementation

7.1 Data description and preparation

The following steps have been taken to prepare the images for the CNN:

• Draw a sample of tracks with pictures: The source data consists of 4,200,000 pictures of 1293 x 289 pixels, of 3692 tracks. From the 3692 tracks, a random sample of 102 tracks was drawn. The 102 tracks collectively contained slightly over 100,000 images of left and right sides of rail.

• Build up a training set of positives and negatives from the sample of 100,000 pictures: 422 pictures of joints were manually picked from the total set of 100,000 (see footnote 4). Since the data are imbalanced (approximately 1 positive to 127 negatives), the negatives (not containing an EI-joint) have been built up by drawing a random sample of pictures from the same 102 tracks, from which the positives were removed. Next, the set of negatives was under-sampled to the same number of negatives as positives (422).

• Crop images: The joints themselves are quite small (approximately 4x25 pixels). The 422 original pictures were cropped to zoom in on the rails and their direct surroundings (including the characteristic bolts). Two crop sizes were tested:

  o 430x96 pixels: Although often difficult to observe, the pictures often show the bolts of the fish plates next to the rail. Many pictures show the return cables of the traction power system, attached to the rail close to part of the joints. The bolts and cables could be features for image classification.

  o 280x28 pixels: The second crop size only shows the rail surface itself. The intention is to focus on the rail only and to recognize joints from their characteristic shape, rather than from associated parts such as cables and bolts next to the rail.

• Apply overlap between consecutive cropped pictures: The consecutive cropped pictures overlap to avoid problems with joints at the boundary of an image. In this case an overlap of 50% between consecutive images has been used, as illustrated by Figure 15.

• Normalize cropped pictures: The pixels in the cropped pictures are normalized to 0.0-1.0 floating point NumPy arrays in Python.
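The cropping, overlap, normalization and under-sampling steps above can be sketched as follows. This is a hedged illustration, not the code used for this thesis: the function names are hypothetical, the image is a synthetic stand-in, and the sketch assumes 8-bit pixel values, an array layout of (width, length) and a rail that runs through the middle of the picture.

```python
import numpy as np

def crop_with_overlap(image, crop_length=280, crop_width=28, overlap=0.5):
    """Cut fixed-size windows along the rail with the given overlap.

    `image` has shape (width, length). The window is centred across the
    width, which assumes the rail runs through the middle of the picture.
    """
    width, length = image.shape
    top = (width - crop_width) // 2
    stride = int(crop_length * (1.0 - overlap))
    crops = [image[top:top + crop_width, start:start + crop_length]
             for start in range(0, length - crop_length + 1, stride)]
    return np.stack(crops)

def normalize(crops):
    """Scale 8-bit pixel values to floating point numbers in 0.0-1.0."""
    return crops.astype(np.float32) / 255.0

def undersample(negatives, n_positives, rng):
    """Randomly keep as many negatives as there are positives."""
    keep = rng.choice(len(negatives), size=n_positives, replace=False)
    return negatives[keep]

rng = np.random.default_rng(0)

# Synthetic stand-in for one measurement train picture of 1293 x 289 pixels.
image = rng.integers(0, 256, size=(289, 1293), dtype=np.uint8)
crops = normalize(crop_with_overlap(image))

# Indices stand in for pictures here; 422 negatives are kept out of 1,000.
negative_ids = undersample(np.arange(1000), n_positives=422, rng=rng)
```

With a stride of 140 pixels, one picture yields 8 crops of 280x28 pixels, and each crop shares half of its length with the next one. The last few pixels of a picture are not covered in this sketch; combining consecutive pictures, as mentioned in the text, would close that gap.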

Figure 15: A cropping example of three consecutive, 50% overlapping, cropped images of the original image. Note that cropped picture 1 and 2 both contain the same joint. Cropped picture 3 does not contain the joint.

The following pictures give examples of cropped images of both positives (containing a joint) and negatives (not containing a joint):

4 Manually picking the 422 pictures containing EI-joints from 100,000 cost 3 days (24 hours) of work. The collection has not been double checked to verify if any pictures of joints had been missed, but after rechecking small parts, there was an indication that 10-20% of the images may have been missed.


Figure 16: examples of positives containing a joint (five pictures at the left) and negatives, not containing a joint (five pictures at the right); image size 430x96 pixels.

Figure 17: 280x28 pixels pictures, limited to the rail surface; 10 samples pos (left) and neg (right)

Note that the negatives (no joint) reveal other categories of potential interest. Most are smooth and even, but some reveal welds (pictures 3 and 7 of the negatives in Figure 17), parts of a switch (picture 9 of the negatives) or damage called squats (picture 10 of the negatives).

One advantage of the pictures is that they are quite stable. In some cases the rail was not exactly centered. Also, the pictures vary slightly in length (plus or minus 5 pixels). This was dealt with by taking a fixed window size for cropping. A very small portion of the images was incomplete and contained errors. The orientation is always the same. Finally, consecutive pictures match exactly, so they can easily be combined to scan across the connection of two consecutive pictures. Since the variations were small and incomplete pictures were very rare (fewer than 10 in 100,000), these effects were ignored. Therefore, there was no need for extensive data cleaning or handling of missing data.


7.2 CNN architecture

A number of architectures were tested. The best performing Deep Convolutional Neural Network (CNN) was a tweaked version of the CNN of Molodova et al. [10], with the following architecture:

Type of layer | Specification | Value
Input | Pixel size | 280 x 28
Layer 1: Convolutional | Feature maps | 20
 | Filter size (kernel) | 9 x 5
Layer 1: Maxpooling | Filter size | 2 x 2
Layer 2: Convolutional | Feature maps | 40
 | Filter size (kernel) | 9 x 6
Layer 2: Maxpooling | Filter size | 2 x 2
 | Filter size (kernel) | 4 x 2; 2 x 2
Layer 4: Fully connected | Number of units | 320
Layer 4: Dropouts | Dropout percentage | 25%
Layer 7: Output layer | Number of units | 2
Total number of parameters | | 2,106,086

The convolutional and fully connected layers used the ReLU activation function. The convolutional layers used zero padding to avoid loss of pixels at the outer boundaries. The differences from the specification in [10] are the use of dropout at the first fully connected layer (layer 4) and an output layer of 2 units added to the original output layer of 8.
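The effect of zero padding and 2 x 2 max pooling on the feature map size can be checked with a little arithmetic. The sketch below covers only the first two convolutional layers, for which the specification above is complete. It assumes a single-channel (grayscale) input, a stride of 1 in the convolutions, one bias term per feature map, and a pooling stride equal to the pool size; none of these are stated explicitly in the text. The helper names are hypothetical.

```python
def maxpool_shape(height, width, pool=2):
    """Spatial size after non-overlapping max pooling (remainders are dropped)."""
    return height // pool, width // pool


def conv_params(kernel_h, kernel_w, in_channels, feature_maps):
    """Trainable parameters of a convolutional layer, one bias per feature map."""
    return kernel_h * kernel_w * in_channels * feature_maps + feature_maps


# (kernel size 1, kernel size 2, feature maps) for the first two convolutional layers
conv_layers = [(9, 5, 20), (9, 6, 40)]

shape = (280, 28)   # input size in pixels
channels = 1        # assumption: single-channel (grayscale) input
param_counts = []
for kernel_h, kernel_w, feature_maps in conv_layers:
    # Zero padding with stride 1 leaves `shape` unchanged by the convolution itself.
    param_counts.append(conv_params(kernel_h, kernel_w, channels, feature_maps))
    shape = maxpool_shape(*shape)
    channels = feature_maps

print(shape)         # (70, 7)
print(param_counts)  # [920, 43240]
```

Under these assumptions only the pooling layers reduce the resolution, so the 280 x 28 input is halved twice to 70 x 7 before the third convolutional layer. The parameter count does not depend on which kernel dimension runs along the rail.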

Note that the kernels in the convolutions are not square but elongated. Although it is not exactly clear why this is better than a square kernel, horizontally elongated kernels could fit the elongated shape of the joints better than square ones.

Appendix 1 shows the implementation in Keras. Keras is an API that can be used in combination with several libraries (TensorFlow, Theano, and others). In this case Keras used the TensorFlow backend.

7.3 Training, validation, testing and evaluation approach

The training and testing approach used:

• A balanced, labeled data set for training, validation and testing

• An imbalanced, realistic set for evaluation

Figure 18: Process for building up the balanced training set

The images for training were cropped to 280x28 pixels, taken from the original images of 1293x289 pixels. The following table specifies the data sets used:

Data set | Number of pictures | Balanced or not
Training set | 1,091 (80% of train-validate) of 102 tracks | Balanced
Validation set | 273 (20% of train-validate) of 102 tracks | Balanced
Test set | 190 | Balanced
Evaluation set | 6,036 (24 positives, 6,012 negatives) of 2 tracks | Imbalanced

The CNN was trained on the training and validation sets (1,091 + 273 = 1,364 images in total), using random shuffling at each epoch. Training was done by optimizing the total validation accuracy. The test set of 190 balanced images was kept apart and was not used for training. It was only used for a first evaluation of the trained network. The training and validation results are presented in paragraph 8.2, the test results in paragraph 8.3.
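The 80/20 division of the 1,364 train-validate images can be illustrated with a short sketch. It assumes a simple random split over image identifiers; the text does not state whether the actual split was random or grouped by track, and the function name `split_train_validation` and the fixed seed are hypothetical.

```python
import random


def split_train_validation(items, train_fraction=0.8, seed=0):
    """Shuffle the items reproducibly and split them into two parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


image_ids = list(range(1364))  # placeholder identifiers, not real file names
train_ids, val_ids = split_train_validation(image_ids)
print(len(train_ids), len(val_ids))  # 1091 273
```

The sizes reproduce the 1,091 and 273 images of the table above. The separate test set of 190 images is not part of this split.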

Finally, all images of two tracks were selected and used as a realistic, imbalanced dataset, as illustrated in Figure 19. This set was used to evaluate the trained algorithm on a realistic, imbalanced set of pictures of negatives (no joint) and positives (joint).

Figure 19: Creating the imbalanced test set

Paragraph 8.4 presents the results of the evaluation on the imbalanced test set.

7.4 Evaluation criteria

The experimental results in chapter 8 are evaluated using the following metrics:

Accuracy: the total number of correctly classified measurements divided by the total number of measurements:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Recall: recall is a measure of the ability to find all relevant cases within a dataset. It is very useful for imbalanced classification problems, where only a small number of relevant data points exist among a great many irrelevant ones. The object of interest in this study, detecting EI-joints that occur in 1 out of every 260 (cropped) images, is such a case. By simply classifying every image as a negative, the overall accuracy would be 259/260 = 99.6%, while not a single joint would be found. Our objective is to detect a high percentage of the actual EI-joints. Incorrectly labeling an image as a joint has a lower price: we simply have to remove the incorrectly labeled images manually. In other words: we require high recall. The definition is:

$$\text{recall} = \frac{TP}{TP + FN}$$
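The 259/260 figure quoted above can be verified with a small worked example. It models a hypothetical classifier that labels all 260 cropped images as negative when exactly one of them contains a joint; the function names are illustrative only.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all images that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Fraction of the actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Always-negative classifier on 260 crops containing a single joint:
# the joint is missed (FN = 1) and the 259 other crops are correct (TN = 259).
tp, fn, fp, tn = 0, 1, 0, 259

baseline_accuracy = accuracy(tp, tn, fp, fn)
baseline_recall = recall(tp, fn)
print(f"accuracy = {baseline_accuracy:.3f}, recall = {baseline_recall:.3f}")
# accuracy = 0.996, recall = 0.000
```

The high accuracy comes entirely from the negatives, whereas the recall of zero exposes that no joint is detected.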

Precision: precision is a measure of how many of the positively classified data points actually are positive data points:

$$\text{precision} = \frac{TP}{TP + FP}$$

The cost of low precision is the cost of manually removing the misclassifications in the category False Positives. The higher the precision, the lower this cost.

Confusion matrix: A confusion matrix contains information about actual and predicted classifications done by a classification system, in this case the trained CNN algorithm. It can be used to evaluate the performance of the classification system. A binary classification can be evaluated by the following matrix:

 | Predicted negative | Predicted positive
Actual negative | TN | FP
Actual positive | FN | TP

True Positive (TP): an actually positive data point that has been classified as positive
True Negative (TN): an actually negative data point that has been classified as negative
False Positive (FP): an actually negative data point that has been classified as positive
False Negative (FN): an actually positive data point that has been classified as negative
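The four categories can be counted directly from a list of actual and predicted labels. The sketch below uses a small set of hypothetical toy labels (1 = joint, 0 = no joint), not results from this study, and arranges the counts in the same layout as the matrix above. The function name `confusion_counts` is illustrative.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = joint, 0 = no joint)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


# Hypothetical toy labels, not results from this study
actual = [1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 1, 0, 0, 0, 0, 0]

c = confusion_counts(actual, predicted)
# Rows: actual negative, actual positive; columns: predicted negative, predicted positive
matrix = [[c["TN"], c["FP"]],
          [c["FN"], c["TP"]]]
precision = c["TP"] / (c["TP"] + c["FP"])

print(matrix)     # [[5, 1], [1, 1]]
print(precision)  # 0.5
```

In this toy case one of the two predicted joints is a false positive, so half of the flagged images would have to be removed manually.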