Detecting Wildfires Through the Use of Bird Behaviour


Layout: typeset by the author using LATEX.


Guilly Kolkman 11822465

Bachelor thesis Credits: 18 EC

Bachelor Kunstmatige Intelligentie

University of Amsterdam
Faculty of Science
Science Park 904
1098 XH Amsterdam

Supervisor
Dr. E. Rakhimberdiev
Dr. S. van Splunter

Conservation Ecology Group, Groningen Institute for Evolutionary Life Sciences, University of Groningen, The Netherlands

Department of Vertebrate Zoology, Lomonosov Moscow State University, Moscow, Russia

Jan, 2021


Acknowledgements

Firstly, I would like to thank my supervisor Eldar Rakhimberdiev for assisting me from start to finish with this project and thesis. Whenever I got stuck, he always had clever solutions that approached the problems I encountered from a new and different angle. Secondly, I would like to acknowledge Sander van Splunter for being my contact person at the University of Amsterdam, which allowed me to have Eldar as my supervisor. In addition, I would like to thank him for helping me with navigating all the technicalities that come with writing a thesis. Thirdly, I would like to thank Willem Bouten for bringing me in contact with Eldar.


Abstract

Wildfires are becoming more frequent with global warming and pose a major threat to nature and humans alike. This thesis explores whether wildfires can be detected with the aid of bird behaviour. To examine this, tracks of birds that were close to a wildfire were annotated and analysed with multiple supervised machine learning algorithms. The bird dataset covers birds that breed in Germany and migrate towards Spain and Africa. The wildfire dataset comprises wildfires that were detected by satellites. These two datasets were combined and analysed to find evidence that wildfires can be detected with the aid of bird behaviour. The analysis shows that a change in the behaviour of a bird close to a fire can be observed, but the machine learning algorithms used were not yet precise enough to detect most of the wildfires.

Keywords: bird behaviour, wildfire detection, machine learning algorithms, GPS, animal tracking, early warning system


Chapter 1 Introduction

Wildfires are a threat to ecosystems, vegetation, and the lives of animals and humans. The smoke created by wildfires also causes air pollution [7]. In 2018 there were a total of 59,000 wildfires and 354,000 hectares burned in Europe, the Middle East and North Africa [8]. Early wildfire detection is therefore paramount to limiting wildfire spread. Two kinds of detection systems are currently in place: sensor based and satellite based. Sensor-based detection uses individual sensors that need to be installed on vantage points. Two kinds of sensors are used: optical sensors, which provide colour information, and infrared sensors, which detect wildfires from the thermal radiation of the surroundings. Sensors have the disadvantage that they have a limited range and are prone to noise. Satellite-based detection, in contrast, has a large detection range and a high accuracy. However, the response time depends on when the satellite passes over the area. Another downside of satellite detection is that it has trouble detecting fire through clouds [7]. Because these methods have their disadvantages, this research presents a proof of concept for a novel wildfire detection method. It is based on the behaviour patterns of animals when a wildfire is in close vicinity, using global positioning system (GPS) tracking data stored in the Movebank database. Movebank is the largest repository for movement data. It covers different animal species with a wide spectrum of trackers and holds 2.4 billion locations. The database is publicly available for research purposes and is maintained by the Max Planck Institute of Animal Behavior.

1.1 Bird behaviour

An unexpected event like a wildfire will induce stress in animals, and this affects their behaviour [3, 6]. For a wildfire this means a behavioural change towards fleeing. The most direct and fastest way of fleeing for a bird is flight. These characteristics of flight can be analysed, and from the amount of time spent in flight it can be deduced whether something unexpected happened to the bird. A classification algorithm will be used to detect bird behaviour when a wildfire is in close vicinity. Previous studies have researched the classification of animal behaviour using supervised and unsupervised techniques [4, 9]. The behaviours studied range from foraging, resting and wandering to predator avoidance.

Based on this information, there is no detection system capable of detecting wildfires with the aid of bird behaviour. Therefore, the main research question of this thesis is: to what extent can wildfires be detected with bird tracking data? To be able to answer this research question this thesis will focus on two sub-questions:

• How can bird behaviour be annotated?

• Based on the behaviour models defined, to what extent is it possible to detect patterns in bird behaviour that indicate a wildfire in a close vicinity, using basic machine learning approaches?

The thesis is structured as follows. First, the two datasets used are analysed. Then a basic overview is given of the machine learning algorithms used to investigate the main research question. The last section contains the results of the machine learning algorithms.


Chapter 2 Approach and Implementation

The approach to answering the research question of this thesis was to use classification algorithms to detect whether there is a wildfire or not. The detection in this thesis uses supervised learning. Supervised learning can be defined as the task of learning the mapping from input X to output Y by using annotated data [2]. The two datasets used for this analysis were the sample tracking data stored at Movebank and the MODIS dataset on wildfires. The research, which covers the period 2015-2019, is described in the following sections.

2.1 Dataset

2.1.1 Movebank

The Movebank study "LifeTrack White Stork Rheinland-Pfalz" [5] was used for tracking the birds. This dataset consists of 109 birds equipped with a wearable tracking device that logged GPS positions every 5 minutes, together with accelerometer data. These birds breed in Germany and migrate towards Spain and Africa.

2.1.2 MODIS

The dataset used for the detection of wildfires is the MCD14ML dataset. This dataset is from the earth observation data, LANCE: NASA Near Real-Time Data and Imagery, Fire Information for Resource Management System (FIRMS) [1]. This dataset uses Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS is an instrument on the two satellites Terra (EOS AM-1) and Aqua (EOS PM-1). The Terra satellite passes the equator in the morning from north to south while the Aqua satellite passes the equator in the afternoon but from south to north. Therefore these two satellites cover the entire Earth’s surface every 1 to 2 days.


Figure 2.1: MODIS dataset

MODIS detects fires in 1-km pixels that are burning at the time of overpass. Some of the parameters of the MODIS dataset are shown in figure 2.1. The observations used in this thesis are latitude, longitude and confidence.

2.2 Formatting data

Building a supervised model requires an annotated dataset. This dataset was created with a spatiotemporal overlay of the Movebank and MODIS datasets. First, the Movebank dataset was simplified to bird behaviour per hour, which decreased the calculation time without decreasing accuracy when combining the two datasets. Likewise, in the MODIS dataset fires that burn on consecutive days were annotated, because if a fire already exists, bird behaviour is not likely to show a strong response. Afterwards, for each hourly position of a bird, the shortest distance between the bird and a wildfire was calculated. These bird-fire pairs were stored in a new dataset. Note, however, that the timestamps in the MODIS fire dataset do not correspond to the time a fire started; what MODIS actually registers is the time at which the satellite passes over the area of the wildfire. In addition, every row has a column that denotes whether it is a day with or without fire. For the days without fire, the assumption was made that two days before the fire is detected, the bird behaved as normal.
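The shortest-distance step described above could be sketched as follows. This is a minimal illustration and not the code used in the thesis: it assumes great-circle (haversine) distance on a spherical Earth with a radius of 6371 km, and the function names `haversine_km` and `nearest_fire` as well as the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_fire(bird, fires):
    """Return (distance_km, fire) for the fire closest to one bird position."""
    pairs = ((haversine_km(bird[0], bird[1], f[0], f[1]), f) for f in fires)
    return min(pairs, key=lambda pair: pair[0])


# Made-up coordinates, for illustration only.
bird_position = (40.0, -3.0)
fire_positions = [(40.0, -2.0), (41.5, -3.0), (40.02, -3.01)]
distance_km, fire = nearest_fire(bird_position, fire_positions)
```

In a full pipeline this lookup would be repeated for every hourly bird position, restricted to fires detected around the same date.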

Secondly, from this new annotated dataset with bird-fire pairs, parameters were calculated to analyse the bird's behaviour per day. These parameters are related to the movement of the bird. The behaviours used were resting, walking and flying. The assumption was that when a fire is in close vicinity the bird would fly away, flying being the shortest and fastest means of travel. Resting is defined as a speed of less than 0.15 km/h, walking as a speed between 0.15 and 5.0 km/h, and flying as a speed greater than 5.0 km/h. These thresholds were chosen by analysing the speed histogram of the Movebank dataset. Using the speed of the bird, the total distance travelled per day was calculated and added to the parameter dataset. Extra parameters used were the number of behavioural modes in a day, the average duration of a behaviour, and nighttime movement. Normally a bird such as the stork used in this research sleeps at night, so nighttime movement could suggest a disturbance. Nighttime movement was determined by comparing the time of a behaviour with the sunset and sunrise of the specific day at the bird's location. If the behaviour took place before sunrise, it was added to the nighttime parameter. Other behaviours, such as breeding and hunting, were not added, because they require analysis of the accelerometer data, which is outside the scope of this thesis.
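The speed thresholds above translate directly into a small labelling function. The sketch below is an assumption about how this could be coded, not the thesis implementation: `classify_speed` and `daily_summary` are hypothetical names, a speed of exactly 5.0 km/h is treated as walking, and one speed value per hour is assumed so that each value also equals the distance covered in that hour.

```python
from collections import Counter


def classify_speed(speed_kmh):
    """Map a speed in km/h to a behaviour label using the thresholds from the text."""
    if speed_kmh < 0.15:
        return "resting"
    if speed_kmh <= 5.0:
        return "walking"
    return "flying"


def daily_summary(hourly_speeds_kmh):
    """Count hours per behaviour and sum the distance for one bird-day.

    Assumes one speed value per hour, so speed (km/h) equals km travelled that hour.
    """
    counts = Counter(classify_speed(s) for s in hourly_speeds_kmh)
    return {
        "resting": counts["resting"],
        "walking": counts["walking"],
        "flying": counts["flying"],
        "distance_travelled_km": sum(hourly_speeds_kmh),
    }


# Made-up hourly speeds, for illustration only.
summary = daily_summary([0.0, 0.1, 2.0, 30.0])
```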

Lastly, after the parameters were calculated, principal component analysis (PCA) was used. This method reduces the dimensionality of the dataset while keeping most of the information. To use PCA, the dataset first needed to be standardised. Standardisation means that every feature is rescaled to a mean of zero and a standard deviation of one. For PCA, comparable feature scales are important, because PCA looks for the directions of maximal variance. When features are not standardised, the direction of maximal variance can therefore be skewed towards the features with the largest scale.
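The standardisation step can be illustrated for a single feature column. This is a minimal sketch under stated assumptions: the function name `standardise` and the sample values are hypothetical, and the population standard deviation is used for the scaling.

```python
import statistics


def standardise(values):
    """Rescale a feature column to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no variance; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Made-up daily distances in km, for illustration only.
z_scores = standardise([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applying this to every feature before PCA keeps a feature measured in large units, such as distance in km, from dominating the principal components.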

The final dataset consisted of the parameters per bird per day that were close to a fire. The parameters were:

• date
• no_of_points_speed
• longitude
• latitude
• distance_travelled_km
• resting
• no_of_rest
• average_rest_time
• walking
• no_of_walk
• average_walk_time
• flying
• no_of_fly
• average_fly_time
• nighttime_flying
• nighttime_walking
• distance_from_fire
• bird
• fire

2.3 Statistical analysis

Multiple supervised algorithms were tested to determine which one works best for classifying wildfires. For all the algorithms the Python module scikit-learn (sklearn) was used. For the remainder of this section the classification of wildfire will be referred to as the class.


2.3.1 Naive bayes

As the name naive Bayes suggests, it is based on Bayes' theorem. Bayes' theorem can be defined as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

This gives the probability of event A happening, given that B has occurred, where A is the class and B are the features. The assumption is made that all features are independent [2].
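Under the independence assumption, the posterior for a class given several features factorises into a product of per-feature terms. The worked form below is added for illustration; the notation $B_1, \dots, B_n$ for the individual features is introduced here and is not from the thesis, and independence is understood as independence given the class.

```latex
\[
P(A \mid B_1, \dots, B_n)
  = \frac{P(A)\,\prod_{i=1}^{n} P(B_i \mid A)}{P(B_1, \dots, B_n)}
\]
```

Because the denominator is the same for both classes, a classifier only needs to compare the numerators for "fire" and "not fire".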

2.3.2 Logistic regression

Logistic regression separates the classes by finding a linear decision boundary between them. The output of the linear function is then mapped to a value in [0, 1] with the sigmoid function [2]. That value is assigned to the class, zero or one, that it is closest to. Thus, a value of 0.30 is classified as not a fire.

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
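The mapping from a linear score to a class can be sketched as below. The function names and the 0.5 threshold parameter are assumptions for illustration; the threshold follows from the "whichever is closest" rule in the text. The two-branch form of the sigmoid is only there to avoid floating-point overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + e^-x)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(linear_output, threshold=0.5):
    """Return 1 (fire) if the sigmoid value is at or above the threshold, else 0."""
    return 1 if sigmoid(linear_output) >= threshold else 0


# A linear output of about -0.847 maps to roughly 0.30, i.e. "not a fire".
label = classify(-0.8473)
```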

2.3.3 Support vector machine

A support vector machine has the objective of finding a hyperplane in the N-dimensional feature space, where N is the number of features, that separates the classes. This decision boundary is chosen to maximise the margin on the training data, where the margin is defined as the distance from the boundary to the closest sample [2].

2.3.4 Linear discriminant analysis

Linear discriminant analysis focuses on dimension reduction while losing as little class-separating information as possible. It reduces dimensions in a similar manner to principal component analysis. The difference is that linear discriminant analysis is supervised and used for classification, whereas principal component analysis is unsupervised [2].

2.3.5 Quadratic discriminant analysis

Quadratic discriminant analysis is similar to linear discriminant analysis. However, as the name suggests, it can learn quadratic decision boundaries, which is not possible in linear discriminant analysis [2].


2.3.6 K-nearest neighbours

This algorithm classifies data by calculating which objects are closest to each other. When a new object is added, the k most similar labelled objects (its nearest neighbours) are looked up, and the new object is assigned the class that is most common among them [2].
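A minimal sketch of the nearest-neighbour idea follows, assuming Euclidean distance and a majority vote among the k closest labelled samples. The function name `knn_predict`, the choice k=3 and the toy two-feature data are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up, already standardised features: (share of day flying, nighttime flying).
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
labels = [0, 0, 1, 1, 1]  # 0 = no fire, 1 = fire

prediction_near_fire = knn_predict(points, labels, (0.8, 0.8))
prediction_no_fire = knn_predict(points, labels, (0.0, 0.0))
```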

2.3.7 Decision tree

A decision tree is a nonparametric model in which a region is identified by a sequence of recursive splits. The tree consists of two kinds of nodes: decision nodes and leaf nodes. Each decision node applies a discrete split based on a condition, after which a branch is chosen according to the outcome of that condition. This process is repeated until a leaf node is reached and no more splits are made. A node becomes a leaf node when all instances that reach it belong to the same class; such a node is called pure. Purity is calculated using an impurity measure.

$$I_m = -\sum_{i=1}^{K} P_m^i \log_2 P_m^i$$

Purity is satisfied if $P_m^1 = 1$ and $P_m^2 = 0$; then all examples are of one class. Here $P_m^i$ is the probability that an instance reaching node $m$ has class $i$, fire or not fire [2].
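The impurity measure can be evaluated numerically for a node. The sketch below assumes the entropy form of the formula with a base-2 logarithm and skips zero probabilities, following the convention that 0 log 0 = 0; the function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def entropy(class_probabilities):
    """Entropy impurity: the sum of -p * log2(p) over classes with p > 0."""
    return sum(-p * math.log2(p) for p in class_probabilities if p > 0)


def node_entropy(labels):
    """Entropy of a node, estimated from the class labels of the instances reaching it."""
    counts = Counter(labels)
    total = len(labels)
    return entropy(count / total for count in counts.values())


pure_node = node_entropy(["fire", "fire", "fire"])
mixed_node = node_entropy(["fire", "no fire", "fire", "no fire"])
```

A pure node has impurity zero, while an evenly mixed two-class node has the maximum impurity of one bit.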

2.3.8 Random forest

Random forest builds on the decision tree classifier. It trains multiple decision trees on different random subsets of the dataset. The individual trees then vote, and the class with the most votes is selected as the combined prediction [2].

2.4 Approach

Detecting wildfires with a 1 or a 0 is a binary classification task: a binary classifier splits the data into two groups based on its classification rules.

2.4.1 Evaluation

The different algorithms are evaluated through cross-validation. First, a train, test and validation set is made with a split of 7/10. Because the results can differ from one split to another, cross-validation with 10 folds was used. For the range of 5km the dataset was not big enough for 10 folds, so 5 folds were used instead. How successful an algorithm was is evaluated by an accuracy measure, where tp, tn, fp and fn are true positives, true negatives, false positives and false negatives respectively, and N is the total number of samples.
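How a dataset is divided into folds can be sketched as follows. This is an illustration under assumptions, not the procedure used in the thesis: the generator name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled, whereas in practice the data would usually be shuffled or stratified first.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# The 5km dataset has 19 entries; with 5 folds each test fold holds 3 or 4 samples.
folds = list(k_fold_indices(19, 5))
test_fold_sizes = [len(test) for _, test in folds]
```

With 19 samples, 10 folds would leave only one or two samples per test fold, which is a plausible reason for choosing 5 folds at this range.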

$$\mathrm{Accuracy} = \frac{tp + tn}{N}$$

However, the accuracy can be skewed because of the distribution of the data. Thus, precision, recall and F-score were also used to evaluate the algorithms. Precision is defined as the proportion of positive classifications that are correct.

$$\mathrm{Precision} = \frac{tp}{tp + fp}$$

Recall is defined as the proportion of actual positives that were correctly classified.

$$\mathrm{Recall} = \frac{tp}{tp + fn}$$

Lastly, the F-score is the harmonic mean of precision and recall.

$$F\text{-}\mathrm{score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
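The four measures can be computed directly from the confusion counts. The sketch below is illustrative: the function name `evaluate` and the example counts are made up and are not results from the thesis, and a measure is set to 0.0 when its denominator is zero.

```python
def evaluate(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


# Made-up counts for a 19-sample dataset, for illustration only.
scores = evaluate(tp=7, tn=6, fp=3, fn=3)
```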


Chapter 3 Results

The results are split over three distances from the initial fire: 5km, 10km and 50km, with dataset sizes of 19, 78 and 1326 respectively.

3.1 5km

The best machine learning algorithm for fires within a 5km range is quadratic discriminant analysis, with an accuracy of 0.753. The accuracy differs between the highest ranking and the lowest ranking algorithm by a margin of 0.386, as can be seen in table 3.1. In the PCA of the 5km data, retaining 0.95 of the information yields six principal components. The feature carrying the most information at 5km is flying.

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Quadratic Discriminant Analysis | 0.753 | 0.567 | 0.700 | 0.640 |
| Linear Discriminant Analysis | 0.666 | 0.517 | 0.650 | 0.573 |
| Support Vector Machine | 0.600 | 0.500 | 0.600 | 0.533 |
| Logistic Regression | 0.517 | 0.450 | 0.550 | 0.447 |
| Decision Tree | 0.433 | 0.317 | 0.400 | 0.373 |
| K-Nearest Neighbours | 0.433 | 0.217 | 0.400 | 0.313 |
| Bayes | 0.433 | 0.333 | 0.450 | 0.367 |
| Random Forest | 0.367 | 0.283 | 0.400 | 0.300 |

Table 3.1: 5km results


3.2 10km

In a range of 10km, K-nearest neighbours yields the best results, while in the results of 5km K-nearest neighbours was ranked second to last. When going from 5 to 10km the best accuracy also decreased, from 0.753 to 0.625, as seen in table 3.2. In addition, quadratic discriminant analysis, which provided the best result at 5km, has now dropped to an accuracy of 0.492. The F1-scores of K-nearest neighbours and logistic regression do not differ much, but for all algorithms ranked below logistic regression the F1-score is below 0.5. Furthermore, the PCA of the 10km dataset, retaining 0.95 of the information, had seven principal components, as described in table 3.4, with flying being the highest ranking feature.

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| K-Nearest Neighbours | 0.625 | 0.554 | 0.625 | 0.578 |
| Logistic Regression | 0.583 | 0.571 | 0.575 | 0.557 |
| Bayes | 0.542 | 0.438 | 0.550 | 0.472 |
| Quadratic Discriminant Analysis | 0.492 | 0.379 | 0.500 | 0.418 |
| Random Forest | 0.458 | 0.354 | 0.475 | 0.387 |
| Linear Discriminant Analysis | 0.450 | 0.350 | 0.450 | 0.388 |
| Decision Tree | 0.433 | 0.363 | 0.438 | 0.387 |
| Support Vector Machine | 0.383 | 0.242 | 0.375 | 0.298 |

Table 3.2: 10km results

3.3 50km

The accuracies for fires in a range of 50km all lie between 0.469 and 0.520, as seen in table 3.3. There is little difference between the best and worst performing algorithms. Moreover, K-nearest neighbours, the algorithm that had the best results at 10km, now has the lowest result with an accuracy of 0.469. The PCA of 50km has the same number of principal components as that of 10km. In addition, the order of importance of the principal components is the same.

3.4 Comparison

A comparison between the best performing algorithms over the different distances was made in figure 3.1. This figure shows a decrease in all evaluation measures the farther a bird is from a wildfire. There is a clear decrease in accuracy, recall and F-score, whereas precision decreases more slowly, as can be seen in figure 3.1. There was also not a big difference between the evaluation of 50km and 100km. In addition, a few of the features used in the algorithms do not appear among the principal components in table 3.4. These features were resting, no_of_rest, no_of_walk and the distance travelled in a day.

| Model | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Logistic Regression | 0.520 | 0.519 | 0.519 | 0.513 |
| Bayes | 0.511 | 0.502 | 0.502 | 0.442 |
| Random Forest | 0.504 | 0.503 | 0.503 | 0.500 |
| Quadratic Discriminant Analysis | 0.493 | 0.462 | 0.485 | 0.411 |
| Decision Tree | 0.492 | 0.491 | 0.492 | 0.490 |
| Linear Discriminant Analysis | 0.477 | 0.477 | 0.473 | 0.454 |
| Support Vector Machine | 0.472 | 0.469 | 0.470 | 0.467 |
| K-Nearest Neighbours | 0.469 | 0.468 | 0.469 | 0.467 |

Table 3.3: 50km results

| 5km feature | Percentage of information | 10km feature | Percentage of information | 50km feature | Percentage of information |
|---|---|---|---|---|---|
| flying | 0.420 | flying | 0.344 | flying | 0.331 |
| average_rest_time | 0.224 | walking | 0.218 | walking | 0.242 |
| no_of_fly | 0.152 | average_walk_time | 0.139 | average_walk_time | 0.150 |
| average_fly_time | 0.083 | no_of_fly | 0.118 | no_of_fly | 0.081 |
| nighttime_flying | 0.066 | nighttime_walking | 0.070 | nighttime_walking | 0.073 |
| no_of_fly | 0.029 | nighttime_flying | 0.059 | nighttime_flying | 0.063 |
| - | - | average_walk_time | 0.026 | average_walk_time | 0.031 |

Table 3.4: PCA


Figure 3.1: Comparison best algorithm results

3.5 Exploration of the results

Besides the machine learning algorithms, the signal of a wildfire close to a bird was explored with an activity graph, shown in figure 3.2. In the graph, grey is resting, green is walking and orange is flying. When a fire occurred close to the bird, the bird flew away and kept travelling through the air on the following days. In figure 3.2 the latitude increases or decreases in the days after a fire took place, as opposed to staying constant when there was no fire.


[Figure panels: activity by hour of day over time, and latitude over time, for animal ID 4353 (2016-2018) and animal ID 4358 (Jul-Dec).]

Figure 3.2: Activity of a sample bird with a fire<5km


Chapter 4 Discussion

This thesis provides a proof of concept that it is possible to detect patterns in the behaviour of birds in close vicinity to a wildfire. The concept could help decrease the response time of wildfire detection, and the implementation cost is low since GPS trackers are already in place for many animals. One way to improve pattern detection is with the help of deep learning. However, that was outside the scope of this thesis.

One of the biggest drawbacks to this research was the amount of data available. There was enough signal of behavioural change for birds within 5km, however, the dataset consisted of only 19 entries. Thus, the reliability of the research could be improved with a bigger dataset.

Another option to increase the reliability and the results, would be to use the timestamps per 5 minutes to research exactly when a bird’s behaviour would change in response to a fire instead of the general behaviour over a whole day. To research the exact change, a different dataset for fire detection is needed that has the exact timestamp a wildfire starts. Thus, the downside of these two existing datasets is not having the precise parameters that could improve the research.

Lastly, combining two datasets might result in missing information or in information that contains errors. For the bird dataset the accelerometer data were discarded, and for the fire data the information about the strength of the fire was discarded. These were discarded because of time constraints and difficulty. A suggestion for future work is to use machine learning algorithms that can handle time series data to research the patterns of bird behaviour.


Chapter 5 Conclusion

This thesis focused on the research question: to what extent can wildfires be detected with bird tracking data? This question was separated into two sub-questions. The first sub-question asked how bird behaviour can be annotated. A bird's behaviour can be annotated using general parameters such as resting, walking and flying. However, more specific behaviours are more difficult to annotate and require timestamp analyses. Therefore, the basic speed-based model can distinguish resting, walking and flying, but it cannot annotate other behaviours.

The second sub-question concerned the possibility of detecting patterns in bird behaviour that indicate a wildfire in close vicinity, using basic machine learning approaches. This question can be answered by viewing the accuracy, precision, recall and F-score. The closer a fire was to a bird, the stronger the response was. This can be seen from the results of the machine learning algorithms and the manual inspection of the birds. Moreover, the most important feature for detecting whether there is a wildfire or not is the amount of flying in a day. This can be explained by the fact that when a fire occurs the bird tries to flee in the most direct manner. Because flying is the most direct and fastest way to flee from a fire, the bird will most likely choose it. Interestingly, the amount of nighttime movement does not influence the detection of a fire significantly. This could be because there are not many such instances in the dataset. Nevertheless, the algorithms do show a signal of patterns for the detection of a wildfire.

Thus, regarding the main research question, a bird's behaviour changes enough near a fire that detecting wildfires from bird behaviour appears possible. However, the results of this research are not reliable enough to detect wildfires with certainty.


Bibliography

[1] MCD14ML. https://earthdata.nasa.gov/earth-observation-data/near-real-time/firms/mcd14ml.

[2] Ethem Alpaydin. Introduction to machine learning. The MIT Press, third edition, 2014.

[3] A. Berger, K.-M. Scheibe, S. Michaelis, and W. J. Streich. Evaluation of living conditions of free-ranging animals by automated chronobiological analysis of behavior. 2003.

[4] Merijn de Bakker. Automatic classification of bird behaviour on the basis of accelerometer data. June 24th 2011.

[5] Myotis (Wolfgang Fiedler). LifeTrack White Stork Rheinland-Pfalz. https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study76367850.

[6] A. J. J. MacIntosh. The fractal primate. 2014.

[7] Panagiotis Barmpoutis, Periklis Papaioannou, Kosmas Dimitropoulos, and Nikos Grammalidis. A review on early forest fire detection systems using optical remote sensing. 11 November 2020.

[8] Jesus San-Miguel-Ayanz, Tracy Durrant, and Roberto Boca. Forest fires in Europe, Middle East and North Africa 2018. 2019.

[9] Guiming Wang. Machine learning for inferring animal behavior from location and movement data. 2008.
