Imputing Parking Usage on Sparsely Monitored Areas Within Amsterdam Through the Application of Machine Learning


Submitted in partial fulfillment for the degree of Master of Science

Jeroen Schmidt

12280860

Master Information Studies, Data Science

Faculty of Science, University of Amsterdam

2019-06-25

 | Industry Supervisor | Academic Supervisor | UvA Examiner
Title, Name | Dr Bas Schotten | Dr Elenna Dugundji | Frank Nack
Affiliation | City of Amsterdam | Vrije Universiteit Amsterdam & CWI | University of Amsterdam
Email | b.schotten@amsterdam.n | E.R.Dugundji@cwi.nl | nack@uva.nl


Jeroen Schmidt

MSc Candidate University of Amsterdam jeroen.f.l.schmidt@gmail.com

ABSTRACT

Effective parking policy is essential for cities to reduce the demand their road networks experience and to combat their carbon footprints. Existing research in the application of machine learning to understand parking behaviour assumes that cities have prohibitively expensive stationary parking sensors installed, while no research has yet attempted to use machine learning to impute parking behaviour using mobile probe data of sparsely monitored areas. To this end, this paper shows that it is indeed feasible to impute parking pressure (occupation as a percentage). Gradient Boosted Trees were found to perform the best, with an R2 score of 0.20 and an RMSE score of 0.087. This paper also found that 3 unique parking occupancy patterns exist and that this information, in combination with neighbourhood characteristics, has an impact on imputation under certain conditions.

KEYWORDS

sparse spatial-temporal data, predicting sparsely monitored parking usage, machine learning, data science, map matching, clustering, imputation

1 INTRODUCTION

An effective parking policy is essential to city planning as city governments strive to become more efficient with their use of resources. Recent studies show that parking policy has a direct effect on the traffic patterns that cities experience; a study by Shoup[27] found that roughly 30% of all urban traffic is due to cruising congestion caused by individuals looking for on-street parking. Parking policy, as a result, also has an impact on the carbon footprint that many cities are aiming to reduce.

Parking behaviour is a challenging phenomenon to track, making the development of good parking policy particularly hard. Existing methods, such as the manual counting of parking usage or the use of stationary parking sensors, are either impractical or expensive if a city wishes to have a complete spatio-temporal view of how parking is used.

The City of Amsterdam¹ wishes to implement better parking policy yet does not have a complete view of the parking behaviour[3]. The temporary parking pass purchases it has available provide a continuous time-series view for the whole city and account for roughly 15% of daily parking behaviour. This behaviour is easy to track because we are able to link the purchased passes to the zones and associated pay stations. The majority of parking usage is however not trackable because it stems from long-term (housing, work) parking permits, which do not self-report. It so happens that the City of Amsterdam has mobile sensors attached to cars that patrol the city for parking infringements; these mobile sensors provide the city with a sparse temporal-spatial view of parking usage. Thus the City of Amsterdam wishes to use the mobile sensor data to determine what the percentage of parking occupation for neighbourhoods looks like when they are not observed. The percentage of parking occupation is also referred to as parking pressure.

¹ Map provided in Appendix F, Figure 21

The problem of determining the value of missing observations is known as imputation. A survey on Smart Parking Solutions conducted by Lin et al.[22] showed that the imputation of parking usage has been sparsely explored for the case where no continuous view of parking behaviour within a single area is available. Three papers were identified that are closely related to the problem at hand: 1) data simulated to resemble probe vehicle observations was used to forecast parking usage[8] (not to be confused with predicting for unseen areas); 2) parking usage was imputed for unmonitored parts of a city by taking the predictions of a machine learning model that was trained on complete data from sensors monitoring another part of the city that matched the characteristics of the unmonitored parts[20]; 3) Inverse Distance Weighting (IDW) was used to impute parking pressure for unmonitored roads that were adjacent to roads that were monitored continuously with sensors[9]. Of these three papers, only the work by [8] dealt with sparse spatio-temporal data, while the other two papers used continuous data to impute data for completely unmonitored areas.

This paper investigates the feasibility of imputing parking pressure for sparsely monitored areas using sparse spatio-temporal data. Towards that goal, we structure the research around the following research questions:

Is it possible to impute parking usage of sparsely monitored city areas using sparse temporal-spatial data and machine learning methods? This is divided into the following sub-questions;

(1) Given the structure of the data, what supervised machine learning methods produce the best results?

(2) What kinds of city data are the most important for imputing unobserved neighbourhoods?

(3) Can neighbourhood characteristics of over-represented neighbourhoods be used by supervised machine learning methods to better predict for under-represented neighbourhoods?

(4) Do neighbourhoods show distinct parking behaviour and does the classification of these unique behaviours help impute overall parking pressure?

The remainder of this paper is broken into 5 sections. Section 2 covers the existing literature. Section 3 explains the methodologies employed and the reasoning for their use. The results are shown in Section 4 and then discussed in Section 5 in terms of the research questions. The paper's findings and discussions are then summarised in the concluding Section 6.

2 RELATED LITERATURE

No research was identified that tackles the problem of imputing parking usage for sparsely observed areas within cities, but extensive related work exists on the topic of smart parking. The related research is broken down into 3 subsections: 1) a general overview of the field of smart parking solutions in the context of imputing parking usage, 2) a discussion of research that has directly or very closely looked at imputing parking usage for unobserved areas, and 3) a discussion of existing research that looked at predicting future parking usage when the data is continuous.

2.1 Overall Field

An academic survey on Smart Parking Solutions conducted by Lin et al.[22] covered the past 15 years and looked at over 100 papers. A categorisation framework was proposed for the different types of research conducted within the domain of smart parking solutions. This framework offered clarity into the current research, but its omission of the topic demonstrated the lack of research conducted on the imputation of parking usage in sparsely monitored areas through predictive machine learning methods.

The academic survey by Lin et al.[22] provided guidance into relevant papers based on the types of data sources they used, the two notable ones being non-stationary sensors and crowdsourcing. Papers that looked at non-stationary sensors used cars equipped with LIDAR, Magnetron and Cameras to provide a variety of smart parking solutions and were mainly concerned with the problem of detecting parking occupancy; while none were related to predicting occupancy, they did offer insight into the methodologies used to detect occupancy. The papers that looked at using crowdsourcing methods to determine parking usage primarily investigated the simulated situation in which a large number of individuals volunteered their parking behaviour through a mobile application. These papers[23][8] overlap somewhat with this paper but differ in that they assume complete city scan coverage within roughly 25 minutes, which is not the case in the Amsterdam context.

2.2 Predicting Parking for Unseen Areas

Three papers were identified that directly or very closely dealt with the topic of predicting parking usage within sparsely monitored areas. The majority of these come from Fabian Bock, who has been focusing on the topic of unknown parking behaviour since 2016.

The work by Bock et al.[8] looked at the performance of random forests to predict future parking usage using data that was simulated to resemble 1) mobile app crowd-sourcing, 2) Probe Vehicle parking detection (analogous to the scanner cars used in Amsterdam). The research used the San Francisco fixed point parking data[1] to simulate the 2 cases to predict into a 30 minute time horizon. This research assumed that there were sufficient probe vehicles (300 cars) in the city to allow for a 25 minute time window for a complete view of the city. This research lends credence to the idea that sensor data in the form of scanner cars can be used to predict for parking behaviour if there are enough mobile sensors.

The work by Bock and Sester[9] investigated the performance of using Inverse Distance Weighting (IDW) to impute parking usage in adjacent roads from nearby road segments that had continuous sensor data. They found strong correlations between the parking usage of adjacent roads and that of the observed road segment at distances of up to 50m. They found that IDW was 5% better than taking the aggregate of the road segments. This research showed that there is a possibility for information gain when considering spatial relations between observations.
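As a rough illustration of the idea behind IDW, the sketch below estimates the occupancy of an unmonitored road segment as a distance-weighted average of nearby monitored segments. The function name, the power parameter of 2 and the example values are assumptions made for illustration only; they are not taken from Bock and Sester[9].

```python
def idw_estimate(observations, power=2.0):
    """Inverse Distance Weighting estimate for one target location.

    observations: list of (distance_m, occupancy) pairs for monitored
    segments near the target. Names and defaults are illustrative.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for distance_m, occupancy in observations:
        if distance_m == 0:
            # A monitored segment at the target location is used directly.
            return occupancy
        weight = 1.0 / distance_m ** power
        weighted_sum += weight * occupancy
        weight_total += weight
    if weight_total == 0:
        raise ValueError("at least one observation is required")
    return weighted_sum / weight_total


# Hypothetical neighbours: (distance in metres, observed occupancy in [0, 1]).
neighbours = [(10.0, 0.9), (20.0, 0.5), (40.0, 0.1)]
estimate = idw_estimate(neighbours)
```

Closer segments dominate the estimate, which matches the intuition that parking usage on adjacent roads is correlated over short distances.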

The paper by Ionita et al.[20] was based on the author's master's dissertation[19]. Gaussian clustering was performed on the average duration of amenity visits² in order to identify pairs of areas that had monitored and unmonitored parking spaces. The paper showed that one could train a model on one area of a city and then use that trained model to predict for another part of the city (that was completely unmonitored) on the condition that they belonged to the same cluster class. This paper also experimented with several machine learning models (Decision Trees, Support Vector Machines, Multi-Layer Perceptrons, Gradient Boosted Machines) and determined that gradient boosting machines performed the best. This paper relied on the San Francisco OpenParking data for the occupancy values. The paper also considered events, fuel price, traffic and weather data, but found that fuel price and traffic had little predictive value. The author proposed that future work could investigate the usage of machine learning models to learn the clusters directly and then predict parking usage behaviour.

Auxiliary research of interest includes the work by Bock et al.[10], who looked at using clustering to aggregate parking meters according to temporal behaviour in order to simplify the analysis of those devices. The research found that there were 3 distinct patterns of ticket purchasing behaviour within the city of Hannover, Germany over a one year time period. Hierarchical clustering was used on the hourly aggregated occupancies of the sold passes. This work is of interest because it can help guide the understanding of different neighbourhoods when building predictive models to impute the parking usage, and it lends itself well to the work that was done by Ionita et al.[20].

2.3 Predicting Future Parking Occupancy

The majority of research on predicting future parking occupancy was conducted on the assumption that continuous, full-view data was available in the form of sensor data. While this research is not directly relevant to predicting parking usage for sparsely monitored areas, it did show that machine learning could be used successfully in the domain of parking and offers insight into methods and techniques that could be explored in the future.

A large portion of research conducted on predicting future parking occupancy before 2014 was done through statistical and heuristic approaches, such as Caliskan et al.[12], David et al.[14] and Teodorovic and Lucic[28]. Machine learning has been used in more recent years to predict parking usage. Some auxiliary literature on the topic includes the work by Vlahogianni et al.[29], who demonstrated a solution that performed real-time parking forecasting for the city of Santander. They used MAPE as an evaluation metric and a combination of survival analysis and Multi-Layer Perceptrons to build a time-series model.

² Obtained from the Google Places API

More recent work by Badii et al.[4] looked into predicting parking volumes in parking lots in a part of Florence. They compared several different classes of predictive models and found that Bayesian Regularized Neural Networks obtained the best precision as measured by a MAPE score.

The work by Richter et al.[26] was done in collaboration with Volkswagen AG. Their primary goal was to find an optimal strategy to build parking occupancy models that would suit the limited resources of on-board car navigation systems. They built time series models that only used parking occupancy figures, and they employed spatial and temporal clustering through the use of complete linkage and dynamic time warping (DTW).

The work by Yang et al.[30] was submitted in January 2019 and is currently under review. It proposes a framework that allows for parking occupancy forecasting using several combinations of data classes through the use of long short-term memory and convolutional graph neural networks. The framework supports the following data classes: temporal-spatial (graph), temporal, spatial (graph) and constant. Their use case was Pittsburgh Downtown parking data, for which the paper used parking meter transactions, traffic speed data, roadway network information and weather data. The paper used mean absolute percentage error (MAPE) as its evaluation metric and yielded favourable results.

3 METHODOLOGY

The Data Science Process defined by Blitzstein et al.[6]³ was used to synthesise the methodology used in this paper. It is an iterative process consisting of 5 core parts: synthesising a question, getting the data, exploring the data, modelling the data, and communicating & visualising the results. Each step supports the step that follows it, while insights from the current step are fed back to the preceding step in order to refine it.

This methodology proved ideal in dealing with the complexities posed by the data, particularly in refining the mobile sensor data. This approach resulted in the macro outline in Figure 1, which illustrates the entities and processes required to perform imputation in the context of the Amsterdam case; rectangles represent processes & cylinders represent data sources.

Figure 1: Data and Methods Flow Diagram Outlining how Parking Pressure is Imputed | Rectangles Represent Methods & Cylinders Represent Data Sources

³ See Appendix B

A combination of data sources was used in order to impute parking usage: 1) Scan Data, containing the observations of parking occupation from the mobile sensors; 2) City Road Network Data, containing the spatial geometry of the city's road network, which was used to add contextual information to the scans; 3) Parking Pass Data, which contains the transactional log of all on-street purchases tied to a pay point; 4) Neighbourhood Data, an amalgamation of various statistics on individual neighbourhoods, specifically a catalogue of the buildings (and their types) that reside in each neighbourhood and the population & housing densities & counts within each neighbourhood.

The data was then processed in order to make it suitable for the various machine learning methods. Experiments were then conducted to find the best clustering method in order to identify similar neighbourhoods based on parking pass behaviour. The clustering results were then combined with the other data sources and used with the supervised learning methods in order to predict the missing parking pressure values. Experiments also investigated the performance of the supervised learning methods depending on the level (city, cluster, neighbourhood) at which they were trained. This was done to determine if the methods could learn generalisable properties from over-represented neighbourhoods and thus help better predict for neighbourhoods with fewer observations.

3.1 Mobile Parking Sensors

The mobile sensor data was obtained internally from the Amsterdam City government. This data provided us with the sparse observations of parking usage throughout the city and was used to determine the parking pressure for each neighbourhood at sparse intervals in time. This subsection explains key characteristics of the data and what was done to the data in order to make it suitable for parking pressure imputation.

3.1.1 Scan Volumes. The City of Amsterdam has had between 15 and 20 vehicles equipped with panorama license plate scanners, which are driven throughout the city in order to detect parking offenders who do not have valid on-street parking permits. This data holds over 160 million unique scans and spans from Jan-2016 to Dec-2018.

Figure 2: Stacked Monthly Scan Volumes Per Permit Type

The data also categorises the scans according to 7 unique types of parking passes, which can be grouped into 5 overarching categories. The first 3 permit types are held for long periods, often between 1 month and 12 months; they are residential parking passes (BEWONERP), work parking passes (BEDRIJFP) and citywide parking passes (STADSBREED). On-street temporary parking passes (BETAALDP) are bought for short stays. The 3 remaining permit types, SPORTP (sport), BEZOEKP (visiting) and ELAADP, were grouped under a general category called Other.

The volumes of the various permit categories described above are plotted in Figure 2. The data volumes for the months of April-2017 and August to September 2018 are below average due to uncontrollable data ingestion problems.

3.1.2 Business Rules & Bias. The operation of the scanning devices is outsourced to a 3rd party private company that is contractually committed to two performance indicators: 1) the revenue generated from identifying non-valid parking, and 2) the chance per hour that a vehicle without a valid parking right will be detected. A softer rule expects each area to be visited at least once every day.

These business rules result in the mobile scanning devices scanning neighbourhoods a certain minimum number of times to meet the performance indicators, but the rules do not stop them from prioritising other areas of the city in order to generate as much revenue as possible. This behaviour can be observed in the heat map in Figure 19; it shows a log scale count of the number of times a road segment has been scanned from mid-2016 to the end of 2018. Notably, the centre of Amsterdam is heavily scanned compared to the peripheries of the city. It should be noted that the north of Amsterdam (across the IJ river) was only added to the routes of the mobile scanners at the end of 2018.

These business rules also result in a bias in when a neighbourhood is scanned, because the scanning company will prioritise times when neighbourhoods are likely to have high offender rates; this can be observed in Figure 4. Figure 4 shows the density plot of observation occurrences for 6 example neighbourhoods. We can see that there are observation distributions that look mildly uniform (m56a), ones that have a clear bias for the early morning or afternoon (e42a, a00c), and neighbourhoods that have alternating peaks (k44c, a02b, k44f) that respectively have dual-peak, increasing or decreasing visitation trends as the hours progress.

Figure 4: Density Plot of Visitations for Examples of Different Neighbourhood Visitation Behaviours

Large parts of the city are also not scanned during weekends because parking becomes free during those periods and thus there is no need for the 3rd party company to scan those areas. A consequence of this is that the data available for weekends is biased towards neighbourhoods that are still scanned. The weekends were thus removed for the cases when models were built on the city and cluster training levels.

The low number of scanning devices results in a spatio-temporally sparse view of the city. This sparsity can be observed in Figure 5, which shows a one-hour interval snapshot from September 24th 2017, where each point on the map corresponds to a unique car being scanned and each colour corresponds to a different scanning device that made that observation. The blue lines represent the borders of neighbourhoods.

Figure 5: Parking Locations Scanned from 2pm-3pm on 24/Sep/2017. Each dot represents a scan and each colour represents the unique probe that made the observation. Blue Lines Represent Neighbourhood Boundaries.

Figure 6: Parking Locations Scanned by a Single Mobile Sensor for the Day of 24/Sep/2017. Each Dot Represents a Scan and the Colour Corresponds to the Time of Day that Scan Occurred. Blue Lines Represent Neighbourhood Boundaries.


Figure 3: Number of Times a Road Segment was Scanned - Log Scale

3.1.3 Determining Routes. Observations of parking pressure for neighbourhoods can only be determined if we know which neighbourhoods were scanned completely (or as close to completely as possible). The data poses a problem in this regard because it only provides the locations of parking spots that have been scanned; consequently we do not know the route that the probes take, which empty spots have been scanned or what percentage of a neighbourhood has been scanned. This problem obscures the ground truth of what the neighbourhood parking pressure looks like, because we do not know when a mobile scanner merely travelled through a neighbourhood or when it scanned the whole neighbourhood. Figure 6 illustrates a snapshot example of a single mobile scanner's behaviour within a day and shows how a mobile sensor will travel partially through neighbourhoods, e.g. the dark red and yellow-green line travelling diagonally north-west across various neighbourhoods.

The routes of the mobile scanners thus had to be determined. Initial attempts tried using map matching⁴, but it was found to be computationally expensive and would have taken over two weeks of nonstop running to match every scan to a route & road segment. A naive route calculation was thus devised. The following steps were used to naively determine which roads were travelled through by a mobile scanner within a neighbourhood⁵:

(1) Match parking spots to road segments⁶ defined by the simplified Open Street Map (OSM) road network[11].

(2) Match scan locations to OSM road segments.

(3) Calculate window differences (the time elapsed between consecutive scan observations), grouped by scan device and neighbourhood.

(4) Determine when long breaks occur between scans in order to demarcate new trips.

(5) For each trip in a neighbourhood, determine which unique road segments were driven over.

Routes that had driven over roads accounting for more than 80% of the parking spots within a neighbourhood were then assumed to have travelled through the whole neighbourhood; routes that covered less than 80% were discarded.

⁴ Further explanation provided in Appendix C
⁵ Edge case discussed in Appendix D
⁶ A road segment in the simplified OSM topology is defined as an edge between nodes, where nodes represent intersections.
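The trip demarcation and coverage filter described above (steps 3 to 5 and the 80% rule) can be sketched roughly as follows. This is a minimal illustration and not the code used in this paper: it assumes scans have already been matched to road segments (steps 1 and 2), and the record layout, the 10 minute gap threshold and all names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=10)  # assumed: a longer pause starts a new trip
MIN_COVERAGE = 0.8               # share of parking spots on driven segments


def split_trips(scans, max_gap=MAX_GAP):
    """Group scans into trips per (device, neighbourhood) using time gaps."""
    grouped = defaultdict(list)
    for scan in scans:
        grouped[(scan["device"], scan["neighbourhood"])].append(scan)

    trips = []
    for key, rows in grouped.items():
        rows.sort(key=lambda r: r["time"])
        current = [rows[0]]
        for previous, row in zip(rows, rows[1:]):
            if row["time"] - previous["time"] > max_gap:
                trips.append((key, current))  # long break: close the trip
                current = []
            current.append(row)
        trips.append((key, current))
    return trips


def covered_trips(trips, spots_per_segment, min_coverage=MIN_COVERAGE):
    """Keep trips whose driven segments hold more than min_coverage of spots."""
    kept = []
    for (device, hood), rows in trips:
        spots = spots_per_segment[hood]
        driven = {r["segment"] for r in rows}
        coverage = sum(spots[s] for s in driven) / sum(spots.values())
        if coverage > min_coverage:
            kept.append((device, hood, rows[0]["time"], coverage))
    return kept


# Hypothetical example: one device, one neighbourhood, two trips.
t0 = datetime(2017, 9, 24, 14, 0)
scans = [
    {"device": "car1", "neighbourhood": "A", "time": t0, "segment": "s1"},
    {"device": "car1", "neighbourhood": "A",
     "time": t0 + timedelta(minutes=2), "segment": "s2"},
    {"device": "car1", "neighbourhood": "A",
     "time": t0 + timedelta(hours=3), "segment": "s1"},
]
spots_per_segment = {"A": {"s1": 10, "s2": 35, "s3": 5}}
trips = split_trips(scans)
full_scans = covered_trips(trips, spots_per_segment)
```

In this example the first trip covers segments holding 45 of the 50 parking spots and is kept, while the second trip only covers 10 of 50 and is discarded.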

Name | Description
Date | Date when Neighbourhood was Scanned
Hour | Hour in which Neighbourhood was Scanned
Neighbourhood Code | Neighbourhood that was Scanned
Year | Year
Month | Month of Year
Week | Week of Year
Quarter | Quarter of Year
Parking Pressure | Value between 0 and 1 indicating how much parking is in use in a neighbourhood

Table 1: Scan Data Used for Modelling


3.1.4 Summary. A total of 248 000 neighbourhood pressure observations were produced after the naive route matching was applied with the 80% coverage filtering. Table 1 shows a summary of the data that was created with the goal of being used with the machine learning methods for imputation.

3.2 Parking Permit Purchases

Parking permit purchases were internally obtained from the Nationaal Parkeer Register (NPR) through relations provided by the Amsterdam City Government. The data is a transactional log of purchases of various types of parking passes from Jan-2016 to Dec-2018.

The 4 most important parking permits within that data are; residential parking permits (BEWONERP), work parking permits (BEDRIJFP), citywide parking permits(STADSBREED) and short duration parking permits (BETAALDP).

We are unable to determine the locations at which the first 3 types of parking permits are used through this data source, because it only records purchases and these permit types are valid for long durations (3-12 months). BETAALDP passes provide a far better view of their use because they are purchased for hourly intervals and can be linked to an on-street pay point. The BETAALDP passes account for roughly 15% of parking usage for any given day.

The BETAALDP passes were aggregated into hourly intervals per neighbourhood in order to align them with the data generated from the scan data in Section 3.1.1. The aggregation was performed by joining the pass purchases to the associated on-street pay points, after which the pay points were spatially joined to the geometries of the neighbourhoods. Four types of aggregations were built: 1) the number of passes bought; 2) the parking occupation as a function of how much time was bought in that hour (i.e. a pass bought at 13:30 will count for 0.5 of a pass at hour 13); 3) and 4) the parking pressure (the percentage w.r.t. the total parking spaces in a neighbourhood) expressed in terms of points 1 and 2 respectively.
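The fractional occupation in point 2 can be illustrated with a small sketch. It assumes each pass is available as a (neighbourhood, start, end) tuple after the pay point join; this layout, the function name and the example capacity of 50 spaces are hypothetical and do not reflect the NPR schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_occupation(passes):
    """Return {(neighbourhood, hour_start): fractional passes in use}.

    passes: iterable of (neighbourhood, start, end) tuples (assumed layout).
    """
    occupation = defaultdict(float)
    one_hour = timedelta(hours=1)
    for neighbourhood, start, end in passes:
        window = start.replace(minute=0, second=0, microsecond=0)
        while window < end:
            # Overlap between the pass and this hourly window, in hours.
            overlap = min(end, window + one_hour) - max(start, window)
            occupation[(neighbourhood, window)] += overlap / one_hour
            window += one_hour
    return dict(occupation)


# Hypothetical passes: one bought at 13:30 until 15:15, one for 13:00-14:00.
day = datetime(2018, 6, 1)
passes = [
    ("A", day.replace(hour=13, minute=30), day.replace(hour=15, minute=15)),
    ("A", day.replace(hour=13), day.replace(hour=14)),
]
occupation = hourly_occupation(passes)

# Pressure as occupation relative to an assumed number of parking spaces.
spaces = {"A": 50}
pressure = {key: value / spaces[key[0]] for key, value in occupation.items()}
```

Here the pass bought at 13:30 contributes 0.5 to hour 13, 1.0 to hour 14 and 0.25 to hour 15, in line with the example given in the text.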

Figure 7: Average Peak Parking Pressure due to Parking from On-Street BETAALDP Pass Occupation

The average peak parking pressure due to BETAALDP passes was also determined for each neighbourhood. These values are shown in Figure 7. We can see that a handful of neighbourhoods experience over 20% of their peak parking pressure from BETAALDP passes.

A total of 621 000 observations were created from the NPR data but only 204 000 remained after joining it with the scan data. Table 2 shows a summary of the data that was created to be used with the imputation machine learning methods.

Feature Name | Description
Date | Date of Observation
Hour (t) | Hour t in day when NPR window starts
Neighbourhood Code | Neighbourhood of Observation
NPR Purchases % @t-i | Percentage of temporary parking passes bought w.r.t. the total parking spaces in a neighbourhood at time t − i
NPR Purchases # @t-i | Number of temporary parking passes bought at time t − i
NPR Occupation % @t-i | Percentage of parking spots occupied w.r.t. the total parking spaces in a neighbourhood at time t − i
NPR Occupation # @t-i | Number of parking spots occupied at time t − i
Avg. Peak Parking Pressure from NPR | The peak parking pressure from the hourly aggregated average from all the data of that neighbourhood

Table 2: NPR Data Used for Modelling | i ∈ [0, 3] where i is an hour
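The "@t-i" features in Table 2 are lagged copies of the hourly NPR aggregates. A minimal sketch of how such lags could be built is shown below; the dictionary layout, the function name and the choice to fill missing hours with 0 are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def add_lags(hourly, max_lag=3, fill=0.0):
    """Return {(neighbourhood, hour): [value at t, t-1, ..., t-max_lag]}.

    hourly: {(neighbourhood, hour_start): value}. Hours without a record
    are filled with `fill`, which assumes no passes were active then.
    """
    features = {}
    for neighbourhood, hour in hourly:
        features[(neighbourhood, hour)] = [
            hourly.get((neighbourhood, hour - timedelta(hours=i)), fill)
            for i in range(max_lag + 1)
        ]
    return features


# Hypothetical hourly occupation counts for one neighbourhood.
hourly = {
    ("A", datetime(2018, 6, 1, 13)): 1.5,
    ("A", datetime(2018, 6, 1, 14)): 1.0,
    ("A", datetime(2018, 6, 1, 15)): 0.25,
}
lags = add_lags(hourly)
```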

3.3 Neighbourhood Characteristics Data

Descriptive characteristics of neighbourhoods were obtained from the City of Amsterdam public data portal⁷. These datasets were only available for 2018 and were retrospectively used for 2017 and 2016.

Statistics relating to population and housing figures and densities were obtained from the Key Neighbourhood Figures dataset[17] and contained the following columns:

• area in ha
• area in ha (land)
• population
• population density per km² (land)
• houses
• housing density per km² (land)
• average occupancy

A dataset of all non-residential buildings in Amsterdam (mostly businesses) was obtained from the Buildings Functions Map dataset[16]. It classified each building according to the following 13 classifications:

• Enterprises
• Retail
• Offices
• Care
• Activities and Meeting
• Unclear
• Parking
• Hotels bars restaurants
• Education
• Going Out and Tourism
• Religion
• Sports
• Public transport

⁷ https://data.amsterdam.nl/datasets/

3.4 Clustering Experiments

Three experiments were performed to determine which clustering methods identified the most informative clusters. The clustering was performed on the normalised hourly occupations per neighbourhood caused by the BETAALDP passes.

Each experiment looked into what the optimal number of clusters should be. Clustering evaluation metrics were used to determine the optimal number of clusters and were then used to compare the methods with respect to one another. The best clustering method was then used to create features based on the cluster classifications of the neighbourhoods, which were combined with the features discussed in sections 3.1.1 to 3.3. This was inspired by the works of Bock et al.[10] and Ionita et al.[20], who respectively showed that clustering methods could help identify unique areas within a city based on on-street pass sales and could help impute for completely unobserved areas.

3.4.1 Clustering Methods. Hierarchical and K-means clustering were used to identify similar neighbourhoods based on on-street parking behaviour from the NPR data. Specifically, Hierarchical Clustering with a Dynamic Time Warping (DTW) similarity matrix and Euclidean K-Means were used. These methods were assessed in order to determine which of the methods used by Bock et al.[10] and Richter et al.[26] was better. The clustering methods were assessed according to the metrics outlined in Section 3.4.2.

K-Means clustering creates K distinct, non-overlapping clusters. The desired number of clusters K must be specified. K-means categorises the data into K clusters by attempting to minimise the within-cluster variation of the data[18]. The variation is defined as a function of the pair-wise squared Euclidean distance measure[18].

Hierarchical Clustering starts by defining every point as an individual cluster and then iteratively joins clusters based on how similar they are, using a linkage technique to assess how similar clusters are to one another[18]. Complete linkage, which uses the maximal inter-cluster dissimilarity, was used in this paper[18]. The dissimilarity score from the linkage technique is a function of a distance measure, such as the Euclidean or DTW distance, between pairs of data points.

Dynamic Time Warping (DTW) performs nonlinear warping to match pairs of time series as closely as possible[2]. DTW has shown state-of-the-art performance in clustering time-series data when compared to Euclidean measures, also with regard to how much data is needed to find a stable clustering optimum[15].
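To make the combination of DTW and complete linkage more concrete, the sketch below implements both in plain Python on a handful of made-up hourly profiles. It is a simplified illustration under stated assumptions (absolute-difference cost, no warping window, toy data) and not the implementation used for the experiments in this paper.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def complete_linkage(series, n_clusters, distance=dtw_distance):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    most dissimilar members are closest (complete linkage)."""
    clusters = [[i] for i in range(len(series))]
    dist = {(i, j): distance(series[i], series[j])
            for i in range(len(series)) for j in range(i + 1, len(series))}

    def linkage(c1, c2):
        return max(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters))
                 for y in range(x + 1, len(clusters))]
        _, x, y = min(pairs)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Made-up normalised occupation profiles for four neighbourhoods.
profiles = [
    [0.0, 0.2, 1.0, 0.2, 0.0, 0.0],  # midday peak
    [0.0, 0.0, 0.2, 1.0, 0.2, 0.0],  # same peak, one hour later
    [1.0, 0.8, 0.2, 0.0, 0.0, 0.0],  # morning-heavy
    [1.0, 1.0, 0.8, 0.2, 0.0, 0.0],  # morning-heavy, stretched
]
clusters = complete_linkage(profiles, n_clusters=2)
```

The first two profiles have the same shape shifted by one hour, so their DTW distance is zero even though their point-wise Euclidean distance is not, which is the property that motivates using DTW for time-series clustering.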

3.4.2 Evaluation of Number of Clusters Used. Cluster evaluation metrics are required to determine the optimal number of clusters that the Hierarchical and K-means clustering methods should use[18]. The Silhouette score and Davies-Bouldin index were chosen because they assess different aspects of how well the clustering captures unique groupings of the data. The Silhouette score assesses how close or far the points in one cluster are to the points in another cluster, while the Davies-Bouldin index evaluates how far apart clusters as a whole are from each other.

The Silhouette score is calculated by determining how far away each point in the dataset is from the points that do not belong to its own cluster, relative to the points that do. It assigns a score in [-1, 1], where a score of 1 indicates that samples from one group are as far away as possible from the points in other clusters, a score of 0 indicates that points of one group are very close to points in another group, and negative scores indicate points that are closer to another cluster than to their own[18].

The Davies-Bouldin index is determined by considering the ratio of within-cluster spread to between-cluster separation, comparing the sum of cluster diameters to the distance between the cluster centroids[18]. In short, it is a good evaluation metric to assess the intra- and inter-cluster similarities. It provides a score in [0, ∞); values close to 0 indicate a good partition between the clusters and higher scores represent poorer performance.
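As a small worked illustration of the Silhouette score, the sketch below computes the mean silhouette coefficient with Euclidean distance for a sensible and a poor labelling of four toy points. The function and the data are illustrative assumptions; they only demonstrate the standard definition, in which each point's score is (b − a) / max(a, b), with a the mean distance to its own cluster and b the mean distance to the nearest other cluster.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    by_label = {}
    for point, label in zip(points, labels):
        by_label.setdefault(label, []).append(point)

    def mean_distance(p, members):
        return sum(math.dist(p, q) for q in members) / len(members)

    scores = []
    for point, label in zip(points, labels):
        own = by_label[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # The point's distance to itself is 0, so divide by n - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(mean_distance(point, members)
                for other, members in by_label.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two well-separated groups of toy points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across it
```

The sensible labelling scores close to 1, while the poor labelling yields a negative score, which is why the number of clusters with the highest score is preferred.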

3.5 Imputation Experiments

Three different machine learning methods were investigated: Gradient Boosted Method (GBM), Random Forest and Least Absolute Shrinkage and Selection Operator (LASSO). These methods were chosen because they are easy to interpret compared to black-box methods and because prior research[20],[8] showed a history of good performance by these methods in the domain of parking usage.

These three methods were then trained at three levels of data slicing: 1) one model is trained for the whole city at the city level, 2) one model is trained per cluster at the cluster level, and 3) one model is trained per neighbourhood at the neighbourhood level. These models were then evaluated using R2 and RMSE with 5-fold cross validation in order to determine their ability to impute the parking pressure that a neighbourhood experiences. In total, 9 experiments were conducted.
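Section 3.5.2 states that the folds were stratified by neighbourhood code and hour of day. One way such a split could be constructed is sketched below; the function name `stratified_folds`, the round-robin assignment and the toy `strata` are assumptions for illustration and not a description of the code used in this paper.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k=5, seed=0):
    """Assign each sample index to one of k folds, spreading every stratum evenly."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, stratum in enumerate(strata):
        by_stratum[stratum].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_stratum.values():
        rng.shuffle(members)
        # Deal the members of each stratum across the folds in turn.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


# Hypothetical samples: 2 neighbourhoods x 2 hours, 5 observations each.
strata = [(n, h) for n in ("A", "B") for h in (8, 9) for _ in range(5)]
folds = stratified_folds(strata, k=5)

# Each fold serves once as the test set; the remaining folds form the training set.
splits = [
    ([i for j, f in enumerate(folds) if j != t for i in f], folds[t])
    for t in range(len(folds))
]
```

Because the deal always starts at the first fold, strata whose size is not a multiple of k leave the early folds slightly larger; a production split would typically randomise the starting fold.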

3.5.1 Supervised Learning Models. Supervised machine learning models were used to predict the missing parking pressure values for hourly windows per neighbourhood. Gradient Boosted Method (GBM), Random Forest and LASSO regression methods were trained and tested against the known parking pressure obtained from the scan data.

Gradient Boosted Regression Trees are a subset of Gradient Boosted Methods and are an ensemble learning method that uses many weak learners constructed from decision trees. The weak learners (trees) are built sequentially, the idea being that each consecutive learner slightly improves on the cases where the previous learners performed badly. The end result is a strong learner that is resistant to learning the bias in the data and tends to generalise well[5][25].
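The sequential idea can be illustrated with a deliberately small sketch: gradient boosting for squared error on a single feature, using one-split decision stumps as the weak learners. This is a teaching example under those assumptions, not the GBM configuration used in the experiments; `fit_stump`, `boost`, `mse` and the toy data are hypothetical.

```python
def fit_stump(x, residual):
    """Best single-threshold split of 1-D inputs, minimising squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # keep both sides non-empty
        left = [r for xi, r in zip(x, residual) if xi <= threshold]
        right = [r for xi, r in zip(x, residual) if xi > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, lm, rm)
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each on the residuals left by the previous ones."""
    base = sum(y) / len(y)  # start from the mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        threshold, lm, rm = fit_stump(x, residual)
        stumps.append((threshold, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= threshold else rm)
                for xi, pi in zip(x, pred)]
    return base, stumps, pred


def mse(y, p):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical hourly parking pressure for one neighbourhood.
hours = list(range(8))
pressure = [0.2, 0.3, 0.6, 0.8, 0.8, 0.7, 0.4, 0.2]
base, stumps, pred = boost(hours, pressure)
```

Each round only nudges the prediction by a fraction (`learning_rate`) of what a single stump suggests, so the training error falls gradually rather than in one step.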

Random Forests are also an ensemble learning method for regression. The method constructs many individual (independent) decision trees from bootstrapped samples (drawn with replacement) of the training set, which makes it robust against over-fitting on the training set[21]. The splitting of branches during the learning process of each tree is determined by the best split among a random subset of features; this prevents the ensemble of trees from all prioritising the same strongly predictive features and thus helps in capturing more of the variance in the data[5][25].

GBM and RF methods are robust against outliers in the training data and are able to handle categorical and continuous data very well because they use decision trees (discrete decision structures) to construct the ensemble[21].

Least Absolute Shrinkage and Selection Operator is a linear regression method that performs both regularisation and feature selection[21]. Feature selection occurs through a weighted penalty term that forces the method to use as few features as possible in order to get the best result possible. This makes the method easily interpretable when many (potentially redundant) features are used, and it helps produce a model that generalises well because it is less prone to overfitting the data than other linear methods[21].

Three baseline models were also created in order to have a benchmark against which the regression methods could be compared. Each baseline model used the average on an hourly basis for the whole city, cluster or neighbourhood (depending on the training level) and used the corresponding result to impute the missing values for a specific hour and neighbourhood.
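A baseline of this kind amounts to a grouped hourly average. The sketch below shows one possible reading of it; the record layout and the field names `city`, `cluster`, `neighbourhood`, `hour` and `pressure` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_hourly_baseline(records, level_key):
    """Average observed pressure per (group, hour); the group depends on the training level."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r[level_key], r["hour"])].append(r["pressure"])
    return {key: mean(values) for key, values in buckets.items()}


def impute(baseline, record, level_key):
    """Look up the hourly average for the record's group; None if that group/hour was never observed."""
    return baseline.get((record[level_key], record["hour"]))


# Hypothetical observations for three neighbourhoods at 09:00.
records = [
    {"city": "AMS", "cluster": 0, "neighbourhood": "A", "hour": 9, "pressure": 0.6},
    {"city": "AMS", "cluster": 0, "neighbourhood": "B", "hour": 9, "pressure": 0.8},
    {"city": "AMS", "cluster": 2, "neighbourhood": "C", "hour": 9, "pressure": 0.4},
]
missing = {"city": "AMS", "cluster": 0, "neighbourhood": "A", "hour": 9}

city_guess = impute(fit_hourly_baseline(records, "city"), missing, "city")
cluster_guess = impute(fit_hourly_baseline(records, "cluster"), missing, "cluster")
neighbourhood_guess = impute(fit_hourly_baseline(records, "neighbourhood"), missing, "neighbourhood")
```

The same missing value receives a different estimate at each level because the average is taken over a different set of observations.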

3.5.2 Evaluation of Regression Model Results. The data was split via 5-fold cross validation, stratified across neighbourhood codes and the hour of day in order to fairly assess each model's performance for an individual neighbourhood. The special case of time-series cross validation train-test splitting was not necessary in the context of this problem because no forecasting is being done, and thus temporal information leakage does not need to be accounted for. The Root Mean Square Error (RMSE) (eq. 1) and R2 (eq. 2) scores are calculated by comparing the true values (y) with the predicted values (ŷ).

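\[ \mathrm{RMSE}(y, \hat{y}) = \sqrt{\frac{\sum_{t=1}^{T} (\hat{y}_t - y_t)^2}{T}} \tag{1} \]

\[ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_{\text{samples}}-1} (y_i - \bar{y})^2} \tag{2} \]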

The RMSE score indicates the typical size of the error between the true value and the predicted value[21]. In other words, it gives an estimate of how far off our predictions are, in the units of the dependent variable.

The R2 score was chosen because it shows how much of the variance of the dependent variable the model can capture[21]. It typically produces a value between 0 and 1, where 0 indicates that no variance is captured and 1 indicates that the model explains all the variance in the dependent variable. Negative values are possible when the predictions perform worse than the constant prediction ȳ (the mean of y)[7], i.e. when the fraction in eq. 2 is larger than 1.
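Equations 1 and 2 translate directly into code. The sketch below is a plain-Python rendering with made-up values, intended only to show how a negative R2 arises; the function names `rmse` and `r2` are illustrative.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root Mean Square Error (eq. 1)."""
    return sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (eq. 2); undefined if y_true is constant."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical parking pressure for three hours in one neighbourhood.
y = [0.5, 0.6, 0.7]
mean_guess = [0.6, 0.6, 0.6]  # always predicting the mean
poor_guess = [0.9, 0.9, 0.9]  # a constant far from the mean
```

Predicting the mean gives an R2 of about 0, while the second constant gives roughly −13.5 because its squared error is 14.5 times the total variation around the mean. When the observed values vary little, as in this example, a modest RMSE can therefore coexist with a strongly negative R2.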

The mean and standard deviation of each neighbourhood's scores were then calculated across the k folds in order to understand the performance of the methods at the neighbourhood level. The mean, standard deviation and median were then calculated on top of these per-neighbourhood metrics to get a single metric value per training level. This single value was obtained so that the various methods at different training levels could be fairly and easily assessed against one another.

Concretely, the aggregated mean metrics across the k folds are the mean of means (Eq. 3a), standard deviation of means (Eq. 3b) and median of means (Eq. 3c). The aggregated standard deviation metrics across the k folds are the mean of standard deviations (Eq. 3d), standard deviation of standard deviations (Eq. 3e) and median of standard deviations (Eq. 3f).

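Mean of Metrics:

\[ \text{mean-mean}_g = \operatorname{mean}(\operatorname{mean}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3a} \]
\[ \text{std-mean}_g = \operatorname{std}(\operatorname{mean}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3b} \]
\[ \text{med-mean}_g = \operatorname{med}(\operatorname{mean}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3c} \]

Standard Deviation of Metrics:

\[ \text{mean-std}_g = \operatorname{mean}(\operatorname{std}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3d} \]
\[ \text{std-std}_g = \operatorname{std}(\operatorname{std}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3e} \]
\[ \text{med-std}_g = \operatorname{med}(\operatorname{std}(L(\hat{y}_{gcn}, \tilde{y}_{gcn}), c), n) \tag{3f} \]

Where:

\begin{align*}
L(\hat{y}, \tilde{y}) &= \text{Error Metric; e.g. } R^2 \text{ or RMSE} \\
n \in N &= [\text{neighbourhoods}] \\
c \in C &= [\text{K-Folds}] \\
g \in G &= [\text{training level group}] \\
X_c &= c\text{-Fold } X \text{ subset for training} \\
\hat{X}_c &= c\text{-Fold } X \text{ subset for testing} \\
\hat{y}_c &= c\text{-Fold } Y \text{ subset for testing} \\
X_{gc} &= \{k \in X_c \mid \text{training level}(k) = g\} \\
\hat{X}_{gc} &= \{k \in \hat{X}_c \mid \text{training level}(k) = g\} \\
\hat{y}_{gc} &= \{k \in \hat{y}_c \mid \text{training level}(k) = g\} \\
f_{gc}(x) &= \text{ML Method Trained on } X_{gc} \\
\tilde{y}_{gc} &= f_{gc}(\hat{X}_{gc}) \\
\tilde{y}_{gcn} &= \{k \in \tilde{y}_{gc} \mid \text{neighbourhood}(k) = n\} \\
\operatorname{mean}(v, j) &= \text{mean of tensor } v \text{, applied along axis } j \\
\operatorname{std}(v, j) &= \text{standard deviation of tensor } v \text{, applied along axis } j \\
\operatorname{med}(v, j) &= \text{median of tensor } v \text{, applied along axis } j
\end{align*}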
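The six aggregates in Eq. 3a to 3f reduce first over the folds c and then over the neighbourhoods n. A minimal sketch for one training level is given below; it assumes the population standard deviation (the text does not state which variant was used), and the function name `aggregate` and the toy `scores` are hypothetical.

```python
from statistics import mean, median, pstdev


def aggregate(scores):
    """scores maps neighbourhood -> list of per-fold metric values for one training level."""
    fold_means = [mean(folds) for folds in scores.values()]   # reduce over folds c
    fold_stds = [pstdev(folds) for folds in scores.values()]  # reduce over folds c
    # The second reduction runs over neighbourhoods n.
    return {
        "mean-mean": mean(fold_means),    # Eq. 3a
        "std-mean": pstdev(fold_means),   # Eq. 3b
        "med-mean": median(fold_means),   # Eq. 3c
        "mean-std": mean(fold_stds),      # Eq. 3d
        "std-std": pstdev(fold_stds),     # Eq. 3e
        "med-std": median(fold_stds),     # Eq. 3f
    }


# Hypothetical R2 scores over two folds for three neighbourhoods; "C" is an outlier.
scores = {"A": [0.2, 0.4], "B": [0.1, 0.1], "C": [-3.0, -1.0]}
summary = aggregate(scores)
```

In this toy input a single badly fitted neighbourhood drags the mean of means well below zero while the median of means stays at 0.1, which is the kind of gap between mean-based and median-based aggregates that the results tables expose.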

4 RESULTS

4.1 Clustering of Parking Pass Occupation Behaviour

The best performing configurations for each of the clustering methods that were tested are shown in Table 3. A breakdown of the Silhouette and Davies-Bouldin index scores for each cluster size can be found in Appendix E, Table 6.


Table 3: Best Results for Different Clustering Methods

Model                      # Clusters   SI Score   DBI Score
K-Means (Euclidean)        3            0.374      1.048
Hierarchical (DTW)         3            0.369      1.385
Hierarchical (Euclidean)   5            0.379      0.785
Hierarchical (Euclidean)   3            0.375      1.086

Hierarchical clustering with Euclidean distance using 5 clusters appeared to be the best performing model but was in fact outperformed by K-Means (Euclidean) clustering (footnote 8). The Silhouette score for the Hierarchical (Euclidean) clustering method plateaued after 3 clusters were used (Figure 8); this means that the first 3 clusters explain most of the variance within the data and that using more than 3 clusters complicates the model with little observed gain. Inspection of clusters 4 and 5 also revealed that together they labelled a total of 3 neighbourhoods. These neighbourhoods had above-normal peaks compared to the other groups but otherwise showed similar behaviours (this would explain why the Davies-Bouldin score was dramatically lower at 5 clusters). Accordingly, Hierarchical (Euclidean) clustering with 3 clusters was the optimal configuration for that method. A consequence of this is that K-Means (Euclidean) with 3 clusters is the real best performing model: its SI score of 0.374 is practically identical to the 0.375 of the 3-cluster Hierarchical (Euclidean) configuration, and its DBI score of 1.048 is better than that configuration's 1.086.

Figure 8: Scores for Different Numbers of Clusters for Hier-archical (Euclidean) Clustering

The normalised on-street parking occupancies for each cluster are plotted in Figure 9a (footnote 9). The first 3 clusters capture the majority of the neighbourhoods, with respective neighbourhood counts of 208, 72 and 38. Each cluster appears to capture unique time-series behaviours. Clusters 0, 1 and 2 all increase and peak in similar ways during the morning and mid-day hours but differ in how they decrease through the end of the day: 1) cluster 0 starts decreasing at the end of work hours and never increases again, 2) cluster 1 only starts decreasing 3 hours after working hours stop, at around 8pm, and 3) cluster 2 starts decreasing at the end of work hours (± 5pm) and then increases around 8pm. Visual inspection of the normalised occupancy for K-Means clustering vs Hierarchical clustering (DTW & Euclidean) (footnote 9) reveals that similar time-series behaviours arise regardless of the method that is used, which indicates that the findings are robust.

Footnote 8: A plot of the DBI & SI scores for K-Means can be found in Appendix E, Figure 16.
Footnote 9: The SI and DBI score plots for the Hierarchical clustering methods can be found in Appendix E.

The neighbourhoods and their corresponding cluster numbers are projected onto the Amsterdam city map in Figure 9b. We can observe that there is structure in the spatial representation of the clusters: cluster 2 is predominantly centred on the city centre, while clusters 0 and 1 are grouped in pockets around the periphery of the city.

(a) Average Occupation Due to BETAALDP Pass Parking in Hourly Intervals from 2016 to 2018 for Each Cluster Group

(b) Neighbourhood Clusters Mapped onto Amsterdam

Figure 9: Neighbourhood Clusters Identified by K-Means (Euclidean) Clustering

4.2 Performance of Imputation Models

The aggregated R2 and RMSE 5-fold cross validation mean scores for the city, cluster and neighbourhood training levels are shown in Tables 4a, 4b and 4c. The aggregated R2 and RMSE 5-fold cross validation standard deviation scores for the same training levels are shown in Tables 5a, 5b and 5c. The grey cells represent the best score for a training level and bold represents the best score across all the experiments.

The RMSE scores were very similar for the GBM, Random Forest and LASSO methods across all the training levels and metric aggregations. This is interesting given that the R2 scores vary wildly from method to method for each metric aggregation.


Table 4: Aggregated Scores of the Means of the RMSE and R2 Scores Calculated Across the 5-Fold CV Folds for Various Modelling Methods Trained at Different Levels

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        0.184   -148.5     0.173   -114.8     0.090   -6.89
GBM             0.133   -7.89      0.119   -4.69      0.087   0.203
Random Forest   0.086   0.081      0.086   -0.24      0.088   0.191
LASSO           0.171   -6.42      0.166   -97.9      0.099   0.003

(a) Mean for Neighbourhood Mean CV Scores

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        0.00    275.76     0.00    206.5      0.00    12.87
GBM             0.056   39.31      0.053   25.18      0.041   0.261
Random Forest   0.042   1.039      0.042   4.244      0.042   0.240
LASSO           0.171   -6.42      0.083   1172.      0.044   0.089

(b) Standard Deviation for Neighbourhood Mean CV Scores

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        0.184   -25.36     0.173   -24.30     0.090   -1.879
GBM             0.121   -0.34      0.106   -0.11      0.077   0.226
Random Forest   0.075   0.201      0.076   0.211      0.080   0.199
LASSO           0.171   -6.42      0.144   -0.81      0.089   -0.00

(c) Median for Neighbourhood Mean CV Scores

Table 5: Aggregated Scores of the Standard Deviations of the RMSE and R2 Scores Calculated Across the 5-Fold CV Folds for Various Modelling Methods Trained at Different Levels

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        1598.   0.097      1269.   0.091      90.88   0.045
GBM             0.008   6.343      0.008   5.012      0.007   0.095
Random Forest   0.007   0.303      0.007   0.755      0.006   0.108
LASSO           0.008   31.090     0.008   195.4      0.007   0.025

(a) Mean for Neighbourhood Standard Deviations CV Scores

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        3210.   0.00       2382.   0.00       158.8   0.00
GBM             0.006   38.60      0.007   33.78      0.005   0.102
Random Forest   0.006   1.674      0.007   6.064      0.004   0.115
LASSO           0.006   334.6      0.008   2603.      0.005   0.048

(b) Standard Deviation for Neighbourhood Standard Deviations CV Scores

Model Type      City Level         Cluster Level      Neighborhood Level
                RMSE    R2         RMSE    R2         RMSE    R2
Baseline        136.8   0.097      241.34  0.091      31.43   0.045
GBM             0.007   0.132      0.007   0.121      0.005   0.061
Random Forest   0.005   0.076      0.005   0.071      0.005   0.074
LASSO           0.006   0.177      0.006   0.180      0.005   0.009

(c) Median for Neighbourhood Standard Deviations CV Scores

The Random Forest method significantly outperforms the other model classes at the city and cluster training levels when we look at the median of mean and mean of standard deviation scores, but it scores worse on the mean of mean metrics. This indicates that the method is performing very poorly for a small subset of neighbourhoods, which must have extremely bad R2 scores. The Random Forest method performs similarly to the Gradient Boosted Method at the neighbourhood training level when we look at the mean of means and median of means scores, the respective R2 differences being 0.012 and 0.027.

(a) Mean of RMSE and R2 Across 5-Fold CV per Neighbourhood (footnote a)

Footnote a: The R2 scores mapped onto Amsterdam can be found in Appendix F, Figure 20.

(b) Standard Deviation of RMSE and R2 Across 5-Fold CV per Neighbourhood

Figure 10: Scatter Plot of the R2 vs RMSE Test Set Metrics from GBM Trained at the Neighbourhood Level. Each Point Represents a Unique Neighbourhood. A Kernel Density Estimate Plot is Attached to Each Metric on their Respective Axis.

The best training level and method combination came from using the Gradient Boosted Method at the neighbourhood level. It had a mean of mean R2 score of 0.2, a median of mean R2 score of 0.24 and a standard deviation of mean R2 score of 0.24. Figure 10a shows the behaviour of these metrics.


The Gradient Boosted Method also obtained a mean-standard deviation R2 score of 0.095 and a median-standard deviation of 0.0641. Figure 10b illustrates the behaviour of these last two metrics. The metrics of each distinct model trained per unique neighbourhood are represented by a point in Figures 10a & 10b; the colour of each point indicates how many data points were used to train that particular model. The distribution plots of the RMSE and R2 scores are illustrated on each respective axis.

The difference in the median and mean R2 scores can be explained by the long-tailed distribution observed in Figure 10a. Furthermore, we can see that the individual models built per neighbourhood are packed around the median R2 score of 0.24, and a large majority fall within the interval [0.0, 0.5]. The individual models achieve very good RMSE scores, tightly packed around 0.08; that is to say, most of the models are off by roughly 8 percentage points of parking pressure for their respective neighbourhoods.

We can also see a weak inverse relationship between RMSE and R2 in Figure 10a: the more variance in the data the models can explain (R2), the lower the RMSE tends to be. The weakness of this relationship is of concern, however.

Figure 11: R2 Interval Counts for the Best Models per Training Level & Best Baseline Model

The standard deviations of the RMSE and R2 scores for the models that were trained and tested across the 5 folds are shown in Figure 10b. The models achieve very consistent RMSE scores across the 5 folds, deviating by only 0.005. The standard deviations of the R2 scores, however, are centred around 0.061 and themselves have a larger standard deviation of 0.102; the implication is that, for many neighbourhoods, the R2 score of the individual model varies by around 0.1 from fold to fold.

The number of neighbourhoods that fall into specific R2 buckets for the best performing method per training level is plotted in Figure 11. This figure also includes the best performing baseline model, which was obtained at the neighbourhood level.

Figure 11 shows that the best baseline model overwhelmingly produces R2 scores for neighbourhoods in the range (−20, 0.1], while the remaining models perform very similarly to one another. The GBM models trained at the neighbourhood level perform slightly better than the rest in that they output fewer neighbourhoods with an R2 score in the (−20, 0.1] range and more in the (0.1, 0.2] and (0.4, 0.5] ranges, while the Random Forest model trained at the city level outputs more neighbourhoods with better R2 scores in the intervals (0.2, 0.3] and (0.6, 0.7].

Figure 12: Top 15 Features for Predicting Parking Pressure. Colours Correspond to Source of Feature. Red: Neighbourhood Properties, Blue: Parking Betaald Pass Parking, Green: Temporal Properties

The top 15 features for the Random Forest model trained at the city level are shown in Figure 12. Neighbourhood properties constitute nine of the fifteen features. Temporal features also appear quite prominently, such as the hour of the day and the week of the year, as do the NPR parking occupation and purchases (as a percentage and count) for times t to t − 3.

5 DISCUSSION

5.0.1 Given the structure of the data, what supervised machine learning methods produce the best results? The results indicate that it is possible, with mild confidence, to impute parking usage from sparse spatio-temporal parking data from mobile devices using machine learning methods. In all cases, the GBM and Random Forest models outperformed the baseline models and the LASSO models. The LASSO and baseline models were unable to learn the variance of the dependent variable but maintained acceptable RMSE scores. The best performing models were the Gradient Boosting Machines trained at the neighbourhood level; they achieved mean of mean R2 and RMSE scores across all the neighbourhoods of 0.203 and 0.087 respectively. Thus, each per-neighbourhood model is on average incorrect by roughly 8% parking pressure units. Likewise, the models are only able to capture roughly 20% of the variance present in the data. What is curious is that the GBM method performs poorly when trained at the city and cluster levels while the Random Forest method radically outperforms it. A possible explanation is that GBM models use an ensemble of weak learners: the GBM method is therefore unable to discriminate between the different neighbourhoods in order to make the appropriate prediction, while the Random Forest models, being more prone to overfitting, can better discriminate which neighbourhood to predict for by using the properties of the neighbourhood.

Closer inspection of the individual R2 scores for each neighbourhood at each training level (see Figure 11) reveals that the neighbourhood level is the most suitable because it has the fewest neighbourhoods with R2 scores less than −1. That is to say, the GBM trained at the neighbourhood level performed noticeably better in the intervals (−100, 0.1] and (0.4, 0.5] compared to the RF models, but this comes at the expense of the GBM models performing worse than the RF models in the intervals (0.6, 0.7] and (0.8, 1].

The low RMSE scores are encouraging in that they indicate that we can indeed use machine learning to impute parking pressure. The R2 scores indicate that the models are able to explain only a small amount of variance, in most cases around 20%, but this is still several orders of magnitude better than the baseline model.

Possible explanations for the low explained variance are the obfuscated ground truth in the data and the bias present in how observations are obtained. We naively constructed the routes that the mobile scanners took and thus could not filter out scans on roads that were observed only because a mobile scanner was crossing an intersection, nor could we accurately determine which empty roads were driven past (footnote 10). Further refinement of the ground truth could be obtained through the use of map matching (footnote 11).

There is also the issue of observation bias skewing the ground truth due to the business rules described in Section 3.1.1, i.e. mobile scanners will drive through areas when they think there might be a high offender rate. This bias also breaks the normally distributed random route assumption made in Bock et al.[8] and might be the leading cause of the low explained variance. This possible explanation could be further explored by simulating the mobile probe patterns observed in Amsterdam on the San Francisco data[1], where we have a better ground truth to validate against.

This paper also did not consider spatial relationships between entities, such as adjacent neighbourhood parking pressures and the inter- and intra-neighbourhood traffic loads. The research by Bock et al.[8] and Yang et al.[30] indicates that these features might also better explain the parking pressure. Future work should consider applying the work by Yang et al.[30] in order to incorporate spatial relationships between entities when imputing parking pressure for sparsely monitored areas.

5.0.2 What kind of features are the most important for imputing neighbourhoods? The best performing method was the GBM with models trained at the neighbourhood level. Only the on-street NPR pass, month, quarter of the year, hour of day and week of year features could be used, because this scenario trained individual models per neighbourhood and thus neighbourhood properties were rendered useless. The fact that the GBM method at this training level was able to capture some variance in the dependent variable shows that the on-street NPR parking data is indeed of value, even if it accounts for less than 15% of parking behaviour within Amsterdam.

Footnote 10: Refer to Appendix D for further discussion.
Footnote 11: Refer to Appendix C for further discussion.

We can look at how the neighbourhood properties influence parking pressure as a whole when training at the city level. The feature importance diagram in Figure 12 reveals that aspects of nightlife and living density are the most important; this is interesting because the model was also given one-hot encoded features of all the neighbourhood codes, and instead of using those to identify the neighbourhoods it used more descriptive neighbourhood properties. The average peak parking pressure from NPR parking also had a strong impact on the prediction of parking pressure. It could be that this feature allows the model to determine the neighbourhoods that experience, on average, high peaks of on-street betaald pass purchases, and thereby regulates how strongly the NPR features affect the parking pressure prediction. Once we have considered the majority of the features that describe the neighbourhoods, we then encounter the t to t − 3 NPR betaald pass features that were also used at the neighbourhood level, which indicates that the last 3 hours of NPR parking usage are indeed important. It should be noted that three types of NPR features exist in the top 15: 1) parking occupation as a percentage of pressure, 2) parking purchases as a percentage of pressure, and 3) parking occupation as a count of purchases (only from t − 2 onwards).

5.0.3 Can neighbourhood characteristics of over represented neighbourhoods be used by supervised machine learning methods to better predict for under represented neighbourhoods? The results from Figure 11 and the aggregated metrics in Table 4a show that the methods trained at the neighbourhood level outperform the methods trained with neighbourhood properties (the city and cluster levels). This indicates that the methods that were tested are unable to use neighbourhood properties to better predict for under represented neighbourhoods. This could be because the methods themselves have their loss functions defined according to the loss over the whole training vector and do not calculate a stratified, weighted error. This might create a system that optimises for overall accuracy regardless of the imbalance of training points between neighbourhoods.

5.0.4 Do neighbourhoods show distinct parking behaviour and does the clustering of these behaviours help? The clustering results in Section 4.1 showed that 5 distinct time-series patterns exist. The first 3 correspond to distinct zones within the city of Amsterdam while the last two clusters only separate outlier neighbourhoods.

Cluster 0 captures time-series behaviour that corresponds exclusively to business hours. Visual validation shows that cluster 0 neighbourhoods correspond to areas that are predominantly business-focused, such as the south and south-west of Amsterdam. Cluster 0 was also able to identify the areas surrounding the "knowledge highway" (centre east of Amsterdam); this area has a mix of municipal, higher education and business buildings. Towards the north of Amsterdam, we can also see neighbourhoods that were classified as cluster 0. These areas also correspond to dense business zones such as the dockyards, the Shell corporate offices and the businesses around Sloterdijk Station.


Cluster 1 captures long-form time-series behaviour that extends beyond normal work hours, until 7 pm. This behaviour is puzzling because it corresponds to areas that are exclusively high-density apartments: neighbourhoods within I’burg, neighbourhoods around the inbound highway to the north of Amsterdam and neighbourhoods around Sloterplas. A possible explanation is that residents run errands near their homes after work and thus use on-street parking on the way home. This could be validated in future work by looking at commuter patterns within these areas.

Cluster 2 captures time-series behaviour that corresponds to what one would expect from mixed-use areas: occupation throughout work hours, followed by a decrease as people leave for home, and then a slight increase as people arrive for the nightlife. Plotting these zones on the Amsterdam geography further validates this assumption because it shows that the neighbourhoods belonging to cluster 2 inhabit mixed-use areas of the city with both housing and venues for nightlife, which is a key characteristic of neighbourhoods that reside between the highway encircling the city and old Amsterdam.

The feature importance analysis in Section 4.2 revealed that the identification of cluster 2 was an important feature for predicting parking pressure (Random Forest method trained at the city level). This could be because cluster 2 overlaps with areas that have low parking pressure peaks due to betaald passes (Figure 7), and thus the method uses it to discriminate zones where the betaald pass information has little influence on the total pressure of the neighbourhood. This assumption is further strengthened by the fact that the feature Avg. Peak Parking Pressure from NPR was ranked as more important than the cluster feature cluster_2. Visual inspection of the 3 groups of distinct time-series patterns from all three clustering methods reveals very similar behaviours and indicates that the patterns found are independent of the method used and thus can be relied on.

6 CONCLUSION

This paper has proposed a working solution to impute parking pressure in sparsely monitored areas within the City of Amsterdam and provided an outline for future work in an understudied sub-field of smart-parking.

Gradient Boosted Tree and Random Forest methods outperformed both the average baseline method and the LASSO method. Gradient Boosted Trees performed the best when models were built for each neighbourhood, with an average R2 score of 0.20 and RMSE score of 0.087 across all the neighbourhoods. Higher R2 scores may be achieved if the ground truth of the data is further refined through the use of map matching and historical road networks. This will likely result in improved estimates of the observed parking pressure that a neighbourhood experiences. This analysis may be refined by investigating spatial components through explicit features of adjacent neighbourhoods or the use of more advanced deep learning methods as outlined by Yang et al.[30].

Random Forests performed the best when compared to alternative methods within the city and cluster training level experiments. The models built using the Random Forest method performed marginally worse (±0.02) than GBM when trained at the neighbourhood level. The most significant predictors of parking pressure from the best performing method trained at the city level (Random Forest) were the following: neighbourhood characteristics associated with density, buildings associated with leisure, temporal features, and features built from the NPR data.

This paper showed that neighbourhood characteristics do not improve the predictive power for under observed neighbourhoods when methods are trained at higher levels of abstraction (cluster and city level). Research in this area may be refined by defining the loss function with respect to a stratification criterion over neighbourhoods. K-Means clustering with 3 clusters was shown to be the best performing clustering configuration out of the experiments that were conducted. This method identified 3 unique time-series NPR occupation patterns that corresponded with distinct areas categorised by mixed spaces, work offices, and residential areas. The clustering results also helped with imputation when they were used with the Random Forest method trained at the city level. Visual validation indicated that the unique time-series behaviours were independent of the clustering methods used. Further validation could be performed by inspecting properties of the neighbourhoods such as the percentage of building types that reside in each cluster grouping.

The imputation of sparsely monitored parking is an understudied phenomenon which has the capacity to greatly improve the efficiency of transportation within the City of Amsterdam. This paper has demonstrated that K-Means clustering, Random Forest and Gradient Boosted Tree methods can be used to impute parking pressure in sparsely observed areas using data provided by mobile sensors. Employing these methods to more accurately assess parking pressure behaviours within cities will improve the City's capacity to serve its citizens by reducing on-road vehicle traffic and its overall carbon footprint.

REFERENCES

[1] [n. d.]. Parking Sensor Data Guide for San Francisco. http://sfpark.org/ wp-content/uploads/2014/06/docs_sensordata.pdf

[2] 2007. Dynamic Time Warping. In Information Retrieval for Music and Motion. Springer Berlin Heidelberg, Berlin, Heidelberg, 69–84. https://doi.org/10.1007/ 978-3-540-74048-3_4

[3] 2019. De toekomst van de auto in Amsterdam. https://www.amsterdam.nl/ actueel/nieuws/toekomst-auto/

[4] Claudio Badii, Paolo Nesi, and Irene Paoli. 2018. Predicting Available Parking Slots on Critical and Regular Services by Exploiting a Range of Open Data. IEEE Access 6 (2018), 44059–44071. https://doi.org/10.1109/ACCESS.2018.2864157 [5] Christopher M. Bishop. 2006. Pattern recognition and machine learning. Springer,

New York.

[6] Joe Blitzstein, Pfister Hanspeter, and Kaynig-Fittkau Verena. [n. d.]. CS109 Data Science. https://cs109.github.io/2015/

[7] Fabian Bock, Jiaqi Liu, and Monika Sester. 2016. Learning On-Street Parking Maps from Position Information of Parked Vehicles. In Geospatial Data in a Changing World, Tapani Sarjakoski, Maribel Yasmina Santos, and L. Tiina Sarjakoski (Eds.). Springer International Publishing, Cham, 297–314. https://doi.org/10.1007/ 978-3-319-33783-8_17

[8] Fabian Bock, Sergio Di Martino, and Monika Sester. 2016. What are the poten-tialities of crowdsourcing for dynamic maps of on-street parking spaces?. In Proceedings of the 9th ACM SIGSPATIAL International Workshop on Computational Transportation Science - IWCTS ’16. ACM Press, Burlingame, California, 19–24. https://doi.org/10.1145/3003965.3003973

[9] Fabian Bock and Monika Sester. 2016. Improving Parking Availability Maps using Information from Nearby Roads. Transportation Research Procedia 19 (2016), 207–214. https://doi.org/10.1016/j.trpro.2016.12.081

[10] Fabian Bock, Karen Xia, and Monika Sester. 2018. Mapping similarities in temporal parking occupancy behavior based on city-wide parking meter data. Proceedings of the ICA 1 (May 2018), 1–5. https://doi.org/10.5194/ica-proc-1-12-2018

[11] Geoff Boeing. 2017. OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks. Computers, Environment and Urban Systems 65 (Sept. 2017), 126–139. https://doi.org/10.1016/j.compenvurbsys.2017.05.004

[12] Murat Caliskan, Andreas Barthels, Bjorn Scheuermann, and Martin Mauve. 2007. Predicting Parking Lot Occupancy in Vehicular Ad Hoc Networks. In 2007 IEEE 65th Vehicular Technology Conference - VTC2007-Spring. IEEE, Dublin, Ireland, 277–281. https://doi.org/10.1109/VETECS.2007.69

[13] OpenStreetMap contributors. 2015. OpenStreetMap Mapmatching. https://www.openstreetmap.org

[14] Andrea David, Klaus Overkamp, and Walter Scheuerer. 2000. Event-oriented forecast of the occupancy rate of parking spaces as part of a parking information service. Proceedings of the 7th World Congress on Intelligent Systems (2000).

[15] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh. 2008. Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures. Proc. VLDB Endow. 1, 2 (Aug. 2008), 1542–1552. https://doi.org/10.14778/1454159.1454226

[16] Amsterdam City Government. [n. d.]. Amsterdam Building Functions. https://data.amsterdam.nl/datasets/CWujX-uXU9R8Sg/

[17] Amsterdam City Government. [n. d.]. Key Neighborhoods Figures. https://data.amsterdam.nl/datasets/LnsAX8TtqdtJ2A/

[18] Maria Halkidi, Yannis Batistakis, and Michalis Vazirgiannis. 2001. On Clustering Validation Techniques. Journal of Intelligent Information Systems 17, 2/3 (2001), 107–145. https://doi.org/10.1023/A:1012801612483

[19] Andrei Ionita. 2017. Extending Estimation of Parking Occupancy to Untracked City Areas using City Background Information. Ph.D. Dissertation. RWTH Aachen University. http://dbis.rwth-aachen.de/cms/theses/smart-parking

[20] Andrei Ionita, André Pomp, Michael Cochez, Tobias Meisen, and Stefan Decker. 2018. Where to Park?: Predicting Free Parking Spots in Unmonitored City Areas. In Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics - WIMS ’18. ACM Press, Novi Sad, Serbia, 1–12. https://doi.org/10.1145/3227609.3227648

[21] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer Texts in Statistics, Vol. 103. Springer New York, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7

[22] Trista Lin, Herve Rivano, and Frederic Le Mouel. 2017. A Survey of Smart Parking Solutions. IEEE Transactions on Intelligent Transportation Systems 18, 12 (Dec. 2017), 3229–3253. https://doi.org/10.1109/TITS.2017.2685143

[23] Suhas Mathur, Tong Jin, Nikhil Kasturirangan, Janani Chandrasekaran, Wenzhi Xue, Marco Gruteser, and Wade Trappe. 2010. ParkNet: drive-by sensing of road-side parking statistics. In Proceedings of the 8th international conference on Mobile systems, applications, and services - MobiSys ’10. ACM Press, San Francisco, California, USA, 123. https://doi.org/10.1145/1814433.1814448

[24] Patterson, Gearhart, Nesbitt, Dara-Abrams, Knisely, Dongen, Kreiser, and diluca. [n. d.]. Valhalla Repository. https://github.com/valhalla

[25] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.

[26] Felix Richter, Sergio Di Martino, and Dirk C. Mattfeld. 2014. Temporal and Spatial Clustering for a Parking Prediction Service. In 2014 IEEE 26th International Conference on Tools with Artificial Intelligence. IEEE, Limassol, 278–282. https://doi.org/10.1109/ICTAI.2014.49

[27] Donald C. Shoup. 2006. Cruising for parking. Transport Policy 13, 6 (Nov. 2006), 479–486. https://doi.org/10.1016/j.tranpol.2006.05.005

[28] Dušan Teodorovic and Panta Lucic. 2006. Intelligent parking systems. European Journal of Operational Research 175, 3 (Dec. 2006), 1666–1681. https://doi.org/10.1016/j.ejor.2005.02.033

[29] Eleni I. Vlahogianni, Konstantinos Kepaptsoglou, Vassileios Tsetsos, and Matthew G. Karlaftis. 2016. A Real-Time Parking Prediction System for Smart Cities. Journal of Intelligent Transportation Systems 20, 2 (March 2016), 192–204. https://doi.org/10.1080/15472450.2015.1037955

[30] Shuguan Yang, Wei Ma, Xidong Pi, and Sean Qian. 2019. A deep learning approach to real-time parking occupancy prediction in spatio-temporal networks incorporating multiple spatio-temporal data sources. arXiv:1901.06758 [cs, stat] (Jan. 2019). http://arxiv.org/abs/1901.06758

[31] Yu Zheng, Xing Xie, Chengyang Zhang, Yin Lou, Wei Wang, and Yan Huang. 2009. Map-Matching for Low-Sampling-Rate GPS Trajectories. In Proceedings of 18th ACM SIGSPATIAL Conference on Advances in Geographical Information Systems. https://www.microsoft.com/en-us/research/publication/map-matching-for-low-sampling-rate-gps-trajectories/

ACKNOWLEDGMENTS

I would like to express my gratitude to the City of Amsterdam for allowing me to work on this stimulating research and for inspiring me in what it means to be a data-driven organisation that serves its people. Particular thanks go to my supervisors Bas Schotten and Elenna Dugundji, who both helped me achieve a lot more than I would otherwise have been able to given the short time that was available, and to Frank Nack for overseeing my thesis defence. I would also like to extend my gratitude to my partner Lauren Veckranges for encouraging me to follow my dreams to the other side of the world and for all the support she gave me. Special thanks go to my family from afar for continually supporting me in my ventures and to the lifelong friends I have made in the Hardly Working group at the University of Amsterdam.

A ACRONYMS

DTW Dynamic Time Warping
IDW Inverse Distance Weighting
SI Silhouette-Index
DBI Davies-Bouldin-Index
LASSO Least Absolute Shrinkage and Selection Operator
RMSE Root Mean Square Error
NPR Nationaal Parkeer Register
GBM Gradient Boosted Method
SVM Support Vector Machine

B DATA SCIENCE PROCESS

Figure 13: Data Science Process Outline by Blitzstein et al.[6]

C MAP MATCHING

Map matching is a method used to snap the GPS coordinates of a travelling entity onto a road network. Its application is predominantly in satellite navigation systems used by road vehicles (see Figure 14). This makes map matching an ideal method for determining the routes that mobile parking sensors take as they drive around the city, provided we treat the scan locations they produce as noisy GPS route coordinates of the mobile sensor. Zheng et al.[31] developed map matching methods that are robust to low sampling rates and are able to determine which roads have been driven over even if gaps exist between GPS observations. Map matching is able to account for the edge case that the naive route detection encounters in Appendix D, because it would treat scans such as point 9 in Figure 15 as noisy GPS observations of the adjacent roads that were actually driven through.

The open source map matching library Valhalla[24] was successfully used to determine the routes that the mobile sensors took. Valhalla was able to return routes in less than 1 second for up to 10,000 GPS points given to it. There was another 3 seconds of overhead for each route calculation because of the pre- and post-processing data transformations that had to be done in order to determine the parking pressure at a road segment level. The pre- and post-processing had to segment scans into neighbourhood visits in order to avoid double counting roads that were visited later in the day, but this would have needed over 800 computational hours to determine all the routes since 2016. This high compute time arose because small portions of a neighbourhood were visited every day by the various mobile sensors, and thus each visit would need 4 seconds of compute time to determine the route taken in that neighbourhood.
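As a rough check on the figures quoted above, the following sketch back-calculates how many neighbourhood visits are implied by 800 compute hours at about 4 seconds per visit. The function name and constants are illustrative only; the 4 seconds is the approximate per-visit cost stated in the text (under 1 second of map matching plus about 3 seconds of overhead), and 800 hours is a lower bound.

```python
# Back-of-the-envelope estimate; names are hypothetical, numbers come from the text.
SECONDS_PER_VISIT = 4    # ~1 s map matching + ~3 s pre/post-processing overhead
TOTAL_COMPUTE_HOURS = 800  # "over 800 computational hours", so a lower bound


def implied_visits(total_hours: float, seconds_per_visit: float) -> float:
    """Number of route calculations that fit into the given compute budget."""
    return total_hours * 3600 / seconds_per_visit


visits = implied_visits(TOTAL_COMPUTE_HOURS, SECONDS_PER_VISIT)
print(f"Implied neighbourhood visits: at least {visits:,.0f}")
```

Under these assumptions the compute time corresponds to at least 720,000 separate neighbourhood visits, which illustrates why a small per-route cost still adds up to a prohibitive total.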

Figure 14: Example of Map Matching[13]

D NAIVE ROUTE DETECTION EDGE CASE

Figure 15 illustrates an edge case in which the naive route detection fails. Parked cars that were scanned by the mobile sensors are marked by an X, and the associated number represents the order in which the scans occurred. The blue rectangles represent roads that were driven over, while the red rectangle represents a road that was incorrectly marked as having been driven over. This edge case occurs when a mobile sensor takes a corner and is able to scan one or more parked cars on an adjacent road as it does so. The naive route detection method is unable to take this case into account because it uses a rolling window over the chronological order of the detected scans and marks any unique road segment as having been driven through if at least one scan was detected on it.

Figure 15: Example of an Edge Case when Naive Route Detection Incorrectly Selects a Road Segment That Was Not Driven Over
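The edge case can be reproduced with a minimal sketch of the naive approach described above. This is an illustrative reconstruction, not the implementation used in this work: the scan layout (timestamp, road segment id), the window size and the function name are all assumptions.

```python
from typing import Iterable, List, Tuple

# Hypothetical scan record: (timestamp, road_segment_id)
Scan = Tuple[float, str]


def naive_route(scans: Iterable[Scan], window: int = 3) -> List[str]:
    """Mark every road segment with at least one scan as driven over.

    A rolling window moves over the scans in chronological order and each
    previously unseen segment inside the window is appended to the route.
    """
    ordered = sorted(scans, key=lambda scan: scan[0])
    route: List[str] = []
    seen = set()
    # max(..., 1) ensures short inputs are still covered by a single window.
    for start in range(max(len(ordered) - window + 1, 1)):
        for _, segment in ordered[start:start + window]:
            if segment not in seen:
                seen.add(segment)
                route.append(segment)
    return route


# One scan lands on a side street while the sensor turns a corner.
scans = [
    (0.0, "road_1"),
    (1.0, "road_1"),
    (2.0, "side_street"),  # scanned from the corner, never driven over
    (3.0, "road_2"),
    (4.0, "road_2"),
]
print(naive_route(scans))
```

In this example the side street appears in the returned route even though it was only scanned from the corner, which is the failure illustrated in Figure 15.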

E CLUSTERING

Figure 16: Scores for Different Numbers of Clusters for K-Means (Euclidean) Clustering

The cluster labels are specific to each clustering method: cluster 1 in one method should not be assumed to correspond to cluster 1 in another method.

Figure 17: Average Occupation Due to BETAALDP Pass Parking in Hourly Intervals from 2016 to 2018 for Each Cluster Group | Hierarchical Clustering (Euclidean)

Figure 18: Average Occupation Due to BETAALDP Pass Parking in Hourly Intervals from 2016 to 2018 for Each Cluster Group | Hierarchical Clustering (DTW)

F MAPS & RESULTS

Figure 19: Number of Times a Road Segment was Scanned - Log Scale

Figure 20: Mean of RMSE and R2 Across 5-Fold CV per Neighbourhood from Gradient Boosted Method (GBM) Trained at the Neighbourhood Level

Figure 21: Map Of Amsterdam Road Network