Penalization of travel time uncertainty in optimal route classification

N/A
N/A
Protected

Academic year: 2021

Share "Penalization of travel time uncertainty in optimal route classification"

Copied!
84
0
0

Bezig met laden.... (Bekijk nu de volledige tekst)

Hele tekst


Penalization of travel time uncertainty in optimal route classification

G.W.E. Romme

TNO ERP Making Sense of Big Data

Student number: 10330461
Date of final version: August 14, 2018
Master's programme: Econometrics
Specialisation: Big Data Business Analytics
Supervisor: Dr. N.P.A. van Giersbergen
Second reader: Dr. E. Aristodemou
TNO supervisor: Dr. M.H.T. de Boer

Abstract

In 2017 traffic congestion costs have risen to a record level in the Netherlands. Experts recommend investments in smart travel solutions instead of in new roads. Such a solution could be a navigation system that is able to let its user avoid roads with a lot of congestion risk. In this paper, an optimal route classification algorithm with penalization of travel time uncertainty is proposed. This algorithm was trained by performing a logistic regression on real-life highway data from the Netherlands. The algorithm was subsequently incorporated into a simplified route navigation system, of which the performance was measured by comparison to a non-penalizing route navigation system on several aspects of travel time reliability. The results indicate that the penalizing navigation system never performs worse than the non-penalizing system and that it outperforms the non-penalizing system on three out of the five trajectories that were selected for this research.

(2)

Statement of Originality

This document is written by Gijs Wilhelm Egbert Romme, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

(3)

Contents

1 Introduction
2 Theoretical background
  2.1 Personalized routing
  2.2 Travel time uncertainty
  2.3 Logistic regression
3 Data
  3.1 Artificial data
  3.2 Freeway data
  3.3 Handling missing data
  3.4 Trajectory selection
4 The model
  4.1 Model variables
  4.2 Optimal route classification
  4.3 Navigation system
5 Results
  5.1 Initial tests
  5.2 Uncertainty splits and optimization parameters
  5.3 Trajectory results
    5.3.1 Utrecht - Barneveld
    5.3.2 Woerden - Breda
    5.3.3 Hilversum - Haarlem
    5.3.4 Den Haag - Ridderkerk
    5.3.5 Arnhem - Amsterdam-Zuid
    5.3.6 Discussion
6 Conclusion
References


Chapter 1

Introduction

The news website NOS reports that in 2017 the cost of traffic congestion in the Netherlands rose to a record level of one billion euros (Vrachtvervoerders lijden 1 miljard euro schade door files, 2017). In the same article, Arthur van Dijk, chairman of the Dutch interest group for the transport sector TLN, explains that this damage is caused by, among other things, halted production and stores missing out on revenue because required components or products have not been delivered on time. Moreover, congestion is expected to only get worse as the Dutch economy grows: more people will be working and transport will increase too. Because of this, travel time could increase by 28% between 2016 and 2022 (Van der Aa, 2017). The government is investing significant amounts of money into expanding road capacity, but Van Dijk states that this will not be sufficient. He recommends that people take the train more often or that the government invests in smart solutions such as apps that help people in their travels (Vrachtvervoerders lijden 1 miljard euro schade door files, 2017). A possible goal of such smart solutions is to reduce travel time uncertainty, for example to prevent the congestion costs caused by late deliveries. This is the focus of this research.

The goal of this research is to create an optimal route classification model that penalizes the travel time uncertainty surrounding the different routes in a certain trajectory. The model will be based on the logistic regression model, and the penalization of travel time uncertainty will be applied in its error function. This is done in order to obtain more reliable travel times when traveling from the origin of a trajectory to its destination. The optimal route classification model will then be used inside a simplified route navigation system in which the user can choose from a number of different risk profiles, which depend on the user's current situation and risk aversion. The user will also enter an origin, a destination and a desired time of arrival into the navigation system. Based on these inputs, the system will recommend not only the optimal route to take depending on the chosen risk profile, but also a time of departure. With this, the user will hopefully be able to obtain more reliable travel times than by following the recommendations of a navigation system in which penalization of travel time uncertainty is not incorporated. The definition of 'optimal' in this case is therefore 'the most reliable according to the chosen risk profile'. Because the user is able to specify a risk profile, the route navigation system essentially takes away the user's effort of having to decide how much time he or she should allocate to a certain trip in order to reach the destination at the desired time. Based on the chosen profile, the algorithm will make a trade-off between the user having to leave early and being at risk of arriving at his or her destination later than planned.

For each risk profile, a different route navigation system will be trained, because each profile uses different system settings. In addition, a separate system needs to be trained for every trajectory on which the system is used. This is because the route classifier will be optimized to learn details and nuances about a trajectory, so that it can decide at what exact time which route in this trajectory is optimal according to the selected risk profile. Because of this, the model will not be generalizable to other trajectories. The dataset that will be used in this research contains data on most of the Dutch freeways from 2016. From these data, travel times can be computed on which the navigation system can be trained. The navigation system's performance will be compared to the performance of a navigation system into which penalization of travel time uncertainty has not been incorporated. This leads to the following research question: to what extent can penalization of travel time uncertainty improve the reliability of the recommendations of a navigation system? This question will be answered by evaluating the navigation system's recommendations on several performance measures that will be defined later in this research.

The next chapter of this paper will cover theory on personalized routing, travel time uncertainty and the logistic regression model. In Chapter 3, the data that were used in this research will be discussed, as well as the methods that were used to process these data. Then, the models for the optimal route classifier and the route navigation system will be discussed in Chapter 4, together with the tests that will be performed in this research. The results will be presented and discussed in Chapter 5, and in the final chapter a conclusion on the navigation system's performance will be drawn.


Chapter 2

Theoretical background

In the first section of this chapter, previous research on personalized routing will be summarized. After that, theory on travel time uncertainty will be discussed. Finally, the logistic regression model will be explained.

2.1 Personalized routing

In recent years, multiple developments have been made in the field of routing to create navigation algorithms that are more personalized to their users. Ceikute & Jensen (2013) show that local drivers often do not follow the routes that are recommended by a large navigation service. They conclude that different drivers take different alternative routes within a certain trajectory and that even the same driver does not follow the same route all the time. According to Ceikute & Jensen, this happens because these drivers may have extensive knowledge of the local traffic conditions based on their past driving experiences, causing them to prefer, based on a measure other than simply the travel time, routes different from the one recommended by the navigation service. For navigation systems to make recommendations that better match the choices that drivers make in reality, the driving preferences of the systems' users have to be modeled into the systems' underlying algorithms.

Classic navigation systems usually recommend the same routes to each user, where the recommendations are based on minimizing the travel time or traveled distance. Drivers may, however, prefer other measures to be minimized, such as the number of traffic lights, the likelihood of congestion or fuel consumption. Moreover, it is believed that drivers may base their route choices on multiple cost measures at the same time (Yang, Guo, Ma & Jensen, 2015; Balteanu, Joss & Schubert, 2013; Hendawi et al., 2017). An intuitive way of simultaneously taking into account any number of cost measures in a personalized navigation algorithm is to assign a weight to each of the measures and then recommend the route for which the weighted sum of costs is minimal (Yang et al., 2015; Hendawi et al., 2017). Balteanu et al. (2013), on the other hand, model driving preferences as sets of trade-off factors between two cost measures. They then compare the routes that were selected by single navigation system users to other available Pareto-optimal routes on these two cost measures. In this way, they can create preference distributions that explain why a certain user prefers a chosen route over other Pareto-optimal routes that were not chosen.
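The weighted-sum preference model described above can be sketched in a few lines. This is an illustrative toy example, not code from any of the cited papers; the routes, cost measures, weights and their values are all invented.

```python
# Toy illustration of weighted-sum route selection: each route has several
# cost measures, the driver has a weight per measure, and the recommended
# route is the one minimizing the weighted sum of costs.

routes = {
    "A": {"travel_time": 40, "traffic_lights": 8, "fuel": 3.1},
    "B": {"travel_time": 44, "traffic_lights": 2, "fuel": 2.8},
}
# Hypothetical driver preferences: fuel counts double, lights count fully.
weights = {"travel_time": 1.0, "traffic_lights": 1.0, "fuel": 2.0}

def weighted_cost(costs):
    """Weighted sum of all cost measures for one route."""
    return sum(weights[m] * c for m, c in costs.items())

best = min(routes, key=lambda r: weighted_cost(routes[r]))
print(best)  # "B": fewer traffic lights and less fuel outweigh the extra time
```

Changing the weights changes the recommendation, which is exactly how such algorithms personalize routing per driver.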

Some of the cost measures that can be minimized are time-dependent and uncertain. Travel time, for example, differs throughout the day, as it depends on various traffic conditions. To take this into consideration, Yang et al. (2015) identify different contexts for each driver that are based on temporal aspects, e.g., morning versus non-morning. For each of the contexts, different driving preferences are specified. When a driver is planning a trip, the algorithm of Yang et al. identifies the context that best corresponds to the timing of the trip and uses the driving preferences of that context to find the route that is most preferred by the driver in question. Hendawi et al. (2017) also take into account the time-dependency of some cost measures and make their routing algorithm recommend an optimal time of departure in addition to the optimal route for a certain trip. They do this by first evaluating the costs corresponding to a set of possible times of departure, and then selecting the time of departure for which the sum of these costs, weighted by the preference weights, is minimal.

Looking more specifically into personalized routing where risk preferences are taken into account, Hu, Yang, Guo & Jensen (2018) propose a risk-aware path selection algorithm in which travel times are described as a time series of random variables. They let each random variable represent a different time interval of 15 minutes from a total time horizon of a week. In this way, they take into consideration the time-dependency of and uncertainty surrounding travel times. Hu et al. then model a driver’s risk preferences by means of utility function categories, where the assumption is that the route with the highest utility for a certain driver is the preferred route. The utility functions that they use are all non-increasing in travel time, as less travel time is preferred. To model the risk preferences of drivers, they create three different preference categories: risk loving, risk neutral and risk averse. For risk loving and risk averse drivers utility function categories of convex and concave functions are used, respectively. For the utility functions corresponding to risk neutral drivers the only condition is that they are non-increasing. To then translate the utility functions into route preferences Hu et al. use principles of stochastic dominance.

In the algorithm of this paper, similar to Yang et al. (2015) and Hendawi et al. (2017), multiple performance measures will be taken into account. These performance measures include the expected travel time, but also other measures that are based on comparing the navigation system's recommended travel times to actual travel times. However, as the main goal of this research is to examine to what extent penalization of travel time uncertainty improves the reliability of the recommendations of a navigation system, and not necessarily to create the best possible navigation system, the optimization over these performance measures will be performed in a simpler way than in the methods that were explained in this section. The method of optimization used in this research will be explained in Section 4.3. Another similarity to Hendawi et al. (2017) is that for each risk profile that will be used in this paper (corresponding to a certain risk preference), a different optimal time of departure will be recommended as well. The risk profiles from this paper are comparable to the risk preference categories from Hu et al. (2018), but also to the different contexts that are identified for each navigation system user by Yang et al. (2015). In this research it is assumed that a person's risk preference is not constant, but possibly changes with each context. Because of this, four different risk profiles are used in the navigation system of this paper, each representing a different context paired with a certain risk preference. Unlike Hu et al. (2018), for each risk profile a different set of performance measures will be optimized over, instead of optimizing over the same measure (the expected travel time) with different utility functions.

The main novelty of this paper is that the selection of optimal routes happens through a classification model in which penalization of travel time uncertainty is applied in the error function. By classifying purely on travel times from a certain trajectory, the travel time distributions of the different routes in this trajectory are implicitly learned by the navigation system. Since all information about a certain route, such as the number of lanes, the maximum driving speed, etc., is also implicitly contained in the travel times, these data no longer need to be collected and modeled explicitly. Moreover, the technique of penalizing uncertainty in the error function of a model has never been applied before.

2.2 Travel time uncertainty

To be able to penalize travel time uncertainty in the error function of a model, a measure of travel time uncertainty is needed first. An important consideration here is that distributions of travel times are generally not symmetric (Van Lint & Van Zuylen, 2005). Van Lint & Van Zuylen therefore recommend focusing on median and percentile values to determine the level of uncertainty attached to a certain route. Yeh, Kuo, Chen & Lin (2014) state, however, that while an algorithm is being trained, the distribution of its features is often not yet fully known. Because of this, median and percentile values are hard to incorporate into such an algorithm.

Another point that Yeh et al. (2014) make is that an uncertainty measure should mostly focus on the level of loss. This is because in certain situations, the level of possible loss should be kept below a certain threshold. An illustration specific for the route navigation context of this paper is that sometimes the user has a high need of not arriving late, but does not care whether he or she arrives earlier at the destination. For this reason, the variance for instance would not be a good measure of uncertainty to penalize, as only losses (for the example given: extra travel time) should be seen as detrimental. In their risk-avoiding reinforcement learning algorithm, Yeh et al. use the uncertainty measure of the expected loss below a certain threshold. In the context of travel times this could be translated to the expected travel time on top of a certain threshold. This uncertainty measure takes into account the asymmetry of travel time distributions and in addition is learnable.

In this research, only arriving at the destination later than planned is considered a loss, while arriving earlier than planned is fine. This is because the main goal of the algorithm of this research is to arrive in time more often, so by being early this goal is still achieved. Therefore, based on the uncertainty measure used by Yeh et al. (2014), the uncertainty measure that will be penalized in this research is the expected extra travel time on top of the mean travel time: E[T | T > E[T]] − E[T]. It will be further referred to as the 'expected delay'. It differs from the measure that was used by Yeh et al. in that the level of expected loss above a certain threshold (the threshold being the mean travel time) is used instead of the level of expected loss below a certain threshold.
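As a concrete sketch, the 'expected delay' measure E[T | T > E[T]] − E[T] can be computed from a sample of travel times as follows. The travel time values below are invented purely for illustration.

```python
# Sample analogue of the expected delay: the mean travel time among trips
# slower than average, minus the overall mean travel time.

def expected_delay(travel_times):
    """Expected extra travel time on top of the mean travel time."""
    mean_t = sum(travel_times) / len(travel_times)
    slow = [t for t in travel_times if t > mean_t]
    if not slow:  # degenerate case: all travel times identical
        return 0.0
    return sum(slow) / len(slow) - mean_t

# Hypothetical travel times (minutes) for two routes on one trajectory:
route_a = [35, 36, 37, 38, 40, 45, 60, 75]   # usually fast, occasionally slow
route_b = [42, 43, 43, 44, 44, 45, 46, 48]   # slower on average, but reliable

print(expected_delay(route_a))  # 21.75: large right tail, risky route
print(expected_delay(route_b))  # ~1.96: small right tail, safe route
```

Note how the measure reflects only the right (loss) tail of the distribution: route A has the lower mean travel time yet the far larger expected delay.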

2.3 Logistic regression

The model that will be used to perform the optimal route classification is the multinomial logistic regression model. Say there are K classes in this model. For every observation n (n = 1, ..., N), the logistic regression model then returns K posterior probabilities p_nk (k = 1, ..., K), which can be seen as representing the probabilities that the corresponding observation belongs to class k according to the model. The observation is then classified as belonging to the class with the highest p_nk value.

The following formulas are all taken from Bishop (2006, p. 209). The model takes as its input a vector of (possibly transformed) features φ_n and multiplies it with a vector of weights w_k to obtain the activations:

a_k = w_k^T φ_n.  (2.1)

These activations are then put into the softmax activation function to obtain the posterior probabilities:

p_nk = exp(a_k) / Σ_j exp(a_j).  (2.2)

From the posterior probabilities, the error function can be computed. For the algorithm of this research, the log loss error function will be used. The log loss error function for one observation is as follows:

E_n(w_1, ..., w_K) = − Σ_k y_nk ln(p_nk),  (2.3)

where y_nk is an indicator that is equal to 1 if observation n belongs to route k and 0 otherwise. The total error is computed by simply summing the above error term over all N observations.

It is important to note that the log loss error function only increases significantly in case of misclassification. That is, if y_nj = 1 while p_nj is close to 0, a large term −ln(p_nj) is added to the total error. If p_nj is close to 1, however, the −ln(p_nj) term becomes approximately zero. This also means that the increase of the total error caused by observation n is due only to the posterior probability of the class j that this observation actually belongs to, as y_nm = 0 for every other class (m ≠ j).


To control overfitting of a model on the dataset that it was trained on, its error function can be extended by including regularization (Bishop, 2006, pp. 10-11):

E_n(w_1, ..., w_K) = − Σ_k y_nk ln(p_nk) + (λ/2) Σ_k ||w_k||².  (2.4)

Regularization can prevent the model weights from becoming unnecessarily large and overly fitted to the training set, which improves the generalizability of the model to other data. The parameter λ provides a trade-off between the fit of the model on the training set and the simplicity of the model, where increasing λ simplifies the model further.

The error function will be minimized using the stochastic gradient descent (SGD) method, in which an update is made to the weight vector w_k for every observation separately (Bishop, 2006, pp. 240-241). This happens through the following update rule:

w_{n+1,k} = w_{n,k} − κ ∇_{w_k} E_{n+1},  (2.5)

where κ is the learning rate parameter and the gradient of the log loss error function is (Bishop, 2006, p. 209):

∇_{w_k} E_n(w_1, ..., w_K) = (p_nk − y_nk) φ_n.  (2.6)

If regularization is incorporated into the log loss error function, the gradient becomes:

∇_{w_k} E_n(w_1, ..., w_K) = (p_nk − y_nk) φ_n + λ w_k.  (2.7)

The advantages of using stochastic gradient descent are that it is computationally very efficient, since it handles only one observation at a time and only updates the weights for features that occur in that observation. Moreover, it is able to escape from local minima, since stationary points for individual observations can differ from stationary points for the total sample (Bishop, 2006, pp. 240-241).
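To make the equations of this section concrete, the following is a minimal pure-Python sketch of multinomial logistic regression trained with SGD and L2 regularization. It is not the thesis's actual implementation; the learning rate, regularization strength and the tiny synthetic dataset are illustrative choices.

```python
# Sketch: softmax activation, per-observation gradient of the regularized
# log loss, and the SGD weight update, on a 2-class toy problem.
import math
import random

def softmax(a):
    m = max(a)                                # stabilize the exponentials
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]

def sgd_step(w, phi, y, kappa, lam):
    """One SGD update of all K weight vectors on a single observation.

    w   : list of K weight vectors (one per class), each of length D
    phi : feature vector of length D
    y   : one-hot target list of length K
    """
    a = [sum(wk[d] * phi[d] for d in range(len(phi))) for wk in w]  # activations
    p = softmax(a)                                                  # posteriors
    for k in range(len(w)):
        for d in range(len(phi)):
            grad = (p[k] - y[k]) * phi[d] + lam * w[k][d]  # regularized gradient
            w[k][d] -= kappa * grad                        # descent update
    return w

# Tiny synthetic example: 2 classes, features = (bias, x), class by sign of x.
random.seed(0)
data = [([1.0, x], [1, 0] if x < 0 else [0, 1])
        for x in [random.uniform(-1, 1) for _ in range(200)]]

w = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):                           # epochs over the training data
    for phi, y in data:
        sgd_step(w, phi, y, kappa=0.1, lam=0.001)

a = [sum(wk[d] * v for d, v in enumerate([1.0, -0.8])) for wk in w]
print(softmax(a))  # class 0 (x < 0) should receive most of the probability
```

With λ = 0 the update reduces to the unregularized gradient of the log loss, matching the earlier gradient expression without the λ w_k term.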


Chapter 3

Data

In this chapter, the artificial data and the real-life freeway data that were used in this research will be described. Moreover, the method of computing the travel times that were used to train the optimal route classifiers will be explained. After that, it will be explained how the trajectories that are suitable for training optimal route classifiers were selected.

3.1 Artificial data

As a starting point for creating an optimal route classification algorithm that uses real-life freeway data, an artificial dataset was created to exemplify a situation in which penalization of travel time uncertainty could be beneficial. This dataset was then used to test the algorithm's initial forms. Moreover, to prevent overfitting of the algorithm's parameters on the real-life freeway data, some of the algorithm's general parameters were also set based on the artificial data. The results of this will be discussed in Section 5.1. The artificial data are described first, in this section.

To create data resembling a situation in which penalization of travel time uncertainty could be beneficial, different types of travel time distributions, representing artificial routes, were needed. For the algorithm to be able to act in a risk-averse way, a route that is generally the fastest was needed, next to at least one other route that is less risky than the fastest route. For this purpose, three different travel time distributions were generated from Weibull distributions. Let t denote the travel time. The distributions were then generated according to the following formulas:

t_kdhn = α_kd + Weibull(γ_k) · s_kdh,  (3.1)

s_kdh = Normal(μ_k, σ_k),  (3.2)

where k represents a route (k = 1, 2, 3), d represents a day of the week (d = 1, ..., 7), h represents an hour of the day (h = 1, ..., 24) and n represents one observation for a given route, day and hour (n = 1, ..., 360). Weibull(·) and Normal(·) denote random draws from the Weibull and Normal distributions, respectively. This means that for every combination of k, d and h the travel times are drawn from a different distribution. The values of the parameters α, γ, μ and σ can be found in Table A.1 in the Appendix. Each route distribution essentially consists of 7 · 24 = 168 smaller distributions of 360 observations, each with different parameters. This results in N = 168 · 360 = 60,480 observations for each of the three artificial route distributions. This number is approximately equal to the number of generated travel times for one route in the freeway dataset. The observations were then randomly split into 75% for training and 25% for testing purposes. Histograms of the resulting distributions are shown in Figure 3.1.

Figure 3.1: Travel time distributions for the artificial routes

Route A represents the generally fastest, but riskiest route and route C represents the generally slowest, but safest route. Route B is in the middle on both aspects. For each of the routes, the mean travel time, the expected delay and the percentage of times it is the fastest route in the training set are displayed in Table 3.1. These values confirm that route A is optimal most of the time, but is also riskier due to its higher expected delay, while routes B and C are optimal less often but are also safer. Though the differences in expected delay between the three routes are small, it will be shown in Section 5.1 that this does not pose a problem.

                              Route A   Route B   Route C
Mean travel time (min)           39.6      42.6      47.0
Expected delay (min)              2.7       2.3       1.7
Fastest route (%, training)      66.4      28.9       4.7

Table 3.1: Descriptive statistics for the artificial route data
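The generation process of Eqs. (3.1) and (3.2) can be sketched as follows. The parameter values below are placeholders, not the actual values, which are listed in Table A.1 in the Appendix; as a further simplification, α is taken constant per route here, whereas in Eq. (3.1) it may also vary per day.

```python
# Sketch of artificial travel time generation: for each (day, hour) cell a
# scale s is drawn from a Normal distribution, and 360 travel times are
# drawn as alpha + Weibull(gamma) * s. Parameter values are illustrative.
import random

random.seed(42)

def generate_route(alpha, gamma, mu, sigma, n_per_cell=360):
    """Travel times for one route: 7 days x 24 hours x n_per_cell draws."""
    times = []
    for d in range(7):
        for h in range(24):
            s = random.gauss(mu, sigma)          # scale s_kdh, Eq. (3.2)
            for _ in range(n_per_cell):
                # Weibull draw with scale 1 and shape gamma, Eq. (3.1)
                times.append(alpha + random.weibullvariate(1.0, gamma) * s)
    return times

route_a = generate_route(alpha=35.0, gamma=1.5, mu=5.0, sigma=1.0)
print(len(route_a))  # 7 * 24 * 360 = 60480 observations
```

Because the Weibull distribution is right-skewed, this construction naturally produces the asymmetric travel time distributions that motivate the expected delay measure.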


3.2 Freeway data

The database that was used to train the classifiers belongs to the logistics department of TNO and is not publicly available. It consists of data from 2016 on all Dutch freeways, the so-called "A roads" in the Netherlands, and two "N roads" (the N11 and N14). In the Netherlands, all of these roads are numbered and road locations are indicated by mile markers. Every road in the dataset has been split up into parts, which are sorted by their road type (such as regular road, off-ramp, parallel road, etc.) and driving direction. The driving directions are labelled either right or left. The convention for deciding which direction is which is as follows: if you are driving in the direction in which the numbers on the mile markers are increasing, you are driving on the right. Due to differences in the length of roads, mile markers at opposite sides of the road do not necessarily display the same number. The shape of the roads in the database can be visualized using the available longitude and latitude coordinates.

The routes for which travel times were computed were simplified to contain only regular roads. Since no data are available on which roads intersect or merge with each other and around which mile markers intersections happen, this had to be determined before travel times could be computed. To do this, it was first manually determined which roads touch, by plotting the coordinates of all of the roads. The corresponding coordinates of these touchpoints were then computed by finding the pair of coordinates from two intersecting or merging roads for which the Euclidean distance is minimal. Since the mile markers at opposite sides of the road do not always display the same number, coordinates for all combinations of driving directions in moving to a different road (coming from the left/right combined with going to the left/right) are needed to be able to find all the necessary corresponding mile markers.
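The touchpoint computation described above, finding the pair of points with minimal Euclidean distance between the coordinate sequences of two roads, can be sketched as a brute-force scan. The coordinate polylines below are invented; the real ones come from the non-public TNO database.

```python
# Sketch: closest pair of points between two roads' coordinate sequences,
# found by an O(n*m) scan over all pairs (sufficient for illustration).
import math

def touchpoint(road_a, road_b):
    """Return (index_a, index_b, distance) of the closest coordinate pair."""
    best = (None, None, math.inf)
    for i, (xa, ya) in enumerate(road_a):
        for j, (xb, yb) in enumerate(road_b):
            d = math.hypot(xa - xb, ya - yb)
            if d < best[2]:
                best = (i, j, d)
    return best

# Hypothetical (longitude, latitude) polylines for two merging roads:
a = [(4.80, 52.30), (4.85, 52.32), (4.90, 52.35)]
b = [(4.95, 52.40), (4.92, 52.37), (4.89, 52.34)]
print(touchpoint(a, b))  # the ends of the two polylines are closest
```

Note that treating longitude/latitude differences as planar Euclidean distances is itself an approximation that is reasonable only at the small scales involved here.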

The last and most important data from the database that were used are the filtered average driving speeds. Per road, an average driving speed in kilometers per hour is available for every two hectometers of road and for every minute of every day in 2016 (except for the first two days, during which no speeds were measured due to an error in the system while its administrator was on holiday). These average speeds are based on speeds that were measured by road detectors at the lane level. An average weighted by measured traffic intensities, i.e., the number of cars per minute per lane, was taken to get the average speed per side of the road. As the road detectors are not equidistantly distributed over the roads, the average speeds were then interpolated over their measurement locations wherever possible according to logistic standards. All of this data processing was performed by the controller of the data prior to the start of this research. This resulted in the filtered average driving speeds for every two hectometers that were used in this research.


3.3 Handling missing data

Figure 3.2: An example of filtered measured speeds, visualized in a heatmap

In Figure 3.2, the filtered average driving speeds for a certain road and day are exemplified in a heatmap. In this figure, traffic congestions between approximately 7 and 10 am (420 and 600 minutes) and 3 and 6 pm (900 and 1080 minutes) can be recognized for this particular road and day. The dark green bars and squares in the bottom of the figure represent missing data. It can be seen that sometimes no speeds were measured for substantial parts of a road. The most likely cause of this is that a road detector was (temporarily) broken or malfunctioning and the interval over which had to be interpolated to still obtain data was too large to be justified by logistic standards.

Another possible cause of missing data is that the detectors measuring the speeds for a certain piece of road are not owned by the same national organization that measured the other data, but, for example, by a regional organization. In that case there would of course not be gaps in the data, but there would not be any measured speeds for that particular road at all. The data are then available somewhere else, but no attempt was made to retrieve them for this research.

To solve the issue of missing data caused by broken or malfunctioning detectors, a method of interpolation was created. To be able to discuss this method, the process of travel time computation first has to be explained. With the previously described mile marker data corresponding to touchpoints of roads and the filtered average driving speeds, it can be determined for any moment during 2016 how long it would have taken someone to travel from a certain origin to a certain destination using a specific route (not taking into account urban areas). In this research, travel times were needed both per time of arrival and per time of departure. They were only computed for times of departure/arrival between 6 am and 8 pm; travel times outside of these hours are fairly constant, so it was chosen not to take these into account here. Moreover, intervals of 5 minutes were used between the times of departure/arrival corresponding to the travel times, resulting in 168 travel times per day. The intervals were chosen to be 5 minutes long to reduce computation time compared to intervals of 1 minute, while still being small enough to be detailed.

The travel time computations are an iterative process during which it is necessary to keep track of both the current location and the current time. There is a possibly different measured speed for every minute of every day and for every interval of 2 hectometers of road. Because of this, it has to be determined in each iteration whether the next speed that will figuratively be driven is that of the next minute in the same road interval or that of the next road interval during the same minute. Described visually as in Figure 3.2, this results in a movement going from left to right, corresponding to a movement through time, and from top to bottom, corresponding to a movement from the start of a road to its end.

Because of this two-dimensional movement through a heatmap such as Figure 3.2, one has to approximate at what point in time the end of a bar of missing data is reached whenever missing data is encountered. The method of interpolation used in this research approximates the movement through a bar of missing data by using the last known speed from before the missing data was encountered. When the end of a bar of missing data is reached, the movement process continues as normal, using the known data again. This approximation of the movement through time serves solely to approximate from which minute, or which index (in programming terms), the measured speed should be used once a bar of missing data has been jumped over. Hence, this approximation of the time path is called 'indexing time' here.
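As an illustration of this iterative process, a simplified sketch is given below. It assumes the speeds are stored as a grid with one row per minute and one column per 2-hectometer road interval, with `None` marking missing data; the function and variable names are chosen for illustration and do not come from the thesis implementation.

```python
INTERVAL_KM = 0.2  # each road interval spans 2 hectometers

def traverse(speeds, start_minute):
    """Walk through a speed grid (rows = minutes, columns = road
    intervals, values in km/h, None = missing) from a departure minute.

    Returns (indexing_time, tracking_time, interpolated_fraction):
    the indexing time advances through missing data using the last
    known speed; the tracking time accumulates over known data only.
    """
    indexing_time = 0.0   # minutes; used only to pick the minute index
    tracking_time = 0.0   # minutes; based on known speeds only
    interpolated = 0      # road intervals bridged with a stale speed
    last_known = None

    for col in range(len(speeds[0])):
        minute = min(start_minute + int(indexing_time), len(speeds) - 1)
        v = speeds[minute][col]
        if v is None and last_known is not None:
            # jump over the missing cell using the last known speed;
            # this only affects the indexing time, not the tracking time
            indexing_time += INTERVAL_KM / last_known * 60
            interpolated += 1
        elif v is not None:
            dt = INTERVAL_KM / v * 60
            indexing_time += dt
            tracking_time += dt
            last_known = v
        # a missing cell before any known speed is skipped here;
        # the thesis handles that case with a separate time jump

    return indexing_time, tracking_time, interpolated / len(speeds[0])
```

The indexing time only decides which minute's speed to read next, while the tracking time accumulates over known data only, so that it can later be rescaled into an actual travel time.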

The actual travel time resulting from the interpolation method was obtained in a different way. Namely, during the computation process, the fraction f_n of the road intervals over which had been interpolated was kept track of. The total travel time based on known data only was also registered; it is called the 'tracking time' here. The tracking time was then divided by the fraction of the total distance over which was not interpolated, to obtain the actual travel time used in the rest of this research:

$$t_n = \frac{\text{Tracking time}_n}{1 - f_n}, \qquad (3.3)$$

where n represents an observation. If more than 10% of the total distance of a route has been interpolated over in the computation of a certain travel time (f_n > 0.10), that travel time is not used in the training of the classifiers.
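The rescaling in (3.3) and the 10% filter can be expressed compactly. This is an illustrative sketch; the function name and the convention of returning `None` for discarded travel times are assumptions made here:

```python
def actual_travel_time(tracking_time, interpolated_fraction, max_fraction=0.10):
    """Eq. (3.3): rescale the tracking time (based on known data only)
    over the non-interpolated share of the route. Returns None when more
    than 10% of the distance was interpolated over, in which case the
    travel time is discarded from classifier training."""
    if interpolated_fraction > max_fraction:
        return None
    return tracking_time / (1.0 - interpolated_fraction)
```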

Most of the time, this method does not lead to different results than when the actual data would have been available, since the driving speed is approximately constant most of the time. However, deviations will occur when data is missing for a road interval in which drivers encounter the start or end of a traffic congestion, since the driving speed then changes significantly. Such a case was found for the 7th of March 2016, on the right side of the A1. A heatmap of the measured speeds for this road and day can be seen in Figure 3.3, where the critical point is circled in black.


Figure 3.3: An example of an extreme case where data was missing at a road interval where drivers encountered a traffic congestion. The critical point is circled in black.

This case is suitable to test the accuracy of the interpolation method on, because there are probably no cases more extreme than it. For the interpolation method to be accurate, it would have to approximate driven paths in a natural way most of the time. In Figure 3.4, paths for indexing time and tracking time are shown for three adjacent departure times for which the critical missing data point occurs. On the axes of these plots, the driving time in minutes (x-axis) and the driven distance in kilometers (y-axis) are shown. For every departure time, the two time paths are the same up to the point where data is missing (around 100 km on the y-axis). From this point on, the tracking path stops progressing in time while the indexing path does continue, until a new known data point is reached. As explained before, the reason is that the tracking time is ultimately computed based on known data points only, after which it is rescaled to obtain the actual travel time.

For the departures at 15:35 and 15:45 (Figures 3.4a and 3.4c, respectively), the indexing paths are interpolated quite naturally. However, for the departure at 15:40 (Figure 3.4b), the interpolation is not very natural, since there is a large change in the slope of the indexing path once the end of the bar of missing data has been reached. This results in a significant increase of about 20% in the approximated indexing time compared to the other two departure times. The corresponding computed travel time, on the contrary, does not show a deviation of that magnitude. In Figure 3.4d, one can see that the approximated travel time for departure at 15:40 equals 112.7 minutes, while in reality the travel time was probably around 106.5 minutes (based on the other two travel times). This increase of about 6% is modest. Moreover, since the estimation of the travel times was negatively affected by the simple method of interpolation for only one time of departure in this extreme case, it is concluded that the interpolation method performs satisfactorily here.


(a) Departure time: 15:35   (b) Departure time: 15:40   (c) Departure time: 15:45

(d) Resulting travel times:

Departure time   Travel time
15:35            104.5
15:40            112.7
15:45            108.3

Figure 3.4: (a-c) Indexing time and tracking time paths for three departure times for which a critical missing data point occurred. X-axis: driving time in minutes. Y-axis: driven distance in kilometers. (d) Corresponding computed travel times in minutes

It can also occur that there is no data available for the first road interval of a certain road at a certain time due to an error in the measured speed registration. This would correspond to a vertical bar of missing data when visualized in a heatmap. To solve this issue, a horizontal jump is made to the first point in time for which there is available data again. If this time jump is larger than 10% of the total computed travel time, the travel time would not be used in this research.

With the data and computation methods described in the last two sections, the distribution of travel times could be derived for all routes for which there is data available. Using these distributions, trajectories from a certain origin to a certain destination could be selected for use in the training of the optimal route classification model. The method used for this trajectory selection will be discussed in the next section.

3.4 Trajectory selection

The optimal route classifiers in this research were trained on data from a number of different trajectories. These trajectories were carefully selected to contain freeways with different kinds of uncertainty surrounding them, so that the route classification algorithm could be tested in different ways. To be able to choose relevant trajectories, information was first needed about the travel times of the possible route components contained in these trajectories.

A route component is defined here as a part of a road in between two touchpoints of that road with other roads, or in between such a touchpoint and the end of the road. Since the total road network of the Netherlands contains many route components, generating travel times for each component separately for the entire year of 2016 would have been very time consuming. Therefore, it was chosen to compute the travel times for this analysis for only one month of 2016. To decide which month to use, travel times for one testing trajectory were generated for the entire year. From these travel times it could be deduced which month contains the most information, i.e. which month contains the smallest number of missing travel times.

The trajectory from Amsterdam to Rotterdam, the two biggest cities in the Netherlands, was chosen as being representative for the most important part of the Dutch road network. This trajectory starts at the point where the A8 merges into the A10 and ends at the point where the A13 merges into the A20. The routes relevant for this trajectory (and for every other trajectory described from here forward) were found using Google Maps’ (n.d.) routing functionality. Their road components are described in Table 3.2.

Route   Route components
A       A10 - A05 - A04 - A13
B       A10 - A02 - A12 - A20
C       A10 - A01 - A27 - A12 - A20

Table 3.2: Route components for the 'Amsterdam - Rotterdam' trajectory

For this trajectory, the number of missing travel times is plotted in Figure 3.5 per route and per month of 2016. It can be seen that the lowest numbers of errors are obtained for the months of July (6) and August (7). These months fall in the holiday period of the Netherlands, so traffic conditions then were not representative of regular traffic conditions. The month of February (1) has relatively few errors as well, but is not suitable either, as it falls in the middle of winter. This leaves the month of April (3) as the best representative of regular traffic conditions in the Netherlands.

Knowing this, travel times for the month of April were computed for all route components in the road network for which data is available. Based on these travel times, the delay ratio was computed for each route component as follows:

$$\text{Delay ratio}_c = \frac{E(T_c \mid T_c > E[T_c]) - E(T_c)}{E(T_c)}, \qquad (3.4)$$

where T represents the travel time and c stands for a certain route component. That is, the delay ratio is the expected delay divided by the expected travel time. The delay ratio, rather than the expected delay, is the measure analyzed here because, by correcting for the length of a route component, it is a measure of uncertainty that is better comparable across different routes in this context. Based on the delay ratio, the route components were then split up into two categories: 'high delay components' and 'low delay components', corresponding to components with a delay ratio higher and lower than the mean delay ratio, respectively.

Figure 3.5: Number of missing travel times per route of the Amsterdam - Rotterdam trajectory, per month of 2016 (Jan = 0)

In Figures 3.6a and 3.6b, these categories are visualized in a delay map of all the route components in the Netherlands for which data was available, for the left and right side of the road, respectively. These two figures are also visualized jointly in Figure 3.6c. In the latter, when the two sides of one road are not in the same category, the road is colored brown. From these maps it also becomes apparent that there is a significant number of roads for which no data is available and that some roads have data for only one driving direction. Nevertheless, there are still plenty of roads available to create trajectories from. It was also confirmed that enough travel times were available for all relevant route components to confidently categorize them as either 'high delay' or 'low delay'.
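A sketch of how the delay ratio in (3.4) and the subsequent high/low categorization could be computed from samples of travel times, with the expectations estimated by sample means; the function names and the dictionary-based data layout are illustrative assumptions:

```python
from statistics import mean

def delay_ratio(travel_times):
    """Eq. (3.4): expected delay divided by expected travel time for one
    route component, estimated from a sample of travel times."""
    m = mean(travel_times)
    delays = [t for t in travel_times if t > m]
    if not delays:
        return 0.0  # no observation above the mean: no expected delay
    return (mean(delays) - m) / m

def categorize(components):
    """Split components (name -> travel-time sample) into 'high delay'
    and 'low delay' relative to the mean delay ratio."""
    ratios = {name: delay_ratio(ts) for name, ts in components.items()}
    threshold = mean(ratios.values())
    return {name: ('high delay' if r > threshold else 'low delay')
            for name, r in ratios.items()}
```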

By analyzing the subfigures of Figure 3.6, a number of trajectories were selected that contain both a route consisting mostly of high delay route components and a route consisting mostly of low delay route components. The selected trajectories are described in Table 3.3. Because the database used in this research only contains data on the freeways, the origins and destinations of the trajectories are not always located exactly at the specified start and end; the city names are mainly used for naming purposes. Travel times were computed for the selected trajectories and subsequently analyzed by comparing the mean travel times and expected delays of the different routes contained in the trajectories. From this analysis it follows that the 'Utrecht - Barneveld' trajectory has the most suitable travel times to train classifiers on, since its route C has a significantly lower expected delay than the other two routes (this will be discussed further in the 'Results' chapter). The other trajectories were found to be less suitable to train on, so they were not used in this research. However, it is not excluded that the optimal route classifier could provide benefits on these trajectories as well.

Trajectory             Start        End          Route  Route components
Arnhem - Barneveld     A12 -|- A50  A30 -|- A01  A      A50 - A01
                                                 B      A12 - A30
Utrecht - Barneveld    A12 -|- A27  A30 -|- A01  A      A27 - A01
                                                 B      A27 - A28 - A01
                                                 C      A12 - A30
Haarlem - Hilversum    A05 -|- A09  A01 -|- A27  A      A09 - A04 - A10 - A01
                                                 B      A09 - A01
                                                 C      A05 - A10 - A01
Vlaardingen - Utrecht  A04 -|- A20  A27 -|- A12  A      A20 - A12
                                                 B      A04 - A12
                                                 C      A04 - A15 - A16 - A15 - A27

Table 3.3: Descriptions of trajectories that were selected based on the delay maps. '-|-' stands for 'intersection'

In addition to the overall delay ratio of route components, route components whose delay ratios vary strongly between periods were also looked for. More specifically, delay ratios for all route components were computed for each day of the week separately and for the morning and afternoon rush hours separately. Then, the variances between these periodical delay ratios were calculated (for weekdays and for the rush hours separately) and the route components were sorted by these variances. With these numbers, route components with strongly varying delay ratios across periods could be found.
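This periodical analysis can be sketched as follows. The sketch assumes each travel time observation is tagged with its period (a weekday name, or 'morning'/'afternoon' for the rush hours); the grouping, the naming and the use of the population variance are illustrative choices made here:

```python
from collections import defaultdict
from statistics import mean, pvariance

def periodic_delay_ratios(observations):
    """observations: list of (period, travel_time) pairs for one route
    component. Returns the delay ratio per period and the variance
    across those periodical delay ratios, by which components are sorted."""
    by_period = defaultdict(list)
    for period, t in observations:
        by_period[period].append(t)

    ratios = {}
    for period, times in by_period.items():
        m = mean(times)
        delays = [t for t in times if t > m]
        ratios[period] = (mean(delays) - m) / m if delays else 0.0

    return ratios, pvariance(list(ratios.values()))
```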

In Tables A.2 and A.3 in the Appendix, the delay ratios and corresponding variances are displayed for weekdays and for the morning and afternoon rush hours, respectively. In both tables it can be seen that the route components with the highest delay ratio variance are often very short pieces of road. It is possible that these components are too short for their delay to have a significant effect on the travel times of the routes that contain them, so it would be preferable to create trajectories from longer route components with a high delay ratio variance. On the other hand, their high delay ratios do indicate that these components are located at very busy points in the freeway network, so the surrounding route components are probably crowded roads as well. Because of this, and because no trajectories suitable for training could be created from the longer route components with a high delay ratio variance, it was chosen to create trajectories from short route components. The trajectories were made no longer than necessary, so that the varying delays of the short route components might still have a significant effect on the total travel times of the routes that contain them.

The route components that were selected based on the weekday delay ratio analysis are shown in Table 3.4. The two selected route components are a part of the A16 and a part of the A10, both on the right side of the road.

      Descriptive data              Delay ratios
Road  Direction  From  To    Mon   Tue   Wed   Thu   Fri   Sat   Sun
A16   Right      24.8  27.6  0.86  1.03  0.54  1.74  1.08  0.01  0.01
A10   Right      16.0  21.0  0.63  0.71  0.48  0.60  0.48  1.03  1.44

Table 3.4: Description of the route components selected by analyzing the weekday delay ratios, with their corresponding delay ratios per day. 'From' and 'To' correspond to mile markers, but are specified in kilometers

When looking at the delay ratios for the A16 route component, one can see quite a lot of variation in the daily delay ratios during the working week: on Thursday the delay ratio is relatively high and on Wednesday it is relatively low. Because of this, this route component might be contained in a trajectory where on Wednesdays a different route is generally recommended than on Thursdays. The other selected route component, the part of the A10, does not show much variation in delay ratios during the working week, but turns out to be very busy during the weekend. This could mean that a route containing this part of the A10 is generally recommended during the working week, but not during the weekend. From the A16 and A10 route components, the 'Woerden - Breda' and 'Hilversum - Haarlem' trajectories were respectively selected to train classifiers on. These two trajectories are described in Table 3.6.

In Table 3.5, the route component that was selected based on having a high variance between its rush hour delay ratios is displayed. It is a part of the A15, on the right side of the road. In the 'Delay ratios' part of the table it can be seen that the delay ratio for this piece of road is much higher during the afternoon rush hour than during the morning rush hour. This is probably caused by people working in Rotterdam, Den Haag or other nearby cities taking this route home after their working day. In the morning these people drive on the left side of the road, so they then do not affect the travel times for this route component. Forming a trajectory based on this route component could result in a trajectory in which a different route is optimal during the morning rush hour than during the afternoon rush hour. Therefore, the 'Den Haag - Ridderkerk' trajectory was created from the A15 route component. Just like the two previously selected trajectories, it is described in Table 3.6.

      Descriptive data           Delay ratios
Road  Direction  From  To    Morning  Evening
A15   Right      63.6  66.4  0.02     1.05

Table 3.5: Description of the route component selected by analyzing the rush hour delay ratios, with its corresponding delay ratios for the morning and afternoon rush hours separately


Trajectory             Start          End          Route  Route components
Woerden - Breda        A12, 45.5 km*  A16 -|- A58  A      A12 - A2 - A27 - A58
                                                   B      A12 - A20 - A16
Hilversum - Haarlem    A27 -|- A1     A09 -|- A05  A      A01 - A10 - A04 - A09
                                                   B      A01 - A09 - A02 - A09
                                                   C      A01 - A10 - A05
Den Haag - Ridderkerk  A13 -|- A04    A15 -|- A16  A      A04 - A15
                                                   B      A13 - A20 - A16

Table 3.6: Descriptions of trajectories that were selected based on periodical delay ratio variances.
* mile marker

The four trajectories that have been selected to train classifiers on so far were all chosen such that their routes were only as long as minimally necessary to form a trajectory. Because of this, a longer trajectory was also looked for, to see whether penalizing travel time uncertainty could offer improvements on trajectories of different lengths. The trajectory selected for this purpose is the 'Arnhem - Amsterdam-Zuid' trajectory, described in Table 3.7. It also seems suitable to train a classifier on, since its route B consists mostly of route components categorized as 'low delay' components in the delay map in Figure 3.6, while routes A and C consist mostly of 'high delay' route components.

Trajectory               Start        End         Route  Route components
Arnhem - Amsterdam-Zuid  A12 -|- A50  A2 -|- A10  A      A12 - A30 - A01 - A10
                                                  B      A12 - A02
                                                  C      A50 - A01 - A10

Table 3.7: Description of the Arnhem - Amsterdam-Zuid trajectory, which was selected based on its length

In summary, an overview of all the routes that were selected to train classifiers on can be found in Table A.4 in the Appendix. For each of the routes in these trajectories, the travel time distributions were checked for outliers caused by erroneous interpolation. This was done by inspecting, for the highest travel times of each distribution, the percentage of the route over which had been interpolated; travel times with a very high interpolation percentage would be removed from the dataset. However, no outliers were found in any of the distributions.


Figure 3.6: Delay maps of the road network of the Netherlands: (a) for the left side of the road, (b) for the right side of the road, (c) for both sides of the road combined. Dataset-specific coordinates on the x-axis and y-axis.


Chapter 4

The model

In the context of optimal route classification, the interpretation of the logistic regression model discussed in Chapter 2 changes. The index k is replaced by r, representing one route from the group of routes that together form a trajectory. The posterior probabilities p_nr can be interpreted as the probabilities that route r is optimal for time of arrival n, according to the model. This means that the classes of the model become 'optimal' and 'not optimal'. Moreover, since only one of the routes in a trajectory can be classified as optimal, this happens for the route with the highest posterior probability.

In this chapter, first the variables of the optimal route classification model are defined, together with the reasons for including them. In the second section, the different penalty forms and parameters that were tested in this research are defined. Finally, the functionality of the navigation system in which the optimal route classification model can be used is explained.

4.1 Model variables

The optimal route classification model takes several features as input and, based on these, decides which route to classify as 'optimal'. The first of these features is a series of Boolean (dummy) variables representing the possible times of arrival that the user of the navigation system can choose from. As mentioned before, these times range from 6 am to 8 pm with intervals of 5 minutes between them, so the series consists of 168 Booleans.

The second series of Boolean variables represents the different seasons. In Figure 4.1, the mean travel time over all road stretches (road pieces split up as in the original dataset) is displayed per day, for the majority of days in 2016. The five bars of exceptionally low values are caused by missing travel times for most road stretches except a few short ones, resulting in a very low mean travel time at certain times; these values can be ignored. It then becomes visible that the daily variation in the mean travel time differs significantly over the year. From day number 0 until approximately day 80, the peaks are on average lower than in the period from day 80 to 140. This may seem counterintuitive if the two time intervals are assumed to represent winter and spring respectively, but it can be explained: the winter of 2016 was exceptionally mild in the Netherlands, while spring started off with frost and ended with thunderstorms (KNMI, 2017). After day number 140 there is a period of exceptionally low variation in the mean travel time, most likely caused by the holidays, during which people travel less than usual. After the holiday period, the period with the highest peaks of all four starts. This is probably because October and November of 2016 were colder than usual (KNMI, 2017). The four distinguishable periods do not coincide exactly with the four seasons, but to prevent overfitting on the dataset used, it was chosen to approximate them by the seasons. Because only data from 2016 was used, a season was defined as starting in the first month that falls completely within the season; winter, for example, is defined as all days in January, February and March. If data from more than one year had been used, interaction terms of season variables and year variables would have been included as well, to account for the weather per season being different every year.

Figure 4.1: Mean travel time over all road stretches in the original dataset per day, from 01/03/16 until 12/12/16

Due to long computation times, it was chosen not to continue travel time computations after 12/12 since a seasonal pattern was already detectable. For the first two days of 2016 there were no speeds measured.

The last two Boolean series used as features are a weekday series and a rush hour series. As exemplified in Section 3.4, it is likely that there are differences in the expected delays for a road piece between the different weekdays or between the morning and afternoon rush hours. Because of this, these variations should be taken into account in the route classifier as well. The definition of the weekdays series is straightforward, but to be able to empirically define at what time the morning and afternoon rush hours start, the graph in Figure 4.2 was generated.

In this figure, the mean travel time over all route components during the day is plotted. Just as during the trajectory selection (based on the analysis of Figure 3.5), only travel times from the month of April were used to generate this graph. Two peaks are clearly visible, representing the morning and afternoon rush hours.

Figure 4.2: Mean travel time over all route components during the day, for the month of April 2016

It is visible that, on average, the morning rush hour runs from approximately 7:00 to 9:30 am and the afternoon rush hour from approximately 3:30 to 6:30 pm. Therefore, the morning rush hour Boolean is set equal to 1 if the time of arrival falls between 7:00 am and 10:00 am, and the afternoon rush hour Boolean is set equal to 1 if the time of arrival falls between 3:30 pm and 6:30 pm.

In summary, the optimal route classifier takes as input four series of features:

$$\phi_n = [t_n, s_n, d_n, h_n]^T, \qquad (4.1)$$

where φ_n corresponds to the specification in (2.1), t_n is a time of arrival series (168 Booleans), s_n is a season series (4 Booleans), d_n is a weekday series (7 Booleans) and h_n is a rush hour series (3 Booleans). Since one Boolean was left out of each series to prevent perfect multicollinearity in the model variables, the total number of input variables is equal to 178.
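A sketch of how the feature vector in (4.1) could be constructed, with one category dropped from each series. The encoding conventions used here (0-based indices and dropping the first category of each series) are illustrative assumptions, not the thesis's exact encoding:

```python
def build_features(arrival_index, season, weekday, rush):
    """One-hot features for Eq. (4.1), dropping the first category of
    each series against perfect multicollinearity: 167 + 3 + 6 + 2 = 178
    inputs. arrival_index: 0..167 (5-minute slots between 6 am and 8 pm),
    season: 0..3, weekday: 0..6, rush: 0 (none), 1 (morning), 2 (afternoon).
    """
    def one_hot_drop_first(index, size):
        # dummy for every category except the first (the baseline)
        return [1.0 if index == i else 0.0 for i in range(1, size)]

    return (one_hot_drop_first(arrival_index, 168)
            + one_hot_drop_first(season, 4)
            + one_hot_drop_first(weekday, 7)
            + one_hot_drop_first(rush, 3))
```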

4.2 Optimal route classification

As mentioned earlier, the goal of this research was to improve an optimal route classification model by applying penalization of travel time uncertainty in its error function. Here, the log loss error function was used, as shown in (2.3). The choice was made to extend the log loss error function to the following form by including penalization:

$$E_n(w_1, \ldots, w_R) = -(1 + \text{penalty}) \sum_r y_{nr} \ln(p_{nr}), \qquad (4.2)$$

where the penalty form is still unspecified. In this specification of the error function, the (1 + penalty) term enlarges the log loss error term that is added to the total error in case of misclassification. This is logical, because the uncertainty surrounding a certain route does not need to be penalized if the route actually is the fastest route to take at a certain moment. To incorporate differing amounts of risk aversion into the model, a risk aversion coefficient α ≥ 0 was introduced, where a low value of α corresponds to low risk aversion. Moreover, to let the model learn the amounts of uncertainty surrounding each route in a certain trajectory, a penalty strength coefficient ρ > 0 was introduced. This coefficient varies with the amount of uncertainty surrounding the route that was chosen as optimal by the model. This works as follows: if a relatively safe route in a certain trajectory was not chosen as optimal while it actually was optimal, a large penalty should be given, so ρ should be large. If, in contrast, a relatively safe route was chosen as optimal while it actually was not, ρ should be small, so that only a small penalty is given. The coefficients α and ρ were then combined into an interaction term to form the total penalty, to which both should be positively related. In this way, for a high value of the risk aversion coefficient α, the model can learn to prefer a certain route within a trajectory over another because its travel times are more reliable, and not necessarily because it is the fastest route to take in general.
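The penalized error function in (4.2) for a single observation can be sketched as follows; the names are illustrative, y is the one-hot indicator of the actually optimal route and p the model's posterior probabilities:

```python
import math

def penalized_log_loss(y, p, penalty):
    """Eq. (4.2): the standard log loss for one observation, enlarged
    by a factor (1 + penalty). y and p are lists over the R routes of a
    trajectory: y the one-hot optimal-route indicator, p the posterior
    probabilities from the logistic regression model."""
    log_loss = -sum(y_r * math.log(p_r) for y_r, p_r in zip(y, p))
    return (1.0 + penalty) * log_loss
```

With penalty = 0 this reduces to the ordinary log loss, which is what the non-penalizing benchmark model uses.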

To obtain the total penalty, the penalty strength coefficient ρ needs to be defined first. It depends negatively on the amount of uncertainty surrounding a route and should preferably be comparable across different trajectories. Because of this, it can be based on the measure of uncertainty surrounding a route r that was defined in Chapter 2, the expected delay:

$$D_r = E(T_r \mid T_r > E[T_r]) - E(T_r). \qquad (4.3)$$

This measure does not suffice yet, as ρ should be related to D_r in a negative way. Moreover, the expected delay is comparable across different routes, but not yet across different trajectories. Because of the latter, the following transformation was applied first, to obtain the uncertainty factor U_r:

$$U_r = \frac{D_r}{\max_r D_r}. \qquad (4.4)$$

That is, the uncertainty factor is equal to the expected delay of a certain route divided by the maximum expected delay occurring within the same trajectory. In this way, it is a measure of how relatively uncertain a route is within a certain trajectory. The measure is relative to the most uncertain route within a trajectory, which therefore always has U_r = 1. Because of this, the uncertainty factor can be used to compare the ranges of uncertainty occurring in different trajectories.

Since the penalty strength coefficient ρ should be negatively related to the amount of uncertainty surrounding a route, one last inverting transformation is necessary to get from U_r to ρ_n. For this, two different transformations were tested. To define these, first r*_n, the route classified as optimal for observation n, has to be defined:

$$r^*_n = \operatorname*{argmax}_r \; p_{nr}, \qquad (4.5)$$

where p_nr was already defined in (2.2) (with k = r). The different forms of ρ that were tested are then:

• ρ_n = 1 / U_{r*_n}
• ρ_n = −log(U_{r*_n})

In addition, different interaction forms between α and ρ were tested for use as the total penalty:

• Penalty_n = ρ_n α
• Penalty_n = α^{ρ_n}
• Penalty_n = (ρ_n)^α

The tests for ρ and the total penalty happened simultaneously, giving six different forms of the total penalty that were tested. The testing was performed by comparing classification outcomes over a range of values for α, on the artificial dataset described in Section 3.1. The penalty forms were mainly evaluated on two aspects. The first is their classification stability across the different seeds used in splitting up the training and testing samples. The second is their range of classification outcomes; a bigger range of outcomes was considered better, as it allows more options to choose an optimal value of α from. The results of these tests will be discussed in Section 5.1.
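The pieces that make up the total penalty can be sketched as follows: the uncertainty factor of (4.4), the two tested ρ transformations, and the α-ρ interaction forms. The string labels for the forms are invented here for illustration:

```python
import math

def uncertainty_factors(expected_delays):
    """Eq. (4.4): expected delay of each route divided by the maximum
    expected delay within the trajectory; the most uncertain route
    gets U_r = 1."""
    d_max = max(expected_delays.values())
    return {r: d / d_max for r, d in expected_delays.items()}

def penalty_strength(u, form='inverse'):
    """The two tested inverting transformations from U_{r*} to rho."""
    return 1.0 / u if form == 'inverse' else -math.log(u)

def total_penalty(rho, alpha, form='product'):
    """The three tested interaction forms between alpha and rho."""
    if form == 'product':
        return rho * alpha
    if form == 'alpha_power':
        return alpha ** rho
    return rho ** alpha  # 'rho_power'
```

Combining two ρ forms with three interaction forms gives the six penalty specifications that were compared.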

Once the optimal penalty form was found, other parameters of the model could be optimized as well. In (2.5), the learning rate parameter κ is included. To improve the converging of the weights of the model, κ can be defined as decreasing for every observation n:

κn= κ0

1

nδ, (4.6)

where κ_0 is a constant larger than 0 and δ is a constant between 0 and 1. Increasing κ_0 leads to larger steps being taken during the stochastic gradient descent optimization. Increasing δ makes the step size relatively smaller as more observations have been processed. The values of α and κ_0 can be changed in such a way that different combinations of these parameter values lead to approximately the same classification outcome. To achieve this, α has to increase when κ_0 decreases and vice versa. It is useful to test the navigation system's performance using multiple of these combinations, because increasing α while decreasing κ_0 gives more influence to the '1' in the term between the brackets in (4.2) relative to the 'penalty' term. This is especially important for the performance of the non-penalizing navigation system to which the penalizing navigation system was compared, because it might lead to significantly different values of the model weights resulting from the SGD optimization. In addition, different values of δ needed to be evaluated to determine the optimal value of this parameter. The value of δ was linked to the value of κ_0 that was used and was decided based on the artificial dataset. This will be discussed in Section 5.1.
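As a minimal Python sketch (not from the thesis), the decreasing learning rate of (4.6) and a single SGD update could look as follows; the default values for κ_0 and δ are assumptions for illustration only:

```python
import numpy as np

def learning_rate(n, kappa0=1.0, delta=0.5):
    """Decreasing learning rate kappa_n = kappa0 / n**delta, as in (4.6).

    kappa0 > 0 controls the initial step size; 0 < delta < 1 controls how
    quickly the steps shrink as more observations have been processed.
    The default values are illustrative assumptions.
    """
    return kappa0 / n ** delta

def sgd_step(weights, gradient, n, kappa0=1.0, delta=0.5):
    """One stochastic gradient descent update for observation n (1-indexed)."""
    return weights - learning_rate(n, kappa0, delta) * gradient
```

With κ_0 = 1 and δ = 0.5, the step size shrinks from 1.0 at the first observation to 0.1 at the hundredth, illustrating how later observations move the weights less.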

Simultaneously with testing different values of κ_0, uncertainty factors of different specificities were tested as well. As shown in Sections 3.4 and 4.1, the expected delay for a certain route can differ per season, weekday and rush hour. Because of this, the optimal route classification algorithm could improve if the expected delays were computed per specific period instead of over an entire year. The uncertainty factors that are penalized in the model are based on the expected delays, so they would also become more specific as a result. As an example, consider the 'Den Haag - Ridderkerk' trajectory. As shown in Tables 3.5 and 3.6, route A of this trajectory is expected to be much busier during the evening rush hour than during the morning rush hour. If the uncertainty factors for this trajectory were specified per rush hour, the penalization would be much lighter for trips during the evening rush hour than for trips during the morning rush hour (as the penalty strength depends inversely on the uncertainty factor). In this way, the model becomes more detailed and applies penalization more fittingly to each situation. The different forms of the uncertainty factor that were tested will from now on be referred to as 'uncertainty splits'; they are specified in Table 4.1. The results from the simultaneous testing of different values of κ_0 and different uncertainty splits will be discussed in Section 5.2.

Uncertainty splits

Periods      1   2   3   4   5   6   7   8
Seasons          X           X   X       X
Weekdays             X       X       X   X
Rush hours               X       X   X   X

Table 4.1: Periods for which specific uncertainty factors are calculated for each uncertainty split
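To make the idea of an uncertainty split concrete, the following Python sketch (not from the thesis) computes average expected delays per period, where the period key becomes more or less specific depending on which splits are switched on; the record layout with keys 'season', 'weekday', 'rush_hour' and 'expected_delay' is a hypothetical assumption:

```python
from collections import defaultdict

def uncertainty_factors(records, use_season, use_weekday, use_rush_hour):
    """Average expected delay per period; the period key is built from
    whichever of the three splits are enabled.

    `records` is an iterable of dicts with keys 'season', 'weekday',
    'rush_hour' and 'expected_delay' (a hypothetical layout; the thesis
    data format is not shown in this section).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        key = (
            rec["season"] if use_season else None,
            rec["weekday"] if use_weekday else None,
            rec["rush_hour"] if use_rush_hour else None,
        )
        sums[key] += rec["expected_delay"]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

With all three flags off this yields a single factor for the whole year; with all three on, one factor per combination of season, weekday and rush hour, matching the most specific split in Table 4.1.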

4.3 Navigation system

Now that all aspects of the optimal route classifier have been defined, the form of the navigation system in which the classifier could be used can be discussed. As mentioned earlier, the user of this navigation system can choose from four different risk profiles, depending on the user's current situation and risk aversion. Based on the chosen risk profile, the navigation system then recommends a possibly different route and time of departure. To evaluate the performance of the uncertainty-penalizing navigation system, its recommendations were compared to those of a non-penalizing navigation system on several different aspects.

To explain the functionality of the navigation system, the four risk profiles are defined below, each with an example of a situation in which it is useful to the navigation system's user:

• Minimal risk: the user wants to be absolutely sure that he or she gets to the destination in time.
Example: the user needs to post an important package at the post office before it closes.

• Little risk: the user wants to be fairly sure that he or she gets to the destination in time.
Example: the user has a meeting at work with a colleague.

• Mild risk: the user wants to stay as long as possible at the origin of the travel, while still having a reasonable chance of arriving in time at the destination.
Example: the user is on an enjoyable trip with friends, but would like to be home before a certain time.

• Indifferent: the user does not care whether he or she arrives earlier or later at the destination than the planned time of arrival.
Example: the user is driving through a beautiful landscape while on a road trip during the holidays. He or she does not mind having to travel longer, because the user can then enjoy the view.

Concerning the ‘Minimal risk’ profile, one can of course never be absolutely sure of arriving in time. However, under circumstances that are not highly irregular, following the navigation system’s recommendations under this profile should lead to arriving in time. Situations in which the ‘Indifferent’ risk profile will be chosen probably do not occur often, but it was included for completeness.
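In code, the four profiles reduce to a mapping from profile name to the percentile pct used in (4.8). The values below are hypothetical placeholders, since the actual percentiles are given in Table 4.2 (not reproduced in this section); only the ordering from most to least risk-averse follows the profile descriptions above:

```python
# Hypothetical mapping from risk profile to the percentile pct used in (4.8).
# The actual values are specified in Table 4.2; these are illustrative only.
RISK_PROFILE_PCT = {
    "minimal": 0.99,      # assumed: near-certain in-time arrival
    "little": 0.90,       # assumed: fairly sure
    "mild": 0.75,         # assumed: leave as late as reasonably possible
    "indifferent": 0.50,  # assumed: median travel time, no margin
}
```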

For each of the risk profiles, a different time of departure will be recommended for a certain travel. The time of departure is computed as follows:

Time of departure = Desired time of arrival − T_{r∗,pct}, (4.7)

T_{r∗,pct} = {t | P(T_{r∗,s,d} ≤ t) = pct_{risk}}, (4.8)

where pct is a percentage (0 ≤ pct ≤ 1) that differs for each risk profile risk, and T is the travel time. r∗ has already been defined in (4.5). The subscripts s, d and risk are left out of (4.7) for ease of notation; s and d represent the season and day of the week corresponding to the desired time of arrival, respectively. Hence, T_pct is a percentile of the travel time distribution for the combination of a season and a weekday corresponding to the desired time of arrival. Percentiles are not specified per rush hour here, because this turned out to worsen performance. It was also examined whether using more detailed percentiles per time of arrival added value to the model, but this was not the case either. In Table 4.2, the percentiles for each risk profile are shown; they will be discussed at the end of this section. The percentiles were determined intuitively and were confirmed by asking TNO employees what percentiles they would use for each profile. Three TNO employees (other than the writer of this paper) offered their input, which is shown in Table A.5 in the Appendix.
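The computation in (4.7)-(4.8) can be sketched in Python (an illustration, not the thesis implementation), using the empirical percentile of a sample of observed travel times for the chosen route r∗ and the season/weekday matching the desired arrival:

```python
import numpy as np

def recommended_departure(desired_arrival_min, travel_times, pct):
    """Time of departure per (4.7)-(4.8): subtract the pct-th percentile of
    the travel-time distribution from the desired time of arrival.

    Times are in minutes since midnight; `travel_times` is a sample of
    observed travel times for route r* in the relevant season/weekday.
    """
    t_pct = np.quantile(travel_times, pct)  # empirical percentile T_{r*,pct}
    return desired_arrival_min - t_pct
```

A more risk-averse profile uses a higher pct, so a larger travel-time budget is subtracted and an earlier departure is recommended.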

With the recommended time of departure available for each observation, the navigation system’s performance could then be evaluated. This was done by comparing the T_{r∗,pct} corresponding to a recommended time of departure to the actual travel time of the recommended route at that time. By aggregating these comparisons over all processed observations, the following performance measures could be computed:

(31)

• In-time score: percentage of arriving in time

• Travel time: mean travel time in minutes

• Minutes late: average number of minutes that one was late, given that one did not arrive in time

• Wasted time: average number of minutes that one was early, given that one arrived in time
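The four measures above can be computed from per-trip data as in the following Python sketch (an illustration under the assumption that a trip is in time when its actual travel time does not exceed the allowed travel time T_{r∗,pct}):

```python
import numpy as np

def performance_measures(actual, allowed):
    """Aggregate the four performance measures from per-trip travel times.

    `actual` holds realized travel times; `allowed` holds the budgets
    T_{r*,pct} implied by the recommended departures. All in minutes.
    """
    actual = np.asarray(actual, dtype=float)
    allowed = np.asarray(allowed, dtype=float)
    in_time = actual <= allowed
    late = ~in_time
    return {
        "in_time_score": in_time.mean() * 100,  # % of trips arriving in time
        "travel_time": actual.mean(),           # mean travel time in minutes
        "minutes_late": (actual - allowed)[late].mean() if late.any() else 0.0,
        "wasted_time": (allowed - actual)[in_time].mean() if in_time.any() else 0.0,
    }
```

Note that "minutes late" and "wasted time" are conditional averages: each is taken only over the trips to which it applies, matching the "given that" clauses above.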

For each of the risk profiles, different performance measures are optimized through a grid search. This starts by randomizing the total sample of 2016 for a certain trajectory over the different days that it contains and splitting it into a training sample (50%), an optimization sample (25%) and a testing sample (25%). The sample randomization is done in such a way that data from one day occurs in only one of the subsamples. For a range of 11 different values of α, the route classification model’s weights are then optimized using the training sample; the number of training epochs used here is equal to 1. The range of values for α depends on the κ_0 (from (4.6)) that was used, but it always includes α = 0 to allow the model to decide not to apply any penalization. Then, for each of the values of α, the performance measures are computed using the optimization sample. The performance measures are not continuously dependent on α, since several combinations of the posterior probabilities p_{nr} for the different routes can lead to the same route being chosen as optimal; a certain route is chosen as optimal as long as its value of p_{nr} is the highest out of all the routes in a certain trajectory. Therefore, the step size between the grid values is set sufficiently large to prevent the grid search from producing the same outcome for all values. Based on the performance measures that are optimized for the selected risk profile, the optimal value out of all the grid values, α∗, is then selected. This is done by first finding the values of α that lead to the highest value of the primary performance measure for the selected risk profile. Out of these values, the value of α for which the secondary performance measure is optimal is then selected as α∗. If multiple values of α attain the same optimal values for both performance measures, the smallest of these values is selected as optimal. This concludes the first round of the grid search.
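The first-round selection of α∗ can be sketched in Python as follows (an illustration, not the thesis code; the layout of `results`, mapping each grid value of α to its dict of performance measures, is a hypothetical assumption):

```python
def select_alpha_star(results, primary, secondary,
                      primary_higher_is_better, secondary_higher_is_better):
    """First-round selection of alpha*: best primary measure first, then the
    best secondary measure among the tied candidates, smallest alpha on a tie.

    `results` maps each grid value of alpha to a dict of performance
    measures computed on the optimization sample (hypothetical layout).
    """
    best_primary = (max if primary_higher_is_better else min)(
        m[primary] for m in results.values())
    candidates = [a for a, m in results.items() if m[primary] == best_primary]
    best_secondary = (max if secondary_higher_is_better else min)(
        results[a][secondary] for a in candidates)
    return min(a for a in candidates if results[a][secondary] == best_secondary)
```

For the in-time score "higher is better" applies, while for the other three measures lower values are preferred, which is why the direction of each comparison is passed in explicitly.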

The second round of the grid search starts by letting the range of values for α run from the nearest value of α from the previous round that is lower than α∗ to the nearest value of α from the previous round that is higher than α∗. Again, 11 different values of α are used. For example, if in the first round of the grid search the range of α runs from 0 to 1000 in steps of 100 and the optimal α for this round equals 400, the range in the next round runs from 300 to 500 in steps of 20. The performance measures corresponding to the selected risk profile are again optimized to find a new optimal value for α. However, this new value is only used under certain conditions of improvement. If the in-time score is used as the primary performance measure, the new value for α∗ is used if it increases the in-time score by more than 3% in absolute terms compared to the first round. If any of the other three performance measures are primarily optimized over, the second round α∗ will only be used if it decreases
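The construction of the second-round grid around α∗ can be sketched as follows (an illustrative Python helper, not from the thesis):

```python
import numpy as np

def refine_grid(alphas, alpha_star):
    """Second-round grid for alpha: 11 evenly spaced values between the
    neighbours of alpha_star in the previous round's grid."""
    alphas = sorted(alphas)
    i = alphas.index(alpha_star)
    lower = alphas[max(i - 1, 0)]
    upper = alphas[min(i + 1, len(alphas) - 1)]
    return np.linspace(lower, upper, 11)
```

With a first-round grid from 0 to 1000 in steps of 100 and α∗ = 400, this returns the grid from 300 to 500 in steps of 20, matching the worked example above.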
