
Saturation forecasting for smart traffic intersections with recurrent neural networks, based on sensor loop data

submitted in partial fulfillment for the degree of master of science

Thomas van Dooren
10625488

master information studies: data science
faculty of science, university of amsterdam

2019-07-05

Internal Supervisor: Mahsasadat Shashahani (UvA, FNWI, IvI)
External Supervisor: Samuel Blake (HAL24k)


Saturation forecasting for smart traffic intersections with recurrent neural networks, based on sensor loop data

Thomas van Dooren
University of Amsterdam
thomassebastiaanvd@gmail.com

ABSTRACT

Traffic control centres in the Netherlands are in need of automated, quantitative and predictive statistics for smart traffic intersections. Saturation of a traffic intersection is a quantitative indication of the traffic flow. Saturation is calculated by dividing the minimum headway by the current headway, where headway is the number of seconds between cars that pass a traffic light. Based on the V-log data (V-log sensors track the status of traffic lights and the presence of vehicles on certain parts of the road) of multiple intersections in Noord Holland, several forecasting models have been created and compared: a random forest regression model, multiple long short-term memory recurrent neural networks (RNN) and a persistence model. The modelling approach for the long short-term memory models is optimized to create a general model that is able to predict new lanes. This is done by varying the memory and memory step of the input sequences, varying the features of the model, and varying what data is used as input. The RNN that scales best to new lanes is an RNN with 3 features (time of day, day of week and previous saturation values) trained on 3 lanes, with a mean MSE (x1000) of 3.04 and a Pearson coefficient of 0.97. This model outperforms the persistence model, which has a mean MSE (x1000) of 3.50 and a Pearson coefficient of 0.94, in forecasting saturation 15 minutes ahead. The modelling approach that scales best to new lanes for 15 minute predictions is an RNN model that uses 3 lanes as input data (6 years of data), has 3 features (time of day, day of week and previous saturation values), has a memory of 120 minutes and a memory step of 1 minute.

1 INTRODUCTION

Traffic congestion is a big problem that is not easy to solve. It costs road users a lot of time and results in large societal and economic costs. Traffic demand is rising, and expanding the road infrastructure does not seem to solve the problem. One way of addressing it would be to make more efficient use of the existing infrastructure [1]. Traffic control centres try to control traffic flows to prevent and reduce traffic jams; generally, their aim is to improve the performance of the traffic system. These centres can influence the traffic light behaviour of intersections. When an intersection becomes oversaturated, the oversaturation spreads to nearby intersections and can eventually saturate very large parts of the road network.

The traffic control centres in the province of Noord Holland have very few automated ways of identifying abnormal behaviour at traffic intersections. They use automatic alerts when certain roads have a low traffic speed or when certain sensors in the road are on for an exceptionally long time (indicating a traffic jam). Most of the time, video footage is monitored by human eyes in order to identify abnormal behaviour. This is time consuming, subjective and expensive, which is why automating this process is necessary [11]. Moreover, the current methods for detecting saturation are not able to make any future predictions and are not specific enough. When oversaturation is observed (in the present), it is already too late for the traffic controllers to act properly. On the bright side, the province of Noord Holland has been investing heavily in smart traffic intersections. Currently only roughly 200 traffic intersections are considered smart, but the goal is to increase this number. These intersections have V-log sensors underneath the surface of the road. Every 0.1 seconds these sensors generate V-log data with information about the status of traffic lights and the presence of vehicles on certain parts of the road. This is on the cutting edge of smart traffic.

All in all, traffic controllers will be able to make better decisions in controlling the traffic flows if they can rely on quantitative saturation predictions. For this project we are interested in calculating the saturation of several traffic intersections in the province of North Holland. The goal is to predict the saturation rates of these intersections 15 minutes into the future, based on the V-log data generated by the sensors that measure them.

1.1 Research question

The need of traffic control centres for saturation forecasts leads to the following research question: What modelling approach is best for 15-minute forecasts of traffic saturation, based on scalability to new lanes?

1.2 Support for the research question

In addition to the research question, the following questions will be used as a guideline to answer it.

• How is the saturation of a traffic intersection defined and how can it be extracted from the loop sensor data?
• What models can provide a baseline to compare the long short-term memory recurrent neural network with?
• What long short-term memory recurrent neural network structure is best suited for the described problem?
• Is a long short-term memory recurrent neural network better at forecasting saturation of smart traffic intersections based on V-log data than a random forest regression model or persistence model?
• What is the optimal modelling approach for a long short-term memory recurrent neural network?

2 RELATED WORK

In this section, related work is discussed on saturation based on headway, on long short-term memory recurrent neural networks (RNNs) and on random forest models.


2.1 Saturation based on headway

The saturation flow of a traffic intersection is a measure of the performance of the intersection. It describes the maximum number of passenger car units that could hypothetically pass the intersection during an hour, assuming the lights are always green. For the calculation of the maximum saturation flow, an optimal situation is assumed: it is the maximum number of cars a traffic intersection can process.

The maximum saturation flow (for an hour) is then calculated with Equation 1.

$$S_{max} = \frac{3600}{H_{min}} \qquad (1)$$

Maximum saturation (S_max) is calculated by dividing 3600 (the number of seconds in an hour) by the minimum headway in seconds (H_min); S_max is dimensionless. The headway is the time between two cars passing the same traffic light. When cars are queued for a traffic light, the first car takes a few seconds to accelerate and pass the traffic light, which results in a relatively high headway [3]. The second car is able to accelerate during the acceleration of the first car, resulting in a shorter time needed to pass the traffic light, thus a smaller headway. Every following car in this queue takes less time to pass the traffic light. From the sixth car onward, the headway settles at an optimum value [16]: the minimal time between two cars passing the traffic light. In Figure 1 the headway does not change significantly for vehicles behind the sixth vehicle in the queue. This value is the minimum headway (H_min) and is calculated by averaging the headway of all the vehicles behind the sixth vehicle in the queue, see Equation 2. For a comparison between the theory (Figure 1) and the actual data used for this project, see Figure 15 in the appendix.

Figure 1: Headway for nth vehicle in queue [3].

$$H_{min} = \frac{\sum_{j=7}^{l} H(j)}{l + 1 - n} \qquad (2)$$

In Equation 2, H(j) is the headway of the j-th vehicle in line. The summation starts at j = 7, l is the position of the last queued vehicle and n = 7 is the first vehicle position where optimal headway is assumed (see Figure 1: for j > 6 the headway is approximately constant). In the original paper, a few assumptions are made for the traffic lane [3]:

• 3.6 meter lane width
• Flat gradient
• No parking or bus stops near the intersection
• Uniform movement type, i.e. only straight movement or only turning movement
• No heavy vehicles
• No pedestrians or cyclists
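As an illustration of Equations 1 and 2, the following sketch computes the minimum headway and the maximum saturation flow for a single queue; the headway measurements are hypothetical.

# Minimal sketch of Equations 1 and 2, using hypothetical headway
# measurements for one queue (index 0 = first vehicle in the queue).
headways = [3.8, 3.0, 2.6, 2.4, 2.2, 2.1, 2.0, 2.0, 1.9, 2.0]  # seconds

# Equation 2: average the headway of the vehicles from queue position 7 onward.
tail = headways[6:]            # vehicles at positions 7..l (1-based)
h_min = sum(tail) / len(tail)  # len(tail) equals the denominator l + 1 - n

# Equation 1: maximum saturation flow per hour.
s_max = 3600 / h_min
print(f"H_min = {h_min:.2f} s, S_max = {s_max:.0f} cars per hour")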

2.2 Random forest

Decision tree regression models combine predictive, supervised machine learning with tree-like graphs. The tree is a visual representation of the features of a machine learning problem; every node in the tree is a condition based on the input features [17]. The random forest model randomly creates multiple decision trees with randomly created nodes, conditions and hierarchy [9]. Every tree in the forest makes a prediction for the output, and the mean of all these predictions is the prediction of the random forest model. The random forest regression model can handle a large number of features and is robust to noisy data [12].

Random forest models are generally designed for regression and classification problems, not for time series: the interdependence between successive observations in a time series is not taken into account by the random forest model. However, multiple studies suggest that random forest models can be used to make accurate predictions based on time series. High accuracy predictions are made by a random forest model forecasting energy load in an electric circuit [5], and in another study a random forest model was able to forecast energy prices in New York [10].
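The usual trick in such studies is to turn the series into a supervised problem by using lagged values as features. A minimal sketch with Sklearn, on synthetic stand-in data, could look as follows.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for a saturation-like series.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)

# Turn the series into a supervised problem: predict the next value from lags.
lags = 5
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X[:400], y[:400])
# The forest's prediction is the mean of the 100 individual tree predictions.
print(forest.predict(X[400:405]))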

2.3 Recurrent neural network and long short-term memory

A study shows that recurrent neural networks perform better than feed-forward models for analyzing time series in repairable system failure analysis [6]. Additionally, another study has concluded that RNN models are very powerful tools for making predictions on chaotic time series [20]. Altogether, the literature suggests that recurrent neural network (RNN) models are very suitable for analyzing any sort of time series.

Forecasting of traffic has been done earlier by comparing different models such as k-nearest neighbour, lasso regression, support vector regression, ARIMA, VAR and TiGraMITe [14]. Neural networks and classic statistical methods are also used for short-term traffic forecasting; these methods generally outperform linear regression models [7]. However, studies that forecast traffic behaviour with neural nets often do not include extra features or combinations of models, which is a relevant research topic for the field [19]. Also, previous research does not focus on generalizing models to different situations [19].

An artificial intelligence master's thesis published in 2018 explains the design of a recurrent neural network model to analyze traffic behaviour on highways, using sensors with characteristics similar to the sensors used for this project [18]. In that thesis the data comes from sensors placed underneath the surface of the highway. From this data, features such as speed


and type of vehicle are extracted. The goal of that thesis is to provide traffic controllers with real-time information about the actual driving behavior on the highway. This is done by comparing different timeseries-based models, such as a long short-term memory neural network (RNN) and a gated recurrent units neural network (GRU), with a feed-forward model, a multilayer perceptron (MLP). The RNN and GRU outperformed the MLP significantly. The big difference between that thesis and this project is that in that thesis a classification model has been created, whereas this project uses regression models. Also, the model of that thesis uses traffic loop data as input, which is used to measure speed. For this thesis, the data comes from a different type of sensor and speed is not a relevant statistic.

3 METHODOLOGY

This section explains how the data is processed, what features are constructed and what models are used.

3.1 Data

In this section, an overview of the data is presented. It includes how the data is gathered and visualized, but also points out flaws in the data and how to handle them. Next, the algorithm that labels the data with saturation values is explained. Finally, the last section explains the steps needed to transform the output of the algorithm into something that is usable as input for machine learning models.

Figure 2: Traffic intersection 1231.

3.1.1 Data overview. The data, available via a private MongoDB, is binary sensor data from several traffic intersections in the North Holland province. Data from four crossroads is available; these crossroads will be referred to as V-log 1234, 1231, 1239 and 5195. Figure 2 represents one of the smart intersections. The Koplus, Langelus, 1e verweglus and 2e verweglus are sensors underneath the asphalt on different parts of the intersection. The koplus is the sensor in front of the traffic light, the langelus is positioned just behind the koplus, and the 1e and 2e verweglus are positioned far away from the traffic light. These sensors have binary outputs: an output of 1 is generated if a vehicle is on top of the sensor and an output of 0 if there is no vehicle on top of the sensor (assuming vehicles are the only entities crossing the intersection that trigger the sensor). The traffic lights have 3 possible outputs: 0 for red, 1 for green and 2 for orange. The output of the smart intersection updates every 0.1 seconds, for every sensor in the crossroad.

Some of the data is flawed; it shows behaviour that is abnormal for normally functioning traffic lights. The traffic light status changes from 1 to 2 and from 2 to 1 several times in a short period, indicating that the traffic light changes from green to orange and back from orange to green again. This behaviour is abnormal and indicates that either the traffic light itself was flawed, or something went wrong in translating the output from the sensor. Either way, the saturation algorithm detects and works around these flaws in the data.

3.1.2 Data labeling. To be able to train a supervised model, labeled data should be provided. An algorithm has been developed to label the data with saturation values. A traffic light cycle is the time in seconds from when a traffic light changes from red to green until it changes from orange to red. The algorithm calculates saturation at the end of the cycle, when the light changes from orange to red. There is a maximum number of cars that are able to pass a traffic light in a certain timeframe; this is based on the minimum headway of the specific lane, following Equation 2. Instead of copying that formula, the absolute minimum headway will be used, which is practically the same. In reality, the average headway of cars that pass the traffic light in a traffic light cycle almost never reaches the minimum headway. But whenever it does, the saturation value should be maximal (a value of 1). The average headway of cars that pass the traffic light in a traffic light cycle is called the current headway, H_cur. It is calculated by dividing the cycle time C_t (in seconds) by V_p (the number of vehicles that pass during the cycle), see Equation 3.

$$H_{cur} = \frac{C_t}{V_p} \qquad (3)$$

The saturation (S) can be calculated by dividing the minimum headway (H_min) by the current headway (H_cur), see Equation 4. See Figure 3 for a visual representation of the algorithm.

$$S = \frac{H_{min}}{H_{cur}} \qquad (4)$$

Figure 3: Flowchart of the saturation algorithm.

Some lanes are extremely busy (this often means a low minimum headway) and some lanes are extremely quiet (this often means a high minimum headway). This fact encourages distinguishing the


minimum headway per lane. The minimum headway has been calculated for every lane of V-log 1231 and is shown in Figure 4.

Figure 4: Minimum headway calculated per lane for one week of data on intersection 1231. Values for headway in seconds are on the y-axis; the lane numbers are on the x-axis.

3.1.3 Some considerations for labeling. Every time a traffic cycle ends, all the following timestamps are labeled with the saturation value until the next traffic cycle ends. Thus, the labeling of the saturation value always lags one cycle behind. As a result of this "lag", the algorithm output shows some specific behaviour during night time that does not correspond with the definition of saturation in general. During night time very few cars pass the crossroad, which may result in traffic light cycles that are a few hours long. The last saturation value calculated will be repeated as long as the light is still green. This results in strange behaviour: it is possible that the last saturation value was rather high, which results in the algorithm incorrectly judging that the crossroad is saturated, while in reality no cars have passed for hours. Therefore the algorithm checks whether a traffic cycle is taking longer than 15 minutes; when this is the case, the saturation value is set to 0. The 15 minute threshold is based on the fact that during daytime no traffic light cycle is this long, while during night time, when no cars pass, this often happens.
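A minimal sketch of the per-cycle labeling step described above; the function name, the cap at 1 and the handling of cycles without passing vehicles are assumptions on top of Equations 3 and 4.

def cycle_saturation(cycle_time_s, vehicles_passed, h_min, max_cycle_s=15 * 60):
    # Night-time rule: a cycle longer than 15 minutes (or without passing
    # vehicles, an assumption to avoid division by zero) is labeled 0.
    if cycle_time_s > max_cycle_s or vehicles_passed == 0:
        return 0.0
    h_cur = cycle_time_s / vehicles_passed  # Equation 3
    return min(h_min / h_cur, 1.0)          # Equation 4, capped at the maximum of 1

# One 60-second cycle in which 20 vehicles passed, with H_min = 2.0 s:
print(cycle_saturation(cycle_time_s=60, vehicles_passed=20, h_min=2.0))  # 0.666...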

Several approaches to calculating the minimum headway have been attempted, and a constant minimum headway gave the best results. The other approaches are discussed in the appendix.

3.2 Features

This section explains the implementation of two extra features, the fine-tuning of the saturation feature and the normalization of all features.

3.2.1 Saturation. The output of the algorithm is not uniform, in the sense that the time between timestamps differs due to differences in traffic light cycle lengths. The RNN model that is used to forecast saturation only performs well on data where the time between timestamps is uniform. In order to achieve this, forward filling is used: every timestamp is rounded down to the minute and duplicated for every following minute, until an already existing timestamp is encountered.
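In pandas, the forward filling (and the rolling-window smoothing discussed below Figure 5) could be sketched as follows; the timestamps and values are hypothetical, and win_type="blackmanharris" requires SciPy to be installed.

import pandas as pd

# Hypothetical, irregularly spaced algorithm output.
idx = pd.to_datetime(["2019-01-19 08:00:20",
                      "2019-01-19 08:01:45",
                      "2019-01-19 08:04:10"])
df = pd.DataFrame({"saturation": [0.4, 0.7, 0.5]}, index=idx)

# Round down to the minute and duplicate each value for every following
# minute until the next existing timestamp.
df.index = df.index.floor("min")
uniform = df.resample("1min").ffill()

# Rolling average over 30 one-minute samples, centred on the timestamp at
# issue; the Blackman-Harris window weights the samples (see Figure 5).
smooth = uniform["saturation"].rolling(30, center=True,
                                       win_type="blackmanharris").mean()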

Figure 5: Saturation value on the y-axis, time (hour) on the x-axis. The blue line is the raw data after forward filling. The green line is the rolling average, averaging 30 inputs. The orange line is the rolling average with the Blackman-Harris window function applied.

After forward filling, the plotted data follows the blue line in Figure 5. The blue line, generated by the raw data, fluctuates a lot, which is not predictable. Also, since the goal is to predict saturation 15 minutes into the future, the heavy fluctuation from minute to minute does not provide the right information for a solid prediction. The green line is the rolling average of 30 minutes of data: every saturation value is recalculated by averaging all the saturation values from 15 minutes before to 15 minutes after the timestamp at issue. The green line represents the general behaviour of the blue line without the per-minute fluctuations, but it does not follow all the peaks of the blue line. The Blackman-Harris window function, a mathematical function that applies a weighting to each discrete time series sample in a finite set [15], is therefore applied to the data; it is the orange line in the plot. This line is smooth, but still follows the peaks of the raw data, and looks predictable for a 15 minute forecast. It will be used as input for the prediction models. Several window functions have been tried, and the Blackman-Harris gave the best result in terms of minimizing the fluctuation without overgeneralizing the data.

3.2.2 Time of day and day of week. Time is a very important feature in forecasting saturation due to the periodicity of traffic. The minute of the day, a value between 0 and 1440, has been used as a feature. Day of the week is also a very important feature, for a logical reason: weekdays generally have rush hours and weekend days do not. By adding this feature, the model is able to distinguish weekend days from weekdays and likely make better predictions. Day of the week uses the numerical representations 0 to 6.

3.2.3 Normalization of features. Generally, normalizing features is good practice for machine learning problems. Specifically, normalizing features is very important for RNN models. The RNN has LSTM cells with the sigmoid as a built-in function. The intrinsic behaviour of this function makes the RNN perform better if input values are scaled between -1 and 1 and are symmetric around the value 0. All the features have been scaled to values between -1 and 1.

The most straightforward way of creating time features is by creating features for year, month, day of the week, hour of the day and minute of the hour. This representation works fine in general, but has its flaws. For example, date 1, Wednesday 8 August 2018 23:59, and date 2, Thursday 9 August 2018 00:00, should be very similar because the time difference is very small. However, these dates are considered very different under the feature extraction explained above. Day of the week would be represented by a categorical variable with value 2 for Wednesday and 3 for Thursday. Hour of the day would be represented by a categorical variable with value 23 for the first date and 0 for the second date. The difference between 23 and 0 is very big, despite the fact that the two dates are only one minute apart in reality. This approach results in a discontinuity of the timescale, while the timescale should be continuous. To tackle this problem, feature extraction is done on a continuous scale. Time of the day is projected on a hypothetical circle with values between 0 and 2π, see Equation 5. In this equation, h is the hour of the day, m is the minute of the hour and 1440 is the total number of minutes in a day (60 * 24). By defining a periodical function, it is ensured that minute 0 and minute 1440 have the same TimeOfDay.

$$TimeOfDay = \sin\left(2\pi \cdot \frac{60h + m}{1440}\right) \qquad (5)$$

For example, date 1 will have a value of sin(2π · (23 · 60 + 59)/1440) = -0.004 and date 2 will have a value of sin(2π · 0/1440) = 0. These values are much closer than under the first suggested approach to feature extraction, which solves the discontinuity problem.
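A small sketch of Equation 5, reproducing the two example dates:

import math

def time_of_day(h, m):
    # Equation 5: project the minute of the day onto a sine wave so that
    # 23:59 and 00:00 get nearly identical values.
    return math.sin(2 * math.pi * (60 * h + m) / 1440)

print(time_of_day(23, 59))  # date 1: -0.00436...
print(time_of_day(0, 0))    # date 2:  0.0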

Day of the week is used as a static feature. The values for this feature are 0 to 6. These values are not optimal for the RNN, so they are scaled by subtracting the mean and dividing by the standard deviation, see Equation 6. Here d is an integer value for the day of the week, the mean µ is 3 and the standard deviation σ is 2.

$$DayOfWeek = \frac{d - \mu}{\sigma} \qquad (6)$$

The same approach is applied to the saturation values. The saturation values vary from 0 to 1, but for better performance of the model they are scaled to the range -1 to 1 using Equation 7, in which S is the unscaled saturation.

$$Saturation = \frac{S - \mu}{\sigma} \qquad (7)$$

3.3 Models

In this section, two baseline models are discussed, the configuration of the recurrent neural networks is discussed and the model implementation is explained.

3.3.1 Baseline models. A baseline method is necessary to compare the results of the RNN model with. Instead of using a model designed for data with a temporal component, the baseline will be a standard feed-forward model. Forecasting with a feed-forward model as baseline method is done by creating features that are causally related to saturation. The features used are time of the day and day of the week. In addition to these features, some kind of memory behaviour is implemented by adding a third feature: the saturation value of the previous timestamp.

A random forest model will be used to compare the results of the RNN with; the random forest consists of 100 individual trees. The performance of the RNN model will also be compared with the persistence model, which simply copies the last value and uses it as the prediction [4].
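The persistence model is trivial to express in code; a minimal sketch for the 15-minute horizon, on a hypothetical saturation series:

import numpy as np

def persistence_forecast(series, horizon=15):
    # The forecast for time t + horizon is simply the value at time t.
    return series[:-horizon], series[horizon:]  # (predictions, targets)

saturation = np.random.default_rng(1).random(1000)  # stand-in for one lane
preds, targets = persistence_forecast(saturation)
print(1000 * np.mean((preds - targets) ** 2))       # MSE (x1000)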

3.3.2 Recurrent neural network: long short-term memory. Whereas conventional neural networks are not designed to learn from previous events, the RNN contains powerful LSTM cells that are able to process information from previous events. These self-connected cells store temporary information about previous states. In addition to this, the cells have different multiplication matrices built in to control the flow of information through the cells [13]. RNNs are difficult to train, because the weights are updated every cycle, which can lead to an unstable network of weights. LSTM cells counter this with several "gates" inside the memory cell, such as a tanh or sigmoid layer.

Imagine an RNN with only 1 layer: an LSTM cell. The input for the LSTM cell is an array of successive values. The first value of the array of saturation values goes into the LSTM cell, which outputs a vector representation of that value. This output is used as new input for the LSTM cell, together with the second value of the array, after which the LSTM cell outputs a representation of the first two values in the array. This is repeated for every value in the input array. After the complete array has been through the LSTM cell, the LSTM outputs a vector representation of the full array, from which the RNN outputs a prediction value. This prediction value is compared with the target value using a loss function. The RNN then back-propagates this information through the model, adjusting the weights of the multiplication matrices and thereby training the model.

The one-dimensional RNN only has sequential saturation values as input. Interestingly, a random forest is able to make predictions with a very different set of features. In addition to the one-dimensional RNN described above, a multidimensional RNN is designed that combines the features of the random forest and the one-dimensional RNN approach: an RNN that has sequential saturation data as input, but also sequential features such as time of the day, or static features such as day of the week. Instead of feeding the RNN a vector of arrays, the input is a tensor with several features. If an array of 7 previous timesteps is used in the one-dimensional RNN model, the input has the form (number of inputs, 7, 1): the first dimension is the number of inputs, the second dimension is the number of previous timeseries values used (7) and the last value is the number of features. If time of the day is added as a sequential feature, the input has the form (number of inputs, 7, 2). In addition to the extra sequential features, some static features are used in this model. Instead of only feeding a list of sequential data to the RNN, a single value is concatenated with the vector representation of the sequential features that is the output of the last LSTM cell. This single value represents, for example, a numerical representation of the day of the week.


3.3.3 Model implementation. Python has very powerful libraries that simplify implementing different machine learning models. The LSTM and random forest regression models are available in the Python packages Keras and Sklearn respectively, and both are used for this project. The model consists of 2 LSTM layers and 2 dense layers. The first layer, LSTM1, has multiple time series as input and outputs the same format. The LSTM1 output is the input for the second layer, LSTM2, which outputs a vector representation of its input. This vector contains all the information of the previous states of all the features that are used as input for LSTM1. The LSTM2 layer is connected to a dense layer. This dense layer is a multiplication with a matrix filled with weights, increasing the number of neurons in the network. Before the vector goes through the dense layer, the static features are concatenated to the vector. The output of this dense layer is a very large vector. This vector enters another dense layer that flattens it to a single value; this is the output of the entire model, i.e. the prediction value. This prediction value is compared to the target value. The model calculates the loss (for example mean squared error) and adjusts the weights in the model in order to make better predictions.
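In Keras, the described stack (two LSTM layers, with the static features concatenated before the dense layers) could be sketched as follows. The layer sizes are taken from the second RNN in section 4.1; the dropout rate and the dense activation are assumptions, as the text does not specify them.

from tensorflow.keras.layers import LSTM, Dense, Dropout, Input, concatenate
from tensorflow.keras.models import Model

n_steps, n_seq_features = 121, 2       # m = 120, ms = 1 gives 121 timesteps
seq_in = Input(shape=(n_steps, n_seq_features))  # saturation + time of day
static_in = Input(shape=(1,))                    # day of week (static feature)

x = LSTM(5, return_sequences=True)(seq_in)  # LSTM1: sequences in, sequences out
x = Dropout(0.2)(x)                         # dropout rate is an assumption
x = LSTM(20)(x)                             # LSTM2: outputs one vector per input
x = Dropout(0.2)(x)
x = concatenate([x, static_in])             # append the static feature(s)
x = Dense(50, activation="relu")(x)         # activation is an assumption
out = Dense(1)(x)                           # single predicted saturation value

model = Model(inputs=[seq_in, static_in], outputs=out)
model.compile(optimizer="adam", loss="mse")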

4 EXPERIMENTS

The final RNN model is designed by testing the performance of different RNN models with different parameters for the architecture, different input for the saturation values and different combinations of features. In the RNN, the dropout function is used for the LSTM1 and LSTM2 layers. The dropout function randomly selects neurons in the network and disables them. This is a regularization technique for reducing the complexity of the model and preventing overfitting. Mean squared error (MSE) is used as the loss function; this is a standard loss function in Keras that is often used for regression problems.

An 80/20 train-test split is used and the optimizer is Adam. For section 4.1 the models are trained for 5 epochs; for section 4.2, for 15 epochs.

New crossroads do not have historic data available on which models could be trained, and creating a unique model for every lane of every crossroad in the Netherlands is simply too complex. Instead, the goal is to create a model that generalizes to many different lanes. That is why the focus in this section is on models that are trained on different lanes than the validation lane.

4.1 Input architecture

Saturation values are presented to the RNN as a list of n values covering a total memory of m minutes, with ms minutes between consecutive saturation values. The memory (m) is the number of recent minutes that are relevant for the RNN to use as input. The memory step (ms) is the timestep between the values in the memory. The total number of values in the list is n = (m/ms) + 1. For example, with m = 60, ms = 15 and n = 5, and a current time of 15:00, the saturation values used as input for the RNN are those at the timestamps 15:00, 14:45, 14:30, 14:15 and 14:00. It is not clear, however, what configuration of m and ms is optimal: are only the last 15 saturation values relevant, with 1 minute in between, or do 8 saturation values with 15 minutes in between have more predictive power? This is evaluated in this section.
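The lag structure is easy to make concrete; a small sketch mirroring the split_sequence helper in the appendix notebook (the function name here is hypothetical):

def window_minutes(current_minute, m, ms):
    # The n = (m/ms) + 1 input timestamps, expressed in minutes of the day.
    return [current_minute - k for k in range(0, m + 1, ms)]

# m = 60, ms = 15 at 15:00 (minute 900) gives 15:00, 14:45, ..., 14:00.
print(window_minutes(900, m=60, ms=15))  # [900, 885, 870, 855, 840]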

The first model consists of 2 layers: an LSTM layer with 111 neurons and a dense layer with 1 neuron. The only input for this model are saturation values. The configuration of this input has been varied and tested, see Table 1. The result with the highest memory has the lowest MSE, so that model performs best. From this table it is concluded that a higher memory probably leads to better performance. When increasing the memory beyond 120, however, the gain in performance does not outweigh the increase in running time.

Table 1: Optimizing the input configuration of the first RNN

m     ms    MSE x 1000
30    5     5.955
60    5     5.786
120   5     5.691

The second RNN model consists of 4 layers: an LSTM layer with 5 neurons and a dropout function, another LSTM layer with 20 neurons and a dropout function, a dense layer with 50 neurons and a dense layer with 1 neuron. From Table 2 it can be concluded that decreasing the memory step probably leads to higher performance. Also, comparing the configuration m = 120 and ms = 5 in Tables 1 and 2, it can be concluded that the second RNN has a better overall performance.

Table 2: Optimizing the input configuration of the second RNN

m     ms    MSE x 1000
120   1     3.29
120   3     4.99
120   5     5.23

Besides this, different optimizers for the loss function and different numbers of neurons per layer have been tested. The optimal configuration is the second RNN, see Figure 6.

Note: the features used for the results in Table 1 and Table 2 are mathematically different from those in the next section, which explains the relatively big difference in MSE values.

4.2 Influence of different lanes and features

Different lanes on the same crossroad can have very different behaviour. Some lanes are very busy during the morning rush hour and some lanes may be very busy during the evening rush hour. Also, some lanes may not have any peaks at rush hour at all; these lanes are probably not used for commuting. Due to these differences in behaviour, the machine learning models may not be able to generalize behaviour from one lane to another. To tackle this problem, models are trained on multiple different lanes instead of only one lane. The RNN model has been tested with different sets of features; 1 feature: only saturation, 2 features: a combination of saturation and time of day, and 3 features: a combination of saturation, time of day and day of the week.

(8)

Figure 6: Architecture of the best performing RNN, which has been used for experimenting and generating the final results.

Table 3 shows the results of training the RNN model on 1 lane with different feature combinations. The model is trained on 80 percent of lane 0 of V-log 1234 and validated on the other 20 percent of the lane. It appears that 3 features perform worse than 1 or 2 features when training and validating on the same lane.

Table 3: Optimizing feature input, training and validating on lane 0 from V-log 1234

Features    MSE x 1000
1           1.29
2           1.27
3           1.44

Table 4 shows the results of training the RNN model on 3 lanes with different feature combinations. The model is trained on 33 percent of lanes 0, 1 and 2 of V-log 1231 (in order to keep the training size the same over experiments, only a part of the data is used) and validated on lane 0 of V-log 1234. It appears that 3 features perform significantly better than 1 or 2 features. In combination with the results of Table 3, it can be concluded that using 1 feature may give better results than using several features when training is done on only 1 lane, while several features give better results when training is done on several lanes. When training is done on only one lane, the model is probably overfitting the extra features on the training set. When more lanes are added to the training set, the model does not overfit; the variation in the values of the additional features between lanes is probably the cause of this.

Table 4: Optimizing feature input, training on lanes 0, 1 and 2 from V-log 1231 and validating on lane 0 from V-log 1234

Features    MSE x 1000
1           1.74
2           1.47
3           1.18

Table 5 shows the results of training the RNN model on a varying number of lanes with 3 features. Every RNN model is trained on an equal data size: for one lane, 2 years of data (the maximum available for 1 lane) is used; for two lanes, 1 year of data from each lane, etc. Training has been done on the lanes of V-log 1231 and validation on lane 0 of V-log 1234.

Table 5: Optimizing lane input, training with 3 features

Lanes    MSE x 1000
1        1.44
2        2.67
3        1.18
4        1.38
5        1.35

The lowest MSE is achieved when training on 3 lanes, see Table 5. Increasing the number of lanes does not always improve the result: when increasing the number of lanes from 1 (MSE = 1.44) to 2 (MSE = 2.67), there is a very big rise in MSE. The reason is probably that lane 2 has very different behaviour from the validation lane; by adding lane 2 to the training set, the RNN learns very specific behaviour that does not apply well to the validation lane.

Table 6 shows the results of training the RNN model on a varying number of lanes with 3 features, without restricting the data size when adding lanes: all the data available for every lane is used (2 years per lane), so training on 1 lane is done with a total of 2 years of data, training on 2 lanes with a total of 4 years, etc. The lowest MSE is again found when training on 3 lanes. It appears that increasing the amount of data by adding different lanes can lower the MSE significantly: the MSEs in Table 6 are lower than in Table 5 when training on 2 or 3 lanes, but higher when training on 1, 4 or 5 lanes. This means that using more data does not decrease the MSE per se, but it may help for certain configurations.

Table 6: Increasing data input for training with 3 features

Lanes    MSE x 1000
1        2.20
2        1.29
3        1.16
4        1.92
5        1.67

4.3 Predicting different lanes

Based on the previous experiments, a few models are tested to see if they are able to make predictions for all the lanes (11 lanes in total) of V-log 1234. The models that have been compared are: an RNN trained on 3 lanes (V-log 1231: lanes 0, 1 and 2; 6 years of data in total) with 3 features, an RNN trained on 1 lane (V-log 1231: lane 0) with 1 feature, an RNN trained on 1 lane (V-log 1231: lane 0) with 3 features, and the persistence model. The MSE has been calculated for the predictions on every lane, as well as the mean MSE over all lanes. The RNN with 3 features trained on 3 lanes has the lowest mean MSE of 3.04. The second lowest mean MSE, 3.09, is found for the RNN trained on 1 lane with 1 feature. The persistence model has a mean MSE of 3.50, and the highest mean MSE, 5.34, is found for the RNN trained on 1 lane with 3 features. The model with a mean MSE of 3.04, the RNN trained on 3 lanes with 3 features, thus has the best average performance on new lanes. See Table 7 for an overview of all the MSE values on every lane.

5 RESULTS

The RNN model that performed best in the experiments is compared with two models: a random forest regression and a persistence model. The MSE is used as the statistic to compare the models. Also, the Pearson coefficient [2] of the best baseline model, the persistence model, is compared with the Pearson coefficient of the best RNN model.
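Both statistics are straightforward to compute; a small sketch of the evaluation used below, with SciPy's pearsonr and toy values:

import numpy as np
from scipy.stats import pearsonr

def evaluate(y_true, y_pred):
    # MSE (x1000) and Pearson coefficient of real versus predicted values.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse_x1000 = 1000 * np.mean((y_true - y_pred) ** 2)
    r, _ = pearsonr(y_true, y_pred)
    return mse_x1000, r

print(evaluate([0.1, 0.4, 0.8], [0.12, 0.38, 0.75]))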

5.1 Persistence

Applying the persistence baseline model to the data of V-log 1234 lane 0 resulted in an MSE (x1000) of 2.10. See Figure 7 for the predicted saturation values over a time span of a few hours. Furthermore, the Pearson coefficient has been calculated for the real and predicted values, see Figure 8: the Pearson coefficient is 0.94. The figure shows that the model has some outliers, for example a point where a saturation of 0.6 is predicted while the actual value is approximately 0.

5.2 Random Forest Regression

The random forest regression model is trained on parts of the data of the first three lanes of V-log 1231 and validated on lane 0 of V-log 1234. This resulted in an MSE (x1000) of 4.44. See Figure 9 for the predicted saturation values over a time span of a few hours.

5.3 RNN

The RNN model that performed best is the model that is trained on all the data of the first three lanes of V-log 1231 and validated on lane 0 of V-log 1234. This resulted in an MSE (x1000) of 1.16. See Figure 10 and Figure 11 for the predicted saturation values over a time span of a day and of a few hours. Furthermore, the Pearson coefficient has been calculated for the real and predicted values, see Figure 12: the Pearson coefficient is 0.97. This is 0.03 higher than the Pearson coefficient of the persistence model and 0.03 lower than 1 (a perfect score). The RNN model thus scores better on both the MSE and the Pearson statistic than the persistence model.

Figure 7: Predictions and real values for V-log 1234 lane 0, by applying the persistence model over a time span of a few hours on the 19th of January.

Figure 8: Scatterplot with the Pearson correlation of the persistence model, with the real values on the y-axis and the predicted values on the x-axis. Boxplots along the two axes visualize the distribution of the saturation values.

In addition to this, the mean squared error has been calculated for different groups of saturation values, see Figure 13. The figure shows that the performance of the RNN model is similar for every value of saturation. The highest MSE (1.23) is found for saturation values between 0.2 and 0.3. The lowest MSE (0.91) is found for saturation values between 0.6 and 0.7. This means that saturation


Figure 9: Predictions and real values for V-log 1234 lane 0, by applying the random forest model over a time span of a few hours on the 19th of January.

values between 0.6 and 0.7 are predicted more accurately than values between 0.2 and 0.3.

Figure 10: Predictions and real values for V-log 1234 lane 0, by applying the RNN model (trained on 3 lanes with 3 features) over a time span of 24 hours on the 19th of January.

6 DISCUSSION AND FUTURE WORK

The RNN model can still be improved. For future work, new features should be added to the model. For instance, a feature for national holidays will probably slightly increase the accuracy of the RNN that is trained on multiple lanes, because traffic behaves differently on national holidays (no commuting, for example). For future research it would be interesting to see how accurate predictions can get with highly complex models specially designed for saturation forecasting. The RNN used in this project is relatively simple and straightforward; the priority has been to create a simple model that is able to make solid predictions. A very complex model may increase the prediction accuracy.

Figure 11: Predictions and real values for V-log 1234 lane 0, by applying the RNN model (trained on 3 lanes with 3 features) over a time span of a few hours on the 19th of January.

Figure 12: Scatterplot with the Pearson correlation of the RNN, with the real values on the y-axis and the predicted values on the x-axis. Boxplots along the two axes visualize the distribution of the saturation values.

The literature suggested that an RNN is a very solid choice for predicting any type of timeseries. However, it is unclear whether the RNN performs best. Some other models that focus on predicting time series are ARIMA, SARIMA, SARIMAX, GARCH, etc. [8]. These models may perform just as well as the RNN or perhaps even better. However, these models are probably not scalable to new lanes


Figure 13: Bar chart with the MSE for the prediction of lane 0 from V-log 1234 with the RNN model. MSE (x1000) is on the y-axis and saturation values are on the x-axis. For every "bin" of saturation, the MSE has been calculated.

and not usable for analyzing live streams of data, whereas an RNN is capable of both.

The headway in Equation 2, proposed in the related work section, is based on some assumptions. A few of these assumptions may not be in line with the real situations that occur on the crossroads: the lanes do not have a width of exactly 3.6 meters, heavy vehicles may enter the crossroad, lanes may have a steep gradient, bus stops may be near the intersection, and cyclists and pedestrians may cross the intersection. Using this equation without proper adjustments would be hard to justify. For this project the equation has therefore been used as a guideline in approaching the saturation and headway calculation; the formulas actually used are based on the proposed headway equation but are not affected by these assumptions.

7 CONCLUSION

The RNN that is trained on all the data of the first three lanes of V-log 1231 has the lowest mean MSE, a mean MSE (x1000) of 3.04. The RNN with 1 feature, trained on only lane 0 of V-log 1231, has a similar mean MSE (x1000) of 3.09. The persistence model has a mean MSE (x1000) of 3.50, and the RNN with 3 features, trained on only lane 0 of V-log 1231, has a mean MSE (x1000) of 5.34. The RNN with 3 features trained on 3 lanes and the RNN with 1 feature trained on 1 lane scale significantly better to new lanes than the other models. The model trained on 1 lane with 3 features performed worst. This indicates that extra features do not improve the scalability of the model when training on only 1 lane; extra features do improve scalability when more lanes are used for training. A Pearson coefficient of 0.97 for the predictions of the best performing RNN indicates that the predicted and real values are very similar. This is also 0.03 higher than the Pearson coefficient of the persistence model.

The RNN proved able to outperform the random forest model and the persistence model; the RNN is an adequate forecasting model for traffic data. The modelling approach that scales best to new lanes for 15 minute predictions uses 3 lanes as input data (6 years of data), has 3 features (time of day, day of week and previous saturation values), has a memory of 120 minutes and has a memory step of 1 minute.

REFERENCES

[1] Lakshmi Dhevi Baskar, Bart De Schutter, J. Hellendoorn, and Zoltán Papp. Traffic control and intelligent vehicle highway systems: A survey. 2012.
[2] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. Pearson Correlation Coefficient, pages 1–4. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.
[3] CJ Bester and WL Meyers. Saturation flow rates. In Proceedings of the 26th South African Transport Conference (SATC), pages 560–568. Citeseer, 2007.
[4] Carlos F.M. Coimbra and Hugo T.C. Pedro. Chapter 15 - Stochastic-learning methods. In Jan Kleissl, editor, Solar Energy Forecasting and Resource Assessment, pages 383–406. Academic Press, Boston, 2013.
[5] Grzegorz Dudek. Short-term load forecasting using random forests. In D. Filev, J. Jabłkowski, J. Kacprzyk, M. Krawczak, I. Popchev, L. Rutkowski, V. Sgurev, E. Sotirova, P. Szynkarczyk, and S. Zadrozny, editors, Intelligent Systems'2014, pages 821–828, Cham, 2015. Springer International Publishing.
[6] S.L. Ho, M. Xie, and T.N. Goh. A comparative study of neural network and Box-Jenkins ARIMA modeling in time series prediction. Computers and Industrial Engineering, 42(2):371–375, 2002.
[7] M.G. Karlaftis and E.I. Vlahogianni. Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transportation Research Part C: Emerging Technologies, 19(3):387–399, 2011.
[8] Bohdan M. Pavlyshenko. Machine learning models for sales time series forecasting. November 2018.
[9] Wenji Mao and Fei-Yue Wang. Chapter 8 - Cultural modeling for behavior analysis and prediction. In Wenji Mao and Fei-Yue Wang, editors, New Advances in Intelligence and Security Informatics, pages 91–102. Academic Press, Boston, 2012.
[10] J. Mei, D. He, R. Harley, T. Habetler, and G. Qu. A random forest method for real-time price forecasting in New York electricity market. In 2014 IEEE PES General Meeting | Conference Exposition, pages 1–5, July 2014.
[11] Harm Jan Mostert (Senior Policy Advisor Smart Mobility at Provincie Noord-Holland). Personal communication.
[12] Ranadip Pal. Chapter 6 - Overview of predictive modeling based on genomic characterizations. In Ranadip Pal, editor, Predictive Modeling of Drug Sensitivity, pages 121–148. Academic Press, 2017.
[13] Haşim Sak, Andrew Senior, and Françoise Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Fifteenth Annual Conference of the International Speech Communication Association, 2014.
[14] Julien Salotti, Serge Fenet, Romain Billot, Nour-Eddin El Faouzi, and Christine Solnon. Comparison of traffic forecasting methods in urban and suburban context. In International Conference on Tools with Artificial Intelligence (ICTAI), pages 846–853, Volos, Greece, November 2018. IEEE.
[15] John Semmlow. Chapter 8 - Linear system analysis: Applications. In John Semmlow, editor, Signals and Systems for Bioengineers (Second Edition), Biomedical Engineering, pages 317–374. Academic Press, Boston, 2012.
[16] Werner Siegloch. Die Leistungsermittlung an Knotenpunkten ohne Lichtsignalsteuerung. Straßenbau und Straßenverkehrstechnik, (154), 1973.
[17] Lin Tan. Chapter 17 - Code comment analysis for improving software quality. In Christian Bird, Tim Menzies, and Thomas Zimmermann, editors, The Art and Science of Analyzing Software Data, pages 493–517. Morgan Kaufmann, Boston, 2015.
[18] Thomas van der Ham. Real-time detection of traffic behavior using traffic loops. 2018.
[19] Eleni Vlahogianni, Matthew Karlaftis, and John Golias. Short-term traffic forecasting: Where we are and where we're going. Transportation Research Part C: Emerging Technologies, 43, June 2014.
[20] Jia-Shu Zhang and Xian-Ci Xiao. Predicting chaotic time series using recurrent neural network. Chinese Physics Letters, 17(2):88–90, February 2000.


8 SOME ELABORATION

In this section I would like to discuss the reasoning behind choosing between different ways of formulating headway for calculating the saturation. The crux of the matter is that if the headway depends on the number of cars passed, the saturation curve does not show the behaviour that we would expect.

For example, for this project, every lane has its own minimum headway. This minimum headway is constant for all the calculations and does not differ per traffic light cycle. This results in the normal, expected behaviour for the traffic flow on the crossroad, see Figure 14. The graph shows the saturation curve on a Monday; we expect a rush hour somewhere between six and nine in the morning and between four and eight in the afternoon/evening. The graph shows this expected behaviour: a peak around eight in the morning, a peak around five in the afternoon and the lowest saturation values during night time.

Figure 14: Saturation values for Monday 10 August with constant headway.

Instead of using a constant value for headway, several headway functions have been tried that depend on the number of cars that have passed in a traffic cycle, to have the headway follow behaviour that resembles the curve in Figure 1. I will discuss one of them, see Figure 15. In every traffic light cycle, the number of cars determines the headway value. The headways in Figure 15 are the average headway values measured over a day, per number of cars passed. This resulted in unexpected behaviour of the saturation curve, see Figure 16: there are no peaks anymore, which means it is not possible to detect rush hours. This saturation curve is almost flat for the biggest part of the day and does not reflect the behaviour of a traffic intersection. That is why the headway formula that depends on the cars that have passed has not been used to calculate saturation. Note that the saturation values in Figure 16 differ from those in Figure 14; the exact saturation values are not so important, the focus is on the curve.

Additionally, the minimum headway dependent on the number of cars has also been used for calculating saturation. It resulted in a plot similar to Figure 15, and thus the saturation plot was also similar to Figure 16. This headway function also did not reflect the behaviour of traffic on an intersection.

Figure 15: Average headway values dependent on cars passed.

Figure 16: Saturation values for Monday 10 August with average headway dependent on cars passed.

The next few pages consist of different notebooks that I have used during this project. The first notebook, RNN3Features5lanes, is used to train an RNN on 5 different lanes. The second notebook, Headway, is used to calculate all the values for headway and outputs 104 JSON files with saturation values and timestamps for every lane. The third notebook, CombineJSONS, is used to combine all the JSON files with saturation values into a pandas dataframe that is ready to be analysed with machine learning models; CombineJSONS also adds two features. I will not include every notebook that I have used for this project, since the content either overlaps too much with the three notebooks discussed above or is not interesting for the public.


Table 7 shows the MSE values for the validation of the different models on the lanes of V-log 1234. Model1 is an RNN with 1 feature trained on 1 lane. Model2 is an RNN with 3 features trained on 1 lane. Model3 is an RNN with 3 features trained on 3 lanes, and Model4 is the persistence model.

Table 7: MSE values for validation on the lanes of V-log 1234

Lane number    Model1    Model2    Model3    Model4
0              1.30      2.20      1.16      2.10
1              1.20      1.54      1.01      1.90
2              5.50      9.82      5.22      6.17
3              5.44      7.77      6.96      6.48
4              0.58      0.87      0.60      1.09
5              14.05     27.18     12.19     10.85
6              2.14      3.30      2.33      3.70
7              1.51      2.75      1.55      1.98
8              1.22      1.56      1.29      2.49
9              0.47      0.99      0.50      0.78
10             0.59      1.18      0.61      0.90
Mean MSE       3.09      5.34      3.04      3.50

9 PYTHON SCRIPTS


RNN3Features5data

June 27, 2019

In [ ]:

import

pandas

as

pd

import

math

# In this notebook a LSTM will be trained, and several graphs will be plotted.

# Load in different data sets: 5 lanes to train on and 1 lane to validate on.

data1231lane0

=

pd

.

read_csv(

"vlog201231FinalLane0.csv"

)

data1231lane1

=

pd

.

read_csv(

"vlog201231FinalLane1.csv"

)

data1231lane2

=

pd

.

read_csv(

"vlog201231FinalLane2.csv"

)

data1231lane3

=

pd

.

read_csv(

"vlog201231FinalLane3.csv"

)

data1231lane4

=

pd

.

read_csv(

"vlog201231FinalLane4.csv"

)

data2

=

pd

.

read_csv(

"vlog201234FinalLane0.csv"

)

In [ ]:

# Calculate the mean and standard deviation, but only do it for 80 percent of the data. We do not want to use the mean of the

# test set. 80 percent of the data is train set.

meanlane0

=

data1231lane0[

"saturation0"

]

.

iloc[:

int

(

0.80*

len

(data1231lane0))]

.

mean()

stdlane0

=

data1231lane0[

"saturation0"

]

.

iloc[:

int

(

0.80*

len

(data1231lane0))]

.

std()

meanlane1

=

data1231lane1[

"saturation1"

]

.

iloc[:

int

(

0.80*

len

(data1231lane1))]

.

mean()

stdlane1

=

data1231lane1[

"saturation1"

]

.

iloc[:

int

(

0.80*

len

(data1231lane1))]

.

std()

meanlane2

=

data1231lane2[

"saturation2"

]

.

iloc[:

int

(

0.80*

len

(data1231lane2))]

.

mean()

stdlane2

=

data1231lane2[

"saturation2"

]

.

iloc[:

int

(

0.80*

len

(data1231lane2))]

.

std()

meanlane3

=

data1231lane3[

"saturation3"

]

.

iloc[:

int

(

0.80*

len

(data1231lane3))]

.

mean()

stdlane3

=

data1231lane3[

"saturation3"

]

.

iloc[:

int

(

0.80*

len

(data1231lane3))]

.

std()

meanlane4

=

data1231lane4[

"saturation4"

]

.

iloc[:

int

(

0.80*

len

(data1231lane4))]

.

mean()

stdlane4

=

data1231lane4[

"saturation4"

]

.

iloc[:

int

(

0.80*

len

(data1231lane4))]

.

std()

meanlaneDIF

=

data2[

"saturation0"

]

.

iloc[:

int

(

0.80*

len

(data2))]

.

mean()

stdlaneDIF

=

data2[

"saturation0"

]

.

iloc[:

int

(

0.80*

len

(data2))]

.

std()

In [ ]:

# Scale the saturation values to a value between -1 and 1.

data1231lane0[

"saturation0"

]

=

(data1231lane0[

"saturation0"

]

-

meanlane0)

/

stdlane0

1

(15)

data1231lane1[

"saturation1"

]

=

(data1231lane1[

"saturation1"

]

-

meanlane1)

/

stdlane1

data1231lane2[

"saturation2"

]

=

(data1231lane2[

"saturation2"

]

-

meanlane2)

/

stdlane2

data1231lane3[

"saturation3"

]

=

(data1231lane3[

"saturation3"

]

-

meanlane3)

/

stdlane3

data1231lane4[

"saturation4"

]

=

(data1231lane4[

"saturation4"

]

-

meanlane4)

/

stdlane4

data2[

"saturation0"

]

=

(data2[

"saturation0"

]

-

meanlaneDIF)

/

stdlaneDIF

In [ ]:

import numpy as np
from numpy import array
from sklearn.model_selection import train_test_split

# Put the features of the pandas dataframes into lists.
sequencelane0 = list(data1231lane0["saturation0"])
sequenceHourlane0 = list(data1231lane0["TimeOfDay"])
sequencelane1 = list(data1231lane1["saturation1"])
sequenceHourlane1 = list(data1231lane1["TimeOfDay"])
sequencelane2 = list(data1231lane2["saturation2"])
sequenceHourlane2 = list(data1231lane2["TimeOfDay"])
sequencelane3 = list(data1231lane3["saturation3"])
sequenceHourlane3 = list(data1231lane3["TimeOfDay"])
sequencelane4 = list(data1231lane4["saturation4"])
sequenceHourlane4 = list(data1231lane4["TimeOfDay"])
dayFeaturelane0 = list(data1231lane0["dayScaled"])
dayFeaturelane1 = list(data1231lane1["dayScaled"])
dayFeaturelane2 = list(data1231lane2["dayScaled"])
dayFeaturelane3 = list(data1231lane3["dayScaled"])
dayFeaturelane4 = list(data1231lane4["dayScaled"])
sequenceDifferentCR = list(data2["saturation0"])
sequenceHourDifferentCR = list(data2["TimeOfDay"])
dayFeatureDifferentCR = list(data2["dayScaled"])

# Delete the pandas dataframes; we do not want to overload the memory.
del [data1231lane0, data1231lane1, data1231lane2, data2, data1231lane3, data1231lane4]

# This function prepares the data into arrays of timesteps for the LSTM:
# m is the memory (window length), p the prediction horizon and ms the memory step.
def split_sequence(sequence, m, p, ms):
    X, y = list(), list()
    for i in range(len(sequence)):
        # find the end of this pattern
        end_ix = i + m
        # check if we are beyond the sequence
        if end_ix + p > len(sequence) - 1:
            break
        # gather input and output parts of the pattern
        seq_y = sequence[end_ix + p]
        seq_x = sequence[i:end_ix + 1][0::ms]
        X.append(seq_x)
        y.append(seq_y)
    return array(X), array(y)
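
# A toy illustration of split_sequence (not from the original notebook): with
# m=3, p=1 and ms=1, the sequence [0, 1, 2, 3, 4, 5, 6] is cut into windows of
# four consecutive samples, each paired with the value one step past the window:
#   X = [[0,1,2,3], [1,2,3,4], [2,3,4,5]],  y = [4, 5, 6]
toyX, toyy = split_sequence([0, 1, 2, 3, 4, 5, 6], 3, 1, 1)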

m = 120          # memory: 120 minutes of history per input sequence
p = 15           # prediction horizon: forecast 15 minutes ahead
ms = 1           # memory step: use every minute within the window
epochnumber = 15 # number of training epochs
lstmn = 5        # number of units in the LSTM layer

# Create the train values and their targets, and reshape them into the
# (samples, timesteps, features) format that the LSTM expects.
xlane0, ylane0 = split_sequence(sequencelane0, m, p, ms)
xHourlane0, yHourlane0 = split_sequence(sequenceHourlane0, m, p, ms)
xlane0 = xlane0.reshape((xlane0.shape[0], xlane0.shape[1], 1))
xHourlane0 = xHourlane0.reshape((xHourlane0.shape[0], xHourlane0.shape[1], 1))
xlane1, ylane1 = split_sequence(sequencelane1, m, p, ms)
xHourlane1, yHourlane1 = split_sequence(sequenceHourlane1, m, p, ms)
xlane1 = xlane1.reshape((xlane1.shape[0], xlane1.shape[1], 1))
xHourlane1 = xHourlane1.reshape((xHourlane1.shape[0], xHourlane1.shape[1], 1))
xlane2, ylane2 = split_sequence(sequencelane2, m, p, ms)
xHourlane2, yHourlane2 = split_sequence(sequenceHourlane2, m, p, ms)
xlane2 = xlane2.reshape((xlane2.shape[0], xlane2.shape[1], 1))
xHourlane2 = xHourlane2.reshape((xHourlane2.shape[0], xHourlane2.shape[1], 1))
xlane3, ylane3 = split_sequence(sequencelane3, m, p, ms)
xHourlane3, yHourlane3 = split_sequence(sequenceHourlane3, m, p, ms)
xlane3 = xlane3.reshape((xlane3.shape[0], xlane3.shape[1], 1))
xHourlane3 = xHourlane3.reshape((xHourlane3.shape[0], xHourlane3.shape[1], 1))
xlane4, ylane4 = split_sequence(sequencelane4, m, p, ms)
xHourlane4, yHourlane4 = split_sequence(sequenceHourlane4, m, p, ms)
xlane4 = xlane4.reshape((xlane4.shape[0], xlane4.shape[1], 1))
xHourlane4 = xHourlane4.reshape((xHourlane4.shape[0], xHourlane4.shape[1], 1))
xHourDifferentCR, yHourDifferentCR = split_sequence(sequenceHourDifferentCR, m, p, ms)
x2, y2 = split_sequence(sequenceDifferentCR, m, p, ms)
x2 = x2.reshape((x2.shape[0], x2.shape[1], 1))
x_trainDifferentCR, x_testDifferentCR, y_trainDifferentCR, y_testDifferentCR = train_test_split(
    x2, y2, test_size=0.2, random_state=4, shuffle=False)

# Split everything into train and test sets; shuffle=False keeps the data in
# chronological order, so the test set is the most recent 20 percent.
x_trainlane0, x_testlane0, y_trainlane0, y_testlane0 = train_test_split(
    xlane0, ylane0, test_size=0.2, random_state=4, shuffle=False)
xHour_trainlane0, xHour_testlane0, yHour_trainlane0, yHour_testlane0 = train_test_split(
    xHourlane0, yHourlane0, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainlane0, xFeatureDay_testlane0, yFeatureDay_trainlane0, yFeatureDay_testlane0 = train_test_split(
    np.array(dayFeaturelane0), np.array(dayFeaturelane0), test_size=0.2, random_state=4, shuffle=False)
x_trainlane1, x_testlane1, y_trainlane1, y_testlane1 = train_test_split(
    xlane1, ylane1, test_size=0.2, random_state=4, shuffle=False)
xHour_trainlane1, xHour_testlane1, yHour_trainlane1, yHour_testlane1 = train_test_split(
    xHourlane1, yHourlane1, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainlane1, xFeatureDay_testlane1, yFeatureDay_trainlane1, yFeatureDay_testlane1 = train_test_split(
    np.array(dayFeaturelane1), np.array(dayFeaturelane1), test_size=0.2, random_state=4, shuffle=False)
x_trainlane2, x_testlane2, y_trainlane2, y_testlane2 = train_test_split(
    xlane2, ylane2, test_size=0.2, random_state=4, shuffle=False)
xHour_trainlane2, xHour_testlane2, yHour_trainlane2, yHour_testlane2 = train_test_split(
    xHourlane2, yHourlane2, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainlane2, xFeatureDay_testlane2, yFeatureDay_trainlane2, yFeatureDay_testlane2 = train_test_split(
    np.array(dayFeaturelane2), np.array(dayFeaturelane2), test_size=0.2, random_state=4, shuffle=False)
x_trainlane3, x_testlane3, y_trainlane3, y_testlane3 = train_test_split(
    xlane3, ylane3, test_size=0.2, random_state=4, shuffle=False)
xHour_trainlane3, xHour_testlane3, yHour_trainlane3, yHour_testlane3 = train_test_split(
    xHourlane3, yHourlane3, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainlane3, xFeatureDay_testlane3, yFeatureDay_trainlane3, yFeatureDay_testlane3 = train_test_split(
    np.array(dayFeaturelane3), np.array(dayFeaturelane3), test_size=0.2, random_state=4, shuffle=False)
x_trainlane4, x_testlane4, y_trainlane4, y_testlane4 = train_test_split(
    xlane4, ylane4, test_size=0.2, random_state=4, shuffle=False)
xHour_trainlane4, xHour_testlane4, yHour_trainlane4, yHour_testlane4 = train_test_split(
    xHourlane4, yHourlane4, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainlane4, xFeatureDay_testlane4, yFeatureDay_trainlane4, yFeatureDay_testlane4 = train_test_split(
    np.array(dayFeaturelane4), np.array(dayFeaturelane4), test_size=0.2, random_state=4, shuffle=False)
xHour_trainDifferentCR, xHour_testDifferentCR, yHour_trainDifferentCR, yHour_testDifferentCR = train_test_split(
    xHourDifferentCR, yHourDifferentCR, test_size=0.2, random_state=4, shuffle=False)
xFeatureDay_trainDifferentCR, xFeatureDay_testDifferentCR, yFeatureDay_trainDifferentCR, yFeatureDay_testDifferentCR = train_test_split(
    np.array(dayFeatureDifferentCR), np.array(dayFeatureDifferentCR), test_size=0.2, random_state=4, shuffle=False)

# Memory deletion.
del [xlane0, ylane0, xlane1, ylane1, xlane2, ylane2, x2, y2, xlane3, ylane3, xlane4, ylane4]
del [xHourlane0, yHourlane0, xHourlane1, yHourlane1, xHourlane2, yHourlane2, xHourDifferentCR, yHourDifferentCR, xHourlane3, yHourlane3, xHourlane4, yHourlane4]
del [sequencelane0, sequencelane1, sequencelane2, sequencelane3, sequencelane4, sequenceHourlane4, sequenceDifferentCR, sequenceHourlane1, sequenceHourlane0, sequenceHourlane2, sequenceHourlane3, sequenceHourDifferentCR, dayFeaturelane0, dayFeaturelane1, dayFeaturelane2, dayFeaturelane3, dayFeaturelane4, dayFeatureDifferentCR]

In [ ]:

# Reshape the hour features of the validation lane and trim the day features so
# that they line up with the train and test targets, before the lanes are merged.
xHour_trainDifferentCR = xHour_trainDifferentCR.reshape((xHour_trainDifferentCR.shape[0], xHour_trainDifferentCR.shape[1], 1))
xHour_testDifferentCR = xHour_testDifferentCR.reshape((xHour_testDifferentCR.shape[0], xHour_testDifferentCR.shape[1], 1))
xFeatureDay_trainlane0 = np.array(xFeatureDay_trainlane0)[0:(y_trainlane0.shape[0])]
xFeatureDay_testlane0 = np.array(xFeatureDay_testlane0)[0:(y_testlane0.shape[0])]
xFeatureDay_trainlane1 = np.array(xFeatureDay_trainlane1)[0:(y_trainlane1.shape[0])]
xFeatureDay_testlane1 = np.array(xFeatureDay_testlane1)[0:(y_testlane1.shape[0])]
xFeatureDay_trainlane2 = np.array(xFeatureDay_trainlane2)[0:(y_trainlane2.shape[0])]
xFeatureDay_testlane2 = np.array(xFeatureDay_testlane2)[0:(y_testlane2.shape[0])]
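
# The model-definition and training cells fall outside this excerpt. Purely as an
# illustration of how the pieces above fit together, a minimal single-feature LSTM
# consistent with the hyperparameters m, p, ms, lstmn and epochnumber could look
# like this (assumes Keras is installed; the models in the thesis also feed in the
# hour and day features, which are omitted here):
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(lstmn, input_shape=(x_trainlane0.shape[1], 1)))
model.add(Dense(1))
model.compile(optimizer="adam", loss="mse")
model.fit(x_trainlane0, y_trainlane0, epochs=epochnumber,
          validation_data=(x_testlane0, y_testlane0))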
