
Behavioural Responses to Time-Based Dynamic Pricing: Evidence from Online Fuel Diaries

Master Thesis

Tonny Romensen

s2246694

January 13, 2020

Abstract

Dynamic pricing has received much recent attention, yet little is known about how consumers respond to and deal with price volatility. By focusing on fuel, a market with frequent price changes, this thesis examines the effect of price volatility on driving behaviour (fuel efficiency). Using self-obtained online fuel diaries from more than 5,000 drivers, I find that price volatility increases drivers’ fuel efficiency. Further analyses reveal that this effect is larger for low-income drivers and when prices have, on average, increased during a period. The thesis concludes with policy recommendations and suggestions for future research.

Keywords: dynamic pricing, time-based pricing, fuel consumption, fuel efficiency, fairness

Program: MSc Marketing (Specialization: Marketing Intelligence)

Supervisors: Dr. A.E. Vomberg


Contents

1 Introduction
2 Related literature
  2.1 Literature review
3 Hypotheses development
4 Discussion of the dataset
  4.1 Data sources
    4.1.1 Dataset 1 - Fuel diaries
    4.1.2 Dataset 2 - User information
    4.1.3 Dataset 3 - Fuel prices
    4.1.4 Dataset 4 - Weather
    4.1.5 Dataset 5 - Car specifications
  4.2 Data collection
    4.2.1 Merging the datasets
    4.2.2 Scraping logs and errors
  4.3 Cleaning steps for the main dataset
    4.3.1 Dataset: Fuel Diaries
    4.3.2 Panel data
  4.4 Summary of resulting datasets
    4.4.1 Comparison to the ideal dataset
5 Measurement and analytical strategy
  5.1 Measurement of main variables
    5.1.1 Fuel efficiency
    5.1.2 Refilling ratio
    5.1.3 Price volatility
    5.1.4 Income levels
    5.1.5 Price movements
  5.2 Control variables
    5.2.1 Car related variables
    5.2.2 Fuel tank capacity
    5.2.3 Fuel type
    5.3.1 Price per litre
    5.3.2 Total kilometre
    5.3.3 Energy labels
    5.3.4 Lagged variables
  5.4 Driver related variables
  5.5 Station type
    5.5.1 Rain quantity and temperature
    5.5.2 Drive style
  5.6 Variable definitions
  5.7 Overview of variables, measurement, and descriptive statistics
    5.7.1 Descriptive statistics
6 Model development
  6.1 Control variables
  6.2 Hypothesis 1 - Price volatility negatively influences the fuel efficiency
  6.3 Hypothesis 2 - Income moderates the effects of price volatility on fuel efficiency
  6.4 Hypothesis 3 - Price volatility and price increases enlarge the effect of price volatility on fuel efficiency
  6.5 Hypothesis 4 - Price volatility negatively influences the refilling share
  6.6 Summary of parameter abbreviations
7 Results
  7.1 The impact of price volatility on the fuel efficiency
    7.1.1 Model free evidence
    7.1.2 Regression results
    7.1.3 Results fuel efficiency
  7.2 The effects of price volatility on the fuel tank refilling share
    7.2.1 Model free evidence
    7.2.2 Regression results
    7.2.3 Regression results of the refilling share
  7.3 Post-hoc tests
8 Discussion
9 Conclusion


1 Introduction

Recent advances in technology have made it possible for businesses to dynamically vary the price along multiple dimensions, such as consumers, time periods, and locations (Haws and Bearden, 2006). This process is known as dynamic pricing and it is applied by businesses on an increasingly large scale (Jayaraman and Baker, 2003; Kannan and Kopalle, 2001). A benefit of dynamic pricing for firms is that profits can be increased by setting prices in such a way that they approximate consumers’ willingness-to-pay for a good or service. While this practice may decrease consumer surplus, it does enable a wider range of consumers to benefit from the product or service. Dynamic pricing thus has benefits as well as disadvantages, and for this reason it has received much recent attention from researchers and policy makers.

How do consumers respond to the practice of dynamic pricing? How do the responses manifest themselves in consumer behaviour? Which factors influence the type of response by consumers? These are important questions given the increasingly important role of dynamic pricing in society. However, surprisingly little is known in the literature about these questions, mainly because of data limitations: it is difficult to find settings in which consumer behaviour and responses can be directly linked to price fluctuations. This thesis aims to answer these questions by focusing on a consumption category with frequent price changes: fuel. Using a large self-obtained dataset of online fuel diaries, I am able to investigate the direct impact of price volatility on short-term and long-term consumer behaviour, differentiate between consumer income levels, and control in part for external influences. Specifically, this thesis studies the effect of price volatility on both the fuel efficiency of the user and changes in the fuel quantity obtained at the gasoline station. In addition, in a sub-analysis I further explore the effects in relation to income levels and differentiate between price shifts.


the time of acceleration or similar methods that reduce the overall fuel consumption. These methods can offset the negative influences of fuel price volatility. Hence, this study observes whether consumers apply such methods and adjust their gasoline consumption when fuel price volatility occurs.

This thesis finds that consumers are susceptible to price volatility and adjust their behaviour accordingly. High price volatility during an observational period reduces the litres drivers consume per 100 km. Hence, drivers improve their fuel efficiency (p < 0.01)1. A large part of this fuel efficiency improvement is related to the income level. Low-income drivers are more susceptible to price volatility and improve their fuel efficiency to a larger extent than average-income drivers (p < 0.01). High-income drivers respond differently: they reduce their fuel efficiency when price volatility is high (p < 0.01). This finding signals that drivers from different income levels interpret the effects of price volatility differently. The results are in line with differences related to cost interpretations, budgets and mental accounts (Thaler, 1985). Compared to the income differences, a larger effect is observed when the driver observes not only high price volatility but also increasing fuel prices during the observational period. When fuel price volatility is high and the price per litre increases, fuel efficiency improves (p < 0.05). Furthermore, the effect of price volatility on consumers is not limited to fuel efficiency. Consumers also adjust their gasoline refilling behaviour after a period of high price volatility. The quantity obtained in relation to the size of the gasoline tank falls when price volatility increases (p < 0.01). The difference between high- and low-income users that is observed for fuel efficiency cannot be found in the refilling share.

The findings of this thesis are in line with previous studies. Price volatility has direct implications for the interaction with the product. More price adjustments in the short run reduce the fairness perception of a good or service (Haws and Bearden, 2006). Consumers and producers are entitled to a fair price in the dual entitlement setting (Vaidyanathan and Aggarwal, 2003). For the gasoline market, price elasticity is rather low (Knittel and Tanaka, 2019). Additionally, the abundance of consumer information and insights allows firms to apply first- or second-degree price discrimination. These methods require frequent price adjustments to capture a larger share of the consumer surplus from different consumer groups, thus increasing fuel price volatility. Increased price volatility over short time periods is known to reduce consumers’ fairness perception (Haws and Bearden, 2006). In addition, offering different prices to consumers lowers the intrinsic value of the product (Gelbrich, 2011). This increases the spread between consumers’ willingness-to-pay and market prices. On the one hand, standard economic theory would suggest that in such an event consumers would substitute the product (Hastings and Shapiro, 2013). On the other hand, the price volatility is observed when the product (fuel) has already been obtained. Therefore, consumers adjust their driving behaviour instead.

This thesis proceeds as follows. Section two discusses related literature. The third section outlines the hypotheses. The fourth section introduces the data and elaborates on the preparation of the data for the analyses in this thesis. The fifth section presents measurements and the analytical strategy. The sixth section discusses the model development, followed by the results in the seventh section. A discussion of the results is provided in the eighth section. The ninth section concludes the thesis.

2 Related literature

This section reviews the essential publications and concepts on dynamic pricing in relation to price volatility and consumer interactions. It covers, in turn: first, the practice of dynamic pricing; second, where and how dynamic pricing is being applied; third, consumers’ negative responses to price volatility; fourth, the consumer benefits related to dynamic pricing; fifth, how consumers perceive price differences between individuals or groups; sixth, the short- and long-term effects of dynamic pricing; seventh, the influence of dynamic pricing on consumer behaviour; and finally, why consumers would adjust their driving behaviour in case of price volatility.

2.1 Literature review


firms to optimize profits and should therefore be embraced. On the other hand, consumers have frequently objected to the implementation of dynamic pricing strategies. The objections commonly stem from consumers being unfamiliar with this form of pricing and from questions about its fairness (Haws and Bearden, 2006). Nevertheless, dynamic pricing practices are being developed and implemented for different markets and sectors. Most commonly, firms are able to differentiate between consumers and generally do so through points of reference. Haws and Bearden (2006) discuss four reference points in dynamic pricing. First, firms are able to adjust prices according to time. Second, it is common to differentiate in pricing depending on the geographic location. Third, producers often increase or decrease prices based on the cost price. Lastly, firms could design personalized pricing practices related to customer characteristics.

The different reference points allow firms to segment or individually target consumers. To do so, information about the consumers is required. Hence, before the internet and computers came into being, dynamic pricing had been discussed theoretically, but few practical implementations had been achieved (Rao, 1984). Nowadays, with the abundance of behavioural consumer data, future product demand can be approximated (Grewal, 2010). In addition, the digital transformation allows menu costs to be reduced by digitizing price tags (Kannan, 2001)2. Consequently, prices can be adapted more frequently based on the prevailing model. However, the increased number of price changes and personalized pricing strategies are not without consequences. Price adjustments yield price volatility that influences consumers’ fairness perceptions (Haws and Bearden, 2006).

Evidence from online retail organizations suggests that consumers firmly oppose dynamic pricing practices (Garbarino and Lee, 2003). The reasons for this opposition are manifold, but three factors prevail. First, consumers are unfamiliar with the practice of dynamic pricing and therefore prefer the proven method of fixed prices (Courty and Pagliero, 2008). Second, depending on their usage, consumers expect to pay, on average, a higher price (Joskow and Wolfram, 2012). Third, consumers are able to memorize and compare prices (Ariely et al., 2019). The memorization is consequential because consumers have attitudinal responses to price fluctuations. Price fluctuations over short periods of time increase price volatility and are known to influence consumers’ fairness perception (Haws and Bearden, 2006). In addition, price fluctuations that yield higher prices further influence the negative perception, which is triggered by loss aversion: consumers perceive price increases as more negative than price decreases (Kahneman and Tversky, 1979).

Dynamic pricing also yields considerable benefits to the consumer. In basic economic theory, markets are expected to offer the most efficient outcome. Through dynamic adjustments of the price, consumers are lured into or diverted from certain offerings to ensure consistent quality. Lower prices attract more customers but reduce the interest of producers. Higher prices lower the demand from customers but increase producers’ interest. Specifically in sectors with high demand fluctuations, such as ride hailing or apartment sharing services, both consumers and producers are able to obtain a better outcome (Cramer and Krueger, 2016). Consumers are ensured that property remains available, while suppliers obtain the highest offer. In addition, consumers are also able to gain financially from dynamic pricing by adjusting their behaviour according to the price setting (Dupont et al., 2011). Therefore, the implementation of dynamic pricing strategies can be beneficial to both the consumer and supplier.

On average, consumer responses to dynamic pricing are rather negative (Gelbrich, 2011), even though both firms and consumers are able to benefit from the practice. Part of this discrepancy is due to the dual entitlement principle, which states that both the consumer and the producer are entitled to a fair price (Vaidyanathan and Aggarwal, 2003). Profit maximization is the main aim of any business; meanwhile, firms also need to take the demands of consumers into account. These demands are widespread and related to social and practical influences. One major contributor is the exposure to other consumers. Gelbrich (2011) finds that consumers include the prices offered to other users in their decision process. The interpersonal relationship functions as an additional moderator in defining the fairness perception. Consumers are commonly captivated by superior offers made to others and perceive the difference as a loss. The extent of the loss is related to the status of the interpersonal relationship. A distinction is made between friends and acquaintances and how they interact. Close friends are more impactful than unrelated individuals. This finding is important in an online setting where consumers showcase and comment on each other’s behaviour and prices paid.


on the decision-making process of a consumer. Consumers purchase lower quality products when higher prices are displayed, or higher quality products in case of a price decrease. Moreover, Lin and Prince (2013) confirm the relationship between volatility and demand: higher price volatility reduces the demand for the observed products. Both studies assess the short-term effects of price volatility: high volatility in the present yields changes in the present. In addition, price volatility also has long-term effects on demand. Specifically, price volatility in the formative years sets a standard for future behaviour3. High volatility during the formative years reduces the demand for goods later in life. The effect has been measured over a period of up to 40 years (Severen and Van Benthem, 2019). Consequently, price volatility is influential and able to steer behaviour in both the short run and the long run.

Interestingly, little is known in the literature about the usage of consumable goods when price volatility increases during consumption. Lin and Prince (2013) look into the gasoline market and find that price volatility affects the aggregate demand for gasoline. However, their study does not examine how drivers adjust their driving behaviour when they observe price fluctuations. To my knowledge, a recent study by Knittel and Tanaka (2019) is the first that uses micro-level data to identify a relationship between fuel prices and driving behaviour. This study not only confirms that the price elasticity for gasoline is low, but also that consumers adapt their driving behaviour according to the prevailing prices.

Lin and Prince (2013), Knittel and Tanaka (2019) and Severen and Van Benthem (2019) use the gasoline market to identify consumer responses to price movements. This is understandable from the perspective of data availability for both price data and consumer behaviour. In addition, the gasoline market is known to have dynamic pricing strategies in place (Yuan and Han, 2011). These strategies also influence the price that is constantly being shown to the consumer on price displays along the road. Consumers are known to unconsciously observe information that influences their behaviour (Dijksterhuis et al., 2005). Hence, adjustments in their purchasing behaviour are expected. On the other hand, consumers have a low price elasticity for gasoline (Knittel and Tanaka, 2019). In addition, the price volatility is observed when the fuel has already been obtained, and regular commutes might be hard to substitute in the short run. Therefore, behavioural adjustments that lower the usage of gasoline in times of high price volatility are expected.


3 Hypotheses development

In the dual entitlement setting, both the consumer and producer expect to be entitled to a fair price (Vaidyanathan and Aggarwal, 2003). The price elasticity for gasoline is low, allowing suppliers to partly adjust prices in favour of the firm’s goals. In standard economic theory, profit is seen as the main goal of the firm. Dynamic pricing is deemed to be a profitable strategy because it sets prices equal to consumers’ willingness to pay and consequently captures a larger share of the consumer surplus (Garbarino, 2003). To observe the effects of dynamic pricing on behaviour, a rich dataset is required that covers both the behaviour of the driver and the prevailing prices. This study obtains unique driver-specific fueling behaviour data for an extended period of time in the gasoline market. Fuel stations in general apply (time-based) dynamic pricing (Schechner, 2017). To capture a larger share of the consumer surplus, prices need to be adjusted based on consumer characteristics. The consumers who pass by a certain gasoline station differ on an hourly, daily and weekly basis. Hence, price adjustments are required to align the price with the targeted segment and its consumer surplus, and fuel price volatility increases as a result. The adjustments are communicated online and offline to the consumers. Price volatility negatively influences consumers’ fairness perception and the intrinsic value of the product (Haws and Bearden, 2006). Consequently, the spread between market value and intrinsic value widens.

On average, the price elasticity for gasoline is low (Knittel and Tanaka, 2019). Therefore, consumers are unlikely to adjust the quantity they demand enough to compensate for a price increase. In addition, the price volatility is also observed when gasoline has already been obtained. The depletion of the gasoline tank on average requires two weeks4. During these two weeks, the observed price volatility has no financial impact on the driver, but there might be mental effects. Consumers unconsciously keep track of a mental budget. Thaler (1985) introduced the theory of mental accounting: consumers have preset budgets in their mind for specific product categories that they recurrently relate to. Depleting such an account yields changes in behaviour. Higher price volatility increases price uncertainty for the consumer. To minimize the impact of future gasoline purchases on their mental account, consumers are expected to improve their driving efficiency when increasing price volatility is observed. Therefore, the first hypothesis evaluates the impact of price volatility on fuel efficiency.

4The two-week average has been calculated based on the fuel diary dataset discussed in the next


H1: The current price volatility has a positive effect on the fuel efficiency.

Moreover, consumers’ responses to dynamic pricing are expected to be heterogeneous. The demographic background of a consumer, specifically income, is expected to influence how the consumer interprets price volatility. Thaler (1985) outlined the presence of mental accounts and their behavioural influences. In addition, the study showed that low-income consumers consult and recreate mental accounts based on their spending habits more frequently than high-income consumers. Also, the budgets of low-income consumers are smaller. Larger price volatility creates uncertainty about future prices and their effects on the mental account. Therefore, the increased frequency of consulting the mental account, the smaller mental account and the price uncertainty together enhance the expectation of depletion. Hence, stronger improvements in fuel efficiency are required for low-income users. High-income drivers, on the other hand, have a larger budget and mental account. The impact of price volatility on their fuel efficiency is thus expected to be smaller. The second hypothesis compares the impact of price volatility on fuel efficiency between different income levels.

H2: Compared to high-income drivers, low-income drivers improve their fuel efficiency more when price volatility is high.


H3: When the price increases during a high-volatility period, the positive impact of price volatility on fuel efficiency is expected to be larger.

Besides the adjustments in fuel efficiency, an alternative behaviour to compensate for the mental loss is adjusting the refilling quantity. Krishna (1994) contends that consumer certainty and learning regarding future price volatility result in consumers who wait for promotions to buy and then buy all the quantity necessary on those purchase occasions. Likewise, Mela et al. (1998) confirm that consumers learn to wait for especially good deals and then stockpile when those deals occur. Through the learning effects during the observational period, consumers understand that high levels of price volatility are temporary. Given a low price elasticity in the gasoline market, consumers are unlikely to decrease their overall fuel consumption. Instead, consumers may adjust their refilling ratio over time to compensate for the discrepancy. Hence, when price volatility increases, consumers would reduce the refilling quantity and postpone additional refill quantities to the future. As soon as the price volatility decreases again, the quantity obtained in relation to the fuel tank will increase. Consequently, I hypothesize the following:

H4: Higher price volatility reduces the quantity obtained in relation to the size of the fuel tank.
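
To make the quantity in H4 concrete, the following is a minimal sketch of how a refilling share could be computed, assuming it is simply the litres bought divided by the tank capacity. The function name, the 50-litre tank and the refill quantities are hypothetical and serve only as an illustration; the measurement actually used in this thesis is the one defined in the measurement section.

```python
def refill_share(litres_bought: float, tank_capacity_litres: float) -> float:
    """Share of the tank filled in one refill (quantity / tank capacity)."""
    if tank_capacity_litres <= 0:
        raise ValueError("tank capacity must be positive")
    return litres_bought / tank_capacity_litres


# Hypothetical refills for one driver with a 50-litre tank (illustrative values).
calm_period = [refill_share(q, 50.0) for q in (45.0, 40.0)]
volatile_period = [refill_share(q, 50.0) for q in (30.0, 25.0)]

mean_calm = sum(calm_period) / len(calm_period)
mean_volatile = sum(volatile_period) / len(volatile_period)

# Under H4 the mean share in the volatile period would be the lower of the two.
print(f"calm: {mean_calm:.2f}, volatile: {mean_volatile:.2f}")
```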

4 Discussion of the dataset

4.1 Data sources

Figure 1: The front page of Autoweek.nl

The dataset that contains information about the drivers’ fuel efficiency is obtained from the website autoweek.nl. Autoweek.nl is a Dutch car website and magazine that specializes in news related to the automotive industry. It was founded in 1990 as a magazine-only publisher and later introduced a website. Autoweek.nl ranks 178th in terms of popularity on the comparison website Alexa5. The publisher targets car enthusiasts and individuals who are interested in buying a new or second-hand vehicle. New articles, reviews and tutorials are posted at a high frequency to keep visitors informed about day-to-day activities in the automotive industry. The front page of the website serves the most recently published information. In addition, it shows comparison and specification tools that users can operate to search for more detailed and specialised information. The tools on the website offer ample opportunity to record, identify and compare vehicles.

5https://www.alexa.com/siteinfo/autoweek.nl – Alexa tracks websites around the internet for their

Three well-known tools are Autoweek’s Verbruiksmonitor (hereafter referred to as the fuel diary), the forum and the car specifications. First, the Autoweek fuel diary is a tool that allows registered visitors to monitor their fuel consumption over time in combination with related information such as distance, fuel and road type. Second, the forum allows registered users to discuss and interact with other users about problem statements raised by themselves or by others. The forum also offers ample opportunity to display the user’s interests and add extra public information. Third, the car specification environment provides detailed car-specific information. The information displayed on the website is highly specific to a given type of car and allows visitors to be informed about the capabilities and attributes of vehicles.

Updating the website is a collaboration between the editors of the magazine and the visitors. The information on the front page is organised by employees of the magazine. The information on the forum and the fuel diary is supplied by the users. User contributions and interaction enable the scalability of Autoweek. New updates that outline usage patterns related to fuel consumption and driving behaviour are frequently added to the fuel diaries. The forum is very active, with 19 main topics that range from car usage and car repair to other hobbies. New threads, messages and discussions are posted daily, which triggers visitors to return. The user profiles on the forum are structured so that visitors can display personal information in addition to their contributions to the forum.

4.1.1 Dataset 1 - Fuel diaries


Table 1: Dataset Fuel Diaries

Number of observations: 303,996
Number of variables: 20
Time frame: 2010-2019
Unique drivers: 5,302

Variables:
- Difference between observed and manufacturer specified
- Number of observed refills
- Total observed driving distance
- Manufacturer average kilometre per litre
- Average kilometre per litre
- Total car kilometres
- Brand
- Year of built
- Driven road types
- Litre per 100km
- Car type
- Remarks
- Tire type
- Identifier
- Date
- Quantity
- Total amount
- Fuel type

The fuel efficiency dataset is based on the fuel diary tool of Autoweek.nl. With this tool, consumers are able to create, update and compare fuel diaries. New observations are added manually by the driver, who can register up to 20 pre-specified variables as well as free-text remarks. After submission, the values are added to the user’s personal fuel diary profile and aggregated into summary statistics displayed above the fuel diary. The remarks added by the user commonly indicate the fuel station or other miscellaneous information.
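
The aggregation of individual refills into diary-level summary statistics can be sketched as follows. This is a minimal illustration and not the computation used by Autoweek.nl: it assumes every refill fills the tank completely, so that the litres bought at a refill cover the distance driven since the previous one. The class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Refill:
    odometer_km: float  # total car kilometres at the moment of the refill
    litres: float       # quantity bought at this refill


def summarise(refills: List[Refill]) -> Dict[str, float]:
    """Aggregate refills into summary statistics (assumes full-tank refills)."""
    if len(refills) < 2:
        raise ValueError("need at least two refills to observe a distance")
    ordered = sorted(refills, key=lambda r: r.odometer_km)
    distance_km = ordered[-1].odometer_km - ordered[0].odometer_km
    # The first refill precedes the observed distance, so its litres are skipped.
    litres_used = sum(r.litres for r in ordered[1:])
    if distance_km <= 0 or litres_used <= 0:
        raise ValueError("distance and fuel used must be positive")
    return {
        "refills": len(ordered),
        "distance_km": distance_km,
        "litres_per_100km": 100.0 * litres_used / distance_km,
        "km_per_litre": distance_km / litres_used,
    }


# Hypothetical diary with three refills.
diary = [Refill(10_000.0, 40.0), Refill(10_600.0, 36.0), Refill(11_200.0, 36.0)]
stats = summarise(diary)
print(stats)
```

Lower values of litres per 100 km correspond to a higher fuel efficiency, which is the direction in which the hypotheses are phrased.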

The dataset consists of fuel diaries from drivers in the Netherlands. The diaries are updated manually by the driver after a refill. The dataset is based on 22 thousand drivers, for whom 330 thousand observations are collected in total6. The first observation in the dataset is registered in the year 2002 and the latest in 20197. When adding their fuel consumption, drivers also note the price, the type of roads and a self-assigned drive style. Other variables include the type of fuel and other descriptive elements related to the purchased quantity.

6The scripts related to data collection are made by the author of this thesis and available on request.

7In the final dataset the observations between the year 2017 and 2019 are removed due to changes on


4.1.2 Dataset 2 - User information

Figure 3: The user profile page of Autoweek.nl

The user information dataset is based on the user’s profile page on Autoweek.nl. The profiles are directly linked to the forum and forum posts. The user information outlines the interests of the user, age and profession, owned vehicles, fuel diaries and vehicles that the user is interested in. In addition, the profile page shows all the forum posts that the user has written on a subtopic. The user profile is manually updated by the user through specified input fields. The user accounts do not show personally identifiable information unless the user chose to share it.


Table 2: User profiles

Number of observations: 2,890
Number of variables: 9

Variables:
- Member registration date
- Fuel diary identifier
- Name
- Interests
- Age
- Gender
- Interests
- Occupation

4.1.3 Dataset 3 - Fuel prices

Figure 4: The location-based fuel prices from Athlon.nl


lease vehicles on the road and therefore the prices are regularly updated throughout the day8. Another benefit of this dataset is the availability of different fuel types. In addition to the gasoline price per litre, the location and type of fuel are registered.

The information from this website has been obtained on a daily basis for a period of over two years, via an automated Python script that queried the Athlon.nl database. In this process all the available data related to the fuel prices are obtained. The dataset consists of 2.2 million observations and allows for the creation of price- and location-based subgroups.
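
As an illustration of how price- and location-based subgroups could be formed from such observations, the sketch below groups price records by city and fuel type and summarises each group. The dictionary keys, cities and prices are hypothetical. The population standard deviation is used here only as one simple measure of price dispersion and is an assumption of this example, not necessarily the volatility measure applied in the analysis.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical price observations (city, fuel type, price per litre in euro).
observations = [
    {"city": "Groningen", "fuel_type": "Euro95", "price": 1.50},
    {"city": "Groningen", "fuel_type": "Euro95", "price": 1.60},
    {"city": "Groningen", "fuel_type": "Diesel", "price": 1.20},
    {"city": "Groningen", "fuel_type": "Diesel", "price": 1.20},
    {"city": "Utrecht", "fuel_type": "Euro95", "price": 1.58},
]


def summarise_prices(rows: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Group prices by (city, fuel type) and summarise each subgroup."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for row in rows:
        groups[(row["city"], row["fuel_type"])].append(row["price"])
    return {
        key: {"n": len(prices), "mean": mean(prices), "spread": pstdev(prices)}
        for key, prices in groups.items()
    }


subgroups = summarise_prices(observations)
for key, summary in sorted(subgroups.items()):
    print(key, summary)
```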

Table 3: Athlon

Number of observations: 2,116,724
Number of variables: 11
Time frame: 2015-2016

Variables:
- Name
- Price
- Distance
- Last update
- City
- Time
- Zip code
- Link
- Address
- Fuel type

4.1.4 Dataset 4 - Weather

The weather dataset has been obtained from the Dutch governmental weather organization KNMI. This organization is responsible for all nationwide weather-related measurements. The KNMI applies an open data policy under which all available weather data is published in an easy-to-use format ready for further analysis. The provided information allows the observer to analyse all weather-related aspects in detail. For example, the variables describe the hours and amount of sun, rain and fog. The presence of particles and air quality indications (AQI) have also been included.

The weather dataset is extensive and contains information for over 100 years. During the first years not all of the variables were recorded. Therefore, in the early years the dataset contains a large number of values that show “not available”. However, information is only required for the period in which fuel diaries were collected. The fuel diary dataset contains up to 14 years of data. During this time frame the weather dataset contains all the observations and no “not available” values are observed.

Figure 5: The open data environment from the Dutch weather agency KNMI

4.1.5 Dataset 5 - Car specifications


Figure 6: The substitution of gasoline stations when prices change

4.2 Data collection

For this thesis five different datasets were collected. The fuel diaries, the user information, the Athlon fuel prices and the car specifications were obtained with custom-made scrapers written by the author of this thesis9. In this section, I outline the process that was required to obtain the fuel diary data10.

The database was constructed with self-written automated code that continuously updated it. The data collection was separated into two parts. The first part consisted of indexing the diaries available on the Verbruiksmonitor (fuel diaries). This was done with a self-written Python script that collected every diary that was available. The index pages of the diaries are sorted by moment of creation, so the most recently created diary is listed on the first page. This allowed a consistent display of accounts while scrolling through the available pages. Every page displayed a maximum of 15 different diaries, each with a link to the fuel diary environment. The links to the diaries were stored in a format that is readable for the collection scraper. The collection scraper visited the individual diary pages after new links had been obtained. The size of a diary could range from a single page to more than 50 pages. Therefore, the collection scraper was instructed to continue collecting data as long as more pages were available. In the collection process, both the diaries and the available metadata were obtained. The metadata in this case relate to the car specifications. The regular data are all values described in the fuel diary.

9The scripts that have been created for the data collection are available on request.
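As an illustration only, the paging logic described above can be sketched as follows. The original scripts are not reproduced here, so the function name `collect_diary`, the injected `fetch_page` callable and the page shape are all hypothetical; the in-memory `FAKE_SITE` stands in for the real website.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (rows on this page, link to the next page or None).
Page = Tuple[List[dict], Optional[str]]


def collect_diary(first_url: str,
                  fetch_page: Callable[[str], Page],
                  max_pages: int = 1000) -> List[dict]:
    """Follow 'next page' links until none is left and return all rows."""
    rows: List[dict] = []
    seen = set()
    url: Optional[str] = first_url
    # Stop on the last page, on a link that was already visited, or at the cap.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page_rows, url = fetch_page(url)
        rows.extend(page_rows)
    return rows


# Stand-in for the real site: one diary spread over three pages (made-up values).
FAKE_SITE = {
    "diary/1?page=1": ([{"litres": 40.0}], "diary/1?page=2"),
    "diary/1?page=2": ([{"litres": 38.5}], "diary/1?page=3"),
    "diary/1?page=3": ([{"litres": 42.1}], None),
}

rows = collect_diary("diary/1?page=1", FAKE_SITE.__getitem__)
```

Passing the fetch function as an argument keeps the loop independent of the scraping library, so the same logic could wrap a Selenium- or Scrapy-based fetcher.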

The Autoweek website is a regular website that contains both HTML and JavaScript elements. Therefore, scraping libraries such as Selenium and Scrapy were used. Selenium is required to obtain elements that are displayed in a JavaScript-driven environment11. Subsequently, the raw information was passed to the Scrapy environment, which converted the information and stored the variables and data in SQL databases12. Storage in this environment was also important for the next step, in which the fuel diaries had to be linked to the user profiles.

The user profiles are not shown on the fuel diary page, but only on the user profile page. Therefore, another Python script collected all the user profiles that are registered on the Autoweek.nl website. A first scraper indexed all the users that are registered on the website. In the second step, another scraper visited each user profile and stored the user-related information in the SQL database. The connection between a fuel diary and a user profile was made automatically by comparing the fuel diary links mentioned on the user profile page with the actual link of the fuel diary. Fuel diaries are automatically displayed on the user's profile. The obtained profiles were public and open to non-registered users on the days of data collection.

After the dataset was obtained and stored in the SQL database, more information was added based on the time and location of the observations. This is discussed in the next section. Once the dataset contained all the information that was required for this study, I exported it as a data file that is readable by statistical software.

11Selenium is an automated web-browsing environment that can be controlled with scripts, such as Python

12Scrapy is a tool that can be programmed to send requests to a website and transform the information


4.2.1 Merging the datasets

To make the data usable for analysis, all variables are merged into a single dataset. The common identifier depends on the type of dataset. The fuel diaries and the profile information are merged through the related page links. Every profile page mentions a link to the specific fuel diary of the user, and the records are merged when the unique fuel diary link and the link mentioned on the profile are the same. For a few observations the profile page of the user contained multiple vehicles. In that case the profile page is linked to each unique vehicle together with the related variables. No variables are excluded.

The weather and fuel price datasets are merged on the recorded time. The weather data are available at a higher frequency, so the mean for every day is calculated and used. These datasets are not related to one specific user. Hence, only the time of the observation is taken into account for merging. Time-related variables are available for all observations in the main dataset (fuel diaries).

A large number of the user profiles that belong to the fuel diaries were collected and linked. In total it was possible to collect 5.222 unique user profiles. The accounts that were not obtained are marked as private. Private accounts do not allow non-registered users to view the profile page. For privacy reasons, no effort was made to collect the information behind this privacy wall. The number of obtained user profiles is substantial and the privacy restrictions are not expected to affect the analysis. In addition, because this study uses panel data, the number of observations per user is considerable.
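The two merge rules described in this section (profiles matched on the diary link, weather matched on the date after averaging per day) can be sketched in a few lines. This is a minimal illustration and not the original merge code; the field names `diary_link`, `date`, `age` and `temp_c` are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean


def daily_means(readings):
    """Average several (date, value) readings into one value per date."""
    by_day = defaultdict(list)
    for day, value in readings:
        by_day[day].append(value)
    return {day: mean(values) for day, values in by_day.items()}


def merge_records(diary_rows, profiles, weather_readings):
    """Attach profile fields (matched on diary link) and weather (matched on date)."""
    profile_by_link = {p["diary_link"]: p for p in profiles}
    temp_by_day = daily_means(weather_readings)
    merged = []
    for row in diary_rows:
        profile = profile_by_link.get(row["diary_link"], {})
        merged.append({
            **row,
            "age": profile.get("age"),                # None if no public profile
            "temp_c": temp_by_day.get(row["date"]),   # None if no weather that day
        })
    return merged
```

Diary rows without a matching public profile are kept with missing profile fields, which mirrors a left join on the fuel diaries.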

4.2.2 Scraping logs and errors

During the data collection, care was taken that no errors would persist in the obtained data. The scripts are written in Python. Before the code was deployed, five test runs were carried out and the obtained data were manually reviewed for inconsistencies. During the collection process, the data were reviewed by randomly selecting values in the dataset and comparing them with the values in the environment from which they had been obtained. No errors were found. In addition, the information was sorted and outliers were flagged for later analysis.

The scraping output and the behaviour of the code were logged throughout the process to make sure that no errors occurred13. This is important because, when code tries to obtain a large quantity of data, connections are frequently dropped or truncated if the server experiences a large amount of traffic from other users. This was also the reason why the collection process was started during the night: traffic on non-international websites is usually low during night-time hours. In addition, the output was observed during the collection process to make sure that all available accounts and fuel diaries were selected and collected.

Finally, after the data were collected, the Python logs were studied and no errors were reported. The output in the MySQL databases was exported into comma-separated files for further analysis. The collection was successful and finalized for the analysis.

4.3 Cleaning steps for the main dataset

4.3.1 Dataset: Fuel Diaries

The main dataset contains the drivers' fuel diaries. In the fuel diary environment, visitors can add personal information related to their fuelling behaviour. This information is entered manually by the drivers, and the submission website allows users to specify quantities and amounts freely. The majority of the input boxes provide mandatory guidance on the information that can be entered in the forms. This pre-specification ensures that no string values are added to the numerical fields. In addition, the date is provided through a pop-up window that shows a calendar.

The manual submission of data is prone to typographic errors. The most common error is information entered in the wrong input box. In these cases, the numbers that are entered do not correspond to the quantity, the price or the total number of kilometres. Another common mistake is entering quantities that are not possible given the car the user is driving. These errors became clear after the vehicles used in the fuel diaries were compared with the car specification dataset. In a few cases, drivers submitted fuel quantities that exceeded the capacity of the vehicle's fuel tank. In other observations the user submitted a value below the minimum threshold of three litres. Other inconsistencies were detected through ratio variables such as fuel efficiency. Observations in which the driver consumed less than two litres per 100km are excluded. Similarly, observations in which the driver used more than 50 litres per 100km are excluded.


Table 4 shows the minimum or maximum reference values that are employed to reduce the number of outliers. In some circumstances values appear to be outliers but still represent a unique and possible submission. Highly inefficient vehicles, for example, have large fuel tanks and a low fuel efficiency. Finally, where possible, wrongly entered values are removed while the observation itself is retained.

Table 4: Outlier Requirements

Variable | Requirement
Price per litre | Less than 1 euro or more than 2 euro
Fuel quantity | Less than 3 litre or more than 100 litre
Current car valuation | More than 100.000
Observation distance | Less than 50 kilometre
Date | Before 2010 and after 2016
Duplicates | In case the same observation prevailed more than once.
Other | Inconsistent values (a)

Note: (a) Inconsistent values are values that do not align with other values in the observation. For example: mileage in relation to litres
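The exclusion rules of Table 4, together with the fuel efficiency bounds mentioned in the text, can be expressed as a simple filter. The sketch below is illustrative and is not the cleaning code used for this thesis; the field names and the optional `tank_litres` argument are assumptions, and the car valuation rule is left out for brevity.

```python
from datetime import date


def is_outlier(obs, tank_litres=None):
    """Return True when an observation violates one of the reference bounds."""
    price, litres, km = obs["price_per_litre"], obs["litres"], obs["km"]
    if not 1.0 <= price <= 2.0:            # price per litre in euro
        return True
    if not 3.0 <= litres <= 100.0:         # fuel quantity in litres
        return True
    if km < 50:                            # observation distance
        return True
    if not 2010 <= obs["date"].year <= 2016:
        return True
    if tank_litres is not None and litres > tank_litres:
        return True                        # more fuel than the tank can hold
    per_100km = litres / km * 100.0
    return not 2.0 <= per_100km <= 50.0    # implausible fuel efficiency


def drop_duplicates(observations):
    """Keep the first copy of observations that are identical in every field."""
    seen, kept = set(), []
    for obs in observations:
        key = tuple(sorted(obs.items()))
        if key not in seen:
            seen.add(key)
            kept.append(obs)
    return kept


# Made-up observation for illustration.
sample = {"price_per_litre": 1.47, "litres": 39.3, "km": 655,
          "date": date(2015, 6, 1)}
print(is_outlier(sample, tank_litres=52))
```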

4.3.2 Panel data

Datasets come in a wide variety of structures. The most important data structures are cross-sectional data, time-series data, pooled cross-sectional data and panel data. The difference between the data types relates to the observed period, the relationship between observations and the number of observations for each cross-sectional member (Leeflang et al., 2018). Cross-sectional data are single units observed at one point in time. If single units are observed over a longer time period, the data are considered time series. If, on the other hand, recurring data are observed randomly over time, I consider the data to be pooled cross-sectional. The richest dataset is panel data. This dataset contains a time series for every cross-sectional member that is present in the dataset (Wooldridge, 2013). In other words, the observational data related to a single unit are observed at multiple points in time. This allows for inter-unit analysis of the differences between the observed units in relation to the main variable.


to study the dynamics of changes. This is specifically interesting for finding the effects of lagged variables on driving behaviour. Third, panel data outperforms time-series or cross-sectional data through the observation of more interactions. These interactions cannot be observed through other forms of data because the driver is unknown. Fourth, panel data is commonly associated with more complex behavioural models. The driver is connected to the observation and therefore the effect of non-driver-related influences can be excluded. Finally, panel data reduces the effects of biases related to aggregation.

For data to be good, it is important that the data satisfy four factors: availability, quality, variability and quantity (Vriens, 2012). First, the data have to be observed and usable in order to be available for further analysis (Brand and Leeflang, 1994). Second, the data have to be valid and reliable. Otherwise, the analysis and results might be biased and non-reproducible (Bearden et al., 2011). Third, the included variables have to show variability in order to measure the impact. Finally, it is important to have enough observations that quantify the observed behaviour over time. The size of the dataset determines the precision of an estimate and should at least exceed the number of parameters. The fuel diary dataset is able to accommodate all four factors.


4.4 Summary of resulting datasets

Table 5: Data availability: Observations, variables and users per dataset

Dataset | Observations | Vars. (1) | Panels | Avg. obs. per panel (2) | First obs. (4) | Last obs.
Fuel diaries (3) | 303.906 | 20 | 7.343 | 42 | 1-1-2002 | 9-11-2019
Car Specifications | 76.160 | 83 | 76.160 | 1 | N/A | N/A
User Profiles | 2.890 | 9 | 2.890 | 1 | N/A | N/A
Athlon prices | 3.815 | 12 | 3.815 | 554 | 1-1-2015 | 22-12-2016
KNMI weather | 43.394 | 42 | 43.394 | 1 | 1-1-1901 | 19-10-2019
Full dataset | 303.906 | 166 | 7.343 | 42 | 01-01-2010 | 31-12-2016

Note: The databases are merged into the full database for further analysis.
(1) Number of variables (2) Average observations per panel (3) Verbruiksmonitor (4) DD-MM-YYYY

4.4.1 Comparison to the ideal dataset

The obtained data is highly suitable for the analysis of the stated hypotheses. Nevertheless, a more extensive and broader dataset could contribute to a more robust analysis. The limitations of the obtained dataset can be summarized in five factors. First, the dataset is subject to anomalies related to manual user input. On the one hand, the fuel diary environment has multiple protections in place that safeguard quality, such as preset input boxes that guide the user input14. On the other hand, the user is able to enter information that does not correctly reflect the actual values. Second, adding information to and updating the user's fuel diary is voluntary. Drivers do not have to enrol in order to drive their vehicles. Third, the dataset only provides information about the usage of the car. Based on time, I am able to associate extra information with each observation. However, this study does not observe the user's personal life, behaviour and other factors that can influence the drive style. Fourth, the available information allows accurate inferences about driving behaviour based on calculated ratios. Minute-to-minute and automated observations of the user and the car would nevertheless have resulted in a richer dataset. Fifth, the current dataset does not describe the location of the user or where exactly the fuel was obtained. This information could have contributed to a better and more detailed estimate of the user's individual price elasticity and fuel budget (discount stations versus high-price stations).

14Preset input boxes are available for dates, numbers, addresses and specific letters or combinations


5 Measurement and analytical strategy

5.1 Measurement of main variables

5.1.1 Fuel efficiency

The variable for fuel efficiency is a ratio of two separate variables: the total quantity obtained and the total number of kilometres the user has driven. The total quantity obtained is the number of litres a user obtained during his or her refilling moment at the gasoline station. The total number of kilometres describes how many kilometres a user has driven since his or her last refilling moment. The ratio is calculated by dividing the total quantity obtained by the total number of kilometres and multiplying the result by 100:

$$\text{Litre per 100km} = \frac{TQ}{TK} \times 100$$

where TQ is the total quantity and TK the total number of kilometres driven in this period.

5.1.2 Refilling ratio

The refilling ratio is the share of the gasoline tank that is filled during a refill. The variable is the ratio between the number of litres obtained during the refill moment and the total size of the gasoline tank in litres. The ratio is calculated as follows:

$$\text{Filling ratio} = \frac{LO}{GT}$$

where LO is the number of litres obtained and GT the size of the gasoline tank. The size of the gasoline tank is known for every car that has been observed in the fuel diary dataset. The total fuel quantity obtained is known from the fuel diary dataset.
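As a small worked illustration of the two ratios defined in sections 5.1.1 and 5.1.2, the functions below compute litres per 100 kilometres and the refill share for a single observation. The function names are hypothetical and the input values are made up for the example.

```python
def litre_per_100km(total_quantity, total_km):
    """Fuel consumption in litres per 100 kilometres (TQ / TK * 100)."""
    if total_km <= 0:
        raise ValueError("total_km must be positive")
    return total_quantity / total_km * 100.0


def refill_share(litres_obtained, tank_litres):
    """Share of the fuel tank that was filled during a refill (LO / GT)."""
    if tank_litres <= 0:
        raise ValueError("tank_litres must be positive")
    return litres_obtained / tank_litres


# Made-up refill: 39.3 litres after 655 km in a car with a 52.4 litre tank.
consumption = litre_per_100km(39.3, 655)
share = refill_share(39.3, 52.4)
```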

5.1.3 Price volatility

The price volatility is based on the standard deviation of the prices observed in the period between the previous and the current observation. First, the standard deviation of all prices that occurred during an observed day is calculated. The overall price volatility to which the driver has on average been exposed is then measured as the mean of all daily standard deviations that occurred between the previous and the current refilling moment. Hence, the formula that has been applied is:

$$\text{Period sd} = \frac{SD(P_{t=1}) + SD(P_{t=2}) + \dots + SD(P_{t=X})}{T}$$

where SD(P_t) is the standard deviation of all prices observed on day t and T the number of days between the previous and the current refilling moment.
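The calculation can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the thesis: the population standard deviation (`statistics.pstdev`) is used because the text does not say whether the sample or population version was applied, the period is taken to run from the day after the previous refill up to and including the current refill, and days without any price quote are skipped.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def daily_sd(prices):
    """Map each day to the standard deviation of the prices quoted that day."""
    by_day = defaultdict(list)
    for day, price in prices:
        by_day[day].append(price)
    return {day: pstdev(values) for day, values in by_day.items()}


def period_sd(prices, previous_refill, current_refill):
    """Mean of the daily SDs after the previous refill up to the current one."""
    sds = [sd for day, sd in daily_sd(prices).items()
           if previous_refill < day <= current_refill]
    return mean(sds) if sds else None


# Made-up price quotes for three days.
quotes = [
    (date(2016, 1, 1), 1.30), (date(2016, 1, 1), 1.90),
    (date(2016, 1, 2), 1.40), (date(2016, 1, 2), 1.60),
    (date(2016, 1, 3), 1.50), (date(2016, 1, 3), 1.50),
]
volatility = period_sd(quotes, date(2016, 1, 1), date(2016, 1, 3))
```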


5.1.4 Income levels

The exact income level of the consumer is not known. Therefore, the income level is estimated from the current market value of the car that the driver is using. First, the distribution of the car values is determined. From this distribution the first and third quartiles are taken. Car values in the lower quartile (< €17.195) are considered economical choices and the driver is assumed to be a low-income individual. Drivers of vehicles valued between €17.195 and €31.690 are considered average-income users. High-income users are those whose vehicle has a current market value above €31.690. In the dataset, 21,9% of the users are assigned to the low-income class, 26,8% to the high-income class and the remaining 51,3% to the average-income class. Average-income drivers drive fewer kilometres than low-income drivers, 142.621 km and 149.816 km respectively. High-income drivers consume on average 6,98 litres per 100 kilometres. Low-income individuals consume 6,20 litres of fuel per 100 kilometres.
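The quartile-based assignment can be sketched as below. The sketch is illustrative: the function names are hypothetical, and the quartile method of Python's `statistics.quantiles` may differ slightly from the method used in the statistical software for this thesis, so the cut-off values it produces on real data need not match €17.195 and €31.690 exactly.

```python
from statistics import quantiles


def income_class(car_value, q1, q3):
    """Assign an income class from the car value and the two quartile cut-offs."""
    if car_value < q1:
        return "low"
    if car_value > q3:
        return "high"
    return "average"


def classify_all(car_values):
    """Derive the first and third quartile from the data and classify every car."""
    q1, _, q3 = quantiles(car_values, n=4)
    return [income_class(value, q1, q3) for value in car_values]


# Using the cut-offs reported in the text (in euro).
label = income_class(25000, q1=17195, q3=31690)
```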

5.1.5 Price movements

The price movements are required to determine whether the price has increased or decreased between the previous and the current refilling moment. A price increase during an observational period is identified by

$$P_t - P_{t-1} > 0$$

where P is the price per litre, t the current observation and t-1 the previous observation. First, the difference between the price at the current and the previous refilling moment is calculated. Second, the sign of this value is observed (positive or negative). Finally, a dummy variable price increased is created that indicates whether the price in the period has increased or decreased. The variable price increased is one if the price in the period has increased and zero if the price of gasoline remained the same or decreased.
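A minimal sketch of this dummy for one driver is given below. The function name is hypothetical, and the first observation of a driver receives no value because no previous refilling moment exists for it; how the thesis treats that first observation is not stated and is an assumption here.

```python
def price_increased_flags(prices):
    """Return the price-increased dummy for one driver's chronological prices."""
    flags = [None]  # the first refill has no previous price to compare with
    for previous, current in zip(prices, prices[1:]):
        flags.append(1 if current - previous > 0 else 0)
    return flags


# Made-up prices per litre for four consecutive refills of one driver.
flags = price_increased_flags([1.50, 1.55, 1.55, 1.40])
```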

5.2 Control variables

5.2.1 Car related variables


5.2.2 Fuel tank capacity

The fuel tank capacity is taken from the car specification database. This database contains information for every car that is mentioned in the fuel diary dataset. The value for fuel tank capacity is the total number of litres that the fuel tank of the car can hold. In the dataset, the values lie between 10 and 100 litres.

5.2.3 Fuel type

In the Netherlands gasoline stations provide a wide range of fuels. The most common fuels are gasoline and diesel, with a market share of 58,53% and 38,37% respectively. On average, vehicles that run on gasoline are less fuel efficient than vehicles that run on diesel: 7 litres compared to 5,6 litres per 100km respectively. In total eight different fuel types are observed in the dataset. For the analysis three different dummy variables are created. The variables fueltype gasoline and fueltype diesel indicate the usage of gasoline or diesel respectively for a given car. Each variable is one if the car solely uses the related fuel type for propulsion and zero otherwise. The variable fueltype other captures all other fuel types, that is, those that are neither diesel nor gasoline.
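The construction of the three dummies can be illustrated with a short function. The names are hypothetical, and the sketch assumes that combined fuel types such as "lpg / gasoline" fall under fueltype other, because the gasoline and diesel dummies are one only when the car solely uses that fuel.

```python
def fuel_type_dummies(fuel_type):
    """Return the three fuel type dummies for one car."""
    label = fuel_type.strip().lower()
    gasoline = int(label == "gasoline")
    diesel = int(label == "diesel")
    return {
        "fueltype_gasoline": gasoline,
        "fueltype_diesel": diesel,
        # Everything that is not purely gasoline or purely diesel.
        "fueltype_other": int(not (gasoline or diesel)),
    }


dummies = fuel_type_dummies("Diesel")
```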

5.3 Type of road

The type of road a car is driven on influences the fuel consumption. In the dataset, users can indicate the type of roads they have been driving on between two refill moments. Three road types can be indicated: highway, urban area and both. The majority of the users drive both on the highway and in urban areas. On average, urban-area drivers use 7,2 litres per 100km while highway drivers use 6,25 litres. The added variables are highway driver, non highway driver and highway and non highway driver.

5.3.1 Price per litre

The variable price per litre indicates the price that the user paid per litre of fuel in the observation. This fuel can be gasoline, diesel, gas, electricity or similar.

5.3.2 Total kilometre


occurred. The formula:

$$K_{t=1} + K_{t=2} + \dots + K_{t=x}$$

where K is the number of kilometres during the observational period and t the observation.

5.3.3 Energy labels

The average fuel efficiency is indicated by the energy label provided by the European Commission: Energy, Climate change, Environment. The energy label is based on the car labelling directive15 and assists consumers in buying a fuel-efficient and environmentally friendly car16. The criteria are based on internal and car-related aspects such as the model, version, fuel, transmission and CO2 emissions. An example is shown in figure 9. In addition, the poster-like label that is displayed on the vehicle is intended to encourage manufacturers to build vehicles with better fuel efficiency grades. The energy label has seven levels. The letter A stands for the most fuel-efficient cars, while the letter G stands for the least efficient. Figure 7 displays the efficiency labels along with the share of drivers in the different income groups. In the high fuel efficiency levels (A-C), the low-income groups have slightly more fuel-efficient vehicles on average for labels B and C, but the difference is limited. For the remaining labels the difference between the income levels is small, except for energy label G, in which the high- and average-income drivers are clearly overrepresented. On average the distribution of vehicles is equally divided among income groups in relation to fuel efficiency.

15(Directive 1999/94/EC)


Figure 7: Fuelgrade per income level

Note: The bars indicate the percentage of vehicles for a given income group in an energy segment. Low income drivers have vehicles with a value < €17.195. Average income drivers have a car with a value between €17.195 and €31.690. High income drivers have a car with a value > €31.690.

In a few cases, for example for older vehicles, energy labels are not available. In that case the value N/A is assigned in the dataset. The fuel efficiency label is added as a set of dummy variables. Eight different dummies are created: Energylabel A, Energylabel B, Energylabel C, Energylabel D, Energylabel E, Energylabel F, Energylabel G and Energylabel NA.


5.3.4 Lagged variables

To observe the events in the previous period, two lagged variables are created, one for price per litre and one for total quantity. They indicate the price that was paid during the previous visit and the quantity that was obtained in the previous period.

5.4 Driver related variables

Besides the car specifications, the driver can also adopt driving strategies that lead to differences in fuel efficiency. Figure 9(b) shows the fuel consumption for different age groups. The figure shows that consumption tends to follow a parabolic pattern: it is higher for young adults, lower for adults and higher again for seniors. Second, situational motives that influence fuel efficiency are likely to be driven by weather and personal emotions. Personal emotions are not observed in the obtained datasets, but weather information is. Figure 9(a) shows that users tend to consume more fuel when more rain falls.

Figure 9: Fuel efficiencies in relation to driver factors

(a) The fuel efficiency in relation to the amount of rain during a day in millimetres

(b) There is seemingly a parabolic relationship between age and litre per 100km. The middle-aged drivers are more fuel efficient than the younger and older drivers.

5.5 Station type


for a given location. It was found that, on average, highway fuel stations charge at least 15% more than non-highway stations. Therefore, if the price per litre in an observation was at least 15% above the average price per litre, the dummy variable highway station is one. The prices are estimated for every fuel type.
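The 15% rule can be sketched as follows. This is a simplified illustration and not the procedure used for the thesis: the average is taken over all observations of the same fuel type that are passed in, whereas the averages in the thesis are tied to a given location, and the field names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def highway_station_flags(observations, markup=0.15):
    """Flag observations priced at least `markup` above their fuel type average."""
    by_fuel = defaultdict(list)
    for obs in observations:
        by_fuel[obs["fuel_type"]].append(obs["price_per_litre"])
    average = {fuel: mean(prices) for fuel, prices in by_fuel.items()}
    return [
        int(obs["price_per_litre"] >= (1 + markup) * average[obs["fuel_type"]])
        for obs in observations
    ]


# Made-up observations: one expensive gasoline fill-up among regular ones.
sample = [
    {"fuel_type": "gasoline", "price_per_litre": 1.40},
    {"fuel_type": "gasoline", "price_per_litre": 1.40},
    {"fuel_type": "gasoline", "price_per_litre": 1.40},
    {"fuel_type": "gasoline", "price_per_litre": 1.80},
    {"fuel_type": "diesel", "price_per_litre": 1.20},
    {"fuel_type": "diesel", "price_per_litre": 1.25},
]
flags = highway_station_flags(sample)
```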

5.5.1 Rain quantity and temperature

Kilpeläinen and Summala (2007) found that weather influences the behaviour of car drivers. Hence, a variable is created that estimates the average rain quantity during the previous and current observation. The rain quantity is obtained from the Dutch weather organization (KNMI). This organization provides a daily average of rain. The rain quantity per period is then calculated with the following formula:

$$\frac{\text{rain mm}_{t=1} + \text{rain mm}_{t=2} + \dots + \text{rain mm}_{t=x}}{T}$$

where rain mm is the rain quantity during a given day in millimeters, t the time and T the total number of days between the current and previous refilling moment. Similar to the rain quantity, also temperature has been created as a control variable. In this case the average daily temperature in Celsius is being used for the estimation of the average temperature.

$$\frac{\text{temp c}_{t=1} + \text{temp c}_{t=2} + \dots + \text{temp c}_{t=x}}{T}$$

where temp c is the average temperature for a given day and T the number of days between the previous and current period.

Figure 10 shows the seasonal influence on the efficiency of the car. On average, vehicles are 3,9% more efficient in the winter months compared to the summer and autumn17. The temperatures on average are 4 and 18 degrees respectively. Also, the winter months have 32% higher wind speeds than the summer months, which might affect the friction a car has to deal with. These external factors influence the car efficiency.

17Autumn has been used as a point of reference, because during the summer a larger number of drivers


Figure 10

5.5.2 Drive style

The variable drive style is a value that drivers assign to themselves. The driver is asked to reflect on his or her driving behaviour and to summarize the drive style in one of three available segments: regular, sportive and economical. The regular drive style indicates that the driving behaviour was in line with the driver's average drive style. Sportive means that the driver was accelerating and braking more frequently. Economical relates to an efficient drive style. The variable is split into three different dummies: Drivestyle economical, Drivestyle regular and Drivestyle sportive. Each dummy is one if the drive style was reported and zero otherwise.

5.6 Variable definitions


Table 6: Variable Definitions

Variable | Measurement
Litre per 100km | The amount of fuel consumed over a distance of 100 kilometre.
Factory litre per 100km | The manufacturer specified fuel consumption over a distance of 100 kilometre.
Price per litre | The litre price the driver has paid for the fuel in the observation.
Fuel tank capacity | The size of the fuel tank in litres.
Fuel quantity | The amount of fuel obtained during the related observation.
Refill share | The ratio between the quantity and the size of the fuel tank. How much fuel was added in relation to the size of the fuel tank.
Age | The age of the driver.
Period sd | The price volatility observed in the period between the current and the previous refilling moment.
Gender | The gender of the driver.
Energy label | The fuel efficiency label assigned by the European Commission. A is fuel efficient, G is fuel inefficient.
Fuel type | The fuel type that the vehicle requires for propulsion (Gasoline, Diesel, Natural Gas).
Drive style | The driver specific drive style: economical, regular or sportive.
Total kilometres | The total amount of kilometres that are registered in the fuel diary of the driver.
Highway fill-up | A dummy variable that indicates if the fill-up was at a highway fuel station.
Discount station | A dummy variable that indicates if the fill-up was at a discount station.
Highway driver | A user defined dummy variable that indicates if the majority of the driving was on the highway.
Temperature | The average temperature that has been observed in the Netherlands during the day of the observation.
rain quantity | The amount of rain in millimetres during the observation.
period price increased | A variable that indicates if the price during a period has increased.
low income | A driver that owns a car of which the current value is below €17.195.
average income | A driver that owns a car of which the current value is between €17.195 and €31.690.


5.7 Overview of variables, measurement, and descriptive statistics

The final dataset is composed of the previously mentioned datasets. It now contains user-specific information together with profile, car and weather-related data. The merged dataset holds a large number of different variables. These variables describe the user, his or her driving behaviour, the environment and the surroundings. The effect of dynamic pricing on fuel efficiency is quantified through the variable litre per 100km in combination with the price volatility variable. The remaining variables and additional information are summarized in table 7, which provides an overview of the summary statistics.


Table 7: Descriptive Statistics

Column groups: All observations (X̄ All obs, Std(X), Min., Max.); Income level specific (X̄ Low, X̄ Average, X̄ High)

Variables | X̄ All obs | Std(X) | Min. | Max. | X̄ Low | X̄ Average | X̄ High
litre per 100km | 5,97 | 1,76 | 1 | 14,98 | 5.98 | 6.4 | 7.33
manufacturer litre per 100km (1) | 5,53 | 1,66 | 1,9 | 17,3 | 5.10 | 5.32 | 6.37
price per litre | 1,47 | 0,195 | 1 | 2 | 1.52 | 1.45 | 1.42
fuel tank capacity (2) | 52,19 | 10,49 | 10 | 100 | 41.56 | 52.45 | 61.76
fuel quantity | 39,30 | 11,62 | 3 | 99,44 | 30.93 | 39.11 | 47.60
refill share (3) | .75 | .16 | .03 | 1 | .75 | .75 | .77
age | 42.46 | 11,71 | 16 | 78 | 39.63 | 43.12 | 43.03
period volatility (4) | .19 | .01 | .11 | .21 | .193 | .194 | .194
total kilometre | 63.123 | 56.678 | 1 | 1.000.000 | 52.653 | 64.696 | 69.664
highway fill-up | 0,06 | .24 | 0 | 1 | .058 | .059 | .063
discount station | .32 | .47 | 0 | 1 | .16 | .34 | .42
highway driver | .32 | .47 | 0 | 1 | .26 | .32 | .36
weather temperature | 11.10 | 5.99 | -12.1 | 27.1 | 10.97 | 11.14 | 11.15
rain quantity (8) | 23.97 | 49.46 | 0 | 639 | 23.96 | 23.95 | 24.02
period price increased | .538 | .498 | 0 | 1 | . | . | .
gender
- male | 89,63% | . | . | . | 97,18% | 89,63% | 99,90%
- female | 0,86% | . | . | . | 2,80% | 0,52% | 0,10%
- n/a | 9,51% | . | . | . | 9,49% | 9,6% | 8,95%
energy label (5)
- a | 5,57% | . | . | . | 2,38% | 6,61% | 4,32%
- b | 11,39% | . | . | . | 13,74% | 13,9% | 7,14%
- c | 15,34% | . | . | . | 22,34% | 15,86% | 2,89%
- d | 16,38% | . | . | . | 18,52% | 15,56% | 9,84%
- e | 8,64% | . | . | . | 8,31% | 8,6% | 12,93%
- f | 8,78% | . | . | . | 8,97% | 8,66% | 9,14%
- g | 29,13% | . | . | . | 17,54% | 27,25% | 8,45%
- n/a | 4,77% | . | . | . | 8,19% | 3,57% | 45,28%
fueltype (6)
- lpg | 1,33% | . | . | . | 1,71% | 1,41% | 0,61%
- lpg / gasoline | 0,92% | . | . | . | 0,66% | 1,01% | 0,90%
- natural gas | 0,14% | . | . | . | 0,89% | 0,01% | 0,11%
- gas / natural gas | 0,03% | . | . | . | 0,08% | 0% | 0,01%
- gasoline | 58,53% | . | . | . | 79,41% | 57,04% | 42,19%
- diesel | 38,37% | . | . | . | 17,24% | 40,45% | 54,23%
- electric / gasoline | 0,59% | . | . | . | 0% | 0% | 1,77%
- electric / diesel | 0,05% | . | . | . | 0% | 0% | 0,18%
drivestyle (7)
- regular | 68,56% | . | . | . | 68,19% | 67,32% | 70,10%
- sportive | 8,84% | . | . | . | 6,90% | 8,62% | 11,30%
- economical | 22,6% | . | . | . | 24,91% | 24,06% | 17,90%
N | 226577 | | | | 55711 | 111537 | 58599

Notes: (1) The producer mentioned fuel consumption per 100km. (2) The size of the gasoline tank in litres. (3) The share of the gasoline tank in quantity that was added during the refill. (4) period volatility is the volatility observed between two observations.


6 Model development

In this section four different hypotheses are modelled. The first hypothesis concerns the relationship between fuel efficiency and price volatility. The second hypothesis builds on the model constructed for hypothesis one and adds income-related effects. This model tests the influence of income levels on fuel efficiency in relation to price volatility. The third model estimates the influence of price increases during observational periods, in addition to price volatility, on fuel efficiency. Finally, the fourth hypothesis concerns the influence of price volatility on the fuel tank refilling share. As for the first three hypotheses, a model is also created that tests the effects of differences between income levels.

6.1 Control variables

The models in this section include the control variables summarized in table 8, which were selected in the control variables section. These variables control for external effects. They are collected in the vector $X_{it}$, where i denotes the driver and t the time of the observation.

Table 8: Control variables

Variable
Price per litre
Total kilometre
Energy label A
Energy label B
Energy label C
Energy label D
Energy label E
Energy label F


6.2 Hypothesis 1 - Price volatility negatively influences the fuel efficiency

To test the first hypothesis, a log-linear model is estimated. This model captures the effect of price volatility on fuel efficiency, which is measured by the variable litre per 100km.

$$\ln F_{it} = \alpha + \beta_1 V_{it} + \beta_2 X_{it} + \varepsilon_{it} \tag{1}$$

The dependent variable $F_{it}$ describes the fuel efficiency. It is the ratio between the total number of litres obtained for the observation and the kilometres driven, expressed in litres per 100 km. The natural log is applied to normalize the distribution of the variable. The subscripts i and t refer to the driver and the time of the specific observation, respectively. The variable $V_{it}$ indicates the price volatility. In addition, the model includes the vector $X_{it}$, which contains the control variables listed in table 8. Finally, the model is estimated with Newey-West robust standard errors to account for autocorrelation and heteroskedasticity.
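As an illustration of the estimation approach described above, the following minimal sketch fits a reduced version of model 1 (an intercept and price volatility only, without the control vector) by ordinary least squares and computes Newey-West standard errors with Bartlett weights. It is not the implementation used in the thesis: the function name `ols_newey_west`, the `max_lag` parameter and its default are assumptions, and the sketch treats the observations as one time-ordered series of a single driver, since the text does not state the lag length or how the estimator is applied across drivers.

```python
import math


def ols_newey_west(volatility, log_fuel, max_lag=1):
    """OLS of log_fuel on an intercept and volatility, with Newey-West
    (Bartlett kernel) standard errors. Hypothetical helper for illustration."""
    n = len(volatility)
    # Design rows x_t = (1, V_t); the control vector X is omitted in this sketch.
    x = [(1.0, v) for v in volatility]

    # Normal equations: beta = (X'X)^{-1} X'y, using a 2x2 inverse.
    xtx = [[sum(row[r] * row[c] for row in x) for c in range(2)] for r in range(2)]
    xty = [sum(row[r] * y for row, y in zip(x, log_fuel)) for r in range(2)]
    det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
    inv = [[xtx[1][1] / det, -xtx[0][1] / det],
           [-xtx[1][0] / det, xtx[0][0] / det]]
    beta = [sum(inv[r][c] * xty[c] for c in range(2)) for r in range(2)]

    # Residuals and score vectors g_t = x_t * e_t.
    resid = [y - (beta[0] + beta[1] * v) for v, y in zip(volatility, log_fuel)]
    g = [(e * row[0], e * row[1]) for row, e in zip(x, resid)]

    # Lag-0 term (heteroskedasticity) plus weighted autocovariance terms.
    meat = [[sum(gt[r] * gt[c] for gt in g) for c in range(2)] for r in range(2)]
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)  # Bartlett weight
        for t in range(lag, n):
            for r in range(2):
                for c in range(2):
                    meat[r][c] += weight * (g[t][r] * g[t - lag][c]
                                            + g[t - lag][r] * g[t][c])

    # Sandwich covariance: (X'X)^{-1} * meat * (X'X)^{-1}.
    half = [[sum(inv[r][k] * meat[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]
    cov = [[sum(half[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    # Guard against tiny negative values caused by floating-point rounding.
    se = [math.sqrt(max(cov[r][r], 0.0)) for r in range(2)]
    return beta, se
```

Setting `max_lag=0` reduces the estimator to heteroskedasticity-robust (White) standard errors, which can serve as a quick check on how much the autocorrelation correction changes the results.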

6.3 Hypothesis 2 - Income moderates the effects of price volatility on fuel efficiency

The model for hypothesis 2 is created to identify the effect of different income levels on the response to price volatility.

$$\ln F_{it} = \alpha + \beta_1 V_{it} + \beta_2 H_{it} + \beta_3 L_{it} + \beta_4 V_{it} L_{it} + \beta_5 V_{it} H_{it} + \beta_6 X_{it} + \varepsilon_{it} \tag{2}$$

The model for the second hypothesis builds on the model for the first hypothesis. Four extra terms are added that capture the income level of the driver. The dependent variable $F_{it}$ is the fuel efficiency of the driver (litres per 100 km). $H_{it}$ is a dummy variable that denotes whether the income level of the driver is high. The subscript t stands for the time of the specific observation and i for the respective user. $L_{it}$ is a dummy variable that indicates whether the observed driver has a low income. The first interaction term, $V_{it}L_{it}$, is the interaction between the price volatility and the dummy that indicates a low income user. The second interaction term, $V_{it}H_{it}$, is the interaction between the price volatility and the dummy that indicates a high income user. As in the model for hypothesis 1, $X_{it}$ is a vector that contains the control variables specified in table 8. The model is estimated with Newey-West robust standard errors to account for autocorrelation and heteroskedasticity.
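To make the role of the two interaction terms explicit, the marginal effect of price volatility implied by model 2 can be written out as below. This is a reading of the equation rather than a result from the thesis, and it assumes that drivers who are neither high nor low income form the reference group.

$$\frac{\partial \ln F_{it}}{\partial V_{it}} = \beta_1 + \beta_4 L_{it} + \beta_5 H_{it} = \begin{cases} \beta_1 & \text{reference group } (H_{it} = 0,\ L_{it} = 0) \\ \beta_1 + \beta_4 & \text{low income } (L_{it} = 1) \\ \beta_1 + \beta_5 & \text{high income } (H_{it} = 1) \end{cases}$$

Under this reading, $\beta_4$ and $\beta_5$ measure how much the response to price volatility of low and high income drivers differs from that of the reference group.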

6.4 Hypothesis 3 - Price volatility and price increases enlarge the effect of price volatility on fuel efficiency

The third model estimates whether the effect of price volatility on fuel efficiency is larger when prices have risen. If prices on average increased during an observational period, the hypothesis expects a stronger positive effect of price volatility on fuel efficiency than in periods with lower or equal prices.

$$\ln F_{it} = \alpha + \beta_1 V_{it} + \beta_2 W_{it} + \beta_3 V_{it} W_{it} + \beta_4 X_{it} + \varepsilon_{it} \tag{3}$$

This model takes the model for hypothesis 1 as a base and adds two terms. The dependent variable $F_{it}$ is the fuel efficiency of the driver (litres per 100 km) and $V_{it}$ indicates the price volatility. The dummy variable $W_{it}$ indicates whether the price on average increased during the observational period. The interaction term $V_{it}W_{it}$ captures the effect of price volatility when prices on average have increased during the observational period. The model is estimated with Newey-West robust standard errors to account for autocorrelation and heteroskedasticity.
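Because the dependent variable is in logs, the volatility coefficients of model 3 translate into percentage changes in litres per 100 km. The sketch below shows that conversion for periods with and without a price increase. The function name `volatility_effect_pct` and the coefficient values in the usage note are hypothetical and are not estimates from the thesis.

```python
import math


def volatility_effect_pct(beta_v, beta_vw, price_increased, delta_v=1.0):
    """Percentage change in litres per 100 km implied by a log-linear model
    when volatility changes by delta_v.

    beta_v  : coefficient on volatility V (beta_1 in model 3)
    beta_vw : coefficient on the interaction V * W (beta_3 in model 3)
    price_increased : value of the dummy W for the period
    """
    w = 1.0 if price_increased else 0.0
    # The slope of ln F with respect to V depends on the dummy W.
    slope = beta_v + beta_vw * w
    # Exact conversion from a log difference to a percentage change.
    return 100.0 * (math.exp(slope * delta_v) - 1.0)
```

For example, with made-up coefficients `beta_v = -0.10` and `beta_vw = -0.05`, a one-unit rise in volatility would imply roughly 9.5% lower consumption per 100 km when prices did not increase and roughly 13.9% lower consumption when they did.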

6.5 Hypothesis 4 - Price volatility negatively influences the refilling share

The fourth hypothesis expects a relationship between price volatility and the refilling share. Specifically, higher price volatility is expected to reduce the refilling share. To test whether this effect is present, a new log-linear model is estimated with the refilling share as the dependent variable.

$$\ln S_{it} = \alpha + \beta_1 V_{it} + \beta_2 X_{it} + \varepsilon_{it} \tag{4}$$

The dependent variable, the refilling share, is denoted by $S_{it}$. It is the ratio between the fuel quantity obtained and the size of the gasoline tank. The variable $V_{it}$ is the price volatility observed in the period in which the fuel was consumed. The subscript t stands for the time of the specific observation and i for the respective user. The variable $X_{it}$ is a vector that contains all the control variables mentioned in table 8. The model is estimated with Newey-West robust standard errors to account for autocorrelation and heteroskedasticity.
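The construction of the dependent variable described above can be sketched as follows. The function names and the input checks are illustrative assumptions. The thesis does not state how zero or invalid values were handled.

```python
import math


def refilling_share(litres_added, tank_size_litres):
    """Share of the tank's capacity added in one refill (S in the model)."""
    if tank_size_litres <= 0:
        raise ValueError("tank size must be positive")
    return litres_added / tank_size_litres


def log_refilling_share(litres_added, tank_size_litres):
    """Natural log of the refilling share, the dependent variable ln S."""
    share = refilling_share(litres_added, tank_size_litres)
    if share <= 0:
        # The log is undefined for a zero or negative share.
        raise ValueError("refilling share must be positive to take the log")
    return math.log(share)
```

A refill of 30 litres into a 50 litre tank gives a share of 0.6. A full refill gives a share of 1 and therefore a log value of 0, so partial refills always yield negative values of the dependent variable.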

In addition, model 5 is a sub-analysis that tests whether, similar to hypothesis 2, income levels influence consumer behaviour with respect to the refilling share. In this model both income dummies are interacted with the price volatility. Lower income drivers are expected to respond more strongly to price volatility and therefore to lower their refilling share further.

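$$\ln S_{it} = \alpha + \beta_1 V_{it} + \beta_2 H_{it} + \beta_3 L_{it} + \beta_4 V_{it} L_{it} + \beta_5 V_{it} H_{it} + \beta_6 X_{it} + \varepsilon_{it} \tag{5}$$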

The dependent variable $S_{it}$ is the refilling share, the ratio between the fuel quantity obtained and the size of the gasoline tank. Similar to the second hypothesis, additional variables are added for income levels. $H_{it}$ is a dummy variable that is one if the driver is assigned to the high income class and zero otherwise. $L_{it}$ is a dummy variable that is one if the driver is marked as a low income driver and zero otherwise. The relationship between income and the price volatility is captured in two extra interaction terms. The first interaction term, $V_{it}L_{it}$, is the interaction between the price volatility and the dummy that indicates a low income user. The second interaction term, $V_{it}H_{it}$, is the interaction between the price volatility and the dummy that indicates a high income user.


6.6 Summary of parameter abbreviations

Table 9 summarizes the variable abbreviations used in the models.

Table 9: Model variable descriptions

Variable   Measurement
F          Measures the fuel efficiency
V          Measures the price volatility observed in the observational period
H          Indicates a high income driver
L          Indicates a low income driver
X          Vector for control variables
S          Variable for the refilling share of the gasoline tank
W          Dummy variable that indicates if the price per litre increased during an observational period
