CLICK & BRICK RETAILING: STATES IN THE CUSTOMER JOURNEY

THE CUSTOMER JOURNEY

A Hidden Markov Model on Clickstream Data

By

NIEK VAN DER WERF

Niek van der Werf Marktstraat 10b, Groningen h.n.van.der.werf@student.rug.nl

+31 6 16 03 02 62 S3788024

Master Thesis - MSc Marketing Intelligence University of Groningen

Faculty of Economics and Business June 15, 2020

First supervisor: Frank T. Beke

Abstract

This research models the customer journey by looking at omnichannel dynamics in customer behavior. Clickstream and transaction data are used to reveal differences in omnichannel patterns. Two research questions guide this study: (1) Which states in the customer journey can be identified, based on customers’ omnichannel behavior? And (2) what characteristics matter most in the different states? Specifically, a hidden Markov model is developed to distinguish different states in the customer journey. The model is used to estimate (1) the state in which a customer starts the journey, (2) the transitions among latent states in the journey, and (3) the relative importance of different indicators across the states. Four states are found: Just Started, Orientation, Omni-Enthusiastic, and Becoming a Nethead. The results show that people in earlier states use mobile devices, enter through firm-initiated contacts, and are less likely to convert. In later states, non-mobile device use increases, as do customer-initiated contacts. In the third state, customers are most likely to convert offline, whereas the fourth state shows online-dominated purchase behavior. Additional estimations on variables show robustness of the model. This study contributes to the realms of the customer journey, clickstream analyses, and hidden Markov chain applications. The model enables marketers to gain insights into device use, channel use, navigational behavior, and conversion throughout different states in the journey. Altogether, this research provides a refreshing look at how customers behave throughout the different states in an omnichannel customer journey.

Keywords: Customer Journey Mapping, Omnichannel, Touchpoints, Clickstream Data, Hidden Markov Model, Latent States.

Preface

Years from now, this time will be remembered as the time of Corona. During this turbulent period, I wrote my master’s thesis to finalize the MSc Marketing Intelligence. It represents the end of two years at the University of Groningen. The past months stood in the light of finding a research direction, preparing the data, designing a model, and interpreting the results. I used many software programs and tools, which hopefully come in handy in the future. The virus forced me to spend many hours in my student room, which sometimes made me feel the walls were closing in on me. Many different states of mind emerged. From despair to relief, and from resignation to fulfillment. In that sense, I could have studied the different states in writing a master’s thesis too. Nevertheless, the final result made it all worthwhile.

An array of people deserve and have my appreciation for their help in writing this thesis. First, I want to thank Frank Beke for his guidance in the process. Secondly, I thank my fellow students for some useful feedback and for often helping me out in the data cleaning process. A special thanks goes out to Tammo Bijmolt, the hidden Markov model specialist, who really helped me with the modeling part. Next, I thank my friends and family, who always supported and motivated me. Some of them even provided perceptive comments on the draft form of this thesis. Finally, throughout the process, no one was more on my side than Pien. And for that, I am very grateful. Altogether, it has been a great learning experience, and I truly enjoyed writing my thesis. Therefore, I hope you enjoy reading my thesis as well.

Niek van der Werf Groningen, June 2020

1

Introduction

In 1900, the Russian mathematician Andrei Andreevich Markov (1856-1922) wrote a paper containing proof of an inequality for algebraic polynomials (Basharin, Langville, and Naumov, 2004). He extended the law of large numbers and the central limit theorem to sequences of dependent random variables, better known as Markov chains. Little did he know that years later, this method would be applied to one of the most relevant topics in marketing research, the customer journey (Marketing Science Institute, 2020). Markov’s paper formed the basis of the so-called hidden Markov model (HMM), a popular method to uncover sequences in data (Netzer, Ebbes, and Bijmolt, 2017). Famous examples can be found in economics and finance (Rydén, Teräsvirta, and Åsbrink, 1998), engineering (Juang and Rabiner, 1991), bioinformatics (Krogh et al., 1994), and education (Vermunt, Langeheine, and Bockenholt, 1999). This study proposes a new application of the HMM: to map the customer journey. The customer journey is the series of actions a customer takes to arrive at the moment of purchase (Lemon and Verhoef, 2016), and beyond (Bijmolt et al., 2019). The customer journey recently gained momentum in marketing research, and with that, many purchase funnels have been developed. However, with the rising attention for omnichannel retailing (Barwitz and Maas, 2018) and the ever-changing use of devices, channels, and touchpoints, the call for a more relevant view on navigational behavior in the customer journey arises. Because customers no longer engage through a single channel, different paths can lead to a purchase. With the sharply increased use of the internet and mobile devices, customers started switching between online and offline touchpoints and channels before making an online or offline purchase. However, the selfish approach of marketers towards mapping the customer journey has led to a lack of a holistic view (Kaushik, 2015). Kaushik, Google’s ‘digital marketing evangelist,’ therefore developed the famous See, Think, Do, Care marketing framework, upon which many policymakers build their strategy. Although many purchase frameworks have been created and used, it is difficult to capture a customer’s movement. The significant advantage that we have today is the amount of available data. Big data in particular has become a power source, hence the resemblance with oil. Data-mining giants like Amazon, Facebook, or Google establish themselves as masters of the universe (Marr, 2018). This study utilizes clickstream data as input for the HMM. By analyzing millions of rows, patterns can be recognized. The aim is to build a useful model that delivers new insights into the realm of omnichannel web analytics. The main objectives are (1) to find different states in the customer journey based on omnichannel behavior patterns using a HMM and (2) to make clear what characteristics weigh most in which states. The states, each with their own characteristics, will be described according to device use, touchpoint behavior, page clicks, and conversion. Site-centric clickstream and transaction data of a retailer are used, extended with offline sales.

1.1 Problem Statement & Research Questions

This study uses clickstream data to map this behavior. Clickstream data is the electronic record of internet usage (Bucklin and Sismeiro, 2009). However, the literature lacks insights into a holistic view of actual web behavior with both online and offline orientation. The solution is offered in the form of a HMM. A HMM can incorporate multiple aspects of the customer journey to paint an overall picture of customer behavior. Using the proposed HMM in this study, a marketer should be able to distinguish clear states in the omnichannel customer journey, each with its own characteristics. Therefore, two research questions are formulated:

1) Which states in the customer journey can be identified, based on customers’ omnichannel behavior?

2) What characteristics matter most in the different states?

1.2 Contribution & Relevance

Kawazu et al., 2016) and customer journey related issues such as customer relationship management (Netzer, Lattin, and Srinivasan, 2008), a combination of the two remains scarce, especially with an omnichannel approach. This study adds the following application of a HMM: to capture dynamics in the omnichannel customer journey using site-centric clickstream data. Moreover, from a managerial perspective, this study provides insights into the navigational behavior of customers who switch between devices and channels and buy both online and offline. By determining states, a marketer can better serve and target its customer base. Besides, personalized offers can be made when a marketer knows which state a customer is in.

1.3 Outline of this Thesis

The outline of the remainder of this thesis is structured as follows:

Chapter 2. Theoretical Framework. Provides the theoretical background. The covered topics are ‘The Customer Journey,’ ‘Touchpoints,’ ‘From Multichannel to Omnichannel,’ ‘Clickstream Data,’ ‘The Journey as a Funnel,’ and ‘The Hidden Markov Model.’ The chapter ends with the conceptual model of this study.

Chapter 3. Research Design. Describes the origin of the data set used in this study and the model building process. The three main components of the model are given, as well as a graphical overview.

Chapter 4. Application To The Data. First shows the data cleaning process and the variable transformation. Next, it gives a description of the sample and compares different models based on information criteria. The results of the model are discussed, and robustness checks are done.

Chapter 5. Discussion. Extends the findings towards a broader perspective. A reflection on the conducted research is given. The added value of this thesis in terms of managerial as well as academic implications is discussed. Further, the limitations of this study and recommendations for future research are addressed.

ness, however remote, they have an opportunity to form an impression.”

– Jan Carlzon, Former CEO of SAS Group

2

Theoretical Framework

One of the critical aspects of managing the customer journey is knowing the so-called moments of truth (Hyken, 2016) or key events in a customer journey. For a business to be successful, every interaction has to be managed carefully to create a positive outcome, as Carlzon (1987) already stated a long time ago. This chapter discusses the existing literature on the concepts this research touches upon. First, the concept of the customer journey will be described. Next, within that scope, touchpoints, omnichannel retailing, clickstream data, and funnel frameworks are discussed. Also, related theories on HMMs are given. Two hypotheses are drawn, and the last section provides the conceptual framework.

2.1 The Customer Journey

Every purchase has a history. Some started a long way back, and some were made in the blink of an eye. The fact is that every purchase starts at a certain point. This is indicated as the start of the customer journey. Ideally, the customer journey starts with the recognition of a need and ends with a post-journey evaluation, passing through several stages along the way (Bijmolt et al., 2019). Another definition is the series of actions a customer takes to arrive at the moment of purchase (Lemon and Verhoef, 2016). Følstad and Kvale (2018) define the customer journey as a sequence of direct and indirect contacts between the customer and a firm, and Baxendale, Macdonald, and Wilson (2015) define it as the search process consisting of a number of discrete encounters with varying touchpoints. Note that not all authors incorporate a post-purchase stage. The Marketing Science Institute (2020) highly prioritizes understanding the customer journey. The rise of digitalization and the proliferation of new touchpoints have turned the ostensibly linear path of the customer journey into a far more complex path to purchase (Herhausen et al., 2019). This results in more extensive and versatile customer journeys (Edelman and Singer, 2015; Srinivasan, Rutz, and Pauwels, 2016), forcing retailers to form a “360-degree view” of how customers interact with and leverage multiple channels in the customer journey (Kannan and Li, 2017; Li et al., 2020). In the end, marketers want the journey to be as seamless as possible to enhance the customer experience. Customer experience is a multidimensional construct focusing on a customer’s responses to a firm’s offerings during the entire customer journey (Lemon and Verhoef, 2016). Multiple touchpoints in each phase of the sequence and past experiences can influence the future customer experience (Følstad and Kvale, 2018). This paper partially follows Herhausen et al. (2019) by defining the customer journey as customers’ search and purchase usage of all online and offline touchpoints from various sources.

2.2 Touchpoints

we speak of a touchpoint. This could be the search for a product on the website, social media interaction, or a physical visit to the store. Often, a customer iterates a few times over the same or different touchpoints (Bernard and Andritsos, 2017).

Touchpoints can be divided into firm-initiated contacts (FIC) (e.g., paid search) and customer-initiated contacts (CIC) (e.g., organic search). The difference between the two is who initiated the contact. When the firm initiates the contact through, e.g., advertising, it determines the timing and exposure. In the case of customer-initiated touchpoints, the customer triggers the communication (Anderl et al., 2016). Research shows that the origin of contact is an important differentiator for online marketing channels (De Haan, Wiesel, and Pauwels, 2016; Li and Kannan, 2014; Wiesel, Pauwels, and Arts, 2011). Attributing conversion has become more relevant since the discovery of the last click heuristic. Anderl et al. (2016) found that the contribution of firm-initiated touchpoints is often underestimated, and that of customer-initiated touchpoints is overestimated. Li and Kannan (2014) investigated carryover and spillover effects and found that conversion credit often differs significantly from the last click attribution. This provides relevant insights for marketers in terms of budget allocation. In this study, all interactions between customers and firms that involve any transactional or informational exchange, and that can be measured by web analytics and sales data, are captured in the definition of touchpoints. Although the importance of other touchpoints is acknowledged, it is impossible to include all of them in this study. Other studies have included competitor-owned touchpoints and partner/third party touchpoints, such as word-of-mouth (WOM) (Baxendale et al., 2015). However, since a firm is mostly able to influence its own touchpoints, this study focuses on the firm’s own touchpoints.

2.3 Multichannel vs. Omnichannel

characterized by the switching behavior of customers between channels in every stage of the journey. Moreover, customers have recently started to use different channels simultaneously (Bijmolt et al., 2019), making the omnichannel approach a central theme in marketing research (Verhoef et al., 2015). Omnichannel retailing is a relatively new term in the marketing context, which can cause some ambiguity, mainly since multichannel and omnichannel are often used interchangeably. Because this research studies the behavior across multiple touchpoints and channels before arriving at a purchase, it is necessary to explain how the term omnichannel was derived from the multichannel literature.

Multichannel is often referred to as the presence of multiple channels that work independently from each other (e.g., website and store). It shows its value through an enhanced customer experience (Lemon and Verhoef, 2016) and increased sales (Ansari, Mela, and Neslin, 2008; Venkatesan, Kumar, and Ravishanker, 2007). However, the multichannel view is often limited to the store, the online website, or the catalog, thereby neglecting other channels such as mobile channels and social media (Verhoef et al., 2015). Multichannel focuses on the separate channels, whereas omnichannel looks at the overlap of channels (cross-channel) to provide a seamless customer experience (Verhoef et al., 2015). Omnichannel distinguishes itself by the potential for channel switching across and within stages (Bijmolt et al., 2019). It is about creating an overall retailing experience that is the same across channels and touchpoints (Blom et al., 2017; Verhoef et al., 2015). This is important because channel integration positively relates to sales (Cao and Li, 2015) and leads to a competitive advantage (Herhausen et al., 2015).

this study aims at finding different states in the customer journey, it is relevant to take an omnichannel approach in this manner. The way a customer interacts with a company could potentially say a lot about how close someone is to conversion. Communications could influence both immediate and postponed purchases across multiple channels (Verhoef, Neslin, and Vroomen, 2007). In the same sense, a store itself could function as a fulfillment channel to convert pre-existing purchase intentions. It could also trigger conversions made in a later state (Baxendale et al., 2015). Following the previously mentioned literature, the following definition is used in this study: omnichannel retailing is creating an optimal synergetic customer experience across numerous channels and touchpoints. Note that this definition reflects on the touchpoints described in the previous section.

2.4 The Journey as a Funnel

use of a Markov chain to map the customer journey based on channels as well. De Haan et al. (2016) extend the literature by looking at the effects of online and offline channels. They model the stages as a website funnel, from the homepage to the checkout page.

Most of these funnels have in common that they consider a customer as initially disengaged and not deliberating a purchase. Then, the customer is confronted with different touchpoints from a retailer and moves into a state of awareness. Next, she transitions to a consideration stage to engage in orientation activities such as web browsing. Finally, the customer converts or not (Abhishek et al., 2012). After finalizing the purchase, the customer reaches the post-purchase stage, which covers aspects related to the brand or product that happen after the purchase (Frambach, Roest, and Krishnan, 2007; Lemon and Verhoef, 2016). This suggests the customer journey should be viewed as a loop rather than a funnel. The loop relates to the post-purchase path seen in repetitive purchases or the customer reentering the process, starting at the pre-purchase stage again (Court et al., 2009; Lemon and Verhoef, 2016). Modeling purchase funnels provides the foundation for thinking holistically about the customer experience as a process that customers go through (Lemon and Verhoef, 2016). Other studies show that a purchase funnel includes a stage prior to, during, and after a purchase (Neslin et al., 2006; Court et al., 2009; Lemon and Verhoef, 2016; Engel et al., 1978). This study is mainly focused on defining the first two parts of a customer journey (i.e., the path-to-purchase).

2.5 Clickstream Data

applying data mining techniques to the discovery of usage patterns (Srivastava et al., 2000). More recently, Park and Vasudev (2017) referred to WUM as analyzing Web users’ interaction data, including Web server access logs, user queries, and clicks, in order to discover Web users’ behavioral patterns. Many studies have already examined web service analytics. Early studies investigated website browsing behavior using clickstream data (Bucklin and Sismeiro, 2003; Park and Fader, 2004). Soon, researchers extended this by addressing conversions when analyzing web behavior (Moe and Fader, 2004; Montgomery et al., 2004). Also, advertising effects (Chatterjee, Hoffman, and Novak, 2003) were taken into account. More recent research focuses on predicting conversion (Park and Vasudev, 2017) and linking online conversion to timing and visit patterns (Park and Park, 2016). While most of these studies provide substantial insights into online behavioral patterns and conversion, they typically focus solely on online purchase behavior, often within the same visit. This study extends this view by incorporating offline conversions and by identifying important indicators that are of use in earlier stages of the purchase process. This tackles the often-made mistake of misattributing conversion to the last stage.

It is often seen that groups of users with similar characteristics are put together. These characteristics can be demographic or behavioral in nature. When a subgroup is described based on known characteristics, such as demographics, it is called segmentation (Wedel and Kamakura, 2012). When a subgroup is defined based on behavioral characteristics, this is called clustering (Mobasher and Liu, 2007). This work is related to Park and Park (2016) and Zhang, Bradlow, and Small (2015) in that it looks into the clumpiness (i.e., clustering phenomenon) of data patterns and relates it to customer behavior. The purpose of clustering users is to find a similar browsing pattern across sessions, mainly to derive business intelligence (Dias and Vermunt, 2007; Mobasher and Liu, 2007). In this case, the clusters represent a state in the customer journey. Within sessions, multiple characteristics can be investigated. Mallapragada, Chandukala, and Liu (2016) found that page views positively affect basket value and session duration negatively. De Haan et al. (2018) find that switching from mobile to desktop increases the conversion probability. Montgomery et al. (2004) show that navigational patterns through page categories can give valuable insights. In this study, similar characteristics are investigated.

2.6 The Hidden Markov Model

A HMM is a combination of a standard latent class model and a Markov chain. Basically, it is a latent class cluster model in which customers are allowed to switch between the clusters over time (Vermunt and Magidson, 2016). Only in this case, the clusters are called latent states. HMMs are used to model how a set of hidden states can be derived from one or multiple sequences of observations. In the context of marketing, the hidden state often refers to the customer’s state in purchasing behavior. Using a latent class analysis such as a HMM for clustering provides multiple advantages over traditional methods. First, latent class analysis helps in determining the number of clusters (in this case, states) based on the data (McLachlan and Peel, 2000). Second, HMMs can deal with different measurement levels in the data set (Vermunt and Magidson, 2016). Third, latent class analyses outperform traditional approaches (Vidden, Vriens, and Chen, 2016). This study adopts procedures similar to those of other marketing applications of the HMM on clickstream data. The states in the customer journey are unknown upfront, and the influence of clickstream related characteristics on the purchase funnel is derived from the data.

This study is related to the previous literature in terms of mapping the customer journey, clickstream data analyses, and by using a HMM to capture dynamics. A novel point is that this work incorporates offline transaction data into online customer sessions. Another unique point is that the sessions will function as a time variable, instead of mainstream time variables such as days or weeks. Lastly, this study deals with latent states of users rather than predefined purchase funnel stages. This allows an exploratory view of the data and limits tunnel vision. Note that the term stages is often used when defining a purchase funnel framework. This study develops a HMM and therefore the phase of a journey is referred to as state. Drawing on the research questions, given in the introduction of this research, the following two hypotheses will form the basis of this study:

H1: The proposed model will yield significantly different states in the customer journey based on behavioral patterns in the clickstream data.

2.7 Conceptual Framework

To visualize the hypotheses suggested in this study, figure 2.1 provides an overview of the design of this study. From bottom to top: First, the clickstream behavior, including the conversion, will be analyzed. It is expected that different sessions show different behavioral patterns. Thus, this behavior will be compared throughout the various sessions. Lastly, based on the dynamic behavior throughout sessions, multiple states in the customer journey will be identified, described, and compared.

Figure 2.1: Conceptual overview of this study

– George E.P. Box, Statistician

3

Research design

The previous chapter gave the theoretical framework of the customer journey, clickstream data, and some global background theory on HMMs. This chapter explains how this study was executed and dives deeper into how the proposed HMM was built. As implied above, it is impossible to create a perfect model. However, this study aims at building a useful model that delivers new insights into omnichannel web analytics. Hence, the objectives of this study are (1) to find different states in the customer journey based on omnichannel behavior patterns using a HMM and (2) to make clear which indicators weigh heaviest in each state. First, the origin of the data set is described. Second, the model specification and estimation procedure of the proposed model are given.

3.1 Data

The data for this study was delivered by a Dutch digital commerce bureau. The database consists of clickstream data from a Dutch furniture retailer, collected over a period of 10 weeks, from December 18, 2019, till February 29, 2020. The data was collected by Google Analytics. In total, four types of data were collected. (1) Over 1 million unique sessions were stored, in which information such as channel, device, and session duration was collected. A session starts when a user enters the website and ends when the user leaves the website, after 30 minutes of inactivity, or at midnight. (2) Online and offline transactions were collected in a different database and aggregated at session level. Offline conversions were linked through a confirmation mail. In case of an offline transaction, users need to determine a moment of delivery via mail. When they do this, Google Analytics automatically combines the offline transaction and the online session. This enables insights into online-offline switching behavior. An online session might, for example, end at a store locator page, with an offline conversion as a result. This could be proof of web-rooming behavior (Brynjolfsson et al., 2013). (3) Approximately 30 million rows of clickstream data were stored. The log file recorded all online clicking behaviors in the form of URLs and events. This is in line with web analytics research (Kawazu et al., 2016; Li et al., 2020; Montgomery et al., 2004). The data was structured using timestamps and session IDs. This data was mainly used to determine the type and number of webpages visited within a session. (4) Besides visiting specific pages, users can also trigger events, which are measured by Google Analytics. All events were incorporated in dummy form: for every session, a user has either triggered the event or not. The differently aggregated databases were merged into one. The user IDs are used to link all sessions to the same user. This enabled the researcher to track a user’s web behavior over time. An overview of the databases is provided in table 3.1.

Table 3.1: Databases and Aggregation Levels

Aggregation level: UserId
Time span: 18/12/2019 - 29/02/2020

Database   Sessions (1)       Transactions (2)   PagePath (3)   Events (4)
Columns    UserId             SessionId          SessionId      SessionId
           channelGrouping    Online_Revenue     TimeStamp      Appointment
           deviceCategory     Offline_Revenue    pagePath       Magazine
           pageViews                                            Newsletter
           uniquePageviews                                      Storefinder
           sessionDuration
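The merging step described in this section can be illustrated with a minimal sketch in plain Python. It is not the procedure actually used for this thesis; it only shows the idea of joining session-level transactions and event dummies onto sessions and grouping the result per user. The column names follow table 3.1, but all values are made up, the `event` key of the raw event rows is hypothetical, and it is assumed that the sessions database also carries a SessionId to join on and that sessions are listed in chronological order.

```python
from collections import defaultdict

# Hypothetical rows: column names follow table 3.1, values are invented.
sessions = [
    {"UserId": "u1", "SessionId": "s1", "deviceCategory": "mobile", "pageViews": 3},
    {"UserId": "u1", "SessionId": "s2", "deviceCategory": "desktop", "pageViews": 8},
    {"UserId": "u2", "SessionId": "s3", "deviceCategory": "mobile", "pageViews": 1},
]
transactions = [
    {"SessionId": "s2", "Online_Revenue": 0.0, "Offline_Revenue": 499.0},
]
# Assumed raw form of the event log: one row per triggered event.
events = [
    {"SessionId": "s1", "event": "Newsletter"},
    {"SessionId": "s2", "event": "Storefinder"},
    {"SessionId": "s2", "event": "Storefinder"},
]
EVENT_TYPES = ["Appointment", "Magazine", "Newsletter", "Storefinder"]


def merge_databases(sessions, transactions, events):
    """Left-join transactions and event dummies onto sessions, grouped per user."""
    revenue = {t["SessionId"]: t for t in transactions}
    triggered = defaultdict(set)
    for e in events:
        triggered[e["SessionId"]].add(e["event"])

    journeys = defaultdict(list)
    for s in sessions:  # assumed to be in chronological order
        sid = s["SessionId"]
        row = dict(s)
        tx = revenue.get(sid, {})
        # Sessions without a linked transaction get zero revenue.
        row["Online_Revenue"] = tx.get("Online_Revenue", 0.0)
        row["Offline_Revenue"] = tx.get("Offline_Revenue", 0.0)
        for name in EVENT_TYPES:
            # Dummy coding: 1 if the event was triggered at least once in the session.
            row[name] = int(name in triggered[sid])
        journeys[s["UserId"]].append(row)
    return dict(journeys)


journeys = merge_databases(sessions, transactions, events)
```

Each entry of `journeys` is then one user’s ordered sequence of sessions, which is the shape of input a session-indexed sequence model needs.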

the data set contains four out of the five elements for an ideal CRM database (Winer, 2001): transactions, omnichannel customer contacts, response to marketing stimuli, and longitudinal data. The data only lacks descriptive (demographic) information.

3.2 Model Specification

This section builds on chapter 14 of the book ‘Advanced Methods of Modeling Markets’ (Leeflang et al., 2017), written by Netzer et al. (2017). They extended the general hidden Markov framework of Zucchini et al. (2017) into a marketing application. This study proposes a HMM on clickstream data. The goal is to capture the dynamics of a sequence of observations over a longer period of time. The time sequence is hereby the important distinguishing factor. In this case, the time sequence is captured by the number of sessions, instead of regular time sequences such as weeks or days. The captured outcomes per session are characteristics such as channel, device, and page views. Importantly, for the proposed HMM, it is assumed that the customer is aware of her latent state and acts according to the state. For the researcher, only the sequence of observations is available. Therefore, the hidden states, the transitioning between the states, the observations given a state, and the number of hidden states K need to be derived from the data. In figure 3.1, a graphical overview of the proposed HMM is given.

The HMM describes a customer’s transition among a finite set of latent states in the customer journey over a longer period of time. Latent means that the states are not observed. Each customer belongs to a certain state $S$ at a certain moment in time $t$ but can switch between states over time. The states a customer goes through in the customer journey are denoted as $S_i = [S_{i1}, S_{i2}, \ldots, S_{iT}]$. However, since the states are hidden, we only observe a set of outcomes $Y_i = [Y_{i1}, Y_{i2}, \ldots, Y_{iT}]$ for every individual in the set of customers ($i = 1, \ldots, N$) over the time periods $t = 1, \ldots, T$. The number of states is derived from the data. The transition between the hidden states is characterized by a Markovian process (Netzer et al., 2008). That is, the current state $s$ of customer $i$ in time period $t$ depends only on the state in the previous time period $t - 1$. The proposed HMM consists of three parts, which are elaborated in the next sections. A customer can transit among $K$ states over $T$ time periods. Mathematically, this can be written as:

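$$P(Y_{i1}, Y_{i2}, \ldots, Y_{iT}) = \sum_{s_1, s_2, \ldots, s_T} P(S_{i1} = s_1) \prod_{\tau=2}^{T} P(S_{i\tau} = s_\tau \mid S_{i\tau-1} = s_{\tau-1}) \times \prod_{\nu=1}^{T} P(Y_{i\nu} \mid S_{i\nu} = s_\nu) \qquad (3.1)$$

Where: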

• The initial state distribution $P(S_{i1} = s_1)$, $s_1 = 1, 2, \ldots, K$, is represented by a $1 \times K$ row vector $\pi$.

• The transitional probabilities $P(S_{it+1} = s_{t+1} \mid S_{it} = s_t)$ for $s_{t+1}, s_t = 1, 2, \ldots, K$ are represented in $Q_t$, a $K \times K$ transition matrix.

• The state-dependent distributions of observed activity $P(Y_{it} \mid S_{it} = s_t)$, $s_t = 1, 2, \ldots, K$, are represented in $M_t$, a $K \times K$ matrix.
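To make equation 3.1 concrete, the sketch below evaluates it in two ways for one customer: literally, by summing over every possible state path, and with the standard forward recursion, which returns the same value without enumerating paths. This is an illustration under stated assumptions, not the estimation code used in this thesis. The numbers are invented, and the state-dependent probabilities $P(Y_{it} \mid S_{it} = s)$ are passed in directly as `emission_probs` (a hypothetical name) instead of being computed from a fitted distribution.

```python
from itertools import product


def likelihood_brute_force(pi, Q, emission_probs):
    """Evaluate equation 3.1 literally by summing over every state path.

    pi             : list of K initial state probabilities
    Q              : K x K nested list, Q[s][s_next] = P(next = s_next | current = s)
    emission_probs : T x K nested list, emission_probs[t][s] = P(Y_it | S_it = s)
    """
    K, T = len(pi), len(emission_probs)
    total = 0.0
    for path in product(range(K), repeat=T):
        p = pi[path[0]]                      # initial state term
        for t in range(1, T):
            p *= Q[path[t - 1]][path[t]]     # transition terms
        for t in range(T):
            p *= emission_probs[t][path[t]]  # state-dependent terms
        total += p
    return total


def likelihood_forward(pi, Q, emission_probs):
    """Same quantity via the forward recursion, in O(T * K^2) time."""
    K = len(pi)
    alpha = [pi[s] * emission_probs[0][s] for s in range(K)]
    for obs in emission_probs[1:]:
        alpha = [
            sum(alpha[prev] * Q[prev][s] for prev in range(K)) * obs[s]
            for s in range(K)
        ]
    return sum(alpha)


# Hypothetical two-state example with three sessions.
pi = [0.7, 0.3]
Q = [[0.8, 0.2],
     [0.1, 0.9]]
emission_probs = [[0.5, 0.1],
                  [0.4, 0.3],
                  [0.2, 0.6]]

brute = likelihood_brute_force(pi, Q, emission_probs)
forward = likelihood_forward(pi, Q, emission_probs)
```

The brute-force version visits $K^T$ paths and is only feasible for tiny examples, which is why a recursion of this kind is normally preferred when the likelihood has to be evaluated many times.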

3.2.1 Initial State Distribution

state distribution. The vector of initial state probabilities is $\pi' = [\pi_1, \pi_2, \ldots, \pi_K]$, where $\pi_s = P(S_{i1} = s)$, $s = 1, 2, \ldots, K$, is the probability of a customer being in state $s$ in the first time period. Since $\pi_s$ is a probability, it should always be between 0 and 1 ($0 \le \pi_s \le 1$), and the elements of $\pi$ always sum to 1 ($\sum_{s=1}^{K} \pi_s = 1$). Multiple options for specifying the vector of initial state probabilities are possible. The most general form is to estimate $\pi$ directly (Netzer et al., 2017). The estimation is done by using a vector of $K - 1$ parameters, since the last probability follows from the constraint that the probabilities sum to 1. In order for this method to function well, it is assumed that all customers in the data set started their journey within the time period.
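One common way to obtain $K$ valid probabilities from $K - 1$ free parameters is a multinomial-logit (softmax) transform with one reference state. The sketch below illustrates that idea only; whether the software used in this study parameterizes $\pi$ in exactly this way is an assumption, and the function name and parameter values are hypothetical.

```python
import math


def initial_state_probs(free_params):
    """Map K - 1 unconstrained parameters to K probabilities that sum to 1.

    Assumption: a multinomial-logit (softmax) transform with the last state
    as the reference category, whose parameter is fixed at 0.
    """
    logits = list(free_params) + [0.0]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - max_logit) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameter values for K = 4 states (three free parameters).
pi = initial_state_probs([1.2, 0.4, -0.3])
```

Whatever values an optimizer proposes for the free parameters, the result stays between 0 and 1 and sums to 1, so the constraints on $\pi$ never have to be enforced separately.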

3.2.2 Transitional Probabilities

The flow through different states is normally not identical for every customer. Customers might jump from the first state directly to the last state or never reach the last state at all. The chances of moving from one state to another are given by the conditional probabilities $P(S_{i,t+1} \mid S_{it})$, where the probability of a customer being in a given state at time $t + 1$ depends on the state at time $t$. The transition between the states is represented by a $K \times K$ transition matrix of conditional probabilities, $Q$ (see equation 3.2). The more states, the larger the matrix. The first row in $Q$ contains the conditional probabilities of a customer moving to each of the $K$ states, given that their current state is the first. That means $q_{11}$ represents the conditional probability $P(S_{i,t+1} = 1 \mid S_{it} = 1)$, $q_{12}$ the probability $P(S_{i,t+1} = 2 \mid S_{it} = 1)$, and so on. The same principle applies to all rows in the matrix. In general, it holds that $q_{s_t s_{t+1}} = P(S_{i,t+1} = s_{t+1} \mid S_{it} = s_t)$ for $s_{t+1}, s_t = 1, 2, \ldots, K$. Because every element in matrix $Q$ is a probability, the numbers are between 0 and 1 ($0 \le q_{ss'} \le 1 \;\forall\; s$ and $s'$), and every row of the matrix adds up to 1 ($\sum_{s'=1}^{K} q_{ss'} = 1 \;\forall\; s$).

In this study, the transition matrix is homogeneous with $Q_t = Q$ for $t = 1, 2, \ldots, T$,

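\[
Q =
\begin{array}{c|cccc}
 & s_1 & s_2 & \cdots & s_K \\ \hline
s_1 & q_{11} & q_{12} & \cdots & q_{1K} \\
s_2 & q_{21} & q_{22} & \cdots & q_{2K} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_K & q_{K1} & q_{K2} & \cdots & q_{KK}
\end{array}
\tag{3.2}
\]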

3.2.3 State Dependent Distribution

When modeling the state-dependent distributions, the observed data $Y$, which depend on the latent state at time $t$, are leading. The probability distribution of $Y_{it}$, $P(Y_{it} \mid S_{it})$, depends solely on the current state, given that the latent state $S_{it}$ is known. That means temporal dependencies of observations are completely driven by the latent state membership over time. Conditional on a customer’s state, the observations are independent over time. The parameters are derived from the data and used to detect significant differences between the states. This is the most flexible component, as many variables can function as the observed outcome $Y_{it}$. In the proposed HMM, multiple outcomes, called indicators, are observed per given state. First, variables with a multinomial distribution are included (Conversion, channelGrouping and deviceCategory). More on this in Chapter 4. Multinomial variables are modeled using a multinomial logit. A similar approach is used for count data, except that these variables follow a Poisson distribution. The count variables used are sessionDuration and all pageview-related variables (see Chapter 4). Often, covariates are included in the model to measure effects on the initial states and the transition matrix. However, this study aims at finding indicators per state. Hence, no covariates are included. Since state-dependent distributions differ across the $K$ states, one set of coefficients $\beta_{s_t}$ is defined for each state $s_t = 1, 2, \ldots, K$. Matrix $M_t$ in equation 3.3 collects the probabilities of a customer at time $t$ as a $K \times K$ diagonal matrix. The elements $P(Y_{it} \mid S_{it} = s_t)$ are placed on the diagonal.
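To make the diagonal of M_t concrete, the sketch below combines one multinomial indicator (deviceCategory) and one Poisson count (pageViews). It assumes that indicators are mutually independent given the state, so that their probabilities multiply; this local-independence assumption and the function names are not stated in the text and are used for illustration only. The probabilities and means are the estimates reported in Table 4.8.

```python
import math


def poisson_pmf(count, mean):
    """P(Y = count) for a Poisson variable, computed on the log scale."""
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))


def state_dependent_diagonal(device, page_views, device_probs, page_view_means):
    """Diagonal of M_t: P(Y_it | S_it = k) for k = 1..K.

    Assumes the indicators are independent given the state, so the joint
    probability is the product of the per-indicator probabilities.
    """
    return [
        device_probs[k][device] * poisson_pmf(page_views, page_view_means[k])
        for k in range(len(page_view_means))
    ]


# Values taken from Table 4.8 (deviceCategory probabilities, pageViews means).
device_probs = [
    {"Desktop": 0.2487, "Tablet": 0.1142, "Mobile": 0.6371},
    {"Desktop": 0.2845, "Tablet": 0.1479, "Mobile": 0.5676},
    {"Desktop": 0.3137, "Tablet": 0.1545, "Mobile": 0.5317},
    {"Desktop": 0.3346, "Tablet": 0.1445, "Mobile": 0.5209},
]
page_view_means = [4.1509, 10.522, 19.357, 29.104]

# A short mobile session versus a long desktop session.
short_mobile = state_dependent_diagonal("Mobile", 3, device_probs, page_view_means)
long_desktop = state_dependent_diagonal("Desktop", 28, device_probs, page_view_means)
most_likely_short = short_mobile.index(max(short_mobile)) + 1
most_likely_long = long_desktop.index(max(long_desktop)) + 1
```

Under these assumptions the short mobile session is most probable in state 1 and the long desktop session in state 4. This ignores the initial and transition probabilities, which the full model also takes into account.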

3.3 Estimation Procedure

The estimation was done in the software program Latent Gold 5.1 (LG) (Vermunt and Magidson, 2016). Indicators and distributions of the variables were selected, and the parameters were estimated. Lastly, information criteria determined the best fitting model.

3.3.1 EM Algorithm

In LG, parameter estimates are obtained by the EM (Baum-Welch) algorithm (Dempster et al., 1977; Paas et al., 2007; Netzer et al., 2017). The algorithm leads to the maximum likelihood for the occurrence of the manifest patterns (Paas et al., 2007). The idea is to treat the latent state memberships as missing data. An iterative algorithm finds the parameters that maximize the likelihood function (Netzer et al., 2017). Since LG carries out these steps internally, the algorithm is not elaborated further here. Once the parameters were estimated, they were judged for significance. However, for interpretation, this study preferred to look at the probabilities and means of the variables. For multinomial data, probabilities give a clear overview of the distribution among the outcomes. For count data, means show the most-visited pages and relative differences between categories.
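For readers who want to see the idea of treating state memberships as missing data, the following is a minimal Baum-Welch sketch in Python for a single categorical indicator. It is not LG's implementation; the function names, starting values and toy sequences are hypothetical. The E-step computes posterior state memberships with a scaled forward-backward pass, and the M-step turns the expected counts into new probabilities.

```python
import math


def forward_backward(pi, Q, B, ys):
    """Scaled forward-backward pass for one sequence.

    Returns the log-likelihood, the posterior memberships gamma[t][k] and the
    expected transition counts xi[j][k] summed over time.
    """
    K, T = len(pi), len(ys)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[k] * B[k][ys[0]] for k in range(K)]
        else:
            a = [sum(alpha[t - 1][j] * Q[j][k] for j in range(K)) * B[k][ys[t]]
                 for k in range(K)]
        c = sum(a)
        alpha.append([x / c for x in a])
        scale.append(c)
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for j in range(K):
            beta[t][j] = sum(Q[j][k] * B[k][ys[t + 1]] * beta[t + 1][k]
                             for k in range(K)) / scale[t + 1]
    gamma = [[alpha[t][k] * beta[t][k] for k in range(K)] for t in range(T)]
    xi = [[0.0] * K for _ in range(K)]
    for t in range(T - 1):
        for j in range(K):
            for k in range(K):
                xi[j][k] += (alpha[t][j] * Q[j][k] * B[k][ys[t + 1]]
                             * beta[t + 1][k] / scale[t + 1])
    return sum(math.log(c) for c in scale), gamma, xi


def em_step(pi, Q, B, sequences):
    """One EM iteration; also returns the log-likelihood of the input parameters."""
    K, M = len(pi), len(B[0])
    new_pi = [0.0] * K
    trans = [[0.0] * K for _ in range(K)]
    emit = [[0.0] * M for _ in range(K)]
    loglik = 0.0
    for ys in sequences:  # E-step: expected memberships per customer
        ll, gamma, xi = forward_backward(pi, Q, B, ys)
        loglik += ll
        for k in range(K):
            new_pi[k] += gamma[0][k]
            for j in range(K):
                trans[k][j] += xi[k][j]
            for t, y in enumerate(ys):
                emit[k][y] += gamma[t][k]
    # M-step: normalise the expected counts into probabilities
    new_pi = [x / len(sequences) for x in new_pi]
    new_Q = [[x / sum(row) for x in row] for row in trans]
    new_B = [[x / sum(row) for x in row] for row in emit]
    return new_pi, new_Q, new_B, loglik


# Toy data: 0 = no conversion, 1 = conversion; at least three sessions per user.
sequences = [[0, 0, 1], [0, 1, 1, 1], [0, 0, 0, 1, 1], [1, 1, 1], [0, 0, 0]]
pi, Q, B = [0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], [[0.8, 0.2], [0.3, 0.7]]
logliks = []
for _ in range(10):
    pi, Q, B, ll = em_step(pi, Q, B, sequences)
    logliks.append(ll)
```

The recorded log-likelihoods never decrease from one iteration to the next, which is the defining property of EM. The algorithm can still end in a local maximum, so in practice several sets of starting values are usually tried.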

3.3.2 Model Comparison

the solutions, they will be moved to act.” – Bill Gates, CEO of Microsoft

4

Application to the Data

Up to now, the theoretical background and the practical elaboration of the data collection and model building have been given. This section describes the empirical application of the proposed HMM on the available data. The goal is (1) to show how the model is able to capture the dynamics in the different states of the omnichannel customer journey and (2) to provide meaningful insights on the factors derived from the model. In order, the process of data preparation, a description of the sample, a model comparison, and the results of the chosen model are described. Lastly, robustness checks are done.

4.1 Data Cleaning

First, the data is cleaned. This is done in the statistical program R. In general, due to the large amount of data and the small number of irregularities, it was chosen not to impute data. The data set is aggregated at session level, where every row represents a session. Multiple rows are assigned to the same userId, and page-related information is added to the session rows. First, duplicated sessions were deleted. Next, userIds with one or two sessions were deleted since they do not represent a time sequence. Also, sessions with missing values and malformed userIds were left out. Further, sessions with a duration of less than ten seconds were labeled as bounces and deleted, since they do not represent a substantial part of the customer journey. During the merging, not all sessionIds could be retrieved from all four databases (see section 3.1). Only matched sessionIds were retained. Lastly, a threshold of three sessions per user was applied in order to represent a time sequence. The final database consists of 194,040 sessions and 42,471 users (see figure 4.1).
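The cleaning rules can be summarized in a few lines of code. The thesis performed this step in R; the sketch below is written in Python purely for illustration, and the record layout, the function name and the order in which the rules are applied are assumptions.

```python
from collections import Counter


def clean_sessions(sessions, min_duration=10, min_sessions=3):
    """Apply the cleaning rules to a list of session records (dicts).

    Each record is assumed to carry the keys 'sessionId', 'userId' and
    'sessionDuration' (in seconds).
    """
    seen, kept = set(), []
    for s in sessions:
        if s["sessionId"] in seen:  # drop duplicated sessions
            continue
        seen.add(s["sessionId"])
        if s["userId"] is None or s["sessionDuration"] is None:
            continue  # drop sessions with missing values
        if s["sessionDuration"] < min_duration:  # drop bounces
            continue
        kept.append(s)
    per_user = Counter(s["userId"] for s in kept)
    # keep only users with enough sessions to form a time sequence
    return [s for s in kept if per_user[s["userId"]] >= min_sessions]


raw = [
    {"sessionId": "a1", "userId": "u1", "sessionDuration": 120},
    {"sessionId": "a1", "userId": "u1", "sessionDuration": 120},  # duplicate
    {"sessionId": "a2", "userId": "u1", "sessionDuration": 45},
    {"sessionId": "a3", "userId": "u1", "sessionDuration": 300},
    {"sessionId": "b1", "userId": "u2", "sessionDuration": 5},  # bounce
    {"sessionId": "b2", "userId": "u2", "sessionDuration": 60},
    {"sessionId": "b3", "userId": "u2", "sessionDuration": 90},
]
cleaned = clean_sessions(raw)
kept_ids = [s["sessionId"] for s in cleaned]
```

In this toy input user u1 keeps three sessions, while user u2 drops below the threshold once the bounce is removed and is therefore excluded entirely.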

Figure 4.1: Data Cleaning Funnel

4.2 Variable Transformation

As analyzing every page individually would become too extensive, it was decided to classify pages, in line with Montgomery et al. (2004). Hence, the pagePath column was searched for keywords, and pages were assigned to categories. To illustrate, table 4.1 shows all categories and an example page. Page visits per category were added to the data.

Table 4.1: Example pagePath and Classified Categories

SessionId                 pagePath                            Category

1555980068532.58iywem     /                                   Homepage
1555980068532.58iywem     /search/                            SearchPage
1555980068532.58iywem     /wonen-banken-hoekbanken            CategoryPage
1575155559887.6wycoada    /p/3-zits-more-grijs-leer/45578/    ProductPage
1575155559887.6wycoada    /account/login/                     AccountPage
1575155559887.6wycoada    /advies/wat-is-notenhout/           AdvicePage
1551899257485.6mx1ipw     /inpsiratie                         InspirationPage
1551899257485.6mx1ipw     /meubel-outlet                      Sale
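The keyword search on pagePath can be sketched as an ordered list of rules. The patterns below are hypothetical and are inferred only from the example rows in table 4.1; the actual keyword list used in the study is not given, and categories without an example row (such as Checkout) are left out rather than guessed.

```python
import re
from collections import Counter

# Hypothetical keyword rules; the first matching rule wins, so the more
# specific patterns are listed first.
RULES = [
    (re.compile(r"^/$"), "Homepage"),
    (re.compile(r"^/search/"), "SearchPage"),
    (re.compile(r"^/p/"), "ProductPage"),
    (re.compile(r"^/account/"), "AccountPage"),
    (re.compile(r"^/advies/"), "AdvicePage"),
    (re.compile(r"^/inpsiratie"), "InspirationPage"),  # spelled as in table 4.1
    (re.compile(r"outlet"), "Sale"),
    (re.compile(r"^/wonen-"), "CategoryPage"),
]


def classify_page(page_path):
    """Return the category of a pagePath, or 'Other' if no rule matches."""
    for pattern, category in RULES:
        if pattern.search(page_path):
            return category
    return "Other"


def category_counts(page_paths):
    """Page visits per category for one session."""
    return Counter(classify_page(p) for p in page_paths)


session_pages = ["/", "/search/", "/wonen-banken-hoekbanken",
                 "/p/3-zits-more-grijs-leer/45578/", "/p/3-zits-more-grijs-leer/45578/"]
counts = category_counts(session_pages)
```

The per-session counts produced this way correspond to the page category variables that enter the model as count indicators.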

Moreover, four events are incorporated into the tracking of user behavior (see table 4.2). The appointment dummy is triggered when users request an appointment on the website. Appointments can be made to get advice or pick up products in-store. The magazine dummy is triggered when people request the magazine, possibly to get inspired or to get more information (Böttger et al., 2017). Similarly, there is the newsletter dummy, and lastly, the store finder dummy is triggered when users use the online tool to find physical stores nearby. This event can also be triggered from a product page.

Table 4.2: Event Dummies

SessionId                 Event Type     0/1

1548976261142.z4na7zxb    Appointment    0
1548976261142.z4na7zxb    Magazine       0
1548976261142.z4na7zxb    Newsletter     0
1548976261142.z4na7zxb    Storefinder    1

Table 4.3: Modification variable channelGrouping

Original Category      New Category      Initiated by

Criteo                 Advertising       Firm
Display                Advertising       Firm
E-mail                 Social            Firm
Paid Social            Social            Firm
Social                 Social            Customer
Branded Paid Search    Paid Search       Customer
Generic Paid Search    Paid Search       Customer
Organic Search         Organic Search    Customer
Direct                 Direct            Customer
Affiliates             Referral          Customer
Referral               Referral          Customer
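The recoding in table 4.3 amounts to a lookup from the eleven original channel levels to six new levels plus an initiator label. A minimal Python sketch; the dictionary and function names are hypothetical, the content is copied from the table.

```python
# original level -> (new level, initiated by), copied from table 4.3
CHANNEL_MAP = {
    "Criteo": ("Advertising", "Firm"),
    "Display": ("Advertising", "Firm"),
    "E-mail": ("Social", "Firm"),
    "Paid Social": ("Social", "Firm"),
    "Social": ("Social", "Customer"),
    "Branded Paid Search": ("Paid Search", "Customer"),
    "Generic Paid Search": ("Paid Search", "Customer"),
    "Organic Search": ("Organic Search", "Customer"),
    "Direct": ("Direct", "Customer"),
    "Affiliates": ("Referral", "Customer"),
    "Referral": ("Referral", "Customer"),
}


def recode_channel(original):
    """Return (new channelGrouping level, initiator) for an original level."""
    return CHANNEL_MAP[original]


new_levels = sorted({new for new, _ in CHANNEL_MAP.values()})
firm_initiated = sorted(k for k, (_, who) in CHANNEL_MAP.items() if who == "Firm")
```

Note that the new level Social mixes firm-initiated (E-mail, Paid Social) and customer-initiated (Social) contacts, so the initiator label cannot be derived from the six-level variable alone.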

Although an HMM is able to process different kinds of response distributions, such as normal or Poisson, it was decided to transform the response variable, conversion, into a multinomial variable with three outcomes: online conversion, offline conversion, and no conversion. The reason is that the goal is to gain insights into switching behavior between the types of conversion rather than into the number of conversions itself.

4.3 Sample Description

Table 4.4: Sample Description

Characteristics                    Count

Users                              42,471
Sessions                           194,040
Users with Conversion              1,465
Sessions with Conversion           1,543
Users with multiple Conversions    67
Mean Sessions per User             4.57
Median Sessions per User           4
Mode Sessions per User             3
Minimum Number of Sessions         3
Maximum Number of Sessions         108

In the sample, three types of variables with different distributions are included. All variables are used as indicators in the HMM. That means that these variables are used to characterize the state a user is in. Additionally, a time variable is included. This time-varying variable labels the sequence by assigning the session number (i.e., 1st session, 2nd session, etc.).

represents the device used per session. Mobile represents most of the sessions, followed by desktop. Tablet is used least.

(2) Count. Ordinal data variables with a Poisson distribution. These discrete variables are non-negative and reflect the number of occurrences of specific events within the time period (Netzer et al., 2017). Session Information in table 4.5 shows three general count variables related to a session: page views, unique page views, and session duration. On average, people view about ten pages per session, of which about seven are unique. The average session duration is six to seven minutes, which is considerably longer than the benchmark of two to three minutes. This metric can provide insights into patterns but should be interpreted carefully since it is based on other average values in Google Analytics (Cohen, 2016). The second set of count variables are the Page Categories. These categories relate to the ones in section 4.2. A couple of things stand out. First, CategoryPage and ProductPage are visited far more often than other pages. This is in line with expectations since the main focus of retailers is to sell products. Second, Homepage is visited relatively rarely, indicating that many customers enter the website on other webpages. For example, they click on an ad that sends them directly to a product page. Lastly, AdvicePage is the least represented category, showing that these functions are used substantially less than, for example, inspiration pages.

Table 4.5: State Indicating Variables

Conversion             Count      Rate

Online Conversion      1,410      0.73%
Offline Conversion     133        0.07%
No Conversion          192,497    99.2%

Characteristics    Levels            Count      Percentage

channelGrouping    Advertising       17,837     9.19%
                   Paid Search       82,636     42.59%
                   Social            24,362     12.56%
                   Organic Search    46,034     23.72%
                   Direct            15,445     7.96%
                   Referral          6,743      3.47%
deviceCategory     Desktop           53,499     27.57%
                   Tablet            25,799     13.3%
                   Mobile            114,742    59.13%

Session Information             Mean     Maximum

pageViews                       10.18    124
uniquePageviews                 6.93     86
sessionDuration (in seconds)    402      12,151

Page Categories    Count        Page Categories    Count

Homepage           105,126      SearchPage         104,554
CategoryPage       1,282,359    InspirationPage    120,950
ProductPage        1,060,979    AccountPage        34,356
AdvicePage         27,782       Sale               96,701
Checkout           48,952

Events             Count        Events             Count

Appointment        102          Newsletter         2,425
Magazine           299          Storefinder        25,613

4.4 Estimated Models

The first step in estimating the HMM is determining the number of states. Different models with a different number of states were set up, to be able to compare model performance. All models contained the same indicators and the same time variable (see table 4.5). The comparison is made employing the number of parameters, the log-likelihood, the BIC, and the CAIC (see table 4.6). When judging the measures, it is seen that the BIC and CAIC keep improving when adding states. The large set of data explains this. When much data is available, it is easy to point out minor differences within a state, and divide the state into two, leaving a ‘better’ model.

Table 4.6: Model Comparison

Hidden States    Parameters    BIC             CAIC

1                26            126731552.94    126731578.94
2                57            45789015.98     45789072.98
3                92            28760818.81     28760910.81
4                131           22388219.78     22388350.78
5                174           19064679.12     19064853.12
6                221           17147513.89     17147734.89
7                272           16003748.28     16004020.28
8                343           15280901.87     15281228.87
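Two quick checks can be run on table 4.6. The sketch below assumes the usual log-likelihood-based definitions BIC = −2LL + p·ln(N) and CAIC = −2LL + p·(ln(N) + 1), under which CAIC − BIC equals the number of parameters p; these definitions are an assumption, since the formulas are not printed in this section. It also computes the relative BIC reduction per added state, which is one way to quantify the diminishing returns visible in figure 4.2. The function names are hypothetical.

```python
# (hidden states, parameters, BIC, CAIC) exactly as printed in table 4.6
TABLE_4_6 = [
    (1, 26, 126731552.94, 126731578.94),
    (2, 57, 45789015.98, 45789072.98),
    (3, 92, 28760818.81, 28760910.81),
    (4, 131, 22388219.78, 22388350.78),
    (5, 174, 19064679.12, 19064853.12),
    (6, 221, 17147513.89, 17147734.89),
    (7, 272, 16003748.28, 16004020.28),
    (8, 343, 15280901.87, 15281228.87),
]


def implied_parameters(bic, caic):
    """CAIC - BIC equals the parameter count under the assumed definitions."""
    return round(caic - bic)


def relative_bic_gain(table):
    """Fractional BIC reduction obtained by each additional state."""
    return [(prev[2] - cur[2]) / prev[2] for prev, cur in zip(table, table[1:])]


# Rows where the printed parameter count differs from CAIC - BIC.
mismatches = [states for states, n_par, bic, caic in TABLE_4_6
              if implied_parameters(bic, caic) != n_par]
gains = relative_bic_gain(TABLE_4_6)
```

For the models with one to seven states the printed parameter counts agree with CAIC − BIC. For the eight-state model the difference is 327 while the table lists 343, which may point to a transcription error in one of the columns. The relative BIC gain shrinks with every added state, which supports looking beyond the information criteria alone when choosing the number of states.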

Figure 4.2: Visualization BIC Scores

As an extra check, the additional value of an added state is looked at. After adding a fifth state, the additional initial state probability of belonging to that state becomes small, which implies that the value of an added state is quite low (.0348), whereas the addition of state 4 gives a probability of .0696. Based on this argumentation, it is decided that a 4-state model fits the data best. In the next section, the estimation results of this model are given.

4.5 Estimation Results

This section discusses the actual results of the study. In line with the three components of the proposed HMM, first, the initial state probabilities and the transition matrix are given. Then, the state-dependent distribution of the parameters is given. Based on these parameters, the states are described.

4.5.1 Initial State Probabilities & Transition Matrix

.34, .17 and .07. This shows that part of the customers start in later stages of the customer journey, possibly because the first states fall outside the data frame, or because a customer is already familiar with the firm and skips the first states in the journey.

Table 4.7: Initial State Distribution and Transition Matrix

Initial State Distribution

State (t=0)    1         2         3         4
Probability    0.4238    0.3352    0.1714    0.0696

Transition Matrix

               State(i)
State (i-1)    1         2         3         4
1              0.5370    0.4560    0.4139    0.3729
2              0.2845    0.3243    0.3052    0.2901
3              0.1282    0.1566    0.1949    0.2027
4              0.0503    0.0631    0.0860    0.1343
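The initial distribution and the transition matrix together determine the expected share of customers per state in the next session. In the numbers printed in table 4.7, each column (not each row) sums to one. The Python sketch below therefore assumes that column k holds the distribution over the next state given that the current state is k; this reading is an assumption based on the column sums, and the function name is hypothetical.

```python
initial = [0.4238, 0.3352, 0.1714, 0.0696]

# Numbers as printed in table 4.7, row by row.
transition = [
    [0.5370, 0.4560, 0.4139, 0.3729],
    [0.2845, 0.3243, 0.3052, 0.2901],
    [0.1282, 0.1566, 0.1949, 0.2027],
    [0.0503, 0.0631, 0.0860, 0.1343],
]

# Check which direction is normalised.
column_sums = [sum(row[k] for row in transition) for k in range(4)]
row_sums = [sum(row) for row in transition]


def next_distribution(current, transition):
    """State shares one session later, reading column k as P(next | current = k)."""
    K = len(current)
    return [sum(transition[j][k] * current[k] for k in range(K)) for j in range(K)]


second_session = next_distribution(initial, transition)
third_session = next_distribution(second_session, transition)
```

Because every column is a probability distribution, the propagated shares again sum to one. Under the alternative reading, with rows as the conditioning state, the printed rows would not be valid probability distributions.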

Figure 4.3: Visualization Transition Matrix

4.5.2 Parameters

Table 4.8: State-Dependent Distribution of Means/Probabilities

State-Dependent Distribution    State(i)
                                1         2         3         4         Overall

Conversion
  No                            0.9984    0.9905    0.9834    0.9754    0.9921
  Offline                       0.0006    0.0008    0.0008    0.0004    0.0007
  Online                        0.0010    0.0087    0.0159    0.0243    0.0072

channelGrouping
  Advertising                   0.1104    0.0780    0.0730    0.0720    0.0920
  Paid Search                   0.4294    0.4314    0.4168    0.3973    0.4259
  Social                        0.1313    0.1274    0.1151    0.1013    0.1256
  Organic Search                0.2133    0.2484    0.2646    0.2889    0.2372
  Direct                        0.0727    0.0771    0.0934    0.1071    0.0796
  Referral                      0.0374    0.0336    0.0319    0.0282    0.0348

deviceCategory
  Desktop                       0.2487    0.2845    0.3137    0.3346    0.2756
  Tablet                        0.1142    0.1479    0.1545    0.1445    0.1329
  Mobile                        0.6371    0.5676    0.5317    0.5209    0.5915

Session Info
  pageViews                     4.1509    10.522    19.357    29.104    10.154
  uniquePageviews               3.3201    7.2977    12.400    17.653    9.9202
  sessionDuration               65.524    293.98    865.44    2153.8    399.95

Page Categories
  Homepage                      0.4101    0.5887    0.7137    0.8416    0.5413

4.5.3 State Description

A first scan of the data shows that most count variables tend to increase the further customers go in the customer journey: pageViews, uniquePageviews, sessionDuration and all Page Categories increase with every next state. The same tendency is spotted in the events. This implies increasing engagement further down the customer journey. However, some pages have stronger effects in certain states than others. The multinomial data provides more varied results. Below is a description per state.

State 1 – Just Started. This state is the first in the journey. Most customers begin in this state, at the moment of their first interactions with the retailer. What stands out is that only a few conversions are made in this state. Since it is assumed that a customer journey takes multiple sessions before conversion, this is in line with expectations. Further, paid search is the most used channel, meaning that most customers start their journey by clicking on sponsored search results, such as Google Ads. Advertising and social channels are vital session starters too, suggesting that many journeys start with exposure to an ad or social media. Organic search is low, possibly because of lower awareness. Also, mobile use is highest in this state, both in absolute and in relative terms. All page views are considerably low in the first state, indicating low engagement at first. Advice pages and checkout pages in particular are hardly visited, showing low purchase intentions at this point.

State 3 - Omni-Enthusiastic. This state is characterized by even more engagement. It shows a serious increase in conversion behavior. Although the evidence for offline conversion in this state is not very strong, there is evidence that state 4 shows relatively little offline conversion behavior. Hence, it may be assumed that most offline purchases are made in this state. Further, this state shows less traffic through advertising channels and more through organic search and direct traffic, indicating familiarity with the retailer or with the particular product customers are interested in. Again, (unique) page views and session duration increase tremendously. This time, tablets are relatively popular. Inspiration and advice pages are visited relatively more often, and more appointments are made and store finder events triggered. The change in these variables could hint towards offline purchases, but this state does not rule out online conversion (see the increase in checkout page visits). The combination with the increase in tablet use implies a complementary effect of tablets on offline conversion.

four states. About the magazine request and newsletter sign-up, this is in line with the expectation of more engagement. However, the appointment and storefinder dummies would imply offline orientation, while offline conversion is lowest in this state.

4.6 Check for Robustness

To test the robustness of the results, it was decided to change two variables and run the proposed HMM again. This robustness check is mainly used to illustrate the validity of the analysis. Since the hypotheses were exploratory in nature, it is hard to measure the extent to which the results are in line with the expectations. However, demonstrating robustness by obtaining similar results when the set-up of certain variables is changed serves a twofold purpose. On the one hand, it justifies the decisions made. On the other hand, it provides additional support for the parameters.

Two checks were done. First, as the variable channelGrouping was reduced to six levels, it was tested whether this decision was justifiable. Thus, the model was estimated again with eleven levels. The results showed the following. Criteo decreased throughout the states. This is in line with the assigned category Advertising. Display showed the same results in the first three states but increased in the last state. A possible explanation could be that ads are used as a shortcut to the already known website. Next, branded paid search and generic paid search showed similar results. Both are steady and considerably high in the first two states. Lower scores are seen in the last two states. Further, Social was probably the trickiest category to combine, since E-mail can be viewed independently from social media. However, E-mail and paid social were in line with the results. The regular variable social showed few significant effects but tends to stay steady throughout the states. Lastly, affiliates showed similar results to the assigned category Referral. Referral remains steady across the states, but no evidence is found for an effect in state 4. Overall, it is considered justified to reduce channelGrouping to six categories.

customer so well that the product or service fits him and sells itself.”

- Peter Drucker, Author | Consultant | Educator

5

General Discussion

The aim of this research was to capture the dynamics of customer behavior in an omnichannel environment using nothing but clickstream and transaction data. After all, knowing and understanding a customer is of vital value for a firm. Two research questions were followed in this process: (1) Which states in the customer journey can be identified, based on customers’ omnichannel behavior? and (2) what characteristics matter most in the different states? Four states have been identified. This chapter extends the findings in the state description towards a broader perspective. This is done by looking at similar results from other literature. It will look at the scientific and managerial implications of the results and lastly, it addresses the limitations of this study and future research ideas.

This study proposed an HMM to map customer behavior by looking at online navigational patterns on session level and combining these with online and offline transaction data. Four states were found, in line with models like AIDA (Kotler and Armstrong, 2010) and the model from Jansen and Schuster (2011). Engel et al. (1978) identify five states, since they address a post-purchase stage, something this study does not. Most existing models are online oriented and lack offline application. This study suggests an omni-oriented state, in which online and offline channels are better integrated (Herhausen et al., 2019). The proposed HMM proved to be a suitable alternative for mapping the customer journey. Although much theory is already available on purchase funnels, it is wise not to be led by theories only, but to let the data speak for itself. The data proved to be an excellent fit for an HMM, and the estimation of the three main components showed sensible results. A solid theoretical and statistical foundation was found to choose four states for the model.

It was found that, within each state, different characteristics have different effects. The conversion rate in the data sample was rather low. Therefore, we look at relative differences between states. The first two states showed few signs of conversion, which indicates a pre-purchase state (Court et al., 2009; Lemon and Verhoef, 2016). It is theorized that the third state provides the most offline conversions. Surfing online and buying offline points to webrooming behavior (Brynjolfsson et al., 2013). The fourth state is online oriented, which is consistent with the rise of e-commerce (Ecommerce News Europe, 2019). As theorized, most of these conversions could be repeat conversions. This is underlined by literature showing that conversion rates of returning customers are 40 times higher than those of potential customers (Chan, Wu, and Xie, 2011). Next, the effect of channels is studied. It was found that in early states, advertising, referral, social, and paid search are often used. Later states point more towards direct and organic search. This confirms the finding of De Haan et al. (2016) that firm-initiated contacts (FICs) are more effective in the early stages and customer-initiated contacts (CICs) in later stages of the customer journey. For example, when a customer explores options using Google, the chance of conversion is lower, given the orientation stage, than when a customer enters the website directly at the end of the purchase funnel (Lemon and Verhoef, 2016). Another study showed that relative mobile use decreases and desktop use increases as the journey progresses (De Haan et al., 2018). This study confirms that finding. Additionally, the findings show that the chance of using a tablet is highest in the third state, indicating that tablet users are more omnichannel oriented. This can be linked to the finding of Xu et al. (2017) that tablet users are more interested in a wider variety of products.

for increased engagement and customer experience (Böttger et al., 2017; Lemon and Verhoef, 2016). Besides, search pages and category pages are relatively most popular in orientation stages. In terms of the events, it is seen that all probabilities increase further down the customer journey. Some of these findings make sense since they are related to a higher customer experience (Lemon and Verhoef, 2016) and level of inspiration (Böttger et al., 2017). However, the chances of an appointment or storefinder event are the highest in state 4 as well. This contradicts the overall finding of an online-oriented customer base in that state. It is likely, however, that because of the increased page views, session duration, and overall engagement, these events are triggered more often as well. Overall, the findings show some evidence for state-specific importance of indicators and some evidence for the more continuous progress of variables. The lower backward- than forward-moving probabilities provide evidence for a funnel framework. However, some variables show signs of a loop function as well. For example, customers who start in later states of the journey might be customers who reenter the journey.

5.1 Scientific Implications

hand, to omnichannel literature (see section 2.3). However, it also adds possibilities to conventional clickstream analytics, which tend to focus on the online customer base, thereby neglecting the brick and mortar side of retailing (see section 2.5). Lastly, although many applications of HMMs have been developed, HMMs have rarely been used to map the omnichannel customer journey (see section 2.6). To the best of my knowledge, this has not been done before in this context.

5.2 Managerial Implications

5.3 Limitations & Future Research

Some limitations of this study need to be addressed. First, the conversion data used in this study was limited. Although the robustness check strengthened the results, estimates based on so few data points can easily be distorted by other variables. Besides, the method of linking offline transactions to online sessions is prone to errors. Hence, it is difficult to draw firm conclusions. Offline transaction data in particular was scarce in this study. Also, this study lacked other offline variables. Future work might consider incorporating a more accurate linking method and additionally research other offline variables such as store visits and in-store advice requests. Repeat conversions deserve some extra attention as well. Investigating customers with repeat conversions might contribute to the design of a loop function in the customer journey. Second, this study did not consider the sequence of pages within a session. Prior literature has shown that page-by-page browsing behavior improves conversion prediction (Montgomery et al., 2004). Other researchers could investigate the omnichannel customer journey and address the sequence of visited pages and actions.

6

Conclusion

Once upon a time, brick and mortar stores feared the rise of e-commerce. Luckily, that is no longer necessary. The rise of omnichannel retailing has paved the way for synergetic effects between webshops and physical stores. It has become more and more relevant to correctly look at the role of channels, touchpoints, and dynamics in navigational behavior in the customer journey. Customers are not engaging through one or the other channel anymore, and different paths can lead to a purchase. Customers started switching between online and offline touchpoints before making an online or offline purchase. Much literature has already focused on the role of the different touchpoints and channels within the online journey. However, a relatively new insight into the customer journey is the discovery of patterns in omnichannel customer behavior throughout the customer journey. Although lots of research is yet to be done, the future of retail surely lies in mapping the omnichannel customer journey.

This study showed how customers’ dynamics change throughout the customer journey. A four-state HMM was proposed to provide insights into the purchase state a customer is in. Early states in the journey indicate orientating behavior, whereas customers in later states are hinting at conversions, both online and offline. Especially the distinction between an omnichannel purchasing state and a solely online-focused state is a promising finding. It is all about knowing your customers to serve them better. This study was conducted to provide theoretical and empirical building blocks to extend the knowledge on the customer journey and omnichannel retailing. All for a greater purpose, to offer customers a superior experience.

Bibliography

Vibhanshu Abhishek, Peter Fader, and Kartik Hosanagar. Media exposure through the funnel: A model of multi-stage attribution. Available at SSRN 2158421, 2012.

Eva Anderl, Ingo Becker, Florian von Wangenheim, and Jan Hendrik Schumann. Mapping the customer journey: Lessons learned from graph-based online attribution modeling. International Journal of Research in Marketing, 33(3):457–474, 2016.

Asim Ansari, Carl F. Mela, and Scott A. Neslin. Customer channel migration. Journal of marketing research, 45(1):60–76, 2008.

Eva Ascarza and Bruce G. S. Hardie. A Joint Model of Usage and Churn in Contractual Settings. Marketing Science, 32(4):570–590, 2013.

Francesco Bartolucci, Alessio Farcomeni, and Fulvia Pennoni. Latent markov models: a review of a general framework for the analysis of longitudinal data with covariates. Test, 23(3):433–465, 2014.

Niklas Barwitz and Peter Maas. Understanding the Omnichannel Customer Journey: Determinants of Interaction Choice. Journal of Interactive Marketing, 43:116–133, 2018.

Gely P. Basharin, Amy N. Langville, and Valeriy A. Naumov. The life and work of A. A. Markov. Linear Algebra and its Applications, 386:3–26, 2004.

Shane Baxendale, Emma K. Macdonald, and Hugh N. Wilson. The Impact of Different Touchpoints on Brand Consideration. Journal of Retailing, 91(2):235–253, 2015.

David R. Bell, Santiago Gallino, and Antonio Moreno. Offline showrooms in omnichannel retail: Demand and operational benefits. Management Science, 64(4):1629–1651, 2018.

Gaël Bernard and Periklis Andritsos. A process mining based model for customer journey mapping. CEUR Workshop Proceedings, 1848:49–56, 2017.

Tammo H.A. Bijmolt, Manda Broekhuis, Sander de Leeuw, Christian Hirche, Robert P. Rooderkerk, Rui Sousa, and Stuart X. Zhu. Challenges at the marketing–operations interface in omni-channel retail environments. Journal of Business Research, 2019.

Angelica Blom, Fredrik Lange, and Ronald L. Hess. Omnichannel-based promotions’ effects on purchase behavior and brand image. Journal of Retailing and Consumer Services, 39(August):286–295, 2017.

Tim Böttger, Thomas Rudolph, Heiner Evanschitzky, and Thilo Pfrang. Customer inspiration: Conceptualization, scale development, and validation. Journal of Marketing, 81(6):116–131, 2017.

John Brauer. People First: A User-Centric Hybrid Online Audience Measurement Model. The Nielsen Company, 2011.

Erik Brynjolfsson, Yu Jeffrey Hu, and Mohammad S. Rahman. Competing in the age of omnichannel retailing. MIT Sloan Management Review, 54(4):1–7, 2013.

Randolph E. Bucklin and Catarina Sismeiro. A model of web site browsing behavior estimated on clickstream data. Journal of Marketing Research, 40(3):249–267, 2003.

Randolph E. Bucklin and Catarina Sismeiro. Click Here for Internet Insight: Advances in Clickstream Data Analysis in Marketing. Journal of Interactive Marketing, 23(1): 35–48, 2009.

Jacques Bughin. Brand success in an era of digital darwinism. Journal of Brand Strategy, 2(4):355–365, 2014.

Lanlan Cao and Li Li. The Impact of Cross-Channel Integration on Retailers’ Sales Growth. Journal of Retailing, 91(2):198–216, 2015.

Jan Carlzon. Moments of truth. Ballinger Cambridge, MA, 1987.

Raymond B. Cattell. The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276, 1966.

Dave Chaffey. E-commerce conversion rates- How do yours compare?, 2020.

Tat Y. Chan, Chunhua Wu, and Ying Xie. Measuring the lifetime value of customers acquired from Google search advertising. Marketing Science, 30(5):837–850, 2011.

Patrali Chatterjee, Donna L. Hoffman, and Thomas P. Novak. Modeling the Clickstream: Implications for Web-Based Advertising Efforts. Marketing Science, 22(4):520–542, 2003.

Ilan Cohen. Analysis of “Average Session Duration” in Google Analytics, 2016. https://www.visma.com/blog/analysis-reporting-average-session-duration-google-analytics/, Accessed: 2020-06-13.

David Court, Dave Elzinga, Susan Mulder, and Ole Jørgen Vetvik. The consumer decision journey. McKinsey Quarterly, 3(3):96–107, 2009.

Evert De Haan, Thorsten Wiesel, and Koen Pauwels. The effectiveness of different forms of online advertising for purchase conversion in a multiple-channel attribution framework. International Journal of Research in Marketing, 33(3):491–507, 2016.

Evert De Haan, Pallassana K. Kannan, Peter C. Verhoef, and Thorsten Wiesel. Device switching in online purchasing: Examining the strategic contingencies. Journal of Marketing, 82(5):1–19, 2018.

Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22, 1977.

José G. Dias and Jeroen K. Vermunt. Latent class modeling of website users’ search patterns: Implications for online market segmentation. Journal of Retailing and Consumer Services, 14(6):359–368, 2007.

Ecommerce News Europe. Ecommerce in The Netherlands, 2019. https://ecommercenews.eu/ecommerce-in-europe/ecommerce-the-netherlands/, Accessed: 2020-06-13.

James F. Engel, Roger D. Blackwell, and David T. Kollat. Consumer Behavior. Business and management. marketing. Dryden Press, 1978.

Ruud T. Frambach, Henk C.A. Roest, and Trichy V. Krishnan. The impact of consumer internet experience on channel preference and usage intentions across the different stages of the buying process. Journal of Interactive Marketing, 21(2):26–41, 2007.

Asbjørn Følstad and Knut Kvale. Customer journeys: a systematic literature review. Journal of Service Theory and Practice, 28(2):196–227, 2018.

Sonja Gensler, Scott A. Neslin, and Peter C. Verhoef. The Showrooming Phenomenon: It’s More than Just About Price. Journal of Interactive Marketing, 38:29–43, 2017.

Damien Geradin and Dimitrios Katsifis. Taking a dive into Google’s Chrome cookie ban. Available at SSRN, 2020.

Dennis Herhausen, Jochen Binder, Marcus Schoegel, and Andreas Herrmann. Integrating Bricks with Clicks: Retailer-Level and Channel-Level Outcomes of Online-Offline Channel Integration. Journal of Retailing, 91(2):309–325, 2015.

Dennis Herhausen, Kristina Kleinlercher, Peter C. Verhoef, Oliver Emrich, and Thomas Rudolph. Loyalty Formation for Different Customer Journey Segments. Journal of Retailing, 95(3):9–29, 2019.

Shep Hyken. The New Moment Of Truth In Business, 2016. https://www.forbes.com/sites/shephyken/2016/04/09/new-moment-of-truth-in-business/#51e80b5238d9, Accessed: 2020-06-13.

Bruce Isaacson. Using Customer Journey Maps to Improve Your Customer Experience. MMR Strategy Group. Whitepaper, 2012.

Bernard J. Jansen and Simone Schuster. Bidding on the buying funnel for sponsored search and keyword advertising. Journal of Electronic Commerce Research, 12(1):1–18, December 2011.

Pallassana K. Kannan and Hongshuang “Alice” Li. Digital marketing: A framework, review and research agenda. International Journal of Research in Marketing, 34(1):22–45, 2017.

Avinash Kaushik. See, Think, Do, Care Winning Combo, 2015. https://www.kaushik.net/avinash/see-think-do-care-win-content-marketing-measurement/, Accessed: 2020-06-13.

Hirotaka Kawazu, Fujio Toriumi, Masanori Takano, Kazuya Wada, and Ichiro Fukuda. Analytical method of web user behavior using Hidden Markov Model. 2016 IEEE International Conference on Big Data (Big Data), pages 2518–2524, 2016.

Philip Kotler and Gary Armstrong. Principles of Marketing. Pearson, 2010. ISBN 9780137006694.

Anders Krogh, Michael Brown, Saira I. Mian, Kimmen Sjolander, and David Haussler. Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology, 235(5):1501–1531, 1994.

Peter S.H. Leeflang, Jaap E. Wieringa, Tammo H.A. Bijmolt, and Koen Pauwels. Modeling Markets. Springer, 2014. ISBN 9781493920853.

Peter S.H. Leeflang, Jaap E. Wieringa, Tammo H.A. Bijmolt, and Koen Pauwels. Advanced Methods for Modeling Markets. Springer, 2017. ISBN 978-3-319-53467-1.

Katherine N. Lemon and Peter C. Verhoef. Understanding customer experience throughout the customer journey. Journal of Marketing, 80(6):69–96, 2016.

Hongshuang Li and Pallassana K. Kannan. Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. Journal of Marketing Research, 51(1):40–56, 2014.

Abstract
Preface
Contents
1 Introduction
1.1 Problem Statement & Research Questions
1.2 Contribution & Relevance
1.3 Outline of this Thesis
2 Theoretical Framework
2.1 The Customer Journey
2.2 Touchpoints
2.3 Multichannel vs. Omnichannel
2.4 The Journey as a Funnel
2.5 Clickstream Data
2.6 The Hidden Markov Model
2.7 Conceptual Framework
3 Research Design
3.1 Data
3.2 Model Specification
3.2.1 Initial State Distribution
3.2.2 Transitional Probabilities
3.2.3 State Dependent Distribution
3.3 Estimation Procedure
3.3.1 EM Algorithm
3.3.2 Model Comparison
4 Application to the Data
4.1 Data Cleaning
4.2 Variable Transformation
4.3 Sample Description
4.4 Estimated Models
4.5 Estimation Results
4.5.1 Initial State Probabilities & Transition Matrix
4.5.2 Parameters
4.5.3 State Description
4.6 Check for Robustness
5 General Discussion
5.1 Scientific Implications
5.2 Managerial Implications