
Automation of Quotation Decision Making

SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Zheya Feng

12143979

MASTER INFORMATION STUDIES

Data Science

FACULTY OF SCIENCE

UNIVERSITY OF AMSTERDAM

2019-07-10

1st Examiner: MSc. Chang Li
2nd Examiner: Dr. Pengjie Ren


Automation of Quotation Decision Making

Zheya Feng

zheya.feng@gmail.com University of Amsterdam

ABSTRACT

Quotation decision making is a manpower-intensive Business-to-Business activity; however, implementing machine learning to generate quotation prices can not only save labor cost but also yield more profit. This study focuses on the automation of quotation decision making with machine learning methods to improve profit, by experimenting on historical quotation data from a dairy company. A price generator and outcome discriminator combined model is proposed. To realize this, multiple feature selection methods are first employed to identify the important features in quotation activities; their performance is compared and the feature selection results are aggregated following stability selection. Then the most appropriate discriminator and generator are selected based on performance experiments with multiple classifiers and regressors. An offline evaluation shows that the proposed model increases profit efficiently, as it yields 11% more profit compared with simple price generation.

1 INTRODUCTION

A quotation transaction is initiated by a buyer sending a request for quotation with specifications to the seller. The specifications usually include the type and quantity of the required product or service, delivery terms, and payment terms [38]. After receiving the demand from a buyer, sellers will first assess whether they are able to meet the buyer's requirements, and then propose a price to the buyer by taking the quotation content and previous transactions into consideration. The transaction is successful if the price is acceptable to the buyer; otherwise the buyer either negotiates the price or accepts a quotation from another supplier. To increase the sales volume and make as much profit as possible for the company, appropriate quotation prices are required to be generated for each deal.

The Business-to-Business (B2B) market is estimated at trillions of dollars, yet only approximately 3.4% of published articles in the top four marketing journals concern B2B contexts compared with other marketing decisions [35]. B2B largely lags behind the Business-to-Consumer (B2C) market in terms of pricing and the adoption of technology [7]. Because immeasurable factors, such as the loyalty of buyers, the negotiation skill of the salesperson and so on, are widely considered indispensable elements for making successful quotation decisions, human resources are still intensively involved in quotation activities. However, machine learning can be helpful in generating quotations automatically given the undeniable fact that it is widely used and powerful in multiple fields. It has been proved by Leung et al. [38], Shichor and Netzer [50] and Zhang et al. [60] that the automation of quotation making can not only avoid human mistakes and reduce the labor cost involved in quotation making, but also improve the performance of decision making. In this paper, we focus on the automation of quotation decision making with machine learning methods. The ultimate goal is to

generate optimal quotation prices that ensure profitability and client acceptability simultaneously. A quotation price generator and quotation outcome discriminator combined model is proposed to realize this goal. The generator is expected to give a precise measure of what the price should be, and the discriminator should be able to accurately predict the quotation outcome given the price proposed by the generator. Since quantitative studies about quotation activity features are rare, and powerful features are critical for model performance, feature selection is a must for this study. To select stable and powerful features, multiple filter and embedded feature selection methods are compared and combined following stability selection. The choices of an appropriate discriminator and generator are made by comparing universally used models.

By conducting experiments on historical data from a dairy company, we are able to answer the following research questions:

RQ1: How do different feature selection methods, namely information theory-based and similarity-based filter methods, and embedded methods combining different tree classifiers and feature search strategies, perform on data with both numerical and categorical features?

RQ2: What are the influential features involved in predicting quotation transaction outcomes?

RQ3: How good are different classifiers, namely k-nearest neighbors, logistic regression, linear support vector machine, Gaussian naive Bayes, random forests, XGBoost, LightGBM and a neural network, at predicting quotation outcomes?

RQ4: How good are different quotation generation models, namely linear regression, ridge regression, the generalized additive model, the random forests regressor and the gradient boosting regressor, at simulating salesperson pricing behavior?

RQ5: How much more can a quotation price generator and quotation outcome discriminator combined model increase the acceptance probability and profit of quotations than simple price generation?

This study is expected to make two contributions. First, a novel machine learning approach to generating optimal quotation prices is proposed, which can enrich the insufficient research on B2B pricing in both the academic and business worlds. Second, since few quantitative studies have been done on identifying the decisive features that lead to different quotation outcomes, the identification of those decisive features via different feature selection methods can also provide insights for the business field.

The rest of the paper is organized as follows. In Section 2, related work is reviewed. Then, in Section 3, we introduce how the problem of making quotation pricing decisions is approached in this paper. In Section 4, the experiment and measurement plans corresponding to each research question are described. The results and analyses are presented in Section 5. Finally, Section 6 summarizes this study and points out limitations and potential future work.


2 RELATED WORK

This section describes previous research done on features of B2B pricing (2.1), feature selection methods (2.2), quotation generation (2.3), and pricing with machine learning (2.4).

2.1 Features in B2B Pricing

Business-to-Business pricing is a relatively understudied field; the exceptions, such as the relationship between buyers and sellers, reference prices, and loss aversion behaviour, are reviewed in this section. B2B environments are generally characterized by long-term relationships between buyers and sellers [45]. In order to build and maintain a long-term relationship, sellers vary prices across buyers and even change prices between subsequent purchases of the same buyer [60], and buyers trust sellers more as the relationship grows longer and deeper [44]. Besides long-term relationships, reference prices affect customer decisions when purchasing non-durable products [8]; external reference prices have been proved to have a quadratic effect on consumer price expectations [33] and also on the transaction pricing outcome in B2B market transactions [9]. Loss aversion is an important B2B customer behaviour in which customers react asymmetrically to increases and decreases of price [9]; customers actually put more emphasis on losses from a reference point than on equivalent-sized gains [24].

2.2 Feature Selection Methods

Feature selection is a common technique that aims to reduce data dimensionality. Feature selection helps to achieve better model performance and interpretability. It involves finding a “good" subset of features under objective functions such as prediction accuracy or minimal use of input features [54]. The entire feature selection process consists of four steps: (1) subset generation; (2) subset evaluation; (3) stopping criterion; (4) result validation [16].

Feature selection methods can be categorized into filter models, wrapper models and embedded models [39]. The first two are widely used [15]. The filter model selects features by measuring general characteristics of the training data, including information, dependency, consistency and distance [41, 52], but in this case the connection between the induction model and the features is overlooked. The wrapper method has been introduced to address this problem; with it, the induction algorithm itself is part of the function evaluating feature subsets. The induction models in wrappers select feature subsets using greedy search strategies such as hill-climbing, best-first, branch-and-bound, and genetic algorithms [22]. It has been proved that the wrapper method outperforms the filter method [32] at the cost of more computation. The embedded method combines these two models by employing the feature selection criteria of filter and wrapper in different stages [27, 40, 41]. With the rise of deep learning, applying auto-encoders for feature extraction has become popular in image classification and speech recognition given their outstanding abilities in denoising and learning from sparse features [18, 51, 57].

Feature selection methods can be sensitive to data and produce unstable features, leading to unreliable model performance in supervised learning. The concept of stability selection [17, 43, 49] is thus introduced to address these issues. The idea is to repeatedly evaluate the classification performance of all models, for example pairwise combinations of various selection and classification methods with random sampling [17], after which structures or variables that occur in a large fraction of the resulting selection sets are chosen [43].

2.3 Quotation Making

Several studies have proved that machine learning and mathematical models can facilitate quotation pricing and increase profit. Shichor and Netzer [50] conducted a study on a metal B2B retailer case and proved that the pricing decision in a B2B setting can be automated and that it performs better than the salesperson alone in terms of profit. They build an individual model for each salesperson, and the quotation prices it generates serve as suggestions for the salesperson. Furthermore, they use random forests to predict the expected profit difference between human and model pricing to decide which price to use. Leung et al. [38] propose a pricing decision support system, Smart-Quo, which applies a fuzzy association rule mining approach and fuzzy logic techniques to identify the factors influencing the pricing decision of products and then formulate flexible and dynamic pricing strategies for each product. Significant improvement in terms of efficiency and effectiveness is achieved in making pricing decisions for each product in a pilot run. Zhang et al. [60] propose a pricing framework which adopts a hierarchical Bayesian model, a multivariate non-homogeneous hidden Markov model, buyer heterogeneity, and control functions to target and capture the evolution of trust and to control the price endogeneity. The pricing strategy based on this model shows a 52% improvement in profitability on a US metal retailer. They also found that buyers can be represented by two states of trust toward the seller, “vigilant" and “relaxed".

2.4 Machine learning methods for pricing

Quotation pricing is in a way similar to stock price and wholesale price prediction, as they share multiple common factors ranging from business strategy to time series features. In the case of smart data pricing, Tsai et al. [53] use KNN to calculate the similarity between each pair of users in terms of smart data usage to determine future smart data prices for users. Fan et al. [19] propose a method for forecasting short-term electricity prices based on a two-stage hybrid network of a self-organised map (SOM) and a support vector machine (SVM). Machine learning methods are frequently employed in stock price prediction. Khaidem et al. [29] predict the direction of stock market prices with random forests for their advantage in reducing overfitting by training on a divided feature space. Patel et al. [46] found that random forests outperform ANN, SVM and naive Bayes in predicting stock prices. Khoa et al. [30] integrated profit and time factors into the training procedure of a feed-forward neural network and improved the forecast results; they also found that a simple RNN had the best forecast results.

3 METHODOLOGY

3.1 Datasets

3.1.1 Description.


(1) Historical transaction data. The empirical context and data used in this study come from a Netherlands-based dairy company that supplies cheese to clients in Europe and East Asia. The transaction records consist of two types of B2B transactions: quotations and orders. The historical quotations contain transaction-level quotation information spanning 23 months from January 2017 to November 2018. The total number of quotation transactions is 3,539, among which 66.49% are accepted by clients. The number of order transactions is 9,632. Both quotation and order transaction entries contain product and delivery specifications, customer details, and the outcome for quotation transactions.

(2) Historical prices of cheese products. Two sets of external data are considered: one is the Trigona cheese weekly prices [4], which include contract and spot prices; the other is the weekly market prices of three types of cheese (Cheddar, Gouda and EDAM) from the EU Milk Market Observatory [1]. Both external datasets cover the whole time range of the internal data.

3.1.2 Data Preprocessing.

(1) Remove transaction records that have no previous reference. The reference price is an important factor affecting client decision behavior because clients tend to rely on historical transactions as a reference for the current one [9]; thus only transactions that have historical references are considered in the following steps.

(2) Remove unrelated and incomplete quotation transaction records. Quotations can be recorded as failed for reasons other than inappropriate quotation prices. Internal reasons can be, for example, that salespersons find the inventory level too low to meet the demanded quantity after the price has been proposed to the client. Externally, clients may cancel the request for quotation because of a change in demand from their customers. Since this study focuses on quotation failures caused by inappropriate quotation prices, failed quotation transactions caused by other reasons are removed. Incomplete transaction records are also excluded since they may change the data distribution.

(3) Remove outliers. In order to further eliminate noise (e.g. caused by human input errors), outliers are detected by looking into the data distribution. Following the empirical rule, data points with a z-score higher than 3 are removed.

3.1.3 Feature Construction. Apart from the plain descriptive features from the original data, domain knowledge-supported feature construction is necessary considering the complexity of quotation making. Thus, several additional features are constructed based on previous related work. The manually constructed features along with the original features can be found in the Appendix as Table 6.

The following two procedures are applied after feature genera-tion.

(1) Transform skewed data. A logarithmic transformation is applied to all quotation quantity and value features because the distributions of these features are severely left-skewed and the values themselves are much larger than the values of other features.

(2) Perform label encoding on categorical features. Since the type of product, material and customer country are in text form, label encoding is applied to these data to transform them into numerical values. The choice of label encoding rather than other encoding methods is motivated by its simplicity and the goal of keeping a relatively small data dimension.
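A minimal sketch of these two steps with pandas and scikit-learn; the column names (`quotation_quantity`, `quotation_value`, `product_type`, `material_type`, `customer_country`) are illustrative placeholders, not the company's actual field names.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Log-transform skewed quantity/value features and label-encode categorical ones."""
    df = df.copy()

    # (1) Logarithmic transformation of quantity and value features (placeholder names).
    for col in ["quotation_quantity", "quotation_value"]:
        df[col] = np.log1p(df[col])

    # (2) Label encoding of the text-valued categorical features (placeholder names).
    for col in ["product_type", "material_type", "customer_country"]:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    return df
```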

3.2 Feature Selection

Features are a critical part of achieving good machine learning performance. This section focuses on feature selection and consists of two parts. The first part evaluates different feature selection strategies in terms of performance, applicability, stability and efficiency on a dataset with correlated, mixed-type features and noise. Two filter methods, one information theory-based and one similarity-based, are chosen for comparison with the embedded methods. The other part generates insights on the important features involved in B2B quotation activities by aggregating the multiple selected feature sets following the aggregation method from stability selection.

3.2.1 Feature selection with filter methods. Filter-based feature selection methods rank features relying on certain performance criteria such as similarity, mutual information and statistical measures [52]. Joint mutual information and Fisher score are selected from the information theory-based and similarity-based criteria respectively. They are classical filter methods that have been applied in multiple fields over the past decades.

(1) Joint Mutual Information (JMI) [58]. In information theory, the relevance of a single feature and the target variable can be measured by the mutual information (MI). Let $X_i$ be one of the features and $Y$ be the target variable. MI is calculated as follows:

$$I(X_i; Y) = KL\big(p(x_i, y) \,\|\, p(x_i)p(y)\big), \qquad (1)$$

where $p(x_i, y)$ is the joint distribution of $x_i$ and $y$, and $KL(p \| q)$ is the Kullback-Leibler divergence of two probability functions, defined by

$$KL(p \| q) = \sum_{i}^{N} p(x_i) \log \frac{p(x_i)}{q(x_i)}. \qquad (2)$$

The joint mutual information considers the mutual information not only between features and the target but also between different features, enabling the identification of interactions between features [56]. With this characteristic, JMI may perform better at finding important features in a correlated feature set. With $X_i$ and $X_j$ denoting features $i$ and $j$, and $Y$ denoting the target, the JMI score [58] of a feature is computed as

$$JMI = I(X_i; Y) - \alpha I(X_j, \ldots, X_k; X_i) + \alpha I(X_i; X_j, \ldots, X_k \mid Y), \qquad (3)$$

where $\alpha$ is the inverse of the number of features. Note that JMI only takes discrete features; thus continuous features, such as quotation quantity and value, are discretized uniformly into 100 bins.

(2) Fisher score. Fisher score is one of the most widely used filter methods. It measures the similarity between two data distributions. With the Fisher score, one is able to collect a subset of features that separates data points from different classes as far as possible while keeping data points of the same class as close as possible [21]. The Fisher score of the $j$-th feature is computed as

$$F(X_j) = \frac{\sum_{k=1}^{c} n_k (\mu_j^k - \mu_j)^2}{(\theta_j)^2}, \qquad (4)$$

where $c$ is the number of classes of the target, $n_k$ is the size of the $k$-th class, $\mu_j^k$ and $\theta_j^k$ are the mean and standard deviation of the $k$-th class for the $j$-th feature, and $\mu_j$ and $\theta_j$ are the mean and standard deviation over all classes for the $j$-th feature [21]. A short code sketch of both filter criteria is given after this list.
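The two filter criteria can be sketched with NumPy and scikit-learn as below. The JMI part follows a common greedy pairwise approximation of Eq. (3) (relevance minus averaged redundancy plus averaged class-conditional redundancy); the exact implementation used in this study may differ. `X` is assumed to be a NumPy array of the preprocessed features and `y` the binary outcome, and the Fisher score follows Eq. (4) exactly as written above.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.preprocessing import KBinsDiscretizer

def conditional_mi(xi, xj, y):
    """I(Xi; Xj | Y), estimated as a class-weighted average of per-class MI."""
    return sum((y == c).mean() * mutual_info_score(xi[y == c], xj[y == c])
               for c in np.unique(y))

def greedy_jmi(X, y, n_features, n_bins=100):
    """Greedily pick features with the pairwise JMI criterion of Eq. (3)."""
    X_disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal",
                              strategy="uniform").fit_transform(X)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        alpha = 1.0 / max(len(selected), 1)
        scores = {}
        for i in remaining:
            relevance = mutual_info_score(X_disc[:, i], y)
            redundancy = sum(mutual_info_score(X_disc[:, i], X_disc[:, j])
                             for j in selected)
            conditional = sum(conditional_mi(X_disc[:, i], X_disc[:, j], y)
                              for j in selected)
            scores[i] = relevance - alpha * redundancy + alpha * conditional
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

def fisher_score(X, y):
    """Fisher score per feature, following Eq. (4) as written above."""
    overall_mean, overall_std = X.mean(axis=0), X.std(axis=0) + 1e-12
    numerator = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        numerator += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
    return numerator / overall_std ** 2   # rank features by descending score
```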

3.2.2 Feature selection with embedded methods. Three different tree classifiers and two feature search strategies are combined with each other to conduct embedded feature selection.

(1) Base classifiers. Three classifiers, namely random forests, XGBoost and LightGBM, are selected to perform feature selection. Although they are all ensemble decision tree models, they conduct classification and feature importance measurement differently. In RF, the importance of a feature is the Gini importance, which is the normalized total reduction of the criterion brought by that feature [3]. Both XGBoost and LightGBM provide more than one feature importance measurement option; weight in XGBoost and split in LightGBM are selected in this study, both of which calculate feature importance as the number of times a feature is used to split the data across all trees [2, 5].

(2) Search strategy.

• Recursive Feature Elimination (RFE). RFE is often combined with induction algorithms to select features. The idea is to obtain a feature set of the desired size by recursively eliminating the least important features from the remaining feature set. SVM-RFE [23] is one implementation of this search strategy.

• Recursive Feature Inclusion (RFI). Unlike RFE, RFI follows a sequential forward feature selection manner that starts with an empty feature set and includes the most important feature into the feature set in each iteration.

By combining the above-mentioned three classifiers and two search strategies, six different feature sets can be yielded.
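A sketch of the two search strategies around a tree classifier, using scikit-learn's `RFE` for elimination and a hand-rolled importance-based forward loop for inclusion; the exact RFI procedure is not spelled out here, so the forward loop below is one plausible reading. `X` is assumed to be a NumPy array, and an XGBoost or LightGBM classifier can be passed in place of the random forest.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def embedded_rfe(X, y, n_features, estimator=None):
    """Recursive Feature Elimination driven by the estimator's feature importances."""
    estimator = estimator or RandomForestClassifier(n_estimators=200, random_state=0)
    selector = RFE(estimator, n_features_to_select=n_features, step=1).fit(X, y)
    return np.where(selector.support_)[0]            # indices of the kept features

def embedded_rfi(X, y, n_features, estimator=None):
    """Recursive Feature Inclusion: repeatedly add the most important remaining feature."""
    estimator = estimator or RandomForestClassifier(n_estimators=200, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features:
        estimator.fit(X[:, remaining], y)             # refit on the candidate pool
        best = remaining[int(np.argmax(estimator.feature_importances_))]
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```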

3.2.3 Aggregation of feature selection results. According to the methodology of stability selection, features whose appearance frequencies are above a certain threshold are selected as stable features [17]. The selection of features over all methods follows a majority voting strategy. The selected features are used in the rest of the study.
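A sketch of the aggregation step, assuming the per-method feature sets from repeated runs have been collected into a dictionary; the thresholds (a feature is stable for a method if it appears in at least `min_runs` runs, and kept overall if a majority of methods mark it stable) mirror the rule described above and in Section 5.2, but their exact values are parameters.

```python
from collections import Counter

def aggregate_stable_features(runs_per_method, min_runs=5, min_methods=5):
    """runs_per_method: {method_name: [set(features_run_1), ..., set(features_run_R)]}.
    Returns the features selected by majority voting over the per-method stable sets."""
    votes = Counter()
    for feature_sets in runs_per_method.values():
        counts = Counter(f for s in feature_sets for f in s)
        stable = {f for f, c in counts.items() if c >= min_runs}
        votes.update(stable)
    return sorted(f for f, v in votes.items() if v >= min_methods)
```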

3.3 Quotation Outcome Prediction

It is important for a salesperson to know whether the price is going to be accepted by the client and the probability of acceptance. This task can be conducted by multiple machine learning models. However, how accurate they can be in predicting quotation outcomes remains to be answered.

The following classifiers are selected to perform the quotation outcome prediction task.

• k-nearest neighbors (KNN)
• Gaussian naive Bayes
• Linear support vector machine (SVM)
• Logistic regression
• Random forests
• XGBoost
• LightGBM
• Simple neural network

KNN classifies an instance based on the labels of its k nearest neighbors in the feature space [6]. Gaussian naive Bayes applies Bayes' theorem and implements the classification by assuming the likelihood of the features to be Gaussian [42]. Linear SVM performs classification by finding a separating hyperplane with the maximal margin between two classes [10]. Logistic regression uses a linear combination of variables as the argument of the sigmoid function to perform binary classification [55]. Random forests [25], XGBoost [12] and LightGBM [28] are three ensemble learning classification models that construct multiple trees. Random forests output the mode of the predictions of the individual trees, while the other two sequentially minimize errors between additive trees [20] and are optimized versions of the gradient boosting machine.

The first six classifiers use the same data format as in the previous sections. For the neural network, one-hot encoding is applied to transform categorical features. Then the values of the features are concatenated into a feature vector which serves as the input of the neural network. The resulting dimension of each instance is 635.
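A sketch of several of the candidate discriminators with scikit-learn (the gradient boosting and Keras models are omitted for brevity); the hyperparameter values shown are placeholders rather than the tuned settings, and the sigmoid calibration for linear SVM follows the setup described in Section 4.2.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

candidates = {
    "KNN": KNeighborsClassifier(n_neighbors=15),
    "Gaussian naive Bayes": GaussianNB(),
    # Linear SVM has no predict_proba, so it is wrapped in a sigmoid calibrator.
    "Linear SVM": CalibratedClassifierCV(LinearSVC(), method="sigmoid"),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forests": RandomForestClassifier(n_estimators=200),
}

def evaluate(clf, X_train, y_train, X_test, y_test):
    """Fit one candidate and report accuracy, AUC and Brier score."""
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]
    return {"accuracy": clf.score(X_test, y_test),
            "AUC": roc_auc_score(y_test, proba),
            "Brier": brier_score_loss(y_test, proba)}
```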

3.4 Quotation Price Generation

Knowing how likely a quotation is to win is not enough; it is essential to generate a proper price first. Quotation price generation can be achieved by simulating salesperson pricing behavior with historical data. The following describes how it can be implemented.

3.4.1 Ratio of final price to guide price as target. Because the price difference between different cheese products can be large, it is not fair to use the final quotation price as the target. However, the guide price is a reference used by salespersons in historical quotation price making, and the ratio of quotation price to guide price can be viewed as a certain discount for the client. Thus, the ratio of final quotation price to guide price is chosen as the target variable:

$$T = \frac{P_q}{P_g}, \qquad (5)$$

where $P_q$ and $P_g$ denote the quotation price and the corresponding guide price respectively. The ratio has a value range of approximately 0.8 to 1.2 in this study.

3.4.2 Price generation models. As with quotation outcome prediction, there are also multiple applicable algorithms for price generation. The following price generation models are selected to simulate salesperson pricing behaviour. The most appropriate one will be selected as the generator in the proposed model.

• Linear regression
• Ridge regression
• Generalized additive model (GAM)
• Random forests regressor
• Gradient boosting regressor


Linear regression and ridge regression are two linear models. The difference between them is that ridge regression introduces a penalty on the fitted coefficients to prevent overfitting [26]. GAM is an additive modeling technique in which the impact of the predictive variables is captured through smooth functions (which can be non-linear); these are added and linked to the dependent variable by a certain link function [37]. Random forests and gradient boosting regressors are ensemble learning methods that construct multiple trees; the former outputs the mean prediction of the individual trees as the final prediction [25], and the latter sequentially minimizes the residuals between the additive trees at each iteration [20].
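A sketch of the candidate generators; `LinearGAM` comes from the pyGAM package cited in Section 4.2 and follows a similar fit/predict interface, and the hyperparameter values shown here are placeholders rather than the tuned ones.

```python
from pygam import LinearGAM
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge

# Candidate models for the price-to-guide-price ratio of Eq. (5).
generators = {
    "Linear regression": LinearRegression(),
    "Ridge regression": Ridge(alpha=1.0),
    "Generalized additive model": LinearGAM(),
    "Random forests regressor": RandomForestRegressor(n_estimators=200),
    "Gradient boosting regressor": GradientBoostingRegressor(n_estimators=200),
}
```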

3.5 Optimal Decision with Generator and Discriminator

The disadvantage of the price generation models in the previous section is obvious: they can only generate a single “proper" result. However, the goal of quotation price optimization is to gain as much profit as possible, in other words, to find the highest price acceptable to the client, in this case, the optimal ratio.

Let us assume that all clients are rational. Given two quotation offers, a rational client will always choose the lower one. Following this assumption, by lowering an initially unacceptable quotation offer bit by bit (let $d$ denote this decrement), it will become acceptable to the client at some point. In this way, one is able to obtain the largest profit. To achieve this goal, a good discriminator, such as the classifiers mentioned in Section 3.3, is needed to tell whether an offer will be acceptable to the client or not.

A proper boundary of the quotation offer is also important: the upper bound $R_{up}$ should provide the largest possible profit and the lower bound $R_{low}$ should guarantee that the offer will not lead to a severe loss in profit. The standard deviation $\theta$ of the target can be used as a reference for the boundary space, and the ratio $\hat{R}$ generated by the ratio generation models serves as the benchmark value. Thus, the possible ratio values of a quotation transaction are

$$R \in [\hat{R} - \theta, \hat{R} + \theta]. \qquad (6)$$

The decrement $d$ used in the ratio adjustment is a hyperparameter which depends on the pricing conditions of the company; in this study it is set to 0.1%, and a more precise value can be obtained with a smaller decrement. The other hyperparameter to adjust is the threshold $\pi$ of the prediction probability for a win, which controls how conservative the prediction is. The setting of this parameter is a trade-off between profit and success rate: the profit space can be narrowed by using a threshold that is too high, because the target value has to go further down to reach that prediction probability, but a high threshold guarantees the success rate. The threshold in this study is set to 60% rather than the 50% used in most classification cases, as the quotation pricing strategy puts relatively more emphasis on having the client accept the price.

4 EXPERIMENTAL SETUP

4.1 Model Performance Metrics

4.1.1 Performance of feature selection methods. The performance of different feature selection methods is measured from three aspects: quality of the selected features, stability and efficiency. These three aspects help us to evaluate the different methods.

The performance measurement of the quality of the selected feature sets follows the idea that the higher the classification performance is, the better the selected features are [40]. Apart from the three classifiers used in embedded feature selection, four other classifiers are chosen to give a performance measure of the selected features, namely KNN, linear SVM, Gaussian naive Bayes and logistic regression. The involved performance metrics are shown in Table 1, where TP denotes the number of true positive results, and TN, FP and FN denote the numbers of true negative, false positive and false negative results in the prediction respectively. ROC is the curve of the true positive rate against the false positive rate. For the Brier score and log loss, $y_i$ is the actual outcome of instance $i$ and $p_i$ is the forecast probability. The first five metrics are expected to show how good these classifiers are at predicting quotation outcomes, and the last two show how accurate their probabilistic predictions are.

Table 1: Classification performance metrics.

Metric        Description
Accuracy      $\frac{TP + TN}{TP + TN + FP + FN}$
AUC           The area under the ROC curve
Precision     $\frac{TP}{TP + FP}$
Recall        $\frac{TP}{TP + FN}$
F1-score      $\frac{2 \times Precision \times Recall}{Precision + Recall}$
Brier score   $\frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2$
Log loss      $\sum_{i} -\big(y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\big)$

The running time of feature selection algorithms is considered as a metric for evaluating efficiency, and the mean time of multiple feature selection runs is used to present the overall efficiency of each method.

The stability of different feature selection methods is measured by the consistency index [34]. Let $A$ and $B$ be two feature subsets, where $A \subseteq C$ and $B \subseteq C$ are both of size $k$. The consistency index is computed as follows:

$$I_C(A, B) = \frac{r - \frac{k^2}{n}}{k - \frac{k^2}{n}} = \frac{rn - k^2}{kn - k^2}, \qquad (7)$$

where $r$ denotes the size of the intersection of $A$ and $B$, and $n$ denotes the number of features. The stability index for a set of $K$ selected sequences is the average of all pairwise consistency indices [34]:

$$\tau_S(A(k)) = \frac{2}{K(K-1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} I_C(S_i(k), S_j(k)). \qquad (8)$$

The higher the index is, the more consistent the selected features are.
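A direct implementation of Eqs. (7) and (8), assuming each selected subset is given as a set of feature indices of equal size k < n:

```python
from itertools import combinations

def consistency_index(a, b, n):
    """Kuncheva's consistency index (Eq. 7) for two equal-size subsets of n features."""
    k = len(a)
    r = len(set(a) & set(b))
    return (r * n - k * k) / (k * n - k * k)      # undefined for k == n

def stability_index(subsets, n):
    """Average pairwise consistency index over K selected subsets (Eq. 8)."""
    pairs = list(combinations(subsets, 2))
    return sum(consistency_index(a, b, n) for a, b in pairs) / len(pairs)
```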


4.1.2 Comparison of quotation outcome prediction models. The same performance metrics as shown in Table 1 are used to compare the performance of quotation outcome prediction models. The test data prediction time is also included as a metric to measure the model speed.

4.1.3 Comparison of price generation models. Mean Absolute Error (MAE) and Max Error (MAXE) are used to evaluate the performance of the different quotation generation models. MAE measures the average magnitude of the prediction errors, and MAXE shows how bad the prediction can be in the worst case. The lower MAE and MAXE are, the better the prediction is. The two metrics are computed as follows:

$$MAE = \frac{\sum_{i=1}^{n} |\hat{y}_i - y_i|}{n}, \qquad (9)$$

$$MAXE = \max_i |\hat{y}_i - y_i|, \qquad (10)$$

where $\hat{y}$ is the predicted value and $y$ is the observed value to be predicted.
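Both metrics are available in scikit-learn; a small helper, assuming the candidate generator exposes a fit/predict interface:

```python
from sklearn.metrics import max_error, mean_absolute_error

def generation_report(model, X_train, y_train, X_test, y_test):
    """Fit one ratio generator and report MAE (Eq. 9) and MAXE (Eq. 10)."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {"MAE": mean_absolute_error(y_test, pred),
            "MAXE": max_error(y_test, pred)}
```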

4.1.4 Comparison of the simple price generation model and the generator and discriminator combined model. Considering that it is hard to implement and test models in real quotation cases, the performance evaluation adopts an offline evaluation manner [59] and focuses on the comparison of profitability between the baseline, which is the simple price generation model, and the generator and discriminator combined model. The first assumption of the evaluation is that clients will always prefer lower prices. Note that the target is actually the ratio of the final quotation price to the corresponding guide price rather than the final price, as mentioned in Section 3.5.

With $V$ denoting the product of the quotation quantity and the corresponding guide price, the profit difference between two quotation-to-guide-price ratio candidates $r_i$ and $r_j$ is given as follows:

$$\delta_{i,j} = V(p_i r_i - p_j r_j), \qquad (11)$$

where $p_i$ and $p_j$ denote the probability of a positive outcome for ratios $r_i$ and $r_j$. $p_i r_i - p_j r_j$ can be viewed as the percentage of profit difference in expected sales. In the case of a won historical quotation, any ratio lower than the historical ratio is considered 100% acceptable to that client, so the probability of a positive outcome is assigned as 1. When comparing with a historical ratio with a negative outcome, any ratio higher than that ratio is considered 100% unacceptable to the client, so the probability of a positive outcome is assigned as 0. The probability value in other cases is given by the probability prediction of the discriminator:

$$p_i = \begin{cases} 0, & \text{if } r_i \geq R,\ o = 0 \\ 1, & \text{if } r_i \leq R,\ o = 1 \\ \hat{p}, & \text{otherwise,} \end{cases} \qquad (12)$$

where $R$ and $o$ denote the historical ratio and outcome, with 0 as lost and 1 as won, and $\hat{p}$ is the predicted probability of a positive outcome. With Eq. (11), the performance of the two models can be further compared. Here the rate of having a positive profit difference over all quotation transactions is considered, and the difference in this value between the two models can be computed as

$$p_{diff} = \frac{n_i - n_j}{N}, \qquad (13)$$

where $n_i$ and $n_j$ denote the numbers of times $\delta_{i,j} < 0$ and $\delta_{i,j} > 0$ respectively, and $N$ is the total number of quotation transactions.

4.2 Model and Data Setup

In order to ensure the validity of the experiments, we randomly partition the dataset into a training set and a testing set with a ratio of 3:2. We repeat this process 25 times, so we have 25 training and testing partitions. The first 5 partitions are used for hyperparameter optimization of the models, following the methodology in [11, 48]. In particular, each algorithm is trained with every hyperparameter candidate on the 5 training sets, and we take the candidate with the highest median performance on the 5 testing sets as the best parameter. In our experiments, the tuned hyperparameters are the learning rate, number of trees and maximum tree depth for random forests, XGBoost and LightGBM, the regularization strength and maximum number of iterations for logistic regression and linear SVM, and the number of neighbors for KNN. The remaining 20 partitions are used for feature selection, so 20 different feature sets are obtained for each feature selection method. Due to limited computation resources, the numbers of selected features experimented with are 5, 10, ..., 40.

The same methodology applies to the evaluation of feature selection quality and to the evaluation of the different quotation generation and prediction models. Hyperparameter tuning is also applied to the price generation models. We tune the following hyperparameters: the regularization strength for ridge regression, the number of splines for GAM, and the number of trees and maximum tree depth for the random forests regressor and gradient boosting regressor. For the neural network, the learning rate and the output shapes of the dense layers are tuned. The structure of the simple neural network is shown in Table 7 in the Appendix, and the optimizer is Adam [31]. Since linear SVM does not support probability prediction, sigmoid calibration is applied to make it possible.

The packages scikit-learn [47], Keras [13] and pyGAM [14] are used for model implementation.
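A sketch of the partitioning and tuning scheme with scikit-learn; `build_model`, `grid` and `score_fn` are placeholders for a model constructor, a list of hyperparameter dictionaries and an evaluation function.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def make_partitions(X, y, n_partitions=25, test_size=0.4, seed=0):
    """25 random train/test splits with a 3:2 ratio; the first 5 are used for tuning."""
    rng = np.random.RandomState(seed)
    return [train_test_split(X, y, test_size=test_size,
                             random_state=rng.randint(10**6))
            for _ in range(n_partitions)]

def tune(build_model, grid, tuning_partitions, score_fn):
    """Keep the hyperparameters with the highest median score over the tuning partitions."""
    best_params, best_median = None, -np.inf
    for params in grid:
        scores = [score_fn(build_model(**params).fit(X_tr, y_tr), X_te, y_te)
                  for X_tr, X_te, y_tr, y_te in tuning_partitions]
        if np.median(scores) > best_median:
            best_params, best_median = params, float(np.median(scores))
    return best_params
```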

5 EXPERIMENTAL RESULTS

5.1 Performance comparison of feature selection methods

5.1.1 Feature selection quality comparison. The F1 scores of classifiers with features selected by different methods over 20 runs are shown in Figure 1. The performance of logistic regression, linear SVM and Gaussian naive Bayes is lower than that of the other algorithms in magnitude, so we put them in the same row and the others in other rows. Note that it may be hard to notice the difference between Fisher score and JMI in F1 score because they resemble each other strongly in performance, especially for KNN, random forests, XGBoost and LightGBM.

Overall, the feature selection performance of the embedded methods is better than that of the filter-based methods for KNN, random forests, XGBoost and LightGBM, because those classifiers are able to reach a higher score level with fewer features when the features are selected by embedded methods. For logistic regression, linear SVM and Gaussian naive Bayes, RFE-LightGBM and RFI-LightGBM are the worst two methods when the feature set size is smaller than 10; however, no significant performance difference is found among the remaining embedded methods and the filter methods.

For the two filter methods, Fisher score and JMI show a very similar performance level over all seven classifiers. For the six embedded methods, RFI-LightGBM outperforms the other five for KNN, random forests, XGBoost and LightGBM when the feature size is 5, as it obtains a relatively high F1 score; however, this advantage diminishes as the number of features increases. One interesting thing to notice is that all three RFI methods display less performance improvement as the feature set size increases, which is explainable by their forward feature selection strategy. With more features included, the three RFE methods are slightly better than the RFI methods, among which XGBoost is the best of the three tree classifiers when the feature size is larger than 10.

Results for the other metrics can be found in the Appendix: Figure 2 for accuracy, Figure 3 for AUC, Figure 4 for precision, Figure 5 for recall, Figure 6 for Brier score and Figure 7 for log loss.

5.1.2 Efficiency and stability comparison. The selection time and consistency index of each feature selection method are shown in Table 2. The two selected filter methods are generally better than the embedded methods in terms of stability and efficiency. For the embedded methods, the two with XGBoost require more time for feature selection while their consistency index is also generally lower than that of the other embedded methods, the opposite of their feature quality result. RFI-LightGBM has the best performance among the embedded methods in both efficiency and stability.

Combining feature selection quality, consistency and efficiency, the two filter methods are more recommendable than the embedded methods for performing feature selection for logistic regression, linear SVM and Gaussian naive Bayes, as they use less time and achieve comparable classification performance compared with the embedded methods. For the other classifiers, embedded methods are recommended as long as there are no strict limits on feature selection time.

5.2 Influential features in quotation activity

Following the feature aggregation rule from stability selection, features which appear in at least 5 out of the 20 feature sets selected by each method are considered stable features. The following features are then selected by majority voting among those methods.

• Days between quotation creation date and delivery start date
• Duration of delivery
• The acceptance rate of past quotations
• Days after last order (recency of order transactions)
• Days after last quotation (recency of quotation transactions)
• Weekly market prices of Gouda and EDAM provided by the EU commission
• Quotation quantity
• Ratio of final sale price to guide price at the last quotation
• Margin of final sale price to cost
• The mean ratio of final sale price to guide price over all past quotations
• Number of accepted quotations among past quotations
• Quotation value
• Number of rejected quotations among past quotations

As expected, reference prices such as market prices and historical results play an important role in predicting quotation outcomes, which indicates that clients rely on past experience, their relationship with the company and market trends to make decisions. It is also not a surprise to find that the days before delivery and the duration of delivery are among the most important features, because these two imply the degree of urgency of the client. The more urgent the client is, the more likely the offered price is to be accepted, even if it is relatively high.

5.3 Performance comparison of classifiers in predicting quotation outcome

The features used in the performance experiments for the different classifiers are the aggregated features from Section 5.2.

Accuracy, AUC, Brier score and prediction time are displayed in Table 3; the full results can be found in Appendix Figure 8. Random forests, XGBoost and LightGBM are the three best classifiers at predicting quotation outcomes, and KNN has competitive performance at this data dimension. Since KNN classifies instances by distance in the feature space, this may imply that the clients can be further categorized based on distance. However, from the F1 score of KNN shown in Figure 1, it is noticeable that the performance of KNN drops gradually as the data dimension increases, suggesting that it is not a good idea to use KNN when dealing with high-dimensional data. The simple neural network has an accuracy more than 7% lower than the previously mentioned classifiers, not to mention that it requires much more time for prediction, but its ability to identify positive outcomes is good since it has relatively high recall and F1 scores. The low accuracy of the neural network can be explained by, first, the insufficient number of training instances and, second, the variance and uncertainty in the business feature data and the mixed feature structure, which make it less powerful than in image or text classification where the data are more structured; it is still better than the remaining three classifiers. Logistic regression is the best in terms of both accuracy and Brier score among the remaining three classifiers. The relatively poor performance of these three classifiers suggests that multicollinearity exists between features [36].

Though the best classification performance is achieved by the three tree classifiers, their probability predictions for a proposed price do not comply with the rational client assumption in Section 3.5, which states that the lower the price is, the higher the probability of acceptance. This can be explained by their tree-based nature. Hence, they are not chosen as the discriminator in the end. Another reason for this decision is that they are more than 10 times slower in terms of prediction time, which may not be efficient enough for practical implementation. Logistic regression is then selected as it complies with the assumption and is not too bad at predicting quotation outcomes.

5.4 Performance comparison of quotation price generation models

The performance of the different regression models is shown in Table 4, in which the metrics indicate how good those regression models are at simulating salesperson pricing behaviour.

As for MAE, the two linear regression models display the worst performance. The generalized additive model, as a non-linear model, is slightly better than the linear ones; however, the random forests regressor and gradient boosting regressor achieve an MAE 15% lower than GAM. Ridge regression shows the lowest MAXE because the penalty it adds to linear regression reduces model variance. The random forests regressor has the second lowest MAXE.


Figure 1: F1 score of classifiers with 20 feature sets selected by different feature selection methods.

Though gradient boosting is in second place in terms of MAE, it also has the second worst performance in MAXE.

Random forests regressor is the best in simulating salesperson pricing behaviour based on the experiment results.

5.5 Performance comparison of baseline model and the proposed price generator and discriminator combined model

Based on the experiment results from the previous two sections, the random forests regressor and logistic regression have been selected as the generator and discriminator in the combined model. The random forests regressor is also used as the baseline model.


Table 2: Efficiency and stability performance of different feature selection methods.

Number of features  Measure             Fisher score  JMI    RFE-RF  RFI-RF  RFE-XGBoost  RFI-XGBoost  RFE-LightGBM  RFI-LightGBM
5                   Selection time (s)  1.40          2.039  47.548  5.097   133.405      19.995       38.169        5.407
                    Consistency index   0.787         0.787  0.576   0.826   0.659        0.742        0.611         0.865
10                  Selection time (s)  1.40          2.039  44.624  10.282  123.75       38.196       38.656        10.633
                    Consistency index   0.873         0.879  0.657   0.769   0.649        0.647        0.665         0.807
15                  Selection time (s)  1.40          2.039  40.682  14.585  118.45       55.747       35.718        13.971
                    Consistency index   0.848         0.848  0.762   0.788   0.596        0.637        0.721         0.828
20                  Selection time (s)  1.40          2.039  36.123  18.568  104.314      69.666       54.396        27.88
                    Consistency index   0.849         0.849  0.73    0.767   0.562        0.676        0.767         0.798
25                  Selection time (s)  1.40          2.039  54.35   37.804  123.457      107.07       39.927        25.477
                    Consistency index   0.789         0.788  0.757   0.821   0.574        0.664        0.844         0.87
30                  Selection time (s)  1.40          2.039  35.419  33.397  108.706      122.966      35.019        27.602
                    Consistency index   0.821         0.821  0.831   0.93    0.561        0.638        0.848         0.912
35                  Selection time (s)  1.40          2.039  29.588  37.032  92.438       141.122      29.931        29.452
                    Consistency index   0.805         0.806  0.918   0.88    0.559        0.561        0.87          0.928
40                  Selection time (s)  1.40          2.039  24.065  41.673  78.656       158.412      24.7          32.261
                    Consistency index   0.687         0.683  0.887   0.94    0.545        0.483        0.87          0.86

1. For consistency index, the higher the better.

2. The best results of filter and embedded methods are shown in bold respectively.

3. Filter methods perform the feature importance measurement once and form a feature ranking after each training phase, hence their selection time remains the same as the number of features increases.

Table 3: Results of quotation success prediction.

Classifier            Accuracy  AUC    Brier  Time (s)
KNN                   0.853     0.801  0.120  0.020
Gaussian naive Bayes  0.731     0.646  0.209  0.003
Linear SVM            0.729     0.586  0.177  0.003
Logistic regression   0.763     0.626  0.168  0.003
Random forests        0.862     0.784  0.108  0.048
XGBoost               0.864     0.801  0.106  0.042
LightGBM              0.872     0.814  0.113  0.047
Neural network        0.784     0.807  0.176  0.293

The best result for each metric is shown in bold.

Table 5 shows the profit difference between the two models. The price generator and discriminator combined model earns more profit than simple price generation in 19 out of 20 experiments. The combination of generator and discriminator yields on average €51,980 more profit for each quotation transaction over the 20 experiments, and this profit improvement equals 11% of expected sales (if products of that quotation quantity were sold at the guide price). However, the prices generated by the combined model at the same time yield less profit than the baseline model in more than 12% of all quotation transactions.

Table 4: Results of price generation.

Regression model            MAE      MAXE
Linear regression           0.03347  0.22608
Ridge regression            0.03352  0.18904
Generalized additive model  0.03274  0.24323
Random forests              0.02816  0.21566
Gradient boosting           0.02898  0.24076

1. The best result for each metric is shown in bold.
2. Note that the target predicted here is the ratio of the final quotation price to the corresponding guide price.

The higher overall profit but lower perform-better rate indicate that the combined model improves profit efficiently, because it earns more profit in fewer transactions. Thus, the conclusion is that the proposed model, in which the price generator predicts a proper quotation price boundary and the price decreases gradually from the upper bound until the discriminator predicts a positive outcome, can efficiently increase profit. Looking further into the experiment results, the combined model earns €190,484.067 more profit than the baseline model on historically accepted quotations, while the number is €57,545.485 on historically rejected quotations. The baseline model, in turn, earns €26,529.456 more profit than the combined model on historically accepted quotations and €27,973.877 on historically rejected ones. The different performances in the two conditions show that the combined model gains more profit by broadening the profit space.

To implement the combined model in practice, the two hyperparameters, the decrement for ratio adjustment and the threshold of the prediction probability for acceptance, need to be experimented with further.

Table 5: Gains of the generator and discriminator combined model compared to simple price generation.

Metric                                                 Performance
Profit difference (€)                                  51979.922 ± 13416.610
Percentage of profit difference in expected sales (%)  11.295 ± 3.477
Frequency difference of positive PD (%)                −12.264 ± 2.513

PD denotes profit difference, and a positive PD means that the model earns more profit than the other.

6 CONCLUSIONS

6.1 Conclusion

In this study, five research questions are answered with the goal of making the optimal quotation decisions automatically.

In the comparison of different feature selection methods, the filter methods JMI and Fisher score are more recommendable than the embedded methods for performing feature selection for logistic regression, Gaussian naive Bayes and linear support vector machine, as they spend less time on the task and the features they select achieve competitive performance. For random forests, LightGBM and XGBoost, the embedded methods are better. Among the six embedded methods, the search strategy RFE outperforms RFI as the feature size increases, and the base classifier XGBoost is relatively better than LightGBM and random forests when the feature size is larger than 10. Influential features are aggregated across the features selected by those methods, and they show that previous experiences, market prices and the degree of urgency of the client are the most important features in quotation activities.

To achieve more profit while ensuring the win probability, a price generator and quotation outcome discriminator combined model is proposed. In the selection of the discriminator, although LightGBM is the best in terms of accuracy, its probability prediction does not comply with the lower-price, higher-acceptance-probability assumption; logistic regression is instead selected as the discriminator for its good probability prediction and its balance between accuracy and efficiency. The price generator is selected by comparing the performance of linear regression, ridge regression, the generalized additive model, the random forests regressor and the gradient boosting regressor in simulating salesperson pricing behaviour, and the random forests regressor is the best choice in terms of prediction deviation.

The proposed generator and discriminator combined model outperforms the baseline model (the random forests regressor) in 19 out of 20 experiments. Though in each experiment the proposed model is on average 12% less likely to perform better than the baseline model, it yields on average 11% more profit, which equals €51,979 for each quotation. The results indicate that the proposed model, in

which a price generator predicts a proper quotation price boundary and adjusts the price with a certain decrement to get a positive outcome prediction from a quotation outcome discriminator, can efficiently increase profit.

6.2 Future work

The proposed model is proved to be better than a simple regression model by offline evaluation. However, there is plenty of room for future work. Three aspects of improvement are proposed.

(1) Data level. The quotation-related features included in this study are all about quotation specifications, customer information and their references for making decisions. However, to better simulate the pricing behavior of salespersons, more features such as the inventory level of the demanded product and the quality of the product can be included. Negotiation record data can give more insights into customer behaviour.

(2) Model level. The classification results indicate that some common characteristics may exist between customers; thus a cluster analysis of customers is possible future work, and more customized pricing models can be built to generate more precise prices and achieve higher profit.

(3) Evaluation level. Models that perform well in offline evaluation may perform differently in online evaluation. To know the actual performance of the models in practice, an online evaluation is highly suggested.

ACKNOWLEDGEMENTS

I thank my supervisor Chang Li for his helpful feedback and suggestions. I am also grateful to Jiayang Zhuo for generously sharing his domain knowledge, practical insights and experience to guide and inspire me.

REFERENCES

[1] Europe milk market observatory. https://ec.europa.eu/agriculture/ market-observatory/milk_en. Accessed: 2019-05-08.

[2] Lightgbm feature importance. https://lightgbm.readthedocs.io/en/latest/Python-API.html. Accessed: 2019-06-23.

[3] Random forests feature importance. https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#r251. Accessed: 2019-06-23.

[4] Trigona cheese. http://en.trigonadairytrade.nl/cheese-2/. Accessed: 2019-04-10.
[5] Xgboost feature importance. https://xgboost.readthedocs.io/en/latest/python/python_api.html. Accessed: 2019-06-23.

[6] Naomi S Altman. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175–185, 1992.

[7] Anthony K Asare, Thomas G Brashear-Alejandro, and Jun Kang. B2b technology adoption in customer driven supply chains. Journal of Business & Industrial Marketing, 31(1):1–12, 2016.

[8] David R Bell and Randolph E Bucklin. The role of internal reference points in the category purchase decision. Journal of Consumer Research, 26(2):128–143, 1999.
[9] Hernan A Bruno, Hai Che, and Shantanu Dutta. Role of reference price on price and quantity: insights from business-to-business markets. Journal of Marketing Research, 49(5):640–654, 2012.

[10] Yin-Wen Chang and Chih-Jen Lin. Feature ranking using linear svm. In Causation and Prediction Challenge, pages 53–64, 2008.

[11] Huanhuan Chen, Peter Tino, and Xin Yao. Probabilistic classification vector machines. IEEE Transactions on Neural Networks, 20(6):901–914, 2009.
[12] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.

[13] François Chollet et al. Keras. https://keras.io, 2015.

[14] Daniel Servén, Charlie Brummitt, Hassan Abedi, and hlink. dswah/pygam: v0.8.0, October 2018.

[15] Sanmay Das. Filters, wrappers and a boosting-based hybrid for feature selection. In Icml, volume 1, pages 74–81, 2001.


[16] Manoranjan Dash and Huan Liu. Feature selection for classification. Intelligent data analysis, 1(1-4):131–156, 1997.

[17] Chad A Davis, Fabian Gerick, Volker Hintermair, Caroline C Friedel, Katrin Fundel, Robert Küffner, and Ralf Zimmer. Reliable gene signatures for microarray classification: assessment of stability and performance. Bioinformatics, 22(19):2356–2363, 2006.

[18] Jun Deng, Zixing Zhang, Erik Marchi, and Björn Schuller. Sparse autoencoder-based feature transfer learning for speech emotion recognition. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, pages 511–516. IEEE, 2013.

[19] Shu Fan, Chiwu Mao, and Ley Chen. Next-day electricity-price forecasting using a hybrid network. IET Generation, Transmission & Distribution, 1(1):176–182, 2007.
[20] Jerome H Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378, 2002.

[21] Quanquan Gu, Zhenhui Li, and Jiawei Han. Generalized fisher score for feature selection. arXiv preprint arXiv:1202.3725, 2012.

[22] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar):1157–1182, 2003.
[23] Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3):389–422, 2002.

[24] Bruce GS Hardie, Eric J Johnson, and Peter S Fader. Modeling loss aversion and reference dependence effects on brand choice. Marketing science, 12(4):378–394, 1993.

[25] Tin Kam Ho. Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition, volume 1, pages 278–282. IEEE, 1995.

[26] Arthur E Hoerl and Robert W Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.

[27] Bingbing Jiang, Chang Li, Maarten de Rijke, Xin Yao, and Huanhuan Chen. Prob-abilistic feature selection and classification vector machine. ACM Transactions on Knowledge Discovery from Data, 13(2):Article 21, April 2019.

[28] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146–3154, 2017.

[29] Luckyson Khaidem, Snehanshu Saha, and Sudeepa Roy Dey. Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003, 2016.

[30] Nguyen Lu Dang Khoa, Kazutoshi Sakakibara, and Ikuko Nishikawa. Stock price forecasting using back propagation neural networks with time and profit based adjusted weight factors. In 2006 SICE-ICASE International Joint Conference, pages 5484–5488. IEEE, 2006.

[31] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[32] Ron Kohavi and George H John. Wrappers for feature subset selection. Artificial intelligence, 97(1-2):273–324, 1997.

[33] Praveen K Kopalle and Joan Lindsey-Mullikin. The impact of external reference price on consumer price expectations. Journal of Retailing, 79(4):225–236, 2003.
[34] Ludmila I Kuncheva. A stability index for feature selection. In Artificial Intelligence and Applications, pages 421–427, 2007.

[35] Peter J LaPlaca and Jerome M Katrichis. Relative presence of business-to-business research in the marketing literature. Journal of Business-to-Business Marketing, 16(1-2):1–22, 2009.

[36] Kim Larsen. Generalized naive bayes classifiers. ACM SIGKDD Explorations Newsletter, 7(1):76–81, 2005.

[37] Kim Larsen. Gam: The predictive modeling silver bullet. Multithreaded. Stitch Fix, 30, 2015.

[38] KH Leung, CC Luk, KL Choy, HY Lam, and Carman KM Lee. A b2b flexible pricing decision support system for managing the request for quotation process under e-commerce business environment. International Journal of Production Research, pages 1–24, 2019.

[39] Chang Li and Maarten de Rijke. Incremental sparse bayesian ordinal regression. Neural Networks, 106:294–302, October 2018.

[40] Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P Trevino, Jiliang Tang, and Huan Liu. Feature selection: A data perspective. ACM Computing Surveys (CSUR), 50(6):94, 2018.

[41] Huan Liu and Lei Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge & Data Engineering, (4):491–502, 2005.

[42] Wangchao Lou, Xiaoqing Wang, Fan Chen, Yixiao Chen, Bo Jiang, and Hua Zhang. Sequence based prediction of dna-binding proteins based on hybrid feature selection using random forest and gaussian naive bayes. PLoS One, 9(1):e86703, 2014.

[43] Nicolai Meinshausen and Peter Bühlmann. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417–473, 2010.

[44] Christine Moorman, Gerald Zaltman, and Rohit Deshpande. Relationships between providers and users of market research: the dynamics of trust within and between organizations. Journal of marketing research, 29(3):314–328, 1992.

[45] Robert M Morgan and Shelby D Hunt. The commitment-trust theory of relationship marketing. Journal of marketing, 58(3):20–38, 1994.

[46] Jigar Patel, Sahil Shah, Priyank Thakkar, and K Kotecha. Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42(1):259–268, 2015.

[47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[48] Gunnar Rätsch, Takashi Onoda, and K-R Müller. Soft margins for AdaBoost. Machine learning, 42(3):287–320, 2001.

[49] Rajen D Shah and Richard J Samworth. Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1):55–80, 2013.

[50] Yael Karlinsky Shichor and Oded Netzer. Automating the B2B Salesperson Pricing Decisions: Can Machines Replace Humans and When? Technical report.

[51] Heung-Il Suk, Seong-Whan Lee, Dinggang Shen, Alzheimer's Disease Neuroimaging Initiative, et al. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Structure and Function, 220(2):841–859, 2015.

[52] Jiliang Tang, Salem Alelyani, and Huan Liu. Feature selection for classification: A review. Data classification: algorithms and applications, page 37, 2014.

[53] Yi-Chia Tsai, Yu-Da Cheng, Cheng-Wei Wu, Yueh-Ting Lai, Wan-Hsun Hu, Jeu-Yih Jeng, and Yu-Chee Tseng. Time-dependent smart data pricing based on machine learning. In Canadian Conference on Artificial Intelligence, pages 103–108. Springer, 2017.

[54] Eugene Tuv, Alexander Borisov, George Runger, and Kari Torkkola. Feature selection with ensembles, artificial variables, and redundancy elimination. Journal of Machine Learning Research, 10(Jul):1341–1366, 2009.

[55] Alfonso Urso, Antonino Fiannaca, Massimo La Rosa, Valentina Ravì, and Riccardo Rizzo. Data mining: Prediction methods. In Shoba Ranganathan, Michael Gribskov, Kenta Nakai, and Christian Schönbach, editors, Encyclopedia of Bioinformatics and Computational Biology, pages 413–430. Academic Press, Oxford, 2019.

[56] Jorge R Vergara and Pablo A Estévez. A review of feature selection methods based on mutual information. Neural computing and applications, 24(1):175–186, 2014.

[57] Chen Xing, Li Ma, and Xiaoquan Yang. Stacked denoise autoencoder based feature extraction and classification for hyperspectral images. Journal of Sensors, 2016, 2016.

[58] H Yang and John Moody. Feature selection based on joint mutual information. In Proceedings of international ICSC symposium on advances in intelligent data analysis, pages 22–25. Citeseer, 1999.

[59] Jeonghee Yi, Ye Chen, Jie Li, Swaraj Sett, and Tak W Yan. Predictive model performance: Offline and online evaluations. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1294–1302. ACM, 2013.

[60] Jonathan Z Zhang, Oded Netzer, and Asim Ansari. Dynamic targeted pricing in b2b relationships. Marketing Science, 33(3):317–337, 2014.


A APPENDIX

A.1 Features

Table 6 shows the features used in this study. Four feature groups are formed. Individual quotation contains the specification of that quotation; Last transaction contains features about the last transaction (quotation or order) from the same client; Past transactions contains features aggregated over all past transactions from the same client. The remaining features belong to the General group.
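As an illustration of how the Past transactions group could be built, the sketch below aggregates a client's historical quotations with pandas. The column names (customer_id, quantity, value, accepted) and the toy records are hypothetical placeholders for the fields listed in Table 6, not the company's actual schema.

    import pandas as pd

    # Hypothetical historical quotation log; column names are placeholders
    # for the fields listed in Table 6.
    history = pd.DataFrame({
        "customer_id": [1, 1, 1, 2, 2],
        "quantity":    [100, 250, 80, 500, 120],
        "value":       [1200.0, 2900.0, 950.0, 6100.0, 1500.0],
        "accepted":    [1, 0, 1, 1, 0],
    })

    # Aggregate all past transactions of each client, mirroring the
    # "Past transactions" group (sums, averages, counts, acceptance rate).
    past = history.groupby("customer_id").agg(
        sum_quantity=("quantity", "sum"),
        avg_quantity=("quantity", "mean"),
        std_quantity=("quantity", "std"),
        sum_value=("value", "sum"),
        n_quotations=("accepted", "size"),
        n_accepted=("accepted", "sum"),
    )
    past["pct_accepted"] = past["n_accepted"] / past["n_quotations"]

    # Separate aggregates over accepted quotations only.
    accepted = history[history["accepted"] == 1]
    past = past.join(accepted.groupby("customer_id").agg(
        sum_quantity_accepted=("quantity", "sum"),
        avg_value_accepted=("value", "mean"),
    ))
    print(past)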

A.2 Structure of simple neural network

Table 7 shows the structure of the neural network classifier used in Section 3.3.

A.3 Performance of feature selection methods

The remaining metrics used to measure the quality of the selected features, namely accuracy, AUC, precision, recall, Brier score and log loss, are displayed in this section for each feature selection method. The F1 score is already reported in Section 5.1.1. Note that, for clear visualization, the y-axis range can vary across subplots (feature quality measured by different models) in the following figures.
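For reference, the sketch below shows how these metrics can be computed with scikit-learn from a classifier's predicted probabilities. The arrays y_true and y_prob and the 0.5 decision threshold are illustrative assumptions, not values taken from the study.

    import numpy as np
    from sklearn.metrics import (accuracy_score, brier_score_loss, f1_score,
                                 log_loss, precision_score, recall_score,
                                 roc_auc_score)

    # Placeholder ground truth (1 = accepted quotation) and predicted probabilities.
    y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
    y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.8, 0.3, 0.55])
    y_pred = (y_prob >= 0.5).astype(int)  # assumed decision threshold

    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_prob),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "F1 score": f1_score(y_true, y_pred),
        "Brier score": brier_score_loss(y_true, y_prob),
        "log loss": log_loss(y_true, y_prob),
    }
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")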

A.4 Performance of quotation outcome prediction models

Table 8 shows the full results of the different classifiers in predicting the quotation outcome, as described in Section 3.3.


Table 6: Feature group and corresponding features.

Feature group: Features

Individual quotation: Quotation quantity, Quotation value, Duration of delivery, Days between quotation creation and delivery start date, Delivery country, Customer ID, Customer country, Delivery quarter, Creation week, Material type, Plant ID, Product type, Product hierarchy level (1-6), Sale price, Guide price, Conversion cost, Ratio of sale price to guide price, Margin of sale price and guide price, Margin of sale price and conversion cost, Final outcome

Last transaction: Quotation quantity, Quotation value, Recency, Ratio of sale price to guide price, Recency of order transaction, Order quantity, Order price

Past transactions: Sum of quantity, Average of quantity, Sum of quantity in accepted quotations, Average of quantity in accepted quotations, Sum of quotation value, Sum of value in accepted quotations, Average of value in accepted quotations, Number of quotations, Number of accepted quotations, Number of rejected quotations, Standard deviation in quantity, Number of order transactions, Sum of order value, Average ratio of sale price to guide price, Percentage of accepted quotations

General: Guide price trend, Number of quotations on the same day from the client, EDAM weekly price, Gouda weekly price, Cheddar weekly price, Trigona contract price, Trigona spot price

1. The sale price here is the final quotation price.

2. Guide price trend represents the upward or downward trend of the guide price, derived from a moving average of the daily mean guide prices with a 30-day window.
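A minimal pandas sketch of the second note, assuming quotation-level guide price records: the column names and the "up"/"down" encoding of the sign are assumptions; only the 30-day moving average of daily mean guide prices comes from the note itself.

    import pandas as pd

    # Hypothetical quotation-level guide price records; column names are placeholders.
    prices = pd.DataFrame({
        "date": pd.to_datetime(["2019-01-01", "2019-01-01", "2019-01-02",
                                "2019-01-03", "2019-01-04"]),
        "guide_price": [3.10, 3.20, 3.15, 3.25, 3.22],
    })

    # Daily mean guide price, smoothed with a 30-day moving average.
    daily_mean = prices.groupby("date")["guide_price"].mean()
    moving_avg = daily_mean.rolling("30D").mean()

    # Encode the trend from the day-to-day change of the smoothed series;
    # mapping the sign to "up"/"down" is an assumed encoding.
    trend = moving_avg.diff().gt(0).map({True: "up", False: "down"})
    print(trend)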

Table 7: Structure of neural network.

Layer (type)   Activation function   Output shape
Input          -                     (1, 635)
Dense          ReLU                  (1, 64)
Dense          Sigmoid               (1, 1)
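A minimal Keras sketch of the Table 7 architecture; the binary cross-entropy loss, the Adam optimizer and the accuracy metric are assumed training settings, not values reported in the table.

    from keras.models import Sequential
    from keras.layers import Dense

    # Network from Table 7: 635 input features, one 64-unit ReLU hidden layer,
    # and a single sigmoid output for the quotation outcome.
    model = Sequential([
        Dense(64, activation="relu", input_shape=(635,)),
        Dense(1, activation="sigmoid"),
    ])

    # Assumed training setup: binary cross-entropy loss with the Adam optimizer.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()

The 635-dimensional input presumably reflects the encoded features of Table 6, although the exact encoding is not specified in the table.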


Figure 2: Accuracy of classifiers with 20 feature sets selected by different feature selection methods.


Figure 3: AUC of classifiers with 20 feature sets selected by different feature selection methods.


Figure 4: Precision of classifiers with 20 feature sets selected by different feature selection methods.


Figure 5: Recall of classifiers with 20 feature sets selected by different feature selection methods.


Figure 6: Brier score of classifiers with 20 feature sets selected by different feature selection methods.


Figure 7: Log loss of classifiers with 20 feature sets selected by different feature selection methods.


Table 8: Results of quotation success prediction.

Classifier            Accuracy  AUC    Brier score  Log loss  Precision  Recall  F1 score  Time (s)
KNN                   0.853     0.801  0.12         2.394     0.821      0.801   0.81      0.02
Gaussian naive Bayes  0.731     0.646  0.209        0.808     0.657      0.646   0.651     0.003
Linear SVM            0.729     0.586  0.177        0.535     0.67       0.586   0.573     0.003
Logistic regression   0.763     0.626  0.168        0.514     0.707      0.626   0.639     0.003
Random forests        0.862     0.784  0.108        0.357     0.853      0.784   0.808     0.048
XGBoost               0.864     0.801  0.106        0.385     0.844      0.801   0.818     0.042
LightGBM              0.872     0.814  0.113        0.674     0.852      0.814   0.83      0.047
Neural Network        0.784     0.807  0.176        0.874     0.839      0.868   0.853     0.293

The best result for each metric is in bold.
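In outline, a comparison like Table 8 can be produced with scikit-learn as in the sketch below; the synthetic data, the train/test split and the default hyperparameters are assumptions, so the numbers will not match the table.

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic stand-in for the quotation data; in the study the features come
    # from Table 6 and the label is the quotation outcome.
    X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    classifiers = {
        "KNN": KNeighborsClassifier(),
        "Gaussian naive Bayes": GaussianNB(),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "Random forests": RandomForestClassifier(random_state=0),
    }

    for name, clf in classifiers.items():
        start = time.time()
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.3f}, "
              f"F1={f1_score(y_test, y_pred):.3f}, time={time.time() - start:.3f}s")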
