
6.5 Methods

6.5.8 Summary of qualitative properties

The following table is based on the list of important qualitative properties we compiled for methods performing causal inference in time series, and on the discussion in this chapter of whether each method investigated fulfills them. It summarizes the properties of all methods and proposes a classification scheme for causal inference methods in time series. Properties are reported with respect to the specific implementation used in this benchmark.

Method                        MTE   PMIME  PCMCI  MVGC  TCDF  PDC   CCM
Delay discovery               ✓     ✗      ✓      ✗     ✓     ✗     ****
Self-causation                ✗     ✗      ✓      ✗     ✓     ✓     ✗
Instantaneous causality       ✓     ✗      ✓      ✗     ✓     ***   ****
Observed confounders          ✓     ✓      ✓      ✓     ✓     ✓     ✗
Unobserved confounders        ✗     ✗      ✗      ✗     **    ✗     ✗
Polyadic relations            ✗     ✗      ✗      ✗     ✗     ✗     ✗
Non-linear data               ✓     ✓      ✓      *     ✓     *     ✓
Non-stationary data           ✗     ✗      ✗      ✗     ✓     ✗     ✗
Bivariate/Multivariate data   both  both   both   both  both  both  bivariate
Discrete/Continuous data      both  both   both   both  both  both  both

Table 6.2: Overview of properties for each method examined (✓: fulfilled, ✗: not fulfilled). *: Barnett and Seth (2014). **: Nauta et al. (2019). ***: Faes et al. (2013a). ****: Ye et al. (2015).

Results

The results of the benchmark study conducted in Chapter 6 are listed here. The methods are evaluated quantitatively, as described in Section 6.4.2. Their performances are then compared and visualized, and remarks are made on the overall results. All methods were executed on an 8th generation Intel® Core™ i5-8365U CPU.

7.1 Method performance

We apply every method presented in Chapter 6 to the datasets introduced in Section 6.2, using the experimental design detailed for each method in its corresponding section. To illustrate how each method is evaluated in practice, we present the full results of multivariate transfer entropy (MTE) on data group H1. Running MTE on 5 such datasets and comparing each estimated causal graph with the ground truth, we obtain the results listed in Table 7.1.

Category H1         Dataset 1  Dataset 2  Dataset 3  Dataset 4  Dataset 5  Average
True positives      30         30         36         26         36
False positives     9          8          4          23         7
True negatives      335        336        340        321        337
False negatives     6          6          0          10         0
Sensitivity         0.83       0.83       1          0.72       1          0.88
Specificity         0.97       0.98       0.99       0.93       0.98       0.97
F1 score            0.80       0.81       0.95       0.61       0.91       0.82
MCC                 0.78       0.79       0.94       0.57       0.91       0.80
Time (in seconds)   10845      10395      11097      11811      11460      11097

Table 7.1: Full MTE results on the first data category, rounded to two decimals. The last column reports the mean over the five datasets for the scores and the median for the runtime.

This table contains all the performance metrics introduced in the previous chapter for this specific combination of method and data category. In the remainder of this section, for the simulated Hénon datasets we present only the average MCC and the runtime of each method, with a brief comment on its performance. For the real dataset, the F1 score is reported instead of the MCC, as the latter was found to be too strict for evaluating a very low dimensional dataset. The results are subsequently summarized in Table 7.9. The next section presents a series of visualizations of the results, together with their interpretation and the insights gained from them. Results are reported and discussed in the order in which the methods were presented.
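The scores in Table 7.1 follow directly from the four confusion counts of each dataset. The snippet below is a minimal sketch that recomputes them using the standard definitions of sensitivity, specificity, F1 score and MCC; the function name `scores` and the variable names are illustrative and not part of the benchmark code.

```python
from math import sqrt
from statistics import mean, median

# Confusion counts and runtimes copied from Table 7.1 (MTE, category H1).
tp = [30, 30, 36, 26, 36]
fp = [9, 8, 4, 23, 7]
tn = [335, 336, 340, 321, 337]
fn = [6, 6, 0, 10, 0]
seconds = [10845, 10395, 11097, 11811, 11460]


def scores(tp, fp, tn, fn):
    """Return (sensitivity, specificity, F1, MCC) for one confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is set to 0 when a margin of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, f1, mcc


per_dataset = [scores(*counts) for counts in zip(tp, fp, tn, fn)]

# Scores are averaged over the five datasets; the runtime is summarized by its median.
avg_sensitivity, avg_specificity, avg_f1, avg_mcc = (
    mean(column) for column in zip(*per_dataset)
)
median_seconds = median(seconds)

for name, value in [("sensitivity", avg_sensitivity), ("specificity", avg_specificity),
                    ("F1", avg_f1), ("MCC", avg_mcc)]:
    print(f"average {name}: {value:.2f}")
print(f"median runtime: {median_seconds} s")
```

Averaging the unrounded per-dataset scores, as done here, is an assumption about how the last column of the table was obtained.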

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.80   0.97  0.80   0.94  0.88        0.80 (F1 score)
Time (median, seconds)   11097  795   10993  803   5922        -

Table 7.2: Summary results for MTE.

MTE

Multivariate TE performs very well on the low dimensional H2 and H4 data groups, while both its accuracy and its running time deteriorate markedly on the higher dimensional datasets (H1 and H3). It is the slowest method, with the average median running time over the different data groups exceeding 1.5 hours. MTE performs equally well in the low dimensional loosely coupled group (H4) and in the low dimensional strongly coupled one (H2). The robustness of TE in low dimensions may be attributed to the fact that this particular implementation uses the first KSG estimator for TE (3.20). This estimator bases the threshold for counting points in the marginal spaces on the distance of every point to its kth nearest neighbor in the joint space, rather than in each marginal space separately. As noted in Kraskov et al. (2003), this is not harmful for low dimensional data; only for high dimensional data will the second KSG estimator perform better. Given the significant difference in MTE performance between low and high dimensional datasets, this remark should be taken into account in TE analyses. On real data, MTE also performed well.
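The neighbor-counting rule described above can be made concrete with a small sketch. It implements the first KSG estimator for the mutual information between two scalar variables, not the transfer entropy estimator of equation (3.20) and not the implementation used in this benchmark. It uses brute-force neighbor search and the maximum norm, and the names `digamma_int` and `ksg1_mi` are hypothetical.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def ksg1_mi(xs, ys, k=4):
    """First KSG estimate of I(X;Y) in nats (brute force, for illustration only)."""
    n = len(xs)
    acc = 0.0
    for i in range(n):
        dx = [abs(xs[i] - xs[j]) for j in range(n) if j != i]
        dy = [abs(ys[i] - ys[j]) for j in range(n) if j != i]
        # Distance from point i to its k-th nearest neighbor, measured in the
        # JOINT space with the maximum norm.
        eps = sorted(max(a, b) for a, b in zip(dx, dy))[k - 1]
        # The same joint-space radius is reused as the counting threshold
        # in each marginal space (strict inequality).
        n_x = sum(1 for a in dx if a < eps)
        n_y = sum(1 for b in dy if b < eps)
        acc += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
noise = [rng.gauss(0, 1) for _ in range(200)]
y_dep = [a + 0.1 * b for a, b in zip(x, noise)]  # strongly dependent on x

mi_independent = ksg1_mi(x, noise)  # true value is 0
mi_dependent = ksg1_mi(x, y_dep)    # true value is 0.5 * ln(101) nats
print(f"independent: {mi_independent:.3f} nats, dependent: {mi_dependent:.3f} nats")
```

The second KSG estimator differs in that it derives a separate radius for each marginal space from the k nearest neighbors, which is the distinction the remark above refers to.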

PMIME

PMIME performed perfectly on all Hénon data, retrieving the correct causal graph at every iteration. Comparing the two information-theoretic methods studied so far, PMIME outperformed MTE in terms of both accuracy and speed. Indeed, there is a striking difference in the computational cost of the two methods. As remarked in Kugiumtzis (2013), PMIME bypasses the computationally exhausting step of significance testing of estimates, which dramatically improves its speed compared to MTE. PMIME was, however, challenged by the real dataset, registering a below-average result.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            1      1     1      1     1           0.50 (F1 score)
Time (median, seconds)   2075   127   2078   126   1101        -

Table 7.3: Summary results for PMIME.

PCMCI

PCMCI performed reasonably well on simulated data, and was among the top performing methods on the real dataset. The main asset of this method is its computational speed: PCMCI was the fastest among the methods examined. An interesting observation is that PCMCI appears to benefit from weakly coupled data in terms of performance.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.67   0.60  0.78   0.85  0.72        0.80 (F1 score)
Time (median, seconds)   25     1     21     1     12          -

Table 7.4: Summary results for PCMCI.

MVGC

On average, MVGC performed relatively well; however, its performance was inconsistent and strongly affected by coupling strength and dimensionality. In particular, MVGC benefited significantly from high dimensional data. On real data, MVGC showed above-average performance.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.84   0.29  0.72   0.53  0.70        0.67 (F1 score)
Time (median, seconds)   383    3     220    3     152         -

Table 7.5: Summary results for MVGC.

TCDF

On simulated data, TCDF also performed reasonably well. It struck a balance between consistent performance over different configurations, computational speed (scaling favorably as the number of variables increased), and general robustness with respect to the data it can accommodate, being suitable even for non-stationary data. On the other hand, TCDF performed poorly on the real dataset, essentially failing to detect causality.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.83   0.76  0.65   0.70  0.74        0 (F1 score)
Time (median, seconds)   73     17    73     17    45          -

Table 7.6: Summary results for TCDF.

PDC

On simulated datasets, PDC was the best non-information-theoretic method. It exhibited consistently high performance throughout the different categories. The specific implementation used became significantly slower as the number of time series increased, which we attribute to inefficient coding routines. On real data, PDC displayed mediocre performance.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.85   0.86  0.86   0.82  0.85        0.56 (F1 score)
Time (median, seconds)   4936   14    4866   13    2457        -

Table 7.7: Summary results for PDC.

CCM

CCM was the worst performing method on the Hénon map datasets. This may be attributed to the fact that CCM is only able to make bivariate inferences. As a result, it is the only method that does not account for the effects of the other variables on each causal relation it investigates.

On the low dimensional real dataset, which consists of only 3 variables, CCM performed well.

An overview of the results discussed so far is included in Table 7.9. In this table, methods are sorted based on their overall average performance (on simulated data) from the best performing (top) to the worst performing (bottom) methods. The results presented so far are subsequently discussed and visualized in the next section, and methods are comprehensively compared.

Category                 H1     H2    H3     H4    Avg. Hénon  Real data
MCC (average)            0.36   0.41  0.28   0.44  0.37        0.80 (F1 score)
Time (median, seconds)   621    31    587    31    318         -

Table 7.8: Summary results for CCM.

Method   H1    H2    H3    H4    Average  Runtime (s)  Real data (F1 score)
PMIME    1     1     1     1     1        1101         0.50
MTE      0.80  0.97  0.80  0.94  0.88     5922         0.80
PDC      0.85  0.86  0.86  0.82  0.85     2457         0.56
TCDF     0.83  0.76  0.65  0.70  0.74     45           0
PCMCI    0.67  0.60  0.78  0.85  0.72     12           0.80
MVGC     0.84  0.29  0.72  0.53  0.70     152          0.67
CCM      0.36  0.41  0.28  0.44  0.37     318          0.80

Table 7.9: Summary of all results.
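The ordering used in Table 7.9, together with the speed and real-data comparisons made in the text, can be reproduced from the reported summary values. The following is a small sketch with the numbers copied from the table; the `Row` record and the variable names are illustrative only.

```python
from collections import namedtuple

Row = namedtuple("Row", "method avg_mcc runtime_s real_f1")

# Average MCC on the Hénon data, average median runtime in seconds and
# F1 score on the real dataset, copied from Table 7.9.
rows = [
    Row("PMIME", 1.00, 1101, 0.50),
    Row("MTE", 0.88, 5922, 0.80),
    Row("PDC", 0.85, 2457, 0.56),
    Row("TCDF", 0.74, 45, 0.00),
    Row("PCMCI", 0.72, 12, 0.80),
    Row("MVGC", 0.70, 152, 0.67),
    Row("CCM", 0.37, 318, 0.80),
]

# Best to worst on simulated data, fastest to slowest, and the methods
# sharing the highest F1 score on the real dataset.
by_accuracy = [r.method for r in sorted(rows, key=lambda r: r.avg_mcc, reverse=True)]
by_speed = [r.method for r in sorted(rows, key=lambda r: r.runtime_s)]
best_real_f1 = max(r.real_f1 for r in rows)
top_on_real = sorted(r.method for r in rows if r.real_f1 == best_real_f1)

print("by average MCC:", by_accuracy)
print("by runtime:    ", by_speed)
print("best on real data:", top_on_real)
```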