
5.3 Incorrectly Rejecting H0

5.3.1 Errors

Because simulations only approximate the tests, the results obtained are subject to error. For the implementation of these tests, four main sources of error contribute.

Machine Error Machine (floating-point) error is present in the results. This error is usually neglected because it is small. However, computing the p-values requires comparing many values, some of which agree up to a difference of order 10^(-15). When two such numbers are compared, one will be strictly greater than the other, although without the machine error they would be equal. For example, if a p-value is close to 0.05, this error could push it just below 0.05, resulting in an incorrect rejection.
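One way to make such comparisons robust is to treat values that differ by less than a small tolerance as ties. The following is a minimal sketch, not the implementation used for the results above; the function name `perm_p_value`, the tolerance `tol`, and the toy values are assumptions made purely for illustration.

```python
import numpy as np

def perm_p_value(observed, permuted, tol=0.0):
    """Fraction of permuted statistics at least as large as the observed one.

    With tol > 0, permuted values within tol below the observed value
    are counted as ties instead of being treated as strictly smaller.
    """
    permuted = np.asarray(permuted, dtype=float)
    return float(np.mean(permuted >= observed - tol))

# 0.1 + 0.2 and 0.3 are equal in exact arithmetic but not in floating point.
observed = 0.1 + 0.2                       # stored as 0.30000000000000004
permuted = np.array([0.3, 0.1, 0.5, 0.2])  # hypothetical permuted statistics

strict = perm_p_value(observed, permuted)              # exact comparison
tolerant = perm_p_value(observed, permuted, tol=1e-12) # ties within 1e-12
```

In this toy case the strict comparison gives 0.25 and the tolerant one gives 0.5, because the permuted value 0.3 is then counted as a tie with the observed statistic.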

CHAPTER 5. RESULTS & COMPARISON OF METHODS

Error from Reduced Number of Permutations For large values of n and small values of p, the number of permutations can be very large. Monte Carlo methods were therefore used to limit the number of permutations, so that the tests have reasonable run times. However, this yields only approximate p-values, and the accuracy of a Monte Carlo estimate deteriorates as the number of permutations decreases. For the above tables, 1000 permutations were used. Although this is comparatively small for large values of n and small values of p, the computed p-values were already either smaller than 0.05 by a factor of order 10^2 or considerably larger than 0.05. The improvement in accuracy from using more permutations therefore did not justify the higher runtime cost. Note that, for each function and each method, the tests are run 1000 times on data sets of length 150 with 1000 Monte Carlo iterations each. The running time thus increases rapidly with the number of Monte Carlo iterations.

Error from Finite Number of Repetitions To compute the above results, the tests were run only a finite number of times. As the number of runs increases to infinity, the fraction of rejections under the null converges to its true value. The smaller the number of repetitions, the further the result tends to be from its true value.
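The size of this error, and of the Monte Carlo error from a reduced number of permutations, can be gauged with the standard error of a binomial proportion, since each repetition (or permutation) either counts towards the fraction or does not. The sketch below illustrates this under the assumption of independent repetitions; the function name `proportion_se` is hypothetical and the value 0.05 is used only as a representative fraction.

```python
import math

def proportion_se(p, n):
    """Standard error of an estimated proportion p based on n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)

# A fraction near the nominal level 0.05, estimated from 1000 runs.
se_1000 = proportion_se(0.05, 1000)
# The same fraction estimated from 10000 runs.
se_10000 = proportion_se(0.05, 10000)

# Test-statistic evaluations for one function and one permuting method:
# 1000 runs, each with 1000 Monte Carlo permutations.
evaluations = 1000 * 1000
```

With 1000 repetitions the standard error is roughly 0.007. A tenfold increase in repetitions reduces it only by a factor of about 3.2 (the square root of 10), while the number of evaluations grows tenfold.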

Asymptotic Error The finite number of observations used also causes an error. The spacing of the Fourier frequencies at which the periodogram is defined is inversely proportional to the length of the observed data. Thus, the longer the data, the more frequencies are considered in the interval [−π, π]. In some cases, the periods tested do not correspond to a Fourier frequency, and the extended periodogram is used instead, which interpolates the values of the periodogram to cover all values in the interval [−π, π]. In theory, as n increases to infinity, the two periodograms coincide. Thus, a smaller n causes a larger asymptotic error. Note that this also applies to the Lomb-Scargle periodogram. Based on repeated experiments, the Lomb-Scargle periodogram is believed to have the highest asymptotic error of all the test statistics. When n was increased from 150 to 300, there was a noticeable improvement in the results obtained from the Lomb-Scargle periodogram, whereas the other results remained similar (see Section 5.4.1).
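Whether a tested period corresponds to a Fourier frequency can be checked directly. The sketch below assumes the usual convention that the Fourier frequencies are 2πk/n for integer k and that a period p corresponds to the frequency 2π/p; the helper names are hypothetical, and the periods 5/2 and 7/3 are those of the cosine and sine functions considered in Section 5.4.

```python
from fractions import Fraction
import math

def fourier_index(n, period):
    """Return k = n / period exactly; 2*pi/period equals the Fourier
    frequency 2*pi*k/n only when k is an integer."""
    return Fraction(n) / Fraction(period)

def is_fourier_period(n, period):
    """True if the frequency of the given period lies on the Fourier grid."""
    return fourier_index(n, period).denominator == 1

n = 150
on_grid_cosine = is_fourier_period(n, Fraction(5, 2))  # k = 60, on the grid
on_grid_sine = is_fourier_period(n, Fraction(7, 3))    # k = 450/7, off the grid

# Grid spacing halves when the sample length doubles.
spacing_150 = 2 * math.pi / 150
spacing_300 = 2 * math.pi / 300
```

Under these assumptions, a period of 7/3 falls between two Fourier frequencies for n = 150, which is the situation in which an interpolated periodogram is needed.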

5.4 Receiver Operating Characteristic (ROC) Curves

As seen in Chapter 2, ROC curves are useful tools for assessing the performance of a test. Figures 5.3 to 5.15 show the ROC curves for the different functions given in Table 5.1. In each graph, the purple line is the line of no-discrimination. Recall that the best test statistic is the one whose curve is closest to the top left corner. To compute the false positive and false negative rates, 1000 data samples of length 150 were generated, half under H0 and half under H1. The number of Monte Carlo iterations was set to 1000. This is consistent with the values used in Section 5.3 to compute the fraction of rejections under the null. Consequently, the errors explained in Section 5.3.1 apply to the results in this section as well.

In many cases, the lines overlap at multiple points, so it is not easy to see which statistic is better. For example, in Figure 5.4 it is difficult to see whether the blue or the yellow line is closer to the top left corner. In such cases, the area under the curve is computed and the test with the largest area is considered better. Furthermore, the results for permuting method 2 (sub-blocks) are usually quite similar to those of permuting method 1 (blocks), because method 2 reduces to method 1 whenever an integer period is tested.
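The area under a ROC curve given as a set of points can be approximated with the trapezoidal rule. The following is a minimal sketch of that calculation; the function name `roc_auc` and the example points are hypothetical, and it is not claimed that the areas reported in this chapter were computed in exactly this way.

```python
import numpy as np

def roc_auc(fpr, tpr):
    """Area under a ROC curve by the trapezoidal rule.

    fpr, tpr: false and true positive rates at a set of thresholds,
    including the end points (0, 0) and (1, 1).
    """
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    order = np.argsort(fpr, kind="stable")  # integrate from left to right
    x, y = fpr[order], tpr[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical ROC points for a well-performing test.
auc_good = roc_auc([0.0, 0.1, 0.5, 1.0], [0.0, 0.8, 0.95, 1.0])
# The line of no-discrimination has area 0.5.
auc_chance = roc_auc([0.0, 1.0], [0.0, 1.0])
```

An area close to 1 corresponds to a curve that hugs the top left corner, while the line of no-discrimination gives an area of 0.5.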

To get a better intuition for the functions, their periodograms are plotted. The periodograms of the sine and cosine functions can be found in Figure 5.2. In these figures, we see a peak at the frequency corresponding to the period of each function. From equation 3.8 we see that this frequency is 6π/7 ≈ 2.7 for the sine function and 4π/5 ≈ 2.5 for the cosine function. The periodograms have a similar shape, and thus we expect the results to be similar. However, in this particular case the peak for the cosine function is higher. These variations occur due to the random error added to the functions. Also, recall from Proposition 3.2.1 that the variance of the periodogram is highest at the peak.
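The location of the peak can be reproduced with a short simulation. The sketch below assumes a periodogram of the form |DFT|²/n evaluated at the Fourier frequencies 2πk/n, unit amplitude, and standard normal noise; the normalisation, the noise level, and the seed are assumptions and may differ from the definitions in Chapter 3 and the settings used for Figure 5.2.

```python
import numpy as np

def periodogram(x):
    """Periodogram |DFT|^2 / n at the Fourier frequencies 2*pi*k/n, k = 1..n//2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n // 2 + 1)
    freqs = 2 * np.pi * k / n
    power = np.abs(np.fft.fft(x - x.mean())[k]) ** 2 / n
    return freqs, power

rng = np.random.default_rng(0)            # fixed seed, for reproducibility only
n, period = 150, 7 / 3                    # sine function with period 7/3
t = np.arange(n)
x = np.sin(2 * np.pi * t / period) + rng.normal(0.0, 1.0, n)  # assumed noise level

freqs, power = periodogram(x)
peak_freq = freqs[np.argmax(power)]       # expected near 6*pi/7
```

Because 6π/7 is not a Fourier frequency for n = 150, the peak lands on a neighbouring grid point, within one grid spacing of 6π/7.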

Figure 5.2: Periodogram of the sine and cosine function. In the figures we see a peak at the frequencies corresponding to the period of the functions. The sine function has period 7/3 which corresponds to frequency 6π/7 ≈ 2.7 and the cosine function has period 5/2, which corresponds to frequency 4π/5 ≈ 2.5.

Sine Function For the sine function the tests perform quite well as three of the curves are close to the top left corner. This is important as the sine is one of the most basic periodic functions.

The fact that the tests perform well here allows us to consider more complicated functions, such as those with multiple frequencies (see Section 5.6). Figures 5.3, 5.4, and 5.5 show that the best tests are the ones using Fisher’s G-statistic, Bartlett’s method, and the Lomb-Scargle periodogram. To determine which test is better, the areas under the curves were calculated and recorded in Table 5.6.

Permuting method    Fisher    Bartlett    Welch     Lomb-Scargle

Method 1            0.9636    0.9695      0.8213    0.9513

Method 2            0.9636    0.9633      0.8325    0.9335

Method 3            0.9727    0.9662      0.8995    0.9396

Table 5.6: Area under the ROC curves for each test and each permuting method. We see that the test using Fisher’s G-statistic for permuting method 3 performs best.

From Table 5.6 we see that the areas are quite similar, which we could have predicted from the high level of overlap between the curves. Nevertheless, the test using Fisher’s G-statistic is the best for permuting methods 2 and 3, while the test using Bartlett’s method performs slightly better for permuting method 1.

Cosine Function A cosine function is a sine function with phase π/2. Hence, the tests perform similarly to those for the sine function. Computing the areas under the curves, we saw that the area for the test using Fisher’s G-statistic improved to 0.9727 for both methods 1 and 2. Similar improvements were found for the Lomb-Scargle periodogram. However, the tests using Bartlett’s method and Welch’s method perform slightly worse. These deviations are caused by the randomness of the noise and by the difference in the periods tested under the alternative hypothesis, which differ because the two functions have different periods. Nonetheless, the test using Fisher’s G-statistic performs best for all permuting methods.

Figure 5.3: ROC curves for the sine function for permuting method 1 (Permuting Blocks).

Figure 5.4: ROC curves for the sine function for permuting method 2 (Permuting Sub-blocks).

Figure 5.5: ROC curves for the sine function for permuting method 3 (Permuting Observations).

Figure 5.6: ROC curves for the cosine function for permuting method 1 (Permuting Blocks).

Figure 5.7: ROC curves for the cosine function for permuting method 2 (Permuting Sub-blocks).

Figure 5.8: ROC curves for the cosine function for permuting method 3 (Permuting Observations).

Similarly, in Figure 5.9 we plot the periodogram values for the sawtooth and triangle waves to aid our analysis. In the figures we see a peak at 2π/9 ≈ 0.7 for the sawtooth wave and at 6π/8 ≈ 2.35 for the triangle wave. By equation 3.8, these values correspond to the periods of the waves.

Figure 5.9: Periodogram of the sawtooth and triangle wave. In the figures we see a peak at the frequencies corresponding to the period of the functions. The sawtooth wave has period 9, which corresponds to frequency 2π/9 ≈ 0.7 and the triangle wave has period 8/3 which corresponds to frequency 6π/8 ≈ 2.35.

Sawtooth Function Due to its rigid shape, the periodogram of the sawtooth wave (left of Figure 5.9) is not as clearly defined as for the other functions. The periodogram is flatter, making it more likely that the tests reject the null hypothesis. This flatness is also apparent in the Lomb-Scargle periodogram. As a result, the ROC curves of the tests move further away from the top left corner. However, in Figures 5.10, 5.11, and 5.12 we see that the performance of the test using Fisher’s G-statistic remains quite similar to that for the previous functions. This is because the test leverages the fact that Fisher’s G-statistic stays constant under the null hypothesis. Therefore, the flatness of the periodogram does not affect Fisher’s G-statistic as much as the others. We can verify this observation by calculating the areas under the curve, which are 0.9605, 0.9693, and 0.9655 for permuting methods 1, 2, and 3, respectively. These values are very similar to the areas found for the sine function.
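For reference, Fisher’s G-statistic in its standard form is the largest periodogram ordinate divided by the sum of all ordinates. The sketch below uses that standard form with the Fourier frequencies k = 1, ..., ⌊(n − 1)/2⌋; the exact definition and normalisation used in the earlier chapters are assumed rather than reproduced, and the period 5 and the seed are hypothetical choices for illustration.

```python
import numpy as np

def fisher_g(x):
    """Fisher's G: largest periodogram ordinate over the sum of all ordinates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = (n - 1) // 2                        # Fourier frequencies k = 1, ..., m
    power = np.abs(np.fft.rfft(x - x.mean())[1:m + 1]) ** 2 / n
    return float(power.max() / power.sum())

t = np.arange(150)
# Noise-free sine whose period (5) matches a Fourier frequency: one dominant peak.
g_sine = fisher_g(np.sin(2 * np.pi * t / 5))
# Pure noise: the power is spread over all frequencies, so no ordinate dominates.
rng = np.random.default_rng(1)
g_noise = fisher_g(rng.normal(size=150))
```

Because the statistic is a ratio, any common rescaling of the periodogram cancels. Its value is close to 1 when a single frequency dominates and small when the periodogram is flat.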

From the figures, we also see that the tests using Welch’s method and the Lomb-Scargle periodogram improve quite drastically for permuting method 3. For the test using Welch’s method, the improvement is more drastic: the area under the curve increases from 0.6422 for permuting method 1 to 0.8745 for permuting method 3. This improvement is seen for the other functions as well. In the first two permuting methods the neighbourhood of an observation does not change substantially, especially when the observation is in the middle of a block or sub-block. Welch’s method groups observations in the same neighbourhood together and averages them. Thus, its value does not vary much under these permutations. In permuting method 3, on the other hand, individual observations are permuted, and thus an observation’s neighbourhood changes more. In this case, the value of the test statistic is more susceptible to change, which makes the test more powerful. For the test using the Lomb-Scargle periodogram, the improvement is likewise due to the higher degree of variance under permuting method 3. The variation in the Lomb-Scargle periodogram values is already quite small compared to the other test statistics. Therefore, with less variation in the data, the test is less likely to reject the null hypothesis.
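The effect of the permuting method on an observation’s neighbourhood can be made concrete by counting how many adjacent pairs of observations remain adjacent after a permutation. The sketch below is an illustration only: it assumes that permuting blocks means shuffling consecutive blocks of equal length and that permuting observations means an unrestricted shuffle, which may differ from the exact definitions of methods 1 and 3. The block length 10 and all function names are hypothetical.

```python
import numpy as np

def permute_blocks(x, block_len, rng):
    """Shuffle consecutive blocks of length block_len; a shorter tail stays in place."""
    x = np.asarray(x)
    n_blocks = len(x) // block_len
    head = x[:n_blocks * block_len].reshape(n_blocks, block_len)
    tail = x[n_blocks * block_len:]
    return np.concatenate([head[rng.permutation(n_blocks)].ravel(), tail])

def permute_observations(x, rng):
    """Shuffle individual observations without restriction."""
    return rng.permutation(np.asarray(x))

def preserved_neighbours(labels):
    """Fraction of original pairs (i, i + 1) that are still adjacent, in order."""
    return float(np.mean(np.diff(labels) == 1))

rng = np.random.default_rng(0)
labels = np.arange(150)                   # time indices stand in for the data
frac_blocks = preserved_neighbours(permute_blocks(labels, 10, rng))
frac_obs = preserved_neighbours(permute_observations(labels, rng))
```

With blocks of length 10, at least 135 of the 149 adjacent pairs survive a block permutation, whereas an unrestricted shuffle preserves almost none. This is consistent with a statistic that averages over neighbourhoods, such as Welch’s method, varying little under block permutations.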

Triangle Function The triangle wave has a similar structure to a sine function with period 8/3. As a result, the ROC curves for the triangle wave are quite similar to those of the sine function. As the curves overlap at multiple points in Figures 5.13, 5.14, and 5.15, we compute the areas under the curves. We find that the area under the curve for the test using Fisher’s G-statistic is the highest, with value 0.9727 for each permuting method. The test using Bartlett’s method is only slightly worse.

Figure 5.10: ROC curves for the sawtooth wave for permuting method 1 (Permuting Blocks).

Figure 5.11: ROC curves for the sawtooth wave for permuting method 2 (Permuting Sub-blocks).

Figure 5.12: ROC curves for the sawtooth wave for permuting method 3 (Permuting Observations).

Figure 5.13: ROC curves for the triangle wave for permuting method 1 (Permuting Blocks).

Figure 5.14: ROC curves for the triangle wave for permuting method 2 (Permuting Sub-blocks).

Figure 5.15: ROC curves for the triangle wave for permuting method 3 (Permuting Observations).

In conclusion, we see that in all cases the test using Fisher’s G-statistic performs consistently well. This result is not surprising given the findings in Chapter 4. Under the null hypothesis, the value of Fisher’s G-statistic remains constant, whereas under the alternative hypothesis its value decreases under permutations. As it behaves very differently under the two hypotheses, its false positive rate is approximately zero and its true positive rate is close to one. Although there are specific cases where the test using Bartlett’s method performs slightly better, there are also cases where it performs considerably worse. Therefore, the test using Fisher’s G-statistic is preferred, as it is more consistent. Moreover, it is recommended to use this test with permuting method 3, as the results are more consistent than for the other permuting methods.