
Academic year: 2021


VU University, Faculty of Sciences

Statistical Data Analysis, part I

25 March 2013

Use of a basic calculator is allowed. Graphical calculators and mobile phones are not allowed. This exam consists of 4 questions (27 points).

Please write all answers in English. Grade = (total + 3)/3.

GOOD LUCK!

Question 1 [6 points]

Are the following statements correct/sensible? Motivate your answer by a short argument.

a. [2 points] The Shapiro-Wilk test is for testing a composite null hypothesis.

b. [2 points] M-estimators with bounded ψ-function are robust.

c. [2 points] A two sample QQ-plot is the same as a scatter plot for paired data.

Question 2 [6 points]

Let $X_1, \dots, X_{100}$ be independent and identically distributed random variables with unknown distribution $P$. Suppose we want to test $H_0: P = P_0$, for a known $P_0$, using the $\chi^2$ goodness-of-fit test.

a. [3 points] The test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i}$$

has approximately a $\chi^2_{k-1}$-distribution under $H_0$. Describe the rule of thumb that needs to be satisfied for this approximation to be reliable.

b. [3 points] Suppose we are given intervals $I_1, \dots, I_k$ that do not fulfill the rule of thumb for the given sample size. In such a situation we can still use $X^2$ as test statistic. However, we cannot rely on its approximate $\chi^2$-distribution. Therefore, we use a bootstrap test.

Describe the steps that are made in a bootstrap test for the given null hypothesis using $X^2$ as test statistic.
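As an illustration of the statistic in part (a) only, the following is a minimal Python sketch that evaluates the formula for given cell counts. The function name `chi_square_statistic`, the four cells and the counts are hypothetical and are not part of the exam; the cell probabilities are assumed to be those of the intervals under the known null distribution.

```python
from typing import Sequence


def chi_square_statistic(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Return X^2 = sum_i (N_i - n*p_i)^2 / (n*p_i).

    counts: observed cell counts N_1, ..., N_k
    probs:  cell probabilities p_1, ..., p_k under the null distribution
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total sample size
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))


# Hypothetical data: n = 100 observations over k = 4 equally likely cells.
counts = [30, 20, 25, 25]
probs = [0.25, 0.25, 0.25, 0.25]
x2 = chi_square_statistic(counts, probs)
print(x2)
```

Here every expected count is 25, so only the first two cells contribute to the sum.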

Question 3 [7 points]

In Figure 1 a histogram, boxplot and several QQ-plots of a data set x are presented.

a. [2 points] Which of the four location-scale families do you think is most appropriate for these data? Explain your answer.


[Figure 1 panels: Histogram of x (x against Frequency); boxplot; Normal Q-Q Plot (Theoretical Quantiles against Sample Quantiles); Uniform Q-Q Plot (Quantiles of Uniform against Sorted Data); Exp Q-Q Plot (Quantiles of Exp against Sorted Data); Chi^2 Q-Q Plot, df = 8 (Quantiles of Chisquare against Sorted Data)]

Figure 1: Histogram, boxplot and QQ-plots against the N(0, 1), uniform[0,1], exponential(1) and the standard $\chi^2_8$ distributions of a data set.

b. [2 points] The α-trimmed mean of these data was computed for α = 0, 0.1, 0.2, 0.3, 0.4, 0.5. The values of these 6 trimmed means are, in arbitrary order: 0.69, 0.62, 0.81, 0.63, 0.74, 1.02. Which of these values is the 0.1-trimmed mean? Motivate your answer clearly!

c. [3 points] Using the QQ-plot you have selected under part (a), determine the location a and scale b approximately. You may use that the sample variance equals 1.32. (You may use that the expectation and variance of the $\chi^2_k$-distribution are k and 2k, respectively.)
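For readers who want to see the α-trimmed mean of part (b) in concrete terms, here is a small Python sketch. It assumes one common convention: drop the ⌊αn⌋ smallest and the ⌊αn⌋ largest observations and average the rest, with the median used when nothing would remain (α = 0.5). The course may use a slightly different convention, and the data below are hypothetical, not the data set of Figure 1.

```python
import math
from statistics import mean, median


def trimmed_mean(data, alpha):
    """alpha-trimmed mean under the floor(alpha * n) convention (an assumption)."""
    if not 0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 0.5]")
    xs = sorted(data)
    n = len(xs)
    g = math.floor(alpha * n)  # number of observations cut from each tail
    if n - 2 * g <= 0:
        # Everything would be trimmed away: fall back to the median.
        return median(xs)
    return mean(xs[g:n - g])


# Hypothetical sample with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
for alpha in (0, 0.1, 0.5):
    print(alpha, trimmed_mean(data, alpha))
```

With α = 0 the function returns the ordinary sample mean, and with α = 0.5 it returns the sample median.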

Question 4 [8 points]

Let $X_1, \dots, X_n$ be independent and identically distributed random variables with unknown distribution $P$. Suppose that the sample variance $T_n(X_1, \dots, X_n) = S_X^2$ is used to estimate the variance of $P$. To determine the accuracy of this estimator, its standard deviation is estimated by means of the empirical bootstrap.

a. [4 points] Describe the steps of the empirical bootstrap scheme that you would use to find the bootstrap estimate of the standard deviation of $T_n$.

b. [2 points] Describe briefly which two errors are (necessarily) made in this bootstrap procedure.

c. [2 points] Which of the two errors in part (b) can be made arbitrarily small? What do you have to change in the procedure under (a) to make this error smaller?
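The setting of this question can be sketched in Python as follows. This is one possible implementation under stated assumptions, not the model answer: the function name `bootstrap_sd_of_variance`, the number of resamples `B`, the `seed` and the sample values are all hypothetical choices. The sketch resamples with replacement from the observed data, which amounts to sampling from the empirical distribution.

```python
import random
from statistics import stdev, variance


def bootstrap_sd_of_variance(data, B=1000, seed=0):
    """Bootstrap estimate of the standard deviation of the sample variance."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    n = len(data)
    replicates = []
    for _ in range(B):
        # Draw n values with replacement from the observed sample.
        resample = rng.choices(data, k=n)
        # Evaluate the estimator (sample variance) on the bootstrap sample.
        replicates.append(variance(resample))
    # Standard deviation of the B bootstrap replicates.
    return stdev(replicates)


# Hypothetical sample of size n = 10.
sample = [0.2, 0.5, 0.6, 0.9, 1.1, 1.4, 1.8, 2.3, 3.0, 4.7]
print(bootstrap_sd_of_variance(sample, B=500, seed=1))
```

The printed value depends on the seed and on `B`, so no fixed output is quoted here.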
