Methods of Multi-Model Consolidation, with Emphasis on the Recommended
Cross Validation Approach
Huug van den Dool
CTB seminar, May 11, 2009
Acknowledgement: Malaquias Pena, Ake Johansson, Wanqiu Wang, Tony Barnston, Suranjana Saha
Traditional Anomaly Correlation
$F' = F - C_{obs}$, $A' = A - C_{obs}$
where F is the forecast, A the verifying analysis, and C the climatology.
$$AC = \frac{\sum F' A'}{\left(\sum F' F' \, \sum A' A'\right)^{1/2}}$$
Summation is in space, or in space and time.
Weighting may be involved.
$C_{obs}$ is known at the time the forecast is made, i.e. determined from previous data.
A (and F, obviously) are not part of the sample from which C is calculated.
Relationship of AC (skill) to MSE.
AC is calculated from ‘raw’ data.
New trend due to availability of hindcast data sets:
$F'' = F - C_{mdl}$, $A' = A - C_{obs}$
and $C_{obs}$ tends to be calculated from data that matches the model data.
Short-Cut Anomaly Correlation
$F'' = F - C_{mdl}$, $A' = A - C_{obs}$
$$AC_{sc} = \frac{\sum F'' A'}{\left(\sum F'' F'' \, \sum A' A'\right)^{1/2}}$$
$F'' = (F - C_{mdl}) = (F - C_{obs}) - (C_{mdl} - C_{obs})$
$$F'' = F' - (C_{mdl} - C_{obs}) \qquad (1)$$
Using $F''$ amounts to a systematic error correction (SEC), which requires a cross-validation (CV) to be honest.
(Eq. (1) becomes more involved if the periods for $C_{mdl}$ and $C_{obs}$ are not the same.)
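The two anomaly correlations can be compared numerically. The following is a minimal sketch on synthetic data, not the seminar's own computation: the function name `anomaly_correlation`, the bias of 2.5, and the sample size are illustrative assumptions, and both climatologies are taken from the sample itself (so no cross-validation is applied here).

```python
import numpy as np

def anomaly_correlation(f_anom, a_anom):
    """Uncentered anomaly correlation: sum(F A) / sqrt(sum(F F) * sum(A A))."""
    f_anom = np.asarray(f_anom, dtype=float)
    a_anom = np.asarray(a_anom, dtype=float)
    return np.sum(f_anom * a_anom) / np.sqrt(np.sum(f_anom**2) * np.sum(a_anom**2))

# Synthetic, illustrative numbers only (assumption, not seminar data).
rng = np.random.default_rng(0)
analysis = 27.0 + rng.normal(size=200)                    # A
forecast = analysis - 2.5 + 0.5 * rng.normal(size=200)    # F, with a cold bias

c_obs = analysis.mean()        # C_obs (here taken from the same sample)
c_mdl = forecast.mean()        # C_mdl

f_prime = forecast - c_obs     # F'  (traditional anomaly)
f_dprime = forecast - c_mdl    # F'' (short-cut anomaly)
a_prime = analysis - c_obs     # A'

ac_raw = anomaly_correlation(f_prime, a_prime)
ac_sc = anomaly_correlation(f_dprime, a_prime)
print(f"AC (raw) = {ac_raw:.3f}, AC_sc = {ac_sc:.3f}")
```

In this setup the systematic error inflates the forecast-anomaly variance in the denominator of the traditional AC, so removing it via $F''$ raises the score, which is why the correction itself has to be cross-validated.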
Why do we need CV?
• To obtain an estimate of skill on future (independent) data.
While there is no substitute for real time forecasts on future data, a CV procedure attempts to help us out (without having to wait too long)
• Leaving N years out of a sample of M creates N independent data points. Or does it??
• Details of CV procedures used by authors are exceedingly ad-hoc and often wrong
• We recommend 3CVRE
Meaning of 3CVRE
• Leave 3 years out (3 as a minimum)
• R: Leave 3 years out, namely the test year plus two others chosen at Random, see example
• E: Use ‘External’ observed climatology, not an observed climatology that changes in response to leaving out a particular set of 3 years.
Example 1981-2001. Three years left out. First year is the test year. The other two are picked at random.
years left out 1981 1985 1989
years left out 1982 2000 1989
years left out 1983 1990 1998
years left out 1984 1993 1981
years left out 1985 1992 1995
years left out 1986 1999 1987
years left out 1987 1996 1989
years left out 1988 1988 1989
years left out 1989 1983 1992
years left out 1990 1985 2000
years left out 1991 1990 2001
years left out 1992 1996 2001
years left out 1993 1985 1995
years left out 1994 1989 1991
years left out 1995 1986 1996
years left out 1996 1991 1990
years left out 1997 1991 1990
years left out 1998 1991 1988
years left out 1999 2001 1995
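A list of this kind could be generated along the following lines. This is only a sketch: the function name `three_cvre_triples` and the seed are hypothetical, and it assumes the two random years must differ from the test year and from each other, which is stricter than the list above (where one row repeats the test year). It will not reproduce the specific triples shown.

```python
import random

def three_cvre_triples(first_year, last_year, seed=0):
    """For each test year, return (test_year, random_year_1, random_year_2)."""
    rng = random.Random(seed)
    years = list(range(first_year, last_year + 1))
    triples = []
    for test_year in years:
        # Assumption: the two extra years are drawn without replacement
        # from all years other than the test year.
        others = [y for y in years if y != test_year]
        triples.append((test_year, *rng.sample(others, 2)))
    return triples

triples = three_cvre_triples(1981, 2001)
for triple in triples[:3]:
    print("years left out", *triple)
```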
Why leave three out, as opposed to just one? Two very different reasons:
• Anomaly Correlation does not change between ‘raw’ and CV-1-out. (This can be shown analytically.)
• CV-1-out leads to serious ‘degeneracy’ problems when the forecast involves a regression (as it does for MME with unequal weights) and skill is not that high to begin with (which unfortunately applies).
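The first reason can be sketched as follows, under the assumption that the climatology is the plain sample mean over M years and that CV-1-out simply removes the test year y from that mean (the superscript $(-y)$ is notation introduced here):

```latex
\[
C_{mdl} = \frac{1}{M}\sum_{t=1}^{M} F_t ,
\qquad
C_{mdl}^{(-y)} = \frac{M\,C_{mdl} - F_y}{M-1}
\]
\[
F_y - C_{mdl}^{(-y)}
  = \frac{(M-1)\,F_y - M\,C_{mdl} + F_y}{M-1}
  = \frac{M}{M-1}\,\bigl(F_y - C_{mdl}\bigr)
\]
```

Every CV-1-out anomaly is therefore the non-cross-validated anomaly multiplied by the same constant M/(M-1). A constant factor cancels between the numerator and the denominator of the AC, so the score is unchanged. Leaving out additional, randomly chosen years breaks this exact proportionality.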
M. Peña Mendez and H. van den Dool, 2008:
Consolidation of Multi-Method Forecasts at CPC.
J. Climate, 21, 6521–6538.
D. Unger, H. van den Dool, E. O’Lenic and D. Collins, 2009: Ensemble Regression. Manuscript accepted, Monthly Weather Review. 2009 early online release, posted January 2009, DOI: 10.1175/2008MWR2605.1
(1) CTB, (2) why do we need ‘consolidation’?
Context: Consolidation of Several Models
OFFicial Forecast(element, lead, location, initial month) = $a F_1 + b F_2 + c F_3 + \dots$
Honest hindcast required 1950-present.
Covariances $(F_1, F_2)$, $(F_1, F_3)$, $(F_2, F_3)$, and $(F_1, A)$, $(F_2, A)$, $(F_3, A)$ allow solution for a, b, c (element, lead, location, initial month).
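Written out for three models, and assuming an ordinary least-squares fit of anomalies (so that the diagonal entries are the model variances, which the slide does not list explicitly), the system for a, b, c would take the form:

```latex
\[
\begin{pmatrix}
\mathrm{cov}(F_1,F_1) & \mathrm{cov}(F_1,F_2) & \mathrm{cov}(F_1,F_3)\\
\mathrm{cov}(F_1,F_2) & \mathrm{cov}(F_2,F_2) & \mathrm{cov}(F_2,F_3)\\
\mathrm{cov}(F_1,F_3) & \mathrm{cov}(F_2,F_3) & \mathrm{cov}(F_3,F_3)
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix}
\mathrm{cov}(F_1,A)\\ \mathrm{cov}(F_2,A)\\ \mathrm{cov}(F_3,A)
\end{pmatrix}
\]
```

One such system would be solved for each combination of element, lead, location and initial month.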
CON is color blind
Apply to:
• Monthly SST, 1981-2001, 4 starts, leads 1-5
• 9 models
• Domain is 20S-20N Pacific Ocean
• (gridpoints, not Nino34 index)
M. Peña Mendez and H. van den Dool, 2008:
Consolidation of Multi-Method Forecasts at CPC.
J. Climate, 21, 6521–6538.
Table 1. Some information on the DEMETER-PLUS models
Acronym | Full name | Layout | Period
D1, D2, …, D7 | DEMETER models * | Ensemble members: 9; leads: 0 to 5 months; initial months: Feb, May, Aug, Nov | 1980-2001
CFS | NCEP Climate Forecast System | Ensemble members: 15; leads: 0 to 8 months; initial months: Jan to Dec | 1981-2006
CA | CPC Constructed Analog | Ensemble members: 12; leads: -3 to 12; initial months: Jan to Dec | 1956-2006
* Institutions developing these models: European Centre for Medium-Range Weather Forecasts, Max Planck Institute, Météo-France, United Kingdom Met Office, Istituto Nazionale di Geofisica e Vulcanologia, Laboratoire d’Océanographie Dynamique et de Climatologie, European Centre for Research and Advanced Training in Scientific Computation.
$$CON = \sum_{k=1}^{K} \alpha_k \, SST_k$$
i.e. a weighted mean over K model estimates
One finds the K alphas typically by minimizing the distance between CON and observed SST.
Classic or Unconstrained Regression (UR)
The general problem of consolidation consists of finding a vector of weights, α, that minimizes the Sum of Square Errors, SSE, given by the following expression:
$$SSE = (Z\alpha - o)^T (Z\alpha - o) \qquad (5)$$
Minimizing SSE with respect to α leads to $Z^T Z \alpha = Z^T o$.
So the weights are formally given by
$$\alpha = A^{-1} b \qquad (6)$$
where $A = Z^T Z$ is the covariance matrix, and $b = Z^T o$.
Equation (6) is the solution for the ordinary (Unconstrained) linear Regression (UR).
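As a minimal sketch of Eq. (6), the weights can be obtained by solving the normal equations directly. The data below are synthetic, and the layout of Z (one row per year, one column per model) and the function name `ur_weights` are assumptions made for illustration.

```python
import numpy as np

def ur_weights(Z, o):
    """Unconstrained regression: solve (Z^T Z) alpha = Z^T o."""
    A = Z.T @ Z          # covariance matrix A (up to normalization)
    b = Z.T @ o          # covariance of each model with the observations
    return np.linalg.solve(A, b)

# Synthetic anomalies (assumption): 21 years, 3 models sharing one signal.
rng = np.random.default_rng(1)
n_years, n_models = 21, 3
signal = rng.normal(size=n_years)
Z = signal[:, None] + 0.8 * rng.normal(size=(n_years, n_models))
o = signal + 0.3 * rng.normal(size=n_years)

alpha = ur_weights(Z, o)
con = Z @ alpha          # consolidated forecast on the dependent data
print("weights:", np.round(alpha, 3))
```

With only about 20 years and several correlated models, weights fitted this way tend to vary strongly from one sample to another, which motivates the constrained alternatives.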
Why ridge regression?
One of the preferred methods that:
• Tries to minimize damage due to overfitting (too many coefficients from too little data)
• Tries to handle co-linearity as much as possible
• Has a smaller difference in correlation (MSE) between dependent and independent data
Essentially, ridging is a multiple linear regression with an additional penalty term to constrain the size of the squared weights in the minimization of SSE (5):
$$J = (Z\alpha - o)^T (Z\alpha - o) + \lambda \, \alpha^T \alpha \qquad (7)$$
Minimization of J leads to
$$\alpha = (A + \lambda I)^{-1} b \qquad (8)$$
where I is the identity matrix, and $\lambda$, the regularization (or ridging) parameter, indicates the relative weight of the penalty term.
Similarities between the ridging and Bayesian approaches for determining the weights have been discussed by Hsiang (1976) and DelSole (2007). In the Bayesian view, (8) represents the posterior mean of α, based on a normal a priori parameter distribution with mean zero and variance matrix $(\sigma^2/\lambda) I$, where $\sigma^2 I$ is the variance matrix of the regression residual, assumed to be normal with mean zero.
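A minimal sketch of Eq. (8) on synthetic, nearly co-linear models follows. The function name `ridge_weights`, the data, and the λ values are illustrative assumptions; how λ is actually chosen is not specified here.

```python
import numpy as np

def ridge_weights(Z, o, lam):
    """Ridge regression weights: alpha = (A + lam * I)^-1 b, with A = Z^T Z, b = Z^T o."""
    A = Z.T @ Z
    b = Z.T @ o
    return np.linalg.solve(A + lam * np.eye(A.shape[0]), b)

# Synthetic data (assumption): 4 models that are nearly copies of each other.
rng = np.random.default_rng(2)
n_years, n_models = 21, 4
signal = rng.normal(size=n_years)
Z = signal[:, None] + 0.1 * rng.normal(size=(n_years, n_models))
o = signal + 0.5 * rng.normal(size=n_years)

lambdas = [0.0, 0.1, 1.0, 10.0]
norms = [float(np.linalg.norm(ridge_weights(Z, o, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda = {lam:5.1f}  |alpha| = {norm:.3f}")
```

Setting λ = 0 recovers the unconstrained solution (6); increasing λ shrinks the weight vector toward zero, which is what limits the damage from co-linearity.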
(DelSole 2007)
[Figure: panels labeled UR, MMA, COR, RI, RIM, RIW; reference labeled Climo]
3CVRE
SEC
SEC and CV
No CV
Mdl 4   anomaly   Obs    anomaly   year   SEC
25.5      .7      26.8     -.4     1981   2.45
25.9     1.1      28.1      .9     1982   2.45
23.8     -.9      27.1     -.1     1983   2.45
23.5    -1.3      26.7     -.5     1984   2.45
24.1     -.7      26.7     -.5     1985   2.45
26.0     1.3      27.4      .2     1986   2.45
26.6     1.9      28.8     1.6     1987   2.45
23.6    -1.1      25.6    -1.6     1988   2.45
26.2     1.5      26.7     -.5     1989   2.45
25.8     1.1      27.3      .1     1990   2.45
23.5    -1.2      27.9      .7     1991   2.45
24.4     -.3      27.5      .4     1992   2.45
24.4     -.3      27.6      .4     1993   2.45
23.5    -1.2      27.3      .1     1994   2.45
22.9    -1.8      27.0     -.2     1995   2.45
25.6      .9      27.1     -.1     1996   2.45
25.8     1.1      28.9     1.7     1997   2.45
23.4    -1.3      25.9    -1.2     1998   2.45
24.5     -.2      26.3     -.8     1999   2.45
25.0      .3      26.7     -.5     2000   2.45
25.2      .4      27.3      .1     2001   2.45
24.7      .0      27.2      .0     all

3CVRE
Mdl 4   anomaly   Obs    anomaly   year   SEC
25.5      .9      26.8     -.4     1981   2.62
25.9     1.3      28.1      .9     1982   2.62
23.8     -.9      27.1     -.1     1983   2.46
23.5    -1.3      26.7     -.5     1984   2.44
24.1     -.8      26.7     -.5     1985   2.32
26.0     1.4      27.4      .2     1986   2.56
26.6     2.0      28.8     1.6     1987   2.63
23.6     -.8      25.6    -1.6     1988   2.73
26.2     1.5      26.7     -.5     1989   2.48
25.8     1.1      27.3      .1     1990   2.54
23.5    -1.2      27.9      .7     1991   2.42
24.4     -.3      27.5      .4     1992   2.49
24.4     -.5      27.6      .4     1993   2.32
23.5    -1.3      27.3      .1     1994   2.38
22.9    -1.8      27.0     -.2     1995   2.48
25.6      .9      27.1     -.1     1996   2.45
25.8     1.0      28.9     1.7     1997   2.36
23.4    -1.4      25.9    -1.2     1998   2.37
24.5     -.3      26.3     -.8     1999   2.42
25.0      .2      26.7     -.5     2000   2.41
25.2      .5      27.3      .1     2001   2.50
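The SEC column can be illustrated with the rounded "Mdl 4" and "Obs" values from the tables. This is a sketch under stated assumptions: the observed climatology is computed once from these 21 years and then held fixed as a stand-in for the "external" climatology, SEC is taken as observed minus model climatology, and the helper names are hypothetical. Because the inputs are rounded and the left-out years actually used are not known, the cross-validated values are not expected to reproduce the 3CVRE column exactly.

```python
# Values transcribed from the "Mdl 4" and "Obs" columns, 1981-2001.
years = list(range(1981, 2002))
mdl = [25.5, 25.9, 23.8, 23.5, 24.1, 26.0, 26.6, 23.6, 26.2, 25.8, 23.5,
       24.4, 24.4, 23.5, 22.9, 25.6, 25.8, 23.4, 24.5, 25.0, 25.2]
obs = [26.8, 28.1, 27.1, 26.7, 26.7, 27.4, 28.8, 25.6, 26.7, 27.3, 27.9,
       27.5, 27.6, 27.3, 27.0, 27.1, 28.9, 25.9, 26.3, 26.7, 27.3]

def mean(values):
    return sum(values) / len(values)

# Assumption: fixed observed climatology, not recomputed per left-out set.
c_obs = mean(obs)

# No CV: one correction for all years, from the full model climatology.
sec_no_cv = c_obs - mean(mdl)
print(round(sec_no_cv, 2))   # about 2.45, as in the "No CV" table

def sec_cv(left_out):
    """SEC with the model climatology computed without the left-out years."""
    kept = [m for y, m in zip(years, mdl) if y not in left_out]
    return c_obs - mean(kept)

def corrected_anomaly(year, left_out):
    """Model anomaly for `year` after the cross-validated SEC is applied."""
    return mdl[years.index(year)] + sec_cv(left_out) - c_obs

print(round(sec_cv({1988, 1990, 1995}), 2))            # an arbitrary example triple
print(round(corrected_anomaly(1988, {1988, 1990, 1995}), 2))
```

The correction now differs from year to year because each test year sees a slightly different model climatology, which is the behavior visible in the 3CVRE table.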
Conclusions MME
• MMA is an improvement over individual models
• It is hard to improve upon an equal-weight ensemble average (MMA). Only WestPac SST shows some improvement as per ridge regression.
• This is caused by a (very) deficient data set length. We need 5000 years, not 25.
• Pooling gridpoints, pooling various start times and leads, throwing out ‘bad’ models upfront, and using all ensemble members help.
• Equal treatment for very unequal methods is ….
• RIW and COR make sense, because this is what CPC does subjectively.
• As should have been expected: UR is really bad
[Figure: results labeled AC_sc, AC_sc plus CV, and AC (raw)]
Bayesian Multimodel Strategies
Linear regression leads to unstable weights for small sample sizes.
Methods for producing more stable estimates have been proposed by van den Dool and Rukhovets (1994), Kharin and Zwiers (2002), Yun et al. (2003), and Robertson et al. (2004).
These methods are special cases of a Bayesian method, each distinguished by a different set of prior assumptions (DelSole 2007).
Some reasonable prior assumptions:
R:0 Weights centered about 0 and bounded in magnitude (ridge regression)
R:MM Weights centered about 1/K (K = # models) and bounded in magnitude
R:MM+R Weights centered about an optimal value and bounded in magnitude
R:S2N Models with small S2N (signal-to-noise) ratio tend to have small weights
LS Weights are unconstrained (ordinary least squares)
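The first two priors can be sketched with a single penalized regression in which the weights are pulled toward a chosen center. This is an illustrative formulation, assuming the penalty λ(α - α0)ᵀ(α - α0) added to the SSE; it is not claimed to be the exact formulation of DelSole (2007), and `ridge_weights_about` is a hypothetical name.

```python
import numpy as np

def ridge_weights_about(Z, o, lam, prior_mean):
    """Minimize |Z a - o|^2 + lam * |a - prior_mean|^2 over the weights a."""
    K = Z.shape[1]
    prior_mean = np.asarray(prior_mean, dtype=float)
    A = Z.T @ Z
    b = Z.T @ o
    # Setting the gradient to zero gives (A + lam I) a = b + lam * prior_mean.
    return np.linalg.solve(A + lam * np.eye(K), b + lam * prior_mean)

# Synthetic data (assumption): 21 years, K = 4 models.
rng = np.random.default_rng(4)
n_years, K = 21, 4
signal = rng.normal(size=n_years)
Z = signal[:, None] + rng.normal(size=(n_years, K))
o = signal + 0.5 * rng.normal(size=n_years)

w_r0 = ridge_weights_about(Z, o, 5.0, np.zeros(K))          # R:0, centered about 0
w_rmm = ridge_weights_about(Z, o, 5.0, np.full(K, 1.0 / K))  # R:MM, centered about 1/K
print("R:0 :", np.round(w_r0, 3))
print("R:MM:", np.round(w_rmm, 3))
```

With a very large λ the R:MM weights approach 1/K, i.e. the equal-weight multimodel mean, while the R:0 weights approach zero, i.e. climatology.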
From Jim Kinter (Feb 2009)
If the multimodel strategy is carefully cross validated, then the simple mean beats all other investigated multimodel strategies.
Since Bayesian methods involve additional empirical parameters, proper assessment requires a two-deep cross validation procedure. This can change the conclusion about the efficacy of various Bayesian priors.
Traditional cross validation procedures are biased and incorrectly indicate that Bayesian schemes beat a simple mean.
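A two-deep procedure can be sketched as two nested loops: the outer loop withholds the test year, and the inner loop chooses the empirical parameter (here the ridge λ) using only the remaining years. The code below is a minimal sketch on synthetic data. It uses leave-one-out in both loops purely for brevity (an assumption; a 3-out scheme could be substituted), and the candidate λ values and function names are hypothetical.

```python
import numpy as np

def ridge_weights(Z, o, lam):
    """Ridge weights (A + lam I)^-1 b with A = Z^T Z and b = Z^T o."""
    K = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(K), Z.T @ o)

def inner_cv_sse(Z, o, lam):
    """Inner loop: leave-one-out squared error of ridge for one lambda."""
    n = len(o)
    sse = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        alpha = ridge_weights(Z[keep], o[keep], lam)
        sse += (Z[i] @ alpha - o[i]) ** 2
    return sse

def two_deep_cv(Z, o, lambdas):
    """Outer loop: predict each year with lambda chosen without that year."""
    n = len(o)
    pred = np.empty(n)
    for t in range(n):
        train = np.arange(n) != t
        best_lam = min(lambdas, key=lambda lam: inner_cv_sse(Z[train], o[train], lam))
        pred[t] = Z[t] @ ridge_weights(Z[train], o[train], best_lam)
    return pred

# Synthetic data (assumption): 21 years, 4 models of equal quality.
rng = np.random.default_rng(3)
n_years, K = 21, 4
signal = rng.normal(size=n_years)
Z = signal[:, None] + rng.normal(size=(n_years, K))
o = signal + 0.5 * rng.normal(size=n_years)

pred = two_deep_cv(Z, o, [0.1, 1.0, 10.0, 100.0])
mse_ridge = float(np.mean((pred - o) ** 2))
mse_mean = float(np.mean((Z.mean(axis=1) - o) ** 2))   # simple multimodel mean
print(f"two-deep CV ridge MSE = {mse_ridge:.3f}, simple mean MSE = {mse_mean:.3f}")
```

The point of the inner loop is that the test year never influences the choice of λ. Choosing λ from the same cross-validated scores that are later reported is the kind of shortcut that can make a tuned scheme look better than the simple mean.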
Concluding comments CV
• CV is done because …….
• Does CV lower skill???
• CV procedures are quite complicated, full of traps. (The price we pay for impatience)
• Is there an all-purpose CV approach?
• 1-out procedures may be problematic for several reasons
• 3CVRE appears appropriate for (our) MME study.
--- OUT TO 1.5 YEARS ---