Data and methods for the MiKlip decadal forecasts 2018-2027

Data

For the analysis, annual means of near-surface temperature are used. The data of the decadal climate prediction system (MiKlip system) consist of predictions started in the past to validate the system (hindcasts) and of predictions for the next ten years. The forecast system combines an initialisation scheme, which incorporates observations, with the global circulation model MPI-ESM (Müller et al., 2012; Pohlmann et al., 2013; Marotzke et al., 2016). The predictions were performed with the 'PreopLR' configuration of MPI-ESM 1.2. The prediction data comprise ten ensemble members, initialised annually for the years 1960-2017; each simulation is integrated for ten lead years. For a higher-resolution evaluation over Europe, data of the global model in the 'baseline1' configuration (MPI-ESM 1.0) were dynamically downscaled with the regional climate model CCLM (Rockel et al., 2008; Mieruch et al., 2014).

In addition to the global evaluation, the data of the global model are used to investigate the North Atlantic (NA) region between 60°-10°W and 50°-65°N. For this purpose, HadCRUT4 (Morice et al., 2012), available on a global 5°x5° grid, is used as the observational dataset. The forecast skill of the regional model is assessed by comparison with the observational dataset CRU TS 4.01 (Harris et al., 2014). For a consistent validation, the model data of the prediction system are regridded to the respective observational grid (5°x5° and 0.5°x0.5°). The analysis is carried out both for each grid point separately and for spatial averages over the considered regions, i.e. the global mean and the NA average for the global model and the European average for the regional model.
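As an illustration of the spatial-averaging step, the following minimal sketch (Python; variable names and shapes are assumptions for illustration, not part of the MiKlip processing chain) computes a latitude-weighted regional mean on a regular grid while skipping missing observations:

```python
import numpy as np

def area_weighted_mean(field, lats):
    """Latitude-weighted spatial mean of a 2-D (lat, lon) field.

    field : array (nlat, nlon) of temperature anomalies, NaN = missing
    lats  : array (nlat,) of grid-cell centre latitudes in degrees
    """
    # grid cells shrink towards the poles, so weight by cos(latitude)
    weights = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], field.shape)
    valid = ~np.isnan(field)                  # exclude missing observations
    return np.nansum(field * weights) / np.sum(weights * valid)
```

For the NA average, for example, the field would first be cut to the box 60°-10°W, 50°-65°N on the 5°x5° grid before averaging.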

Temperature anomalies and temporal averaging

Temperature anomalies with respect to the period 1981-2010 (WMO reference period) are calculated for both predictions and observations. The systematic temperature difference between model and observations (bias) generally changes with lead time (model drift). This lead-time dependence is estimated from the hindcasts in order to correct for it (Pasternack et al., 2017). In addition, the bias is allowed to change with initialisation time over the period 1960-2017. The method of Pasternack et al. (2017) adjusts the mean bias as well as the conditional bias and the ensemble spread; the latter ensures that the forecast uncertainty is represented by the ensemble spread.
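The following sketch illustrates only the simplest ingredient of such an adjustment, a lead-time dependent removal of the mean bias (drift); the full method of Pasternack et al. (2017) additionally models the dependence on initialisation time and adjusts the conditional bias and the ensemble spread. Array names and shapes are assumptions for illustration:

```python
import numpy as np

def drift_correct(hindcasts, observations):
    """Remove the lead-time dependent mean bias (model drift).

    hindcasts    : array (n_starts, n_members, n_leads) of raw hindcast anomalies
    observations : array (n_starts, n_leads) of observed anomalies aligned
                   with each start year and lead year
    Returns drift-corrected hindcasts.
    """
    ens_mean = hindcasts.mean(axis=1)               # (n_starts, n_leads)
    drift = (ens_mean - observations).mean(axis=0)  # mean bias per lead year
    return hindcasts - drift[None, None, :]
```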

For the decadal forecast, 4-year running means of the adjusted temperature anomalies are analysed. Thus, predictions are made for the lead years 1-4, 2-5, 3-6, …, 7-10. For the yearly forecast, only the mean of the first lead year is evaluated.
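A minimal sketch (Python; shapes are assumptions) of the seven 4-year windows over the ten lead years of one prediction:

```python
import numpy as np

def lead_year_windows(annual_leads):
    """4-year running means over the 10 lead years of one prediction.

    annual_leads : array (..., 10) of yearly-mean anomalies for lead years 1-10
    Returns array (..., 7) for the windows 1-4, 2-5, ..., 7-10.
    """
    windows = [annual_leads[..., i:i + 4].mean(axis=-1) for i in range(7)]
    return np.stack(windows, axis=-1)
```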

Validation and prediction skill

The prediction skill is validated with hindcasts of the MiKlip system, which were produced for the past. The maximum period that can be used for validation of every lead-time window (years 1-4 to years 7-10) is 1967-2016. For the skill assessment, hindcasts are compared with observations. Grid points without observations for the validation period (missing values) cannot be assessed and are greyed out on the maps. The skill of the decadal prediction is compared with a reference forecast. The difference between these forecast skills, i.e. the improvement of the decadal forecast over the reference forecast, is called the skill score [%]. If the skill of the decadal forecast system equals that of the reference forecast, the skill score is 0%; for a perfect decadal prediction, it is 100%. The reference forecasts are the climatology of the observations for the period 1981-2010 and the uninitialised historical climate projection, which differs from the decadal prediction system only in the absence of the initialisation scheme. Bootstrapping is used to test whether the skill improvement over the reference forecast is random (significance test): random years from the validation period are sampled 1000 times with replacement and validated as well. The significance level is 95%.
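Schematically, a skill score of this kind can be written as

```latex
\mathrm{SS} = \left( 1 - \frac{\mathrm{MSE}_{\mathrm{forecast}}}{\mathrm{MSE}_{\mathrm{reference}}} \right) \times 100\,\%
```

so that equal errors give 0% and a perfect forecast gives 100%. The bootstrap test can be sketched as follows (Python; the exact resampling details of the MiKlip validation may differ from this illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_positive_fraction(err_fc, err_ref, n_boot=1000):
    """Fraction of bootstrap resamples with a positive skill score.

    err_fc, err_ref : arrays (n_years,) of squared errors of the decadal
                      forecast and the reference forecast per validation year
    """
    n = len(err_fc)
    positive = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample years with replacement
        positive += err_fc[idx].mean() < err_ref[idx].mean()
    return positive / n_boot                   # e.g. significant if >= 0.95
```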

Ensemble mean forecast

An ensemble average is calculated from the ensemble members and used for the forecast and its validation. For the spatial averages, the 10th and 90th percentiles of the ensemble distribution are shown in addition to the ensemble mean. To validate the prediction skill of the ensemble average, the mean-squared-error skill score between hindcast and observations (MSESS) is used (Goddard et al., 2013; Illing et al., 2013; Kadow et al., 2014). The MSESS assesses whether the decadal prediction reproduces the observations better than the reference forecasts of climatology (Fig. 1) and the uninitialised historical climate projection (Fig. 2).
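Following the definitions in Goddard et al. (2013), with \bar{f}_j the ensemble-mean hindcast and o_j the observation in year j of the N validation years, the MSESS can be written as

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{j=1}^{N} \left( \bar{f}_j - o_j \right)^2 ,
\qquad
\mathrm{MSESS} = 1 - \frac{\mathrm{MSE}_{\mathrm{hindcast}}}{\mathrm{MSE}_{\mathrm{reference}}}
```

where the reference MSE is computed in the same way for the climatological or uninitialised reference forecast.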

Figure 1: MSESS of the decadal forecast (ensemble mean of near-surface temperature) for lead years 1-4. Positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of climatology, both validated against HadCRUT4 observations.
Figure 2: MSESS of the decadal forecast (ensemble mean of near-surface temperature) for lead years 1-4. Positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the uninitialised historical climate projections, both validated against HadCRUT4 observations.

Probabilistic forecast

For the probabilistic forecast, the temperature distribution of the period 1981-2010 is divided into three equally probable categories (temperature below normal, normal, and above normal). Based on the distribution of the ensemble simulations, a forecast probability for each category and lead-year window (years 1-4, …, years 7-10) is calculated. Because of the small number of ensemble members, the probabilities are estimated with a Dirichlet-multinomial model with a flat Dirichlet prior (Agresti and Hitchcock, 2005).
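With a flat Dirichlet prior, the posterior-mean probability of category k, given that n_k of the M ensemble members fall into that category, is (n_k + 1)/(M + 3) for three categories. A minimal sketch (Python; the treatment of members falling exactly on a tercile bound is an assumption of this illustration):

```python
import numpy as np

def category_probabilities(members, lower, upper):
    """Tercile-category probabilities from a small ensemble.

    members      : array (n_members,) of ensemble anomalies for one grid
                   point and lead-year window
    lower, upper : tercile bounds of the 1981-2010 climatology
    Flat-prior Dirichlet-multinomial: p_k = (n_k + 1) / (n_members + 3).
    """
    counts = np.array([
        np.sum(members < lower),                          # below normal
        np.sum((members >= lower) & (members <= upper)),  # normal
        np.sum(members > upper),                          # above normal
    ])
    return (counts + 1.0) / (members.size + 3.0)
```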

The decadal forecast is validated against observations with the ranked probability skill score (RPSS) (Ferro, 2007; Ferro et al., 2008), which assesses the forecast probabilities over the ordered categories. The RPSS measures whether the decadal prediction system reproduces the observations better than the reference forecasts of climatology (Fig. 3) and the uninitialised historical climate projection (Fig. 4).
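In standard notation, with p_i the forecast probability of category i and o_i the observed indicator (1 for the observed category, 0 otherwise), the ranked probability score over K = 3 ordered categories and the resulting skill score read

```latex
\mathrm{RPS} = \sum_{k=1}^{K} \left( \sum_{i=1}^{k} p_i - \sum_{i=1}^{k} o_i \right)^{2} ,
\qquad
\mathrm{RPSS} = 1 - \frac{\overline{\mathrm{RPS}}_{\mathrm{hindcast}}}{\overline{\mathrm{RPS}}_{\mathrm{reference}}}
```

where overlines denote averages over the validation period; the cited works address the effect of the finite ensemble size on this score.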

Figure 3: RPSS of the decadal forecast of near-surface temperature for lead years 1-4. Positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of climatology, both validated against HadCRUT4 observations.
Figure 4: RPSS of the decadal forecast of near-surface temperature for lead years 1-4. Positive/negative values indicate a skill improvement/decline of the decadal prediction relative to the reference forecast of the uninitialised historical climate projections, both validated against HadCRUT4 observations.

References