A correction article relating to the abstract and author affiliation of this publication can be found here: http://dx.doi.org/10.5334/jors.136
In recent years, decadal climate predictions have become increasingly popular in the climate science community. Typically, an ensemble of so-called hindcast experiments with start dates between 1960 and the present is performed, enabling verification against observational data. However, the evaluation and validation of decadal climate prediction systems is both a scientific and a technical challenge in current climate research.
There are several convincing arguments for using a standardized tool for decadal model evaluation. For instance, model development stages and interim test phases (in this case of decadal climate prediction systems) can be assessed easily. A standardized tool also simplifies the comparison of decadal prediction systems developed by different modeling groups.
Goddard et al.  proposed a framework for verification of decadal hindcasts addressing two key questions:
1. Do the initial conditions in the hindcast lead to more accurate predictions of the climate, compared to uninitialized climate change projections?
2. Is the prediction model’s ensemble spread an appropriate representation of forecast uncertainty on average?
The first question is addressed through the Murphy-Epstein decomposition of the Mean Squared Error Skill Score (MSESS), which is based on the Mean Squared Error and can be used to compare two different model outputs (e.g. initialized and uninitialized). The MSESS is a deterministic skill measure of the ensemble mean. For the second question, Goddard et al. suggest using a modified version of the Continuous Ranked Probability Skill Score (CRPSS), which compares the average ensemble spread with the mean squared error. We extended the probabilistic part with the logarithmic ensemble spread score, which is a direct estimate of the skill of the mean ensemble spread.
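To make the deterministic part concrete, the following is a minimal sketch of the MSESS and of the textbook Murphy-Epstein decomposition against the observed climatology. It is not code taken from MurCSS: the function names are hypothetical, the inputs are plain lists of ensemble-mean values per start date, and only the Python standard library is used for brevity.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def mse(prediction, observation):
    """Mean squared error between two equally long sequences."""
    return mean([(p - o) ** 2 for p, o in zip(prediction, observation)])


def msess(hindcast, reference, observation):
    """MSESS = 1 - MSE(hindcast) / MSE(reference).

    1 is a perfect hindcast, 0 means no gain over the reference,
    negative values mean the hindcast is worse than the reference.
    """
    return 1.0 - mse(hindcast, observation) / mse(reference, observation)


def murphy_epstein_terms(hindcast, observation):
    """Terms of the MSESS when the reference is the observed climatology.

    Returns (r**2, conditional_bias**2, mean_bias**2) such that
    MSESS = r**2 - conditional_bias**2 - mean_bias**2.
    Population (1/n) standard deviations are used throughout.
    """
    h_bar, o_bar = mean(hindcast), mean(observation)
    s_h = sqrt(mean([(h - h_bar) ** 2 for h in hindcast]))
    s_o = sqrt(mean([(o - o_bar) ** 2 for o in observation]))
    cov = mean([(h - h_bar) * (o - o_bar) for h, o in zip(hindcast, observation)])
    r = cov / (s_h * s_o)  # sample correlation between hindcast and observation
    return r ** 2, (r - s_h / s_o) ** 2, ((h_bar - o_bar) / s_o) ** 2
```

When both series are anomalies with zero mean, the third term vanishes and the skill score reduces to the squared correlation minus the squared conditional bias.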
MurCSS follows this framework and offers a standardized and reproducible way to calculate the above-mentioned metrics for decadal prediction systems, including a bootstrap method to determine significance levels. A detailed explanation of the tool and its output, together with a tutorial on how to use the tool, can be found on our web page (https://www-miklip.dkrz.de/about/murcss/).
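The idea behind a bootstrap significance estimate can be illustrated with a short sketch. It assumes plain resampling of start dates with replacement and a percentile interval; the procedure actually implemented in MurCSS may differ (for example in how autocorrelation is handled), and all function and parameter names here are hypothetical.

```python
import random


def msess(hindcast, reference, observation):
    """Mean squared error skill score of hindcast relative to reference."""
    n = len(observation)
    mse_h = sum((h - o) ** 2 for h, o in zip(hindcast, observation)) / n
    mse_r = sum((r - o) ** 2 for r, o in zip(reference, observation)) / n
    # Note: mse_r must be non-zero, i.e. the reference must not be perfect.
    return 1.0 - mse_h / mse_r


def bootstrap_interval(hindcast, reference, observation,
                       n_boot=500, alpha=0.05, seed=0):
    """Percentile interval of the MSESS from resampled start dates.

    Start dates are drawn with replacement (an assumption of this sketch);
    the same indices are used for all three series so pairs stay aligned.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(observation)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(msess([hindcast[i] for i in idx],
                            [reference[i] for i in idx],
                            [observation[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Under these assumptions, a skill score would be called significantly positive at level `alpha` if the lower bound of the interval lies above zero.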
Implementation and architecture
The tool architecture can be separated into three components (see Fig. 1): file input and preparation, metric calculation, and plotting routines. The file input component searches for valid files and prepares them for the actual metric calculation. It makes use of the MiKlip file database, which is based on the international standards CMOR and NetCDF. It is also possible to use the simplified file input component (findFilesCustom.py), which can easily be adjusted to the local database structure. The main part is the metric calculation, where the actual skill scores are computed. Most calculations are performed using CDO or the NumPy package, applying multiprocessing whenever possible. After processing, the results are stored in the common NetCDF data format. In the last step, these files are visualized using the plotting routines.
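The way a metric step can be distributed over several processes is sketched below. This is an illustration of the general pattern with Python's `multiprocessing.Pool`, not the MurCSS implementation: the work items are in-memory lists instead of NetCDF files, and the names `mean_squared_error` and `compute_all` are hypothetical.

```python
from multiprocessing import Pool


def mean_squared_error(pair):
    """Metric step for one work item: (hindcast values, observed values)."""
    hindcast, observation = pair
    return sum((h - o) ** 2 for h, o in zip(hindcast, observation)) / len(observation)


def compute_all(pairs, processes=1):
    """Apply the metric to every work item, optionally in parallel.

    Each item (for example one lead time) is independent of the others,
    so the items can be handed to a pool of worker processes.
    """
    if processes > 1:
        with Pool(processes) as pool:
            return pool.map(mean_squared_error, pairs)
    return [mean_squared_error(pair) for pair in pairs]
```

A call such as `compute_all(pairs, processes=4)` returns the results in the same order as the input. On platforms that start worker processes by spawning, the call should be placed under an `if __name__ == "__main__":` guard.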
We developed the following strategy to assure the quality of the system.
a) We assembled a set of test data.
b) We wrote unit tests, which were extended whenever new problems were observed.
c) Users within MiKlip acted as beta testers to help improve the tool.
MurCSS has been developed and tested on Linux, although the tool should run on any system where Python and CDO are installed.
Python 2.7, CDO 1.5.4
• Python Modules:
- Matplotlib >= 1.1.0
- NumPy >= 1.5.0
- SciPy >= 0.8.0
• NetCDF Libraries
GNU General Public License 3
(3) Reuse potential
MurCSS is already used frequently within the MiKlip project. Direct access to our file database and the standardized score calculation allow a quick assessment of the latest model improvements. The standardized output also facilitates the comparison of results obtained by different working groups within MiKlip. MurCSS has so far been used in a number of publications, for instance Pohlmann et al. and Kadow et al., with more in preparation.
Although MurCSS was developed to improve comparability within the MiKlip project, we encourage other modeling groups working on decadal climate predictions to use it and benefit from the advantages of a standardized evaluation tool. For use in projects outside of MiKlip, MurCSS comes with a simplified file input component (findFilesCustom.py), which can easily be adjusted to the local data structure in murcss_config.py.
The tool can also be applied to other disciplines of climate modeling. At the time of writing, for instance, we are working on an extension of MurCSS to seasonal forecast systems. Furthermore, the modular architecture allows easy extension to other skill scores or metrics of interest.
For the interested reader, we have set up a guest account on our web page (https://www-miklip.dkrz.de / user: guest / password: miklip), where we provide some example analyses produced with the tool.
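How a file search can be adapted to a local directory layout is sketched below. The layout template, the function name `find_files` and its parameters are hypothetical and chosen only for illustration; they do not reflect the actual contents of findFilesCustom.py or murcss_config.py.

```python
import glob
import os

# Hypothetical local layout: <root>/<model>/<experiment><year>/<variable>/*.nc
# Adjusting this one template is what "adapting to the local structure" means here.
TEMPLATE = os.path.join("{root}", "{model}", "{experiment}{year}", "{variable}", "*.nc")


def find_files(root, model, experiment, year, variable, template=TEMPLATE):
    """Return the sorted NetCDF files of one hindcast experiment.

    The root directory is escaped so that special characters in the path
    are not interpreted as glob wildcards.
    """
    pattern = template.format(root=glob.escape(root), model=model,
                              experiment=experiment, year=year,
                              variable=variable)
    return sorted(glob.glob(pattern))
```

With this template, `find_files("/data", "modelA", "decadal", 1960, "tas")` would list all `.nc` files below `/data/modelA/decadal1960/tas/`; an empty list signals that no valid files exist for that start year.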
Support for MurCSS
When users or developers run into problems or discover bugs, we encourage them either to open an issue on the GitHub page or to contact us directly via email (sebastian.illing@met.fu-berlin.de).
We encourage users to submit pull requests on the GitHub page if they have developed new features.