(1) Overview

Introduction

Long-range weather forecasts based on output from ensembles of computer simulations are attracting increasing interest for their usefulness to various weather-sensitive socioeconomic sectors, including agriculture, energy, and water management [1, 2, 3, 4, 5]. A variety of methods have been proposed to convert ensemble outputs to calibrated probabilistic forecasts of future meteorological variables such as temperature and precipitation [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. The workflow for applying, assessing, and comparing the performance of different methods involves many steps: downloading historical observations and ensemble outputs; applying the selected methods to produce probabilistic forecasts; comparing the performance of the selected methods over a common past period using any of several possible metrics; and producing summary graphics. Until now, there has not been any publicly available implementation of this workflow. This has forced different investigators and end users to write their own, limiting the reproducibility of published results and the ability to isolate the causes of differences between studies.

To address these problems, a new SeFo (Seasonal Forecasting) package was written, with functionality as given in the following section. This SeFo package builds on the Logocline package previously published by the author [20].

This paper is intended to serve as a general description of the philosophy and computational architecture of the package. Technical details on specific forecasting methods implemented and skill assessments of seasonal forecasting using this package will be published in appropriate peer-reviewed venues.

Implementation and architecture

SeFo is written as a package for the free numerical computing language and environment GNU Octave [21, 22], exploiting built-in Octave functions and capabilities wherever possible. (Although Octave is deliberately designed to be able to run programs written for the proprietary environment MATLAB, no effort has been made to make SeFo usable in MATLAB.) The package has a modular architecture that isolates components of the workflow for probabilistic seasonal forecasts using model ensembles. Each module is an Octave function that writes its output to a file, from which it can be accessed by other modules. An input structure is used to hold options that need to be passed on to different modules, such as which data sets, dates, and forecast methods to use. (The list of possible options is given in a README text file.)
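As a minimal sketch of this pattern (not SeFo's actual code), the following Octave snippet builds an options structure, has one step write its result to a file, and has a later step read that file back. The field names (data_dir, predict_year, predict_month, lag) and the file name are assumptions made for illustration; the option names actually used by the package are those listed in the README.

```octave
% Hypothetical options structure; the real option names are in the README.
opt = struct ();
opt.data_dir = tempdir ();        % where intermediate files are kept
opt.predict_year = 2016;
opt.predict_month = 5;
opt.lag = 2;

% First step: compute something and write it to a file.
result_file = fullfile (opt.data_dir, ...
  sprintf ("sketch_%04d_%02d.dat", opt.predict_year, opt.predict_month));
field = magic (4);                % stand-in for a gridded data field
save (result_file, "field");

% Later step: read the stored result instead of recomputing it.
stored = load (result_file);
disp (size (stored.field));
```

Passing results through files in this way lets each step be rerun or replaced independently, at the cost of local disk space.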

The core SeFo functions (modules) all have names starting with “sefo_”, and their interrelationships are diagrammed in Figure 1. The package also includes a number of ancillary functions, for example, for obtaining regression models used by particular forecast methods, regridding to the 1-degree grid used for the forecast ensemble output, and making maps. The capabilities of the core functions are as follows:

sefo_obs_read: Download and regrid observational data for a specified month (currently surface air temperature from Berkeley Earth Surface Temperature [23], the NCEP/NCAR reanalysis [24], or the Climate Prediction Center [25]).

sefo_obs_assemble: Collect the observations for a sequence of months.

sefo_fcst_read: Download and store ensemble forecasts from a given climate model and month (currently the data source is the North American Multi-Model Ensemble (NMME) Phase 1 [26] accessed via the IRI Data Library [27]).

sefo_fcst_assemble: Collect ensemble forecasts for a sequence of months.

sefo_predict: Apply one of several (currently 23) available prediction methods to estimate a probability distribution of values for a given month from current ensemble predictions plus a set of past prediction-observation pairs. Currently, the implemented forecast methods all return t distributions as the forecast probability distributions.

sefo_adj: Apply an optional calibrating adjustment to the forecast t distribution to better match the distribution of verifying observations over some specified past period.

sefo_cdf: Calculate, and optionally map, requested quantiles of a forecast probability distribution.

sefo_verify: Compare probabilistic forecasts for a past period against observations using several metrics, including forecast root mean square error, bias, mean negative log likelihood, and Kolmogorov-Smirnov statistic.

sefo_time_methods: Compare the computation times for selected forecast methods.

sefo_example: Exercise the key components of the package by generating a sample forecast for next month (Figure 2).
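The prediction, quantile, and verification steps listed above can be illustrated with a short self-contained Octave sketch. It is not one of SeFo's forecast methods: it assumes a simple linear regression of observations on the ensemble mean, which yields a t predictive distribution, and it uses synthetic numbers. All variable names are hypothetical. The t cumulative distribution function is built from the core betainc function and inverted with fzero so that no additional package is needed.

```octave
% Synthetic past ensemble-mean forecasts (x) and verifying observations (y).
x = [0.2; -0.5; 1.1; 0.7; -1.3; 0.4; 0.9; -0.2];
y = [0.1; -0.2; 0.8; 0.9; -1.0; 0.1; 0.6; 0.0];
n = numel (x);
dof = n - 3;     % each leave-one-out fit uses n-1 pairs and 2 coefficients

% CDF of a standard t variable via the regularized incomplete beta function.
t_cdf = @(t) 0.5 + sign (t) .* 0.5 ...
        .* (1 - betainc (dof ./ (dof + t .^ 2), dof / 2, 0.5));

% Prediction: leave-one-out regression gives a t forecast for each past case.
loc = zeros (n, 1);
scale = zeros (n, 1);
for k = 1:n
  keep = setdiff (1:n, k);
  xt = x(keep);  yt = y(keep);  m = numel (keep);
  X = [ones(m, 1), xt];                       % design matrix with intercept
  beta = X \ yt;                              % least-squares coefficients
  s2 = sum ((yt - X * beta) .^ 2) / (m - 2);  % residual variance
  sxx = sum ((xt - mean (xt)) .^ 2);
  loc(k) = [1, x(k)] * beta;
  scale(k) = sqrt (s2 * (1 + 1 / m + (x(k) - mean (xt)) ^ 2 / sxx));
end

% Quantiles: invert the CDF numerically for the last forecast.
probs = [0.1, 0.5, 0.9];
q = zeros (size (probs));
for i = 1:numel (probs)
  q(i) = loc(n) + scale(n) * fzero (@(t) t_cdf (t) - probs(i), [-100, 100]);
end

% Verification: error statistics, mean negative log likelihood, and the
% Kolmogorov-Smirnov distance of the PIT values from a uniform distribution.
z = (y - loc) ./ scale;
err = loc - y;
rmse = sqrt (mean (err .^ 2));
bias = mean (err);
logpdf = gammaln ((dof + 1) / 2) - gammaln (dof / 2) ...
         - 0.5 * log (dof * pi) - log (scale) ...
         - (dof + 1) / 2 * log (1 + z .^ 2 / dof);
nll = -mean (logpdf);
u = sort (t_cdf (z));
ks = max (max ((1:n)' / n - u, u - (0:n-1)' / n));

printf ("quantiles: %s\n", mat2str (q, 4));
printf ("RMSE %.3f, bias %.3f, NLL %.3f, KS %.3f\n", rmse, bias, nll, ks);
```

Leaving out the verifying case when fitting each hindcast keeps the verification statistics from being computed on data the regression has already seen.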

Figure 1 

Calling dependencies between the core functions in SeFo. Each has “sefo_” prefixed to its name.

Figure 2 

Example graphical output, generated with the sequence “predict_year = 2016; predict_month = 5; lag = 2; sefo_example” using version 0.0.2 of SeFo.

Basic installation instructions are provided in the README file.

Quality control

Each of the core functions (all the functions with names beginning with “sefo_”, except sefo_example) has a demonstration script that tests and illustrates its basic capabilities. Entering “demo function_name” in Octave runs this script. Some of the ancillary functions have their own unit tests defined (run with “test function_name”). Development and testing were carried out in a Linux environment, specifically the Debian distribution (versions Jessie (Stable) and Unstable), and in Mac OS X with a MacPorts Octave installation.
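For readers unfamiliar with this Octave convention, the sketch below shows how demonstration and test blocks are embedded as specially marked comments in a function file. The function sketch_anomaly and its file name are hypothetical and are not part of SeFo.

```octave
% File: sketch_anomaly.m (hypothetical helper, not part of SeFo)
function a = sketch_anomaly (x)
  % Subtract the mean from a vector of values.
  a = x - mean (x);
end

%!assert (sketch_anomaly ([1 2 3]), [-1 0 1])

%!test
%! a = sketch_anomaly ([10; 20]);
%! assert (sum (a), 0);

%!demo
%! x = [14.1 15.3 16.0];
%! disp (sketch_anomaly (x));
```

With this file on the Octave path, “test sketch_anomaly” runs the assert and test blocks, and “demo sketch_anomaly” runs the demo block.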

Limitations and potential improvements

Currently, the package routines are not fully generalized. For example, NMME is currently the only supported source of ensemble predictions.

The package documentation is not yet complete, nor are the unit tests and demos for non-core functions.

More information could be provided while the software is running, such as percentage progress of the downloads and data analysis.

Better input checking for the options structure could be provided with analogues of the odeset and odeget functions used for supplying parameters to Octave’s differential equation solvers.
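One possible shape for such a helper is sketched here. The function name sketch_optset and the option names are hypothetical, and this is only an illustration of the idea, not a planned SeFo interface: options start from a structure of defaults, and any name not present in the defaults is rejected.

```octave
1;  % make this a script file so the function below can be defined inline

% Hypothetical analogue of odeset for an options structure: start from
% defaults and accept only known option names.
function opt = sketch_optset (defaults, varargin)
  if (mod (numel (varargin), 2) != 0)
    error ("sketch_optset: options must be name/value pairs");
  end
  opt = defaults;
  for i = 1:2:numel (varargin)
    name = varargin{i};
    if (! ischar (name) || ! isfield (defaults, name))
      error ("sketch_optset: unknown option '%s'", num2str (name));
    end
    opt.(name) = varargin{i + 1};
  end
end

defaults = struct ("predict_year", 2016, "predict_month", 5, "lag", 2);
opt = sketch_optset (defaults, "lag", 3);
disp (opt.lag);
```

A misspelled option name such as "lagg" then raises an error immediately instead of being silently ignored by a later module.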

While the current data sources for SeFo, referenced above, are, to the author’s knowledge, available without restrictions on use, the ability to handle and display different data licenses could potentially be added.

Once the functionality has been extended to more use cases and the documentation is more complete, it is envisioned that SeFo might be added to the Octave Forge repository, from which it might be accessed by a wider user base.

Users are encouraged to submit bugs and patches to the repository issue tracker on Bitbucket.

(2) Availability

Operating system

While in theory the package should run in any operating system for which Octave is available, including Windows, it has only been tested in Unix-like environments (Linux and Mac OS X).

Programming language

The package requires GNU Octave (Version 3.8 or newer) with the linear-algebra [28], nan [29], netcdf [30], and splines [31] packages installed.
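As a rough guide, the required packages can typically be installed from Octave Forge and loaded from the Octave prompt as sketched below. This assumes an Internet connection and that any system libraries the packages need (for example, the netCDF library for the netcdf package) are already present; the README remains the authoritative source for installation steps.

```octave
% Check the running Octave version against the stated minimum.
assert (compare_versions (version (), "3.8.0", ">="));

% One-time installation from Octave Forge (needs an Internet connection).
pkg install -forge linear-algebra
pkg install -forge nan
pkg install -forge netcdf
pkg install -forge splines

% Load the packages in each session before calling SeFo functions.
pkg load linear-algebra nan netcdf splines
```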

Additional system requirements

An Internet connection is required to download observational data and numerical weather prediction model ensemble output. Data and intermediate files are stored locally, which will typically require one to several gigabytes of space, depending on the use case.

Dependencies

There are no dependencies beyond those for Octave with the indicated packages.

Software location

Archive (e.g. institutional repository, general repository)

Name: figshare

Persistent identifier: https://dx.doi.org/10.6084/m9.figshare.3114844.v1 (tarball of version 0.0.2)

License: CC-BY

Publisher: NY Krakauer

Date published: 15/03/16

Code Repository (e.g. SourceForge, GitHub etc.)

Name: Bitbucket

Identifier: https://bitbucket.org/niryk/sefo

License: GPL V3+

Date published: 15/03/16

Language

Octave

(3) Reuse potential

Given the modular structure of SeFo, it could be extended with comparatively little additional work within the seasonal forecast context to accommodate alternative weather variables (such as precipitation or sunniness), observation data sources, sources of ensemble outputs (besides NMME), and methods for generating forecasts from ensemble outputs and past observations. Because variables such as precipitation and sunniness are farther than temperature from a normal distribution, some modification of the forecast methods would be advisable for them [32, 33]. Many of the components of SeFo, including the specific forecast and verification methods implemented in the functions called by sefo_predict and sefo_verify, could also be reused for forecasting applications in fields outside of weather prediction.