1 Introduction

Analyses of future energy systems rely on tools for modeling complex socio-techno-economic systems. The complexity of these systems increases due to the intermittent supply of renewable energies, which requires modeling at high temporal and spatial resolution. Additionally, stronger interaction between sectors such as heat, power and transport calls for comprehensive sector-coupled approaches. At the same time, a trend towards open source energy (system) models can be observed in the energy system modeling research field [], as models have been criticized for a lack of transparency and reproducibility [].

For energy system modelers, data handling, including input collection, processing and result analysis, is one of the most time-consuming tasks. Therefore, open source and open data modeling approaches are put forward as a means to use resources efficiently []. Yet, there is no standardized or broadly used model-agnostic data container for energy system related data in the scientific field of energy system modeling. In most cases, every software package comes with its own logic for model input and output data. In addition, how to create the required data sets from raw data sources and how to post-process result data is often left to the user. For these two reasons, re-using data and, more importantly, reproducing model results is a challenging task, even for experienced modelers.

To improve reproducibility of model results and re-usability of existing data, the data model described in the following has been developed. Energy system related data is stored in the Data Package format, and the complete reproducible workflow from raw data to final results is described for this data model. The data model has been implemented in the Python package oemof.tabular [], which is based on the Open Energy Modeling Framework (oemof). However, the concept is not restricted to this package and can be applied with other software as well.

2 Background

Oemof is a powerful tool for the modeling of energy systems []. Functionalities range from large linear programming (sector-coupled) market models [, , ] to detailed mixed integer heating system [, ] or battery models used to assess the profitability of power plants in current and future market environments. The underlying concept and its generic implementation allow for this versatile application. It is based on a bipartite graph structure in which nodes are partitioned into buses and components. Most oemof components are of a rather abstract type. For example, the Transformer class can be used to model different energy system components such as power plants (1 input, 1 output), heat pumps (2 inputs, 1 output) or any other conversion process. To illustrate the concept, Figure 1 shows a Transformer connected to different buses (1 input, 2 outputs) to model a combined heat and power (CHP) plant.

Figure 1 

Illustration of a CHP plant model based on the oemof.solph Transformer class. Nodes are shown as ellipses/squares and edges between the nodes are depicted as arrows.

The usage of the Python API for this component in oemof.solph is shown in Figure 2.

Figure 2 

Example of the oemof.solph Application Programming Interface (API) for a transformer component with one input and two outputs.
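The bipartite structure and the role of conversion factors can be illustrated with a small, library-independent sketch. The classes `Bus` and `Transformer` below are hypothetical stand-ins written for this illustration and do not reproduce the oemof.solph API; the conversion factors of the CHP plant (1 input, 2 outputs) are assumed values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Bus:
    """Hypothetical bus node (a stand-in, not the oemof.solph class)."""
    label: str


@dataclass
class Transformer:
    """Hypothetical component node converting one input flow into output flows."""
    label: str
    inputs: list
    conversion_factors: dict = field(default_factory=dict)  # output Bus -> factor

    def __post_init__(self):
        # Bipartite constraint: a component may only be connected to buses.
        for node in list(self.inputs) + list(self.conversion_factors):
            if not isinstance(node, Bus):
                raise TypeError(f"{self.label} may only connect to buses, got {node!r}")

    def outputs_for(self, input_flow: float) -> dict:
        """Return the flow on each output bus for a given input flow."""
        return {bus.label: input_flow * factor
                for bus, factor in self.conversion_factors.items()}


gas, el, heat = Bus("gas"), Bus("electricity"), Bus("heat")
# CHP plant: one input (gas) and two outputs (electricity, heat); illustrative factors.
chp = Transformer("chp", inputs=[gas], conversion_factors={el: 0.4, heat: 0.5})
print(chp.outputs_for(100.0))  # {'electricity': 40.0, 'heat': 50.0}
```

Connecting a component directly to another component raises a `TypeError`, which mirrors the partitioning of nodes into buses and components.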

When building energy system models, data is often stored in a tabular format, for example in CSV files, Excel files or relational databases. However, designing and implementing a generic tabular data input processing tool for oemof.solph has proven to be difficult. One reason is that tables are flat, two-dimensional data structures, whereas oemof.solph's API makes heavy use of nested objects and data structures. Mapping this nested approach onto flat tables is a non-trivial task.
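Why this mapping is non-trivial can be seen in a small sketch. The nested dictionaries below are hypothetical, solph-like descriptions of two conversion components (plain Python data, not oemof.solph objects), and `flatten` is an illustrative helper that turns them into table rows with dotted column names. Even two components of the same kind yield different column sets, because the columns depend on which buses are connected, so no single fixed table layout covers all combinations.

```python
def flatten(nested, prefix=""):
    """Flatten nested dictionaries into one row with dotted column names."""
    row = {}
    for key, value in nested.items():
        column = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            row.update(flatten(value, column))
        else:
            row[column] = value
    return row


# Hypothetical nested descriptions of two transformer-like components.
power_plant = {
    "label": "pp_gas",
    "inputs": {"gas": {"variable_costs": 30}},
    "outputs": {"electricity": {"nominal_value": 100}},
    "conversion_factors": {"electricity": 0.58},
}
heat_pump = {
    "label": "hp",
    "inputs": {"electricity": {"variable_costs": 0},
               "ambient_heat": {"variable_costs": 0}},
    "outputs": {"heat": {"nominal_value": 10}},
    "conversion_factors": {"electricity": 0.25, "ambient_heat": 0.75},
}

columns_pp = set(flatten(power_plant))
columns_hp = set(flatten(heat_pump))
# Columns that exist for only one of the two rows:
print(sorted(columns_pp ^ columns_hp))
```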

3 Facades

Facilitating the task outlined above is one of the functions oemof.tabular needs to accomplish. The package was developed to allow the user to create an oemof model from tabular data sources. This means that it must also enable the user to specify oemof.solph components using such flat structures. In oemof.tabular, this is done using the facade design pattern, introduced by Gamma et al. in 1994 []. The facade design pattern has two main purposes: (1) it provides a simple interface for users to access the functionality of a complex subsystem, and (2) it loosens the coupling between consumers of the subsystem's interface and the components of the subsystem.

Therefore, facade classes have the following advantages when viewed in the context of oemof.tabular:

  • Facades allow instantiation of models from two dimensional data sources as they provide a simplified interface to complex underlying structures.
  • The simplified, and thus restricted and less generic, mathematical representation leads to a more transparent modeling approach.
  • The simplified interface is easy to use and integrate within the context of teaching and capacity building.
  • It also allows building an interface for composed components which are constructed using multiple oemof.solph objects.
  • Facades can be used with a different back-end, which allows the integration of other energy system modeling frameworks which may not even have to be written in the same programming language.

The implementation

A user of oemof and oemof.solph is expected to use instances of classes from a particular class hierarchy to build a model. Thus, facades are integrated into this class hierarchy as a mix-in class: a facade to a specific oemof.solph class is created by sub-classing it, mixing in the Facade class via cooperative multiple inheritance and then using the general facade methods to simplify the interface of the original oemof.solph class.

This allows for a two-step approach to building complex components out of simple ones. First, a complex subsystem can be aggregated using composition, without having to think about simplifying its interface. In a second step, the interface can be simplified by creating a facade via inheritance.
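The mix-in mechanism can be sketched in plain Python. `GenericStorage`, `Facade` and `Reservoir` below are simplified, hypothetical stand-ins with made-up signatures, not the oemof.solph or oemof.tabular classes. The sketch shows the pattern only: the facade sub-classes the generic class, mixes in `Facade` via cooperative multiple inheritance (every `__init__` calls `super().__init__`), accepts flat, energy-specific keyword arguments and translates them into the more generic, nested arguments of the parent.

```python
class GenericStorage:
    """Hypothetical generic parent class (a stand-in, not oemof.solph's class)."""

    def __init__(self, label, inputs=None, outputs=None,
                 outflow_conversion_factor=1.0, **kwargs):
        super().__init__(**kwargs)  # cooperative: forward anything left over
        self.label = label
        self.inputs = inputs or {}
        self.outputs = outputs or {}
        self.outflow_conversion_factor = outflow_conversion_factor


class Facade:
    """Hypothetical mix-in with generic helpers shared by all facades."""

    required = ()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # next class in the MRO

    def check_required(self, flat):
        missing = [name for name in self.required if name not in flat]
        if missing:
            raise ValueError(f"{type(self).__name__} is missing {missing}")


class Reservoir(Facade, GenericStorage):
    """Hypothetical facade: flat, energy-specific arguments in, generic ones out."""

    required = ("label", "bus", "efficiency")

    def __init__(self, **flat):
        self.check_required(flat)
        super().__init__(
            label=flat["label"],
            outputs={flat["bus"]: {"nominal_value": flat.get("power")}},
            outflow_conversion_factor=flat["efficiency"],
        )


res = Reservoir(label="reservoir", bus="electricity", efficiency=0.9, power=100)
print(isinstance(res, GenericStorage), res.outflow_conversion_factor)  # True 0.9
```

Because `Reservoir` is still a `GenericStorage`, code written for the generic class keeps working with the facade.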

The oemof.tabular package not only provides the facade infrastructure, but also implements a number of facades for simple, regular oemof.solph components as well as for complex compositions of oemof.solph components.

Since facades are integrated into oemof.solph's class hierarchy, all oemof.tabular component classes are sub-classes of oemof.solph components, which means that they can be freely mixed with objects of their more generic parent classes in a model. In addition, the data model is extendable and could be applied to various model generators, for example PyPSA [] or Calliope []. However, the current implementation for reading Data Packages is limited to oemof.tabular classes. The facade concept as used in oemof.tabular has also proven its applicability by being transferred to and used in the oemof.thermal [] package.
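How rows of tabular data might be dispatched to facade classes, and why the resulting objects can be mixed with generic ones, is sketched below. It assumes, for illustration, that each element row carries a `type` column naming its facade; the classes `Component`, `Volatile` and `Load`, the dictionary `TYPEMAP` and the function `build_elements` are hypothetical stand-ins, not the oemof.tabular reader.

```python
import csv
import io


class Component:
    """Hypothetical generic parent class (stands in for an oemof.solph class)."""

    def __init__(self, label):
        self.label = label


class Volatile(Component):
    """Hypothetical facade for a fluctuating source, e.g. wind or PV."""

    def __init__(self, name, bus, capacity):
        super().__init__(label=name)
        self.bus, self.capacity = bus, float(capacity)


class Load(Component):
    """Hypothetical facade for a demand."""

    def __init__(self, name, bus, amount):
        super().__init__(label=name)
        self.bus, self.amount = bus, float(amount)


# Hypothetical mapping from the value of the 'type' column to a facade class.
TYPEMAP = {"volatile": Volatile, "load": Load}


def build_elements(csv_text):
    """Instantiate one facade per CSV row, dispatching on the 'type' column."""
    elements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cls = TYPEMAP[row.pop("type")]
        elements.append(cls(**row))  # remaining columns become keyword arguments
    return elements


wind_csv = "name,type,bus,capacity\nwind_de,volatile,bus_de,500\n"
load_csv = "name,type,bus,amount\ndemand_de,load,bus_de,1000\n"
elements = build_elements(wind_csv) + build_elements(load_csv)

# Facade objects are still instances of the generic parent class, so they can
# be combined with directly created generic objects in one model.
model_nodes = elements + [Component(label="backup")]
print([type(n).__name__ for n in model_nodes])  # ['Volatile', 'Load', 'Component']
```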

The issue of transparency

Model generators such as oemof.solph can indeed simplify energy system modeling. However, this is a double-edged sword: simplification for the user always comes with the drawback that the complexity remains hidden from the user. Depending on the parameters provided, different sets of constraints are created, yet the resulting mathematical equations are not visible at any stage of the modeling process. Therefore, approaches like OSeMOSYS [] can offer a higher level of transparency than object oriented model generators. As such models or model generators are implemented in a pure algebraic modeling language, every part of the model (variable definitions, constraints, etc.) can be clearly identified in the source code files. In the case of facades in oemof.tabular, the mathematical relations of the models and their implementation are hidden behind an additional layer of classes. However, since the oemof.tabular API is less generic and more restricted than the oemof.solph API, this additional layer may actually increase transparency compared to oemof.solph components by creating a clear link between input parameters and the resulting mathematical model.

Facade Example: Hydro Reservoir Modeling

To illustrate the facade concept, an oemof.tabular storage example is compared to the classical oemof.solph approach in the following. The oemof.solph package provides a GenericStorage class to model different storages such as batteries, hot water storages or pumped hydro storages. To model reservoir storages with an inflow and possible spillage, a set of connected oemof.solph components is required. To simplify modeling, the Reservoir facade bundles these components and provides high-level API access to the more complex underlying model. Figure 3 provides an illustration of the Reservoir facade.

Figure 3 

Illustration of a reservoir model in oemof.tabular.

The facade class itself is a subclass of the GenericStorage class. However, to allow for a constant inflow into the storage, an additional Source object is created.

The reservoir is modeled as a storage with a constant inflow (x denotes endogenous variables, c denotes exogenous variables):

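$$x^{level}(t) = x^{level}(t-1) \cdot \left(1 - c^{loss\_rate}(t)\right) + x^{profile}(t) - \frac{x^{flow,out}(t)}{c^{efficiency}(t)} \qquad \forall t \in T \tag{1}$$

$$x^{level}(0) = c^{initial\_storage\_level} \cdot c^{capacity} \tag{2}$$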

The inflow is bounded by the exogenous inflow profile. Thus, if the inflow exceeds the maximum capacity of the storage, spillage is possible by setting $x^{profile}(t)$ to values lower than $c^{profile}(t)$.

$$0 \leq x^{profile}(t) \leq c^{profile}(t) \qquad \forall t \in T \tag{3}$$

The spillage of the reservoir is therefore defined by $c^{profile}(t) - x^{profile}(t)$. Additional constraints apply, which have been omitted from this description but can be retrieved from the oemof documentation.
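Equations (1) to (3) can be traced step by step with a short sketch. Note that in the model, $x^{flow,out}(t)$ and $x^{profile}(t)$ are determined by optimization; the function below instead takes the outflow as given and applies a simple rule (use as much inflow as fits into the reservoir), which is an assumption made only to show how the level, the used inflow and the spillage $c^{profile}(t) - x^{profile}(t)$ relate. Parameter names mirror the symbols above; all numbers are illustrative.

```python
def simulate_reservoir(c_profile, flow_out, capacity, efficiency,
                       initial_storage_level, loss_rate=0.0):
    """Evaluate Eqs. (1)-(3) for a given outflow using a simple fill-first rule.

    This is not an optimization; it returns the storage levels, the used
    inflow x_profile(t) and the spillage c_profile(t) - x_profile(t).
    """
    level = initial_storage_level * capacity  # Eq. (2)
    levels, used_inflow, spillage = [], [], []
    for c_in, x_out in zip(c_profile, flow_out):
        retained = level * (1 - loss_rate)
        drawn = x_out / efficiency  # storage content needed for the outflow
        # Largest inflow within Eq. (3) that keeps the level below the capacity.
        x_in = min(c_in, max(0.0, capacity - retained + drawn))
        level = retained + x_in - drawn  # Eq. (1)
        if level < 0:
            raise ValueError("outflow exceeds the stored energy")
        levels.append(level)
        used_inflow.append(x_in)
        spillage.append(c_in - x_in)
    return levels, used_inflow, spillage


levels, inflow, spill = simulate_reservoir(
    c_profile=[30, 80, 10], flow_out=[20, 20, 40],
    capacity=100, efficiency=0.8, initial_storage_level=0.5)
print(levels, spill)
```

In the second time step the reservoir is full, so part of the available inflow is spilled.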

API comparison for the reservoir example

Figure 4 shows the Python code to instantiate this component. The required oemof.solph code differs significantly from the oemof.tabular code (see Figure 5). First of all, more objects, some of them nested (Flows, Sources), need to be instantiated. This nested structure allows for a very flexible modeling approach. However, due to the large set of possible combinations, it creates hurdles for writing a generic data interface that instantiates all these objects. In contrast, the flat structure of the facade arguments allows for a simple interface to tabular data. Another difference is the energy-specific naming of attributes, for example efficiency compared to outflow_conversion_factor. As the Reservoir class is a subclass of the GenericStorage class, some attributes of the parent class are also available in the child class (e.g. initial_storage_level).

Figure 4 

API example for an oemof.tabular reservoir facade.

Figure 5 

API example for a simple oemof.solph reservoir model.

Even for comparatively small systems, the example underlines the advantages of the approach.

4 Data Packages

In its simplest form, a Data Package is nothing more than a valid JSON [] file named “datapackage.json”. The file contains meta data about data resources, which can be specified inline in the same file. For more complex cases, data resources are stored in separate files inside the directory containing the “datapackage.json” file. The contents of this JSON file are standardized via the Data Package specification []. An example fragment of such a datapackage.json file can be seen in Figure 7. The Data Package specification has been extended by other standards, which further refine the format and contents of the meta data file and the resources to suit different application contexts. Examples are Fiscal Data Packages [], meant to store fiscal data, and Tabular Data Packages [], which refine the original Data Package [] specification to handle tabular data. The latter combine the advantages of databases and spreadsheets with the ubiquity and user-friendliness of CSV files. Tabular Data Packages allow storing type meta data and setting primary keys as well as foreign keys across resources, i.e. different CSV files. They are more lightweight than databases, and they are both human-readable and easily processable in almost any programming language. In recent years, different European projects in the field of energy system modeling have opted for Data Packages to store model-relevant data [, ]. Using Data Packages correctly also helps to adhere to the FAIR principles of data handling proposed by Wilkinson et al. 2016 [].
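A minimal sketch of such a descriptor, written and re-read with Python's json module, is shown below. The package, resource and field names are hypothetical. The keys (`name`, `profile`, `resources`, `data`, `path`, `schema`, `fields`, `primaryKey`) are taken from the Data Package, Tabular Data Package and Table Schema specifications, but the descriptor is not validated against them here. The first resource is specified inline, the second refers to a CSV file inside the package directory.

```python
import json
import tempfile
from pathlib import Path

descriptor = {
    "name": "example-energy-system",  # hypothetical package name
    "profile": "tabular-data-package",
    "resources": [
        {   # inline resource: the data is stored in the descriptor itself
            "name": "bus",
            "profile": "tabular-data-resource",
            "data": [{"name": "bus_de", "type": "bus"}],
            "schema": {
                "fields": [{"name": "name", "type": "string"},
                           {"name": "type", "type": "string"}],
                "primaryKey": "name",
            },
        },
        {   # file resource: the data is stored in a separate CSV file
            "name": "load",
            "profile": "tabular-data-resource",
            "path": "data/elements/load.csv",
            "schema": {
                "fields": [{"name": "name", "type": "string"},
                           {"name": "amount", "type": "number"}],
                "primaryKey": "name",
            },
        },
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=2))
    reloaded = json.loads(target.read_text())

print([r["name"] for r in reloaded["resources"]])  # ['bus', 'load']
```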

In the context of oemof.tabular, Data Packages are used to hold information on the topology and parameters of an energy system model instance. At a minimum, this includes all exogenous model variables and associated meta data. However, it may also include raw data and scripts for pre- and post-processing. On top of the Tabular Data Package structure, an energy system specific logic is added. It imposes minimal additional constraints on the format of Tabular Data Packages used to specify an oemof.tabular model, while keeping them valid Tabular Data Packages according to the original specification. Therefore, oemof.tabular requires the following parts in a Tabular Data Package:

  1. a directory named data containing at least one sub-folder called elements; the data directory may optionally contain a sub-folder called sequences and a sub-folder called geometries, and
  2. a valid meta data JSON file (datapackage.json) for the Data Package.

The exemplary folder tree of such a Data Package is depicted in Figure 6.

Figure 6 

Example of an oemof.tabular Data Package folder tree.
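The two requirements listed above can be checked with a few lines of code. `check_package` is a hypothetical helper, not part of oemof.tabular; it only tests that data/elements exists and that datapackage.json parses as JSON, not that the file conforms to the Data Package specification. The file and resource names are illustrative.

```python
import json
import tempfile
from pathlib import Path


def check_package(root: Path) -> list:
    """Return problems with the minimal layout (an empty list if there are none).

    Hypothetical helper: the meta data file is only parsed as JSON, not
    validated against the Data Package specification.
    """
    problems = []
    if not (root / "data" / "elements").is_dir():
        problems.append("missing directory data/elements")
    meta = root / "datapackage.json"
    if not meta.is_file():
        problems.append("missing datapackage.json")
    else:
        try:
            json.loads(meta.read_text())
        except json.JSONDecodeError as err:
            problems.append(f"datapackage.json is not valid JSON: {err}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = check_package(root)  # an empty directory violates both requirements
    for sub in ("elements", "sequences", "geometries"):  # the last two are optional
        (root / "data" / sub).mkdir(parents=True)
    (root / "data" / "elements" / "bus.csv").write_text("name,type\nbus_de,bus\n")
    meta = {"name": "example",
            "resources": [{"name": "bus", "path": "data/elements/bus.csv"}]}
    (root / "datapackage.json").write_text(json.dumps(meta))
    after = check_package(root)

print(before)
print(after)  # []
```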

Figure 7 

Setting foreign keys in the JSON meta data file for cross referencing connected components.

As stated above, data inside Data Packages is stored in so-called resources, which, for a Tabular Data Package, are CSV files. The columns of such resources are referred to as fields. The field names of the resources therefore correspond to parameters of the energy system elements and sequences. Connections between components and buses can be defined via foreign keys, which link fields of elements to fields of other elements stored in other resources. To reference the name field of a resource named bus, a foreign key can be set within the JSON meta data file using the foreignKeys key, as shown in Figure 7.
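The foreign key mechanism and the referential integrity it expresses can be sketched with the standard library. The `foreignKeys` entry follows the structure of the Table Schema specification (a `fields` value and a `reference` with `resource` and `fields`); the resources, their contents and the checking function `broken_references` are hypothetical, and the function only handles single-field keys given as strings.

```python
import csv
import io

# Table Schema fragment for a 'load' resource: its 'bus' field must match
# a value of the 'name' field of the resource named 'bus'.
load_schema = {
    "fields": [{"name": "name", "type": "string"},
               {"name": "bus", "type": "string"},
               {"name": "amount", "type": "number"}],
    "primaryKey": "name",
    "foreignKeys": [
        {"fields": "bus", "reference": {"resource": "bus", "fields": "name"}}
    ],
}

# Hypothetical CSV contents of two resources.
resources = {
    "bus": "name,type\nbus_de,bus\nbus_fr,bus\n",
    "load": "name,bus,amount\ndemand_de,bus_de,500\ndemand_it,bus_it,300\n",
}


def read_rows(text):
    return list(csv.DictReader(io.StringIO(text)))


def broken_references(schema, resource_name, resources):
    """List (row name, field, value) for values missing in the referenced resource."""
    rows = read_rows(resources[resource_name])
    broken = []
    for fk in schema.get("foreignKeys", []):
        ref = fk["reference"]
        targets = {r[ref["fields"]] for r in read_rows(resources[ref["resource"]])}
        for row in rows:
            if row[fk["fields"]] not in targets:
                broken.append((row["name"], fk["fields"], row[fk["fields"]]))
    return broken


print(broken_references(load_schema, "load", resources))
# [('demand_it', 'bus', 'bus_it')]
```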

To distinguish elements and sequences, these two are stored in separate sub-directories of the data directory. In addition, geometrical information can be stored under data/geometries in GeoJSON format. To facilitate creating, processing and computing a Data Package, oemof.tabular offers several functionalities:

  • oemof.tabular.datapackage.building contains functions to infer meta data, download raw data, read and write elements, sequences etc.
  • oemof.tabular.datapackage.processing contains functions to process model results, which can be used in the compute.py script.
  • oemof.tabular.datapackage.aggregation allows aggregating time series to reduce model complexity.
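Time series aggregation reduces the number of time steps a model has to consider. The sketch below uses plain block averaging (downsampling) as one simple technique; it is not necessarily the method implemented in oemof.tabular.datapackage.aggregation, and `aggregate_blocks` is a hypothetical helper. The load values are made up.

```python
def aggregate_blocks(series, block_length):
    """Average consecutive blocks of a time series (simple downsampling).

    Each aggregated value represents `block_length` original time steps, so
    the total energy is preserved when values are multiplied by that length.
    """
    if len(series) % block_length:
        raise ValueError("series length must be a multiple of block_length")
    return [sum(series[i:i + block_length]) / block_length
            for i in range(0, len(series), block_length)]


hourly_load = [50, 52, 55, 60, 70, 80, 78, 72]  # hypothetical hourly values in MW
four_hourly = aggregate_blocks(hourly_load, 4)
print(four_hourly)  # [54.25, 75.0]
```

Averaging keeps the energy content but smooths peaks, which is one reason more elaborate methods (for example typical periods) are often preferred for storage-heavy systems.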

5 Reproducible Workflows

Reproducibility of results is a recurring topic of discussion in the energy system modeling community [, ]. These discussions have mainly centered around the availability of source code (open source) and data (open data). Historically, neither the source code nor all input data have been made available for many prominent models. Thanks to new open source developments [, , , ], this has partly changed in recent years (for example, with the open release of MESSAGEix []). However, not all barriers have been removed yet. Firstly, closed models are still being used for research purposes. Secondly, more subtle barriers exist even for open source models. For one of the first open source models, Balmoral [], a GAMS software license is required, which constitutes a barrier to re-running computations. Another important issue is what can be described as the difference between practical and theoretical transparency. While theoretical reproducibility should be possible for open source models with open data, practical issues hamper such exercises. First of all, the respective authors may not provide all necessary information. Even if they do, complex model environments with poor documentation can make any attempt time-consuming. In these cases, reproducibility is hardly achievable from a practical point of view, even for experienced researchers with domain-specific knowledge.

Workflow description

To improve reproducibility of oemof.tabular-based research, a structure and workflow is proposed which is based on the ten rules for reproducible computational research presented by Sandve et al. 2013 []:

  1. For every result, keep track of how it was produced
  2. Avoid manual data manipulation steps
  3. Archive the exact versions of all external programs used
  4. Version control all custom scripts
  5. Record all intermediate results, when possible in standardized formats
  6. For analyses that include randomness, note underlying random seeds
  7. Always store raw data behind plots
  8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected
  9. Connect textual statements to underlying results
  10. Provide public access to scripts, runs, and results

The starting point of this workflow is the folder structure shown in Figure 8.

Figure 8 

Folder structure for a repository suitable for reproducible workflows.

  1. Everything in the repository is (if possible) generated by scripts, version controlled and documented, to keep track of every step in result production and to avoid manual data manipulation (rules 1 and 2). The repository is made publicly available (rule 10).
  2. The raw-data directory contains all input data required to build the input Data Packages for the model. Ideally, raw data sources come with meta data and open licenses. Unfortunately, not all published data comes with such information, which hinders the reproducibility of workflows. Raw data can also be bundled on persistent remote storage such as Zenodo [], which is suitable for FAIR data distribution.
  3. The scenarios directory allows specifying different scenarios and describing them in a basic way. The TOML format provides an easy-to-read and, if necessary, nested structure. In addition to a description, configuration settings for constructing the input Data Packages can be specified in these files. Figure 9 provides an example of a scenario file in the TOML format. This file can be used in the scripts to build Data Packages. Note that the user-specific build scripts need to interpret the keys and values. Therefore, scenario files do not follow a specific standardized structure apart from using the TOML language.
  4. The scripts directory contains code to construct input Data Packages for scenarios based on the .toml configuration files and the raw data (rule 2). In addition, a script to compute the scenario(s) can be stored there. If possible, raw data can also be downloaded from persistent sources (for example Zenodo) using scripts. Finally, this directory would also contain code for post-processing data and for result visualization (rule 7).
  5. Results are stored in the results directory. An important aspect is the separation of input and output data. Input data contains model-specific exogenous model variables (in this context, oemof.tabular Data Packages). The output data directory contains endogenous model variables. Altogether, this step addresses rules 5 and 10 of the ten rules.
  6. The open license and the environment definition, in combination with a version control system such as git, allow results to be reproduced on different operating systems (rules 3, 4 and 10).
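Parts of this structure can be generated by a script rather than by hand. The sketch creates the directories named in the list above and writes a file with the Python version, the platform and selected package versions, in the spirit of rule 3. The per-scenario nesting results/<scenario>/{input,output}, the scenario names, the file name environment.txt and the helper names are assumptions for illustration and need not match Figure 8.

```python
import platform
import tempfile
from importlib import metadata
from pathlib import Path


def create_repository(root, scenarios):
    """Create the top-level directories and per-scenario result folders."""
    for name in ("raw-data", "scenarios", "scripts"):
        (root / name).mkdir(parents=True, exist_ok=True)
    for scenario in scenarios:
        # input: exogenous variables (Data Packages); output: endogenous variables
        for part in ("input", "output"):
            (root / "results" / scenario / part).mkdir(parents=True, exist_ok=True)


def record_environment(path, packages):
    """Write the Python version, the platform and package versions (cf. rule 3)."""
    lines = [f"python=={platform.python_version()}",
             f"# platform: {platform.platform()}"]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed")
    path.write_text("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    create_repository(root, ["base", "high-res"])  # hypothetical scenario names
    record_environment(root / "environment.txt", ["oemof.solph", "oemof.tabular"])
    env_lines = (root / "environment.txt").read_text().splitlines()
    created = sorted(p.relative_to(root).as_posix()
                     for p in root.rglob("*") if p.is_dir())

print(created)
print(env_lines[0])
```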
Figure 9 

Example TOML file with scenario specifications to build input Data Packages.

An example of this workflow has been published for a model-based analysis of the German electricity system []. The energy system model covers the German power system together with its neighboring countries. Similarly, the workflow has been applied in an analysis of the flexibilisation of heat pumps [].

It should be noted that energy modelers also need to follow energy modeling specific best practices such as those proposed by DeCarolis et al. [].

6 Conclusion

This paper introduces the application of the facade concept and the usage of Data Packages for the Open Energy Modeling Framework (oemof). The concept has been implemented in the Python package oemof.tabular, which is designed as an interface to instantiate energy system models with the oemof.solph library from Tabular Data Packages. Using facades can (1) increase transparency by restricting generic components to energy-specific components, (2) allow building composed components and instantiating them from tabular data sources, (3) facilitate the application in teaching and capacity building environments and (4) allow for reproducible workflows. Additionally, the implementation based on the Data Package standard allows storing meta data of the model input data in a standardized way. To enable reproducibility of energy research results, a workflow based on scientific literature is proposed.