(1) Overview
Introduction
Understanding land use and land cover change (LULCC) is important for projecting the evolution of the Earth system, but tools for downscaling and translating LULCC changes across different modeling platforms are limited. For example, spatial disaggregation of LULCC projections from the Global Change Assessment Model (GCAM) [, ], which predicts future land use change for each GCAM region and Agro-Ecological Zone (AEZ) or water basin, is generally conducted on a case-by-case basis and a number of manual post-processing steps are required to customize outputs before they can be used by Earth System Models (ESMs). Demeter was developed to refine this process to ensure ease-of-use, output format extensibility, and consistency in the disaggregation methodology among Earth system modelers who utilize global or national GCAM land allocation projections. This requires Demeter to produce gridded data in multiple formats and resolutions, allow users to provide pertinent constraints, evaluate time-series projection data in terms of LULCC, and much more. To accommodate the need for a flexible approach to land allocation disaggregation from zonal to gridded resolutions, Demeter employs a robust intensification and extensification (expansion) algorithm proposed by and demonstrated in West et al. [] and Le Page et al. [].
Demeter also enables users to evaluate model uncertainty by hosting the functionality to build random parameter ensembles. The ensemble function in Demeter creates an array of all possible permutations of key influencer parameters as identified by Le Page et al. []. The user may then choose the number of randomly selected permutations that he/she wish to evaluate. Permutation runs are conducted in parallel using Joblib in Python and spread over the number of CPUs defined by the user. Each run outputs a unique directory of products.
Demeter was developed to downscale LULCC information from GCAM but could be extended to other IAMs. It is written in Python (version 2.7) and utilizes NumPy to provide an efficient and effective N-dimensional array approach to quantifying global and national LULCC over time. Python’s extensive package repository also provides the flexibility that Demeter requires to output data in a variety of formats. Finally, Python was chosen to encourage community involvement in the extension of Demeter due to the language’s broad use.
The objective of this paper is to describe Demeter’s work-flow, capabilities, output module extensibility, and accessibility for community use.
Implementation and architecture
The scientific underpinnings of Demeter are described in West et al. [] and Le Page et al. []. Demeter uses gridded observational spatial data (OSD) representing real-world land use and land cover observations as an on-the-ground reference to determine the spatially distributed land allocation projections for the base year in an IAM (Figures 1 and 2). The OSD serves as a starting point to disaggregate zonal land allocation projections from GCAM to a grid. The spatial resolution of the OSD controls the resolution to which GCAM land allocations will be disaggregated. Demeter’s OSD can be derived from publically available LULC products such as MODIS (https://modis.gsfc.nasa.gov/data/dataprod/mod12.php), ESA CCI (https://www.esa-landcover-cci.org/), etc. Land class name and type harmonization between the OSD and GCAM is defined by the user and reclassified during runtime in Demeter. The harmonization of land area between observed and projected data is achieved by adjusting GCAM’s total projected area to the feasible area in the OSD.
Applying the projected future land allocation to the OSD is achieved systematically by intensification and expansion [, ]. Intensification is the process of increasing a particular land use or land cover in a grid cell where it already exists, while expansion creates new land use or land cover in grid cells where it does not yet exist but is in proximity to an existing allocation. Proximal relationships are defined by kernel density probabilities that are generated during runtime to determine the probability that a grid cell may contain a particular land use or land cover class (Figure 3). The user can define a probability threshold that must at least be present in the grid cell for expansion to occur.
During the intensification and expansion process, user-defined transition priorities for each land class determine the order of transitions among land classes. The order in which land classes are processed is also defined by the user. Additionally, Demeter permits the user to provide custom constraints that may impede siting (e.g., soil quality) and weight them by how the constraint may apply to a specific land class. The intensification and expansion process is completed for each GCAM time step starting from the spatial distribution of LULC area from the previous time step. Figure 4 demonstrates the OSD and the product of applying the GCAM allocation for 2005 through intensification and expansion and the resulting change.
Demeter finalizes its run by outputting a suite of tabular, spatial, and mapped data. These outputs can be used to evaluate the spatial distribution of land data and which transitions have occurred where. The user can choose to output their data in a format specifically required by their model or research objectives.
Currently Demeter output products include:
- kernel density probability maps as PNG images per land class,
- land change maps as PNG images per time step per land class,
- land change maps as PNG images showing intensification and expansion per time step per land class,
- land transition maps as PNG images per time step per transition,
- a tabular LULC file per time step in either percent or square kilometres per grid location,
- a spatial LULC file per time step in either percent or square kilometres per grid location as an ESRI Shapefile,
- a tabular LULC transition file per time step per land class in square kilometres per grid location, and
- a NetCDF file per time step containing each land class and the resulting percent land cover per grid cell.
Configuration
Demeter requires the setup of several input files to begin a run. Examples of all input files can be found in the ‘example’ directory (https://github.com/IMMM-SFA/demeter/tree/master/example) and the expected file structure is outlined in the following:
- Example directory
- Inputs directory
- Allocation directory
- Constraint weighting file
- GCAM landclass allocation file
- Kernel density weighting file
- Spatial landclass allocation file
- Transition priority file
- Treatment order file
- Observed spatial data directory
- Observed spatial data file
- Constraint data directory
- Constraint files
- Projected GCAM land allocation directory
- GCAM land allocation file
- Reference data directory
- Reference files.
- Allocation directory
- Inputs directory
The following describes the requirements and format of each input.
Observed spatial data
This file represents the area in square degrees of each land class existing within a grid cell. The grid cell size is defined by the user. This file must be presented as a comma-separated values (CSV) file having a header in the first row and must contain the field names and fields described in Table 1.
Field | Description |
---|---|
fid | Unique integer ID for each grid cell latitude and longitude. |
landclass | Each land class field name (e.g., shrub, grass, etc.). Field names must not include commas. |
region_id | The integer ID of the GCAM region that the grid cell is contained in. Exact field name spelling required. |
metric_id | The integer ID of the GCAM AEZ or basin that the grid cell is contained in. Exact field name spelling required. |
latitude | The geographic latitude value of the grid cell centroid as a signed float. Exact field name spelling required. |
longitude | The geographic longitude value of the grid cell centroid as a signed float. Exact field name spelling required. |
Projected land allocation data
This file is the formatted GCAM run output for land allocation projections. Since the format of this file can vary based on GCAM user preference, the file must be formatted to meet Demeter input requirements as described in Table 2. The file must be a CSV file having the header in the first row.
Field | Description |
---|---|
region | The text name of the GCAM region. Exact field name spelling required. |
landclass | Each land class field name (e.g., shrub, grass, etc.). Field names must not include commas. |
year | Each year of the GCAM run as an integer (e.g., 2005, 2010, etc.). |
metric_id | The integer ID of the GCAM AEZ or basin. Exact field name spelling required. |
Units | The text name of the units of the data. |
Allocation files
Constraint weighting
A weight for each constraint, with a value ranging from –1.0 to 1.0, can be applied to each land class. If no constraints are desired, a user should simply provide a header-only file. For example, for a given land class, the weight for the soil quality constraint with a value of –1 indicates that one land class is fully constrained inversely (e.g., grasslands are opportunistic and grow readily in areas with a low soil quality); a weight of 0 indicates that soil quality exerts no constraint to the land class (e.g., forest, etc.); a weight of 1 for soil quality indicates that high soil quality will highly influence where the type will be spatially allocated (e.g., cropland). These constraints are developed in separate files as described in the following Constraints section. See the constraint weighting file in the example inputs for reference.
Kernel density weighting
Weights the degree to which land classes subjected to a kernel density filter will be utilized during expansion to each class. Value from 0.0 to 1.0. See the kernel density weighting file in the example inputs for reference.
Transition priority
This ordering (ascending integers starting from 0) defines the preferential order of final land allocation (e.g., crops expanding into grasslands rather than forests). See the priority allocation file in the example inputs for reference.
Treatment order
Defines the order in which final land classes are downscaled. This will influence the results (e.g., if crops are downscaled first and overtake grassland, grassland will not be available for shrubs to overtake when processing shrub land). See the treatment order file in the example inputs for reference.
Observational spatial data class allocation
This file defines how the land-use and land-cover classes in the OSD will be binned into final land classes for output, which can be defined by the user and serve to place projected land allocation data from GCAM on a common scale with the on-the-ground representation of land-use and land-cover represented in the OSD. Values are to be from 0 or 1, where 1 is used to designate the chosen bin for the land class. Each land class should be binned to only one bin. See the Observed spatial data class allocation file in the example inputs for reference.
Projected GCAM land class allocation
This file defines how the land-use and land-cover classes in the GCAM projected land allocation data will be binned into final classes. Values are to be from 0 or 1, where 1 is used to designate the chosen bin for the land class. Each land class may be binned to multiple bins and the resulting allocation value will be the result of dividing the land evenly among each bin with a 1 designation. See the projected GCAM land class allocation file in the example inputs for reference.
Constraints (not required)
As discussed earlier, constraints such as soil quality may be desirable to the user and can be prepared by assigning a weighted value from –1.0 to 1.0 for each grid cell in the OSD. Spatial maps of constraints should be provided by the user for application during the downscaling process. Users should note that constraining a grid cell to 0.0 may impede the ability to be able to achieve a projected land allocation from GCAM since land area is being excluded that GCAM expects. Each constraint file must have two fields: fid and weight. The fid field should correspond to the fid field in the OSD input and the weight field should be the weight of the constraint per the cell corresponding to the OSD input. Each file should be a CSV with no header.
Configuration file
Demeter’s configuration file allows the user to customize each run and define where file inputs are, and outputs will be. The name of the configuration file can be arbitrary since it can be specified as an argument when running Demeter. The configuration options and hierarchical level are described in Table 3.
Level | Parameter | Description |
---|---|---|
[STRUCTURE] | root_dir | The full path of the root directory where the inputs and outputs directory are stored |
[STRUCTURE] | in_dir | The name of the input directory |
[STRUCTURE] | out_dir | The name of the output directory |
[INPUTS] | allocation_dir | The name of the directory that holds the allocation files |
[INPUTS] | observed_dir | The name of the directory that holds the observed spatial data file |
[INPUTS] | constraints_dir | The name of the directory that holds the constraints files |
[INPUTS] | projected_dir | The name of the directory that holds the GCAM projected land allocation file |
[INPUTS] | ref_dir | The name of the directory that holds the reference files |
[INPUTS] [ALLOCATION] | spatial_allocation | The file name with extension of the observed spatial data class allocation |
[INPUTS] [ALLOCATION] | gcam_allocation | The file name with extension of the projected land class allocation |
[INPUTS] [ALLOCATION] | kernel_allocation | The file name with extension of the kernel density weighting |
[INPUTS] [ALLOCATION] | transition_priority | The file name with extension of the priority allocation |
[INPUTS] [ALLOCATION] | treatment_order | The file name with extension of the treatment order |
[INPUTS] [ALLOCATION] | constraints | The file name with extension of the constraint weighting |
[INPUTS] [OBSERVED] | observed_lu_data | The file name with extension of the observational spatial data |
[INPUTS] [PROJECTED] | projected_lu_data | The file name with extension of the projected land allocation data from GCAM |
[INPUTS] [REFERENCE] | gcam_regnamefile | The file name with extension of the GCAM region name to region id lookup |
[INPUTS] [REFERENCE] | region_coords | A CSV file of GCAM region coordinates for each grid cell |
[INPUTS] [REFERENCE] | country_coords | A CSV file of GCAM country coordinates for each grid cell |
[OUTPUTS] | diag_dir | The name of the directory that diagnostics outputs will be kept |
[OUTPUTS] | log_dir | The name of the directory that the log file outputs will be kept |
[OUTPUTS] | kernel_map_dir | The name of the directory that kernel density map outputs will be kept |
[OUTPUTS] | transition_tabular | The name of the directory that tabular land transition outputs will be kept |
[OUTPUTS] | transition_maps | The name of the directory that land transition map outputs will be kept |
[OUTPUTS] | luc_intense_p1_dir | The name of the directory that the land intensification first pass map outputs will be kept |
[OUTPUTS] | luc_intense_p2_dir | The name of the directory that the land intensification second pass map outputs will be kept |
[OUTPUTS] | luc_expand_dir | The name of the directory that the land expansion map outputs will be kept |
[OUTPUTS] | luc_timestep | The name of the directory that the land use change per time step map outputs will be kept |
[OUTPUTS] | lc_per_step_csv | The name of the directory that the tabular land change per time step outputs will be kept |
[OUTPUTS] | lc_per_step_nc | The name of the directory that the NetCDF land change per time step outputs will be kept |
[OUTPUTS] | lc_per_step_shp | The name of the directory that the Shapefile land change per time step outputs will be kept |
[OUTPUTS] [DIAGNOSTICS] | harm_coeff | The file name with extension of the NumPy array that will hold the harmonization coefficient data |
[OUTPUTS] [DIAGNOSTICS] | intense_pass1_diag | The file name with extension of the CSV that will hold the land allocation per time step per functional type for the first pass of intensification |
[OUTPUTS] [DIAGNOSTICS] | intense_pass2_diag | The file name with extension of the CSV that will hold the land allocation per time step per functional type for the second pass of intensification |
[OUTPUTS] [DIAGNOSTICS] | expansion_diag | The file name with extension of the CSV that will hold the land allocation per time step per functional type for the expansion pass |
[PARAMS] | model | The model name providing the projected land allocation data (e.g., GCAM) |
[PARAMS] | metric | Subregion type (either AEZ or BASIN) |
[PARAMS] | scenario | Scenario name |
[PARAMS] | run_desc | The description of the current run |
[PARAMS] | agg_level | 1 if only by metric, 2 if by region and metric; AEZ is the default |
[PARAMS] | observed_id_field | Observed spatial data unique field name (e.g., fid) |
[PARAMS] | start_year | First time step to process (e.g., 2005) |
[PARAMS] | end_year | Last time step to process (e.g., 2100) |
[PARAMS] | use_constraints | 1 to use constraints, 0 to ignore constraints |
[PARAMS] | spatial_resolution | Spatial resolution of the observed spatial data in decimal degrees (e.g., 0.25) |
[PARAMS] | errortol | Allowable error tolerance in square kilometres for non-accomplished change |
[PARAMS] | timestep | Time step interval (e.g., 5 years) for the output data. This time step is the increment that Demeter will process when starting with the start year. |
[PARAMS] | proj_factor | Factor to multiply the projected land allocation by |
[PARAMS] | diagnostic | 0 to not output diagnostics, 1 to output |
[PARAMS] | intensification_ratio | Ideal fraction of land change that will occur during intensification. The remainder will be through expansion. Value from 0.0 to 1.0. |
[PARAMS] | stochastic_expansion | 0 to not conduct stochastic expansion of grid cells, 1 to conduct |
[PARAMS] | selection_threshold | Threshold above which grid cells are selected to receive expansion for a target functional type from the kernel density filter. Value from 0.0 to 1.0; where 0 lets all land cells receive expansion and 1 only lets only the grid cells with the maximum likelihood expand. |
[PARAMS] | kernel_distance | Radius in grid cells used to build the kernel density convolution filter used during expansion |
[PARAMS] | map_kernels | 0 to not map kernel density, 1 to map |
[PARAMS] | map_luc_pft | 0 to not map land change per land class per time step, 1 to map |
[PARAMS] | map_luc_steps | 0 to not map land change per time step per land class for intensification and expansion, 1 to map |
[PARAMS] | map_transitions | 0 to not map land transitions, 1 to map |
[PARAMS] | target_years_output | Years to save data for; default is ‘all’; otherwise a semicolon delimited string (e.g., 2005; 2020) |
[PARAMS] | save_tabular | Save tabular spatial land cover as a CSV; define tabular units in tabular_units param |
[PARAMS] | tabular_units | Units to output the spatial land cover data in; either ‘sqkm’ or ‘fraction’ |
[PARAMS] | save_transitions | 0 to not write CSV files for each land transitions per land type, 1 to write |
[PARAMS] | save_shapefile | 0 to not write a Shapefile for each time step containing for all functional types, 1 to write; output units will be same as tabular data |
[PARAMS] | save_netcdf_yr | 0 to not write a NetCDF file of land cover percent for each year by grid cell containing each class; 1 to write |
[PARAMS] | save_ascii_max | 0 to not create an ASCII raster of the land class with the maximum area for each grid cell per 1 to write |
[ENSEMBLE] | permutations | If running an ensemble of configurations, this is the number of permutations to process |
[ENSEMBLE] | limits_file | If running an ensemble of configurations, this is the full path to a CSV file containing limits to generate ensembles of certain parameters. |
[ENSEMBLE] | n_jobs | If running an ensemble of configurations, this is the number of CPU’s to spread the parallel processing over. –1 is all, –2 is all but one, 4 is four, etc. |
Figure 5 details the Demeter’s workflow once the input files have been prepared. The process can be outlined as follows:
- Configuration file read and input parameters are validated.
- Input data is read and validated.
- Grid area discrepancies between the OSD and the projected land allocation from GCAM are harmonized by adjusting the GCAM allocation areas using a correction factor (ratio of the OSD land use data per region and metric to the projected area from GCAM).
- Constraints are processed if provided and prepared for integration.
- The convolution filter that will be used to calculate kernel density is prepared. This is applied during each time step.
- Initial and time step specific arrays are prepared.
- An initial pass to intensify existing land is conducted.
- An expansion pass is conducted to apply land allocation projections where the kernel density probabilities meet the user-specified selection threshold.
- A final intensification pass is conducted to allocate any land projections not yet met.
- Output products are created.
Execution
Demeter has two main model level functions available to the user: execute() and ensemble(). The execute function allows the user to run Demeter based on parameters defined in the configuration file and inputs provided. This is the most common use of Demeter. The ensemble function was built to allow the user to test Demeter using random combinations of the transition priorities, treatment order, intensification ratio, selection threshold, and kernel distance. The ensemble section of the configuration file as seen in the example directory gives the user the ability to define the number of permutations that he/she wish to evaluate as well as a limits file that gives the user the ability to set limits and the interval for sampling values of intensification ratio, selection threshold, and kernel distance. The initial state of the configuration file and input files are used to create a template by which every unique combination of the aforementioned parameters can be generated. Random samples are then drawn based on these configuration options. Demeter runs, each of which represent a unique sample of parameter values, are then executed in parallel based on the number of jobs the user has set.
Demeter can be installed as follows:
- This repository uses the Git Large File Storage (LFS) extension (see https://git-lfs.github.com/ for details). First install Git LFS, then initialize it using the git lfs install command.
- Clone Demeter into your desired location git clone https://github.com/IMMM-SFA/demeter.git.
- Install the Python package setuptools (https://pypi.python.org/pypi/setuptools) if the package is not already on your machine.
- From the directory you cloned Demeter into run python setup.py install which will install Demeter as a Python package on your machine and will install the required dependencies. Note: you need admin privileges to run this install.
The procedure to conduct a Demeter run is as follows:
- Setup your configuration file (.ini). Examples are located in the “example” directory. Be sure to change the root directory to the directory that holds your data (use the ‘demeter/example’ directory as an example).
- If running Demeter from an Integrated Development Environment (IDE), please include the path to your config file. A sample script can be found at “demeter/example/example.py”.
- If running Demeter from terminal, please run in the script at demeter/demeter/model.py by passing the full path using forward slashes with no spaces to the configuration file as the first argument and either “standard” or “ensemble” as the second argument (e.g., python model.py/users/<uname>/github/demeter/example/config.ini standard).
Both the output directory and log file generated during runtime are named per the run scenario and the timestamp of when the run began. An additional suffix of the permutation number will also be added if using the ensemble function in Demeter. All other outputs directories are named according to what the user specifies in the configuration file.
Quality control
All possible combinations of the configuration file and input files have been tested. Strict requirements for input files are documented. Configuration options have set limits, values, and types and are validated during runtime. An example setup for Demeter is included in the package. The example has a configuration file and an input directory containing all other necessary files. To setup the provided example, the user should simply change the file paths in the configuration file and the example.py file to represent their local directories. The configuration file contains a root path and a full path with file name to the limits file that must be set when using the ensemble functionality.
Demeter outputs the following diagnostic files that help the user evaluate the outputs based on how they were processed during runtime:
- Harmonization coefficient NumPy array
- This file contains the correction factor that was applied to each grid cell of the projected land allocation data to ensure the projected area is the same as the area actually available as represented by the OSD for the target base year
- Files documenting the land change per year per grid cell from one landclass to its transition class by region and metric
- A detailed log file
- A suite of tabular, spatial, and mapped data
(2) Availability
Operating system
Mac OS X; Linux; Windows 7
Programming language
Python 2.7.13
Additional system requirements
It is recommended that a minimum of 8 GB of memory be available for runs where the spatial resolution of the data is coarser than or equal to 0.05 degrees. Model runs facilitating data with a resolution finer than 0.05 degrees may require additional memory.
Dependencies
- numpy (>= version 1.12.0rc2)
- scipy (>= version 0.18.1)
- pandas (>= version 0.19.2)
- matplotlib (>= version 1.3.1)
- pyshp (>= version 1.2.12)
- joblib (>= version 0.11)
- setuptools (>= version 18.5)
Software location
Archive
Name: GitHub
Persistent identifier:https://doi.org/10.5281/zenodo.1214342
Licence: BSD 2-Clause
Publisher: Chris R Vernon
Version published: v1.0.0
Date published: April 7, 2018
Code repository
Name: GitHub
Identifier:https://github.com/immm-sfa/demeter
Licence: BSD 2-Clause (https://github.com/IMMM-SFA/demeter/blob/master/LICENSE)
Date published: August 30, 2017
Language
English
(3) Reuse potential
Demeter methods and functions contain docstrings that can be called in through Python’s built-in help function. Demeter’s code base is also verbosely commented to encourage community involvement for future development activities. Other integrated assessment model outputs for projected land allocation could also be utilized provided that they can meet the input requirements. Demeter can also be used in other multi-model, multi-scale integrated assessment studies since it can be called from other models.
An example of future research that could be conducted using Demeter could be to provide disaggregated land cover data as input to a potential evapotranspiration component of a global hydrology model. These types of land-water science applications can easily be evaluated using Demeter.
A tutorial on how to extend Demeter to output different formats is available here: https://github.com/IMMM-SFA/demeter/blob/master/docs/extend_outputs.md.