(1) Overview

Introduction

Integrated Assessment Models (IAMs) link together representations of multiple sectors, including energy, water, and land, examining the interactions between these systems []. These models typically initialize to a specific historic year and then project key variables into the future in annual to decadal time steps. For example, GCAM [, ] calculates information on socioeconomics (gross domestic product, population), energy (production, consumption, price, and emissions), agriculture (production, consumption, price, and emissions), land (land use, land cover, emissions), climate (concentrations, forcing, global mean temperature rise), and interactions between these systems. For example, GCAM can examine the effect of changes in the energy system on land allocation [] or the implications of changes in water availability on crop production []. While the fully coupled model can help address many science questions, separating the individual component models (1) reduces computational expense, (2) enables their use in other modelling systems, (3) facilitates ensemble calculations, and (4) facilitates hindcast and parameter estimation efforts. For these reasons, gcamland v1.0 isolates the land allocation mechanism from GCAM [].

Users can run two scenario types: “Reference” and “Hindcast”. In Reference mode, the model is initialized to a recent year (2010) and estimates future land allocation through 2100 as economic conditions evolve. In this mode, users can examine how land use and land cover might evolve in the future as agricultural yields or commodity prices change. In Hindcast mode, gcamland is initialized and run over a historical period. Such simulations allow the user to evaluate model performance by comparing model results to observational data.

Users can evaluate model uncertainty by including the ability to build random parameter ensembles. The ensemble function in gcamland v1.0 randomly samples from uniform distributions of key parameters, generating an ensemble with a user-specified number of simulations. Ensembles can be run in either Hindcast mode (e.g., to help with parameter estimation) or Reference mode (e.g., to help understand the implications of uncertainty in parameters on future land use). These simulations can be run in parallel using the doParallel package from R, spreading the simulations over the user specified number of cores.

Implementation and architecture

The gcamland package is written in R, using a suite of readily available libraries. R was chosen to encourage broad use and community involvement as the language is freely available and widely used. R is free, open-source and broadly used by scientists [].

Scenario Types

The gcamland package has two different scenario types: Reference and Hindcast. The default scenario type is a Reference type, but this can be changed by adjusting the scenario info object. The Reference type generates a future simulation, where land allocation is calculated in the future, with changes induced by changes in price or agricultural productivity. The Hindcast type simulates land allocation in the past, and it can be used to evaluate the model.

For the Reference type, the years included in the historical period are 1975, 1990, 2005, and 2010; the future period spans from 2015 to 2100 in five-year increments. In Hindcast mode, the historical period only includes the year 1975; the future period spans from 1976 to 2010 in annual increments. These years can be modified in the constants file; the only requirement is that calibration data is needed for the “history years” and prices are needed for the “future years”.

For Reference simulations, the user can adjust agricultural prices and productivity growth to craft their own scenario (see below). For Hindcast simulations, price and productivity estimates are from the Food and Agriculture Organization.

Required Input

The gcamland package has three types of input: initialization data, hindcast data, and scenario data. There are defaults provided for all types of data in the ‘inst/extdata’ directory. Initialization data defines the structure of the land nest (see Figure 1) and provides calibration data for the historical period. That calibration data includes land allocation, the value of unmanaged land, agricultural production for managed land types, non-land cost of production, and logit exponents that parameterize the degree of competition among land types []. Hindcast data includes prices and yields for all agricultural commodities from 1975 to 2010. Scenario data includes prices for all agricultural commodities for years simulated in the Reference scenario type, and productivity growth (i.e., adjustments to future yields) for managed land types in the future period for the Reference scenario type. We anticipate that users will adjust the scenario data to craft a new scenario (e.g., examine alternative future price or yield trajectories). Users can also adjust the logit exponents for certain levels of the nest by adjusting arguments in the scenario info object.

Figure 1 

Default nesting structure used in gcamland.

Calculation Steps

Each gcamland simulation has four steps: setup, initial calculation, final calculation, and reporting. These steps are called from the ‘run_model’ method. The setup step is called once per simulation. This step initializes the land allocator, reading in calibration data (described above) and calculating the required parameters. The calibration portion of the setup step calculates expected profit and land shares in the historical period and then uses that information to calculate a share weight that is used to ensure that model-calculated land allocations match read-in land allocations in the historical period.

The initial and final calculation steps are run for each model year (both history and future). The initial calculation step ensures that expected price, cost, expected yield, and expected profit are initialized for the future period. The final calculation step calculates land shares for each type within each nest based on the expected profit, logit exponent, and share weight. Using these land shares, the amount of land by type or use is also calculated.

The reporting step gathers all outputs and prints this to a file if requested (described below).

Model Output

Model output includes yield, expected yield, price, expected price, expected profit, land shares, and land allocation for all land types. The default file format for most outputs is an rds file (a compressed, serialized R object), but the ‘export_data’ function enables the output from a single scenario to be written as a csv file. There are also optional plotting functions that plot land area by type and region/subregion (Figure 2).

Figure 2 

Sample land allocation by use and type in the USA for a Reference scenario.

Ensemble Mode

The ensemble mode, invoked through the ‘run_ensemble’ function, allows users to generate an ensemble of parameter sets. Two types of parameters are varied in the ensemble mode: the logit exponents used at different points in the land nest (Figure 1) and the algorithm for forming expectations about yield and price. The parameters used in the samples are drawn quasi-randomly from a uniform distribution. These are used to generate input values for the ‘run_model’ function. When ensembles are run in Hindcast mode, gcamland will calculate the likelihood of a given set of parameters by comparing calculated land allocation to observed land allocation. Such information can be used to estimate a probability density function for each parameter.

Analysis Functions

The package includes a suite of Bayesian analysis functions. These functions are used in conjunction with the ensemble mode discussed above to estimate characteristics of the posterior probability density function (PDF) for the model parameters. A user can specify both the Bayesian prior for the model parameters and the likelihood function used to compare model outputs to historical data. Functions are supplied to compute the posterior probability density (PPD) of the parameters sampled in the ensemble, as well as statistics summarizing the PDF. Currently supported statistics include maximum a posteriori parameters (i.e., the set of parameters with the highest PPD), marginal expectation values, and highest-posterior-density intervals.

The package also includes a function to compute the Widely Applicable Information Criterion (WAIC) [, ]. This statistic allows a user to estimate the out of sample performance of a model fit to a particular data set. The WAIC is also useful for model comparison, allowing users to estimate a Bayesian odds ratio for model families, based on their expected out of sample performance.

Installing and Running the Package

The package can be installed directly from its github repository using the R devtools package. From an R prompt, run the command:

devtools::install_github(‘jgcri/gcamland’)

The steps to run an individual gcamland simulation are:

  1. (Optional) Attach the gcamland namespace: library (‘gcamland’)
  2. (Optional) Adjust the scenario information object
  3. Run a simulation run_model (SCENARIO.INFO, aVerbose=TRUE)
  4. (Optional) Export the output to csv export_results (SCENARIO.INFO)
  5. (Optional) Plot the results print(plotRegionalLandAllocation (SCENARIO.INFO))

The steps to run an ensemble of gcamland simulations are:

  1. (Optional) Attach the gcamland namespace: library (‘gcamland’)
  2. Run an ensemble of simulations run_ensemble (N, aOutputDir = MY.DIRECTORY, atype= “Hindcast”)

A list of the main methods is provided in Table 1, including the inputs and outputs of each method.

Table 1

Table of the main methods in the gcamland v1.0 package, including their inputs and outputs. Note that only methods that can be called directly are listed in the table.

MethodDescriptionInputsOutput

NameDescription

run_modelCalculates land use and land cover for a given region and set of time periodsaScenarioInfoScenario-related information, including names, logits, expectations.Table of model results
aPeriodsInteger vector of periods to run. Default is all periods defined for the scenario type.
aVerboseIf TRUE, output additional debugging information.
run_ensembleGenerates a suite of different scenarios and calculates land use and land cover for each (by calling run_model)NNumber of parameter sets to selectList of ScenarioInfo objects for the ensemble members
aOutputDirDirectory where outputs are saved
skipNumber of iterations to skip (i.e., if building on another run.)
atypeScenario type: either “Reference” or “Hindcast”
logparallelName of directory to use for parallel workers’ log files. If NULL, then don’t write log files.
export_resultsSaves results from a specified scenario as a csv fileaScenarioInfoScenario-related information, including names, logits, expectations.csv file with model results
plotLandAllocationPlots allocation over time by land type, with subregional detailaScenarioInfoScenario-related information, including names, logits, expectations.ggplot plot
plotRegionalLand AllocationPlots allocation over time by land type, aggregated to region (as in Figure 2)aScenarioInfoScenario-related information, including names, logits, expectations.ggplot plot
ScenarioInfoCreates a scenario information object that can be used in the functions aboveaExpectationTypeString indicating whether to use “Perfect”, “Lagged”, or “Linear” expectationsScenarioInfo object
aLaggedShareOldWeight to put on older information if “Lagged” expectations are used
aLinearYearsNumber of years to use in extrapolation if “Linear” expectations are used
aLogitUseDefaultBoolean indicating whether to use default logits
aLogitAgroForestAgroForest logit exponent (assuming mLogitUseDefault == FALSE)
aLogitAgroForest_NonPastureAgroForest_NonPasture logit exponent (assuming mLogitUseDefault == FALSE)
aLogitCroplandCropland logit exponent (assuming mLogitUseDefault == FALSE)
aScenarioTypeType of scenario to run: either “Reference” or “Hindcast”
aScenarioNameComplete scenario name, with expectations and logit information
aFileNameFile name
aOutputDirOutput directory
aSerialNumSerial number for a run that is part of a series.
aRegionRegion to use in the calculation. Right now we only run a single region at a time.

gcamland v1.0 Example

By following the steps to run an individual simulation listed above, gcamland v1.0 will calculate land use and land cover in the USA through 2100 assuming commodity prices are constant and crop yields increase as specified in []. This results in an increase in land allocated to crops (e.g., corn, rice, wheat, etc.), as yield increases drive an increase in profits, and a decline in unmanaged forest and other natural lands (Figure 2). gcamland v1.0 can examine the effects of other yield and price trajectories on land use and land cover. For example, by altering the future prices specified in ‘inst/extdata/scenario-data/AgPrices_Reference.csv’, one can explore the effect of changes in commodity prices on future land use and land cover. gcamland v1.0 can also be used to calculate land use and land cover in other regions. For example, by changing the ‘DEFAULT.REGION’ in the ‘constants.R’ file, the model can estimate land use and land cover for 31 pre-specified regions. Prices and yields can then be adjusted for each of these regions, as described above.

Quality control

The gcamland package includes automated functional and unit testing. Unit testing is run using the R testthat package and currently covers just over 80% of the code by line count. The ensemble capability described above is not amenable to automated testing, due to the size of the minimum meaningful run. This functionality is hand-tested whenever changes to that part of the code base occur. The Bayesian inference functions have an additional set of validation tests that are run on a mock dataset generated from a linear model. This model was also analyzed using the rethinking package [], and the unit tests require results of the gcamland analysis functions to be consistent with the results from the previously published package.

Functional testing is provided by the R built-in package checker, which verifies that packages can be installed, loaded, and unloaded cleanly and that the package code and structure conform to established best practices. The gcamland repository uses continuous integration, provided by Travis CI, ensuring that the test suite is run for every pull request. Tests must pass for the pull request to be merged into the repository. Any pull request that modifies production code or data must also be peer reviewed by another member of the development team.

(2) Availability

Operating system

Mac OS Sierra; Windows 7; Unix

Programming language

R (version 3.3 or greater)

Additional system requirements

N/A

Dependencies

  • assertthat (>=0.2),
  • dplyr (>=0.4.3),
  • tibble (>=1.1),
  • tidyr (>=0.6.0),
  • readr (>=1.0.0),
  • ggplot2 (>=2.2.1),
  • igraph (>=1.0.1),
  • randtoolbox (>=1.17),
  • doParallel (>=1.0),
  • foreach (>=1.4)

Software location

Archive

Name: Zenodo

Persistent identifier:http://doi.org/10.5281/zenodo.1264706

Licence: GPL2

Publisher: Katherine V. Calvin

Version published: v1.0.0

Date published: 04/06/18

Code repository

Name: GitHub

Identifier:https://github.com/JGCRI/gcamland

Licence: GPL2 (https://github.com/JGCRI/gcamland/blob/master/LICENSE)

Date published: dd/mm/yy

Language

English

(3) Reuse potential

gcamland methods and functions use Roxygen to generate documentation via R’s built-in help function. The code base is also thoroughly documented to encourage community development and use. Other regional aggregations, land types, and initialization data can be used through modest modifications to the initialization scripts. gcamland can also be used in other coupled model studies since it can be called from other models.

A user guide on how to use and modify gcamland is available at: https://github.com/JGCRI/gcamland/wiki.

An issue tracker is available at: https://github.com/jgcri/gcamland/issues.