gcamland v1.0 – An R Package for Modelling Land Use and Land Cover Change

gcamland v1.0 is an open source R package that was built to allocate land across a variety of uses based on changes in agricultural yield and commodity price. The land allocation algorithm is based on the one included in the Global Change Assessment Model (GCAM). gcamland v1.0 includes the ability to run in a historical mode, enabling model validation and parameter estimation, or in a future mode, simulating changes in land use/land cover in the future. For both modes, gcamland v1.0 can run a single simulation or a large ensemble of simulations with different parameters. When ensembles are generated in the historical mode, gcamland v1.0 calculates the likelihood of a given parameter set by comparing to observational data. gcamland v1.0 is publicly available via GitHub and has can be adjusted to represent alternative scenarios or configured to different regions and land types.


Introduction
Integrated Assessment Models (IAMs) link together repre sentations of multiple sectors, including energy, water, and land, examining the interactions between these systems [1]. These models typically initialize to a specific historic year and then project key variables into the future in annual to decadal time steps. For example, GCAM [2,3] calculates information on socioeconomics (gross domestic product, population), energy (production, consumption, price, and emissions), agriculture (production, consumption, price, and emissions), land (land use, land cover, emissions), climate (concentrations, forcing, global mean temperature rise), and interactions between these systems. For example, GCAM can examine the effect of changes in the energy system on land allocation [4] or the implications of changes in water availability on crop production [5]. While the fully coupled model can help address many science questions, separating the individual component models (1) reduces computational expense, (2) enables their use in other modelling systems, (3) facilitates ensemble calculations, and (4) facilitates hindcast and parameter estimation efforts. For these reasons, gcamland v1.0 isolates the land allocation mechanism from GCAM [6].
Users can run two scenario types: "Reference" and "Hindcast". In Reference mode, the model is initialized to a recent year (2010) and estimates future land allocation through 2100 as economic conditions evolve. In this mode, users can examine how land use and land cover might evolve in the future as agricultural yields or commodity prices change. In Hindcast mode, gcamland is initialized and run over a historical period. Such simulations allow the user to evaluate model performance by comparing model results to observational data. Users can evaluate model uncertainty by including the ability to build random parameter ensembles. The ensemble function in gcamland v1.0 randomly samples from uniform distributions of key parameters, generating an ensemble with a userspecified number of simulations. Ensembles can be run in either Hindcast mode (e.g., to help with parameter estimation) or Reference mode (e.g., to help understand the implications of uncertainty in parameters on future land use). These simulations can be run in parallel using the doParallel package from R, spreading the simulations over the user specified number of cores.

Implementation and architecture
The gcamland package is written in R, using a suite of readily available libraries. R was chosen to encourage broad use and community involvement as the language is freely available and widely used. R is free, opensource and broadly used by scientists [7].

Scenario Types
The gcamland package has two different scenario types: Reference and Hindcast. The default scenario type is a Reference type, but this can be changed by adjusting the scenario info object. The Reference type generates a future simulation, where land allocation is calculated in the future, with changes induced by changes in price or agricultural productivity. The Hindcast type simulates land allocation in the past, and it can be used to evaluate the model.
For the Reference type, the years included in the historical period are 1975, 1990, 2005, and 2010; the future period spans from 2015 to 2100 in fiveyear increments. In Hindcast mode, the historical period only includes the year 1975; the future period spans from 1976 to 2010 in annual increments. These years can be modified in the constants file; the only requirement is that calibration data is needed for the "history years" and prices are needed for the "future years".
For Reference simulations, the user can adjust agricul tural prices and productivity growth to craft their own scenario (see below). For Hindcast simulations, price and productivity estimates are from the Food and Agriculture Organization.

Required Input
The gcamland package has three types of input: initializa tion data, hindcast data, and scenario data. There are defaults provided for all types of data in the 'inst/extdata' directory. Initialization data defines the structure of the land nest (see Figure 1) and provides calibration data for the historical period. That calibration data includes land allocation, the value of unmanaged land, agricultural production for managed land types, nonland cost of production, and logit exponents that parameterize the degree of competition among land types [6]. Hindcast data includes prices and yields for all agricultural commodities from 1975 to 2010. Scenario data includes prices for all agricultural commodities for years simulated in the Reference scenario type, and productivity growth (i.e., adjustments to future yields) for managed land types in the future period for the Reference scenario type. We anticipate that users will adjust the scenario data to craft a new scenario (e.g., examine alternative future price or yield trajectories). Users can also adjust the logit exponents for certain levels of the nest by adjusting arguments in the scenario info object.

Calculation Steps
Each gcamland simulation has four steps: setup, initial calculation, final calculation, and reporting. These steps are called from the 'run_model' method. The setup step is called once per simulation. This step initializes the land allocator, reading in calibration data (described above) and calculating the required parameters. The calibration portion of the setup step calculates expected profit and land shares in the historical period and then uses that information to calculate a share weight that is used to ensure that modelcalculated land allocations match readin land allocations in the historical period.
The initial and final calculation steps are run for each model year (both history and future). The initial calculation step ensures that expected price, cost, expected yield, and expected profit are initialized for the future period. The final calculation step calculates land shares for each type within each nest based on the expected profit, logit exponent, and share weight. Using these land shares, the amount of land by type or use is also calculated.
The reporting step gathers all outputs and prints this to a file if requested (described below).

Model Output
Model output includes yield, expected yield, price, expec ted price, expected profit, land shares, and land allocation for all land types. The default file format for most outputs is an rds file (a compressed, serialized R object), but the ' export_data' function enables the output from a single scenario to be written as a csv file. There are also optio nal plotting functions that plot land area by type and region/subregion (Figure 2).

Ensemble Mode
The ensemble mode, invoked through the 'run_ensemble' function, allows users to generate an ensemble of parameter sets. Two types of parameters are varied in the ensemble mode: the logit exponents used at different points in the land nest (Figure 1) and the algorithm for forming expectations about yield and price. The parameters used in the samples are drawn quasirandomly from a uniform distribution. These are used to generate input values for the 'run_model' function. When ensembles are run in Hindcast mode, gcamland will calculate the likelihood of a given set of parameters by comparing calculated land allocation to observed land allocation. Such information can be used to estimate a probability density function for each parameter.

Analysis Functions
The package includes a suite of Bayesian analysis functions. These functions are used in conjunction with the ensemble mode discussed above to estimate characteristics of the posterior probability density function (PDF) for the model parameters. A user can specify both the Bayesian prior for the model parameters and the likelihood function used to compare model outputs to historical data. Functions are supplied to compute the posterior probability density (PPD) of the parameters sampled in the ensemble, as well as statistics summarizing the PDF. Currently supported statistics include maximum a posteriori parameters (i.e., the set of parameters with the highest PPD), marginal expectation values, and highestposteriordensity intervals.
The package also includes a function to compute the Widely Applicable Information Criterion (WAIC) [8,9]. This statistic allows a user to estimate the out of sample performance of a model fit to a particular data set. The WAIC is also useful for model comparison, allowing users to estimate a Bayesian odds ratio for model families, based on their expected out of sample performance.

Installing and Running the Package
The package can be installed directly from its github repository using the R devtools package. From an R prompt, run the command: A list of the main methods is provided in Table 1, including the inputs and outputs of each method.

gcamland v1.0 Example
By following the steps to run an individual simulation listed above, gcamland v1.0 will calculate land use and land cover in the USA through 2100 assuming commodity   prices are constant and crop yields increase as specified in [10]. This results in an increase in land allocated to crops (e.g., corn, rice, wheat, etc.), as yield increases drive an increase in profits, and a decline in unmanaged forest and other natural lands (Figure 2). gcamland v1.0 can examine the effects of other yield and price trajectories on land use and land cover. For example, by altering the future prices specified in 'inst/extdata/scenariodata/ AgPrices_Reference.csv', one can explore the effect of changes in commodity prices on future land use and land cover. gcamland v1.0 can also be used to calculate land use and land cover in other regions. For example, by changing the 'DEFAULT.REGION' in the ' constants.R' file, the model can estimate land use and land cover for 31 prespecified regions. Prices and yields can then be adjusted for each of these regions, as described above.

Quality control
The gcamland package includes automated functional and unit testing. Unit testing is run using the R testthat package and currently covers just over 80% of the code by line count. The ensemble capability described above is not amenable to automated testing, due to the size of the minimum meaningful run. This functionality is hand tested whenever changes to that part of the code base occur. The Bayesian inference functions have an additional set of validation tests that are run on a mock dataset generated from a linear model. This model was also analyzed using the rethinking package [11], and the unit tests require results of the gcamland analysis functions to be consistent with the results from the previously published package. Functional testing is provided by the R builtin package checker, which verifies that packages can be installed, loaded, and unloaded cleanly and that the package code and structure conform to established best practices. The gcamland repository uses continuous integration, provided by Travis CI, ensuring that the test suite is run for every pull request. Tests must pass for the pull request to be merged into the repository. Any pull request that modifies production code or data must also be peer reviewed by another member of the development team.

English
(3) Reuse potential gcamland methods and functions use Roxygen to generate documentation via R's builtin help function. The code base is also thoroughly documented to encourage community development and use. Other regional aggregations, land types, and initialization data can be used through modest modifications to the initialization scripts. gcamland can also be used in other coupled model studies since it can be called from other models.