Metis – A Tool to Harmonize and Analyze Multi-Sectoral Data and Linkages at Variable Spatial Scales

Zarrar Khan1, Thomas Wild1,2, Chris Vernon1, Andy Miller3, Mohamad Hejazi1,2, Leon Clarke1,4, Fernando Miralles-Wilhelm1,2, Raul Munoz Castillo5, Fekadu Moreda6, Julia Lacal Bereslawski5, Micaela Suriano7 and Jose Casado7
1 Joint Global Change Research Institute, Pacific Northwest National Laboratory (PNNL), College Park, MD, US
2 Earth System Science Interdisciplinary Center (ESSIC), University of Maryland, College Park, MD, US
3 National Peace Corps Association, Washington, DC, US
4 Center for Global Sustainability, University of Maryland, College Park, MD, US
5 Inter-American Development Bank (IDB), Washington, DC, US
6 Research Triangle Institute (RTI), Research Triangle Park, NC, US
7 Instituto Nacional del Agua (INA), Buenos Aires, AR
Corresponding author: Zarrar Khan (zarrar.khan@pnnl.gov)


Introduction
Important planning and decision making about how to craft policies and allocate scarce financial resources to interventions (e.g., infrastructure) is often conducted at the regional (e.g., large river basin) or sub-regional (e.g., small river basin) scale. This regional planning has commonly been conducted in relative isolation by institutions focused on individual sectors (e.g., water resources management, electricity grid management, etc.) [1][2][3][4]. This approach hinders holistic, cross-sectoral decision making, even when it is understood that a holistic approach would be beneficial.
This traditional planning paradigm is becoming less effective as a result of rapid interconnection among sectors, which exposes individual sectors to a wide array of forces across different spatiotemporal scales (from sub-regional to global). Understanding and modeling the interactions among energy, water, and land systems at regional and sub-regional levels presents substantial modeling and social science challenges [5][6][7].
No modeling platform exists today that can effectively knit sub-regional energy, water, and land systems together at the regional level and connect them to national and global socioeconomic and climatic forces in an internally consistent, computationally efficient, data-efficient, and decision-relevant way. Although multisectoral tools increasingly include representations of multiple sectors in a single analytical platform [8][9][10][11][12][13], this approach has not been widely applied at regional and sub-regional scales [14]. At the same time, a different line of research is attempting to link multiple detailed models together at regional levels [15]. However, these linked-model methods are confined to research activities and have not proven tractable in operational decision-making contexts that require greater simplicity and computational efficiency for stakeholder engagement and exploration of uncertainty.
To address the challenges outlined above, the Metis model has been developed jointly by researchers from multiple institutions including the Pacific Northwest National Laboratory (PNNL), the University of Maryland (UMD), the Inter-American Development Bank (IDB), the Research Triangle Institute (RTI) and the Instituto Nacional del Agua (INA). Metis is a data organization and simulation platform that knits together multiple sectors within a single framework to facilitate analysis across sectors, including energy, water, land and socioeconomics, at any user-defined spatial and temporal scale of interest (e.g., provinces or river basins). Metis is designed to assemble, harmonize, and visualize data from different sectors in a consistent manner, and use that information to infer relationships between the sectors (e.g., how much water is used in the energy sector). These relationships, defined through an input-output matrix [16], can then be used to explore the cross-sector implications (i.e., nexus impacts) of alternative future policies or investments, such as hydropower or irrigation expansion. Similarly to multi-sectoral models, Metis represents all sectors with comparable detail and resolution, rather than focusing attention on representing particular sectors in more detail.
Data scarcity can be a significant constraint in regional and sub-regional planning studies seeking to evaluate multi-sector dynamics. Metis seeks to overcome this barrier by providing users with default data sets describing energy, water, and land supplies and demands for their specific region of interest. As shown on the left-hand side of Figure 1, the default data is built from outputs of the open-source Global Change Assessment Model (GCAM) and the ecosystem of modelling tools designed specifically to interact with GCAM [17]. GCAM is used to capture long-term regional and global energy-water-land dynamics in response to drivers such as socioeconomic change, technological change, climatic change, and policy decisions. The broader GCAM ecosystem produces a globally consistent, downscaled set of gridded energy, water, and land data that Metis uses by default. The downscaling tools enable sub-regional planning issues to be linked to broader national and international dynamics. Metis aggregates the downscaled/gridded data to any spatial boundary, thus offering users an initial assessment of the energy, water, and land supplies and demands in their regions of interest. This initial assessment can then serve as the starting point that is refined over time as and when local data becomes available. Users can overwrite Metis' default data sets at any time with these local data sets. Figure 1 shows that the motivation behind developing Metis is to bridge the gap between sectoral models working at different spatial and temporal resolutions. Thus, as shown on the right-hand side of Figure 1, users can supplement global data sets with available outputs from models that capture specific sub-regional and sectoral details at finer resolution (e.g., water management or electricity planning models). While limited in scope to specific sectors, finer-scale models can nonetheless provide valuable input data to Metis. 
Metis integrates across tools by sharing data in standardized formats.
After aggregating data to relevant spatial and temporal scales, Metis identifies the relationships among sectors using linear input/output methods to establish intensity matrices. The intensity matrices represent the inter-sectoral links throughout the system. These linkages are expressed as intensity coefficients, which describe the commodity flows from one sector (e.g., electricity) that are required to produce a unit of output in another sector (e.g., water). These coefficients, which serve as the basis for calibration of the model, are then used to investigate the impacts of changes in one sector on the others. The sectoral interlinkages that exist in any given Metis application depend upon the data used to populate the model. Examples include water demands for power plant cooling and hydropower; energy demands for water purification, transfers, and distribution; and both energy and water demands related to agricultural production and land-use change.
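The intensity coefficients described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the Metis implementation. It assumes each coefficient is the flow from a supplying sector to a consuming sector divided by the consuming sector's total output, and it computes only direct (first-order) requirements. The function names and all numbers are hypothetical; the flows are chosen so that the "ag" coefficients match the values quoted for Figure 3b.

```python
def intensity_matrix(flows, outputs):
    """Return A where A[src][dst] = flow from src to dst per unit of dst output.

    flows[src][dst] is the commodity flow from sector src used by sector dst;
    outputs[dst] is the total output of sector dst (assumed non-zero).
    """
    sectors = list(outputs)
    return {
        src: {
            dst: flows.get(src, {}).get(dst, 0.0) / outputs[dst]
            for dst in sectors
        }
        for src in sectors
    }


def added_requirements(intensity, sector, extra_output):
    """Direct inputs needed from each sector to produce extra_output more units of sector."""
    return {src: row[sector] * extra_output for src, row in intensity.items()}


# Hypothetical flows and outputs (illustrative units, not Metis data).
flows = {
    "water": {"ag": 97.0, "elec": 20.0},
    "elec": {"ag": 48.0, "water": 10.0},
}
outputs = {"water": 200.0, "elec": 100.0, "ag": 100.0}

A = intensity_matrix(flows, outputs)
extra = added_requirements(A, "ag", 10.0)
print(A["water"]["ag"], A["elec"]["ag"])
print(extra)
```

Under these assumptions, expanding "ag" output by 10 units directly requires about 9.7 additional units of "water" and 4.8 additional units of "elec". Indirect effects (for example, the electricity needed to supply that extra water) are not captured by this first-order calculation.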

Implementation and architecture
Metis is hosted on GitHub at https://github.com/JGCRI/metis, where users can find the latest version of the software and user documentation. Model details and features described in this publication pertain to the latest version of the model as of the date of this publication.
Metis is designed to be accessible to a range of stakeholders with varying expertise and goals related to multisector, multi-scale analyses. Various Metis functions can be used independently for different purposes, including visualization, charting, spatial aggregation, mapping, and inter-sectoral dynamics. Figure 2 summarizes key Metis modules and functions. The following sub-sections walk users through the installation and example implementations of the key Metis functions shown in Figure 2.
[...] links as shown in Figure 3. In Figure 3b, the intensity matrix shows the intensities that link inter-related sectors; for example, the "ag" sector requires 0.97 units of "water" and 0.48 units of "elec" to produce 1 unit of output.

metis.readgcam.R
A key feature of Metis is the ability to interact with different models. The core Metis model comes with the ability to connect directly with GCAM. This connection is made through the metis.readgcam.R function. GCAM produces output in the form of a database, which contains outputs from various scenario runs. metis.readgcam.R uses the R package rgcam to establish a connection with the GCAM database and retrieves data based on "queries" provided in an ".xml" file. An example query file called metisQueries.xml is provided in the directory ./metis/dataFiles/gcam/. Scenario names in the model are often long and not appropriate for final figures, so the function allows scenarios to be renamed as they are read in. Once the data has been extracted from a GCAM database it is saved in a ".proj" file; an example ".proj" file called Example_dataProj.proj is provided in the same directory. Reading data from the GCAM database can take a considerable amount of time depending on the number of scenarios it contains. The metis.readgcam.R function therefore gives the option of providing a ".proj" file that is loaded directly, or of reusing the ".proj" file from a previous run by setting the parameter reReadData to FALSE.
The metis.master.R file provides a step-by-step guide to connecting with GCAM model outputs. GCAM, being a global multi-sectoral human-earth system model, provides users with a large database of outputs that can be difficult to manipulate. The connection with Metis allows users to easily subset relevant parameters of interest, compare them across scenarios and time periods, and visualize the results spatially. This functionality helps planners and stakeholders visualize and assess the regional and sub-regional implications of various policies, climate change, and different socio-economic and technological pathways for multiple resource sectors such as water, energy, and land.

metis.readgcam.R
This function is designed to interact specifically with GCAM outputs. The function processes GCAM outputs into .csv files by GCAM region, which can then be used as inputs to metis.chartsProcess.R.

metis.chart.R
A charting function that allows quick and easy access to features like facets, labels and colors. The function is based on ggplot and returns a ggplot chart.

metis.chartsProcess.R
A charting function used to compare GCAM time series outputs across scenarios and regions. The function also creates diff plots with percentage and absolute differences from a given reference scenario.

metis.map.R
A mapping function used to plot raster and polygon data. The function uses the tmap package and returns a tmap object. Several maps can be combined by overlaying and underlaying using this function. Options allow for different color palettes, labels, and text-size. Visualization features include legend breaks that are freescale, kmeans, or equally divided to highlight different kinds of data.

metis.boundaries.R
Metis mapping function to plot shapefile boundaries and surrounding regions for quick visualization of any region of interest.

metis.grid2poly.R
Function used to crop and aggregate gridded data by a given polygon shapefile. If no grid is provided, the function can still be used to produce regional and sub-regional maps.

metis.mapsProcess.R
Metis mapping function used to compare across scenarios. The function produces diff maps with percentage and absolute differences from a given reference scenario.

metis.prepGrid.R
This function is designed to be used with specific open-source downscaling models (Xanthos [18], Demeter [19], and Tethys [20]) that downscale GCAM data to the grid level. The function takes outputs from these various models and processes them into the format required for providing input to the metis.mapsProcess.R function.

metis.assumptions.R
Contains all conversions and assumptions used in the model.

metis.colors.R
Collection of Metis color palettes. A list of palettes can be viewed in the function help file (?metis.colors). To view a particular palette: metis.colors("pal_hot").

metis.networkOrder.R
Determines the order in which to route water flows through the network of sub-regions, given a user-specified network connectivity matrix.

metis.waterBalance.R
Determines the natural flows from upstream sub-regions to downstream sub-regions via routing.
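The ordering and routing steps described for metis.networkOrder.R and metis.waterBalance.R can be illustrated with a short sketch. It is written in Python for illustration and is not the Metis code. It assumes a connectivity matrix in which entry [i][j] equals 1 when sub-region i drains into sub-region j (this convention is an assumption, not taken from the Metis documentation). The upstream-first order comes from a standard topological sort (Kahn's algorithm). Routing then accumulates natural flow only, with no withdrawals or losses. All names and numbers are hypothetical.

```python
from collections import deque


def network_order(subregions, connectivity):
    """Return sub-regions ordered so that every upstream region precedes its downstream region."""
    n = len(subregions)
    # Number of sub-regions draining directly into each sub-region j.
    upstream_count = [sum(connectivity[i][j] for i in range(n)) for j in range(n)]
    queue = deque(j for j in range(n) if upstream_count[j] == 0)  # headwaters
    order = []
    while queue:
        i = queue.popleft()
        order.append(i)
        for j in range(n):
            if connectivity[i][j]:
                upstream_count[j] -= 1
                if upstream_count[j] == 0:
                    queue.append(j)
    if len(order) != n:
        raise ValueError("connectivity matrix contains a cycle")
    return [subregions[i] for i in order]


def route_flows(subregions, connectivity, local_runoff):
    """Outflow of each sub-region = its local runoff + outflows of its upstream neighbours."""
    n = len(subregions)
    index = {name: k for k, name in enumerate(subregions)}
    outflow = {}
    for name in network_order(subregions, connectivity):
        j = index[name]
        inflow = sum(outflow[subregions[i]] for i in range(n) if connectivity[i][j])
        outflow[name] = local_runoff[name] + inflow
    return outflow


# Hypothetical network: A and B drain into C, and C drains into D.
subregions = ["A", "B", "C", "D"]
connectivity = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
local_runoff = {"A": 10.0, "B": 5.0, "C": 3.0, "D": 1.0}

order = network_order(subregions, connectivity)
outflow = route_flows(subregions, connectivity, local_runoff)
print(order)
print(outflow)
```

Processing sub-regions in this order guarantees that the outflow of every upstream sub-region is known before it is needed as inflow downstream.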

metis.chartsProcess.R
After running metis.chartsProcess.R, several charts are produced. These are saved in separate directories for each region; within each directory Metis creates a sub-directory for each scenario as well as sub-directories for cross-scenario comparisons. In addition to the individual regional directories, a directory called compareRegions contains inter-regional comparisons for each scenario as well as a cross-region/cross-scenario comparison. Tables with the data used for each figure are also provided in the corresponding folders. metis.master.R uses the example data from the default GCAM database to show how different parameters can be compared across scenarios and regions using metis.chartsProcess.R. Some example figures are shown in Figure 4.

metis.boundaries.R
[...] for "Maps" is created. The Maps directory will contain a "Boundary" directory which contains the boundary files defining each region and sub-region. The function also shows how different grid cell sizes compare with the selected regions. This is useful for understanding whether the desired regional resolution is too fine for a particular grid size. Figure 5 shows example outputs of the function used to visualize different boundaries and how they overlap or intersect with other shapes and regions.

metis.grid2poly.R
metis.grid2poly.R is used to aggregate gridded data to user-specified sub-regional spatial scales. metis.grid2poly.R accepts gridded data in the form of a table with latitude and longitude coordinates and a value for each point. Depending on the type of gridded data (volume or depth), the aggregation is based on the part of the polygon that intersects with the grid cells, as shown in Figure 6. "Volume" here refers to data such as the volume of runoff or the number of people in a grid cell. For this type of data, the aggregate value for the polygon is calculated as the sum of the values in each grid cell, weighted by the percentage of that grid cell's area that overlaps with the polygon, as shown in Method 1 in Figure 6. "Depth", on the other hand, refers to data such as mm of water or water scarcity calculated as a ratio. For this type of data, the aggregate value for the polygon is the mean of the data weighted by the part of the polygon's area that lies within each grid cell, as shown in Method 2 in Figure 6. Outputs from metis.grid2poly.R include tables with the original data by grid cell as well as new tables with data aggregated by the polygons from the shapefile provided.
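The two aggregation methods can be written out as a short sketch. It is in Python for illustration and is not the metis.grid2poly.R implementation. It assumes the overlap area between each grid cell and the polygon has already been computed; in practice that is a geometric intersection. The dictionary keys, function names, and numbers are hypothetical.

```python
def aggregate_volume(cells):
    """Method 1 ("volume" data): sum of cell values, each scaled by the
    fraction of the cell's area that overlaps the polygon."""
    return sum(c["value"] * c["overlap_area"] / c["cell_area"] for c in cells)


def aggregate_depth(cells):
    """Method 2 ("depth" data): mean of cell values, weighted by the
    polygon area that falls within each cell."""
    total_overlap = sum(c["overlap_area"] for c in cells)
    if total_overlap == 0:
        raise ValueError("polygon does not overlap any grid cell")
    return sum(c["value"] * c["overlap_area"] for c in cells) / total_overlap


# Hypothetical polygon covering one full cell, half of a second, a quarter of a third.
cells = [
    {"value": 100.0, "cell_area": 4.0, "overlap_area": 4.0},
    {"value": 60.0, "cell_area": 4.0, "overlap_area": 2.0},
    {"value": 20.0, "cell_area": 4.0, "overlap_area": 1.0},
]

volume_total = aggregate_volume(cells)  # 100*1 + 60*0.5 + 20*0.25
depth_mean = aggregate_depth(cells)     # (100*4 + 60*2 + 20*1) / 7
print(volume_total, depth_mean)
```

The distinction matters because extensive quantities (runoff volume, population) should add up across cells, whereas intensive quantities (mm of water, scarcity ratios) should be averaged; summing a ratio across cells would overstate it.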

metis.mapsProcess.R
metis.mapsProcess.R takes gridded and/or polygon data (which can be prepared using the suite of Metis functions defined previously) and produces maps, as shown in the example in Figure 7. In addition, difference maps showing the absolute and percentage difference between a selected reference scenario and all other scenarios can also be produced. Each map is produced with three kinds of legends that allow the user to visualize spatial data in different ways: Freescale, Kmeans, and equal breaks. The color schemes for the plots are defined in metis.colors.R and can be adjusted by advanced users. Animations that show a map changing over time are also created for each type of map.
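The difference calculation behind the diff maps (and the diff plots from metis.chartsProcess.R) can be sketched as follows. It is written in Python for illustration and is not the Metis code. It assumes the percentage difference is expressed relative to the reference scenario's value and is left undefined where that value is zero. The scenario names, region names, and numbers are hypothetical.

```python
def scenario_diffs(values, reference):
    """Absolute and percentage differences of every scenario from a reference scenario.

    values[scenario][region] holds one number per region (or grid cell).
    The percentage difference is None where the reference value is zero.
    """
    ref = values[reference]
    diffs = {}
    for scenario, data in values.items():
        if scenario == reference:
            continue
        diffs[scenario] = {}
        for region, value in data.items():
            absolute = value - ref[region]
            percent = absolute / ref[region] * 100.0 if ref[region] != 0 else None
            diffs[scenario][region] = {"absolute": absolute, "percent": percent}
    return diffs


# Hypothetical water demand by basin under two scenarios.
values = {
    "Ref": {"basinA": 50.0, "basinB": 80.0, "basinC": 0.0},
    "Policy": {"basinA": 60.0, "basinB": 72.0, "basinC": 5.0},
}

diffs = scenario_diffs(values, reference="Ref")
print(diffs["Policy"])
```

Reporting both measures is useful because a large percentage change can correspond to a small absolute change where the reference value is low, and vice versa.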

Using Metis with multiple models and data products
Metis has been designed to be modular and easily usable with outputs from any model or data product once the data has been processed into simple standardized Metis formats. A summary of the standard formats is provided below. The function metis.chartsProcess.R can take as its input any ".csv" table or "R" table and produce all the charts described in the section "metis.chartsProcess.R" above to compare data across parameters, regions and scenarios. Any number of tables can be provided to the function as long as they contain the headers shown in the blank template below (Table 2). This simple table can include data from any other model, or raw data collected from experts or local stakeholders.
Similarly, the function metis.mapsProcess.R is set up to accept tables for both polygons and rasters (gridded data with latitude and longitude coordinates), as described in the section "metis.mapsProcess.R". These tables need to be in the format shown in the example blank templates (Table 3 and Table 4). The polygon table must be accompanied by a shapefile that contains the region and sub-regions corresponding to the names listed in the table. As explained in the section "metis.boundaries.R", Metis comes pre-loaded with global maps with national, state, provincial, and river basin boundary demarcations. For raster data (Table 4), sub-region names should be replaced by latitude and longitude values. The function can be pointed to multiple tables and data sets as they are collected, and Metis will produce corresponding visualizations to compare across scenarios, parameters, and time periods.
A key motivation behind allowing Metis functions to work with simple .csv tables is accessibility to a variety of stakeholders who may not be programmers or scientists. The templates presented in Tables 2, 3 and 4 can be easily understood and filled in by anyone using basic spreadsheet software. Data from various stakeholders can then easily be compared with outputs from other, more detailed models. The accessibility of the input data in this simplified format, accompanied by corresponding graphics from Metis, facilitates consistent stakeholder engagement as data products and modeling scenarios evolve over the course of a project. Furthermore, the tables and data can be progressively filtered to home in on key parameters and nexus challenges that are relevant to stakeholders and local conditions.

Extension and Modification of Software
As documented in this paper, Metis is freely available open-source software and remains under constant development.
Functions are designed to be highly modular and improvements and ideas for enhancements are welcome through the GitHub platform. Independent programmers and institutions are welcome (and encouraged) to fork versions of Metis and develop additional functionalities as needed. As described in the section "Using Metis with multiple models and data products", Metis is designed to work with data tables in a simple standardized format. The function metis.readgcam.R is an example of a function designed to post-process data from outputs of the GCAM model. Users, working with other multi-sector