(1) Overview

Introduction

Spatial models are needed to enable numerical problems to be solved in a broad range of scientific applications. Representation of data and modelled properties can be discretizised to a grid. Each cell in the grid can contain a value from measurements or a modelled value, at a position defined in space and time. Cells can also be assigned a value by interpolation of nearby data points or by assumptions. The location of each grid cell is specified along the dimensions by index number or coordinates from e.g. a geographic coordinate system. Grids that represent part of Earth must also be associated with a geodetic datum for reference to the physical world. Cells in a regular grid represent the shape of parallelepipeds, and can be rectilinear or Cartesian. The latter is the special case where the cells are unit squares, or unit cubes. Some data, e.g. surface elevation, can be expressed in only two dimensions. Other parameters can vary in all spatial directions, and time, and need to be represented in a multidimensional grid. The cell size limits the resolution of the model, smaller cells can represent higher frequencies, but a denser and larger grid add exponentially to the computing cost []. To populate a grid model, data are generally imported from different sources and in various formats. Images and continuous data are often available as regular raster files, while some observations are provided as points in an irregular grid, or vector data as polygons and lines. Spatial data are published in different data, projections and coordinate systems. Given this variety of formats and reference conventions, it is inevitable that combining data from different sources often presents a challenge.

The computational framework

We share agrid, a framework to produce a regular grid for multidimensional and multivariate spatial modelling, processing and analysis. The extended functionality of the grid addresses many of the challenges in working with spatial 2D and 3D data noted above. Following the principles of Wilson et al. (2014) [], the code is written in highest possible language level and made readable and intelligible. We use the general-purpose programming language Python 3. Python is equipped with libraries for fast array operations [, ], basic statistics [], signal processing and other scientific tools [], machine learning [], visualisation [, ] and discipline specific libraries for e.g. seismology [, ], astronomy [] and GIS [, , ]. Python also provides interfaces for other languages as R, C and Fortran. All those tools and packages can be reached from the open structure of agrid (Figure 1).

Figure 1 

Components of agrid: accessory methods, the class Grid() and example-specific code (feature methods). A class object (brown) contains functions for e.g. import and export. It also contains the xarray dataset (gray) and attributes. Various data formats (left) are converted to numpy arrays and incorporated as data arrays in an xarray dataset. Each data array can be associated to coordinates. The dataset also contains metadata (green). Data can be exported or visualized (right). Accessory methods include a download function to link the Grid() class directly to the data source if required, e.g. for dynamic updating. A few example-specific methods are also distributed together with the module [, , ].

A few related open-source projects provide useful code for the Earth Sciences community; GemPy [] is a package that facilitates stochastic geomodeling and probabilistic programming. The package uses the linear algebra compiler Theano [] for efficient computation. Another related project is Verde [] and the Fatiando tool box, which contains advanced methods for e.g. interpolation. There are also examples of successful projects that connect various data sources with users. Quantarctica [] makes Antarctic datasets from various sources easily accessible in a Geographical Information System (GIS) application, QGIS []. However, even with some 3D functionality in recent upgrades, GIS is predominantly a 2D frame. Another related project is the multidimensional DataCube [, ]. DataCube pre-processes and presents remote sensing geographical and geophysical attributes for researchers and the broader public. DataCube is mainly targeted for changes (e.g. in Landsat raster data) over time, but has a broad range of possible applications. In comparison, agrid is relatively light, easy to modify, and the dependencies are kept to a minimum. Data held in the agrid environment are not regarded only as a set of values: each observation can include quantified uncertainty, probability or likelihood, and data can also be associated with metadata for provenance. It is advantageous that cells of a grid model can be populated with such allied information, together with the dataset.

agrid was initially developed for studies of the Antarctic lithosphere [, ], and pre-processing of geophysical data for visualisation purposes [], but with updates as presented here, it can be used in any discipline, geographical region, projection, dimensionality and any resolution. This initial release of the code is presented with tutorial notebooks that demonstrate its usage. The examples given in this paper can be reproduced from the provided SConstruct script [, , ].

Subsequent versions of agrid will include additional functionality. We plan, e.g., additional methods for conversion and improved visualisation, support hexagonal 2D grids, curvilinear grid and increased polar and spherical functionality. We hope that colleagues will find this contribution useful, and hopefully encourage scientists to share code and publish reproducible studies.

Implementation and architecture

agrid is structured as a Python module that imports dependencies and defines an agrid class object, Grid(), when imported. When calling Grid(), an object is created that represents the spatial extent of the model space. The grid is initiated with projection, extent and resolution. When the instance of the agrid class object is created, an xarray dataset is defined with dimensions and populated with coordinates. Dimensions includes, but are not limited to, space (X, Y, Z), time (t) and frequency bands (e.g. RGB). Models might also include probability or likelihood. Extent is defined as left, right, up and down, and refers to the rectangular map view. Predefined coordinates are the default units for the projection, e.g. x and y in metres, and degrees in WGS 1984, EPSG:4326. At setup, there is an option of the grid can represent both the corners or the centre points of each cell. The default settings gives a coarse global grid of WGS84 (EPSG:4326), with a resolution of 1° ≈ 111.1 km.

agrid facilitates access to array operations in the spatial domains, as projected grid cells. The data is stored as data arrays in an xarray dataset [, ]. xarray is built on numpy [, ] and pandas [], and provides high level functions for labelled multidimensional datasets. xarray has a structure similar to netCDF file format [] and netCDF is also used as the native format to store grids. By using dask arrays, only the data used is loaded into memory in chunks []. dask also facilitates some parallel computing. Grid cells can be selected with the advanced indexing methods in xarray by geographical coordinates as well as index numbers in the grid.

Additional coordinates with different resolution can be created and added to the object at any point. Computations with data grids of different resolution are performed by generating vectors from chunks of the larger array so that the resulting grid sizes are identical. The vectors are unfolded back to the higher resolution grid after the computation. By using this approach, fast numpy operations can be applied on arrays of different shapes and size and there is no need to over-sample low resolution data.

In a research project, agrid can point directly to original data sources. This simplifies the workflow, as development can be done in low resolution or small extent, but larger grids can be used when required and data-sets can easily be swapped. Pre-processing and visualisation can be moved from third part software or stand-alone applications to a condensed workflow (Figure 1 and Listings 1 and 2). This provides overview and facilitates reproducibility and flexibility for the researcher [].

from agrid.grid import Grid
from agrid.acc import download
km = 1000
 
# Initiate a class object and set resolution and extent of model: 
ant = Grid(res = [10*km, 10*km],
        crs = 3031,
        depths = [0*km, 10*km, 20*km, 50*km, 100*km],
        left = -3100*km,
        up = 3100*km,
        right = 3100*km,
        down = -3100*km)
 
# Download and import: 
bedmap_url = 'https://link/to/bedmap2_tiff.zip'
bedmap_path = 'data/bedmap2'
download(bedmap_url, bedmap_path + '.zip')
 
GSFC_url = 'http://link/to//GSFC_DrainageSystems'
GSFC_files = 'data/GSFC_DrainageSystems'
for shape_ext in ['.shp','.shx','.prj', '.dbf', '.qix']:
    download(GSFC_url + shape_ext, GSFC_files + shape_ext)
 
# Bulk import grid files from directory: 
seis_url = 'http://link/to/AN1-S_depth_grd.tar.gz'
seis_path = 'data/an/'
download(seis_url, seis_path, bulk=True,
          meta_dict = {'Model':'AN1-S', 'DOI': '10.1002/2014JB011332'})
 
# Import raster files 
for data_set, label in zip(['thickness', 'bed'], ['ICE', 'DEM']):
    ant.ds[label] = (('Y', 'X'),
        ant.read_raster('%s /bedmap2_%s . tif' %(bedmap path , data_set),
            no_data = 32767.))
 
# Import polygons, here the attribute 'ID' is used to define segments. 
ant.ds['DRAINAGE'] = (('Y', 'X'), ant.assign_shape(GSFC_file + '.shp','ID'))
 
# Import grid files to 3D data array. 
# Keyword 'bulk' imports all files in directory
ant.ds['AN1-S'] = (('Y', 'X', 'Z'), ant.read_grid('../local/an/', bulk=True))

Listing 1: Initiation of a grid object, defining extent and projection for Antarctica, in this example. The code downloads and assigns Bedmap [], Antarctic drainage systems, GSFC [] and wave speed from 3D seismic tomography [] to the grid.

# Select a few polygons: 
ant.ds['SEL_ICE'] = ant.ds['ICE']*ant.ds['DRAINAGE'].isin(list(range(0, 53//2)))
 
# Make some 3D data, using e.g. Python or numpy functions
ant.ds['RANDOM'] = (('Y', 'X', 'Z'), np.random.rand(*ant.shape3))
 
# Make maps: 
# Fig. 2a
ant.map_grid('DRAINAGE',
    cmap='RdBu',
    save_name= 'fig/drainage.pdf')
 
# Fig. 2b
ant.map_grid('SEL_ICE',
    cmap = 'viridis',
    save_name = 'fig/selected.pdf')
 
# Fig. 2c
ant.layer_cake('AN1-S',
    cmap = 'BrBG_r',
    save_name = 'fig/layers.pdf')
 
# Fig. 2d
ant.oblique_view('DEM',
    vmin = 0, vmax = 4200,
    cmap = 'bone',
    azimuth = 180, roll = -90,
    save_name = 'fig/oblique_view.pdf')
 
# Analyse: 
# Calculate the volume of the ice in selected segments. 
volume = int(ant.ds[‘SEL_ICE’].sum()*np.prod(ant.res)/km**3)
 
# Export as geoTiff: 
ant.grid_to_raster('SEL_ICE','selected_ice.tif')

Listing 2: Visualization, analyse and export. The code generates all figures in Figure 2.

Example of grid generation and data import

Code in Listing 1 generates a frame of Antarctica, using WGS 84/Antarctic Polar Stereographic projection and a lateral cell size of 10 km × 10 km. The Extent is defined in the default unit of the projection. Coordinate reference system (CRS), is given as an integer and therefore interpreted as an EPSG code. For this example, the 2D grid is Cartesian and quadratic, but the depths slices are defined by the list depths. Due to the convention of indexing arrays as row – column and geographical coordinates as lat – lon, grid coordinates are also given as Y – X for consistency.

The instance of Grid() class contains a number of functions to import data of different types, visualisation and export (Figure 1). Raster data, e.g. GeoTiff, can be imported with a method using rasterio [] and the underlying gdal []. Rasters are warped to fit the extent, resolution and projection of the grid. An imported raster is shown in Figure 2b. Vector data are imported with fiona [] and geopandas [] with options for rasterization of attribute data and interpolation. Grids or data points can be read from a number of formats and interpolated. A rasterized polygon dataset is shown in Figure 2a and is also used to crop and select data in Figure 2c–d.

Figure 2 

Data input and visualisation examples generated by code Listings 1 and 2. (a) Vector polygon data (drainage systems []). (b) Subset of raster data (ice thickness []) Polygon vector data [] is used to select a part of the continuous raster. (c) 3D layered plot of seismic data []. (d) Example of 3D rendering. Supplied tutorials and SCons script contain further details. The code may be used for any geographic area, at any scale.

Example of visualization and data export

The class also contains functions for visualisation using matplotlib [] and Cartopy [] (Figure 2a–c). Map views with e.g. coast lines and coordinates can be produced directly by agrid. Mayavi [] and the underlying VTK [] are used for 3D visualisation (Figure 2d). Data can be exported as netCDF, GeoTiff or ASCII formats. JSON format is used to import metadata and export model parameters.

Quality control

The module is published with a number of tutorials to demonstrate the functionality with different data sources, scales and extent. Known limitations exist in the visualization methods for less common projections and some warnings are not handled smoothly. Error handling mainly relies on used dependencies with only limited functionality in agrid itself. Development errors have been ruled out by comparing results from other GIS applications. 2D data that have been imported, processed and exported, have been compared to similar processing in the GIS applications QGIS. Those test cases and additional test code are also available from the project’s github repository []. The updated issue tracker is likewise available at github.

(2) Availability

Operating system

The code is developed and tested in Ubuntu 16.04, 18.04 and macOS High Sierra 10.13.6. It has also been tested on Windows 10.

Programming language

Python >= 3.6 (tested on Python 3.6 and Python 3.7).

Additional system requirements

Very low requirements for basic use, but can be scaled up for larger grids. The use of dask arrays relax the need for large RAM.

Dependencies

The class depends on a number of Python packages that can all be installed by package managers, e.g. pip3 or conda: Minimum dependencies: cartopy geopandas matplotlib json numpy pyproj rasterio scipy xarray

Additional dependencies used and imported only by some methods: datetime fiona imageio mayavi requests shapely tarfile tqdm zipfile.

List of contributors

Tobias Stål, Anya M. Reading

Software location

Name: agrid

Persistent identifier:https://doi.org/10.5281/zenodo.2553965

Licence: MIT License

Publisher: Tobias Stål

Version published: 0.4.0

Date published: January 25, 2020

Code repository

Name: GitHub

Persistent identifier:https://github.com/TobbeTripitaka/agrid.git

Licence: MIT License

Date published: January 25, 2020

Language

agrid was developed in English.

(3) Reuse potential

agrid is deliberately developed for reuse in a broad range of applications. The code is commented and explained to guide and advice modifications. The code could be useful for any spatial processing and analysis in areas such as solid Earth geophysics, geotechnical and environmental applications. For some uses, the complete package might be installed, but with the open architecture, copied snippets or methods can be included into other projects. The MIT license allows for a broad reuse. Functionality and issues may be discussed on the code repository. Python and the used libraries are also supported by large online communities.