Spatial models are needed to enable numerical problems to be solved in a broad range of scientific applications. Representation of data and modelled properties can be discretizised to a grid. Each cell in the grid can contain a value from measurements or a modelled value, at a position defined in space and time. Cells can also be assigned a value by interpolation of nearby data points or by assumptions. The location of each grid cell is specified along the dimensions by index number or coordinates from e.g. a geographic coordinate system. Grids that represent part of Earth must also be associated with a geodetic datum for reference to the physical world. Cells in a regular grid represent the shape of parallelepipeds, and can be rectilinear or Cartesian. The latter is the special case where the cells are unit squares, or unit cubes. Some data, e.g. surface elevation, can be expressed in only two dimensions. Other parameters can vary in all spatial directions, and time, and need to be represented in a multidimensional grid. The cell size limits the resolution of the model, smaller cells can represent higher frequencies, but a denser and larger grid add exponentially to the computing cost [35]. To populate a grid model, data are generally imported from different sources and in various formats. Images and continuous data are often available as regular raster files, while some observations are provided as points in an irregular grid, or vector data as polygons and lines. Spatial data are published in different data, projections and coordinate systems. Given this variety of formats and reference conventions, it is inevitable that combining data from different sources often presents a challenge.
We share agrid, a framework to produce a regular grid for multidimensional and multivariate spatial modelling, processing and analysis. The extended functionality of the grid addresses many of the challenges in working with spatial 2D and 3D data noted above. Following the principles of Wilson et al. (2014) [39], the code is written in highest possible language level and made readable and intelligible. We use the general-purpose programming language Python 3. Python is equipped with libraries for fast array operations [23, 37], basic statistics [20], signal processing and other scientific tools [15], machine learning [24], visualisation [14, 26] and discipline specific libraries for e.g. seismology [3, 21], astronomy [28] and GIS [8, 10, 16]. Python also provides interfaces for other languages as R, C and Fortran. All those tools and packages can be reached from the open structure of agrid (Figure 1).
Components of agrid: accessory methods, the class Grid() and example-specific code (feature methods). A class object (brown) contains functions for e.g. import and export. It also contains the xarray dataset (gray) and attributes. Various data formats (left) are converted to numpy arrays and incorporated as data arrays in an xarray dataset. Each data array can be associated to coordinates. The dataset also contains metadata (green). Data can be exported or visualized (right). Accessory methods include a download function to link the Grid() class directly to the data source if required, e.g. for dynamic updating. A few example-specific methods are also distributed together with the module [22, 33, 34].
A few related open-source projects provide useful code for the Earth Sciences community; GemPy [4] is a package that facilitates stochastic geomodeling and probabilistic programming. The package uses the linear algebra compiler Theano [2] for efficient computation. Another related project is Verde [36] and the Fatiando tool box, which contains advanced methods for e.g. interpolation. There are also examples of successful projects that connect various data sources with users. Quantarctica [30] makes Antarctic datasets from various sources easily accessible in a Geographical Information System (GIS) application, QGIS [25]. However, even with some 3D functionality in recent upgrades, GIS is predominantly a 2D frame. Another related project is the multidimensional DataCube [18, 19]. DataCube pre-processes and presents remote sensing geographical and geophysical attributes for researchers and the broader public. DataCube is mainly targeted for changes (e.g. in Landsat raster data) over time, but has a broad range of possible applications. In comparison, agrid is relatively light, easy to modify, and the dependencies are kept to a minimum. Data held in the agrid environment are not regarded only as a set of values: each observation can include quantified uncertainty, probability or likelihood, and data can also be associated with metadata for provenance. It is advantageous that cells of a grid model can be populated with such allied information, together with the dataset.
agrid was initially developed for studies of the Antarctic lithosphere [33, 34], and pre-processing of geophysical data for visualisation purposes [22], but with updates as presented here, it can be used in any discipline, geographical region, projection, dimensionality and any resolution. This initial release of the code is presented with tutorial notebooks that demonstrate its usage. The examples given in this paper can be reproduced from the provided SConstruct script [5, 6, 17].
Subsequent versions of agrid will include additional functionality. We plan, e.g., additional methods for conversion and improved visualisation, support hexagonal 2D grids, curvilinear grid and increased polar and spherical functionality. We hope that colleagues will find this contribution useful, and hopefully encourage scientists to share code and publish reproducible studies.
agrid is structured as a Python module that imports dependencies and defines an agrid class object, Grid(), when imported. When calling Grid(), an object is created that represents the spatial extent of the model space. The grid is initiated with projection, extent and resolution. When the instance of the agrid class object is created, an xarray dataset is defined with dimensions and populated with coordinates. Dimensions includes, but are not limited to, space (X, Y, Z), time (t) and frequency bands (e.g. RGB). Models might also include probability or likelihood. Extent is defined as left, right, up and down, and refers to the rectangular map view. Predefined coordinates are the default units for the projection, e.g. x and y in metres, and degrees in WGS 1984, EPSG:4326. At setup, there is an option of the grid can represent both the corners or the centre points of each cell. The default settings gives a coarse global grid of WGS84 (EPSG:4326), with a resolution of 1° ≈ 111.1 km.
agrid facilitates access to array operations in the spatial domains, as projected grid cells. The data is stored as data arrays in an xarray dataset [12, 13]. xarray is built on numpy [23, 37] and pandas [20], and provides high level functions for labelled multidimensional datasets. xarray has a structure similar to netCDF file format [27] and netCDF is also used as the native format to store grids. By using dask arrays, only the data used is loaded into memory in chunks [29]. dask also facilitates some parallel computing. Grid cells can be selected with the advanced indexing methods in xarray by geographical coordinates as well as index numbers in the grid.
Additional coordinates with different resolution can be created and added to the object at any point. Computations with data grids of different resolution are performed by generating vectors from chunks of the larger array so that the resulting grid sizes are identical. The vectors are unfolded back to the higher resolution grid after the computation. By using this approach, fast numpy operations can be applied on arrays of different shapes and size and there is no need to over-sample low resolution data.
In a research project, agrid can point directly to original data sources. This simplifies the workflow, as development can be done in low resolution or small extent, but larger grids can be used when required and data-sets can easily be swapped. Pre-processing and visualisation can be moved from third part software or stand-alone applications to a condensed workflow (Figure 1 and Listings 1 and 2). This provides overview and facilitates reproducibility and flexibility for the researcher [11].
from agrid.grid import Grid
from agrid.acc import download
km = 1000
# Initiate a class object and set resolution and extent of model:
ant = Grid(res = [10*km, 10*km],
crs = 3031,
depths = [0*km, 10*km, 20*km, 50*km, 100*km],
left = -3100*km,
up = 3100*km,
right = 3100*km,
down = -3100*km)
# Download and import:
bedmap_url = 'https://link/to/bedmap2_tiff.zip'
bedmap_path = 'data/bedmap2'
download(bedmap_url, bedmap_path + '.zip')
GSFC_url = 'http://link/to//GSFC_DrainageSystems'
GSFC_files = 'data/GSFC_DrainageSystems'
for shape_ext in ['.shp','.shx','.prj', '.dbf', '.qix']:
download(GSFC_url + shape_ext, GSFC_files + shape_ext)
# Bulk import grid files from directory:
seis_url = 'http://link/to/AN1-S_depth_grd.tar.gz'
seis_path = 'data/an/'
download(seis_url, seis_path, bulk=True,
meta_dict = {'Model':'AN1-S', 'DOI': '10.1002/2014JB011332'})
# Import raster files
for data_set, label in zip(['thickness', 'bed'], ['ICE', 'DEM']):
ant.ds[label] = (('Y', 'X'),
ant.read_raster('%s /bedmap2_%s . tif' %(bedmap path , data_set),
no_data = 32767.))
# Import polygons, here the attribute 'ID' is used to define segments.
ant.ds['DRAINAGE'] = (('Y', 'X'), ant.assign_shape(GSFC_file + '.shp','ID'))
# Import grid files to 3D data array.
# Keyword 'bulk' imports all files in directory
ant.ds['AN1-S'] = (('Y', 'X', 'Z'), ant.read_grid('../local/an/', bulk=True))
Listing 1: Initiation of a grid object, defining extent and projection for Antarctica, in this example. The code downloads and assigns Bedmap [7], Antarctic drainage systems, GSFC [40] and wave speed from 3D seismic tomography [1] to the grid.
# Select a few polygons:
ant.ds['SEL_ICE'] = ant.ds['ICE']*ant.ds['DRAINAGE'].isin(list(range(0, 53//2)))
# Make some 3D data, using e.g. Python or numpy functions
ant.ds['RANDOM'] = (('Y', 'X', 'Z'), np.random.rand(*ant.shape3))
# Make maps:
# Fig. 2a
ant.map_grid('DRAINAGE',
cmap='RdBu',
save_name= 'fig/drainage.pdf')
# Fig. 2b
ant.map_grid('SEL_ICE',
cmap = 'viridis',
save_name = 'fig/selected.pdf')
# Fig. 2c
ant.layer_cake('AN1-S',
cmap = 'BrBG_r',
save_name = 'fig/layers.pdf')
# Fig. 2d
ant.oblique_view('DEM',
vmin = 0, vmax = 4200,
cmap = 'bone',
azimuth = 180, roll = -90,
save_name = 'fig/oblique_view.pdf')
# Analyse:
# Calculate the volume of the ice in selected segments.
volume = int(ant.ds[‘SEL_ICE’].sum()*np.prod(ant.res)/km**3)
# Export as geoTiff:
ant.grid_to_raster('SEL_ICE','selected_ice.tif')
Listing 2: Visualization, analyse and export. The code generates all figures in Figure 2.
Code in Listing 1 generates a frame of Antarctica, using WGS 84/Antarctic Polar Stereographic projection and a lateral cell size of 10 km × 10 km. The Extent is defined in the default unit of the projection. Coordinate reference system (CRS), is given as an integer and therefore interpreted as an EPSG code. For this example, the 2D grid is Cartesian and quadratic, but the depths slices are defined by the list depths. Due to the convention of indexing arrays as row – column and geographical coordinates as lat – lon, grid coordinates are also given as Y – X for consistency.
The instance of Grid() class contains a number of functions to import data of different types, visualisation and export (Figure 1). Raster data, e.g. GeoTiff, can be imported with a method using rasterio [10] and the underlying gdal [38]. Rasters are warped to fit the extent, resolution and projection of the grid. An imported raster is shown in Figure 2b. Vector data are imported with fiona [9] and geopandas [16] with options for rasterization of attribute data and interpolation. Grids or data points can be read from a number of formats and interpolated. A rasterized polygon dataset is shown in Figure 2a and is also used to crop and select data in Figure 2c–d.
Data input and visualisation examples generated by code Listings 1 and 2. (a) Vector polygon data (drainage systems [40]). (b) Subset of raster data (ice thickness [7]) Polygon vector data [40] is used to select a part of the continuous raster. (c) 3D layered plot of seismic data [1]. (d) Example of 3D rendering. Supplied tutorials and SCons script contain further details. The code may be used for any geographic area, at any scale.
The class also contains functions for visualisation using matplotlib [14] and Cartopy [41] (Figure 2a–c). Map views with e.g. coast lines and coordinates can be produced directly by agrid. Mayavi [26] and the underlying VTK [31] are used for 3D visualisation (Figure 2d). Data can be exported as netCDF, GeoTiff or ASCII formats. JSON format is used to import metadata and export model parameters.
The module is published with a number of tutorials to demonstrate the functionality with different data sources, scales and extent. Known limitations exist in the visualization methods for less common projections and some warnings are not handled smoothly. Error handling mainly relies on used dependencies with only limited functionality in agrid itself. Development errors have been ruled out by comparing results from other GIS applications. 2D data that have been imported, processed and exported, have been compared to similar processing in the GIS applications QGIS. Those test cases and additional test code are also available from the project’s github repository [32]. The updated issue tracker is likewise available at github.
The code is developed and tested in Ubuntu 16.04, 18.04 and macOS High Sierra 10.13.6. It has also been tested on Windows 10.
Python >= 3.6 (tested on Python 3.6 and Python 3.7).
Very low requirements for basic use, but can be scaled up for larger grids. The use of dask arrays relax the need for large RAM.
The class depends on a number of Python packages that can all be installed by package managers, e.g. pip3 or conda: Minimum dependencies: cartopy geopandas matplotlib json numpy pyproj rasterio scipy xarray
Additional dependencies used and imported only by some methods: datetime fiona imageio mayavi requests shapely tarfile tqdm zipfile.
Tobias Stål, Anya M. Reading
Name: agrid
Persistent identifier:https://doi.org/10.5281/zenodo.2553965
Licence: MIT License
Publisher: Tobias Stål
Version published: 0.4.0
Date published: January 25, 2020
Name: GitHub
Persistent identifier:https://github.com/TobbeTripitaka/agrid.git
Licence: MIT License
Date published: January 25, 2020
agrid was developed in English.
agrid is deliberately developed for reuse in a broad range of applications. The code is commented and explained to guide and advice modifications. The code could be useful for any spatial processing and analysis in areas such as solid Earth geophysics, geotechnical and environmental applications. For some uses, the complete package might be installed, but with the open architecture, copied snippets or methods can be included into other projects. The MIT license allows for a broad reuse. Functionality and issues may be discussed on the code repository. Python and the used libraries are also supported by large online communities.
We are grateful for discussions and test cases with Eleri Evans, Jacqueline Halpin, Shawn Hood, Peter E Morse and Joanne Whittaker. We also wish to thank two anonymous reviewers.
The authors have no competing interests to declare.
An, M, Wiens, D A, Zhao, Y, Feng, M, Nyblade, A A, Kanao, M, Li, Y, Maggi, A and Lévêque, J-J 2015 S-velocity model and inferred Moho topography beneath the Antarctic Plate from Rayleigh waves. Journal of Geophysical Research, Solid Earth, 120: 2007–2010. DOI: https://doi.org/10.1002/2014JB011332
Bergstra, J, Breuleux, O, Bastien, F, Lamblin, P, Pascanu, R, Desjardins, G, Turian, J, Warde-Farley, D and Bengio, Y 2010 Theano: A cpu and gpu math compiler in python. In Proc. 9th Python in Science Conf, 1.
Beyreuther, M, Barsch, R, Krischer, L, Megies, T, Behr, Y and Wassermann, J 2010 ObsPy: A Python toolbox for seismology. Seismological Research Letters, 81(3): 530–533. ISSN 0895-0695. DOI: https://doi.org/10.1785/gssrl.81.3.530
de la Varga, M, Schaaf, A and Wellmann, F 2019 Gempy 1.0: Open-source stochastic geological modeling and inversion. Geoscientific Model Development. DOI: https://doi.org/10.5194/gmd-2018-61
Fomel, S 2013 Tour with Madagascar, Revisiting s. e. p. and SCons. Journal of Open research software.
Fomel, S and Hennenfent, G 2007 Computational experiments using SCons, reproducible. ICASSP, 2007, 1257–1260. DOI: https://doi.org/10.1109/ICASSP.2007.367305
Fretwell, P, Pritchard, H D, Vaughan, D G, Bamber, J L N, Barrand, E, Bell, R, Bianchi, C, Bingham, R G, Blankenship, D D, Casassa, G, Catania, G, Callens, D, Conway, H, Cook, A J, Corr, H F J, Damaske, D, Damm, V, Ferraccioli, F, Forsberg, R, Fujita, S, Gim, Y, Gogineni, P, Griggs, J A, Hindmarsh, R C A, Holmlund, P, Holt, J W, Jacobel, R W, Jenkins, A, Jokat, W, Jordan, T, King, E C, Kohler, J, Krabill, W, Riger-Kusk, M, Langley, K A, Leitchenkov, G, Leuschen, C, Luyendyk, B P, Matsuoka, K, Mouginot, J, Nitsche, F O, Nogi, Y, Nost, O A, Popov, S V, Rignot, E, Rippin, D M, Rivera, A, Roberts, J, Ross, N M, Siegert, J, Smith, A M, Steinhage, D, Studinger, M, Sun, B, Tinto, B K, Welch, B C, Wilson, D, Young, D A, Xiangbin, C and Zirizzotti, A 2013 Bedmap2: Improved ice bed, surface and thickness datasets for Antarctica. The Cryosphere, 7(1): 375–393. DOI: https://doi.org/10.5194/tcd-6-4305-2012
Gillies, S 2013 The Shapely user manual. URL https://pypi.org/project/Shapely/.
Gillies, S 2014 The Fiona user manual. URL https://fiona.readthedocs.io/en/latest/manual.html.
Gillies, S 2018 Rasterio: Geospatial raster I/O for Python programmers. URL https://github.com/mapbox/rasterio.
Hinsen, K 2011 A data and code model for reproducible research and executable papers. Procedia Computer Science, 4: 579–588. ISSN 1877-0509. DOI: https://doi.org/10.1016/j.procs.2011.04.061
Hoyer, S and Hamman, J 2017 xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software, 5(1). DOI: https://doi.org/10.5334/jors.148
Hoyer, S, Fitzgerald, C, Hamman, J, Kleeman, A, Kluyver, T, Roos, M, Helmus, J J, Markel, M, Cable, P, Maussion, F, Miles, A, Kanmae, T, Wolfram, P, Sinclair, S, Bovy, B, Ebrevdo, R, Guedes, R, Abernathey, R, Filipe, S, Hill, N, Richards, A, Lee, N, Koldunov, M, Maciekswat, G, Gerard, J, Babuschkin, I, Deil, C, Welch, E and Hilboll, A 2019 Xarray:. http://xarray.pydata.org/en/stable/.
Hunter, J D May 2007 Matplotlib: A 2D graphics environment. Computing In Science & Engineering, 9(3): 90–95. ISSN 1521-9615. DOI: https://doi.org/10.1109/MCSE.2007.55
Jones, E, Oliphant, T, Peterson, P, et al. 2001 SciPy: Open Source scientific tools for Python. URL http://www.scipy.org/. [Online; accessed 13 Dec 2018].
Jordahl, K 2014 GeoPandas: Python tools for geographic data. URL https://github.com/geopandas/geopandas.
Knight, S 2010 Scons user guide. Python Software Foundation.
Lewis, A, Lymburner, L, Purss, M B J, Brooke, B, Evans, B, Ip, A, Dekker, A G, Irons, J R, Minchin, S, Mueller, N, et al. 2016 Rapid, high-resolution detection of environmental change over continental scales from satellite data-the Earth Observation Data Cube. International Journal of Digital Earth, 9(1): 106–111. DOI: https://doi.org/10.1080/17538947.2015.1111952
Lewis, A, Oliver, S, Lymburner, L, Evans, B, Wyborn, L, Mueller, N, Raevksi, G, Hooke, J, Woodcock, R, Sixsmith, J, Wu, W, Tan, P, Li, F, Killough, B, Minchin, S, Roberts, D, Ayers, D, Bala, B, Dwyer, J, Dekker, A, Dhu, T, Hicks, A, Ip, A, Purss, M, Richards, C, Sagar, S, Trenham, C, Wang, P and Wang, L-W December 2017 The Australian geoscience data cube-Foundations and lessons learned. Remote Sensing of Environment, 202: 276–292. DOI: https://doi.org/10.1016/j.rse.2017.03.015
McKinney, W 2015 Pandas: A Python data analysis library. URL http://pandas.pydata.org.
Megies, T, Beyreuther, M, Barsch, R, Krischer, L and Wassermann, J 2011 ObsPy – what can it do for data centers and observatories? Annals of Geophysics, 54(1): 47–58. ISSN 1593-5213. DOI: https://doi.org/10.4401/ag-4838
Morse, P, Reading, A M and Stål, T 2019 Well-posed geoscientific visualization through interactive and color mapping. Frontiers in Earth Science. DOI: https://doi.org/10.3389/feart.2019.00274
Oliphant, T E 2006 A guide to NumPy, 1. Trelgol Publishing USA.
Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, Blondel, M, Prettenhofer, P, Weiss, R, Dubourg, V, et al. 2011 Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct): 2825–2830.
QGIS. 2015 QGIS geographic information system. Open Source Geospatial Foundation Project. By development team and community.
Ramachandran, P and Varoquaux, G 2011 Mayavi: 3D visualization of scientific data. Computing in Science and Engineering, 13(2): 40–51. ISSN 1521-9615. DOI: https://doi.org/10.1109/MCSE.2011.35
Rew, R and Davis, G 1990 NetCDF: An Interface for Scientific Data Access. IEEE Computer Graphics and Applications, 10(4): 76–82. ISSN 0272-1716. DOI: https://doi.org/10.1109/38.56302
Robitaille, T P, Tollerud, E J, Greenfield, P, Droettboom, M, Bray, E, Aldcroft, T, Davis, M, Ginsburg, A, Price-Whelan, A M, Kerzendorf, W E, et al. 2013 Astropy: A community python package for astronomy. Astronomy & Astrophysics, 558: A33. DOI: https://doi.org/10.1051/0004-6361/201322068
Rocklin, M 2015 Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th Python in Science Conference, number 130–136. Citeseer. DOI: https://doi.org/10.25080/Majora-7b98e3ed-013
Roth, G, Matsuoka, K, Skoglund, A, Melvær, Y and Tronstad, S February 2018 Quantarctica: A unique, open, standalone GIS package for Antarctic research and education. URL http://quantarctica.npolar.no.
Schöberl, J, Martin, K and Lorensen, B 2006 Kitware, The Visualization Toolkit. Technical report. (VTK).
Stål, T January 2020 Agrid code and tests v. 0.4.0. Zenodo. DOI: https://doi.org/10.5281/zenodo.2553966
Stål, T, Reading, A M, Halpin, J A and Whittaker, J M August 2019 A multivariate approach for mapping lithospheric domain boundaries in east Antarctica. Geophysical Research Letters. DOI: https://doi.org/10.1029/2019GL083453
Stål, T, Reading, A M, Halpin, J and Whittaker, J 2020 The Antarctic crust and upper mantle: A 3D model and framework for interdisciplinary research. Manuscript in preparation.
Thompson, J F, Soni, B K and Weatherill, N P 1998 Handbook of grid generation. CRC press.
Uieda, L October 2018 Verde: Processing and gridding spatial data using green’s functions. Journal of Open Source Software, 3(30): 957. DOI: https://doi.org/10.21105/joss.00957
van der Walt, S, Colbert, S C and Varoquaux, G 2011 The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering, 13(2): 22–30. DOI: https://doi.org/10.1109/MCSE.2011.37
Warmerdam, F and GDAL/O. G. R. contributors. software Library, GDAL/OGR Geospatial Data Abstraction. Foundation, Open Source Geospatial, 2018. URL http://gdal.org.
Wilson, G, Aruliah, D A, Brown, C T, Chue Hong, N P, Davis, M, Guy, R T, Haddock, S H D, Huff, K D, Mitchell, I M, Plumbley, M D and Others 2014 Best practices for scientific computing. PLoS biology, 12(1): e1001745. DOI: https://doi.org/10.1371/journal.pbio.1001745
Zwally, H J, Giovinetto, M B, Beckley, M A and Saba, J L 2012 Antarctic and Greenland drainage systems, gsfc cryospheric sciences laboratory. Available at icesat4.gsfc.nasa.gov/cryo_data/ant_grn_drainage_systems.php. Accessed March 1, 2015.
Met Office 2019 Cartopy: A cartographic python library with a matplotlib interface. Exeter, Devon. URL http://scitools.org.uk/cartopy.