Nansat is a Python toolbox for analysing and processing 2-dimensional geospatial data, such as satellite imagery, output from numerical models, and gridded in-situ data. It is created with a strong focus on facilitating research and the development of algorithms and autonomous processing systems. Nansat extends the widely used Geospatial Data Abstraction Library (GDAL) by adding scientific meaning to the datasets through metadata, and by adding common functionality for data analysis and handling (e.g., exporting to various data formats). Nansat uses metadata vocabularies that follow international metadata standards, in particular the Climate and Forecast (CF) conventions, the NASA Directory Interchange Format (DIF), and the Global Change Master Directory (GCMD) keywords. Functionality that is commonly needed in scientific work, such as seamless access to local or remote geospatial data in various file formats, collocation of datasets from different sources and geometries, and visualization, is also built into Nansat. The paper presents Nansat workflows, its functional structure, and examples of typical applications.
Geospatial data is information that identifies the geographic location of features and boundaries on Earth, such as natural or constructed features, oceans, and more, usually stored as coordinates and topology that can be mapped. This paper describes a new tool to handle the interpretation and processing of geospatial data from satellite images and numerical models. Typical operations on such data include visualization for human perception of spatial patterns, extraction of geophysical values, pixel-per-pixel or contextual calculation of new geophysical variables based on one or several input datasets, re-gridding to another coordinate system, co-location of several datasets on one spatial grid, subsetting in space, and automatic recognition and description of objects.
With a growing amount of available geospatial data, the complexity of this work is, however, increasing. As an example, a collection of 2-dimensional (2D) datasets acquired over one geographic region at a given point in time will form a 3D cube of data, and time series of such cubes represent 4D datasets. The combination of multidimensional data analysis with significant input data heterogeneity and size (a single satellite may, e.g., provide 20 TB of data per day) creates a challenge for researchers, and powerful and customized tools are thus required.
A number of analysis and processing tools for gridded data already exist, e.g., ERDAS IMAGINE, Bilko, and ArcGIS. Both the European Space Agency (ESA) and the National Aeronautics and Space Administration (NASA) have developed and provide software tools for processing their respective satellite data, including, e.g., the Sentinel Toolbox and SeaDAS. These tools are, however, written by programmers, from a programmer perspective, in compiled low-level and object-oriented languages such as Java or C/C++, in order to accomplish highly generic software and reduced execution time. Scientists, on the other hand, often prefer higher-level programming languages in order to minimize the abstraction from their scientific ideas. The nature of scientific work also requires researchers to be able to quickly modify and extend their software tools in order to analyse new sets of information. The possibility to work from the command line and create quick and simple visualizations is therefore very important, whereas execution speed is usually less of a concern. Python is a programming language that can meet these needs. It has a large user community, and it is free and open source. Existing Python packages, such as Numerical Python (Numpy), Scientific Python (Scipy) and Matplotlib, already provide much of the required flexibility, and code can be sped up by clever programming practices. Compared to, e.g., Java, it is relatively easy to get started with Python, and the threshold for being able to produce scientific results is lower. As Python is object oriented, it also allows programmers to create reusable and complex software tools. As such, Python is highly suitable for scientists, while overlapping into what programmers want.
Many of the above-mentioned tools are available only as Graphical User Interfaces (GUIs). A challenge with such tools is to automate operations that could be performed by computers alone [13, 14]. In an attempt to solve this problem, scientific users of geospatial data often develop their own software tools, which can be more easily integrated into processing chains handling multiple datasets (e.g., an archive of satellite images). With the boom of open source software, the programming paradigm has gradually changed from the development of large, strongly coupled packages to modular, unifunctional libraries that can be used together as building blocks for multifunctional applications. The Geospatial Data Abstraction Library (GDAL) is an example of open source software that is rarely used alone, but which is included in many applications (even proprietary ones) and online services, e.g., QGIS and GRASS. GDAL has a very usable and well documented application programming interface (API), with bindings for many programming languages, and a number of community-developed drivers that are capable of reading geospatial data and metadata from many file formats.
The advantage of GDAL is that it provides access to various sources of information using the same universal data model. However, a limitation for scientific users is that GDAL only provides access to the data or metadata stored in the file and does not attach any geophysical meaning to the data. Thus, it may become difficult for a researcher to correctly interpret the content of the dataset with regard to the relevant band name, calibration, physical units, spatial scale, etc. In addition, GDAL provides low-level access to the data, and a scientific user is often required to perform several complicated operations in order to eventually fetch the values of the required geophysical variable at a given resolution and region of interest. Furthermore, the structure of files containing geospatial data varies significantly between sources, and GDAL does not provide a unified interface; in many datasets, there are specific details about, e.g., subdatasets, ground control points, or the location of metadata, which should be taken into account.
The main goal of Nansat is thus to combine the versatility and power of GDAL with scientific knowledge about the content of geospatial data from various sources in a user-friendly Python package. In order to simplify data use and interpretation, Nansat adds a priori known scientific metadata following the CF and GCMD conventions [19, 20] to the datasets, unifies the access to various data (either in local files or through the OpenDAP protocol) as much as possible, and provides a high-level API to enable use in diverse, as well as automated, operations. Via GDAL, spatial transformation of data and export of the data to common formats (e.g., NetCDF-CF, PNG, GeoTIFF) are also facilitated.
In this section, the instantiation of a Nansat object with geospatial satellite data is summarized and schematically presented in Fig. 1 (subsection A), and the Nansat structure, including details about classes, mappers, and the use of third-party libraries, is given (subsection B). Examples of more advanced processing are provided in section (3) ‘Reuse Potential’.
Nansat is designed as a modular collection of classes (Fig. 2) to make the code more cohesive and less coupled, and thus to facilitate independent development of each block. The class Domain describes the geographical projection, span and resolution of any kind of gridded data. It contains an instance of the class VRT, which operates on VRT files. The class VRT contains an instance of a GDAL Dataset for the actual reading of data and metadata. VRT is a base class for the Mapper classes, which add scientific meaning to the various data read by GDAL. A Domain instance also contains an NSR object, which subclasses OGR SpatialReference for work with spatial reference systems. Nansat is the central class in the package and a subclass of Domain. Nansat contains Figure (convenient generation of images) and PointBrowser (interactive digitizer). Nansat is also the base class for Mosaic, which performs mosaicking of several files.
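The inheritance and composition relations described above can be mirrored in a schematic sketch. The class bodies below are illustrative stubs only, not the actual Nansat implementations:

```python
# Schematic stubs mirroring the Nansat class structure (Fig. 2);
# these are placeholders illustrating inheritance and composition only.

class VRT:
    """Wraps an in-memory GDAL VRT dataset (the real class holds a
    gdal.Dataset created by the GDAL VRT driver)."""

class Mapper(VRT):
    """Adds scientific metadata (CF/GCMD) to data read by GDAL."""

class NSR:
    """Subclasses OGR SpatialReference in the real package."""

class Domain:
    """Geographical projection, span and resolution of gridded data."""
    def __init__(self):
        self.vrt = VRT()   # composition: Domain holds a VRT instance
        self.nsr = NSR()   # composition: spatial reference object

class Figure:
    """Convenient generation of images."""

class PointBrowser:
    """Interactive digitizer."""

class Nansat(Domain):
    """Main container for geospatial data and high-level operations."""

class Mosaic(Nansat):
    """Mosaicking of several files."""
```

The stubs make the coupling explicit: high-level functionality accumulates along the Domain → Nansat → Mosaic inheritance chain, while file handling is delegated to the contained VRT objects.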
The Nansat class is the main container for the geospatial data, and performs all high-level operations. An instance of the Nansat class contains information about the geographical reference of the data (e.g., raster size, pixel resolution, type of projection, etc.) and about bands with geophysical variables (e.g., water leaving radiance, normalized radar cross section, chlorophyll concentration). A Nansat instance contains methods for high-level operations on the data, including mainly the following:
The Domain class is a container for the geographical reference of the raster data. A Domain instance is a GDAL Dataset with the geolocation information only (i.e., no bands), containing, e.g.,
There are three ways to store the geo-reference in a GDAL dataset:
The relation between row and column coordinates of the raster and geographic (e.g. latitude and longitude) coordinates is defined by the projection type and projection parameters.
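For data on a regular grid, this relation is typically encoded in GDAL as a six-element affine geotransform. The sketch below shows the standard GDAL mapping from raster column/row indices to projected coordinates; the example coefficients are illustrative (a 100 km wide domain at 100 m resolution), not taken from a real dataset.

```python
def pixel_to_projected(geotransform, col, row):
    """Map raster (col, row) indices to projected (x, y) coordinates
    using a GDAL-style affine geotransform
    (x0, pixel_width, row_rotation, y0, col_rotation, pixel_height)."""
    x0, dx, rx, y0, ry, dy = geotransform
    x = x0 + col * dx + row * rx
    y = y0 + col * ry + row * dy
    return x, y

# Illustrative geotransform: origin at (-50000, 50000), 100 m pixels,
# north-up (both rotation terms are zero)
gt = (-50000.0, 100.0, 0.0, 50000.0, 0.0, -100.0)
print(pixel_to_projected(gt, 0, 0))      # → (-50000.0, 50000.0)
print(pixel_to_projected(gt, 500, 500))  # → (0.0, 0.0)
```

Converting projected coordinates further to latitude and longitude then requires the projection type and parameters, which in Nansat are handled by the NSR spatial reference object.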
The Domain class has methods for basic operations with georeference information:
The VRT class is a wrapper around a GDAL vrt file. The vrt file is in XML format, and contains the geo-reference, the global and band metadata, and references to the data grids (or bands) provided in the source file. The class VRT performs all operations on the VRT files, i.e., copying, modifying, reading, and writing of files, adding bands, setting spatial reference attributes such as projection or coordinate transformation, and more. These operations are performed either directly by methods in the VRT class (e.g., remove_geotransform and get_warped_vrt), or using GDAL functions (e.g., Create, AddBand, SetMetadata, and AutoCreateWarpedVRT). The core of the VRT object is a GDAL Dataset instance generated by the GDAL VRT Driver. The respective vrt file is located in computer memory, and is accessible using the GDAL VSI functionality (the path to the vrt file is /vsimem). The VRT class can also be used to write binary files with actual data accompanied by a wrapping vrt file that describes its dimensions and data type. This is used when adding data to existing Nansat instances or caching CPU-consuming operations.
When Domain is instantiated, the VRT object is created without any bands, only with geo-reference information, using the GDAL Dataset methods. In contrast, when a file is opened with Nansat, the VRT object with both the geo-reference and the bands is created using the class Mapper.
When Nansat performs operations that change the vrt file, e.g., by reprojecting, cropping, resizing or adding bands, an additional vrt file with a corresponding VRT object is generated. The new VRT object keeps a reference to the original VRT object. This makes it easy and safe to undo operations.
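This undo mechanism can be illustrated with a minimal pure-Python sketch (the class and method names here are hypothetical; the real VRT objects wrap GDAL datasets in /vsimem rather than plain attributes):

```python
class SketchVRT:
    """Toy stand-in for the VRT class: each modifying operation creates
    a new object that keeps a reference to its parent, so undoing an
    operation simply means returning to the parent."""

    def __init__(self, description, parent=None):
        self.description = description
        self.parent = parent

    def modified(self, description):
        # e.g., after reproject/crop/resize, a new VRT wraps the old one
        return SketchVRT(description, parent=self)

    def undo(self):
        # safely step back to the state before the last operation
        return self.parent if self.parent is not None else self

original = SketchVRT('original file')
warped = original.modified('reprojected')
cropped = warped.modified('cropped')
```

Because every state in the chain is preserved, undoing `cropped` yields `warped`, and undoing twice yields the untouched `original`.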
The Figure class performs graphical operations. It is used to create figures, append a legend, add a land mask, and adjust brightness and colours, as well as to save figures in PNG, JPG or TIFF format. An instance of Figure can be created in the Nansat write_figure method, or from a Numpy 2D array (or three 2D layers in the case of an RGB image). The Figure class uses two Pillow (Python Imaging Library) Image objects, one for the image canvas and one for the legend. Figure has several convenient methods for optimizing the data presentation:
The Nansatmap class generates nice-looking maps on which several datasets can be displayed together. It extends the Python Basemap class and overrides its constructor for simpler instantiation based on a given Nansat object. Nansatmap also overrides several Basemap methods by taking advantage of a priori knowledge about the spatial extent:
The Mosaic class extends Nansat and provides capability to create mosaics of several input files using one of two methods:
If the “mask” band is present, it is used to select the valid pixels for averaging (i.e. where mask equals 64). Otherwise it is assumed that all input pixels are valid.
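The averaging step can be sketched with Numpy as follows. This is a simplified stand-in for the actual Mosaic implementation, assuming each input scene has already been projected onto a common grid; only pixels whose mask value equals 64 contribute to the average.

```python
import numpy as np

def average_scenes(bands, masks=None, valid=64):
    """Average overlapping 2D scenes pixel-wise, using only pixels whose
    mask value equals `valid`; if no masks are given, all pixels count."""
    bands = np.asarray(bands, dtype=float)
    if masks is None:
        weights = np.ones_like(bands)
    else:
        weights = (np.asarray(masks) == valid).astype(float)
    total = (bands * weights).sum(axis=0)
    counts = weights.sum(axis=0)
    # Pixels with no valid input in any scene become NaN
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(counts > 0, total / counts, np.nan)

scene_a = np.array([[1.0, 2.0], [3.0, 4.0]])
scene_b = np.array([[3.0, 2.0], [5.0, 0.0]])
mask_a = np.full((2, 2), 64)
mask_b = np.array([[64, 64], [64, 0]])  # lower-right pixel invalid
result = average_scenes([scene_a, scene_b], [mask_a, mask_b])
# result: [[2.0, 2.0], [4.0, 4.0]]
```

In the example, the lower-right pixel of `scene_b` is excluded by its mask, so the output there is simply the value from `scene_a`.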
Quality control of the code is implemented in two blocks: unit tests and integration tests. Currently, the unit tests cover 78% of the core code and thus the most important functionality. Sample datasets are pre-generated for running the unit tests, and consist of synthetic satellite images with data either on a regular projection or with the projection defined by ground control points (GCPs). The code repository is integrated with the Travis Continuous Integration platform, and tests are automatically run on each commit.
The integration tests are developed for testing mappers and operation chains on actual data. The integration tests are designed as a separate package based on the Python unittest package, and are run only locally (i.e., not on Travis CI). This is to prevent downloading of large amounts of satellite data on the test server. Including the integration tests, the coverage of all code, including all mappers, is 59% (this is not reflected in the Coveralls badge in the readme file at GitHub).
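A minimal sketch of how such a locally-run integration test could look (the data path, environment variable and band name are hypothetical illustrations; the real tests live in a separate package):

```python
import os
import unittest

# Hypothetical path to locally stored satellite test data; the
# NANSAT_TEST_DATA variable is an assumed convention for this sketch.
LOCAL_DATA = os.environ.get('NANSAT_TEST_DATA',
                            '/data/sample_satellite_image.nc')

class TestMapperIntegration(unittest.TestCase):
    """Runs only where the large test datasets are stored locally,
    so no satellite data is downloaded on the CI server."""

    @unittest.skipUnless(os.path.exists(LOCAL_DATA),
                         'local satellite test data not available')
    def test_open_and_read_band(self):
        from nansat import Nansat
        n = Nansat(LOCAL_DATA)
        band = n['sigma0_HH']  # hypothetical band name
        self.assertGreater(band.size, 0)
```

Running such a module with `python -m unittest` exercises real mappers where the data are present and cleanly skips the tests elsewhere, which is the behaviour desired on a shared CI server.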
Nansat has been tested on OS X and Linux, and is also provided in a Vagrant/Ansible configuration for virtual machines (CentOS 7 or Ubuntu 14.04). Thus, it is also easy to install the tool on Windows, although the actual operating system running Nansat would then be Ubuntu.
Most of Nansat is written in Python 2.7. A small and optional part (GDAL pixel functions) is written in C.
The code is 8 MB, and the system requirements largely depend on the application of Nansat. Small satellite images (less than 1000 x 1000 pixels) can be processed with 100 MB of free RAM and a 1.5 GHz processor. Large datasets (e.g. synthetic aperture radar (SAR) images exceeding 10000 x 10000 pixels) require > 4 GB RAM and > 2.5 GHz CPU.
Nansat has both required and optional dependencies. Installation of all the dependencies for full Nansat functionality, i.e., to access data in netCDF, HDF4 and HDF5 files via GDAL, may sometimes be cumbersome. Therefore, we have pre-built fully functional binaries of GDAL for Linux (32 and 64 bits), which are available in the Anaconda Cloud [29, 30]. To further simplify the installation, we have created a Vagrant/Ansible configuration to install Nansat and all required dependencies on a local virtual machine as referenced in the “Emulation Environment” section.
The basic requirements of Nansat are:
In addition, there are optional dependencies for specific Nansat functionality, in particular:
Persistent identifier: 10.5281/zenodo.59998
Licence: GNU GPLv3
Publisher: Anton Korosov
Date published: 11/08/2016 (last release, 0.6.14)
Licence: GNU GPLv3
Date published: 18/01/2013 (first release, 0.4)
Name: Provisioning of virtual machines with Vagrant/Ansible
Licence: GNU GPLv3
Date published: 10/09/2015
The potential to reuse Nansat for scientific applications is illustrated in three use cases. These examples, plus other tutorials are available in the nansat-lectures repository .
Nansat is at the time of writing capable of reading data from tens of platforms and instruments in 8 file formats, as specified in the online table at https://github.com/nansencenter/nansat/blob/master/nansat/mappers/README.yml. Nansat can be extended by inheritance or by adding new mappers, as described in the online manual. User support and feedback are handled via GitHub issues and through a mailing list (firstname.lastname@example.org).
This example covers spatial collocation of two images acquired in May 2016 by the Synthetic Aperture Radar (SAR) instrument onboard the Sentinel-1A  satellite, and the Moderate Resolution Imaging Spectroradiometer (MODIS)  onboard the Aqua satellite. The two original level-1 datasets have different spatial reference, resolution and coverage, and are collocated and projected onto the same spatial grid, and then visualized using Nansat. The resulting images (Fig. 3) show MODIS top-of-atmosphere radiances measured at 469, 555 and 858 nm displayed in RGB (Fig. 3A) and a grayscale image of the same area generated from SAR HH (horizontal transmit and horizontal receive) polarized Normalized Radar Cross Section (NRCS) (Fig. 3B) covering parts of the Norwegian Sea. In these images, we see ocean mesoscale eddies as manifested by both the ocean colour (Fig. 3A) and the sea surface roughness (Fig. 3B).
from nansat import Nansat, Domain

# Open file with MODIS image
n1 = Nansat('MYD02HKM.A2016148.1050.006.2016149162607.hdf')

# Open file with Sentinel-1A image
n2 = Nansat('S1A_EW_GRDM_1SDH_20160527T163927_20160527T164031_011447'
            '_0116DE_3ED0.SAFE')

# Define region of interest (south of Lofoten islands)
d = Domain('+proj=stere +lon_0=12 +lat_0=67 +no_defs',
           '-te -50000 -50000 50000 50000 -tr 100 100')

# Collocate the images
n1.reproject(d)
n2.reproject(d)

# Write RGB graphic file from MODIS data
f = n1.write_figure('test_rgb_MODIS.png', ['L_858', 'L_555', 'L_469'],
                    clim='hist', ratio=0.95)

# Write grayscale file from Sentinel-1A data
f = n2.write_figure('test_SENTINEL1.png', 'sigma0_HH',
                    clim='hist', ratio=0.95, cmapName='gray')

# Write map of the region of interest
d.write_map('test_domain_map.png', resolution='h', dpi=300,
            meridians=4, parallels=4, latBorder=1, lonBorder=1,
            merLabels=[False, False, True, False],
            parLabels=[True, False, False, False])
In this example, Nansat is used to compare two datasets of surface Chlorophyll-A (Chl-A) and Sea Surface Temperature (SST) in the Barents Sea. As shown in the code below, Nansat accesses remote Chl-A and SST datasets streamed from the Norwegian Meteorological Institute and Plymouth Marine Laboratory (PML) (the latter dataset is provided as part of the ESA Ocean Colour Climate Change Initiative) via the OpenDAP protocol. When the connection to the data stream is established, Nansat is used to define the transects and retrieve and plot the actual data (Fig. 4). The Matplotlib and Basemap code used for creating Fig. 4 is beyond the scope of this paper and is thus not provided.
import numpy as np
from nansat import Nansat, Domain

# Access online sea surface temperature data streamed from
# the Norwegian Meteorological Institute (met.no) via the
# OpenDAP protocol
n1 = Nansat('http://thredds.met.no/thredds/dodsC/sea_ice/'
            'SSTMETNOARCSST_L4OBSV2V1/sst_arctic_aggregated',
            date='20120601', bands=['analysed_sst'])

# Access online Chlorophyll-A data streamed from
# Plymouth Marine Laboratory, UK, via the OpenDAP protocol
n2 = Nansat('https://rsg.pml.ac.uk/thredds/dodsC/CCI_ALLv2.08DAY',
            date='20120601', bands=['chlor_a'])

# Get transects from points given by [longitude, latitude]
t1 = n1.get_transect([[24, 24], [71, 77]], ['analysed_sst'])
t2 = n2.get_transect([[24, 24], [71, 77]], ['chlor_a'])

# Fetch values from the transects
sst_lat = t1['lat']
sst_val = t1['analysed_sst']
chl_lat = t2['lat']
chl_val = t2['chlor_a']

# Mask invalid SST values
sst_val[sst_val < 0] = np.nan

# ...[Plotting is performed using standard Matplotlib and Basemap
# libraries (Fig. 4)]...
The example below illustrates how to use Nansat to access online datasets of sea ice concentration, SST and sea ice drift streamed from the Norwegian Meteorological Institute, collocate the datasets, and create a nice-looking map (Fig. 5).
import numpy as np
import matplotlib.pyplot as plt
from nansat import Nansat, Domain, Nansatmap

# Access online sea ice concentration data (IC) from met.no
n1 = Nansat('http://thredds.met.no/thredds/dodsC/osisaf/met.no/'
            'ice/conc/2016/04/'
            'ice_conc_nh_polstere100_multi_201604011200.nc',
            bands=['ice_conc'])

# Access online SST data from met.no
n2 = Nansat('http://thredds.met.no/thredds/dodsC/myocean/'
            'siwtac/sstmetnoarcsst03/'
            '20160401000000METNOL4_GHRSST-SSTfnd'
            'METNO_OIARCv02.0fv02.0.nc',
            bands=['analysed_sst'])

# Access online sea ice drift data from met.no
n3 = Nansat('http://thredds.met.no/thredds/dodsC/osisaf/met.no/'
            'ice/drift_lr/merged/2016/04/'
            'ice_drift_nh_polstere625_multi'
            'oi_201604011200201604031200.nc',
            bands=['dX', 'dY'])

# Define domain of the region of interest
d = Domain('+proj=stere +lon_0=45 +lat_0=90 +no_defs',
           '-te 300000 -1200000 1700000 300000 -tr 10000 10000')

# Project the collocated datasets onto the same domain
n1.reproject(d)
n2.reproject(d)
n3.reproject(d)

# Retrieve data covering the region of interest
ice_conc = n1['ice_conc']
analysed_sst = n2['analysed_sst']
dX = n3['dX']
dY = n3['dY']

# Mask invalid data with Numpy NotANumber values
ice_conc[ice_conc <= 1] = np.nan
analysed_sst[analysed_sst < 0] = np.nan

# Create canvas for drawing a map
nmap = Nansatmap(n2, resolution='l')

# Add SST data as a raster layer, then the corresponding colorbar
nmap.imshow(analysed_sst, vmin=270, cmap='viridis')
nmap.add_colorbar(shrink=0.5, pad=0.05)

# Add the ice concentration as a raster layer
nmap.imshow(ice_conc, vmin=0, vmax=100, cmap='bone')

# Add the ice drift data as vectors
nmap.quiver(dX, dY, step=5)

# Decorate the map with a grid
nmap.drawmeridians([-20, 0, 20, 40],
                   labels=[False, True, False, True], fontsize=6)
nmap.drawparallels([75, 80, 85],
                   labels=[True, False, True, False], fontsize=6)

# Save the image as a graphical file
nmap.save('nmap_example5.png', dpi=300)
plt.close('all')
The authors declare that they have no competing interests.
Beal, F, Stroud, F and Shread, P (2016). WEBOPEDIA. URL: http://www.webopedia.com/TERM/S/spatial_data.html.
Korosov, A, Counillon, F and Johannessen, J A (2014). Monitoring the spreading of the Amazon freshwater plume by MODIS, SMOS, Aquarius, and TOPAZ. Journal of Geophysical Research: Oceans 120(1). DOI: https://doi.org/10.1002/2014JC010155
ERDAS IMAGINE (2016). URL: http://www.hexagongeospatial.com/products/producer-suite/erdas-imagine.
Bilko software for ESA LearnEO! (2012). URL: http://www.learn-eo.org/software.php.
ArcGIS (2010). URL: https://www.arcgis.com.
Toolboxes Sentinel Online – ESA (2014). URL: https://sentinel.esa.int/web/sentinel/toolboxes.
Feldman, G C (2002). SeaDAS Home Page. URL: http://seadas.gsfc.nasa.gov.
“Low-level programming language” (2016). WIKIPEDIA. URL: https://en.wikipedia.org/wiki/Low-level_programming_language.
Rossum, G van (1997). Scripting the Web with Python. In: Scripting Languages: Automating the Web, World Wide Web Journal 2(2). O’Reilly. URL: https://www.python.org/.
Oliphant, T et al. (2016). Numpy. URL: http://www.numpy.org/.
Oliphant, T, Peterson, P, Jones, E et al. (2016). Scipy. URL: https://www.scipy.org/.
Hunter, J D et al. (2016). Matplotlib. URL: http://matplotlib.org/.
Korosov, A A, Pozdnyakov, D V and Grassl, H (2012). Spaceborne quantitative assessment of dissolved organic carbon fluxes in the Kara Sea. Advances in Space Research 50: 1173–1188, DOI: https://doi.org/10.1016/j.asr.2011.10.008
Muckenhuber, S, Nilsen, F, Korosov, A and Sandven, S (2016). Sea ice cover in Isfjorden and Hornsund, Svalbard (2000–2014) from remote sensing data. The Cryosphere 10: 149–158, DOI: https://doi.org/10.5194/tc-10-149-2016
Hansen, M W, Collard, F, Dagestad, K F, Johannessen, J A, Fabry, P and Chapron, B (2011). Retrieval of Sea Surface Range Velocities From Envisat ASAR Doppler Centroid Measurements. IEEE Transactions on Geoscience and Remote Sensing 49(10): 3582–3592. DOI: https://doi.org/10.1109/TGRS.2011.2153864
Rouault, E, Jolma, A and Baryshnikov, D (2004). GDAL Geospatial Data Abstraction Library. URL: http://www.gdal.org/.
Sutton, T, Dassau, O and Sutton, M (2015). QGIS A Free and Open Source Geographic Information System. URL: http://www.qgis.org/.
GRASS Development Team (2015). Geographic Resources Analysis Support System (GRASS) Software, Version 7.0 In: Open Source Geospatial Foundation. URL: http://grass.osgeo.org.
Eaton, B, Gregory, J, Drach, B, Taylor, K, Hankin, S, Caron, J, Signell, R, Bentley, P, Rappa, G, Höck, H, Pamment, A and Juckes, M (2011). NetCDF Climate and Forecast (CF) Metadata Conventions, Version 1.6. URL: http://cfconventions.org/.
Wharton, S (2016). National Aeronautics and Space Administration (NASA), Global Change Master Directory (GCMD), Version 10.0. URL: http://gcmd.nasa.gov/.
Rouault, E (2004). GDAL Virtual Format Tutorial, 2004. URL: http://www.gdal.org/gdal_vrttut.html.
Rouault, E, Jolma, A and Baryshnikov, D (2004). GDAL Dataset Class Reference. URL: http://www.gdal.org/classGDALDataset.html.
Stevens, W P, Myers, G J and Constantine, L L (1974). “Structured design”. IBM Systems Journal 13(2): 115–139. DOI: https://doi.org/10.1147/sj.132.0115
Basemap Toolkit documentation (2012). URL: http://matplotlib.org/basemap/.
Merwin, N, Donahoe, L and McAngus, E (2016). Coverage of nansat at github. URL: https://coveralls.io/github/nansencenter/nansat.
Travis CI (2011). URL: https://travis-ci.org.
Unit testing framework (2014). URL: https://docs.python.org/2/library/unittest.html.
Continuum Analytics (2015). Anaconda Cloud. URL: https://anaconda.org.
Gillingham, S et al. (2016). A conda-smithy repository for gdal. URL: https://github.com/conda-forge/gdal-feedstock.
Lundh, F, Clark, A et al. (2016). Pillow. URL: https://python-pillow.org/.
Korosov, A A, Hansen, M W and Vines, A (2016). Python Thesaurus Interface. URL: https://github.com/nansencenter/py-thesaurus-interface.
Korosov, A A and Hansen, M W (2016). Nansat-lectures. URL: https://github.com/nansencenter/nansat-lectures.
Korosov, A A, Dagestad, K-F and Hansen, M W (2014). How to create a mapper. URL: https://github.com/nansencenter/nansat/wiki/How-to-create-a-mapper.
Meadows, P (2015). Sentinel-1A Annual Performance Report for 2015, CLS Technical report #MPC0139 In: CLS. URL: https://earth.esa.int/documents/247904/1814124/Sentinel-1A-Annual-Performance-Report-2015.
Frazier, S (2016). MODIS Moderate Resolution Imaging Spectroradiometer. URL: http://modis.gsfc.nasa.gov/.