(1) Overview

Introduction

Seismology provides an attractive tool to investigate physical processes in deforming systems. The seismic signals from active glaciers, for example, could enable monitoring of mechanisms including basal sliding [, , ], fracturing [], meltwater drainage [], and iceberg calving [, ]. The detection of seismic events from the continuous recorded waveform data is a vital first step in any analysis. Event catalogues thus constructed are needed for local seismicity studies, comparisons between locations, and the detection of change over extended time periods. However, the automated detection of seismic events is complicated in environmental and geotechnical seismology by the diverse populations of signal generation mechanisms. Those generated by active glaciers, for example, can be expected to span several orders of magnitude in duration and amplitude []. Volcanoes [], landslides [] and mining activity [] similarly produce a broad range of seismic signals. As a result, the majority of cryoseismology studies to date use manual identification of events [, , ]. Manual techniques are neither readily scalable nor exactly reproducible: in particular, they are not a first choice for monitoring applications nor for the data-driven detection of change.

Established event detection algorithms have largely been developed for earthquake seismology [, , , ]. These algorithms, including STA/LTA (short-term average/long-term average) [] and template matching [], are applied in real time to seismic data to detect earthquakes and produce event catalogues [, , , ]. The core classes in the obspy software package are therefore designed assuming event metadata are available online alongside the waveform data []; additional functions are included outside the core classes to select events directly from waveform data. These standard algorithms are generally only applicable to non-earthquake signals if an experimental approach is taken to parameter selection [, ]. The seismic attributes library provides software tools to download seismic waveform data from online repositories and detect environmental and geotechnical seismic events using a choice of algorithms. The algorithm options include the classic, recursive and delayed STA/LTA algorithms [], and the newly developed multi-STA/LTA algorithm []. The multi-STA/LTA algorithm can simultaneously extract both short- and long-duration events with very different signal-to-noise levels, and potentially enables real-time monitoring of cryoseismic events.
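For illustration, a single-trace detection with one of these standard STA/LTA triggers can be run directly with obspy's built-in routines; the window lengths and thresholds below are arbitrary and would need the experimental, site-specific tuning noted above.

```python
# Illustrative single-trace STA/LTA detection using obspy's standard trigger routines.
# The window lengths and thresholds are arbitrary and must be tuned for each site.
from obspy import read
from obspy.signal.trigger import recursive_sta_lta, trigger_onset

st = read()                                    # obspy example data; substitute local waveforms
tr = st.select(component="Z")[0]               # vertical component
df = tr.stats.sampling_rate

cft = recursive_sta_lta(tr.data, int(1 * df), int(10 * df))   # 1 s STA, 10 s LTA
onsets = trigger_onset(cft, 3.5, 0.5)          # on/off thresholds on the characteristic function
print(onsets)                                  # sample indices of trigger on/off pairs
```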

Event catalogues constructed using the seismic attributes library would be expected to comprise diverse signals generated by a range of mechanisms, as noted above. These signals typically need to be separated into related clusters based on the characteristics of their waveforms in order to study the events further [, ]. For example, Provost et al. [] consider 71 attributes based on waveform data in their study of landslide seismicity. These are broadly split into four categories: (1) waveform attributes (e.g. duration, energy, kurtosis); (2) spectral and spectrogram attributes (e.g. discrete Fourier transform); (3) network attributes (e.g. station with maximum amplitude); and (4) polarity attributes (e.g. azimuth, inclination). The seismic attributes library includes functions to calculate a number of standard signal properties: duration, ratio between ascending and descending time, energy in the autocorrelation function, energy in the frequency-filtered spectrum, and the direction of wave propagation. These are provided in three bundles of attribute functions describing the waveform, spectrum and polarity of the signal (Table 1); we do not include network attributes as these are unordered, discrete variables (and largely application dependent). User-defined functions can be added to derive customised characteristics within our software architecture. The correlation between attributes can be investigated using a plotting function to inform the removal of redundant variables, if appropriate, for the subsequent clustering of a given set of events. Data-driven techniques more generally, such as machine learning algorithms, may be readily applied to the calculated attributes to assess the current and anticipate the future state of the glacier (e.g. to identify signals in the lead-up to a large event []).

Table 1

The attributes included in the three bundles of attribute functions in the seismic attributes library. The first column is the attribute number used by Provost et al. [], the second column provides a description of the attribute, and the final column lists the attribute category, and thus attribute function bundle.


NUMBER | DESCRIPTION | BUNDLE
1 | Duration | Waveform
2 | Ratio of the mean over the maximum of the envelope signal | Waveform
3 | Ratio of the median over the maximum of the envelope signal | Waveform
4 | Ratio between ascending and descending time | Waveform
5 | Kurtosis of the raw signal (peakness of the signal) | Waveform
6 | Kurtosis of the envelope | Waveform
7 | Skewness of the raw signal | Waveform
8 | Skewness of the envelope | Waveform
10 | Energy in the first third of the autocorrelation function | Waveform
11 | Energy in the remaining part of the autocorrelation function | Waveform
12 | Ratio of 11 and 10 | Waveform
13–17 | Energy of the signal filtered in 5–10 Hz, 10–50 Hz, 5–70 Hz, 50–100 Hz, and 5–100 Hz | Waveform
18–22 | Kurtosis of the signal in the 5–10 Hz, 10–50 Hz, 5–70 Hz, 50–100 Hz, and 5–100 Hz frequency ranges | Waveform
24 | Mean of the discrete Fourier transform (DFT) | Spectral
25 | Maximum of the DFT | Spectral
26 | Frequency at the maximum | Spectral
27 | Central frequency of the 1st quartile | Spectral
28 | Central frequency of the 2nd quartile | Spectral
29 | Median of the normalized DFT | Spectral
30 | Variance of the normalized DFT | Spectral
34–37 | Energy in the DFT over the [0, 1/4], [1/4, 1/2], [1/2, 3/4] and [3/4, 1] Nyquist frequency (Nyf) bands | Spectral
38 | ‘Spectral centroid’ (as defined by Provost et al.) | Spectral
39 | Gyration radius | Spectral
40 | Spectral centroid width | Spectral
68 | Rectilinearity | Polarity
69 | Azimuth | Polarity
70 | Dip | Polarity
71 | Planarity | Polarity

Implementation and architecture

The analysis of waveform data from environmental and geotechnical seismic deployments requires multiple distinct steps. Following the principle that one should write programs that do one thing well, and write programs that work together [], the seismic attributes library is correspondingly split into three primary functions to streamline the workflow. The name and purpose of these functions are as follows (a minimal usage sketch is given after the list):

  • get_waveforms(); this function downloads waveform data from an online repository or alternatively reads data stored locally.
  • get_events(); this function uses an event triggering algorithm to produce an event catalogue based on the seismic signals in one or more components of one or more seismic stations.
  • get_attributes(); this function produces a pandas DataFrame of the attributes for each event using functions included in the library, or user-defined attribute functions.
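A minimal end-to-end sketch of the workflow is given below. The function names are those of the library, but the argument names, values and return structure shown are assumptions for illustration only; the exact signatures are given in the documentation distributed with the software.

```python
# Minimal sketch of the three-step workflow. The argument names and return structure are
# illustrative assumptions, not the verified API; consult the seismic_attributes
# documentation for the exact signatures.
import seismic_attributes as sa
from obspy import UTCDateTime

t1 = UTCDateTime("2018-01-01T00:00:00")
t2 = UTCDateTime("2018-01-02T00:00:00")

# Step 1: download (or read locally stored) waveform data for one seismometer.
stream = sa.get_waveforms("DK", "ILULI", "", "BH*", t1, t2)       # hypothetical arguments

# Step 2: build the reference event and trace (metadata) catalogues (pandas DataFrames).
events, traces = sa.get_events(stream, t1, t2)                    # assumed return values

# Step 3: calculate attributes for every catalogued event.
attributes = sa.get_attributes(events, stream, sa.waveform_attributes)  # assumed bundle name
```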

The library includes the numerous attribute functions that are called by get_attributes(). Additional functions are also included to analyse the output of the primary functions (e.g. plotting), together with several private functions that support the primary workflow (documented in the code itself). The implementation of the three primary functions is outlined below.

get_waveforms

Seismic datasets can build to a high data volume (several terabytes) if high sampling rates are required over extended periods of time, or if using data from seismic arrays with multiple seismometers. This can result in memory problems when processing the waveforms, as well as disk space limitations. The first function in our workflow therefore acts as a gateway to manage the volume of data downloaded from online repositories using obspy clients. Waveform data from each seismometer are written into separate files, each comprising only a single day of data for a single component (e.g. vertical, north, east). The user can specify the write location of the files to split different time periods across several external drives. The same function checks whether the requested data have previously been downloaded (by default), and can optionally skip this check or skip downloading missing data (e.g. in the case of known data gaps). The waveform data are written as .mseed files using standard obspy functions.

The get_waveforms function recombines downloaded or locally stored files into a single obspy Trace object per component for the requested time period, thus preparing the downloaded files for the next steps in the workflow. The user should request small subsets of the time period of interest (e.g. a week at a time), chosen based on factors including data volume and computer hardware performance. The seismic waveforms for the requested time period and components of a given seismometer are output as an obspy Stream object. The streams from different seismometers are combined using the ‘+’ operator into a single object.
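As an illustration of this final step, streams requested for two seismometers can be merged with the standard obspy ‘+’ operator; the get_waveforms arguments below are assumptions for illustration (the station codes follow Figure 2).

```python
# Combining the streams returned for two seismometers (station codes as in Figure 2) into a
# single object with obspy's '+' operator. The get_waveforms arguments are illustrative
# assumptions, not the verified signature.
import seismic_attributes as sa
from obspy import UTCDateTime

t1 = UTCDateTime("2010-12-16T00:00:00")
t2 = UTCDateTime("2010-12-17T00:00:00")

st_bb01 = sa.get_waveforms("XX", "BB01", "", "HH*", t1, t2)   # hypothetical network/channel codes
st_bb03 = sa.get_waveforms("XX", "BB03", "", "HH*", t1, t2)

st_all = st_bb01 + st_bb03   # standard obspy Stream addition: the traces are concatenated
```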

get_events

As noted in the introduction, several algorithms exist to trigger events from single-component waveform data. The get_events code first separates the signals from multiple seismometers and combines their (usually three) components into a single waveform. The user can optionally specify whether the component waveforms are combined as the sum-of-squares of their amplitude (i.e. Euclidean norm) to give the wave energy, or as the absolute value of the wave amplitude. Further, to ensure that taking the absolute value of the amplitude does not affect the results (e.g. by doubling the apparent frequency), we tested a computationally intensive option that fits the (time-varying) principal component of the direction of wave oscillation to obtain a signed amplitude; this gave identical event detections to the absolute amplitude and so is not included in the published version of the code due to its significantly longer computation time.
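The two combination options can be written down directly; the short sketch below illustrates the idea for a three-component obspy Stream and is a conceptual example rather than the library's internal implementation.

```python
# Conceptual illustration of combining the vertical, north and east components of one
# seismometer; this is not the library's internal code, only the idea described above.
import numpy as np
from obspy import read

st = read()   # obspy example data containing Z, N and E components
z, n, e = (st.select(component=c)[0].data.astype(float) for c in ("Z", "N", "E"))

energy = z**2 + n**2 + e**2       # sum-of-squares combination (wave energy)
amplitude = np.sqrt(energy)       # Euclidean norm, i.e. the absolute wave amplitude
```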

We use an adapted version of the obspy coincidence_trigger function to create a reference catalogue of events based on the STA/LTA characteristic function of the combined component waveform at each seismometer. Our version of the coincidence_trigger function includes adjustments to the algorithm to better align with the outputs needed in our workflow, and improvements to computational performance that are possible for our narrower use of this function (see Figure 1 for a schematic of the algorithm). This function can use the standard STA/LTA algorithms available within obspy [], and optionally the recently developed multi-STA/LTA algorithm []. In all cases, location-specific algorithm parameters are likely to be more successful for environmental seismology applications. The start and stop times of an event detected at a single seismometer can be extended to span small gaps between triggers (up to length thr_event_join), creating a single, longer-duration record. The reference event catalogue is created by finding times when n (i.e. thr_coincidence_sum) of the N seismometers in the array have temporally coincident records. The physical dimensions of the seismic array are calculated at this step to estimate an upper bound on the delay in arrival time of the signal at different seismometers. This delay is added to the duration of the single-seismometer records to ensure the reference events are not artificially shortened. The reference event catalogue, which includes a reference start time and duration for each event, is output as a pandas DataFrame to provide a convenient format to write to file and interrogate.
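An indicative call is sketched below. The parameters thr_event_join and thr_coincidence_sum are named in the text above, but the remaining argument names, values and the returned objects are assumptions for illustration.

```python
# Indicative get_events call for an array of seismometers. Only thr_event_join and
# thr_coincidence_sum are parameter names taken from the text; all other argument names,
# the option string and the return structure are assumptions for illustration.
import seismic_attributes as sa

events, traces = sa.get_events(
    st_all,                      # combined Stream from several seismometers (see above)
    trigger_type="recstalta",    # assumed option name for the recursive STA/LTA algorithm
    thr_coincidence_sum=2,       # n of the N seismometers required for a reference event
    thr_event_join=10,           # maximum gap (in seconds) joined between triggers
)
events.to_csv("reference_events.csv")   # the pandas DataFrame is easy to write to file
```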

Figure 1 

Top: Diagrammatic representation of the algorithm used to identify events from a single seismometer comprising one or more channels (traces show their Euclidean norm). The characteristic function of an STA/LTA algorithm is used to ‘trigger’ events (shown in red), and small gaps between these events (shorter than thr_event_join) are ignored (orange). No events are present at other times (shown in green). Bottom: Representation of the algorithm used to identify events in the ‘reference event’ and ‘trace’ catalogues for seismic arrays (with multiple seismometers). The example shows an indicative array with N = 3 seismometers and thr_coincidence_sum = 2 simultaneous detections required. Events identified from the single-seismometer traces are shown in red (as given in the top panel). The duration of these events is extended at each end by half the delay in arrival time of a wave between the most distant seismometers in the array (shown in deep red). The reference event is identified as the times when thr_coincidence_sum = 2 seismometers have a detection (shown in red or deep red); small gaps with fewer seismometers are joined over (shown in orange). The events at each seismometer are simply the events identified from the single-seismometer traces (red but not deep red), but with any times between events that occur during the reference event also included (shown in grey).

Our adapted version of the coincidence_trigger function provides additional utility for other researchers by outputting a secondary catalogue of trace metadata, for each seismometer, for all identified reference events. The triggered records at each seismometer that occur during (at least) part of the time period covered by a given reference event are joined together (if not already a single record). This record may extend beyond the time period of the reference event, be contained entirely within it, or have no detection at all in the case of weaker events. This trace (metadata) catalogue, which includes the trigger start time and duration for all seismometers, is similarly output as a pandas DataFrame. The trace metadata can be used to extract waveform data for identified events in the catalogue in a format that will be familiar to seismologists, for example, as shown in Figure 2 for a low-frequency event detected using four seismometers on the Whillans Ice Stream in Antarctica from 13:35:37 on 16 December 2010.
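For example, a row of the trace catalogue can be used to slice and plot the corresponding waveforms, following the windowing used in Figure 2; the column names below are assumptions for illustration.

```python
# Slicing the waveforms of one catalogued event from its trace metadata. The DataFrame
# column names ('starttime', 'duration') are assumptions for illustration; the catalogue
# written by get_events may use different names.
from obspy import UTCDateTime

row = traces.iloc[0]                               # first entry of the trace catalogue
t_on = UTCDateTime(row["starttime"])
t_off = t_on + float(row["duration"])

event_st = st_all.slice(t_on - 30, t_off + 60)     # 30 s pre-event, 60 s post-event (as in Figure 2)
event_st.plot()
```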

Figure 2 

Waveforms of an event detected using four seismometers (BB01, BB03, BB04 and BB06) comprising part of a seismic array on the Whillans Ice Stream in Antarctica from 13:35:37 on 16 December 2010. The top, middle and bottom panels show the vertical, north and east component of the signal respectively. The waveforms for each seismometer start 30 seconds prior to the start time in the trace catalogue and terminate 60 seconds after the stop time. The code to reproduce this plot is provided as a worked example.

get_attributes

The get_attributes function is written following the further principles [] that one should write flexible and open programs. This function extracts the waveform data for a given seismometer for the duration of a given event listed in the reference event or trace (metadata) catalogue. The waveforms for each component are stored as separate Trace objects in a single stream. This Stream object is passed to one or more attribute functions to derive the characteristics of the given event as measured by a given seismometer. The values of the requested attributes for the array of seismometers are output as a pandas DataFrame. The correlation between the spectral attributes included in the library is shown in Figure 3 for events detected at Ilulissat, Greenland (DK.ILULI) on 1 January 2018 using the recursive STA/LTA algorithm.
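An indicative call using the three attribute-function bundles of Table 1 is sketched below; the bundle names and argument order are assumptions for illustration.

```python
# Indicative get_attributes call using the three bundles of Table 1. The bundle names and
# the argument order are illustrative assumptions, not the verified API.
import seismic_attributes as sa

attributes = sa.get_attributes(
    events,                      # reference event (or trace) catalogue from get_events
    st_all,                      # waveform data covering the catalogued events
    sa.waveform_attributes,      # assumed name of the waveform bundle
    sa.spectral_attributes,      # assumed name of the spectral bundle
    sa.polarity_attributes,      # assumed name of the polarity bundle
)
print(attributes.head())         # pandas DataFrame of attribute values
```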

Figure 3 

Corner (or pair) plot illustrating the correlation between spectral attributes for the events detected at Ilulissat, Greenland on 1 January 2018 using the recursive STA/LTA algorithm. The attribute names are defined to match those used by Provost et al. []; see Table 1 for a description of each attribute. In this example, attribute 26 considers a frequency range close to the sampling rate of the recorded signal; it therefore has a null value for all events. The code to reproduce this plot is provided as a worked example in the detailed documentation provided with the software.

Optionally, custom attribute functions may be added to calculate any chosen characteristic of a waveform. Custom functions must take a Stream object containing one or more components as an input, and output the attribute name and value. Public functions to combine the component waveforms into the wave energy or absolute value of the wave amplitude are included in the seismic attributes library to aid the user. The attribute functions (as many as are required) are passed to the get_attributes function as optional parameters, with return values stored as above.
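A minimal custom attribute function, written to one plausible reading of this contract (returning a name and value pair), might look as follows; the function itself is a hypothetical example.

```python
# Hypothetical custom attribute function: it takes an obspy Stream containing one or more
# components and returns the attribute name together with its value. The (name, value)
# return format is one plausible reading of the contract described above.
import numpy as np

def peak_amplitude(stream):
    """Maximum absolute amplitude across all components of the event waveform."""
    value = max(np.abs(tr.data).max() for tr in stream)
    return "peak_amplitude", value

# The function can then be passed to get_attributes alongside the built-in bundles, e.g.
# attributes = sa.get_attributes(events, st_all, peak_amplitude)
```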

Quality control

We have tested the seismic attributes library to find any software bugs and to ensure outputs are reproducible by other researchers running the code or using different platforms []. We first created test cases based on glacier seismic signals from a field campaign seismic array on the Whillans Ice Stream in West Antarctica []. Plausible uses of the code were tested by selecting some or all of the seismometers in the array, some or all of the three components of the signal recorded at each seismometer (i.e. vertical, north and east), and considering different durations of requested time (with and without small data gaps). The optional function inputs were also tested in this manner. In application, e.g. [], the event catalogue produced using the seismic attributes workflow compared favourably to events that were visually classified using the same set of seismometers []. Finally, the attribute functions included in the library were tested on mock waveform data to ensure they produce the theoretically expected results.

Testing was primarily carried out using a Python 3.8 installation on a macOS 10/11 system. The key features of the seismic attributes library were also tested on other platforms for compatibility, including both Python 2.7 and 3.7/3.8, with the most up-to-date versions of obspy, numpy, pandas and seaborn available for each installation. The code was further tested on both macOS 10/11 and Microsoft Windows operating systems, in particular to verify functionality across their different file systems. The reference event and trace (metadata) catalogues were compared between these platforms to assess the consistency of code functionality.

(2) Availability

Operating system

macOS 10/11, GNU/Linux and Microsoft Windows.

Programming language

Python 2 or 3.

Additional system requirements

Memory and disk space will limit the volume of data that can be processed in a contiguous chunk; the code will work on any system that can support obspy. Internet access is required to download new waveform data from online repositories.

Dependencies

The minimal dependency for use is Python 2/3 with obspy, numpy, pandas and seaborn packages installed.

Software location

Archive

Name: An ObsPy library for event detection and seismic attribute calculation: preparing waveforms for automated analysis

Persistent identifier: https://doi.org/10.5281/zenodo.5496794

Licence: GPL version 3

Publisher: Zenodo

Date published: 10/09/2021

Code repository

Name: seismic_attributes

Persistent identifier: https://github.com/rossjturner/seismic_attributes

Licence: GPL version 3

Date published: 10/09/2021

Language

Python 2 or 3.

(3) Reuse potential

The main purpose of the seismic attributes library is to provide a streamlined workflow for researchers to detect and categorise seismic events from environmental and geotechnical sources in a rigorous and reproducible manner. The code will find extensive use in areas of seismology where a diverse population of seismic signals is present, including glaciers, volcanoes, landslides and mine sites. The get_events function is expected to be especially useful, in particular in glacier seismology, to provide a consistent method for creating event catalogues for ongoing machine learning and conventional seismological analysis. We anticipate that our get_attributes function may also be useful in applications outside seismology to aid in the construction of a catalogue of waveform attributes for use in machine learning. The code would need only minor modifications to handle time series data, outside of seismology, stored in different formats. It could conceivably find applications in other areas of science (e.g. astrophysics), in economics, and in many other fields. Limited support may be provided by contacting the corresponding author.