(1) Overview

Introduction

Seismology provides an attractive tool to investigate physical processes in deforming systems. The seismic signals from active glaciers, for example, could enable monitoring of mechanisms including basal sliding [, , ], fracturing [], meltwater drainage [], and iceberg calving [, ]. The detection of seismic events from the continuous recorded waveform data is a vital first step in any analysis. Event catalogues thus constructed are needed for local seismicity studies, comparisons between locations, and the detection of change over extended time periods. However, the automated detection of seismic events is complicated in environmental and geotechnical seismology by the diverse populations of signal generation mechanisms. Those generated by active glaciers, for example, can be expected to span several orders of magnitude in duration and amplitude []. Volcanoes [], landslides [] and mining activity [] similarly produce a broad range of seismic signals. As a result, the majority of cryoseismology studies to date use manual identification of events [, , ]. Manual techniques are neither readily scalable nor exactly reproducible: in particular, they are not a first choice for monitoring applications nor for the data-driven detection of change.

Established event detection algorithms have largely been developed for earthquake seismology [, , , ]. These algorithms, including STA/LTA (short-term average/long-term average) [] and template matching [], are applied in real time to seismic data to detect earthquakes and produce event catalogues [, , , ]. The core classes in the obspy software package are therefore designed assuming event metadata are available online alongside the waveform data []; additional functions are included outside the core classes to select events directly from waveform data. These standard algorithms are generally only applicable to non-earthquake signals if an experimental approach is taken to parameter selection [, ]. The seismic attributes library provides software tools to download seismic waveform data from online repositories and detect environmental and geotechnical seismic events using a choice of algorithms. The algorithm options include the classic, recursive and delayed STA/LTA algorithms [], and the newly developed multi-STA/LTA algorithm []. The multi-STA/LTA algorithm can simultaneously extract both short- and long-duration events with very different signal-to-noise levels, and potentially enables real-time monitoring of cryoseismic events.
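For illustration, a single-trace detection with one of these standard STA/LTA triggers can be run directly with obspy's built-in routines; the window lengths and thresholds below are arbitrary and would need the experimental, site-specific tuning noted above.

```python
# Illustrative single-trace STA/LTA detection using obspy's standard trigger routines.
# The window lengths and thresholds are arbitrary and must be tuned for each site.
from obspy import read
from obspy.signal.trigger import recursive_sta_lta, trigger_onset

st = read()                                    # obspy example data; substitute local waveforms
tr = st.select(component="Z")[0]               # vertical component
df = tr.stats.sampling_rate

cft = recursive_sta_lta(tr.data, int(1 * df), int(10 * df))   # 1 s STA, 10 s LTA
onsets = trigger_onset(cft, 3.5, 0.5)          # on/off thresholds on the characteristic function
print(onsets)                                  # sample indices of trigger on/off pairs
```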

Event catalogues constructed using the seismic attributes library would be expected to comprise diverse signals generated by a range of mechanisms, as noted above. These signals typically need to be separated into related clusters based on the characteristics of their waveforms in order to study the events further [, ]. For example, Provost et al. [] consider 71 attributes based on waveform data in their study of landslide seismicity. These are broadly split into four categories: (1) waveform attributes (e.g. duration, energy, kurtosis); (2) spectral and spectrogram attributes (e.g. discrete Fourier transform); (3) network attributes (e.g. station with maximum amplitude); and (4) polarity attributes (e.g. azimuth, inclination). The seismic attributes library includes functions to calculate a number of standard signal properties: duration, ratio between ascending and descending time, energy in the autocorrelation function, energy in the frequency-filtered spectrum, and the direction of wave propagation. These are provided in three bundles of attribute functions describing the waveform, spectrum and polarity of the signal (Table 1); we do not include network attributes as these are unordered, discrete variables (and largely application dependent). User-defined functions can be added to derive customised characteristics within our software architecture. The correlation between attributes can be investigated using a plotting function to inform the removal of redundant variables, if appropriate, for the subsequent clustering of a given set of events. Data-driven techniques more generally, such as machine learning algorithms, may be readily applied to the calculated attributes to assess the current and anticipate the future state of the glacier (e.g. to identify signals in the lead-up to a large event []).

Table 1

The attributes included in the three bundles of attribute functions in the seismic attributes library. The first column is the attribute number used by Provost et al. [], the second column provides a description of the attribute, and the final column lists the attribute category, and thus attribute function bundle.


NUMBER | DESCRIPTION | BUNDLE
1 | Duration | Waveform
2 | Ratio of the mean over the maximum of the envelope signal | Waveform
3 | Ratio of the median over the maximum of the envelope signal | Waveform
4 | Ratio between ascending and descending time | Waveform
5 | Kurtosis of the raw signal (peakness of the signal) | Waveform
6 | Kurtosis of the envelope | Waveform
7 | Skewness of the raw signal | Waveform
8 | Skewness of the envelope | Waveform
10 | Energy in the first third of the autocorrelation function | Waveform
11 | Energy in the remaining part of the autocorrelation function | Waveform
12 | Ratio of 11 and 10 | Waveform
13–17 | Energy of the signal filtered in 5–10 Hz, 10–50 Hz, 5–70 Hz, 50–100 Hz, and 5–100 Hz | Waveform
18–22 | Kurtosis of the signal in the 5–10 Hz, 10–50 Hz, 5–70 Hz, 50–100 Hz, and 5–100 Hz frequency ranges | Waveform
24 | Mean of the discrete Fourier transform (DFT) | Spectral
25 | Maximum of the DFT | Spectral
26 | Frequency at the maximum | Spectral
27 | Central frequency of the 1st quartile | Spectral
28 | Central frequency of the 2nd quartile | Spectral
29 | Median of the normalized DFT | Spectral
30 | Variance of the normalized DFT | Spectral
34–37 | Energy in the DFT over the [0, 1/4], [1/4, 1/2], [1/2, 3/4] and [3/4, 1] Nyquist frequency (Nyf) bands | Spectral
38 | ‘Spectral centroid’ (as defined by Provost et al.) | Spectral
39 | Gyration radius | Spectral
40 | Spectral centroid width | Spectral
68 | Rectilinearity | Polarity
69 | Azimuth | Polarity
70 | Dip | Polarity
71 | Planarity | Polarity

Implementation and architecture

The analysis of waveform data from environmental and geotechnical seismic deployments requires multiple distinct steps. Following the principle that one should write programs that do one thing well, and write programs that work together [], the seismic attributes library is correspondingly split into three primary functions to streamline the workflow. The name and purpose of these functions are as follows (a minimal usage sketch is given after the list):

  • get_waveforms(); this function downloads waveform data from an online repository or alternatively reads data stored locally.
  • get_events(); this function uses an event triggering algorithm to produce an event catalogue based on the seismic signals in one or more components of one or more seismic stations.
  • get_attributes(); this function produces a pandas DataFrame of the attributes for each event using functions included in the library, or user-defined attribute functions.
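A minimal end-to-end sketch of the workflow is given below. The function names are those of the library, but the argument names, values and return structure shown are assumptions for illustration only; the exact signatures are given in the documentation distributed with the software.

```python
# Minimal sketch of the three-step workflow. The argument names and return structure are
# illustrative assumptions, not the verified API; consult the seismic_attributes
# documentation for the exact signatures.
import seismic_attributes as sa
from obspy import UTCDateTime

t1 = UTCDateTime("2018-01-01T00:00:00")
t2 = UTCDateTime("2018-01-02T00:00:00")

# Step 1: download (or read locally stored) waveform data for one seismometer.
stream = sa.get_waveforms("DK", "ILULI", "", "BH*", t1, t2)       # hypothetical arguments

# Step 2: build the reference event and trace (metadata) catalogues (pandas DataFrames).
events, traces = sa.get_events(stream, t1, t2)                    # assumed return values

# Step 3: calculate attributes for every catalogued event.
attributes = sa.get_attributes(events, stream, sa.waveform_attributes)  # assumed bundle name
```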

The library includes the numerous attribute functions that are called by get_attributes(). Additional functions are also included to analyse the output of the primary functions (e.g. plotting), together with several private functions that support the primary workflow (documented in the code itself). The implementation of the three primary functions is outlined below.

get_waveforms

Seismic datasets can build to a high data volume (several terabytes) if high sampling rates are required over extended periods of time, or if using data from seismic arrays with multiple seismometers. This can result in memory problems when processing the waveforms, as well as disk space limitations. The first function in our workflow therefore acts as a gateway to manage the volume of data downloaded from online repositories using obspy clients. Waveform data from each seismometer are written into separate files, each comprising only a single day of data for a single component (e.g. vertical, north, east). The user can specify the write location of the files to split different time periods across several external drives. The same function checks whether the requested data have previously been downloaded (by default), and can optionally skip this check or skip downloading missing data (e.g. in the case of known data gaps). The waveform data are written as .mseed files using standard obspy functions.

The get_waveforms function recombines downloaded or locally stored files into a single obspy Trace object per component for the requested time period, thus preparing the downloaded files for the next steps in the workflow. The user should request small subsets of the time period of interest (e.g. a week at a time), chosen based on factors including data volume and computer hardware performance. The seismic waveforms for the requested time period and components of a given seismometer are output as an obspy Stream object. The streams from different seismometers are combined using the ‘+’ operator into a single object.
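As an illustration of this final step, streams requested for two seismometers can be merged with the standard obspy ‘+’ operator; the get_waveforms arguments below are assumptions for illustration (the station codes follow Figure 2).

```python
# Combining the streams returned for two seismometers (station codes as in Figure 2) into a
# single object with obspy's '+' operator. The get_waveforms arguments are illustrative
# assumptions, not the verified signature.
import seismic_attributes as sa
from obspy import UTCDateTime

t1 = UTCDateTime("2010-12-16T00:00:00")
t2 = UTCDateTime("2010-12-17T00:00:00")

st_bb01 = sa.get_waveforms("XX", "BB01", "", "HH*", t1, t2)   # hypothetical network/channel codes
st_bb03 = sa.get_waveforms("XX", "BB03", "", "HH*", t1, t2)

st_all = st_bb01 + st_bb03   # standard obspy Stream addition: the traces are concatenated
```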

get_events

As noted in the introduction, several algorithms exist to trigger events from single-component waveform data. The get_events code first separates the signals from multiple seismometers and combines their (usually three) components into a single waveform. The user can optionally specify whether the component waveforms are combined as the sum-of-squares of their amplitude (i.e. Euclidean norm) to give the wave energy, or as the absolute value of the wave amplitude. Further, to ensure that taking the absolute value of the amplitude does not affect the results (e.g. by doubling the apparent frequency), we tested a computationally intensive option that fits the (time-varying) principal component of the direction of wave oscillation to obtain a signed amplitude; this gave identical event detections to the absolute amplitude and so is not included in the published version of the code due to its significantly longer computation time.
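The two combination options can be written down directly; the short sketch below illustrates the idea for a three-component obspy Stream and is a conceptual example rather than the library's internal implementation.

```python
# Conceptual illustration of combining the vertical, north and east components of one
# seismometer; this is not the library's internal code, only the idea described above.
import numpy as np
from obspy import read

st = read()   # obspy example data containing Z, N and E components
z, n, e = (st.select(component=c)[0].data.astype(float) for c in ("Z", "N", "E"))

energy = z**2 + n**2 + e**2       # sum-of-squares combination (wave energy)
amplitude = np.sqrt(energy)       # Euclidean norm, i.e. the absolute wave amplitude
```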

We use an adapted version of the obspy coincidence_trigger function to create a reference catalogue of events based on the STA/LTA characteristic function of the combined component waveform at each seismometer. Our version of the coincidence_trigger function includes adjustments to the algorithm to better align with the outputs needed in our workflow, and improvements to computational performance that are possible for our narrower use of this function (see Figure 1 for a schematic of the algorithm). This function can use the standard STA/LTA algorithms available within obspy [], and optionally the recently developed multi-STA/LTA algorithm []. In all cases, location-specific algorithm parameters are likely to be more successful for environmental seismology applications. The start and stop times of an event detected at a single seismometer can be extended to span small gaps between triggers (up to length thr_event_join), creating a single, longer-duration record. The reference event catalogue is created by finding times when n (i.e. thr_coincidence_sum) of the N seismometers in the array have temporally coincident records. The physical dimensions of the seismic array are calculated at this step to estimate an upper bound on the delay in arrival time of the signal at different seismometers. This delay is added to the duration of the single-seismometer records to ensure the reference events are not artificially shortened. The reference event catalogue, which includes a reference start time and duration for each event, is output as a pandas DataFrame to provide a convenient format to write to file and interrogate.
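An indicative call is sketched below. The parameters thr_event_join and thr_coincidence_sum are named in the text above, but the remaining argument names, values and the returned objects are assumptions for illustration.

```python
# Indicative get_events call for an array of seismometers. Only thr_event_join and
# thr_coincidence_sum are parameter names taken from the text; all other argument names,
# the option string and the return structure are assumptions for illustration.
import seismic_attributes as sa

events, traces = sa.get_events(
    st_all,                      # combined Stream from several seismometers (see above)
    trigger_type="recstalta",    # assumed option name for the recursive STA/LTA algorithm
    thr_coincidence_sum=2,       # n of the N seismometers required for a reference event
    thr_event_join=10,           # maximum gap (in seconds) joined between triggers
)
events.to_csv("reference_events.csv")   # the pandas DataFrame is easy to write to file
```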

Figure 1 

Top: Diagrammatic representation of the algorithm used to identify events from a single seismometer comprising one or more channels (traces show their Euclidean norm). The characteristic function of an STA/LTA algorithm is used to ‘trigger’ events (shown in red), and small gaps between these events (shorter than thr_event_join) are ignored (orange). No events are present at other times (shown in green). Bottom: Representation of the algorithm used to identify events in the ‘reference event’ and ‘trace’ catalogues for seismic arrays (with multiple seismometers). The example shows an indicative array with N = 3 seismometers and thr_coincidence_sum = 2 simultaneous detections required. Events identified from the single-seismometer traces are shown in red (as given in the top panel). The duration of these events is extended at each end by half the delay in arrival time of a wave between the most distant seismometers in the array (shown in deep red). The reference event is identified as the times when thr_coincidence_sum = 2 seismometers have a detection (shown in red or deep red); small gaps with fewer seismometers are joined over (shown in orange). The events at each seismometer are simply the events identified from the single-seismometer traces (red but not deep red), but with any times between events that occur during the reference event also included (shown in grey).

Our adapted version of the coincidence_trigger function provides additional utility for other researchers by outputting a secondary catalogue of trace metadata, for each seismometer, for all identified reference events. The triggered records at each seismometer that occur during (at least) part of the time period covered by a given reference event are joined together (if not already a single record). This record may extend beyond the time period of the reference event, be contained entirely within it, or have no detection at all in the case of weaker events. This trace (metadata) catalogue, which includes the trigger start time and duration for all seismometers, is similarly output as a pandas DataFrame. The trace metadata can be used to extract waveform data for identified events in the catalogue in a format that will be familiar to seismologists, for example, as shown in Figure 2 for a low-frequency event detected using four seismometers on the Whillans Ice Stream in Antarctica from 13:35:37 on 16 December 2010.
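For example, a row of the trace catalogue can be used to slice and plot the corresponding waveforms, following the windowing used in Figure 2; the column names below are assumptions for illustration.

```python
# Slicing the waveforms of one catalogued event from its trace metadata. The DataFrame
# column names ('starttime', 'duration') are assumptions for illustration; the catalogue
# written by get_events may use different names.
from obspy import UTCDateTime

row = traces.iloc[0]                               # first entry of the trace catalogue
t_on = UTCDateTime(row["starttime"])
t_off = t_on + float(row["duration"])

event_st = st_all.slice(t_on - 30, t_off + 60)     # 30 s pre-event, 60 s post-event (as in Figure 2)
event_st.plot()
```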

Figure 2 

Waveforms of an event detected using four seismometers (BB01, BB03, BB04 and BB06) comprising part of a seismic array on the Whillans Ice Stream in Antarctica from 13:35:37 on 16 December 2010. The top, middle and bottom panels show the vertical, north and east component of the signal respectively. The waveforms for each seismometer start 30 seconds prior to the start time in the trace catalogue and terminate 60 seconds after the stop time. The code to reproduce this plot is provided as a worked example.

get_attributes

The get_attributes function is written following the further principles [] that one should write flexible and open programs. This function extracts the waveform data for a given seismometer for the duration of a given event listed in the reference event or trace (metadata) catalogue. The waveforms for each component are stored as separate Trace objects in a single stream. This Stream object is passed to one or more attribute functions to derive the characteristics of the given event as measured by a given seismometer. The values of the requested attributes for the array of seismometers are output as a pandas DataFrame. The correlation between the spectral attributes included in the library is shown in Figure 3 for events detected at Ilulissat, Greenland (DK.ILULI) on 1 January 2018 using the recursive STA/LTA algorithm.
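An indicative call using the three attribute-function bundles of Table 1 is sketched below; the bundle names and argument order are assumptions for illustration.

```python
# Indicative get_attributes call using the three bundles of Table 1. The bundle names and
# the argument order are illustrative assumptions, not the verified API.
import seismic_attributes as sa

attributes = sa.get_attributes(
    events,                      # reference event (or trace) catalogue from get_events
    st_all,                      # waveform data covering the catalogued events
    sa.waveform_attributes,      # assumed name of the waveform bundle
    sa.spectral_attributes,      # assumed name of the spectral bundle
    sa.polarity_attributes,      # assumed name of the polarity bundle
)
print(attributes.head())         # pandas DataFrame of attribute values
```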

Figure 3 

Corner (or pair) plot illustrating the correlation between spectral attributes for the events detected at Ilulissat, Greenland on 1 January 2018 using the recursive STA/LTA algorithm. The attribute names are defined to match those used by Provost et al. []; see Table 1 for a description of each attribute. In this example, attribute 26 considers a frequency range close to the sampling rate of the recorded signal; it therefore has a null value for all events. The code to reproduce this plot is provided as a worked example in the detailed documentation provided with the software.

Optionally, custom attribute functions may be added to calculate any chosen characteristic of a waveform. Custom functions must take a Stream object containing one or more components as an input, and output the attribute name and value. Public functions to combine the component waveforms into the wave energy or absolute value of the wave amplitude are included in the seismic attributes library to aid the user. The attribute functions (as many as are required) are passed to the get_attributes function as optional parameters, with return values stored as above.
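A minimal custom attribute function, written to one plausible reading of this contract (returning a name and value pair), might look as follows; the function itself is a hypothetical example.

```python
# Hypothetical custom attribute function: it takes an obspy Stream containing one or more
# components and returns the attribute name together with its value. The (name, value)
# return format is one plausible reading of the contract described above.
import numpy as np

def peak_amplitude(stream):
    """Maximum absolute amplitude across all components of the event waveform."""
    value = max(np.abs(tr.data).max() for tr in stream)
    return "peak_amplitude", value

# The function can then be passed to get_attributes alongside the built-in bundles, e.g.
# attributes = sa.get_attributes(events, st_all, peak_amplitude)
```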

Quality control

We have tested the seismic attributes library to find any software bugs and to ensure outputs are reproducible by other researchers running the code or using different platforms []. We first created test cases based on glacier seismic signals from a field campaign seismic array on the Whillans Ice Stream in West Antarctica []. Plausible uses of the code were tested by selecting some or all of the seismometers in the array, some or all of the three components of the signal recorded at each seismometer (i.e. vertical, north and east), and considering different durations of requested time (with and without small data gaps). The optional function inputs were also tested in this manner. In application, e.g. [], the event catalogue produced using the seismic attributes workflow compared favourably to events that were visually classified using the same set of seismometers []. Finally, the attribute functions included in the library were tested on mock waveform data to ensure they produce the theoretically expected results.

Testing was primarily carried out using a Python 3.8 installation on a macOS 10/11 system. The key features of the seismic attributes library were also tested on other platforms for compatibility, including both Python 2.7 and 3.7/3.8, with the most up-to-date versions of obspy, numpy, pandas and seaborn available for each installation. The code was further tested on both macOS 10/11 and Microsoft Windows operating systems, in particular to verify functionality across their different file systems. The reference event and trace (metadata) catalogues were compared between these platforms to assess the consistency of code functionality.

(2) Availability

Operating system

macOS 10/11, GNU/Linux and Microsoft Windows.

Programming language

Python 2 or 3.

Additional system requirements

Memory and disk space will limit the volume of data that can be processed in a contiguous chunk; the code will work on any system that can support obspy. Internet access is required to download new waveform data from online repositories.

Dependencies

The minimal dependency for use is Python 2/3 with obspy, numpy, pandas and seaborn packages installed.

Software location

Archive

Name: An ObsPy library for event detection and seismic attribute calculation: preparing waveforms for automated analysis

Persistent identifier: https://doi.org/10.5281/zenodo.5496794

Licence: GPL version 3

Publisher: Zenodo

Date published: 10/09/2021

Code repository

Name: seismic_attributes

Persistent identifier: https://github.com/rossjturner/seismic_attributes

Licence: GPL version 3

Date published: 10/09/2021

Language

Python 2 or 3.

(3) Reuse potential

The main purpose of the seismic attributes library is to provide a streamlined workflow for researchers to detect and categorise seismic events from environmental and geotechnical sources in a rigorous and reproducible manner. The code will find extensive use in areas of seismology where a diverse population of seismic signals is present, including glaciers, volcanoes, landslides and mine sites. The get_events function is expected to be especially useful, in particular in glacier seismology, to provide a consistent method for creating event catalogues for ongoing machine learning and conventional seismological analysis. We anticipate that our get_attributes function may also be useful in applications outside seismology to aid in the construction of a catalogue of waveform attributes for use in machine learning. The code would need only minor modifications to handle time series data, outside of seismology, stored in different formats. It could conceivably find applications in other areas of science (e.g. astrophysics), in economics, and in many other fields. Limited support may be provided by contacting the corresponding author.