We describe an

When dealing with stochastic series of data measurements, standard statistical tools, such as mean and centered moments, are able to catch the essential features of thedistribution of observed values. Last end, sufficient high-order moments will retrieve a good approximation of the probability density function (PDF) associated with the stochastic process. However, PDFs are not able to fully characterize the dynamics underlying the process. A typical example is the Gaussian distribution: if the stochastic variable assumes values according to a Gaussian distribution, the dynamics producing such distribution of values can be as simple as an Ornstein-Uhlenbeck process [

Starting from a stochastic differential equation, a process can be statistically reconstructed through simple stochastic integration. The inverse problem however is much more complicated: would a set of measurements be enough for a bottom-up approach to infer the underlying dynamics of the process? The short answer is yes, there are cases where this is possible. In this paper we present the long answer implemented as a package for

A wide range of dynamical systems can be described by a stochastic differential equation, the (non-linear) Langevin equation (cf. [

Consider a general stochastic trajectory ^{(1)}(^{(2)}(

where the square root is taken for consistency, as will be clear below. We assume stationary time series here, so ^{(1)} and ^{(2)} are not time dependent but we show briefly how non-stationary time series can be treated in Section “A glimpse beyond the

The Langevin equation should be interpreted as follows: for every time ^{(1)}(

where η(

Functions ^{(1)}(^{(2)}(^{(2)} is explicitly depending on

In all cases, through substitution of the selected functions into Equation 2 one is able to generate samples of series having the same statistical features and obeying the same dynamics.

Figure ^{(1)}(^{3} + ^{(2)} (^{2} + 1. Notice that these drift and diffusion coefficients describe a non-trivial dynamics, namely the underlying deterministic process, i.e., ^{(2)} ≡ 0, has two attractive fixed points at ^{(2)} ≠ 0) which is able to push the system from one stable fixed point to the other. As shown in Figure ^{(1)} = –^{(2)} = 1.

(a) Sketch of a stochastic process in time governed by a cubic drift and quadratic diffusion contributions and (b) its corresponding probability density function (PDF). Though the series shows a bistable dynamics (cubic drift) the PDF follows a Gaussian function, equivalent to an Ornstein-Uhlenbeck process (see text).

This is one of many possible examples that illustrates the deep insight, which an evolution equation like in Equation 1 can provide and which is not obtained by looking at a density distribution, see Appendix for further details.

As explained previously, it is easy to generate data through the integration of a stochastic equation, such as Equation 2. More difficult is the inverse problem, to derive functions ^{(1)} and ^{(2)} from given data.

A condition to derive the drift and diffusion numerically is that the time-steps τ of the set of

Therefore, having a series of data, one estimates the drift and diffusion by computing the averages of the first and second power of the differences between

where ⟨·⟩ represents the average over time

which means that they are given by derivatives of the corresponding conditional moments ^{(n)}(

Figures

One-dimensional Langevin Approach: (a) drift coefficient, ^{(1)}(^{3} + ^{(2)}(^{2} + 1. Circles indicate the numerical results while the red dashed line indicates the theoretical coefficient, used when generating the synthetic data. Here 10^{7} data points from the series illustrated in Figure

The Langevin Approach summarized previously is applied under a few conditions, though, as we discuss afterwards, when such conditions are not fulfilled in many cases it is still possible to overcome that and apply an alternative approach which also retrieves the dynamics underlying the stochastic process. For the completeness of this paper and for the consistency of the application of our

The first condition is that the data series is stationary. Indeed, the averages for computing the conditional moments have to be taken over all _{i}_{i}

The second condition is that the process should be Markovian, i.e., the present state should depend on the previous state solely. Mathematically it means an equivalence between two-point statistics,

This should hold for any positive integer

To test if the process is Markovian one can also use alternatively the Wilcoxon test [

The third condition to be tested comes from a mathematical result called Pawula Theorem [^{(4)} = 0. To test that one computes its derivative with respect to the time-lag, the fourth coefficient

and checks if it vanishes, i.e., if it is small compared to the diffusion coefficient: ^{(4)}(^{(2)}(^{2} ∀

The tests whether conditions two and three hold ensure that Г(

If all these conditions are fulfilled the Langevin Approach can be carried out and the two functions, drift coefficient ^{(1)} and diffusion coefficient ^{(2)}, can be derived from the given data. With the derived coefficients two additional cross-checking tests can be done.

The first one is to check if the stochastic force in Equation 1 fulfills the two conditions of a δ-correlated Gaussian white noise. To that end, one substitutes in Equation 2 the derived ^{(1)}(^{(2)}(

Taking τ as the time-step of the observed time-series and substituting in

The second cross-checking test is to substitute in Equation 2 the derived ^{(1)}(^{(2)}(

have the same distribution as the original series for a fixed τ spanning from the time-step of the original series up to two or more orders of magnitude larger.

Some extra care should be taken if the derived ^{(1)}(^{(2)}(

Notice that, though the fulfillment of all such conditions through the proposed preliminary tests and cross-checking tests guarantees that the Langevin Approach can be applied, the rejection of one or more of these tests is still no reason for avoiding this approach. In Section “A glimpse beyond the

In this section we present the implementation of the Langevin Approach describing the two available

The one-dimensional case deals with an evolution equation similar to Equation 1 and the two-dimensional case comprehends two stochastic variables, _{1}(_{2}(

where clearly now the drift function ^{(2)} = ^{T}

where η_{1}(_{2}(^{(n)}(_{1}, …, _{N}) and calculating the mean values for each interval _{i}

For estimating the drift and diffusion coefficients from the conditional moments we insert Equation 2 into Equation 11 and apply the conditional averages for n = 1, 2 leading to:

Important to notice is that for ^{(2)}(^{(1)}(

The implementation of the functions heavily relies on the

In the following subsections we present one- and two-dimensional examples of Langevin processes and walk through the analysis based on the framework described in the previous section.

As an example we integrate the Langevin equation illustrated in Figure

The presented package provides the function

Extracting drift and diffusion coefficients from the generated time series is done by the function

The first one is the number of

The second parameter is the vector

From the resulting list

We now want to walk through some of the remarks given in Section “Back to the data” to check if the conditions under which we applied the Langevin Approach are fulfilled. We do not check if the time series is stationary and fulfills the Markovian properties, since here we already know this (as we are using synthetic data).

Therefore we concentrate on cross-checking the estimated drift and diffusion coefficients. For checking if ^{(4)}(^{(2)}(^{(4)} and (^{(2)})^{2}:

The result shows that ^{(4)}(

As a second cross-check we compare the increments, as defined in Equation 8, of the original time series with the ones computed from the reconstructed time series based on the estimated drift and diffusion functions.

To this end we fit a cubic function to the estimated drift coefficient and a quadratic function to the diffusion coefficient:

The resulting coefficients are used to generate a new time series with

We want to emphasize here that the Langevin Approach does not require the drift and the diffusion coefficients to be of any particular functional form, from the estimated coefficients one could directly integrate a stochastic time series which can be used to calculate the increments. We fit the estimated coefficients to polynomials only to be able to use the function

From the original and the reconstructed time series we now calculate PDFs of the increments for different τ and plot them to inspect their agreement visually:

Figure

PDFs of the increments for τ =1, τ =10, τ = 100 and τ = 1000 time lags (from top to bottom). Solid lines show the results for the original time series, broken lines the result for the reconstructed time series.

Notice once again that while the PDF of the series generated by Equation 14 is the same as the one of the simple Ornstein-Uhlenbeck process,

As a two-dimensional example we integrate the coupled Langevin equations in Equations 9 for a particular choice of the drift and diffusion coefficients, namely [

where _{1}, _{2}) for

(a) Trajectory (_{1}(_{2}(^{6} resp. 10^{5} data points where used.

The integration is performed by _{ij}

Estimating the drift and diffusion coefficients is done by

The results shown in Figure

Drift coefficient of (a) the _{1} component, _{2} component,

The numerical results can be properly fitted through the functions used for the integration in Equations 15, namely:

The two examples exposed above show cases where all conditions are fulfilled. When analyzing real empirical data sets this is often not the case: one or more of the conditions under which the Langevin Approach is applied are not met. Still, in the last years we developed different alternatives and extensions to this approach to overcome specific situations in stochastic data analysis. In this section we briefly describe these alternatives and extensions.

One first problem that researchers face is the non-stationary character often appearing in real data. Here, one of two approaches may be possible. One is to ascertain if for “shorter” time-windows of the data series stationarity may be assumed. In case the data set can be decomposed in a series of time-windows which may overlap, each one having more or less constant statistical moments of the observable, the Langevin approach can be applied separately to each one of them, yielding a set of drift and diffusion coefficients, one for each time-window. In the end one extracts one drift and one diffusion coefficient, both functions of the observable and also of time.

Another possibility to handle non-stationary data sets is to check if they can be conditioned to other observables. In that case, considering the periods of the data sets associated to a particular value of the conditioning observable may be itself stationary. This is the case of the stochastic series measured of wind turbines [

Another problem researchers often face are situations where the correlation function of the process is not fully resolved, i.e. the sampling rate of the data is too low. When this is the case, the correlation length is overestimated leading to wrong estimation of the time scale associated with the drift [

The second condition listed above is the Markov property. When the series of measurements fails to fulfill the Markov tests described above, it cannot be reconstructed through stochastic Euler integration since the next state cannot be estimated from the present state alone (see Equation 2). This happens, for instance, when a Markov process is spoiled by additional additive noise when a measurement is taken (see [^{(1)} and ^{(2)} diverge (see Equation 4): when τ → 0 the conditional moment for the measured values (numerator) does not vanish. Still, it is frequently possible to obtain the correct drift and diffusion coefficients for the Markov process ^{(1)}(

A third problem that may appear during preliminary tests of empirical data is the non-vanishing fourth coefficient ^{(4)}. As stated above in Section “Back to the data”, according to Pawula Theorem [^{(1)} and ^{(2)} have not the meaning of drift and diffusion contributions, one can still use both to provide valuable insight about the system being analyzed. One example is the work of Rinn et al. [^{(1)} is a sensitive indicator of the damage and compared to frequency based approaches, like power spectra, which estimate changes of the eigenfrequency of the structure, it is even more sensitive to small damages.

Finally, it is also important to stress that, while the functions of the presented package were prepared for analyzing data series as processes in time, the Langevin Approach can be adapted for analyzing processes in scale. In fact, when the process is not Markovian in time, violating Equation 5, there is the possibility that it is Markovian in “scale”. What does this mean? It means that the increments Δ_{τ} introduced above follow a Markovian process in τ i.e., in time-lags but are instationary. Such analysis in scale is able to reproduce e.g., turbulence energy cascades [

More details on all these extensions and alternatives to the Langevin Approach can be found in Friedrich et al. [

In this paper we present an

The introduced functions serve as a framework to analyze one- and two-dimensional time series. They provide estimation of drift and diffusion coefficients describing the deterministic and the stochastic part of the analyzed process respectively. Integrating Langevin processes numerically enables one for cross-checking the obtained result and for generation of synthetic data sets.

Through illustrative examples we have shown that the Langevin evolution equation is able to uncover complex dynamics, even in cases when the associated statistics is identical to many other stochastic processes.

The presented package can be straightforwardly applied by

Still, additional improvements of the presented functions are possible. For instance, instead of using the common average bin value when performing the binning of the data, one can incorporate a kernel-based regression of such values [

In this appendix we show that a large family of two-point statistical distributions, each one univocaly defining one Langevin equation, Equation 1, corresponds to a one-point statistics given by the standard normal distribution

i.e., a Gaussian distribution with zero mean and unit variance.

To that end, we start with one important remark concerning the evolution equation of one stochastic variable

The stationary solution

For the simple Ornstein-Uhlenbeck process, governed by Equation 1 with ^{(1)} = – ^{(2)} = 1, the stationary PDF reduces to _{0} in Equation 16.

One could, however, consider a much more complex dynamics such as the one exemplified in this paper, with a bistable (cubic) drift coefficient and a non-constant diffusion, depending quadratically on the stochastic variable

Here, ^{(1)} has two stable fixed points at ±^{(2)} has a minimum value

Substituting the cubic drift and the quadratic diffusion, given in Equations 19, into the stationary solution, Equation 18, and integrating, yields the stationary solution:

As one sees, the solution in Equation 20 has, in general, not only a Gaussian part, like Ornstein-Uhlenbeck processes, but also a polynomial part with an exponent depending on all parameters of ^{(1)} and ^{(2)}. However, if the exponent is exactly zero,

the polynomial part vanishes and the stationary solution reduces to the Gaussian distribution. In the example used in Section “Example for analyzing one dimensional data sets” with

In general, the one-point statistic in the stationary regime given by Equation 18 yields Gaussian distributions even in more complex dynamics than the one here chosen. One only needs to have a drift coefficient given by one polynomial of odd degree ^{(1)}/^{(2)} in the stationary solution reduces to a linear function in ^{(1)}, ^{(2)}) yielding the same one- point statistic.

The ambiguity of one-point statistics in characterizing the dynamics of stochastic processes in general, motivates the Langevin Approach implemented in our ^{(1)} and ^{(2)} without any given Ansatz, it can easily distinguish between higher-order drift and diffusion coefficients.

All functions of the presented package have been tested against analytically solvable problems. Both example sections of this paper show how those tests where carried out in principle.

Any system capable of running R ≥ 3.0.2.

R ≥ 3.0.2.

R ≥ 3.0.2, Rcpp ≥ 0.11.0 and RcppArmadillo ≥ 0.4.600.0.

Philip Rinn (Developer)

Pedro G. Lind (Contributed to

David Bastine (Contributed to

English.

The

The source code of these examples is availiable at

The authors thank Constantino Garcia Martínez (University of Santiago de Compostela, Spain) for useful discussions about our methods and motivating in sharing their implementation with a broader audience during his visit to our group.

The authors declare that they have no competing interests.