(1) Overview

Introduction

Function approximation, integration, and optimization are fundamental problems whose numerical solutions are typically computed by iterative algorithms. A crucial question is how and when to stop the computation.

Theoretical error bounds typically contain unknown quantities, such as the norm of the input function. This makes them impractical as stopping criteria.

Therefore, practical algorithms that adapt the computation to the error requirement are often based on heuristics. These include a popular adaptive quadrature algorithm of Shampine [], which is a part of MATLAB [], and the Chebfun library []. Heuristics based on function data tend to lack theoretical support; one does not know when they work and when they do not. A warning against commonly used adaptive quadrature stopping criteria is given by Lyness [].

To address these shortcomings, we have developed adaptive stopping criteria for univariate function approximation, integration, and optimization; and for multivariate integration. We have implemented them in the Guaranteed Automatic Integration Library (GAIL) []. In contrast to other automatic or adaptive algorithms, our GAIL algorithms have theoretical foundations that are detailed in a series of articles and graduate theses [, , , , , , , , , , , , , ].

The underlying idea in our GAIL algorithms is that for reasonable functions what you see is nearly what you get. The initial sampling of the function to be approximated, integrated, or optimized tells us enough about its norm so that we can compute data-driven error bounds. For each algorithm, there is an associated set of reasonable functions corresponding to a cone, C. Mathematically, "cone" means that a constant multiple of every function in C is also in C. For adaptive simple (i.e., independent and identically distributed) Monte Carlo integration algorithms [, ], C corresponds to functions whose kurtosis is no larger than some bound. The bound reflects the user's definition of "reasonable". For adaptive univariate algorithms [, , , , ], C corresponds to functions for which a stronger norm is bounded in terms of a weaker one. For quasi-Monte Carlo integration algorithms [, , , , , ], C corresponds to functions whose Fourier complex exponential or Walsh coefficients decay steadily. For Bayesian quasi-Monte Carlo algorithms [, ], C corresponds to typical (non-outlier) Gaussian processes within the sample space.
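For concreteness, the cone property and the simple Monte Carlo example above can be written compactly as follows, where µ and σ² denote the mean and variance of Y, and κ_max is an illustrative symbol for the user-chosen kurtosis bound:

    $$
    f \in \mathcal{C} \;\Longrightarrow\; cf \in \mathcal{C} \ \text{ for every constant } c,
    \qquad \text{e.g.,}\quad
    \mathcal{C} = \left\{ Y : \operatorname{kurt}(Y) = \frac{\mathbb{E}\bigl[(Y-\mu)^4\bigr]}{\sigma^4} \le \kappa_{\max} \right\}.
    $$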

Implementation and architecture

GAIL includes the following algorithms. All core algorithm names end with "_g" to denote some form of accuracy guarantee. The only exception is meanMC_CLT, a stopping criterion based on the Central Limit Theorem that is included for pedagogical purposes. Figure 1 shows the structure of GAIL.

Figure 1 

Structure of GAIL Algorithms.

  1. One-dimensional algorithms:
    1. funappx_g [, , ]: One-dimensional function approximation on a closed, bounded interval;
    2. funmin_g [, ]: Global minimum value of a univariate function on a closed, bounded interval;
    3. integral_g [, ]: One-dimensional integration on a bounded interval.
  2. Multi-dimensional algorithms:
    1. meanMC_g [, ]: simple Monte Carlo method for estimating the mean of a random variable;
    2. cubMC_g [, ]: simple Monte Carlo method for numerical multiple integration;
    3. cubLattice_g []: Quasi-Monte Carlo method using rank-1 lattice cubature for d-dimensional integration;
    4. cubSobol_g [, , ]: Quasi-Monte Carlo method using Sobol’ cubature for d-dimensional integration;
    5. cubBayesLattice_g []: Bayesian cubature method for d-dimensional integration using lattice points;
    6. cubBayesNet_g [, ]: Bayesian cubature method for d-dimensional integration using Sobol' points;
    7. meanMC_CLT: Monte Carlo method with Central Limit Theorem (CLT) confidence intervals for estimating the mean of a random variable.

Figure 2 shows the architectural design of GAIL algorithms. A GAIL algorithm typically takes in

  • a real-valued function, f,
  • its domain, D (e.g., finite interval or hyperbox),
  • user tolerance, ϵ > 0,
  • an initial number of sample points in D at which f is evaluated, n_0,
  • a maximum number of sample points, n_N, and
  • a maximum number of iterations, I.
Figure 2 

GAIL architectural design. The largest yellow circle contains the compulsory input function f. The other inputs in small yellow circles are typically optional and, when absent, are set to default values in the GAIL algorithms. Each GAIL algorithm is iterative in nature. In the ith iteration, a solution estimate sol_i is computed along with its error estimate e_i, obtained from n_i sampling points in the input domain D. When e_i is no greater than the tolerance ϵ, the GAIL iterations stop and return the final numerical solution sol (in the largest blue circle). Other outputs (in small blue circles) are bundled in a MATLAB structure array.

Among all the inputs, only f is compulsory. The other inputs are optional and implemented with default values as specified in the documentation. We note that in some multiple integration algorithms, the user tolerance may be a generalized error tolerance function, max(ϵ_a, |y| ϵ_r), where ϵ_a > 0 and ϵ_r > 0 are the absolute and relative tolerances, respectively, and y is the unknown true solution. Each algorithm may have its own additional inputs; for instance, a (quasi-)Monte Carlo algorithm typically has an input dimension, d.
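For instance, a hybrid tolerance with ϵ_a = 10⁻⁴ and ϵ_r = 10⁻² could be written as the following MATLAB anonymous function; this is only an illustration of the formula above, not an actual GAIL input:

    % Generalized error tolerance max(eps_a, |y|*eps_r); y, the true solution,
    % is unknown in practice, so GAIL algorithms bound this quantity internally.
    tolfun = @(y) max(1e-4, abs(y) * 1e-2);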

In the ith iteration, a GAIL algorithm evaluates an estimated solution sol_i and its error bound, e_i, using n_i function samples. When e_i ≤ ϵ, the algorithm designates the iteration as the last iteration, l, and returns the outputs sol = sol_l, e = e_l, and an exit flag that indicates algorithmic success, along with other outputs that are specific to the algorithm. Less satisfactory stopping conditions are i == I or n_i == n_N, in which case the returned exit flag notes the non-success. We refer readers to [] for details of the stopping criteria in GAIL's algorithms.
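The iteration just described can be summarized by the following skeleton; this is a sketch only, with placeholder names (gail_style_loop, estimate) and an illustrative sample-size doubling rule rather than actual GAIL internals:

    function [sol, err, exitflag] = gail_style_loop(f, D, abstol, n0, nN, I)
    % Illustrative skeleton of an adaptive GAIL-style stopping rule.
    n = n0;
    for i = 1:I
        [sol, err] = estimate(f, D, n);   % hypothetical helper: sample f at n points in D
        if err <= abstol                  % data-driven error bound meets the tolerance
            exitflag = 0;                 % success
            return
        end
        if n >= nN                        % sample-point budget exhausted
            exitflag = 1;
            return
        end
        n = min(2*n, nN);                 % grow the sample size (illustrative rule)
    end
    exitflag = 2;                         % maximum number of iterations reached
    end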

Each of our key GAIL algorithms, except for cubBayesLattice_g and cubBayesNet_g, can parse inputs following the three Application Programming Interface (API) patterns below, where f is a real-valued MATLAB function or function handle, in_param and out_param are MATLAB structure arrays, and x is an estimated output (see the illustrative sketch after the list):

  1. Ordered input values: [x, out_param] = algo(f, val1, val2, val3,…)
  2. Input structure array: [x, out_param] = algo(f, in_param)
  3. Ordered input values, followed by optional name-value pairs: [x, out_param] = algo(f, 'input1', val1, 'input2', val2, …)
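As a concrete illustration of these three patterns, consider integral_g on [0, 1]. We assume here, for illustration, that the interval endpoints and the absolute tolerance are named a, b, and abstol; consult the documentation of each algorithm for its exact input names and ordering:

    f = @(x) x.^2;                                     % integrand on [0, 1]
    % 1. Ordered input values (assumed order: endpoints, then absolute tolerance)
    [q1, out1] = integral_g(f, 0, 1, 1e-7);
    % 2. Input structure array
    in_param.a = 0; in_param.b = 1; in_param.abstol = 1e-7;
    [q2, out2] = integral_g(f, in_param);
    % 3. Name-value pairs following the function handle
    [q3, out3] = integral_g(f, 'a', 0, 'b', 1, 'abstol', 1e-7);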

cubBayesLattice_g and cubBayesNet_g are implemented with an object-oriented design (whereas the others are modular functions), so their interfaces differ slightly. We refer readers to Table 4 for an example and to the GAIL documentation for details. The three aforementioned input forms still apply, but the outputs of the Bayes methods are [obj, x] instead, where obj is an instance of the object class. To obtain the same output form as the other algorithms, users can perform an additional, optional step: [x, out_param] = compInteg(obj). Another small difference is that the second input parameter, the dimension d, is compulsory in the two Bayes algorithms.
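Based only on the interface facts above (a constructor whose second, compulsory input is the dimension d, outputs [obj, x], and the optional compInteg step), a call might be sketched as follows; further name-value options are omitted:

    f = @(x) sum(x.^2, 2);                  % a simple d-dimensional integrand, vectorized over rows
    d = 3;
    [obj, x] = cubBayesLattice_g(f, d);     % construct the object and obtain an estimate
    [x, out_param] = compInteg(obj);        % optional: recover the usual [x, out_param] form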

We note that almost all high-dimensional integration algorithms in GAIL have been re-implemented (with extensions) in the open-source Python software library, QMCPy [].

In the following, we showcase GAIL’s performance with two examples on univariate function optimization and cubature.

Example 1. We want to find the global minimum of the following function:

(1)
f(x) = −5 exp(−100(x − 0.15)²) − exp(−80(x − 0.65)²) for x ∈ [0, 1].

We plot the function f in (1), along with the sampling points and the best estimates, (x̂, f(x̂)), of the true minimum (x*, f(x*)) from three solvers, MATLAB's fminbnd, Chebfun's min, and GAIL's funmin_g, in Figure 3. For (1), funmin_g automatically samples the function more often in the spiky areas and locates the global minimum accurately. In contrast, MATLAB's fminbnd [, ] returns a local minimum that is not the global minimum; that said, fminbnd is designed to seek a local minimum. Chebfun [] approximates f with Chebyshev polynomials and samples f at Chebyshev points. Its min function can return all the local minima that it finds, from which we extract the global minimum. Both min and funmin_g located the global minimum to the required accuracy of ϵ = 10⁻⁶, with the former being more efficient (fewer sampling points) and the latter more accurate. Table 1 summarizes the solvers' performance, and Table 2 shows the essential code for setting up funmin_g for this example.
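In the spirit of Table 2, a minimal call might look like the sketch below, where we assume the ordered-input form funmin_g(f, a, b, abstol); the exact arguments used in gail_jors_eg1.m may differ:

    f = @(x) -5*exp(-100*(x - 0.15).^2) - exp(-80*(x - 0.65).^2);  % the function in (1)
    epsilon = 1e-6;                                                % required accuracy
    [fmin, out_param] = funmin_g(f, 0, 1, epsilon);                % search on [0, 1]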

Figure 3 

Function f defined in (1), sampling points, and best estimates returned by the solvers MATLAB's fminbnd, Chebfun's min, and GAIL's funmin_g. This figure is reproducible by the MATLAB script, demo_funmin_g2_samplepoints.m, available in GAIL's 'develop' branch at https://github.com/GailGithub/GAIL_Dev/tree/develop/GAIL_Matlab/Papers/GAIL_JORS.

Table 1

Performance of funmin_g, fminbnd, and min with automatic stopping criteria for optimizing the function defined in Example 1. This table is reproducible by the MATLAB script, demo_funmin_g2_samplepoints.m.


METHOD              FUNMIN_G       FMINBND     MIN
|x̂ − x*|            1.0 × 10⁻¹⁰    0.5         1.0 × 10⁻⁸
|f(x̂) − f(x*)|      0              4.0         1.3 × 10⁻⁷
n                   113            10          37
Time (seconds)      0.042          0.048       0.022

Table 2 

Essential code in the MATLAB script, gail_jors_eg1.m, for invoking funmin_g in Example 1.

Example 2. In this example, we compare GAIL's Monte Carlo and quasi-Monte Carlo methods, in a manner similar to Section 4 of [], on the Keister integrals []:

(2)
∫_{[0,1]^d} π^{d/2} cos( √( ½ ∑_{j=1}^{d} [Φ⁻¹(x_j)]² ) ) dx,

where x = (x_1, …, x_d) represents a vector in the d-dimensional unit hypercube and Φ⁻¹ denotes the inverse of the standard normal distribution function. In Table 3, we summarize the performance of the methods MC, Lattice, Sobol, Bayes Lattice, and Bayes Net, which refer to the GAIL cubatures cubMC_g, cubLattice_g, cubSobol_g, cubBayesLattice_g, and cubBayesNet_g, respectively. In the case of d = 3, all five methods succeeded completely, meaning that the absolute error is less than the given tolerance, i.e., |µ − µ̂| ≤ ϵ, where µ̂ is a cubature's approximate value and µ is the true value of (2). In the case of d = 8, the success rate is at least 98% for each GAIL cubature. The fastest method in both cases was cubSobol_g, whereas cubBayesNet_g used the fewest sampling points. cubBayesLattice_g and cubSobol_g achieved the smallest average absolute error for d = 3 and d = 8, respectively. The code in Table 4 shows how the problem with d = 3 is solved with the GAIL solvers.
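To give a flavor of Table 4, the d = 3 Keister integrand in (2) and one possible quasi-Monte Carlo call are sketched below; the assumed argument order cubSobol_g(f, hyperbox, measure, abstol) should be checked against the GAIL documentation:

    d = 3; abstol = 0.005;
    % Keister integrand from (2), vectorized over the rows of x; norminv requires
    % the Statistics and Machine Learning Toolbox.
    keister = @(x) pi^(d/2) * cos(sqrt(sum(norminv(x).^2, 2) / 2));
    hyperbox = [zeros(1, d); ones(1, d)];                   % the unit hypercube [0, 1]^d
    [q, out_param] = cubSobol_g(keister, hyperbox, 'uniform', abstol);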

Table 3

Average performance of cubatures with automatic stopping criteria for estimating the integrals in (2) for 1000 independent runs. These results can be conditionally reproduced with the MATLAB command, KeisterCubatureExampleJORS(1000), in GAIL.


d = 3, ϵ = 0.005

METHOD            MC           LATTICE      SOBOL        BAYES LATTICE   BAYES NET
Absolute Error    1.1 × 10⁻³   5.2 × 10⁻⁴   5.2 × 10⁻⁴   3.4 × 10⁻⁷      5.8 × 10⁻⁴
Tolerance Met     100%         100%         100%         100%            100%
n                 2500000      4100         3900         4100            1800
Time (seconds)    0.1700       0.0097       0.0065       0.0100          0.1200

d = 8, ϵ = 0.050

METHOD            MC           LATTICE      SOBOL        BAYES LATTICE   BAYES NET
Absolute Error    1.2 × 10⁻²   1.4 × 10⁻²   6.9 × 10⁻³   2.1 × 10⁻¹      8.8 × 10⁻³
Tolerance Met     100%         99%          100%         98%             100%
n                 7400000      15000        16000        1000000         8200
Time (seconds)    1.1000       0.0380       0.0240       2.4000          0.3600

Table 4 

Essential code in the MATLAB script, gail_jors_eg2.m, for invoking GAIL’s (Q)MC algorithms in Example 2.

Quality control

Testing of the GAIL library is automated as scheduled tasks. Two kinds of tests are run: fast tests and long tests.

As aptly named, the fast tests take a relatively short time to run so that a user can quickly test the sanity of the library and the installation. Essential capabilities of the algorithms are quickly checked with carefully chosen tests to make sure the new code has not broken existing algorithms.

The long tests are more rigorous use cases that take much longer, up to several hours, to finish. These tests typically include the examples from the papers and theses associated with the GAIL algorithms. The long tests are meant to exercise the features and capabilities of the algorithms that cannot be covered by the fast tests.

Both types of tests are executed on the Karlin computing cluster hosted at the Illinois Institute of Technology (IIT). These machines run CentOS release 6.10. The Portable Batch System (PBS) is used to schedule the tasks. The GAIL library is tested with at least the seven most recent MATLAB versions. The fast tests are automatically run once every day.

The fast tests take less than two minutes to finish in our test setup. The long tests are run every day for at least one version of MATLAB, so that the seven recent MATLAB releases are covered in rotation. As of this writing, both the fast tests and the long tests are run with these MATLAB versions: R2017a, R2017b, R2018a, R2018b, R2019a, R2019b, and R2020a.

Before the tests begin, the 'develop' branch of the GAIL git repository is pulled. The fast tests are run first, followed by the long tests. The test results are automatically emailed to the maintainers.

All of the GAIL code base is hosted and version-controlled on GitHub at https://github.com/GailGithub/GAIL_Dev. Three major git branches are used: 1) master, 2) develop, and 3) feature. Major releases come out of the 'master' branch after regression testing. A 'feature' branch is where one or more developers host their preliminary work while developing an algorithm. Once the feature branch reaches a satisfactory level of completion, with all tests passing, it is merged into the 'develop' branch. The 'develop' branch is used to curate the candidate release algorithms. Periodically, all the developers meet to review the status of the 'develop' branch, including documentation, code cleanliness, and test completion, before voting to merge into 'master'.

(2) Availability

Operating system

Our software is expected to run on multiple operating systems, including but not limited to Windows, macOS, and Linux. Any operating system that is compatible with the MATLAB versions below should be able to run GAIL successfully; please see System Requirements and Supported Compilers at https://www.mathworks.com/support/requirements/previous-releases.html. Our automated test suites are executed daily on CentOS Linux release 6.10.

Programming language

MATLAB, versions R2017a–R2021a.

Additional system requirements

We refer readers to the following page for MATLAB system requirements, which depend on MATLAB version and machine type: https://au.mathworks.com/support/requirements/previous-releases.html.

In addition, the installation of GAIL requires approximately 42 megabytes (MB) of disk space. The memory requirement of executing GAIL applications depends on various factors such as choice of algorithms, user tolerance, and the number of function sampling points. We recommend at least 2 gigabytes (GB) of memory allocated for MATLAB and GAIL.

Dependencies

GAIL is developed in MATLAB versions R2016a to R2021a. In particular, three of our core algorithms, cubSobol_g, cubBayesNet_g, and cubBayesLattice_g, require the following MATLAB add-on toolboxes: the Signal Processing Toolbox, the Optimization Toolbox, and the Statistics and Machine Learning Toolbox. As each MATLAB release is associated with a specific version of each toolbox, we do not detail the toolbox versions here; if necessary, they can be determined with the MATLAB command ver, as shown below.
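For instance, assuming the conventional short product name 'stats' for the Statistics and Machine Learning Toolbox:

    ver              % list MATLAB and all installed toolboxes with their versions
    ver('stats')     % query a single toolbox by its short product name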

For development and testing purposes, we use the third-party toolboxes, Chebfun [] and Doctest for MATLAB [].

List of contributors

We are grateful for the contributions of current, former, and visiting students in the Department of Applied Mathematics at the Illinois Institute of Technology: Noah Grudowski, Cu Hauw Hung, Yueyi Li, Xincheng Sheng, Aleksei Sorokin, Xiaoyang Zhao, and Tianci Zhu.

Software location

Name: MathWorks File Exchange

Persistent identifier: 10.5281/zenodo.4018189

Licence: IIT License; see LICENSE.m in the archive

Publisher: Kan Zhang

Version published: 2.3.2

Date published: 05/09/2021

Code repository

Name: GitHub

Persistent identifier: https://github.com/GailGithub/GAIL_Dev

Licence: IIT License; see LICENSE.m in the zip or tar.gz archives

Date published: 05/09/2021

Language

English

(3) Reuse Potential

GAIL is publicly available as a Git repository hosted on GitHub at https://gailgithub.github.io/GAIL_Dev/. Since GAIL is written in MATLAB, it is accessible to all MATLAB users whose work requires numerical function approximation, integration, or optimization. Multivariate integration arises in fields such as quantitative finance [] and uncertainty quantification [].

Users with questions can submit an issue through GitHub Issues. Developers who wish to add algorithms to or enhance GAIL can submit a pull request or email the mailing list, gail-users@googlegroups.com.

Additional File

The additional files for reproducing the results in Tables 1, 2, 3, and 4 and Figure 3 of this article can be found as follows: