## 1 Introduction

Reproducible research (RR) in computational sciences was pioneered by Claerbout in the
Stanford Exploration Project (SEP) [**1**]. It has since been
championed by Claerbout, Donoho, and their collaborators in their research areas of geophysics,
signal and image processing [**2****, 3****, 4**]. The leading advocates for RR include
Gentleman and Peng in biostatistics [**5****, 6**], Koenkera in economics, LeVeque in numerical partial
differential equations [**7**], and Stodden in legal
frameworks that enable RR [**8**]. The defining principle of
RR [**3**] is “When we publish articles containing
figures which were generated by computer, we also publish the complete software environment
which generates the figures.” Clearly there is an inevitable dependency of RR to software,
hardware, code, and data.

This article has two main messages. First, we contend that worthy RR in modern computational
sciences have to be built upon strong mathematical, statistical, and approximation theories
leading to rigorous analysis of statistical and numerical errors, as well as computational
costs. Second, we establish our position that RR can be made substantially more
reliable—hence reliable reproducible research (RRR)—by development of supportable
scientific software (SSS), which we define as *a conceptual framework that encompasses a
collection of software development methods for promoting the reliability of reproducing
provably correct results in computational sciences*. We note that our discussion of the
SSS framework is not restricted to any particular programming languages.

While the notions of RRR and SSS are not entirely new, they are not as often observed as
hoped. Relatively few computational scientists seem to practice RR as reflected in recent survey
results put together by Flemisch [**9**]. It is also plausible
that the community is not as a whole well aware of the importance of reproducibility. When SSS
is absent, RR risks remaining an ideal. On the other hand, conscientious adoption of the
principles can lead to substantially more trustworthy research results.

More specifically, in terms of theory, we have a relatively new suite of algorithms we call
MINRES-QLP Pack, for solving singular or ill-conditioned linear systems of equations or linear
least-squares problems with guaranteed accuracy. In terms of practice, we do what we advocate;
we borrow a number of strategies and tools from industrial software engineering that lead to
more reliable RR. While this paper focuses on the MINRES-QLP Pack to illustrate how to achieve
RRR via SSS, we and our collaborators have at least two more suites of accuracy-guaranteed
algorithms developed using similar approaches: First is the Guaranteed Automatic Integration
Library (GAIL) [**10****, 11**] for numerical integration in one or many dimensions based on the recent
groundbreaking work of Hickernell et al. [**12****, 13**]. Second is the Open-Source CIM-EARTH Framework (OSCEF) for
solving large-scale computable general equilibrium (CGE) models [**14****, 15**] built on the design of CIM-EARTH [**16****, 17****, 18**].

The outline of this paper is as follows. The next section gives a compressed description of the algorithms in the MINRES-QLP Pack. Section 3 recapitulates principles and benefits of RR, including examples of high-quality software and associated highly cited publications. In Section 4, we draw from our computational research experience and highlight a number of strategies and tools for robust software package development. Section 5 covers some existing but rare incentives and our initiatives on conducting a weekly seminar course and organizing themed meetings, which serves to educate or exchange ideas with computational researchers about RRR and SSS. In the last section, we conclude with thoughts for future work.

## 2 MINRES-QLP Pack

Most Krylov methods for solving large linear systems or linear least-squares problems have pervasive applications in science and engineering fields. Nonetheless, they usually assume nonsingular or full-rank matrices (or linear operators that are not explicitly represented as matrices). These methods are generally divided into two classes: Those for symmetric matrices (e.g., conjugate gradient, MINRES, SYMMLQ) and those for unsymmetric matrices (e.g., BCG, GMRES, QMR, BICGSTAB, LSQR, and IDR(s)). Such a division is largely due to the fact that historically the most prevalent matrix structure from practical applications is symmetric. However, other types of symmetry structures, notably complex symmetric, skew symmetric, and skew Hermitian matrices, are becoming increasingly common in modern applications. Often, these are treated as general unsymmetric matrices. In contrast, our algorithms take advantage of the symmetries to minimize computational costs such as memory and arithmetic operations.

On a singular linear system such as

where *ε* is the machine precision, the pseduoinverse solution is

Most of MATLAB’s Krylov solvers would abort or return an exploding solution. To
carefully handle singularity and exploit various symmetry structures, we have designed a suite
of MINRES-QLP [**19****, 20****, 21****, 22**]
algorithms, which can constructively reveal (numerical) singularity and compatibility of a given
linear system of equations; users do not have to know these properties a priori. (We are also
currently developing two related algorithms known as GMRES-QLP and GMRES-URV for
*unsymmetric* singular square systems; see [**19****, 23**].)

The key automatic steps of the MINRES-QLP algorithm are: **First**, detect special
symmetry with an efficient statistical test. **Second (optional)**, construct a
symmetric positive definite preconditioner *M* whose number of distinct nonzero
eigenvalues *M*^{–1/2} AM^{–1/2} is less than that of
*A* and hence number of iterations required in the next stage are reduced.
Without loss of generality, let *M* be the identity matrix in our subsequent
discussion. **Third**, in the *k*th iteration, for *k* =
1, …, min{*p, K*}, where *p* is the smallest positive
integer such that {*b, Ab*,…,
*A*^{p}*b*} is linearly dependent, and *K*
is a user-input positive integer, the algorithm searches for *k*th in the
*k*th Krylov subspace *K _{k}*(

*A, b*) = (

*b, Ab*,…,

*A*

^{k–1}

*b*), where

*x*is the minimum-length element of minimal-residual solutions for the problem

_{k}It is known that *x _{k}* exists and is unique. If either residual norms
||

*r*||

_{k}*= ||*

_{2}*b*–

*Ax*||

_{k}*or ||*

_{2}*Ar*||

_{k}*is smaller than a scalar multiple of a user-given tolerance, then the algorithm returns*

_{2}*x*as an approximant; otherwise the third step is repeated with

_{k}*k*incremented by one. We note that the square root

*M*

^{–1/2}is not explicitly constructed and all the quantities such as

*x*, ||

_{k}*r*||

_{k}*, and ||*

_{2}*Ar*||

_{k}*are accurately and efficiently computed by short recurrences.*

_{2}In the MINRES-QLP Pack, our goals are to have MATLAB routines well-documented, well-tested,
optimized for speed, accompanied by examples, and stored in a repository [**24**] where they can be modified, extended, and re-tested as needed by the
originators or any public members. Over time we will work to improve existing algorithms and add
new methods in the package. If resources permit, we will implement them in other languages such
as Fortran 90/95 [**21**] and PETSc [**25**], which is a parallel mathematical library capable of peta-scale
computations.

## 3 Reproducible research and challenges

RR for modern computational sciences as originally proposed by Claerbout is not simply a form
of ideological rhetoric. It actually has rather specific instructions. Claerbout encouraged
every author of computational research papers to mark each figure with the tags
[**ER**] for “easily reproducible,” [**CR**] for
“conditionally reproducible,” or [**NR**] for
“non-reproducible” to indicate the extent of reproducibility. To qualify for
[**ER**], a figure is generated and accompanied by programs with parameters and input
data, as well as “makefiles executable on Linux and portable to other operating
systems.” A conditionally reproducible figure is often time consuming to reproduce from
scratch and depends on the availability of computational resources such as memory, commercial
applications, or large input datasets. Non-reproducible figures may include images drawn or
photographs taken—not computed—to illustrate scientific ideas. Today there is no
reason not to extend the tags to other computational results or online resources such as
interactive figures, video clips, tables, or online databases produced or referenced by a report
[**26**].

Paraphrasing Claerbout, Buckheit and Donoho [**3**] stated
that “an article about computational result is *advertising*, not
scholarship. The actual **scholarship** comprises the **full software environment,
code** and data, that produced the result.” Donoho et al. produced multiple MATLAB
packages, WaveLab (1995), BeamLab (2004), SparseLab (2007), and MCALab (2008). Some of the most
highly cited papers Donoho co-authored such as [**27****, 28**] are accompanied by well-crafted MATLAB software. It is
reasonable to posit that reproducible research with quality code contributes significantly to
the citation statistics of the associated papers.

Nevertheless, we ask in practice, how reliable is RR? It would be at most as dependable as the strength and availability of its underlying theories, data, code, software, and hardware. Obviously, it is a “decreasing function” of time due to emergence of new software (versions), phasing out of old computer architecture, unaffordability of commercial software, or lack of quality software. Often, authors release only partial data, code fragments, or incomplete environment specification. It is not uncommon to see loss of data or code due to insufficiently frequent backup or unstable transfer processes. Algorithms of suboptimal design, with insufficient documentation, inadequate testing, and infrequent release of bug fixes—all understandable due to limited human resources or expertise—may render the computational software practically unusable even to the original creators not too much later in time. Modern complex applications with big data and big code, as well as emphasis on global team collaboration only add to these difficulties.

The process of software development by itself is often exceedingly challenging for programmers with moderate or even abundant experience. A common software development cycle such as the one shown on the left of Figure 1 is oversimplified. In practice, the “disorder” on the right of the same figure depicts much more accurately the potential multitude of cycles and transitions among the key phases. In the next section, we show how to restore partial order to the process via what we call the SSS principles.

## 4 Supportable scientific software: principles and tools

Flemisch’s 2013 survey [**9**] on reproducible
research received feedback from only about 10% of targeted recipients. Among those who
responded, more than 70% reported that the efforts to reproduce own or others’
computational results are between average to very high. The survey results also convey that RR
is perceived as “unrewarding” hard work. The survey suggested the central roles of
strategic tools such as the use of a repository (e.g., SVN, CVS, GIT, Mercurial). The survey
results are in line with our understanding and experiences. Despite the burden of RR, we
maintain our position and belief in the importance of enhancing its reliability.

We define supportable scientific software (SSS)^{1} as a
conceptual framework that encompasses a collection of software development methods for promoting
the reliability of reproducing provably correct results in computational sciences. There are
four critical elements of SSS [**29**]:

- Justified: Scientific software should be based on theoretical guarantees that specify
sufficient preconditions leading to success in satisfying error tolerance or uncertainty
level. There are lower and upper bounds of computational cost functions of input size
*n*in terms of memory or storage, as well as computational and communication time (these functions are preferably low-degree polynomials of*n*). - Efficient: In theory, complexity analysis is an important tool for gauging efficiency. Nevertheless, its nature is asymptotic. Even polynomial-time algorithms can be costly if they have large leading coefficients or are of high degrees. In practice, a profiler can be utilized to measure and pinpoint code lines that consume the most resources, such as time and memory. Code reuse is another measure of efficiency; it can be achieved through refactoring of duplicate code patterns into functions to save development time.
- Robust/Durable: A robust software package is tested under a wide variety of conditions and ranges of inputs. The design of a robust software package should ideally be durable under changing computing hardware and software environments. In addition, we recommend persisting and maintaining source code and research data via repositories with version control.
- Intelligible: At the minimum, we expect clear documentation of key algorithmic function and steps, inputs and outputs, parameters, usage examples, and related references.

From the practice of reproducible research in our work, we affirm that *RRR via SSS can
be done*. Many rudimentary examples already exist even though it is probably extremely
hard to find “a general solution” for a given scientific field. The tools we
highlight in Table 1 are available with MATLAB and
similar counterparts can be found with other modern programming languages such as C++, Java, and
Python, enabling researchers to develop functional and polished algorithms.

Justified | Efficient | Durable/Robust | Intelligible | |
---|---|---|---|---|

Repository | ✓ | |||

Editor (IDE) | ✓ | ✓ | ||

Code Analyzer | ✓ | ✓ | ✓ | |

Help Report, Contents Report | ✓ | |||

Searchable HTML documentation | ✓ | |||

Doctest | ✓ | |||

Unit Testing Framework | ✓ | |||

Debugger | ✓ | ✓ | ✓ | |

Profiler | ✓ | ✓ |

We recommend the use of a code repository for every project from the beginning. We also employ Integrated Development Environment (IDE) software. In the case of MATLAB, it has its own IDE built in Java, with visually appealing automated reports on code dependency, documentation evaluation, and performance analysis, which together can help pinpoint at line level or character position where the code base can be improved.

We emphasize on the importance of designing simple application programming interfaces (APIs) from the outset. If we were allowed to communicate only one thing to expert or application users about our implementation, our choice would be its API, which consists of function name, required and optional inputs, outputs, and our parsing schemes. Our algorithms parse user inputs and ensure that they fall in the correct ranges; in some cases we supply default values for missing optional parameters and gently correct invalid input values. We craft detailed and readable documentation in both text and searchable HTML formats with specification of APIs, syntax, default input values, examples of usage, and further references.

Lastly, to ensure the correctness of computational results from our software and examples in
the documentation, we create unit tests [**30**], doctests
[**31**], and stress tests. An effective unit test or
doctest would be demonstrative of API usage, and run in less than a small fraction of a second.
Without these tests in place, our experience is that even minor changes to our code or
documentation could quickly and unknowingly depart from the original design and theory. These
two kinds of tests effectively safeguard most changes we make to our algorithms. We adhere to
the following test practices:

Every time before code is checked into the repository, unit tests and doctests are run.

Every time a bug is found, unit tests are expanded.

All unit tests and doctests have to pass at 100% success rate for each code check-in and software release. The diligent execution of an expanding test suite serves to verify and strengthen the software functions, and to simplify the software development cycles; see Figure 2. On the other hand, stress tests check if the software performs under load.

## 5 Incentives and educational efforts for RRR and SSS

Practitioners of RRR and SSS need more incentives in terms of specialized software citation
[**32**], comprehensive journal review policies for software
publication such as ACM TOMS’ [**33**], and research
funding schemes such as NSF’s SSE & SSI program [**34**] that gives significant weightage to quality scientific software infrastructure at
all scales. We urge members of the scientific community to cite high-quality software and online
resources; the following are two examples of referencing the software GAIL [**10**] and MINRES-QLP community website [**24**] in BibTeX format:

```
@misc{CDHJZ14gail,
title={{GAIL}: {G}uaranteed {A}utomatic {I}ntegration {L}ibrary (Version 1.3),
{MATLAB} Software},
author={Choi, Sou-Cheng T. and Ding, Yuhan and Hickernell, Fred J. and Jiang, Lan and
Zhang, Yizhi},
year={2014},
howpublished={\url{http://code.google.com/p/gail/},
}
```

```
@misc{MinresqlpWebsite,
author={Choi, Sou-Cheng T.},
title={{MINRES-QLP} Project and Community Website},
howpublished={\url{http://code.google.com/p/minres-qlp/}},
year={2013},
notes={Website},
}
```

To maximize quality software’s exposure and usage, we support free software and open source movement, but are also open to having our software included in quality commercial software. There are also viable open source packages sustained by provision of commercial services.

In Summer 2013, we started a team devoted to development of a MATLAB research software package
GAIL [**11**] for numerical integration with guarantee of
accuracy. Our members role play a team of software engineers with specializations in three of
the critical software functions, namely testing, documentation, and release. We make use of a
private wiki to strengthen team communication, recording key steps of important scientific
routine process and procedures; maintaining checklists of day-to-day tasks; and sharing of
intermediate results, knowledge, and expertise. We prepare our students of applied mathematics
to become practitioners of RRR via SSS. As a beneficial side effect, as they gain software
engineering skills that are usually obtainable only in information-technology driven companies,
they become more attractive job candidates. In fact, one of our five student members was
employed as a part-time software test engineer the following semester.

We say, “*Those who can, do and teach*.” Principles and novel ideas
behind the research work would ideally be propagated in beginning and advanced courses in
scientific computing, whereas the practices of engineering supportable software promoted to
computational scientists and researchers through workshops such as WSSSPE [**35****, 36**] To this end, we started an
experimental seminar course, “Reliable Mathematical Software” (IIT MATH-573) [**29**] in our home institution. We drew teaching materials from
exemplary mathematical software such as Chebfun (http://www2.maths.ox.ac.uk/chebfun). The class met weekly for 75 minutes for a
total of 10 weeks; each time we demonstrated one or two tools in Table 1 with motivating scenarios and finished the last two weeks with each student
presenting a research algorithm implemented with some of the SSS principles. After the course, a
master student completed her thesis and five doctoral students are developing research papers
following the principles of RRR via SSS. In addition, we serve the computational community by
organizing a two-session minisymposium “Reliable Computational Science” in the
upcoming SIAM Annual Meeting 2014 (http://meetings.siam.org/program.cfm?CONFCODE=AN14) and co-authoring a
comprehensive summary report for WSSSPE1 [**35****, 36**].

## 6 Conclusions

We agree with the importance of reproducible research at all scales, and recommend to strengthen it with supportable scientific software development practices. In the short term, RRR via SSS may seem more time consuming. In the medium or long run, the investment of time and efforts gives valuable software a better chance to “survive” and make a greater impact in the research area. Some researchers may actually find that our approach of RRR via SSS provides to some extent a structured course for investigating and attacking complex computational problems. More reliable incremental research can be built on supportable software packages, leading to stably upward spirals of progress in scientific knowledge. As a community, we need more time to reflect, brainstorm, and experiment specific operations. We also need more ongoing dialogue on the topic in the community.