FluidSim: modular, object-oriented Python package for high-performance CFD simulations

The Python package fluidsim is introduced in this article as an extensible framework for computational fluid dynamics (CFD) solvers. It is developed as part of the FluidDyn project (Augier et al., 2018), an effort to promote open-source and open-science collaboration within the fluid mechanics community, and is intended for both education and research. Solvers in fluidsim are scalable, high-performance computing (HPC) codes powered under the hood by the rich scientific Python ecosystem and the application programming interfaces (APIs) provided by the fluiddyn and fluidfft packages (Mohanan et al., 2018). The present article describes the design of fluidsim: the use of Python as the main language, and the focus on ease of use, reuse and maintenance of the code without compromising performance. Implementation details, including optimization methods, the modular organization of features and the object-oriented approach of using classes to implement solvers, are also briefly explained. Currently, fluidsim includes solvers for a variety of physical problems using different numerical methods (including finite-difference methods). However, this metapaper dwells only on the implementation and performance of its pseudo-spectral solvers, in particular the two- and three-dimensional Navier-Stokes solvers. We investigate the performance and scalability of fluidsim on a state-of-the-art HPC cluster. Three similar pseudo-spectral CFD codes based on Python (Dedalus, SpectralDNS) and Fortran (NS3D) are presented and qualitatively and quantitatively compared to fluidsim. The source code is hosted on Bitbucket as a Mercurial repository, bitbucket.org/fluiddyn/fluidsim, and the documentation, generated using Sphinx, can be read online at fluidsim.readthedocs.io.


Introduction
Designed as a specialized package of the FluidDyn project for computational fluid dynamics (CFD), fluidsim is a comprehensive solution that addresses the needs of fluid mechanics students and researchers alike, by providing scalable high-performance solvers, on-the-fly postprocessing, and plotting functionalities under one umbrella. In the open-science paradigm, scientists are both users and developers of their tools at the same time. An advantage of fluidsim is that most users only have to read and write Python code. fluidsim ensures that all critical modules, classes and functions are well documented, both as inline comments and as standalone documentation, complete with examples and tutorials. For these reasons fluidsim can become a truly collaborative code and has the potential to replace some in-house pseudo-spectral codes written in more conventional languages.

Balance between runtime efficiency and cost of development
In today's world, where clusters are moving from petascale to exascale performance, computing power is plentiful. In most research scenarios, person-hours are more expensive than computing time; in other words, the cost of development outweighs the cost of computing. Therefore, developers should be willing to make small sacrifices in efficiency to improve development time, code maintainability and clarity in general. In the exceptional cases where computing time becomes more expensive, one can still use existing optimized, performance-tuned compiled libraries implemented in native languages as the core, and rely on high-level programming languages to extend them for specific applications. For these reasons, the majority of fluidsim's codebase, in terms of lines of code, is written in pure Python syntax. However, this is done without compromising performance, by making use of libraries such as Numpy, and of optimizing compilers such as Cython and Pythran.
Numpy functions and data types are sufficient for tasks such as initialization and postprocessing, since these functions are used sparingly. Computationally intensive tasks, such as time stepping and the linear algebra operators invoked at every single iteration, must be offloaded to compiled extensions. This optimization strategy can be considered the computational equivalent of the Pareto principle, also known as the 80/20 rule: the goal is to optimize such that "80 percent of the runtime is spent in 20 percent of the code" [10]. Here, the Cython [3] and Pythran [8] compilers come in handy. An example of how we use Pythran to reach performance similar to Fortran while writing only Python code is described in the companion paper on fluidfft [11].
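To give an idea of the approach, a numerical kernel can be written in plain Python/Numpy syntax and annotated with a Pythran export comment; the same file then runs unmodified under CPython or compiles ahead of time into a native extension. The kernel below is an illustrative example, not an actual fluidsim function:

```python
import numpy as np

# pythran export compute_energy_spect(complex128[:, :], complex128[:, :])


def compute_energy_spect(ux_fft, uy_fft):
    """Kinetic energy density in spectral space (illustrative kernel).

    Written in plain Numpy syntax: it runs as-is under CPython, and the
    `pythran export` comment above is all Pythran needs to compile it
    into an optimized native extension.
    """
    return 0.5 * (np.abs(ux_fft) ** 2 + np.abs(uy_fft) ** 2)
```

Because the annotation is an ordinary comment, the code stays importable and debuggable as pure Python, which is precisely what keeps maintenance cheap.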
The result of using such an approach is studied in the forthcoming sections by measuring the performance of fluidsim. We will demonstrate that a very large percentage of the elapsed time is spent in the execution of optimized compiled functions and thus that the "Python cost" is negligible.

Target audiences
fluidsim is designed to cater to the needs of three kinds of audience.
• Users, who run simulations with the already available solvers. To do this, one only needs very basic skills in Python scripting.
• Advanced users, who extend fluidsim by developing customized solvers for new problems, taking advantage of the built-in operator and time-stepping classes. In practice, one can easily implement such solvers by changing a few methods of the base solver classes. To do this, one needs fairly good skills in Python, and especially in object-oriented programming.
• Developers, who develop the base classes, in particular the operator and time-stepping classes. One may also sometimes need to write compiled extensions to improve runtime performance. To do this, desirable traits include good knowledge of Python, Numpy, Cython and Pythran.
This metapaper is intended as a short introduction to fluidsim and its implementation, written mainly from a user perspective. Nevertheless, we also discuss how fluidsim can be customized and extended with minimal effort to promote code reuse. A more comprehensive, hands-on look at how to use fluidsim can be found in the tutorials, both from a user's and a developer's perspective. In the latter half of the paper, we inspect the performance of fluidsim on large computing clusters and compare fluidsim with three different pseudo-spectral CFD codes.

Implementation and architecture
New features have been added to the package over the years whenever demanded by research interests, making the code very user-centric and function-oriented. This aspect of code development is captured by YAGNI ("you aren't gonna need it"), one of the principles of the agile software development method, which emphasizes not spending a lot of time developing a functionality, because most likely you will not need it. Almost all functionalities in fluidsim are implemented as classes, and their methods are designed to be modular and generic. This means that the user is free to use inheritance to modify certain parts to suit their needs, while avoiding the risk of breaking a functioning code.

Package organization
fluidsim is meant to serve as a framework for numerical solvers using different methods. The present version of fluidsim supports finite-difference and pseudo-spectral methods. An example of a finite-difference solver is fluidsim.solvers.ad1d, which solves the 1D advection equation. There are also solvers which do not rely on most of the base classes, such as fluidsim.base.basilisk, which implements a 2D adaptive-meshing solver based on the CFD code Basilisk. The collection of solvers using pseudo-spectral methods is more feature-rich in comparison.
The code is organized into the following sub-packages:
• fluidsim.base: contains all base classes and a solver for the trivial equation ∂ t u = 0.
• fluidsim.operators: specialized linear algebra and numerical-method operators (e.g., divergence, curl, variable transformations, dealiasing).
• fluidsim.solvers: solvers and postprocessing modules for problems such as 1D advection, 2D and 3D Navier-Stokes equations (incompressible and under the Boussinesq approximation, with and without density stratification and system rotation), one-layer shallow water and Föppl-von Kármán equations.
• fluidsim.util: utilities to load and modify an existing simulation, and to test and benchmark a solver.
The subpackages base and operators form the backbone of this package and are not meant to be used explicitly by the user. In practice, one can build an entirely new solver for a new problem within this framework by simply writing one or two importable files containing three classes:
• an InfoSolver class, containing the information on which classes will be used for the different tasks in the solver (time stepping, state, operators, output, etc.);
• a Simulation class describing the equations to be solved;
• a State class defining all physical variables and their spectral counterparts being solved (for example u x and u y ) and methods to compute one variable from another.
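The division of labor between these three classes can be illustrated with a self-contained toy that mimics the pattern without using the actual fluidsim API (all class and attribute names below are simplified stand-ins):

```python
import numpy as np


class InfoSolver:
    """Toy stand-in: records which classes the solver assembles."""

    classes = {"State": "State", "TimeStepping": "explicit Euler"}


class State:
    """Hold the physical variables being solved (here, a single field u)."""

    def __init__(self, nx):
        self.u = np.zeros(nx)


class Simulation:
    """Describe the equation to solve; here du/dt = -gamma * u."""

    InfoSolver = InfoSolver

    def __init__(self, nx=8, gamma=1.0, dt=0.01):
        self.state = State(nx)
        self.gamma = gamma
        self.dt = dt

    def tendency(self, u):
        # Right-hand side of the equation; override this in subclasses
        return -self.gamma * u

    def step(self):
        # One explicit Euler step (real solvers plug in RK4 here)
        self.state.u += self.dt * self.tendency(self.state.u)
```

Defining a new "solver" then amounts to subclassing Simulation (and possibly State) and overriding tendency, leaving the time stepping and bookkeeping untouched, which is the spirit of the fluidsim design.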
We now turn our attention to the simulation object which illustrates how to access the code from the user's perspective.

The simulation object
The simulation object is instantiated with the necessary parameters just before starting the time stepping. A simple 2D Navier-Stokes simulation can be launched with a short Python script. Such a script first invokes the create_default_params classmethod, which returns a Parameters object, typically named params, containing all default parameters. Modifications to the simulation parameters are made after this step, to meet the user's needs. The simulation object, typically named sim, is then instantiated by passing params as the only argument, ready to start the iterations. Parameters are thus stored in an object of the class Parameters, which uses the ParamsContainer API of the fluiddyn package [2]. Parameters for all possible modifications to the initialization, preprocessing, forcing, time stepping and output of the solvers are incorporated into the params object in a hierarchical manner. Once initialized, the "public" (non-hidden) API does not allow new parameters to be added to this object; only modifications are permitted. This approach differs from that of conventional solvers reliant on text-based input files to specify parameters, which is less robust and can cause the simulation to crash at runtime due to human errors. A similar, but less readable, approach to ParamsContainer is adopted by OpenFOAM, which relies on dictionaries to customize parameters. The params object can be printed on the Python or IPython console, explored interactively using tab completion, and loaded from and saved into XML and HDF5 files, making it very versatile.
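The launching script described above can be sketched as follows. This is a minimal example: the import path and the classmethod follow the public fluidsim API, but the parameter values (grid size, end time) are illustrative, and the import is guarded so the sketch degrades gracefully when fluidsim is not installed.

```python
# Minimal launching script for the 2D Navier-Stokes solver (illustrative
# parameter values; see the fluidsim documentation for the full tree).
try:
    from fluidsim.solvers.ns2d.solver import Simul
except ImportError:
    Simul = None  # fluidsim not installed; the steps below are a sketch

if Simul is not None:
    params = Simul.create_default_params()
    # modify the default parameters to meet the user's needs
    params.oper.nx = params.oper.ny = 32
    params.time_stepping.t_end = 0.5
    # instantiate the simulation object and start the time stepping
    sim = Simul(params)
    sim.time_stepping.start()
```

Everything the solver needs is carried by params, so the script itself stays a few lines long whatever the physical problem.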
Note that the same simulation object is used for plotting and postprocessing tasks. During or after the execution of a simulation, a simulation object can be created from the saved data (to restart a simulation, one would rather use the dedicated loading function of the fluidsim package). Typical interactive usage then looks like:

```python
# to plot the space averaged quantities versus time
sim.output.spatial_means.plot()
# to load the corresponding data
data = sim.output.spatial_means.load()
# for a 2d plot of the variable "b"
sim.output.phys_fields.plot("b", time=2)
# to save a 2d animation
sim.output.phys_fields.animate("b", tmin=1, tmax=5, save_file=True)
```

Figure 1 demonstrates how a user can access the different objects, and their associated methods, through the simulation object of the solver fluidsim.solvers.ns3d. Do note that the object and method names are similar, if not the same, for the other solvers. The purposes of the objects are listed below in the order of instantiation:
• sim.params: a copy of the params object supplied by the user, which holds all information critical to running the simulation and generating the output.
• sim.info_solver: contains all the subclass information, including the module each class belongs to.
• sim.info: a union of the sim.info_solver and sim.params objects.
• sim.forcing: instantiated only when params.forcing.enable is set to True; it computes the forcing variables, which are added to the right-hand side of the equations being solved.
• sim.preprocess: adjusts solver parameters such as the magnitude of the initialized fields, the viscosity value and the forcing rates after all other subclasses are initialized, just before the time integration starts.

Performance
The performance of a code can be measured by three different program-analysis methods: profiling, microbenchmarking and scalability analysis. Profiling traces the various function calls and records the cumulative time consumed and the number of calls for each function. Through profiling, we shed light on which part of the code consumes the lion's share of the computational time, and observe the impact of increasing the number of processes and MPI communications. Microbenchmarking is the process of timing a specific portion of the code and comparing different implementations. This aspect is addressed in greater detail in the companion paper on fluidfft [11]. A scalability study, on the other hand, measures how the speed of the code improves when it is deployed with multiple CPUs operating in parallel.
To do this, the walltime required by the whole code to complete a certain number of iterations for a given problem size is measured. Finally, performance can also be studied by comparing different codes on representative problems. Since such comparisons should not focus only on performance, we present the comparison study in a separate section. The profiling and scaling tests were conducted on Beskow (PDC, Stockholm), a supercomputing cluster of the Swedish National Infrastructure for Computing (SNIC). Relevant details regarding the software and hardware which affect performance are summarised in Table 1. Note that no hyperthreading was used while carrying out the studies. The code comparisons were made on a smaller machine. The results of the three analyses are systematically studied in the following sections.

Profiling
It is straightforward to perform profiling with the help of the cProfile module, available in the Python standard library. For fluidsim, this module has been conveniently packaged into a command-line utility, fluidsim-profile. Here, we have analyzed both the 2D and 3D Navier-Stokes solvers on Beskow, and plotted the results in Figures 2 and 3, respectively. Functions which consume less than 2% of the total time are grouped into a single category, other.
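Under the hood, such profiling relies on the standard cProfile module. A minimal self-contained use, here profiling a stand-in workload dominated by FFT calls (the function name and array sizes are illustrative, not fluidsim code), looks like:

```python
import cProfile
import io
import pstats

import numpy as np


def workload():
    """Stand-in for a solver run in which repeated FFT calls dominate."""
    a = np.random.rand(256, 256)
    for _ in range(50):
        a_fft = np.fft.rfft2(a)
        a = np.fft.irfft2(a_fft)
    return a


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time to find the few functions worth optimizing
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

Sorting by cumulative time is what exposes the handful of functions, typically the FFT wrappers, on which optimization effort should be concentrated.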
In Figure 2, both the sequential and parallel profiles of the 2D Navier-Stokes solver show that the majority of time is spent in inverse and forward FFT calls (ifft_as_arg and fft_as_arg). For the sequential case, approximately 0.14% of the time is spent in pure Python functions, i.e. functions not built with Cython or Pythran. Cython extensions are responsible for interfacing with the FFT operators and for the time-stepping algorithm. Pythran extensions are used to translate most of the linear algebra operations into optimized, statically compiled extensions. We also see that only 1.3% of the time is spent outside the main six functions (category other in the figure). With 8 processes deployed in parallel, the time spent in pure Python functions increases to 1.1% of the total time. These results show that, during the optimization process, we have to focus on a very small number of functions.
From Figure 3, it can be seen that, for the 3D Navier-Stokes solver, the majority of time is likewise attributed to FFT calls.
The overall time spent in pure Python functions ranges from 0.001% for 512³ grid points and 2 processes to 0.46% for 128³ grid points and 8 processes. This percentage tends to increase with the number of processes used, since the real calculation done in compiled extensions takes less time. This percentage is also higher for the coarser resolution, for the same reason. However, in all cases, the time spent in pure Python remains largely negligible compared to the time spent in compiled extensions.

Scalability

Scalability is quantified via the speedup, defined here as

$$ S_\alpha(n_p) = \frac{[\text{elapsed time for } N \text{ iterations with } n_{p,\min} \text{ processes}]_{\text{fastest}}}{[\text{elapsed time for } N \text{ iterations with } n_p \text{ processes}]_\alpha} \times n_{p,\min} \qquad (1) $$

where n p,min is the minimum number of processes employed for a specific array size and hardware, α denotes the FFT class used, and "fastest" corresponds to the fastest result among the various FFT classes. In addition to the number of processes, there is another important parameter: the size of the problem, in other words the number of grid points used to discretize the problem at hand. In a strong scaling analysis, we keep the global grid size fixed and increase the number of processes. Ideally, this should yield a speedup which increases linearly with the number of processes. Realistically, as the number of processes increases, so does the number of MPI communications, contributing to some latency in the overall time spent and thus resulting in less-than-ideal performance. Also, as shown by the profiling in the previous section, the majority of the time is consumed in making forward and inverse FFT calls, an inherent bottleneck of the pseudo-spectral approach. The FFT function calls are the source of most of the MPI calls during runtime, limiting the parallelism.
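As a sketch, the speedup normalization described above can be computed directly from measured elapsed times (function and variable names below are illustrative):

```python
def speedup(elapsed_times, np_min, fastest_time_np_min):
    """Strong-scaling speedup of one FFT class, following Eq. 1.

    elapsed_times: dict mapping the number of processes n_p to the
        elapsed time measured for this FFT class;
    np_min: minimum number of processes employed for this array size;
    fastest_time_np_min: elapsed time of the fastest FFT class at np_min.
    Returns a dict mapping n_p to S(n_p).
    """
    return {
        n_p: fastest_time_np_min * np_min / time
        for n_p, time in elapsed_times.items()
    }
```

By construction, the fastest class at n p,min is assigned S = n p,min, which is the normalization used in the scaling figures.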

2D benchmarks
The 2D Navier-Stokes solver (fluidsim.solvers.ns2d), solving an initial value problem (with random fields), was chosen as the test case for the strong scaling analysis. The physical grid was discretized with 1024 × 1024 and 2048 × 2048 points. The fourth-order Runge-Kutta (RK4) method with a constant time step was used for the time integration. File input-output and the forcing term were disabled so as to measure the performance accurately. The test case was then executed for 20 iterations. The elapsed time was measured just before and after the sim.time_stepping.start() function call, and was then used to calculate the average walltime per iteration and the speedup. This process was repeated for two different FFT classes provided by fluidfft, namely fft2d.mpi_with_fftw1d and fft2d.mpi_with_fftwmpi2d.
Although the FFT classes differ between sequential and parallel runs, for the sake of comparison we have tabulated the actual elapsed walltimes in Table 2. Clearly, due to differences in the implementations, the fftw1d-based class takes a performance hit with parallelization, while the performance of the class based on fftwmpi2d improves.
In Figure 4, we analyze the strong scaling speedup S and the walltime per iteration. The fastest result for a particular case is assigned the value S = n p , as mentioned earlier in Eq. 1. The ideal speedup, indicated with a dotted black line, varies linearly with the number of processes. We notice that for the 1024 × 1024 case there is a clear increasing trend in speedup for intra-node computations. Nevertheless, when this test case is solved over more than one node (n p > 32), the speedup drops abruptly. While it may be argued that the speedup is impacted by the cost of inter-node MPI communications via network interfaces, that is not the case here. This is shown by the speedup for the 2048 × 2048 case, which increases from n p = 32 to 64, after which it drops again. It is thus important to remember that a decisive factor in pseudo-spectral simulations is the choice of the grid size, both global and local (per process), and that for certain shapes the FFT calls can be exceptionally fast, or vice versa. From the above results, it may also be inferred that superior performance is achieved with fft2d.mpi_with_fftwmpi2d as the FFT method. The fft2d.mpi_with_fftw1d method serves as a fallback option when either the FFTW library is not compiled with MPI bindings or the domain decomposition results in zero-sized arrays. The latter is a known issue with the current versions of fluidfft and fluidsim and requires further development. This is why the scalability of fft2d.mpi_with_fftwmpi2d appears restricted in Figure 4, whereas in reality it can be as scalable as fft2d.mpi_with_fftw1d. On the right of Figure 4, the real time, or walltime, required to perform a single iteration is found to vary inversely with the number of processes n p . The walltime per iteration ranges from 0.195 to 0.023 seconds for the 1024 × 1024 case, and from 0.128 to 0.051 seconds for the 2048 × 2048 case.
Thus it is indeed feasible and scalable to use this particular solver.

3D benchmarks
Using a process similar to that described in the previous section, the 3D Navier-Stokes solver (fluidsim.solvers.ns3d) was chosen for the 3D benchmarks. As shown in Figure 5, two global physical grids, with 128 × 128 × 128 and 1024 × 1024 × 1024 points, were used to discretize the domain. The other parameters are identical to those described for the 2D benchmarks.
Through fluidfft, this solver has four FFT methods at its disposal: fft3d.mpi_with_fftw1d, fft3d.mpi_with_fftwmpi3d, fft3d.mpi_with_p3dfft and fft3d.mpi_with_pfft. The first two methods implement a 1D (or slab) decomposition, i.e. the processes are distributed over one index of a 3D array, and the last two implement a 2D (or pencil) decomposition. For the sake of clarity, we restrict this analysis to the fastest FFT method of each type in this configuration, namely fft3d.mpi_with_fftwmpi3d and fft3d.mpi_with_p3dfft. A more comprehensive study of the performance of these FFT methods can be found in [11].
In Figure 5, the strong scaling speedup and the walltime per iteration from the 3D benchmarks on Beskow are plotted. The analysis here is limited to single-node and inter-node performance. For both grid sizes analyzed here, the fft3d.mpi_with_fftwmpi3d method is the fastest of all methods, but is limited in scalability by the 1D domain decomposition strategy. To utilize a large number of processes, the 2D decomposition approach is required. Also note that for the 1024 × 1024 × 1024 case, a single-node measurement was not possible, as the size of the arrays required to run the solver exceeds the available memory. For the same case, a speedup reasonably close to linear is observed with fft3d.mpi_with_p3dfft. The walltime per iteration improves from 0.083 to 0.027 seconds for the 128 × 128 × 128 case, and from 31.078 to 2.175 seconds for the 1024 × 1024 × 1024 case.

CFD pseudo-spectral code comparisons
As a general CFD framework, fluidsim could be compared to OpenFOAM (a CFD framework based on finite-volume methods). However, in contrast to OpenFOAM, the current version of fluidsim is highly specialized in pseudo-spectral Fourier methods and it is not adapted for industrial CFD.
In this subsection, we compare fluidsim with three other open-source pseudo-spectral CFD codes:
• Dedalus [5] is "a flexible framework for spectrally solving differential equations". It is very versatile, and the user describes the problem to be solved symbolically. This approach is very different from that of fluidsim, where the equations are described with simple Numpy code. There is no equivalent of the fluidsim concept of a "solver", i.e. a class corresponding to a set of equations with specialized outputs (with the corresponding plotting methods). To run a simulation with Dedalus, one has to describe the problem using mathematical equations. This can be very convenient, because it is very versatile and it is not necessary to understand how Dedalus works to define a new problem. However, this approach also has drawbacks:
  - Even for very standard problems, one needs to describe the problem in the launching script.
  - There is a potentially long initialization phase during which Dedalus processes the user input and prepares the "solver".
  - Even when a user knows how to define a problem symbolically, it is not simple to understand how the problem is solved by Dedalus and how to interact with the program from Python.
  - Since solvers are not implemented out of the box in Dedalus, specialized forcing schemes and outputs are absent. For example, the user has to implement the computation, saving and plotting of standard outputs like energy spectra.
• SpectralDNS [12] is a "high-performance pseudospectral Navier-Stokes DNS solver for triply periodic domains. The most notable feature of this solver is that it is written entirely in Python using Numpy, MPI for Python (mpi4py) and pyFFTW." SpectralDNS is therefore technically very similar to fluidsim. Some differences are that SpectralDNS has no object-oriented API, and that the user has to define outputs and forcing in the launching script, which is thus usually much longer than for fluidsim.
Moreover, the parallel Fourier transforms are performed using the Python package mpiFFT4py, which can only use the FFTW library, and not other libraries as fluidfft can [11].
• NS3D is a highly efficient pseudo-spectral Fortran code. It was written in the laboratory LadHyX and has been used for several studies involving simulations (in 3D and in 2D) of the Navier-Stokes equations under the Boussinesq approximation with stratification and system rotation [7]. It is in particular specialized in stability studies [4]. NS3D has been highly optimized and is very efficient for sequential and parallel simulations (using MPI and OpenMP). However, the parallelization is limited to a 1D decomposition for the FFT [11]. Another weakness compared to fluidsim is that NS3D uses simple binary files instead of the HDF5 and NetCDF4 files used by fluidsim. Therefore, visualization programs like Paraview or Visit cannot load NS3D data. As with many Fortran codes, Bash and Matlab are used for launching and postprocessing, respectively. In terms of user experience, this can be a drawback compared to the coherent fluidsim framework, for which the user works only with Python.
Description of the test. We have formulated benchmark tests in which all the aforementioned codes solve the Navier-Stokes equations expressed in:
• the stream function-vorticity (ψ-ζ) formulation for the bidimensional simulations,

$$ \partial_t \hat{\zeta} = -\widehat{\mathbf{u} \cdot \nabla \zeta} - \nu \kappa^2 \hat{\zeta}, $$

where the hat notation (^) signifies spectral quantities, u and k represent the velocity and wavenumber vectors, ν is the viscosity, and κ = |k| (see the fluidsim.solvers.ns2d solver documentation);
• the rotational form for the tridimensional simulations,

$$ \partial_t \hat{\mathbf{u}} = P_\perp \widehat{(\mathbf{u} \times \boldsymbol{\omega})} - \nu \kappa^2 \hat{\mathbf{u}}, $$

where ω = ∇ × u is the vorticity and the operator P⊥, which projects the vector field onto a divergence-free plane, is equivalent to a pressure-correction step (see the fluidsim.solvers.ns3d solver documentation and [6]).
We compare the codes on a very simple and standard task: running a solver for ten time steps with the Runge-Kutta 4 method. The solvers in fluidsim rely on fluidfft to perform the dealiasing operation applied to the non-linear terms (say, N) to ensure that no aliasing errors emerge in the course of the simulation. This is implemented by setting modes to zero according to

$$ \hat{N}(\mathbf{k}) = 0 \quad \text{if} \quad |k_i| > C_d \, k_{i,\max} \ \text{along any wavenumber axis } i, $$

where C d stands for the coefficient of dealiasing and has a typical value of 2/3. Other codes, such as SpectralDNS and Dedalus, instead allow zero-padding of the spectral arrays, with typically 3/2 times more modes along each wavenumber axis, an optimization which can potentially save significant computation time. For quantitative comparisons, and for the sake of simplicity, we limit ourselves to comparing only sequential runs. We have already discussed in detail the issue of the scalability of pseudo-spectral codes based on Fourier transforms, in the previous section and in the companion paper [11]. In contrast to the fluidsim framework, for which it is easy to define a new solver for a new set of equations, NS3D is specialized in solving the Navier-Stokes equations under the Boussinesq approximation. Using NS3D to solve a new set of equations would require very deep changes in many places in the code.
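The truncation rule with C d = 2/3 described above can be sketched with Numpy as follows (a generic 2D implementation; fluidsim's actual dealiasing lives in its operator classes and in fluidfft):

```python
import numpy as np


def dealias(arr_fft, kx, ky, coef=2 / 3):
    """Zero the spectral modes beyond the dealiasing cutoff (2/3 rule).

    arr_fft: 2D spectral array indexed (iky, ikx); kx, ky: 1D wavenumber
    arrays along each axis; coef: the dealiasing coefficient C_d.
    """
    mask_x = np.abs(kx) > coef * np.abs(kx).max()
    mask_y = np.abs(ky) > coef * np.abs(ky).max()
    arr_fft[mask_y, :] = 0.0
    arr_fft[:, mask_x] = 0.0
    return arr_fft
```

Applying this mask to the non-linear term after each evaluation removes the modes that would otherwise alias back into the resolved range.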
Note that Dedalus does not implement the standard fully explicit RK4 method. For Dedalus, we thus use the most similar time-stepping scheme available, RK443, a four-stage, third-order mixed implicit-explicit scheme described in [1]. Note that in the other codes, part of the linear terms is also treated implicitly. Also note that in several cases the upper bound on the time step is not limited primarily by the stability of the time scheme, but rather by other needs (resolving the fastest wave, accuracy, etc.), so these benchmarks are representative of the elapsed time of accurate, real-life simulations.
Bidimensional simulations. We first compare the elapsed times for two resolutions (512² and 1024²) over a bidimensional space. The results are summarized in Table 3 and are consistent between the two resolutions. fluidsim is the fastest code for these cases. Dedalus is three times slower than fluidsim but, as discussed earlier, its time-stepping method is different. Another notable issue of Dedalus is a long initialization time (more than 3 min for 1024², compared to approximately 5 s for fluidsim). Also note that Dedalus has been optimized for bounded domains with Chebyshev methods. The two other codes, SpectralDNS and NS3D, have similar performance: slightly slower than fluidsim and much faster than Dedalus. The Fortran code NS3D is slower (by 47%) than the Python code fluidsim. This can be explained by the fact that there is no specialized numerical scheme for the 2D case in NS3D, so that more FFTs have to be performed compared to SpectralDNS and fluidsim. This highlights the importance of implementing a well-adapted algorithm for a class of problems, which is much easier with a highly modular code such as fluidsim than with a specialized code such as NS3D.
Tridimensional simulations. We now compare the elapsed times for ten RK4 time steps for a tridimensional case at a resolution of 128³. Dedalus is slow and does not seem to be adapted to this case, so we do not give an exact elapsed time for this code. SpectralDNS is slightly slower (11.55 s) than the two other codes (9.45 s for fluidsim and 9.52 s for NS3D). This difference is mainly explained by the slower FFTs of SpectralDNS. Figure 6 presents a more detailed comparison between NS3D (blue bars) and fluidsim (yellow bars). The total elapsed time is mainly spent in five tasks: FFTs, Runge-Kutta 4, curl, vector product and "projection". The times spent on these tasks are compared for the two codes.
We see that the FFTs in NS3D are very fast: the FFT execution is 0.55 s longer for fluidsim (nearly 9% longer). This difference is especially significant for a sequential run, for which there is no communication cost involved in the FFT computation, making it the least favorable case for fluidsim. The difference can partially be explained by the fact that in NS3D all FFTs are in-place (so the input can be erased during the transform). On the one hand, this choice is good for performance and for lower memory consumption. On the other hand, since the same variables are used to store the fields in real and in Fourier space, it makes the code harder to write, to understand and to modify. Since memory consumption in clusters is much less of a problem than in the past, and since code simplicity is highly important for a framework like fluidsim, we chose to use out-of-place FFTs in fluidsim. Another factor is that the flag FFTW_PATIENT is used in NS3D, which leads to a very long initialization and sometimes faster FFTs. Since we did not see a significant speed-up from using this flag in fluidsim, and since we also care about initialization time, it is not used; we prefer the flag FFTW_MEASURE, which usually leads to similar performance.
Time stepping in NS3D is significantly slower than in fluidsim (by 0.34 s, i.e. ≃ 20% slower); we did not find a performance issue in NS3D. The linear operators are also slightly faster in fluidsim than in the Fortran code NS3D, because they correspond to Pythran functions written with explicit loops (see [11]).
Although the FFTs are a little faster in NS3D, the total elapsed time is slightly smaller (by less than 1% of the total time) for fluidsim in this case.
These examples do not prove that fluidsim is always faster than NS3D or is as fast as any very well optimized Fortran codes. However, they do demonstrate that our very high-level and modular Python code is very efficient and is not slower than a well-optimized Fortran code.

Quality control
fluidsim also packages unit tests alongside the related modules. Throughout the development process, we make sure that all tests pass, especially prior to a release, to ensure that new changes to the package do not damage existing functionality.
A part of the unit tests checks the solvers against verifiable conservation laws, such as the conservation of enstrophy and energy by the non-dissipative terms of the governing equations. We thereby ensure the accuracy of the solutions and, to some degree, prevent arithmetic and logic errors. The remaining tests verify the robustness of the code, for example against unexpected inputs passed between functions. These tests can prevent common error types such as syntax errors, I/O errors, runtime errors and compilation errors (for the extensions).
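Such a conservation check can be written as an ordinary unit test. The following self-contained example verifies, in the same spirit, that the quadratic energy computed in physical space equals its spectral counterpart (Parseval's identity); it is illustrative and not one of fluidsim's actual tests:

```python
import unittest

import numpy as np


class TestEnergyConservation(unittest.TestCase):
    """Illustrative check in the spirit of fluidsim's solver tests."""

    def test_parseval(self):
        # Energy computed in physical space must match the spectral sum
        rng = np.random.default_rng(0)
        u = rng.standard_normal((32, 32))
        u_fft = np.fft.fft2(u)
        energy_phys = 0.5 * np.sum(u ** 2)
        energy_spect = 0.5 * np.sum(np.abs(u_fft) ** 2) / u.size
        self.assertAlmostEqual(energy_phys, energy_spect, places=6)


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestEnergyConservation)
)
```

A test of this kind catches both arithmetic mistakes in the operators and accidental changes to the FFT normalization conventions.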
It is also important to quantify the completeness of the tests, which is done by calculating the code coverage: the ratio of the number of lines exercised by the unit tests to the total number of lines in the whole package. For the present version of fluidsim, the code coverage is approximately 70%; the coverage results are displayed at Codecov. fluidsim has also been checked by comparing its results on well-known verification problems, for example the Taylor-Green vortices for the incompressible Navier-Stokes equations. We also try to follow a consistent code style as recommended by PEP (Python Enhancement Proposals) 8 and 257. This is inspected using lint checkers such as flake8 and pylint among the developers. The code is regularly cleaned up using the Python code formatter black.
All the above quality-control techniques are run within the continuous-integration services Travis CI and Bitbucket Pipelines. Instructions on how to run the unit tests, coverage and lint checks are included in the documentation.

Operating system
Any POSIX based OS, such as GNU/Linux and macOS.

Programming language
Python 2.7, 3.4 or above. For the next versions, we will drop Python 2.7 support and Python >= 3.6 will be required.