(1) Overview
Introduction
Designed as a specialized package of the FluidDyn project for computational fluid dynamics (CFD), fluidsim is a comprehensive solution to address the needs of fluid mechanics students and researchers alike, by providing scalable high-performance solvers, on-the-fly postprocessing, and plotting functionalities under one umbrella. In the open-science paradigm, scientists are both users and developers of their tools at the same time. An advantage of fluidsim is that most users only have to read and write Python code. fluidsim ensures that all critical modules, classes and functions are well documented, both as inline comments and as standalone documentation, complete with examples and tutorials. For these reasons, fluidsim can become a truly collaborative code and has the potential to replace some in-house pseudospectral codes written in more conventional languages.
Balance between runtime efficiency and cost of development
In today’s world, where clusters are moving from petascale to exascale performance, computing power is plentiful. In most research scenarios, it becomes apparent that man-hours are more expensive than computing time; in other words, the cost of development outweighs the cost of computation. Therefore, developers should be willing to make small sacrifices in efficiency to improve development time, code maintainability and clarity in general. In exceptional cases, when computing time becomes more expensive, one can still use existing optimized and performance-tuned compiled libraries implemented in native languages as the core, and rely on high-level programming languages to extend them for specific applications.^{1}
For the above reasons, the majority of fluidsim’s codebase, in terms of lines of code, is written in pure Python syntax. However, this is done without compromising performance, by making use of libraries such as Numpy and of optimizing compilers such as Cython and Pythran.
Numpy functions and data types are sufficient for tasks such as initialization and postprocessing, since these functions are used sparingly. Computationally intensive tasks, such as time stepping and the linear algebra operators invoked at every single iteration, must be offloaded to compiled extensions. This optimization strategy can be considered the computational equivalent of the Pareto principle, also known as the 80/20 rule.^{2} The goal is to optimize such that “80 percent of the runtime is spent in 20 percent of the code” []. Here, the Cython [] and Pythran [] compilers come in handy. An example of how we use Pythran to reach performance similar to Fortran while writing only Python code is described in the companion paper on fluidfft [].
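As an illustration of this strategy (a hedged sketch, not code taken from fluidsim itself), a numerical kernel can be written once in plain Numpy style and then compiled ahead of time with Pythran simply by adding an export comment; without Pythran the very same code runs unchanged in CPython:

```python
import numpy as np

# pythran export compute_energy(complex128[:, :], float64[:, :])
def compute_energy(field_fft, K2):
    """Toy spectral-energy-like reduction written in plain Numpy style.
    Pythran can compile this function from the export comment above;
    the function name and formula here are illustrative only."""
    return 0.5 * np.sum(K2 * np.abs(field_fft) ** 2)

field_fft = np.ones((4, 4), dtype=np.complex128)
K2 = np.ones((4, 4))
print(compute_energy(field_fft, K2))  # 8.0
```

The same source file thus serves both as readable Python and as input to an optimizing compiler, which is the trade-off described above.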
The result of using such an approach is studied in the forthcoming sections by measuring the performance of fluidsim. We will demonstrate that a very large percentage of the elapsed time is spent in the execution of optimized compiled functions and thus that the “Python cost” is negligible.
Target audiences
fluidsim is designed to cater to the needs of three kinds of audience.
 Users, who run simulations with already available solvers. To do this, one needs to have very basic skills in Python scripting.
 Advanced users, who may extend fluidsim by developing customized solvers for new problems, taking advantage of built-in operators and time-stepping classes. In practice, one can easily implement such solvers by changing a few methods of the base solver classes. To do this, one needs fairly good skills in Python and especially in object-oriented programming.
 Developers, who develop the base classes, in particular the operators and time-stepping classes. One may also sometimes need to write compiled extensions to improve runtime performance. To do this, desirable traits include good knowledge of Python, Numpy, Cython and Pythran.
This metapaper is intended as a short introduction to fluidsim and its implementation, written mainly from a user perspective. Nevertheless, we also discuss how fluidsim can be customized and extended with minimal effort to promote code reuse. A more comprehensive and hands-on look at how to use fluidsim can be found in the tutorials,^{3} both from a user’s and a developer’s perspective. In the latter half of the paper, we inspect the performance of fluidsim on large computing clusters and compare fluidsim with three different pseudospectral CFD codes.
Implementation and architecture
New features were added to the package over the years whenever demanded by research interests, thus making the code very user-centric and function-oriented. This aspect of code development follows YAGNI (“you aren’t gonna need it”), one of the principles of the agile software development method, which emphasizes not spending a lot of time developing a functionality before it is actually needed.
Almost all functionalities in fluidsim are implemented as classes, and their methods are designed to be modular and generic. This means that the user is free to use inheritance to modify certain parts to suit one’s needs, avoiding the risk of breaking a functioning code.
Package organization
fluidsim is meant to serve as a framework for numerical solvers using different methods. The present version of fluidsim supports finite-difference and pseudospectral methods. An example of a finite-difference solver is fluidsim.solvers.ad1d, which solves the 1D advection equation. There are also solvers which do not rely on most of the base classes, such as fluidsim.base.basilisk, which implements a 2D adaptive-meshing solver based on the CFD code Basilisk. The collection of solvers using pseudospectral methods is more feature-rich in comparison.
The code is organized into the following subpackages:
 fluidsim.base: contains all base classes and a solver for the trivial equation ∂_{t}u = 0.
 fluidsim.operators: specialized linear algebra and numerical method operators (e.g., divergence, curl, variable transformations, dealiasing).
 fluidsim.solvers: solvers and postprocessing modules for problems such as 1D advection, 2D and 3D Navier-Stokes equations (incompressible and under the Boussinesq approximation, with and without density stratification and system rotation), one-layer shallow-water and Föppl-von Kármán equations.
 fluidsim.util: utilities to load and modify an existing simulation, to test, and to benchmark a solver.
The subpackages base and operators form the backbone of this package and are not meant to be used explicitly by the user. In practice, one can build an entirely new solver for a new problem using this framework by simply writing one or two importable files containing three classes:
 an InfoSolver class,^{4} containing the information on which classes will be used for the different tasks in the solver (time stepping, state, operators, output, etc.).
 a Simulation class^{5} describing the equations to be solved.
 a State class^{6} defining all physical variables and their spectral counterparts being solved (for example: u_{x} and u_{y}) and methods to compute one variable from another.
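The three-class pattern above can be sketched with a standalone toy (all names and signatures here are simplified stand-ins, not the real fluidsim base classes, whose APIs are richer):

```python
# Toy sketch of the InfoSolver / Simul / State pattern, self-contained
# and deliberately minimal; see the fluidsim documentation for the
# actual class hierarchy and signatures.

class State:
    """Declares the physical variables being solved (illustrative keys)."""
    keys_state_phys = ["ux", "uy"]

class InfoSolver:
    """Records which classes the solver assembles (state, time stepping, ...)."""
    def __init__(self):
        self.classes = {"State": State}

class Simul:
    """Describes the problem; here it only wires the other classes together."""
    InfoSolver = InfoSolver

    def __init__(self):
        self.info_solver = self.InfoSolver()
        self.state = self.info_solver.classes["State"]()

sim = Simul()
print(sim.state.keys_state_phys)  # ['ux', 'uy']
```

A new solver can then swap in a different State or time-stepping class simply by overriding the entries recorded in its InfoSolver, which is the mechanism of reuse described in the text.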
We now turn our attention to the simulation object which illustrates how to access the code from the user’s perspective.
The simulation object
The simulation object is instantiated with the necessary parameters just before starting the time stepping. A simple 2D Navier-Stokes simulation can be launched using the following Python script:
from fluidsim.solvers.ns2d.solver import Simul
params = Simul.create_default_params()
# Modify parameters as needed
sim = Simul(params)
sim.time_stepping.start()
The script first invokes the create_default_params class method, which returns a Parameters object, typically named params, containing all default parameters. Modifications to simulation parameters are made after this step, to meet the user’s needs. The simulation object, typically named sim, is then instantiated by passing params as the only argument, ready to start the iterations.
As demonstrated above, parameters are stored in an object of the class Parameters, which uses the ParamsContainer API^{7} of the fluiddyn package []. Parameters for all possible modifications to initialization, preprocessing, forcing, time stepping, and output of the solvers are incorporated into the params object in a hierarchical manner. Once initialized, the “public” (not hidden) API does not allow adding new parameters to this object; only modifications are permitted.^{8} This approach differs from that of conventional solvers reliant on text-based input files to specify parameters, which is less robust and can cause the simulation to crash at runtime due to human errors. A similar, but less readable, approach to ParamsContainer is adopted by OpenFOAM, which relies on dictionaries to customize parameters. The params object can be printed in the Python or IPython console and explored interactively using tab completion, and can be loaded from and saved into XML and HDF5 files, making it very versatile.
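The key safety property, rejecting unknown parameter names while allowing modifications, can be mimicked in a few lines (a minimal sketch; this is not the fluiddyn ParamsContainer implementation, and the parameter names are illustrative):

```python
class LockedParams:
    """Minimal mimic of a locked parameter container: existing entries
    can be modified, but adding a new (possibly misspelled) one fails."""

    def __init__(self, **defaults):
        object.__setattr__(self, "_data", dict(defaults))

    def __getattr__(self, name):
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        if name not in self._data:
            raise AttributeError(f"unknown parameter: {name}")
        self._data[name] = value

params = LockedParams(nu_2=1e-3, deltat=0.1)
params.nu_2 = 5e-4           # modifying an existing parameter is fine
try:
    params.nu2 = 5e-4        # typo: rejected instead of silently accepted
except AttributeError as err:
    print("rejected:", err)
```

A typo in a text-based input file would typically be ignored or crash much later; here the error surfaces immediately, at the line that caused it.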
Note that the same simulation object is used for plotting and postprocessing tasks. During or after the execution of a simulation, a simulation object can be created with the following code (to restart a simulation, one would rather use the function fluidsim.load_for_restart):
from fluidsim import load_sim_for_plot
# in the directory of the simulation
sim = load_sim_for_plot()
# or with the path of the simulation
# sim = load_sim_for_plot("~/Sim_data/NS2D.strat_240x240_S8x8_20180420_134554")
# to retrieve the value of a parameter
print(f"viscosity = {sim.params.nu_2}")
# to plot the space averaged quantities versus time
sim.output.spatial_means.plot()
# to load the corresponding data
data = sim.output.spatial_means.load()
# for a 2d plot of the variable "b"
sim.output.phys_fields.plot("b", time=2)
# to save a 2d animation
sim.output.phys_fields.animate("b", tmin=1, tmax=5, save_file=True)
Figure 1 demonstrates how a user can access the different objects and their associated methods through the simulation object for the solver fluidsim.solvers.ns3d. Note that the object and method names are similar, if not the same, for other solvers. The purposes of the objects are listed below in the order of instantiation:
 sim.params: A copy of the params object supplied by the user, which holds all information critical to running the simulation and generating the output.
 sim.info_solver: Contains all the subclass information, including the module the class belongs to.
 sim.info: A union of the sim.info_solver and the sim.params objects.
 sim.oper: Responsible for generating the grid, and numerical methods such as FFT, IFFT, dealiasing, divergence, curl, random arrays, etc.
 sim.output: Takes care of all on-the-fly postprocessing outputs and of functions to load and plot saved output files. Different objects are assigned the tasks of loading, plotting and sometimes computing:
 – sim.output.print_stdout: the mean energies, time elapsed and timestep of the simulation printed as console output.
 – sim.output.phys_fields: the state variables in the physical plane. It relies on sim.state to load or compute the variables into arrays.
 – sim.output.spatial_means: mean quantities such as energy, enstrophy, forcing power, dissipation.
 – sim.output.spectra: energy spectra as line plots (i.e. as functions of the modulus or of a component of the wavenumber).
 – sim.output.spect_energy_budg: spectral energy budget by calculating the transfer term.
 – sim.output.increments: structure functions from physical velocity fields.
 sim.state: Defines the names of the physical variables being solved for and their spectral equivalents, along with all required variable transformations. Also includes highlevel objects, aptly named sim.state.state_phys and sim.state.state_spect to hold the arrays.
 sim.time_stepping: Generic numerical time-integration object which dynamically determines the time step using the CFL criterion of the specific solver and advances the state variables using a Runge-Kutta method of order 2 or 4.
 sim.init_fields: Used only once to initialize all state variables from a previously generated output file or with simple kinds of flow structures, for example a dipole vortex, base flow with constant value for all gridpoints, grid of vortices, narrowband noise, etc.
 sim.forcing: Initialized only when params.forcing.enable is set to True; it computes the forcing variables, which are added to the right-hand side of the equations being solved.
 sim.preprocess: Adjusts solver parameters such as the magnitude of initialized fields, viscosity value and forcing rates after all other subclasses are initialized, and just before the timeintegration starts.
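The CFL criterion mentioned for sim.time_stepping amounts to bounding the time step by the grid spacing divided by the largest velocity. A hedged sketch (the actual fluidsim implementation differs in details; the coefficient and function name here are assumptions):

```python
import numpy as np

def cfl_time_step(ux, uy, dx, coef=0.5):
    """Illustrative CFL-limited time step: dt <= coef * dx / max|u|.
    coef is a safety factor; 0.5 is an arbitrary illustrative choice."""
    max_velocity = max(np.abs(ux).max(), np.abs(uy).max())
    if max_velocity == 0.0:
        return coef * dx  # fallback when the flow is at rest
    return coef * dx / max_velocity

ux = np.full((8, 8), 2.0)   # uniform velocity of magnitude 2
uy = np.zeros((8, 8))
print(cfl_time_step(ux, uy, dx=0.1))  # 0.025
```

Recomputing this bound every few iterations is what makes the time step "dynamic": as the flow accelerates, dt shrinks to keep the scheme stable.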
Such a modular organization of the solver’s features has several advantages. The most obvious one is the ease of maintaining the code base. As opposed to a monolithic solver, modular codes are well separated, which leads to fewer conflicts while merging changes from other developers. Secondly, with this implementation, it is possible to extend or replace a particular set of features by inheriting from or defining a new class.
In static languages, modular codes can be difficult to navigate, and the connections between objects and classes hard to understand. This is much less of a problem in Python, where one can easily retrieve this information from object attributes or using IPython’s dynamic object information feature.^{9} fluidsim goes one step further: one can simply print the sim.info_solver object in the Python/IPython console to get this information. A truncated example of the output is shown below.
>>> sim.info_solver
<fluidsim.solvers.ns3d.solver.InfoSolverNS3D object at 0x7fb6278263c8>
<solver class_name="Simul" module_name="fluidsim.solvers.ns3d.solver"
short_name="ns3d">
<classes>
<Operators class_name="OperatorsPseudoSpectral3D"
module_name="fluidsim.operators.operators3d"/>
<State class_name="StateNS3D" keys_computable="['rotz']"
keys_linear_eigenmodes="['rot_fft']" keys_phys_needed="['vx', 'vy',
'vz']" keys_state_phys="['vx', 'vy', 'vz']"
keys_state_spect="['vx_fft', 'vy_fft', 'vz_fft']"
module_name="fluidsim.solvers.ns3d.state"/>
<TimeStepping class_name="TimeSteppingPseudoSpectralNS3D"
module_name="fluidsim.solvers.ns3d.time_stepping"/>
<InitFields class_name="InitFieldsNS3D"
module_name="fluidsim.solvers.ns3d.init_fields">
<classes>
<from_file class_name="InitFieldsFromFile"
module_name="fluidsim.base.init_fields"/>
<from_simul class_name="InitFieldsFromSimul"
module_name="fluidsim.base.init_fields"/>
<in_script class_name="InitFieldsInScript"
module_name="fluidsim.base.init_fields"/>
<constant class_name="InitFieldsConstant"
module_name="fluidsim.base.init_fields"/>
<!-- truncated output -->
</classes>
</solver>
Note that while the 3D Navier-Stokes solver relies on some generic base classes shared with other solvers, such as OperatorsPseudoSpectral3D and TimeSteppingPseudoSpectral, there are also solver-specific classes. The latter often inherit from the base classes in fluidsim.base or from classes available in other solvers; this is made possible by the use of an object-oriented approach. Class inheritance is particularly advantageous when extending existing features or creating new solvers.
Performance
The performance of a code can be carefully measured by three different program-analysis methods: profiling, microbenchmarking and scalability analysis. Profiling traces the various function calls and records the cumulative time consumed and the number of calls for each function. Through profiling, we shed light on which part of the code consumes the lion’s share of the computational time and observe the impact as the number of processes and MPI communications increase. Microbenchmarking is the process of timing a specific portion of the code and comparing different implementations. This aspect is addressed in greater detail in the companion paper on fluidfft []. On the other hand, a scalability study measures how the speed of the code improves when it is deployed with multiple CPUs operating in parallel. To do this, the walltime required by the whole code to complete a certain number of iterations is measured for a given problem size. Finally, performance can also be studied by comparing different codes on representative problems. Since such comparisons should not focus only on performance, we present a comparison study in a separate section.
The profiling and scaling tests were conducted on a supercomputing cluster of the Swedish National Infrastructure for Computing (SNIC), namely Beskow (PDC, Stockholm). Relevant details regarding software and hardware which affect performance are summarised in Table 1. Note that no hyperthreading was used while carrying out the studies. The code comparisons were made on a smaller machine. Results from the three analyses are systematically presented in the following sections.
Cluster | Beskow (Cray XC40 system with Aries interconnect)
CPU | Intel Xeon CPU E5-2695v4, 2.1 GHz
Operating system | SUSE Linux Enterprise Server 11, Linux kernel 3.0.101
No. of cores per node used | 32
Maximum no. of nodes used | 32 (2D cases), 256 (3D cases)
Compilers | CPython 3.6.5, Intel C++ compiler (icpc) 18.0.0
Python packages | fluiddyn 0.2.3, fluidfft 0.2.3, fluidsim 0.2.1, numpy (OpenBLAS) 1.14.2, Cython 0.28.1, mpi4py 3.0.0, pythran 0.8.5
Profiling
It is straightforward to perform profiling with the help of the cProfile module, available in the Python standard library. For fluidsim, this module has been conveniently packaged into a command-line utility, fluidsim-profile. Here, we have analyzed both the 2D and 3D Navier-Stokes solvers on Beskow, and plotted the results in Figures 2 and 3 respectively. Functions which consume less than 2% of the total time are grouped into a single category, other.
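For readers unfamiliar with cProfile, a minimal self-contained example follows (generic stdlib usage, not the internals of fluidsim-profile; the profiled function is a toy stand-in):

```python
import cProfile
import io
import pstats

def heavy(n):
    # stand-in for a numerically intensive routine
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
heavy(100_000)
profiler.disable()

# Sort by cumulative time and keep only the costliest entries,
# mirroring how profile reports highlight the dominant functions.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
print("profile captured:", "function calls" in stream.getvalue())
```

Sorting by cumulative time is what reveals the "20 percent of the code" where the runtime concentrates, as in Figures 2 and 3.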
In Figure 2, both sequential and parallel profiles of the 2D Navier-Stokes solver show that the majority of time is spent in inverse and forward FFT calls (ifft_as_arg and fft_as_arg). For the sequential case, approximately 0.14% of the time is spent in pure Python functions, i.e. functions not built using Cython or Pythran. Cython extensions are responsible for interfacing with the FFT operators and for the time-stepping algorithm. Pythran extensions are used to translate most of the linear algebra operations into optimized, statically compiled extensions. We also see that only 1.3% of the time is spent outside the main six functions (category other in the figure). With 8 processes deployed in parallel, the time spent in pure Python functions increases to 1.1% of the total time. These results show that, during the optimization process, we only have to focus on a very small number of functions.
Figure 3 shows that, for the 3D Navier-Stokes solver, the majority of time is again attributed to FFT calls. The overall time spent in pure Python functions ranges from 0.001% for 512^{3} grid points and 2 processes to 0.46% for 128^{3} grid points and 8 processes. This percentage tends to increase with the number of processes used, since the real calculations done in the compiled extensions take less time. This percentage is also higher for the coarser resolution for the same reason. However, in all cases, the time spent in pure Python remains largely negligible compared to the time spent in compiled extensions.
Scalability
Scalability can be quantified by the speedup S, which is based on the time taken to complete N iterations for different numbers of processes, n_{p}. Speedup is formally defined here as:

$$S_\alpha(n_p) = n_{p,\min} \times \frac{\left[T(n_{p,\min})\right]_{\text{fastest}}}{\left[T(n_p)\right]_{\alpha}} \qquad (1)$$

where T is the time elapsed to complete the N iterations, n_{p,min} is the minimum number of processes employed for a specific array size and hardware, α denotes the FFT class used, and “fastest” corresponds to the fastest result among the various FFT classes. In addition to the number of processes, there is another important parameter: the size of the problem, in other words, the number of grid points used to discretize the problem at hand. In strong scaling analysis, we keep the global grid size fixed and increase the number of processes. Ideally, this should yield a speedup which increases linearly with the number of processes. Realistically, as the number of processes increases, so does the number of MPI communications, contributing to some latency in the overall time spent and thus resulting in less than ideal performance. Also, as shown by profiling in the previous section, the majority of the time is consumed in forward and inverse FFT calls, an inherent bottleneck of the pseudospectral approach. The FFT function calls are the source of most of the MPI calls during runtime, limiting the parallelism.
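The speedup definition above reduces to a one-line computation; the sketch below uses hypothetical elapsed times (the method names and numbers are illustrative, not measured values):

```python
def speedup(times_by_method, n_p, method, n_p_min):
    """Speedup of `method` at n_p processes, referenced to the fastest
    method at the minimum process count n_p_min:
    S_alpha(n_p) = n_p_min * T_fastest(n_p_min) / T_alpha(n_p).
    times_by_method maps method name -> {n_p: elapsed time in seconds}."""
    t_ref = min(times[n_p_min] for times in times_by_method.values())
    return n_p_min * t_ref / times_by_method[method][n_p]

times = {  # hypothetical elapsed times for N iterations, in seconds
    "fftw1d": {2: 10.0, 4: 6.0},
    "fftwmpi2d": {2: 8.0, 4: 4.0},
}
print(speedup(times, 4, "fftwmpi2d", 2))  # 4.0
```

With this normalization, perfect strong scaling of the fastest method yields S = n_p, which is the ideal line drawn in the scaling figures.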
2D benchmarks
The 2D Navier-Stokes solver (fluidsim.solvers.ns2d), solving an initial value problem (with random fields), was chosen as the test case for the strong scaling analysis. The physical grid was discretized with 1024 × 1024 and 2048 × 2048 points. The fourth-order Runge-Kutta (RK4) method with a constant time step was used for time integration. File input/output and the forcing term have been disabled so as to measure the performance accurately. The test case is then executed for 20 iterations. The time elapsed was measured just before and after the sim.time_stepping.start() function call, and was then used to calculate the average walltime per iteration and the speedup. This process is repeated for two different FFT classes provided by fluidfft, namely fft2d.mpi_with_fftw1d and fft2d.mpi_with_fftwmpi2d.
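The timing procedure just described can be sketched in a few lines (a generic stand-in: the callable below replaces the actual sim.time_stepping.start() call, and the sleep is a hypothetical workload):

```python
import time

def walltime_per_iteration(run, n_iterations):
    """Measure elapsed walltime around a call, as described above for
    sim.time_stepping.start(), and average it per iteration."""
    t_start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - t_start
    return elapsed / n_iterations

# hypothetical stand-in for a 20-iteration solver run
per_iter = walltime_per_iteration(lambda: time.sleep(0.02), 20)
print(f"{per_iter:.4f} s/iteration")
```

Using time.perf_counter (a monotonic high-resolution clock) rather than wall-clock time avoids artifacts from system clock adjustments during long runs.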
Although the FFT classes are different for sequential and parallel runs, for the sake of comparison, we have tabulated the actual walltime elapsed in Table 2. Clearly, due to the differences in the implementations, the fftw1d-based classes take a performance hit with parallelization, while for the classes based on fftw2d the performance improves.
FFT class | Time (n_{p} = 1) | FFT class | Time (n_{p} = 2)
fft2d.with_fftw1d | 6.63 | fft2d.mpi_with_fftw1d | 7.63
fft2d.with_fftw2d | 5.59 | fft2d.mpi_with_fftwmpi2d | 3.91
In Figure 4 we analyze the strong scaling speedup S and the walltime per iteration. The fastest result for a particular case is assigned the value S = n_{p}, as mentioned earlier in Eq. 1. The ideal speedup, indicated with a dotted black line, varies linearly with the number of processes. We notice that for the 1024 × 1024 case there is a clear increasing trend in speedup for intra-node computations. Nevertheless, when this test case is solved using more than one node (n_{p} > 32), the speedup drops abruptly. While it may be argued that the speedup is impacted by the cost of inter-node MPI communications via the network interfaces, that is not the case here. This is shown by the speedup for the 2048 × 2048 case, where the speedup increases from n_{p} = 32 to 64, after which it drops again. It is thus important to remember that a decisive factor in pseudospectral simulations is the choice of the grid size, both global and local (per process), and that for certain shapes the FFT calls can be exceptionally fast, or vice versa.
From the above results, it may also be inferred that superior performance is achieved with fft2d.mpi_with_fftwmpi2d as the FFT method. The fft2d.mpi_with_fftw1d method serves as a fallback option when either the FFTW library has not been compiled with MPI bindings or the domain decomposition results in zero-sized arrays. The latter is a known issue with the current versions of fluidfft and fluidsim and requires further development.^{10} This is why the scalability of fft2d.mpi_with_fftwmpi2d seems restricted in Figure 4, whereas in reality it can be as scalable as fft2d.mpi_with_fftw1d. On the right of Figure 4, the real time or walltime required to perform a single iteration, in seconds, is found to vary inversely with the number of processes, n_{p}. The walltime per iteration ranges from 0.195 to 0.023 seconds for the 1024 × 1024 case, and from 0.128 to 0.051 seconds for the 2048 × 2048 case. It is thus indeed feasible and scalable to use this particular solver.
3D benchmarks
Following a procedure similar to that described in the previous section, the 3D Navier-Stokes solver (fluidsim.solvers.ns3d) was chosen for the 3D benchmarks. As shown in Figure 5, two global physical grids, with 128 × 128 × 128 and 1024 × 1024 × 1024 points, are used to discretize the domain. The other parameters are identical to those described for the 2D benchmarks.
Through fluidfft, this solver has four FFT methods at disposal:
 fft3d.mpi_with_fftw1d
 fft3d.mpi_with_fftwmpi3d
 fft3d.mpi_with_p3dfft
 fft3d.mpi_with_pfft
The first two methods implement a 1D or “slab” decomposition, i.e. the processes are distributed over one index of the 3D array, and the last two methods implement a 2D or “pencil” decomposition. For the sake of clarity, we have restricted this analysis to the fastest FFT method of each of the two types in this configuration, namely fft3d.mpi_with_fftwmpi3d and fft3d.mpi_with_p3dfft. A more comprehensive study of the performance of these FFT methods can be found in [].
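The difference between the two decompositions is easiest to see from the local array shape each process holds. A simplified sketch (it assumes sizes divide evenly and a square process grid for the pencil case; real FFT libraries handle remainders and other grid shapes):

```python
import math

def local_shape(global_shape, n_p, decomposition):
    """Illustrative local array shape when an (n0, n1, n2) global array
    is distributed over n_p MPI processes."""
    n0, n1, n2 = global_shape
    if decomposition == "slab":      # 1D decomposition: split one index
        return (n0 // n_p, n1, n2)
    if decomposition == "pencil":    # 2D decomposition: split two indices
        rows = math.isqrt(n_p)       # assume a square process grid
        return (n0 // rows, n1 // (n_p // rows), n2)
    raise ValueError(decomposition)

print(local_shape((128, 128, 128), 16, "slab"))    # (8, 128, 128)
print(local_shape((128, 128, 128), 16, "pencil"))  # (32, 32, 128)
```

With a slab decomposition, n_p cannot exceed the size of the split index (here 128), which is the scalability limit discussed below; a pencil decomposition lifts that limit roughly to the square of it.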
In Figure 5, the strong scaling speedup and walltime per iteration are plotted from the 3D benchmarks on Beskow. The analysis here covers single-node and inter-node performance. For both grid sizes analyzed here, the fft3d.mpi_with_fftwmpi3d method is the fastest of all the methods but is limited in scalability because of its 1D domain decomposition strategy. To utilize a large number of processes, the 2D decomposition approach is required. Also note that for the 1024 × 1024 × 1024 case, a single-node measurement was not possible, as the size of the arrays required to run the solver exceeds the available memory. For the same case, a speedup reasonably close to the linear variation is observed with fft3d.mpi_with_p3dfft. The walltime per iteration improves from 0.083 to 0.027 seconds for the 128 × 128 × 128 case, and from 31.078 to 2.175 seconds for the 1024 × 1024 × 1024 case.
CFD pseudospectral code comparisons
As a general CFD framework, fluidsim could be compared to OpenFOAM (a CFD framework based on finite-volume methods). However, in contrast to OpenFOAM, the current version of fluidsim is highly specialized in pseudospectral Fourier methods and is not adapted for industrial CFD.
In this subsection, we compare fluidsim with three other open-source CFD pseudospectral codes^{11}:
 Dedalus [] is “a flexible framework for spectrally solving differential equations”. It is very versatile, and the user describes the problem to be solved symbolically. This approach is very different from that of fluidsim, where the equations are described with simple Numpy code. There is no equivalent of the fluidsim concept of a “solver”, i.e. a class corresponding to a set of equations with specialized outputs (with the corresponding plotting methods). To run a simulation with Dedalus, one has to describe the problem using mathematical equations. This can be very convenient because it is very versatile, and it is not necessary to understand how Dedalus works to define a new problem. However, this approach also has drawbacks:
 – Even for very standard problems, one needs to describe the problem in the launching script.
 – There is a potentially long initialization phase during which Dedalus processes the user input and prepares the “solver”.
 – Even when a user knows how to define a problem symbolically, it is not simple to understand how the problem is solved by Dedalus and how to interact with the program with Python.
 – Since solvers are not implemented out-of-the-box in Dedalus, specialized forcing schemes or outputs are absent. For example, the user has to implement the computation, saving and plotting of standard outputs like energy spectra.
 SpectralDNS [] is a “high-performance pseudospectral Navier-Stokes DNS solver for triply periodic domains. The most notable feature of this solver is that it is written entirely in Python using Numpy, MPI for Python (mpi4py) and pyFFTW.”
Therefore, SpectralDNS is technically very similar to fluidsim. Some differences are that SpectralDNS has no object-oriented API, and that the user has to define output and forcing in the launching script,^{12} which is thus usually much longer than for fluidsim. Moreover, the parallel Fourier transforms are done using the Python package mpiFFT4py, which is only able to use the FFTW library and not other libraries, as fluidfft can [].
 NS3D is a highly efficient pseudospectral Fortran code. It has been written in the laboratory LadHyX and used for several studies involving simulations (in 3D and in 2D) of the Navier-Stokes equations under the Boussinesq approximation with stratification and system rotation []. It is in particular specialized in stability studies []. NS3D has been highly optimized and is very efficient for sequential and parallel simulations (using MPI and OpenMP). However, the parallelization is limited to a 1D decomposition for the FFT []. Another weakness compared to fluidsim is that NS3D uses simple binary files, instead of the HDF5 and NetCDF4 files used by fluidsim. Therefore, visualization programs like Paraview or Visit cannot load NS3D data.
As with many Fortran codes, Bash and Matlab are used for launching and postprocessing, respectively. In terms of user experience, this can be a drawback compared to the coherent framework fluidsim for which the user works only with Python.
Description of the test. We have formulated the benchmark tests so that all the aforementioned codes solve the Navier-Stokes equations expressed in the
 stream function-vorticity (ψ-ζ) formulation for the bidimensional simulations,

$$\partial_t \widehat{\zeta} = -\widehat{u \cdot \nabla \zeta} - \nu \kappa^2 \widehat{\zeta}, \qquad \kappa^2 \widehat{\psi} = \widehat{\zeta}, \qquad \widehat{u} = i k \times \widehat{e}_{k_z} \widehat{\psi}$$

where the hat ( ^ ) notation signifies spectral quantities, u and k represent the velocity and wavenumber vectors, ν is the viscosity, and κ = |k| (see the fluidsim.solvers.ns2d solver documentation);
 rotational form for the tridimensional simulations,

$$\partial_t \widehat{u} = P_{\perp} \widehat{(u \times \zeta)} - \nu \kappa^2 \widehat{u}$$
where the P_{⊥} operator projects the vector field onto a divergence-free plane (see the fluidsim.solvers.ns3d solver documentation and []) and is equivalent to a pressure-correction step.
We compare the codes with a very simple and standard task: running a solver for ten time steps with the Runge-Kutta 4 method. The solvers in fluidsim rely on fluidfft to perform the dealiasing operation applied to the nonlinear terms (say, $\widehat{\text{N}}$) to ensure that no aliasing errors emerge in the course of the simulation. This is implemented by setting modes to zero according to the expression

$$\widehat{N}(k) = 0 \quad \text{for} \quad |k_i| > C_d \, k_{i,\max},$$

where C_{d} stands for the coefficient of dealiasing and has a typical value of 2/3. Other codes, such as SpectralDNS and Dedalus, allow instead zero-padding of the spectral arrays with typically 3/2 times more modes along each wavenumber axis, an optimization which can potentially save significant computation time.^{13} For quantitative comparisons, and for the sake of simplicity, we limit ourselves to comparing only sequential runs. We have already discussed the issue of the scalability of pseudospectral codes based on Fourier transforms in detail in the previous section and in the companion paper [].
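The truncation dealiasing described above amounts to building a boolean mask over the wavenumber grid and zeroing the flagged modes. A minimal sketch with Numpy (illustrative function and variable names, not the fluidfft implementation):

```python
import numpy as np

def dealias(field_fft, kx, ky, coef=2.0 / 3):
    """Zero every spectral mode whose wavenumber exceeds coef * k_max
    along either axis (the 2/3 rule for quadratic nonlinearities)."""
    KX, KY = np.meshgrid(kx, ky, indexing="ij")
    mask = (np.abs(KX) > coef * np.abs(kx).max()) | (
        np.abs(KY) > coef * np.abs(ky).max()
    )
    out = field_fft.copy()
    out[mask] = 0.0
    return out

kx = np.fft.fftfreq(8, d=1.0 / 8)  # wavenumbers 0..3, -4..-1
field = np.ones((8, 8), dtype=np.complex128)
dealiased = dealias(field, kx, kx)
print(int(np.count_nonzero(dealiased)))  # 25
```

On this 8 × 8 grid, only the 5 modes per axis with |k| ≤ (2/3) k_max survive, so 25 of the 64 modes remain nonzero; the zero-padding alternative mentioned above keeps all modes but performs the products on a 3/2-times-larger grid.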
In contrast to the framework fluidsim for which it is easy to define a new solver for a new set of equations, NS3D is specialized in solving the NavierStokes equations under the Boussinesq approximation. Using NS3D to solve a new set of equations would require very deep changes in many places in the code.
Note that Dedalus does not implement the standard fully explicit RK4 method.^{14} Thus, for Dedalus, we use the most similar time-stepping scheme available, RK443, a four-stage, third-order mixed implicit-explicit scheme described in []. Note that in the other codes, part of the linear terms are also treated implicitly. Also note that in several cases, the upper bound of the time step is not limited first by the stability of the time scheme, but rather by other needs (resolving the fastest wave, accuracy, etc.), so these benchmarks are representative of the elapsed time for accurate real-life simulations.
Bidimensional simulations. We first compare the elapsed times for two resolutions (512^{2} and 1024^{2}) over a bidimensional space. The results are summarized in Table 3. The results are consistent for the two resolutions. fluidsim is the fastest code for these cases. Dedalus is three times slower than fluidsim, but as discussed earlier, the time-stepping method is different. Another notable issue with Dedalus is its long initialization time (more than 3 min for 1024^{2}, compared to approximately 5 s for fluidsim). Also note that Dedalus has been optimized for bounded domains with Chebyshev methods. The two other codes, SpectralDNS and NS3D, have similar performance: slightly slower than fluidsim and much faster than Dedalus. The Fortran code NS3D is slower (by 47%) than the Python code fluidsim. This can be explained by the fact that there is no specialized numerical scheme for the 2D case in NS3D, so more FFTs have to be performed compared to SpectralDNS and fluidsim. This highlights the importance of implementing a well-adapted algorithm for a class of problems, which is much easier with a highly modular code such as fluidsim than with a specialized code such as NS3D.
Table 3. Elapsed times (in seconds) for ten RK4 time steps (sequential runs).

            fluidsim   Dedalus   SpectralDNS   NS3D
512^{2}     0.51       1.53      0.92          0.82
1024^{2}    2.61       8.00      3.48          3.96
Tridimensional simulations. We now compare the elapsed times for ten RK4 time steps for a tridimensional case with a resolution of 128^{3}. Dedalus is slow and does not seem adapted to this case, so we do not give an exact elapsed time for this code. SpectralDNS is slightly slower (11.55 s) than the two other codes (9.45 s for fluidsim and 9.52 s for NS3D). This difference is mainly explained by the slower FFTs in SpectralDNS.
Figure 6 presents a more detailed comparison between NS3D (blue bars) and fluidsim (yellow bars). The total elapsed time is mainly spent in five tasks: FFTs, Runge-Kutta 4, curl, vector product and "projection". The times spent on these tasks are compared for the two codes.
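A per-task breakdown of this kind can be obtained for any Python solver with the standard cProfile module. The snippet below is only a toy sketch with a hypothetical solver loop, not how the Figure 6 data were produced:

```python
import cProfile
import pstats
import numpy as np

def run_toy_solver(n=64, n_steps=10):
    """Toy pseudospectral-like loop mixing FFTs and a linear operator."""
    u = np.random.rand(n, n)
    for _ in range(n_steps):
        u_hat = np.fft.rfft2(u)                      # "FFT" task
        u_hat *= 0.99                                # "linear operator" task
        u = np.fft.irfft2(u_hat, s=u.shape)          # inverse "FFT" task
    return u

profiler = cProfile.Profile()
profiler.runcall(run_toy_solver)
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(5)  # show the 5 entries with the largest cumulative time
```

Sorting by cumulative time immediately shows which tasks dominate the run, which is how one identifies the "20 percent of the code" worth optimizing.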
We see that the FFTs in NS3D are very fast: the FFT execution is 0.55 s longer for fluidsim (nearly 9% longer). This difference is especially significant for a sequential run, for which no communication cost is involved in the FFT computation, making it the least favorable case for fluidsim. Indeed, MPI communications are input/output-bound tasks which are not faster in Fortran than in Python.
This difference can partially be explained by the fact that in NS3D, all FFTs are in-place (so the input can be erased during the transform). On the one hand, this choice is good for performance and lowers memory consumption. On the other hand, since the same variables are used to store the fields in real and in Fourier space, it makes the code harder to write, to understand and to modify. Since memory consumption in clusters is much less of a problem than in the past, and since code simplicity is highly important for a framework like fluidsim, we chose to use out-of-place FFTs in fluidsim. Another factor is that the flag FFTW_PATIENT is used in NS3D, which leads to a very long initialization and sometimes faster FFTs. Since we did not see a significant speedup when using this flag in fluidsim, and since we also care about initialization time, we instead use the flag FFTW_MEASURE, which usually leads to similar performance.
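The practical consequence of the out-of-place choice can be illustrated with NumPy, whose FFTs are out-of-place: the real-space array survives the transform and can be reused directly, whereas an in-place transform would have overwritten it.

```python
import numpy as np

u = np.random.rand(16, 16)
u_copy = u.copy()

# Out-of-place transform: a new array is allocated for the spectral
# field, so the real-space array `u` is left untouched.
u_hat = np.fft.rfft2(u)
assert np.array_equal(u, u_copy)

# The round trip recovers the field up to floating-point error.
u_back = np.fft.irfft2(u_hat, s=u.shape)
assert np.allclose(u_back, u)
```

The price is one extra buffer per transformed field, which is what makes in-place transforms attractive when memory is scarce.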
Time stepping in NS3D is significantly slower than in fluidsim (0.34 s, i.e. ≃ 20% slower). We did not find a performance issue in NS3D. The linear operators are slightly faster in fluidsim than in the Fortran code NS3D, because they correspond to Pythran functions written with explicit loops (see []).
Although the FFTs are a little faster in NS3D, the total time is slightly smaller (by less than 1% of the total time) for fluidsim in this case.
These examples do not prove that fluidsim is always faster than NS3D or as fast as any very well-optimized Fortran code. However, they do demonstrate that our very high-level and modular Python code is very efficient and not slower than a well-optimized Fortran code.
Quality control
fluidsim also packages unit tests alongside the related modules. Throughout the development process, we make sure that all tests pass, especially prior to a release, to ensure that new changes to the package do not break existing functionality.
A part of the unit tests checks the solvers against verifiable conservation laws, such as enstrophy and energy conservation due to the nondissipative terms in the governing equations. We thus ensure the accuracy of the solutions and, to some degree, prevent arithmetic and logic errors. The remaining tests verify the robustness of the code, for example to avoid passing unexpected inputs between functions. These tests can prevent common error types such as syntax errors, I/O errors, runtime errors and compilation errors (for the extensions).
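As an illustration of this kind of test (a minimal sketch, not an actual fluidsim test; the class and helper names are hypothetical), one can check that a quadratic invariant of a nondissipative toy system is conserved by the time scheme up to its truncation error:

```python
import unittest

def rk4_step(u, dt, f):
    """One classical explicit Runge-Kutta 4 step for du/dt = f(u)."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

class TestEnergyConservation(unittest.TestCase):
    def test_rotation_conserves_energy(self):
        # du/dt = i*u rotates u in the complex plane: the "energy"
        # |u|^2 is exactly conserved by the continuous equations.
        u, dt = 1.0 + 0.0j, 1e-2
        for _ in range(100):
            u = rk4_step(u, dt, lambda u: 1j * u)
        # The scheme should conserve the energy up to its truncation error.
        self.assertAlmostEqual(abs(u) ** 2, 1.0, places=8)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEnergyConservation)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A failing assertion here would signal either a bug in the time stepping or a spurious dissipative term in the right-hand side.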
It is also important to quantify the completeness of the tests, and this is done by measuring the code coverage, i.e. the ratio of the number of lines executed by the unit tests to the total number of lines in the whole package. For the present version of fluidsim, the code coverage is approximately 70%. The code coverage results for fluidsim are displayed at Codecov. fluidsim has also been checked by comparing its results for well-known verification problems, such as the Taylor-Green vortices for the incompressible Navier-Stokes equations.^{15} We also try to follow a consistent code style, as recommended by PEP (Python Enhancement Proposals) 8 and 257. This is also inspected by the developers using lint checkers such as flake8 and pylint. The code is regularly cleaned up using the Python code formatter black.
All the above quality control techniques are implemented within the continuous testing solutions Travis CI and Bitbucket Pipelines. Instructions on how to run the unit tests, coverage and lint checks are included in the documentation.
(2) Availability
Operating system
Any POSIX based OS, such as GNU/Linux and macOS.
Programming language
Python 2.7, 3.4 or above. For the next versions, we will drop Python 2.7 support and Python >= 3.6 will be required.
Dependencies
List of contributors
 Ashwin Vishnu Mohanan (KTH): Development of the shallow water equations solver, fluidsim.solvers.sw1l, testing, continuous integration, code coverage and documentation.
 Cyrille Bonamy (LEGI): Extending the subpackage fluidsim.operators.fft (currently deprecated) into a dedicated package, fluidfft used by fluidsim solvers.
 Miguel Calpe (LEGI): Development of the 2D Boussinesq equation solver, fluidsim.solvers.ns2d.strat.
 Pierre Augier (LEGI): Creator of fluidsim and FluidDyn project, developer of majority of the modules and solvers, futureproofing with Python 3 compatibility and documentation.
Software location
Archive
Name: PyPI
Persistent identifier: https://pypi.org/project/fluidsim
Licence: CeCILL, a free software license adapted to both international and French legal matters, in the spirit of and retaining compatibility with the GNU General Public License (GPL).
Publisher: Pierre Augier
Version published: 0.2.2
Date published: 02/07/2018
Code repository
Name: Bitbucket
Persistent identifier: https://bitbucket.org/fluiddyn/fluidsim
Licence: CeCILL
Date published: 2015
Emulation environment
Name: Docker
Persistent identifier: https://hub.docker.com/r/fluiddyn/python3stable
Licence: CeCILL-B, a BSD-compatible French licence.
Date published: 02/10/2017
Language
English
(3) Reuse potential
fluidsim can be used in research and teaching to run numerical simulations with the existing solvers. Its simplicity of use and its plotting capacities make it particularly well adapted for teaching. fluidsim is used at LEGI and at KTH for studies on geophysical turbulence (see for example []). Since it is easy to modify any characteristic of the existing solvers or to build new solvers, fluidsim is a good tool to carry out other types of simulations for academic studies. The qualities and advantages of fluidsim (integration with the Python ecosystem and the FluidDyn project, documentation, reliability thanks to unit tests and continuous integration, versatility, efficiency and scalability) make us think that fluidsim can become a true collaborative code.
There is no formal support mechanism. However, bug reports can be submitted on the Issues page on Bitbucket. Discussions and questions can be aired on instant messaging channels in Riot (or any equivalent client for the Matrix protocol)^{16} or via the IRC protocol on Freenode at #fluiddynusers. Discussions can also be exchanged via the official mailing list.^{17}