pydiffusion: A Python Library for Diffusion Simulation and Data Analysis

pydiffusion is a free and open-source Python library designed to solve diffusion problems for both single-phase and multi-phase binary systems. The key features of pydiffusion include fast simulation of multi-phase diffusion and extraction of diffusion coefficients from experimental concentration profiles using forward simulation analysis. pydiffusion also provides various mathematical models for diffusion profile smoothing, diffusion coefficient evaluation, and data optimization. In pydiffusion, diffusion profiles and various phases are easy to define or read from the experimental datasets. Visualization tools based on Matplotlib are also provided to help users present or refine their simulations and analysis. Funding statement: The development of pydiffusion is supported by the US National Science Foundation (NSF) under Grant number CMMI-1333999, and it is part of an NSF Designing Materials to Revolutionize and Engineer our Future (DMREF) project.


Introduction
Diffusion is the transport of matter from one point to another by thermal motion of atoms or molecules [1]. It is one of the fundamental processes in nature. Diffusion in solids was firstly discovered by Roberts Austen in the late 19 th century. Since then, various theories and models have been established to describe solid-state diffusion. Tremendous amount of diffusion data has been collected to better understand and predict diffusion associated phenomena such as precipitation, homogenization, solidification and creep deformation. In the past decades, many computational tools [2,3] are powerful enough to handle complex simulation problems. A fast and stable simulation tool can help researchers understand and optimize diffusion processes in materials.
As one of the fundamental properties in materials, diffusion coefficients are essential inputs for the establishment of mobility database and Integrated Computational Materials Engineering (ICME) [4] based material design and process optimization. Many high-throughput approaches [5] have been developed to establish various materials databases and accelerate materials design. Forward simulation analysis (FSA) is a powerful approach to extract diffusion coefficient data from diffusion couple experiments [6,7]. It has been applied to various diffusion systems and has demonstrated its robustness over the last several years [8][9][10][11][12][13].
The pydiffusion software package is an open-source Python library designed to simulate diffusion and analyse diffusion data using various mathematical and simulation models. Compared to commercialized simulation software like DICTRA [2] and PanDiffusion [3], pydiffusion focuses on simulation with diffusion coefficients data instead of mobility databases. The key feature of pydiffusion includes fast simulation of multi-phase systems and extraction of diffusion coefficients from experimental profiles. pydiffusion also provides easy constructors for diffusion systems and profile objects, which makes it easy to learn and perform diffusion simulations and data analysis quickly.

Implementation and architecture
The general architecture of pydiffusion is depicted in Figure 1. Diffusion profile and diffusion coefficients are the two basic objects in diffusion studies. They are represented as DiffProfile and DiffSystem object respectively in pydiffusion. The process from DiffSystem to DiffProfile is a diffusion simulation, the reversed process is diffusion coefficient (D ) extraction. Users can read/save DiffProfile and DiffSystem from/to csv files or plot them directly.

DiffProfile
The DiffProfile object is the fundamental representation of diffusion concentration profile data in pydiffusion. It includes the information of 1-dimensional (1D) grids with distance and composition data. For multi-phase diffusion profile, the positions of phase boundaries/interfaces are also included.
There are many types of profile data in diffusion analysis. For experimental raw data, DiffProfile can be constructed by providing composition and distance data. For the initial profile data used for a simulation, step() method is usually used to construct a step profile based on pre-constructed grids. 1D grids can be constructed using the mesh() function, which provides nonlinear/linear meshing of 1-dimensional grids. For convenience, pydiffusion also provides save_csv() and read_csv() methods to save/read DiffProfile to/from csv format files.

DiffSystem
The DiffSystem is a representation of diffusion coefficients data of a binary system. For each phase within the binary diffusion system, the DiffSystem includes its solubility range and a function representing its compositional dependence of interdiffusion coefficients, where spline functions are usually used. The DiffSystem can be constructed with a given spline function for each phase. User can also construct the DiffSystem with solubilities and diffusion coefficient data.
Like DiffProfile, save_csv() and read_csv() can save/read DiffSystem as well. Users can choose to either save DiffProfile and its corresponding DiffSystem together within one csv file or save them separately. mphSim() mphSim() is the core diffusion simulation method within pydiffusion. Its functionality is to use given diffusion coefficients to simulate diffusion process for a certain amount of time. The keyword arguments of mphSim() are def mphSim(profile, diffsys, time, liquid=0, output=True, name='') profile is the initial DiffProfile before simulation, diffsys is the DiffSystem during simulation, time is the diffusion time in seconds and liquid gives the position of liquid phase in the current geometry. By default, liquid=0 means there is no liquid phase during simulation. liquid=1/-1 is corresponding to liquid phase attached to left/right of the current geometry. The mphSim() method returns the simulation result as a DiffProfile object.
Within a single phase, diffusion can be simulated using the finite difference method based on Fick's two diffusion laws: For a binary diffusion system with multiple phases, phase boundary/interface is controlled by moving boundary condition: In mphSim(), phase boundaries/interfaces are independent from simulation grids, i.e. not attached to any grids, the geometry nearby a phase interface is illustrated in Figure 2. The sharp interface model is used so a phase interface is a point with no width. There are two composition values C α and C β assigned to each interface, which corresponds to the solubility limits of two adjacent phases α and β. The phase interface can move across any simulation grids following Eq. 2. Once a simulation grid is passed by an interface, it will be assigned to the adjacent phase. To ensure stability, the simulation time step Δt must be strictly under control. In every simulation loop within mphSim(), time step Δt is limited by the following three rules: 1. Within every phase, von Neumann analysis [14] gives the upper limit of simulation time step to prevent simulation instability.
In which d is the grid size, D is the interdiffusion coefficient. 2. Each phase interface can pass by one grid at most after each loop. The purpose of this rule is to help arrange grids easier by the end of each simulation loop. 3. For the grids nearby phase boundaries, the two rules above cannot keep their composition values within their solubility limits. An additional rule must be assigned. In Figure 2, C A < C α and C B < C β must be ensured after applying Fick's 2 nd law, which is: So Δt must satisfy: In diffusion with a liquid phase attached, diffusion coefficients in liquid phase (~5 × 10 -9 m 2 /s) is extremely high in comparison with a solid phase. Therefore, liquid can only be attached to the end of the geometry in mphSim(), which is controlled by the liquid parameter. During simulation, the liquid phase only provides a diffusion flux at the phase boundary/interface.
For diffusion simulations with thin films, users can setup the initial profile with a very thin phase on the left/right. mphSim() will delete such phase once it is consumed during simulation.
mphSim() can handle most of diffusion simulation for complex systems with multiple phases. For easy simulation of single-phase systems, users can also use sphSim() method, in which moving boundary condition is not considered.

datasmooth()
The datasmooth() function is responsible for smoothing experimental diffusion profile data. Better quality of diffusion coefficients can be extracted using Sauer-Freise equation [15] afterwards. datasmooth() uses moving average method to smooth diffusion profiles. For multi-phase systems, data within different phases are smoothed separately. Then sharp phase interfaces will be created between two adjacent phases to obtain more accurate solubility information of the multiple phases. This step is very important especially for diffusion systems in which diffusion coefficients change dramatically among the various phases [8,9,16]. datasmooth() function returns smoothed data as a DiffProfile object with much more grids compared with original one, which is done by interpolation after data smoothing.

Dmodel()
The Dmodel() function is used to create a DiffSystem object based on smoothed profile by the datasmooth() function. The output of DiffSystem will be used as the initial diffusion coefficients of upcoming FSA. Dmodel() will ask users to model each phase consequently. There are two interpolation methods to choose to represent the composition dependence of diffusion coefficients: splrep() and UnivariateSpline(), both of which belongs to the scipy.interpolate module. UnivariateSpline() is useful for smoothed data, while splrep() is suitable for more scattering data. in which, profile_exp is the DiffProfile contains raw experimental data, profile_sm is the output of datasmooth() on profile_exp, diffsys is a DiffSystem object represents initial guess of diffusion coefficients, which is usually the output of Dmodel(), and time is diffusion time in seconds. A step profile is used as initial profile for every simulation, Xlim is used as the composition values of the two ends if provided. In FSA, diffsys is used to simulate a diffusion process through mphSim() method and compared with profile_exp repeatedly. After each simulation, diffsys is modified using Dadjust() method. The difference between simulation and profile_exp will be calculated through comparing their concentration values at all grids. Once the difference is small enough, the simulation will stop. The final diffsys after several modifications will be output as FSA result.
Because mphSim() will be applied several times within FSA(), an accurate and efficient simulation is required for the FSA() method. Before the first simulation, grids will be arranged efficiently to accelerate all simulation runs. Based on Eq. 3, lower D requires smaller grid size while higher D affords larger grid size, so that higher time step Δt can be assigned to each simulation loop. Therefore, the simulation grid size d follows d ∝ D α based on profile_sm, in which α is a constant with a default value of 0.3 which has been tested well in various systems. This meshing process can be done by automesh() method. In FSA's keyword arguments, n is an integer list with length 2 which indicates the minimum and maximum number of grids produced by automesh(). More grids will help perform simulation with more details but at the expense of lower speed.
Parameter w is a list whose length equals to the number of phases. It provides the weights used to calculate differences between simulation and experimental profile profile_exp. By default, all phases have the same weight. This parameter is useful when diffusion data of multiple phases have different accuracy.
FSA()uses Dadjust() function to adjust diffsys after each simulation run. Users may select from Phase Mode and Point Mode to adjust diffusion coefficients during FSA. Phase Mode only increases or decreases all diffusion coefficients D together within one phase, and the shape of D doesn't change after adjustment. The Point Mode adjusts selected points across the solubility range individually; In other words, both the magnitude and shape of D curve changes. Compared to the Point Mode, the Phase Mode has less degree of freedom but higher stability.

Quality control
The core simulation mechanism in pydiffusion was developed in 2013 by Zhang and Zhao [6]. Diffusion simulation is hard to perform especially for those systems with multiple phases or highly composition dependent diffusion coefficients, such as Ti-(Mo,Nb,Ta), Cu-Zn and Mg-Al. Compared with the original Matlab implementation, pydiffusion has much higher stability and efficiency on simulating those binary systems. Usually, mphSim() function can complete a diffusion simulation within 30 seconds using default settings. For systems with highly composition dependent diffusion coefficients, the pydiffusion simulation speed is hundreds of times faster than the original Matlab code. In addition, the diffusion profiles simulated using pydiffusion also have much greater details, especially for thin-layer with extremely slow diffusion. This is due to the advanced meshing approach implemented in the pydiffusion package.
Sauer-Freise equation [15] is one of the approaches to extract interdiffusion coefficients from diffusion profiles. It is also a good validation for simulation reliability. Figure 3 shows this validation using diffusion coefficient data of the Ni-Mo system at 1100°C [6]. Solid line is the diffusion coefficients (DiffSystem) used in the simulation, while crosses represent Sauer-Freise calculation results based on simulated profile. We can observe that the original diffusion coefficients and Sauer-Freise results are almost identical. The deviation at two ends are unavoidable in simulations with finite difference method. The reproducible diffusion coefficients demonstrate the robustness of pydiffusion.
FSA has been applied to many alloy systems for the past several years [6,8]. Figure 3 illustrates the comparison between simulation and experimental data as a testing result of FSA(). Extracted diffusion coefficients is shown in Figure 3 for the Ni-Mo diffusion couple that was annealed at 1100°C for 800 h. Good agreements between simulation and raw data in Figure 3 validates the robustness of FSA in pydiffusion. pydiffusion has utilized FSA to extract reliable diffusion coefficients from various types of experiments, such as liquid-solid diffusion couples [17], two-step annealing diffusion multiples [9], and thin film diffusion couples [18].

(3) Reuse potential
As the core functionality of pydiffusion, diffusion simulation and diffusion coefficient evaluation are very useful in materials research and development, especially for diffusion (mobility) database establishment and ICMEbased materials design. Because a diffusion system is easy to define in pydiffusion, researchers can also utilize the simulation tools in this package to estimate diffusion length and phase growth for their future experiments. pydiffusion also provides various functions to help diffusion analysis, such as Matano plane calculation, the Hall method [22] to estimate the impurity diffusion coefficients, et al. More features will be added to the pydiffusion package in the future to provide more simulation and analysis tools. For example, the functions to implement 2-dimenstional (2D) numerical simulations will be provided to simulate diffusion in 2D. Various thermodynamics and kinetics modelling tools based on CALPHAD approach will also be included in pydiffusion so that users can perform mobility assessment using various diffusion datasets. These features will make pydiffusion a general and convenient tool to perform efficient and accurate diffusion data analysis and forward simulations. New users can find example documentation in the repository, which illustrates usage of the core functions in pydiffusion. Example datasets and scripts are also provided. All methods in pydiffusion provide full explanations in their documentation strings. Users who are interested in collaboration or seeking technical support are welcome to contact authors by email.