pycalphad: CALPHAD-based Computational Thermodynamics in Python

The pycalphad software package is a free and open-source Python library for designing thermodynamic models, calculating phase diagrams and investigating phase equilibria using the CALPHAD method. It provides routines for reading thermodynamic databases and solving the multi-component, multi-phase Gibbs energy minimization problem. The pycalphad software project advances the state of thermodynamic modeling by providing a flexible yet powerful interface for manipulating CALPHAD data and models. The key feature of the software is that the thermodynamic models of individual phases and their associated databases can be programmatically manipulated and overridden at run-time without modifying any internal solver or calculation code. Because the models are internally decoupled from the equilibrium solver and the models themselves are represented symbolically, pycalphad is an ideal tool for CALPHAD database development and model prototyping.


Introduction
Thermodynamics is the core of every physical description of nature. In recognition of this fact, and coincident with the rise of ubiquitous modern computing, the development of the CALculation of PHAse Diagrams (CALPHAD) method was proposed by Larry Kaufman and Himo Ansara in 1973 to rationalize and systematize alloy chemistry through the use of computer calculations [1]. In the decades since, there has been a tremendous effort by the scientific community to collect data to build thermodynamic descriptions for both metallic and non-metallic systems, descriptions of which have only increased in sophistication and accuracy as our understanding of the underlying physical phenomena have improved.
Better understanding of CALPHAD modeling and equilibrium calculation are necessary for advancing the modeling of phase stability in alloy systems and, for practical materials design problems, we need to have a high-quality software implementation flexible enough to admit these theoretical improvements. Such an implementation would serve as a testbed for new fundamental improvements to CALPHAD modeling. A natural place to start would be to contribute modifications to an existing CALPHAD software package. This unfortunately turns out not to be a feasible path. Because the source needs to be publicly available for modifications to be possible, we exclude the multitude of closed-source commercial packages available [2][3][4]. The availability of open-source CALPHAD packages is limited. The most notable opensource CALPHAD package is OpenCalphad [5], which is architecturally very similar to the commercial Thermo-Calc package. The quality of the implementation is high but the architecture of the system does not allow the kind of direct manipulation of phase models at run-time necessary for CALPHAD model prototyping and database development.
A more promising architecture using the Python programming language was developed in the Gibbs package [6]. In particular the symbolic construction (as opposed to "hard-coding") of phase models makes manipulation vastly simpler. However their implementation is much more limited; Gibbs lacks support for the compound energy formalism (CEF) [7]. Moreover, Gibbs' equilibrium calculation method is based on Quickhull [8], and Quickhull-based solvers will not perform well for multicomponent systems due to the poor scaling of general convex hull algorithms in high dimensions. (For energy minimization it is only necessary to compute the lower convex hull.) Because of these issues the majority of the core would have to be completely rewritten for our purposes. Development by the Gibbs team appears to have ceased after the original publication, so such a significant undertaking would have to be performed alone.
What is desired is something which takes the rigorous theoretical approach of Open-Calphad and combines it with the extensibility and modularity of the Python-based approach pioneered by the Gibbs package. This is the purpose of the pycalphad project.
The pycalphad software package is roughly 3000 lines of Python code designed to solve the multi-component, multi-phase Gibbs energy minimization problem with full support for the CEF. The key feature of pycalphad is that the thermodynamic models of individual phases and their associated databases can be programmatically manipulated and overridden at run-time, without modifying any internal solver or calculation code: the representation of the models is decoupled from the equilibrium solver, and the models themselves are represented symbolically. This makes pycalphad ideal for prototyping CALPHAD models and developing CALPHAD databases. Figure 1 depicts the general architecture of pycalphad. Model parameters and associated inputs are represented as a Database object. Using parameters from the given Database, a Model object is constructed for each phase containing the symbolic representation of that phase's energy function. These symbolic representations are then fed into the calculation engine to produce results for the user. Figure 2 illustrates the benefit of this modular architecture for creating custom phase models. Creating a custom model in pycalphad involves creating a subclass of the Model class. The key step is declaring the contributions class attribute.

Implementation and architecture
The attribute is a list of tuples, and each tuple contains a unique name and a class function name. The class function is called to construct the corresponding energetic contribution, with several contributions already defined in Model. In this case, the custom energetic contribution is the system temperature multiplied by the squared deviation from the equimolar composition of the phase. The CustomModel subclass can then be passed as an argument to calculate() and equilibrium().

Database
The Database object is the fundamental representation of CALPHAD data in pycalphad. It can be considered as the in-memory analogue to a thermodynamic database (TDB) file, and in fact pycalphad supports reading and writing a large subset of the TDB file format through the from_file() and to_file() methods of the Database object. For convenience, calling the Database constructor, e.g., Database('example. tdb'), will automatically call the appropriate parsing function. Database files can also be passed as multi-line strings; this is convenient for embedding TDB files in short Python scripts. The current version of pycalphad (0.4.2) only supports TDB files, but new file formats could be easily implemented without modifying any other code dependent on a Database since the objects are not coupled to any particular file format. Using this scheme it is unnecessary for the rest of pycalphad to know anything about how CALPHAD data is represented on disk. Model object is constructed for each phase and then fed into the calculation engine to produce results for the user.
Database exposes a phases attribute containing a Python dictionary mapping the name of a phase to an object which contains information about the sublattice model and phase constituents. It also exposes a search() function for finding model parameters satisfying certain criteria. These are the primary features used by the Model object to build the symbolic representation of a phase's thermodynamic model.

Model
A Model is an abstract representation of the molar Gibbs energy function of a phase. This representation is built around the computer algebra library SymPy [9], allowing the variables and arithmetic functions required by the CEF to be represented as an abstract graph of Python objects. For example, the operation 2 + 3x might internally be represented as Add(2,Mul(3,x)), with larger structures for more complicated models. For convenience, the library Model class defines several thermodynamic properties. By default, attributes are defined for the following, with Thermo-Calc-style abbreviations listed in parentheses (either are allowed): energy (GM), entropy (SM), enthalpy (SM), heat capacity (CPM), mixing energy (MIX GM), mixing entropy (MIX SM), mixing enthalpy (MIX HM), mixing heat capacity (MIX CPM), curie temperature (TC), and degree of ordering (DOO). It is also possible for users to define custom properties for particular purposes. These SymPy-based abstract graphs, as well as their exact first and second derivatives, are compiled to machine code on demand for computational efficiency.
SymPy's automatic code generation feature is used to provide users maximum flexibility since it offers the ease-of-use of working in Python without having to make a significant performance tradeoff, compared to working only in C, Fortran, or another low-level programming language.
By default the library Model class is used for all phases. It includes support for multi-component Redlich-Kister polynomials using the Muggianu ternary extension [10], the Inden-Hillert-Jarl magnetic model [11,12], and the order-disorder model for atomic ordering [13,14] For parametric model contributions, users can use the param_search argument defined in the function signature of every energetic contribution to query a Database for parameters satisfying some criteria. The Model class defines a redlich_kister_sum() convenience function to allow users to easily build multicomponent Redlich-Kister polynomials using parameters defined in a Database. For example, to construct the symbolic form of the mean magnetic moment of a phase in Redlich-Kister form, inside custom_energy() one could write test(self._array_validity)) ) mean_magnetic_moment = \ self.redlich_kister_sum(phase, param_search, bm_param_query) This code snippet will pull all the relevant magnetic parameters from the database, filtered by self._ array_validity to include only the declared components in our model.

calculate()
The calculate() function is the core property calculation routine of pycalphad. It does not concern itself with equilibrium at all -that is the responsibility of equilibrium() -but instead performs calculations for the case when all independent degrees of freedom, i.e., temperature, pressure, sublattice site fractions, are specified. for the FCC_A1 phase to use CustomModel, the liquid phase to use the library Model, and all other phases in the calculation to use YetAnotherModel (not defined here). The output keyword argument specifies the property to calculate; this is a string corresponding to an attribute of the library Model or a user-defined subclass of Model, as discussed above. For example, we write output='CPM' to indicate the molar heat capacity should be computed. If output is not specified, by default only the molar Gibbs energy is calculated.
The points keyword argument accepts a multidimensional array of shape (P, T, y), where P and T are pressures and temperatures at which to perform the calculation, and y is the number of sublattice site fractions. Site fractions are ordered by sublattice number, then alphabetically within a sublattice, e.g., . The T and P keyword arguments are the temperatures and pressures in Kelvin and pascals, respectively, for the calculation. (Specifically the units are whatever T and P mean in the phase model but, in the default Model, SI units are used.) Valid arguments are either a scalar or a one-dimensional array. **kwargs is a placeholder for other, less commonly used or experimental options; these are discussed in the pycalphad documentation linked from the GitHub repository. The return value of calculate() is a multi-dimensional labeled array. The return value of equilibrium() is a multi-dimensional labeled array. Regardless of the value of output, the result array will always include the equilibrium values of the molar Gibbs energy and chemical potentials since they are necessary to compute the solution.

Representation of results
The result of calls to calculate() and equilibrium() are xarray Dataset objects [15]. The xarray Dataset object makes handling labeled multi-dimensional arrays substantially simpler. Figure 3 shows the xarray summary of the result of a 2-D mapping calculation. The "Dimensions" line indicates the shape of the array, with each dimension having a label and corresponding size. In this case, equilibria at 170 temperatures and 100 compositions are computed for a two component system.
The "internal dof " dimension corresponds to the sublattice site fractions of a phase. For phases with fewer than the maximum number of internal degrees of freedom, the extra elements are filled with NaN (Not A Number). The "vertex" dimension corresponds to the vertices of a tie simplex (tie-line in binary systems). For single-phase regions, only the first vertex is valid and the others are filled with NaN.
The "Data Variables" section contains the actual result of the calculation, with the corresponding dimensions of each property array listed in parentheses, followed by the first few values. The "Phase" and "NP" arrays contain the names and fractions of phases, respectively, present under the corresponding conditions. All the properties we specify in the output keyword argument are included here. The xarray library makes selecting and slicing the data very easy; for example, to get all the Al chemical potentials at 600 K, we write eq.MU.sel(component='AL', T=600), where eq is the result array. Note that a current limitation is that the selection must correspond directly to a calculated value; automatic interpolation of the pressure, temperature, or composition is not currently implemented.
The "Attributes" section contains some metadata about the calculation such as the version of pycalphad used, the calculation date, and the solver iterations.
Datasets have functions for reading from and writing to disk, making storage of the results of long-running calculations easier. Interested users are encouraged to review the xarray documentation [15].

Quality control
Even as a relatively small project, pycalphad is sufficiently complex that it is necessary to implement strategies to avoid the regression, or accidental breakage, of features.
The key concepts to understand when managing and developing a complex software project are source code control (SCC) and continuous integration (CI). SCC is critical to verify the integrity of the project over time when admitting changes from multiple, scattered contributors, but it is useful even in single-contributor projects because SCC systems serve as a semi-automated project journal and backup system. This project uses the popular Git SCC system [16] to manage its source code. This allows the complete history of changes to be recorded for all released and unreleased versions of the software. Git also allows different versions of the software to be stored in separate "branches," allowing concurrent work on, e.g., new major features and bug fixes to existing versions. The Git repository is publicly available online at GitHub (see section 2). Git also extends into pycalphad's versioning system: major.minor. rev+N.gHASH, where HASH is the Git commit identifier of the latest commit in the master branch of the repository, and N is the number of commits ahead of the last public release. For public releases, everything after the + is omitted. For modifications which have not yet been committed, i.e., in a developer's local Git repository, the version identifier will be appended with dirty. CI is the approach of a project to simplify the software release process by testing code incrementally, i.e., every time a revision is made. This makes releasing new versions easier because the release manager can have some confidence that the quality of the code is above some automatically verified baseline. The pycalphad package has a suite of CI tests designed to verify that a revision to the code does not cause unintended behavior. These tests are run automatically every time a new revision is pushed to the Git repository on GitHub. If a test fails for any reason, a report is generated including all the error information. For example, there are tests to ensure that computed values of properties for several known systems do not change. The equilibrium solver, TDB reading and writing, and phase model construction code are also tested for consistency and accuracy. When a bug is reported and fixed in pycalphad, a minimal test case is added to the suite whenever possible to prevent the problem from appearing again in a future release. In total, about 80% of pycalphad, measured by lines of code, is currently tested, with the remainder involving unreachable or experimental code, or code which is difficult to test in an automated fashion, e.g., plotting code.

Operating system
A version of Linux, OSX, or Windows capable of running a supported version of Python is required.
Programming language Python 2.7+ or Python 3.4+ is required.

Additional system requirements
At least 2 GB of RAM is recommended.

Dependencies
• gcc, MinGW or Microsoft Visual C++ compiler and toolchain • matplotlib [17] • numpy ≥ 1.9 [18] • scipy [18] • sympy [19] • xarray [15] • pyparsing [20] • tinydb Figure 4: Equilibrium chemical potential of Fe as a function of Al composition in the Al-Fe system at 600 K, computed using pycalphad. Each point is color-coded with the corresponding stable phase; coexistence regions can be identied by the chemical potential remaining at across a range of composition. The end-points of such an iso-potential region can be directly connected to the corresponding tie-line at the given temperature.

Figure 5:
Phase diagram of the Al-Fe system according to the COST 507 database, computed using pycalphad. The solid black lines in the B2 region correspond to lines of constant "degree of ordering" in the B2 phase. The grey dashed line is the Curie temperature. The bcc ordering transition is second-order since the degree of ordering is continuously changing with respect to composition and temperature. Some lines in the diagram are not smooth due to the coarseness of the grid used in the calculation; mapping in pycalphad is still experimental.

(3) Reuse potential
Al-Fe is chosen as an example system because the COST 507 database [21] containing this subsystem is publicly available, and it allows us to test several pycalphad features simultaneously since the system contains singlesublattice solution phases, multi-sublattice ordered phases, phases with magnetic ordering and stoichiometric compounds. The pycalphad package can perform computations for any number of components; we restrict our example to a binary system only for the simplicity of visualization. Figure 4 shows the result of a one-dimensional ("step") equilibrium calculation at 600 K. The equilibrium chemical potential of Fe is shown as a function of Al composition. Each point is color-coded with the corresponding stable phase; coexistence regions can be identified by the chemical potential remaining flat across a range of composition. The end-points of such an iso-potential region can be directly connected to the corresponding tie-line at the given temperature. The source code for this calculation can be found in Figure 6. Figure 5 shows the phase diagram of the Al-Fe system according to the COST 507 database. The solid black lines in the B2 region correspond to lines of constant "degree of ordering" in the B2 phase. The grey dashed line is the Curie temperature. It is clear from the diagram that the bcc ordering transition is second-order since the degree of ordering is continuously changing with respect to composition and temperature. Some lines in the diagram are not smooth due to the coarseness of the grid used in the calculation; mapping in pycalphad is still experimental. The source code for this calculation can be found in Figure 7.
Those interested in collaborating on pycalphad or seeking support should contact the present authors