pycalphad: CALPHAD-based Computational Thermodynamics in Python

Richard Otis; Zi-Kui Liu

(1) Overview

Introduction

Thermodynamics is the core of every physical description of nature. In recognition of this fact, and coincident with the rise of ubiquitous modern computing, the development of the CALculation of PHAse Diagrams (CALPHAD) method was proposed by Larry Kaufman and Himo Ansara in 1973 to rationalize and systematize alloy chemistry through the use of computer calculations []. In the decades since, there has been a tremendous effort by the scientific community to collect data to build thermodynamic descriptions for both metallic and non-metallic systems, descriptions of which have only increased in sophistication and accuracy as our understanding of the underlying physical phenomena have improved.

Better understanding of CALPHAD modeling and equilibrium calculation are necessary for advancing the modeling of phase stability in alloy systems and, for practical materials design problems, we need to have a high-quality software implementation flexible enough to admit these theoretical improvements. Such an implementation would serve as a testbed for new fundamental improvements to CALPHAD modeling. A natural place to start would be to contribute modifications to an existing CALPHAD software package. This unfortunately turns out not to be a feasible path. Because the source needs to be publicly available for modifications to be possible, we exclude the multitude of closed-source commercial packages available [, , ]. The availability of open-source CALPHAD packages is limited. The most notable open-source CALPHAD package is OpenCalphad [], which is architecturally very similar to the commercial Thermo-Calc package. The quality of the implementation is high but the architecture of the system does not allow the kind of direct manipulation of phase models at run-time necessary for CALPHAD model prototyping and database development.

A more promising architecture using the Python programming language was developed in the Gibbs package []. In particular the symbolic construction (as opposed to “hard-coding”) of phase models makes manipulation vastly simpler. However their implementation is much more limited; Gibbs lacks support for the compound energy formalism (CEF) []. Moreover, Gibbs’ equilibrium calculation method is based on Quickhull [], and Quickhull-based solvers will not perform well for multicomponent systems due to the poor scaling of general convex hull algorithms in high dimensions. (For energy minimization it is only necessary to compute the lower convex hull.) Because of these issues the majority of the core would have to be completely rewritten for our purposes. Development by the Gibbs team appears to have ceased after the original publication, so such a significant undertaking would have to be performed alone.

What is desired is something which takes the rigorous theoretical approach of Open-Calphad and combines it with the extensibility and modularity of the Python-based approach pioneered by the Gibbs package. This is the purpose of the pycalphad project.

The pycalphad software package is roughly 3000 lines of Python code designed to solve the multi-component, multi-phase Gibbs energy minimization problem with full support for the CEF. The key feature of pycalphad is that the thermodynamic models of individual phases and their associated databases can be programmatically manipulated and overridden at run-time, without modifying any internal solver or calculation code: the representation of the models is decoupled from the equilibrium solver, and the models themselves are represented symbolically. This makes pycalphad ideal for prototyping CALPHAD models and developing CALPHAD databases.

Implementation and architecture

Figure 1 depicts the general architecture of pycalphad. Model parameters and associated inputs are represented as a Database object. Using parameters from the given Database, a Model object is constructed for each phase containing the symbolic representation of that phase’s energy function. These symbolic representations are then fed into the calculation engine to produce results for the user.

Figure 1

General architecture of the pycalphad software package. Using parameters from the given Database, a Model object is constructed for each phase and then fed into the calculation engine to produce results for the user.

Figure 2 illustrates the benefit of this modular architecture for creating custom phase models. Creating a custom model in pycalphad involves creating a subclass of the Model class. The key step is declaring the contributions class attribute.

Figure 2

Creating a custom model in pycalphad involves creating a subclass of the Model class. The key step is declaring the contributions class attribute. The CustomModel subclass can then be passed as an argument to calculate() and equilibrium().

The attribute is a list of tuples, and each tuple contains a unique name and a class function name. The class function is called to construct the corresponding energetic contribution, with several contributions already defined in Model. In this case, the custom energetic contribution is the system temperature multiplied by the squared deviation from the equimolar composition of the phase. The CustomModel subclass can then be passed as an argument to calculate() and equilibrium().

Database

The Database object is the fundamental representation of CALPHAD data in pycalphad. It can be considered as the in-memory analogue to a thermodynamic database (TDB) file, and in fact pycalphad supports reading and writing a large subset of the TDB file format through the from_file() and to_file() methods of the Database object. For convenience, calling the Database constructor, e.g., Database(’example.tdb’), will automatically call the appropriate parsing function. Database files can also be passed as multi-line strings; this is convenient for embedding TDB files in short Python scripts. The current version of pycalphad (0.4.2) only supports TDB files, but new file formats could be easily implemented without modifying any other code dependent on a Database since the objects are not coupled to any particular file format. Using this scheme it is unnecessary for the rest of pycalphad to know anything about how CALPHAD data is represented on disk.

Database exposes a phases attribute containing a Python dictionary mapping the name of a phase to an object which contains information about the sublattice model and phase constituents. It also exposes a search() function for finding model parameters satisfying certain criteria. These are the primary features used by the Model object to build the symbolic representation of a phase’s thermodynamic model.

Model

A Model is an abstract representation of the molar Gibbs energy function of a phase. This representation is built around the computer algebra library SymPy [], allowing the variables and arithmetic functions required by the CEF to be represented as an abstract graph of Python objects. For example, the operation 2 + 3x might internally be represented as Add(2,Mul(3,x)), with larger structures for more complicated models. For convenience, the library Model class defines several thermodynamic properties. By default, attributes are defined for the following, with Thermo-Calc-style abbreviations listed in parentheses (either are allowed): energy (GM), entropy (SM), enthalpy (SM), heat capacity (CPM), mixing energy (MIX GM), mixing entropy (MIX SM), mixing enthalpy (MIX HM), mixing heat capacity (MIX CPM), curie temperature (TC), and degree of ordering (DOO). It is also possible for users to define custom properties for particular purposes. These SymPy-based abstract graphs, as well as their exact first and second derivatives, are compiled to machine code on demand for computational efficiency. SymPy’s automatic code generation feature is used to provide users maximum flexibility since it offers the ease-of-use of working in Python without having to make a significant performance tradeoff, compared to working only in C, Fortran, or another low-level programming language.

By default the library Model class is used for all phases. It includes support for multi-component Redlich-Kister polynomials using the Muggianu ternary extension [], the Inden-Hillert-Jarl magnetic model [, ], and the order-disorder model for atomic ordering [, ]

For parametric model contributions, users can use the param_search argument defined in the function signature of every energetic contribution to query a Database for parameters satisfying some criteria. The Model class defines a redlich_kister_sum() convenience function to allow users to easily build multi-component Redlich-Kister polynomials using parameters defined in a Database. For example, to construct the symbolic form of the mean magnetic moment of a phase in Redlich-Kister form, inside custom_energy() one could write

from tinydb import where
bm_param_query = (
           (where(’phase_name’) == phase.name) & \
           (where(’parameter_type’) == ’BMAGN’) & \
           (where(’constituent_array’).
           test(self._array_validity))
)
mean_magnetic_moment = \
          self.redlich_kister_sum(phase, param_search,
          bm_param_query)

This code snippet will pull all the relevant magnetic parameters from the database, filtered by self._array_validity to include only the declared components in our model.

calculate()

The calculate() function is the core property calculation routine of pycalphad. It does not concern itself with equilibrium at all – that is the responsibility of equilibrium() – but instead performs calculations for the case when all independent degrees of freedom, i.e., temperature, pressure, sublattice site fractions, are specified. The most important arguments of calculate() are

def calculate(dbf, comps, phases, output=’GM’,
                       model=None, points=None,
                       T=None, P=None, **kwargs)

dbf is the Database containing the relevant parameters, comps is a list of desired components for the calculation, and phases is a list of desired phases. (Users can get a list of all phases using list(dbf.phases.keys()).) By default calculate() will compute the GM property of all phases, but users can specify any property defined by the phase model, including properties defined in custom models. By default the library Model class is used for all phases.

Custom models can be specified via the model keyword argument. For example, calculate(dbf, comps, phases, model=CustomModel) overrides the default model for all phases in that energy calculation. To override only a specific phase’s model, we write model={’FCC_A1’: CustomModel} to override the model for the FCC_A1 phase. More sophisticated formulations are also possible. We can use model=[{’FCC_A1’: CustomModel, ’LIQUID’: Model}, YetAnotherModel] for the FCC_A1 phase to use CustomModel, the liquid phase to use the library Model, and all other phases in the calculation to use YetAnotherModel (not defined here). The output keyword argument specifies the property to calculate; this is a string corresponding to an attribute of the library Model or a user-defined subclass of Model, as discussed above. For example, we write output=’CPM’ to indicate the molar heat capacity should be computed. If output is not specified, by default only the molar Gibbs energy is calculated.

The points keyword argument accepts a multi-dimensional array of shape (P, T, y), where P and T are pressures and temperatures at which to perform the calculation, and y is the number of sublattice site fractions. Site fractions are ordered by sublattice number, then alphabetically within a sublattice, e.g., $y A l 0, y N i 0, y C r 0, y M o 0, y N i 1, y N b 1$ M1 \documentclass[10pt]{article} \usepackage{wasysym} \usepackage[substack]{amsmath} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage[mathscr]{eucal} \usepackage{mathrsfs} \usepackage{pmc} \usepackage[Euler]{upgreek} \pagestyle{empty} \oddsidemargin -1.0in \begin{document} \[ y_{Al}^0,{\rm{ }}y_{Ni}^0,{\rm{ }}y_{Cr}^0,{\rm{ }}y_{Mo}^0,{\rm{ }}y_{Ni}^1,{\rm{ }}y_{Nb}^1 \] \end{document} . If the same site fractions are meant to be used for all temperatures and pressures in the calculation, the P and T dimensions can be omitted from the array. For multi-phase calculations, users can pass a Python dictionary mapping the name of a phase to an array of site fractions.

The T and P keyword arguments are the temperatures and pressures in Kelvin and pascals, respectively, for the calculation. (Specifically the units are whatever T and P mean in the phase model but, in the default Model, SI units are used.) Valid arguments are either a scalar or a one-dimensional array. **kwargs is a placeholder for other, less commonly used or experimental options; these are discussed in the pycalphad documentation linked from the GitHub repository. The return value of calculate() is a multi-dimensional labeled array.

equilibrium()

The equilibrium() function is responsible for equilibrium property calculation in pycalphad. Its key arguments are

def equilibrium(dbf, comps, phases, conditions,
                            output=None, model=None, **kwargs)

dbf, comps, phases, output, model, and **kwargs all have the same meaning as in the calculate() function, with the additional feature that output can be either a string or a list of strings. conditions is a Python dictionary mapping state variables to values. Valid arguments for a condition are a scalar, one-dimensional array, or tuple with the form (start, stop, step). For example, an isothermal step calculation might have a conditions argument of the form {v.X(’AL’): (0,1, 0.01), v.T: 600}, where v is defined as a shortcut to pycalphad.variables, a library module where all standard symbols are defined.

The return value of equilibrium() is a multi-dimensional labeled array. Regardless of the value of output, the result array will always include the equilibrium values of the molar Gibbs energy and chemical potentials since they are necessary to compute the solution.

Representation of results

The result of calls to calculate() and equilibrium() are xarray Dataset objects []. The xarray Dataset object makes handling labeled multi-dimensional arrays substantially simpler. Figure 3 shows the xarray summary of the result of a 2-D mapping calculation. The “Dimensions” line indicates the shape of the array, with each dimension having a label and corresponding size. In this case, equilibria at 170 temperatures and 100 compositions are computed for a two component system.

Figure 3

This is a summary of the result object returned by a call to equilibrium() when performing a 2-D mapping calculation. The “Dimensions” line indicates the shape of the array, with each dimension having a label and corresponding size. In this case, equilibria at 170 temperatures and 100 compositions are computed for a two component system. The “internal dof” dimension corresponds to the site fractions of a phase. The “vertex” dimension corresponds to the vertices of a tie simplex (tie-line in binary systems). The “Data Variables” section contains the actual result of the calculation, with the corresponding dimensions of each property array listed in parentheses, followed by the rst few values. The “Attributes” section contains some metadata about the calculation.

The “internal dof ” dimension corresponds to the sublattice site fractions of a phase. For phases with fewer than the maximum number of internal degrees of freedom, the extra elements are filled with NaN (Not A Number). The “vertex” dimension corresponds to the vertices of a tie simplex (tie-line in binary systems). For single- phase regions, only the first vertex is valid and the others are filled with NaN.

The “Data Variables” section contains the actual result of the calculation, with the corresponding dimensions of each property array listed in parentheses, followed by the first few values. The “Phase” and “NP” arrays contain the names and fractions of phases, respectively, present under the corresponding conditions. All the properties we specify in the output keyword argument are included here. The xarray library makes selecting and slicing the data very easy; for example, to get all the Al chemical potentials at 600 K, we write eq.MU.sel(component=’AL’, T=600), where eq is the result array. Note that a current limitation is that the selection must correspond directly to a calculated value; automatic interpolation of the pressure, temperature, or composition is not currently implemented.

The “Attributes” section contains some metadata about the calculation such as the version of pycalphad used, the calculation date, and the solver iterations.

Datasets have functions for reading from and writing to disk, making storage of the results of long-running calculations easier. Interested users are encouraged to review the xarray documentation [].

Quality control

Even as a relatively small project, pycalphad is sufficiently complex that it is necessary to implement strategies to avoid the regression, or accidental breakage, of features. The key concepts to understand when managing and developing a complex software project are source code control (SCC) and continuous integration (CI). SCC is critical to verify the integrity of the project over time when admitting changes from multiple, scattered contributors, but it is useful even in single-contributor projects because SCC systems serve as a semi-automated project journal and backup system. This project uses the popular Git SCC system [] to manage its source code. This allows the complete history of changes to be recorded for all released and unreleased versions of the software. Git also allows different versions of the software to be stored in separate “branches,” allowing concurrent work on, e.g., new major features and bug fixes to existing versions. The Git repository is publicly available online at GitHub (see section 2). Git also extends into pycalphad’s versioning system: major.minor.rev+N.gHASH, where HASH is the Git commit identifier of the latest commit in the master branch of the repository, and N is the number of commits ahead of the last public release. For public releases, everything after the + is omitted. For modifications which have not yet been committed, i.e., in a developer’s local Git repository, the version identifier will be appended with dirty.

CI is the approach of a project to simplify the software release process by testing code incrementally, i.e., every time a revision is made. This makes releasing new versions easier because the release manager can have some confidence that the quality of the code is above some automatically verified baseline. The pycalphad package has a suite of CI tests designed to verify that a revision to the code does not cause unintended behavior. These tests are run automatically every time a new revision is pushed to the Git repository on GitHub. If a test fails for any reason, a report is generated including all the error information. For example, there are tests to ensure that computed values of properties for several known systems do not change. The equilibrium solver, TDB reading and writing, and phase model construction code are also tested for consistency and accuracy. When a bug is reported and fixed in pycalphad, a minimal test case is added to the suite whenever possible to prevent the problem from appearing again in a future release. In total, about 80% of pycalphad, measured by lines of code, is currently tested, with the remainder involving unreachable or experimental code, or code which is difficult to test in an automated fashion, e.g., plotting code.

(2) Availability

Operating system

A version of Linux, OSX, or Windows capable of running a supported version of Python is required.

Programming language

Python 2.7+ or Python 3.4+ is required.

Additional system requirements

At least 2 GB of RAM is recommended.

Dependencies

gcc, MinGW or Microsoft Visual C++ compiler and toolchain
matplotlib []
numpy ≥ 1.9 []
scipy []
sympy []
xarray []
pyparsing []
tinydb
autograd
tqdm
dask
dill

List of contributors

Richard Otis (Pennsylvania State University) – Development and testing

Zi-Kui Liu (Pennsylvania State University) – Project supervision

Software location

Code repository

Name: GitHub

Persistent identifier: https://github.com/pycalphad/pycalphad

Licence: MIT

Date published: 09/11/16

Language

English

(3) Reuse potential

Al-Fe is chosen as an example system because the COST 507 database [] containing this subsystem is publicly available, and it allows us to test several pycalphad features simultaneously since the system contains single-sublattice solution phases, multi-sublattice ordered phases, phases with magnetic ordering and stoichiometric compounds. The pycalphad package can perform computations for any number of components; we restrict our example to a binary system only for the simplicity of visualization.

Figure 4 shows the result of a one-dimensional (“step”) equilibrium calculation at 600 K. The equilibrium chemical potential of Fe is shown as a function of Al composition. Each point is color-coded with the corresponding stable phase; coexistence regions can be identified by the chemical potential remaining flat across a range of composition. The end-points of such an iso-potential region can be directly connected to the corresponding tie-line at the given temperature. The source code for this calculation can be found in Figure 6.

Figure 4

Equilibrium chemical potential of Fe as a function of Al composition in the Al-Fe system at 600 K, computed using pycalphad. Each point is color-coded with the corresponding stable phase; coexistence regions can be identied by the chemical potential remaining at across a range of composition. The end-points of such an iso-potential region can be directly connected to the corresponding tie-line at the given temperature.

Figure 5

Phase diagram of the Al-Fe system according to the COST 507 database, computed using pycalphad. The solid black lines in the B2 region correspond to lines of constant “degree of ordering” in the B2 phase. The grey dashed line is the Curie temperature. The bcc ordering transition is second-order since the degree of ordering is continuously changing with respect to composition and temperature. Some lines in the diagram are not smooth due to the coarseness of the grid used in the calculation; mapping in pycalphad is still experimental.

Figure 6

Source code for the Al-Fe chemical potential calculation in Figure 4.

Figure 5 shows the phase diagram of the Al-Fe system according to the COST 507 database. The solid black lines in the B2 region correspond to lines of constant “degree of ordering” in the B2 phase. The grey dashed line is the Curie temperature. It is clear from the diagram that the bcc ordering transition is second-order since the degree of ordering is continuously changing with respect to composition and temperature. Some lines in the diagram are not smooth due to the coarseness of the grid used in the calculation; mapping in pycalphad is still experimental. The source code for this calculation can be found in Figure 7.

Figure 7

Source code for the Al-Fe phase diagram in Figure 5.

Those interested in collaborating on pycalphad or seeking support should contact the present authors via e-mail or visit the official project website at pycalphad.org, where additional documentation and examples can be found.

[B1] Spencer, P (2008). A brief history of CALPHAD Calphad 32(1): 1–8, DOI: https://doi.org/10.1016/j.calphad.2007.10.001 URL http://www.sciencedirect.com/science/article/pii/ S0364591607000764.

[B2] Cao, W, Chen, S-L, Zhang, F, Wu, K, Yang, Y, Chang, Y, Schmid-Fetzer, R and Oates, W (2009). Oates PANDAT software with PanEngine, PanOptimizer and PanPrecipitation for multi-component phase diagram calculation and materials property simulation Calphad 33(2): 328–342, DOI: https://doi.org/10.1016/j.calphad.2008.08.004 URL http://www.sciencedirect.com/science/article/pii/S036459160800070.

[B3] Andersson, J-O, Helander, T, Höglund, L, Shi, P and Sundman, B (2002). Thermo-Calc & DICTRA, computational tools for materials science Calphad 26(2): 273–312, DOI: https://doi.org/10.1016/S0364-5916(02)00037-8

[B4] Bale, C, Bélisle, E, Chartrand, P, Decterov, S, Eriksson, G, Hack, K, Jung, I-H, Kang, Y-B, Melançon, J, Pelton, A, Robelin, C and Petersen, S (2009). FactSage thermochemical software and databases recent developments Calphad 33(2): 295–311, DOI: https://doi.org/10.1016/j.calphad.2008.09.009

[B5] Sundman, B, Kattner, U R, Palumbo, M and Fries, S G (2015). OpenCalphad – a free thermodynamic software Integr. Mater. Manuf. Innov 4(1): 1. DOI: https://doi.org/10.1186/s40192-014-0029-1

[B6] Cool, T, Bartol, A, Kasenga, M, Modi, K and García, R E (2010). Gibbs: Phase equilibria and symbolic computation of thermodynamic properties Calphad 34(4): 393–404, DOI: https://doi.org/10.1016/j.calphad.2010.07.005 URL http://www.sciencedirect.com/science/article/pii/ S0364591610000507.

[B7] Hillert, M (2001). The compound energy formalism J. Alloys Compd 320(2): 161–176, DOI: https://doi.org/10.1016/S0925-8388(00)01481-X

[B8] Barber, C B, Dobkin, D P and Huhdanpaa, H (1996). The quickhull algorithm for convex hulls ACM Trans. Math. Softw 22(4): 469–483, DOI: https://doi.org/10.1145/235815.235821

[B9] Joyner, D, Čertík, O, Meurer, A and Granger, B E (2012). Open source computer algebra systems ACM Commun. Comput. Algebr 45(3/4): 225. DOI: https://doi.org/10.1145/2110170.2110185

[B10] Muggianu, Y M, Gambino, M and Bros, J P (1975). Enthalpies of formation of liquid alloys bismuth-gallium-tin at 723k-choice of an analytical representation of integral and partial thermodynamic functions of mixing for this ternary-system J. Chim. Phys. Physico-Chimie Biol 72(1): 83–88.

[B11] Inden, G (1976). Approximate description of the configurational specific heat during a magnetic order-disorder transformation, in: Proc. CALPHAD V In: Dusseldorf, Germany: Max Planck Institut fuer Eisenforschung, pp. 1–13.

[B12] Hillert, M and Jarl, M (1978). A model for alloying in ferromagnetic metals Calphad 2(3): 227–238, DOI: https://doi.org/10.1016/0364-5916(78)90011-1

[B13] Ansara, I, Sundman, B and Willemin, P (1988). Thermodynamic modeling of or- dered phases in the Ni-Al system Acta Metall 36(4): 977–982, DOI: https://doi.org/10.1016/0001-6160(88)90152-6

[B14] Ansara, I, Dupin, N and Sundman, B (1997). Reply to the paper: When is a compound energy not a compound energy? A critique of the 2-sublattice order/disorder model Calphad 21(4): 535–542, DOI: https://doi.org/10.1016/S0364-5916(98)00010-8

[B15] Hoyer, S, Kleeman, A and Brevdo, E (2016). N-D labeled arrays and datasets in Python xarray 0.7.2 documentation URL http://xarray.pydata.org/en/stable/.

[B16] Chacon, S and Straub, B (2014). Pro Git In: Apress, DOI: https://doi.org/10.1007/978-1-4842-0076-6

[B17] Hunter, J D (2007). Matplotlib: A 2D graphics environment Comput. Sci. Eng 9(3): 90–95, DOI: https://doi.org/10.1109/MCSE.2007.55

[B18] van der Walt, S, Colbert, S C and Varoquaux, G (2011). The NumPy Array: A Structure for Efficient Numerical Computation Comput. Sci. Eng 13(2): 22–30, DOI: https://doi.org/10.1109/MCSE.2011.37 URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5725236.

[B19] Meurer, A, Smith, C P, Paprocki, M, Čertík, O, Kirpichev, S B, Rocklin, M, Kumar, A, Ivanov, S, Moore, J K, Singh, S, Rathnayake, T, Vig, S, Granger, B E, Muller, R P, Bonazzi, F, Gupta, H, Vats, S, Johansson, F, Pedregosa, F, Curry, M J, Terrel, A R, Roučka, Š, Saboo, A, Fernando, I, Kulal, S, Cimrman, R and Scopatz, A (). SymPy: Symbolic computing in Python DOI: https://doi.org/10.7287/PEERJ.PREPRINTS.2083V3

[B20] McGuire, P (2007). Getting started with pyparsing In: O’Reilly.

[B21] Ansara, I, Dinsdale, A and Rand, M (1998). COST 507 Thermochemical database for light metal alloys jul 1998 URL https://materialsdata.nist.gov/dspace/xmlui/handle/11256/618.

Journal of Open Research Software

Software Metapapers