BoolSi: a tool for distributed simulations and analysis of Boolean networks

We present BoolSi, an open-source cross-platform command line tool for distributed simulations of deterministic Boolean networks with synchronous update. It uses MPI standard to support execution on computational clusters, as well as parallel processing on a single computer. BoolSi can be used to model the behavior of complex dynamic networks, such as gene regulatory networks. In particular, it allows for identification and statistical analysis of network attractors. We perform a case study of the activity of a cambium cell to demonstrate the capabilities of the tool.


Introduction
A Boolean network is a discrete dynamical system consisting of nodes that represent objects from application domain, and their update rules, that represent interactions between the objects. Update rule of a node is a Boolean function that defines the next state of the node based on current states of nodes. Therefore, each node can take one of two possible states, denoted 0 or 1. In synchronous Boolean networks, the next state of the network is calculated by applying all update rules simultaneously. In asynchronous Boolean networks, the states of only a subset of network nodes get updated at a time. The exact nodes to update at each time step can be either determined by some preselected rule, or chosen at random, which introduces nondeterminism into system dynamics.
State space of a Boolean network is finite and consists of 2 n network states, where n is the number of nodes in the network. Simulating a Boolean network with deterministic dynamics will eventually cause the network to reach a previously visited state and thus fall into a cycle called an attractor. In the case of a cycle of length 1, the attractor is called a steady state. States of the network that lead to a particular attractor constitute its basin of attraction. Attractors represent long-term behavior of the modeled processes and may therefore carry important information about them [10,11].
Using Boolean networks to model continuous realworld processes requires binary interpretation of their states, which is usually achieved through thresholding. Despite such simplification, Boolean network models can capture important properties of the dynamics of complex systems, which makes them a practical choice in a range of domains, including systems biology [6], robotics [16], epidemiology [13], and economics [3,15].
Due to the practicality of the model, a number of software tools for Boolean network simulations have been made available. BoolNet [12] is an R package for construction, simulation and analysis of synchronous, asynchronous, and random Boolean networks, that supports identification of synchronous attractors by both exhaustive and heuristic search. BooleanNet [1] is a Python software library for simulation and visualization of synchronous and asynchronous Boolean networks, capable of detecting their attractors. Unlike most other tools from the field, BooleanNet allows for the simulation of simulate discrete update rules as piecewise linear differential equations by introducing continuous variables to the model. RaBooNet [7] is a Matlab toolbox for generation and simulation of random Boolean networks, with an emphasis on their application to supervised machine learning. RaBooNet does not support user-defined update rules of a network, only allowing rules to be randomly generated based on network structure. BioNetGen [4], GINsim [8], and PlantSimLab [9] are software packages with graphical user interface, aimed at rule-based modeling of biochemical systems. All three packages can operate with both binarystate and multi-state models.
Here, we present BoolSi, a command line tool for simulation and analysis of synchronous Boolean networks. The main distinguishing characteristic of BoolSi is the support for MPI standard to run simulations in parallel, e.g. on a computational cluster. To the best of our knowledge, this is the first Boolean network modeling tool capable of high-performance computing. Another unique feature of BoolSi is the analysis that it performs on the identified attractors. This approach, inspired by statistical mechanics, was first introduced in [14] and used therein to better understand the role of hormonal signals in regulating cambium proliferation.

Implementation and architecture
BoolSi requires user to define nodes, update rules, and initial states of the network via simple text input. It supports simulating the network from a range of initial states within a single run, and aggregating the results of simulations. In particular, BoolSi can use exhaustive search to identify all network attractors, as well as their trajectory length distributions and the sizes of their basins of attraction. User can limit the length of attractors to search for -for example, setting the limit to 1 will identify the steady states only. Another mode of operation aims to find conditions that lead to specific states of the network.
It is possible to override the simulation process, which can be used to model external influence on the network. User can fix the state of any node for the entire simulation (e.g. modeling gene knockout in regulatory networks), or perturb it at any time step (e.g. modeling sensory input in robotics).
To speed up simulations, BoolSi stores pre-evaluated update rules as truth tables. This requires O(n · 2 d ) memory, where n is the number of nodes in the network and d is its maximum in-degree. Such an approach imposes a practical limitation on the in-degree of a node, with the majority of modern computers being able to handle the nodes of in-degree up to 20. This however is sufficient for many Boolean network models from the literature, where the in-degrees rarely exceed 10 (see e.g. [2,5] and the references therein).

Attractor analysis
BoolSi performs statistical analysis of the found attractors to extract information about the interplay between the nodes. Let n denote the number of nodes in a Boolean network. Simulating the network from each of its possible initial states s k ∈ {0, 1} n , k = 1, …, 2 n , leads to an attractor, comprised of one or multiple network states. We define α k i ∈ [0, 1], the activity coefficient of node i associated with network state s k , as the average of 0 and 1 states of the node in the attractor reachable from s k . Figure 1 shows an example of the activity coefficient calculation. All activity coefficients of node i form vector , which represents long-term activity of the node with regard to the entire state space of the network. For every pair of nodes i and j, BoolSi computes Spearman's correlation coefficient ρ ij ∈ [-1, 1] between α i and α j , interpreted as the degree of mutual amplification of the two nodes. We note that the impact of each attractor on ρ ij is proportional to the size of its basin of attraction. Spearman's correlation measures the strength and direction of a monotonic association between two variables, and was chosen for its ability to capture non-linear relationships.

User interface
BoolSi provides a command line interface. It relies on YAML input file to define nodes and update rules of a Boolean network and to provide simulation details, such as initial states. Update rules are described using common syntax for logical functions, including operators "and", "or", "not".
BoolSi supports a number of output formats: PDF documents, vector or raster image files (SVG, PNG, TIFF), or CSV text files to allow for machine processing.

Quality control
BoolSi code is covered by unit and functional tests. To demonstrate performance and functionality of our software, we perform a case study on a Boolean network model of cambium cell. The results can be reproduced by executing $ mpiexec -np 168 boolsi attract cambium.yaml (the contents of cambium.yaml are shown in Appendix A), assuming that BoolSi, as well as an MPI implementation and mpi4py library, are installed.

Case study
Cambium is a plant tissue layer that produces cells for plant growth. It is concealed under multiple layers of other cells, which hinders identification of mutants with  [14], used in Section Case study). The activity coefficient of the node CK associated with each of roughly 0.0115 · 2 30 ≈ 12,348,031 network states s k leading to this attractor is CK 5 6 k   .

Oles and Kukushkin: BoolSi
Art. 26, page 3 of 6 altered secondary growth and isolation of live cambium cells. As a consequence, understanding the biology of cambium remains incomplete. Mathematical modeling has the potential to help with predicting the outcome of interactions controlling cambium activity [14]. We based the case study off of the 30-node BN model of cambium regulation (see Figure 2) described in [14], formatted as a YAML input file for BoolSi (see Appendix A).
We used BoolSi to find network attractors by simulating from all 2 30 ≈ 1.07·10 9 initial states of the network, and to analyze the found attractors. The exhaustive search took about 45 hours on a cluster of 21 computers with 8 CPU cores each. Of the total 168 CPU cores, 112 were 4 GHz, 17 were 3.4 GHz, 11 were 2.1 GHz, and 28 were 1.4 GHz.
The results of attractor analysis (Figure 3) agree with the findings of [14] and experimental data. In particular, each  of the nodes IAA and BR showed positive correlation with both WOX4 and HB8, whose combined activity represents cambium proliferation. Node ETHL exhibited strong positive correlation (>0.5) with WOX4, but no significant correlation with HB8. The analysis also reproduced the mutually inhibitory relationship between cytokinin and auxin signaling [17], manifested as the negative correlation between the nodes IAA and CK. Unlike in [14], our analysis found a positive effect of the nodes CK and TDIF on the expression of WOX4, as well as a positive relationship between CK and HB8 (the correlation between TDIF and HB8 was 0). This is consistent with published experimental data on the key role of both CK and TDIF in regulation of cambium activity. We note that BoolSi automatically extracts information about the interplay between each pair of nodes, stepping up the approach of manually inferring the relationships between the selected nodes.
To test if BoolSi can accurately model the mutations of a cambium cell, we simulated the effect of mutations known to reduce cambium proliferation: double knockout of ERFs erf108erf109, a quadruple knockout of IPT which was deficient in cytokinin synthesis, and mutants in cytokinin receptor AHK4 [14]. Mutations were mimicked by fixing the states of ERF, IPT, or AHK to be 0 in the simulations (see Appendix B for the corresponding BoolSi input). In agreement with the experimental data, all mutations resulted in lower proliferation activity, measured as (see Section Attractor analysis). The proliferation activity in erf, ipt, and ahk was, respectively, 1.91, 1.16, and 1.02 times lower than in the model of cambium unaffected by mutations. We note that the insignificance of the decrease of proliferation activity in ahk suggests the direction for improvement of the model.

(2) Availability
BoolSi can be installed via Python package manager pip (e.g. pip install boolsi), or from source.

Programming language
Python 3.4 or higher.
Running BoolSi with MPI also requires installing an MPI implementation and mpi4py library. BoolSi has been tested with OpenMPI and MPICH, and we expect it to work with other MPI implementations as well.

Software location Archive
Name: Zenodo Persistent identifier: 10