ConfBuster: Open-Source Tools for Macrocycle Conformational Search and Analysis

Xavier Barbeau1, Antony T. Vincent2 and Patrick Lagüe3
1 The Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Institute for Integrative Systems Biology (IBIS) and Department of Chemistry, Faculty of Sciences and Engineering, Laval University, Quebec, CA
2 Institute for Integrative Systems Biology (IBIS) and Department of Biochemistry, Microbiology and Bioinformatics, Faculty of Sciences and Engineering, Laval University, Quebec, CA
3 The Quebec Network for Research on Protein Function, Engineering, and Applications (PROTEO), Institute for Integrative Systems Biology (IBIS) and Department of Biochemistry, Microbiology and Bioinformatics, Faculty of Sciences and Engineering, Laval University, Quebec, CA
Corresponding author: Patrick Lagüe (Patrick.Lague@bcm.ulaval.ca)


Introduction
Macrocycles are cyclic macromolecules containing rings of 8-12 or more atoms [1,2]. These compounds are found in natural products such as macrolides, which consist of a large macrocyclic lactone ring to which one or more deoxy sugars may be attached, have antibiotic or antifungal activities, and are used as pharmaceutical drugs [3]. Owing to their unique properties [4,5], such as their ability to inhibit protein-protein interactions [6,7] and their cell permeability [8], this class of compounds has attracted growing interest in drug development. As a consequence, many macrocycles are currently in clinical development [4], supporting the emergence of macrocycle-based pharmaceutical companies (Cyclenium pharma, Circle pharma).
Knowledge of the 3D structure is fundamental to the rational design of biologically active molecules. Ideally, these structures would be determined experimentally, either by X-ray crystallography or nuclear magnetic resonance spectroscopy, but this can be a laborious, time-consuming and costly avenue (if possible at all). Fortunately, molecular modelling techniques have been developed over the past decades to sample the conformations of chemical structures with reliable accuracy [9]. To our knowledge, the bioinformatics tools currently available to investigate and predict macrocycle 3D conformations are of limited availability, which hinders the development of macrocycles: they are either commercially distributed (MOE [10], MacroModel [11], Tork [12], Corina [13] or Schrödinger [14]) or not available to the public (SOS [15] and ForceGen 3D [16]).

Implementation and architecture
ConfBuster is composed of four Python scripts whose goal is to find the lowest energy conformation of a single macrocycle, given a set of molecular coordinates in MOL2 or PDB format. The relations between the ConfBuster scripts are illustrated in Figure 1, along with their respective third-party program and package dependencies. The primary script (ConfBuster-Macrocycle-Linear-Sampling.py) performs a conformational search by cleaving the macrocycle at different positions and, for each linear molecule created by the cleavage of a bond, calls the secondary scripts to perform a rotational search (ConfBuster-Rotamer-Search.py) and energy minimisations (ConfBuster-Single-Molecule-Minimization.py). In addition, the primary script writes PyMOL input files for visualising the search progress and results. As an optional step, the script ConfBuster-Analysis.py plots the root mean square deviation (RMSD)-based hierarchical clustering of the resulting conformations together with their conformational energies.
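As an illustration of the cleavage step, the sketch below enumerates candidate bonds in a toy ring. It assumes the selection rule stated for ConfBuster-Macrocycle-Linear-Sampling.py (single bonds whose two atoms are not chiral centres) and restricts it to bonds between ring atoms. The function name cleavable_bonds, the tuple layout and the toy molecule are hypothetical and are not taken from the ConfBuster code.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical bond record: (atom_a, atom_b, bond_order)
Bond = Tuple[int, int, int]


def cleavable_bonds(
    bonds: Iterable[Bond],
    ring_atoms: Set[int],
    chiral_atoms: Set[int],
) -> List[Tuple[int, int]]:
    """Return ring bonds that are single and join two non-chiral atoms.

    Assumption: a bond counts as a ring bond when both of its atoms belong
    to the macrocycle. This holds for a simple ring such as the toy example
    below, but would need refinement for bridged or fused systems.
    """
    selected = []
    for atom_a, atom_b, order in bonds:
        in_ring = atom_a in ring_atoms and atom_b in ring_atoms
        is_single = order == 1
        touches_chiral = atom_a in chiral_atoms or atom_b in chiral_atoms
        if in_ring and is_single and not touches_chiral:
            selected.append((atom_a, atom_b))
    return selected


# Toy 8-membered ring (atoms 0..7) with one double bond (2=3),
# one chiral centre (atom 5) and one substituent (atom 8) on atom 5.
toy_bonds = [
    (0, 1, 1), (1, 2, 1), (2, 3, 2), (3, 4, 1),
    (4, 5, 1), (5, 6, 1), (6, 7, 1), (7, 0, 1),
    (5, 8, 1),
]
toy_ring = set(range(8))
toy_chiral = {5}

candidates = cleavable_bonds(toy_bonds, toy_ring, toy_chiral)
print(candidates)
```

In this toy case the double bond, the two bonds flanking the chiral centre and the exocyclic bond are skipped, so each remaining bond would define one linear molecule to sample.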
Dependencies. ConfBuster depends on Open Babel [17] for file format conversion as well as for several conformational sampling and energy minimisation functions. ConfBuster also depends on PyMOL [18] for RMSD calculations, for a dihedral sampling protocol and for visualisation. The Python package NetworkX [19] is used to identify atoms belonging to cycles and macrocycles. In addition to standard R packages, the optional ConfBuster analysis relies on the ComplexHeatmap package [20], available through the Bioconductor software project (bioconductor.org), and on the circlize package [21]. The respective dependencies of the ConfBuster scripts are illustrated in Figure 1.
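The ring-identification role of NetworkX can be sketched as follows. This is a minimal illustration, not the ConfBuster implementation: it assumes the molecule is available as a list of bonded atom-index pairs, uses networkx.cycle_basis to list rings, and takes 8 atoms as the macrocycle threshold, following the definition given in the Introduction. The names find_rings, macrocycle_atoms and MACROCYCLE_MIN_SIZE are hypothetical.

```python
import networkx as nx

# Assumed threshold, following the "8-12 or more atoms" definition.
MACROCYCLE_MIN_SIZE = 8


def find_rings(bonds):
    """Return a cycle basis of the molecular graph as lists of atom indices.

    Note: a cycle basis is not guaranteed to be the set of smallest rings
    in fused ring systems; it is sufficient for the isolated rings used here.
    """
    graph = nx.Graph()
    graph.add_edges_from(bonds)
    return nx.cycle_basis(graph)


def macrocycle_atoms(bonds, min_size=MACROCYCLE_MIN_SIZE):
    """Return the set of atoms belonging to rings of at least min_size atoms."""
    atoms = set()
    for ring in find_rings(bonds):
        if len(ring) >= min_size:
            atoms.update(ring)
    return atoms


# Toy molecule: a 12-membered ring (atoms 0..11) connected through a
# linker atom (12) to a 6-membered ring (atoms 13..18).
ring12 = [(i, (i + 1) % 12) for i in range(12)]
ring6 = [(13 + i, 13 + (i + 1) % 6) for i in range(6)]
linker = [(0, 12), (12, 13)]
bonds = ring12 + linker + ring6

rings = find_rings(bonds)
macro_atoms = macrocycle_atoms(bonds)
print(sorted(len(ring) for ring in rings))
print(sorted(macro_atoms))
```

Only the atoms of the 12-membered ring are reported as macrocycle atoms; the 6-membered ring and the linker are left out of the cleavage search.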
ConfBuster-Single-Molecule-Minimization.py. This script uses the command-line program obminimize from the Open Babel package to optimise the geometry and minimise the energy of the molecule. Then, the obenergy command-line program is used to calculate the final energy, which is stored in the title of the final MOL2 file.
ConfBuster-Macrocycle-Linear-Sampling.py. This is the main script of the tool suite and performs the macrocycle conformational search. The search is achieved by cleaving the macrocycle into linear molecules (Figure 2A). For each cleavable bond (any single bond between two atoms that are not chiral centres), a conformational search is performed. First, the bond is removed and hydrogens are adjusted on the two terminal atoms using PyMOL. This linear molecule is then sampled n times to identify n low [...]
Results. Figures 2B to 2D present the search results for the macrocycle soraphen A from PDB 1W96 [22]. Figure 2B shows the 73 conformations identified by the search, Figure 2C presents the lowest energy conformation aligned to the reference structure from the PDB file (the RMSD between the two conformations is 0.405 Å), while Figure 2D [...]
This was followed by a macrocyclic conformational search:
$ ConfBuster-Macrocycle-Linear-Sampling.py -i macro-1w96.mol2 -n 5 -N 5 -r 0.5
The progress of the search may be monitored using the run command in PyMOL:
(in PyMOL) run Follow-macro-1w96.py
Finally, the analysis of the search results was performed as follows:
$ ConfBuster-Analysis.py -i macro-1w96 -R macro-1w96.mol2 -n 20
which built the Heatmap_20.pdf file with the content presented in Figure 2D. Users can therefore check that their installation is working properly by comparing their results against those included in the distribution. Users can also validate the results obtained for other molecules against experimental results whenever available.
However, as the searches involve random parameters and molecules of differing complexity, the results might differ slightly from those included in the code distribution or from the experimental values. Running multiple conformational searches may increase the reliability of the results. In any case, support is available and provided via GitHub Issues.
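Because the final energy is stored in the title of each MOL2 file, the conformers from several independent searches can be pooled and ranked with a few lines of Python. The sketch below is illustrative only. It relies on the MOL2 convention that the line following @<TRIPOS>MOLECULE is the molecule title, and it assumes that this title contains nothing but the energy value; the exact title format written by ConfBuster is not specified here. The function names and the sample records are hypothetical.

```python
from typing import Dict, Optional, Tuple


def energy_from_mol2(text: str) -> Optional[float]:
    """Read the energy from the title line of a MOL2 record.

    Assumption: the title (the line after @<TRIPOS>MOLECULE) holds only
    a number. Returns None when no title is found or it is not numeric.
    """
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and index + 1 < len(lines):
            try:
                return float(lines[index + 1].strip())
            except ValueError:
                return None
    return None


def lowest_energy(conformers: Dict[str, str]) -> Tuple[str, float]:
    """Return (name, energy) of the lowest-energy conformer in the pool."""
    scored = {}
    for name, text in conformers.items():
        energy = energy_from_mol2(text)
        if energy is not None:
            scored[name] = energy
    return min(scored.items(), key=lambda item: item[1])


# Hypothetical, heavily truncated MOL2 records pooled from two searches.
pool = {
    "run1_conf1": "@<TRIPOS>MOLECULE\n123.45\n 3 2 0 0 0\nSMALL\n",
    "run1_conf2": "@<TRIPOS>MOLECULE\n110.2\n 3 2 0 0 0\nSMALL\n",
    "run2_conf1": "@<TRIPOS>MOLECULE\n98.7\n 3 2 0 0 0\nSMALL\n",
    "run2_conf2": "@<TRIPOS>MOLECULE\nno energy here\n 3 2 0 0 0\nSMALL\n",
}

best_name, best_energy = lowest_energy(pool)
print(best_name, best_energy)
```

Comparing the best conformer found by each run is one simple way to judge whether additional searches are still lowering the minimum energy.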

Operating system
ConfBuster runs on any operating system that supports standard Python and R installations and the required packages, including Linux, Windows and macOS (tested on Linux Ubuntu 14.04 LTS).

Additional system requirements
There is no special system requirement. However, hardware requirements in terms of processor power, memory capacity, etc. depend primarily on the complexity of the molecules that are processed.

Dependencies
The following software is a required dependency for all the ConfBuster scripts: Open Babel == 2.4.1 (tested with this version; older and more recent versions will not work).
The following Python package is a required dependency for ConfBuster-Macrocycle-Linear-Sampling.py: NetworkX (tested with version 1.11).
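Given the strict Open Babel version requirement, a small check before running a search may save time. The sketch below only parses a version banner passed in as text. It assumes the banner starts with "Open Babel" followed by the version number, as printed by the obabel -V option; the example banner is illustrative text rather than captured output, and the function names are hypothetical.

```python
import re
from typing import Optional

# Version required by the ConfBuster scripts, as stated above.
REQUIRED_OPEN_BABEL = "2.4.1"


def parse_open_babel_version(banner: str) -> Optional[str]:
    """Extract 'X.Y.Z' from a banner such as 'Open Babel 2.4.1 -- ...'.

    Assumption: the banner contains 'Open Babel ' followed by a
    three-part version number. Returns None otherwise.
    """
    match = re.search(r"Open Babel (\d+\.\d+\.\d+)", banner)
    return match.group(1) if match else None


def is_supported(banner: str) -> bool:
    """Return True only when the banner reports the required version."""
    return parse_open_babel_version(banner) == REQUIRED_OPEN_BABEL


# Illustrative banner text (not captured from a real installation).
example_banner = "Open Babel 2.4.1 -- Jan  1 2018 -- 12:00:00"
print(is_supported(example_banner))
```

In practice the banner could be obtained by running obabel -V through subprocess.run with capture_output=True and passing its standard output to is_supported.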

(3) Reuse potential
Knowledge of the low-energy 3D conformation is fundamental to the rational design of biologically active molecules. Making ConfBuster available to the drug development community provides access to a free and powerful tool that can guide macrocycle drug design on anything from a low-cost computer to the supercomputers of national research facilities. It is important to note that, to our knowledge, ConfBuster is currently the only free conformational search tool for macrocycles. To help users who are less familiar with scientific computing, the package distribution includes step-by-step installation instructions and a number of examples that demonstrate its use and the analysis of the results. The ConfBuster-Analysis.py script provides tools to help with the validation and publication of the conformational search results. The ConfBuster code has been implemented with a focus on readability and modularity, which simplifies the independent reuse of its modules in other projects. Further, the modularity of the package facilitates its inclusion in a potential conformational search plugin for molecular viewers such as PyMOL. Finally, the package can be used as the engine of a molecular conformational search server with a graphical user interface, which would remove the need for software installation and command-line use and would be useful to chemists with limited computer skills or knowledge.