Longbow: A Lightweight Remote Job Submission Tool

James Gebbie-Rayet (1,*), Gareth Shannon (2,*), Hannes H. Loeffler (1) and Charles A. Laughton (2)
(1) Scientific Computing Department, STFC Daresbury Laboratory, Warrington WA4 4AD, UK; james.gebbie@stfc.ac.uk, hannes.loeffler@stfc.ac.uk
(2) School of Pharmacy and Centre for Biomolecular Sciences, University of Nottingham, University Park, Nottingham NG7 2RD, UK; gareth.shannon@nottingham.ac.uk, charles.laughton@nottingham.ac.uk
(*) James Gebbie-Rayet and Gareth Shannon contributed equally to this work.


Introduction
Much of modern computational science and engineering requires access to high performance compute resources. Notable examples include climate modelling [1] which underpins government environmental policy, jet engine development [2] which increases passenger safety and reduces environmental impact, and molecular dynamics simulations which aid the understanding and treatment of disease [3].
With computer power remaining faithful to Moore's law [4], the games industry delivering massively parallel GPU processors [5] and quantum computing making positive strides [6], society's demand for computationally expensive calculations can only be expected to escalate with time.
Despite the ever-increasing importance of high performance resources, usability remains an issue. HPC resources are typically a) remote; b) reliant on an operating environment that may differ significantly from that found on the user's desktop; and c) managed by job schedulers that differ significantly from one resource to another. To use such facilities, the average user currently has little option but to work interactively, step by step, through a workflow whose key steps are i) logging in to the remote resource from their desktop, ii) transferring the required input files to the remote resource, iii) writing a job submission script, iv) submitting the job, v) monitoring the job and vi) finally transferring files back to their local workstation to analyse the results. For experienced practitioners this is time-consuming, but for potential new adopters it may be a severe barrier to engagement with HPC-enabled science altogether.
Here we present a software package, Longbow, as a solution to this problem. Longbow is a lightweight console-based remote job submission tool and library that greatly simplifies the process of running a compute job on a high performance resource. It does so by automating the laborious steps outlined above, without requiring the user to leave their familiar desktop environment. Longbow even writes the job submission script and retrieves the results on behalf of the user.
There are a number of benefits to this simplification. Firstly, it reduces the time a researcher spends on job management. Secondly, it allows inexperienced computational researchers to access high performance computers with ease and become productive quickly. Thirdly, as results files are automatically brought back to the local resource, researchers using a number of different remote resources within a single project can effortlessly centralise their results. Finally, operating details of the remote resource are hidden away behind a common interface.
The philosophy behind the development of Longbow has been to make the software as simple as possible to use, install and extend. It has been written in "vanilla" Python (2.6, 2.7, 3.x) and can be installed without administrative privileges via the pip package management system (pip install longbow). Furthermore, it has no dependencies on other packages besides the standard Unix utilities ssh and rsync, making installation much easier for the non-expert. Although "out of the box" Longbow currently only supports Molecular Dynamics packages, new software packages from any branch of computational science can be incorporated simply by creating a single short Python script file as a plug-in. The result is a lightweight, simple and intuitive tool that can be extended in minutes to support the running of new software packages and executables on remote resources.
There are other utilities that facilitate the remote submission of compute-intensive jobs; however, none has quite the same focus as Longbow. For example, the Radical Cybertools suite [7] provides a very flexible and feature-rich framework for running jobs on distributed resources. However, this is best suited to advanced users with the technical skills to install the sizeable package and its significant dependencies, and to adapt the relatively low-level interface the suite provides to the needs of their specific research problem. Though Longbow is a simpler product, its reduced learning curve means that for common usage patterns, a user can become productive more quickly.
There are also web-based solutions such as the bioinformatics-focussed Galaxy [8] that allow the submission of jobs through a web interface. Though such approaches can provide a particularly rich and supportive user interface, the challenges of integrating the web portal with a particular HPC resource will typically require the active support of the relevant computing center system administration team, and may well require a relaxation of the facility's security policy that could be difficult to negotiate.
To summarise, to the authors' knowledge Longbow is the only widely accessible console-based tool that allows the remote submission of jobs in a simple fashion.

Use of Longbow in research
Longbow is a new tool that was only released as stable software on 21 June 2015. Nevertheless, this has not been an obstacle to its uptake by researchers interested in utilising it in their work. Amongst the most notable projects that have developed beyond a scoping phase and into formal development are the following.

Use of Longbow in benchmarking HPCs
One of the functions of the HECBioSim consortium is to provide assistance to members of the bio-molecular simulation community with gaining access to HPC facilities of national level (presently the UK national supercomputer ARCHER). To encourage the most efficient use of this massively parallel resource, the consortium has generated and maintains a detailed database of the weak and strong scaling behavior of the most commonly-used Molecular Dynamics (MD) packages (AMBER [9], CHARMM [10], GROMACS [11], LAMMPS [12] and NAMD [13]).
The database is populated with the results from a wide-ranging set of benchmarking simulations that cover the most commonly encountered problem sizes and types (refer to http://purl.org/net/epubs/work/50963 and http://www.hecbiosim.ac.uk/benchmark). Each simulation must be run on a range of core counts, and using each of the supported MD packages. Running this suite is extremely time consuming, and editing job submission scripts each time software versions changed was a laborious task. Applying Longbow to this task has removed the need to continuously update hundreds of job submission scripts when testing new software. The Longbow-based version of the benchmark suite is also completely portable and can be used to comprehensively profile any supported HPC facility, with any release of supported software, in little more than the run time required for the simulations.
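To illustrate why such a suite quickly accumulates a large number of submission scripts, the following minimal Python sketch enumerates every combination of package and core count. The package names come from the text above; the core counts, the job naming scheme and the function name are assumptions made purely for illustration and do not reflect the actual benchmark suite.

```python
import itertools

# Package names are taken from the text; the core counts are assumed values.
PACKAGES = ["AMBER", "CHARMM", "GROMACS", "LAMMPS", "NAMD"]
CORE_COUNTS = [24, 48, 96, 192]


def benchmark_jobs(packages, core_counts):
    """Return one job description per (package, core count) combination."""
    return [
        {
            "name": "{0}-{1}".format(package.lower(), cores),
            "package": package,
            "cores": cores,
        }
        for package, cores in itertools.product(packages, core_counts)
    ]


jobs = benchmark_jobs(PACKAGES, CORE_COUNTS)
print(len(jobs))
```

Each additional problem size or software version multiplies this count again, which is the maintenance burden that generating submission scripts automatically removes.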

Use of Longbow in FLEX-EM
Longbow has been incorporated into the developing CCP-EM software suite for biological electron cryomicroscopy [14]. Several steps in the interpretation of micrographs require significant compute resources, such as recent Bayesian methods for the 3D reconstruction of molecular volumes, and Longbow provides the ability to submit computationally intensive jobs to remote clusters from a local instance of the CCP-EM graphical user interface. Longbow has been incorporated as a generic utility, but in the first instance has been tested on the FlexEM application [15] which flexibly fits atomic models into low resolution volumes.

Implementation and architecture
Longbow consists of a standalone application and a generically written core library. Here we will give an overview of the basic features and functionality of the application, followed by a description of what is available within the core library and plug-ins.

Longbow application

Executing Longbow
To run jobs on a high performance remote resource, a Longbow job is run either interactively or in the background on the user's desktop resource, or is submitted to the user's local batch queue system.
As mentioned previously, the Longbow application has been designed to be as simple to run as possible. In many instances, by simply placing the word "longbow" in front of a command string that would run the desired job on the local resource, the same job will execute on the remote resource. The full format of the Longbow command is displayed in Figure 1.
Once executed, Longbow will create a job submission script, copy this script along with required input files to the remote resource, submit the job, keep track of the progress of the job and periodically bring the output files back to the local machine.
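The sequence of actions described above can be sketched as an ordered list of shell commands. This is a minimal illustration under stated assumptions rather than Longbow's actual implementation: the function name, the host name, the directory names and the PBS-style qsub command are all hypothetical, and the commands are only assembled here, never executed.

```python
def build_remote_workflow(host, local_dir, remote_dir, script="submit.sh"):
    """Return, in order, the commands a submission tool might run.

    Hypothetical sketch: the argument names and the PBS-style 'qsub'
    command are assumptions, and nothing is executed here.
    """
    remote_target = "{0}:{1}/".format(host, remote_dir)
    return [
        # Stage the submission script and input files on the remote resource.
        ["rsync", "-az", local_dir + "/", remote_target],
        # Submit the job from the remote working directory.
        ["ssh", host, "cd {0} && qsub {1}".format(remote_dir, script)],
        # Bring output files back to the local machine (repeated periodically).
        ["rsync", "-az", remote_target, local_dir + "/"],
    ]


commands = build_remote_workflow("user@hpc.example.org", "job1", "/work/job1")
for command in commands:
    print(" ".join(command))
```

Monitoring is omitted from the sketch; in practice it would sit between the submit step and the final transfer.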

Source of the simplicity
A major reason submitting jobs is so simple is that much of the extra information Longbow requires to describe how and where to run the remote jobs is stored in one, or optionally two, configuration files in .ini file format: the hosts configuration file and the job configuration file. The hosts configuration file is used primarily to store information about the remote resource, such as username and account details, but can also provide default values for parameters such as the number of cores to be used or the walltime limit. The job configuration file provides a mechanism to override the default values of any parameter for the current project, while these in turn may be overridden by arguments specified on the command line to give job-specific control.
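The precedence described above (host defaults, then the job configuration file, then the command line) can be sketched with the standard configparser module. The sketch is written for Python 3, and the section names and the parameter names (user, cores, maxtime, resource) are assumptions chosen for illustration; they are not necessarily the names Longbow uses.

```python
import configparser

# Hypothetical file contents; all section and parameter names are assumptions.
HOSTS_INI = """
[archer]
user = jbloggs
cores = 24
maxtime = 01:00
"""

JOB_INI = """
[myjob]
resource = archer
cores = 48
"""


def read_ini(text):
    """Parse .ini text into a plain dictionary of sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def resolve(hosts, job, overrides):
    """Merge parameters so that host defaults < job file < command line."""
    params = dict(hosts[job["resource"]])
    params.update(job)
    params.update(overrides)
    return params


hosts = read_ini(HOSTS_INI)
jobs = read_ini(JOB_INI)
# The last argument stands in for values given on the command line.
params = resolve(hosts, jobs["myjob"], {"maxtime": "12:00"})
print(params)
```

Here the core count comes from the job file, the walltime from the command-line stand-in and the user name from the host defaults.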
Another reason submitting jobs is so simple is that Longbow automatically detects the input files that should be copied to the remote resource for use by the executable. The user does not have to provide a list.

Job types
The Longbow application supports three types of jobs to be run on the remote resource: single, replicate and multijobs.
Single jobs involve just one calculation: a single executable, a single set of input data, and all running on a single compute resource. Replicate jobs involve the more-or-less synchronous running of a single executable against a variety of related input data sets, all on a single resource.
A distinguishing feature of Longbow is its ability to simultaneously initiate several jobs on different remote resources at a single keystroke by running what is known as a multijob. Multijobs can be easily set up by providing details of all the jobs to be run in separate sections of a job configuration file. A good example of when this would be useful is the aforementioned FLEX-EM project. In this case, the multijobs feature allowed the simultaneous submission of many jobs that utilised several different codes on several different resources. Without Longbow, achieving this would have required a more complex solution.
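As a rough sketch of the multijob idea, the following Python 3 snippet reads a job configuration with one section per job and groups the jobs by the resource on which they would run. The section names, the parameter names (resource, executable) and the resource names are assumptions for illustration only.

```python
import configparser

# Hypothetical multijob configuration: one section per job.
MULTIJOB_INI = """
[amber-run]
resource = cluster-a
executable = pmemd.MPI

[gromacs-run]
resource = cluster-b
executable = mdrun_mpi

[namd-run]
resource = cluster-a
executable = namd2
"""


def jobs_by_resource(text):
    """Group job names by the resource named in each job section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    grouped = {}
    for name in parser.sections():
        resource = parser[name]["resource"]
        grouped.setdefault(resource, []).append(name)
    return grouped


grouped = jobs_by_resource(MULTIJOB_INI)
for resource, names in grouped.items():
    print(resource, names)
```

A single invocation could then iterate over the groups and stage, submit and monitor each job on its own resource.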

Longbow architecture

Longbow core library
The Longbow core library (corelibs) contains generically written methods that provide the functionality behind Longbow. The core library consists of procedurally written code, which is categorised into distinct Python modules based upon the nature of the method. This ensures that source files stay relatively short and that developers wishing to incorporate Longbow can choose which aspects of the core library they wish to use.
The methods within the core library have been engineered in such a way that only two main data structures are required for them to operate. These are Python dictionaries, which form a logical distinction between data that describes a job and data that describes a host (HPC resource). In the Longbow application these structures are initialised within the configuration methods of the core library. It is here that developers can find the template structures and the logic for setting them up, should they wish to produce a custom solution.
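The split between job data and host data might look something like the following sketch. Every key name and value here is an assumption made for illustration; the real template structures are defined in the configuration methods of the core library.

```python
# Hypothetical host and job structures; every key name is an assumption.
hosts = {
    "archer": {
        "host": "login.example.org",
        "user": "jbloggs",
        "scheduler": "pbs",
    },
}

jobs = {
    "myjob": {
        "resource": "archer",
        "executable": "charmm",
        "cores": "24",
    },
}


def scheduler_for(job_name, jobs, hosts):
    """Look up which scheduler a job will use via its host entry."""
    resource = jobs[job_name]["resource"]
    return hosts[resource]["scheduler"]


print(scheduler_for("myjob", jobs, hosts))  # prints "pbs"
```

Keeping the two dictionaries separate means that several jobs can refer to the same host entry without duplicating its connection details.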
The following outline gives a brief overview of the methods that can be found within the core library. More in-depth information can be gleaned from either the comments within the code or the developer's documentation on the HECBioSim wiki [16]. An example of how the core library should be utilised can be seen in Figure 2, which outlines how the core library is used within the Longbow application executable.

Longbow plug-ins
Longbow plug-ins contain code that interfaces with the core library to support both the executables to be run on the remote resource and the schedulers to which the jobs are submitted. As such the code is housed in two libraries: apps and schedulers.

plugins.schedulers
Each supported scheduler is defined in a Python file named after the scheduler. For example, PBS is supported by Longbow in the file pbs.py.
The following outline gives a brief overview of the methods that can be found within each supported scheduler file. In every case the plug-in methods are called by the methods of the same name in scheduling.py in the core library. More in-depth information can be gleaned from either the comments within the code or the developer's documentation on the HECBioSim wiki [16].

prepare()
This method writes the job submission file to be submitted to the scheduler.

delete()
This method will delete a job that has been submitted to the scheduler in question on the remote resource.

submit()
This method submits the job submission file to the scheduler in question on the remote resource.

status()
This method will query the status of jobs that have been submitted to the scheduler.
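The shape of such a scheduler plug-in can be sketched as four functions plus a dispatch by method name. This is an illustration under stated assumptions, not the contents of pbs.py: the function signatures, the keys of the job dictionary and the dispatch table are hypothetical, while the #PBS directives and the qsub, qstat and qdel commands are standard PBS usage. The functions only build text and argument lists; nothing is sent to a remote resource.

```python
def prepare(job):
    """Return the text of a PBS-style submission script for one job."""
    lines = [
        "#!/bin/bash",
        "#PBS -N {0}".format(job["name"]),
        "#PBS -l walltime={0}".format(job["walltime"]),
        "cd $PBS_O_WORKDIR",
        job["command"],
    ]
    return "\n".join(lines) + "\n"


def submit(job):
    """Return the command that submits the job script."""
    return ["qsub", job["script"]]


def status(job):
    """Return the command that queries the job's status."""
    return ["qstat", job["id"]]


def delete(job):
    """Return the command that removes the job from the queue."""
    return ["qdel", job["id"]]


# A core routine can dispatch to the plug-in method of the same name.
PLUGIN = {"prepare": prepare, "submit": submit, "status": status, "delete": delete}


def call_scheduler(method, job):
    return PLUGIN[method](job)


# Hypothetical job description; all keys and values are assumptions.
job = {
    "name": "test",
    "walltime": "01:00:00",
    "command": "mpirun mdengine -i run.inp",
    "script": "submit.sh",
    "id": "123",
}
print(call_scheduler("prepare", job))
```

Supporting another scheduler would then amount to providing the same four functions with that scheduler's directives and commands.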

plugins.apps
Each supported app (executable) is defined in a Python file named after the software. For example, the molecular dynamics software CHARMM is supported by Longbow in the file charmm.py.
The following outline gives a brief overview of the methods and dictionaries that may be found within each supported app file. Those that are required vary depending on the requirements of the software package. More in-depth information can be gleaned from either the comments within the code or the developer's documentation on the HECBioSim wiki [16].

EXECDATA
This dictionary defines the names of the supported executables for the package and the command-line flags the software requires. This dictionary is required in all app files.

file_parser()
This method recursively searches through input files to the executable for references to other required input files. All filenames found are added to the list of files to be staged to the remote resource. Only executables that can depend on input files that are not explicitly provided on the command line require this method.

sub_dict()
This method detects command-line parameter substitutions to be applied in input files. Only packages that support such substitutions and users that wish to implement such a feature require this method.

defaultfilename()
This method will automatically add the file extension specified in the method onto the name of an input file provided without the extension. This method is to support the atypical case that a package might expect the name of an input file to be provided without the extension.
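The recursive search described for file_parser() can be sketched as follows. To keep the example self-contained, a dictionary of file contents stands in for the file system, and a token counts as a reference whenever it matches a known file name. Both of these choices, along with the function name, the file names and the file contents, are assumptions made for illustration and are not how Longbow itself parses input files.

```python
def find_required_files(filename, contents, found=None):
    """Recursively collect the files referenced from an input file.

    'contents' maps file names to their text, standing in for the file
    system. Treating any token that names a known file as a reference
    is an assumption made for illustration.
    """
    if found is None:
        found = []
    if filename in found:
        return found  # already visited; avoids loops on circular references
    found.append(filename)
    for token in contents.get(filename, "").split():
        if token in contents:
            find_required_files(token, contents, found)
    return found


# Hypothetical input files that refer to one another.
contents = {
    "run.inp": "read topology top.rtf\nstream setup.str",
    "setup.str": "read coordinates start.crd",
    "top.rtf": "",
    "start.crd": "",
}

staged = find_required_files("run.inp", contents)
print(staged)  # ['run.inp', 'top.rtf', 'setup.str', 'start.crd']
```

The resulting list is what would be staged to the remote resource, which is why the user does not need to enumerate the input files by hand.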
For Longbow to support a new package, often just a Python dictionary needs to be provided, which highlights the extensibility of the software. If necessary or desired, the methods outlined above can also be provided to use a given software package in a more complex fashion.
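A minimal app plug-in might therefore look something like the sketch below. The layout of the dictionary, the executable name "mdengine", the flags, and the signatures of both helper functions are assumptions made for illustration; the actual format is described in the developer's documentation.

```python
# Hypothetical layout: executable name -> flags that must be present.
EXECDATA = {
    "mdengine": {"required_flags": ["-i", "-o"]},
}


def missing_flags(executable, args):
    """Return the required flags that are absent from the argument list."""
    required = EXECDATA[executable]["required_flags"]
    return [flag for flag in required if flag not in args]


def defaultfilename(name, extension=".inp"):
    """Append the extension if the name was given without it (assumed signature)."""
    return name if name.endswith(extension) else name + extension


print(missing_flags("mdengine", ["-i", "run.inp"]))  # ['-o']
print(defaultfilename("run"))  # run.inp
```

In this sketch the dictionary alone is enough to validate a command line, and the optional helper is only needed for packages with unusual file naming conventions.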

Quality control
To ensure that Longbow conformed to good quality control standards we conducted alpha and beta testing as well as developing a suite of tests.

Alpha and beta testing
Alpha testing was done in-house with colleagues ranging from junior PhD students to experienced postdoctoral workers, with a range of academic backgrounds and technical expertise. Both ease of installation and use of the Longbow package were tested. Following this round of testing, an open beta release was made available to the UK academic community and feedback was collected over a period of 3 months.

Test suite
The test suite has been designed to probe all of the functions within the core library as well as the performance of the standalone Longbow application. The tests are applied to the release code prior to tagging a new release version on the code repository and subsequently publishing a new release. The following types of tests are performed:
Unit Tests: These use the PyUnit testing framework, and are designed to test that changes to methods within the core library have not broken the basic functionality.
Functional Tests: These are designed to test that the standalone Longbow application functions for a standard set of configurations. There are also tests within this suite that force failure to make sure error handling occurs correctly.
Operation Tests: These are designed to probe the basic operation of Longbow. These tests are made up of jobs that run using different MD packages, job types, job configurations, scheduling environment and HPC facilities. These test that Longbow will run to completion full jobs across all supported schedulers and software packages (plug-ins supplied out of the box) and will perform all desired functions a user may request.
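As an illustration of the unit-test style described above, the following sketch uses the standard unittest module (PyUnit) to test a small helper, including one test that deliberately forces a failure to check the error handling. The helper function parse_cores() is hypothetical and is not part of the Longbow core library.

```python
import unittest


def parse_cores(value):
    """Hypothetical helper: convert a core-count string to a positive int."""
    cores = int(value)
    if cores < 1:
        raise ValueError("core count must be positive")
    return cores


class ParseCoresTest(unittest.TestCase):
    def test_valid_value(self):
        self.assertEqual(parse_cores("24"), 24)

    def test_forced_failure(self):
        # Error handling is exercised by deliberately supplying bad input.
        with self.assertRaises(ValueError):
            parse_cores("0")


# Run the tests programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseCoresTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running such a suite before tagging a release gives a quick check that changes to a method have not broken its basic behaviour.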

Operating system
Longbow is designed to work best on Linux or Unix based operating systems. It is possible to run Longbow under the Windows operating system using a Unix emulation environment, such as Cygwin or MinGW, to bring the SSH and rsync utilities to Windows.

Programming language
Longbow is written in Python and will run natively with versions 2.6, 2.7 and 3.x.

Additional system requirements
Longbow only requires that the machine it is run on has SSH-type connectivity to the HPC facility that is the intended target of use, and that password-less login to that resource can be established. The local machine must also have the standard Unix rsync utility.
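A quick way to check for the two utilities on a local machine is sketched below using shutil.which from the Python 3 standard library. The function name is hypothetical, and the check only confirms that the programs are on the PATH; it does not verify that password-less login to a particular resource has been set up.

```python
import shutil


def missing_utilities(names=("ssh", "rsync")):
    """Return the utilities from 'names' that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_utilities()
if missing:
    print("Missing utilities: " + ", ".join(missing))
else:
    print("ssh and rsync are available")
```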

Dependencies
Longbow has no dependencies outside of the standard Python libraries.

List of contributors
James T. Gebbie-Rayet acted as co-principal developer, having designed the software and contributed a substantial amount of code, documentation and user support.
Gareth Shannon acted as co-principal developer, having contributed a substantial amount of code, documentation and user support.
Hannes H. Loeffler designed the software, provided valuable technical guidance throughout the project and contributed many ideas.
Charles A. Laughton acted as principal investigator, steering the overall direction of the software and contributing numerous ideas as well as code to the codebase.

Software location
Archive
Name: PyPI

The language used throughout the documentation, code repository, support forums and naming and comments within the code-base is English.

Reuse potential
Longbow has been designed to be highly reusable, either as a standalone application or by directly integrating the Longbow core library into other software projects. Users, groups or consortia can easily tailor Longbow for use with software specific to their field of interest simply by providing Longbow with plug-ins for the software they wish to support. Developers can make use of Longbow as a job submission layer embedded within their software. They have the freedom to do anything from simply wrapping Longbow as a standalone application through to incorporating the core library itself into their own software.