PerfAndPubTools – Tools for Software Performance Analysis and Publishing of Results

Nuno Fachada1, Vitor V. Lopes2, Rui C. Martins3 and Agostinho C. Rosa1 1 Institute for Systems and Robotics, LARSyS, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal nfachada@laseeb.org 2 Universidad de las Fuerzas Armadas-ESPE, Sangolquí, Ecuador 3 Life and Health Sciences Research Institute, School of Health Sciences, University of Minho, Braga, Portugal Corresponding author: Nuno Fachada


Introduction
PerfAndPubTools consists of a set of MATLAB [1], GNU Octave-compatible [2], functions for the post-processing and analysis of software performance benchmark data and producing associated publication quality materials. More specif ically, the functions bundled with PerfAndPubTools allow to: 1. Batch process files containing benchmarking data of computer programs, one file per run. These tools were originally developed to assess the performance of serial and parallel implementations of the PPHPC simulation model [3], as well as for producing some of the associated publication quality materials. However, the tools can be used with any computational benchmark experiment.

Implementation and architecture
Performance analysis in PerfAndPubTools takes place at two levels: implemen tation and setup. The implementation level is meant to be associated with specific software implementations for performing a given task, for example a particular sorting algorithm or a simulation model realized in a certain pro gramming language. Within the context of each implementation, the software can be executed under different setups. These can be different computational sizes (e.g., vector lengths in a sorting algorithm) or distinct execution parame ters (e.g., number of threads used). PerfAndPubTools is implemented in a layered architecture using a procedu ral programming approach, as shown in Figure 1. From lowest to highest-level of functionality, the functions represented in this Figure have the following roles: get_gtime Given a file containing the default output of the GNU time [4] com mand, this function extracts the user, system and elapsed times in seconds, as well as the percentage of CPU usage.
gather_times Loads execution times from files in a given folder. This function uses get_gtime by default, but can be configured to use another function to load individual benchmark files with a different format.
perfstats Determines mean times and respective standard deviations of a com putational experiment, optionally plotting a scalability graph if different setups correspond to different computational work sizes.
speedup Determines the average, maximum and minimum speedups against one or more reference implementations across a number of setups. Can optionally generate a bar plot displaying the various speedups.
times_table Returns a matrix with useful contents for using in tables for publi cation, namely times (in seconds), absolute standard deviations (seconds), relative standard deviations, and speedups against one or more reference implementations.
times_table_f Returns a table with performance analysis results formatted in plain text or in LATEX (the latter requires the siunitx [5], multirow [6] and booktabs [7] packages).
Although the perfstats and speedup functions optionally create plots, these are mainly intended to provide visual feedback on the performance analysis being undertaken. Those needing more control over the final figures can cus tomize the generated plots via the returned figure handles or create custom plots using the data provided by perfstats and speedup. Either way, MATLAB/Octave plots can be used directly in publications, or converted to LATEX using the excellent matlab2tikz script [8], as exemplified in the PerfAndPubTools user manual.

An example: comparing sorting algorithms
Experimental setup: The performance of four sorting algorithms, imple mented in C [9], is compared using PerfAndPubTools. The algorithms, Bubble sort, Selection sort, Merge sort and Quicksort [10], are used to sort random inte ger vectors of sizes 1 x 10 5 , 2 x 10 5 , 3 x 10 5 and 4 x 10 5 . For each size, individual algorithms are executed ten times. Each run is benchmarked with GNU time, the output of which is redirected to a file with the following identifiers in its name: algorithm employed, run number and vector size.
In this context, a sorting algorithm is an implementation, and each vector size is a setup.
Defining implementation specs: Implementation specs are the basic ob jects accepted by the perfstats, speedup and times_table functions. An implementation spec defines one or more setups for a single implementation. A setup is defined by the following fields: a) sname, the name of the setup; b) folder, the folder where to load benchmark files 1 from; c) files, the specific files to load (using wildcards); and, d) csize, an optional computational size for plotting purposes. Multiple implementation specs can be defined, allowing PerfAndPubTools to compare multiple implementations across different setups.
In the following paragraphs, implementations specs stipulating all the setups (i.e., vector sizes) for the Bubble sort, Selection sort, Merge sort and Quicksort algorithms are represented by the bs, ss, ms and qs variables, respectively. For example, the Bubble sort implementation spec, bs, can be specified as follows: Implementation specs for the remaining algorithms are defined in a similar fashion. Note that all implementations specs must have the same number of setups, and corresponding setups should have the same sname. Additionally, plotting with perfstats requires that the computational size, csize, is defined and has the same value for corresponding setups in different implementations specs.
Algorithm scalability: The perfstats function determines mean times and standard deviations of individual setups for each implementation. If the various setups correspond to different computational work sizes, perfstats can option ally plot a scalability graph. The following instruction performs this task for the experimental setup under discussion:  The m variable represents mean times (in seconds), while s holds the re spective standard deviations. Rows are associated with implementations (i.e., sorting algorithms), while columns represent setups (i.e., vector sizes). The first parameter of perfstats specifies whether to generate a scalability plot. More specifically, the value 3 orders the function to generate a semi-logarithmic plot, as show in Figure 2. Negative values indicate that the figure should also display error bars representing the standard deviation in the measured computational sizes. No plot will be generated if zero is passed as the first argument.
Obtaining the speedup: The speedup function determines speedups against one or more reference implementations, across a number of setups. Its usage is similar to that of perfstats, requiring the identification of the implementation specs to compare: The first parameter concerns the optional bar plot the function is able to generate. An absolute value of 2 states that a bar plot with a logarithmic scale should be generated, as shown in Figure 3. Since this value is negative, error bars representing the maximum and minimum speedups are drawn on top of the average speedup bars. The second parameter defines the reference implementation(s) to which the speedups are to be determined against. Passing 1 identifies the first implementation, Bubble sort, as the reference. The speedup function returns cell arrays containing the average, maximum and minimum speedup matrices for each reference implementation. In this case, one reference was defined, and thus only the first item in the returned cells is available: In a similar fashion to the mean and standard deviation matrices returned by perfstats, rows of speedup matrices are associated with implementations (i.e., sorting algorithms), while columns represent setups (i.e., vector sizes). Note that, in this case, the first row represents the average speedup of Bubble sort against itself, and, as such, the values are all ones.
Generating tables: PerfAndPubTools can generate plain text or publication quality tables summarizing the performed computational benchmarks. The pro cess is divided in two steps using the times_table and times_table_f func tions, respectively. The former determines and returns a matrix containing partial or complete information to generate a table, while the latter effectively generates tables. This division is useful because times_table_f can accept more than one matrix returned by times_table, allowing the generation of more complex tables. The first argument designates the references implementation or implemen tations, in a similar fashion to the second parameter of speedup. The return value, tdata, can be passed to times_table_f in order to generate a table: times_table_f(0, 'vs Bubble', tdata) The first argument, 0, instructs the function to generate a plain text table, as shown in Figure 4. Setting this value to 1 would generate a LATEX table, as shown in Figure 5. Note that LATEX tables require the siunitx, multirow and booktabs packages.
Complete example: The complete example is available in the user manual bundled with the software. It contains the necessary steps required to reproduce these results, also showing how the return values of perfstats and speedup can be used to generate custom publication quality plots. The user manual also details an additional example concerning the performance of serial and parallel implementations of the PPHPC simulation model [3], namely different ways of contextualizing the concept of computational size, and the generation of more complex tables.

Quality control
The available functions are covered by unit tests in order to ensure their correct behavior. The MOxUnit framework [11] is required for running the unit tests. Additionally, all the examples available in the user manual (bundled with the software) have been tested in both MATLAB and Octave.

Operating system
Any system capable of running MATLAB R2013a or GNU Octave 3.8.1, or higher.

Programming language
MATLAB R2013a or GNU Octave 3.8.1, or higher.

Dependencies
There are no additional dependencies for the package tools. However, unit tests depend on the MOxUnit unit test framework for MATLAB and GNU Octave.