(1) Overview

Introduction

In our research group at Aalborg University (AAU), we have recently launched a new research project, FastAFM, which seeks to use compressed sensing to accelerate the acquisition of atomic force microscopy (AFM) images. This is a relatively unexplored application area where results have only just started to appear [], []. In this paper, we present the general software package magni, which we have developed to combine compressed sensing and AFM imaging techniques.

Compressed sensing is a theory that has attracted a great deal of attention recently. In brief, the theory states that a wide range of signal types can be accurately represented by a greatly reduced number of acquired samples [], []. That is, these signal types can be accurately reconstructed from samples taken at a rate significantly below the Shannon-Nyquist rate, which is normally seen as the ultimate limit.
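
The statement above can be made concrete with the standard notation of the compressed sensing literature. The symbols below are introduced here for illustration and are not defined in the text above: x is the length-n signal, Φ is an m × n measurement matrix with m much smaller than n, Ψ is a dictionary in which x has a sparse coefficient vector α, and y holds the m acquired samples. The ℓ1 minimisation shown is one common reconstruction formulation among several.

```latex
\begin{align}
  \mathbf{y} &= \boldsymbol{\Phi}\mathbf{x},
  \qquad \mathbf{x} = \boldsymbol{\Psi}\boldsymbol{\alpha},
  \qquad \boldsymbol{\Phi} \in \mathbb{R}^{m \times n},\; m \ll n, \\
  \hat{\boldsymbol{\alpha}} &= \arg\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1
  \quad \text{subject to} \quad
  \mathbf{y} = \boldsymbol{\Phi}\boldsymbol{\Psi}\boldsymbol{\alpha},
  \qquad \hat{\mathbf{x}} = \boldsymbol{\Psi}\hat{\boldsymbol{\alpha}}.
\end{align}
```

The reconstruction is accurate when α has few significant entries relative to m, which is why signals that are sparse in a suitable dictionary can be sampled below the Shannon-Nyquist rate.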

Atomic force microscopy (AFM) is one of the most advanced tools for high-resolution imaging and manipulation of nanoscale matter []. When used for imaging, it is able to generate a 3D surface map of an object with sub-nanometre resolution []. To generate this map, a sharp probe is brought close to the surface of the object, and the probe tip and the object are then moved relative to each other. The mechanical probe tip is affected by the forces from the surface and, loosely speaking, "feels" the surface [], []. Unfortunately, standard AFM imaging requires a timescale on the order of minutes to hours to acquire an image [].

In the course of our work with compressed sensing and AFM, we have identified three needs that are not adequately met by the available free and open source research software in this area:

  1. Software for reconstruction of compressed sensing signals.
  2. Software for consistent and rigorous testing of reconstruction algorithms, particularly of their reconstruction capabilities in terms of phase transition.
  3. Software for acquisition and processing of AFM images in relation to compressed sensing.

While free and open source software for compressed sensing signal reconstruction is available [], [], most of it relies on Matlab from MathWorks, which limits reproducibility. Also, the available free and open source software for AFM image post-processing and visualisation [] does not include compressed sensing related functionality. To address these shortcomings, we have built the magni software package to ensure the highest degree of reproducibility defined for signal processing []. This has been done by relying on the free and open source programming language Python and by making all examples, figures, etc. easily reproducible. An example of the reconstruction of an AFM image is shown in Figure 1. Using magni, the original image was loaded, preprocessed, sampled, reconstructed, and displayed in fewer than 25 lines of code with intuitive calls to magni such as:

>>> magni.imaging.measurements.spiral_sample_image(h, w, scan_length, num_points)
>>> magni.imaging.measurements.construct_measurement_matrix(img_coords, h, w)
>>> magni.imaging.dictionaries.get_DCT((h, w))
>>> magni.afm.reconstruction.reconstruct(domain.measurements, Phi, Psi)

Fig. 1. An example compressive sensing reconstruction of an AFM image.
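
To illustrate what a spiral sampling step might look like, here is a minimal standard-library sketch that picks pixel coordinates along an Archimedean spiral. It is not magni's implementation: the function names `spiral_sample_coords` and `sampled_pixel_indices` and the `num_turns` parameter are hypothetical and chosen only for this example.

```python
import math


def spiral_sample_coords(height, width, num_points, num_turns=8):
    """Return integer (row, col) pixel coordinates along an Archimedean spiral.

    Hypothetical helper for illustration only; not part of magni.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    centre_row = (height - 1) / 2.0
    centre_col = (width - 1) / 2.0
    # The largest radius that keeps every point inside the image.
    max_radius = min(centre_row, centre_col)
    coords = []
    for i in range(num_points):
        # s runs from 0 to 1; radius and angle both grow linearly with s,
        # so the radius is proportional to the angle (Archimedean spiral).
        s = i / (num_points - 1.0)
        radius = max_radius * s
        angle = 2.0 * math.pi * num_turns * s
        row = int(round(centre_row + radius * math.sin(angle)))
        col = int(round(centre_col + radius * math.cos(angle)))
        coords.append((row, col))
    return coords


def sampled_pixel_indices(coords, width):
    """Map (row, col) pairs to sorted, unique row-major pixel indices."""
    return sorted({row * width + col for row, col in coords})


coords = spiral_sample_coords(64, 64, 500)
indices = sampled_pixel_indices(coords, 64)
```

In a matrix formulation, each entry of `indices` would correspond to one row of a binary measurement matrix containing a single 1 at that pixel position.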

We have designed the magni software package to address the three needs listed above: it contains a selection of compressed sensing reconstruction algorithms, a framework for evaluating reconstruction algorithms through Monte Carlo simulations, and AFM-specific functionality for sampling and reconstructing images from AFM equipment. Further development of the package is planned as our ongoing FastAFM research project progresses over the coming years. This development aims to extend the functionality of the package, both by interfacing directly with the AFM equipment and by adding more post-processing and reconstruction algorithms.

Implementation and architecture

The magni package is written in the Python programming language. Python combined with a set of third-party libraries is an excellent tool for scientific and engineering applications []. The magni package uses the following third-party libraries to exploit code reuse, to ease the quality control process, and to enhance the end user experience:

  • The numpy and scipy libraries are used for handling data (using the efficient ndarray data container class []) and for performing numerical computations. These are two of the core libraries for scientific computing using Python [].
  • The pytables library [] is used for storing data through a high-abstraction HDF5 database interface.
  • The matplotlib library [] is used for visualising data.
  • The easy-to-use IPython [] Notebook is used for presenting a number of examples showing the capabilities of the magni package.

The magni package is itself a library, i.e. it is a collection of Python sub-packages and modules and as such does not provide any (graphical) user interface. The functionality provided by magni may be grouped into five categories with a sub-package assigned to each category, as illustrated in Figure 2. Furthermore, each sub-package has a number of modules or nested sub-packages to group related functionality.

Fig. 2. The functionality of the 5 sub-packages of magni along with the dependencies of magni.

As for coding style, procedural programming is preferred over object-oriented programming to avoid unnecessary overhead []. The developers also found procedural programming more transparent for implementing the desired functionality. Object-oriented programming is applied only in the few cases where the use of classes leads to significantly cleaner code. Thus, each module has its functionality encapsulated in a number of functions and classes, for which a distinction is made between public, internal, and private accessibility []. These accessibility levels are reflected in the code through the weak "internal use" underscore convention suggested by PEP8:

  • Private functionality is used only by the module itself. An underscore precedes the name of such functions or classes.
  • Internal functionality is used by modules in the same or a nested sub-package. No underscore precedes the name of such functions or classes, but an underscore precedes the name of the module.
  • Public functionality is available to the end-users and used by the package itself. No underscore precedes the name of such functions or classes, and no underscore precedes the name of the module.

Both functions and methods are implemented so as to ensure readability in addition to efficiency by limiting the number of logical tasks per routine, the cyclomatic complexity [], [], and the number of physical code lines. The cyclomatic complexity, i.e. the number of linearly independent paths through a function, is kept below 10 for core functionality, which is consistent with observations on the level of complexity that programmers can usually handle flawlessly. This has been validated using the static code analyser radon. The number of physical code lines per routine is kept below 50, which is consistent with recommendations used at IBM and TRW [] and with general experience in this field [], [].
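
As a rough illustration of how cyclomatic complexity can be estimated, the following sketch counts decision points in a function with Python's standard `ast` module. It is a simplified approximation written for this example and is not radon's algorithm: it counts only `if`/`elif`, `for`, `while`, `except`, conditional expressions, and extra boolean operands.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Return {function name: approximate cyclomatic complexity}."""
    tree = ast.parse(source)
    result = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # a function with no branches has one path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # "a and b and c" adds two extra decision points.
                    complexity += len(child.values) - 1
            result[node.name] = complexity
    return result


EXAMPLE = '''
def classify(values, limit):
    count = 0
    for v in values:
        if v > limit and v % 2 == 0:
            count += 1
        elif v < 0:
            count -= 1
    return count
'''

print(cyclomatic_complexity(EXAMPLE))  # {'classify': 5}
```

Here the base path, the `for` loop, the `if`, the `and`, and the `elif` each contribute one, giving a complexity of 5, which is well below the limit of 10 mentioned above.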

The magni package complies with the PEP8 recommendation for Python coding conventions. This ensures that all Python code conforms to a number of recommendations with the aim of making the code user-friendly and thus easier and more robust from a maintenance point-of-view. The recommendations cover e.g. line width, variable naming conventions, package importing, indentations, and source encoding. Furthermore, magni is extensively documented using numpydoc formatted doc-strings which describe the objective of the code, specify inputs and outputs of functions, elaborate on the functionality of the code, mention relevant references, and present examples of the use of the package. Finally, the input of every public (i.e. user-accessible) function and class is validated according to the known requirements with appropriate Python exceptions raised for invalid input. This is done to avoid runtime errors with hard-to-debug messages and stack traces.
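
The idea of validating the input of a public function can be sketched as below. This is a generic illustration and does not show magni's own validation code; the function `scale_image` and its checks are hypothetical.

```python
def scale_image(img, factor):
    """Return a copy of a 2D image (a list of equal-length rows) scaled by factor.

    Hypothetical public function used only to illustrate input validation.
    """
    # Check types first so that misuse fails early with a clear message.
    if not isinstance(img, list) or not all(isinstance(r, list) for r in img):
        raise TypeError("img must be a list of lists")
    if isinstance(factor, bool) or not isinstance(factor, (int, float)):
        raise TypeError("factor must be an int or a float")
    # Then check values against the known requirements.
    if not img or not img[0] or any(len(r) != len(img[0]) for r in img):
        raise ValueError("img must be non-empty with rows of equal length")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [[factor * value for value in row] for row in img]
```

Without such checks, a call like `scale_image("abc", 2)` would fail deep inside the list comprehension, or even return a meaningless result, rather than raising an informative `TypeError` at the function boundary.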

Quality control

The code development procedure was built on what was found to be the best choice of methods from: 1) well-defined stage-based methods, such as the structured waterfall approach [] and the spiral approach [] allowing backward interaction between different development phases; and 2) the test-centred and adaptive Agile procedure [], including e.g. Scrum [], [] and extreme programming [], [], with practices such as code reviews, code iteration, simplicity of design, frequent refactoring, and collective ownership. All code modules were first developed with tight links to the algorithm, and refactoring was then performed to ensure maintainability, readability, robustness, and sufficient performance. Several smaller code reviews and one large code review were held, each involving two to six researchers including the main developers. Throughout the development process, Git was used for version control and issue tracking [], and multiple branches were used to ensure that only tested code entered the master branch.

Testing and code validation have been handled using several instruments:

  • 15 carefully designed end-to-end examples have been implemented in IPython Notebook (the .ipynb files). Together, these examples exercise all critical code segments and serve the purpose of integration and regression testing.
  • Doc-strings for all public functions include examples that are used in automated doctests. This helps with regression testing and ensures that the doc-strings are kept up-to-date.
  • pyflakes and pylint static source code analysers for Python have been used in the code development process to catch bugs and bad coding quality.
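
A doc-string example that doubles as an automated test can be sketched as follows. The function `undersampling_ratio` and the helper `run_docstring_examples` are hypothetical and written for this illustration; only the standard-library `doctest` module is used.

```python
import doctest


def undersampling_ratio(num_measurements, signal_length):
    """
    Return the ratio of acquired measurements to signal length.

    Parameters
    ----------
    num_measurements : int
        The number of acquired measurements.
    signal_length : int
        The number of entries in the full signal.

    Returns
    -------
    ratio : float
        The undersampling ratio.

    Examples
    --------
    >>> undersampling_ratio(25, 100)
    0.25

    """
    return float(num_measurements) / signal_length


def run_docstring_examples(func):
    """Run the examples in a function's doc-string; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

If the behaviour of the function changes without the doc-string being updated, the doctest reports a failure, which is how such examples keep documentation and code in step.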

As always, no software package is better than the documentation and examples provided with it. The examples and part of the documentation have already been mentioned. Some of the examples use an AFM image, which is provided with the package. Furthermore, full documentation in HTML is automatically generated from the doc-strings. A PDF version of this documentation is shipped with the code.

(2) Availability

Operating system

Tested on Ubuntu 12.04 LTS Linux, Apple Mac OS X 10.9, and Microsoft Windows 7. Since magni is written in pure Python, it should run on any system on which Python and the magni dependencies run.

Programming language

The magni package is written in pure Python. Python 2 (>=2.7) or Python 3 (>=3.3) is required to use the package. The package has been tested with the Anaconda Python distribution by Continuum Analytics.

Additional system requirements

magni is designed to process data sets of all sizes. Hardware requirements in terms of processor power, memory capacity, etc. depend primarily on the size of the data sets that are processed.

Dependencies

magni depends on numpy, scipy, pytables, and matplotlib. The package has been tested with:

  • numpy version 1.8
  • scipy version 0.13
  • pytables version 3.1
  • matplotlib version 1.3

The following libraries are optional requirements for magni:

  • IPython Notebook >= 1.1 (for running examples)
  • Math Kernel Library (mkl) >= 11.1 (for accelerated vector operations)
  • sphinx >= 1.2 (for building the documentation from source)
  • napoleon >= 0.2.6 (for building the documentation from source)

List of contributors

  • Christian Schou Oxvig (Aalborg University) - Development
  • Patrick Steffen Pedersen (Aalborg University) - Development
  • Jan Østergaard (Aalborg University) - Testing and code review
  • Thomas Arildsen (Aalborg University) - Testing and code review
  • Tobias L. Jensen (Aalborg University) - Testing and code review
  • Torben Larsen (Aalborg University) - Testing and code review

Archive

Name

Videnbasen (VBN), Aalborg University

Persistent identifier

DOI: http://doi.org/10.5278/VBN/MISC/Magni

License

BSD 2-Clause

Publisher

Christian Schou Oxvig

Date published

23/05/14

Code repository

Name

GitHub

License

BSD 2-Clause

Date published

23/05/14

Language

English

(3) Reuse potential

The magni package has been designed to facilitate reuse through extensive documentation of functionality and interfaces. The code has been implemented with a focus on readability, and the package is accompanied by a number of examples that demonstrate its use in various use-cases.

We expect the magni package to have significant reuse potential for researchers in the area of AFM, particularly in relation to compressed sensing acquisition and reconstruction of AFM images. This applies to both users interested in developing and testing new sampling patterns for use in conjunction with compressed sensing techniques and users developing new algorithms for compressed sensing in the context of AFM.

Furthermore, the magni package is applicable to compressed sensing in general and can be particularly useful to those looking for compressed sensing reconstruction algorithms for use in Python, which have so far been scarce. In addition to reconstruction algorithms, the package provides a consistent framework which can be used to empirically estimate the reconstruction capabilities of the users' own reconstruction algorithms in terms of reconstruction phase transitions.
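
For readers unfamiliar with phase transitions, the quantity being estimated can be written down as follows. This is the formulation commonly used in the compressed sensing literature and is given here as an assumption about the setup, not as a description of magni's exact procedure. The symbols are introduced for illustration: n is the signal length, m the number of measurements, k the number of non-zero coefficients, T the number of Monte Carlo trials per grid point, and ε a chosen error tolerance.

```latex
\begin{align}
  \delta &= \frac{m}{n}, \qquad \rho = \frac{k}{m}, \\
  \hat{P}(\delta, \rho) &= \frac{1}{T} \sum_{t=1}^{T}
  \mathbb{1}\!\left[
    \frac{\|\hat{\mathbf{x}}_t - \mathbf{x}_t\|_2}{\|\mathbf{x}_t\|_2} < \varepsilon
  \right].
\end{align}
```

The empirical phase transition is then the curve in the (δ, ρ) plane along which the estimated success rate drops from near 1 to near 0, often taken as the 0.5 level.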

Because the magni package is based on well-established Python libraries, it fits naturally into the Python ecosystem [] of high-quality tools for scientific computing. The software complies with the reproducible research paradigm as used in the field of signal processing []. The intent of reproducible research is to create an open and transparent approach to the software related to specific research (see e.g. [], [], [], []). We thus provide full open access to all source code and full reuse rights via the generous BSD 2-Clause license, making it easy for others to use the code base. While the developers plan to continuously expand the functionality of the software, others are free to use it in separate branches. The reproducibility subpackage goes one step further by providing functionality for reading and writing the version and complete configuration of magni. Furthermore, information about conda, the git revision, and the system platform is included if available. This information can be automatically shipped alongside the results by letting magni store both in the same HDF5 database. With these features, the developers hope to inspire others to make their results reproducible.
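
The idea of shipping provenance information alongside results can be sketched with the standard library alone. This is not magni's reproducibility API: the function names `collect_provenance` and `annotate_results` are hypothetical, and JSON is used here as a stand-in for the HDF5 database described above.

```python
import json
import platform


def collect_provenance(config):
    """Gather interpreter, platform, and configuration details in a dict."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "config": dict(config),
    }


def annotate_results(results, config):
    """Serialise results together with the provenance that produced them."""
    record = {
        "results": results,
        "provenance": collect_provenance(config),
    }
    return json.dumps(record, sort_keys=True)


# Hypothetical configuration and results, for illustration only.
serialised = annotate_results([0.12, 0.08], {"solver": "demo", "tolerance": 1e-3})
```

Storing the two in one record means that anyone who later opens the results also sees the software environment and settings that generated them.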