ImageSURF: An ImageJ Plugin for Batch Pixel-Based Image Segmentation Using Random Forests

Image segmentation is a necessary step in automated quantitative imaging. ImageSURF is a macro-compatible ImageJ2/FIJI plugin for pixel-based image segmentation. It considers a range of image derivatives to train pixel classifiers, which are then applied to image sets of any size to produce segmentations without bias in a consistent, transparent and reproducible manner. The plugin is available from the ImageJ update site http://sites.imagej.net/ImageSURF/ and the source code from https://github.com/omaraa/ImageSURF.


Introduction
A critical task in quantitative imaging is segmentation, in which pixels are partitioned into distinct groups. For example, fluorescent labelling in microscopic images might be segmented as signal and background, or photographs of botanical specimens might be segmented as specimen and background (Figure 1). After segmentation, measurements and analyses may be performed to determine, for example, the size, shape, coverage, spatial distribution or morphology of the identified features.
Many segmentation techniques such as thresholding, where pixels above a selected intensity threshold are discriminated from background [1,2], depend exclusively on the brightness of individual pixels, making them sensitive to noise and regional variations in intensity [1]. Parameters or seeds for segmentation are often manually selected on a per-image basis based on a preview of the result, or may be selected, reviewed and refined in an unstructured iterative process [3].
The use of image segmentation in research raises several reproducibility issues. In manual segmentation, parameter choice is influenced by conditions such as screen brightness and dynamic range, ambient light, perceived brightness, and subjective bias [4], especially in unblinded raters [5], and such factors are rarely reported, limiting reproducibility. When automated segmentation tools are used, they are often commercial platforms whose detailed algorithms are proprietary. This complicates comparisons between studies and poses replication problems, as legacy software and hardware may no longer be available, and algorithms or interfaces may differ between versions.
Open-source trainable segmentation tools, such as Ilastik [3] or the Trainable Segmentation [6] plugin for ImageJ [7], address many of these issues by using supervised machine learning algorithms to study a 'training set' of pixels, which have been manually assigned class annotations, and create a model ('classifier') to reliably discriminate between these classes. The context of each pixel (e.g., intensity, texture, edges, entropy) can be considered, making the classifier more robust to image artifacts and intensity shifts [3]. After training, a classifier can be saved and used to perform objective and repeatable segmentation of large numbers of similarly processed images. Although Ilastik and Trainable Segmentation have limited support for importing and exporting image annotations, neither supports both batch import and batch export of these annotations in standard formats. These limitations make it difficult to reproduce and update classifiers, to use alternative software and input devices for annotation, and to share training sets in a transparent and collaborative manner.
For biomedical microscopy, the advantage of machine learning image segmentation is the ability to apply a single classifier to large image sets which vary somewhat in brightness, background level and other image attributes. For our particular application, high resolution multichannel confocal fluorescence images of rodent brains were routinely 22,000 × 18,000 pixels or larger, and 5 or more images per animal need to be segmented to produce useful quantitative data. In order to train a classifier that is robust to the variation across these large image sets, it is beneficial to create a training set consisting of smaller cropped images of randomly selected regions in these images. To train a classifier with such large sets of input images (potentially hundreds), it is necessary to use an offline iterative process of class annotation, training and validation, rather than the 'live preview' workflows offered by Ilastik and Trainable Segmentation. To cope with offline iterative training on the sets of large images generated by confocal and fluorescence microscopy, and the very large amount of computed pixel feature information, data structures and algorithms with reduced processing and storage requirements are needed.
In this context, we determined that Trainable Segmentation was unable to use large training sets of arbitrarily sized annotated images without an extensive rewrite, due to the size and complexity of WEKA Instance data structures. Using a custom implementation of WEKA Instance backed by primitive arrays slightly reduced memory usage, but also increased computation time. Trainable Segmentation only allows import and export of training data with calculated image features in the text-based WEKA ARFF file format, which is wasteful of storage space and slow to import and export for large images, making iterative annotation of large image sets unworkably slow. Furthermore, Trainable Segmentation is a legacy ImageJ1 plugin, limiting its interoperability with SciJava and ImageJ2-compatible applications such as OMERO, KNIME and MiToBo [7].
We therefore developed Image Segmentation Using Random Forests (ImageSURF), a freely available open-source pixel-classification plugin for ImageJ2/FIJI [7], to meet our requirements. ImageSURF uses standard bitmap formats for class annotations, making the training process open, repeatable and able to incorporate large training sets created by multiple users across multiple sessions with the software of their choice. ImageSURF uses primitive data structures to avoid the substantial overheads of Object data structures such as the WEKA Instance.
We are currently using ImageSURF to study the aggregation and deposition of amyloid-β peptide in brain tissue of Alzheimer's disease rodent models by means of immunolabelling and confocal microscopy. Once trained, ImageSURF is a drop-in replacement for threshold segmentation in our ImageJ scripts.

Implementation and architecture
ImageSURF is an ImageJ2/FIJI plugin written and compiled using Java 1.8, with the user interface classes implementing the SciJava Command interface.
Training input is read from three corresponding sets of images: a set of raw single-plane single- or multi-channel greyscale images, a set of images in RGB format that have been intensity-scaled and pseudo-coloured as appropriate for manual annotation, and a set of these RGB images with class annotations in distinct colours. The class annotations are read by taking the difference of the unannotated and annotated RGB images. Each distinct annotation colour is assigned a class index based on the hexadecimal RGB value. ImageSURF supports up to 128 classes.
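The annotation-reading step can be illustrated with a minimal Python sketch. This is not ImageSURF's Java code: the function name `extract_annotations`, the packed-integer image representation, the -1 marker for unannotated pixels and the ordering of class indices by ascending colour value are all assumptions made for illustration.

```python
def extract_annotations(unannotated, annotated):
    """Derive sparse class labels from an unannotated/annotated RGB image pair.

    Both images are lists of rows of packed 0xRRGGBB integers with equal
    dimensions. Returns (labels, colour_to_class), where labels holds a class
    index for each annotated pixel and -1 for pixels left unannotated.
    """
    # A pixel counts as annotated wherever the two images differ.
    annotation_colours = sorted({
        a
        for row_u, row_a in zip(unannotated, annotated)
        for u, a in zip(row_u, row_a)
        if u != a
    })
    # Assumption: class indices follow ascending hexadecimal colour value.
    colour_to_class = {colour: index for index, colour in enumerate(annotation_colours)}
    labels = [
        [colour_to_class[a] if u != a else -1 for u, a in zip(row_u, row_a)]
        for row_u, row_a in zip(unannotated, annotated)
    ]
    return labels, colour_to_class


# Hypothetical 2 x 2 image with one red and one blue annotation stroke.
raw_rgb = [[0x101010, 0x202020], [0x303030, 0x404040]]
painted = [[0xFF0000, 0x202020], [0x303030, 0x0000FF]]
labels, colour_to_class = extract_annotations(raw_rgb, painted)
print(labels)  # [[1, -1], [-1, 0]]
```

Because only differing pixels are labelled, the annotations can be as sparse as the annotator likes, and any bitmap editor that preserves the untouched pixels exactly can be used.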
Annotation images are manually created in ImageJ using the paint tools or using bitmap image software such as Adobe Photoshop or GIMP on any device such as a desktop with a drawing tablet input or portable touchscreen device (Figure 2).
ImageSURF classifiers are built from the training input using an optimised implementation of the random forests algorithm [8] adapted from the FastRandomForest [9] plugin for the WEKA environment [10], which is the default classifier used by Trainable Segmentation [6,7]. Features are stored as size-efficient primitive data structures (byte or short arrays for 8-bit and 16-bit images, respectively) and may be pre-calculated and saved to disk. These optimisations reduce flexibility and functionality compared to Trainable Segmentation by making ImageSURF incompatible with the wide range of WEKA classifiers and analysis tools, but substantially increase its capacity for working with large training sets and images while maintaining compatibility with ImageJ workflows, including pre- and post-processing tools and analysis pipelines.
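The idea of matching storage width to image bit depth can be sketched in Python with the standard `array` module, used here as a stand-in for Java `byte[]` and `short[]` arrays. The helper name `allocate_feature_plane` is hypothetical and does not appear in ImageSURF.

```python
from array import array


def allocate_feature_plane(width, height, bit_depth):
    """Allocate a zero-filled feature plane sized to the image bit depth."""
    # "B" is an unsigned 8-bit integer, "H" an unsigned integer of at least 16 bits.
    typecode = {8: "B", 16: "H"}[bit_depth]
    return array(typecode, [0]) * (width * height)


plane8 = allocate_feature_plane(1024, 1024, 8)
plane16 = allocate_feature_plane(1024, 1024, 16)
# Bytes used per plane: one flat buffer, with no per-pixel object overhead.
print(len(plane8) * plane8.itemsize, len(plane16) * plane16.itemsize)
```

Each feature plane is a single contiguous buffer, so the memory cost scales with the pixel count and bit depth rather than with the number of wrapper objects.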
Pixel features are calculated using filters across circular neighbourhoods with various radii (Figure 3). In the current release of ImageSURF we have implemented a filter set to suit confocal fluorescence images of amyloid-β, including mean, minimum, maximum, median, Gaussian, standard deviation, range, difference of Gaussians, difference from mean, minimum, maximum, median and Gaussian, locally scaled intensity, entropy and difference of entropy. Radii are selected as a series of values within the integer set $k = 2^a + 1$ (3, 5, 9, 17, 33, 65, 129, 257…). For further efficiency, a filter dependency tree is used to reduce repeat operations: e.g., the output of Gaussian radii r and s is reused for the difference-of-Gaussians with those radii.
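A short Python sketch of the radius series and of result reuse follows. It makes several simplifying assumptions: a 1-D moving average stands in for the 2-D Gaussian filter, a memoisation cache stands in for the filter dependency tree, and the names `radii_up_to` and `FeatureCalculator` are hypothetical.

```python
def radii_up_to(max_radius):
    """Return the radii k = 2**a + 1 (a = 1, 2, ...) not exceeding max_radius."""
    radii = []
    a = 1
    while 2 ** a + 1 <= max_radius:
        radii.append(2 ** a + 1)
        a += 1
    return radii


class FeatureCalculator:
    """Computes smoothed signals once and reuses them for derived features."""

    def __init__(self, signal):
        self.signal = signal
        self._cache = {}
        self.smooth_calls = 0  # counts how many smoothing passes actually ran

    def smooth(self, radius):
        # Moving average over a window clipped at the signal boundaries.
        if radius not in self._cache:
            self.smooth_calls += 1
            n = len(self.signal)
            out = []
            for i in range(n):
                window = self.signal[max(0, i - radius):min(n, i + radius + 1)]
                out.append(sum(window) / len(window))
            self._cache[radius] = out
        return self._cache[radius]

    def difference_of_smooths(self, r, s):
        # Reuses the cached outputs for radii r and s instead of recomputing.
        return [x - y for x, y in zip(self.smooth(r), self.smooth(s))]


print(radii_up_to(33))  # [3, 5, 9, 17, 33]
calc = FeatureCalculator([0, 0, 0, 0, 100, 0, 0, 0, 0, 0, 0, 0])
calc.smooth(3)
calc.smooth(5)
calc.difference_of_smooths(3, 5)
print(calc.smooth_calls)  # 2
```

The difference feature costs only a subtraction once its two inputs exist, which is the saving the dependency tree is described as providing.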
Pixel features can be pre-calculated and cached to substantially reduce computation when re-training classifiers on modified or extended training sets, or as part of an image analysis pipeline where feature images are automatically calculated immediately after image acquisition to speed up later training.Pixel features that can be derived from other saved features with minimal processing are not cached in order to reduce disk usage and read-times.
ImageSURF supports multi-channel images by calculating pixel features for each channel and for every combination of channels merged into greyscale by averaging. For example, for a three-channel RGB image, each image filter would be applied to the red, green and blue channels, and to the combined red/green, red/blue, green/blue and red/green/blue channels, to produce seven sets of pixel features. This allows ImageSURF to consider information from all channels and the interactions between channels.
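The channel combinations can be enumerated as in the following Python sketch. The function name `channel_combinations` and the flat-list pixel representation are assumptions for illustration only.

```python
from itertools import combinations


def channel_combinations(channels):
    """Return every non-empty channel subset merged by per-pixel averaging.

    channels maps a channel name to a flat list of pixel values; all lists
    must have the same length.
    """
    names = list(channels)
    merged = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            planes = [channels[name] for name in combo]
            # Average the selected channels pixel by pixel.
            merged["/".join(combo)] = [sum(px) / len(px) for px in zip(*planes)]
    return merged


rgb = {"red": [0, 30], "green": [60, 30], "blue": [120, 90]}
merged = channel_combinations(rgb)
print(len(merged))               # 7
print(merged["red/green/blue"])  # [60.0, 50.0]
```

For n channels this yields 2^n - 1 feature sets, so the feature count grows quickly with the number of channels; this is one reason the later feature selection step matters.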
After a classifier has been trained, a subset of the most important features is selected using a modified version of Breiman's feature importance calculation algorithm [8] as implemented in Supek's FastRandomForest [10]. For each feature, the classifier is applied to the training set with the values for that feature randomly shuffled. If classification accuracy remains high, that feature is less important and is ranked accordingly. After feature selection the classifier is re-trained considering only the most important features. This optimisation substantially reduces the computation and disk space required when pre-calculating features. Using a minimal set of image features also reduces the memory requirements for image segmentation. The parameters used to train a classifier, including the image features applied to images, can be viewed using the ImageSURF Get Classifier Details command.
ImageSURF also supports segmentation of multidimensional images on a plane-by-plane basis. Each two-dimensional image plane is segmented independently to produce an output image stack with the same dimensions as the input image stack.
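The shuffling procedure described above can be sketched generically in Python. This is plain permutation importance, not the modified calculation used by FastRandomForest or ImageSURF; the function names, the fixed random seed and the toy threshold classifier are assumptions.

```python
import random


def accuracy(classify, rows, labels):
    """Fraction of rows for which classify(row) matches the label."""
    correct = sum(1 for row, label in zip(rows, labels) if classify(row) == label)
    return correct / len(rows)


def permutation_importance(classify, rows, labels, seed=0):
    """Return, per feature, the accuracy lost when that feature is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(classify, rows, labels)
    importances = []
    for f in range(len(rows[0])):
        # Shuffle one feature column while leaving the others untouched.
        column = [row[f] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:f] + [value] + row[f + 1:] for row, value in zip(rows, column)]
        importances.append(baseline - accuracy(classify, shuffled, labels))
    return importances


# Toy data: the label depends only on feature 0; feature 1 is irrelevant.
rows = [[i / 20, ((i * 7) % 20) / 20] for i in range(20)]
labels = [1 if row[0] > 0.5 else 0 for row in rows]


def classify(row):
    return 1 if row[0] > 0.5 else 0


importances = permutation_importance(classify, rows, labels)
ranked = sorted(range(len(importances)), key=lambda f: importances[f], reverse=True)
print(importances, ranked)
```

A feature whose shuffling leaves accuracy unchanged has an importance of zero and would be a candidate for removal before re-training.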
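Plane-by-plane segmentation can be sketched as follows, assuming a stack held as nested Python lists and a per-pixel threshold standing in for a trained classifier. A real pixel classifier would use neighbourhood features within each plane rather than the raw pixel value alone; `segment_stack` is a hypothetical name.

```python
def segment_stack(stack, classify_pixel):
    """Segment each 2-D plane independently, preserving the stack dimensions."""
    return [
        [[classify_pixel(value) for value in row] for row in plane]
        for plane in stack
    ]


# Two planes of 2 x 3 pixels; values of 128 or more are labelled as signal (1).
stack = [
    [[10, 200, 30], [140, 50, 255]],
    [[0, 0, 128], [127, 129, 5]],
]
segmented = segment_stack(stack, lambda value: 1 if value >= 128 else 0)
print(segmented)  # [[[0, 1, 0], [1, 0, 1]], [[0, 0, 1], [0, 1, 0]]]
```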

Quality control
Automated testing of pixel-based segmentation tools that use sparsely annotated training images is nontrivial, particularly when evaluation of the output requires subjective judgement that is not captured by the annotations. Therefore, we provide a small set of example images and class annotation training sets, along with instructions for end-to-end testing and usage of ImageSURF.
All sample images and label files are contained in the ImageSURF MOAB2 images example [11]. ImageSURF commands are in the Plugins >> Segmentation >> ImageSURF menu in ImageJ2 and FIJI (Figure 4):
1) Configure the classifier settings using the ImageSURF Classifier Settings command. Use the default ImageSURF settings (Figure 5).
2) Select the image filters using the Select ImageSURF Features command. Exclude the entropy and median filters and their derivatives as these may take some time to calculate. Set the filter radius range as 0-33 (Figure 6).
A trained classifier can be used to segment an open image using the Apply ImageSURF Classifier command, or applied to a folder of images using Batch Apply ImageSURF Classifier.
We recommend using an iterative process of annotation and verification to train ImageSURF classifiers as shown in Figure 9.

Operating system
ImageSURF is compatible with ImageJ2 software running on a Java Virtual Machine version 8 or above. At the time of writing, ImageJ2 is available for macOS, Linux and Windows operating systems.

Figure 2: Confocal fluorescence image of MOAB2 labelled amyloid-β pathology in APPswe/PS1dE9 mouse brain tissue with sparse annotations for signal (red) and background (blue) (a) and resulting segmentation (b).

3) Train the classifier using the Train ImageSURF Classifier command. Set the raw, un-annotated and annotated image paths. Set an appropriate classifier output path and select "Segment training images and display as stacks" to verify the classifier accuracy after training (Figure 7). The training and segmentation process takes approximately 5 minutes on a modern quad-core computer. Detailed progress information is displayed in the ImageJ console.

Figure 6: ImageSURF filter selection dialog with example filters selected.

Figure 9: ImageSURF pixel classifier training workflow. A representative set of sub-images is selected and cropped from the full image set and sparsely annotated as signal or background using a bitmap image software package. The sub-images and annotations are used as the input to train an ImageSURF classifier, which is then applied back to the input sub-images. The accuracy of the sub-image segmentations is manually verified, and the annotation, training and verification processes are repeated until the sub-image segmentation is accurate. Once the trained classifier has been verified as accurate, it can be applied to any image set of which the training set is representative.

Figure 8: ImageSURF training examples. Confocal fluorescence images of MOAB2 labelled amyloid-β pathology in APPswe/PS1dE9 mouse brain tissue (a). Segmented training images (b) and merged image (c) using the ImageJ Merge Channels tool to display the segmented signal pixels as transparent red and background as transparent blue.