ScatterJn: An ImageJ Plugin for Scatterplot-Matrix Analysis and Classification of Spatially Resolved Analytical Microscopy Data

We present ScatterJn, an ImageJ (and Fiji) plugin for scatterplot-based exploration and analysis of analytical microscopy data. In contrast to commonly used scatterplot tools, it handles more than two input images (or image stacks, respectively) by creating a matrix of pairwise scatterplots. The tool offers the possibility to manually classify pixels by selecting regions of datapoints in the scatterplots as well as in the spatial domain. We demonstrate its functioning using a set of elemental maps acquired by SEM-EDX mapping of a soil sample. The plugin is available at https://savannah.nongnu.org/projects/scatterjn.


Introduction
From spatially resolved chemical information of samples such as elemental or species maps, it is possible to derive information about reaction mechanisms and processes [1]. Methods that allow for the spatially resolved chemical analysis of samples on the micron and submicron scales include scanning electron microscopy (SEM) and transmission electron microscopy (TEM), both in combination with energy-dispersive X-ray spectroscopy (EDX) mapping, nanoscale secondary ion mass spectrometry (nanoSIMS), scanning transmission X-ray microscopy (STXM), and confocal laser scanning microscopy (CLSM). These methods typically produce data in the form of sets of images (or image stacks in the 3D case), each of which represents the spatial distribution of, for example, an element or a chemical species. The information contained in such datasets is sometimes difficult to access. A number of evaluation methods exist to extract part of this information. These include direct visual display in colour-coded overlay images; calculation of statistical indicators [2,3]; and various types of cluster analysis, which aim at detecting areas of similar composition [4][5][6]. Each of these methods deals with particular aspects of the data and has its specific shortcomings and advantages.
An additional, very useful approach that is often applied to analytical microscopy datasets [3,7,8,9] is an analysis based on 2-dimensional histograms, or so-called scatterplots. Displaying image data in scatterplots provides an intuitive view of properties such as variations, correlations, trends, and clustering. This can be combined with tracing back features in the scatterplot (e.g. clusters) to the spatial domain [7]. Because the scatterplots have to be displayed in a coordinate system, this type of analysis is usually limited to datasets consisting of no more than 2 (in a few cases 3) images.
Our tool, ScatterJn, creates a matrix of 2D scatterplots, which allows for extending this approach to datasets consisting of more than 2 or 3 images. While the generation of scatterplot matrices combined with interactive functions is a common tool for visualizing multi-dimensional datasets [10][11][12], to our knowledge it has not yet been applied to image data. With our tool, we complement existing methods by providing an additional way of data exploration and analysis.
A number of software tools exist that allow for colocalization analysis of image data, most notably the Fiji [13] plugin Coloc 2 (http://fiji.sc/Coloc_2). Apart from generating scatterplots, this tool allows for the calculation of statistical indicators such as Pearson's coefficient, Manders' coefficients [2], and Li's intensity correlation quotient [14], as well as statistical significance testing [15]. These are very intricate methods for evaluating pairwise image data, which are not implemented in ScatterJn. Instead, ScatterJn focuses on qualitative analysis and exploration of datasets that consist of more than two images.

Implementation and architecture
The tool ScatterJn is designed as a plugin for ImageJ [16] and Fiji [13]. It is published under the GNU GPL v.3 license and available at https://savannah.nongnu.org/projects/scatterjn. The plugin was written in Java and is partly based on ScatterJ [1], a scatterplot tool for pairwise analysis of images.

Input data
As input, ScatterJn requires a set of two or more 8-bit greyscale images that have to be loaded in ImageJ. All input images must have the same pixel dimensions and must be spatially aligned in a way that they represent exactly the same area of the sample. The plugin handles 2D images, representing sample surfaces or projections, and 3D image stacks, representing sample volumes. Any kind of pre-processing of the images, such as alignment or conversion to 8-bit greyscale type, which depends on the type of data, has to be done by the user with other tools. In order to use the plugin, the input images are selected by the user.

Generation of scatterplots
The different images are assumed to represent compositional maps of the same sample area or volume, respectively. Corresponding pixels (i.e. pixels with the same coordinates) in the individual images thus correspond to the same position on the sample. The grey value of each pixel is the result of a quantitative intensity measurement that represents, for example, the local concentration of a chemical species.
A set of datapoints is formed out of the grey values of pixels at the same position in the different input images. Therefore, a datapoint contains the results of all measurements taken at the corresponding position. For a set of n input images, a datapoint has n coordinates. The number of datapoints is equal to the number of pixels in one input image. Each datapoint corresponds to a pixel position in the spatial domain.
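The construction of datapoints described above can be illustrated with a minimal Python sketch. It is not taken from the plugin's Java source; the function name `build_datapoints`, the nested-list image representation, and the toy grey values are assumptions made purely for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of 8-bit grey values (hypothetical representation)


def build_datapoints(images: List[Image]) -> List[Tuple[int, ...]]:
    """Return one n-tuple of grey values per pixel position (row-major order)."""
    if len(images) < 2:
        raise ValueError("at least two input images are required")
    height, width = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all input images must have the same pixel dimensions")
    # For n input images, each datapoint has n coordinates.
    return [
        tuple(img[y][x] for img in images)
        for y in range(height)
        for x in range(width)
    ]


# Toy example: three 2 x 2 "maps" with made-up grey values.
si = [[10, 200], [30, 40]]
al = [[5, 100], [15, 20]]
ca = [[0, 0], [255, 1]]
points = build_datapoints([si, al, ca])
```

In this toy case there are four datapoints, one per pixel, and each has three coordinates, one per input image.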
The datapoints are displayed in a scatterplot matrix. For each combination of two images, a 2-dimensional scatterplot is created. In this scatterplot, the grey values of the individual pixels in image 1 are plotted against the grey values of the equivalent pixel in image 2. This corresponds to a projection of the n-dimensional cloud of datapoints onto a 2-dimensional plane. The scatterplots are arranged in a matrix, which is displayed in a new image window. To account for the spatial position corresponding to the datapoints, a spatial-domain map is created that is displayed in a second image window.
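As a rough illustration of the pairwise projection, the following Python sketch counts, for every pair of images, how many datapoints fall on each (x, y) grey-value position. The function names, the choice of which image goes on which axis, and the restriction to unordered pairs are assumptions for this sketch and may differ from the layout ScatterJn actually uses.

```python
from itertools import combinations


def scatterplot_counts(image_x, image_y, levels=256):
    """2D histogram: counts[y][x] is the number of pixels with grey values (x, y)."""
    counts = [[0] * levels for _ in range(levels)]
    for row_x, row_y in zip(image_x, image_y):
        for gx, gy in zip(row_x, row_y):
            counts[gy][gx] += 1
    return counts


def scatterplot_matrix(images):
    """One 2D histogram per unordered pair of input images, keyed by image indices."""
    return {
        (i, j): scatterplot_counts(images[i], images[j])
        for i, j in combinations(range(len(images)), 2)
    }


# Toy example with three 2 x 2 images (made-up values).
a = [[0, 1], [1, 1]]
b = [[2, 3], [3, 3]]
c = [[0, 0], [0, 0]]
matrix = scatterplot_matrix([a, b, c])
```

For n input images this yields n(n-1)/2 pairwise scatterplots; every scatterplot contains all datapoints, so the counts in each one sum to the number of pixels.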

User-defined display settings
As grey values in 8-bit images are integers within the [0, 255] interval, a large number of datapoints will typically have the same coordinates in the scatterplot and thus be plotted at the same position in the coordinate system. Therefore, the density of datapoints in the scatterplots is visualized using an adaptable colour scale. For a more detailed description see [1]. It is possible for the user to adjust display parameters such as the colour scale used for the scatterplots and labels in the scatterplot matrix image. A factor can be entered for histogram binning, which leads to smaller scatterplots.
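One plausible reading of histogram binning is sketched below in Python: neighbouring grey values are merged by integer division, so that a factor of 2 turns a 256 × 256 scatterplot into a 128 × 128 one. This is an assumption about how binning could work, not a description of the plugin's code; `binned_scatterplot` and its parameters are hypothetical names.

```python
def binned_scatterplot(image_x, image_y, factor=2, levels=256):
    """2D histogram in which each bin spans `factor` adjacent grey values per axis."""
    size = -(-levels // factor)  # ceiling division: number of bins per axis
    counts = [[0] * size for _ in range(size)]
    for row_x, row_y in zip(image_x, image_y):
        for gx, gy in zip(row_x, row_y):
            counts[gy // factor][gx // factor] += 1
    return counts


# Toy example: grey values 10 and 11 share a bin when factor is 2.
img_x = [[10, 11], [200, 201]]
img_y = [[50, 51], [50, 51]]
binned = binned_scatterplot(img_x, img_y, factor=2)
```

The count stored in each bin is the datapoint density that a colour scale would then visualize.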

Interactive classification functions
The tool offers functions for binary and gradual classification based on the position of datapoints in the scatterplot matrix and/or in the spatial-domain map.
For binary classification, the user can select areas in any of the scatterplots using ImageJ's ROI tools. The selected datapoints are then highlighted in all scatterplots and in the spatial-domain map. It is possible to simultaneously select up to one ROI per scatterplot and/or one ROI in the spatial-domain map. The selection in the spatial-domain map can also be defined using a mask generated from any greyscale image of the same region and dimensions as the input images.
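The idea of tracing a scatterplot selection back to the spatial domain can be sketched as follows. The Python example assumes a single rectangular ROI in one scatterplot; ScatterJn accepts arbitrary ImageJ ROIs, and how several simultaneous ROIs are combined is not modelled here. The function name and the grey values are hypothetical.

```python
def classify_by_roi(image_x, image_y, x_range, y_range):
    """Return a boolean mask marking pixels whose datapoint lies inside the ROI.

    x_range and y_range are inclusive (low, high) grey-value bounds of a
    rectangular selection in the scatterplot of image_x versus image_y.
    """
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    return [
        [x_lo <= gx <= x_hi and y_lo <= gy <= y_hi for gx, gy in zip(row_x, row_y)]
        for row_x, row_y in zip(image_x, image_y)
    ]


# Toy example: select datapoints that are high in both maps.
s_map = [[200, 10], [190, 20]]
ca_map = [[210, 5], [180, 100]]
mask = classify_by_roi(s_map, ca_map, x_range=(150, 255), y_range=(150, 255))
```

The resulting mask has the same dimensions as the input images, so the selected pixels can be highlighted directly in the spatial-domain map.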
As a method of gradual classification visualizing possible linear trends, datapoints can be classified according to their angular distance from the x-axis in one of the scatterplots. The angular distance values are displayed using a colour scale.
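A minimal Python sketch of such an angular classification is given below. It assumes that the angular distance is the angle between a datapoint's position vector and the x-axis, measured in degrees, and it leaves datapoints at the origin unclassified; both choices, as well as the function name, are assumptions for illustration. The mapping of angles to colours is omitted.

```python
import math


def angular_distance_map(image_x, image_y):
    """Angle (degrees) between each datapoint's position vector and the x-axis.

    Pixels whose datapoint lies at the origin have no defined angle and are
    returned as None.
    """
    return [
        [
            None if gx == 0 and gy == 0 else math.degrees(math.atan2(gy, gx))
            for gx, gy in zip(row_x, row_y)
        ]
        for row_x, row_y in zip(image_x, image_y)
    ]


# Toy example: datapoints on the x-axis, on the y-axis, on the diagonal, and at the origin.
x_img = [[100, 0], [100, 0]]
y_img = [[0, 100], [100, 0]]
angles = angular_distance_map(x_img, y_img)
```

Because grey values are non-negative, all angles lie between 0 and 90 degrees, and datapoints with the same ratio of the two grey values receive the same angle.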

Example Dataset
Instead of merely presenting instructions for the usage of the plugin, here we demonstrate its basic functions using an example dataset and illustrate the results. An extensive manual, however, is included in the plugin itself and can be accessed via the "Info" button.

Example dataset: acquisition and pre-processing
The example dataset is shown in Fig. 1. It consists of a set of elemental maps obtained by SEM-EDX mapping of a soil sample. The sample was washed in deionized water, dried in air, spread on an adhesive carbon pad attached to an aluminium sample holder stub, and coated with 8 nm of carbon using a BAL-TEC SCD 005 sputter coater equipped with a BAL-TEC CEA 035 (BAL-TEC, Balzers, Liechtenstein). Maps of 512 × 448 pixels were acquired using a LEO 1450 VP instrument (now Zeiss, Oberkochen, Germany) equipped with an Oxford Inca X-sight EDS 7353 detector (Oxford Instruments, Abingdon, UK) at 15 kV acceleration voltage using a pixel size of ~450 nm by summing up 150 frames of the same sample area. Maps were processed using ImageJ. To reduce the effect of statistical variations of the discrete X-ray counts presented in the images, the images were converted into float type and a Gaussian-blur filter with a sigma value of 2 pixels was applied. The histograms were stretched individually to [0, 255] and the images were converted back into 8-bit greyscale type. The resulting maps (as shown in the figure) of course no longer represent exact matrices of measured values, but probability distributions of the respective elements. This procedure is sometimes helpful when dealing with EDX maps [1,17] or any other, similar image data. No alignment of the maps was necessary as all elements were measured simultaneously. The maps represent the distributions of Si, Al, Ca, Fe, and S.
Figure 1: Elemental maps of a soil sample used as demonstration dataset. Images were processed as described in the text. Scale bar 50 µm.
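The histogram stretch and the conversion back to 8-bit can be sketched in Python as a simple linear rescaling. This is only an approximation of the step described above: the actual processing was done in ImageJ, whose contrast functions may behave differently in detail, and the Gaussian blur is left out. The function name and the sample values are hypothetical.

```python
def stretch_to_8bit(image):
    """Linearly rescale a float image so that its minimum maps to 0 and its maximum to 255."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast; map everything to 0.
        return [[0 for _ in row] for row in image]
    scale = 255.0 / (hi - lo)
    # Adding 0.5 before truncation rounds non-negative values to the nearest integer.
    return [[int((v - lo) * scale + 0.5) for v in row] for row in image]


# Toy example: a blurred count map with made-up float values.
blurred = [[10.0, 20.0], [30.0, 50.0]]
stretched = stretch_to_8bit(blurred)
```

Because each map is stretched individually, grey values of different elements are no longer on a common scale after this step.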

Scatterplot matrix
The input images representing elemental maps were opened in ImageJ. They were then loaded in ScatterJn using histogram binning by a factor of 2 in order to produce a scatterplot matrix image of manageable size for display. Figure 2 shows a screenshot of an arrangement of image windows during usage of ScatterJn. By default, the spatial-domain map shows the first of all selected elemental or chemical species maps and in this case corresponds to the Si map. Several relationships of elemental concentrations (without any link to the spatial domain) can be directly observed in the scatterplot matrix itself. For example, the Al/Si scatterplot shows two distinct trends, which means that, within the sample, phases exist that differ with respect to their distinct Al/Si ratios. A similar observation can be made in the S/Fe scatterplot.
Three basic classification functions are demonstrated in Figure 3.

Manual binary classification in the scatterplot matrix
In the scatterplot of S and Ca, three clearly separate trends are visible. One of those trends describes a proportional ratio of S/Ca, which may, for example, correspond to a CaSO4 phase. The datapoints that belong to this trend were selected manually in the S/Ca scatterplot by drawing a ROI around the corresponding cluster. The selected datapoints are highlighted in red in the scatterplot matrix as well as in the spatial-domain map (Fig. 3A). In the spatial-domain map, this reveals two individual particles, which means that only these two particles show the elemental composition that corresponds to the selected cluster, i.e. the assumed CaSO4 phase. In the scatterplot matrix, it also becomes obvious that none of the other elements (Si, Al, Fe) are present in relevant quantities in this phase.

Binary classification in the spatial-domain map
One particle was selected in the spatial-domain map by manually drawing a ROI around its borders. Again, the selected datapoints are highlighted in red both in the scatterplots and in the spatial-domain map (Fig. 3B). The highlighted datapoints in the scatterplot matrix describe the elemental composition of the selected particle, which contains Si and Al in proportional amounts, but none of the other elements in significant quantities. Thus, the particle could be composed e.g. of an aluminosilicate mineral devoid of Fe, Ca, and S.

Gradual classification
In the Ca/Si scatterplot, no clear trends or clusters can be distinguished. Instead, the datapoints appear in a fan-shaped point cloud. Using the "Angular distance map" function, the datapoints were classified according to the angle of their position vector with the x-axis in the Ca/Si scatterplot. The angle is represented on a continuous colour scale. This function yields more abstract, but at the same time more diversified, information than binary classification. In the spatial-domain map, particles represented in a certain colour have the Ca/Si ratio that is displayed in the same colour in the scatterplot. In scatterplots other than the Ca/Si one, only pixels onto which exactly one datapoint is projected can be unambiguously assigned a colour; other pixels are depicted in the greyscale colour scale defined for the scatterplots.

Quality control
Testing was done on Windows 7, Mac OS X (10.9.5), and Ubuntu 14.04.2.

Operating system
ScatterJn was written as a plugin for ImageJ; we expect it to run on any operating system that will run a compatible version of ImageJ or Fiji.

Programming language
Java 1.5.

Dependencies
The plugin is expected to run with any reasonably up-to-date version of ImageJ or Fiji. The lowest version that was used for testing was ImageJ 1.47q.

Reuse potential
With ScatterJn, we complement existing evaluation methods by providing a versatile tool for data exploration. Its intended use is for exploratory analysis of the previously described types of analytical microscopy datasets. This includes a variety of analytical methods such as SEM-EDX, TEM-EDX, STXM, CLSM, nanoSIMS, and other spatially resolved analytical data that are, at least to some extent, quantitative. Such approaches are currently used in many different fields of research. Beyond this, the plugin can potentially be of use in any situation that requires exploration of quantitative image data.