Multivariate statistical analysis of non-mass-selected ToF-SIMS data
Related papers
Applied Surface Science, 2004
Time-of-flight secondary ion mass spectrometry (TOF-SIMS), by its parallel nature, generates complex and very large datasets quickly and easily. An example of such a large dataset is a spectral image, in which a complete spectrum is collected for each pixel. Unfortunately, the large size of the data matrix involved makes it difficult to extract the chemical information from the data using traditional techniques. Because time constraints prevent an analysis of every peak, prior knowledge is used to select the most probable and significant peaks for evaluation. However, this approach may lead to a misinterpretation of the system under analysis. Ideally, the complete spectral image would be used to provide a comprehensive, unbiased materials characterization based on full spectral signatures.
An Introduction to Cluster Secondary Ion Mass Spectrometry (Cluster SIMS)
Principles and Applications, 2013
Cluster secondary ion mass spectrometry (SIMS) has had a significant impact on the mass spectrometry and surface analysis communities over the past two decades, with its newfound ability to characterize surface and in-depth compositions of molecular species with minimal damage, excellent spatial (100 nm or less) and depth (5 nm) resolutions, and increased sensitivities for bioimaging applications. With the continual development of new cluster ion beam technologies, we are breaking down barriers once thought to be unbreakable, and entering into new fields once labeled as out of reach. Instrument designs are now advancing to account for these new applications, allowing for further improvements in molecular sensitivities, selectivities, and even high throughput analysis. Although we are ...
* Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.
† Commercial equipment and materials are identified in order to adequately specify certain procedures. In no case does such identification imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.
‡ This document was prepared as an account of work sponsored by an agency of the US Government.
The resolution of mass spectrometers is often insufficient to conclusively identify all peaks that may be present in recorded spectra. Here, we present new methods to extract consistent molecular and bulk level chemical information by constrained fitting of series of complex organic mass spectra with multiple overlapping peaks. Possible individual peaks in a group of overlapping peaks are identified both by defining a chemical space and by free peak fitting. If all possible formulas from the chemical space were simply used to fit each peak, the result would not be well constrained. The free peak fitting algorithm provides information about likely peak locations. A new algorithm then reconciles the results of both methods and produces a final peak list for use in subsequent fitting, while using all available experimental constraints. Comparison to ultra-high resolution data suggests that the real peak density is substantially higher than can be resolved with the instrument resolution. Bulk chemical properties such as carbon number (n_C) and carbon oxidation state (OS_C) can be calculated from the fit results. For mixtures of compounds dominated by C, H, O and N, bulk properties can be reliably extracted, even though some formula assignments may remain uncertain. This ability to retrieve correct bulk parameters even if not all assigned formulas are correct originates from the relationship between mass defects of individual peaks and the chemical parameters under our CHON composition assumptions. Retrieving consistent bulk parameters across series of many mass spectra is essential for extracting time trends, e.g. for field measurements taking place over several weeks. We illustrate the fitting method using a sample data set from a chemical ionization mass spectrometer with a resolution of approximately 4000 (M/dM), operated using acetate reagent ions.
Spectral simulation experiments validate the analysis method by showing good agreement of intensities for many specific ions, as well as for bulk chemical parameters. An alternative method to directly extract bulk chemical information from the raw spectra without the need of any peak assignment or peak fitting is also introduced, which shows good agreement with the peak fitting results. The latter method can be applied very rapidly without the need for complex analysis procedures, e.g. as a quick online diagnostic during data acquisition.
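As a rough illustration of how bulk parameters such as carbon number and carbon oxidation state could be computed from a fitted peak list, the sketch below uses the commonly cited approximation OS_C ≈ 2·O/C − H/C for compounds containing only C, H and O. It is not the fitting method of the abstract above: the peak list, the function names and the decision to ignore nitrogen are all assumptions made for this example.

```python
def carbon_oxidation_state(n_c, n_h, n_o):
    """Approximate OS_C = 2*O/C - H/C for one CHO formula (nitrogen is ignored here)."""
    if n_c <= 0:
        raise ValueError("formula must contain at least one carbon atom")
    return (2.0 * n_o - n_h) / n_c


def bulk_properties(peaks):
    """Return (intensity-weighted mean carbon number, bulk OS_C).

    `peaks` is a hypothetical list of (intensity, n_c, n_h, n_o) tuples,
    one per assigned formula. The bulk OS_C is computed from the
    intensity-weighted atom totals, which is an assumption of this sketch.
    """
    total_intensity = sum(i for i, _, _, _ in peaks)
    carbon = sum(i * c for i, c, _, _ in peaks)
    hydrogen = sum(i * h for i, _, h, _ in peaks)
    oxygen = sum(i * o for i, _, _, o in peaks)
    mean_n_c = carbon / total_intensity
    bulk_os_c = (2.0 * oxygen - hydrogen) / carbon
    return mean_n_c, bulk_os_c


# Two made-up peaks of equal intensity: C2H4O2 and C2H2O4.
example_peaks = [(1.0, 2, 4, 2), (1.0, 2, 2, 4)]
print(bulk_properties(example_peaks))  # (2.0, 1.5)
```

Because the bulk value depends only on the summed C, H and O counts, swapping one assigned formula for another with similar elemental ratios changes it only slightly, which is in the spirit of the abstract's remark that bulk parameters can be retrieved even when some assignments are uncertain.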
Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry
Bioinformatics, 2007
Motivation: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in data. Results: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from a limited span of recordable intensity values provided by the instrument. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result; for our data, the pooled height of peaks which were frequently truncated increased by up to 30%.
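To make the pooling idea concrete, here is a minimal sketch of the simplest heteroscedastic case: each single-shot intensity is modelled as a common mean plus Gaussian noise with a known, shot-specific variance. Under that assumption the maximum-likelihood estimate of the mean is the inverse-variance weighted average, with variance equal to the reciprocal of the summed weights. The function name, the array layout and the assumption of known variances are illustrative only; the abstract does not specify the model in this detail, and the handling of truncated values via EM is not shown.

```python
import numpy as np


def pool_spectra(shots, variances):
    """Inverse-variance weighted pooling of single-shot spectra.

    shots:     array of shape (n_shots, n_channels)
    variances: noise variances, same shape as `shots` or broadcastable to it
               (assumed known in this sketch)
    Returns the pooled spectrum and the variance of each pooled value.
    """
    shots = np.asarray(shots, dtype=float)
    weights = np.broadcast_to(1.0 / np.asarray(variances, dtype=float), shots.shape)
    weight_sum = weights.sum(axis=0)
    pooled = (weights * shots).sum(axis=0) / weight_sum
    pooled_variance = 1.0 / weight_sum
    return pooled, pooled_variance


# Two made-up shots over two channels; the second shot is three times noisier.
pooled, pooled_variance = pool_spectra([[1.0, 2.0], [3.0, 6.0]],
                                       [[1.0, 1.0], [3.0, 3.0]])
print(pooled, pooled_variance)
```

With equal variances this reduces to the plain average, which is one way to read the abstract's statement that currently used methods can be interpreted as a maximum-likelihood estimator.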
Journal of Chromatography A, 2015
Various algorithms have been developed to improve the quantity and quality of information that can be extracted from complex datasets obtained using hyphenated mass spectrometric techniques. While different approaches are possible, the key step often consists of arranging the data into a large series of profiles known as extracted ion profiles. Those profiles, similar to mono-dimensional separation profiles, are then processed to detect potential chromatographic peaks. This allows a large number of peaks that are characteristic of the separated compounds to be extracted from the dataset. However, with mass spectrometry (MS) detection, the response is usually a complex signal whose pattern depends on the analyte, the MS instrument and the ionization method. When converted to ionic profiles, a single separated analyte will have multiple images at different m/z ranges. In this manuscript we present a hierarchical agglomerative clustering algorithm to group profiles with very ...
Analytica Chimica Acta, 2011
Random projection (RP) is a simple and fast linear method for dimensionality reduction of high-dimensional multivariate data that is independent of the data. The method is briefly described and a new memory-saving algorithm is presented for the generation of random projection vectors. Application of RP to data from scanning experiments with a time-of-flight secondary ion mass spectrometer (TOF-SIMS) showed that data reduced by RP have a satisfying discriminant property for separating target material and minerals without using any knowledge about the composition of the sample. A selection method, based on low-dimensional RP data, is described and successfully tested for automatic recognition of characteristic, diverse locations of a sample surface. RP is demonstrated as an unbiased, powerful method, especially for large data sets, severe hardware restrictions (such as in space experiments) or the need for fast data evaluation of hyperspectral data.
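The following is a minimal sketch of a generic Gaussian random projection, included only to show what "independent of the data" means in practice: the projection vectors are drawn from a seeded random generator and never look at the spectra. Generating one vector at a time avoids storing the full projection matrix, but this is a simplification chosen for the example and should not be read as the memory-saving algorithm of the abstract, whose details are not given here. The function name and parameters are assumptions.

```python
import numpy as np


def random_projection(data, n_components, seed=0):
    """Project rows of `data` (n_samples, n_features) onto random directions.

    Each projection vector is drawn from a standard normal distribution and
    discarded after use, so the full (n_features, n_components) matrix is
    never held in memory. Scaling by 1/sqrt(n_components) keeps Euclidean
    distances approximately unchanged on average.
    """
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape
    rng = np.random.default_rng(seed)
    projected = np.empty((n_samples, n_components))
    for j in range(n_components):
        vector = rng.standard_normal(n_features)  # one random direction
        projected[:, j] = data @ vector
    return projected / np.sqrt(n_components)


# Made-up data standing in for spectra: 5 samples with 2000 channels each.
spectra = np.random.default_rng(1).random((5, 2000))
reduced = random_projection(spectra, n_components=400)
print(reduced.shape)  # (5, 400)
```

Because the same seed reproduces the same projection vectors, spectra acquired at different times can be projected into the same low-dimensional space without storing the vectors themselves.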
Cluster Secondary Ion Mass Spectrometry
Surface Analysis and Techniques in Biology, 2014
In principle, secondary ion mass spectrometry (SIMS) molecule-specific imaging has vast implications in biological research where submicrometer spatial resolution, uppermost surface layer sensitivity, and chemically unmodified sample preparation are essential. Yet SIMS imaging using atomic projectiles has been rather ineffective when applied to biological materials. The common pitfalls experienced during these analyses include low secondary ion yields, extensive fragmentation, restricted mass ranges, and the accumulation of significant physical and chemical damage after sample erosion beyond 1 % of the surface molecules. Collectively, these limitations considerably reduce the amount of material available for detection and result in inadequate sensitivity for most applications. In response, polyatomic (cluster) ions have been introduced as an alternate imaging projectile. Cluster ion bombardment has been observed to enhance secondary ion yields, extend the spectral mass range, and decrease the incidence of physical and chemical damage during sample erosion. The projectiles are expected to considerably increase the number of molecules available for analysis and to significantly improve the overall sensitivity. Hence, the objectives of this chapter are to describe the unique physical basis for the improvements observed during polyatomic bombardment and to identify the emerging biological applications made practical by the introduction of cluster projectiles to SIMS.
Iolite: Freeware for the visualisation and processing of mass spectrometric data
Journal of Analytical Atomic Spectrometry, 2011
Iolite is a non-commercial software package developed to aid in the processing of inorganic mass spectrometric data, with a strong emphasis on visualisation versus time of acquisition. The goal of the software is to provide a powerful framework for data processing and interpretation, while giving users the ability to implement their own data reduction protocols. It is intended to be highly interactive, providing the user with a complete overview of the data at all stages of processing, and allowing the freedom to change parameters and reprocess data at any point. The program presents a variety of windows for the selection and viewing of data versus time, as well as features for the generation of X-Y plots, summary reports and export of data. In addition, it is capable of generating X-Y images from laser ablation rasters, and combining information from up to four separate elemental concentrations (intensities of red, green and blue, and the z-axis) in a false-colour three-dimensional image. By virtue of its underlying computing environment, Igor Pro, Iolite is capable of processing very large datasets (i.e., millions of timeslices) rapidly, and is thus ideal for the interrogation of multi-hour sessions of laser ablation data that cannot be easily manipulated in conventional spreadsheet applications, for example. It is also well suited to multi-day sessions of solution-mode inductively-coupled plasma mass spectrometer (ICPMS) or thermal ionisation mass spectrometer (TIMS) data. A strong emphasis is placed on the interpolation of parameters that vary with time by a variety of user-selectable methods, including smoothed cubic splines. Data are processed on a timeslice-by-timeslice basis, allowing outlier rejection and calculation of statistics to be employed directly on calculated results.
This approach can reduce the risk of processing biases associated with the manipulation of integrated datasets, while also allowing the implementation of more complex data reduction methods.