SELDI-TOF mass spectra: A view on sources of variation

Regression analysis and modelling of data acquisition for SELDI-TOF mass spectrometry

Bioinformatics, 2007

Motivation: Pre-processing of SELDI-TOF mass spectrometry data is currently performed on a largely ad hoc basis. This makes comparison of results from independent analyses troublesome and does not provide a framework for distinguishing different sources of variation in the data. Results: In this article, we consider the task of pooling a large number of single-shot spectra, a task commonly performed automatically by the instrument software. By viewing the underlying statistical problem as one of heteroscedastic linear regression, we provide a framework for introducing robust methods and for dealing with missing data resulting from the limited span of intensity values the instrument can record. Our framework provides an interpretation of currently used methods as a maximum-likelihood estimator and allows theoretical derivation of its variance. We observe that this variance depends crucially on the total number of ionic species, which can vary considerably between different pooled spectra. This variation in variance can potentially invalidate the results from naive methods of discrimination/classification, and we outline appropriate data transformations. Introducing methods from robust statistics did not improve the standard errors of the pooled samples. Imputing missing values using the EM algorithm, however, had a notable effect on the result: for our data, the pooled height of peaks that were frequently truncated increased by up to 30%.
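The abstract does not give the model equations, so the following is only a simplified one-dimensional illustration of why EM imputation raises the height of truncated peaks. It assumes (our assumption, not the authors' heteroscedastic regression model) that the intensities recorded at one m/z position across shots are normally distributed and that the instrument clips every value at or above a saturation level `cap`. The function name `em_censored_normal` is hypothetical.

```python
import math
import random


def _phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def em_censored_normal(values, cap, n_iter=500, tol=1e-9):
    """EM estimate of (mean, sd) for a normal sample right-censored at `cap`.

    Values >= cap are treated as truncated: the true intensity is only known
    to be at least `cap`, so the E-step replaces its first and second moments
    by their conditional expectations given X >= cap.
    """
    n = len(values)
    observed = [v for v in values if v < cap]
    n_cens = n - len(observed)
    # Naive (downward-biased) starting values computed from the capped data.
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    if n_cens == 0:
        return mu, math.sqrt(var)
    sum_obs = sum(observed)
    sum_obs_sq = sum(v * v for v in observed)
    for _ in range(n_iter):
        sd = math.sqrt(var)
        alpha = (cap - mu) / sd
        lam = _phi(alpha) / (1.0 - _Phi(alpha))  # inverse Mills ratio
        # E-step: E[X | X >= cap] and E[X^2 | X >= cap] under N(mu, var).
        e_x = mu + sd * lam
        e_x2 = var * (1.0 + alpha * lam - lam ** 2) + e_x ** 2
        # M-step: mean and variance of the completed data.
        new_mu = (sum_obs + n_cens * e_x) / n
        new_var = (sum_obs_sq + n_cens * e_x2) / n - new_mu ** 2
        done = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if done:
            break
    return mu, math.sqrt(var)


if __name__ == "__main__":
    random.seed(0)
    truth = [random.gauss(100.0, 20.0) for _ in range(5000)]
    capped = [min(x, 110.0) for x in truth]  # simulated detector saturation
    print("naive mean:", sum(capped) / len(capped))
    print("EM estimate (mean, sd):", em_censored_normal(capped, cap=110.0))
```

On simulated data like this, the naive mean of the clipped values sits well below the true mean, while the EM estimate recovers it; this mirrors, in a much simpler setting, the upward correction reported for frequently truncated peaks.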

Annotated regions of significance of SELDI-TOF-MS spectra for detecting protein biomarkers

PROTEOMICS, 2006

Peak detection is a key step in the analysis of SELDI-TOF-MS spectra, but the current default method has low specificity and poor peak annotation. To improve data quality, scientists still have to validate the identified peaks visually, a tedious and time-consuming process, especially for large data sets. Hence, there is a genuine need for methods that minimize manual validation. We have previously reported a multi-spectral signal detection method, called RS for 'region of significance', with improved specificity. Here we extend it to include a peak quantification algorithm based on annotated regions of significance (ARS). For each spectral region flagged as significant by RS, we first identify a dominant spectrum for determining the number of peaks and the m/z region of these peaks. From each m/z region of peaks, a peak template is extracted from all spectra via principal component analysis. Finally, using the template, we estimate the amplitude and location of the peak in each spectrum by least squares and refine the amplitude estimate via a mixture model. We have evaluated the ARS algorithm on patient samples from a clinical study. Comparison with the standard method shows that ARS (i) inherits the superior specificity of RS and (ii) gives more accurate peak annotations than the standard method. In conclusion, we find that ARS alleviates the main problems in the preprocessing of SELDI-TOF spectra. The R package ProSpect that implements ARS is freely available for academic use at http://www.meb.ki.se/~yudpaw.
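The template and amplitude steps can be sketched as follows. This is not the ProSpect implementation: it assumes the peak windows are already aligned into a (spectra x m/z points) matrix, uses the leading right singular vector of the uncentred matrix as the "principal component" template (whether ARS centres the data is not stated here), and omits location estimation and the mixture-model refinement. Function names are hypothetical.

```python
import numpy as np


def peak_template(window: np.ndarray) -> np.ndarray:
    """Return a unit-norm peak shape from a (spectra x m/z points) window.

    Uses the leading right singular vector of the uncentred matrix, i.e. the
    first principal direction without mean-centring (an assumption here).
    """
    _, _, vt = np.linalg.svd(window, full_matrices=False)
    template = vt[0]
    # Singular vectors are defined only up to sign; make the peak point upwards.
    if template.sum() < 0:
        template = -template
    return template


def ls_amplitudes(window: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Least-squares amplitude a_i minimising ||y_i - a_i * t||^2 per spectrum."""
    return window @ template / (template @ template)


if __name__ == "__main__":
    mz = np.linspace(-3.0, 3.0, 61)
    shape = np.exp(-0.5 * mz ** 2)  # hypothetical Gaussian peak shape
    window = np.outer(np.array([1.0, 2.0, 3.0, 4.0]), shape)
    t = peak_template(window)
    est = ls_amplitudes(window, t)
    print("relative amplitudes:", est / est[0])
```

Because the template is shared across spectra, each spectrum contributes a single scalar amplitude, which is what makes the estimate robust to noise in individual spectra.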

How to increase the credibility and ease the investigation of biosignatures in TOF-SIMS mass spectra?

TOF-SIMS mass spectra are labour-intensive to interpret because of the very large number of peaks in the mass spectrum. In previous work, we concluded that it is possible to interpret TOF-SIMS mass spectra in detail; however, interpretation requires powerful proprietary software to handle the information (Chatzitheodoridis et al., 2005) and a perfect mass calibration (Antonopoulou-Athera et al., 2011). Only then can fully automated interpretation be performed, which also yields detailed chemical patterns that can be useful for biosignature interpretation. Currently, biosignature interpretation relies on a small number of characteristic peaks that are compared with pure, isolated biochemical phases, often assisted by statistical techniques or by labelling specific molecules. Biochemical phases that have been investigated in the literature include proteins and their amino acids (Quong et al., 2005), extracellular polymeric substances (de Brouwer et al., 2006), lipid...
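Mass calibration in a time-of-flight instrument is usually built on the idealised relation t = t0 + k·√(m/z): flight time grows with the square root of the mass-to-charge ratio. The sketch below fits that two-parameter relation to reference peaks by ordinary least squares and inverts it. It is an assumption-laden illustration, not the calibration procedure of the cited work; real instruments may need additional correction terms, and the reference values in the usage example are made up.

```python
import math


def fit_tof_calibration(ref_times, ref_mz):
    """Least-squares fit of t = t0 + k * sqrt(m/z) to reference peaks."""
    xs = [math.sqrt(m) for m in ref_mz]
    n = len(xs)
    x_bar = sum(xs) / n
    t_bar = sum(ref_times) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxt = sum((x - x_bar) * (t - t_bar) for x, t in zip(xs, ref_times))
    k = sxt / sxx
    t0 = t_bar - k * x_bar
    return t0, k


def time_to_mz(t, t0, k):
    """Invert the calibration: m/z = ((t - t0) / k) ** 2."""
    return ((t - t0) / k) ** 2


if __name__ == "__main__":
    ref_mz = [12.0, 27.0, 56.0, 104.0]  # made-up reference peaks
    ref_t = [0.4 + 2.5 * math.sqrt(m) for m in ref_mz]
    t0, k = fit_tof_calibration(ref_t, ref_mz)
    print("m/z of a peak at t = 30:", time_to_mz(30.0, t0, k))
```

Because m/z depends on the square of (t - t0)/k, small errors in either parameter grow with mass, which is one reason detailed peak assignment demands a very accurate calibration.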

Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of-flight (TOF) mass spectrometry (MS)

BMC Bioinformatics, 2005

Background: Proteomic profiling of complex biological mixtures by the ProteinChip technology of surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS) is one of the most promising approaches in toxicological, biological, and clinical research. The reliable identification of protein expression patterns and associated protein biomarkers that differentiate disease from health, or that distinguish different stages of a disease, depends on developing methods for assessing the quality of SELDI-TOF mass spectra. The use of SELDI data for biomarker identification requires rigorous procedures to detect and discard low-quality spectra prior to data analysis.
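The abstract does not list the quality criteria, so the following only illustrates the general idea of screening spectra before analysis. It assumes each spectrum has already been summarised by one scalar quality metric (for example total ion current or number of detected peaks, both hypothetical choices) and flags spectra whose robust z-score, based on the median and the median absolute deviation, exceeds a hypothetical threshold of 3.5.

```python
import statistics


def robust_z(values):
    """Robust z-scores using the median and the scaled median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD consistent with the SD for normal data
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]


def flag_low_quality(spectra_metrics, threshold=3.5):
    """Return the IDs of spectra whose quality metric is an outlier.

    spectra_metrics maps a spectrum ID to a scalar quality metric (hypothetical).
    """
    ids = list(spectra_metrics)
    z = robust_z([spectra_metrics[i] for i in ids])
    return [i for i, score in zip(ids, z) if abs(score) > threshold]


if __name__ == "__main__":
    tic = {"s1": 100, "s2": 102, "s3": 98, "s4": 101, "s5": 99, "s6": 20}
    print("discard:", flag_low_quality(tic))
```

The median and MAD are used instead of the mean and SD so that the failed spectra being screened for do not inflate the spread and mask themselves.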

Advances in pre-processing and model generation for mass spectrometric data analysis

2007

The analysis of complex signals such as those obtained from mass spectrometric measurements is complicated and requires an appropriate representation of the data. In this context, the type of preprocessing, the feature extraction, and the similarity measure used are of particular importance. Focusing on biomarker analysis and taking the functional nature of the data into account makes this task even more complicated. A new preprocessing approach tailored to mass spectrometry is presented, discussed, and analyzed in a clinical proteomics study and compared with a standard setting.
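For orientation, a common "standard" preprocessing chain is baseline subtraction followed by total-ion-current normalisation. The sketch below is a generic, crude version of that chain (a sliding-window minimum as the baseline, with a hypothetical half-width of 25 points); it is not the tailored preprocessing the paper proposes.

```python
import numpy as np


def rolling_min_baseline(intensity: np.ndarray, half_width: int) -> np.ndarray:
    """Crude baseline estimate: minimum intensity in a sliding window."""
    n = intensity.size
    baseline = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        baseline[i] = intensity[lo:hi].min()
    return baseline


def preprocess(intensity: np.ndarray, half_width: int = 25) -> np.ndarray:
    """Subtract the baseline, clip negatives, then normalise to unit total ion current."""
    corrected = np.clip(intensity - rolling_min_baseline(intensity, half_width), 0.0, None)
    total = corrected.sum()
    return corrected / total if total > 0 else corrected


if __name__ == "__main__":
    x = np.arange(200)
    spectrum = 5.0 + 0.01 * x + 50.0 * np.exp(-0.5 * ((x - 100) / 3.0) ** 2)
    out = preprocess(spectrum)
    print("peak index:", int(np.argmax(out)), "sum:", out.sum())
```

Choices such as the window width and the normalisation constant change the resulting feature values, which is why the abstract stresses that the kind of preprocessing matters for downstream biomarker analysis.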

A method for assessing and maintaining the reproducibility of mass spectrometric analyses of complex samples

Rapid Communications in Mass Spectrometry, 2009

Direct injection mass spectrometric analysis of biological samples is potentially an attractive approach to the discovery of diagnostic patterns for specific pathophysiological conditions because of its speed and simplicity. Despite the possible benefits offered by such a method, its extensive application has been limited so far by several factors, including the inadequate reproducibility of the analytical results. We describe a method for monitoring and optimizing the performance of mass spectrometers used for biomarker discovery studies, based on the analysis of patterns of standardized spectral features. The method was successfully applied to maintaining spectral reproducibility during a multi-day analysis of hundreds of serum samples despite an ion source failure, which necessitated minor maintenance. The monitoring method allowed the early detection of that failure and the restoration of the spectral profiles after the system was restarted.
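The monitoring idea can be illustrated with a simple drift check. In this sketch, which is our assumption and not the published method, each run is summarised by a vector of standardized spectral feature intensities, compared with a reference pattern by Pearson correlation, and flagged when the correlation falls below a hypothetical threshold of 0.9.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def monitor_runs(reference, runs, threshold=0.9):
    """Return (run index, correlation) for runs that drift from the reference pattern."""
    alerts = []
    for idx, features in enumerate(runs):
        r = pearson(reference, features)
        if r < threshold:
            alerts.append((idx, r))
    return alerts


if __name__ == "__main__":
    reference = [10, 50, 30, 80, 20]
    runs = [[11, 49, 31, 79, 21], [12, 52, 29, 82, 19], [40, 10, 60, 20, 70]]
    print(monitor_runs(reference, runs))
```

Tracking a pattern of features rather than a single signal means that a source failure that distorts relative intensities is caught even when the overall signal level looks normal.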

Identifying technical aliases in SELDI mass spectra of complex mixtures of proteins

BMC Research Notes, 2013

Background: Biomarker discovery datasets created using mass spectrum protein profiling of complex mixtures of proteins contain many peaks that represent the same protein with different charge states. Correlated variables such as these can confound the statistical analyses of proteomic data. Previously we developed an algorithm that clustered mass spectrum peaks that were biologically or technically correlated. Here we demonstrate an algorithm that clusters correlated technical aliases only.
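The abstract does not describe the clustering algorithm itself, so the sketch below covers only the mass arithmetic behind charge-state aliases. It assumes positive-mode, protonated ions [M + zH]^z+, so a peak at m/z implies a neutral mass M = z·(m/z - 1.007276 Da), and it pairs peaks whose implied masses agree within a hypothetical relative tolerance of 0.2%. A real method would likely also use the intensity correlation between the peaks.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons


def neutral_mass(mz, z):
    """Neutral mass M of an [M + zH]^z+ ion observed at m/z."""
    return z * (mz - PROTON_MASS)


def charge_state_aliases(peaks_mz, max_charge=3, rel_tol=0.002):
    """Find peak pairs consistent with one species at two charge states.

    Returns tuples (i, j, z_i, z_j): peak i (charge z_i) and peak j
    (charge z_j > z_i) imply the same neutral mass within rel_tol.
    Only the lowest matching charge combination is reported per pair.
    """
    aliases = []
    for i, mz_i in enumerate(peaks_mz):
        for j, mz_j in enumerate(peaks_mz):
            if mz_j >= mz_i:
                continue  # the higher charge state appears at lower m/z
            found = False
            for z_i in range(1, max_charge):
                for z_j in range(z_i + 1, max_charge + 1):
                    m_i = neutral_mass(mz_i, z_i)
                    m_j = neutral_mass(mz_j, z_j)
                    if abs(m_i - m_j) <= rel_tol * m_i:
                        aliases.append((i, j, z_i, z_j))
                        found = True
                        break
                if found:
                    break
    return aliases


if __name__ == "__main__":
    # Hypothetical protein of 11,700 Da at charges 1-3, plus an unrelated peak.
    peaks = [11701.007276, 5851.007276, 3901.007276, 7000.0]
    print(charge_state_aliases(peaks))
```

Collapsing such pairs into one variable before statistical analysis removes the artificial correlation that the abstract says can confound proteomic analyses.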

Analysis of Mass Spectrometry Data: Significance Analysis of Microarrays for Seldi-MS Data in Proteomics

International Journal for Computational Biology, 2015

Mass spectrometry (MS) has arguably become the core technology in proteomics. MALDI and SELDI-TOF techniques enable the study of biological fluids, e.g. human blood. Analysis of these samples can lead to the discovery of new biomarkers that can ease the diagnosis and prognosis of several diseases, e.g. various cancers. In this work, we focus on MS data from SELDI-TOF experiments. We begin with a preprocessing step in order to remove noise due to the data acquisition process. Then we apply differential analysis to SELDI-MS data, using the Significance Analysis of Microarrays (SAM) method implemented in Matlab. Results from the SAM method are compared with those obtained by the conventional t-test and analysis of variance (ANOVA) in order to evaluate its efficacy and performance. We demonstrate that the SAM method can be adapted for effective significance analysis of SELDI-MS data. It is deemed powerful and provides better results than these tests. An easy-to-use Matlab application was developed for mass spectrometry data analysis, from raw spectra to differential analysis, including the SAM method.
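The core of SAM is a moderated two-sample statistic, d_i = (mean_1 - mean_2) / (s_i + s0), where s_i is the pooled-variance standard error of feature i and s0 is a small positive constant; with s0 = 0 it reduces to the ordinary two-sample t statistic. The following Python sketch is not the authors' Matlab application: it simplifies the choice of s0 to the median of the s_i (SAM itself tunes s0), and computes the expected ordered d values from label permutations, against which the observed ordered d values are compared.

```python
import numpy as np


def sam_d(x, labels, s0=None):
    """SAM relative difference d_i = (mean_1 - mean_2) / (s_i + s0) per feature.

    x has shape (features, samples); labels holds 0/1 group membership.
    If s0 is None, the median of the s_i is used (a simplification: SAM
    itself tunes s0 to minimise the coefficient of variation of d).
    """
    a, b = x[:, labels == 1], x[:, labels == 0]
    n1, n2 = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_ss = ((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) + (
        (b - b.mean(axis=1, keepdims=True)) ** 2
    ).sum(axis=1)
    s = np.sqrt((1.0 / n1 + 1.0 / n2) * pooled_ss / (n1 + n2 - 2))
    if s0 is None:
        s0 = float(np.median(s))
    return diff / (s + s0), s0


def expected_sorted_d(x, labels, s0, n_perm=200, seed=0):
    """Average of the sorted d statistics over random label permutations."""
    rng = np.random.default_rng(seed)
    total = np.zeros(x.shape[0])
    for _ in range(n_perm):
        d_perm, _ = sam_d(x, rng.permutation(labels), s0)
        total += np.sort(d_perm)
    return total / n_perm


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(50, 10))  # 50 hypothetical peaks, 10 spectra
    labels = np.array([1] * 5 + [0] * 5)
    x[0, labels == 1] += 5.0  # one truly differential peak
    d, s0 = sam_d(x, labels)
    print("most extreme feature:", int(np.argmax(np.abs(d))))
```

Adding s0 to the denominator keeps features with tiny, noisy variance estimates from producing spuriously large statistics, which is the usual motivation for preferring SAM over a plain t-test when many features are tested.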

The need for review and understanding of SELDI/MALDI mass spectroscopy data prior to analysis

Cancer Informatics, 2005

Multiple studies have reported that surface-enhanced laser desorption/ionization time-of-flight mass spectroscopy (SELDI-TOF-MS) is useful in the early detection of disease based on the analysis of bodily fluids. Any multiplex, mass-spectroscopy-based approach to detecting disease from bodily fluids must be evaluated with great care, because multiplex and mass spectroscopy methods are susceptible to biases introduced via experimental design, patient samples, and/or methodology. Specific biases include those related to experimental design, patients, samples, protein chips, the chip reader, and spectral analysis. Patient-related biases include demographics (e.g., age, race, ethnicity, sex), homeostasis (e.g., fasting, medications, stress, time of sampling), and site of analysis (hospital, clinic, other). Sample-related biases include conditions of sampling (type of sample container, time of processing, time to storage), conditions of storage (time and...