Normalization, baseline correction and alignment of high-throughput mass spectrometry data

Evaluation of statistical techniques to normalize mass spectrometry-based urinary metabolomics data

Journal of Pharmaceutical and Biomedical Analysis, 2019

Human urine recently became a popular medium for metabolomics biomarker discovery because its collection is non-invasive. Sometimes renal dilution of urine can be problematic in this type of urinary biomarker analysis. Currently, various normalization techniques such as creatinine ratio, osmolality, specific gravity, dry mass, urine volume, and area under the curve are used to account for the renal dilution. However, these normalization techniques have their own drawbacks. In this project, mass spectrometry-based urinary metabolomic data obtained from prostate cancer (n=56), bladder cancer (n=57) and control (n=69) groups were analyzed using statistical normalization techniques. The normalization techniques investigated in this study are Creatinine Ratio, Log Value, Linear Baseline, Cyclic Loess, Quantile, Probabilistic Quotient, Auto Scaling, Pareto Scaling, and Variance Stabilizing Normalization. The appropriate summary statistics for comparison of normalization techniques were created using variances, coefficients of variation, and boxplots. For each normalization technique, a principal component analysis was performed to identify clusters based on cancer type. In addition, hypothesis tests were conducted to determine if the normalized biomarkers could be used to differentiate between the cancer types. The results indicate that the determination of statistical significance can be dependent upon which normalization method is utilized. Therefore, careful consideration should go into choosing an appropriate normalization technique as no method had universally superior performance.
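
Three of the statistical normalizations named in this abstract (Probabilistic Quotient, Auto Scaling, and Pareto Scaling) can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's implementation. The function names, the samples-by-features matrix layout, the use of the feature-wise median as the reference spectrum, and the initial total-intensity step are all assumptions.

```python
import numpy as np

def pqn_normalize(X):
    """Probabilistic quotient normalization (rows = samples, columns = features).

    Assumes strictly positive intensities so that no quotient divides by zero.
    """
    X = np.asarray(X, dtype=float)
    # Step 1 (assumed, commonly used): scale each sample to unit total intensity.
    X = X / X.sum(axis=1, keepdims=True)
    # Step 2: reference spectrum, taken here as the feature-wise median.
    reference = np.median(X, axis=0)
    # Step 3: the median quotient per sample estimates its dilution factor.
    dilution = np.median(X / reference, axis=1, keepdims=True)
    return X / dilution

def auto_scale(X):
    """Mean-center each feature and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center each feature and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))
```

Pareto scaling damps high-variance features less aggressively than auto scaling, which is one reason the choice of method can change which features appear significant downstream.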

Reliable identification of prostate cancer using mass spectrometry metabolomic imaging in needle core biopsies

Laboratory Investigation, 2019

Metabolomic profiling can aid in understanding crucial biological processes in cancer development and progression and can also yield diagnostic biomarkers. Desorption electrospray ionization coupled to mass spectrometry imaging (DESI-MSI) has been proposed as a potential adjunct to diagnostic surgical pathology, particularly for prostate cancer. However, due to low resolution sampling, small numbers of mass spectra, and little validation, published studies have yet to test whether this method is sufficiently robust to merit clinical translation. We used over 900 spatially resolved DESI-MSI spectra to establish an accurate, high-resolution metabolic profile of prostate cancer. We identified 25 differentially abundant metabolites, with cancer tissue showing increased fatty acids (FAs) and phospholipids, along with utilization of the Krebs cycle, and benign tissue showing increased levels of lyso-phosphatidylethanolamine (PE). Additionally, we identified, for the first time, two lyso-PEs whose abundance decreased with cancer grade and two phosphatidylcholines (PChs) whose abundance increased with cancer grade. Importantly, we developed and internally validated a multivariate metabolomic classifier for prostate cancer using 534 spatial regions of interest (ROIs) in the training cohort and 430 ROIs in the test cohort. With excellent statistical power, the training cohort achieved a balanced accuracy of 97%, and validation on the test data set demonstrated a balanced accuracy of 85%. Given the validated accuracy of this classifier and the correlation of differentially abundant metabolites with established patterns of prostate cancer cell metabolism, we conclude that DESI-MSI is an effective tool for characterizing prostate cancer metabolism with the potential for clinical translation.

Prostate cancer biomarker discovery using high performance mass spectral serum profiling

Computer Methods and Programs in Biomedicine, 2009

Prostate-specific antigen (PSA) is the most widely used serum biomarker for early detection of prostate cancer (PCA). Nevertheless, PSA level can be falsely elevated due to prostatic enlargement, inflammation or infection, which limits the PSA test specificity. The objective of this study is to use a machine learning approach for the analysis of mass spectrometry data to discover more reliable biomarkers that distinguish PCA from benign specimens. Serum samples from 179 prostate cancer patients and 74 benign patients were analyzed. These samples were processed using ProXPRESSION™ Biomarker Enrichment Kits (PerkinElmer). Mass spectra were acquired using a prOTOF™ 2000 matrix-assisted laser desorption/ionization orthogonal time-of-flight (MALDI-O-TOF) mass spectrometer. In this study, we search for potential biomarkers using our feature selection method, the Extended Markov Blanket (EMB). From the new marker selection algorithm, a panel of 26 peaks achieved an accuracy of 80.7%, a sensitivity of 83.5%, a specificity of 74.4%, a positive predictive value (PPV) of 87.9%, and a negative predictive value (NPV) of 68.2%. On the other hand, when PSA alone was used (with a cutoff of 4.0 ng/ml), a sensitivity of 66.7%, a specificity of 53.6%, a PPV of 73.5%, and an NPV of 45.4% were obtained.
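
The five performance figures reported in this abstract all derive from the four cells of a confusion matrix. The sketch below shows those standard definitions. The function name is hypothetical, and the counts in the usage lines are made-up illustrative values, not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: cancer cases classified correctly/incorrectly.
    tn/fp: benign cases classified correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=80, fp=5, tn=45, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on the proportion of cancer cases in the cohort, so they are not directly comparable between studies with different case mixes, whereas sensitivity and specificity are.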

Serum Protein Expression Profiling for Cancer Detection: Validation of a SELDI-Based Approach for Prostate Cancer

Disease Markers, 2004

Multiple studies have reported that analysis of serum and other bodily fluids using surface enhanced laser desorption/ionization time of flight mass spectroscopy (SELDI-TOF-MS) can identify a "fingerprint" or "signature" of spectral peaks that can separate patients with a specific disease from normal control patients. Ultimately, classification by SELDI-TOF-MS relies on spectral differences in position and amplitude of resolved peaks. Since the reproducibility of quantitation, resolution and mass accuracy of SELDI-TOF-MS, or of any high throughput mass spectrometric technique, has never been determined, the clinical usefulness of this method has met with some skepticism. This manuscript describes a detailed design of a three-phase study to validate the clinical usefulness of SELDI-TOF-MS in the identification of patients with prostatic adenocarcinoma (PCA). At the end of this validation study, the usefulness of the general SELDI-TOF-MS approach to identifying patients with PCA will be demonstrated, along with how it compares with PCA diagnosis by measuring prostate-specific antigen.

Normalization of mass spectrometry data (NOMAD)

Advances in biological regulation, 2018

iTRAQ and TMT reagent-based mass spectrometry (MS) are commonly used technologies for quantitative proteomics in biological samples. Such studies are often performed over multiple MS runs, potentially introducing MS run bias that could affect downstream analysis. Such MS data have therefore commonly been normalized using a reference sample which is included in each MS run. We show, however, that reference normalization does not effectively remove systematic MS run bias. A linear model approach was previously proposed to improve on the reference normalization approach but does not computationally scale to larger data sets. Here we describe the NOMAD (normalization of mass spectrometry data) R package, which implements a computationally efficient ANOVA normalization approach with protein assembly functionality. NOMAD provides the same advantages as the linear regression solution but is more computationally efficient, which allows superior scaling to larger sample sizes. Moreover, NOMAD effectively removes bias, which improves the validity of comparisons across MS runs.
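
The idea behind removing an additive run effect can be illustrated with a one-factor sketch: on the log scale, each run's mean is shifted to the grand mean. This is not the NOMAD package's API or its full multi-factor procedure. The function name, the flat array layout, and the restriction to a single "run" factor are assumptions made for illustration.

```python
import numpy as np

def remove_run_effect(log_intensity, run_ids):
    """Shift each MS run so its mean log-intensity equals the grand mean.

    log_intensity: 1-D array of log-transformed intensities.
    run_ids: 1-D array of the same length labelling the run of each value.
    """
    y = np.asarray(log_intensity, dtype=float).copy()
    run_ids = np.asarray(run_ids)
    grand_mean = y.mean()
    for run in np.unique(run_ids):
        mask = run_ids == run
        # Estimated run effect = run mean minus grand mean; subtract it.
        y[mask] -= y[mask].mean() - grand_mean
    return y

# Two hypothetical runs with the same pattern but a constant offset.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
runs = [0, 0, 0, 1, 1, 1]
print(remove_run_effect(values, runs))
```

Because each factor level needs only a mean, this kind of sequential centering avoids fitting one large regression, which is consistent with the scaling advantage the abstract describes.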

SANIST: a rapid mass spectrometric SACI/ESI data acquisition and elaboration platform for verifying potential candidate biomarkers

Rapid communications in mass spectrometry : RCM, 2015

Surface-Activated Chemical Ionization/Electrospray Ionization mass spectrometry (SACI/ESI-MS) is a technique with high sensitivity and low noise that allows accurate biomarker discovery studies. We developed dedicated SACI/ESI software, named SANIST, for both biomarker fingerprint data acquisition and use as a diagnostic tool, with prostate cancer (PCa) as the disease of interest. Liquid chromatography (LC)/SACI/ESI-MS technology was employed to detect a potential biomarker panel for PCa disease prediction. Serum from patients with histologically confirmed or negative prostate biopsies for PCa was employed. The biomarker data (m/z or Thomson value, retention time and extraction mass chromatogram peak area) were stored in an ASCII database. SANIST software allowed identification of potential biomarkers. A Bayesian scoring algorithm developed in house allowed sample separation based on comparison with samples in the database. Biomarker candidates from the carnitine family were detecte...

Protein mass spectra data analysis for clinical biomarker discovery: a global review

Briefings in Bioinformatics, 2011

The identification of new diagnostic or prognostic biomarkers is one of the main aims of clinical cancer research. In recent years there has been a growing interest in using high throughput technologies for the detection of such biomarkers. In particular, mass spectrometry appears as an exciting tool with great potential. However, to extract any benefit from the massive potential of clinical proteomic studies, appropriate methods, improvement and validation are required. To better understand the key statistical points involved with such studies, this review presents the main data analysis steps of protein mass spectra data analysis, from the pre-processing of the data to the identification and validation of biomarkers. Efficient pre-processing is an essential pre-requisite for retrieving meaningful proteomic biological information from raw spectra and reaching meaningful clinical conclusions. The identification and validation of pertinent biomarkers requires large, well-designed studies. Methodology improvement would benefit from a tight collaboration between biostatisticians, computer scientists, biologists and clinicians.

Biomarker Selection, Employing an Iterative Peak Selection Method, and Prostate Spectra Characterization for Identifying Biomarkers Related to Prostate Cancer

Lecture Notes in Computer Science, 2007

A proteomic analysis system (PAS) for prostate Mass Spectrometry (MS) spectra is proposed for differentiating normal from abnormal and benign from malignant cases and for identifying biomarkers related to prostate cancer. PAS comprised two stages: (1) a pre-processing stage, consisting of MS-spectrum smoothing, normalization, iterative peak selection, and peak alignment, and (2) a classification stage, comprising a 2-level hierarchical tree structure, employing the PNN and SVM classifiers at the 1st (normal-abnormal) and 2nd (benign-malignant) classification levels respectively. PAS first applied local thresholding, for determining the MS-spectrum noise level, and second an iterative global threshold estimation algorithm, for selecting peaks at different intensity ranges. Two optimum sub-sets of these peaks, one at each global threshold, were used to optimally design the hierarchical classification scheme and, thus, indicate the best m/z values. The information-rich biomarkers 1160.8, 2082.2, 3595.9, 4275.3, 5817.3, and 7653.2, which have been associated with the prostate gland, are proposed for further investigation.

MALDI mass spectrometric imaging based identification of clinically relevant signals in prostate cancer using large-scale tissue microarrays

International Journal of Cancer, 2013

To identify molecular features associated with clinico-pathological parameters and TMPRSS2-ERG fusion status in prostate cancer, we applied MALDI mass spectrometric imaging (MSI) to a prostate cancer tissue microarray (TMA) containing formalin-fixed, paraffin-embedded tissue samples from 1,044 patients for whom clinical follow-up data were available. MSI analysis revealed 15 distinct mass-to-charge (m/z) signals associated with epithelial structures. A comparison of these signals with clinico-pathological features revealed a statistical association with favorable tumor phenotype such as low Gleason grade, early pT stage or low Ki67 labeling index (LI) for four signals (m/z 700, m/z 1,502, m/z 1,199 and m/z 3,577), a link with high Ki67LI for one signal (m/z 1,013) and a relationship with prolonged time to PSA recurrence for one signal (m/z 1,502; p = 0.0145). Multiple signals were associated with the ERG-fusion status of our cancers. Two of the 15 epithelium-associated signals (m/z 1,013 and m/z 1,502) were associated with detectable ERG expression and five signals (m/z 644, 678, 1,044, 3,086 and 3,577) were associated with ERG negativity. These observations are in line with substantial molecular differences between fusion-type and non-fusion type prostate cancer. The signals observed in this study may characterize molecules that play a role in the development of TMPRSS2-ERG fusions, or alternatively reflect pathways that are activated as a consequence of ERG activation. The combination of MSI and large-scale TMAs represents a powerful approach enabling immediate prioritization of MSI signals based on associations with clinico-pathological and molecular data.