Comparison of SIMCA pattern recognition and library search identification of hazardous compounds from mass spectra

Selective detection of classes of chemical compounds by gas chromatography/mass spectrometry/pattern recognition: polycyclic aromatic hydrocarbons and alkanes

Analytical Chemistry, 1987

reliable methods of measuring antimony species in water are needed for speciation studies. This extraction method provides a large preconcentration factor which is necessary for the determination of low levels of Sb(III) species in water. The proposed analytical method has currently been used to investigate arsenic and antimony species in some groundwater systems in the Coeur d'Alene Mining District in order to obtain a better understanding of the distribution of these species in aquatic environments. The extraction method described in this paper is not limited to NAA. It can be combined with other instrumental techniques such as graphite furnace atomic absorption spectrometry (GFAAS) for arsenic and antimony speciation studies in natural water systems.

A Simple Comparison of Mass Spectral Search Results and Implications for Environmental Screening Analyses

Archives of Environmental Contamination and Toxicology, 1999

A simple assessment of the ability of environmental laboratories to perform automated library searching procedures on mass spectra of unknown pollutants was conducted. In this assessment, 10 laboratories analyzed a hexane solution containing eight organic chemicals using gas chromatography/mass spectrometry and searched their acquired mass spectral data against mass spectral reference libraries. The search results were used to evaluate the similarity of the lists of tentative identifications (TIDs) among the laboratories and to compare the observed searching success to the searching success reported in the literature using high-quality mass spectral data. A high degree of similarity was observed among the lists of TIDs reported by the laboratories for each chemical. The searching success observed in this study was slightly lower than that reported in the literature based on higher-quality mass spectral data. This simple comparison suggests that laboratories performing routine environmental analyses can successfully perform automated searching procedures for unknown sample components, and that the mass spectral searching component of analytical methods designed to screen for unknown organic pollutants should be successful, i.e., obtain similar and reproducible results among laboratories.
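As a rough illustration of what an automated library search does, the sketch below ranks library entries against an unknown spectrum by cosine (dot-product) similarity. This is an assumption for illustration only: the abstract does not say which search algorithm the laboratories used, and the spectra, entry names, and the `tentative_ids` helper are made up.

```python
import math

def cosine_match(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(a[mz] * b.get(mz, 0.0) for mz in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def tentative_ids(unknown, library, top_n=3):
    """Return the top_n library entries as (name, score), best match first."""
    scored = [(cosine_match(unknown, ref), name) for name, ref in library.items()]
    scored.sort(reverse=True)
    return [(name, round(score, 3)) for score, name in scored[:top_n]]

# Made-up toy spectra (hypothetical, not real reference data).
LIBRARY = {
    "entry_A": {43: 100.0, 57: 80.0, 71: 40.0},
    "entry_B": {77: 60.0, 91: 100.0, 106: 45.0},
    "entry_C": {43: 30.0, 91: 100.0, 92: 60.0},
}
unknown = {77: 55.0, 91: 100.0, 106: 50.0}

hits = tentative_ids(unknown, LIBRARY)
print(hits)
```

Each laboratory's ranked list plays the role of a TID list; comparing such lists across laboratories is the kind of similarity check the study describes.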

Toxicological Evaluation of Complex Mixtures by Pattern Recognition: Correlating Chemical Fingerprints to Mutagenicity

Environmental Health Perspectives, 2002

We describe the use of pattern recognition and multivariate regression in the assessment of complex mixtures by correlating chemical fingerprints to the mutagenicity of the mixtures. Mixtures were 20 organic extracts of exhaust particles, each containing 102-170 individual compounds such as polycyclic aromatic hydrocarbons (PAHs), nitro-PAHs, oxy-PAHs, and saturated hydrocarbons. Mixtures were characterized by full-scan GC-MS (gas chromatography-mass spectrometry). Data were resolved into peaks and spectra for individual compounds by an automated curve resolution procedure. Resolved chromatograms were integrated, resulting in a predictor matrix that was used as input to a principal component analysis to evaluate similarities between mixtures (i.e., classification). Furthermore, partial least-squares projections to latent structures were used to correlate the GC-MS data to mutagenicity, as measured in the Ames Salmonella assay (i.e., calibration). The best model (high r² and Q²) identifies the variables that co-vary with the observed mutagenicity. These variables may subsequently be identified in more detail. Furthermore, the regression model can be used to predict mutagenicity from GC-MS chromatograms of other organic extracts. We emphasize that both chemical fingerprints and detailed data on composition can be used in pattern recognition.
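The calibration step can be sketched with the simplest possible case: a partial least-squares model with a single response and a single latent variable. This is a minimal sketch under stated assumptions, not the study's model; the function names, the peak-area matrix, and the response values are hypothetical, and a real analysis would use more components and cross-validation.

```python
import math

def column_means(X):
    n = len(X)
    return [sum(row[j] for row in X) / n for j in range(len(X[0]))]

def fit_pls1_one_component(X, y):
    """One-latent-variable PLS1 fit; returns (x_mean, y_mean, w, b)."""
    x_mean = column_means(X)
    y_mean = sum(y) / len(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered predictors
    yc = [v - y_mean for v in y]                              # centered response
    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(len(Xc))) for j in range(len(x_mean))]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("predictors do not co-vary with the response")
    w = [v / norm for v in w]
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]    # latent scores
    b = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, b

def predict(model, x):
    """Predict the response for one new row of predictors."""
    x_mean, y_mean, w, b = model
    score = sum((v - m) * wj for v, m, wj in zip(x, x_mean, w))
    return y_mean + b * score

# Hypothetical integrated peak areas (rows = extracts) and assay responses.
X = [[10.0, 10.0, 10.0], [11.0, 12.0, 10.0], [12.0, 14.0, 10.0], [13.0, 16.0, 10.0]]
y = [5.0, 7.0, 9.0, 11.0]

model = fit_pls1_one_component(X, y)
weights = model[2]
prediction = predict(model, [14.0, 18.0, 10.0])
print(weights, prediction)
```

Large entries in the weight vector mark the peaks that co-vary with the response, which corresponds to the variable-identification step described in the abstract.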

Comprehensive Chemical Fingerprinting by Multidimensional GC and Supervised Machine Learning

2020

Project highlight. This project leverages and combines recent advances in machine learning and analytical chemistry to progress nuclear nonproliferation technologies beyond current capabilities. The developed approaches can be used to identify and detect complex chemical fingerprints of facilities of interest; these techniques can be applied to other fields including climate sciences, environmental chemistry, and atmospheric physics. Intellectual Property Review: This report has been reviewed by SRNL Legal Counsel for intellectual property considerations and is approved to be publicly published in its current form. LDRD Report FY2020, LDRD-2020-00020. Objectives: (1) analytical method development for multidimensional gas chromatography analysis of volatile organic compounds; (2) training data set collection utilizing multidimensional gas chromatography; (3) machine learning based data analysis development utilizing open source data.

Comparative Prediction of Gas Chromatographic Retention Indices for GC/MS Identification of Chemicals Related to Chemical Weapons Convention by Incremental and Machine Learning Methods

Separations

During on-site verification activities conducted by the Technical Secretariat of the Organization for the Prohibition of Chemical Weapons, identification by gas chromatography retention index (RI) data, in addition to mass spectrometry data, increases the reliability of factual findings. However, reference RIs do not cover all possible chemical structures, so models that predict RIs are important. Although applicable only to narrow data sets of chemicals with a fixed scaffold (G- and V-series gases, for example), the non-learning incremental method demonstrated predictive median absolute and percentage errors of 2–4 units and 0.1–0.2%; these are comparable with the experimental bias in RI measurements in the same laboratory with the same GC conditions. It is more accurate than two reported machine learning methods, which show median absolute and percentage errors of 11–52 units and 0.5–2.8%. However, for the whole Chemical Weapons Convention (CWC) data set of chemicals, when a fix...
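The two figures of merit quoted above, median absolute error and median percentage error, can be computed as in the sketch below. The function name and the RI values are hypothetical and are not taken from the paper.

```python
from statistics import median

def median_errors(predicted, reference):
    """Return (median absolute error, median percentage error) for RI predictions."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    abs_err = [abs(p - r) for p, r in zip(predicted, reference)]
    pct_err = [100.0 * e / r for e, r in zip(abs_err, reference)]
    return median(abs_err), median(pct_err)

# Hypothetical reference and predicted retention indices.
reference = [1000.0, 1200.0, 1500.0]
predicted = [1002.0, 1197.0, 1504.0]

mae, mpe = median_errors(predicted, reference)
print(mae, mpe)
```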

Pattern recognition studies of tandem mass spectra

Analytica Chimica Acta, 1993

Principal components analysis and pattern recognition (PCAPR) techniques were applied to MS-MS spectra of fourteen organic compounds. Each spectrum was represented as a two-dimensional matrix containing information from the MS1 spectrum as well as from one, two or three MS2 spectra. The data were reduced by calculating a one-principal component model for each spectrum which explained between 86 and 99% of the variance. Each model was used to calculate each of the spectra, and residual standard deviations (R.S.D.s) were used as a measure of spectral similarity: low R.S.D.s (< 1.0) corresponding to similar spectra and higher R.S.D.s (> 1.0) to dissimilar spectra. The system shows promise for use in monitoring situations in that MS-MS spectra can be efficiently reduced and stored as principal components models and R.S.D. calculations can be used to identify a compound based on how well its spectrum is predicted by the available reference models.
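The similarity measure described above can be sketched as follows: fit a one-component model to a reference spectrum matrix, use it to reconstruct another matrix, and take the spread of the residuals. This is a minimal sketch under assumptions that the abstract does not confirm: the matrices are left uncentered, the residual spread is a plain root-mean-square value, and the toy intensities are made up, so the 1.0 cut-off quoted in the abstract does not carry over.

```python
import math

def first_loading(X, iters=200):
    """Leading right singular vector of X (one-component model) by power iteration."""
    n_cols = len(X[0])
    p = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        t = [sum(x * w for x, w in zip(row, p)) for row in X]      # scores t = X p
        q = [sum(t[i] * X[i][j] for i in range(len(X))) for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in q))
        if norm == 0.0:
            raise ValueError("cannot model an all-zero matrix")
        p = [v / norm for v in q]
    return p

def residual_sd(X, p):
    """Root-mean-square residual after projecting each row of X onto loading p."""
    total, count = 0.0, 0
    for row in X:
        t = sum(x * w for x, w in zip(row, p))
        for x, w in zip(row, p):
            total += (x - t * w) ** 2
            count += 1
    return math.sqrt(total / count)

# Rows = scans (MS1, MS2), columns = m/z bins; made-up intensities.
reference = [[100.0, 50.0, 0.0, 10.0], [50.0, 25.0, 0.0, 5.0]]
same_compound = [[80.0, 40.0, 0.0, 8.0], [40.0, 20.0, 0.0, 4.0]]
other_compound = [[0.0, 5.0, 100.0, 40.0], [0.0, 10.0, 50.0, 80.0]]

model = first_loading(reference)
rsd_same = residual_sd(same_compound, model)
rsd_other = residual_sd(other_compound, model)
print(rsd_same, rsd_other)
```

A spectrum that the reference model predicts well gives a small residual value, while a spectrum from a different compound leaves most of its intensity unexplained.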

A Fast Mass Spectrum Screening Technique for Volatile Organic Compounds Based on Parallel Artificial Neural Networks

Journal of Chromatographic Science, 1998

A technique for screening mass spectra for the presence of volatile organic compounds (VOCs) is developed using probabilistic neural networks. A parallel neural network filter is designed to recognize benzene, toluene, ethyl benzene, and o-xylene in gas chromatography-mass spectrometry (GC-MS) chromatograms of VOC mixtures. The filter trained rapidly and was evaluated by analyzing a variety of VOC combinations. The performance of the network offers some significant advantages over the traditional GC-MS data processing techniques such as ion extraction and compound library searching. Advantages include speed, selectivity, and the ability to discriminate between overlapping compounds.
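A probabilistic neural network is, in essence, a Parzen-window classifier: each class density is estimated as the mean of Gaussian kernels centered on that class's training patterns. The sketch below runs one two-class network per target compound, which is one plausible reading of the "parallel" filter. The feature vectors, the smoothing parameter `sigma`, and the function names are assumptions for illustration, not details from the paper.

```python
import math

def class_density(x, patterns, sigma):
    """Parzen estimate: mean Gaussian kernel between x and a class's training patterns."""
    total = 0.0
    for p in patterns:
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / len(patterns)

def pnn_detects(x, present, absent, sigma=0.2):
    """Two-class PNN decision: is the 'present' class more likely than 'absent'?"""
    return class_density(x, present, sigma) > class_density(x, absent, sigma)

def screen(x, filters, sigma=0.2):
    """Run one independent PNN per target compound on the same input vector."""
    return {name: pnn_detects(x, f["present"], f["absent"], sigma)
            for name, f in filters.items()}

# Hypothetical normalized intensities in three arbitrary channels.
FILTERS = {
    "benzene": {"present": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
                "absent": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]},
    "toluene": {"present": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
                "absent": [[1.0, 0.0, 0.0], [0.9, 0.0, 0.2]]},
}

result = screen([0.95, 0.05, 0.05], FILTERS)
print(result)
```

Training such a network amounts to storing the patterns, which is consistent with the rapid training reported in the abstract.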

Automated Strategies To Identify Compounds on the Basis of GC/EI-MS and Calculated Properties

Analytical Chemistry, 2011

The identification of unknown compounds based on GC/EI-MS spectrum and structure generation techniques has been improved by combining a number of strategies into a programmed sequence. The program MOLGEN-MS is used to determine the molecular formula and incorporate substructural information to generate all structures matching the mass spectral information. Mass spectral fragments are then predicted for each structure and compared with the experimental spectrum using a match value. Additional data are then calculated automatically for each candidate to allow exclusion of candidates that did not match other analytical information. The effectiveness of these "exclusion criteria", as well as the programming sequence, was tested using a case study of 29 isomers of formula C12H10O2. The default classifier precision resulted in the generation of too many structures in some cases, which was improved by up to several orders of magnitude by including additional classifiers or restrictions. Combining this with the exclusion of candidates based on a Lee retention index/boiling point correlation, octanol-water partitioning coefficients, steric energies, and finally spectral match values limited the number of candidate structures further from over 1 billion without any restrictions down to less than 6 structures in 10 cases and below 35 in all but 3 cases. This method can be used in the absence of matching database spectra and brings unknown identification based on MS interpretation and structure generation techniques a step closer to practical reality.
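The sequence of exclusion criteria can be pictured as a chain of filters applied in order to the generated candidates. The sketch below is illustrative only: the `Candidate` fields, the thresholds, and the candidate values are hypothetical, and it does not call MOLGEN-MS or reproduce its interface.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    predicted_ri: float    # retention index estimated for the candidate
    log_kow: float         # octanol-water partitioning coefficient (log scale)
    steric_energy: float
    match_value: float     # agreement of predicted fragments with the spectrum

def apply_exclusions(candidates, criteria):
    """Apply (label, keep_predicate) pairs in order; record survivors after each step."""
    remaining = list(candidates)
    history = [("start", len(remaining))]
    for label, keep in criteria:
        remaining = [c for c in remaining if keep(c)]
        history.append((label, len(remaining)))
    return remaining, history

# Hypothetical thresholds, in the order the abstract lists the criteria.
CRITERIA = [
    ("retention index", lambda c: abs(c.predicted_ri - 305.0) <= 20.0),
    ("log Kow", lambda c: 2.0 <= c.log_kow <= 4.0),
    ("steric energy", lambda c: c.steric_energy <= 50.0),
    ("match value", lambda c: c.match_value >= 0.6),
]

CANDIDATES = [
    Candidate("c1", 300.0, 3.0, 20.0, 0.9),
    Candidate("c2", 350.0, 3.1, 25.0, 0.8),
    Candidate("c3", 310.0, 5.5, 22.0, 0.8),
    Candidate("c4", 295.0, 3.2, 80.0, 0.7),
    Candidate("c5", 305.0, 2.8, 30.0, 0.4),
]

survivors, history = apply_exclusions(CANDIDATES, CRITERIA)
print([c.name for c in survivors], history)
```

Recording the survivor count after each step makes it easy to see which criterion removes the most candidates.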

The role of pattern recognition in the computer-aided classification of mass spectra

Analytica Chimica Acta, 1979

The requirements for the use of pattern recognition techniques as an aid in the identification of chemical substances from their mass spectra are reviewed. Decision-tree pattern recognition is recommended as potentially satisfying these requirements. Examples of this approach using a large data base of mass spectra are provided.