Discrimination of wines based on 2D NMR spectra using learning vector quantization neural networks and partial least squares discriminant analysis

Comprehensive Classification and Regression Modeling of Wine Samples Using 1H NMR Spectra

Foods

Recently, 1H NMR (nuclear magnetic resonance) spectroscopy was presented as a viable option for the quality assurance of foods and beverages, such as wine products. Here, a complex chemometric analysis of red and white wine samples was carried out based on their 1H NMR spectra. The extreme gradient boosting (XGBoost) machine learning algorithm was applied to wine variety classification with an iterative double cross-validation loop developed during the present work. In the case of red wines, Cabernet Franc, Merlot and Blue Frankish samples were successfully classified. Three very common white wine varieties were selected and classified: Chardonnay, Sauvignon Blanc and Riesling. The models were robust and were validated against overfitting with iterative randomization tests. Moreover, four novel partial least-squares (PLS) regression models were constructed to predict the major quantitative parameters of the wines: density, total alcohol, total sugar and total SO2 concentrations. A...
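The double cross-validation idea named in this abstract can be illustrated with nested index splits: an outer loop holds out a test fold, and an inner loop splits the remaining samples for model tuning. The abstract does not give the fold counts or the details of the iterative scheme, so the function names, the 5/3 fold counts and the seed below are assumptions made only for this sketch; no classifier is fitted here.

```python
import random

def k_fold_indices(indices, k, rng):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def double_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_test, inner_train, inner_val) index lists.

    The outer test fold never appears in any inner split, so it stays
    untouched while a model is tuned on the inner folds.
    Fold counts are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer fold is available for tuning.
        remaining = [idx for j, fold in enumerate(outer_folds) if j != i for idx in fold]
        inner_folds = k_fold_indices(remaining, inner_k, rng)
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds) if n != m for idx in fold]
            yield outer_test, inner_train, inner_val

splits = list(double_cv_splits(20))
```

Repeating the whole procedure with different seeds would be one way to make it iterative, although the abstract does not say whether that is what the authors did.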

The Effect of Grapevine Variety and Wine Region on the Primer Parameters of Wine Based on 1H NMR-Spectroscopy and Machine Learning Methods

Diversity

Nuclear magnetic resonance (NMR) spectroscopy is an innovative method for wine analysis. Every grapevine variety has a unique structural formula, which can be considered as the genetic fingerprint of the plant. This specificity appears in the composition of the final product (wine). In the present study, the originality of Hungarian wines was investigated with 1H NMR spectroscopy considering 861 wine samples of four varieties (Cabernet Sauvignon, Blaufränkisch, Merlot, and Pinot Noir) that were collected from two wine regions (Villány, Eger) in 2015 and 2016. The aim of our analysis was to classify these varieties and regions and to select the most important traits from the 22 observed (alcohols, sugars, acids, decomposition products, biogenic amines, polyphenols, fermentation compounds, etc.) in order to detect their effect on the identification. Of the four classification methods tested, namely linear discriminant analysis (LDA), neural networks (NN), support vector machines (SVM), a...

Classification of Chinese wine varieties using 1H NMR spectroscopy combined with multivariate statistical analysis

Food Control, 2018

In this study, the feasibility of discriminating grape varieties of Chinese red and white wines was investigated using 1H NMR spectroscopy in combination with a multivariate statistical procedure consisting of two steps: principal component analysis (PCA) followed by linear discriminant analysis (LDA). Three grape varieties each of red wines (Cabernet Sauvignon, Rose Honey, Cabernet Gernischt) and white wines (Ugni Blanc, Long Yan, Chardonnay) were examined. A segment-wise peak alignment was employed to handle peak misalignments in the recorded 1H NMR spectra. Binning of the aligned 1H NMR spectra was performed for data reduction. The resulting bins were employed as input variables for the subsequent PCA and LDA analyses. The combination of PCA and LDA yielded sufficient discrimination of the examined grape varieties. The validity of the PCA/LDA model was confirmed by internal leave-one-out cross validation (LOOCV) as well as by external repeated double random cross validation (RDRCV). LOOCV and RDRCV led to average correct classification rates of 82% and 83% for red wine varieties, respectively, and 94% and 90% for white wine varieties, respectively. The results demonstrate that 1H NMR spectroscopy combined with multivariate analysis is an effective tool for verifying the authenticity of Chinese wines.
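The binning step described above reduces a spectrum to a small number of input variables by summing intensities over equal-width chemical-shift intervals. The abstract does not state the bin width or spectral range, so the defaults below (0.04 ppm bins over 0 to 10 ppm) and the function name are assumptions for this minimal sketch.

```python
def bin_spectrum(ppm, intensity, width=0.04, lo=0.0, hi=10.0):
    """Sum intensities into equal-width chemical-shift bins covering [lo, hi).

    `width`, `lo` and `hi` are illustrative defaults, not values from the paper.
    Points outside [lo, hi) are ignored.
    """
    n_bins = int(round((hi - lo) / width))
    bins = [0.0] * n_bins
    for shift, value in zip(ppm, intensity):
        if lo <= shift < hi:
            # Clamp guards against floating-point overshoot at the upper edge.
            idx = min(int((shift - lo) / width), n_bins - 1)
            bins[idx] += value
    return bins

# Toy spectrum: seven points reduced to four 0.5 ppm bins over 0 to 2 ppm.
toy_ppm = [0.1, 0.3, 0.6, 1.2, 1.4, 1.9, 2.5]
toy_intensity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_bins = bin_spectrum(toy_ppm, toy_intensity, width=0.5, lo=0.0, hi=2.0)
```

The resulting bin vector for each sample would then form one row of the data matrix passed to PCA and LDA.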

Nuclear Magnetic Resonance Profiling of Wine Blends

Journal of Agricultural and Food Chemistry, 2011

Nuclear magnetic resonance (NMR) profiling is used for the characterization of monocultivar binary wine mixtures. Classification and quantification of the relative amount of wine in the mixture are made in two steps. First, each sample is classified as a mixture of a determined type by solving the appropriate classification problem using NMR profiles. The relative amount of the two corresponding monovarietal wines is then evaluated by multilinear regression of a selected set of NMR variables. Linear discriminant analysis (LDA), used in the classification step, gives a very good separation among the different mixture classes. On the other hand, a single-layer artificial neural network, used to solve the multilinear problem, gives the relative amount of wine type in the mixture with a precision of about 10%.
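The quantification step can be illustrated with a deliberately simplified stand-in for the regression described above: if a binary mixture profile is assumed to be a linear combination of the two monovarietal profiles, the fraction of the first wine has a closed-form least-squares estimate. This linear-mixing assumption and the function name are ours; the paper itself uses multilinear regression on selected NMR variables and a single-layer neural network, neither of which is reproduced here.

```python
def mixture_fraction(mixture, pure_a, pure_b):
    """Least-squares estimate of f in: mixture ~ f * pure_a + (1 - f) * pure_b.

    Assumes ideal linear mixing of the selected NMR variables (a simplification).
    """
    diff = [a - b for a, b in zip(pure_a, pure_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("pure profiles are identical; fraction is undefined")
    num = sum((x - b) * d for x, b, d in zip(mixture, pure_b, diff))
    return num / denom

# Toy profiles: the mixture is built as 25% of wine A and 75% of wine B.
wine_a = [1.0, 4.0, 2.0]
wine_b = [3.0, 0.0, 2.0]
blend = [0.25 * a + 0.75 * b for a, b in zip(wine_a, wine_b)]
estimated = mixture_fraction(blend, wine_a, wine_b)
```

With noisy real spectra the estimate would not be exact, which is consistent with the roughly 10% precision the abstract reports for its own method.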

Classification of olive oils using high throughput flow 1H NMR fingerprinting with principal component analysis, linear discriminant analysis and probabilistic neural networks

Analytica Chimica Acta, 2005

The combination of 1H NMR fingerprinting with multivariate analysis provides an original approach to study the profile of olive oil in relation to its geographical origin and processing. The present work aims at illustrating the relevance of 1H NMR fingerprints for assessing the geographical origin and the year of production for olive oils from various Mediterranean areas. Multivariate (chemometric) techniques are able to filter out the most relevant information from a spectrum, e.g. for a classification. Principal component analysis (PCA) was carried out on the ∼12,000 variables (chemical shifts), and four data sets were defined prior to PCA. Linear discriminant analysis (LDA) of the first 50 PCs was applied for classification of olive oil samples (97 or 91) according to geographic origin and year of production. The data analysis was carried out both with and without outliers. Variable selection for LDA was achieved using (i) the best five variables and (ii) an interactive forward stepwise manner. Using LDA on the external validation sets, the correct classification rate varied between 47 and 75% (random selection) and between 35 and 92% (Kennard-Stone (KS) selection), depending on geographic origin (country) and production year. A similar success rate could be achieved using partial least squares discriminant analysis (PLS-DA). The success rate can be considerably improved by using probabilistic neural networks (PNN). Correct classification by PNN varied between 58 and 100% on the external validation sets. Other chemometric techniques, such as multiple linear regression or generalized pair-wise correlation, did not give better results.
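Kennard-Stone selection, mentioned above as an alternative to random splitting, picks samples that cover the data space evenly: it starts from the two most distant samples and then repeatedly adds the sample that is farthest from its nearest already-selected neighbour. The sketch below uses Euclidean distance and a hypothetical function name; the abstract does not say which distance was used or whether the selected samples formed the training or the validation set.

```python
import math

def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule.

    Euclidean distance is an assumption made for this sketch.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Seed with the two mutually most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Pick the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data on a line: the extremes are chosen first, then the midpoint.
toy_points = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
chosen = kennard_stone(toy_points, 3)
```

In practice the points would be PCA scores or binned spectra rather than two-dimensional toy coordinates.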

Chemometric Classification of Apulian and Slovenian Wines Using 1H NMR and ICP-OES Together with HPICE Data

Journal of Agricultural and Food Chemistry, 2003

High-performance ion chromatography exclusion, inductively coupled plasma emission spectroscopy, and nuclear magnetic resonance (NMR) measurements were carried out in combination with chemometrics on 33 wine samples coming from three Slovenian wine-growing regions and from Apulia (southern Italy). The chemometric classification of wines according to their geographical origin was achieved with a success rate of nearly 100%. The discriminating potential of the 1H NMR data and of the other analytical determinations was estimated separately. The best prediction of wine origin was obtained with the NMR data.

Model calibration and feature selection for orange juice authentication by 1H NMR spectroscopy

Chemometrics and Intelligent Laboratory Systems, 2012

A 1H NMR spectroscopic profiling approach has been investigated to discriminate between authentic and adulterated juices. An experimental database of 150 samples of authentic or adulterated orange juices, with a known percentage of clementine juice, was prepared. A repeated stratified cross-validation process was adopted for the validation of PLS regression models and classification rules. The choice of statistical data pre-treatment was discussed; logarithmic transformation combined with Pareto scaling proved the most relevant. Selecting spectral variables also led to better results than using the whole spectral range. Various feature selection procedures were compared, and the CovSel approach appeared to be the most efficient. However, for a better understanding of the features of atypical profiles, it would be wise to use more than one selection procedure, such as those based on the Backward Interval PLS regression approach or on genetic algorithms.
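The pre-treatment singled out above combines two standard operations: a logarithmic transformation, followed by Pareto scaling, in which each variable is mean-centred and divided by the square root of its standard deviation. The abstract does not specify the logarithm base or how zero intensities were handled, so the base-10 logarithm, the additive offset and the function name below are assumptions for this sketch.

```python
import math
import statistics

def log_pareto(matrix, offset=1.0):
    """Log-transform, then Pareto-scale each column of a samples-by-variables matrix.

    `offset` keeps the logarithm defined for zero intensities (an assumption).
    Pareto scaling: (x - column mean) / sqrt(column sample standard deviation).
    Requires at least two samples (rows).
    """
    logged = [[math.log10(v + offset) for v in row] for row in matrix]
    columns = list(zip(*logged))
    means = [statistics.fmean(col) for col in columns]
    roots = [math.sqrt(statistics.stdev(col)) for col in columns]
    return [
        # Constant columns carry no information and are mapped to zero.
        [(v - m) / r if r > 0 else 0.0 for v, m, r in zip(row, means, roots)]
        for row in logged
    ]

# Toy matrix: three samples, two variables (the second is constant).
toy_matrix = [[9.0, 0.0], [99.0, 0.0], [999.0, 0.0]]
scaled = log_pareto(toy_matrix)
```

Compared with unit-variance scaling, dividing by the square root of the standard deviation reduces the dominance of intense peaks while inflating baseline noise less.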

Joint NMR and Solid-Phase Microextraction-Gas Chromatography Chemometric Approach for Very Complex Mixtures: Grape and Zone Identification in Wines

Analytical chemistry, 2016

In very complex mixtures, classification by chemometric methods may be limited by the difficulty of extracting, from NMR or gas chromatography/mass spectrometry (GC/MS) experimental data, information useful for a reliable classification. The joint analysis of both types of data has shown its superiority in the biomedical field but is scarcely used in foodstuffs and never in wine, in spite of the complexity of their spectra and classification. In this article we show that univariate and multivariate principal component analysis-discriminant analysis (PCA-DA) statistics applied to the combined 1H NMR and solid-phase microextraction-gas chromatography (SPME-GC) data of a collection of 270 wines from Galicia (northwest Spain) allow a discrimination and classification not attainable from the separate data, distinguishing wines from autochthonous and nonautochthonous grapes, monovarietal from plurivarietal wines, and identifying, in part, the geographical subzone of origin of the albariño wines. A gen...