Chemometrics Research Papers - Academia.edu
It is well known that single-response orthogonal projections to latent structures (OPLS) and single-response partial least squares regression (PLS1) give identical predictions. The present paper identifies the complete y-orthogonal structure starting from the viewpoint of standard PLS1 regression. Three alternative non-deflating OPLS algorithms and a modified principal component analysis (PCA)-driven method (including MATLAB code) are presented. The first algorithm post-processes the standard PLS1 solution: a QR factorization applied to a shifted version of the non-orthogonal scores is the key to expressing the OPLS solution. The second algorithm finds the OPLS model directly by an iterative procedure. By a rigorous mathematical argument, we explain that orthogonal filtering is a 'built-in' property of the traditional PLS1 regression coefficients. Consequently, OPLS offers no improvement in predictive ability over PLS1, also for new samples. The PCA-driven method is based on the fact that truncating one dimension from the row subspace of X yields a matrix X_orth with y-orthogonal columns and a rank one less than that of X. The desired truncation corresponds exactly to the first X-deflation step of Martens' non-orthogonal PLS algorithm. The significant y-orthogonal structure of X found by PCA of X_orth is split into two fundamental parts: one that contributes significantly to correcting the first PLS score toward y, and one that does not. The third and final OPLS algorithm modifies Martens' non-orthogonal algorithm into an efficient dual PLS1-OPLS algorithm.
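The opening claim is that PLS1 and single-response OPLS give identical predictions. It can be checked numerically with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the paper's MATLAB code or its non-deflating algorithms. It uses a textbook NIPALS PLS1 and a Trygg-Wold-style OPLS loop that removes one y-orthogonal component at a time, applied to column-centred synthetic data. The function names `pls1_nipals` and `opls_single_y` are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_comp):
    """Standard PLS1 (orthogonal scores) by NIPALS with X deflation.

    X and y are assumed column-centred. Returns the fitted values.
    """
    X = X.copy()
    y_hat = np.zeros_like(y)
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # score vector
        p = X.T @ t / (t @ t)           # X loading
        q = (y @ t) / (t @ t)           # y loading (a scalar for PLS1)
        X -= np.outer(t, p)             # deflate X
        y_hat += q * t                  # accumulate fitted values
    return y_hat


def opls_single_y(X, y, n_orth):
    """Single-response OPLS: remove n_orth y-orthogonal components from X,
    then fit one predictive component on the filtered X."""
    X = X.copy()
    w = X.T @ y
    w /= np.linalg.norm(w)              # predictive weight; filtering leaves X^T y unchanged
    for _ in range(n_orth):
        t = X @ w
        p = X.T @ t / (t @ t)
        w_o = p - (w @ p) * w           # part of the loading orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                   # y-orthogonal score
        p_o = X.T @ t_o / (t_o @ t_o)
        X -= np.outer(t_o, p_o)         # filter out the orthogonal component
    t_p = X @ w                         # predictive score on the filtered X
    return t_p * (y @ t_p) / (t_p @ t_p)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=30)
X = X - X.mean(axis=0)
y = y - y.mean()

A = 3
fit_pls = pls1_nipals(X, y, A)
fit_opls = opls_single_y(X, y, A - 1)
print(np.allclose(fit_pls, fit_opls))   # expected: True
```

The comparison uses A PLS1 components against A - 1 y-orthogonal components plus one predictive component. With that pairing, the calibration-set fitted values agree to rounding error. This is consistent with the abstract's point that OPLS filtering cannot improve on PLS1 predictions.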
Knowing the fat content of milk is essential in today's food industry, and using NIR spectroscopy instead of traditional methods makes on-site, direct analysis possible. This alternative method for determining milk fat content was therefore studied. The calibrations required for the development of a new spectrometer were built using multivariate analysis (chemometrics) and NIR spectral processing. Different ways of building classification models by milk type, and models for predicting fat content, were examined.
- by Jokin Ezenarro and +1
- Chemometrics, Spectroscopy, NIR spectroscopy, NIR & Chemometrics
Ouzo and tsipouro belong to the group of anise-flavoured spirits produced in countries around the Mediterranean Sea. Despite the high commercial value of these spirits, there was previously no dedicated lexicon to describe their sensory properties. Six commercial samples of ouzo and three commercial samples of tsipouro, selected from different regions of Greece, were analysed using gas chromatography and sensory analysis. Their attributes (odour, taste and aftertaste) were examined in order to create a set of descriptors. Spider-web plots and principal component analysis (PCA) were used to create a lexicon describing the samples and demonstrating the differences between the two alcoholic beverages. Results from chemical and sensory analyses were combined using PCA and factor analysis. Cluster analysis confirmed the ability of the assessors to separate products with real differences. A set of descriptors suitable for these two products was created, and with it the two products could be discriminated by sensory analysis. The descriptors used for odour were anise, mastic, sweet, alcoholic, herbal, vanilla, menthol and strong; for taste they were sweet, alcoholic, rich, spicy, artificial, aromatic, menthol and caustic; and for aftertaste they were sweet, alcoholic, artificial, spicy and bitter. These descriptors should be a useful methodological tool in future research and development (R&D), in both industry and academia, when studying alcoholic beverages.
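PCA can be computed from the singular value decomposition of the mean-centred data, as in this minimal sketch. The 9 x 8 "panel" matrix is random placeholder data shaped like the study (six ouzo and three tsipouro samples scored on eight odour descriptors); it does not reproduce the published results. `pca_svd` is a hypothetical helper. Whether sensory data should also be scaled to unit variance is a separate choice that this sketch does not make.

```python
import numpy as np


def pca_svd(X, n_comp):
    """PCA of a samples-by-descriptors matrix via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)                        # centre each descriptor
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_comp] * s[:n_comp]            # sample coordinates on the PCs
    loadings = Vt[:n_comp].T                       # descriptor weights per PC
    explained = s[:n_comp] ** 2 / np.sum(s ** 2)   # fraction of variance per PC
    return scores, loadings, explained


# Placeholder panel means: 9 products (6 ouzo, 3 tsipouro) x 8 odour descriptors
rng = np.random.default_rng(3)
panel = rng.uniform(0, 10, size=(9, 8))
scores, loadings, explained = pca_svd(panel, 2)
print(scores.shape, loadings.shape, explained.round(2))
```

A score plot of `scores[:, 0]` against `scores[:, 1]`, read together with the loadings, is the usual way to see which descriptors separate the two spirits.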
The aim of this study was to classify whole-leg cooked hams, made without polyphosphates, by linear discriminant analysis. Principal component analysis (PCA) was used to select significant variables. Thirty-two variables were evaluated on 26 cooked hams prepared using different levels of brine injection and legs from pigs reared in different countries (France or Denmark). Previously published data on 20 hams were also used for classification. A chemometric model based on ten variables was obtained by PCA. The variables were pH, moisture, protein, fat, NaCl, superficial wateriness, L* and a*/b* of the biceps femoris muscle, and the modulus and elasticity index of the semitendinosus muscle. Discriminant functions calculated from the PCA-selected variables correctly classify the cooked hams according to the origin of the meat and, when the origin is the same, according to the percentage of brine injected.
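The classification step can be illustrated with a two-class Fisher linear discriminant. This sketch assumes two groups, equal priors and a pooled within-group scatter matrix. It does not reproduce the study's PCA-based variable selection, its discriminant functions or its data; the ten-variable matrices below are random placeholders. `fisher_lda` and `fisher_ratio` are hypothetical names.

```python
import numpy as np


def fisher_lda(X0, X1):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m0), midpoint threshold."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-group scatter matrix
    Sw = (len(X0) - 1) * np.cov(X0, rowvar=False) + (len(X1) - 1) * np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2              # assumes equal prior probabilities
    return w, threshold


def fisher_ratio(v, X0, X1):
    """Between-group over within-group variation of the projection X @ v."""
    between = (v @ (X1.mean(axis=0) - X0.mean(axis=0))) ** 2
    within = (len(X0) - 1) * np.var(X0 @ v, ddof=1) + (len(X1) - 1) * np.var(X1 @ v, ddof=1)
    return between / within


# Placeholder data: 10 measured variables on two groups of 13 hams each
rng = np.random.default_rng(4)
X0 = rng.normal(0.0, 1.0, size=(13, 10))
X1 = rng.normal(0.8, 1.0, size=(13, 10))
w, thr = fisher_lda(X0, X1)
predicted_group1 = np.vstack([X0, X1]) @ w > thr   # True = assigned to group 1
```

The direction `w` maximises the ratio of between-group to within-group variation of the projected samples. No other direction `v` scores higher on `fisher_ratio`.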
Genetic algorithms were developed as an optimization strategy for use especially when complex response surfaces preclude better-known methods (simplex, experimental design techniques, etc.). This paper shows that these algorithms, suitably modified, can also be a valuable tool for solving the feature selection problem. The subsets of variables selected by genetic algorithms are generally more efficient than those obtained by classical feature-selection methods, since they can give better results with fewer features.
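A minimal GA for variable selection is sketched below. It is a generic illustration under our own assumptions, not the paper's modified algorithm:

- Chromosomes are bit strings marking the selected variables.
- Fitness is a hypothetical in-sample R² of an ordinary least-squares fit, minus a small penalty per selected variable to favour smaller subsets.
- Selection, crossover and mutation use simple tournament, one-point and bit-flip operators.
- The best subset seen in any generation is kept.

```python
import numpy as np


def fitness(mask, X, y, penalty=0.01):
    """Hypothetical fitness: in-sample R^2 of an OLS fit on the selected
    columns, minus a penalty per selected variable."""
    if not mask.any():
        return -np.inf
    A = np.column_stack([np.ones(len(y)), X[:, mask]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2 - penalty * mask.sum()


def tournament(pop, scores, rng):
    """Return the fitter of two randomly drawn individuals."""
    i, j = rng.integers(len(pop), size=2)
    return pop[i] if scores[i] >= scores[j] else pop[j]


def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.5             # random bit-string chromosomes
    scores = np.array([fitness(ind, X, y) for ind in pop])
    best, best_score = pop[scores.argmax()].copy(), scores.max()
    for _ in range(n_gen):
        children = []
        for _ in range(pop_size):
            a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
            cut = rng.integers(1, n_var)                    # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_var) < p_mut              # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        scores = np.array([fitness(ind, X, y) for ind in pop])
        if scores.max() > best_score:                       # remember the best-ever subset
            best, best_score = pop[scores.argmax()].copy(), scores.max()
    return best, best_score


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 12))
y = 2.0 * X[:, 0] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=50)
mask, score = ga_select(X, y)
print(np.flatnonzero(mask), round(float(score), 3))
```

In practice the fitness would be a cross-validated prediction error. The in-sample R² used here rewards overfitting.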
Systematic errors can occur in every chemical analysis, independent of the method used. In general, the risk of systematic errors can be reduced by separation steps prior to the determination. In sensor measurements, systematic errors can be detected only by analysing real samples of known analyte concentration across a wide variety of matrices. The sources of such errors are manifold, even when the results show excellent reproducibility. The main reason for systematically biased results with chemo- and biosensors lies in the influence of the sample matrix on the sensor signal, which ought to be produced by the analyte alone. Examples of strong matrix interferences for different sensor principles are presented, and a classification of the most prominent systematic errors known in analytical chemistry is given. Apart from problems related to a lack of selectivity, which lead to co-sensing of interferents, the matrix often influences the sensitivity (slope of the calibration curve) and/or the level of the blank signal in an unpredictable manner. Compensation methods, such as the well-known blank-signal subtraction or differential sensor measurements, work properly only if certain conditions are fulfilled. Signal additivity has to be proven, and the invariance of the sensitivity has to be demonstrated in every case and for different matrices. These requirements must also be fulfilled with sensor arrays.
A new set of derived variables is proposed for exhibiting group separation in multivariate data, or for preprocessing such data prior to discriminant analysis. The technique combines optimal features of canonical variate analysis and principal component analysis. The derived variables are linear combinations of the original variables that optimize the canonical variate criterion (ratio of between-group to within-group variance), subject to the orthogonality constraints of principal components. In this formulation the canonical variates can be derived even when the within-group matrix is singular (i.e. when there are more variables than objects in the data matrix). A simple computational algorithm for extracting these variables is proposed. The methods are illustrated on several data sets and compared with alternative techniques such as principal component analysis and partial least squares.
Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification, and there is substantial empirical evidence that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over PCA when discrimination is the goal and dimension reduction is needed.
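One concrete piece of the PLS-LDA connection can be checked directly. Take two classes coded 0/1 and mean-centred data. The first PLS weight vector is proportional to Xᵀy, and it points along the difference between the class means, because X_cᵀy_c = (n0·n1/n)(x̄1 − x̄0). The sketch below verifies this on random placeholder data. It covers only the first component and is not the paper's formal argument.

```python
import numpy as np

rng = np.random.default_rng(2)
X0 = rng.normal(0.0, 1.0, size=(20, 6))         # class 0 (placeholder data)
X1 = rng.normal(0.5, 1.0, size=(15, 6))         # class 1, shifted mean
X = np.vstack([X0, X1])
y = np.r_[np.zeros(20), np.ones(15)]            # dummy-coded class membership

Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = Xc.T @ yc                                   # first PLS1 weight, before normalisation
w /= np.linalg.norm(w)

d = X1.mean(axis=0) - X0.mean(axis=0)           # between-class mean difference
d /= np.linalg.norm(d)
print(np.allclose(w, d))                        # expected: True
```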
Please cite this article as: Emanuella Santos Sousa, Mateus P. Schneider, Licarion Pinto, Mario Cesar Ugulino de Araujo, Adriano de Araújo Gomes, Chromatographic quantification of seven pesticide residues in vegetable: univariate and multiway calibration comparison, Microchemical Journal (2019), doi: https://doi.
- by Mário César Ugulino de Araújo and +1
- Chemometrics, Chromatography
Sensitive and selective stability-indicating assay methods (SIAMs) are proposed for the determination of cilostazol (CIL) in the presence of its acid, alkaline and oxidative degradation products. Developing SIAMs is a prerequisite for any stability study. Stress testing of CIL was performed according to the International Conference on Harmonization (ICH) guidelines in order to validate the stability-indicating power of the analytical procedures. Stress testing showed that CIL underwent acid, alkaline and oxidative degradation but was stable towards photo- and thermal degradation. Two chromatographic SIAMs were developed: an HPLC method and an HPTLC method. The concentration range and mean percentage recovery were 1.0-31.0 μg/ml and 99.96 ± 0.46 for the HPLC method, and 0.6-14.0 μg/spot and 99.88 ± 1.10 for the HPTLC method. In addition, derivative spectrophotometric methods were developed to determine CIL in the presence of its acid degradation product. These used third-derivative spectra (³D) and the first derivative of the ratio spectra (¹DD). The linearity range and mean percentage recovery were 2.0-34.0 μg/ml and 100.27 ± 1.20 for the ³D method, and 2.0-30.0 μg/ml and 99.94 ± 1.18 for the ¹DD method. Two chemometric-assisted spectrophotometric methods were also developed for the determination of CIL, based on partial least squares (PLS) and concentration residual augmented classical least squares (CRACLS). Both were applied to zero-order spectra of mixtures of CIL and its acid degradation product, giving mean percentage recoveries of 100.03 ± 1.09 (PLS) and 99.91 ± 1.27 (CRACLS). All methods were validated according to the ICH guidelines and applied to bulk powder and pharmaceutical formulations.
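The idea behind the ratio-spectra derivative (¹DD) method can be sketched with synthetic spectra. Dividing a mixture spectrum by the degradant's spectrum turns the degradant's contribution into a constant. The first derivative removes that constant, leaving a signal proportional to the analyte concentration. The Gaussian bands, wavelength grid and concentrations below are illustrative assumptions, not CIL data. The sketch also omits the smoothing and scaling factors often applied in practice.

```python
import numpy as np

wl = np.linspace(220.0, 320.0, 501)          # hypothetical wavelength grid, nm
step = wl[1] - wl[0]


def band(center, width, height=1.0):
    """Gaussian absorption band (illustrative shape only)."""
    return height * np.exp(-0.5 * ((wl - center) / width) ** 2)


# Stand-in unit-concentration spectra; NOT the real CIL or degradant spectra
cil = band(255, 12) + band(290, 10, 0.4)
deg = band(240, 15) + band(300, 12, 0.2)


def ratio_first_derivative(mixture, divisor, step):
    """1DD: divide by the divisor spectrum, then differentiate w.r.t. wavelength."""
    return np.gradient(mixture / divisor, step)


# Mixture of 0.8 units of analyte and 0.3 units of degradant (additive absorbances)
mix = 0.8 * cil + 0.3 * deg
print(np.allclose(ratio_first_derivative(mix, deg, step),
                  0.8 * ratio_first_derivative(cil, deg, step)))   # expected: True
```

The degradant level (0.3) drops out entirely. The ¹DD amplitude read at a chosen wavelength can then be calibrated against analyte concentration alone.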
2,5-Hexanedione (2,5-HD) is the most important urinary metabolite of n-hexane and methyl n-butyl ketone in humans. Urinary 2,5-HD is used as a biomarker for the biological monitoring of workers exposed to n-hexane. A simple method using headspace solid-phase microextraction (HS-SPME) and gas chromatography (GC) with a flame-ionization detector (FID) was developed. The parameters that affect the HS-SPME-GC-FID process (fiber coating, sample volume, adsorption and heating time, salt addition, and extraction temperature) were optimized. The assay was linear from 0.075 to 20.0 mg/L, with a precision (coefficient of variation) below 7.0% and a detection limit of 0.025 mg/L for 2,5-HD in urine. The method was successfully applied to the analysis of 2,5-HD in urine samples from eight workers occupationally exposed to n-hexane in shoemakers' glue.
... demonstrated that n-hexane was the main solvent present (4). Workers are exposed predominantly in the glue industry and in shoe manufacture and repair, mainly by inhalation, whereas dermal absorption occurs only through direct skin contact with the glue. The most widely used indicator for monitoring workers exposed to n-hexane is urinary 2,5-HD. Many studies have demonstrated a good correlation between workplace exposure to n-hexane and urinary excretion of 2,5-HD (5-7). Urinary 2,5-HD is recommended over air monitoring for assessing health risks, specifically for the early detection of n-hexane neurotoxicity (8). Whether total 2,5-HD (determined after urine hydrolysis) or free 2,5-HD should be used for biological monitoring of exposure to n-hexane has been the subject of considerable controversy (9-14). The American Conference of Governmental Industrial Hygienists (ACGIH) (15) recommends determining free 2,5-HD in urine collected at the end of a work shift at the end of the workweek, with a biological exposure index (BEI) of 0.4 mg/L.
This decision was made because the other n-hexane metabolites (4,5-dihydroxy-2-hexanone and 5-hydroxy-2-hexanone) are converted into 2,5-HD during hydrolysis. Total and free 2,5-HD are both suitable from an analytical point of view and meaningful for biological monitoring, because the neurotoxic risk arising from conjugated metabolites is not relevant (12,16,17). Gas chromatography-mass spectrometry (GC-MS) and GC with flame-ionization detection (FID) are the techniques most widely used for quantifying 2,5-HD in urine (13,14,18-20). Other options are electron capture detection (ECD) after derivatization (10) and high-performance liquid chromatography with UV, fluorescence or MS detection (21,22). However, whatever method is used for identification, the analyte must first be extracted from the biological matrix. Liquid-liquid and solid-phase extraction are commonly used procedures (9,10,13,14). Solid-phase microextraction (SPME) is a very sensitive and selective extraction process that integrates sampling, extraction, concentration, and sample introduction into a single...
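The abstract reports a linear range and a detection limit but not how the limit was estimated. One common convention (ICH Q2) takes LOD = 3.3·s/b and LOQ = 10·s/b, where b is the calibration slope and s the residual standard deviation of the regression. A minimal sketch of that calculation follows. It uses hypothetical standards spanning the reported 0.075 to 20.0 mg/L range. `calibration_with_lod` is a hypothetical helper, and its numbers do not reproduce the paper's.

```python
import numpy as np


def calibration_with_lod(conc, signal):
    """Straight-line calibration plus ICH-style LOD (3.3 s/b) and LOQ (10 s/b),
    where s is the residual standard deviation and b the slope."""
    conc, signal = np.asarray(conc, float), np.asarray(signal, float)
    slope, intercept = np.polyfit(conc, signal, 1)
    resid = signal - (slope * conc + intercept)
    s_yx = np.sqrt(resid @ resid / (len(conc) - 2))   # n - 2 degrees of freedom
    return slope, intercept, 3.3 * s_yx / slope, 10.0 * s_yx / slope


# Hypothetical standards (mg/L) and detector responses (arbitrary units)
conc = np.array([0.075, 0.5, 1.0, 5.0, 10.0, 20.0])
signal = 2.0 * conc + 0.1 + np.array([0.01, -0.01, 0.01, -0.01, 0.01, -0.01])
slope, intercept, lod, loq = calibration_with_lod(conc, signal)
print(round(slope, 4), round(lod, 4), round(loq, 4))
```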
To reduce storage requirements and CPU time in the numerical analysis of large two-dimensional (hyphenated, second-order) infrared spectra, a data-preprocessing (compression) technique based on B-splines is presented. B-splines were chosen as the compression method because they are well suited to modelling smooth curves. Compression has two primary goals: reducing file size and reducing computation when analysing the compressed representation. The compressed representation of the spectra is used as a substitute for the original representation. For the example used here, approximately 0.16 bit per data element was required for the compressed representation, compared with 16 bits per data element in the uncompressed representation. The compressed representation was further analysed using principal component analysis and compared with a similar analysis of the original data set. The results show that the principal component model of the compressed representation is directly comparable with that of the original data.
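The compression idea can be sketched in one dimension. A smooth spectrum sampled at many points is replaced by a much smaller vector of least-squares B-spline coefficients. It is reconstructed by multiplying those coefficients by the basis matrix. The uniform clamped knots, cubic degree, 40 intervals and synthetic two-band spectrum below are assumptions for illustration. The paper's knot choice, bit allocation and 2-D treatment are not reproduced; a 2-D version could apply one basis along each axis.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """All B-spline basis functions of the given degree at points x (Cox-de Boor)."""
    n_int = len(knots) - 1
    B = np.zeros((len(x), n_int))
    for i in range(n_int):                              # degree 0: interval indicators
        B[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    last = np.flatnonzero(knots < knots[-1])[-1]        # last non-empty interval
    B[x == knots[-1], last] = 1.0                       # include the right end point
    for k in range(1, degree + 1):                      # raise the degree step by step
        B_next = np.zeros((len(x), n_int - k))
        for i in range(n_int - k):
            left = knots[i + k] - knots[i]
            right = knots[i + k + 1] - knots[i + 1]
            if left > 0:
                B_next[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                B_next[:, i] += (knots[i + k + 1] - x) / right * B[:, i + 1]
        B = B_next
    return B


def compress(signal, x, n_intervals, degree=3):
    """Least-squares B-spline coefficients on uniform, clamped knots."""
    inner = np.linspace(x[0], x[-1], n_intervals + 1)
    knots = np.r_[[x[0]] * degree, inner, [x[-1]] * degree]
    B = bspline_basis(x, knots, degree)
    coef, *_ = np.linalg.lstsq(B, signal, rcond=None)
    return coef, B


x = np.linspace(0.0, 1.0, 1000)
spectrum = np.exp(-0.5 * ((x - 0.3) / 0.1) ** 2) + 0.5 * np.exp(-0.5 * ((x - 0.7) / 0.12) ** 2)
coef, B = compress(spectrum, x, n_intervals=40)
print(coef.size, np.abs(B @ coef - spectrum).max())   # 43 coefficients replace 1000 samples
```

Because later analyses such as PCA can work on `coef` instead of the raw samples, both storage and computation shrink roughly by the ratio of sample count to coefficient count.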
In this study, chemical characteristics of wood are used for plant taxonomic classification based on the current Angiosperm Phylogeny Group classification (APG III System), at the division, class and subclass levels of woody plants. Infrared spectra contain information about the molecular structure of, and intermolecular interactions among, the components of wood. Extracting this information, however, requires multivariate techniques for analysing highly dense datasets. The aims of this article are to specify the chemical differences among taxonomic groups and to predict the taxa of unknown samples with a mathematical model. The multivariate techniques used included principal component analysis, t-tests, stepwise discriminant analysis and linear discriminant analysis. A procedure to determine the division, class, subclass, order and family of unknown samples was built. It has promising implications for future applications of Fourier transform infrared spectroscopy in wood taxonomic classification.
We investigate the potential of ultraviolet (UV) spectroscopy to monitor the cleaning of whey filtration membrane units. Based on sample collections in a full-scale production environment, two cases of cleaning monitoring by UV are evaluated. The first case demonstrates that UV can measure progress during both recirculation cleaning and the subsequent flushing of cleaning agents. The second case establishes that the kinetics of the enzymatic cleaning step can be followed by UV and shows that different cleaning mechanisms act simultaneously. We also assess the detection limit for different whey components in a mixture design. Results show that UV combined with partial least squares regression can quantify whey components down to 10–20 ppm for whey protein, 5–9 ppm for whey fat, and 60–80 ppm for non-protein/fat solids.
This work explores the potential of Fourier transform infrared spectroscopy (FTIR) for studying the effects of industrial processes on poly(vinyl chloride) (PVC)-based plastic materials used for extracorporeal medical disposables. The processes considered are extrusion, sterilization and storage. In particular, FTIR with a single-reflection attenuated total reflectance (ATR) accessory mounted on an infrared microscope (Mic-IR) was used. The paper also proposes a quality-control method for semi-finished blood-circuit components based on the chemometric evaluation of surface-selective spectroscopic signals, i.e. Mic-IR/ATR spectra. The results suggest that IR spectroscopy coupled with a multivariate approach might be a simple and powerful method for quality control of industrial processes.
The goal of this paper is to develop a multivariate calibration method for the quantitative determination of petroleum hydrocarbons in water and wastewater. The method uses FT-IR spectroscopy with PLS regression, aiming to improve on the results currently obtained with the univariate standard method. To evaluate the regression model, four figures of merit were examined on an independent validation set of spiked samples: root mean square error of prediction (RMSEP), average recovery, standard deviation, and relative standard deviation. For comparison, a univariate model was developed alongside the multivariate one. The results show that the multivariate calibration method outperforms the univariate standard method. The accuracy, detection capability and high recoveries obtained show that multivariate calibration of IR spectra is a very promising option for improving on the current univariate standard method for petroleum hydrocarbons in water and wastewater.
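The figures of merit listed above are straightforward to compute once reference (spiked) and predicted concentrations are available. A minimal sketch follows. Recovery is taken as 100·predicted/reference per sample; this is a common definition but an assumption here. The three validation values are hypothetical.

```python
import numpy as np


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction over a validation set."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))


def recovery_stats(y_ref, y_pred):
    """Mean recovery (%), its standard deviation, and the relative SD (%)."""
    rec = 100.0 * np.asarray(y_pred, float) / np.asarray(y_ref, float)
    mean, sd = rec.mean(), rec.std(ddof=1)
    return mean, sd, 100.0 * sd / mean


# Hypothetical spiked validation samples (mg/L): reference vs. PLS-predicted
ref = [10.0, 20.0, 40.0]
pred = [11.0, 19.0, 40.0]
print(round(rmsep(ref, pred), 4))          # 0.8165
print([round(float(v), 2) for v in recovery_stats(ref, pred)])
```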
The pair correlation method (PCM) was developed for choosing between two correlated predictor variables (factors) when the scatter is not caused by random effects alone. The two variables can be distinguished by arranging the data in a 2 × 2 contingency table.
Different estimators of the Mahalanobis distance (such as one based on the Defrise-Gussenhoven correction) are studied and compared with respect to the bias of the distance and the characteristics (sensitivity and specificity) of the class model.
- by Michele Forina and +1
- Analytical Chemistry, Chemometrics
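For reference, here is a sketch of the plain plug-in estimator of the squared Mahalanobis distance, using the sample mean and unbiased covariance of the training class. It does not implement the Defrise-Gussenhoven correction or the other estimators compared in the paper. The example shows one source of bias. Averaged over the training objects themselves, the squared distance equals (n - 1)p/n exactly, which is below the number of variables p. New objects from the same population tend to lie further out.

```python
import numpy as np


def mahalanobis_sq(X_train, X_new):
    """Squared Mahalanobis distances of X_new rows to the X_train class centroid.

    Plain plug-in estimator: sample mean and unbiased (n - 1) covariance,
    with no small-sample bias correction.
    """
    mean = X_train.mean(axis=0)
    S = np.cov(X_train, rowvar=False)
    diff = np.atleast_2d(X_new) - mean
    # solve S z = diff^T rather than forming S^-1 explicitly
    return np.einsum("ij,ji->i", diff, np.linalg.solve(S, diff.T))


rng = np.random.default_rng(8)
X = rng.normal(size=(25, 4))
d2 = mahalanobis_sq(X, X)
print(round(d2.mean(), 6))   # 3.84, i.e. (n - 1) p / n with n = 25, p = 4
```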
The emission spectrum measured in the mid-infrared (IR) band from a rocket plume can be used to identify rockets and track inbound missiles. It is useful to test the stealth properties of a rocket's IR fingerprint during its design phase without spending excessive amounts of money on field trials. Modelled predictions of the IR spectra from selected rocket motor design parameters can therefore significantly reduce development costs. In a recent doctorate study it was f...
In this paper, different three-way methods are tested for their power and shortcomings in solving complex second-order calibration problems. The generic problem is quantifying an analyte in the presence of an unknown interferent, which is a second-order calibration problem. Because of rank restrictions in the data, standard second-order calibration methods such as Generalized Rank Annihilation cannot be used to solve the complex problems shown in this paper. Several real examples show that the three-way methods can, to a certain extent, deal with these complex calibrations. This underlines that all second-order calibration methods should be regarded as three-way methods and, when placed in this framework, can be compared with respect to their performance.
The use of artificial neural networks (ANNs) was studied for nonlinear modelling of symmetric and nonsymmetric peaks in capillary zone electrophoresis (CZE) and for optimizing CZE methods. ANNs were shown to estimate peak parameters and, combined with experimental design, to predict optimal separation conditions efficiently. The great advantage is that neither an explicit model of the separation process nor knowledge of the physicochemical constants is needed.
A new user-friendly graphical interface for Multivariate Curve Resolution-Alternating Least Squares (MCR-ALS) has been developed as a freely available MATLAB toolbox. The interface makes two selections intuitive. The first is the type of data analysis: either individual experiments giving a single data matrix, or the more powerful simultaneous analysis of several experiments using one or more techniques. The second is the set of appropriate constraints. Several examples of using the interface are given. © 2004 Published by Elsevier B.V.
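The alternating least-squares core that such a toolbox wraps can be sketched for a single data matrix, D ≈ C Sᵀ. This bare-bones version assumes two components, initial spectral estimates, and non-negativity imposed by clipping after each unconstrained step. It is not the toolbox's code. Real MCR-ALS implementations use proper constrained solvers, further constraints (e.g. closure, unimodality), multi-matrix augmentation and convergence criteria. The data are synthetic.

```python
import numpy as np


def mcr_als(D, S_init, n_iter=100):
    """Bare-bones MCR-ALS for one data matrix, D ~ C @ S.T.

    Non-negativity is imposed by clipping negative values to zero after each
    unconstrained least-squares step (a crude stand-in for proper NNLS).
    """
    S = S_init.copy()
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(S.T), 0.0, None)    # concentration profiles
        S = np.clip((np.linalg.pinv(C) @ D).T, 0.0, None)   # pure spectra
    lof = 100.0 * np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)   # lack of fit, %
    return C, S, lof


# Synthetic two-component data: Gaussian elution profiles x Gaussian spectra
t = np.linspace(0, 1, 60)[:, None]
wl = np.linspace(0, 1, 80)[:, None]
C_true = np.hstack([np.exp(-((t - 0.4) / 0.1) ** 2), np.exp(-((t - 0.6) / 0.1) ** 2)])
S_true = np.hstack([np.exp(-((wl - 0.3) / 0.15) ** 2), np.exp(-((wl - 0.7) / 0.15) ** 2)])
D = C_true @ S_true.T
rng = np.random.default_rng(6)
C, S, lof = mcr_als(D, S_true * (1 + 0.3 * rng.random(S_true.shape)))
print(C.shape, S.shape, round(float(lof), 3))
```

Even with a small lack of fit, the resolved profiles are defined only up to scaling and, in general, rotational ambiguity. This is why constraint selection, which the interface exposes, matters.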
In this paper, the electrochemical application of a bismuth-film-modified glassy carbon electrode to the determination of azo colorants was investigated. The bismuth-film electrode (BiFE) was prepared by ex situ deposition of bismuth onto glassy...
Instrumental bitterness assessment of traditional Chinese herbal medicine (TCM) preparations was addressed in this study. Three approaches were evaluated: high-performance liquid chromatography with UV detection (HPLC), capillary electrophoresis with UV detection (CE), and a potentiometric multisensor system, the electronic tongue (ET). Most studies involving HPLC and CE use these separations to quantify individual substances. Here, however, they were employed to provide chromatographic or electrophoretic sample profiles, which are somewhat analogous to the profiles produced by the ET. Profiles from all instruments were then related to professional sensory panel evaluations using projections to latent structures (PLS) regression. All three methods allowed bitterness assessment of TCM samples in terms of the human sensory panel, with root mean squared errors of prediction of ca. 0.9 within bitterness scale fro...
The O2-PLS method is derived from the basic partial least squares projections to latent structures (PLS) prediction approach. The importance of the covariation matrix (YᵀX) is pointed out in relation to both the prediction model and the structured noise in both X and Y. Structured noise in X (or Y) is defined as the systematic variation of X (or Y) not linearly correlated with Y (or X). Examples in spectroscopy include baseline, drift and scatter effects. If structured noise is present in X, the existing latent variable regression (LVR) methods, e.g. PLS, will have weakened score-loading correspondence beyond the first component. This negatively affects the interpretation of model parameters such as scores and loadings. The O2-PLS method models and predicts both X and Y. It has an integral orthogonal signal correction (OSC) filter that separates the structured noise in X and Y from their joint X-Y covariation used in the prediction model. This leads to a minimal number of predictive components with full score-loading correspondence and also an opportunity to interpret the structured noise. In both a real and a simulated example, O2-PLS and PLS gave very similar predictions of Y. However, the interpretation of the prediction models was clearly improved with O2-PLS, because structured noise was present. In the NIR example, O2-PLS revealed a strong water peak and baseline offset in the structured noise components. In the simulated example, the O2-PLS plot of observed versus predicted Y-scores (u vs. û) showed good predictions. The corresponding loading vectors provided good interpretation of the covarying analytes in X and Y.
The transfer of analytical methods from a sending laboratory to a receiving one requires a guarantee that the receiving laboratory will obtain accurate results. Undeniably, method transfer is the ultimate step before routine implementation of the method at the receiving site. The conventional statistical approaches generally used in this domain, which analyze the trueness and precision characteristics of the receiver separately, do not achieve this. Therefore, this paper first aims to demonstrate the applicability of two recent statistical approaches, which use a total error-based criterion and take into account the uncertainty of the sending laboratory's estimate of the true value, to the transfer of bioanalytical methods. To achieve this, they were successfully applied to the transfer of two fully automated liquid chromatographic methods coupled on-line to solid-phase extraction. The first was dedicated to the determination of three catecholamines in human urine using electrochemical detection, and the second to the quantitation of N-methyl-laudanosine in plasma using fluorescence detection. Secondly, a risk-based evaluation is made in order to understand why classical statistical approaches are not sufficient to guarantee that the analytical method will give accurate results most of the time during its routine use. Finally, some recommendations for transfer studies are proposed.
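The idea of a total error criterion can be sketched with a simplified β-expectation tolerance interval for the receiving laboratory's relative errors at one concentration level, which is accepted if it lies within ±λ acceptance limits. This is an assumption-laden simplification: it treats the sending laboratory's value as exact (ignoring the uncertainty the paper accounts for), assumes normally distributed errors from a single series, and the 15% limit and function names are hypothetical.

```python
import numpy as np
from scipy import stats

def beta_expectation_interval(rel_errors, beta=0.95):
    """Simplified beta-expectation tolerance interval for relative errors (%)
    at one concentration level, single series, normal errors assumed."""
    e = np.asarray(rel_errors, dtype=float)
    n = e.size
    mean, sd = e.mean(), e.std(ddof=1)
    # Student t quantile widened by sqrt(1 + 1/n) for a future single result
    k = stats.t.ppf((1 + beta) / 2, df=n - 1) * np.sqrt(1 + 1 / n)
    return mean - k * sd, mean + k * sd

def transfer_accepted(rel_errors, limit=15.0, beta=0.95):
    """Accept if the whole interval lies within +/- limit (hypothetical 15%)."""
    lo, hi = beta_expectation_interval(rel_errors, beta)
    return bool(-limit <= lo and hi <= limit)
```

Unlike separate tests on bias and precision, a single interval combines both, so a small bias with large scatter can still fail.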
Nitrogen-rich adulterants in protein powders present sensitivity challenges to conventional combustion methods of protein determination, which can be overcome by near-infrared spectroscopy (NIRS). NIRS is a rapid, highly sensitive and non-invasive analytical method. This study developed robust models using benchtop and handheld spectrometers to predict low concentrations of urea, glycine, taurine and melamine in whey protein powder (WPP). The effectiveness of scanning samples through optical glass and polyethylene bags was also tested for the handheld NIRS. WPP was adulterated at up to six concentration levels from 0.5% to 3% w/w. The two spectrometers were used to obtain three datasets of 819 diffuse reflectance spectra each, which were pretreated before linear discriminant analysis (LDA) and partial least squares regression (PLSR). Pretreatment was effective and revealed important absorption bands that could be correlated with the chemical properties of the mixtures. The benchtop NIR spectrometer showed the best results in LDA and PLSR, but the handheld NIR spectrometers showed comparatively good results. High prediction accuracies and low errors in independent test set validation attest to the robustness of the developed PLSR models. Both the plastic bag and the optical glass gave good results, with accuracies depending on the adulterant of interest, and can be used for field applications.
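The abstract does not say which pretreatments were applied. As one common choice for diffuse reflectance spectra, standard normal variate (SNV) scaling is sketched below; treat it as an assumed example, not the study's pipeline. SNV removes an additive offset and a multiplicative scaling from each spectrum individually, and the function name is hypothetical.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) by its
    own mean and standard deviation."""
    S = np.asarray(spectra, dtype=float)
    mu = S.mean(axis=1, keepdims=True)
    sd = S.std(axis=1, ddof=1, keepdims=True)
    return (S - mu) / sd
```

Because each row is treated on its own, SNV can be applied to new spectra (e.g. from the handheld device) without reference to the calibration set.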
A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O-PLS), is described. O-PLS removes variation from X (descriptor variables) that is not correlated to Y (property variables, e.g. yield, cost or toxicity). In mathematical terms, this is equivalent to removing systematic variation in X that is orthogonal to Y. In an earlier paper, Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175-185) described orthogonal signal correction (OSC). This paper describes a method with the same objective but different means. The proposed O-PLS method analyzes the variation explained in each PLS component. The non-correlated systematic variation in X is removed, making interpretation of the resulting PLS model easier, with the additional benefit that the non-correlated variation itself can be analyzed further. As an example, near-infrared (NIR) reflectance spectra of wood chips were analyzed. Applying O-PLS resulted in reduced model complexity with preserved prediction ability, effective removal of non-correlated variation in X and, not least, improved interpretability of both correlated and non-correlated variation in the NIR spectra.
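A minimal sketch of removing one y-orthogonal component with the usual single-y O-PLS steps (weight, score, loading, then the part of the loading orthogonal to the weight). It assumes column-centered X and centered y, and the function name `opls_remove_one` is hypothetical; repeating the step on the filtered matrix would remove further orthogonal components.

```python
import numpy as np

def opls_remove_one(X, y):
    """Remove one y-orthogonal component from column-centered X (single y).
    Returns the filtered X, the orthogonal score and the orthogonal loading."""
    w = X.T @ y / (y @ y)
    w /= np.linalg.norm(w)           # predictive weight (unit length)
    t = X @ w                        # predictive score
    p = X.T @ t / (t @ t)            # loading of the predictive score
    w_o = p - (w @ p) * w            # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)       # fails if X holds no y-orthogonal variation
    t_o = X @ w_o                    # y-orthogonal score
    p_o = X.T @ t_o / (t_o @ t_o)    # y-orthogonal loading
    X_filt = X - np.outer(t_o, p_o)  # remove the orthogonal component
    return X_filt, t_o, p_o
```

Because t_o is orthogonal to y, the filtered matrix keeps the same Xᵀy as the original, consistent with the preserved prediction ability reported above, while the removed t_o and p_o can be inspected as non-correlated variation.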
The paper derives a new set of principal properties (PPs) for coded amino acids from GRID maps and experimental data. The three scales characterize side chains according to their polarity (PP1), size/hydrophobicity (PP2) and H-bonding capability (PP3), and can be used profitably both for describing and for designing peptide series. The new parameters are further used to develop modified auto- and cross-covariance transforms, which appear to be even more suitable for the stated goals, as they label each peptide according to its bonding capabilities.
Figure 3. PC score plot (PC1 vs. PC2) obtained from a PCA model of 322 tripeptides described by nine variables (PPs). From each cluster, one tripeptide (grey point) was selected.
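The transforms in the paper are described as modified; for orientation, the standard auto- and cross-covariance (ACC) transform of a peptide described by per-residue scales is sketched below. It is not the paper's modified version, and the function name and lag convention (mean product of the descriptors at residue i and residue i + lag) are assumptions. It turns each peptide into a feature vector whose length depends only on the number of scales and the maximum lag.

```python
import numpy as np

def acc_transform(Z, max_lag):
    """Standard auto- and cross-covariance terms for one peptide.
    Z: (n_residues, n_scales) matrix of per-residue descriptors (e.g. PP1-PP3).
    Requires n_residues > max_lag."""
    Z = np.asarray(Z, dtype=float)
    n, m = Z.shape
    features = []
    for lag in range(1, max_lag + 1):
        A, B = Z[:-lag], Z[lag:]        # residue i and residue i + lag
        C = A.T @ B / (n - lag)         # C[j, k] = mean of z[i, j] * z[i + lag, k]
        features.append(C.ravel())      # diagonal: auto-, off-diagonal: cross-covariances
    return np.concatenate(features)
```

For a tripeptide with three scales and lags 1 and 2 this gives 2 × 3 × 3 = 18 features, matching the nine-variables-per-lag structure of a three-scale description.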
Concentrations of multiple analytes were simultaneously measured in whole blood with clinical accuracy, without sample processing, using near-infrared Raman spectroscopy. Spectra were acquired with an instrument employing nonimaging optics, designed using Monte Carlo simulations of the influence of light-scattering and light-absorbing blood cells on the excitation and emission of Raman light in a turbid medium. Raman spectra were collected from whole blood drawn from 31 individuals. Quantitative predictions of glucose, urea, total protein, albumin, triglycerides, hematocrit and hemoglobin were made by means of partial least-squares (PLS) analysis with clinically relevant precision (r² values > 0.93). The similarity of the features of the PLS calibration spectra to those of the respective analyte spectra illustrates that the predictions are based on molecular information carried by the Raman light. This demonstrates the feasibility of using Raman spectroscopy for quantitative measurements of biomolecular contents in highly light-scattering and absorbing media.
The ability to determine the molar mass of a polymer is of fundamental importance to describe polymer molecular characteristics. Conventional methods for measuring molar mass include viscometry, osmometry, light scattering and analytical gel permeation chromatography (GPC). Although high quality data can be obtained by these methods, the results can be significantly affected by sample preparation, and they are often time consuming and unsuitable for real-time on-line processing.
Wine samples from four different countries (Hungary, Czech Republic, Romania and South Africa) have been studied within the European project WINES-DB, “establishing of a wine data bank for analytical parameters from third countries”. For each country, two types of wine samples were collected during three consecutive years: commercial wines and wines obtained by microvinification according to EC regulation N. 2729/2000. The sampling design was organized to represent both the grape varieties and the official wine regions in the four countries. The 1188 wine samples were analyzed for 58 chemical quantities. Data analysis was performed with special attention to the real problem, namely the control of fraud. Class modeling techniques (UNEQ, SIMCA, MRM) have been applied to answer the general question: “Does sample O, stated to be of class A, really belong to class A?”. Two validation strategies, based on cross-validation and on an external, representative evaluation set, have been used to carefully evaluate the predictive performance of the class models. The results obtained with the four class modeling techniques indicate that for the four countries it is possible to compute models with high efficiency, generally with a reduced number of variables. To obtain efficient models, red and white wines, and commercial and microvinification wines, must be considered separately. The validity of the models is ensured by the representativeness of the samples, the appropriate application of chemometric techniques and the validation.
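One of the listed techniques, UNEQ, can be sketched as a class model that accepts a sample when its squared Mahalanobis distance to the class centroid is below a critical value. In this minimal sketch the critical value is a chi-squared quantile (a large-sample approximation standing in for the exact Hotelling T²-based limit), and the class and function names are hypothetical. Sensitivity is the fraction of class samples accepted; specificity is the fraction of samples from other classes rejected.

```python
import numpy as np
from scipy import stats

class UneqClassModel:
    """Simplified UNEQ-style class model (chi-squared approximation)."""

    def __init__(self, confidence=0.95):
        self.confidence = confidence

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # Needs more samples than variables for an invertible covariance matrix
        self.cov_inv_ = np.linalg.inv(np.cov(X, rowvar=False))
        self.threshold_ = stats.chi2.ppf(self.confidence, df=X.shape[1])
        return self

    def accepts(self, X):
        D = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_
        d2 = np.einsum("ij,jk,ik->i", D, self.cov_inv_, D)  # squared Mahalanobis distances
        return d2 <= self.threshold_

def sensitivity_specificity(model, X_in, X_out):
    sens = model.accepts(X_in).mean()       # class samples accepted
    spec = 1 - model.accepts(X_out).mean()  # foreign samples rejected
    return float(sens), float(spec)
```

The need to invert a class covariance matrix is one reason UNEQ models benefit from a reduced number of variables.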
The Kool-Aid powders (Kraft General Foods) contain one or two of the following dyes, depending on the taste of the drink: Allura Red (FD and C Red-40, R40), Sunset Yellow (FD and C Yellow-6, Y6), Tartrazine (FD and C Yellow-5, Y5), Erythrosine B (FD and C Red-3, R3), Amaranth (FD and C Red-2, R2) and Brilliant Blue FCF (FD and C Blue-1, B1). In this work, a "universal" calibration matrix is proposed for the determination of R40 in soft drink powders by the partial least squares method using spectrophotometric data. The training set consists of 23 solutions: three of them contain only R40 at different concentrations, and the rest are binary mixtures of R40 with one of the following dyes: R3, R2, Y5 or Y6, all at concentrations in the range 2-22 mg L⁻¹ (exception: R3, 2-12 mg L⁻¹). Good analytical performance was obtained for the R40 calibration (R² = 0.9993, RMSD = 0.2125, REP = 2.20%), and this calibration matrix was applied to the analysis of real samples containing only one dye (R40) and to samples also containing other dyes commonly used as color additives in drink powders. The results obtained were compared with the results of the official spectrophotometric method and with the results of the PLS algorithm for different binary dye calibration matrices. Good statistical agreement was obtained in each case, which confirms that some interferences occurring in spectrophotometric determinations can be eliminated by using the PLS algorithm and including possibly interfering compounds in the calibration samples.
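The figures of merit quoted above can be computed as follows. This sketch uses common definitions (R² as 1 - SS_res/SS_tot, RMSD as the root mean square of the prediction residuals, REP as RMSD relative to the mean reference concentration, in %); the paper's exact definitions may differ slightly, and the function name is hypothetical.

```python
import numpy as np

def calibration_figures(c_true, c_pred):
    """Return (R^2, RMSD, REP %) for reference and predicted concentrations."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    resid = c_pred - c_true
    rmsd = np.sqrt(np.mean(resid ** 2))
    r2 = 1 - np.sum(resid ** 2) / np.sum((c_true - c_true.mean()) ** 2)
    rep = 100 * rmsd / c_true.mean()   # relative error of prediction, %
    return r2, rmsd, rep
```

REP expresses the error relative to the concentration level, which makes it easier to compare calibrations covering different ranges than RMSD alone.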
This paper describes the theoretical background, algorithm and validation of a recently developed novel ranking method based on the sum of ranking differences [TrAC Trends Anal. Chem. 2010; 29: 101-109]. The ranking is intended to compare models, methods, analytical techniques, panel members, etc., and it is entirely general. First, the objects to be ranked are arranged in the rows and the variables (for example, model results) in the columns of an input matrix. Then, the results of each model for the objects are ranked in order of increasing magnitude. The differences between the ranks of the model results and the ranks of the known, reference or standard results are then computed. (If the gold standard ranking is known, the rank differences can be computed easily.) Finally, for each model, the absolute values of the differences are summed over all objects, and these sums are used to compare the models.
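The steps above translate almost directly into code. In this minimal sketch, ties are not handled (the paper's treatment of ties is not described here), and the function names are hypothetical. When no gold standard is available, a consensus such as the row-wise average of the models is a commonly used reference; that choice is an assumption here.

```python
import numpy as np

def ranks(v):
    """Rank values in increasing order (1 = smallest); assumes no ties."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=int)
    r[order] = np.arange(1, len(v) + 1)
    return r

def srd(models, reference):
    """Sum of ranking differences for each model (column).
    models: (n_objects, n_models) results; reference: (n_objects,) values."""
    ref_rank = ranks(reference)
    M = np.asarray(models, dtype=float)
    return np.array([np.abs(ranks(M[:, j]) - ref_rank).sum()
                     for j in range(M.shape[1])])
```

A smaller SRD means a model orders the objects more like the reference; a model that reproduces the reference ordering exactly scores 0.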
the basis of a data set containing laboratory tests for three categories of functional states of the thyroid gland, and the performance is compared with those of other probabilistic pattern recognition techniques. Although the results are acceptable, the kNN method did not perform as well as some other probabilistic techniques. One of the disadvantages of the k-nearest neighbour method (kNN) [1, 2] is the paucity of information provided by this technique. A test object (sample, patient, ...) is usually classified into one of the training classes according to the so-called majority vote. The information obtained about the test object is thus limited to the label of the class to which the object is assigned. Other supervised pattern recognition techniques, such as ALLOC [3, 4], BAYES [5], SLDA [6] and SIMCA [7, 8], are able to estimate the degree of certainty about the decision in terms of the a posteriori probability of class membership. It has been shown [3, 9] that non-parametric supervised methods such as ALLOC permit more flexible and realistic estimates than are possible with parametric techniques such as statistical linear discriminant analysis (SLDA), because the data usually do not fit the assumptions underlying the parametric methods. The kNN method is also non-parametric but, unfortunately, in its classical form it is not suitable for probabilistic classification. In fact, even the probabilistic approach of the alternative vote version of kNN, as discussed in the first article of this series [1], must be considered rudimentary compared to methods such as ALLOC. In the literature [10-13], however, more sophisticated modifications of the kNN method have been proposed in order to allow more accurate probability estimates. The method suggested by Loftsgaarden and Quesenberry [10] forms the basis of the generalized k-nearest neighbour method developed by ...
© 1982 Elsevier Scientific Publishing Company
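For contrast with the probabilistic methods discussed, the classical majority-vote kNN can be sketched in a few lines. The vote fraction returned alongside the label is the kind of rudimentary certainty estimate the text criticizes; it is not the alternative vote rule or the generalized kNN method of the paper. Euclidean distance and the function name are assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, labels, x, k=3):
    """Classical kNN: majority vote among the k nearest training objects
    (Euclidean distance). Returns the winning label and its vote fraction."""
    d = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    votes = Counter(labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]  # equal counts: first label encountered wins
    return label, count / k
```

With k = 3 the "probability" can only take the values 1/3, 2/3 or 1, which illustrates why such estimates are considered coarse.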
This paper evaluates the usefulness of three chemical parameters (tocopherol, sterol and fatty acid compositions) as a tool to discriminate three varietal olive oils (cvs. Cobrançosa, Madural and Verdeal Transmontana), which are permitted cultivars for the production of "Trás-os-Montes olive oil", a Portuguese protected designation of origin (PDO) product. The olives were collected during the 2000/2001 crop year from the same orchard in order to eliminate geographical and climatic influences. Lots with different maturation indices were prepared to allow evaluation of the effect of ripening stage on the characteristics of the varietal olive oils produced from each cultivar. Statistical methods such as multivariate analysis of variance (MANOVA), principal component analysis (PCA) and cluster analysis were used to evaluate significant differences in the studied parameters. The results show that the three cultivars were clearly discriminated.
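Principal component analysis of the combined tocopherol, sterol and fatty acid data could look like the sketch below. Autoscaling each variable is an assumption (the abstract does not state the preprocessing), chosen because the variables are measured on different scales; the function name is hypothetical.

```python
import numpy as np

def pca_autoscaled(X, n_components=2):
    """PCA via SVD of autoscaled data (each variable centered and scaled to
    unit variance). Returns scores, loadings and explained variance ratios."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

Plotting the first two score columns colored by cultivar is the usual way to see whether the oils separate; the loadings show which compounds drive the separation.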
Fourier transform Raman spectroscopy and chemometric tools have been used for exploratory analysis of pure corn and cassava starch samples and mixtures of both starches, as well as for the quantification of amylose content in corn and cassava starch samples. Exploratory analysis using principal component analysis shows that two natural groups of similar samples can be obtained according to amylose content and, consequently, botanical origin. The Raman band at 480 cm⁻¹, assigned to the ring vibration of starches, makes the major contribution to the separation of the corn and cassava starch samples. This region was used as a marker to identify the presence of starch in different samples, as well as to characterize amylose and amylopectin. Two calibration models were developed based on partial least squares regression, one for pure corn and one for pure cassava starch, and a third model including both starches was also built; the results were compared with those of the standard colorimetric method. The samples were separated into calibration and validation groups using the Kennard-Stone algorithm, and the optimum number of latent variables was chosen from the root mean square error of cross-validation obtained on the calibration set by internal (leave-one-out) validation. The performance of each model was evaluated by the root mean square errors of calibration and prediction, and the results indicate that Fourier transform Raman spectroscopy can be used for rapid determination of apparent amylose in starch samples with prediction errors similar to those of the standard method.
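The Kennard-Stone algorithm used to split the samples can be sketched as follows: start from the two most distant samples, then repeatedly add the sample farthest from its nearest already-selected sample. Euclidean distance in the original variable space and the function name are assumptions; the unselected samples form the validation set.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone selection. Returns calibration-set indices in selection
    order (always at least the two most distant samples)."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    i, j = np.unravel_index(np.argmax(D), D.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select and remaining:
        # Distance from each remaining sample to its closest selected sample
        d_min = D[np.ix_(remaining, selected)].min(axis=1)
        nxt = remaining[int(np.argmax(d_min))]
        selected.append(nxt)
        remaining.remove(nxt)
    return selected
```

Because it always picks the sample least covered by the current selection, the calibration set spans the extremes of the data, leaving interior samples for validation.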
Near-infrared (NIR) spectrometry and chemometric classification methods were used to classify 69 samples of alcoholic beverages (whiskey, brandy, rum and vodka) and to verify their adulteration. The drinks were characterized by chemometric models based on principal component analysis and soft independent modelling of class analogy (SIMCA), built for each group. Alcoholic beverages adulterated with 5% and 10% (v/v) of water, ethanol or methanol were analyzed and used to verify the predictive capacity of the models. In addition, real samples suspected of adulteration were analyzed by gas chromatography and used to test the chemometric models, which were able to predict the adulteration in these samples. The proposed method was successfully applied to verify alcoholic beverage adulteration, with 100% correct prediction at the 95% confidence level. The absence of reagents, low sample consumption, high sampling throughput and good predictive ability enable the developed methodology to be applied as a screening analysis to verify adulteration of alcoholic beverages, that is, a prior step that sends a sample for deeper analysis only when the proposed method gives a positive result for adulteration.
- by Mário César Ugulino de Araújo and +2
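A SIMCA class model of the kind mentioned above is essentially a PCA model per class plus an acceptance limit on the distance to that model. In this simplified sketch the limit is an empirical quantile of the training residual distances, rather than the F-test-based limit used in classical SIMCA; the class name, number of components and quantile are assumptions.

```python
import numpy as np

class SimcaClass:
    """Simplified SIMCA model for a single class."""

    def __init__(self, n_components=2, quantile=0.95):
        self.n_components = n_components
        self.quantile = quantile

    def _residual(self, X):
        # Orthogonal (residual) distance of each sample to the class PCA model
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean_
        E = Xc - (Xc @ self.P_) @ self.P_.T
        return np.linalg.norm(E, axis=1)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.P_ = Vt[: self.n_components].T      # PCA loadings of the class
        # Acceptance limit: empirical quantile of training residual distances
        self.limit_ = np.quantile(self._residual(X), self.quantile)
        return self

    def accepts(self, X):
        return self._residual(X) <= self.limit_
```

A spectrum of an adulterated drink that does not fit the PCA model of its declared class would be rejected and, in a screening setting, passed on for confirmatory analysis.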
In spectroscopy, the measured spectra are typically plotted as a function of the wavelength (or wavenumber), but analysed with multivariate data analysis techniques (multiple linear regression (MLR), principal components regression (PCR), partial least squares (PLS)) which consider the spectrum as a set of m different variables. From a physical point of view, it could be more informative to describe the spectrum as a function rather than as a set of points, thereby taking into account the physical background of the spectrum: it is a sum of absorption peaks of the different chemical components, and the absorbances at two nearby wavelengths are highly correlated. In the first part of this contribution, a motivating example for this functional approach is given. In a second part, the potential of functional data analysis is discussed in the field of chemometrics and compared to the ubiquitous PLS regression technique using two practical data sets. It is shown that for spectral data, B-splines prove to be an appealing basis to describe the data accurately. When both functional data analysis and PLS are applied to the data sets, the predictive ability of functional data analysis is found to be comparable to that of PLS. Moreover, many chemometric datasets have a specific structure (e.g. replicate measurements on the same object, or objects that are grouped), but this structure is often removed before analysis (e.g. by averaging the replicates). In the second part of this contribution, we suggest a method to adapt traditional analysis of variance (ANOVA) methods to datasets with spectroscopic data. In particular, the possibilities to explore and interpret sources of variation, such as variations in sample and ambient temperature, are examined.
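A B-spline representation of a spectrum can be obtained by least-squares fitting with SciPy's `make_lsq_spline`. The sketch assumes equally spaced interior knots and a cubic basis; the number and placement of knots are tuning choices not specified in the abstract, and the function name is hypothetical. The fitted coefficients can then serve as the functional counterparts of the m wavelength variables.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def bspline_smooth(wavelengths, spectrum, n_interior_knots=20, k=3):
    """Least-squares B-spline fit of one spectrum (wavelengths must be
    increasing). Returns the spline object and its coefficients."""
    x = np.asarray(wavelengths, dtype=float)
    y = np.asarray(spectrum, dtype=float)
    interior = np.linspace(x[0], x[-1], n_interior_knots + 2)[1:-1]
    # Boundary knots repeated k + 1 times, as required for a clamped basis
    t = np.r_[[x[0]] * (k + 1), interior, [x[-1]] * (k + 1)]
    spl = make_lsq_spline(x, y, t, k=k)
    return spl, spl.c
```

With 20 interior knots and a cubic basis, each spectrum is summarized by 24 coefficients instead of one value per wavelength, and neighbouring wavelengths are tied together by the smooth basis.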