Evaluating the performances of quantitative structure-retention relationship models with different sets of molecular descriptors and databases for high-performance liquid chromatography predictions

Retention prediction using quantitative structure‐retention relationships combined with the hydrophobic subtraction model in reversed‐phase liquid chromatography

ELECTROPHORESIS, 2019

The hydrophobic subtraction model (HSM) combined with quantitative structure‐retention relationships (QSRR) methodology was utilized to predict retention times in reversed‐phase liquid chromatography (RPLC). A selection of new analytes and new RPLC columns that had never been used in the QSRR modeling process were used to verify the proposed approach. This work is designed to facilitate early prediction of co‐elution of analytes in pharmaceutical drug discovery applications where it is advantageous to predict whether impurities might be co‐eluted with the active drug component. The QSRR models were constructed through partial least squares regression combined with a genetic algorithm (GA‐PLS) which was employed as a feature selection method to choose the most informative molecular descriptors calculated using VolSurf+ software. The analyte hydrophobicity coefficient of the HSM was predicted for subsequent calculation of retention. Clustering approaches based on the local compound ty...

Performance comparison of partial least squares-related variable selection methods for quantitative structure retention relationships modelling of retention times in reversed-phase liquid chromatography

Journal of chromatography. A, 2015

The relative performance of six multivariate data analysis methods derived from or combined with partial least squares (PLS) has been compared in the context of quantitative structure-retention relationships (QSRR). These methods include GA (genetic algorithm)-PLS, Monte Carlo uninformative variable elimination (MC-UVE), competitive adaptive reweighted sampling (CARS), iteratively retaining informative variables (IRIV), variable iterative space shrinkage approach (VISSA) and PLS with automated backward selection of predictors (autoPLS). A set of 825 molecular descriptors was computed for 86 suspected sports doping compounds and used for predicting their gradient retention times in reversed-phase liquid chromatography (RPLC). The correlation between molecular descriptors selected by each technique and the retention time was established using the PLS method. All models derived from a selected subset of descriptors outperformed the reference PLS model derived from all descriptors, wit...

Investigation of retention behaviour of non-steroidal anti-inflammatory drugs in high-performance liquid chromatography by using quantitative structure–retention relationships

Analytica Chimica Acta, 2007

Analytica Chimica Acta 601 (2007) 68-76

Keywords: Quantitative structure-retention relationships; Artificial neural network; Reversed-phase high-performance liquid chromatography; Non-steroidal anti-inflammatory drugs; Chromatographic optimisation

Abstract: In this paper, a quantitative structure-retention relationship (QSRR) method is employed to model the retention behaviour in reversed-phase high-performance liquid chromatography of arylpropionic acid derivatives, which are widely used non-steroidal anti-inflammatory drugs (NSAIDs). Computed molecular descriptors and the organic modifier content in the mobile phase are combined into a comprehensive model to describe the effect of both solute structure and eluent composition on the isocratic retention of these drugs in water-acetonitrile mobile phases. Multilinear regression (MLR) combined with genetic algorithm (GA) variable selection is used to extract an optimal subset from a large set of computed 3D descriptors. Based on GA-MLR analysis, a five-dimensional QSRR model is identified. All four selected molecular descriptors belong to the category of GEometry, Topology, and Atom-Weights AssemblY (GETAWAY) descriptors. The related multilinear model exhibits good fitting and predictive performance. This model is further improved using an artificial neural network (ANN) trained by error back-propagation. Finally, the ANN-based model displays a remarkably better performance than its MLR counterpart and, based on external validation, is able to predict with good accuracy the behaviour of unknown arylpropionic NSAIDs in the range of mobile phase composition of analytical interest (between 35 and 75% acetonitrile (v/v)). (A.A. D'Archivio).

Predicting retention times of naturally occurring phenolic compounds in reversed-phase liquid chromatography: a Quantitative Structure-Retention Relationship (QSRR) approach

International journal of molecular sciences, 2012

Quantitative structure-retention relationships (QSRRs) have successfully been developed for naturally occurring phenolic compounds in a reversed-phase liquid chromatographic (RPLC) system. A total of 1519 descriptors were calculated from the optimized structures of the molecules using the MOPAC2009 and DRAGON software packages. The data set of 39 molecules was divided into training and external validation sets. For feature selection and mapping we used step-wise multiple linear regression (SMLR), unsupervised forward selection followed by step-wise multiple linear regression (UFS-SMLR) and artificial neural networks (ANN). Stable and robust models with significant predictive abilities in terms of validation statistics were obtained, and chance correlation was ruled out. ANN models were found to be better than the remaining two approaches. HNar, IDM, Mp, GATS2v, DISP and 3D-MoRSE (signals 22, 28 and 32) descriptors based on van der Waals volume, electronegativity, mass and polarizability, at the atomic level, were found to have significant effects on the retention times. The possible implications of these descriptors in RPLC have been discussed. All the models proved able to predict the retention times of phenolic compounds and showed good validation, robustness, stability and predictive performance.

A comparison of three liquid chromatography (LC) retention time prediction models

Talanta, 2018

High-resolution mass spectrometry (HRMS) data has revolutionized the identification of environmental contaminants through non-targeted analysis (NTA). However, chemical identification remains challenging due to the vast number of unknown molecular features typically observed in environmental samples. Advanced data processing techniques are required to improve chemical identification workflows. The ideal workflow brings together a variety of data and tools to increase the certainty of identification. One such tool is chromatographic retention time (RT) prediction, which can be used to reduce the number of possible suspect chemicals within an observed RT window. This paper compares the relative predictive ability and applicability to NTA workflows of three RT prediction models: (1) a logP (octanol-water partition coefficient)-based model using EPI Suite™ logP predictions; (2) a commercially available ACD/ChromGenius model; and, (3) a newly developed Quantitative Structure Retention Re...
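The idea of using a predicted RT to narrow a suspect list can be illustrated with a minimal Python sketch. It is not the workflow from the paper: the `Candidate` class, the `filter_by_rt_window` function, the suspect names, the RT values and the window half-width are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical suspect identifier
    predicted_rt: float  # predicted retention time in minutes


def filter_by_rt_window(candidates, observed_rt, half_width):
    """Keep candidates whose predicted RT lies within observed_rt +/- half_width."""
    return [c for c in candidates
            if abs(c.predicted_rt - observed_rt) <= half_width]


# Made-up values for illustration only; not taken from the paper.
suspects = [
    Candidate("suspect_A", 4.2),
    Candidate("suspect_B", 7.9),
    Candidate("suspect_C", 8.4),
    Candidate("suspect_D", 12.1),
]

kept = filter_by_rt_window(suspects, observed_rt=8.0, half_width=1.0)
print([c.name for c in kept])  # -> ['suspect_B', 'suspect_C']
```

In practice the half-width would presumably be tied to the prediction error of the RT model in use, so a more accurate model allows a narrower window and removes more suspects.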

QSRR Prediction of the Chromatographic Retention Behavior of Painkiller Drugs

2015

Quantitative structure-retention relationship (QSRR) analysis is a useful technique capable of relating chromatographic retention time to the chemical structure of a solute. A QSRR study has been carried out on the reversed-phase high-performance liquid chromatography retention times (log tR) of 62 diverse drugs (painkillers) by using molecular descriptors. Multiple linear regression (MLR) is utilized to construct the linear QSRR model. The applied MLR is based on a variety of theoretical molecular descriptors selected by the stepwise variable subset selection procedure. Stepwise regression was employed to develop a regression equation based on 50 training compounds, and predictive ability was tested on 12 compounds reserved for that purpose. The geometry of all drugs was optimized by the semiempirical method AM1 and used to calculate different molecular descriptors. The regression equation included three parameters: n-octanol-water partition coefficient (log P), molecular surface area, and hydrophilic-lipophilic balance of the drug molecules, all of which could be related to retention time. Modeling of retention times of these compounds as a function of the theoretically derived descriptors was established by MLR. The results indicate that a strong correlation exists between log tR and the previously mentioned descriptors for drug compounds. The prediction results are in good agreement with the experimental values.
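The train-then-test procedure described above (a three-descriptor MLR fitted on 50 compounds and checked on 12 held-out compounds) can be sketched in plain Python. This is only an illustration under stated assumptions: the data are synthetic and randomly generated, the "true" coefficients are arbitrary, the helper names are hypothetical, and the stepwise descriptor selection used in the study is not reproduced here.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(x, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Evaluate the linear model for one compound."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic stand-ins for (log P, molecular surface area, HLB); not real drug data.
rng = random.Random(0)
descriptors = [(rng.uniform(0, 5), rng.uniform(100, 400), rng.uniform(1, 20))
               for _ in range(62)]
true_coefs = [0.30, 0.25, 0.002, -0.05]  # arbitrary values used to generate data
log_tr = [predict(true_coefs, d) + rng.gauss(0, 0.02) for d in descriptors]

# 50 training compounds, 12 reserved for external testing.
train_x, test_x = descriptors[:50], descriptors[50:]
train_y, test_y = log_tr[:50], log_tr[50:]

coefs = fit_mlr(train_x, train_y)
errors = [predict(coefs, d) - t for d, t in zip(test_x, test_y)]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print("fitted coefficients:", [round(c, 3) for c in coefs])
print("external-set RMSE:", round(rmse, 3))
```

The normal equations are used here only to keep the sketch free of third-party dependencies; a numerical library's least-squares routine would normally be the better choice for ill-conditioned descriptor sets.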

A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies

Chemometrics and Intelligent Laboratory Systems, 2005

As datasets become larger, the problem of variable selection becomes harder: the task is to define which subset of variables produces optimum predictions. The example studied aims to predict the chromatographic retention of 83 basic drugs on a Unisphere PBD column at pH 11.7 using 1272 molecular descriptors. The goal of this paper is to compare the relative performance of recently developed data mining methods, specifically classification and regression trees (CART), stochastic gradient boosting for tree-based models (Treeboost), and random forests (RF), with common statistical techniques in chemometrics: genetic algorithms on multiple linear regression (GA-MLR), uninformative variable elimination partial least squares (UVE-PLS), and SIMPLS. The comparison is performed primarily on predictive performance, but also on the variables found to be most important for the predictions. The results of this study indicated that, individually, GA-MLR (R² = 0.93) outperformed all other models. Further analysis found that a combination of GA-MLR and Treeboost (R² = 0.98) further improved these results.
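The abstract does not say how its R² values were computed. Assuming the conventional coefficient of determination, with y_i the measured retention of compound i, ŷ_i the model prediction, ȳ the mean measured retention and n the number of compounds (all symbols introduced here for illustration), the quantity would be:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
               {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under this definition, a value of 0.93 would mean the model's residual sum of squares is 7% of the total variation in the measured retention.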

Quantitative structure chromatography relationships in reversed-phase high performance liquid chromatography: Prediction of retention behaviour using theoretically derived molecular properties

1993

The use of theoretically calculated molecular properties as predictors for retention in reversed-phase HPLC has been explored. HPLC retention times have been measured for a series of 47 substituted aromatic molecules in three solvent mixtures, and steric and electronic properties of these compounds have been derived using semi-empirical molecular orbital and empirical theoretical methods. A subset of the experimental data (a training set) was used to derive property-retention time relationships and the remaining data were then used to test the predictive capability of the methods. Good retention time prediction was possible using derived regression equations for individual solvents, and after including solvent parameters it was possible to predict retention for all solvents using a single equation. This method showed that the most useful properties were the calculated log P and the calculated dipole moment of the solutes, and the calculated solvent polarisability. In addition, 90% of the data were used to train an artificial neural network and the remaining 10% of the data were used to test the network; excellent prediction was obtained, the neural network approach being as successful as the regression analysis.

QSRR Prediction of Chromatographic Retention of Ethynyl-Substituted PAH from Semiempirically Computed Solute Descriptors

Analytical Chemistry, 2000
