Representation/Prediction of Solubilities of Pure Compounds in Water Using Artificial Neural Network−Group Contribution Method

Prediction of the pharmaceutical solubility in water and organic solvents via different soft computing models

2019

Solubility data for solids in aqueous and organic solvents are important physicochemical properties in the design of industrial processes and in theoretical studies. In this study, experimental solubility data for 666 pharmaceutical compounds in water and 712 pharmaceutical compounds in organic solvents were collected from different sources. Three soft computing models, a multilayer perceptron, a radial basis function network and a support vector machine, were constructed to predict the solubility of these pharmaceutical compounds in water and in different solvents. Molecular weight, melting point, temperature and the number of each functional group in the pharmaceutical compound and the organic solvent were selected as the input variables of the three models. The model predictions were compared with the experimental data and the SVR-PSO model with the Average Absolute Relative Deviation equal to 0.0166 for...

Accurate prediction of the solubility parameter of pure compounds from their molecular structures

Fluid Phase Equilibria, 2014

A quantitative structure-property relation (QSPR) method for predicting the solubility parameter (δ) of pure compounds is presented. An artificial neural network (ANN) model was developed and used to probe the structural groups that contribute significantly to the overall solubility of pure compounds and to arrive at the set of groups that best represents the solubility parameter for about 418 substances. The 36 atom-type structural groups listed can predict the solubility parameter of pure compounds from the molecular structure alone, with a correlation coefficient of 0.998 and an absolute standard deviation and error of 0.109 and 0.67%, respectively. The results are further compared with those of the traditional structural group contribution (SGC) method based on multivariable regression as well as other methods in the literature. The method is very useful in predicting the solubility potential of various compounds and has advantages in terms of combined accuracy and simplicity.

Linear and nonlinear functions on modeling of aqueous solubility of organic compounds by two structure representation methods

Journal of Computer-Aided Molecular Design, 2000

Several quantitative models for the prediction of aqueous solubility of organic compounds were developed based on a diverse dataset with 2084 compounds by using multi-linear regression analysis and backpropagation neural networks. The compounds were described by two different structure representation methods: (1) with 18 topological descriptors; and (2) with 32 radial distribution function codes representing the 3D structure of a molecule and eight additional descriptors. The dataset was divided into a training and a test set based on Kohonen's self-organizing neural network. Good prediction results were obtained for backpropagation neural network models: with 18 topological descriptors, for the 936 compounds in the test set, a correlation coefficient of 0.92, and a standard deviation of 0.62 were achieved; with 3D descriptors, for the 866 compounds in the test set, a correlation coefficient of 0.90, and a standard deviation of 0.73 were achieved. The models were also tested by using another dataset, and the relationship of the two datasets was examined by Kohonen's self-organizing neural network.

Prediction of Aqueous Solubility of Organic Compounds Based on a 3D Structure Representation

Journal of Chemical Information and Modeling, 2003

The revised general solubility equation (GSE) is used along with four different methods including Huuskonen's artificial neural network (ANN) and three multiple linear regression (MLR) methods to estimate the aqueous solubility of a test set of 21 pharmaceutically and environmentally interesting compounds. For the selected test sets, it is clear that the GSE and ANN predictions are more accurate than MLR methods. The GSE has the advantages of being simple and thermodynamically sound. The only two inputs used in the GSE are the Celsius melting point (MP) and the octanol-water partition coefficient (Kow). No fitted parameters and no training data are used in the GSE, whereas other methods utilize a large number of parameters and require a training set. The GSE is also applied to a test set of 413 organic nonelectrolytes that were studied by Huuskonen. Although the GSE uses only two parameters and no training set, its average absolute error is only 0.1 log units larger than that of the ANN, which requires many parameters and a large training set. The average absolute error (AAE) is 0.54 log units using the GSE and 0.43 log units using Huuskonen's ANN modeling. This study provides evidence for the GSE being a convenient and reliable method to predict aqueous solubilities of organic compounds.
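
The two-input form described above can be illustrated with a minimal sketch. It assumes the commonly cited form of the revised GSE, log Sw = 0.5 − 0.01(MP − 25) − log Kow, with the melting-point term set to zero for compounds that are liquid at 25 °C; the abstract itself does not state the equation, and the function name and example inputs are hypothetical.

```python
def gse_log_solubility(melting_point_c: float, log_kow: float) -> float:
    """Estimate log10 molar aqueous solubility from MP (deg C) and log Kow.

    Assumed form (not quoted in the abstract):
        log Sw = 0.5 - 0.01 * (MP - 25) - log Kow
    """
    # Compounds that melt at or below 25 deg C get no melting-point penalty.
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_kow


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the study's test sets.
    print(gse_log_solubility(melting_point_c=125.0, log_kow=2.0))
```

Because the sketch has no fitted coefficients, it needs no training data, which is the property the abstract contrasts with the ANN and MLR methods.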

Application of Artificial Neural Networks to Predict the Intrinsic Solubility of Drug-Like Molecules

Pharmaceutics

Machine learning (ML) approaches are receiving increasing attention from pharmaceutical companies and regulatory agencies, given their ability to mine knowledge from available data. In drug discovery, for example, they are employed in quantitative structure–property relationship (QSPR) models to predict biological properties from the chemical structure of a drug molecule. In this paper, following the Second Solubility Challenge (SC-2), a QSPR model based on artificial neural networks (ANNs) was built to predict the intrinsic solubility (logS0) of the 100-compound low-variance tight set and the 32-compound high-variance loose set provided by SC-2 as test datasets. First, a training dataset of 270 drug-like molecules with logS0 value experimentally determined was gathered from the literature. Then, a standard three-layer feed-forward neural network was defined by using 10 ChemGPS physico-chemical descriptors as input features. The developed ANN showed adequate predictive performances ...

Representation and Prediction of Molecular Diffusivity of Nonelectrolyte Organic Compounds in Water at Infinite Dilution Using the Artificial Neural Network-Group Contribution Method

Journal of Chemical & Engineering Data, 2011

The determination of diffusion coefficients of pure compounds in water at infinite dilution is of utmost interest in chemical and environmental engineering, especially wastewater treatment processes. In this work, the artificial neural network-group contribution (ANN-GC) method is applied to represent and predict the molecular diffusivity of nonelectrolyte organic compounds in water at infinite dilution and 298.15 K. A total of 4852 pure compounds from various chemical families have been investigated to propose a predictive model. The obtained results show a squared correlation coefficient of 0.996, a root-mean-square error of about 0.02, and an average absolute deviation lower than 1.5% for the calculated or predicted property from existing experimental values.
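
The three statistics quoted above can be computed as in the following sketch. The definitions used here (squared Pearson correlation, root-mean-square error, and average absolute relative deviation in percent) are common conventions and are an assumption; the abstract does not give the exact formulas. The function name and sample values are hypothetical.

```python
import math


def regression_metrics(experimental, predicted):
    """Return squared correlation, RMSE and average absolute deviation (%)."""
    n = len(experimental)
    mean_exp = sum(experimental) / n
    mean_pred = sum(predicted) / n
    # Squared Pearson correlation between experimental and predicted values.
    cov = sum((e - mean_exp) * (p - mean_pred)
              for e, p in zip(experimental, predicted))
    var_exp = sum((e - mean_exp) ** 2 for e in experimental)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_exp * var_pred)
    # Root-mean-square error in the units of the property.
    rmse = math.sqrt(sum((p - e) ** 2
                         for e, p in zip(experimental, predicted)) / n)
    # Average absolute relative deviation, as a percentage of each
    # experimental value (experimental values must be nonzero).
    aad_percent = 100.0 / n * sum(abs((p - e) / e)
                                  for e, p in zip(experimental, predicted))
    return {"r2": r2, "rmse": rmse, "aad_percent": aad_percent}


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(regression_metrics([1.0, 2.0, 4.0], [1.1, 1.9, 4.2]))
```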

Prediction of Henry’s Law Constant of Organic Compounds in Water from a New Group-Contribution-Based Model

Industrial & Engineering Chemistry Research, 2010

In this work, a new model is presented for estimation of Henry's law constant of pure compounds in water at 25°C (H). This model is based on a combination of a group contribution method and neural networks. The needed parameters of the model are the occurrences of a new collection of 107 functional groups. On the basis of these 107 functional groups, a feed-forward neural network is presented to estimate the H of pure compounds. The squared correlation coefficient, absolute percent error, standard deviation error, and root-mean-square error of the model over a diverse set of 1940 pure compounds used are, respectively, 0.9981, 2.84%, 2.4, and 0.1 (all the values obtained using log H based data). Therefore, the model is comprehensive and accurate and can be used to predict the H of a wide range of chemical families of pure compounds in water better than previously presented models.
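
The idea of feeding functional-group occurrences into a feed-forward network can be sketched as follows. This is an illustration only: the three-group list, the single tanh hidden layer, and every weight below are hypothetical placeholders, not the 107 groups or the trained parameters of the published model.

```python
import math

# Hypothetical, tiny group set; the published model uses 107 groups.
GROUPS = ["CH3", "CH2", "OH"]

# Placeholder weights for a 3-input, 2-hidden-unit, 1-output network.
W_HIDDEN = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [1.0, -1.0]
B_OUT = 0.2


def count_vector(group_counts):
    """Turn a {group: occurrences} mapping into the network input vector."""
    return [float(group_counts.get(g, 0)) for g in GROUPS]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer followed by a linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


if __name__ == "__main__":
    # Ethanol decomposed as one CH3, one CH2 and one OH group.
    x = count_vector({"CH3": 1, "CH2": 1, "OH": 1})
    print(forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT))
```

In a real group-contribution network the weights would be fitted to experimental data, and the output would be the property of interest (here, log H).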

Prediction of Aqueous Solubility of Drug-Like Compounds by Using an Artificial Neural Network and Least-Squares Support Vector Machine

Bulletin of the Chemical Society of Japan, 2010

In this work the aqueous solubilities of 145 drug-like compounds were predicted from their theoretically derived molecular descriptors. The descriptors, selected by stepwise multiple subset selection methods, are: 1st-order solvation connectivity index, average span R, overall hydrogen bond basicity, and percent of hydrophilic surface area. These descriptors can encode features of molecules that affect dispersion, hydrophobic and steric interactions between solute and solvent molecules. To develop quantitative structure-activity relationship (QSAR) models, multiple linear regression, least-squares support vector machine, and artificial neural network (ANN) methods were applied with the selected descriptors as inputs. The statistical parameters obtained for these models revealed that the ANN model was superior to the other methods. The standard error (SE), average error (AE), and average absolute error (AAE) for the ANN model are: SE = 0.714, AE = −0.178, and AAE = 0.546, while these values for the internal test set are: SE = 0.830, AE = −0.056, and AAE = 0.630 and for the external test set are: SE = 0.762, AE = −0.431, and AAE = 0.626, respectively. Moreover, a leave-many-out cross-validation test was used to further investigate the predictive power and robustness of the model, which led to R²(L10O) = 0.816 and SPRESS = 0.32 for the ANN model, indicating the reliability of this model.
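
A leave-many-out cross-validation of the kind mentioned above can be sketched as follows. To keep the example short, a one-descriptor least-squares line stands in for the ANN, and the cross-validated coefficient is taken as 1 − PRESS/SS_total about the overall mean; both choices are assumptions, since the abstract does not specify the procedure, and all names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_many_out_q2(xs, ys, group_size):
    """Hold out consecutive groups, refit on the rest, accumulate PRESS."""
    n = len(xs)
    press = 0.0
    for start in range(0, n, group_size):
        held = set(range(start, min(start + group_size, n)))
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        # Squared prediction errors on the compounds left out of the fit.
        press += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held)
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Made-up descriptor values and responses with alternating noise.
    xs = [float(i) for i in range(12)]
    ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
    print(leave_many_out_q2(xs, ys, group_size=3))
```

With a group size of 10 on a larger dataset, this would correspond to a leave-10-out test; in practice the compounds are usually shuffled before the groups are formed.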

Group Contribution-Based Method for Determination of Solubility Parameter of Nonelectrolyte Organic Compounds

Industrial & Engineering Chemistry Research, 2011

The determination of the solubility parameter of organic compounds has been of much significance in the chemical industry. In this study, we propose a predictive method based on the combination of the Group Contribution strategy with the Artificial Neural Network to calculate/estimate the solubility parameter values of about 1620 nonelectrolyte organic compounds at 298.15 K and atmospheric pressure. The chemical functional groups are obtained for various compounds categorized in 81 different chemical families. The final results indicate the following statistical parameters of the presented method: average relative deviation (ARD %) of the determined properties from existing experimental values of 1.5% and a squared correlation coefficient of 0.985. It is finally inferred that the developed model is more accurate and predictive than our previously proposed models based on the Quantitative Structure-Property Relationship algorithm, which yielded 4.6, 3.4, and 3.1 ARD % from experimental values.