Machine learning algorithms used in quantitative structure-activity relationship (QSAR) studies as new approaches in drug discovery
Related papers
Neural networks in building QSAR models
Methods in Molecular Biology (Clifton, N.J.), 2008
Chapter 8. Neural networks in building QSAR models
Igor I. Baskin, Vladimir A. Palyulin, and Nikolai S. Zefirov. Neural networks in building QSAR models. Methods in Molecular Biology (Clifton, N.J.), 458:137–158, 2008.
This chapter critically reviews some of the important methods used to build quantitative structure-activity relationship (QSAR) models with artificial neural networks (ANNs). It focuses predominantly on the use of multilayer ANNs in the regression analysis of structure-activity data. The topics covered include the approximating ability of ANNs, the interpretability of the resulting models, generalization versus memorization, overfitting and overtraining, learning dynamics, regularization, and neural network ensembles. The next part of the chapter focuses on descriptors. It reviews descriptor selection and preprocessing techniques; considers the use of substituent, substructural, and superstructural descriptors in building common QSAR models and of molecular field descriptors in three-dimensional QSAR studies; and discusses the prospects of “direct” graph-based QSAR analysis. The chapter starts with a short historical survey of the main milestones in this area.
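The abstract names regularization and neural network ensembles among the chapter's topics. The following is a minimal sketch of how the two can be combined; it is not the chapter's own code. It trains several tiny one-hidden-layer networks with L2 weight decay, each from a different random initialization, and averages their predictions. The network size, learning rate, decay constant, and toy descriptor data are all hypothetical choices, and plain Python is used only to keep the example self-contained.

```python
import math
import random

# Hypothetical sketch: an ensemble of tiny one-hidden-layer networks for QSAR
# regression, each trained with L2 weight decay (a simple regularizer).

def init_net(n_in, n_hidden, rng):
    """Small random weights and zero biases."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Return (hidden tanh activations, linear output) for one descriptor vector."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    return h, sum(w * hj for w, hj in zip(net["w2"], h)) + net["b2"]

def train(net, X, y, rng, lr=0.05, decay=1e-4, epochs=300):
    """Stochastic gradient descent on 0.5*(output - target)^2 plus L2 weight decay."""
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h, out = forward(net, X[i])
            err = out - y[i]
            for j, hj in enumerate(h):
                # Error signal at hidden unit j (tanh' = 1 - tanh^2), using w2 before its update
                delta = err * net["w2"][j] * (1.0 - hj * hj)
                net["w2"][j] -= lr * (err * hj + decay * net["w2"][j])
                for k, xk in enumerate(X[i]):
                    net["W1"][j][k] -= lr * (delta * xk + decay * net["W1"][j][k])
                net["b1"][j] -= lr * delta
            net["b2"] -= lr * err
    return net

def train_ensemble(X, y, n_members=5, n_hidden=4, seed=0):
    """Train members that differ only in random initialization and sample order."""
    members = []
    for m in range(n_members):
        rng = random.Random(seed + m)
        members.append(train(init_net(len(X[0]), n_hidden, rng), X, y, rng))
    return members

def ensemble_predict(members, x):
    """Average the member predictions."""
    return sum(forward(net, x)[1] for net in members) / len(members)

# Hypothetical toy data: two scaled descriptors and a linear "activity"
X_toy = [[a, b] for a in (-1.0, -0.5, 0.0, 0.5, 1.0) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)]
y_toy = [0.8 * a - 0.5 * b + 0.3 for a, b in X_toy]
ensemble = train_ensemble(X_toy, y_toy)
mse = sum((ensemble_predict(ensemble, x) - t) ** 2 for x, t in zip(X_toy, y_toy)) / len(y_toy)
print(f"training MSE of the ensemble: {mse:.4f}")
```

Averaging members that differ only in initialization tends to reduce the variance of the predictions. In a real QSAR study the ensemble would be judged on held-out compounds rather than on training error as here.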
Computer-Aided Linear Modeling Employing QSAR for Drug Discovery
Scientific Research and Essays, 2009
Quantitative structure-activity relationship (QSAR) analysis is a computational process that relates the chemical structure of compounds to their activities, especially biological activities or effects. It uses a series of computer-based procedures to analyze quantitative experimental activity data for compounds of known chemical structure and to derive a relationship, model, or equation that can predict the activities of known compounds whose activities have not been measured, as well as of new compounds. Commonly used software packages in QSAR analysis include HYPERCHEM, MATLAB, DRAGON and RECKON. Key words: QSAR, biological activity, prediction, computer software.
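To make the idea of a QSAR "equation" concrete, here is a minimal sketch of a linear (Hansch-type) model fitted by ordinary least squares through the normal equations. It assumes two hypothetical descriptors, a hydrophobicity term (logP) and an electronic term (sigma), and toy activities generated from a known equation so the fit can be checked. It is not tied to any of the software packages named above.

```python
# Hypothetical sketch of a linear (Hansch-type) QSAR model:
# activity = b0 + b1*logP + b2*sigma, fitted by ordinary least squares.

def fit_mlr(X, y):
    """Return [intercept, coef_1, ..., coef_p] by solving the normal equations."""
    A = [[1.0] + list(row) for row in X]           # prepend an intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [u - f * v for u, v in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(coef, x):
    """Evaluate the fitted linear equation for one descriptor vector."""
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))

# Hypothetical descriptors (logP, sigma) for five compounds; activities are
# generated from a known equation so the recovered coefficients can be checked.
X_train = [[1.2, 0.1], [2.5, -0.2], [0.8, 0.4], [3.1, 0.0], [1.9, 0.3]]
y_train = [1.0 + 0.5 * lp - 2.0 * s for lp, s in X_train]
coef = fit_mlr(X_train, y_train)
print("intercept and coefficients:", [round(c, 3) for c in coef])
print("predicted activity for a new compound:", round(predict(coef, [2.0, 0.0]), 3))
```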
Arkivoc, 2007
Artificial neural networks (ANNs) can be used to generate predictive quantitative structure-activity relationship (QSAR) models linking a set of molecular descriptors to activity. In the present work, QSAR analysis of a set of 95 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine (HEPT) derivatives was carried out with a three-layered neural network (NN). The NN is shown to be a useful tool for QSAR analysis compared with models reported in the literature. The NN-based QSAR models show not only good statistical significance in fitting but also high predictive ability (0.916 < r < 0.968 and q² = 0.8779). The relevant factors controlling the anti-HIV-1 activity of HEPT derivatives have been identified. The results are in line with our previous studies on HEPT derivatives and indicate the importance of the hydrophobic parameter in modelling the QSAR of HEPT derivatives.
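The q² quoted above is a cross-validated statistic. As an illustration, assuming the usual leave-one-out definition q² = 1 - PRESS / SS_tot, the sketch below refits a model once per left-out compound and predicts that compound. A one-descriptor linear model stands in for the study's neural network, and the hydrophobic descriptor values and activities are hypothetical.

```python
import math

# Hypothetical sketch: leave-one-out (LOO) cross-validated q^2 for a QSAR model.
# A one-descriptor linear model stands in for the neural network used in the
# study; the LOO procedure is the same for any model that can be refitted.

def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def q2_loo(x, y):
    """q^2 = 1 - PRESS / SS_tot; each compound is predicted by a model fitted without it."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    ss_tot = sum((t - my) ** 2 for t in y)
    return 1.0 - press / ss_tot

# Hypothetical data: a hydrophobic descriptor (pi) and activity for six compounds
pi = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
activity = [4.1, 4.6, 4.9, 5.6, 5.9, 6.4]
r = pearson_r(pi, activity)
q2 = q2_loo(pi, activity)
print(f"r = {r:.3f}, q2(LOO) = {q2:.3f}")
```

Because each compound is predicted by a model that never saw it, q² cannot exceed the fitted R² for least squares; a large gap between the two is a common warning sign of overfitting.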
Pre-processing in AI based Prediction of QSARs
Computing Research Repository, 2009
Machine learning, data mining and artificial intelligence (AI) based methods have been used to determine the relationships between chemical structure and biological activity, known as quantitative structure-activity relationships (QSARs). The first step in this process is pre-processing of the dataset, which maps a large number of molecular descriptors in the original high-dimensional space to a small number of components in a lower-dimensional space while retaining the features of the original data. A common practice is to apply a mapping method to a dataset without prior analysis. Our work stresses this pre-analysis by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating the hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods to each dataset. From this analysis, we infer the nature of the inherent relationships between the elements of each dataset and, hence, the mapping method best suited to it. We also show that proper preprocessing can help in choosing the right feature extraction tool and can give insight into the type of classifier appropriate for a given problem.
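The abstract does not name its mapping methods. Assuming principal component analysis (PCA) as a typical linear mapping, the following sketch centers a descriptor matrix, extracts the leading principal components by power iteration with deflation, and returns the low-dimensional scores. The descriptor values are hypothetical, and in practice descriptors are usually autoscaled before PCA.

```python
import math

# Hypothetical sketch of a linear mapping step: principal component analysis (PCA)
# via power iteration with deflation, projecting descriptors onto k components.

def center(X):
    """Subtract the column means."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]

def top_eigenvector(C, iters=500):
    """Leading eigenvector and eigenvalue of a symmetric PSD matrix by power iteration."""
    d = len(C)
    v = [1.0 + 0.1 * j for j in range(d)]      # arbitrary non-symmetric start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
    return v, lam

def pca(X, k):
    """Return (scores, components, eigenvalues) for the first k principal components."""
    Xc = center(X)
    C = covariance(Xc)
    comps, eigvals = [], []
    for _ in range(k):
        v, lam = top_eigenvector(C)
        comps.append(v)
        eigvals.append(lam)
        # Deflation: subtract the found component so the next one can be extracted
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    scores = [[sum(a * b for a, b in zip(row, v)) for v in comps] for row in Xc]
    return scores, comps, eigvals

# Hypothetical descriptor matrix: 6 compounds x 3 correlated descriptors
X_desc = [[1.2, 0.8, 0.3], [2.5, 2.1, 0.1], [0.8, 0.5, 0.5],
          [3.1, 2.9, 0.0], [1.9, 1.4, 0.2], [2.2, 2.0, 0.4]]
scores, comps, eigvals = pca(X_desc, k=2)
explained = sum(eigvals) / sum(covariance(center(X_desc))[i][i] for i in range(3))
print("eigenvalues:", [round(e, 4) for e in eigvals], "explained fraction:", round(explained, 3))
```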
Journal of Enzyme Inhibition and Medicinal Chemistry, 2008
Quantitative structure-activity relationship (QSAR) studies have been carried out on indolyl aryl sulfones, a class of novel HIV-1 non-nucleoside reverse transcriptase inhibitors, using physicochemical, topological and structural parameters along with appropriate indicator variables. The statistical tools used were linear methods (e.g., stepwise regression analysis, partial least squares (PLS), factor analysis followed by multiple regression (FA-MLR), genetic function approximation combined with multiple linear regression (GFA-MLR), and GFA followed by PLS (G/PLS)) and a nonlinear method (artificial neural network, ANN). With physicochemical parameters, GFA-MLR generated the best equation (n = 97, R² = 0.862, Q² = 0.821). With topological parameters, the best equation (based on leave-one-out Q²) was obtained with stepwise regression (n = 97, R² = 0.867, Q² = 0.811). When topological and physicochemical parameters were used in combination, statistical quality increased substantially (n = 97, R² = 0.891, Q² = 0.849 from stepwise regression). Furthermore, the whole dataset was divided into a test set (25% of the whole dataset) and a training set (the remaining 75%). Models were developed on the training set, and their predictive potential was checked on the test set. The training set was selected by K-means clustering of the standardized descriptors (topological and physicochemical). In this case, too, the best results were obtained with stepwise regression (n = 72, R² = 0.906, Q² = 0.853), but the external predictive capacity of this model (R²pred = 0.738) was inferior to that of the model developed with the GFA-MLR technique (R² = 0.883, Q² = 0.823, R²pred = 0.760).
However, the squared regression coefficient between observed and predicted activity values of the test set compounds for the best linear model, i.e., GFA-MLR (r² = 0.736), was lower than that of the best nonlinear model developed using an artificial neural network (r² = 0.781). Thus, based on external validation, the ANN models were superior to the linear models. The predictive potential of the best linear equation (stepwise regression model) was superior to that of the previously published CoMFA model (Q² = 0.81, SDEP_test = 0.89) on the same data set (Ragno R. et al., J Med Chem 2006, 49, 3172-3184). Furthermore, the physicochemical-parameter-based models also supported the previous observations based on docking (Ragno R. et al.,
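The training/test selection and the external statistic described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure. K-means (Lloyd's algorithm with a deterministic farthest-first seeding, chosen here for reproducibility) is run on standardized descriptors, every fourth member of each cluster goes to the test set, and R²pred is computed as 1 - Σ(y_test - ŷ)² / Σ(y_test - ȳ_train)², the form commonly used in this literature. The descriptor values and activities are hypothetical.

```python
import math

# Hypothetical sketch of cluster-based training/test selection followed by R^2_pred.

def standardize(X):
    """Autoscale each descriptor to zero mean and unit sample standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / (n - 1)) for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in X]

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def kmeans(Z, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [list(Z[0])]
    while len(centers) < k:
        centers.append(list(max(Z, key=lambda z: min(sq_dist(z, c) for c in centers))))
    labels = [0] * len(Z)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(z, centers[c])) for z in Z]
        for c in range(k):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_split(X, k, test_every=4):
    """Put every test_every-th member of each cluster in the test set (about 25% for 4)."""
    labels = kmeans(standardize(X), k)
    train_idx, test_idx = [], []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        for pos, i in enumerate(members):
            (test_idx if pos % test_every == test_every - 1 else train_idx).append(i)
    return sorted(train_idx), sorted(test_idx)

def r2_pred(y_test, y_hat, y_train_mean):
    """R^2_pred = 1 - sum((y_test - y_hat)^2) / sum((y_test - mean(y_train))^2)."""
    press = sum((a - b) ** 2 for a, b in zip(y_test, y_hat))
    ss = sum((a - y_train_mean) ** 2 for a in y_test)
    return 1.0 - press / ss

# Hypothetical descriptors for 12 compounds forming three groups
X_desc = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
          [5.0, 5.0], [5.2, 5.1], [5.1, 5.3], [5.3, 5.2],
          [10.0, 0.0], [10.2, 0.1], [10.1, 0.3], [10.3, 0.2]]
train_idx, test_idx = cluster_split(X_desc, k=3)
print("training:", train_idx, "test:", test_idx)
# Hypothetical observed vs predicted test activities and training-set mean
print("R2_pred:", round(r2_pred([5.1, 6.3, 4.8], [5.0, 6.0, 5.1], y_train_mean=5.5), 3))
```

Drawing test compounds from every cluster keeps the test set spread across descriptor space. Clusters with fewer than four members contribute no test compounds in this simple scheme.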
Drug Discovery Using Machine Learning and Data Analysis
International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2022
An objective of drug discovery is to identify novel substances with specific chemical properties for the treatment of diseases. A significant amount of biological data has recently been produced from a variety of sources, and molecular analysis of these data has been used to identify the most successful treatments. Trial-and-error medicine is frequently frustrating and far more expensive. Predicting whether a drug will be active or not makes this work easier, and information about a drug can also be used to develop new medications. Quantitative structure-activity relationship (QSAR) analysis is one application that uses machine learning to improve decision-making on pharmaceutical data across multiple applications. Predictive models based on machine learning have recently grown substantially in prominence in phases beyond preclinical research. At this stage, the cost and research time of new drug discovery are significantly reduced. Machine learning, pattern recognition algorithms, and the mining of mathematical correlations between the chemical and biological features of compounds are being used increasingly often in drug development, with positive outcomes. Limitations include the need for a large volume of data and a lack of interpretability. Like physical models, machine learning approaches can be applied to large data sets, without requiring extensive computational resources.
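Predicting whether a compound will be active is often framed as binary classification. As one hypothetical illustration (the abstract names no specific method), the sketch below uses k-nearest neighbours with Tanimoto similarity over binary fingerprints stored as sets of on-bit indices. The fingerprints and labels are invented for the example.

```python
# Hypothetical sketch: predicting active/inactive with k-nearest neighbours
# over binary fingerprints, using Tanimoto similarity.

def tanimoto(a, b):
    """|A & B| / |A | B| for fingerprints stored as sets of on-bit indices."""
    union = a | b
    # Two empty fingerprints are assigned similarity 0.0 by convention here
    return len(a & b) / len(union) if union else 0.0

def knn_predict(query, train_fps, train_labels, k=3):
    """Majority vote among the k most similar training compounds (use odd k for two classes)."""
    ranked = sorted(zip(train_fps, train_labels),
                    key=lambda pair: tanimoto(query, pair[0]), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical fingerprints (on-bit indices) and labels: 1 = active, 0 = inactive
train_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 6},
             {7, 8, 9, 10}, {7, 8, 11}, {8, 9, 12}]
train_labels = [1, 1, 1, 0, 0, 0]
print(knn_predict({1, 2, 3, 6}, train_fps, train_labels))
print(knn_predict({7, 8, 9}, train_fps, train_labels))
```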
New-Generation Drug Discovery using Machine Learning
International Journal for Research in Applied Science and Engineering Technology, 2023
Finding innovative molecules with specific chemical properties to treat diseases is one of the goals of drug discovery. Recent years have seen the production of a sizable volume of biological data from many sources. These data and molecular analyses have been used to determine the most effective medications. Medical research is often frustrating and costly. The work at hand is made easier by the ability to predict whether a medicine will be active or not. Information about a drug can also be used to develop other drugs. Quantitative structure-activity relationship (QSAR) analysis is one application that uses machine learning to enhance decision-making on pharmaceutical data across numerous applications. Machine learning-based predictive models have recently gained a lot of attention in areas outside preclinical research. At this stage, the costs and research times associated with finding new drugs are considerably reduced. Drug research is increasingly utilising machine learning, pattern recognition algorithms, knowledge of mathematical correlations, and knowledge of the chemical and biological characteristics of molecules. Further limitations include the need for a sizable volume of data and the inability to interpret the results. Both physical models and machine learning approaches can analyse massive amounts of data without requiring extensive computational resources.
Application of machine learning in prediction of bioactivity of molecular compounds: A review
2017
Machine learning has been used and implemented mainly in chemoinformatics, drug discovery and development, and especially bioactive molecule prediction. Because simple tools cannot cope with the rapidly increasing amount of data, machine learning comes to the rescue with its ability to handle high-dimensional data for molecules with either homogeneous or heterogeneous structures. This study therefore reviews the current uses, applications, and principles of machine learning in chemoinformatics and drug discovery. It finds that machine learning methods or classifiers perform differently under different conditions, and that no classifier can be said to be superior to the others, since performance depends on the dataset and the classification process involved. The pharmaceutical industry will benefit more if better-performing classifiers are implemented and this can be ...
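The review's conclusion that no classifier is universally superior can be illustrated with a small experiment. The sketch below compares two simple classifiers, 1-nearest-neighbour and nearest-centroid, by k-fold cross-validation on a hypothetical XOR-like descriptor set. The data are constructed so that both class means coincide, which defeats the nearest-centroid rule while the local 1-NN rule succeeds; on other data sets the ranking can reverse.

```python
# Hypothetical sketch: comparing two simple classifiers with k-fold cross-validation.

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def one_nn(X_tr, y_tr, x):
    """Label of the single nearest training point."""
    return min(zip(X_tr, y_tr), key=lambda pair: sq_dist(x, pair[0]))[1]

def nearest_centroid(X_tr, y_tr, x):
    """Label of the closest class mean (a linear decision boundary for two classes)."""
    sums, counts = {}, {}
    for xi, yi in zip(X_tr, y_tr):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    # On an exact tie, min returns the first class encountered in the training data
    return min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def cv_accuracy(classifier, X, y, k=3):
    """Fraction of compounds classified correctly when each fold is held out in turn."""
    correct = 0
    for fold in range(k):
        test = set(range(fold, len(X), k))          # every k-th compound, offset by fold
        X_tr = [x for i, x in enumerate(X) if i not in test]
        y_tr = [t for i, t in enumerate(y) if i not in test]
        correct += sum(classifier(X_tr, y_tr, X[i]) == y[i] for i in sorted(test))
    return correct / len(X)

# Hypothetical XOR-like descriptor space: actives near (0,0) and (10,10),
# inactives near (0,10) and (10,0); the two class means coincide.
X = [(0, 0), (1, 0), (0, 1), (10, 10), (9, 10), (10, 9),
     (0, 10), (1, 10), (0, 9), (10, 0), (9, 0), (10, 1)]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
acc_nn = cv_accuracy(one_nn, X, y)
acc_nc = cv_accuracy(nearest_centroid, X, y)
print(f"1-NN accuracy: {acc_nn:.2f}, nearest-centroid accuracy: {acc_nc:.2f}")
```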