ANN-QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds
Related papers
BMC Cancer, 2015
In the past, numerous quantitative structure-activity relationship (QSAR) based models have been developed for predicting the anticancer activity of a specific class of molecules against different cancer drug targets. In contrast, few attempts have been made to predict the anticancer activity of a diverse class of chemicals against a wide variety of cancer cell lines. In this study, we describe a hybrid method developed on thousands of anticancer and non-anticancer molecules tested against the National Cancer Institute (NCI) 60 cancer cell lines. Our analysis revealed that the majority of anticancer molecules contain 18-24 carbon atoms and are dominated by functional groups such as R2NH, R3N, ROH, RCOR, and ROR. It was also observed that certain substructures (e.g., 1-methoxy-4-methylbenzene, 1-methoxybenzene, nitrobenzene, indole, propenylbenzene) are more abundant in anticancer molecules. Next, we developed anticancer molecule prediction models using various machine-l...
Journal of King Saud University - Science, 2018
Carcinogenicity is one of the toxicological endpoints of greatest concern. Moreover, the standard rodent bioassays used to assess the cancer-mitigating capability of chemicals and drugs are expensive and require the sacrifice of animals. We have therefore developed a global QSAR model using a data set of 85 compounds, including drugs, for their anti-leukemia potential. Considering the large number of data points with diverse structural features used for model development (ntraining = 68) and model validation (ntest = 17), the model developed in this study has encouraging statistical quality (leave-one-out Q2 = 0.833, R2pred = 0.716) for pLC50 and (leave-one-out Q2 = 0.744, R2pred = 0.614) for pGI50. The model suggests that the absence of methanal fragments, a low dipole moment, and the presence of some 2D autocorrelation molecular descriptors reduce carcinogenicity. Branching, size, and shape are found to be crucial factors for drugs mitigating carcinogenicity.
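The abstract reports leave-one-out Q2 and R2pred without giving formulas. As a reminder, and assuming the definitions most commonly used in the QSAR literature (the paper itself may use variants), the two statistics can be written as:

```latex
Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{i=1}^{n_{\mathrm{training}}} \bigl(y_i - \hat{y}_{(i)}\bigr)^2}{\sum_{i=1}^{n_{\mathrm{training}}} \bigl(y_i - \bar{y}_{\mathrm{training}}\bigr)^2},
\qquad
R^2_{\mathrm{pred}} = 1 - \frac{\sum_{j=1}^{n_{\mathrm{test}}} \bigl(y_j - \hat{y}_j\bigr)^2}{\sum_{j=1}^{n_{\mathrm{test}}} \bigl(y_j - \bar{y}_{\mathrm{training}}\bigr)^2}
```

Here \hat{y}_{(i)} is the prediction for training compound i from a model fitted without that compound, \hat{y}_j is the prediction for test compound j from the final model, and \bar{y}_{\mathrm{training}} is the mean observed activity of the training set.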
Journal of Computer-Aided Molecular Design, 2007
A combined approach of validated QSAR modeling and virtual screening was successfully applied to the discovery of novel tylophrine derivatives as anticancer agents. QSAR models were initially developed for 52 chemically diverse phenanthrine-based tylophrine derivatives (PBTs) with known experimental EC50 using chemical topological descriptors (calculated with the MolConnZ program) and the variable selection k nearest neighbor (kNN) method. Several validation protocols were applied to achieve robust QSAR models. The original dataset was divided into multiple training and test sets, and the models were considered acceptable only if the leave-one-out cross-validated R2 (q2) values were greater than 0.5 for the training sets and the correlation coefficient R2 values were greater than 0.6 for the test sets. Furthermore, the q2 values for the actual dataset were shown to be significantly higher than those obtained for the same dataset with randomized target properties (Y-randomization test), indicating that the models were statistically significant. The ten best models were then employed to mine the commercially available ChemDiv Database (ca. 500K compounds), resulting in 34 consensus hits with moderate to high predicted activities. Ten structurally diverse hits were experimentally tested and eight were confirmed active, with the highest experimental EC50 of 1.8 µM, implying an exceptionally high hit rate (80%). The same ten models were further applied to predict EC50 for four new PBTs, and the correlation coefficient (R2) between the experimental and predicted EC50 for these compounds plus the eight active consensus hits was shown to be as high as 0.57. Our studies suggest that the approach combining validated QSAR modeling and virtual screening could be successfully used as a general tool for the discovery of novel biologically active compounds.
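The validation logic described above (leave-one-out q2, the q2 > 0.5 and R2 > 0.6 acceptance thresholds, and the Y-randomization test) can be sketched in a few lines of Python. This is a minimal illustration only: it swaps the paper's kNN models for a one-descriptor least-squares line, and all function names (`fit_line`, `loo_q2`, `y_randomization`, `is_acceptable`) and the example data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single descriptor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / total sum of squares."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


def y_randomization(xs, ys, n_trials, rng):
    """q2 values obtained after shuffling the activities n_trials times."""
    scores = []
    for _ in range(n_trials):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        scores.append(loo_q2(xs, shuffled))
    return scores


def is_acceptable(q2_train, r2_test):
    """Acceptance thresholds quoted in the abstract: q2 > 0.5 and R2 > 0.6."""
    return q2_train > 0.5 and r2_test > 0.6


# Hypothetical descriptor values and activities, not data from the paper.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]
print("q2 (real activities):", loo_q2(descriptor, activity))
print("q2 (shuffled activities):", y_randomization(descriptor, activity, 5, random.Random(0)))
```

A model is considered meaningful in this scheme when its q2 on the real activities is clearly higher than the q2 values obtained on shuffled activities.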
A quantitative structure-activity relationship (QSAR) study of the cytotoxicity of diarylaniline derivatives was carried out. The structures and cytotoxicities of the diarylaniline derivatives were obtained from the literature. Molecular and electronic parameters were calculated using the Austin Model 1 (AM1), Parameterized Model 3 (PM3), Hartree-Fock (HF), and density functional theory (DFT) methods. Artificial neural network (ANN) analysis was used to produce the best equation, with a configuration of input data-hidden node-output data = 5-8-1, r2 = 0.913, and PRESS = 0.069. The best equation was then used to design and predict new diarylaniline derivatives. The results show that the compound N1-(4′-cyanophenyl)-5-(4″-cyanovinyl-2″,6″-dimethyl-phenoxy)-4-dimethylether benzene-1,2-diamine is the best proposed compound, with a cytotoxicity value (CC50) of 93.037 μM.
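To make the "5-8-1" notation concrete, the sketch below builds a fully connected network with 5 inputs, 8 hidden nodes, and 1 output and runs one forward pass. The abstract does not state the activation function, weight initialization, or training procedure, so the tanh hidden layer, the linear output, the random weights, and all function names here are assumptions for illustration only.

```python
import math
import random


def init_network(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def count_parameters(layers):
    """Total number of adjustable weights and biases."""
    return sum(len(weights) * len(weights[0]) + len(biases) for weights, biases in layers)


def forward(layers, inputs):
    """Forward pass: tanh on hidden layers, linear output layer (assumed)."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        is_output_layer = index == len(layers) - 1
        activations = z if is_output_layer else [math.tanh(v) for v in z]
    return activations


# 5 descriptors in, 8 hidden nodes, 1 predicted activity out.
network = init_network([5, 8, 1], random.Random(0))
print("adjustable parameters:", count_parameters(network))
print("prediction for a hypothetical descriptor vector:", forward(network, [0.1, 0.2, 0.3, 0.4, 0.5]))
```

A 5-8-1 network of this kind has 5×8 + 8 hidden-layer parameters and 8×1 + 1 output-layer parameters, 57 in total, which is worth comparing with the number of compounds available for training.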
Analogue-based approaches in anti-cancer compound modelling: the relevance of QSAR models
Organic and Medicinal Chemistry Letters, 2011
Background: QSAR is among the most extensively used computational methodologies for analogue-based design. The application of various descriptor classes, such as quantum chemical, molecular mechanics, conceptual density functional theory (DFT)-based, and docking-based descriptors, for predicting anti-cancer activity is well known. Although in vitro assays for anti-cancer activity are available against many different cell lines, most computational studies target an insufficient number of cell lines. Hence, statistically robust and extensive QSAR studies against 29 different cancer cell lines, along with a comparative account, have been carried out. Results: The predictive models were built for 266 compounds with experimental data against 29 different cancer cell lines, employing independent descriptors and the smallest possible number of them. Robust statistical analysis shows high correlation and cross-validation coefficient values, and provides a range of QSAR equations. Comparative performance of each c...
Sanat Tasarim Dergisi
Heterocyclic compounds present unique structural and physicochemical diversity. In particular, aromatic heterocyclic compounds such as indole exhibit anti-cancer properties by facilitating cancer cell death. The drug discovery and development process relies to a large extent on the in silico identification of putative targets and the rational design of potentially therapeutic ligands. We propose a facile strategy of in silico drug design, curating libraries with a functional group modification of indole heterocyclic compounds with 2-chloro-N-(2-chloroethyl)-N-methylethanamine. Subsequently, we compare the designed drugs with the existing alkylating Reference Listed Drugs (RLDs) of the USFDA. We computationally model the indole ring as a basic scaffold and introduce the 2-chloro-N-(2-chloroethyl)-N-methylethanamine substitution at C-3 of indole, with an experimentally available target DNA receptor, to design an extensive library of 200 molecules. This was followed by extensive ligand-DNA docking studies to predict putative targets and by ADMET prediction of an optimized ligand. Our simple in silico strategy reveals that designed compounds such as AGSPBM134, AGSPBM133, AGSPBM131, AGSPBM130, AGSPBM132 and AGSPBM019 exhibit structural similarity to the RLDs, as shown by ECFP-6 fingerprints. We show that they pass all the similarity criteria for the physicochemical parameters, with no violation of Lipinski's rule of five. We posit that these agents could be a good choice for further synthesis in the development of novel anti-cancer agents.
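The "no violation of Lipinski's rule of five" criterion mentioned above can be illustrated with a small check. The sketch assumes the four descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed by some cheminformatics tool; the function name and the example values are hypothetical and are not taken from the paper's compound library.

```python
def lipinski_violations(mol_weight, log_p, h_bond_donors, h_bond_acceptors):
    """Count violations of Lipinski's rule of five for precomputed descriptors."""
    violations = 0
    if mol_weight > 500:        # molecular weight above 500 Da
        violations += 1
    if log_p > 5:               # octanol-water partition coefficient above 5
        violations += 1
    if h_bond_donors > 5:       # more than 5 hydrogen-bond donors
        violations += 1
    if h_bond_acceptors > 10:   # more than 10 hydrogen-bond acceptors
        violations += 1
    return violations


# Hypothetical descriptor values for two made-up candidates.
print(lipinski_violations(mol_weight=300.0, log_p=2.0, h_bond_donors=1, h_bond_acceptors=4))
print(lipinski_violations(mol_weight=600.0, log_p=6.0, h_bond_donors=1, h_bond_acceptors=4))
```

A compound that returns 0 passes all four criteria, which is the condition the abstract reports for the designed library.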
International Journal of Molecular Sciences
Leukemia invades the bone marrow progressively and, through unknown mechanisms, outcompetes healthy hematopoiesis. Protein arginine methyltransferases 1 (PRMT1) are found in prokaryotic and eukaryotic cells. They are necessary for a number of biological processes and have been linked to several human diseases, including cancer. Small compounds that target PRMT1 have a significant impact on both functional research and clinical disease treatment. In fact, numerous PRMT1 inhibitors targeting the S-adenosyl-L-methionine binding region have been studied. Using topographical descriptors, quantitative structure-activity relationships (QSAR) were developed in order to identify the most effective PRMT1 inhibitors among 17 compounds. The model built using linear discriminant analysis accurately classifies over 90% of the investigated active substances. Antileukemic activity is predicted using a multilinear regression analysis, and it can account for more than 56% of the variatio...
Journal of Enzyme Inhibition and Medicinal Chemistry, 2008
Quantitative structure-activity relationship (QSAR) studies have been carried out on indolyl aryl sulfones, a class of novel HIV-1 non-nucleoside reverse transcriptase inhibitors, using physicochemical, topological and structural parameters along with appropriate indicator variables. The statistical tools used were linear methods (e.g., stepwise regression analysis, partial least squares (PLS), factor analysis followed by multiple regression (FA-MLR), genetic function approximation combined with multiple linear regression (GFA-MLR), and GFA followed by PLS, or G/PLS) and a nonlinear method (artificial neural network, or ANN). In the case of physicochemical parameters, GFA-MLR generated the best equation (n = 97, R2 = 0.862, Q2 = 0.821). Using topological parameters, the best equation (based on leave-one-out Q2) was obtained with the stepwise regression technique (n = 97, R2 = 0.867, Q2 = 0.811). When topological and physicochemical parameters were used in combination, statistical quality increased to a great extent (n = 97, R2 = 0.891, Q2 = 0.849 from stepwise regression). Furthermore, the whole dataset was divided into test (25% of the whole dataset) and training (remaining 75%) sets. Models were developed based on the training set, and the predictive potential of these models was checked on the test set. The selection of the training set was based on K-means clustering of the standardized descriptors (topological and physicochemical). In this case also, the best results were obtained with stepwise regression (n = 72, R2 = 0.906, Q2 = 0.853), but the external predictive capacity of this model (R2pred = 0.738) was inferior to that of the model developed with the GFA-MLR technique (R2 = 0.883, Q2 = 0.823, R2pred = 0.760).
However, the squared regression coefficient between observed and predicted activity values of the test set compounds for the best linear model, i.e., GFA-MLR (r2 = 0.736), was lower than that of the best nonlinear model developed using an artificial neural network (r2 = 0.781). Thus, based on external validation, the ANN models were superior to the linear models. The predictive potential of the best linear equation (stepwise regression model) was superior to that of the previously published CoMFA (Q2 = 0.81, SDEP Test = 0.89) on the same data set (Ragno R. et al., J Med Chem 2006, 49, 3172-3184). Furthermore, the physicochemical parameter based models also supported the previous observations based on docking (Ragno R. et al.,
Cogent Chemistry
The pGI50 cytotoxicity values of 112 compounds on the K-562 cancer cell line were modelled in order to illustrate the quantitative structure-activity relationship of the compounds. The data set was divided into training and test sets using the Kennard-Stone algorithm, while the pool of molecular descriptors calculated with the PaDEL-Descriptor program was subjected to a genetic function algorithm to select the descriptors to be modelled. The statistical significance of the model was verified by calculating the values of Q2LOO (0.845), Q2F1 (0.9397), Q2F2 (0.6862) and R2pred (0.6862) needed to evaluate the strength and robustness of the model. The results of the internal and external validation indicate that the model is good and could be used to predict the GI50 of anticancer compounds on the K-562 leukemia cell line.
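For readers unfamiliar with the Kennard-Stone algorithm mentioned above, the following is a minimal sketch of its usual formulation: start from the two most distant compounds in descriptor space, then repeatedly add the compound whose nearest already-selected neighbour is farthest away. The abstract gives no implementation details, so the Euclidean distance, the function name, and the toy descriptor vectors are assumptions.

```python
from math import dist


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone procedure."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")

    # Step 1: seed the selection with the two most distant points.
    best_distance, first, second = -1.0, 0, 1
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            if d > best_distance:
                best_distance, first, second = d, i, j
    selected = [first, second]
    remaining = [k for k in range(n) if k not in selected]

    # Step 2: repeatedly add the point farthest from its nearest selected point.
    while len(selected) < n_select:
        next_index = max(
            remaining,
            key=lambda k: min(dist(points[k], points[s]) for s in selected),
        )
        selected.append(next_index)
        remaining.remove(next_index)
    return selected


# Toy two-descriptor vectors, not data from the paper.
descriptors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (5.0, 5.0)]
training_indices = kennard_stone(descriptors, 3)
test_indices = [k for k in range(len(descriptors)) if k not in training_indices]
print("training:", training_indices, "test:", test_indices)
```

The selected indices would form the training set and the rest the test set, so that the training compounds span the descriptor space as evenly as possible.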
QSAR & Combinatorial Science, 2009
Cancer is among the top ten causes of death in the world and, in spite of the efforts of pharmaceutical companies and many governmental organizations, new and more effective drugs are urgently needed. Computer-assisted studies have been widely used to predict anticancer activity, taking into account different molecular descriptors, statistical techniques, cell lines, and data sets of congeneric and noncongeneric compounds. This paper describes a QSAR study and the successful application of 3D-MoRSE descriptors in developing a Linear Discriminant Analysis (LDA) model to predict the anticancer potential of a diverse set of indolocarbazole derivatives. Despite the structural complexity of this class of compounds, the descriptors used are able to identify the most remarkable features, such as the influence of the polarizability of the substituents and of the interatomic distance in the 7-azaindole moiety on the antiproliferative activity. A comparison with other approaches, such as the GETAWAY, Randić molecular profile, geometrical, and RDF descriptors, was carried out, showing that the model with 3D-MoRSE descriptors gave the best accuracy and predictive capability. An LDA-based desirability analysis was conducted to select the levels of the predictor variables that should generate more desirable drugs, i.e., those with a higher posterior probability of being classified as cytotoxic.