Weighted feature significance: a simple, interpretable model of compound toxicity based on the statistical enrichment of structural features

Ruili Huang et al. Toxicol Sci. 2009 Dec.

Abstract

In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model, which we call weighted feature significance (WFS), to predict the toxicological activity of compounds based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. The enrichment of each structural feature in toxic compounds is evaluated for statistical significance, and the results are compiled into a simple additive model of toxicity that is then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency that were also tested at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian classification and support vector machines. In most test cases, WFS showed similar or slightly better predictive power; the advantage was clearest in the prediction of hepatotoxic compounds, where WFS appeared to perform best among the three methods. The new algorithm has the important advantages of simplicity, predictive power, interpretability, and ease of implementation.
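
The abstract does not spell out the significance test or the weighting scheme, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that (1) each compound is represented as a set of structural-feature labels, (2) enrichment of a feature among toxic compounds is tested with a one-sided hypergeometric (Fisher exact) test on the training set, and (3) a compound's score is the sum of -log10(p) over its features whose p-value falls below a cutoff alpha. The function names, the toy feature labels, and the alpha = 0.05 cutoff are all hypothetical.

```python
from math import comb, log10

# Hypothetical sketch of a WFS-style additive toxicity score. The one-sided
# hypergeometric test and the -log10(p) weights are assumptions, not taken
# from the paper.

def enrichment_p_value(k, n_feature, n_toxic, n_total):
    """P(X >= k) under the hypergeometric null: the chance of seeing at least k
    toxic compounds among the n_feature compounds carrying a feature, when
    n_toxic of n_total training compounds are toxic."""
    upper = min(n_feature, n_toxic)
    tail = sum(comb(n_toxic, i) * comb(n_total - n_toxic, n_feature - i)
               for i in range(k, upper + 1))
    return tail / comb(n_total, n_feature)


def train_feature_pvalues(compounds, labels):
    """compounds: list of sets of feature labels; labels: list of bools (True = toxic).
    Returns {feature: enrichment p-value}, computed on the training set only."""
    n_total = len(compounds)
    n_toxic = sum(labels)
    pvals = {}
    for feature in set().union(*compounds):
        carrier_labels = [y for feats, y in zip(compounds, labels) if feature in feats]
        k = sum(carrier_labels)  # number of toxic compounds carrying the feature
        pvals[feature] = enrichment_p_value(k, len(carrier_labels), n_toxic, n_total)
    return pvals


def wfs_like_score(features, pvals, alpha=0.05):
    """Additive score: sum of -log10(p) over the compound's features that were
    significantly enriched (p < alpha) in training; unseen features are skipped."""
    return sum(-log10(pvals[f]) for f in features if f in pvals and pvals[f] < alpha)


# Toy training set with made-up feature labels (illustration only).
train = [
    {"nitro", "aromatic"}, {"nitro"}, {"nitro", "halide"}, {"nitro", "aromatic"},  # toxic
    {"aromatic"}, {"hydroxyl"}, {"hydroxyl", "aromatic"}, {"halide"},           # non-toxic
]
labels = [True] * 4 + [False] * 4

pvals = train_feature_pvalues(train, labels)
print(round(pvals["nitro"], 4))                                # 1/70 -> 0.0143
print(round(wfs_like_score({"nitro", "hydroxyl"}, pvals), 3))  # log10(70) -> 1.845
print(wfs_like_score({"aromatic"}, pvals))                     # no enriched feature -> 0
```

Under these assumptions, features never seen in training are ignored and a compound with no significantly enriched feature scores 0. Because each feature contributes its own term, the score of any compound can be traced back to the specific substructures that drove it, which is the kind of interpretability the abstract emphasizes.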

Figures

FIG. 1.

ROC curves for the prediction of cytotoxic compounds in the training and testing compound sets using cell viability data generated on the NTP collection. WFS scores were calculated using feature p values derived from the training compound set. When the model was applied to the testing compound set, its predictive power decreased, as indicated by the reduction in AUC, but remained significantly better than random.
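
The captions summarize predictive power by the area under the ROC curve (AUC). As a minimal sketch (the function name is hypothetical, and the scores below are made up), the AUC of a scoring model such as WFS can be computed without drawing the curve, using its equivalence to the Mann-Whitney statistic: the fraction of (toxic, non-toxic) pairs in which the toxic compound receives the higher score, with ties counted as one half. An AUC of 0.5 corresponds to random ranking.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the fraction of (toxic, non-toxic)
    pairs in which the toxic compound scores higher; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one toxic and one non-toxic compound")
    wins = 0.0
    for p in pos:      # O(len(pos) * len(neg)) pair comparisons, kept simple for clarity
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for five test compounds (True = toxic), for illustration only.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [True, False, True, False, False]
print(round(roc_auc(scores, labels), 3))  # 5 of 6 pairs ranked correctly -> 0.833
```

The quadratic pair loop is used only for readability; a rank-sum formulation gives the same value in O(n log n) time on large compound collections.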

FIG. 2.

ROC curves for the prediction of cytotoxic compounds in the EPA collection using three different modeling approaches: WFS, Naive Bayesian, and SMO (a support vector machine trained by sequential minimal optimization). Models were trained on data generated from compounds in the NTP collection.

FIG. 3.

ROC curves for the prediction of compounds that activated caspase-3/7 in the testing compound set of the NTP collection using three different modeling approaches: WFS, Naive Bayesian, and SMO. Models were built using data generated from the training compound set. When the models were applied to the testing compound set, their predictive power decreased, as indicated by the reduction in AUC, but remained significantly better than random.

FIG. 4.

ROC curves for the prediction of hepatotoxic compounds using three different modeling approaches: WFS, Naive Bayesian, and SMO (logistic model). As shown, WFS performs best of the three methods.
