Use of in vitro HTS-derived concentration-response data as biological descriptors improves the accuracy of QSAR models of in vivo toxicity
Alexander Sedykh et al. Environ Health Perspect. 2011 Mar.
Abstract
Background: Quantitative high-throughput screening (qHTS) assays are increasingly being used to inform chemical hazard identification. Hundreds of chemicals have been tested in dozens of cell lines across extensive concentration ranges by the National Toxicology Program in collaboration with the National Institutes of Health Chemical Genomics Center.
Objectives: Our goal was to test a hypothesis that dose-response data points of the qHTS assays can serve as biological descriptors of assayed chemicals and, when combined with conventional chemical descriptors, improve the accuracy of quantitative structure-activity relationship (QSAR) models applied to prediction of in vivo toxicity end points.
Methods: We obtained cell viability qHTS concentration-response data for 1,408 substances assayed in 13 cell lines from PubChem; for a subset of these compounds, rodent acute toxicity half-maximal lethal dose (LD50) data were also available. We used the k nearest neighbor classification and random forest QSAR methods to model LD50 data using chemical descriptors either alone (conventional models) or combined with biological descriptors derived from the concentration-response qHTS data (hybrid models). Critical to our approach was the use of a novel noise-filtering algorithm to treat qHTS data.
Results: Both the external classification accuracy and coverage (i.e., fraction of compounds in the external set that fall within the applicability domain) of the hybrid QSAR models were superior to conventional models.
Conclusions: Concentration-response qHTS data may serve as informative biological descriptors of molecules that, when combined with conventional chemical descriptors, may considerably improve the accuracy and utility of computational approaches for predicting in vivo animal toxicity end points.
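The hybrid-descriptor idea from the Methods can be illustrated with a minimal sketch. It assumes that a hybrid descriptor vector is simply the chemical descriptors followed by the qHTS-derived biological descriptors, and it uses a plain majority-vote k nearest neighbor classifier with Euclidean distance. The function names (`hybrid`, `knn_predict`) and all numbers are hypothetical toy values, not data from the study, and the sketch omits the noise filtering, descriptor selection, and applicability-domain steps that the abstract mentions.

```python
import math
from collections import Counter
from typing import List, Sequence


def hybrid(chem: Sequence[float], bio: Sequence[float]) -> List[float]:
    """Concatenate chemical and biological descriptors into one vector."""
    return list(chem) + list(bio)


def knn_predict(train_x: List[List[float]], train_y: List[int],
                query: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k training compounds nearest to the query."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (assumed values): 1 = toxic, 0 = nontoxic.
chem = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.15], [0.12, 0.18]]
bio = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]

train_x = [hybrid(c, b) for c, b in zip(chem, bio)]
query = hybrid([0.14, 0.16], [1.0, 1.0])
prediction = knn_predict(train_x, labels, query, k=3)
print(prediction)
```

In practice the two descriptor blocks would usually be scaled to comparable ranges before concatenation, so that neither block dominates the distance.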
Figures
Figure 1
Modeling workflow. (A) Preparation of the target data set. (B) Modeling procedure for qHTS LD50 data set.
Figure 2
Examples of qHTS concentration–response curves and their noise-filtering transformations. (A) Original concentration–response curves for three sample chemicals from the qHTS data set (Jurkat cell line, AID no. 426). (B) Data after noise filtering (THR = 15%, MXDV = 5%). THR controls data variation near baseline; MXDV controls deviation from monotonicity. (C) Representation of concentration–response by binary fingerprints. (D) Concentration–response curve fingerprint of β-nitrostyrene. The _x_-axis indicates the qHTS profile based on 14 concentrations: “00 . . . 00 01 11 11 11” indicates 2⁶ + 2⁵ + 2⁴ + 2³ + 2² + 2¹ + 2⁰ = 127.
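The arithmetic in panel D can be checked with a short sketch. It assumes that the fingerprint is read as an ordinary binary number whose rightmost bit has weight 2⁰, and that the elided positions in the caption's example are all zeros, giving seven 0s followed by seven 1s for the 14 concentrations. The helper name `fingerprint_to_int` is hypothetical.

```python
def fingerprint_to_int(fingerprint: str) -> int:
    """Interpret a binary fingerprint string as an unsigned integer.

    Spaces are ignored, so "01 11 11 11" and "01111111" are equivalent.
    The rightmost bit has weight 2**0.
    """
    bits = fingerprint.replace(" ", "")
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("fingerprint must contain only 0, 1, and spaces")
    return int(bits, 2)


# 14-point profile assumed from the caption: seven 0s, then seven 1s.
profile = "00 00 00 01 11 11 11"
value = fingerprint_to_int(profile)
print(value)
```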
Figure 3
Pairwise Euclidean distances in the chemical (_y_-axis) and biological (_x_-axis) descriptor space for the qHTS LD50 data set. Dots represent compound pairs; colors reflect in vivo toxicity: blue, pairs of nontoxic compounds; red, pairs of toxic compounds; green, pairs where one compound is toxic and another nontoxic.
Figure 4
External prediction results of _k_NN models using different classification criteria: distribution of the predicted values (A) and heat maps illustrating classification (B, CCR) and coverage (C, percent chemicals within the applicability domain) results for each pair of classification thresholds T1, T2 (i.e., “nontoxic” < T1 ≤ “not covered” < T2 ≤ “toxic”). Red dashed (A) and diagonal (B,C) lines denote a default single-threshold classification (T1 = T2 = 0.5). Gray (A) and black (B,C) dashed lines denote an example of double-threshold classification (T1 = 0.3 and T2 = 0.7).
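The double-threshold scheme in this caption can be sketched as follows. Predicted values below T1 are called nontoxic, values at or above T2 are called toxic, and values in between are left uncovered. The sketch assumes that CCR is the mean of sensitivity and specificity computed over covered compounds only, which is not spelled out in the caption. The function names and the scores and labels are hypothetical toy values, not results from the study.

```python
import math
from typing import List, Optional, Tuple


def classify(score: float, t1: float, t2: float) -> Optional[int]:
    """Return 0 (nontoxic), 1 (toxic), or None (not covered)."""
    if score < t1:
        return 0
    if score >= t2:
        return 1
    return None


def ccr_and_coverage(scores: List[float], labels: List[int],
                     t1: float = 0.5, t2: float = 0.5) -> Tuple[float, float]:
    """Compute (CCR, coverage) for a pair of thresholds with t1 <= t2."""
    preds = [classify(s, t1, t2) for s in scores]
    covered = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(covered) / len(scores)
    pos = sum(1 for _, y in covered if y == 1)
    neg = sum(1 for _, y in covered if y == 0)
    if pos == 0 or neg == 0:
        return math.nan, coverage  # CCR undefined without both classes
    tp = sum(1 for p, y in covered if p == 1 and y == 1)
    tn = sum(1 for p, y in covered if p == 0 and y == 0)
    return 0.5 * (tp / pos + tn / neg), coverage


# Toy predicted values and true classes (assumed, for illustration only).
scores = [0.10, 0.20, 0.40, 0.45, 0.60, 0.80, 0.90, 0.35]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

single = ccr_and_coverage(scores, labels)            # T1 = T2 = 0.5
double = ccr_and_coverage(scores, labels, 0.3, 0.7)  # T1 = 0.3, T2 = 0.7
print(single, double)
```

On these toy values, widening the gap between T1 and T2 raises CCR and lowers coverage, which is the trade-off that the heat maps in panels B and C display.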
Figure 5
Occurrence frequencies of the descriptors in the hybrid _k_NN (THR = 15%) model (A) and relative frequencies of qHTS biological descriptors (B). Max, maximum. The fraction of most frequent descriptors selected by mean occurrence is marked by a dashed line (A) and by a red arrowhead and red boxes (B).
Grants and funding
- P30 ES010126/ES/NIEHS NIH HHS/United States
- R21 GM076059/GM/NIGMS NIH HHS/United States
- R01 GM066940/GM/NIGMS NIH HHS/United States
- GM066940/GM/NIGMS NIH HHS/United States
- ES005948/ES/NIEHS NIH HHS/United States
- P42 ES005948/ES/NIEHS NIH HHS/United States
- R01 ES015241/ES/NIEHS NIH HHS/United States
- GM076059/GM/NIGMS NIH HHS/United States