Heterogeneity Aware Random Forest for Drug Sensitivity Prediction
Raziur Rahman et al. Sci Rep. 2017.
Abstract
Samples collected in pharmacogenomics databases typically belong to various cancer types. In designing a drug sensitivity predictive model from such a database, a natural question is whether a model trained on diverse, inter-tumor heterogeneous samples will perform similarly to a predictive model that accounts for the heterogeneity of the samples in training and prediction. We explore this question and observe that ensemble model predictions obtained when the cancer type is known outperform predictions made when that information is withheld, even when the sample sizes for the former are considerably lower than the combined sample size. To incorporate heterogeneity into the commonly used ensemble-based predictive model of Random Forests, we propose Heterogeneity Aware Random Forests (HARF), which assign weights to the trees based on the category of the sample. We treat heterogeneity as a latent class allocation problem and present a covariate-free class allocation approach based on the distribution of leaf nodes of the model ensemble. Applications to the CCLE and GDSC databases show that HARF outperforms traditional Random Forest when the average drug responses of the cancer types differ.
Conflict of interest statement
The authors declare that they have no competing interests.
Figures
Figure 1
Melanoma (Skin) tumorigenesis pathway, collected from KEGG.
Algorithm 1
Algorithmic representation of Heterogeneity Aware Random Forest (HARF) Regression.
Figure 2
Three sample trees with leaf information. Boxed numbers represent the samples contained within each leaf node. Red samples belong to cancer type C_A, while green samples belong to cancer type C_B.
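The leaf information described in this caption can be turned into a small worked example. The sketch below is one plausible reading of the abstract, not the authors' Algorithm 1: each tree votes for the majority cancer type in the leaf that a new sample reaches, the sample is allocated to the type with the most votes, and only the trees that agree with that allocation contribute to the prediction. The 0/1 tree weights, the function names, and the toy leaf contents are all assumptions made for illustration.

```python
from collections import Counter


def leaf_mean(leaf):
    """Mean drug response of the training samples stored in one leaf."""
    return sum(response for _, response in leaf) / len(leaf)


def allocate_class(leaves):
    """Allocate a cancer type by majority vote over the trees.

    `leaves` holds one entry per tree: the (cancer_type, response) pairs of
    the training samples in the leaf that the new sample reached.
    Ties are broken by first occurrence (an arbitrary choice in this sketch).
    """
    votes = [
        Counter(label for label, _ in leaf).most_common(1)[0][0]
        for leaf in leaves
    ]
    winner = Counter(votes).most_common(1)[0][0]
    return winner, votes


def harf_like_predict(leaves):
    """Average only the trees whose leaf vote matches the allocated type.

    Assumption: weight 1 for agreeing trees and 0 for the rest.
    """
    winner, votes = allocate_class(leaves)
    kept = [leaf for leaf, vote in zip(leaves, votes) if vote == winner]
    return winner, sum(leaf_mean(leaf) for leaf in kept) / len(kept)


def plain_rf_predict(leaves):
    """Standard Random Forest regression: average over all trees."""
    return sum(leaf_mean(leaf) for leaf in leaves) / len(leaves)


# Hypothetical leaves reached by one new sample in three trees.
example_leaves = [
    [("A", 0.2), ("A", 0.3), ("B", 0.7)],  # tree 1 votes A
    [("A", 0.1), ("A", 0.3)],              # tree 2 votes A
    [("B", 0.8), ("B", 0.6), ("A", 0.4)],  # tree 3 votes B
]

if __name__ == "__main__":
    print(harf_like_predict(example_leaves))
    print(plain_rf_predict(example_leaves))
```

With these toy leaves the sample is allocated to type A, and dropping the tree that voted B pulls the prediction from roughly 0.4 down to roughly 0.3. The gap between the two predictions only appears because the two types have different average responses in the toy data, which mirrors the condition stated in the abstract.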
Figure 3
As the number of training samples increases, the misclassification percentages of HARF, Decision Tree and Linear Discriminant Analysis (LDA) all decrease. The reduction is shown for the drug Nilotinib from the CCLE database and two cancer types, HLT and Lung. For small sample sizes, HARF has the lowest misclassification rate; for large sample sizes, LDA has the lowest, but the differences are minimal in both cases.
Figure 4
Changes in the misclassification rate of HARF and the Bayes error (Eq. 16) for different numbers of trees. Models with few trees have a higher misclassification rate than models with many trees. As expected, the HARF misclassification rate is always higher than the minimum Bayes error, but the difference remains small across the numbers of trees considered. The drug AZD-6244 and the cancer types Skin and CNS were used to generate these curves.
Similar articles
- Application of transfer learning for cancer drug sensitivity prediction.
Dhruba SR, Rahman R, Matlock K, Ghosh S, Pal R. BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):497. doi: 10.1186/s12859-018-2465-y. PMID: 30591023. Free PMC article.
- Functional random forest with applications in dose-response predictions.
Rahman R, Dhruba SR, Ghosh S, Pal R. Sci Rep. 2019 Feb 7;9(1):1628. doi: 10.1038/s41598-018-38231-w. PMID: 30733524. Free PMC article.
- Random forests ensemble classifier trained with data resampling strategy to improve cardiac arrhythmia diagnosis.
Ozçift A. Comput Biol Med. 2011 May;41(5):265-71. doi: 10.1016/j.compbiomed.2011.03.001. Epub 2011 Mar 17. PMID: 21419401.
- Genomic approach towards personalized anticancer drug therapy.
Midorikawa Y, Tsuji S, Takayama T, Aburatani H. Pharmacogenomics. 2012 Jan;13(2):191-9. doi: 10.2217/pgs.11.157. PMID: 22256868. Review.
- Class-imbalanced classifiers for high-dimensional data.
Lin WJ, Chen JJ. Brief Bioinform. 2013 Jan;14(1):13-26. doi: 10.1093/bib/bbs006. Epub 2012 Mar 9. PMID: 22408190. Review.
Cited by
- Big Data to Knowledge: Application of Machine Learning to Predictive Modeling of Therapeutic Response in Cancer.
Panja S, Rahem S, Chu CJ, Mitrofanova A. Curr Genomics. 2021 Dec 16;22(4):244-266. doi: 10.2174/1389202921999201224110101. PMID: 35273457. Free PMC article. Review.
- Contrast-enhanced harmonic endoscopic ultrasound (CH-EUS) MASTER: A novel deep learning-based system in pancreatic mass diagnosis.
Tang A, Tian L, Gao K, Liu R, Hu S, Liu J, Xu J, Fu T, Zhang Z, Wang W, Zeng L, Qu W, Dai Y, Hou R, Tang S, Wang X. Cancer Med. 2023 Apr;12(7):7962-7973. doi: 10.1002/cam4.5578. Epub 2023 Jan 6. PMID: 36606571. Free PMC article. Clinical Trial.
- Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network.
Liu P, Li H, Li S, Leung KS. BMC Bioinformatics. 2019 Jul 29;20(1):408. doi: 10.1186/s12859-019-2910-6. PMID: 31357929. Free PMC article.
- Sstack: an R package for stacking with applications to scenarios involving sequential addition of samples and features.
Matlock K, Rahman R, Ghosh S, Pal R. Bioinformatics. 2019 Sep 1;35(17):3143-3145. doi: 10.1093/bioinformatics/btz010. PMID: 30649230. Free PMC article.
- Machine Learning in Drug Discovery: A Review.
Dara S, Dhamercherla S, Jadav SS, Babu CM, Ahsan MJ. Artif Intell Rev. 2022;55(3):1947-1999. doi: 10.1007/s10462-021-10058-4. Epub 2021 Aug 11. PMID: 34393317. Free PMC article.