DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays

Amrit Singh et al. Bioinformatics. 2019.

Abstract

Motivation: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and for the identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types by selecting a subset of molecular features, while discriminating between multiple phenotypic groups.

Results: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites.

Availability and implementation: DIABLO is implemented in the mixOmics R Bioconductor package, with functions for parameter selection and visualization to assist in the interpretation of the integrative analyses. Tutorials are available at http://mixomics.org and in our Bioconductor vignette.

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Figures

Fig. 1.

Simulation study. (A) Classification error rates (10-fold CV averaged over 20 simulations) for different fold changes (FCs) between groups and varying levels of noise (SD). The dashed line indicates random performance (error rate = 50%). (B) Types of variables among the 180 selected by each classification method.

Fig. 2.

Benchmark for colon cancer. (A) Number of selected features overlapping between supervised and unsupervised methods. (B) Number of correlated variables in the biomarker panels for various Pearson correlation cutoffs. (C) Top: network modularity of each multi-omics biomarker panel. Gray circles depict modules based on the edge betweenness index from the igraph R library. Bottom: consensus component plots depicting the separation of subjects in the high and low survival groups. Similar patterns were observed for the kidney, GBM and lung cancer datasets (see Supplementary Figures S5–S10).

Fig. 3.

A multi-omics biomarker panel predictive of breast cancer subtypes. (A) DIABLO consensus component plot based on the identified multi-omics biomarker panel: test samples are overlaid with 95% confidence ellipses calculated from the training data. (B) Network visualization of the biomarker panel highlighting correlated variables (absolute Pearson correlation > 0.4) and four communities based on the edge betweenness index.
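
The correlation network in panel (B) links variables whose absolute Pearson correlation exceeds a cutoff. As a minimal sketch of that thresholding step only (not the authors' mixOmics implementation), the Python below computes pairwise Pearson correlations and keeps pairs with |r| above 0.4. The function names `pearson` and `correlation_edges` and the values in `toy_panel` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_edges(features, cutoff=0.4):
    """Return (name_a, name_b, r) for every feature pair with |r| > cutoff."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(features.items(), 2):
        r = pearson(a, b)
        if abs(r) > cutoff:
            edges.append((name_a, name_b, r))
    return edges


# Toy values for illustration only; not data from the study.
toy_panel = {
    "mRNA_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "miRNA_1": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated with mRNA_1
    "protein_1": [2.0, 1.0, 2.0, 1.0, 2.0],  # uncorrelated with mRNA_1
}
edges = correlation_edges(toy_panel, cutoff=0.4)
```

The resulting edge list could then be passed to a graph library for community detection; the edge betweenness step mentioned in the caption is not reproduced here.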

Fig. 4.

Asthma study: cross-over design and module-based analysis. (A) DIABLO design includes module-based decomposition to discriminate pre- and post-allergen challenge samples. (B) Receiver operating characteristic curves comparing standard DIABLO and multilevel DIABLO for repeated measures (mDIABLO) using leave-one-out CV. (C) Component plots of the pre- and post-challenge samples (DIABLO and mDIABLO)
