Wenbao Yu - Academia.edu

Papers by Wenbao Yu

A unified model based multifactor dimensionality reduction framework for detecting gene-gene interactions

Bioinformatics (Oxford, England), 2016

Gene-gene interaction (GGI) is one of the most popular approaches for finding and explaining the missing heritability of common complex traits in genome-wide association studies. The multifactor dimensionality reduction (MDR) method has been widely studied for detecting GGI effects. However, there are several disadvantages of the existing MDR-based approaches, such as the lack of an efficient way of evaluating the significance of multi-locus models and the high computational burden due to intensive permutation. Furthermore, the MDR method does not distinguish marginal effects from pure interaction effects. We propose a two-step unified model based MDR approach (UM-MDR), in which the significance of a multi-locus model, even a high-order model, can be easily obtained through a regression framework with a semi-parametric correction procedure for controlling Type I error rates. In comparison to the conventional permutation approach, the proposed semi-parametric correction procedure av...

Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method

Genomics & informatics, 2016

Although a large number of genetic variants have been identified to be associated with common diseases through genome-wide association studies, there still exist limitations in explaining the missing heritability. One approach to solving this missing heritability problem is to investigate gene-gene interactions, rather than a single-locus approach. For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely applied, since the constructive induction algorithm of MDR efficiently reduces high-order dimensions into one dimension by classifying multi-level genotypes into high- and low-risk groups. The MDR method has been extended to various phenotypes and has been improved to provide a significance test for gene-gene interactions. In this paper, we propose a simple method, called accelerated failure time (AFT) UM-MDR, in which the idea of a unified model-based MDR is extended to the survival phenotype by incorporating AFT-MDR into the classif...
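The high- and low-risk classification step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `label_risk_cells`, the input layout, and the rule of comparing each cell's case:control ratio with the ratio in the whole sample are assumptions made for illustration.

```python
from collections import Counter

def label_risk_cells(genotypes, is_case):
    """Label each multi-locus genotype cell as high risk (1) or low risk (0).

    genotypes: list of tuples, one tuple of genotype codes per subject
    is_case:   list of 0/1 flags, 1 for a case and 0 for a control
    Assumed rule (hypothetical threshold choice): a cell is high risk when
    its case:control ratio exceeds the ratio in the whole sample.
    """
    cases = Counter(g for g, y in zip(genotypes, is_case) if y == 1)
    controls = Counter(g for g, y in zip(genotypes, is_case) if y == 0)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    labels = {}
    for cell in set(genotypes):
        # Cross-multiplied comparison avoids dividing by zero for cells
        # that contain no controls.
        high = cases[cell] * n_controls > controls[cell] * n_cases
        labels[cell] = 1 if high else 0
    return labels

# Toy data: two loci, six subjects (three cases, three controls).
toy_genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
toy_is_case = [1, 1, 1, 0, 0, 0]
toy_labels = label_risk_cells(toy_genotypes, toy_is_case)
```

The resulting one-dimensional label is what replaces the multi-locus genotype as a single binary predictor in later steps.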

Two simple algorithms on linear combination of multiple biomarkers to maximize partial area under the ROC curve

Computational Statistics & Data Analysis, 2015

In clinical practice, it is common that several biomarkers are related to a specific disease and each single marker does not have enough diagnostic power. An effective way to improve the diagnostic accuracy is to combine multiple markers. It is known that the area under the receiver operating characteristic curve (AUC) is very popular for evaluation of a diagnostic tool. Su and Liu (1993) derived the best linear combination that maximizes AUC when the markers are multivariate normally distributed. However, there are many applications that do not operate in the entire range of the curve, but only in particular regions of it, for example, high specificity regions. In these cases, it is more practical to analyze the partial area under the curve (pAUC). In this paper, we propose two easily implemented algorithms to find the best linear combination of multiple biomarkers that optimizes the pAUC for a given range of specificity. Analysis of synthesized and real datasets shows that the proposed algorithms achieve larger predictive pAUC values on future observations than existing methods, such as Su and Liu's method, logistic regression and others.
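As a rough illustration of the quantity being optimized, the sketch below scores a fixed linear combination of markers by its empirical pAUC over a low false positive rate range (equivalently, a high specificity range). It does not reproduce either of the paper's two algorithms. The names `combine` and `partial_auc`, the toy data, and the simplifying assumptions (higher scores indicate disease, no tied scores, and `max_fpr` times the number of controls is a whole number) are all illustrative.

```python
def combine(marker_rows, weights):
    """Return one linear-combination score per subject."""
    return [sum(w * m for w, m in zip(weights, row)) for row in marker_rows]

def partial_auc(case_scores, control_scores, max_fpr):
    """Empirical pAUC for false positive rates in [0, max_fpr].

    Assumes higher scores indicate disease, no ties, and that
    max_fpr * len(control_scores) is a whole number.
    """
    n_cases = len(case_scores)
    n_controls = len(control_scores)
    k = int(round(max_fpr * n_controls))
    # Only the k highest-scoring controls fall inside the FPR range.
    top_controls = sorted(control_scores, reverse=True)[:k]
    wins = sum(1 for x in case_scores for y in top_controls if x > y)
    return wins / (n_cases * n_controls)

# Toy scores: two cases and four controls, FPR restricted to [0, 0.5].
toy_pauc = partial_auc([2.5, 5.0], [1.0, 2.0, 3.0, 4.0], max_fpr=0.5)

# Toy two-marker data combined with hypothetical weights (0.5, 1.0).
toy_scores = combine([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0])
```

In this convention the largest attainable value is `max_fpr` itself, so the toy result of 0.25 is half of the maximum 0.5.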

Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions

Human Heredity, 2015

Determining gene-gene interactions and the missing heritability of complex diseases is a challenging topic in genome-wide association studies. The multifactor dimensionality reduction (MDR) method is one of the most commonly used methods for identifying gene-gene interactions with dichotomous phenotypes. For quantitative phenotypes, the generalized MDR or quantitative MDR (QMDR) methods have been proposed. These methods are known as univariate methods because they consider only one phenotype. To date, there are few methods for analyzing multiple phenotypes. To address this problem, we propose a multivariate QMDR method (Multi-QMDR) for multivariate correlated phenotypes. We summarize the multivariate phenotypes into a univariate score by dimensional reduction analysis, and then classify the samples accordingly into high-risk and low-risk groups. We use different ways of summarizing, mainly based on the principal components. Multi-QMDR is model-free and easy to implement. Multi-QMDR is applied to lipid-related traits. The properties of Multi-QMDR were investigated through simulation studies. Empirical studies show that Multi-QMDR outperforms existing univariate and multivariate methods at identifying causal interactions. The Multi-QMDR approach improves the performance of QMDR when multiple quantitative phenotypes are available. © 2015 S. Karger AG, Basel.

The dynamics of NF-κB pathway regulated by circadian clock

Mathematical Biosciences, 2015

The circadian clock regulates many physiological parameters involving immune response to infectious agents, which is mediated by activation of the transcription factor NF-κB. Thus, understanding the NF-κB dynamics regulated by circadian clocks will help in developing better therapeutics. To this end, we proposed a detailed model in the present work on the basis of understanding inflammatory response under control from circadian clocks. Our results show that the frequencies and amplitudes of the NF-κB oscillation are dependent on the strength and modes of coupling to the circadian clock. This circadian control of the NF-κB pathway can therefore serve as a useful mechanism in keeping the system in check and controlling inflammatory response induced by infection and other agents. The results are consistent with earlier experimental findings.

Bistable switch in let-7 miRNA biogenesis pathway involving Lin28

International journal of molecular sciences, Jan 21, 2014

miRNAs are small noncoding RNAs capable of regulating gene expression at the post-transcriptional level. A growing body of evidence has demonstrated that the let-7 family of miRNAs, as one of the highly conserved miRNAs, plays an important role in cell differentiation and development, as well as tumor suppressor function depending on their levels of expression. To explore the physiological significance of let-7 in regulating cell fate decisions, we present a coarse-grained model of the let-7 biogenesis network, in which let-7 and its regulator Lin28 mutually inhibit each other. The dynamics of this minimal network architecture indicates that, as the concentration of Lin28 increases, the system undergoes a transition from monostability to bistability and then to a one-way switch with increasing strength of positive feedback of let-7, while in the absence of Lin28 inhibition, the system loses bistability. Moreover, the ratio of degradation rates of let-7 and Lin28 is critical for the switching sensitivity...

Comparison of paired ROC curves through a two-stage test

Journal of Biopharmaceutical Statistics, 2014

The area under the Receiver Operating Characteristic (ROC) curve (AUC) is a popularly used index when comparing two ROC curves. Statistical tests based on it for analyzing the difference have been well developed. However, this index is less informative when two ROC curves cross and have similar AUCs. In order to detect differences between ROC curves in such situations, a two-stage non-parametric test that uses a shifted area under the ROC curve (sAUC), along with AUCs, is proposed for paired designs. The new procedure is shown, numerically, to be effective in terms of power under a wide range of scenarios; additionally, it outperforms two conventional

AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data

BMC genomics, Jan 12, 2014

It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus apply the penalized regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non...

A modified area under the ROC curve and its application to marker selection and classification

Journal of the Korean Statistical Society, 2014

The area under the ROC curve (AUC) can be interpreted as the probability that the classification score of a diseased subject is larger than that of a non-diseased subject for a randomly sampled pair of subjects. From the perspective of classification, we want to find a way to separate two groups as distinctly as possible via AUC. When the difference of the scores of a marker is small, its impact on classification is less important. Thus, a new diagnostic/classification measure based on a modified area under the ROC curve (mAUC) is proposed, which is defined as a weighted sum of two AUCs, where the AUC with the smaller difference is assigned a lower weight, and vice versa. Using mAUC is robust in the sense that mAUC gets larger as AUC gets larger as long as they are not equal. Moreover, in many diagnostic situations, only a specific range of specificity is of interest. Under normal distributions, we show that if the AUCs of two markers are within similar ranges, the larger mAUC implies the larger partial AUC for a given specificity. This property of mAUC will help to identify the marker with the higher partial AUC, even when the AUCs are similar. Two nonparametric estimates of an mAUC and their variances are given. We also suggest the use of mAUC as the objective function for classification, and the use of the gradient Lasso algorithm for classifier construction and marker selection. Application to simulation datasets and real microarray gene expression datasets shows that our method finds a linear classifier with a higher ROC curve than some other existing linear classifiers, especially in the range of low false positive rates.
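The probabilistic reading of AUC given at the start of this abstract can be checked directly by counting pairs. The sketch below covers only the ordinary AUC, not the mAUC proposed in the paper, whose weights are not given in the abstract. The function name `empirical_auc` and the convention of counting tied pairs as one half are assumptions.

```python
def empirical_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Tied pairs contribute one half (an assumed, commonly used convention).
    """
    total = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))

# Toy scores: six pairs, five clear wins for the case and one tie.
toy_auc = empirical_auc([3.0, 4.0], [1.0, 2.0, 3.0])
```

Here `toy_auc` equals 5.5 / 6, since the tied pair (3.0, 3.0) counts as half a win.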

A new evaluation measure of diagnostic tests based on modified area under the receiver operating characteristic curve

2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), 2010

In this paper, a new diagnostic/classification measure based on modified AUC (MAUC) is proposed. By this measure, we penalize the margin of features between diseased and non-diseased groups in AUC. It is threshold-independent, and under a normal distribution assumption, we can prove that higher MAUC always means higher PAUC (within relatively low FPR) when AUCs are close to each other. Our simulations and an experiment on prostate cancer MS data also help to demonstrate this.
