Hua Liang - Academia.edu

Papers by Hua Liang

GEE Analysis in Partially Linear Single-Index Models for Longitudinal Data

In this article, we study a partially linear single-index model for longitudinal data under a general framework which includes both the sparse and dense longitudinal data cases. A semiparametric estimation method based on a combination of local linear smoothing and generalized estimating equations (GEE) is introduced to estimate the two parameter vectors as well as the unknown link function. Under some mild conditions, we derive the asymptotic properties of the proposed parametric and nonparametric estimators in different scenarios, from which we find that the convergence rates and asymptotic variances of the proposed estimators for sparse longitudinal data would be substantially different from those for dense longitudinal data. We also discuss the estimation of the covariance (or weight) matrices involved in the semiparametric GEE method. Furthermore, we provide some numerical studies including Monte Carlo simulation and an empirical application to illustrate our methodology an...

Quantile Regression Estimates for a Class of Linear and Partially Linear Errors-in-Variables Models

We consider the problem of estimating quantile regression coefficients in errors-in-variables models. When the error variables for both the response and the manifest variables have a joint distribution that is spherically symmetric but otherwise unknown, the regression quantile estimates based on orthogonal residuals are shown to be consistent and asymptotically normal. We also extend the work to partially linear models when the response is related to some additional covariate.

Analysis of Schizophrenia Data Using A Nonlinear Threshold Index Logistic Model

PLoS ONE, 2014

Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful in prediction of disease risk. However, modeling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to deal with the complex, nonlinear effects of categorical/discrete SNP covariates for schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works well for moderate-size samples. The suggested approach is therefore applied to the analysis of schizophrenia classification using a real set of SNP data from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models outperform the widely used linear and tree-based logistic regression models in class prediction of schizophrenia risk with SNP data in terms of both Type I/II error rates and ROC curves.

Asymptotic normality of parametric part in partially linear models with measurement error in the nonparametric part

Journal of Statistical Planning and Inference, 2000

Partially Linear Single-Index Measurement Error Models

Statistica Sinica, 2005

Statistica Sinica 15 (2005), 99-116. Hua Liang and Naisyin Wang, St. Jude Children's Research Hospital and Texas A&M University. Abstract: We consider a partially linear single-index model Y = η(ZT ...

Iterative Likelihood: A Unified Inference Tool

Journal of Computational and Graphical Statistics, 2021

We propose a framework for inference based on an "iterative likelihood function", which provides a unified representation for a number of iterative approaches, including the EM algorithm and the generalized estimating equations. The parameters are decoupled to facilitate construction of the inference vehicle, to simplify computation, or to ensure robustness to model misspecification and then recoupled to retain their original interpretations. For simplicity, throughout the paper we will refer to the log-likelihood as the "likelihood". We define the global, local, and stationary estimates of an iterative likelihood and, correspondingly, the global, local, and stationary attraction points of the expected iterative likelihood. Asymptotic properties of the global, local, and stationary estimates are derived under certain assumptions. An iterative likelihood is usually constructed such that the true value of the parameter is a point of attraction of the expected log-likelihood. Often, one can only verify that the true value of the parameter is a local or stationary attraction, but not a global attraction. We show that when the true value of the parameter is a global attraction, any global estimate is consistent and asymptotically normal; when the true value is a local or stationary attraction, there exists a local or stationary estimate that is consistent and asymptotically normal, with a probability tending to 1. The behavior of the estimates under a misspecified model is also discussed.
Our methodology is illustrated with three examples: 1) estimation of the treatment group difference in the level of censored HIV RNA viral load from an AIDS clinical trial; 2) analysis of the relationship between forced expiratory volume and height in girls from a longitudinal pulmonary function study; and 3) investigation of the impact of smoking on lung cancer in the presence of DNA adducts. Two additional examples are in the supplementary materials: GEEs (Generalized Estimating Equations) with missing covariates and an unweighted estimator for big data with subsampling.

Efficient Diagnostics for Parametric Regression Models with Distortion Measurement Errors Incorporating Dimension-reduction

Statistica Sinica, 2022

In this work, we study the diagnostics of parametric regression models when both the response variable and covariates are distorted with errors. We employ a projected empirical process to develop Cramér-von Mises and Kolmogorov-Smirnov tests with dimension-reduction effects. We apply random approximation to enable the expedient calculation of the Kolmogorov-Smirnov test for checking the suitability of regression models. The proposed tests are shown to be consistent and can detect an alternative hypothesis close to the null hypothesis at the root-n rate. Simulation studies show that the proposed tests outperform the existing methods. A real data set is analyzed for illustration.

Estimation of Single-index Models with Fixed Censored Responses

Statistica Sinica, 2020

We propose a new procedure to estimate the index parameter and link function of single-index models, where the response variable is subject to fixed censoring. Under some regularity conditions, we show that the estimated index parameter is root-n consistent and asymptotically normal, and the estimated nonparametric link function achieves the optimal convergence rate and is asymptotically normal. In addition, we propose a linearity testing method for the nonparametric link function. The simulation study shows that the proposed procedures perform well in finite sample experiments. An application to an HIV dataset is presented for illustration.

Bi-level variable selection in high-dimensional Tobit models

Statistics and Its Interface, 2020

To study variable selection for high-dimensional Tobit models, we formulate Tobit models as single-index models. We combine group variable selection procedures for single-index models with univariate regression methods for Tobit models to achieve variable selection for Tobit models with group structures taken into consideration. The procedure is computationally efficient and easily implemented. Finite sample experiments show its promising performance. We also illustrate its utility by analyzing a dataset from an HIV/AIDS study.

A projection-based consistent test incorporating dimension-reduction in partial linear models

Statistica Sinica, 2021

We propose a projection-based test to check partially linear models. The proposed test achieves a reduction in dimension and, in the presence of multiple linear regressors, the proposed method behaves as if only a single covariate is present. The test is shown to be consistent and can detect Pitman local alternative hypothetical models. We further derive asymptotic distributions of the proposed test under the null hypothesis, and local and global alternatives. Most importantly, the test's numerical performance is consistently and remarkably superior to its competitors. Real examples are presented for illustration. Although we assume that the nonparametric component of the model has a univariate covariate, our model can be generalized to partially linear additive models, partially linear single-index models, and other models with linear and nonparametric components.

A generalized partially linear framework for variance functions

Annals of the Institute of Statistical Mathematics, 2017

When modeling the heteroscedasticity in a broad class of partially linear models, we allow the variance function to be a partial linear model as well and the parameters in the variance function to be different from those in the mean function. We develop a two-step estimation procedure, where in the first step some initial estimates of the parameters in both the mean and variance functions are obtained and then in the second step the estimates are updated using the weights calculated based on the initial estimates. The resulting weighted estimators of the linear coefficients in both the mean and variance functions are shown to be asymptotically normal, more efficient than the initial un-weighted estimators, and most efficient in the sense of semiparametric efficiency for some special cases. Simulation experiments are conducted to examine the

Additive partially linear models for massive heterogeneous data

Electronic Journal of Statistics, 2019

We consider an additive partially linear framework for modelling massive heterogeneous data. The major goal is to extract multiple common features simultaneously across all sub-populations while exploring heterogeneity of each sub-population. We propose an aggregation type of estimators for the commonality parameters that possess the asymptotic optimal bounds and the asymptotic distributions as if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast and the tuning parameters are selected carefully. A plugin estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available. Furthermore, we develop a heterogeneity test for the linear components and a homogeneity test for the non-linear components accordingly. The performance of the proposed methods is evaluated via simulation studies and an application to the Medicare Provider Utilization and Payment data.

Using single-index ODEs to study dynamic gene regulatory network

PLoS ONE, 2018

With the development of biotechnology, high-throughput studies on protein-protein, protein-gene, and gene-gene interactions have become possible and attracted remarkable attention. To explore the interactions in dynamic gene regulatory networks, we propose a single-index ordinary differential equation (ODE) model and develop a variable selection procedure. We employ the smoothly clipped absolute deviation (SCAD) penalty function for variable selection. We analyze a yeast cell cycle gene expression data set to illustrate the usefulness of the single-index ODE model. In real data analysis, we group genes into functional modules using the smoothing spline clustering approach. We estimate state functions and their first derivatives for functional modules using penalized spline-based nonparametric mixed-effects models and the spline method. We substitute the estimates into the single-index ODE models, and then use the penalized profile least-squares procedure to identify network struc...
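For readers unfamiliar with the SCAD penalty named in this abstract, the following is a minimal sketch of its commonly used piecewise definition. The function name `scad_penalty` and the default `a = 3.7` (a conventional choice in the literature) are assumptions made for illustration; the abstract does not state which tuning values the paper uses.

```python
def scad_penalty(theta, lam, a=3.7):
    """Evaluate the SCAD penalty at a single coefficient value.

    Illustrative sketch only: `lam` is the tuning parameter (> 0) and
    `a` is the shape parameter (> 2); a = 3.7 is a conventional default,
    not a value taken from the paper.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("require lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients: behaves like the L1 (lasso) penalty.
        return lam * t
    if t <= a * lam:
        # Intermediate range: quadratic blend that tapers the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.5, 2.0, 10.0):
        print(value, scad_penalty(value, lam=1.0))
```

The penalty is linear near zero (encouraging sparsity) and flat for large coefficients (limiting the bias on strong signals), which is the property that motivates its use in variable selection.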

Separation of linear and index covariates in partially linear single-index models

Journal of Multivariate Analysis, 2016

Motivated to automatically partition predictors into a linear part and a nonlinear part in partially linear single-index models (PLSIM), we consider the estimation of a partially linear single-index model where the linear part and the nonlinear part involve the same set of covariates. We use two penalties to identify the nonzero components of the linear and index vectors, which automatically separates covariates into the linear and nonlinear parts, and thus solves the difficult problem of model structure identification in PLSIM. We propose an estimation procedure that takes into account constraints guaranteeing identifiability of the model, and establish its asymptotic properties. Both simulated and real data are used to illustrate the estimation procedure.

Variable Selection and Model Averaging for Longitudinal Data Incorporating GEE Approach

Statistica Sinica, 2017

The Akaike information criterion (AIC), which is based on maximum likelihood estimation and cannot be applied directly to situations in which likelihood functions are not available, has been modified for variable selection in longitudinal data with generalized estimating equations via a working independence model. This paper proposes another modification to AIC, the difference between the quasi-likelihood functions of a candidate model and of a narrow model plus a penalty term. Such a difference avoids calculating complex integration from quasi-likelihood, but inherits theoretical asymptotic properties from AIC. We also propose a focused information criterion for variable selection on the basis of the quasi-score function. Further, this paper develops a frequentist model average estimator for longitudinal data with generalized estimating equations. Simulation studies provide evidence of the superiority of the proposed procedures. The procedures are further applied to a data example.

Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models

Journal of the American Statistical Association, 2016

Considering model averaging estimation in generalized linear models, we propose a weight choice criterion based on the Kullback-Leibler (KL) loss with a penalty term. This criterion is different from that for continuous observations in principle, but reduces to the Mallows criterion in that situation. We prove that the corresponding model averaging estimator is asymptotically optimal under certain assumptions. We further extend our concern to the generalized linear mixed-effects model framework and establish associated theory. Numerical experiments illustrate that the proposed method is promising.

Identification of significant B cell associations with undetected observations using a Tobit model

Statistics and Its Interface, 2016

To study the relationship of serum antibody neutralization activity (determined by IC50) and the B cell immune response, we face two challenges: (i) IC50 values cannot be observed when they are below the detection limit, and (ii) the number of factors is larger than the number of observations. To address these two challenges, we propose a Tobit model for the analysis of the study, and an adaptive LASSO penalized variable selection procedure to identify important factors. Furthermore, we suggest the extended Bayesian information criterion for selecting the tuning parameter. Our analysis indicates that three measured B cells, specifically the frequency of CD19+CD20+, CD19-CD20+, and IgD-B220-CD27- peripheral blood B cell subsets, have significant effects on IC50. We have also run simulation studies to evaluate the numerical performance of the proposed method.
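To make challenge (i) concrete, here is a minimal sketch of the standard Tobit log-likelihood for responses that are left-censored at a detection limit. It assumes a linear mean and Gaussian errors; the names `tobit_loglik` and `limit` are hypothetical, and the sketch omits the adaptive LASSO penalty and tuning-parameter selection described in the abstract.

```python
import math


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log CDF of the standard normal, via the complementary error function.

    Note: for extremely negative z the CDF underflows to 0 and log() fails;
    a production implementation would need a more careful tail formula.
    """
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y, limit):
    """Log-likelihood of a Tobit model left-censored at `limit`.

    beta  : list of regression coefficients
    sigma : error standard deviation (> 0)
    X     : list of covariate rows, each the same length as beta
    y     : observed responses; values <= limit are treated as censored
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu = sum(b * v for b, v in zip(beta, x_i))
        if y_i <= limit:
            # Censored: only know the latent value fell below the limit.
            total += normal_logcdf((limit - mu) / sigma)
        else:
            # Fully observed: ordinary Gaussian density contribution.
            total += normal_logpdf((y_i - mu) / sigma) - math.log(sigma)
    return total


if __name__ == "__main__":
    X = [[1.0, 0.2], [1.0, 1.5], [1.0, 0.9]]
    y = [0.0, 2.1, 1.3]  # first value sits at the detection limit
    print(tobit_loglik([0.5, 1.0], 1.0, X, y, limit=0.0))
```

Censored observations contribute a probability (the chance of falling below the limit) rather than a density, which is what lets the model use undetected measurements instead of discarding them.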

Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates

The Annals of Statistics, 2015

In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the "large p small n" setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.

Integrated conditional moment test for partially linear single index models incorporating dimension-reduction

Electronic Journal of Statistics, 2014

Studying model checking problems for partially linear single-index models, we propose a variant of the integrated conditional moment test using a linear projection weighting function, which gains dimension reduction and makes the proposed method act as if there exists only one covariate even in the presence of multiple dimensional regressors. We derive asymptotic distributions of the proposed test; i.e., an integral of a

Partially linear single index models for repeated measurements

Journal of Multivariate Analysis, 2014

In this article, we study the estimation of partially linear single-index models (PLSiM) with repeated measurements. Specifically, we approximate the nonparametric function by the polynomial spline, and then employ the quadratic inference function (QIF) together with the profile principle to derive the QIF-based estimators for the linear coefficients. The asymptotic normality of the resulting linear coefficient estimators and the optimal convergence rate of the nonparametric function estimate are established. In addition, we employ a penalized procedure to simultaneously select significant variables and estimate unknown parameters. The resulting penalized QIF estimators are shown to have the oracle property, and Monte Carlo studies support this finding. An empirical example is also presented to illustrate the usefulness of penalized estimators.

Research paper thumbnail of GEE ANALYSIS IN PARTIALLY LINEAR SINGLE-INDEX MODELS FOR LONGITUDINAL DATA By

In this article, we study a partially linear single-index model for longitudinal data under a gen... more In this article, we study a partially linear single-index model for longitudinal data under a general framework which includes both the sparse and dense longitudinal data cases. A semiparametric estimation method based on a combination of the local linear smoothing and generalized estimation equations (GEE) is introduced to estimate the two parameter vectors as well as the unknown link function. Under some mild conditions, we derive the asymptotic properties of the proposed parametric and nonparametric estimators in different scenarios, from which we find that the convergence rates and asymptotic variances of the proposed estimators for sparse longitudinal data would be substantially different from those for dense longitudinal data. We also discuss the estimation of the covariance (or weight) matrices involved in the semiparametric GEE method. Furthermore, we provide some numerical studies including Monte Carlo simulation and an empirical application to illustrate our methodology an...

Research paper thumbnail of Quantile Regression Estimates for a Class of Linear and Partially Linear Errors-in-Variables Models

We consider the problem of estimating quantile regression coefficients in errors-in-variables mod... more We consider the problem of estimating quantile regression coefficients in errors-in-variables models. When the error variables for both the response and the manifest variables have a joint distribution that is spherically symmetric but otherwise unknown, the regression quantile estimates based on orthogonal residuals are shown to be consistent and asymptotically normal. We also extend the work to partially linear models when the response is related to some additional covariate.

Research paper thumbnail of Analysis of Schizophrenia Data Using A Nonlinear Threshold Index Logistic Model

PLoS ONE, 2014

Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognize... more Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful in prediction of disease risk. However, how to model the genetic data that is often categorical in disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to deal with the complex, nonlinear effects of categorical/discrete SNP covariates for Schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works viably well for moderate-size samples. The suggested approach is therefore applied to the analysis of the Schizophrenia classification by using a real set of SNP data from Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models well outperform the widely used linear and tree based logistic regression models in class prediction of schizophrenia risk with SNP data in terms of both Types I/II error rates and ROC curves.

Research paper thumbnail of Asymptotic normality of parametric part in partially linear models with measurement error in the nonparametric part

Journal of Statistical Planning and Inference, 2000

Research paper thumbnail of Partially Linear Single-Index Measurement Error Models

Statistica Sinica, 2005

Statistica Sinica 15(2005), 99-116 PARTIALLY LINEAR SINGLE-INDEX MEASUREMENT ERROR MODELS Hua Lia... more Statistica Sinica 15(2005), 99-116 PARTIALLY LINEAR SINGLE-INDEX MEASUREMENT ERROR MODELS Hua Liang and Naisyin Wang St. Jude Children's Research Hospital and Texas A&M University Abstract: We consider a partially linear single-index model Y = η(ZT ...

Research paper thumbnail of Iterative Likelihood: A Unified Inference Tool

Journal of Computational and Graphical Statistics, 2021

We propose a framework for inference based on an "iterative likelihood function", which provides ... more We propose a framework for inference based on an "iterative likelihood function", which provides a unified representation for a number of iterative approaches, including the EM algorithm and the generalized estimating equations. The parameters are decoupled to facilitate construction of the inference vehicle, to simplify computation, or to ensure robustness to model misspecification and then recoupled to retain their original interpretations. For simplicity, throughout the paper we will refer to the log-likelihood as the "likelihood". We define the global, local, and stationary estimates of an iterative likelihood and, correspondingly, the global, local, and stationary attraction points of the expected iterative likelihood. Asymptotic properties of the global, local, and stationary estimates are derived under certain assumptions. An iterative likelihood is usually constructed such that the true value of the parameter is a a point of attraction of the expected log-likelihood. Often, one can only verify that the true value of the parameter is a local or stationary attraction, but not a global attraction. We show that when the true value of the parameter is a global attraction, any global estimate is consistent and asymptotically normal; when the true value is a local or stationary attraction, there exists a local or stationary estimate that is consistent and asymptotically normal, with a probability tending to 1. The behavior of the estimates under a misspecified model is also discussed. 
Our methodology is illustrated with three examples: 1) estimation of the treatment group difference in the level of censored HIV RNA viral load from an AIDS clinical trial; 2) analysis of the relationship between forced expiratory volume and height in girls from a longitudinal pulmonary function study; and 3) investigation of the impact of smoking on lung cancer in the presence of DNA adducts. Two additional examples are in the supplementary materials, GEEs (Generalized Estimating Equations) with missing covariates and an unweighted estimator for big data with subsampling.

Research paper thumbnail of Efficient Diagnostics for Parametric Regression Models with Distortion Measurement Errors Incorporating Dimension-reduction

Statistica Sinica, 2022

In this work, we study the diagnostics of parametric regression models when both the response var... more In this work, we study the diagnostics of parametric regression models when both the response variable and covariates are distorted with errors. We employ a projected empirical process to develop Cramér-von Mises and Kolmogorov-Smirnov tests with dimension-reduction effects. We apply random approximation to enable the expedient calculation of Kolmogorov-Smirnov test for checking the suitability of regression models. The proposed tests are shown to be consistent and can detect an alternative hypothesis close to the null hypothesis at the root−n rate. Simulation studies show that the proposed tests outperform the existing methods. A real data set is analyzed for illustration.

Research paper thumbnail of Estimation of Single-index Models with Fixed Censored Responses

Statistica Sinica, 2020

We propose a new procedure to estimate the index parameter and link function of single-index mode... more We propose a new procedure to estimate the index parameter and link function of single-index models, where the response variable is subject to fixed censoring. Under some regularity conditions, we show that the estimated index parameter is root-n consistent and asymptotically normal, and the estimated nonparametric link function achieves the optimal convergence rate and is asymptotically normal. In addition, we propose a linearity testing method for the nonparametric link function. The simulation study shows that the proposed procedures perform well in finite sample experiments. An application to an HIV dataset is presented for illustration.

Research paper thumbnail of Bi-level variable selection in high-dimensional Tobit models

Statistics and Its Interface, 2020

To study variable selection for high dimensional Tobit models, we formulate Tobit models to singl... more To study variable selection for high dimensional Tobit models, we formulate Tobit models to single-index models. We hybrid group variable selection procedures for single index models and univariate regression methods for Tobit models to achieve variable selection for Tobit models with group structures taken into consideration. The procedure is computationally efficient and easily implemented. Finite sample experiments show its promising performance. We also illustrate its utility by analyzing a dataset from an HIV/AIDS study.

Research paper thumbnail of A projection-based consistent test incorporating dimension-reduction in partial linear models

Statistica Sinica, 2021

We propose a projection-based test to check partially linear models. The proposed test achieves a... more We propose a projection-based test to check partially linear models. The proposed test achieves a reduction in dimension and, in the presence of multiple linear regressors, the proposed method behaves as if only a single covariate is present. The test is shown to be consistent and can detect Pitman local alternative hypothetical models. We further derive asymptotic distributions of the proposed test under the null hypothesis, and local and global alternatives. Most importantly, the test's numerical performance is consistently and remarkably superior to its competitors. Real examples are presented for illustration. Although we assume that the nonparametric component of the model has a univariate covariate, our model can be generalized to partially linear additive models, partially linear single-index models, and other models with linear and nonparametric components.

Research paper thumbnail of A generalized partially linear framework for variance functions

Annals of the Institute of Statistical Mathematics, 2017

When model the heteroscedasticity in a broad class of partially linear models, we allow the varia... more When model the heteroscedasticity in a broad class of partially linear models, we allow the variance function to be a partial linear model as well and the parameters in the variance function to be different from those in the mean function. We develop a two-step estimation procedure, where in the first step some initial estimates of the parameters in both the mean and variance functions are obtained and then in the second step the estimates are updated using the weights calculated based on the initial estimates. The resulting weighted estimators of the linear coefficients in both the mean and variance functions are shown to be asymptotically normal, more efficient than the initial un-weighted estimators, and most efficient in the sense of semiparametric efficiency for some special cases. Simulation experiments are conducted to examine the

Additive partially linear models for massive heterogeneous data

Electronic Journal of Statistics, 2019

We consider an additive partially linear framework for modelling massive heterogeneous data. The major goal is to extract multiple common features simultaneously across all sub-populations while exploring heterogeneity of each sub-population. We propose an aggregation type of estimators for the commonality parameters that possess the asymptotic optimal bounds and the asymptotic distributions as if there were no heterogeneity. This oracle result holds when the number of sub-populations does not grow too fast and the tuning parameters are selected carefully. A plugin estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available. Furthermore, we develop a heterogeneity test for the linear components and a homogeneity test for the non-linear components accordingly. The performance of the proposed methods is evaluated via simulation studies and an application to the Medicare Provider Utilization and Payment data.

Using single-index ODEs to study dynamic gene regulatory network

PloS one, 2018

With the development of biotechnology, high-throughput studies on protein-protein, protein-gene, and gene-gene interactions have become possible and have attracted remarkable attention. To explore the interactions in dynamic gene regulatory networks, we propose a single-index ordinary differential equation (ODE) model and develop a variable selection procedure. We employ the smoothly clipped absolute deviation (SCAD) penalty for variable selection. We analyze a yeast cell cycle gene expression data set to illustrate the usefulness of the single-index ODE model. In the real data analysis, we group genes into functional modules using the smoothing spline clustering approach. We estimate state functions and their first derivatives for functional modules using penalized spline-based nonparametric mixed-effects models and the spline method. We substitute the estimates into the single-index ODE models, and then use the penalized profile least-squares procedure to identify network struc...

Separation of linear and index covariates in partially linear single-index models

Journal of Multivariate Analysis, 2016

Motivated to automatically partition predictors into a linear part and a nonlinear part in partially linear single-index models (PLSIM), we consider the estimation of a partially linear single-index model where the linear part and the nonlinear part involve the same set of covariates. We use two penalties to identify the nonzero components of the linear and index vectors, which automatically separates covariates into the linear and nonlinear parts, and thus solves the difficult problem of model structure identification in PLSIM. We propose an estimation procedure that takes into account the constraints guaranteeing identifiability of the model, and we establish its asymptotic properties. Both simulated and real data are used to illustrate the estimation procedure.

Variable Selection and Model Averaging for Longitudinal Data Incorporating GEE Approach

Statistica Sinica, 2017

The Akaike information criterion (AIC), which is based on maximum likelihood estimation and cannot be applied directly to situations where likelihood functions are not available, has been modified for variable selection in longitudinal data with generalized estimating equations via a working independence model. This paper proposes another modification to AIC: the difference between the quasi-likelihood functions of a candidate model and of a narrow model, plus a penalty term. This difference avoids the complex integration required by the quasi-likelihood, but inherits the theoretical asymptotic properties of AIC. We also propose a focused information criterion for variable selection on the basis of the quasi-score function. Further, this paper develops a frequentist model average estimator for longitudinal data with generalized estimating equations. Simulation studies provide evidence of the superiority of the proposed procedures. The procedures are further applied to a data example.

Optimal Model Averaging Estimation for Generalized Linear Models and Generalized Linear Mixed-Effects Models

Journal of the American Statistical Association, 2016

Considering model averaging estimation in generalized linear models, we propose a weight choice criterion based on the Kullback-Leibler (KL) loss with a penalty term. This criterion is different from that for continuous observations in principle, but reduces to the Mallows criterion in that situation. We prove that the corresponding model averaging estimator is asymptotically optimal under certain assumptions. We further extend our concern to the generalized linear mixed-effects model framework and establish associated theory. Numerical experiments illustrate that the proposed method is promising.

Identification of significant B cell associations with undetected observations using a Tobit model

Statistics and Its Interface, 2016

To study the relationship between serum antibody neutralization activity (determined by IC50) and the B cell immune response, we face two challenges: (i) IC50 values cannot be observed when they are below the detection limit, and (ii) the number of factors is larger than the number of observations. To address these two challenges, we propose a Tobit model for the analysis of the study, and an adaptive LASSO penalized variable selection procedure to identify important factors. Furthermore, we suggest the extended Bayesian information criterion for selecting the tuning parameter. Our analysis indicates that three measured B cells, specifically the frequency of CD19+CD20+, CD19-CD20+, and IgD-B220-CD27- peripheral blood B cell subsets, have significant effects on IC50. We have also run simulation studies to evaluate the numerical performance of the proposed method.
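As a rough illustration of how a Tobit model handles values below a detection limit, the sketch below evaluates a left-censored Gaussian log-likelihood in plain Python. It is a minimal sketch under assumptions, not the paper's procedure: the function name `tobit_loglik`, its arguments, and the convention that any response at or below `limit` counts as censored are all hypothetical, and the adaptive LASSO penalty and tuning-parameter selection described in the abstract are not included.

```python
import math


def _norm_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_logcdf(z):
    """Log CDF of the standard normal, floored to avoid log(0) far in the tail."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))


def tobit_loglik(y, X, beta, sigma, limit):
    """Log-likelihood of a linear model with left-censoring at `limit`.

    y     : list of responses (assumption: values <= limit are censored)
    X     : list of covariate rows, each the same length as beta
    beta  : regression coefficients
    sigma : error standard deviation (> 0)
    """
    total = 0.0
    for yi, xi in zip(y, X):
        mu = sum(b * x for b, x in zip(beta, xi))
        if yi <= limit:
            # Censored: only the event {latent response <= limit} is observed.
            total += _norm_logcdf((limit - mu) / sigma)
        else:
            # Fully observed: ordinary Gaussian density contribution.
            total += _norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

A censored observation contributes the probability of falling below the limit rather than a density value, which is what lets undetected measurements still inform the fit. A penalized estimator would maximize this quantity minus a penalty on `beta`.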

Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates

The Annals of Statistics, 2015

In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423-1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization based procedure to distinguish significant covariates for the "large p small n" setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.

Integrated conditional moment test for partially linear single index models incorporating dimension-reduction

Electronic Journal of Statistics, 2014

Studying model checking problems for partially linear single-index models, we propose a variant of the integrated conditional moment test using a linear projection weighting function, which achieves dimension reduction and makes the proposed method behave as if there were only one covariate, even in the presence of multidimensional regressors. We derive asymptotic distributions of the proposed test; i.e., an integral of a

Partially linear single index models for repeated measurements

Journal of Multivariate Analysis, 2014

In this article, we study the estimation of partially linear single-index models (PLSiM) with repeated measurements. Specifically, we approximate the nonparametric function by a polynomial spline, and then employ the quadratic inference function (QIF) together with the profile principle to derive the QIF-based estimators for the linear coefficients. The asymptotic normality of the resulting linear coefficient estimators and the optimal convergence rate of the nonparametric function estimate are established. In addition, we employ a penalized procedure to simultaneously select significant variables and estimate unknown parameters. The resulting penalized QIF estimators are shown to have the oracle property, and Monte Carlo studies support this finding. An empirical example is also presented to illustrate the usefulness of penalized estimators.