Hongtu Zhu - Academia.edu

Papers by Hongtu Zhu

Cerebral Cortex, Feb 13, 2016

Brain structural covariance networks (SCNs), composed of regions with correlated variation, are altered in neuropsychiatric disease and change with age. Little is known about the development of SCNs in early childhood, a period of rapid cortical growth. We investigated the development of structural and maturational covariance networks, including the default, dorsal attention, primary visual, and sensorimotor networks, in a longitudinal population of 118 children from birth to 2 years of age, and compared them with intrinsic functional connectivity networks. We found that the structural covariance of all networks exhibits strong correlations mostly limited to their seed regions. By age 2, the default and dorsal attention structural networks are much less distributed than their functional maps. The maturational covariance maps, however, revealed significant couplings in rates of change between distributed regions, which partially recapitulate their functional networks. The structural and maturational covariance of the primary visual and sensorimotor networks show patterns similar to those of the corresponding functional networks. The results indicate that functional networks are in place prior to structural networks, that correlated structural patterns in adults may arise in part from coordinated cortical maturation, and that regional co-activation in functional networks may guide and refine the maturation of SCNs over childhood development.
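To make "structural covariance" and "maturational covariance" concrete, here is a minimal sketch. It assumes that regional measures (e.g., cortical thickness) are stored as a subjects × regions array, that a seed map is the across-subject Pearson correlation between the seed region and every region, and that the rate of change is taken between two scans. The function names and the two-scan rate definition are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def seed_covariance_map(measures: np.ndarray, seed: int) -> np.ndarray:
    """Across-subject Pearson correlation between a seed region and every region.

    measures: (n_subjects, n_regions) array, e.g. regional cortical thickness.
    Illustrative sketch only; not the authors' processing pipeline.
    """
    # Standardize each region (population SD) so the mean product is Pearson's r.
    z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
    return z.T @ z[:, seed] / measures.shape[0]

def maturational_covariance_map(t0: np.ndarray, t1: np.ndarray,
                                years: np.ndarray, seed: int) -> np.ndarray:
    """Seed map computed on per-subject rates of change between two scans
    (hypothetical two-time-point definition of 'maturation')."""
    rates = (t1 - t0) / years[:, None]
    return seed_covariance_map(rates, seed)
```

The seed's own entry is always 1, and the other entries match one row of `np.corrcoef`. Structural covariance correlates the measures themselves across children, while maturational covariance correlates how fast each region changes.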

Asynchronous functional linear regression models for longitudinal data in reproducing kernel Hilbert space

Biometrics, Oct 17, 2022

Motivated by the analysis of longitudinal neuroimaging studies, we study the longitudinal functional linear regression model under an asynchronous data setting to model the association between clinical outcomes and functional (or imaging) covariates. In the asynchronous data setting, both covariates and responses may be measured at irregular and mismatched time points, which poses methodological challenges to existing statistical methods. We develop a kernel weighted loss function with a roughness penalty to obtain the functional estimator and derive its representer theorem. The rate of convergence, a Bahadur representation, and the asymptotic pointwise distribution of the functional estimator are obtained under the reproducing kernel Hilbert space framework. We propose a penalized likelihood ratio test to test the nullity of the functional coefficient, derive its asymptotic distribution under the null hypothesis, and investigate the separation rate under the alternative hypotheses. Simulation studies are conducted to examine the finite-sample performance of the proposed procedure. We apply the proposed methods to the analysis of multitype data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, which reveals a significant association between 21 regional brain volume density curves and cognitive function. Data used in preparation of this paper were obtained from the ADNI database (adni.loni.usc.edu).
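The key idea behind the asynchronous setting is that a response observed at time t is paired with every covariate observation at time s, and each pair is down-weighted by a kernel in t − s. The sketch below shows that weighting for a deliberately simplified *scalar* model y ≈ βx. It has no functional coefficient, no RKHS roughness penalty, and uses an assumed Epanechnikov kernel with bandwidth `h`. It is not the paper's estimator; `async_slope` is a hypothetical helper.

```python
import numpy as np

def epanechnikov(u: np.ndarray) -> np.ndarray:
    """Epanechnikov kernel (an assumed choice of kernel)."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def async_slope(resp_times, y, cov_times, x, h):
    """Kernel-weighted least-squares slope for y ~ beta * x when y and x are
    observed at mismatched times. Each list holds one array per subject.

    Minimizes sum_i sum_j sum_k K_h(t_ij - s_ik) * (y_ij - beta * x_ik)**2,
    a scalar simplification of the asynchronous kernel-weighted loss.
    """
    num = den = 0.0
    for t, yi, s, xi in zip(resp_times, y, cov_times, x):
        # Weight for every (response time, covariate time) pair of this subject.
        w = epanechnikov((t[:, None] - s[None, :]) / h) / h
        num += np.sum(w * yi[:, None] * xi[None, :])
        den += np.sum(w * xi[None, :] ** 2)
    return num / den
```

Pairs whose times differ by more than `h` get zero weight, so the bandwidth controls how much temporal mismatch the estimator tolerates.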

Journal of the American Statistical Association, 2018

The aim of this paper is to develop a novel class of functional structural equation models (FSEMs) for dissecting functional genetic and environmental effects on twin functional data, while characterizing the varying association between functional data and covariates of interest. We propose a three-stage estimation procedure to estimate varying coefficient functions for various covariates (e.g., gender) as well as three covariance operators for the genetic and environmental effects. We develop an inference procedure based on weighted likelihood ratio statistics to test the genetic/environmental effect at either a fixed location or a compact region. We also carry out a systematic theoretical analysis of the estimated varying coefficient functions, the weighted likelihood ratio statistics, and the estimated covariance operators. We conduct extensive Monte Carlo simulations to examine the finite-sample performance of the estimation and inference procedures. We apply the proposed FSEM to quantify the degree of genetic and environmental effects on the white-matter tracts of twins obtained from the UNC early brain development study.
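The three covariance operators can be read through the standard ACE twin decomposition. Under that decomposition, additive genetic (A), common environment (C), and unique environment (E) effects give an MZ cross-twin covariance of A + C, a DZ cross-twin covariance of A/2 + C, and a total variance of A + C + E. The sketch below inverts these moment equations pointwise along a tract. This is a plain method-of-moments illustration under that assumption, not the paper's three-stage estimator. It also ignores covariates and smoothing.

```python
import numpy as np

def ace_moments(mz: np.ndarray, dz: np.ndarray):
    """Pointwise method-of-moments ACE decomposition along a tract.

    mz, dz: arrays of shape (n_pairs, 2, n_points) with each twin's curve.
    Returns (A, C, E), each of shape (n_points,). Illustrative sketch only.
    """
    def cross_cov(pairs):
        a, b = pairs[:, 0, :], pairs[:, 1, :]
        return np.mean((a - a.mean(axis=0)) * (b - b.mean(axis=0)), axis=0)

    n_points = mz.shape[-1]
    # Total variance pooled over all individuals in both twin types.
    total_var = np.concatenate(
        [mz.reshape(-1, n_points), dz.reshape(-1, n_points)]).var(axis=0)
    c_mz, c_dz = cross_cov(mz), cross_cov(dz)
    A = 2 * (c_mz - c_dz)   # from cov_MZ - cov_DZ = A/2
    C = 2 * c_dz - c_mz     # from cov_DZ = A/2 + C
    E = total_var - c_mz    # from var = A + C + E
    return A, C, E
```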

Statistical methods in medical research, 2017

Alzheimer's disease is an incurable and progressive disease. Its pathology usually evolves from cognitively normal, to mild cognitive impairment, to Alzheimer's disease. The aim of this paper is to develop a Bayesian hidden Markov model to characterize disease pathology, identify hidden states corresponding to the diagnosed stages of cognitive decline, and examine the dynamic changes of potential risk factors associated with the cognitively normal-mild cognitive impairment-Alzheimer's disease transition. The hidden Markov model framework consists of two major components. The first is a state-dependent semiparametric regression for delineating the complex associations between clinical outcomes of interest and a set of prognostic biomarkers across neurodegenerative states. The second is a parametric transition model that accounts for potential covariate effects on the cross-state transition. The inter-individual and inter-process di...
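Any hidden Markov model of this kind needs the likelihood of a subject's visit sequence, marginalized over the unobserved states. The sketch below computes it with the standard scaled forward algorithm. It assumes the state-dependent regression has already been turned into a per-visit emission likelihood for each state. The three-state, forward-only transition matrix in the usage check (normal → MCI → AD) is a hypothetical example, not an estimate from the paper.

```python
import numpy as np

def hmm_log_likelihood(init: np.ndarray, trans: np.ndarray,
                       emis_lik: np.ndarray) -> float:
    """Scaled forward algorithm for a discrete-state hidden Markov model.

    init: (K,) initial state probabilities.
    trans: (K, K) transition matrix whose rows sum to 1.
    emis_lik: (T, K) likelihood of each visit's data under each hidden state
        (assumed precomputed, e.g. from a state-dependent regression).
    """
    alpha = init * emis_lik[0]
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()  # rescale to avoid underflow on long sequences
    for t in range(1, emis_lik.shape[0]):
        alpha = (alpha @ trans) * emis_lik[t]
        s = alpha.sum()
        log_lik += np.log(s)
        alpha /= s
    return float(log_lik)
```

In a Bayesian fit, this quantity (or a state-augmented version of it) would enter the posterior. Covariate effects on transitions would make `trans` vary from visit to visit.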

Psychometrika, Jan 13, 2017

Many psychological concepts are unobserved and usually represented as latent factors apprehended through multiple observed indicators. When multiple-subject multivariate time series data are available, dynamic factor analysis models with random effects offer one way of modeling patterns of within- and between-person variations by combining factor analysis and time series analysis at the factor level. Using the Dirichlet process (DP) as a nonparametric prior for individual-specific time series parameters further allows the distributional forms of these parameters to deviate from commonly imposed (e.g., normal or other symmetric) functional forms, arising as a result of these parameters' restricted ranges. Given the complexity of such models, a thorough sensitivity analysis is critical but computationally prohibitive. We propose a Bayesian local influence method that allows for simultaneous sensitivity analysis of multiple modeling components within a single fitting of the model o...
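A common way to work with a DP prior in practice is the truncated stick-breaking representation. Under it, a random distribution is a weighted set of atoms drawn from a base measure, so individual-specific parameters can cluster and take skewed or non-normal shapes. The sketch below assumes a truncation level `n_atoms` and, purely for illustration, a uniform(−1, 1) base measure of the kind one might use for range-restricted autoregressive parameters. Neither choice is taken from the paper.

```python
import numpy as np

def stick_breaking_weights(alpha: float, n_atoms: int,
                           rng: np.random.Generator) -> np.ndarray:
    """Truncated stick-breaking weights for a DP(alpha, G0) prior."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # truncation: the last stick takes the remainder, so weights sum to 1
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

def sample_dp_draws(alpha, base_sampler, n_atoms, n_draws, rng):
    """Draw n_draws parameters from one realization of the (truncated) DP.

    base_sampler(n) must return n atoms from the base measure G0 (hypothetical).
    """
    w = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = base_sampler(n_atoms)
    return atoms[rng.choice(n_atoms, size=n_draws, p=w)]
```

Because draws share atoms, several individuals typically receive identical parameter values. Smaller `alpha` concentrates mass on fewer atoms.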

Statistica Sinica, 2018

As an important part of modern health care, medical imaging data, which can be regarded as densely sampled functional data, have been widely used for diagnosis, screening, treatment, and prognosis, such as finding breast cancer through mammograms. The aim of this paper is to propose a functional linear regression model that uses functional (or imaging) predictors to predict clinical outcomes (e.g., disease status), while addressing missing clinical outcomes. We introduce an exponential tilting semiparametric model to account for the nonignorable missing data mechanism. We develop a set of estimating equations and their associated computational methods for both parameter estimation and the selection of the tuning parameters. We also propose a bootstrap resampling procedure for carrying out statistical inference. We systematically establish the asymptotic properties (e.g., consistency and convergence rate) of the estimates calculated from the proposed

Journal of the American Statistical Association, 2017

We propose a multiscale weighted principal component regression (MWPCR) framework for using high-dimensional features with strong spatial properties (e.g., smoothness and correlation) to predict an outcome variable, such as disease status. This development is motivated by identifying imaging biomarkers that could potentially aid detection, diagnosis, assessment of prognosis, prediction of response to treatment, and monitoring of disease status, among many others. MWPCR can be regarded as a novel integration of principal component analysis (PCA), kernel methods, and regression models. In MWPCR, we introduce various weight matrices to prewhiten high-dimensional feature vectors, perform matrix decomposition for both dimension reduction and feature extraction, and build a prediction model by using the extracted features. Examples of such weight matrices include an importance score weight matrix for the selection of individual features at each location and a spatial weight matrix for the incorporation of the spatial pattern of feature vectors. We integrate the importance score weights with the spatial weights in order to recover the low-dimensional structure of high-dimensional features. We demonstrate the utility of our methods through extensive simulations and real data analyses of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data set.

Cerebral cortex (New York, N.Y. : 1991), Mar 13, 2016

The Annals of Applied Statistics, 2015

Cardiovascular (CV) measurements provide valuable insights into individuals' health conditions in "real-life," everyday settings. Current methods for modeling ambulatory CV data do not consider the dynamic characteristics of the full data set and their relationships with covariates such as caffeine use and stress. We propose a stochastic differential equation (SDE) in the form of a dual nonlinear Ornstein-Uhlenbeck (OU) model with person-specific covariates to capture the morning surge and nighttime dipping dynamics of ambulatory CV data. Empirical measurements are typically collected at irregular time intervals that are much larger than those evaluated in simulation studies of SDEs. To circumvent this constraint, we adopt a Bayesian approach with a regularized Brownian Bridge sampler (RBBS) and an efficient multiresolution (MR) algorithm to fit the proposed SDE. The MR algorithm can produce more efficient MCMC samples, which is crucial for valid parameter estimation and inference. Applying this model and algorithm to data from the Duke Behavioral Investigation of Hypertension Study, we found that age, caffeine intake, gender, and race affect distinct dynamic characteristics of the participants' CV trajectories.
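The building block of the model is an Ornstein-Uhlenbeck process, which pulls a trajectory back toward a mean level. The abstract does not spell out the dual nonlinear form, so the sketch below simulates only the standard linear OU process dX = θ(μ − X) dt + σ dW with the Euler-Maruyama scheme on a possibly irregular time grid. It illustrates why coarse, irregular observation times are a problem: the discretization error grows with the step size, which is what bridge-type imputation between observations is meant to address.

```python
import numpy as np

def simulate_ou(x0: float, theta: float, mu: float, sigma: float,
                times: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Euler-Maruyama path of dX = theta * (mu - X) dt + sigma dW.

    times may be irregularly spaced; each step uses its own dt.
    Linear OU only; the paper's dual nonlinear OU model is not reproduced.
    """
    x = np.empty(len(times))
    x[0] = x0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        drift = theta * (mu - x[k - 1]) * dt           # mean reversion toward mu
        shock = sigma * np.sqrt(dt) * rng.normal()     # Brownian increment
        x[k] = x[k - 1] + drift + shock
    return x
```

With `sigma=0`, the path reduces to the Euler solution of the ordinary differential equation, which converges to μ + (x0 − μ)e^(−θt) as the grid is refined.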

Lecture Notes in Computer Science, 2013

Many longitudinal imaging studies have been, and are being, conducted with diffusion tensor imaging (DTI) to better understand white matter maturation in normal controls and diseased subjects. There is an urgent demand for statistical methods that analyze diffusion properties along major fiber tracts obtained from longitudinal DTI studies. Jointly analyzing fiber-tract diffusion properties and covariates from longitudinal studies raises several major challenges, including (i) infinite-dimensional functional response data, (ii) a complex spatial-temporal correlation structure, and (iii) complex spatial smoothness. To address these challenges, this article develops a longitudinal functional analysis framework (LFAF) to delineate the dynamic changes of diffusion properties along major fiber tracts, their association with a set of covariates of interest (e.g., age and group status), and the structure of the variability of these white matter tract properties in various longitudinal studies. Our LFAF consists of a functional mixed effects model for addressing all three challenges, an efficient method for spatially smoothing varying coefficient functions, an estimation method for the spatial-temporal correlation structure, a test procedure with a global test statistic for testing hypotheses of interest associated with the functional response, and a simultaneous confidence band for quantifying the uncertainty in the estimated coefficient functions. Simulated data are used to evaluate the finite-sample performance of LFAF and to demonstrate that LFAF significantly outperforms a voxel-wise mixed model method. We apply LFAF to study the spatial-temporal dynamics of white-matter fiber tracts in a clinical study of neurodevelopment.

Bayesian Sensitivity Analysis of Statistical Models with Missing Data

Statistica Sinica, 2014

Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investig...

Structural Equation Modeling: A Multidisciplinary Journal, 2001

Recently, the analysis of structural equation models with polytomous and continuous variables has received considerable attention. However, contributions to the selection of good models are limited. The main objective of this article is to investigate the maximum likelihood estimation of unknown parameters in a general LISREL-type model with mixed polytomous and continuous data and to propose a model selection procedure for obtaining good models for the underlying substantive theory. The maximum likelihood estimate is obtained by a Monte Carlo Expectation Maximization algorithm, in which the E step is evaluated via the Gibbs sampler and the M step is completed via the method of conditional maximization. The convergence of the Monte Carlo Expectation Maximization algorithm is monitored by bridge sampling. A model selection procedure based on the Bayes factor and Occam's window search strategy is proposed. The effectiveness of the procedure in accounting for model uncertainty and in picking good models is discussed. The proposed methodology is illustrated with a real example.

Nuclear Medicine Communications, 2008

Aim: The aim of this study was to investigate the protective effect of chuanxiongzine-puerarin on cerebral ischemia-reperfusion damage in a rat model. Conclusion: This study showed that chuanxiongzine-puerarin played an important protective role against cerebral ischemia-reperfusion damage in a rat model.

Bayesian local influence for survival models

Lifetime Data Analysis, 2010

Journal of the American Statistical Association, 2012

The aim of this paper is to develop a semiparametric model for describing the variability of the medial representation of subcortical structures, which belongs to a Riemannian manifold, and establishing its association with covariates of interest, such as diagnostic status, age and gender. We develop a two-stage estimation procedure to calculate the parameter estimates. The first stage is to calculate an intrinsic least squares estimator of the parameter vector using the annealing evolutionary stochastic approximation Monte Carlo algorithm and then the second stage is to construct a set of estimating equations to obtain a more efficient estimate with the intrinsic least squares estimate as the starting point. We use Wald statistics to test linear hypotheses of unknown parameters and establish their limiting distributions. Simulation studies are used to evaluate the accuracy of our parameter estimates and the finite sample performance of the Wald statistics. We apply our methods to the detection of the difference in the morphological changes of the left and right hippocampi between schizophrenia patients and healthy controls using medial shape description.

Journal of the American Statistical Association, 2009

Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models, including a Rician regression model and two associated normal models, to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values of the goodness-of-fit statistics and influence measures. Finally, we conduct simulation studies to evaluate the performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models.
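The Rician distribution describes the magnitude of a complex signal with true amplitude ν corrupted by Gaussian noise of standard deviation σ. Its density is f(y | ν, σ) = (y/σ²) exp(−(y² + ν²)/(2σ²)) I₀(yν/σ²), where I₀ is the modified Bessel function of the first kind of order zero. The sketch below evaluates this log-likelihood, the quantity a Rician regression would maximize once ν is linked to covariates. How ν depends on covariates is left out and would be an additional modeling assumption. `np.i0` overflows for very large yν/σ². A production version would use an exponentially scaled Bessel function such as `scipy.special.i0e`.

```python
import numpy as np

def rician_logpdf(y, nu, sigma):
    """Pointwise Rician log-density log f(y | nu, sigma) for magnitudes y > 0.

    nu may be a scalar or an array of fitted signal amplitudes.
    Uses np.i0, which is adequate for moderate y * nu / sigma**2 only.
    """
    y = np.asarray(y, dtype=float)
    z = y * nu / sigma**2
    return (np.log(y) - 2 * np.log(sigma)
            - (y**2 + nu**2) / (2 * sigma**2) + np.log(np.i0(z)))

def rician_loglik(y, nu, sigma) -> float:
    """Total log-likelihood over all voxels/measurements."""
    return float(np.sum(rician_logpdf(y, nu, sigma)))
```

When ν = 0 (pure noise), I₀(0) = 1 and the density reduces to the Rayleigh distribution. When the signal-to-noise ratio is high, it is close to a normal density, which is one motivation for also considering normal approximations.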

Journal of Computational and Graphical Statistics, 2012

We examine three Bayesian case influence measures including the φ-divergence, Cook's posterior mode distance and Cook's posterior mean distance for identifying a set of influential observations for a variety of statistical models with missing data including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three Bayesian case influence measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology.
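To illustrate how a case influence measure can be computed from full-data MCMC samples alone, the sketch below takes a different, well-known route than the paper's Laplace-based first-order approximations: importance reweighting. If observations are conditionally independent given θ, the case-deleted posterior satisfies p(θ | Y₍ᵢ₎) ∝ p(θ | Y) / f(yᵢ | θ), so the full-data draws can be reweighted by 1/f(yᵢ | θ). The distance below is a Mahalanobis-type form scaled by the full-data posterior covariance. That scaling is an assumed form of Cook's posterior mean distance, and both function names are hypothetical.

```python
import numpy as np

def case_deleted_posterior_mean(theta_samples: np.ndarray,
                                loglik_i: np.ndarray) -> np.ndarray:
    """Approximate E[theta | Y without case i] by importance reweighting.

    theta_samples: (S, p) draws from the full-data posterior.
    loglik_i: (S,) values of log f(y_i | theta_s).
    Assumes conditional independence of observations given theta.
    """
    logw = -loglik_i
    w = np.exp(logw - logw.max())  # stabilize before normalizing
    w /= w.sum()
    return w @ theta_samples

def cook_posterior_mean_distance(theta_samples: np.ndarray,
                                 loglik_i: np.ndarray) -> float:
    """Mahalanobis-type distance between case-deleted and full-data posterior
    means, scaled by the full-data posterior covariance (assumed form)."""
    diff = case_deleted_posterior_mean(theta_samples, loglik_i) - theta_samples.mean(axis=0)
    cov = np.atleast_2d(np.cov(theta_samples, rowvar=False))
    return float(diff @ np.linalg.solve(cov, diff))
```

Importance weights can become unstable for highly influential cases, which is one reason cheaper analytic approximations such as those in the paper are attractive.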

Cerebral Cortex, 2013

There are numerous reports of sexual dimorphism in brain structure in children and adults, but data on sex differences in infancy are extremely limited. Our primary goal was to identify sex differences in neonatal brain structure. Our secondary goal was to explore whether brain structure was related to androgen exposure or sensitivity. Two hundred and ninety-three neonates (149 males) received high-resolution structural magnetic resonance imaging scans. Sensitivity to androgen was measured using the number of cytosine, adenine, guanine (CAG) triplets in the androgen receptor gene, and the ratio of the second to fourth digit provided a proxy measure of prenatal androgen exposure. There was a significant sex difference of 5.87% in intracranial volume, which was not related to CAG triplets or digit ratios. Tensor-based morphometry identified extensive areas of local sexual dimorphism. Males had larger volumes in medial temporal cortex and rolandic operculum, and females had larger volumes in dorsolateral prefrontal, motor, and visual cortices. Androgen exposure and sensitivity had minor sex-specific effects on local gray matter volume, but did not appear to be the primary determinant of sexual dimorphism at this age. Comparing our study with the existing literature suggests that sex differences in cortical structure vary in a complex and highly dynamic way across the human lifespan.

Canadian Journal of Statistics, 2003

The authors describe a method for assessing model inadequacy in maximum likelihood estimation of a generalized linear mixed model. They treat the latent random effects in the model as missing data and develop the influence analysis on the basis of a Q-function, which is associated with the conditional expectation of the complete-data log-likelihood function in the EM algorithm. They propose a procedure to detect influential observations under six model perturbation schemes. They also illustrate their methodology in a hypothetical situation and in two real cases. French title (translated): Assessing local influence in the framework of generalized linear models with random effects.

Biometrics, 2010

We consider selecting both fixed and random effects in a general class of mixed effects models using maximum penalized likelihood (MPL) estimation along with the smoothly clipped absolute deviation (SCAD) and adaptive LASSO (ALASSO) penalty functions. The maximum penalized likelihood estimates are shown to possess consistency, sparsity, and asymptotic normality. A model selection criterion, called the IC_Q statistic, is proposed for selecting the penalty parameters (Ibrahim, Zhu and Tang, 2008). The variable selection procedure based on IC_Q is shown to consistently select important fixed and random effects. The methodology is very general and can be applied to numerous situations involving random effects, including generalized linear mixed models. Simulation studies and a real data set from a Yale infant growth study are used to illustrate the proposed methodology.
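The two penalty functions named above have standard closed forms. SCAD (Fan and Li) is linear near zero, quadratic in a transition zone, and constant beyond aλ, so large coefficients are not shrunk. The adaptive LASSO weights each |βⱼ| by 1/|β̃ⱼ|^γ from an initial consistent estimate β̃. The sketch below evaluates both, using the conventional default a = 3.7 and γ = 1 as assumptions. It shows only the penalties, not how the paper applies them to random-effect covariance parameters or tunes λ with IC_Q.

```python
import numpy as np

def scad_penalty(theta, lam: float, a: float = 3.7):
    """SCAD penalty evaluated elementwise (a = 3.7 is the usual default)."""
    t = np.abs(theta)
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def adaptive_lasso_penalty(theta, lam: float, initial_est, gamma: float = 1.0) -> float:
    """Adaptive LASSO penalty lam * sum_j |theta_j| / |initial_est_j|**gamma."""
    weights = 1.0 / np.abs(initial_est) ** gamma
    return float(lam * np.sum(weights * np.abs(theta)))
```

The three SCAD pieces join continuously at |θ| = λ (value λ²) and at |θ| = aλ (value λ²(a + 1)/2). This flat tail is what gives SCAD its near-unbiasedness for large effects.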