You-gan Wang | Australian Catholic University
Papers by You-gan Wang
Australian & New Zealand Journal of Statistics, 2000
The article describes a generalized estimating equations approach that was used to investigate the impact of technology on vessel performance in a trawl fishery during 1988-96, while accounting for spatial and temporal correlations in the catch-effort data. Robust estimation of parameters in the presence of several levels of clustering depended more on the choice of cluster definition than on the choice of correlation structure within the cluster. Models with smaller cluster sizes produced stable results, while models with larger cluster sizes, which may have had complex within-cluster correlation structures and which had within-cluster covariates, produced estimates sensitive to the correlation structure. The preferred model arising from this dataset assumed that catches from a vessel were correlated in the same years and the same areas, but independent in different years and areas. The model that assumed catches from a vessel were correlated in all years and areas, equivalent to a random-effects term for vessel, produced spurious results. This was an unexpected finding that highlighted the need to adopt a systematic strategy for modelling. The article proposes a modelling strategy of selecting the best cluster definition first, and the working correlation structure (within clusters) second. The article discusses the selection and interpretation of the model in the light of background knowledge of the data and the utility of the model, and the potential for this modelling approach to apply in similar statistical situations.
This paper focuses on the spatio-temporal pattern of Leishmaniasis incidence in Afghanistan. We hold the view that correlations that arise from spatial and temporal sources are inherently distinct. Our method decouples these two sources of correlation; there are at least two advantages in taking this approach. First, it circumvents the need to invert a large correlation matrix, a commonly encountered problem in spatio-temporal analyses (e.g., Yasui and Lele, 1997). Second, it simplifies the modelling of complex relationships such as anisotropy, which would have been extremely difficult or impossible if spatio-temporal correlations were considered simultaneously. The model was built on a foundation of generalized estimating equations (Liang and Zeger, 1986). We illustrate the method using data from Afghanistan from 2003 to 2009. Since the data cover a period that overlaps with the US invasion of Afghanistan, the zero counts may be the result of no disease incidence...
Journal of the Royal Statistical Society: Series C (Applied Statistics), 2017
Spatial statistical analyses are often used to study the link between environmental factors and disease incidence. This paper studies the influence of environmental factors on malaria incidence in Afghanistan. The data present several challenges, including multiple latent sources of spatial correlation for observations, which are addressed via a classical generalized estimating equations (GEE) approach. The multiple sources of correlation are tackled by extending the GEE approach to a system of multiple GEEs, used in conjunction with another system of equations constructed to estimate the mean and correlation parameters of the model. The proposed spatial correlation function is a linear combination of different correlation functions that may display various degrees of anisotropy. The generalized method of moments is suggested for combining GEE-type estimating equations, and all estimates are obtained by alternating between the two systems and iterating to convergence.
This paper is motivated by the spatio-temporal pattern in the occurrence of Leishmaniasis in Afghanistan and the relatively high number of zero counts. We hold the view that correlations that arise from spatial and temporal sources are inherently distinct. Our method decouples these two sources of correlation; there are at least two advantages in taking this approach. First, it circumvents the need to invert a large correlation matrix, a commonly encountered problem in spatio-temporal analyses. Second, it simplifies the modelling of complex relationships such as anisotropy, which would have been extremely difficult or impossible if spatio-temporal correlations were considered simultaneously. We identify three challenges in the modelling of a spatio-temporal process: (1) accommodation of covariances that arise from spatial and temporal sources; (2) choice of the correct covariance structure; and (3) extension to situations where a covariance is not the natural measure of...
The Journal of Pediatrics, 1994
Developmental Dynamics, 2002
Gene-targeted disruption of Grg5, a mouse homologue of Drosophila groucho (gro), results in postnatal growth retardation in mice. The growth defect, most striking in approximately half of the Grg5 null mice, occurs during the first 4-5 weeks of age, but most mice recover from the retarded growth later. We used a nonlinear mixed-effects model to fit the growth data of wild-type, heterozygous, and Grg5 null mice. On the basis of preliminary evidence suggesting an interaction between Grg5 and the transcription factor Cbfa1/Runx2, critical for skeletal development, we further investigated the skeleton in the mice. A long bone growth plate defect was identified, which included shorter zones of proliferative and hypertrophic chondrocytes and decreased trabecular bone formation. This decreased trabecular bone formation is likely caused by a reduced recruitment of osteoblasts into the growth plate region of Grg5 null mice. Like the growth defect, the growth plate and trabecular bone abnormality improved as the mice grew older. The growth plate defect was associated with reduced Indian hedgehog expression and signaling. We suggest that Grg5, a transcriptional coregulator, modulates the activities of transcription factors, such as Cbfa1/Runx2, in vivo to affect Ihh expression and the function of long bone growth plates.
Stock assessment of the eastern king prawn (EKP) fishery, and the subsequent advice to management and industry, could be improved by addressing a number of issues. The recruitment dynamics of EKP in the northern (i.e., North Reef to the Swain Reefs) parts of the fishery need to be clarified. Fishers report that the size of the prawns from these areas when they recruit to the fishing grounds is resulting in suboptimal sizes/ages at first capture, and therefore localised growth overfishing. There is a need to assess alternative harvest strategies of the EKP fishery, via computer simulations, particularly seasonal and monthly or lunar-based closures, to identify scenarios that improve the value of the catch, decrease costs and reduce the risk of overfishing, prior to implementing new management measures.
Environmental Modeling & Assessment, 2017
The corrected Akaike information criterion (AICc) is a widely used tool for analyzing empirical data in many fields, and it provides a less biased estimator than the Akaike information criterion (AIC), especially for small samples. We propose a modified version of the AICc in a generalized linear model framework, referred to as the blockwise AICc (bAICc), which makes full use of the AICc's advantages in small samples. Extensive simulation results show that the bAICc performs well compared with several other information criteria. We also analyzed two environmental datasets, one on snail survival and the other on fish infection, to illustrate the usefulness of this new model selection criterion.
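The small-sample correction that distinguishes the AICc from the AIC can be sketched directly from the standard formulas (this is the generic correction, not the blockwise variant the paper proposes):

```python
def aic(log_likelihood, k):
    """Akaike information criterion: -2*logL + 2k, with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def aicc(log_likelihood, k, n):
    """Corrected AIC: adds a penalty of 2k(k+1)/(n-k-1) that vanishes as n grows."""
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)

# With n = 20 observations and k = 3 parameters the correction is noticeable;
# with n = 2000 it is negligible.
print(aicc(-50.0, 3, 20) - aic(-50.0, 3))    # 1.5
print(aicc(-50.0, 3, 2000) - aic(-50.0, 3))  # ~0.012
```

As with the AIC, the candidate model with the smallest criterion value is preferred.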
Statistical Methods in Medical Research
In robust regression, it is usually assumed that the distribution of the error term is symmetric or the data are symmetrically contaminated by outliers. However, this assumption is usually not satisfied in practical problems, and thus if the traditional robust methods, such as Tukey’s biweight and Huber’s method, are used to estimate the regression parameters, the efficiency of the parameter estimation can be lost. In this paper, we construct an asymmetric Tukey’s biweight loss function with two tuning parameters and propose a data-driven method to find the most appropriate tuning parameters. Furthermore, we provide an adaptive algorithm to obtain robust and efficient parameter estimates. Our extensive simulation studies suggest that the proposed method performs better than the symmetric methods when error terms follow an asymmetric distribution or are asymmetrically contaminated. Finally, a cardiovascular risk factors dataset is analyzed to illustrate the proposed method.
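The idea of an asymmetric loss can be illustrated with a minimal sketch: a Tukey biweight with separate tuning constants on each side of zero. The names `c_neg` and `c_pos` are illustrative; the paper's actual loss function and its data-driven tuning procedure may differ.

```python
def tukey_rho(r, c):
    """Standard Tukey biweight loss with tuning constant c: bounded at c^2/6."""
    if abs(r) <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (r / c) ** 2) ** 3)
    return c * c / 6.0

def asymmetric_tukey_rho(r, c_neg, c_pos):
    """Asymmetric variant: a separate tuning constant for negative and
    positive residuals (illustrative sketch, not the paper's notation)."""
    return tukey_rho(r, c_neg) if r < 0 else tukey_rho(r, c_pos)
```

With `c_neg < c_pos`, negative residuals reach the bounded region of the loss sooner, so negative outliers are down-weighted more aggressively than positive ones; when the two constants are equal the usual symmetric biweight is recovered.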
Computers and Electronics in Agriculture
Bioinformatics
Motivation: Under two biologically different conditions, we are often interested in identifying differentially expressed genes. It is usually the case that the assumption of equal variances in the two groups is violated for many genes, where a large number of them are required to be filtered or ranked. In these cases, exact tests are unavailable and Welch’s approximate test is the most reliable one. Welch’s test involves two layers of approximation: approximating the distribution of the statistic by a t-distribution, which in turn depends on approximate degrees of freedom. This study attempts to improve upon Welch’s approximate test by avoiding one layer of approximation. Results: We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inferences. Experimental results based on extensive simulation studies show that the Monte Carlo based tests enhance the statistical power and...
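The two layers of approximation are visible in a plain implementation of Welch's statistic and its Welch-Satterthwaite degrees of freedom (a standard textbook sketch, not the Monte Carlo test proposed here):

```python
import statistics

def welch_statistic_and_df(x, y):
    """Welch's t statistic and its Welch-Satterthwaite approximate degrees
    of freedom -- the second layer of approximation the abstract refers to."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances
    se2 = v1 / n1 + v2 / n2
    t = (statistics.mean(x) - statistics.mean(y)) / se2 ** 0.5
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

The statistic `t` is then compared against a t-distribution with `df` degrees of freedom (the first approximation); `df` itself is only approximate (the second), which is the layer the proposed generalized distribution avoids.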
Frontiers in Genetics, 2018
The analysis of large genomic data is hampered by issues such as a small number of observations and a large number of predictive variables (commonly known as "large P, small N"), high dimensionality, or highly correlated data structures. Machine learning methods are renowned for dealing with these problems. To date, machine learning methods have been applied in genome-wide association studies for the identification of candidate genes, epistasis detection, gene network pathway analyses and genomic prediction of phenotypic values. However, the utility of two machine learning methods, Gradient Boosting Machine (GBM) and Extreme Gradient Boosting (XGBoost), in identifying a subset of SNP markers for genomic prediction of breeding values has never been explored before. In this study, using 38,082 SNP markers and body weight phenotypes from 2,093 Brahman cattle (1,097 bulls as a discovery population and 996 cows as a validation population), we examined the efficiency of three mac...
Statistical Methods in Medical Research
In this paper, we consider variable selection in rank regression models for longitudinal data. To obtain both robustness and effective selection of important covariates, we propose incorporating shrinkage by adaptive lasso or SCAD in the Wilcoxon dispersion function and establishing the oracle properties of the new method. The new method can be conveniently implemented with the statistical software R. The performance of the proposed method is demonstrated via simulation studies. Finally, two datasets are analyzed for illustration. Some interesting findings are reported and discussed.
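The rank-based objective being penalized can be sketched as Jaeckel's Wilcoxon dispersion of the residuals. This is a minimal unpenalized version: the paper adds an adaptive lasso or SCAD penalty and handles the longitudinal correlation structure, which this sketch ignores, and ties are broken arbitrarily here.

```python
def wilcoxon_dispersion(residuals):
    """Jaeckel's Wilcoxon dispersion: a rank-based analogue of the residual
    sum of squares that rank regression minimizes over the coefficients.
    It is location-invariant and zero for a constant residual vector."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    # Sum of (scaled rank - 1/2) * residual; the score weights sum to zero.
    return sum((ranks[i] / (n + 1) - 0.5) * residuals[i] for i in range(n))
```

Because the rank scores sum to zero, the dispersion depends only on the spread of the residuals, which is what makes the resulting estimates robust to outlying responses.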
Mathematics and Computers in Simulation, 1999
Fisheries management agencies around the world collect age data for the purpose of assessing the status of natural resources in their jurisdiction. Estimates of mortality rates represent key information to assess
Statistics in Medicine, 2011
We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice.
Statistics in Medicine, 2009
Selecting an appropriate working correlation structure is pertinent to clustered data analysis using generalized estimating equations (GEE) because an inappropriate choice will lead to inefficient parameter estimation. We investigate the well-known QIC criterion for selecting a working correlation structure, and have found that the performance of the QIC is degraded by a term that is theoretically independent of the correlation structures but has to be estimated with error. This leads us to propose a correlation information criterion (CIC) that substantially improves on the QIC's performance. Extensive simulation studies indicate that the CIC offers a remarkable improvement in selecting the correct correlation structure. We also illustrate our findings using a data set from the Madras Longitudinal Schizophrenia Study.
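In the usual formulation, the QIC is a quasi-likelihood term plus a penalty built from the trace of a matrix product of two covariance-related estimates; the CIC keeps only that trace term. A minimal sketch, assuming those two matrices have already been estimated from a fitted GEE (the argument names are illustrative, not the paper's notation):

```python
def mat_mult(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))

def cic(omega_indep, v_robust):
    """Correlation information criterion: the trace penalty of the QIC with
    the (correlation-independent) quasi-likelihood part dropped. Smaller
    values indicate a better working correlation structure."""
    return trace(mat_mult(omega_indep, v_robust))
```

The point of the abstract is that the dropped quasi-likelihood term carries estimation error but no information about the working correlation, so comparing candidate structures by the trace term alone is less noisy.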
Statistics in Medicine, 2002
The primary goal of a phase I trial is to find the maximally tolerated dose (MTD) of a treatment. The MTD is usually defined in terms of a tolerable probability, q*, of toxicity. Our objective is to find the highest dose with toxicity risk that does not exceed q*, a criterion that is often desired in designing phase I trials. This criterion differs from that of finding the dose with toxicity risk closest to q*, which is used in methods such as the continual reassessment method. We use the theory of decision processes to find optimal sequential designs that maximize the expected number of patients within the trial allocated to the highest dose with toxicity not exceeding q*, among the doses under consideration. The proposed method is very general in the sense that criteria other than the one considered here can be optimized and that optimal dose assignment can be defined in terms of patients within or outside the trial. It includes the continual reassessment method as an important special case. A numerical study indicates that the strategy compares favourably with other phase I designs.
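The difference between the two target criteria is easy to state in code. A hypothetical sketch with assumed toxicity probabilities, contrasting "highest dose not exceeding q*" with the CRM-style "closest to q*":

```python
def highest_safe_dose(toxicity_probs, q_star):
    """Index of the highest dose whose toxicity risk does not exceed q*
    (the criterion targeted in the paper); None if every dose is too toxic."""
    best = None
    for i, p in enumerate(toxicity_probs):
        if p <= q_star:
            best = i
    return best

def closest_dose(toxicity_probs, q_star):
    """Index of the dose with toxicity closest to q* -- the CRM-style
    target, which may select a dose whose risk exceeds q*."""
    return min(range(len(toxicity_probs)),
               key=lambda i: abs(toxicity_probs[i] - q_star))

probs = [0.05, 0.15, 0.28, 0.45]  # assumed dose-toxicity curve
print(highest_safe_dose(probs, 0.25))  # 1: 0.15 is the highest risk <= 0.25
print(closest_dose(probs, 0.25))       # 2: 0.28 is closest to 0.25 but exceeds it
```

In a real design these probabilities are unknown and are estimated sequentially from patient outcomes; the sketch only illustrates why the two criteria can recommend different doses.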
Statistics in Medicine, 2012
A flexible and simple Bayesian decision-theoretic design for dose-finding trials is proposed in this paper. To reduce the computational burden, we adopt a working model with conjugate priors, which is flexible enough to fit all monotonic dose-toxicity curves and produces analytic posterior distributions. We also discuss how to use a proper utility function to reflect the interest of the trial. Patients are allocated based not only on the utility function but also on the chosen dose selection rule. The most popular dose selection rule is the one-step-look-ahead (OSLA), which selects the best-so-far dose. A more complicated rule, such as the two-step-look-ahead, is theoretically more efficient than the OSLA only when the required distributional assumptions are met, which is, however, often not the case in practice. We carried out extensive simulation studies to evaluate these two dose selection rules and found that the OSLA was often more efficient than the two-step-look-ahead under the proposed Bayesian structure. Moreover, our simulation results show that the proposed Bayesian method's performance is superior to that of several popular Bayesian methods and that the negative impact of prior misspecification can be managed in the design stage.