Peihua Qiu - Profile on Academia.edu

Papers by Peihua Qiu

Error-in-Variables Jump Regression Using Local Clustering

Social Science Research Network, 2016

Error-in-variables regression is widely used in econometric models. The statistical analysis becomes challenging when the regression function is discontinuous and the distribution of measurement error is unknown. In this paper, we propose a novel jump-preserving curve estimation method. A major feature of our method is that it removes noise effectively while preserving jumps well, without requiring much prior knowledge about the measurement error distribution. The jump-preserving property is achieved mainly by local clustering. We show that the proposed curve estimator is statistically consistent and that it performs favorably in comparison with an existing jump-preserving estimator. Finally, we demonstrate our method with an application to a health tax policy study in Australia.
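
To illustrate what "local clustering" can mean for jump preservation, here is a minimal Python sketch. It is a simplification and not the paper's estimator. It treats the covariate as observed without error, so it ignores the error-in-variables aspect entirely. The names `two_cluster_split` and `jump_preserving_smooth`, the window half-width `h`, and the separation threshold `gap` are all hypothetical. Within each window, the responses are split into two clusters by exact one-dimensional 2-means. If the two clusters are well separated, only the cluster nearest to the current response is averaged.

```python
from statistics import mean


def two_cluster_split(values):
    """Exact 1-D 2-means: sort the values and try every split point,
    keeping the split with the smallest within-cluster sum of squares."""
    v = sorted(values)
    best = None
    for k in range(1, len(v)):
        lo, hi = v[:k], v[k:]
        m_lo, m_hi = mean(lo), mean(hi)
        sse = sum((a - m_lo) ** 2 for a in lo) + sum((b - m_hi) ** 2 for b in hi)
        if best is None or sse < best[0]:
            best = (sse, lo, hi)
    return best[1], best[2]


def jump_preserving_smooth(x, y, h, gap):
    """Hypothetical jump-preserving local average (covariate assumed error-free).

    Inside each window of half-width h, average only the cluster of responses
    nearest to y_i when the two cluster means are at least `gap` apart;
    otherwise fall back to the plain local mean."""
    estimates = []
    for xi, yi in zip(x, y):
        window = [yj for xj, yj in zip(x, y) if abs(xj - xi) <= h]
        if len(window) < 2:
            estimates.append(yi)
            continue
        lo, hi = two_cluster_split(window)
        m_lo, m_hi = mean(lo), mean(hi)
        if m_hi - m_lo < gap:
            # No evidence of a jump in this window: ordinary local average.
            estimates.append(mean(window))
        else:
            # A jump is likely: average only the cluster nearest to y_i.
            estimates.append(m_lo if abs(yi - m_lo) <= abs(yi - m_hi) else m_hi)
    return estimates


# Noise-free step function with a jump at x = 0.5.
x = [i / 100 for i in range(100)]
y = [0.0 if xi < 0.5 else 1.0 for xi in x]
smoothed = jump_preserving_smooth(x, y, h=0.1, gap=0.5)
```

On this noise-free step, the sketch reproduces the step exactly. A plain local mean would instead blur the jump near x = 0.5. With noisy data, `gap` acts as a detection threshold and would have to be tied to the noise level.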

A New Process Control Chart for Monitoring Short-Range Serially Correlated Data

Technometrics, May 9, 2019

Statistical process control (SPC) charts are critically important for quality control and management in manufacturing industries, environmental monitoring, disease surveillance and many other applications. Conventional SPC charts are designed for cases in which process observations at different observation times are independent. In practice, however, serial correlation almost always exists in sequential data. It has been well demonstrated in the literature that control charts designed for independent data are unstable for monitoring serially correlated data. Thus, it is important to develop control charts specifically for monitoring serially correlated data. There is some existing discussion of this problem in the SPC literature. Most existing methods are based on parametric time series modeling and residual monitoring, where the data are often assumed to be normally distributed. In applications, however, the assumed parametric time series model with a given order and the normality assumption are often invalid, resulting in unstable process monitoring. Although there is some good discussion on the robust design of such residual monitoring control charts, the suggested designs handle only certain special cases well. In this paper, we propose a novel control chart that makes use of the restarting mechanism of a CUSUM chart and the related spring length concept. The proposed chart uses observations within the spring length of the current time point and ignores all historical data beyond the spring length. It requires neither a parametric time series model nor a parametric process distribution. It only requires that the process observation at a given time point be associated with nearby observations and independent of observations far away in observation time, which should be reasonable for many applications. Numerical studies show that it performs well in different cases.
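
The restarting mechanism and the spring length can be illustrated with a plain upward CUSUM. The sketch below is not the proposed chart. It assumes a known in-control mean `mu0`, an allowance `k` and a control limit `h`. It takes the spring length at time n to be the number of observations since the CUSUM statistic last restarted at zero. The function name `cusum_with_spring_length` is hypothetical.

```python
def cusum_with_spring_length(xs, mu0, k, h):
    """Upward CUSUM C_n = max(0, C_{n-1} + (x_n - mu0) - k) that also tracks
    the spring length, i.e. how many observations have passed since C last
    restarted at zero.

    Returns (signal_time or None, [(C_n, spring_n), ...])."""
    c, spring = 0.0, 0
    path = []
    for n, x in enumerate(xs, start=1):
        c = max(0.0, c + (x - mu0) - k)
        # The chart "restarts" whenever C_n hits 0; the spring length resets too.
        spring = 0 if c == 0.0 else spring + 1
        path.append((c, spring))
        if c > h:
            return n, path
    return None, path


xs = [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 2.0]
signal, path = cusum_with_spring_length(xs, mu0=0.0, k=0.5, h=4.0)
# Observations that fall within the spring length at the signal time:
recent = xs[signal - path[-1][1]:signal]
```

Here the statistic restarts at zero for the first three observations. It then grows by 1.5 per step and exceeds the limit at n = 6 with a spring length of 3. So `recent` holds only the three post-restart observations, which is the kind of short window the abstract describes.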

A Change-Point Approach for Phase-I Analysis in Multivariate Profile Monitoring and Diagnosis

Technometrics, Apr 2, 2016

We study the phase I analysis of a multistage process where the input of the current process stage may be closely related to the output(s) of the earlier stage(s). We frame univariate observations from each of the stages in a multistage process as a single vector and recognize that the directions in which these vectors can shift are limited when attention is restricted to a single step shift in the mean of one stage. This allows us to focus detection power on a limited subspace with improved sensitivity. Taking advantage of this particular characteristic, we propose a change point approach that integrates the classical binary segmentation test with the directional information based on the state-space model for testing the stability of a batch of historical data. We give an accurate approximation for the significance level of the proposed test. Our simulation results show that the proposed approach consistently outperforms existing methods for multistage processes.
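
For readers unfamiliar with binary segmentation, the sketch below shows the classical univariate version that the proposed test builds on. It finds the split that maximizes a CUSUM-type statistic, declares a change point if that statistic exceeds a threshold, and recurses on both sides. It omits the state-space directional information the paper adds and assumes unit noise variance. It also uses a hypothetical fixed `threshold` instead of the paper's significance-level approximation. The function names are illustrative only.

```python
from math import sqrt


def cusum_split(x):
    """Best single mean-change split of x: maximizes
    sqrt(t*(n-t)/n) * |mean(x[:t]) - mean(x[t:])| over t = 1..n-1."""
    n = len(x)
    best_t, best_stat = None, 0.0
    for t in range(1, n):
        diff = sum(x[:t]) / t - sum(x[t:]) / (n - t)
        stat = sqrt(t * (n - t) / n) * abs(diff)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


def binary_segmentation(x, threshold, offset=0):
    """Recursively split x while the best CUSUM statistic exceeds threshold.
    Returns change-point indices (start of each new segment) in order."""
    if len(x) < 2:
        return []
    t, stat = cusum_split(x)
    if t is None or stat <= threshold:
        return []
    return (binary_segmentation(x[:t], threshold, offset)
            + [offset + t]
            + binary_segmentation(x[t:], threshold, offset + t))


changes = binary_segmentation([0] * 20 + [3] * 20 + [0] * 20, threshold=3.0)
```

For a series that steps up at index 20 and back down at index 40, the recursion recovers both change points. The final segments are constant, so their statistics are zero and the recursion stops.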

Multivariate Image Analysis

Technometrics, May 1, 2000

Sequential adaptive design for jump regression estimation

IISE Transactions, Dec 3, 2021

Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy for selecting the design points for a regression model when the underlying regression function is discontinuous. The first example was aimed at accelerating imaging speed in high-resolution material imaging; the second used sequential design to map a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle the discontinuities, these Bayesian approaches rely on computationally expensive Markov chain Monte Carlo techniques for posterior inference and subsequent design point selection. This is not appropriate for the first motivating example, where computation must be faster than the original imaging speed. In addition, the treed models are based on domain partitioning, which is inefficient when the discontinuities occur over complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for regression analysis with discontinuities: some statistical properties with a fixed design are presented first, and these properties are then used to propose a new criterion for selecting the design points for the regression analysis. Sequential design with the new criterion is demonstrated with comprehensive simulated examples, and its application to the two motivating examples is presented.

Transparent Sequential Learning for Statistical Process Control of Serially Correlated Data

Technometrics, Jun 28, 2021

Machine learning methods have been widely used in different applications, including process control and monitoring. For handling statistical process control (SPC) problems, conventional supervised ...

Nonparametric dynamic screening system for monitoring correlated longitudinal data

IIE Transactions, Jun 9, 2016

In many applications, including early disease detection and prevention and the performance evaluation of airplanes and other durable products, we need to sequentially monitor the longitudinal pattern of certain performance variables of a subject. A signal should be given as soon as possible once the pattern becomes abnormal. Recently, a new statistical method called the dynamic screening system (DySS) has been proposed to solve this problem. It is a combination of longitudinal data analysis and statistical process control. However, the current DySS method can only handle cases in which observations are normally distributed and within-subject observations are independent or follow a specific time series model (e.g., an AR(1) model). In this paper, we propose a new nonparametric DySS method that can handle cases in which the observation distribution and the correlation among within-subject observations are arbitrary. It therefore greatly broadens the applicability of the DySS method. Numerical studies show that the new method works well in practice.

Some perspectives on nonparametric statistical process control

Journal of Quality Technology, Jan 2, 2018

Statistical process control (SPC) charts play a central role in quality control and management. Many conventional SPC charts are designed under the assumption that the process distribution is normal. In practice, the normality assumption is often invalid. In such cases, some papers show that certain conventional SPC charts are robust and can still be used as long as their parameters are properly chosen. Other papers argue that results from such conventional SPC charts would not be reliable and that nonparametric SPC charts should be considered instead. In recent years, many nonparametric SPC charts have been proposed. Most of them are based on the ranking information in process observations collected at different time points. Some of them are based on data categorization and categorical data analysis. In this paper, we give some perspectives on issues related to the robustness of conventional SPC charts and to the strengths and limitations of various nonparametric SPC charts.
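
As an example of the ranking information mentioned above, one classical rank-based scheme compares each new batch of observations with an in-control reference sample. It does so through a standardized Mann-Whitney (Wilcoxon rank-sum) statistic, whose in-control distribution does not depend on the continuous process distribution. The sketch is illustrative only and is not a chart from this paper. The function name is hypothetical, ties get half credit, and no tie correction is applied to the variance. A control limit would still have to be chosen for a target in-control run length.

```python
from math import sqrt


def mann_whitney_z(reference, batch):
    """Standardized Mann-Whitney statistic.

    Counts how often a batch value exceeds a reference value (ties count 1/2),
    then centres by the null mean m*n/2 and scales by the null s.d.
    sqrt(m*n*(m+n+1)/12) (no tie correction)."""
    m, n = len(reference), len(batch)
    u = sum(1.0 if b > r else 0.5 if b == r else 0.0
            for r in reference for b in batch)
    return (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)


reference = list(range(20))                      # in-control reference sample
z_shifted = mann_whitney_z(reference, [30, 31, 32, 33, 34])
z_centre = mann_whitney_z(reference, [9.5] * 5)
```

A batch lying entirely above the reference gives a large positive statistic (about 3.4 here). A batch sitting at the reference median gives exactly zero, since each value exceeds half of the reference sample.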

Multivariate Statistical Process Control Using LASSO

Journal of the American Statistical Association, Dec 1, 2009

This paper develops a new multivariate statistical process control (SPC) methodology based on adapting the LASSO variable selection method to the SPC problem. The LASSO method has the sparsity property that it can select exactly the set of nonzero regression coefficients in multivariate regression modeling, which is especially useful in cases when the number of nonzero coefficients is small. In multivariate SPC applications, process mean vectors often shift in a small number of components. Our major goal is to detect such a shift as soon as it occurs and identify the shifted mean components. Using this connection between the two problems, a LASSO-based multivariate test statistic is proposed, which is then integrated into the multivariate EWMA charting scheme for Phase II multivariate process monitoring. It is shown that this approach balances protection against various shift levels and shift directions, and hence provides an effective tool for multivariate SPC applications.
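
The link between LASSO and mean-shift detection is easiest to see when the data are standardized to identity covariance. In that case, the LASSO estimate of the mean from an EWMA vector reduces to componentwise soft-thresholding, and its zero pattern flags the shifted components. The sketch below is a simplified illustration under that identity-covariance assumption with a single penalty `gamma`. It is not the chart proposed in the paper, which uses adaptive LASSO weights and combines several penalty levels. All function names are hypothetical.

```python
def soft_threshold(v, gamma):
    """Componentwise soft-thresholding: the LASSO solution of
    min_mu 0.5*||v - mu||^2 + gamma*||mu||_1."""
    return [(abs(a) - gamma) * (1.0 if a > 0 else -1.0) if abs(a) > gamma else 0.0
            for a in v]


def lasso_ewma(xs, lam, gamma):
    """Simplified LASSO-EWMA sketch for standardized data (identity covariance).

    Updates the EWMA vector U_n = lam*x_n + (1-lam)*U_{n-1}, estimates the mean
    shift by soft-thresholding U_n, and reports the squared projection of U_n
    onto that estimate (scaled by the asymptotic EWMA variance lam/(2-lam)),
    together with the indices of the components flagged as shifted."""
    u = [0.0] * len(xs[0])
    var = lam / (2.0 - lam)
    results = []
    for x in xs:
        u = [lam * a + (1.0 - lam) * b for a, b in zip(x, u)]
        mu_hat = soft_threshold(u, gamma)
        norm2 = sum(m * m for m in mu_hat)
        if norm2 == 0.0:
            stat = 0.0
        else:
            stat = sum(m * a for m, a in zip(mu_hat, u)) ** 2 / (norm2 * var)
        results.append((stat, [j for j, m in enumerate(mu_hat) if m != 0.0]))
    return results


# Ten in-control observations, then a shift of size 3 in component 0 only.
xs = [[0.0] * 5] * 10 + [[3.0, 0.0, 0.0, 0.0, 0.0]] * 10
res = lasso_ewma(xs, lam=0.2, gamma=0.3)
```

While the process is in control, the thresholded estimate is zero, and so is the statistic. After the shift, only component 0 survives thresholding, and the statistic grows as the EWMA accumulates evidence. This reflects the "detect and identify" goal described in the abstract.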

Two robust multivariate exponentially weighted moving average charts to facilitate distinctive product quality features assessment

Computers & Industrial Engineering, Jul 1, 2023

On Jump Structure Consideration in One-Dimensional Nonparametric Regression

This article introduces some recent local smoothing methods for fitting one-dimensional jump regression models. Their strengths and limitations are discussed from several directions, including (1) their ability to remove the effect of the slope or curvature of the regression curve on jump detection, (2) their ability to diminish the effect of noise, and (3) their ability to detect jumps in both the regression function itself and its derivatives.
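
Point (1) above can be made concrete with a common jump-detection criterion: compare one-sided fits from the left and from the right of a point. The sketch below uses equally weighted windows instead of kernel weights, and its function names are hypothetical. It contrasts one-sided local constant fits, whose difference is inflated by the slope of a continuous curve, with one-sided local linear fits, whose difference is not. It illustrates the idea and is not a specific method from the article.

```python
def ols_predict(xs, ys, x0):
    """Least-squares line through (xs, ys), evaluated at x0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((a - mx) ** 2 for a in xs)
    if sxx == 0.0:
        return my
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return my + slope * (x0 - mx)


def one_sided_jump(x, y, x0, h, local_linear=True):
    """Right-sided fit minus left-sided fit at x0, using the windows
    [x0 - h, x0) and [x0, x0 + h]. Local constant fits are simple means."""
    left = [(a, b) for a, b in zip(x, y) if x0 - h <= a < x0]
    right = [(a, b) for a, b in zip(x, y) if x0 <= a <= x0 + h]

    def fit(side):
        xs, ys = [p[0] for p in side], [p[1] for p in side]
        return ols_predict(xs, ys, x0) if local_linear else sum(ys) / len(ys)

    return fit(right) - fit(left)


# Linear trend 2x with a jump of size 1 at x = 0.5, no noise.
x = [i / 200 for i in range(201)]
y = [2 * xi + (1.0 if xi >= 0.5 else 0.0) for xi in x]
```

At the continuous point x = 0.3, the local linear difference is essentially zero. The local constant difference is about 0.2 there, purely because of the slope. At x = 0.5, the local linear difference recovers the jump size of 1.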

Wiley Series in Probability and Statistics

John Wiley & Sons, Inc. eBooks, May 20, 2005

Sequential Adaptive Design for Jump Regression Estimation in Materials Discovery

arXiv (Cornell University), Apr 2, 2019

Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy for selecting the design points for a regression model when the underlying regression function is discontinuous. The first example was aimed at accelerating imaging speed in high-resolution material imaging; the second used sequential design to map a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle the discontinuities, these Bayesian approaches rely on computationally expensive Markov chain Monte Carlo techniques for posterior inference and subsequent design point selection. This is not appropriate for the first motivating example, where computation must be faster than the original imaging speed. In addition, the treed models are based on domain partitioning, which is inefficient when the discontinuities occur over complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for regression analysis with discontinuities: some statistical properties with a fixed design are presented first, and these properties are then used to propose a new criterion for selecting the design points for the regression analysis. Sequential design with the new criterion is demonstrated with comprehensive simulated examples, and its application to the two motivating examples is presented.

Nonparametric profile monitoring by mixed effects modeling

Quality Engineering, 2011

In some applications, the quality of a process is characterized by the functional relationship between a response variable and one or more explanatory variables. Profile monitoring checks the stability of this relationship over time. Control charts for monitoring nonparametric profiles are useful when the relationship is too complicated to be described parametrically. Most existing control charts in the literature are for monitoring parametric profiles, and they require the assumption that within-profile measurements are independent of each other, which is often invalid in practice. This paper focuses on nonparametric profile monitoring when within-profile data are correlated. A novel control chart is suggested, which incorporates local linear kernel smoothing into the exponentially weighted moving average (EWMA) control scheme. In this method, within-profile correlation is described by a nonparametric mixed-effects model. Our proposed control chart is fast to compute and convenient to use. Numerical examples show that it works well in various cases. Some technical details are provided in an appendix available online as supplemental materials.

Nonparametric Profile Monitoring by Mixed Effects Modeling - Supplementary Material

Technometrics, Aug 1, 2010

In some applications, the quality of a process is characterized by the functional relationship between a response variable and one or more explanatory variables. Profile monitoring checks the stability of this relationship over time. Control charts for monitoring nonparametric profiles are useful when the relationship is too complicated to be described parametrically. Most existing control charts in the literature are for monitoring parametric profiles, and they require the assumption that within-profile measurements are independent of each other, which is often invalid in practice. This paper focuses on nonparametric profile monitoring when within-profile data are correlated. A novel control chart is suggested, which incorporates local linear kernel smoothing into the exponentially weighted moving average (EWMA) control scheme. In this method, within-profile correlation is described by a nonparametric mixed-effects model. Our proposed control chart is fast to compute and convenient to use. Numerical examples show that it works well in various cases. Some technical details are provided in an appendix available online as supplemental materials.

Analysis of US Household Catastrophic Health Care Expenditures Associated With Chronic Disease, 2008-2018

JAMA Network Open, May 27, 2022

Based on the federal poverty guidelines (low income: FPL, <200%; middle income: FPL, 200%-400%; high income: FPL, >400%). Includes other state-based programs.

Dynamic Disease Screening by Joint Modelling of Survival and Longitudinal Data

Applied Statistics, May 27, 2022

SUPP MATERIAL: Efficient Blind Image Deblurring Using Nonparametric Regression and Local Pixel Clustering

Technometrics, Dec 4, 2018

Abstract 591: Circulating testosterone in modifying the association of BMI change rate with serum PSA in prostate cancer-free men with initial-PSA less than 4 ng/mL

Background: Body mass index (BMI)-adjusted prostate-specific antigen (PSA) models have been proposed to improve the predictive accuracy of serum PSA in prostate cancer (PCa) screening. However, how the rate of BMI change may influence PSA levels in PCa-free men has not been well studied. The current study examined the relationship between BMI change rate and serum PSA in PCa-free men and whether this relationship is modified by circulating testosterone.

Methods: We conducted this study at a tertiary hospital in the Southeastern US using the Electronic Medical Records of PCa-free men with initial PSA less than 4 ng/mL (the cutoff for prostate biopsy), at least 1 testosterone measurement, and at least 2 BMI measurements during the study period. The time when the first BMI measurement was recorded served as the baseline, and the study period was defined from baseline to the most recent hospital visit. The included medical records ranged from Jun 2001 to Oct 2015. BMI change rate was calculated in one of two ways, depending on the number of data points. For men with only 2 BMI measurements, it was calculated by first subtracting the baseline BMI from the second BMI and then dividing the difference by the time interval (months) between the two BMI measurements. For men with more than 2 BMI measurements, we first regressed BMI on the time interval (months) between each measurement and baseline, then took the β regression coefficient (slope) as the BMI change rate for that man. Multivariable linear regression was used to assess the association of BMI change rate with three PSA measures: peak, most recent, and mean PSA during the study period. Effect modification by testosterone was assessed through analysis stratified by testosterone level, with 280 ng/dL as the cutoff.

Results: A total of 470 men with a mean study period of 97.6 months were included. Median age at baseline was 62 years. After adjusting for covariates including baseline BMI, no significant association of BMI change rate was observed with peak PSA (β = 0.416, P = 0.078), the most recent PSA (β = 0.360, P = 0.139), or mean PSA (β = 0.405, P = 0.064) in the overall sample. However, testosterone-stratified analyses indicated that BMI change rate was positively associated with peak PSA (β = 1.118, P = 0.013), the most recent PSA (β = 0.932, P = 0.044), and mean PSA (β = 1.034, P = 0.013) in men with testosterone <280 ng/dL. No significant association was observed in men with testosterone ≥280 ng/dL (peak PSA, β = 0.076, P = 0.785; most recent PSA, β = 0.072, P = 0.802; mean PSA, β = 0.099, P = 0.700).

Conclusion: Accelerated BMI increase in middle-to-late adulthood might correlate with higher PSA levels if low circulating testosterone occurs concurrently. Further studies are needed to confirm this finding.

Citation Format: Kai Wang, Mattia Prosperi, Peihua Qiu, Ting-Yuan David Cheng, Victoria Y. Bird, Xinguang Chen, Mingyang Song. Circulating testosterone in modifying the association of BMI change rate with serum PSA in prostate cancer-free men with initial-PSA less than 4 ng/mL [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 591.
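
The two-way definition of BMI change rate in the Methods can be written as a short function. This is a sketch for illustration only and not the study's analysis code; the function name and its inputs are hypothetical. It also makes one point explicit: with exactly two measurements, the least-squares slope equals the simple difference quotient, so the two branches of the definition agree.

```python
def bmi_change_rate(months, bmi):
    """BMI change per month (hypothetical helper; assumes distinct times).

    months: time of each BMI measurement in months since baseline
    bmi:    the corresponding BMI values
    Two measurements -> (BMI_2 - BMI_1) / (t_2 - t_1).
    More than two    -> least-squares slope of BMI regressed on time.
    """
    if len(months) != len(bmi) or len(months) < 2:
        raise ValueError("need at least two paired measurements")
    if len(months) == 2:
        return (bmi[1] - bmi[0]) / (months[1] - months[0])
    n = len(months)
    mt, mb = sum(months) / n, sum(bmi) / n
    sxx = sum((t - mt) ** 2 for t in months)
    sxy = sum((t - mt) * (b - mb) for t, b in zip(months, bmi))
    return sxy / sxx


rate_two = bmi_change_rate([0, 12], [25.0, 26.2])          # about 0.1 per month
rate_three = bmi_change_rate([0, 6, 12], [25.0, 25.5, 26.0])  # about 1/12 per month
```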

Multivariate single index modeling of longitudinal data with multiple responses

Statistics in Medicine

In medical studies, composite indices and/or scores are routinely used for predicting the medical conditions of patients. These indices are usually developed from observed data on certain disease risk factors, and it has been demonstrated in the literature that single index models can provide a powerful tool for this purpose. In practice, the observed data on disease risk factors are often longitudinal in the sense that they are collected at multiple time points for individual patients, and there are often multiple aspects of a patient's medical condition that are of concern. However, most existing single‐index models are developed for cases with independent data and a single response variable. These are inappropriate for the problem just described, in which within‐subject observations are usually correlated and multiple mutually correlated response variables are involved. This paper aims to fill this methodological gap by developing a single index model for analyzing longi...

Research paper thumbnail of Error-in-Variables Jump Regression Using Local Clustering

Social Science Research Network, 2016

Error-in-variables regression is widely used in econometric models. The statistical analysis beco... more Error-in-variables regression is widely used in econometric models. The statistical analysis becomes challenging when the regression function is discontinuous and the distribution of measurement error is unknown. In this paper, we propose a novel jump-preserving curve estimation method. A major feature of our method is that it can remove the noise effectively while preserving the jumps well, without requiring much prior knowledge about the measurement error distribution. The jump-preserving property is achieved mainly by local clustering. We show that the proposed curve estimator is statistical consistent, and it performs favorably, in comparison with an existing jump-preserving estimator. Finally, we demonstrate our method by an application to a health tax policy study in Australia.

Research paper thumbnail of A New Process Control Chart for Monitoring Short-Range Serially Correlated Data

Technometrics, May 9, 2019

Statistical process control (SPC) charts are critically important for quality control and managem... more Statistical process control (SPC) charts are critically important for quality control and management in manufacturing industries, environmental monitoring, disease surveillance and many other applications. Conventional SPC charts are designed for cases when process observations are independent at different observation times. In practice, however, serial data correlation almost always exists in sequential data. It has been well demonstrated in the literature that control charts designed for independent data are unstable for monitoring serially correlated data. Thus, it is important to develop control charts specifically for monitoring serially correlated data. To this end, there is some existing discussion in the SPC literature. Most existing methods are based on parametric time series modeling and residual monitoring, where the data are often assumed to be normally distributed. In applications, however, the assumed parametric time series model with a given order and the normality assumption are often invalid, resulting in unstable process monitoring. Although there is some nice discussion on robust design of such residual monitoring control charts, the suggested designs can only handle certain special cases well. In this paper, we try to make another effort by proposing a novel control chart that makes use of the restarting mechanism of a CUSUM chart and the related spring length concept. Our proposed chart uses observations within the spring length of the current time point and ignores all history data that are beyond the spring length. It does not require any parametric time series model and/or a parametric process distribution. 
It only requires the assumption that process observation at a given time point is associated with nearby observations and independent of observations that are far away in observation times, which should be reasonable for many applications. Numerical studies show that it performs well in different cases.

Research paper thumbnail of A Change-Point Approach for Phase-I Analysis in Multivariate Profile Monitoring and Diagnosis

Technometrics, Apr 2, 2016

We study the phase I analysis of a multistage process where the input of the current process stag... more We study the phase I analysis of a multistage process where the input of the current process stage may be closely related to the output(s) of the earlier stage(s). We frame univariate observations from each of the stages in a multistage process as a single vector and recognize that the directions in which these vectors can shift are limited when attention is restricted to a single step shift in the mean of one stage. This allows us to focus detection power on a limited subspace with improved sensitivity. Taking advantage of this particular characteristic, we propose a change point approach that integrates the classical binary segmentation test with the directional information based on the state-space model for testing the stability of a batch of historical data. We give an accurate approximation for the significance level of the proposed test. Our simulation results show that the proposed approach consistently outperforms existing methods for multistage processes.

Research paper thumbnail of Multivariate Image Analysis

Multivariate Image Analysis

Technometrics, May 1, 2000

Research paper thumbnail of Sequential adaptive design for jump regression estimation

IISE transactions, Dec 3, 2021

Selecting input variables or design points for statistical models has been of great interest in a... more Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy of selecting the design points for a regression model when the underlying regression function is discontinuous. The first example we undertook was for the purpose of accelerating imaging speed in a high resolution material imaging; the second was use of sequential design for the purpose of mapping a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many of the existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle the discontinuities, the Bayesian approaches come with computationally expensive Markov Chain Monte Carlo techniques for posterior inferences and subsequent design point selections, which is not appropriate for the first motivating example that requires computation at least faster than the original imaging speed. In addition, the treed models are based on the domain partitioning that are inefficient when the discontinuities occurs over complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for a regression analysis with discontinuities: some statistical properties with a fixed design will be presented first, and then these properties will be used to propose a new criterion of selecting the design points for the regression analysis. Sequential design with the new criterion will be presented with comprehensive simulated examples, and its application to the two motivating examples will be presented.

Research paper thumbnail of Transparent Sequential Learning for Statistical Process Control of Serially Correlated Data

Transparent Sequential Learning for Statistical Process Control of Serially Correlated Data

Technometrics, Jun 28, 2021

Machine learning methods have been widely used in different applications, including process contr... more Machine learning methods have been widely used in different applications, including process control and monitoring. For handling statistical process control (SPC) problems, conventional supervised ...

Research paper thumbnail of Nonparametric dynamic screening system for monitoring correlated longitudinal data

Iie Transactions, Jun 9, 2016

In many applications, including disease early detection and prevention, and performance evaluatio... more In many applications, including disease early detection and prevention, and performance evaluation of airplanes and other durable products, we need to sequentially monitor the longitudinal pattern of certain performance variables of a subject. A signal should be given as soon as possible once the pattern becomes abnormal. Recently, a new statistical method called dynamic screening system (DySS) has been proposed to solve this problem. It is a combination of longitudinal data analysis and statistical process control. However, the current DySS method can only handle cases when observations are normally distributed and within-subject observations are independent or follow a specific time series model (e.g., AR(1) model). In this paper, we propose a new nonparametric DySS method which can handle cases when the observation distribution and the correlation among within-subject observations are arbitrary. Therefore, it broadens the application of the DySS method greatly. Numerical studies show that the new method works well in practice.

Research paper thumbnail of Some perspectives on nonparametric statistical process control

Journal of Quality Technology, Jan 2, 2018

Statistical process control (SPC) charts play a central role in quality control and management. M... more Statistical process control (SPC) charts play a central role in quality control and management. Many conventional SPC charts are designed under the assumption that the related process distribution is normal. In practice, the normality assumption is often invalid. In such cases, some papers show that certain conventional SPC charts are robust and they can still be used as long as their parameters are properly chosen. Some other papers argue that results from such conventional SPC charts would not be reliable and nonparametric SPC charts should be considered instead. In recent years, many nonparametric SPC charts have been proposed. Most of them are based on the ranking information in process observations collected at different time points. Some of them are based on data categorization and categorical data analysis. In this paper, we give some perspectives on issues related to the robustness of the conventional SPC charts and to the strengths and limitations of various nonparametric SPC charts.

Multivariate Statistical Process Control Using LASSO

Journal of the American Statistical Association, Dec 1, 2009

This paper develops a new multivariate statistical process control (SPC) methodology by adapting the LASSO variable selection method to the SPC problem. The LASSO method has the sparsity property that it can select exactly the set of nonzero regression coefficients in multivariate regression modeling, which is especially useful when the number of nonzero coefficients is small. In multivariate SPC applications, process mean vectors often shift in only a small number of components. Our major goal is to detect such a shift as soon as it occurs and to identify the shifted mean components. Exploiting this connection between the two problems, we propose a LASSO-based multivariate test statistic, which is then integrated into the multivariate EWMA charting scheme for Phase II multivariate process monitoring. It is shown that this approach balances protection against various shift levels and shift directions, and hence provides an effective tool for multivariate SPC applications.
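The abstract does not spell out the test statistic, so the following is only a simplified sketch of the general idea, not the paper's method. It assumes observations have been standardized so that the in-control mean is zero and the covariance is the identity; under that assumption the LASSO estimate of a mean shift reduces to componentwise soft-thresholding. The EWMA weight `lam`, the single penalty `gamma`, the function names, and the projection statistic are all illustrative assumptions; the published chart may combine several penalty levels and standardize differently.

```python
import numpy as np

def soft_threshold(v, gamma):
    """LASSO solution under an identity design: shrink each component toward 0 by gamma."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def lasso_mewma(x, lam=0.2, gamma=1.0):
    """Hypothetical LASSO-flavoured MEWMA sketch.

    x: (n, p) observations, assumed standardized (in-control mean 0, covariance I).
    lam: EWMA weight (illustrative default).
    gamma: LASSO penalty (illustrative default).
    Returns the charting statistic at each time and the indices of the
    components estimated to have shifted.
    """
    n, p = x.shape
    z = np.zeros(p)
    scale = np.sqrt(lam / (2.0 - lam))  # asymptotic sd of each EWMA component
    stats, supports = [], []
    for t in range(n):
        z = lam * x[t] + (1.0 - lam) * z     # multivariate EWMA update
        zs = z / scale                       # approximately N(0, I) while in control
        mu_hat = soft_threshold(zs, gamma)   # sparse estimate of the mean shift
        norm2 = mu_hat @ mu_hat
        # Squared projection of zs onto the estimated shift direction (0 if no shift found)
        stats.append((mu_hat @ zs) ** 2 / norm2 if norm2 > 0 else 0.0)
        supports.append(np.flatnonzero(mu_hat))
    return np.array(stats), supports
```

A signal would be raised when the statistic exceeds a control limit, which in practice would be chosen by simulation to achieve a target in-control average run length; the nonzero entries of `mu_hat` at that time point suggest which mean components shifted.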

Two robust multivariate exponentially weighted moving average charts to facilitate distinctive product quality features assessment

Computers & Industrial Engineering, Jul 1, 2023

On Jump Structure Consideration in One-Dimensional Nonparametric Regression

This article introduces some recent local smoothing methods for fitting one-dimensional jump regression models. Their strengths and limitations are discussed from several directions, including: (1) their ability to remove the effect of the slope or curvature of the regression curve on jump detection, (2) their ability to diminish the effect of noise, and (3) their ability to detect jumps in both the regression function itself and its derivatives.
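As a rough illustration of point (1), the following sketch (an assumption, not taken from the article) compares one-sided local linear fits on either side of each gap between consecutive design points. Because each side fits a straight line and both fits are evaluated at the same midpoint, a smooth slope contributes nothing to the score, while a jump shows up as the difference between the two fitted values. The function name and the bandwidth `k` (points per side) are hypothetical, and the methods surveyed in the article use kernel weighting and further steps for noise and derivative jumps that this sketch omits.

```python
import numpy as np

def one_sided_jump_scores(x, y, k=10):
    """Hypothetical jump-size estimate at each gap (x[i], x[i+1]).

    Fits a least-squares line to the k points on the left of the gap and to
    the k points on the right, then returns right-fit minus left-fit at the
    gap midpoint. Entries without k points on both sides are NaN.
    """
    n = len(x)
    scores = np.full(n - 1, np.nan)
    for i in range(k - 1, n - k):
        mid = 0.5 * (x[i] + x[i + 1])
        left = slice(i - k + 1, i + 1)     # k points ending at x[i]
        right = slice(i + 1, i + k + 1)    # k points starting at x[i+1]
        bl = np.polyfit(x[left], y[left], 1)
        br = np.polyfit(x[right], y[right], 1)
        scores[i] = np.polyval(br, mid) - np.polyval(bl, mid)
    return scores
```

For example, with a linear trend `2 * x` plus a step of height 1.5 between the 51st and 52nd design points, the largest absolute score occurs at that gap and equals the step height, while gaps away from the step score essentially zero despite the nonzero slope.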

Wiley Series in Probability and Statistics

John Wiley & Sons, Inc. eBooks, May 20, 2005

Sequential Adaptive Design for Jump Regression Estimation in Materials Discovery

arXiv (Cornell University), Apr 2, 2019

Selecting input variables or design points for statistical models has been of great interest in adaptive design and active learning. Motivated by two scientific examples, this paper presents a strategy for selecting design points for a regression model when the underlying regression function is discontinuous. The first example aims to accelerate imaging speed in high-resolution materials imaging; the second uses sequential design to map a chemical phase diagram. In both examples, the underlying regression functions have discontinuities, so many existing design optimization approaches cannot be applied because they mostly assume a continuous regression function. Although some existing adaptive design strategies developed from treed regression models can handle discontinuities, these Bayesian approaches rely on computationally expensive Markov chain Monte Carlo techniques for posterior inference and subsequent design point selection, which is not appropriate for the first motivating example, where computation must be faster than the original imaging speed. In addition, treed models are based on domain partitioning, which is inefficient when the discontinuities occur along complex sub-domain boundaries. We propose a simple and effective adaptive design strategy for regression analysis with discontinuities: some statistical properties under a fixed design are presented first, and these properties are then used to propose a new criterion for selecting design points. Sequential design with the new criterion is demonstrated on comprehensive simulated examples and applied to the two motivating examples.

Nonparametric profile monitoring by mixed effects modeling

Quality Engineering, 2011

In some applications, the quality of a process is characterized by the functional relationship between a response variable and one or more explanatory variables. Profile monitoring checks the stability of this relationship over time. Control charts for monitoring nonparametric profiles are useful when the relationship is too complicated to be described parametrically. Most existing control charts in the literature are for monitoring parametric profiles, and they require the assumption that within-profile measurements are independent of each other, which is often invalid in practice. This paper focuses on nonparametric profile monitoring when within-profile data are correlated. A novel control chart is suggested, which incorporates local linear kernel smoothing into the exponentially weighted moving average (EWMA) control scheme. In this method, within-profile correlation is described by a nonparametric mixed-effects model. Our proposed control chart is fast to compute and convenient to use. Numerical examples show that it works well in various cases. Some technical details are provided in an appendix available online as supplemental materials.
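To make the combination of local linear kernel smoothing and EWMA concrete, here is a minimal sketch under simplifying assumptions: each profile is smoothed on a fixed grid with a local linear estimator (Epanechnikov kernel), and the deviations from a known in-control profile `g0` are accumulated by an EWMA. The mixed-effects modelling of within-profile correlation described in the abstract is deliberately omitted, and the charting statistic (mean squared EWMA deviation over the grid), bandwidth `h`, weight `lam`, and function names are hypothetical choices, not the paper's.

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear kernel estimate of E[y | x] at each grid point (Epanechnikov kernel)."""
    est = np.empty(len(grid))
    for g, s in enumerate(grid):
        d = x - s
        u = d / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d**2).sum()
        lw = w * (s2 - d * s1)  # equivalent weights of the local linear fit
        # Assumes at least two distinct x values inside each window (nonzero denominator)
        est[g] = (lw @ y) / (s0 * s2 - s1**2)
    return est

def profile_ewma(profiles, g0, grid, h=0.1, lam=0.2):
    """Hypothetical EWMA chart on smoothed profile deviations.

    profiles: sequence of (x, y) arrays, one per sampled profile.
    g0: in-control profile evaluated on grid (assumed known here).
    Returns one statistic per profile: the mean squared EWMA deviation over the grid.
    """
    e = np.zeros(len(grid))
    out = []
    for x, y in profiles:
        dev = local_linear(x, y, grid, h) - g0
        e = lam * dev + (1 - lam) * e  # EWMA of the smoothed deviations
        out.append(np.mean(e**2))
    return np.array(out)
```

Because a local linear fit reproduces straight lines exactly, profiles that match a linear `g0` give a statistic of essentially zero, while a persistent vertical shift makes the statistic grow from one profile to the next; a control limit would still have to be calibrated, for example by simulation.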

Nonparametric Profile Monitoring by Mixed Effects Modeling - Supplementary Material

Technometrics, Aug 1, 2010

Analysis of US Household Catastrophic Health Care Expenditures Associated With Chronic Disease, 2008-2018

JAMA Network Open, May 27, 2022

Income groups are based on the federal poverty level (FPL) guidelines (low income: <200% of FPL; middle income: 200%-400% of FPL; high income: >400% of FPL). (e) Includes other state-based programs.

Dynamic Disease Screening by Joint Modelling of Survival and Longitudinal Data

Applied Statistics, May 27, 2022

SUPP MATERIAL: Efficient Blind Image Deblurring Using Nonparametric Regression and Local Pixel Clustering

Technometrics, Dec 4, 2018

Abstract 591: Circulating testosterone in modifying the association of BMI change rate with serum PSA in prostate cancer-free men with initial-PSA less than 4 ng/mL

Background: A body mass index (BMI)-adjusted prostate-specific antigen (PSA) model has been proposed to improve the predictive accuracy of serum PSA in prostate cancer (PCa) screening. However, how the BMI change rate may influence PSA levels in PCa-free men has not been well studied. This study examines the relationship between BMI change rate and serum PSA in PCa-free men, and whether this relationship is modified by circulating testosterone. Methods: We conducted this study at a tertiary hospital in the Southeastern US using the electronic medical records of PCa-free men with an initial PSA less than 4 ng/mL (the cutoff for prostate biopsy), at least 1 testosterone measurement, and at least 2 BMI measurements during the study period. The time of the first BMI measurement served as the baseline, and the study period was defined from baseline to the most recent hospital visit. The included medical records ranged from Jun 2001 to Oct 2015. BMI change rate was computed in two ways, depending on the number of data points. For men with only 2 BMI measurements, it was calculated by first subtracting the baseline BMI from the second BMI and then dividing the difference by the time interval (months) between the two measurements. For men with more than 2 BMI measurements, we first regressed BMI on the time interval (months) between each measurement and baseline, then took the regression coefficient β (slope) as that man's BMI change rate. Multivariable linear regression was used to assess the association of BMI change rate with three PSA measures: peak, most recent, and mean PSA during the study period. Effect modification by testosterone was assessed through analysis stratified by testosterone level, using 280 ng/dL as the cutoff. Results: A total of 470 men with a mean study period of 97.6 months were included. Median age at baseline was 62 years.
After adjusting for covariates including baseline BMI, no significant association of BMI change rate was observed with peak PSA (β = 0.416, P = 0.078), the most recent PSA (β = 0.360, P = 0.139), or mean PSA (β = 0.405, P = 0.064) in the overall sample. However, testosterone-stratified analyses indicated that BMI change rate was positively associated with peak PSA (β = 1.118, P = 0.013), the most recent PSA (β = 0.932, P = 0.044), and mean PSA (β = 1.034, P = 0.013) in men with testosterone <280 ng/dL, but no significant association was observed in men with testosterone ≥280 ng/dL (peak PSA: β = 0.076, P = 0.785; most recent PSA: β = 0.072, P = 0.802; mean PSA: β = 0.099, P = 0.700). Conclusion: Accelerated BMI increase in middle-to-late adulthood might correlate with higher PSA levels when low circulating testosterone occurs concurrently. Further studies are needed to confirm this finding. Citation Format: Kai Wang, Mattia Prosperi, Peihua Qiu, Ting-Yuan David Cheng, Victoria Y. Bird, Xinguang Chen, Mingyang Song. Circulating testosterone in modifying the association of BMI change rate with serum PSA in prostate cancer-free men with initial-PSA less than 4 ng/mL [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 591.
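The two BMI change rate rules in the Methods can be written down directly. The following is a minimal sketch, assuming measurement times are already expressed in months since baseline; the function names are hypothetical, and the testosterone helper only encodes the 280 ng/dL stratification cutoff (below 280 is "low", 280 or above is not).

```python
import numpy as np

def bmi_change_rate(months, bmi):
    """BMI change rate in BMI units per month, following the two rules in the abstract.

    months: time of each BMI measurement in months since baseline (first entry 0).
    bmi: BMI values in the same order.
    """
    months = np.asarray(months, dtype=float)
    bmi = np.asarray(bmi, dtype=float)
    if len(bmi) < 2:
        raise ValueError("at least 2 BMI measurements are required")
    if len(bmi) == 2:
        # Second BMI minus baseline BMI, divided by the interval in months
        return (bmi[1] - bmi[0]) / (months[1] - months[0])
    # More than 2 measurements: least-squares slope of BMI on months since baseline
    slope, _intercept = np.polyfit(months, bmi, 1)
    return slope

def is_low_testosterone(t_ng_dl, cutoff=280.0):
    """Stratum used for effect modification: True if testosterone < 280 ng/dL."""
    return t_ng_dl < cutoff
```

With exactly two measurements, the least-squares slope equals the difference quotient, so the two rules agree wherever both apply; the split mainly documents that the regression is only needed once there are three or more points.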

Multivariate single index modeling of longitudinal data with multiple responses

Statistics in Medicine

In medical studies, composite indices and/or scores are routinely used to predict patients' medical conditions. These indices are usually developed from observed data on certain disease risk factors, and it has been demonstrated in the literature that single‐index models can provide a powerful tool for this purpose. In practice, the observed data on disease risk factors are often longitudinal, in the sense that they are collected at multiple time points for individual patients, and multiple aspects of a patient's medical condition are often of concern. However, most existing single‐index models are developed for cases with independent data and a single response variable, which makes them inappropriate for the problem just described, where within‐subject observations are usually correlated and multiple mutually correlated response variables are involved. This paper aims to fill this methodological gap by developing a single index model for analyzing longi...