A. Pepelyshev - Academia.edu
Papers by A. Pepelyshev
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2003
Estimation and experimental design in a non-linear regression model that is used in microbiology are studied. The Monod model is defined implicitly by a differential equation and has numerous applications in microbial growth kinetics, water research, pharmacokinetics and plant physiology. It is proved that least squares estimates are asymptotically unbiased and normally distributed. The asymptotic covariance matrix of the estimator is the basis for the construction of efficient designs of experiments. In particular, locally D-, E- and c-optimal designs are determined and their properties are studied theoretically and by simulation. If certain intervals for the non-linear parameters can be specified, locally optimal designs can be constructed which are robust with respect to a misspecification of the initial parameters and which allow efficient parameter estimation. Parameter variances can be decreased by a factor of 2 by simply sampling at optimal times during the experiment.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2011
We consider the problem of optimal design of experiments for random effects models, especially population models, where a small number of correlated observations can be taken on each individual, while the observations corresponding to different individuals can be assumed to be uncorrelated. We focus on c-optimal design problems and show that the classical equivalence theorem and the famous geometric characterization of Elfving (1952) from the case of uncorrelated data can be adapted to the problem of selecting optimal sets of observations for the n individual patients. The theory is demonstrated in a linear model with correlated observations and a nonlinear random effects population model, which is commonly used in pharmacokinetics.
Journal of the American Statistical Association, 2008
Identifying the "right" dose is one of the most critical and difficult steps in the clinical development process of any medicinal drug. Its importance cannot be understated: selecting too high a dose can result in unacceptable toxicity and associated safety problems, while choosing too low a dose leads to smaller chances of showing sufficient efficacy in confirmatory trials, thus reducing the chance of approval for the drug. In this paper we investigate the problem of deriving efficient designs for the estimation of the minimum effective dose (MED) by determining the appropriate number and actual levels of the doses to be administered to patients, as well as their relative sample size allocations. More specifically, we derive local optimal designs that minimize the asymptotic variance of the MED estimate under a particular dose response model. The small sample properties of these designs are investigated via simulation, together with their sensitivity to misspecification of the true parameter values and of the underlying dose response model. Finally, robust optimal designs are constructed, which take into account a set of potential dose response profiles within classes of models commonly used in practice.
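To make the notion of a minimum effective dose concrete, here is a minimal sketch under assumptions that are not stated in the abstract: the MED is taken to be the smallest dose whose mean response exceeds the placebo response by a threshold `delta`, and the dose response curve is taken to be an Emax model, one commonly used shape. The function names and all parameter values are hypothetical.

```python
def emax_response(dose, e0, emax, ed50):
    """Mean response under an (assumed) Emax model: e0 + emax * d / (ed50 + d)."""
    return e0 + emax * dose / (ed50 + dose)


def med_emax(delta, emax, ed50):
    """Dose at which the Emax curve exceeds the placebo response e0 by `delta`.

    Solving emax * d / (ed50 + d) = delta for d gives
    d = ed50 * delta / (emax - delta), which requires 0 < delta < emax.
    """
    if not 0 < delta < emax:
        raise ValueError("delta must lie strictly between 0 and emax")
    return ed50 * delta / (emax - delta)


# Hypothetical parameter values, chosen only for illustration.
e0, emax, ed50, delta = 0.2, 1.0, 25.0, 0.4
med = med_emax(delta, emax, ed50)
print(med, emax_response(med, e0, emax, ed50) - e0)
```

Because the MED is a non-linear function of the model parameters, its estimate inherits their uncertainty, which is why the choice of doses and allocations matters for its variance.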
Journal of Statistical Planning and Inference, 2008
Computational Statistics & Data Analysis, 2012
The application of Singular Spectrum Analysis (SSA) to the empirical distribution function sampled at a grid of points spanning the range of the sample leads to a novel and promising method for the computer-intensive nonparametric estimation of both the distribution function and the density function. SSA yields a data-adaptive filter, whose length is a parameter that controls the smoothness of the filtered series. A data-adaptive algorithm for the automatic selection of a general smoothing parameter is introduced, which controls the number of modes of the estimated density. Extensive computer simulations demonstrate that the new automatic bandwidth selector improves on other popular methods for various densities of interest.

Simulation results (column headers are garbled in the source as "A EL a"; the fourth value is given for the SSA estimators only):

Model N(0, 1)
  Kernel est. with h_LSCV    0.0071  0.0551  0.0143
  Kernel est. with h_SJPI    0.0066  0.0536  0.0131
  Kernel est. with h_ICV     0.0075  0.0546  0.0146
  Kernel est. with h_a       0.0063  0.0546  0.0131
  SSA 1c est.                0.0061  0.0537  0.0128  107.6
  SSA 2c est.                0.0060  0.0503  0.0142  145.7
  SSA 3c est.                0.0090  0.0556  0.0189  143.4
  SSA b est.                 0.0052  0.0488  0.0141  141.7

Model 0.4 N(0, 1) + 0.6 N(5, 2^2)
  Kernel est. with h_LSCV    0.0058  0.0617  0.0231
  Kernel est. with h_SJPI    0.0052  0.0609  0.0210
  Kernel est. with h_ICV     0.0055  0.0614  0.0229
  Kernel est. with h_a       0.0053  0.0611  0.0236
  SSA 1c est.                0.0051  0.0607  0.0215   67.6
  SSA 2c est.                0.0047  0.0610  0.0195   86.4
  SSA 3c est.                0.0054  0.0623  0.0219   90.3
  SSA b est.                 0.0052  0.0617  0.0206   89.7

Model 0.3 N(0, 1) + 0.7 N(15, 4^2)
  Kernel est. with h_LSCV    0.0048  0.0672  0.0394
  Kernel est. with h_SJPI    0.0069  0.0733  0.0602
  Kernel est. with h_ICV     0.0049  0.0670  0.0396
  Kernel est. with h_a       0.0050  0.0674  0.0440
  SSA 1c est.                0.0053  0.0679  0.0451   40.9
  SSA 2c est.                0.0046  0.0660  0.0380   60.0
  SSA 3c est.                0.0043  0.0643  0.0349   73.8
  SSA b est.                 0.0047  0.0670  0.0391   48.9
Chemometrics and Intelligent Laboratory Systems, 2010
The main issue in the analysis of computer experiments is the uncertainty of prediction and related inferences. To address the uncertainty analysis, the Bayesian analysis of deterministic computer models has been actively developed in the last decade. In the Bayesian approach, the uncertainty is expressed through a Gaussian process model. As a consequence, the resulting analysis is rather sensitive with respect to these prior assumptions. Moreover, for high-dimensional data this approach leads to time-consuming computations.
Bernoulli, 2010
We consider design issues for toxicology studies when we have a continuous response and the true mean response is only known to be a member of a class of nested models. This class of non-linear models was proposed by toxicologists who were concerned only with estimation problems. We develop robust and efficient designs for model discrimination and for estimating parameters in the selected model at the same time. In particular, we propose designs that maximize the minimum of D- or D1-efficiencies over all models in the given class. We show that our optimal designs are efficient for determining an appropriate model from the postulated class, quite efficient for estimating model parameters in the identified model and also robust with respect to model misspecification. To facilitate the use of optimal design ideas in practice, we have also constructed a website that freely enables practitioners to generate a variety of optimal designs for a range of models and also enables them to evaluate the efficiency of any design. This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 2010, Vol. 16, No. 4, 1164-1176. This reprint differs from the original in pagination and typographic detail.
Annals of the Institute of Statistical Mathematics, 2006
In this paper we investigate local E- and c-optimal designs for exponential regression models of the form $\sum_{i=1}^{k} a_i \exp(-\mu_i x)$. We establish a numerical method for the construction of efficient and local optimal designs, which is based on two results. First, we consider for fixed $k$ the limit $\mu_i \to \gamma$ ($i = 1, \ldots, k$) and show that the optimal designs converge weakly to the optimal designs in a heteroscedastic polynomial regression model. It is then demonstrated that in this model the optimal designs can be easily determined by standard numerical software. Secondly, it is proved that the support points and weights of the local optimal designs in the exponential regression model are analytic functions of the nonlinear parameters $\mu_1, \ldots, \mu_k$. This result is used for the numerical calculation of the local E-optimal designs by means of a Taylor expansion for any vector $(\mu_1, \ldots, \mu_k)$. It is also demonstrated that in the models under consideration E-optimal designs are usually more efficient for estimating individual parameters than D-optimal designs.
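As a rough illustration of what a locally optimal design looks like for the simplest member of this model class (k = 1, so the mean is a·exp(−µx)), the sketch below evaluates the determinant of the information matrix, the D-criterion, for equally weighted two-point designs {0, x} and searches a grid for the best x. The parameter values, the grid, and the restriction to this design family are assumptions for illustration. It is a brute-force search, not the Taylor-expansion method described in the abstract, and it uses the D-criterion instead of the E- or c-criterion for simplicity.

```python
import math


def info_matrix(points, weights, a, mu):
    """2x2 information matrix of a design for the mean a * exp(-mu * x)."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for x, w in zip(points, weights):
        # Gradient of the mean with respect to the parameters (a, mu).
        g = (math.exp(-mu * x), -a * x * math.exp(-mu * x))
        for r in range(2):
            for c in range(2):
                m[r][c] += w * g[r] * g[c]
    return m


def d_criterion(points, weights, a, mu):
    """Determinant of the information matrix (larger is better)."""
    m = info_matrix(points, weights, a, mu)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical "local" parameter guesses.
a, mu = 1.0, 2.0

# Candidate second support points on (0, 3]; the first point is fixed at 0.
grid = [i / 1000 for i in range(1, 3001)]
best_x = max(grid, key=lambda x: d_criterion([0.0, x], [0.5, 0.5], a, mu))
print(best_x)
```

For this design family the determinant is proportional to x²·exp(−2µx), which peaks at x = 1/µ, so the search result depends on the guessed value of µ. That dependence is what "local" refers to.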
The Annals of Applied Statistics, 2010
We consider the problem of constructing optimal designs for population pharmacokinetics which use random effect models. It is common practice in the design of experiments in such studies to assume uncorrelated errors for each subject. In the present paper a new approach is introduced to determine efficient designs for nonlinear least squares estimation which addresses the problem of correlation between observations corresponding to the same subject. We use asymptotic arguments to derive optimal design densities, and the designs for finite sample sizes are constructed from the quantiles of the corresponding optimal distribution function. It is demonstrated that compared to the optimal exact designs, whose determination is a hard numerical problem, these designs are very efficient. Alternatively, the designs derived from asymptotic theory could be used as starting designs for the numerical computation of exact optimal designs. Several examples of linear and nonlinear models are presented in order to illustrate the methodology. In particular, it is demonstrated that naively chosen equally spaced designs may lead to less accurate estimation.
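The step from a design density to a finite design can be sketched as follows. This is a minimal illustration under assumptions: the n points are placed at the quantile levels (i − 1)/(n − 1), one plausible convention, and the density 2t on [0, 1] is a made-up example, not an optimal density from the paper. The helper name `quantile_design` is hypothetical.

```python
def quantile_design(density, n, lo=0.0, hi=1.0, steps=20000):
    """Return n >= 2 design points at equally spaced quantiles of `density`."""
    # Cumulative integral of the density by the trapezoidal rule.
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    cdf = [0.0]
    for i in range(steps):
        cdf.append(cdf[-1] + 0.5 * h * (density(xs[i]) + density(xs[i + 1])))
    total = cdf[-1]  # normalising constant, so the density need not integrate to 1

    points = []
    j = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the grid cell whose cumulative mass brackets the target.
        while j < steps and cdf[j + 1] < target:
            j += 1
        span = cdf[j + 1] - cdf[j]
        frac = 0.0 if span == 0 else (target - cdf[j]) / span
        points.append(xs[j] + frac * h)  # linear interpolation inside the cell
    return points


# Hypothetical design density proportional to t on [0, 1].
design = quantile_design(lambda t: 2.0 * t, n=5)
print([round(x, 4) for x in design])
```

With this density the distribution function is t², so the points are the square roots of 0, 1/4, 1/2, 3/4 and 1. They crowd towards the right end of the interval, in contrast to an equally spaced design.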
Annals of the Institute of Statistical Mathematics, 2002