Athanasios Kottas - Academia.edu
Papers by Athanasios Kottas
We develop a class of nearest neighbor mixture transition distribution process (NNMP) models that provides flexibility and scalability for non-Gaussian geostatistical data. We use a directed acyclic graph to define a proper spatial process with finite-dimensional distributions given by finite mixtures. We develop conditions to construct general NNMP models with pre-specified stationary marginal distributions. We also establish lower bounds for the strength of the tail dependence implied by NNMP models, demonstrating the flexibility of the proposed methodology for modeling multivariate dependence through bivariate distribution specification. To implement inference and prediction, we formulate a Bayesian hierarchical model for the data, using the NNMP prior model for the spatial random effects process. From an inferential point of view, the NNMP model lays out a new computational approach to handling large spatial data sets, leveraging the mixture model structure to avoid computationa...
arXiv (Cornell University), Nov 8, 2022
We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors placed directly on the discrete distribution of the ordinal responses. The prior probability models are built from a structured mixture of multinomial distributions. We leverage a continuation-ratio logits representation to formulate the mixture kernel, with mixture weights defined through the logit stick-breaking process that incorporates the covariates through a linear function. The implied regression functions for the response probabilities can be expressed as weighted sums of parametric regression functions, with covariate-dependent weights. Thus, the modeling approach achieves flexible ordinal regression relationships, avoiding linearity or additivity assumptions in the covariate effects. A key model feature is that the parameters for both the mixture kernel and the mixture weights can be associated with a continuation-ratio logits regression structure. Hence, an efficient and relatively easy to implement posterior simulation method can be designed, using Pólya-Gamma data augmentation. Moreover, the model is built from a conditional independence structure for category-specific parameters, which results in additional computational efficiency gains through partial parallel sampling. In addition to the general mixture structure, we study simplified model versions that incorporate covariate dependence only in the mixture kernel parameters or only in the mixture weights. For all proposed models, we discuss approaches to prior specification and develop Markov chain Monte Carlo methods for posterior simulation. The methodology is illustrated with several synthetic and real data examples.
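As a rough illustration of the continuation-ratio logits representation mentioned in this abstract, the sketch below maps a vector of logits to ordinal category probabilities. It is not the authors' model: the function name `continuation_ratio_probs` and the zero-based category indexing are assumptions made for this example, and covariates and mixing are left out entirely.

```python
import math


def continuation_ratio_probs(logits):
    """Map C-1 continuation-ratio logits to C ordinal category probabilities.

    logits[j] is taken to be the log-odds of P(Y = j | Y >= j),
    for j = 0, ..., C-2 (hypothetical zero-based indexing).
    """
    probs = []
    remaining = 1.0  # P(Y >= j): probability mass not yet allocated
    for eta in logits:
        cond = 1.0 / (1.0 + math.exp(-eta))  # P(Y = j | Y >= j)
        probs.append(remaining * cond)
        remaining *= 1.0 - cond
    probs.append(remaining)  # the last category takes the leftover mass
    return probs


print(continuation_ratio_probs([0.0, 0.0]))  # [0.5, 0.25, 0.25]
```

Each category probability is a product of sequential binary (logistic) terms, which is the kind of structure that makes data augmentation for logistic likelihoods applicable one category at a time.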
Journal of the American Statistical Association, 2009
The CRASH computer model simulates the effect of a vehicle colliding against different barrier types. If it accurately represents real vehicle crashworthiness, the computer model can be of great value in various aspects of vehicle design, such as the setting of timing of air bag releases. The goal of this study is to address the problem of validating the computer model for such design goals, based on utilizing computer model runs and experimental data from real crashes. This task is complicated by the fact that (i) the output of this model consists of smooth functional data, and (ii) certain types of collision have very limited data. We address problem (i) by extending existing Gaussian process-based methodology developed for models that produce real-valued output, and resort to Bayesian hierarchical modeling to attack problem (ii).
Journal of Computational and Graphical Statistics, 2021
Mixture transition distribution time series models build high-order dependence through a weighted combination of first-order transition densities for each one of a specified number of lags. We present a framework to construct stationary mixture transition distribution models that extend beyond linear, Gaussian dynamics. We study conditions for first-order strict stationarity which allow for different constructions with either continuous or discrete families for the first-order transition densities given a pre-specified family for the marginal density, and with general forms for the resulting conditional expectations. Inference and prediction are developed under the Bayesian framework with particular emphasis on flexible, structured priors for the mixture weights. Model properties are investigated both analytically and through synthetic data examples. Finally, Poisson and Lomax examples are illustrated through real data applications.
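For readers unfamiliar with the construction, the "weighted combination of first-order transition densities" can be written out as below. The notation is assumed here rather than taken from the paper: L is the number of lags, w_l are the mixture weights, and f_l is the first-order transition density attached to lag l.

```latex
f(x_t \mid x_{t-1}, \dots, x_{t-L})
  = \sum_{l=1}^{L} w_l \, f_l(x_t \mid x_{t-l}),
\qquad w_l \ge 0, \quad \sum_{l=1}^{L} w_l = 1 .
```

Because each component conditions on a single lag, the number of parameters grows roughly linearly in L rather than with the full joint dependence on all L past values.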
Journal of Computational and Graphical Statistics, 2021
We develop two models for Bayesian estimation and selection in high-order, discrete-state Markov chains. Both are based on the mixture transition distribution, which constructs a transition probability tensor with additive mixing of probabilities from first-order transition matrices. We demonstrate two uses for the proposed models: parsimonious approximation of high-order dynamics by mixing lower-order transition models, and order/lag selection through over-specification and shrinkage via priors for sparse probability vectors. The priors further shrink all models to an identifiable and interpretable parameterization, useful for data analysis. We discuss properties of the models and demonstrate their utility with simulation studies. We further apply the methodology to a data analysis from the high-order Markov chain literature and to a time series of pink salmon abundance in a creek in Alaska, U.S.A.
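The "additive mixing of probabilities from first-order transition matrices" can be sketched in a few lines. This is only an illustration under stated assumptions, not the models proposed in the paper: the function name `mtd_transition_tensor` is hypothetical, one transition matrix per lag is assumed, and the weights and matrices in the usage lines are made-up numbers.

```python
import itertools


def mtd_transition_tensor(weights, matrices):
    """Build a high-order transition tensor by additive mixing.

    weights[l] is the mixing weight for lag l+1 (nonnegative, summing to 1).
    matrices[l][i][j] is the first-order probability of moving to state j
    given that the chain was in state i at lag l+1.
    Returns a dict mapping (x_{t-1}, ..., x_{t-L}) to a distribution over x_t.
    """
    n_states = len(matrices[0])
    tensor = {}
    for lags in itertools.product(range(n_states), repeat=len(weights)):
        tensor[lags] = [
            sum(w * q[x][j] for w, q, x in zip(weights, matrices, lags))
            for j in range(n_states)
        ]
    return tensor


# Hypothetical two-state, two-lag example.
q1 = [[0.9, 0.1], [0.2, 0.8]]
q2 = [[0.5, 0.5], [0.5, 0.5]]
tensor = mtd_transition_tensor([0.7, 0.3], [q1, q2])
print(tensor[(0, 1)])  # distribution of x_t given x_{t-1} = 0, x_{t-2} = 1
```

Every entry of the tensor is a convex combination of rows of the first-order matrices, so each conditional distribution sums to one without further normalization. A weight of zero removes the corresponding lag, which is the intuition behind lag selection through sparse weight vectors.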
Statistics and Computing, 2016
Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture nonstandard distributions. The model arises from a Bayesian nonparametric mixture of normals specification for the joint distribution of successive observations in time. This implies a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, nonstationary, Markovian model for real-valued data indexed in discrete time. To obtain a more computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest the model is able to recover challenging transition and predictive densities. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed.
Statistics and Computing, 2016
Integro-Difference Equations (IDEs) provide a flexible framework for dynamic modeling of spatio-temporal data. The choice of kernel in an IDE model relates directly to the underlying physical process modeled, and it can affect model fit and predictive accuracy. We introduce Bayesian nonparametric methods to the IDE literature as a means to allow flexibility in modeling the kernel. We propose a mixture of normal distributions for the IDE kernel, built from a spatial Dirichlet process for the mixing distribution, which can model kernels with shapes that change with location. This allows the IDE model to capture non-stationarity with respect to location and to reflect a changing physical process across the domain. We address computational concerns for inference that leverage the use of Hermite polynomials as a basis for the representation of the process and the IDE kernel, and incorporate Hamiltonian Markov chain Monte Carlo steps in the posterior simulation method. An example with synthetic data demonstrates that the model can successfully capture location-dependent dynamics. Moreover, using a data set of ozone pressure, we show that the spatial Dirichlet process mixture model outperforms several alternative models for the IDE kernel, including the state of the art in the IDE literature, that is, a Gaussian kernel with location-dependent parameters.
SSRN Electronic Journal, 2014
We develop a Bayesian nonparametric model to assess the effect of systematic risks on multiple financial markets, and apply it to understand the behavior of the S&P500 sector indexes between January 1, 2000 and December 31, 2011. More than prediction, our main goal is to understand the evolution of systematic and idiosyncratic risks in the U.S. economy over this particular time period, leading to novel sector-specific risk indexes. To accomplish this goal, we model the appearance of extreme losses in each market using a superposition of two Poisson processes, one that corresponds to systematic risks that are shared by all sectors, and one that corresponds to the idiosyncratic risk associated with a specific sector. In order to capture changes in the risk structure over time, the intensity functions associated with each of the underlying components are modeled using a Dirichlet process mixture model. Among other interesting results, our analysis of the S&P500 index suggests that there are few idiosyncratic risks associated with the consumer staples sector, whose extreme negative log returns appear to be driven mostly by systematic risks.
IEEE Transactions on Geoscience and Remote Sensing, 2008
Research interests: Bayesian nonparametric modeling and inference; analysis of computer model experiments; inference under probability order constraints; mixture models; quantile regression; spatial statistics; survival analysis. Applications in ecology and engineering.
Annals of the Institute of Statistical Mathematics, 1999
Bayesian Analysis, 2022
We develop a Bayesian nonparametric autoregressive model applied to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian density regression using Dirichlet process mixtures, with the Markovian likelihood defined through the conditional distribution obtained from the mixture. This results in a Bayesian nonparametric extension of a mixtures-of-experts model formulation. We address computational challenges to posterior sampling that arise from the Markovian structure in the likelihood. The base model is illustrated with synthetic data from a classical model for population dynamics, as well as a series of waiting times between eruptions of Old Faithful Geyser. We study inferences available through the base model before extending the methodology to include automatic relevance detection among a pre-specified set of lags. Inference for global and local lag selection is explored with additional simulation studies, and the methods are illustrated through analysis of an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.
The mean residual life function is a key functional for a survival distribution. It has practically useful interpretation as the expected remaining lifetime given survival up to a particular time point, and it also characterizes the survival distribution. However, it has received limited attention in terms of inference methods under a probabilistic modeling framework. In this paper, we seek to provide general inference methodology for mean residual life regression. Survival data often include a set of predictor variables for the survival response distribution, and in many cases it is natural to include the covariates as random variables into the modeling. We thus propose a Dirichlet process mixture modeling approach for the joint stochastic mechanism of the covariates and survival responses. This approach implies a flexible model structure for the mean residual life of the conditional response distribution, allowing general shapes for mean residual life as a function of covariates g...
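As a reminder of the standard definitions behind this abstract (the notation is assumed here, not quoted from the paper): for a continuous survival time T with survival function S(t) = P(T > t) and finite mean, the mean residual life function is the expected remaining lifetime given survival past t.

```latex
m(t) = \mathrm{E}(T - t \mid T > t)
     = \frac{\int_t^{\infty} S(u)\, du}{S(t)},
\qquad S(t) > 0 .
```

The statement that the mean residual life characterizes the survival distribution corresponds to the usual inversion formula, which recovers S from m:

```latex
S(t) = \frac{m(0)}{m(t)} \exp\!\left( -\int_0^{t} \frac{du}{m(u)} \right).
```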
We propose a general nonparametric Bayesian framework for binary regression, which is built from modelling for the joint response-covariate distribution. The observed binary responses are assumed to arise from underlying continuous random variables through discretization, and we model the joint distribution of these latent responses and the covariates using a Dirichlet process mixture of multivariate normals. We show that the kernel of the induced mixture model for the observed data is identifiable upon a restriction on the latent variables. To allow for appropriate dependence structure while facilitating identifiability, we use a square-root-free Cholesky decomposition of the covariance matrix in the normal mixture kernel. In addition to allowing for the necessary restriction, this modelling strategy provides substantial simplifications in implementation of Markov chain Monte Carlo posterior simulation. We illustrate the utility of the modelling approach with two data examples, ...
We propose a prior probability model for two distributions that are ordered according to a stochastic precedence constraint, a weaker restriction than the more commonly utilized stochastic order constraint. The modeling approach is based on structured Dirichlet process mixtures of normal distributions. Full inference for functionals of the stochastic precedence constrained mixture distributions is obtained through a Markov chain Monte Carlo posterior simulation method. A motivating application involves study of the discriminatory ability of continuous diagnostic tests in epidemiologic research. Here, stochastic precedence provides a natural restriction for the distributions of test scores corresponding to the non-infected and infected groups. Inference under the model is illustrated with data from a diagnostic test for Johne's disease in dairy cattle. We also apply the methodology to comparison of survival distributions associated with two distinct conditions, and illustrate with analysis of data on survival time after bone marrow transplantation for treatment of leukemia.
Widely used parametric generalized linear models are, unfortunately, a somewhat limited class of specifications. Nonparametric aspects are often introduced to enrich this class, resulting in semiparametric models. Focusing on single or k-sample problems, many classical nonparametric approaches are limited to hypothesis testing. Those that allow estimation are limited to certain functionals of the underlying distributions. Moreover, the associated inference often relies upon asymptotics when nonparametric specifications are often most appealing for smaller sample sizes. Bayesian nonparametric approaches avoid asymptotics but have, to date, been limited in the range of inference. Working with Dirichlet process priors, we overcome the limitations of existing simulation-based model fitting approaches which yield inference that is confined to posterior moments of linear functionals of the population distribution. This article provides a computational approach to obtain the entire posterior distrib...
Statistical Equivalent Models, or SEMs, have recently attracted considerable interest as a general approach to study computer simulators. By fitting a statistical model to the simulator’s output, SEMs provide an efficient way to quickly explore the simulator’s result. In this paper, we develop a SEM for random waypoint mobility, one of the most widely used mobility models employed by network simulators in the evaluation of communication protocols for wireless multi-hop ad hoc networks (MANETs). We chose the random waypoint mobility model as a case study of SEMs due to recent results pointing out some serious drawbacks of the model (e.g., [1]). In particular, these studies show that, under the random waypoint mobility regime, average node speed tends to zero in steady state. They also show that average node speed varies considerably from the expected average value for the time scales under consideration in most simulation analysis. In order to investigate further the behavior of the ...
The process of predicting a satellite observation of a vegetated region (e.g. a MODIS scene) involves running a Radiative Transfer Model (RTM). The RTM takes as input various biospheric and illumination parameters and computes the upwelling radiation at the top of the canopy that is ultimately observed by the satellite mounted sensor. The question we address is the following: which of the inputs to the RTM has the greatest impact on the computed observation? We use the Leaf Canopy Model (LCM) RTM as a surrogate for the RTM used as the basis of the MODIS production algorithm. The LCM was designed to study the feasibility of observing leaf chemistry remotely. It takes as input leaf chemistry variables (chlorophyll, water, lignin, cellulose) and canopy structural parameters (leaf area index, leaf angle distribution, soil reflectance, sun angle). The influence of each input variable, or small subsets of the inputs, is captured through the determination of the “main effects”. Computing t...
We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and ...
We develop a class of nearest neighbor mixture transition distribution process (NNMP) models that... more We develop a class of nearest neighbor mixture transition distribution process (NNMP) models that provides flexibility and scalability for non-Gaussian geostatistical data. We use a directed acyclic graph to define a proper spatial process with finite-dimensional distributions given by finite mixtures. We develop conditions to construct general NNMP models with pre-specified stationary marginal distributions. We also establish lower bounds for the strength of the tail dependence implied by NNMP models, demonstrating the flexibility of the proposed methodology for modeling multivariate dependence through bivariate distribution specification. To implement inference and prediction, we formulate a Bayesian hierarchical model for the data, using the NNMP prior model for the spatial random effects process. From an inferential point of view, the NNMP model lays out a new computational approach to handling large spatial data sets, leveraging the mixture model structure to avoid computationa...
arXiv (Cornell University), Nov 8, 2022
We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors place... more We develop a nonparametric Bayesian modeling approach to ordinal regression based on priors placed directly on the discrete distribution of the ordinal responses. The prior probability models are built from a structured mixture of multinomial distributions. We leverage a continuation-ratio logits representation to formulate the mixture kernel, with mixture weights defined through the logit stick-breaking process that incorporates the covariates through a linear function. The implied regression functions for the response probabilities can be expressed as weighted sums of parametric regression functions, with covariate-dependent weights. Thus, the modeling approach achieves flexible ordinal regression relationships, avoiding linearity or additivity assumptions in the covariate effects. A key model feature is that the parameters for both the mixture kernel and the mixture weights can be associated with a continuation-ratio logits regression structure. Hence, an efficient and relatively easy to implement posterior simulation method can be designed, using Pólya-Gamma data augmentation. Moreover, the model is built from a conditional independence structure for category-specific parameters, which results in additional computational efficiency gains through partial parallel sampling. In addition to the general mixture structure, we study simplified model versions that incorporate covariate dependence only in the mixture kernel parameters or only in the mixture weights. For all proposed models, we discuss approaches to prior specification and develop Markov chain Monte Carlo methods for posterior simulation. The methodology is illustrated with several synthetic and real data examples.
Journal of the American Statistical Association, 2009
The CRASH computer model simulates the effect of a vehicle colliding against different barrier ty... more The CRASH computer model simulates the effect of a vehicle colliding against different barrier types. If it accurately represents real vehicle crashworthiness, the computer model can be of great value in various aspects of vehicle design, such as the setting of timing of air bag releases. The goal of this study is to address the problem of validating the computer model for such design goals, based on utilizing computer model runs and experimental data from real crashes. This task is complicated by the fact that (i) the output of this model consists of smooth functional data, and (ii) certain types of collision have very limited data. We address problem (i) by extending existing Gaussian process-based methodology developed for models that produce real-valued output, and resort to Bayesian hierarchical modeling to attack problem (ii).
Journal of Computational and Graphical Statistics, 2021
Mixture transition distribution time series models build high-order dependence through a weighted... more Mixture transition distribution time series models build high-order dependence through a weighted combination of first-order transition densities for each one of a specified number of lags. We present a framework to construct stationary mixture transition distribution models that extend beyond linear, Gaussian dynamics. We study conditions for first-order strict stationarity which allow for different constructions with either continuous or discrete families for the first-order transition densities given a pre-specified family for the marginal density, and with general forms for the resulting conditional expectations. Inference and prediction are developed under the Bayesian framework with particular emphasis on flexible, structured priors for the mixture weights. Model properties are investigated both analytically and through synthetic data examples. Finally, Poisson and Lomax examples are illustrated through real data applications.
Journal of Computational and Graphical Statistics, 2021
We develop two models for Bayesian estimation and selection in high-order, discrete-state Markov ... more We develop two models for Bayesian estimation and selection in high-order, discrete-state Markov chains. Both are based on the mixture transition distribution, which constructs a transition probability tensor with additive mixing of probabilities from first-order transition matrices. We demonstrate two uses for the proposed models: parsimonious approximation of high-order dynamics by mixing lower-order transition models, and order/lag selection through over-specification and shrinkage via priors for sparse probability vectors. The priors further shrink all models to an identifiable and interpretable parameterization, useful for data analysis. We discuss properties of the models and demonstrate their utility with simulation studies. We further apply the methodology to a data analysis from the high-order Markov chain literature and to a time series of pink salmon abundance in a creek in Alaska, U.S.A.
Statistics and Computing, 2016
Stationary time series models built from parametric distributions are, in general, limited in sco... more Stationary time series models built from parametric distributions are, in general, limited in scope due to the assumptions imposed on the residual distribution and autoregression relationship. We present a modeling approach for univariate time series data, which makes no assumptions of stationarity, and can accommodate complex dynamics and capture nonstandard distributions. The model arises from a Bayesian nonparametric mixture of normals specification for the joint distribution of successive observations in time. This implies a flexible autoregressive form for the conditional transition density, defining a time-homogeneous, nonstationary, Markovian model for real-valued data indexed in discrete-time. To obtain a more computationally tractable algorithm for posterior inference, we utilize a square-root-free Cholesky decomposition of the mixture kernel covariance matrix. Results from simulated data suggest the model is able to recover challenging transition and predictive densities. We also illustrate the model on time intervals between eruptions of the Old Faithful geyser. Extensions to accommodate higher order structure and to develop a state-space model are also discussed.
Statistics and Computing, 2016
Integro-Difference Equations (IDEs) provide a flexible framework for dynamic modeling of spatio-t... more Integro-Difference Equations (IDEs) provide a flexible framework for dynamic modeling of spatio-temporal data. The choice of kernel in an IDE model relates directly to the underlying physical process modeled, and it can affect model fit and predictive accuracy. We introduce Bayesian nonparametric methods to the IDE literature as a means to allow flexibility in modeling the kernel. We propose a mixture of normal distributions for the IDE kernel, built from a spatial Dirichlet process for the mixing distribution, which can model kernels with shapes that change with location. This allows the IDE model to capture non-stationarity with respect to location and to reflect a changing physical process across the domain. We address computational concerns for inference that leverage the use of Hermite polynomials as a basis for the representation of the process and the IDE kernel, and incorporate Hamiltonian Markov chain Monte Carlo steps in the posterior simulation method. An example with synthetic data demonstrates that the model can successfully capture location-dependent dynamics. Moreover, using a data set of ozone pressure, we show that the spatial Dirichlet process mixture model outperforms several alternative models for the IDE kernel, including the state of the art in the IDE literature, that is, a Gaussian kernel with location-dependent parameters.
SSRN Electronic Journal, 2014
We develop a Bayesian nonparametric model to assess the effect of systematic risks on multiple fi... more We develop a Bayesian nonparametric model to assess the effect of systematic risks on multiple financial markets, and apply it to understand the behavior of the S&P500 sector indexes between January 1, 2000 and December 31, 2011. More than prediction, our main goal is to understand the evolution of systematic and idiosyncratic risks in the U.S. economy over this particular time period, leading to novel sectorspecific risk indexes. To accomplish this goal, we model the appearance of extreme losses in each market using a superposition of two Poisson processes, one that corresponds to systematic risks that are shared by all sectors, and one that correspond to the idiosyncratic risk associated with a specific sector. In order to capture changes in the risk structure over time, the intensity functions associated with each of the underlying components are modeled using a Dirichlet process mixture model. Among other interesting results, our analysis of the S&P500 index suggests that there are few idiosyncratic risks associated with the consumer staples sector, whose extreme negative log returns appear to be driven mostly by systematic risks.
Journal of the American Statistical Association, 2009
The CRASH computer model simulates the effect of a vehicle colliding against different barrier ty... more The CRASH computer model simulates the effect of a vehicle colliding against different barrier types. If it accurately represents real vehicle crashworthiness, the computer model can be of great value in various aspects of vehicle design, such as the setting of timing of air bag releases. The goal of this study is to address the problem of validating the computer model for such design goals, based on utilizing computer model runs and experimental data from real crashes. This task is complicated by the fact that (i) the output of this model consists of smooth functional data, and (ii) certain types of collision have very limited data. We address problem (i) by extending existing Gaussian process-based methodology developed for models that produce real-valued output, and resort to Bayesian hierarchical modeling to attack problem (ii).
IEEE Transactions on Geoscience and Remote Sensing, 2008
Research work. Research interests: Bayesian nonparametric modeling and inference; Analysis of computer model experiments; Inference under probability order constraints; Mixture models; Quantile regression; Spatial statistics; Survival analysis. Applications in ecology and engineering.
Annals of the Institute of Statistical Mathematics, 1999
Bayesian Analysis, 2022
We develop a Bayesian nonparametric autoregressive model applied to flexibly estimate general transition densities exhibiting nonlinear lag dependence. Our approach is related to Bayesian density regression using Dirichlet process mixtures, with the Markovian likelihood defined through the conditional distribution obtained from the mixture. This results in a Bayesian nonparametric extension of a mixtures-of-experts model formulation. We address computational challenges to posterior sampling that arise from the Markovian structure in the likelihood. The base model is illustrated with synthetic data from a classical model for population dynamics, as well as a series of waiting times between eruptions of Old Faithful Geyser. We study inferences available through the base model before extending the methodology to include automatic relevance detection among a pre-specified set of lags. Inference for global and local lag selection is explored with additional simulation studies, and the methods are illustrated through analysis of an annual time series of pink salmon abundance in a stream in Alaska. We further explore and compare transition density estimation performance for alternative configurations of the proposed model.
The mean residual life function is a key functional for a survival distribution. It has a practically useful interpretation as the expected remaining lifetime given survival up to a particular time point, and it also characterizes the survival distribution. However, it has received limited attention in terms of inference methods under a probabilistic modeling framework. In this paper, we seek to provide general inference methodology for mean residual life regression. Survival data often include a set of predictor variables for the survival response distribution, and in many cases it is natural to include the covariates as random variables into the modeling. We thus propose a Dirichlet process mixture modeling approach for the joint stochastic mechanism of the covariates and survival responses. This approach implies a flexible model structure for the mean residual life of the conditional response distribution, allowing general shapes for mean residual life as a function of covariates g...
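For reference, the mean residual life function described in this abstract has a standard closed form. The notation is assumed here rather than taken from the abstract: $T$ is a lifetime with finite mean and $S(t) = \Pr(T > t)$ is its survival function.

```latex
m(t) = \mathrm{E}(T - t \mid T > t) = \frac{\int_t^{\infty} S(u)\,\mathrm{d}u}{S(t)}, \qquad S(t) > 0 .
```

As a quick check, an exponential lifetime with rate $\lambda$ has $S(t) = e^{-\lambda t}$, which gives the constant value $m(t) = 1/\lambda$.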
We propose a general nonparametric Bayesian framework for binary regression, which is built from modelling for the joint response-covariate distribution. The observed binary responses are assumed to arise from underlying continuous random variables through discretization, and we model the joint distribution of these latent responses and the covariates using a Dirichlet process mixture of multivariate normals. We show that the kernel of the induced mixture model for the observed data is identifiable upon a restriction on the latent variables. To allow for appropriate dependence structure while facilitating identifiability, we use a square-root-free Cholesky decomposition of the covariance matrix in the normal mixture kernel. In addition to allowing for the necessary restriction, this modelling strategy provides substantial simplifications in implementation of Markov chain Monte Carlo posterior simulation. We illustrate the utility of the modelling approach with two data examples, ...
We propose a prior probability model for two distributions that are ordered according to a stochastic precedence constraint, a weaker restriction than the more commonly utilized stochastic order constraint. The modeling approach is based on structured Dirichlet process mixtures of normal distributions. Full inference for functionals of the stochastic precedence constrained mixture distributions is obtained through a Markov chain Monte Carlo posterior simulation method. A motivating application involves study of the discriminatory ability of continuous diagnostic tests in epidemiologic research. Here, stochastic precedence provides a natural restriction for the distributions of test scores corresponding to the non-infected and infected groups. Inference under the model is illustrated with data from a diagnostic test for Johne's disease in dairy cattle. We also apply the methodology to comparison of survival distributions associated with two distinct conditions, and illustrate with analysis of data on survival time after bone marrow transplantation for treatment of leukemia.
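A small sketch may help show why stochastic precedence is weaker than stochastic order. It uses the usual definitions, which the abstract does not spell out: X stochastically precedes Y when P(X < Y) >= 1/2, and X is stochastically smaller than Y when F_X(t) >= F_Y(t) for every t. The two normal distributions are hypothetical and are not fitted to any data from the paper.

```python
from math import sqrt
from statistics import NormalDist


def precedence_probability(x, y):
    """Return P(X < Y) for independent normal X and Y."""
    diff_sd = sqrt(x.stdev ** 2 + y.stdev ** 2)
    return NormalDist().cdf((y.mean - x.mean) / diff_sd)


def stochastically_ordered(x, y, grid):
    """Check F_X(t) >= F_Y(t) on a finite grid (a numerical check, not a proof)."""
    return all(x.cdf(t) >= y.cdf(t) for t in grid)


# Hypothetical test-score distributions for the two groups.
non_infected = NormalDist(mu=0.0, sigma=1.0)
infected = NormalDist(mu=0.5, sigma=3.0)  # higher mean, much larger spread

prob = precedence_probability(non_infected, infected)

# Grid of t values from -10 to 10 in steps of 0.1.
grid = [t / 10 for t in range(-100, 101)]
ordered = stochastically_ordered(non_infected, infected, grid)
```

Here `prob` exceeds 0.5, so precedence holds, while `ordered` is `False` because the two distribution functions cross in the lower tail. Unequal variances are enough to break stochastic order without breaking precedence.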
Widely used parametric generalized linear models are, unfortunately, a somewhat limited class of specifications. Nonparametric aspects are often introduced to enrich this class, resulting in semiparametric models. Focusing on single or k-sample problems, many classical nonparametric approaches are limited to hypothesis testing. Those that allow estimation are limited to certain functionals of the underlying distributions. Moreover, the associated inference often relies upon asymptotics when nonparametric specifications are often most appealing for smaller sample sizes. Bayesian nonparametric approaches avoid asymptotics but have, to date, been limited in the range of inference. Working with Dirichlet process priors, we overcome the limitations of existing simulation-based model fitting approaches which yield inference that is confined to posterior moments of linear functionals of the population distribution. This article provides a computational approach to obtain the entire posterior distrib...
Statistical Equivalent Models, or SEMs, have recently attracted considerable interest as a general approach to study computer simulators. By fitting a statistical model to the simulator’s output, SEMs provide an efficient way to quickly explore the simulator’s result. In this paper, we develop a SEM for random waypoint mobility, one of the most widely used mobility models employed by network simulators in the evaluation of communication protocols for wireless multi-hop ad hoc networks (MANETs). We chose the random waypoint mobility model as a case study of SEMs due to recent results pointing out some serious drawbacks of the model (e.g., [1]). In particular, these studies show that, under the random waypoint mobility regime, average node speed tends to zero in steady state. They also show that average node speed varies considerably from the expected average value for the time scales under consideration in most simulation analysis. In order to investigate further the behavior of the ...
The process of predicting a satellite observation of a vegetated region (e.g. a MODIS scene) involves running a Radiative Transfer Model (RTM). The RTM takes as input various biospheric and illumination parameters and computes the upwelling radiation at the top of the canopy that is ultimately observed by the satellite-mounted sensor. The question we address is the following: which of the inputs to the RTM has the greatest impact on the computed observation? We use the Leaf Canopy Model (LCM) RTM as a surrogate for the RTM used as the basis of the MODIS production algorithm. The LCM was designed to study the feasibility of observing leaf chemistry remotely. It takes as input leaf chemistry variables (chlorophyll, water, lignin, cellulose) and canopy structural parameters (leaf area index, leaf angle distribution, soil reflectance, sun angle). The influence of each input variable, or small subsets of the inputs, is captured through the determination of the “main effects”. Computing t...
We develop a prior probability model for temporal Poisson process intensities through structured mixtures of Erlang densities with common scale parameter, mixing on the integer shape parameters. The mixture weights are constructed through increments of a cumulative intensity function which is modeled nonparametrically with a gamma process prior. Such model specification provides a novel extension of Erlang mixtures for density estimation to the intensity estimation setting. The prior model structure supports general shapes for the point process intensity function, and it also enables effective handling of the Poisson process likelihood normalizing term resulting in efficient posterior simulation. The Erlang mixture modeling approach is further elaborated to develop an inference method for spatial Poisson processes. The methodology is examined relative to existing Bayesian nonparametric modeling approaches, including empirical comparison with Gaussian process prior based models, and ...
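The Erlang mixture structure described above can be sketched numerically. This is an illustration under stated assumptions, not the paper's implementation. The cumulative intensity `H` is fixed and known here, whereas the paper places a gamma process prior on it. The weight for shape j is assumed to be the increment of `H` over the interval from (j-1)*scale to j*scale. All function names and parameter values are hypothetical.

```python
from math import exp, lgamma, log


def erlang_pdf(t, shape, scale):
    """Erlang density with integer shape and common scale."""
    if t <= 0:
        return 0.0
    log_pdf = (shape - 1) * log(t) - t / scale - shape * log(scale) - lgamma(shape)
    return exp(log_pdf)


def erlang_cdf(t, shape, scale):
    """Erlang CDF: 1 - sum_{k < shape} exp(-x) x^k / k!, with x = t / scale."""
    x = t / scale
    term = exp(-x)
    total = term
    for k in range(1, shape):
        term *= x / k
        total += term
    return 1.0 - total


def mixture_weights(cumulative_intensity, scale, n_components):
    """Weights as increments of the cumulative intensity (assumed partition)."""
    return [
        cumulative_intensity(j * scale) - cumulative_intensity((j - 1) * scale)
        for j in range(1, n_components + 1)
    ]


def intensity(t, weights, scale):
    """Mixture intensity: weighted sum of Erlang densities."""
    return sum(w * erlang_pdf(t, j, scale) for j, w in enumerate(weights, start=1))


def integrated_intensity(upper, weights, scale):
    """Integral of the intensity over [0, upper], available in closed form."""
    return sum(w * erlang_cdf(upper, j, scale) for j, w in enumerate(weights, start=1))


def H(t):
    """Hypothetical cumulative intensity of a constant-rate process (rate 2)."""
    return 2.0 * t


scale = 0.5
weights = mixture_weights(H, scale, 40)

rate_at_5 = intensity(5.0, weights, scale)                # close to 2.0
mass_to_5 = integrated_intensity(5.0, weights, scale)     # close to H(5) = 10.0
```

Because each Erlang component integrates in closed form, the normalizing term of the Poisson process likelihood reduces to a weighted sum of Erlang CDF values. This is one way to read the abstract's remark about effective handling of that term.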