DPpackage: Bayesian Non- and Semi-parametric Modelling in R

DPpackage: Bayesian Semi- and Nonparametric Modeling in R

Journal of Statistical Software, 2011

Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
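
The precision parameter of the Dirichlet process prior mentioned in this abstract controls how many distinct clusters the prior expects among n observations. The following is a minimal Python sketch of one common way to reason about that parameter, based on the standard identity E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1). The function names `expected_clusters` and `elicit_alpha` are hypothetical and are not DPpackage's interface; the bisection bounds are assumptions chosen for illustration.

```python
def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of distinct clusters among n draws from a DP(alpha, G0)."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def elicit_alpha(target_k: float, n: int, lo: float = 1e-6, hi: float = 1e4) -> float:
    """Find alpha such that expected_clusters(alpha, n) matches target_k.

    Uses bisection, which works because the expected number of clusters
    increases monotonically in alpha. The bounds lo and hi are illustrative
    assumptions; target_k must be reachable within them.
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_clusters(mid, n) < target_k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `elicit_alpha(5, 100)` returns the value of alpha under which a sample of 100 observations is expected, a priori, to fall into about five clusters.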

Parametric and nonparametric Bayesian model specification: A case study involving models for count data

Computational Statistics & Data Analysis, 2008

In this paper we present the results of a simulation study to explore the ability of Bayesian parametric and nonparametric models to provide an adequate fit to count data, of the type that would routinely be analyzed parametrically either through fixed-effects or random-effects Poisson models. The context of the study is a randomized controlled trial with two groups (treatment and control). Our nonparametric approach utilizes several modeling formulations based on Dirichlet process priors. We find that the nonparametric models are able to flexibly adapt to the data, to offer rich posterior inference, and to provide, in a variety of settings, more accurate predictive inference than parametric models.

More Nonparametric Bayesian Models for Biostatistics

2009

In this companion chapter to Dunson (2009) we discuss and extend some of the models and inference approaches introduced there. We elaborate on the discussion of random partition priors implied by the Dirichlet process (DP). We review some additional variations of dependent DP (DDP) models and we review in more detail the PT prior used briefly in Dunson (2009). Finally, we review variations of DP models for data formats beyond continuous responses.
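
The random partition implied by the DP can be simulated with the Chinese restaurant process: observation i + 1 joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to alpha. The sketch below is a generic illustration of that scheme, not code from the chapter; the function name `crp_partition` is hypothetical.

```python
import random


def crp_partition(n: int, alpha: float, rng: random.Random):
    """Draw a random partition of n items from the Chinese restaurant process.

    Returns (labels, sizes): labels[i] is the cluster index of item i and
    sizes[k] is the number of items in cluster k.
    """
    labels, sizes = [], []
    for i in range(n):
        # Total mass is alpha (new cluster) plus i (items already seated).
        u = rng.random() * (alpha + i)
        acc = 0.0
        chosen = len(sizes)  # default: open a new cluster
        for k, size in enumerate(sizes):
            acc += size
            if u < acc:
                chosen = k
                break
        if chosen == len(sizes):
            sizes.append(1)
        else:
            sizes[chosen] += 1
        labels.append(chosen)
    return labels, sizes
```

Larger values of alpha tend to produce more, smaller clusters; small values concentrate most items in a few large clusters.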

The R Package MitISEM: Efficient and Robust Simulation Procedures for Bayesian Inference

SSRN Electronic Journal, 2000

This paper presents the R package MitISEM (mixture of t by importance sampling weighted expectation maximization), which provides an automatic and flexible two-stage method to approximate a non-elliptical target density kernel (typically a posterior density kernel) using an adaptive mixture of Student t densities as the approximating density. In the first stage a mixture of Student t densities is fitted to the target using an expectation maximization algorithm where each step of the optimization procedure is weighted using importance sampling. In the second stage this mixture density is a candidate density for efficient and robust application of importance sampling or the Metropolis-Hastings (MH) method to estimate properties of the target distribution. The package enables Bayesian inference and prediction on model parameters and probabilities, in particular for models where densities have multi-modal or other non-elliptical shapes like curved ridges. These shapes occur in research topics in several scientific fields, for instance the analysis of DNA data in bioinformatics, the obtaining of loans in the banking sector by heterogeneous groups in financial economics, and the analysis of education's effect on earned income in labor economics. The package MitISEM also provides an extended algorithm, 'sequential MitISEM', which substantially decreases computation time when the target density has to be approximated for increasing data samples. This occurs when the posterior or predictive density is updated with new observations and/or when one computes model probabilities using predictive likelihoods. We illustrate the MitISEM algorithm using three canonical statistical and econometric models that are characterized by several types of non-elliptical posterior shapes and that describe well-known data patterns in econometrics and finance. We show that MH using the candidate density obtained by MitISEM outperforms, in terms of numerical efficiency, MH using a simpler candidate, as well as the Gibbs sampler. The MitISEM approach is also used for Bayesian model comparison using predictive likelihoods.
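
The second stage described in this abstract, importance sampling with a Student t mixture as candidate, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the bimodal target kernel is a toy example, and the mixture components in `COMPONENTS` are chosen by hand rather than fitted by the importance-sampling-weighted EM step that MitISEM performs. All names are hypothetical and unrelated to the package's R interface.

```python
import math
import random

# Hand-picked candidate (assumption): (weight, location, scale, degrees of freedom).
COMPONENTS = [(0.5, -3.0, 1.5, 5.0), (0.5, 3.0, 1.5, 5.0)]


def target_kernel(x: float) -> float:
    """Toy unnormalized bimodal target: equal mixture of N(-3, 1) and N(3, 1)."""
    return math.exp(-0.5 * (x - 3.0) ** 2) + math.exp(-0.5 * (x + 3.0) ** 2)


def t_pdf(x: float, loc: float, scale: float, df: float) -> float:
    """Density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(z * z / df)) / scale


def t_draw(rng: random.Random, loc: float, scale: float, df: float) -> float:
    """Sample a Student t variate as normal / sqrt(chi-square / df)."""
    chi2 = rng.gammavariate(df / 2, 2.0)  # chi-square with df degrees of freedom
    return loc + scale * rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)


def mixture_pdf(x: float, comps) -> float:
    return sum(w * t_pdf(x, loc, scale, df) for w, loc, scale, df in comps)


def mixture_draw(rng: random.Random, comps) -> float:
    u, acc = rng.random(), 0.0
    for w, loc, scale, df in comps:
        acc += w
        if u < acc:
            return t_draw(rng, loc, scale, df)
    return t_draw(rng, *comps[-1][1:])  # guard against rounding in the weights


def importance_estimate(h, n: int, rng: random.Random, comps=COMPONENTS) -> float:
    """Self-normalized importance sampling estimate of E[h(X)] under the target."""
    num = den = 0.0
    for _ in range(n):
        x = mixture_draw(rng, comps)
        w = target_kernel(x) / mixture_pdf(x, comps)
        num += w * h(x)
        den += w
    return num / den
```

Because the t components have heavier tails than the normal modes of the toy target, the importance weights stay bounded, which is the property that makes such a candidate attractive.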

Bayesian Statistics from Methods to Models and Applications

Springer Proceedings in Mathematics & Statistics, 2015

Bayesian nonparametric marginal methods are very popular since they lead to fairly easy implementation due to the formal marginalization of the infinite-dimensional parameter of the model. However, the straightforwardness of these methods also entails some limitations: they typically yield point estimates in the form of posterior expectations, but cannot be used to estimate non-linear functionals of the posterior distribution, such as the median, mode or credible intervals. This is particularly relevant in survival analysis, where non-linear functionals such as the median survival time play a central role for clinicians and practitioners. The main goal of this paper is to summarize the methodology introduced in Arbel et al. (2015) for hazard mixture models in order to draw approximate Bayesian inference on survival functions that is not limited to the posterior mean. In addition, we propose a practical implementation of an R package called momentify designed for moment-based density approximation, and, by means of an extensive simulation study, we thoroughly compare the introduced methodology with standard marginal methods and empirical estimation.

Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data

Biometrics, 2010

We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a non-zero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a post-processing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
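
The claim that a DP prior implies a non-zero mean for the random effect distribution, even with a mean-zero base measure, can be checked numerically. The sketch below draws realizations G from a DP with a standard normal base measure via truncated stick-breaking and computes the mean of each realization. It is an illustration only, not the authors' post-processing adjustment; the truncation level and the function names are assumptions.

```python
import random


def stick_breaking_draw(alpha: float, n_atoms: int, rng: random.Random):
    """Truncated stick-breaking draw from DP(alpha, N(0, 1)).

    Returns (weights, atoms); the weights sum to slightly less than one
    because the stick is truncated after n_atoms pieces.
    """
    weights, atoms, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        atoms.append(rng.gauss(0.0, 1.0))  # atom from the mean-zero base measure
        remaining *= 1.0 - v
    return weights, atoms


def realized_mean(weights, atoms) -> float:
    """Mean of one realized (renormalized) random distribution G."""
    return sum(w * a for w, a in zip(weights, atoms)) / sum(weights)


def realized_means(alpha: float, reps: int, n_atoms: int, rng: random.Random):
    """Means of reps independent realizations of G."""
    return [realized_mean(*stick_breaking_draw(alpha, n_atoms, rng)) for _ in range(reps)]
```

Each realized mean differs from zero; across realizations they scatter around zero with variance 1 / (alpha + 1) for a unit-variance base measure, so the effect is most pronounced for small alpha.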

A nonparametric Bayesian model for inference in related longitudinal studies

Journal of the Royal Statistical Society: Series C (Applied Statistics), 2005

We discuss a method for combining different but related longitudinal studies to improve predictive precision. The motivation is to borrow strength across clinical studies in which the same measurements are collected at different frequencies. Key features of the data are heterogeneous populations and an unbalanced design across three studies of interest. The first two studies are phase I studies with very detailed observations on a relatively small number of patients. The third study is a large phase III study with over 1500 enrolled patients, but with relatively few measurements on each patient. Patients receive different doses of several drugs in the studies, with the phase III study containing significantly less toxic treatments. Thus, the main challenges for the analysis are to accommodate heterogeneous population distributions and to formalize borrowing strength across the studies and across the various treatment levels. We describe a hierarchical extension over suitable semiparametric longitudinal data models to achieve the inferential goal. A nonparametric random-effects model accommodates the heterogeneity of the population of patients. A hierarchical extension allows borrowing strength across different studies and different levels of treatment by introducing dependence across these nonparametric random-effects distributions. Dependence is introduced by building an analysis of variance (ANOVA) like structure over the random-effects distributions for different studies and treatment combinations. Model structure and parameter interpretation are similar to standard ANOVA models. Instead of the unknown normal means as in standard ANOVA models, however, the basic objects of inference are random distributions, namely the unknown population distributions under each study. The analysis is based on a mixture of Dirichlet processes model as the underlying semiparametric model.

Efficient Bayesian Modeling of Binary and Categorical Data in R: The UPG Package

arXiv (Cornell University), 2021

We introduce the UPG package for highly efficient Bayesian inference in probit, logit, multinomial logit and binomial logit models. UPG offers a convenient estimation framework for balanced and imbalanced data settings where sampling efficiency is ensured through Markov chain Monte Carlo boosting methods. All sampling algorithms are implemented in C++, allowing for rapid parameter estimation. In addition, UPG provides several methods for fast production of output tables and summary plots that are easily accessible to a broad range of users.

A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models

2002

Widely used parametric generalized linear models are, unfortunately, a somewhat limited class of specifications. Nonparametric aspects are often introduced to enrich this class, resulting in semiparametric models. Focusing on single or k-sample problems, many classical nonparametric approaches are limited to hypothesis testing. Those that allow estimation are limited to certain functionals of the underlying distributions. Moreover, the associated inference often relies upon asymptotics when nonparametric specifications are often most appealing for smaller sample sizes. Bayesian nonparametric approaches avoid asymptotics but have, to date, been limited in the range of inference. Working with Dirichlet process priors, we overcome the limitations of existing simulation-based model fitting approaches which yield inference that is confined to posterior moments of linear functionals of the population distribution. This article provides a computational approach to obtain the entire posterior distribution for more general functionals. We illustrate with three applications: investigation of extreme value distributions associated with a single population, comparison of medians in a k-sample problem, and comparison of survival times from different populations under fairly heavy censoring.
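
To make the idea of a full posterior distribution for a non-linear functional concrete, here is a simplified Python sketch for the median. It assumes a plain DP prior placed directly on the data distribution, not the DP mixture models treated in the article, and it uses truncated stick-breaking on the conjugate posterior DP(alpha + n, (alpha G0 + sum of point masses at the data) / (alpha + n)). The function names, the truncation level, and the base measure passed as `base_draw` are all assumptions for illustration.

```python
import random


def posterior_dp_draw(data, alpha, base_draw, n_atoms, rng):
    """One truncated stick-breaking draw from the posterior DP given data."""
    n = len(data)
    weights, atoms, remaining = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha + n)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        # Posterior base measure: G0 with probability alpha / (alpha + n),
        # otherwise one of the observed values chosen uniformly.
        if rng.random() < alpha / (alpha + n):
            atoms.append(base_draw(rng))
        else:
            atoms.append(rng.choice(data))
    return weights, atoms


def weighted_median(weights, atoms):
    """Median of a discrete distribution; weights need not sum to one."""
    half = 0.5 * sum(weights)
    acc = 0.0
    for atom, w in sorted(zip(atoms, weights)):
        acc += w
        if acc >= half:
            return atom
    return max(atoms)  # reached only through floating-point rounding


def posterior_median_draws(data, alpha, base_draw, n_draws, n_atoms, rng):
    """Sample the posterior distribution of the median functional."""
    return [
        weighted_median(*posterior_dp_draw(data, alpha, base_draw, n_atoms, rng))
        for _ in range(n_draws)
    ]
```

The returned draws can be summarized by quantiles to give a credible interval for the median, or differenced across independent samples to compare medians between groups.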