Journal of Statistical Software

DPpackage: Bayesian Non- and Semi-parametric Modelling in R

Alejandro Jara
University of Minnesota
Fernando A. Quintana
Pontificia Universidad Católica de Chile

Peter Müller Gary L. Rosner
University of Texas M.D. Anderson Cancer Center

Abstract

Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian non- and semi-parametric models in R, DPpackage. Currently DPpackage includes models for marginal and conditional density estimation, ROC curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison, and for eliciting the precision parameter of the Dirichlet process prior. To maximize computational efficiency, the actual sampling for each model is carried out using compiled FORTRAN.

Keywords: Bayesian semiparametric analysis, Random probability measures, Random functions, Markov chain Monte Carlo, R.

1. Introduction

In many practical situations, a parametric model cannot be expected to coherently describe the chance mechanism generating an observed dataset. Unrealistic features of some common models (e.g., the thin tails of the normal distribution when compared to the distribution of the observed data) can lead to unsatisfactory inferences. Constraining the analysis to a specific parametric form may limit the scope and type of inferences that can be drawn from such models. In these situations, we would like to relax parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of a parametric statistical model. In the Bayesian context such flexible inference is typically achieved by placing a prior distribution on infinite-dimensional spaces, such as the space of all probability distributions for a random variable of interest. These models are usually referred to as Bayesian nonparametric (BNP) or semiparametric (BSP) models depending on whether all or at least one of the parameters is infinite dimensional (see, e.g., Dey, Müller, and Sinha, 1998; Walker, Damien, Laud, and Smith, 1999; Ghosh and Ramamoorthi, 2003; Müller and Quintana, 2004; Hanson, Branscum, and Johnson, 2005).

BNP is a relatively young research area in statistics. First advances were made in the sixties and seventies, and were primarily mathematical formulations. It was only in the early nineties, with the advent of sampling-based methods, in particular Markov chain Monte Carlo (MCMC) methods, that substantial progress was made. Posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. The introduction of MCMC methods in the area began with the work of Escobar (1994) for Dirichlet process mixtures. A number of themes are still undergoing development, including issues in theory, methodology and applications. We refer to Walker et al. (1999), Müller and Quintana (2004) and Hanson et al. (2005) for recent overviews.

While BNP and BSP are extremely powerful and have a wide range of applicability, they are not as widely used as one might expect. One reason for this has been the gap between the type of software that many applied users would like to have for fitting models and the software that is currently available. The most general programs currently available for Bayesian inference are BUGS (see, e.g., Gilks, Thomas, and Spiegelhalter, 1994) and OpenBUGS (Thomas, O’Hara, Ligges, and Sibylle, 2006). BUGS can be accessed from the publicly available R program (R Development Core Team, 2009), using the R2WinBUGS package (Sturtz, Ligges, and Gelman, 2005). OpenBUGS can run on Windows and Linux, as well as from inside R. In addition, various R packages exist that directly fit particular Bayesian models. We refer to Appendix C in Carlin and Louis (2008) for an extensive list of software for Bayesian modeling. Although the number of fully Bayesian programs continues to burgeon, with many available at little or no cost, they generally do not include semiparametric models. An exception to this rule is the R package bayesm (Rossi, Allenby, and McCulloch, 2005; Rossi and McCulloch, 2008), which includes functions for some models based on Dirichlet process priors (Ferguson, 1973). The range of different Bayesian semiparametric models is huge. It is practically impossible to build flexible and efficient software for the generality of such models.
In this paper we present an up-to-date introduction to a publicly available R (R Development Core Team, 2009) package designed to help bridge the previously mentioned gap, the DPpackage, originally presented in Jara (2007). Although the name of the package is due to the most widely used prior on the space of the probability distributions, the Dirichlet Process (DP) (Ferguson, 1973), the package includes many other priors on function spaces. Currently, DPpackage includes models considering DP (Ferguson, 1973), mixtures of DP (MDP) (Antoniak, 1974), DP mixtures (DPM) (Lo, 1984; Escobar and West, 1995), linear dependent DP (LDDP) (De Iorio, Müller, Rosner, and MacEachern, 2004; De Iorio, Johnson, Müller, and Rosner, 2009), weight dependent DP (WDDP) (Müller, Erkanli, and West, 1996), hierarchical mixture of DPM of normals (HDPM) (Müller, Quintana, and Rosner, 2004), centrally standardized DP (CSDP) (Newton, Czado, and Chappell, 1996), Polya Trees (PT) (Ferguson, 1974; Mauldin, Sudderth, and Williams, 1992; Lavine, 1992, 1994), mixtures of Polya trees (MPT) (Lavine, 1992, 1994; Hanson and Johnson, 2002; Hanson, 2006; Christensen, Hanson, and Jara, 2008), mixtures of triangular distributions (Perron and Mengersen, 2001), and random Bernstein polynomials (Petrone, 1999a,b; Petrone and Wasserman, 2002). The package also includes models considering Penalized B-Splines (Lang and Brezger, 2004).

The article is organized as follows. Section 2 reviews the general syntax and design philosophy. Although the material in this section was presented in Jara (2007), its inclusion here is necessary in order to make the paper self-contained. In Section 3 the available functions are described in detail. In Section 4 the main features and usages of DPpackage are illustrated by means of simulated and real life data analyses. We conclude with additional comments and discussion in Section 5.

2. Design philosophy and general syntax

The design philosophy behind DPpackage is quite different from the one of a general purpose language. The most important design goal has been the implementation of model-specific MCMC algorithms. A direct benefit of this approach is that the sampling algorithms can be made dramatically more efficient than in a generic environment.
Fitting a model in DPpackage begins with a call to an R function, for instance, DPmodel or PTmodel. Here “model” denotes a descriptive name for the model being fitted. Typically, the model function will take a number of arguments that control the specific MCMC sampling strategy adopted. In addition, the model formula(s), data, and prior parameters are passed to the model function as arguments. The common elements in any model function are:
i) prior: an object list which includes the values of the prior hyper-parameters.
ii) mcmc: an object list which must include the integers nburn giving the number of burn-in scans, nskip giving the thinning interval, nsave giving the total number of scans to be saved, and ndisplay giving the number of saved scans to be displayed on screen: the function reports on the screen when every ndisplay iterations have been carried out and returns the process’s runtime in seconds. For some specific models, one or more tuning parameters for Metropolis steps may be needed and must be included in this list. The names of these tuning parameters are explained in each specific model description in the associated help files.
iii) state: an object list giving the current value of the parameters, when the analysis is the continuation of a previous analysis, or giving the starting values for a new Markov chain, which is useful to run multiple chains starting from different points.
iv) status: a logical variable indicating whether it is a new run (TRUE) or the continuation of a previous analysis (FALSE). In the latter case the current value of the parameters must be specified in the object state.
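The common arguments listed above can be sketched as plain R objects. The element names of the mcmc list come from the text; the prior element names (a0, b0) and all numeric values are illustrative assumptions, since each model function documents its own hyper-parameter names in its help file. No DPpackage function is called here.

```r
# MCMC control list: element names as described in the text,
# numeric values chosen only for illustration.
mcmc <- list(nburn = 1000,    # burn-in scans
             nskip = 10,      # thinning interval
             nsave = 5000,    # scans to be saved
             ndisplay = 500)  # progress is reported every ndisplay scans

# Prior list: the names a0 and b0 are hypothetical placeholders.
prior <- list(a0 = 1, b0 = 1)

# New chain: no previous state is supplied and status is TRUE.
state <- NULL
status <- TRUE

# Basic sanity check on the control list before calling a model function.
required <- c("nburn", "nskip", "nsave", "ndisplay")
missing.names <- setdiff(required, names(mcmc))
if (length(missing.names) > 0) {
  stop("mcmc list lacks: ", paste(missing.names, collapse = ", "))
}
```

To continue a previous run, one would instead pass the state element of the earlier fit together with status = FALSE, as described in items iii) and iv).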

Inside the R model function the inputs are organized in a more useable form, the MCMC sampling is performed by calling a shared library written in a compiled language, and the posterior sample is summarized, labeled, assigned into an output list, and returned. The output list includes:
i) state: a list of objects containing the current value of the parameters.
ii) save.state: a list of objects containing the MCMC samples for the parameters. This list contains two matrices randsave and thetasave which contain the MCMC samples of the variables with random distribution (errors, random effects, etc.) and the parametric part of the model, respectively.

In order to exemplify the extraction of the output elements, consider the abstract model fit:

fit <- DPmodel(…, prior, mcmc, state, status, …)

The lists can be extracted using the following code:

fit$state
fit$save.state$randsave
fit$save.state$thetasave
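
As a minimal sketch of how these elements might be used, the following builds a mock object with the same structure and summarizes the saved draws. The object, its column names, and the simulated values are assumptions made for illustration; a real fit would come from a model function.

```r
# Mock object with the structure described in the text; the column
# names and the simulated draws are illustrative assumptions.
set.seed(1)
fit <- list(
  state = list(),
  save.state = list(
    thetasave = cbind(beta = rnorm(500, mean = 2, sd = 0.1),
                      sigma2 = rgamma(500, shape = 50, rate = 50)),
    randsave = matrix(rnorm(500 * 3), nrow = 500, ncol = 3)
  )
)

# One row per saved scan, one column per quantity.
theta <- fit$save.state$thetasave
post.mean <- colMeans(theta)
post.sd <- apply(theta, 2, sd)
round(cbind(mean = post.mean, sd = post.sd), 3)
```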

Based on these output objects, it is possible to use, for instance, the boa (Smith, 2007) or the coda (Plummer, Best, Cowles, and Vines, 2006) R packages to perform convergence diagnostics. For illustration, we consider the coda package here. It requires a matrix of posterior draws for relevant parameters to be saved as an mcmc object. Assume that we have obtained fit1, fit2, and fit3 by independently running a model function three times, each time with different starting values. To compute the Gelman-Rubin convergence diagnostic statistic for the first parameter stored in the thetasave object, the following commands may be used:

library(coda)
coda.obj <- mcmc.list(
    chain1=mcmc(fit1$save.state$thetasave[,1]),
    chain2=mcmc(fit2$save.state$thetasave[,1]),
    chain3=mcmc(fit3$save.state$thetasave[,1]))
gelman.diag(coda.obj, transform = TRUE)

Note that the second command saves the results as an object of class mcmc.list, and the third command computes the Gelman-Rubin statistic from these three chains.
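
To make the idea behind the diagnostic concrete, the sketch below computes the basic potential scale reduction factor by hand in base R. It is the textbook form of the statistic and is not meant to reproduce the output of gelman.diag, which applies further corrections. The simulated chains stand in for the draws of fit1, fit2, and fit3 and are assumptions for illustration.

```r
# Basic potential scale reduction factor for chains of equal length n.
psrf <- function(chains) {
  # chains: numeric matrix, one column per chain
  n <- nrow(chains)
  chain.means <- colMeans(chains)
  B <- n * var(chain.means)            # between-chain variance
  W <- mean(apply(chains, 2, var))     # within-chain variance
  var.hat <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var.hat / W)
}

set.seed(42)
# Three chains exploring the same distribution.
mixed <- cbind(rnorm(2000), rnorm(2000), rnorm(2000))
# Three chains stuck in different regions.
stuck <- cbind(rnorm(2000, 0), rnorm(2000, 3), rnorm(2000, -3))

psrf(mixed)   # close to 1
psrf(stuck)   # well above 1
```

Values near 1 are compatible with the chains having reached a common distribution; values well above 1 indicate that the chains disagree.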
Generic R functions such as print, plot, summary, and anova have methods to display the results of the DPpackage model fit. The function print displays the posterior means of the parameters in the model, and summary displays posterior summary statistics (mean, median, standard deviation, naive standard errors, and credibility intervals). By default, the function summary computes the 95% HPD intervals using the Monte Carlo method proposed by Chen and Shao (1999). The user can display the order statistic estimator of the 95% credible interval by using the following code,
summary(fit, hpd=FALSE)
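
The difference between the two kinds of interval can be sketched in base R. The HPD function below takes the shortest interval covering the requested fraction of the sorted draws, in the spirit of Chen and Shao (1999); it is an illustrative implementation, not the package's code, and the skewed gamma draws are an assumed stand-in for a posterior sample.

```r
# Empirical HPD interval: shortest interval spanning a fraction `prob`
# of the sorted draws.
hpd.interval <- function(x, prob = 0.95) {
  x <- sort(x)
  n <- length(x)
  m <- max(1, min(n - 1, round(prob * n)))  # number of order-statistic gaps spanned
  widths <- x[(m + 1):n] - x[1:(n - m)]
  j <- which.min(widths)
  c(lower = x[j], upper = x[j + m])
}

# Order-statistic (equal-tail) interval.
et.interval <- function(x, prob = 0.95) {
  a <- (1 - prob) / 2
  quantile(x, probs = c(a, 1 - a), names = FALSE)
}

set.seed(7)
draws <- rgamma(10000, shape = 2, rate = 1)  # skewed "posterior"
hpd <- hpd.interval(draws)
et <- et.interval(draws)
```

For a skewed sample such as this one, the HPD interval is shorter than the equal-tail interval and is shifted towards the mode.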
The plot function displays the trace plots and a kernel-based estimate of the posterior distribution for the parameters of the model. Similarly to summary, the plot function displays the 95% HPD regions in the density plot and the posterior mean. The same plot, but considering the 95% credible region, can be obtained by using,
plot(fit, hpd=FALSE)
The anova function computes simultaneous credible regions for a vector of parameters from the MCMC sample using the method described by Besag, Green, Higdon, and Mengersen (1995). The output of the anova function is an anova-like table containing the pseudo-contour probabilities for each of the factors included in the linear part of the model.

3. Implemented Models

In this section we describe in detail the functions available in version 1.0-8 of DPpackage.

3.1. Marginal density estimation

The DPdensity, PTdensity, TDPdensity, and BDPdensity functions implement models for marginal density estimation using DPM of normals, MPT, triangular-Dirichlet, and Bernstein-Dirichlet priors, respectively. The first two functions allow the user to fit uni- and multi-variate models. We next introduce the notation used for each model along with the associated computational approaches used to fit the models.

Dirichlet Process Mixtures of Normals

The DPdensity function considers the multivariate extension of the univariate DPM of normals model presented in Escobar and West (1995). Let $\boldsymbol{y}_{i}$ be a $k$-dimensional vector of measurements for the $i$th subject, $i=1, \ldots, n$. The model assumes

$$\boldsymbol{y}_{i} \mid G \stackrel{iid}{\sim} \int N_{k}\left(\boldsymbol{y}_{i} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}\right) dG(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

and

$$G \mid \alpha, G_{0} \sim DP\left(\alpha G_{0}\right)$$

where the baseline distribution, $G_{0}$, corresponds to the conjugate normal-inverted-Wishart distribution

$$G_{0} \equiv N_{k}\left(\boldsymbol{\mu} \mid \boldsymbol{m}_{1}, \kappa_{0}^{-1} \boldsymbol{\Sigma}\right) IW_{k}\left(\boldsymbol{\Sigma} \mid \nu_{1}, \boldsymbol{\Psi}_{1}\right)$$

To complete the model specification, the following independent hyper-priors are assumed,

$$\begin{gathered} \alpha \sim \Gamma\left(a_{0}, b_{0}\right) \\ \boldsymbol{m}_{1} \mid \boldsymbol{m}_{2}, \boldsymbol{S}_{2} \sim N_{k}\left(\boldsymbol{m}_{2}, \boldsymbol{S}_{2}\right) \\ \kappa_{0} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right) \end{gathered}$$

and

$$\boldsymbol{\Psi}_{1} \mid \nu_{2}, \boldsymbol{\Psi}_{2} \sim IW_{k}\left(\nu_{2}, \boldsymbol{\Psi}_{2}\right)$$

Note that the inverted-Wishart prior, $\boldsymbol{W} \mid \nu, \boldsymbol{\Psi} \sim IW_{k}(\nu, \boldsymbol{\Psi})$, is parameterized such that $E(\boldsymbol{W})=\frac{1}{\nu-k-1} \boldsymbol{\Psi}^{-1}$.
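
The stated mean can be checked numerically. The sketch below assumes one reading that is consistent with it, namely that $\boldsymbol{W}^{-1}$ follows a Wishart distribution with $\nu$ degrees of freedom and scale matrix $\boldsymbol{\Psi}$; the values of $k$, $\nu$, and $\boldsymbol{\Psi}$ are arbitrary illustrative choices.

```r
# Monte Carlo check: if W^{-1} ~ Wishart(nu, Psi), then
# E(W) = Psi^{-1} / (nu - k - 1).
set.seed(123)
k <- 2
nu <- 10
Psi <- matrix(c(2, 0.5, 0.5, 1), k, k)

n.draws <- 20000
# rWishart returns a k x k x n.draws array of Wishart(nu, Psi) draws.
X <- rWishart(n.draws, df = nu, Sigma = Psi)
W <- apply(X, 3, solve)              # each column holds vec(W), W = X^{-1}
mc.mean <- matrix(rowMeans(W), k, k)
exact.mean <- solve(Psi) / (nu - k - 1)

round(mc.mean, 3)
round(exact.mean, 3)
```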
The computational implementation is based on the marginalized version of the model where the random probability measure $G$ is integrated out. Although the baseline distribution, $G_{0}$, is a conjugate prior in this model specification, the algorithms with auxiliary parameters described in MacEachern and Müller (1998) and Neal (2000) are adopted. Specifically, the no-gaps algorithm of MacEachern and Müller (1998) and algorithm 8 of Neal (2000), with $m=1$, are considered. The default method is algorithm 8 of Neal (2000).
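
Integrating out $G$ leaves a Polya urn scheme for the cluster memberships, which is the structure that marginal samplers work with. The sketch below simulates that urn under the prior only; it does not condition on data and is not an implementation of the no-gaps algorithm or of algorithm 8. The function name and the values of n and alpha are assumptions for illustration.

```r
# Polya urn draw of cluster labels for n subjects under DP(alpha * G0),
# with G integrated out. Subject i joins existing cluster j with
# probability n_j / (alpha + i - 1) and opens a new cluster with
# probability alpha / (alpha + i - 1).
polya.urn <- function(n, alpha) {
  labels <- integer(n)
  sizes <- integer(0)
  for (i in seq_len(n)) {
    probs <- c(sizes, alpha) / (alpha + i - 1)
    pick <- sample.int(length(probs), size = 1, prob = probs)
    if (pick > length(sizes)) {
      sizes <- c(sizes, 1L)              # open a new cluster
    } else {
      sizes[pick] <- sizes[pick] + 1L    # join an existing cluster
    }
    labels[i] <- pick
  }
  labels
}

set.seed(2)
lab <- polya.urn(n = 200, alpha = 1)
table(lab)   # a few large clusters and several small ones
```

Larger values of alpha produce more clusters on average, which is why the precision parameter is given a prior rather than fixed in the model above.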

Mixtures of Polya trees

The current implementation of the PTdensity function considers an MPT model as in Hanson (2006). As in the previous section, let $\boldsymbol{y}_{i}$ be a $k$-dimensional vector of measurements for the $i$th subject, $i=1, \ldots, n$. The model assumes

$$\boldsymbol{y}_{i} \mid G \stackrel{iid}{\sim} G$$

and

$$G \mid \alpha, \boldsymbol{\mu}, \boldsymbol{\Sigma}, M \sim PT^{M}\left(\Pi^{\boldsymbol{\mu}, \boldsymbol{\Sigma}}, \mathcal{A}^{\alpha}\right)$$

where $M$ is the maximum level of the partition to be updated (the default value is $M=\infty$), $\Pi^{\boldsymbol{\mu}, \boldsymbol{\Sigma}}=\left\{\pi_{j}\right\}_{j \geq 0}$ is a set of partitions of $\mathbb{R}^{k}$, indexed by $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, and $\mathcal{A}^{\alpha}$ is a family of non-negative vectors controlling the variability of the process indexed by $\alpha$. Following Hanson (2006), the PT is centered around the $N_{k}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distribution by taking

$$\mathcal{A}^{\alpha}=\left\{\boldsymbol{\gamma}^{\alpha}(j, \boldsymbol{r}): \boldsymbol{r} \in\left\{1, \ldots, 2^{j-1}\right\}^{k}, j=1, \ldots\right\}$$

with $\boldsymbol{\gamma}^{\alpha}(j, \boldsymbol{r})=\alpha j^{2} \mathbf{1}_{k}$, and further taking each level $j$ of the sequence of partitions in $\Pi^{\boldsymbol{\mu}, \boldsymbol{\Sigma}}$ as the sets arising from a location-scale transformation $\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{z}$ of the Cartesian products of intervals obtained as quantiles from the standard univariate normal distribution, where $\boldsymbol{\Sigma}^{1 / 2}$ is the Cholesky decomposition of $\boldsymbol{\Sigma}$. Notice that we consider a different parameterization than the one considered by Hanson (2006), where $\boldsymbol{\Sigma}^{1 / 2}$ is taken to be the unique symmetric square root of $\boldsymbol{\Sigma}$. The base sets for level $j$ are given by

$$B_{0}(j, \boldsymbol{p})=\left(\Phi^{-1}\left(\frac{p_{1}-1}{2^{j}}\right), \Phi^{-1}\left(\frac{p_{1}}{2^{j}}\right)\right] \times \cdots \times\left(\Phi^{-1}\left(\frac{p_{k}-1}{2^{j}}\right), \Phi^{-1}\left(\frac{p_{k}}{2^{j}}\right)\right]$$

for vectors $\boldsymbol{p}=\left(p_{1}, \ldots, p_{k}\right)$ with $p_{i} \in\left\{1, \ldots, 2^{j}\right\}$, $i=1, \ldots, k$. The location-scale transformation applied to each base set yields the final sets $B(j, \boldsymbol{p})=\left\{\boldsymbol{\mu}+\boldsymbol{\Sigma}^{1 / 2} \boldsymbol{z}: \boldsymbol{z} \in B_{0}(j, \boldsymbol{p})\right\}$, such that $\pi_{j}=\left\{B(j, \boldsymbol{p}): \boldsymbol{p} \in\left\{1, \ldots, 2^{j}\right\}^{k}\right\}$.
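
For the univariate case ($k=1$) the partition sets reduce to intervals whose endpoints are normal quantiles, which can be sketched directly. The function name and the example values of the centering mean and standard deviation are assumptions for illustration.

```r
# Endpoints of the level-j partition of the real line (k = 1) for a PT
# centered at N(mu, sigma^2).
pt.partition <- function(j, mu = 0, sigma = 1) {
  p <- 0:(2^j)
  base.cuts <- qnorm(p / 2^j)   # endpoints of the base sets B_0(j, p)
  mu + sigma * base.cuts        # location-scale transformation
}

cuts1 <- pt.partition(1)                      # level 1: split at the center
cuts2 <- pt.partition(2, mu = 10, sigma = 2)  # level 2: four intervals

# Each level-j set has probability 2^(-j) under the centering normal.
probs2 <- diff(pnorm(cuts2, mean = 10, sd = 2))
```

Every level-$j$ set receives the same mass $2^{-j}$ under the centering distribution, which is what is meant by the PT being centered around the normal model.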
The model specification is completed by assuming the following hyper-priors

$$p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \propto|\boldsymbol{\Sigma}|^{-(k+1) / 2}$$

and

$$\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right)$$

As noticed by Jara, Hanson, and Lesaffre (2009), the PT prior specification is dependent on the square root of the centering covariance matrix considered to define the partition sets. Indeed, in the $N_{k}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$-centered multivariate extension considered by Hanson (2006), the direction of the sets is completely defined through the decomposition of the covariance matrix by the unique symmetric square root. In the context of multivariate random effects distributions, Jara et al. (2009) proposed a novel mixture of PT priors where the effect of the partitions is smoothed over by mixing over the decomposition of the centering covariance matrix (see Section 3.2). This option will be considered in a future version of the package.
For univariate analyses using a finite $(M<\infty)$ PT, a full version of the model is considered where the Dirichlet vectors are updated during the MCMC scheme. For univariate analysis with a fully specified PT $(M=\infty)$ and for multivariate analyses, a marginalized version of the model is considered, where the random probability measure $G$ is integrated out. The baseline parameters $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, and the precision parameter $\alpha$ are updated using Metropolis-Hastings (MH) steps (Tierney, 1994).

Bernstein-Dirichlet prior

The function BDPdensity considers density estimation using the Bernstein-Dirichlet prior (BDP) proposed by Petrone (1999a,b). For a continuous cdf $G$ on $(0,1]$, the associated Bernstein polynomial (BP) is defined as

$$B(x \mid k, G)=\sum_{j=0}^{k} G(j / k)\binom{k}{j} x^{j}(1-x)^{k-j}$$

which is a mixture of beta distributions. Its density is given by

$$b(x \mid k, G)=\sum_{j=1}^{k}\left(G(j / k)-G((j-1) / k)\right) \beta(x \mid j, k-j+1)$$

where $\beta(x \mid j, k-j+1)$ stands for a beta density with parameters $j$ and $k-j+1$. Petrone (1999a,b) proposed a hierarchical prior, called the Bernstein polynomial prior (BPP), where the random density $f(\cdot)$ is given by the following mixture of beta densities,

$$f(x)=\sum_{j=1}^{k} w_{j, k} \beta(x \mid j, k-j+1) \tag{1}$$

where $w_{j, k}=G(j / k)-G((j-1) / k)$, $k$ has probability mass function $\rho(\cdot)$, and given $k$, $\boldsymbol{w}_{k}=\left(w_{1, k}, \ldots, w_{k, k}\right)$ has distribution $H_{k}(\cdot)$ on the $k$-dimensional simplex

$$\Delta_{k}=\left\{\left(\omega_{1}, \ldots, \omega_{k}\right): 0 \leq \omega_{j} \leq 1, j=1, \ldots, k, \sum_{j=1}^{k} \omega_{j}=1\right\}$$

Petrone (1999a,b) called expression (1) the Bernstein polynomial density with parameters $k$ and $\boldsymbol{w}_{k}$, and showed that assuming $\boldsymbol{w}_{k}=\left(w_{1, k}, \ldots, w_{k, k}\right) \sim \operatorname{Dirichlet}\left(\zeta_{1, k}, \ldots, \zeta_{k, k}\right)$, with $\zeta_{j, k}=\alpha\left(G_{0}(j / k)-G_{0}((j-1) / k)\right)$, $j=1, \ldots, k$, where $G_{0}$ is a probability distribution on $(0,1]$ and $\alpha$ is a positive constant, is equivalent to assuming that $G \mid \alpha, G_{0} \sim DP\left(\alpha G_{0}\right)$. Petrone (1999a,b) referred to this as the Bernstein-Dirichlet prior (BDP) and discussed MCMC algorithms to scan the posterior distribution.
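
The Bernstein polynomial density is easy to evaluate for a fixed $k$ and a fixed cdf $G$, which helps to see how the beta mixture behaves. The sketch below is illustrative only: the function name is hypothetical, and the two choices of $G$ (uniform and Beta(2, 4)) are arbitrary. It assumes $G(0)=0$ and $G(1)=1$ so that the weights sum to one.

```r
# Bernstein polynomial density b(x | k, G) as a mixture of beta densities,
# with weights w_{j,k} = G(j/k) - G((j-1)/k).
bernstein.density <- function(x, k, G) {
  w <- diff(G((0:k) / k))   # weights w_{1,k}, ..., w_{k,k}
  dens <- sapply(seq_len(k), function(j) w[j] * dbeta(x, j, k - j + 1))
  rowSums(matrix(dens, nrow = length(x)))
}

x <- seq(0.01, 0.99, by = 0.01)

# Uniform G: every weight is 1/k and the density is identically 1.
b.unif <- bernstein.density(x, k = 5, G = punif)

# Beta(2, 4) cdf as G: a smooth approximation to the Beta(2, 4) density.
G.beta <- function(u) pbeta(u, 2, 4)
b.beta <- bernstein.density(x, k = 20, G = G.beta)

# The mixture is a proper density on the unit interval.
total <- integrate(function(u) bernstein.density(u, 20, G.beta), 0, 1)$value
```

Increasing $k$ makes the Bernstein density follow the density of $G$ more closely, at the cost of more mixture components.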
Our MCMC implementation is similar to the one described by Petrone (1999a,b) but adds the resampling step described by Bush and MacEachern (1996) for Dirichlet process mixture models. The function BDPdensity considers

$$y_{i} \mid G \stackrel{iid}{\sim} G$$

and

$$G \mid k_{\max}, \alpha, G_{0} \sim BDP\left(k_{\max}, \alpha G_{0}\right)$$

where $y_{i}$ is the data transformed to lie in $(0,1]$ and $G_{0}=\operatorname{Beta}\left(a_{0}, b_{0}\right)$. It is further assumed that

$$\alpha \mid aa_{0}, ab_{0} \sim \operatorname{Beta}\left(aa_{0}, ab_{0}\right)$$

and

$$k \mid k_{\max} \sim DU\left(\left\{1, \ldots, k_{\max}\right\}\right)$$

where $DU(A)$ refers to the discrete uniform distribution on the set $A$. Although BDP are naturally defined as probability models for distributions on the unit interval $(0,1]$, different measurable mappings could be considered to transform the data when the support is not the unit interval. With this aim we consider the uniform CDF on the range of the data.

Mixtures of triangular distributions

The TDPdensity function considers a triangular-Dirichlet prior (TDP) for univariate density estimation. The logic behind the TDP is similar to the BDP construction but replaces the beta kernels in the mixture model by triangular distributions as proposed by Perron and Mengersen (2001). The model is given by

$$y_{i} \mid G \stackrel{iid}{\sim} G$$

and

$$\begin{gathered} G \mid k_{\max}, \alpha, G_{0} \sim TDP\left(k_{\max}, \alpha G_{0}\right) \\ \alpha \mid aa_{0}, ab_{0} \sim \operatorname{Beta}\left(aa_{0}, ab_{0}\right) \end{gathered}$$

and

$$k \mid k_{\max} \sim DU\left(\left\{1, \ldots, k_{\max}\right\}\right)$$

where $y_{i}$ is the data transformed to lie in $(0,1]$, $k_{\max}$ is the upper limit of the discrete uniform prior for the number of components in the mixture of triangular distributions, $\alpha$ is the total mass parameter of the Dirichlet process component, and $G_{0}$ is the centering distribution of the DP. The centering distribution corresponds to a $G_{0}=\operatorname{Beta}\left(a_{0}, b_{0}\right)$ distribution.
Our representation is equivalent to the mixture of triangular distributions proposed by Perron and Mengersen (2001), with random weights following a Dirichlet prior. However, in this function we exploit the underlying DP structure, thus avoiding the use of Reversible-Jump algorithms (Green, 1995). In fact, the same MCMC algorithm considered for the BDP prior is implemented in the TDPdensity function.

3.2. Nonparametric random effects distributions in mixed effects models

Assume that for each of $m$ experimental units the regression data $\left(Y_{ij}, \boldsymbol{x}_{ij}, \boldsymbol{z}_{ij}\right)$, $1 \leq i \leq m$, $1 \leq j \leq n_{i}$, are recorded, where $Y_{ij}$ is a response variable, and $\boldsymbol{x}_{ij} \in \mathbb{R}^{p}$ and $\boldsymbol{z}_{ij} \in \mathbb{R}^{q}$ are vectors of $p$ and $q$ explanatory variables, respectively. Let $\boldsymbol{Y}_{i}=\left(Y_{i1}, \ldots, Y_{in_{i}}\right)^{T}$, $\boldsymbol{X}_{i}=\left(\boldsymbol{x}_{i1}, \ldots, \boldsymbol{x}_{in_{i}}\right)^{T}$, and $\boldsymbol{Z}_{i}=\left(\boldsymbol{z}_{i1}, \ldots, \boldsymbol{z}_{in_{i}}\right)^{T}$, $i=1, \ldots, m$. The observations are assumed to be conditionally independent with exponential family distribution,

$$p\left(Y_{ij} \mid \vartheta_{ij}, \tau\right)=\exp \left\{\left[Y_{ij} \vartheta_{ij}-b\left(\vartheta_{ij}\right)\right] / \tau\right\} c\left(Y_{ij}, \tau\right)$$

The means $\mu_{ij}=E\left(Y_{ij} \mid \vartheta_{ij}, \tau\right)$ and variances $\sigma_{ij}^{2}=\operatorname{Var}\left(Y_{ij} \mid \vartheta_{ij}, \tau\right)$ are related to the canonical parameter $\vartheta_{ij}$ and dispersion parameter $\tau$ via $\mu_{ij}=b^{\prime}\left(\vartheta_{ij}\right)$ and $\sigma_{ij}^{2}=\tau b^{\prime \prime}\left(\vartheta_{ij}\right)$, respectively. The means $\mu_{ij}$ are related to the $p$-dimensional and $q$-dimensional “fixed” effects vectors $\boldsymbol{\beta}^{F}$ and $\boldsymbol{\beta}^{R}$, respectively, and the $q$-dimensional “random” effects vector $\boldsymbol{b}_{i}$ via the link relation

$$h\left(\mu_{ij}\right)=\eta_{ij}=\boldsymbol{x}_{ij}^{\prime} \boldsymbol{\beta}^{F}+\boldsymbol{z}_{ij}^{\prime} \boldsymbol{\beta}^{R}+\boldsymbol{z}_{ij}^{\prime} \boldsymbol{b}_{i} \tag{2}$$

where $h(\cdot)$ is a known monotonic differentiable link function, and $\eta_{ij}$ is called the linear predictor. Due to software limitations, the analyses are often restricted to the setting in which the random effects follow a multivariate normal distribution, $\boldsymbol{b}_{1}, \ldots, \boldsymbol{b}_{m} \mid \boldsymbol{\Sigma} \stackrel{iid}{\sim} N_{q}(\mathbf{0}, \boldsymbol{\Sigma})$. In this context, Bayesian nonparametric extensions incorporate a probability model for the random effects distribution in order to better represent the distributional uncertainty and to avoid the effects of the misspecification of an arbitrary parametric random effects distribution. Bush and MacEachern (1996) and Kleinman and Ibrahim (1998b) describe Bayesian semiparametric versions of the linear mixed model considering a DP prior for the random effects distribution. Under this approach the DP prior is centered at a normal base measure with zero mean. Similar approaches were considered by Mukhopadhyay and Gelfand (1997) and Kleinman and Ibrahim (1998a) in the context of GLMM. In order to avoid the discrete nature of the DP realizations, Müller and Rosner (1997) consider a DPM of normals model in the context of a normal nonlinear mixed model. Alternatively, Walker and Mallick (1997) and Hanson (2006) consider PT and mixtures of PT priors in random intercept models. Jara et al. (2009) propose a novel mixture of multivariate PT priors to define flexible nonparametric models for multivariate distributions that reduces the undesirable sensitivity to the choice of the partitions associated with the PT constructions. Under these approaches, the parametric assumption is relaxed by considering

$$\boldsymbol{b}_{1}, \ldots, \boldsymbol{b}_{m} \mid G \stackrel{iid}{\sim} G$$

and

$$G \mid H \sim H$$

where $H$ is one of the previously mentioned probability models for probability distributions. We will specify the nonparametric priors in more detail next, but first it is necessary to discuss some important issues regarding the specification of the semiparametric model. Specifically, it is important to stress that under parametrization (2), $\boldsymbol{\beta}^{R}$ represents the mean of the random effects, and $\boldsymbol{b}_{i}$ represents the subject-specific deviation from the mean. It follows that fixing the mean of the normal prior distribution for the random effects $\boldsymbol{b}$ at zero in the parametric context corresponds to an identification restriction for the model parameters (see, e.g., Newton, 1994; San Martín, Jara, Rolin, and Mouchart, 2007). Equivalently, the random probability measure must be appropriately restricted in a semiparametric GLMM specification. In our settings, the location of $G$ is “confounded” with the parameters $\boldsymbol{\beta}^{R}$. Although such identification issues present no difficulties to a Bayesian analysis in the sense that a prior is transformed into a posterior using the sampling model and the probability calculus, if the interest focuses on a “confounded” parameter then such formal assurances have little practical value. Furthermore, as more data become available, the posterior mass will not concentrate on a point in the model, making asymptotic analysis difficult. As pointed out by Newton (1994), from a computational point of view, identification problems imply ridges in the posterior distribution and MCMC methods can be difficult to implement in these situations.
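
The confounding can be seen numerically: shifting every random effect by a constant and subtracting the same constant from the mean of the random effects leaves the linear predictor, and hence the likelihood, unchanged. The sketch below uses a random-intercept model ($q=1$); all names and numeric values are assumptions for illustration.

```r
set.seed(11)
m <- 5                               # experimental units
n.i <- 4                             # observations per unit
id <- rep(seq_len(m), each = n.i)
x <- rnorm(m * n.i)                  # covariate with a "fixed" effect only
z <- rep(1, m * n.i)                 # random-intercept design (q = 1)

beta.F <- 0.5
beta.R <- 1.0
b <- rnorm(m)

eta <- x * beta.F + z * beta.R + z * b[id]

# Shift the random effects by c.shift and absorb the shift in beta.R.
c.shift <- 2.7
eta.shifted <- x * beta.F + z * (beta.R - c.shift) + z * (b[id] + c.shift)

max(abs(eta - eta.shifted))   # zero up to rounding error
```

Only the sum of the mean and the deviation is determined by the data, so the location of the random effects distribution has to be fixed or absorbed in some way.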
Following Jara et al. (2009), we consider the following re-parameterization of the model

$$\eta_{ij}=\boldsymbol{x}_{ij}^{\prime} \boldsymbol{\beta}+\boldsymbol{z}_{ij}^{\prime} \boldsymbol{\theta}_{i}$$

$$\boldsymbol{\theta}_{1}, \ldots, \boldsymbol{\theta}_{m} \mid G \stackrel{iid}{\sim} G$$

and

$$G \mid H \sim H$$

where $\boldsymbol{\beta}=\boldsymbol{\beta}^{F}$, and $\boldsymbol{\theta}_{i}=\boldsymbol{\beta}^{R}+\boldsymbol{b}_{i}$, and we center the nonparametric priors for $G$ at a $N_{q}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distribution. Notice that samples under the original parameterization can be obtained in a straightforward manner from MCMC samples as explained in Jara et al. (2009) for PT priors. For DP or DPM priors the $\epsilon$-DP approximation proposed by Muliere and Tardella (1998) is considered, with $\epsilon=0.01$. The latter is similar to the approach proposed by Gelfand and Kottas (2002), who considered a fixed truncation to the DP. When a DP or DPM prior is used to model the random effects distribution, Dunson, Yang, and Baird (2007a) and Li, Müller, and Lin (2007) proposed alternative strategies to avoid the identifiability problem described above, but these approaches are not implemented in the current version of DPpackage.
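
The idea of an $\epsilon$-truncated DP can be sketched with a stick-breaking construction that stops as soon as the unassigned mass falls below $\epsilon$ and gives the leftover mass to one further atom. This is one reading of the approximation, written for a univariate normal centering distribution ($q=1$); it is not the package's implementation, and the function name and parameter values are assumptions.

```r
# Truncated stick-breaking draw from DP(alpha * N(mu, sigma^2)), stopped
# as soon as the unassigned mass falls below eps.
eps.dp <- function(alpha, eps = 0.01, mu = 0, sigma = 1) {
  weights <- numeric(0)
  remaining <- 1
  while (remaining >= eps) {
    v <- rbeta(1, 1, alpha)              # stick-breaking proportion
    weights <- c(weights, remaining * v)
    remaining <- remaining * (1 - v)
  }
  weights <- c(weights, remaining)       # leftover mass goes to a final atom
  atoms <- rnorm(length(weights), mu, sigma)
  list(weights = weights, atoms = atoms)
}

set.seed(3)
G <- eps.dp(alpha = 2)
G.mean <- sum(G$weights * G$atoms)       # mean of the discrete realization
```

A functional such as the mean of $G$ can then be evaluated from the finite set of weights and atoms, which is the kind of quantity needed to move back to the original parameterization.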
The functions DPlmm, DPglmm, and DPolmm implement mixed effects models using a DP prior for $G$ such that

$$G \mid \alpha, \boldsymbol{\mu}, \boldsymbol{\Sigma} \sim DP\left(\alpha N_{q}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\right)$$

The functions DPMlmm, DPMglmm, and DPMolmm consider a DPM of normals prior for $G$ such that

$$G \mid \boldsymbol{\Sigma}_{k}, H \sim \int N_{q}\left(\boldsymbol{\mu}, \boldsymbol{\Sigma}_{k}\right) dP(\boldsymbol{\mu})$$

with

$$P \mid \alpha, \boldsymbol{\mu}, \boldsymbol{\Sigma} \sim DP\left(\alpha N_{q}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\right)$$

The functions PTlmm, PTglmm, and PTolmm consider a multivariate PT prior for $G$ such that

$$G \mid \alpha, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{O} \sim PT\left(\Pi^{\boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{O}}, \mathcal{A}^{\alpha}\right)$$

where O\boldsymbol{O} is a q×qq \times q orthogonal matrix defining the “direction” of the partition sets. The models are completed by assuming the following prior distributions:

β∼Np(β0,Sβ0)τ−1∣τ1,τ2∼Γ(τ1/2,τ2/2)μ∣μb,Sb∼Nq(μb,Sb)Σ∣ν0,T∼IWk(ν0,T)O∼Haar⁡(q)\begin{gathered} \boldsymbol{\beta} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right) \\ \tau^{-1} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right) \\ \boldsymbol{\mu} \mid \boldsymbol{\mu}_{b}, \boldsymbol{S}_{b} \sim N_{q}\left(\boldsymbol{\mu}_{b}, \boldsymbol{S}_{b}\right) \\ \boldsymbol{\Sigma} \mid \nu_{0}, \boldsymbol{T} \sim I W_{k}\left(\nu_{0}, \boldsymbol{T}\right) \\ \boldsymbol{O} \sim \operatorname{Haar}(q) \end{gathered}

and

α∣a0,b0∼Γ(a0,b0)\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right)

where Γ\Gamma and IWI W refers to the Gamma and inverted Wishart distributions, respectively. As before, the inverted Wishart prior is parameterized such that E(Σ)=T−1/(ν0−q−1)E(\boldsymbol{\Sigma})=\boldsymbol{T}^{-1} /\left(\nu_{0}-q-1\right).
The DPlmm, DPMlmm, and PTlmm functions consider the normal sampling distribution with an identity link. The DPglmm, DPMglmm, and PTglmm functions include the following sampling distributions (links): binomial (logit and probit), Poisson (log), and gamma (log). The DPolmm, DPMolmm, and PTolmm functions consider a multinomial sampling distribution and an ordered-probit link function.
In all functions, a marginalized version of the semiparametric GLMM is considered where the random probability distribution $G$ is integrated out. For the multinomial and probit-binomial models, the latent variable approach of Albert and Chib (1993) is considered.
The computational implementation associated with the functions DPMlmm and DPMolmm, and with the probit-Bernoulli model included in the DPMglmm function, is based on the use of MCMC methods for conjugate priors for a collapsed state, as presented in MacEachern (1998). For the Poisson, gamma, and logit-binomial models included in the DPglmm and DPMglmm functions, MCMC methods for nonconjugate priors are used. Specifically, algorithm 8 of Neal (2000), with $m=1$, is considered. In this case, a MH step with the iteratively weighted least squares (IWLS) normal proposal of Gamerman (1997) is used to update the fixed and random effects.

For the functions DPlmm and DPolmm, and the probit-Bernoulli model included in DPglmm, the MCMC strategy described by Bush and MacEachern (1996) is employed. Finally, for the PTlmm, PTglmm, and PTolmm functions, the modified IWLS normal proposal described by Jara et al. (2009) is considered for sampling the random effects. In these functions, the IWLS normal proposal of Gamerman (1997) is used to update the fixed effects in the nonconjugate case. The PT centering and precision parameters are updated using adaptive MCMC algorithms as described by Jara et al. (2009).

3.3. Semiparametric IRT-type models

Item response theory (IRT) models are widely used in educational measurement (see e.g., De Boeck and Wilson, 2004). Rasch-type models (Rasch, 1960) are typical examples of this class and can be viewed as a particular case of GLMM (see e.g., De Boeck and Wilson, 2004). In Rasch-type models, the linear predictor $\eta_{i j}$ depends on two parameters in an additive way, $\eta_{i j}=\theta_{i}-\beta_{j}$, where $\theta_{i} \in \mathbb{R}$ corresponds to the ability of subject $i$, $i=1, \ldots, m$, and $\beta_{j} \in \mathbb{R}$ corresponds to the difficulty of probe/item $j$, $j=1, \ldots, p$. The difficulty and ability parameters are interpreted as “fixed” and “random” effects, respectively. Two versions of the model are considered here: the Rasch model (RM) and the Rasch Poisson count model (RPCM). In the RM, $Y_{i j}$ is a binary variable indicating whether individual $i$ answers item $j$ correctly, such that

$$Y_{i j} \mid \theta_{i}, \beta_{j} \stackrel{\text{ind.}}{\sim} \text{Bernoulli}\left(\Psi\left(\theta_{i}-\beta_{j}\right)\right)$$

where $\Psi(x)=\exp (x) /(1+\exp (x))$. In the RPCM the sampling distribution is given by

$$Y_{i j} \mid \theta_{i}, \beta_{j} \stackrel{\text{ind.}}{\sim} \text{Poisson}\left(\exp \left(\theta_{i}-\beta_{j}\right)\right)$$

where $Y_{i j}$ is an “unbounded” count variable, typically representing the number of misreadings or miscopyings made by subject $i$ in text $j$. We consider semiparametric versions of the models where the abilities distribution $G$ is modeled using DP, PT and DPM priors. To avoid identification problems in the semiparametric specification of the model (see San Martín et al., 2007), we fix the first difficulty parameter at 0 and consider a normal prior for the remaining elements in the vector

$$\boldsymbol{\beta}_{\mathcal{F} p} \mid \boldsymbol{\beta}_{0}, \boldsymbol{S}_{\beta_{0}} \sim N_{p-1}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\beta_{0}}\right)$$

The functions DPrasch and DPraschpoisson implement semiparametric versions of the RM and RPCM, respectively, where

$$\theta_{i} \mid G \stackrel{\text{iid}}{\sim} G$$

and

$$G \mid \alpha, G_{0} \sim D P\left(\alpha N\left(\mu, \sigma^{2}\right)\right)$$

In a similar way, the functions FPTrasch and FPTraschpoisson implement semiparametric versions of the RM and RPCM, respectively, using a finite PT prior,

$$G \mid \alpha, \mu, \sigma^{2} \sim P T^{M}\left(\Pi^{\mu, \sigma^{2}}, \mathcal{A}^{\alpha}\right)$$

where the PT is centered around a $N\left(\mu, \sigma^{2}\right)$ distribution by taking each level $m$ of the partition $\Pi^{\mu, \sigma^{2}}$ to coincide with the $k / 2^{m}$, $k=0, \ldots, 2^{m}$, quantiles of the $N\left(\mu, \sigma^{2}\right)$ distribution. The family $\mathcal{A}^{\alpha}=\left\{\alpha_{\epsilon}: \epsilon \in E^{*}\right\}$, where $E^{*}=\bigcup_{m=1}^{\infty} E^{m}$ and $E^{m}$ is the $m$-fold product of $E=\{0,1\}$, was specified as $\alpha_{\epsilon_{1} \ldots \epsilon_{m}}=\alpha m^{2}$. For the DP and PT priors, the model is completed by assuming

$$\begin{aligned}
& \alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
& \mu \mid \mu_{b}, S_{b} \sim N\left(\mu_{b}, S_{b}\right)
\end{aligned}$$

and

$$\sigma^{-2} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right)$$

The functions DPMrasch and DPMraschpoisson consider DPM of normals priors for the abilities distribution in a RM and RPCM, respectively, given by

$$\theta_{i} \mid G \stackrel{\text{iid}}{\sim} \int N\left(\mu, \sigma^{2}\right) d G\left(\mu, \sigma^{2}\right)$$

and

$$G \mid \alpha, G_{0} \sim D P\left(\alpha G_{0}\right)$$

where $G_{0} \equiv N\left(\mu \mid \mu_{b}, \sigma_{b}^{2}\right) I G\left(\sigma^{2} \mid \tau_{k 1}, \tau_{k 2}\right)$. We further assume that

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\mu_{b} \mid m_{0}, s_{0} \sim N\left(m_{0}, s_{0}\right) \\
\sigma_{b}^{-2} \mid \tau_{b 1}, \tau_{b 2} \sim \Gamma\left(\tau_{b 1} / 2, \tau_{b 2} / 2\right)
\end{gathered}$$

and

$$\tau_{k 2} \mid \tau_{s 1}, \tau_{s 2} \sim \Gamma\left(\tau_{s 1} / 2, \tau_{s 2} / 2\right)$$

In all functions, the difficulty and ability parameters are updated using a MH step with the IWLS normal proposal of Gamerman (1997). The computational implementation in the DPrasch and DPraschpoisson functions is based on the marginalization of the DP and on the use of algorithm 8 of Neal (2000), with $m=1$. The DPM implementations in the functions DPMrasch and DPMraschpoisson are based on the finite approximation to the DP proposed by Ishwaran and James (2002). Finally, the functions using finite PT priors for the abilities distribution, FPTrasch and FPTraschpoisson, fit a full version of the models where the PT conditional probabilities are updated during the MCMC scheme. In this case, the abilities, centering and precision parameters are updated using slice sampling (Neal, 2003).
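The two sampling models of this section can be illustrated with a short sketch. The Python code below is not part of DPpackage; it only evaluates the RM success probability $\Psi(\theta_{i}-\beta_{j})$ and the RPCM mean $\exp(\theta_{i}-\beta_{j})$ and simulates binary responses. The function names, the number of subjects and items, and the normal draws used for the abilities are illustrative assumptions and stand in for a DP, PT or DPM prior.

```python
import math
import random


def rasch_prob(theta, beta):
    """Success probability under the Rasch model: Psi(theta - beta)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def rpcm_rate(theta, beta):
    """Poisson mean under the Rasch Poisson count model: exp(theta - beta)."""
    return math.exp(theta - beta)


def simulate_rm(abilities, difficulties, rng):
    """Draw an m x p matrix of binary responses from the Rasch model."""
    return [
        [1 if rng.random() < rasch_prob(t, b) else 0 for b in difficulties]
        for t in abilities
    ]


rng = random.Random(1)
# Hypothetical abilities: plain normal draws, not a nonparametric prior.
abilities = [rng.gauss(0.0, 1.0) for _ in range(5)]
# The first difficulty is fixed at 0, mirroring the identification restriction.
difficulties = [0.0, -0.5, 0.8]
responses = simulate_rm(abilities, difficulties, rng)
```

Fixing the first entry of `difficulties` at zero reflects the restriction described above: adding a constant to every ability and every difficulty leaves $\theta_{i}-\beta_{j}$, and hence both sampling models, unchanged.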

3.4. Semiparametric meta-analysis models

The DPmeta, DPMmeta and PTmeta functions implement random (mixed) effects univariate meta-analysis models using an MDP, a DPM of normals, and an MPT prior for the random effects, respectively. In this case, the conditional model is given by

$$y_{i} \mid \theta_{i}, \boldsymbol{\beta}, \sigma_{i}^{2} \stackrel{\text{ind.}}{\sim} N\left(\theta_{i}+\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}, \sigma_{i}^{2}\right)$$

where the variances $\sigma_{i}^{2}$ are known, $\boldsymbol{x}_{i}$ is a $p$-dimensional design vector, excluding an intercept term, and

$$\boldsymbol{\beta}_{p} \mid \boldsymbol{\beta}_{0}, \boldsymbol{S}_{\beta_{0}} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\beta_{0}}\right)$$

The DPmeta function assumes that

$$\theta_{i} \mid G \stackrel{\text{iid}}{\sim} G$$

and

$$G \mid \alpha, \mu, \sigma^{2} \sim D P\left(\alpha N\left(\mu, \sigma^{2}\right)\right)$$

The PTmeta function replaces the latter assumption by a PT prior,

$$G \mid \alpha, \mu, \sigma^{2} \sim P T\left(\Pi^{\mu, \sigma^{2}}, \mathcal{A}^{\alpha}\right)$$

where the PT prior is centered around a $N\left(\mu, \sigma^{2}\right)$ distribution. The PTmeta function can also center the PT prior around a $N\left(0, \sigma^{2}\right)$ distribution for the median-0 model described by Branscum and Hanson (2008). This model is fitted if the option frstlprob is set equal to TRUE in the model prior object. In this case, the design vector $\boldsymbol{x}_{i}$ includes an intercept term and the associated regression coefficient represents the median effect. The computational implementation of the DPmeta and PTmeta functions is based on the marginalization of the DP and PT, respectively. In both cases, the model specification is completed by assuming

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\mu \mid \mu_{b}, S_{b} \sim N\left(\mu_{b}, S_{b}\right)
\end{gathered}$$

and

$$\sigma^{-2} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right)$$

The average effect in the DPmeta function is sampled using the method of composition and the $\epsilon$-DP approximation proposed by Muliere and Tardella (1998), with $\epsilon=0.01$. For the PTmeta function, the mean effect is sampled using the finite PT approximation described by Jara et al. (2009). The DPMmeta function considers a location DPM of normals prior for the study effects

$$\begin{gathered}
\theta_{i} \mid \sigma^{2}, G \stackrel{\text{iid}}{\sim} \int N\left(\mu, \sigma^{2}\right) d G(\mu) \\
\sigma^{-2} \mid \tau_{01}, \tau_{02} \sim \Gamma\left(\tau_{01} / 2, \tau_{02} / 2\right)
\end{gathered}$$

and

$$G \mid \alpha, G_{0} \sim D P\left(\alpha G_{0}\right)$$

where $G_{0} \equiv N\left(\mu \mid \mu_{b}, \sigma_{b}^{2}\right)$. This function further assumes that

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\mu_{b} \mid m_{b}, S_{b} \sim N\left(m_{b}, S_{b}\right)
\end{gathered}$$

and

$$\sigma_{b}^{-2} \mid \tau_{11}, \tau_{12} \sim \Gamma\left(\tau_{11} / 2, \tau_{12} / 2\right)$$

The computational implementation of the model is based on the marginalization of the DP and on the use of MCMC methods for conjugate priors for a collapsed state, as presented in MacEachern (1998). The average effect is also sampled using the method of composition and the $\epsilon$-DP approximation proposed by Muliere and Tardella (1998), with $\epsilon=0.01$.
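The idea behind an $\epsilon$-truncated DP draw and the resulting mean functional can be sketched as follows. This Python code is only an illustration under stated assumptions and is not the FORTRAN implementation used by the package: sticks are broken until the unassigned mass falls below $\epsilon$, and the leftover mass is placed on one additional atom. The helper names and the $N(0,1)$ centering distribution are hypothetical, and the sketch draws from a DP with a generic base sampler rather than from the conditional posterior that the method of composition would use at each MCMC scan.

```python
import random


def epsilon_dp_draw(alpha, base_sampler, eps, rng):
    """Truncated stick-breaking draw from DP(alpha * G0).

    Sticks are broken until the unassigned mass drops below eps; the
    leftover mass is then placed on one final atom drawn from G0.
    """
    atoms, weights = [], []
    remaining = 1.0
    while remaining >= eps:
        v = rng.betavariate(1.0, alpha)      # V_l ~ Beta(1, alpha)
        atoms.append(base_sampler(rng))      # theta_l ~ G0
        weights.append(v * remaining)        # omega_l = V_l * prod_{j<l}(1 - V_j)
        remaining *= 1.0 - v
    atoms.append(base_sampler(rng))
    weights.append(remaining)                # leftover mass, smaller than eps
    return atoms, weights


def mean_functional(atoms, weights):
    """Mean of the discrete random measure defined by atoms and weights."""
    return sum(a * w for a, w in zip(atoms, weights))


rng = random.Random(42)
g0 = lambda r: r.gauss(0.0, 1.0)             # hypothetical N(0, 1) centering distribution
atoms, weights = epsilon_dp_draw(alpha=1.0, base_sampler=g0, eps=0.01, rng=rng)
avg_effect = mean_functional(atoms, weights)
```

Because the number of atoms is random and determined by $\epsilon$, the truncation error in the total mass is controlled directly, in contrast to a truncation at a fixed number of atoms.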
The function DPmultmeta implements a multivariate extension of the no-covariate model considered in the DPmeta function, given by

$$\begin{gathered}
y_{i} \mid \boldsymbol{\theta}_{i}, \boldsymbol{\Sigma}_{i} \stackrel{\text{ind.}}{\sim} N_{k}\left(\boldsymbol{\theta}_{i}, \boldsymbol{\Sigma}_{i}\right) \\
\boldsymbol{\theta}_{i} \mid G \stackrel{\text{iid}}{\sim} G
\end{gathered}$$

and

$$G \mid \alpha, \boldsymbol{m}_{1}, \boldsymbol{S}_{1} \sim D P\left(\alpha N_{k}\left(\boldsymbol{m}_{1}, \boldsymbol{S}_{1}\right)\right)$$

where the covariance matrices $\boldsymbol{\Sigma}_{i}$ are known. To complete the model specification, independent hyperpriors are assumed,

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\boldsymbol{m}_{1} \mid \boldsymbol{m}_{2}, \boldsymbol{S}_{2} \sim N_{k}\left(\boldsymbol{m}_{2}, \boldsymbol{S}_{2}\right)
\end{gathered}$$

and

$$\boldsymbol{S}_{1} \mid \nu, \boldsymbol{\Psi} \sim I W_{k}(\nu, \boldsymbol{\Psi})$$

The computational implementation is similar to the one employed for the DPmeta function.

3.5. Accelerated failure time modeling for interval-censored data

The DPsurvint function implements the algorithm described by Hanson and Johnson (2004) for semiparametric accelerated failure time (AFT) models. The AFT regression model is given by

$$\begin{gathered}
T_{i} \in\left[l_{i}, u_{i}\right), \quad i=1, \ldots, n \\
T_{i}=\exp \left(-\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}\right) V_{i} \\
\boldsymbol{\beta} \mid \boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right) \\
V_{1}, \ldots, V_{n} \mid G \stackrel{\text{iid}}{\sim} G
\end{gathered}$$

and

$$G \mid \alpha, \mu, \sigma^{2} \sim D P\left(\alpha L N\left(\mu, \sigma^{2}\right)\right)$$

where $L N\left(v \mid \mu, \sigma^{2}\right)$ refers to a log-normal distribution with location and scale parameters $\mu$ and $\sigma^{2}$, respectively. The model is completed by assuming independent hyperpriors,

$$\begin{gathered}
\alpha \sim \Gamma\left(a_{0}, b_{0}\right) \\
\mu \mid m_{0}, s_{0} \sim N\left(m_{0}, s_{0}\right)
\end{gathered}$$

and

$$\sigma^{-2} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right)$$

The likelihood in the AFT model for interval-censored data involves the product of indicator functions $\prod_{i=1}^{n} I\left(T_{i} \in A_{i}\right)$, where $A_{i}$ is an interval in the sample space. This fact gives rise to algorithmic possibilities which are unavailable or very difficult to implement under standard hierarchical models with uncensored data. As described in Hanson and Johnson (2004), the DPsurvint function partially samples $G$ in order to sample $\left(V_{1}, \ldots, V_{n}, V_{n+1}, \boldsymbol{\beta}, \alpha\right)$ with perfect accuracy. This can be performed by using the properties of the DP. Specifically, the following representation of the process is considered

$$G=\sum_{j=1}^{M} G_{j} G^{j}$$

where $j$ indexes the intervals that define a finite partition of the sample space $\left\{B_{1}, \ldots, B_{M}\right\}$, $G_{j}=G\left(B_{j}\right)$, and $G^{j}(\cdot)=G\left(\cdot \mid B_{j}\right)$, with the $G_{j}$'s being Dirichlet distributed random variables and the $G^{j}$'s being independent Dirichlet processes. Therefore, $G$ can be updated by first updating $\left\{G_{j}\right\}$ using Ferguson’s definition of the DP and then by updating each $G^{j} \mid\left\{G_{j}\right\}, \ldots$ using the Sethuraman (1994) stick-breaking representation of the DP (see, e.g., Doss, 1994; Hanson and Johnson, 2004). Based on this, a MH step is used to update the regression coefficients, followed by updates of $V_{1}, \ldots, V_{n+1}$.

The function predict.DPsurvint can be used to extract posterior information about the survival curve based on the MCMC output. Given a sample of the parameters of size $J$, a sample of the survival curve for a given $\boldsymbol{x}$ is drawn as follows. For the $j$th MCMC scan of the posterior distribution, $j=1, \ldots, J$, the survival function evaluated at $t$ is sampled from

$$S^{(j)}(t \mid \boldsymbol{x}, \text{data}) \sim \operatorname{Beta}\left(a^{(j)}(t), b^{(j)}(t)\right)$$

where

$$a^{(j)}(t)=\alpha^{(j)} G_{0}^{(j)}\left(\left(t \exp \left(\boldsymbol{x}^{\prime} \boldsymbol{\beta}^{(j)}\right),+\infty\right)\right)+\sum_{i=1}^{n} \delta_{Y_{i}^{(j)}}\left(\left(t \exp \left(\boldsymbol{x}^{\prime} \boldsymbol{\beta}^{(j)}\right),+\infty\right)\right)$$

and $b^{(j)}(t)=\alpha^{(j)}+n-a^{(j)}(t)$.
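A single draw of the survival function from the Beta distribution above can be sketched in Python as follows. The code is an illustration, not the implementation behind predict.DPsurvint: the function names and the numerical values of one hypothetical MCMC scan are assumptions, the centering distribution is taken to be log-normal as in the model of this section, and the list `latent` plays the role of the values that index the point masses in the expression for $a^{(j)}(t)$.

```python
import math
import random
from statistics import NormalDist


def beta_params(t, x, beta, alpha, mu, sigma2, latent):
    """Compute (a, b) for one MCMC scan, with G0 = LN(mu, sigma2) and t > 0."""
    cut = t * math.exp(sum(xk * bk for xk, bk in zip(x, beta)))
    # G0 mass of the interval (cut, +inf) under the log-normal distribution.
    g0_tail = 1.0 - NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(cut))
    # Number of latent values falling in (cut, +inf).
    n_above = sum(1 for v in latent if v > cut)
    a = alpha * g0_tail + n_above
    b = alpha + len(latent) - a
    return a, b


def survival_draw(t, x, beta, alpha, mu, sigma2, latent, rng):
    """Sample S(t | x, data) ~ Beta(a, b) for one MCMC scan."""
    a, b = beta_params(t, x, beta, alpha, mu, sigma2, latent)
    return rng.betavariate(a, b)


rng = random.Random(3)
# Hypothetical values standing in for one scan of the MCMC output.
scan = dict(x=[1.0, 0.5], beta=[0.2, -0.1], alpha=1.0, mu=0.0, sigma2=1.0,
            latent=[0.5, 1.5, 3.0, 4.2])
a, b = beta_params(2.0, **scan)
s = survival_draw(2.0, rng=rng, **scan)
```

Repeating the draw over all $J$ scans and over a grid of values of $t$ gives a posterior sample of the survival curve for the chosen covariate vector.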

3.6. Binary regression with nonparametric link

Consider binary regression data, $\left(Y_{i}, \boldsymbol{x}_{i}\right)$, $1 \leq i \leq n$, where $Y_{i}$ is a binary response variable ($Y_{i} \in\{0,1\}$) and $\boldsymbol{x}_{i} \in \mathbb{R}^{p}$ is a vector of $p$ explanatory variables. Parametric versions of this model are characterized by the following assumption

$$\begin{aligned}
\operatorname{Pr}\left(Y_{i}=1 \mid \boldsymbol{x}_{i}, \boldsymbol{\theta}\right) & =E\left(Y_{i} \mid \boldsymbol{x}_{i}, \boldsymbol{\theta}\right) \\
& =F_{\varphi}\left(m\left(\boldsymbol{\beta}, \boldsymbol{x}_{i}\right)\right)
\end{aligned}$$

where $F_{\varphi}$ is a distribution function on $\mathbb{R}$, called the inverse link function in the context of generalized linear models, known up to a Euclidean parameter $\varphi$, and $m(\cdot)$ is a known function, called the index function, parameterized by $\boldsymbol{\beta}$. Popular parametric versions use a linear index function, $m\left(\boldsymbol{\beta}, \boldsymbol{x}_{i}\right)=\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}$, and take $F_{\varphi}$ to be a known cumulative distribution function, i.e., with $\varphi=\varphi_{0}$, thus allowing a relatively simple treatment of the finite-dimensional regression parameters, $\boldsymbol{\theta}=\boldsymbol{\beta}$. The function Pbinary implements parametric versions of this model considering the logit, probit, cloglog, and Cauchy link functions.
The DPbinary, FPTbinary, and CSDPbinary functions replace the parametric inverse link function $F_{\varphi}$ by a general distribution $G$ and place a DP prior,

$$G \mid \alpha, G_{0} \sim D P\left(\alpha G_{0}\right)$$

a finite PT where the first and second quartiles are fixed (Hanson, 2006),

$$G \mid \alpha \sim P T^{M}\left(\Pi, \mathcal{A}^{\alpha}\right)$$

and a CSDP (Newton et al., 1996),

$$G \mid p, d, h, G_{0} \sim C S D P\left(\alpha G_{0}, p, d, h\right)$$

on $G$, respectively. Newton et al. (1996) described the CSDP as a prior distribution on the space of probability distributions with fixed location and scale in order to assure sampling identification. The reasoning behind their construction is presented here for completeness. The following definition is a slight modification of the one given by Newton et al. (1996). Let $G_{0}$ and $H$ be two probability measures on $\mathbb{R}$ and $(0, d)$, respectively, such that for all $d>0$, $G_{0}((-\infty,-d))>0$ and $G_{0}((d, \infty))>0$. Let $\theta \sim h$, where $h$ is the density of $H$ with respect to Lebesgue measure. Given $\theta$, define the following partition of the real line, $A_{1}(\theta)=(-\infty, \theta-d]$, $A_{2}(\theta)=(\theta-d, 0]$, $A_{3}(\theta)=(0, \theta]$, and $A_{4}(\theta)=(\theta, \infty)$. Finally, suppose that for each $\theta \in(0, d)$, the random probability measures $\varphi_{1}, \varphi_{2}, \varphi_{3}$, and $\varphi_{4}$ follow conditionally independent DP priors, $\varphi_{i} \mid \theta, \alpha, G_{0} \stackrel{\text{ind}}{\sim} D P\left(\alpha G_{0} I_{\left(A_{i}(\theta)\right)}\right)$, $i=1, \ldots, 4$. The random probability measure $G$ on $(\mathbb{R}, \mathcal{B})$ is said to follow a CSDP prior with parameter $\left(\alpha, G_{0}, p, d, h\right)$, written $G \sim \operatorname{CSDP}\left(\alpha G_{0}, p, d, h\right)$, if

$$G=\frac{1-p}{2}\left(\varphi_{1}+\varphi_{4}\right)+\frac{p}{2}\left(\varphi_{2}+\varphi_{3}\right) \quad \text{a.s.}$$

In all cases, the functions allow for misclassified binary responses with known misclassification parameters, and the model specification is completed by assuming

$$\alpha \sim \Gamma\left(a_{0}, b_{0}\right)$$

and

$$\boldsymbol{\beta} \mid \boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right)$$

The DPbinary function allows the user to center the DP around a logistic, normal or Cauchy distribution. The CSDPbinary function takes $H$ to be the $U(0, d)$ distribution and $G_{0}$ to be the standard logistic distribution. In both functions, a latent variable representation

$$Y_{i}=I_{\left\{V_{i} \leq \boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}\right\}}$$

and

$$V_{1}, \ldots, V_{n} \mid G \sim G$$

is used, along with a MH step to update the regression coefficients. In the computational implementation of this model, $G$ is considered as latent data and sampled partially with sufficient accuracy to be able to generate $V_{1}, \ldots, V_{n+1}$ such that they are exactly iid random variables from $G$, as proposed by Doss (1994). Both Ferguson’s definition of the DP and Sethuraman's (1994) representation of the process are used. As in Bush and MacEachern (1996), an extra step, which moves the clusters in such a way that the posterior distribution is still the stationary distribution, is performed in order to improve the mixing of the chain.
The FPTbinary function creates the partition sets based on the logistic distribution. In the computational implementation of the model, MH steps are used to update the regression coefficients and the precision parameter, as described in Hanson (2006).
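The latent variable representation used in this section implies $\operatorname{Pr}(Y_{i}=1 \mid \boldsymbol{x}_{i})=G(\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta})$, so that a parametric link is recovered when $G$ is a fixed distribution. The Python sketch below checks this by simulation for the special case in which $G$ is the standard logistic distribution. It is an illustration only: the function names, covariate values and coefficients are hypothetical, and no nonparametric prior is involved.

```python
import math
import random


def logistic_cdf(z):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_response(x, beta, rng):
    """Return Y = 1 if V <= x'beta, with V drawn from the standard logistic G."""
    u = rng.uniform(1e-12, 1.0 - 1e-12)
    v = math.log(u / (1.0 - u))      # inverse-CDF draw from the logistic distribution
    eta = sum(xk * bk for xk, bk in zip(x, beta))
    return 1 if v <= eta else 0


rng = random.Random(7)
x, beta = [1.0, 0.4], [-0.5, 2.0]    # hypothetical design vector and coefficients
draws = [simulate_response(x, beta, rng) for _ in range(20000)]
empirical = sum(draws) / len(draws)
implied = logistic_cdf(sum(xk * bk for xk, bk in zip(x, beta)))
```

With a logistic $G$ the relative frequency in `empirical` approximates the logit-model probability in `implied`; the semiparametric functions replace this fixed $G$ by a random one.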

3.7. ROC curve estimation

The DProc function performs a ROC curve analysis based on DPM of normals models for density estimation. Let $\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}$ and $\boldsymbol{y}_{1}, \ldots, \boldsymbol{y}_{m}$ be the diagnostic marker measurements for the healthy and diseased subjects, respectively. The model is given by

$$\begin{aligned}
& \boldsymbol{x}_{i} \mid G_{\boldsymbol{x}} \stackrel{\text{iid}}{\sim} \int N\left(\boldsymbol{\mu}_{\boldsymbol{x}}, \boldsymbol{\Sigma}_{\boldsymbol{x}}\right) d G_{\boldsymbol{x}}\left(\boldsymbol{\mu}_{\boldsymbol{x}}, \boldsymbol{\Sigma}_{\boldsymbol{x}}\right) \\
& \boldsymbol{y}_{i} \mid G_{\boldsymbol{y}} \stackrel{\text{iid}}{\sim} \int N\left(\boldsymbol{\mu}_{\boldsymbol{y}}, \boldsymbol{\Sigma}_{\boldsymbol{y}}\right) d G_{\boldsymbol{y}}\left(\boldsymbol{\mu}_{\boldsymbol{y}}, \boldsymbol{\Sigma}_{\boldsymbol{y}}\right)
\end{aligned}$$

$$G_{\boldsymbol{x}} \mid \alpha_{\boldsymbol{x}}, G_{\boldsymbol{x}_{0}} \sim D P\left(\alpha_{\boldsymbol{x}} G_{\boldsymbol{x}_{0}}\right)$$

and

$$G_{\boldsymbol{y}} \mid \alpha_{\boldsymbol{y}}, G_{\boldsymbol{y}_{0}} \sim D P\left(\alpha_{\boldsymbol{y}} G_{\boldsymbol{y}_{0}}\right)$$

where the baseline distributions, $G_{\boldsymbol{z}_{0}}$, $\boldsymbol{z}=\{\boldsymbol{x}, \boldsymbol{y}\}$, correspond to the conjugate normal-inverted-Wishart distribution

$$G_{\boldsymbol{z}_{0}} \equiv N_{k}\left(\boldsymbol{\mu}_{\boldsymbol{z}} \mid \boldsymbol{m}_{\boldsymbol{z} 1}, \kappa_{\boldsymbol{z} 0}^{-1} \boldsymbol{\Sigma}_{\boldsymbol{z}}\right) I W_{k}\left(\boldsymbol{\Sigma}_{\boldsymbol{z}} \mid \nu_{\boldsymbol{z} 1}, \boldsymbol{\Psi}_{\boldsymbol{z} 1}\right)$$

To complete the model specification, the model is extended by assuming independent hyper-priors,

$$\begin{gathered}
\alpha_{x} \sim \Gamma\left(a_{x 0}, b_{x 0}\right), \quad \alpha_{y} \sim \Gamma\left(a_{y 0}, b_{y 0}\right) \\
\boldsymbol{m}_{x 1} \mid \boldsymbol{m}_{x 2}, \boldsymbol{S}_{x 2} \sim N_{k}\left(\boldsymbol{m}_{x 2}, \boldsymbol{S}_{x 2}\right), \quad \boldsymbol{m}_{y 1} \mid \boldsymbol{m}_{y 2}, \boldsymbol{S}_{y 2} \sim N_{k}\left(\boldsymbol{m}_{y 2}, \boldsymbol{S}_{y 2}\right) \\
\kappa_{x 0} \mid \tau_{x 1}, \tau_{x 2} \sim \Gamma\left(\tau_{x 1} / 2, \tau_{x 2} / 2\right), \quad \kappa_{y 0} \mid \tau_{y 1}, \tau_{y 2} \sim \Gamma\left(\tau_{y 1} / 2, \tau_{y 2} / 2\right) \\
\boldsymbol{\Psi}_{x 1} \mid \nu_{x 2}, \boldsymbol{\Psi}_{x 2} \sim I W_{k}\left(\nu_{x 2}, \boldsymbol{\Psi}_{x 2}\right), \text{ and } \boldsymbol{\Psi}_{y 1} \mid \nu_{y 2}, \boldsymbol{\Psi}_{y 2} \sim I W_{k}\left(\nu_{y 2}, \boldsymbol{\Psi}_{y 2}\right)
\end{gathered}$$

The survival and ROC curves are estimated by using a Monte Carlo approximation to the posterior means $E\left(G_{\boldsymbol{x}} \mid \boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{n}\right)$ and $E\left(G_{\boldsymbol{y}} \mid \boldsymbol{y}_{1}, \ldots, \boldsymbol{y}_{m}\right)$, which is based on MCMC samples from the posterior predictive distribution for a future observation. The optimal cut-off point is based on the efficiency of the test and is built on Cohen’s kappa as defined in Kraemer (1992).

3.8. Median regression modeling

Consider regression data $\left(y_{i}, \boldsymbol{x}_{i}\right)$, $i=1, \ldots, n$, where $y_{i}$ is the response and $\boldsymbol{x}_{i}$ is a $p$-dimensional vector of predictors. By default, the PTlm function fits a median regression model using a scale MPT prior for the distribution of the errors (Hanson and Johnson, 2002),

$$\begin{gathered}
Y_{i}=\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}+V_{i} \\
\boldsymbol{\beta} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right) \\
V_{i} \mid G \stackrel{\text{iid}}{\sim} G
\end{gathered}$$

and

$$G \mid \alpha, \sigma^{2} \sim P T\left(\Pi^{\sigma^{2}}, \mathcal{A}^{\alpha}\right)$$

where the PT is centered around a $N\left(0, \sigma^{2}\right)$ distribution by taking each level $m$ of the partition $\Pi^{\sigma^{2}}$ to coincide with the $k / 2^{m}$, $k=0, \ldots, 2^{m}$, quantiles of the $N\left(0, \sigma^{2}\right)$ distribution. The family $\mathcal{A}^{\alpha}=\left\{\alpha_{\epsilon}: \epsilon \in E^{*}\right\}$, where $E^{*}=\bigcup_{m=1}^{\infty} E^{m}$ and $E^{m}$ is the $m$-fold product of $E=\{0,1\}$, was specified as $\alpha_{\epsilon_{1} \ldots \epsilon_{m}}=\alpha m^{2}$. To complete the model specification, independent hyperpriors are assumed,

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\sigma^{-2} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right)
\end{gathered}$$

Optionally, if frstlprob=FALSE (the default value is TRUE) is specified, a mean regression model is considered. In this case, the following PT prior is considered

$$G \mid \alpha, \mu, \sigma^{2} \sim P T\left(\Pi^{\mu, \sigma^{2}}, \mathcal{A}^{\alpha}\right)$$

where the PT is centered around a $N\left(\mu, \sigma^{2}\right)$ distribution. In this case, the intercept term is automatically excluded from the model and the hyperparameters for the normal prior for $\mu$ must be specified. The normal prior is given by

$$\mu \mid \mu_{b}, S_{b} \sim N\left(\mu_{b}, S_{b}\right)$$

In the computational implementation of the model, random-walk Metropolis steps are used to update the regression coefficients and hyperparameters.
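The construction of the partition used to center the PT prior in this section can be made concrete with a small Python sketch. It is illustrative only and not taken from the package: the function names are hypothetical, the level-$m$ endpoints are computed as the $k/2^{m}$ quantiles of the centering normal distribution, and the precision parameters follow the rule $\alpha m^{2}$ stated above.

```python
import math
from statistics import NormalDist


def pt_partition(level, mu=0.0, sigma2=1.0):
    """Endpoints of the 2**level partition sets at a given level.

    The endpoints are the k / 2**level quantiles, k = 0, ..., 2**level,
    of the N(mu, sigma2) centering distribution.
    """
    dist = NormalDist(mu, math.sqrt(sigma2))
    n_sets = 2 ** level
    inner = [dist.inv_cdf(k / n_sets) for k in range(1, n_sets)]
    return [-math.inf] + inner + [math.inf]


def pt_alpha(level, alpha):
    """Precision parameter shared by all sets at a given level: alpha * m**2."""
    return alpha * level ** 2


level_1 = pt_partition(1)                 # two sets split at the median
level_3 = pt_partition(3, sigma2=4.0)     # eight sets under a N(0, 4) centering
alphas = [pt_alpha(m, alpha=1.5) for m in (1, 2, 3)]
```

Under the median regression default, the level-1 split sits at zero, which is what ties the median of the error distribution to the centering distribution; the quadratic growth of the precision parameters across levels makes the deeper levels of the tree increasingly concentrated around the centering distribution.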

The current version of DPpackage considers models for related random probability distributions based on particular implementations of the dependent DP (DDP) proposed by MacEachern (1999, 2000), a natural generalization of the approach discussed by Müller et al. (1996) for nonparametric regression to the context of conditional density estimation, and the hierarchical mixture of DPM models (HDPM) proposed by Müller et al. (2004). These approaches and the associated functions are described next.

Linear dependent Dirichlet process

MacEachern (1999, 2000) proposes the DDP as an approach to define a prior model for an uncountable set of random measures indexed by a single continuous covariate, say $x$, $\left\{G_{x}: x \in \mathcal{X} \subset \mathbb{R}\right\}$. The key idea behind the DDP is to create an uncountable set of DPs (Ferguson, 1973) and to introduce dependence by modifying the stick-breaking representation of Sethuraman (1994) of each element in the set. If $G$ follows a DP prior with precision parameter $\alpha$ and base measure $G_{0}$, denoted by $G \sim D P\left(\alpha G_{0}\right)$, then the stick-breaking representation of $G$ is

$$G(B)=\sum_{l=1}^{\infty} \omega_{l} \delta_{\theta_{l}}(B)$$

where $B$ is a measurable set, $\delta_{a}(\cdot)$ is the Dirac measure at $a$, $\theta_{l} \mid G_{0} \stackrel{\text{iid}}{\sim} G_{0}$ and $\omega_{l}=V_{l} \prod_{j<l}\left(1-V_{j}\right)$, with $V_{l} \mid \alpha \stackrel{\text{iid}}{\sim} \operatorname{Beta}(1, \alpha)$. MacEachern (1999, 2000) generalizes (3) by assuming the point masses $\theta(x)_{l}$, $l=1, \ldots$, to be dependent across different levels of $x$, but independent across $l$.
De Iorio et al. (2004) and De Iorio et al. (2009) proposed a particular version of the DDP where the component of the atoms defining the location in a DDP mixture model follows a linear regression model $\theta_{l}(\boldsymbol{x})=\left(\boldsymbol{x}^{\prime} \boldsymbol{\beta}_{l}, \sigma_{l}^{2}\right)$, where $\boldsymbol{x}$ is a $p$-dimensional design vector. An advantage of this model for related random probability measures, referred to as the Linear DDP (LDDP), is that it can be represented as a DPM of linear (in the coefficients) regression models. This approach is implemented in the LDDPdensity function, where for the regression data $\left(y_{i}, \boldsymbol{x}_{i}\right)$, $i=1, \ldots, n$, the following model is considered

$$y_{i} \mid G \stackrel{\text{ind.}}{\sim} \int N\left(y_{i} \mid \boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}, \sigma^{2}\right) d G\left(\boldsymbol{\beta}, \sigma^{2}\right)$$

and

$$G \mid \alpha, G_{0} \sim D P\left(\alpha G_{0}\right)$$

where $G_{0} \equiv N_{p}\left(\boldsymbol{\beta} \mid \boldsymbol{\mu}_{b}, \boldsymbol{S}_{b}\right) \Gamma\left(\sigma^{-2} \mid \tau_{1} / 2, \tau_{2} / 2\right)$. The LDDP model specification is completed with the following hyper-priors

$$\begin{gathered}
\alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\
\tau_{2} \mid \tau_{s_{1}}, \tau_{s_{2}} \sim \Gamma\left(\tau_{s_{1}} / 2, \tau_{s_{2}} / 2\right) \\
\boldsymbol{\mu}_{b} \mid \boldsymbol{m}_{0}, \boldsymbol{S}_{0} \sim N_{p}\left(\boldsymbol{m}_{0}, \boldsymbol{S}_{0}\right)
\end{gathered}$$

and

$$\boldsymbol{S}_{b} \mid \nu, \boldsymbol{\Psi} \sim I W_{p}(\nu, \boldsymbol{\Psi})$$

The LDDPsurvival function implements this model in the context of survival data. Now let $y_{i}$ be the time to event for the $i$th subject. The LDDP mixture of survival models is given by

log⁡yi∣G∼ ind. ∫N(log⁡yi∣xi′β,σ2)dG(β,σ2)\log y_{i} \mid G \stackrel{\text { ind. }}{\sim} \int N\left(\log y_{i} \mid \boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}, \sigma^{2}\right) d G\left(\boldsymbol{\beta}, \sigma^{2}\right)

with the same hierarchical specification given above for the LDDPdensity function. Note that this function can deal with censored observations by using a data-augmented approach.
Finally, the LDDPrasch and LDDPraschpoisson functions consider this modeling strategy in a Rasch and Rasch Poisson model context, respectively, as in Fariña, Quintana, San Martín, and Jara (2009). Here the linear predictor is given by $\eta_{i j}=\theta_{i}-\beta_{j}$, where the abilities follow an LDDP mixture of normals model based on subject-specific covariates included in $\boldsymbol{x}_{i}$,

θi∣G∼ ind. ∫N(θi∣xi′β,σ2)dG(β,σ2)\theta_{i} \mid G \stackrel{\text { ind. }}{\sim} \int N\left(\theta_{i} \mid \boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}, \sigma^{2}\right) d G\left(\boldsymbol{\beta}, \sigma^{2}\right)

These functions fit a marginalized version of the models where the random probability measure $G$ is integrated out. Full inference on the conditional density (and on the survival and hazard functions in the case of the LDDPsurvival function) at a given covariate level is obtained using the $\epsilon$-DP approximation proposed by Muliere and Tardella (1998), with $\epsilon=0.01$.

Weight dependent Dirichlet process

Let $\boldsymbol{x}_{i}=\left(1, \boldsymbol{z}_{i}^{\prime}\right)^{\prime}$, where $\boldsymbol{z}_{i}$ is a $p$-dimensional vector of continuous predictors. The LDDP of the previous section defines a mixture model where the weights are independent of the predictors $\boldsymbol{z}$, given by

fz(⋅)=∑l=1∞ωlN(⋅∣β0l+z′βl,σl2)f_{\boldsymbol{z}}(\cdot)=\sum_{l=1}^{\infty} \omega_{l} N\left(\cdot \mid \beta_{0 l}+\boldsymbol{z}^{\prime} \boldsymbol{\beta}_{l}, \sigma_{l}^{2}\right)

where the weights $\omega_{l}$ follow a stick-breaking construction and $\left(\beta_{0 l}, \boldsymbol{\beta}_{l}, \sigma_{l}^{2}\right) \stackrel{iid}{\sim} G_{0}$. Motivated by regression problems with continuous predictors, different extensions have been proposed that make the weights depend on covariates (see, e.g., Griffin and Steel, 2006; Duan, Guindani, and Gelfand, 2007; Dunson, Pillai, and Park, 2007b; Dunson and Park, 2008), such that

fz(⋅)=∑l=1∞ωl(z)N(⋅∣β0l+z′βl,σl2)f_{\boldsymbol{z}}(\cdot)=\sum_{l=1}^{\infty} \omega_{l}(\boldsymbol{z}) N\left(\cdot \mid \beta_{0 l}+\boldsymbol{z}^{\prime} \boldsymbol{\beta}_{l}, \sigma_{l}^{2}\right)

An earlier approach that is related to the latter references and that also induces a weight-dependent DP model, as in expression (4), was discussed by Müller et al. (1996). These authors fitted a “standard” DPM of multivariate Gaussian distributions to the complete data di=(yi,zi)′,i=1,…,n\boldsymbol{d}_{i}=\left(y_{i}, \boldsymbol{z}_{i}\right)^{\prime}, i=1, \ldots, n, and looked at the induced conditional distributions. Although Müller et al. (1996) focused on the mean function only, m(z)=E(y∣z)m(\boldsymbol{z})=E(y \mid \boldsymbol{z}), their method can be easily extended to provide inferences for the conditional density at covariate level z\boldsymbol{z}, i.e. a “density regression” model in the spirit of Dunson et al. (2007b). The extension of the approach of Müller et al. (1996) for related probability measures is implemented in the DPcdensity function, where the model is given by

di∣G∼ ind. ∫Nk(di∣μ,Σ)dG(μ,Σ)\boldsymbol{d}_{i} \mid G \stackrel{\text { ind. }}{\sim} \int N_{k}\left(\boldsymbol{d}_{i} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}\right) d G(\boldsymbol{\mu}, \boldsymbol{\Sigma})

and

G∣α,G0∼DP(αG0)G \mid \alpha, G_{0} \sim D P\left(\alpha G_{0}\right)

where k=p+1k=p+1 is the dimension of the vector of complete data di\boldsymbol{d}_{i}, the baseline distribution G0G_{0} is the conjugate normal-inverted-Wishart (IW) distribution G0≡N2(μ∣m1,κ0−1Σ)IW2(Σ∣ν1,Ψ1)G_{0} \equiv N_{2}\left(\boldsymbol{\mu} \mid \boldsymbol{m}_{1}, \kappa_{0}^{-1} \boldsymbol{\Sigma}\right) I W_{2}\left(\boldsymbol{\Sigma} \mid \nu_{1}, \boldsymbol{\Psi}_{1}\right). To complete the model specification, the following hyper-priors are assumed

α∣a0,b0∼Γ(a0,b0)m1∣m2,S2∼N2(m2,S2)κ0∣τ1,τ2∼Γ(τ1/2,τ2/2)\begin{gathered} \alpha \mid a_{0}, b_{0} \sim \Gamma\left(a_{0}, b_{0}\right) \\ \boldsymbol{m}_{1} \mid \boldsymbol{m}_{2}, \boldsymbol{S}_{2} \sim N_{2}\left(\boldsymbol{m}_{2}, \boldsymbol{S}_{2}\right) \\ \kappa_{0} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right) \end{gathered}

and

Ψ1∣ν2,Ψ2∼IW2(ν2,Ψ2)\boldsymbol{\Psi}_{1} \mid \nu_{2}, \boldsymbol{\Psi}_{2} \sim I W_{2}\left(\nu_{2}, \boldsymbol{\Psi}_{2}\right)

This model induces a weight-dependent mixture model, as in expression (4), where the components are given by

ωl(z)=ωlNp(z∣μ2l,Σ22l)∑j=1∞ωjNp(z∣μ2j,Σ22j)β0l=μ1l−Σ12lΣ22l−1μ2lβl=Σ12lΣ22l−1\begin{gathered} \omega_{l}(\boldsymbol{z})=\frac{\omega_{l} N_{p}\left(\boldsymbol{z} \mid \boldsymbol{\mu}_{2 l}, \boldsymbol{\Sigma}_{22 l}\right)}{\sum_{j=1}^{\infty} \omega_{j} N_{p}\left(\boldsymbol{z} \mid \boldsymbol{\mu}_{2 j}, \boldsymbol{\Sigma}_{22 j}\right)} \\ \beta_{0 l}=\mu_{1 l}-\boldsymbol{\Sigma}_{12 l} \boldsymbol{\Sigma}_{22 l}^{-1} \boldsymbol{\mu}_{2 l} \\ \boldsymbol{\beta}_{l}=\boldsymbol{\Sigma}_{12 l} \boldsymbol{\Sigma}_{22 l}^{-1} \end{gathered}

and

σl2=σ11l2−Σ12lΣ22l−1Σ21l\sigma_{l}^{2}=\sigma_{11 l}^{2}-\boldsymbol{\Sigma}_{12 l} \boldsymbol{\Sigma}_{22 l}^{-1} \boldsymbol{\Sigma}_{21 l}

where the weights ωl\omega_{l} follow a DP stick-breaking construction and the remaining elements arise from the standard partition of the vectors of means and (co)variance matrices given by

μl=(μ1lμ2l), and Σl=(σ11l2Σ12lΣ21lΣ22l)\boldsymbol{\mu}_{l}=\left(\begin{array}{l} \mu_{1 l} \\ \boldsymbol{\mu}_{2 l} \end{array}\right), \text { and } \boldsymbol{\Sigma}_{l}=\left(\begin{array}{ll} \sigma_{11 l}^{2} & \boldsymbol{\Sigma}_{12 l} \\ \boldsymbol{\Sigma}_{21 l} & \boldsymbol{\Sigma}_{22 l} \end{array}\right)

respectively.
The DPcdensity function fits a marginalized version of the model where the random probability measure $G$ is integrated out. Full inference on the conditional density at covariate level $\boldsymbol{z}$ is obtained using the $\epsilon$-DP approximation proposed by Muliere and Tardella (1998), with $\epsilon=0.01$.
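To make the induced conditional model concrete, the sketch below computes the regression intercept, slope and residual variance implied by a single bivariate normal component, and the covariate-dependent weights for a finite mixture. It assumes $p=1$ and a finite number of components; the helper names `cond_component` and `cond_weights` and all numerical values are hypothetical and are not part of DPpackage.

```r
# Quantities induced by one bivariate normal component (p = 1):
# intercept, slope and residual variance of y given z.
cond_component <- function(mu, Sigma) {
  slope <- Sigma[1, 2] / Sigma[2, 2]
  c(beta0  = mu[1] - slope * mu[2],
    beta1  = slope,
    sigma2 = Sigma[1, 1] - Sigma[1, 2]^2 / Sigma[2, 2])
}

# Covariate-dependent weights: w_l * N(z | mu_2l, Sigma_22l), normalized.
cond_weights <- function(z, w, mu2, s22) {
  num <- w * dnorm(z, mean = mu2, sd = sqrt(s22))
  num / sum(num)
}

# Hypothetical two-component mixture
w   <- c(0.6, 0.4)
mu  <- list(c(1, 2), c(-1, 0))
Sig <- list(matrix(c(2, 1, 1, 4), 2), diag(2))

comp <- sapply(seq_along(w), function(l) cond_component(mu[[l]], Sig[[l]]))
wz   <- cond_weights(z = 1.5, w = w,
                     mu2 = sapply(mu, `[`, 2),
                     s22 = sapply(Sig, function(S) S[2, 2]))
comp
wz
```

Each column of `comp` holds the regression parameters of one component, and `wz` shows how the mixing weights shift toward the component whose predictor distribution is closest to the chosen value of `z`.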

Hierarchical mixture of Dirichlet process mixture of normals

The HDPMdensity function considers the hierarchical mixture of DPM of normals models for density estimation presented in Müller et al. (2004). Let $\boldsymbol{y}_{i j}$ be the $q$-dimensional vector of responses for the $j$th observation, $j=1, \ldots, n_{i}$, in the $i$th group, $i=1, \ldots, I$. The model assumes that

yi1,…,yini∣Fi∼iidFi\boldsymbol{y}_{i 1}, \ldots, \boldsymbol{y}_{i n_{i}} \mid F_{i} \stackrel{i i d}{\sim} F_{i}

where $F_{i}$ is assumed to arise as a mixture $F_{i}=\epsilon H_{0}+(1-\epsilon) H_{i}$ of one common distribution $H_{0}$ and a distribution $H_{i}$ that is specific, or idiosyncratic, to the $i$th group. The random probability measures $H_{i}$, $i=0,1, \ldots, I$, are in turn given a DPM of normals prior,

Hi(y)=∫Nq(y∣μ,Σ)dGi(μ)H_{i}(\boldsymbol{y})=\int N_{q}(\boldsymbol{y} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) d G_{i}(\boldsymbol{\mu})

with

$$G_{i} \mid \alpha_{i}, \boldsymbol{\mu}_{b}, \boldsymbol{\Sigma}_{b} \sim DP\left(\alpha_{i} N_{q}\left(\boldsymbol{\mu}_{b}, \boldsymbol{\Sigma}_{b}\right)\right)$$

The model specification is completed by assuming the following hyper-priors,

Σ∣ν,T∼IWq(ν,T)αi∣a0i,b0i∼Γ(a0i,b0i)μb∣m0,S0∼Nq(m0,S0)Σb∣νb,Tb∼IWq(νb,Tb)\begin{gathered} \boldsymbol{\Sigma} \mid \nu, \boldsymbol{T} \sim I W_{q}(\nu, \boldsymbol{T}) \\ \alpha_{i} \mid a_{0 i}, b_{0 i} \sim \Gamma\left(a_{0 i}, b_{0 i}\right) \\ \boldsymbol{\mu}_{b} \mid \boldsymbol{m}_{0}, \boldsymbol{S}_{0} \sim N_{q}\left(\boldsymbol{m}_{0}, \boldsymbol{S}_{0}\right) \\ \boldsymbol{\Sigma}_{b} \mid \nu_{b}, \boldsymbol{T}_{b} \sim I W_{q}\left(\nu_{b}, \boldsymbol{T}_{b}\right) \end{gathered}

and

ϵ∣π0,π1,aϵ,bϵ∼π0δ0+π1δ1+(1−π0−π1)β(aϵ,bϵ)\epsilon \mid \pi_{0}, \pi_{1}, a_{\epsilon}, b_{\epsilon} \sim \pi_{0} \delta_{0}+\pi_{1} \delta_{1}+\left(1-\pi_{0}-\pi_{1}\right) \beta\left(a_{\epsilon}, b_{\epsilon}\right)

where $\delta_{c}$ represents the Dirac measure at $c$, and $\beta(a, b)$ represents the beta distribution with parameters $a$ and $b$.
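The prior on $\epsilon$ mixes two point masses with a continuous beta component. As an illustration, the following sketch draws from such a prior; the function name `r_eps` and the hyper-parameter values are hypothetical choices and do not correspond to DPpackage code.

```r
# Draws from pi0 * delta_0 + pi1 * delta_1 + (1 - pi0 - pi1) * Beta(a, b).
r_eps <- function(n, pi0, pi1, a, b) {
  comp <- sample(c("zero", "one", "beta"), n, replace = TRUE,
                 prob = c(pi0, pi1, 1 - pi0 - pi1))
  out <- numeric(n)                 # "zero" draws stay at 0
  out[comp == "one"] <- 1
  is_beta <- comp == "beta"
  out[is_beta] <- rbeta(sum(is_beta), a, b)
  out
}

set.seed(2)
eps <- r_eps(1000, pi0 = 0.1, pi1 = 0.1, a = 2, b = 2)
mean(eps == 0)   # close to pi0
mean(eps == 1)   # close to pi1
```

Positive prior mass at 0 and 1 allows the posterior to favor the two limiting cases, where the groups share no common part or share all of it.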

The HDPMcdensity function extends the previously described approach to include continuous predictors $\boldsymbol{z}$. This function fits the HDPM model to the complete data $\boldsymbol{d}_{i}=\left(y_{i}, \boldsymbol{z}_{i}\right)^{\prime}$, $i=1, \ldots, n$, and reports the induced conditional distributions.

3.10. Generalized additive models

The PSgam function fits a generalized additive model (see, e.g., Hastie and Tibshirani, 1990) using penalized splines (see, e.g., Eilers and Marx, 1996; Lang and Brezger, 2004). The linear predictors $\eta_{i}$, $i=1, \ldots, n$, are modeled additively. Let $\boldsymbol{x}_{i}$ be a $p$-dimensional design vector and $\boldsymbol{z}_{i}$ be a $q$-dimensional vector of continuous predictors. Then, the model is given by

ηi=xi′β+∑j=1qfj(zij)\eta_{i}=\boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}+\sum_{j=1}^{q} f_{j}\left(z_{i j}\right)

where the effect $f_{j}$ of a covariate $z_{j}$ is approximated by a polynomial spline with equally spaced knots, written as a linear combination of B-spline basis functions. Specifically, the function $f_{j}$ is approximated by a spline of degree $l$ with $r$ equally spaced knots within the domain of $z_{j}$,

fj(zj)=∑m=1l+rbjmBjml(zj)f_{j}\left(z_{j}\right)=\sum_{m=1}^{l+r} b_{j m} B_{j m}^{l}\left(z_{j}\right)

where $B_{j m}^{l}(\cdot)$ are B-spline basis functions of degree $l$, and $b_{j m}$ are the associated B-spline coefficients. For the parametric component of the model, a normal prior distribution is assumed,

β∼Np(β0,Sβ0)\boldsymbol{\beta} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right)

For the vector of basis coefficients bj=(bj1,…,bj(l+r))T\boldsymbol{b}_{j}=\left(b_{j 1}, \ldots, b_{j(l+r)}\right)^{T}, independent Gaussian smoothness priors (Lang and Brezger, 2004) are assumed

p(bj∣σbj2)∝exp⁡(−12σbj2bj′Kjbj)p\left(\boldsymbol{b}_{j} \mid \sigma_{b j}^{2}\right) \propto \exp \left(-\frac{1}{2 \sigma_{b j}^{2}} \boldsymbol{b}_{j}^{\prime} \boldsymbol{K}_{j} \boldsymbol{b}_{j}\right)

The precision matrix acts as a penalty matrix that enforces smoothness and is defined through $\boldsymbol{K}_{j}=\boldsymbol{D}_{j}^{T} \boldsymbol{D}_{j}$, where $\boldsymbol{D}_{j}$ is a first- or second-order difference matrix for adjacent B-spline coefficients. The variance (or inverse smoothing) parameter $\sigma_{b j}^{2}$ controls the amount of smoothness. Note that the log-penalty corresponds exactly to the penalty term introduced by Eilers and Marx (1996) in a frequentist penalized likelihood setting. For the variance parameters, we assume independent inverse gamma priors

σbj−2∣τb1,τb2∼Γ(τb1/2,τb2/2)\sigma_{b j}^{-2} \mid \tau_{b 1}, \tau_{b 2} \sim \Gamma\left(\tau_{b 1} / 2, \tau_{b 2} / 2\right)

Finally, for the gamma and Gaussian models, an inverse gamma prior is assumed for the dispersion parameter $\sigma^{2}$,

σ−2∣τ1,τ2∼Γ(τ1/2,τ2/2)\sigma^{-2} \mid \tau_{1}, \tau_{2} \sim \Gamma\left(\tau_{1} / 2, \tau_{2} / 2\right)

The computational implementation is model-specific. For the Poisson, gamma, and binomial (logit) models, fixed and random effects are updated using MH steps with an IWLS normal proposal (see West, 1985; Gamerman, 1997). For the probit-Bernoulli model, the latent variable representation of the binary responses is used, leading to conjugate normal updates.
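The penalty structure of the smoothness prior can be illustrated with a few lines of R. This is a minimal sketch, assuming a cubic basis with 10 coefficients built with `splines::bs` on an evenly spaced grid (the basis size, degree and grid are hypothetical choices, and `bs` places interior knots at quantiles, which are equally spaced only because the grid is); it is not the code used inside PSgam.

```r
library(splines)

# B-spline basis on an evenly spaced grid (hypothetical settings).
z <- seq(0, 1, length.out = 200)
B <- bs(z, df = 10, degree = 3, intercept = TRUE)
m <- ncol(B)

# Second-order difference matrix for adjacent coefficients and K = D'D.
D <- diff(diag(m), differences = 2)   # (m - 2) x m
K <- crossprod(D)

# Coefficients that are linear in their index are not penalized.
b_linear <- seq_len(m)
penalty_linear <- drop(t(b_linear) %*% K %*% b_linear)

# Rough coefficients receive a large penalty.
b_rough <- rep(c(-1, 1), length.out = m)
penalty_rough <- drop(t(b_rough) %*% K %*% b_rough)

c(linear = penalty_linear, rough = penalty_rough)
```

Because `K` has a null space spanned by constant and linear coefficient sequences, the smoothness prior is improper on its own, which is why the prior density is stated only up to proportionality.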

3.11. Additional tools

Additional functions included in the package are DPelicit and PsBF. The DPelicit function implements methods for eliciting the DP prior using exact and approximated formulas for the mean and variance of the number of clusters given the total mass parameter and the number of subjects (see, e.g. Jara, García-Zattera, and Lesaffre, 2007). The PsBF function computes pseudo-Bayes factors for model comparison.
The practical implementation of models based on DP priors with a random precision parameter requires adopting values for the hyperparameters $a_{0}$ and $b_{0}$. The discrete nature of the DP realizations leads to their well-known clustering properties. The choice of $a_{0}$ and $b_{0}$ requires careful thought, as the parameter $\alpha$ directly controls the number of distinct components. Kottas, Müller, and Quintana (2005), referred to as the KMQ approach, and Jara et al. (2007), referred to as the JGL approach, proposed strategies for the specification of these hyperparameters.
The KMQ approach is based on approximations of the conditional mean and conditional variance of the number of clusters, given the precision parameter $\alpha$ (see, e.g., Liu, 1996). Specifically, denoting by $n$ the number of elements associated with the DP prior, and by $n^{*}$ the number of resulting clusters, their approach relies on

E(n∗∣α)=∑i=1nαα+i−1≈αlog⁡(α+nα)Var⁡(n∗∣α)=∑i=1nα(i−1)(α+i−1)2≈α{log⁡(α+nα)−1}\begin{gathered} E\left(n^{*} \mid \alpha\right)=\sum_{i=1}^{n} \frac{\alpha}{\alpha+i-1} \approx \alpha \log \left(\frac{\alpha+n}{\alpha}\right) \\ \operatorname{Var}\left(n^{*} \mid \alpha\right)=\sum_{i=1}^{n} \frac{\alpha(i-1)}{(\alpha+i-1)^{2}} \approx \alpha\left\{\log \left(\frac{\alpha+n}{\alpha}\right)-1\right\} \end{gathered}

Using the fact that a priori $E\left(\alpha \mid a_{0}, b_{0}\right)=\frac{a_{0}}{b_{0}}$ and $\operatorname{Var}\left(\alpha \mid a_{0}, b_{0}\right)=\frac{a_{0}}{b_{0}^{2}}$, the resulting expressions for the prior mean and variance of $n^{*}$ are

E(n∗)≈a0b0log⁡(1+nb0a0)E\left(n^{*}\right) \approx \frac{a_{0}}{b_{0}} \log \left(1+\frac{n b_{0}}{a_{0}}\right)

and

$$\operatorname{Var}\left(n^{*}\right) \approx \frac{a_{0}}{b_{0}} \log \left(1+\frac{n b_{0}}{a_{0}}\right)-\frac{a_{0}}{b_{0}}+\left\{\log \left(1+\frac{n b_{0}}{a_{0}}\right)-\frac{n b_{0}}{a_{0}+n b_{0}}\right\}^{2} \frac{a_{0}}{b_{0}^{2}}$$

On the other hand, the JGL approach is based on the exact values of the conditional mean and conditional variance of the number of clusters given the precision parameter $\alpha$. They noted that the approximations given by expressions (5) and (6) may be dangerous when $\alpha$ is considered a function of $n$. For instance, (5) gives 0 instead of 1 with $\alpha=\frac{1}{n}$. Better approximations may be obtained by noticing that

E(n∗∣α)=∑i=1nαα+i−1=α{ψ0(α+n)−ψ0(α)}E\left(n^{*} \mid \alpha\right)=\sum_{i=1}^{n} \frac{\alpha}{\alpha+i-1}=\alpha\left\{\psi_{0}(\alpha+n)-\psi_{0}(\alpha)\right\}

and

$$\operatorname{Var}\left(n^{*} \mid \alpha\right)=\sum_{i=1}^{n} \frac{\alpha(i-1)}{(\alpha+i-1)^{2}}=\alpha\left\{\psi_{0}(\alpha+n)-\psi_{0}(\alpha)\right\}+\alpha^{2}\left\{\psi_{1}(\alpha+n)-\psi_{1}(\alpha)\right\},$$

where $\psi_{0}(\cdot)$ and $\psi_{1}(\cdot)$ represent the digamma and trigamma functions, respectively. Using these results, an approximation based on a first-order Taylor series expansion, and the fact that a priori $E\left(\alpha \mid a_{0}, b_{0}\right)=\frac{a_{0}}{b_{0}}$ and $\operatorname{Var}\left(\alpha \mid a_{0}, b_{0}\right)=\frac{a_{0}}{b_{0}^{2}}$, we get

E(n∗)≈a0b0{ψ0(a0+nb0b0)−ψ0(a0b0)}E\left(n^{*}\right) \approx \frac{a_{0}}{b_{0}}\left\{\psi_{0}\left(\frac{a_{0}+n b_{0}}{b_{0}}\right)-\psi_{0}\left(\frac{a_{0}}{b_{0}}\right)\right\}

and

Var⁡(n∗)≈a0b0{ψ0(a0+nb0b0)−ψ0(a0b0)}+a02b02{ψ1(a0+nb0b0)−ψ1(a0b0)}+{a0b0[ψ1(a0+nb0b0)−ψ1(a0b0)]+ψ0(a0+nb0b0)−ψ0(a0b0)}2a0b02.\begin{aligned} & \operatorname{Var}\left(n^{*}\right) \approx \frac{a_{0}}{b_{0}}\left\{\psi_{0}\left(\frac{a_{0}+n b_{0}}{b_{0}}\right)-\psi_{0}\left(\frac{a_{0}}{b_{0}}\right)\right\}+\frac{a_{0}^{2}}{b_{0}^{2}}\left\{\psi_{1}\left(\frac{a_{0}+n b_{0}}{b_{0}}\right)-\psi_{1}\left(\frac{a_{0}}{b_{0}}\right)\right\}+ \\ & \left\{\frac{a_{0}}{b_{0}}\left[\psi_{1}\left(\frac{a_{0}+n b_{0}}{b_{0}}\right)-\psi_{1}\left(\frac{a_{0}}{b_{0}}\right)\right]\right. \\ & \left.+\psi_{0}\left(\frac{a_{0}+n b_{0}}{b_{0}}\right)-\psi_{0}\left(\frac{a_{0}}{b_{0}}\right)\right\}^{2} \frac{a_{0}}{b_{0}^{2}} . \end{aligned}

These expressions can be used to evaluate the robustness of the model to the specification of the prior distribution for the precision parameter. The function DPelicit computes either the expected value and the standard deviation of the number of clusters, given the parameters $a_{0}$ and $b_{0}$ of the gamma prior for the precision parameter $\alpha$, or the values of $a_{0}$ and $b_{0}$ that match a prior judgement about the expected number of clusters and its standard deviation. For the latter, the Newton-Raphson algorithm with a forward-difference approximation to the Jacobian is used.
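The conditional moments of the number of clusters are easy to evaluate numerically. The sketch below compares the exact sums, their digamma and trigamma representation, and the logarithmic approximation for a fixed value of $\alpha$. The helper names are hypothetical, and this is not the code used by DPelicit.

```r
# Exact conditional moments of the number of clusters, by direct summation.
exact_moments <- function(alpha, n) {
  i <- seq_len(n)
  c(mean = sum(alpha / (alpha + i - 1)),
    var  = sum(alpha * (i - 1) / (alpha + i - 1)^2))
}

# The same moments written with the digamma and trigamma functions.
polygamma_moments <- function(alpha, n) {
  d0 <- digamma(alpha + n) - digamma(alpha)
  d1 <- trigamma(alpha + n) - trigamma(alpha)
  c(mean = alpha * d0, var = alpha * d0 + alpha^2 * d1)
}

# Logarithmic approximations to the conditional mean and variance.
log_approx <- function(alpha, n) {
  l <- log((alpha + n) / alpha)
  c(mean = alpha * l, var = alpha * (l - 1))
}

n <- 500
res <- rbind(exact     = exact_moments(1, n),
             polygamma = polygamma_moments(1, n),
             logapprox = log_approx(1, n))
res
```

The first two rows agree up to floating-point error, while the third shows the size of the approximation error for the chosen $\alpha$ and $n$.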

4. Examples

In this section we consider the analyses of simulated and real-life data in order to illustrate the usage of DPpackage.

4.1. Bayesian density regression

We illustrate the DPcdensity and LDDPdensity functions by means of simulated data. We replicate the results reported by Dunson et al. (2007b), where a different approach is proposed. Following Dunson et al. (2007b), we simulate $n=500$ observations from a mixture of two normal linear regression models, with the mixture weights depending on the predictor, with different error variances and with a non-linear mean function for the second component,

yi∣xi∼ ind. exp⁡{−2xi}N(yi∣xi,0.01)+(1−exp⁡{−2xi})N(yi∣xi4,0.04),i=1,…,ny_{i} \mid x_{i} \stackrel{\text { ind. }}{\sim} \exp \left\{-2 x_{i}\right\} N\left(y_{i} \mid x_{i}, 0.01\right)+\left(1-\exp \left\{-2 x_{i}\right\}\right) N\left(y_{i} \mid x_{i}^{4}, 0.04\right), \quad i=1, \ldots, n

where the predictor values $x_{i}$ are simulated from a uniform distribution, $x_{i} \stackrel{iid}{\sim} U(0,1)$. The data were simulated using the following piece of code

#######################################################################
# true conditional densities,
# mean function and
# simulation of the data.
###########################################
dtrue <- function(grid, x)
{
        exp(-2*x)*dnorm(grid,mean=x,sd=sqrt (0.01))+
        (1-exp(-2*x))*dnorm(grid,mean=x^4,sd=sqrt (0.04))
}
mtrue <- function(x)
{
    exp(-2*x)*x+(1-exp(-2*x))*x^4
}
set.seed(0)
nrec <- 500
x <- runif(nrec)
y1 <- x + rnorm(nrec, 0, sqrt(0.01))
y2 <- x^4 + rnorm(nrec, 0, sqrt(0.04))
u <- runif(nrec)
prob <- exp(-2*x)
y <- ifelse(u<prob,y1,y2)
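As a quick sanity check on the functions defined above, the sketch below verifies numerically that `dtrue` is a proper density in `y` for a fixed predictor value and that its mean agrees with `mtrue`. The predictor value `x0` and the integration limits are arbitrary choices made for illustration.

```r
# True conditional density and mean function, as defined in the text.
dtrue <- function(grid, x) {
  exp(-2 * x) * dnorm(grid, mean = x, sd = sqrt(0.01)) +
    (1 - exp(-2 * x)) * dnorm(grid, mean = x^4, sd = sqrt(0.04))
}
mtrue <- function(x) {
  exp(-2 * x) * x + (1 - exp(-2 * x)) * x^4
}

x0 <- 0.25   # arbitrary predictor value

# The limits cover many standard deviations around both component means.
total_mass <- integrate(function(y) dtrue(y, x0), lower = -2, upper = 3)$value
cond_mean  <- integrate(function(y) y * dtrue(y, x0), lower = -2, upper = 3)$value

c(total_mass = total_mass, cond_mean = cond_mean, mtrue = mtrue(x0))
```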

The extension of the DPM of normals approach of Müller et al. (1996) implemented in the DPcdensity function was fitted using the following hyper-parameters: $a_{0}=10$, $b_{0}=1$, $\nu_{1}=\nu_{2}=4$, $\boldsymbol{m}_{2}=(\bar{y}, \bar{x})^{\prime}$, $\tau_{1}=6.01$, $\tau_{2}=3.01$, and $\boldsymbol{S}_{2}=\boldsymbol{\Psi}_{2}^{-1}=0.5 \boldsymbol{S}$, where $\boldsymbol{S}$ is the sample covariance matrix of the response and predictor. A total of 25,000 scans of the Markov chain cycle implemented in the DPcdensity function were completed. A burn-in period of 5,000 samples was discarded and the chain was subsampled every 4 iterates to obtain a final sample of size 5,000. The following commands were used to fit the model, where the conditional density estimates were evaluated on a grid of 100 points over the range of the response,

############################################################################
# prior information
############################################################################
    w <- cbind(y,x)
    wbar <- apply(w,2,mean)
    wcov <- var(w)
    prior <- list(a0=10,
    b0=1,
    nu1=4,
    nu2=4,
    s2=0.5*wcov,
    m2=wbar,
    psiinv2=2*solve(wcov),
    tau1=6.01,
    tau2=3.01)
#############################################################################
# mcmc specification
############################################################################
    mcmc <- list(nburn=5000,
        nsave=5000,
        nskip=3,
        ndisplay=1000)
#############################################################################
# covariate values where the density
# and mean function is evaluated
############################################################################
    xpred <- seq(0,1,0.02)
############################################################################
# fitting the model
############################################################################
    fitWDDP <- DPcdensity(y=y,x=x,
                        xpred=xpred,
                        ngrid=100,
                        prior=prior,
                        mcmc=mcmc,
                        state=NULL,
                        status=TRUE)
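The MCMC settings above can be related to the total number of scans quoted in the text. The following sketch assumes that `nskip` counts the scans discarded between consecutive saved draws, which is consistent with the description of the chain being subsampled every 4 iterates; this interpretation is inferred from the text rather than taken from the package source.

```r
# Relating the mcmc list to the total number of scans (assumed convention:
# nskip scans are discarded between saved draws, after nburn burn-in scans).
mcmc <- list(nburn = 5000, nsave = 5000, nskip = 3, ndisplay = 1000)

thinning    <- mcmc$nskip + 1
total_scans <- mcmc$nburn + mcmc$nsave * thinning

c(thinning = thinning, total_scans = total_scans)
```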

Using the same MCMC specification, the LDDP model was also fitted to the data. The LDDPdensity function was used to fit a mixture of B-splines models with $\boldsymbol{x}^{\prime} \boldsymbol{\beta}=\beta_{0}+\sum_{j=1}^{6} \psi_{j}(x) \beta_{j}$, where $\psi_{j}(x)$ corresponds to the $j$th B-spline basis function evaluated at $x$ as implemented in the bs function of the splines R package. The LDDP model was fitted using Zellner's g-prior (Zellner, 1983), with $g=10^{3}$. The following values for the hyper-parameters were considered: $a_{0}=10$, $b_{0}=1$, $\boldsymbol{m}_{0}=\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\prime} \boldsymbol{y}$, $\boldsymbol{S}_{0}=g\left(\boldsymbol{X}^{\prime} \boldsymbol{X}\right)^{-1}$, $\tau_{1}=6.01$, $\tau_{s 1}=6.01$, $\tau_{s 2}=2.01$, $\nu=9$, and $\boldsymbol{\Psi}^{-1}=\boldsymbol{S}_{0}$. The following piece of code was used to fit the model:

############################################################################
# prior information
############################################################################
    library(splines)
    W <- cbind(rep(1,nrec),bs(x,df=6))
    S0 <- 1000*solve(t (W) %*%W)
    m0 <- solve(t (W) %*%W) %*%t (W) %*%y
    prior<-list(a0=10,
        b0=1,

    m0=m0,
    S0=S0,
    tau1=6.01,
    taus1=6.01,
    taus2=2.01,
    nu=9,
    psiinv=solve(S0))
#############################################################################
# covariate values where the density
# and mean function is evaluated
############################################################################
    xpred <- seq(0,1,0.02)
    Wpred <- cbind(rep(1,length(xpred)),bs(xpred,df=6))
#############################################################################
# fitting the model
############################################################################
    fitLDDP <- LDDPdensity(formula=y~W-1,zpred=Wpred,
                        ngrid=100,
                        prior=prior,
                        mcmc=mcmc,
                        state=NULL,
                        status=TRUE)

Figures 1 and 2 show the true density, the estimated density and point-wise 95% HPD intervals for a range of values of the predictor for the WDDP and LDDP model, respectively. The estimates correspond approximately to the true densities in each case. The figures also display the plot of the data along with the estimated mean function, which is very close to the true one under both models.
In both functions, the posterior mean estimates and the limits of the point-wise 95% HPD intervals for the conditional density at each value of the predictors are stored in the model objects densp.m, and densp.l and densp.h, respectively. The following piece of code illustrates how these objects can be used to obtain the posterior estimates for $x=0.1$ in the LDDP model. This code was used to draw the plots displayed in Figures 1 and 2.

par(cex=1.5,mar=c(4.1, 4.1, 1, 1))
plot(fitLDDP$grid,fitLDDP$densp.h[6,],lwd=3,type="l",lty=2,
    main="",xlab="y",ylab="f(y|x)",ylim=c(0,4))
lines(fitLDDP$grid,fitLDDP$densp.l[6,],lwd=3,type="l",lty=2)
lines(fitLDDP$grid,fitLDDP$densp.m[6,],lwd=3,type="l",lty=1)
lines(fitLDDP$grid,dtrue(fitLDDP$grid,xpred[6]),lwd=3,
    type="l",lty=1,col="red")

Finally, both functions return the posterior mean estimates and the limits of the point-wise 95% HPD intervals for the mean function in the model objects meanfp.m, and meanfp.l and meanfp.h, respectively. The following piece of code was used to obtain the estimated mean function under the LDDP model along with the true function.

Figure 1: Simulated data - WDDP model: True conditional densities of $y \mid x$ (in red), posterior mean estimates (black continuous line) and point-wise 95% HPD intervals (black dashed lines) for: (a) $x=0.1$, (b) $x=0.25$, (c) $x=0.48$, (d) $x=0.76$, and (e) $x=0.88$. Panel (f) shows the data, along with the true and estimated mean regression curves.

Figure 2: Simulated data - LDDP model: True conditional densities of $y \mid x$ (in red), posterior mean estimates (black continuous line) and point-wise 95% HPD intervals (black dashed lines) for: (a) $x=0.1$, (b) $x=0.25$, (c) $x=0.48$, (d) $x=0.76$, and (e) $x=0.88$. Panel (f) shows the data, along with the true and estimated mean regression curves.

par(cex=1.5,mar=c(4.1, 4.1, 1, 1))
plot(x,y,xlab="x",ylab="y",main="")
lines(xpred,fitLDDP$meanfp.m,type="l",lwd=3,lty=1)
lines(xpred,fitLDDP$meanfp.l,type="l",lwd=3,lty=2)
lines(xpred,fitLDDP$meanfp.h,type="l",lwd=3,lty=2)
lines(xpred,mtrue(xpred),col="red",lwd=3)

4.2. Dependent random effects distributions

We consider data from the Chilean system for educational quality measurement (Sistema de Medición de la Calidad de la Educación, SIMCE). The Chilean education system is regularly subject to several performance evaluations at the school, teacher and student level. In the last case, SIMCE has developed mandatory census-type tests to regularly assess educational progress at three stages: 4th and 8th grades in primary school (9- and 13-year-old children, respectively), and 2nd grade in secondary school (16-year-old children). The SIMCE instruments are designed to assess the achievement of fundamental goals and minimal contents of the curricular frame in different areas of knowledge, currently Spanish, mathematics and science. Here we focus on data from the math test applied in 2004 to 8th-grade examinees in primary school. The test consists of 45 multiple-choice items with 4 alternatives each. The response $y_{i j} \in\{0,1\}$ is a binary variable indicating whether individual $i$ answers item $j$ correctly.
The main purpose of collecting these data is to monitor standards and progress of educational systems, focusing on characterizing the population (and its evolution) rather than individual examinees. It is of particular interest to understand the way in which some factors at individual and/or school level could explain systematic differences in the performance of students in order to establish policies to improve the education system. For instance, a significant characteristic of the Chilean elementary and secondary education system is a variety of different school types. These are grouped as Public I, financed by the state and administered by county governments; Public II, financed by the state and administered by county corporations; Private I, financed by the state and administered by the private sector; Private II, fee-paying schools that operate solely on payments from parents and administered by the private sector.
In order to evaluate the effect of the type of school and gender on the student performance we consider the LDDP mixture of normals prior for the abilities in a Rasch model as in Fariña et al. (2009). For illustration purposes, we consider a subset of 500 children. We refer to Fariña et al. (2009) for a full analysis of the complete data. The model is given by

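$$\begin{gathered} y_{i j} \mid \pi_{i j} \stackrel{ind.}{\sim} \operatorname{Bernoulli}\left(\pi_{i j}\right) \\ \operatorname{logit}\left(\pi_{i j}\right)=\theta_{i}-\beta_{j} \\ \theta_{i} \mid G \stackrel{ind.}{\sim} \int N\left(\theta_{i} \mid \boldsymbol{x}_{i}^{\prime} \boldsymbol{\beta}, \sigma^{2}\right) d G\left(\boldsymbol{\beta}, \sigma^{2}\right) \end{gathered}$$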

Here, $\boldsymbol{x}_{i}$ includes an intercept term, three dummy variables for the type of school and the gender indicator. The LDDP Rasch model was fitted using the LDDPrasch function and assuming $\boldsymbol{\beta} \sim N_{44}\left(\mathbf{0}, 10^{3} \boldsymbol{I}_{44}\right)$, $\alpha=1$, $\boldsymbol{\mu}_{0}=\mathbf{0}_{5}$, $\boldsymbol{S}_{0}=100 \boldsymbol{I}_{5}$, $\tau_{1}=6.01$, $\tau_{s 1}=6.01$, $\tau_{s 2}=2.01$, $\nu=8$, and $\boldsymbol{\Psi}=\boldsymbol{I}_{5}$. A single Markov chain of length 25,000 was completed. The full chain was sub-sampled every 4 steps after a burn-in period of 5,000 samples, to give a reduced chain of length 5,000. For each gender and type of school, the density of the abilities distribution was evaluated on a grid of 100 equally spaced points in the range $(-3,8)$. The following commands were used to fit the model,

############################################################################
# - 3 dummies for type of school.
# - gender indicator (1 = girl).
############################################################################
    1, 0, 1, 0, 0,
    1, 0, 0, 1, 0,
    1, 0, 0, 0, 1,
    1, 1, 0, 0, 0,
    1, 0, 1, 0, 1,
    nrow=8,ncol=5, byrow=T)
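############################################################################
# prior information
############################################################################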
    prior <- list(alpha=1,
    beta0=rep(0,44),
    Sbeta0=diag(1000,44),
    mu0=rep(0,5),
    S0=diag(100,5),
    tau1=6.01,
    taus1=6.01,
    taus2=2.01,
    nu=8,
    psiinv=diag(1,5))
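############################################################################
# mcmc specification
############################################################################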
    mcmc <- list(nburn=5000,
    nskip=3,
    ndisplay=1000,
    nsave=5000)
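############################################################################
# fitting the model
############################################################################
    fitLDDP <- LDDPrasch(formula=y ~ types+gender,
        prior=prior,
        mcmc=mcmc,
        state=NULL,
        status=TRUE,
        zpred=zpred,
        grid=seq(-3,8,len=100),
        compute.band=TRUE)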

Different shapes in the resulting posterior densities were observed. Figure 3 displays the posterior mean and point-wise 95% HPD interval for the random effects distribution for different combinations of the predictors. The density estimates show a clear departure from the commonly assumed normality of the random effects distributions. We found no important differences in the behavior of boys and girls. Children in Public I and II schools showed similar right-skewed random effects distributions. The estimated abilities distributions for children in private schools were shifted to the right in comparison with the distribution observed for children from public schools. This shift was more pronounced for children in fee-paying schools that operate solely on payments from parents and are administered by the private sector (Private II) than for those from schools financed by the state and administered by the private sector (Private I). A bimodal random effects distribution was observed in the abilities distributions from private schools.

4.3. Proportional hazards regression with nonparametric frailties

Consider right-censored survival data where failure times are repeatedly observed within a group or subject. Let $i=1, \ldots, n$ denote the strata over which repeated times-to-event are recorded, and $j=1, \ldots, n_{i}$ denote the repeated observations within stratum $i$. The data are denoted $\left\{\left(\boldsymbol{w}_{ij}, t_{ij}, \delta_{ij}\right): i=1, \ldots, n ;\ j=1, \ldots, n_{i}\right\}$, where $t_{ij}$ is the recorded event time, $\delta_{ij}=1$ if $t_{ij}$ is an observed failure time and $\delta_{ij}=0$ if the failure time is right censored at $t_{ij}$, and $\boldsymbol{w}_{ij}$ is a $p$-dimensional vector of covariates.
Functions fitting generalized linear mixed models (PTglmm, DPglmm, and DPMglmm) can be used to fit the Cox proportional hazards model (Cox, 1972) with nonparametric, multivariate frailties. Briefly, the baseline hazard function $\lambda_{0}(t)$ corresponds to an individual with covariates $\boldsymbol{w}=\mathbf{0}$ and survival time $T_{0}$. Given that the baseline individual has survived up to time $t$, that is, $T_{0} \geq t$, the baseline hazard is the instantaneous rate at which failure occurs in the next instant. In terms of the baseline survival function $S_{0}(t)=P\left(T_{0}>t\right)$ and density $f_{0}(t)$, this is given by

$$
\lambda_{0}(t)=\lim _{\epsilon \rightarrow 0^{+}} \frac{P\left(t \leq T_{0}<t+\epsilon \mid T_{0} \geq t\right)}{\epsilon}=\frac{f_{0}(t)}{S_{0}(t)}
$$
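The identity between the limit and the ratio $f_{0}(t) / S_{0}(t)$ can be checked numerically. The sketch below uses a Weibull baseline purely for illustration; the shape and scale values, the evaluation point t0, and the step eps are arbitrary assumptions, not quantities from the paper.

```r
# Weibull baseline chosen only for illustration (arbitrary shape and scale)
shape <- 1.5
scale <- 2
f0 <- function(t) dweibull(t, shape = shape, scale = scale)
S0 <- function(t) pweibull(t, shape = shape, scale = scale, lower.tail = FALSE)
lambda0 <- function(t) f0(t) / S0(t)

t0  <- 1.3
eps <- 1e-6
# Finite-eps version of P(t <= T0 < t + eps | T0 >= t) / eps
lim_approx  <- (S0(t0) - S0(t0 + eps)) / S0(t0) / eps
# Known closed form of the Weibull hazard
closed_form <- (shape / scale) * (t0 / scale)^(shape - 1)

c(limit = lim_approx, ratio = lambda0(t0), closed = closed_form)
```

The three numbers agree up to the discretization error of the finite step, which illustrates why the hazard can be read as a conditional failure rate.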

The conditionally proportional hazards assumption stipulates that

$$
\lambda\left(t_{ij} \mid \boldsymbol{w}_{ij}\right)=\lambda_{0}\left(t_{ij}\right) \exp \left(\boldsymbol{w}_{ij}^{\prime} \boldsymbol{\gamma}+\theta_{i}\right)
$$

where $\boldsymbol{\theta}=\left(\theta_{1}, \ldots, \theta_{n}\right)^{\prime}$ are random effects, termed frailties in the survival literature. Often the frailties $\theta_{i}$, or exponentiated frailties $e^{\theta_{i}}$, are assumed to arise iid from some parametric distribution such as $N\left(0, \sigma^{2}\right)$, gamma, positive stable, etc. We consider a nonparametric MPT prior on the frailties below.
Figure 3: SIMCE data: Posterior estimates (mean and point-wise 95% HPD intervals) for the ability distribution for type of school and gender. The results for boys are shown in panels (a), (c), (e) and (g) for type of school Public I, Public II, Private I, and Private II, respectively. The results for girls are shown in panels (b), (d), (f) and (h) for type of school Public I, Public II, Private I, and Private II, respectively.

The specification is conditional because proportionality only holds for survival times within a given stratum $i$, not across strata unless the distribution of $\theta_{i}$ is positive stable (see, e.g., Qiou, Ravishanker, and Dey, 1999). Precisely, for individuals $j_{1}$ and $j_{2}$ within stratum $i$, evaluated at a common time $t$,

$$
\frac{\lambda\left(t \mid \boldsymbol{w}_{ij_{1}}\right)}{\lambda\left(t \mid \boldsymbol{w}_{ij_{2}}\right)}=\exp \left\{\left(\boldsymbol{w}_{ij_{1}}-\boldsymbol{w}_{ij_{2}}\right)^{\prime} \boldsymbol{\gamma}\right\}
$$
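The cancellation of the frailty within a stratum, and its failure across strata, can be illustrated with a few lines of code. The coefficient, frailties, constant baseline hazard, and the helper haz below are hypothetical values and names chosen only for this sketch.

```r
# Hypothetical values, for illustration only
coef_w  <- -1.1                  # effect of a single binary covariate
frailty <- c(-0.5, 0.8)          # frailties of two different strata
lambda0 <- function(t) rep(0.02, length(t))   # any baseline hazard would do

# Conditional hazard: baseline times exp(covariate effect + frailty)
haz <- function(t, w, frailty_i) lambda0(t) * exp(w * coef_w + frailty_i)

# Same stratum: baseline hazard and frailty cancel in the ratio
r_within <- haz(10, w = 1, frailty[1]) / haz(10, w = 0, frailty[1])
# Different strata: the ratio also depends on the frailty difference
r_across <- haz(10, w = 1, frailty[1]) / haz(10, w = 0, frailty[2])

c(within = r_within, across = r_across, exp_coef = exp(coef_w))
```

Only the within-stratum ratio equals the exponentiated covariate effect, which is the sense in which the proportional hazards assumption is conditional.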

Often the baseline hazard is assumed to be piecewise constant on a partition of $\mathbb{R}^{+}$ comprising $K$ intervals, yielding the piecewise exponential model. References are too numerous to list, but see Walker and Mallick (1997), Aslanidou, Dey, and Sinha (1998), and Qiou et al. (1999). Assume

$$
\lambda_{0}(t)=\sum_{k=1}^{K} \lambda_{k} I\left\{a_{k-1}<t \leq a_{k}\right\}
$$

where $a_{0}=0$ and $a_{K}=\infty$, although in practice $a_{K}=\max \left\{t_{ij}\right\}$ is sufficient. The prior hazard is specified by cutpoints $\left\{a_{k}\right\}_{k=0}^{K}$ and hazard values $\boldsymbol{\lambda}=\left(\lambda_{1}, \ldots, \lambda_{K}\right)^{\prime}$. If the prior on $\boldsymbol{\lambda}$ is taken to be independent gamma distributions, the model can approximate the gamma process on a fine mesh (Kalbfleisch, 1978). Regardless, the resulting model implies a Poisson likelihood for "data" $y_{ijk}$ taking values $y_{ijk}=0$ when $t_{ij} \notin\left(a_{k-1}, a_{k}\right]$ or $\delta_{ij}=0$, and $y_{ijk}=1$ when $t_{ij} \in\left(a_{k-1}, a_{k}\right]$ and $\delta_{ij}=1$, for $k=1, \ldots, K\left(t_{ij}\right)$, where $K(t)=\max \left\{k: a_{k-1}<t\right\}$ indexes the interval containing $t$. With regression coefficients $\boldsymbol{\gamma}$ and frailties $\boldsymbol{\theta}$ as above, the likelihood for $(\boldsymbol{\gamma}, \boldsymbol{\lambda}, \boldsymbol{\theta})$ is

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\gamma}, \boldsymbol{\lambda}, \boldsymbol{\theta}) & =\prod_{i=1}^{n} \prod_{j=1}^{n_{i}}\left[\prod_{k=1}^{K\left(t_{ij}\right)} e^{-\exp \left\{\log \left(\lambda_{k}\right)+\boldsymbol{w}_{ij}^{\prime} \boldsymbol{\gamma}+\theta_{i}\right\} \Delta_{ijk}}\right]\left[e^{\log \left\{\lambda_{K\left(t_{ij}\right)}\right\}+\boldsymbol{w}_{ij}^{\prime} \boldsymbol{\gamma}+\theta_{i}}\right]^{\delta_{ij}} \\
& \propto \prod_{i=1}^{n} \prod_{j=1}^{n_{i}} \prod_{k=1}^{K\left(t_{ij}\right)} p\left(y_{ijk} \mid \mu_{ijk}\right)
\end{aligned}
$$

where $p(y \mid \mu)$ is the probability mass function for a Poisson$(\mu)$ random variable, $\mu_{ijk}=\exp \left\{\log \left(\lambda_{k}\right)+\boldsymbol{w}_{ij}^{\prime} \boldsymbol{\gamma}+\theta_{i}\right\} \Delta_{ijk}$, and $\Delta_{ijk}=\min \left\{a_{k}, t_{ij}\right\}-a_{k-1}$. The proportionality holds because the only extra factor contributed by the Poisson mass functions, $\Delta_{ijK\left(t_{ij}\right)}^{\delta_{ij}}$, does not involve the parameters. Thus, the Cox model assuming a piecewise constant baseline hazard can be fitted in any software allowing for Poisson regression. Note that if covariates are time dependent as well, and change only at values included in $\left\{a_{k}\right\}_{k=0}^{K}$, the likelihood is trivially extended to include $\boldsymbol{w}_{ijk}$ above for $k=1, \ldots, K\left(t_{ij}\right)$ rather than $\boldsymbol{w}_{ij}$.
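The equivalence between the piecewise exponential and the Poisson likelihood can be verified for a single observation. In the sketch below, the cutpoints, hazard values, linear predictor eta (covariate effect plus frailty), and the observation itself are hypothetical numbers chosen for illustration.

```r
# Hypothetical cutpoints and hazards, for illustration only
a      <- c(0, 5, 12, 30)        # a_0, ..., a_K with K = 3
lambda <- c(0.05, 0.02, 0.04)    # piecewise constant baseline hazard
eta    <- -1.1 * 1 + 0.3         # covariate effect plus frailty
t_obs  <- 9                      # observed time
delta  <- 1                      # 1 = failure observed

K_t   <- max(which(a[-length(a)] < t_obs))   # interval containing t_obs
k     <- seq_len(K_t)
Delta <- pmin(a[k + 1], t_obs) - a[k]        # time at risk in each interval
mu    <- lambda[k] * exp(eta) * Delta        # Poisson means
y     <- as.numeric(k == K_t) * delta        # pseudo-responses

# Piecewise exponential log-likelihood contribution
loglik_pe   <- -sum(mu) + delta * (log(lambda[K_t]) + eta)
# Poisson log-likelihood of the pseudo-responses
loglik_pois <- sum(dpois(y, mu, log = TRUE))

# The difference is delta * log(Delta[K_t]), which is free of parameters
loglik_pois - loglik_pe
```

Because the two log-likelihoods differ only by a constant, any routine that fits Poisson regression with offset $\log \Delta_{ijk}$ maximizes (or samples from) the same function of the parameters.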
We consider data on $n=38$ kidney patients discussed by McGilchrist and Aisbett (1991). Each patient provides $n_{i}=2$ infection times, some of which are right censored. McGilchrist and Aisbett (1991) found that only gender was significant, and so we follow Aslanidou et al. (1998), Walker and Mallick (1997), Qiou et al. (1999), and Hemming and Shaw (2005) in considering only this covariate in what follows. We fitted the semiparametric proportional hazards regression model using a nonparametric prior for the frailties distribution. The following commands were used to prepare the data to fit the model. The original dataset, $d[i, j]$, is a 38 by 6 matrix, which for each row (from left to right) contains the subject indicator, $t_{i1}, \delta_{i1}, t_{i2}, \delta_{i2}$, and the gender indicator. Ten intervals were considered, with cutpoints $\left\{a_{1}, \ldots, a_{10}\right\}$ taken from the empirical distribution of the data.

############################################################
# function to make a vector of length 'len' with a 1 at
# position 'ind' and 0 elsewhere
############################################################
    onv <- function(ind,len)
    {
        onv <- rep(0,len)
        onv[ind] <- 1
        return(onv)
    }
############################################################
# Create data to fit Cox model using
# Poisson likelihood for piecewise
# exponential model.
############################################################
    # one row per infection time: subject id and gender
    newdat <- matrix(1:(38*2*2),nrow=38*2,ncol=2)
    tt <- rep(0,38*2)
    delta <- tt
    for(i in 1:38)
    {
        newdat[i*2-1,1] <- d[i,1]
        newdat[i*2-1,2] <- d[i,6]
        newdat[i*2 ,1] <- d[i,1]
        newdat[i*2 ,2] <- d[i,6]
        tt[i*2-1] <- d[i,2]
        delta[i*2-1] <- d[i,3]
        tt[i*2] <- d[i,4]
        delta[i*2] <- d[i,5]
    }
    y <- NULL
    mat <- NULL
    tot <- 0
    p <- ncol(newdat)
    off <- NULL
    n <- length(tt)
    intervals <- 10
    cutpoint <- quantile(tt,(1:intervals)/intervals,names=FALSE)
    for(i in 1:n)
    {
        # first interval: (0, cutpoint[1]]
        tot <- tot+1
        mat <- matrix(append(mat, c(newdat[i,1:p],onv(1,intervals))),
                      nrow=p+intervals, ncol=tot)
        off <- append(off, min(cutpoint[1],tt[i]))
        if(tt[i]<=cutpoint[1] && delta[i]==1)
        {
            y <- append(y,1)
        }
        else
        {
            y <- append(y,0)
        }
        # remaining intervals entered by observation i
        for(j in 1:(intervals-1))
        {
            if(tt[i]>cutpoint[j])
            {
                off <- append(off,min(cutpoint[j+1],
                                      tt[i])-cutpoint[j])
                tot <- tot+1
                mat <- matrix(append(mat, c(newdat[i,1:p],
                                            onv(j+1,intervals))),
                              nrow=p+intervals, ncol=tot)
                if(tt[i]<=cutpoint[j+1] && delta[i]==1)
                {
                    y <- append(y,1)
                }
                else
                {
                    y <- append(y,0)
                }
            }
        }
    }
    mat <- t(mat)
    id <- mat[,1]
    gender <- mat[,2]
    loghazard <- mat[,3:12]
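The expansion carried out by the loop above can also be written per observation. The following is a minimal sketch on toy data, not the kidney data; the helper expand_obs, the data frame toy, and the three cutpoints are hypothetical and serve only to show the structure of the long-format data.

```r
# Hypothetical helper (not part of DPpackage): expands one event time into
# one row per interval entered, with time at risk 'off' and pseudo-response 'y'
expand_obs <- function(id, t, delta, cutpoint) {
  lower <- c(0, cutpoint[-length(cutpoint)])   # left ends of the intervals
  k     <- which(lower < t)                    # intervals entered
  data.frame(id       = id,
             interval = k,
             off      = pmin(cutpoint[k], t) - lower[k],
             y        = as.numeric(delta == 1 & t <= cutpoint[k]))
}

# Toy data: three event times, the second one right censored
toy      <- data.frame(id = c(1, 1, 2), t = c(8, 16, 30), delta = c(1, 0, 1))
cutpoint <- c(10, 20, 30)

long <- do.call(rbind, lapply(seq_len(nrow(toy)), function(i)
  expand_obs(toy$id[i], toy$t[i], toy$delta[i], cutpoint)))
long
```

Each event time contributes as many rows as intervals it enters, the times at risk add up to the event time, and at most one pseudo-response per event time equals 1.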

We performed the analysis by applying the PTglmm function to the responses

$$
\boldsymbol{y}_{i}=\left(y_{i11}, \ldots, y_{i1K\left(t_{i1}\right)}, y_{i21}, \ldots, y_{i2K\left(t_{i2}\right)}\right)
$$

where $\boldsymbol{x}_{ij}$ is an 11-dimensional design vector containing the gender indicator and the indicators for the interval associated with the corresponding response. Finally, we set $\boldsymbol{\beta}=\left(\gamma^{\prime}, \boldsymbol{\lambda}^{\prime}\right)^{\prime}$, and assume

$$
\begin{gathered}
\boldsymbol{\beta} \mid \boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}} \sim N_{p}\left(\boldsymbol{\beta}_{0}, \boldsymbol{S}_{\boldsymbol{\beta}_{0}}\right) \\
\theta_{1}, \ldots, \theta_{n} \mid G \stackrel{\text{iid}}{\sim} G
\end{gathered}
$$

and

$$
G \sim PT^{M}\left(\Pi^{\sigma^{2}}, \mathcal{A}^{\alpha}\right)
$$

We consider an $M=5$ finite PT prior which was centered around a $N\left(0, \sigma^{2}\right)$ distribution and constrained to have median 0 (frstlprob=TRUE in the prior object below). The values for the hyperparameters $\boldsymbol{\beta}_{0}$ and $\boldsymbol{S}_{\boldsymbol{\beta}_{0}}$ were obtained from a penalized quasi-likelihood (PQL) fit using the glmmPQL function available from the MASS package (Venables and Ripley, 2002). The matrix $\boldsymbol{S}_{\boldsymbol{\beta}_{0}}$ was inflated by a factor of 100. The remaining hyperparameters were $a_{0}=b_{0}=1$, $\nu_{0}=3$, and $\boldsymbol{T}=\boldsymbol{I}_{1}$. Starting values for the model parameters were obtained from the PQL fit. A single Markov chain of length 25,000 was completed. After a burn-in period of 5,000 samples, the chain was sub-sampled every 4 steps, to give a reduced chain of length 5,000. The code for fitting the model using PTglmm was
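The run length quoted above can be reproduced from MCMC settings of the kind used elsewhere in this paper. The sketch below assumes that nskip counts the scans discarded between two saved draws, so that sub-sampling every 4 steps corresponds to nskip=3; the value of ndisplay is arbitrary and the object name mcmc_sketch is hypothetical.

```r
# Values implied by the description in the text; ndisplay is an arbitrary choice
mcmc_sketch <- list(nburn    = 5000,   # burn-in scans
                    nskip    = 3,      # scans discarded between saved draws (assumed meaning)
                    nsave    = 5000,   # saved draws
                    ndisplay = 1000)   # progress report frequency

# Total number of scans: burn-in plus (nskip + 1) scans per saved draw
total_scans <- with(mcmc_sketch, nburn + (nskip + 1) * nsave)
total_scans
```

Under this reading the total is 5,000 + 4 x 5,000 = 25,000 scans, matching the chain length stated in the text.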

############################################################
# prior information
# (fit0 denotes the PQL fit obtained with glmmPQL)
############################################################
    beta0 <- fit0$coefficients$fixed
    Sbeta0 <- vcov(fit0)
    prior <- list(M=5,
                  a0=1,
                  b0=1,
                  nu0=3,
                  tinv=diag(1,1),
                  mu=rep(0,1),
                  beta0=beta0,
                  Sbeta0=Sbeta0,
                  frstlprob=TRUE)
############################################################
# starting values from PQL estimation
############################################################
    beta <- fit0$coefficients$fixed
    b <- as.vector(fit0$coefficients$random$id)
    mu <- rep(0,1)
    sigma <- getVarCov(fit0)[1,1]
    state <- list(alpha=1,
                  beta=beta,
                  b=b,
                  mu=mu,
                  sigma=sigma)
############################################################
# fitting the model
############################################################
    fitPT <- PTglmm(fixed=y~gender+loghazard,
                    offset=log(off),
                    random=~1|id,
                    family=poisson(log),
                    prior=prior,
                    mcmc=mcmc,
                    state=state,
                    status=FALSE)
############################################################
# posterior inferences
############################################################
    summary(fitPT)
############################################################
# frailties density estimate
############################################################
    predPT <- PTrandom(fitPT,predictive=TRUE,
                       gridl=c(-2.3,2.3))
    plot(predPT)

The abridged output is given below. The output lists the estimated effect for gender, $\hat{\beta}_{1}=-1.13$, followed by $K=10$ estimated log-hazard values. Notice that the intercept term in the posterior information for the "fixed" effects (regression coefficients in the output) corresponds to the mean of the frailties distribution $G$. The posterior median estimate of the centering variance was $\hat{\sigma}^{2}=0.35$, close to the posterior median of the frailties variance (0.33). Further, the posterior median (95% credible interval) for $\alpha$ was 0.75 (0.04; 3.77). The trace plots of the parameters (not shown) indicate good mixing of the chain. The acceptance rates for the MH steps associated with the regression coefficients, frailties, centering variance and precision parameter were 36, 61, 43 and 46%, respectively. Notice that the 0 values for the acceptance rates in the output correspond to the centering mean, which is held fixed here, and to the decomposition of the centering covariance matrix. The latter is only sampled for dimensions greater than or equal to 2.
Walker and Mallick (1997) analyzed these data with a piecewise exponential model and frailties following a Polya tree with fixed centering variance, $PT^{8}\left(\Pi^{100}, \mathcal{A}^{0.1}\right)$, and found $\hat{\beta}_{1}=-1.0$. McGilchrist and Aisbett (1991) obtained $\hat{\beta}_{1}=-1.8$, but with other nonsignificant covariates included. Aslanidou et al. (1998) also reported $\hat{\beta}=-1.0$. Hemming and Shaw (2005) obtained $\hat{\beta}=-1.7$, and Qiou et al. (1999) obtained $\hat{\beta}=-1.1$ under positive stable and $\hat{\beta}=-1.6$ under gamma frailties. The deviance information criterion (DIC), as presented by Spiegelhalter, Best, Carlin, and Van der Linde (2002), was 398 for both the PT and the normal model (not shown), so the normal model does about the same from a predictive standpoint based on the DIC.

Bayesian semiparametric generalized linear mixed effect model
Call:
PTglmm.default(fixed = y ~ gender + loghazard, random = ~1 |
    id, family = poisson(log), offset = log(off), prior = prior,
    mcmc = mcmc, state = state, status = FALSE)
Posterior Predictive Distributions (log):
    Min. 1st Qu. Median Mean 3rd Qu. Max.
-5.99200 -0.22250 -0.10970 -0.48500 -0.05714 -0.01381
Model's performance:
    Dbar Dhat pD DIC LPML
    379.21 360.63 18.58 397.79 -200.29
Regression coefficients:
    Mean Median Std. Dev. Naive Std.Error 95%CI-Low 95%CI-Upp
(Intercept) -0.0004443 0.0015210 0.0960076 0.0013578 -0.2066125 0.2021371
gender -1.1321281 -1.1296717 0.3219508 0.0045531 -1.7762785 -0.5117994
loghazard1 -4.2608268 -4.2375512 0.4412274 0.0062399 -5.1598904 -3.4611046
loghazard2 -3.7898628 -3.7638395 0.5018976 0.0070979 -4.8383288 -2.8794989
loghazard3 -3.9792281 -3.9691425 0.4556631 0.0064440 -4.9028932 -3.1213276
loghazard4 -3.0627136 -3.0526713 0.4526581 0.0064016 -4.0124879 -2.2353213
loghazard5 -3.2581084 -3.2477986 0.4219626 0.0059675 -4.1039312 -2.4603991
loghazard6 -3.9951390 -3.9805448 0.4544001 0.0064262 -4.9103962 -3.1403702
loghazard7 -4.9343777 -4.9183270 0.5365962 0.0075886 -6.0496817 -3.9150135
loghazard8 -3.6883152 -3.6845014 0.4479935 0.0063356 -4.5692123 -2.8232222
loghazard9 -3.6723423 -3.6673231 0.4810002 0.0068024 -4.6112294 -2.7315973
loghazard10 -4.1246955 -4.1272752 0.4966618 0.0070239 -5.0749243 -3.1886274
Baseline distribution:
    Mean Median Std. Dev. Naive Std.Error 95%CI-Low 95%CI-Upp
mu-(Intercept) 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
sigma-(Intercept) 0.430385 0.354618 0.294752 0.004168 0.119319 1.212674
Precision parameter:
    Mean Median Std. Dev. Naive Std.Error 95%CI-Low 95%CI-Upp
alpha 1.05875 0.75117 1.02204 0.01445 0.04448 3.76967
Random effects variance:
    Mean Median Std. Dev. Naive Std.Error 95%CI-Low 95%CI-Upp
R.E.Cov-(Intercept) 0.378637 0.331281 0.222121 0.003141 0.096121 0.948495
Acceptance Rate for Metropolis Steps = 0.3570935 0.6072718 0 0.428972 0.463486 0
Number of Observations: 413
Number of Groups: 38
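The model-fit summaries and acceptance rates in the output can be cross-checked with a few lines of arithmetic. The sketch below uses the definitions of Spiegelhalter et al. (2002), $p_{D}=\bar{D}-\hat{D}$ and $\mathrm{DIC}=\bar{D}+p_{D}$, applied to the numbers printed above; the variable names are illustrative.

```r
# Values copied from the printed output
Dbar <- 379.21   # posterior mean deviance
Dhat <- 360.63   # deviance at the posterior mean

pD  <- Dbar - Dhat   # effective number of parameters
DIC <- Dbar + pD     # equivalently Dhat + 2 * pD
c(pD = pD, DIC = DIC)

# Acceptance rates from the output, expressed as rounded percentages
acc <- c(0.3570935, 0.6072718, 0, 0.428972, 0.463486, 0)
round(100 * acc)
```

The computed values reproduce the pD and DIC columns of the output, and the non-zero acceptance rates round to 36, 61, 43 and 46 percent.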

Figure 4 shows the estimated frailty distribution from these data along with the posterior mean of the frailty term for each patient. The distribution is remarkably Gaussian-shaped, in contrast to the analysis presented in Walker and Mallick (1997), which showed two well-defined density modes corresponding to men and women. We were unable to duplicate this result across several sets of hyperprior values, including the consideration of $PT^{8}\left(\Pi^{100}, \mathcal{A}^{0.1}\right)$. In retrospect, this is not surprising. Two well-separated modes would typically indicate an omitted covariate, yet gender was included as a risk factor in the model.
Finally, Figure 5 shows the posterior median and 95% credible interval for survival curves for males and females, taking the individual-level heterogeneity modeled through the frailty distribution into account.

5. Concluding remarks

Because the main obstacle for the practical use of BSP and BNP methods has been the lack of estimation tools, we presented an R package for fitting some frequently used models. Until the release of DPpackage, the two options for researchers who wished to fit a BSP or BNP model were to write their own code or to rely heavily on particular parametric approximations to some specific processes using the BUGS code given in Peter Congdon's books (see, e.g., Congdon, 2001). DPpackage is geared primarily towards users who are not willing to bear the costs associated with either of these options.

Figure 4: Kidney data: Posterior mean of the frailty distribution. The density is overlaid on a plot of the posterior mean of the individual frailty terms.

Figure 5: Kidney data: Posterior estimates (median and point-wise 95% credible intervals) for the survival function for time to infection. The results for males and females are shown in panels (a) and (b), respectively.
Chambers (2000) conceptualized statistical software as a set of tools to organize, analyze and visualize data. In DPpackage, data organization and visualization of results are based on R's capabilities. Chambers (2000) also proposed requirements and guidelines for developing and assessing statistical software. These requirements may be discussed with respect to DPpackage:

1. Easy specification of simple tasks: The documentation contains examples, and similar problems can be analyzed by moderate modifications of the model description files. The examples have been chosen so that they demonstrate the functionality of DPpackage with well-known data sets.
2. Gradual refinement of the tasks: The user can enhance a nonparametric model by adding covariates, and by fixing part of the baseline distributions and the precision parameters.
3. Arbitrarily extensive programming: DPpackage has a programming environment for implementing sophisticated proposal distributions, if the default proposals are not sufficient.
4. Implementing high-quality computations: Because the source code in a compiled language is available, new procedures can be added and the old ones modified to improve performance and flexibility.
5. Embedding the results of items 2-4 as new simple tools: DPpackage has the capability of continuing a Markov chain from the last value of the parameters of a previous analysis. As the MCMC samples are saved in matrix objects, both parts of the Markov chain can be easily merged.
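Merging the saved draws of an initial run and its continuation amounts to stacking two matrices. The sketch below shows only this merging step, using simulated stand-in matrices; the object names, dimensions, and column labels are hypothetical. How the continuation itself is started (for example, passing the final state of the first run together with status=FALSE, as in the PTglmm call shown earlier) depends on the fitting function.

```r
# Stand-in matrices playing the role of saved MCMC draws from two runs
# (rows = saved draws, columns = parameters); real fits would supply these
set.seed(1)
par_names  <- c("gender", "sigma", "alpha")
draws_run1 <- matrix(rnorm(5000 * 3), ncol = 3, dimnames = list(NULL, par_names))
draws_run2 <- matrix(rnorm(2000 * 3), ncol = 3, dimnames = list(NULL, par_names))

# Stack the continuation below the first run, keeping the draw order
merged <- rbind(draws_run1, draws_run2)
dim(merged)

# Posterior summaries are then computed from the merged matrix
apply(merged, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Keeping the draws as plain matrices means that standard tools for posterior summaries and convergence diagnostics can be applied to the merged chain without further conversion.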

Many improvements to the current status of the package can be made. For example, all DPpackage modeling functions compute CPOs for model comparison. However, only some of them compute the effective number of parameters $p_{D}$ and DIC, as presented by Spiegelhalter et al. (2002). These and other model comparison criteria will be included for all functions in future versions of DPpackage.
The implementation of more models, the development of general-purpose sampling functions, real-time visualization of simulation progress, and the ability to handle large datasets through the use of sparse matrix techniques (George and Liu, 1981) are topics for future improvements.

6. Acknowledgments

The first author is supported by Fondecyt grant 3095003. Partial support from the KUL-PUC bilateral (Belgium-Chile) grant BIL05/03 and of the IAP research network grant Nr P6/03 of the Belgian government (Belgian Science Policy) for previous versions of DPpackage is also acknowledged. The work of the second author was supported in part by NIH grant 2-R01-CA95955-05. The third author was partially supported by grant Fondecyt 1060729. The last two authors were partially supported by grant NIH/NCI R01CA75981. The SIMCE Office from the Chilean Government kindly allowed us access to the databases used in this paper.

References

Albert JH, Chib S (1993). “Bayesian analysis of binary and polychotomous response data.” Journal of the American Statistical Association, 88, 669-679.

Antoniak CE (1974). “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.” The Annals of Statistics, 2, 1152-1174.

Aslanidou H, Dey DK, Sinha D (1998). “Bayesian analysis of multivariate survival data using Monte Carlo methods.” Canadian Journal of Statistics, 26, 33-48.

Besag J, Green P, Higdon D, Mengersen K (1995). “Bayesian computation and stochastic systems (with Discussion).” Statistical Science, 10, 3-66.

Branscum A, Hanson T (2008). “Bayesian nonparametric meta-analysis using Polya tree mixture models.” Biometrics, 64, 825-833.

Bush CA, MacEachern SN (1996). “A semiparametric Bayesian model for randomised block designs.” Biometrika, 83, 275-285.

Carlin BP, Louis TA (2008). Bayesian methods for data analysis, 3rd Ed. Chapman and Hall/CRC, New York, USA.

Chambers JM (2000). “Users, programmers, and statistical software.” Journal of Computational and Graphical Statistics, 9(3), 402-422.

Chen MH, Shao QM (1999). “Monte Carlo estimation of Bayesian credible and HPD intervals.” Journal of Computational and Graphical Statistics, 8, 69-92.

Christensen R, Hanson T, Jara A (2008). “Parametric nonparametric statistics: An introduction to mixtures of finite Polya trees.” The American Statistician, 62, 296-306.

Congdon P (2001). Bayesian statistical modelling. John Wiley and Sons, New York, USA.

Cox DR (1972). “Regression models and life-tables (with Discussion).” Journal of the Royal Statistical Society, Series B, 34, 187-220.

De Boeck P, Wilson M (2004). Explanatory item response models. A generalized linear and nonlinear approach. Springer, New York, USA.

De Iorio M, Johnson WO, Müller P, Rosner GL (2009). “Bayesian nonparametric non-proportional hazards survival modelling.” Biometrics, 65, 762-771.

De Iorio M, Müller P, Rosner GL, MacEachern SN (2004). “An ANOVA model for dependent random measures.” Journal of the American Statistical Association, 99, 205-215.

Dey D, Müller P, Sinha D (1998). Practical nonparametric and semiparametric Bayesian statistics. Springer, New York, USA.

Doss H (1994). “Bayesian nonparametric estimation for incomplete data via successive substitution sampling.” The Annals of Statistics, 22, 1763-1786.

Duan JA, Guindani M, Gelfand AE (2007). “Generalized spatial Dirichlet process models.” Biometrika, 94, 809-825.

Dunson D, Yang M, Baird D (2007a). “Semiparametric Bayes hierarchical models with mean and variance constraints.” Technical report, Department of Statistical Science, Duke University.

Dunson DB, Park JH (2008). “Kernel stick-breaking processes.” Biometrika, 95, 307-323.

Dunson DB, Pillai N, Park JH (2007b). “Bayesian density regression.” Journal of the Royal Statistical Society, Series B, 69, 163-183.

Eilers PHC, Marx BD (1996). “Flexible smoothing with B-splines and penalties.” Statistical Science, 11(2), 89-121.

Escobar MD (1994). “Estimating normal means with a Dirichlet process prior.” Journal of the American Statistical Association, 89, 268-277.

Escobar MD, West M (1995). “Bayesian density estimation and inference using mixtures.” Journal of the American Statistical Association, 90, 577-588.

Fariña P, Quintana FA, San Martín E, Jara A (2009). “A dependent semiparametric Rasch model for the analysis of Chilean educational data.” Technical report, Department of Statistics, Pontificia Universidad Católica de Chile.

Ferguson TS (1973). “A Bayesian analysis of some nonparametric problems.” Annals of Statistics, 1, 209-230.

Ferguson TS (1974). “Prior distribution on the spaces of probability measures.” Annals of Statistics, 2, 615-629.

Gamerman D (1997). “Sampling from the posterior distribution in generalized linear mixed models.” Statistics and Computing, 7, 57-68.

Gelfand AE, Kottas A (2002). “A computational approach for full nonparametric Bayesian inference under Dirichlet Process Mixture models.” Journal of Computational and Graphical Statistics, 11, 289-304.

George A, Liu JW (1981). Computer solution of large sparse positive definite systems. Prentice-Hall, New York, USA.

Ghosh JK, Ramamoorthi RV (2003). Bayesian nonparametrics. Springer, New York, USA.

Gilks WR, Thomas A, Spiegelhalter DJ (1994). “A language and program for complex Bayesian modelling.” The Statistician, 43, 169-178.

Green PJ (1995). “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.” Biometrika, 82, 711-732.

Griffin JE, Steel MFJ (2006). “Order-based dependent Dirichlet processes.” Journal of the American Statistical Association, 101, 179-194.

Hanson T (2006). “Inference for mixtures of finite Polya tree models.” Journal of the American Statistical Association, 101, 1548-1565.

Hanson T, Branscum A, Johnson W (2005). “Bayesian nonparametric modeling and data analysis: an introduction.” In DK Dey, CR Rao (eds.), “Bayesian Thinking: Modeling and Computation (Handbook of Statistics, volume 25),” pp. 245-278. Elsevier, Amsterdam, The Netherlands.

Hanson T, Johnson WO (2002). “Modeling regression error with a mixture of Polya trees.” Journal of the American Statistical Association, 97, 1020-1033.

Hanson T, Johnson WO (2004). “A Bayesian semiparametric AFT model for interval-censored data.” Journal of Computational and Graphical Statistics, 13, 341-361.

Hastie T, Tibshirani R (1990). Generalized additive models. Chapman and Hall, New York, USA.

Hemming K, Shaw JEH (2005). “A class of parametric dynamic survival models.” Lifetime Data Analysis, 11, 81-98.

Ishwaran H, James LF (2002). “Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information.” Journal of Computational and Graphical Statistics, 11, 508-532.

Jara A (2007). “Applied Bayesian non- and semi-parametric inference using DPpackage.” Rnews, 7, 17-26.

Jara A, García-Zattera MJ, Lesaffre E (2007). “A Dirichlet process mixture model for the analysis of correlated binary responses.” Computational Statistics and Data Analysis, 51, 5402-5415.

Jara A, Hanson T, Lesaffre E (2009). “Robustifying generalized linear mixed models using a new class of mixture of multivariate Polya trees.” Journal of Computational and Graphical Statistics, To appear.

Kalbfleisch JD (1978). “Nonparametric Bayesian analysis of survival time data.” Journal of the Royal Statistical Society, Series B, 40, 214-221.

Kleinman KP, Ibrahim JG (1998a). “A semi-parametric Bayesian approach to generalized linear mixed models.” Statistics in Medicine, 17, 2579-2596.

Kleinman KP, Ibrahim JG (1998b). “A semiparametric Bayesian approach to the random effects model.” Biometrics, 54, 921-938.

Kottas A, Müller P, Quintana F (2005). “Nonparametric Bayesian modeling for multivariate ordinal data.” Journal of Computational and Graphical Statistics, 14, 610-625.

Kraemer HC (1992). Evaluating medical tests. Sage Publications, New York, USA.

Lang S, Brezger A (2004). “Bayesian P-splines.” Journal of Computational and Graphical Statistics, 13, 183-212.

Lavine M (1992). “Some aspects of Polya tree distributions for statistical modeling.” The Annals of Statistics, 20, 1222-1235.

Lavine M (1994). “More aspects of Polya tree distributions for statistical modeling.” The Annals of Statistics, 22, 1161-1176.

Li Y, Müller P, Lin X (2007). “Center-adjusted inference for a nonparametric Bayesian random effect distribution.” Technical report, Department of Biostatistics, The MD Anderson Cancer Center.

Liu JS (1996). “Nonparametric hierarchical Bayes via sequential imputations.” The Annals of Statistics, 24, 911-930.

Lo AY (1984). “On a class of Bayesian nonparametric estimates I: Density estimates.” The Annals of Statistics, 12, 351-357.

MacEachern SN (1998). “Computational methods for mixture of Dirichlet process models.” In D Dey, P Müller, D Sinha (eds.), “Practical Nonparametric and Semiparametric Bayesian Statistics,” pp. 1-22. Springer.

MacEachern SN (1999). “Dependent nonparametric processes.” In “ASA Proceedings of the Section on Bayesian Statistical Science, Alexandria, VA,” American Statistical Association.

MacEachern SN (2000). “Dependent Dirichlet processes.” Technical report, Department of Statistics, The Ohio State University.

MacEachern SN, Müller P (1998). “Estimating mixture of Dirichlet Process models.” Journal of Computational and Graphical Statistics, 7(2), 223-338.

Mauldin RD, Sudderth WD, Williams SC (1992). “Polya trees and random distributions.” Annals of Statistics, 20, 1203-1221.

McGilchrist CA, Aisbett CW (1991). “Regression with frailty in survival analysis.” Biometrics, 47, 461-466.

Mukhopadhyay S, Gelfand AE (1997). “Dirichlet process mixed generalized linear models.” Journal of the American Statistical Association, 92, 633-647.

Muliere P, Tardella L (1998). “Approximating distributions of random functionals of Ferguson-Dirichlet priors.” The Canadian Journal of Statistics, 26, 283-297.

Müller P, Erkanli A, West M (1996). “Bayesian curve fitting using multivariate normal mixtures.” Biometrika, 83, 67-79.

Müller P, Quintana FA (2004). “Nonparametric Bayesian data analysis.” Statistical Science, 19, 95-110.

Müller P, Quintana FA, Rosner G (2004). “A method for combining inference across related nonparametric Bayesian models.” Journal of the Royal Statistical Society, Series B, 66, 735-749.

Müller P, Rosner GL (1997). “A Bayesian population model with hierarchical mixture priors applied to blood count data.” Journal of the American Statistical Association, 92, 1279-1292.

Neal R (2000). “Markov chain sampling methods for Dirichlet process mixture models.” Journal of Computational and Graphical Statistics, 9, 249-265.

Neal R (2003). “Slice sampling.” The Annals of Statistics, 31, 705-767.

Newton MA (1994). “Computing with priors that support identifiable semiparametric models.” Technical report, N. 905, University of Wisconsin-Madison, Department of Statistics.

Newton MA, Czado C, Chapell R (1996). “Bayesian inference for semiparametric binary regression.” Journal of the American Statistical Association, 91, 142-153.

Perron F, Mengersen K (2001). “Bayesian nonparametric modeling using mixtures of triangular distributions.” Biometrics, 57, 518-528.

Petrone S (1999a). “Bayesian density estimation using Bernstein polynomials.” The Canadian Journal of Statistics, 27, 105-126.

Petrone S (1999b). “Random Bernstein polynomials.” Scandinavian Journal of Statistics, 26, 373-393.

Petrone S, Wasserman L (2002). “Consistency of Bernstein polynomial posterior.” Journal of the Royal Statistical Society, Series B, 64, 79-100.

Plummer M, Best N, Cowles K, Vines K (2006). CODA: Output analysis and diagnostics for MCMC. R package version 0.10-7.

Qiou Z, Ravishanker N, Dey DK (1999). “Multivariate survival analysis with positive stable frailties.” Biometrics, 55, 81-88.

R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Rasch G (1960). Probabilistic models for some intelligence and attainment tests. The Danish Institute for Educational Research (Expanded Edition, 1980, The University of Chicago Press), Chicago, USA.

Rossi P, Allenby G, McCulloch R (2005). Bayesian statistics and marketing. John Wiley and Sons, New York, USA.

Rossi P, McCulloch R (2008). bayesm: Bayesian inference for marketing/micro-econometrics. R package version 2.2-2, URL http://faculty.chicagogsb.edu/peter.rossi/research/bsm.html.

San Martín E, Jara A, Rolin JM, Mouchart M (2007). “On the analysis of Bayesian semiparametric IRT-type models.” Submitted.

Sethuraman J (1994). “A constructive definition of Dirichlet prior.” Statistica Sinica, 4, 639-650.

Smith BJ (2007). “BOA: An R package for MCMC output convergence assessment and posterior inference.” Journal of Statistical Software, 21, 1-37.

Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A (2002). “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society, Series B, 64, 583-639.

Sturtz S, Ligges U, Gelman A (2005). “R2WinBUGS: A package for running WinBUGS from R.” Journal of Statistical Software, 12, 1-16.

Thomas A, O’Hara B, Ligges U, Sturtz S (2006). “Making BUGS open.” R News, 6, 12-17.

Tierney L (1994). “Markov chains for exploring posterior distributions.” The Annals of Statistics, 22, 1701-1762.

Venables WN, Ripley BD (2002). Modern applied statistics with S. Springer, New York, fourth edition. ISBN 0-387-95457-0, URL http://www.stats.ox.ac.uk/pub/MASS4.

Walker SG, Damien P, Laud PW, Smith AFM (1999). “Bayesian nonparametric inference for random distributions and related functions (with discussion).” Journal of the Royal Statistical Society, Series B, 61, 485-527.

Walker SG, Mallick BK (1997). “Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing.” Journal of the Royal Statistical Society, Series B, 59, 845-860.

West M (1985). “Generalized linear models: outlier accommodation, scale parameter and prior distributions.” In JM Bernardo, MH DeGroot, DV Lindley, AFM Smith (eds.), “Proceedings of the Second Valencia International Meeting,” North Holland, Amsterdam.

Zellner A (1983). “Applications of Bayesian analysis in econometrics.” The Statistician, 32, 23-34.

Affiliation:

Alejandro Jara
Department of Statistics
Universidad de Concepción
Avenida Esteban Iturra S/N
Barrio Universitario
Concepción, Chile
Telephone: +56-41-2203163
Fax: +56-41-2251529
E-mail: ajarav@udec.cl
URL: http://www2.udec.cl/~ajarav

Journal of Statistical Software	http://www.jstatsoft.org/
published by the American Statistical Association	http://www.amstat.org/
