A flexible Bayesian nonparametric model for preference-based clustering: toward a Netflix-like collaborative filtering algorithm to improve healthcare decisions

Clustering Survival Outcomes using Dirichlet Process Mixture

2014

Motivated by the national evaluation of mortality rates at kidney transplant centers in the United States, we sought to assess the long-term survival outcomes of transplant centers by applying a methodology developed in the Bayesian nonparametrics literature. We described a Dirichlet process model and a Dirichlet process mixture model with a Half-Cauchy prior for estimating the risk-adjusted effects of the transplant centers. To improve model performance and interpretability, we centered the Dirichlet process. We also proposed strategies to increase the model's classification ability. Finally, we derived statistical measures and created graphical tools to rate transplant centers and identify outlying centers with exceptionally good or poor performance. The proposed method was evaluated through simulation and then applied to assess kidney transplant centers from a national organ failure registry.
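For readers new to Dirichlet process priors, the sketch below shows a generic truncated stick-breaking draw of G ~ DP(alpha, G0) and how center effects drawn from G share atoms, which is what produces clusters of centers. It is an illustration under assumptions, not the centered DP of the paper: the base measure G0 = Normal(0, 1), alpha = 2, the truncation level of 50 sticks, and the function name `stick_breaking_dp` are all hypothetical choices.

```python
import random


def stick_breaking_dp(alpha, base_sampler, truncation, rng):
    """Truncated stick-breaking approximation G = sum_k w_k * delta(theta_k) of G ~ DP(alpha, G0).

    base_sampler(rng) draws one atom from the base measure G0 (an assumed input).
    The last stick absorbs the leftover mass so the weights sum to 1.
    """
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for k in range(truncation):
        v = rng.betavariate(1.0, alpha) if k < truncation - 1 else 1.0
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - v
    return weights, atoms


rng = random.Random(0)
w, theta = stick_breaking_dp(alpha=2.0, base_sampler=lambda r: r.gauss(0.0, 1.0), truncation=50, rng=rng)
# Hypothetical effects for 10 centers: each is an atom of G, so ties (clusters) are expected.
effects = [rng.choices(theta, weights=w)[0] for _ in range(10)]
```

Smaller values of alpha concentrate the weight on fewer atoms, so more centers end up sharing the same effect.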

A computational approach for full nonparametric Bayesian inference under Dirichlet process mixture models

2002

Widely used parametric generalized linear models are, unfortunately, a somewhat limited class of specifications. Nonparametric aspects are often introduced to enrich this class, resulting in semiparametric models. Focusing on single or k-sample problems, many classical nonparametric approaches are limited to hypothesis testing. Those that allow estimation are limited to certain functionals of the underlying distributions. Moreover, the associated inference often relies upon asymptotics, even though nonparametric specifications are often most appealing for smaller sample sizes. Bayesian nonparametric approaches avoid asymptotics but have, to date, been limited in the range of inference. Working with Dirichlet process priors, we overcome the limitations of existing simulation-based model fitting approaches, which yield inference that is confined to posterior moments of linear functionals of the population distribution. This article provides a computational approach to obtain the entire posterior distribution for more general functionals. We illustrate with three applications: investigation of extreme value distributions associated with a single population, comparison of medians in a k-sample problem, and comparison of survival times from different populations under fairly heavy censoring.
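To make "the entire posterior distribution of a general functional" concrete, here is a minimal sketch for the simplest case: a plain DP prior (no mixture) on the data distribution, where the posterior is again a DP, DP(alpha + n, (alpha G0 + sum of point masses at the data) / (alpha + n)). Drawing truncated realizations of that posterior and taking each one's median gives Monte Carlo draws from the posterior of the median. This conjugate shortcut is a swapped-in illustration, not the article's algorithm for DP mixtures; the helper names, base measure, alpha, and truncation level are hypothetical.

```python
import random


def weighted_median(weights, atoms):
    """Smallest atom at which the cumulative weight reaches 1/2."""
    total = 0.0
    for atom, weight in sorted(zip(atoms, weights)):
        total += weight
        if total >= 0.5:
            return atom
    return max(atoms)  # guard against floating-point shortfall


def dp_posterior_median_draws(data, alpha, base_sampler, n_draws=1000, truncation=200, seed=0):
    """Posterior draws of the median functional under a DP(alpha, G0) prior on the data distribution."""
    rng = random.Random(seed)
    post_alpha = alpha + len(data)
    p_base = alpha / post_alpha  # probability that an atom comes from G0 rather than the data

    def posterior_base(r):
        return base_sampler(r) if r.random() < p_base else r.choice(data)

    draws = []
    for _ in range(n_draws):
        weights, atoms, remaining = [], [], 1.0
        for k in range(truncation):
            v = rng.betavariate(1.0, post_alpha) if k < truncation - 1 else 1.0
            weights.append(remaining * v)
            atoms.append(posterior_base(rng))
            remaining *= 1.0 - v
        draws.append(weighted_median(weights, atoms))
    return draws


data = [float(i) for i in range(1, 21)]
draws = dp_posterior_median_draws(data, alpha=1.0, base_sampler=lambda r: r.gauss(10.0, 5.0), n_draws=200)
```

Any other functional (a quantile, a tail probability, a difference of medians between two samples) can be swapped in for `weighted_median` in the same loop.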

Bayesian nonparametric predictive modeling of group health claims

Insurance: Mathematics and Economics, 2015

Models commonly employed to fit current claims data and predict future claims are often parametric and relatively inflexible. An incorrect model assumption can cause model misspecification which leads to reduced profits at best and dangerous, unanticipated risk exposure at worst. Even mixture models may not be sufficiently flexible to properly fit the data. Using a Bayesian nonparametric model instead can dramatically improve claim predictions and consequently risk management decisions in group health practices. The improvement is significant in both simulated and real data from a major health insurer's medium-sized groups. The nonparametric method outperforms a similar Bayesian parametric model, especially when predicting future claims for new business (entire groups not in the previous year's data). In our analysis, the nonparametric model outperforms the parametric model in predicting costs of both renewal and new business. This is particularly important as healthcare costs rise around the world.

Semiparametric Bayesian models for clustering and classification in the presence of unbalanced in-hospital survival

Journal of the Royal Statistical Society: Series C (Applied Statistics), 2013

Bayesian semiparametric logit models are fitted to grouped data related to in-hospital survival outcome of patients hospitalized with an ST-segment elevation myocardial infarction diagnosis. Dependent Dirichlet process priors are considered for modelling the random-effects distribution of the grouping factor (hospital of admission), to provide a cluster analysis of the hospitals. The clustering structure is highlighted through the optimal random partition that minimizes the posterior expected value of a suitable loss function. There are two main goals of the work: to provide model-based clustering and ranking of the providers according to the similarity of their effect on patients' outcomes, and to make reliable predictions on the survival outcome at the patient's level, even when the survival rate itself is strongly unbalanced. The study is within a project, named the 'Strategic program of Regione Lombardia', and is aimed at supporting decisions in healthcare policies.
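One common way to obtain an "optimal random partition" from MCMC output is sketched below, under assumptions: the loss is Binder's loss with equal costs (the abstract only says "a suitable loss function"), and the search is restricted to the partitions visited by the sampler, which is a widely used shortcut rather than an exact minimization. With pi_ij the posterior probability that hospitals i and j share a cluster, the expected loss of a candidate partition c is the sum over pairs i < j of |1{c_i = c_j} - pi_ij|. Function names are hypothetical.

```python
from itertools import combinations


def posterior_similarity(partitions):
    """pi[i][j] = fraction of sampled partitions in which units i and j share a cluster."""
    n, m = len(partitions[0]), len(partitions)
    pi = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    pi[i][j] += 1.0 / m
    return pi


def binder_loss(labels, pi):
    """Posterior expected Binder loss (equal costs) of the candidate partition `labels`."""
    return sum(abs((labels[i] == labels[j]) - pi[i][j]) for i, j in combinations(range(len(labels)), 2))


def best_sampled_partition(partitions):
    """Among the sampled partitions, return the one with the smallest expected loss."""
    pi = posterior_similarity(partitions)
    return min(partitions, key=lambda labels: binder_loss(labels, pi))
```

Because only label equality is compared, the result does not depend on how the sampler numbers its clusters (label switching is harmless here).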

On Bayesian Estimation of Dirichlet Process Lognormal Mixture Models and Comparison of Treatments in Censoring

International Journal of Statistical Distributions and Applications

One current interest in medical research is the comparison of treatments in the analysis of patient survival times. This comparison is particularly problematic for censored data, and when the data consist of several groups, each with distinct properties and characteristics but belonging to the same distribution. Various modeling schemes have been proposed to overcome these complexities inherent in the data. One such possibility is the Bayesian approach, which integrates prior knowledge into the analysis. In this paper, we focus on the use of a Bayesian lognormal mixture model (MLNM) with a related Dirichlet process (DP) prior distribution for estimating patient survival. Advances in the Bayesian paradigm have considerably bolstered the development and application of mixture modelling methodology in survival analysis. To establish robustness, the proposed MLN model is compared with the conventional parametric lognormal model and the nonparametric Kaplan-Meier (K-M) estimator of survival. A simulation study that investigates the impact of censoring on these models is also described. Real data from past research are used to show the robustness of the resulting Dirichlet process mixture model when comparing treatments under censoring. The results indicate that the proposed lognormal mixtures provide a better fit to complex data. Further, the MLN models are able to estimate various survival distributions and are therefore appropriate for comparing treatments. Clinicians will find these models useful, especially when confronted with the obstacle of choosing a suitable therapy for a disease.
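The Kaplan-Meier estimator used above as the nonparametric benchmark multiplies, over the distinct death times t, the factors (1 - d_t / n_t), where d_t is the number of deaths at t and n_t the number still at risk just before t. A minimal pure-Python sketch follows; the function name and the convention that subjects censored at a death time are still counted as at risk are the usual textbook choices, assumed here rather than taken from the paper.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate as a list of (t, S(t)) at each distinct death time.

    times:  follow-up times.
    events: 1 if the death was observed, 0 if the time is right-censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        # Group all subjects (deaths and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Everyone observed at t (dead or censored) leaves the risk set afterwards.
        at_risk -= n_at_t
    return curve
```

For example, with times [1, 2, 2, 3, 4] and events [1, 1, 0, 1, 0] the estimate steps down to 0.8, 0.6 and 0.3 at t = 1, 2 and 3; the censored time 4 produces no step.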

A nonparametric hierarchical bayesian framework for information filtering

Proceedings of the 27th annual international conference on Research and development in information retrieval - SIGIR '04, 2004

Information filtering has made considerable progress in recent years. The predominant approaches are content-based methods and collaborative methods. Researchers have largely concentrated on either of the two approaches, since a principled unifying framework is still lacking. This paper suggests that both approaches can be combined under a hierarchical Bayesian framework. Individual content-based user profiles are generated, and collaboration between various user models is achieved via a common learned prior distribution. However, it turns out that a parametric distribution (e.g. Gaussian) is too restrictive to describe such a common learned prior distribution. We thus introduce a nonparametric common prior, which is a sample generated from a Dirichlet process that assumes the role of a hyperprior. We describe effective means to learn this nonparametric distribution and apply it to learn users' information needs. The resultant algorithm is simple and understandable, and offers a principled solution to combine content-based filtering and collaborative filtering. Within our framework, we are now able to interpret various existing techniques from a unifying point of view. Finally, we demonstrate the empirical success of the proposed information filtering methods.

Mixtures of products of Dirichlet processes for variable selection in survival analysis

Journal of Statistical Planning and …, 2003

A very important problem in survival analysis is the accurate selection of the relevant prognostic explanatory variables. We propose a novel approach, based on mixtures of products of Dirichlet process priors, that provides a formal inferential tool to compare the explanatory power of each covariate, in terms of the marginal likelihood attached to the induced partitions of the observations. Our proposed model is Bayesian nonparametric, and, thus, keeps the amount of model specification to a minimum, increasing robustness of the final inferences.

Nonparametric empirical Bayes for the Dirichlet process mixture model

2006

The Dirichlet process prior allows flexible nonparametric mixture modeling. The number of mixture components is not specified in advance and can grow as new data arrive. However, analyses based on the Dirichlet process prior are sensitive to the choice of the parameters, including an infinite-dimensional distributional parameter G₀. Most previous applications have either fixed G₀ as a member of a parametric family or treated G₀ in a Bayesian fashion, using parametric prior specifications.
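The claim that the number of components "can grow as new data arrive" can be seen through the Chinese restaurant process, the partition induced by a DP with concentration alpha: observation i + 1 joins an existing cluster with probability proportional to its size, or opens a new one with probability alpha / (i + alpha). The expected number of clusters after n observations is the sum over i = 0, ..., n - 1 of alpha / (alpha + i), which grows roughly like alpha log n. The sketch below is a generic illustration with hypothetical function names; note that the partition depends only on alpha, while G₀ governs the cluster parameters.

```python
import random


def crp_partition(n, alpha, rng):
    """Seat n customers sequentially under a Chinese restaurant process with concentration alpha."""
    counts = []  # customers per occupied table (cluster sizes)
    labels = []  # table index of each customer
    for _ in range(n):
        # Existing table k with weight counts[k]; a new table with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


def expected_num_clusters(n, alpha):
    """E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


labels, counts = crp_partition(100, alpha=1.0, rng=random.Random(1))
```

With alpha = 1 the expected count is the harmonic number H_n, about 5.19 for n = 100.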

More nonparametric Bayesian inference in applications

Statistical Methods & Applications, 2017

We are delighted to have the opportunity to discuss this paper. We are long-time believers in the BNP approach to modeling statistical data. The authors have an extensive and distinguished record of accomplishments in this area, and it is fitting that they would feature some of their work that displays the utility and outright advantage of the BNP method in a variety of complex, clinically relevant settings, as they have done masterfully in this article. We take this opportunity to augment the discussion of the authors' work by mentioning some of our own, part of which also includes the authors of this article. The authors discussed novel applications to survival analysis. We mention the work of De Iorio et al. (2009), which uses a DP mixture of lognormal distributions to provide a semi-parametric survival model that allows survival curves to cross, thus avoiding the assumption of proportional hazards. The papers Hanson and Johnson (2002, 2004), Hanson et al. (2009), and Hanson et al. (2011) constitute a body of work that embeds parametric survival families of distributions into broader non-parametric families using Mixtures of Finite Polya Trees (MFPT). The models discussed in these papers allow for considerable flexibility compared with their parametric counterparts. An additional theme involves consideration of several alternative semi-parametric families; for example, they model baseline survival distributions using MFPTs for proportional hazards, accelerated failure time, proportional odds, and Cox and Oakes models. Some of the work focuses on fixed time-dependent covariates, and other work develops joint models for survival and longitudinal processes that are related
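Why a lognormal mixture lets survival curves cross: under proportional hazards S_1(t) = S_0(t)^HR, so two curves can never cross, but a mixture survival function S(t) = sum_k w_k (1 - Phi((log t - mu_k) / sigma_k)) is not tied to that constraint. The sketch evaluates a finite (truncated) mixture; the two groups and all their weights, locations and scales are hypothetical values chosen only to show a crossing, and this is not the De Iorio et al. (2009) model itself.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for Phi


def lognormal_mixture_survival(t, weights, mus, sigmas):
    """S(t) = sum_k w_k * P(T_k > t), with log T_k ~ Normal(mu_k, sigma_k^2), for t > 0."""
    return sum(w * (1.0 - _STD_NORMAL.cdf((log(t) - mu) / s)) for w, mu, s in zip(weights, mus, sigmas))


def survival_group_a(t):
    # Hypothetical group A: a single lognormal component.
    return lognormal_mixture_survival(t, [1.0], [1.0], [0.5])


def survival_group_b(t):
    # Hypothetical group B: early failures (30%) plus long-term survivors (70%).
    return lognormal_mixture_survival(t, [0.3, 0.7], [-0.5, 2.0], [0.5, 0.5])
```

Group B has lower survival at t = 1 (about 0.75 versus 0.98) but higher survival at t = 10 (about 0.19 versus 0.005), so the curves cross, which no proportional hazards model could reproduce.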

A Non-parametric Bayesian Learning Model Using Accelerated Variational Inference on Multivariate Beta Mixture Models for Medical Applications

International Journal of Semantic Computing

Clustering, as an exploratory technique, has been a promising approach to data analysis. In this paper, we propose a non-parametric Bayesian inference method to address the clustering problem. This approach is based on infinite multivariate Beta mixture models constructed through the framework of the Dirichlet process, and we apply an accelerated variational method to learn the model. The motivation behind this technique is that Dirichlet process mixture models can fit data in which the number of components is unknown. For large-scale data, however, this approach is computationally expensive. We overcome this problem with the help of accelerated Dirichlet process mixture models, in which the truncation is managed using kd-trees. The performance of the model is validated on real medical applications and compared with three similar alternatives. The results show that the proposed framework outperforms these alternatives.
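In truncated variational inference for DP mixtures, the stick proportions v_k receive independent Beta(g1_k, g2_k) factors, and the expected mixing weights follow from E[pi_k] = E[v_k] * prod over j < k of (1 - E[v_j]), with E[v] = g1 / (g1 + g2); independence under the mean-field factorization is what lets the expectation of the product split. The sketch below computes only this step, under the assumption that the final component takes the leftover stick (v_K = 1); it does not reproduce the kd-tree acceleration or the multivariate Beta likelihood, and the function name is hypothetical.

```python
def expected_stick_weights(gammas):
    """Expected mixing weights under independent q(v_k) = Beta(g1_k, g2_k).

    gammas: list of (g1_k, g2_k) for the first K - 1 sticks; the K-th component
    receives the remaining stick, so the returned K weights sum to 1.
    """
    weights, remaining = [], 1.0
    for g1, g2 in gammas:
        ev = g1 / (g1 + g2)  # E[v_k] under Beta(g1, g2)
        weights.append(remaining * ev)
        remaining *= 1.0 - ev  # E[prod (1 - v_j)] factorizes under mean-field q
    weights.append(remaining)  # truncation: v_K = 1
    return weights
```

Components whose expected weight falls near zero after fitting are effectively unused, which is how the variational approach lets the data choose the number of clusters within the truncation.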