Inferring properties of probability kernels from the pairs of variables they involve

Compatibility of Distributions in Probabilistic Models: An Algebraic Frame and Some Characterizations

2020

A probabilistic model may be formed of distinct distributional assumptions, and these may specify admissible distributions on distinct (not necessarily disjoint) subsets of the whole set of random variables of concern in the model. Such distributions on subsets of variables are said to be mutually compatible if there exists a distribution on the whole set of variables that precisely subsumes all of them. In Section 2 of this paper, an algebraic frame for this compatibility concept is constructed, by first observing that all marginal and/or conditional distributions (also called "probability kernels") that are implicit in a global distribution form a lattice, and then by highlighting the properties of useful operations that are internal to this algebraic structure. In Sections 3, 4, and 5, characterizations of the concept of compatibility are presented; first a characterization that depends only on set-theoretic relations between the variables involved in the distributions under judgment; then characterizations that are applicable only to pairs of candidate distributions; and then a characterization that is applicable to any set of candidate distributions when the variables involved in each of these are exhaustive of the set of variables in the model. Lastly, in Section 6, different categories of models are mentioned (a model of classical statistics, a corresponding hierarchical Bayesian model, Bayesian networks, Markov random fields, and the Gibbs sampler) to illustrate why the compatibility problem may have different levels of saliency and solutions in different kinds of probabilistic models.
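For the two-kernel case, the compatibility question has a concrete numerical form. The sketch below assumes finite state spaces and uses the classical Arnold-Press criterion (the ratio of the two candidate conditionals must factorize as u(x)v(y)) rather than any construction taken from this paper; all matrices are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A joint distribution on a 3x4 grid, and the two conditionals it induces.
joint = rng.random((3, 4))
joint /= joint.sum()
p_x_given_y = joint / joint.sum(axis=0, keepdims=True)  # columns sum to 1
p_y_given_x = joint / joint.sum(axis=1, keepdims=True)  # rows sum to 1

def compatible(a, b, tol=1e-10):
    """Arnold-Press check: a(x|y) and b(y|x) admit a common joint iff
    their ratio factorizes as u(x) * v(y), i.e. the log-ratio matrix is
    additive in x and y (double-centering it leaves zero)."""
    log_r = np.log(a) - np.log(b)
    centered = (log_r - log_r.mean(axis=0)
                - log_r.mean(axis=1, keepdims=True) + log_r.mean())
    return bool(np.abs(centered).max() < tol)

print(compatible(p_x_given_y, p_y_given_x))  # True: both come from one joint

# Perturb one column of p(x|y); the pair is no longer compatible.
bad = p_x_given_y.copy()
bad[0, 0] *= 2.0
bad /= bad.sum(axis=0, keepdims=True)
print(compatible(bad, p_y_given_x))          # False
```

The double-centering test works because a strictly positive matrix of the form f(x) + g(y) is annihilated by subtracting its row and column means and adding back the grand mean.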

Characterizing the function space for Bayesian kernel models

2007

Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator whose domain is a space of signed measures. Priors on the signed measures induce prior distributions on their image functions under the integral operator. We study several classes of signed measures and their images, and identify general classes of measures whose images are dense in the reproducing kernel Hilbert space (RKHS) induced by the kernel. This gives a function-theoretic foundation for some nonparametric prior specifications commonly-used in Bayesian modeling, including Gaussian processes and Dirichlet processes, and suggests generalizations.
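The integral-operator construction can be illustrated with a toy discrete signed measure: its image under the operator is a finite kernel expansion, which is exactly the kind of element whose closures the paper studies. The kernel choice, atom locations, and weights below are illustrative assumptions, not examples from the paper.

```python
import numpy as np

def k(x, u, length=0.5):
    """Gaussian (RBF) kernel k(x, u)."""
    return np.exp(-((x - u) ** 2) / (2 * length ** 2))

# A discrete signed measure: atoms u_i carrying (possibly negative) weights w_i.
atoms = np.array([-1.0, 0.0, 1.5])
weights = np.array([0.7, -1.2, 0.5])

def f(x):
    """Image of the measure under the integral operator:
    f(x) = integral of k(x, u) d gamma(u) = sum_i w_i * k(x, u_i),
    an element of the RKHS induced by k."""
    return sum(w * k(x, u) for w, u in zip(weights, atoms))

print(round(f(0.0), 4))   # dominated by the negative atom at 0
print(round(f(10.0), 4))  # far from every atom, essentially 0
```

Richer measures (e.g. random measures such as Dirichlet process draws) give random image functions, which is how priors on measures induce priors on functions.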

A notion of conditional probability and some of its consequences

Decisions in Economics and Finance, 2019

An alternative notion of conditional probability (say AN) is discussed and investigated. If compared with the usual notion (regular conditional distributions), AN gives up the measurability constraint but requires a properness condition. An existence result for AN is provided. Also, some consequences of AN are pointed out, with reference to Bayesian statistics, exchangeability and compatibility.

On Distance and Kernel Measures of Conditional Independence

arXiv: Statistics Theory, 2019

Measuring conditional independence is one of the important tasks in statistical inference and is fundamental in causal discovery, feature selection, dimensionality reduction, Bayesian network learning, and others. In this work, we explore the connection between conditional independence measures induced by distances on a metric space and those induced by reproducing kernels associated with a reproducing kernel Hilbert space (RKHS). For certain distance and kernel pairs, we show the distance-based conditional independence measures to be equivalent to the kernel-based measures. On the other hand, we also show that some kernel conditional independence measures popular in machine learning, based on the Hilbert-Schmidt norm of a certain cross-conditional covariance operator, do not have a simple distance representation, except in some limiting cases. This paper therefore shows that the distance and kernel measures of conditional independence are not quite equivalent, unlike in the case of joint independence.

On predictive distributions and Bayesian networks

Statistics and Computing, 2000

In this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the task is to compute the predictive distribution for a finite set of possible alternatives. This question is first addressed in a general Bayesian framework, where we consider a set of probability distributions defined by some parametric model class. Given a prior distribution on the model parameters and a set of sample data, one possible approach for determining a predictive distribution is to fix the parameters to the instantiation with the maximum a posteriori probability. A more accurate predictive distribution can be obtained by computing the evidence (marginal likelihood), i.e., the integral over all the individual parameter instantiations. As an alternative to these two approaches, we demonstrate how to use Rissanen's new definition of stochastic complexity for determining predictive distributions, and show how the evidence predictive distribution with Jeffrey's...
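The contrast between the MAP plug-in predictive and the evidence-based predictive is easy to see in a toy Beta-Bernoulli model. This is an illustrative assumption on my part, not an example from the paper, and the paper's stochastic-complexity alternative is not sketched here.

```python
# Coin-flip data: n tosses, k heads, with a flat Beta(1, 1) prior on theta.
n, k = 5, 4

# (1) Plug-in predictive at the MAP parameter. Under a flat prior the
# posterior mode coincides with the maximum-likelihood estimate k/n.
p_heads_map = k / n

# (2) Evidence-based predictive: integrating theta out of the Beta(k+1, n-k+1)
# posterior gives Laplace's rule of succession, (k + 1) / (n + 2).
p_heads_evidence = (k + 1) / (n + 2)

print(p_heads_map)       # 0.8
print(p_heads_evidence)  # 0.714...
```

The evidence predictive is pulled toward 1/2 by the integration over parameters, which is what makes it less overconfident than the plug-in on small samples.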

The Category of Markov Kernels

Electronic Notes in Theoretical Computer Science, 1999

Markov kernels are fundamental objects in probability theory. One can define a category based on Markov kernels which has many of the formal properties of the ordinary category of relations. In the present paper we will examine the categorical properties of Markov kernels and stress the analogies and differences with the category of relations. We will show that this category has partially-additive structure and, as such, supports basic constructs like iteration. This allows one to give a probabilistic semantics for a language with while loops in the manner of Kozen. The category in question was originally defined by Giry following suggestions of Lawvere.
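On finite spaces the categorical structure is concrete: a kernel is a row-stochastic matrix, composition is matrix multiplication (the Chapman-Kolmogorov sum), and identities are identity matrices. A minimal sketch with made-up matrices:

```python
import numpy as np

# On finite spaces a Markov kernel X -> Y is a row-stochastic matrix:
# entry [x, y] holds P(Y = y | X = x), so every row sums to 1.
f = np.array([[0.9, 0.1],
              [0.3, 0.7]])           # kernel on a 2-element space
g = np.array([[0.5, 0.25, 0.25],
              [0.0, 0.5,  0.5]])     # kernel from 2 elements to 3

# Composition is the Chapman-Kolmogorov sum, i.e. matrix multiplication;
# the composite is again row-stochastic, and associativity and identity
# laws are inherited from matrix algebra, giving a category.
gf = f @ g
print(gf)
print(gf.sum(axis=1))  # [1. 1.]
```

Deterministic kernels (0/1 rows) recover ordinary functions, which is one face of the analogy with the category of relations that the paper develops.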

Probabilistic relations

1998

The notion of binary relation is fundamental in logic. What is the correct analogue of this concept in the probabilistic case? I will argue that the notion of conditional probability distribution (Markov kernel, stochastic kernel) is the correct generalization. One can define a category based on stochastic kernels which has many of the formal properties of the ordinary category of relations.

About Conditional Belief Function Independence

2001

The concept of conditional independence has been extensively studied in probability theory (see, for instance, [2], [3], [6]). Pearl and Paz [7] have introduced some basic properties of the conditional independence relation, called "graphoid axioms". These axioms are satisfied not only by probabilistic conditional independence, but also by embedded multi-valued dependency models in relational databases [8], by conditional independence in Spohn's theory of ordinal conditional functions [11], [4], by qualitative conditional independence of partitions in the Dempster-Shafer theory of belief functions [9], and by conditional independence in valuation-based systems (VBS) [10], which can represent many different uncertainty calculi. The aim of this paper is to propose new definitions of conditional independence when uncertainty is expressed in the form of belief functions, and then to discuss the relationships between these definitions. The notion of conditional independence is framed in terms of conditional independence relations [6], [2], [3], which successfully capture our intuition about how dependencies should update in response to new pieces of information.

A sufficient condition for belief function construction from conditional belief functions

1998

It is commonly acknowledged that we need to accept and handle uncertainty when reasoning with real world data. The most profoundly studied measure of uncertainty is probability. However, the general feeling is that probability cannot express all types of uncertainty, including vagueness and incompleteness of knowledge. The Mathematical Theory of Evidence, or Dempster-Shafer Theory (DST) [1, 12], has been intensely investigated in the past as a means of expressing incomplete knowledge. The interesting property in this context is that DST formally fits into the framework of graphoidal structures [13], which implies the possibility of efficient reasoning by local computations in large multivariate belief distributions, given a factorization of the belief distribution into low-dimensional component conditional belief functions. But the concept of conditional belief functions is generally not usable, because composition of conditional belief functions is not guaranteed to yield a joint multivariate belief distribution, as some values of the belief distribution may turn out to be negative [4, 13, 15]. To overcome this problem, creation of an adequate frequency model is needed. In this paper we suggest that a Dempster-Shafer distribution results from "clustering" (merging) of objects sharing common features. Upon "clustering", two (or more) objects become indistinguishable (will be counted as one), but some attributes will behave as if they have more than one value at once. The next elements of the model needed are the concept of conditional independence and that of merger conditions. It is assumed that before merger the objects move closer in such a way that conditional distributions of features for the objects to merge are identical. The traditional conditional independence of feature variables is assumed before merger (thereafter only the DST conditional independence holds).
Furthermore, it is necessary that the objects get "closer" before the merger independently for each feature variable, and only those areas merge where the conditional distributions become identical in each variable. The paper demonstrates that within this model the graphoidal properties hold, and a sufficient condition for non-negativity of the graphoidally represented belief function is presented and its validity demonstrated.