scipy.stats.multivariate_hypergeom — SciPy v1.15.2 Manual
scipy.stats.multivariate_hypergeom = <scipy.stats._multivariate.multivariate_hypergeom_gen object>
A multivariate hypergeometric random variable.
Parameters:
m : array_like
The number of each type of object in the population. That is, \(m[i]\) is the number of objects of type \(i\).
n : array_like
The number of samples taken from the population.
seed : {None, int, np.random.RandomState, np.random.Generator}, optional
Used for drawing random variates. If seed is None, the RandomState singleton is used. If seed is an int, a new RandomState instance is used, seeded with seed. If seed is already a RandomState or Generator instance, then that object is used. Default is None.
Notes
m must be an array of positive integers. If the quantile \(i\) contains values outside the range \([0, m_i]\), where \(m_i\) is the number of objects of type \(i\) in the population, or if the parameters are inconsistent with one another (e.g. x.sum() != n), methods return the appropriate value (e.g. 0 for pmf). If m or n contain negative values, the result will contain nan there.
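The edge cases described above can be checked directly. The following is a minimal sketch, with input values chosen only for illustration; it assumes the behaviour is exactly as the Notes state (0 for a quantile outside the support, nan for a negative population count).

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m = [10, 20]   # population counts per type
n = 12         # sample size

# Consistent quantile: 8 + 4 == n and each x_i lies in [0, m_i].
p_ok = multivariate_hypergeom.pmf(x=[8, 4], m=m, n=n)

# Inconsistent quantile: 8 + 5 != n, so x lies outside the support.
p_bad_sum = multivariate_hypergeom.pmf(x=[8, 5], m=m, n=n)

# Negative population count: an invalid parameter, not just an invalid quantile.
p_bad_m = multivariate_hypergeom.pmf(x=[0, 5], m=[-1, 20], n=5)

print(p_ok, p_bad_sum, p_bad_m)
```

Here the second call differs from the first only in the quantile, which separates "valid parameters, impossible outcome" from "invalid parameters".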
The probability mass function for multivariate_hypergeom is
\[\begin{split}P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{\binom{m_1}{x_1} \binom{m_2}{x_2} \cdots \binom{m_k}{x_k}}{\binom{M}{n}}, \\ \quad (x_1, x_2, \ldots, x_k) \in \mathbb{N}^k \text{ with } \sum_{i=1}^k x_i = n\end{split}\]
where \(m_i\) is the number of objects of type \(i\), \(M\) is the total number of objects in the population (the sum of all the \(m_i\)), and \(n\) is the size of the sample to be taken from the population.
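As a sanity check on the formula, the PMF can be evaluated directly from binomial coefficients and compared with the library value. This is only an illustrative sketch: `pmf_from_formula` is a hypothetical helper, not part of SciPy, and it uses exact integer arithmetic rather than whatever method SciPy uses internally.

```python
from math import comb, prod

import numpy as np
from scipy.stats import multivariate_hypergeom


def pmf_from_formula(x, m, n):
    """Evaluate the PMF from its definition (hypothetical helper)."""
    # Outside the support the probability is zero.
    if sum(x) != n or any(xi < 0 or xi > mi for xi, mi in zip(x, m)):
        return 0.0
    M = sum(m)  # total population size
    numerator = prod(comb(mi, xi) for mi, xi in zip(m, x))
    return numerator / comb(M, n)


x, m, n = [8, 4], [10, 20], 12
direct = pmf_from_formula(x, m, n)
library = multivariate_hypergeom.pmf(x=x, m=m, n=n)
print(direct, library, np.isclose(direct, library))
```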
Added in version 1.6.0.
Examples
To evaluate the probability mass function of the multivariate hypergeometric distribution, with a dichotomous population containing \(10\) objects of the first type and \(20\) objects of the second type, at a sample of size \(12\) with \(8\) objects of the first type and \(4\) objects of the second type, use:
>>> from scipy.stats import multivariate_hypergeom
>>> multivariate_hypergeom.pmf(x=[8, 4], m=[10, 20], n=12)
0.0025207176631464523
The multivariate_hypergeom distribution is identical to the corresponding hypergeom distribution (tiny numerical differences notwithstanding) when only two types (good and bad) of objects are present in the population as in the example above. Consider another example for a comparison with the hypergeometric distribution:
>>> from scipy.stats import hypergeom
>>> multivariate_hypergeom.pmf(x=[3, 1], m=[10, 5], n=4)
0.4395604395604395
>>> hypergeom.pmf(k=3, M=15, n=4, N=10)
0.43956043956044005
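The correspondence between the two parameterizations is easy to get wrong, so the sketch below spells it out. It assumes the usual `hypergeom` convention (M is the population size, n the number of first-type objects, N the number of draws). The hypergeometric PMF is symmetric under exchanging the number of first-type objects and the number of draws, which is why passing n=4, N=10 gives the same value as n=10, N=4.

```python
import numpy as np
from scipy.stats import hypergeom, multivariate_hypergeom

m = [10, 5]   # 10 objects of the first type, 5 of the second
n = 4         # sample size
x = [3, 1]    # observed counts per type

p_multi = multivariate_hypergeom.pmf(x=x, m=m, n=n)

# Direct mapping: first-type count -> hypergeom's n, sample size -> hypergeom's N.
p_direct = hypergeom.pmf(k=x[0], M=sum(m), n=m[0], N=n)

# Swapped roles; equal to p_direct by the symmetry of the hypergeometric PMF.
p_swapped = hypergeom.pmf(k=x[0], M=sum(m), n=n, N=m[0])

print(p_multi, p_direct, p_swapped)
```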
The functions pmf, logpmf, mean, var, cov, and rvs support broadcasting, under the convention that the vector parameters (x, m, and n) are interpreted as if each row along the last axis is a single object. For instance, we can combine the previous two calls to multivariate_hypergeom as
>>> multivariate_hypergeom.pmf(x=[[8, 4], [3, 1]], m=[[10, 20], [10, 5]],
...                            n=[12, 4])
array([0.00252072, 0.43956044])
This broadcasting also works for cov, where the output objects are square matrices of size m.shape[-1]. For example:
>>> multivariate_hypergeom.cov(m=[[7, 9], [10, 15]], n=[8, 12])
array([[[ 1.05, -1.05],
        [-1.05,  1.05]],
       [[ 1.56, -1.56],
        [-1.56,  1.56]]])
That is, result[0] is equal to multivariate_hypergeom.cov(m=[7, 9], n=8) and result[1] is equal to multivariate_hypergeom.cov(m=[10, 15], n=12).
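The batched covariance can be cross-checked against both the unbatched calls and the standard closed form \(\operatorname{Cov}(X) = n \frac{M-n}{M-1} \left(\operatorname{diag}(p) - p p^T\right)\) with \(p_i = m_i / M\). The sketch below is illustrative only; `cov_from_formula` is a hypothetical helper and is not how SciPy necessarily computes the result.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom


def cov_from_formula(m, n):
    """Closed-form covariance for a single (m, n) pair (hypothetical helper)."""
    m = np.asarray(m, dtype=float)
    M = m.sum()                    # total population size
    p = m / M                      # proportion of each type
    scale = n * (M - n) / (M - 1)  # includes the finite-population correction
    return scale * (np.diag(p) - np.outer(p, p))


batched = multivariate_hypergeom.cov(m=[[7, 9], [10, 15]], n=[8, 12])
single_0 = multivariate_hypergeom.cov(m=[7, 9], n=8)
single_1 = multivariate_hypergeom.cov(m=[10, 15], n=12)

print(np.allclose(batched[0], single_0), np.allclose(batched[1], single_1))
print(np.allclose(batched[0], cov_from_formula([7, 9], 8)))
```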
Alternatively, the object may be called (as a function) to fix the m and n parameters, returning a “frozen” multivariate hypergeometric random variable.
>>> rv = multivariate_hypergeom(m=[10, 20], n=12)
>>> rv.pmf(x=[8, 4])
0.0025207176631464523
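A frozen object can also be used for sampling and moments. The sketch below draws a few variates and checks the invariants every draw should satisfy (each row sums to n and no component exceeds its population count). The seed value 0 and the sample count are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom

m = [10, 20]
n = 12
rv = multivariate_hypergeom(m=m, n=n)  # frozen: m and n are fixed

# Draw five samples; an integer random_state makes the draw reproducible.
samples = rv.rvs(size=5, random_state=0)

# Every draw must use exactly n objects and respect the per-type limits.
rows_sum_to_n = np.all(samples.sum(axis=-1) == n)
within_limits = np.all((samples >= 0) & (samples <= np.asarray(m)))

# The mean of each component is n * m_i / M.
mean = rv.mean()

print(samples)
print(rows_sum_to_n, within_limits, mean)
```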
Methods
Method | Description |
---|---|
pmf(x, m, n) | Probability mass function. |
logpmf(x, m, n) | Log of the probability mass function. |
rvs(m, n, size=1, random_state=None) | Draw random samples from a multivariate hypergeometric distribution. |
mean(m, n) | Mean of the multivariate hypergeometric distribution. |
var(m, n) | Variance of the multivariate hypergeometric distribution. |
cov(m, n) | Compute the covariance matrix of the multivariate hypergeometric distribution. |