hoi.metrics.TC — HOI 0.0.5 documentation (original) (raw)
Contents
hoi.metrics.TC#
class hoi.metrics.TC(x, y=None, multiplets=None, verbose=None)[source]#
Total correlation.
Total correlation is the oldest extension of mutual information to an arbitrary number of variables. It is defined as:
\[\begin{split}TC(X^{n}) &= \sum_{j=1}^{n} H(X_{j}) - H(X^{n}) \\\end{split}\]
The total correlation is equivalent to the Kullback-Liebler divergence between the joint distribution :math: P(X) and the product of the marginals. The total correlation is largely a measure of redundancy, sensitive to information shared between single elements.
Parameters:
xarray_like
Standard NumPy arrays of shape (n_samples, n_features) or (n_samples, n_features, n_variables)
yarray_like
The feature of shape (n_samples,) for estimating task-related TC
multipletslist | None
List of multiplets to compute. Should be a list of multiplets, for example [(0, 1, 2), (2, 7, 8, 9)]. By default, all multiplets are going to be computed.
Attributes:
Entropies of shape (n_mult,)
Indices of the multiplets of shape (n_mult, maxsize).
Order of each multiplet of shape (n_mult,).
Under-sampling threshold.
Methods
References
Watabe, 1960 [37]
__iter__()#
Iteration over orders.
compute_entropies(method='gc', minsize=1, maxsize=None, samples=None, **kwargs)#
Compute entropies for all multiplets.
Parameters:
method{‘gc’, ‘binning’, ‘knn’, ‘kernel}
Name of the method to compute entropy. Use either :
- ‘gc’: gaussian copula entropy [default]. Seehoi.core.entropy_gc()
- ‘binning’: binning-based estimator of entropy. Note that to use this estimator, the data have be to discretized. Seehoi.core.entropy_bin()
- ‘knn’: k-nearest neighbor estimator. Seehoi.core.entropy_knn()
- ‘kernel’: kernel-based estimator of entropy see hoi.core.entropy_kernel()
samplesnp.ndarray
List of samples to use to compute HOI. If None, all samples are going to be used.
minsizeint, optional
Minimum size of the multiplets. Default is 1.
maxsizeint, optional
Maximum size of the multiplets. Default is None.
kwargsdict, optional
Additional arguments to pass to the entropy function.
Returns:
h_xarray_like
Entropies of shape (n_mult, n_variables)
h_idxarray_like
Indices of the multiplets of shape (n_mult, maxsize)
orderarray_like
Order of each multiplet of shape (n_mult,)
property entropies#
Entropies of shape (n_mult,)
fit(minsize=2, maxsize=None, method='gc', samples=None, **kwargs)[source]#
Compute the Total correlation.
Parameters:
minsize, maxsizeint | 2, None
Minimum and maximum size of the multiplets
method{‘gc’, ‘binning’, ‘knn’, ‘kernel’, callable}
Name of the method to compute entropy. Use either :
- ‘gc’: gaussian copula entropy [default]. Seehoi.core.entropy_gc()
- ‘gauss’: gaussian entropy. See hoi.core.entropy_gauss()
- ‘binning’: binning-based estimator of entropy. Note that to use this estimator, the data have be to discretized. Seehoi.core.entropy_bin()
- ‘knn’: k-nearest neighbor estimator. Seehoi.core.entropy_knn()
- ‘kernel’: kernel-based estimator of entropy see hoi.core.entropy_kernel()
- A custom entropy estimator can be provided. It should be a callable function written with Jax taking a single 2D input of shape (n_features, n_samples) and returning a float.
samplesnp.ndarray
List of samples to use to compute HOI. If None, all samples are going to be used.
kwargsdict | {}
Additional arguments are sent to each entropy function
Returns:
hoiarray_like
The NumPy array containing values of higher-order interactions of shape (n_multiplets, n_variables)
get_combinations(minsize, maxsize=None, astype='jax')#
Get combinations of features.
Parameters:
minsizeint
Minimum size of the multiplets
maxsizeint | None
Maximum size of the multiplets. If None, minsize is used.
astype{‘jax’, ‘numpy’, ‘iterator’}
Specify the output type. Use either ‘jax’ get the data as a jax array [default], ‘numpy’ for NumPy array or ‘iterator’.
Returns:
combinationsarray_like
Combinations of features.
property multiplets#
Indices of the multiplets of shape (n_mult, maxsize).
By convention, we used -1 to indicate that a feature has been ignored.
property order#
Order of each multiplet of shape (n_mult,).
property undersampling#
Under-sampling threshold.