eli5.lime — ELI5 0.15.0 documentation

eli5.lime.lime

An implementation of LIME (http://arxiv.org/abs/1602.04938), an algorithm to explain predictions of black-box models.

class TextExplainer(n_samples: int = 5000, char_based: bool | None = None, clf=None, vec=None, sampler: BaseSampler | None = None, position_dependent: bool = False, rbf_sigma: float | None = None, random_state=None, expand_factor: int | None = 10, token_pattern: str | None = None)[source]

TextExplainer allows you to explain predictions of black-box text classifiers using the LIME algorithm.

Attributes:

rng_

Random state.

Type:

numpy.random.RandomState

samples_

A list of samples the local model is trained on. Only available after fit().

Type:

list[str]

X_

A matrix with vectorized samples_. Only available after fit().

Type:

ndarray or scipy.sparse matrix

similarity_

Similarity vector. Only available after fit().

Type:

ndarray

y_proba_

Probabilities predicted by the black-box classifier (the result of predict_proba(self.samples_)). Only available after fit().

Type:

ndarray

clf_

Trained white-box classifier. Only available after fit().

Type:

object

vec_

Fitted white-box vectorizer. Only available after fit().

Type:

object

metrics_

A dictionary with metrics of how well the local classification pipeline approximates the black-box pipeline. Only available after fit().

Type:

dict
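For orientation, here is a minimal usage sketch. It is illustrative rather than part of the API reference: the 20 newsgroups dataset, the TF-IDF pipeline and the category choice are assumptions, not requirements.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

from eli5.lime import TextExplainer

# A black-box pipeline to explain; any predict_proba(list_of_texts) works.
train = fetch_20newsgroups(subset='train', categories=['sci.med', 'sci.space'])
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(train.data, train.target)

# Fit a local white-box model around one document.
te = TextExplainer(random_state=42)
te.fit(train.data[0], pipe.predict_proba)

te.show_prediction(target_names=train.target_names)  # HTML output in Jupyter
print(te.metrics_)  # how well the local pipeline approximates the black box
```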

explain_prediction(**kwargs)[source]

Call eli5.explain_prediction() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.explain_prediction().

fit() must be called before using this method.

explain_weights(**kwargs)[source]

Call eli5.explain_weights() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.explain_weights().

fit() must be called before using this method.

fit(doc: str, predict_proba: Callable[[Any], Any]) → TextExplainer[source]

Explain the predict_proba probabilistic classification function for the doc example. This method fits a local classification pipeline following the LIME approach.

To get the explanation, use show_prediction(), show_weights(), explain_prediction() or explain_weights().

Parameters:

doc (str) – a document to explain.

predict_proba (callable) – black-box classification pipeline; it should take a list of strings (documents) and return a matrix of shape (n_samples, n_classes) with probability values, i.e. a score for each class of each document.
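Since fit() returns the explainer itself, calls can be chained. A small sketch continuing the earlier example (pipe and the training data from above):

```python
# fit() returns self, so fitting and explaining can be chained.
te = TextExplainer(random_state=42).fit(train.data[0], pipe.predict_proba)
expl = te.explain_prediction(top=10)  # kwargs forwarded to eli5.explain_prediction()
```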

set_fit_request(*, doc: bool | None | str = '$UNCHANGED$', predict_proba: bool | None | str = '$UNCHANGED$') → TextExplainer

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to fit.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

doc (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the doc parameter in fit.

predict_proba (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the predict_proba parameter in fit.

Returns:

self (object) – The updated object.
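This method is standard scikit-learn (1.3+) metadata-routing boilerplate inherited by TextExplainer. A hedged sketch of the call itself:

```python
from sklearn import set_config

from eli5.lime import TextExplainer

set_config(enable_metadata_routing=True)  # routing is opt-in

# Request that `doc` and `predict_proba` be routed to fit() when
# TextExplainer is wrapped by a meta-estimator.
te = TextExplainer().set_fit_request(doc=True, predict_proba=True)
```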

show_prediction(**kwargs)[source]

Call eli5.show_prediction() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.show_prediction().

fit() must be called before using this method.

show_weights(**kwargs)[source]

Call eli5.show_weights() for the locally-fit classification pipeline. Keyword arguments are passed to eli5.show_weights().

fit() must be called before using this method.

eli5.lime.samplers

class BaseSampler[source]

Base sampler class. A sampler is an object which generates examples similar to a given example.

fit(X=None, y=None)[source]

abstractmethod sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.
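A subclass only has to implement sample_near(). A hypothetical toy sampler, for illustration only:

```python
import numpy as np

from eli5.lime.samplers import BaseSampler

class EchoSampler(BaseSampler):
    """Toy sampler: returns unchanged copies of the document, all with
    similarity 1.0. Real samplers perturb the document instead."""
    def sample_near(self, doc, n_samples=1):
        return [doc] * n_samples, np.ones(n_samples)
```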

class MaskingTextSampler(token_pattern: str | None = None, bow: bool = True, random_state=None, replacement: str = '', min_replace: int | float = 1, max_replace: int | float = 1.0, group_size: int = 1)[source]

Sampler for text data. It randomly removes or replaces tokens in the text.

Parameters:

sample_near(doc: str, n_samples: int = 1) → tuple[list[str], ndarray][source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.
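A small sketch of sample_near() on a toy document (output shapes follow the signature above):

```python
from eli5.lime.samplers import MaskingTextSampler

sampler = MaskingTextSampler(random_state=42)
samples, similarity = sampler.sample_near('hello world example', n_samples=3)
# samples: 3 strings with some tokens masked out;
# similarity: ndarray of similarity values to the original document
```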

sample_near_with_mask(doc: TokenizedText | str, n_samples: int = 1) → tuple[list[str], ndarray, ndarray, TokenizedText][source]

class MaskingTextSamplers(sampler_params: list[dict[str, Any]], token_pattern: str | None = None, random_state=None, weights: ndarray | list[float] | None = None)[source]

Union of MaskingTextSampler objects with weights. sample_near() and sample_near_with_mask() generate a requested number of samples using all samplers; the probability of using a given sampler is proportional to its weight.

All samplers must use the same token_pattern in order for sample_near_with_mask() to work.

Create it with a list of {param: value} dicts with MaskingTextSampler parameters, as in the sketch below.
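The parameter dicts and weights here are illustrative:

```python
from eli5.lime.samplers import MaskingTextSamplers

sampler = MaskingTextSamplers(
    sampler_params=[{'bow': True}, {'bow': False}],  # one bag-of-words, one position-aware
    weights=[0.7, 0.3],  # first sampler chosen ~70% of the time
    random_state=42,
)
samples, similarity = sampler.sample_near('hello world example', n_samples=4)
```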

sample_near(doc: str, n_samples: int = 1) → tuple[list[str], ndarray][source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.

sample_near_with_mask(doc: str, n_samples: int = 1) → tuple[list[str], ndarray, ndarray, TokenizedText][source]

class MultivariateKernelDensitySampler(kde=None, metric='euclidean', fit_bandwidth=True, bandwidths=array([1.00000000e-06, 1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01, 1.00000000e+02, 3.16227766e+02, 1.00000000e+03, 3.16227766e+03, 1.00000000e+04]), sigma='bandwidth', n_jobs=1, random_state=None)[source]

General-purpose sampler for dense continuous data, based on multivariate kernel density estimation.

The limitation is that a single bandwidth value is used for all dimensions, i.e. the bandwidth matrix is a positive scalar times the identity matrix. This is a problem e.g. when features have different variances (e.g. some of them are one-hot encoded while others are continuous).
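A usage sketch on synthetic dense data (the data itself is an assumption):

```python
import numpy as np

from eli5.lime.samplers import MultivariateKernelDensitySampler

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))  # dense continuous training data

sampler = MultivariateKernelDensitySampler(random_state=0)
sampler.fit(X)  # with fit_bandwidth=True, a bandwidth is picked from the grid
samples, similarity = sampler.sample_near(X[0], n_samples=5)
```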

fit(X=None, y=None)[source]

sample_near(doc, n_samples=1)[source]

Return (examples, similarity) tuple with generated documents similar to a given document and a vector of similarity values.

class UnivariateKernelDensitySampler(kde=None, metric='euclidean', fit_bandwidth=True, bandwidths=array([1.00000000e-06, 1.00000000e-03, 3.16227766e-03, 1.00000000e-02, 3.16227766e-02, 1.00000000e-01, 3.16227766e-01, 1.00000000e+00, 3.16227766e+00, 1.00000000e+01, 3.16227766e+01, 1.00000000e+02, 3.16227766e+02, 1.00000000e+03, 3.16227766e+03, 1.00000000e+04]), sigma='bandwidth', n_jobs=1, random_state=None)[source]

General-purpose sampler for dense continuous data, based on univariate kernel density estimation. It estimates a separate probability distribution for each input dimension.

The limitation is that variable interactions are not taken into account.

Unlike MultivariateKernelDensitySampler, it uses different bandwidths for different dimensions; because of that it can handle one-hot encoded features to some extent (make sure to at least tune the default sigma parameter). Also, at sampling time it replaces only random subsets of the features instead of generating entirely new examples.
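A sketch mirroring the multivariate case; the explicit sigma value is an assumption, shown only because the note above recommends tuning it:

```python
import numpy as np

from eli5.lime.samplers import UnivariateKernelDensitySampler

rng = np.random.RandomState(0)
# Mixed data: one continuous column and one binary (one-hot-like) column.
X = np.column_stack([rng.normal(size=200), rng.randint(0, 2, size=200)])

sampler = UnivariateKernelDensitySampler(sigma=1.0, random_state=0)  # sigma value illustrative
sampler.fit(X)
samples, similarity = sampler.sample_near(X[0], n_samples=5)
```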

fit(X=None, y=None)[source]

sample_near(doc, n_samples=1)[source]

Sample near the document by replacing some of its features with values sampled from the distributions found by KDE.

eli5.lime.textutils

Utilities for text generation.

cosine_similarity_vec(num_tokens, num_removed_vec)[source]

Return cosine similarity between a binary vector with all ones of length num_tokens and vectors of the same length with num_removed_vec elements set to zero.
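Since the dot product of the all-ones vector with a vector that has k of its n ones zeroed is n − k, and the norms are √n and √(n − k), the similarity reduces to √(1 − k/n). A quick check:

```python
from eli5.lime.textutils import cosine_similarity_vec

# n = 4 tokens; remove 0, 1 and 2 tokens -> sqrt(1 - k/4)
cosine_similarity_vec(4, [0, 1, 2])
# -> approximately [1.0, 0.866, 0.707]
```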

generate_samples(text: TokenizedText, n_samples=500, bow=True, random_state=None, replacement='', min_replace=1.0, max_replace=1.0, group_size=1) → Tuple[List[str], ndarray, ndarray][source]

Return n_samples changed versions of text (with some words removed), along with distances between the original text and the generated examples. If bow=False, all tokens are considered unique (i.e. token position matters).
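A sketch of calling it directly; TokenizedText wraps the raw string, and the names given to the two returned arrays below are our reading of the return annotation, not official API names:

```python
from eli5.lime.textutils import TokenizedText, generate_samples

text = TokenizedText('the quick brown fox')
docs, similarity, extra = generate_samples(text, n_samples=3, random_state=42)
# docs: 3 variants of the text with some words removed;
# similarity: per-sample values relating each variant to the original;
# extra: the second ndarray from the annotation (both names are assumptions)
```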