explabox.explain.text — explabox documentation

Add explainability to your text model/dataset.

class explabox.explain.text.Explainer(data=None, model=None, ingestibles=None, **kwargs)

Bases: Readable, IngestiblesMixin

The Explainer creates explanations corresponding to a model and dataset (with ground-truth labels).

With the Explainer you can use explainable AI (XAI) methods for explaining the whole dataset (global), model behavior on the dataset (global), and specific predictions/decisions (local).

The Explainer requires ‘data’ and ‘model’ to be defined. It is included in the Explabox under the .explain property.

Examples

Construct the explainer:

>>> from explabox.explain import Explainer
>>> explainer = Explainer(data=data, model=model)

Get a local explanation with LIME (https://github.com/marcotcr/lime) and kernelSHAP (https://github.com/slundberg/shap):

>>> explainer.explain_prediction('I love this so much!', methods=['lime', 'kernel_shap'])

See the top-25 tokens for predicted classifier labels on the test set:

>>> explainer.token_frequency(k=25, explain_model=True, splits='test')

Select the top-5 prototypical examples in the train set:

>>> explainer.prototypes(n=5, splits='train')

Parameters:

explain_prediction(sample, *args, methods=['lime'], **kwargs)

Explain a specific sample locally.

Parameters:

Returns:

Explanations for each selected method, or None if a method is unknown.

Return type:

Optional[MultipleReturn]
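For example, a minimal usage sketch (assuming the explainer constructed in the Examples above, and that the 'lime' and 'kernel_shap' backends are available; the variable name is illustrative):

>>> explanation = explainer.explain_prediction('Great value for the price!', methods=['lime', 'kernel_shap'])
>>> explanation  # None if a method name is not recognized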

prototypes(method='mmdcritic', n=5, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, seed=0)

Select n prototypes (representative samples) for the given split(s).

Parameters:

Raises:

ValueError – Unknown method selected.

Returns:

Prototypes for each method and split.

Return type:

Union[Instances, MultipleReturn]
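A sketch of labelwise selection over multiple splits (assuming splits also accepts a list of split names, as the Union[Instances, MultipleReturn] return type suggests; the embedder default is shown in the signature):

>>> from text_explainability.data.embedding import TfidfVectorizer
>>> explainer.prototypes(method='mmdcritic', n=5, splits=['train', 'test'], embedder=TfidfVectorizer, labelwise=True, seed=0)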

prototypes_criticisms(n_prototypes=5, n_criticisms=3, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, **kwargs)

Select n prototypes (representative samples) and n criticisms (outliers) for the given split(s).

Parameters:

Returns:

Prototypes and criticisms for each method and split.

Return type:

Union[Instances, MultipleReturn]
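For example, a sketch using the defaults shown in the signature (any extra keyword arguments are forwarded via **kwargs):

>>> explainer.prototypes_criticisms(n_prototypes=5, n_criticisms=3, splits='test', labelwise=False)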

token_frequency(splits='test', explain_model=True, labelwise=True, k=25, filter_words=['de', 'het', 'een'], lower=True, seed=0, **count_vectorizer_kwargs)

Show the top-k tokens for each ground-truth or predicted label.

Parameters:

Returns:

Each label with its corresponding top tokens and their frequencies.

Return type:

Union[FeatureList, MultipleReturn]
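To contrast the dataset with model behavior, one might call it twice (a sketch based on the documented explain_model flag; the English filter_words override is illustrative, replacing the Dutch defaults shown in the signature):

>>> explainer.token_frequency(splits='test', explain_model=False, k=10, filter_words=['the', 'a', 'an'])  # ground-truth labels
>>> explainer.token_frequency(splits='test', explain_model=True, k=10, filter_words=['the', 'a', 'an'])  # predicted labels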

token_information(splits='test', explain_model=True, k=25, filter_words=['de', 'het', 'een'], lower=True, seed=0, **count_vectorizer_kwargs)

Show the top-k token mutual information for a dataset or model.

Parameters:

Returns:

The top-k tokens, sorted by their mutual information with the output (predicted model labels or ground-truth labels).

Return type:

Union[FeatureList, MultipleReturn]
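A minimal sketch (assuming an English corpus, so the default Dutch filter_words are overridden; **count_vectorizer_kwargs are passed through to the underlying count vectorizer):

>>> explainer.token_information(splits='test', explain_model=True, k=25, filter_words=['the', 'a', 'an'], lower=True)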