explabox.explain.text — explabox documentation
Add explainability to your text model/dataset.
class explabox.explain.text.Explainer(data=None, model=None, ingestibles=None, **kwargs)
Bases: Readable, IngestiblesMixin
The Explainer creates explanations corresponding to a model and dataset (with ground-truth labels).
With the Explainer you can use explainable AI (XAI) methods to explain the whole dataset (global), model behavior on the dataset (global), and specific predictions/decisions (local).
The Explainer requires ‘data’ and ‘model’ to be defined. It is included in the Explabox under the .explain property.
Examples
Construct the explainer:
from explabox.explain import Explainer
explainer = Explainer(data=data, model=model)
Get a local explanation with LIME (https://github.com/marcotcr/lime) and kernelSHAP (https://github.com/slundberg/shap):
explainer.explain_prediction('I love this so much!', methods=['lime', 'kernel_shap'])
See the top-25 tokens for predicted classifier labels on the test set:
explainer.token_frequency(k=25, explain_model=True, splits='test')
Select the top-5 prototypical examples in the train set:
explainer.prototypes(n=5, splits='train')
Parameters:
- data (Optional[Environment], optional) – Data for ingestibles. Defaults to None.
- model (Optional[AbstractClassifier], optional) – Model for ingestibles. Defaults to None.
- ingestibles (Optional[Ingestible], optional) – Ingestible. Defaults to None.
explain_prediction(sample, *args, methods=['lime'], **kwargs)
Explain a specific sample locally.
Parameters:
- sample (Union[int, str]) – Identifier of sample in dataset (int) or input (str).
- methods (Union[str, List[str]]) – List of methods to get explanations from. Choose from ‘lime’, ‘shap’, ‘baylime’, ‘tree’, ‘rules’, ‘foil_tree’.
- *args – Positional arguments passed to local explanation technique.
- **kwargs – Keyword arguments passed to local explanation technique.
Returns:
Explanations for each selected method, unless method is unknown (returns None).
Return type:
Optional[MultipleReturn]
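To make the idea of a local explanation concrete, the sketch below scores each token of one input by how much a classifier's output drops when that token is removed. This is plain leave-one-token-out occlusion, a simpler stand-in and not LIME, SHAP, or any other method listed above. The classifier `toy_positive_score` and the helper `occlusion_attributions` are hypothetical names invented for this illustration and are not part of explabox.

```python
def toy_positive_score(text):
    """Hypothetical stand-in classifier: more positive-lexicon tokens give a higher score."""
    positive = {"love", "great", "good"}
    hits = sum(token.strip("!.,").lower() in positive for token in text.split())
    return hits / (1 + hits)


def occlusion_attributions(text, score_fn):
    """Score drop when each token is removed; a larger drop means the token mattered more."""
    tokens = text.split()
    base = score_fn(text)
    attributions = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        attributions.append((token, base - score_fn(reduced)))
    return attributions


attributions = occlusion_attributions("I love this so much!", toy_positive_score)
most_important = max(attributions, key=lambda pair: pair[1])[0]
```

Each token is paired with a signed attribution. In this toy setup only the lexicon word changes the score, so it receives the largest value.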
prototypes(method='mmdcritic', n=5, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, seed=0)
Select n prototypes (representative samples) for the given split(s).
Parameters:
- method (str, optional) – Method(s) to apply. Choose from [‘mmdcritic’, ‘kmedoids’]. Defaults to ‘mmdcritic’.
- n (int, optional) – Number of prototypes to generate. Defaults to 5.
- splits (Union[str, List[str]], optional) – Name(s) of split(s). Defaults to “test”.
- embedder (Optional[Embedder], optional) – Embedder used. Defaults to TfidfVectorizer.
- labelwise (bool, optional) – Select for each label. Defaults to False.
- seed (int, optional) – Seed for reproducibility. Defaults to 0.
Raises:
ValueError – Unknown method selected.
Returns:
Prototypes for each method and split.
Return type:
Union[Instances, MultipleReturn]
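As a rough illustration of what prototype selection does, the sketch below greedily picks the points that maximize the MMD-critic prototype objective over a precomputed similarity matrix. It is a minimal sketch under stated assumptions, not the library's implementation: Jaccard similarity on token sets stands in for a kernel over TF-IDF embeddings, and the names `jaccard` and `greedy_mmd_prototypes` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed stand-in for a kernel on embeddings)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_mmd_prototypes(K, n=5):
    """Greedily pick up to n indices that maximize the MMD-critic prototype objective."""
    size = len(K)
    col_mean = [sum(row) / size for row in K]  # mean similarity of each point to all data
    selected = []
    for _ in range(min(n, size)):
        best, best_score = None, float("-inf")
        for c in range(size):
            if c in selected:
                continue
            S = selected + [c]
            m = len(S)
            # Reward closeness to the data, penalize similarity among the prototypes.
            score = (2 / m) * sum(col_mean[j] for j in S)
            score -= sum(K[i][j] for i in S for j in S) / m**2
            if score > best_score:
                best, best_score = c, score
        selected.append(best)
    return selected


texts = [
    "great movie great acting",
    "great movie",
    "great acting",
    "terrible plot",
    "terrible boring plot",
]
token_sets = [set(t.split()) for t in texts]
K = [[jaccard(a, b) for b in token_sets] for a in token_sets]
prototypes = greedy_mmd_prototypes(K, n=2)
```

With two clearly separated groups of texts, the first prototype comes from the larger group and the second from the other group, because the penalty term discourages picking two similar points.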
prototypes_criticisms(n_prototypes=5, n_criticisms=3, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, **kwargs)
Select n prototypes (representative samples) and n criticisms (outliers) for the given split(s).
Parameters:
- n_prototypes (int, optional) – Number of prototypes to generate. Defaults to 5.
- n_criticisms (int, optional) – Number of criticisms to generate. Defaults to 3.
- splits (Union[str, List[str]], optional) – Name(s) of split(s). Defaults to “test”.
- embedder (Optional[Embedder], optional) – Embedder used. Defaults to TfidfVectorizer.
- labelwise (bool, optional) – Select for each label. Defaults to False.
Returns:
Prototypes for each method and split.
Return type:
Union[Instances, MultipleReturn]
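Criticisms are the points that a set of prototypes represents poorly. One common way to find them, sketched below, is to rank the non-prototype points by the absolute value of the MMD witness function: mean similarity to all data minus mean similarity to the prototypes. This is a simplified sketch, not the library's implementation. It omits the diversity regularizer used in the original MMD-critic method, uses Jaccard similarity as an assumed stand-in kernel, takes the prototype indices as given, and uses the hypothetical names `jaccard` and `select_criticisms`.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed stand-in for a kernel on embeddings)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_criticisms(K, prototypes, n_criticisms=3):
    """Pick the non-prototype points with the largest absolute witness value."""
    size = len(K)
    witness = {}
    for x in range(size):
        if x in prototypes:
            continue
        data_mean = sum(K[x]) / size
        proto_mean = sum(K[x][j] for j in prototypes) / len(prototypes)
        witness[x] = abs(data_mean - proto_mean)
    return sorted(witness, key=witness.get, reverse=True)[:n_criticisms]


texts = [
    "great movie great acting",
    "great movie",
    "great acting",
    "terrible plot",
    "terrible boring plot",
    "subtitles missing",
]
token_sets = [set(t.split()) for t in texts]
K = [[jaccard(a, b) for b in token_sets] for a in token_sets]
prototypes = [0, 3]  # assumed to come from an earlier prototype-selection step
criticisms = select_criticisms(K, prototypes, n_criticisms=2)
```

The text that shares no tokens with either prototype is ranked first, which matches the intuition of a criticism as an outlier.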
token_frequency(splits='test', explain_model=True, labelwise=True, k=25, filter_words=<Proxy at 0x721e98071f40 wrapping ['de', 'het', 'een'] at 0x721e96fe6c40 with factory <function lazy..>>, lower=True, seed=0, **count_vectorizer_kwargs)
Show the top-k number of tokens for each ground-truth or predicted label.
Parameters:
- splits (Union[str, List[str]], optional) – Split names to get the explanation for. Defaults to ‘test’.
- explain_model (bool, optional) – Whether to explain the model (True) or ground-truth labels (False). Defaults to True.
- labelwise (bool, optional) – Whether to summarize the counts for each label separately. Defaults to True.
- k (Optional[int], optional) – Limit to the top-k words per label, or all words if None. Defaults to 25.
- filter_words (List[str], optional) – Words to filter out from top-k. Defaults to [‘a’, ‘an’, ‘the’].
- lower (bool, optional) – Whether to make all tokens lowercase. Defaults to True.
- seed (int, optional) – Seed for reproducibility. Defaults to 0.
- **count_vectorizer_kwargs – Optional arguments passed to CountVectorizer/FastCountVectorizer.
Returns:
Each label with corresponding top words and their frequency
Return type:
Union[FeatureList, MultipleReturn]
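The computation described by these parameters can be approximated in a few lines of standard-library Python. The sketch below is an illustration only and does not use the library's CountVectorizer/FastCountVectorizer path. The function name `top_k_tokens` and the regular-expression tokenizer are assumptions made for this example. Passing predicted labels instead of ground-truth labels corresponds to explain_model=True.

```python
import re
from collections import Counter, defaultdict


def top_k_tokens(texts, labels, k=25, filter_words=("a", "an", "the"), lower=True):
    """Count tokens per label and keep the k most frequent ones (all tokens if k is None)."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        if lower:
            text = text.lower()
        tokens = re.findall(r"\w+", text)  # simplistic tokenizer (assumption)
        counts[label].update(t for t in tokens if t not in filter_words)
    return {label: counter.most_common(k) for label, counter in counts.items()}


texts = ["I love the movie", "I love it", "The plot was a mess", "A boring mess"]
labels = ["pos", "pos", "neg", "neg"]
top = top_k_tokens(texts, labels, k=2)
```

The result maps each label to a list of (token, frequency) pairs, which mirrors the "each label with corresponding top words and their frequency" shape described under Returns.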
token_information(splits='test', explain_model=True, k=25, filter_words=<Proxy at 0x721f4db20440 wrapping ['de', 'het', 'een'] at 0x721e97027c80 with factory <function lazy..>>, lower=True, seed=0, **count_vectorizer_kwargs)
Show the top-k token mutual information for a dataset or model.
Parameters:
- splits (Union[str, List[str]], optional) – Split names to get the explanation for. Defaults to ‘test’.
- explain_model (bool, optional) – Whether to explain the model (True) or ground-truth labels (False). Defaults to True.
- labelwise (bool, optional) – Whether to summarize the counts for each label separately. Defaults to True.
- k (Optional[int], optional) – Limit to the top-k words per label, or all words if None. Defaults to 25.
- filter_words (List[str], optional) – Words to filter out from top-k. Defaults to [‘a’, ‘an’, ‘the’].
- lower (bool, optional) – Whether to make all tokens lowercase. Defaults to True.
- seed (int, optional) – Seed for reproducibility. Defaults to 0.
- **count_vectorizer_kwargs – Optional arguments passed to CountVectorizer/FastCountVectorizer.
Returns:
k tokens, sorted by their mutual information with the output (predicted model labels or ground-truth labels).
Return type:
Union[FeatureList, MultipleReturn]