Maximum Mean Discrepancy — alibi-detect 0.12.0 documentation

Overview

The Maximum Mean Discrepancy (MMD) detector is a kernel-based method for multivariate two-sample testing. The MMD is a distance-based measure between two distributions p and q, based on the mean embeddings \(\mu_{p}\) and \(\mu_{q}\) in a reproducing kernel Hilbert space \(F\):

\[MMD^2(F, p, q) = || \mu_{p} - \mu_{q} ||^2_{F}\]

We can compute unbiased estimates of \(MMD^2\) from samples of the two distributions after applying the kernel trick. By default we use a radial basis function (RBF) kernel, but users are free to pass their own kernel to the detector. We obtain a \(p\)-value via a permutation test on the values of \(MMD^2\).
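The estimate and the permutation test described above can be sketched in a few lines of dependency-free Python. This is an illustrative sketch, not the library's implementation: the function names are hypothetical, the bandwidth is set with a median-distance heuristic (an assumption made here for convenience), and the data are synthetic.

```python
import math
import random


def rbf_kernel(a, b, sigma):
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased MMD^2 estimate from a precomputed kernel matrix K."""
    n, m = len(idx_x), len(idx_y)
    # Within-sample terms exclude the diagonal (i == j) to stay unbiased.
    k_xx = sum(K[i][j] for i in idx_x for j in idx_x if i != j) / (n * (n - 1))
    k_yy = sum(K[i][j] for i in idx_y for j in idx_y if i != j) / (m * (m - 1))
    k_xy = sum(K[i][j] for i in idx_x for j in idx_y) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(x, y, n_permutations=100, seed=0):
    """Return (observed MMD^2, permutation p-value) for samples x and y."""
    rng = random.Random(seed)
    z = x + y
    n = len(x)
    # Assumed bandwidth choice: median pairwise distance of the pooled sample.
    dists = [math.dist(z[i], z[j]) for i in range(len(z)) for j in range(i + 1, len(z))]
    sigma = sorted(dists)[len(dists) // 2] or 1.0
    K = [[rbf_kernel(a, b, sigma) for b in z] for a in z]
    idx = list(range(len(z)))
    observed = mmd2_unbiased(K, idx[:n], idx[n:])
    # p-value: fraction of random relabellings with MMD^2 at least as large.
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(idx)
        if mmd2_unbiased(K, idx[:n], idx[n:]) >= observed:
            exceed += 1
    return observed, exceed / n_permutations


# Synthetic 2-D data: one sample from the reference distribution, one shifted.
data_rng = random.Random(1)
x_ref = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
x_same = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(30)]
x_shift = [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(30)]

mmd_same, p_same = mmd_permutation_test(x_ref, x_same)
mmd_shift, p_shift = mmd_permutation_test(x_ref, x_shift)
```

Because the kernel matrix is computed once on the pooled sample, each permutation only re-indexes it, which keeps the test cheap relative to recomputing kernels.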

For high-dimensional data, we typically want to reduce the dimensionality before computing the permutation test. Following suggestions in Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, we incorporate Untrained AutoEncoders (UAE) and black-box shift detection using the classifier’s softmax outputs (BBSDs) as out-of-the-box preprocessing methods, and note that PCA can also be easily implemented using scikit-learn. Preprocessing methods which do not rely on the classifier will usually pick up drift in the input data, while BBSDs focuses on label shift.
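As a rough illustration of the PCA option, the sketch below fits scikit-learn's PCA on the reference data and wraps the projection in a preprocessing function. The data are synthetic, and the number of components and the name preprocess_fn are arbitrary choices made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for high-dimensional reference data (200 rows, 50 features).
rng = np.random.default_rng(0)
x_ref = rng.normal(size=(200, 50)).astype(np.float32)

# Fit the projection once, on the reference data only.
pca = PCA(n_components=5).fit(x_ref)


def preprocess_fn(x):
    """Project a batch onto the principal components fitted on x_ref."""
    return pca.transform(x).astype(np.float32)


x_ref_reduced = preprocess_fn(x_ref)

# The function could then be passed to the detector, for example:
# cd = MMDDrift(x_ref, backend='pytorch', p_val=.05, preprocess_fn=preprocess_fn)
```

Fitting on the reference data alone means test batches are projected into the same fixed subspace, so differences after projection reflect the data rather than a refitted transform.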

Detecting input data drift (covariate shift) \(\Delta p(x)\) for text data requires a custom preprocessing step. We can pick up changes in the semantics of the input by extracting (contextual) embeddings and detecting drift on those. Strictly speaking, we are no longer detecting \(\Delta p(x)\), since the whole training procedure (objective function, training data, etc.) for the (pre)trained embeddings affects the embeddings we extract. The library contains functionality to leverage pre-trained embeddings from Hugging Face’s transformers package, but also allows you to easily use your own embeddings of choice. Both options are illustrated with examples in the Text drift detection on IMDB movie reviews notebook.

Usage

Initialize

Arguments:

Keyword arguments:

Additional PyTorch keyword arguments:

Additional KeOps keyword arguments:

Initialized drift detector examples for each of the available backends:

from alibi_detect.cd import MMDDrift

cd_tf = MMDDrift(x_ref, backend='tensorflow', p_val=.05)
cd_torch = MMDDrift(x_ref, backend='pytorch', p_val=.05)
cd_keops = MMDDrift(x_ref, backend='keops', p_val=.05)

We can also easily add preprocessing functions for the TensorFlow and PyTorch frameworks. Note that we can also combine for instance a PyTorch preprocessing step with a KeOps detector. The following example uses a randomly initialized image encoder in PyTorch:

from functools import partial
import torch
import torch.nn as nn
from alibi_detect.cd.pytorch import preprocess_drift

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# define encoder

encoder_net = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Conv2d(128, 512, 4, stride=2, padding=0),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(2048, 32)
).to(device).eval()

# define preprocessing function

preprocess_fn = partial(preprocess_drift, model=encoder_net, device=device, batch_size=512)

cd = MMDDrift(x_ref, backend='pytorch', p_val=.05, preprocess_fn=preprocess_fn)
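The nn.Linear(2048, 32) layer in the encoder above only fits if the flattened convolutional output has exactly 2048 features. The sketch below checks that arithmetic under the assumption of 32x32 RGB inputs (CIFAR10-sized; the input size is not stated in this section). The helper conv2d_out is a hypothetical name, and the formula is the standard output-size rule for a convolution with dilation 1.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a 2-D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # assumed input height and width
sizes = [size]
for _ in range(3):  # three Conv2d layers, each with kernel 4, stride 2, padding 0
    size = conv2d_out(size, kernel=4, stride=2, padding=0)
    sizes.append(size)

# The last Conv2d has 512 output channels, so Flatten yields 512 * size * size.
flat_features = 512 * size * size
```

With a different input resolution the flattened size changes, so the first argument of nn.Linear would need to be adjusted accordingly.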

The same functionality is supported in TensorFlow; the main difference is the import: from alibi_detect.cd.tensorflow import preprocess_drift. Other preprocessing steps, such as the output of a model's hidden layers or text embeddings extracted with transformer models, can be used in a similar way in both frameworks. TensorFlow example for the hidden layer output:

from functools import partial
from alibi_detect.cd.tensorflow import HiddenOutput, preprocess_drift

model = ...  # TensorFlow model; tf.keras.Model or tf.keras.Sequential
preprocess_fn = partial(preprocess_drift, model=HiddenOutput(model, layer=-1), batch_size=128)

cd = MMDDrift(x_ref, backend='tensorflow', p_val=.05, preprocess_fn=preprocess_fn)

Check out the Drift detection on CIFAR10 example for more details.

Alibi Detect also includes custom text preprocessing steps in both TensorFlow and PyTorch based on Hugging Face’s transformers package:

from functools import partial
import torch
import torch.nn as nn
from transformers import AutoTokenizer
from alibi_detect.cd.pytorch import preprocess_drift
from alibi_detect.models.pytorch import TransformerEmbedding

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_name = 'bert-base-cased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

embedding_type = 'hidden_state'
layers = [5, 6, 7]
enc_dim = 32  # example output dimension (assumed; not defined in the original snippet)
embed = TransformerEmbedding(model_name, embedding_type, layers)
model = nn.Sequential(embed, nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, enc_dim)).to(device).eval()
preprocess_fn = partial(preprocess_drift, model=model, tokenizer=tokenizer, max_len=512, batch_size=32)

# initialise drift detector

cd = MMDDrift(x_ref, backend='pytorch', p_val=.05, preprocess_fn=preprocess_fn)

Again, the same functionality is supported in TensorFlow, using the imports from alibi_detect.cd.tensorflow import preprocess_drift and from alibi_detect.models.tensorflow import TransformerEmbedding. Check out the Text drift detection on IMDB movie reviews example for more information.

Detect Drift

We detect data drift by simply calling predict on a batch of instances x. We can return the p-value and the threshold of the permutation test by setting return_p_val to True and the maximum mean discrepancy metric and threshold by setting return_distance to True.

The prediction takes the form of a dictionary with meta and data keys. meta contains the detector’s metadata while data is also a dictionary which contains the actual predictions stored in the following keys:

preds = cd.predict(X, return_p_val=True, return_distance=True)
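To show how such a prediction dictionary might be consumed, here is a small sketch that formats it into a one-line summary. The key names (is_drift, p_val, threshold, distance, distance_threshold) are assumptions based on the description above, since the key list is not reproduced in this text; the example dictionary is hand-written for illustration and is not detector output.

```python
def summarise_prediction(preds):
    """Turn a detector prediction dictionary into a one-line summary."""
    data = preds['data']
    parts = ['drift' if data['is_drift'] else 'no drift']
    # The p-value and its threshold are only present if return_p_val=True.
    if data.get('p_val') is not None:
        parts.append(f"p-value={data['p_val']:.3f} (threshold={data['threshold']:.3f})")
    # The MMD^2 value is only present if return_distance=True.
    if data.get('distance') is not None:
        parts.append(f"MMD^2={data['distance']:.4f}")
    return ', '.join(parts)


# Hand-written stand-in with the assumed layout (values are illustrative only).
example_preds = {
    'meta': {'name': 'MMDDrift'},
    'data': {
        'is_drift': 1,
        'p_val': 0.01,
        'threshold': 0.05,
        'distance': 0.12,
        'distance_threshold': 0.03,
    },
}
summary = summarise_prediction(example_preds)
```

Reading optional entries with .get keeps the helper usable when predict is called with return_p_val or return_distance set to False.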

Examples

Graph

Drift detection on molecular graphs

Image

Drift detection on CIFAR10

Tabular

Scaling up drift detection with KeOps

Text

Text drift detection on IMDB movie reviews