Datasets API Reference — TextAttack 0.3.10 documentation

The Dataset class defines the dataset object used for carrying out attacks, augmentation, and training. Dataset is the most basic class and can be used to wrap a list of input-output pairs. To load datasets from text, CSV, or JSON files, we recommend using the 🤗 Datasets library to first load them as a datasets.Dataset object and then pass that object to TextAttack’s HuggingFaceDataset class.
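For example, a local CSV file can be loaded with 🤗 Datasets and then wrapped for TextAttack roughly as follows (a minimal sketch; the file name reviews.csv and its "text"/"label" columns are hypothetical):

import datasets
import textattack

# Load a local CSV file with 🤗 Datasets (hypothetical file with "text" and "label" columns).
hf_dataset = datasets.load_dataset("csv", data_files="reviews.csv", split="train")

# Wrap the datasets.Dataset object so TextAttack can consume it.
dataset = textattack.datasets.HuggingFaceDataset(
    hf_dataset,
    dataset_columns=(["text"], "label"),
)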

Dataset

class textattack.datasets.Dataset(dataset, input_columns=['text'], label_map=None, label_names=None, output_scale_factor=None, shuffle=False)[source]

Basic class for datasets. It operates as a map-style dataset, fetching data via the __getitem__() and __len__() methods.

Note

This class subclasses torch.utils.data.Dataset and therefore can be treated as a regular PyTorch Dataset.

Parameters:

dataset (list[tuple]) – A list of (input, output) pairs. If the input consists of multiple fields (e.g. "premise" and "hypothesis" for SNLI), the input must itself be a tuple of those fields and input_columns must be set accordingly. The output can be an integer label for classification or a string for seq2seq tasks.

input_columns (list[str], optional, defaults to ["text"]) – List of column names of the inputs, in order.

label_map (dict, optional, defaults to None) – Mapping used to re-map the dataset’s labels, e.g. {0: 1, 1: 0} if the model was trained with a different label arrangement.

label_names (list[str], optional, defaults to None) – List of label names in corresponding order (e.g. ["negative", "positive"]). If not set, labels are printed as-is; leave as None for non-classification datasets.

output_scale_factor (float, optional, defaults to None) – Factor to divide ground-truth outputs by, useful for regression tasks (e.g. STS-B) where goal functions expect values between 0 and 1.

shuffle (bool, optional, defaults to False) – Whether to shuffle the underlying dataset.

Examples:

import textattack

# Example of a sentiment-classification dataset
data = [("I enjoyed the movie a lot!", 1), ("Absolutely horrible film.", 0), ("Our family had a fun time!", 1)]
dataset = textattack.datasets.Dataset(data)
dataset[1:2]

# Example for a pair of sequence inputs (e.g. SNLI)
data = [(("A man inspects the uniform of a figure in some East Asian country.", "The man is sleeping"), 1)]
dataset = textattack.datasets.Dataset(data, input_columns=("premise", "hypothesis"))

# Example for seq2seq
data = [("J'aime le film.", "I love the movie.")]
dataset = textattack.datasets.Dataset(data)

__getitem__(i)[source]

Return i-th sample.

__len__()[source]

Returns the size of the dataset.
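To illustrate how the labeling parameters interact with indexing, here is a rough sketch (the data, label arrangement, and label names are made up for illustration): a dataset whose labels are arranged differently from the model’s can be re-mapped with label_map, and label_names makes printed results human-readable.

import textattack

# Hypothetical data whose labels are arranged 0 = positive, 1 = negative,
# while the target model expects 0 = negative, 1 = positive.
data = [("I enjoyed the movie a lot!", 0), ("Absolutely horrible film.", 1)]

dataset = textattack.datasets.Dataset(
    data,
    label_map={0: 1, 1: 0},                # re-map dataset labels to the model's arrangement
    label_names=["negative", "positive"],  # label names in the model's label order
)

print(len(dataset))  # size of the dataset via __len__()
print(dataset[0])    # i-th sample via __getitem__(), an (input, output) pair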

HuggingFaceDataset

class textattack.datasets.HuggingFaceDataset(name_or_dataset, subset=None, split='train', dataset_columns=None, label_map=None, label_names=None, output_scale_factor=None, shuffle=False)[source]

Loads a dataset from 🤗 Datasets and prepares it as a TextAttack dataset.

Parameters:

name_or_dataset (str or datasets.Dataset) – The dataset name to load from 🤗 Datasets, or an already-loaded datasets.Dataset object. When passing a custom datasets.Dataset, set the input and output columns via dataset_columns.

subset (str, optional, defaults to None) – The subset of the main dataset to load (e.g. "sst2" for "glue").

split (str, optional, defaults to "train") – The split of the dataset to use.

dataset_columns (tuple(list[str], str), optional, defaults to None) – Pair of the list of input column names (e.g. ["premise", "hypothesis"]) and the output column name (e.g. "label"). If not set, TextAttack attempts to determine the columns automatically for known datasets.

label_map (dict, optional, defaults to None) – Mapping used to re-map the dataset’s labels to match the model’s label arrangement.

label_names (list[str], optional, defaults to None) – List of label names in corresponding order. If not set, labels are printed as-is.

output_scale_factor (float, optional, defaults to None) – Factor to divide ground-truth outputs by, useful for regression tasks.

shuffle (bool, optional, defaults to False) – Whether to shuffle the underlying dataset.
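For instance, a standard benchmark can be loaded by name (a minimal sketch; the dataset name, subset, and split shown are just one common choice):

import textattack

# Load the SST-2 subset of GLUE from the Hugging Face Hub and wrap it for TextAttack.
dataset = textattack.datasets.HuggingFaceDataset("glue", subset="sst2", split="validation")

print(len(dataset))  # number of examples in the chosen split
print(dataset[0])    # first (input, output) pair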

__getitem__(i)[source]

Return i-th sample.

__len__()

Returns the size of the dataset.