Feature Extractor
A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences (e.g., pre-processing audio files to generate Log-Mel spectrogram features) and from images (e.g., cropping image files), as well as padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.
FeatureExtractionMixin
This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.
from_pretrained
( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )
Parameters
- pretrained_model_name_or_path (`str` or `os.PathLike`) — This can be either:
  - a string, the model id of a pretrained feature_extractor hosted inside a model repo on huggingface.co.
  - a path to a directory containing a feature extractor file saved using the save_pretrained() method, e.g., `./my_model_directory/`.
  - a path or url to a saved feature extractor JSON file, e.g., `./my_model_directory/preprocessor_config.json`.
- cache_dir (`str` or `os.PathLike`, optional) — Path to a directory in which a downloaded pretrained model feature extractor should be cached if the standard cache should not be used.
- force_download (`bool`, optional, defaults to `False`) — Whether or not to force (re-)downloading the feature extractor files and overriding the cached versions if they exist.
- resume_download — Deprecated and ignored. All downloads are now resumed by default when possible. Will be removed in v5 of Transformers.
- proxies (`Dict[str, str]`, optional) — A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
- token (`str` or `bool`, optional) — The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use the token generated when running `huggingface-cli login` (stored in `~/.huggingface`).
- revision (`str`, optional, defaults to `"main"`) — The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any identifier allowed by git.
Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.
Examples:
```python
from transformers import Wav2Vec2FeatureExtractor

# Download a feature extractor from huggingface.co and cache it.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
# Load from a directory saved with save_pretrained().
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/")
# Load directly from a saved feature extractor JSON file.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Kwargs override matching feature extractor attributes at load time.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
# With return_unused_kwargs=True, kwargs that are not attributes are returned as well.
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}
```
save_pretrained
( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )
Parameters
- save_directory (`str` or `os.PathLike`) — Directory where the feature extractor JSON file will be saved (will be created if it does not exist).
- push_to_hub (`bool`, optional, defaults to `False`) — Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the repository you want to push to with `repo_id` (will default to the name of `save_directory` in your namespace).
- kwargs (`Dict[str, Any]`, optional) — Additional keyword arguments passed along to the push_to_hub() method.
Save a feature_extractor object to the directory `save_directory`, so that it can be re-loaded using the from_pretrained() class method.
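A minimal round-trip sketch (the checkpoint id and local path are illustrative):

```python
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
# Writes preprocessor_config.json into the directory, creating it if needed.
feature_extractor.save_pretrained("./my_model_directory/")
# The saved feature extractor can then be restored from the same directory.
reloaded = Wav2Vec2FeatureExtractor.from_pretrained("./my_model_directory/")
```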
SequenceFeatureExtractor
class transformers.SequenceFeatureExtractor
( feature_size: int sampling_rate: int padding_value: float **kwargs )
Parameters
- feature_size (`int`) — The feature dimension of the extracted features.
- sampling_rate (`int`) — The sampling rate at which the audio files should be digitalized, expressed in hertz (Hz).
- padding_value (`float`) — The value that is used to fill the padding values / vectors.
This is a general feature extraction class for speech recognition.
pad
( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, list[transformers.feature_extraction_utils.BatchFeature], dict[str, transformers.feature_extraction_utils.BatchFeature], dict[str, list[transformers.feature_extraction_utils.BatchFeature]], list[dict[str, transformers.feature_extraction_utils.BatchFeature]]] padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True max_length: typing.Optional[int] = None truncation: bool = False pad_to_multiple_of: typing.Optional[int] = None return_attention_mask: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
- processed_features (BatchFeature, list of BatchFeature, `Dict[str, List[float]]`, `Dict[str, List[List[float]]]` or `List[Dict[str, List[float]]]`) — Processed inputs. Can represent one input (BatchFeature or `Dict[str, List[float]]`) or a batch of input values / vectors (list of BatchFeature, `Dict[str, List[List[float]]]` or `List[Dict[str, List[float]]]`), so you can use this method during preprocessing as well as in a PyTorch Dataloader collate function.
  Instead of `List[float]` you can have tensors (numpy arrays, PyTorch tensors or TensorFlow tensors), see the note above for the return type.
- padding (`bool`, `str` or PaddingStrategy, optional, defaults to `True`) — Select a strategy to pad the returned sequences (according to the model's padding side and padding index) among:
  - `True` or `'longest'`: Pad to the longest sequence in the batch (or no padding if only a single sequence is provided).
  - `'max_length'`: Pad to a maximum length specified with the argument `max_length`, or to the maximum acceptable input length for the model if that argument is not provided.
  - `False` or `'do_not_pad'`: No padding (i.e., can output a batch with sequences of different lengths).
- max_length (`int`, optional) — Maximum length of the returned list and optionally padding length (see above).
- truncation (`bool`) — Activates truncation to cut input sequences longer than `max_length` to `max_length`.
- pad_to_multiple_of (`int`, optional) — If set, will pad the sequence to a multiple of the provided value.
  This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability `>= 7.0` (Volta), or on TPUs, which benefit from having sequence lengths be a multiple of 128.
- return_attention_mask (`bool`, optional) — Whether to return the attention mask. If left to the default, will return the attention mask according to the specific feature_extractor's default.
  What are attention masks?
- return_tensors (`str` or TensorType, optional) — If set, will return tensors instead of lists of python integers. Acceptable values are:
  - `'tf'`: Return TensorFlow `tf.constant` objects.
  - `'pt'`: Return PyTorch `torch.Tensor` objects.
  - `'np'`: Return NumPy `np.ndarray` objects.
Pad input values / input vectors or a batch of input values / input vectors up to a predefined length or to the max sequence length in the batch.
Padding side (left/right) and padding values are defined at the feature extractor level (with `self.padding_side`, `self.padding_value`).
If the `processed_features` passed are dictionaries of numpy arrays, PyTorch tensors or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with `return_tensors`. In the case of PyTorch tensors, you will however lose the specific device of your tensors.
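A rough sketch of batching two variable-length inputs (the dummy waveform lengths are made up):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
# Two dummy inputs of different lengths, as __call__ or a collate function might see them.
features = {"input_values": [np.zeros(16000, dtype=np.float32), np.zeros(8000, dtype=np.float32)]}
batch = feature_extractor.pad(
    features,
    padding="longest",           # pad to the longest sequence in the batch
    return_attention_mask=True,  # 1 for real values, 0 for padding
    return_tensors="np",
)
print(batch["input_values"].shape)    # (2, 16000)
print(batch["attention_mask"].shape)  # (2, 16000)
```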
BatchFeature
class transformers.BatchFeature
( data: typing.Optional[dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )
Parameters
- data (`dict`, optional) — Dictionary of lists/arrays/tensors returned by the __call__/pad methods ('input_values', 'attention_mask', etc.).
- tensor_type (`Union[None, str, TensorType]`, optional) — You can give a tensor_type here to convert the lists of integers into PyTorch/TensorFlow/NumPy tensors at initialization.
Holds the output of the pad() and feature extractor specific `__call__` methods.
This class is derived from a python dictionary and can be used as a dictionary.
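Since BatchFeature subclasses dict, the usual dictionary access works; a minimal sketch:

```python
from transformers import BatchFeature

batch = BatchFeature(data={"input_values": [[0.1, 0.2, 0.3]]}, tensor_type="np")
print(list(batch.keys()))     # ['input_values'] — plain dict-style access
print(batch["input_values"])  # already converted to a NumPy array at init
```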
convert_to_tensors
( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )
Parameters
- tensor_type (`str` or TensorType, optional) — The type of tensors to use. If `str`, should be one of the values of the enum TensorType. If `None`, no modification is done.
Convert the inner content to tensors.
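A minimal sketch, assuming PyTorch is installed:

```python
from transformers import BatchFeature

batch = BatchFeature(data={"input_values": [[0.1, 0.2, 0.3]]})
batch.convert_to_tensors(tensor_type="pt")  # lists become torch.Tensor in place
print(type(batch["input_values"]))          # <class 'torch.Tensor'>
```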
to
( *args **kwargs ) → BatchFeature
Parameters
- args (`Tuple`) — Will be passed to the `to(...)` function of the tensors.
- kwargs (`Dict`, optional) — Will be passed to the `to(...)` function of the tensors. To enable asynchronous data transfer, set the `non_blocking` flag in `kwargs` (defaults to `False`).
The same instance after modification.
Send all values to device by calling `v.to(*args, **kwargs)` (PyTorch only). This should support casting to different `dtypes` and sending the `BatchFeature` to a different `device`.
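A sketch of dtype casting and device placement, assuming PyTorch is available:

```python
import torch
from transformers import BatchFeature

batch = BatchFeature(data={"input_values": [[0.1, 0.2, 0.3]]}, tensor_type="pt")
batch = batch.to(torch.float16)  # cast all floating tensors, mirroring torch.Tensor.to
if torch.cuda.is_available():
    batch = batch.to("cuda", non_blocking=True)  # asynchronous transfer
```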
ImageFeatureExtractionMixin
Mixin that contains utilities for preparing image features.
center_crop
( image size ) → new_image
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor` of shape (n_channels, height, width) or (height, width, n_channels)) — The image to crop.
- size (`int` or `Tuple[int, int]`) — The size to which the image will be cropped.
A center-cropped `PIL.Image.Image` or `np.ndarray` or `torch.Tensor` of shape (n_channels, height, width).
Crops `image` to the given size using a center crop. Note that if the image is too small to be cropped to the given size, it will be padded (so the returned result has the requested size).
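A minimal sketch; it assumes ImageFeatureExtractionMixin (importable from transformers.image_utils) can be instantiated directly, since these helpers keep no instance state:

```python
from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
image = Image.new("RGB", (640, 480))          # placeholder image
cropped = mixin.center_crop(image, size=224)  # an int size means a 224x224 crop
print(cropped.size)                           # (224, 224)
```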
convert_rgb
( image )
Parameters
- image (`PIL.Image.Image`) — The image to convert.
Converts `PIL.Image.Image` to RGB format.
expand_dims
( image )
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image to expand.
Expands a 2-dimensional `image` to 3 dimensions.
flip_channel_order
( image )
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image whose color channels to flip. If `np.ndarray` or `torch.Tensor`, the channel dimension should be first.
Flips the channel order of `image` from RGB to BGR, or vice versa. Note that this will trigger a conversion of `image` to a NumPy array if it's a PIL Image.
normalize
( image mean std rescale = False )
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image to normalize.
- mean (`List[float]` or `np.ndarray` or `torch.Tensor`) — The mean (per channel) to use for normalization.
- std (`List[float]` or `np.ndarray` or `torch.Tensor`) — The standard deviation (per channel) to use for normalization.
- rescale (`bool`, optional, defaults to `False`) — Whether or not to rescale the image to be between 0 and 1. If a PIL image is provided, scaling will happen automatically.
Normalizes `image` with `mean` and `std`. Note that this will trigger a conversion of `image` to a NumPy array if it's a PIL Image.
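A sketch under the same instantiation assumption as above; note the automatic rescaling for PIL inputs:

```python
from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
image = Image.new("RGB", (224, 224))
# A PIL input is converted to a channel-first NumPy array and rescaled to [0, 1] first.
normalized = mixin.normalize(image, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(normalized.shape)  # (3, 224, 224)
```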
rescale
( image: ndarray scale: typing.Union[float, int] )
Rescales a numpy image by the `scale` amount.
resize
( image size resample = None default_to_square = True max_size = None ) → image
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image to resize.
- size (`int` or `Tuple[int, int]`) — The size to use for resizing the image. If `size` is a sequence like (h, w), the output size will be matched to this.
  If `size` is an int and `default_to_square` is `True`, then the image will be resized to (size, size). If `size` is an int and `default_to_square` is `False`, then the smaller edge of the image will be matched to this number, i.e., if height > width, the image will be rescaled to (size * height / width, size).
- resample (`int`, optional, defaults to `PILImageResampling.BILINEAR`) — The filter to use for resampling.
- default_to_square (`bool`, optional, defaults to `True`) — How to convert `size` when it is a single int. If set to `True`, the `size` will be converted to a square (`size`, `size`). If set to `False`, will replicate `torchvision.transforms.Resize` with support for resizing only the smallest edge and providing an optional `max_size`.
- max_size (`int`, optional, defaults to `None`) — The maximum allowed size for the longer edge of the resized image: if the longer edge of the image is greater than `max_size` after being resized according to `size`, then the image is resized again so that the longer edge is equal to `max_size`. As a result, `size` might be overruled, i.e., the smaller edge may be shorter than `size`. Only used if `default_to_square` is `False`.
A resized `PIL.Image.Image`.
Resizes `image`. Enforces conversion of input to PIL.Image.
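A sketch of the two `size` interpretations, under the same instantiation assumption:

```python
from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
image = Image.new("RGB", (640, 480))
# default_to_square=True: an int size yields a square output.
print(mixin.resize(image, size=224).size)                           # (224, 224)
# default_to_square=False: the smaller edge (here the height) is matched to size.
print(mixin.resize(image, size=224, default_to_square=False).size)  # (298, 224)
```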
rotate
( image angle resample = None expand = 0 center = None translate = None fillcolor = None ) → image
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image to rotate. If `np.ndarray` or `torch.Tensor`, will be converted to `PIL.Image.Image` before rotating.
A rotated `PIL.Image.Image`.
Returns a rotated copy of `image`. This method returns a copy of `image`, rotated the given number of degrees counterclockwise around its center.
to_numpy_array
( image rescale = None channel_first = True )
Parameters
- image (`PIL.Image.Image` or `np.ndarray` or `torch.Tensor`) — The image to convert to a NumPy array.
- rescale (`bool`, optional) — Whether or not to apply the scaling factor (to make pixel values floats between 0. and 1.). Will default to `True` if the image is a PIL Image or an array/tensor of integers, `False` otherwise.
- channel_first (`bool`, optional, defaults to `True`) — Whether or not to permute the dimensions of the image to put the channel dimension first.
Converts `image` to a NumPy array. Optionally rescales it and puts the channel dimension as the first dimension.
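A sketch of the default and opt-out behaviors, under the same instantiation assumption:

```python
from PIL import Image
from transformers.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
image = Image.new("RGB", (224, 224))
array = mixin.to_numpy_array(image)  # PIL input: rescaled floats, channel-first
print(array.shape, array.dtype)      # (3, 224, 224) float32
array = mixin.to_numpy_array(image, rescale=False, channel_first=False)
print(array.shape, array.dtype)      # (224, 224, 3) uint8
```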
to_pil_image
( image rescale = None )
Parameters
- image (`PIL.Image.Image` or `numpy.ndarray` or `torch.Tensor`) — The image to convert to the PIL Image format.
- rescale (`bool`, optional) — Whether or not to apply the scaling factor (to make pixel values integers between 0 and 255). Will default to `True` if the image type is a floating type, `False` otherwise.
Converts `image` to a PIL Image. Optionally rescales it and, if needed, puts the channel dimension back as the last axis.
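And a sketch of the inverse direction, under the same instantiation assumption:

```python
import numpy as np
from transformers.image_utils import ImageFeatureExtractionMixin

mixin = ImageFeatureExtractionMixin()
# A float, channel-first array is scaled to 0-255 and moved back to channels-last.
array = np.random.rand(3, 224, 224).astype(np.float32)
pil_image = mixin.to_pil_image(array)
print(pil_image.size)  # (224, 224)
```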