Feature Extractor

A feature extractor prepares input features for audio or vision models. This includes feature extraction from sequences (e.g., pre-processing audio files to generate Log-Mel Spectrogram features), feature extraction from images (e.g., cropping image files), as well as padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow tensors.

FeatureExtractionMixin

This is a feature extraction mixin used to provide saving/loading functionality for sequential and image feature extractors.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] cache_dir: typing.Union[str, os.PathLike, NoneType] = None force_download: bool = False local_files_only: bool = False token: typing.Union[str, bool, NoneType] = None revision: str = 'main' **kwargs )

Instantiate a type of FeatureExtractionMixin from a feature extractor, e.g. a derived class of SequenceFeatureExtractor.

Examples:

from transformers import Wav2Vec2FeatureExtractor

# Load from a model id on the Hub (downloaded and cached).
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
# Load from a directory that contains a saved feature extractor.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/")
# Load from a saved configuration file.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("./test/saved_model/preprocessor_config.json")
# Keyword arguments that match attributes override the loaded values.
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False
)
assert feature_extractor.return_attention_mask is False
# With return_unused_kwargs=True, unmatched keyword arguments are returned separately.
feature_extractor, unused_kwargs = Wav2Vec2FeatureExtractor.from_pretrained(
    "facebook/wav2vec2-base-960h", return_attention_mask=False, foo=False, return_unused_kwargs=True
)
assert feature_extractor.return_attention_mask is False
assert unused_kwargs == {"foo": False}

save_pretrained

( save_directory: typing.Union[str, os.PathLike] push_to_hub: bool = False **kwargs )

Save a feature_extractor object to the directory save_directory, so that it can be re-loaded using the from_pretrained() class method.
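The save/load round trip can be pictured as writing the extractor's attributes to a JSON file inside save_directory and rebuilding the object from that file. The sketch below is a toy illustration using only the standard library. ToyFeatureExtractor and its method bodies are hypothetical stand-ins, not the library's implementation; the file name preprocessor_config.json is taken from the loading examples above.

```python
import json
import os
import tempfile


class ToyFeatureExtractor:
    """Hypothetical stand-in that mimics the save/load round trip."""

    CONFIG_NAME = "preprocessor_config.json"

    def __init__(self, feature_size=1, sampling_rate=16000, padding_value=0.0):
        self.feature_size = feature_size
        self.sampling_rate = sampling_rate
        self.padding_value = padding_value

    def save_pretrained(self, save_directory):
        # Write every instance attribute to a JSON file inside save_directory.
        os.makedirs(save_directory, exist_ok=True)
        path = os.path.join(save_directory, self.CONFIG_NAME)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(vars(self), f, indent=2, sort_keys=True)
        return path

    @classmethod
    def from_pretrained(cls, path, **kwargs):
        # Accept either a directory or a direct path to the JSON file.
        if os.path.isdir(path):
            path = os.path.join(path, cls.CONFIG_NAME)
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
        # Keyword arguments override the stored values.
        config.update(kwargs)
        return cls(**config)


with tempfile.TemporaryDirectory() as tmp:
    ToyFeatureExtractor(sampling_rate=8000).save_pretrained(tmp)
    reloaded = ToyFeatureExtractor.from_pretrained(tmp, padding_value=-1.0)
```

In this toy version, stored values survive the round trip and explicit keyword arguments take precedence over them, mirroring the override behaviour shown in the examples above.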

SequenceFeatureExtractor

( feature_size: int sampling_rate: int padding_value: float **kwargs )

This is a general feature extraction class for speech recognition.

pad

( processed_features: typing.Union[transformers.feature_extraction_utils.BatchFeature, list[transformers.feature_extraction_utils.BatchFeature], dict[str, transformers.feature_extraction_utils.BatchFeature], dict[str, list[transformers.feature_extraction_utils.BatchFeature]], list[dict[str, transformers.feature_extraction_utils.BatchFeature]]] padding: typing.Union[bool, str, transformers.utils.generic.PaddingStrategy] = True max_length: typing.Optional[int] = None truncation: bool = False pad_to_multiple_of: typing.Optional[int] = None return_attention_mask: typing.Optional[bool] = None return_tensors: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Pad input values / input vectors, or a batch of input values / input vectors, up to a predefined length or to the max sequence length in the batch.

The padding side (left/right) and padding value are defined at the feature extractor level (with self.padding_side and self.padding_value).

If the processed_features passed are a dictionary of NumPy arrays, PyTorch tensors, or TensorFlow tensors, the result will use the same type unless you provide a different tensor type with return_tensors. In the case of PyTorch tensors, however, you will lose the specific device of your tensors.
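To make the padding behaviour concrete, here is a minimal pure-Python sketch of padding a batch to its longest sequence (or to a given max_length) on either side, together with a mask that marks real values. The helper pad_batch and the output keys "input_values" and "attention_mask" are assumptions made for illustration; this is not the library's pad() implementation.

```python
def pad_batch(sequences, padding_value=0.0, padding_side="right", max_length=None):
    """Pad variable-length sequences to a common length (hypothetical helper)."""
    # Default target: the longest sequence in the batch.
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    input_values, attention_mask = [], []
    for seq in sequences:
        seq = list(seq)
        n_pad = target - len(seq)
        if n_pad < 0:
            raise ValueError("sequence longer than max_length; truncate it first")
        filler = [padding_value] * n_pad
        mask = [1] * len(seq)  # 1 marks real values, 0 marks padding
        if padding_side == "right":
            input_values.append(seq + filler)
            attention_mask.append(mask + [0] * n_pad)
        else:
            input_values.append(filler + seq)
            attention_mask.append([0] * n_pad + mask)
    return {"input_values": input_values, "attention_mask": attention_mask}


batch = pad_batch([[0.1, 0.2, 0.3], [0.4]], padding_value=0.0)
left_padded = pad_batch([[1.0]], padding_side="left", max_length=3)
```

Here the second sequence in batch is extended with two padding values on the right, and the mask lets a downstream model ignore them.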

BatchFeature

class transformers.BatchFeature

( data: typing.Optional[dict[str, typing.Any]] = None tensor_type: typing.Union[NoneType, str, transformers.utils.generic.TensorType] = None )

Holds the output of the pad() and feature extractor specific __call__ methods.

This class is derived from a Python dictionary and can be used as a dictionary.

convert_to_tensors

( tensor_type: typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None )

Convert the inner content to tensors.
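As a rough mental model, a dictionary-like container with a conversion step could look like the sketch below. ToyBatch is a hypothetical class built on collections.UserDict, and it only handles NumPy via the string "np", which is an assumption of this sketch; it is not the actual BatchFeature code.

```python
from collections import UserDict

import numpy as np


class ToyBatch(UserDict):
    """Hypothetical dict-like container with an optional tensor conversion."""

    def __init__(self, data=None, tensor_type=None):
        super().__init__(data)
        self.convert_to_tensors(tensor_type)

    def convert_to_tensors(self, tensor_type=None):
        # No conversion requested: keep the values as they are.
        if tensor_type is None:
            return self
        if tensor_type != "np":
            raise ValueError("this sketch only supports tensor_type='np'")
        # Replace each value with a NumPy array, in place.
        for key, value in list(self.data.items()):
            self.data[key] = np.asarray(value)
        return self


toy = ToyBatch({"input_values": [[0.1, 0.2], [0.3, 0.4]]}, tensor_type="np")
```

Because the container behaves like a dictionary, values are read with the usual key lookup, for example toy["input_values"].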

to

( *args **kwargs ) → BatchFeature

Returns

The same instance after modification.

Send all values to device by calling v.to(*args, **kwargs) (PyTorch only). This should support casting to different dtypes and sending the BatchFeature to a different device.

ImageFeatureExtractionMixin

Mixin that contains utilities for preparing image features.

center_crop

( image size ) → new_image

Returns

A center cropped PIL.Image.Image or np.ndarray or torch.Tensor of shape: (n_channels, height, width).

Crops image to the given size using a center crop. Note that if the image is too small to be cropped to the size given, it will be padded (so the returned result has the size asked).
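The crop-or-pad behaviour described above can be sketched with NumPy as follows. This is an illustrative version under stated assumptions: the image is a channels-first array of shape (n_channels, height, width), size is an int or a (height, width) pair, and padding uses zeros. The library's own method may differ in these details.

```python
import numpy as np


def center_crop(image, size):
    """Center-crop a (n_channels, height, width) array, zero-padding if too small."""
    crop_h, crop_w = (size, size) if isinstance(size, int) else size
    _, height, width = image.shape
    # Top-left corner of the crop window; negative when the image is too small.
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    out = np.zeros((image.shape[0], crop_h, crop_w), dtype=image.dtype)
    # Overlap between the crop window and the image, in image coordinates.
    src_top, src_left = max(top, 0), max(left, 0)
    src_bottom = min(top + crop_h, height)
    src_right = min(left + crop_w, width)
    # The same region expressed in output coordinates.
    dst_top, dst_left = src_top - top, src_left - left
    h, w = src_bottom - src_top, src_right - src_left
    out[:, dst_top:dst_top + h, dst_left:dst_left + w] = image[
        :, src_top:src_bottom, src_left:src_right
    ]
    return out


image = np.arange(16).reshape(1, 4, 4)
cropped = center_crop(image, 2)              # keeps the central 2x2 window
padded = center_crop(np.ones((1, 2, 2)), 4)  # too small: zero-padded to 4x4
```

Either way the result has exactly the requested spatial size, which is the property the description above emphasizes.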

convert_rgb

( image )

Converts PIL.Image.Image to RGB format.

expand_dims

( image )

Expands 2-dimensional image to 3 dimensions.

flip_channel_order

( image )

Flips the channel order of image from RGB to BGR, or vice versa. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.
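For an array, flipping the channel order amounts to reversing the channel axis. The sketch below assumes a channels-first layout of shape (n_channels, height, width); the standalone function is illustrative rather than the library's code.

```python
import numpy as np


def flip_channel_order(image):
    """Reverse the channel axis of a channels-first (n_channels, height, width) array."""
    return image[::-1, :, :]


# Three constant channels so the order is easy to see: R=10, G=20, B=30.
rgb = np.stack([np.full((2, 2), c) for c in (10, 20, 30)])
bgr = flip_channel_order(rgb)
```

Applying the function twice restores the original order, which is why the same operation covers both RGB to BGR and BGR to RGB.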

normalize

( image mean std rescale = False )

Normalizes image with mean and std. Note that this will trigger a conversion of image to a NumPy array if it’s a PIL Image.
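Per-channel normalization subtracts a mean and divides by a standard deviation for each channel. The NumPy sketch below assumes a channels-first array and treats rescale=True as dividing 8-bit pixel values by 255 first; both points are assumptions for illustration, not a statement of the library's exact behaviour.

```python
import numpy as np


def normalize(image, mean, std, rescale=False):
    """Normalize a (n_channels, height, width) array channel by channel."""
    image = np.asarray(image, dtype=np.float32)
    if rescale:
        # Assumed meaning of rescale: map [0, 255] pixel values to [0, 1].
        image = image / 255.0
    # Reshape to (n_channels, 1, 1) so the values broadcast over height and width.
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    return (image - mean) / std


pixels = np.full((3, 2, 2), 255, dtype=np.uint8)
normalized = normalize(pixels, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5], rescale=True)
```

With a mean and std of 0.5, values in [0, 1] are mapped to [-1, 1], so the all-white image above becomes an array of ones.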

rescale

( image: ndarray scale: typing.Union[float, int] )

Rescales a NumPy image by the scale amount.

resize

( image size resample = None default_to_square = True max_size = None ) → image

Returns

A resized PIL.Image.Image.

Resizes image. Enforces conversion of input to PIL.Image.

rotate

( image angle resample = None expand = 0 center = None translate = None fillcolor = None ) → image

Returns

A rotated PIL.Image.Image.

Returns a copy of image, rotated the given number of degrees counterclockwise around its centre.

to_numpy_array

( image rescale = None channel_first = True )

Converts image to a numpy array. Optionally rescales it and puts the channel dimension as the first dimension.
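The two optional steps, rescaling and moving the channel dimension first, can be sketched as below. The helper name to_channel_first_array is hypothetical, the input is assumed to be a (height, width, n_channels) array of 8-bit values, and rescaling is assumed to mean division by 255.

```python
import numpy as np


def to_channel_first_array(image_hwc, rescale=True, channel_first=True):
    """Convert a (height, width, n_channels) image to an array (hypothetical helper)."""
    array = np.asarray(image_hwc)
    if rescale:
        # Assumed rescaling: map [0, 255] pixel values to [0, 1].
        array = array.astype(np.float32) / 255.0
    if channel_first and array.ndim == 3:
        # (height, width, n_channels) -> (n_channels, height, width)
        array = array.transpose(2, 0, 1)
    return array


hwc = np.zeros((4, 6, 3), dtype=np.uint8)
hwc[..., 0] = 255  # fully saturated first channel
chw = to_channel_first_array(hwc)
```

Converting back to a channels-last layout is the inverse transpose, array.transpose(1, 2, 0).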

to_pil_image

( image rescale = None )

Converts image to a PIL Image. Optionally rescales it and puts the channel dimension back as the last axis if needed.
