LIBRISPEECH — Torchaudio 2.7.0 documentation
class torchaudio.datasets.LIBRISPEECH(root: Union[str, Path], url: str = 'train-clean-100', folder_in_archive: str = 'LibriSpeech', download: bool = False)
LibriSpeech [Panayotov et al., 2015] dataset.
Parameters:
- root (str or Path) – Path to the directory where the dataset is found or downloaded.
- url (str, optional) – The URL to download the dataset from, or the type of the dataset to download. Allowed type values are "dev-clean", "dev-other", "test-clean", "test-other", "train-clean-100", "train-clean-360" and "train-other-500". (default: "train-clean-100")
- folder_in_archive (str, optional) – The top-level directory of the dataset. (default: "LibriSpeech")
- download (bool, optional) – Whether to download the dataset if it is not found at root path. (default: False)
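As an illustration of the constructor parameters above, the following is a minimal sketch rather than an official recipe. The helper name `load_subset` and the constant `ALLOWED_SUBSETS` are hypothetical; only the constructor call and its keyword arguments come from the signature documented here. The sketch covers named subsets only, although `url` may also be a full download URL.

```python
from pathlib import Path

# Subset names listed in the description of the `url` parameter.
ALLOWED_SUBSETS = (
    "dev-clean",
    "dev-other",
    "test-clean",
    "test-other",
    "train-clean-100",
    "train-clean-360",
    "train-other-500",
)


def load_subset(root, subset="train-clean-100", download=False):
    """Hypothetical helper: build a LIBRISPEECH dataset for a named subset."""
    if subset not in ALLOWED_SUBSETS:
        raise ValueError(
            f"unknown subset {subset!r}; expected one of {ALLOWED_SUBSETS}"
        )
    if download:
        # Assumption: creating the target directory up front is harmless
        # and avoids a failure when `root` does not exist yet.
        Path(root).mkdir(parents=True, exist_ok=True)
    # Imported here so the name check above runs without torchaudio installed.
    import torchaudio

    return torchaudio.datasets.LIBRISPEECH(root=root, url=subset, download=download)
```

A call such as `load_subset("./data", "dev-clean", download=True)` would fetch the subset on first use if it is not already present under `./data`; the path is only an example.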
__getitem__
LIBRISPEECH.__getitem__(n: int) → Tuple[Tensor, int, str, int, int, int]
Load the n-th sample from the dataset.
Parameters:
n (int) – The index of the sample to be loaded
Returns:
Tuple of the following items:
- Tensor: Waveform
- int: Sample rate
- str: Transcript
- int: Speaker ID
- int: Chapter ID
- int: Utterance ID
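The six-item tuple returned by `__getitem__` can be unpacked positionally. The sketch below is one possible way to do that; the helper `describe_sample` is hypothetical, and it assumes the waveform's last dimension is the time axis (a common layout, but not stated on this page).

```python
def describe_sample(sample):
    """Hypothetical helper: summarise one item returned by dataset[n]."""
    # Field order follows the return description of __getitem__.
    waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = sample
    # Assumption: the last dimension of the waveform counts audio frames.
    num_frames = waveform.shape[-1]
    return {
        "duration_s": num_frames / sample_rate,
        "transcript": transcript,
        "speaker_id": speaker_id,
        "chapter_id": chapter_id,
        "utterance_id": utterance_id,
    }
```

With a loaded dataset, `describe_sample(dataset[0])` would summarise the first utterance.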
get_metadata
LIBRISPEECH.get_metadata(n: int) → Tuple[str, int, str, int, int, int]
Get metadata for the n-th sample from the dataset. Returns filepath instead of waveform, but otherwise returns the same fields as __getitem__().
Parameters:
n (int) – The index of the sample to be loaded
Returns:
Tuple of the following items:
- str: Path to audio
- int: Sample rate
- str: Transcript
- int: Speaker ID
- int: Chapter ID
- int: Utterance ID
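Because `get_metadata` returns a file path instead of a waveform, it may be convenient for building an index without decoding audio. The following sketch groups audio paths by speaker; the helper `paths_by_speaker` is hypothetical, and it assumes the dataset supports `len()`, which this page does not document.

```python
from collections import defaultdict


def paths_by_speaker(dataset):
    """Hypothetical helper: map each speaker ID to its audio file paths."""
    groups = defaultdict(list)
    # Assumption: the dataset implements __len__.
    for n in range(len(dataset)):
        # Field order follows the return description of get_metadata.
        path, _rate, _transcript, speaker_id, _chapter, _utt = dataset.get_metadata(n)
        groups[speaker_id].append(path)
    return dict(groups)
```

The result is a plain dictionary, so `len(paths_by_speaker(dataset))` would give the number of distinct speakers in the loaded subset.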