nvidia.dali.fn.readers.nemo_asr — NVIDIA DALI

nvidia.dali.fn.readers.nemo_asr(*, bytes_per_sample_hint=[0], dont_use_mmap=False, downmix=True, dtype=DALIDataType.FLOAT, initial_fill=1024, lazy_init=False, manifest_filepaths, max_duration=0.0, min_duration=0.0, num_shards=1, pad_last_batch=False, prefetch_queue_depth=1, preserve=False, quality=50.0, random_shuffle=False, read_ahead=False, read_idxs=False, read_sample_rate=True, read_text=True, sample_rate=-1.0, seed=-1, shard_id=0, shuffle_after_epoch=False, skip_cached_images=False, stick_to_shard=False, tensor_init_bytes=1048576, device=None, name=None)

Reads automatic speech recognition (ASR) data (audio, text) from an NVIDIA NeMo compatible manifest.

Example manifest file:

{ "audio_filepath": "path/to/audio1.wav", "duration": 3.45, "text": "this is a nemo tutorial" } { "audio_filepath": "path/to/audio1.wav", "offset": 3.45, "duration": 1.45, "text": "same audio file but using offset" } { "audio_filepath": "path/to/audio2.wav", "duration": 3.45, "text": "third transcript in this example" }

Note

Only audio_filepath is a mandatory field. If duration is not specified, the whole audio file will be used. A missing text field will produce an empty string as the text.
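As a sketch of how such a manifest can be produced, the snippet below writes one JSON object per line; the file name and audio paths are hypothetical.

import json

# Hypothetical entries; only "audio_filepath" is mandatory,
# "duration", "offset" and "text" are optional.
entries = [
    {"audio_filepath": "path/to/audio1.wav", "duration": 3.45,
     "text": "this is a nemo tutorial"},
    {"audio_filepath": "path/to/audio2.wav",
     "text": "entry without a duration field"},
]

# A NeMo-compatible manifest is a text file with one JSON object per line.
with open("asr_manifest.json", "w") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")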

Warning

Handling of duration and offset fields is not yet implemented. The current implementation always reads the whole audio file.

This reader produces between 1 and 3 outputs:

- Decoded audio data, with the type selected by the dtype argument.
- (optional, if read_sample_rate=True) The audio sample rate.
- (optional, if read_text=True) The transcript text.
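A minimal pipeline sketch that consumes all three outputs, assuming a manifest file named asr_manifest.json exists, could look like this:

from nvidia.dali import pipeline_def, fn, types

# device_id=None: CPU-only pipeline, no GPU required.
@pipeline_def(batch_size=8, num_threads=2, device_id=None)
def asr_pipeline():
    # Returns (audio, sample_rate, text) because read_sample_rate
    # and read_text are both left at their default value (True).
    audio, sample_rate, text = fn.readers.nemo_asr(
        manifest_filepaths=["asr_manifest.json"],
        dtype=types.FLOAT,
        downmix=True,
        random_shuffle=True)
    return audio, sample_rate, text

pipe = asr_pipeline()
pipe.build()
audio, sample_rate, text = pipe.run()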

Supported backends

- 'cpu'

Keyword Arguments: