NeMo Speaker Diarization API - NVIDIA NeMo Framework User Guide

Model Classes

Mixins

class nemo.collections.asr.parts.mixins.DiarizationMixin

Bases: VerificationMixin

abstract diarize(

paths2audio_files: List[str],

batch_size: int = 1,

) → List[str]

Takes paths to audio files and returns speaker labels.

Parameters:

paths2audio_files – Paths to the audio files to be diarized.

Returns:

Speaker labels

class nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin

Bases: ABC

An abstract class for diarize-able models.

Creates a template function diarize() that provides an interface to perform diarization of audio tensors or filepaths.

The following abstract methods must be implemented by the subclass:

_setup_diarize_dataloader():
Set up the dataloader for diarization. Receives the output from _diarize_input_manifest_processing().

_diarize_forward():
Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing().

_diarize_output_processing():
Implements the post processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects.
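
The division of labor among these three hooks can be illustrated with a toy stand-in. The sketch below does not import NeMo: `ToyDiarizationMixin` and `ConstantDiarizer` are hypothetical names, a plain list of batches stands in for the DataLoader, and the `paths2audio_files` config key is an assumption. The real diarize() is likely to do more (config handling, temporary directories, setup and teardown), so treat this only as an outline of the template-method pattern.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class ToyDiarizationMixin(ABC):
    """Hypothetical stand-in for SpkDiarizationMixin; not the NeMo class."""

    def diarize(self, audio: List[str], batch_size: int = 1) -> List[Any]:
        # Template method: the control flow is fixed, the three hooks vary.
        config = {"paths2audio_files": audio, "batch_size": batch_size}
        results: List[Any] = []
        for batch in self._setup_diarize_dataloader(config):
            outputs = self._diarize_forward(batch)
            results.extend(self._diarize_output_processing(outputs, uniq_ids=batch))
        return results

    @abstractmethod
    def _setup_diarize_dataloader(self, config: Dict) -> Iterable[List[str]]: ...

    @abstractmethod
    def _diarize_forward(self, batch: Any) -> Any: ...

    @abstractmethod
    def _diarize_output_processing(self, outputs: Any, uniq_ids: List[str]) -> List[Any]: ...


class ConstantDiarizer(ToyDiarizationMixin):
    """Hypothetical subclass that assigns a single speaker to every file."""

    def _setup_diarize_dataloader(self, config):
        files, size = config["paths2audio_files"], config["batch_size"]
        # Chunk the file list into batches; a real model returns a DataLoader.
        return [files[i:i + size] for i in range(0, len(files), size)]

    def _diarize_forward(self, batch):
        # Placeholder "forward pass": one fake speaker index per file.
        return [0 for _ in batch]

    def _diarize_output_processing(self, outputs, uniq_ids):
        # Turn raw outputs into user-facing labels, one per recording.
        return [f"{uid}: speaker_{spk}" for uid, spk in zip(uniq_ids, outputs)]


labels = ConstantDiarizer().diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)
```

A subclass that leaves any of the three hooks unimplemented cannot be instantiated, which is what makes the mixin a template rather than a complete model.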

abstract _diarize_forward(batch: Any)

Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass.

Parameters:

batch – A batch of input data from the data loader that is used to perform the model’s forward pass.

Returns:

The model’s outputs that are processed by _diarize_output_processing().

_diarize_input_manifest_processing(

audio_files: List[str],

temp_dir: str,

diarcfg: DiarizeConfig,

) → Dict[str, Any]

Internal function to process the input audio filepaths and return a config dict for the dataloader.

Parameters:

audio_files – A list of string filepaths for audio files.
temp_dir – A temporary directory to store intermediate files.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

A config dict that is used to setup the dataloader for diarization.
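
As a rough illustration of this step, the sketch below writes a JSON-lines manifest into the temporary directory and returns a config dict pointing at it. It is an assumption-laden stand-in rather than NeMo's implementation: the function name, the manifest file name, and every dict key are hypothetical, and a plain `batch_size` argument replaces the `diarcfg` dataclass.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


def toy_input_manifest_processing(
    audio_files: List[str], temp_dir: str, batch_size: int = 1
) -> Dict[str, Any]:
    """Hypothetical stand-in: write a JSON-lines manifest and return a config dict."""
    manifest_path = os.path.join(temp_dir, "manifest.json")
    with open(manifest_path, "w", encoding="utf-8") as fp:
        for path in audio_files:
            # One JSON object per line; the key name is an assumption for this sketch.
            fp.write(json.dumps({"audio_filepath": path}) + "\n")
    # All keys below are assumed names, chosen only for illustration.
    return {
        "paths2audio_files": audio_files,
        "manifest_filepath": manifest_path,
        "batch_size": batch_size,
        "temp_dir": temp_dir,
    }


with tempfile.TemporaryDirectory() as tmp:
    config = toy_input_manifest_processing(["a.wav", "b.wav"], tmp, batch_size=2)
    # Read the manifest back while the temporary directory still exists.
    with open(config["manifest_filepath"], encoding="utf-8") as fp:
        manifest_lines = [json.loads(line) for line in fp]
```

The returned dict is the kind of object that _setup_diarize_dataloader() would then consume.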

_diarize_input_processing(

audio,

diarcfg: DiarizeConfig,

)

Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to set up the input data for diarization.

Parameters:

audio – Of type GenericDiarizationType
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

A DataLoader object that is used to iterate over the input audio data.

_diarize_on_begin(

audio: str | List[str],

diarcfg: DiarizeConfig,

)

Internal function to set up the model for diarization. Perform all setup and pre-checks here.

Parameters:

audio (Union[str, List[str]]) – Of type GenericDiarizationType
diarcfg (DiarizeConfig) – An instance of DiarizeConfig.

_diarize_on_end(

diarcfg: DiarizeConfig,

)

Internal function to tear down the model after diarization. Perform all teardown and post-checks here.

Parameters:

diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

abstract _diarize_output_processing(

outputs,

uniq_ids,

diarcfg: DiarizeConfig,

) → List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]

Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs.

Parameters:

outputs – The model’s outputs, as returned by _diarize_forward().
uniq_ids – List of unique recording identifiers in the batch.
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.

Returns:

The output can be a list of objects, a list of lists of objects, a tuple of objects, or a tuple of lists of objects. Its type is defined in GenericDiarizationType.

_input_audio_to_rttm_processing(

audio_files: List[str],

) → List[Dict[str, str | float]]

Generate a list of manifest-style dicts when audio is given as a list of paths to audio files.

Parameters:

audio_files – A list of paths to audio files.

Returns:

audio_rttm_map_dict – A list of manifest-style dicts.
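
To make "manifest-style dict" concrete, here is a minimal sketch that builds one dict per audio path with string and float values, matching the documented return type. The function name and the keys (`uniq_id`, `audio_filepath`, `offset`) are assumptions chosen for illustration; the fields NeMo actually emits may differ.

```python
import os
from typing import Dict, List, Union


def toy_audio_to_manifest_dicts(
    audio_files: List[str],
) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical stand-in: build one manifest-style dict per audio path."""
    entries: List[Dict[str, Union[str, float]]] = []
    for path in audio_files:
        # Derive an identifier from the file name without its extension.
        uniq_id = os.path.splitext(os.path.basename(path))[0]
        entries.append(
            {
                "uniq_id": uniq_id,      # assumed key
                "audio_filepath": path,  # assumed key: path passed through unchanged
                "offset": 0.0,           # assumed key: start at the beginning of the file
            }
        )
    return entries


entries = toy_audio_to_manifest_dicts(["/data/session_01.wav", "meeting.flac"])
```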

abstract _setup_diarize_dataloader(

config: Dict,

) → torch.utils.data.DataLoader

Internal function to set up the dataloader for diarization. This function is called by diarize() and diarize_generator() to set up the input data for diarization.

Parameters:

config – A config dict that is used to set up the dataloader for diarization. It can be generated by _diarize_input_manifest_processing().

Returns:

A DataLoader object that is used to iterate over the input audio data.

diarize(

audio: str | List[str] | numpy.ndarray | torch.utils.data.DataLoader,

batch_size: int = 1,

include_tensor_outputs: bool = False,

postprocessing_yaml: str | None = None,

num_workers: int = 1,

verbose: bool = False,

override_config: DiarizeConfig | None = None,

**config_kwargs,

) → List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]

Takes paths to audio files and returns speaker labels.

diarize_generator(

audio,

override_config: DiarizeConfig | None,

)

A generator version of the diarize() function.
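
The difference between a generator and a list-returning entry point can be sketched as follows. This is a toy under stated assumptions, not NeMo code: the function names are hypothetical, the per-file result is a placeholder string, and it is only assumed here that the list-returning version collects what the generator yields batch by batch.

```python
from typing import Iterator, List


def toy_diarize_generator(audio: List[str], batch_size: int = 1) -> Iterator[List[str]]:
    """Hypothetical stand-in: yield results one batch at a time."""
    for start in range(0, len(audio), batch_size):
        batch = audio[start:start + batch_size]
        # Placeholder for the forward pass plus output processing.
        yield [f"{path}: speaker_0" for path in batch]


def toy_diarize(audio: List[str], batch_size: int = 1) -> List[str]:
    """Collect everything the generator yields into a single list."""
    results: List[str] = []
    for batch_result in toy_diarize_generator(audio, batch_size):
        results.extend(batch_result)
    return results


gen = toy_diarize_generator(["a.wav", "b.wav", "c.wav"], batch_size=2)
first_batch = next(gen)  # only the first batch has been processed at this point
all_results = toy_diarize(["a.wav", "b.wav", "c.wav"], batch_size=2)
```

Consuming the generator directly lets a caller act on each batch as soon as it is ready instead of waiting for the whole input to finish.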