NeMo Speaker Diarization API - NVIDIA NeMo Framework User Guide
Model Classes
Mixins
class nemo.collections.asr.parts.mixins.DiarizationMixin
Bases: VerificationMixin
abstract diarize(
paths2audio_files: List[str],
batch_size: int = 1,
) → List[str]
Takes paths to audio files and returns speaker labels.
Parameters:
paths2audio_files – Paths to the audio fragments to be diarized.
Returns:
Speaker labels
class nemo.collections.asr.parts.mixins.diarization.SpkDiarizationMixin
Bases: ABC
An abstract class for diarize-able models.
Creates a template function diarize() that provides an interface to perform diarization of audio tensors or filepaths.
The following abstract methods must be implemented by the subclass:
- _setup_diarize_dataloader():
Sets up the dataloader for diarization. Receives the output from _diarize_input_manifest_processing().
- _diarize_forward():
Implements the model’s custom forward pass to return outputs that are processed by _diarize_output_processing().
- _diarize_output_processing():
Implements the post-processing of the model’s outputs to return the results to the user. The result can be a list of objects, list of list of objects, tuple of objects, tuple of list of objects, or a dict of list of objects.
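The division of labor described above follows the template-method pattern. The sketch below is a minimal illustration of that pattern and does not reproduce NeMo's implementation: ToyDiarizationMixin and ConstantSpeakerModel are hypothetical names, the dataloader is a plain list rather than a torch DataLoader, and the method signatures are simplified (no diarcfg or uniq_ids).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List


class ToyDiarizationMixin(ABC):
    """Hypothetical stand-in for the mixin; not NeMo's implementation."""

    def diarize(self, audio: List[str]) -> List[Any]:
        # Template method: the control flow is fixed here,
        # while each step is supplied by the subclass.
        config = {"audio_files": audio}  # stand-in for manifest processing
        results: List[Any] = []
        for batch in self._setup_diarize_dataloader(config):
            outputs = self._diarize_forward(batch)
            results.extend(self._diarize_output_processing(outputs))
        return results

    @abstractmethod
    def _setup_diarize_dataloader(self, config: Dict) -> Iterable[Any]:
        ...

    @abstractmethod
    def _diarize_forward(self, batch: Any) -> Any:
        ...

    @abstractmethod
    def _diarize_output_processing(self, outputs: Any) -> List[Any]:
        ...


class ConstantSpeakerModel(ToyDiarizationMixin):
    """Toy subclass that assigns every file to a single speaker."""

    def _setup_diarize_dataloader(self, config):
        # One file per batch; a real model would return a torch DataLoader.
        return [[path] for path in config["audio_files"]]

    def _diarize_forward(self, batch):
        return [(path, "speaker_0") for path in batch]

    def _diarize_output_processing(self, outputs):
        return [f"{path}: {label}" for path, label in outputs]


toy_labels = ConstantSpeakerModel().diarize(["a.wav", "b.wav"])
```

Because the three hooks are declared abstract, instantiating ToyDiarizationMixin directly raises TypeError, which is how an ABC-based mixin forces subclasses to provide them.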
abstract _diarize_forward(batch: Any)
Internal function to perform the model’s custom forward pass to return outputs that are processed by _diarize_output_processing(). This function is called by diarize() and diarize_generator() to perform the model’s forward pass.
Parameters:
batch – A batch of input data from the data loader that is used to perform the model’s forward pass.
Returns:
The model’s outputs that are processed by _diarize_output_processing().
_diarize_input_manifest_processing(
audio_files: List[str],
temp_dir: str,
diarcfg: DiarizeConfig,
) → Dict[str, Any]
Internal function to process the input audio filepaths and return a config dict for the dataloader.
Parameters:
- audio_files – A list of string filepaths for audio files.
- temp_dir – A temporary directory to store intermediate files.
- diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
Returns:
A config dict that is used to set up the dataloader for diarization.
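As a rough illustration of this step, the sketch below writes a JSON-lines manifest into the temporary directory and returns a config dict pointing at it. Everything here is an assumption made for illustration: ToyDiarizeConfig stands in for DiarizeConfig, build_dataloader_config is a hypothetical helper, and the manifest fields and config keys are not taken from the NeMo source.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToyDiarizeConfig:
    """Hypothetical stand-in for DiarizeConfig; field names are assumptions."""

    batch_size: int = 1
    num_workers: int = 1


def build_dataloader_config(
    audio_files: List[str], temp_dir: str, diarcfg: ToyDiarizeConfig
) -> Dict[str, Any]:
    # Write one JSON object per line, one line per audio file.
    manifest_path = os.path.join(temp_dir, "manifest.json")
    with open(manifest_path, "w", encoding="utf-8") as fp:
        for path in audio_files:
            fp.write(json.dumps({"audio_filepath": path}) + "\n")
    # Assumed keys; the real config dict may differ.
    return {
        "manifest_filepath": manifest_path,
        "batch_size": diarcfg.batch_size,
        "num_workers": diarcfg.num_workers,
        "temp_dir": temp_dir,
    }


with tempfile.TemporaryDirectory() as tmp:
    toy_config = build_dataloader_config(
        ["a.wav", "b.wav"], tmp, ToyDiarizeConfig(batch_size=2)
    )
    # Read the manifest back while the temporary directory still exists.
    with open(toy_config["manifest_filepath"], encoding="utf-8") as fp:
        manifest_lines = [json.loads(line) for line in fp]
```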
_diarize_input_processing(
audio,
diarcfg: DiarizeConfig,
)
Internal function to process the input audio data and return a DataLoader. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
Parameters:
- audio – Of type GenericDiarizationType
- diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
Returns:
A DataLoader object that is used to iterate over the input audio data.
_diarize_on_begin(
audio: str | List[str],
diarcfg: DiarizeConfig,
)
Internal function to set up the model for diarization. Perform all setup and pre-checks here.
Parameters:
- audio (Union[str, List[str]]) – Of type GenericDiarizationType
- diarcfg (DiarizeConfig) – An instance of DiarizeConfig.
_diarize_on_end(
diarcfg: DiarizeConfig,
)
Internal function to tear down the model after diarization. Perform all teardown and post-checks here.
Parameters:
diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
abstract _diarize_output_processing(
outputs,
uniq_ids,
diarcfg: DiarizeConfig,
) → List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]
Internal function to process the model’s outputs to return the results to the user. This function is called by diarize() and diarize_generator() to process the model’s outputs.
Parameters:
- outputs – The model’s outputs returned by _diarize_forward().
- uniq_ids – List of unique recording identifiers in the batch.
- diarcfg – The diarization config dataclass. Subclasses can change this to a different dataclass if needed.
Returns:
The output can be a list of objects, list of list of objects, tuple of objects, or tuple of list of objects. Its type is defined in GenericDiarizationType.
_input_audio_to_rttm_processing(
audio_files: List[str],
) → List[Dict[str, str | float]]
Generate manifest style dict if audio is a list of paths to audio files.
Parameters:
audio_files – A list of paths to audio files.
Returns:
audio_rttm_map_dict – A list of manifest-style dicts.
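To make "manifest-style dict" concrete, here is a small sketch that turns a list of paths into one dict per recording, with string and float values to match the documented return type. The helper name and the keys (uniq_id, audio_filepath, offset) are assumptions chosen for illustration; the entries the real method emits may carry different or additional fields.

```python
import os
from typing import Dict, List, Union


def audio_paths_to_manifest_dicts(
    audio_files: List[str],
) -> List[Dict[str, Union[str, float]]]:
    """Hypothetical helper: build one manifest-style dict per audio path."""
    entries: List[Dict[str, Union[str, float]]] = []
    for path in audio_files:
        # Use the file name without its extension as the recording ID.
        uniq_id = os.path.splitext(os.path.basename(path))[0]
        entries.append(
            {"uniq_id": uniq_id, "audio_filepath": path, "offset": 0.0}
        )
    return entries


toy_entries = audio_paths_to_manifest_dicts(["/data/session1.wav", "mix.flac"])
```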
abstract _setup_diarize_dataloader(
config: Dict,
) → torch.utils.data.DataLoader
Internal function to set up the dataloader for diarization. This function is called by diarize() and diarize_generator() to set up the input data for diarization.
Parameters:
config – A config dict that is used to set up the dataloader for diarization. It can be generated by _diarize_input_manifest_processing().
Returns:
A DataLoader object that is used to iterate over the input audio data.
diarize(
audio: str | List[str] | numpy.ndarray | torch.utils.data.DataLoader,
batch_size: int = 1,
include_tensor_outputs: bool = False,
postprocessing_yaml: str | None = None,
num_workers: int = 1,
verbose: bool = False,
override_config: DiarizeConfig | None = None,
**config_kwargs,
) → List[Any] | List[List[Any]] | Tuple[Any] | Tuple[List[Any]]
Takes audio input (file paths, a NumPy array, or a DataLoader, per the signature) and returns speaker labels.
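The signature accepts individual options, an optional override_config, and free-form **config_kwargs. The source does not say how these are combined, so the sketch below shows only one plausible resolution order, under the stated assumption that an explicit override_config takes precedence and is otherwise replaced by a config built from the keyword arguments. ToyDiarizeConfig, resolve_config, and the extra_option field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyDiarizeConfig:
    """Hypothetical stand-in for DiarizeConfig; field names are assumptions."""

    batch_size: int = 1
    num_workers: int = 1
    verbose: bool = False
    extra_option: str = ""  # hypothetical field reachable via **config_kwargs


def resolve_config(
    batch_size: int = 1,
    num_workers: int = 1,
    verbose: bool = False,
    override_config: Optional[ToyDiarizeConfig] = None,
    **config_kwargs,
) -> ToyDiarizeConfig:
    # Assumption: an explicit config object wins over individual arguments.
    if override_config is not None:
        return override_config
    # Otherwise build a config from the arguments; unknown keys raise TypeError.
    return ToyDiarizeConfig(
        batch_size=batch_size,
        num_workers=num_workers,
        verbose=verbose,
        **config_kwargs,
    )


cfg_from_args = resolve_config(batch_size=4, verbose=True, extra_option="x")
cfg_overridden = resolve_config(
    batch_size=4, override_config=ToyDiarizeConfig(batch_size=8)
)
```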
diarize_generator(
audio,
override_config: DiarizeConfig | None,
)
A generator version of the diarize() function.
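A generator version lets callers consume results batch by batch instead of waiting for the full list. The following sketch illustrates that idea with the setup and teardown hooks documented above; it is not NeMo's implementation. ToyGeneratorModel is hypothetical, the hooks take no arguments here, and wrapping the loop in try/finally so teardown always runs is an assumption of this sketch.

```python
from typing import Iterable, Iterator, List


class ToyGeneratorModel:
    """Hypothetical model showing a lazy, per-batch diarization loop."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def _diarize_on_begin(self) -> None:
        self.events.append("begin")

    def _diarize_on_end(self) -> None:
        self.events.append("end")

    def _diarize_forward(self, batch: List[str]) -> List[str]:
        return [f"{path}: speaker_0" for path in batch]

    def diarize_generator(
        self, batches: Iterable[List[str]]
    ) -> Iterator[List[str]]:
        self._diarize_on_begin()
        try:
            for batch in batches:
                # Yield each batch's results as soon as they are ready.
                yield self._diarize_forward(batch)
        finally:
            # Runs when the generator is exhausted or closed early.
            self._diarize_on_end()


toy_model = ToyGeneratorModel()
gen = toy_model.diarize_generator([["a.wav"], ["b.wav"]])
first = next(gen)
events_after_first = list(toy_model.events)
rest = list(gen)
```

Nothing runs until the first next() call, so setup cost is paid only when the caller starts iterating.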