Python — Mozilla DeepSpeech 0.9.3 documentation
Model¶
class Model(model_path)[source]¶
Class holding a DeepSpeech model
Parameters
model_path (str) – Path to the model file to load
addHotWord(word, boost)[source]¶
Add a word and its boost for decoding.
Parameters
- word (str) – the hot-word
- boost (float) – the boost; a positive value increases and a negative value decreases the likelihood of the word appearing in a transcription
Throws
RuntimeError on error
beamWidth()[source]¶
Get the beam width value used by the model. If setBeamWidth was not called before, this returns the default value loaded from the model file.
Returns
Beam width value used by the model.
Type
int
clearHotWords()[source]¶
Remove all entries from the hot-words dict.
Throws
RuntimeError on error
createStream()[source]¶
Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
Returns
Stream object representing the newly created stream
Type
Stream
Throws
RuntimeError on error
disableExternalScorer()[source]¶
Disable decoding using an external scorer.
Returns
Zero on success, non-zero on failure.
enableExternalScorer(scorer_path)[source]¶
Enable decoding using an external scorer.
Parameters
scorer_path (str) – The path to the external scorer file.
Throws
RuntimeError on error
eraseHotWord(word)[source]¶
Remove the entry for word from the hot-words dict.
Parameters
word (str) – the hot-word
Throws
RuntimeError on error
sampleRate()[source]¶
Return the sample rate expected by the model.
Returns
Sample rate.
Type
int
setBeamWidth(beam_width)[source]¶
Set beam width value used by the model.
Parameters
beam_width (int) – The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.
Returns
Zero on success, non-zero on failure.
Type
int
setScorerAlphaBeta(alpha, beta)[source]¶
Set hyperparameters alpha and beta of the external scorer.
Parameters
- alpha (float) – The alpha hyperparameter of the decoder. Language model weight.
- beta (float) – The beta hyperparameter of the decoder. Word insertion weight.
Returns
Zero on success, non-zero on failure.
Type
int
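The configuration methods above (enableExternalScorer, setScorerAlphaBeta, setBeamWidth, addHotWord) are often applied together after loading a model. The helper below is an illustrative sketch, not part of the deepspeech API: it accepts any object exposing those Model methods, and the parameter names and ordering are this sketch's own conventions.

```python
def configure_model(model, scorer_path=None, alpha=None, beta=None,
                    beam_width=None, hot_words=None):
    # `model` is expected to expose the Model methods documented above;
    # this helper itself is illustrative, not part of the deepspeech API.
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)
        # alpha/beta only apply once an external scorer is enabled
        if alpha is not None and beta is not None:
            model.setScorerAlphaBeta(alpha, beta)
    if beam_width is not None:
        model.setBeamWidth(beam_width)
    # hot_words maps word -> boost; a positive boost raises the word's likelihood
    for word, boost in (hot_words or {}).items():
        model.addHotWord(word, boost)
    return model
```

The alpha/beta and boost values are caller-chosen; suitable values depend on the scorer in use.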
stt(audio_buffer)[source]¶
Use the DeepSpeech model to perform Speech-To-Text.
Parameters
audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
Returns
The STT result.
Type
str
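Since stt() takes a 16-bit mono numpy.int16 buffer, a common pattern is to read a WAV file into that shape before calling it. The helper names below are this sketch's own; the deepspeech import is kept local so the loader works even where the package is not installed.

```python
import wave

import numpy as np

def load_wav_as_int16(path):
    # stt() expects a 16-bit, mono numpy.int16 buffer at the sample rate the
    # model was trained on; this reads a WAV file into that shape using only
    # the standard library plus numpy.
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        frames = wav.readframes(wav.getnframes())
    return np.frombuffer(frames, dtype=np.int16)

def transcribe_file(model_path, wav_path):
    # illustrative wrapper around Model.stt(); not part of the API itself
    from deepspeech import Model
    model = Model(model_path)
    return model.stt(load_wav_as_int16(wav_path))
```

Note that the WAV file's sample rate should match the value reported by sampleRate(); no resampling is done here.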
sttWithMetadata(audio_buffer, num_results=1)[source]¶
Use the DeepSpeech model to perform Speech-To-Text and return results including metadata.
Parameters
- audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
- num_results (int) – Maximum number of candidate transcripts to return. Returned list might be smaller than this.
Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Type
Metadata
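The Metadata object returned here can be walked to recover plain text from the per-token metadata. The function below is an illustrative sketch: it works on any object shaped like the Metadata, CandidateTranscript, and TokenMetadata types documented below (.transcripts, .tokens, .text), and assumes candidates are ordered best-first as sttWithMetadata returns them.

```python
def best_transcript_text(metadata):
    # Rebuild the top candidate's text from per-token metadata.
    # `metadata` is any object with .transcripts -> list of candidates,
    # each carrying .tokens -> list of tokens with a .text attribute.
    if not metadata.transcripts:
        return ""
    return "".join(token.text for token in metadata.transcripts[0].tokens)
```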
Stream¶
class Stream(native_stream)[source]¶
Class wrapping a DeepSpeech stream. The constructor cannot be called directly. Use Model.createStream()
feedAudioContent(audio_buffer)[source]¶
Feed audio samples to an ongoing streaming inference.
Parameters
audio_buffer (numpy.int16 array) – A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
Throws
RuntimeError if the stream object is not valid
finishStream()[source]¶
Compute the final decoding of an ongoing streaming inference and return the result, signaling the end of the inference. The underlying stream object must not be used after this method is called.
Returns
The STT result.
Type
str
Throws
RuntimeError if the stream object is not valid
finishStreamWithMetadata(num_results=1)[source]¶
Compute the final decoding of an ongoing streaming inference and return results including metadata. Signals the end of an ongoing streaming inference. The underlying stream object must not be used after this method is called.
Parameters
num_results (int) – Maximum number of candidate transcripts to return. Returned list might be smaller than this.
Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Type
Metadata
Throws
RuntimeError if the stream object is not valid
freeStream()[source]¶
Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference.
Throws
RuntimeError if the stream object is not valid
intermediateDecode()[source]¶
Compute the intermediate decoding of an ongoing streaming inference.
Returns
The STT intermediate result.
Type
str
Throws
RuntimeError if the stream object is not valid
intermediateDecodeWithMetadata(num_results=1)[source]¶
Compute the intermediate decoding of an ongoing streaming inference and return results including metadata.
Parameters
num_results (int) – Maximum number of candidate transcripts to return. Returned list might be smaller than this.
Returns
Metadata object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Type
Metadata
Throws
RuntimeError if the stream object is not valid
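The Stream methods above form a simple lifecycle: create the stream once, feed audio chunks, optionally poll for partial results, then finish. The driver below is an illustrative sketch using only the documented methods; the chunking scheme and the partial-result callback are this sketch's own conventions, not part of the API.

```python
def stream_transcribe(model, audio_chunks, on_partial=None):
    # Drive the documented stream lifecycle: createStream() once, then
    # feedAudioContent() per chunk, finally finishStream(). Polling
    # intermediateDecode() after every chunk is optional and has a cost,
    # so it is only done when a callback is supplied.
    stream = model.createStream()
    for chunk in audio_chunks:
        stream.feedAudioContent(chunk)
        if on_partial is not None:
            on_partial(stream.intermediateDecode())
    # finishStream() ends the inference; the stream must not be reused
    return stream.finishStream()
```

In real use, audio_chunks would be numpy.int16 buffers, e.g. read incrementally from a microphone or file.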
Metadata¶
class Metadata[source]¶
transcripts¶
List of candidate transcripts.
Returns
A list of CandidateTranscript() objects
Type
list
CandidateTranscript¶
class CandidateTranscript[source]¶
Stores the entire CTC output as an array of character metadata objects
confidence¶
Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcription.
tokens¶
List of tokens.
Returns
A list of TokenMetadata() elements
Type
list
TokenMetadata¶
class TokenMetadata[source]¶
Stores each individual character, along with its timing information.
start_time¶
Position of the token in seconds.
text¶
The text for this token.
timestep¶
Position of the token in units of 20ms.
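Since each token is a single character with a timestep in 20 ms units, word-level timings can be derived by grouping tokens. The sketch below is illustrative: the Token namedtuple is a stand-in for TokenMetadata with the three documented attributes, and the grouping logic is this sketch's own.

```python
from collections import namedtuple

# Stand-in for TokenMetadata with the three documented attributes
Token = namedtuple("Token", ["text", "timestep", "start_time"])

def words_with_start_times(tokens, frame_seconds=0.020):
    # Group per-character tokens into (word, start_seconds) pairs, deriving
    # the start from `timestep` (documented as units of 20 ms). Real
    # TokenMetadata objects also carry start_time directly in seconds.
    words, current, start = [], "", None
    for tok in tokens:
        if tok.text == " ":
            if current:
                words.append((current, start))
            current, start = "", None
        else:
            if not current:
                start = tok.timestep * frame_seconds
            current += tok.text
    if current:
        words.append((current, start))
    return words
```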