Java — Mozilla DeepSpeech 0.9.3 documentation
DeepSpeechModel¶
class DeepSpeechModel¶
Exposes a DeepSpeech model in Java.
Public Functions
org.deepspeech.libdeepspeech.DeepSpeechModel.DeepSpeechModel(String modelPath)
An object providing an interface to a trained DeepSpeech model.
Parameters
modelPath: The path to the frozen model graph.
Exceptions
RuntimeException: on failure.
long org.deepspeech.libdeepspeech.DeepSpeechModel.beamWidth()
Get the beam width value used by the model. If setBeamWidth was not called before, this returns the default value loaded from the model file.
Return
Beam width value used by the model.
void org.deepspeech.libdeepspeech.DeepSpeechModel.setBeamWidth(long beamWidth)
Set beam width value used by the model.
Parameters
beamWidth: The beam width used by the model. A larger beam width value generates better results at the cost of decoding time.
Exceptions
RuntimeException: on failure.
int org.deepspeech.libdeepspeech.DeepSpeechModel.sampleRate()
Return the sample rate expected by the model.
Return
Sample rate.
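The sample rate matters because buffer sizes throughout this API are expressed in samples, not bytes or milliseconds. A minimal sizing sketch (the AudioSizing helper is hypothetical, not part of libdeepspeech; the 16 kHz figure matches the released English models, but sampleRate() is the authoritative source at runtime):

```java
// Hypothetical helper for sizing audio buffers around the model's sample
// rate. In real code, pass model.sampleRate() instead of a constant.
public class AudioSizing {
    /** Number of 16-bit mono samples needed to hold `millis` ms of audio. */
    public static int samplesForMillis(int sampleRate, int millis) {
        return sampleRate * millis / 1000;
    }

    public static void main(String[] args) {
        // A 16 kHz model needs 320 samples to cover one 20 ms frame.
        System.out.println(samplesForMillis(16000, 20));
    }
}
```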
void org.deepspeech.libdeepspeech.DeepSpeechModel.freeModel()
Frees associated resources and destroys model object.
void org.deepspeech.libdeepspeech.DeepSpeechModel.enableExternalScorer(String scorer)
Enable decoding using an external scorer.
Parameters
scorer: The path to the external scorer file.
Exceptions
RuntimeException: on failure.
void org.deepspeech.libdeepspeech.DeepSpeechModel.disableExternalScorer()
Disable decoding using an external scorer.
Exceptions
RuntimeException: on failure.
void org.deepspeech.libdeepspeech.DeepSpeechModel.setScorerAlphaBeta(float alpha, float beta)
Enable decoding using beam scoring with a KenLM language model.
Parameters
alpha: The alpha hyperparameter of the decoder. Language model weight.
beta: The beta hyperparameter of the decoder. Word insertion weight.
Exceptions
RuntimeException: on failure.
Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.sttWithMetadata(short [] buffer, int buffer_size, int num_results)
Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.
Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Parameters
buffer: A 16-bit, mono raw audio signal at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in the audio signal.
num_results: Maximum number of candidate transcripts to return. The returned list might be smaller than this.
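Audio usually arrives as raw little-endian bytes (e.g. the data chunk of a 16-bit PCM WAV file), while sttWithMetadata() takes a short[]. A conversion sketch (the PcmConversion class is a hypothetical utility, not part of the libdeepspeech API):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: converts little-endian 16-bit PCM bytes into the
// short[] sample buffer expected by sttWithMetadata()/feedAudioContent().
public class PcmConversion {
    public static short[] bytesToSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                  .order(ByteOrder.LITTLE_ENDIAN) // WAV PCM is little-endian
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

With a buffer in hand, the call would look like `model.sttWithMetadata(samples, samples.length, 3)` to request up to three candidate transcripts.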
DeepSpeechStreamingState org.deepspeech.libdeepspeech.DeepSpeechModel.createStream()
Create a new streaming inference state. The streaming state returned by this function can then be passed to feedAudioContent() and finishStream().
Return
An opaque object that represents the streaming state.
Exceptions
RuntimeException: on failure.
void org.deepspeech.libdeepspeech.DeepSpeechModel.feedAudioContent(DeepSpeechStreamingState ctx, short [] buffer, int buffer_size)
Feed audio samples to an ongoing streaming inference.
Parameters
ctx: A streaming state pointer returned by createStream().
buffer: An array of 16-bit, mono raw audio samples at the appropriate sample rate (matching what the model was trained on).
buffer_size: The number of samples in buffer.
String org.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecode(DeepSpeechStreamingState ctx)
Compute the intermediate decoding of an ongoing streaming inference.
Return
The STT intermediate result.
Parameters
ctx: A streaming state pointer returned by createStream().
Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.intermediateDecodeWithMetadata(DeepSpeechStreamingState ctx, int num_results)
Compute the intermediate decoding of an ongoing streaming inference, returning results that include metadata.
Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Parameters
ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. The returned list might be smaller than this.
String org.deepspeech.libdeepspeech.DeepSpeechModel.finishStream(DeepSpeechStreamingState ctx)
Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.
Return
The STT result.
Note
This method will free the state pointer (ctx).
Parameters
ctx: A streaming state pointer returned by createStream().
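The streaming methods above are typically driven by a loop that feeds fixed-size chunks as audio is captured. The chunking helper below is a hypothetical utility that can be tested on its own; DeepSpeechModel and DeepSpeechStreamingState come from org.deepspeech.libdeepspeech and are only referenced in the usage note:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a full recording into fixed-size chunks,
// as you might feed them to feedAudioContent() while audio is captured.
public class StreamChunks {
    public static List<short[]> split(short[] audio, int chunkSize) {
        List<short[]> chunks = new ArrayList<>();
        for (int off = 0; off < audio.length; off += chunkSize) {
            // Last chunk may be shorter than chunkSize.
            int len = Math.min(chunkSize, audio.length - off);
            short[] chunk = new short[len];
            System.arraycopy(audio, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

A sketch of the surrounding loop: call `ctx = model.createStream()`, then for each chunk `model.feedAudioContent(ctx, chunk, chunk.length)` (optionally polling `model.intermediateDecode(ctx)` for partial results), and finally `String text = model.finishStream(ctx)` — after which ctx is freed and must not be reused.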
Metadata org.deepspeech.libdeepspeech.DeepSpeechModel.finishStreamWithMetadata(DeepSpeechStreamingState ctx, int num_results)
Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.
Return
Metadata struct containing multiple candidate transcripts. Each transcript has per-token metadata including timing information.
Note
This method will free the state pointer (ctx).
Parameters
ctx: A streaming state pointer returned by createStream().
num_results: Maximum number of candidate transcripts to return. The returned list might be smaller than this.
void org.deepspeech.libdeepspeech.DeepSpeechModel.addHotWord(String word, float boost)
Add a hot-word.
Parameters
word: The word to assign a boost to.
boost: The boost value. A positive value increases and a negative value decreases the chance of the word occurring in a transcription.
Exceptions
RuntimeException: on failure.
void org.deepspeech.libdeepspeech.DeepSpeechModel.eraseHotWord(String word)
Erase a hot-word.
Parameters
word: The word to remove from the list of hot-words.
Exceptions
RuntimeException: on failure.
void org.deepspeech.libdeepspeech.DeepSpeechModel.clearHotWords()
Clear all hot-words.
Exceptions
RuntimeException: on failure.
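Since the native hot-word calls report problems only as a RuntimeException, it can help to validate input first. The HotWordArgs class below is a hypothetical pre-flight check, not part of the libdeepspeech API, and the single-word restriction it encodes is an assumption (hot-words are boosted as individual vocabulary entries, so strings containing spaces are a likely failure case):

```java
// Hypothetical validation helper; the single-word restriction is an
// assumption about how the scorer treats hot-word entries.
public class HotWordArgs {
    /** True if `word` looks like a single vocabulary entry that is safe
     *  to pass to addHotWord()/eraseHotWord(). */
    public static boolean isSingleWord(String word) {
        return word != null && !word.isEmpty() && !word.contains(" ");
    }
}
```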
Metadata¶
class Metadata¶
An array of CandidateTranscript objects computed by the model.
Public Functions
long org.deepspeech.libdeepspeech.Metadata.getNumTranscripts()
Size of the transcripts array.
CandidateTranscript org.deepspeech.libdeepspeech.Metadata.getTranscript(int i)
Retrieve one CandidateTranscript element.
Return
The CandidateTranscript requested, or null.
Parameters
i: Array index of the CandidateTranscript to get.
CandidateTranscript¶
class CandidateTranscript¶
A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.
Public Functions
long org.deepspeech.libdeepspeech.CandidateTranscript.getNumTokens()
Size of the tokens array.
double org.deepspeech.libdeepspeech.CandidateTranscript.getConfidence()
Approximated confidence value for this transcript. This is roughly the sum of the acoustic model logit values for each timestep/character that contributed to the creation of this transcript.
TokenMetadata org.deepspeech.libdeepspeech.CandidateTranscript.getToken(int i)
Retrieve one TokenMetadata element.
Return
The TokenMetadata requested, or null.
Parameters
i: Array index of the TokenMetadata to get.
TokenMetadata¶
class TokenMetadata¶
Stores the text of an individual token, along with its timing information.
Public Functions
String org.deepspeech.libdeepspeech.TokenMetadata.getText()
The text corresponding to this token.
long org.deepspeech.libdeepspeech.TokenMetadata.getTimestep()
Position of the token, in units of 20 ms.
float org.deepspeech.libdeepspeech.TokenMetadata.getStartTime()
Position of the token, in seconds.
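The two timing accessors are related by the fixed 20 ms frame duration stated for getTimestep(): the start time in seconds is the timestep count times 0.02. A conversion sketch (the Timesteps class is hypothetical, not part of libdeepspeech):

```java
// Hypothetical helper relating TokenMetadata.getTimestep() to
// TokenMetadata.getStartTime(): one timestep corresponds to 20 ms.
public class Timesteps {
    static final float FRAME_SECONDS = 0.020f;

    public static float timestepToSeconds(long timestep) {
        return timestep * FRAME_SECONDS;
    }
}
```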