.NET Framework - Mozilla DeepSpeech 0.9.3 documentation

DeepSpeech Class

class DeepSpeech : public DeepSpeechClient.Interfaces.IDeepSpeech

Concrete implementation of DeepSpeechClient.Interfaces.IDeepSpeech.

Public Functions

DeepSpeechClient.DeepSpeech.DeepSpeech(string aModelPath)

Initializes a new instance of DeepSpeech class and creates a new acoustic model.

Parameters

Exceptions

unsafe uint DeepSpeechClient.DeepSpeech.GetModelBeamWidth()

Get the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.

Return

Beam width value used by the model.

unsafe void DeepSpeechClient.DeepSpeech.SetModelBeamWidth(uint aBeamWidth)

Set beam width value used by the model.

Parameters

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.AddHotWord(string aWord, float aBoost)

Add a hot-word.

Parameters

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.EraseHotWord(string aWord)

Erase entry for a hot-word.

Parameters

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.ClearHotWords()

Clear all hot-words.

Exceptions

unsafe int DeepSpeechClient.DeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return

Sample rate.

unsafe void DeepSpeechClient.DeepSpeech.Dispose()

Frees associated resources and destroys model objects.

unsafe void DeepSpeechClient.DeepSpeech.EnableExternalScorer(string aScorerPath)

Enable decoding using an external scorer.

Parameters

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.DisableExternalScorer()

Disable decoding using an external scorer.

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.SetScorerAlphaBeta(float aAlpha, float aBeta)

Set hyperparameters alpha and beta of the external scorer.

Parameters

Exceptions

unsafe void DeepSpeechClient.DeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters
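Because FeedAudioContent takes a short[] buffer and its size, a long recording is usually fed in smaller pieces. The following is a minimal sketch of such a splitter, not part of the DeepSpeech client: AudioChunker and the chunk size of 320 samples are hypothetical names and values chosen for illustration, and the call to FeedAudioContent appears only in a comment, using the signature documented above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of DeepSpeechClient): splits a sample
// buffer into consecutive chunks suitable for repeated feeding.
public static class AudioChunker
{
    // Yields chunks of at most chunkSize samples, in order.
    // This is an iterator, so the argument check runs on first enumeration.
    public static IEnumerable<short[]> Chunk(short[] samples, int chunkSize)
    {
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        for (int offset = 0; offset < samples.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, samples.Length - offset);
            var chunk = new short[length];
            Array.Copy(samples, offset, chunk, 0, length);
            yield return chunk;
        }
    }
}

// Intended use with the documented signature (assumes an existing
// client and a stream obtained from CreateStream):
//
//   foreach (var chunk in AudioChunker.Chunk(samples, 320))
//       client.FeedAudioContent(stream, chunk, (uint)chunk.Length);
```

The last chunk may be shorter than the others, which is why the chunk length, rather than the requested chunk size, is passed as aBufferSize.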

unsafe string DeepSpeechClient.DeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters

unsafe Metadata DeepSpeechClient.DeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream, uint aNumResults)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.

Return

The extended metadata result.

Parameters

unsafe string DeepSpeechClient.DeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

unsafe Metadata DeepSpeechClient.DeepSpeech.IntermediateDecodeWithMetadata(DeepSpeechStream stream, uint aNumResults)

Computes the intermediate decoding of an ongoing streaming inference, including metadata.

Return

The extended metadata result.

Parameters

unsafe string DeepSpeechClient.DeepSpeech.Version()

Return version of this library. The returned version is a semantic version (SemVer 2.0.0).

unsafe DeepSpeechStream DeepSpeechClient.DeepSpeech.CreateStream()

Creates a new streaming inference state.

unsafe void DeepSpeechClient.DeepSpeech.FreeStream(DeepSpeechStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe string DeepSpeechClient.DeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters
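SpeechToText expects its audio as a short[] buffer, while audio read from a file or a capture device often arrives as raw bytes. The sketch below shows one way to convert them. It assumes the bytes are signed 16-bit little-endian PCM, which the signature above does not state, and PcmConverter is a hypothetical helper, not part of the DeepSpeech client. The audio should also be at the rate reported by GetModelSampleRate.

```csharp
using System;

// Hypothetical helper (not part of DeepSpeechClient).
public static class PcmConverter
{
    // Converts raw PCM bytes to 16-bit samples.
    // Assumption: signed 16-bit little-endian samples, two bytes each.
    public static short[] ToSamples(byte[] pcm)
    {
        if (pcm == null)
            throw new ArgumentNullException(nameof(pcm));
        if (pcm.Length % 2 != 0)
            throw new ArgumentException("Expected an even number of bytes.", nameof(pcm));

        var samples = new short[pcm.Length / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Low byte first, then high byte; the cast reinterprets the
            // 16-bit pattern as a signed value.
            samples[i] = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
        }
        return samples;
    }
}

// Intended use with the documented signature (assumes an existing client):
//
//   short[] samples = PcmConverter.ToSamples(rawBytes);
//   string text = client.SpeechToText(samples, (uint)samples.Length);
```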

unsafe Metadata DeepSpeechClient.DeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize, uint aNumResults)

Use the DeepSpeech model to perform Speech-To-Text and return results including metadata.

Return

The extended metadata. Returns NULL on error.

Parameters

DeepSpeechStream Class

class DeepSpeechStream : public IDisposable

Wrapper of the pointer used for the decoding stream.

Public Functions

unsafe DeepSpeechClient.Models.DeepSpeechStream.DeepSpeechStream(IntPtr ** streamingStatePP)

Initializes a new instance of DeepSpeechStream.

Parameters

ErrorCodes

See also the main definition including descriptions for each error in Error codes.

enum DeepSpeechClient::Enums::ErrorCodes

Error codes from the native DeepSpeech binary.

Values:

DS_ERR_OK = 0x0000

DS_ERR_NO_MODEL = 0x1000

DS_ERR_INVALID_ALPHABET = 0x2000

DS_ERR_INVALID_SHAPE = 0x2001

DS_ERR_INVALID_SCORER = 0x2002

DS_ERR_MODEL_INCOMPATIBLE = 0x2003

DS_ERR_SCORER_NOT_ENABLED = 0x2004

DS_ERR_FAIL_INIT_MMAP = 0x3000

DS_ERR_FAIL_INIT_SESS = 0x3001

DS_ERR_FAIL_INTERPRETER = 0x3002

DS_ERR_FAIL_RUN_SESS = 0x3003

DS_ERR_FAIL_CREATE_STREAM = 0x3004

DS_ERR_FAIL_READ_PROTOBUF = 0x3005

DS_ERR_FAIL_CREATE_SESS = 0x3006

DS_ERR_FAIL_INSERT_HOTWORD = 0x3008

DS_ERR_FAIL_CLEAR_HOTWORD = 0x3009

DS_ERR_FAIL_ERASE_HOTWORD = 0x3010
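The values above are not contiguous: 0x3007 is not listed, and the last entry is 0x3010 rather than 0x300A, so a raw code is better looked up than computed. The following sketch formats a raw integer code for logging. It uses a local copy of the listed values so that it stands alone, and ErrorCodeFormatter is a hypothetical helper; in an application the enum from DeepSpeechClient.Enums would presumably be used instead.

```csharp
using System;

// Local copy of the values listed above, for illustration only.
public enum ErrorCodes
{
    DS_ERR_OK = 0x0000,
    DS_ERR_NO_MODEL = 0x1000,
    DS_ERR_INVALID_ALPHABET = 0x2000,
    DS_ERR_INVALID_SHAPE = 0x2001,
    DS_ERR_INVALID_SCORER = 0x2002,
    DS_ERR_MODEL_INCOMPATIBLE = 0x2003,
    DS_ERR_SCORER_NOT_ENABLED = 0x2004,
    DS_ERR_FAIL_INIT_MMAP = 0x3000,
    DS_ERR_FAIL_INIT_SESS = 0x3001,
    DS_ERR_FAIL_INTERPRETER = 0x3002,
    DS_ERR_FAIL_RUN_SESS = 0x3003,
    DS_ERR_FAIL_CREATE_STREAM = 0x3004,
    DS_ERR_FAIL_READ_PROTOBUF = 0x3005,
    DS_ERR_FAIL_CREATE_SESS = 0x3006,
    DS_ERR_FAIL_INSERT_HOTWORD = 0x3008,
    DS_ERR_FAIL_CLEAR_HOTWORD = 0x3009,
    DS_ERR_FAIL_ERASE_HOTWORD = 0x3010
}

// Hypothetical helper (not part of DeepSpeechClient).
public static class ErrorCodeFormatter
{
    // Returns the symbolic name plus the hex value, or a fallback
    // string for codes that are not in the list.
    public static string Describe(int code)
    {
        return Enum.IsDefined(typeof(ErrorCodes), code)
            ? $"{(ErrorCodes)code} (0x{code:X4})"
            : $"unknown error (0x{code:X4})";
    }
}
```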

Metadata

class Metadata

Stores the candidate transcripts computed by the model.

Property

property DeepSpeechClient::Models::Metadata::Transcripts

List of candidate transcripts.

CandidateTranscript

class CandidateTranscript

Stores a single candidate transcript: its approximate confidence value and the metadata of its tokens.

Property

property DeepSpeechClient::Models::CandidateTranscript::Confidence

Approximated confidence value for this transcription.

property DeepSpeechClient::Models::CandidateTranscript::Tokens

List of metadata tokens containing text, timestep, and time offset.
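When several candidates are requested through aNumResults, a caller typically wants the text of the most likely one. The sketch below illustrates that selection on a stand-in type so that it runs without the native library. CandidateStandIn and TranscriptPicker are hypothetical, the Confidence type (double) is an assumption, and so is the premise that a larger Confidence means a more likely transcript.

```csharp
using System.Collections.Generic;

// Stand-in for CandidateTranscript, reduced to what the sketch needs.
// Assumption: Confidence is a double and larger means more likely.
public sealed class CandidateStandIn
{
    public double Confidence;
    public string[] TokenTexts; // the Text of each token, in order
}

// Hypothetical helper (not part of DeepSpeechClient).
public static class TranscriptPicker
{
    // Returns the concatenated token text of the highest-confidence
    // candidate, or null when there are no candidates.
    public static string BestText(IEnumerable<CandidateStandIn> candidates)
    {
        CandidateStandIn best = null;
        foreach (var candidate in candidates)
        {
            if (best == null || candidate.Confidence > best.Confidence)
                best = candidate;
        }
        return best == null ? null : string.Concat(best.TokenTexts);
    }
}
```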

TokenMetadata

class TokenMetadata

Stores each individual character, along with its timing information.

Public Members

string DeepSpeechClient.Models.TokenMetadata.Text

Character at the current timestep.

int DeepSpeechClient.Models.TokenMetadata.Timestep

Position of the character in units of 20 ms.

float DeepSpeechClient.Models.TokenMetadata.StartTime

Position of the character in seconds.
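Since each token carries one character and its timing, word-level timestamps have to be assembled by the caller. The following sketch groups tokens into words and records when each word starts. TokenStandIn mirrors the three members documented above so the example runs without the native library; TimedWord and WordGrouper are hypothetical, and treating a single space token as the word separator is an assumption about the model's alphabet.

```csharp
using System.Collections.Generic;
using System.Text;

// Stand-in with the three members documented for TokenMetadata.
public sealed class TokenStandIn
{
    public string Text;
    public int Timestep;
    public float StartTime;
}

// Hypothetical result type: a word and the start time of its first character.
public sealed class TimedWord
{
    public string Text;
    public float StartTime;
}

// Hypothetical helper (not part of DeepSpeechClient).
public static class WordGrouper
{
    // Assumption: a token whose Text is a single space separates words.
    public static List<TimedWord> Group(IEnumerable<TokenStandIn> tokens)
    {
        var words = new List<TimedWord>();
        var current = new StringBuilder();
        float start = 0f;

        foreach (var token in tokens)
        {
            if (token.Text == " ")
            {
                Flush(words, current, start);
                continue;
            }
            // The first character of a word fixes the word's start time.
            if (current.Length == 0) start = token.StartTime;
            current.Append(token.Text);
        }
        Flush(words, current, start);
        return words;
    }

    // Emits the pending word, if any, and resets the accumulator.
    private static void Flush(List<TimedWord> words, StringBuilder current, float start)
    {
        if (current.Length == 0) return;
        words.Add(new TimedWord { Text = current.ToString(), StartTime = start });
        current.Clear();
    }
}
```

Consecutive or leading space tokens produce no empty words, because Flush ignores an empty accumulator.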

DeepSpeech Interface

interface IDeepSpeech

Client interface for DeepSpeech.

Subclassed by DeepSpeechClient.DeepSpeech

Public Functions

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.Version()

Return version of this library. The returned version is a semantic version (SemVer 2.0.0).

unsafe int DeepSpeechClient.Interfaces.IDeepSpeech.GetModelSampleRate()

Return the sample rate expected by the model.

Return

Sample rate.

unsafe uint DeepSpeechClient.Interfaces.IDeepSpeech.GetModelBeamWidth()

Get the beam width value used by the model. If SetModelBeamWidth has not been called, returns the default value loaded from the model file.

Return

Beam width value used by the model.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.SetModelBeamWidth(uint aBeamWidth)

Set beam width value used by the model.

Parameters

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.EnableExternalScorer(string aScorerPath)

Enable decoding using an external scorer.

Parameters

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.AddHotWord(string aWord, float aBoost)

Add a hot-word.

Parameters

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.EraseHotWord(string aWord)

Erase entry for a hot-word.

Parameters

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.ClearHotWords()

Clear all hot-words.

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.DisableExternalScorer()

Disable decoding using an external scorer.

Exceptions

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.SetScorerAlphaBeta(float aAlpha, float aBeta)

Set hyperparameters alpha and beta of the external scorer.

Parameters

Exceptions

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToText(short [] aBuffer, uint aBufferSize)

Use the DeepSpeech model to perform Speech-To-Text.

Return

The STT result. Returns NULL on error.

Parameters

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.SpeechToTextWithMetadata(short [] aBuffer, uint aBufferSize, uint aNumResults)

Use the DeepSpeech model to perform Speech-To-Text and return results including metadata.

Return

The extended metadata. Returns NULL on error.

Parameters

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FreeStream(DeepSpeechStream stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don’t want to perform a costly decode operation.

unsafe DeepSpeechStream DeepSpeechClient.Interfaces.IDeepSpeech.CreateStream()

Creates a new streaming inference state.

unsafe void DeepSpeechClient.Interfaces.IDeepSpeech.FeedAudioContent(DeepSpeechStream stream, short [] aBuffer, uint aBufferSize)

Feeds audio samples to an ongoing streaming inference.

Parameters

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.IntermediateDecode(DeepSpeechStream stream)

Computes the intermediate decoding of an ongoing streaming inference.

Return

The STT intermediate result.

Parameters

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.IntermediateDecodeWithMetadata(DeepSpeechStream stream, uint aNumResults)

Computes the intermediate decoding of an ongoing streaming inference, including metadata.

Return

The extended metadata result.

Parameters

unsafe string DeepSpeechClient.Interfaces.IDeepSpeech.FinishStream(DeepSpeechStream stream)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal.

Return

The STT result.

Parameters

unsafe Metadata DeepSpeechClient.Interfaces.IDeepSpeech.FinishStreamWithMetadata(DeepSpeechStream stream, uint aNumResults)

Closes the ongoing streaming inference and returns the STT result over the whole audio signal, including metadata.

Return

The extended metadata result.

Parameters