JavaScript (NodeJS / ElectronJS) — Mozilla DeepSpeech 0.9.3 documentation

Model

class Model(aModelPath)

exported from index

An object providing an interface to a trained DeepSpeech model.

Arguments

- aModelPath (string) – The path to the frozen model graph.
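
As a usage sketch (the helper name and file paths are illustrative, not part of the API), constructing a model, running recognition, and freeing the native resources looks like this:

```javascript
// Usage sketch: load a model, run batch recognition, free the native handle.
// 'deepspeech' is installed separately (npm install deepspeech) and the
// model path in the call site below is a placeholder.
function transcribe(Ds, modelPath, audioBuffer) {
  const model = new Ds.Model(modelPath);   // throws if the file is missing or invalid
  const text = model.stt(audioBuffer);     // audioBuffer: 16-bit mono PCM
  Ds.FreeModel(model);                     // always release the native model
  return text;
}

// Typical call site:
// const Ds = require('deepspeech');
// const fs = require('fs');
// console.log(transcribe(Ds, 'deepspeech-0.9.3-models.pbmm', fs.readFileSync('audio.raw')));
```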

Model.addHotWord(aWord, aBoost)

Add a hot-word and its boost.

Arguments

- aWord (string) – The hot-word to add.
- aBoost (number) – The boost value. Positive values boost the word; negative values penalize it.

Model.beamWidth()

Get beam width value used by the model. If Model.setBeamWidth() was not called before, will return the default value loaded from the model file.

Returns

number – Beam width value used by the model.

Model.clearHotWords()

Clear all hot-word entries.
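
A sketch of managing hot-word entries; the words and boost values are placeholders and need tuning against your scorer:

```javascript
// Register a batch of hot-words on a model. Positive boosts favor a word
// during decoding; the values used at the call site are illustrative.
function applyHotWords(model, entries) {
  for (const [word, boost] of Object.entries(entries)) {
    model.addHotWord(word, boost);
  }
}

// applyHotWords(model, { lights: 10.0, thermostat: 7.5 });
// model.eraseHotWord('lights');   // drop one entry
// model.clearHotWords();          // or drop them all
```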

Model.createStream()

Create a new streaming inference state. One can then call StreamImpl.feedAudioContent() and StreamImpl.finishStream() on the returned stream object.

Returns

StreamImpl – a StreamImpl() object that represents the streaming state.
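
The streaming flow can be sketched as below; chunkBuffer is a hypothetical helper (plain Buffer arithmetic, not part of the API), and the chunk size is an arbitrary choice:

```javascript
// Split a PCM buffer into fixed-size chunks for incremental feeding.
function chunkBuffer(buffer, chunkBytes) {
  const chunks = [];
  for (let off = 0; off < buffer.length; off += chunkBytes) {
    chunks.push(buffer.subarray(off, off + chunkBytes)); // subarray clamps at the end
  }
  return chunks;
}

// Feed a whole buffer through a stream and return the final decoding.
function streamTranscribe(model, audioBuffer, chunkBytes = 8192) {
  const stream = model.createStream();
  for (const chunk of chunkBuffer(audioBuffer, chunkBytes)) {
    stream.feedAudioContent(chunk);
  }
  return stream.finishStream();   // also frees the stream
}
```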

Model.disableExternalScorer()

Disable decoding using an external scorer.

Model.enableExternalScorer(aScorerPath)

Enable decoding using an external scorer.

Arguments

- aScorerPath (string) – The path to the external scorer file.

Model.eraseHotWord(aWord)

Erase the entry for a hot-word.

Arguments

- aWord (string) – The hot-word whose entry should be erased.

Model.sampleRate()

Return the sample rate expected by the model.

Returns

number – Sample rate.

Model.setBeamWidth(aBeamWidth)

Set beam width value used by the model.

Arguments

- aBeamWidth (number) – The beam width used by the model. A larger beam width generates better results at the cost of decoding time.

Model.setScorerAlphaBeta(aLMAlpha, aLMBeta)

Set hyperparameters alpha and beta of the external scorer.

Arguments

- aLMAlpha (number) – The alpha hyperparameter of the decoder; language model weight.
- aLMBeta (number) – The beta hyperparameter of the decoder; word insertion weight.
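
A configuration sketch; the scorer filename and the alpha/beta values at the call site are placeholders (the released scorer already embeds default weights), not recommended settings:

```javascript
// Enable an external scorer and optionally override its default weights.
function configureScorer(model, scorerPath, alpha, beta) {
  model.enableExternalScorer(scorerPath);
  if (alpha !== undefined && beta !== undefined) {
    model.setScorerAlphaBeta(alpha, beta);
  }
}

// configureScorer(model, 'deepspeech-0.9.3-models.scorer', 0.93, 1.18);
// model.disableExternalScorer();   // revert to acoustic-only decoding
```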

Model.stt(aBuffer)

Use the DeepSpeech model to perform Speech-To-Text.

Arguments

- aBuffer (Buffer) – A 16-bit, mono, raw audio signal at the appropriate sample rate (matching what the model was trained on).

Returns

string – The STT result. Returns undefined on error.
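
stt() consumes 16-bit mono PCM in a Buffer. If your capture pipeline produces Float32 samples (for example Web Audio in Electron), a conversion step is needed first; this helper is an assumption about the capture side, not part of the DeepSpeech API:

```javascript
// Convert Float32 samples in [-1, 1] to the little-endian 16-bit PCM Buffer
// that stt() and feedAudioContent() expect.
function floatTo16BitPCM(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));   // clamp out-of-range input
    buf.writeInt16LE(Math.round(s * 32767), i * 2);
  }
  return buf;
}

// const text = model.stt(floatTo16BitPCM(capturedSamples));
```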

Model.sttWithMetadata(aBuffer, aNumResults)

Use the DeepSpeech model to perform Speech-To-Text and output metadata about the results.

Arguments

- aBuffer (Buffer) – A 16-bit, mono, raw audio signal at the appropriate sample rate (matching what the model was trained on).
- aNumResults (number) – Maximum number of candidate transcripts to return. The returned value might be smaller than this.

Returns

Metadata – A Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing the Metadata object by calling FreeMetadata(). Returns undefined on error.
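
As an illustration of consuming the result, this hypothetical helper picks the highest-confidence candidate from any object with the documented Metadata shape:

```javascript
// Pick the highest-confidence candidate from a Metadata-shaped object and
// join its tokens into a plain string.
function bestTranscript(metadata) {
  const best = metadata.transcripts.reduce((a, b) =>
    (b.confidence > a.confidence ? b : a));
  return best.tokens.map((t) => t.text).join('');
}

// const metadata = model.sttWithMetadata(audioBuffer, 3);
// const text = bestTranscript(metadata);
// Ds.FreeMetadata(metadata);   // the caller must free the native struct
```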

Stream

class StreamImpl(nativeStream)

Provides an interface to a DeepSpeech stream. The constructor cannot be called directly; use Model.createStream().

Arguments

- nativeStream – The native stream object, created internally by Model.createStream().

StreamImpl.feedAudioContent(aBuffer)

Feed audio samples to an ongoing streaming inference.

Arguments

- aBuffer (Buffer) – A 16-bit, mono, raw audio signal at the appropriate sample rate (matching what the model was trained on).

StreamImpl.finishStream()

Compute the final decoding of an ongoing streaming inference and return the result. Signals the end of an ongoing streaming inference.

Returns

string – The STT result. This method will free the stream; it must not be used after this method is called.

StreamImpl.finishStreamWithMetadata(aNumResults)

Compute the final decoding of an ongoing streaming inference and return the results including metadata. Signals the end of an ongoing streaming inference.

Arguments

- aNumResults (number) – Maximum number of candidate transcripts to return. The returned value might be smaller than this.

Returns

Metadata – Outputs a Metadata() struct of individual letters along with their timing information. The user is responsible for freeing the Metadata object by calling FreeMetadata(). This method will free the stream; it must not be used after this method is called.

StreamImpl.intermediateDecode()

Compute the intermediate decoding of an ongoing streaming inference.

Returns

string – The STT intermediate result.

StreamImpl.intermediateDecodeWithMetadata(aNumResults)

Compute the intermediate decoding of an ongoing streaming inference, return results including metadata.

Arguments

- aNumResults (number) – Maximum number of candidate transcripts to return. The returned value might be smaller than this.

Returns

Metadata – A Metadata() object containing multiple candidate transcripts. Each transcript has per-token metadata including timing information. The user is responsible for freeing the Metadata object by calling FreeMetadata(). Returns undefined on error.

Module exported methods

FreeModel(model)

Frees associated resources and destroys the model object.

Arguments

- model (Model) – The model object to free.

FreeStream(stream)

Destroy a streaming state without decoding the computed logits. This can be used if you no longer need the result of an ongoing streaming inference and don't want to perform a costly decode operation.

Arguments

- stream (StreamImpl) – The stream object to free.

FreeMetadata(metadata)

Free memory allocated for metadata information.

Arguments

- metadata (Metadata) – The metadata object to free.

Version()

Returns the version of this library. The returned version is a semantic version (SemVer 2.0.0).

Returns

string
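
Since Version() promises a SemVer 2.0.0 string, callers can gate on it; this parser is a hypothetical helper covering plain major.minor.patch versions (no pre-release or build suffixes):

```javascript
// Parse a 'major.minor.patch' version string into numeric components.
function parseSemver(version) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!m) throw new Error(`unexpected version string: ${version}`);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// const v = parseSemver(Ds.Version());
// if (v.major === 0 && v.minor < 9) throw new Error('deepspeech >= 0.9 required');
```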

Metadata

class Metadata()

interface, exported from index

An array of CandidateTranscript objects computed by the model.

Metadata.transcripts

type: CandidateTranscript[]

CandidateTranscript

class CandidateTranscript()

interface, exported from index

A single transcript computed by the model, including a confidence value and the metadata for its constituent tokens.

CandidateTranscript.confidence

type: number

Approximated confidence value for this transcription. This is roughly the sum of the acoustic model logit values for each timestep/token that contributed to the creation of this transcription.

CandidateTranscript.tokens

type: TokenMetadata[]

TokenMetadata

class TokenMetadata()

interface, exported from index

Stores text of an individual token, along with its timing information.

TokenMetadata.start_time

type: number

Position of the token in seconds.

TokenMetadata.text

type: string

The text corresponding to this token.

TokenMetadata.timestep

type: number

Position of the token in units of 20 ms.
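
The two fields describe the same position (start_time ≈ timestep × 0.02 s). As an illustration of consuming token metadata, this hypothetical helper groups tokens into words using space tokens as boundaries:

```javascript
// Group per-character tokens into words with their start times.
// A token whose text is a single space marks a word boundary.
function tokensToWords(tokens) {
  const words = [];
  let current = null;
  for (const t of tokens) {
    if (t.text === ' ') {
      current = null;                       // next non-space token starts a new word
    } else {
      if (current === null) {
        current = { word: '', start_time: t.start_time };
        words.push(current);
      }
      current.word += t.text;
    }
  }
  return words;
}
```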