Release DeepSpeech 0.6.1 · mozilla/DeepSpeech

General

This is the 0.6.1 release of Deep Speech, an open speech-to-text engine. In accord with semantic versioning, this version is not backwards compatible with version 0.5.1 or earlier versions, so updating from those versions requires updating both code and models. As with previous releases, this release includes the source code:

v0.6.1.tar.gz

and a model

deepspeech-0.6.1-models.tar.gz (This is identical to the 0.6.0 model).

trained on American English which achieves a 7.5% word error rate on the LibriSpeech clean test corpus. Models with the ".pbmm" extension are memory mapped, so they are much more memory efficient and faster to load. Models with the ".tflite" extension are converted for use with TFLite, have post-training quantization enabled, and are more suitable for resource-constrained environments.
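The word error rate quoted above is conventionally the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript. The following is a minimal Python sketch of that calculation, not the project's evaluation code; the function name and the example sentences are illustrative assumptions and are not taken from DeepSpeech or LibriSpeech.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper only; it is not part of the DeepSpeech API.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences: one word of a six-word reference is missing.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In this made-up example one of six reference words is dropped, so the rate is 1/6. A corpus-level figure such as the 7.5% above would normally aggregate edits over all test utterances, and the project's own evaluation may differ in details such as text normalization.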

We also include example audio files:

audio-0.6.1.tar.gz

which can be used to test the engine; and checkpoint files

deepspeech-0.6.1-checkpoint.tar.gz (This is identical to the 0.6.0 checkpoint, except the missing alphabet.txt file is now included.)

which can be used as the basis for further fine-tuning.
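Before passing audio to the engine, it can help to confirm that a file has the format the model expects. The sketch below uses only Python's standard `wave` module to report a file's properties. The expected format of 16 kHz, mono, 16-bit PCM is an assumption that is not stated in the text above, and the function names and the file path are hypothetical.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file (illustrative helper)."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": frame_rate,
            "duration_s": wav.getnframes() / frame_rate,
        }


def matches_expected_format(info, sample_rate_hz=16000):
    """Check for mono, 16-bit audio at the given rate.

    The 16 kHz default is an assumption about the released model,
    not something this release text specifies.
    """
    return (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate_hz"] == sample_rate_hz
    )


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py path/to/file.wav  (hypothetical script name)
    info = describe_wav(sys.argv[1])
    print(info, "OK" if matches_expected_format(info) else "unexpected format")
```

A file that fails this check would typically need resampling or down-mixing before transcription.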

Notable changes from the previous release

DeepSpeech 0.6.1 is a patch release that addresses some minor points surfaced after the 0.6.0 release:

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning. Thus, we document them here along with the hardware used: a server with 8 Quadro RTX 6000 GPUs, each with 24 GB of VRAM. These are identical to the 0.6.0 release.

The weights with the best validation loss were selected at the end of 75 epochs using --noearly_stop, and the selected model was trained for 233784 steps. In addition, training used the --use_cudnn_rnn flag.
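As a rough illustration of how the flags above might be carried over when fine-tuning from the released checkpoint, the sketch below assembles a training command as an argument list. Only --noearly_stop and --use_cudnn_rnn come from the text above. The script name DeepSpeech.py, the flags --checkpoint_dir, --train_files and --epochs, and all paths are assumptions, so check them against the training documentation for this release. The sketch also assumes that the fine-tuning run should use the same RNN implementation as the checkpoint.

```python
def fine_tune_command(checkpoint_dir, train_csv, epochs):
    """Build (but do not run) a hypothetical fine-tuning command line."""
    if epochs < 1:
        raise ValueError("epochs must be a positive integer")
    return [
        "python", "DeepSpeech.py",           # assumed training entry point
        "--checkpoint_dir", checkpoint_dir,  # assumed flag: unpacked checkpoint
        "--train_files", train_csv,          # assumed flag: training CSV
        "--epochs", str(epochs),             # assumed flag: number of epochs
        "--use_cudnn_rnn",                   # from the text: used in training
        "--noearly_stop",                    # from the text: no early stopping
    ]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the command.
    cmd = fine_tune_command("deepspeech-0.6.1-checkpoint", "my-train.csv", 3)
    print(" ".join(cmd))
```

The command is returned as a list so that it could be passed to `subprocess.run` without shell quoting issues; the sketch only prints it.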

Bindings

This release also includes a Python-based command-line tool, deepspeech, installed through

Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPUs are supported.) This is done by instead installing the GPU-specific package:

pip install deepspeech-gpu

On Linux, macOS and Windows, the DeepSpeech package does not use TFLite by default. A TFLite version of the package on those platforms is available as:

pip install deepspeech-tflite

Also, it exposes bindings for the following languages

npm install deepspeech-gpu  

In addition, there are third-party bindings that are supported by external developers, for example

Supported Platforms

Contact/Getting Help

  1. FAQ - We have a list of common questions, and their answers, in our FAQ. When just getting started, it's best to first check the FAQ to see if your question is addressed.
  2. Discourse Forums - If your question is not addressed in the FAQ, the Discourse Forums are the next place to look. They contain conversations on General Topics, Using Deep Speech, Alternative Platforms, and Deep Speech Development.
  3. IRC - If your question is not addressed by either the FAQ or Discourse Forums, you can contact us on the #machinelearning channel on Mozilla IRC; people there can try to answer or help.
  4. Issues - Finally, if all else fails, you can open an issue in our repo if there is a bug with the current code base.

Contributors to 0.6.1 release