Transcription is successful. Need help for training

I have installed DeepSpeech and could successfully perform live transcription with mic_vad_streaming. Now I would like to train on my own data, which consists of about 15 words (i.e., 15 commands). I have the following difficulties:

I am using Windows. I can find DeepSpeech.exe but not DeepSpeech.py. Running DeepSpeech.exe prints a list of options which does NOT include --train_files.
I obtained the DeepSpeech-0.9.3 archive separately. When I execute DeepSpeech.py from it, I get the following error:

from deepspeech_training import train as ds_train
ModuleNotFoundError: No module named 'deepspeech_training'

Since my language contains only 15 words, do I need a GPU, or does a CPU suffice?
The command to train DeepSpeech (python3 DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --dev_files …/data/CV/en/clips/dev.csv --test_files …/data/CV/en/clips/test.csv) is taken from https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html. The command does not mention ‘alphabet.txt’. Is it implied that ‘alphabet.txt’ exists in the current directory?
Does the WAV file size in the CSV represent the actual size of the file, i.e., what the ‘dir’ command reports?
Is there any simplified data set (say, about 10 words) to ILLUSTRATE training?
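
On the CSV question, here is a minimal sketch of how such a file could be generated rather than typed by hand. It assumes the three-column layout used by the DeepSpeech importers (wav_filename, wav_filesize, transcript), with wav_filesize taken as the on-disk size in bytes, which is the same number ‘dir’ reports. The function name build_manifest is made up for illustration, and lowercasing the transcript is my assumption so that the text matches a lowercase English alphabet.

```python
import csv
import os


def build_manifest(rows, csv_path):
    """Write a training CSV from (wav_path, transcript) pairs.

    The column names follow the DeepSpeech CSV convention; everything
    else here (function name, lowercasing) is an illustrative choice.
    """
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in rows:
            # os.path.getsize returns the file size in bytes on disk.
            size_in_bytes = os.path.getsize(wav_path)
            writer.writerow([wav_path, size_in_bytes, transcript.strip().lower()])
```

It would be called once per split, for example build_manifest(pairs, "train.csv"), where pairs is a list of (path, word) tuples for the recordings in that split.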

Thanks and Regards
S Srinivasan

h4ever July 29, 2023, 10:09pm 2

My guess: you didn’t install the deepspeech_training module with pip.

I don’t understand why you are using both the .exe and Python. Those are different environments. Usually, when you run a .py file, you do everything in Python. I am a Python newbie on Linux.

kathyreid (Kathy Reid) July 30, 2023, 1:47am 3

The DeepSpeech Playbook provides a step-by-step guide to model training. It covers the alphabet.txt file and the environment needed for training a model (we recommend not using CPUs), among other considerations.

First, my sincere gratitude for the valuable response. I went through the instructions and could successfully perform the training, which generated a .pb file. Still, I have a few issues:

My language: it has only three words, “ram”, “robert” and “rahim”. I spoke these words multiple times and recorded them. I prepared the WAV files using Audacity and prepared the CSV files manually. I seek help on the following:

(1) I performed the training WITHOUT specifying alphabet.txt. Does that mean it uses English by default? (The playbook's training example, for the Indonesian data set, does not mention alphabet.txt at all. In any case, I am using English script.)
(2) After training, I attempted transcription using mic_vad_streaming. But whatever I speak, it outputs only the single character “r”.
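
Regarding (1): as far as I know, in 0.9.x the training code reads its alphabet from the path given by --alphabet_config_path, and when that flag is omitted it falls back to the English data/alphabet.txt shipped in the repository, so training without the flag would use English characters. A quick way to check that the transcripts only use characters the alphabet can represent is to collect them. This is only a sketch; the function name is hypothetical and the word list is the three words from this post.

```python
def alphabet_from_transcripts(transcripts):
    """Return the sorted set of characters used across all transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(text)
    return sorted(chars)


# The three words from this post (assumed to be written in lowercase).
labels = alphabet_from_transcripts(["ram", "robert", "rahim"])
print(labels)  # ['a', 'b', 'e', 'h', 'i', 'm', 'o', 'r', 't']
```

All nine characters are plain lowercase English letters, so they should be covered by the default English alphabet, assuming that file lists a to z.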

I seek valuable comments, especially on (2) above.

Regards
S Srinivasan