Speaches Documentation

speaches is an OpenAI API-compatible server that supports streaming transcription, translation, and speech generation. Speech-to-Text is powered by faster-whisper, and Text-to-Speech uses piper and Kokoro. The project aims to be like Ollama, but for TTS/STT models.
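Because the server is OpenAI API-compatible, a transcription request should look like the OpenAI-style multipart upload to `/v1/audio/transcriptions`. The sketch below only builds such a request with the Python standard library and does not send it. The server address `http://localhost:8000`, the model name `Systran/faster-whisper-small`, and the helper names are assumptions for illustration, not values taken from this documentation.

```python
import io
import urllib.request
import uuid
import wave

# Assumptions (not from the speaches docs): adjust both to your setup.
BASE_URL = "http://localhost:8000"        # hypothetical local server address
MODEL = "Systran/faster-whisper-small"    # hypothetical model identifier


def make_silent_wav(seconds: float = 1.0, rate: int = 16000) -> bytes:
    """Return an in-memory mono 16-bit WAV containing only silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_transcription_request(
    audio: bytes, model: str = MODEL, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style multipart transcription request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + b"\r\n" + closing
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_transcription_request(make_silent_wav())
    # Show where the request would go; no network call is made here.
    print(req.get_method(), req.full_url)
```

To actually transcribe audio, pass the request to `urllib.request.urlopen(req)` while a server is running at the assumed address, and replace the silent clip with real audio bytes.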

Features:

Please create an issue if you find a bug, have a question, or want to suggest a feature.

Demos

Realtime API

(Excuse the breathing lol. Didn't have enough time to record a better demo)

Streaming Transcription

TODO

Speech Generation