feat: Realtime API support reboot by richiejp · Pull Request #5392 · mudler/LocalAI

Rebase and continue #3722 with the intention of at least getting audio-to-text with VAD working.

feat(realtime): Initial Realtime API implementation
chore: go mod tidy

EDIT:

This implements transcription-only mode with VAD enabled, and nothing more. A lot of the code for supporting the full API is still present, but it is not functional; it is there to be built on.

I have only tested against richiejp/VoxInput#2, which works nicely, but the API behavior may not be exactly like OpenAI's. It would be helpful for people to test their apps and report the results.
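For anyone wanting to test a client, here is a rough sketch of the two JSON messages a transcription-only client would typically send. It is modeled on the event shapes in OpenAI's Realtime API documentation (`transcription_session.update` and `input_audio_buffer.append`), not on this PR's code, so the fields LocalAI actually accepts may differ. The model name passed in is a placeholder, and the WebSocket connection itself is left out.

```python
import base64
import json


def transcription_session_update(model: str) -> str:
    """Build a session update asking for transcription with server-side VAD.

    Field names follow OpenAI's published Realtime API events; whether
    LocalAI honours each of them is an assumption to verify when testing.
    """
    event = {
        "type": "transcription_session.update",
        "session": {
            "input_audio_format": "pcm16",
            # `model` is a placeholder: use whatever transcription model
            # your server has configured.
            "input_audio_transcription": {"model": model},
            # Let the server decide where speech starts and stops.
            "turn_detection": {"type": "server_vad"},
        },
    }
    return json.dumps(event)


def audio_append(pcm_chunk: bytes) -> str:
    """Wrap a chunk of raw PCM16 audio as a base64-encoded append event."""
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    }
    return json.dumps(event)
```

A client would send the session update once after connecting, then stream `audio_append(...)` messages as microphone audio arrives, and read transcription events back from the server.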

I think it would be good to get transcription-only mode out for experimentation and to release a version of VoxInput that uses it. Then I can start thinking of ways to use the full API with VoxInput or something else. The full API could be used with tool calling to enable flexible voice commands, either on desktop or on embedded devices.