Any self-hosted speech-to-text / text-to-speech LLM available?

Zeon@lemmy.world · 11 days ago

Any self-hosted speech-to-text / text-to-speech LLM available?

Scrubbles@poptalk.scrubbles.tech · edit-2 11 days ago

Generally there are not LLMs that do this, but you start building up a workflow. You speak, one service reads in the audio and translates it to text. Then you feed that into an LLM, it responds in text, and you have another service translate that into audio.

Home Assistant is the easiest way to get them all put together.

https://www.home-assistant.io/integrations/assist_pipeline

Edit agree with others below. Use the apps that are made for it.

Whisper for STT
Any hosted LLM can work, text-generation-webui or tabbyapi
I use xttsv2 for TTS