KugelAudio: Real-time text-to-speech model you can self-host

The most natural real-time TTS, with voice cloning and sub-60 ms latency, available on-premises or via API. Grammar-aware normalization reads phone numbers, IBANs, addresses, and medications naturally in 25+ languages, with word-level timestamps and IPA support. Adapters for LiveKit, Pipecat, and Vapi. Built by a team of 4 in Berlin.


