Blog 1

Voice AI is no longer a single category. It’s a portfolio of providers, each with strengths in different languages and styles. The bet that one TTS engine could win every market was never realistic, and it is now demonstrably wrong.

We’ve been quietly tracking quality across en-US, ja-JP, hi-IN, and a handful of long-tail locales for the past year. The pattern is consistent: the best provider for your script depends on the language, the genre, and sometimes the specific named entities you ship.

The provider routing problem

Most teams pick a TTS vendor early and live with the long tail. That works for English. It breaks at the second locale. By the fifth, the gap between the best available output and what you’re actually shipping is impossible to ignore.

One TTS contract per language is unsustainable. One pipeline that routes across all of them is sustainable.

What we did about it

We built a router that ranks providers per locale, per genre, and by validator pass rate. The decision happens at the edge of the pipeline; the rest of the workflow doesn’t need to know which engine produced the audio. Below is the simplest form:

    pipeline.route({
      "en-US": ["elevenlabs:sonic-v2", "openai:tts-1-hd"],
      "ja-JP": ["sarvam:tara", "elevenlabs:multilingual"],
      "hi-IN": ["sarvam:tara"],
      fallback: "openai:tts-1-hd",
    });

Routing alone gets you most of the way. Validators close the loop: word error rate (WER), naturalness, pronunciation, and human review feed back into the router’s ranking, so the “best” provider for each locale stays current.

What we didn’t do

We didn’t train our own TTS. We didn’t fine-tune anyone else’s. The bet is that orchestration, validation, and pronunciation control compound faster than any single-vendor improvement, especially for teams with broad locale coverage. So far, that’s held.
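
For readers who want to see the validator feedback loop in concrete terms, here is a minimal sketch of how pass rates could reorder the provider list for a locale. It is an illustration under stated assumptions, not our production router: `recordValidation`, `passRate`, and `rankProviders` are hypothetical names, the smoothing formula is one simple choice among many, genre is left out to keep it short, and treating `fallback` as both the answer for unlisted locales and a last resort is an assumption about how the fallback behaves.

```javascript
// Hypothetical sketch: these helpers are illustrative and not part of any real SDK.

// Configured preference order per locale, copied from the routing table in the post.
const routes = {
  "en-US": ["elevenlabs:sonic-v2", "openai:tts-1-hd"],
  "ja-JP": ["sarvam:tara", "elevenlabs:multilingual"],
  "hi-IN": ["sarvam:tara"],
};
const fallback = "openai:tts-1-hd";

// Validator outcomes keyed by "locale|provider": clips passed out of clips checked.
const stats = new Map();

// Record one validator verdict (WER, pronunciation, human review, etc.) for a clip.
function recordValidation(locale, provider, passed) {
  const key = `${locale}|${provider}`;
  const entry = stats.get(key) ?? { passed: 0, total: 0 };
  entry.total += 1;
  if (passed) entry.passed += 1;
  stats.set(key, entry);
}

// Smoothed pass rate, (passed + 1) / (total + 2), so a provider with no data
// scores 0.5 instead of dividing by zero or jumping to 0 or 1 after one clip.
function passRate(locale, provider) {
  const { passed, total } = stats.get(`${locale}|${provider}`) ?? { passed: 0, total: 0 };
  return (passed + 1) / (total + 2);
}

// Return providers for a locale, best pass rate first.
function rankProviders(locale) {
  const candidates = routes[locale] ?? [];
  // Assumption: locales without a configured list go straight to the fallback.
  if (candidates.length === 0) return [fallback];
  // Array.prototype.sort is stable, so ties keep the configured order.
  const ranked = [...candidates].sort(
    (a, b) => passRate(locale, b) - passRate(locale, a)
  );
  // Assumption: the fallback is appended as a last resort if not already listed.
  return ranked.includes(fallback) ? ranked : [...ranked, fallback];
}
```

With no validator data, `rankProviders("ja-JP")` returns the configured order followed by the fallback. After a few failed validations for the first provider and a few passes for the second, the order flips, which is the "stays current" behavior described above.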
Onepin.ai - the trust of voice AI