
Gurugram, India, Apr 09: Shunya Labs has launched a unified voice platform designed to enable large-scale content localisation for the entertainment industry. Built on the company’s voice AI models, the platform supports dubbing, translation, subtitling, voice cloning, and lip sync across multiple languages. This allows content to be adapted for diverse audiences within a single ecosystem, at studio-grade quality and scale.
Designed for global content workflows, the platform enables conversion across formats, including text-to-text translation and speech-to-text transcription. It supports a wide range of languages and dialects, along with code-switched speech patterns, allowing content to be adapted in ways that reflect how language is actually used across regions.
For Shunya Labs, the platform addresses a fundamental challenge in media distribution. Content localisation often requires separate processes for translation, dubbing, and post-production, with each stage handled by different tools or vendors. This fragmentation limits both speed and scale. By consolidating these capabilities, Shunya Labs enables content to be recreated across languages within a single workflow.
The platform supports content transformation at multiple levels, from language conversion to voice and character consistency. It enables creators to generate dubbed audio that preserves tone, emotion, and speaker identity, with phoneme-level lip synchronisation to align speech with on-screen performance. Low-shot voice cloning allows studio-quality voice models to be built from limited audio samples, maintaining accent and identity across languages.
Ritu Mehrotra, Co-Founder and CEO of Shunya Labs, said, “Localising content at scale is not just about translating words. It requires preserving how something is said, not just what is said. Our focus has been on building a system that can recreate content across languages while maintaining tone, emotion, and identity, so that it remains authentic for different audiences.”
Beyond localisation, the platform supports the creation and reuse of voices through configurable voice models. Users can design voices with specific attributes such as tone, age, accent, and style, and maintain character consistency across projects, episodes, and languages. It also supports script-to-audio generation, allowing content to move directly from script to final audio without traditional recording workflows; emotion tagging enables expressive, high-fidelity output.
The platform also incorporates a content intelligence layer that enables scene segmentation, emotion arc detection, and narrative structuring. It supports the generation of highlights, trailers, and chaptered indexes, allowing content to be repurposed and monetised more effectively. In addition, it includes features for compliance and discoverability, such as ad suitability checks, compliance tagging, multilingual metadata generation, and the ability to search within video and audio content across languages.
