Created: by Pradeep Gowda Updated: Dec 12, 2023 Tagged: xtts

The major open source engine is Coqui. Others to consider: Whisper (speech-to-text rather than TTS) and Piper.

XTTS model weights from Coqui - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

KoljaB/RealtimeTTS: Converts text to speech in realtime by identifying sentence fragments for immediate auditory feedback. Ideal for applications requiring instant audio responses.
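To illustrate the "sentence fragments" idea in that description, here is a minimal sketch of cutting a stream of text chunks into sentences as soon as each one is complete, so that synthesis could begin before the full text has arrived. This is not RealtimeTTS's actual implementation; the function name `sentence_fragments` and the punctuation-based boundary rule are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_fragments(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


stream = ["Hello the", "re. How are", " you? I am", " fine"]
fragments = list(sentence_fragments(stream))
print(fragments)
```

Each yielded fragment could be handed to a TTS engine right away, which is roughly where the low latency would come from.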

yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Related to above: StyleTTS 2 | llm-tracker and a shootoff; via

VITS, via: “model is 40M parameter and 150MB in size, and works on-CPU runtime”

Kim, Jaehyeon, Jungil Kong, and Juhee Son. “Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech,” 2021. https://arxiv.org/abs/2106.06103.
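As a rough sanity check on the quoted figures, the sketch below estimates the size of 40M parameters on disk. It assumes every weight is stored as a 32-bit float, which the quote does not state; `fp32_size_mib` is a hypothetical helper name.

```python
def fp32_size_mib(num_params: int) -> float:
    """Approximate size in MiB, assuming 4 bytes (float32) per parameter."""
    bytes_per_param = 4
    return num_params * bytes_per_param / 2**20


size = fp32_size_mib(40_000_000)
print(f"{size:.1f} MiB")  # 152.6 MiB
```

Under that assumption the estimate lands close to the quoted 150MB.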

It does its job for most on-device use cases, like reading an article or practicing a language. Here’s how you can use it with Transformers:

Set up your environment: pip install transformers accelerate phonemizer

Initialize the model:

import torch
from transformers import VitsModel, AutoTokenizer

# Load the pretrained VITS checkpoint and its matching tokenizer
model = VitsModel.from_pretrained("kakao-enterprise/vits-vctk")
tokenizer = AutoTokenizer.from_pretrained("kakao-enterprise/vits-vctk")

# Pass the text you'd like to synthesise:
text = "Hey, it's Max the best doggo speaking!"
inputs = tokenizer(text, return_tensors="pt")

# Generate audio (no gradients are needed for inference)
with torch.no_grad():
    output = model(**inputs).waveform[0]
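The snippet above stops at a waveform tensor. To listen to the result, it has to be written to an audio file. Below is a minimal sketch that uses only the Python standard library; `save_wav` is a hypothetical helper, and a generated sine tone stands in for the model output so that the example runs without downloading anything.

```python
import math
import struct
import wave


def save_wav(path, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for the model output: one second of a 440 Hz tone.
sample_rate = 16000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]
save_wav("speech.wav", tone, sample_rate)
```

With the VITS output, the call would presumably be `save_wav("speech.wav", output.tolist(), model.config.sampling_rate)`, so the sample rate is read from the model config instead of being hard-coded.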

Bonus: you’ll soon be able to fine-tune them on your own voice/dataset too!

Indian language TTS

… I’m especially interested in SOTA, open source, widely available Kannada models. If there are none, there is an opportunity to develop one.

Indic TTS - Synthesis Docs

GitHub - AI4Bharat/Indic-TTS: Text-to-Speech for languages of India

Kannada (ಕನ್ನಡ) Text To Speech (TTS) Demo

Bhashini Neural TTS

2023-12-06: I have asked Mozilla Common Voice to add a Kannada language option.


Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Dan Jurafsky and James H. Martin