xtts
The major open-source engine is Coqui. Others to consider – Whisper (a speech-recognition model rather than a synthesis engine) and Piper.
XTTS model weights from Coqui - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Related to above: StyleTTS 2 | llm-tracker and a shootoff; via
vits – via “model is 40M parameter and 150MB in size, and works on-CPU runtime”
Kim, Jaehyeon, Jungil Kong, and Juhee Son. “Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech,” 2021. http://arxiv.org/abs/2106.06103.
It does the job for most on-device use cases, such as reading an article aloud or practicing a language. Here’s how you can use it with Transformers.
Set up your environment: pip install transformers accelerate phonemizer
Initialize the model:
    import torch
    from transformers import VitsModel, AutoTokenizer

    model = VitsModel.from_pretrained("kakao-enterprise/vits-vctk")
    tokenizer = AutoTokenizer.from_pretrained("kakao-enterprise/vits-vctk")

    # Pass the text you'd like to synthesise:
    text = "Hey, it's Max the best doggo speaking!"
    inputs = tokenizer(text, return_tensors="pt")

    # Generate audio
    with torch.no_grad():
        output = model(**inputs).waveform[0]
Bonus: you will soon be able to fine-tune these models on your own voice or dataset too!
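The VITS snippet above stops once the waveform is held in `output`; to listen to it, the samples still need to be written to an audio file. Below is a minimal sketch of that last step, using only the Python standard library. It assumes the waveform is a mono sequence of floats in roughly [-1.0, 1.0]. The helper names `save_waveform` and `sine_wave` are hypothetical and not part of Transformers; the sine wave is only a stand-in signal so the sketch runs without downloading a model.

```python
import math
import struct
import wave


def save_waveform(samples, path, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file.

    Hypothetical helper for illustration; returns the number of samples written.
    """
    frames = bytearray()
    for sample in samples:
        # Clip so that a slight overshoot cannot overflow the int16 range.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)              # mono
        wav_file.setsampwidth(2)              # 2 bytes per sample = 16-bit
        wav_file.setframerate(sampling_rate)  # samples per second
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2


def sine_wave(frequency_hz, duration_s, sampling_rate):
    """Stand-in signal (assumption) used in place of real model output."""
    n = int(duration_s * sampling_rate)
    return [
        0.5 * math.sin(2 * math.pi * frequency_hz * i / sampling_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    rate = 16000  # assumed rate for the stand-in signal only
    save_waveform(sine_wave(440, 0.5, rate), "tone.wav", rate)
```

With the model loaded as above, the call would presumably look like `save_waveform(output.tolist(), "speech.wav", model.config.sampling_rate)`, on the assumption that the checkpoint's config exposes its sampling rate under that attribute. Using the wrong rate will make the speech play back too fast or too slow.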
Indian language TTS
… I’m especially interested in SOTA, open-source, widely available Kannada models. If none exist, there is an opportunity to develop one.
Indic TTS - Synthesis Docs
github - AI4Bharat/Indic-TTS: Text-to-Speech for languages of India
Kannada (ಕನ್ನಡ) Text To Speech (TTS) Demo
Books
Speech and Language Processing – by Dan Jurafsky and James H. Martin – An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition