xtts

The major open-source engine is Coqui. Others to consider: Whisper and Piper. (Note that Whisper is a speech-recognition model rather than a TTS engine.)

XTTS model weights from Coqui - coqui-ai/TTS: 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

KoljaB/RealtimeTTS: Converts text to speech in realtime by identifying sentence fragments for immediate auditory feedback. Ideal for applications requiring instant audio responses.
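To make the "sentence fragments for immediate auditory feedback" idea concrete, here is a minimal sketch of buffering streamed text and releasing each sentence as soon as it is complete. This is not RealtimeTTS's actual implementation. The function name `stream_sentences` and the boundary regex are assumptions made for illustration.

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is a simplification and will mis-split abbreviations such as "e.g. this".
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be incomplete, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    incoming = ["Hello wor", "ld. How are", " you? I am", " fine"]
    for sentence in stream_sentences(incoming):
        # A real pipeline would hand each sentence to the TTS engine here.
        print(sentence)
```

Each yielded sentence could be sent to a synthesizer while later text is still arriving, which is what keeps perceived latency low.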

yl4579/StyleTTS2: StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Related to above: StyleTTS 2 | llm-tracker and a shootoff; via

vits — via “model is 40M parameter and 150MB in size, and works on-CPU runtime”

(Kim et al., 2021)
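The quoted parameter count and file size are roughly consistent with each other if the weights are stored as 32-bit floats. That storage format is an assumption here, since the quote does not state it. A quick back-of-the-envelope check:

```python
params = 40_000_000   # "40M parameter" from the quote
bytes_per_param = 4   # assumption: float32 weights

size_bytes = params * bytes_per_param
size_mb = size_bytes / 1_000_000     # decimal megabytes
size_mib = size_bytes / (1024 ** 2)  # binary mebibytes

print(f"{size_mb:.0f} MB, {size_mib:.1f} MiB")  # prints "160 MB, 152.6 MiB"
```

Either figure is in the neighbourhood of the quoted 150MB.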

It does the job for most on-device use cases, such as reading an article aloud or practicing a language. Here’s how you can use it with Transformers:

Set up your environment: pip install transformers accelerate phonemizer

Initialize the model:

import torch
from transformers import VitsModel, AutoTokenizer

# Load the pretrained VITS checkpoint and its tokenizer
model = VitsModel.from_pretrained("kakao-enterprise/vits-vctk")
tokenizer = AutoTokenizer.from_pretrained("kakao-enterprise/vits-vctk")

# Pass the text you'd like to synthesise:
text = "Hey, it's Max the best doggo speaking!"
inputs = tokenizer(text, return_tensors="pt")

# Generate audio (no gradients are needed for inference)
with torch.no_grad():
    output = model(**inputs).waveform[0]

Bonus: you will soon be able to fine-tune these models on your own voice or dataset too!
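The snippet above stops with the audio held in `output`, so a natural next step is writing it to disk. The following is a minimal sketch using only the Python standard library. The helper name `write_wav` is hypothetical, and a generated sine tone stands in for the model output so the example runs without downloading any weights.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Sequence


def write_wav(path: str, samples: Sequence[float], sampling_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against out-of-range values
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(frames))


# Stand-in for the model output: one second of a 440 Hz tone.
rate = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
out_path = os.path.join(tempfile.gettempdir(), "vits_demo.wav")
write_wav(out_path, tone, rate)
```

With the real model, the call would look something like `write_wav("speech.wav", output.tolist(), model.config.sampling_rate)`, assuming the model config exposes `sampling_rate` as the Transformers VITS documentation describes.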

Indian language TTS

… I’m especially interested in SOTA, open-source, widely available Kannada models. If none exist, there is an opportunity to develop one.

Indic TTS - Synthesis Docs github - AI4Bharat/Indic-TTS: Text-to-Speech for languages of India

Kannada (ಕನ್ನಡ) Text To Speech (TTS) Demo

Bhashini Neural TTS

2023-12-06: I have asked Mozilla Common Voice to add a Kannada language option.

Books

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Dan Jurafsky and James H. Martin

Kim, J., Kong, J., & Son, J. (2021). Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.

btbytes.com
