fusion_hat.tts module
Text-to-Speech (TTS) module.
This module provides a Text-to-Speech (TTS) class that can be used to convert text to speech using different TTS engines.
Available TTS engines:
- Piper: A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization.
- Pico2Wave: SVOX Pico TTS engine used to convert text into a WAV audio file.
- Espeak: A compact open-source software speech synthesizer for English and other languages.
- OpenAI_TTS: Online TTS service from OpenAI.
Example
Piper
Initialize Piper TTS engine.
>>> from fusion_hat.tts import Piper
>>> tts = Piper()
List the available countries.
>>> tts.available_countrys()
['ar_JO', 'ca_ES', 'cs_CZ', 'cy_GB', 'da_DK', 'de_DE', 'el_GR', 'en_GB', 'en_US', 'es_ES', 'es_MX', 'fa_IR', 'fi_FI', 'fr_FR', 'hu_HU', 'is_IS', 'it_IT', 'ka_GE', 'kk_KZ', 'lb_LU', 'lv_LV', 'ml_IN', 'ne_NP', 'nl_BE', 'nl_NL', 'no_NO', 'pl_PL', 'pt_BR', 'pt_PT', 'ro_RO', 'ru_RU', 'sk_SK', 'sl_SI', 'sr_RS', 'sv_SE', 'sw_CD', 'tr_TR', 'uk_UA', 'vi_VN', 'zh_CN']
List all models for country en_US.
>>> tts.available_models('en_US')
{'amy': ['en_US-amy-low', 'en_US-amy-medium'], 'arctic': ['en_US-arctic-medium'], 'bryce': ['en_US-bryce-medium'], 'danny': ['en_US-danny-low'], 'hfc_female': ['en_US-hfc_female-medium'], 'hfc_male': ['en_US-hfc_male-medium'], 'joe': ['en_US-joe-medium'], 'john': ['en_US-john-medium'], 'kathleen': ['en_US-kathleen-low'], 'kristin': ['en_US-kristin-medium'], 'kusal': ['en_US-kusal-medium'], 'l2arctic': ['en_US-l2arctic-medium'], 'lessac': ['en_US-lessac-low', 'en_US-lessac-medium', 'en_US-lessac-high'], 'libritts': ['en_US-libritts-high'], 'libritts_r': ['en_US-libritts_r-medium'], 'ljspeech': ['en_US-ljspeech-medium', 'en_US-ljspeech-high'], 'norman': ['en_US-norman-medium'], 'reza_ibrahim': ['en_US-reza_ibrahim-medium'], 'ryan': ['en_US-ryan-low', 'en_US-ryan-medium', 'en_US-ryan-high'], 'sam': ['en_US-sam-medium']}
Set model
>>> tts.set_model('en_US-amy-low')
Say message.
>>> tts.say("Hi, I'm piper TTS. A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization.")
Espeak
Import and initialize Espeak TTS engine.
>>> from fusion_hat.tts import Espeak
>>> tts = Espeak()
Set the amplitude (range 0-200, default 100).
>>> tts.set_amp(200)
Set the speed (range 80-260, default 150).
>>> tts.set_speed(150)
Set the gap (range 0-200, default 1).
>>> tts.set_gap(1)
Set the pitch (range 0-99, default 80).
>>> tts.set_pitch(80)
Say message.
>>> tts.say("Hello world!")
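The ranges listed in the Espeak example can be enforced before calling the setters. The following is a minimal sketch, assuming the documented bounds are inclusive; `ESPEAK_RANGES` and `clamp_setting` are hypothetical helpers and not part of fusion_hat.

```python
# Documented Espeak ranges: setting name -> (minimum, maximum, default).
# Inclusive bounds are an assumption; the docs only give ranges like "0-200".
ESPEAK_RANGES = {
    "amp": (0, 200, 100),
    "speed": (80, 260, 150),
    "gap": (0, 200, 1),
    "pitch": (0, 99, 80),
}


def clamp_setting(name: str, value: int) -> int:
    """Return value limited to the documented range for the given setting."""
    low, high, _default = ESPEAK_RANGES[name]
    return max(low, min(high, value))
```

With this helper, `tts.set_speed(clamp_setting("speed", 300))` would pass 260 to the engine instead of an out-of-range value.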
Pico2Wave
Import and initialize Pico2Wave TTS engine.
>>> from fusion_hat.tts import Pico2Wave
>>> tts = Pico2Wave()
List available languages.
>>> tts.SUPPORTED_LANGUAUE
['en-US', 'en-GB', 'de-DE', 'es-ES', 'fr-FR', 'it-IT']
Set language.
>>> tts.set_lang('en-US')
Say message.
>>> tts.say("Hello world!")
OpenAI TTS
Import and initialize OpenAI TTS engine.
>>> from fusion_hat.tts import OpenAI_TTS
>>> API_KEY = "sk-..."
>>> tts = OpenAI_TTS(api_key=API_KEY)
Set voice.
>>> tts.set_voice(tts.Voice.ALLOY)
Say message.
>>> tts.say("Hello world!")
Say message with instructions.
>>> tts.say("I'm so sad right now.", instructions="say it sadly")
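The example above assigns the API key directly in code. One possible alternative is to read it from the environment so it stays out of source control. This is a sketch only: the variable name `OPENAI_API_KEY` is a common convention assumed here, and `load_api_key` is a hypothetical helper, not part of fusion_hat.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable instead of source code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The result could then be passed as `OpenAI_TTS(api_key=load_api_key())`.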
- class fusion_hat.tts.Piper(*args, **kwargs)
Bases: _Base
Piper TTS engine.
- Parameters:
model (str, optional) – model, leave it None to use default model, defaults to None
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.
- _load_model_list()
Load model list from local cache or built-in defaults (offline, no network).
- update_model_list()
Fetch latest model list from network and save to cache.
Call this manually when you want to check for new models online. Falls back to local cache if network is unavailable.
- is_model_downloaded(model: str) -> bool
Check if model is downloaded.
- Parameters:
model (str) – model
- Returns:
True if model is downloaded, False otherwise
- Return type:
bool
- download_model(model: str, force: bool = False, progress_callback: Callable[[int, int], None] = None) -> None
Download model.
- Parameters:
model (str) – model
force (bool, optional) – force download, default is False
progress_callback (Callable[[int, int], None], optional) – progress callback, default is None
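A callback matching `Callable[[int, int], None]` might look like the sketch below. The meaning of the two integers is not documented here; this example assumes they are the bytes downloaded so far and the total size. `format_progress` and `print_progress` are hypothetical names.

```python
def format_progress(done: int, total: int) -> str:
    """Build a one-line progress message; total may be 0 when unknown."""
    if total <= 0:
        return f"{done} bytes downloaded"
    percent = min(100.0, 100.0 * done / total)
    return f"{percent:5.1f}% ({done}/{total} bytes)"


def print_progress(done: int, total: int) -> None:
    """Callback with the (int, int) -> None shape; rewrites a single line."""
    # Assumption: arguments are (bytes so far, total bytes).
    print("\r" + format_progress(done, total), end="", flush=True)
```

It would be passed as `tts.download_model('en_US-amy-low', progress_callback=print_progress)`.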
- fix_chinese_punctuation(text: str) -> str
Replace Chinese punctuation with English punctuation.
- Parameters:
text (str) – text
- Returns:
text with English punctuation
- Return type:
str
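To illustrate what replacing Chinese punctuation involves, here is a standalone sketch built on `str.translate`. The mapping table is illustrative and may differ from the one the library actually uses; `replace_chinese_punctuation` is a hypothetical stand-in, not the method itself.

```python
# Full-width (Chinese) punctuation mapped to ASCII equivalents.
# Illustrative table only; the library's real mapping may cover other marks.
_PUNCTUATION = str.maketrans({
    "，": ",",
    "。": ".",
    "！": "!",
    "？": "?",
    "：": ":",
    "；": ";",
    "（": "(",
    "）": ")",
})


def replace_chinese_punctuation(text: str) -> str:
    """Return text with full-width punctuation replaced by ASCII marks."""
    return text.translate(_PUNCTUATION)
```

Characters outside the table, including the Chinese text itself, pass through unchanged.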
- tts(text: str, file: str) -> None
Synthesize text to wave file.
- Parameters:
text (str) – text
file (str) – wave file path
- Raises:
ValueError – Model not set, set model first, with Piper.set_model(model)
- stream(text: str) -> None
Stream text to speaker.
- Parameters:
text (str) – text
- Raises:
ValueError – Model not set, set model first, with Piper.set_model(model)
- say(text: str, stream: bool = True) -> None
Say text.
- Parameters:
text (str) – text
stream (bool, optional) – stream to speaker, default is True
- Raises:
ValueError – Model not set, set model first, with Piper.set_model(model)
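Since `tts()`, `stream()`, and `say()` all raise ValueError when no model is set, callers may want to handle that case in one place. The wrapper below is a sketch that works with any object exposing `say(text)` and `tts(text, file)`; `speak` is a hypothetical helper and not part of fusion_hat.

```python
from typing import Optional


def speak(engine, text: str, wav_path: Optional[str] = None) -> bool:
    """Speak text, or write it to wav_path when a path is given.

    Returns False when the engine reports that no model is set.
    """
    try:
        if wav_path is None:
            engine.say(text)            # plays through the speaker
        else:
            engine.tts(text, wav_path)  # synthesizes to a WAV file instead
    except ValueError as err:
        # Documented error: raised when set_model() has not been called yet.
        print(f"TTS not ready: {err}")
        return False
    return True
```

Typical use would be `speak(tts, "Hello")` for playback or `speak(tts, "Hello", "hello.wav")` to save a file.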
- available_models(country: str = None) -> List[str]
Get available models.
- Parameters:
country (str, optional) – country, leave it None to get all models, defaults to None
- Returns:
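available models. Note that in the Piper example above, calling this with a country returns a dict mapping each voice name to its model IDs.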
- Return type:
List[str]
- available_countrys() -> List[str]
Get available countries.
- Returns:
available countries
- Return type:
List[str]
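The model IDs in the Piper example output appear to follow a `<country>-<voice>-<quality>` pattern, such as `en_US-amy-low`. Assuming that pattern holds (it is inferred from the listing, not stated by the library), an ID can be split and a flat list regrouped by voice as sketched below. `ModelName`, `parse_model_name`, and `group_by_voice` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, NamedTuple


class ModelName(NamedTuple):
    country: str  # e.g. "en_US"
    voice: str    # e.g. "amy" or "libritts_r"
    quality: str  # e.g. "low", "medium", "high"


def parse_model_name(name: str) -> ModelName:
    """Split an ID like 'en_US-amy-low' into its three parts."""
    # Assumption: the first and last hyphens delimit country and quality.
    country, rest = name.split("-", 1)
    voice, quality = rest.rsplit("-", 1)
    return ModelName(country, voice, quality)


def group_by_voice(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group model IDs under their voice name, keeping input order."""
    grouped: Dict[str, List[str]] = {}
    for name in names:
        grouped.setdefault(parse_model_name(name).voice, []).append(name)
    return grouped
```

This reproduces the shape of the `available_models('en_US')` output shown in the example section.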
- class fusion_hat.tts.Pico2Wave(*args, **kwargs)
Bases: _Base
Pico2Wave TTS engine.
- Parameters:
lang (str, optional) – language, leave it None to use default language, defaults to 'en-US'
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.
- SUPPORTED_LANGUAUE = ['en-US', 'en-GB', 'de-DE', 'es-ES', 'fr-FR', 'it-IT']
Supported languages.
- class fusion_hat.tts.Espeak(*args, **kwargs)
Bases: _Base
Espeak TTS engine.
- Parameters:
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.
- ESPEAK = 'espeak'
- class fusion_hat.tts.OpenAI_TTS(*args, **kwargs)
Bases: _Base
OpenAI TTS engine.
- Parameters:
voice (Voice, optional) – Voice, default is Voice.ALLOY.
model (Model, optional) – Model, default is Model.GPT_4O_MINI_TTS.
api_key (str, optional) – API key.
gain (float, optional) – Volume gain, default is 1.5.
log (logging.Logger, optional) – Logger, default is None.
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.
- DEFAULT_INSTRUCTIONS = 'Speak in a cheerful and positive tone.'
- URL = 'https://api.openai.com/v1/audio/speech'
- AUDIO_FORMAT = 'wav'
- class Voice(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: StrEnum
Voice enum.
- ALLOY = 'alloy'
- ASH = 'ash'
- BALLAD = 'ballad'
- CORAL = 'coral'
- ECHO = 'echo'
- FABLE = 'fable'
- NOVA = 'nova'
- ONYX = 'onyx'
- SAGE = 'sage'
- SHIMMER = 'shimmer'
- static _generate_next_value_(name, start, count, last_values)
Return the lower-cased version of the member name.
- class Model(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: StrEnum
Model enum.
- GPT_4O_MINI_TTS = 'gpt-4o-mini-tts'
- static _generate_next_value_(name, start, count, last_values)
Return the lower-cased version of the member name.
- DEFAULT_MODEL = 'gpt-4o-mini-tts'
- DEFAULT_VOICE = 'alloy'
- tts(words: str, output_file: str = './openai_tts.wav', instructions: str | None = None, stream: bool = False) -> bool
Request OpenAI TTS API.
- Parameters:
words (str) – Words to say.
output_file (str, optional) – Output file, default is ‘./openai_tts.wav’.
instructions (str, optional) – Instructions, default is None.
stream (bool, optional) – Whether to stream the audio, default is False.
- Returns:
True if success, False otherwise.
- Return type:
bool
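For orientation, a request to the speech endpoint can be assembled as in the sketch below. How fusion_hat builds its request internally is not documented here; the field names (`model`, `input`, `voice`, `instructions`, `response_format`) follow OpenAI's public speech endpoint, and `build_speech_request` is a hypothetical helper that only prepares data without sending anything.

```python
from typing import Dict, Optional, Tuple

URL = "https://api.openai.com/v1/audio/speech"  # same value as OpenAI_TTS.URL


def build_speech_request(
    api_key: str,
    words: str,
    voice: str = "alloy",
    model: str = "gpt-4o-mini-tts",
    instructions: Optional[str] = None,
    audio_format: str = "wav",
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (headers, JSON payload) for a text-to-speech request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "input": words,
        "voice": voice,
        "response_format": audio_format,
    }
    if instructions:
        # Only include the optional style hint when one was provided.
        payload["instructions"] = instructions
    return headers, payload
```

The defaults mirror `DEFAULT_VOICE`, `DEFAULT_MODEL`, and `AUDIO_FORMAT` from the class attributes listed above.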
- say(words: str, instructions: str | None = None, stream: bool = True) -> None
Say words.
- Parameters:
words (str) – Words to say.
instructions (str, optional) – Instructions, default is None.
stream (bool, optional) – Whether to stream the audio, default is True.
- set_voice(voice: Voice | str) -> None
Set voice.
- Parameters:
voice (Voice | str) – Voice.
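Because `set_voice` accepts either a `Voice` member or a plain string, a caller may want to normalize input first. The sketch below uses a small stand-in enum with three of the documented members; it mixes `str` into `Enum` to approximate `StrEnum` on older Python versions. `normalize_voice` is a hypothetical helper, and whether the library itself lower-cases or validates strings is not documented here.

```python
from enum import Enum


class Voice(str, Enum):
    """Stand-in for OpenAI_TTS.Voice with a few of the documented members."""
    ALLOY = "alloy"
    ASH = "ash"
    NOVA = "nova"


def normalize_voice(voice) -> Voice:
    """Accept a Voice member or its string value and return the member."""
    if isinstance(voice, Voice):
        return voice
    # Lookup by value; raises ValueError for names that are not defined.
    return Voice(str(voice).lower())
```

Both `normalize_voice("Nova")` and `normalize_voice(Voice.NOVA)` yield the same member, so either form can be handed on to `set_voice`.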