fusion_hat.tts module

Text-to-Speech (TTS) module.

This module provides a Text-to-Speech (TTS) class that can be used to convert text to speech using different TTS engines.

Available TTS engines:

Piper: A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization.
Pico2Wave: SVOX Pico TTS engine used to convert text into a WAV audio file.
Espeak: A compact open source software speech synthesizer for English and other languages.
OpenAI_TTS: Online TTS service from OpenAI.

Example

Piper

Initialize Piper TTS engine.

>>> from fusion_hat.tts import Piper
>>> tts = Piper()

Check the available countries.

>>> tts.available_countrys()
['ar_JO', 'ca_ES', 'cs_CZ', 'cy_GB', 'da_DK', 'de_DE', 'el_GR', 'en_GB', 'en_US', 'es_ES', 'es_MX', 'fa_IR', 'fi_FI', 'fr_FR', 'hu_HU', 'is_IS', 'it_IT', 'ka_GE', 'kk_KZ', 'lb_LU', 'lv_LV', 'ml_IN', 'ne_NP', 'nl_BE', 'nl_NL', 'no_NO', 'pl_PL', 'pt_BR', 'pt_PT', 'ro_RO', 'ru_RU', 'sk_SK', 'sl_SI', 'sr_RS', 'sv_SE', 'sw_CD', 'tr_TR', 'uk_UA', 'vi_VN', 'zh_CN']

List all models for country en_US.

>>> tts.available_models('en_US')
{'amy': ['en_US-amy-low', 'en_US-amy-medium'], 'arctic': ['en_US-arctic-medium'], 'bryce': ['en_US-bryce-medium'], 'danny': ['en_US-danny-low'], 'hfc_female': ['en_US-hfc_female-medium'], 'hfc_male': ['en_US-hfc_male-medium'], 'joe': ['en_US-joe-medium'], 'john': ['en_US-john-medium'], 'kathleen': ['en_US-kathleen-low'], 'kristin': ['en_US-kristin-medium'], 'kusal': ['en_US-kusal-medium'], 'l2arctic': ['en_US-l2arctic-medium'], 'lessac': ['en_US-lessac-low', 'en_US-lessac-medium', 'en_US-lessac-high'], 'libritts': ['en_US-libritts-high'], 'libritts_r': ['en_US-libritts_r-medium'], 'ljspeech': ['en_US-ljspeech-medium', 'en_US-ljspeech-high'], 'norman': ['en_US-norman-medium'], 'reza_ibrahim': ['en_US-reza_ibrahim-medium'], 'ryan': ['en_US-ryan-low', 'en_US-ryan-medium', 'en_US-ryan-high'], 'sam': ['en_US-sam-medium']}
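
The mapping shown above groups model names by voice, and each name ends in a quality suffix (low, medium or high). As a minimal sketch of choosing one model per voice from such a mapping, the snippet below uses a hypothetical helper, `pick_model`, which is not part of fusion_hat; its preference order is an assumption, and the sample dict copies a few entries from the output above.

```python
from typing import Dict, List, Optional, Sequence

# A few entries copied from the available_models('en_US') output above.
SAMPLE_MODELS: Dict[str, List[str]] = {
    "amy": ["en_US-amy-low", "en_US-amy-medium"],
    "danny": ["en_US-danny-low"],
    "lessac": ["en_US-lessac-low", "en_US-lessac-medium", "en_US-lessac-high"],
}


def pick_model(
    models: Dict[str, List[str]],
    voice: str,
    preferred: Sequence[str] = ("medium", "low", "high"),
) -> Optional[str]:
    """Return the first model for `voice` whose quality suffix is in `preferred`.

    Hypothetical helper: the quality is assumed to be the text after the last '-'.
    """
    candidates = models.get(voice, [])
    for quality in preferred:
        for name in candidates:
            if name.rsplit("-", 1)[-1] == quality:
                return name
    # Fall back to the first listed model, or None for an unknown voice.
    return candidates[0] if candidates else None
```

The returned name could then be passed to `tts.set_model(...)`.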

Set the model.

>>> tts.set_model('en_US-amy-low')

Say message.

>>> tts.say("Hi, I'm piper TTS. A fast and local neural text-to-speech engine that embeds espeak-ng for phonemization.")

Espeak

Import and initialize Espeak TTS engine.

>>> from fusion_hat.tts import Espeak
>>> tts = Espeak()

Set the amplitude (0-200, default 100).

>>> tts.set_amp(200)

Set the speed (80-260, default 150).

>>> tts.set_speed(150)

Set the gap (0-200, default 1).

>>> tts.set_gap(1)

Set the pitch (0-99, default 80).

>>> tts.set_pitch(80)

Say message.

>>> tts.say("Hello world!")
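
The Espeak setters above each document a valid range, but the page does not say what happens when a value falls outside it. One cautious option is to clamp values before calling the setters. The sketch below assumes nothing about the library beyond the documented ranges; `clamp_setting` is a hypothetical helper, not part of fusion_hat.

```python
# Ranges copied from the Espeak example above: (minimum, maximum).
ESPEAK_RANGES = {
    "amp": (0, 200),
    "speed": (80, 260),
    "gap": (0, 200),
    "pitch": (0, 99),
}


def clamp_setting(name: str, value: int) -> int:
    """Clamp `value` into the documented range for the given Espeak setting."""
    low, high = ESPEAK_RANGES[name]
    return max(low, min(high, value))
```

For example, `tts.set_speed(clamp_setting("speed", requested_speed))` would keep a user-supplied value inside 80-260.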

Pico2Wave

Import and initialize Pico2Wave TTS engine.

>>> from fusion_hat.tts import Pico2Wave
>>> tts = Pico2Wave()

List available languages.

>>> tts.SUPPORTED_LANGUAUE
['en-US', 'en-GB', 'de-DE', 'es-ES', 'fr-FR', 'it-IT']

Set language.

>>> tts.set_lang('en-US')

Say message.

>>> tts.say("Hello world!")

OpenAI TTS

Import and initialize OpenAI TTS engine.

>>> from fusion_hat.tts import OpenAI_TTS
>>> API_KEY = "sk-..."
>>> tts = OpenAI_TTS(api_key=API_KEY)
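
Hardcoding the key as in the example above is convenient for a quick test, but it is easy to leak into version control. A common alternative is to read it from an environment variable. The sketch below is an assumption about how a caller might do this; `load_api_key` is a hypothetical helper, and this page does not say whether fusion_hat reads `OPENAI_API_KEY` on its own.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored under `name`, or raise if it is missing or empty."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

The result can be passed as `OpenAI_TTS(api_key=load_api_key())`.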

Set voice.

>>> tts.set_voice(tts.Voice.ALLOY)

Say message.

>>> tts.say("Hello world!")

Say message with instructions.

>>> tts.say("I'm so sad right now.", instructions="say it sadly")

class fusion_hat.tts.Piper(*args, **kwargs)[source]

Bases: _Base

Piper TTS engine.

Parameters:

model (str, optional) – model name, such as 'en_US-amy-low'; leave it None to use the default model. Defaults to None.
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.

_load_model_list()[source]: Load model list from local cache or built-in defaults (offline, no network).

update_model_list()[source]

Fetch latest model list from network and save to cache.

Call this manually when you want to check for new models online. Falls back to local cache if network is unavailable.

get_language() → str[source]

Get language from model.

Returns: language of the current model
Return type: str

is_model_downloaded(model: str) → bool[source]

Check if model is downloaded.

Parameters: model (str) – model name
Returns: True if model is downloaded, False otherwise
Return type: bool

download_model(model: str, force: bool = False, progress_callback: Callable[[int, int], None] = None) → None[source]

Download model.

Parameters:

model (str) – model
force (bool, optional) – force download, default is False
progress_callback (Callable[[int, int], None], optional) – progress callback, default is None
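
The `progress_callback` parameter takes a function of two integers, but this page does not say what they mean. The sketch below assumes they are (amount done, total amount), which is a guess to be checked against the source. `make_progress_logger` is a hypothetical helper, and the calls at the bottom simulate updates so the snippet runs without the library.

```python
from typing import Callable, List


def make_progress_logger(lines: List[str]) -> Callable[[int, int], None]:
    """Build a callback matching Callable[[int, int], None].

    Assumption: the two integers are (amount done, total amount).
    """

    def on_progress(done: int, total: int) -> None:
        # Guard against a zero or unknown total to avoid dividing by zero.
        percent = 100 * done // total if total > 0 else 0
        lines.append(f"{percent}% ({done}/{total})")

    return on_progress


log: List[str] = []
callback = make_progress_logger(log)
callback(512, 2048)   # simulate one progress update
callback(2048, 2048)  # simulate completion
```

With the library installed, the same callback could be passed as `tts.download_model('en_US-amy-low', progress_callback=callback)`.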

fix_chinese_punctuation(text: str) → str[source]

Replace Chinese punctuation with English punctuation.

Parameters: text (str) – text
Returns: text with English punctuation
Return type: str
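
To illustrate the kind of replacement described here, the sketch below maps a handful of full-width punctuation marks to ASCII with `str.translate`. It is not the library's implementation: the character table is an assumption, and the real method may cover a different set.

```python
# Hypothetical mapping; the library's real table may cover different characters.
_PUNCTUATION = str.maketrans({
    "，": ",",
    "。": ".",
    "！": "!",
    "？": "?",
    "：": ":",
    "；": ";",
    "（": "(",
    "）": ")",
})


def replace_chinese_punctuation(text: str) -> str:
    """Replace common full-width Chinese punctuation with ASCII equivalents."""
    return text.translate(_PUNCTUATION)
```

Characters outside the table, including the Chinese text itself, pass through unchanged.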

tts(text: str, file: str) → None[source]

Synthesize text to wave file.

Parameters:

text (str) – text
file (str) – wave file path

Raises:

ValueError – Model not set, set model first, with Piper.set_model(model)
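
Since `tts()` writes a wave file, the result can be inspected with Python's standard `wave` module. The sketch below computes a file's duration. Because it cannot assume the library or a model is installed, it writes one second of silence as a stand-in for a synthesized file; `write_silence` and `wav_duration_seconds` are hypothetical helpers.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def write_silence(path: str, seconds: float = 1.0, rate: int = 16000) -> None:
    """Write mono 16-bit silence; a stand-in for a file produced by tts()."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
write_silence(demo_path)
duration = wav_duration_seconds(demo_path)
```

With a model set, `tts.tts("Hello world!", "hello.wav")` followed by `wav_duration_seconds("hello.wav")` would report the length of the synthesized audio.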

stream(text: str) → None[source]

Stream text to speaker.

Parameters: text (str) – text
Raises: ValueError – Model not set, set model first, with Piper.set_model(model)

say(text: str, stream: bool = True) → None[source]

Say text.

Parameters:

text (str) – text
stream (bool, optional) – stream to speaker, default is True

Raises:

ValueError – Model not set, set model first, with Piper.set_model(model)
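
`tts()`, `stream()` and `say()` are all documented to raise ValueError when no model has been set. A caller can catch that error, select a model and retry once. The sketch below shows the pattern against `FakePiper`, a stand-in class written for this example so that it runs without the library or audio hardware; `say_with_default` is likewise hypothetical.

```python
class FakePiper:
    """Stand-in mimicking the documented behavior: say() raises until a model is set."""

    def __init__(self) -> None:
        self.model = None
        self.spoken = []

    def set_model(self, model: str) -> None:
        self.model = model

    def say(self, text: str) -> None:
        if self.model is None:
            raise ValueError("Model not set")
        self.spoken.append(text)


def say_with_default(tts, text: str, default_model: str) -> None:
    """Say `text`; if no model is set, select `default_model` and retry once."""
    try:
        tts.say(text)
    except ValueError:
        tts.set_model(default_model)
        tts.say(text)


engine = FakePiper()
say_with_default(engine, "Hello world!", "en_US-amy-low")
```

Note that `set_model` is itself documented to raise ValueError when the model is not found, so a second failure would propagate to the caller.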

available_models(country: str = None) → List[str][source]

Get available models.

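Parameters: country (str, optional) – country code such as 'en_US'; leave it None to get all models. Defaults to None.
Returns: available models. In the example above, passing a country returns a dict that maps each voice name to its model names, not a flat list.
Return type: List[str]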

available_countrys() → List[str][source]

Get available countries.

Returns: available countries, as locale codes such as 'en_US'
Return type: List[str]

get_model_path(model: str) → str[source]

Get model path.

Parameters: model (str) – model name
Returns: model path
Return type: str

set_model(model: str) → None[source]

Set model.

Parameters: model (str) – model name
Raises: ValueError – Model not found

class fusion_hat.tts.Pico2Wave(*args, **kwargs)[source]

Bases: _Base

Pico2Wave TTS engine.

Parameters:

lang (str, optional) – language code, one of SUPPORTED_LANGUAUE; defaults to ‘en-US’
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.

SUPPORTED_LANGUAUE = ['en-US', 'en-GB', 'de-DE', 'es-ES', 'fr-FR', 'it-IT']: Supported languages.

say(words: str) → None[source]

Say words with pico2wave.

Parameters: words (str) – words to say.

set_lang(lang: str) → None[source]

Set language.

Parameters: lang (str) – language code, such as ‘en-US’.
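
Pico2Wave language codes use a hyphen (en-US), while the Piper country codes earlier on this page use an underscore (en_US), so mixing them up is an easy mistake. The sketch below normalizes a code to the Pico2Wave form and checks it against the documented list. `normalize_lang` is a hypothetical helper, and the page does not say how `set_lang` itself treats unsupported values.

```python
# Copied from Pico2Wave.SUPPORTED_LANGUAUE as documented above.
SUPPORTED = ["en-US", "en-GB", "de-DE", "es-ES", "fr-FR", "it-IT"]


def normalize_lang(code: str) -> str:
    """Map inputs such as 'en_us' or 'EN-us' to the documented 'en-US' form."""
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2:
        raise ValueError(f"Expected a code like 'en-US', got {code!r}")
    candidate = f"{parts[0].lower()}-{parts[1].upper()}"
    if candidate not in SUPPORTED:
        raise ValueError(f"Unsupported language {code!r}")
    return candidate
```

For example, `tts.set_lang(normalize_lang("en_US"))` would pass 'en-US' to the engine.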

class fusion_hat.tts.Espeak(*args, **kwargs)[source]

Bases: _Base

Espeak TTS engine.

Parameters:

*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.

ESPEAK = 'espeak'

tts(words: str, file_path: str) → None[source]

Text-to-speech with espeak.

Parameters:

words (str) – Words to say
file_path (str) – Path to save audio file

say(words: str) → None[source]

Say words with espeak.

Parameters: words (str) – Words to say

set_amp(amp: int) → None[source]

Set amplitude.

Parameters: amp (int) – Amplitude (0-200)

set_speed(speed: int) → None[source]

Set speed.

Parameters: speed (int) – Speed (80-260)

set_gap(gap: int) → None[source]

Set gap.

Parameters: gap (int) – Gap (0-200)

set_pitch(pitch: int) → None[source]

Set pitch.

Parameters: pitch (int) – Pitch (0-99)

class fusion_hat.tts.OpenAI_TTS(*args, **kwargs)[source]

Bases: _Base

OpenAI TTS engine.

Parameters:

voice (Voice, optional) – Voice, default is Voice.ALLOY.
model (Model, optional) – Model, default is Model.GPT_4O_MINI_TTS.
api_key (str, optional) – API key.
gain (float, optional) – Volume gain, default is 1.5.
log (logging.Logger, optional) – Logger, default is None.
*args – passed to sunfounder_voice_assistant._base._Base.
**kwargs – passed to sunfounder_voice_assistant._base._Base.

DEFAULT_INSTRUCTIONS = 'Speak in a cheerful and positive tone.'

URL = 'https://api.openai.com/v1/audio/speech'

AUDIO_FORMAT = 'wav'

class Voice(value)[source]

Bases: StrEnum

Voice enum.

ALLOY = 'alloy'

ASH = 'ash'

BALLAD = 'ballad'

CORAL = 'coral'

ECHO = 'echo'

FABLE = 'fable'

NOVA = 'nova'

ONYX = 'onyx'

SAGE = 'sage'

SHIMMER = 'shimmer'

static _generate_next_value_(name, start, count, last_values): Return the lower-cased version of the member name.

class Model(value)[source]

Bases: StrEnum

Model enum.

GPT_4O_MINI_TTS = 'gpt-4o-mini-tts'

static _generate_next_value_(name, start, count, last_values): Return the lower-cased version of the member name.

DEFAULT_MODEL = 'gpt-4o-mini-tts'

DEFAULT_VOICE = 'alloy'

tts(words: str, output_file: str = './openai_tts.wav', instructions: str | None = None, stream: bool = False) → bool[source]

Request OpenAI TTS API.

Parameters:

words (str) – Words to say.
output_file (str, optional) – Output file, default is ‘./openai_tts.wav’.
instructions (str, optional) – Instructions, default is None.
stream (bool, optional) – Whether to stream the audio, default is False.

Returns:

True if success, False otherwise.

Return type:

bool
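
Because `tts()` reports failure by returning False instead of raising, a caller that wants resilience against transient network errors has to check the result. The sketch below is a generic retry wrapper for any callable that returns a bool. `retry_until_true` is a hypothetical helper, and the attempt count and delay are arbitrary choices.

```python
import time
from typing import Callable


def retry_until_true(action: Callable[[], bool],
                     attempts: int = 3,
                     delay: float = 0.0) -> bool:
    """Call `action` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        if action():
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the library installed, this could wrap a request as `retry_until_true(lambda: tts.tts("Hello world!", output_file="./hello.wav"), delay=1.0)`.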

say(words: str, instructions: str | None = None, stream: bool = True) → None[source]

Say words.

Parameters:

words (str) – Words to say.
instructions (str, optional) – Instructions, default is None.
stream (bool, optional) – Whether to stream the audio, default is True.

set_voice(voice: Voice | str) → None[source]

Set voice.

Parameters: voice (Voice | str) – Voice.
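
Since `set_voice` accepts a plain string as well as a Voice member, a typo in the string may only surface when the API request fails. The sketch below validates a string against the voice values listed in the Voice enum above before it is passed on. `normalize_voice` is a hypothetical helper, and this page does not say whether `set_voice` performs its own validation.

```python
# Voice names copied from the Voice enum values listed above.
KNOWN_VOICES = ("alloy", "ash", "ballad", "coral", "echo",
                "fable", "nova", "onyx", "sage", "shimmer")


def normalize_voice(name: str) -> str:
    """Lower-case and validate a voice name against the documented values."""
    voice = name.strip().lower()
    if voice not in KNOWN_VOICES:
        raise ValueError(
            f"Unknown voice {name!r}; expected one of {', '.join(KNOWN_VOICES)}"
        )
    return voice
```

For example, `tts.set_voice(normalize_voice("Nova"))` would pass 'nova'.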

set_model(model: Model | str) → None[source]

Set model.

Parameters: model (Model | str) – Model.

set_api_key(api_key: str) → None[source]

Set API key.

Parameters: api_key (str) – API key.

set_gain(gain: float) → None[source]

Set gain.

Parameters: gain (float) – Volume gain.