fusion_hat.stt module

Speech to Text module

Convert speech to text.

Example

Import STT and instantiate it

>>> from fusion_hat.stt import STT
>>> stt = STT(language="en-us")

Listen for speech input

>>> result = stt.listen(stream=False)
>>> print(result)
Hello

Use stream mode to get partial results

>>> for result in stt.listen(stream=True):
...     if result["done"]:
...         print(f"\r\x1b[Kfinal: {result['final']}")
...     else:
...         print(f"\r\x1b[Kpartial: {result['partial']}", end="", flush=True)
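
When the streamed results need to be kept rather than printed, the loop can be wrapped in a small helper. This is a minimal sketch that assumes only what the streaming example shows: each item is a dict with a "done" flag plus a "partial" or "final" key. The names first_final and on_partial are hypothetical and not part of fusion_hat.

```python
def first_final(results, on_partial=None):
    """Consume streamed STT results and return the first final text.

    `results` is any iterable of dicts shaped like the items in the
    streaming example: {"done": bool, "partial": str, "final": str}.
    Missing keys are tolerated (an assumption about the result shape).
    """
    for result in results:
        if result.get("done"):
            return result.get("final", "")
        if on_partial is not None:
            # Hand each intermediate hypothesis to the caller.
            on_partial(result.get("partial", ""))
    # The stream ended without a final result.
    return None
```

A call might look like first_final(stt.listen(stream=True), on_partial=print).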

Wait for wake words

>>> WAKE_WORDS = ["hey robot", "hello robot"]
>>> stt = STT(language="en-us")
>>> stt.set_wake_words(WAKE_WORDS)
>>> print(f'Wake me with: {WAKE_WORDS}')
Wake me with: ['hey robot', 'hello robot']
>>> result = stt.wait_until_heard()
>>> print("Wake word detected")
Wake word detected
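
A common pattern is to wait for a wake word and then capture one utterance. The sketch below combines the calls documented on this page; the function name wake_then_listen is hypothetical, and stt is assumed to be an STT instance created as in the examples.

```python
def wake_then_listen(stt, wake_words):
    """Block until a wake word is heard, then capture one utterance.

    Only set_wake_words(), wait_until_heard() and listen() are called
    on `stt`, so any object providing those methods will work.
    """
    stt.set_wake_words(wake_words)
    heard = stt.wait_until_heard()    # documented to return the heard wake word
    text = stt.listen(stream=False)   # one non-streaming recognition
    return heard, text
```

For example, heard, text = wake_then_listen(stt, ["hey robot", "hello robot"]).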

Listen for wake words in a thread

>>> import time
>>> while True:
...     stt.start_listening_wake_words()
...     while not stt.is_waked():
...         print("Waiting for wake word...")
...         time.sleep(3)
...     print("Wake word detected")
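
The polling loop runs until a wake word arrives. Where an upper bound on the wait is useful, a timeout can be added. This is a sketch only: wait_for_wake, timeout and poll_interval are hypothetical names, and treating a True result from is_waked() as "wake word detected" follows the usage in the example rather than a documented guarantee.

```python
import time


def wait_for_wake(stt, timeout=30.0, poll_interval=0.1):
    """Start wake-word listening and poll is_waked() until it returns True.

    Returns True if is_waked() became True before `timeout` seconds
    elapsed, False otherwise.
    """
    stt.start_listening_wake_words()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if stt.is_waked():
            return True
        # Sleep briefly so the loop does not spin at full CPU.
        time.sleep(poll_interval)
    return False
```

Using time.monotonic() keeps the deadline unaffected by system clock adjustments.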

fusion_hat.stt.STT: alias of Vosk

class fusion_hat.stt.Vosk(language=None, samplerate=None, device=None, log=None)

Bases: object

Vosk STT class

DEFAULT_LANGUAGE = 'en-us'

is_ready()

Check if Vosk STT is ready

Returns: True if ready, False otherwise
Return type: bool

init(): Initialize Vosk STT

_load_model_list(): Load model list from local cache or built-in defaults (offline, no network).

update_model_list()

Fetch the latest model list from the network and save it to the local cache.

Call this manually to check for new models online. Falls back to the local cache if the network is unavailable.

wait_until_heard(wake_words=None, print_callback=<function Vosk.<lambda>>)

Wait until a wake word is heard

Parameters:

wake_words (list, optional) – Wake words, default is None
print_callback (function, optional) – Print callback, defaults to a lambda defined on Vosk (see signature)

Returns:

Heard wake word

Return type:

str

heard_wake_word(print_callback=<function Vosk.<lambda>>)

Check if a wake word was heard

Parameters: print_callback (function, optional) – Print callback, defaults to a lambda defined on Vosk (see signature)
Returns: True if a wake word was heard, False otherwise
Return type: bool

wait_for_wake_word(): Wait for wake word

start_listening_wake_words(): Start listening for wake words

is_waked()

Check if the wake word thread is running

Returns: True if running, False otherwise
Return type: bool

stt(filename, stream=False)

Perform STT on an audio file

Parameters:

filename (str) – Audio file path
stream (bool, optional) – Stream mode, default is False

Returns:

STT result

Return type:

str
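
Vosk recognizers generally expect mono 16-bit PCM WAV input. Whether stt() converts other formats is not stated here, so the sketch below assumes it does not and checks the file with the standard wave module before transcribing. The helper names is_mono_pcm16_wav and transcribe_file are hypothetical.

```python
import wave


def is_mono_pcm16_wav(filename):
    """Return True if `filename` is a readable mono, 16-bit, uncompressed WAV."""
    try:
        with wave.open(filename, "rb") as wf:
            return (
                wf.getnchannels() == 1
                and wf.getsampwidth() == 2      # 2 bytes per sample = 16 bit
                and wf.getcomptype() == "NONE"  # uncompressed PCM
            )
    except (wave.Error, EOFError, OSError):
        # Not a WAV file, truncated, or not readable at all.
        return False


def transcribe_file(stt, filename):
    """Run stt.stt() on `filename` after a basic format check."""
    if not is_mono_pcm16_wav(filename):
        raise ValueError(f"{filename!r} is not a mono 16-bit PCM WAV file")
    return stt.stt(filename, stream=False)
```

Failing early with a ValueError gives a clearer message than an error raised inside the recognizer.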

get_stream_result(wf, recognizer)

Get streaming results from recognizer

Parameters:

wf (wave.Wave_read) – Wave file object
recognizer (KaldiRecognizer) – Vosk recognizer

Yields:

str – STT result

listen(stream=False, device=None, samplerate=None)

Listen from microphone and return results

Parameters:

stream (bool, optional) – Stream mode, default is False
device (int, optional) – Device index, default is None
samplerate (int, optional) – Sampling rate, default is None

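Returns:

STT result. With stream=True, results are yielded as dicts with the keys "done", "partial" and "final", as in the streaming example above.

Return type:

str, or an iterable of dict when stream=True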

_listen_streaming(q, device=None, samplerate=None, callback=None)

Listen from microphone and return streaming results

Parameters:

q (queue.Queue) – Queue to store audio data
device (int, optional) – Device index, default is None
samplerate (int, optional) – Sampling rate, default is None
callback (function, optional) – Callback function, default is None

Yields:

dict – STT result

_listen_non_streaming(q, device=None, samplerate=None, callback=None)

Listen from microphone and return final result

Parameters:

q (queue.Queue) – Queue to store audio data
device (int, optional) – Device index, default is None
samplerate (int, optional) – Sampling rate, default is None
callback (function, optional) – Callback function, default is None

Returns:

STT result

Return type:

str

set_wake_words(wake_words: list)

Set wake words

Parameters: wake_words (list) – List of wake words

language() → str

Get current language

Returns: Current language
Return type: str

set_language(language: str, init=True)

Set language

Parameters:

language (str) – Language to set
init (bool, optional) – Initialize recognizer, default is True
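
To recognize a single phrase in another language and then return to the previous one, language() and set_language() can be paired in a context manager. This is a sketch under assumptions: temporary_language is a hypothetical helper, and it assumes the model for the target language is already available locally.

```python
from contextlib import contextmanager


@contextmanager
def temporary_language(stt, language):
    """Switch `stt` to `language` for the duration of a with-block."""
    previous = stt.language()
    stt.set_language(language)
    try:
        yield stt
    finally:
        # Restore the original language even if the block raised.
        stt.set_language(previous)
```

Inside a block such as "with temporary_language(stt, lang) as s:", calls like s.listen() use the temporary language; the previous language is restored on exit.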

get_model_name(lang: str) → str

Get model name for language

Parameters: lang (str) – Language
Returns: Model name
Return type: str

get_model_path(lang: str) → Path

Get model path for language

Parameters: lang (str) – Language
Returns: Model path
Return type: Path

is_model_downloaded(lang: str) → bool

Check if model is downloaded

Parameters: lang (str) – Language
Returns: True if model is downloaded, False otherwise
Return type: bool

cancel_download(): Cancel an ongoing download

download_model(lang: str, progress_callback=None, max_retries: int = 5)

Download model for language

Parameters:

lang (str) – Language
progress_callback (function, optional) – Progress callback function, default is None
max_retries (int, optional) – Maximum retries, default is 5
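
The model methods can be combined so that a download happens only when needed. In this sketch, ensure_model is a hypothetical helper. Because this page does not say whether download_model() raises on failure, the helper checks is_model_downloaded() again afterwards.

```python
def ensure_model(stt, lang, max_retries=5):
    """Download the model for `lang` if it is missing, then return its path."""
    if not stt.is_model_downloaded(lang):
        stt.download_model(lang, max_retries=max_retries)
    # Verify the result instead of relying on download_model() to raise.
    if not stt.is_model_downloaded(lang):
        raise RuntimeError(f"model for {lang!r} is still missing after download")
    return stt.get_model_path(lang)
```

Calling ensure_model(stt, "en-us") before the first recognition keeps a slow download from happening at an unexpected moment.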

download_progress_hook(tqdm_bar=None, progress_callback=None)

Download progress hook function

Parameters:

tqdm_bar (tqdm, optional) – tqdm progress bar, default is None
progress_callback (function, optional) – Progress callback function, default is None

stop_listening(): Stop listening for wake word

close(): Close STT
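
Because the class exposes close(), the standard contextlib.closing wrapper can guarantee cleanup even when recognition raises. This is a minimal sketch: listen_once is a hypothetical name, and whether an instance can be reused after close() is not stated on this page, so the sketch treats it as single-use.

```python
from contextlib import closing


def listen_once(stt):
    """Return one non-streaming result and always call stt.close() afterwards."""
    with closing(stt):
        return stt.listen(stream=False)
```

For example, text = listen_once(STT(language="en-us")).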