fusion_hat.voice_assistant module

class fusion_hat.voice_assistant.VoiceAssistant(*args, **kwargs)

Bases: object

Voice assistant class

Parameters:

llm (sunfounder_voice_assistant.llm.LLM) – Language model
name (str, optional) – Robot name, default is NAME
with_image (bool, optional) – Enable image input; requires a multimodal language model. Default is WITH_IMAGE
tts_model (str, optional) – Text-to-speech model, default is TTS_MODEL
stt_language (str, optional) – Speech-to-text language, default is STT_LANGUAGE
keyboard_enable (bool, optional) – Enable keyboard input, default is KEYBOARD_ENABLE
wake_enable (bool, optional) – Enable wake word, default is WAKE_ENABLE
wake_word (list, optional) – Wake word, default is WAKE_WORD
answer_on_wake (str, optional) – Answer given when the wake word is detected, default is ANSWER_ON_WAKE
welcome (str, optional) – Welcome message, default is WELCOME
instructions (str, optional) – Set instructions, default is INSTRUCTIONS
disable_think (bool, optional) – Disable think, default is False
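
As a rough sketch of how these parameters fit together, the options below are collected into a dictionary and passed to the constructor as keyword arguments. Passing them by keyword (including `llm`) is an assumption, since the signature is only documented as `(*args, **kwargs)`. The helper names `assistant_options` and `make_assistant` and every option value are hypothetical, not library defaults.

```python
def assistant_options(**overrides):
    """Return keyword options named after the documented parameters.

    All values are illustrative placeholders, not the library defaults.
    """
    options = {
        "name": "Buddy",                  # hypothetical robot name
        "with_image": False,              # True requires a multimodal LLM
        "keyboard_enable": True,
        "wake_enable": True,
        "wake_word": ["hey buddy"],       # documented type is list
        "answer_on_wake": "Hi there",
        "welcome": "Hello, I am ready.",
        "instructions": "You are a helpful robot.",
        "disable_think": False,
    }
    # tts_model and stt_language are documented too, but no valid values
    # are listed, so they are accepted only as explicit overrides.
    allowed = set(options) | {"tts_model", "stt_language"}
    unknown = set(overrides) - allowed
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    options.update(overrides)
    return options


def make_assistant(llm, **overrides):
    """Build a VoiceAssistant; assumes keyword arguments are accepted."""
    # Imported lazily so this sketch loads without the hardware library.
    from fusion_hat.voice_assistant import VoiceAssistant

    return VoiceAssistant(llm=llm, **assistant_options(**overrides))
```

Rejecting unknown keys early is a design choice for the sketch: with a `**kwargs` constructor, a misspelled option name could otherwise pass unnoticed.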

before_listen() → None: Called before listening starts.

after_listen(stt_result: str) → None

Called after listening finishes.

Parameters: stt_result (str) – Speech-to-text result

before_think(text: str) → None

Called before thinking.

Parameters: text (str) – Text to think about

after_think(text: str) → None

Called after thinking.

Parameters: text (str) – Text to think about

on_start() → None: Called when the assistant starts.

on_wake() → None: Called when the assistant wakes.

on_heard(text: str) → None

Called when speech is heard.

Parameters: text (str) – Text heard

parse_response(text: str) → str

Parse response

Parameters: text (str) – Text to parse
Returns: Parsed text
Return type: str
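
One way `parse_response` might be used is as an override point that cleans the model's reply before it is spoken. The sketch below assumes the method can be overridden in a subclass and that its return value is what gets spoken, neither of which is stated in this reference. `CleanResponseMixin` is a hypothetical name and the cleaning rules are only examples.

```python
import re


class CleanResponseMixin:
    """Hypothetical mixin that tidies text before it is spoken."""

    def parse_response(self, text: str) -> str:
        # Drop Markdown markers that a TTS engine might read aloud.
        text = re.sub(r"[*`#]+", "", text)
        # Collapse runs of whitespace, including newlines, to single spaces.
        return re.sub(r"\s+", " ", text).strip()
```

A subclass could be declared as `class MyAssistant(CleanResponseMixin, VoiceAssistant)` so that the mixin's method takes precedence in the method resolution order.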

add_trigger(trigger_function: Callable[[], tuple[bool, bool, str]]) → None

Add trigger function

Parameters: trigger_function (Callable[[], tuple[bool, bool, str]]) – Trigger function
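
A trigger function takes no arguments and returns a `tuple[bool, bool, str]`. Going by the return descriptions of `trigger_wake_word` and `trigger_keyboard_input`, the fields are read here as (triggered, disable image, message). That the assistant polls triggers repeatedly, and that an idle trigger should return `(False, False, "")`, are assumptions. `make_queue_trigger` is a hypothetical helper, not part of the library.

```python
import queue
from typing import Callable

Trigger = Callable[[], tuple[bool, bool, str]]


def make_queue_trigger(messages: queue.Queue, disable_image: bool = True) -> Trigger:
    """Return a trigger that fires when a message is waiting in the queue."""

    def trigger() -> tuple[bool, bool, str]:
        try:
            # Non-blocking read so the caller's loop is never stalled.
            message = messages.get_nowait()
        except queue.Empty:
            # Assumed "not triggered" value: nothing to report.
            return False, False, ""
        return True, disable_image, message

    return trigger
```

It could then be registered with `assistant.add_trigger(make_queue_trigger(q))`, with another thread putting text on `q`.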

before_say(text: str) → None

Called before speaking.

Parameters: text (str) – Text to say

after_say(text: str) → None

Called after speaking.

Parameters: text (str) – Text to say

on_stop() → None: Called when the assistant stops.

on_finish_a_round() → None: Called when a round finishes.
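
The `before_*`, `after_*` and `on_*` methods read like hooks meant to be overridden in a subclass. That reading, and the order in which the main loop calls them, are assumptions. The hypothetical mixin below records each call so the actual sequence can be inspected on a real device.

```python
class HookRecorderMixin:
    """Hypothetical mixin that logs hook calls as (name, argument) pairs.

    The overrides do not call the base implementation; add super() calls
    if the base-class hooks turn out to do real work.
    """

    def __init__(self, *args, **kwargs):
        self.events = []
        super().__init__(*args, **kwargs)

    def _record(self, name, arg=None):
        self.events.append((name, arg))

    def on_start(self) -> None:
        self._record("on_start")

    def on_wake(self) -> None:
        self._record("on_wake")

    def before_listen(self) -> None:
        self._record("before_listen")

    def after_listen(self, stt_result: str) -> None:
        self._record("after_listen", stt_result)

    def on_heard(self, text: str) -> None:
        self._record("on_heard", text)

    def before_think(self, text: str) -> None:
        self._record("before_think", text)

    def after_think(self, text: str) -> None:
        self._record("after_think", text)

    def before_say(self, text: str) -> None:
        self._record("before_say", text)

    def after_say(self, text: str) -> None:
        self._record("after_say", text)

    def on_finish_a_round(self) -> None:
        self._record("on_finish_a_round")

    def on_stop(self) -> None:
        self._record("on_stop")
```

Mixed in ahead of the assistant class, for example `class Traced(HookRecorderMixin, VoiceAssistant)`, the `events` list would show which hooks fired and with what text.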

capture_image(path: str) → None

Capture image

Parameters: path (str) – Path to save image

trigger_wake_word() → tuple[bool, bool, str]

Trigger wake word

Returns: Triggered, disable image, message
Return type: tuple[bool, bool, str]

trigger_keyboard_input() → tuple[bool, bool, str]

Trigger keyboard input

Returns: Triggered, disable image, message
Return type: tuple[bool, bool, str]

init_camera() → None: Initialize camera

close_camera() → None: Close camera
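
Assuming `init_camera` must be called before `capture_image` and `close_camera` afterwards (the order is implied by the names, not stated), a context manager keeps the pair balanced even if a capture fails. `camera_session` is a hypothetical helper that works with any object exposing these methods.

```python
from contextlib import contextmanager


@contextmanager
def camera_session(assistant):
    """Open the camera for the duration of a with-block."""
    assistant.init_camera()
    try:
        yield assistant
    finally:
        # Runs even when the body raises, so the camera is always closed.
        assistant.close_camera()
```

Usage would look like `with camera_session(assistant) as a: a.capture_image("frame.jpg")`, where the file name is only a placeholder.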

listen() → str

Listen

Returns: Speech-to-text result
Return type: str

think(text: str, disable_image: bool = False) → str

Think

Parameters:

text (str) – Text to think
disable_image (bool, optional) – Disable image, defaults to False

Returns:

LLM response

Return type:

str
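
Taken together, `listen`, `think` and `parse_response` suggest one conversational round. The sketch below chains them by duck typing. The ordering, the skipping of empty input, and the helper name `one_round` are assumptions rather than a description of what `main` does.

```python
def one_round(assistant, disable_image: bool = False) -> str:
    """Listen once, query the model, and return the parsed reply."""
    heard = assistant.listen()
    if not heard:
        # Nothing was recognized, so skip the model call.
        return ""
    reply = assistant.think(heard, disable_image=disable_image)
    return assistant.parse_response(reply)
```

Passing `disable_image=True` would keep the request text-only, going by the parameter description above.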

main() → None: Main loop

run() → None: Run