fusion_hat.voice_assistant module

class fusion_hat.voice_assistant.VoiceAssistant(*args, **kwargs)

Bases: object

Voice assistant class

Parameters:
  • llm (sunfounder_voice_assistant.llm.LLM) – Language model

  • name (str, optional) – Robot name, default is NAME

  • with_image (bool, optional) – Enable image input; requires a multimodal language model. Default is WITH_IMAGE

  • tts_model (str, optional) – Text-to-speech model, default is TTS_MODEL

  • stt_language (str, optional) – Speech-to-text language, default is STT_LANGUAGE

  • keyboard_enable (bool, optional) – Enable keyboard input, default is KEYBOARD_ENABLE

  • wake_enable (bool, optional) – Enable wake word, default is WAKE_ENABLE

  • wake_word (list, optional) – List of wake words, default is WAKE_WORD

  • answer_on_wake (str, optional) – Answer given when the wake word is detected, default is ANSWER_ON_WAKE

  • welcome (str, optional) – Welcome message, default is WELCOME

  • instructions (str, optional) – Set instructions, default is INSTRUCTIONS

  • disable_think (bool, optional) – Disable thinking, default is False
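A minimal sketch of collecting the keyword arguments listed above before passing them to the constructor. Every value shown is an illustrative assumption rather than a library default, and the `llm` object has to be created separately from `sunfounder_voice_assistant.llm`, which this page does not cover, so the constructor call is left as a comment.

```python
# Illustrative settings only; each value is an assumption, not a library default.
assistant_options = {
    "name": "Buddy",
    "with_image": False,           # keep False unless the LLM is multimodal
    "keyboard_enable": True,
    "wake_enable": True,
    "wake_word": ["hey buddy"],    # documented as a list
    "answer_on_wake": "Hi there",
    "welcome": "Hello, I am Buddy.",
    "instructions": "You are a helpful robot.",
    "disable_think": False,
}

# On the device, with `llm` created elsewhere (assumed to be accepted by keyword):
# from fusion_hat.voice_assistant import VoiceAssistant
# va = VoiceAssistant(llm=llm, **assistant_options)
# va.run()
```

Parameters left out of the dictionary, such as tts_model and stt_language, fall back to their documented defaults.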

before_listen() → None

Hook called before listening starts.

after_listen(stt_result: str) → None

Hook called after listening finishes.

Parameters:

stt_result (str) – Speech-to-text result

before_think(text: str) → None

Hook called before thinking starts.

Parameters:

text (str) – Text to think about

after_think(text: str) → None

Hook called after thinking finishes.

Parameters:

text (str) – Text to think

on_start() → None

Hook called when the assistant starts.

on_wake() → None

Hook called when the assistant is woken.

on_heard(text: str) → None

Hook called when speech has been heard.

Parameters:

text (str) – Text heard

parse_response(text: str) → str

Parse the response.

Parameters:

text (str) – Text to parse

Returns:

Parsed text

Return type:

str
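As a sketch of what an override might do, assume that `parse_response` receives the raw LLM reply and returns the text that is spoken afterwards; this reading of the signature is an assumption, not something the page confirms. The helper `clean_response` and the mixin class are hypothetical names introduced here. The helper strips a few Markdown markers that a text-to-speech engine might otherwise read aloud.

```python
import re


def clean_response(text: str) -> str:
    """Drop Markdown markers (*, `, #) and collapse runs of whitespace."""
    text = re.sub(r"[*`#]+", "", text)
    return re.sub(r"\s+", " ", text).strip()


class CleanResponseMixin:
    """Hypothetical mixin; combine it with VoiceAssistant on the device."""

    def parse_response(self, text: str) -> str:
        return clean_response(text)


print(clean_response("**Hello**,  `world`!\n"))
```

On the device this could be combined as `class MyAssistant(CleanResponseMixin, VoiceAssistant)`, so that the mixin's `parse_response` takes precedence.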

add_trigger(trigger_function: Callable[[], tuple[bool, bool, str]]) → None

Add a trigger function.

Parameters:

trigger_function (Callable[[], tuple[bool, bool, str]]) – Trigger function that takes no arguments and returns a tuple[bool, bool, str]
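A minimal sketch of a custom trigger function, assuming custom triggers use the same tuple layout that the built-in triggers document (triggered, disable image, message). The factory `make_interval_trigger` is a hypothetical helper, and how the assistant uses the returned message is not specified on this page.

```python
import time
from typing import Callable


def make_interval_trigger(period_s: float, message: str) -> Callable[[], tuple[bool, bool, str]]:
    """Build a trigger that fires at most once every `period_s` seconds."""
    last_fired = time.monotonic()

    def trigger() -> tuple[bool, bool, str]:
        nonlocal last_fired
        now = time.monotonic()
        if now - last_fired >= period_s:
            last_fired = now
            # Assumed layout: (triggered, disable image, message)
            return True, True, message
        return False, False, ""

    return trigger


# On the device, with an existing VoiceAssistant instance `va`:
# va.add_trigger(make_interval_trigger(60.0, "Say something cheerful."))
```

The closure keeps its own timestamp, so several independent triggers can be built from the same factory.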

before_say(text: str) → None

Hook called before speaking.

Parameters:

text (str) – Text to say

after_say(text: str) → None

Hook called after speaking.

Parameters:

text (str) – Text to say

on_stop() → None

Hook called when the assistant stops.

on_finish_a_round() → None

Hook called when a round finishes.
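The before_*, after_*, and on_* methods read as hooks meant to be overridden in a subclass, although this page does not say so explicitly. The sketch below shows the pattern with a stand-in base class so that it runs without the hardware; `StubAssistant` and `LoggingAssistant` are hypothetical names, and real code would subclass `fusion_hat.voice_assistant.VoiceAssistant` instead.

```python
class StubAssistant:
    """Stand-in exposing a few of the documented hook names as no-ops."""

    def on_start(self) -> None: ...
    def on_heard(self, text: str) -> None: ...
    def before_say(self, text: str) -> None: ...
    def on_stop(self) -> None: ...


class LoggingAssistant(StubAssistant):
    """Record each hook call in a list instead of acting on it."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def on_start(self) -> None:
        self.events.append("start")

    def on_heard(self, text: str) -> None:
        self.events.append(f"heard: {text}")

    def before_say(self, text: str) -> None:
        self.events.append(f"about to say: {text}")

    def on_stop(self) -> None:
        self.events.append("stop")


# Call the hooks by hand; on the device the assistant would call them itself.
assistant = LoggingAssistant()
assistant.on_start()
assistant.on_heard("hello")
assistant.on_stop()
print(assistant.events)
```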

capture_image(path: str) → None

Capture an image.

Parameters:

path (str) – Path to save image

trigger_wake_word() → tuple[bool, bool, str]

Wake word trigger.

Returns:

Tuple of (triggered, disable image, message)

Return type:

tuple[bool, bool, str]

trigger_keyboard_input() → tuple[bool, bool, str]

Keyboard input trigger.

Returns:

Tuple of (triggered, disable image, message)

Return type:

tuple[bool, bool, str]

init_camera() → None

Initialize the camera.

close_camera() → None

Close the camera.

listen() → str

Listen

Returns:

Speech-to-text result

Return type:

str

think(text: str, disable_image: bool = False) → str

Think

Parameters:
  • text (str) – Input text for the language model

  • disable_image (bool, optional) – Disable image, defaults to False

Returns:

LLM response

Return type:

str

main() → None

Main loop

run() → None

Run the voice assistant.