7. AI Voice Assistant
This lesson turns your Fusion HAT+ into a voice-first AI assistant. With the provided code, the robot will: wait for a wake word, transcribe your speech with Vosk, send it to an OpenAI LLM, and speak back using Piper TTS.
Before You Start
Make sure you have:
1. Testing Piper: Piper voice works (e.g., you can play “Hello”).
2. Test Vosk: Vosk STT works for your language (e.g., en-us).
5. Connecting to Online LLMs: Your OpenAI API key saved in secret.py as OPENAI_API_KEY.
A working microphone and speaker on Fusion HAT.
A stable network connection (LLM is online).
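One way to confirm the API key prerequisite before running the example is a small check script. This is a minimal sketch, not part of the Fusion HAT+ examples: the helper names find_api_key and key_looks_set are hypothetical, and the placeholder test is only a heuristic. It assumes you run it from the directory that contains secret.py.

```python
import importlib
import importlib.util


def find_api_key(module_name="secret", attr="OPENAI_API_KEY"):
    """Return the attribute from the module, or None if either is missing."""
    if importlib.util.find_spec(module_name) is None:
        return None  # no secret.py on the import path
    module = importlib.import_module(module_name)
    return getattr(module, attr, None)


def key_looks_set(key):
    """Heuristic: reject missing, empty, or placeholder-looking values."""
    if not isinstance(key, str) or not key.strip():
        return False
    return "YOUR" not in key.upper()  # e.g. "YOUR_KEY" left unedited


if __name__ == "__main__":
    key = find_api_key()
    if key_looks_set(key):
        print("OPENAI_API_KEY found in secret.py")
    else:
        print("OPENAI_API_KEY is missing or still a placeholder")
```

The script only checks that a key is present; it does not verify that the key is valid with OpenAI.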
Run the Example
cd ~/fusion-hat/examples/
sudo python3 voice_assistant.py
Configuration used by the code:
LLM: OpenAI (gpt-4o-mini)
TTS: Piper (en_US-ryan-low)
STT: Vosk (en-us)
Wake word: "hey buddy"
Keyboard input: enabled (optional manual input)
Image mode: enabled (WITH_IMAGE=True); requires a multimodal-capable LLM if you decide to use images later
What happens:
The assistant shows a welcome message with the wake phrase.
It listens for “hey buddy”.
After wake, your speech is transcribed (Vosk → text).
The text is sent to OpenAI (gpt-4o-mini) for a response.
The answer is spoken with Piper (en_US-ryan-low).
Example interaction
You: Hey Buddy
Robot: Hi there!
You: What’s the capital of Italy?
Robot: The capital of Italy is Rome.
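The flow described above (wake, listen, LLM, speak) can be sketched with plain functions. This is an illustration of the control flow only, not how VoiceAssistant is implemented: run_one_turn and its hear, ask_llm, and say parameters are hypothetical stand-ins for Vosk, the OpenAI call, and Piper.

```python
def run_one_turn(hear, ask_llm, say, wake_words, answer_on_wake="Hi there"):
    """Run one wake -> listen -> LLM -> speak cycle.

    hear() returns transcribed text, ask_llm(text) returns a reply,
    say(text) speaks it. Returns the reply, or None if no wake word was heard.
    """
    heard = hear().lower()
    if not any(word in heard for word in wake_words):
        return None  # no wake word, keep waiting
    if answer_on_wake:
        say(answer_on_wake)  # empty string disables the wake answer
    question = hear()
    reply = ask_llm(question)
    say(reply)
    return reply


if __name__ == "__main__":
    # Scripted stand-ins for the microphone, the LLM, and the speaker
    scripted = iter(["Hey Buddy", "What's the capital of Italy?"])
    run_one_turn(
        hear=lambda: next(scripted),
        ask_llm=lambda question: "The capital of Italy is Rome.",
        say=lambda text: print("Robot:", text),
        wake_words=["hey buddy"],
    )
```

Passing the three stages in as functions keeps the sketch runnable without a microphone, speaker, or network connection.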
Code
from fusion_hat.voice_assistant import VoiceAssistant
from fusion_hat.llm import OpenAI as LLM
from secret import OPENAI_API_KEY as API_KEY
llm = LLM(
    api_key=API_KEY,
    model="gpt-4o-mini",
)
# Robot name
NAME = "Buddy"
# Enable image, need to set up a multimodal language model
WITH_IMAGE = True
# Set models and languages
LLM_MODEL = "gpt-4o-mini"
TTS_MODEL = "en_US-ryan-low"
STT_LANGUAGE = "en-us"
# Enable keyboard input
KEYBOARD_ENABLE = True
# Enable wake word
WAKE_ENABLE = True
WAKE_WORD = [f"hey {NAME.lower()}"]
# Set wake word answer, set empty to disable
ANSWER_ON_WAKE = "Hi there"
# Welcome message
WELCOME = f"Hi, I'm {NAME}. Wake me up with: " + ", ".join(WAKE_WORD)
# Set instructions
INSTRUCTIONS = f"""
You are a helpful assistant, named {NAME}.
"""
va = VoiceAssistant(
    llm,
    name=NAME,
    with_image=WITH_IMAGE,
    tts_model=TTS_MODEL,
    stt_language=STT_LANGUAGE,
    keyboard_enable=KEYBOARD_ENABLE,
    wake_enable=WAKE_ENABLE,
    wake_word=WAKE_WORD,
    answer_on_wake=ANSWER_ON_WAKE,
    welcome=WELCOME,
    instructions=INSTRUCTIONS,
)
if __name__ == "__main__":
    va.run()
Code explanation:
OpenAI(..., model="gpt-4o-mini"): Uses OpenAI as the only LLM in this lesson.
NAME / WAKE_WORD: Personalize the assistant (“Buddy”, “hey buddy”).
WITH_IMAGE=True: Enables image mode in the assistant (no image I/O logic included here).
TTS_MODEL="en_US-ryan-low": Piper voice used for replies.
STT_LANGUAGE="en-us": Vosk language for recognition.
KEYBOARD_ENABLE=True: Allows optional manual text input during debugging.
WELCOME / INSTRUCTIONS: Startup message and assistant persona/system prompt.
va.run(): Starts the loop: wake → listen → LLM → speak.
Switching to Other LLMs or TTS
You can switch to another LLM, Piper voice, or Vosk language with a few edits:
Supported LLMs:
OpenAI
Doubao
Deepseek
Gemini
Qwen
Grok
1. Testing Piper — Check the supported languages of Piper TTS.
2. Test Vosk — Check the supported languages of Vosk STT.
To switch, simply modify the initialization part in the code:
from fusion_hat.llm import Gemini as LLM
llm = LLM(api_key="YOUR_KEY", model="gemini-pro")
# Set models and languages
TTS_MODEL = "en_US-ryan-low"
STT_LANGUAGE = "en-us"
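If you switch providers often, one option is to keep the choice in a small settings object and look the class up by name. This is a sketch under assumptions: LLMChoice, build_llm, and the registry dictionary are hypothetical helpers, not part of fusion_hat. Only the OpenAI and Gemini class names and the api_key/model arguments are taken from the code in this lesson; check the library for the class names of the other providers.

```python
from dataclasses import dataclass


@dataclass
class LLMChoice:
    provider: str  # key into the registry, e.g. "openai" or "gemini"
    model: str     # model name understood by that provider
    api_key: str


def build_llm(choice, registry):
    """Create the LLM object for the chosen provider.

    registry maps lowercase provider names to classes that accept
    api_key= and model= keyword arguments.
    """
    try:
        cls = registry[choice.provider.lower()]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise ValueError(
            f"Unknown provider {choice.provider!r}; known: {known}"
        ) from None
    return cls(api_key=choice.api_key, model=choice.model)


# On the Raspberry Pi, the registry would be built from the real classes:
#   from fusion_hat.llm import OpenAI, Gemini
#   REGISTRY = {"openai": OpenAI, "gemini": Gemini}
#   llm = build_llm(LLMChoice("gemini", "gemini-pro", "YOUR_KEY"), REGISTRY)
```

The resulting llm object is then passed to VoiceAssistant exactly as in the main example.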
Troubleshooting
Robot doesn’t respond to wake word
Check if the microphone works.
Make sure WAKE_ENABLE = True.
Adjust the wake word to match your pronunciation.
Reduce background noise and speak clearly.
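Because WAKE_WORD is a list, one way to make wake detection more forgiving is to list several spellings that the recognizer might produce for the same phrase. The sketch below is illustrative only: the extra variants are guesses, and normalize and heard_wake_word are hypothetical helpers for trying phrases offline. They do not show how VoiceAssistant matches wake words internally.

```python
import re

NAME = "Buddy"
# Several spellings of the same phrase; the extra variants are illustrative guesses
WAKE_WORD = [f"hey {NAME.lower()}", f"hi {NAME.lower()}", "hey body"]


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def heard_wake_word(transcript, wake_words):
    """Return True if any wake phrase appears in the normalized transcript."""
    heard = normalize(transcript)
    return any(normalize(word) in heard for word in wake_words)


if __name__ == "__main__":
    for sample in ["Hey, Buddy!", "hey body what time is it", "hello there"]:
        print(repr(sample), heard_wake_word(sample, WAKE_WORD))
```

Print what Vosk actually transcribes when you say the wake phrase, then add those spellings to WAKE_WORD.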
No sound from the speaker
Check the TTS model name (e.g., en_US-ryan-low).
Test Piper or Espeak manually.
Verify speaker connection and volume.
API key error or timeout
Check your key in secret.py.
Make sure your network connection is stable.
Confirm the LLM model is supported (e.g., gpt-4o-mini).
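To separate network problems from key problems, you can test whether the Raspberry Pi can open a connection to the API host at all. This is a minimal sketch using only the Python standard library. The helper name can_reach is hypothetical, and the host api.openai.com assumes the default OpenAI endpoint; other providers use different hosts.

```python
import socket


def can_reach(host="api.openai.com", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False
```

Calling can_reach() from a Python shell on the Pi returns False when DNS, routing, or a firewall blocks the connection. A True result only shows that the host is reachable, not that the API key is accepted.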
Wake word works but no response
Check if the STT language matches your accent.
Make sure the Vosk model for your language downloaded correctly.
Try printing debug logs to confirm STT is running.
TTS works but no LLM reply
Check if the API key is valid.
Verify model name and LLM settings.
Ensure internet connectivity.