.. include:: /index.rst
    :start-after: start_hello_message
    :end-before: end_hello_message

.. _ai_voice_assistant_car:

7. AI Voice Assistant
===========================

This lesson turns your Pironman 5 Pro MAX into a **voice-first AI assistant**.
With the provided code, the robot will: **wait for a wake word**, **transcribe your speech** with Vosk,
send it to an **OpenAI LLM**, and **speak back** using Piper TTS.

----

Before You Start
----------------

Make sure you have:

* :ref:`test_piper`: Piper voice works (e.g., you can play “Hello”).
* :ref:`test_vosk`: Vosk STT works for your language (e.g., ``en-us``).
* :ref:`py_online_llm`: Your **OpenAI API key** saved in ``secret.py`` as ``OPENAI_API_KEY``.
* A working **microphone** and **speaker** on Pironman 5 Pro MAX.
* A stable network connection (LLM is online).

----

Run the Example
---------------

.. code-block:: bash

    cd ~/sunfounder-voice-assistant/examples/
    sudo python3 voice_assistant.py

**Configuration used by the code:**

* LLM: **OpenAI** (``gpt-4o-mini``)
* TTS: **Piper** (``en_US-ryan-low``)
* STT: **Vosk** (``en-us``)
* Wake word: ``"hey buddy"``
* Keyboard input: **enabled** (optional manual input)
* Image mode: **enabled** (``WITH_IMAGE=True``); requires a multimodal-capable LLM if you decide to use images later

**What happens:**

1. The assistant shows a welcome message with the wake phrase.
2. It listens for **“hey buddy”**.
3. After wake, your speech is transcribed (Vosk → text).
4. The text is sent to **OpenAI (gpt-4o-mini)** for a response.
5. The answer is spoken with **Piper** (``en_US-ryan-low``).

**Example interaction**

.. code-block:: text

    You: Hey Buddy
    Robot: Hi there!

    You: What’s the capital of Italy?
    Robot: The capital of Italy is Rome.

Code
-----------------

.. code-block:: python

    from sunfounder_voice_assistant.voice_assistant import VoiceAssistant
    from sunfounder_voice_assistant.llm import OpenAI as LLM
    from secret import OPENAI_API_KEY as API_KEY

    llm = LLM(
        api_key=API_KEY,
        model="gpt-4o-mini",
    )

    # Robot name
    NAME = "Buddy"

    # Enable image, need to set up a multimodal language model
    WITH_IMAGE = True

    # Set models and languages
    LLM_MODEL = "gpt-4o-mini"
    TTS_MODEL = "en_US-ryan-low"
    STT_LANGUAGE = "en-us"

    # Enable keyboard input
    KEYBOARD_ENABLE = True

    # Enable wake word
    WAKE_ENABLE = True
    WAKE_WORD = [f"hey {NAME.lower()}"]
    # Set wake word answer, set empty to disable
    ANSWER_ON_WAKE = "Hi there"

    # Welcome message
    WELCOME = f"Hi, I'm {NAME}. Wake me up with: " + ", ".join(WAKE_WORD)

    # Set instructions
    INSTRUCTIONS = f"""
    You are a helpful assistant, named {NAME}.
    """

    va = VoiceAssistant(
        llm,
        name=NAME,
        with_image=WITH_IMAGE,
        tts_model=TTS_MODEL,
        stt_language=STT_LANGUAGE,
        keyboard_enable=KEYBOARD_ENABLE,
        wake_enable=WAKE_ENABLE,
        wake_word=WAKE_WORD,
        answer_on_wake=ANSWER_ON_WAKE,
        welcome=WELCOME,
        instructions=INSTRUCTIONS,
    )

    if __name__ == "__main__":
        va.run()

**Code explanation:**

* ``OpenAI(..., model="gpt-4o-mini")``: Uses **OpenAI** as the only LLM in this lesson.
* ``NAME`` / ``WAKE_WORD``: Personalize the assistant (“Buddy”, “hey buddy”).
* ``WITH_IMAGE=True``: Enables image mode in the assistant (no image I/O logic included here).
* ``TTS_MODEL="en_US-ryan-low"``: Piper voice used for replies.
* ``STT_LANGUAGE="en-us"``: Vosk language for recognition.
* ``KEYBOARD_ENABLE=True``: Allows optional manual text input during debugging.
* ``WELCOME`` / ``INSTRUCTIONS``: Startup message and assistant persona/system prompt.
* ``va.run()``: Starts the **wake → listen → LLM → speak** loop.

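To make the wake step of that loop more concrete, here is a minimal, standalone sketch of how a transcribed phrase could be compared against ``WAKE_WORD``. It is not the ``sunfounder_voice_assistant`` implementation; the helper names ``normalize`` and ``strip_wake_word`` are hypothetical and exist only for illustration. The sketch assumes the transcript arrives as plain text and that ignoring case and punctuation is acceptable.

```python
import re


def normalize(text):
    """Lowercase the text and drop punctuation, keeping words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def strip_wake_word(transcript, wake_words):
    """Return the text that follows a leading wake phrase.

    Returns "" when only the wake phrase was heard, and None when the
    transcript does not start with any of the wake phrases.
    (Hypothetical helper, not part of the library.)
    """
    heard = normalize(transcript)
    for phrase in wake_words:
        target = normalize(phrase)
        if heard == target:
            return ""
        if heard.startswith(target + " "):
            return heard[len(target) + 1:]
    return None


NAME = "Buddy"
WAKE_WORD = [f"hey {NAME.lower()}"]

print(strip_wake_word("Hey Buddy", WAKE_WORD))
print(strip_wake_word("Hey buddy, what's the capital of Italy?", WAKE_WORD))
print(strip_wake_word("What time is it?", WAKE_WORD))
```

Matching on whole words (``target + " "``) rather than on a raw prefix avoids waking on unrelated words that merely begin with the same letters.
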
Switching to Other LLMs or TTS
------------------------------

You can easily switch to other LLMs, TTS, or STT languages with just a few edits:

* Supported LLMs:

  * OpenAI
  * Doubao
  * Deepseek
  * Gemini
  * Qwen
  * Grok

* :ref:`test_piper`: Check the supported languages of **Piper TTS**.
* :ref:`test_vosk`: Check the supported languages of **Vosk STT**.

To switch, simply modify the initialization part in the code:

.. code-block:: python

    from sunfounder_voice_assistant.llm import Gemini as LLM

    llm = LLM(api_key="YOUR_KEY", model="gemini-pro")

    # Set models and languages
    TTS_MODEL = "en_US-ryan-low"
    STT_LANGUAGE = "en-us"

----

Troubleshooting
-----------------------------

* **Robot doesn’t respond to wake word**

  - Check if the microphone works.
  - Make sure ``WAKE_ENABLE = True``.
  - Adjust the wake word to match your pronunciation.
  - Reduce background noise and speak clearly.

* **No sound from the speaker**

  - Check the TTS model name (e.g., ``en_US-ryan-low``).
  - Test Piper or Espeak manually.
  - Verify speaker connection and volume.

* **API key error or timeout**

  - Check your key in ``secret.py``.
  - Make sure your network connection is stable.
  - Confirm the LLM model is supported (e.g., ``gpt-4o-mini``).

* **Wake word works but no response**

  - Check if the STT language matches your accent.
  - Make sure the model downloaded correctly.
  - Try printing debug logs to confirm STT is running.

* **TTS works but no LLM reply**

  - Check if the API key is valid.
  - Verify model name and LLM settings.
  - Ensure internet connectivity.
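
Some of the checks in this list can be run before the assistant starts. The following is a minimal sketch of such a pre-flight check, under the assumption that the settings are plain strings as in the lesson code. The function ``find_config_problems`` is hypothetical and not part of ``sunfounder_voice_assistant``; it only catches obvious typing mistakes and cannot confirm that a key is accepted by the LLM provider or that a model is installed.

```python
def find_config_problems(api_key, tts_model, stt_language, wake_enable, wake_words):
    """Return a list of readable problems; an empty list means nothing obvious is wrong.

    (Hypothetical helper for illustration only.)
    """
    problems = []

    # An empty key or stray whitespace is a common copy-and-paste mistake.
    if not api_key or not api_key.strip():
        problems.append("API key is empty")
    elif api_key != api_key.strip():
        problems.append("API key has leading or trailing whitespace")

    # The same two checks apply to the model and language names.
    for label, value in (("TTS model", tts_model), ("STT language", stt_language)):
        if not value or not value.strip():
            problems.append(f"{label} is empty")
        elif value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace")

    # With wake mode on, at least one non-blank wake phrase is needed.
    if wake_enable and not any(w.strip() for w in wake_words):
        problems.append("wake word is enabled but no wake phrase is set")

    return problems


# Example values; in the lesson they come from secret.py and the script constants.
for problem in find_config_problems("", "en_US-ryan-low", "en-us", True, ["hey buddy"]):
    print("Problem:", problem)
```

If the list comes back empty and the assistant still fails, the remaining causes are the ones that need hardware or network access to verify: microphone, speaker, connectivity, and key validity.
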