.. include:: /index.rst
    :start-after: start_hello_message
    :end-before: end_hello_message

6. Local Voice Chatbot
===========================

In this lesson, you will combine everything you've learned, **speech recognition (STT)**, **text-to-speech (TTS)**, and a **local LLM (Ollama)**, to build a fully offline **voice chatbot** that runs on your Pironman 5 Pro MAX.

The workflow is simple:

#. **Listen**: The microphone captures your speech and transcribes it with **Vosk**.
#. **Think**: The text is sent to a local **LLM** running on Ollama (e.g., ``llama3.2:3b``).
#. **Speak**: The chatbot answers aloud using **Piper TTS**.

This creates a **hands-free conversational robot** that can understand and respond in real time.

----

Before You Start
----------------

Make sure you have prepared the following:

* Tested **Piper TTS** (:ref:`test_piper`) and chosen a working voice model.
* Tested **Vosk STT** (:ref:`test_vosk`) and chosen the right language pack (e.g., ``en-us``).
* Installed **Ollama** (:ref:`download_ollama`) on your Pi or another computer, and downloaded a model such as ``llama3.2:3b`` (or a smaller one like ``moondream:1.8b`` if memory is limited).

----

Run the Code
--------------

#. Open the example script:

   .. code-block:: bash

       cd ~/sunfounder-voice-assistant/examples/
       sudo nano local_voice_chatbot.py

#. Update the parameters as needed:

   * ``stt = Vosk(language="en-us")``: Change this to match your accent/language package (e.g., ``en-us``, ``zh-cn``, ``es``).
   * ``tts.set_model("en_US-amy-low")``: Replace with the Piper voice model you verified in :ref:`test_piper`.
   * ``llm = Ollama(ip="localhost", model="llama3.2:3b")``: Update both ``ip`` and ``model`` to your own setup.

     * ``ip``: If Ollama runs on the **same Pi**, use ``localhost``. If Ollama runs on another computer in your LAN, enable **Expose to network** in Ollama and set ``ip`` to that computer’s LAN IP.
     * ``model``: Must exactly match the model name you downloaded/activated in Ollama.

#. Run the script:

   .. code-block:: bash

       cd ~/sunfounder-voice-assistant/examples/
       sudo python3 local_voice_chatbot.py

#. After running, you should see:

   * The bot greets you with a spoken welcome message.
   * It waits for speech input.
   * Vosk transcribes your speech into text.
   * The text is sent to Ollama, which streams back a reply.
   * The reply is cleaned (removing hidden reasoning) and spoken aloud by Piper.
   * Stop the program anytime with ``Ctrl+C``.

----

Code
----

.. code-block:: python

    import re
    import time
    from sunfounder_voice_assistant.llm import Ollama
    from sunfounder_voice_assistant.stt import Vosk
    from sunfounder_voice_assistant.tts import Piper

    # Initialize speech recognition
    stt = Vosk(language="en-us")

    # Initialize TTS
    tts = Piper()
    tts.set_model("en_US-amy-low")

    # Instructions for the LLM
    INSTRUCTIONS = (
        "You are a helpful assistant. Answer directly in plain English. "
        "Do NOT include any hidden thinking, analysis, or tags like <think>."
    )
    WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

    # Initialize Ollama connection
    llm = Ollama(ip="localhost", model="llama3.2:3b")
    llm.set_max_messages(20)
    llm.set_instructions(INSTRUCTIONS)

    # Utility: clean hidden reasoning
    def strip_thinking(text: str) -> str:
        if not text:
            return ""
        text = re.sub(r"<\s*think[^>]*>.*?<\s*/\s*think\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"<\s*thinking[^>]*>.*?<\s*/\s*thinking\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"```(?:\s*thinking)?\s*.*?```", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"\[/?thinking\]", "", text, flags=re.IGNORECASE)
        return re.sub(r"\s+\n", "\n", text).strip()

    def main():
        print(WELCOME)
        tts.say(WELCOME)

        try:
            while True:
                print("\n🎤 Listening... (Press Ctrl+C to stop)")

                # Collect final transcript from Vosk
                text = ""
                for result in stt.listen(stream=True):
                    if result["done"]:
                        text = result["final"].strip()
                        print(f"[YOU] {text}")
                    else:
                        print(f"[YOU] {result['partial']}", end="\r", flush=True)

                if not text:
                    print("[INFO] Nothing recognized. Try again.")
                    time.sleep(0.1)
                    continue

                # Query Ollama with streaming
                reply_accum = ""
                response = llm.prompt(text, stream=True)
                for next_word in response:
                    if next_word:
                        print(next_word, end="", flush=True)
                        reply_accum += next_word
                print("")

                # Clean and speak
                clean = strip_thinking(reply_accum)
                if clean:
                    tts.say(clean)
                else:
                    tts.say("Sorry, I didn't catch that.")

                time.sleep(0.05)

        except KeyboardInterrupt:
            print("\n[INFO] Stopping...")
        finally:
            tts.say("Goodbye!")
            print("Bye.")

    if __name__ == "__main__":
        main()

----

Code Analysis
-------------

**Imports and global setup**

.. code-block:: python

    import re
    import time
    from sunfounder_voice_assistant.llm import Ollama
    from sunfounder_voice_assistant.stt import Vosk
    from sunfounder_voice_assistant.tts import Piper

Brings in the three subsystems you built earlier: **Vosk** for speech-to-text (STT), **Ollama** for the LLM, and **Piper** for text-to-speech (TTS).

**Initialize STT (Vosk)**

.. code-block:: python

    stt = Vosk(language="en-us")

Loads the Vosk model for US English. Change the language code (e.g., ``zh-cn``, ``es``) to match your voice pack for better accuracy.

**Initialize TTS (Piper)**

.. code-block:: python

    tts = Piper()
    tts.set_model("en_US-amy-low")

Creates a Piper engine and selects a specific voice. Pick a model you’ve tested in :ref:`test_piper`. Lower-quality voices are faster and use less CPU.

**LLM instructions and welcome line**

.. code-block:: python

    INSTRUCTIONS = (
        "You are a helpful assistant. Answer directly in plain English. "
        "Do NOT include any hidden thinking, analysis, or tags like <think>."
    )
    WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

Two key UX choices:

* Ask for **direct, plain-English answers** (helps with TTS clarity).
* Explicitly forbid hidden “chain-of-thought” tags to reduce noisy outputs.

**Connect to Ollama and set conversation scope**

.. code-block:: python

    llm = Ollama(ip="localhost", model="llama3.2:3b")
    llm.set_max_messages(20)
    llm.set_instructions(INSTRUCTIONS)

* ``ip="localhost"`` assumes the Ollama server runs on the same Pi. If it runs on another LAN machine, use that computer’s **LAN IP** and enable *Expose to network* in Ollama.
* ``set_max_messages(20)`` keeps a short conversational history. Lower this if memory/latency is tight.

**Strip hidden reasoning / tags before speaking**

.. code-block:: python

    def strip_thinking(text: str) -> str:
        if not text:
            return ""
        text = re.sub(r"<\s*think[^>]*>.*?<\s*/\s*think\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"<\s*thinking[^>]*>.*?<\s*/\s*thinking\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"```(?:\s*thinking)?\s*.*?```", "", text, flags=re.DOTALL|re.IGNORECASE)
        text = re.sub(r"\[/?thinking\]", "", text, flags=re.IGNORECASE)
        return re.sub(r"\s+\n", "\n", text).strip()

Some models may emit internal-style tags (e.g., ``<think>…</think>``). This function removes those so your TTS **only** speaks the final answer. Note that the third pattern removes every fenced block in the reply, not only those labeled ``thinking``.

**Tip:** Because raw tokens are streamed to the terminal, such tags may still appear on screen. The function only guarantees that the **spoken** output stays clean.

**Main loop: greet once, then listen → think → speak**

.. code-block:: python

    print(WELCOME)
    tts.say(WELCOME)

Greets the user via terminal and speaker. Happens once at startup.

**Listen (streaming STT with live partials)**

.. code-block:: python

    print("\n🎤 Listening... (Press Ctrl+C to stop)")

    text = ""
    for result in stt.listen(stream=True):
        if result["done"]:
            text = result["final"].strip()
            print(f"[YOU] {text}")
        else:
            print(f"[YOU] {result['partial']}", end="\r", flush=True)

* ``stream=True`` yields **partial** transcripts for immediate feedback and a **final** transcript when the utterance ends.
* The final recognized text is stored in ``text`` and printed once.

**Guard:** If nothing was recognized, you skip the LLM call:

.. code-block:: python

    if not text:
        print("[INFO] Nothing recognized. Try again.")
        time.sleep(0.1)
        continue

This avoids sending empty prompts to the model (saves time and tokens).

**Think (LLM) with streamed printing**

.. code-block:: python

    reply_accum = ""
    response = llm.prompt(text, stream=True)
    for next_word in response:
        if next_word:
            print(next_word, end="", flush=True)
            reply_accum += next_word
    print("")

* Sends the final transcript to the local LLM and **prints tokens as they arrive** for low latency.
* Meanwhile, you accumulate the full reply in ``reply_accum`` for post-processing.

**Note:** If you’d rather **not** show raw tokens, set ``stream=False`` and just print the final string.

**Speak (clean first, then TTS once)**

.. code-block:: python

    clean = strip_thinking(reply_accum)
    if clean:
        tts.say(clean)
    else:
        tts.say("Sorry, I didn't catch that.")

* Cleans the final text to remove hidden tags, then **speaks exactly once**.
* Keeping TTS to a single pass avoids repeated prompts like “[LLM] / [SAY]”.

**Exit and teardown**

.. code-block:: python

    except KeyboardInterrupt:
        print("\n[INFO] Stopping...")
    finally:
        tts.say("Goodbye!")
        print("Bye.")

Use **Ctrl+C** to stop. The bot says a short goodbye to signal a clean exit.

----

Troubleshooting & FAQ
---------------------

* **Model is too large (memory error)**

  Use a smaller model like ``moondream:1.8b`` or run Ollama on a more powerful computer.

* **No response from Ollama**

  Make sure Ollama is running (``ollama serve`` or desktop app open). If it runs on another computer, enable **Expose to network** and check the IP address.

* **Vosk not recognizing speech**

  Verify your microphone works. Try another language pack (``zh-cn``, ``es``, etc.) if needed.

* **Piper silent or errors**

  Confirm the chosen voice model is downloaded and tested in :ref:`test_piper`.

* **Answers too long or off-topic**

  Edit ``INSTRUCTIONS`` to add: **“Keep answers short and to the point.”**
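
For the **No response from Ollama** case above, it can help to check the server and the model name separately from the chatbot script. The following is a minimal sketch, not part of the example code: it assumes Ollama is listening on its default port ``11434`` and uses the server's ``/api/tags`` endpoint, which lists the locally available models. The helper names ``model_names`` and ``check_ollama`` are hypothetical and exist only for this illustration.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Extract model names from the JSON returned by Ollama's /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def check_ollama(ip: str = "localhost", model: str = "llama3.2:3b",
                 port: int = 11434, timeout: float = 3.0) -> str:
    """Return a short status string instead of raising on network errors.

    Assumption: the server is reachable over plain HTTP on the given port
    (11434 is Ollama's default).
    """
    url = f"http://{ip}:{port}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # Covers connection refused, timeouts, and non-JSON responses.
        return f"unreachable: {exc}"

    names = model_names(payload)
    if model in names:
        return "ok"
    return f"model not found; available: {names}"
```

Calling ``print(check_ollama(ip="localhost", model="llama3.2:3b"))`` with the same ``ip`` and ``model`` values you pass to ``Ollama(...)`` should distinguish between a server that cannot be reached and a model name that does not match what is installed.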