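19. Local Voice Chatbot

In this lesson you will combine everything you have learned so far: speech-to-text (STT), text-to-speech (TTS), and a local LLM (Ollama). The result is a voice chatbot that runs fully offline on your PiCar-X.

The workflow is simple:

- Listen: the microphone captures your speech and Vosk transcribes it.
- Think: the text is sent to a local LLM running on Ollama (for example, llama3.2:3b).
- Speak: the chatbot reads the reply aloud with Piper TTS.

This gives you a hands-free conversational robot that can understand and respond in real time.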
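The three-stage flow can be sketched as a single function that receives each stage as a plain callable. This is only an illustration of the control flow, not the picarx API: run_turn, listen, think, and speak are hypothetical names, and the lambdas stand in for Vosk, Ollama, and Piper.

```python
from typing import Callable, Optional


def run_turn(
    listen: Callable[[], str],
    think: Callable[[str], str],
    speak: Callable[[str], None],
) -> Optional[str]:
    """Run one listen -> think -> speak turn; return the reply, or None if nothing was heard."""
    heard = listen().strip()
    if not heard:
        # Nothing recognized: skip the LLM call entirely.
        return None
    reply = think(heard)
    speak(reply)
    return reply


# Stand-in stages so the sketch runs without a microphone, model, or speaker.
spoken = []
reply = run_turn(
    listen=lambda: "hello robot",
    think=lambda text: f"You said: {text}",
    speak=spoken.append,
)
```

Passing the stages in as callables keeps the loop easy to test on a desktop before moving to the robot.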

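Before You Start

Make sure you have the following ready:

- Install all modules (important): install the robot-hat, vilib, and picar-x modules, then run the i2samp.sh script.
- Piper TTS is tested (1. Test Piper) and you have chosen a working voice model.
- Vosk STT is tested (2. Test Vosk) and you have chosen a suitable language pack (for example, en-us).
- Ollama is installed on your Raspberry Pi or on another computer (1. Install Ollama (LLM) and Download a Model), with a model downloaded such as llama3.2:3b (or the smaller moondream:1.8b if memory is limited).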

Run the Code

Open the example script:

cd ~/picar-x/example
sudo nano 19.local_voice_chatbot.py

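Update the parameters as needed:
- stt = Vosk(language="en-us"): change this to match your accent / language pack (for example, en-us, zh-cn, es).
- tts.set_model("en_US-amy-low"): replace this with the Piper voice model you verified in 1. Test Piper.
- llm = Ollama(ip="localhost", model="llama3.2:3b"): update ip and model for your setup.
  - ip: use localhost if Ollama runs on the same Raspberry Pi. If it runs on another computer on your LAN, enable Expose to network in Ollama and set ip to that computer's LAN IP address.
  - model: must exactly match the name of the model you downloaded / enabled in Ollama.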
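Before editing the script, it can help to confirm that the ip and model values are valid. The sketch below queries Ollama's REST endpoint /api/tags, which lists the downloaded models, using only the standard library. It assumes Ollama listens on its default port 11434; list_models, parse_model_names, and check_model are hypothetical helper names, not part of picarx.

```python
import json
import urllib.request


def parse_model_names(body: str) -> list:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(body)
    return [m["name"] for m in data.get("models", [])]


def list_models(ip: str = "localhost", port: int = 11434, timeout: float = 3.0) -> list:
    """Return the model names that the Ollama server at ip reports."""
    url = f"http://{ip}:{port}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


def check_model(model: str, ip: str = "localhost") -> str:
    """Return a short status line for the given ip / model pair."""
    try:
        names = list_models(ip)
    except OSError as exc:  # connection refused, timeout, unknown host
        return f"Cannot reach Ollama at {ip}: {exc}"
    if model in names:
        return f"OK: {model} is available on {ip}"
    return f"{model} not found on {ip}; available: {names}"
```

For example, print(check_model("llama3.2:3b", ip="localhost")) reports either that the model is available, that it is missing, or that the server cannot be reached.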

Run the script:

cd ~/picar-x/example
sudo python3 19.local_voice_chatbot.py

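After it starts, you should observe the following:
- The robot speaks a welcome message.
- It waits for voice input.
- Vosk transcribes your speech into text.
- The text is sent to Ollama, which streams back a reply.
- The reply is cleaned (hidden reasoning removed) and then read aloud by Piper.
- Press Ctrl+C at any time to stop the program.

Code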

import re
import time
from picarx.llm import Ollama
from picarx.stt import Vosk
from picarx.tts import Piper

# Initialize speech recognition
stt = Vosk(language="en-us")

# Initialize TTS
tts = Piper()
tts.set_model("en_US-amy-low")

# Instructions for the LLM
INSTRUCTIONS = (
    "You are a helpful assistant. Answer directly in plain English. "
    "Do NOT include any hidden thinking, analysis, or tags like <think>."
)
WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

# Initialize Ollama connection
llm = Ollama(ip="localhost", model="llama3.2:3b")
llm.set_max_messages(20)
llm.set_instructions(INSTRUCTIONS)

# Utility: clean hidden reasoning
def strip_thinking(text: str) -> str:
    if not text:
        return ""
    text = re.sub(r"<\s*think[^>]*>.*?<\s*/\s*think\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"<\s*thinking[^>]*>.*?<\s*/\s*thinking\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"```(?:\s*thinking)?\s*.*?```", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"\[/?thinking\]", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+\n", "\n", text).strip()

def main():
    print(WELCOME)
    tts.say(WELCOME)

    try:
        while True:
            print("\n🎤 Listening... (Press Ctrl+C to stop)")

            # Collect final transcript from Vosk
            text = ""
            for result in stt.listen(stream=True):
                if result["done"]:
                    text = result["final"].strip()
                    print(f"[YOU] {text}")
                else:
                    print(f"[YOU] {result['partial']}", end="\r", flush=True)

            if not text:
                print("[INFO] Nothing recognized. Try again.")
                time.sleep(0.1)
                continue

            # Query Ollama with streaming
            reply_accum = ""
            response = llm.prompt(text, stream=True)
            for next_word in response:
                if next_word:
                    print(next_word, end="", flush=True)
                    reply_accum += next_word
            print("")

            # Clean and speak
            clean = strip_thinking(reply_accum)
            if clean:
                tts.say(clean)
            else:
                tts.say("Sorry, I didn't catch that.")

            time.sleep(0.05)

    except KeyboardInterrupt:
        print("\n[INFO] Stopping...")
    finally:
        tts.say("Goodbye!")
        print("Bye.")

if __name__ == "__main__":
    main()

Code Analysis

Imports and Global Setup

import re
import time
from picarx.llm import Ollama
from picarx.stt import Vosk
from picarx.tts import Piper

These imports bring in the three subsystems you set up earlier: Vosk for speech-to-text (STT), Ollama for the LLM, and Piper for text-to-speech (TTS).

Initialize STT (Vosk)

stt = Vosk(language="en-us")

Loads the US English Vosk model. Change the language code (for example, zh-cn, es) to match your language pack for better accuracy.

Initialize TTS (Piper)

tts = Piper()
tts.set_model("en_US-amy-low")

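Creates a Piper engine and selects a specific voice. Choose a model you have already tested in 1. Test Piper. Lower-quality voices are faster and use less CPU.

LLM Instructions and Welcome Message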

INSTRUCTIONS = (
    "You are a helpful assistant. Answer directly in plain English. "
    "Do NOT include any hidden thinking, analysis, or tags like <think>."
)
WELCOME = "Hello! I'm your voice chatbot. Speak when you're ready."

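Two key user-experience decisions:

- Ask for direct answers in plain language, which helps TTS clarity.
- Explicitly forbid hidden chain-of-thought tags to reduce noisy output.

Connect to Ollama and Set the Conversation Scope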

llm = Ollama(ip="localhost", model="llama3.2:3b")
llm.set_max_messages(20)
llm.set_instructions(INSTRUCTIONS)

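- ip="localhost" assumes Ollama runs on the same Raspberry Pi. If it runs on another machine on your LAN, enable Expose to network in Ollama and enter that machine's LAN IP address.
- set_max_messages(20) keeps the conversation history short. Lower the value further if memory or latency is tight.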
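To make the effect of a message cap concrete, here is a minimal sketch of a bounded history built on collections.deque. It illustrates the general idea only; how the picarx Ollama class trims its history internally is not shown in this lesson, and BoundedHistory is a hypothetical name.

```python
from collections import deque


class BoundedHistory:
    """Keep only the most recent max_messages chat messages."""

    def __init__(self, max_messages: int = 20):
        # A deque with maxlen silently drops the oldest item once it is full.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_list(self) -> list:
        return list(self._messages)


history = BoundedHistory(max_messages=3)
for i in range(5):
    history.add("user", f"message {i}")

# Only the three newest messages survive.
kept = [m["content"] for m in history.as_list()]
```

A smaller cap means less text is sent to the model on each turn, at the cost of the chatbot forgetting earlier parts of the conversation sooner.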

Remove Hidden Reasoning / Tags Before Speaking

def strip_thinking(text: str) -> str:
    if not text:
        return ""
    text = re.sub(r"<\s*think[^>]*>.*?<\s*/\s*think\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"<\s*thinking[^>]*>.*?<\s*/\s*thinking\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"```(?:\s*thinking)?\s*.*?```", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"\[/?thinking\]", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+\n", "\n", text).strip()

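Some models may emit internal-style tags (such as <think>…). This function removes them so that TTS reads only the final answer. Note that the third pattern removes every triple-backtick fenced block, not only those labeled thinking, so code snippets in a reply are not read aloud either.

Tip: the terminal shows raw tokens as they stream in, so you may still see such artifacts on screen. The spoken output stays clean because this function runs before TTS.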
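The snippet below runs the same function on a few sample replies so you can see what each pattern does. The function body is copied from the lesson script so that the snippet runs on its own; the sample strings are invented for illustration.

```python
import re


def strip_thinking(text: str) -> str:
    # Same function as in the lesson script, repeated so this snippet runs alone.
    if not text:
        return ""
    text = re.sub(r"<\s*think[^>]*>.*?<\s*/\s*think\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"<\s*thinking[^>]*>.*?<\s*/\s*thinking\s*>", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"```(?:\s*thinking)?\s*.*?```", "", text, flags=re.DOTALL|re.IGNORECASE)
    text = re.sub(r"\[/?thinking\]", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+\n", "\n", text).strip()


samples = {
    "tagged": "<think>The user greeted me.</think>\nHello there!",
    "fenced": "Sure.\n```thinking\nplan the answer\n```\nDone.",
    "bracket": "[thinking]hmm[/thinking] Yes.",
    "plain": "The sky is blue.",
    "empty": "",
}

cleaned = {name: strip_thinking(raw) for name, raw in samples.items()}
for name, value in cleaned.items():
    print(f"{name}: {value!r}")
```

One detail worth knowing: for the bracket form, only the [thinking] and [/thinking] markers are removed, while the text between them is kept and will be spoken.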

Main Loop: Greet Once, Then Listen → Think → Speak

print(WELCOME)
tts.say(WELCOME)

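Greets the user through both the terminal and the speaker. This runs once, when the program starts.

Listen (Streaming STT with Live Partial Transcripts)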

print("\n🎤 Listening... (Press Ctrl+C to stop)")

text = ""
for result in stt.listen(stream=True):
    if result["done"]:
        text = result["final"].strip()
        print(f"[YOU] {text}")
    else:
        print(f"[YOU] {result['partial']}", end="\r", flush=True)

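- stream=True yields partial transcripts for immediate feedback, then a final transcript when the utterance ends.
- The final recognized text is stored in text and printed once.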
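The same consumption pattern can be tried without a microphone. The sketch below assumes the result dictionaries have the shape used in the lesson script (keys done, partial, and final); fake_listen is a hypothetical stand-in for stt.listen(stream=True), and collect_final is a hypothetical helper name.

```python
from typing import Dict, Iterable


def collect_final(results: Iterable[Dict]) -> str:
    """Return the final transcript from a stream of partial / final result dicts."""
    text = ""
    for result in results:
        if result["done"]:
            text = result["final"].strip()
        # Partial results are ignored here; the lesson script prints them instead.
    return text


def fake_listen():
    # Hypothetical stand-in for stt.listen(stream=True).
    yield {"done": False, "partial": "turn"}
    yield {"done": False, "partial": "turn left"}
    yield {"done": True, "final": " turn left "}


transcript = collect_final(fake_listen())
```

If the stream ends without a final result, the helper returns an empty string, which is exactly the case the guard in the next step handles.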

Guard: Skip the LLM Call If Nothing Was Recognized

if not text:
    print("[INFO] Nothing recognized. Try again.")
    time.sleep(0.1)
    continue

Avoids sending an empty prompt to the model, which saves time and compute / tokens.

Think (LLM) with Streamed Printing

reply_accum = ""
response = llm.prompt(text, stream=True)
for next_word in response:
    if next_word:
        print(next_word, end="", flush=True)
        reply_accum += next_word
print("")

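- Sends the final transcript to the local LLM and prints each token as it arrives, which reduces perceived latency.
- The full reply is also accumulated in reply_accum for later processing.

Note: if you do not want to show raw tokens, set stream=False and print only the final string.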
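Another option is to keep streaming but silence the terminal echo. The sketch below shows one way to do that with a hypothetical helper, accumulate, which collects chunks in a list and joins them once at the end; fake_stream stands in for the iterator returned by llm.prompt(text, stream=True).

```python
from typing import Iterable


def accumulate(tokens: Iterable[str], echo: bool = True) -> str:
    """Join streamed tokens into one reply, optionally printing them as they arrive."""
    parts = []
    for token in tokens:
        if not token:
            continue  # skip empty chunks, as the lesson script does
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    if echo:
        print("")  # finish the line after the last token
    return "".join(parts)


# Stand-in for a streamed LLM reply, including one empty chunk.
fake_stream = iter(["Hel", "lo", "", ", ", "world", "!"])
reply = accumulate(fake_stream, echo=False)
```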

Speak (Clean First, Then One TTS Call)

clean = strip_thinking(reply_accum)
if clean:
    tts.say(clean)
else:
    tts.say("Sorry, I didn't catch that.")

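- Cleans the final text by removing hidden tags, then speaks it exactly once.
- Keeping TTS to a single call per reply avoids repeated prompts such as "[LLM] / [SAY]".

Exit and Cleanup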

except KeyboardInterrupt:
    print("\n[INFO] Stopping...")
finally:
    tts.say("Goodbye!")
    print("Bye.")

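Press Ctrl+C to end the program. The robot says a short goodbye to signal a normal exit. Because the goodbye sits in the finally block, it also runs if the loop ends because of an unexpected error.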
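The order of events on exit can be checked with a small simulation. In this sketch, chat_loop is a hypothetical stand-in for the main loop, and press_ctrl_c raises KeyboardInterrupt the way a real Ctrl+C would.

```python
events = []


def chat_loop(turns):
    """Hypothetical stand-in for the main loop; any turn may raise KeyboardInterrupt."""
    try:
        for turn in turns:
            turn()
    except KeyboardInterrupt:
        events.append("stopping")
    finally:
        # Runs after Ctrl+C, after normal completion, and while other errors propagate.
        events.append("goodbye")


def press_ctrl_c():
    raise KeyboardInterrupt


chat_loop([
    lambda: events.append("turn 1"),
    press_ctrl_c,
    lambda: events.append("turn 2"),  # never reached
])
```

After the call, events records the first turn, then the stop message, then the goodbye; the second turn never runs.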

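Troubleshooting and FAQ

Model too large (memory error)

Use a smaller model such as moondream:1.8b, or run Ollama on a more powerful computer.

Ollama does not respond

Make sure Ollama is running (ollama serve, or the desktop app is open). If it runs on a remote machine, enable Expose to network and check that the IP address is correct.

Vosk does not recognize speech

Confirm that the microphone works. If needed, try a different language pack (zh-cn, es, and so on).

Piper is silent or reports an error

Confirm that the selected voice model is downloaded and has been tested in 1. Test Piper.

Answers are too long or off-topic

Edit INSTRUCTIONS and add: "Keep answers short and to the point."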