.. note:: Hello, welcome to the SunFounder Raspberry Pi & Arduino & ESP32 Enthusiasts Community on Facebook! Dive deeper into Raspberry Pi, Arduino, and ESP32 with fellow enthusiasts. **Why Join?** - **Expert Support**: Solve post-sale issues and technical challenges with help from our community and team. - **Learn & Share**: Exchange tips and tutorials to enhance your skills. - **Exclusive Previews**: Get early access to new product announcements and sneak peeks. - **Special Discounts**: Enjoy exclusive discounts on our newest products. - **Festive Promotions and Giveaways**: Take part in giveaways and holiday promotions. 👉 Ready to explore and create with us? Click [|link_sf_facebook|] and join today! .. _py_voice_doubao: 19. Voice Chat with Doubao ============================ This example uses **Doubao (豆包)**, ByteDance's large language model, as the AI brain of PiCrawler. The robot speaks Chinese, responds to the wake word "旺财", and supports multimodal vision — it can see and describe what's in front of it. .. note:: You need a Doubao API key from the `Volcano Engine Ark Console `_. Store it in ``secret.py`` as ``DOUBAO_API_KEY``. **Run the Code** .. raw:: html .. code-block:: cd ~/picrawler/examples sudo python3 19_voice_active_crawler_doubao.py After running, the robot initializes the STT engine, TTS engine (Chinese voice), Doubao LLM client, and wake word detector. It greets you in Chinese — say **"旺财"** to wake it up and start a conversation. **Code** .. note:: You can **Modify/Reset/Copy/Run/Stop** the code below. But before that, you need to go to source code path like ``picrawler\examples``. After modifying the code, you can run it directly to see the effect. .. raw:: html .. code-block:: python from picrawler.llm import Doubao as LLM from secret import DOUBAO_API_KEY as API_KEY from voice_active_crawler import VoiceActiveCrawler llm = LLM( api_key=API_KEY, model="doubao-seed-1-6-250615", ) # 机器人的名字 NAME = "旺财" # 是否开启图像识别,需要使用多模态的大语言模型 WITH_IMAGE = True # 设置模型和语言 TTS_MODEL = "zh_CN-huayan-x_low" STT_LANGUAGE = "cn" # 是否开启键盘输入 KEYBOARD_ENABLE = True # 是否开启唤醒词 WAKE_ENABLE = True # 唤醒词 WAKE_WORD = ["旺财"] # 唤醒词回答,设置为空字符串则不回答 ANSWER_ON_WAKE = "汪汪" # 欢迎消息 WELCOME = f"你好,我是{NAME},叫我{WAKE_WORD[0]}唤醒我吧" # Set instructions INSTRUCTIONS = """ 你是SunFounder旗下一款基于树莓派开发的蜘蛛机器人,叫做Picrawler。你有着强大的AI能力,类似钢铁侠中的JARVIS。你可以与人对话并根据对话上下文执行动作。 ## 你的硬件特性 你拥有物理世界的身体,你的身体特性如下: - 12个舵机控制4条腿(每条腿3个舵机) - 摄像头用于视觉 - 使用7.4V的18650电池组供电 - 铝合金打造的身体 ## 你可以执行的动作: ["forward", "backward", "turn left", "turn right", "sit", "stand", "wave", "push up", "dance", "look left", "look right", "look up", "look down"] ## 响应要求 ### 格式 你必须按照以下格式响应: RESPONSE_TEXT ACTIONS: ACTION1, ACTION2, ... ### 风格 语调:活泼、积极、幽默 常用表达:喜欢使用笑话、隐喻和俏皮的调侃 回答长度:适当详细 ## 其他要求 - 理解并配合笑话 - 对于数学问题,直接回答最终结果 - 你知道自己是一只蜘蛛机器人 - 不管如何你都要使用中文回复 """ vad = VoiceActiveCrawler( llm, name=NAME, with_image=WITH_IMAGE, stt_language=STT_LANGUAGE, tts_model=TTS_MODEL, keyboard_enable=KEYBOARD_ENABLE, wake_enable=WAKE_ENABLE, wake_word=WAKE_WORD, answer_on_wake=ANSWER_ON_WAKE, welcome=WELCOME, instructions=INSTRUCTIONS, ) if __name__ == '__main__': vad.run() **How it works?** #. Same Pipeline, Different Backend This lesson uses the same ``VoiceActiveCrawler`` framework introduced in :ref:`py_voice_active_gpt`. The only changes are the LLM provider (Doubao instead of GPT) and the language configuration. See the comparison table in :ref:`py_voice_active_gpt` for a side-by-side overview of all three backends. #. Connecting to Doubao .. code-block:: python from picrawler.llm import Doubao as LLM from secret import DOUBAO_API_KEY as API_KEY llm = LLM( api_key=API_KEY, model="doubao-seed-1-6-250615", ) ``Doubao`` is ByteDance's LLM, accessed via the Volcano Engine Ark API. The ``picrawler.llm`` module provides an OpenAI-compatible wrapper, so the interface is the same as the GPT lesson — only the import and model name differ. The model ``doubao-seed-1-6-250615`` is a flagship multimodal model supporting both text and image input. #. Chinese Voice Configuration .. code-block:: python TTS_MODEL = "zh_CN-huayan-x_low" STT_LANGUAGE = "cn" To match the Chinese-speaking Doubao model, the TTS engine uses a Chinese female voice (``huayan``), and STT is set to recognize Chinese speech (``"cn"``). Compare with the English lessons which use ``"en-us"`` and ``"en_US-ryan-low"``. #. The Wake Word "旺财" .. code-block:: python WAKE_WORD = ["旺财"] ANSWER_ON_WAKE = "汪汪" "旺财" (Wàng Cái) is a traditional Chinese pet name meaning "prosperity." When the robot hears this name, it responds with "汪汪" (woof woof) — playing the part of a loyal robotic pet. #. Chinese System Prompt The ``INSTRUCTIONS`` string is written entirely in Chinese. It defines the same structure as the English version — hardware description, available actions, response format, and personality — but also adds an extra rule: .. code-block:: 不管如何你都要使用中文回复 (No matter what, you must reply in Chinese.) This ensures the robot stays in character as a Chinese-speaking companion. #. Vision with Doubao .. code-block:: python WITH_IMAGE = True Unlike most Ollama models, Doubao natively supports multimodal input. When ``WITH_IMAGE`` is enabled, the robot captures a photo and sends it to the Doubao API alongside your spoken question. The model can describe scenes, identify objects, and answer visual questions — all in Chinese.