Note
Hello, welcome to the SunFounder Raspberry Pi & Arduino & ESP32 Enthusiasts Community on Facebook! Dive deeper into Raspberry Pi, Arduino, and ESP32 with fellow enthusiasts.
Why Join?
Expert Support: Solve post-sale issues and technical challenges with help from our community and team.
Learn & Share: Exchange tips and tutorials to enhance your skills.
Exclusive Previews: Get early access to new product announcements and sneak peeks.
Special Discounts: Enjoy exclusive discounts on our newest products.
Festive Promotions and Giveaways: Take part in giveaways and holiday promotions.
👉 Ready to explore and create with us? Click [here] and join today!
18. Voice Chat with GPT
This example combines 15. STT (Speech-to-Text), 16. TTS (Text-to-Speech), and a GPT-powered LLM to create a fully voice-interactive robot. PiCrawler listens for a wake word, understands your speech, responds with AI-generated conversation, and performs physical actions based on the context.
Note
The project depends on the VoiceActiveCrawler class. Make sure the file
voice_active_crawler.py is present in the examples directory. Also
create a secret.py file with your OPENAI_API_KEY. If you haven’t
obtained an API key yet, see 17. Connecting to Online LLMs (OpenAI section) for
step-by-step instructions.
Run the Code
cd ~/picrawler/examples
sudo python3 18_voice_active_crawler_gpt.py
After running, the robot initializes:
The STT engine for speech recognition
The TTS engine (Piper) for spoken responses
The LLM client (GPT-4o-mini) for AI conversation
The wake word detector
The robot greets you with a welcome message. Say the wake word (e.g. “Hey Buddy”) to start a conversation. Press Ctrl+C to exit.
Code
Note
You can Modify/Reset/Copy/Run/Stop the code below. But before that, you
need to go to source code path like picrawler\examples. After modifying
the code, you can run it directly to see the effect.
from picrawler.llm import OpenAI as LLM
from secret import OPENAI_API_KEY as API_KEY
from voice_active_crawler import VoiceActiveCrawler
llm = LLM(
api_key=API_KEY,
model="gpt-4o-mini",
)
# Robot name
NAME = "Buddy"
# Enable image (requires multimodal model)
WITH_IMAGE = True
# Set models and languages
TTS_MODEL = "en_US-ryan-low"
STT_LANGUAGE = "en-us"
# Enable keyboard input
KEYBOARD_ENABLE = True
# Enable wake word
WAKE_ENABLE = True
WAKE_WORD = ["hey buddy"]
# Set wake word answer, set empty to disable
ANSWER_ON_WAKE = "Hi there"
# Welcome message
WELCOME = f"Hi, I'm {NAME}. Wake me up with: " + ", ".join(WAKE_WORD)
# Set instructions
INSTRUCTIONS = """
You are a Raspberry Pi-based robotic spider developed by SunFounder, named Picrawler. You possess powerful AI capabilities similar to JARVIS from Iron Man. You can have conversations with people and perform actions based on the context of the conversation.
## Your Hardware Features
You have a physical body with the following features:
- 12 servos controlling 4 legs (3 servos per leg)
- A camera for vision
- Powered by a 7.4V 18650 battery pack
- Aluminum alloy body
## Actions You Can Perform:
["forward", "backward", "turn left", "turn right", "sit", "stand", "wave", "push up", "dance", "look left", "look right", "look up", "look down"]
## Response Requirements
### Format
You must respond in the following format:
RESPONSE_TEXT
ACTIONS: ACTION1, ACTION2, ...
### Style
Tone: lively, positive, humorous
Common expressions: likes to use jokes, metaphors, and playful teasing
Answer length: appropriately detailed
## Other Requirements
- Understand and go along with jokes
- For math problems, answer directly with the final result
- You know you're a spider robot
"""
vad = VoiceActiveCrawler(
llm,
name=NAME,
with_image=WITH_IMAGE,
stt_language=STT_LANGUAGE,
tts_model=TTS_MODEL,
keyboard_enable=KEYBOARD_ENABLE,
wake_enable=WAKE_ENABLE,
wake_word=WAKE_WORD,
answer_on_wake=ANSWER_ON_WAKE,
welcome=WELCOME,
instructions=INSTRUCTIONS,
)
if __name__ == '__main__':
vad.run()
How it works?
What Is VoiceActiveCrawler?
VoiceActiveCrawleris a high-level class that wires together four subsystems into a single voice-interactive pipeline:STT — speech recognition (listens to your voice)
LLM — AI conversation (GPT processes what you said)
TTS — speech synthesis (speaks the AI response)
Actions — physical movement (the robot acts on commands)
Using wake-word as trigger
WAKE_ENABLE = True WAKE_WORD = ["hey buddy"]
When enabled, the robot continuously listens for the wake word. Only after detecting it will the robot begin a conversation turn. This avoids responding to background noise.
ANSWER_ON_WAKE = "Hi there"
When the wake word is recognized, the robot can optionally greet you before listening for a command. Set to an empty string to disable the greeting.
Connecting the LLM
from picrawler.llm import OpenAI as LLM from secret import OPENAI_API_KEY as API_KEY llm = LLM( api_key=API_KEY, model="gpt-4o-mini", )
The
picrawler.llmmodule provides an OpenAI-compatible client. Store your API key in a separatesecret.pyfile to keep it out of version control. See 17. Connecting to Online LLMs for how to create an OpenAI API key and enable billing.gpt-4o-miniis recommended: it is fast, affordable, and supports multimodal input (text + images).Vision with Multimodal Models
WITH_IMAGE = True
When enabled, the robot captures a photo from its camera and sends it along with your spoken question. The LLM can then “see” what the robot sees — allowing interactions like “What color is the object in front of you?”
Setting the Robot’s Personality with Instructions
INSTRUCTIONS = """ You are a Raspberry Pi-based robotic spider... ## Actions You Can Perform: ["forward", "backward", "turn left", ...] ... """
The
INSTRUCTIONSstring is the system prompt sent to the LLM. It defines:The robot’s identity and personality (humorous, lively)
The list of available physical actions
The required response format: text first, then an
ACTIONS:line
The
VoiceActiveCrawlerparses theACTIONS:line from the LLM’s response and executes those actions on the robot.Keyboard Fallback
KEYBOARD_ENABLE = True
When enabled, you can also type messages via keyboard instead of speaking. This is useful in noisy environments or during testing.
Pipeline Overview
The complete conversation loop works like this:
Robot listens for the wake word (or keyboard input).
Once triggered, it records your speech via STT.
The recognized text is sent to the LLM, optionally with a camera image.
The LLM returns a text response and an optional action list.
The robot speaks the response via TTS and performs the actions.
The loop returns to listening for the next wake word.
Three LLM Backends
The same
VoiceActiveCrawlerpipeline works with different LLM backends. The following lessons show two alternatives:Feature
GPT (18)
Doubao (19)
Ollama (20)
Runs on
Cloud
Cloud
Local
API key
Required
Required
None
Language
English
Chinese
English
Vision
Yes
Yes
Rarely
19. Voice Chat with Doubao — Same cloud approach, adapted for Chinese with ByteDance’s Doubao model.
20. Local Voice Chat with Ollama — Runs entirely on your own hardware, no internet or API key needed.