Note
Welcome to the SunFounder Raspberry Pi, Arduino & ESP32 Community on Facebook!
Get technical support and troubleshooting help.
Learn and share projects, tips, and tutorials.
Access early product previews and updates.
Enjoy exclusive discounts and giveaways.
👉 Join us here: [here]
3. STT with Vosk (Offline)
Vosk is a lightweight speech-to-text (STT) engine that supports many languages and runs fully offline on Raspberry Pi. You only need internet access once to download a language model. After that, everything works without a network connection.
In this lesson, we will install and test Vosk with a chosen language model.
Test Vosk
Run the program
cd ~/sunfounder-voice-assistant/examples
sudo python3 stt_vosk_stream.py
The first time you run this code with a new language, Vosk will:
Automatically download the language model (by default, the small version).
Print out the list of supported languages.
Start listening for audio input through the microphone.
You’ll see something like this in the terminal:
vosk-model-small-en-us-0.15.zip: 100%|███████████████████| 39.3M/39.3M [00:05<00:00, 7.85MB/s]
['ar', 'ar-tn', 'ca', 'cn', 'cs', 'de', 'en-gb', 'en-in', 'en-us', 'eo', 'es', 'fa', 'fr', 'gu', 'hi', 'it', 'ja', 'ko', 'kz', 'nl', 'pl', 'pt', 'ru', 'sv', 'te', 'tg', 'tr', 'ua', 'uz', 'vn']
Say something
This means:
The model file (vosk-model-small-en-us-0.15) has been downloaded.
The list of supported languages has been printed.
The system is now listening: say something into the Pironman 5 Pro MAX microphone, and the recognized text will appear in the terminal.
Tips:
Keep the microphone about 15–30 cm away for better accuracy.
Choose a model that matches your language and accent.
Use a quiet environment to improve recognition.
Code
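from sunfounder_voice_assistant.stt import Vosk as STT

# Load the US English model (downloaded automatically on first use).
stt = STT(language="en-us")

while True:
    print("Say something")
    # stream=True yields results continuously while you speak.
    for result in stt.listen(stream=True):
        if result["done"]:
            # The utterance has ended: print the finished sentence.
            print(f"final: {result['final']}")
        else:
            # Overwrite the same terminal line with the growing partial text.
            print(f"partial: {result['partial']}", end="\r", flush=True)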
Code explanation:
stt.listen(stream=True): starts streaming speech recognition and yields intermediate results as you speak.
result["partial"]: the text recognized so far, updated continuously while you are still speaking.
result["done"]: becomes True when you stop speaking and the sentence is finalized.
result["final"]: the final recognized sentence for that utterance.
The outer while True loop starts listening again after each sentence, allowing continuous, hands-free transcription.
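To see how these fields fit together without a microphone attached, the sketch below runs the same branching logic over a hand-made sequence of result dictionaries. It assumes each result has the `done`, `partial` and `final` keys used in the example above; the helper `collect_finals` and the `fake_stream` data are hypothetical and not part of `sunfounder_voice_assistant`. It also pads the final line with spaces, so a shorter final sentence fully overwrites a longer partial line left behind by `end="\r"`.

```python
from typing import Iterable, Iterator, List


def collect_finals(results: Iterable[dict], echo: bool = True) -> List[str]:
    """Consume streaming STT results and return the finalized sentences.

    Assumes (hypothetically) that every result is a dict with a boolean
    "done" key, plus "partial" text while speaking or "final" text once
    the utterance is complete, matching the keys in the lesson's loop.
    """
    finals = []
    last_width = 0  # length of the last partial line written with "\r"
    for result in results:
        if result["done"]:
            text = result["final"]
            if echo:
                # Pad with spaces so leftover characters from a longer
                # partial line are overwritten.
                print(f"final: {text}".ljust(last_width))
            finals.append(text)
            last_width = 0
        else:
            line = f"partial: {result['partial']}"
            if echo:
                print(line, end="\r", flush=True)
            last_width = len(line)
    return finals


def fake_stream() -> Iterator[dict]:
    """Hypothetical stand-in for stt.listen(stream=True)."""
    yield {"done": False, "partial": "turn on"}
    yield {"done": False, "partial": "turn on the light please"}
    yield {"done": True, "final": "turn on the light"}


if __name__ == "__main__":
    print(collect_finals(fake_stream(), echo=False))  # ['turn on the light']
```

On the Raspberry Pi you would pass `stt.listen(stream=True)` in place of `fake_stream()`.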
Tip: This streaming mode is well suited to voice assistants, voice command control, and live transcription.
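For command control, one common pattern is to act only on the `final` text and match it against a few trigger phrases. The sketch below is a hypothetical helper, not part of the SunFounder library: the phrases and the returned action labels are placeholders. It uses whole-word regular-expression matching so that, for example, "fan on" does not match "fan only".

```python
import re
from typing import Callable, Dict, Optional


def match_command(text: str, commands: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Run the first action whose trigger phrase appears as whole words in text.

    Returns the action's result, or None if no phrase matches.
    """
    normalized = " ".join(text.lower().split())  # lowercase, collapse spaces
    for phrase, action in commands.items():
        if re.search(rf"\b{re.escape(phrase)}\b", normalized):
            return action()
    return None


# Hypothetical commands: replace the lambdas with your real actions.
COMMANDS: Dict[str, Callable[[], str]] = {
    "lights on": lambda: "LIGHTS_ON",
    "lights off": lambda: "LIGHTS_OFF",
    "fan on": lambda: "FAN_ON",
}

if __name__ == "__main__":
    print(match_command("Please turn the LIGHTS  off", COMMANDS))  # LIGHTS_OFF
```

In the lesson's loop, you would call `match_command(result["final"], COMMANDS)` inside the `if result["done"]:` branch.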
Troubleshooting
No such file or directory (when running `arecord`)
You may have used the wrong card/device number. Run:
arecord -l
and replace 1,0 with the card and device numbers shown for your USB microphone.
Recorded file has no sound
Open the mixer and check the microphone volume:
alsamixer
Press F6 to select your USB mic.
Make sure Mic/Capture is not muted ([OO] instead of [MM]).
Increase the level with ↑.
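If you want a quick numeric check instead of listening to the file, the sketch below reads a WAV recording with Python's standard `wave` module and reports its peak level as a fraction of full scale. It assumes a 16-bit PCM file (for example, recorded with `arecord -f S16_LE`); the function name `peak_level` is illustrative. A peak at or very near 0.0 suggests the capture channel is muted or its level is turned all the way down.

```python
import array
import sys
import wave


def peak_level(path: str) -> float:
    """Return the peak amplitude of a 16-bit PCM WAV file, from 0.0 to 1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit, channels interleaved
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        return 0.0
    return max(abs(s) for s in samples) / 32768
```

For example, `print(peak_level("test.wav"))`, where `test.wav` is a placeholder for the path of your own recording.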
Vosk does not recognize speech
Make sure the language code matches your model (e.g. en-us for English, cn for Chinese, as shown in the list of supported languages).
Keep the microphone 15–30 cm away and avoid background noise.
Speak clearly and slowly.
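To catch a mistyped code before the model download starts, you can check it against the list the library printed on first run. The sketch below copies that list from the sample output in this lesson (your installed version may print a different one), normalizes case and underscores, and suggests close matches with `difflib`. `check_language` is a hypothetical helper, not part of the library.

```python
import difflib

# Copied from the sample first-run output in this lesson; it may differ
# in other versions of sunfounder_voice_assistant.
SUPPORTED_LANGUAGES = [
    "ar", "ar-tn", "ca", "cn", "cs", "de", "en-gb", "en-in", "en-us", "eo",
    "es", "fa", "fr", "gu", "hi", "it", "ja", "ko", "kz", "nl",
    "pl", "pt", "ru", "sv", "te", "tg", "tr", "ua", "uz", "vn",
]


def check_language(code: str) -> str:
    """Return a normalized language code, or raise ValueError with suggestions."""
    normalized = code.strip().lower().replace("_", "-")
    if normalized in SUPPORTED_LANGUAGES:
        return normalized
    suggestions = difflib.get_close_matches(normalized, SUPPORTED_LANGUAGES, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported language code {code!r}.{hint}")
```

You could then write `stt = STT(language=check_language("EN_US"))`, which passes `"en-us"` to the recognizer.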
High latency / slow recognition
The default auto-download is a small model (faster, but less accurate).
If it’s still slow, close other programs to free CPU.
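To check whether other programs are competing for the CPU before you start recognition, the sketch below divides the 1-minute load average by the number of CPU cores using only the standard library (`os.getloadavg` is available on Linux and other Unix systems, including Raspberry Pi OS). The helper name and the 1.0 threshold are assumptions: a value above 1.0 roughly means more processes are waiting to run than there are cores.

```python
import os


def cpu_load_ratio() -> float:
    """1-minute load average divided by the number of CPU cores (Unix only)."""
    load_1min, _, _ = os.getloadavg()
    return load_1min / (os.cpu_count() or 1)


if __name__ == "__main__":
    ratio = cpu_load_ratio()
    # Assumed rule of thumb: above 1.0, processes are queuing for the CPU.
    status = "busy, consider closing other programs" if ratio > 1.0 else "OK"
    print(f"load per core: {ratio:.2f} ({status})")
```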