3. STT with Vosk (Offline)

Vosk is a lightweight speech-to-text (STT) engine that supports many languages and runs fully offline on Raspberry Pi. You only need internet access once to download a language model. After that, everything works without a network connection.

In this lesson, we will install and test Vosk with a chosen language model.

Test Vosk

Run the program

cd ~/sunfounder-voice-assistant/examples
sudo python3 stt_vosk_stream.py

The first time you run this code with a new language, Vosk will:

  • Automatically download the language model (by default, the small version).

  • Print out the list of supported languages.

  • Start listening for audio input through the microphone.

You’ll see something like this in the terminal:

vosk-model-small-en-us-0.15.zip: 100%|███████████████████| 39.3M/39.3M [00:05<00:00, 7.85MB/s]
['ar', 'ar-tn', 'ca', 'cn', 'cs', 'de', 'en-gb', 'en-in', 'en-us', 'eo', 'es', 'fa', 'fr', 'gu', 'hi', 'it', 'ja', 'ko', 'kz', 'nl', 'pl', 'pt', 'ru', 'sv', 'te', 'tg', 'tr', 'ua', 'uz', 'vn']
Say something

This means:

  • The model file (vosk-model-small-en-us-0.15) has been downloaded.

  • The list of supported languages has been printed.

  • The system is now listening — say something into the Pironman 5 Pro MAX microphone, and the recognized text will appear in the terminal.

Tips:

  • Keep the microphone about 15–30 cm away for better accuracy.

  • Choose a model that matches your language and accent.

  • Use a quiet environment to improve recognition.
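Because the language code has to match one of the entries printed at startup, it can help to validate it before creating the recognizer. The following is a minimal sketch: the list is copied from the sample output above, and `check_language` is a hypothetical helper, not part of the `sunfounder_voice_assistant` library.

```python
# Language codes copied from the sample terminal output shown earlier.
SUPPORTED_LANGUAGES = [
    'ar', 'ar-tn', 'ca', 'cn', 'cs', 'de', 'en-gb', 'en-in', 'en-us', 'eo',
    'es', 'fa', 'fr', 'gu', 'hi', 'it', 'ja', 'ko', 'kz', 'nl', 'pl', 'pt',
    'ru', 'sv', 'te', 'tg', 'tr', 'ua', 'uz', 'vn',
]


def check_language(code, supported=SUPPORTED_LANGUAGES):
    """Return the normalized code, or raise ValueError if it is not listed.

    Hypothetical helper for illustration only.
    """
    normalized = code.strip().lower()
    if normalized not in supported:
        raise ValueError(
            f"Unsupported language code {code!r}; choose one of: "
            + ", ".join(supported)
        )
    return normalized


print(check_language("EN-GB"))  # prints the normalized code
```

The returned value could then be passed as the `language` argument when constructing the recognizer.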

Code

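from sunfounder_voice_assistant.stt import Vosk as STT

# Load the recognizer; the model is downloaded on first use.
stt = STT(language="en-us")

while True:
    print("Say something")
    # Each result is either a partial update or the final sentence.
    for result in stt.listen(stream=True):
        if result["done"]:
            print(f"final:   {result['final']}")
        else:
            # "\r" rewrites the same terminal line as the text grows.
            print(f"partial: {result['partial']}", end="\r", flush=True)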

Code explanation:

  • stt.listen(stream=True): starts streaming speech recognition and yields intermediate results as you speak.

  • result["partial"]: the text recognized so far, updated continuously while you speak.

  • result["final"]: the complete recognized sentence, available once result["done"] is true (when you stop speaking).

  • The loop runs continuously, allowing hands-free real-time transcription.
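If you only need the finished sentences rather than the live display, the streamed results can be filtered. The sketch below assumes each result is a dict with the `done`, `partial`, and `final` keys used in the example above; `collect_finals` is a hypothetical helper, and a plain list stands in for `stt.listen(stream=True)` so the snippet runs without a microphone.

```python
def collect_finals(results, max_sentences=1):
    """Consume streamed results and return up to `max_sentences` final texts.

    `results` is any iterable of dicts shaped like the ones in the example:
    {"done": False, "partial": ...} or {"done": True, "final": ...}.
    """
    finals = []
    for result in results:
        if not result["done"]:
            continue  # skip partial updates
        text = result["final"].strip()
        if text:  # ignore empty results (for example, silence)
            finals.append(text)
        if len(finals) >= max_sentences:
            break
    return finals


# Stand-in for stt.listen(stream=True); the values are made up.
fake_stream = [
    {"done": False, "partial": "turn"},
    {"done": False, "partial": "turn on"},
    {"done": True, "final": "turn on the light"},
]
print(collect_finals(fake_stream))
```

With the real recognizer, the same function could be called as `collect_finals(stt.listen(stream=True))`.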

Tip: This streaming mode is perfect for voice assistants, command control, or live transcription.
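As one way to use the final text for command control, the sketch below maps recognized sentences to action names. The phrases, action names, and the `match_command` helper are all hypothetical; adapt them to your own project.

```python
# Hypothetical phrase-to-action table for illustration.
COMMANDS = {
    "turn on the light": "light_on",
    "turn off the light": "light_off",
    "stop": "stop",
}


def match_command(text, commands=COMMANDS):
    """Return the action whose phrase appears in `text` as whole words, else None."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "
    for phrase, action in commands.items():
        # Padding with spaces avoids matching inside longer words.
        if f" {phrase} " in padded:
            return action
    return None


print(match_command("Please TURN ON  the light"))
```

In the streaming loop, `match_command(result["final"])` could be called whenever `result["done"]` is true.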

Troubleshooting

  • No such file or directory (when running `arecord`)

    You may have used the wrong card/device number. Run:

    arecord -l
    

    and replace 1,0 with the numbers shown for your USB microphone.

  • Recorded file has no sound

    Open the mixer and check the microphone volume:

    alsamixer
    
    • Press F6 to select your USB mic.

    • Make sure Mic/Capture is not muted ([OO] instead of [MM]).

    • Increase the level with ↑.

  • Vosk does not recognize speech

    • Make sure the language code matches your model and appears in the printed list of supported languages (e.g. en-us for English, cn for Chinese).

    • Keep the microphone 15–30 cm away and avoid background noise.

    • Speak clearly and slowly.

  • High latency / slow recognition

    • The model downloaded by default is already the small one, which is faster but less accurate than the larger models.

    • If it’s still slow, close other programs to free CPU.
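The card and device numbers reported by `arecord -l` in the first troubleshooting item can also be read from Python. This is a minimal sketch that assumes the usual English `card N: ... [Name], device M:` line format; the sample listing is illustrative, and both function names are hypothetical.

```python
import re
import subprocess

# Matches lines such as:
# card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r"^card (\d+): .*?\[(.+?)\], device (\d+):", re.MULTILINE)


def parse_capture_devices(listing):
    """Return (card, device, name) tuples found in `arecord -l` output."""
    return [
        (int(card), int(device), name)
        for card, name, device in CARD_LINE.findall(listing)
    ]


def list_capture_devices():
    """Run `arecord -l` and parse its output (requires ALSA utilities)."""
    result = subprocess.run(
        ["arecord", "-l"], capture_output=True, text=True, check=True
    )
    return parse_capture_devices(result.stdout)


# Illustrative listing; your device names and numbers will differ.
SAMPLE = """**** List of CAPTURE Hardware Devices ****
card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""
print(parse_capture_devices(SAMPLE))
```

On the Raspberry Pi, calling `list_capture_devices()` returns the pairs to use in place of 1,0.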