.. include:: /index.rst :start-after: start_hello_message :end-before: end_hello_message 3. STT with Vosk (Offline) ============================================== Vosk is a lightweight speech-to-text (STT) engine that supports many languages and runs fully **offline** on Raspberry Pi. You only need internet access once to download a language model. After that, everything works without a network connection. In this lesson, we will install and test Vosk with a chosen language model. .. 1. Check Your Microphone .. -------------------------- .. Before using speech recognition, make sure your USB microphone works correctly. .. #. List available recording devices: .. .. code-block:: bash .. arecord -l .. Look for a line like ``card 1: ... device 0``. .. #. Record a short sample (replace ``1,0`` with the numbers you found): .. .. code-block:: bash .. arecord -D plughw:1,0 -f S16_LE -r 16000 -d 3 test.wav .. * Example: if your device is ``card 2, device 0``, use: .. .. code-block:: bash .. arecord -D plughw:2,0 -f S16_LE -r 16000 -d 3 test.wav .. #. Play it back to confirm the recording: .. .. code-block:: bash .. aplay test.wav .. #. Adjust microphone volume if needed: .. .. code-block:: bash .. alsamixer .. * Press **F6** to select your USB microphone. .. * Find the **Mic** or **Capture** channel. .. * Make sure it is not muted (**[MM]** means mute, press ``M`` to unmute → should show **[OO]**). .. * Use ↑ / ↓ arrow keys to change the recording volume. .. _test_vosk: Test Vosk -------------------------- **Run the program** .. code-block:: bash cd ~/sunfounder-voice-assistant/examples sudo python3 stt_vosk_stream.py The first time you run this code with a new language, Vosk will: * **Automatically download the language model** (by default, the small version). * **Print out the list of supported languages**. * Start **listening** for audio input through the microphone. You’ll see something like this in the terminal: .. code-block:: text vosk-model-small-en-us-0.15.zip: 100%|███████████████████| 39.3M/39.3M [00:05<00:00, 7.85MB/s] ['ar', 'ar-tn', 'ca', 'cn', 'cs', 'de', 'en-gb', 'en-in', 'en-us', 'eo', 'es', 'fa', 'fr', 'gu', 'hi', 'it', 'ja', 'ko', 'kz', 'nl', 'pl', 'pt', 'ru', 'sv', 'te', 'tg', 'tr', 'ua', 'uz', 'vn'] Say something This means: * The model file (``vosk-model-small-en-us-0.15``) has been downloaded. * The list of supported languages has been printed. sunfounder_voice_assistant * The system is now listening — say something into the Pironman 5 Pro MAX microphone, and the recognized text will appear in the terminal. **Tips:** * Keep the microphone about **15–30 cm** away for better accuracy. * Choose a **model that matches your language and accent**. * Use a quiet environment to improve recognition. **Code** .. code-block:: python from sunfounder_voice_assistant.stt import Vosk as STT stt = STT(language="en-us") while True: print("Say something") for result in stt.listen(stream=True): if result["done"]: print(f"final: {result['final']}") else: print(f"partial: {result['partial']}", end="\r", flush=True) **Code explanation:** * ``stt.listen(stream=True)`` — Starts streaming speech recognition and yields intermediate results as you speak. * ``result["partial"]`` — Displays the **real-time recognized text** (updated continuously). * ``result["final"]`` — Displays the **final recognized sentence** when you stop speaking. * The loop runs continuously, allowing **hands-free real-time transcription**. Tip: This streaming mode is perfect for **voice assistants**, **command control**, or **live transcription**. Troubleshooting ----------------- * **No such file or directory (when running `arecord`)** You may have used the wrong card/device number. Run: .. code-block:: bash arecord -l and replace ``1,0`` with the numbers shown for your USB microphone. * **Recorded file has no sound** Open the mixer and check the microphone volume: .. code-block:: bash alsamixer * Press **F6** to select your USB mic. * Make sure **Mic/Capture** is not muted (**[OO]** instead of **[MM]**). * Increase the level with ↑. * **Vosk does not recognize speech** * Make sure the **language code** matches your model (e.g. ``en-us`` for English, ``zh-cn`` for Chinese). * Keep the microphone 15–30 cm away and avoid background noise. * Speak clearly and slowly. * **High latency / slow recognition** * The default auto-download is a **small model** (faster, but less accurate). * If it’s still slow, close other programs to free CPU.