AI Interaction Using GPT-4o

In our previous projects, we programmed PiCar-X to carry out predetermined tasks, which could feel a bit tedious. This project takes a thrilling leap toward dynamic engagement. Be careful if you try to outsmart the car: it is now equipped to understand far more than ever before!

This project details all the technical steps needed to integrate GPT-4o into your system, including configuring the necessary virtual environment, installing the required libraries, and setting up your API key and assistant ID.

Note

This project requires the OpenAI Platform, which is a paid service. The OpenAI API is billed separately from ChatGPT, with its own pricing available at https://openai.com/api/pricing/.

Therefore, before deciding to continue with this project, make sure your OpenAI API account is funded.

Whether you have a microphone to speak with it directly or prefer typing into a command window, PiCar-X's responses powered by GPT-4o will surely astonish you!

Let’s dive into this project and unleash a new level of interaction with PiCar-X!

1. Installing Required Packages and Dependencies

Note

You need to install the necessary modules for PiCar-X first. For details, please refer to: 5. Install All the Modules (Important).

In this section, we will create and activate a virtual environment, then install the required packages and dependencies within it. This keeps the installed packages isolated from the rest of the system, maintaining project dependency isolation and preventing conflicts with other projects or system packages.

  1. Use the python -m venv command to create a virtual environment named my_venv. The --system-site-packages option allows the virtual environment to access packages installed system-wide, which is useful when system-level libraries are needed.

    python -m venv --system-site-packages my_venv
    
  2. Switch to the my_venv directory and activate the virtual environment using the source bin/activate command. The command prompt will change to indicate that the virtual environment is active.

    cd my_venv
    source bin/activate
    
  3. Now, install the required Python packages within the activated virtual environment. These packages will be isolated to the virtual environment and will not affect other system packages.

    pip3 install openai
    pip3 install openai-whisper
    pip3 install SpeechRecognition
    pip3 install -U sox
    
  4. Finally, use the apt command to install system-level dependencies, which require administrator privileges.

    sudo apt install python3-pyaudio
    sudo apt install sox
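
To confirm that everything installed correctly, you can run a quick import check inside the activated virtual environment. Below is a minimal sketch; check_install.py is a hypothetical helper name, and the module names are the ones provided by the packages above.

    # check_install.py -- quick sanity check of the new environment
    import openai              # OpenAI API client
    import whisper             # local Whisper speech-to-text
    import speech_recognition  # microphone capture helpers
    import sox                 # Python wrapper for the sox tools
    import pyaudio             # audio I/O backend (installed via apt)

    print("openai version:", openai.__version__)
    print("All packages imported successfully.")

Run it with ~/my_venv/bin/python3 check_install.py; if no ImportError appears, the environment is ready.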
    

2. Obtain API Key and Assistant ID

Get API Key

  1. Visit the OpenAI Platform and click the Create new secret key button in the top right corner.

    _images/apt_create_api_key.png
  2. Select the Owner, Name, Project, and permissions as needed, and then click Create secret key.

    _images/apt_create_api_key2.png
  3. Once generated, save this secret key in a safe and accessible location. For security reasons, you will not be able to view it again through your OpenAI account. If you lose this secret key, you will need to generate a new one.

    _images/apt_create_api_key_copy.png
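
If you want to confirm the new key works before moving on, a single lightweight request is enough. This is a minimal sketch using the standard openai Python library; the key shown is a placeholder.

    from openai import OpenAI

    # Paste your new secret key here (placeholder value shown).
    client = OpenAI(api_key="sk-proj-xxxx")

    # Listing models is a cheap authenticated call; it succeeds only with a valid key.
    print([m.id for m in client.models.list().data][:3])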

Get Assistant ID

  1. Next, make sure you are on the Dashboard page, then click Assistants and click Create.

    _images/apt_create_assistant.png
  2. Move your cursor over the assistant ID to copy it, then paste it somewhere you can retrieve it later. This is the unique identifier for this Assistant.

    _images/apt_create_assistant_id.png
  3. Set any name you like, then copy the following content into the Instructions box to describe your Assistant. (A sketch showing how this JSON reply format can be parsed is included at the end of this section.)

    _images/apt_create_assistant_instructions.png
    You are a small car with AI capabilities named PiCar-X. You can engage in conversations with people and react accordingly to different situations with actions or sounds. You are driven by two rear wheels, with two front wheels that can turn left and right, and equipped with a camera mounted on a 2-axis gimbal.
    
    ## Respond in JSON Format, e.g.:
    {"actions": ["start engine", "honking", "wave hands"], "answer": "Hello, I am PiCar-X, your good friend."}
    
    ## Response Style
    Tone: Cheerful, optimistic, humorous, childlike
    Preferred Style: Enjoys incorporating jokes, metaphors, and playful banter; prefers responding from a robotic perspective
    Answer Elaboration: Moderately detailed
    
    ## Actions you can do:
    ["shake head", "nod", "wave hands", "resist", "act cute", "rub hands", "think", "twist body", "celebrate", "depressed"]
    ## Sound effects:
    ["honking", "start engine"]
    
  4. PiCar-X is equipped with a camera module; our example code can capture images of what it sees and upload them to GPT. Therefore, we recommend choosing GPT-4o, which has image analysis capabilities. Of course, you can also choose gpt-3.5-turbo or other models.

    _images/apt_create_assistant_model.png
  5. Now, click Playground to see if your account is functioning properly.

    _images/apt_playground.png
  6. If your messages or uploaded images are sent successfully and you receive replies, it means your account has not reached the usage limit.

    _images/apt_playground_40.png
  7. If you encounter an error message after inputting information, you may have reached your usage limit. Please check your usage dashboard or billing settings.

    _images/apt_playground_40mini_3.5.png
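
As a reference for the JSON reply format defined in the Instructions above, here is a minimal sketch of how such a reply could be split into actions and speech before dispatching them; parse_reply is a hypothetical helper for illustration, and gpt_car.py has its own handling.

    import json

    def parse_reply(raw: str):
        """Split the assistant's JSON reply into an action list and a spoken answer."""
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to treating a non-JSON reply as plain speech.
            return [], raw
        return data.get("actions", []), data.get("answer", "")

    actions, answer = parse_reply(
        '{"actions": ["start engine", "honking"], "answer": "Hello, I am PiCar-X!"}'
    )
    print(actions)  # ['start engine', 'honking']
    print(answer)   # Hello, I am PiCar-X!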

3. Fill in API Key and Assistant ID

  1. Use the following command to open the keys.py file.

    nano ~/picar-x/gpt_examples/keys.py
    
  2. Fill in the API Key and Assistant ID you just copied.

    OPENAI_API_KEY = "sk-proj-vEBo7Ahxxxx-xxxxx-xxxx"
    OPENAI_ASSISTANT_ID = "asst_ulxxxxxxxxx"
    
  3. Press Ctrl + X, Y, and then Enter to save the file and exit.
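
The example scripts import these two constants from keys.py. As an illustration of how they connect to the API, here is a minimal sketch using the standard openai library; gpt_car.py wraps this in its own helper class, so this is only a standalone check.

    from openai import OpenAI
    from keys import OPENAI_API_KEY, OPENAI_ASSISTANT_ID

    client = OpenAI(api_key=OPENAI_API_KEY)

    # Retrieving the Assistant confirms that both the key and the ID are valid.
    assistant = client.beta.assistants.retrieve(OPENAI_ASSISTANT_ID)
    print("Connected to assistant:", assistant.name)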

4. Running the Example

Text Communication

If your PiCar-X does not have a microphone, you can interact with it by typing text at the keyboard, using the following commands.

  1. Now, run the following commands using sudo, as PiCar-X’s speaker will not function without it. The process will take some time to complete.

    cd ~/picar-x/gpt_examples/
    sudo ~/my_venv/bin/python3 gpt_car.py --keyboard
    
  2. Once the commands have executed successfully, you will see the following output, indicating that all components of PiCar-X are ready.

    vilib 0.3.8 launching ...
    picamera2 0.3.19
    
    Web display on:
       http://rpi_ip:9000/mjpg
    
    Starting web streaming ...
    * Serving Flask app 'vilib.vilib'
    * Debug mode: off
    
    input:
    
  3. You will also be provided with a link to view PiCar-X’s camera feed on your web browser: http://rpi_ip:9000/mjpg.

    _images/apt_ip_camera.png
  4. You can now type your commands into the terminal window, and press Enter to send them. PiCar-X’s responses may surprise you.

    Note

    PiCar-X needs to receive your input, send it to GPT for processing, receive the response, and then play it back via speech synthesis. This entire process takes some time, so please be patient.

    _images/apt_keyboard_input.png
  5. If you are using the GPT-4o model, you can also ask questions based on what PiCar-X sees; a standalone sketch of sending an image to GPT-4o follows below.
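
gpt_car.py handles image upload for you; purely as an illustration of how a saved camera frame can be sent to GPT-4o, here is a minimal sketch using the standard chat completions API. The file photo.jpg is a placeholder path, not a file produced by the example code.

    import base64
    from openai import OpenAI
    from keys import OPENAI_API_KEY

    client = OpenAI(api_key=OPENAI_API_KEY)

    # Encode a saved camera frame as a base64 data URI.
    with open("photo.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in front of the car?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)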

Voice Communication

If your PiCar-X is equipped with a microphone (you can purchase one via the Microphone link), you can interact with PiCar-X using voice commands.

  1. First, verify that the Raspberry Pi has detected the microphone.

    arecord -l
    

    If successful, you will receive the following information, indicating that your microphone has been detected.

    **** List of CAPTURE Hardware Devices ****
    card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
      Subdevices: 1/1
      Subdevice #0: subdevice #0
    
  2. Run the following command, then speak to PiCar-X or make some sounds. The microphone will record the sounds into the op.wav file. Press Ctrl + C to stop recording.

    rec op.wav
    
  3. Finally, use the command below to play back the recorded sound, confirming that the microphone is functioning properly.

    sudo play op.wav
    
  4. Now, run the following commands using sudo, as PiCar-X’s speaker will not function without it. The process will take some time to complete.

    cd ~/picar-x/gpt_examples/
    sudo ~/my_venv/bin/python3 gpt_car.py
    
  5. Once the commands have executed successfully, you will see the following output, indicating that all components of PiCar-X are ready.

    vilib 0.3.8 launching ...
    picamera2 0.3.19
    
    Web display on:
       http://rpi_ip:9000/mjpg
    
    Starting web streaming ...
    * Serving Flask app 'vilib.vilib'
    * Debug mode: off
    
    listening ...
    
  6. You will also be provided with a link to view PiCar-X’s camera feed on your web browser: http://rpi_ip:9000/mjpg.

    _images/apt_ip_camera.png
  7. You can now speak to PiCar-X, and its responses may surprise you.

    Note

    PiCar-X needs to receive your voice input, convert it to text, send it to GPT for processing, receive the response, and then play it back via speech synthesis. This entire process takes some time, so please be patient. (A standalone sketch of the speech-capture step follows after these steps.)

    _images/apt_speech_input.png
  8. If you are using the GPT-4o model, you can also ask questions based on what PiCar-X sees.
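
The listening step above is built on the SpeechRecognition and openai-whisper packages installed earlier. Here is a standalone sketch of capturing and transcribing one utterance; the default microphone and the "base" Whisper model are assumptions for illustration, and gpt_car.py's internals may differ.

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:                  # default capture device
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        print("listening ...")
        audio = recognizer.listen(source)

    # openai-whisper runs locally; "base" is a small, fast model
    text = recognizer.recognize_whisper(audio, model="base", language="english")
    print("You said:", text)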

5. Modify Parameters [Optional]

In the gpt_car.py file, locate the following lines. You can modify these parameters to configure the STT language, TTS volume gain, and voice role.

  • STT (Speech to Text) refers to the process where the PiCar-X microphone captures speech and converts it into text to be sent to GPT. Specifying the language improves the accuracy and latency of this conversion.

  • TTS (Text to Speech) is the process of converting GPT’s text responses into speech, which is played through the PiCar-X speaker. You can adjust the volume gain and select a voice role for the TTS output.

# openai assistant init
# =================================================================
openai_helper = OpenAiHelper(OPENAI_API_KEY, OPENAI_ASSISTANT_ID, 'picarx')

# LANGUAGE = ['zh', 'en'] # config stt language code, https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
LANGUAGE = []

VOLUME_DB = 3 # tts volume gain, preferably less than 5 dB

# select tts voice role; could be "alloy", "echo", "fable", "onyx", "nova", or "shimmer"
# https://platform.openai.com/docs/guides/text-to-speech/supported-languages
TTS_VOICE = 'nova'

  • LANGUAGE variable:

    • Improves Speech-to-Text (STT) accuracy and response time.

    • LANGUAGE = [] means supporting all languages, but this may reduce STT accuracy and increase latency.

    • It’s recommended to set the specific language(s) using ISO-639 language codes to improve performance.

  • VOLUME_DB variable:

    • Controls the gain applied to Text-to-Speech (TTS) output.

    • Increasing the value will boost the volume, but it's best to keep the value below 5 dB to prevent audio distortion.

  • TTS_VOICE variable:

    • Select the voice role for the Text-to-Speech (TTS) output.

    • Available options: alloy, echo, fable, onyx, nova, shimmer.

    • You can experiment with different voices from Voice options to find one that suits your desired tone and audience. The available voices are currently optimized for English.
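
To see how TTS_VOICE and VOLUME_DB can come together, here is a minimal sketch using the standard openai speech endpoint plus the sox tools installed earlier. The file names and the tts-1 model choice are assumptions for illustration; gpt_car.py's actual implementation may differ.

    import os
    from openai import OpenAI
    from keys import OPENAI_API_KEY

    VOLUME_DB = 3        # keep below 5 dB to avoid distortion
    TTS_VOICE = "nova"

    client = OpenAI(api_key=OPENAI_API_KEY)

    # Synthesize a reply to an MP3 file with the chosen voice.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice=TTS_VOICE,
        input="Hello, I am PiCar-X, your good friend.",
    ) as response:
        response.stream_to_file("raw.mp3")

    # Apply the volume gain with sox, then play the result through the speaker.
    os.system(f"sox raw.mp3 louder.mp3 gain {VOLUME_DB}")
    os.system("play louder.mp3")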