Note

Hello, welcome to the SunFounder Raspberry Pi & Arduino & ESP32 Enthusiasts Community on Facebook! Dive deeper into Raspberry Pi, Arduino, and ESP32 with fellow enthusiasts.

Why Join?

Expert Support: Solve post-sale issues and technical challenges with help from our community and team.
Learn & Share: Exchange tips and tutorials to enhance your skills.
Exclusive Previews: Get early access to new product announcements and sneak peeks.
Special Discounts: Enjoy exclusive discounts on our newest products.
Festive Promotions and Giveaways: Take part in giveaways and holiday promotions.

👉 Ready to explore and create with us? Click [here] and join today!

3. Train Your Custom YOLO Model

Training your own YOLO model essentially involves letting the deep learning algorithm learn how to identify specific objects from the image data you provide. This process can be analogized to teaching a child to recognize something new: you show them numerous example images from different angles and environments, telling them “this is the target object.” After sufficient examples, they can accurately identify that object in new images.

For YOLO, the training process works like this:

Data Preparation: Collect images containing the target objects and annotate the position and category of each object
Model Learning: The algorithm automatically learns the feature patterns of objects by analyzing this annotated data
Weight Generation: After training completes, generate a model file (.pt file) containing the learned knowledge
Inference Application: Deploy this model to the Raspberry Pi for detection on new images

Thanks to transfer learning, we don’t need to train from scratch. The Ultralytics platform provides pre-trained base models (such as YOLOv8n) that have been trained on millions of images. We only need to “fine-tune” these models with a small number of our own images to create effective custom models.

Capturing Photos

Since our YOLO project is based on the Raspberry Pi, we’ll use the Raspberry Pi camera to capture photos. For better results, we also used mobile phones to capture some photos to increase data diversity.

Photo Capture Tips

Clarity: Capture objects as clearly as possible, avoiding blurriness
Diversity: Capture photos from different angles (front, side, overhead, etc.) and under different lighting conditions (bright light, low light, backlight, etc.)
Background Variation: Try to capture images against different backgrounds to help the model learn the essential features of objects rather than backgrounds
Avoid Overlap: You can capture multiple objects simultaneously, but avoid significant overlap between objects
Quantity Recommendation: Aim for at least 50-100 photos per category; more images yield better results

What Object Should You Use?

You can choose any object you’re interested in to train, such as: a doll, a cup, a chair, or even your pet. This tutorial uses a snowman toy as an example; simply replace it with your own target object.

../_images/ultralytics_a1_capture_photo.png

Capturing Photos with the Raspberry Pi Camera

Here’s the code for capturing photos using the Raspberry Pi camera:

cd ~/ai-lab-kit/yolo
python3 yolo_capture_images.py

#!/usr/bin/env python3
"""
Simple camera capture script for Raspberry Pi
Press SPACE to capture, ESC to exit
Images saved to ./captured_images/
"""

from picamera2 import Picamera2
import cv2
import os
import time

# Create save directory
save_dir = "captured_images"
os.makedirs(save_dir, exist_ok=True)

# Initialize camera
picam2 = Picamera2()
picam2.preview_configuration.main.size = (640, 480)
picam2.preview_configuration.main.format = "RGB888"
picam2.configure("preview")
picam2.start()

# Wait for camera to warm up
time.sleep(1)

print("=== Camera Capture Tool ===")
print(f"Images will be saved to: {save_dir}")
print("Controls:")
print("  SPACE - Capture image")
print("  ESC   - Exit")
print("==========================")

count = 0

try:
   while True:
      # Capture frame
      frame = picam2.capture_array()

      # Display frame with instructions
      display = frame.copy()
      cv2.putText(display, f"Captured: {count} images", (10, 30),
                  cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
      cv2.putText(display, "Press SPACE to capture, ESC to exit", (10, 60),
                  cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 1)

      cv2.imshow("Camera Capture", display)

      # Wait for key press
      key = cv2.waitKey(1) & 0xFF

      if key == 32:  # SPACE key
            # Save image
            filename = f"{save_dir}/img_{count:04d}.jpg"
            cv2.imwrite(filename, frame)
            print(f"Captured: {filename}")
            count += 1

            # Optional: flash effect
            flash = frame.copy()
            flash[:] = (255, 255, 255)
            cv2.imshow("Camera Capture", flash)
            cv2.waitKey(50)

      elif key == 27:  # ESC key
            print(f"\nExiting. Total captured: {count} images")
            break

finally:
   cv2.destroyAllWindows()
   picam2.stop()
   print("Camera stopped")

Transferring Images to Your Computer

After capturing, use FileZilla Software to download the images from the Raspberry Pi to your computer:

Check the IP address on your Raspberry Pi: hostname -I
Connect to the Raspberry Pi in FileZilla (username: pi, password: your password)
Navigate to the ~/ai-lab-kit/yolo/captured_images/ directory
Download all images to your computer

Training the Model

We’ll use the online Ultralytics Platform. This platform provides convenient model training services without the need to configure complex training environments.

Registration and Login

Click Get started in the upper right corner to access the registration page and complete the sign-up process.

Creating a Dataset

After registration, you’ll be taken to the homepage. Click New Dataset to create a new dataset.

../_images/ultralytics_3_new_dataset.png

A window will pop up. Here you can upload the photos you just captured with your Raspberry Pi and enter a Dataset name. Then click Create & upload.

../_images/ultralytics_4_create_dataset.png

You’ll now enter the dataset interface, where you can see all uploaded images.

Annotating Images

Open each photo to annotate. Use the +Add Class button on the right to add categories. Add the appropriate category name based on the object you want to identify (for example: if training to recognize a cup, add “cup”; if training to recognize a pet, add “pet”).

Annotation Tips: - Use the mouse to draw bounding boxes around objects, keeping them as close to object edges as possible - Ensure each object is correctly annotated - If an image contains no target objects, no annotation is needed

Repeat the above steps until all photos are annotated. Check that annotations on each image are accurate.

Creating a Training Model

Click Models, then click New Model.

In the pop-up window, select YOLOv8n or YOLO11n as the Base Model. These are nano versions suitable for Raspberry Pi, offering small size and fast speed.

Configure training parameters:
- Image size: Select 320 (this is the image size that the Raspberry Pi can efficiently process)
- Epochs: Keep the default (typically 50-100 epochs)
- GPU Type: No specific requirement, but different GPU types affect training speed and cost
Note: New Ultralytics accounts come with $5 in free credits; training a small model typically costs only a few cents, use as needed.

Click Start Training. Wait for a period (usually 10-30 minutes, depending on data volume and GPU), and the model will complete training.

During training, you can see real-time metrics:
- box_loss: Bounding box loss; smaller values are better
- cls_loss: Classification loss; smaller values are better
- mAP: Mean Average Precision; higher values are better (0-1 range)

Downloading and Deploying

After training completes, click Download PyTorch Model to download the trained model (it will be a .pt file).

../_images/ultralytics_10_download_model.png

After downloading, use FileZilla to transfer it to your Raspberry Pi (recommended to place it in the ~/ai-lab-kit/yolo/ directory).

Running the Custom Model

After placing the model on your Raspberry Pi, you need to modify the model path in the example code. Here’s a complete running example

cd ~/ai-lab-kit/yolo
nano yolo_custom.py

replace the model filename with your own downloaded file:

#!/usr/bin/env python3
import cv2
from picamera2 import Picamera2
from ultralytics import YOLO

model = YOLO("your_model.pt")  # Replace with your model filename

# initialize camera
picam2 = Picamera2()
picam2.preview_configuration.main.size = (640, 480)
picam2.preview_configuration.main.format = "RGB888"
picam2.configure("preview")
picam2.start()

print("YOLO start, Press 'q' to exit...")

try:
   while True:
      # capture frame
      frame = picam2.capture_array()

      # run YOLO and set imgsz=320
      results = model(frame, imgsz=320)

      # draw results
      annotated = results[0].plot()

      # show results
      cv2.imshow("YOLO on Raspberry Pi", annotated)

      # press 'q' to exit
      if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
   cv2.destroyAllWindows()
   picam2.stop()
   print("exit")

Verifying Results

Run the example code to observe how the YOLO model performs on your Raspberry Pi:
```
python3 yolo_custom.py
```
If everything works correctly, you should see your trained target object framed by a bounding box in the camera feed, with the category name and confidence score displayed.

Congratulations! You have successfully trained your own YOLO model and deployed it on the Raspberry Pi.

Training Tips and Recommendations

Improving Model Performance

Increase Data Volume: Aim for at least 50-100 images per category
Data Augmentation: Proactively vary angles, distances, and lighting during capture
Negative Samples: Include some images without target objects to help reduce false positives
Balanced Dataset: If identifying multiple categories, ensure similar image counts for each category

Common Questions

Q: What if model detection results are unsatisfactory?

Check annotation accuracy
Increase the number of training images
Try larger models (like YOLOv8s) or more training epochs
Capture more images from different scenarios

Q: How long does training take?

With approximately 50 images and YOLOv8n, training typically takes 10-20 minutes
The platform automatically adjusts based on the selected GPU

Q: Can I train locally?

Yes, but you’ll need to configure the Python environment and GPU drivers. For beginners, the Ultralytics platform is recommended for quickly validating ideas.