7. Human Pose Estimation
1. Overview
Following the implementation of hand and gesture recognition, this chapter introduces MediaPipe Pose, a lightweight yet powerful real-time human pose estimation module.
Using MediaPipe Pose, we can detect 33 body landmarks in real time and draw the full-body skeleton on the video feed.
This module can be used for:
Action recognition
Posture correction
Fitness monitoring
Motion analysis
2. How It Works
The program performs the following steps:
Initialize the MediaPipe Pose model (configure model complexity and optional segmentation).
Capture video frames using Picamera2.
Convert frames to RGB format (required by MediaPipe).
Run the Pose model to obtain 33 body keypoints.
Draw keypoints and skeleton connections using OpenCV.
Display the annotated video stream in real time.
This chapter lays the foundation for more advanced human-computer interaction and body motion analysis tasks.
3. Run the Code
Important
Before you start, make sure:
The pan-tilt is assembled
You can access the Raspberry Pi desktop
The code package is installed
Fusion HAT+ is installed and configured
OpenCV is installed
For detailed instructions, see 0. Setup OpenCV.
Open the terminal and enter the following command:
sudo python3 ~/ai-lab-kit/mediapipe/mp_pose.py
If you want to use MediaPipe Pose with a recorded video, you can run the following command:
sudo python3 ~/ai-lab-kit/mediapipe/mp_pose_video.py
After running the program, a window titled "Show Video" opens and displays the live camera feed.
When a person appears in front of the camera:
MediaPipe Pose detects 33 body landmarks in real time.
A full-body skeleton is drawn on the video frame.
Key joints such as shoulders, elbows, wrists, hips, knees, and ankles are connected with lines.
As the person moves:
The skeletal keypoints follow the body motion smoothly.
The skeleton updates continuously in real time.
If background segmentation is enabled (enable_segmentation=True), the model internally computes a segmentation mask, although in this example only the skeleton is displayed.
If no person is detected, the program simply shows the normal camera feed without annotations.
Press q to exit the program. The camera stops and the OpenCV window closes automatically.
4. Complete Code
Here is a basic human pose detection program:
from picamera2 import Picamera2, Preview
import cv2
import mediapipe.python.solutions.pose as mp_pose
import mediapipe.python.solutions.drawing_utils as drawing
import mediapipe.python.solutions.drawing_styles as drawing_styles

# Initialize the Pose model
pose = mp_pose.Pose(
    static_image_mode=False,   # False for processing video streams
    model_complexity=2,        # 0~2, higher is more accurate
    enable_segmentation=True,  # Enable background segmentation (optional)
)

# Open the camera
picam2 = Picamera2()
config = picam2.create_preview_configuration(
    main={"size": (640, 480), "format": "XRGB8888"},
)
picam2.configure(config)
picam2.start()

print("Streaming... press 'q' to quit")

while True:
    frame_bgra = picam2.capture_array()
    frame_bgr = cv2.cvtColor(frame_bgra, cv2.COLOR_BGRA2BGR)

    # Convert BGR to RGB (required by MediaPipe)
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

    # Pose detection
    results = pose.process(frame_rgb)

    # Convert RGB back to BGR (required by OpenCV)
    frame = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2BGR)

    # If human body is detected, draw skeleton
    if results.pose_landmarks:
        drawing.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp_pose.POSE_CONNECTIONS,
            landmark_drawing_spec=drawing_styles.get_default_pose_landmarks_style(),
        )

    cv2.imshow("Show Video", frame)
    if cv2.waitKey(1) & 0xff == ord('q'):
        break

picam2.stop_preview()
picam2.stop()
cv2.destroyAllWindows()
After running the program, the camera feed will display a real-time human skeleton, including:
33 keypoints
Skeleton connection lines
A skeleton that follows the person's movements
5. Code Explanation
1. Import Libraries
from picamera2 import Picamera2, Preview
import cv2
import mediapipe.python.solutions.pose as mp_pose
import mediapipe.python.solutions.drawing_utils as drawing
import mediapipe.python.solutions.drawing_styles as drawing_styles
Picamera2: Controls the Raspberry Pi camera, based on libcamera.
cv2 (OpenCV): Used for image color space conversion (BGR↔RGB), display windows, and drawing graphics.
mediapipe.python.solutions.pose: MediaPipe's Pose model, which can detect 33 full-body keypoints (head, shoulders, elbows, knees, etc.) and can return segmentation masks (human vs. background).
drawing_utils / drawing_styles: MediaPipe's built-in drawing tools and style definitions, used for drawing keypoints and skeleton lines.
2. Initialize Pose Model
pose = mp_pose.Pose(
    static_image_mode=False,  # Continuous video mode
    model_complexity=1,
    enable_segmentation=True,
)
static_image_mode=False: Indicates that the input is a continuous video stream rather than a single image. After the initial detection the model tracks the landmarks between frames, which is faster. Usually set to False.
model_complexity=1: Model complexity: 0 = light, 1 = medium, 2 = high accuracy (slower). Set to 1 or 2 if Raspberry Pi performance allows. Note that the complete listing in section 4 sets this value to 2, while this snippet uses 1, the value the Troubleshooting section recommends when 2 is slow or unstable.
enable_segmentation=True: Outputs a human segmentation mask that separates the foreground person from the background. When True, it enables effects such as background replacement and chroma keying. This usage is explained in a later chapter: 9. Green Screen
MediaPipe Pose returns a result structure including:
pose_landmarks: 33 keypoints
pose_world_landmarks: 3D world coordinates
segmentation_mask: Human segmentation map
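As a rough illustration of how this result structure might be consumed, the sketch below filters landmarks by their visibility score. It uses a hypothetical stand-in `Landmark` tuple so it runs without a camera; in the real program the equivalent sequence would be `results.pose_landmarks.landmark`. The helper name `visible_indices` and the 0.5 threshold are assumptions made for illustration and are not part of MediaPipe.

```python
from typing import List, NamedTuple, Sequence


class Landmark(NamedTuple):
    """Hypothetical stand-in with the same fields as a MediaPipe pose landmark."""
    x: float
    y: float
    z: float
    visibility: float


def visible_indices(landmarks: Sequence[Landmark], threshold: float = 0.5) -> List[int]:
    """Return the indices of landmarks whose visibility exceeds the threshold."""
    return [i for i, lm in enumerate(landmarks) if lm.visibility > threshold]


# Three made-up landmarks; the second one is mostly hidden.
sample = [
    Landmark(0.50, 0.20, -0.10, 0.99),
    Landmark(0.40, 0.35, -0.05, 0.20),
    Landmark(0.60, 0.35, -0.05, 0.85),
]
print(visible_indices(sample))  # [0, 2]
```

Filtering like this can help avoid acting on joints that the model is only guessing at, for example legs hidden behind a desk.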
3. Open Camera
picam2 = Picamera2()
config = picam2.create_preview_configuration(
    main={"size": (640, 480), "format": "XRGB8888"},
)
picam2.configure(config)
#picam2.start_preview(Preview.QTGL)
picam2.start()
Create the camera object with Picamera2().
Set the resolution to 640x480 and the pixel format to "XRGB8888" (4 channels, delivered to Python in BGRA order). This format works well with OpenCV and avoids extra decoding steps.
Start the camera.
Optional:
picam2.start_preview(Preview.QTGL) can display the video stream in a GPU-accelerated window; it is commented out here, and OpenCV's imshow() is used instead.
4. Main Loop: Process Each Frame
while True:
    frame_bgra = picam2.capture_array()  # Capture a frame from the camera (BGRA format)
    frame_bgr = cv2.cvtColor(frame_bgra, cv2.COLOR_BGRA2BGR)
Capture the current frame. With the "XRGB8888" format configured above, Picamera2 returns each frame as a 4-channel BGRA (Blue, Green, Red + Alpha) array.
Convert to BGR for subsequent OpenCV processing.
    # Convert to RGB for MediaPipe
    frame = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    results = pose.process(frame)
MediaPipe models must use RGB.
Call pose.process() for keypoint detection. results is an object that may contain:
results.pose_landmarks: Keypoints (33 points)
results.pose_world_landmarks: 3D coordinates
results.segmentation_mask: Segmentation mask
    # Convert back to BGR for OpenCV display
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
Convert back because OpenCV's imshow() requires BGR order.
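To make the three conversions in this loop concrete, here is a minimal sketch of what they do to a single pixel, written in plain Python rather than OpenCV so it runs anywhere. The helper names `bgra_to_bgr` and `swap_red_blue` are hypothetical and exist only for this illustration; the real program uses cv2.cvtColor on whole frames.

```python
def bgra_to_bgr(pixel):
    """Drop the fourth (padding/alpha) channel: [B, G, R, A] -> [B, G, R]."""
    return pixel[:3]


def swap_red_blue(pixel):
    """Reverse the channel order: BGR -> RGB, or RGB -> BGR (the same operation)."""
    return pixel[::-1]


# A mostly red pixel as captured from the camera: B=10, G=20, R=200, A=255.
pixel_bgra = [10, 20, 200, 255]

pixel_bgr = bgra_to_bgr(pixel_bgra)    # what COLOR_BGRA2BGR does per pixel
pixel_rgb = swap_red_blue(pixel_bgr)   # what COLOR_BGR2RGB does per pixel
pixel_back = swap_red_blue(pixel_rgb)  # what COLOR_RGB2BGR does per pixel

print(pixel_bgr, pixel_rgb, pixel_back)
# [10, 20, 200] [200, 20, 10] [10, 20, 200]
```

The round trip ends with the same values as the BGR frame it started from, so drawing directly on frame_bgr would likely work as well and save one conversion per frame.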
5. Draw Pose Keypoints
    if results.pose_landmarks:
        drawing.draw_landmarks(
            frame,
            results.pose_landmarks,
            mp_pose.POSE_CONNECTIONS,
            landmark_drawing_spec=drawing_styles.get_default_pose_landmarks_style(),
        )
If a human body is detected:
results.pose_landmarks: Contains (x, y, z, visibility) for each keypoint.
x, y: Normalized coordinates (0~1)
z: Relative depth
visibility: Keypoint confidence (0~1)
draw_landmarks parameter explanation:
frame: Image to draw on (BGR format)
results.pose_landmarks: Human keypoints for the current frame
mp_pose.POSE_CONNECTIONS: Connection rules (which points to connect with lines)
landmark_drawing_spec: Point drawing style
connection_drawing_spec: Line drawing style (can be omitted; the default style is used)
Effect: Draws the skeleton (connections for head, arms, legs) and keypoints (joint positions) on the image.
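Because x and y are normalized, drawing your own markers or measuring distances usually requires pixel coordinates first. The sketch below shows one way to do the scaling; `to_pixel` is a hypothetical helper name, and the clamping is an assumption added because landmark values can fall slightly outside 0~1 when a body part is near or past the frame edge.

```python
from typing import Tuple


def to_pixel(x_norm: float, y_norm: float, width: int, height: int) -> Tuple[int, int]:
    """Scale normalized coordinates to integer pixel coordinates, clamped to the frame."""
    px = min(max(int(x_norm * width), 0), width - 1)
    py = min(max(int(y_norm * height), 0), height - 1)
    return px, py


# With the 640x480 frame used in this chapter:
print(to_pixel(0.5, 0.25, 640, 480))  # (320, 120)
print(to_pixel(1.2, -0.1, 640, 480))  # (639, 0)
```

In the main loop, the inputs would come from a landmark such as results.pose_landmarks.landmark[0] (the nose) together with the frame width and height.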
6. Display Frame & Exit Logic
    cv2.imshow("Show Video", frame)
    if cv2.waitKey(1) & 0xff == ord('q'):
        break
Display each frame in the "Show Video" window.
Exit the loop when the "q" key is pressed.
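The `& 0xff` mask in the exit check keeps only the low 8 bits of the value returned by cv2.waitKey(), which can carry extra high bits on some platforms. A small sketch of the same comparison as a standalone function, where `is_quit_key` is a hypothetical name used only for illustration:

```python
def is_quit_key(key_code: int, quit_char: str = "q") -> bool:
    """Return True when the low 8 bits of key_code match quit_char."""
    return (key_code & 0xFF) == ord(quit_char)


print(is_quit_key(ord("q")))             # True
print(is_quit_key(-1))                   # False: waitKey returns -1 when no key was pressed
print(is_quit_key(0x100000 | ord("q")))  # True: extra high bits are masked off
```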
7. Release Resources
picam2.stop_preview()
picam2.stop()
cv2.destroyAllWindows()
Stop preview, release camera, close all OpenCV windows.
6. Pose Model Introduction
The MediaPipe Pose module returns 33 keypoints, covering areas like the head, torso, arms, and legs:
| Body Part | Index |
|---|---|
| Nose | 0 |
| Left/Right Shoulder | 11 / 12 |
| Left/Right Elbow | 13 / 14 |
| Left/Right Wrist | 15 / 16 |
| Left/Right Hip | 23 / 24 |
| Left/Right Knee | 25 / 26 |
| Left/Right Ankle | 27 / 28 |
| Left/Right Foot Index | 31 / 32 |
These points can be used for posture judgment, action counting (e.g., squats, push-ups, yoga pose detection), etc.
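A common building block for posture judgment and action counting is the angle at a joint, for example the knee angle formed by the hip (23), knee (25), and ankle (27). The following is a minimal sketch of that calculation on plain (x, y) pairs; `joint_angle` is a hypothetical helper, and the sample coordinates are made up rather than taken from a real detection.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Angle at point b, in degrees, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0  # degenerate case: two of the points coincide
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


hip, knee, ankle = (0.50, 0.50), (0.50, 0.70), (0.50, 0.90)
print(round(joint_angle(hip, knee, ankle)))         # 180 (straight leg)
print(round(joint_angle(hip, knee, (0.70, 0.70))))  # 90 (bent at a right angle)
```

Normalized x and y are scaled by different amounts on a 640x480 frame, so converting to pixel coordinates before computing angles should give more faithful results. A squat counter could then count one repetition each time the knee angle drops below a chosen threshold and rises back above it.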
7. Performance and Tuning
| Item | Impact | Optimization Suggestion |
|---|---|---|
| Resolution | Higher resolution increases accuracy but also latency | Use 640x480 to balance accuracy and speed |
| model_complexity | Higher values improve recognition accuracy but slow computation | Recommended 1~2 for Raspberry Pi |
| segmentation | Increases GPU/CPU load | Disable if background replacement is not needed |
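To judge whether a tuning change actually helps, it is useful to measure the frame rate before and after. The sketch below is one possible rolling FPS meter using only the standard library; the class name `FpsMeter` and the window size are assumptions for illustration, and the optional `now` argument exists only so the example can be run with fixed timestamps.

```python
import time
from collections import deque
from typing import Optional


class FpsMeter:
    """Rolling average of frames per second over the last `window` frames."""

    def __init__(self, window: int = 30):
        self.stamps = deque(maxlen=window)

    def tick(self, now: Optional[float] = None) -> float:
        """Record a frame timestamp and return the current FPS estimate."""
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0


# Simulated frames arriving every 0.1 s:
meter = FpsMeter(window=5)
for t in (0.0, 0.1, 0.2, 0.3):
    fps = meter.tick(now=t)
print(round(fps, 1))  # 10.0
```

In the main loop, calling meter.tick() once per frame (without arguments) and printing the value occasionally would show the effect of changing the resolution, model_complexity, or segmentation settings.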
8. Troubleshooting
No human detected
If the program runs but no person is detected, make sure the entire body is inside the camera frame. Avoid strong backlight and improve lighting conditions. Keep a distance of about 1 to 2 meters from the camera for best results.
Video is slow or lagging
If the frame rate is low, try reducing the resolution to 640×480 or lower. Set model_complexity = 1 for better performance. Disable segmentation if it is not required, and close other background programs to free system resources.
Segmentation fault occurs
Most segmentation faults are caused by a mismatch between the system architecture and the installed MediaPipe wheel.
Check your system architecture:
uname -m
The output should be aarch64.
If you see armv7l or armhf, you are using 32-bit Raspberry Pi OS, which is not compatible with the official MediaPipe wheel.
You can also verify in Python:
import platform
print(platform.machine())
The result must also be aarch64.
Using aarch64 but still getting segmentation fault
This may happen if some TensorFlow Lite XNNPACK kernels are not fully compatible with your MediaPipe build.
Possible solutions:
Use model_complexity = 1 (recommended in this tutorial).
Make sure MediaPipe is installed in the correct virtual environment.
Install a Raspberry Pi-optimized wheel such as mediapipe-bin (PINTO0309 version).
model_complexity = 2 crashes but 1 works
Complexity 2 loads a larger model that may trigger advanced CPU optimizations. On Raspberry Pi, some optimized TensorFlow Lite kernels may not be fully supported. Complexity 1 avoids those kernels and is generally more stable and faster on Raspberry Pi.
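As a complement to the architecture checks above, the sketch below also reports the pointer width of the running Python interpreter. This is an assumption-driven extra check: on a system with a 64-bit kernel but a 32-bit userland, the machine string may read aarch64 while Python itself is 32-bit. The function names `describe_runtime` and `looks_64bit_arm` are hypothetical.

```python
import platform
import struct


def describe_runtime() -> dict:
    """Collect the machine string and the pointer width of the running interpreter."""
    return {
        "machine": platform.machine(),
        "python_bits": struct.calcsize("P") * 8,
    }


def looks_64bit_arm(info: dict) -> bool:
    """True when both the reported architecture and the interpreter are 64-bit ARM."""
    return info["machine"] == "aarch64" and info["python_bits"] == 64


info = describe_runtime()
verdict = "OK" if looks_64bit_arm(info) else "check your OS image or Python install"
print(info, verdict)
```

Run it with the same interpreter (and virtual environment) that you use for the MediaPipe scripts, since a different Python installation could report different values.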
9. Summary
This chapter implemented real-time human skeleton detection based on MediaPipe Pose.
Pose provides 33 keypoints, usable in fields like fitness, posture analysis, and action recognition.
By adjusting resolution and model complexity, smooth operation can be achieved on Raspberry Pi.
Based on these keypoints, we can subsequently develop:
Action recognition (e.g., "raising hand", "squatting")
Posture assessment (e.g., "Is sitting posture correct?")
Human interactive control
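As a starting point for the "raising hand" idea, the following sketch checks whether either wrist is above its shoulder, using the landmark indices listed in section 6 (shoulders 11/12, wrists 15/16). It runs on hypothetical stand-in landmarks; `Landmark`, `hand_raised`, `make_pose`, and the 0.5 visibility threshold are all assumptions for illustration. In image coordinates, y grows downward, so "above" means a smaller y value.

```python
from typing import List, NamedTuple, Sequence

LEFT_SHOULDER, RIGHT_SHOULDER = 11, 12
LEFT_WRIST, RIGHT_WRIST = 15, 16


class Landmark(NamedTuple):
    """Hypothetical stand-in with the same fields as a MediaPipe pose landmark."""
    x: float
    y: float
    z: float
    visibility: float


def hand_raised(landmarks: Sequence[Landmark], min_visibility: float = 0.5) -> bool:
    """True if either wrist is above its shoulder (smaller y is higher in the image)."""
    for wrist_i, shoulder_i in ((LEFT_WRIST, LEFT_SHOULDER), (RIGHT_WRIST, RIGHT_SHOULDER)):
        wrist, shoulder = landmarks[wrist_i], landmarks[shoulder_i]
        if min(wrist.visibility, shoulder.visibility) < min_visibility:
            continue  # skip joints the model is unsure about
        if wrist.y < shoulder.y:
            return True
    return False


def make_pose(left_wrist_y: float, right_wrist_y: float) -> List[Landmark]:
    """Build 33 placeholder landmarks with shoulders at y=0.4 and the given wrist heights."""
    pose = [Landmark(0.5, 0.5, 0.0, 1.0)] * 33
    pose[LEFT_SHOULDER] = Landmark(0.6, 0.4, 0.0, 1.0)
    pose[RIGHT_SHOULDER] = Landmark(0.4, 0.4, 0.0, 1.0)
    pose[LEFT_WRIST] = Landmark(0.65, left_wrist_y, 0.0, 1.0)
    pose[RIGHT_WRIST] = Landmark(0.35, right_wrist_y, 0.0, 1.0)
    return pose


print(hand_raised(make_pose(0.7, 0.7)))  # False: both wrists below the shoulders
print(hand_raised(make_pose(0.2, 0.7)))  # True: left wrist above the left shoulder
```

With a live detection, the same function could be called with results.pose_landmarks.landmark whenever results.pose_landmarks is not None.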