This is the Handpose ROS Integration Project.
The system reconstructs hand joint coordinate frames from the 21 hand landmarks produced by Google MediaPipe.
It includes both a standalone script version and a ROS 2 version.
Demo video: handpose-git-demo-video.mp4
This package integrates MediaPipe Hands with ROS 2 (Humble).
It detects hand landmarks from a camera stream, scales them into canonical and world coordinates, builds per-joint coordinate frames, and publishes them into the ROS TF system for downstream robotics applications (e.g., teleoperation, grasp planning).
- License: Apache 2.0
This project does not use a depth camera, so the Z value of the wrist is always zero.
If you need the true Z value, use a depth module to measure the distance from the camera to the wrist, then add that depth value to the points of all joints.
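
The depth correction described above can be sketched as follows. This is a minimal illustration, not code from this package: the function name `add_wrist_depth` and the value 0.45 m are hypothetical, and the wrist depth is assumed to come from your own depth module.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def add_wrist_depth(points_m: List[Point], wrist_depth_m: float) -> List[Point]:
    """Shift every landmark along camera Z by the measured wrist depth.

    points_m: wrist-relative landmarks in metres (wrist Z is 0).
    wrist_depth_m: camera-to-wrist distance from a depth sensor (assumed input).
    """
    return [(x, y, z + wrist_depth_m) for (x, y, z) in points_m]


# Two example points: the wrist and a landmark 2 cm closer to the camera.
hand = [(0.0, 0.0, 0.0), (0.01, -0.08, -0.02)]
shifted = add_wrist_depth(hand, wrist_depth_m=0.45)
```

Because the same offset is added to every point, the relative shape of the hand is unchanged; only its distance from the camera is.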
- MediaPipe Hands is used to get 21 hand landmarks per hand.
- Landmarks are published as normalized coordinates (x, y, z).
- Normalized coordinates are multiplied by image width/height to obtain canonical pixel space coordinates.
- To approximate metric scale, the wrist–index MCP distance is assumed to be 0.08 m (80 mm), based on the author's hand.
- This scaling is applied uniformly, so the hand size remains constant regardless of its position on the screen.
- The resulting transforms are suffixed with world_abs.
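
The scaling steps above can be sketched in plain Python. This is an illustration under assumptions, not the package's actual code: the function names are hypothetical, Z is multiplied by the image width (MediaPipe documents Z as using roughly the same scale as X), and world points are expressed relative to the wrist. The landmark indices 0 (wrist) and 5 (index MCP) follow the MediaPipe Hands landmark model.

```python
import math

WRIST, INDEX_MCP = 0, 5  # MediaPipe Hands landmark indices


def to_canonical(norm, width, height):
    """Normalized (x, y, z) -> pixel-space coordinates."""
    # Assumption: z is scaled by the image width, like x.
    return [(x * width, y * height, z * width) for (x, y, z) in norm]


def to_world(canon, target_length_m=0.08, eps=1e-6):
    """Scale pixel-space points so wrist -> index MCP equals target_length_m."""
    wx, wy, wz = canon[WRIST]
    ref_px = math.dist(canon[WRIST], canon[INDEX_MCP])
    scale = target_length_m / max(ref_px, eps)  # eps guards against division by zero
    # Assumption: points are expressed relative to the wrist.
    return [((x - wx) * scale, (y - wy) * scale, (z - wz) * scale)
            for (x, y, z) in canon]


# Only the first six landmarks are filled in for this example.
norm = [(0.5, 0.5, 0.0)] * 5 + [(0.5, 0.25, 0.0)]
canon = to_canonical(norm, width=640, height=480)
world = to_world(canon)
```

In this example the index MCP ends up 0.08 m from the wrist, whatever the image resolution or the hand's position in the frame.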
- Each landmark is just a point, so local coordinate systems must be defined.
- Palm direction → Y axis
- Middle finger MCP direction → X axis
- Z axis is defined as the cross product, forming a right-handed system.
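
One possible way to build such a wrist frame is sketched below. It is not taken from this package: "palm direction" is interpreted here as the palm normal, computed from the wrist, index MCP, and pinky MCP, and the X axis is made orthogonal to Y before the cross product. All helper names are hypothetical.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def wrist_frame(wrist, index_mcp, middle_mcp, pinky_mcp):
    """Return unit axes (x, y, z) of a right-handed wrist frame."""
    # Y: palm normal (assumed meaning of "palm direction").
    y = normalize(cross(sub(index_mcp, wrist), sub(pinky_mcp, wrist)))
    # X: direction to the middle finger MCP with its Y component removed.
    v = sub(middle_mcp, wrist)
    d = dot(v, y)
    x = normalize((v[0] - d * y[0], v[1] - d * y[1], v[2] - d * y[2]))
    # Z: cross product X x Y completes the right-handed system.
    z = cross(x, y)
    return x, y, z


x_axis, y_axis, z_axis = wrist_frame(
    wrist=(0.0, 0.0, 0.0),
    index_mcp=(1.0, 0.5, 0.0),
    middle_mcp=(1.0, 0.0, 0.0),
    pinky_mcp=(1.0, -0.5, 0.0),
)
```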
- Each finger has joints: MCP → PIP → DIP → TIP.
- For each joint:
- Project onto wrist XZ plane to determine Y axis.
- Joint-to-joint vector defines X axis.
- Z axis is set by cross product.
- For thumb MCP, an additional ~60° rotation about X axis is applied to better align with human thumb kinematics.
- All transforms are published into ROS TF tree.
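
The extra thumb rotation can be illustrated with a rotation about the frame's own X axis. This is a sketch under assumptions: the frame is represented as a 3x3 rotation matrix whose columns are the X, Y, and Z axes, the sign of the 60° angle is a guess, and the helper names are hypothetical.

```python
import math


def rot_x(deg):
    """Rotation matrix for a rotation of `deg` degrees about the X axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate_about_local_x(frame, deg):
    """Post-multiply so the rotation is about the frame's own X axis."""
    return matmul(frame, rot_x(deg))


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
thumb_mcp = rotate_about_local_x(identity, 60.0)
```

Post-multiplying leaves the X axis (the joint-to-joint direction) untouched and only turns Y and Z around it.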
Ubuntu 22.04.6 LTS
- Ubuntu 22.04 (ROS 2 Humble)
- RealSense camera or Leap Motion (TBD)
Using Python Packages with ROS 2
source {PATH_OF_YOUR_VIRTUAL_ENV}/bin/activate
pip install -r requirements.txt

The HandPose package is required to run this project.
cd ~
git clone https://github.com/DaeyunJang/Mediapipe-Hand-ROS2.git
# activate python environment
source {PATH_OF_YOUR_VIRTUAL_ENV}/bin/activate
# build dependent package
cd ~/Mediapipe-Hand-ROS2
colcon build
source ./install/setup.bash
ros2 launch handpose_ros handpose_launch.py

handpose_interfaces/Hands
| Field | Type | Description |
|---|---|---|
| hands | array<HandLandmarks> | Multiple detected hands |
handpose_interfaces/HandLandmarks
| Field | Type | Description |
|---|---|---|
| id | int | Hand index |
| label | string | left / right |
| score | float | Detection confidence |
| handed_index | int | Mediapipe internal index |
| width | int | Input image width |
| height | int | Input image height |
| landmarks_norm | float[] | Normalized landmarks |
| landmarks_canon | float[] | Canonical (pixel) landmarks |
| landmarks_world | float[] | World (metric) landmarks |
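
A consumer of this message will usually want the landmark arrays as a list of points. The helper below is a sketch that assumes each float[] field is flattened as x0, y0, z0, x1, y1, z1, ...; that layout is not confirmed here, so check the message definition before relying on it. The name `unflatten_landmarks` is hypothetical.

```python
def unflatten_landmarks(flat, dims=3):
    """Group a flat float sequence into (x, y, z) tuples.

    Assumption: values are stored point by point, `dims` floats per landmark.
    """
    if len(flat) % dims != 0:
        raise ValueError("array length is not a multiple of dims")
    return [tuple(flat[i:i + dims]) for i in range(0, len(flat), dims)]


points = unflatten_landmarks([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
```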
| Topic | Msg type | Description |
|---|---|---|
| /hands/detections | handpose_interfaces/Hands | Detected hands (landmarks + metadata) |
| /hands/points | sensor_msgs/PointCloud2 | All hand landmarks as PointCloud |
| /hands/points/hand_left | sensor_msgs/PointCloud2 | Left hand landmarks |
| /hands/points/hand_right | sensor_msgs/PointCloud2 | Right hand landmarks |
| /mp_overlay_image | sensor_msgs/Image | Debug overlay image |
| /tf | tf2_msgs/TFMessage | TF transforms of wrist & joints |
| hand_{label}_{finger}_{joint}_{suffix} | tf2_msgs/TFMessage | Per-joint TFs (e.g. hand_left_index_mcp_world_abs) |
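
When looking up these transforms from another node, the frame name can be assembled from its parts. The helper below is a small sketch; the function name is hypothetical, and the finger names "ring" and "pinky" are assumed spellings that should be checked against the actual TF tree.

```python
FINGERS = ("thumb", "index", "middle", "ring", "pinky")  # "ring"/"pinky" are assumed names
JOINTS = ("mcp", "pip", "dip", "tip")


def frame_name(label, finger, joint, suffix="world_abs"):
    """Build a TF frame name following hand_{label}_{finger}_{joint}_{suffix}."""
    return f"hand_{label}_{finger}_{joint}_{suffix}"


# All joint frames of the left index finger, from MCP to TIP.
left_index_frames = [frame_name("left", "index", j) for j in JOINTS]
```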
See handpose_ros/config/params.yaml for the parameter definitions.
mediapipe_hands_node
| Parameter | Type | Default value | Description |
|---|---|---|---|
| image_topic | string | /camera/camera/color/image_raw | Input RGB image topic |
| max_num_hands | int | 2 | Max hands to detect |
| min_detection_confidence | float | 0.95 | Detection confidence threshold |
| min_tracking_confidence | float | 0.95 | Tracking confidence threshold |
| draw | bool | False | Overlay debug image |
| flip_image | bool | True | Flip input image horizontally |
| use_pointcloud | bool | True | Publish PointCloud2 |
handpose_tf_broadcaster
| Parameter | Type | Default value | Description |
|---|---|---|---|
| hands_topic | string | hands/detections | Input landmark topic |
| use_depth | bool | False | Use depth info |
| depth_topic | string | /camera/aligned_depth_to_color/image_raw | Depth image topic |
| camera_info_topic | string | /camera/camera/color/camera_info | Camera info topic |
| camera_frame | string | camera_color_optical_frame | TF frame name |
| tf.norm.enable | bool | False | Enable normalized TF |
| tf.canonical.enable | bool | False | Enable canonical TF |
| tf.canonical_norm.enable | bool | True | Enable canonical normalized TF |
| tf.canonical_norm.scale | float | 1/1280 | Scale factor |
| tf.world_absolute_scale.enable | bool | True | Enable world absolute scaling |
| tf.world_absolute_scale.target_length | float | 0.06 | Wrist–MCP reference length (m) |
| tf.world_absolute_scale.finger_name | string | index | Reference finger |
| tf.world_absolute_scale.joint_name | string | mcp | Reference joint |
| tf.world_absolute_scale.eps | float | 1e-6 | Numerical stability |
| tf.world_absolute_scale.EMA_smooth_alpha | float | 0.3 | Exponential smoothing alpha |
| tf.world_absolute_scale.suffix | string | world_abs | TF suffix |
| tf.world_absolute_scale.max_scale_step | float | 0.0 | Per-frame clamp |
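
To illustrate how EMA_smooth_alpha and max_scale_step might interact, here is a minimal sketch of smoothing the per-frame scale factor. It is not the node's actual code: it assumes alpha weights the newest sample and that a max_scale_step of 0.0 disables the clamp. The function name `smooth_scale` is hypothetical.

```python
def smooth_scale(prev, raw, alpha=0.3, max_step=0.0):
    """Exponentially smooth the scale, optionally limiting the per-frame change.

    prev: smoothed scale from the previous frame, or None on the first frame.
    raw: scale computed from the current frame.
    """
    if prev is None:
        return raw  # nothing to smooth against yet
    ema = alpha * raw + (1.0 - alpha) * prev
    if max_step > 0.0:
        # Assumption: a positive max_step clamps the change per frame.
        delta = max(-max_step, min(max_step, ema - prev))
        return prev + delta
    return ema


first = smooth_scale(None, 2.0)                    # first frame passes through
smoothed = smooth_scale(1.0, 2.0)                  # 0.3 * 2.0 + 0.7 * 1.0
clamped = smooth_scale(1.0, 2.0, max_step=0.1)     # change limited to 0.1
```

A smaller alpha gives steadier transforms at the cost of slower response when the hand actually moves toward or away from the camera.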