Vision-triggered pancake robot runtime for the LeRobot SO-ARM101 / SO101 follower arm.
Panbot combines a deterministic Task 1 motion sequence, a shared vision-trigger camera, YOLO segmentation, GRU cook-state estimation, and pretrained LeRobot ACT policies into one end-to-end robot control pipeline.
| Task 1: Batter Pouring | Task 2: Pancake Flipping | Task 3: Pancake Plating |
|---|---|---|
| ![Task 1: batter pouring](assets/task1.gif) | ![Task 2: pancake flipping](assets/task2.gif) | ![Task 3: pancake plating](assets/task3.gif) |
| Joint waypoint motion + YOLO stop trigger | ACT policy for flipping | ACT policy for plating |
| Task 2: Implicit Recovery | Task 3: Implicit Recovery |
|---|---|
| ![Task 2 implicit recovery](assets/task2_implicit_recovery.gif) | ![Task 3 implicit recovery](assets/task3_implicit_recovery.gif) |
Full runtime demo: Panbot Demo - Vision-Triggered Runtime (YOLO + GRU + ACT)
> [!IMPORTANT]
> Core idea: perception decides when the robot should move to the next stage. A single trigger camera watches the cooking state, while separate policy observation cameras feed the ACT policies during manipulation.
```bash
conda create -n panbot python=3.10 -y
conda activate panbot
pip install opencv-python torch torchvision
pip install lerobot
pip install ultralytics
```

> [!NOTE]
> Install the PyTorch build that matches your CUDA and driver setup. GPU inference is recommended for the vision modules and ACT policy rollout.
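Before a run, a quick sanity check confirms which build is installed and whether the GPU is visible (a minimal sketch; it only prints what the local PyTorch build reports):

```python
import torch

# Print the installed build, its CUDA version (None for CPU-only builds),
# and whether a GPU is visible. Ideally the last value is True.
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
```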
Edit `config/runtime.yaml` before running:

```yaml
paths:
  corners: "Panbot/vision/calibration/corners.json"
  yolo_model: "Panbot/vision/models/runs/batter_seg_local_v1/weights/best.pt"
  gru_ckpt: "Panbot/vision/models/runs/resnet18_gru16_cls/best.pt"

vision:
  cam_index: 8          # shared YOLO/GRU trigger camera
  width: 3840
  height: 2160
  fps: 30

robot:
  port: "/dev/ttyACM0"
  cameras:              # policy observation cameras, separate from vision.cam_index
    right:  { type: "opencv", index_or_path: 2, width: 640, height: 480, fps: 30, fourcc: "MJPG" }
    left:   { type: "opencv", index_or_path: 6, width: 640, height: 480, fps: 30, fourcc: "MJPG" }
    global: { type: "opencv", index_or_path: 4, width: 640, height: 480, fps: 30, fourcc: "MJPG" }
    wrist:  { type: "opencv", index_or_path: 0, width: 640, height: 480, fps: 30, fourcc: "MJPG" }
```

```bash
cd ~/Panbot
chmod +x scripts/start_all.sh
./scripts/start_all.sh
```

Or run the module directly:

```bash
python -m Panbot.control.main_runtime --config Panbot/config/runtime.yaml
```

Panbot uses two camera paths:
| Camera path | Config key | Used by | Purpose |
|---|---|---|---|
| Trigger camera | `vision.cam_index` | YOLO + GRU | Decide when to switch stages |
| Policy cameras | `robot.cameras.*` | LeRobot ACT policies | Multi-view observations for action prediction |
> [!TIP]
> Keep this distinction clear when debugging. A trigger camera problem usually appears as a stage-transition failure; a policy camera problem usually appears as poor or missing ACT actions.
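As a concrete illustration of the split, here is a minimal sketch of the trigger path only. The `yolo_trigger` / `gru_trigger` calls are illustrative placeholders for the real modules in `Panbot/vision/modules/`; the policy cameras are opened separately by the LeRobot robot interface from `robot.cameras.*`:

```python
import cv2

# Open the single shared trigger camera (vision.cam_index in runtime.yaml).
cap = cv2.VideoCapture(8)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 3840)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 2160)

ok, frame = cap.read()
if not ok:
    raise RuntimeError("Trigger camera did not open; check vision.cam_index")

# The same frame feeds both vision triggers, e.g.:
# yolo_trigger.step(frame); gru_trigger.step(frame)
cap.release()
```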
`Panbot/control/main_runtime.py` is the main orchestrator. It connects the robot, opens the shared trigger camera, initializes YOLO and GRU inference, runs Task 1, waits for readiness, then executes the pretrained policies for Task 2 and Task 3.
| Stage | Controller | Transition |
|---|---|---|
| 1. Batter pouring | `Task1MotionStepper` | YOLO segmentation detects enough batter coverage |
| 2. Return / hold | `Task1MotionStepper` + `BasePoseController` | Task 1 return sequence completes |
| 3. Cook readiness wait | `GRUInfer` | GRU trigger fires |
| 4. Pancake flipping | ACT policy 1 | Configured duration ends |
| 5. Pancake plating | ACT policy 2 | Configured duration ends |
| 6. Shutdown | Base pose + cleanup | Runtime exits cleanly |
```python
connect_robot()
trigger_cam = open_camera(vision.cam_index)

# Stage 1: pour batter until YOLO sees enough coverage
task1.start_initial()
while not yolo.step(trigger_cam.frame).triggered:
    task1.step()

# Stage 2: return to the base pose and hold
task1.interrupt_to_return()
task1.run_return_sequence()
base_pose.hold()

# Stage 3: wait until the GRU declares the pancake ready
while not gru.step(trigger_cam.frame).triggered:
    base_pose.tick()

run_policy(policy1, duration=task.task2_duration_s)  # Stage 4, Task 2: flipping
sleep(task.wait_task2_to_task3_s)
run_policy(policy2, duration=task.task3_duration_s)  # Stage 5, Task 3: plating

# Stage 6: shutdown
return_to_base_pose()
shutdown_cleanly()
```

Task 2 and Task 3 are not scripted waypoint motions. They are ACT policy rollouts trained from demonstrations, so recovery can emerge from the policy distribution when the scene drifts slightly from the nominal trajectory.
In practice, the policy can re-align after small perturbations such as imperfect pancake placement, minor contact mismatch, or a slightly shifted utensil interaction. The implicit recovery demos above show this behavior during flipping and plating.
> [!CAUTION]
> Implicit recovery is not a formal safety guarantee. It is useful for robustness, but the robot should still be tested in a constrained workspace with a reachable emergency stop.
```yaml
yolo_trigger:
  conf: 0.25
  imgsz: 640
  area_thr_ratio: 0.12
  hold_frames: 30
  use_warp: true

gru_trigger:
  image_size: 224
  seq_len: 16
  stride: 6
  ema: 0.7
  ready_hold: 3
  use_warp: true
```

| Key | Effect |
|---|---|
| `yolo_trigger.conf` | Segmentation confidence threshold |
| `yolo_trigger.area_thr_ratio` | Batter area ratio required for the Task 1 stop trigger |
| `yolo_trigger.hold_frames` | Number of stable frames before accepting the YOLO trigger |
| `gru_trigger.seq_len` | Temporal window length for cook-state inference |
| `gru_trigger.ema` | Exponential moving average smoothing of the trigger confidence |
| `gru_trigger.ready_hold` | Stable ready frames required before triggering |
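The hold and smoothing keys interact as sketched below. This is an illustrative model, not the code in `yoloseg_infer.py` / `gru_infer.py`, and it assumes `gru_trigger.ema` is the weight given to the previous smoothed value:

```python
class HoldTrigger:
    """Fire only after `hold` consecutive positive frames (debounce)."""
    def __init__(self, hold: int):
        self.hold, self.count = hold, 0

    def step(self, positive: bool) -> bool:
        self.count = self.count + 1 if positive else 0
        return self.count >= self.hold

yolo_hold = HoldTrigger(hold=30)  # yolo_trigger.hold_frames
gru_hold = HoldTrigger(hold=3)    # gru_trigger.ready_hold

def ema_update(prev: float, conf: float, alpha: float = 0.7) -> float:
    """Smooth the per-frame confidence; alpha plays the role of gru_trigger.ema."""
    return alpha * prev + (1.0 - alpha) * conf

# Per frame: yolo_hold.step(area_ratio > 0.12) gates the Task 1 stop trigger;
# gru_hold.step(ema_update(...) > threshold) gates cook readiness.
```

Raising the hold counts trades trigger latency for resistance to single-frame noise.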
```yaml
task:
  hz: 30
  policy_fps: 30
  task2_duration_s: 90.0
  task3_duration_s: 90.0
  wait_task2_to_task3_s: 30.0
```

Keep `task.hz`, `vision.fps`, and `task.policy_fps` aligned when possible. It makes trigger timing, policy rollout, and logs easier to reason about.
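Alignment helps because each stage is effectively a fixed-rate loop, so matching rates keeps one camera frame per control tick. A minimal sketch of such a loop (illustrative, not the orchestrator's exact code):

```python
import time

def run_at_rate(step_fn, hz: float, duration_s: float) -> None:
    """Call step_fn at a fixed rate for duration_s seconds."""
    period = 1.0 / hz
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        t0 = time.monotonic()
        step_fn()
        # Sleep off whatever remains of this control period.
        time.sleep(max(0.0, period - (time.monotonic() - t0)))

# e.g. run_at_rate(policy_step, hz=30, duration_s=90.0)  # task2_duration_s
```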
```yaml
policies:
  policy1:
    repo_id: "ispaik06/act_panbot_task2_3"
  policy2:
    repo_id: "ispaik06/act_panbot_task3_3"
```

Public assets:
| Asset | Link |
|---|---|
| Task 2 dataset | https://huggingface.co/datasets/ispaik06/Panbot_task2_dataset_3 |
| Task 3 dataset | https://huggingface.co/datasets/ispaik06/Panbot_task3_dataset_3 |
| ACT Task 2 checkpoint | https://huggingface.co/ispaik06/act_panbot_task2_3 |
| ACT Task 3 checkpoint | https://huggingface.co/ispaik06/act_panbot_task3_3 |
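For reference, loading one of these checkpoints outside the runtime looks roughly like this. The import path varies across LeRobot releases (older versions use `lerobot.common.policies...`), so treat this as a sketch rather than the exact code in `common_policy_runner.py`:

```python
from lerobot.common.policies.act.modeling_act import ACTPolicy

# Pull the pretrained flipping policy from the Hugging Face Hub.
policy = ACTPolicy.from_pretrained("ispaik06/act_panbot_task2_3")
policy.eval()

# At rollout time, the runner builds an observation batch from the
# robot.cameras.* frames plus joint state and queries the policy, e.g.:
# action = policy.select_action(observation)
```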
```
Panbot/
├─ config/
│  └─ runtime.yaml                 # Runtime config
├─ control/
│  └─ main_runtime.py              # End-to-end orchestrator
├─ policies/
│  └─ common_policy_runner.py      # LeRobot pretrained policy runner
├─ tasks/
│  ├─ base_pose.py                 # Base pose holding / stabilization
│  └─ task1_motion.py              # Task 1 waypoint motion
├─ vision/
│  ├─ calibration/corners.json     # Optional perspective warp corners
│  ├─ models/runs/...              # YOLO and GRU checkpoints
│  └─ modules/
│     ├─ camera.py
│     ├─ gru_infer.py
│     └─ yoloseg_infer.py
├─ scripts/
│  └─ start_all.sh
├─ assets/
│  ├─ task1.gif
│  ├─ task2.gif
│  ├─ task3.gif
│  ├─ task2_implicit_recovery.gif
│  ├─ task3_implicit_recovery.gif
│  └─ panbot_pipeline.011.jpeg
└─ logs/
   └─ main_runtime_YYYYMMDD_HHMMSS.log
```
This repository focuses on runtime orchestration: robot control, trigger handling, stage transitions, and policy execution.
For vision training, dataset generation, preprocessing, model export, and lower-level inference details, see:
Panbot_vision: https://github.com/ispaik06/Panbot_vision
If you change warp geometry, preprocessing, label mapping, checkpoint format, or trigger thresholds in the vision repository, update `config/runtime.yaml` accordingly.
| Symptom | Check |
|---|---|
| Trigger camera does not open | `vision.cam_index`, `vision.backend`, `vision.mjpg`, camera permissions |
| YOLO fires too early | Increase `area_thr_ratio` or `hold_frames`; raise `conf` |
| YOLO never fires | Lower `area_thr_ratio`; verify warp corners and model path |
| GRU trigger is late | Tune `ema`, `ready_hold`, `seq_len`, and `stride` |
| ACT policy does not move | Check policy `repo_id`, robot connection, and whether `robot.send_action(...)` is reached |
| Policy behavior is poor | Verify `robot.cameras.*` indices, camera resolution, observation names, and lighting |
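When camera indices are in doubt, a quick OpenCV probe shows which devices actually deliver frames (illustrative; widen the range if you have more video devices):

```python
import cv2

# Try the first ten capture indices and report which ones return a frame.
for idx in range(10):
    cap = cv2.VideoCapture(idx)
    ok, frame = cap.read()
    print(f"index {idx}: {'OK ' + str(frame.shape) if ok else 'no frame'}")
    cap.release()
```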
Logs are written to `log.dir`, typically:

```
Panbot/logs/main_runtime_YYYYMMDD_HHMMSS.log
```





