airoa-evaluation-ICRA — sample-openpi branch

⚠️ You are on the sample-openpi branch. This branch ships the OpenPI loader pre-wired into server/serve_hsr_policy_ws.py and the full src/openpi/ source tree (~14k lines). It is only useful if your model is PI0Pytorch / OpenPI-compatible. For any other framework (PyTorch / JAX / LeRobot / custom), fork from the base branch instead — it gives you a clean minimal starting point with a ZeroPolicy placeholder so the smoke test passes immediately.

git checkout base   # recommended for non-OpenPI submissions

Participant evaluation runtime for the ICRA 2026 AIRoA VLA Workshop Competition.

For participants: work from a fork of this repository, edit server/ and src/, pass the local smoke test, then submit.


1. TL;DR (this branch — OpenPI sample)

# 1. Fork airoa-org/airoa-evaluation-ICRA on GitHub.
#    For non-OpenPI submissions switch to the `base` branch first.
git clone https://github.com/<you>/airoa-evaluation-ICRA.git
cd airoa-evaluation-ICRA
git checkout sample-openpi          # this branch — OpenPI sample
# git checkout base                 # alternative: minimal starting point
git checkout -b feat/my-policy

# 2. Put your model code under src/<your_policy>/
# 3. Edit server/serve_hsr_policy_ws.py to load your policy (see Integration Guide §3)
# 4. Edit server/Dockerfile if you need extra dependencies
# 5. Package your checkpoint as a DIRECTORY, not a single .pt file

# 6. Smoke test locally (mandatory before submission):
export POLICY_CHECKPOINT_PATH=/abs/path/to/your_checkpoint_dir
./RUN-DOCKER-CONTAINER.sh up
./RUN-DOCKER-CONTAINER.sh shell           # open client container shell
# Inside the container:
roslaunch hsr_policy_client hsr_policy_client.launch
# Expect "Action executed." in the logs.

# 7. Upload checkpoint to the S3 bucket you were given, then share with the organizers:
#    - fork URL
#    - branch name
#    - a short note on how to run (S3 checkpoint path, env vars, any special setup)
#      → use docs/REPRODUCTION_STEPS.template.md as the structure for your note

If any step above is unclear, read docs/INTEGRATION_GUIDE.md before asking. Most questions we receive are already answered there.


2. Host Requirements

  • Linux (verified on Ubuntu 24.04)
  • Docker Engine + Docker Compose v2
  • NVIDIA driver + NVIDIA Container Toolkit
  • NVIDIA GPU with ≥ 16 GB VRAM
Quick check:

docker --version
docker compose version
nvidia-smi

Verified environment (the machine your submission will run on):

  • OS: Ubuntu 24.04.3 LTS
  • GPU: NVIDIA GeForce RTX 5070 Ti (Blackwell, 16 GB)
  • NVIDIA driver: 580.126.09
  • Docker: 29.0.1
  • Docker Compose: v2.40.3

⚠️ Note: the evaluation GPU is Blackwell (sm_120). PyTorch/CUDA wheels must support it. See Integration Guide §8.
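One way to confirm your wheel actually covers Blackwell is to inspect the compiled arch list from inside your server container. `torch.cuda.get_arch_list()` is a real PyTorch API; the `supports_sm120` helper below is just an illustrative convenience, not part of the pipeline:

```python
def supports_sm120(arch_list: list) -> bool:
    """True if a compiled-arch list (as returned by torch.cuda.get_arch_list())
    includes the Blackwell sm_120 target."""
    return any(a.endswith("120") for a in arch_list)


try:
    import torch  # only meaningful inside your server image

    archs = torch.cuda.get_arch_list()
    print(torch.__version__, torch.version.cuda, archs)
    print("sm_120 supported:", supports_sm120(archs))
except ImportError:
    print("run this inside the server container where torch is installed")
```

If `sm_120` is missing from the list, kernels will fail at runtime even though `nvidia-smi` looks healthy.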


3. What You May Edit

| Path | Edit? | Purpose |
| --- | --- | --- |
| server/serve_hsr_policy_ws.py | Yes — required | Instantiate your policy and pass it to WebsocketPolicyServer |
| server/Dockerfile, server/entrypoint.sh | Yes | Install your Python dependencies; tweak server startup |
| src/<your_policy>/… | Yes | Your model implementation |
| packages/<your_package>/… | Yes, if needed | Additional local packages |
| pyproject.toml / uv.lock | Yes, if you add deps | Workspace + lock |
| docker-compose.yml | Discouraged but allowed | Only to add env vars / volumes needed by your server. Document the change. |
| client/Dockerfile | Discouraged but allowed | Only for hardware compatibility (e.g. CUDA base image). Document the change. |
| deploy/hsr_policy_client/** | No | Pipeline-managed client logic |
| runtime_core/**, packages/policy-client/** | No | WebSocket server & client protocol |
| RUN-DOCKER-CONTAINER.sh | No | Harness entrypoint |

If you must change a "discouraged" file, mention it in your submission note (see REPRODUCTION_STEPS.template.md) so the evaluators can reproduce your run.
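The required edit in server/serve_hsr_policy_ws.py usually amounts to replacing the sample loader with your own policy object. A minimal sketch, assuming a hypothetical `MyPolicy` class — the exact `WebsocketPolicyServer` constructor arguments are in the file shipped on this branch and may differ:

```python
import os

import numpy as np


class MyPolicy:
    """Stand-in policy; replace with your real model from src/<your_policy>/."""

    def __init__(self, checkpoint_dir: str):
        self.checkpoint_dir = checkpoint_dir  # a real policy loads weights here

    def infer(self, obs: dict) -> dict:
        # Must return a (T, 11) float32 action chunk (see section 5).
        return {"actions": np.zeros((1, 11), dtype=np.float32)}


def build_policy() -> MyPolicy:
    # POLICY_CHECKPOINT_PATH is set by the harness (see section 6).
    checkpoint_dir = os.environ.get("POLICY_CHECKPOINT_PATH", "/policy_checkpoint")
    return MyPolicy(checkpoint_dir)

# In serve_hsr_policy_ws.py the policy is then handed to the server, roughly:
#   WebsocketPolicyServer(policy=build_policy(), ...)
# -- check the file on this branch for the exact call.
```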


4. Submission Requirements

Your submission must satisfy all of the following. The evaluator will check each one.

  • Submitted from a fork of airoa-org/airoa-evaluation-ICRA (not a separate repo)
  • server/serve_hsr_policy_ws.py instantiates a policy object exposing policy.infer(obs) -> dict
  • Your policy accepts the observation dict documented in §5 and returns {"actions": np.ndarray of shape (T, 11), dtype float32}
  • Checkpoint is packaged as a directory (not a bare .pt / .safetensors file)
  • Dependencies installed via server/Dockerfile (don't rely on host Python)
  • Local smoke test passes: ./RUN-DOCKER-CONTAINER.sh up, then roslaunch hsr_policy_client hsr_policy_client.launch in the container shell, prints Action executed.
  • Checkpoint uploaded to the S3 bucket/path provided by organizers
  • Submission note prepared (fork URL, branch, how to run) — docs/REPRODUCTION_STEPS.template.md is a recommended structure

At submission time, share three things with the organizers:

  1. Fork URL (e.g. https://github.com/<you>/airoa-evaluation-ICRA)
  2. Branch name (e.g. feat/my-policy)
  3. Note — how to run it: S3 checkpoint path, required env vars, any special setup. Using docs/REPRODUCTION_STEPS.template.md as the structure is strongly recommended, since it already includes every field the evaluator needs.

💡 Committing a REPRODUCTION_STEPS.md (based on the template) to your branch and pasting its link as your note is the cleanest option. Pinning the commit hash there also helps the evaluator reproduce the exact state.


5. WebSocket I/O Contract

The server wraps your policy object and exposes a single method over WebSocket:

policy.infer(obs) -> dict

Observation dict (what the client sends, already deserialized into numpy):

{
  "head_rgb": np.ndarray,  # (480, 640, 3) uint8
  "hand_rgb": np.ndarray,  # (480, 640, 3) uint8
  "state":    np.ndarray,  # (8,)          float32
  "prompt":   str,
}

state order (8 dims):

[arm_lift_joint, arm_flex_joint, arm_roll_joint,
 wrist_flex_joint, wrist_roll_joint, gripper,
 head_pan_joint, head_tilt_joint]

Action dict (what your policy returns):

{"actions": np.ndarray}   # shape (T, 11), dtype float32, T >= 1, all finite

actions order (11 dims):

[arm_lift_joint, arm_flex_joint, arm_roll_joint,
 wrist_flex_joint, wrist_roll_joint, gripper,
 head_pan_joint, head_tilt_joint,
 base_x, base_y, base_t]

Common mistakes: returning a 1-D (11,) array instead of (T, 11), expecting a 32-dim state instead of the 8-dim one above, and naming the method predict() instead of infer(). See FAQ §3.
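You can check your policy's output against this contract off-robot before running the smoke test. A small validation helper (illustrative only, not part of the pipeline) that catches exactly these mistakes:

```python
import numpy as np


def validate_action(result: dict) -> None:
    """Raise AssertionError if `result` violates the section 5 action contract."""
    actions = result["actions"]
    assert isinstance(actions, np.ndarray), "actions must be a numpy array"
    # Catches the common 1-D (11,) mistake: two dims are required.
    assert actions.ndim == 2 and actions.shape[1] == 11, (
        f"expected shape (T, 11), got {actions.shape}"
    )
    assert actions.shape[0] >= 1, "need at least one action step (T >= 1)"
    assert actions.dtype == np.float32, f"expected float32, got {actions.dtype}"
    assert np.isfinite(actions).all(), "actions must be finite (no NaN/Inf)"


validate_action({"actions": np.zeros((5, 11), dtype=np.float32)})  # passes
```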


6. Environment Variables Consumed by the Pipeline

Only these are read by server/entrypoint.sh. Other names are silently ignored — it's a common source of confusion.

| Variable | Required? | Purpose |
| --- | --- | --- |
| POLICY_CHECKPOINT_PATH | Yes | Absolute path to your checkpoint directory (mounted as /policy_checkpoint in the container) |
| POLICY_PYTORCH_DEVICE | Optional | E.g. cuda |
| POLICY_CONFIG_NAME | Optional | OpenPI config name (only if you use the default OpenPI loader) |
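How the server side might consume these three variables, sketched in Python. The authoritative behavior lives in server/entrypoint.sh; the `"cuda"` fallback below is an assumption, not documented pipeline behavior:

```python
import os


def read_pipeline_env(env: dict = os.environ) -> dict:
    """Collect the three variables the pipeline reads; all other names are ignored."""
    if "POLICY_CHECKPOINT_PATH" not in env:
        raise KeyError("POLICY_CHECKPOINT_PATH must be set (see section 6)")
    return {
        "checkpoint_path": env["POLICY_CHECKPOINT_PATH"],    # mounted as /policy_checkpoint
        "device": env.get("POLICY_PYTORCH_DEVICE", "cuda"),  # assumed default
        "config_name": env.get("POLICY_CONFIG_NAME"),        # None unless OpenPI loader is used
    }
```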

7. Test Flow

export POLICY_CHECKPOINT_PATH=/abs/path/to/checkpoint_dir
./RUN-DOCKER-CONTAINER.sh up               # builds & starts containers (uses TEST_MODE=true)
./RUN-DOCKER-CONTAINER.sh shell            # opens client container shell
roslaunch hsr_policy_client hsr_policy_client.launch    # sends synthetic observations
./RUN-DOCKER-CONTAINER.sh logs policy_server
./RUN-DOCKER-CONTAINER.sh logs hsr_client
./RUN-DOCKER-CONTAINER.sh down

With TEST_MODE=true (the default), the client generates synthetic random head_rgb, hand_rgb, and state observations and sends them to the server in a loop. A healthy server prints Action executed. for each inference.
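An approximation of the synthetic observation the client produces in test mode. Shapes and dtypes follow the section 5 contract; the random distributions and the prompt string are assumptions, not the client's exact code:

```python
import numpy as np


def make_synthetic_obs(seed: int = 0) -> dict:
    """Build one fake observation matching the section 5 schema."""
    rng = np.random.default_rng(seed)
    return {
        "head_rgb": rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
        "hand_rgb": rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
        "state": rng.standard_normal(8).astype(np.float32),
        "prompt": "pick up the bottle",  # placeholder task string
    }
```

Feeding make_synthetic_obs() to your policy.infer() in a plain Python session is a fast pre-check before the full Docker smoke test.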

If the smoke test fails on your laptop, the remote evaluation will fail the same way. Do not skip this step.


8. Getting Help

  1. Read docs/INTEGRATION_GUIDE.md (or its _ja Japanese version) end to end.
  2. Check docs/FAQ.md (or its _ja version).
  3. Run the smoke test locally and capture full logs before asking.
  4. Contact organizers on the designated channel with: fork URL + branch + commit hash + full docker compose logs output. (For bug reports, the commit hash matters since branches move.)

License

See LICENSE.
