GaTech-RL2 · rl2aloha · May 8, 2026 · May 8, 2026
diff --git a/ROLLOUT_INSTALL.md b/ROLLOUT_INSTALL.md
@@ -0,0 +1,280 @@
+# Rollout install (Python 3.10)
+
+End-to-end setup for running policy rollouts on a real robot. The training stack
+(zarr, lerobot, etc.) is intentionally **not** required here — this guide is the
+minimum set of dependencies to load a checkpoint and drive an ARX arm.
+
+## Why 3.10
+
+The Stanford ARX `arx5_interface` C extension is shipped as a
+`cpython-310-x86_64-linux-gnu.so`, so the rollout host must run Python 3.10.
+The Docker image we use already has 3.10 system-wide and CUDA / ROS / ARX
+pre-built against it; this guide installs everything into that same
+interpreter rather than into a venv.
+
+openpi upstream is developed on 3.11 and pulls in lerobot, which transitively
+requires FFmpeg 7. Both are problems on 3.10 + Ubuntu 22.04, so we run a
+locally patched openpi checkout (mainly: Python version lowered, lerobot
+removed). See "openpi patches to apply on top of `981483d`" below.
+
+## Prerequisites
+
+- Docker container based on the team's Eva image (system Python 3.10, CUDA 12.x,
+  ROS Humble at `/opt/ros/humble`, ARX prebuilt under
+  `/opt/ros/humble/lib/python3.10/site-packages/arx5/`).
+- A working `python3 -c "import jax; print(jax.devices())"` returning a CUDA
+  device (JAX is needed for the optional checkpoint conversion path).
+- A Hugging Face access token with read access to
+  [`google/paligemma-3b-mix-224`](https://huggingface.co/google/paligemma-3b-mix-224).
+  This repo is **gated** — you must accept the license on the model page once,
+  then mint a token. The collate function in `pl_data_utils.py` downloads the
+  PaliGemma tokenizer at rollout startup, so without a token the rollout
+  aborts with a `GatedRepoError`.
+
+## Branches
+
+This guide assumes:
+
+- **egomimic** — checked out on `elmo/pi-rollout-local` (the rollout-host
+  branch carrying `rollout-requirements.txt`, `uv.lock`, and the relaxed
+  `requires-python` in `pyproject.toml`). It stacks on `elmo/pi-rollout-fix`,
+  which is the mergeable rollout-code branch.
+- **external/openpi** — currently in a **detached-HEAD** state at upstream
+  commit `981483d` ("use EGL for headless GPU rendering in libero example",
+  PR #837) with the patches below applied as **uncommitted working-tree
+  edits**, plus an untracked `scripts/patch_transformers.py`. The fork
+  (`https://github.com/GaTech-RL2/openpi`) does have a `pi-rollout-changes`
+  branch with overlapping patches, but it diverges from this host's working
+  tree (it adds `chex`, loosens `numpy`/`opencv-python`, patches
+  `pi0_pytorch.py`, and **does not** carry `.python-version`,
+  `packages/openpi-client/pyproject.toml`, `uv.lock`, or
+  `patch_transformers.py`). Treat this guide — not the fork branch — as the
+  source of truth for what's running here.
+
+### openpi patches to apply on top of `981483d`
+
+After cloning and checking out commit `981483d`, edit these files:
+
+- **`.python-version`** — replace `3.11` with `3.10`.
+- **`pyproject.toml`**:
+  - In `[project]`, delete the `requires-python = ">=3.11"` line entirely
+    (the rollout install pins via the host interpreter; no replacement
+    needed).
+  - In `dependencies`, delete the `"lerobot",` entry.
+  - In `[tool.uv.sources]`, delete the
+    `lerobot = { git = "https://github.com/huggingface/lerobot", rev = "..." }`
+    entry.
+- **`packages/openpi-client/pyproject.toml`** — raise `requires-python`
+  from `>=3.7` to `>=3.10`.
+- **`src/openpi/shared/download.py`** — replace `datetime.UTC` with
+  `datetime.timezone.utc` (3.10 doesn't have `datetime.UTC`).
+- **`uv.lock`** — regenerate by running `uv lock` from `external/openpi`
+  after the `pyproject.toml` edits above (Step 3 covers this).
+- **`scripts/patch_transformers.py`** — this script is not in upstream;
+  copy it from a working rollout host or recreate it (see Step 4 for what
+  it does — copies a handful of replacement files from
+  `models_pytorch/transformers_replace/` into the active `transformers`
+  package on disk).
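+
+For the `download.py` edit, the snippet below is a minimal illustration (not
+the actual openpi code) of why the swap is safe: `datetime.timezone.utc` is
+available on 3.10, and on 3.11+ `datetime.UTC` is an alias for the same
+object. The helper name `utc_now` is hypothetical.
+
+```python
+import datetime
+import sys
+
+
+def utc_now() -> datetime.datetime:
+    """Timezone-aware current time; works on 3.10 and on 3.11+ alike."""
+    return datetime.datetime.now(tz=datetime.timezone.utc)
+
+
+print(utc_now().isoformat())
+
+# On 3.11+ the newer name is just an alias for the same singleton.
+if sys.version_info >= (3, 11):
+    assert datetime.UTC is datetime.timezone.utc
+```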
+
+These patches are not currently published as a clean fork branch. If you
+finish a rollout setup and want to make this reproducible without
+hand-editing, push the result to your fork as e.g. `pi-rollout-host` and
+update this section to reference it.
+
+## Step 1 — clone with submodules
+
+```bash
+git clone <egomimic-repo>
+cd egomimic
+git checkout elmo/pi-rollout-local       # stacks on elmo/pi-rollout-fix
+git submodule update --init --recursive external/openpi
+git -C external/openpi checkout 981483d  # upstream tip the patches apply to
+```
+
+`external/openpi` ends up detached at `981483d` and pulls in its own nested
+submodules (`third_party/aloha`, `third_party/libero`). Apply the patches
+listed under "openpi patches to apply on top of `981483d`" before
+proceeding.
+
+## Step 2 — install egomimic rollout deps
+
+Install into system Python 3.10 (not a venv — see "Why 3.10").
+
+```bash
+pip install -r rollout-requirements.txt
+pip install -e .
+```
+
+`rollout-requirements.txt` is the trimmed sibling of `requirements.txt` — no
+`zarr` (training-only). `pyproject.toml` has its `requires-python` relaxed to
+`>=3.10` and `zarr` removed from `dependencies`, matching the rollout-branch
+edits.
+
+## Step 3 — install openpi deps
+
+We use uv to resolve openpi's lockfile against 3.10, then hand the constraints
+to pip so we install into the system interpreter.
+
+```bash
+# uv only used for resolution
+curl -LsSf https://astral.sh/uv/install.sh | sh   # if not already present
+
+cd external/openpi
+uv lock                                            # rewrites uv.lock for 3.10
+uv export --no-hashes --no-emit-workspace --quiet > /tmp/openpi-constraints.txt
+
+pip install -e . -c /tmp/openpi-constraints.txt
+```
+
+`--no-emit-workspace` strips the editable workspace entries, which pip rejects
+in a constraints file (`-c`). Re-running `uv lock` is required because the
+bundled `uv.lock` was generated against 3.11 and contains lerobot transitives.
+
+## Step 4 — patch transformers
+
+openpi ships replacement files for a handful of transformers internals (Gemma
+attention, etc.). They have to be copied into the active transformers package
+on disk.
+
+```bash
+cd external/openpi
+python3 scripts/patch_transformers.py
+```
+
+The script auto-resolves the destination via `importlib.util.find_spec`, so it
+works regardless of where transformers lives.
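+
+Because `scripts/patch_transformers.py` is not in upstream, here is a minimal
+sketch of the approach described above. It is not the actual script: the
+function names are hypothetical, and the source directory
+(`src/openpi/models_pytorch/transformers_replace`) is an assumption to adjust
+for your checkout.
+
+```python
+import importlib.util
+import shutil
+from pathlib import Path
+
+
+def resolve_package_dir(package: str) -> Path:
+    """Return the on-disk directory of an installed package."""
+    spec = importlib.util.find_spec(package)
+    if spec is None or not spec.submodule_search_locations:
+        raise RuntimeError(f"{package!r} is not an installed package")
+    return Path(list(spec.submodule_search_locations)[0])
+
+
+def copy_replacements(src: Path, dst: Path) -> list[Path]:
+    """Overlay every file under src onto dst, keeping relative paths."""
+    copied = []
+    for path in sorted(src.rglob("*")):
+        if path.is_file():
+            target = dst / path.relative_to(src)
+            target.parent.mkdir(parents=True, exist_ok=True)
+            shutil.copy2(path, target)
+            copied.append(target)
+    return copied
+
+
+if __name__ == "__main__":
+    # Assumed location of the replacement files; run from external/openpi.
+    src = Path("src/openpi/models_pytorch/transformers_replace")
+    if src.is_dir():
+        for target in copy_replacements(src, resolve_package_dir("transformers")):
+            print("patched", target)
+    else:
+        print(f"{src} not found; nothing copied")
+```
+
+Resolving the destination through `find_spec` avoids hard-coding a
+`site-packages` path, which matters here because everything is installed into
+the system interpreter rather than a venv.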
+
+## Step 5 — pin torchvision to match torch
+
+openpi pins `torch==2.7.1`. The pre-existing system `torchvision` was built
+against an older torch, so its compiled ops fail to register under 2.7.1 and
+importing it raises `RuntimeError: operator torchvision::nms does not exist`
+from `_meta_registrations`. Pin to the matching wheel:
+
+```bash
+pip install torchvision==0.22.1
+```
+
+If pip can't find a wheel, force the PyTorch index:
+
+```bash
+pip install torchvision==0.22.1 \
+  --index-url https://download.pytorch.org/whl/cu126
+```
+
+(Replace `cu126` with the tag matching what
+`python3 -c 'import torch; print(torch.version.cuda)'` prints, e.g. `12.8` → `cu128`.)
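+
+The version-to-tag mapping can be sketched as a small helper. The function
+names are hypothetical, and the sketch only builds the URL; it does not check
+that the index actually hosts a wheel for that tag.
+
+```python
+from typing import Optional
+
+
+def cuda_index_tag(cuda_version: Optional[str]) -> str:
+    """Map a torch.version.cuda string such as '12.1' to a tag such as 'cu121'."""
+    if cuda_version is None:  # CPU-only torch builds report None
+        return "cpu"
+    major, minor = cuda_version.split(".")[:2]
+    return f"cu{major}{minor}"
+
+
+def index_url(cuda_version: Optional[str]) -> str:
+    """Build the PyTorch wheel index URL for the given CUDA version."""
+    return f"https://download.pytorch.org/whl/{cuda_index_tag(cuda_version)}"
+
+
+print(index_url("12.1"))  # https://download.pytorch.org/whl/cu121
+```
+
+On the rollout host, pass `torch.version.cuda` as the argument.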
+
+## Step 6 — expose ARX to the active interpreter
+
+The arx5 binding lives at `/opt/ros/humble/lib/python3.10/site-packages/arx5/`,
+which is on `PYTHONPATH` only when ROS is sourced. For the rollout we add a
+`.pth` file so it's visible whether or not the user remembers to source ROS:
+
+```bash
+echo "/opt/ros/humble/lib/python3.10/site-packages" \
+  > "$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")/ros_arx5.pth"
+
+python3 -c "import arx5.arx5_interface; print('arx5 OK')"
+```
+
+If `arx5` imports cleanly but ARX initialization later fails to find native
+shared libs (e.g. `libArxJointController.so`), source ROS once per shell so
+`LD_LIBRARY_PATH` is populated:
+
+```bash
+source /opt/ros/humble/setup.bash
+```
+
+The `.pth` fix only handles Python-level imports; the dynamic linker still
+needs `LD_LIBRARY_PATH` to locate the C++ libraries.
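+
+To see why the `.pth` file works, the sketch below reproduces the mechanism
+with throwaway directories: a `.pth` file is plain text with one directory
+per line, and the `site` module appends each existing directory to
+`sys.path`. The temporary directories are stand-ins, not the real ROS paths.
+
+```python
+import site
+import sys
+import tempfile
+from pathlib import Path
+
+# Stand-ins for the real directories (temporary, for demonstration only).
+site_dir = Path(tempfile.mkdtemp())   # plays the role of 'purelib'
+extra_dir = Path(tempfile.mkdtemp())  # plays the role of the ROS site-packages
+
+# A .pth file is plain text: one directory per line.
+(site_dir / "ros_arx5.pth").write_text(f"{extra_dir}\n")
+
+# At interpreter startup, site does this for each site directory.
+site.addsitedir(str(site_dir))
+
+print(str(extra_dir) in sys.path)
+```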
+
+## Step 7 — Hugging Face authentication
+
+Required because the rollout's collate function instantiates a PaliGemma
+tokenizer, and `google/paligemma-3b-mix-224` is a gated repo.
+
+1. Visit https://huggingface.co/google/paligemma-3b-mix-224, click "Agree and
+   access repository" (one-time per HF account).
+2. Mint a read token at https://huggingface.co/settings/tokens.
+3. Authenticate the host:
+
+   ```bash
+   huggingface-cli login
+   # or set non-interactively:
+   export HF_TOKEN=hf_xxxxxxxxxxxxxxxx
+   ```
+
+   `transformers.AutoTokenizer.from_pretrained` reads `HF_TOKEN` (or the cached
+   credentials from `huggingface-cli login`) automatically.
+
+Sanity check:
+
+```bash
+python3 -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('google/paligemma-3b-mix-224'); print('hf OK')"
+```
+
+## Step 8 — get the PyTorch checkpoint
+
+The rollout loads a converted PyTorch checkpoint. There are two paths:
+
+### 8a — rsync from cloud (preferred)
+
+If a teammate has already converted weights for the same openpi commit, just
+copy the directory:
+
+```bash
+rsync -av <cloud-host>:/path/to/pi05_base_pytorch/ \
+       egomimic/algo/pi_checkpoints/pi05_base_pytorch/
+```
+
+The output is a deterministic function of the JAX checkpoint + openpi
+revision, so a copy from another machine is byte-identical to running the
+conversion locally. `safetensors` files are torch-version-independent.
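+
+If you want to confirm that a copied checkpoint matches the source, one
+option is to compare a digest of the directory on both machines. This is a
+sketch with a hypothetical helper name, not part of the repo.
+
+```python
+import hashlib
+from pathlib import Path
+
+
+def directory_digest(root: Path) -> str:
+    """SHA-256 over relative file names and contents, in sorted order."""
+    digest = hashlib.sha256()
+    for path in sorted(p for p in root.rglob("*") if p.is_file()):
+        digest.update(path.relative_to(root).as_posix().encode())
+        with path.open("rb") as handle:
+            # Read in 1 MiB chunks so large weight files are not loaded whole.
+            for chunk in iter(lambda: handle.read(1 << 20), b""):
+                digest.update(chunk)
+    return digest.hexdigest()
+```
+
+Run `directory_digest(Path("egomimic/algo/pi_checkpoints/pi05_base_pytorch"))`
+on both hosts and compare the two hex strings.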
+
+### 8b — convert from JAX locally
+
+Only needed if you don't have a pre-converted copy. The JAX checkpoint
+(`pi05_base/`) is fetched from the team's GCS bucket; you'll need `gsutil`
+configured. Then:
+
+```bash
+cd external/openpi
+python3 examples/convert_jax_model_to_pytorch.py \
+  --config_name pi05_aloha \
+  --checkpoint_dir ../../egomimic/algo/pi_checkpoints/pi05_base \
+  --output_path ../../egomimic/algo/pi_checkpoints/pi05_base_pytorch
+```
+
+The conversion needs JAX with CUDA (12 GB GPU is enough), the patched
+transformers from Step 4, and a working torchvision from Step 5.
+
+## Verification
+
+```bash
+python3 -c "
+import torch, torchvision, transformers, openpi
+import openpi.models.pi0_config
+import openpi.models_pytorch.pi0_pytorch
+import arx5.arx5_interface
+print('torch', torch.__version__, 'torchvision', torchvision.__version__)
+print('transformers', transformers.__version__)
+print('openpi', openpi.__file__)
+print('arx5', arx5.arx5_interface.__file__)
+print('all OK')
+"
+```
+
+All five lines should print without errors.
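+
+The import check above does not compare versions. A hedged sketch of a
+version check follows; `EXPECTED` and `find_mismatches` are hypothetical
+names, and the two pins are the ones quoted in Step 5.
+
+```python
+import sys
+from importlib import metadata
+
+# Pins taken from Step 5 of this guide; extend as needed.
+EXPECTED = {"torch": "2.7.1", "torchvision": "0.22.1"}
+
+
+def find_mismatches(expected: dict[str, str]) -> list[str]:
+    """Return a human-readable line for each missing or mismatched package."""
+    problems = []
+    for name, want in expected.items():
+        try:
+            # Drop local build suffixes such as '+cu126' before comparing.
+            have = metadata.version(name).split("+")[0]
+        except metadata.PackageNotFoundError:
+            problems.append(f"{name}: not installed (want {want})")
+            continue
+        if have != want:
+            problems.append(f"{name}: have {have}, want {want}")
+    return problems
+
+
+if sys.version_info[:2] != (3, 10):
+    print("warning: this guide assumes Python 3.10")
+for line in find_mismatches(EXPECTED):
+    print(line)
+```
+
+No output means the interpreter and both pins match.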
+
+## Known incompatibilities to keep an eye on
+
+- `datetime.UTC` and `Self` (from `typing`) are 3.11+, and `type X = ...`
+  syntax is 3.12+. We've patched the one `datetime.UTC` occurrence in
+  `src/openpi/shared/download.py`. If you `git pull` upstream openpi changes,
+  re-grep for `datetime.UTC` and similar.
+- `transformers` major-version bumps occasionally invalidate the
+  `models_pytorch/transformers_replace/` files. Re-run
+  `scripts/patch_transformers.py` after any transformers upgrade.
+- The `pynvml` deprecation warning at startup is harmless and unrelated;
+  ignore it.
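+
+The re-grep after an upstream pull can be sketched as a small scanner. The
+patterns are a non-exhaustive assumption about what to look for, and the
+names `PATTERNS` and `scan` are hypothetical.
+
+```python
+import re
+from pathlib import Path
+
+# Constructs that need a Python newer than 3.10 (non-exhaustive).
+PATTERNS = {
+    "datetime.UTC (3.11+)": re.compile(r"\bdatetime\.UTC\b"),
+    "typing.Self (3.11+)": re.compile(r"from typing import .*\bSelf\b"),
+    "type alias statement (3.12+)": re.compile(r"^\s*type \w+(\[.*\])? ="),
+}
+
+
+def scan(root: Path) -> list[tuple[str, int, str]]:
+    """Return (file, line number, label) for each suspicious line under root."""
+    hits = []
+    for path in sorted(root.rglob("*.py")):
+        text = path.read_text(errors="ignore")
+        for lineno, line in enumerate(text.splitlines(), 1):
+            for label, pattern in PATTERNS.items():
+                if pattern.search(line):
+                    hits.append((str(path), lineno, label))
+    return hits
+```
+
+For example, `scan(Path("external/openpi/src"))` lists candidate lines to
+review by hand; a line-based regex can miss multi-line imports.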
diff --git a/pull_models.sh b/pull_models.sh
@@ -3,7 +3,7 @@ set -euo pipefail
 
 # ====== config (edit these) ======
 REMOTE_USER_HOST="paphiwetsa3@login-phoenix.pace.gatech.edu"
-REMOTE_PATH="/storage/home/hcoda1/4/paphiwetsa3/r-dxu345-0/projects/EgoVerse/logs/pick_place/"
+REMOTE_PATH="/storage/home/hcoda1/4/paphiwetsa3/r-dxu345-0/projects/EgoVerse4/logs/pick_place/"
 LOCAL_PATH="./egomimic/robot/models/"
 # =================================
 
@@ -21,4 +21,4 @@ env -u LD_LIBRARY_PATH -u CONDA_PREFIX -u MAMBA_ROOT_PREFIX \
   --exclude='***/0/videos/***' \
   --exclude='***/0/wandb/***' \
   "${REMOTE_USER_HOST}:${REMOTE_PATH%/}/" \
-  "${LOCAL_PATH%/}/"
+  "${LOCAL_PATH%/}/"
diff --git a/pyproject.toml b/pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
 name = "egomimic"
 version = "1.0"
 description = "Egomimic Scripts Package"
-requires-python = ">=3.11"
+requires-python = ">=3.10"
 dependencies = [
     "torch==2.7.1",
     "torchvision==0.22.1",
@@ -40,7 +40,6 @@ dependencies = [
     "argparse",
     "pandas",
     "overrides",
-    "zarr==3.1.5",
     "tabulate",
     "transformers==4.53.2",
     "timm",

diff --git a/rollout-requirements.txt b/rollout-requirements.txt
@@ -0,0 +1,60 @@
+torch==2.6.0
+torchvision==0.21.0
+projectaria-tools[all]==2.0.0
+pyyaml
+matplotlib
+packaging
+h5py
+ipython
+rich
+wandb
+hydra-core
+hydra-submitit-launcher==1.2.0
+black
+gpustat
+pynvml
+termcolor
+pyquaternion
+rospkg
+einops
+av==12.0.0
+opencv-python
+dm-control==1.0.8
+mink==0.0.13
+mujoco==3.4.0
+mujoco-py==2.1.2.14
+submitit
+arm_pytorch_utilities
+pytorch-kinematics
+lightning
+positional-encodings[pytorch]
+argparse
+pandas
+overrides
+tabulate
+transformers==4.57.3
+timm
+boto3
+cloudpathlib
+sqlalchemy
+psycopg
+ray
+geomloss
+tslearn
+sqlalchemy
+psycopg[binary]
+boto3
+typing_extensions
+pyarrow
+simplejpeg
+prettytable
+datasets==4.0.0
+s5cmd
+mediapy
+pytest
+pre-commit
+ruff
+scaleapi
+openai
+decord
+scale_sensor_fusion_io