Yiyuan Pan*, Xusheng Luo*, Hanjiang Hu, Peiqi Yu, Changliu Liu
Robotics Institute, Carnegie Mellon University
* Equal contribution
- Reference implementation (current): This repository is a lightweight, simplified release of ENAP. It implements the full pipeline on a custom ManiSkill PegInsertionSide-v1 setup (versus stock ManiSkill, the main change is dual-end insertion: either the orange (head) or white (tail) end can satisfy the success condition). Additional tasks, environments, and experiment scripts aligned with the full paper will be uploaded in future updates.
Scaling robot learning to long-horizon tasks remains a formidable challenge. While end-to-end policies often lack the structural priors needed for effective long-term reasoning, traditional neuro-symbolic methods rely heavily on hand-crafted symbolic priors. To address this, we introduce ENAP (Emergent Neural Automaton Policy), a framework that allows a bi-level neuro-symbolic policy to adaptively emerge from demonstrations. Specifically, we first employ adaptive clustering and an extension of the L* algorithm to infer a Mealy state machine from visuomotor data, which serves as an interpretable high-level planner capturing latent task modes. This discrete structure then guides a low-level reactive residual network to learn precise continuous control via behavior cloning. By explicitly modeling the task policy with discrete transitions and continuous residuals, ENAP achieves high sample efficiency and interpretability without requiring task-specific labels. Extensive experiments on complex manipulation and long-horizon tasks demonstrate that ENAP outperforms state-of-the-art end-to-end VLA policies by up to 27% in low-data regimes, while offering a structured representation of robotic intent.
Figure 1. ENAP follows a three-stage pipeline—(i) symbol abstraction, (ii) structure extraction via an extended L*, and (iii) bi-level control—to learn structured policies from demonstrations.
This repository implements ENAP using ManiSkill for visuomotor simulation and control. The pipeline learns a discrete PMM structure from demonstrations, then trains a residual policy conditioned on that structure.
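For intuition, a Mealy machine is a finite-state transducer whose output depends on both the current state and the input symbol, which is what lets it encode mode-dependent behavior. Below is a minimal, self-contained sketch of the data structure; the states, symbols, and outputs are invented for illustration and are not the ones ENAP infers.

```python
# Minimal Mealy machine sketch: transitions map (state, input symbol) to
# (next state, output symbol). States, symbols, and outputs below are
# illustrative placeholders, not the structure ENAP actually learns.
class MealyMachine:
    def __init__(self, transitions, start):
        self.transitions = transitions  # {(state, symbol): (next_state, output)}
        self.start = start

    def run(self, symbols):
        """Feed a symbol sequence; return the emitted output sequence."""
        state, outputs = self.start, []
        for s in symbols:
            state, out = self.transitions[(state, s)]
            outputs.append(out)
        return outputs

# Hypothetical two-mode insertion task: approach until contact, then insert.
machine = MealyMachine(
    transitions={
        ("approach", "far"): ("approach", "move_to_peg"),
        ("approach", "contact"): ("insert", "align"),
        ("insert", "contact"): ("insert", "push"),
        ("insert", "done"): ("insert", "hold"),
    },
    start="approach",
)
print(machine.run(["far", "contact", "contact", "done"]))
# → ['move_to_peg', 'align', 'push', 'hold']
```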
We recommend a Conda environment with Python 3.10. From the repository root:
```
bash setup_enap_env.sh
conda activate enap_env
```

The script creates the Conda environment and replaces ManiSkill's stock `peg_insertion_side.py` with our customized task file `peg_insertion_side_replace.py` (required for the experiments in this codebase).
Install any additional packages your workflow needs (e.g. tyro, matplotlib, tqdm) via pip as you run the scripts.
- Demonstrations and Clustering

  Place trajectory data in the format expected by the preprocessing script:

  ```
  python data/preprocess.py --pkl <path_to_trajectories.pkl> [--output-pkl data/episodes_with_states.pkl]
  ```
  This produces `data/episodes_with_states.pkl` with per-step cluster labels and centers.
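For intuition about the per-step cluster labels, here is a toy, self-contained sketch of grouping continuous states into latent modes with plain Lloyd's k-means. The data, dimensions, and algorithm are illustrative assumptions, not necessarily what `data/preprocess.py` does.

```python
import numpy as np

# Toy sketch of the per-step clustering idea: group continuous states into
# latent modes. Illustrative only -- the actual preprocessing script may use
# a different (adaptive) algorithm and data layout.
def kmeans(states, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = states[rng.choice(len(states), size=k, replace=False)]
    for _ in range(iters):
        # Assign each step to its nearest center.
        dists = np.linalg.norm(states[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned steps.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = states[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated synthetic "modes" in a 2-D state space.
rng = np.random.default_rng(42)
states = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),   # mode A: near the origin
    rng.normal(5.0, 0.1, size=(50, 2)),   # mode B: near (5, 5)
])
labels, centers = kmeans(states, k=2)
```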
- RNN pretraining and residual training

  ```
  bash scripts/train/train.sh
  ```

  This runs `scripts/train/rnn_train.py` then `scripts/train/residual_train.py` with their default CLI arguments. Checkpoints and artifacts are written under `results/` and `results/checkpoints/`.
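For intuition about the bi-level composition, here is a schematic numpy sketch: a coarse per-mode base action (e.g. a cluster center associated with the current discrete state) plus a small continuous residual predicted from the observation. The linear "residual network", shapes, and numbers are placeholders, not the trained models this pipeline produces.

```python
import numpy as np

# Schematic bi-level action: discrete mode selects a coarse base action,
# a continuous residual refines it. All parameters below are placeholders.
def bilevel_action(mode, obs, mode_centers, residual_weights, residual_bias):
    base = mode_centers[mode]                          # coarse per-mode action
    residual = residual_weights @ obs + residual_bias  # fine correction
    return base + residual

# Hypothetical two-mode setup with a 3-D observation and 2-D action.
mode_centers = {"approach": np.array([1.0, 0.0]), "insert": np.array([0.0, 1.0])}
W = np.zeros((2, 3))            # placeholder "residual network" weights
b = np.array([0.05, -0.05])     # placeholder bias
obs = np.zeros(3)
a = bilevel_action("approach", obs, mode_centers, W, b)
# base [1, 0] plus residual [0.05, -0.05] → [1.05, -0.05]
```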
- Evaluation

  ```
  python scripts/eval/peg_insert_eval.py
  ```

If you find this work useful, please cite our paper:
```bibtex
@article{pan2026enap,
  title={Emergent Neural Automaton Policies: Learning Symbolic Structure from Visuomotor Trajectories},
  author={Pan, Yiyuan and Luo, Xusheng and Hu, Hanjiang and Yu, Peiqi and Liu, Changliu},
  journal={arXiv preprint arXiv:2603.25903},
  year={2026}
}
```

Paper: https://arxiv.org/abs/2603.25903
This code builds on ManiSkill. We thank the ManiSkill team for the simulation stack and APIs.
