FA-OPD

Train with an expressive Flow Matching teacher. Deploy a lightweight MLP policy.


What This Repository Provides

FA-OPD is the official implementation of "Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher". It targets learning-from-demonstrations settings where reward design is difficult, demonstrations are available, and the deployed controller must remain cheap to run.

The repository includes:

the FA-OPD training loop and Flow Matching teacher modules;
PPO-compatible MLP student policies for efficient deployment;
diffusion, Flow Matching, GAIL/AIRL/WAIL/DRAIL, and online FM-policy baselines;
expert demonstrations tracked with Git LFS;
W&B sweep configs for navigation, manipulation, and locomotion benchmarks.

Method

FA-OPD uses the Flow Matching model as a training-time teacher, not as the deployed actor.

Signal	Teacher role	Student update	Effect
Reward distillation	FM-enhanced discriminator scores each visited state-action pair	PPO maximizes expert-likeness rewards	Online improvement and exploration
Action distillation	FM generator supplies target actions at student-visited states	MLP regresses onto teacher actions	Dense local correction and stability
Deployment	Teacher removed	MLP forward pass only	Low-latency control

The result is a compact student policy that benefits from an expressive teacher during training without paying the flow-inference cost at deployment.
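
The two training signals in the table can be illustrated with a minimal sketch. This is not the code in faopd/faopd_algo.py: the GAIL-style reward form, the mean-squared-error regression loss, the additive weighting, and every function and parameter name below are assumptions chosen for illustration only.

```python
import math
from typing import Sequence


def discriminator_reward(d_prob: float, eps: float = 1e-8) -> float:
    """Reward distillation signal for one state-action pair.

    Assumes a GAIL-style reward -log(1 - D(s, a)), where d_prob is the
    discriminator's estimate that the pair is expert-like. The reward form
    actually used by FA-OPD may differ.
    """
    d_prob = min(max(d_prob, eps), 1.0 - eps)  # keep the log finite
    return -math.log(1.0 - d_prob)


def action_distillation_loss(
    student_actions: Sequence[Sequence[float]],
    teacher_actions: Sequence[Sequence[float]],
) -> float:
    """Action distillation signal over a batch of student-visited states.

    Assumes mean squared error between the student's actions and the
    teacher's target actions at the same states.
    """
    total, count = 0.0, 0
    for student, teacher in zip(student_actions, teacher_actions):
        for s_i, t_i in zip(student, teacher):
            total += (s_i - t_i) ** 2
            count += 1
    return total / count


def student_objective(
    ppo_loss: float, distill_loss: float, distill_weight: float = 1.0
) -> float:
    """Combine both signals; the additive weighting is a hypothetical choice."""
    return ppo_loss + distill_weight * distill_loss
```

In this sketch the PPO loss would be computed from rewards produced by `discriminator_reward`, while `action_distillation_loss` supplies the dense regression term. Neither function involves the teacher at deployment, which matches the table's last row.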

Repository Map

FA-OPD/
├── faopd/
│   ├── main.py              # experiment entry point
│   ├── faopd_algo.py        # FA-OPD algorithm
│   ├── flow_matching/       # FM teacher and discriminator
│   ├── ddpm/                # diffusion-policy baseline modules
│   ├── fm_a2c.py            # online FM-policy A2C baseline
│   ├── fm_ppo.py            # online FM-policy PPO baseline
│   └── fpo_algo.py          # FM policy-gradient baseline
├── configs/                 # W&B sweep configs
├── expert_datasets/         # demonstrations tracked with Git LFS
├── goal_prox/               # customized environments
├── rl-toolkit/              # RL and imitation-learning infrastructure
├── d4rl/                    # Maze2D dependency
├── utils/                   # setup and logging helpers
└── assets/                  # README figures

Installation

The experiments were run with Python 3.8, CUDA 11.7, PyTorch 1.13, MuJoCo, W&B, and Git LFS.

git lfs install
git clone https://github.com/vanzll/FA-OPD.git
cd FA-OPD
git lfs pull

conda create -n faopd python=3.8
conda activate faopd
bash utils/setup.sh

If the MuJoCo build fails, installing the following system libraries and a conda-forge GCC may help:

sudo apt-get install libglew-dev libosmesa6-dev
conda install -c conda-forge gcc=12.1.0

Quick Start

All runs are launched from W&B sweep configs.

wandb login
bash utils/wandb.sh configs/ant/faopd.yaml

FA-OPD configs:

configs/ant/faopd.yaml
configs/hand/faopd.yaml
configs/pick/faopd.yaml
configs/maze/faopd.yaml
configs/hopper/faopd.yaml
configs/walker/faopd.yaml

Baseline configs live in the same task folders:

airl.yaml
diffusion-policy.yaml
drail.yaml
fm_policy.yaml
gail.yaml
wail.yaml

Maze2D also includes online FM-policy baselines:

fm-a2c.yaml
fm-ppo.yaml
fpo.yaml

Logs are written to:

data/log/<environment>_<method>/<seed>/metrics.csv
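
As a convenience, the log layout above can be resolved and read with the Python standard library. This is a minimal sketch: the helper names are hypothetical, the values "ant", "faopd", and 1 are placeholders for whatever environment, method, and seed a run actually used, and no assumption is made about which columns metrics.csv contains.

```python
import csv
from pathlib import Path
from typing import Dict, List


def metrics_path(
    environment: str, method: str, seed: int, root: str = "data/log"
) -> Path:
    """Build data/log/<environment>_<method>/<seed>/metrics.csv."""
    return Path(root) / f"{environment}_{method}" / str(seed) / "metrics.csv"


def load_metrics(path: Path) -> List[Dict[str, str]]:
    """Read every row of a metrics file as a dict keyed by the CSV header."""
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Placeholder run identifiers; substitute the ones from your sweep.
    path = metrics_path("ant", "faopd", 1)
    print(path.as_posix())
    if path.exists():
        rows = load_metrics(path)
        print(f"{len(rows)} rows, columns: {list(rows[0]) if rows else []}")
```

Values are returned as strings, so convert the columns of interest to numbers before plotting or averaging across seeds.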

Demonstrations

The demonstration tensors are stored in expert_datasets/:

File	Task family
`ant_50.pt`	Navigation
`maze2d_25.pt`	Navigation
`hand_10000.pt`	Manipulation
`pick_10000.pt`	Manipulation
`ppo_hopper_1.pt`	Locomotion
`ppo_walker_1.pt`	Locomotion

If a file is a small Git LFS pointer instead of a tensor, run git lfs pull.
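
One way to detect an unfetched file before training is to check for the Git LFS pointer header. Pointer files are short text files whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below relies only on that fact; the helper name `is_lfs_pointer` is hypothetical and not part of this repository.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with path.open("rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    # Report demonstration files that still need `git lfs pull`.
    for demo in sorted(Path("expert_datasets").glob("*.pt")):
        status = "LFS pointer, run git lfs pull" if is_lfs_pointer(demo) else "ok"
        print(f"{demo.name}: {status}")
```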

Experiments

Method	Ant-goal	Maze2D	Hand-rotate	Fetch-pick	Hopper	Walker2d
DRAIL	0.7142	0.7780	0.7775	0.7052	3182.60	3122.69
GAIL	0.6465	0.6902	0.9317	0.2798	2921.73	1698.25
WAIL	0.6127	0.2978	0.2370	0.0000	2609.28	1729.20
VAIL	0.7662	0.6360	0.5694	0.8539	2878.04	1156.52
AIRL	0.5467	0.8239	0.4595	0.0000	7.86	-5.27
Diffusion Policy	0.8212	0.5618	0.9068	0.8298	1433.21	2204.41
Flow Matching Policy	0.8334	0.5420	0.9032	0.5460	1950.41	2384.81
FA-OPD	0.8225	0.8731	0.9794	0.9984	3358.95	4164.24

Navigation and manipulation tasks report average success rate. Locomotion tasks report average return.

Implementation Notes

Main entry point: faopd/main.py
Proposed method key: faopd
Deployed actor: MLP student policy
Training-only teacher: Flow Matching discriminator/generator
Experiment launcher: utils/wandb.sh

Third-party components remain in their upstream-style subdirectories and retain their own licensing terms.

Acknowledgements

This repository builds on several open-source research codebases. In particular, we thank the authors of DRAIL for releasing their official implementation, which informed the DRAIL baseline integration used in our comparisons.

Citation

@inproceedings{wan2026faopd,
  title     = {Adversarial Dual On-Policy Distillation from Expressive Flow-based Teacher},
  author    = {Wan, Zhenglin and Wu, Jingxuan and Yu, Xingrui and Zhang, Chubin and Lei, Mingcong and An, Bo and Tsang, Ivor W. and You, Yang},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  eprint    = {2605.27095},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url       = {https://arxiv.org/abs/2605.27095},
  year      = {2026}
}
