Nano-vLLM Engine Single-Task RL Environment

I wrote this prototype around the nano-vllm engine layer from GeeeekExplorer/nano-vllm rather than around my earlier custom mini-decoder. If I want a single-task RL environment that is still representative of modern LLM inference, this is where the interesting control-flow problems live: request scheduling, prefix caching, block ownership, cache pressure, and decode lifecycle.

I intentionally did not try to vendor the entire upstream project. For this environment, I only need the engine slice and a lightweight deterministic runner. That keeps the task judgeable and self-contained while still making it feel like real modern inference-system work instead of a toy exercise.

Upstream Basis

This prototype is adapted from the MIT-licensed nano-vllm project:

Repository: https://github.com/GeeeekExplorer/nano-vllm
Engine subtree used as the code basis: nanovllm/engine

What The Task Is

The task is to repair a bug in an adapted local copy of the nanovllm engine implementation.

The bug is in prefill admission under prefix caching. The baseline engine correctly stores reusable full prompt blocks, but its scheduler still budgets prefill work using total prompt length instead of uncached prompt length. Under tight max_num_batched_tokens, that causes the engine to reject or stall requests even when the cache should allow them to proceed.

I like this version much more because it is closer to the real shape of modern inference systems work:

the solver has to reason about the scheduler and block manager together
the bug only appears under realistic shared-prefix workloads
the judge can stay deterministic and behavioral
the environment is still small enough to understand in one sitting

Layout

TASK_PROMPT.md: the formal task statement I would hand to the agent.
task_repo_baseline/: the buggy adapted local nanovllm package the agent is allowed to modify.
judge_assets/: the trusted evaluation side, including hidden tests and a corrected reference implementation.
simple_examples/: example patches that show the evaluator working end to end.
run_eval.py: the clean-snapshot evaluator.
THIRD_PARTY_NOTICES.md: attribution for the upstream engine source I adapted.

Intended Environment

I designed this environment assuming the evaluation image preinstalls the following packages:

python>=3.10
torch
numpy
pytest
transformers
tqdm
xxhash
typing_extensions

Evaluation Flow

(For the judge) We want to evaluate patches, not a dirty workspace.

restore a fresh copy of task_repo_baseline/.
apply only the candidate patch.
reject patches that touch protected evaluation assets or public tests.
run public tests in the candidate repo.
run hidden behavioral tests from the trusted side in a fresh Python process.

The score is about engine behavior, not code style. I care about whether the candidate restores correct scheduling semantics under prefix caching pressure.

The public tests intentionally expose the broad shape of the problem without fully specifying the fix. They show that prefix blocks are reusable and that shared-prefix prefill should still make progress under a tight token budget. The hidden tests then check the more complete scheduler behavior.

Quickstart

You can run the evaluator against the included example patches with:

python3 run_eval.py simple_examples/example_bad.patch
python3 run_eval.py simple_examples/example_good.patch

Reproducible Eval Image

I also packaged the prototype as a simple Docker image so the intended environment can be recreated directly instead of being inferred from prose.

You can build the image with:

docker build -t nano-vllm-rl-env .

You can run the evaluator inside that image with:

docker run --rm -it nano-vllm-rl-env python3 run_eval.py simple_examples/example_bad.patch
docker run --rm -it nano-vllm-rl-env python3 run_eval.py simple_examples/example_good.patch

If you want to iterate on candidate fixes from my local checkout, you can mount the workspace into the container:

docker run --rm -it -v "$PWD":/workspace -w /workspace nano-vllm-rl-env bash

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
judge_assets		judge_assets
simple_examples		simple_examples
task_repo_baseline		task_repo_baseline
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
TASK_PROMPT.md		TASK_PROMPT.md
THIRD_PARTY_NOTICES.md		THIRD_PARTY_NOTICES.md
requirements.txt		requirements.txt
run_eval.py		run_eval.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nano-vLLM Engine Single-Task RL Environment

Upstream Basis

What The Task Is

Layout

Intended Environment

Evaluation Flow

Quickstart

Reproducible Eval Image

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Nano-vLLM Engine Single-Task RL Environment

Upstream Basis

What The Task Is

Layout

Intended Environment

Evaluation Flow

Quickstart

Reproducible Eval Image

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages