Rewrite sglang integration test with tensor dumps and harden Ray worker env#75
Conversation
…er env
- Rewrite `test_sglang_engine_integration.py` with structured tests (short seqs, long seqs, text prompts) that dump all tensors to `.pt` files for cross-engine comparison against vllm. Auto-resolve aux layer IDs from model config.
- Propagate `TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC` and `TORCHINDUCTOR_FX_GRAPH_CACHE` to Ray training workers to avoid NCCL watchdog kills and enable inductor FX graph caching.
- Update torchspec env key allowlist: drop legacy `NCCL_IB_*` keys, add the two new torch/inductor env vars.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 313e3c0c78
Pull request overview
This PR updates TorchSpec’s sglang integration workflow to support cross-engine tensor dump comparisons and hardens Ray worker environment propagation for long-running training steps.
Changes:
- Rewrites `tests/test_sglang_engine_integration.py` into structured scenarios that fetch Mooncake tensors and dump them to `.pt` files, with aux layer IDs auto-derived from model config.
- Propagates `TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC` (default 1800s) and `TORCHINDUCTOR_FX_GRAPH_CACHE` (default enabled) into the Ray training worker `runtime_env`.
- Updates the Ray env allowlist to include the two new Torch/TorchInductor env vars and drop legacy NCCL IB keys.
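The propagation described above might be sketched as follows; the helper name and merge strategy are assumptions for illustration, not the actual `torchspec/ray/train_group.py` code:

```python
import os


def build_runtime_env(base_env=None):
    """Merge safer defaults for the NCCL heartbeat timeout and Inductor FX
    graph caching into the env_vars forwarded to Ray training workers.
    Caller- or environment-provided values take precedence over defaults."""
    env_vars = dict(base_env or {})
    env_vars.setdefault(
        "TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC",
        os.environ.get("TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC", "1800"),
    )
    env_vars.setdefault(
        "TORCHINDUCTOR_FX_GRAPH_CACHE",
        os.environ.get("TORCHINDUCTOR_FX_GRAPH_CACHE", "1"),
    )
    # Ray accepts env vars for workers via runtime_env={"env_vars": {...}}.
    return {"env_vars": env_vars}
```

Raising the heartbeat timeout keeps the NCCL watchdog from killing workers during long training steps, while the FX graph cache avoids recompiling inductor graphs across runs.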
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| torchspec/utils/env.py | Adjusts the env var allowlist forwarded to Ray actors (adds Torch/NCCL + Inductor knobs). |
| torchspec/ray/train_group.py | Ensures Ray training workers receive safer defaults for NCCL heartbeat + Inductor FX graph cache. |
| tests/test_sglang_engine_integration.py | Reworks the sglang integration script to run multiple scenarios and dump Mooncake tensors for cross-engine comparison. |
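The allowlist adjustment in `torchspec/utils/env.py` could look roughly like this sketch; the set contents and helper name are illustrative, not the real module:

```python
import os

# Hypothetical allowlist of env vars forwarded to Ray actors; per the PR,
# legacy NCCL_IB_* keys are dropped and the two new Torch knobs are added.
RAY_ENV_ALLOWLIST = frozenset({
    "TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC",
    "TORCHINDUCTOR_FX_GRAPH_CACHE",
})


def forwarded_env(environ=None):
    """Return only allowlisted environment variables for Ray actors."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in environ.items() if k in RAY_ENV_ALLOWLIST}
```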
```python
if _REPO_ROOT not in sys.path:
    sys.path.insert(0, _REPO_ROOT)
```
This module mutates global interpreter state at import time (inserting repo root into sys.path). Since this file lives under tests/ and matches pytest’s default collection pattern (test_*.py), importing it during test discovery can unexpectedly change import resolution for the entire test session. Consider moving the sys.path manipulation into main() / the main block (or renaming/relocating the script so it isn’t collected by pytest).
Suggested change:
```diff
-if _REPO_ROOT not in sys.path:
-    sys.path.insert(0, _REPO_ROOT)
+def _ensure_repo_root_on_sys_path() -> None:
+    if _REPO_ROOT not in sys.path:
+        sys.path.insert(0, _REPO_ROOT)
+
+if __name__ == "__main__":
+    _ensure_repo_root_on_sys_path()
```
```python
# Start mooncake master first:
# mooncake_master --port 50051 &
# etcd --listen-client-urls http://0.0.0.0:8090 --advertise-client-urls http://localhost:8090 &
```
The usage header shows mooncake_master/etcd ports 50051/8090, but the script defaults MOONCAKE_MASTER_PORT/MOONCAKE_METADATA_PORT to 51135/8763 below. This mismatch can confuse users following the instructions. Update the usage text to match the defaults, or explicitly state that the ports are controlled via MOONCAKE_* env vars (and that the shown command is just an example).
Suggested change:
```diff
-# Start mooncake master first:
-# mooncake_master --port 50051 &
-# etcd --listen-client-urls http://0.0.0.0:8090 --advertise-client-urls http://localhost:8090 &
+# Start mooncake services first. The ports are controlled by
+# MOONCAKE_MASTER_PORT / MOONCAKE_METADATA_PORT; the commands below use
+# this script's defaults:
+# mooncake_master --port 51135 &
+# etcd --listen-client-urls http://0.0.0.0:8763 --advertise-client-urls http://localhost:8763 &
```
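A hedged sketch of how the script presumably resolves these ports from the environment (the helper name is illustrative; the default values are taken from the review comment above):

```python
import os


def resolve_mooncake_ports(environ=None):
    """Resolve mooncake ports from MOONCAKE_* env vars, falling back to the
    script defaults (51135 for the master, 8763 for etcd metadata)."""
    environ = os.environ if environ is None else environ
    master = int(environ.get("MOONCAKE_MASTER_PORT", "51135"))
    metadata = int(environ.get("MOONCAKE_METADATA_PORT", "8763"))
    return master, metadata
```

With this pattern, users following the usage header can override both ports without editing the script.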
```diff
 spec_training_response_length=[5, 10, 8],
-sampling_params={"max_new_tokens": 32},
 spec_training_data_id=data_ids,
+sampling_params={"max_new_tokens": 1},
```
This script sets sampling_params={"max_new_tokens": 1}, but TorchSpec’s SglEngine spec_training path uses max_new_tokens=0 (prefill-only). Generating extra tokens can change what the engine writes to Mooncake and can make the seq_len/shape assertions in this test incorrect or flaky. Align this to max_new_tokens=0 (or derive seq_len from meta_info["prompt_tokens"] consistently when validating shapes).
Suggested change:
```diff
-sampling_params={"max_new_tokens": 1},
+sampling_params={"max_new_tokens": 0},
```
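A minimal sketch of the validation the comment suggests, assuming the prefill-only convention and that `meta_info["prompt_tokens"]` is available in the generate result (both names are taken from the comment, not verified against the sglang API):

```python
def expected_dump_seq_len(meta_info: dict) -> int:
    """Under the prefill-only spec_training path (max_new_tokens=0), the
    Mooncake-dumped tensors are assumed to span exactly the prompt tokens."""
    return meta_info["prompt_tokens"]


def check_dump_seq_len(tensor_seq_len: int, meta_info: dict) -> bool:
    # Flag dumps whose sequence dimension disagrees with the prompt length,
    # e.g. because extra tokens were generated with max_new_tokens > 0.
    return tensor_seq_len == expected_dump_seq_len(meta_info)
```

Deriving the expected length from `meta_info` keeps the shape assertions stable even if sampling parameters change.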
```python
results = engine.generate(
    prompt=text_prompts,
    spec_training_data_id=data_ids,
    sampling_params={"max_new_tokens": 1},
    return_hidden_states=True,
)
```
Same as above: using sampling_params={"max_new_tokens": 1} diverges from TorchSpec’s prefill-only spec_training behavior (max_new_tokens=0) and may change Mooncake tensor shapes/content relative to the asserted prompt token length. Consider setting this to 0 to keep the dump comparable to training-time behavior.
…gured value
MOONCAKE_MASTER_SERVER was always set to {LOCAL_IP}:51135 regardless of
the resolved MOONCAKE_MASTER_PORT value. Build it from the actual port
so overriding MOONCAKE_MASTER_PORT (e.g. 50051) propagates correctly.
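The fix described above might be sketched like this (the helper name is illustrative; the default port comes from the commit message):

```python
import os


def mooncake_master_server(local_ip: str, environ=None) -> str:
    """Build MOONCAKE_MASTER_SERVER from the resolved port rather than a
    hardcoded 51135, so a MOONCAKE_MASTER_PORT override propagates."""
    environ = os.environ if environ is None else environ
    port = environ.get("MOONCAKE_MASTER_PORT", "51135")
    return f"{local_ip}:{port}"
```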
Avoids doubling memory usage when the eval cache checkpoint is large, since mmap lazily pages in tensors instead of reading the full file.
Summary
- Rewrites the sglang integration test into structured scenarios that dump all tensors to `.pt` files for cross-engine comparison (e.g. vs vllm). Auto-resolves aux layer IDs from model config instead of hardcoding.
- Propagates `TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC` (default 1800s) and `TORCHINDUCTOR_FX_GRAPH_CACHE` (enabled) to Ray training workers to prevent NCCL watchdog kills during long training steps and speed up inductor compilations.
- Updates the env key allowlist: drop legacy `NCCL_IB_*` keys, add the two new torch/inductor env vars.

Test plan
- Ran `python tests/test_sglang_engine_integration.py --model Qwen/Qwen3-8B --tp 4` with mooncake services running; verified tensor dumps appear in `./tensor_dumps/`
- Verified the new env vars reach Ray workers (`TORCH_NCCL_HEARTBEAT_TIMEOUT_SEC`, `TORCHINDUCTOR_FX_GRAPH_CACHE`) by inspecting worker logs
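Once dumps exist from both engines, a comparison pass might look like this sketch; the paths, key layout, and tolerances are assumptions, not part of the test itself:

```python
import torch


def compare_dumps(sglang_path: str, vllm_path: str,
                  rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """Load two .pt tensor dumps and check every shared key for matching
    shape and values within tolerance; returns True only if all match."""
    a = torch.load(sglang_path, map_location="cpu")
    b = torch.load(vllm_path, map_location="cpu")
    all_match = True
    for key in sorted(set(a) & set(b)):
        same = a[key].shape == b[key].shape and torch.allclose(
            a[key].float(), b[key].float(), rtol=rtol, atol=atol)
        print(f"{key}: shape={tuple(a[key].shape)} match={same}")
        all_match = all_match and same
    return all_match
```

Casting to float before `torch.allclose` sidesteps dtype mismatches when one engine dumps bf16 and the other fp16.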