ci(benchmarks): add Devstral 2 to H100 nightly benchmarks#1069
ci(benchmarks): add Devstral 2 to H100 nightly benchmarks#1069smfirmin wants to merge 6 commits intolightseekorg:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request adds the Devstral-2-123B-Instruct-2512 model to the nightly performance benchmarks and infrastructure configurations. It also introduces a mechanism to skip redundant multi-worker test variants for specific models. The reviewer suggests simplifying the test generation logic by checking if _multi_workers > 1 instead of using a hardcoded exclusion set, which would more generically handle models that only require a single worker.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
e2e_test/benchmarks/test_nightly_perf.py (1)
176-188: 🧹 Nitpick | 🔵 TrivialClean up
_variantsfrom module namespace for consistency.Line 188 cleans loop temporaries, but
_variantsis now also a module-level temporary and should be deleted too.Suggested small cleanup
-del _model_id, _name, _multi_workers, _backends, _extra, _suffix, _count, _cls_name, _cls +del _model_id, _name, _multi_workers, _backends, _extra, _variants, _suffix, _count, _cls_name, _cls🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@e2e_test/benchmarks/test_nightly_perf.py` around lines 176 - 188, The module-level temporary _variants is left behind after the loop and should be removed with the other temporaries; add _variants to the cleanup deletion so the trailing line becomes del _model_id, _name, _multi_workers, _backends, _extra, _suffix, _count, _cls_name, _cls, _variants to avoid leaking the variable from the loop that builds test classes in the block using _NIGHTLY_MODELS and _make_test_class.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 353: The YAML flow-mapping on the line containing the mapping with id:
mistralai/Devstral-2-123B-Instruct-2512, slug:
mistralai-Devstral-2-123B-Instruct-2512, and test_class:
TestNightlyDevstral2Single violates the `braces` rule due to spaces immediately
inside `{ }`; remove the space after `{` and before `}` so the mapping becomes
brace-adjacent (e.g. `{id: ..., slug: ..., test_class: ...}`) to satisfy
yamlint.
---
Outside diff comments:
In `@e2e_test/benchmarks/test_nightly_perf.py`:
- Around line 176-188: The module-level temporary _variants is left behind after
the loop and should be removed with the other temporaries; add _variants to the
cleanup deletion so the trailing line becomes del _model_id, _name,
_multi_workers, _backends, _extra, _suffix, _count, _cls_name, _cls, _variants
to avoid leaking the variable from the loop that builds test classes in the
block using _NIGHTLY_MODELS and _make_test_class.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 419636fa-5ef2-4f14-a65a-ea79ac9fe15d
📒 Files selected for processing (3)
.github/workflows/nightly-benchmark.ymle2e_test/benchmarks/test_nightly_perf.pye2e_test/infra/model_specs.py
|
Hi @smfirmin, the DCO sign-off check has failed. All commits must include a To fix existing commits: # Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-leaseTo sign off future commits automatically:
|
efd72de to
34a818c
Compare
|
Hi @smfirmin, the DCO sign-off check has failed. All commits must include a To fix existing commits: # Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-leaseTo sign off future commits automatically:
|
c06ba3b to
ad7d542
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ad7d5426f5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ad7d542 to
7562d29
Compare
7562d29 to
dbcf595
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dbcf595cc9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
dbcf595 to
2d532b4
Compare
Signed-off-by: Sydney Firmin <sydney.firmin@oracle.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 124: The Devstral 2 matrix entry was added to the wrong job matrix; move
the matrix entry referencing id "mistralai/Devstral-2-123B-Instruct-2512" (slug
"mistralai-Devstral-2-123B-Instruct-2512", test_class
"TestNightlyDevstral2Single") out of the H100 "single-worker" matrix and add it
to the "single-worker-h200" matrix (the block after the paused MiniMax line) so
the nightly run targets H200; ensure you remove it from the original
"single-worker" list to avoid duplicate runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0fd3306b-dc8f-4f27-97e2-262b0d008a06
📒 Files selected for processing (3)
.github/workflows/nightly-benchmark.ymle2e_test/benchmarks/test_nightly_perf.pye2e_test/infra/model_specs.py
Signed-off-by: key4ng <rukeyang@gmail.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efa7376600
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/nightly-benchmark.yml (1)
125-125:⚠️ Potential issue | 🟠 MajorFix YAML flow-mapping brace spacing to avoid lint failure.
Line [125] uses spaces inside
{ ... }, which violates the configuredbraceslint rule.Lint-safe fix
- - { id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single } + - {id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/nightly-benchmark.yml at line 125, The flow-mapping entry containing the mapping with keys id, slug, and test_class (the line with id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single) uses spaces inside the braces; remove the inner spaces so the mapping is brace-lint-safe (e.g., change "{ id: ..., slug: ..., test_class: ... }" to "{id: ...,slug: ...,test_class: ...}"). Ensure the updated single-line flow mapping preserves the same keys/values and passes the `braces` lint rule.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 230: The jobs "multi-worker" and "single-worker-h200" are unintentionally
disabled by an unconditional `if: false`; update their job-level `if` condition
(the `if:` line inside the job definitions for multi-worker and
single-worker-h200) to `if: ${{ !cancelled() }}` to re-enable them, or if
disabling was intentional add a clear comment above each job explaining the
rationale; ensure you change both occurrences (the `if:` line for the
multi-worker job and the `if:` line for the single-worker-h200 job).
---
Duplicate comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 125: The flow-mapping entry containing the mapping with keys id, slug,
and test_class (the line with id: mistralai/Devstral-2-123B-Instruct-2512, slug:
mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single)
uses spaces inside the braces; remove the inner spaces so the mapping is
brace-lint-safe (e.g., change "{ id: ..., slug: ..., test_class: ... }" to "{id:
...,slug: ...,test_class: ...}"). Ensure the updated single-line flow mapping
preserves the same keys/values and passes the `braces` lint rule.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fbcca510-37e6-4135-9284-e4487924360b
📒 Files selected for processing (1)
.github/workflows/nightly-benchmark.yml
This reverts commit efa7376. Signed-off-by: key4ng <rukeyang@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-benchmark.yml:
- Line 124: The flow-mapping on the line containing the mapping for id:
mistralai/Devstral-2-123B-Instruct-2512, slug:
mistralai-Devstral-2-123B-Instruct-2512, and test_class:
TestNightlyDevstral2Single has spaces after the opening brace and before the
closing brace; remove those spaces so the mapping uses "{id: ..., slug: ...,
test_class: ...}" (no space immediately after "{" and no space immediately
before "}") to satisfy yamlint’s braces rule.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 160c8a8b-e79b-4091-be58-9e677f4feca4
📒 Files selected for processing (1)
.github/workflows/nightly-benchmark.yml
| - { id: Qwen/Qwen2.5-7B-Instruct, slug: Qwen-Qwen2.5-7B-Instruct, test_class: TestNightlyQwen7bSingle } | ||
| - { id: Qwen/Qwen3-30B-A3B, slug: Qwen-Qwen3-30B-A3B, test_class: TestNightlyQwen30bSingle } | ||
| - { id: openai/gpt-oss-20b, slug: openai-gpt-oss-20b, test_class: TestNightlyGptOss20bSingle } | ||
| - { id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Confirm TestNightlyDevstral2Single exists in the test generator configuration
# Check the model definition and generated test class naming pattern
rg -A 2 'mistralai/Devstral-2-123B-Instruct-2512' e2e_test/benchmarks/test_nightly_perf.py
# Verify the test class generation loop and naming pattern
rg -B 3 -A 5 'TestNightly.*Single.*=' e2e_test/benchmarks/test_nightly_perf.py
# Confirm the pattern: TestNightly{fragment}{Single|Multi}
ast-grep --pattern $'for $_ in $_:
$$$
_cls_name = f"TestNightly{$_}{$_}"
$$$'Repository: lightseekorg/smg
Length of output: 712
Fix YAML flow-mapping brace spacing to pass yamlint checks.
The flow-mapping on line 124 has spaces after { and before }, which violates the braces rule in yamlint and will fail CI linting. The test class name TestNightlyDevstral2Single is correctly generated from the test framework pattern.
🔧 Lint-compliant fix
- - { id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single }
+ - {id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - { id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single } | |
| - {id: mistralai/Devstral-2-123B-Instruct-2512, slug: mistralai-Devstral-2-123B-Instruct-2512, test_class: TestNightlyDevstral2Single} |
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 124-124: too many spaces inside braces
(braces)
[error] 124-124: too many spaces inside braces
(braces)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/nightly-benchmark.yml at line 124, The flow-mapping on the
line containing the mapping for id: mistralai/Devstral-2-123B-Instruct-2512,
slug: mistralai-Devstral-2-123B-Instruct-2512, and test_class:
TestNightlyDevstral2Single has spaces after the opening brace and before the
closing brace; remove those spaces so the mapping uses "{id: ..., slug: ...,
test_class: ...}" (no space immediately after "{" and no space immediately
before "}") to satisfy yamlint’s braces rule.
Description
Summary
Add
mistralai/Devstral-2-123B-Instruct-2512to the nightly benchmark coverage on H100 for bothsglangandvllm.Changes
MODEL_SPECSwith nightly benchmark metadata4single-worker-h100workflow matrixminimax-m2benchmark entryNotes
sglangandvllmSummary by CodeRabbit
New Features
Tests
Chores