feat(grpc): support vLLM --generation-config in gRPC mode#1103
ai-jz wants to merge 9 commits into lightseekorg:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your review settings. Use the following commands to manage reviews:
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration: Organization UI · Review profile: ASSERTIVE · Plan: Pro
📒 Files selected for processing (1)
📝 Walkthrough
The PR modifies sampling parameter handling across proto definitions, Rust client code, and the Python servicer to distinguish between unset and explicitly-set values. Proto fields are marked optional, and client code propagates `Option` values instead of substituting defaults.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor Client
    participant gRPC
    participant Servicer
    participant ModelConfig
    Client->>gRPC: Send SamplingParams (some fields unset)
    gRPC->>Servicer: Receive proto message with optional fields
    Servicer->>ModelConfig: Load model default sampling params
    ModelConfig-->>Servicer: Return defaults dict
    Servicer->>Servicer: For each sampling field:<br/>if unset in proto → use default<br/>else → use proto value
    Servicer-->>Client: Apply merged params to generation
```
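The per-field merge in the diagram can be sketched in plain Python. This is an illustrative stand-in, not the real servicer code: absent dict keys model proto3 `optional` fields where `HasField()` returns False, and the field names and defaults are hypothetical.

```python
# Sketch of the merge step: a field the client explicitly set wins;
# an unset field falls back to the model's generation-config default.
# Absent keys in `request` stand in for HasField(name) == False.

def merge_sampling_params(request: dict, defaults: dict) -> dict:
    """Merge request-level sampling params over model defaults."""
    fields = ("temperature", "top_p", "top_k", "min_p",
              "frequency_penalty", "presence_penalty", "repetition_penalty")
    merged = {}
    for name in fields:
        if name in request:       # client set it (even to 0) -> keep it
            merged[name] = request[name]
        else:                     # unset -> generation-config default
            merged[name] = defaults.get(name)
    return merged

if __name__ == "__main__":
    defaults = {"temperature": 0.7, "top_p": 0.9, "top_k": 20}
    # top_k explicitly 0 is preserved; unset temperature picks up the default.
    print(merge_sampling_params({"top_k": 0}, defaults))
```

The key property is that an explicit zero survives the merge, which is exactly what proto3 `optional` plus `HasField()` makes possible on the wire.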
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ 3 passed
Hi @ai-jz, the DCO sign-off check has failed. All commits must include a `Signed-off-by` line. To fix existing commits:

```shell
# Sign off the last N commits (replace N with the number of unsigned commits)
git rebase HEAD~N --signoff
git push --force-with-lease
```

To sign off future commits automatically:
Force-pushed from 7fb8355 to 55ec145
Code Review
This pull request updates the gRPC interface to support model-specific generation configuration defaults by making sampling parameters optional in the protobuf definition and the Rust client. The Python servicer now correctly loads and applies these defaults when fields are omitted in the request, ensuring parity with the HTTP interface. Feedback was provided on the Rust client, suggesting a safer numeric conversion for the top_k field to handle potential overflows or negative values.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7fb8355f2b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
top_k=params.top_k if params.HasField("top_k")
else defaults.get("top_k", 0),
min_p=params.min_p if params.HasField("min_p")
else defaults.get("min_p", 0.0),
frequency_penalty=params.frequency_penalty if params.HasField("frequency_penalty")
```
Preserve zero-valued sampling fields from legacy clients
This fallback now interprets missing sampling fields as "unset" and pulls model defaults, but proto3 legacy clients (built before these fields were optional) omit scalar default values (0/0.0) on the wire. In a mixed-version rollout, explicit zero-valued settings like top_k=0, min_p=0, or zero penalties from older clients will arrive as absent and be replaced by generation-config defaults, changing generation behavior unexpectedly.
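The hazard described above can be simulated without protobuf. This sketch only imitates proto3 wire behavior (a pre-`optional` client drops scalar fields equal to their zero default when serializing); the function names and defaults are illustrative, not the real client or servicer code.

```python
# Illustrative simulation of the mixed-version hazard: an explicit top_k=0
# from a legacy client vanishes on the wire, so the new servicer treats it
# as unset and substitutes the generation-config default.

def legacy_serialize(fields: dict) -> dict:
    """Old client: proto3 omits scalars equal to the zero default."""
    return {k: v for k, v in fields.items() if v not in (0, 0.0)}

def new_servicer_merge(wire: dict, defaults: dict, name: str, fallback):
    """New servicer: a missing field falls back to the model default."""
    return wire[name] if name in wire else defaults.get(name, fallback)

if __name__ == "__main__":
    wire = legacy_serialize({"top_k": 0, "temperature": 0.6})
    # The client meant top_k=0 (disabled), but the field is absent on the wire:
    effective_top_k = new_servicer_merge(wire, {"top_k": 40}, "top_k", 0)
    print(effective_top_k)  # 40, not the 0 the legacy client asked for
```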
```python
if max_model_len is not None and input_length > 0:
    model_max = max_model_len - input_length
    candidates = [v for v in (model_max, raw_max_tokens, override_max_tokens) if v is not None]
    max_tokens = min(candidates) if candidates else None
else:
```
Apply override max_tokens when prompt is text input
override_max_tokens is only applied inside the input_length > 0 branch, so requests using the text input oneof (or empty tokenized input) skip that branch and ignore the configured override cap. In those cases max_tokens falls back to raw_max_tokens only, which can bypass override_generation_config.max_new_tokens limits.
Review Response

Addressed findings from Codex and CodeRabbit reviews in e73a892:

Addressed
1. [HIGH] Safe top_k conversion (vllm_engine.rs:586)
2. [P2] override_max_tokens skipped for text input (servicer.py:574)

Acknowledged (no change needed)
3. [P1] Legacy client zero-value concern (servicer.py:588)
Support all three --generation-config modes (auto, vllm, custom path) in the vLLM gRPC serving path, achieving parity with the HTTP path.

Previously, the gRPC servicer ignored the model generation_config.json and hardcoded neutral defaults (temperature=1.0, top_p=1.0, etc.). Now it reads the same defaults as the HTTP path via get_diff_sampling_param() and applies them when request fields are unset.

Three-layer change:
- Proto: make top_p, top_k, min_p, frequency_penalty, presence_penalty, repetition_penalty optional so HasField can detect unset fields
- Rust client: pass Option through instead of unwrap_or(default) in all 5 vLLM sampling param builders, preserving the unset signal
- Python servicer: load model defaults at init, use HasField for all fields, clamp max_tokens to remaining context, merge Harmony stop_token_ids for gpt_oss models

Tests:
- 3 new + 2 updated Rust unit tests for proto optional semantics
- 4 new e2e tests verifying generation_config defaults in gRPC mode
- All 48 Rust tests pass, 17/17 existing e2e openai-grpc tests pass

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
…or text input Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Force-pushed from e73a892 to 78072f3
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@e2e_test/chat_completions/test_generation_config.py`:
- Around lines 30-38: test_default_temperature_applied currently unpacks client from setup_backend but should use the api_client fixture directly. Remove the client extraction from "_, model_path, client, _ = setup_backend", keep only the values still needed from setup_backend (e.g., model_path), and update all references to "client" in the test to use "api_client".
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 313f1c2f-3415-4515-9770-90dd6fc7646e
📒 Files selected for processing (4)
crates/grpc_client/proto/vllm_engine.proto
crates/grpc_client/src/vllm_engine.rs
e2e_test/chat_completions/test_generation_config.py
grpc_servicer/smg_grpc_servicer/vllm/servicer.py
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Force-pushed from 0916c74 to e8df77c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8df77ca9a
```python
    params.max_tokens if params.HasField("max_tokens")
    else defaults.get("max_tokens")
)
if max_model_len is not None and input_length > 0:
```
Clamp max_tokens for zero-length prompts
Include max_model_len in the max_tokens clamp even when input_length == 0. The current guard (input_length > 0) skips the model-length cap for empty tokenized prompts and all text oneof prompts (which pass input_length=0), so raw_max_tokens/override_max_tokens can exceed context length and produce avoidable vLLM validation failures instead of being bounded to remaining capacity.
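A clamp along the lines this finding suggests can be sketched as follows. The function name and signature are hypothetical and only mirror the variables in the snippet above; the real fix lives in servicer.py.

```python
# Sketch of the suggested clamp: always bound max_tokens by the remaining
# context (max_model_len - input_length) and by the override cap, including
# when input_length == 0 (text oneof / empty tokenized prompts).

def clamp_max_tokens(raw_max_tokens, override_max_tokens,
                     max_model_len, input_length):
    candidates = [raw_max_tokens, override_max_tokens]
    if max_model_len is not None:
        # Apply the model-length cap unconditionally, not only when
        # input_length > 0, so zero-length prompts are bounded too.
        candidates.append(max_model_len - input_length)
    candidates = [c for c in candidates if c is not None]
    return min(candidates) if candidates else None

if __name__ == "__main__":
    # A text-input prompt reported as input_length=0 is still capped at 4096
    # instead of passing raw_max_tokens=8192 through to vLLM validation.
    print(clamp_max_tokens(raw_max_tokens=8192, override_max_tokens=None,
                           max_model_len=4096, input_length=0))
```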
- Fix AttributeError when override_generation_config is None: use
(getattr(..., None) or {}) to coalesce missing and None cases
- Defensive copy of get_diff_sampling_param() to prevent shared-state
mutation of ModelConfig internals
- Add ValueError guard when input_length exceeds max_model_len
- Clamp max_tokens to context length for text input prompts
- Fix dead fallback: return None when candidates list is empty
- Fix clippy: remove useless u32::try_from(u32) in Messages builder
- Apply ruff format to servicer.py
- Clarify test_default_temperature_applied is a smoke test
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2065fec386
```proto
optional float top_p = 2;
optional uint32 top_k = 3;
optional float min_p = 4;
optional float frequency_penalty = 5;
```
I don't recommend changing the types of the sampling parameters here. Making them optional would mean they default to 0, but the backend engine's default for the parameters above could be 1.0.
Proto3 optional is specifically what lets us distinguish "not set" from "explicitly 0" — without it, an unset float is silently 0.0 on the wire, and the servicer can't tell if the client meant zero or just didn't specify.
With optional, HasField("top_p") returns False when the field wasn't set, so the servicer falls through to the model's generation_config.json default. When it returns True, the value is used as-is (even if it's 0). The proto-level default is still 0, but the servicer never sees it because it checks HasField() first.
This matches how the HTTP path works — JSON naturally distinguishes a missing key from "top_p": 0. The optional keyword gives the gRPC path the same capability.
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
```python
class TestGenerationConfigDefaults:
    """Verify --generation-config defaults apply in gRPC mode."""

    def test_default_temperature_applied(self, setup_backend, api_client):
```
I don't think this test is necessary as it doesn't necessarily test the sampling parameters override.
Agreed — removed the test file in 60a723b. The core change is in proto + Rust client + servicer; tests can be added later when there's a way to observe the effective sampling params through the API.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/grpc_client/src/vllm_engine.rs`:
- Around lines 917-957: The tests only construct proto::SamplingParams directly, omit frequency_penalty/presence_penalty/repetition_penalty, and thus don't exercise the builder/conversion paths that regressed. Update test_sampling_params_optional_preserves_user_values and test_sampling_params_unset_fields_distinguishable_from_zero to (1) include frequency_penalty, presence_penalty, and repetition_penalty in both the user-set and zero cases, and (2) pass the proto::SamplingParams through the same builder/conversion path used in production (the SamplingParams builder/converter that consumes proto::SamplingParams) rather than asserting only on the raw proto struct, so the translation logic is exercised. Ensure assertions still check Some(value) vs None and Some(0) distinctions after conversion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bd57ef16-dcd2-4d5f-98c0-1f4f7d2884e9
📒 Files selected for processing (1)
crates/grpc_client/src/vllm_engine.rs
```rust
#[test]
fn test_sampling_params_optional_preserves_user_values() {
    // When user sets values, they should be preserved as Some(value)
    let params = proto::SamplingParams {
        temperature: Some(0.6),
        top_p: Some(0.95),
        top_k: Some(20),
        min_p: Some(0.1),
        frequency_penalty: Some(0.5),
        presence_penalty: Some(0.3),
        repetition_penalty: Some(1.2),
        max_tokens: Some(100),
        ..Default::default()
    };
    assert_eq!(params.temperature, Some(0.6));
    assert_eq!(params.top_p, Some(0.95));
    assert_eq!(params.top_k, Some(20));
    assert_eq!(params.min_p, Some(0.1));
    assert_eq!(params.frequency_penalty, Some(0.5));
    assert_eq!(params.presence_penalty, Some(0.3));
    assert_eq!(params.repetition_penalty, Some(1.2));
    assert_eq!(params.max_tokens, Some(100));
}

#[test]
fn test_sampling_params_unset_fields_distinguishable_from_zero() {
    // Unset (None) must be different from explicit zero — this is
    // the core invariant for --generation-config support
    let unset = proto::SamplingParams::default();
    let zero = proto::SamplingParams {
        temperature: Some(0.0),
        top_p: Some(0.0),
        top_k: Some(0),
        min_p: Some(0.0),
        ..Default::default()
    };
    assert_ne!(unset.temperature, zero.temperature);
    assert_ne!(unset.top_p, zero.top_p);
    assert_ne!(unset.top_k, zero.top_k);
    assert_ne!(unset.min_p, zero.min_p);
}
```
🧹 Nitpick | 🔵 Trivial
Exercise the builder paths in the regression tests.
This block proves the generated proto type can represent None vs Some(0), but it never hits the translation paths that regressed in this PR, and it omits frequency_penalty, presence_penalty, and repetition_penalty. A future unwrap_or(...) reintroduction in one of the builders would still leave this suite green.
Fair point — the test validates proto-level semantics (None vs Some(0)) but does not exercise the builder translation paths. The existing 48 Rust unit tests and 17 e2e openai-grpc tests cover those paths; this test was specifically for the new optional field behavior. Adding builder-path coverage is a good follow-up.
@ai-jz Understood — the distinction makes sense. The two new tests are purposefully scoped to proto-layer invariants (confirming None is representable and distinguishable from Some(0) after the optional keyword change), which is orthogonal to exercising the builder translation logic already covered by the broader unit and e2e suite. Happy to track the builder-path regression tests as a follow-up if that's the plan.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1034
File: crates/grpc_client/src/mlx_engine.rs:250-256
Timestamp: 2026-04-04T00:29:40.508Z
Learning: In lightseekorg/smg’s gRPC backend implementation under crates/grpc_client/src/, tool-calling enforcement is router-side: SMG’s router-side ToolParser parses tool calls from the backend’s text output. Therefore, MLX (and other gRPC backend) builder functions should NOT be required to reject/guard tool-related fields (e.g., tools, tool_choice) at the backend builder boundary just because they are missing. Only reject tool-related content when there’s an explicit, backend-specific structural constraint to enforce at the builder boundary (e.g., response_format / structural_tag or other required schema/structure invariants). When reviewing, do not flag the absence of tool-call guards in MLX/gRPC backend builders as missing enforcement.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 335b8130cd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
if input_length > 0:
    model_max = max_model_len - input_length
    if model_max <= 0:
        raise ValueError(
            f"Input length ({input_length}) exceeds max model length ({max_model_len})"
        )
```
Respect prompt truncation before enforcing context-length guard
The new input_length > 0 guard rejects requests before generation using the raw tokenized prompt length, but truncation is applied later via tokenization_kwargs (truncate_prompt_tokens). For requests where len(input_ids) > max_model_len but truncate_prompt_tokens would bring the prompt under the limit, this now raises INVALID_ARGUMENT even though vLLM could have served the request after truncation. This introduces a regression for direct gRPC callers using prompt truncation and breaks parity with the HTTP flow that computes limits from the effective prompt length.
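To make the truncation point concrete, here is a minimal, hypothetical sketch (not the servicer's actual code) of a guard that accounts for `truncate_prompt_tokens` before rejecting a request:

```python
def remaining_context(input_ids, max_model_len, truncate_prompt_tokens=None):
    """Guard against the effective prompt length, not the raw one.

    If truncation will shorten the prompt, the limit check should use the
    post-truncation length; otherwise servable requests get rejected early.
    """
    effective_len = len(input_ids)
    if truncate_prompt_tokens is not None:
        effective_len = min(effective_len, truncate_prompt_tokens)
    if effective_len >= max_model_len:
        raise ValueError(
            f"Input length ({effective_len}) exceeds max model length ({max_model_len})"
        )
    # Remaining budget available for generated tokens.
    return max_model_len - effective_len
```

A raw-length check would reject a 12-token prompt against `max_model_len=8` even when `truncate_prompt_tokens=4` would have made it servable; the effective-length check instead returns a budget of 4.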
The HTTP path's get_max_tokens() has the same guard — it checks raw input_length before truncation is applied by the engine. Made a minor edit in 4ee8b93 to align the guard condition (< instead of <=) and error message to match the HTTP path exactly.
The test doesn't verify actual sampling parameter values, only that requests succeed. Remove it to keep the test suite lean — the core change is in proto + Rust client + servicer. Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
💡 Codex Review
Reviewed commit: 4ee8b936ec
```python
candidates = [
    v for v in (model_max, raw_max_tokens, override_max_tokens) if v is not None
]
```
Clamp max_tokens against platform output limits
The new max_tokens clamp only considers model_max, request/default max_tokens, and override_max_tokens, but it omits vLLM’s platform-specific cap (current_platform.get_max_output_tokens(...)) that HTTP uses in get_max_tokens. On platforms that enforce a lower output-token ceiling than context-based model_max, requests with omitted/large max_tokens can now be forwarded with an oversized value and fail later with avoidable validation/runtime errors instead of being bounded up front.
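As a hedged sketch of what a platform-aware clamp could look like (the `platform_cap` argument stands in for the internal vLLM platform limit discussed here and is hypothetical):

```python
def clamp_max_tokens(model_max, raw_max_tokens, override_max_tokens, platform_cap=None):
    # Take the tightest of all known ceilings; None means "no opinion".
    candidates = [
        v
        for v in (model_max, raw_max_tokens, override_max_tokens, platform_cap)
        if v is not None
    ]
    return min(candidates) if candidates else None
```

With no candidates at all, the function returns `None` and leaves enforcement to the engine.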
current_platform.get_max_output_tokens() is an internal vLLM entrypoint API — the gRPC servicer runs outside that context and does not have access to it. The engine applies platform limits internally after receiving the request, same as it does for all other entrypoints.
```python
# Harmony model stop tokens — same as chat_completion/serving.py
if mc.hf_config.model_type == "gpt_oss":
    from vllm.entrypoints.openai.parser.harmony_utils import (
        get_stop_tokens_for_assistant_actions,
    )
```
This is injected in the Harmony pipeline. No need to inject it here.
```python
logger.info("VllmEngineServicer initialized")

# Load model's default sampling params (respects --generation-config)
mc = async_llm.model_config
```
Can we use full variable names here?
Done — renamed to model_config in b534cc2.
```python
mc = async_llm.model_config
self.default_sampling_params: dict = dict(mc.get_diff_sampling_param())

# max_tokens override — same logic as chat_completion/serving.py
```
Please include a detailed URL link to the specific commit.
Added — see serving.py#L141 in b534cc2.
```python
Returns:
    vLLM SamplingParams with detokenize=False and structured_outputs
    vLLM SamplingParams with model defaults applied for unset fields
```
The phrase "model defaults" here is misleading. What if the user passes --generation-config vllm or an overridden config?
Renamed to "generation-config defaults" in b534cc2 — get_diff_sampling_param() covers all three --generation-config modes.
```python
    params.max_tokens if params.HasField("max_tokens") else defaults.get("max_tokens")
)
if max_model_len is not None:
    if input_length > 0:
```
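The presence-aware merge in the diff above can be illustrated with a small, self-contained stand-in (the `FakeParams` class is hypothetical; the real code uses the generated protobuf message's `HasField`):

```python
class FakeParams:
    """Hypothetical stand-in mimicking proto3 optional-field presence."""

    def __init__(self, **set_fields):
        self._set = dict(set_fields)

    def HasField(self, name):
        return name in self._set

    def __getattr__(self, name):
        # Unset fields read as the proto default (0 here), like protobuf.
        return self._set.get(name, 0)


def resolve(params, field, defaults):
    # Explicitly-set request values win; unset fields fall back to the
    # generation-config defaults from get_diff_sampling_param().
    return getattr(params, field) if params.HasField(field) else defaults.get(field)


defaults = {"temperature": 0.6, "max_tokens": 256}
params = FakeParams(temperature=0.9)
```

Here `resolve(params, "temperature", defaults)` yields the request's 0.9 while `resolve(params, "max_tokens", defaults)` falls back to 256, which is exactly why a plain falsy check (think `temperature=0.0`) would be wrong.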
Does this come from the vLLM logic? If so, please put the source link.
Also, it is quite surprising to me that this is not already handled in AsyncLLM. It should be. This is pure backend logic; it should not be in the frontend servicer.py.
You're right — removed the entire clamping block in b534cc2. The engine handles this in input_processor.py via AsyncLLM.add_request() → process_inputs().
…logic

- Remove Harmony stop_token_ids block (gpt_oss model type check)
- Rename 'mc' to 'model_config' for clarity in __init__
- Add vLLM HTTP path source link comment
- Rename 'model defaults' to 'generation-config defaults' in comment
- Remove max_tokens clamping logic (max_model_len, input_length, override_max_tokens) — let vLLM handle context-length enforcement

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
💡 Codex Review
Reviewed commit: 81822bd7ed
```yaml
- uses: actions/checkout@v6
  with:
    fetch-depth: 0
```
Checkout pull-request ref for issue_comment reviews
When this workflow is triggered by issue_comment, actions/checkout without an explicit ref checks out the repository default branch rather than the commented PR branch. That means the Claude run can review the wrong code (or an empty main...HEAD diff) when someone invokes @claude from a PR thread. Set the checkout ref for comment-triggered runs (for example, refs/pull/${{ github.event.issue.number }}/head) so the review analyzes the intended changes.
This file was included by mistake — fork-ci-review.yml is fork-only CI infrastructure and has been removed from this PR (force-pushed to drop the commit). These findings are valid and will be addressed on the fork separately.
This file was included by mistake and has been removed from this PR.
Actionable comments posted: 3
♻️ Duplicate comments (1)
grpc_servicer/smg_grpc_servicer/vllm/servicer.py (1)
495-500: ⚠️ Potential issue | 🟠 Major: Don't merge `stop_token_ids` in the servicer.

This union changes request semantics because `stop_token_ids` is still a repeated proto field with no presence bit: omitted and explicitly empty look identical here, so callers lose any way to clear generation-config stop IDs. It also reintroduces the frontend-side stop-token injection that was already called out earlier.

Suggested fix

```diff
-        # Merge stop_token_ids: request + model defaults
         stop_token_ids = list(params.stop_token_ids) if params.stop_token_ids else None
-        default_stop_ids = defaults.get("stop_token_ids")
-        if default_stop_ids:
-            stop_token_ids = list(set(stop_token_ids or []) | set(default_stop_ids))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@grpc_servicer/smg_grpc_servicer/vllm/servicer.py` around lines 495 - 500, The current code merges request stop_token_ids with model defaults (using params.stop_token_ids and defaults.get("stop_token_ids")), which changes caller semantics; stop_token_ids is a repeated proto without presence so we must not union them. Change the logic in servicer.py around stop_token_ids so you do NOT take the set-union: instead honor the request value as-is (use params.stop_token_ids directly when provided) and only fall back to defaults.get("stop_token_ids") when the request did not provide any value, avoiding combining the two lists; update references to stop_token_ids, params.stop_token_ids, and defaults.get("stop_token_ids") accordingly.
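A minimal sketch of the fallback-instead-of-union behavior the comment asks for (the function name is illustrative, not the servicer's actual API):

```python
def resolve_stop_token_ids(request_ids, default_ids):
    # Repeated proto fields have no presence bit, so a non-empty request
    # list must win outright; defaults apply only when the request sent none.
    if request_ids:
        return list(request_ids)
    return list(default_ids) if default_ids else None
```

Unlike the union version, a caller sending `[42]` gets exactly `[42]` back, and the generation-config IDs are not silently merged in.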
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/fork-ci-review.yml:
- Around line 53-55: The workflow currently uses actions/checkout@v6 (the two
occurrences of actions/checkout@v6) which is a floating tag contradicting the
repo's SHA-pinning policy; replace each actions/checkout@v6 with the exact
commit SHA pin used for other actions (e.g., actions/checkout@<commit-sha>) or
remove the SHA-pinning claim in the workflow header; update both occurrences of
actions/checkout so they match the pinned format used for
anthropics/claude-code-action and openai/codex-action.
- Around line 44-51: The "Validate secret" step currently exits 0 and does not
prevent downstream steps from running when ANTHROPIC_API_KEY is missing; change
the validation to emit a step output (e.g., set a has_key/has_anthropic output
via GITHUB_OUTPUT) indicating whether the secret exists, and then add if:
conditionals on the subsequent "checkout" and "Claude Code review" steps to run
only when that output is true; apply the same pattern to the codex-review job's
validation step and its dependent steps (use consistent output names like
has_key or has_codex_key and reference them in if: conditions).
- Around line 21-22: The workflow currently triggers on issue_comment and will
run for plain issues and default-branch refs; update the job condition (where
the workflow checks before running) to require github.event.issue.pull_request
!= null so it only runs for PR comments and add a secondary guard ensuring
github.event_name == 'issue_comment' && contains(github.event.comment.body,
'@claude') if applicable; also change the actions/checkout step (the checkout
job using actions/checkout) to pass an explicit ref:
github.event.issue.pull_request.head.ref or github.event.pull_request.head.sha
(use the correct event field in your job) so the job checks out the PR head
instead of the default branch.
---
Duplicate comments:
In `@grpc_servicer/smg_grpc_servicer/vllm/servicer.py`:
- Around line 495-500: The current code merges request stop_token_ids with model
defaults (using params.stop_token_ids and defaults.get("stop_token_ids")), which
changes caller semantics; stop_token_ids is a repeated proto without presence so
we must not union them. Change the logic in servicer.py around stop_token_ids so
you do NOT take the set-union: instead honor the request value as-is (use
params.stop_token_ids directly when provided) and only fall back to
defaults.get("stop_token_ids") when the request did not provide any value,
avoiding combining the two lists; update references to stop_token_ids,
params.stop_token_ids, and defaults.get("stop_token_ids") accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed101e87-d959-449d-86c6-c498ffd200d3
📒 Files selected for processing (2)
- .github/workflows/fork-ci-review.yml
- grpc_servicer/smg_grpc_servicer/vllm/servicer.py
```yaml
issue_comment:
  types: [created]
```
🧩 Analysis chain
🏁 Script executed:

```shell
fd -type f -name "fork-ci-review.yml" | head -20
```

Repository: lightseekorg/smg
Length of output: 230

🏁 Script executed:

```shell
fd -t f "fork-ci-review.yml"
```

Repository: lightseekorg/smg
Length of output: 42

🏁 Script executed:

```shell
fd -t f "fork" .github/workflows/
```

Repository: lightseekorg/smg
Length of output: 96

🏁 Script executed:

```shell
cat -n .github/workflows/fork-ci-review.yml
```

Repository: lightseekorg/smg
Length of output: 5504
@claude comment runs are not scoped to pull requests—they can execute on plain issues or use the default branch.
The issue_comment trigger at line 21-22 fires on both pull requests and issues. Without a check for github.event.issue.pull_request != null in the job condition (line 35), a comment on a plain issue can trigger the job. Additionally, actions/checkout at line 53 lacks an explicit ref parameter, so it defaults to the repository's default branch instead of the PR head—even on valid PR comments.
Suggested fix

```diff
   claude-review:
     if: >-
       github.actor == 'ai-jz' && (
         github.event_name == 'pull_request' ||
-        (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude'))
+        (
+          github.event_name == 'issue_comment' &&
+          github.event.issue.pull_request != null &&
+          contains(github.event.comment.body, '@claude')
+        )
       )
@@
       - uses: actions/checkout@v6
         with:
           fetch-depth: 0
+          ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || format('refs/pull/{0}/head', github.event.issue.number) }}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/fork-ci-review.yml around lines 21 - 22, The workflow
currently triggers on issue_comment and will run for plain issues and
default-branch refs; update the job condition (where the workflow checks before
running) to require github.event.issue.pull_request != null so it only runs for
PR comments and add a secondary guard ensuring github.event_name ==
'issue_comment' && contains(github.event.comment.body, '@claude') if applicable;
also change the actions/checkout step (the checkout job using actions/checkout)
to pass an explicit ref: github.event.issue.pull_request.head.ref or
github.event.pull_request.head.sha (use the correct event field in your job) so
the job checks out the PR head instead of the default branch.
This file was included by mistake — fork-ci-review.yml is fork-only CI infrastructure and has been removed from this PR (force-pushed to drop the commit). These findings are valid and will be addressed on the fork separately.
@ai-jz, understood — thanks for the clarification! Glad the findings were useful; feel free to apply them when revisiting that workflow on the fork. No further action needed here.
🧠 Learnings used
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1003
File: .github/workflows/claude-code-review.yml:77-81
Timestamp: 2026-03-31T23:05:49.930Z
Learning: In repo lightseekorg/smg, `github-actions[bot]` does NOT post inline PR review comments, so it does not need to be included in cross-bot deduplication lists for Claude review workflows (e.g., in `.github/workflows/claude-code-review.yml`). Do not flag its absence from such lists.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1036
File: .github/workflows/claude-code-review.yml:71-73
Timestamp: 2026-04-06T16:49:28.633Z
Learning: In GitHub Actions workflow YAMLs, treat any `prompt:` field intended for an LLM (e.g., Claude) as natural-language instructions for the model. Do not embed shell-script-specific constructs or error-handling patterns inside the prompt content (e.g., `git rev-parse --verify`, `2>/dev/null`, explicit exit-code checks like `if [ $? -ne 0 ]`). Instead, express fallback/error-handling guidance in natural language so the LLM can reason about failures from context rather than parsing shell stderr/stdout semantics.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:484-485
Timestamp: 2026-03-04T20:40:48.357Z
Learning: In repo lightseekorg/smg, the composite action `.github/actions/run-e2e` (at `.github/actions/run-e2e/action.yml`) includes an `actions/checkout@v6` step internally as its first step. Callers (e.g., `e2e-always`, `e2e-chat-completions`, `e2e-agentic` jobs in `.github/workflows/pr-test-rust.yml`) do NOT need a separate checkout step before invoking this action — do not flag missing checkout for these jobs.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:439-440
Timestamp: 2026-03-04T20:41:53.013Z
Learning: Ensure GitHub Actions workflows always include an actions/checkout step when referencing local composite actions (uses: ./... or similar) on self-hosted runners. Do not rely on a pre-checked-out workspace unless your environment explicitly guarantees it; follow the standard behavior and include checkout to ensure reliable action resolution.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 638
File: .github/workflows/release-grpc.yml:35-37
Timestamp: 2026-03-05T05:50:07.158Z
Learning: In workflows under .github/workflows that trigger only on push to main, you do not need to guard against the all-zeros github.event.before (0000000000000000000000000000000000000000) since that value only occurs on initial branch creation and cannot happen for an already-existing main. Apply this guidance to similar push-to-main workflows across the repository (pattern: **/.github/workflows/*.yml). If a workflow also handles other events or branches, retain appropriate guards for those cases.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:439-440
Timestamp: 2026-03-04T20:42:14.058Z
Learning: In lightseekorg/smg, the self-hosted k8s-runner-gpu and k8s-runner-cpu runners appear to have a pre-checked-out workspace, which allows local composite action references (e.g., uses: ./.github/actions/run-e2e) to resolve without an explicit actions/checkout step. Per standard GitHub Actions behavior, checkout IS required before local `./` composite action references; this repo is an exception due to its persistent self-hosted runner workspace.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 851
File: docs/_audit/reliability-worker.md:216-228
Timestamp: 2026-03-21T06:14:03.701Z
Learning: In repo lightseekorg/smg, files under `docs/_audit/` (e.g., `docs/_audit/reliability-worker.md`) are internal audit checklists used during the review process and are not shipped as user-facing documentation. Do not flag markdown formatting nitpicks (e.g., missing code block language specifiers, lint warnings) in these files — formatting polish is out of scope for them.
Learnt from: vschandramourya
Repo: lightseekorg/smg PR: 915
File: model_gateway/src/routers/grpc/client.rs:387-423
Timestamp: 2026-03-26T17:06:14.307Z
Learning: In repo lightseekorg/smg, in `model_gateway/src/routers/grpc/client.rs` and the corresponding backend builders (`crates/grpc_client/src/sglang_scheduler.rs`, `vllm_engine.rs`, `trtllm_service.rs`): The per-backend divergence in handling `CompletionRequest.max_tokens == None` is intentional. SGLang and vLLM pass `None` through to their proto builders, while TRT-LLM falls back to `16`. This matches the pre-existing per-backend pattern used in the chat/messages request builders. Do not flag this divergence as a bug or request normalization at the `build_completion_request` dispatcher layer in `client.rs`.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 495
File: grpc_client/proto/sglang_scheduler.proto:140-172
Timestamp: 2026-02-21T11:57:36.560Z
Learning: In repo lightseekorg/smg, the multimodal proto fields in `grpc_client/proto/sglang_scheduler.proto` (including `MultimodalInputs.pixel_values`, `model_specific_tensors`, `im_token_id`, and `mm_placeholders`) were introduced in PR `#495` as the first real usage; prior internal-only development fields like `processed_features` were never shipped to production, so wire-compatibility concerns do not apply to these field number assignments.
Learnt from: vschandramourya
Repo: lightseekorg/smg PR: 978
File: model_gateway/src/routers/grpc/regular/streaming.rs:2275-2276
Timestamp: 2026-03-31T03:43:30.981Z
Learning: In repo lightseekorg/smg, file model_gateway/src/routers/grpc/regular/streaming.rs: `logprobs: None` in `CompletionStreamChoice` within `process_completion_streaming_chunks` is intentional. Per-token logprob streaming is deferred across all streaming endpoints (completions, generate) because it requires backend-side plumbing not yet available. The same `// TODO: wire per-token logprob streaming` deferral exists in `process_generate_streaming` (around line 702). Non-streaming completions has the same TODO. Do not flag `logprobs: None` or the TODO comment in the completions or generate streaming paths as a missing feature or silent data loss until a dedicated follow-up PR wires backend per-token logprob support.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 638
File: grpc_servicer/smg_grpc_servicer/vllm/servicer.py:371-406
Timestamp: 2026-03-05T04:48:49.033Z
Learning: In repo lightseekorg/smg, `grpc_servicer/smg_grpc_servicer/vllm/servicer.py` (`_build_preprocessed_mm_inputs`): multimodal metadata fields (`flat_keys`, `batched_keys`, `mm_placeholders`, `model_specific_tensors`) are produced exclusively by the internal Rust router and are treated as trusted data. Defensive input-validation guards (e.g., checking flat_keys targets exist in hf_dict, validating PlaceholderRange bounds) are intentionally omitted to avoid hot-path overhead. Do not flag missing validation on these fields in future reviews.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 447
File: model_gateway/src/routers/grpc/client.rs:312-328
Timestamp: 2026-02-17T20:30:41.243Z
Learning: In model_gateway gRPC metadata discovery, SGLang's `server_args` (a `prost_types::Struct`) natively uses short-form keys like `tp_size`, `dp_size`, `pp_size` that match `SGLANG_GRPC_KEYS` and are extracted via `pick_prost_fields()` without normalization. vLLM and TRT-LLM gRPC protos use long-form keys like `tensor_parallel_size`, `pipeline_parallel_size`, which go through `flat_labels()` and are normalized by `normalize_grpc_keys()` in `discover_metadata.rs` after both `model_info.to_labels()` and `server_info.to_labels()` return.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1034
File: model_gateway/src/routers/grpc/multimodal.rs:602-610
Timestamp: 2026-04-04T00:29:52.690Z
Learning: In repo lightseekorg/smg, the gRPC regular pipeline runs stages in order: Preparation (tokenization + multimodal media fetch + preprocessing) → Worker Selection → Request Building. Because backend identity (e.g., GrpcClient::Mlx) is only resolved after worker selection, it is architecturally impossible to short-circuit MLX multimodal rejection before process_multimodal() / process_multimodal_messages() runs in the Preparation stage. The earliest feasible rejection point for backend-specific multimodal unsupported errors is at assemble_multimodal_data in request_building (after preprocessing but before generation). Moving rejection earlier would require a larger architectural change (e.g., moving worker selection before preparation). Do not flag assemble_multimodal_data's GrpcClient::Mlx error arm as "too late" in the pipeline.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 606
File: e2e_test/chat_completions/test_openai_server.py:600-600
Timestamp: 2026-03-04T04:05:00.273Z
Learning: In the `lightseekorg/smg` repository, as of PR `#606`, Harmony (gpt-oss) DOES support stop sequence trimming. `TestChatCompletionGptOss` in `e2e_test/chat_completions/test_openai_server.py` correctly sets `STOP_SEQUENCE_TRIMMED = True` by default. The vLLM backend (is_vllm) overrides this to `False` in both `test_stop_sequences` and `test_stop_sequences_stream` because vLLM does not trim stop sequences. Do not flag `STOP_SEQUENCE_TRIMMED = True` in `TestChatCompletionGptOss` as incorrect.
This file was included by mistake and has been removed from this PR.
```yaml
- name: Validate secret
  env:
    KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    if [ -z "$KEY" ]; then
      echo "::warning::ANTHROPIC_API_KEY not set — skipping Claude review"
      exit 0
    fi
```
🧩 Analysis chain
🏁 Script executed:

```shell
fd -t f fork-ci-review.yml
```

Repository: lightseekorg/smg
Length of output: 42

🏁 Script executed:

```shell
ls -la .github/workflows/ | grep -i fork
```

Repository: lightseekorg/smg
Length of output: 128

🏁 Script executed:

```shell
cat -n .github/workflows/fork-ci-review.yml
```

Repository: lightseekorg/smg
Length of output: 5504
Missing secrets do not skip either job; validation steps need output-based conditionals.
The validation steps exit successfully (exit 0), allowing subsequent steps to execute with empty API keys. In GitHub Actions, a step exiting with code 0 is successful; it does not prevent the next step from running. This causes the Claude Code review and Codex review actions to fail with missing credentials instead of cleanly skipping.
Add step outputs to the validation steps and apply if: conditionals to the checkout and review steps to actually skip them when secrets are missing:
Suggested fix

```diff
 - name: Validate secret
+  id: anthropic_secret
   env:
     KEY: ${{ secrets.ANTHROPIC_API_KEY }}
   run: |
     if [ -z "$KEY" ]; then
       echo "::warning::ANTHROPIC_API_KEY not set — skipping Claude review"
+      echo "present=false" >> "$GITHUB_OUTPUT"
       exit 0
     fi
+    echo "present=true" >> "$GITHUB_OUTPUT"
 - uses: actions/checkout@v6
+  if: steps.anthropic_secret.outputs.present == 'true'
   with:
     fetch-depth: 0
 - name: Claude Code review
+  if: steps.anthropic_secret.outputs.present == 'true'
   uses: anthropics/claude-code-action@c7c8889b30499b4e46f4c32b892e43cd364bc2fe # v1
```

Apply the same pattern to the codex-review job (lines 89–96).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Validate secret
  id: anthropic_secret
  env:
    KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  run: |
    if [ -z "$KEY" ]; then
      echo "::warning::ANTHROPIC_API_KEY not set — skipping Claude review"
      echo "present=false" >> "$GITHUB_OUTPUT"
      exit 0
    fi
    echo "present=true" >> "$GITHUB_OUTPUT"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/fork-ci-review.yml around lines 44 - 51, The "Validate
secret" step currently exits 0 and does not prevent downstream steps from
running when ANTHROPIC_API_KEY is missing; change the validation to emit a step
output (e.g., set a has_key/has_anthropic output via GITHUB_OUTPUT) indicating
whether the secret exists, and then add if: conditionals on the subsequent
"checkout" and "Claude Code review" steps to run only when that output is true;
apply the same pattern to the codex-review job's validation step and its
dependent steps (use consistent output names like has_key or has_codex_key and
reference them in if: conditions).
This file was included by mistake — fork-ci-review.yml is fork-only CI infrastructure and has been removed from this PR (force-pushed to drop the commit). These findings are valid and will be addressed on the fork separately.
@ai-jz, understood — thanks for the clarification. Since the file has been dropped from this PR entirely, the findings are moot here. No action needed on your end for this review; glad the findings are useful for hardening the fork workflow separately.
🧠 Learnings used
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1036
File: .github/workflows/claude-code-review.yml:71-73
Timestamp: 2026-04-06T16:49:28.633Z
Learning: In GitHub Actions workflow YAMLs, treat any `prompt:` field intended for an LLM (e.g., Claude) as natural-language instructions for the model. Do not embed shell-script-specific constructs or error-handling patterns inside the prompt content (e.g., `git rev-parse --verify`, `2>/dev/null`, explicit exit-code checks like `if [ $? -ne 0 ]`). Instead, express fallback/error-handling guidance in natural language so the LLM can reason about failures from context rather than parsing shell stderr/stdout semantics.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1003
File: .github/workflows/claude-code-review.yml:77-81
Timestamp: 2026-03-31T23:05:49.930Z
Learning: In repo lightseekorg/smg, `github-actions[bot]` does NOT post inline PR review comments, so it does not need to be included in cross-bot deduplication lists for Claude review workflows (e.g., in `.github/workflows/claude-code-review.yml`). Do not flag its absence from such lists.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:484-485
Timestamp: 2026-03-04T20:40:48.357Z
Learning: In repo lightseekorg/smg, the composite action `.github/actions/run-e2e` (at `.github/actions/run-e2e/action.yml`) includes a `actions/checkoutv6` step internally as its first step. Callers (e.g., `e2e-always`, `e2e-chat-completions`, `e2e-agentic` jobs in `.github/workflows/pr-test-rust.yml`) do NOT need a separate checkout step before invoking this action — do not flag missing checkout for these jobs.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:439-440
Timestamp: 2026-03-04T20:41:53.013Z
Learning: Ensure GitHub Actions workflows always include an actions/checkout step when referencing local composite actions (uses: ./... or similar) on self-hosted runners. Do not rely on a pre-checked-out workspace unless your environment explicitly guarantees it; follow the standard behavior and include checkout to ensure reliable action resolution.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 851
File: docs/_audit/reliability-worker.md:216-228
Timestamp: 2026-03-21T06:14:03.701Z
Learning: In repo lightseekorg/smg, files under `docs/_audit/` (e.g., `docs/_audit/reliability-worker.md`) are internal audit checklists used during the review process and are not shipped as user-facing documentation. Do not flag markdown formatting nitpicks (e.g., missing code block language specifiers, lint warnings) in these files — formatting polish is out of scope for them.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/actions/run-e2e/action.yml:65-83
Timestamp: 2026-03-04T20:39:43.892Z
Learning: Repo lightseekorg/smg uses custom/internal action versions that differ from public GitHub marketplace releases. Specifically: `actions/checkoutv6`, `actions/download-artifactv8`, and `actions/upload-artifactv7` are all valid and intentional throughout this repo's workflows (e.g., pr-test-rust.yml and composite actions under .github/actions/). Do not flag these versions as non-existent or incorrect.
Learnt from: vschandramourya
Repo: lightseekorg/smg PR: 915
File: model_gateway/src/routers/grpc/client.rs:387-423
Timestamp: 2026-03-26T17:06:14.307Z
Learning: In repo lightseekorg/smg, in `model_gateway/src/routers/grpc/client.rs` and the corresponding backend builders (`crates/grpc_client/src/sglang_scheduler.rs`, `vllm_engine.rs`, `trtllm_service.rs`): The per-backend divergence in handling `CompletionRequest.max_tokens == None` is intentional. SGLang and vLLM pass `None` through to their proto builders, while TRT-LLM falls back to `16`. This matches the pre-existing per-backend pattern used in the chat/messages request builders. Do not flag this divergence as a bug or request normalization at the `build_completion_request` dispatcher layer in `client.rs`.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 495
File: grpc_client/proto/sglang_scheduler.proto:140-172
Timestamp: 2026-02-21T11:57:36.560Z
Learning: In repo lightseekorg/smg, the multimodal proto fields in `grpc_client/proto/sglang_scheduler.proto` (including `MultimodalInputs.pixel_values`, `model_specific_tensors`, `im_token_id`, and `mm_placeholders`) were introduced in PR `#495` as the first real usage; prior internal-only development fields like `processed_features` were never shipped to production, so wire-compatibility concerns do not apply to these field number assignments.
Learnt from: vschandramourya
Repo: lightseekorg/smg PR: 978
File: model_gateway/src/routers/grpc/regular/streaming.rs:2275-2276
Timestamp: 2026-03-31T03:43:30.981Z
Learning: In repo lightseekorg/smg, file model_gateway/src/routers/grpc/regular/streaming.rs: `logprobs: None` in `CompletionStreamChoice` within `process_completion_streaming_chunks` is intentional. Per-token logprob streaming is deferred across all streaming endpoints (completions, generate) because it requires backend-side plumbing not yet available. The same `// TODO: wire per-token logprob streaming` deferral exists in `process_generate_streaming` (around line 702). Non-streaming completions has the same TODO. Do not flag `logprobs: None` or the TODO comment in the completions or generate streaming paths as a missing feature or silent data loss until a dedicated follow-up PR wires backend per-token logprob support.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 638
File: grpc_servicer/smg_grpc_servicer/vllm/servicer.py:371-406
Timestamp: 2026-03-05T04:48:49.033Z
Learning: In repo lightseekorg/smg, `grpc_servicer/smg_grpc_servicer/vllm/servicer.py` (`_build_preprocessed_mm_inputs`): multimodal metadata fields (`flat_keys`, `batched_keys`, `mm_placeholders`, `model_specific_tensors`) are produced exclusively by the internal Rust router and are treated as trusted data. Defensive input-validation guards (e.g., checking flat_keys targets exist in hf_dict, validating PlaceholderRange bounds) are intentionally omitted to avoid hot-path overhead. Do not flag missing validation on these fields in future reviews.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 447
File: model_gateway/src/routers/grpc/client.rs:312-328
Timestamp: 2026-02-17T20:30:41.243Z
Learning: In model_gateway gRPC metadata discovery, SGLang's `server_args` (a `prost_types::Struct`) natively uses short-form keys like `tp_size`, `dp_size`, `pp_size` that match `SGLANG_GRPC_KEYS` and are extracted via `pick_prost_fields()` without normalization. vLLM and TRT-LLM gRPC protos use long-form keys like `tensor_parallel_size`, `pipeline_parallel_size`, which go through `flat_labels()` and are normalized by `normalize_grpc_keys()` in `discover_metadata.rs` after both `model_info.to_labels()` and `server_info.to_labels()` return.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1034
File: model_gateway/src/routers/grpc/multimodal.rs:602-610
Timestamp: 2026-04-04T00:29:52.690Z
Learning: In repo lightseekorg/smg, the gRPC regular pipeline runs stages in order: Preparation (tokenization + multimodal media fetch + preprocessing) → Worker Selection → Request Building. Because backend identity (e.g., GrpcClient::Mlx) is only resolved after worker selection, it is architecturally impossible to short-circuit MLX multimodal rejection before process_multimodal() / process_multimodal_messages() runs in the Preparation stage. The earliest feasible rejection point for backend-specific multimodal unsupported errors is at assemble_multimodal_data in request_building (after preprocessing but before generation). Moving rejection earlier would require a larger architectural change (e.g., moving worker selection before preparation). Do not flag assemble_multimodal_data's GrpcClient::Mlx error arm as "too late" in the pipeline.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 606
File: e2e_test/chat_completions/test_openai_server.py:600-600
Timestamp: 2026-03-04T04:05:00.273Z
Learning: In the `lightseekorg/smg` repository, as of PR `#606`, Harmony (gpt-oss) DOES support stop sequence trimming. `TestChatCompletionGptOss` in `e2e_test/chat_completions/test_openai_server.py` correctly sets `STOP_SEQUENCE_TRIMMED = True` by default. The vLLM backend (is_vllm) overrides this to `False` in both `test_stop_sequences` and `test_stop_sequences_stream` because vLLM does not trim stop sequences. Do not flag `STOP_SEQUENCE_TRIMMED = True` in `TestChatCompletionGptOss` as incorrect.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 638
File: .github/workflows/release-grpc.yml:35-37
Timestamp: 2026-03-05T05:50:07.158Z
Learning: In workflows under .github/workflows that trigger only on push to main, you do not need to guard against the all-zeros github.event.before (0000000000000000000000000000000000000000) since that value only occurs on initial branch creation and cannot happen for an already-existing main. Apply this guidance to similar push-to-main workflows across the repository (pattern: **/.github/workflows/*.yml). If a workflow also handles other events or branches, retain appropriate guards for those cases.
There was a problem hiding this comment.
This file was included by mistake and has been removed from this PR.
```yaml
- uses: actions/checkout@v6
  with:
    fetch-depth: 0
```
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n .github/workflows/fork-ci-review.yml | head -120
```

Repository: lightseekorg/smg
Length of output: 5079
Pin actions/checkout to an exact commit SHA or remove the security claim.
Line 6 asserts "Actions pinned to SHA (not floating tags) to prevent supply chain attacks," but actions/checkout@v6 on lines 53 and 98 uses a floating tag. Other actions in this file (anthropics/claude-code-action and openai/codex-action) are correctly pinned to commit SHAs. Either pin checkout to a specific commit hash like the others, or remove the SHA-pinning security claim from the header.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/fork-ci-review.yml around lines 53 - 55, The workflow
currently uses actions/checkout@v6 (the two occurrences of actions/checkout@v6)
which is a floating tag contradicting the repo's SHA-pinning policy; replace
each actions/checkout@v6 with the exact commit SHA pin used for other actions
(e.g., actions/checkout@<commit-sha>) or remove the SHA-pinning claim in the
workflow header; update both occurrences of actions/checkout so they match the
pinned format used for anthropics/claude-code-action and openai/codex-action.
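A pinned checkout matching the format the other actions in the file use would look roughly like this; the SHA shown is a placeholder, not a real v6 release commit:

```yaml
- uses: actions/checkout@<commit-sha>  # v6 (replace with the actual release commit SHA)
  with:
    fetch-depth: 0
```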
There was a problem hiding this comment.
This file was included by mistake — fork-ci-review.yml is fork-only CI infrastructure and has been removed from this PR (force-pushed to drop the commit). These findings are valid and will be addressed on the fork separately.
There was a problem hiding this comment.
@ai-jz, understood — thanks for the clarification! Makes sense to keep fork-only CI infrastructure out of the upstream PR. Good to hear the SHA-pinning findings will be addressed on the fork side.
🧠 Learnings used
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 633
File: .github/workflows/pr-test-rust.yml:439-440
Timestamp: 2026-03-04T20:42:14.058Z
Learning: In lightseekorg/smg, the self-hosted k8s-runner-gpu and k8s-runner-cpu runners appear to have a pre-checked-out workspace, which allows local composite action references (e.g., uses: ./.github/actions/run-e2e) to resolve without an explicit actions/checkout step. Per standard GitHub Actions behavior, checkout IS required before local `./` composite action references; this repo is an exception due to its persistent self-hosted runner workspace.
Learnt from: key4ng
Repo: lightseekorg/smg PR: 1034
File: model_gateway/src/routers/grpc/multimodal.rs:602-610
Timestamp: 2026-04-04T00:29:52.690Z
Learning: In repo lightseekorg/smg, the gRPC regular pipeline runs stages in order: Preparation (tokenization + multimodal media fetch + preprocessing) → Worker Selection → Request Building. Because backend identity (e.g., GrpcClient::Mlx) is only resolved after worker selection, it is architecturally impossible to short-circuit MLX multimodal rejection before process_multimodal() / process_multimodal_messages() runs in the Preparation stage. The earliest feasible rejection point for backend-specific multimodal unsupported errors is at assemble_multimodal_data in request_building (after preprocessing but before generation). Moving rejection earlier would require a larger architectural change (e.g., moving worker selection before preparation). Do not flag assemble_multimodal_data's GrpcClient::Mlx error arm as "too late" in the pipeline.
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 606
File: e2e_test/chat_completions/test_openai_server.py:600-600
Timestamp: 2026-03-04T04:05:00.273Z
Learning: In the `lightseekorg/smg` repository, as of PR `#606`, Harmony (gpt-oss) DOES support stop sequence trimming. `TestChatCompletionGptOss` in `e2e_test/chat_completions/test_openai_server.py` correctly sets `STOP_SEQUENCE_TRIMMED = True` by default. The vLLM backend (is_vllm) overrides this to `False` in both `test_stop_sequences` and `test_stop_sequences_stream` because vLLM does not trim stop sequences. Do not flag `STOP_SEQUENCE_TRIMMED = True` in `TestChatCompletionGptOss` as incorrect.
There was a problem hiding this comment.
This file was included by mistake and has been removed from this PR.
There was a problem hiding this comment.
@ai-jz, confirmed — thanks for removing it!
🧠 Learnings used
Learnt from: CatherineSue
Repo: lightseekorg/smg PR: 606
File: e2e_test/chat_completions/test_openai_server.py:611-612
Timestamp: 2026-03-04T04:43:55.458Z
Learning: In `lightseekorg/smg` PR `#606`, `TestChatCompletionGptOss` in `e2e_test/chat_completions/test_openai_server.py`: `STOP_SEQUENCE_TRIMMED = True` by default (Harmony/gpt-oss trims stop sequences). In `test_stop_sequences` (non-streaming), both `is_vllm()` and `is_sglang()` override this to `False` because neither backend trims stop sequences in non-streaming mode. In `test_stop_sequences_stream` (streaming), only `is_vllm()` overrides to `False`. This asymmetry is intentional. Do not flag `is_sglang()` in the non-streaming condition as incorrect.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 489
File: model_gateway/benches/wasm_middleware_latency.rs:88-91
Timestamp: 2026-02-21T02:37:02.009Z
Learning: Repo: lightseekorg/smg — For clippy-only/enforcement PRs (e.g., PR `#489`), even micro-optimizations (like replacing an async closure with std::future::ready in benches such as model_gateway/benches/wasm_middleware_latency.rs) should be deferred to a follow-up PR rather than included inline.
Learnt from: slin1237
Repo: lightseekorg/smg PR: 489
File: model_gateway/src/core/token_bucket.rs:58-63
Timestamp: 2026-02-21T02:30:58.753Z
Learning: For lint-only/Clippy enforcement PRs in this repo (e.g., PR `#489`), avoid introducing behavioral changes (like new input validation). Prefer a separate, follow-up issue/PR for such hardening.
Force-pushed from 81822bd to b534cc2 (Compare)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b534cc23c1
```python
temperature=params.temperature
if params.HasField("temperature")
else defaults.get("temperature", 1.0),
top_p=params.top_p if params.HasField("top_p") else defaults.get("top_p", 1.0),
```
Require updated proto before calling new HasField checks
These new presence checks rely on top_p/top_k/... being proto3 optional, but smg-grpc-servicer still allows older smg-grpc-proto installs (e.g. 0.4.6) where those fields are plain scalars; in that environment params.HasField("top_p") raises ValueError and generation requests fail at runtime. This commit introduces that hard dependency, so the servicer release needs a minimum proto-version bump (or a compatibility guard) before this code is safe to deploy.
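A compatibility guard along these lines could tolerate both proto generations — a sketch only; `has_field_safe` is a hypothetical helper, not code from this PR:

```python
def has_field_safe(params, name: str) -> bool:
    """Presence check that degrades gracefully on older generated protos.

    With proto3 `optional` fields, HasField() reports real presence. On an
    older proto where the field is a plain scalar, HasField() raises
    ValueError; treating the field as "explicitly set" reproduces the old
    client behavior instead of failing the request.
    """
    try:
        return params.HasField(name)
    except ValueError:  # plain scalar field on an older generated proto
        return True
```

As the author notes in the reply, the guard is unnecessary when the proto and servicer are versioned and released together.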
The proto definition and servicer are in the same repo and released together — there's no independent versioning between them. A new servicer always ships with the matching proto.
@CatherineSue Addressed all 5 comments in b534cc2:
Also updated the PR description to reflect the simplified scope. |
Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Addressed the @coderabbitai re-review.
@coderabbitai review
Motivation
vLLM's `--generation-config` flag (default: `auto`) reads the model's `generation_config.json` — a model artifact shipped alongside the weights and chat template — and applies the model author's recommended sampling parameters (temperature, top_p, top_k, etc.) when the API request doesn't specify them.

This is already supported in the HTTP serving path. This PR extends the same support to the gRPC path, so the same `--generation-config` flag produces identical sampling behavior whether the request reaches vLLM via HTTP or gRPC. All three modes are supported: `auto`, `vllm`, and a custom path.

Description
Problem
The gRPC path does not yet support `--generation-config`. The HTTP path reads the model's `generation_config.json` and applies its defaults (e.g., `temperature=0.6` for Qwen3/Llama-3) for unset request fields. The gRPC path uses vLLM's neutral defaults (`temperature=1.0`, `top_p=1.0`) regardless of the flag value.

Root cause spans three layers:

- Proto: `top_p`, `top_k`, `min_p`, `frequency_penalty`, `presence_penalty`, `repetition_penalty` are plain proto3 fields — the zero value is indistinguishable from "not set"
- Rust client: `unwrap_or(default)` fills neutral values before sending, losing the "user didn't set this" signal
- Servicer: ignores `model_config.get_diff_sampling_param()` and hardcodes fallbacks

Solution
Propagate the "unset" signal from client to servicer across all three layers:

- Proto: mark the sampling fields `optional` — enables `HasField()` detection
- Rust client: pass `Option` through instead of `unwrap_or(default)` in all 5 sampling param builders
- Servicer: load `get_diff_sampling_param()` at init (same as the HTTP path) and use `HasField()` fallbacks for all fields

Context-length enforcement (`max_tokens` clamping) and Harmony `stop_token_ids` injection are handled by vLLM's engine internally (`input_processor.py`), so the servicer does not duplicate that logic.

Changes
- `vllm_engine.proto`: sampling fields marked `optional` (enables `HasField` detection)
- `vllm_engine.rs`: passes `Option` through; 3 new + 2 updated unit tests
- `servicer.py`: loads model defaults at init; `HasField` fallbacks

Design Decisions
Why server-side defaults (in servicer) over client-side merge (in Rust gateway):
The engine process is authoritative for default resolution — it owns `generation_config.json` parsing, `--generation-config` mode selection, and model-type-aware logic. vLLM exposes all of this through `get_diff_sampling_param()`, which the servicer reuses directly.

Making the proto fields `optional` also ensures omission semantics live in the gRPC request contract itself — which benefits any future non-gateway gRPC clients.

SGLang support will be addressed in a separate PR.
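The two servicer-side pieces — loading model defaults once at init, then merging them per request via `HasField()` — can be sketched together as follows. This is a minimal illustration, not the PR's exact code: `FakeParams` mimics proto3 `optional` presence, and the fake model config stands in for vLLM's `ModelConfig`, whose `get_diff_sampling_param()` the servicer reuses.

```python
from types import SimpleNamespace


class FakeParams(SimpleNamespace):
    """Stand-in for the generated proto message; `_set` lists the fields
    the client explicitly populated, mimicking proto3 `optional` presence."""

    def HasField(self, name):
        return name in self._set


class Servicer:
    def __init__(self, model_config):
        # Resolved once at startup, mirroring the HTTP path: the dict holds
        # only params where generation_config.json differs from vLLM's
        # neutral defaults (empty when the feature is disabled).
        self.defaults = model_config.get_diff_sampling_param()

    def resolve_sampling(self, params):
        # Explicit request value wins; otherwise the model default;
        # otherwise vLLM's neutral value.
        def pick(name, neutral):
            if params.HasField(name):
                return getattr(params, name)
            return self.defaults.get(name, neutral)

        return {
            "temperature": pick("temperature", 1.0),
            "top_p": pick("top_p", 1.0),
        }


# Hypothetical defaults resembling a Qwen3-style generation_config.json
fake_cfg = SimpleNamespace(
    get_diff_sampling_param=lambda: {"temperature": 0.6, "top_p": 0.95}
)
servicer = Servicer(fake_cfg)
req = FakeParams(temperature=0.2, top_p=1.0, _set={"temperature"})
print(servicer.resolve_sampling(req))
# → {'temperature': 0.2, 'top_p': 0.95}
```

The request's explicit `temperature` survives, while the unset `top_p` falls back to the model default rather than the wire zero-value — the distinction that plain proto3 scalars could not express.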
Other notes:
- Marking existing proto3 fields `optional` is wire-compatible — rolling-upgrade safe. Old clients always set fields explicitly, so `HasField() = true` and behavior is unchanged.
- Context-length and Harmony handling stay in vLLM's engine (`input_processor.py`) — not duplicated in the servicer.

Test Plan
Existing tests (regression):
- `cargo test -p smg-grpc-client` — builders passing `Option` types
- `test_openai_server.py` (openai-grpc)

New tests (added in this PR):
Checklist
Summary by CodeRabbit
Release Notes