feat(grpc): support --sampling-defaults in sglang gRPC servicer#1107

Open
ai-jz wants to merge 1 commit into main from feat/sglang-generation-config

Conversation

Collaborator

@ai-jz ai-jz commented Apr 13, 2026

Motivation

SGLang's --sampling-defaults model flag reads the model's generation_config.json and applies the model author's recommended sampling parameters (temperature, top_p, top_k, etc.) when the API request doesn't specify them. The --preferred-sampling-params flag provides operator-level overrides.

This is already supported in the HTTP serving path. This PR extends the same support to the gRPC path, so the same flags produce identical sampling behavior regardless of whether the request reaches SGLang via HTTP or gRPC.

Companion to #1103, which adds the same support for vLLM. Same three-layer approach (proto optional → Rust pass-through → servicer cascade), adapted for SGLang's architecture.

Description

Problem

The gRPC path does not support --sampling-defaults model or --preferred-sampling-params. The HTTP path reads generation_config.json and applies defaults (e.g., temperature=0.6 for Qwen3, temperature=0.6, top_p=0.9 for Llama-3-Instruct) for unset request fields. The gRPC path uses OpenAI defaults (temperature=1.0, top_p=1.0) regardless of the flag values.

Root cause spans three layers:

  1. Proto: temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, repetition_penalty are plain proto3 fields — the zero value is indistinguishable from "not set"
  2. Rust client: unwrap_or(default) fills neutral values before sending, losing the "user didn't set this" signal
  3. Python servicer: never calls model_config.get_default_sampling_params(), hardcodes fallbacks

Solution

Propagate the "unset" signal from client to servicer across all three layers:

  1. Proto: make the 7 fields optional — enables HasField() detection
  2. Rust client: pass Option through instead of unwrap_or(default) in all 5 sampling param builders
  3. Python servicer: load defaults at init via get_default_sampling_params() and preferred_sampling_params, apply 3-tier cascade using HasField() fallbacks: user-set > preferred > generation_config > OpenAI defaults
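A minimal Python sketch of the servicer-side cascade (layer 3). `FakeParams` is a hypothetical stand-in for the generated proto message; the real servicer calls `HasField()` on the gRPC `SamplingParams` message and merges results from `get_default_sampling_params()` and `preferred_sampling_params`:

```python
class FakeParams:
    """Hypothetical stand-in for the proto message: tracks field presence."""
    def __init__(self, **fields):
        self._fields = fields

    def HasField(self, name):
        return name in self._fields

    def value(self, name):
        return self._fields[name]

OPENAI_DEFAULTS = {"temperature": 1.0, "top_p": 1.0}

def resolve(params, preferred, gen_config):
    """Cascade: user-set > preferred > generation_config > OpenAI defaults."""
    out = {}
    for field, openai_default in OPENAI_DEFAULTS.items():
        if params.HasField(field):       # tier 1: user explicitly set it
            out[field] = params.value(field)
        elif field in preferred:         # tier 2: operator override
            out[field] = preferred[field]
        elif field in gen_config:        # tier 3: model author's recommendation
            out[field] = gen_config[field]
        else:                            # fallback: OpenAI defaults
            out[field] = openai_default
    return out

qwen3 = {"temperature": 0.6, "top_p": 0.95}
assert resolve(FakeParams(), {}, qwen3)["temperature"] == 0.6
# Presence, not truthiness: explicit 0.0 survives the cascade.
assert resolve(FakeParams(temperature=0.0), {}, qwen3)["temperature"] == 0.0
```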

Defaults are applied in the servicer because SGLang's gRPC path bypasses both serving_chat.py and TokenizerManager where defaults are applied for HTTP. The servicer reuses model_config.get_default_sampling_params() (the same SGLang API as the HTTP path) so default resolution stays consistent. Centralizing in SGLang's engine would require consolidating these two separate application points into one — a larger scope change that could be a follow-up if that is considered a better approach.

Changes

File | What
sglang_scheduler.proto | 7 sampling fields → optional (enables HasField detection)
sglang_scheduler.rs | All 5 builders pass Option through; 1 new unit test for optional semantics
sglang/server.py | Pass existing model_config to servicer constructor (~1 line)
sglang/servicer.py | Load generation-config + preferred defaults at init, precompute merged _sampling_base, apply via HasField fallbacks

Deploy Note

The servicer (engine image) must be updated before the gateway. The new gateway sends wire-absent (None) values for the 7 optional sampling fields. An old servicer without HasField logic will deserialize absent fields as proto3 zeros (temperature=0.0, top_p=0.0, top_k=0), causing silently incorrect sampling. Update the servicer first so the HasField-based cascade is in place when None values arrive.
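A hedged illustration of why the ordering matters: an old servicer without the `HasField()` cascade reads wire-absent optional fields as proto3 zeros. The dict model below is hypothetical, not the actual servicer code:

```python
def old_servicer_read(wire: dict) -> dict:
    # Old servicer: no presence checks; absent fields fall back to proto3 zeros.
    return {
        "temperature": wire.get("temperature", 0.0),
        "top_p": wire.get("top_p", 0.0),
        "top_k": wire.get("top_k", 0),
    }

# New gateway, user set nothing: the optional sampling fields are wire-absent.
params = old_servicer_read({})

# The old servicer silently runs with temperature=0.0 / top_p=0.0 instead of
# applying defaults -- hence "update the servicer first, then the gateway".
assert params == {"temperature": 0.0, "top_p": 0.0, "top_k": 0}
```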

Test Plan

Existing tests (regression):

Suite | Count | Result
cargo test -p smg-grpc-client | 47/47 | All pass
Full workspace cargo check | All crates | Pass

New tests (added in this PR):

Suite | Count | Result
Rust: optional field set-vs-unset semantics | 1/1 | Pass

Integration tests (on GPU pod):

Suite | Count | Result
Proto HasField (unset vs zero, all 7 fields) | 3/3 | Pass
Cascade logic (3-tier, 8 scenarios — see below) | 8/8 | Pass
HTTP vs gRPC parity (Qwen3-32B, --sampling-defaults model) | 8/8 | Pass
End-to-end gRPC wire test (Llama-3-8B-Instruct, 2× H100, built from source) | 8/8 | Pass
Cascade logic scenarios
  1. No defaults configured, no user params → OpenAI defaults (temp=1.0, top_p=1.0)
  2. --sampling-defaults model with Qwen3 → generation_config applied (temp=0.6, top_k=20, top_p=0.95)
  3. User sets temp=0.8 → overrides model default 0.6
  4. User sets temp=0.0 (explicit greedy) → not overridden by model default
  5. --preferred-sampling-params → overrides model defaults
  6. Full cascade: user > preferred > model > OpenAI
  7. All 7 optional fields: unset vs explicit zero distinguished
  8. Before/after fix: proto3 zero (0.0) vs cascade-resolved (0.6)
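Scenarios 1–5 above can be sanity-checked against a dict-membership model of the cascade. This is a hypothetical model — the real code checks `HasField()` on the proto message — but dict membership models field presence the same way:

```python
def resolve(user, preferred, model_cfg, openai):
    # Presence is modeled by dict membership, so an explicit 0.0 survives.
    return {
        f: user.get(f, preferred.get(f, model_cfg.get(f, d)))
        for f, d in openai.items()
    }

OPENAI = {"temperature": 1.0, "top_p": 1.0}
QWEN3 = {"temperature": 0.6, "top_p": 0.95}  # values from scenario 2

assert resolve({}, {}, {}, OPENAI)["temperature"] == 1.0                       # scenario 1
assert resolve({}, {}, QWEN3, OPENAI)["temperature"] == 0.6                    # scenario 2
assert resolve({"temperature": 0.8}, {}, QWEN3, OPENAI)["temperature"] == 0.8  # scenario 3
assert resolve({"temperature": 0.0}, {}, QWEN3, OPENAI)["temperature"] == 0.0  # scenario 4
assert resolve({}, {"temperature": 0.7}, QWEN3, OPENAI)["temperature"] == 0.7  # scenario 5
```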
Checklist
  • Proto change is wire-compatible (optional addition)
  • All 5 Rust sampling param builders updated
  • Existing Rust tests pass, new test for optional semantics
  • Python servicer matches HTTP path logic (get_default_sampling_params)
  • Pre-flight: pre-commit, clippy, rustfmt, ruff, DCO, no AI co-author
  • Rolling upgrade safe (old client → new servicer)
  • HTTP vs gRPC parity verified on GPU
  • End-to-end wire test verified on GPU (built from source)

@github-actions github-actions bot added the grpc gRPC client and router changes label Apr 13, 2026

coderabbitai bot commented Apr 13, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 1d75a61 and 02690d6.

📒 Files selected for processing (4)
  • crates/grpc_client/proto/sglang_scheduler.proto
  • crates/grpc_client/src/sglang_scheduler.rs
  • grpc_servicer/smg_grpc_servicer/sglang/server.py
  • grpc_servicer/smg_grpc_servicer/sglang/servicer.py
📝 Walkthrough

This PR makes 7 core sampling fields in the gRPC SamplingParams message optional and implements a cascading priority system for parameter resolution. Client-side propagates Option values into proto instead of filling defaults; server-side applies a 3-tier cascade: user gRPC values override preferred_sampling_params, which override model generation_config, which fall back to OpenAI defaults.

Changes

Cohort / File(s) | Summary
Protocol Buffer Schema — crates/grpc_client/proto/sglang_scheduler.proto | Made 7 sampling fields (temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, repetition_penalty) optional in SamplingParams to enable distinguishing explicit zero values from unset fields.
Rust gRPC Client — crates/grpc_client/src/sglang_scheduler.rs | Updated sampling parameter construction for Chat, Responses, Messages, Completion, and plain generators to propagate Option values directly instead of using hard-coded defaults via unwrap_or(); tests updated to verify Some(value) vs None semantics.
Python Servicer Infrastructure — grpc_servicer/smg_grpc_servicer/sglang/server.py | Added passing of newly created model_config loaded from server_args into SGLangSchedulerServicer constructor during gRPC service setup.
Python Servicer Logic — grpc_servicer/smg_grpc_servicer/sglang/servicer.py | Extended servicer constructor to accept model_config and precompute _sampling_base with priority merging; updated _convert_sampling_params to implement 3-tier cascade using HasField checks: gRPC user values → preferred_sampling_params → model_config defaults → OpenAI defaults.

Sequence Diagram

sequenceDiagram
    actor User
    participant Client as gRPC Client
    participant Servicer as SGLangSchedulerServicer
    participant ModelConfig as ModelConfig
    participant SGLSampling as SGLSamplingParams

    User->>Client: Invoke generate/chat with optional sampling params
    Client->>Client: Map Option<T> values to proto::SamplingParams<br/>(Some(v)→field set, None→field unset)
    Client->>Servicer: gRPC request with optional sampling fields
    
    Servicer->>Servicer: Init: Load model_config from server_args
    Servicer->>ModelConfig: get_default_sampling_params()
    ModelConfig-->>Servicer: generation_config defaults
    Servicer->>Servicer: Precompute _sampling_base<br/>(OpenAI → generation_config → preferred_sampling_params)
    
    Servicer->>Servicer: _convert_sampling_params(grpc_params)
    Servicer->>Servicer: For each of 7 fields:<br/>if HasField(grpc_param): use gRPC value<br/>else if preferred_sampling_params[field]: use preferred value<br/>else if generation_config[field]: use model default<br/>else: use OpenAI default
    Servicer->>SGLSampling: Construct SGLSamplingParams<br/>with resolved per-field values
    Servicer-->>Client: Response with applied sampling params


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

grpc

Suggested reviewers

  • slin1237


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main change: enabling gRPC servicer support for sampling defaults configuration, matching the PR's primary objective.
Docstring Coverage | ✅ Passed | Docstring coverage is 83.33%, which is sufficient. The required threshold is 80.00%.
Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the gRPC protocol and the SGLang servicer to support a hierarchical default resolution system for sampling parameters. By marking core sampling fields as optional in the proto definition, the system can now distinguish between parameters explicitly set by the user and those that should fall back to preferred settings, model generation configurations, or OpenAI defaults. The Rust client has been updated to pass through optional values, and the Python servicer now implements the cascading logic. A high-severity issue was identified in the servicer where it incorrectly attempts to cast a JSON string to a dictionary, which will cause a runtime failure.

Comment on lines +191 to +195
preferred_sampling_params = (
dict(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)
Contributor


high

The server_args.preferred_sampling_params field is a JSON string (as defined in the ServerArgs dataclass), not a dictionary or an iterable that can be passed to the dict() constructor. Attempting to call dict() on a JSON string will raise a ValueError or TypeError at runtime. Since msgspec is already imported in this file, you should use msgspec.json.decode to correctly parse the configuration string.

Suggested change
preferred_sampling_params = (
dict(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)
preferred_sampling_params = (
msgspec.json.decode(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)

Collaborator Author


ServerArgs.__post_init__ parses preferred_sampling_params from JSON string to dict during validation. By the time the servicer receives it, the field is already a dict (or None). No additional parsing needed.


@claude claude bot left a comment


Clean implementation across all three layers. The proto optional fields, Rust Option pass-through, and Python HasField-based cascade are correct and consistent. Deploy ordering (servicer before gateway) is well-documented.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d75a61c3c


Comment on lines +877 to +879
temperature = (
grpc_params.temperature if grpc_params.HasField("temperature") else base["temperature"]
)


P1: Guard HasField fallback for legacy zero-valued clients

This fallback logic changes behavior for mixed-version rollouts where the servicer is upgraded before the Rust gateway: the old gateway used non-optional proto3 scalars, so explicit 0 values (for example temperature=0) are encoded as field absence, and these lines now reinterpret them as "unset" and replace them with _sampling_base defaults (often 1.0 or model defaults). That silently flips deterministic/greedy requests to sampled outputs until the gateway is upgraded, so compatibility handling is needed for legacy senders.


Collaborator Author


Covered by the Deploy Note: "The servicer must be updated before the gateway." The old gateway uses non-optional proto3 scalars which always have wire presence, so HasField() = True for every field — the cascade is never triggered until the new gateway starts sending None values. Deploy ordering ensures the HasField logic is in place first.

Comment on lines +53 to +55
optional float temperature = 1;
optional float top_p = 2;
optional int32 top_k = 3;


P2: Regenerate Go protobuf stubs after optional field change

Changing these fields to proto3 optional requires regenerating committed Go protobuf code, but bindings/golang/internal/proto/sglang_scheduler.pb.go still exposes non-optional scalar fields (float32/int32). As a result, Go clients cannot carry field presence (unset vs explicit zero), so they won't get the intended sampling-default cascade semantics and the repo now has protocol/source drift.


Collaborator Author


Out of scope — consistent with #1103 which also did not regenerate Go stubs for the same type of proto optional change.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/grpc_client/src/sglang_scheduler.rs (1)

961-978: 🧹 Nitpick | 🔵 Trivial

Add builder-level coverage for the optional semantics.

The new test proves prost preserves Option, but it does not exercise the five builder functions changed here. A small table-driven test over the builders would catch an accidental unwrap_or(...) reintroduction in one path.

Also applies to: 982-1012


📥 Commits

Reviewing files that changed from the base of the PR and between d954c14 and 1d75a61.

@ai-jz
Collaborator Author

ai-jz commented Apr 13, 2026

Re: CodeRabbit nitpick on builder-level tests (sglang_scheduler.rs:961-978)

The existing test (test_optional_sampling_params_set_vs_unset) covers proto-level Option semantics. The 5 builders are straightforward Option pass-through with no conditional logic, and 47 existing tests already cover their construction paths. Consistent with #1103 where similar builder-level coverage was not required.

The gRPC path does not support --sampling-defaults model or
--preferred-sampling-params. The HTTP path reads generation_config.json
and applies defaults for unset request fields; the gRPC path uses
OpenAI defaults regardless.

Fix across three layers:
- Proto: make 7 sampling fields optional (HasField detection)
- Rust client: pass Option through in all 5 sampling param builders
- Python servicer: load defaults at init, apply 3-tier cascade
  (user-set > preferred > generation_config > OpenAI defaults)

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
@ai-jz ai-jz force-pushed the feat/sglang-generation-config branch from 1d75a61 to 02690d6 Compare April 18, 2026 02:35
