feat(grpc): support --sampling-defaults in sglang gRPC servicer#1107

Open
ai-jz wants to merge 1 commit into main from feat/sglang-generation-config

Conversation

Collaborator

@ai-jz ai-jz commented Apr 13, 2026

Motivation

SGLang's --sampling-defaults model flag reads the model's generation_config.json and applies the model author's recommended sampling parameters (temperature, top_p, top_k, etc.) when the API request doesn't specify them. The --preferred-sampling-params flag provides operator-level overrides.

This is already supported in the HTTP serving path. This PR extends the same support to the gRPC path, so the same flags produce identical sampling behavior regardless of whether the request reaches SGLang via HTTP or gRPC.

Companion to #1103, which adds the same support for vLLM. Same three-layer approach (proto optional → Rust pass-through → servicer cascade), adapted for SGLang's architecture.

Description

Problem

The gRPC path does not support --sampling-defaults model or --preferred-sampling-params. The HTTP path reads generation_config.json and applies defaults (e.g., temperature=0.6 for Qwen3, temperature=0.6, top_p=0.9 for Llama-3-Instruct) for unset request fields. The gRPC path uses OpenAI defaults (temperature=1.0, top_p=1.0) regardless of the flag values.

Root cause spans three layers:

  1. Proto: temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, repetition_penalty are plain proto3 fields — the zero value is indistinguishable from "not set"
  2. Rust client: unwrap_or(default) fills neutral values before sending, losing the "user didn't set this" signal
  3. Python servicer: never calls model_config.get_default_sampling_params(), hardcodes fallbacks

Solution

Propagate the "unset" signal from client to servicer across all three layers:

  1. Proto: make the 7 fields optional — enables HasField() detection
  2. Rust client: pass Option through instead of unwrap_or(default) in all 5 sampling param builders
  3. Python servicer: load defaults at init via get_default_sampling_params() and preferred_sampling_params, apply 3-tier cascade using HasField() fallbacks: user-set > preferred > generation_config > OpenAI defaults
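A minimal Python sketch of the servicer-side cascade (layer 3). `FakeParams` is a hypothetical stand-in for the generated proto message; the real servicer calls `HasField()` on the gRPC `SamplingParams` message and merges results from `get_default_sampling_params()` and `preferred_sampling_params`:

```python
class FakeParams:
    """Hypothetical stand-in for the proto message: tracks field presence."""
    def __init__(self, **fields):
        self._fields = fields

    def HasField(self, name):
        return name in self._fields

    def value(self, name):
        return self._fields[name]

OPENAI_DEFAULTS = {"temperature": 1.0, "top_p": 1.0}

def resolve(params, preferred, gen_config):
    """Cascade: user-set > preferred > generation_config > OpenAI defaults."""
    out = {}
    for field, openai_default in OPENAI_DEFAULTS.items():
        if params.HasField(field):       # tier 1: user explicitly set it
            out[field] = params.value(field)
        elif field in preferred:         # tier 2: operator override
            out[field] = preferred[field]
        elif field in gen_config:        # tier 3: model author's recommendation
            out[field] = gen_config[field]
        else:                            # fallback: OpenAI defaults
            out[field] = openai_default
    return out

qwen3 = {"temperature": 0.6, "top_p": 0.95}
assert resolve(FakeParams(), {}, qwen3)["temperature"] == 0.6
# Presence, not truthiness: explicit 0.0 survives the cascade.
assert resolve(FakeParams(temperature=0.0), {}, qwen3)["temperature"] == 0.0
```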

Defaults are applied in the servicer because SGLang's gRPC path bypasses both serving_chat.py and TokenizerManager where defaults are applied for HTTP. The servicer reuses model_config.get_default_sampling_params() (the same SGLang API as the HTTP path) so default resolution stays consistent. Centralizing in SGLang's engine would require consolidating these two separate application points into one — a larger scope change that could be a follow-up if that is considered a better approach.

Changes

File | What
sglang_scheduler.proto | 7 sampling fields → optional (enables HasField detection)
sglang_scheduler.rs | All 5 builders pass Option through; 1 new unit test for optional semantics
sglang/server.py | Pass existing model_config to servicer constructor (~1 line)
sglang/servicer.py | Load generation-config + preferred defaults at init, precompute merged _sampling_base, apply via HasField fallbacks

Deploy Note

The servicer (engine image) must be updated before the gateway. The new gateway sends wire-absent (None) values for the 7 optional sampling fields. An old servicer without HasField logic will deserialize absent fields as proto3 zeros (temperature=0.0, top_p=0.0, top_k=0), causing silently incorrect sampling. Update the servicer first so the HasField-based cascade is in place when None values arrive.
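A hedged illustration of why the ordering matters: an old servicer without the `HasField()` cascade reads wire-absent optional fields as proto3 zeros. The dict model below is hypothetical, not the actual servicer code:

```python
def old_servicer_read(wire: dict) -> dict:
    # Old servicer: no presence checks; absent fields fall back to proto3 zeros.
    return {
        "temperature": wire.get("temperature", 0.0),
        "top_p": wire.get("top_p", 0.0),
        "top_k": wire.get("top_k", 0),
    }

# New gateway, user set nothing: the optional sampling fields are wire-absent.
params = old_servicer_read({})

# The old servicer silently runs with temperature=0.0 / top_p=0.0 instead of
# applying defaults -- hence "update the servicer first, then the gateway".
assert params == {"temperature": 0.0, "top_p": 0.0, "top_k": 0}
```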

Test Plan

Existing tests (regression):

Suite | Count | Result
cargo test -p smg-grpc-client | 47/47 | All pass
Full workspace cargo check | All crates | Pass

New tests (added in this PR):

Suite | Count | Result
Rust: optional field set-vs-unset semantics | 1/1 | Pass

Integration tests (on GPU pod):

Suite | Count | Result
Proto HasField (unset vs zero, all 7 fields) | 3/3 | Pass
Cascade logic (3-tier, 8 scenarios — see below) | 8/8 | Pass
HTTP vs gRPC parity (Qwen3-32B, --sampling-defaults model) | 8/8 | Pass
End-to-end gRPC wire test (Llama-3-8B-Instruct, 2× H100, built from source) | 8/8 | Pass
Cascade logic scenarios
  1. No defaults configured, no user params → OpenAI defaults (temp=1.0, top_p=1.0)
  2. --sampling-defaults model with Qwen3 → generation_config applied (temp=0.6, top_k=20, top_p=0.95)
  3. User sets temp=0.8 → overrides model default 0.6
  4. User sets temp=0.0 (explicit greedy) → not overridden by model default
  5. --preferred-sampling-params → overrides model defaults
  6. Full cascade: user > preferred > model > OpenAI
  7. All 7 optional fields: unset vs explicit zero distinguished
  8. Before/after fix: proto3 zero (0.0) vs cascade-resolved (0.6)
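Scenarios 1–5 above can be sanity-checked against a dict-membership model of the cascade. This is a hypothetical model — the real code checks `HasField()` on the proto message — but dict membership models field presence the same way:

```python
def resolve(user, preferred, model_cfg, openai):
    # Presence is modeled by dict membership, so an explicit 0.0 survives.
    return {
        f: user.get(f, preferred.get(f, model_cfg.get(f, d)))
        for f, d in openai.items()
    }

OPENAI = {"temperature": 1.0, "top_p": 1.0}
QWEN3 = {"temperature": 0.6, "top_p": 0.95}  # values from scenario 2

assert resolve({}, {}, {}, OPENAI)["temperature"] == 1.0                       # scenario 1
assert resolve({}, {}, QWEN3, OPENAI)["temperature"] == 0.6                    # scenario 2
assert resolve({"temperature": 0.8}, {}, QWEN3, OPENAI)["temperature"] == 0.8  # scenario 3
assert resolve({"temperature": 0.0}, {}, QWEN3, OPENAI)["temperature"] == 0.0  # scenario 4
assert resolve({}, {"temperature": 0.7}, QWEN3, OPENAI)["temperature"] == 0.7  # scenario 5
```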
Checklist
  • Proto change is wire-compatible (optional addition)
  • All 5 Rust sampling param builders updated
  • Existing Rust tests pass, new test for optional semantics
  • Python servicer matches HTTP path logic (get_default_sampling_params)
  • Pre-flight: pre-commit, clippy, rustfmt, ruff, DCO, no AI co-author
  • Rolling upgrade safe (old client → new servicer)
  • HTTP vs gRPC parity verified on GPU
  • End-to-end wire test verified on GPU (built from source)

@github-actions github-actions bot added the grpc gRPC client and router changes label Apr 13, 2026

coderabbitai bot commented Apr 13, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 1d75a61 and 02690d6.

📒 Files selected for processing (4)
  • crates/grpc_client/proto/sglang_scheduler.proto
  • crates/grpc_client/src/sglang_scheduler.rs
  • grpc_servicer/smg_grpc_servicer/sglang/server.py
  • grpc_servicer/smg_grpc_servicer/sglang/servicer.py
📝 Walkthrough

This PR makes 7 core sampling fields in the gRPC SamplingParams message optional and implements a cascading priority system for parameter resolution. Client-side propagates Option values into proto instead of filling defaults; server-side applies a 3-tier cascade: user gRPC values override preferred_sampling_params, which override model generation_config, which fall back to OpenAI defaults.

Changes

Cohort / File(s) | Summary
Protocol Buffer Schema — crates/grpc_client/proto/sglang_scheduler.proto | Made 7 sampling fields (temperature, top_p, top_k, min_p, frequency_penalty, presence_penalty, repetition_penalty) optional in SamplingParams to enable distinguishing explicit zero values from unset fields.
Rust gRPC Client — crates/grpc_client/src/sglang_scheduler.rs | Updated sampling parameter construction for Chat, Responses, Messages, Completion, and plain generators to propagate Option values directly instead of using hard-coded defaults via unwrap_or(); tests updated to verify Some(value) vs None semantics.
Python Servicer Infrastructure — grpc_servicer/smg_grpc_servicer/sglang/server.py | Added passing of newly created model_config loaded from server_args into SGLangSchedulerServicer constructor during gRPC service setup.
Python Servicer Logic — grpc_servicer/smg_grpc_servicer/sglang/servicer.py | Extended servicer constructor to accept model_config and precompute _sampling_base with priority merging; updated _convert_sampling_params to implement 3-tier cascade using HasField checks: gRPC user values → preferred_sampling_params → model_config defaults → OpenAI defaults.

Sequence Diagram

sequenceDiagram
    actor User
    participant Client as gRPC Client
    participant Servicer as SGLangSchedulerServicer
    participant ModelConfig as ModelConfig
    participant SGLSampling as SGLSamplingParams

    User->>Client: Invoke generate/chat with optional sampling params
    Client->>Client: Map Option<T> values to proto::SamplingParams<br/>(Some(v)→field set, None→field unset)
    Client->>Servicer: gRPC request with optional sampling fields
    
    Servicer->>Servicer: Init: Load model_config from server_args
    Servicer->>ModelConfig: get_default_sampling_params()
    ModelConfig-->>Servicer: generation_config defaults
    Servicer->>Servicer: Precompute _sampling_base<br/>(OpenAI → generation_config → preferred_sampling_params)
    
    Servicer->>Servicer: _convert_sampling_params(grpc_params)
    Servicer->>Servicer: For each of 7 fields:<br/>if HasField(grpc_param): use gRPC value<br/>else if preferred_sampling_params[field]: use preferred value<br/>else if generation_config[field]: use model default<br/>else: use OpenAI default
    Servicer->>SGLSampling: Construct SGLSamplingParams<br/>with resolved per-field values
    Servicer-->>Client: Response with applied sampling params


Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

grpc

Suggested reviewers

  • slin1237


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title accurately describes the main change: enabling gRPC servicer support for sampling defaults configuration, matching the PR's primary objective.
Docstring Coverage | ✅ Passed | Docstring coverage is 83.33%, which is sufficient. The required threshold is 80.00%.
Description Check | ✅ Passed | Check skipped — CodeRabbit's high-level summary is enabled.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the gRPC protocol and the SGLang servicer to support a hierarchical default resolution system for sampling parameters. By marking core sampling fields as optional in the proto definition, the system can now distinguish between parameters explicitly set by the user and those that should fall back to preferred settings, model generation configurations, or OpenAI defaults. The Rust client has been updated to pass through optional values, and the Python servicer now implements the cascading logic. A high-severity issue was identified in the servicer where it incorrectly attempts to cast a JSON string to a dictionary, which will cause a runtime failure.

Comment on lines +191 to +195
preferred_sampling_params = (
dict(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)
Contributor


high

The server_args.preferred_sampling_params field is a JSON string (as defined in the ServerArgs dataclass), not a dictionary or an iterable that can be passed to the dict() constructor. Attempting to call dict() on a JSON string will raise a ValueError or TypeError at runtime. Since msgspec is already imported in this file, you should use msgspec.json.decode to correctly parse the configuration string.

Suggested change
preferred_sampling_params = (
dict(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)
preferred_sampling_params = (
msgspec.json.decode(server_args.preferred_sampling_params)
if server_args.preferred_sampling_params
else {}
)

Collaborator Author


ServerArgs.__post_init__ parses preferred_sampling_params from JSON string to dict during validation. By the time the servicer receives it, the field is already a dict (or None). No additional parsing needed.


@claude claude bot left a comment


Clean implementation across all three layers. The proto optional fields, Rust Option pass-through, and Python HasField-based cascade are correct and consistent. Deploy ordering (servicer before gateway) is well-documented.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d75a61c3c


Comment on lines +877 to +879
temperature = (
grpc_params.temperature if grpc_params.HasField("temperature") else base["temperature"]
)


P1: Guard HasField fallback for legacy zero-valued clients

This fallback logic changes behavior for mixed-version rollouts where the servicer is upgraded before the Rust gateway: the old gateway used non-optional proto3 scalars, so explicit 0 values (for example temperature=0) are encoded as field absence, and these lines now reinterpret them as "unset" and replace them with _sampling_base defaults (often 1.0 or model defaults). That silently flips deterministic/greedy requests to sampled outputs until the gateway is upgraded, so compatibility handling is needed for legacy senders.


Collaborator Author


Covered by the Deploy Note: "The servicer must be updated before the gateway." The old gateway uses non-optional proto3 scalars which always have wire presence, so HasField() = True for every field — the cascade is never triggered until the new gateway starts sending None values. Deploy ordering ensures the HasField logic is in place first.

Comment on lines +53 to +55
optional float temperature = 1;
optional float top_p = 2;
optional int32 top_k = 3;


P2: Regenerate Go protobuf stubs after optional field change

Changing these fields to proto3 optional requires regenerating committed Go protobuf code, but bindings/golang/internal/proto/sglang_scheduler.pb.go still exposes non-optional scalar fields (float32/int32). As a result, Go clients cannot carry field presence (unset vs explicit zero), so they won't get the intended sampling-default cascade semantics and the repo now has protocol/source drift.


Collaborator Author


Out of scope — consistent with #1103 which also did not regenerate Go stubs for the same type of proto optional change.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/grpc_client/src/sglang_scheduler.rs (1)

961-978: 🧹 Nitpick | 🔵 Trivial

Add builder-level coverage for the optional semantics.

The new test proves prost preserves Option, but it does not exercise the five builder functions changed here. A small table-driven test over the builders would catch an accidental unwrap_or(...) reintroduction in one path.

Also applies to: 982-1012


📥 Commits

Reviewing files that changed from the base of the PR and between d954c14 and 1d75a61.

@ai-jz
Collaborator Author

ai-jz commented Apr 13, 2026

Re: CodeRabbit nitpick on builder-level tests (sglang_scheduler.rs:961-978)

The existing test (test_optional_sampling_params_set_vs_unset) covers proto-level Option semantics. The 5 builders are straightforward Option pass-through with no conditional logic, and 47 existing tests already cover their construction paths. Consistent with #1103 where similar builder-level coverage was not required.

The gRPC path does not support --sampling-defaults model or
--preferred-sampling-params. The HTTP path reads generation_config.json
and applies defaults for unset request fields; the gRPC path uses
OpenAI defaults regardless.

Fix across three layers:
- Proto: make 7 sampling fields optional (HasField detection)
- Rust client: pass Option through in all 5 sampling param builders
- Python servicer: load defaults at init, apply 3-tier cascade
  (user-set > preferred > generation_config > OpenAI defaults)

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
@ai-jz ai-jz force-pushed the feat/sglang-generation-config branch from 1d75a61 to 02690d6 Compare April 18, 2026 02:35
