feat(grpc): support --sampling-defaults in sglang gRPC servicer #1107
Conversation
📝 Walkthrough

This PR makes 7 core sampling fields in the gRPC `SamplingParams` message optional, so the servicer can distinguish values explicitly set by the client from unset ones and apply a default cascade (user-set > preferred > generation_config > OpenAI defaults).
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Client as gRPC Client
    participant Servicer as SGLangSchedulerServicer
    participant ModelConfig as ModelConfig
    participant SGLSampling as SGLSamplingParams
    User->>Client: Invoke generate/chat with optional sampling params
    Client->>Client: Map Option<T> values to proto::SamplingParams<br/>(Some(v)→field set, None→field unset)
    Client->>Servicer: gRPC request with optional sampling fields
    Servicer->>Servicer: Init: Load model_config from server_args
    Servicer->>ModelConfig: get_default_sampling_params()
    ModelConfig-->>Servicer: generation_config defaults
    Servicer->>Servicer: Precompute _sampling_base<br/>(OpenAI → generation_config → preferred_sampling_params)
    Servicer->>Servicer: _convert_sampling_params(grpc_params)
    Servicer->>Servicer: For each of 7 fields:<br/>if HasField(grpc_param): use gRPC value<br/>else if preferred_sampling_params[field]: use preferred value<br/>else if generation_config[field]: use model default<br/>else: use OpenAI default
    Servicer->>SGLSampling: Construct SGLSamplingParams<br/>with resolved per-field values
    Servicer-->>Client: Response with applied sampling params
```
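The per-field resolution above can be sketched in Python. This is an illustrative model only: `GrpcParams` is a hand-written stand-in for the generated protobuf message (mimicking proto3 `optional` presence via `HasField`), and the field names and default values here are assumptions, not the servicer's actual code.

```python
# Illustrative sketch of the default cascade described in the diagram.
OPENAI_DEFAULTS = {"temperature": 1.0, "top_p": 1.0, "top_k": -1}

def build_sampling_base(generation_config, preferred):
    # Precomputed once at servicer init: OpenAI defaults, overridden by
    # the model's generation_config, overridden by operator preferences.
    base = dict(OPENAI_DEFAULTS)
    base.update(generation_config)
    base.update(preferred)
    return base

class GrpcParams:
    """Minimal stand-in for a proto message with presence tracking."""
    def __init__(self, **explicit):
        self._explicit = explicit

    def HasField(self, name):
        return name in self._explicit

    def __getattr__(self, name):
        return self._explicit.get(name, 0)  # proto3 zero if unset

def resolve(grpc_params, base):
    # User-set values win; anything unset falls back to the precomputed base.
    return {
        field: getattr(grpc_params, field) if grpc_params.HasField(field) else default
        for field, default in base.items()
    }

base = build_sampling_base({"temperature": 0.6, "top_k": 20}, {"top_p": 0.9})
resolved = resolve(GrpcParams(temperature=0.0), base)
# temperature stays 0.0 (explicitly set); top_p/top_k fall back to base
print(resolved)
```

Note how an explicit `temperature=0.0` survives: the presence check, not the value, decides whether the cascade kicks in.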
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request updates the gRPC protocol and the SGLang servicer to support a hierarchical default resolution system for sampling parameters. By marking core sampling fields as optional in the proto definition, the system can now distinguish between parameters explicitly set by the user and those that should fall back to preferred settings, model generation configurations, or OpenAI defaults. The Rust client has been updated to pass through optional values, and the Python servicer now implements the cascading logic. A high-severity issue was identified in the servicer where it incorrectly attempts to cast a JSON string to a dictionary, which will cause a runtime failure.
```python
preferred_sampling_params = (
    dict(server_args.preferred_sampling_params)
    if server_args.preferred_sampling_params
    else {}
)
```
The `server_args.preferred_sampling_params` field is a JSON string (as defined in the `ServerArgs` dataclass), not a dictionary or an iterable that can be passed to the `dict()` constructor. Attempting to call `dict()` on a JSON string will raise a `ValueError` or `TypeError` at runtime. Since `msgspec` is already imported in this file, you should use `msgspec.json.decode` to correctly parse the configuration string.
Suggested change:

```diff
 preferred_sampling_params = (
-    dict(server_args.preferred_sampling_params)
+    msgspec.json.decode(server_args.preferred_sampling_params)
     if server_args.preferred_sampling_params
     else {}
 )
```
`ServerArgs.__post_init__` parses `preferred_sampling_params` from a JSON string to a dict during validation. By the time the servicer receives it, the field is already a dict (or `None`). No additional parsing needed.
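For illustration only, a defensive normalization that accepts either representation. This is a hypothetical helper using the stdlib `json` module, not the PR's actual code (which relies on `ServerArgs` having already parsed the value):

```python
import json

def normalize_preferred(value):
    # Accept None, an already-parsed dict (the ServerArgs.__post_init__
    # case described above), or a raw JSON string; always return a dict.
    if not value:
        return {}
    if isinstance(value, dict):
        return dict(value)
    return json.loads(value)

print(normalize_preferred('{"temperature": 0.6}'))  # {'temperature': 0.6}
```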
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d75a61c3c
```python
temperature = (
    grpc_params.temperature if grpc_params.HasField("temperature") else base["temperature"]
)
```
Guard `HasField` fallback for legacy zero-valued clients

This fallback logic changes behavior for mixed-version rollouts where the servicer is upgraded before the Rust gateway: the old gateway used non-optional proto3 scalars, so explicit 0 values (for example `temperature=0`) are encoded as field absence, and these lines now reinterpret them as "unset" and replace them with `_sampling_base` defaults (often 1.0 or model defaults). That silently flips deterministic/greedy requests to sampled outputs until the gateway is upgraded, so compatibility handling is needed for legacy senders.
Covered by the Deploy Note: "The servicer must be updated before the gateway." The old gateway uses non-optional proto3 scalars which always have wire presence, so `HasField()` is `True` for every field; the cascade is never triggered until the new gateway starts sending `None` values. Deploy ordering ensures the `HasField` logic is in place first.
```proto
optional float temperature = 1;
optional float top_p = 2;
optional int32 top_k = 3;
```
Regenerate Go protobuf stubs after optional field change
Changing these fields to proto3 `optional` requires regenerating the committed Go protobuf code, but `bindings/golang/internal/proto/sglang_scheduler.pb.go` still exposes non-optional scalar fields (`float32`/`int32`). As a result, Go clients cannot carry field presence (unset vs explicit zero), so they won't get the intended sampling-default cascade semantics and the repo now has protocol/source drift.
Out of scope — consistent with #1103 which also did not regenerate Go stubs for the same type of proto optional change.
⚠️ Outside diff range comments (1)
crates/grpc_client/src/sglang_scheduler.rs (1)
961-978: 🧹 Nitpick | 🔵 Trivial: Add builder-level coverage for the optional semantics.

The new test proves prost preserves `Option`, but it does not exercise the five builder functions changed here. A small table-driven test over the builders would catch an accidental `unwrap_or(...)` reintroduction in one path.

Also applies to: 982-1012
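The table-driven pattern the reviewer has in mind can be illustrated as follows. The real builders live in Rust (`sglang_scheduler.rs`); this Python sketch uses a hypothetical `SamplingParamsBuilder` purely to show the set-vs-unset check, and the field list is an assumption:

```python
# Hypothetical builder that must preserve None ("unset") vs explicit values.
FIELDS = ["temperature", "top_p", "top_k", "min_p", "frequency_penalty"]

class SamplingParamsBuilder:
    def __init__(self):
        self.params = {f: None for f in FIELDS}

    def set(self, field, value):
        # Plain pass-through: no unwrap_or-style default substitution.
        self.params[field] = value
        return self

# Table-driven: for every field, check both the set and the unset case.
for field in FIELDS:
    built = SamplingParamsBuilder().set(field, 0.5).params
    assert built[field] == 0.5, f"{field}: explicit value lost"
    unset = SamplingParamsBuilder().params
    assert unset[field] is None, f"{field}: unset should stay None"
print("all fields preserve set-vs-unset semantics")
```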
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ae0cc067-e917-4401-ac24-b04e0f7ac34a
📒 Files selected for processing (4)
- crates/grpc_client/proto/sglang_scheduler.proto
- crates/grpc_client/src/sglang_scheduler.rs
- grpc_servicer/smg_grpc_servicer/sglang/server.py
- grpc_servicer/smg_grpc_servicer/sglang/servicer.py
Re: CodeRabbit nitpick on builder-level tests (sglang_scheduler.rs:961-978). The existing test (`test_optional_sampling_params_set_vs_unset`) covers proto-level `Option` semantics. The 5 builders are straightforward `Option` pass-through with no conditional logic, and 47 existing tests already cover their construction paths. Consistent with #1103, where similar builder-level coverage was not required.
The gRPC path does not support `--sampling-defaults model` or `--preferred-sampling-params`. The HTTP path reads `generation_config.json` and applies defaults for unset request fields; the gRPC path uses OpenAI defaults regardless.

Fix across three layers:
- Proto: make 7 sampling fields optional (`HasField` detection)
- Rust client: pass `Option` through in all 5 sampling param builders
- Python servicer: load defaults at init, apply 3-tier cascade (user-set > preferred > generation_config > OpenAI defaults)

Signed-off-by: ai-jz <ai-jz@users.noreply.github.com>
Force-pushed from 1d75a61 to 02690d6
Motivation
SGLang's `--sampling-defaults model` flag reads the model's `generation_config.json` and applies the model author's recommended sampling parameters (temperature, top_p, top_k, etc.) when the API request doesn't specify them. The `--preferred-sampling-params` flag provides operator-level overrides.

This is already supported in the HTTP serving path. This PR extends the same support to the gRPC path, so the same flags produce identical sampling behavior regardless of whether the request reaches SGLang via HTTP or gRPC.
Companion to #1103, which adds the same support for vLLM. Same 3-layer approach (proto `optional` → Rust pass-through → servicer cascade), adapted for SGLang's architecture.

Description
Problem
The gRPC path does not support `--sampling-defaults model` or `--preferred-sampling-params`. The HTTP path reads `generation_config.json` and applies defaults (e.g., `temperature=0.6` for Qwen3, `temperature=0.6, top_p=0.9` for Llama-3-Instruct) for unset request fields. The gRPC path uses OpenAI defaults (`temperature=1.0`, `top_p=1.0`) regardless of the flag values.

Root cause spans three layers:
- `temperature`, `top_p`, `top_k`, `min_p`, `frequency_penalty`, `presence_penalty`, `repetition_penalty` are plain proto3 fields; the zero value is indistinguishable from "not set"
- `unwrap_or(default)` fills neutral values before sending, losing the "user didn't set this" signal
- the servicer never calls `model_config.get_default_sampling_params()` and hardcodes fallbacks

Solution
Propagate the "unset" signal from client to servicer across all three layers:
- Proto: mark the 7 fields `optional`; enables `HasField()` detection
- Rust client: pass `Option` through instead of `unwrap_or(default)` in all 5 sampling param builders
- Python servicer: load `get_default_sampling_params()` and `preferred_sampling_params`, apply the 3-tier cascade using `HasField()` fallbacks: user-set > preferred > generation_config > OpenAI defaults

Defaults are applied in the servicer because SGLang's gRPC path bypasses both `serving_chat.py` and `TokenizerManager`, where defaults are applied for HTTP. The servicer reuses `model_config.get_default_sampling_params()` (the same SGLang API as the HTTP path) so default resolution stays consistent. Centralizing in SGLang's engine would require consolidating these two separate application points into one, a larger scope change that could be a follow-up if that is considered a better approach.

Changes
- `sglang_scheduler.proto`: mark the 7 sampling fields `optional` (enables `HasField` detection)
- `sglang_scheduler.rs`: pass `Option` through; 1 new unit test for optional semantics
- `sglang/server.py`: pass `model_config` to the servicer constructor (~1 line)
- `sglang/servicer.py`: precompute `_sampling_base`, apply via `HasField` fallbacks

Deploy Note
The servicer (engine image) must be updated before the gateway. The new gateway sends wire-absent (`None`) values for the 7 optional sampling fields. An old servicer without `HasField` logic will deserialize absent fields as proto3 zeros (`temperature=0.0`, `top_p=0.0`, `top_k=0`), causing silently incorrect sampling. Update the servicer first so the `HasField`-based cascade is in place when `None` values arrive.
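The failure mode with the wrong deploy order can be illustrated with plain dicts. The resolver below is a sketch of what a pre-PR servicer effectively does, not its actual code, and the field set is an assumption:

```python
# An old servicer without HasField logic reads proto3 scalar defaults
# directly, so a wire-absent field surfaces as the type's zero value.
def old_servicer_resolve(decoded_fields):
    # decoded_fields: what proto3 deserialization yields when the new
    # gateway omitted the optional sampling fields entirely.
    return {
        "temperature": decoded_fields.get("temperature", 0.0),
        "top_p": decoded_fields.get("top_p", 0.0),
        "top_k": decoded_fields.get("top_k", 0),
    }

# New gateway sent nothing for these fields: the old servicer silently
# samples with temperature=0.0 and a degenerate top_p=0.0 instead of
# falling back to model defaults.
print(old_servicer_resolve({}))
```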
Existing tests (regression):
- `cargo test -p smg-grpc-client`
- `cargo check`

New tests (added in this PR):
- `HasField` (unset vs zero, all 7 fields)

Integration tests (on GPU pod):
- `--sampling-defaults model`

Cascade logic scenarios:
- `--sampling-defaults model` with Qwen3 → generation_config applied (temp=0.6, top_k=20, top_p=0.95)
- `--preferred-sampling-params` → overrides model defaults

Checklist
- `get_default_sampling_params`