feat(tool_parser): add structural tag constraint generation, fix skip_special_tokens derivation#1090
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRegistry now stores per-parser metadata (including optional structural-tag builders) and exposes a unified constraint-generation API. Gateway and bindings thread a configured parser into shared components, delegate constraint generation to the registry, and adjust parsing and skip-special-tokens behavior based on returned constraint types. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant GrpcRouter
participant SharedComponents
participant ParserRegistry
participant Parser
Client->>GrpcRouter: Send request (model, tools, tool_choice)
GrpcRouter->>SharedComponents: build request context (includes configured_tool_parser)
GrpcRouter->>ParserRegistry: generate_tool_constraint(configured_parser, tools, tool_choice)
alt configured parser has structural tag
ParserRegistry->>Parser: call build_structural_tag(tools, at_least_one)
Parser-->>ParserRegistry: StructuralTag(JSON)
else generic/json-schema path
ParserRegistry-->>ParserRegistry: build JSON-schema constraint (or None)
end
ParserRegistry-->>GrpcRouter: return ToolConstraint (JsonSchema | StructuralTag | None)
GrpcRouter->>GrpcRouter: set skip_special_tokens / parsing path based on constraint type
GrpcRouter->>Client: continue processing/request execution
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Code Review Observations1. Mistral: single tool call only. The 2. 3. Error → Option. 4. Deprecated wrapper is expensive. 5. No |
There was a problem hiding this comment.
Code Review
This pull request refactors the tool parser registry to support model-specific structural tags alongside generic JSON schema constraints. It introduces a ParserEntry metadata structure, adds native structural tag builders for Mistral and Kimi K2 models, and migrates the gateway's preparation stages to use the new centralized generate_tool_constraint logic. Feedback focuses on enhancing the robustness of the new constraint generation by handling potential schema conflicts during $defs consolidation, improving error handling for serialization, and suggesting automatic parser resolution via model IDs.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 09ca1ef43a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/utils/chat_utils.rs (1)
211-229:⚠️ Potential issue | 🟠 MajorCompatibility wrapper can no longer produce structural tags.
This always calls
generate_tool_constraint(None, ...), so the registry’s structural-tag branch is unreachable. Any remaining caller on this wrapper—including the golang bindings called out in the FIXME—will still get the old token-0json_schemaconstraint for Mistral/Kimi models, which defeats the compatibility goal of this PR.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/utils/chat_utils.rs` around lines 211 - 229, The compatibility wrapper generate_tool_constraints currently passes None as the first argument to ParserRegistry::generate_tool_constraint, which prevents the registry from taking the structural-tag branch; change the wrapper to use the provided model string instead: rename the unused parameter _model to model (or use _model without underscore) and forward Some(model) as the first argument to factory.registry().generate_tool_constraint(Some(model), tools, choice) so callers (including golang bindings) receive the correct structural-tag-aware constraint; preserve the function signature/result handling otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tool_parser/src/factory.rs`:
- Around line 540-549: The current consolidation into all_defs silently drops
later $defs with the same key; instead, when iterating the tools collection and
extracting params.get("$defs") (inside the loop that builds all_defs),
rewrite/namespace each definition key uniquely (e.g., prefix with a stable tool
identifier such as tool.name or tool.function.name or tool.id, e.g.
format!("{}::{}", tool_id, def_name)) and insert those namespaced keys into
all_defs; also update any $ref strings inside def_schema and the tool's
parameter schemas to point to the new namespaced keys so references remain
correct (i.e., mutate def_schema and the parameter schema $ref values as you
namespace). Ensure this transformation happens where all_defs and the
tool.function.parameters are being processed so no raw-key collisions occur.
- Around line 167-199: The code currently uses the full tools list when building
constraints, so ToolChoice filters are ignored; update the logic in the function
that handles tool_choice (referencing the variable tool_choice and the enum
variants ToolChoice::Function and ToolChoice::AllowedTools { mode, .. }) to
first produce a filtered_tools list according to the requested choice (for
ToolChoice::Function use only the specific function/tool requested, for
AllowedTools restrict to the allowed tool IDs when mode == "required", etc.),
return None if filtered_tools is empty, and then use filtered_tools (not tools)
when calling entry.build_structural_tag, serializing parameters for
ToolChoice::Function (use filtered_tools[0].function.parameters), and when
calling build_required_array_schema(filtered_tools).
In `@crates/tool_parser/src/parsers/kimik2.rs`:
- Around line 61-92: The code binds Kimi tool-call indices to tool-definition
order by using enumerate() and formatting "functions.{name}:{index}"; change
build_structural_tag to iterate tools without enumerate (use tools.iter()) and
emit tags that do not use the definition-order index — e.g. format the keys as
"functions.{name}:0" (or another neutral placeholder) instead of "{index}" so
the structural tags aren’t tied to tool-definition order; update the two places
that currently format "functions.{name}:{index}" inside build_structural_tag
accordingly.
In `@crates/tool_parser/src/parsers/mistral.rs`:
- Around line 50-76: build_structural_tag currently emits a single
triggered_tags entry that forces the whole payload to be one tool call; change
build_structural_tag so that when multiple tools could be emitted (tools.len() >
1) it returns a json_schema format instead of triggered_tags: build a JSON
Schema for an array whose items are a oneOf list of objects, each object having
properties "name" (const = tool.function.name) and "arguments" (schema =
tool.function.parameters) and required ["name","arguments"], and return
serde_json::json!({ "format": { "type":"json_schema", "json_schema": <that array
schema> }}); keep the existing triggered_tags behavior only when there is
exactly one tool (or when you want to constrain to a single tool call).
In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 181-194: The current call to
tool_parser_factory.registry().generate_tool_constraint(...) always forwards
ctx.components.configured_tool_parser which prevents the normal model-based
auto-detection when no override is set; instead, resolve the parser the same way
the runtime path does and pass that resolved parser into
generate_tool_constraint: if ctx.components.configured_tool_parser.is_some() use
it, otherwise call the registry's model-based resolver (the same helper used
elsewhere) using the request's model/metadata to pick the native parser, then
call generate_tool_constraint with that resolved parser (keeping
request.tool_choice and body_ref.tools as before) so model-based parsers
(Mistral/KimiK2) are used when no explicit override exists.
In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 202-214: The tool constraint path only passes
ctx.components.configured_tool_parser.as_deref() into
tool_parser_factory.registry().generate_tool_constraint, which prevents the
registry's runtime auto-detection (model-based fallback) from being used; update
the code around tool_call_constraint so you first resolve the effective parser
using the same runtime lookup the rest of the code uses (i.e., if
ctx.components.configured_tool_parser is None, call the registry's model-based
resolver used elsewhere), then pass that resolved parser into
tool_parser_factory.registry().generate_tool_constraint so auto-detected
Mistral/Kimi parsers produce the correct native framing constraint.
---
Outside diff comments:
In `@model_gateway/src/routers/grpc/utils/chat_utils.rs`:
- Around line 211-229: The compatibility wrapper generate_tool_constraints
currently passes None as the first argument to
ParserRegistry::generate_tool_constraint, which prevents the registry from
taking the structural-tag branch; change the wrapper to use the provided model
string instead: rename the unused parameter _model to model (or use _model
without underscore) and forward Some(model) as the first argument to
factory.registry().generate_tool_constraint(Some(model), tools, choice) so
callers (including golang bindings) receive the correct structural-tag-aware
constraint; preserve the function signature/result handling otherwise.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b05fbf99-31b5-4327-91f1-c4ef0bf76111
📒 Files selected for processing (9)
crates/tool_parser/src/factory.rscrates/tool_parser/src/parsers/kimik2.rscrates/tool_parser/src/parsers/mistral.rsmodel_gateway/src/routers/grpc/context.rsmodel_gateway/src/routers/grpc/pd_router.rsmodel_gateway/src/routers/grpc/regular/stages/chat/preparation.rsmodel_gateway/src/routers/grpc/regular/stages/messages/preparation.rsmodel_gateway/src/routers/grpc/router.rsmodel_gateway/src/routers/grpc/utils/chat_utils.rs
09ca1ef to
739917b
Compare
|
Re: coderabbit outside-diff comment on Fixed in commit c8a1392 — the deprecated wrapper has been deleted. Golang bindings now call |
|
All 5 observations have been addressed:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 739917beca
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/grpc_client/src/sglang_scheduler.rs (1)
437-445:⚠️ Potential issue | 🟠 MajorDon't let Chat tool calls strip parser-required markers.
Line 445 now blindly forwards
request.skip_special_tokens. For structural-tag parsers like KimiK2, that can remove the native tool tokens the router-side parser keys on, and it also regressesautowhen tools are present but unconstrained. The Messages path at Lines 640-644 already applies the correct tool-aware policy; Chat should mirror that override here.Suggested fix
- let skip_special_tokens = request.skip_special_tokens; + let skip_special_tokens = match &tool_call_constraint { + Some((typ, _)) if typ == "json_schema" => true, + _ if request.tools.as_ref().is_some_and(|t| !t.is_empty()) => false, + _ => request.skip_special_tokens, + };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/grpc_client/src/sglang_scheduler.rs` around lines 437 - 445, In build_grpc_sampling_params_from_chat, don't forward request.skip_special_tokens blindly; instead compute skip_special_tokens using the same tool-aware override used in the Messages path (mirror the logic from the function that builds sampling params from messages) so parser-required markers aren't stripped: if the chat request includes tool calls and tool_call_constraint is None, force skip_special_tokens = false (preserve tool tokens), otherwise use request.skip_special_tokens; update build_grpc_sampling_params_from_chat (and reference extract_stop_strings as needed) to apply that override.
♻️ Duplicate comments (6)
crates/tool_parser/src/parsers/kimik2.rs (1)
61-84:⚠️ Potential issue | 🔴 CriticalDon’t bind Kimi call IDs to tool-definition order.
Line 61 uses
enumerate(), and Line 71 / Line 83 bake that index intofunctions.{name}:{index}. The parser, however, ignores the parsed index at Line 256, initializes the first emitted call as index0at Line 272, and then advances by emission order at Line 351. A first call to any non-first tool is therefore rejected by the constraint. Generate an index that is independent of definition order, or fall back until dynamic indices can be expressed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/kimik2.rs` around lines 61 - 84, The tags are baking in a definition-order index via tools.iter().enumerate() and the "functions.{name}:{index}" format in the tags.push calls, which conflicts with the runtime emission indexing the parser enforces; change the tag generation in the block that uses tools.iter().enumerate() (the two tags.push calls with format!("functions.{name}:{index}")) to avoid coupling to definition order—either omit the numeric index from the identifier (e.g., emit "functions.{name}" only) or substitute a placeholder token (e.g., "*"/"dynamic") or a stable per-tool ID (derived from tool.function.name) so the emitted call index can be applied at parse time instead of being hard-coded from enumerate().crates/tool_parser/src/parsers/mistral.rs (1)
48-76:⚠️ Potential issue | 🟠 MajorThis structural tag still only permits a single Mistral call.
Line 60 opens
[TOOL_CALLS] [{..., and Line 65 hard-closes with}], so there is no legal path to emit a second array item before the payload ends. Required / allowed_tools(required) now prefer this structural-tag path, so valid multi-call Mistral outputs regress. Either add continuation tags for subsequent items or fall back tojson_schemafor modes that may emit multiple calls.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/tool_parser/src/parsers/mistral.rs` around lines 48 - 76, The structural tag created by build_structural_tag currently always opens the array and then hard-closes it, preventing multiple items; update build_structural_tag (and the tags vector) to model a repeatable array item pattern: emit an opening tag for "[TOOL_CALLS] [" (or begin the first item with "{\"name\": ... \"arguments\": "), then emit a tag representing a single array item with the json_schema content, then emit a continuation tag that matches the comma separator between items (e.g., begin="," end="" or a tag whose begin is "," and which re-uses the same json_schema structure), and finally emit a closing tag for the array ("]" or "}]"); alternatively, when at_least_one is false or multi-call is allowed, fall back to returning a single json_schema entry rather than the triggered_tags structure so multi-call outputs aren’t constrained—change build_structural_tag to construct these opening/item/continuation/closing tag entries rather than producing a single begin + end pair.model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs (1)
202-212:⚠️ Potential issue | 🟠 MajorResolve the effective parser before building the constraint.
This still only forwards
configured_tool_parser, so default model-mapped Mistral/Kimi requests in the Messages path never reach the structural-tag branch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs` around lines 202 - 212, The code builds tool_call_constraint using ctx.components.configured_tool_parser directly, which prevents resolving the effective parser (including model-mapped defaults) before constraint generation; update the logic in the tool_call_constraint computation (the block creating tool_call_constraint and calling ctx.components.tool_parser_factory.registry().generate_tool_constraint) to first resolve the effective parser (e.g., call the same resolution used elsewhere to derive the actual parser from configured_tool_parser and the request/model mapping) and pass that resolved parser into generate_tool_constraint so Messages-path requests reach the structural-tag branch.model_gateway/src/routers/grpc/regular/processor.rs (1)
144-149:⚠️ Potential issue | 🟠 MajorResolve the effective parser before checking structural-tag support.
has_structural_tag_for_parser(self.configured_tool_parser.as_deref())only sees the explicit override. On the default model-mapped path, both branches still treat structural-tag-capable parsers as JSON-schema mode and route tool responses throughparse_json_schema_response.Also applies to: 627-635
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/processor.rs` around lines 144 - 149, Resolve the effective parser before checking structural-tag support: determine the actual parser to use (either self.configured_tool_parser.as_deref() if present or the model-mapped default used elsewhere) and pass that resolved parser into tool_parser_factory.registry().has_structural_tag_for_parser(...) instead of passing the raw configured_tool_parser; then base used_json_schema and the call to parse_json_schema_response on that resolved parser. Update both the block around has_structural_tag/used_json_schema and the duplicate section (lines ~627-635) to use the same resolved-parser variable.model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs (1)
186-196:⚠️ Potential issue | 🟠 MajorUse the resolved parser here, not only
configured_tool_parser.Passing only
ctx.components.configured_tool_parser.as_deref()keeps the default model-mapped Mistral/Kimi path on the generic JSON-schema fallback, so the native structural-tag fix still requires a manual override.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs` around lines 186 - 196, The code passes ctx.components.configured_tool_parser.as_deref() to registry().generate_tool_constraint, which forces the generic JSON-schema fallback; change the call to pass the resolved parser instance (the value produced by the tool parser resolution logic you have elsewhere) instead of configured_tool_parser.as_deref(), so registry().generate_tool_constraint receives the actual resolved parser (use the variable or factory call that holds the resolved parser) when building tool_call_constraint.bindings/golang/src/client.rs (1)
152-152:⚠️ Potential issue | 🟠 MajorFilter the request and resolve the parser before generating the tool constraint.
This path now feeds the raw
chat_request.toolslist intogenerate_tool_constraint(None, ...). That means forced-function requests can still constraintools[0]instead of the requested tool, and model-mapped Mistral/Kimi calls never reach the structural-tag branch. Based on learnings:ParserRegistry::generate_tool_constraintdoes not internally filtertoolsbytool_choice; caller filtering is the caller's responsibility.Also applies to: 180-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bindings/golang/src/client.rs` at line 152, The code currently passes the raw chat_request.tools list into ParserRegistry::generate_tool_constraint, which means tool filtering by tool_choice and resolving the parser is not happening before constraint generation; update the caller (where chat_request: ChatCompletionRequest is parsed and before calls to generate_tool_constraint and ParserRegistry::resolve_parser) to first filter chat_request.tools to the selected tool based on the tool_choice (or tool index/name), resolve the parser via ParserRegistry::resolve_parser for that selected tool, and then pass only that filtered tool (or its metadata) into ParserRegistry::generate_tool_constraint so forced-function requests and model-mapped calls hit the correct structural-tag branch; apply the same fix to the other occurrence around the code handling lines ~180-186 where generate_tool_constraint is invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bindings/golang/src/client.rs`:
- Around line 201-208: The pattern match on chat_request.tool_choice currently
moves the Option<ToolChoice>, causing later borrows of &chat_request to fail;
change the match to use a reference: call .as_ref() on chat_request.tool_choice
when using matches! so you match against
Some(ToolChoice::Value(ToolChoiceValue::None)) by reference (e.g.,
matches!(chat_request.tool_choice.as_ref(), ...)); update the condition that
uses tool_constraint, chat_request.tools, and tool_choice accordingly so
tool_choice is borrowed, not moved.
In `@bindings/golang/src/policy.rs`:
- Around line 601-614: The matches! call on chat_request.tool_choice is moving
the non-Copy field and causing later borrow errors; change the pattern match to
borrow the option (e.g., use chat_request.tool_choice.as_ref()) so you match
against a reference (openai_protocol::common::ToolChoice::Value(...)) instead of
moving the value, keeping tool_constraint, chat_request, and subsequent uses
valid.
In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 213-217: The match guard is trying to move the non-Copy enum field
request.tool_choice out of a shared borrow; change the guard to match a
reference by using request.tool_choice.as_ref() (so the condition becomes
something like matches!(request.tool_choice.as_ref(),
Some(ToolChoice::Value(ToolChoiceValue::None)))) while keeping the
request.tools.is_some() check unchanged; update the matches! usage in the if
guard that references ToolChoice/ToolChoiceValue to operate on a reference
rather than trying to move the value.
In `@model_gateway/src/routers/grpc/regular/streaming.rs`:
- Around line 253-266: The fast-path that treats a choice as "arguments-only"
must be gated by used_json_schema: update the logic that derives
is_specific_function so it requires both the original tool_choice checks (e.g.,
ToolChoice::Function, ToolChoice::Value::Required, AllowedTools mode ==
"required") AND used_json_schema == true (which itself is computed from
has_structural_tag and tool_choice); similarly, any branches that bypass the
parser and emit native wrapper tokens as arguments should only run when
used_json_schema is true — ensure places referencing is_specific_function and
the arguments-only fast path consult used_json_schema before skipping the parser
to prevent streaming raw TOOL_CALLS/<|tool_call_begin|> tokens for
structural-tag function calls.
---
Outside diff comments:
In `@crates/grpc_client/src/sglang_scheduler.rs`:
- Around line 437-445: In build_grpc_sampling_params_from_chat, don't forward
request.skip_special_tokens blindly; instead compute skip_special_tokens using
the same tool-aware override used in the Messages path (mirror the logic from
the function that builds sampling params from messages) so parser-required
markers aren't stripped: if the chat request includes tool calls and
tool_call_constraint is None, force skip_special_tokens = false (preserve tool
tokens), otherwise use request.skip_special_tokens; update
build_grpc_sampling_params_from_chat (and reference extract_stop_strings as
needed) to apply that override.
---
Duplicate comments:
In `@bindings/golang/src/client.rs`:
- Line 152: The code currently passes the raw chat_request.tools list into
ParserRegistry::generate_tool_constraint, which means tool filtering by
tool_choice and resolving the parser is not happening before constraint
generation; update the caller (where chat_request: ChatCompletionRequest is
parsed and before calls to generate_tool_constraint and
ParserRegistry::resolve_parser) to first filter chat_request.tools to the
selected tool based on the tool_choice (or tool index/name), resolve the parser
via ParserRegistry::resolve_parser for that selected tool, and then pass only
that filtered tool (or its metadata) into
ParserRegistry::generate_tool_constraint so forced-function requests and
model-mapped calls hit the correct structural-tag branch; apply the same fix to
the other occurrence around the code handling lines ~180-186 where
generate_tool_constraint is invoked.
In `@crates/tool_parser/src/parsers/kimik2.rs`:
- Around line 61-84: The tags are baking in a definition-order index via
tools.iter().enumerate() and the "functions.{name}:{index}" format in the
tags.push calls, which conflicts with the runtime emission indexing the parser
enforces; change the tag generation in the block that uses
tools.iter().enumerate() (the two tags.push calls with
format!("functions.{name}:{index}")) to avoid coupling to definition
order—either omit the numeric index from the identifier (e.g., emit
"functions.{name}" only) or substitute a placeholder token (e.g., "*"/"dynamic")
or a stable per-tool ID (derived from tool.function.name) so the emitted call
index can be applied at parse time instead of being hard-coded from enumerate().
In `@crates/tool_parser/src/parsers/mistral.rs`:
- Around line 48-76: The structural tag created by build_structural_tag
currently always opens the array and then hard-closes it, preventing multiple
items; update build_structural_tag (and the tags vector) to model a repeatable
array item pattern: emit an opening tag for "[TOOL_CALLS] [" (or begin the first
item with "{\"name\": ... \"arguments\": "), then emit a tag representing a
single array item with the json_schema content, then emit a continuation tag
that matches the comma separator between items (e.g., begin="," end="" or a tag
whose begin is "," and which re-uses the same json_schema structure), and
finally emit a closing tag for the array ("]" or "}]"); alternatively, when
at_least_one is false or multi-call is allowed, fall back to returning a single
json_schema entry rather than the triggered_tags structure so multi-call outputs
aren’t constrained—change build_structural_tag to construct these
opening/item/continuation/closing tag entries rather than producing a single
begin + end pair.
In `@model_gateway/src/routers/grpc/regular/processor.rs`:
- Around line 144-149: Resolve the effective parser before checking
structural-tag support: determine the actual parser to use (either
self.configured_tool_parser.as_deref() if present or the model-mapped default
used elsewhere) and pass that resolved parser into
tool_parser_factory.registry().has_structural_tag_for_parser(...) instead of
passing the raw configured_tool_parser; then base used_json_schema and the call
to parse_json_schema_response on that resolved parser. Update both the block
around has_structural_tag/used_json_schema and the duplicate section (lines
~627-635) to use the same resolved-parser variable.
In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 186-196: The code passes
ctx.components.configured_tool_parser.as_deref() to
registry().generate_tool_constraint, which forces the generic JSON-schema
fallback; change the call to pass the resolved parser instance (the value
produced by the tool parser resolution logic you have elsewhere) instead of
configured_tool_parser.as_deref(), so registry().generate_tool_constraint
receives the actual resolved parser (use the variable or factory call that holds
the resolved parser) when building tool_call_constraint.
In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 202-212: The code builds tool_call_constraint using
ctx.components.configured_tool_parser directly, which prevents resolving the
effective parser (including model-mapped defaults) before constraint generation;
update the logic in the tool_call_constraint computation (the block creating
tool_call_constraint and calling
ctx.components.tool_parser_factory.registry().generate_tool_constraint) to first
resolve the effective parser (e.g., call the same resolution used elsewhere to
derive the actual parser from configured_tool_parser and the request/model
mapping) and pass that resolved parser into generate_tool_constraint so
Messages-path requests reach the structural-tag branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 95e93f6f-a72b-48c0-929f-038a48a7bfd4
📒 Files selected for processing (18)
bindings/golang/src/client.rsbindings/golang/src/policy.rsbindings/golang/src/preprocessor.rscrates/grpc_client/src/sglang_scheduler.rscrates/grpc_client/src/vllm_engine.rscrates/tool_parser/src/factory.rscrates/tool_parser/src/lib.rscrates/tool_parser/src/parsers/kimik2.rscrates/tool_parser/src/parsers/mistral.rsmodel_gateway/src/routers/grpc/context.rsmodel_gateway/src/routers/grpc/pd_router.rsmodel_gateway/src/routers/grpc/regular/processor.rsmodel_gateway/src/routers/grpc/regular/stages/chat/preparation.rsmodel_gateway/src/routers/grpc/regular/stages/messages/preparation.rsmodel_gateway/src/routers/grpc/regular/streaming.rsmodel_gateway/src/routers/grpc/router.rsmodel_gateway/src/routers/grpc/utils/chat_utils.rsmodel_gateway/src/routers/grpc/utils/mod.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca76040a49
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/regular/streaming.rs (1)
1448-1456:⚠️ Potential issue | 🟡 MinorMessages API
skip_special_tokensis hardcodedtruein streaming but derived dynamically in preparation stage.In
messages/preparation.rs(lines 227–230),skip_special_tokensis derived based on constraint type:
falsewhen structural tags are present (tool markers)truefor JSON schema or when no tools are presentHowever, in
process_messages_streaming_response(line 1454), it's hardcoded totrue. This differs from Chat API, where streaming respects the request field.Impact is likely benign: The tokenizer's
skip_special_tokensparameter only filters BOS/EOS tokens; tool markers like[TOOL_CALLS]and<|tool_call_begin|>are model-native framing and not subject to this filtering. Tool parsing should work correctly despite the inconsistency. However, the mismatch with Chat API behavior and the prep stage logic warrants alignment—either propagate the preparation-stage value or update preparation to match streaming's hardcoded behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/streaming.rs` around lines 1448 - 1456, process_messages_streaming_response currently hardcodes skip_special_tokens=true in the stop_params tuple (the tuple bound to stop_params in streaming.rs), which conflicts with the dynamic skip_special_tokens value computed during messages preparation; change the streaming code to use the same derived boolean instead of true—i.e., thread the prepared skip_special_tokens value (the one computed in messages preparation or the request-level field used there) into process_messages_streaming_response and replace the hardcoded true with that variable so streaming respects the preparation-stage decision for skip_special_tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/streaming.rs`:
- Around line 1448-1456: process_messages_streaming_response currently hardcodes
skip_special_tokens=true in the stop_params tuple (the tuple bound to
stop_params in streaming.rs), which conflicts with the dynamic
skip_special_tokens value computed during messages preparation; change the
streaming code to use the same derived boolean instead of true—i.e., thread the
prepared skip_special_tokens value (the one computed in messages preparation or
the request-level field used there) into process_messages_streaming_response and
replace the hardcoded true with that variable so streaming respects the
preparation-stage decision for skip_special_tokens.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c2b2bcc8-a3cb-46f7-9be8-eb59e60a7028
📒 Files selected for processing (2)
model_gateway/src/routers/grpc/regular/stages/messages/preparation.rsmodel_gateway/src/routers/grpc/regular/streaming.rs
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs (1)
201-235:⚠️ Potential issue | 🟠 MajorDon’t pin the Messages stop decoder to
truewhen tool constraints are active.
tool_call_constraintis now available above, butcreate_stop_decoder(..., true, ...)still forces special-token stripping. That drops structural-tag/native-framing markers before the router parser can consume them, breaking Messages tool calls for parsers like Mistral/Kimi. Derive this flag from the chosen constraint/tool-choice path instead of the unconditional default.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs` around lines 201 - 235, The stop-decoder is always being created with the Messages API default flag hardcoded to true, which strips special tokens and breaks tool-call parsing when tool constraints are active; update the utils::create_stop_decoder call to derive that boolean from the presence/choice of tool constraints instead of the unconditional true—use tool_call_constraint (and/or chat_tool_choice) to set the flag (e.g., disable special-token stripping when a tool constraint or tool_choice exists) so structural framing markers are preserved for the router/parser.model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs (1)
180-222:⚠️ Potential issue | 🟠 MajorDerive the decoder’s
skip_special_tokensfrom the selectedToolConstraint.This new path can now return
StructuralTag, but the stage still feedsrequest.skip_special_tokensintocreate_stop_decoder()below. With the defaulttrue, native-framing tokens can be stripped before the router-side parser sees them, so required/function tool calls can regress for structural-tag parsers. Please compute the effective flag alongsidetool_call_constraintand pass that value into the decoder.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs` around lines 180 - 222, The stage currently always passes request.skip_special_tokens into utils::create_stop_decoder, but when a ToolConstraint (tool_call_constraint) indicates a structural-tag/native parser it must override that default; compute an effective_skip_special_tokens boolean alongside building tool_call_constraint by examining the selected ToolConstraint (e.g., inspect tool_call_constraint.as_ref() / the ToolConstraint variant or its method that indicates native/structural framing) and set effective_skip_special_tokens = constraint-driven value else fallback to request.skip_special_tokens, then pass effective_skip_special_tokens into create_stop_decoder instead of request.skip_special_tokens and persist/store this effective value with PreparationOutput if needed for downstream logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/grpc_client/src/sglang_scheduler.rs`:
- Around line 445-447: The gRPC path is inconsistent: some builders set let
skip_special_tokens = true while build_grpc_sampling_params_from_responses still
hardcodes skip_special_tokens: false, causing EOS/EOT markers to leak; update
build_grpc_sampling_params_from_responses (and the other response-path builder
locations referenced around the second occurrence) to use skip_special_tokens =
true (or read the same variable/constant used by the other builders) so all gRPC
SamplingParams consistently set skip_special_tokens to true and follow the same
backend contract.
---
Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 180-222: The stage currently always passes
request.skip_special_tokens into utils::create_stop_decoder, but when a
ToolConstraint (tool_call_constraint) indicates a structural-tag/native parser
it must override that default; compute an effective_skip_special_tokens boolean
alongside building tool_call_constraint by examining the selected ToolConstraint
(e.g., inspect tool_call_constraint.as_ref() / the ToolConstraint variant or its
method that indicates native/structural framing) and set
effective_skip_special_tokens = constraint-driven value else fallback to
request.skip_special_tokens, then pass effective_skip_special_tokens into
create_stop_decoder instead of request.skip_special_tokens and persist/store
this effective value with PreparationOutput if needed for downstream logic.
In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 201-235: The stop-decoder is always being created with the
Messages API default flag hardcoded to true, which strips special tokens and
breaks tool-call parsing when tool constraints are active; update the
utils::create_stop_decoder call to derive that boolean from the presence/choice
of tool constraints instead of the unconditional true—use tool_call_constraint
(and/or chat_tool_choice) to set the flag (e.g., disable special-token stripping
when a tool constraint or tool_choice exists) so structural framing markers are
preserved for the router/parser.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a68c5e6-e80a-4060-acbf-260ac6cc11b1
📒 Files selected for processing (5)
crates/grpc_client/src/sglang_scheduler.rscrates/grpc_client/src/vllm_engine.rse2e_test/chat_completions/test_function_calling.pymodel_gateway/src/routers/grpc/regular/stages/chat/preparation.rsmodel_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f82054ab88
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
22ac68a to
ec1cda9
Compare
| ) | ||
| { | ||
| chat_request.skip_special_tokens = false; | ||
| } |
There was a problem hiding this comment.
🟡 Nit: This mutation is dead code. build_grpc_sampling_params_from_chat in sglang_scheduler.rs now hardcodes skip_special_tokens = true (line 447), so the value set here on chat_request.skip_special_tokens is never read by the downstream gRPC request builder.
Same issue in policy.rs:612-625.
Either remove the dead derivation from both files, or — if the intent is forward-looking for when golang bindings support a configured_parser — add a // TODO: explaining why this exists and that it is currently inert.
There was a problem hiding this comment.
The derivation is not dead code — the derived skip_special_tokens is passed to sgl_grpc_response_converter_create (line 282) which uses it for SMG-side token decoding via StopDecoder. The gRPC request builder hardcodes it to true by design (backends return raw token IDs, skip_special_tokens only matters for SMG-side decoding). The derivation is consumed by the response converter, not the request builder.
There was a problem hiding this comment.
The threading to StopDecoder in go bindings is a separate issue and we should address it in another PR.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec1cda9f6a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| serde_json::json!({ | ||
| "format": { | ||
| "type": "triggered_tags", | ||
| "triggers": ["<|tool_calls_section_begin|>", "<|tool_call_begin|>"], |
There was a problem hiding this comment.
Require section wrapper in Kimi structural-tag trigger
This structural-tag definition allows generation to trigger directly on <|tool_call_begin|> without first emitting <|tool_calls_section_begin|>. In non-streaming mode, KimiK2Parser::parse_complete only recognizes tool calls when the section-begin marker is present, so responses that follow this allowed path are treated as plain text and tool calls are lost. Please either constrain the first trigger/tag to require the section wrapper or make complete parsing accept the bare tool-call-begin form as well.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Valid observation — the dual trigger is intentional for parallel tool call support. The bare <|tool_call_begin|> trigger handles subsequent tool calls after the first one opened the section. However, has_tool_markers only checks for <|tool_calls_section_begin|>, which means a bare <|tool_call_begin|> without the section wrapper would be treated as plain text by parse_complete. This is a known gap — will address in a follow-up.
|
Hi @CatherineSue, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch: git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease |
…generation and skip_special_tokens [Tasks 1-7,11] - Add ToolConstraint enum (JsonSchema/StructuralTag) to replace raw string tuples - Remove REQUIRES_SPECIAL_TOKENS from ParserEntry, MistralParser, KimiK2Parser (skip_special_tokens is derived from constraint type, not per-parser metadata) - Rename register_parser_with_meta to register_parser_with_structural_tag - Change generate_tool_constraint to return Result<Option<ToolConstraint>, String> to surface serialization errors instead of silently falling back - Restore $defs conflict detection in build_required_array_schema - Fix Mistral tool name JSON escaping via serde_json::to_string - Add has_structural_tag accessor for response parsing - Derive skip_special_tokens from constraint type in chat/messages preparation: json_schema -> keep user default, structural_tag/no constraint with tools -> false - Fix StopDecoder receiving inconsistent skip_special_tokens (was using user default while grpc_client independently overrode to false) Signed-off-by: Chang Su <chang.s.su@oracle.com> # Conflicts: # model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
…odel resolution [Task 12] Extract shared model→parser resolution logic (exact match → prefix match) into resolve_model_to_parser and resolve_parser_name methods. Refactor has_parser_for_model, create_for_model, get_pooled_for_model, and ParserFactory::get_parser to use the shared helper, eliminating ~60 lines of duplicated code across 4 methods. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…se parsing [Tasks 8-9] Replace tool_choice-based used_json_schema inference with registry-based has_structural_tag check. When the resolved parser supports structural tags, response parsing uses the model-specific parser instead of forcing the generic JSON parser. Fixes divergence where constraint generation used structural tags but response parsing assumed json_schema. Affected paths: chat processor, messages processor, chat streaming, messages streaming (4 sites total). Signed-off-by: Chang Su <chang.s.su@oracle.com>
…onstraint-aware [Task 10] Chat API: skip_special_tokens is now a passthrough from the request. The preparation stage sets it correctly based on constraint type. Messages API: since CreateMessageRequest has no skip_special_tokens field, derive it from constraint type — json_schema keeps true (pure JSON output), structural_tag or tools without constraint forces false (parser needs special token delimiters). Responses/completion API paths unchanged. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…l-call-parser [Tasks 8-9 followup] Add has_structural_tag_for_parser(configured: Option<&str>) to ParserRegistry. Structural tags only activate when --tool-call-parser is explicitly set to a parser that supports them — no model-name auto-detection, which is unreliable with model aliases. Simplifies 4 call sites in processor.rs and streaming.rs from 7-line resolve_parser_name + has_structural_tag blocks to single-line calls. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…ed wrapper [Task 13] Migrate all 3 golang binding files (client.rs, preprocessor.rs, policy.rs) to use PARSER_FACTORY.registry().generate_tool_constraint() directly instead of the deprecated generate_tool_constraints() wrapper in chat_utils.rs. Go bindings now derive skip_special_tokens from constraint type (option B): json_schema keeps the default, other constraint types or no constraint with tools present forces skip_special_tokens=false. Delete the deprecated generate_tool_constraints() function and its re-export. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…nup] The method was added during dedup but is not called anywhere. Structural tag checks use configured_tool_parser directly (explicit opt-in), not model-based auto-detection. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…s tool_choice none Two fixes for structural tag correctness: 1. streaming.rs: is_specific_function (arguments-only fast path) now requires used_json_schema=true. When structural tags are active, the output includes framing tokens that the parser must handle — bypassing the parser would stream raw [TOOL_CALLS] or <|tool_call_begin|> tokens as arguments. 2. messages/preparation.rs: skip_special_tokens derivation now respects tool_choice=none. Previously, having tools present with tool_choice=none would force skip_special_tokens=false, causing special tokens to leak into plain text responses. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…onstraint-aware [Task 10] Mutate skip_special_tokens on the request in the chat preparation stage (via Cow::to_mut) so both StopDecoder and the grpc_client backend see the same derived value. The grpc_client remains a simple passthrough. Previously, skip_special_tokens was derived as a local variable for the StopDecoder but request.skip_special_tokens was never modified, so the backend still received the user default (true), stripping special tokens needed by tool parsers like Mistral and KimiK2. Signed-off-by: Chang Su <chang.s.su@oracle.com> # Conflicts: # model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
…ag support - Update assertion from "not supported" to "conflicting" to match the restored $defs conflict error message - Skip test_conflicting_defs_required_tool_choice for Mistral: structural tags handle each tool's schema independently, so $defs conflicts are irrelevant when not consolidated into a single array schema Signed-off-by: Chang Su <chang.s.su@oracle.com>
gRPC backends (sglang, vllm) return raw token IDs, not decoded text. Detokenization happens entirely on the SMG Rust side via StopDecoder. The skip_special_tokens field in the gRPC proto is unused by the backend in gRPC mode, so hardcode it to true to avoid confusion. Also revert the skip_special_tokens derivation from preparation stages — the StopDecoder should use the request's original value (true by default). Changing it to false caused EOS/EOT tokens to appear in decoded text, breaking tool call detection. Signed-off-by: Chang Su <chang.s.su@oracle.com>
Use tags_with_separator format inside triggered_tags to support multiple
tool calls in a single array. The format matches Mistral's native output:
[TOOL_CALLS] [{"name":"f1","arguments":{...}}, {"name":"f2","arguments":{...}}]
Previously, each tool had its own triggered_tags entry with begin/end that
included the array brackets, making parallel calls impossible.
Signed-off-by: Chang Su <chang.s.su@oracle.com>
…ad to streaming Derive skip_special_tokens based on the constraint type, not just tool presence: - json_schema: keep default (true) — backend forces JSON, no trigger tokens - structural_tag or no constraint (auto): set false — parser needs trigger tokens - no tools: keep request default The derived value is stored on ResponseState (survives request_building .take() of PreparationOutput) and threaded to streaming.rs via new parameter on process_streaming_response and process_messages_streaming_response. Signed-off-by: Chang Su <chang.s.su@oracle.com>
…plate - Enable test_complex_parameters_required_non_streaming for Mistral (structural tags fix the whitespace issue that caused the skip) - Remove custom tool_chat_template_mistral_parallel.jinja — Mistral v0.3's built-in template handles tools natively - Update test_function_call_strict skip reason Signed-off-by: Chang Su <chang.s.su@oracle.com>
2fefced to
a1b92c7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2fefcedf65
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Move the empty tools check before the structural tag path, preventing an unsatisfiable grammar (zero tags with at_least_one=true) from being sent to the backend. Also fix missing .to_tuple() in chat preparation (rebase artifact). Signed-off-by: Chang Su <chang.s.su@oracle.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 99e42d5969
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "<|tool_calls_section_begin|><|tool_call_begin|>functions.{name}:{index}<|tool_call_argument_begin|>" | ||
| ), |
There was a problem hiding this comment.
Sanitize Kimi tool names in structural-tag delimiters
The new Kimi structural-tag builder interpolates tool.function.name directly into the delimiter string (functions.{name}:{index}), but this format is later parsed with a strict identifier pattern ([\w\.]+:\d+). If a caller provides a valid request containing a tool name with delimiter/meta characters (for example : or token-like substrings), the generated structural tag can become unmatchable and tool calls will be dropped as plain text. Please validate/restrict Kimi tool names for this format (or escape/encode them consistently) before building the tag.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I doubt Kimi-K2 would even work correctly under such case. Not worth to address this for now. And OpenAI only allows
openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}}
There was a problem hiding this comment.
We should consider adding such validation though over the function name. cc @key4ng
…_special_tokens derivation (lightseekorg#1090) Signed-off-by: Chang Su <chang.s.su@oracle.com>
Description
Problem
The model gateway has three interrelated issues with tool-call constraint generation and
skip_special_tokenshandling:1. Generic
json_schemaconstraint breaks native-format modelsThe gateway uses a single generic
json_schemaconstraint for all tool-calling models whentool_choiceisrequiredor a specific function. The xgrammar backend applies this JSON schema grammar mask starting at token position 0, meaning every token must conform to the schema. This is incorrect for models that use native framing tokens around their tool calls:[TOOL_CALLS] [{"name": "func", "arguments": {...}}]— the[TOOL_CALLS]prefix is not valid JSON.<|tool_calls_section_begin|>,<|tool_call_begin|>, etc.The xgrammar backend supports structural tags (
triggered_tagsformat) designed for this: the model generates free-form text until a trigger string is matched, then the grammar constrains only the structured content.2.
skip_special_tokensinconsistency between StopDecoder and backendThe preparation stages pass
request.skip_special_tokens(user's default, typicallytrue) to the StopDecoder. Meanwhile,sglang_scheduler.rsandvllm_engine.rsindependently overrideskip_special_tokenstofalsewhen tools are present. This means the StopDecoder and the backend see differentskip_special_tokensvalues, which can cause inconsistent stop sequence matching.Investigation across inference engines:
serving_chat.py:343-344): blanketskip_special_tokens = Falsefor all tool callsadjust_request): most tool parsers setskip_special_tokens = Falseopenai_server.py:1051-1056): per-parser vianeeds_raw_special_tokensclass attribute — most selective approach, only sets it when the parser explicitly declares the needThe correct behavior:
skip_special_tokensshould be derived from the constraint type, not blanket-overridden. Withjson_schema, output is pure JSON — special tokens aren't needed. Withstructural_tagor no constraint, the parser needs special token delimiters.3. Response parsing assumes
json_schemaconstraintprocessor.rsandstreaming.rsinferused_json_schemafromtool_choicepatterns and force the generic "json" parser. When structural tags are used instead, the model output includes framing tokens that the JSON parser can't handle. The model-specific parser should be used.Solution
Constraint type as a first-class enum:
Introduce
ToolConstraintenum (JsonSchema/StructuralTag) replacing raw(String, String)tuples. Each parser can register abuild_structural_tagfunction. The constraint generation strategy:tool_choicebuild_structural_tag?auto/nonerequired/function/allowed_tools(required)StructuralTagrequired/function/allowed_tools(required)JsonSchema(fallback)Constraint-aware
skip_special_tokens:Both the gateway preparation stages and the grpc_client derive
skip_special_tokensfrom the constraint type:JsonSchema→ keep user default (true) — output is pure JSONStructuralTagor no constraint with tools present →false— parser needs delimitersThis makes StopDecoder and the backend consistent.
Explicit opt-in for structural tags:
Structural tags are only used when
--tool-call-parseris explicitly configured. Model IDs can be aliases (e.g.model-234) that don't match any pattern, so model-based auto-detection is unreliable for constraint generation.Changes
crates/tool_parser/src/factory.rs:ToolConstraintenum withto_tuple()andis_json_schema()helpersrequires_special_tokensfromParserEntry(skip_special_tokens is now derived from constraint type)register_parser_with_meta→register_parser_with_structural_taggenerate_tool_constraintreturn type toResult<Option<ToolConstraint>, String>has_structural_tagandhas_structural_tag_for_parseraccessorsresolve_model_to_parserto deduplicate model→parser resolution across 4 methods (~60 lines removed)$defsconflict detection inbuild_required_array_schemacrates/tool_parser/src/parsers/mistral.rs: RemoveREQUIRES_SPECIAL_TOKENS, fix tool name JSON escaping viaserde_json::to_stringcrates/tool_parser/src/parsers/kimik2.rs: RemoveREQUIRES_SPECIAL_TOKENSmodel_gateway/src/routers/grpc/regular/stages/chat/preparation.rs: Deriveskip_special_tokensfrom constraint type, surface schema errors as 400, fix StopDecoder consistencymodel_gateway/src/routers/grpc/regular/stages/messages/preparation.rs: Same as chat preparationmodel_gateway/src/routers/grpc/regular/processor.rs: Replaceused_json_schemainference withhas_structural_tag_for_parsercheck (2 sites)model_gateway/src/routers/grpc/regular/streaming.rs: Same as processor (2 sites)crates/grpc_client/src/sglang_scheduler.rs: Remove blanketskip_special_tokensoverride for chat; make messages constraint-type-awarecrates/grpc_client/src/vllm_engine.rs: Same as sglang_schedulerbindings/golang/src/{client,preprocessor,policy}.rs: Migrate toPARSER_FACTORY.registry().generate_tool_constraint()with constraint-type-awareskip_special_tokensderivationmodel_gateway/src/routers/grpc/utils/chat_utils.rs: Delete deprecatedgenerate_tool_constraintswrapper and re-exportFollow-up Items
trtllm_service.rs: Addskip_special_tokensto TRT-LLM gRPC proto and wire it (field not in TRT-LLM proto today)tool_choice: autowith structural tags: Sendtriggered_tagswithat_least_one: false— model generates freely but tool call arguments are constrained when triggered. Ref: xgrammar structural tag docstags_with_separatorformat as sglang implementsbuild_structural_tagto DeepSeek, Llama, Qwen, Hermes, Internlm, and other parsers (ref: sglang#17171)ToolConstraintenum, structural tag generation, and constraint-type-awareskip_special_tokensderivationTest Plan
tool_choice: requiredproducesstructural_tagconstraint with[TOOL_CALLS]triggertool_choice: requiredproducesstructural_tagconstraint with dual triggersbuild_structural_tag(e.g.,json,qwen) fall back tojson_schemaconstrainttool_choice: autoandtool_choice: noneproduce no constraintskip_special_tokensisfalsewhen structural tags are used,truefor json_schemaskip_special_tokensvaluesChecklist
cargo +nightly fmtpassescargo clippy --all-targets --all-features -- -D warningspassesSummary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests