Skip to content

feat(tool_parser): add structural tag constraint generation, fix skip_special_tokens derivation#1090

Merged
CatherineSue merged 15 commits intomainfrom
feat/parser-constraint-generation
Apr 15, 2026
Merged

feat(tool_parser): add structural tag constraint generation, fix skip_special_tokens derivation#1090
CatherineSue merged 15 commits intomainfrom
feat/parser-constraint-generation

Conversation

@CatherineSue
Copy link
Copy Markdown
Member

@CatherineSue CatherineSue commented Apr 10, 2026

Description

Problem

The model gateway has three interrelated issues with tool-call constraint generation and skip_special_tokens handling:

1. Generic json_schema constraint breaks native-format models

The gateway uses a single generic json_schema constraint for all tool-calling models when tool_choice is required or a specific function. The xgrammar backend applies this JSON schema grammar mask starting at token position 0, meaning every token must conform to the schema. This is incorrect for models that use native framing tokens around their tool calls:

  • Mistral outputs [TOOL_CALLS] [{"name": "func", "arguments": {...}}] — the [TOOL_CALLS] prefix is not valid JSON.
  • Kimi K2 outputs tool calls wrapped in special tokens like <|tool_calls_section_begin|>, <|tool_call_begin|>, etc.

The xgrammar backend supports structural tags (triggered_tags format) designed for this: the model generates free-form text until a trigger string is matched, then the grammar constrains only the structured content.

2. skip_special_tokens inconsistency between StopDecoder and backend

The preparation stages pass request.skip_special_tokens (user's default, typically true) to the StopDecoder. Meanwhile, sglang_scheduler.rs and vllm_engine.rs independently override skip_special_tokens to false when tools are present. This means the StopDecoder and the backend see different skip_special_tokens values, which can cause inconsistent stop sequence matching.

Investigation across inference engines:

  • sglang (serving_chat.py:343-344): blanket skip_special_tokens = False for all tool calls
  • vllm (per-parser adjust_request): most tool parsers set skip_special_tokens = False
  • TRT-LLM (openai_server.py:1051-1056): per-parser via needs_raw_special_tokens class attribute — most selective approach, only sets it when the parser explicitly declares the need

The correct behavior: skip_special_tokens should be derived from the constraint type, not blanket-overridden. With json_schema, output is pure JSON — special tokens aren't needed. With structural_tag or no constraint, the parser needs special token delimiters.

3. Response parsing assumes json_schema constraint

processor.rs and streaming.rs infer used_json_schema from tool_choice patterns and force the generic "json" parser. When structural tags are used instead, the model output includes framing tokens that the JSON parser can't handle. The model-specific parser should be used.

Solution

Constraint type as a first-class enum:

Introduce ToolConstraint enum (JsonSchema/StructuralTag) replacing raw (String, String) tuples. Each parser can register a build_structural_tag function. The constraint generation strategy:

tool_choice Parser has build_structural_tag? Constraint type
auto / none None (no constraint)
required / function / allowed_tools(required) Yes StructuralTag
required / function / allowed_tools(required) No JsonSchema (fallback)

Constraint-aware skip_special_tokens:

Both the gateway preparation stages and the grpc_client derive skip_special_tokens from the constraint type:

  • JsonSchema → keep user default (true) — output is pure JSON
  • StructuralTag or no constraint with tools present → false — parser needs delimiters

This makes StopDecoder and the backend consistent.

Explicit opt-in for structural tags:

Structural tags are only used when --tool-call-parser is explicitly configured. Model IDs can be aliases (e.g. model-234) that don't match any pattern, so model-based auto-detection is unreliable for constraint generation.

Changes

  • crates/tool_parser/src/factory.rs:

    • Add ToolConstraint enum with to_tuple() and is_json_schema() helpers
    • Remove requires_special_tokens from ParserEntry (skip_special_tokens is now derived from constraint type)
    • Rename register_parser_with_metaregister_parser_with_structural_tag
    • Change generate_tool_constraint return type to Result<Option<ToolConstraint>, String>
    • Add has_structural_tag and has_structural_tag_for_parser accessors
    • Extract resolve_model_to_parser to deduplicate model→parser resolution across 4 methods (~60 lines removed)
    • Restore $defs conflict detection in build_required_array_schema
  • crates/tool_parser/src/parsers/mistral.rs: Remove REQUIRES_SPECIAL_TOKENS, fix tool name JSON escaping via serde_json::to_string

  • crates/tool_parser/src/parsers/kimik2.rs: Remove REQUIRES_SPECIAL_TOKENS

  • model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs: Derive skip_special_tokens from constraint type, surface schema errors as 400, fix StopDecoder consistency

  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs: Same as chat preparation

  • model_gateway/src/routers/grpc/regular/processor.rs: Replace used_json_schema inference with has_structural_tag_for_parser check (2 sites)

  • model_gateway/src/routers/grpc/regular/streaming.rs: Same as processor (2 sites)

  • crates/grpc_client/src/sglang_scheduler.rs: Remove blanket skip_special_tokens override for chat; make messages constraint-type-aware

  • crates/grpc_client/src/vllm_engine.rs: Same as sglang_scheduler

  • bindings/golang/src/{client,preprocessor,policy}.rs: Migrate to PARSER_FACTORY.registry().generate_tool_constraint() with constraint-type-aware skip_special_tokens derivation

  • model_gateway/src/routers/grpc/utils/chat_utils.rs: Delete deprecated generate_tool_constraints wrapper and re-export

Follow-up Items

  • trtllm_service.rs: Add skip_special_tokens to TRT-LLM gRPC proto and wire it (field not in TRT-LLM proto today)
  • tool_choice: auto with structural tags: Send triggered_tags with at_least_one: false — model generates freely but tool call arguments are constrained when triggered. Ref: xgrammar structural tag docs
  • Mistral parallel tool calls: Current structural tag limits to single tool call per trigger — needs tags_with_separator format as sglang implements
  • More structural tag parsers: Add build_structural_tag to DeepSeek, Llama, Qwen, Hermes, Internlm, and other parsers (ref: sglang#17171)
  • Unit tests: Add tests for ToolConstraint enum, structural tag generation, and constraint-type-aware skip_special_tokens derivation

Test Plan

  • Verify Mistral tool calling with tool_choice: required produces structural_tag constraint with [TOOL_CALLS] trigger
  • Verify Kimi K2 tool calling with tool_choice: required produces structural_tag constraint with dual triggers
  • Verify parsers without build_structural_tag (e.g., json, qwen) fall back to json_schema constraint
  • Verify tool_choice: auto and tool_choice: none produce no constraint
  • Verify skip_special_tokens is false when structural tags are used, true for json_schema
  • Verify StopDecoder and backend receive consistent skip_special_tokens values
  • Run existing e2e function calling tests to ensure no regression
Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated
  • (Optional) Please join us on Slack #sig-smg to discuss, review, and merge PRs

Summary by CodeRabbit

  • New Features

    • Structural tag-based tool calling added for Mistral and KimiK2 parsers
    • CLI-configurable tool parser selection; registry-driven constraint generation emits JSON Schema or structural-tag constraints
  • Improvements

    • Runtime selection between JSON-schema and structural-tag parsing for more accurate tool calls
    • Tool-constraint results standardized to a tuple form for downstream use
  • Bug Fixes

    • Error messaging tightened for conflicting schema definitions
  • Tests

    • Updated test expectations for conflict error wording

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Registry now stores per-parser metadata (including optional structural-tag builders) and exposes a unified constraint-generation API. Gateway and bindings thread a configured parser into shared components, delegate constraint generation to the registry, and adjust parsing and skip-special-tokens behavior based on returned constraint types.

Changes

Cohort / File(s) Summary
Parser registry & factory
crates/tool_parser/src/factory.rs, crates/tool_parser/src/lib.rs
Replaced creator-only map with ParserEntry metadata (entries). Added ToolConstraint enum, register_parser_with_structural_tag, has_structural_tag*, resolve_model_to_parser, and generate_tool_constraint. Parser resolution/pooling now uses resolve_model_to_parser; exported ToolConstraint.
Parser structural-tag builders
crates/tool_parser/src/parsers/mistral.rs, crates/tool_parser/src/parsers/kimik2.rs
Added build_structural_tag(tools, at_least_one) implementations producing "triggered_tags" JSON structures; these are registered via the new registry API.
Gateway shared components & wiring
model_gateway/src/routers/grpc/context.rs, .../pd_router.rs, .../router.rs
Threaded configured_tool_parser: Option<String> into SharedComponents and router/PD construction; removed #[expect(dead_code)] on tool_parser_factory.
Preparation stages (chat & messages)
model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs, .../messages/preparation.rs
Switched generation to registry().generate_tool_constraint(configured, tools, tool_choice), now requires both tools and tool_choice to produce a constraint. Stored constraint as to_tuple(); error mapping preserved.
Processing & streaming selection
model_gateway/src/routers/grpc/regular/processor.rs, .../streaming.rs
Parsing-path selection (JSON-schema vs normal tool parsing) now consults has_structural_tag_for_parser(configured) and alters used_json_schema / is_specific_function checks accordingly.
Removed local utils & re-exports
model_gateway/src/routers/grpc/utils/chat_utils.rs, model_gateway/src/routers/grpc/utils/mod.rs
Deleted local generate_tool_constraints and build_required_array_schema/$defs consolidation; removed generate_tool_constraints re-export.
Bindings & consumers
bindings/golang/src/client.rs, bindings/golang/src/policy.rs, bindings/golang/src/preprocessor.rs, crates/grpc_client/src/sglang_scheduler.rs, crates/grpc_client/src/vllm_engine.rs
Replaced helper calls with PARSER_FACTORY.registry().generate_tool_constraint(None, tools, tool_choice); adjusted request mutability and skip_special_tokens logic (now driven by constraint type or set to true in some clients).
Tests
e2e_test/chat_completions/test_function_calling.py
Adjusted assertion message for conflicting $defs and added a Mistral-skipped test wrapper.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant GrpcRouter
    participant SharedComponents
    participant ParserRegistry
    participant Parser

    Client->>GrpcRouter: Send request (model, tools, tool_choice)
    GrpcRouter->>SharedComponents: build request context (includes configured_tool_parser)
    GrpcRouter->>ParserRegistry: generate_tool_constraint(configured_parser, tools, tool_choice)
    alt configured parser has structural tag
        ParserRegistry->>Parser: call build_structural_tag(tools, at_least_one)
        Parser-->>ParserRegistry: StructuralTag(JSON)
    else generic/json-schema path
        ParserRegistry-->>ParserRegistry: build JSON-schema constraint (or None)
    end
    ParserRegistry-->>GrpcRouter: return ToolConstraint (JsonSchema | StructuralTag | None)
    GrpcRouter->>GrpcRouter: set skip_special_tokens / parsing path based on constraint type
    GrpcRouter->>Client: continue processing/request execution
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested labels

protocols

Suggested reviewers

  • key4ng
  • slin1237

Poem

🐰
I dug a burrow of tidy code,
Parsers lined up in a row,
Tags that trigger, schemas show,
Registry hums and pipes now flow,
I munch a carrot — tests all go.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main changes: introducing structural tag constraint generation and fixing skip_special_tokens derivation logic, which are the primary objectives of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/parser-constraint-generation

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added grpc gRPC client and router changes tool-parser Tool/function call parser changes model-gateway Model gateway crate changes labels Apr 10, 2026
Copy link
Copy Markdown
Member Author

CatherineSue commented Apr 10, 2026

Code Review Observations

1. Mistral: single tool call only. The end: "}]" closes both the JSON object and the array, so parallel tool calls ([{...}, {...}]) aren't supported via the structural tag path. Kimi K2 handles this correctly with dual triggers (<|tool_calls_section_begin|> vs <|tool_call_begin|>). This should be addressed in a follow-up.

2. $defs conflict checking was silently dropped. The old build_required_array_schema in chat_utils returned an error if two tools had the same $def name with different schemas. The new version in factory.rs uses or_insert_with (first-wins, no error). This matches Python's behavior, but is a behavioral change worth documenting.

3. Error → Option. generate_tool_constraint returns Option instead of Result. If serde_json::to_string fails, you get None (no constraint) instead of a propagated error. Shouldn't happen in practice, but it's a silent failure mode.

4. Deprecated wrapper is expensive. chat_utils::generate_tool_constraints now creates a brand new ParserFactory (registering ~14 parsers) on every call just to pass None as the configured parser. Since it always hits the json_schema fallback, it could inline that logic directly. Should be addressed in a follow-up.

5. No auto mode constraint. The sglang design (sgl-project/sglang#13032) envisions structural tags for auto mode too (with at_least_one = false), so the model gets format guidance when it chooses to call a tool. This PR returns None for auto, which is correct but leaves that benefit on the table for a future PR.

Comment thread crates/tool_parser/src/factory.rs
Comment thread model_gateway/src/routers/grpc/utils/chat_utils.rs Outdated
Comment thread crates/tool_parser/src/factory.rs Outdated
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the tool parser registry to support model-specific structural tags alongside generic JSON schema constraints. It introduces a ParserEntry metadata structure, adds native structural tag builders for Mistral and Kimi K2 models, and migrates the gateway's preparation stages to use the new centralized generate_tool_constraint logic. Feedback focuses on enhancing the robustness of the new constraint generation by handling potential schema conflicts during $defs consolidation, improving error handling for serialization, and suggesting automatic parser resolution via model IDs.

Comment thread crates/tool_parser/src/factory.rs
Comment thread crates/tool_parser/src/factory.rs Outdated
Comment thread crates/tool_parser/src/parsers/mistral.rs Outdated
Comment thread model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09ca1ef43a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/tool_parser/src/factory.rs
Comment thread crates/tool_parser/src/factory.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/utils/chat_utils.rs (1)

211-229: ⚠️ Potential issue | 🟠 Major

Compatibility wrapper can no longer produce structural tags.

This always calls generate_tool_constraint(None, ...), so the registry’s structural-tag branch is unreachable. Any remaining caller on this wrapper—including the golang bindings called out in the FIXME—will still get the old token-0 json_schema constraint for Mistral/Kimi models, which defeats the compatibility goal of this PR.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/utils/chat_utils.rs` around lines 211 - 229,
The compatibility wrapper generate_tool_constraints currently passes None as the
first argument to ParserRegistry::generate_tool_constraint, which prevents the
registry from taking the structural-tag branch; change the wrapper to use the
provided model string instead: rename the unused parameter _model to model (or
use _model without underscore) and forward Some(model) as the first argument to
factory.registry().generate_tool_constraint(Some(model), tools, choice) so
callers (including golang bindings) receive the correct structural-tag-aware
constraint; preserve the function signature/result handling otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tool_parser/src/factory.rs`:
- Around line 540-549: The current consolidation into all_defs silently drops
later $defs with the same key; instead, when iterating the tools collection and
extracting params.get("$defs") (inside the loop that builds all_defs),
rewrite/namespace each definition key uniquely (e.g., prefix with a stable tool
identifier such as tool.name or tool.function.name or tool.id, e.g.
format!("{}::{}", tool_id, def_name)) and insert those namespaced keys into
all_defs; also update any $ref strings inside def_schema and the tool's
parameter schemas to point to the new namespaced keys so references remain
correct (i.e., mutate def_schema and the parameter schema $ref values as you
namespace). Ensure this transformation happens where all_defs and the
tool.function.parameters are being processed so no raw-key collisions occur.
- Around line 167-199: The code currently uses the full tools list when building
constraints, so ToolChoice filters are ignored; update the logic in the function
that handles tool_choice (referencing the variable tool_choice and the enum
variants ToolChoice::Function and ToolChoice::AllowedTools { mode, .. }) to
first produce a filtered_tools list according to the requested choice (for
ToolChoice::Function use only the specific function/tool requested, for
AllowedTools restrict to the allowed tool IDs when mode == "required", etc.),
return None if filtered_tools is empty, and then use filtered_tools (not tools)
when calling entry.build_structural_tag, serializing parameters for
ToolChoice::Function (use filtered_tools[0].function.parameters), and when
calling build_required_array_schema(filtered_tools).

In `@crates/tool_parser/src/parsers/kimik2.rs`:
- Around line 61-92: The code binds Kimi tool-call indices to tool-definition
order by using enumerate() and formatting "functions.{name}:{index}"; change
build_structural_tag to iterate tools without enumerate (use tools.iter()) and
emit tags that do not use the definition-order index — e.g. format the keys as
"functions.{name}:0" (or another neutral placeholder) instead of "{index}" so
the structural tags aren’t tied to tool-definition order; update the two places
that currently format "functions.{name}:{index}" inside build_structural_tag
accordingly.

In `@crates/tool_parser/src/parsers/mistral.rs`:
- Around line 50-76: build_structural_tag currently emits a single
triggered_tags entry that forces the whole payload to be one tool call; change
build_structural_tag so that when multiple tools could be emitted (tools.len() >
1) it returns a json_schema format instead of triggered_tags: build a JSON
Schema for an array whose items are a oneOf list of objects, each object having
properties "name" (const = tool.function.name) and "arguments" (schema =
tool.function.parameters) and required ["name","arguments"], and return
serde_json::json!({ "format": { "type":"json_schema", "json_schema": <that array
schema> }}); keep the existing triggered_tags behavior only when there is
exactly one tool (or when you want to constrain to a single tool call).

In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 181-194: The current call to
tool_parser_factory.registry().generate_tool_constraint(...) always forwards
ctx.components.configured_tool_parser which prevents the normal model-based
auto-detection when no override is set; instead, resolve the parser the same way
the runtime path does and pass that resolved parser into
generate_tool_constraint: if ctx.components.configured_tool_parser.is_some() use
it, otherwise call the registry's model-based resolver (the same helper used
elsewhere) using the request's model/metadata to pick the native parser, then
call generate_tool_constraint with that resolved parser (keeping
request.tool_choice and body_ref.tools as before) so model-based parsers
(Mistral/KimiK2) are used when no explicit override exists.

In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 202-214: The tool constraint path only passes
ctx.components.configured_tool_parser.as_deref() into
tool_parser_factory.registry().generate_tool_constraint, which prevents the
registry's runtime auto-detection (model-based fallback) from being used; update
the code around tool_call_constraint so you first resolve the effective parser
using the same runtime lookup the rest of the code uses (i.e., if
ctx.components.configured_tool_parser is None, call the registry's model-based
resolver used elsewhere), then pass that resolved parser into
tool_parser_factory.registry().generate_tool_constraint so auto-detected
Mistral/Kimi parsers produce the correct native framing constraint.

---

Outside diff comments:
In `@model_gateway/src/routers/grpc/utils/chat_utils.rs`:
- Around line 211-229: The compatibility wrapper generate_tool_constraints
currently passes None as the first argument to
ParserRegistry::generate_tool_constraint, which prevents the registry from
taking the structural-tag branch; change the wrapper to use the provided model
string instead: rename the unused parameter _model to model (or use _model
without underscore) and forward Some(model) as the first argument to
factory.registry().generate_tool_constraint(Some(model), tools, choice) so
callers (including golang bindings) receive the correct structural-tag-aware
constraint; preserve the function signature/result handling otherwise.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b05fbf99-31b5-4327-91f1-c4ef0bf76111

📥 Commits

Reviewing files that changed from the base of the PR and between 6c39621 and 09ca1ef.

📒 Files selected for processing (9)
  • crates/tool_parser/src/factory.rs
  • crates/tool_parser/src/parsers/kimik2.rs
  • crates/tool_parser/src/parsers/mistral.rs
  • model_gateway/src/routers/grpc/context.rs
  • model_gateway/src/routers/grpc/pd_router.rs
  • model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
  • model_gateway/src/routers/grpc/router.rs
  • model_gateway/src/routers/grpc/utils/chat_utils.rs

Comment thread crates/tool_parser/src/factory.rs Outdated
Comment thread crates/tool_parser/src/factory.rs
Comment thread crates/tool_parser/src/parsers/kimik2.rs
Comment thread crates/tool_parser/src/parsers/mistral.rs
Comment thread model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
@CatherineSue CatherineSue force-pushed the feat/parser-constraint-generation branch from 09ca1ef to 739917b Compare April 13, 2026 23:22
@CatherineSue CatherineSue changed the title WIP feat(tool_parser): add structural tag constraint generation to parser registry feat(tool_parser): add structural tag constraint generation to parser registry Apr 13, 2026
@CatherineSue
Copy link
Copy Markdown
Member Author

Re: coderabbit outside-diff comment on chat_utils.rs:211-229 (compatibility wrapper)

Fixed in commit c8a1392 — the deprecated wrapper has been deleted. Golang bindings now call ParserRegistry::generate_tool_constraint() directly via the global PARSER_FACTORY singleton.

@CatherineSue
Copy link
Copy Markdown
Member Author

CatherineSue commented Apr 13, 2026

All 5 observations have been addressed:

  1. Mistral parallel tool calls — Noted as follow-up (needs tags_with_separator format like sglang).
  2. $defs conflict checking — Restored with explicit error return (commit 6a3644e).
  3. Error -> Option — Changed to Result<Option<ToolConstraint>, String> (commit 6a3644e).
  4. Deprecated wrapper expense — Deleted entirely. Go bindings use global PARSER_FACTORY singleton (commit c8a1392).
  5. No auto mode constraint — Noted as follow-up item in PR description.

Comment thread crates/grpc_client/src/sglang_scheduler.rs Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 739917beca

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs Outdated
Comment thread crates/tool_parser/src/parsers/mistral.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/grpc_client/src/sglang_scheduler.rs (1)

437-445: ⚠️ Potential issue | 🟠 Major

Don't let Chat tool calls strip parser-required markers.

Line 445 now blindly forwards request.skip_special_tokens. For structural-tag parsers like KimiK2, that can remove the native tool tokens the router-side parser keys on, and it also regresses auto when tools are present but unconstrained. The Messages path at Lines 640-644 already applies the correct tool-aware policy; Chat should mirror that override here.

Suggested fix
-        let skip_special_tokens = request.skip_special_tokens;
+        let skip_special_tokens = match &tool_call_constraint {
+            Some((typ, _)) if typ == "json_schema" => true,
+            _ if request.tools.as_ref().is_some_and(|t| !t.is_empty()) => false,
+            _ => request.skip_special_tokens,
+        };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/grpc_client/src/sglang_scheduler.rs` around lines 437 - 445, In
build_grpc_sampling_params_from_chat, don't forward request.skip_special_tokens
blindly; instead compute skip_special_tokens using the same tool-aware override
used in the Messages path (mirror the logic from the function that builds
sampling params from messages) so parser-required markers aren't stripped: if
the chat request includes tool calls and tool_call_constraint is None, force
skip_special_tokens = false (preserve tool tokens), otherwise use
request.skip_special_tokens; update build_grpc_sampling_params_from_chat (and
reference extract_stop_strings as needed) to apply that override.
♻️ Duplicate comments (6)
crates/tool_parser/src/parsers/kimik2.rs (1)

61-84: ⚠️ Potential issue | 🔴 Critical

Don’t bind Kimi call IDs to tool-definition order.

Line 61 uses enumerate(), and Line 71 / Line 83 bake that index into functions.{name}:{index}. The parser, however, ignores the parsed index at Line 256, initializes the first emitted call as index 0 at Line 272, and then advances by emission order at Line 351. A first call to any non-first tool is therefore rejected by the constraint. Generate an index that is independent of definition order, or fall back until dynamic indices can be expressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/kimik2.rs` around lines 61 - 84, The tags are
baking in a definition-order index via tools.iter().enumerate() and the
"functions.{name}:{index}" format in the tags.push calls, which conflicts with
the runtime emission indexing the parser enforces; change the tag generation in
the block that uses tools.iter().enumerate() (the two tags.push calls with
format!("functions.{name}:{index}")) to avoid coupling to definition
order—either omit the numeric index from the identifier (e.g., emit
"functions.{name}" only) or substitute a placeholder token (e.g., "*"/"dynamic")
or a stable per-tool ID (derived from tool.function.name) so the emitted call
index can be applied at parse time instead of being hard-coded from enumerate().
crates/tool_parser/src/parsers/mistral.rs (1)

48-76: ⚠️ Potential issue | 🟠 Major

This structural tag still only permits a single Mistral call.

Line 60 opens [TOOL_CALLS] [{..., and Line 65 hard-closes with }], so there is no legal path to emit a second array item before the payload ends. Required / allowed_tools(required) now prefer this structural-tag path, so valid multi-call Mistral outputs regress. Either add continuation tags for subsequent items or fall back to json_schema for modes that may emit multiple calls.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/tool_parser/src/parsers/mistral.rs` around lines 48 - 76, The
structural tag created by build_structural_tag currently always opens the array
and then hard-closes it, preventing multiple items; update build_structural_tag
(and the tags vector) to model a repeatable array item pattern: emit an opening
tag for "[TOOL_CALLS] [" (or begin the first item with "{\"name\": ...
\"arguments\": "), then emit a tag representing a single array item with the
json_schema content, then emit a continuation tag that matches the comma
separator between items (e.g., begin="," end="" or a tag whose begin is "," and
which re-uses the same json_schema structure), and finally emit a closing tag
for the array ("]" or "}]"); alternatively, when at_least_one is false or
multi-call is allowed, fall back to returning a single json_schema entry rather
than the triggered_tags structure so multi-call outputs aren’t
constrained—change build_structural_tag to construct these
opening/item/continuation/closing tag entries rather than producing a single
begin + end pair.
model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs (1)

202-212: ⚠️ Potential issue | 🟠 Major

Resolve the effective parser before building the constraint.

This still only forwards configured_tool_parser, so default model-mapped Mistral/Kimi requests in the Messages path never reach the structural-tag branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs` around
lines 202 - 212, The code builds tool_call_constraint using
ctx.components.configured_tool_parser directly, which prevents resolving the
effective parser (including model-mapped defaults) before constraint generation;
update the logic in the tool_call_constraint computation (the block creating
tool_call_constraint and calling
ctx.components.tool_parser_factory.registry().generate_tool_constraint) to first
resolve the effective parser (e.g., call the same resolution used elsewhere to
derive the actual parser from configured_tool_parser and the request/model
mapping) and pass that resolved parser into generate_tool_constraint so
Messages-path requests reach the structural-tag branch.
model_gateway/src/routers/grpc/regular/processor.rs (1)

144-149: ⚠️ Potential issue | 🟠 Major

Resolve the effective parser before checking structural-tag support.

has_structural_tag_for_parser(self.configured_tool_parser.as_deref()) only sees the explicit override. On the default model-mapped path, both branches still treat structural-tag-capable parsers as JSON-schema mode and route tool responses through parse_json_schema_response.

Also applies to: 627-635

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/processor.rs` around lines 144 - 149,
Resolve the effective parser before checking structural-tag support: determine
the actual parser to use (either self.configured_tool_parser.as_deref() if
present or the model-mapped default used elsewhere) and pass that resolved
parser into tool_parser_factory.registry().has_structural_tag_for_parser(...)
instead of passing the raw configured_tool_parser; then base used_json_schema
and the call to parse_json_schema_response on that resolved parser. Update both
the block around has_structural_tag/used_json_schema and the duplicate section
(lines ~627-635) to use the same resolved-parser variable.
model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs (1)

186-196: ⚠️ Potential issue | 🟠 Major

Use the resolved parser here, not only configured_tool_parser.

Passing only ctx.components.configured_tool_parser.as_deref() keeps the default model-mapped Mistral/Kimi path on the generic JSON-schema fallback, so the native structural-tag fix still requires a manual override.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs` around
lines 186 - 196, The code passes
ctx.components.configured_tool_parser.as_deref() to
registry().generate_tool_constraint, which forces the generic JSON-schema
fallback; change the call to pass the resolved parser instance (the value
produced by the tool parser resolution logic you have elsewhere) instead of
configured_tool_parser.as_deref(), so registry().generate_tool_constraint
receives the actual resolved parser (use the variable or factory call that holds
the resolved parser) when building tool_call_constraint.
bindings/golang/src/client.rs (1)

152-152: ⚠️ Potential issue | 🟠 Major

Filter the request and resolve the parser before generating the tool constraint.

This path now feeds the raw chat_request.tools list into generate_tool_constraint(None, ...). That means forced-function requests can still constrain tools[0] instead of the requested tool, and model-mapped Mistral/Kimi calls never reach the structural-tag branch. Based on learnings: ParserRegistry::generate_tool_constraint does not internally filter tools by tool_choice; caller filtering is the caller's responsibility.

Also applies to: 180-186

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bindings/golang/src/client.rs` at line 152, The code currently passes the raw
chat_request.tools list into ParserRegistry::generate_tool_constraint, which
means tool filtering by tool_choice and resolving the parser is not happening
before constraint generation; update the caller (where chat_request:
ChatCompletionRequest is parsed and before calls to generate_tool_constraint and
ParserRegistry::resolve_parser) to first filter chat_request.tools to the
selected tool based on the tool_choice (or tool index/name), resolve the parser
via ParserRegistry::resolve_parser for that selected tool, and then pass only
that filtered tool (or its metadata) into
ParserRegistry::generate_tool_constraint so forced-function requests and
model-mapped calls hit the correct structural-tag branch; apply the same fix to
the other occurrence around the code handling lines ~180-186 where
generate_tool_constraint is invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bindings/golang/src/client.rs`:
- Around line 201-208: The pattern match on chat_request.tool_choice currently
moves the Option<ToolChoice>, causing later borrows of &chat_request to fail;
change the match to use a reference: call .as_ref() on chat_request.tool_choice
when using matches! so you match against
Some(ToolChoice::Value(ToolChoiceValue::None)) by reference (e.g.,
matches!(chat_request.tool_choice.as_ref(), ...)); update the condition that
uses tool_constraint, chat_request.tools, and tool_choice accordingly so
tool_choice is borrowed, not moved.

In `@bindings/golang/src/policy.rs`:
- Around line 601-614: The matches! call on chat_request.tool_choice is moving
the non-Copy field and causing later borrow errors; change the pattern match to
borrow the option (e.g., use chat_request.tool_choice.as_ref()) so you match
against a reference (openai_protocol::common::ToolChoice::Value(...)) instead of
moving the value, keeping tool_constraint, chat_request, and subsequent uses
valid.

In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 213-217: The match guard is trying to move the non-Copy enum field
request.tool_choice out of a shared borrow; change the guard to match a
reference by using request.tool_choice.as_ref() (so the condition becomes
something like matches!(request.tool_choice.as_ref(),
Some(ToolChoice::Value(ToolChoiceValue::None)))) while keeping the
request.tools.is_some() check unchanged; update the matches! usage in the if
guard that references ToolChoice/ToolChoiceValue to operate on a reference
rather than trying to move the value.

In `@model_gateway/src/routers/grpc/regular/streaming.rs`:
- Around line 253-266: The fast-path that treats a choice as "arguments-only"
must be gated by used_json_schema: update the logic that derives
is_specific_function so it requires both the original tool_choice checks (e.g.,
ToolChoice::Function, ToolChoice::Value::Required, AllowedTools mode ==
"required") AND used_json_schema == true (which itself is computed from
has_structural_tag and tool_choice); similarly, any branches that bypass the
parser and emit native wrapper tokens as arguments should only run when
used_json_schema is true — ensure places referencing is_specific_function and
the arguments-only fast path consult used_json_schema before skipping the parser
to prevent streaming raw TOOL_CALLS/<|tool_call_begin|> tokens for
structural-tag function calls.

---

Outside diff comments:
In `@crates/grpc_client/src/sglang_scheduler.rs`:
- Around line 437-445: In build_grpc_sampling_params_from_chat, don't forward
request.skip_special_tokens blindly; instead compute skip_special_tokens using
the same tool-aware override used in the Messages path (mirror the logic from
the function that builds sampling params from messages) so parser-required
markers aren't stripped: if the chat request includes tool calls and
tool_call_constraint is None, force skip_special_tokens = false (preserve tool
tokens), otherwise use request.skip_special_tokens; update
build_grpc_sampling_params_from_chat (and reference extract_stop_strings as
needed) to apply that override.

---

Duplicate comments:
In `@bindings/golang/src/client.rs`:
- Line 152: The code currently passes the raw chat_request.tools list into
ParserRegistry::generate_tool_constraint, which means tool filtering by
tool_choice and resolving the parser is not happening before constraint
generation; update the caller (where chat_request: ChatCompletionRequest is
parsed and before calls to generate_tool_constraint and
ParserRegistry::resolve_parser) to first filter chat_request.tools to the
selected tool based on the tool_choice (or tool index/name), resolve the parser
via ParserRegistry::resolve_parser for that selected tool, and then pass only
that filtered tool (or its metadata) into
ParserRegistry::generate_tool_constraint so forced-function requests and
model-mapped calls hit the correct structural-tag branch; apply the same fix to
the other occurrence around the code handling lines ~180-186 where
generate_tool_constraint is invoked.

In `@crates/tool_parser/src/parsers/kimik2.rs`:
- Around line 61-84: The tags are baking in a definition-order index via
tools.iter().enumerate() and the "functions.{name}:{index}" format in the
tags.push calls, which conflicts with the runtime emission indexing the parser
enforces; change the tag generation in the block that uses
tools.iter().enumerate() (the two tags.push calls with
format!("functions.{name}:{index}")) to avoid coupling to definition
order—either omit the numeric index from the identifier (e.g., emit
"functions.{name}" only) or substitute a placeholder token (e.g., "*"/"dynamic")
or a stable per-tool ID (derived from tool.function.name) so the emitted call
index can be applied at parse time instead of being hard-coded from enumerate().

In `@crates/tool_parser/src/parsers/mistral.rs`:
- Around line 48-76: The structural tag created by build_structural_tag
currently always opens the array and then hard-closes it, preventing multiple
items; update build_structural_tag (and the tags vector) to model a repeatable
array item pattern: emit an opening tag for "[TOOL_CALLS] [" (or begin the first
item with "{\"name\": ... \"arguments\": "), then emit a tag representing a
single array item with the json_schema content, then emit a continuation tag
that matches the comma separator between items (e.g., begin="," end="" or a tag
whose begin is "," and which re-uses the same json_schema structure), and
finally emit a closing tag for the array ("]" or "}]"); alternatively, when
at_least_one is false or multi-call is allowed, fall back to returning a single
json_schema entry rather than the triggered_tags structure so multi-call outputs
aren’t constrained—change build_structural_tag to construct these
opening/item/continuation/closing tag entries rather than producing a single
begin + end pair.

In `@model_gateway/src/routers/grpc/regular/processor.rs`:
- Around line 144-149: Resolve the effective parser before checking
structural-tag support: determine the actual parser to use (either
self.configured_tool_parser.as_deref() if present or the model-mapped default
used elsewhere) and pass that resolved parser into
tool_parser_factory.registry().has_structural_tag_for_parser(...) instead of
passing the raw configured_tool_parser; then base used_json_schema and the call
to parse_json_schema_response on that resolved parser. Update both the block
around has_structural_tag/used_json_schema and the duplicate section (lines
~627-635) to use the same resolved-parser variable.

In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 186-196: The code passes
ctx.components.configured_tool_parser.as_deref() to
registry().generate_tool_constraint, which forces the generic JSON-schema
fallback; change the call to pass the resolved parser instance (the value
produced by the tool parser resolution logic you have elsewhere) instead of
configured_tool_parser.as_deref(), so registry().generate_tool_constraint
receives the actual resolved parser (use the variable or factory call that holds
the resolved parser) when building tool_call_constraint.

In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 202-212: The code builds tool_call_constraint using
ctx.components.configured_tool_parser directly, which prevents resolving the
effective parser (including model-mapped defaults) before constraint generation;
update the logic in the tool_call_constraint computation (the block creating
tool_call_constraint and calling
ctx.components.tool_parser_factory.registry().generate_tool_constraint) to first
resolve the effective parser (e.g., call the same resolution used elsewhere to
derive the actual parser from configured_tool_parser and the request/model
mapping) and pass that resolved parser into generate_tool_constraint so
Messages-path requests reach the structural-tag branch.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 95e93f6f-a72b-48c0-929f-038a48a7bfd4

📥 Commits

Reviewing files that changed from the base of the PR and between 09ca1ef and 739917b.

📒 Files selected for processing (18)
  • bindings/golang/src/client.rs
  • bindings/golang/src/policy.rs
  • bindings/golang/src/preprocessor.rs
  • crates/grpc_client/src/sglang_scheduler.rs
  • crates/grpc_client/src/vllm_engine.rs
  • crates/tool_parser/src/factory.rs
  • crates/tool_parser/src/lib.rs
  • crates/tool_parser/src/parsers/kimik2.rs
  • crates/tool_parser/src/parsers/mistral.rs
  • model_gateway/src/routers/grpc/context.rs
  • model_gateway/src/routers/grpc/pd_router.rs
  • model_gateway/src/routers/grpc/regular/processor.rs
  • model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
  • model_gateway/src/routers/grpc/regular/streaming.rs
  • model_gateway/src/routers/grpc/router.rs
  • model_gateway/src/routers/grpc/utils/chat_utils.rs
  • model_gateway/src/routers/grpc/utils/mod.rs

Comment thread bindings/golang/src/client.rs
Comment thread bindings/golang/src/policy.rs
Comment thread model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
Comment thread model_gateway/src/routers/grpc/regular/streaming.rs
@CatherineSue CatherineSue changed the title feat(tool_parser): add structural tag constraint generation to parser registry feat(tool_parser): add structural tag constraint generation, fix skip_special_tokens derivation Apr 13, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca76040a49

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/grpc_client/src/sglang_scheduler.rs Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
model_gateway/src/routers/grpc/regular/streaming.rs (1)

1448-1456: ⚠️ Potential issue | 🟡 Minor

Messages API skip_special_tokens is hardcoded true in streaming but derived dynamically in preparation stage.

In messages/preparation.rs (lines 227–230), skip_special_tokens is derived based on constraint type:

  • false when structural tags are present (tool markers)
  • true for JSON schema or when no tools are present

However, in process_messages_streaming_response (line 1454), it's hardcoded to true. This differs from Chat API, where streaming respects the request field.

Impact is likely benign: The tokenizer's skip_special_tokens parameter only filters BOS/EOS tokens; tool markers like [TOOL_CALLS] and <|tool_call_begin|> are model-native framing and not subject to this filtering. Tool parsing should work correctly despite the inconsistency. However, the mismatch with Chat API behavior and the prep stage logic warrants alignment—either propagate the preparation-stage value or update preparation to match streaming's hardcoded behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/streaming.rs` around lines 1448 -
1456, process_messages_streaming_response currently hardcodes
skip_special_tokens=true in the stop_params tuple (the tuple bound to
stop_params in streaming.rs), which conflicts with the dynamic
skip_special_tokens value computed during messages preparation; change the
streaming code to use the same derived boolean instead of true—i.e., thread the
prepared skip_special_tokens value (the one computed in messages preparation or
the request-level field used there) into process_messages_streaming_response and
replace the hardcoded true with that variable so streaming respects the
preparation-stage decision for skip_special_tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/streaming.rs`:
- Around line 1448-1456: process_messages_streaming_response currently hardcodes
skip_special_tokens=true in the stop_params tuple (the tuple bound to
stop_params in streaming.rs), which conflicts with the dynamic
skip_special_tokens value computed during messages preparation; change the
streaming code to use the same derived boolean instead of true—i.e., thread the
prepared skip_special_tokens value (the one computed in messages preparation or
the request-level field used there) into process_messages_streaming_response and
replace the hardcoded true with that variable so streaming respects the
preparation-stage decision for skip_special_tokens.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c2b2bcc8-a3cb-46f7-9be8-eb59e60a7028

📥 Commits

Reviewing files that changed from the base of the PR and between 739917b and ca76040.

📒 Files selected for processing (2)
  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs
  • model_gateway/src/routers/grpc/regular/streaming.rs

@github-actions github-actions bot added the tests Test changes label Apr 14, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs (1)

201-235: ⚠️ Potential issue | 🟠 Major

Don’t pin the Messages stop decoder to true when tool constraints are active.

tool_call_constraint is now available above, but create_stop_decoder(..., true, ...) still forces special-token stripping. That drops structural-tag/native-framing markers before the router parser can consume them, breaking Messages tool calls for parsers like Mistral/Kimi. Derive this flag from the chosen constraint/tool-choice path instead of the unconditional default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs` around
lines 201 - 235, The stop-decoder is always being created with the Messages API
default flag hardcoded to true, which strips special tokens and breaks tool-call
parsing when tool constraints are active; update the utils::create_stop_decoder
call to derive that boolean from the presence/choice of tool constraints instead
of the unconditional true—use tool_call_constraint (and/or chat_tool_choice) to
set the flag (e.g., disable special-token stripping when a tool constraint or
tool_choice exists) so structural framing markers are preserved for the
router/parser.
model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs (1)

180-222: ⚠️ Potential issue | 🟠 Major

Derive the decoder’s skip_special_tokens from the selected ToolConstraint.

This new path can now return StructuralTag, but the stage still feeds request.skip_special_tokens into create_stop_decoder() below. With the default true, native-framing tokens can be stripped before the router-side parser sees them, so required/function tool calls can regress for structural-tag parsers. Please compute the effective flag alongside tool_call_constraint and pass that value into the decoder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs` around
lines 180 - 222, The stage currently always passes request.skip_special_tokens
into utils::create_stop_decoder, but when a ToolConstraint
(tool_call_constraint) indicates a structural-tag/native parser it must override
that default; compute an effective_skip_special_tokens boolean alongside
building tool_call_constraint by examining the selected ToolConstraint (e.g.,
inspect tool_call_constraint.as_ref() / the ToolConstraint variant or its method
that indicates native/structural framing) and set effective_skip_special_tokens
= constraint-driven value else fallback to request.skip_special_tokens, then
pass effective_skip_special_tokens into create_stop_decoder instead of
request.skip_special_tokens and persist/store this effective value with
PreparationOutput if needed for downstream logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/grpc_client/src/sglang_scheduler.rs`:
- Around line 445-447: The gRPC path is inconsistent: some builders set let
skip_special_tokens = true while build_grpc_sampling_params_from_responses still
hardcodes skip_special_tokens: false, causing EOS/EOT markers to leak; update
build_grpc_sampling_params_from_responses (and the other response-path builder
locations referenced around the second occurrence) to use skip_special_tokens =
true (or read the same variable/constant used by the other builders) so all gRPC
SamplingParams consistently set skip_special_tokens to true and follow the same
backend contract.

---

Outside diff comments:
In `@model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs`:
- Around line 180-222: The stage currently always passes
request.skip_special_tokens into utils::create_stop_decoder, but when a
ToolConstraint (tool_call_constraint) indicates a structural-tag/native parser
it must override that default; compute an effective_skip_special_tokens boolean
alongside building tool_call_constraint by examining the selected ToolConstraint
(e.g., inspect tool_call_constraint.as_ref() / the ToolConstraint variant or its
method that indicates native/structural framing) and set
effective_skip_special_tokens = constraint-driven value else fallback to
request.skip_special_tokens, then pass effective_skip_special_tokens into
create_stop_decoder instead of request.skip_special_tokens and persist/store
this effective value with PreparationOutput if needed for downstream logic.

In `@model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs`:
- Around line 201-235: The stop-decoder is always being created with the
Messages API default flag hardcoded to true, which strips special tokens and
breaks tool-call parsing when tool constraints are active; update the
utils::create_stop_decoder call to derive that boolean from the presence/choice
of tool constraints instead of the unconditional true—use tool_call_constraint
(and/or chat_tool_choice) to set the flag (e.g., disable special-token stripping
when a tool constraint or tool_choice exists) so structural framing markers are
preserved for the router/parser.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a68c5e6-e80a-4060-acbf-260ac6cc11b1

📥 Commits

Reviewing files that changed from the base of the PR and between ca76040 and 3eec4b1.

📒 Files selected for processing (5)
  • crates/grpc_client/src/sglang_scheduler.rs
  • crates/grpc_client/src/vllm_engine.rs
  • e2e_test/chat_completions/test_function_calling.py
  • model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
  • model_gateway/src/routers/grpc/regular/stages/messages/preparation.rs

Comment thread crates/grpc_client/src/sglang_scheduler.rs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f82054ab88

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread model_gateway/src/routers/grpc/regular/streaming.rs
Comment thread model_gateway/src/routers/grpc/regular/streaming.rs
Comment thread model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
@mergify mergify bot added the needs-rebase PR has merge conflicts that need to be resolved label Apr 15, 2026
@CatherineSue CatherineSue force-pushed the feat/parser-constraint-generation branch from 22ac68a to ec1cda9 Compare April 15, 2026 01:54
@mergify mergify bot removed the needs-rebase PR has merge conflicts that need to be resolved label Apr 15, 2026
)
{
chat_request.skip_special_tokens = false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: This mutation is dead code. build_grpc_sampling_params_from_chat in sglang_scheduler.rs now hardcodes skip_special_tokens = true (line 447), so the value set here on chat_request.skip_special_tokens is never read by the downstream gRPC request builder.

Same issue in policy.rs:612-625.

Either remove the dead derivation from both files, or — if the intent is forward-looking for when golang bindings support a configured_parser — add a // TODO: explaining why this exists and that it is currently inert.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derivation is not dead code — the derived skip_special_tokens is passed to sgl_grpc_response_converter_create (line 282) which uses it for SMG-side token decoding via StopDecoder. The gRPC request builder hardcodes it to true by design (backends return raw token IDs, skip_special_tokens only matters for SMG-side decoding). The derivation is consumed by the response converter, not the request builder.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threading to StopDecoder in go bindings is a separate issue and we should address it in another PR.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec1cda9f6a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

serde_json::json!({
"format": {
"type": "triggered_tags",
"triggers": ["<|tool_calls_section_begin|>", "<|tool_call_begin|>"],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Require section wrapper in Kimi structural-tag trigger

This structural-tag definition allows generation to trigger directly on <|tool_call_begin|> without first emitting <|tool_calls_section_begin|>. In non-streaming mode, KimiK2Parser::parse_complete only recognizes tool calls when the section-begin marker is present, so responses that follow this allowed path are treated as plain text and tool calls are lost. Please either constrain the first trigger/tag to require the section wrapper or make complete parsing accept the bare tool-call-begin form as well.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid observation — the dual trigger is intentional for parallel tool call support. The bare <|tool_call_begin|> trigger handles subsequent tool calls after the first one opened the section. However, has_tool_markers only checks for <|tool_calls_section_begin|>, which means a bare <|tool_call_begin|> without the section wrapper would be treated as plain text by parse_complete. This is a known gap — will address in a follow-up.

Comment thread crates/tool_parser/src/factory.rs
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Apr 15, 2026

Hi @CatherineSue, this PR has merge conflicts that must be resolved before it can be merged. Please rebase your branch:

git fetch origin main
git rebase origin/main
# resolve any conflicts, then:
git push --force-with-lease

@mergify mergify bot added the needs-rebase PR has merge conflicts that need to be resolved label Apr 15, 2026
…generation and skip_special_tokens [Tasks 1-7,11]

- Add ToolConstraint enum (JsonSchema/StructuralTag) to replace raw string tuples
- Remove REQUIRES_SPECIAL_TOKENS from ParserEntry, MistralParser, KimiK2Parser
  (skip_special_tokens is derived from constraint type, not per-parser metadata)
- Rename register_parser_with_meta to register_parser_with_structural_tag
- Change generate_tool_constraint to return Result<Option<ToolConstraint>, String>
  to surface serialization errors instead of silently falling back
- Restore $defs conflict detection in build_required_array_schema
- Fix Mistral tool name JSON escaping via serde_json::to_string
- Add has_structural_tag accessor for response parsing
- Derive skip_special_tokens from constraint type in chat/messages preparation:
  json_schema -> keep user default, structural_tag/no constraint with tools -> false
- Fix StopDecoder receiving inconsistent skip_special_tokens (was using user
  default while grpc_client independently overrode to false)

Signed-off-by: Chang Su <chang.s.su@oracle.com>

# Conflicts:
#	model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
…odel resolution [Task 12]

Extract shared model→parser resolution logic (exact match → prefix match)
into resolve_model_to_parser and resolve_parser_name methods. Refactor
has_parser_for_model, create_for_model, get_pooled_for_model, and
ParserFactory::get_parser to use the shared helper, eliminating ~60 lines
of duplicated code across 4 methods.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…se parsing [Tasks 8-9]

Replace tool_choice-based used_json_schema inference with registry-based
has_structural_tag check. When the resolved parser supports structural tags,
response parsing uses the model-specific parser instead of forcing the generic
JSON parser. Fixes divergence where constraint generation used structural tags
but response parsing assumed json_schema.

Affected paths: chat processor, messages processor, chat streaming,
messages streaming (4 sites total).

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…onstraint-aware [Task 10]

Chat API: skip_special_tokens is now a passthrough from the request.
The preparation stage sets it correctly based on constraint type.

Messages API: since CreateMessageRequest has no skip_special_tokens field,
derive it from constraint type — json_schema keeps true (pure JSON output),
structural_tag or tools without constraint forces false (parser needs
special token delimiters).

Responses/completion API paths unchanged.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…l-call-parser [Tasks 8-9 followup]

Add has_structural_tag_for_parser(configured: Option<&str>) to ParserRegistry.
Structural tags only activate when --tool-call-parser is explicitly set to a
parser that supports them — no model-name auto-detection, which is unreliable
with model aliases.

Simplifies 4 call sites in processor.rs and streaming.rs from 7-line
resolve_parser_name + has_structural_tag blocks to single-line calls.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…ed wrapper [Task 13]

Migrate all 3 golang binding files (client.rs, preprocessor.rs, policy.rs)
to use PARSER_FACTORY.registry().generate_tool_constraint() directly instead
of the deprecated generate_tool_constraints() wrapper in chat_utils.rs.

Go bindings now derive skip_special_tokens from constraint type (option B):
json_schema keeps the default, other constraint types or no constraint with
tools present forces skip_special_tokens=false.

Delete the deprecated generate_tool_constraints() function and its re-export.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…nup]

The method was added during dedup but is not called anywhere.
Structural tag checks use configured_tool_parser directly (explicit opt-in),
not model-based auto-detection.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…s tool_choice none

Two fixes for structural tag correctness:

1. streaming.rs: is_specific_function (arguments-only fast path) now requires
   used_json_schema=true. When structural tags are active, the output includes
   framing tokens that the parser must handle — bypassing the parser would
   stream raw [TOOL_CALLS] or <|tool_call_begin|> tokens as arguments.

2. messages/preparation.rs: skip_special_tokens derivation now respects
   tool_choice=none. Previously, having tools present with tool_choice=none
   would force skip_special_tokens=false, causing special tokens to leak
   into plain text responses.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…onstraint-aware [Task 10]

Mutate skip_special_tokens on the request in the chat preparation stage
(via Cow::to_mut) so both StopDecoder and the grpc_client backend see the
same derived value. The grpc_client remains a simple passthrough.

Previously, skip_special_tokens was derived as a local variable for the
StopDecoder but request.skip_special_tokens was never modified, so the
backend still received the user default (true), stripping special tokens
needed by tool parsers like Mistral and KimiK2.

Signed-off-by: Chang Su <chang.s.su@oracle.com>

# Conflicts:
#	model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs
…ag support

- Update assertion from "not supported" to "conflicting" to match the
  restored $defs conflict error message
- Skip test_conflicting_defs_required_tool_choice for Mistral: structural
  tags handle each tool's schema independently, so $defs conflicts are
  irrelevant when not consolidated into a single array schema

Signed-off-by: Chang Su <chang.s.su@oracle.com>
gRPC backends (sglang, vllm) return raw token IDs, not decoded text.
Detokenization happens entirely on the SMG Rust side via StopDecoder.
The skip_special_tokens field in the gRPC proto is unused by the backend
in gRPC mode, so hardcode it to true to avoid confusion.

Also revert the skip_special_tokens derivation from preparation stages —
the StopDecoder should use the request's original value (true by default).
Changing it to false caused EOS/EOT tokens to appear in decoded text,
breaking tool call detection.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
Use tags_with_separator format inside triggered_tags to support multiple
tool calls in a single array. The format matches Mistral's native output:
[TOOL_CALLS] [{"name":"f1","arguments":{...}}, {"name":"f2","arguments":{...}}]

Previously, each tool had its own triggered_tags entry with begin/end that
included the array brackets, making parallel calls impossible.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…ad to streaming

Derive skip_special_tokens based on the constraint type, not just tool presence:
- json_schema: keep default (true) — backend forces JSON, no trigger tokens
- structural_tag or no constraint (auto): set false — parser needs trigger tokens
- no tools: keep request default

The derived value is stored on ResponseState (survives request_building .take()
of PreparationOutput) and threaded to streaming.rs via new parameter on
process_streaming_response and process_messages_streaming_response.

Signed-off-by: Chang Su <chang.s.su@oracle.com>
…plate

- Enable test_complex_parameters_required_non_streaming for Mistral
  (structural tags fix the whitespace issue that caused the skip)
- Remove custom tool_chat_template_mistral_parallel.jinja — Mistral v0.3's
  built-in template handles tools natively
- Update test_function_call_strict skip reason

Signed-off-by: Chang Su <chang.s.su@oracle.com>
@CatherineSue CatherineSue force-pushed the feat/parser-constraint-generation branch from 2fefced to a1b92c7 Compare April 15, 2026 03:25
@mergify mergify bot removed the needs-rebase PR has merge conflicts that need to be resolved label Apr 15, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2fefcedf65

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/tool_parser/src/factory.rs
Comment thread model_gateway/src/routers/grpc/regular/stages/chat/preparation.rs Outdated
Move the empty tools check before the structural tag path, preventing
an unsatisfiable grammar (zero tags with at_least_one=true) from being
sent to the backend. Also fix missing .to_tuple() in chat preparation
(rebase artifact).

Signed-off-by: Chang Su <chang.s.su@oracle.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 99e42d5969

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +71 to +72
"<|tool_calls_section_begin|><|tool_call_begin|>functions.{name}:{index}<|tool_call_argument_begin|>"
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Sanitize Kimi tool names in structural-tag delimiters

The new Kimi structural-tag builder interpolates tool.function.name directly into the delimiter string (functions.{name}:{index}), but this format is later parsed with a strict identifier pattern ([\w\.]+:\d+). If a caller provides a valid request containing a tool name with delimiter/meta characters (for example : or token-like substrings), the generated structural tag can become unmatchable and tool calls will be dropped as plain text. Please validate/restrict Kimi tool names for this format (or escape/encode them consistently) before building the tag.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt Kimi-K2 would even work correctly under such case. Not worth to address this for now. And OpenAI only allows

openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.name': string does not match pattern. Expected a string that matches the pattern '^[a-zA-Z0-9_-]+$'.", 'type': 'invalid_request_error', 'param': 'tools[0].function.name', 'code': 'invalid_value'}}

Copy link
Copy Markdown
Member Author

@CatherineSue CatherineSue Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider adding such validation though over the function name. cc @key4ng

@CatherineSue CatherineSue merged commit d449219 into main Apr 15, 2026
66 of 79 checks passed
@CatherineSue CatherineSue deleted the feat/parser-constraint-generation branch April 15, 2026 04:36
TingtingZhou7 pushed a commit to TingtingZhou7/smg that referenced this pull request Apr 18, 2026
…_special_tokens derivation (lightseekorg#1090)

Signed-off-by: Chang Su <chang.s.su@oracle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

grpc gRPC client and router changes model-gateway Model gateway crate changes tests Test changes tool-parser Tool/function call parser changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant