feat(protocol): add thinking param to Chat API and support bare string image_url#1098

Open
ConnorLi96 wants to merge 1 commit into main from connorli/thinking-and-image-url

Conversation

@ConnorLi96
Collaborator

Problem

  1. No way to control extended thinking via Chat Completions API (only Messages API has thinking field).
  2. Some clients send "image_url": "https://..." (bare string) instead of "image_url": {"url": "https://..."}, causing deserialization errors.
  3. Tiktoken partial UTF-8 decode logs at WARN level, flooding production logs during CJK/emoji streaming.

Solution

  1. Add thinking: Option<ThinkingConfig> to ChatCompletionRequest, reusing the existing ThinkingConfig from the Messages API. Normalized into chat_template_kwargs with both enable_thinking (Qwen3) and thinking (Kimi-K2.5) keys. Uses or_insert so user-provided kwargs are not overridden.
  2. Custom Deserialize on ImageUrl to accept both bare string and object forms.
  3. Change tiktoken partial UTF-8 log from warn! to debug!.

Behavior change note: ImageUrl now uses a custom deserializer. Serialization is unchanged (#[derive(Serialize)]).
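The normalization described in point 1 can be sketched with the standard library alone. This is only an illustration: the real code stores serde_json::Value in chat_template_kwargs (plain bool is used here for brevity), and the ThinkingConfig shape is assumed from the diff.

```rust
use std::collections::HashMap;

// Assumed mirror of the Messages-API config; field names taken from the diff.
#[allow(dead_code)]
enum ThinkingConfig {
    Enabled { budget_tokens: u32 },
    Disabled,
}

// Normalize `thinking` into template kwargs. Both keys are set because Qwen3
// templates read `enable_thinking` while Kimi-K2.5 reads `thinking`.
// `or_insert` leaves any user-provided value in place.
fn normalize(thinking: &ThinkingConfig, kwargs: &mut HashMap<String, bool>) {
    let flag = matches!(thinking, ThinkingConfig::Enabled { .. });
    kwargs.entry("enable_thinking".to_string()).or_insert(flag);
    kwargs.entry("thinking".to_string()).or_insert(flag);
}

fn main() {
    // User already set enable_thinking; or_insert must not override it.
    let mut kwargs = HashMap::from([("enable_thinking".to_string(), false)]);
    normalize(&ThinkingConfig::Enabled { budget_tokens: 1024 }, &mut kwargs);
    assert_eq!(kwargs["enable_thinking"], false); // user value preserved
    assert_eq!(kwargs["thinking"], true);         // missing key filled in

    // Disabled fills both keys with false when they are absent.
    let mut empty = HashMap::new();
    normalize(&ThinkingConfig::Disabled, &mut empty);
    assert_eq!(empty["enable_thinking"], false);
    println!("{kwargs:?}");
}
```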

Changes

  • crates/protocols/src/chat.rs: add thinking field + normalization + unit tests
  • crates/protocols/src/common.rs: custom ImageUrl deserializer + unit tests (bare string, object, missing url, roundtrip, skip_none)
  • crates/tokenizer/src/tiktoken.rs: log level warn -> debug

Test Plan

  • 7 new unit tests for ImageUrl deserialization + serialization
  • 4 new unit tests for thinking normalization (enabled, disabled, no-override, absent)
  • All 63 protocol tests pass
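The acceptance rule those ImageUrl tests exercise can be sketched without serde. The actual change is a custom serde::Deserialize impl; the enum below is only a stand-in for the two wire forms it accepts.

```rust
// The two JSON shapes a client may send for `image_url`.
#[derive(Debug, PartialEq)]
enum ImageUrlInput {
    Bare(String),                                   // "image_url": "https://..."
    Object { url: String, detail: Option<String> }, // "image_url": {"url": "..."}
}

#[derive(Debug, PartialEq)]
struct ImageUrl {
    url: String,
    detail: Option<String>,
}

// Both wire forms normalize to the same struct; the bare form has no detail.
fn from_input(input: ImageUrlInput) -> ImageUrl {
    match input {
        ImageUrlInput::Bare(url) => ImageUrl { url, detail: None },
        ImageUrlInput::Object { url, detail } => ImageUrl { url, detail },
    }
}

fn main() {
    let a = from_input(ImageUrlInput::Bare("https://x/cat.png".into()));
    let b = from_input(ImageUrlInput::Object {
        url: "https://x/cat.png".into(),
        detail: None,
    });
    assert_eq!(a, b); // bare string and object forms agree
    println!("{a:?}");
}
```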

Checklist:

  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • cargo test -p openai-protocol passes (63 tests)

Made with Cursor

@coderabbitai

coderabbitai bot commented Apr 11, 2026

Warning

Rate limit exceeded

@ConnorLi96 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 16 minutes and 50 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0d342ad2-0d7e-47ad-8f28-bd2b0faa7a66

📥 Commits

Reviewing files that changed from the base of the PR and between 381c5f6 and cf003e3.

📒 Files selected for processing (3)
  • crates/protocols/src/chat.rs
  • crates/protocols/src/common.rs
  • crates/tokenizer/src/tiktoken.rs


@github-actions bot added labels tokenizer (Tokenizer related changes) and protocols (Protocols crate changes) on Apr 11, 2026
…g image_url

1. Add Anthropic-style `thinking` field to ChatCompletionRequest.
   Normalized to chat_template_kwargs with both `thinking` and
   `enable_thinking` keys for Kimi-K2.5 and Qwen3 template compat.

2. Support bare string image_url: accept both `"image_url": "url"`
   and `"image_url": {"url": "..."}` via custom serde Deserialize.

3. Reduce tiktoken partial UTF-8 decode log from warn to debug
   (expected during streaming of CJK/emoji, not an error).

Includes unit tests for ImageUrl deserialization (bare string, object,
missing url, roundtrip) and thinking normalization.

Signed-off-by: ConnorLi96 <ConnorLi96@users.noreply.github.com>
Made-with: Cursor
Comment on lines +577 to +584
ThinkingConfig::Enabled { .. } => {
kwargs
.entry("enable_thinking".to_string())
.or_insert(Value::Bool(true));
kwargs
.entry("thinking".to_string())
.or_insert(Value::Bool(true));
}

🟡 Nit: budget_tokens is silently discarded here — the Enabled { .. } pattern ignores it, and only a boolean true is forwarded to the template kwargs. Since budget_tokens is a required field on ThinkingConfig::Enabled, every caller must provide it, yet it has no effect on this code path. This could confuse users who expect their token budget to be respected.

Consider either:

  • Adding a tracing::debug! noting the value is ignored (cheap, aids debugging), or
  • Forwarding it as an additional kwarg (e.g., budget_tokens) so future templates can opt in to it.
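A minimal sketch of the second option, assuming a serde_json::Value-like type. The Value enum here is a hand-rolled stand-in, and budget_tokens as a kwarg key is hypothetical until some template actually consumes it.

```rust
use std::collections::HashMap;

// Hand-rolled stand-in for serde_json::Value, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Num(u32),
}

// Forward budget_tokens as its own kwarg instead of silently dropping it,
// so templates that understand it can opt in. `or_insert` still defers to
// any user-provided value.
fn forward_enabled(budget_tokens: u32, kwargs: &mut HashMap<String, Value>) {
    kwargs.entry("enable_thinking".into()).or_insert(Value::Bool(true));
    kwargs.entry("thinking".into()).or_insert(Value::Bool(true));
    kwargs.entry("budget_tokens".into()).or_insert(Value::Num(budget_tokens));
}

fn main() {
    let mut kwargs = HashMap::new();
    forward_enabled(2048, &mut kwargs);
    assert_eq!(kwargs["budget_tokens"], Value::Num(2048));
    assert_eq!(kwargs["thinking"], Value::Bool(true));
    println!("{kwargs:?}");
}
```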

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for extended thinking configurations in ChatCompletionRequest by mapping the thinking field to chat_template_kwargs. It also enhances ImageUrl to support deserialization from both string and object formats and downgrades the logging level for tiktoken decode failures. Feedback was provided to ensure the thinking field takes precedence during normalization and to update the JsonSchema for ImageUrl to match its new deserialization capabilities.

Comment on lines +574 to +594
if let Some(ref thinking) = self.thinking {
let kwargs = self.chat_template_kwargs.get_or_insert_with(HashMap::new);
match thinking {
ThinkingConfig::Enabled { .. } => {
kwargs
.entry("enable_thinking".to_string())
.or_insert(Value::Bool(true));
kwargs
.entry("thinking".to_string())
.or_insert(Value::Bool(true));
}
ThinkingConfig::Disabled => {
kwargs
.entry("enable_thinking".to_string())
.or_insert(Value::Bool(false));
kwargs
.entry("thinking".to_string())
.or_insert(Value::Bool(false));
}
}
}
Contributor


high

Using or_insert here creates a logic issue where the explicit thinking field is ignored if the corresponding keys already exist in chat_template_kwargs. For example, if a user sets thinking: Disabled but chat_template_kwargs contains "enable_thinking": true, the model will remain in thinking mode. Since thinking is a first-class field in the request, it should take precedence over the generic chat_template_kwargs during normalization to ensure alignment with the external specification. Additionally, the budget_tokens value from the Enabled variant is currently dropped; consider propagating it to the kwargs as well if the underlying templates support it.

        if let Some(ref thinking) = self.thinking {
            let kwargs = self.chat_template_kwargs.get_or_insert_with(HashMap::new);
            match thinking {
                ThinkingConfig::Enabled { .. } => {
                    kwargs.insert("enable_thinking".to_string(), Value::Bool(true));
                    kwargs.insert("thinking".to_string(), Value::Bool(true));
                }
                ThinkingConfig::Disabled => {
                    kwargs.insert("enable_thinking".to_string(), Value::Bool(false));
                    kwargs.insert("thinking".to_string(), Value::Bool(false));
                }
            }
        }
References
  1. For protocol data structures that mirror an external API (e.g., OpenAI), prioritize alignment with the external specification over internal consistency.

Comment on lines +198 to +203
#[derive(Debug, Clone, Serialize, PartialEq, schemars::JsonSchema)]
pub struct ImageUrl {
pub url: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub detail: Option<String>, // "auto", "low", or "high"
pub detail: Option<String>,
}
Contributor


medium

The ImageUrl struct now supports deserialization from a bare string via a custom Deserialize implementation, but the derived JsonSchema does not reflect this change. This will result in generated API documentation and client libraries that only allow the object form. To maintain alignment with external specifications that allow both forms, you should implement schemars::JsonSchema manually for ImageUrl to specify that it can be either a string or an object (e.g., using any_of).

References
  1. For protocol data structures that mirror an external API (e.g., OpenAI), prioritize alignment with the external specification over internal consistency.
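For concreteness, the schema shape a manual impl would need to advertise can be sketched without the schemars API at all. The JSON below is only the target output; the schemars plumbing to produce it is omitted, and the property set is assumed from the struct definition above.

```rust
// Stdlib sketch of the JSON Schema a manual JsonSchema impl for ImageUrl
// would emit: an `anyOf` covering both the bare-string and object forms.
fn image_url_schema() -> String {
    r#"{
  "anyOf": [
    { "type": "string" },
    {
      "type": "object",
      "properties": {
        "url": { "type": "string" },
        "detail": { "type": "string" }
      },
      "required": ["url"]
    }
  ]
}"#
    .to_string()
}

fn main() {
    let schema = image_url_schema();
    assert!(schema.contains("anyOf"));        // both wire forms advertised
    assert!(schema.contains("\"required\"")); // object form still demands url
    println!("{schema}");
}
```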

}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, schemars::JsonSchema)]
#[derive(Debug, Clone, Serialize, PartialEq, schemars::JsonSchema)]

🟡 Nit: The schemars::JsonSchema derive still generates a schema for the struct form only (object with url + optional detail). Now that the custom Deserialize impl also accepts a bare string, the generated JSON Schema / OpenAPI docs won't reflect that. Clients doing client-side validation against the schema would reject the bare-string form that the server actually accepts.

If the schema is exposed in API docs, consider a manual JsonSchema impl (or #[schemars(with = ...)]) that emits a oneOf of string and object.


@claude claude bot left a comment


Clean PR with good test coverage. Two minor nits posted inline — neither is blocking.

Summary: 🟡 Nit x2, 🔴 Important x0

  • budget_tokens silently discarded during thinking normalization — worth a debug log or forwarding as a kwarg
  • JsonSchema derive on ImageUrl doesn't reflect the new bare-string acceptance


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cf003e3994


// Migrate thinking → chat_template_kwargs.
// Sets both keys: Qwen3 templates check `enable_thinking`, Kimi-K2.5 checks `thinking`.
// Uses or_insert to not override user-provided kwargs.
if let Some(ref thinking) = self.thinking {

P1 Badge Drop migrated thinking field after populating template kwargs

In the OpenAI HTTP routing path I checked (model_gateway/src/routers/openai/chat.rs:84-106), the request is serialized from ChatCompletionRequest and then only fields in SGLANG_FIELDS are stripped (model_gateway/src/routers/openai/provider.rs:11-35), which does not include thinking. Because this normalization block migrates thinking into chat_template_kwargs but leaves self.thinking set, any client that now sends thinking will forward an extra unsupported parameter to external OpenAI-compatible backends (a regression from previous behavior where unknown thinking was dropped during deserialization), causing 400-style upstream rejections.
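One way to address this, sketched on a reduced request type with field types simplified to bool for illustration: move the value out with Option::take during normalization, so the field is None (and serializes as absent) by the time the request is forwarded upstream.

```rust
use std::collections::HashMap;

// Hypothetical reduced request; the real struct has many more fields and
// uses Option<ThinkingConfig> rather than Option<bool>.
#[derive(Default)]
struct ChatCompletionRequest {
    thinking: Option<bool>,
    chat_template_kwargs: Option<HashMap<String, bool>>,
}

impl ChatCompletionRequest {
    // `take()` both reads and clears the field, so later serialization of
    // the request no longer carries an unsupported `thinking` parameter.
    fn normalize(&mut self) {
        if let Some(flag) = self.thinking.take() {
            self.chat_template_kwargs
                .get_or_insert_with(HashMap::new)
                .entry("enable_thinking".to_string())
                .or_insert(flag);
        }
    }
}

fn main() {
    let mut req = ChatCompletionRequest {
        thinking: Some(true),
        ..Default::default()
    };
    req.normalize();
    assert!(req.thinking.is_none()); // won't reach OpenAI-compatible backends
    assert_eq!(req.chat_template_kwargs.unwrap()["enable_thinking"], true);
}
```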

