
Split api handlers#4083

Open
michalkulakowski wants to merge 6 commits into mkulakow/responses_api from mkulakow/response_api_separate_handler

Conversation

@michalkulakowski
Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

Contributor

Copilot AI left a comment


Pull request overview

This PR splits OpenAI REST API handling into a shared base (OpenAIApiHandler) plus endpoint-specific handlers, moving /v3/responses logic into a dedicated OpenAIResponsesHandler and updating serving code, build targets, tests, and documentation accordingly.

Changes:

  • Introduces OpenAIApiHandler base class and refactors chat/completions handler to inherit from it.
  • Adds dedicated OpenAIResponsesHandler with Responses-specific parsing + (streaming/unary) serialization and wires it into servables/calculator.
  • Updates Bazel targets, unit tests, and adds REST API documentation for /v3/responses.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/test/http_openai_handler_test.cpp: Updates tests to construct the correct handler type per endpoint and validates the new streaming event sequence APIs.
  • src/llm/visual_language_model/legacy/servable.cpp: Instantiates OpenAIResponsesHandler for RESPONSES and adjusts streaming chunk emission behavior for empty chunks.
  • src/llm/language_model/legacy/servable.cpp: Same as above, for the legacy language model servable.
  • src/llm/servable.hpp: Switches the execution context to store a polymorphic std::shared_ptr<OpenAIApiHandler>.
  • src/llm/servable.cpp: Creates the responses vs. chat handler based on endpoint; adjusts streaming chunk emission comments/behavior.
  • src/llm/http_llm_calculator.cc: Emits the response.created lifecycle event for streaming via a handler hook; uses the generic serializeFailedEvent.
  • src/llm/apis/openai_api_handler.hpp / src/llm/apis/openai_api_handler.cpp: New shared base class with common parsing, tool parsing, usage, and shared utilities.
  • src/llm/apis/openai_responses.hpp / src/llm/apis/openai_responses.cpp: New Responses endpoint handler (request parsing + unary/streaming serialization).
  • src/llm/apis/openai_completions.hpp / src/llm/apis/openai_completions.cpp: Refactors the chat/completions handler to inherit shared logic and removes the embedded Responses logic.
  • src/llm/BUILD: Adds Bazel targets for the new handler libraries and wires dependencies.
  • src/BUILD: Links the new Responses handler library into the main build.
  • docs/model_server_rest_api_responses.md: Adds user documentation for /v3/responses, including streaming event semantics.

Comment on lines +195 to +198
```cpp
const auto firstMissmatch = std::mismatch(imageUrl.begin(), imageUrl.end(), allowedLocalMediaPath.value().begin(), allowedLocalMediaPath.value().end());
if (firstMissmatch.second != allowedLocalMediaPath.value().end()) {
    return absl::InvalidArgumentError("Given filepath is not subpath of allowed_local_media_path");
}
```

Copilot AI Mar 25, 2026


Local file path allowlist check uses std::mismatch prefix matching, which can be bypassed by paths that share the same prefix (e.g., allowed "/tmp/allowed" would also allow "/tmp/allowed2/..."), and doesn’t canonicalize symlinks/relative segments. Consider using std::filesystem::weakly_canonical/lexically_normal and then verifying the canonical path is within allowedLocalMediaPath (including a path-separator boundary check).

Comment on lines +1112 to +1114
```cpp
const auto createdAt = std::chrono::duration_cast<std::chrono::seconds>(created.time_since_epoch()).count();
const std::string responseId = "resp-" + std::to_string(createdAt);
StringBuffer buffer;
```

Copilot AI Mar 25, 2026


Response IDs are derived only from seconds since epoch under the "resp-" prefix. Under concurrency, multiple requests created within the same second collide and produce identical response IDs across different streams/unary calls. Consider using microseconds (as the earlier streaming implementation did) or a random/UUID-based suffix to guarantee uniqueness while keeping the existing prefix format.

Comment on lines +124 to +131
```cpp
const std::string type = typeIt->value.GetString();
if (type == "input_text") {
    auto textIt = contentObj.FindMember("text");
    if (textIt == contentObj.MemberEnd() || !textIt->value.IsString()) {
        return absl::InvalidArgumentError("input_text requires a valid text field");
    }
    contentText = textIt->value.GetString();
} else if (type == "input_image") {
```

Copilot AI Mar 25, 2026


When input[i].content is an array, multiple input_text parts will overwrite contentText (only the last text part is preserved). The OpenAI Responses schema allows multiple content parts; if more than one input_text is provided, consider concatenating (or otherwise preserving ordering) instead of replacing, so the full prompt reaches the chat template/model.
