feat: unified GenAI routing client + fix Bedrock & Vertex AI compilation bugs (#81) #82
atharva-nagane wants to merge 1 commit into `c2siorg:main` (status: Open).
## Summary

Closes #81.

This PR does three things that are tightly coupled:

1. **Fixes two providers that cannot compile.** `aws_bedrock.rs` and `gcp_vertex_ai.rs` both reference `CloudError::ProviderError(String)`, a variant that does not exist in `errors.rs`.
2. **Rescues the orphaned Vertex AI implementation.** `gcp_vertex_ai.rs` existed on disk but was never declared in the module tree, making it completely unreachable. It is now wired in and its existing tests run.
3. **Introduces `UnifiedLlmClient`**, a provider-agnostic routing layer that sits on top of the three `LlmProvider` implementations, enabling applications to target multiple clouds without any provider-specific code.
## What's fixed in existing providers

### `aws_bedrock.rs`

- `build_messages`: `CloudError::ProviderError(e)` → `CloudError::Provider { http_status: 0, message: e, retryable: false }`
- `.send()` errors: `CloudError::ProviderError(e)` → `CloudError::Provider { .. }`
- `CloudError::ProviderError(e)` → `CloudError::Provider { retryable: true }`
- `CloudError::ProviderError(e)` → `CloudError::Serialization { source: e }`
- `generate_with_tools`: one `ToolConfiguration` per tool → a single `ToolConfiguration` with all tools (AWS Bedrock API requirement)

### `gcp_vertex_ai.rs`

- `CloudError::ProviderError(e)` uses migrated to existing variants
- Module now declared in `mod.rs`, `main.rs`, and `tests/mod.rs`
- Embedding endpoint: `:predictBatch` (wrong) → `publishers/google/models/text-embedding-004:predict`
- Embedding request body: `requests[]` → `instances[]` with `task_type`
- Embedding response parsing: `predictions[].embeddings.values`
- Streaming: `data:` line parser; emits `DeltaText`, `Usage`, `Done` events
- Tool calls: `name: "tool_called"` → `functionCall.name` and `functionCall.args` from response parts
- `Part` struct: missing `functionCall` field → `function_call: Option<FunctionCall>` with the correct serde rename
- `CloudError::ProviderError` → `CloudError::Auth { message }`
- `CloudError::ProviderError` → `CloudError::Network { source }`
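To illustrate the migration, here is a minimal sketch of the structured variants and the kind of mapping applied in `build_messages`. The enum below is a simplified stand-in, not the crate's actual `errors.rs`; only the field names follow the list above.

```rust
// Simplified stand-in for the crate's structured error type (an assumption,
// reconstructed from the variant shapes listed above).
#[allow(dead_code)]
#[derive(Debug)]
pub enum CloudError {
    Provider { http_status: u16, message: String, retryable: bool },
    Serialization { source: String },
    Auth { message: String },
    Network { source: String },
}

// Example of the mapping applied in `build_messages`: a plain error string
// becomes a non-retryable Provider error with no HTTP status attached.
fn map_build_error(e: String) -> CloudError {
    CloudError::Provider { http_status: 0, message: e, retryable: false }
}
```

The structured variants carry enough context (`http_status`, `retryable`) for the routing layer to decide later whether a fallback attempt is worthwhile, which a bare `ProviderError(String)` could not express.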
## New: `rustcloud/src/genai/`

## Architecture

All three backends implement the same `LlmProvider` trait. The unified client also implements `LlmProvider`, meaning callers never need to change their call site when switching between a specific provider and the multi-cloud client.
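The drop-in property can be sketched as follows. The trait signature and type bodies here are assumptions based only on the names in this PR, not the crate's actual API:

```rust
// Hypothetical minimal form of the shared trait.
trait LlmProvider {
    fn generate(&self, prompt: &str) -> Result<String, String>;
}

// A concrete backend (body is a placeholder).
struct BedrockProvider;
impl LlmProvider for BedrockProvider {
    fn generate(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("bedrock:{prompt}"))
    }
}

// The unified client holds trait objects and routes between them; here it
// simply delegates to the first provider to illustrate the trait relationship.
struct UnifiedLlmClient {
    providers: Vec<Box<dyn LlmProvider>>,
}
impl LlmProvider for UnifiedLlmClient {
    fn generate(&self, prompt: &str) -> Result<String, String> {
        self.providers
            .first()
            .ok_or_else(|| "no providers registered".to_string())?
            .generate(prompt)
    }
}

// Call sites are written once against the trait and accept either type.
fn summarize<P: LlmProvider>(llm: &P, text: &str) -> Result<String, String> {
    llm.generate(text)
}
```

Because `summarize` is generic over `LlmProvider`, swapping a `BedrockProvider` for a `UnifiedLlmClient` requires no call-site changes.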
### Builder pattern

### Fallback routing

### Routing strategy rules

#### `ModelBased` inference table

| Model pattern | Provider |
| --- | --- |
| `anthropic.*`, `amazon.*`, `meta.*`, `cohere.*`, `mistral.*` | `"aws"` |
| `gemini*`, `text-embedding-*`, `textembedding-*` | `"gcp"` |
| `gpt-*`, `o1*`, `o3*`, `ModelRef::Deployment(_)` | `"azure"` |

#### Fallback transient error detection

Fallback skips a provider and tries the next one when it returns a transient error. Hard errors (`Auth`, `Unsupported`, non-retryable `Provider`) also skip to the next provider, so that a misconfigured backend does not block healthy ones.
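The inference table and error rules above can be sketched as the two helpers named under `src/genai/routing.rs`. The function bodies below are reconstructions from this description, not the actual implementation, and the `CloudError` enum is a simplified stand-in:

```rust
// Infer the target provider from the model name, mirroring the table above.
fn infer_provider(model: &str) -> Option<&'static str> {
    const AWS: &[&str] = &["anthropic.", "amazon.", "meta.", "cohere.", "mistral."];
    const GCP: &[&str] = &["gemini", "text-embedding-", "textembedding-"];
    const AZURE: &[&str] = &["gpt-", "o1", "o3"];
    if AWS.iter().any(|p| model.starts_with(p)) {
        Some("aws")
    } else if GCP.iter().any(|p| model.starts_with(p)) {
        Some("gcp")
    } else if AZURE.iter().any(|p| model.starts_with(p)) {
        Some("azure")
    } else {
        None
    }
}

// Simplified stand-in for the crate's error type.
#[allow(dead_code)]
enum CloudError {
    Provider { retryable: bool },
    Network,
    Auth,
    Unsupported,
}

// Transient errors are worth retrying on the next provider. Hard errors
// (Auth, Unsupported, non-retryable Provider) are not transient, but the
// fallback loop still moves past them, as described above.
fn is_transient(err: &CloudError) -> bool {
    matches!(err, CloudError::Network | CloudError::Provider { retryable: true })
}
```

(`ModelRef::Deployment(_)` routing to `"azure"` is omitted here since it is a pattern match on a reference type rather than a string prefix.)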
## Design decisions

### `UnifiedLlmClient` implements `LlmProvider`

Callers use identical code whether they hold a `BedrockProvider` or a `UnifiedLlmClient`. This ensures zero lock-in at the call site.

### Builder validates at `build()`

A missing default provider or an empty registry returns `Err(String)` immediately, rather than panicking during a request.
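A minimal sketch of the `build()`-time validation described above, with assumed names and a deliberately simplified registry (the real builder presumably registers provider instances, not strings):

```rust
// Hypothetical builder; fields are placeholders for the real registry.
struct UnifiedLlmClientBuilder {
    providers: Vec<String>,           // stand-in for registered providers
    default_provider: Option<String>,
}

struct UnifiedLlmClient {
    default_provider: String,
}

impl UnifiedLlmClientBuilder {
    // Misconfiguration surfaces here as Err(String), at construction time,
    // instead of as a panic in the middle of a request.
    fn build(self) -> Result<UnifiedLlmClient, String> {
        if self.providers.is_empty() {
            return Err("no providers registered".to_string());
        }
        let default_provider = self
            .default_provider
            .ok_or_else(|| "no default provider set".to_string())?;
        Ok(UnifiedLlmClient { default_provider })
    }
}
```

Failing fast in `build()` keeps the request path infallible with respect to configuration: by the time `generate` is called, the registry is known to be non-empty and the default is known to exist.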
### No new Cargo dependencies

The routing layer uses only existing crate dependencies, so there is no binary size increase.
### Vertex AI streaming uses `?alt=sse`

The `streamGenerateContent` endpoint with `?alt=sse` returns standard Server-Sent Events. The implementation collects the full body and then emits typed `LlmStreamEvent` variants, consistent with the Bedrock stream design.
## Files changed

- `src/aws/aws_apis/artificial_intelligence/aws_bedrock.rs`
- `src/gcp/gcp_apis/artificial_intelligence/gcp_vertex_ai.rs`
- `src/gcp/gcp_apis/artificial_intelligence/mod.rs`: adds `pub mod gcp_vertex_ai`
- `src/main.rs`: wires `gcp_vertex_ai` + the new `genai` module
- `src/tests/mod.rs`: adds `gcp_vertex_ai_operations` + `unified_genai_operations`
- `src/genai/mod.rs`
- `src/genai/routing.rs`: `RoutingStrategy`, `infer_provider()`, `is_transient()`
- `src/genai/client.rs`: `UnifiedLlmClient` + builder
- `src/tests/unified_genai_operations.rs`
- `examples/unified_genai.md`
- `README.md`

## Testing
### Unit tests (no credentials required)

### Integration tests (`#[ignore]`)

Run with:
## Related

- Fixes compilation of #61 (AWS Bedrock)
- Fixes and wires #53 (GCP Vertex AI)
- Builds on #64 (Azure OpenAI)

Pattern inspired by googleapis/google-cloud-rust: a centralised credential layer with shared error/retry primitives consumed by all service clients.