Context
In #179 M5b must-fix 2-C (commit b875a16 on feat/179-feishu-agent-sdk-channel), the claude-agent-sdk runtime now accepts images?: string[] in processWithClaude for symmetry with the codex / grok branches, but downgrades non-empty images to a text-only prompt plus a warn line (this mirrors the Grok behavior already in the codebase).
The downgrade was chosen to keep M5b's blast radius scoped: the proper multimodal fix touches the SDK call shape itself.
What this issue tracks
Real multimodal wiring for claude-agent-sdk: send images as actual content blocks the LLM can see.
Sketch
The claude-agent-sdk query({prompt}) call accepts prompt: string | AsyncIterable<SDKUserMessage>. Each SDKUserMessage carries a MessageParam (Anthropic spec) whose content can be an array of blocks, including text blocks and image blocks.
processWithClaude would switch from query({prompt: task}) to query({prompt: asyncIter([{type:"user", message:{role:"user", content:[textBlock, ...imageBlocks]}}])}) when images are non-empty.
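The switch described above could look roughly like the following. This is a minimal sketch, not the actual implementation: the block and message types are local stand-ins written out by hand rather than imported from the SDK (the SDK's real SDKUserMessage type may require further fields), and toImageBlock, buildUserMessage, and buildPrompt are hypothetical helper names. Only asyncIter, textBlock, and imageBlocks come from the sketch in this issue.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Local stand-ins for the SDK / Anthropic types (assumed shapes, not imported).
type TextBlock = { type: "text"; text: string };
type ImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};
type UserMessage = {
  type: "user";
  message: { role: "user"; content: Array<TextBlock | ImageBlock> };
};

// File extension -> media type for the image formats handled in this sketch.
const MEDIA_TYPES: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Hypothetical helper: turn raw image bytes into a base64 image block.
function toImageBlock(fileName: string, data: Buffer): ImageBlock {
  const mediaType = MEDIA_TYPES[extname(fileName).toLowerCase()];
  if (!mediaType) throw new Error(`unsupported image type: ${fileName}`);
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: data.toString("base64") },
  };
}

// Hypothetical helper: one user message holding the text block, then the image blocks.
function buildUserMessage(task: string, imagePaths: string[]): UserMessage {
  const textBlock: TextBlock = { type: "text", text: task };
  const imageBlocks = imagePaths.map((p) => toImageBlock(p, readFileSync(p)));
  return { type: "user", message: { role: "user", content: [textBlock, ...imageBlocks] } };
}

// Wrap a fixed list of items as an async iterable.
async function* asyncIter<T>(items: T[]): AsyncGenerator<T> {
  for (const item of items) yield item;
}

// Keep the plain string prompt when there are no images, so the text-only
// call shape is unchanged; use the message iterable only when images exist.
function buildPrompt(
  task: string,
  images: string[] = [],
): string | AsyncGenerator<UserMessage> {
  return images.length === 0 ? task : asyncIter([buildUserMessage(task, images)]);
}
```

With helpers like these, processWithClaude would pass buildPrompt(task, images) as the prompt argument of query, leaving the rest of the call untouched.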
Blocking work to verify first
Vendor multimodal support: every Anthropic-compat endpoint the wizard currently lists (intern / MiniMax / mimo / deepseek / Anthropic native) needs a verify pass — does it accept image blocks? deepseek-v4-pro via https://api.deepseek.com/anthropic is the user-facing concern; verify via real curl with a small base64 image before landing.
Cross-runtime regression: all current processWithClaude callers go through think(); flipping the prompt shape changes the SDK call signature for the common path, not just feishu. Need a verify pass on commhub-inbox + /loop wakes + standalone agent-node smoke.
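For the vendor verify pass, the curl check can also be scripted. The sketch below builds the same kind of request, assuming each compat endpoint exposes the Anthropic Messages API under a /v1/messages path beneath its base URL and accepts the x-api-key and anthropic-version headers; that assumption needs confirming per vendor. buildImageProbe and probeImageSupport are hypothetical names, and the caller supplies the model ID and the small base64 PNG.

```typescript
type ProbeRequest = {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
};

// Hypothetical helper: build a Messages API request with one text block and
// one base64 image block. The /v1/messages suffix is an assumption.
function buildImageProbe(
  baseUrl: string,
  apiKey: string,
  model: string,
  pngBase64: string,
): ProbeRequest {
  const body = {
    model,
    max_tokens: 64,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          {
            type: "image",
            source: { type: "base64", media_type: "image/png", data: pngBase64 },
          },
        ],
      },
    ],
  };
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the probe. A successful status only shows the endpoint accepted the
// image block; the reply still has to be read to confirm the model saw it.
async function probeImageSupport(req: ProbeRequest): Promise<boolean> {
  const res = await fetch(req.url, req.init);
  return res.ok;
}
```

A rejected request is a clear "no" for that vendor, while an accepted one still needs a manual look at the answer, since an endpoint could silently drop the image block.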
Acceptance criteria
processWithClaude(task, from, [imagePath]) with non-empty images → the LLM sees the image and can describe / reason about it (verify with a "what's in this image?" probe).
The existing text-only path stays byte-identical (the query({prompt: task}) shape is preserved when images is empty).
Per-vendor verification matrix (which vendors accept image blocks; warn-and-downgrade for vendors that don't).
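One possible shape for the verification matrix and its warn-and-downgrade fallback is sketched below. The vendor keys are the ones listed in this issue, but every entry is a placeholder marked "unverified" because no verify pass has been recorded here; the ImageSupport type, the imageSupport map, and resolveImages are hypothetical names.

```typescript
type ImageSupport = "yes" | "no" | "unverified";

// Placeholder matrix: fill in each entry after that vendor's verify pass.
const imageSupport: Record<string, ImageSupport> = {
  intern: "unverified",
  MiniMax: "unverified",
  mimo: "unverified",
  deepseek: "unverified",
  anthropic: "unverified",
};

// Returns the images to send. Anything not verified as "yes" is downgraded
// to text-only with a warning, matching the current warn-only behavior.
function resolveImages(
  vendor: string,
  images: string[],
  warn: (msg: string) => void = console.warn,
): string[] {
  if (images.length === 0) return images;
  const support = imageSupport[vendor] ?? "unverified";
  if (support === "yes") return images;
  warn(
    `vendor "${vendor}" image support is "${support}"; ` +
      `dropping ${images.length} image(s) and sending text only`,
  );
  return [];
}
```

Treating "unverified" the same as "no" keeps the default conservative: a vendor only receives image blocks after it has been checked and flipped to "yes".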
Refs
Commit b875a16 (current warn-only impl)
Related: Grok ACP promptCapabilities.image=false warn at agent-node/src/cli.ts:1855 (same downgrade pattern, also a candidate for upgrade once the Grok backend supports images)