ADK web mishandles non-text content in several ways. Here is a simple agent that demonstrates the problems:
from google.adk.agents import Agent

root_agent = Agent(
    name="image_display",
    model="gemini-3.1-flash-image-preview",
    description=(
        "Agent who can generate images."
    ),
    instruction=(
        """
        You are an agent who can generate images.
        """
    ),
)
Given a prompt of "Generate an image of a flower and explain what kind of flower it is.", the agent generates these events:
{
  "id": "bfe28ad9-a577-4a28-b3c4-2a378468a8ae",
  "appName": "image_display",
  "userId": "user",
  "state": {},
  "events": [
    {
      "content": {
        "parts": [
          {
            "text": "Generate an image of a flower and explain what kind of flower it is."
          }
        ],
        "role": "user"
      },
      "invocationId": "e-d4bb169c-7ccc-47e6-9ea2-46639c1bb9e9",
      "author": "user",
      "actions": {
        "stateDelta": {},
        "artifactDelta": {},
        "requestedAuthConfigs": {},
        "requestedToolConfirmations": {}
      },
      "id": "2dc12d0d-2b40-4c14-8f87-ac9a424451bc",
      "timestamp": 1774470817.5733912
    },
    {
      "modelVersion": "gemini-3.1-flash-image-preview",
      "content": {
        "parts": [
          {
            "text": "**Visualizing a Floral Subject**\n\nI'm now focused on the initial step: creating a detailed image. I've decided on a rose and am working on generating that visual asset. My next step will be analyzing the image created.\n\n\n",
            "thought": true
          },
          {
            "text": "**Defining the Image's Details**\n\nI'm starting a close-up photograph of a red rose. Next, I plan to identify it and will then describe the flower's color, structure, and associated meanings.\n\n\n",
            "thought": true
          },
          {
            "inlineData": {
              "data": "<image-data>",
              "mimeType": "image/png"
            },
            "thought": true
          },
          {
            "text": "**Refining Visual Accuracy**\n\nI'm focused on confirming the visual fidelity. The rose image is being reviewed for how well it matches the prompt's details: color, clarity, and overall composition. The image's subject and details are under scrutiny.\n\n\n",
            "thought": true
          },
          {
            "text": "**Evaluating Image Compliance**\n\nI've checked the image against the prompt, focusing on key elements like color and detail. I determined that the image meets the user's requirements. I'm preparing the final delivery, which includes both the image and the explanation.\n\n\n",
            "thought": true
          },
          {
            "inlineData": {
              "data": "<image-data>",
              "mimeType": "image/png"
            },
            "thoughtSignature": "<signature>"
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "usageMetadata": {
        "candidatesTokenCount": 1120,
        "candidatesTokensDetails": [
          {
            "modality": "IMAGE",
            "tokenCount": 1120
          }
        ],
        "promptTokenCount": 56,
        "promptTokensDetails": [
          {
            "modality": "TEXT",
            "tokenCount": 56
          }
        ],
        "thoughtsTokenCount": 161,
        "totalTokenCount": 1337,
        "trafficType": "ON_DEMAND"
      },
      "avgLogprobs": 1.1017857142857144,
      "invocationId": "e-d4bb169c-7ccc-47e6-9ea2-46639c1bb9e9",
      "author": "image_display",
      "actions": {
        "stateDelta": {},
        "artifactDelta": {},
        "requestedAuthConfigs": {},
        "requestedToolConfirmations": {}
      },
      "id": "091f9644-e3c6-4c17-805e-fccffc259572",
      "timestamp": 1774470817.577141
    }
  ],
  "lastUpdateTime": 1774470817.577141
}
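The session contains two `inlineData` image parts: the first is flagged `"thought": true`, the second is the final output carrying a `thoughtSignature`. As a debugging aid, here is a minimal sketch that lists every `inlineData` part in a saved session dump. `list_inline_parts` is a hypothetical helper (not part of ADK) and assumes the camelCase JSON shape shown above.

```python
import json


def list_inline_parts(session: dict) -> list[dict]:
    """Summarize each inlineData part found in a session dump.

    Hypothetical helper, not part of ADK; it only reads the camelCase
    keys used in the session JSON above.
    """
    found = []
    for event in session.get("events", []):
        # Some events may have no content at all.
        parts = (event.get("content") or {}).get("parts") or []
        for index, part in enumerate(parts):
            blob = part.get("inlineData")
            if blob is None:
                continue  # text parts are skipped
            found.append({
                "eventId": event.get("id"),
                "partIndex": index,
                "mimeType": blob.get("mimeType"),
                # Parts without a "thought" key are regular output.
                "thought": bool(part.get("thought", False)),
            })
    return found


if __name__ == "__main__":
    # Trimmed-down copy of the model event from the session above.
    sample = {
        "events": [
            {
                "id": "091f9644-e3c6-4c17-805e-fccffc259572",
                "content": {
                    "role": "model",
                    "parts": [
                        {"text": "**Visualizing a Floral Subject**", "thought": True},
                        {"inlineData": {"data": "<image-data>", "mimeType": "image/png"},
                         "thought": True},
                        {"inlineData": {"data": "<image-data>", "mimeType": "image/png"},
                         "thoughtSignature": "<signature>"},
                    ],
                },
            }
        ]
    }
    for item in list_inline_parts(sample):
        print(json.dumps(item))
```

Run against the full dump, this should report both PNG parts, which makes it easy to check whether the UI drops only the thought image or both.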
But the ADK web UI didn't display the image(s).
Cursory review of the code:
- `mediaType` is not set.
- `name` is not a correct attribute; it should be `displayName` if it's copied as-is from `Part`.
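For illustration only, here is a Python sketch of the mapping those two points imply. The target shape (`mediaType`, `displayName`, `url`) is an assumption modeled on the field names above, not the actual ADK web frontend type, and `part_to_attachment` is a hypothetical name. It copies `mimeType` into `mediaType`, keeps `displayName` under its `Part` name, and builds a data URL on the assumption that `data` is already a base64 string, as it appears in the session JSON.

```python
from typing import Optional


def part_to_attachment(part: dict) -> Optional[dict]:
    """Hypothetical mapping from a Part's inlineData to a display attachment."""
    blob = part.get("inlineData")
    if not blob or not blob.get("data"):
        return None  # not an inline binary part
    # Fallback type is an assumption; a real viewer may prefer to skip the part.
    mime_type = blob.get("mimeType") or "application/octet-stream"
    return {
        # Set the media type explicitly so the viewer can render it as an image.
        "mediaType": mime_type,
        # Keep Part's own field name (displayName) rather than "name".
        "displayName": blob.get("displayName"),
        # Assumes data is already base64-encoded, as in the session JSON.
        "url": f"data:{mime_type};base64,{blob['data']}",
    }


attachment = part_to_attachment(
    {"inlineData": {"data": "iVBORw0KGgo=", "mimeType": "image/png"}}
)
print(attachment["mediaType"])  # image/png
```

Parts without `inlineData` return `None`, so a caller can map every part and filter out the empty results.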