The backend logs show an auth error when trying to invoke a hosted agent.
Cause: DefaultAzureCredential cannot acquire a token for the Foundry Responses API.
Fix:
- Local dev (Docker Compose): Ensure you are logged in to Azure CLI:

  ```bash
  az login
  az account set --subscription <your-subscription-id>
  ```
- Azure (production): Verify the backend Container App's managed identity has the `Cognitive Services OpenAI User` role on the Foundry account, and the Foundry project's managed identity has the `Cognitive Services OpenAI Contributor` and `Azure AI User` roles on the Foundry account:
  - Check `infra/modules/role-assignments.bicep`
  - Re-run `azd provision` to reapply role assignments if missing
- Both: Confirm `AZURE_AI_PROJECT_ENDPOINT` is set correctly:
  `https://<account>.services.ai.azure.com/api/projects/<project-name>`
  The `/api/projects/<project-name>` segment is required — the bare account endpoint will not work.
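Since the wrong endpoint shape also shows up as a cause in several sections below, a quick sanity check can be scripted. This is a sketch only: the regex is an assumption derived from the example URL above, not an official Foundry specification.

```python
import re

# Shape from the docs: https://<account>.services.ai.azure.com/api/projects/<project-name>
_PROJECT_ENDPOINT = re.compile(
    r"^https://[a-z0-9][a-z0-9-]*\.services\.ai\.azure\.com"
    r"/api/projects/[A-Za-z0-9._-]+/?$"
)

def looks_like_project_endpoint(url: str) -> bool:
    """True only when the required /api/projects/<project-name> segment is present."""
    return _PROJECT_ENDPOINT.match(url) is not None

print(looks_like_project_endpoint(
    "https://myaccount.services.ai.azure.com/api/projects/my-project"))  # True
print(looks_like_project_endpoint(
    "https://myaccount.services.ai.azure.com"))  # False: bare account endpoint
```

Dropping a check like this into a startup path fails fast instead of surfacing as an opaque auth error later.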
PubMed literature search fails with `search_articles: PubMed search failed` but other MCP tools (ICD-10, Clinical Trials, NPI, CMS) work fine.
Cause: PubMed's MCP server (`pubmed.mcp.claude.com`) terminates idle sessions after ~10 minutes. The agent container reuses the same MCP session across requests. If the session has been idle too long between user submissions, PubMed responds with `McpError("Session terminated")`.
Fix (already applied): The clinical agent uses `_ReconnectingMCPTool` — a subclass of `MCPStreamableHTTPTool` that catches `Session terminated` errors and automatically reconnects with a fresh session. See `agents/clinical/main.py`.
If you still see this error, check:
- The container image was rebuilt after the fix (`azd up`)
- The agent version includes the `_ReconnectingMCPTool` change (check the image tag in `az cognitiveservices agent show`)
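The reconnect-on-terminated-session pattern is easy to see in isolation. The sketch below illustrates the idea only: `SessionTerminated`, `FlakySession`, and `ReconnectingTool` are stand-ins invented for this example, while the real `_ReconnectingMCPTool` subclasses `MCPStreamableHTTPTool` in `agents/clinical/main.py`.

```python
class SessionTerminated(Exception):
    """Stand-in for McpError('Session terminated')."""

class FlakySession:
    """Simulates an MCP session that the server can kill after idling."""
    def __init__(self) -> None:
        self.alive = True

    def call(self, tool_name: str) -> str:
        if not self.alive:
            raise SessionTerminated("Session terminated")
        return f"{tool_name}: ok"

class ReconnectingTool:
    """On a terminated session, reconnect once with a fresh session and retry."""
    def __init__(self, connect) -> None:
        self._connect = connect
        self._session = connect()

    def call(self, tool_name: str) -> str:
        try:
            return self._session.call(tool_name)
        except SessionTerminated:
            self._session = self._connect()  # fresh session
            return self._session.call(tool_name)

tool = ReconnectingTool(FlakySession)
tool._session.alive = False  # simulate the ~10-minute idle timeout between submissions
print(tool.call("search_articles"))  # prints "search_articles: ok" after reconnecting
```

The key design point is that the retry happens inside the tool wrapper, so callers never see the transient session failure.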
All agent calls fail with `400 - ID cannot be null or empty` or return `status: "failed"` with empty output.
Cause: MCPTool definitions in HostedAgentDefinition.tools cause the
agentserver-core adapter to inject a UserInfoContextMiddleware that calls
/agents/{name}/tools/resolve. This API is not available in all Foundry regions
(returns 404), which crashes the entire ASGI pipeline.
Fix: Ensure agents are registered with tools=[] in scripts/register_agents.py.
MCP tools are handled directly by MCPStreamableHTTPTool in each agent's main.py,
not via Foundry's MCPTool proxy. See the comments in scripts/register_agents.py
for details.
Agents connect but return `{"error": "...", "tool_results": []}` instead of structured output.
Cause 1 — Wrong project endpoint: AZURE_AI_PROJECT_ENDPOINT points to the bare account endpoint instead of the project endpoint.
Cause 2 — Agent not registered: The hosted agent was not successfully deployed/registered with Foundry Agent Service.
Cause 3 — Model deployment missing: The AZURE_OPENAI_DEPLOYMENT_NAME (e.g., gpt-5.4) doesn't exist in the Foundry project.
Fix:
- Verify agents are registered:

  ```bash
  python scripts/register_agents.py --list
  ```

- Confirm the endpoint format in `AZURE_AI_PROJECT_ENDPOINT` includes `/api/projects/<project>`.
- In the Foundry portal under Build → Deployments, confirm the gpt-5.4 deployment exists and its name matches `AZURE_OPENAI_DEPLOYMENT_NAME` in each `agent.yaml`.
The frontend shows an error when submitting a review.
Cause: The backend server is not running, or the frontend is not configured to reach it.
Fix:
- Ensure the backend is running:

  ```bash
  cd backend
  uvicorn app.main:app --reload
  ```

- Ensure `frontend/.env.local` has the correct backend URL:

  ```
  NEXT_PUBLIC_API_BASE=http://localhost:8000/api
  ```

- Restart the frontend dev server after changing `.env.local`:

  ```bash
  cd frontend
  npm run dev
  ```
The review fails as soon as a specialist phase starts.
Cause: One or more hosted agent endpoint URLs are missing or unreachable.
Fix: Check which dispatch mode is active:
Docker Compose (local dev): Verify all URL variables are set in `backend/.env` or `docker-compose.yml`:

- `HOSTED_AGENT_COMPLIANCE_URL`
- `HOSTED_AGENT_CLINICAL_URL`
- `HOSTED_AGENT_COVERAGE_URL`
- `HOSTED_AGENT_SYNTHESIS_URL`

Make sure `docker-compose.yml` is running all four agent containers and that their ports match the URLs above.
Foundry Hosted Agents (production / `azd up`): Verify these variables are set (injected automatically by Bicep):

- `AZURE_AI_PROJECT_ENDPOINT`
- `HOSTED_AGENT_CLINICAL_NAME`
- `HOSTED_AGENT_COMPLIANCE_NAME`
- `HOSTED_AGENT_COVERAGE_NAME`
- `HOSTED_AGENT_SYNTHESIS_NAME`

If set manually, confirm the project endpoint format:
`https://<account>.services.ai.azure.com/api/projects/<project-name>`
Also confirm the agents were successfully registered by scripts/register_agents.py
in the postprovision hook (check azd provision output for registration errors).
You can verify all deployment health at any time with:
```bash
python scripts/check_agents.py
```

This checks agent registration, App Insights connectivity, MCP tool connections, backend health, and frontend availability.
The backend reaches the hosted endpoint but receives an authorization failure.
Cause: The outbound auth header configuration does not match the hosted agent deployment.
Fix: Depends on the dispatch mode:
Docker Compose: Direct HTTP to containers — no auth configured by default. If containers are behind a proxy requiring auth, set HOSTED_AGENT_AUTH_HEADER, HOSTED_AGENT_AUTH_SCHEME, and HOSTED_AGENT_AUTH_TOKEN in .env.
Foundry Hosted Agents (production): Credentials come from DefaultAzureCredential. Common causes:
- The backend ACA managed identity is missing the `Cognitive Services OpenAI User` role on the Foundry account — check `infra/modules/role-assignments.bicep` and re-run `azd provision`
- The Foundry project managed identity is missing `Cognitive Services OpenAI Contributor` or `Azure AI User` on the Foundry account — these roles are required for hosted agent containers to call gpt-5.4 and use Agent Service data actions
- The deployer user is missing the `Azure AI User` role on the Foundry project (required by `scripts/register_agents.py` to register agents) — this role is auto-assigned by `az role assignment create` in the postprovision hook; re-run `azd up` to fix
- `AZURE_AI_PROJECT_ENDPOINT` is pointing to the wrong project or account
- The agents were not successfully registered — check `scripts/register_agents.py` output in the postprovision hook logs
`register_agents.py` fails with:

```
ERROR: (PermissionDenied) The principal ... lacks the required data action
Microsoft.CognitiveServices/accounts/AIServices/agents/write
```
Cause: Azure RBAC propagation delay. The postprovision hook assigns the Azure AI User role immediately before running register_agents.py, but Azure's role cache can take up to several minutes to update.
Automatic handling: The hook automatically detects newly assigned roles and retries register_agents.py every 10 seconds (up to 12 attempts / ~2 minutes). You'll see "Waiting for RBAC propagation (attempt N/12)..." messages in the output — this is expected on first deployment.
If all 12 retries fail: RBAC propagation took unusually long. Simply re-run azd up — the role already exists so registration will proceed without retries.
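The retry cadence the hook uses can be sketched generically. Everything below is illustrative (the actual hook is a deployment script, and `PermissionError` stands in for the RBAC failure); only the 12-attempt / 10-second cadence comes from the description above.

```python
import time

def retry_until_permitted(fn, attempts: int = 12, delay_s: float = 10.0, sleep=time.sleep):
    """Re-run fn until it stops failing with a permission error (RBAC lag)."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except PermissionError:
            if attempt == attempts:
                raise  # propagation took unusually long; re-run azd up
            print(f"Waiting for RBAC propagation (attempt {attempt}/{attempts})...")
            sleep(delay_s)

# Demo: fail twice, then succeed (sleep stubbed out so the demo runs instantly).
calls = {"n": 0}
def register():
    calls["n"] += 1
    if calls["n"] < 3:
        raise PermissionError("lacks the required data action")
    return "registered"

print(retry_until_permitted(register, sleep=lambda _s: None))  # registered
```

Retry-on-permission-denied is preferable to a fixed up-front sleep because the loop exits as soon as the role cache catches up.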
The backend reaches the hosted agent, but parsing or downstream validation fails.
Expected payload (Foundry Responses API envelope):
```json
{
  "output": [{"content": [{"text": "{\"field\": \"value\", ...}"}]}]
}
```

The backend uses `response.output_text` from the OpenAI SDK and parses the JSON result directly.
Fix: Confirm the agent container is returning the standard Foundry Responses
API envelope. MAF's from_agent_framework(agent).run() produces this format
automatically.
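For debugging outside the SDK, the envelope can be unwrapped by hand. This sketch assumes the payload shape shown above; joining the text parts is an approximation of what `response.output_text` yields, not the SDK helper's exact implementation.

```python
import json

envelope = {
    "output": [{"content": [{"text": "{\"field\": \"value\"}"}]}]
}

# Approximate response.output_text: join every text part in order.
output_text = "".join(
    part["text"]
    for item in envelope["output"]
    for part in item.get("content", [])
    if "text" in part
)

result = json.loads(output_text)  # the agent's structured JSON
print(result)  # {'field': 'value'}
```

If `json.loads` fails here, the agent container is not emitting the standard envelope, which points back to the Fix above.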
After killing a server process, the port remains in LISTENING state.
Cause: Windows TCP socket lingering.
Fix: Wait 2-4 minutes for the socket to clear, or use a different port.
One or more agents return partial data with missing top-level keys.
Cause: The agent's response_format structured output was not fully populated by the model response, typically due to token limits or a model timeout.
Symptoms in server logs:
```
WARNING app.agents.orchestrator: Clinical Reviewer Agent returned incomplete result (attempt 1/2). Missing keys: clinical_extraction, clinical_summary. Retrying...
INFO app.agents.orchestrator: Clinical Reviewer Agent succeeded on retry (attempt 2/2)
```
A normal Clinical result has 3 expected top-level keys (diagnosis_validation, clinical_extraction, clinical_summary). Additional fields like procedure_validation, tool_results, and clinical_trials are also present but not checked by the validation gate.
Mitigations (in place):
- Result validation — checks for expected top-level keys via `_EXPECTED_KEYS` in `orchestrator.py`
- Automatic retry — retries once (`_MAX_AGENT_RETRIES = 1`) if validation fails
- SSE warnings — surfaces validation warnings to the frontend
If retries consistently fail:
- The agent's `HOSTED_AGENT_TIMEOUT_SECONDS` (default `180`) may be too low — increase it in `backend/.env`
- Check the agent container logs in the Foundry portal for model errors or context overflow
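The validation-and-retry gate described above can be sketched as follows. The three key names match the Clinical example; `_EXPECTED_KEYS` and `_MAX_AGENT_RETRIES` are the real knobs in `orchestrator.py`, but the surrounding code here is an illustration, not the backend implementation.

```python
EXPECTED_KEYS = {"diagnosis_validation", "clinical_extraction", "clinical_summary"}
MAX_RETRIES = 1  # mirrors _MAX_AGENT_RETRIES

def missing_keys(result: dict) -> list[str]:
    """Expected top-level keys absent from an agent result."""
    return sorted(EXPECTED_KEYS - result.keys())

def run_with_retry(call_agent):
    for attempt in range(1, MAX_RETRIES + 2):  # initial attempt + MAX_RETRIES retries
        result = call_agent()
        gap = missing_keys(result)
        if not gap:
            return result
        print(f"incomplete result (attempt {attempt}/{MAX_RETRIES + 1}), missing: {gap}")
    return result  # still incomplete: surfaced to the frontend as a warning

# Demo: the first call is truncated, the retry succeeds.
responses = iter([
    {"diagnosis_validation": {}},
    {"diagnosis_validation": {}, "clinical_extraction": {}, "clinical_summary": {}},
])
ok = run_with_retry(lambda: next(responses))
print(sorted(ok))  # all three expected keys present
```

Validating only top-level keys keeps the gate cheap while still catching the common truncation failure mode.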
If traces don't appear in Foundry (Trace ID = "--", Duration = "--", Tokens = "--"):
Known limitation (current Hosted Agents Preview): The Foundry Traces tab does not display trace data for hosted agents even when all span attributes are correctly populated in App Insights. The Traces tab reads from a Foundry internal OTEL collector, which does not surface hosted agent spans in the current version. The Monitor tab (which reads from App Insights) works correctly. This is expected to be fixed in the vNext hosted agents backend. The `_patch_trace_agent_id()` monkey-patch in each agent's `main.py` should be removed once vNext is available.
- Verify the Foundry project has Application Insights configured
- If App Insights was added after agent registration, unregister and re-register
- Verify your backend sends traces to the same Application Insights resource
- Verify `gen_ai.agent.id` is populated in spans. The Foundry portal uses this attribute to correlate traces to registered agents. The agentserver adapter (v1.0.0b17) reads `gen_ai.agent.id` from the request payload's `agent` field via `AgentRunContext.get_agent_id_object()`. However, Foundry Agent Service does not include the `agent` reference when forwarding requests to hosted containers — resulting in empty `gen_ai.agent.id`/`gen_ai.agent.name` in spans and Trace ID = "--" in the Foundry portal. Fix: monkey-patch `AgentRunContextMiddleware.set_run_context_to_context_var` to inject the agent name as a fallback. All four agents in this project apply this patch via `_patch_trace_agent_id()` — see any agent's `main.py` for the implementation. You can verify by querying App Insights:

  ```kusto
  traces
  | where cloud_RoleName == 'azure.ai.agentserver'
  | where message has 'agent_run'
  | extend agentId = tostring(parse_json(customDimensions)['gen_ai.agent.id'])
  | project timestamp, agentId
  ```

  If `agentId` is empty, the patch is not applied.
- Check the env var name: The Foundry agentserver adapter expects `APPLICATIONINSIGHTS_CONNECTION_STRING` (no underscore between APPLICATION and INSIGHTS). This is different from `APPLICATION_INSIGHTS_CONNECTION_STRING` used by the `azure-monitor-opentelemetry` SDK. Both must be set. See technical-notes.md for details.
- If the Foundry Operate tab shows "0/3 monitoring features enabled," the adapter-expected env var is missing or empty.
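A trivial helper can catch the two-spellings pitfall early. The variable names are the ones documented above; the helper itself is just a sketch you might drop into a startup check, and the connection-string value in the demo is a placeholder.

```python
import os

# Both spellings must be present: the agentserver adapter reads the first,
# the azure-monitor-opentelemetry SDK reads the second.
REQUIRED_APPINSIGHTS_VARS = (
    "APPLICATIONINSIGHTS_CONNECTION_STRING",   # adapter: no underscore
    "APPLICATION_INSIGHTS_CONNECTION_STRING",  # SDK: with underscore
)

def missing_appinsights_vars(env=os.environ) -> list[str]:
    """Names of required App Insights variables that are unset or empty."""
    return [name for name in REQUIRED_APPINSIGHTS_VARS if not env.get(name)]

# Demo with a fake environment containing only the SDK spelling:
print(missing_appinsights_vars({"APPLICATION_INSIGHTS_CONNECTION_STRING": "placeholder"}))
# ['APPLICATIONINSIGHTS_CONNECTION_STRING']
```

Running this in each agent's startup would surface the "0/3 monitoring features enabled" symptom as an explicit log line instead of a silent gap.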