The backend is a pure HTTP dispatcher (FastAPI). It has no local AI runtime. All specialist reasoning runs in four independent Foundry Hosted Agent containers.
Frontend (Next.js / ACA)
└── POST /api/review/stream (SSE)
└── FastAPI Backend / Orchestrator (ACA)
│
├─── [Docker Compose — local dev] ──────────────────────────────────────
│ POST http://agent-{name}/responses (HOSTED_AGENT_*_URL)
│ → Clinical / Compliance / Coverage / Synthesis Container
│
└─── [Foundry Hosted Agents — production (azd up)] ──────────────────
POST {AZURE_AI_PROJECT_ENDPOINT}/responses
with agent_reference: {name, type: "agent_reference"}
Authorization: Bearer <DefaultAzureCredential>
→ Foundry Agent Service → registered agent containers
Each agent container runs Microsoft Agent Framework (MAF) via
azure.ai.agentserver.agentframework.from_agent_framework, exposes an HTTP
endpoint, and is registered with Microsoft Foundry as a Hosted Agent via
scripts/register_agents.py.
Each agent's main.py creates MCPStreamableHTTPTool instances with a shared
httpx.AsyncClient (including User-Agent: claude-code/1.0 for DeepSense
CloudFront routing). Tools are passed via tools=[...] to .as_agent() and
called directly during inference.
Note:
scripts/register_agents.pycreates Foundry project connections for portal visibility (Build → Tools), butMCPTooldefinitions onHostedAgentDefinitionare disabled (tools=[]) because the Foundrytools/resolveAPI is not GA in all regions.
PubMed's MCP server terminates idle sessions after ~10 minutes. The clinical
agent uses _ReconnectingMCPTool — a subclass of MCPStreamableHTTPTool that
catches McpError('Session terminated') and auto-reconnects with a fresh
session. Other MCP servers (DeepSense) use standard MCPStreamableHTTPTool.
Each agent loads its SKILL.md via SkillsProvider:
skills_provider = SkillsProvider(
skill_paths=str(Path(__file__).parent / "skills")
)SKILL.md files live alongside the agent:
agents/
clinical/skills/clinical-review/SKILL.md # ICD-10 validation, clinical extraction (< 60% warning), literature + trials
coverage/skills/coverage-assessment/SKILL.md # Provider NPI, specialty-procedure match, CMS policy, criteria mapping
compliance/skills/compliance-review/SKILL.md # 10-item checklist (items 9: NCCI, 10: service type are non-blocking)
synthesis/skills/synthesis-decision/SKILL.md # Gate rubric, weighted confidence, synthesis_audit_trail output
Each agent container declares a local Pydantic model in schemas.py and
passes it via MAF's default_options parameter:
agent = (
AzureOpenAIResponsesClient(...)
.as_agent(
name="clinical-reviewer-agent",
id="clinical-reviewer-agent", # Must match registered name for Foundry Traces
tools=[...],
default_options={"response_format": ClinicalResult},
)
)
app = from_agent_framework(agent)
_patch_trace_agent_id(app, "clinical-reviewer-agent") # Fix gen_ai.agent.id for Foundry Traces
app.run()MAF enforces the schema as a token-level JSON constraint at inference time — no post-processing or regex extraction needed. The backend dispatcher reads the text payload from the OpenAI SDK response:
# hosted_agents.py
output_text = response.output_text
return json.loads(output_text)The Pydantic models live in each agent container:
| Agent | Schema file | Root model |
|---|---|---|
| Clinical | agents/clinical/schemas.py |
ClinicalResult |
| Compliance | agents/compliance/schemas.py |
ComplianceResult |
| Coverage | agents/coverage/schemas.py |
CoverageResult |
| Synthesis | agents/synthesis/schemas.py |
SynthesisOutput (includes synthesis_audit_trail: str — JSON-encoded audit trail with gate_results and confidence_components; parsed back to dict by the orchestrator) |
Phase 1 (parallel): Compliance + Clinical agents
Phase 2 (sequential): Coverage agent (receives clinical findings)
Phase 3: Synthesis agent (receives all three results)
Phase 4: Audit trail + PDF generation
| Mechanism | Where | What it does |
|---|---|---|
| Result validation | _validate_agent_result() |
Checks expected top-level keys |
| Automatic retry | _safe_run() |
Retries once if validation fails |
| SSE status warnings | Phase events | Reports status "warning" for incomplete results |
| Tool result normalization | _normalize_tool_result() |
Maps non-standard status values |
Gate 1: Provider NPI verification → Gate 2: Code validation → Gate 3: Medical necessity
Default to PEND at any uncertain gate. Never DENY in LENIENT mode.
- Review completes → stored in-memory (reviewed via
GET /api/reviews) - Frontend shows Accept / Override panel
POST /api/decisionprevents double-decisions (409)- Generates thread-safe authorization number (
PA-YYYYMMDD-XXXXX) - Produces notification letter (approval or pend) in text and PDF
Letter types:
- Approval — auth number, 90-day validity, coverage criteria met, clinical rationale
- Pend — confidence level, missing documentation, 30-day deadline, appeal rights
Pre-flight step before agents execute:
- Format validation — regex for CPT (5-digit) or HCPCS (letter + 4 digits)
- Curated lookup — ~30 common PA-trigger codes
- Results injected into synthesis prompt for Gate 2
The frontend includes a "Load Sample Case" button for a CT-guided transbronchial lung biopsy case:
| Field | Value |
|---|---|
| Patient | John Smith, DOB 1958-03-15 |
| Provider NPI | 1720180003 (active pulmonologist) |
| ICD-10 codes | R91.1, J18.9, R05.9 |
| CPT code | 31628 |
| Insurance ID | MCR-123456789A |
All five processes — the FastAPI backend and all four agent containers — export
OpenTelemetry traces and metrics to Azure Application Insights via
azure-monitor-opentelemetry. Agent traces are also visible in the Foundry
portal's built-in Traces view when App Insights is linked to the Foundry project.
| Process | OTEL_SERVICE_NAME |
What it instruments |
|---|---|---|
| FastAPI backend | prior-auth-backend |
HTTP requests/responses, outgoing httpx calls to agents, logs, exceptions, live metrics |
| Clinical agent | agent-clinical |
MAF invoke_agent, chat, execute_tool spans, token metrics |
| Coverage agent | agent-coverage |
Same as above |
| Compliance agent | agent-compliance |
Same as above |
| Synthesis agent | agent-synthesis |
Same as above |
Each process configures observability differently based on its role:
- Backend (
observability.py): Callsconfigure_azure_monitor()directly before the FastAPI app starts. This is the standard Azure Monitor SDK pattern for non-MAF applications. - Agent containers: Do NOT call
configure_azure_monitor()manually. Instead, the Foundry agentserver adapter'sinit_tracing()method (called internally byfrom_agent_framework(agent).run()) handles the full OTel setup — creating exporters, callingconfigure_otel_providers(), and enabling MAF instrumentation. Agent code only sets env vars before the adapter runs.
Why agents don't call
configure_azure_monitor()directly: The adapter'sinit_tracing()callsconfigure_otel_providers()which replaces any existing OTel providers. If agent code callsconfigure_azure_monitor()first, the adapter overwrites it — creating a conflict where traces go to App Insights but the Foundry portal can't correlate them (Trace ID = "--", Duration = "--"). Letting the adapter handle everything avoids this conflict.
TODO (vNext): The
_patch_trace_agent_id()monkey-patch in each agent'smain.pyis a workaround for the current Hosted Agents Preview. It should be removed when migrating to the vNext hosted agents backend, which handles telemetry at the platform level via Entra-based agent identity.
The agentserver adapter (v1.0.0b17) has two gaps that prevent Foundry trace correlation for hosted agents:
-
Missing
gen_ai.agent.idon spans. The adapter reads this from the request payload'sagentfield (viaAgentRunContext.get_agent_id_object()), but Foundry Agent Service does not include theagentreference when forwarding requests to hosted containers. -
Missing Foundry env var dimensions on spans. The adapter populates
azure.ai.agentserver.agent_id,agent_name, andagent_project_resource_idon log records (viaCustomDimensionsFilter/get_dimensions()), but NOT on OTel spans (requests/dependencies tables).
Fix: All four agent containers monkey-patch
AgentRunContextMiddleware.set_run_context_to_context_var (via
_patch_trace_agent_id() in each main.py) to inject both gen_ai.agent.id
and the Foundry env var dimensions into the span context.
Current status: The patch correctly populates all attributes in App Insights spans. However, the Foundry Traces tab still shows "--" because it reads from a Foundry internal OTEL collector pipeline that does not surface hosted agent data in the current Preview version. The Monitor tab (App Insights) works correctly.
The adapter's configure_otel_providers() hard-codes enable_sensitive_data=True,
which records full LLM prompts, tool arguments, and results in telemetry spans.
This cannot currently be overridden via environment variable — the MAF env var
ENABLE_SENSITIVE_DATA is ignored because the adapter passes the value explicitly.
⚠️ Production consideration: Withenable_sensitive_data=True, PA request content (patient names, DOBs, diagnoses, clinical notes) will be stored in Application Insights telemetry. Ensure your App Insights resource has appropriate access controls and data retention policies. If this is a concern, contact theazure-ai-agentserverteam about making this configurable, or reduce App Insights data retention to the minimum required period.
Because OTEL_SERVICE_NAME is set in every process, App Insights
Application Map renders a clean 5-node topology:
prior-auth-backend
├──► agent-compliance
├──► agent-clinical
├──► agent-coverage
└──► agent-synthesis
Edges are drawn from the backend's auto-instrumented outgoing httpx dependency spans. W3C trace context headers propagate across process boundaries so App Insights stitches the end-to-end call graph automatically — no manual correlation ID wiring is needed.
OTEL_SERVICE_NAME is set via os.environ.setdefault(...) so an explicit
env var configured in the Container App (e.g., via Bicep or the ACA portal)
always overrides the in-code default.
prior_auth_review (request_id)
├── phase_1_parallel
│ ├── compliance_agent_dispatch
│ └── clinical_agent_dispatch
├── phase_2_coverage
│ └── coverage_agent_dispatch
├── phase_3_synthesis
│ └── synthesis_agent_dispatch
└── phase_4_audit
| Span | Emitted by | Key attributes |
|---|---|---|
invoke_agent |
MAF | agent name, status |
chat |
MAF | model deployment, turn index |
execute_tool |
MAF | tool name, tool result status |
These spans are children of the backend *_agent_dispatch dependency spans,
creating an end-to-end trace from HTTP request → backend orchestration → agent
tool calls.
| Span | Key attributes |
|---|---|
prior_auth_review |
request_id |
phase_1_parallel |
compliance_status, clinical_status |
phase_2_coverage |
coverage_status |
phase_3_synthesis |
recommendation, confidence |
phase_4_audit |
confidence, confidence_level |
Set the same connection string in all five containers (Bicep injects this
automatically from the shared monitoring module output):
APPLICATION_INSIGHTS_CONNECTION_STRING=InstrumentationKey=<key>;IngestionEndpoint=...Important: Dual env var names for agent containers. The Foundry agentserver adapter reads a different env var name than the Azure Monitor SDK:
| Package | Env var name | Convention |
|---|---|---|
azure-monitor-opentelemetry (backend) |
APPLICATION_INSIGHTS_CONNECTION_STRING |
Azure Monitor SDK |
azure-ai-agentserver (Foundry adapter) |
APPLICATIONINSIGHTS_CONNECTION_STRING |
Azure App Service |
Agent code bridges this by calling os.environ.setdefault("APPLICATIONINSIGHTS_CONNECTION_STRING", ...)
when APPLICATION_INSIGHTS_CONNECTION_STRING is set. Both env vars are also
passed to agent containers via register_agents.py. Without the adapter-expected
name, the adapter's init_tracing() skips setup entirely and the Foundry portal
shows empty Trace ID / Duration / Tokens and "0/3 monitoring features enabled."
Locally (docker-compose), the variable is intentionally absent — all observability blocks are no-ops so the app runs without App Insights.
hosted_agents.py automatically selects the dispatch mode based on environment:
- URL set (
HOSTED_AGENT_*_URL): direct HTTP to the container — Docker Compose mode - URL empty +
AZURE_AI_PROJECT_ENDPOINTset: Foundryagent_referencerouting — production mode
Docker Compose mode — HOSTED_AGENT_*_URL vars (defaults already in docker-compose.yml):
| Agent | Variable | Default |
|---|---|---|
| Clinical | HOSTED_AGENT_CLINICAL_URL |
http://agent-clinical:8088 |
| Compliance | HOSTED_AGENT_COMPLIANCE_URL |
http://agent-compliance:8088 |
| Coverage | HOSTED_AGENT_COVERAGE_URL |
http://agent-coverage:8088 |
| Synthesis | HOSTED_AGENT_SYNTHESIS_URL |
http://agent-synthesis:8088 |
Shared: HOSTED_AGENT_TIMEOUT_SECONDS (default 180).
Foundry Hosted Agents mode — injected automatically by Bicep via azd up:
| Variable | Value |
|---|---|
AZURE_AI_PROJECT_ENDPOINT |
https://<account>.services.ai.azure.com/api/projects/<project> |
HOSTED_AGENT_CLINICAL_NAME |
clinical-reviewer-agent |
HOSTED_AGENT_COMPLIANCE_NAME |
compliance-agent |
HOSTED_AGENT_COVERAGE_NAME |
coverage-assessment-agent |
HOSTED_AGENT_SYNTHESIS_NAME |
synthesis-agent |
Token acquisition uses azure.identity.aio.DefaultAzureCredential — no manual token configuration needed.
The following RBAC roles are automatically assigned during azd up:
| Role | Principal | Scope | How Assigned | Purpose |
|---|---|---|---|---|
| Cognitive Services OpenAI User | Backend Container App managed identity | Foundry account | role-assignments.bicep (provision) |
Orchestrator calls Foundry Responses API with agent_reference routing |
| AcrPull | Foundry project managed identity | Container Registry | role-assignments.bicep (provision) |
Foundry Agent Service pulls agent container images from ACR |
| Cognitive Services OpenAI Contributor | Foundry project managed identity | Foundry account | role-assignments.bicep (provision) |
Hosted agent containers call gpt-5.4 via the Responses API |
| Azure AI User | Foundry project managed identity | Foundry account | role-assignments.bicep (provision) |
Hosted agent containers use Foundry Agent Service data actions |
| Azure AI User | Deployer (user running azd up) |
Foundry project | az role assignment create (postprovision hook) |
register_agents.py registers agents via Foundry Agent Service API |
| Azure AI User | Backend Container App managed identity | Foundry project | az role assignment create (postprovision hook) |
Backend calls Foundry Hosted Agents at runtime via DefaultAzureCredential |
The first four roles are assigned by infra/modules/role-assignments.bicep during azd provision. The remaining Azure AI User roles are assigned via az role assignment create in the postprovision hook — this is intentionally outside Bicep because the CLI command is natively idempotent (no error if the role was previously granted manually).
First-run note: Azure RBAC propagation can take up to several minutes after a new role assignment. On the very first
azd up(when the Azure AI User role is newly created), the postprovision hook automatically retriesregister_agents.pyevery 10 seconds (up to 12 attempts) until the permission propagates. On subsequent runs the role already exists and no retries are needed.
After azd provision, scripts/register_agents.py (called from the azure.yaml postprovision hook)
registers all four agents with Foundry:
- Calls
azure-ai-projectsSDKclient.agents.create_version()with the ACR container image and resource specs fromagent.yaml - Calls
az cognitiveservices agent startto start each agent under Foundry management
Resource specs (defined in each agents/<name>/agent.yaml):
| Agent | CPU | Memory |
|---|---|---|
clinical-reviewer-agent |
1 |
2Gi |
coverage-assessment-agent |
1 |
2Gi |
compliance-agent |
0.5 |
1Gi |
synthesis-agent |
1 |
2Gi |
| Agent ID | Module |
|---|---|
compliance-agent |
agents/compliance/main.py |
clinical-reviewer-agent |
agents/clinical/main.py |
coverage-assessment-agent |
agents/coverage/main.py |
synthesis-agent |
agents/synthesis/main.py |