Technical Notes

Architecture Overview

The backend is a pure HTTP dispatcher (FastAPI). It has no local AI runtime. All specialist reasoning runs in four independent Foundry Hosted Agent containers.

Frontend (Next.js / ACA)
  └── POST /api/review/stream   (SSE)
        └── FastAPI Backend / Orchestrator (ACA)
              │
              ├─── [Docker Compose — local dev] ──────────────────────────────────────
              │    POST http://agent-{name}/responses   (HOSTED_AGENT_*_URL)
              │    → Clinical / Compliance / Coverage / Synthesis Container
              │
              └─── [Foundry Hosted Agents — production (azd up)] ──────────────────
                   POST {AZURE_AI_PROJECT_ENDPOINT}/responses
                   with  agent_reference: {name, type: "agent_reference"}
                         Authorization: Bearer <DefaultAzureCredential>
                   → Foundry Agent Service → registered agent containers

Each agent container runs Microsoft Agent Framework (MAF) via azure.ai.agentserver.agentframework.from_agent_framework, exposes an HTTP endpoint, and is registered with Microsoft Foundry as a Hosted Agent via scripts/register_agents.py.

MCP Tool Connections

Each agent's main.py creates MCPStreamableHTTPTool instances with a shared httpx.AsyncClient (including User-Agent: claude-code/1.0 for DeepSense CloudFront routing). Tools are passed via tools=[...] to .as_agent() and called directly during inference.

Note: scripts/register_agents.py creates Foundry project connections for portal visibility (Build → Tools), but MCPTool definitions on HostedAgentDefinition are disabled (tools=[]) because the Foundry tools/resolve API is not GA in all regions.

PubMed Session Reconnect

PubMed's MCP server terminates idle sessions after ~10 minutes. The clinical agent uses _ReconnectingMCPTool — a subclass of MCPStreamableHTTPTool that catches McpError('Session terminated') and auto-reconnects with a fresh session. Other MCP servers (DeepSense) use standard MCPStreamableHTTPTool.

Agent Skills

Each agent loads its SKILL.md via SkillsProvider:

skills_provider = SkillsProvider(
    skill_paths=str(Path(__file__).parent / "skills")
)

SKILL.md files live alongside the agent:

agents/
  clinical/skills/clinical-review/SKILL.md      # ICD-10 validation, clinical extraction (< 60% warning), literature + trials
  coverage/skills/coverage-assessment/SKILL.md  # Provider NPI, specialty-procedure match, CMS policy, criteria mapping
  compliance/skills/compliance-review/SKILL.md  # 10-item checklist (items 9: NCCI, 10: service type are non-blocking)
  synthesis/skills/synthesis-decision/SKILL.md  # Gate rubric, weighted confidence, synthesis_audit_trail output

Structured Output

Each agent container declares a local Pydantic model in schemas.py and passes it via MAF's default_options parameter:

agent = (
    AzureOpenAIResponsesClient(...)
    .as_agent(
        name="clinical-reviewer-agent",
        id="clinical-reviewer-agent",  # Must match registered name for Foundry Traces
        tools=[...],
        default_options={"response_format": ClinicalResult},
    )
)
app = from_agent_framework(agent)
_patch_trace_agent_id(app, "clinical-reviewer-agent")  # Fix gen_ai.agent.id for Foundry Traces
app.run()

MAF enforces the schema as a token-level JSON constraint at inference time — no post-processing or regex extraction needed. The backend dispatcher reads the text payload from the OpenAI SDK response:

# hosted_agents.py
output_text = response.output_text
return json.loads(output_text)

The Pydantic models live in each agent container:

Agent	Schema file	Root model
Clinical	`agents/clinical/schemas.py`	`ClinicalResult`
Compliance	`agents/compliance/schemas.py`	`ComplianceResult`
Coverage	`agents/coverage/schemas.py`	`CoverageResult`
Synthesis	`agents/synthesis/schemas.py`	`SynthesisOutput` (includes `synthesis_audit_trail: str` — JSON-encoded audit trail with `gate_results` and `confidence_components`; parsed back to `dict` by the orchestrator)

Orchestration Flow

Phase 1 (parallel):   Compliance + Clinical agents
Phase 2 (sequential): Coverage agent (receives clinical findings)
Phase 3:              Synthesis agent (receives all three results)
Phase 4:              Audit trail + PDF generation

Resilience

Mechanism	Where	What it does
Result validation	`_validate_agent_result()`	Checks expected top-level keys
Automatic retry	`_safe_run()`	Retries once if validation fails
SSE status warnings	Phase events	Reports status "warning" for incomplete results
Tool result normalization	`_normalize_tool_result()`	Maps non-standard status values

Decision Gate (LENIENT MODE)

Gate 1: Provider NPI verification → Gate 2: Code validation → Gate 3: Medical necessity

Default to PEND at any uncertain gate. Never DENY in LENIENT mode.

Decision and Notification Flow

Review completes → stored in-memory (reviewed via GET /api/reviews)
Frontend shows Accept / Override panel
POST /api/decision prevents double-decisions (409)
Generates thread-safe authorization number (PA-YYYYMMDD-XXXXX)
Produces notification letter (approval or pend) in text and PDF

Letter types:

Approval — auth number, 90-day validity, coverage criteria met, clinical rationale
Pend — confidence level, missing documentation, 30-day deadline, appeal rights

CPT/HCPCS Validation

Pre-flight step before agents execute:

Format validation — regex for CPT (5-digit) or HCPCS (letter + 4 digits)
Curated lookup — ~30 common PA-trigger codes
Results injected into synthesis prompt for Gate 2

Sample Data

The frontend includes a "Load Sample Case" button for a CT-guided transbronchial lung biopsy case:

Field	Value
Patient	John Smith, DOB 1958-03-15
Provider NPI	1720180003 (active pulmonologist)
ICD-10 codes	R91.1, J18.9, R05.9
CPT code	31628
Insurance ID	MCR-123456789A

Observability

All five processes — the FastAPI backend and all four agent containers — export OpenTelemetry traces and metrics to Azure Application Insights via azure-monitor-opentelemetry. Agent traces are also visible in the Foundry portal's built-in Traces view when App Insights is linked to the Foundry project.

Process Roles

Process	`OTEL_SERVICE_NAME`	What it instruments
FastAPI backend	`prior-auth-backend`	HTTP requests/responses, outgoing httpx calls to agents, logs, exceptions, live metrics
Clinical agent	`agent-clinical`	MAF `invoke_agent`, `chat`, `execute_tool` spans, token metrics
Coverage agent	`agent-coverage`	Same as above
Compliance agent	`agent-compliance`	Same as above
Synthesis agent	`agent-synthesis`	Same as above

Each process configures observability differently based on its role:

Backend (observability.py): Calls configure_azure_monitor() directly before the FastAPI app starts. This is the standard Azure Monitor SDK pattern for non-MAF applications.
Agent containers: Do NOT call configure_azure_monitor() manually. Instead, the Foundry agentserver adapter's init_tracing() method (called internally by from_agent_framework(agent).run()) handles the full OTel setup — creating exporters, calling configure_otel_providers(), and enabling MAF instrumentation. Agent code only sets env vars before the adapter runs.

Why agents don't call configure_azure_monitor() directly: The adapter's init_tracing() calls configure_otel_providers() which replaces any existing OTel providers. If agent code calls configure_azure_monitor() first, the adapter overwrites it — creating a conflict where traces go to App Insights but the Foundry portal can't correlate them (Trace ID = "--", Duration = "--"). Letting the adapter handle everything avoids this conflict.

Agent ID / Name for Trace Correlation

TODO (vNext): The _patch_trace_agent_id() monkey-patch in each agent's main.py is a workaround for the current Hosted Agents Preview. It should be removed when migrating to the vNext hosted agents backend, which handles telemetry at the platform level via Entra-based agent identity.

The agentserver adapter (v1.0.0b17) has two gaps that prevent Foundry trace correlation for hosted agents:

Missing gen_ai.agent.id on spans. The adapter reads this from the request payload's agent field (via AgentRunContext.get_agent_id_object()), but Foundry Agent Service does not include the agent reference when forwarding requests to hosted containers.
Missing Foundry env var dimensions on spans. The adapter populates azure.ai.agentserver.agent_id, agent_name, and agent_project_resource_id on log records (via CustomDimensionsFilter/get_dimensions()), but NOT on OTel spans (requests/dependencies tables).

Fix: All four agent containers monkey-patch AgentRunContextMiddleware.set_run_context_to_context_var (via _patch_trace_agent_id() in each main.py) to inject both gen_ai.agent.id and the Foundry env var dimensions into the span context.

Current status: The patch correctly populates all attributes in App Insights spans. However, the Foundry Traces tab still shows "--" because it reads from a Foundry internal OTEL collector pipeline that does not surface hosted agent data in the current Preview version. The Monitor tab (App Insights) works correctly.

Content Recording (Sensitive Data)

The adapter's configure_otel_providers() hard-codes enable_sensitive_data=True, which records full LLM prompts, tool arguments, and results in telemetry spans. This cannot currently be overridden via environment variable — the MAF env var ENABLE_SENSITIVE_DATA is ignored because the adapter passes the value explicitly.

⚠️ Production consideration: With enable_sensitive_data=True, PA request content (patient names, DOBs, diagnoses, clinical notes) will be stored in Application Insights telemetry. Ensure your App Insights resource has appropriate access controls and data retention policies. If this is a concern, contact the azure-ai-agentserver team about making this configurable, or reduce App Insights data retention to the minimum required period.

Application Map

Because OTEL_SERVICE_NAME is set in every process, App Insights Application Map renders a clean 5-node topology:

prior-auth-backend
  ├──► agent-compliance
  ├──► agent-clinical
  ├──► agent-coverage
  └──► agent-synthesis

Edges are drawn from the backend's auto-instrumented outgoing httpx dependency spans. W3C trace context headers propagate across process boundaries so App Insights stitches the end-to-end call graph automatically — no manual correlation ID wiring is needed.

OTEL_SERVICE_NAME is set via os.environ.setdefault(...) so an explicit env var configured in the Container App (e.g., via Bicep or the ACA portal) always overrides the in-code default.

Trace Hierarchy (backend layer)

prior_auth_review (request_id)
  ├── phase_1_parallel
  │     ├── compliance_agent_dispatch
  │     └── clinical_agent_dispatch
  ├── phase_2_coverage
  │     └── coverage_agent_dispatch
  ├── phase_3_synthesis
  │     └── synthesis_agent_dispatch
  └── phase_4_audit

MAF Spans (agent layer — all four containers)

Span	Emitted by	Key attributes
`invoke_agent`	MAF	agent name, status
`chat`	MAF	model deployment, turn index
`execute_tool`	MAF	tool name, tool result status

These spans are children of the backend *_agent_dispatch dependency spans, creating an end-to-end trace from HTTP request → backend orchestration → agent tool calls.

Custom Backend Span Attributes

Span	Key attributes
`prior_auth_review`	`request_id`
`phase_1_parallel`	`compliance_status`, `clinical_status`
`phase_2_coverage`	`coverage_status`
`phase_3_synthesis`	`recommendation`, `confidence`
`phase_4_audit`	`confidence`, `confidence_level`

Enabling Observability

Set the same connection string in all five containers (Bicep injects this automatically from the shared monitoring module output):

APPLICATION_INSIGHTS_CONNECTION_STRING=InstrumentationKey=<key>;IngestionEndpoint=...

Important: Dual env var names for agent containers. The Foundry agentserver adapter reads a different env var name than the Azure Monitor SDK:

Package	Env var name	Convention
`azure-monitor-opentelemetry` (backend)	`APPLICATION_INSIGHTS_CONNECTION_STRING`	Azure Monitor SDK
`azure-ai-agentserver` (Foundry adapter)	`APPLICATIONINSIGHTS_CONNECTION_STRING`	Azure App Service

Agent code bridges this by calling os.environ.setdefault("APPLICATIONINSIGHTS_CONNECTION_STRING", ...) when APPLICATION_INSIGHTS_CONNECTION_STRING is set. Both env vars are also passed to agent containers via register_agents.py. Without the adapter-expected name, the adapter's init_tracing() skips setup entirely and the Foundry portal shows empty Trace ID / Duration / Tokens and "0/3 monitoring features enabled."

Locally (docker-compose), the variable is intentionally absent — all observability blocks are no-ops so the app runs without App Insights.

Hosted Agent Dispatch Settings

hosted_agents.py automatically selects the dispatch mode based on environment:

URL set (HOSTED_AGENT_*_URL): direct HTTP to the container — Docker Compose mode
URL empty + AZURE_AI_PROJECT_ENDPOINT set: Foundry agent_reference routing — production mode

Docker Compose mode — HOSTED_AGENT_*_URL vars (defaults already in docker-compose.yml):

Agent	Variable	Default
Clinical	`HOSTED_AGENT_CLINICAL_URL`	`http://agent-clinical:8088`
Compliance	`HOSTED_AGENT_COMPLIANCE_URL`	`http://agent-compliance:8088`
Coverage	`HOSTED_AGENT_COVERAGE_URL`	`http://agent-coverage:8088`
Synthesis	`HOSTED_AGENT_SYNTHESIS_URL`	`http://agent-synthesis:8088`

Shared: HOSTED_AGENT_TIMEOUT_SECONDS (default 180).

Foundry Hosted Agents mode — injected automatically by Bicep via azd up:

Variable	Value
`AZURE_AI_PROJECT_ENDPOINT`	`https://<account>.services.ai.azure.com/api/projects/<project>`
`HOSTED_AGENT_CLINICAL_NAME`	`clinical-reviewer-agent`
`HOSTED_AGENT_COMPLIANCE_NAME`	`compliance-agent`
`HOSTED_AGENT_COVERAGE_NAME`	`coverage-assessment-agent`
`HOSTED_AGENT_SYNTHESIS_NAME`	`synthesis-agent`

Token acquisition uses azure.identity.aio.DefaultAzureCredential — no manual token configuration needed.

The following RBAC roles are automatically assigned during azd up:

Role	Principal	Scope	How Assigned	Purpose
Cognitive Services OpenAI User	Backend Container App managed identity	Foundry account	`role-assignments.bicep` (provision)	Orchestrator calls Foundry Responses API with `agent_reference` routing
AcrPull	Foundry project managed identity	Container Registry	`role-assignments.bicep` (provision)	Foundry Agent Service pulls agent container images from ACR
Cognitive Services OpenAI Contributor	Foundry project managed identity	Foundry account	`role-assignments.bicep` (provision)	Hosted agent containers call gpt-5.4 via the Responses API
Azure AI User	Foundry project managed identity	Foundry account	`role-assignments.bicep` (provision)	Hosted agent containers use Foundry Agent Service data actions
Azure AI User	Deployer (user running `azd up`)	Foundry project	`az role assignment create` (postprovision hook)	`register_agents.py` registers agents via Foundry Agent Service API
Azure AI User	Backend Container App managed identity	Foundry project	`az role assignment create` (postprovision hook)	Backend calls Foundry Hosted Agents at runtime via `DefaultAzureCredential`

The first four roles are assigned by infra/modules/role-assignments.bicep during azd provision. The remaining Azure AI User roles are assigned via az role assignment create in the postprovision hook — this is intentionally outside Bicep because the CLI command is natively idempotent (no error if the role was previously granted manually).

First-run note: Azure RBAC propagation can take up to several minutes after a new role assignment. On the very first azd up (when the Azure AI User role is newly created), the postprovision hook automatically retries register_agents.py every 10 seconds (up to 12 attempts) until the permission propagates. On subsequent runs the role already exists and no retries are needed.

Agent Registration

After azd provision, scripts/register_agents.py (called from the azure.yaml postprovision hook) registers all four agents with Foundry:

Calls azure-ai-projects SDK client.agents.create_version() with the ACR container image and resource specs from agent.yaml
Calls az cognitiveservices agent start to start each agent under Foundry management

Resource specs (defined in each agents/<name>/agent.yaml):

Agent	CPU	Memory
`clinical-reviewer-agent`	`1`	`2Gi`
`coverage-assessment-agent`	`1`	`2Gi`
`compliance-agent`	`0.5`	`1Gi`
`synthesis-agent`	`1`	`2Gi`

Agent IDs (Foundry)

Agent ID	Module
`compliance-agent`	`agents/compliance/main.py`
`clinical-reviewer-agent`	`agents/clinical/main.py`
`coverage-assessment-agent`	`agents/coverage/main.py`
`synthesis-agent`	`agents/synthesis/main.py`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Technical Notes

Architecture Overview

MCP Tool Connections

PubMed Session Reconnect

Agent Skills

Structured Output

Orchestration Flow

Resilience

Decision Gate (LENIENT MODE)

Decision and Notification Flow

CPT/HCPCS Validation

Sample Data

Observability

Process Roles

Agent ID / Name for Trace Correlation

Content Recording (Sensitive Data)

Application Map

Trace Hierarchy (backend layer)

MAF Spans (agent layer — all four containers)

Custom Backend Span Attributes

Enabling Observability

Hosted Agent Dispatch Settings

Agent Registration

Agent IDs (Foundry)

FilesExpand file tree

technical-notes.md

Latest commit

History

technical-notes.md

File metadata and controls

Technical Notes

Architecture Overview

MCP Tool Connections

PubMed Session Reconnect

Agent Skills

Structured Output

Orchestration Flow

Resilience

Decision Gate (LENIENT MODE)

Decision and Notification Flow

CPT/HCPCS Validation

Sample Data

Observability

Process Roles

Agent ID / Name for Trace Correlation

Content Recording (Sensitive Data)

Application Map

Trace Hierarchy (backend layer)

MAF Spans (agent layer — all four containers)

Custom Backend Span Attributes

Enabling Observability

Hosted Agent Dispatch Settings

Agent Registration

Agent IDs (Foundry)