Schema silently accepts misnested parallel:/for_each: blocks under agents: items, causing default-model fallback failures #140

@PolyphonyRequiem

Description

Symptom

While running a real workflow, we hit:

ProviderError: SDK call failed after 3 attempts: Copilot SDK call failed:
JSON-RPC Error -32603: Request session.create failed with message:
Model "gpt-4o" is not available.

The agent named in the failure (review_group) was intended to be a parallel container wrapping two reviewer agents. Instead, the engine treated it as a regular type: agent with no model and no prompt, fell back to the Copilot provider's hard-coded default of "gpt-4o" (conductor/providers/copilot.py:198), and failed at runtime because the user's Copilot CLI doesn't have access to that model.

The error message points at the model. The actual bug is in the schema, two layers up.

Root cause

Three independent gaps compound to swallow the mistake silently:

  1. AgentDef does not set extra="forbid". When a user nests parallel: (or for_each:) inside an agents: item — as a sibling of name: and routes: — Pydantic silently drops the unknown field. (conductor/config/schema.py:417, AgentDef.)
  2. type defaults to "agent" when omitted (Literal[...] | None = None). The wrapper item, stripped of its parallel: block, is now a perfectly valid agent with no LLM-agent fields at all — no model, no prompt, no command, no script.
  3. conductor validate reports "Validation Successful" for this YAML and shows the wrapper as review_group | type: agent | model: default without flagging it as suspicious. The validate summary also doesn't show counts for Parallel Groups or For-each Groups, so the absence of review_group from "Parallel Groups" isn't visible to the user.

So the wrapper survives schema validation, survives conductor validate, runs as a default agent, hits the provider default model, and fails at runtime with a message that doesn't mention the schema at all.

Repro

A minimal misnested YAML that exhibits the bug:

name: misnested-parallel-repro
version: "1"
entrypoint: review_group

agents:
  - name: review_group
    parallel:
      - technical_reviewer
      - readability_reviewer
    failure_mode: fail_fast
    routes:
      - to: end

  - name: technical_reviewer
    type: agent
    model: claude-sonnet-4.6
    prompt: "Review the code for technical correctness."

  - name: readability_reviewer
    type: agent
    model: claude-sonnet-4.6
    prompt: "Review the code for readability."

The correct shape is parallel: as a top-level workflow key, sibling of agents: (conductor/config/schema.py:860+, WorkflowConfig):

agents:
  - name: technical_reviewer
    type: agent
    model: claude-sonnet-4.6
    prompt: "..."
  - name: readability_reviewer
    type: agent
    model: claude-sonnet-4.6
    prompt: "..."

parallel:
  - name: review_group
    agents: [technical_reviewer, readability_reviewer]
    failure_mode: fail_fast
    routes:
      - to: end

Steps:

  1. Save the misnested YAML.
  2. conductor validate path/to/workflow.yaml → "Validation Successful"; summary shows review_group | type: agent | model: default.
  3. Run the workflow with a Copilot CLI that lacks gpt-4o access → fails at runtime with Model "gpt-4o" is not available.

Expected: schema or validate should reject this before it ever reaches the provider.

Proposed fixes

Any subset would help. Ordered roughly by leverage:

a. Add model_config = ConfigDict(extra="forbid") to AgentDef (and ParallelGroup, ForEachDef, WorkflowConfig).
This is the highest-leverage fix. Hard-fails at load time with a clear error pointing at the unknown key — exactly the kind of mistake schema-level enforcement is supposed to catch. Worth auditing whether any legitimate user YAMLs rely on extras being silently accepted before flipping it on.

b. Validator should reject any AgentDef of type "agent" (explicit or defaulted) that has neither prompt nor model.
Such an agent cannot possibly do anything useful. A clear "agent X has no prompt and no model — did you mean to nest this under parallel: or for_each: at the top level?" would have caught this immediately.

c. conductor validate summary should include counts for Parallel Groups and For-each Groups alongside Agents and Human Gates. Their absence is then a visible signal that something the user thought they declared isn't actually being parsed as such.
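A minimal sketch of such a summary over the parsed top-level keys; the key names here (agents, parallel, for_each, human_gates) are assumptions from the repro YAML, and the real config keys may differ:

```python
from collections import Counter

def summary_counts(workflow: dict) -> Counter:
    # Count each top-level construct explicitly, so a zero count is a
    # visible signal rather than a silent omission.
    return Counter({
        "Agents": len(workflow.get("agents", [])),
        "Parallel Groups": len(workflow.get("parallel", [])),
        "For-each Groups": len(workflow.get("for_each", [])),
        "Human Gates": len(workflow.get("human_gates", [])),
    })
```

With the misnested repro, "Parallel Groups: 0" would appear in the summary even though the user believes they declared one.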

d. The Copilot provider default model should not be a hard-coded "gpt-4o".
Either require explicit per-agent models, or fail fast at session creation when no model is set — rather than silently substituting a hard-coded one that the user may not have access to. Reference: conductor/providers/copilot.py:198. This wouldn't have prevented the misnesting bug, but it would have produced an error message that points closer to the real cause ("agent X has no model configured") instead of one that points at a model name the user never wrote.

Why this matters

We were running our SDLC pipeline against itself. Fleet code-reviewer agents had reviewed our workflow YAMLs and approved them. They never executed the workflow, so they never saw the runtime failure. The first time anyone actually ran the pipeline end-to-end, we hit this — and the visible error pointed at a model name that doesn't appear anywhere in our config.

That's the failure mode schema-level enforcement is supposed to prevent. By the time the error surfaces at the provider, you're three layers downstream of the actual mistake.

Note

We're separately fixing our YAMLs in our downstream repo. This issue is about hardening conductor itself so the next user doesn't fall into the same trap.

Labels: bug