2 changes: 1 addition & 1 deletion .claude/skills/conductor/SKILL.md
@@ -100,7 +100,7 @@ For runtime config, context modes, limits, and cost tracking, see [references/au
| `context.mode` | How agents share data (accumulate, last_only, explicit) |
| `limits` | Safety bounds (max_iterations up to 500, timeout_seconds) |
| `cost` | Token usage and cost tracking configuration |
| `runtime` | Provider, model, temperature, max_tokens, MCP servers |
| `runtime` | Provider, model, temperature, max_tokens, reasoning effort, MCP servers |
| `--web` | Real-time web dashboard with DAG graph, live streaming, in-browser human gates |
| `checkpoint` | Auto-saved on failure; resume with `conductor resume` |
| `registry` | Named workflow sources (GitHub repo or local dir) for sharing workflows |
32 changes: 32 additions & 0 deletions .claude/skills/conductor/references/authoring.md
@@ -19,6 +19,7 @@ workflow:
timeout: 600 # Per-request timeout in seconds (optional)
max_agent_iterations: 50 # Max tool-use roundtrips per agent (1-500, optional)
max_session_seconds: 120 # Wall-clock timeout per agent session (optional)
default_reasoning_effort: medium # Workflow-wide reasoning effort: low, medium, high, xhigh (optional)

input: # Define workflow inputs
param_name:
@@ -76,10 +77,41 @@ agents:
max_agent_iterations: 100 # Override workflow default for this agent (optional)
max_session_seconds: 60 # Wall-clock timeout for this agent (optional)

reasoning: # Override runtime.default_reasoning_effort (optional)
effort: high # low, medium, high, or xhigh

routes: # Where to go next
- to: next_agent
```

### Reasoning Effort

`reasoning.effort` (per-agent) and `runtime.default_reasoning_effort` (workflow-wide) accept `low`, `medium`, `high`, or `xhigh`. Per-agent overrides the runtime default. The provider translates the unified value to its native API:

- **Copilot**: forwarded as `reasoning_effort` on the session. Validated against the model's advertised `supported_reasoning_efforts`; raises `ValidationError` for unsupported combinations (skipped in mock-handler mode or when capability metadata is absent).
- **Claude**: enables extended thinking via `thinking={"type": "enabled", "budget_tokens": N}` with mapping `low=2048`, `medium=8192`, `high=16384`, `xhigh=32768`. Auto-coerces `temperature` to `1.0` (logged at INFO) and bumps `max_tokens` to fit `budget + 4096` (capped at 64000, logged at INFO when clamped). Only valid on thinking-capable models (`claude-3-7-*`, `claude-opus-4*`, `claude-sonnet-4*`, `claude-haiku-4*`); raises `ValidationError` otherwise.

Both providers surface reasoning content via `agent_reasoning` events visible in the dashboard, JSONL logs, and the console at `-vv`. Not allowed on `script`, `human_gate`, or `workflow` agent types.

```yaml
runtime:
provider: claude
default_model: claude-opus-4-20250514
default_reasoning_effort: medium # workflow-wide default

agents:
- name: explainer
prompt: "Explain this algorithm."
# inherits 'medium'

- name: architect
reasoning:
effort: high # override
prompt: "Design the system architecture."
```

See `examples/reasoning-effort.yaml` for a complete example.
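The precedence rule can be sketched in a few lines of Python (illustrative names only, not Conductor's actual internals):

```python
VALID_EFFORTS = {"low", "medium", "high", "xhigh"}

def resolve_effort(agent_reasoning, runtime_default):
    """Per-agent reasoning.effort wins; otherwise fall back to
    runtime.default_reasoning_effort; None leaves reasoning disabled."""
    effort = (agent_reasoning or {}).get("effort") or runtime_default
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"invalid reasoning effort: {effort!r}")
    return effort

# The architect agent overrides the workflow default:
assert resolve_effort({"effort": "high"}, "medium") == "high"
# The explainer agent has no reasoning block and inherits it:
assert resolve_effort(None, "medium") == "medium"
```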

## Routing Patterns

### Linear
17 changes: 16 additions & 1 deletion .claude/skills/conductor/references/yaml-schema.md
@@ -34,6 +34,7 @@ workflow:
timeout: float # Per-request timeout in seconds (optional, default: 600)
max_agent_iterations: integer # Max tool-use roundtrips per agent (1-500, optional)
max_session_seconds: float # Wall-clock timeout per agent session in seconds (optional)
default_reasoning_effort: string # Workflow-wide reasoning/thinking effort: low, medium, high, xhigh (optional)
mcp_servers: # MCP server configurations
<server_name>:
type: string # "stdio" (default), "http", or "sse"
@@ -130,6 +131,11 @@
max_agent_iterations: integer # Max tool-use roundtrips for this agent (1-500, optional)
max_session_seconds: float # Wall-clock timeout for this agent session (optional)

# Per-agent reasoning effort (overrides runtime.default_reasoning_effort)
# Not allowed for script, human_gate, or workflow agent types.
reasoning:
effort: string # low, medium, high, or xhigh

# Per-agent retry policy (optional, not allowed for script agents)
retry:
max_attempts: integer # Max attempts including first (1-10, default: 1 = no retry)
@@ -146,7 +152,16 @@
timeout: integer # Per-script timeout in seconds
```

**Script agent restrictions:** Cannot have `prompt`, `provider`, `model`, `tools`, `output`, `system_prompt`, `options`, `retry`. Output is always `{stdout, stderr, exit_code}`.
**Script agent restrictions:** Cannot have `prompt`, `provider`, `model`, `tools`, `output`, `system_prompt`, `options`, `retry`, `reasoning`. Output is always `{stdout, stderr, exit_code}`.

**Reasoning effort:** `reasoning.effort` (and `runtime.default_reasoning_effort`) accepts `low`, `medium`, `high`, or `xhigh`. Per-agent value overrides the runtime default. Each provider translates the unified value to its native API:

- **Copilot**: forwards `reasoning_effort` to the session. Validated against the model's advertised `supported_reasoning_efforts` (when available); raises `ValidationError` for unsupported combinations.
- **Claude**: enables extended thinking via `thinking={"type":"enabled","budget_tokens":N}` with mapping low=2048, medium=8192, high=16384, xhigh=32768. Auto-coerces `temperature=1.0` (Anthropic API requirement) and bumps `max_tokens` to fit `budget+4096` (capped at 64000). Only valid on thinking-capable models (Claude 3.7+, Opus/Sonnet/Haiku 4.x); raises `ValidationError` otherwise.

Both providers continue to surface reasoning content via `agent_reasoning` events visible in the dashboard, JSONL logs, and console at `-vv`.

Forbidden on agent types: `script`, `human_gate`, `workflow`.

## Script Agent Schema

2 changes: 2 additions & 0 deletions AGENTS.md
@@ -121,6 +121,7 @@ make validate-examples # validate all examples
- **Failure modes** for parallel/for-each: `fail_fast`, `continue_on_error`, `all_or_nothing`
- **Route evaluation**: First matching `when` condition wins; no `when` = always matches
- **Tool resolution**: `null` = all workflow tools, `[]` = none, `[list]` = subset
- **Reasoning effort**: `runtime.default_reasoning_effort` sets a workflow-wide default; per-agent `reasoning.effort` overrides it. Allowed values: `low`, `medium`, `high`, `xhigh`. Each provider translates the unified value to its native API (Copilot: `reasoning_effort` on the session, validated against the model's `supported_reasoning_efforts`; Claude: extended thinking with budget mapping low=2048, medium=8192, high=16384, xhigh=32768 tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit the budget). See `examples/reasoning-effort.yaml`.

## Tests Structure

@@ -158,5 +159,6 @@ All providers (`copilot.py`, `claude.py`) must maintain feature parity. Any chan
- **Output contract**: Same `AgentOutput` structure with consistent field population (model, tokens, input_tokens, output_tokens, content)
- **Tool execution**: Same MCP tool calling interface and result handling
- **Session management**: Same lifecycle (`validate_connection()`, `execute()`, `close()`)
- **Reasoning effort**: All providers must accept the unified `reasoning.effort` field (`low` | `medium` | `high` | `xhigh`), translate it to the native API (Copilot `reasoning_effort` on the session; Claude extended `thinking` budget), validate that the selected model supports the requested effort, and raise `ValidationError` with a clear message when it does not. Any reasoning/thinking content the model returns must be surfaced via `agent_reasoning` events so the dashboard, JSONL logger, and console subscriber render it consistently.

When modifying any provider, check all other providers for the same change. The dashboard, JSONL logger, console subscriber, and workflow engine all depend on consistent behavior across providers.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased](https://github.com/microsoft/conductor/compare/v0.1.11...HEAD)

### Added
- Unified `reasoning.effort` configuration for per-agent and workflow-wide
control of model reasoning / extended-thinking effort. Set
`runtime.default_reasoning_effort` (`low` | `medium` | `high` | `xhigh`) for a
workflow-wide default, or override per agent with a `reasoning.effort` block.
Translates to `reasoning_effort` on the Copilot session and to extended
`thinking` budget on Claude (low=2048, medium=8192, high=16384, xhigh=32768
tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit).
Validates against each model's supported efforts/capabilities and surfaces
thinking content via `agent_reasoning` events. See
[`examples/reasoning-effort.yaml`](examples/reasoning-effort.yaml).

## [0.1.11](https://github.com/microsoft/conductor/compare/v0.1.10...v0.1.11) - 2026-05-04

### Added
6 changes: 4 additions & 2 deletions README.md
@@ -19,6 +19,7 @@ Conductor provides the patterns that work: evaluator-optimizer loops for iterati
- **Sub-workflow composition** - Reusable sub-workflows with templated `input_mapping`, usable inside `for_each` groups for dynamic fan-out
- **Script steps** - Run shell commands and route on exit code or parsed JSON stdout
- **Dialog mode** - Agents can pause for multi-turn conversation when uncertain
- **Reasoning effort** - Unified `reasoning.effort` (low/medium/high/xhigh) per agent or workflow-wide, translated to each provider's native API
- **Workspace instructions** - Auto-discover and inject `AGENTS.md` / `CLAUDE.md` / `.github/copilot-instructions.md` into every agent's prompt
- **Conditional routing** - Route between agents based on output conditions
- **Human-in-the-loop** - Pause for human decisions with Markdown-rendered prompts and clickable file links
@@ -231,8 +232,9 @@ conductor registry add official myorg/conductor-workflows --default
conductor registry list official

# Run a workflow from the registry
conductor run qa-bot # latest from default registry
conductor run qa-bot@official@1.2.3 # specific version
conductor run qa-bot # latest from default registry
conductor run 'qa-bot@official#v1.2.3' # specific tag (quote the #)
conductor run 'qa-bot@official#main' # branch HEAD (re-resolved on fetch)
```

See [docs/design/registry.md](docs/design/registry.md) for the full design.
70 changes: 70 additions & 0 deletions docs/configuration.md
@@ -13,9 +13,18 @@ workflow:
runtime:
provider: copilot # or 'claude'
default_model: gpt-5.2
temperature: 0.7
max_tokens: 4096
default_reasoning_effort: medium # low | medium | high | xhigh (optional)
# Provider-specific settings...
```

The `default_reasoning_effort` field sets a workflow-wide default for model
reasoning / extended-thinking effort that every provider-backed agent inherits
unless it declares its own `reasoning.effort` override. See
[Reasoning Effort](#reasoning-effort) for the per-provider translation and
constraints.

## Provider Selection

### Copilot Provider
@@ -122,6 +131,67 @@

**Note**: This is output tokens, not context window (200K separate limit)

## Reasoning Effort

Conductor exposes a single, unified `reasoning.effort` knob that controls how
much "thinking" budget the underlying model uses, and translates it to each
provider's native API. Allowed values: `low`, `medium`, `high`, `xhigh`.

Set a workflow-wide default and/or override per agent:

```yaml
workflow:
runtime:
provider: copilot
default_model: gpt-5.2
default_reasoning_effort: medium # workflow-wide default

agents:
- name: explainer
# No reasoning block — inherits `medium` from the runtime default.
prompt: "Explain {{ workflow.input.topic }}"

- name: architect
reasoning:
effort: high # per-agent override wins
prompt: "Design a system for {{ workflow.input.topic }}"
```

Per-agent overrides always win over the workflow-wide default. The
`reasoning.effort` field is **only** valid on standard `agent`-type agents; it
is rejected on `script`, `human_gate`, and `workflow` agents (which do not call
a model).

### Per-provider translation

- **Copilot** — Forwards the chosen effort as `reasoning_effort` to
`CopilotClient.create_session`. The value is validated against the model's
advertised `supported_reasoning_efforts` capability metadata; a
`ValidationError` is raised at startup if the model does not support the
requested effort. Validation is skipped in mock mode or when capability
metadata is unavailable.
- **Claude** — Enables Anthropic's extended thinking via
`messages.create(thinking={"type": "enabled", "budget_tokens": N})` with the
following effort → budget mapping:

  | Effort   | Budget tokens |
  |----------|---------------|
  | `low`    | 2048          |
  | `medium` | 8192          |
  | `high`   | 16384         |
  | `xhigh`  | 32768         |

Extended thinking is only valid on thinking-capable models
(`claude-3-7-*`, `claude-opus-4*`, `claude-sonnet-4*`, `claude-haiku-4*`); a
`ValidationError` is raised otherwise. The provider also auto-coerces
`temperature` to `1.0` (required by the Anthropic API for extended thinking,
logged at INFO) and bumps `max_tokens` to fit `budget + 4096`, capped at
`64000` (logged at INFO when clamped).

Reasoning / thinking content emitted by the model is surfaced via
`agent_reasoning` events and rendered in the dashboard, JSONL logs, and
`-vv` console output for both providers.
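The Claude-side translation described above can be approximated with a short sketch (illustrative only; the real logic lives in the provider and differs in detail):

```python
# Effort -> extended-thinking budget, per the mapping above.
BUDGETS = {"low": 2048, "medium": 8192, "high": 16384, "xhigh": 32768}
MAX_TOKENS_CAP = 64000

def claude_thinking_kwargs(effort, temperature, max_tokens):
    """Build the extra messages.create() kwargs for a given effort level."""
    budget = BUDGETS[effort]
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget},
        # The Anthropic API requires temperature=1.0 with thinking enabled;
        # any configured value is coerced (and logged at INFO).
        "temperature": 1.0,
        # max_tokens must cover the budget plus visible output; a larger
        # user-configured value is preserved, then capped at 64000.
        "max_tokens": min(max(max_tokens, budget + 4096), MAX_TOKENS_CAP),
    }

kwargs = claude_thinking_kwargs("xhigh", temperature=0.7, max_tokens=4096)
assert kwargs["max_tokens"] == 36864  # 32768 + 4096, under the cap
```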

## MCP Servers

Configure [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers for tool access. Both the Copilot and Claude providers support MCP tools.
72 changes: 72 additions & 0 deletions docs/providers/claude.md
@@ -9,6 +9,7 @@ The Claude provider enables Conductor workflows to use Anthropic's Claude models
- [Model Selection](#model-selection)
- [Runtime Configuration](#runtime-configuration)
- [Streaming Limitations](#streaming-limitations)
- [Extended Thinking](#extended-thinking)
- [Troubleshooting](#troubleshooting)
- [Cost Optimization](#cost-optimization)

@@ -287,6 +288,77 @@ Streaming support is planned for Phase 2 (estimated 2-3 weeks):

Track progress in the project roadmap or GitHub issues.

## Extended Thinking

The Claude provider supports Anthropic's extended thinking via the unified
[`reasoning.effort`](../configuration.md#reasoning-effort) field. Set a
workflow-wide default with `runtime.default_reasoning_effort` and/or override
per agent with a `reasoning.effort` block:

```yaml
workflow:
runtime:
provider: claude
default_model: claude-sonnet-4.5
default_reasoning_effort: medium

agents:
- name: planner
reasoning:
effort: high # per-agent override
prompt: "Plan a deployment for {{ workflow.input.service }}"
```

### Effort → thinking budget

The unified effort level is translated into Anthropic's
`messages.create(thinking={"type": "enabled", "budget_tokens": N})` parameter:

| Effort   | Budget tokens |
|----------|---------------|
| `low`    | 2048          |
| `medium` | 8192          |
| `high`   | 16384         |
| `xhigh`  | 32768         |

### Supported models

Extended thinking is only valid on thinking-capable models. The provider
accepts any model whose name starts with one of:

- `claude-3-7-*`
- `claude-opus-4*`
- `claude-sonnet-4*`
- `claude-haiku-4*`

Requesting `reasoning.effort` on any other model raises a `ValidationError` at
startup so you fail fast instead of silently dropping the budget.
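The capability check amounts to a prefix match on the model name, roughly as follows (a sketch assuming the documented prefixes are the full list):

```python
# Thinking-capable model families, per the list above.
THINKING_PREFIXES = (
    "claude-3-7-", "claude-opus-4", "claude-sonnet-4", "claude-haiku-4"
)

def supports_extended_thinking(model: str) -> bool:
    """True when the model name starts with a thinking-capable prefix."""
    return model.startswith(THINKING_PREFIXES)

assert supports_extended_thinking("claude-opus-4-20250514")
assert not supports_extended_thinking("claude-3-5-sonnet-20241022")
```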

### Auto-coercion of `temperature` and `max_tokens`

When extended thinking is enabled, the Anthropic API requires `temperature=1.0`
and a `max_tokens` value large enough to contain both the thinking budget and
the visible response. The provider handles this for you:

- **`temperature`**: coerced to `1.0` (logged at INFO if you configured a
different value).
- **`max_tokens`**: bumped to `budget + 4096`, capped at `64000` (logged at INFO
when clamped).

This means you don't need to hand-tune `max_tokens` when raising the effort —
the provider will widen the output budget to fit. If you've explicitly set a
`max_tokens` higher than `budget + 4096`, your value is preserved.

### Reasoning content in events

Any thinking content the model returns is surfaced as `agent_reasoning` events
alongside the regular `agent_message` stream, and shows up in the dashboard
detail panel, the JSONL log, and the `-vv` console output. The Copilot provider
emits the same event shape so workflows that mix providers render consistently.

See [`examples/reasoning-effort.yaml`](../../examples/reasoning-effort.yaml) for
a runnable end-to-end example.

## Troubleshooting

### Common Errors and Solutions
26 changes: 26 additions & 0 deletions docs/providers/comparison.md
@@ -12,6 +12,7 @@ This guide helps you choose between GitHub Copilot and Anthropic Claude provider
| **Model Selection** | GPT-5.2, o1 | Haiku, Sonnet, Opus | Tie |
| **Streaming** | Yes | No (Phase 1) | Copilot |
| **Tool Support** | Yes (MCP, all types) | Yes (MCP, stdio only) | Copilot |
| **Reasoning / Extended Thinking** | Yes (`reasoning_effort` on session) | Yes (extended `thinking` budget) | Tie |
| **Speed** | Fast | Fast | Tie |
| **Output Quality** | Excellent | Excellent | Tie |
| **Cost Predictability** | High (flat rate) | Variable (usage-based) | Copilot |
@@ -242,6 +243,31 @@ agents:

See the [MCP Tools guide](../mcp-tools.md) for details.

### Reasoning / Extended Thinking

Both providers expose a unified [`reasoning.effort`](../configuration.md#reasoning-effort)
field (`low` | `medium` | `high` | `xhigh`) at workflow scope
(`runtime.default_reasoning_effort`) or per agent (`reasoning.effort`).
Conductor translates the value to each provider's native API:

**Copilot**:
- Forwarded as `reasoning_effort` on `CopilotClient.create_session`
- Validated against the model's advertised `supported_reasoning_efforts`

**Claude**:
- Translated to `messages.create(thinking={"type": "enabled", "budget_tokens": N})`
- Effort → budget: low=2048, medium=8192, high=16384, xhigh=32768 tokens
- Restricted to thinking-capable models (`claude-3-7-*`, `claude-opus-4*`,
`claude-sonnet-4*`, `claude-haiku-4*`)
- Auto-coerces `temperature=1.0` and bumps `max_tokens` to fit the budget

Reasoning content from either provider surfaces as `agent_reasoning` events
in the dashboard, JSONL log, and `-vv` console output.

**Winner**: Tie (both support it; pick the provider on other grounds)

See [`examples/reasoning-effort.yaml`](../../examples/reasoning-effort.yaml).

## Migration Path

### From Copilot to Claude