2 changes: 1 addition & 1 deletion .claude/skills/conductor/SKILL.md
@@ -100,7 +100,7 @@ For runtime config, context modes, limits, and cost tracking, see [references/au
| `context.mode` | How agents share data (accumulate, last_only, explicit) |
| `limits` | Safety bounds (max_iterations up to 500, timeout_seconds) |
| `cost` | Token usage and cost tracking configuration |
| `runtime` | Provider, model, temperature, max_tokens, MCP servers |
| `runtime` | Provider, model, temperature, max_tokens, reasoning effort, MCP servers |
| `--web` | Real-time web dashboard with DAG graph, live streaming, in-browser human gates |
| `checkpoint` | Auto-saved on failure; resume with `conductor resume` |
| `registry` | Named workflow sources (GitHub repo or local dir) for sharing workflows |
32 changes: 32 additions & 0 deletions .claude/skills/conductor/references/authoring.md
@@ -19,6 +19,7 @@ workflow:
timeout: 600 # Per-request timeout in seconds (optional)
max_agent_iterations: 50 # Max tool-use roundtrips per agent (1-500, optional)
max_session_seconds: 120 # Wall-clock timeout per agent session (optional)
default_reasoning_effort: medium # Workflow-wide reasoning effort: low, medium, high, xhigh (optional)

input: # Define workflow inputs
param_name:
@@ -76,10 +77,41 @@ agents:
max_agent_iterations: 100 # Override workflow default for this agent (optional)
max_session_seconds: 60 # Wall-clock timeout for this agent (optional)

reasoning: # Override runtime.default_reasoning_effort (optional)
effort: high # low, medium, high, or xhigh

routes: # Where to go next
- to: next_agent
```

### Reasoning Effort

`reasoning.effort` (per-agent) and `runtime.default_reasoning_effort` (workflow-wide) accept `low`, `medium`, `high`, or `xhigh`. Per-agent overrides the runtime default. The provider translates the unified value to its native API:

- **Copilot**: forwarded as `reasoning_effort` on the session. Validated against the model's advertised `supported_reasoning_efforts`; raises `ValidationError` for unsupported combinations (skipped in mock-handler mode or when capability metadata is absent).
- **Claude**: enables extended thinking via `thinking={"type": "enabled", "budget_tokens": N}` with mapping `low=2048`, `medium=8192`, `high=16384`, `xhigh=32768`. Auto-coerces `temperature` to `1.0` (logged at INFO) and bumps `max_tokens` to fit `budget + 4096` (capped at 64000, logged at INFO when clamped). Only valid on thinking-capable models (`claude-3-7-*`, `claude-opus-4*`, `claude-sonnet-4*`, `claude-haiku-4*`); raises `ValidationError` otherwise.

Both providers surface reasoning content via `agent_reasoning` events visible in the dashboard, JSONL logs, and the console at `-vv`. Not allowed on `script`, `human_gate`, or `workflow` agent types.

```yaml
runtime:
provider: claude
default_model: claude-opus-4-20250514
default_reasoning_effort: medium # workflow-wide default

agents:
- name: explainer
prompt: "Explain this algorithm."
# inherits 'medium'

- name: architect
reasoning:
effort: high # override
prompt: "Design the system architecture."
```

See `examples/reasoning-effort.yaml` for a complete example.
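The precedence rule can be sketched in a few lines of Python (illustrative names only, not Conductor's actual internals):

```python
VALID_EFFORTS = {"low", "medium", "high", "xhigh"}

def resolve_effort(agent_reasoning, runtime_default):
    """Per-agent reasoning.effort wins; otherwise fall back to
    runtime.default_reasoning_effort; None leaves reasoning disabled."""
    effort = (agent_reasoning or {}).get("effort") or runtime_default
    if effort is not None and effort not in VALID_EFFORTS:
        raise ValueError(f"invalid reasoning effort: {effort!r}")
    return effort

# The architect agent overrides the workflow default:
assert resolve_effort({"effort": "high"}, "medium") == "high"
# The explainer agent has no reasoning block and inherits it:
assert resolve_effort(None, "medium") == "medium"
```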

## Routing Patterns

### Linear
17 changes: 16 additions & 1 deletion .claude/skills/conductor/references/yaml-schema.md
@@ -34,6 +34,7 @@ workflow:
timeout: float # Per-request timeout in seconds (optional, default: 600)
max_agent_iterations: integer # Max tool-use roundtrips per agent (1-500, optional)
max_session_seconds: float # Wall-clock timeout per agent session in seconds (optional)
default_reasoning_effort: string # Workflow-wide reasoning/thinking effort: low, medium, high, xhigh (optional)
mcp_servers: # MCP server configurations
<server_name>:
type: string # "stdio" (default), "http", or "sse"
@@ -130,6 +131,11 @@
max_agent_iterations: integer # Max tool-use roundtrips for this agent (1-500, optional)
max_session_seconds: float # Wall-clock timeout for this agent session (optional)

# Per-agent reasoning effort (overrides runtime.default_reasoning_effort)
# Not allowed for script, human_gate, or workflow agent types.
reasoning:
effort: string # low, medium, high, or xhigh

# Per-agent retry policy (optional, not allowed for script agents)
retry:
max_attempts: integer # Max attempts including first (1-10, default: 1 = no retry)
@@ -146,7 +152,16 @@
timeout: integer # Per-script timeout in seconds
```

**Script agent restrictions:** Cannot have `prompt`, `provider`, `model`, `tools`, `output`, `system_prompt`, `options`, `retry`. Output is always `{stdout, stderr, exit_code}`.
**Script agent restrictions:** Cannot have `prompt`, `provider`, `model`, `tools`, `output`, `system_prompt`, `options`, `retry`, `reasoning`. Output is always `{stdout, stderr, exit_code}`.

**Reasoning effort:** `reasoning.effort` (and `runtime.default_reasoning_effort`) accepts `low`, `medium`, `high`, or `xhigh`. Per-agent value overrides the runtime default. Each provider translates the unified value to its native API:

- **Copilot**: forwards `reasoning_effort` to the session. Validated against the model's advertised `supported_reasoning_efforts` (when available); raises `ValidationError` for unsupported combinations.
- **Claude**: enables extended thinking via `thinking={"type":"enabled","budget_tokens":N}` with mapping low=2048, medium=8192, high=16384, xhigh=32768. Auto-coerces `temperature=1.0` (Anthropic API requirement) and bumps `max_tokens` to fit `budget+4096` (capped at 64000). Only valid on thinking-capable models (Claude 3.7+, Opus/Sonnet/Haiku 4.x); raises `ValidationError` otherwise.

Both providers continue to surface reasoning content via `agent_reasoning` events visible in the dashboard, JSONL logs, and console at `-vv`.

Forbidden on agent types: `script`, `human_gate`, `workflow`.

## Script Agent Schema

2 changes: 2 additions & 0 deletions AGENTS.md
@@ -121,6 +121,7 @@ make validate-examples # validate all examples
- **Failure modes** for parallel/for-each: `fail_fast`, `continue_on_error`, `all_or_nothing`
- **Route evaluation**: First matching `when` condition wins; no `when` = always matches
- **Tool resolution**: `null` = all workflow tools, `[]` = none, `[list]` = subset
- **Reasoning effort**: `runtime.default_reasoning_effort` sets a workflow-wide default; per-agent `reasoning.effort` overrides it. Allowed values: `low`, `medium`, `high`, `xhigh`. Each provider translates the unified value to its native API (Copilot: `reasoning_effort` on the session, validated against the model's `supported_reasoning_efforts`; Claude: extended thinking with budget mapping low=2048, medium=8192, high=16384, xhigh=32768 tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit the budget). See `examples/reasoning-effort.yaml`.

## Tests Structure

@@ -158,5 +159,6 @@ All providers (`copilot.py`, `claude.py`) must maintain feature parity. Any chan
- **Output contract**: Same `AgentOutput` structure with consistent field population (model, tokens, input_tokens, output_tokens, content)
- **Tool execution**: Same MCP tool calling interface and result handling
- **Session management**: Same lifecycle (`validate_connection()`, `execute()`, `close()`)
- **Reasoning effort**: All providers must accept the unified `reasoning.effort` field (`low` | `medium` | `high` | `xhigh`), translate it to the native API (Copilot `reasoning_effort` on the session; Claude extended `thinking` budget), validate that the selected model supports the requested effort, and raise `ValidationError` with a clear message when it does not. Any reasoning/thinking content the model returns must be surfaced via `agent_reasoning` events so the dashboard, JSONL logger, and console subscriber render it consistently.

When modifying any provider, check all other providers for the same change. The dashboard, JSONL logger, console subscriber, and workflow engine all depend on consistent behavior across providers.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased](https://github.com/microsoft/conductor/compare/v0.1.11...HEAD)

### Added
- Unified `reasoning.effort` configuration for per-agent and workflow-wide
control of model reasoning / extended-thinking effort. Set
`runtime.default_reasoning_effort` (`low` | `medium` | `high` | `xhigh`) for a
workflow-wide default, or override per agent with a `reasoning.effort` block.
Translates to `reasoning_effort` on the Copilot session and to extended
`thinking` budget on Claude (low=2048, medium=8192, high=16384, xhigh=32768
tokens, with `temperature` coerced to 1.0 and `max_tokens` bumped to fit).
Validates against each model's supported efforts/capabilities and surfaces
thinking content via `agent_reasoning` events. See
[`examples/reasoning-effort.yaml`](examples/reasoning-effort.yaml).

## [0.1.11](https://github.com/microsoft/conductor/compare/v0.1.10...v0.1.11) - 2026-05-04

### Added
6 changes: 4 additions & 2 deletions README.md
@@ -19,6 +19,7 @@ Conductor provides the patterns that work: evaluator-optimizer loops for iterati
- **Sub-workflow composition** - Reusable sub-workflows with templated `input_mapping`, usable inside `for_each` groups for dynamic fan-out
- **Script steps** - Run shell commands and route on exit code or parsed JSON stdout
- **Dialog mode** - Agents can pause for multi-turn conversation when uncertain
- **Reasoning effort** - Unified `reasoning.effort` (low/medium/high/xhigh) per agent or workflow-wide, translated to each provider's native API
- **Workspace instructions** - Auto-discover and inject `AGENTS.md` / `CLAUDE.md` / `.github/copilot-instructions.md` into every agent's prompt
- **Conditional routing** - Route between agents based on output conditions
- **Human-in-the-loop** - Pause for human decisions with Markdown-rendered prompts and clickable file links
@@ -231,8 +232,9 @@ conductor registry add official myorg/conductor-workflows --default
conductor registry list official

# Run a workflow from the registry
conductor run qa-bot # latest from default registry
conductor run qa-bot@official@1.2.3 # specific version
conductor run qa-bot # latest from default registry
conductor run 'qa-bot@official#v1.2.3' # specific tag (quote the #)
conductor run 'qa-bot@official#main' # branch HEAD (re-resolved on fetch)
```

See [docs/design/registry.md](docs/design/registry.md) for the full design.
70 changes: 70 additions & 0 deletions docs/configuration.md
@@ -13,9 +13,18 @@ workflow:
runtime:
provider: copilot # or 'claude'
default_model: gpt-5.2
temperature: 0.7
max_tokens: 4096
default_reasoning_effort: medium # low | medium | high | xhigh (optional)
# Provider-specific settings...
```

The `default_reasoning_effort` field sets a workflow-wide default for model
reasoning / extended-thinking effort that every provider-backed agent inherits
unless it declares its own `reasoning.effort` override. See
[Reasoning Effort](#reasoning-effort) for the per-provider translation and
constraints.

## Provider Selection

### Copilot Provider
@@ -122,6 +131,67 @@

**Note**: This is output tokens, not context window (200K separate limit)

## Reasoning Effort

Conductor exposes a single, unified `reasoning.effort` knob that controls how
much "thinking" budget the underlying model uses, and translates it to each
provider's native API. Allowed values: `low`, `medium`, `high`, `xhigh`.

Set a workflow-wide default and/or override per agent:

```yaml
workflow:
runtime:
provider: copilot
default_model: gpt-5.2
default_reasoning_effort: medium # workflow-wide default

agents:
- name: explainer
# No reasoning block — inherits `medium` from the runtime default.
prompt: "Explain {{ workflow.input.topic }}"

- name: architect
reasoning:
effort: high # per-agent override wins
prompt: "Design a system for {{ workflow.input.topic }}"
```

Per-agent overrides always win over the workflow-wide default. The
`reasoning.effort` field is **only** valid on standard `agent`-type agents; it
is rejected on `script`, `human_gate`, and `workflow` agents (which do not call
a model).

### Per-provider translation

- **Copilot** — Forwards the chosen effort as `reasoning_effort` to
`CopilotClient.create_session`. The value is validated against the model's
advertised `supported_reasoning_efforts` capability metadata; a
`ValidationError` is raised at startup if the model does not support the
requested effort. Validation is skipped in mock mode or when capability
metadata is unavailable.
- **Claude** — Enables Anthropic's extended thinking via
`messages.create(thinking={"type": "enabled", "budget_tokens": N})` with the
following effort → budget mapping:

  | Effort   | Budget tokens |
  |----------|---------------|
  | `low`    | 2048          |
  | `medium` | 8192          |
  | `high`   | 16384         |
  | `xhigh`  | 32768         |

Extended thinking is only valid on thinking-capable models
(`claude-3-7-*`, `claude-opus-4*`, `claude-sonnet-4*`, `claude-haiku-4*`); a
`ValidationError` is raised otherwise. The provider also auto-coerces
`temperature` to `1.0` (required by the Anthropic API for extended thinking,
logged at INFO) and bumps `max_tokens` to fit `budget + 4096`, capped at
`64000` (logged at INFO when clamped).

Reasoning / thinking content emitted by the model is surfaced via
`agent_reasoning` events and rendered in the dashboard, JSONL logs, and
`-vv` console output for both providers.
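The Claude-side translation described above can be approximated with a short sketch (illustrative only; the real logic lives in the provider and differs in detail):

```python
# Effort -> extended-thinking budget, per the mapping above.
BUDGETS = {"low": 2048, "medium": 8192, "high": 16384, "xhigh": 32768}
MAX_TOKENS_CAP = 64000

def claude_thinking_kwargs(effort, temperature, max_tokens):
    """Build the extra messages.create() kwargs for a given effort level."""
    budget = BUDGETS[effort]
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget},
        # The Anthropic API requires temperature=1.0 with thinking enabled;
        # any configured value is coerced (and logged at INFO).
        "temperature": 1.0,
        # max_tokens must cover the budget plus visible output; a larger
        # user-configured value is preserved, then capped at 64000.
        "max_tokens": min(max(max_tokens, budget + 4096), MAX_TOKENS_CAP),
    }

kwargs = claude_thinking_kwargs("xhigh", temperature=0.7, max_tokens=4096)
assert kwargs["max_tokens"] == 36864  # 32768 + 4096, under the cap
```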

## MCP Servers

Configure [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers for tool access. Both the Copilot and Claude providers support MCP tools.
72 changes: 72 additions & 0 deletions docs/providers/claude.md
@@ -9,6 +9,7 @@ The Claude provider enables Conductor workflows to use Anthropic's Claude models
- [Model Selection](#model-selection)
- [Runtime Configuration](#runtime-configuration)
- [Streaming Limitations](#streaming-limitations)
- [Extended Thinking](#extended-thinking)
- [Troubleshooting](#troubleshooting)
- [Cost Optimization](#cost-optimization)

@@ -287,6 +288,77 @@ Streaming support is planned for Phase 2 (estimated 2-3 weeks):

Track progress in the project roadmap or GitHub issues.

## Extended Thinking

The Claude provider supports Anthropic's extended thinking via the unified
[`reasoning.effort`](../configuration.md#reasoning-effort) field. Set a
workflow-wide default with `runtime.default_reasoning_effort` and/or override
per agent with a `reasoning.effort` block:

```yaml
workflow:
runtime:
provider: claude
default_model: claude-sonnet-4.5
default_reasoning_effort: medium

agents:
- name: planner
reasoning:
effort: high # per-agent override
prompt: "Plan a deployment for {{ workflow.input.service }}"
```

### Effort → thinking budget

The unified effort level is translated into Anthropic's
`messages.create(thinking={"type": "enabled", "budget_tokens": N})` parameter:

| Effort   | Budget tokens |
|----------|---------------|
| `low`    | 2048          |
| `medium` | 8192          |
| `high`   | 16384         |
| `xhigh`  | 32768         |

### Supported models

Extended thinking is only valid on thinking-capable models. The provider
accepts any model whose name starts with one of:

- `claude-3-7-*`
- `claude-opus-4*`
- `claude-sonnet-4*`
- `claude-haiku-4*`

Requesting `reasoning.effort` on any other model raises a `ValidationError` at
startup so you fail fast instead of silently dropping the budget.
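The capability check amounts to a prefix match on the model name, roughly as follows (a sketch assuming the documented prefixes are the full list):

```python
# Thinking-capable model families, per the list above.
THINKING_PREFIXES = (
    "claude-3-7-", "claude-opus-4", "claude-sonnet-4", "claude-haiku-4"
)

def supports_extended_thinking(model: str) -> bool:
    """True when the model name starts with a thinking-capable prefix."""
    return model.startswith(THINKING_PREFIXES)

assert supports_extended_thinking("claude-opus-4-20250514")
assert not supports_extended_thinking("claude-3-5-sonnet-20241022")
```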

### Auto-coercion of `temperature` and `max_tokens`

When extended thinking is enabled, the Anthropic API requires `temperature=1.0`
and a `max_tokens` value large enough to contain both the thinking budget and
the visible response. The provider handles this for you:

- **`temperature`**: coerced to `1.0` (logged at INFO if you configured a
different value).
- **`max_tokens`**: bumped to `budget + 4096`, capped at `64000` (logged at INFO
when clamped).

This means you don't need to hand-tune `max_tokens` when raising the effort —
the provider will widen the output budget to fit. If you've explicitly set a
`max_tokens` higher than `budget + 4096`, your value is preserved.

### Reasoning content in events

Any thinking content the model returns is surfaced as `agent_reasoning` events
alongside the regular `agent_message` stream, and shows up in the dashboard
detail panel, the JSONL log, and the `-vv` console output. The Copilot provider
emits the same event shape so workflows that mix providers render consistently.

See [`examples/reasoning-effort.yaml`](../../examples/reasoning-effort.yaml) for
a runnable end-to-end example.

## Troubleshooting

### Common Errors and Solutions
26 changes: 26 additions & 0 deletions docs/providers/comparison.md
@@ -12,6 +12,7 @@ This guide helps you choose between GitHub Copilot and Anthropic Claude provider
| **Model Selection** | GPT-5.2, o1 | Haiku, Sonnet, Opus | Tie |
| **Streaming** | Yes | No (Phase 1) | Copilot |
| **Tool Support** | Yes (MCP, all types) | Yes (MCP, stdio only) | Copilot |
| **Reasoning / Extended Thinking** | Yes (`reasoning_effort` on session) | Yes (extended `thinking` budget) | Tie |
| **Speed** | Fast | Fast | Tie |
| **Output Quality** | Excellent | Excellent | Tie |
| **Cost Predictability** | High (flat rate) | Variable (usage-based) | Copilot |
@@ -242,6 +243,31 @@ agents:

See the [MCP Tools guide](../mcp-tools.md) for details.

### Reasoning / Extended Thinking

Both providers expose a unified [`reasoning.effort`](../configuration.md#reasoning-effort)
field (`low` | `medium` | `high` | `xhigh`) at workflow scope
(`runtime.default_reasoning_effort`) or per agent (`reasoning.effort`).
Conductor translates the value to each provider's native API:

**Copilot**:
- Forwarded as `reasoning_effort` on `CopilotClient.create_session`
- Validated against the model's advertised `supported_reasoning_efforts`

**Claude**:
- Translated to `messages.create(thinking={"type": "enabled", "budget_tokens": N})`
- Effort → budget: low=2048, medium=8192, high=16384, xhigh=32768 tokens
- Restricted to thinking-capable models (`claude-3-7-*`, `claude-opus-4*`,
`claude-sonnet-4*`, `claude-haiku-4*`)
- Auto-coerces `temperature=1.0` and bumps `max_tokens` to fit the budget

Reasoning content from either provider surfaces as `agent_reasoning` events
in the dashboard, JSONL log, and `-vv` console output.

**Winner**: Tie (both support it; pick the provider on other grounds)

See [`examples/reasoning-effort.yaml`](../../examples/reasoning-effort.yaml).

## Migration Path

### From Copilot to Claude