feat(reasoning): add reasoning effort configuration to providers #152
Merged
Adds an optional unified reasoning.effort field at both the runtime
default level (runtime.default_reasoning_effort) and per-agent level
(reasoning.effort), with values low | medium | high | xhigh. Per-agent
overrides the runtime default.
Each provider translates the unified value to its native API:
- Copilot: passes reasoning_effort to CopilotClient.create_session.
Validates against the model's supportedReasoningEfforts (from
list_models()) and raises ValidationError with a clear message when
the model does not support the requested effort. Skipped in mock-
handler mode.
- Claude: enables extended thinking via messages.create(thinking={...}).
Effort to budget mapping: low=2048, medium=8192, high=16384,
xhigh=32768 tokens. Auto-coerces temperature to 1.0 and bumps
max_tokens to fit budget+4096 (capped at 64000 for thinking-enabled
requests). Validates against thinking-capable model prefixes
(claude-3-7-, claude-opus-4, claude-sonnet-4, claude-haiku-4).
Surfaces thinking content blocks via the existing agent_reasoning
event callback (provider parity with Copilot's assistant.reasoning).
Provider parity is preserved: both providers accept the same field with
identical semantics, raise ValidationError consistently when the model
does not support reasoning, and emit agent_reasoning events.
New shared helper module src/conductor/providers/reasoning.py
centralizes the effort-to-budget mapping, the Claude model prefix
allow-list, and the agent/runtime resolution logic.
Schema validators forbid reasoning on script, human_gate, and
sub-workflow agent types (parity with how 'model' and 'retry' are
handled).
Includes:
- Schema tests (21 cases, parametrized)
- Copilot provider tests (7 cases)
- Claude provider tests (13 cases)
- E2E workflow + factory wiring tests (4 cases)
- examples/reasoning-effort.yaml demonstrating runtime default and
per-agent override
- AGENTS.md updated with reasoning effort key pattern and provider
parity bullet
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jrob5756 added a commit that referenced this pull request on May 6, 2026:
* chore: release 0.1.12
  Bumps version to 0.1.12 and updates CHANGELOG with the four PRs merged since v0.1.11:
  - #149: Windows install diagnostics
  - #151: Tag-based registry versioning with # ref syntax
  - #152: Unified reasoning.effort configuration
  - #153: Dashboard layout fix for human_gate options + loop-backs
* chore: add #155 (Windows update reliability) to 0.1.12 changelog
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Adds an optional unified `reasoning.effort` field that lets users dial up model "thinking" / reasoning at both the workflow level (`runtime.default_reasoning_effort`) and per-agent (`reasoning.effort`). The per-agent value overrides the runtime default.
Allowed values: `low`, `medium`, `high`, `xhigh`.
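The precedence rule can be illustrated with a minimal sketch. The function name `resolve_effort` and its signature are hypothetical and are not taken from `src/conductor/providers/reasoning.py`; only the allowed values and the "per-agent beats runtime default" rule come from this PR.

```python
from typing import Literal, Optional, get_args

# The four unified effort levels named in this PR.
Effort = Literal["low", "medium", "high", "xhigh"]
ALLOWED_EFFORTS = get_args(Effort)


def resolve_effort(
    agent_effort: Optional[str], runtime_default: Optional[str]
) -> Optional[str]:
    """Return the effective effort (hypothetical helper).

    The per-agent value wins; the runtime default is the fallback;
    None means reasoning was not requested at either level.
    """
    effort = agent_effort if agent_effort is not None else runtime_default
    if effort is not None and effort not in ALLOWED_EFFORTS:
        raise ValueError(
            f"invalid reasoning effort {effort!r}; expected one of {ALLOWED_EFFORTS}"
        )
    return effort
```

With this sketch, an agent that sets `high` under a runtime default of `low` resolves to `high`, and an agent that sets nothing inherits `low`.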
Per-provider translation
Copilot
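- Passes `reasoning_effort` to `CopilotClient.create_session`.
- Validates against the model's `supportedReasoningEfforts` (from `list_models()`) and raises `ValidationError` with a clear message when the model does not support the requested effort.
- Validation is skipped in mock-handler mode.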
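A rough sketch of the Copilot-side check, under stated assumptions: `ValidationError` here is a local stand-in for the project's exception, `check_supported_effort` is a hypothetical name, and the supported list is passed in directly rather than fetched through `list_models()`.

```python
from typing import Optional, Sequence


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


def check_supported_effort(
    model: str, effort: Optional[str], supported: Sequence[str]
) -> dict:
    """Return extra session kwargs, or raise if the model lacks the effort.

    `supported` plays the role of the model's supportedReasoningEfforts.
    """
    if effort is None:
        # Nothing requested: the key stays absent from the session kwargs.
        return {}
    if effort not in supported:
        raise ValidationError(
            f"model {model!r} does not support reasoning effort {effort!r}; "
            f"supported: {list(supported) or 'none'}"
        )
    return {"reasoning_effort": effort}
```

Returning an empty dict when no effort is set mirrors the "key absent when unset" behavior listed under Tests.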
Claude
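- Enables extended thinking via `messages.create(thinking={"type":"enabled","budget_tokens":N})`.
- Effort-to-budget mapping: `low`=2048, `medium`=8192, `high`=16384, `xhigh`=32768 tokens.
- Auto-coerces `temperature` to 1.0 and bumps `max_tokens` to fit `budget+4096` (capped at 64000 for thinking-enabled requests).
- Validates against thinking-capable model prefixes (`claude-3-7-`, `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`) and raises `ValidationError` for non-thinking models.
- Surfaces thinking content blocks via the existing `agent_reasoning` event (provider parity with Copilot's `assistant.reasoning`).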
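The Claude translation can be sketched as a pure function that builds the keyword arguments for `messages.create`. The budgets, prefixes, 4096-token headroom, and 64000 cap are the values stated in this PR; the function name `thinking_kwargs`, the constant names, and the local `ValidationError` class are assumptions made for illustration.

```python
from typing import Optional

EFFORT_TO_BUDGET = {"low": 2048, "medium": 8192, "high": 16384, "xhigh": 32768}
THINKING_MODEL_PREFIXES = (
    "claude-3-7-", "claude-opus-4", "claude-sonnet-4", "claude-haiku-4",
)
OUTPUT_HEADROOM = 4096          # room left for the visible answer
THINKING_MAX_TOKENS_CAP = 64000  # cap for thinking-enabled requests


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError (assumption)."""


def thinking_kwargs(model: str, effort: Optional[str], max_tokens: int) -> dict:
    """Build request kwargs for a given effort (hypothetical helper)."""
    if effort is None:
        # Reasoning not requested: leave the request untouched.
        return {"max_tokens": max_tokens}
    if not model.startswith(THINKING_MODEL_PREFIXES):
        raise ValidationError(f"model {model!r} does not support extended thinking")
    budget = EFFORT_TO_BUDGET[effort]
    # Raise max_tokens so it fits the budget plus headroom, then apply the cap.
    bumped = min(max(max_tokens, budget + OUTPUT_HEADROOM), THINKING_MAX_TOKENS_CAP)
    return {
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "temperature": 1.0,  # coerced whenever thinking is enabled
        "max_tokens": bumped,
    }
```

For example, `high` with a configured `max_tokens` of 1024 yields a budget of 16384 and a `max_tokens` of 20480.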
Provider parity
Both providers:
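- Accept the same field with identical semantics.
- Raise `ValidationError` consistently when the model does not support reasoning.
- Emit `agent_reasoning` events, so the dashboard, JSONL logger, and console subscriber render reasoning output consistently.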
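As an illustration of the shared event path, the sketch below forwards thinking blocks from a Claude response to an event callback. Blocks are modeled as plain dicts with `type` and `thinking` keys, matching the shape of thinking content blocks in the Messages API; the callback signature, the payload keys, and the name `emit_reasoning` are assumptions, not the project's actual interface.

```python
from typing import Callable, Iterable, Optional

# Hypothetical callback shape: (event_name, payload).
EventCallback = Callable[[str, dict], None]


def emit_reasoning(
    content_blocks: Iterable[dict],
    on_event: Optional[EventCallback],
    agent_name: str,
) -> int:
    """Emit one agent_reasoning event per thinking block; return the count."""
    if on_event is None:
        return 0
    emitted = 0
    for block in content_blocks:
        # Only non-empty thinking blocks carry reasoning text; skip the rest.
        if block.get("type") == "thinking" and block.get("thinking"):
            on_event(
                "agent_reasoning",
                {"agent": agent_name, "content": block["thinking"]},
            )
            emitted += 1
    return emitted
```

Because every subscriber sees the same event name regardless of provider, downstream renderers need no provider-specific branches.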
Schema
- […] with single `effort` field).
- Schema validators forbid `reasoning` on script, `human_gate`, and sub-workflow agent types (parity with how `model` and `retry` are handled).
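The forbid rule might look roughly like the following. This is a plain-dataclass sketch, not the project's schema code: `AgentSpec`, `check_reasoning_allowed`, and the type strings in `NON_LLM_TYPES` (especially `"workflow"` for sub-workflows) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type names for agents that never call a model.
NON_LLM_TYPES = frozenset({"script", "human_gate", "workflow"})


@dataclass
class AgentSpec:
    name: str
    type: str = "agent"
    reasoning_effort: Optional[str] = None


def check_reasoning_allowed(agent: AgentSpec) -> None:
    """Reject `reasoning` on agent types that do not invoke a model."""
    if agent.type in NON_LLM_TYPES and agent.reasoning_effort is not None:
        raise ValueError(
            f"agent {agent.name!r} of type {agent.type!r} cannot set 'reasoning'"
        )
```

Rejecting the field at schema-validation time surfaces the mistake when the workflow is loaded rather than silently ignoring it at run time.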
Files changed
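- Schema: […] `AgentDef.reasoning`, `RuntimeConfig.default_reasoning_effort`, forbid rules.
- `src/conductor/providers/reasoning.py` (new): effort-to-budget mapping, Claude thinking-model prefix allow-list, agent/runtime resolver.
- Copilot provider: […] validate against `supportedReasoningEfforts`, retry loop fix.
- Claude provider: […] every `messages.create` site, auto-coerce temperature/max_tokens, emit `agent_reasoning` from thinking blocks.
- Factory: […] forwarding to both providers.
- `examples/reasoning-effort.yaml`: demonstrates runtime default and per-agent override.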
Tests
45 new tests:
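- Schema tests (21 cases, parametrized): […] default, per-type forbid rules.
- Copilot provider tests (7 cases): […] per-agent precedence; key absent when unset; `ValidationError` on unsupported effort; mock-mode skip.
- Claude provider tests (13 cases): […] `temperature` coerced to 1.0; `max_tokens` bumped; `ValidationError` on non-thinking model; runtime/per-agent precedence; thinking blocks emit `agent_reasoning`; key absent when unset.
- E2E workflow + factory wiring tests (4 cases).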
Verification
`dialog_evaluator.py` remains)
Out of scope
🤖 Generated with Copilot CLI