All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Scope-level oversight hooks in
nh.scope(oversight=...)for synchronous tool-call inspection and step-commit inspection. - Oversight trace events and backend/wrapper coverage for accept/reject decision paths.
- Validation for
vote(min_success=...)bounds (>= 1and<= count) with matching tests. - Tool-boundary regression test ensuring recoverable
ToolBoundaryErroris wrapped into JSON payload.
- Finalized committed binding validation at step boundaries, added write-root dirty tracking for dotted
nh_assign, and aligned prompt/docs with the new validation contract. - Introduced
ExecutionRef(run_id,scope_id, optionalstep_id) and renamed runtime accessor toget_execution_ref. - Added top-level module access for
oversightandresilience, with docs and public API tests aligned. - Consolidated tool-call oversight inspection to a single wrapper-side entry point and removed duplicate inspection guard state.
- Unified tool result handling to TypedDict-based envelope types (
ToolResult,ToolError) instead of Pydantic models. - Clarified tool-boundary policy: recoverable tool-call failures are JSON-envelope responses, host invariant violations propagate as Python exceptions.
- Updated specification and runtime configuration docs to codify envelope-vs-raise boundary semantics for tool-call failures.
- Updated API and patterns docs for
ExecutionRef,get_execution_ref, and oversight/resilience namespace exposure.
- Backend handler now preserves wrapped tool error payloads (
{"value": ..., "error": ...}) without dropping error details. - Tool-call context propagation now preserves step identity in execution ref during step execution.
- Redesigned
nh.scope()aroundmodesemantics ("inherit"default,"replace"for explicit replacement). - Scope prompt suffix arguments now use list-based forms:
system_prompt_suffix_fragmentsanduser_prompt_suffix_fragments.
- Removed
step_executor_configuration_patchfromnh.scope(). - Removed
StepExecutorConfigurationPatchfrom the public API.
- Clarified and aligned docs/tests for scope merge vs replace behavior.
- Unit tests covering prompt token-budget injection: system prompt resolves
$tool_result_max_tokens, and custom user prompt templates can resolve the same placeholder.
- Default step system prompt now states that tool result
valueis a preview and includes the injected max-token limit. - User prompt template rendering now uses
Template.safe_substitute, aligned with system prompt injection behavior and compatible with optional$tool_result_max_tokensplaceholders.
nighthawk.UsageMeter: run-scoped, thread-safe LLM token usage accumulator. Created automatically bynh.run()and readable vianh.get_current_usage_meter().nighthawk.resilience.budgettransformer: composable token and cost budget enforcement with pre-call and post-call checks. Parameters:tokens,tokens_per_call,cost,cost_per_call,cost_function,estimate_usage.BudgetExceededError,BudgetLimitKind,CostFunctionsupporting types.- OpenTelemetry span event
nighthawk.resilience.budget.exceededandnighthawk.resiliencelogger warning on budget violation.
- Resilience OpenTelemetry events for retry/timeout/circuit paths:
nighthawk.resilience.retry.attempt,nighthawk.resilience.retry.exhausted,nighthawk.resilience.timeout.triggered,nighthawk.resilience.circuit.opened.
- Project status promoted from Alpha to Beta.
- Updated one-line description.
- Removed "experimental" language from README and documentation.
- Updated PyPI keywords for improved discoverability.
- Generalized
StepContextimplicit references to value-based mappings (implicit_reference_name_to_value), and added additive scope injection vianh.scope(implicit_references={...})across nested scopes.
- Implicit type alias discovery: callable signatures in step locals and referenced globals are now scanned for PEP 695
TypeAliasTypereferences, automatically including their definitions in the prompt globals section so the LLM can resolve type names like-> Labels.
nh_evalandnh_assignprovided tools are now async, directly awaiting coroutines in async contexts instead of bridging through a background thread.
nighthawk.resiliencemodule with composable function transformers for production resilience:retrying(tenacity-based),fallback,vote/plurality,timeout,circuit_breaker/CircuitState/CircuitOpenError.tenacity>=9as a core dependency.
BackendModelSettingsbase class andClaudeCodeModelSettingsintermediate class extracting shared settings across coding agent backends.
- Refactored backend settings hierarchy: extracted shared fields (
allowed_tool_names,working_directory) intoBackendModelSettingsand Claude Code fields (max_turns,permission_mode,setting_sources) intoClaudeCodeModelSettings; renamedclaude_executable/codex_executabletoexecutable,claude_max_turnstomax_turns. nh_assignnow resolves type annotations viaget_type_hintsfor plain classes and dataclasses, enabling type-mismatch retry beyond Pydantic models.- Simplified intent hint formatting: dropped
intent:prefix from callable metadata comments in prompt context. - Renamed
NIGHTHAWK_RUN_INTEGRATION_TESTStoNIGHTHAWK_OPENAI_INTEGRATION_TESTSfor consistency with other per-backend environment variables. - Restructured documentation into sectioned navigation: split monolithic tutorial into focused guides (
natural-blocks,executors,runtime-configuration,patterns,verification,pydantic-ai-providers); renameddesign.mdtospecification.md; removedpractices.md,providers.md,tutorial.md.
evals/promptfoo/evaluation harness for system prompt optimization using promptfoo: custom Python provider, reusable assertions, and prompt variant A/B comparison support. SeeCONTRIBUTING.mdfor usage.docs/philosophy.md: design philosophy and motivation behind Nighthawk.docs/practices.md: practical patterns and binding function design guidance (extracted from tutorial).
- Replaced
return_reference_pathwithreturn_expressionin step execution contract: return values are now specified as Python expressions evaluated against step locals/globals, consistent withnh_eval/nh_assignexpression evaluation. This unblocks coding-agent backends (e.g. Claude Code CLI) that compute results via native tools without bridging values throughnh_assign. nh_assignnow infers binding types from initial values when no explicit annotation is provided, enabling type-mismatch retry for unannotated write bindings.- Default
json_renderer_stylechanged from"strict"to"default", making truncation visible via…omission markers in prompt context and tool results. - Merged
nh_execintonh_eval:nh_evalnow handles expression evaluation, function calls, and in-place mutation.nh_execis removed. - Condensed system prompt: simplified tool selection guidance (single
nh_evaltool), added execution order section, clarified tool result format. - Condensed step execution contract (outcome prompt suffix) for reduced token usage.
- Improved
nh_assignandnh_evaltool descriptions for LLM clarity. - Restructured documentation: rewrote
index.md,tutorial.md,for-coding-agents.md; cross-referenced specification and practice guides. - Integration tests: replaced single
NIGHTHAWK_RUN_INTEGRATION_TESTSgate with per-backend environment variables (NIGHTHAWK_CODEX_INTEGRATION_TESTS,NIGHTHAWK_CLAUDE_SDK_INTEGRATION_TESTS,NIGHTHAWK_CLAUDE_CLI_INTEGRATION_TESTS).
nh_exectool (functionality absorbed bynh_eval).- Three redundant OpenAI integration tests from
test_llm_integration.py(covered by promptfoo evaluation harness). pytest_sessionstartcredential-check hook (replaced by per-backend skip helpers).
0.4.0 - 2026-03-20
nighthawk.testingmodule with test executors and convenience factories for deterministic Natural function testing without LLM API calls.
- Rewrote testing documentation in
tutorial.md(Section 8) andfor-coding-agents.md(Section 8): replaced incorrectTestModelusage withnighthawk.testingutilities, added testing strategy guidance distinguishing mock tests (Python logic) from integration tests (Natural block judgment).
0.3.1 - 2026-03-19
- Internal ID generation now uses
ulid.generate_ulid()(ULID, 26-character Crockford Base32, timestamp-sortable) in a dedicated module, replacing the formergenerate_idembedded inruntime.scoping.
0.3.0 - 2026-03-18
system_prompt_suffix_fragment_scopeanduser_prompt_suffix_fragment_scopecontext managers for lightweight prompt fragment management without full scope overhead.- OpenTelemetry tracer now reports
instrumenting_library_version.
- Simplified OpenTelemetry span hierarchy: removed implicit
nighthawk.scopespans andnighthawk.step_executorspans.nighthawk.scopespans are now emitted only for explicitnh.scope()calls. nighthawk.runspan no longer includesscope.idattribute; onlyrun.idis emitted.- Trimmed
for-coding-agents.mdfor coding-agent relevance: removed deprecated@nh.toolreferences, condensed exception hierarchy, scoped overrides, and added debugging context toStepContextLimits.
0.2.0 - 2026-03-16
- Added compact step trace support for Natural block execution attempts:
StepTraceStepTraceErrornighthawk.get_step_traces()
- Updated CI workflow setup (
setup-uv) in project automation.
- License badge reference in README.
- Documentation formatting inconsistencies.
0.1.0 - 2026-03-13
- Initial public release of
nighthawk-python. - Natural DSL execution runtime with run/scope execution context model.
- Step executor abstraction and provider integration foundation.
- Core documentation and project scaffolding.