
feat(evaluator): add agent constraint enforcement and rename goal_com… #149

Open
nrajanee wants to merge 2 commits into main from feat/agent-constraints

nrajanee commented Apr 17, 2026


Summary

Surfaces agent constraints to the evaluator so correct constraint enforcement is never
misclassified as a behavior failure. Introduces agent_constraints (global, eval-config
level) and agent_response assertions (scenario level) as two complementary ways to declare
what the agent is not permitted to do. Violations surface as a new constraint_violation
label inside agent_behavior_failure, detected by a single LLM call per turn that checks all
constraints at once. Also renames goal_completion to user_goal_completion throughout to
distinguish it from the new constraint-adherence signal.
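To make "a new label inside agent_behavior_failure" concrete, here is a minimal sketch of severity-ordered label merging. It assumes labels are plain strings and that merging means "add constraint_violation, then order labels most-severe first". `Severity`, `LABEL_SEVERITY`, `merge_failure_labels`, and `example_low_severity_label` are hypothetical names; only the constraint_violation/high pairing comes from this PR.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Only the constraint_violation -> HIGH entry reflects this PR
# (CONSTRAINT_VIOLATION, severity: high). The other label is a placeholder.
LABEL_SEVERITY: dict[str, Severity] = {
    "constraint_violation": Severity.HIGH,
    "example_low_severity_label": Severity.LOW,
}


def merge_failure_labels(existing: list[str], constraint_violated: bool) -> list[str]:
    """Hypothetical merge: add constraint_violation, then sort most-severe first."""
    labels = list(existing)
    if constraint_violated and "constraint_violation" not in labels:
        labels.append("constraint_violation")
    # sorted() stays stable with reverse=True, so equal-severity labels keep
    # their original order. Labels missing from LABEL_SEVERITY count as LOW.
    return sorted(
        labels,
        key=lambda label: LABEL_SEVERITY.get(label, Severity.LOW),
        reverse=True,
    )


merged = merge_failure_labels(["example_low_severity_label"], constraint_violated=True)
```

Here `merged` is `["constraint_violation", "example_low_severity_label"]`: the high-severity constraint label leads, and existing labels are kept rather than replaced.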

Changes

  • scenario/entities.py - Added AgentResponseAssertion (type: "agent_response", expected:
    str) and AssertionType.AGENT_RESPONSE; updated the Assertion discriminated union to include
    it alongside ToolCallsAssertion
  • evaluator/entities.py - Added agent_constraints: list[str] to EvaluationInput and
    EvaluationParams; added agent_constraints and expected_behavior fields to TurnItem; added
    constraints_fulfilled: list[str] to TurnEvaluation; renamed goal_completion_score/reason to
    user_goal_completion_score/reason in ConversationEvaluation with a model_validator migration
    shim and deprecated @property aliases for one release cycle
  • evaluator/builtin_metrics.py - AgentBehaviorFailureMetric now receives agent_context,
    agent_constraints, and expected_behavior in every call so the LLM can suppress
    constraint-related false positives; GoalCompletionMetric receives agent_context; added
    ConstraintViolationMetric with build_constraints_list(), which merges global agent_constraints
    and scenario-level expected_behavior into one unified list for a single all-at-once LLM call
  • evaluator/evaluate.py - agent_context/agent_constraints/expected_behavior wired from
    TurnItem into ScoreInput; constraint violation check runs per turn when any constraints are
    defined, independent of the behavior threshold; severity-based merging with existing failure
    labels; evaluate_goal_completion passes agent_context and accepts "goal_completion" as a
    deprecated alias for "user_goal_completion"
  • evaluator/evaluator.py - Builds _scenario_agent_response mapping from agent_response
    assertions at init; passes agent_constraints and expected_behavior through _process_input
    into every TurnItem; run_evaluation threads agent_constraints from EvaluationInput into
    EvaluationParams
  • evaluator/utils/prompts.py - Updated agent_behavior_failure system/user prompts to accept
    agent_context, agent_constraints, expected_behavior; updated goal_completion prompt to
    accept agent_context; added constraint_violation_system_prompt and
    constraint_violation_user_prompt
  • evaluator/utils/enums.py - Added CONSTRAINT_VIOLATION to AgentBehaviorFailureType
    (severity: high); added USER_GOAL_COMPLETION to AgentMetrics
  • evaluator/utils/schema.py - Added ConstraintViolationSchema with violated_constraints,
    fulfilled_constraints, reason
  • evaluator/thresholds.py - "goal_completion" threshold key now also matches
    "user_goal_completion" for backward compat
  • utils/html_report/, ui/frontend/index.html - Updated ConvoRow fields and all JS references
    to user_goal_completion_score/reason
  • tests/ - Updated all unit tests and helpers.py to use user_goal_completion_score/reason
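The new assertion type slots into a Pydantic discriminated union keyed on `type`. The sketch below assumes Pydantic v2. The `"tool_calls"` discriminator value and the empty `ToolCallsAssertion` body are placeholders, since this PR does not describe that model's fields.

```python
from enum import Enum
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class AssertionType(str, Enum):
    AGENT_RESPONSE = "agent_response"


class ToolCallsAssertion(BaseModel):
    # Placeholder: the real fields (and type value) are not described in this PR.
    type: Literal["tool_calls"] = "tool_calls"


class AgentResponseAssertion(BaseModel):
    type: Literal["agent_response"] = "agent_response"
    expected: str  # natural-language statement of the required agent behavior


# Pydantic selects the concrete model by reading the "type" key.
Assertion = Annotated[
    Union[ToolCallsAssertion, AgentResponseAssertion],
    Field(discriminator="type"),
]

assertions = TypeAdapter(list[Assertion]).validate_python(
    [
        {"type": "agent_response", "expected": "Never reveal another customer's order history"},
        {"type": "tool_calls"},
    ]
)
```

Because the union is discriminated, an unknown `type` value fails validation with a clear error instead of silently matching the first model that happens to fit.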

Documentation

  • Updated relevant docs in docs/ (if behavior, config, or API changed)
  • Updated README.md (if installation, quickstart, or usage changed)
  • No docs needed (explain why below)

Docs update for agent_constraints config field and agent_response assertion schema should
follow in a separate docs PR once the feature is validated.

How to Test

Notes

Breaking change: ConversationEvaluation fields goal_completion_score and
goal_completion_reason are renamed to user_goal_completion_score and
user_goal_completion_reason in the JSON output. A model_validator migration shim accepts the
old names on input (for anyone loading old evaluation.json files), and deprecated @property
aliases are provided for Python code accessing the fields directly. Both will be removed in
the next minor release.
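A minimal sketch of the migration shim and deprecated aliases, assuming Pydantic v2 and showing only the two renamed fields. The field types, the `RENAMED_FIELDS` table, and the validator name are assumptions.

```python
import warnings
from typing import Any, Optional

from pydantic import BaseModel, model_validator

# Old JSON key -> new field name, per the rename described above.
RENAMED_FIELDS = {
    "goal_completion_score": "user_goal_completion_score",
    "goal_completion_reason": "user_goal_completion_reason",
}


class ConversationEvaluation(BaseModel):
    # Only the renamed fields are shown; their types are assumptions.
    user_goal_completion_score: Optional[float] = None
    user_goal_completion_reason: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def migrate_renamed_fields(cls, data: Any) -> Any:
        # Runs before field validation, so pre-rename evaluation.json files still load.
        if isinstance(data, dict):
            data = dict(data)  # avoid mutating the caller's dict
            for old, new in RENAMED_FIELDS.items():
                if old in data:
                    value = data.pop(old)
                    data.setdefault(new, value)  # the new name wins if both are present
        return data

    @property
    def goal_completion_score(self) -> Optional[float]:
        warnings.warn(
            "goal_completion_score is deprecated; use user_goal_completion_score",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.user_goal_completion_score

    @property
    def goal_completion_reason(self) -> Optional[str]:
        warnings.warn(
            "goal_completion_reason is deprecated; use user_goal_completion_reason",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.user_goal_completion_reason
```

Serialization only ever emits the new names (properties are not part of `model_dump()`), which is what makes this a breaking change for consumers that read the JSON output.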

Constraint check architecture: constraints are evaluated in a single all-at-once LLM call
per turn (not per-constraint) to contain cost. The call only runs when at least one
constraint is defined for the turn (global or scenario-level). Fulfilled constraint names
are stored in TurnEvaluation.constraints_fulfilled for downstream visibility.
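The gating and single-call behavior can be sketched as follows. This is not the code in evaluator/builtin_metrics.py: the dataclasses, the prompt text, the `call_llm` callable, `check_turn_constraints`, and the signature given to `build_constraints_list` are all hypothetical, and the LLM is assumed to return a dict shaped like ConstraintViolationSchema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TurnItem:
    # Hypothetical subset of TurnItem: only the two fields this check reads.
    agent_constraints: list[str] = field(default_factory=list)
    expected_behavior: Optional[str] = None


@dataclass
class ConstraintCheckResult:
    # Mirrors the ConstraintViolationSchema fields named in this PR.
    violated_constraints: list[str]
    fulfilled_constraints: list[str]  # would feed TurnEvaluation.constraints_fulfilled
    reason: str


def build_constraints_list(
    agent_constraints: list[str], expected_behavior: Optional[str]
) -> list[str]:
    """Merge global constraints and scenario-level expected behavior, dropping blanks and duplicates."""
    merged: list[str] = []
    for item in [*agent_constraints, expected_behavior or ""]:
        text = item.strip()
        if text and text not in merged:
            merged.append(text)
    return merged


def check_turn_constraints(
    turn: TurnItem, call_llm: Callable[[str], dict]
) -> Optional[ConstraintCheckResult]:
    constraints = build_constraints_list(turn.agent_constraints, turn.expected_behavior)
    if not constraints:
        return None  # nothing declared for this turn: skip the LLM call entirely
    # One prompt enumerating every constraint, instead of one call per constraint.
    prompt = "Check the agent's reply against each constraint:\n" + "\n".join(
        f"{i}. {c}" for i, c in enumerate(constraints, start=1)
    )
    raw = call_llm(prompt)
    return ConstraintCheckResult(
        violated_constraints=list(raw.get("violated_constraints", [])),
        fulfilled_constraints=list(raw.get("fulfilled_constraints", [])),
        reason=str(raw.get("reason", "")),
    )
```

With N constraints this costs one LLM call per turn rather than N, at the price of a longer prompt; turns with no constraints cost nothing.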

agent_response assertion scope: declared at the scenario level and applied to every turn
unchanged: the same expected behavior is passed into each turn evaluation as a standing
constraint. Per-turn sequenced assertions (e.g. "verify identity on turn 1, decline on turn
2") are a known MVP limitation and flagged for future work.
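A sketch of how a scenario-level mapping yields the same expected behavior on every turn. `build_scenario_agent_response` and `expected_behavior_for_turn` are hypothetical stand-ins for the `_scenario_agent_response` mapping built in evaluator.py; scenarios are modeled as plain dicts of assertion dicts, and joining several agent_response assertions with newlines is an assumption.

```python
from typing import Optional


def build_scenario_agent_response(scenarios: dict[str, list[dict]]) -> dict[str, str]:
    """Map each scenario id to the expected behavior from its agent_response assertions."""
    mapping: dict[str, str] = {}
    for scenario_id, assertions in scenarios.items():
        expected = [a["expected"] for a in assertions if a.get("type") == "agent_response"]
        if expected:
            # Assumption: multiple agent_response assertions are joined into one string.
            mapping[scenario_id] = "\n".join(expected)
    return mapping


def expected_behavior_for_turn(
    mapping: dict[str, str], scenario_id: str, turn_index: int
) -> Optional[str]:
    # turn_index is intentionally unused: the MVP applies the same standing
    # constraint to every turn, so "verify on turn 1, decline on turn 2" is not expressible.
    del turn_index
    return mapping.get(scenario_id)
```

The unused `turn_index` parameter marks exactly where per-turn sequenced assertions would plug in later.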

/cc @arklexai/arksim-maintainers

nrajanee requested a review from a team as a code owner April 17, 2026 19:20