feat(evaluator): add agent constraint enforcement and rename goal_com…#149
Open
nrajanee wants to merge 2 commits into
Summary
Surfaces agent constraints to the evaluator so correct constraint enforcement is never
misclassified as a behavior failure. Introduces agent_constraints (global, eval-config
level) and agent_response assertions (scenario level) as two complementary ways to declare
what the agent is not permitted to do. Violations surface as a new constraint_violation
label inside agent_behavior_failure via a single all-at-once LLM call. Also renames
goal_completion to user_goal_completion throughout to distinguish it from the new constraint
adherence signal.
Changes
str) and AssertionType.AGENT_RESPONSE; updated the Assertion discriminated union to include
it alongside ToolCallsAssertion
EvaluationParams; added agent_constraints and expected_behavior fields to TurnItem; added
constraints_fulfilled: list[str] to TurnEvaluation; renamed goal_completion_score/reason to
user_goal_completion_score/reason in ConversationEvaluation with a model_validator migration
shim and deprecated @property aliases for one release cycle
agent_constraints, and expected_behavior in every call so the LLM can suppress
constraint-related false positives; GoalCompletionMetric receives agent_context; added
ConstraintViolationMetric with build_constraints_list() that merges global agent_constraints
TurnItem into ScoreInput; constraint violation check runs per turn when any constraints are
defined, independent of the behavior threshold; severity-based merging with existing failure
labels; evaluate_goal_completion passes agent_context and accepts "goal_completion" as a
deprecated alias for "user_goal_completion"
assertions at init; passes agent_constraints and expected_behavior through _process_input
into every TurnItem; run_evaluation threads agent_constraints from EvaluationInput into
EvaluationParams
agent_context, agent_constraints, expected_behavior; updated goal_completion prompt to
accept agent_context; added constraint_violation_system_prompt and
constraint_violation_user_prompt
(severity: high); added USER_GOAL_COMPLETION to AgentMetrics
fulfilled_constraints, reason
"user_goal_completion" for backward compat
to user_goal_completion_score/reason
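To illustrate the severity-based merging mentioned above, here is a minimal sketch, not the PR's actual code. Only the `constraint_violation` label and its `high` severity come from this description; the `SEVERITY_RANK` table, the `merge_failure_labels` helper, the tie-breaking rule, and the other label names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical ranking; only "constraint_violation" (severity: high) is taken from the PR text.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def merge_failure_labels(
    existing: Optional[Tuple[str, str]],
    constraint_violated: bool,
) -> Optional[str]:
    """Return the failure label to report for a turn.

    `existing` is a (label, severity) pair from the behavior check, or None.
    """
    candidates = []
    if existing is not None:
        candidates.append(existing)
    if constraint_violated:
        candidates.append(("constraint_violation", "high"))
    if not candidates:
        return None
    # max() returns the first maximal item, so an existing label wins ties
    # (an assumption of this sketch, not something the PR states).
    return max(candidates, key=lambda item: SEVERITY_RANK[item[1]])[0]
```

Under these assumptions, a low-severity existing label is replaced by `constraint_violation`, while an existing high-severity label is kept.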
Documentation
Docs update for agent_constraints config field and agent_response assertion schema should
follow in a separate docs PR once the feature is validated.
How to Test
Notes
Breaking change: ConversationEvaluation fields goal_completion_score and
goal_completion_reason are renamed to user_goal_completion_score and
user_goal_completion_reason in the JSON output. A model_validator migration shim accepts the
old names on input (for anyone loading old evaluation.json files), and deprecated @property
aliases are provided for Python code accessing the fields directly. Both will be removed in
the next minor release.
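The rename shim can be pictured roughly as follows. This is a dependency-free sketch, not the PR's implementation: the PR uses a Pydantic `model_validator`, whereas this stand-in uses a plain dataclass with a hypothetical `from_dict` constructor, and the real `ConversationEvaluation` has more fields than the two shown.

```python
import warnings
from dataclasses import dataclass

# Old JSON key -> new JSON key, as described in the breaking-change note.
_RENAMED = {
    "goal_completion_score": "user_goal_completion_score",
    "goal_completion_reason": "user_goal_completion_reason",
}


@dataclass
class ConversationEvaluation:  # simplified stand-in for the real model
    user_goal_completion_score: float
    user_goal_completion_reason: str

    @classmethod
    def from_dict(cls, data: dict) -> "ConversationEvaluation":
        """Accept old field names on input (hypothetical helper)."""
        migrated = dict(data)  # do not mutate the caller's dict
        for old, new in _RENAMED.items():
            if old in migrated:
                value = migrated.pop(old)
                # If both names are present, the new name takes precedence.
                migrated.setdefault(new, value)
        return cls(**migrated)

    @property
    def goal_completion_score(self) -> float:
        warnings.warn(
            "goal_completion_score is deprecated; use user_goal_completion_score",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.user_goal_completion_score

    @property
    def goal_completion_reason(self) -> str:
        warnings.warn(
            "goal_completion_reason is deprecated; use user_goal_completion_reason",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.user_goal_completion_reason
```

Loading an old `evaluation.json` payload through `from_dict` populates the new fields, and reading the old attribute names still works but emits a `DeprecationWarning`.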
Constraint check architecture: constraints are evaluated in a single all-at-once LLM call
per turn (not per-constraint) to contain cost. The call only runs when at least one
constraint is defined for the turn (global or scenario-level). Fulfilled constraint names
are stored in TurnEvaluation.constraints_fulfilled for downstream visibility.
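A rough sketch of the single-call flow described above. The name `build_constraints_list` appears in this PR, but its signature here is a guess; `check_constraints`, the `judge` callable, and the deduplication step are hypothetical stand-ins for the real LLM call and merge logic.

```python
from typing import Callable, Dict, List, Optional

# A judge takes the agent reply plus ALL constraints and returns one verdict dict.
Judge = Callable[[str, List[str]], Dict[str, object]]


def build_constraints_list(
    agent_constraints: List[str],
    expected_behavior: List[str],
) -> List[str]:
    """Merge global and scenario-level constraints, dropping blanks and duplicates."""
    merged: List[str] = []
    for constraint in [*agent_constraints, *expected_behavior]:
        constraint = constraint.strip()
        if constraint and constraint not in merged:
            merged.append(constraint)
    return merged


def check_constraints(
    agent_reply: str,
    constraints: List[str],
    judge: Judge,
) -> Optional[Dict[str, object]]:
    """Run one judge call for the whole constraint list, or none at all."""
    if not constraints:
        return None  # nothing declared for this turn, so no LLM cost is incurred
    # One call covers every constraint, rather than one call per constraint.
    return judge(agent_reply, constraints)
```

With N constraints on a turn, this costs one judge call instead of N, and a turn with no constraints costs nothing.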
agent_response assertion scope: declared at the scenario level and applied to every turn
unchanged - the same expected behavior is passed into each turn evaluation as a standing
constraint. Per-turn sequenced assertions (e.g. "verify identity on turn 1, decline on turn
2") are a known MVP limitation and flagged for future work.
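The scoping rule can be sketched as below. This is illustrative only: the `TurnItem` shown is a simplified stand-in with assumed field types, and `build_turn_items` is a hypothetical helper rather than the PR's `_process_input`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TurnItem:  # simplified stand-in; field types are assumptions
    turn_index: int
    agent_response: str
    agent_constraints: List[str] = field(default_factory=list)
    expected_behavior: List[str] = field(default_factory=list)


def build_turn_items(
    agent_responses: List[str],
    agent_constraints: List[str],
    expected_behavior: List[str],
) -> List[TurnItem]:
    """Attach the same global and scenario-level constraints to every turn."""
    return [
        TurnItem(
            turn_index=index,
            agent_response=response,
            # Copies keep turns independent if one list is mutated later.
            agent_constraints=list(agent_constraints),
            expected_behavior=list(expected_behavior),
        )
        for index, response in enumerate(agent_responses)
    ]
```

Because every turn receives an identical `expected_behavior`, a sequenced expectation such as "verify identity first, then decline" cannot be expressed, which is the MVP limitation noted above.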
/cc @arklexai/arksim-maintainers