Pull requests: UKGovernmentBEIS/inspect_ai
Pull requests list
Parallel tools: Strengthened system-prompt guidance encouraging react() and deepagent() to issue independent tool calls in parallel
#4085
opened May 29, 2026 by
jjallaire
Collaborator
fix(openrouter): replay reasoning_content for deepseek v4
#4080
opened May 28, 2026 by
g6000
docs(scorer): clarify chat_history comment for include_history=True
#4073
opened May 28, 2026 by
RecreationalMath
Contributor
1 of 5 tasks
docs: clarify include_history grading behavior
#4072
opened May 28, 2026 by
he-yufeng
Contributor
Extract SampleRunner from task_run_sample (PR 1 of EvalSession RFC)
#4070
opened May 28, 2026 by
sinamoeini • Draft
5 tasks
Bound transcript memory for long-running samples
qualified
#4062
opened May 27, 2026 by
rasmusfaber
Contributor
3 of 5 tasks
Categorical score support: frequency() metric, StrEnum values, value_schema round-trip
qualified
#4058
opened May 27, 2026 by
kaifronsdal
Collaborator
6 tasks done
vllm-completions: accept pre-tokenized prompts
#4055
opened May 26, 2026 by
Butanium
Contributor
fix(scorer): return NOANSWER (not INCORRECT) when model_graded grade parser fails
#4048
opened May 26, 2026 by
vladmesh
1 of 5 tasks
Add Krippendorff's α metric for multi-judge agreement
#4035
opened May 25, 2026 by
joesposito8
Contributor
2 of 5 tasks
Arena: Add pairwise comparison with win rate and Elo metrics
#4034
opened May 25, 2026 by
showpiecep • Draft
Fix Bedrock provider to support adaptive thinking and output_config (closes #3765)
#4020
opened May 22, 2026 by
ernestprovo23
Contributor
2 tasks done