### Describe the bug
Summary
Every built-in metric in arksim/evaluator/builtin_metrics.py follows the
same pattern:

    class HelpfulnessMetric(QuantitativeMetric):
        def __init__(self, llm: LLM) -> None:
            super().__init__(name="helpfulness")
            self._llm = llm  # ← set directly, bypasses mixin

The super().__init__() call omits llm=llm, and self._llm is then
assigned manually, which bypasses _LLMMixin's intended initialization path.
This is repeated identically across 7 classes. The file even contains a
# TODO acknowledging the duplication:

    # TODO: we can define a shared metric class that inherits from quant and qual
    # metric that has the _llm, _system_prompt, and _user_prompt_template...

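
To illustrate why the bypass matters, here is a minimal sketch using hypothetical stand-in classes (`LLMMixinSketch`, `MetricSketch`, `FakeLLM`). None of these are ArkSim's real classes, and the `_llm_configured` flag is an assumption standing in for whatever setup the real `_LLMMixin` may perform in its `__init__`. If the mixin does anything beyond storing the attribute, a metric that assigns `self._llm` directly never triggers it:

```python
class FakeLLM:
    """Placeholder for an LLM client (hypothetical)."""


class LLMMixinSketch:
    """Stand-in for _LLMMixin; the real mixin may behave differently."""

    def __init__(self, llm=None, **kwargs):
        super().__init__(**kwargs)
        self._llm = llm
        # Assumed init-time bookkeeping that depends on llm being passed in.
        self._llm_configured = llm is not None


class MetricSketch(LLMMixinSketch):
    """Stand-in for QuantitativeMetric."""

    def __init__(self, name, llm=None):
        super().__init__(llm=llm)
        self.name = name


class BypassingMetric(MetricSketch):
    def __init__(self, llm):
        super().__init__(name="helpfulness")  # llm omitted
        self._llm = llm  # set directly; the mixin never sees it


class ForwardingMetric(MetricSketch):
    def __init__(self, llm):
        super().__init__(name="helpfulness", llm=llm)  # mixin handles llm


llm = FakeLLM()
bypassing = BypassingMetric(llm)
forwarding = ForwardingMetric(llm)

print(bypassing._llm_configured)   # False
print(forwarding._llm_configured)  # True
```

Both instances end up with the same `_llm` object, so the current code works today; the difference only shows up once the mixin's initializer grows additional behavior.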
### Steps to reproduce
Run the code
### Expected behavior
Suggested approach
Introduce an intermediate base class for LLM-backed metrics that stores the
prompt templates and handles _llm initialization correctly:

    class _PromptMetric(QuantitativeMetric):
        """Base for built-in metrics that call an LLM with a fixed prompt pair."""

        _system_prompt: str
        _user_prompt_template: str

        def __init__(self, name: str, llm: BaseLLM) -> None:
            super().__init__(name=name, llm=llm)  # correctly passes llm to mixin

        def score(self, score_input: ScoreInput) -> QuantResult:
            response = self.llm.call(
                [
                    {"role": "system", "content": self._system_prompt},
                    {"role": "user", "content": self._user_prompt_template.format(
                        **score_input.model_dump()
                    )},
                ],
                schema=ScoreSchema,
            )
            return QuantResult(name=self.name, value=response.score, reason=response.reason)

Each concrete metric then only needs to declare its prompts:

    class HelpfulnessMetric(_PromptMetric):
        _system_prompt = helpfulness_system_prompt
        _user_prompt_template = helpfulness_user_prompt

        def __init__(self, llm: BaseLLM) -> None:
            super().__init__(name="helpfulness", llm=llm)

Benefits
- Resolves the existing # TODO
- _llm is initialized via the mixin, which is the intended path
- Adding a new built-in metric only takes a short subclass that declares its name and prompts
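
As a rough check of the last point, the pattern can be exercised end to end without ArkSim installed. Everything below is a hypothetical stand-in: `ScoreInput`, `QuantResult`, `FakeLLM`, `PromptMetricSketch`, and the `ClarityMetric` example are illustrative names only, dataclasses replace the real models, and the schema argument is ignored. It is a sketch of the idea, not the proposed implementation:

```python
from dataclasses import asdict, dataclass


@dataclass
class ScoreInput:  # hypothetical stand-in for ArkSim's ScoreInput
    question: str
    answer: str


@dataclass
class QuantResult:  # hypothetical stand-in for ArkSim's QuantResult
    name: str
    value: float
    reason: str


@dataclass
class FakeResponse:
    score: float
    reason: str


class FakeLLM:
    """Records the messages it receives and returns a canned response."""

    def __init__(self):
        self.last_messages = None

    def call(self, messages, schema=None):
        self.last_messages = messages
        return FakeResponse(score=4.0, reason="canned")


class PromptMetricSketch:
    """Shared base: subclasses only declare a name and a prompt pair."""

    _system_prompt: str
    _user_prompt_template: str

    def __init__(self, name, llm):
        self.name = name
        self._llm = llm

    def score(self, score_input):
        user_content = self._user_prompt_template.format(**asdict(score_input))
        response = self._llm.call(
            [
                {"role": "system", "content": self._system_prompt},
                {"role": "user", "content": user_content},
            ]
        )
        return QuantResult(name=self.name, value=response.score, reason=response.reason)


class ClarityMetric(PromptMetricSketch):  # hypothetical new metric
    _system_prompt = "Rate the clarity of the answer from 1 to 5."
    _user_prompt_template = "Question: {question}\nAnswer: {answer}"

    def __init__(self, llm):
        super().__init__(name="clarity", llm=llm)


llm = FakeLLM()
result = ClarityMetric(llm).score(ScoreInput(question="What is 2 + 2?", answer="4"))
print(result)
```

One thing to watch with this design: because the user prompt goes through `str.format`, any literal braces in a prompt template (for example a JSON snippet) would need to be doubled as `{{` and `}}`.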
### Error output or logs
```shell
```
### ArkSim version
1.0
### Python version
3.11
### Operating system
macOS