refactor(evaluator): extract LLMBackedMetric base class to remove duplicated _llm init in built-in metrics

### Describe the bug

#### Summary

Every built-in metric in `arksim/evaluator/builtin_metrics.py` follows the
same pattern:

```python
class HelpfulnessMetric(QuantitativeMetric):
    def __init__(self, llm: LLM) -> None:
        super().__init__(name="helpfulness")
        self._llm = llm          # ← set directly, bypasses mixin
```

The `super().__init__()` call omits `llm=llm`, and `self._llm` is then
assigned manually, bypassing `_LLMMixin`'s intended initialization path.
This is repeated identically across 7 classes. The file even contains a
`# TODO` acknowledging the duplication:

```python
# TODO: we can define a shared metric class that inherits from quant and qual
# metric that has the _llm, _system_prompt, and _user_prompt_template...
```
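To make the concern concrete, here is a minimal sketch built from stand-in classes. `FakeLLM`, `LLMMixinSketch`, `MetricSketch`, and the validation and bookkeeping inside the mixin are all assumptions made for illustration, not ArkSim's actual code. The only point is that whatever the mixin does in `__init__` is skipped when `_llm` is assigned by hand.

```python
class FakeLLM:
    """Stand-in for an LLM client (hypothetical, not an ArkSim class)."""


class LLMMixinSketch:
    """Hypothetical mixin that owns `_llm` and validates it on the way in."""

    def __init__(self, *, llm=None, **kwargs):
        super().__init__(**kwargs)
        if llm is not None and not isinstance(llm, FakeLLM):
            raise TypeError("llm must be a FakeLLM")
        self._llm = llm
        # Bookkeeping that only happens when the mixin receives the llm.
        self.llm_configured = llm is not None


class MetricSketch(LLMMixinSketch):
    """Hypothetical metric base that forwards extra kwargs to the mixin."""

    def __init__(self, *, name, **kwargs):
        super().__init__(**kwargs)
        self.name = name


class BypassingMetric(MetricSketch):
    def __init__(self, llm):
        super().__init__(name="helpfulness")  # llm is not forwarded
        self._llm = llm                       # set by hand, mixin never sees it


class ForwardingMetric(MetricSketch):
    def __init__(self, llm):
        super().__init__(name="helpfulness", llm=llm)  # mixin handles llm


bypassed = BypassingMetric("not an llm")   # accepted silently, no validation
forwarded = ForwardingMetric(FakeLLM())    # validated and recorded by the mixin
print(bypassed.llm_configured, forwarded.llm_configured)
```

In this sketch the bypassing class accepts an invalid value without complaint and leaves the mixin's own state out of sync with `_llm`. Whether the real `_LLMMixin` does comparable work is not shown in this issue, so treat this as an illustration of the risk rather than a reproduction.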

### Steps to reproduce

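Not applicable: this is a refactoring request rather than a runtime failure. The duplicated pattern is visible by reading `arksim/evaluator/builtin_metrics.py`.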

### Expected behavior

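#### Suggested approach

Introduce an intermediate base class for LLM-backed metrics that stores the
prompt templates and handles `_llm` initialization correctly. The sketch
below calls it `_PromptMetric`; the issue title uses `LLMBackedMetric` for
the same class.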

```python
class _PromptMetric(QuantitativeMetric):
    """Base for built-in metrics that call an LLM with a fixed prompt pair."""

    _system_prompt: str
    _user_prompt_template: str

    def __init__(self, name: str, llm: BaseLLM) -> None:
        super().__init__(name=name, llm=llm)   # correctly passes llm to mixin

    def score(self, score_input: ScoreInput) -> QuantResult:
        response = self.llm.call(
            [
                {"role": "system", "content": self._system_prompt},
                {"role": "user", "content": self._user_prompt_template.format(
                    **score_input.model_dump()
                )},
            ],
            schema=ScoreSchema,
        )
        return QuantResult(name=self.name, value=response.score, reason=response.reason)
```

Each concrete metric then only needs to declare its prompts:

```python
class HelpfulnessMetric(_PromptMetric):
    _system_prompt = helpfulness_system_prompt
    _user_prompt_template = helpfulness_user_prompt

    def __init__(self, llm: BaseLLM) -> None:
        super().__init__(name="helpfulness", llm=llm)
```

#### Benefits

- Resolves the existing `# TODO`
- `_llm` is initialized via the mixin, which is the intended path
- Adding a new built-in metric takes only a short subclass that declares its name and two prompts
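The proposal can be exercised end to end without ArkSim or a real model. The sketch below is self-contained and uses stand-ins throughout: `StubLLM`, `PromptMetricSketch`, `HelpfulnessSketch`, the prompt strings, and the dataclass versions of `ScoreInput`, `ScoreSchema`, and `QuantResult` are all hypothetical (the real `ScoreInput` appears to be a pydantic model, given the `model_dump()` call above). The `__init_subclass__` guard is an optional extra, not part of the suggestion in this issue.

```python
from dataclasses import asdict, dataclass


@dataclass
class ScoreInput:  # stand-in for the real input model
    question: str
    answer: str


@dataclass
class ScoreSchema:  # stand-in for the structured LLM response
    score: float
    reason: str


@dataclass
class QuantResult:  # stand-in for the metric result type
    name: str
    value: float
    reason: str


class StubLLM:
    """Records the messages it receives and returns a canned score."""

    def __init__(self):
        self.calls = []

    def call(self, messages, schema):
        self.calls.append(messages)
        return schema(score=4.0, reason="stubbed")


class PromptMetricSketch:
    """Holds the prompt pair and the shared scoring logic."""

    _system_prompt: str
    _user_prompt_template: str

    def __init_subclass__(cls, **kwargs):
        # Optional guard: fail at class definition if a prompt is missing.
        super().__init_subclass__(**kwargs)
        for attr in ("_system_prompt", "_user_prompt_template"):
            if not isinstance(getattr(cls, attr, None), str):
                raise TypeError(f"{cls.__name__} must define {attr}")

    def __init__(self, name, llm):
        self.name = name
        self._llm = llm

    def score(self, score_input):
        user_content = self._user_prompt_template.format(**asdict(score_input))
        messages = [
            {"role": "system", "content": self._system_prompt},
            {"role": "user", "content": user_content},
        ]
        response = self._llm.call(messages, schema=ScoreSchema)
        return QuantResult(name=self.name, value=response.score, reason=response.reason)


class HelpfulnessSketch(PromptMetricSketch):
    _system_prompt = "Rate how helpful the answer is."
    _user_prompt_template = "Q: {question}\nA: {answer}"

    def __init__(self, llm):
        super().__init__(name="helpfulness", llm=llm)


llm = StubLLM()
result = HelpfulnessSketch(llm).score(ScoreInput(question="2+2?", answer="4"))
print(result)
```

Because the scoring logic lives in one place, a single stub-based test like this covers the message construction for every concrete metric; each subclass only contributes its name and prompt strings.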

### Error output or logs

```shell

```

### ArkSim version

1.0

### Python version

3.11

### Operating system

macOS

refactor(evaluator): extract LLMBackedMetric base class to remove duplicated _llm init in built-in metrics #175
