possible refactor

goals: 
- make benchtools more extendible to novel models/runners (e.g. so that a user can define their own model type without having to modify benchtools; so it's faster for us to add more)
- work toward enabling agents (i.e. a loop-based model that will iterate multiple times and pass multiple responses) 
- enable passing more [options](https://docs.ollama.com/api/chat#body-options) to the apicall/runner (eg. max tokens)

likely path to implement: 
- add a run* method bench runner object
- move the logic out of the cases in task.run to bench runner

recommendation:
start by creating bench runner object's run* method moving the case logic out of task.run. . Start so that task.run calls that new method and see how that works/ how it is to explain.  Then consider if the following ideas might help make it easier to modify.

*this method might be named something else and even the object name might be changed

ideas to consider: 
- possibly task.run yeilds responses to benchmark (or other call mechanism) instead of doing the logging in here
- possibly benchmark calls the logger-- this could be more modular and might enable logging more benchmark details with responses without passing benchmark info into the task in weird ways. 
- should runner object be passed to the task or the task to the runner? 


requirements:
- tasks can be run without a benchmark (e.g. for testing or for running a subset of a benchmark); task.run does not necessarily need to generate log files 
- 

context:
- i think this would help supporting agents because i think we can set the spec so that the agent class yeilds the interim responses, which then get logged; that would be a way for benchtools to not need to know the loop. The benchmark or runner would need to store that it is
- see how the custom scorerers/ custom responses are implemented. what would it take for a user-provided model interface? 
- see #71 


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

possible refactor #84

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

possible refactor #84

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions