[model-matchmakah] Learned RouteLLM-style grader

Replace the self-grading confidence call with a small trained classifier that scores each query (RouteLLM-style), then compare its escalation accuracy against the self-grading baseline.

Start from the `agentkit/route.py` usage in `projects/wildcards/model-matchmakah/`.
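
A minimal sketch of what the learned grader and the accuracy comparison could look like. Everything here is an assumption made for illustration: the feature set, the toy labeled data, and the names `featurize`, `train`, `should_escalate`, and `escalation_accuracy` are hypothetical and do not reflect the actual `agentkit/route.py` interface. It swaps in a tiny pure-Python logistic regression as the classifier; a real RouteLLM-style router would more likely train on embeddings and preference data.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical keywords that hint at a hard query (illustrative only).
HARD_WORDS = {"prove", "derive", "debug", "optimize"}

# Toy labeled data: 1 means the cheap model failed and escalation was needed.
TRAIN: List[Tuple[str, int]] = [
    ("What is the capital of France?", 0),
    ("Say hello in Spanish", 0),
    ("Prove that the sum of two even numbers is even and derive the general case", 1),
    ("Debug this race condition in a lock-free queue with 3 producers", 1),
    ("Name a primary color", 0),
    ("Optimize a 4096 by 4096 matrix multiply for cache locality", 1),
]


def featurize(query: str) -> List[float]:
    """Map a query to a small hand-picked feature vector (an assumption)."""
    words = query.split()
    return [
        1.0,                                                    # bias term
        min(len(words) / 20.0, 1.0),                            # normalized length
        1.0 if any(ch.isdigit() for ch in query) else 0.0,      # contains a digit
        1.0 if any(w.lower() in HARD_WORDS for w in words) else 0.0,
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], query: str) -> float:
    """Predicted probability that the query needs the stronger model."""
    return sigmoid(sum(w * x for w, x in zip(weights, featurize(query))))


def train(examples: Sequence[Tuple[str, int]], epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic regression with plain per-example gradient steps."""
    weights = [0.0] * len(featurize(""))
    for _ in range(epochs):
        for query, label in examples:
            x = featurize(query)
            error = label - score(weights, query)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


def should_escalate(weights: Sequence[float], query: str, threshold: float = 0.5) -> bool:
    """Escalate when the predicted failure probability reaches the threshold."""
    return score(weights, query) >= threshold


def escalation_accuracy(decide: Callable[[str], bool], examples: Sequence[Tuple[str, int]]) -> float:
    """Fraction of queries where the escalate/keep decision matches the label."""
    correct = sum(decide(query) == bool(label) for query, label in examples)
    return correct / len(examples)


if __name__ == "__main__":
    weights = train(TRAIN)
    learned = escalation_accuracy(lambda q: should_escalate(weights, q), TRAIN)
    # Placeholder baseline: stands in for the existing self-grading call.
    baseline = escalation_accuracy(lambda q: True, TRAIN)
    print(f"learned={learned:.2f} baseline={baseline:.2f}")
```

Passing the decision function into `escalation_accuracy` lets the existing self-grading call and the learned classifier be scored on the same labeled queries. For a meaningful comparison the evaluation set should be held out from training, which this toy example does not do.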