
feat: multi-dimensional quality scoring for structured outputs #5

Open
sungdark wants to merge 2 commits into Mint-Claw:main from sungdark:bounty-1-quality-scorer


@sungdark

Summary

Implements bounty #1: scoring structured submissions (JSON/markdown/code/text) with a weighted 0-1 quality score and per-dimension feedback.

Included

    • format auto-detection
    • 5 rubric dimensions: completeness, format compliance, coverage, clarity, validity
    • weighted output schema:

    • schema validation
    • multi-format scoring coverage
    • benchmark check: 100 submissions scored in under 10 seconds
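
For reviewers who want a concrete picture of how five per-dimension scores could roll up into a single weighted 0-1 score, here is a minimal sketch. It is not taken from this PR's code: the function name `weighted_score`, the weight values, and the output keys are all assumptions for illustration.

```python
from typing import Dict

# Hypothetical weights; the values actually used in this PR are not shown in the thread.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "completeness": 0.30,
    "format_compliance": 0.20,
    "coverage": 0.20,
    "clarity": 0.15,
    "validity": 0.15,
}


def weighted_score(
    dimensions: Dict[str, float],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> Dict[str, object]:
    """Combine per-dimension scores (each in [0, 1]) into one weighted score."""
    missing = set(weights) - set(dimensions)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")

    # Clamp each dimension so a buggy sub-scorer cannot push the total outside [0, 1].
    clamped = {name: min(1.0, max(0.0, dimensions[name])) for name in weights}

    # Divide by the weight total so the result stays in [0, 1] even if weights do not sum to 1.
    total_weight = sum(weights.values())
    score = sum(clamped[name] * weights[name] for name in weights) / total_weight

    return {"weighted_score": round(score, 4), "dimensions": clamped}
```

Dividing by the weight total keeps the result in range regardless of how the weights are specified, which is one simple way to make the rubric tunable without re-checking that the weights sum to 1.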

Validation

Run:

All tests pass.

@sungdark
Author

Follow-up update from my side:

I pushed an improvement commit to make the scorer easier to tune and review:

  • Added configurable weights (auto-normalized)
  • Improved per-dimension feedback wording
  • Added evaluate_against_ground_truth() helper for dataset-level error reporting
  • Expanded tests to include schema, format coverage, benchmark, and ground-truth alignment checks
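
To make the first and third bullets concrete, here is a rough sketch of what weight auto-normalization and a dataset-level error report could look like. The names `normalize_weights` and `ground_truth_report`, their signatures, and the choice of mean/max absolute error are assumptions for illustration; the actual `evaluate_against_ground_truth()` helper in this PR may take different arguments and report different metrics.

```python
from typing import Callable, Dict, List, Tuple


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative weights so they sum to 1 (hypothetical helper)."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def ground_truth_report(
    scorer: Callable[[str], float],
    dataset: List[Tuple[str, float]],
) -> Dict[str, float]:
    """Compare scorer output with expected scores over (submission, expected) pairs."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    # Absolute error per submission between the predicted and the expected score.
    errors = [abs(scorer(text) - expected) for text, expected in dataset]
    return {
        "count": len(errors),
        "mean_absolute_error": sum(errors) / len(errors),
        "max_absolute_error": max(errors),
    }
```

A report of this shape gives a calibration dataset something to be checked against: thresholds or weights can be adjusted and the error figures compared before and after.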

Happy to tighten/adjust thresholds if you share your preferred calibration dataset.
