feat: add deterministic quality scoring engine with tests #15

Open
yuliuyi717-ux wants to merge 2 commits into Mint-Claw:main from yuliuyi717-ux:codex/quality-scoring-issue-1

Conversation

@yuliuyi717-ux

/claim #1

Implemented a deterministic, multi-dimensional quality scorer for structured submissions, with the required output schema, benchmark coverage, and sample scorecards.

What is included:

  • quality_scorer.py
    • auto-detects json, markdown, code, text
    • scores 5 dimensions with required weights:
      • completeness 0.30
      • format_compliance 0.20
      • coverage 0.25
      • clarity 0.15
      • validity 0.10
    • returns required schema:
      • weighted_score
      • quality_rating
      • scores
      • feedback
      • pass_threshold
  • tests/test_quality_scorer.py
    • format detection tests
    • output schema test
    • weighted-score consistency test
    • 100 submissions performance test (<10s)
  • ground-truth tolerance test (MAE <= 0.05)
    • sample scorecards file integrity test
  • sample_scorecards.json
    • 20 sample scored outputs
  • scripts/generate_sample_scorecards.py
    • regenerates the sample scorecards
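For reviewers skimming the weights, here is a minimal sketch of the weighted aggregation and output schema described above. The function names, rating cutoffs (0.85 / 0.70), and pass threshold are illustrative assumptions, not the actual `quality_scorer.py` API:

```python
# Illustrative sketch of the weighted scoring scheme described in this PR.
# The real quality_scorer.py may use different names and thresholds.
WEIGHTS = {
    "completeness": 0.30,
    "format_compliance": 0.20,
    "coverage": 0.25,
    "clarity": 0.15,
    "validity": 0.10,
}

def weighted_score(scores: dict) -> float:
    # Each dimension score is assumed to lie in [0, 1]; weights sum to 1.0.
    return round(sum(scores[dim] * w for dim, w in WEIGHTS.items()), 4)

def scorecard(scores: dict, threshold: float = 0.70) -> dict:
    # Hypothetical scorecard matching the required output schema fields.
    total = weighted_score(scores)
    rating = "high" if total >= 0.85 else "medium" if total >= threshold else "low"
    return {
        "weighted_score": total,
        "quality_rating": rating,      # cutoff values here are assumed
        "scores": scores,
        "feedback": [],                # per-dimension feedback would go here
        "pass_threshold": total >= threshold,
    }
```

Because the weights sum to 1.0, a submission scoring 1.0 on every dimension yields a weighted score of exactly 1.0, which keeps the scale easy to interpret.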

Validation:

  • python3 -m unittest discover -s tests -p 'test_*.py'

@yuliuyi717-ux
Author

Quick verification note:

  • python3 -m unittest discover -s tests -p 'test_*.py' passes locally.
  • The suite includes format detection, schema validation, weighted-score consistency, and a 100-submission benchmark check under 10s.
  • Included sample_scorecards.json with 20 generated scorecards plus a generator script for reproducibility.

If you want stricter calibration checks against your provided 20-item ground-truth set, I can wire that in directly.

@yuliuyi717-ux
Author

Follow-up update pushed:

  • added ground-truth calibration utility (evaluate_ground_truth_submission_set) to compute MAE + tolerance pass/fail
  • added CLI script (scripts/evaluate_ground_truth.py) to validate against a provided 20-item truth set
  • added test coverage for tolerance gating
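The MAE + tolerance gate can be sketched as follows. This is not the PR's actual `evaluate_ground_truth_submission_set`; the function name, argument shapes, and return keys here are hypothetical:

```python
# Illustrative sketch of the MAE + tolerance pass/fail gate described above.
# Names and signatures are assumptions, not the PR's actual utility.
def evaluate_against_ground_truth(predicted, truth, tolerance=0.05):
    """Compare predicted weighted scores against ground-truth scores.

    predicted, truth: parallel lists of scores in [0, 1].
    Returns the mean absolute error and whether it clears the tolerance.
    """
    if len(predicted) != len(truth):
        raise ValueError("score lists must be the same length")
    mae = sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)
    return {"mae": round(mae, 4), "passed": mae <= tolerance}
```

Gating on MAE rather than per-item error means a few slightly-off scores can be absorbed, as long as the average deviation across the 20-item set stays within +/-0.05.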

Validation:

  • PYTHONPATH=. python3 -m unittest discover -s tests -p 'test_*.py'

This directly strengthens the acceptance item around +/-0.05 calibration checks.
