feat: Multi-Dimensional Quality Scoring for Structured Outputs by a827681306 · Pull Request #4 · Mint-Claw/content-split

a827681306 · 2026-02-26T07:11:09Z

Summary

Implements a multi-dimensional quality scoring engine for structured submissions (JSON, markdown, code, text), addressing issue #1.

Architecture

scorer.py — Pure Python, zero external dependencies. Two main classes:

QualityScorer — Entry point. Configurable weights, pass threshold. Supports single and batch scoring.
Rubric — Defines expectations: required fields, expected format, keywords, JSON schema, etc.

Auto-format detection via regex heuristics for JSON, markdown, code, and plain text.

Scoring Dimensions

Dimension	Weight	What it measures
Completeness	0.30	Required fields/sections present, minimum length
Format Compliance	0.20	Matches expected format, structural quality
Coverage	0.25	Keyword/topic coverage, vocabulary diversity
Clarity	0.15	Sentence length, repetition, readability
Validity	0.10	JSON schema, bracket balance, syntax patterns

Output Format

{
    "weighted_score": 0.8725,      # 0-1 weighted aggregate
    "quality_rating": "good",      # excellent/good/acceptable/poor/failing
    "scores": {                    # per-dimension breakdown
        "completeness": 0.95,
        "format_compliance": 0.90,
        "coverage": 0.80,
        "clarity": 0.85,
        "validity": 1.0
    },
    "feedback": ["..."],           # actionable per-dimension feedback
    "pass_threshold": true,        # meets minimum bar
    "detected_format": "json"
}

Bonus: NLP Feedback Generation

generate_nlp_feedback() produces a natural-language summary identifying strongest/weakest dimensions and priority improvements.

Performance

Single submission: ~0.05ms
100 submissions: ~0.15s (well under the 10s requirement)
Pure Python, no ML dependencies — fast cold start

Test Coverage

35 test cases in test_scorer.py covering:

Format detection (6 tests)
JSON scoring with schema validation (5 tests)
Markdown scoring with section/structure checks (3 tests)
Code scoring with bracket balance (2 tests)
Text scoring with keyword coverage (2 tests)
Edge cases: empty, short, placeholder, format mismatch (6 tests)
Output structure validation (5 tests)
NLP feedback generation (2 tests)
Performance benchmarks (2 tests)
Custom weights and validation (2 tests)

Design Decisions

Zero dependencies beyond Python stdlib — no numpy, no sklearn, no NLTK. Keeps it lightweight and deployable anywhere.
Rubric-driven — scoring adapts to what the rubric specifies rather than hardcoded expectations.
Extensible — new dimensions can be added by implementing a scorer function and adding it to _DIMENSION_SCORERS.
Batch-first — score_batch() designed for bulk evaluation pipelines.

Closes #1

Implements a scoring engine that evaluates structured submissions (JSON, markdown, code, text) across 5 weighted dimensions: - Completeness (0.30): required fields, sections, min length - Format Compliance (0.20): format detection, structure quality - Coverage (0.25): keyword matching, vocabulary diversity - Clarity (0.15): sentence length, repetition, readability - Validity (0.10): JSON schema, bracket balance, syntax checks Features: - Auto-detect content format (JSON/markdown/code/text) - Weighted 0-1 score with quality rating - Per-dimension feedback with NLP summary generation - Batch scoring: 100 submissions in <0.2s - Configurable weights and pass thresholds - 35 test cases covering all formats and edge cases Closes Mint-Claw#1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Multi-Dimensional Quality Scoring for Structured Outputs#4

feat: Multi-Dimensional Quality Scoring for Structured Outputs#4
a827681306 wants to merge 1 commit intoMint-Claw:mainfrom
a827681306:feat/quality-scorer

a827681306 commented Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

a827681306 commented Feb 26, 2026

Summary

Architecture

Scoring Dimensions

Output Format

Bonus: NLP Feedback Generation

Performance

Test Coverage

Design Decisions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant