feat: Multi-Dimensional Quality Scoring for Structured Outputs #4

Open
a827681306 wants to merge 1 commit into Mint-Claw:main from a827681306:feat/quality-scorer

@a827681306

Summary

Implements a multi-dimensional quality scoring engine for structured submissions (JSON, markdown, code, text), addressing issue #1.

Architecture

scorer.py — Pure Python, zero external dependencies. Two main classes:

  • QualityScorer — Entry point. Configurable weights, pass threshold. Supports single and batch scoring.
  • Rubric — Defines expectations: required fields, expected format, keywords, JSON schema, etc.

Auto-format detection via regex heuristics for JSON, markdown, code, and plain text.
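As an illustration of the auto-detection step, here is a minimal sketch of regex-based format heuristics. The function name `detect_format` and both patterns are assumptions made for this example; the PR description does not show the actual patterns used in scorer.py.

```python
import json
import re

# Hypothetical heuristics, not the patterns from scorer.py.
_CODE_HINTS = re.compile(
    r"^\s*(def |class |import |from \w+ import |function |#include|const |let |var )",
    re.MULTILINE,
)
_MARKDOWN_HINTS = re.compile(r"^(#{1,6}\s|[-*]\s|\d+\.\s|```)", re.MULTILINE)


def detect_format(content: str) -> str:
    """Return one of "json", "code", "markdown", or "text"."""
    text = content.strip()
    # JSON is the only format that can be confirmed by actually parsing it.
    if text[:1] in ("{", "["):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass  # looks like JSON but is not; fall through to the heuristics
    if _CODE_HINTS.search(text):
        return "code"
    if _MARKDOWN_HINTS.search(text):
        return "markdown"
    return "text"
```

Heuristics like these are order-sensitive: a Python file that opens with a `# comment` line also resembles a markdown heading, which is why this sketch tests for code before markdown.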

Scoring Dimensions

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Completeness | 0.30 | Required fields/sections present, minimum length |
| Format Compliance | 0.20 | Matches expected format, structural quality |
| Coverage | 0.25 | Keyword/topic coverage, vocabulary diversity |
| Clarity | 0.15 | Sentence length, repetition, readability |
| Validity | 0.10 | JSON schema, bracket balance, syntax patterns |
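The bracket-balance check listed under Validity can be sketched with a stack. This is a generic version of the technique rather than the code in scorer.py, and the name `brackets_balanced` is hypothetical.

```python
def brackets_balanced(source: str) -> bool:
    """Return True if (), [] and {} are properly nested in `source`.

    Simplification: brackets inside string literals and comments are
    counted too, so this is only a rough syntax signal.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack
```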

Output Format

{
    "weighted_score": 0.8925,      # 0-1 weighted aggregate
    "quality_rating": "good",      # excellent/good/acceptable/poor/failing
    "scores": {                    # per-dimension breakdown
        "completeness": 0.95,
        "format_compliance": 0.90,
        "coverage": 0.80,
        "clarity": 0.85,
        "validity": 1.0
    },
    "feedback": ["..."],           # actionable per-dimension feedback
    "pass_threshold": True,        # meets minimum bar
    "detected_format": "json"
}
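The weighted aggregate is a weight-normalized sum of the per-dimension scores. The sketch below assumes that formula and uses the default weights from the Scoring Dimensions table; `DEFAULT_WEIGHTS` and `weighted_score` are illustrative names, not identifiers confirmed by the PR.

```python
# Default weights as listed under Scoring Dimensions.
DEFAULT_WEIGHTS = {
    "completeness": 0.30,
    "format_compliance": 0.20,
    "coverage": 0.25,
    "clarity": 0.15,
    "validity": 0.10,
}


def weighted_score(scores, weights=DEFAULT_WEIGHTS):
    """Combine per-dimension scores in [0, 1] into one 0-1 aggregate."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total keeps the result in [0, 1] for custom weights
    # that do not sum to exactly 1.
    weighted_sum = sum(scores[name] * w for name, w in weights.items())
    return round(weighted_sum / total_weight, 4)


example_scores = {
    "completeness": 0.95,
    "format_compliance": 0.90,
    "coverage": 0.80,
    "clarity": 0.85,
    "validity": 1.0,
}
print(weighted_score(example_scores))  # 0.8925
```

With these inputs the terms are 0.285 + 0.18 + 0.20 + 0.1275 + 0.10 = 0.8925.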

Bonus: NLP Feedback Generation

generate_nlp_feedback() produces a natural-language summary identifying strongest/weakest dimensions and priority improvements.
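One plausible way to build such a summary is to rank the dimensions by score and report both ends of the ranking. The sketch below is an assumption about the approach, not the actual body of generate_nlp_feedback(); the name `summarize_scores` and the `priority_below` cutoff of 0.7 are hypothetical.

```python
def summarize_scores(scores, priority_below=0.7):
    """Build a short summary from per-dimension scores in [0, 1].

    `priority_below` is a hypothetical cutoff: dimensions scoring under
    it are listed as priority improvements, weakest first.
    """
    if not scores:
        return "No dimensions were scored."
    ranked = sorted(scores.items(), key=lambda item: item[1])
    weakest, weakest_score = ranked[0]
    strongest, strongest_score = ranked[-1]
    parts = [
        f"Strongest dimension: {strongest} ({strongest_score:.2f}).",
        f"Weakest dimension: {weakest} ({weakest_score:.2f}).",
    ]
    priorities = [name for name, value in ranked if value < priority_below]
    if priorities:
        parts.append("Priority improvements: " + ", ".join(priorities) + ".")
    else:
        parts.append("No dimension falls below the priority cutoff.")
    return " ".join(parts)
```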

Performance

  • Single submission: ~0.05ms
  • 100 submissions: ~0.15s (well under the 10s requirement)
  • Pure Python, no ML dependencies — fast cold start
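Timings like these can be reproduced with `time.perf_counter`. The helper below is a sketch: `time_batch` is a hypothetical name, and `score_fn` stands in for whatever callable scores one submission, since the scorer's exact call signature is not shown in this description.

```python
import time


def time_batch(score_fn, submissions):
    """Score every submission and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [score_fn(item) for item in submissions]
    elapsed = time.perf_counter() - start
    return results, elapsed


# Stand-in scorer, used only so the sketch runs on its own.
def placeholder_score(text):
    return min(len(text) / 100, 1.0)


results, elapsed = time_batch(placeholder_score, ["sample submission"] * 100)
print(f"{len(results)} submissions scored in {elapsed:.4f}s")
```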

Test Coverage

35 test cases in test_scorer.py covering:

  • Format detection (6 tests)
  • JSON scoring with schema validation (5 tests)
  • Markdown scoring with section/structure checks (3 tests)
  • Code scoring with bracket balance (2 tests)
  • Text scoring with keyword coverage (2 tests)
  • Edge cases: empty, short, placeholder, format mismatch (6 tests)
  • Output structure validation (5 tests)
  • NLP feedback generation (2 tests)
  • Performance benchmarks (2 tests)
  • Custom weights and validation (2 tests)

Design Decisions

  • Zero dependencies beyond Python stdlib — no numpy, no sklearn, no NLTK. Keeps it lightweight and deployable anywhere.
  • Rubric-driven — scoring adapts to what the rubric specifies rather than hardcoded expectations.
  • Extensible — new dimensions can be added by implementing a scorer function and adding it to _DIMENSION_SCORERS.
  • Batch-first: score_batch() is designed for bulk evaluation pipelines.

Closes #1

Implements a scoring engine that evaluates structured submissions
(JSON, markdown, code, text) across 5 weighted dimensions:

- Completeness (0.30): required fields, sections, min length
- Format Compliance (0.20): format detection, structure quality
- Coverage (0.25): keyword matching, vocabulary diversity
- Clarity (0.15): sentence length, repetition, readability
- Validity (0.10): JSON schema, bracket balance, syntax checks

Features:
- Auto-detect content format (JSON/markdown/code/text)
- Weighted 0-1 score with quality rating
- Per-dimension feedback with NLP summary generation
- Batch scoring: 100 submissions in <0.2s
- Configurable weights and pass thresholds
- 35 test cases covering all formats and edge cases

Closes Mint-Claw#1
Development

Successfully merging this pull request may close these issues.

[BOUNTY $10] Multi-Dimensional Quality Scoring for Structured Outputs