Scorer head: informativeness_head training #24

@tartakovsky

What

Train the informativeness_head to regress sentence informativeness on a 0..1 scale.

Label source

SQUINKY corpus (7,032 sentences with human ratings on formality, informativeness, and implicature):

  • Rating scale: 1–7.
  • Three axes per sentence. Informativeness is the primary target; formality and implicature are optional secondary targets.

Status

Dataset not yet in local cache. Download is the first step.

Concrete actions:

  • Find the canonical source. The squinky==0.1.0 Python package on PyPI ships the corpus; alternative sources are the supplementary material of the original Lahiri paper or the GitHub mirror.
  • Inventory the files: CSV/TSV fields, label distribution, and the train/test split if one is provided.
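The inventory step could start from something like the sketch below. It is only an illustration: the column names, the tab delimiter, and the three sample rows are hypothetical placeholders, not the real SQUINKY layout, which has to be checked against the downloaded file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample. The real SQUINKY column names and delimiter are NOT
# confirmed here; replace this with the downloaded file once it is cached.
SAMPLE_TSV = (
    "sentence\tformality\tinformativeness\timplicature\n"
    "The meeting starts at 9 a.m. in room 204.\t5.2\t6.1\t2.0\n"
    "Well, you know how it is.\t2.4\t1.8\t5.5\n"
    "Prices rose 3% in March.\t5.8\t6.4\t1.6\n"
)


def inventory(handle, delimiter="\t", label="informativeness"):
    """Return field names, row count, and a histogram of rounded label values."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    histogram = Counter()
    rows = 0
    for row in reader:
        rows += 1
        # Round the rating to the nearest integer to get a coarse distribution.
        histogram[round(float(row[label]))] += 1
    return {
        "fields": reader.fieldnames,
        "rows": rows,
        "histogram": dict(histogram),
    }


report = inventory(io.StringIO(SAMPLE_TSV))
print(report)
```

For the real file, pass an open file handle instead of the io.StringIO sample and adjust delimiter and label to whatever the inventory reveals.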

Label transform

  • Normalize informativeness 1..7 → 0..1 with a linear map: y = (rating − 1) / 6.
  • Keep formality and implicature as optional secondary targets for Phase 2 multi-task training (do not require them in Phase 1).
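A minimal sketch of the transform, assuming ratings arrive as strings keyed by axis name. The helper names normalize_rating and build_targets, and the row keys, are hypothetical and only serve the illustration.

```python
def normalize_rating(rating, low=1.0, high=7.0):
    """Linearly map a rating on [low, high] to [0, 1]."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} outside [{low}, {high}]")
    return (rating - low) / (high - low)


def build_targets(row):
    """Build training targets from one row (hypothetical key names).

    Informativeness is required; formality and implicature are included
    only when present, so Phase 1 does not depend on them.
    """
    targets = {"informativeness": normalize_rating(float(row["informativeness"]))}
    for axis in ("formality", "implicature"):
        value = row.get(axis)
        if value not in (None, ""):
            targets[axis] = normalize_rating(float(value))
    return targets


example = build_targets({"informativeness": "4", "formality": "7", "implicature": ""})
print(example)
```

Raising on out-of-range ratings is a design choice in this sketch: it surfaces parsing mistakes early instead of silently producing targets outside 0..1.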

Head architecture

Same shape as the specificity and vagueness heads. Phase 1 candidates: ridge regression, LightGBM, or an MLP.

Evaluation

  • Spearman correlation against held-out SQUINKY informativeness scores.
  • Mean absolute error.
  • High/low bucket accuracy.

Target: Spearman ≥ 0.55 on held-out SQUINKY for Phase 1.
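The three metrics can be sketched in plain Python as below. The 0.5 threshold for the high/low buckets is an assumption (the issue does not define the bucket boundary), and the four example scores are made up to exercise the functions, not real results.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman(pred, gold):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(gold))


def mean_absolute_error(pred, gold):
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)


def bucket_accuracy(pred, gold, threshold=0.5):
    """Share of items where prediction and label fall on the same side.

    The threshold is an assumed bucket boundary on the 0..1 scale.
    """
    hits = sum((p >= threshold) == (g >= threshold) for p, g in zip(pred, gold))
    return hits / len(gold)


# Made-up scores, only to show the call pattern.
gold = [0.1, 0.4, 0.6, 0.9]
pred = [0.2, 0.3, 0.7, 0.8]
rho = spearman(pred, gold)
mae = mean_absolute_error(pred, gold)
acc = bucket_accuracy(pred, gold)
print(rho, mae, acc)
```

The Phase 1 check would then be rho >= 0.55 on the held-out split.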

Blocked by

  • Encoder integration
  • SQUINKY corpus download + prep (sub-issue)

Done

  • Trained head + metrics.
  • Calibration coefficients saved.
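The issue does not say what form the calibration takes. As one possibility, the sketch below assumes an affine calibration (slope and intercept fitted by least squares on held-out predictions) stored as JSON; the function names, the file name, and the example numbers are all hypothetical.

```python
import json
import os
import tempfile


def fit_affine_calibration(raw, gold):
    """Least-squares fit of gold ≈ slope * raw + intercept (assumed form)."""
    n = len(raw)
    mx, my = sum(raw) / n, sum(gold) / n
    var = sum((x - mx) ** 2 for x in raw)
    if var == 0:
        raise ValueError("raw scores are constant; cannot fit a slope")
    slope = sum((x - mx) * (y - my) for x, y in zip(raw, gold)) / var
    return {"slope": slope, "intercept": my - slope * mx}


def apply_calibration(coeffs, score):
    """Apply the affine map and clamp the result to the 0..1 target range."""
    calibrated = coeffs["slope"] * score + coeffs["intercept"]
    return min(1.0, max(0.0, calibrated))


# Made-up numbers, only to show the call pattern.
raw = [0.2, 0.4, 0.6, 0.8]
gold = [0.1, 0.3, 0.5, 0.7]
coeffs = fit_affine_calibration(raw, gold)

# Hypothetical file name; written to a temp directory for the example.
path = os.path.join(tempfile.mkdtemp(), "informativeness_calibration.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(coeffs, handle)
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)
print(loaded)
```

If the specificity and vagueness heads already store calibration in another format, that format should take precedence over this sketch.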

Reference

Plan section: Informativeness

Metadata

Assignees

No one assigned

    Labels

    help wanted: Extra attention is needed
    research: Linguistic, ML, or empirical research
    roadmap: Planned feature on the roadmap
