What
Train the informativeness_head to regress sentence informativeness on a 0..1 scale.
Label source
SQUINKY corpus (7,032 sentences with human ratings on formality, informativeness, and implicature):
- Rating scale: 1–7.
- Three axes per sentence (we use informativeness as the primary; formality and implicature are optional secondary targets).
Status
Dataset not yet in local cache. Download is the first step.
Concrete action:
- Find the canonical source. The squinky==0.1.0 Python package on PyPI ships the corpus; alternative sources are the supplementary material of the original SQUINKY paper (Lahiri, 2015) or the GitHub mirror.
- Inventory: CSV/TSV fields, label distribution, train/test split if provided.
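A minimal sketch of the inventory step, assuming the corpus arrives as a delimited text file with a header row. The column names (sentence, formality, informativeness, implicature), the tab delimiter, and the helper name inventory are hypothetical and should be replaced once the real file layout is known. The sample rows are made up for illustration and are not SQUINKY data.

```python
import csv
import io
from collections import Counter


def inventory(handle, delimiter="\t", label_column="informativeness"):
    """Return (field names, row count, histogram of labels rounded to integers).

    The column name and delimiter are assumptions about the file layout.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    histogram = Counter()
    rows = 0
    for row in reader:
        rows += 1
        # Ratings are averages over annotators, so bin them to the nearest integer.
        histogram[round(float(row[label_column]))] += 1
    return reader.fieldnames, rows, dict(sorted(histogram.items()))


# Tiny in-memory stand-in for the real file (made-up rows).
sample = io.StringIO(
    "sentence\tformality\tinformativeness\timplicature\n"
    "Example one.\t4.2\t5.6\t3.0\n"
    "Example two.\t2.0\t2.4\t4.8\n"
    "Example three.\t6.0\t5.8\t2.2\n"
)
fields, n_rows, hist = inventory(sample)
print(fields, n_rows, hist)
```

For the real corpus, pass an open file handle instead of the StringIO object and check whether the file ships a predefined train/test split column.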
Label transform
- Normalize informativeness 1..7 → 0..1.
- Keep formality and implicature as optional secondary targets for Phase 2 multi-task training (do not require them in Phase 1).
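The normalization above can be sketched as a min-max rescale, assuming the endpoints 1 and 7 map to 0 and 1 respectively. The function names are illustrative and not part of any existing module.

```python
def normalize_rating(rating, low=1.0, high=7.0):
    """Map a rating on the [low, high] scale onto [0, 1]."""
    if not low <= rating <= high:
        raise ValueError(f"rating {rating} is outside [{low}, {high}]")
    return (rating - low) / (high - low)


def denormalize_rating(score, low=1.0, high=7.0):
    """Inverse of normalize_rating: map a [0, 1] score back to [low, high]."""
    return low + score * (high - low)


print(normalize_rating(1.0))  # scale minimum
print(normalize_rating(4.0))  # scale midpoint
print(normalize_rating(7.0))  # scale maximum
```

Keeping the inverse transform around makes it possible to report errors in the original 1..7 units as well as on the 0..1 scale.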
Head architecture
Same shape as specificity / vagueness heads. Phase 1: ridge / LightGBM / MLP.
Evaluation
- Spearman correlation against held-out SQUINKY informativeness scores.
- Mean absolute error.
- High/low bucket accuracy.
Target: Spearman ≥ 0.55 on held-out SQUINKY for Phase 1.
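A dependency-free sketch of the three metrics, for illustration only. In practice scipy.stats.spearmanr is the usual choice for the correlation. The 0.5 cut-off for the high/low buckets is an assumption, since the bucket boundary is not specified here, and the toy scores are invented.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bucket_accuracy(y_true, y_pred, threshold=0.5):
    """Share of items where truth and prediction fall on the same side of
    the threshold (the threshold value is an assumption)."""
    hits = [(t >= threshold) == (p >= threshold) for t, p in zip(y_true, y_pred)]
    return sum(hits) / len(hits)


# Toy held-out scores on the 0..1 scale (invented values).
y_true = [0.1, 0.4, 0.6, 0.9]
y_pred = [0.2, 0.3, 0.7, 0.8]
rho = spearman(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
acc = bucket_accuracy(y_true, y_pred)
print(rho, mae, acc)
```

The Phase 1 target would then be checked as rho >= 0.55 on the held-out split.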
Blocked by
- Encoder integration
- SQUINKY corpus download + prep (sub-issue)
Done
- Trained head + metrics.
- Calibration coefficients saved.
Reference
Plan section: Informativeness