41 changes: 41 additions & 0 deletions content-blog/lm-eval-challenges.md
@@ -0,0 +1,41 @@
---
title: "Challenges in Language Model Evaluation"
date: 2024-04-20T00:00:00
description: "ICML 2024 Tutorial"
author: ["Lintang Sutawika", "Hailey Schoelkopf"]
draft: true
---

$$
\text{July 22nd, Time and Place TBA}
$$

NLP and Machine Learning rely on benchmarks and evaluation to accurately track progress in the field and assess the efficacy of new models and methodologies. For this reason, good evaluation practices and accurate reporting are crucial. However, language models not only inherit the challenges previously faced in benchmarking, but also introduce a slew of novel considerations that can make proper comparison across models difficult, misleading, or near-impossible. In this tutorial, we aim to bring attendees up to speed on the state of language model evaluation and highlight current challenges in evaluating language model performance by discussing the various methods of evaluation, tasks, and benchmarks commonly associated with evaluating progress in language model research. We will then discuss how these common pitfalls can be addressed and what considerations should be taken to enhance future work.

## Contact Info

- Lintang Sutawika: `lintang@eleuther.ai`
- Hailey Schoelkopf: `hailey@eleuther.ai`

## Schedule

TBA

## Reading List

TBA

## Citation

TBA
<!-- ```
@misc{2024PileT5,
author = {Lintang Sutawika and Hailey Schoelkopf},
title = {Challenges in Language Model Evaluation},
year = {2024},
url = {https://blog.eleuther.ai/lm-eval-challenges/},
note = {Blog post},
}
``` -->