A collection of summarization datasets prepared for conformal summarization experiments. Each dataset is provided as Parquet files with calibration and test splits.
```
conformal_summer_ds/
├── CNNDailyMail/
│   ├── CNNDM_cal.parquet
│   └── CNNDM_test.parquet
├── CSDS/
│   ├── CSDS_cal.parquet
│   └── CSDS_test.parquet
├── ECTSum/
│   ├── ECT_cal.parquet
│   └── ECT_test.parquet
├── MTS/
│   ├── MTS_cal.parquet
│   └── MTS_test.parquet
├── TLDR/
│   ├── TLDR_cal.parquet
│   └── TLDR_test.parquet
└── TLDR_full/
    ├── TLDRfull_cal.parquet
    └── TLDRfull_test.parquet
```
- `*_cal.parquet` is the calibration set used to fit the conformal wrapper (e.g., to compute nonconformity scores).
- `*_test.parquet` is the test set used to evaluate conformalized summaries and coverage.
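Because every dataset follows the same `<stem>_cal.parquet` / `<stem>_test.parquet` naming convention, the split paths can be built programmatically. A minimal sketch (the `split_paths` helper and the assumption that files live under a `conformal_summer_ds/` root are illustrative, not part of the repository):

```python
from pathlib import Path


def split_paths(dataset_dir: str, file_stem: str,
                root: str = "conformal_summer_ds") -> tuple[Path, Path]:
    """Return the (calibration, test) Parquet paths for one dataset.

    dataset_dir: directory name, e.g. "CNNDailyMail"
    file_stem:   file prefix, e.g. "CNNDM" (note it can differ from the dir name)
    """
    base = Path(root) / dataset_dir
    return base / f"{file_stem}_cal.parquet", base / f"{file_stem}_test.parquet"
```

For example, `split_paths("ECTSum", "ECT")` yields the two `ECT_*.parquet` paths shown in the tree above.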
- Files are stored in Parquet format. Load them with your preferred data stack (Pandas, PyArrow, DuckDB, etc.).
- Column names can differ by dataset; inspect them after loading to see the input text fields and reference summaries.
```python
import pandas as pd

# Load the calibration and test splits for CNN/DailyMail
df_cal = pd.read_parquet("CNNDailyMail/CNNDM_cal.parquet")
df_test = pd.read_parquet("CNNDailyMail/CNNDM_test.parquet")

# Inspect the available fields before assuming column names
print(df_cal.columns)
```