fairintelligence/conformal_summer_ds

conformal_summer_ds

A collection of summarization datasets prepared for conformal summarization experiments. Each dataset is provided as two Parquet files: a calibration split and a test split.

Dataset layout

conformal_summer_ds/
  CNNDailyMail/
    CNNDM_cal.parquet
    CNNDM_test.parquet
  CSDS/
    CSDS_cal.parquet
    CSDS_test.parquet
  ECTSum/
    ECT_cal.parquet
    ECT_test.parquet
  MTS/
    MTS_cal.parquet
    MTS_test.parquet
  TLDR/
    TLDR_cal.parquet
    TLDR_test.parquet
  TLDR_full/
    TLDRfull_cal.parquet
    TLDRfull_test.parquet
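
Because the file prefix does not always match the folder name (for example, `ECTSum/` holds `ECT_*.parquet`), a small lookup can help when looping over datasets. The following is a minimal sketch: the mapping is copied from the layout above, while `DATASET_PREFIXES` and `split_path` are hypothetical helper names that are not part of this repository.

```python
from pathlib import Path

# Folder name -> file prefix, copied from the layout above.
# DATASET_PREFIXES and split_path are illustrative names, not repository code.
DATASET_PREFIXES = {
    "CNNDailyMail": "CNNDM",
    "CSDS": "CSDS",
    "ECTSum": "ECT",
    "MTS": "MTS",
    "TLDR": "TLDR",
    "TLDR_full": "TLDRfull",
}


def split_path(root, dataset, split):
    """Return the Parquet path for one dataset and one split ('cal' or 'test')."""
    if split not in ("cal", "test"):
        raise ValueError(f"unknown split: {split!r}")
    prefix = DATASET_PREFIXES[dataset]  # raises KeyError for unknown datasets
    return Path(root) / dataset / f"{prefix}_{split}.parquet"


if __name__ == "__main__":
    # List every expected file relative to the repository root.
    for name in DATASET_PREFIXES:
        for split in ("cal", "test"):
            print(split_path("conformal_summer_ds", name, split))
```

The returned `Path` can be passed directly to `pd.read_parquet`.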

What the splits mean

  • *_cal.parquet is the calibration set used to fit the conformal wrapper (e.g., to compute nonconformity scores).
  • *_test.parquet is the test set used to evaluate conformalized summaries and coverage.
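
To make the roles of the two splits concrete, here is a minimal sketch of generic split conformal calibration. It is not necessarily the procedure used in the experiments these datasets were prepared for. It assumes each example has already been reduced to a single nonconformity score (higher means worse); how that score is computed is left open, and the numbers in the usage section are placeholders, not values from these datasets.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Threshold from calibration scores for a target coverage of 1 - alpha.

    Uses the standard finite-sample rule: take the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)). If k exceeds n, no finite threshold
    gives the guarantee, so infinity is returned.
    """
    n = len(cal_scores)
    if n == 0:
        raise ValueError("calibration set is empty")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]


def empirical_coverage(test_scores, threshold):
    """Fraction of test scores that fall at or below the threshold."""
    return sum(s <= threshold for s in test_scores) / len(test_scores)


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    cal = [3, 1, 4, 1, 5, 9, 2, 6, 5]
    test = [0, 4, 5, 10]
    q = conformal_threshold(cal, alpha=0.5)
    print("threshold:", q)
    print("coverage:", empirical_coverage(test, q))
```

The threshold is fitted on the calibration scores only; the test scores are used solely to check how often the threshold holds on unseen data.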

Notes

  • Files are stored in Parquet format. Load them with your preferred data stack (pandas, PyArrow, DuckDB, etc.).
  • Column names can differ by dataset; inspect them after loading to see the input text fields and reference summaries.
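
Since column names differ by dataset, a quick per-column summary can help tell long input-text fields from shorter reference summaries. The sketch below assumes pandas is installed; `summarize_columns` is a hypothetical helper, and the column names in the toy frame (`article`, `summary`, `id`) are made up for illustration and may not match any file in this repository.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column with its dtype and, for string columns,
    the mean character length of the non-null values."""
    rows = []
    for name in df.columns:
        col = df[name]
        non_null = col.dropna()
        # Treat a column as text only if every non-null value is a str.
        is_text = len(non_null) > 0 and bool(
            non_null.map(lambda v: isinstance(v, str)).all()
        )
        mean_chars = float(non_null.str.len().mean()) if is_text else None
        rows.append({"column": name, "dtype": str(col.dtype), "mean_chars": mean_chars})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    # Toy frame with made-up column names; replace with a loaded split,
    # e.g. the df_cal from the example in this README.
    toy = pd.DataFrame(
        {"article": ["aaaa", "bbbbbb"], "summary": ["aa", "bb"], "id": [1, 2]}
    )
    print(summarize_columns(toy))
```

Columns with a large mean length are likely source documents, and shorter text columns are likely summaries, but this is only a heuristic and should be checked against a few rows.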

Example (Python)

import pandas as pd

df_cal = pd.read_parquet("CNNDailyMail/CNNDM_cal.parquet")
df_test = pd.read_parquet("CNNDailyMail/CNNDM_test.parquet")

print(df_cal.columns)
