For data sets like multi-MNIST and small ImageNet, we preprocess the data once and cache the result on disk so that future calls can load it directly into memory. More generally, we need a way to save and load data whenever a data-loading function requires preprocessing and the processed data fits in memory.
We should decide on a specific option such as pickle, np.savez, or hdf5.
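
To make the discussion concrete, here is a minimal sketch of the cache-on-first-call pattern using `np.savez_compressed`. It is only an illustration of one of the options above, not a decision on the format. The names `load_cached` and `preprocess_fn` are hypothetical, and the sketch assumes the preprocessed data can be represented as a dict of NumPy arrays.

```python
import os

import numpy as np


def load_cached(path, preprocess_fn):
    """Load arrays from `path` if cached; otherwise preprocess and cache them.

    Hypothetical helper. `path` should end in ".npz", because np.savez*
    appends that extension to string paths that lack it.
    `preprocess_fn` takes no arguments and returns a dict mapping
    names to np.ndarray (an assumption of this sketch).
    """
    if os.path.exists(path):
        # Cache hit: read every stored array into memory.
        with np.load(path) as data:
            return {name: data[name] for name in data.files}

    # Cache miss: run the (possibly slow) preprocessing step.
    arrays = preprocess_fn()

    # Write to a temporary file first, then rename, so an interrupted
    # write does not leave a partial cache file at `path`.
    tmp_path = path + ".tmp.npz"
    np.savez_compressed(tmp_path, **arrays)
    os.replace(tmp_path, path)
    return arrays
```

A data-loading function would then call something like `load_cached("multi_mnist.npz", build_multi_mnist)`, where `build_multi_mnist` is a placeholder for the actual preprocessing routine. The same wrapper shape would work with pickle or HDF5 by swapping the save and load calls.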