Data management tool: reproducibility is the hallmark of the scientific method

l6a/hallmark

Reproducibility is the hallmark of the scientific method.

Modern science has become so complex that many projects rely on multiple software packages working in unison, producing networks of interdependent data products over the course of an analysis. Versioning and managing these data products is essential to making modern data- and computation-intensive science reproducible.

Motivated by the Event Horizon Telescope (EHT)'s observational data calibration pipelines and theory data analysis tools, hallmark is a lightweight package designed to version-control and manage data products in a complex workflow. It provides a simple abstraction and a uniform Application Programming Interface (API) on top of different backend technologies such as POSIX file systems, object storage, Globus, iRODS, streams, etc. By using hallmark with other packages such as yukon and banyan in Project Laniakea, researchers can use computing infrastructure on a global scale to accelerate their science.

ParaFrame

When performing large-scale parameter surveys and constructing simulation libraries, it is common to encode parameter values in the file paths. An example is Ma+0.94_i70/sed_Rh160.h5. hallmark provides a subclass of the pandas DataFrame, called ParaFrame, that decodes file paths back into proper parameters and stores the result in a pandas DataFrame. ParaFrame uses the Python package parse to parse the file paths. Because parse is the opposite of format, the format string used to generate the surveys and libraries in the first place can be reused. In addition, ParaFrame has a convenient interface for filtering, which makes parameter selection much easier than with pure pandas.

Tutorial

Examples of using ParaFrame can be found in the Jupyter Notebook demos/ParaFrame.ipynb.
