Container Index, Futures and Volatility
- Data Quality
- Auditing
- Reproducibility
- Informational Transparency
Data Lineage includes the data's origins, what happens to it, and where it moves over time. Lineage gives visibility into a data analytics process and greatly simplifies tracing errors back to their root cause.
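Root-cause tracing over lineage can be pictured as walking a dependency graph from an affected dataset back to its upstream inputs. A minimal sketch (the dataset names and graph shape are illustrative assumptions, not part of any specific system):

```python
# Lineage graph: each dataset maps to its direct upstream inputs.
# Names are hypothetical, for illustration only.
lineage = {
    "report.csv":    ["cleaned.csv"],
    "cleaned.csv":   ["raw_feed.csv", "reference.csv"],
    "raw_feed.csv":  [],
    "reference.csv": [],
}

def upstream_sources(dataset, graph):
    """Walk the lineage graph to collect every upstream input of a dataset."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(sorted(upstream_sources("report.csv", lineage)))
# ['cleaned.csv', 'raw_feed.csv', 'reference.csv']
```

When an error surfaces in `report.csv`, the traversal immediately narrows the search to the three upstream datasets that could have introduced it.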
Data Provenance refers to records of the inputs, entities, systems, and processes that influence data of interest, providing a historical record of the data and its origins. The generated evidence supports forensic activities such as data-dependency analysis, error/compromise detection and recovery, auditing, and compliance analysis.
SHA-256 hashes, used properly, can confirm both file integrity and authenticity. Comparing hashes makes it possible to detect changes in files that would cause errors. The possibility of changes (errors) is proportional to the size of the file; the likelihood of an error increases as the file becomes larger. (Source: Ubuntu Documentation)
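The comparison above can be sketched with Python's standard `hashlib`: stream the file in chunks, compute its SHA-256 digest, and compare against a digest published by a trusted source. The demo file and its contents are assumptions for illustration:

```python
import hashlib
import os
import tempfile

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo: write a small temporary file, then verify it against a
# known-good digest (in practice, one published by a trusted source).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello world\n")
    path = f.name

expected = hashlib.sha256(b"hello world\n").hexdigest()
print(sha256_of(path) == expected)  # True
os.remove(path)
```

Note that the hash alone proves only integrity; authenticity additionally depends on obtaining the expected digest over a trusted channel.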
- Date formats are inconsistent
- An index masks underlying variation
- Results have been p-hacked
- Provenance is not documented
- Zeros replace missing values
- Author is untrustworthy
- Collection process is opaque
- Data asserts unrealistic precision
"a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance."
Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.
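A provenance assertion following the entity / activity / agent vocabulary of the definition above can be expressed as plain structured data. This is a minimal sketch; the dataset, activity, and agent identifiers are hypothetical:

```python
# Minimal provenance record using the entity / activity / agent vocabulary.
# All identifiers below are hypothetical placeholders.
provenance = {
    "entity": {
        "id": "dataset:trades-2020-q1",
        "sha256": "<digest of the file>",  # placeholder, not a real digest
    },
    "activity": {
        "id": "activity:normalize-dates",
        "startedAt": "2020-04-01T09:00:00Z",
        "used": ["dataset:raw-trades-2020-q1"],
        "generated": ["dataset:trades-2020-q1"],
    },
    "agent": {
        "id": "agent:etl-service",
        "actedOnBehalfOf": "agent:data-team",
    },
}

# The record answers: what was produced, by which activity, from which
# inputs, and on whose behalf -- the basis for quality and trust assessments.
print(sorted(provenance))
```

Such a record is itself contextual metadata and, as the quote notes, can carry its own provenance.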
- Provenance of the workflow definition
- Provenance of a workflow run
- Provenance of data
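The three layers above can be tied together in one run record: hash the workflow definition, identify the run, and hash each data input. A sketch, with a hypothetical workflow and input file:

```python
import datetime
import hashlib
import json

# Hypothetical workflow definition; hashing it captures provenance layer 1.
workflow_definition = {"name": "daily-index", "steps": ["load", "clean", "aggregate"]}
definition_hash = hashlib.sha256(
    json.dumps(workflow_definition, sort_keys=True).encode()
).hexdigest()

run_record = {
    "workflow_definition": definition_hash,               # 1. the definition
    "run_id": "run-0001",                                 # 2. this specific run
    "started": datetime.datetime(2020, 4, 1, 9, 0).isoformat(),
    "inputs": {                                           # 3. the data
        "raw.csv": hashlib.sha256(b"example file bytes").hexdigest(),
    },
}
print(json.dumps(run_record, indent=2))
```

Because each layer is content-addressed, a later auditor can detect whether the definition, the run metadata, or the data itself changed.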
© 2020 FreightTrust and Clearing Corporation