GreyModel accepts external data in two main ways:
- folder-first production import
- Hugging Face import for public pretraining
Expected image types include:
- .npy
- .pgm
- .png
- .jpg
- .jpeg
- .bmp
- .tif
- .tiff
Optional sidecar files use the same base name as the image, with a .json extension.
Example:
    station_01/
        good/
            sample_001.npy
            sample_001.json
        bad/
            sample_002.npy
            sample_002.json
Supported sidecar keys:
- station_id
- product_family
- geometry_mode
- accept_reject
- defect_tags
- boxes
- mask_path
- split
- capture_metadata
- source_dataset
- review_state
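The folder layout and sidecar convention above can be illustrated with a short standalone sketch. It is not GreyModel's importer: `find_samples` and `IMAGE_EXTENSIONS` are hypothetical names, and treating a missing sidecar as an empty dict is an assumption made for this example.

```python
import json
from pathlib import Path

# Extensions this document lists as expected image types.
IMAGE_EXTENSIONS = {".npy", ".pgm", ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def find_samples(root):
    """Yield (image_path, sidecar_dict) pairs found under root.

    Hypothetical helper: the sidecar is the file with the same base name
    and a .json suffix. A missing sidecar yields an empty dict (assumption).
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        sidecar = path.with_suffix(".json")
        meta = json.loads(sidecar.read_text()) if sidecar.is_file() else {}
        yield path, meta
```

Calling `list(find_samples("station_01"))` on the example tree would pair each .npy file with the parsed contents of its .json sidecar.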
Each manifest row is a DatasetRecord.
Core fields:
- sample_id
- image_path
- station_id
- product_family
- geometry_mode
- accept_reject
- defect_tags
- boxes
- mask_path
- split
- capture_metadata
- source_dataset
- review_state
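To make the shape of a manifest row concrete, here is a rough sketch of the core fields as a Python dataclass. The field names come from the list above. The class name `DatasetRecordSketch`, every type and default, and the helper `record_from_sidecar` are assumptions for illustration and may not match the real DatasetRecord.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional


@dataclass
class DatasetRecordSketch:
    # Field names follow the core-field list; all types and defaults are assumed.
    sample_id: str
    image_path: str
    station_id: Optional[str] = None
    product_family: Optional[str] = None
    geometry_mode: Optional[str] = None
    accept_reject: Optional[Any] = None
    defect_tags: List[str] = field(default_factory=list)
    boxes: List[Any] = field(default_factory=list)
    mask_path: Optional[str] = None
    split: Optional[str] = None
    capture_metadata: Dict[str, Any] = field(default_factory=dict)
    source_dataset: Optional[str] = None
    review_state: Optional[str] = None


def record_from_sidecar(sample_id, image_path, sidecar):
    """Hypothetical helper: build a record, ignoring keys that are not core fields."""
    allowed = set(DatasetRecordSketch.__dataclass_fields__) - {"sample_id", "image_path"}
    kwargs = {k: v for k, v in sidecar.items() if k in allowed}
    return DatasetRecordSketch(sample_id=sample_id, image_path=image_path, **kwargs)
```

`asdict(record)` then gives a plain dict that can be serialized as one manifest row. The on-disk manifest format itself is not specified here.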
dataset_index.json stores framework metadata:
- manifest version
- ontology version
- root dir
- manifest path
- split path
- ontology path
- hard-negative path
- station configs
- grouping keys
- split assignments
- metadata
Each station config includes:
- canvas_shape
- station_id
- geometry_mode
- pad_value
- normalization_mean
- normalization_std
- tile_size
- tile_stride
- adapter_id
- reject_threshold
- defect_thresholds
ontology.json currently records:
- ontology version
- defect tags
- product families
- stations
Use python -m greymodel dataset ontology ... to inspect it.
Batch prediction writes hierarchical PredictionRecord rows with:
- sample_id
- station_id
- accept_reject
- reject_score
- predicted_label
- primary_label
- primary_score
- top_defect_family
- defect_family_probs
- evidence
- split
- defect_scale
- metadata
Failure bundles persist FailureRecord JSON with:
- failure id
- stage
- variant
- status
- error type and message
- traceback path
- offending sample IDs
- latest checkpoint metadata
- partial artifact paths
- resume metadata
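As an illustration of how such a bundle might be produced, the sketch below writes a traceback file and a JSON record for a caught exception. It is not GreyModel's implementation: `write_failure_bundle`, the file naming, the `"failed"` status value, and every JSON key name are hypothetical, chosen only to mirror the list above.

```python
import json
import traceback
import uuid
from pathlib import Path


def write_failure_bundle(out_dir, stage, variant, exc, sample_ids):
    """Hypothetical helper: persist a traceback and a JSON failure record."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failure_id = uuid.uuid4().hex

    # Store the full traceback in its own file and reference it by path.
    tb_path = out_dir / f"{failure_id}.traceback.txt"
    tb_path.write_text(
        "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    )

    record = {
        # All key names below are assumptions, not the real FailureRecord schema.
        "failure_id": failure_id,
        "stage": stage,
        "variant": variant,
        "status": "failed",
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "traceback_path": str(tb_path),
        "offending_sample_ids": list(sample_ids),
        "latest_checkpoint": {},        # placeholder: checkpoint metadata
        "partial_artifact_paths": [],   # placeholder: artifacts written before the failure
        "resume": {},                   # placeholder: resume metadata
    }
    json_path = out_dir / f"{failure_id}.json"
    json_path.write_text(json.dumps(record, indent=2))
    return json_path
```

Keeping the traceback in a separate file keeps the JSON record small and easy to scan when many failures are listed together.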