This is the problem we are facing in the MeerKLASS OTF imaging run.
For the current DDFacet run, I had ~4500 MSs (32 MB each).
When the number of snapshot visibility sets (MSs) exceeds ~2500, the jobs fail.
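For scale, here is the arithmetic on the input data alone. It is a rough sketch that counts only the on-disk MS size; I am not assuming DDFacet's in-memory footprint scales one-to-one with it.

```python
# Rough input-data volume, using the figures quoted above:
# ~4500 MSs of ~32 MB each, with failures starting at ~2500 MSs.
MS_SIZE_MB = 32


def total_gb(n_ms, ms_size_mb=MS_SIZE_MB):
    """Total on-disk size of n_ms measurement sets, in GB (1 GB = 1000 MB)."""
    return n_ms * ms_size_mb / 1000


full_run = total_gb(4500)
threshold = total_gb(2500)
print(f"full run: {full_run:.1f} GB, failure threshold: {threshold:.1f} GB")
```

This prints `full run: 144.0 GB, failure threshold: 80.0 GB`.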
My DDF job fails at this point, while combining the facets into the stitched PSF image:
log.err:
- 10:02:08 - CacheManager [260.0/283.7 296.4/320.0 50.0Gb] writing cache hash mslist.txt.ddfcache/Dirty.hash
- 10:02:13 - AsyncProcessPool [260.0/283.7 296.3/320.0 61.1Gb] FFT PSF: 2304 jobs complete, average single-core time 0.06s per job
- 10:02:14 - ClassFacetMachine [260.0/283.7 296.3/320.0 61.1Gb] Reading mslist.txt.ddfcache/DicoSumJonesNorm_FacetLabel.pickle from cache
- 10:02:24 - AsyncProcessPool [262.8/283.7 300.8/320.0 75.4Gb] Build PSF facet slices: 2304 jobs complete, average single-core time 0.03s per job
- 10:02:25 - ClassFacetMachine [262.9/283.7 317.5/320.0 75.4Gb] cutting PSF facet-slices of shape 309x309
- 10:02:26 - AsyncProcessPool [262.9/283.7 317.5/320.0 80.3Gb] Cut PSF facet slices: 2304 jobs complete, average single-core time 0.02s per job
- 10:02:26 - ClassFacetMachine [262.9/283.7 317.5/320.0 80.3Gb] Building Facets-PSF normalised by their maximum
- 10:05:19 - ClassFacetMachine [380.4/380.4 435.3/435.3 196.6Gb] Combining facets to stitched PSF image
log.out:
Glue facets....................2304/2304 [==================================================] 100% - 0'34"
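To see how memory grows across the steps in log.err, the bracketed figures can be pulled out of the log with a short script. This is a sketch: the regex assumes the exact "[a/b c/d eGb]" layout shown above, and what each field means (for example, which values are current and which are peak) is my assumption, not something taken from the DDFacet documentation.

```python
import re

# Memory bracket as printed in log.err, e.g. "[260.0/283.7 296.4/320.0 50.0Gb]".
# Assumed layout: two "x/y" pairs followed by one value suffixed with "Gb";
# the physical meaning of each field is not asserted here.
MEM_RE = re.compile(r"\[(\d+\.\d+)/(\d+\.\d+) (\d+\.\d+)/(\d+\.\d+) (\d+\.\d+)Gb\]")
TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2}")


def parse_memory(lines):
    """Return (timestamp, [f1, f2, f3, f4, f5]) for every line with a memory bracket."""
    records = []
    for line in lines:
        mem = MEM_RE.search(line)
        ts = TIME_RE.search(line)
        if mem and ts:
            records.append((ts.group(0), [float(x) for x in mem.groups()]))
    return records


LOG = """\
- 10:02:08 - CacheManager [260.0/283.7 296.4/320.0 50.0Gb] writing cache hash mslist.txt.ddfcache/Dirty.hash
- 10:02:26 - ClassFacetMachine [262.9/283.7 317.5/320.0 80.3Gb] Building Facets-PSF normalised by their maximum
- 10:05:19 - ClassFacetMachine [380.4/380.4 435.3/435.3 196.6Gb] Combining facets to stitched PSF image
"""

records = parse_memory(LOG.splitlines())
for ts, fields in records:
    print(ts, fields)
first, last = records[0], records[-1]
print(f"last field: {first[1][4]} Gb at {first[0]} -> {last[1][4]} Gb at {last[0]}")
```

On these lines the last field goes from 50.0 Gb at 10:02:08 to 196.6 Gb at 10:05:19, with most of the jump happening during the "Building Facets-PSF normalised by their maximum" step.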
/dev/shm:
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           454G  163G  292G  36% /dev/shm
free -h:
               total        used        free      shared  buff/cache   available
Mem:           503Gi       338Gi       1.1Gi       162Gi       163Gi        50Mi
Swap:           14Gi        14Gi          0B
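My reading of these numbers (not a confirmed diagnosis) is that shared memory is a large part of the problem: free reports 162Gi shared, which matches the 163G used on the /dev/shm tmpfs, while available memory is down to 50Mi and swap is full. To log these quantities over time during a run, here is a minimal Linux-only sketch that reads /proc/meminfo and calls os.statvfs on /dev/shm. The function names are my own, not part of DDFacet.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def shm_usage_gib(path="/dev/shm"):
    """Return (used, total) size of the filesystem mounted at path, in GiB."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    return (total - free) / 2**30, total / 2**30


def kb_to_gib(kb):
    """Convert a /proc/meminfo kB value to GiB."""
    return kb / 2**20


def snapshot():
    """One-line summary of available RAM, Shmem, free swap and /dev/shm usage."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    used, total = shm_usage_gib()
    return (f"MemAvailable {kb_to_gib(info['MemAvailable']):.1f} GiB, "
            f"Shmem {kb_to_gib(info['Shmem']):.1f} GiB, "
            f"SwapFree {kb_to_gib(info['SwapFree']):.1f} GiB, "
            f"/dev/shm {used:.1f}/{total:.1f} GiB")
```

Calling snapshot() every few seconds from a separate process while DDFacet runs should show which of these runs out first.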
Any suggestions on how I can solve this problem?