ST-Bank data: most H&E images are around $16 \times 16$
After reviewing the ST-Bank dataset, I noted that most H&E images are cropped to approximately $16 \times 16$ pixels (matching the 55 µm ST spot size). At this resolution, individual cells are reduced to just a few pixels, so the morphological detail needed for contrastive learning is essentially lost. How can a model learn meaningful biological representations when the input lacks cellular features? This seems to be a fundamental bottleneck that undermines the validity of the H&E–ST training approach.
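To make the concern concrete, here is a small sketch of the resolution arithmetic. The spot diameter (55 µm) is from the text above; the microns-per-pixel values are illustrative assumptions about typical whole-slide scan resolutions, not figures from ST-Bank itself.

```python
# Resolution arithmetic for a 55 um ST spot (illustrative sketch).
SPOT_DIAMETER_UM = 55.0

def spot_size_px(microns_per_pixel: float) -> float:
    """Side length in pixels of a square crop covering one 55 um spot."""
    return SPOT_DIAMETER_UM / microns_per_pixel

# At assumed common scan resolutions, a spot spans far more than 16 px:
for mpp in (0.25, 0.5, 1.0):  # hypothetical um/px values for 40x/20x/10x scans
    print(f"{mpp:.2f} um/px -> {spot_size_px(mpp):.0f} px per spot")

# Working backwards: a 16 px crop covering 55 um implies a very coarse scale.
implied_mpp = SPOT_DIAMETER_UM / 16
print(f"16 px crop -> ~{implied_mpp:.1f} um/px")
```

At the implied ~3.4 µm/px, a typical 10–20 µm cell occupies only about 3–6 pixels across, which is the "few pixels per cell" regime described above.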