feat: add evaluation callbacks, metrics and analysis #9
Conversation
📝 Walkthrough

Adds image and vector metric functions, a new StainAnalyzer class with multi-metric comparisons and CSV export, three new Lightning callbacks (TilesExport, AnalysisExport, WSIAssembler) plus a DenormalizationCallback, and registers the callbacks and new top-level exports.
Sequence Diagram(s):

```mermaid
sequenceDiagram
    participant Trainer as Lightning Trainer
    participant DataModule as Trainer.datamodule
    participant WSI as WSIAssembler
    participant Buffers as SlideBuffers
    participant Temp as TempDir
    participant TIFF as Output TIFF
    Trainer->>WSI: on_predict_start()
    WSI->>DataModule: request slides metadata
    DataModule-->>WSI: slides list (path, level, extents, mpp)
    WSI->>Temp: create temp directory
    loop For each prediction batch
        Trainer->>WSI: on_predict_batch_end(batch)
        WSI->>Buffers: ensure slide buffer open
        loop For each tile in batch
            WSI->>Buffers: _place_tile(tile_image, xy)
            Buffers->>Buffers: running-average blend into buffers
        end
    end
    Trainer->>WSI: on_predict_end()
    WSI->>Buffers: close active slide (flush)
    Buffers->>Temp: write raw buffers
    Temp->>TIFF: convert buffers → pyramidal TIFF (embed mpp)
    TIFF-->>Trainer: final slide artifacts
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the evaluation capabilities of the stain normalization pipeline by introducing a robust set of callbacks and metrics. The changes enable detailed analysis of model predictions, from individual tile inspection to comprehensive whole-slide image reconstruction and quantitative assessment of image quality and stain consistency. This provides a more thorough understanding of the model's performance and facilitates better decision-making during development and deployment.
Code Review
This pull request introduces a comprehensive set of features for evaluation, including new callbacks for exporting analysis results and tiles, a WSI assembler, and a suite of image and stain vector metrics. The implementation is generally solid, particularly the WSIAssembler which robustly handles large image processing. I've identified a few areas for improvement, including a potential bug in the metric calculation logic, suggestions for improving robustness, and an optimization in one of the metric functions. My detailed comments are below.
Actionable comments posted: 4
🧹 Nitpick comments (3)
stain_normalization/analysis/analyzer.py (2)
91-97: Clarify the `is_paired` semantics in documentation.

The `is_paired` flag on line 97 determines whether paired metrics (SSIM, PCC, LAB PSNR) are computed. Currently, it's `True` only when a per-call `reference` is passed, not when using the fixed `self._ref_img`. This behavior makes sense (paired metrics require spatially aligned images), but it's subtle. Consider adding a brief docstring note explaining when paired metrics are computed vs. skipped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/analysis/analyzer.py` around lines 91 - 97, The comment points out subtlety in is_paired semantics: is_paired is set based only on whether compare() received a per-call reference (reference is not None) rather than whether a stored self._ref_img exists; clarify this in docs. Update the docstring for the Analyzer class and/or the compare() method to explicitly state that paired metrics (SSIM, PCC, LAB PSNR) are computed only when a per-call reference is provided (i.e., when is_paired is True), and not when using the fixed self._ref_img, and give rationale that paired metrics require spatial alignment; reference the is_paired variable, compare(), __init__, and self._ref_img in the docstring so future readers understand the behavior.
111-114: Consider replacing `assert` with explicit validation.

Using `assert` for runtime validation can be problematic since assertions are disabled when Python runs with `-O` (optimized mode). If `estimate_stain_vectors` returns `None` (e.g., image has too much background), this will silently fail in optimized mode.

Proposed fix:

```diff
-    assert (
-        ref_vectors is not None
-    )  # fails if reference has too much background and no valid stain vectors
-    img_vectors = estimate_stain_vectors(image)
+    if ref_vectors is None:
+        raise ValueError(
+            "Reference stain vectors unavailable (image may have too much background)"
+        )
+    img_vectors = estimate_stain_vectors(image)
+    if img_vectors is None:
+        raise ValueError(
+            "Could not estimate stain vectors for image (may have too much background)"
+        )
```
Similarly for line 139:

```diff
-    assert ref_nmi is not None
+    if ref_nmi is None:
+        raise ValueError("Reference NMI unavailable")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/analysis/analyzer.py` around lines 111 - 114, Replace the runtime asserts that check for None with explicit validation and error handling: where ref_vectors is checked (around the block that currently uses "assert ref_vectors is not None") and where img_vectors is obtained from estimate_stain_vectors(image) (and the similar check around line 139), detect a None return and raise a clear exception (e.g., ValueError) or return a controlled error result with a descriptive message indicating the reference/image had too much background and no valid stain vectors; ensure you reference the estimate_stain_vectors call and the ref_vectors/img_vectors variables so the handler is applied in both locations and consider logging the condition before raising to aid debugging.

stain_normalization/callbacks/wsi_assembler.py (1)
105-116: Consider narrowing the exception type or using logging.

The bare `Exception` catch (flagged by static analysis) is understandable for robustness, but it may inadvertently swallow unexpected errors like `KeyboardInterrupt` (though that's a `BaseException`). Consider:
- Using `logging.exception()` instead of `traceback.print_exc()` for consistent logging
- Optionally re-raising after recording the failure if you want to surface critical errors
Optional improvement using logging:

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
-        except Exception:
-            print(f"ERROR: Failed to save slide '{slide_name}'")
-            traceback.print_exc()
+        except Exception:
+            logger.exception("Failed to save slide '%s'", slide_name)
             self._failed_slides.append(slide_name)
```
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/wsi_assembler.py` around lines 105 - 116, Replace the bare except in the block that calls self._save_slide(slide_name, self._active) with a captured exception (e.g., except Exception as e:) and use a module-level logger to record it via logger.exception("Failed to save slide %s", slide_name) instead of print + traceback; append slide_name to self._failed_slides as before and, if you want critical errors propagated, re-raise the exception after logging. Also ensure the finally cleanup references (self._active.result_buffer, self._active.count_buffer, self._active.temp_dir.cleanup(), and setting self._active / self._active_name to None) are safe if self._active is None (use guards or getattr checks) so cleanup does not raise another exception.
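The `logging.exception()` suggestion above can be sketched in isolation. A minimal, self-contained example (the `save_slide` stub, slide name, and `failed_slides` list here are illustrative stand-ins, not the PR's code):

```python
import logging

logger = logging.getLogger(__name__)  # module-level logger, as the review suggests

def save_slide(name: str) -> None:
    """Stand-in for the real _save_slide; always fails for the demo."""
    raise OSError("disk full")

failed_slides: list[str] = []
try:
    save_slide("slide_001")
except Exception:
    # logger.exception logs at ERROR level and appends the active traceback,
    # replacing the separate print(...) + traceback.print_exc() pair
    logger.exception("Failed to save slide '%s'", "slide_001")
    failed_slides.append("slide_001")

print(failed_slides)  # → ['slide_001']
```

The advantage over `print` is that the message and traceback go through the application's logging configuration (handlers, formatting, log files) instead of raw stdout.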
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@configs/default.yaml`:
- Around line 59-62: The TilesExport and WSIAssembler callback configs
(callbacks.tiles_export and callbacks.wsi_assembler) are defined but not
registered with the trainer, so their Lightning hooks will never run; register
them by adding ${callbacks.tiles_export} and ${callbacks.wsi_assembler} to the
trainer callbacks list (the same place where ${callbacks.model_checkpoint},
${callbacks.early_stopping}, ${callbacks.analysis_export} are listed) so
trainer.callbacks includes these two entries, or alternatively add a short
comment/documentation in the config indicating they must be enabled via CLI
overrides if you prefer conditional registration.
In `@stain_normalization/callbacks/analysis_export.py`:
- Around line 26-42: The on_test_batch_end signature incorrectly types outputs
as list[torch.Tensor]; update its annotation to match the project's Outputs
alias (a single batched Tensor) so it reads outputs: Outputs (or torch.Tensor)
and keep the body that indexes outputs[b] unchanged; ensure the import or
reference to the Outputs type alias is added if needed and verify
tensor_to_image(outputs[b]) and any static type checks now accept the
batched-tensor usage.
In `@stain_normalization/callbacks/wsi_assembler.py`:
- Around line 1-6: Run the Ruff autoformatter on
stain_normalization/callbacks/wsi_assembler.py (uvx ruff format --fix
stain_normalization/callbacks/wsi_assembler.py) or manually apply the same fixes
so imports (tempfile, traceback, dataclass, Path, Any) and surrounding
whitespace follow the project's Ruff rules, ensuring proper import ordering,
spacing, and a trailing newline to satisfy the CI Ruff check for the
wsi_assembler module.
- Around line 84-89: The count buffer currently created as count_buf (np.memmap
with dtype=np.uint8) can overflow if >255 tiles overlap; change the dtype to
np.uint16 (or larger as needed) when creating count_buf in the WSI assembler to
prevent wraparound, update any dependent code that reads/writes count_buf (e.g.,
increments or dtype assumptions) to handle the new integer width, and ensure any
memory calculations or memmap shape logic around count_buf remain valid with the
larger element size.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 67fce064-bf2d-49cd-8fb7-b74f1a787e18
📒 Files selected for processing (11)
- configs/default.yaml
- stain_normalization/analysis/__init__.py
- stain_normalization/analysis/analyzer.py
- stain_normalization/callbacks/__init__.py
- stain_normalization/callbacks/_base.py
- stain_normalization/callbacks/analysis_export.py
- stain_normalization/callbacks/tiles_export.py
- stain_normalization/callbacks/wsi_assembler.py
- stain_normalization/metrics/__init__.py
- stain_normalization/metrics/image_metrics.py
- stain_normalization/metrics/vector_metrics.py
Actionable comments posted: 1
♻️ Duplicate comments (1)
stain_normalization/callbacks/wsi_assembler.py (1)
84-89: ⚠️ Potential issue | 🟠 Major

Prevent overlap-counter overflow in the blending path.

At line 86, `count_buffer` uses `np.uint8`; once overlap exceeds 255, line 172 wraps and line 163 blends with corrupted counts.

Proposed fix:
```diff
         count_buf = np.memmap(
             Path(tmp.name) / "count.raw",
-            dtype=np.uint8,
+            dtype=np.uint16,
             mode="w+",
             shape=(h, w),
         )
@@
         count_img = pyvips.Image.rawload(
-            str(count_path), meta.extent_x, meta.extent_y, 1
+            str(count_path), meta.extent_x, meta.extent_y, 1, format="ushort"
         )
```

Also applies to: 172-172, 198-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/wsi_assembler.py` around lines 84 - 89, The overlap counter memmap is created with dtype=np.uint8 (count_buf) which overflows past 255 and corrupts the blending math; change the memmap dtype to a wider integer (e.g., np.uint32 or np.int32) when creating count_buf via np.memmap and ensure all places that increment or read from this buffer (the blending/overlap logic that uses count_buf to compute averages) use the wider dtype consistently (avoid implicit casts back to uint8), and if downstream code expects a smaller type, cast only after computing blended results; update any code that writes/reads count_buf in the blending path so accumulation and division use the widened type to prevent wraparound.
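To see why the wider dtype matters, here is a minimal sketch of the running-average blend with a `uint16` count buffer. Plain in-memory arrays stand in for the memmaps, and all names are illustrative, not the PR's code:

```python
import numpy as np

h, w = 4, 4
result = np.zeros((h, w), dtype=np.float32)
count = np.zeros((h, w), dtype=np.uint16)  # np.uint8 would wrap after 255 overlaps

def blend(tile: np.ndarray, y: int, x: int) -> None:
    """Overlap-averaged placement: new_mean = old_mean + (tile - old_mean) / (n + 1)."""
    th, tw = tile.shape
    region = result[y:y + th, x:x + tw]
    c = count[y:y + th, x:x + tw]
    region += (tile - region) / (c + 1)  # incremental mean update
    c += 1                               # bump the per-pixel overlap counter

tile = np.full((2, 2), 10.0, dtype=np.float32)
for _ in range(300):  # more overlaps than a uint8 counter could represent
    blend(tile, 0, 0)

print(int(count.max()))                   # → 300 (a uint8 buffer would have wrapped to 44)
print(bool(np.allclose(result[:2, :2], 10.0)))  # → True
```

With `np.uint8`, the 256th overlap would wrap the counter to 0 and the `(c + 1)` divisor would restart the running average from scratch, which is exactly the corruption the comment describes.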
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stain_normalization/callbacks/wsi_assembler.py`:
- Around line 151-153: The slice math can yield non-positive or out-of-range
heights/widths when x/y are outside the target region; before computing h and w
in the WSI assembly code, validate and clamp the coordinates: ensure ey > y and
ex > x, clamp h = max(0, min(tile.shape[0], ey - y)) and w = max(0,
min(tile.shape[1], ex - x)), and skip the tile assignment if h == 0 or w == 0
(or adjust x/y so the slice is within bounds). Update the block that computes h,
w and slices tile (variables tile, x, y, ex, ey) to perform these checks and
only perform tile = tile[:h, :w] when h and w are positive and valid.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ce89273f-1d24-4e6c-9d70-fc73c9eb4ec8
📒 Files selected for processing (1)
stain_normalization/callbacks/wsi_assembler.py
Actionable comments posted: 4
♻️ Duplicate comments (1)
stain_normalization/callbacks/wsi_assembler.py (1)
146-159: ⚠️ Potential issue | 🟠 Major

Still reject negative tile coordinates before slicing.

The new `max(0, ...)` guard handles tiles that start past the right or bottom edge, but a negative `x` or `y` still flows into `result_buffer[y:y+h, x:x+w]`. NumPy interprets those indices from the end of the array, so a bad coordinate can write into the wrong region instead of being clipped.

Proposed fix:
```diff
     def _place_tile(self, tile: np.ndarray[Any, Any], x: int, y: int) -> None:
         """Place a predicted tile into the active slide buffer with overlap averaging."""
         assert self._active is not None
         sb = self._active
         ex, ey = sb.meta.extent_x, sb.meta.extent_y
+
+        if x < 0 or y < 0 or x >= ex or y >= ey:
+            raise ValueError(
+                f"Tile coordinates out of bounds: x={x}, y={y}, extent=({ex}, {ey})"
+            )
-        h = max(0, min(tile.shape[0], ey - y))
-        w = max(0, min(tile.shape[1], ex - x))
-        if h == 0 or w == 0:
+        h = min(tile.shape[0], ey - y)
+        w = min(tile.shape[1], ex - x)
+        if h <= 0 or w <= 0:
             return
         tile = tile[:h, :w]
```
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/wsi_assembler.py` around lines 146 - 159, The _place_tile method fails to reject negative tile coordinates before slicing, allowing negative x or y to be interpreted from the end of the NumPy arrays; update _place_tile to clamp/adjust x and y (and correspondingly adjust tile offsets) so that x>=0 and y>=0 before computing region = sb.result_buffer[y:y+h, x:x+w] and count = sb.count_buffer[y:y+h, x:x+w]; specifically compute tx = max(0, -x) and ty = max(0, -y), then advance the tile/view by ty: and tx:, reduce h and w accordingly, and shift x and y to max(0,x)/max(0,y) so slicing never receives negative indices (refer to function _place_tile and variables x, y, h, w, tile, result_buffer, count_buffer).
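The clamp-and-shift variant described in the agent prompt can be sketched as a standalone function. Names, shapes, and the clipping policy here are illustrative, not the PR's `_place_tile`:

```python
import numpy as np

def place_tile(buffer: np.ndarray, tile: np.ndarray, x: int, y: int) -> None:
    """Write the in-bounds part of `tile` at (x, y); out-of-bounds parts are clipped."""
    ey, ex = buffer.shape[:2]
    ty, tx = max(0, -y), max(0, -x)   # how far the tile hangs off the top/left edge
    y, x = max(0, y), max(0, x)       # never pass a negative index to NumPy
    h = min(tile.shape[0] - ty, ey - y)
    w = min(tile.shape[1] - tx, ex - x)
    if h <= 0 or w <= 0:
        return                        # tile lies entirely outside the buffer
    buffer[y:y + h, x:x + w] = tile[ty:ty + h, tx:tx + w]

buf = np.zeros((4, 4), dtype=np.uint8)
place_tile(buf, np.ones((3, 3), dtype=np.uint8), x=-1, y=-1)
print(int(buf.sum()))  # → 4 (only the in-bounds 2x2 corner was written)
```

Without the `max(0, ...)` shift, `buffer[-1:2, -1:2]` would be an empty or wrongly placed slice, which is the silent misplacement the review warns about.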
🧹 Nitpick comments (1)
stain_normalization/callbacks/analysis_export.py (1)
37-43: Pass a stable `image_id` into the analyzers.

Right now `results.csv` contains anonymous rows, so you cannot map an outlier back to `slide_name`/`xy` or the PNG exports when debugging. Threading a stable ID through both `compare()` calls makes the export much more usable.

Proposed fix:
```diff
         """Computes metrics for each sample and accumulates results."""
         for b in range(len(outputs)):
-            original_img = batch[1][b]["original_image"].astype("uint8")
-            modified_img = (batch[1][b]["modified_image"] * 255).astype("uint8")
+            slide_name = batch[1][b]["slide_name"]
+            xy = batch[1][b]["xy"]
+            image_id = f"{slide_name}/{xy}"
+            original_img = batch[1][b]["original_image"].astype("uint8")
+            modified_img = (batch[1][b]["modified_image"] * 255).astype("uint8")
             predicted_img = self.tensor_to_image(outputs[b])
-            self.mod_analyzer.compare(modified_img, paired_image=original_img)
-            self.pred_analyzer.compare(predicted_img, paired_image=original_img)
+            self.mod_analyzer.compare(
+                modified_img, image_id=image_id, paired_image=original_img
+            )
+            self.pred_analyzer.compare(
+                predicted_img, image_id=image_id, paired_image=original_img
+            )
```
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/analysis_export.py` around lines 37 - 43, The loop in analysis_export.py calls mod_analyzer.compare and pred_analyzer.compare without any stable identifier, making results.csv rows anonymous; update the loop (where tensor_to_image is used to build predicted_img from outputs and original_img/modified_img are read from batch) to compute or extract a stable image_id (e.g., batch[1][b]["image_id"] or compose from slide_name and xy coordinates, falling back to a deterministic index-based id) and pass that image_id into both mod_analyzer.compare(...) and pred_analyzer.compare(...) as an explicit parameter (e.g., image_id=image_id); also ensure the Analyzer.compare implementations accept and propagate this image_id into CSV/exports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stain_normalization/analysis/analyzer.py`:
- Around line 174-182: get_statistics currently calls
df[numeric_cols].describe(...) which raises on zero rows; update get_statistics
to detect empty results (self.results or df[numeric_cols]) and return an empty
DataFrame with the same expected columns/structure (or a describe()-shaped empty
result) instead of calling describe() when there are no samples so save_csv
(which always calls get_statistics) won't fail during teardown; reference the
get_statistics method, the self.results attribute and the numeric_cols selection
when implementing the guard.
- Around line 43-48: The constructor currently expands any falsy metrics (e.g.,
an explicit empty list) into all metrics; change the assignment in __init__ so
only None maps to self.AVAILABLE_METRICS, e.g. set self.metrics =
self.AVAILABLE_METRICS if metrics is None else metrics, so passing [] is
preserved; update any related docstring or comments on the metrics parameter in
the Analyzer class to reflect this behavior.
In `@stain_normalization/callbacks/wsi_assembler.py`:
- Around line 58-69: The code keys slide metadata by Path(row.path).stem which
causes collisions for identical filenames in different directories; change the
key to a unique identifier derived from the full slide path (e.g., use the full
path string or a sanitized/hashed version of Path(row.path).as_posix()) when
populating _slide_meta in the loop over slides_df.iterrows(), keep constructing
_SlideMeta from the same row fields, and update any other places that rely on
slide_path.stem (e.g., the batch metadata usage in predict_dataset.py) to use
the same unique identifier so output filenames and lookup keys remain consistent
and collision-free.
In `@stain_normalization/metrics/image_metrics.py`:
- Around line 47-53: Update the docstring in
stain_normalization/metrics/image_metrics.py for the function that returns the
"Mean L* brightness of an image in CIE Lab color space" by replacing the Unicode
en-dash in the range text `0–100` with an ASCII hyphen so it reads `0-100`;
ensure the docstring line that currently contains `Mean L* value (0–100 scale,
higher = brighter).` is changed accordingly to satisfy the linter.
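The `Path.stem` collision flagged for `wsi_assembler.py` above is easy to reproduce; the paths and the derived key function here are hypothetical:

```python
from pathlib import Path

a = Path("cohort_a/slide_001.svs")
b = Path("cohort_b/slide_001.svs")
print(a.stem == b.stem)  # → True: both stems are "slide_001", so they collide

def slide_key(p: Path) -> str:
    """One possible collision-free key: the full path with separators sanitized."""
    return p.with_suffix("").as_posix().replace("/", "__")

print(slide_key(a))                  # → cohort_a__slide_001
print(slide_key(a) == slide_key(b))  # → False
```

Whatever scheme is chosen, the same key must be used both for the buffer lookup and for the output filename so the two stay consistent.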
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7099bc42-ac2b-4fa5-abc6-3053c1efae97
📒 Files selected for processing (6)
- configs/default.yaml
- stain_normalization/analysis/analyzer.py
- stain_normalization/callbacks/analysis_export.py
- stain_normalization/callbacks/tiles_export.py
- stain_normalization/callbacks/wsi_assembler.py
- stain_normalization/metrics/image_metrics.py
🚧 Files skipped from review as they are similar to previous changes (1)
- configs/default.yaml
🧹 Nitpick comments (1)
stain_normalization/analysis/analyzer.py (1)
152-157: Consider replacing `assert` with explicit checks for robustness.

The `assert ref_nmi is not None` (and similar for `ref_brightness` at line 163) could fail if `self.metrics` is modified after initialization to include metrics that weren't precomputed. While this is unlikely in normal usage, replacing asserts with explicit `ValueError` checks would provide clearer error messages in production.

♻️ Optional: Replace asserts with explicit checks
```diff
         if "nmi" in self.metrics:
-            assert ref_nmi is not None
+            if ref_nmi is None:
+                raise ValueError(
+                    "NMI metric requires a reference image with 'nmi' in metrics at init time."
+                )
             img_nmi = compute_nmi(image)
```
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/analysis/analyzer.py` around lines 152 - 157, Replace the assert checks that guard precomputed reference values with explicit runtime checks that raise informative exceptions: in analyzer.py inside the block that handles "nmi" (where self.metrics is consulted and compute_nmi(image) is called, and result["ref_nmi"], result["nmi"], result["nmi_diff"] are set), replace "assert ref_nmi is not None" with an explicit check that raises a ValueError (or similar) with a clear message if ref_nmi is missing; do the same for the "brightness" branch that checks ref_brightness so callers get a descriptive error instead of an AssertionError.
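A standalone sketch of the suggested replacement (the function name and message are illustrative); unlike `assert`, the explicit check still runs when Python is started with `-O`:

```python
def require_ref_nmi(ref_nmi):
    """Explicit validation in place of `assert ref_nmi is not None`."""
    if ref_nmi is None:
        raise ValueError(
            "NMI metric requires a reference image with 'nmi' in metrics at init time."
        )
    return ref_nmi

print(require_ref_nmi(0.42))  # → 0.42

try:
    require_ref_nmi(None)
except ValueError as err:
    msg = str(err)
print(msg)  # the descriptive message, instead of a bare AssertionError
```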
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@stain_normalization/analysis/analyzer.py`:
- Around line 152-157: Replace the assert checks that guard precomputed
reference values with explicit runtime checks that raise informative exceptions:
in analyzer.py inside the block that handles "nmi" (where self.metrics is
consulted and compute_nmi(image) is called, and result["ref_nmi"],
result["nmi"], result["nmi_diff"] are set), replace "assert ref_nmi is not None"
with an explicit check that raises a ValueError (or similar) with a clear
message if ref_nmi is missing; do the same for the "brightness" branch that
checks ref_brightness so callers get a descriptive error instead of an
AssertionError.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c00411a5-8f2c-43e1-833b-f54b50031618
📒 Files selected for processing (2)
- stain_normalization/analysis/analyzer.py
- stain_normalization/metrics/image_metrics.py
Where is this used? Use torchmetrics instead of designing your own. Look here for how to integrate it. Also, if you need some custom ones, you can implement your own, like here.
This is used to analyze the predicted images after normalization, and also for outside analysis of the standalone dataset through a script that handles loading the data.
Some of the metrics could easily be rewritten to work with tensors, but a lot of them also use rational staining and skimage color space transformations, which operate in numpy, so this would all need to be reimplemented, or a different library found that can do it with tensors.
It seems that for the color space transformation on tensors there is a library called kornia, but the stain vectors would still need to be computed separately after transforming images back to numpy. Normally, validation uses the SSIM and MeanAbsoluteError from torchmetrics. I'm using this only on test/predict to get extra info, mainly those stain vectors.
Actually, that's not a problem. It's quite common to convert a tensor to numpy for a specific operation and then convert it back to a tensor. This does introduce a slight overhead, but it's negligible.
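A minimal sketch of that round trip. A reversed-channel operation stands in for a numpy-only call such as `skimage.color.rgb2lab`; the shapes are illustrative:

```python
import numpy as np
import torch

def numpy_only_op(img: np.ndarray) -> np.ndarray:
    """Placeholder for a numpy-only library call (e.g., a color-space transform)."""
    return img[..., ::-1].copy()

t = torch.rand(3, 8, 8)                       # CHW float tensor
img = t.permute(1, 2, 0).cpu().numpy()        # HWC numpy array for the library call
out = numpy_only_op(img)
t2 = torch.from_numpy(out).permute(2, 0, 1)   # back to a CHW tensor

print(t2.shape == t.shape)          # → True
print(torch.allclose(t2[0], t[2]))  # → True: channels came back reversed
```

The only real costs are the device transfer (`.cpu()`) and the copy, which for tile-sized images is small next to the model's forward pass.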
I implemented it with tensors but also kept the old version for standalone dataset analysis and for when I need to see exact per-tile metric data.
I don't see a reason to keep the old version. You can obtain the per-tile data from the metric: `data = metric(preds, targets)`.
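For context, per-tile values can indeed be recovered from a batched call by reducing only over the non-batch dimensions; a hand-rolled MAE stands in for a torchmetrics metric in this sketch:

```python
import torch

preds = torch.rand(4, 3, 8, 8)     # batch of 4 predicted tiles
targets = torch.rand(4, 3, 8, 8)

per_tile = (preds - targets).abs().mean(dim=(1, 2, 3))  # one value per tile
overall = (preds - targets).abs().mean()                # the usual scalar metric

print(per_tile.shape)                           # → torch.Size([4])
print(bool(torch.isclose(per_tile.mean(), overall)))  # → True
```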
Actionable comments posted: 2
🧹 Nitpick comments (3)
stain_normalization/callbacks/_base.py (1)
17-20: Consider caching device-transferred tensors.
`self.std.to(device)` and `self.mean.to(device)` are called on every `denormalize()` invocation. For batched processing, this creates redundant tensor copies. Caching the device-transferred tensors (e.g., after first use) could reduce overhead.

Optional optimization:
```diff
     def denormalize(self, tensor: torch.Tensor) -> torch.Tensor:
         """Reverse normalization: tensor → [0, 1] float."""
         device = tensor.device
-        return (tensor * self.std.to(device)) + self.mean.to(device)
+        if not hasattr(self, '_cached_device') or self._cached_device != device:
+            self._std = self.std.to(device)
+            self._mean = self.mean.to(device)
+            self._cached_device = device
+        return (tensor * self._std) + self._mean
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/_base.py` around lines 17 - 20, The denormalize method repeatedly calls self.std.to(device) and self.mean.to(device) creating redundant copies; modify the class (where denormalize is defined) to cache device-transferred tensors (e.g., store self._mean_device and self._std_device keyed by device or keep last_device/last_std/last_mean) and in denormalize() use the cached tensors when tensor.device matches the cached device, updating the cache only when the device changes so you avoid repeated .to(device) calls.

stain_normalization/callbacks/analysis_export.py (1)
56-59: Consider guarding MLflow calls against missing active run.

If this callback runs outside an active MLflow run context, `mlflow.log_artifact` will raise an exception. A defensive check could improve robustness:

```python
if mlflow.active_run():
    for f in mod_dir.glob("*"):
        mlflow.log_artifact(str(f), artifact_path="analysis_metrics/modified")
    for f in pred_dir.glob("*"):
        mlflow.log_artifact(str(f), artifact_path="analysis_metrics/predicted")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/analysis_export.py` around lines 56 - 59: the current loop calls mlflow.log_artifact unconditionally and will raise if there is no active MLflow run; wrap the artifact logging in a guard that checks mlflow.active_run() before iterating mod_dir and pred_dir so logging only happens when a run exists, e.g., check mlflow.active_run() and then iterate mod_dir.glob("*") and pred_dir.glob("*") to call mlflow.log_artifact; ensure both artifact_path values ("analysis_metrics/modified" and "analysis_metrics/predicted") are preserved and skip or optionally warn when no active run is present.

stain_normalization/callbacks/wsi_assembler.py (1)
106-117: Broad exception catch is reasonable here but consider narrowing.

Ruff flags the bare `Exception` catch (BLE001). While narrowing to specific exceptions (e.g., `IOError`, `pyvips` errors) would be cleaner, the current approach ensures cleanup happens regardless of failure mode. The error is logged with full traceback, and the slide is recorded as failed.

If you want to satisfy the linter while maintaining robustness:

Optional refinement

```diff
-        except Exception:
+        except (IOError, OSError, RuntimeError) as e:
             print(f"ERROR: Failed to save slide '{slide_name}'")
             traceback.print_exc()
             self._failed_slides.append(slide_name)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/callbacks/wsi_assembler.py` around lines 106 - 117, Replace the bare "except Exception:" in the block that calls self._save_slide(slide_name, self._active) with either (preferred) a narrowed exception tuple that lists the expected failure modes (e.g., except (IOError, OSError, pyvips.Error) as e:) and keep the same logging, failure recording (self._failed_slides.append(slide_name)), and cleanup; or (if you must catch everything) change to "except Exception as e: # noqa: BLE001" so Ruff is satisfied while preserving the traceback logging and failure handling; keep references to _save_slide, self._active, self._failed_slides, and the traceback.print_exc() call unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@stain_normalization/metrics/vector_metrics.py`:
- Around line 39-58: The function compare_vectors currently returns
was_swapped=False when either input contains NaN which encodes an actual pairing
decision; instead return an explicit unknown state (e.g., was_swapped=None) or
add a separate pairing_defined flag so callers can detect and skip
export/pairing; update compare_vectors' return shape and annotation to allow
Optional[bool] (or include pairing_defined: bool) and adjust downstream usage
(e.g., stain_normalization/analysis/analyzer.py where the pairing is consumed)
to check for None/pairing_defined before treating was_swapped as a real
decision.
- Around line 60-74: The current pairing uses OD-space dot products to decide
was_swapped but then reports Lab delta E, causing inconsistent selection;
instead convert vecs1 and vecs2 into Lab with _od_to_lab, compute total delta E
for the straight pairing (delta_e76(lab1_a, lab2_a) + delta_e76(lab1_b, lab2_b))
and for the swapped pairing (delta_e76(lab1_a, lab2_b) + delta_e76(lab1_b,
lab2_a)), set was_swapped to True if the swapped total is smaller, then set
vecs2_paired accordingly and return the individual delta_e76 values and
was_swapped; update references to vecs1, vecs2, _od_to_lab, delta_e76,
was_swapped, vecs2_paired.
---
Nitpick comments:
In `@stain_normalization/callbacks/_base.py`:
- Around line 17-20: The denormalize method repeatedly calls self.std.to(device)
and self.mean.to(device) creating redundant copies; modify the class (where
denormalize is defined) to cache device-transferred tensors (e.g., store
self._mean_device and self._std_device keyed by device or keep
last_device/last_std/last_mean) and in denormalize() use the cached tensors when
tensor.device matches the cached device, updating the cache only when the device
changes so you avoid repeated .to(device) calls.
In `@stain_normalization/callbacks/analysis_export.py`:
- Around line 56-59: The current loop calls mlflow.log_artifact unconditionally
and will raise if there is no active MLflow run; wrap the artifact logging in a
guard that checks mlflow.active_run() before iterating mod_dir and pred_dir so
logging only happens when a run exists, e.g., check mlflow.active_run() and then
iterate mod_dir.glob("*") and pred_dir.glob("*") to call mlflow.log_artifact;
ensure both artifact_path values ("analysis_metrics/modified" and
"analysis_metrics/predicted") are preserved and skip or optionally warn when no
active run is present.
In `@stain_normalization/callbacks/wsi_assembler.py`:
- Around line 106-117: Replace the bare "except Exception:" in the block that
calls self._save_slide(slide_name, self._active) with either (preferred) a
narrowed exception tuple that lists the expected failure modes (e.g., except
(IOError, OSError, pyvips.Error) as e:) and keep the same logging, failure
recording (self._failed_slides.append(slide_name)), and cleanup; or (if you must
catch everything) change to "except Exception as e: # noqa: BLE001" so Ruff is
satisfied while preserving the traceback logging and failure handling; keep
references to _save_slide, self._active, self._failed_slides, and the
traceback.print_exc() call unchanged.
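The Lab-based pairing fix described for vector_metrics.py can be sketched as follows (a minimal illustration, not the repository's actual implementation; `pair_stain_vectors` and this plain-Euclidean `delta_e76` are assumptions):

```python
import numpy as np

def delta_e76(lab1, lab2) -> float:
    """CIE76 color difference: Euclidean distance between two Lab points."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

def pair_stain_vectors(lab1_a, lab1_b, lab2_a, lab2_b):
    """Choose the pairing (straight vs. swapped) with the smaller total delta E,
    so the selection criterion matches the metric that is reported."""
    straight = delta_e76(lab1_a, lab2_a) + delta_e76(lab1_b, lab2_b)
    swapped = delta_e76(lab1_a, lab2_b) + delta_e76(lab1_b, lab2_a)
    was_swapped = swapped < straight
    if was_swapped:
        lab2_a, lab2_b = lab2_b, lab2_a
    return delta_e76(lab1_a, lab2_a), delta_e76(lab1_b, lab2_b), was_swapped
```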
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0187134f-5a48-4165-8781-770ea73d8d8f
📒 Files selected for processing (7)
- stain_normalization/callbacks/__init__.py
- stain_normalization/callbacks/_base.py
- stain_normalization/callbacks/analysis_export.py
- stain_normalization/callbacks/tiles_export.py
- stain_normalization/callbacks/wsi_assembler.py
- stain_normalization/metrics/image_metrics.py
- stain_normalization/metrics/vector_metrics.py
🚧 Files skipped from review as they are similar to previous changes (2)
- stain_normalization/metrics/image_metrics.py
- stain_normalization/callbacks/tiles_export.py
Callbacks:
TilesExport: Saves individual test/predict tiles as PNG files. Used mostly in early stages to get sample outputs for visual inspection.
AnalysisExport: Runs StainAnalyzer (data PR) on test batches, computes all metrics comparing modified vs original and predicted vs original tiles. Saves results as CSV and logs summary statistics to MLflow.
WSIAssembler: Assembles predicted tiles back into full whole-slide images as pyramidal TIFFs. Handles tile overlap using running-average blending. Buffers one slide at a time using a memmap because of space concerns.
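The running-average blending for overlapping tiles can be sketched like this (a simplified illustration with plain NumPy arrays; the callback itself works on memmapped buffers, and `place_tile` is a hypothetical name):

```python
import numpy as np

def place_tile(buf: np.ndarray, count: np.ndarray, tile: np.ndarray, x: int, y: int) -> None:
    """Blend a tile into the slide buffer with a running average.

    buf:   (H, W, 3) float32 accumulation buffer (a memmap in practice)
    count: (H, W) uint16 per-pixel hit counter
    tile:  (th, tw, 3) float32 tile placed with top-left corner at (x, y)
    """
    th, tw = tile.shape[:2]
    region = buf[y:y + th, x:x + tw]
    hits = count[y:y + th, x:x + tw]
    # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
    hits += 1
    region += (tile - region) / hits[..., None]
```

With this update rule each pixel holds the exact average of every tile that covered it, without a second normalization pass over the slide.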
Metrics:
Image metrics (metrics/image_metrics.py):
NMI (Normalized Median Intensity): Ratio of median intensity to 95th percentile. Measures relative brightness of the image.
PCC (Pearson Correlation Coefficient): Pixel-level linear correlation between two images. High PCC means the overall structure is preserved.
LAB Brightness PSNR: Peak signal-to-noise ratio on the L* (lightness) channel in Lab space. Measures how well brightness is preserved after normalization.
LAB Mean: Mean L* brightness in CIE Lab space. Useful for detecting scanner exposure differences.
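Minimal NumPy sketches of two of these metrics (illustrative only; the function names and the grayscale proxy are assumptions, and the real implementations operate on tensors):

```python
import numpy as np

def nmi(img: np.ndarray) -> float:
    """Normalized Median Intensity: median intensity / 95th percentile."""
    intensity = img.mean(axis=-1)  # simple grayscale proxy (assumption)
    return float(np.median(intensity) / np.percentile(intensity, 95))

def pcc(img1: np.ndarray, img2: np.ndarray) -> float:
    """Pearson correlation between the flattened pixel values of two images."""
    return float(np.corrcoef(img1.ravel(), img2.ravel())[0, 1])
```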
Stain vector distance (metrics/vector_metrics.py):
Uses CIE76 Delta E to compare estimated stain vectors (hematoxylin, eosin). Stain vectors are converted from optical density to Lab color space. Comparison uses only chromaticity (a*, b*) with L*=0, so brightness differences do not affect the result; the metric focuses purely on dye/stain differences.
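The chromaticity-only comparison can be sketched as follows (assuming the stain vectors have already been converted to Lab; `chroma_delta_e` is an illustrative name, not the module's API):

```python
import numpy as np

def delta_e76(lab1, lab2) -> float:
    """CIE76 color difference: Euclidean distance in Lab space."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

def chroma_delta_e(lab1, lab2) -> float:
    """Compare only chromaticity (a*, b*) by zeroing L* before delta E."""
    a = np.array([0.0, lab1[1], lab1[2]])
    b = np.array([0.0, lab2[1], lab2[2]])
    return delta_e76(a, b)
```

Because L* is zeroed, two vectors that differ only in lightness produce a delta E of exactly zero.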
Summary by CodeRabbit
New Features
Chores