Merged
1 change: 1 addition & 0 deletions docs/index.md
@@ -21,6 +21,7 @@ Welcome to the ITKIT documentation! ITKIT is a user-friendly toolkit built on `S
- **[itk_patch](itk_patch.md)** - Patch extraction
- **[itk_aug](itk_aug.md)** - Data augmentation
- **[itk_extract](itk_extract.md)** - Label extraction
- **[itk_combine](itk_combine.md)** - Label merging and intersection
- **[itk_convert](itk_convert.md)** - Format conversion

### Advanced Topics
57 changes: 57 additions & 0 deletions docs/itk_combine.md
@@ -0,0 +1,57 @@
# itk_combine

Combine multiple label folders by intersecting filenames and merging labels according to ordered mapping rules. This tool is useful when you have multiple specialized segmentations for the same cases and want to create a unified label map.

## Usage

```bash
itk_combine --source <name>=<folder> --map <mapping_rule> --dest-folder <dest_folder> [options]
```

## Parameters

- `-i`, `--source`: Specify a label source in the format `name=/path/to/folder`. Repeat the option once per source.
- `--map`: Specify a mapping rule in the format `<source_name>:<source_labels>-><target_label>`.
  - `<source_name>` must match one of the names defined in `--source`.
  - `<source_labels>` can be a single integer or a comma-separated list of integers.
  - Multiple `--map` rules are allowed. **Priority is determined by order**: the first rule that matches a voxel determines its value in the output.
- `-o`, `--dest-folder`: Destination folder for the combined label files.
- `--mp`: Enable multiprocessing.
- `--workers`: Number of worker processes (defaults to half of CPU cores).

## Mapping Priority and Logic

1. **Intersection**: Only `.mha` files whose filenames exist in **all** specified source folders will be processed.

2. **Validation**: For each file, the tool checks that the image size is identical and the spacing matches (within floating-point tolerance) across all sources. If a mismatch is found, the process will fail.

3. **Merging**:

   - The output label map is initialized to 0 (Background).
   - Rules are applied sequentially in the order they appear on the command line.
   - Once a voxel is assigned a non-zero value, it will not be overwritten by subsequent rules. This allows for clear priority management between overlapping sources.
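
The intersection step can be sketched with the standard library alone. This is a minimal illustration, not the tool's code: the helper name `common_filenames` and the temporary folder layout are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def common_filenames(folders: list[Path], pattern: str = "*.mha") -> list[str]:
    """Return the sorted file names that are present in every folder."""
    common: set[str] | None = None
    for folder in folders:
        names = {p.name for p in folder.glob(pattern)}
        # The first folder seeds the set; later folders narrow it down.
        common = names if common is None else common & names
    return sorted(common or ())


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: two sources that share only one case.
    layout = {
        "organs": ["case1.mha", "case2.mha"],
        "tumors": ["case2.mha", "case3.mha"],
    }
    folders = []
    for folder_name, file_names in layout.items():
        folder = Path(tmp) / folder_name
        folder.mkdir()
        for file_name in file_names:
            (folder / file_name).touch()
        folders.append(folder)

    shared = common_filenames(folders)
    print(shared)  # only the case present in both folders remains
```

Cases missing from any one source are silently skipped, so it can be worth comparing the number of processed files against the size of each source folder.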

## Example

Suppose you have:

- `Source A`: Organ segmentations (1: Liver, 2: Spleen)
- `Source B`: Tumor segmentations (1: Liver Tumor)

To combine them into a single map where Background=0, Liver=1, Spleen=2, and Liver Tumor=3 (with tumor taking priority over the organ label):

```bash
itk_combine \
    --source organs=/path/to/organs \
    --source tumors=/path/to/tumors \
    --map tumors:1->3 \
    --map organs:1->1 \
    --map organs:2->2 \
    --dest-folder /path/to/combined_output \
    --mp
```
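
To see why rule order matters, the same three rules can be applied to a toy array with NumPy. This is a sketch under assumptions: the `merge` helper and the four-voxel arrays are hypothetical, and only the first-match-wins behaviour described above is modelled.

```python
import numpy as np

# Toy 1x4 "images": the third voxel is liver in `organs` and tumor in `tumors`.
organs = np.array([[0, 1, 1, 2]], dtype=np.uint8)
tumors = np.array([[0, 0, 1, 0]], dtype=np.uint8)
sources = {"organs": organs, "tumors": tumors}


def merge(rules):
    """Apply (source_name, source_labels, target_label) rules in order."""
    output = np.zeros(organs.shape, dtype=np.uint8)
    for source_name, source_labels, target_label in rules:
        mask = np.isin(sources[source_name], source_labels)
        mask &= output == 0  # never overwrite a voxel that is already labelled
        output[mask] = target_label
    return output


tumor_first = merge([("tumors", (1,), 3), ("organs", (1,), 1), ("organs", (2,), 2)])
organs_first = merge([("organs", (1,), 1), ("organs", (2,), 2), ("tumors", (1,), 3)])

print(tumor_first.tolist())   # [[0, 1, 3, 2]]
print(organs_first.tolist())  # [[0, 1, 1, 2]]
```

With the tumor rule listed first, the overlapping voxel becomes 3. With the organ rules first, the liver label claims that voxel and the tumor is lost, which is why the higher-priority rule has to come earlier.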

## Output

- Combined label maps (normalized to `.mha` format and `uint8` data type, so target labels must fit in the range 0 to 255).
- `meta.json`: Standard ITKIT metadata file containing size, spacing, origin, and unique classes for each combined file.
214 changes: 214 additions & 0 deletions itkit/process/itk_combine.py
@@ -0,0 +1,214 @@
import argparse
import os
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import SimpleITK as sitk

from itkit.process.base_processor import BaseITKProcessor
from itkit.process.metadata_models import SeriesMetadata


@dataclass(frozen=True)
class SourceSpec:
    name: str
    folder: Path


@dataclass(frozen=True)
class MappingRule:
    source_name: str
    source_labels: tuple[int, ...]
    target_label: int


def _parse_sources(source_args: list[str]) -> list[SourceSpec]:
    sources: list[SourceSpec] = []
    seen_names: set[str] = set()
    for item in source_args:
        if "=" not in item:
            raise ValueError(f"Invalid source format: {item}. Expected name=/path/to/labels")
Copilot AI Jan 17, 2026

The error message is inconsistent with the parameter description in the documentation. It says "Expected name=/path/to/labels", while the documentation describes the format as `name=/path/to/folder`. Consider changing the message to "Expected name=/path/to/folder" so that it matches the documentation and help text.

Suggested change
-            raise ValueError(f"Invalid source format: {item}. Expected name=/path/to/labels")
+            raise ValueError(f"Invalid source format: {item}. Expected name=/path/to/folder")

        name, folder = item.split("=", 1)
        name = name.strip()
        if not name:
            raise ValueError(f"Invalid source name in: {item}")
        if name in seen_names:
            raise ValueError(f"Duplicate source name: {name}")
        folder_path = Path(folder).expanduser().resolve()
        if not folder_path.exists() or not folder_path.is_dir():
            raise ValueError(f"Source folder not found: {folder_path}")
        sources.append(SourceSpec(name=name, folder=folder_path))
        seen_names.add(name)
    return sources


def _parse_mapping_rule(rule: str) -> MappingRule:
    if "->" not in rule or ":" not in rule:
        raise ValueError(f"Invalid mapping rule: {rule}. Expected <source>:<src_labels>-><target>")
    left, target_str = rule.split("->", 1)
    source_name, labels_str = left.split(":", 1)
    source_name = source_name.strip()
    labels_str = labels_str.strip()
    target_str = target_str.strip()
    if not source_name or not labels_str or not target_str:
        raise ValueError(f"Invalid mapping rule: {rule}. Expected <source>:<src_labels>-><target>")

    try:
        target_label = int(target_str)
    except ValueError as exc:
        raise ValueError(f"Invalid target label in rule: {rule}") from exc

    label_parts = [p.strip() for p in labels_str.split(",") if p.strip()]
    if not label_parts:
        raise ValueError(f"No source labels specified in rule: {rule}")

    source_labels: list[int] = []
    for part in label_parts:
        try:
            source_labels.append(int(part))
        except ValueError as exc:
            raise ValueError(f"Invalid source label '{part}' in rule: {rule}") from exc

    return MappingRule(source_name=source_name, source_labels=tuple(source_labels), target_label=target_label)


class CombineProcessor(BaseITKProcessor):
    def __init__(
        self,
        sources: list[SourceSpec],
        dest_folder: Path,
        mapping_rules: list[MappingRule],
        mp: bool = False,
        workers: int | None = None,
    ):
        super().__init__(task_description="Combining labels", mp=mp, workers=workers)
        self.sources = sources
        self.dest_folder = dest_folder
        self.mapping_rules = mapping_rules
        self.source_index = {src.name: idx for idx, src in enumerate(self.sources)}

    def get_items_to_process(self) -> list[tuple[str, list[str]]]:
        source_files: dict[str, dict[str, str]] = {}
        for src in self.sources:
            files = {p.name: str(p) for p in src.folder.glob("*.mha")}
Copilot AI Jan 17, 2026

The file discovery only looks for .mha files, but the base class SUPPORTED_EXTENSIONS includes '.mha', '.mhd', '.nii', '.nii.gz', '.nrrd'. While the documentation mentions that output is normalized to .mha format, there's no mention that input files must also be .mha. Consider either:

  1. Using a broader glob pattern to support all SUPPORTED_EXTENSIONS for input files, or
  2. Clearly documenting in itk_combine.md that only .mha input files are supported.

This hardcoded limitation is inconsistent with other processors in the codebase that typically support multiple input formats.

Suggested change
-            files = {p.name: str(p) for p in src.folder.glob("*.mha")}
+            files: dict[str, str] = {}
+            for ext in self.SUPPORTED_EXTENSIONS:
+                pattern = f"*{ext}" if ext.startswith(".") else f"*.{ext}"
+                for p in src.folder.glob(pattern):
+                    files[p.name] = str(p)

            source_files[src.name] = files

        common_names = None
        for files in source_files.values():
            names = set(files.keys())
            common_names = names if common_names is None else common_names & names
        if not common_names:
            return []

        items = []
        for name in sorted(common_names):
            paths = [source_files[src.name][name] for src in self.sources]
            items.append((name, paths))
        return items

    def process_one(self, args: tuple[str, list[str]]) -> SeriesMetadata | None:
        name, paths = args
        images = [sitk.ReadImage(p) for p in paths]
        base_size = images[0].GetSize()
        base_spacing = images[0].GetSpacing()

        for idx, image in enumerate(images[1:], start=1):
            if image.GetSize() != base_size:
                raise ValueError(f"Size mismatch for {name}: {paths[0]} vs {paths[idx]}")
            if not np.allclose(image.GetSpacing(), base_spacing):
                raise ValueError(f"Spacing mismatch for {name}: {paths[0]} vs {paths[idx]}")

        arrays = [sitk.GetArrayFromImage(img) for img in images]
        output = np.zeros(arrays[0].shape, dtype=np.uint8)

        for rule in self.mapping_rules:
            src_idx = self.source_index[rule.source_name]
            src_arr = arrays[src_idx]
            mask = np.isin(src_arr, rule.source_labels)
            mask = mask & (output == 0)
            output[mask] = rule.target_label

        out_image = sitk.GetImageFromArray(output)
        out_image.CopyInformation(images[0])

        output_path = self.dest_folder / name
        output_path.parent.mkdir(parents=True, exist_ok=True)
        sitk.WriteImage(out_image, str(output_path), useCompression=True)

        return SeriesMetadata.from_sitk_image(out_image, name)


def parse_args():
    parser = argparse.ArgumentParser(
        prog="itk_combine",
        description=(
            "Combine multiple label folders by intersecting filenames and merging labels "
            "according to ordered mapping rules."
        ),
    )
    parser.add_argument(
        "-i", "--source",
        action="append",
        required=True,
        help="Label source in form name=/path/to/label_folder (repeatable)",
    )
    parser.add_argument(
        "--map",
        dest="mapping_rules",
        action="append",
        required=True,
        help="Mapping rule in form `<source>:<src_labels>-><target>`, e.g., `A:1,2->3` (repeatable)",
    )
    parser.add_argument(
        "-o", "--dest-folder",
        type=Path,
        # Required: main() calls args.dest_folder.expanduser(), which fails on None.
        required=True,
        help="Destination folder for combined labels",
    )
    parser.add_argument("--mp", action="store_true", help="Enable multiprocessing")
    parser.add_argument("--workers", type=int, default=None, help="Number of worker processes")
    return parser.parse_args()


def main():
    args = parse_args()

    sources = _parse_sources(args.source)
    rules = [_parse_mapping_rule(rule) for rule in args.mapping_rules]

    source_names = {s.name for s in sources}
    for rule in rules:
        if rule.source_name not in source_names:
            raise ValueError(f"Mapping rule references unknown source: {rule.source_name}")

    if not rules:
        raise ValueError("At least one mapping rule is required.")

Comment on lines +184 to +186
Copilot AI Jan 17, 2026

The check for empty rules is redundant because the --map argument is marked as required=True in the argument parser. This validation can never be reached since argparse will fail earlier if no --map arguments are provided. Consider removing this redundant check.

Suggested change
-    if not rules:
-        raise ValueError("At least one mapping rule is required.")
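
The redundancy claim can be checked in isolation. The following is a minimal sketch that rebuilds only the `--map` argument from `parse_args` above; it is not the full parser, and the `exited` and `exit_code` names are introduced just for the demonstration.

```python
import argparse

# Only the --map argument, with the same settings as in parse_args().
parser = argparse.ArgumentParser(prog="itk_combine")
parser.add_argument("--map", dest="mapping_rules", action="append", required=True)

# Repeated --map options are collected into a list, in command-line order.
args = parser.parse_args(["--map", "A:1->1", "--map", "B:1->2"])
print(args.mapping_rules)

# Without any --map, argparse prints a usage error and raises SystemExit,
# so code after parse_args() never sees an empty rule list.
try:
    parser.parse_args([])
    exited = False
    exit_code = None
except SystemExit as exc:
    exited = True
    exit_code = exc.code

print(exited, exit_code)
```

Because the parser exits before `main()` continues, `rules` always holds at least one entry by the time the `if not rules` check runs.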

    dest_folder = args.dest_folder.expanduser().resolve()
    os.makedirs(dest_folder, exist_ok=True)

    print("Combining label sources:")
    for src in sources:
        print(f" - {src.name}: {src.folder}")
    print("Mapping rules (ordered, earlier has higher priority):")
    for rule in rules:
        print(f" - {rule.source_name}: {list(rule.source_labels)} -> {rule.target_label}")
    print(f"Output: {dest_folder}")
    print(f"Multiprocessing: {args.mp} | Workers: {args.workers}")

    processor = CombineProcessor(
        sources=sources,
        dest_folder=dest_folder,
        mapping_rules=rules,
        mp=args.mp,
        workers=args.workers,
    )

    processor.process("Combining labels")
    processor.save_meta(dest_folder / "meta.json")

    print("Combine completed.")


if __name__ == "__main__":
    main()
1 change: 1 addition & 0 deletions pyproject.toml
@@ -103,6 +103,7 @@ itk_aug = "itkit.process.itk_aug:main"
itk_extract = "itkit.process.itk_extract:main"
itk_convert = "itkit.process.itk_convert:main"
itk_evaluate = "itkit.process.itk_evaluate:main"
itk_combine = "itkit.process.itk_combine:main"
itkit-app = "itkit.gui.app:main"

[tool.setuptools.packages.find]