A deep learning project for automatic cell nuclei and border segmentation in FLAME microscopy images. Built on the Jarvis framework for high-dimensional biomedical data management and model training.
This project implements a deeply-supervised U-Net segmentation pipeline for cell instance separation in fluorescence microscopy images. It addresses the classic "touching cells" problem by learning explicit cell boundaries through a border-aware label encoding and a multi-task training strategy.
Border-Aware Label Encoding: Instead of simple binary masks, labels encode three distinct regions around each cell:
- Cell nuclei (label 1): The cell interior
- Inner border (label 2): 1 pixel inside the cell boundary
- Outer border (label 3): 4 pixels outside into the background
This encoding enables training two complementary models:
- Cell model: Segments full cell bodies (labels 1+2) for morphology analysis
- Edge model: Detects boundaries (labels 2+3) for separating touching cells
Deep Supervision: The U-Net decoder provides intermediate supervision at multiple resolutions, improving gradient flow and boundary localization.
Hybrid Loss with Custom Weighting:
- Focal Loss (gamma-tunable): Handles extreme class imbalance (background dominates)
- Soft Dice Loss: Optimizes region overlap directly
- Custom sample weights: Up-weight hard examples and border regions during training
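As a rough illustration of how the hybrid loss fits together, the sketch below combines a binary focal loss and a soft Dice loss with equal weight. It is written in plain NumPy for readability and is not the project's training code; the function names (`focal_loss`, `soft_dice_loss`, `hybrid_loss`) and the per-pixel `weights` argument are assumptions made for this example.

```python
import numpy as np

def focal_loss(y_true, p, gamma=2.0, weights=None, eps=1e-7):
    """Binary focal loss, averaged over pixels with optional per-pixel weights."""
    p = np.clip(p, eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true class of each pixel
    p_t = np.where(y_true == 1, p, 1.0 - p)
    # (1 - p_t)^gamma down-weights pixels that are already classified well
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if weights is None:
        weights = np.ones_like(loss)
    return float((loss * weights).sum() / weights.sum())

def soft_dice_loss(y_true, p, eps=1e-7):
    """1 - soft Dice overlap between the target mask and predicted probabilities."""
    inter = (y_true * p).sum()
    return float(1.0 - (2.0 * inter + eps) / (y_true.sum() + p.sum() + eps))

def hybrid_loss(y_true, p, gamma=2.0, weights=None):
    """Sum of focal and soft Dice terms, each with weight 1.0."""
    return focal_loss(y_true, p, gamma, weights) + soft_dice_loss(y_true, p)

# Toy 2x2 target and a confident, mostly correct prediction
y = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.99, 0.01], [0.01, 0.99]])
loss = hybrid_loss(y, p, gamma=2.0)
```

Raising `gamma` shrinks the contribution of easy pixels further, which is why it appears as a sweep variable in the configs.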
Multi-Window Input: Raw images are preprocessed into 11-channel histogram-equalized representations, capturing intensity information across multiple contrast windows.
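The exact window definitions are not given here, so the following is only a sketch of the general idea: clip the image to several intensity windows, equalize each one, and stack the results as channels. The percentile-based windows, the use of global (rather than adaptive) equalization, and the names `equalize` and `multi_window` are all assumptions for illustration.

```python
import numpy as np

def equalize(img, bins=256):
    """Global histogram equalization: map intensities to [0, 1] via the empirical CDF."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    cdf = hist.cumsum() / hist.sum()
    return np.interp(img.ravel(), edges[1:], cdf).reshape(img.shape)

def multi_window(img, n_windows=11, lo_pct=50.0, hi_pct=100.0):
    """Stack equalized copies of `img`, each clipped to a different upper intensity bound."""
    lower = float(img.min())
    # Hypothetical window choice: upper bounds spaced between two percentiles
    uppers = np.percentile(img, np.linspace(lo_pct, hi_pct, n_windows))
    channels = [
        equalize(np.clip(img, lower, max(float(u), lower + 1e-6)))
        for u in uppers
    ]
    return np.stack(channels, axis=-1)  # shape (H, W, n_windows)

rng = np.random.default_rng(0)
image = rng.random((32, 32))
stacked = multi_window(image)  # shape (32, 32, 11)
```

Narrow windows stretch contrast among dim pixels, while wide windows preserve detail in bright regions, so the network sees both at once.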

    flame/
    ├── main.py              # Prediction entry point
    ├── configs-*.yml        # Experiment configurations
    ├── NOTES.md             # Development notes
    ├── prep/
    │   └── prepare.py       # Data preparation and preprocessing
    ├── comp/
    │   ├── ymls/            # Database definitions
    │   ├── defs/            # Custom transformation definitions
    │   └── pred/            # Prediction pipeline configs
    ├── data/
    │   ├── csvs/            # Metadata CSVs
    │   └── ymls/            # Data source definitions
    └── exps/
        ├── base/            # Initial experiments (cell nuclei)
        ├── edge/            # Edge detection experiments
        ├── v00-v05/         # Progressive dataset versions
        │   ├── jmodels/     # Trained models and hyperparameter search
        │   └── csvs/        # Results and statistics

Input data consists of:
- Raw images: TIFF files (1200x1200 or 256x256 patches)
- Ground truth: PNG masks with annotations
- Sample weights: Background region masks
The data preparation pipeline (prep/prepare.py) creates HDF5 arrays:
| Array | Description | Shape | Key |
|---|---|---|---|
| dat | Raw normalized image | (1, 1200, 1200) or (1, 256, 256) | dat-raw |
| hst | Multi-window histogram-equalized image | (1, H, W, 11) | hst-raw |
| lbl | Segmentation labels: 1=nuclei, 2=inner border, 3=outer border | (1, H, W) | lbl-raw |
| dst | Signed distance transform (negative for nuclei, positive for edges) | (1, H, W) | dst-raw |

| Version | Description | Status |
|---|---|---|
| raw | Initial accumulated training set | Failed - poor annotations |
| v00 | First improved dataset | Superseded |
| v01 | Cell nuclei + border labels | Superseded |
| v02 | Additional data (06-25 batch) | Superseded |
| v03 | TrainSetNew data | Superseded |
| v04 | 256x256 patches with adaptive histogram | Superseded |
| v05 | Latest: 256x256 patches, adaptive CLAHE | Active |

Convert raw TIFF images to HDF5 format:

    from prep.prepare import create_v01, create_hdr, join_hdr

    # Create v05 dataset with CLAHE normalization
    create_v01(
        pattern='/data/raw/flame/zips/TrainSetNew/Raw/*.tif',
        suffix='v05',
        patch_size=(1, 256, 256),
        method='adapthist'
    )

    # Create and attach metadata header
    create_hdr(v='v05', csv='./csvs/meta.csv')
    join_hdr(v='v05')

Training uses the jarvis auto CLI workflow. The pipeline consists of: (1) creating the database, (2) generating hyperparameter sweeps, and (3) running training.

Generate the training database from the db.query paths in your config:

    jarvis auto configs --file configs-v05.yml --only db

This creates:
- data/ymls/db-v05.yml: Database definition with sform patterns
- data/csvs/db-v05.csv.gz: Header with cohorts, folds, and statistics

Prefix config values with $ to create sweep variables, then generate permutations:

    jarvis auto configs --file configs-v05.yml --only sweep

This creates experiment directories under exps/v05/:
- jmodels/csvs/hyper.csv: Central hyperparameter registry
- jmodels/jobs/exp-*.sh: Launch scripts for each combination
- exp-XXXX-X/configs.yml: Specific config per experiment

Execute the generated shell script for an experiment:

    bash exps/v05/jmodels/jobs/exp-XXXXX-0.sh

Or run manually:

    JARVIS_AUTO_CONFIGS=exps/v05/exp-XXXXX-0/configs.yml \
    python exps/v05/jmodels/model.py

Run all steps (DB creation + sweep generation) together:

    jarvis auto configs --file configs-v05.yml

If the jarvis auto CLI commands fail, you can generate experiment configs manually:

Step 1: Create the template config

Start with the template (e.g., configs-v05.yml), which contains $-prefixed sweep variables:

    mapped:
      $y: [cell]  # Sweep variable: cell or edge
      $w: [cell]
    models:
      training:
        configs:
          models:
            losses:
              functions:
                - name: foc
                  $gamma: [2.0, 2.5, 3.0]  # Sweep variable

Step 2: Instantiate configs manually

Copy the template and replace $ variables with concrete values. For each experiment combination:

    # Create experiment directory
    mkdir -p exps/v05/exp-myexp-0
    cp configs-v05.yml exps/v05/exp-myexp-0/configs.yml

Edit the instantiated config to resolve sweep variables:

| Template ($) | Instantiated |
|---|---|
| $y: [cell] | y: cell |
| $gamma: [2.0, 2.5, 3.0] | gamma: 2.0 |

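The manual instantiation step can also be scripted. The sketch below walks a nested config, collects every $-prefixed key, and emits one resolved config per combination of options. It is a minimal illustration under the assumption that a config is a plain nested dict/list structure; the helper names (`find_sweeps`, `instantiate`) and the trimmed-down template are hypothetical, and the jarvis CLI may resolve sweeps differently.

```python
import copy
import itertools

def find_sweeps(node, path=()):
    """Yield (path, key, options) for every $-prefixed key in a nested dict/list."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("$"):
                yield path, key, value
            else:
                yield from find_sweeps(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_sweeps(item, path + (index,))

def instantiate(template):
    """Return one resolved config per combination of sweep options."""
    sweeps = list(find_sweeps(template))
    configs = []
    for combo in itertools.product(*(options for _, _, options in sweeps)):
        cfg = copy.deepcopy(template)
        for (path, key, _), value in zip(sweeps, combo):
            parent = cfg
            for step in path:
                parent = parent[step]
            del parent[key]            # drop "$gamma: [...]"
            parent[key[1:]] = value    # add "gamma: 2.0"
        configs.append(cfg)
    return configs

# Simplified stand-in for the template shown in Step 1
template = {
    "mapped": {"$y": ["cell"], "$w": ["cell"]},
    "losses": [{"name": "foc", "$gamma": [2.0, 2.5, 3.0]}],
}
configs = instantiate(template)  # three configs, one per gamma value
```

Each resulting dict could then be written to its own exps/v05/exp-*/configs.yml with a YAML library.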
Step 3: Register in hyper.csv

Add your experiment to the registry (exps/v05/jmodels/csvs/hyper.csv):

    project_id,output_dir,sampling,gamma,y,w,wgt-edge
    flame,{root}/exps/v05/exp-myexp-0,5-class,2.0,cell,cell,2

Step 4: Ensure DB exists

If database files don't exist, manually create them by copying from a working experiment:

    cp exps/v05/exp-6Y3cMLLv-5/configs.yml exps/v05/exp-myexp-0/configs.yml
    # Edit to change paths and resolve sweep variables

Then update the db section with your data paths and run training directly:

    JARVIS_AUTO_CONFIGS=exps/v05/exp-myexp-0/configs.yml \
    python exps/v05/jmodels/model.py

Note: The key difference:
- Template configs (configs-*.yml) contain $ sweep variables and a search: section
- Instantiated configs (exps/*/exp-*/configs.yml) have resolved values and are ready for training

Reference experiments with working instantiated configs:
- exps/v05/exp-6Y3cMLLv-5/configs.yml: Cell mode (gamma=2.0)
- exps/v05/exp-mteCRq1Y-5/configs.yml: Edge mode (gamma=2.0, wgt-edge=2)

| Section | Purpose |
|---|---|
| db.query | Data sources (HDF5 paths) |
| db.setup | Database initialization (stats, cohorts, folds) |
| client | Data loading (batch size, sampling, specs) |
| xforms | Augmentation (rotation, scaling, intensity) |
| mapped | Label remapping strategies |
| models | Model architecture (U-Net backbone, losses, optimizer) |
| search | Hyperparameter sweep variables (prefix with $) |

Run inference on new TIFF images:

    python main.py --data /path/to/tiffs

The prediction pipeline:
- Loads images from {data}/*/*.tif
- Applies histogram equalization preprocessing
- Runs ensemble inference using trained models
- Outputs masks, region proposals (RPN), and false positive predictions (FPR)
- Backbone: Encoder-decoder U-Net pattern
- Input: Multi-window histogram equalized images (11 channels)
- Output: Binary or multi-class segmentation maps
- Normalization: BatchNorm + GELU (v05) or LayerNorm + LeakyReLU (base)
| Loss | Weight | Description |
|---|---|---|
| Focal Loss | 1.0 | Handles class imbalance |
| Soft Dice | 1.0 | Region overlap optimization |
- Dice Score (DSC): Segmentation overlap quality
- Hausdorff Distance (H95): Boundary accuracy
- Sensitivity (SEN): True positive rate
- PPV: Positive predictive value
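For reference, the overlap-based metrics above reduce to simple counts of true positives, false positives, and false negatives. The sketch below computes DSC, SEN, and PPV for a pair of binary masks; the function name `overlap_metrics` is an assumption for this example, and H95 is omitted because it needs a surface-distance computation.

```python
import numpy as np

def overlap_metrics(pred, true):
    """Dice, sensitivity, and PPV for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    true = np.asarray(true, dtype=bool)
    tp = int(np.logical_and(pred, true).sum())    # predicted and present
    fp = int(np.logical_and(pred, ~true).sum())   # predicted but absent
    fn = int(np.logical_and(~pred, true).sum())   # present but missed

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positives at all)
        return num / den if den > 0 else float("nan")

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "sen": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
    }

# Prediction finds two of three true pixels and makes no false positives
metrics = overlap_metrics(pred=[1, 1, 0, 0], true=[1, 1, 1, 0])
```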
The search: section in configs defines sweeps over:
| Parameter | Options | Description |
|---|---|---|
| sampling | 5-class stratified | Cohort-based sampling weights |
| gamma | 2.0, 2.5, 3.0 | Focal loss focusing parameter |
| y | cell, edge | Target remapping (nuclei vs border) |
| w | cell, edge | Sample weight strategies |
| wgt-edge | 2, 5, 10 | Edge class weight |

Results are saved to exps/{version}/jmodels/csvs/hyper.csv.
Initial experiments using accumulated training set. Failed due to poor annotation quality in source data.
- Binary segmentation: nuclei vs background
- Distance transform for edge-aware training
- Multi-class segmentation
- Class 1: Cell nuclei
- Class 2: Inner border (1 pixel)
- Class 3: Outer border (4 pixels)
- Classes 4-5: Additional cohorts for sampling
- Image size: 256x256 patches
- Preprocessing: Adaptive histogram equalization (CLAHE)
- Input channels: 11 multi-window intensity channels
- Batch size: 128
- Augmentation: Random affine (scale 0.8-1.2, rotation ±0.5 rad)
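To make the augmentation range concrete, the sketch below draws one random affine transform with the scale and rotation limits listed above and builds its 3x3 homogeneous matrix about the patch center. Isotropic scaling, rotation about the center, and the name `random_affine` are assumptions for illustration; the actual transform is defined by the xforms config section.

```python
import numpy as np

def random_affine(rng, scale_range=(0.8, 1.2), max_rot=0.5, center=(128.0, 128.0)):
    """Sample a 3x3 affine matrix: isotropic scale plus rotation (radians) about `center`."""
    scale = rng.uniform(*scale_range)
    theta = rng.uniform(-max_rot, max_rot)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    linear = scale * np.array([[cos_t, -sin_t], [sin_t, cos_t]])
    center = np.asarray(center, dtype=float)
    # Pick the translation so that the center pixel maps onto itself
    offset = center - linear @ center
    matrix = np.eye(3)
    matrix[:2, :2] = linear
    matrix[:2, 2] = offset
    return matrix

rng = np.random.default_rng(0)
matrix = random_affine(rng)  # one transform for a 256x256 patch
```

The same matrix would need to be applied to the image, label, and weight arrays, with nearest-neighbor interpolation for labels so that class values are not blended.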
Labels are generated using a signed distance transform to create border regions around cell nuclei:
| Label | Value | Distance from Edge | Description |
|---|---|---|---|
| Background | 0 | >4 px outside | Extra-cellular space |
| Nuclei | 1 | Original mask | Cell interior |
| Inner Border | 2 | [-1, 0] px | 1 pixel inside cell boundary |
| Outer Border | 3 | (0, 4] px | 4 pixels outside cell boundary |
Generation logic (prep/prepare.py):

    from scipy import ndimage

    # Signed distance transform (negative inside nuclei, positive outside)
    dst = ndimage.distance_transform_edt(1 - lbl)  # distance to nuclei
    dst[lbl == 1] = ndimage.distance_transform_edt(lbl)[lbl == 1] * -1  # negative inside

    # Create border classes
    lbl[(dst <= 0) & (dst >= -1)] = 2  # inner border: 1px into nuclei
    lbl[(dst <= 4) & (dst > 0)] = 3    # outer border: 4px into background

The border classes enable two complementary segmentation strategies via hyperparameter sweep:

| Mode | Target Remapping (y) | Weight Remapping (w) | Use Case |
|---|---|---|---|
| cell | lbl {1,2} → 1 (nuclei + inner border) | All pixels = 1 | Segment full cell bodies for counting/measurement |
| edge | lbl {2,3} → 1 (inner + outer border) | Borders = 2, rest = 1 | Detect cell boundaries for separating touching cells |

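The two remapping modes in the table can be expressed in a few lines of NumPy. This is a sketch of the mapping only, not the project's mapped config logic; the function name `remap` and the `wgt_edge` argument (mirroring the wgt-edge sweep variable) are assumptions.

```python
import numpy as np

def remap(lbl, mode, wgt_edge=2):
    """Return (target, weights) for a label map with classes 0-3."""
    lbl = np.asarray(lbl)
    if mode == "cell":
        # Nuclei + inner border become the positive class; uniform weights
        y = np.isin(lbl, (1, 2)).astype(np.uint8)
        w = np.ones(lbl.shape, dtype=np.float32)
    elif mode == "edge":
        # Inner + outer border become the positive class; borders are up-weighted
        y = np.isin(lbl, (2, 3)).astype(np.uint8)
        w = np.where(y == 1, float(wgt_edge), 1.0).astype(np.float32)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return y, w

# One row crossing background -> outer border -> inner border -> nuclei
row = np.array([[0, 3, 2, 1]])
y_cell, w_cell = remap(row, "cell")
y_edge, w_edge = remap(row, "edge")
```

Note that the inner border (label 2) is positive in both modes, so the two targets overlap by one pixel along every cell boundary.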
Why this design?
- Touching cells problem: Cells in microscopy often touch; simple binary segmentation merges them
- Cell model: Identifies cell regions (good for morphology analysis)
- Edge model: Identifies boundaries (good for instance separation)
- Ensemble inference: Combines both predictions for accurate instance segmentation
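How the two predictions are combined is not spelled out here, so the following is only one plausible sketch: subtract the predicted edges from the predicted cell mask and label the remaining connected cores. The function name `separate_instances`, the 0.5 threshold, and the subtraction rule are assumptions for illustration.

```python
import numpy as np
from scipy import ndimage

def separate_instances(cell_prob, edge_prob, threshold=0.5):
    """Label connected cell cores after removing predicted boundary pixels."""
    cell_mask = cell_prob > threshold
    edge_mask = edge_prob > threshold
    cores = cell_mask & ~edge_mask
    # ndimage.label uses 4-connectivity for 2D inputs by default
    instances, count = ndimage.label(cores)
    return instances, count

# Two touching cells that the cell model merges into a single blob
cell_prob = np.zeros((5, 9))
cell_prob[1:4, 1:8] = 1.0
# The edge model marks the shared boundary between them
edge_prob = np.zeros((5, 9))
edge_prob[:, 4] = 1.0

instances, count = separate_instances(cell_prob, edge_prob)
merged_count = ndimage.label(cell_prob > 0.5)[1]
```

On this toy input the cell mask alone forms one component, while removing the edge column splits it into two. A full pipeline would typically grow each core back to the cell extent afterwards.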
| File | Purpose |
|---|---|
| main.py | Prediction script entry point |
| prep/prepare.py | Data preparation functions |
| comp/ymls/db-*.yml | Database definitions per version |
| configs-v05.yml | Active training configuration |
| NOTES.md | Development history and notes |

This project depends on jarvis-md, which provides all core deep learning and data management functionality.
| Package | Version | Purpose |
|---|---|---|
| Python | >=3.13 | Language runtime |
| TensorFlow | >=2.20.0 | Deep learning framework |
| tf-keras | >=2.20.1 | Keras API |
| NumPy | any | Array operations |
| SciPy | any | Image processing (ndimage) |
| scikit-image | any | Image I/O and preprocessing |
| Pandas | any | Data manipulation |
| h5py | any | HDF5 array storage |
| PyYAML | >=5.2 | Configuration files |
| matplotlib | any | Visualization |
The FLAME project adds no additional Python dependencies beyond jarvis-md. All imports (jarvis.*, tensorflow, numpy, scipy, skimage, pandas) are covered by the base jarvis-md package.
Since FLAME has no additional dependencies beyond jarvis-md, use the jarvis-md virtual environment directly:

    # Navigate to jarvis-md repo
    cd <path-to-jarvis-md>

    # Create venv if it doesn't exist
    uv venv .venv

    # Install jarvis-md with all dependencies
    uv pip install -e ".[all]"

    # Use this venv for FLAME work
    cd <path-to-flame>
    source <path-to-jarvis-md>/.venv/bin/activate

If you need isolation between projects:

    cd <path-to-flame>

    # Create local venv
    uv venv .venv
    source .venv/bin/activate

    # Install jarvis-md as editable dependency
    uv pip install -e <path-to-jarvis-md>

Test your setup:

    python -c "import tensorflow as tf; print(f'TF: {tf.__version__}')"
    python -c "from jarvis.utils.db import DB; print('Jarvis: OK')"
    python -c "from prep.prepare import create_v01; print('FLAME: OK')"