FLAME: Cell Segmentation

A deep learning project for automatic cell nuclei and border segmentation in FLAME microscopy images. It is built on the Jarvis framework for high-dimensional biomedical data management and model training.

Overview

This project implements a deeply supervised U-Net segmentation pipeline for separating cell instances in fluorescence microscopy images. It addresses the classic "touching cells" problem by learning cell boundaries explicitly, using a three-region border encoding and separately trained cell and edge models.

Key Innovations

Border-Aware Label Encoding: Instead of simple binary masks, labels encode three distinct regions around each cell:

Cell nuclei (label 1): The cell interior
Inner border (label 2): 1 pixel inside the cell boundary
Outer border (label 3): 4 pixels outside into the background

This encoding enables training two complementary models:

Cell model: Segments full cell bodies (labels 1+2) for morphology analysis
Edge model: Detects boundaries (labels 2+3) for separating touching cells

Deep Supervision: The U-Net decoder provides intermediate supervision at multiple resolutions, improving gradient flow and boundary localization.

Hybrid Loss with Custom Weighting:

Focal Loss (gamma-tunable): Handles extreme class imbalance (background dominates)
Soft Dice Loss: Optimizes region overlap directly
Custom sample weights: Up-weight hard examples and border regions during training
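
The hybrid loss above can be illustrated with a minimal NumPy sketch for the binary case. This is not the project's TensorFlow implementation: the function names (focal_loss, soft_dice_loss, hybrid_loss) and the optional per-pixel weights argument are assumptions made for illustration, and both terms are summed with weight 1.0.

```python
import numpy as np

def focal_loss(y_true, p, gamma=2.0, weights=None, eps=1e-7):
    """Binary focal loss, averaged over pixels with optional per-pixel weights."""
    p = np.clip(p, eps, 1.0 - eps)
    # p_t is the probability the model assigns to the true class of each pixel
    p_t = np.where(y_true == 1, p, 1.0 - p)
    # (1 - p_t)^gamma shrinks the contribution of well-classified pixels
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if weights is None:
        weights = np.ones_like(loss)
    return float((loss * weights).sum() / weights.sum())

def soft_dice_loss(y_true, p, eps=1e-7):
    """1 - soft Dice coefficient, computed directly on probabilities."""
    intersection = (y_true * p).sum()
    return float(1.0 - (2.0 * intersection + eps) / (y_true.sum() + p.sum() + eps))

def hybrid_loss(y_true, p, gamma=2.0, weights=None):
    """Sum of focal and soft Dice terms, each with weight 1.0."""
    return focal_loss(y_true, p, gamma, weights) + soft_dice_loss(y_true, p)

# Toy example: one confident-correct and one confident-wrong prediction
y = np.array([[0, 0, 1, 1]])
good = np.array([[0.01, 0.01, 0.99, 0.99]])
bad = np.array([[0.90, 0.90, 0.10, 0.10]])
print(hybrid_loss(y, good), hybrid_loss(y, bad))
```

With gamma set to 0 the focal term reduces to ordinary binary cross-entropy; larger gamma values focus training on the pixels the model still gets wrong.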

Multi-Window Input: Raw images are preprocessed into 11-channel histogram-equalized representations, capturing intensity information across multiple contrast windows.
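
As a rough illustration of the multi-window idea, the sketch below stacks 11 equalized copies of an image, each clipped to a different upper intensity bound. The window definition (upper quantiles from 0.5 to 1.0) and the use of global rather than adaptive equalization are assumptions; the actual windows used in prep/prepare.py are not described here and may differ.

```python
import numpy as np

def equalize(img):
    """Global histogram equalization via the empirical CDF; output lies in (0, 1]."""
    flat = img.ravel()
    order = np.sort(flat)
    # The rank of each pixel among the sorted intensities approximates the CDF
    cdf = np.searchsorted(order, flat, side="right") / flat.size
    return cdf.reshape(img.shape)

def multi_window(img, n_windows=11):
    """Stack equalized copies of img, one per (hypothetical) contrast window."""
    lo = img.min()
    # Assumed windows: upper clip bounds at evenly spaced quantiles
    uppers = np.quantile(img, np.linspace(0.5, 1.0, n_windows))
    channels = [equalize(np.clip(img, lo, hi)) for hi in uppers]
    return np.stack(channels, axis=-1)

rng = np.random.default_rng(0)
raw = rng.gamma(shape=2.0, scale=50.0, size=(64, 64))
hst = multi_window(raw)
print(hst.shape)
```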

Project Structure

flame/
├── main.py                  # Prediction entry point
├── configs-*.yml            # Experiment configurations
├── NOTES.md                 # Development notes
├── prep/
│   └── prepare.py           # Data preparation and preprocessing
├── comp/
│   ├── ymls/                # Database definitions
│   ├── defs/                # Custom transformation definitions
│   └── pred/                # Prediction pipeline configs
├── data/
│   ├── csvs/                # Metadata CSVs
│   └── ymls/                # Data source definitions
└── exps/
    ├── base/                # Initial experiments (cell nuclei)
    ├── edge/                # Edge detection experiments
    ├── v00-v05/             # Progressive dataset versions
    │   ├── jmodels/         # Trained models and hyperparameter search
    │   └── csvs/            # Results and statistics

Data Pipeline

Raw Data Format

Input data consists of:

Raw images: TIFF files (1200x1200 or 256x256 patches)
Ground truth: PNG masks with annotations
Sample weights: Background region masks

Preprocessed Arrays

The data preparation pipeline (prep/prepare.py) creates HDF5 arrays:

Array	Description	Shape	Key
`dat`	Raw normalized image	(1, 1200, 1200) or (1, 256, 256)	dat-raw
`hst`	Multi-window histogram-equalized image	(1, H, W, 11)	hst-raw
`lbl`	Segmentation labels: 1=nuclei, 2=inner border, 3=outer border	(1, H, W)	lbl-raw
`dst`	Signed distance transform (negative inside nuclei, positive outside)	(1, H, W)	dst-raw

Data Versions

Version	Description	Status
`raw`	Initial accumulated training set	Failed - poor annotations
`v00`	First improved dataset	Superseded
`v01`	Cell nuclei + border labels	Superseded
`v02`	Additional data (06-25 batch)	Superseded
`v03`	TrainSetNew data	Superseded
`v04`	256x256 patches with adaptive histogram	Superseded
`v05`	Latest: 256x256 patches, adaptive CLAHE	Active

Usage

1. Data Preparation

Convert raw TIFF images to HDF5 format:

from prep.prepare import create_v01, create_hdr, join_hdr

# Create v05 dataset with CLAHE normalization
create_v01(
    pattern='/data/raw/flame/zips/TrainSetNew/Raw/*.tif',
    suffix='v05',
    patch_size=(1, 256, 256),
    method='adapthist'
)

# Create and attach metadata header
create_hdr(v='v05', csv='./csvs/meta.csv')
join_hdr(v='v05')

2. Training

Training uses the jarvis auto CLI workflow. The pipeline consists of: (1) creating the database, (2) generating hyperparameter sweeps, and (3) running training.

Step 2a: Create Database

Generate the training database from the db.query paths in your config:

jarvis auto configs --file configs-v05.yml --only db

This creates:

data/ymls/db-v05.yml — Database definition with sform patterns
data/csvs/db-v05.csv.gz — Header with cohorts, folds, and statistics

Step 2b: Generate Hyperparameter Sweep

Prefix config values with $ to create sweep variables, then generate permutations:

jarvis auto configs --file configs-v05.yml --only sweep

This creates experiment directories under exps/v05/:

jmodels/csvs/hyper.csv — Central hyperparameter registry
jmodels/jobs/exp-*.sh — Launch scripts for each combination
exp-XXXX-X/configs.yml — Specific config per experiment

Step 2c: Run Training

Execute the generated shell script for an experiment:

bash exps/v05/jmodels/jobs/exp-XXXXX-0.sh

Or run manually:

JARVIS_AUTO_CONFIGS=exps/v05/exp-XXXXX-0/configs.yml \
    python exps/v05/jmodels/model.py

Full Pipeline (One Command)

Run all steps (DB creation + sweep generation) together:

jarvis auto configs --file configs-v05.yml

Fallback: Manual Config Generation

If jarvis auto CLI commands fail, you can generate experimental configs manually:

Step 1: Create the template config

Start with the template (e.g., configs-v05.yml), which contains $-prefixed sweep variables:

mapped:
  $y: [cell]          # Sweep variable: cell or edge
  $w: [cell]

models:
  training:
    configs:
      models:
        losses:
          functions:
          - name: foc
            $gamma: [2.0, 2.5, 3.0]   # Sweep variable

Step 2: Instantiate configs manually

Copy the template and replace $ variables with concrete values. For each experiment combination:

# Create experiment directory
mkdir -p exps/v05/exp-myexp-0
cp configs-v05.yml exps/v05/exp-myexp-0/configs.yml

Edit the instantiated config to resolve sweep variables:

Template (`$`)	Instantiated
`$y: [cell]`	`y: cell`
`$gamma: [2.0, 2.5, 3.0]`	`gamma: 2.0`
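
When there are many combinations, the instantiation step can be scripted. The following is a minimal sketch, not the Jarvis implementation: find_sweeps and expand are hypothetical helpers that walk a config already loaded as a nested dict, and they produce one resolved copy per combination of `$`-prefixed options.

```python
import copy
import itertools

def find_sweeps(node, path=()):
    """Yield (path, key, options) for every `$`-prefixed key in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith("$"):
                yield path, key, value
            else:
                yield from find_sweeps(value, path + (key,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_sweeps(item, path + (i,))

def expand(template):
    """Return one resolved config per combination of sweep options."""
    sweeps = list(find_sweeps(template))
    configs = []
    for combo in itertools.product(*(options for _, _, options in sweeps)):
        config = copy.deepcopy(template)
        for (path, key, _), choice in zip(sweeps, combo):
            parent = config
            for step in path:
                parent = parent[step]
            # Replace `$name: [options]` with `name: choice`
            del parent[key]
            parent[key[1:]] = choice
        configs.append(config)
    return configs

# Python dict mirroring the template fragment shown in Step 1
template = {
    "mapped": {"$y": ["cell"], "$w": ["cell"]},
    "models": {"training": {"configs": {"models": {"losses": {"functions": [
        {"name": "foc", "$gamma": [2.0, 2.5, 3.0]},
    ]}}}}},
}
configs = expand(template)
print(len(configs))
```

Each returned dict can then be written to its own exp-*/configs.yml, for example with yaml.safe_dump from PyYAML.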

Step 3: Register in hyper.csv

Add your experiment to the registry (exps/v05/jmodels/csvs/hyper.csv):

project_id,output_dir,sampling,gamma,y,w,wgt-edge
flame,{root}/exps/v05/exp-myexp-0,5-class,2.0,cell,cell,2

Step 4: Ensure DB exists

If the database files don't exist and cannot be generated, start instead from the instantiated config of a working experiment, which already contains a resolved db section:

cp exps/v05/exp-6Y3cMLLv-5/configs.yml exps/v05/exp-myexp-0/configs.yml
# Edit to change paths and resolve sweep variables

Then update the db section with your data paths and run training directly:

JARVIS_AUTO_CONFIGS=exps/v05/exp-myexp-0/configs.yml \
    python exps/v05/jmodels/model.py

Note: Template and instantiated configs differ as follows:

Template configs (configs-*.yml) contain $-prefixed sweep variables and a search: section
Instantiated configs (exps/*/exp-*/configs.yml) have resolved values and are ready for training

Reference experiments with working instantiated configs:

exps/v05/exp-6Y3cMLLv-5/configs.yml — Cell mode (gamma=2.0)
exps/v05/exp-mteCRq1Y-5/configs.yml — Edge mode (gamma=2.0, wgt-edge=2)

Key Training Config Sections

Section	Purpose
`db.query`	Data sources (HDF5 paths)
`db.setup`	Database initialization (stats, cohorts, folds)
`client`	Data loading (batch size, sampling, specs)
`xforms`	Augmentation (rotation, scaling, intensity)
`mapped`	Label remapping strategies
`models`	Model architecture (U-Net backbone, losses, optimizer)
`search`	Hyperparameter sweep variables (prefix with `$`)

3. Prediction

Run inference on new TIFF images:

python main.py --data /path/to/tiffs

The prediction pipeline:

Loads images from {data}/*/*.tif
Applies histogram equalization preprocessing
Runs ensemble inference using trained models
Outputs masks, region proposals (RPN), and false positive predictions (FPR)

Model Architecture

Network Design

Backbone: Encoder-decoder U-Net pattern
Input: Multi-window histogram equalized images (11 channels)
Output: Binary or multi-class segmentation maps
Normalization and activation: BatchNorm + GELU (v05) or LayerNorm + LeakyReLU (base)

Loss Functions

Loss	Weight	Description
Focal Loss	1.0	Handles class imbalance
Soft Dice	1.0	Region overlap optimization

Metrics

Dice Score (DSC): Segmentation overlap quality
Hausdorff Distance (H95): Boundary accuracy (95th-percentile Hausdorff distance)
Sensitivity (SEN): True positive rate
PPV: Positive predictive value
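
For reference, the three overlap-based metrics (DSC, SEN, PPV) follow directly from true positive, false positive and false negative pixel counts. The sketch below is a plain NumPy illustration for binary masks; overlap_metrics is a hypothetical helper, not the function used by the training code.

```python
import numpy as np

def overlap_metrics(y_true, y_pred):
    """Compute Dice, sensitivity and PPV for a pair of binary masks."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.count_nonzero(y_true & y_pred)    # predicted and annotated
    fp = np.count_nonzero(~y_true & y_pred)   # predicted but not annotated
    fn = np.count_nonzero(y_true & ~y_pred)   # annotated but missed

    def ratio(num, den):
        # Undefined ratios (empty masks) are reported as NaN
        return num / den if den else float("nan")

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),
        "SEN": ratio(tp, tp + fn),
        "PPV": ratio(tp, tp + fp),
    }

scores = overlap_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(scores)
```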

Hyperparameter Search

The search: section in configs defines sweeps over:

Parameter	Options	Description
`sampling`	5-class stratified	Cohort-based sampling weights
`gamma`	2.0, 2.5, 3.0	Focal loss focusing parameter
`y`	cell, edge	Target remapping (nuclei vs border)
`w`	cell, edge	Sample weight strategies
`wgt-edge`	2, 5, 10	Edge class weight

Results are saved to exps/{version}/jmodels/csvs/hyper.csv.

Experiment Versions

Base/Edge (Failed)

Initial experiments using the accumulated training set. They failed because of poor annotation quality in the source data.

v00-v01 (2-Class)

Binary segmentation: nuclei vs background
Distance transform for edge-aware training

v02-v05 (3-5 Class)

Multi-class segmentation
Class 1: Cell nuclei
Class 2: Inner border (1 pixel)
Class 3: Outer border (4 pixels)
Classes 4-5: Additional cohorts for sampling

Current (v05)

Image size: 256x256 patches
Preprocessing: Adaptive histogram equalization (CLAHE)
Input channels: 11 multi-window intensity channels
Batch size: 128
Augmentation: Random affine (scale 0.8-1.2, rotation ±0.5 rad)
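
The augmentation range above can be sketched with SciPy as follows. This is an illustration only, not the xforms implementation in Jarvis: random_affine and warp are hypothetical helpers, the example works on a single 2D slice, and uniform sampling of scale and rotation is an assumption.

```python
import numpy as np
from scipy import ndimage

def random_affine(rng, scale=(0.8, 1.2), max_rot=0.5):
    """Sample a 2x2 scale-and-rotation matrix (rotation in radians)."""
    s = rng.uniform(*scale)
    theta = rng.uniform(-max_rot, max_rot)
    c, sn = np.cos(theta), np.sin(theta)
    return s * np.array([[c, -sn], [sn, c]])

def warp(arr, matrix, order):
    """Apply the matrix about the image centre.

    affine_transform maps output coordinates to input coordinates, so the
    offset is chosen to keep the centre pixel fixed.
    """
    centre = (np.array(arr.shape) - 1) / 2.0
    offset = centre - matrix @ centre
    return ndimage.affine_transform(arr, matrix, offset=offset, order=order)

rng = np.random.default_rng(0)
matrix = random_affine(rng)

img = rng.random((64, 64))
lbl = np.zeros((64, 64), dtype=np.uint8)
lbl[20:40, 20:40] = 1

# The same matrix is applied to image and label so they stay aligned
img_aug = warp(img, matrix, order=1)   # linear interpolation for intensities
lbl_aug = warp(lbl, matrix, order=0)   # nearest neighbour keeps class values intact
```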

Label Encoding & Border Logic

Multi-Class Label Definition

Labels are generated using a signed distance transform to create border regions around cell nuclei:

Label	Value	Distance from Edge	Description
Background	0	>4 px outside	Extra-cellular space
Nuclei	1	Original mask	Cell interior
Inner Border	2	[-1, 0] px	1 pixel inside cell boundary
Outer Border	3	(0, 4] px	4 pixels outside cell boundary

Generation logic (prep/prepare.py):

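from scipy import ndimage

# lbl is assumed to be a binary nuclei mask here (1 = nuclei, 0 = background)
# Signed distance transform (negative inside nuclei, positive outside)
dst = ndimage.distance_transform_edt(1 - lbl)                   # background: distance to nearest nuclei pixel
dst[lbl == 1] = -ndimage.distance_transform_edt(lbl)[lbl == 1]  # nuclei: negated distance to background

# Create border classes
lbl[(dst <= 0) & (dst >= -1)] = 2   # inner border: nuclei pixels within 1px of the boundary
lbl[(dst <= 4) & (dst >   0)] = 3   # outer border: background pixels within 4px of a nucleus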
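
To see the encoding on a concrete case, the sketch below wraps the same logic in a function and applies it to a single 5x5 square nucleus. The encode_borders name and its inner/outer parameters are introduced here for illustration and do not appear in prep/prepare.py.

```python
import numpy as np
from scipy import ndimage

def encode_borders(mask, inner=1, outer=4):
    """Turn a binary nuclei mask into labels 0-3 using a signed distance transform."""
    mask = (np.asarray(mask) > 0).astype(np.uint8)
    # Positive distances outside nuclei, negative distances inside
    dst = ndimage.distance_transform_edt(1 - mask)
    dst[mask == 1] = -ndimage.distance_transform_edt(mask)[mask == 1]
    lbl = mask.copy()
    lbl[(dst <= 0) & (dst >= -inner)] = 2   # inner border
    lbl[(dst > 0) & (dst <= outer)] = 3     # outer border
    return lbl, dst

# One 5x5 nucleus in the middle of a 21x21 image
mask = np.zeros((21, 21), dtype=np.uint8)
mask[8:13, 8:13] = 1
lbl, dst = encode_borders(mask)

counts = {int(v): int((lbl == v).sum()) for v in np.unique(lbl)}
print(counts)
```

In this example the outermost ring of the square becomes the inner border (label 2), the 3x3 core stays label 1, and background pixels farther than 4 px from the square remain 0.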

Training Modes: Cell vs Edge

The border classes enable two complementary segmentation strategies via hyperparameter sweep:

Mode	Target Remapping (`y`)	Weight Remapping (`w`)	Use Case
cell	`lbl {1,2} → 1` (nuclei + inner border)	All pixels = 1	Segment full cell bodies for counting/measurement
edge	`lbl {2,3} → 1` (inner + outer border)	Borders = 2, rest = 1	Detect cell boundaries for separating touching cells
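
The two remappings in the table can be written as a short NumPy sketch. The remap function below is hypothetical and, for brevity, couples the target (`y`) and weight (`w`) choices under one mode argument, whereas the configs sweep them as separate variables; wgt_edge corresponds to the wgt-edge sweep parameter.

```python
import numpy as np

def remap(lbl, mode, wgt_edge=2.0):
    """Return (target, weights) for the given training mode."""
    lbl = np.asarray(lbl)
    if mode == "cell":
        # Nuclei + inner border form the full cell body
        y = np.isin(lbl, (1, 2))
        w = np.ones(lbl.shape, dtype=np.float32)
    elif mode == "edge":
        # Inner + outer border form the boundary band
        y = np.isin(lbl, (2, 3))
        w = np.where(y, wgt_edge, 1.0).astype(np.float32)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return y.astype(np.uint8), w

row = np.array([0, 3, 2, 1, 1, 2, 3, 0])   # a scan line crossing one cell
y_cell, w_cell = remap(row, "cell")
y_edge, w_edge = remap(row, "edge")
print(y_cell, y_edge, w_edge)
```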

Why this design?

Touching cells problem: Cells in microscopy often touch; simple binary segmentation merges them
Cell model: Identifies cell regions (good for morphology analysis)
Edge model: Identifies boundaries (good for instance separation)
Ensemble inference: Combines both predictions for accurate instance segmentation
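
One simple way to combine the two predictions is to remove predicted edge pixels from the predicted cell mask and label the remaining connected components. The sketch below illustrates that idea on two touching cells; it is an assumption about how such an ensemble could work, not a description of what main.py does, and split_instances is a hypothetical helper.

```python
import numpy as np
from scipy import ndimage

def split_instances(cell_prob, edge_prob, threshold=0.5):
    """Label connected cell regions after removing predicted edge pixels."""
    cell = cell_prob > threshold
    edge = edge_prob > threshold
    seeds = cell & ~edge
    instances, n = ndimage.label(seeds)
    return instances, n

# Two 6x6 cells that touch along a vertical line
cell_prob = np.zeros((10, 16))
cell_prob[2:8, 2:14] = 1.0
edge_prob = np.zeros((10, 16))
edge_prob[:, 7:9] = 1.0   # predicted boundary where the two cells meet

_, n_merged = ndimage.label(cell_prob > 0.5)
instances, n_split = split_instances(cell_prob, edge_prob)
print(n_merged, n_split)
```

The resulting instances are slightly eroded by the removed edge band; a follow-up step such as a watershed seeded from these components could grow them back to the full cell mask.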

Key Files

File	Purpose
`main.py`	Prediction script entry point
`prep/prepare.py`	Data preparation functions
`comp/ymls/db-*.yml`	Database definitions per version
`configs-v05.yml`	Active training configuration
`NOTES.md`	Development history and notes

Dependencies

This project depends on jarvis-md, which provides all core deep learning and data management functionality.

Required Dependencies (from jarvis-md)

Package	Version	Purpose
Python	>=3.13	Language runtime
TensorFlow	>=2.20.0	Deep learning framework
tf-keras	>=2.20.1	Keras API
NumPy	any	Array operations
SciPy	any	Image processing (ndimage)
scikit-image	any	Image I/O and preprocessing
Pandas	any	Data manipulation
h5py	any	HDF5 array storage
PyYAML	>=5.2	Configuration files
matplotlib	any	Visualization

FLAME-Specific Additions

The FLAME project adds no additional Python dependencies beyond jarvis-md. All imports (jarvis.*, tensorflow, numpy, scipy, skimage, pandas) are covered by the base jarvis-md package.

Installation & Environment Setup

Recommended: Shared Virtual Environment

Since FLAME has no additional dependencies beyond jarvis-md, use the jarvis-md virtual environment directly:

# Navigate to jarvis-md repo
cd <path-to-jarvis-md>

# Create venv if it doesn't exist
uv venv .venv

# Install jarvis-md with all dependencies
uv pip install -e ".[all]"

# Use this venv for FLAME work
cd <path-to-flame>
source <path-to-jarvis-md>/.venv/bin/activate

Alternative: Project-Specific Environment

If you need isolation between projects:

cd <path-to-flame>

# Create local venv
uv venv .venv
source .venv/bin/activate

# Install jarvis-md as editable dependency
uv pip install -e <path-to-jarvis-md>

Environment Verification

Test your setup:

python -c "import tensorflow as tf; print(f'TF: {tf.__version__}')"
python -c "from jarvis.utils.db import DB; print('Jarvis: OK')"
python -c "from prep.prepare import create_v01; print('FLAME: OK')"
