peterchang77/flame

FLAME: Cell Segmentation

A deep learning project for automatic cell nuclei and border segmentation in FLAME microscopy images. Built on the Jarvis framework for high-dimensional biomedical data management and model training.

Overview

This project implements a deeply supervised U-Net segmentation pipeline for cell instance separation in fluorescence microscopy images. It addresses the classic "touching cells" problem by learning cell boundaries explicitly, using a border encoding scheme and two complementary training targets (cell and edge).

Key Innovations

Border-Aware Label Encoding: Instead of simple binary masks, labels encode three distinct regions around each cell:

  • Cell nuclei (label 1): The cell interior
  • Inner border (label 2): 1 pixel inside the cell boundary
  • Outer border (label 3): 4 pixels outside into the background

This encoding enables training two complementary models:

  • Cell model: Segments full cell bodies (labels 1+2) for morphology analysis
  • Edge model: Detects boundaries (labels 2+3) for separating touching cells

Deep Supervision: The U-Net decoder provides intermediate supervision at multiple resolutions, improving gradient flow and boundary localization.

Hybrid Loss with Custom Weighting:

  • Focal Loss (gamma-tunable): Handles extreme class imbalance (background dominates)
  • Soft Dice Loss: Optimizes region overlap directly
  • Custom sample weights: Up-weight hard examples and border regions during training
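As a rough illustration of how the two loss terms combine, here is a minimal pure-Python sketch for binary targets. The function names (focal_loss, soft_dice_loss, hybrid_loss), the flat-list inputs, and the smoothing constants are assumptions made for this example; the project's actual losses are configured through Jarvis and may differ in detail.

```python
import math

EPS = 1e-7  # numerical guard; value chosen for this sketch only


def focal_loss(probs, targets, gamma=2.0, weights=None):
    """Weighted mean binary focal loss: -(1 - p_t)^gamma * log(p_t)."""
    weights = weights or [1.0] * len(probs)
    total = 0.0
    for p, y, w in zip(probs, targets, weights):
        p = min(max(p, EPS), 1.0 - EPS)   # avoid log(0)
        p_t = p if y == 1 else 1.0 - p    # probability assigned to the true class
        total += w * -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / sum(weights)


def soft_dice_loss(probs, targets, smooth=1e-6):
    """1 - soft Dice overlap, computed on probabilities (no thresholding)."""
    inter = sum(p * y for p, y in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def hybrid_loss(probs, targets, gamma=2.0, weights=None):
    """Sum of both terms, each with weight 1.0."""
    return focal_loss(probs, targets, gamma, weights) + soft_dice_loss(probs, targets)
```

With gamma=0 the focal term reduces to ordinary cross-entropy; larger gamma values shrink the contribution of pixels that are already classified confidently, which is why it helps when background dominates.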

Multi-Window Input: Raw images are preprocessed into 11-channel histogram-equalized representations, capturing intensity information across multiple contrast windows.

Project Structure

flame/
├── main.py                  # Prediction entry point
├── configs-*.yml            # Experiment configurations
├── NOTES.md                 # Development notes
├── prep/
│   └── prepare.py           # Data preparation and preprocessing
├── comp/
│   ├── ymls/                # Database definitions
│   ├── defs/                # Custom transformation definitions
│   └── pred/                # Prediction pipeline configs
├── data/
│   ├── csvs/                # Metadata CSVs
│   └── ymls/                # Data source definitions
└── exps/
    ├── base/                # Initial experiments (cell nuclei)
    ├── edge/                # Edge detection experiments
    └── v00-v05/             # Progressive dataset versions
        ├── jmodels/         # Trained models and hyperparameter search
        └── csvs/            # Results and statistics

Data Pipeline

Raw Data Format

Input data consists of:

  • Raw images: TIFF files (1200x1200 or 256x256 patches)
  • Ground truth: PNG masks with annotations
  • Sample weights: Background region masks

Preprocessed Arrays

The data preparation pipeline (prep/prepare.py) creates HDF5 arrays:

| Array | Description | Shape | Key |
| --- | --- | --- | --- |
| dat | Raw normalized image | (1, 1200, 1200) or (1, 256, 256) | dat-raw |
| hst | Multi-window histogram-equalized image | (1, H, W, 11) | hst-raw |
| lbl | Segmentation labels: 1=nuclei, 2=inner border, 3=outer border | (1, H, W) | lbl-raw |
| dst | Signed distance transform (negative inside nuclei, positive outside) | (1, H, W) | dst-raw |

Data Versions

| Version | Description | Status |
| --- | --- | --- |
| raw | Initial accumulated training set | Failed - poor annotations |
| v00 | First improved dataset | Superseded |
| v01 | Cell nuclei + border labels | Superseded |
| v02 | Additional data (06-25 batch) | Superseded |
| v03 | TrainSetNew data | Superseded |
| v04 | 256x256 patches with adaptive histogram equalization | Superseded |
| v05 | Latest: 256x256 patches, adaptive CLAHE | Active |

Usage

1. Data Preparation

Convert raw TIFF images to HDF5 format:

from prep.prepare import create_v01, create_hdr, join_hdr

# Create v05 dataset with CLAHE normalization
create_v01(
    pattern='/data/raw/flame/zips/TrainSetNew/Raw/*.tif',
    suffix='v05',
    patch_size=(1, 256, 256),
    method='adapthist'
)

# Create and attach metadata header
create_hdr(v='v05', csv='./csvs/meta.csv')
join_hdr(v='v05')

2. Training

Training uses the jarvis auto CLI workflow. The pipeline consists of: (1) creating the database, (2) generating hyperparameter sweeps, and (3) running training.

Step 2a: Create Database

Generate the training database from the db.query paths in your config:

jarvis auto configs --file configs-v05.yml --only db

This creates:

  • data/ymls/db-v05.yml — Database definition with sform patterns
  • data/csvs/db-v05.csv.gz — Header with cohorts, folds, and statistics

Step 2b: Generate Hyperparameter Sweep

Prefix a config key with $ and give it a list of candidate values to turn it into a sweep variable, then generate the permutations:

jarvis auto configs --file configs-v05.yml --only sweep

This creates experiment directories under exps/v05/:

  • jmodels/csvs/hyper.csv — Central hyperparameter registry
  • jmodels/jobs/exp-*.sh — Launch scripts for each combination
  • exp-XXXX-X/configs.yml — Specific config per experiment
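To make the idea of "permutations" concrete, the sketch below expands $-prefixed keys into one resolved config per combination. It is only an illustration of the concept on a flat dictionary: expand_sweep is a hypothetical helper, not part of the jarvis CLI, and the real tool may order or name combinations differently.

```python
from itertools import product


def expand_sweep(template):
    """Return one resolved dict per combination of the $-prefixed keys."""
    fixed = {k: v for k, v in template.items() if not k.startswith("$")}
    swept = {k[1:]: v for k, v in template.items() if k.startswith("$")}
    names = list(swept)
    combos = []
    for values in product(*(swept[n] for n in names)):
        # Merge the constant entries with this particular choice of sweep values
        combos.append({**fixed, **dict(zip(names, values))})
    return combos


template = {"name": "foc", "$gamma": [2.0, 2.5, 3.0], "$y": ["cell", "edge"]}
for cfg in expand_sweep(template):
    print(cfg)
```

Three gamma values and two targets give six resolved configs, which corresponds to six experiment directories in this simplified picture.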

Step 2c: Run Training

Execute the generated shell script for an experiment:

bash exps/v05/jmodels/jobs/exp-XXXXX-0.sh

Or run manually:

JARVIS_AUTO_CONFIGS=exps/v05/exp-XXXXX-0/configs.yml \
    python exps/v05/jmodels/model.py

Full Pipeline (One Command)

Run both configuration steps (DB creation and sweep generation) in one command:

jarvis auto configs --file configs-v05.yml

Fallback: Manual Config Generation

If the jarvis auto CLI commands fail, you can generate experiment configs manually:

Step 1: Create the template config

Start with the template (e.g., configs-v05.yml), which contains $-prefixed sweep variables:

mapped:
  $y: [cell]          # Sweep variable: cell or edge
  $w: [cell]

models:
  training:
    configs:
      models:
        losses:
          functions:
          - name: foc
            $gamma: [2.0, 2.5, 3.0]   # Sweep variable

Step 2: Instantiate configs manually

Copy the template and replace $ variables with concrete values. For each experiment combination:

# Create experiment directory
mkdir -p exps/v05/exp-myexp-0
cp configs-v05.yml exps/v05/exp-myexp-0/configs.yml

Edit the instantiated config to resolve sweep variables:

| Template ($) | Instantiated |
| --- | --- |
| $y: [cell] | y: cell |
| $gamma: [2.0, 2.5, 3.0] | gamma: 2.0 |

Step 3: Register in hyper.csv

Add your experiment to the registry (exps/v05/jmodels/csvs/hyper.csv):

project_id,output_dir,sampling,gamma,y,w,wgt-edge
flame,{root}/exps/v05/exp-myexp-0,5-class,2.0,cell,cell,2
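If you prefer to script this step rather than edit the CSV by hand, a small helper along the following lines could append the row. The column names are taken from the example above; register_experiment itself and the way it creates a missing file are assumptions for illustration, not an existing Jarvis function.

```python
import csv
from pathlib import Path

# Column order copied from the hyper.csv example
FIELDS = ["project_id", "output_dir", "sampling", "gamma", "y", "w", "wgt-edge"]


def register_experiment(csv_path, row):
    """Append one experiment row, writing the header first if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Calling register_experiment("exps/v05/jmodels/csvs/hyper.csv", {...}) with the seven fields shown above would add the same line as the manual edit. The {root} placeholder in output_dir is passed through unchanged.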

Step 4: Ensure DB exists

If the database files don't exist, copy the instantiated config from a working experiment as a starting point:

cp exps/v05/exp-6Y3cMLLv-5/configs.yml exps/v05/exp-myexp-0/configs.yml
# Edit to change paths and resolve sweep variables

Then update the db section with your data paths and run training directly:

JARVIS_AUTO_CONFIGS=exps/v05/exp-myexp-0/configs.yml \
    python exps/v05/jmodels/model.py

Note: Template and instantiated configs differ as follows:

  • Template configs (configs-*.yml) contain $ sweep variables and a search: section
  • Instantiated configs (exps/*/exp-*/configs.yml) have resolved values and are ready for training

Reference experiments with working instantiated configs:

  • exps/v05/exp-6Y3cMLLv-5/configs.yml — Cell mode (gamma=2.0)
  • exps/v05/exp-mteCRq1Y-5/configs.yml — Edge mode (gamma=2.0, wgt-edge=2)

Key Training Config Sections

| Section | Purpose |
| --- | --- |
| db.query | Data sources (HDF5 paths) |
| db.setup | Database initialization (stats, cohorts, folds) |
| client | Data loading (batch size, sampling, specs) |
| xforms | Augmentation (rotation, scaling, intensity) |
| mapped | Label remapping strategies |
| models | Model architecture (U-Net backbone, losses, optimizer) |
| search | Hyperparameter sweep variables (prefix with $) |

3. Prediction

Run inference on new TIFF images:

python main.py --data /path/to/tiffs

The prediction pipeline:

  1. Loads images from {data}/*/*.tif
  2. Applies histogram equalization preprocessing
  3. Runs ensemble inference using trained models
  4. Outputs masks, region proposals (RPN), and false positive predictions (FPR)
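The {data}/*/*.tif pattern in step 1 means images are expected exactly one directory below the --data root. A quick way to check which files would match is sketched below; find_tiffs is a hypothetical helper written for this example and is not taken from main.py.

```python
from pathlib import Path


def find_tiffs(data):
    """Return sorted .tif paths located exactly one directory below `data`."""
    return sorted(Path(data).glob("*/*.tif"))
```

Files placed directly in the root, or nested more than one level deep, are not matched by this pattern, so an empty result usually points to a directory layout problem.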

Model Architecture

Network Design

  • Backbone: Encoder-decoder U-Net pattern
  • Input: Multi-window histogram equalized images (11 channels)
  • Output: Binary or multi-class segmentation maps
  • Normalization and activation: BatchNorm + GELU (v05) or LayerNorm + LeakyReLU (base)

Loss Functions

| Loss | Weight | Description |
| --- | --- | --- |
| Focal Loss | 1.0 | Handles class imbalance |
| Soft Dice | 1.0 | Region overlap optimization |

Metrics

  • Dice Score (DSC): Segmentation overlap quality
  • Hausdorff Distance (H95): Boundary accuracy
  • Sensitivity (SEN): True positive rate
  • PPV: Positive predictive value
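For reference, the three overlap-based metrics can be computed from pixel counts as in the sketch below. It uses the standard definitions on flattened binary masks; overlap_metrics is a name chosen for this example, and the Hausdorff distance is left out because it needs boundary extraction that does not fit in a few lines.

```python
def overlap_metrics(pred, true):
    """Dice, sensitivity, and PPV for two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)

    def ratio(num, den):
        # Undefined when the denominator is empty (e.g. no positives at all)
        return num / den if den else float("nan")

    return {
        "DSC": ratio(2 * tp, 2 * tp + fp + fn),  # overlap quality
        "SEN": ratio(tp, tp + fn),               # true positive rate
        "PPV": ratio(tp, tp + fp),               # positive predictive value
    }
```

For pred = [1, 1, 0, 0] and true = [1, 0, 1, 0] there is one true positive, one false positive, and one false negative, so all three metrics equal 0.5.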

Hyperparameter Search

The search: section in configs defines sweeps over:

| Parameter | Options | Description |
| --- | --- | --- |
| sampling | 5-class stratified | Cohort-based sampling weights |
| gamma | 2.0, 2.5, 3.0 | Focal loss focusing parameter |
| y | cell, edge | Target remapping (nuclei vs border) |
| w | cell, edge | Sample weight strategies |
| wgt-edge | 2, 5, 10 | Edge class weight |

Results are saved to exps/{version}/jmodels/csvs/hyper.csv.

Experiment Versions

Base/Edge (Failed)

Initial experiments used the accumulated training set. They failed because of poor annotation quality in the source data.

v00-v01 (2-Class)

  • Binary segmentation: nuclei vs background
  • Distance transform for edge-aware training

v02-v05 (3-5 Class)

  • Multi-class segmentation
  • Class 1: Cell nuclei
  • Class 2: Inner border (1 pixel)
  • Class 3: Outer border (4 pixels)
  • Classes 4-5: Additional cohorts for sampling

Current (v05)

  • Image size: 256x256 patches
  • Preprocessing: Adaptive histogram equalization (CLAHE)
  • Input channels: 11 multi-window intensity channels
  • Batch size: 128
  • Augmentation: Random affine (scale 0.8-1.2, rotation ±0.5 rad)
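To give a feel for the augmentation ranges, the sketch below samples one random 2x2 affine matrix with a scale in 0.8-1.2 and a rotation within ±0.5 rad. Uniform sampling, isotropic scaling, and the function name random_affine are assumptions for this example; the actual transform is defined in the xforms section of the config and may be parameterized differently.

```python
import math
import random


def random_affine(rng, scale_range=(0.8, 1.2), max_rotation=0.5):
    """Sample a 2x2 scale-and-rotate matrix; returns (matrix, scale, angle)."""
    s = rng.uniform(*scale_range)
    theta = rng.uniform(-max_rotation, max_rotation)  # radians
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    matrix = [[s * cos_t, -s * sin_t],
              [s * sin_t, s * cos_t]]
    return matrix, s, theta


matrix, scale, angle = random_affine(random.Random(0))
print(matrix, scale, angle)
```

A rotation limit of 0.5 rad is roughly 29 degrees, so patches are tilted moderately rather than rotated freely.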

Label Encoding & Border Logic

Multi-Class Label Definition

Labels are generated using a signed distance transform to create border regions around cell nuclei:

| Label | Value | Distance from Edge | Description |
| --- | --- | --- | --- |
| Background | 0 | >4 px outside | Extra-cellular space |
| Nuclei | 1 | Original mask | Cell interior |
| Inner Border | 2 | [-1, 0] px | 1 pixel inside cell boundary |
| Outer Border | 3 | (0, 4] px | 4 pixels outside cell boundary |

Generation logic (prep/prepare.py):

from scipy import ndimage

# lbl is assumed to be a binary nuclei mask (1 = nuclei, 0 = background)
# Signed distance transform (negative inside nuclei, positive outside)
dst = ndimage.distance_transform_edt(1 - lbl)              # distance to nuclei
dst[lbl == 1] = ndimage.distance_transform_edt(lbl)[lbl == 1] * -1  # negative inside

# Create border classes
lbl[(dst <= 0) & (dst >= -1)] = 2   # inner border: 1px into nuclei
lbl[(dst <= 4) & (dst >   0)] = 3   # outer border: 4px into background
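A small worked example may make the banding easier to see. The sketch below applies the same thresholds to a tiny mask, but swaps the SciPy distance transform for a brute-force pure-Python version so it runs without dependencies. The helper names signed_distance and encode_borders are made up for this illustration, and the brute-force search is only practical for toy inputs that contain both nuclei and background pixels.

```python
import math


def signed_distance(mask):
    """Brute-force signed Euclidean distance: negative inside, positive outside."""
    h, w = len(mask), len(mask[0])
    fg = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 1]
    bg = [(r, c) for r in range(h) for c in range(w) if mask[r][c] == 0]
    dst = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            inside = mask[r][c] == 1
            others = bg if inside else fg  # nearest pixel of the opposite class
            d = min(math.hypot(r - r2, c - c2) for r2, c2 in others)
            dst[r][c] = -d if inside else d
    return dst


def encode_borders(mask, inner=1, outer=4):
    """Relabel a binary mask into 0=background, 1=nuclei, 2=inner, 3=outer border."""
    dst = signed_distance(mask)
    out = [row[:] for row in mask]
    for r, row in enumerate(dst):
        for c, d in enumerate(row):
            if -inner <= d <= 0:
                out[r][c] = 2
            elif 0 < d <= outer:
                out[r][c] = 3
    return out


row = [[0] * 6 + [1] * 5 + [0] * 6]
print(encode_borders(row)[0])
```

For this single-row mask with a 5-pixel nucleus, the result is [0, 0, 3, 3, 3, 3, 2, 1, 1, 1, 2, 3, 3, 3, 3, 0, 0]: one inner-border pixel on each side of the nucleus, followed by a 4-pixel outer band.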

Training Modes: Cell vs Edge

The border classes enable two complementary segmentation strategies via hyperparameter sweep:

| Mode | Target Remapping (y) | Weight Remapping (w) | Use Case |
| --- | --- | --- | --- |
| cell | lbl {1,2} → 1 (nuclei + inner border) | All pixels = 1 | Segment full cell bodies for counting/measurement |
| edge | lbl {2,3} → 1 (inner + outer border) | Borders = 2, rest = 1 | Detect cell boundaries for separating touching cells |
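The two remappings can be written out explicitly as below. This is a sketch of the mapping rules only, operating on nested lists; the function name remap and the wgt_edge argument are assumptions for this example, while in the project the equivalent logic is expressed through the mapped section of the config.

```python
def remap(lbl, mode, wgt_edge=2):
    """Return (target, weight) maps for mode 'cell' or 'edge'."""
    positive = {"cell": {1, 2}, "edge": {2, 3}}[mode]
    target = [[1 if v in positive else 0 for v in row] for row in lbl]
    if mode == "edge":
        # Border pixels are up-weighted, everything else keeps weight 1
        weight = [[wgt_edge if v in (2, 3) else 1 for v in row] for row in lbl]
    else:
        weight = [[1 for _ in row] for row in lbl]
    return target, weight


lbl = [[0, 3, 2, 1, 2, 3, 0]]
print(remap(lbl, "cell"))
print(remap(lbl, "edge"))
```

For the row [0, 3, 2, 1, 2, 3, 0], cell mode yields the target [0, 0, 1, 1, 1, 0, 0], while edge mode yields [0, 1, 1, 0, 1, 1, 0] with weights [1, 2, 2, 1, 2, 2, 1].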

Why this design?

  • Touching cells problem: Cells in microscopy often touch; simple binary segmentation merges them
  • Cell model: Identifies cell regions (good for morphology analysis)
  • Edge model: Identifies boundaries (good for instance separation)
  • Ensemble inference: Combines both predictions for accurate instance segmentation

Key Files

| File | Purpose |
| --- | --- |
| main.py | Prediction script entry point |
| prep/prepare.py | Data preparation functions |
| comp/ymls/db-*.yml | Database definitions per version |
| configs-v05.yml | Active training configuration |
| NOTES.md | Development history and notes |

Dependencies

This project depends on jarvis-md, which provides all core deep learning and data management functionality.

Required Dependencies (from jarvis-md)

| Package | Version | Purpose |
| --- | --- | --- |
| Python | >=3.13 | Language runtime |
| TensorFlow | >=2.20.0 | Deep learning framework |
| tf-keras | >=2.20.1 | Keras API |
| NumPy | any | Array operations |
| SciPy | any | Image processing (ndimage) |
| scikit-image | any | Image I/O and preprocessing |
| Pandas | any | Data manipulation |
| h5py | any | HDF5 array storage |
| PyYAML | >=5.2 | Configuration files |
| matplotlib | any | Visualization |

FLAME-Specific Additions

The FLAME project adds no additional Python dependencies beyond jarvis-md. All imports (jarvis.*, tensorflow, numpy, scipy, skimage, pandas) are covered by the base jarvis-md package.

Installation & Environment Setup

Recommended: Shared Virtual Environment

Since FLAME has no additional dependencies beyond jarvis-md, use the jarvis-md virtual environment directly:

# Navigate to jarvis-md repo
cd <path-to-jarvis-md>

# Create venv if it doesn't exist
uv venv .venv

# Install jarvis-md with all dependencies
uv pip install -e ".[all]"

# Use this venv for FLAME work
cd <path-to-flame>
source <path-to-jarvis-md>/.venv/bin/activate

Alternative: Project-Specific Environment

If you need isolation between projects:

cd <path-to-flame>

# Create local venv
uv venv .venv
source .venv/bin/activate

# Install jarvis-md as editable dependency
uv pip install -e <path-to-jarvis-md>

Environment Verification

Test your setup:

python -c "import tensorflow as tf; print(f'TF: {tf.__version__}')"
python -c "from jarvis.utils.db import DB; print('Jarvis: OK')"
python -c "from prep.prepare import create_v01; print('FLAME: OK')"
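If you would rather see every missing package in one pass instead of stopping at the first failed import, a short script like the following can help. check_modules is a hypothetical helper for this example; it only checks that each top-level module can be located and does not import it, so it will not catch broken installations.

```python
from importlib import util


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    status = {}
    for name in names:
        try:
            status[name] = util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


for name, ok in check_modules(["tensorflow", "jarvis", "prep"]).items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Run it from the FLAME repository root so that the local prep package is on the import path.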
