14 changes: 14 additions & 0 deletions .github/workflows/ci.yml
@@ -45,6 +45,20 @@ jobs:
      - name: Run pytest
        run: pytest -v

      - name: Try install optional AdStat extra
        id: adstat_extra
        run: |
          if python -m pip install -e ".[adstat]"; then
            echo "available=true" >> "$GITHUB_OUTPUT"
          else
            echo "available=false" >> "$GITHUB_OUTPUT"
            echo "AdStat extra is not available in this environment; adapter tests already skip without it."
          fi

      - name: Run AdStat adapter tests
        if: steps.adstat_extra.outputs.available == 'true'
        run: pytest -v tests/test_adstat_adapter.py tests/test_adstat_results_dialog.py

  lint:
    name: ruff
    runs-on: ubuntu-latest
2 changes: 2 additions & 0 deletions README.md
@@ -183,6 +183,8 @@ dzdv = numeric_derivative(spec.x_array, z_smooth)

- [GUI guide](docs/gui.md)
- [Command-line guide](docs/cli.md)
- [Particle Statistics guide](docs/adstat_user_guide.md)
- [AdStat integration](docs/adstat_integration.md)
- [Createc `.dat` reader notes](docs/createc_dat_reader.md)
- [ROI manual workflow checklist](docs/roi_manual_test_checklist.md)
- [Review and cleanup status](docs/review_status.md)
119 changes: 119 additions & 0 deletions docs/adstat_integration.md
@@ -0,0 +1,119 @@
# AdStat Integration

ProbeFlow can hand curated STM point collections to AdStat without exporting an
intermediate file. ProbeFlow remains responsible for image loading, processing,
point detection, ROI and mask editing, and Qt presentation. AdStat receives
normalised point-pattern objects, runs the statistics and matched null models,
and returns GUI-free result/view specifications for ProbeFlow to render.

The adapter lives in `probeflow.analysis.adstat_adapter` and imports AdStat
lazily. Install the optional AdStat extra, or put an AdStat checkout on
`PYTHONPATH`, before using this path:

```bash
pip install "probeflow[adstat]"
```
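
The lazy import described above can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the helper names `load_optional` and `require_adstat` are hypothetical, and `"adstat"` is only an assumed import name for the AdStat package.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_optional(module_name: str) -> Optional[ModuleType]:
    """Import a module on first use; return None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require_adstat(module_name: str = "adstat") -> ModuleType:
    """Return the AdStat module or raise a readable error.

    "adstat" is an assumed import name; substitute the real package name.
    """
    module = load_optional(module_name)
    if module is None:
        raise RuntimeError(
            'AdStat is not available. Install it with: pip install "probeflow[adstat]"'
        )
    return module
```

Deferring the import to the first call keeps the rest of the application importable when the optional extra is missing, which is the usual reason for this pattern.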

The viewer's **Particle Statistics...** command opens a ProbeFlow-native Qt
shell powered by the AdStat engine. Its **Analyze scan points** mode calls
`compare_point_source_view_spec(...)` for live point-source records already
collected by the image viewer. The saved/session feature-set workflow builds
`point_set_record(...)` objects and runs `compare_point_set_record_view_spec(...)`
for one set or `compare_point_set_records_view_spec(...)` for pooled replicate
sets.

Points reach the shared session feature-set pool from several sources:

- **Feature Finder** and **Feature Counting** both have a *Send to Particle
Statistics* action. Feature Counting particles/detections are converted with
`feature_counting_to_particle_table(...)` (via
`point_table_io.feature_items_to_feature_set`) into a `FeatureSet`.
- **Load points from disk…** uses `point_table_io.sniff_point_table` /
`load_point_table` to import CSV position tables and ProbeFlow JSON (Feature
Counting exports and saved `FeatureSetStore` files).

`compare_particle_collection_view_spec(...)` remains the broadest single-call
entry point and also accepts independent feature layers.

## Single Image

1. Generate one point collection from any canonical ProbeFlow source:
Feature Finder maxima/minima, feature maxima, point ROIs, Feature Counting
segmented particles, or template-match detections.
2. Choose that collection as the tested population. Each run analyses one
population; mixed species should be analysed separately unless the scientific
intent is an unlabelled merged population.
3. Choose the analysis region from an active area ROI, mask, or the full image.
The same region must be used for observed statistics and every null
simulation.
4. Convert through AdStat `ImageCalibration`, preserving anisotropic pixel
sizes from `Scan.scan_range_m` and `Scan.dims`.
5. Open **Measurements -> Features -> Particle Statistics...** and use
**Analyze scan points** to pass
`ParticleTable + AnalysisRegion + optional independent feature layers` to
AdStat and render the returned `ResultViewSpec` with ProbeFlow's native Qt
result-view widgets.
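
Step 4 matters because the two image axes can have different pixel sizes. The sketch below shows the arithmetic only. It assumes `scan_range_m` is an `(x, y)` pair in metres and `dims` is an `(nx, ny)` pair of pixel counts; the function names are hypothetical and this is not AdStat's `ImageCalibration` API.

```python
from typing import List, Sequence, Tuple


def pixel_size_nm(
    scan_range_m: Tuple[float, float], dims: Tuple[int, int]
) -> Tuple[float, float]:
    """Return the (x, y) pixel size in nm, keeping the two axes separate."""
    (range_x_m, range_y_m), (nx, ny) = scan_range_m, dims
    return (range_x_m / nx * 1e9, range_y_m / ny * 1e9)


def points_px_to_nm(
    points_px: Sequence[Tuple[float, float]],
    scan_range_m: Tuple[float, float],
    dims: Tuple[int, int],
) -> List[Tuple[float, float]]:
    """Scale pixel coordinates to nm with one factor per axis."""
    size_x, size_y = pixel_size_nm(scan_range_m, dims)
    return [(x * size_x, y * size_y) for x, y in points_px]
```

Using a single averaged pixel size instead would distort every distance-based statistic on a scan whose frame is not square in physical units.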

Measured feature layers, such as step traces or defect landmarks, must be
independent measurements. A layer derived from the same particle centroids being
tested is not a valid measured-inhomogeneity null.

The older **Pair correlation...** dialog remains available during migration. The
AdStat path uses matched simulation envelopes and verdict rows, so its plots are
not intended to be numerically identical to the older square-window
pair-correlation readout.

## Synthetic Demo Run

The repository includes a reproducible teaching script that generates a
clustered random point collection and runs it through the same direct adapter
path used by the viewer:

```bash
python scripts/adstat_demo.py --output-dir /tmp/probeflow_adstat_demo
```

If AdStat is checked out locally rather than installed as a package, put that
source tree on `PYTHONPATH` first, for example:

```bash
PYTHONPATH=/path/to/AdStat/src python scripts/adstat_demo.py
```

The script writes:

- `synthetic_points.csv` - the ProbeFlow-shaped point collection in nm and px.
- `adstat_result_view_spec.json` - the AdStat `ResultViewSpec` that ProbeFlow's
Qt-native renderer consumes.
- `synthetic_points_preview.png` - a quick plot of the generated point pattern.

This demo teaches the data path and the expected result panels. It does not
replace the GUI workflow: in the viewer, users still generate or curate a point
collection, choose an active ROI/mask if needed, and open
**Measurements -> Features -> Particle Statistics...**.

## Generated Teaching Modes

Two Particle Statistics modes use AdStat's synthetic sandbox backend rather than
the current scan's ROIs or detected features:

- **Learn with tutorial** — a guided walkthrough that stages generated patterns,
models, and statistics step by step.
- **Model simulations** — free-play exploration of generated patterns, where the
user picks the pattern, null model, particle count, seed, and simulation count
directly and reruns at will.

Both pages are persistently labelled `TEST MODE - GENERATED DATA` and use
different point markers and colours from real-data analysis. Sandbox points are
deliberately isolated from real ProbeFlow data in v1: they do not become point
ROIs, measurements, processing provenance, or point-source records. Use
**Analyze scan points** for real scan data and either generated mode for
examples.

## Series

A multi-image collection is a list of scan records. Each record stores the scan
id, one normalised point set, calibration, the analysis ROI/mask, source
metadata, and an optional user-supplied series coordinate such as coverage or
temperature. ProbeFlow can run AdStat directly from those records and may also
export AdStat project/coverage-series JSON as provenance.
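
One way to picture such a record is the sketch below. The `ScanRecord` class, its field names, and `ordered_series` are hypothetical illustrations of the fields listed above, not ProbeFlow's or AdStat's actual types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ScanRecord:
    """Hypothetical per-scan record for a multi-image series."""

    scan_id: str
    points_nm: List[Tuple[float, float]]   # one normalised point set
    pixel_size_nm: Tuple[float, float]     # calibration, kept per axis
    region: str                            # e.g. "full_image" or an ROI/mask id
    source: Dict[str, str] = field(default_factory=dict)
    series_coordinate: Optional[float] = None  # e.g. coverage or temperature


def ordered_series(records: List[ScanRecord]) -> List[ScanRecord]:
    """Return the records that carry a series coordinate, sorted by it."""
    with_coordinate = [r for r in records if r.series_coordinate is not None]
    return sorted(with_coordinate, key=lambda r: r.series_coordinate)
```
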
192 changes: 192 additions & 0 deletions docs/adstat_user_guide.md
@@ -0,0 +1,192 @@
# Particle Statistics User Guide

ProbeFlow is the normal entry point for particle/point-pattern statistics. Open
**Measurements -> Features -> Particle Statistics...** to choose between real
scan analysis, a guided tutorial, and free-play model simulations. The
calculations are powered by the AdStat engine.

> **Maturity note — please read.** Particle Statistics is the newest and
> **least user-tested** part of ProbeFlow. It has unit tests and a worked
> tutorial, but it has had far less real-world use than the imaging, ROI, and
> Feature Finder tools, and it **may contain mistakes** — in the statistics, the
> automatic scale choices, or the interpretation language. Treat its output as
> *exploratory*: sanity-check verdicts against your own judgement and, for
> anything you intend to publish, against an independent point-pattern method you
> trust. If a result looks wrong, it may well be — please report it.

## What Particle Statistics Does

Particle Statistics compares a collection of points against spatial null models.
In plain language, it asks whether the points look consistent with random
placement, or whether they show signs of clustering, separation, or association
with another measured feature.

It does not prove a physical mechanism. The result depends on which points you
analyse, which image region you allow, and which null model you choose.

## How It Works

The core idea is a **null model** plus a **simulation envelope**:

1. You give it a set of point positions and an analysis region.
2. It measures one or more spatial statistics on your points.
3. It generates many random point patterns from the chosen null model (same
region, same number of points) and measures the same statistic on each. The
spread of those simulations is the **envelope** — the range expected by
chance.
4. If your observed curve falls outside the envelope, the pattern is reported as
*inconsistent with* the null model; if it stays inside, it is *consistent
with* it.
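
The four steps above can be sketched with a single scalar statistic. This is a simplified illustration under stated assumptions: it uses the mean nearest-neighbour distance instead of a full curve, a rectangular region, a homogeneous Poisson null with a fixed point count, and a plain min/max envelope. The function names are hypothetical and AdStat's actual envelope and verdict rules may differ.

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbour."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def poisson_envelope(n_points, width, height, n_sim=99, seed=0):
    """Min/max of the statistic over uniform random patterns in the region."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sim):
        simulated = [
            (rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_points)
        ]
        values.append(mean_nn_distance(simulated))
    return min(values), max(values)


def verdict(observed_points, width, height, n_sim=99, seed=0):
    """Compare the observed statistic with the simulated envelope."""
    low, high = poisson_envelope(len(observed_points), width, height, n_sim, seed)
    observed = mean_nn_distance(observed_points)
    return "consistent with" if low <= observed <= high else "inconsistent with"
```

Note that the simulations use the same region and the same number of points as the observed pattern, which is what makes the envelope a fair comparison.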

**Null models** available for real data:

- **Homogeneous Poisson** — completely random placement (the usual baseline).
- **Hard-core random** — random but with a minimum spacing (points cannot sit
closer than a set distance), a baseline for excluded-volume effects.
- **Measured-feature Poisson** — random but biased toward an *independently
measured* feature layer (e.g. step edges), to test association. The feature
layer must be a different measurement from the particles being tested —
reusing the particles as their own feature is circular.
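
As an illustration of the hard-core idea, the sketch below draws points one at a time and rejects any candidate that falls closer than the minimum distance to an accepted point (often called random sequential addition). This is one common way to generate such a pattern, shown under the assumption of a rectangular region; it is not necessarily the sampler AdStat uses, and `hard_core_pattern` is a hypothetical name.

```python
import math
import random


def hard_core_pattern(n_points, width, height, min_distance, seed=0,
                      max_attempts=100_000):
    """Place points uniformly at random, rejecting candidates that are too close."""
    rng = random.Random(seed)
    points = []
    attempts = 0
    while len(points) < n_points:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError(
                "Could not place all points; the region is too crowded."
            )
        candidate = (rng.uniform(0, width), rng.uniform(0, height))
        # Accept only if every existing point is at least min_distance away.
        if all(math.dist(candidate, p) >= min_distance for p in points):
            points.append(candidate)
    return points
```

The attempt limit guards against requests that cannot be satisfied, for example a minimum distance larger than the region itself.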

**Core statistics** (different lenses on the same points, always shown):

- **Pair correlation g(r)** — relative density of neighbours at distance *r*. A
short-range bump means clustering; a dip near zero means avoidance/spacing.
- **Nearest-neighbour distribution** — how far each point is from its closest
neighbour; the clearest first look at spacing vs clumping.
- **Ripley's L** — cumulative neighbour counts vs distance; sensitive to
clustering or regularity across a range of scales.
- **Cluster sizes** — counts of connected groups within a linking distance.
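
To make one of these statistics concrete, here is a naive estimator of Ripley's L with no edge correction. It uses the textbook definitions K(r) = A / (n (n - 1)) × (number of ordered pairs with distance ≤ r) and L(r) = sqrt(K(r) / π). It is a sketch for intuition only; AdStat's estimator, including any edge correction, may differ, and `ripley_l` is a hypothetical name.

```python
import math


def ripley_l(points, area, radii):
    """Naive Ripley's L(r) for each radius, without edge correction."""
    n = len(points)
    distances = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    values = []
    for r in radii:
        pairs_within = sum(1 for d in distances if d <= r)
        # Each unordered pair appears twice in the ordered-pair sum.
        k = area * 2 * pairs_within / (n * (n - 1))
        values.append(math.sqrt(k / math.pi))
    return values
```

For completely random placement L(r) stays close to r, so values above r suggest clustering at that scale and values below r suggest regular spacing.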

**Local-order checks (opt-in).** Bond-orientational order **ψ4 / ψ6** and the
angular pair map **g(r, θ)** answer a *different* question — is there square or
triangular *lattice* order? — so they are **off by default** and not shown in a
plain randomness analysis. Tick **"Include local-order checks"** (or run the
ordered tutorial examples) to compute them. They are validated to behave
correctly — a triangular lattice rejects on ψ6, a square lattice on ψ4, and
random points stay consistent (see `tests/test_adstat_validation.py`) — but they
depend on a neighbour-distance cutoff and have no edge correction, so for sparse
patterns or particles near the region boundary treat them as suggestive of local
order, not proof of a crystal.
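
For reference, the bond-orientational order parameter for one particle is commonly defined as the magnitude of the average of exp(i n θ) over its neighbours, where θ is the bond angle and n is 4 or 6. The sketch below uses a plain distance cutoff to choose neighbours, which mirrors the cutoff dependence mentioned above. It is an illustration only; `psi_n` is a hypothetical name and AdStat's neighbour selection may differ.

```python
import cmath
import math


def psi_n(center, points, n, cutoff):
    """Return |psi_n| for one point, using neighbours within `cutoff`."""
    cx, cy = center
    terms = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        distance = math.hypot(dx, dy)
        # Skip the point itself (distance 0) and anything beyond the cutoff.
        if 0.0 < distance <= cutoff:
            terms.append(cmath.exp(1j * n * math.atan2(dy, dx)))
    if not terms:
        return 0.0
    return abs(sum(terms) / len(terms))
```

A particle near the region boundary has missing neighbours, so its value is biased; that is the practical meaning of "no edge correction".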

ProbeFlow chooses the distance scales (bin widths, maximum radii, hard-core and
cluster radii) automatically from the region size and point density when you do
not set them. These automatic choices are *teaching-quality defaults*, not tuned
parameters — see the maturity note above.

**Reading a verdict.** "Consistent with random" means the null model was *not
rejected* — it is not positive proof that nothing is going on (a small or noisy
sample simply may not have the power to detect an effect). "Inconsistent with
random" means the statistic departed from the envelope — evidence of structure,
but not proof of any particular physical mechanism. Pooling several independent
images of the same condition is the practical way to strengthen a conclusion.

## Analyze Scan Points

Use **Analyze scan points** for real ProbeFlow data.

1. Generate or curate a point collection:
Feature Finder maxima/minima, feature maxima, or point ROIs are available in
the current viewer workflow.
2. Open **Particle Statistics...** and stay on **Analyze scan points**.
3. Choose one point source as the tested population.
4. Choose the analysis region: active area ROI, active mask, or full image.
5. Choose a model and run the comparison.

For session feature-set workflows, click **Send to Particle Statistics** from
Feature Finder. The set is saved with its image calibration. Tick one saved set
to analyse that image, or tick multiple saved sets from independent scans of the
same condition to pool them.

The real-data UI exposes homogeneous Poisson, hard-core random, and
measured-feature Poisson models, plus simulation count and random seed.
Measured-feature Poisson uses one tested session set and a different,
independently measured Feature layer set.

## Getting Points In

Particle Statistics shares one feature-set pool across the whole session, so
points from any of these sources can be ticked together and pooled:

1. **Feature Finder → Send to Particle Statistics** — local maxima/minima from
the open image.
2. **Feature Counting → Send to Particle Statistics** — segmented particle
centroids (Particles mode) or template detections (Template mode) from the
Feature Counting window.
3. **Point sources in the open image** — detected feature maxima and point ROIs
appear directly in the *Analyze scan points* dropdown (point ROIs include any
loaded from a `.rois.json` sidecar).
4. **Load points from disk…** — import an external position table. Accepted
formats: CSV position tables (with or without a leading particle-number
column; units inferred from `x_px` / `x_nm` / `x_m` / `x_phys` headers or
chosen on import), ProbeFlow's own Feature Finder / measurements CSV, and
ProbeFlow JSON (Feature Counting exports and saved feature-set files). For a
file with no embedded calibration, a small dialog (prefilled from the file)
asks for the position units and physical field size before the points become a
feature set.
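
The header-based unit inference for CSV files can be pictured with the sketch below. It is not the `point_table_io` implementation: the function names are hypothetical, the matching `y_*` column names are assumed by analogy with the `x_*` headers listed above, and the priority order when several unit columns are present is an arbitrary choice made for this example.

```python
import csv
import io

# Header suffix -> unit label, checked in this (arbitrary) order.
# "phys" is treated here as "physical units that the user must confirm".
UNIT_BY_SUFFIX = {
    "nm": "nanometre",
    "m": "metre",
    "px": "pixel",
    "phys": "ask user",
}


def read_header(csv_text):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def infer_position_units(header):
    """Return (x_column, y_column, unit), or None when no known pair exists."""
    names = {name.strip().lower() for name in header}
    for suffix, unit in UNIT_BY_SUFFIX.items():
        if f"x_{suffix}" in names and f"y_{suffix}" in names:
            return f"x_{suffix}", f"y_{suffix}", unit
    return None
```

Because the lookup is by column name, a leading particle-number column does not affect the result. A `None` return corresponds to the case where the units have to be chosen on import.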

Use **Save feature sets…** to write the current pool to a JSON file; it can be
re-imported later with **Load points from disk…**.

## Exporting Results

After running a comparison, the top **Export** menu writes the results in simple
formats so you can reproduce the plots in another program:

- **Export curves + verdicts (CSV folder)…** — one CSV per statistic. Each curve
file has a distance column plus the `observed` line and the model envelope
(`model_low` / `model_central` / `model_high`), so g(r), the nearest-neighbour
distribution, Ripley's L, cluster sizes, etc. can be re-plotted directly. A
`…_verdicts.csv` holds the per-model/per-statistic verdict table. (Heatmap and
real-space panels are not written as CSV; they are kept in the JSON export.)
- **Export full result (JSON)…** — the entire result (all panels, curves, and
verdicts) in one JSON file, for archiving or scripted post-processing.
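
As an example of scripted post-processing, the sketch below reads one exported curve file and lists the distances at which the observed value lies outside the model envelope. The `observed`, `model_low`, and `model_high` column names come from the export description above. The distance column name `r_nm` and the sample numbers are made up for illustration, so the code simply treats the first column as the distance. A pointwise check like this is only a quick look and may not match how AdStat derives its verdicts.

```python
import csv
import io

# Made-up sample rows in the documented column layout.
SAMPLE = """r_nm,observed,model_low,model_central,model_high
0.5,2.10,0.60,1.00,1.40
1.0,1.20,0.80,1.00,1.25
1.5,0.70,0.85,1.00,1.15
"""


def points_outside_envelope(csv_text):
    """Return distances where `observed` is outside [model_low, model_high]."""
    reader = csv.DictReader(io.StringIO(csv_text))
    distance_column = reader.fieldnames[0]  # assumption: distance comes first
    outside = []
    for row in reader:
        observed = float(row["observed"])
        if not float(row["model_low"]) <= observed <= float(row["model_high"]):
            outside.append(float(row[distance_column]))
    return outside
```

For a file on disk, pass the file's text, for example the result of `pathlib.Path(...).read_text()`.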

The input points themselves are exported separately via **Save feature sets…**
(above) or the source tools' own CSV/JSON exports.

## Learn With Tutorial

Use **Learn with tutorial** (the **Tutorial** card on the workflow start page)
for a guided walkthrough that builds up particle-pattern analysis one idea at a
time, using generated example data rather than the current image. It steps
through random placement, sample size, pooling, clustering, hard-core spacing,
and feature association, then hands you off to the real scan-points workflow.

## Model Simulations

Use **Model simulations** to experiment freely with generated patterns without
the tutorial's guidance. Choose the synthetic pattern, null model, particle
count, seed, and simulation count, then run the comparison or draw a fresh
pattern. This is the place to build intuition by changing one knob at a time.

Both generated modes are labelled `TEST MODE - GENERATED DATA` and use different
point markers and colours so they cannot be mistaken for real data. Generated
points stay isolated: they do not become point ROIs, measurements, processing
provenance, or active ProbeFlow point sources.

## Appropriate Data

Good tested populations are point-like features whose coordinates have a clear
meaning: atoms, adsorbates, defects, particle centroids, template detections, or
manually curated point ROIs.

Do not mix unrelated species unless the scientific question is explicitly about
the merged set. Independent feature layers, such as step traces or external
landmarks, must be measured separately from the particles being tested.

## Current Limitations

- **This is the least battle-tested area of ProbeFlow** (see the maturity note at
the top): the statistics and especially the automatic scale defaults may still
have rough edges. Verify important results independently.
- Feature sets live in one shared session pool; **Save feature sets…** /
**Load points from disk…** persist and restore them as JSON, but they are not
yet tied into a durable per-project record.
- Imported files with no embedded calibration require you to supply the field
size; the image (pixel) dimensions are synthetic and only affect the
pixel-resolution note, not the statistics.
- Series/project export workflows are still pending.

See [AdStat integration](adstat_integration.md) for the developer-facing API
contract between ProbeFlow and AdStat.
14 changes: 13 additions & 1 deletion docs/gui.md
@@ -108,7 +108,9 @@ atoms, molecules, defects, moiré sites — on the current image.
From the **Export** section you can write the coordinates to CSV, render
a synthetic *feature image* (a disk at every detection, useful for pair
correlation and lattice statistics), or send that feature image straight
to the FFT viewer.
to the FFT viewer. When Particle Statistics is available, **Send to Particle
Statistics** saves the current detections as a calibrated session feature set
for single-image or pooled spatial statistics.

For segmentation-based workflows — particle size statistics, template
matching, classification, lattice extraction — use the **Feature
@@ -119,6 +121,16 @@ optional `features` extra:
pip install "probeflow[features]"
```

For particle spatial-statistics workflows, these detected features and point
ROIs can be analysed from **Measurements → Features → Particle Statistics...**.
Use **Analyze scan points** for real ProbeFlow data, **Learn with tutorial** for
a guided walkthrough, or **Model simulations** to explore synthetic patterns and
null-model behaviour freely before applying the tool to a scan. Particle
Statistics is the newest and least user-tested part of ProbeFlow — treat its
verdicts as exploratory and verify important results independently. See the
[Particle Statistics guide](adstat_user_guide.md)
and the developer-facing [AdStat integration](adstat_integration.md) contract.

## Beyond the basics

* **ROIs** — draw rectangles, ellipses, and lines from the ROI tab; ROIs