4 changes: 2 additions & 2 deletions .github/agents/fink_fat_doc.agent.md
@@ -103,9 +103,9 @@ mathematical identifiers.
#### Reference working example

```rust
/// $$\begin{align} c &= \frac{1}{2}\chi^2\_{\text{pos}} \\ &+ \frac{1}{2}\chi^2\_{\text{vel}} \\ &+ \frac{1}{2}z\_{\text{flux}}^{2} \\ &+ \frac{1}{2}\bigl[\ln(|r\_{\sigma}| + \varepsilon)\bigr]^2 \\ &+ \ln(\varepsilon\_{\text{band}} + b\_{\text{shared}}) \end{align}$$
/// $$\begin{align} c &= \frac{1}{2}\chi^2\_{\text{pos}} \\ &+ \frac{1}{2}\chi^2\_{\text{vel}} \\ &+ \frac{1}{2}z\_{\text{mag}}^{2} \\ &+ \frac{1}{2}\bigl[\ln(|r\_{\sigma}| + \varepsilon)\bigr]^2 \\ &+ \ln(\varepsilon\_{\text{band}} + b\_{\text{shared}}) \end{align}$$
///
/// where $r\_{\sigma}$ is `flux_std_ratio` and $b\_{\text{shared}} \in \{0, 1\}$.
/// where $r\_{\sigma}$ is `mag_std_ratio` and $b\_{\text{shared}} \in \{0, 1\}$.
```

#### Summary of LaTeX rules
51 changes: 45 additions & 6 deletions crates/fink-fat-engine/README.md
@@ -77,6 +77,37 @@ SavePersistedData ← flush alerts, seeds, edge journal, state
Progress is reported through the `PipelineHooks` trait, which can be backed
by any progress-bar or logging implementation.

### `IngestNights` input format

The `IngestNights` stage expects a Parquet dataset containing one row per
photometric detection. The loader reads the file through DataFusion and
projects only the columns required to build an `AlertStore`.

The default schema expected by the engine is:

| Column | Type | Description |
|---|---|---|
| `night_id` | `u32` | Integer night identifier used to group alerts into nightly batches. |
| `dia_source_id` | `u64` | Upstream unique detection identifier. |
| `ra` | `f64` | Right ascension in radians. |
| `ra_err` | `f64` | Right ascension uncertainty in radians. |
| `dec` | `f64` | Declination in radians. |
| `dec_err` | `f64` | Declination uncertainty in radians. |
| `mjd_tt` | `f64` | Observation epoch in MJD TT days. |
| `mag` | `f64` | PSF difference magnitude. |
| `mag_err` | `f64` | Uncertainty on `mag`. |
| `band` | `u8` | Photometric band code. |
| `observer_mpc_code` | string | MPC observatory code associated with the detection. |

All of these columns are required, and the loader rejects any row with a null
value in a required field. If a dataset uses different column names, the loader
can be configured programmatically through `AlertParquetColumns`; the table
above describes only the default layout.

The file may contain multiple nights in a single Parquet dataset. `night_id` is
used to partition rows into per-night alert stores before the downstream stages
run.
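
The null check and per-night partitioning described above can be sketched with the standard library alone. This is a minimal illustration, not the engine's loader: `RawRow`, `Detection`, `validate`, and `partition_by_night` are hypothetical names, DataFusion is not involved, and treating a null as a hard error for the whole load (rather than skipping the row) is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw row as it might come out of a Parquet reader.
/// Every column is optional because Parquet columns can be nullable.
#[derive(Debug, Clone, Default)]
pub struct RawRow {
    pub night_id: Option<u32>,
    pub dia_source_id: Option<u64>,
    pub ra: Option<f64>,
    pub ra_err: Option<f64>,
    pub dec: Option<f64>,
    pub dec_err: Option<f64>,
    pub mjd_tt: Option<f64>,
    pub mag: Option<f64>,
    pub mag_err: Option<f64>,
    pub band: Option<u8>,
    pub observer_mpc_code: Option<String>,
}

/// Hypothetical validated detection: same columns, none nullable.
#[derive(Debug, Clone, PartialEq)]
pub struct Detection {
    pub night_id: u32,
    pub dia_source_id: u64,
    pub ra: f64,
    pub ra_err: f64,
    pub dec: f64,
    pub dec_err: f64,
    pub mjd_tt: f64,
    pub mag: f64,
    pub mag_err: f64,
    pub band: u8,
    pub observer_mpc_code: String,
}

/// Reject a row if any required column is null.
/// The error is the name of the first missing column.
pub fn validate(row: RawRow) -> Result<Detection, &'static str> {
    Ok(Detection {
        night_id: row.night_id.ok_or("night_id")?,
        dia_source_id: row.dia_source_id.ok_or("dia_source_id")?,
        ra: row.ra.ok_or("ra")?,
        ra_err: row.ra_err.ok_or("ra_err")?,
        dec: row.dec.ok_or("dec")?,
        dec_err: row.dec_err.ok_or("dec_err")?,
        mjd_tt: row.mjd_tt.ok_or("mjd_tt")?,
        mag: row.mag.ok_or("mag")?,
        mag_err: row.mag_err.ok_or("mag_err")?,
        band: row.band.ok_or("band")?,
        observer_mpc_code: row.observer_mpc_code.ok_or("observer_mpc_code")?,
    })
}

/// Group validated detections by `night_id`, keeping nights in ascending order.
pub fn partition_by_night(
    rows: Vec<RawRow>,
) -> Result<BTreeMap<u32, Vec<Detection>>, &'static str> {
    let mut nights = BTreeMap::new();
    for row in rows {
        let detection = validate(row)?;
        nights
            .entry(detection.night_id)
            .or_insert_with(Vec::new)
            .push(detection);
    }
    Ok(nights)
}
```

A `BTreeMap` is used here so that nights iterate in ascending `night_id` order, which is convenient when downstream stages process nights sequentially.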

---

## Data model
@@ -90,7 +121,7 @@ An `Alert` represents a single photometric detection:
| `ra`, `dec` | `f64` | radians, ICRS J2000 |
| `ra_err`, `dec_err` | `f64` | radians, 1σ |
| `mjd_tt` | `f64` | MJD TT (days) |
| `flux`, `flux_err` | `f64` | PSF difference flux (upstream-dependent) |
| `mag`, `mag_err` | `f64` | PSF difference magnitude (upstream-dependent) |
| `band` | `u8` | photometric band code (LSST: u=0 … y=5) |
| `dia_source_id` | `u64` | upstream unique detection identifier |
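
Because positions and their uncertainties are stored in radians, upstream values given in degrees or arcseconds need converting before they are placed in an `Alert`. The helpers below are a hypothetical sketch of that conversion (the function names are not part of the crate) and use only `f64::to_radians` and `f64::to_degrees` from the standard library.

```rust
/// Convert an angle in degrees (e.g. a catalogue RA or Dec) to radians.
pub fn deg_to_rad(deg: f64) -> f64 {
    deg.to_radians()
}

/// Convert an uncertainty in arcseconds to radians.
/// One degree is 3600 arcseconds.
pub fn arcsec_to_rad(arcsec: f64) -> f64 {
    (arcsec / 3600.0).to_radians()
}

/// Convert an angle in radians back to arcseconds, e.g. for reporting.
pub fn rad_to_arcsec(rad: f64) -> f64 {
    rad.to_degrees() * 3600.0
}
```

For scale, an uncertainty of `1.0e-6` rad corresponds to roughly 0.2 arcsec.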

@@ -149,6 +180,11 @@ export FINK_FAT__EDGES__TOP_K_PER_LEFT=64

### Top-level structure

Alert photometry is ingested and propagated as `mag` / `mag_err`.
The configuration key is `max_mag_difference`.
Edge feature names still use historical `mag` wording
(`z_mag`, `mag_std_ratio`) to preserve the public feature schema.

```yaml
version: 1
max_gap_nights: 2 # maximum inter-night gap considered for linking
@@ -157,13 +193,13 @@ storage_path: "storage/" # root for on-disk persistence
pairs:
max_dt: "86.4 min"
max_angular_speed: "35 arcmin/day"
max_flux_difference: 2.5
max_mag_difference: 2.5

triplets:
max_dt_between: "30 min"
max_pair_sep: "10 arcmin"
max_predicted_residual: "5 arcmin"
max_flux_difference: 2.5
max_mag_difference: 2.5

edges:
top_k_per_left: 32
@@ -264,13 +300,16 @@ growth that plagues the pure Gaussian model.

Added unconditionally regardless of the kinematic variant:

$$c\_{\mathrm{phot}} = \frac{1}{2} z\_{\mathrm{flux}}^{2} + \frac{1}{2}\bigl[\ln(|r\_{\sigma}| + \varepsilon)\bigr]^{2} + b\_{\mathrm{band}}$$
$$c\_{\mathrm{phot}} = \frac{1}{2} z\_{\mathrm{mag}}^{2} + \frac{1}{2}\bigl[\ln(|r\_{\sigma}| + \varepsilon)\bigr]^{2} + b\_{\mathrm{band}}$$

where $z\_{\mathrm{flux}}$ is the flux z-score between the two seeds,
$r\_{\sigma}$ is the ratio of their flux standard deviations, and
where $z\_{\mathrm{mag}}$ is the magnitude z-score between the two seeds,
$r\_{\sigma}$ is the ratio of their magnitude standard deviations, and
$b\_{\mathrm{band}} = 0$ when both seeds share a photometric band,
$b\_{\mathrm{band}} \approx 6.9$ otherwise.

The underlying Rust feature fields still use the historical names
`z_mag` and `mag_std_ratio` to preserve the public feature ordering.

For full implementation details see
[`edge_features`](https://docs.rs/fink-fat-engine/latest/fink_fat_engine/graph/edge/edge_features/index.html)
and
130 changes: 74 additions & 56 deletions crates/fink-fat-engine/benches/generate_topk_edges.rs
@@ -33,27 +33,27 @@ use smallvec::SmallVec;
/// - This is **not** intended to be physically accurate: the goal is to generate
/// stable, deterministic inputs that exercise the linking code paths.
/// - RA/Dec errors are fixed to a small constant to keep the seed model stable.
/// - Flux errors are a simple proportional rule-of-thumb (never below 1).
/// - Mag errors are a simple proportional rule-of-thumb (never below 1).
fn make_alert(
dia_source_id: u64,
ra_rad: f64,
dec_rad: f64,
mjd_tt: f64,
band: u8,
flux: f64,
mag: f64,
) -> Alert {
Alert {
key: AlertKey {
night_id: NightId(0),
dia_source_id: dia_source_id,
dia_source_id,
},
ra: ra_rad,
ra_err: 1.0e-6, // ~0.2 arcsec in radians
dec: dec_rad,
dec_err: 1.0e-6,
mjd_tt,
flux,
flux_err: (0.1 * flux.abs()).max(1.0),
mag,
mag_err: (0.1 * mag.abs()).max(1.0),
band,
observer_mpc_code: Arc::new("I41".to_string()),
}
@@ -99,13 +99,7 @@ fn make_seeds_pair_model(
rng: &mut StdRng,
night_id: NightId,
num_seeds: usize,
start_mjd_tt: f64,
seed_time_step_days: f64,
start_ra_rad: f64,
start_dec_rad: f64,
ra_drift_rad_per_seed: f64,
dec_drift_rad_per_seed: f64,
max_speed_rad_per_day: Option<f64>,
spec: SeedSeriesSpec,
) -> Vec<SeedNode> {
let mut seed_store: SeedStore = SeedStore::new();

@@ -114,23 +108,24 @@ fn make_seeds_pair_model(
// This avoids lifetime issues because SeedNode::from_pair stores references.
// Criterion benchmarks build inputs once, so this is a pragmatic solution.
for seed_index in 0..num_seeds {
let time_alert_a = start_mjd_tt + (seed_index as f64) * seed_time_step_days;
let time_alert_b = time_alert_a + (seed_time_step_days * 0.5).max(1e-6);
let time_alert_a = spec.start_mjd_tt + (seed_index as f64) * spec.seed_time_step_days;
let time_alert_b = time_alert_a + (spec.seed_time_step_days * 0.5).max(1e-6);

let ra_jitter = (rng.random::<f64>() - 0.5) * 1e-4;
let dec_jitter = (rng.random::<f64>() - 0.5) * 1e-4;

let ra_a = start_ra_rad + (seed_index as f64) * ra_drift_rad_per_seed + ra_jitter;
let dec_a = start_dec_rad + (seed_index as f64) * dec_drift_rad_per_seed + dec_jitter;
let ra_a = spec.start_ra_rad + (seed_index as f64) * spec.ra_drift_rad_per_seed + ra_jitter;
let dec_a =
spec.start_dec_rad + (seed_index as f64) * spec.dec_drift_rad_per_seed + dec_jitter;

let ra_b = ra_a + ra_drift_rad_per_seed * 0.5;
let dec_b = dec_a + dec_drift_rad_per_seed * 0.5;
let ra_b = ra_a + spec.ra_drift_rad_per_seed * 0.5;
let dec_b = dec_a + spec.dec_drift_rad_per_seed * 0.5;

let band_a = (seed_index % 2) as u8;
let band_b = ((seed_index + 1) % 2) as u8;

let flux_a = 1000.0 + (rng.random::<f64>() - 0.5) * 50.0;
let flux_b = flux_a + (rng.random::<f64>() - 0.5) * 20.0;
let mag_a = 1000.0 + (rng.random::<f64>() - 0.5) * 50.0;
let mag_b = mag_a + (rng.random::<f64>() - 0.5) * 20.0;

let dia_source_id = 1_000_000 + seed_index as u64;

@@ -140,23 +135,23 @@ fn make_seeds_pair_model(
dec_a,
time_alert_a,
band_a,
flux_a,
mag_a,
)));
let alert_b: &'static Alert = Box::leak(Box::new(make_alert(
dia_source_id,
ra_b,
dec_b,
time_alert_b,
band_b,
flux_b,
mag_b,
)));

let seed_node = SeedNode::from_pair(
&mut seed_store,
night_id,
alert_a,
alert_b,
max_speed_rad_per_day,
spec.max_speed_rad_per_day,
)
.expect("SeedNode::from_pair failed (speed filter too strict?)");

@@ -167,6 +162,17 @@ fn make_seeds_pair_model(
seed_store.get(&night_id).unwrap_or_default().to_vec()
}

#[derive(Clone, Copy)]
struct SeedSeriesSpec {
start_mjd_tt: f64,
seed_time_step_days: f64,
start_ra_rad: f64,
start_dec_rad: f64,
ra_drift_rad_per_seed: f64,
dec_drift_rad_per_seed: f64,
max_speed_rad_per_day: Option<f64>,
}

/// Construct the spatial + time binners used by `SeedNode::score_edge_candidates`.
///
/// Parameters
@@ -231,26 +237,30 @@ fn bench_generate_topk_edges_end_to_end(c: &mut Criterion) {
&mut rng,
left_night,
num_left_seeds,
60000.0,
2.0 / 1440.0, // 2-minute spacing
1.0,
0.5,
5e-5,
2e-5,
None,
SeedSeriesSpec {
start_mjd_tt: 60000.0,
seed_time_step_days: 2.0 / 1440.0, // 2-minute spacing
start_ra_rad: 1.0,
start_dec_rad: 0.5,
ra_drift_rad_per_seed: 5e-5,
dec_drift_rad_per_seed: 2e-5,
max_speed_rad_per_day: None,
},
);

let right_seeds = make_seeds_pair_model(
&mut rng2,
right_night,
num_right_seeds,
60001.0,
2.0 / 1440.0,
1.01,
0.51,
5e-5,
2e-5,
None,
SeedSeriesSpec {
start_mjd_tt: 60001.0,
seed_time_step_days: 2.0 / 1440.0,
start_ra_rad: 1.01,
start_dec_rad: 0.51,
ra_drift_rad_per_seed: 5e-5,
dec_drift_rad_per_seed: 2e-5,
max_speed_rad_per_day: None,
},
);

// Use the minimum right epoch as time origin so bin indices stay small.
@@ -262,8 +272,10 @@ fn bench_generate_topk_edges_end_to_end(c: &mut Criterion) {
let (spatial_binner, _) = make_binners(time_origin);

// Base configuration for the tested function.
let mut edge_config = EdgeConfig::default();
edge_config.top_k_per_left = Some(top_k_per_left);
let edge_config = EdgeConfig {
top_k_per_left: Some(top_k_per_left),
..EdgeConfig::default()
};

group.throughput(Throughput::Elements(
(num_left_seeds * top_k_per_left) as u64,
@@ -372,26 +384,30 @@ fn bench_generate_topk_edges_components(c: &mut Criterion) {
&mut rng1,
left_night,
num_left_seeds,
61000.0,
2.0 / 1440.0,
2.0,
0.3,
6e-5,
3e-5,
None,
SeedSeriesSpec {
start_mjd_tt: 61000.0,
seed_time_step_days: 2.0 / 1440.0,
start_ra_rad: 2.0,
start_dec_rad: 0.3,
ra_drift_rad_per_seed: 6e-5,
dec_drift_rad_per_seed: 3e-5,
max_speed_rad_per_day: None,
},
);

let right_seeds = make_seeds_pair_model(
&mut rng,
right_night,
num_right_seeds,
61001.0,
2.0 / 1440.0,
2.01,
0.31,
6e-5,
3e-5,
None,
SeedSeriesSpec {
start_mjd_tt: 61001.0,
seed_time_step_days: 2.0 / 1440.0,
start_ra_rad: 2.01,
start_dec_rad: 0.31,
ra_drift_rad_per_seed: 6e-5,
dec_drift_rad_per_seed: 3e-5,
max_speed_rad_per_day: None,
},
);

// Use the minimum right epoch as time origin so time-bin indices stay small.
@@ -403,8 +419,10 @@ fn bench_generate_topk_edges_components(c: &mut Criterion) {
let (spatial_binner, time_binner) = make_binners(time_origin);
let right_index = SeedSpatialIndex::build(&right_seeds, &spatial_binner, &time_binner);

let mut edge_config = EdgeConfig::default();
edge_config.top_k_per_left = Some(top_k_per_left);
let edge_config = EdgeConfig {
top_k_per_left: Some(top_k_per_left),
..EdgeConfig::default()
};

// -----------------------------------------------------------------------------
// 1) Candidate generation + scoring per single left seed.
@@ -567,7 +585,7 @@ fn bench_generate_topk_edges_components(c: &mut Criterion) {
/// - sample count,
/// - warmup time,
/// - measurement time,
/// and slightly increase tolerated noise to get actionable numbers quickly.
/// - and slightly increase tolerated noise to get actionable numbers quickly.
fn criterion_config() -> Criterion {
Criterion::default()
.with_plots()