
feat(conformal): cross-sectional conformal inference (vartype = "conformal")#143

Open
xuyiqing wants to merge 14 commits into dev from feat/conformal-inference


@xuyiqing xuyiqing commented Jun 9, 2026

Owner

Summary

Adds vartype = "conformal": a cross-sectional conformal prediction interval for the average treatment effect on the treated. It ranks the treated unit's average post-treatment prediction error against the leave-one-control-out prediction errors of the donor pool, giving a distribution-free interval with no bootstrap draws. The interval is closed-form and never empty.

This implements the sc-conformal paper's Algorithm 1 as an estimator-agnostic fect feature.

Design: Family A (level statistics)

Every interval is built from a level statistic S_i(τ) = |m_i − τ| / scale_i, inverted in closed form to m_tr ± scale_tr · Q, where Q is the conformal (1−α) quantile of the control scores. The interval is never empty, and it is unbounded only when α falls below the resolution floor, α < 1/(Nco+1). (An earlier path/constant-effect score family was removed because it could return an empty acceptance set, which is confusing to report.)
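As an illustration of the closed-form inversion described above, here is a minimal Python sketch (not the package's R code). The function names are hypothetical, and it assumes the control scores are evaluated as |m_i| / scale_i, i.e. with a zero placebo effect for every donor; the rank rule is the ceil((1−α)(n+1))-th order statistic mentioned in the commit notes.

```python
import math

def conformal_quantile(scores, alpha):
    """(1 - alpha) conformal quantile of the control scores.

    Uses the ceil((1 - alpha) * (n + 1))-th order statistic and returns
    math.inf when that rank exceeds n, i.e. when alpha < 1 / (n + 1).
    """
    n = len(scores)
    # Small tolerance guards against floating-point error in the product.
    k = math.ceil((1 - alpha) * (n + 1) - 1e-12)
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def level_interval(m_tr, scale_tr, control_centers, control_scales, alpha):
    """Closed-form interval m_tr +/- scale_tr * Q (hypothetical helper)."""
    # Assumption: donors have zero true effect, so their score is |m_i| / scale_i.
    scores = [abs(m) / s for m, s in zip(control_centers, control_scales)]
    q = conformal_quantile(scores, alpha)
    return (m_tr - scale_tr * q, m_tr + scale_tr * q)
```

Because the interval is always centered on m_tr with half-width scale_tr · Q ≥ 0, it cannot be empty; the only degenerate case is Q = ∞ below the resolution floor.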

Options (orthogonal knobs)

| arg | values | default |
|---|---|---|
| conformal.scale | sd · none · rmspe · mad · diff (model-se reserved) | sd |
| conformal.center | mean · median | mean |
| conformal.weight | cell · unit · precision | cell |
| conformal.band | pointwise · simultaneous | pointwise |
| conformal.cutoff | per-period · pooled | per-period |

The simultaneous (sup-t uniform) band is also always stored in fit$est.att.sim.

Default chosen by simulation

scale = "sd" (the studentized mean) was selected by a coverage study (tests/coverage-study/conformal_default_study.R, 150 reps, true effect 0): it has the highest worst-case scalar coverage (0.920) and the smallest mean width (3.16) across iid / AR(1) / heteroskedastic / nonstationary DGPs. It is decisively tighter than none under heteroskedasticity (width 2.82 vs 5.43 at equal coverage). This default is provisional and easy to change; the final choice is the maintainer's call.

Scope and guards

  • Requires a separated (controls-only) fit and method not in c("mc", "both").
  • The bootstrap F / equivalence diagnostic (diagtest) is skipped for conformal (no draws), as it is for se = FALSE.
  • Group / reversal / weighted (W) / balanced / placebo / carryover options error clearly and direct the user to bootstrap/jackknife.
  • Existing vartypes (bootstrap/jackknife/parametric) are untouched — all changes are gated to the conformal branch / conformal.R.

Known open items (documented, not blocking)

  • Multi-treated calibration ranks an Ntr-aggregate against single controls; the simultaneous band undercovers the nominal level in small samples (~0.77 vs. 0.90), and the cell/unit weights undercover under treated-unit heterogeneity (the precision weight recovers coverage). A placebo-average calibration is the future refinement.
  • conformal.scale = "model-se" is reserved (needs estimator prediction-SE plumbing); it stops with a clear message.

Verification

  • tests/testthat/test-conformal.R: 55 tests pass (core engine, closed-form interval, all scales/centers/weights, pointwise + simultaneous bands, multi-treated weights, end-to-end fect() integration, coverage guards).
  • print() and plot(type = "gap"|"counterfactual") work on conformal fits.
  • Coverage study results: tests/coverage-study/results/conformal_default_study.md.
  • Full testthat suite (all files, serial): 0 failures, 0 warnings, 0 skips.
  • Inferential coverage study (run_para_error_coverage.R): T19/T20/T21 all PASS (the parametric variance path is unaffected; conformal changes are additive and gated).

Update — additional buildout

  • Staggered adoption is now supported (per-cohort donor pools, union-window calibration, event-time-aggregated bands). Block behavior is preserved exactly. Coverage on a two-cohort staggered DGP (100 reps, effect 0, α=0.10): scalar 0.90 (iid) / 0.94 (ar1); per-period band 0.91 / 0.91; never empty/unbounded.
  • Plotting: plot() renders the event-study and counterfactual for block and staggered conformal fits. The simultaneous band is plottable via conformal.band = "simultaneous" (it fills est.att) or the est.att.sim slot.
  • True inner band: est.att90 is now a genuine 1 − 2α conformal band (was a placeholder mirroring the main band), computed at no extra fitting cost.
  • Print: an Inference: conformal (scale, band, N.calib) line.
  • Joint-band coverage characterized (300 reps): the simultaneous band's joint coverage is ~0.84–0.87 (mildly anti-conservative, in the jackknife+ range), far better than the pointwise band's joint coverage of ~0.66. Full conformal on the max statistic would close the gap and is left as a documented follow-up; the scalar interval is already exact.

Design + session record: statsclaw-workspace/fect/runs/2026-06-08-conformal-inference.md and runs/REQ-conformal/spec.md.

🤖 Generated with Claude Code

xuyiqing and others added 14 commits June 8, 2026 21:31
…erval inversion)

Phase 1 (part 1) of vartype='conformal'. R/conformal.R: .conformal_score (studentized default; ratio = Abadie post/pre RMSPE; meanabs; rmse), .conformal_pval (weighted-capable), .conformal_ci_meanabs (closed-form att.avg), .conformal_ci_grid (refit-free grid inversion for the other scores), .conformal_neff. Isolated check: meanabs coverage 0.902 at target 0.90 (Nco=39); grid studentized CI brackets truth; resolution guard unbounds when alpha < 1/(Nco+1). Next: leave-one-control-out calibration via impute_Y0 (staggered-aware via valid_controls) + vartype dispatch in boot.R/default.R; not yet wired.
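The isolated coverage check can be mimicked with a small Monte Carlo sketch. This is a hedged illustration only: it draws iid N(0, 1) centers for 39 donors and one treated unit (true effect 0, no rescaling), so it exercises the rank argument rather than the package's leave-one-control-out pipeline. The function names, seed, and rep count are assumptions.

```python
import math
import random

def covers_zero(n_controls, alpha, rng):
    """One exchangeable draw; True if the level interval covers the true effect 0."""
    # Donor scores |m_i| with scale = "none" (assumed iid N(0, 1) centers).
    controls = [abs(rng.gauss(0.0, 1.0)) for _ in range(n_controls)]
    treated = rng.gauss(0.0, 1.0)
    k = math.ceil((1 - alpha) * (n_controls + 1) - 1e-12)
    if k > n_controls:
        return True  # below the resolution floor: unbounded interval always covers
    q = sorted(controls)[k - 1]
    # 0 lies in [treated - q, treated + q] exactly when |treated| <= q.
    return abs(treated) <= q

def simulated_coverage(n_controls=39, alpha=0.10, reps=2000, seed=1):
    """Share of simulated draws whose interval covers zero."""
    rng = random.Random(seed)
    hits = sum(covers_zero(n_controls, alpha, rng) for _ in range(reps))
    return hits / reps
```

Under exchangeability the theoretical coverage here is k/(n+1) = 36/40 = 0.90, so a simulated value near 0.90 is what the rank argument predicts.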

Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (WIP)

Phase 1 part 2 of vartype='conformal'. R/conformal.R: conformal_calibrate() does the deterministic leave-one-control-out calibration, reusing valid_controls() + impute_Y0() (the held-out, non-resampled analog of draw.error). It runs end to end on fect_boot's internal matrices (N_calib = full donor pool); the fix for the impute_Y0 effect-aggregation error was to pass the real hasRevs rather than 1. R/default.R accepts vartype='conformal' (+ mc guard). R/boot.R adds a conformal branch after the point fit that computes the calibration and currently stops with a diagnostic. tests/testthat/test-conformal.R covers the estimator-agnostic core (scores, meanabs coverage ~0.90, p-value, resolution guard, n_eff) and passes.

Not done (next): wire the result through fect's output slots (eff.calendar/est.att/est.avg) by mirroring the parametric slot assembly (an early return skips the calendar-effect computation fect.default expects); fix the studentized grid CI; validate full-pipeline coverage (the LOO-gap assembly currently yields frequent unbounded intervals to debug; over-coverage is plausibly jackknife+ conservatism); full testthat + coverage-study gates. Existing vartypes are unaffected (conformal branch is gated and early-returns; validation change is additive); bootstrap regression smoke passes.

Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… empties)

Validates and corrects the leave-one-control-out calibration. 200-rep block DGP
(true effect 0, alpha=0.10): total coverage 0.92-0.95, 0 unbounded across all
four scores.

## Item 1 -- unbounded intervals (already fixed; now locked)
The "~29/50 unbounded" note predated the hasRevs plumbing fix. Verified 0
unbounded over 800 calibrate calls. Mild over-coverage (0.915 vs 0.90) is the
treated-vs-control LOO asymmetry (treated gap from the full fit, each control
from an Nco-1 fit) plus jackknife+ conservatism -- safe, not a bug.

## Item 2 -- studentized "flaky NAs" were misdiagnosed
They are correct conformal empties: with per-unit pre-period studentization, a
treated unit that draws a small sd(pre) has a score exceeding every control at
every tau, so the data legitimately reject all constant effects. A wide-grid
(2001-pt) diagnostic put the best-tau p-value at the 1/(Nco+1) floor -> genuine,
not a grid-span miss. Counted as misses, studentized total coverage is nominal.
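The 1/(Nco+1) floor follows directly from the rank-based p-value. A minimal unweighted Python sketch (the helper name is hypothetical; the package's .conformal_pval is described as weighted-capable, which this does not reproduce):

```python
def conformal_pvalue(treated_score, control_scores):
    """Rank-based conformal p-value: (1 + #{S_i >= S_tr}) / (n + 1).

    The treated unit always counts itself, so the smallest attainable
    value is 1 / (n + 1), reached when its score exceeds every control.
    """
    n = len(control_scores)
    at_least = sum(1 for s in control_scores if s >= treated_score)
    return (1 + at_least) / (n + 1)
```

If the treated score exceeds every control at every tau, the p-value sits at the floor everywhere, and any alpha at or above that floor rejects all constant effects.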

## Fixes
- .conformal_ci_grid: dimensional span bug fixed. Span was max|gap| + 6*sd(score),
  mixing the outcome-unit tau grid with score units. Now max|gap| +
  (max score + 1)*denom via new .conformal_denom(e.pre, type) (sd for studentized,
  rms for ratio, 1 for rmse). Added adaptive edge-expansion and explicit +/-Inf on
  a genuinely unbounded side.
- Honest empty handling: empty acceptance returns c(NA,NA) with attr
  "empty"="rejected_all_tau"; conformal_calibrate surfaces status in
  {ok, empty, unbounded} and warns (no more silent NA).
- Default score -> "meanabs" in the boot.R conformal branch (was "studentized").
  meanabs is the closed-form level interval for the average effect: conservative,
  never empty/unbounded above the resolution floor, matches fect's headline
  att.avg. Dispersion scores stay available (calibrated, can be empty).

## Verification
- tests/testthat/test-conformal.R: 33 pass (added grid bracket/empty/unbounded/
  expansion, .conformal_denom, + 2 full-pipeline coverage guards: meanabs never
  empties/unbounds & covers ~nominal; studentized total coverage ~nominal with
  every empty flagged, no silent NA).
- bootstrap & jackknife vartypes smoke-tested unaffected (changes are gated inside
  the vartype=="conformal" branch; conformal.R is all-new functions).

Refs statsclaw runs/2026-06-08-conformal-inference.md (Phase 1 part 3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… empty

Per design decision, remove the path / constant-effect scores (the only ones that
could return an empty acceptance set). Every interval is now a level statistic
S_i(tau) = |m_i - tau| / scale_i, inverted in closed form to m_tr +/- scale_tr*Q.

## Core (R/conformal.R, rewritten)
- .conformal_center (mean | median), .conformal_scale (none | sd | rmspe | mad |
  diff; model-se supplied externally), .conformal_quantile (weighted (1-alpha)
  quantile with the treated +Inf atom; reduces to the ceil((1-alpha)(n+1))-th
  order statistic), .conformal_pval, .conformal_neff, .conformal_ci_level
  (closed form, never empty; unbounded only below the resolution floor).
- Removed .conformal_score, .conformal_ci_meanabs, .conformal_denom,
  .conformal_ci_grid and the empty/status="empty" machinery. The grid-span and
  empty-handling fixes from earlier are retired with them: net simpler.
- conformal_calibrate takes scale/center/weight (was score); returns
  status in {ok, unbounded}. LOO calibration plumbing unchanged.
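A rough Python sketch of a weighted (1-alpha) quantile with the treated +Inf atom, to show why it reduces to the ceil((1-alpha)(n+1))-th order statistic under equal weights. The function name and argument layout are assumptions, not the signature of .conformal_quantile.

```python
import math

def weighted_conformal_quantile(scores, weights, treated_weight, alpha):
    """Smallest score whose cumulative normalized weight reaches 1 - alpha.

    The treated unit contributes an atom of mass treated_weight at +Inf,
    so the result is math.inf when the controls alone cannot reach 1 - alpha.
    """
    total = sum(weights) + treated_weight
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w
        # Tolerance guards against floating-point error in the comparison.
        if cum / total >= 1 - alpha - 1e-12:
            return s
    return math.inf
```

With all weights equal to 1, the cumulative mass after the k-th sorted score is k/(n+1), so the first k that reaches 1-alpha is ceil((1-alpha)(n+1)).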

## Dispatch (R/boot.R)
- conformal branch calls scale = "none" (meanabs, the safe provisional default);
  still stop()s with a diagnostic, replaced by output-slot integration in Phase 2.

## Verification
- tests/testthat/test-conformal.R rewritten for the Family-A API: 30 pass.
- Coverage (200 reps, block DGP, effect 0, alpha=0.10), bad=0 throughout:
  scale=none 0.915/0.895; scale=sd 0.930/0.875 (two seeds). Both calibrated.
- bootstrap & jackknife vartypes smoke-tested unaffected.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Output-slot integration + argument threading. fect(vartype="conformal") now
returns a complete, printable, plottable object; no bootstrap draws.

## conformal_calibrate
- Also returns a per-period pointwise band (calendar-indexed) built from the full
  leave-one-control-out gap matrix: pre rows = placebo/pre-trend band, post rows =
  effect band. Guards model-se scale and non-cell weight as "not yet implemented".

## boot.R conformal branch
- Builds est.avg / est.avg.unit / est.att (event-time via the treated T.on map) /
  est.att90 / att.bound / est.eff.calendar(.fit) + a `conformal` meta slot, then
  returns c(out, result). Nominal S.E. back-derived from the symmetric CI (display
  only). Unsupported options (group/reversals/W/balance/placebo/carryover) error.

## default.R
- Thread conformal.scale/center/weight through fect / fect.formula / fect.default
  and both pass-throughs (+ fect_boot signature); validate the values.
- Skip the bootstrap-based diagtest (F/equivalence test) for conformal, which has
  no draws (consistent with se=FALSE); fixes an apply(!is.na(att.boot)) error.

## Verification
- 40 tests pass (test-conformal.R), incl. 2 fect() end-to-end integration tests.
- print() and plot(gap|counterfactual) work; 5 scales (none/sd/rmspe/mad/diff) +
  median center give distinct finite intervals; CI covers the true ATT.
- bootstrap / jackknife / parametric unaffected; their test.out still produced.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Complete the knob set. Weights aggregate across treated units; model-se scale is
deferred (needs estimator prediction-SE plumbing) and stops with a clear message.

## conformal_calibrate
- Scalar ATT = weighted mean of per-unit post centers; per-period band uses the
  matching weighted treated trajectory. cell = per-treated-cell (n_i),
  unit = equal per unit, precision = inverse pre-period variance. Precision is
  decoupled from the `scale` knob so it down-weights noisy units even under the
  default scale = "none".
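To make the three weighting schemes concrete, a small Python sketch under stated assumptions: weights are normalized to sum to one, n_post_cells is each treated unit's count of post-treatment cells, and pre_variances is each unit's pre-period residual variance. The helper names are hypothetical.

```python
def treated_weights(n_post_cells, pre_variances, kind):
    """Normalized weights across treated units for the scalar ATT."""
    if kind == "cell":
        raw = [float(n) for n in n_post_cells]      # per-treated-cell (n_i)
    elif kind == "unit":
        raw = [1.0] * len(n_post_cells)             # equal per unit
    elif kind == "precision":
        raw = [1.0 / v for v in pre_variances]      # inverse pre-period variance
    else:
        raise ValueError(f"unknown weight: {kind}")
    total = sum(raw)
    return [r / total for r in raw]

def weighted_att(unit_centers, weights):
    """Weighted mean of the per-unit post-treatment centers."""
    return sum(w * m for w, m in zip(weights, unit_centers))
```

For example, two treated units with 1 and 3 post cells and pre-period variances 1 and 4 get cell weights (0.25, 0.75), unit weights (0.5, 0.5), and precision weights (0.8, 0.2), so the noisier unit is down-weighted only under precision.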

## boot.R
- Report the conformal weighted center (cc$att / band eff) as the point estimates,
  so the point and the symmetric CI share one estimand. For weight = "cell" this
  equals fect's canonical att.avg (verified to 1e-6).

## Scope
- scale: none/sd/rmspe/mad/diff work; model-se deferred (clear stop()).
- center: mean/median; per-horizon is the band (always produced).
- weight: cell/unit/precision.

## Verification
- 48 tests pass: added multi-treated weight + precision-down-weighting tests.
- Existing vartypes untouched (all changes gated to the conformal path).

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e 4)

## Bands
- conformal.band: "pointwise" (default) | "simultaneous"; conformal.cutoff:
  "per-period" (default) | "pooled". Threaded through all signatures + validated.
- pointwise per-period: treated gap ranked vs control gaps at that period.
- pooled: one cutoff over all post-window standardized control gaps.
- simultaneous: one multiplier c = conformal quantile of the per-control MAX
  standardized gap over the post window (sup-t uniform band); pre rows keep the
  per-period placebo band.
- est.att carries the selected band; est.att.sim (uniform band, event-time
  mapped) is always stored. conformal meta gains band.type / cutoff.
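The simultaneous multiplier can be sketched in Python as follows. This is an illustration under assumptions: each control is standardized by a single per-unit scale, control_gaps[i] holds control i's post-window gaps, and the function names are hypothetical.

```python
import math

def sup_t_multiplier(control_gaps, control_scales, alpha):
    """Conformal quantile of each control's MAX standardized post-window gap."""
    max_scores = [max(abs(g) / s for g in gaps)
                  for gaps, s in zip(control_gaps, control_scales)]
    n = len(max_scores)
    k = math.ceil((1 - alpha) * (n + 1) - 1e-12)
    if k > n:
        return math.inf  # below the resolution floor
    return sorted(max_scores)[k - 1]

def simultaneous_band(treated_gaps, scale_tr, c):
    """Uniform band: the same multiplier c is applied at every post period."""
    return [(g - scale_tr * c, g + scale_tr * c) for g in treated_gaps]
```

Because one multiplier is calibrated on the maximum over the window, the band is at least as wide as a pointwise per-period band built from the same controls.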

## Coverage (150 reps, block, effect 0, alpha=0.10)
- pointwise per-period 0.879; pointwise JOINT 0.440 (multiple-comparison failure).
- simultaneous JOINT 0.773: materially better jointly, but undercovers the nominal
  level in small N (near the jackknife+ floor of 1-2*alpha, compounded by the
  treated-vs-control LOO asymmetry). A symmetric treated-gap fix is flagged for the
  simulation phase.

## Verification
- 55 tests pass (added band-width, est.att.sim, pooled, joint-coverage tests).
  Loopy tests use parallel = FALSE (avoids PSOCK cluster exhaustion).
- bootstrap / jackknife unaffected.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hase 5/6)

## Default chosen by simulation
- tests/coverage-study/conformal_default_study.R sweeps scale x DGP
  (iid/ar1/hetero/nonstat, 150 reps, true effect 0) + a weight sweep on a
  staggered+hetero DGP. Results in results/conformal_default_study.md.
- scale = "sd" wins: highest worst-case scalar coverage (0.920) and smallest
  mean width (3.16). Decisive under heteroskedasticity: none over-covers at
  width 5.43 vs sd 0.920 at 2.82. Set as the provisional default across all
  signatures (flagged for the user's final call).
- Weight finding: under treated-unit heterogeneity cell/unit undercover (0.873)
  and precision recovers (0.993); cell stays the default estimand.
- bad = 0 throughout (never empty/unbounded, as designed).

## Docs
- man/fect.Rd: document vartype = "conformal" + the five conformal.* args; usage
  block updated.
- NEWS.md: 2.4.6 entry. DESCRIPTION: 2.4.5 -> 2.4.6, date 2026-06-09.

## Verification
- 55 conformal tests pass with the new sd default.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 5/6 docs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ands)

The calibration now handles staggered onsets correctly, not just block /
common-onset. Block behavior is preserved exactly (verified by the existing
tests).

## conformal.R
- Treatment-incidence structure (nt[t], union post window, per-unit onsets);
  control leave-one-out fits mask the UNION post window so donor gaps are
  held-out wherever any treated unit is post.
- Per-treated-unit summaries over each unit's OWN post/pre window (was the union
  window for every unit, which mixed post-effect and pre-residual cells under
  staggering). Treated trajectory is the per-period weighted mean over the units
  that are post (or, where none are, pre) at t.
- Control placebo center = the control's gap aggregated over the union post
  window the same nt-weighted way as the treated ATT (block: a plain mean).
- New EVENT-TIME band: aggregates cohorts at each relative period, pooling the
  control gaps at the matching calendar cells. For block it equals the calendar
  band re-indexed. Returned as band.et / band.et.sim alongside the calendar band.

## boot.R
- est.att / est.att.sim now come from the event-time band (correct for staggered),
  matched to out$time by relative-time value, replacing the single-unit T.on map.

## Verification
- Staggered coverage (2 cohorts, 100 reps, true effect 0, alpha=0.10): scalar
  0.900 (iid) / 0.940 (ar1); per-period band 0.911 / 0.906; bad=0.
- 58 conformal tests pass (added a staggered run/alignment/coverage test); all
  block tests unchanged.

Refs statsclaw runs/2026-06-08-conformal-inference.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
plot.fect already renders conformal fits correctly (gap + counterfactual), and
the simultaneous band is plottable via conformal.band = "simultaneous" (it
populates est.att) or the est.att.sim slot -- no plot.R change needed. Adds a
test that all plot types build a ggplot for block + staggered + simultaneous
fits, and that the uniform band is wider than the pointwise band post-treatment.

Verification figures (block/staggered gap + counterfactual, pointwise-vs-
simultaneous overlay) saved under the sc-conformal paper folder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
est.att90 (the inner equivalence band shown in plots) was a placeholder that
mirrored the main band. It is now a genuine narrower conformal band at level
1 - 2*alpha, computed from the same control scores at no extra fitting cost (the
band builders are parameterized by level; the outer alpha and inner 2*alpha bands
reuse one calibration pass). att.bound now derives from this inner band.

Verification: alpha=0.05 gives a 95% main band and a narrower 90% inner band
(widths 5.20 vs 4.02); 61 conformal tests pass; block bands at alpha unchanged.
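A short Python sketch of why the inner band costs no extra fitting: both levels read different order statistics off the same sorted control scores. The function name is hypothetical, and the scores are assumed to come from a single calibration pass.

```python
import math

def nested_half_widths(control_scores, scale_tr, alpha):
    """Half-widths of the outer (1 - alpha) and inner (1 - 2*alpha) bands."""
    ordered = sorted(control_scores)  # one calibration pass, sorted once
    n = len(ordered)

    def quantile(a):
        k = math.ceil((1 - a) * (n + 1) - 1e-12)
        return math.inf if k > n else ordered[k - 1]

    outer = scale_tr * quantile(alpha)
    inner = scale_tr * quantile(2 * alpha)
    return outer, inner
```

The inner rank is never larger than the outer rank, so the inner band is never wider than the main band.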

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…inner

- print.fect shows "Inference: conformal (scale = ..., band = ..., N.calib = ...)"
  for conformal fits.
- NEWS 2.4.6 + man/fect.Rd note staggered-adoption support, event-study/
  counterfactual plotting, and the true inner (1-2*alpha) est.att90 band.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…red+X test

With p > 0 and vartype = 'conformal', fect returns no est.beta/est.marginal
(no coefficient draws), and the Xname labelling in default.R crashed with
'attempt to set rownames on an object with no dimensions'. Guard on
!is.null() so any vartype that omits inference objects skips labelling.
Adds a staggered + covariate conformal regression test (asserts finite ATT,
ordered CI, point beta present, est.beta absent). Also makes the test file
runnable against the installed package: fect::fect in the new test,
fect:::conformal_calibrate in the coverage helper.
…to-default

- conformal.fit is now user-facing (was dead code: defined in
  conformal_calibrate but never threaded through fect()). Threaded through
  all three signatures -> fect_boot -> conformal_calibrate, and made
  two-sided: each treated unit is scored through the same custom learner
  (own pre-window, T0 = onset - 1), not just the held-out donors.
  Validation: a quadprog simplex-SC learner reproduces the hand-rolled
  Proposition 99 conformal interval exactly (ATT -19.5136,
  CI95 [-53.6576, 14.6304], p = 0.0769).
- time.component.from defaults to NULL = auto: resolves to 'nevertreated'
  under vartype='conformal' with method fe/ife/cfe (strict separation,
  matching the paper's Algorithm 1; message emitted) and 'notyettreated'
  otherwise; explicit 'notyettreated' + conformal warns (weak separation).
- Tests: custom-learner ATT equality (block), auto-resolution equivalence +
  warning capture (ife); the conformal test file now runs against the
  installed package.
- Docs: fect.Rd (conformal.fit entry; tcf NULL semantics), NEWS.
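The auto-resolution rule for time.component.from, as described in this commit, can be summarized in a Python sketch. The function name and the (value, notice) return shape are assumptions made for illustration; the actual R code may structure this differently.

```python
def resolve_time_component_from(value, vartype, method):
    """Return (resolved_value, notice), notice in {None, "message", "warning"}."""
    conformal = vartype == "conformal"
    if value is None:
        # Auto: strict separation for conformal with fe/ife/cfe.
        if conformal and method in ("fe", "ife", "cfe"):
            return "nevertreated", "message"
        return "notyettreated", None
    # Explicit weak separation under conformal is allowed but flagged.
    if conformal and value == "notyettreated":
        return value, "warning"
    return value, None
```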

GATE NOTE: conformal test file fully green; full testthat suite re-run
pending (two session-killed attempts, zero failures at both kill points);
finish before merging PR #143.
@xuyiqing

Owner Author

Overnight additions (2 commits, 9b79579 + 00483a3):

  • Fix: vartype="conformal" + covariates crashed in output labelling (no est.beta under conformal); null-guarded, staggered+covariate regression test added.
  • conformal.fit made user-facing and two-sided (it was unreachable from fect()): custom separated learners now score both the held-out donors and each treated unit (own pre-window). A quadprog simplex-SC learner reproduces the hand-rolled Proposition 99 conformal interval exactly (ATT -19.5136, CI [-53.6576, 14.6304], p = 0.0769).
  • time.component.from defaults to NULL = auto: resolves to "nevertreated" under conformal + fe/ife/cfe (strict separation, matches the paper's Algorithm 1; message emitted); explicit "notyettreated" warns.

Merge gate still open: the full testthat suite re-run was interrupted twice by session ends (zero failures at both kill points; the conformal file is fully green standalone). Run NOT_CRAN=true TESTTHAT_CPUS=10 Rscript -e 'library(fect); testthat::test_dir("tests/testthat", reporter="summary")' to completion before merging.

