feat(conformal): cross-sectional conformal inference (vartype = "conformal") by xuyiqing · Pull Request #143 · xuyiqing/fect

xuyiqing · 2026-06-09T11:51:38Z

Summary

Adds vartype = "conformal": a cross-sectional conformal prediction interval for the average treatment effect on the treated. It ranks the treated unit's average post-treatment prediction error against the leave-one-control-out prediction errors of the donor pool, giving a distribution-free interval with no bootstrap draws. The interval is closed-form and never empty.

This implements the sc-conformal paper's Algorithm 1 as an estimator-agnostic fect feature.
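The leave-one-control-out ranking described in the Summary can be sketched in base R. This is a toy illustration only: the simulated data, the additive learner `predict_y0`, and every variable name are assumptions made for the example, not the fect internals or the paper's exact algorithm.

```r
# Toy leave-one-control-out calibration (illustrative; not the fect internals).
set.seed(1)
n_co <- 30; t_pre <- 10; t_post <- 5
tt <- t_pre + t_post
pre <- 1:t_pre
post <- (t_pre + 1):tt

# Simulated outcomes: unit effect + time effect + noise. Row 1 plays the
# treated unit (true effect 0); rows 2..(n_co + 1) are controls.
alpha_i <- rnorm(n_co + 1)
xi_t <- rnorm(tt)
Y <- outer(alpha_i, xi_t, "+") + matrix(rnorm((n_co + 1) * tt), n_co + 1, tt)

# Hypothetical learner: predict unit i's untreated path as the donor mean,
# shifted by the pre-period level difference between unit i and the donors.
predict_y0 <- function(Y, i, donors, pre) {
  donor_mean <- colMeans(Y[donors, , drop = FALSE])
  shift <- mean(Y[i, pre]) - mean(donor_mean[pre])
  donor_mean + shift
}

controls <- 2:(n_co + 1)

# Treated unit: average post-period gap, using the full donor pool.
m_tr <- mean((Y[1, ] - predict_y0(Y, 1, controls, pre))[post])

# Each control: hold it out, predict it from the remaining controls,
# and average its post-period gap.
m_co <- vapply(controls, function(i) {
  mean((Y[i, ] - predict_y0(Y, i, setdiff(controls, i), pre))[post])
}, numeric(1))

# Conformal p-value for H0: effect = 0 (rank of |m_tr| among the control |m_i|).
p_val <- (1 + sum(abs(m_co) >= abs(m_tr))) / (n_co + 1)
```

No resampling is involved: the only randomness is in the simulated data, and the p-value can never fall below 1/(n_co + 1).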

Design: Family A (level statistics)

Every interval comes from a level statistic S_i(τ) = |m_i − τ| / scale_i, inverted in closed form to m_tr ± scale_tr · Q, where m_tr and scale_tr are the treated center and scale and Q is the conformal (1−α) quantile of the control scores. The interval is never empty, and it is unbounded only when α falls below the resolution floor, α < 1/(Nco+1). (An earlier path/constant-effect score family was removed because it could return an empty acceptance set, which is confusing to report.)
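A minimal base-R sketch of the closed-form inversion, assuming the control scores are the placebo statistics |m_i| / scale_i and that Q is the ceil((1−α)(n+1))-th order statistic, as the commit notes describe. The function names `conformal_quantile` and `level_interval` are hypothetical and are not the package's internal helpers.

```r
# Conformal (1 - alpha) quantile of n control scores. Returns Inf when the
# required rank exceeds n, i.e. when alpha < 1 / (n + 1) (the resolution floor).
conformal_quantile <- function(scores, alpha) {
  n <- length(scores)
  k <- ceiling((1 - alpha) * (n + 1))
  if (k > n) return(Inf)
  sort(scores)[k]
}

# Closed-form level interval: m_tr +/- scale_tr * Q.
# Assumption: controls have no effect, so their scores are |m_i| / scale_i.
level_interval <- function(m_tr, scale_tr, m_co, scale_co, alpha = 0.10) {
  q <- conformal_quantile(abs(m_co) / scale_co, alpha)
  c(lower = m_tr - scale_tr * q, upper = m_tr + scale_tr * q)
}

# Nine controls, alpha = 0.25: rank ceiling(0.75 * 10) = 8, so Q = 4.
level_interval(2, 0.5, c(-4, -3, -2, -1, 1, 2, 3, 4, 5), rep(1, 9), alpha = 0.25)
# lower = 0, upper = 4

# Three controls, alpha = 0.10 < 1 / 4: below the resolution floor.
conformal_quantile(c(1, 2, 3), 0.10)
# Inf
```

Because the interval is centered at m_tr and its half-width is non-negative, it always contains the point estimate and therefore cannot be empty.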

Options (orthogonal knobs)

| arg | values | default |
|---|---|---|
| `conformal.scale` | `sd` · `none` · `rmspe` · `mad` · `diff` (`model-se` reserved) | `sd` |
| `conformal.center` | `mean` · `median` | `mean` |
| `conformal.weight` | `cell` · `unit` · `precision` | `cell` |
| `conformal.band` | `pointwise` · `simultaneous` | `pointwise` |
| `conformal.cutoff` | `per-period` · `pooled` | `per-period` |

The simultaneous (sup-t uniform) band is also always stored in fit$est.att.sim.
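To illustrate how the pointwise and simultaneous cutoffs differ, here is a hedged sketch on a hypothetical matrix `Z` of standardized leave-one-out control gaps (controls in rows, post periods in columns). The matrix is simulated and the helper name is an assumption; the package's own band builders may differ in detail.

```r
# Conformal (1 - alpha) quantile: the ceiling((1 - alpha)(n + 1))-th order
# statistic, or Inf when alpha is below the resolution floor 1 / (n + 1).
conformal_quantile <- function(scores, alpha) {
  n <- length(scores)
  k <- ceiling((1 - alpha) * (n + 1))
  if (k > n) return(Inf)
  sort(scores)[k]
}

set.seed(2)
n_co <- 40; n_post <- 6
alpha <- 0.10

# Hypothetical standardized control gaps (simulated for the example).
Z <- matrix(rnorm(n_co * n_post), n_co, n_post)

# Pointwise, per-period: one cutoff per post period, from that period's gaps.
q_pointwise <- apply(abs(Z), 2, conformal_quantile, alpha = alpha)

# Pooled: a single cutoff over all post-window control gaps.
q_pooled <- conformal_quantile(as.vector(abs(Z)), alpha)

# Simultaneous (sup-t): a single multiplier from each control's MAX gap
# over the post window.
q_sim <- conformal_quantile(apply(abs(Z), 1, max), alpha)
```

Each control's maximum dominates its gap in every single period, so `q_sim` is never smaller than any entry of `q_pointwise`; that is why the uniform band is at least as wide as the pointwise band.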

Default chosen by simulation

scale = "sd" (the studentized mean) was selected by a coverage study (tests/coverage-study/conformal_default_study.R, 150 reps, true effect 0): it has the highest worst-case scalar coverage (0.920) and the smallest mean width (3.16) across iid / AR(1) / heteroskedastic / nonstationary DGPs. Under heteroskedasticity it is decisively tighter than none (width 2.82 vs 5.43): sd covers at 0.920 there, while none over-covers. This default is provisional and easy to change; the final choice is the maintainer's call.

Scope and guards

Requires a separated (controls-only) fit and method not in c("mc", "both").
The bootstrap F / equivalence diagnostic (diagtest) is skipped for conformal (no draws), as it is for se = FALSE.
Group / reversal / weighted (W) / balanced / placebo / carryover options error clearly and direct the user to bootstrap/jackknife.
Existing vartypes (bootstrap/jackknife/parametric) are untouched — all changes are gated to the conformal branch / conformal.R.

Known open items (documented, not blocking)

Multi-treated calibration ranks an Ntr-aggregate against single controls; the simultaneous band undercovers nominal in small N (~0.77 vs 0.90), and cell/unit weights undercover under treated-unit heterogeneity (precision recovers it). A placebo-average calibration is the future refinement.
conformal.scale = "model-se" is reserved (needs estimator prediction-SE plumbing); it stops with a clear message.
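The three weighting schemes behind the cell/unit/precision comparison can be sketched as follows, assuming cell weights are proportional to each treated unit's number of post-treatment cells, unit weights are equal, and precision weights are inverse pre-period variances. The helper `treated_weights` and the numbers are hypothetical.

```r
# Normalized weights over treated units for the scalar ATT.
treated_weights <- function(n_post, pre_var, type = c("cell", "unit", "precision")) {
  type <- match.arg(type)
  w <- switch(type,
    cell = n_post,                       # proportional to treated cells per unit
    unit = rep(1, length(n_post)),       # every treated unit counts equally
    precision = 1 / pre_var              # down-weight noisy pre-periods
  )
  w / sum(w)
}

# Three treated units; the third is short-lived, noisy, and has a large center.
centers <- c(1, 2, 6)     # per-unit post-period centers
n_post  <- c(5, 5, 2)     # post-treatment cells per unit
pre_var <- c(1, 1, 16)    # pre-period variances

att <- vapply(c("cell", "unit", "precision"), function(k) {
  sum(treated_weights(n_post, pre_var, k) * centers)
}, numeric(1))
att
# cell = 2.25, unit = 3, precision = 18/11 (about 1.64)
```

In this made-up example the precision scheme pulls the estimate toward the two well-measured units, which is the mechanism the open item refers to.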

Verification

tests/testthat/test-conformal.R: 55 tests pass (core engine, closed-form interval, all scales/centers/weights, pointwise + simultaneous bands, multi-treated weights, end-to-end fect() integration, coverage guards).
print() and plot(type = "gap"|"counterfactual") work on conformal fits.
Coverage study results: tests/coverage-study/results/conformal_default_study.md.
Full testthat suite (all files, serial): 0 failures, 0 warnings, 0 skips.
Inferential coverage study (run_para_error_coverage.R): T19/T20/T21 all PASS (the parametric variance path is unaffected; conformal changes are additive and gated).

Update — additional buildout

Staggered adoption is now supported (per-cohort donor pools, union-window calibration, event-time-aggregated bands). Block behavior is preserved exactly. Coverage on a two-cohort staggered DGP (100 reps, effect 0, α=0.10): scalar 0.90 (iid) / 0.94 (ar1); per-period band 0.91 / 0.91; never empty/unbounded.
Plotting: plot() renders the event-study and counterfactual for block and staggered conformal fits. The simultaneous band is plottable via conformal.band = "simultaneous" (it fills est.att) or the est.att.sim slot.
True inner band: est.att90 is now a genuine 1 − 2α conformal band (was a placeholder mirroring the main band), computed at no extra fitting cost.
Print: print() now shows an "Inference: conformal (scale, band, N.calib)" line for conformal fits.
Joint-band coverage characterized (300 reps): the simultaneous band's joint coverage is ~0.84–0.87 (mildly anti-conservative, in the jackknife+ range), far better than the pointwise joint ~0.66. Full conformal on the max statistic would close the gap and is left as a documented follow-up; the scalar interval is already exact.
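The inner est.att90 band mentioned above reuses the control scores of the main band and only changes the level. A hedged sketch with simulated scores and hypothetical names (`band_at`, `m_tr`, `scale_tr`):

```r
# Conformal (1 - alpha) quantile of n control scores (Inf below the floor).
conformal_quantile <- function(scores, alpha) {
  n <- length(scores)
  k <- ceiling((1 - alpha) * (n + 1))
  if (k > n) return(Inf)
  sort(scores)[k]
}

set.seed(3)
scores <- abs(rnorm(50))   # one calibration pass: control scores, computed once
m_tr <- 1.2                # hypothetical treated center
scale_tr <- 0.8            # hypothetical treated scale

# Band builder parameterized by level; no refitting between calls.
band_at <- function(level_alpha) {
  m_tr + c(-1, 1) * scale_tr * conformal_quantile(scores, level_alpha)
}

alpha <- 0.05
outer_band <- band_at(alpha)       # 1 - alpha band
inner_band <- band_at(2 * alpha)   # 1 - 2 * alpha band, same scores
```

The inner band uses a lower order statistic of the same sorted scores, so it can never be wider than the outer band.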

Design + session record: statsclaw-workspace/fect/runs/2026-06-08-conformal-inference.md and runs/REQ-conformal/spec.md.

🤖 Generated with Claude Code

…erval inversion)

Phase 1 (part 1) of vartype='conformal'.

R/conformal.R: .conformal_score (studentized default; ratio = Abadie post/pre RMSPE; meanabs; rmse), .conformal_pval (weighted-capable), .conformal_ci_meanabs (closed-form att.avg), .conformal_ci_grid (refit-free grid inversion for the other scores), .conformal_neff.

Isolated check: meanabs coverage 0.902 at target 0.90 (Nco=39); grid studentized CI brackets truth; resolution guard unbounds when alpha < 1/(Nco+1).

Next: leave-one-control-out calibration via impute_Y0 (staggered-aware via valid_controls) + vartype dispatch in boot.R/default.R; not yet wired.

Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… (WIP)

Phase 1 part 2 of vartype='conformal'.

R/conformal.R: conformal_calibrate() does the deterministic leave-one-control-out calibration, reusing valid_controls() + impute_Y0() (the held-out, non-resampled analog of draw.error). It runs end to end on fect_boot's internal matrices (N_calib = full donor pool); the fix for the impute_Y0 effect-aggregation error was to pass the real hasRevs rather than 1.

R/default.R accepts vartype='conformal' (+ mc guard).

R/boot.R adds a conformal branch after the point fit that computes the calibration and currently stops with a diagnostic.

tests/testthat/test-conformal.R covers the estimator-agnostic core (scores, meanabs coverage ~0.90, p-value, resolution guard, n_eff) and passes.

Not done (next): wire the result through fect's output slots (eff.calendar/est.att/est.avg) by mirroring the parametric slot assembly (an early return skips the calendar-effect computation fect.default expects); fix the studentized grid CI; validate full-pipeline coverage (the LOO-gap assembly currently yields frequent unbounded intervals to debug; over-coverage is plausibly jackknife+ conservatism); full testthat + coverage-study gates.

Existing vartypes are unaffected (conformal branch is gated and early-returns; validation change is additive); bootstrap regression smoke passes.

Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… empties)

Validates and corrects the leave-one-control-out calibration. 200-rep block DGP (true effect 0, alpha=0.10): total coverage 0.92-0.95, 0 unbounded across all four scores.

## Item 1 -- unbounded intervals (already fixed; now locked)

The "~29/50 unbounded" note predated the hasRevs plumbing fix. Verified 0 unbounded over 800 calibrate calls. Mild over-coverage (0.915 vs 0.90) is the treated-vs-control LOO asymmetry (treated gap from the full fit, each control from an Nco-1 fit) plus jackknife+ conservatism -- safe, not a bug.

## Item 2 -- studentized "flaky NAs" were misdiagnosed

They are correct conformal empties: with per-unit pre-period studentization, a treated unit that draws a small sd(pre) has a score exceeding every control at every tau, so the data legitimately reject all constant effects. A wide-grid (2001-pt) diagnostic put the best-tau p-value at the 1/(Nco+1) floor -> genuine, not a grid-span miss. Counted as misses, studentized total coverage is nominal.

## Fixes

- .conformal_ci_grid: dimensional span bug fixed. Span was max|gap| + 6*sd(score), mixing the outcome-unit tau grid with score units. Now max|gap| + (max score + 1)*denom via new .conformal_denom(e.pre, type) (sd for studentized, rms for ratio, 1 for rmse). Added adaptive edge-expansion and explicit +/-Inf on a genuinely unbounded side.
- Honest empty handling: empty acceptance returns c(NA,NA) with attr "empty"="rejected_all_tau"; conformal_calibrate surfaces status in {ok, empty, unbounded} and warns (no more silent NA).
- Default score -> "meanabs" in the boot.R conformal branch (was "studentized"). meanabs is the closed-form level interval for the average effect: conservative, never empty/unbounded above the resolution floor, matches fect's headline att.avg. Dispersion scores stay available (calibrated, can be empty).

## Verification

- tests/testthat/test-conformal.R: 33 pass (added grid bracket/empty/unbounded/expansion, .conformal_denom, + 2 full-pipeline coverage guards: meanabs never empties/unbounds & covers ~nominal; studentized total coverage ~nominal with every empty flagged, no silent NA).
- bootstrap & jackknife vartypes smoke-tested unaffected (changes are gated inside the vartype=="conformal" branch; conformal.R is all-new functions).

Refs statsclaw runs/2026-06-08-conformal-inference.md (Phase 1 part 3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

… empty

Per design decision, remove the path / constant-effect scores (the only ones that could return an empty acceptance set). Every interval is now a level statistic S_i(tau) = |m_i - tau| / scale_i, inverted in closed form to m_tr +/- scale_tr*Q.

## Core (R/conformal.R, rewritten)

- .conformal_center (mean | median), .conformal_scale (none | sd | rmspe | mad | diff; model-se supplied externally), .conformal_quantile (weighted (1-alpha) quantile with the treated +Inf atom; reduces to the ceil((1-alpha)(n+1))-th order statistic), .conformal_pval, .conformal_neff, .conformal_ci_level (closed form, never empty; unbounded only below the resolution floor).
- Removed .conformal_score, .conformal_ci_meanabs, .conformal_denom, .conformal_ci_grid and the empty/status="empty" machinery. The grid-span and empty-handling fixes from earlier are retired with them: net simpler.
- conformal_calibrate takes scale/center/weight (was score); returns status in {ok, unbounded}. LOO calibration plumbing unchanged.

## Dispatch (R/boot.R)

- conformal branch calls scale = "none" (meanabs, the safe provisional default); still stop()s with a diagnostic, replaced by output-slot integration in Phase 2.

## Verification

- tests/testthat/test-conformal.R rewritten for the Family-A API: 30 pass.
- Coverage (200 reps, block DGP, effect 0, alpha=0.10), bad=0 throughout: scale=none 0.915/0.895; scale=sd 0.930/0.875 (two seeds). Both calibrated.
- bootstrap & jackknife vartypes smoke-tested unaffected.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Output-slot integration + argument threading. fect(vartype="conformal") now returns a complete, printable, plottable object; no bootstrap draws.

## conformal_calibrate

- Also returns a per-period pointwise band (calendar-indexed) built from the full leave-one-control-out gap matrix: pre rows = placebo/pre-trend band, post rows = effect band. Guards model-se scale and non-cell weight as "not yet implemented".

## boot.R conformal branch

- Builds est.avg / est.avg.unit / est.att (event-time via the treated T.on map) / est.att90 / att.bound / est.eff.calendar(.fit) + a `conformal` meta slot, then returns c(out, result). Nominal S.E. back-derived from the symmetric CI (display only). Unsupported options (group/reversals/W/balance/placebo/carryover) error.

## default.R

- Thread conformal.scale/center/weight through fect / fect.formula / fect.default and both pass-throughs (+ fect_boot signature); validate the values.
- Skip the bootstrap-based diagtest (F/equivalence test) for conformal, which has no draws (consistent with se=FALSE); fixes an apply(!is.na(att.boot)) error.

## Verification

- 40 tests pass (test-conformal.R), incl. 2 fect() end-to-end integration tests.
- print() and plot(gap|counterfactual) work; 5 scales (none/sd/rmspe/mad/diff) + median center give distinct finite intervals; CI covers the true ATT.
- bootstrap / jackknife / parametric unaffected; their test.out still produced.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Complete the knob set. Weights aggregate across treated units; model-se scale is deferred (needs estimator prediction-SE plumbing) and stops with a clear message.

## conformal_calibrate

- Scalar ATT = weighted mean of per-unit post centers; per-period band uses the matching weighted treated trajectory. cell = per-treated-cell (n_i), unit = equal per unit, precision = inverse pre-period variance. Precision is decoupled from the `scale` knob so it down-weights noisy units even under the default scale = "none".

## boot.R

- Report the conformal weighted center (cc$att / band eff) as the point estimates, so the point and the symmetric CI share one estimand. For weight = "cell" this equals fect's canonical att.avg (verified to 1e-6).

## Scope

- scale: none/sd/rmspe/mad/diff work; model-se deferred (clear stop()).
- center: mean/median; per-horizon is the band (always produced).
- weight: cell/unit/precision.

## Verification

- 48 tests pass: added multi-treated weight + precision-down-weighting tests.
- Existing vartypes untouched (all changes gated to the conformal path).

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e 4)

## Bands

- conformal.band: "pointwise" (default) | "simultaneous"; conformal.cutoff: "per-period" (default) | "pooled". Threaded through all signatures + validated.
- pointwise per-period: treated gap ranked vs control gaps at that period.
- pooled: one cutoff over all post-window standardized control gaps.
- simultaneous: one multiplier c = conformal quantile of the per-control MAX standardized gap over the post window (sup-t uniform band); pre rows keep the per-period placebo band.
- est.att carries the selected band; est.att.sim (uniform band, event-time mapped) is always stored. conformal meta gains band.type / cutoff.

## Coverage (150 reps, block, effect 0, alpha=0.10)

- pointwise per-period 0.879; pointwise JOINT 0.440 (multiple-comparison failure).
- simultaneous JOINT 0.773: materially better jointly, but undercovers nominal in small N (near the jackknife+ floor 1-2*alpha + the treated-vs-control LOO asymmetry). A symmetric treated-gap fix is flagged for the simulation phase.

## Verification

- 55 tests pass (added band-width, est.att.sim, pooled, joint-coverage tests). Loopy tests use parallel = FALSE (avoids PSOCK cluster exhaustion).
- bootstrap / jackknife unaffected.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…hase 5/6)

## Default chosen by simulation

- tests/coverage-study/conformal_default_study.R sweeps scale x DGP (iid/ar1/hetero/nonstat, 150 reps, true effect 0) + a weight sweep on a staggered+hetero DGP. Results in results/conformal_default_study.md.
- scale = "sd" wins: highest worst-case scalar coverage (0.920) and smallest mean width (3.16). Decisive under heteroskedasticity: none over-covers at width 5.43 vs sd 0.920 at 2.82. Set as the provisional default across all signatures (flagged for the user's final call).
- Weight finding: under treated-unit heterogeneity cell/unit undercover (0.873) and precision recovers (0.993); cell stays the default estimand.
- bad = 0 throughout (never empty/unbounded, as designed).

## Docs

- man/fect.Rd: document vartype = "conformal" + the five conformal.* args; usage block updated.
- NEWS.md: 2.4.6 entry. DESCRIPTION: 2.4.5 -> 2.4.6, date 2026-06-09.

## Verification

- 55 conformal tests pass with the new sd default.

Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 5/6 docs).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ands)

The calibration now handles staggered onsets correctly, not just block / common-onset. Block behavior is preserved exactly (verified by the existing tests).

## conformal.R

- Treatment-incidence structure (nt[t], union post window, per-unit onsets); control leave-one-out fits mask the UNION post window so donor gaps are held-out wherever any treated unit is post.
- Per-treated-unit summaries over each unit's OWN post/pre window (was the union window for every unit, which mixed post-effect and pre-residual cells under staggering). Treated trajectory is the per-period weighted mean over the units that are post (or, where none are, pre) at t.
- Control placebo center = the control's gap aggregated over the union post window the same nt-weighted way as the treated ATT (block: a plain mean).
- New EVENT-TIME band: aggregates cohorts at each relative period, pooling the control gaps at the matching calendar cells. For block it equals the calendar band re-indexed. Returned as band.et / band.et.sim alongside the calendar band.

## boot.R

- est.att / est.att.sim now come from the event-time band (correct for staggered), matched to out$time by relative-time value, replacing the single-unit T.on map.

## Verification

- Staggered coverage (2 cohorts, 100 reps, true effect 0, alpha=0.10): scalar 0.900 (iid) / 0.940 (ar1); per-period band 0.911 / 0.906; bad=0.
- 58 conformal tests pass (added a staggered run/alignment/coverage test); all block tests unchanged.

Refs statsclaw runs/2026-06-08-conformal-inference.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

plot.fect already renders conformal fits correctly (gap + counterfactual), and the simultaneous band is plottable via conformal.band = "simultaneous" (it populates est.att) or the est.att.sim slot -- no plot.R change needed. Adds a test that all plot types build a ggplot for block + staggered + simultaneous fits, and that the uniform band is wider than the pointwise band post-treatment. Verification figures (block/staggered gap + counterfactual, pointwise-vs-simultaneous overlay) saved under the sc-conformal paper folder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

est.att90 (the inner equivalence band shown in plots) was a placeholder that mirrored the main band. It is now a genuine narrower conformal band at level 1 - 2*alpha, computed from the same control scores at no extra fitting cost (the band builders are parameterized by level; the outer alpha and inner 2*alpha bands reuse one calibration pass). att.bound now derives from this inner band. Verification: alpha=0.05 gives a 95% main band and a narrower 90% inner band (widths 5.20 vs 4.02); 61 conformal tests pass; block bands at alpha unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…inner

- print.fect shows "Inference: conformal (scale = ..., band = ..., N.calib = ...)" for conformal fits.
- NEWS 2.4.6 + man/fect.Rd note staggered-adoption support, event-study/counterfactual plotting, and the true inner (1-2*alpha) est.att90 band.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…red+X test

With p > 0 and vartype = 'conformal', fect returns no est.beta/est.marginal (no coefficient draws), and the Xname labelling in default.R crashed with 'attempt to set rownames on an object with no dimensions'. Guard on !is.null() so any vartype that omits inference objects skips labelling.

Adds a staggered + covariate conformal regression test (asserts finite ATT, ordered CI, point beta present, est.beta absent). Also makes the test file runnable against the installed package: fect::fect in the new test, fect:::conformal_calibrate in the coverage helper.

…to-default

- conformal.fit is now user-facing (was dead code: defined in conformal_calibrate but never threaded through fect()). Threaded through all three signatures -> fect_boot -> conformal_calibrate, and made two-sided: each treated unit is scored through the same custom learner (own pre-window, T0 = onset - 1), not just the held-out donors. Validation: a quadprog simplex-SC learner reproduces the hand-rolled Proposition 99 conformal interval exactly (ATT -19.5136, CI95 [-53.6576, 14.6304], p .0769).
- time.component.from defaults to NULL = auto: resolves to 'nevertreated' under vartype='conformal' with method fe/ife/cfe (strict separation, matching the paper's Algorithm 1; message emitted) and 'notyettreated' otherwise; explicit 'notyettreated' + conformal warns (weak separation).
- Tests: custom-learner ATT equality (block), auto-resolution equivalence + warning capture (ife); the conformal test file now runs against the installed package.
- Docs: fect.Rd (conformal.fit entry; tcf NULL semantics), NEWS.

GATE NOTE: conformal test file fully green; full testthat suite re-run pending (two session-killed attempts, zero failures at both kill points); finish before merging PR #143.

xuyiqing · 2026-06-10T14:41:40Z

Overnight additions (2 commits, 9b79579 + 00483a3):

Fix: vartype="conformal" + covariates crashed in output labelling (no est.beta under conformal); null-guarded, staggered+covariate regression test added.
conformal.fit made user-facing and two-sided (it was unreachable from fect()): custom separated learners now score both the held-out donors and each treated unit (own pre-window). A quadprog simplex-SC learner reproduces the hand-rolled Proposition 99 conformal interval exactly (ATT -19.5136, CI [-53.6576, 14.6304], p .0769).
time.component.from defaults to NULL = auto: resolves to "nevertreated" under conformal + fe/ife/cfe (strict separation, matches the paper's Algorithm 1; message emitted); explicit "notyettreated" warns.

Merge gate still open: the full testthat suite re-run was interrupted twice by session ends (zero failures at both kill points; the conformal file is fully green standalone). Run NOT_CRAN=true TESTTHAT_CPUS=10 Rscript -e 'library(fect); testthat::test_dir("tests/testthat", reporter="summary")' to completion before merging.

xuyiqing and others added 14 commits June 8, 2026 21:31

xuyiqing wants to merge 14 commits into dev from feat/conformal-inference
