feat(conformal): cross-sectional conformal inference (vartype = "conformal")#143
Open
xuyiqing wants to merge 14 commits into
Open
feat(conformal): cross-sectional conformal inference (vartype = "conformal")#143xuyiqing wants to merge 14 commits into
xuyiqing wants to merge 14 commits into
Conversation
…erval inversion) Phase 1 (part 1) of vartype='conformal'. R/conformal.R: .conformal_score (studentized default; ratio = Abadie post/pre RMSPE; meanabs; rmse), .conformal_pval (weighted-capable), .conformal_ci_meanabs (closed-form att.avg), .conformal_ci_grid (refit-free grid inversion for the other scores), .conformal_neff. Isolated check: meanabs coverage 0.902 at target 0.90 (Nco=39); grid studentized CI brackets truth; resolution guard unbounds when alpha < 1/(Nco+1). Next: leave-one-control-out calibration via impute_Y0 (staggered-aware via valid_controls) + vartype dispatch in boot.R/default.R; not yet wired. Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (WIP) Phase 1 part 2 of vartype='conformal'. R/conformal.R: conformal_calibrate() does the deterministic leave-one-control-out calibration, reusing valid_controls() + impute_Y0() (the held-out, non-resampled analog of draw.error). It runs end to end on fect_boot's internal matrices (N_calib = full donor pool); the fix for the impute_Y0 effect-aggregation error was to pass the real hasRevs rather than 1. R/default.R accepts vartype='conformal' (+ mc guard). R/boot.R adds a conformal branch after the point fit that computes the calibration and currently stops with a diagnostic. tests/testthat/test-conformal.R covers the estimator-agnostic core (scores, meanabs coverage ~0.90, p-value, resolution guard, n_eff) and passes. Not done (next): wire the result through fect's output slots (eff.calendar/est.att/est.avg) by mirroring the parametric slot assembly (an early return skips the calendar-effect computation fect.default expects); fix the studentized grid CI; validate full-pipeline coverage (the LOO-gap assembly currently yields frequent unbounded intervals to debug; over-coverage is plausibly jackknife+ conservatism); full testthat + coverage-study gates. Existing vartypes are unaffected (conformal branch is gated and early-returns; validation change is additive); bootstrap regression smoke passes. Spec: statsclaw-workspace/fect/runs/REQ-conformal/spec.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… empties)
Validates and corrects the leave-one-control-out calibration. 200-rep block DGP
(true effect 0, alpha=0.10): total coverage 0.92-0.95, 0 unbounded across all
four scores.
## Item 1 -- unbounded intervals (already fixed; now locked)
The "~29/50 unbounded" note predated the hasRevs plumbing fix. Verified 0
unbounded over 800 calibrate calls. Mild over-coverage (0.915 vs 0.90) is the
treated-vs-control LOO asymmetry (treated gap from the full fit, each control
from an Nco-1 fit) plus jackknife+ conservatism -- safe, not a bug.
## Item 2 -- studentized "flaky NAs" were misdiagnosed
They are correct conformal empties: with per-unit pre-period studentization, a
treated unit that draws a small sd(pre) has a score exceeding every control at
every tau, so the data legitimately reject all constant effects. A wide-grid
(2001-pt) diagnostic put the best-tau p-value at the 1/(Nco+1) floor -> genuine,
not a grid-span miss. Counted as misses, studentized total coverage is nominal.
## Fixes
- .conformal_ci_grid: dimensional span bug fixed. Span was max|gap| + 6*sd(score),
mixing the outcome-unit tau grid with score units. Now max|gap| +
(max score + 1)*denom via new .conformal_denom(e.pre, type) (sd for studentized,
rms for ratio, 1 for rmse). Added adaptive edge-expansion and explicit +/-Inf on
a genuinely unbounded side.
- Honest empty handling: empty acceptance returns c(NA,NA) with attr
"empty"="rejected_all_tau"; conformal_calibrate surfaces status in
{ok, empty, unbounded} and warns (no more silent NA).
- Default score -> "meanabs" in the boot.R conformal branch (was "studentized").
meanabs is the closed-form level interval for the average effect: conservative,
never empty/unbounded above the resolution floor, matches fect's headline
att.avg. Dispersion scores stay available (calibrated, can be empty).
## Verification
- tests/testthat/test-conformal.R: 33 pass (added grid bracket/empty/unbounded/
expansion, .conformal_denom, + 2 full-pipeline coverage guards: meanabs never
empties/unbounds & covers ~nominal; studentized total coverage ~nominal with
every empty flagged, no silent NA).
- bootstrap & jackknife vartypes smoke-tested unaffected (changes are gated inside
the vartype=="conformal" branch; conformal.R is all-new functions).
Refs statsclaw runs/2026-06-08-conformal-inference.md (Phase 1 part 3).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… empty
Per design decision, remove the path / constant-effect scores (the only ones that
could return an empty acceptance set). Every interval is now a level statistic
S_i(tau) = |m_i - tau| / scale_i, inverted in closed form to m_tr +/- scale_tr*Q.
## Core (R/conformal.R, rewritten)
- .conformal_center (mean | median), .conformal_scale (none | sd | rmspe | mad |
diff; model-se supplied externally), .conformal_quantile (weighted (1-alpha)
quantile with the treated +Inf atom; reduces to the ceil((1-alpha)(n+1))-th
order statistic), .conformal_pval, .conformal_neff, .conformal_ci_level
(closed form, never empty; unbounded only below the resolution floor).
- Removed .conformal_score, .conformal_ci_meanabs, .conformal_denom,
.conformal_ci_grid and the empty/status="empty" machinery. The grid-span and
empty-handling fixes from earlier are retired with them: net simpler.
- conformal_calibrate takes scale/center/weight (was score); returns
status in {ok, unbounded}. LOO calibration plumbing unchanged.
## Dispatch (R/boot.R)
- conformal branch calls scale = "none" (meanabs, the safe provisional default);
still stop()s with a diagnostic, replaced by output-slot integration in Phase 2.
## Verification
- tests/testthat/test-conformal.R rewritten for the Family-A API: 30 pass.
- Coverage (200 reps, block DGP, effect 0, alpha=0.10), bad=0 throughout:
scale=none 0.915/0.895; scale=sd 0.930/0.875 (two seeds). Both calibrated.
- bootstrap & jackknife vartypes smoke-tested unaffected.
Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 1).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Output-slot integration + argument threading. fect(vartype="conformal") now returns a complete, printable, plottable object; no bootstrap draws. ## conformal_calibrate - Also returns a per-period pointwise band (calendar-indexed) built from the full leave-one-control-out gap matrix: pre rows = placebo/pre-trend band, post rows = effect band. Guards model-se scale and non-cell weight as "not yet implemented". ## boot.R conformal branch - Builds est.avg / est.avg.unit / est.att (event-time via the treated T.on map) / est.att90 / att.bound / est.eff.calendar(.fit) + a `conformal` meta slot, then returns c(out, result). Nominal S.E. back-derived from the symmetric CI (display only). Unsupported options (group/reversals/W/balance/placebo/carryover) error. ## default.R - Thread conformal.scale/center/weight through fect / fect.formula / fect.default and both pass-throughs (+ fect_boot signature); validate the values. - Skip the bootstrap-based diagtest (F/equivalence test) for conformal, which has no draws (consistent with se=FALSE); fixes an apply(!is.na(att.boot)) error. ## Verification - 40 tests pass (test-conformal.R), incl. 2 fect() end-to-end integration tests. - print() and plot(gap|counterfactual) work; 5 scales (none/sd/rmspe/mad/diff) + median center give distinct finite intervals; CI covers the true ATT. - bootstrap / jackknife / parametric unaffected; their test.out still produced. Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 2). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Complete the knob set. Weights aggregate across treated units; model-se scale is deferred (needs estimator prediction-SE plumbing) and stops with a clear message. ## conformal_calibrate - Scalar ATT = weighted mean of per-unit post centers; per-period band uses the matching weighted treated trajectory. cell = per-treated-cell (n_i), unit = equal per unit, precision = inverse pre-period variance. Precision is decoupled from the `scale` knob so it down-weights noisy units even under the default scale = "none". ## boot.R - Report the conformal weighted center (cc$att / band eff) as the point estimates, so the point and the symmetric CI share one estimand. For weight = "cell" this equals fect's canonical att.avg (verified to 1e-6). ## Scope - scale: none/sd/rmspe/mad/diff work; model-se deferred (clear stop()). - center: mean/median; per-horizon is the band (always produced). - weight: cell/unit/precision. ## Verification - 48 tests pass: added multi-treated weight + precision-down-weighting tests. - Existing vartypes untouched (all changes gated to the conformal path). Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 3). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e 4) ## Bands - conformal.band: "pointwise" (default) | "simultaneous"; conformal.cutoff: "per-period" (default) | "pooled". Threaded through all signatures + validated. - pointwise per-period: treated gap ranked vs control gaps at that period. - pooled: one cutoff over all post-window standardized control gaps. - simultaneous: one multiplier c = conformal quantile of the per-control MAX standardized gap over the post window (sup-t uniform band); pre rows keep the per-period placebo band. - est.att carries the selected band; est.att.sim (uniform band, event-time mapped) is always stored. conformal meta gains band.type / cutoff. ## Coverage (150 reps, block, effect 0, alpha=0.10) - pointwise per-period 0.879; pointwise JOINT 0.440 (multiple-comparison failure). - simultaneous JOINT 0.773: materially better jointly, but undercovers nominal in small N (near the jackknife+ floor 1-2*alpha + the treated-vs-control LOO asymmetry). A symmetric treated-gap fix is flagged for the simulation phase. ## Verification - 55 tests pass (added band-width, est.att.sim, pooled, joint-coverage tests). Loopy tests use parallel = FALSE (avoids PSOCK cluster exhaustion). - bootstrap / jackknife unaffected. Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 4). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…hase 5/6) ## Default chosen by simulation - tests/coverage-study/conformal_default_study.R sweeps scale x DGP (iid/ar1/hetero/nonstat, 150 reps, true effect 0) + a weight sweep on a staggered+hetero DGP. Results in results/conformal_default_study.md. - scale = "sd" wins: highest worst-case scalar coverage (0.920) and smallest mean width (3.16). Decisive under heteroskedasticity: none over-covers at width 5.43 vs sd 0.920 at 2.82. Set as the provisional default across all signatures (flagged for the user's final call). - Weight finding: under treated-unit heterogeneity cell/unit undercover (0.873) and precision recovers (0.993); cell stays the default estimand. - bad = 0 throughout (never empty/unbounded, as designed). ## Docs - man/fect.Rd: document vartype = "conformal" + the five conformal.* args; usage block updated. - NEWS.md: 2.4.6 entry. DESCRIPTION: 2.4.5 -> 2.4.6, date 2026-06-09. ## Verification - 55 conformal tests pass with the new sd default. Refs statsclaw runs/2026-06-08-conformal-inference.md (overnight build, Phase 5/6 docs). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ands) The calibration now handles staggered onsets correctly, not just block / common-onset. Block behavior is preserved exactly (verified by the existing tests). ## conformal.R - Treatment-incidence structure (nt[t], union post window, per-unit onsets); control leave-one-out fits mask the UNION post window so donor gaps are held-out wherever any treated unit is post. - Per-treated-unit summaries over each unit's OWN post/pre window (was the union window for every unit, which mixed post-effect and pre-residual cells under staggering). Treated trajectory is the per-period weighted mean over the units that are post (or, where none are, pre) at t. - Control placebo center = the control's gap aggregated over the union post window the same nt-weighted way as the treated ATT (block: a plain mean). - New EVENT-TIME band: aggregates cohorts at each relative period, pooling the control gaps at the matching calendar cells. For block it equals the calendar band re-indexed. Returned as band.et / band.et.sim alongside the calendar band. ## boot.R - est.att / est.att.sim now come from the event-time band (correct for staggered), matched to out$time by relative-time value, replacing the single-unit T.on map. ## Verification - Staggered coverage (2 cohorts, 100 reps, true effect 0, alpha=0.10): scalar 0.900 (iid) / 0.940 (ar1); per-period band 0.911 / 0.906; bad=0. - 58 conformal tests pass (added a staggered run/alignment/coverage test); all block tests unchanged. Refs statsclaw runs/2026-06-08-conformal-inference.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
plot.fect already renders conformal fits correctly (gap + counterfactual), and the simultaneous band is plottable via conformal.band = "simultaneous" (it populates est.att) or the est.att.sim slot -- no plot.R change needed. Adds a test that all plot types build a ggplot for block + staggered + simultaneous fits, and that the uniform band is wider than the pointwise band post-treatment. Verification figures (block/staggered gap + counterfactual, pointwise-vs- simultaneous overlay) saved under the sc-conformal paper folder. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
est.att90 (the inner equivalence band shown in plots) was a placeholder that mirrored the main band. It is now a genuine narrower conformal band at level 1 - 2*alpha, computed from the same control scores at no extra fitting cost (the band builders are parameterized by level; the outer alpha and inner 2*alpha bands reuse one calibration pass). att.bound now derives from this inner band. Verification: alpha=0.05 gives a 95% main band and a narrower 90% inner band (widths 5.20 vs 4.02); 61 conformal tests pass; block bands at alpha unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…inner - print.fect shows "Inference: conformal (scale = ..., band = ..., N.calib = ...)" for conformal fits. - NEWS 2.4.6 + man/fect.Rd note staggered-adoption support, event-study/ counterfactual plotting, and the true inner (1-2*alpha) est.att90 band. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…red+X test With p > 0 and vartype = 'conformal', fect returns no est.beta/est.marginal (no coefficient draws), and the Xname labelling in default.R crashed with 'attempt to set rownames on an object with no dimensions'. Guard on !is.null() so any vartype that omits inference objects skips labelling. Adds a staggered + covariate conformal regression test (asserts finite ATT, ordered CI, point beta present, est.beta absent). Also makes the test file runnable against the installed package: fect::fect in the new test, fect:::conformal_calibrate in the coverage helper.
…to-default - conformal.fit is now user-facing (was dead code: defined in conformal_calibrate but never threaded through fect()). Threaded through all three signatures -> fect_boot -> conformal_calibrate, and made two-sided: each treated unit is scored through the same custom learner (own pre-window, T0 = onset - 1), not just the held-out donors. Validation: a quadprog simplex-SC learner reproduces the hand-rolled Proposition 99 conformal interval exactly (ATT -19.5136, CI95 [-53.6576, 14.6304], p .0769). - time.component.from defaults to NULL = auto: resolves to 'nevertreated' under vartype='conformal' with method fe/ife/cfe (strict separation, matching the paper's Algorithm 1; message emitted) and 'notyettreated' otherwise; explicit 'notyettreated' + conformal warns (weak separation). - Tests: custom-learner ATT equality (block), auto-resolution equivalence + warning capture (ife); the conformal test file now runs against the installed package. - Docs: fect.Rd (conformal.fit entry; tcf NULL semantics), NEWS. GATE NOTE: conformal test file fully green; full testthat suite re-run pending (two session-killed attempts, zero failures at both kill points); finish before merging PR #143.
Owner
Author
|
Overnight additions (2 commits,
Merge gate still open: the full testthat suite re-run was interrupted twice by session ends (zero failures at both kill points; the conformal file is fully green standalone). Run |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
vartype = "conformal": a cross-sectional conformal prediction interval for the average treatment effect on the treated. It ranks the treated unit's average post-treatment prediction error against the leave-one-control-out prediction errors of the donor pool, giving a distribution-free interval with no bootstrap draws. The interval is closed-form and never empty.This implements the
sc-conformalpaper's Algorithm 1 as an estimator-agnostic fect feature.Design: Family A (level statistics)
Every interval is a level statistic
S_i(τ) = |m_i − τ| / scale_i, inverted in closed form tom_tr ± scale_tr · Q, whereQis the conformal (1−α) quantile of the control scores. Never empty; unbounded only below the resolution floorα < 1/(Nco+1). (An earlier path/constant-effect score family was removed because it could return an empty acceptance set, which is confusing to report.)Options (orthogonal knobs)
conformal.scalesd·none·rmspe·mad·diff(model-sereserved)sdconformal.centermean·medianmeanconformal.weightcell·unit·precisioncellconformal.bandpointwise·simultaneouspointwiseconformal.cutoffper-period·pooledper-periodThe simultaneous (sup-t uniform) band is also always stored in
fit$est.att.sim.Default chosen by simulation
scale = "sd"(the studentized-mean) was selected by a coverage study (tests/coverage-study/conformal_default_study.R, 150 reps, true effect 0): it has the highest worst-case scalar coverage (0.920) and the smallest mean width (3.16) across iid / AR(1) / heteroskedastic / nonstationary DGPs. It is decisively tighter thannoneunder heteroskedasticity (width 2.82 vs 5.43 at equal coverage). This default is provisional and easy to change — the maintainer's call.Scope and guards
methodnot inc("mc", "both").diagtest) is skipped for conformal (no draws), as it is forse = FALSE.W) / balanced / placebo / carryover options error clearly and direct the user tobootstrap/jackknife.bootstrap/jackknife/parametric) are untouched — all changes are gated to the conformal branch /conformal.R.Known open items (documented, not blocking)
cell/unitweights undercover under treated-unit heterogeneity (precisionrecovers it). A placebo-average calibration is the future refinement.conformal.scale = "model-se"is reserved (needs estimator prediction-SE plumbing); it stops with a clear message.Verification
tests/testthat/test-conformal.R: 55 tests pass (core engine, closed-form interval, all scales/centers/weights, pointwise + simultaneous bands, multi-treated weights, end-to-endfect()integration, coverage guards).print()andplot(type = "gap"|"counterfactual")work on conformal fits.tests/coverage-study/results/conformal_default_study.md.testthatsuite (all files, serial): 0 failures, 0 warnings, 0 skips.run_para_error_coverage.R): T19/T20/T21 all PASS (the parametric variance path is unaffected; conformal changes are additive and gated).Update — additional buildout
plot()renders the event-study and counterfactual for block and staggered conformal fits. The simultaneous band is plottable viaconformal.band = "simultaneous"(it fillsest.att) or theest.att.simslot.est.att90is now a genuine1 − 2αconformal band (was a placeholder mirroring the main band), computed at no extra fitting cost.Inference: conformal (scale, band, N.calib)line.Design + session record:
statsclaw-workspace/fect/runs/2026-06-08-conformal-inference.mdandruns/REQ-conformal/spec.md.🤖 Generated with Claude Code