diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 00000000..fa1d10a9 --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,337 @@ +# Architecture — fect + +> Generated by scriber for run `20260329-arch-docs` on 2026-03-29. + +## Overview + +fect is an R package for estimating causal effects in panel data using counterfactual imputation methods (Fixed Effects Counterfactual Estimators). It targets causal panel analysis with binary treatments under the parallel trends assumption, supporting treatment switching and limited carryover effects. The core abstraction is counterfactual imputation: impute missing potential outcomes Y(0) for treated units using untreated observations, then compute the Average Treatment Effect on the Treated (ATT) as the gap between observed and imputed outcomes. The package is an R/C++ hybrid using Rcpp and RcppArmadillo for numerically intensive linear algebra (SVD, EM iterations, matrix factorization). Key external dependencies include fixest (initial FE regression), ggplot2 (visualization), doParallel/doFuture/future.apply (parallel bootstrap), MASS (generalized inverse), and mvtnorm (multivariate normal draws). Estimation methods include FE (fixed effects), IFE (interactive fixed effects / factor model), MC (matrix completion via nuclear norm regularization), CFE (complex fixed effects with structured covariates), and wrappers for modern DID estimators. This document describes version 2.2.0. References: Liu, Wang, and Xu (2024); Chiu et al. (2026). 
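
The imputation idea in the overview can be sketched in base R for the simplest case, additive two-way fixed effects on a toy balanced panel. This is only an illustration under stated assumptions, not the package's implementation (which runs in C++ and also handles covariates, latent factors, and missing data). The simulated panel, the constant effect `tau`, and every variable name below are made up for the example.

```r
# Toy sketch of counterfactual imputation with additive two-way fixed effects.
# All names and the simulated design are assumptions for illustration only.
set.seed(1)
N  <- 30                                 # number of units
TT <- 10                                 # number of periods
panel <- expand.grid(time = 1:TT, id = 1:N)

alpha <- rnorm(N)[panel$id]              # unit fixed effects
xi    <- rnorm(TT)[panel$time]           # time fixed effects

# Units 1-10 are treated from period 6 onward; units 11-30 are never treated
panel$D <- as.integer(panel$id <= 10 & panel$time >= 6)
tau     <- 2                             # true (constant) treatment effect
panel$Y <- alpha + xi + tau * panel$D + rnorm(N * TT, sd = 0.5)

# Step 1: fit the outcome model on untreated observations only
fit <- lm(Y ~ factor(id) + factor(time), data = panel[panel$D == 0, ])

# Step 2: impute the untreated potential outcome Y(0) for treated observations
treated <- panel[panel$D == 1, ]
Y0_hat  <- predict(fit, newdata = treated)

# Step 3: ATT is the average gap between observed and imputed outcomes
att <- mean(treated$Y - Y0_hat)
att
```

Because the model is fitted without any treated observations, the estimate stays close to `tau` in this design; the richer estimators listed above replace only Step 1 with a more flexible outcome model.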
+ +--- + +## Module Structure + +```mermaid +%%{init: {'theme': 'neutral'}}%% +graph TD + subgraph API["API Layer"] + A1["default.R — fect() entry"] + A2["interFE.R — interFE()"] + A3["did_wrapper.R — DID wraps"] + A4["fect_mspe.R — MSPE comp"] + end + + subgraph Est["Estimation Layer"] + E1["fe.R — fect_fe() IFE"] + E2["mc.R — fect_mc() MC"] + E3["cfe.R — fect_cfe() CFE"] + E4["fect_nevertreated.R"] + end + + subgraph CV["Cross-Validation"] + V1["cv.R — fect_cv()"] + V2["cv_binary.R — binary CV"] + end + + subgraph Inf["Inference Layer"] + I1["boot.R — fect_boot()"] + end + + subgraph Diag["Diagnostics & Sensitivity"] + D1["diagtest.R — pre-trend"] + D2["fittest.R — fitness test"] + D3["fect_sens.R — sensitivity"] + D4["fect_iden.R — identification"] + end + + subgraph Viz["Visualization Layer"] + P1["plot.R — plot.fect()"] + P2["esplot.R — esplot()"] + end + + subgraph Cpp["C++ Core (RcppArmadillo)"] + C1["ife.cpp / ife_sub.cpp"] + C2["mc.cpp"] + C3["cfe.cpp / cfe_sub.cpp"] + C4["fe_sub.cpp — shared utils"] + C5["auxiliary.cpp — EM helpers"] + C6["binary_*.cpp — probit"] + end + + subgraph Util["Support & Data"] + U1["support.R — data helpers"] + U2["polynomial.R — trends"] + U3["effect.R / cumu.R — ATT"] + U4["score.R / permutation.R"] + U5["getcohort.R — cohorts"] + U6["print.R — S3 print"] + end + + A1 --> E1 + A1 --> E2 + A1 --> E3 + A1 --> E4 + A1 --> V1 + A1 --> I1 + A1 --> D1 + A1 --> P1 + V1 --> E1 + V1 --> E2 + I1 --> E1 + I1 --> E2 + I1 --> E3 + E1 --> C1 + E2 --> C2 + E3 --> C3 + C1 --> C4 + C2 --> C4 + C3 --> C4 + C1 --> C5 + E1 --> U1 + E2 --> U1 + E3 --> U1 + A1 --> U1 + A1 --> U2 +``` + +### Module Reference + +| Module / File | Layer | Purpose | Key Exports | Changed | +| --- | --- | --- | --- | --- | +| `R/default.R` (2,919 lines) | API | Main entry point, parameter validation, method routing | `fect()`, `fect.formula()`, `fect.default()` | no | +| `R/interFE.R` (515 lines) | API | Standalone interactive fixed effects estimator | `interFE()` | no 
| +| `R/did_wrapper.R` (656 lines) | API | Modern DID estimator wrappers (did, DIDmultiplegtDYN) | `did_wrapper()` | no | +| `R/fect_mspe.R` (344 lines) | API | MSPE computation for model comparison | `fect_mspe()` | no | +| `R/fe.R` (954 lines) | Estimation | Interactive Fixed Effects / factor model estimation | `fect_fe()` | no | +| `R/mc.R` (804 lines) | Estimation | Matrix Completion via nuclear norm regularization | `fect_mc()` | no | +| `R/cfe.R` (1,172 lines) | Estimation | Complex Fixed Effects with structured covariates | `fect_cfe()` | no | +| `R/fect_nevertreated.R` (3,166 lines) | Estimation | Never-treated comparison group variant | `fect_nevertreated()` | no | +| `R/cv.R` (1,526 lines) | Cross-Validation | Hyperparameter selection (r, lambda) via MSPE/PC | `fect_cv()` | no | +| `R/cv_binary.R` (421 lines) | Cross-Validation | Cross-validation for binary/probit models | `fect_cv_binary()` | no | +| `R/boot.R` (4,884 lines) | Inference | Bootstrap/jackknife/parametric inference with parallel support | `fect_boot()` | no | +| `R/diagtest.R` (215 lines) | Diagnostics | Pre-trend F-test, equivalence (TOST), placebo, carryover tests | `diagtest()` | no | +| `R/fittest.R` (636 lines) | Diagnostics | Fitness/wild bootstrap test | `fect_test()` | no | +| `R/fect_sens.R` (232 lines) | Diagnostics | Sensitivity analysis via HonestDiDFEct | `fect_sens()` | no | +| `R/fect_iden.R` (224 lines) | Diagnostics | Identification analysis | `fect_iden()` | no | +| `R/plot.R` (5,019 lines) | Visualization | Comprehensive ggplot2 plotting (gap, equiv, status, exit, factors, loadings, calendar, counterfactual, heterogeneous) | `plot.fect()` | no | +| `R/esplot.R` (1,118 lines) | Visualization | Standalone event-study plots | `esplot()` | no | +| `R/plot_return.R` (9 lines) | Visualization | Plot return object class definition | (internal) | no | +| `R/support.R` (676 lines) | Utilities | Data manipulation, initial FE fit, helper functions | `get_term()`, `align_beta0()` | 
no | +| `R/polynomial.R` (844 lines) | Utilities | Polynomial/B-spline trend specification | `fect_polynomial()` | no | +| `R/effect.R` (397 lines) | Utilities | Treatment effect decomposition by sub-group | `effect()` | no | +| `R/cumu.R` (206 lines) | Utilities | Cumulative ATT computation | `att.cumu()` | no | +| `R/score.R` (105 lines) | Utilities | Score-based inference | (internal) | no | +| `R/permutation.R` (264 lines) | Utilities | Permutation test for treatment effects | (internal) | no | +| `R/getcohort.R` (264 lines) | Utilities | Treatment cohort identification | `get.cohort()` | no | +| `R/print.R` (111 lines) | Utilities | S3 print methods for fect and interFE objects | `print.fect()`, `print.interFE()` | no | +| `R/RcppExports.R` (191 lines) | Utilities | Auto-generated Rcpp function bindings | (auto-generated) | no | +| `src/ife.cpp` (534 lines) | C++ Core | IFE algorithm: `inter_fe()`, `inter_fe_ub()`, `inter_fe_d()` | (Rcpp exports) | no | +| `src/ife_sub.cpp` (577 lines) | C++ Core | IFE sub-routines: SVD factor estimation, EM iterations, alternating minimization | (internal) | no | +| `src/mc.cpp` (223 lines) | C++ Core | Matrix completion: `inter_fe_mc()`, nuclear norm penalization | (Rcpp exports) | no | +| `src/cfe.cpp` (203 lines) | C++ Core | Complex FE: `complex_fe_ub()` | (Rcpp exports) | no | +| `src/cfe_sub.cpp` (564 lines) | C++ Core | Complex FE sub-routines: `cfe_iter()`, structured covariate handling | (internal) | no | +| `src/fe_sub.cpp` (291 lines) | C++ Core | Shared FE utilities: `Y_demean()`, `panel_beta()`, `panel_factor()`, `panel_FE()`, `XXinv()` | (internal) | no | +| `src/binary_sub.cpp` (539 lines) | C++ Core | Probit model sub-routines for binary outcomes | (internal) | no | +| `src/binary_qr.cpp` (347 lines) | C++ Core | QR-based probit estimation | (internal) | no | +| `src/binary_svd.cpp` (302 lines) | C++ Core | SVD-based probit estimation | (internal) | no | +| `src/auxiliary.cpp` (396 lines) | C++ Core | EM 
helpers, matrix utilities, log-likelihood computation | (internal) | no | +| `src/fect.h` (60 lines) | C++ Core | Header file with all C++ function declarations | (header) | no | + +--- + +## Function Call Graph + +### Main Estimation Pipeline + +```mermaid +%%{init: {'theme': 'neutral'}}%% +graph TD + F1["fect()"] + F2["fect.formula()"] + F3["fect.default()"] + F4["fect_cv()"] + F5["fect_fe()"] + F6["fect_mc()"] + F7["fect_cfe()"] + F8["fect_nevertreated()"] + C1["inter_fe_ub() [C++]"] + C2["inter_fe_mc() [C++]"] + C3["complex_fe_ub() [C++]"] + C4["inter_fe_d_qr_ub() [C++]"] + S1["panel_factor() [C++]"] + S2["panel_FE() [C++]"] + S3["Y_demean() [C++]"] + S4["cfe_iter() [C++]"] + + F1 --> F2 + F2 --> F3 + F3 -->|"CV=TRUE"| F4 + F3 -->|"method=ife/fe"| F5 + F3 -->|"method=mc"| F6 + F3 -->|"method=cfe"| F7 + F3 -->|"nevertreated"| F8 + F4 --> F5 + F4 --> F6 + F8 --> F5 + F8 --> F6 + F8 --> F7 + F5 --> C1 + F5 -->|"binary=TRUE"| C4 + F6 --> C2 + F7 --> C3 + C1 --> S1 + C1 --> S3 + C2 --> S2 + C2 --> S3 + C3 --> S4 + C3 --> S3 +``` + +### Inference and Diagnostics + +```mermaid +%%{init: {'theme': 'neutral'}}%% +graph TD + F3["fect.default()"] + B1["fect_boot()"] + D1["diagtest()"] + D2["fittest()"] + D3["fect_sens()"] + D4["fect_iden()"] + F5["fect_fe()"] + F6["fect_mc()"] + F7["fect_cfe()"] + PL["plot.fect()"] + ES["esplot()"] + + F3 -->|"se=TRUE"| B1 + F3 --> D1 + F3 --> D2 + B1 --> F5 + B1 --> F6 + B1 --> F7 + F3 --> PL + F3 --> ES + D3 -.->|"optional"| F3 + D4 -.->|"optional"| F3 +``` + +### Function Reference + +| Function | Defined In | Called By | Calls | Changed | Purpose | +| --- | --- | --- | --- | --- | --- | +| `fect()` | `R/default.R` | user / exported | `UseMethod("fect")` | no | S3 generic entry point for counterfactual estimation | +| `fect.formula()` | `R/default.R` | `fect()` | `fect.default()` | no | Parse formula, extract variable names, delegate to default method | +| `fect.default()` | `R/default.R` | `fect.formula()`, user | `fect_cv()`, 
`fect_fe()`, `fect_mc()`, `fect_cfe()`, `fect_boot()`, `diagtest()` | no | Workhorse: validation, preprocessing, method routing, inference, diagnostics | +| `fect_fe()` | `R/fe.R` | `fect.default()`, `fect_cv()`, `fect_boot()` | `inter_fe_ub()`, `inter_fe_d_qr_ub()` (C++) | no | IFE estimation (factor model with r latent factors) | +| `fect_mc()` | `R/mc.R` | `fect.default()`, `fect_cv()`, `fect_boot()` | `inter_fe_mc()` (C++) | no | Matrix completion estimation (nuclear norm regularization) | +| `fect_cfe()` | `R/cfe.R` | `fect.default()`, `fect_boot()` | `complex_fe_ub()` (C++) | no | Complex FE with structured covariates (Z, Q, gamma, kappa) | +| `fect_nevertreated()` | `R/fect_nevertreated.R` | `fect.default()` | `fect_fe()`, `fect_mc()`, `fect_cfe()` | no | Wrapper for never-treated-only estimation sample | +| `fect_cv()` | `R/cv.R` | `fect.default()` | `fect_fe()`, `fect_mc()` | no | Cross-validation to select r (IFE) or lambda (MC) | +| `fect_boot()` | `R/boot.R` | `fect.default()` | `fect_fe()`, `fect_mc()`, `fect_cfe()` | no | Bootstrap/jackknife inference engine with parallel support | +| `interFE()` | `R/interFE.R` | user / exported | `inter_fe()` (C++) | no | Standalone interactive fixed effects estimator | +| `did_wrapper()` | `R/did_wrapper.R` | user / exported | `fixest::feols()`, `did::att_gt()` | no | Modern DID estimator wrappers | +| `plot.fect()` | `R/plot.R` | user / exported | ggplot2 functions | no | Comprehensive visualization with 10+ plot types | +| `esplot()` | `R/esplot.R` | user / exported | ggplot2 functions | no | Standalone event-study plot | +| `effect()` | `R/effect.R` | user / exported | (internal helpers) | no | Treatment effect decomposition by sub-group | +| `att.cumu()` | `R/cumu.R` | user / exported | (internal helpers) | no | Cumulative ATT computation | +| `diagtest()` | `R/diagtest.R` | `fect.default()` | (statistical computations) | no | Pre-trend, placebo, carryover, equivalence tests | +| `fect_sens()` | `R/fect_sens.R` 
| user / exported | HonestDiDFEct functions | no | Sensitivity analysis | +| `fect_iden()` | `R/fect_iden.R` | user / exported | (internal helpers) | no | Identification analysis | +| `inter_fe_ub()` | `src/ife.cpp` | `fect_fe()` | `panel_factor()`, `fe_ub()`, `Y_demean()` | no | C++ IFE with unbalanced panels (EM algorithm) | +| `inter_fe_mc()` | `src/mc.cpp` | `fect_mc()` | `panel_FE()`, `Y_demean()` | no | C++ matrix completion with nuclear norm | +| `complex_fe_ub()` | `src/cfe.cpp` | `fect_cfe()` | `cfe_iter()`, `Y_demean()` | no | C++ complex FE estimation | +| `panel_factor()` | `src/fe_sub.cpp` | `inter_fe_ub()`, others | SVD routines | no | Extract latent factors via SVD | +| `panel_FE()` | `src/fe_sub.cpp` | `inter_fe_mc()`, others | soft-thresholding | no | Nuclear norm regularization / soft-thresholding | +| `Y_demean()` | `src/fe_sub.cpp` | most C++ estimators | (arma operations) | no | Remove unit and/or time fixed effects | + +--- + +## Data Flow + +```mermaid +%%{init: {'theme': 'neutral'}}%% +graph TD + IN["User Input (formula/data + params)"] + FP["Formula Parsing (fect.formula)"] + PV["Parameter Validation (fect.default)"] + DP["Data Preprocessing (long to T x N matrices)"] + CV{{"CV=TRUE?"}} + CVR["Cross-Validation (fect_cv)"] + OPT["Optimal r/lambda selected"] + NT{{"nevertreated?"}} + NTW["fect_nevertreated() wrapper"] + MR{{"Method?"}} + IFE["fect_fe() -> inter_fe_ub() C++"] + MC["fect_mc() -> inter_fe_mc() C++"] + CFE["fect_cfe() -> complex_fe_ub() C++"] + CI["Counterfactual Imputation (Y.ct)"] + ATT["ATT = Y.obs - Y.ct"] + SE{{"se=TRUE?"}} + BOOT["fect_boot() — resample + re-estimate"] + SECI["SEs, CIs, p-values"] + DIAG["Diagnostic Tests (diagtest)"] + OBJ["S3 Object Assembly (class fect)"] + OUT["Output (print / plot / esplot)"] + + IN --> FP + FP --> PV + PV --> DP + DP --> CV + CV -- yes --> CVR + CVR --> OPT + OPT --> NT + CV -- no --> NT + NT -- yes --> NTW + NTW --> MR + NT -- no --> MR + MR -- ife/fe --> IFE + MR -- mc --> MC + MR 
-- cfe --> CFE + IFE --> CI + MC --> CI + CFE --> CI + CI --> ATT + ATT --> SE + SE -- yes --> BOOT + BOOT --> SECI + SECI --> DIAG + SE -- no --> DIAG + DIAG --> OBJ + OBJ --> OUT +``` + +--- + +## Architectural Patterns + +- **S3 Dispatch with Formula Interface**: `fect()` uses `UseMethod()` to support both formula and direct (Y, D, X) interfaces. `fect.formula()` parses the formula into variable names, `fect.default()` does the computation. Same pattern for `interFE()`. + +- **R/C++ Layered Computation**: All numerically intensive operations (SVD, EM iterations, demeaning, matrix factorization) are implemented in C++ via RcppArmadillo. R handles data wrangling, parameter validation, control flow, and result assembly. The boundary is at the estimation functions: R `fect_fe()` calls C++ `inter_fe_ub()`. + +- **Method-Agnostic Pipeline**: `fect.default()` provides a single preprocessing, CV, estimation, inference, diagnostics pipeline. Method-specific logic is encapsulated in `fect_fe()`, `fect_mc()`, `fect_cfe()`. Adding a new estimation method requires only a new estimation function and a routing entry. + +- **Matrix-Oriented Data Representation**: Panel data is converted from long-form data frames to T x N matrices early in `fect.default()`. Covariates become T x N x p arrays. All downstream computation operates on these matrix forms, enabling efficient C++ computation. + +- **Two-Tier Tolerance**: Cross-validation uses a looser tolerance (`max(tol, 1e-3)`) for speed during hyperparameter search, while final estimation uses the user-specified tolerance for precision. + +- **Parallel Bootstrap via foreach**: `fect_boot()` uses `foreach` with `doParallel`/`doFuture` backends for parallel bootstrap replication. Includes `trim_closure_env()` optimization to reduce serialization overhead by keeping only referenced symbols in function environments. 
+ +- **Counterfactual Imputation as Core Abstraction**: All methods share the same conceptual framework: impute Y(0) for treated units using untreated observations, compute ATT as the gap. FE uses additive fixed effects, IFE adds latent factors (F * L'), MC uses nuclear norm regularization, CFE adds structured covariates. + +- **Never-Treated vs Not-Yet-Treated Estimation Samples**: The package supports two estimation sample strategies. "notyettreated" includes not-yet-treated observations (requiring EM for missing data), "nevertreated" uses only never-treated units (allowing direct SVD). The `fect_nevertreated()` wrapper handles the latter. + +- **Comprehensive Diagnostic Suite**: Built-in tests (F-test, TOST equivalence, placebo, carryover) allow users to validate the parallel trends assumption without external tools. Sensitivity analysis via optional HonestDiDFEct integration. + +--- + +## Notes + +- FE is internally treated as IFE with `r = 0` (zero latent factors). The code sets `method = "ife"` when `method = "fe"` and `r = 0`. +- The `gsynth` method is a compatibility alias that forces `time.component.from = "nevertreated"` and `em = FALSE`, matching the behavior of the gsynth package. +- `boot.R` (4,884 lines) and `plot.R` (5,019 lines) are the two largest files. Both could benefit from modular decomposition in future refactors. +- The `binary` option (probit models) is only available with `method = "ife"` and has dedicated C++ implementations (`binary_qr.cpp`, `binary_svd.cpp`, `binary_sub.cpp`). +- The package uses `fixest::feols()` for initial OLS regression to obtain starting values for iterative estimation. +- Vignettes are organized as a Quarto book (`vignettes/_quarto.yml`) with 9 chapters covering getting started, FE, IFE/MC, CFE, heterogeneous effects, plots, gsynth compatibility, panel diagnostics, and sensitivity analysis. 
+- 10 bundled datasets (`simdata`, `sim_base`, `sim_gsynth`, `sim_linear`, `sim_region`, `sim_trend`, `turnout`, `gs2020`, `hh2019`, `simgsynth`) support examples and testing. +- 11 exported functions and 8 S3 methods registered in NAMESPACE. +- Total R source: 27,872 lines across 27 files. Total C++ source: 4,848 lines across 12 files (plus header). diff --git a/vignettes/01-start.Rmd b/vignettes/01-start.Rmd index d82e13e9..f3f2297b 100644 --- a/vignettes/01-start.Rmd +++ b/vignettes/01-start.Rmd @@ -78,7 +78,7 @@ ls() ### Simulated datasets -The package includes five simulated panel datasets. The first two (`simdata` and `sim_base`) are generated from the data-generating process (DGP) in @LWX2022. Both have $N = 200$ units and $T = 35$ time periods. Treatment switches on and off over time (99 of 150 treated units experience at least one reversal), reflecting a general treatment pattern rather than simple staggered adoption. The remaining three (`sim_trend`, `sim_region`, `sim_linear`) are block DID designs used to demonstrate CFE model components. +The package includes five simulated panel datasets. The first two (`simdata` and `sim_base`) are generated from the data-generating process (DGP) in @LWX2024. Both have $N = 200$ units and $T = 35$ time periods. Treatment switches on and off over time (99 of 150 treated units experience at least one reversal), reflecting a general treatment pattern rather than simple staggered adoption. The remaining three (`sim_trend`, `sim_region`, `sim_linear`) are block DID designs used to demonstrate CFE model components. 
The full DGP for `simdata` is: $$Y_{it} = \tau_{it} D_{it} + X_{1,it} + 3 X_{2,it} + \mu + 3\alpha_i + \xi_t + \lambda_i' f_t + \varepsilon_{it}$$ where $\alpha_i \sim N(0,1)$ are unit fixed effects, $\xi_t$ follows an AR(1) process with drift (time fixed effects), $X_{1,it}$ and $X_{2,it} \sim N(0,1)$ are observed covariates with coefficients 1 and 3, $\lambda_i \in \mathbb{R}^2$ are unit-specific factor loadings drawn from $N(0.5, 1)$, $f_t \in \mathbb{R}^2$ are latent time factors (one trending, one white noise), and $\varepsilon_{it} \sim N(0,2)$. The treatment effect is heterogeneous, i.e., $\tau_{it} \sim N(0.4 \cdot \text{tr\_cum}_{it}/T,\; 0.2)$, where $\text{tr\_cum}_{it}$ counts cumulative treatment periods. The grand mean is $\mu = 5$. diff --git a/vignettes/02-fect.Rmd b/vignettes/02-fect.Rmd index be10b617..8fbd8b90 100644 --- a/vignettes/02-fect.Rmd +++ b/vignettes/02-fect.Rmd @@ -1,6 +1,6 @@ # The Imputation Estimator {#sec-fect} -In this chapter, we illustrate how to use the **fect** package to implement counterfactual estimators (or imputation estimators) and conduct diagnostic tests proposed by @LWX2022 [Paper]. R script used in this chapter can be downloaded [here](https://raw.githubusercontent.com/xuyiqing/fect/dev/vignettes/rscript/02-fect.R). +In this chapter, we illustrate how to use the **fect** package to implement counterfactual estimators (or imputation estimators) and conduct diagnostic tests proposed by @LWX2024 [Paper]. R script used in this chapter can be downloaded [here](https://raw.githubusercontent.com/xuyiqing/fect/dev/vignettes/rscript/02-fect.R). ## Simulated data @@ -347,7 +347,6 @@ We can then visualize the weighted dynamic treatment effects using the inbuilt f plot(out.w, main = "Estimated Weighted ATT") ``` ------------------------------------------------------------------------- ## Additional notes @@ -356,3 +355,20 @@ plot(out.w, main = "Estimated Weighted ATT") 2. 
We can get replicable results by setting the option `seed` to a certain integer, no matter whether the parallel computing is used. 3. When `na.rm = FALSE` (default), the program allows observations to have missing outcomes $Y$ but not $X$ or treatment statuses $D$. When `na.rm = TRUE` the program will drop all observations that have missing values in outcomes, treatments, or covariates. + + +## How to Cite + +If you find these methods helpful, you can cite @LWX2024. + +```bibtex +@article{LWX2024, + title = {A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data}, + author = {Liu, Licheng and Wang, Ye and Xu, Yiqing}, + journal = {American Journal of Political Science}, + volume = {68}, + number = {1}, + pages = {160--176}, + year = {2024} +} +``` diff --git a/vignettes/03-ife-mc.Rmd b/vignettes/03-ife-mc.Rmd index a79c0891..ea6852e5 100644 --- a/vignettes/03-ife-mc.Rmd +++ b/vignettes/03-ife-mc.Rmd @@ -306,3 +306,19 @@ In the above plot, the three periods in blue are dropped from the first-stage es - The `proportion` option controls which pre-treatment periods are included in the tests (default: periods where the number of treated units exceeds `proportion` $\times$ total treated units). - The `tost.threshold` option sets the equivalence range for the TOST test (default: $0.36\hat{\sigma}_\epsilon$). Finding the "right" threshold is often a challenge in empirical research. ::: + +## How to Cite + +If you find these methods helpful, you can cite @LWX2024. 
+ +```bibtex +@article{LWX2024, + title = {A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data}, + author = {Liu, Licheng and Wang, Ye and Xu, Yiqing}, + journal = {American Journal of Political Science}, + volume = {68}, + number = {1}, + pages = {160--176}, + year = {2024} +} +``` diff --git a/vignettes/06-plots.Rmd b/vignettes/06-plots.Rmd index c90c607f..46d6f21f 100644 --- a/vignettes/06-plots.Rmd +++ b/vignettes/06-plots.Rmd @@ -261,7 +261,7 @@ plot(out, type = "counterfactual", ## Pretrend Tests {#sec-pretrend} -We provide two tests that shed light on the parallel trends assumption: the placebo test and the equivalence test. For methodological details, see @sec-fect or @LWX2022. +We provide two tests that shed light on the parallel trends assumption: the placebo test and the equivalence test. For methodological details, see @sec-fect or @LWX2024. ### Placebo test---shape markers @@ -614,3 +614,21 @@ The table below summarizes which parameters apply to each plot type. Parameters | `status.*.color` | --- | --- | --- | --- | --- | --- | Yes | --- | --- | --- | | `xbreaks` / `ybreaks` | Yes | Yes | Yes | Yes | Yes | Yes | --- | --- | --- | Yes | | `xlim` / `ylim` | Yes | Yes | Yes | Yes | Yes | Yes | --- | Yes | --- | Yes | + + + +## How to Cite + +If you find these methods and visualization tools helpful, you can cite @LWX2024. 
+ +```bibtex +@article{LWX2024, + title = {A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data}, + author = {Liu, Licheng and Wang, Ye and Xu, Yiqing}, + journal = {American Journal of Political Science}, + volume = {68}, + number = {1}, + pages = {160--176}, + year = {2024} +} +``` diff --git a/vignettes/07-gsynth.Rmd b/vignettes/07-gsynth.Rmd index 09ef64be..4c7bbb05 100644 --- a/vignettes/07-gsynth.Rmd +++ b/vignettes/07-gsynth.Rmd @@ -498,7 +498,6 @@ print(mspe.comp$summary[, c("Model", "MSPE", "RMSE", "MAD")]) Since `sim_gsynth` follows a pure IFE data generating process with two factors, Models 1 and 2 should produce identical MSPE --- confirming that CFE with `time.component.from = "nevertreated"` and no additional structure is numerically equivalent to gsynth. Model 3, which adds unnecessary linear trends, should produce similar or slightly worse MSPE because the extra parameters add noise without benefit when the true DGP has no unit-specific trends. ------------------------------------------------------------------------- ## Additional Notes @@ -507,3 +506,20 @@ Since `sim_gsynth` follows a pure IFE data generating process with two factors, 2. **Adding Covariates**: Including covariates in the model will significantly slow down the algorithm, as the IFE/MC model requires more time to converge. Users should be aware of this trade-off when incorporating covariates. 3. **Setting `min.T0`**: Setting `min.T0` to a positive value helps. The algorithm will automatically exclude treated units with too few pre-treatment periods. A larger $T_0$ reduces bias in causal estimates and minimizes the risk of severe extrapolation. When running cross-validation to select the number of factors, `min.T0` must be equal to or greater than (`r.max` + 2). Errors frequently occur when there are too few pre-treatment periods, so ensuring adequate $T_0$ (e.g. setting `min.T0 = 5`) is crucial. 
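
The `min.T0` rule in note 3 can be checked on a dataset before estimation. The sketch below is a hypothetical base R helper, not a function exported by **fect**: it counts the periods before each unit's first treated period (an assumption about how $T_0$ is measured that fits staggered adoption but may not match the package's handling of treatment reversals) and flags units that fall below the threshold implied by `r.max`.

```r
# Hypothetical helper (not part of fect): number of periods observed
# before the first treated period, per unit; NA for never-treated units.
count_T0 <- function(data, id, time, D) {
  by_unit <- split(data, data[[id]])
  vapply(by_unit, function(u) {
    u <- u[order(u[[time]]), ]
    first_treated <- which(u[[D]] == 1)[1]
    if (is.na(first_treated)) NA_integer_ else as.integer(first_treated - 1L)
  }, integer(1))
}

# Toy panel with three units and six periods
toy <- data.frame(
  id   = rep(1:3, each = 6),
  time = rep(1:6, times = 3),
  D    = c(0, 0, 0, 0, 1, 1,   # unit 1: four pre-treatment periods
           0, 0, 1, 1, 1, 1,   # unit 2: two pre-treatment periods
           0, 0, 0, 0, 0, 0)   # unit 3: never treated
)

T0     <- count_T0(toy, "id", "time", "D")
r.max  <- 2                    # largest number of factors considered in CV
min.T0 <- r.max + 2            # smallest value allowed by the rule in note 3

# Treated units with too few pre-treatment periods for this min.T0
dropped <- names(T0)[!is.na(T0) & T0 < min.T0]
dropped
```

A check like this shows how many treated units a given `min.T0` would exclude before the same value is passed to the estimation call.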
+ + +## How to Cite + +If you find this method helpful, you can cite @Xu2017. + +```bibtex +@article{Xu2017, + title = {Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models}, + author = {Xu, Yiqing}, + journal = {Political Analysis}, + volume = {25}, + number = {1}, + pages = {57--76}, + year = {2017} +} +``` diff --git a/vignettes/08-panel.Rmd b/vignettes/08-panel.Rmd index c64e6516..55c81ddd 100644 --- a/vignettes/08-panel.Rmd +++ b/vignettes/08-panel.Rmd @@ -10,13 +10,13 @@ editor: source("_common.R") ``` -This chapter, authored by Ziyi Liu and Yiqing Xu, complements @CLLX2025 ([paper](https://yiqingxu.org/papers/english/2023_panel/CLLX.pdf), [slides](https://yiqingxu.org/papers/english/2023_panel/CLLX_slides.pdf)). +This chapter, authored by Ziyi Liu and Yiqing Xu, complements @CLLX2026 ([paper](https://yiqingxu.org/papers/english/2023_panel/CLLX.pdf), [slides](https://yiqingxu.org/papers/english/2023_panel/CLLX_slides.pdf)). Rivka Lipkovitz also contributes to this tutorial. R script used in this chapter can be downloaded [here](https://raw.githubusercontent.com/xuyiqing/fect/dev/vignettes/rscript/08-panel.R). ------------------------------------------------------------------------ In recent years, researchers have proposed various heterogeneous treatment effect (HTE) robust estimators for causal panel analysis under parallel trends (PT) as alternatives to traditional two-way fixed effects (TWFE) models. -Examples include those proposed by @CDLZ, @sun2021-event, @callaway2021-did, @CDH2020, @IKW2023, @BJS2024, and @LWX2022. +Examples include those proposed by @CDLZ, @sun2021-event, @callaway2021-did, @CDH2020, @IKW2023, @BJS2024, and @LWX2024. These methods are closely connected to the classic difference-in-differences (DID) estimator. This chapter will guide you through implementing these HTE-robust estimators, as well as TWFE, in R. 
@@ -596,7 +596,7 @@ fect.output <- as.matrix(out.fect$est.att) head(fect.output) ``` -@BJS2024 also provide the **didimputation** package to estimate the ATT using the same approach as @LWX2022. +@BJS2024 also provide the **didimputation** package to estimate the ATT using the same approach as @LWX2024. ```{r hh_impute, message = FALSE, warning = FALSE, fig.width = 6, fig.height = 4.5, cache=TRUE} df.impute <- df.use @@ -835,7 +835,7 @@ p.pm ### Imputation Method -Now we return to the imputation method proposed by @BJS2024 and @LWX2022. +Now we return to the imputation method proposed by @BJS2024 and @LWX2024. The estimated ATT is 0.127, with a standard error of 0.025. Both are very close to the TWFE estimates. @@ -976,4 +976,16 @@ esplot(data = res_st, main = "Stacked DID", xlim = c(-12,10)) ## How to Cite Please cite the authors of the original papers for their innovations. -If you find this tutorial helpful, you can cite @CLLX2025. +If you find this tutorial helpful, you can cite @CLLX2026. + +```bibtex +@article{CLLX2026, + title={Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study}, + author={Chiu, Albert and Lan, Xingchen and Liu, Ziyi and Xu, Yiqing}, + journal={American Political Science Review}, + volume={120}, + number={1}, + pages={245--266}, + year={2026} +} +``` diff --git a/vignettes/09-sens.Rmd b/vignettes/09-sens.Rmd index c1e96ee4..c9bdd06a 100644 --- a/vignettes/09-sens.Rmd +++ b/vignettes/09-sens.Rmd @@ -199,4 +199,26 @@ In this figure, different lines/bands will represent the robust CIs for $M=0$ (s ## How to Cite -Please cite @rambachan2023more for their original contribution to the sensitivity analysis framework for causal panel analysis. If you find this tutorial helpful, you can cite @CLLX2025. +Please cite @rambachan2023more for the original contribution on sensitivity analysis in causal panel analysis, and @CLLX2026 for adapting it to the counterfactual estimator framework. 
+ +```bibtex +@article{rambachan2023more, + title={A more credible approach to parallel trends}, + author={Rambachan, Ashesh and Roth, Jonathan}, + journal={Review of Economic Studies}, + volume={90}, + number={5}, + pages={2555--2591}, + year={2023} +} + +@article{CLLX2026, + title={Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study}, + author={Chiu, Albert and Lan, Xingchen and Liu, Ziyi and Xu, Yiqing}, + journal={American Political Science Review}, + volume={120}, + number={1}, + pages={245--266}, + year={2026} +} +``` diff --git a/vignettes/bb-updates.Rmd b/vignettes/bb-updates.Rmd index 54f273f3..d031c9e0 100644 --- a/vignettes/bb-updates.Rmd +++ b/vignettes/bb-updates.Rmd @@ -2,6 +2,8 @@ ## v2.2.0 +(2026-03-27) CRAN release. + * Added `time.component.from` parameter: `"notyettreated"` (default) or `"nevertreated"` controls which units provide the time-varying model components (time fixed effects, latent factors, and temporal dynamics). Replaces `method = "gsynth"` with `method = "ife", time.component.from = "nevertreated"`. * CFE estimator now supports `time.component.from = "nevertreated"`, enabling full CFE model components (additional FEs, $Z/\gamma$, $Q/\kappa$, latent factors) with never-treated estimation. * Added `fect_mspe()` for out-of-sample model comparison (MSPE, RMSE, MAD) across specifications. 
diff --git a/vignettes/index.qmd b/vignettes/index.qmd index ca4dc3bb..641bbc17 100644 --- a/vignettes/index.qmd +++ b/vignettes/index.qmd @@ -6,9 +6,9 @@ This Quarto book serves as a user manual for the **fect** package in R, which im - @Xu2017 for Gsynth \[Paper\] -- @LWX2022 for counterfactual estimators \[Paper\] +- @LWX2024 for counterfactual estimators \[Paper\] -- @CLLX2025 for a survey of the new DID estimators \[Paper\] +- @CLLX2026 for a survey of the new DID estimators \[Paper\] ::: {.callout-note appearance="simple"} ### Source Code @@ -31,7 +31,7 @@ However, these counterfactual estimators come with important limitations: - They generally do not accommodate dynamic treatment assignment given past outcomes or covariates---i.e., "feedback"---based on sequential ignorability.\ - Methods for continuous treatments are still underdeveloped and are not currently covered by **fect**. -@CLLX2025 reanalyze 49 published studies in political science and offer justifications for adopting these estimators. +@CLLX2026 reanalyze 49 published studies in political science and offer justifications for adopting these estimators. ## Why the Merge? @@ -122,9 +122,27 @@ The following individuals (and AI) have contributed to **gsynth** and **fect**, - [Rivka Lipkovitz](https://rivka.me/) (Undergraduate at MIT) - [StatsClaw](https://github.com/xuyiqing/StatsClaw) (Agentic System for Statistical Software Development) + + +## How to Cite + +To cite the **fect** package or this user manual, please use: + +> Xu, Yiqing, Licheng Liu, Ye Wang, Ziyi Liu, Shijian Liu, Tianzhu Qin, Jinwen Wu, and Rivka Lipkovitz. 2026. 
*fect: Fixed Effects Counterfactual Estimators --- User Manual (v2.2.0).* + +```bibtex +@manual{fect2026, + title = {fect: Fixed Effects Counterfactual Estimators --- User Manual}, + author = {Xu, Yiqing and Liu, Licheng and Wang, Ye and Liu, Ziyi and Liu, Shijian and Qin, Tianzhu and Wu, Jinwen and Lipkovitz, Rivka}, + year = {2026}, + note = {R package version 2.2.0}, + url = {https://yiqingxu.org/packages/fect/} +} +``` + ## Report Bugs -Please report any bugs by submitting an issue on [GitHub](https://github.com/xuyiqing/fect/issues) or emailing me (yiqingxu \[at\] stanford.edu). We'd really appreciate it if you can include your minimally replicable code & data file and a **panelView** treatment status plot. Your feedback is highly valued! +Please report any bugs by submitting an issue on [GitHub](https://github.com/xuyiqing/fect/issues) or emailing me (yiqingxu \[at\] stanford.edu). We'd really appreciate it if you can include your minimally replicable code & data file and a [**panelView**](https://yiqingxu.org/packages/panelview/) treatment status plot. Your feedback is highly valued! 
@@ -136,6 +154,9 @@ Please report any bugs by submitting an issue on [GitHub](https://github.com/xuy --> ``` -**gsynth** (retiring): [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [CRAN status](https://CRAN.R-project.org/package=gsynth) [downloads: CRAN](https://cran.r-project.org/web/packages/gsynth/index.html) +**gsynth** (wrapper): [![Lifecycle: stable](https://lifecycle.r-lib.org/articles/figures/lifecycle-stable.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [CRAN status](https://CRAN.R-project.org/package=gsynth) [downloads: CRAN](https://cran.r-project.org/web/packages/gsynth/index.html) + +**panelView**: +[![Lifecycle: stable](https://lifecycle.r-lib.org/articles/figures/lifecycle-stable.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [CRAN status](https://CRAN.R-project.org/package=panelView) [downloads: CRAN](https://cran.r-project.org/package=panelView) diff --git a/vignettes/references.bib b/vignettes/references.bib index 16f011fc..14a6c8c4 100644 --- a/vignettes/references.bib +++ b/vignettes/references.bib @@ -1,4 +1,4 @@ -@article{CLLX2025, +@article{CLLX2026, title={Causal Panel Analysis under Parallel Trends: Lessons from A Large Reanalysis Study}, author={Chiu, Albert and Lan, Xingchen and Liu, Ziyi and Xu, Yiqing}, journal={American Political Science Review}, @@ -15,7 +15,7 @@ @article{li2025benchmarking publisher={OSF} } -@ARTICLE{LWX2022, +@ARTICLE{LWX2024, title = "A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data", author = "Liu, Licheng and Wang, Ye and Xu, Yiqing", journal = "American Journal of Political Science", diff --git a/vignettes/vignettes.Rproj b/vignettes/vignettes.Rproj 
deleted file mode 100644 index 8e3c2ebc..00000000 --- a/vignettes/vignettes.Rproj +++ /dev/null @@ -1,13 +0,0 @@ -Version: 1.0 - -RestoreWorkspace: Default -SaveWorkspace: Default -AlwaysSaveHistory: Default - -EnableCodeIndexing: Yes -UseSpacesForTab: Yes -NumSpacesForTab: 2 -Encoding: UTF-8 - -RnwWeave: Sweave -LaTeX: pdfLaTeX