From 4062750494c5dcd217fa4bd603163d2e42e05c4c Mon Sep 17 00:00:00 2001
From: Daniel Nachun <daniel.nachun@gmail.com>
Date: Mon, 22 Jun 2026 16:30:03 -0700
Subject: [PATCH] new S4 updates

---
 code/SoS/pecotmr_integration/colocboost.ipynb |   8 +-
 code/SoS/pecotmr_integration/ctwas.ipynb      | 102 ++++++
 code/SoS/pecotmr_integration/enloc.ipynb      |  89 +++++
 .../pecotmr_integration/fine_mapping.ipynb    |  23 +-
 .../pecotmr_integration/gwas_sumstats.ipynb   |  80 ++++
 .../SoS/pecotmr_integration/qtl_dataset.ipynb |   8 +-
 code/SoS/pecotmr_integration/twas.ipynb       |  80 ++++
 .../pecotmr_integration/twas_weights.ipynb    |   8 +-
 code/script/pecotmr_integration/coloc.R       |  96 +++++
 code/script/pecotmr_integration/colocboost.R  | 128 ++++++-
 .../pecotmr_integration/ctwas_assemble.R      | 130 +++++++
 code/script/pecotmr_integration/ctwas_est.R   |  78 ++++
 .../pecotmr_integration/ctwas_finemap.R       |  87 +++++
 .../script/pecotmr_integration/fine_mapping.R | 116 ++++--
 .../gwas_sumstats_construct.R                 | 189 ++++++++++
 .../legacy_ctwas_weights_to_s4.R              | 157 ++++++++
 .../legacy_twas_weights_to_s4.R               | 206 +++++++++++
 .../qtl_dataset_construct.R                   | 168 ++++++---
 .../pecotmr_integration/qtl_enrichment.R      |  68 ++++
 code/script/pecotmr_integration/twas.R        |  62 ++++
 .../script/pecotmr_integration/twas_weights.R |  79 ++--
 dev/pecotmr_migration_survey.md               |  87 +++++
 dev/python-replacement-scope.md               | 174 +++++++++
 dev/s4_migration_plan.md                      | 345 ++++++++++++++++++
 24 files changed, 2419 insertions(+), 149 deletions(-)
 create mode 100644 code/SoS/pecotmr_integration/ctwas.ipynb
 create mode 100644 code/SoS/pecotmr_integration/enloc.ipynb
 create mode 100644 code/SoS/pecotmr_integration/gwas_sumstats.ipynb
 create mode 100644 code/SoS/pecotmr_integration/twas.ipynb
 create mode 100644 code/script/pecotmr_integration/coloc.R
 create mode 100644 code/script/pecotmr_integration/ctwas_assemble.R
 create mode 100644 code/script/pecotmr_integration/ctwas_est.R
 create mode 100644 code/script/pecotmr_integration/ctwas_finemap.R
 create mode 100644 code/script/pecotmr_integration/gwas_sumstats_construct.R
 create mode 100644 code/script/pecotmr_integration/legacy_ctwas_weights_to_s4.R
 create mode 100644 code/script/pecotmr_integration/legacy_twas_weights_to_s4.R
 create mode 100644 code/script/pecotmr_integration/qtl_enrichment.R
 create mode 100644 code/script/pecotmr_integration/twas.R
 create mode 100644 dev/pecotmr_migration_survey.md
 create mode 100644 dev/python-replacement-scope.md
 create mode 100644 dev/s4_migration_plan.md

diff --git a/code/SoS/pecotmr_integration/colocboost.ipynb b/code/SoS/pecotmr_integration/colocboost.ipynb
index a5d94d90..f0e2e4b0 100644
--- a/code/SoS/pecotmr_integration/colocboost.ipynb
+++ b/code/SoS/pecotmr_integration/colocboost.ipynb
@@ -6,7 +6,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "# Per-gene colocboost across contexts in a single study\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::colocboostPipeline()` per focal gene. Each task colocalizes the focal gene's signal across the QtlDataset's contexts (the within-study, cross-context case \u2014 no GWAS input). Gene-level parallelization is via SoS task fan-out: every task loads the same RDS and runs one focal gene.\n\n## Inputs\n\n- `--qtl-dataset` &mdash; path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes` &mdash; explicit gene-ID list (one focal gene per task).\n- `--cis-window` &mdash; bp window around each focal gene's TSS for variant selection. Default 1,000,000.\n\n## Output\n\n- `{cwd}/{study}.{gene}.colocboost.rds` per gene.\n"
+    "# Per-gene / per-region colocboost across contexts (and optionally GWAS) in a single study\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::colocboostPipeline()` per fan-out unit (gene or region). Three pipeline variants can be enabled independently:\n\n- `--xqtl-coloc` (default ON; opt out with `--no-xqtl-coloc=true` or `xqtl_coloc=False` in SoS): within-study cross-context QTL coloc.\n- `--joint-gwas` (default OFF): xQTL + GWAS joint coloc with the focal outcome from the GWAS. Requires `--gwas-sumstats`.\n- `--separate-gwas` (default OFF): one focal-GWAS coloc per GWAS study. Requires `--gwas-sumstats`.\n\nWhen `--gwas-sumstats` is supplied it must point at a QC'd `GwasSumStats` RDS (output of `summaryStatsQc()` upstream).\n\n## Inputs\n\n- `--qtl-dataset` &mdash; QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes ID1 ID2 \u2026` **or** `--regions chr:start-end \u2026` (mutually exclusive).\n- `--cis-window` &mdash; bp window (gene mode default 1e6).\n- `--gwas-sumstats` &mdash; QC'd GwasSumStats RDS path (required for joint/separate variants).\n- `--xqtl-coloc` / `--joint-gwas` / `--separate-gwas` &mdash; pipeline-variant toggles.\n- `--modular-script-dir` &mdash; directory containing the per-gene worker R scripts. Default `code/script`.\n\n## Output\n\n- `{cwd}/{study}.{gene|region}.colocboost.rds` per fan-out unit. The RDS holds the colocboostPipeline list with slots `xqtl_coloc`, `joint_gwas`, `separate_gwas`, `computing_time`; unused-variant slots are NULL.\n"
    ]
   },
   {
@@ -15,7 +15,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "## Example\n\n```bash\nsos run pipeline/colocboost.ipynb colocboost \\\n    --cwd output \\\n    --study ROSMAP_DLPFC \\\n    --qtl-dataset output/ROSMAP_DLPFC.qtl_dataset.rds \\\n    --genes ENSG00000060237 ENSG00000234593 \\\n    --cis-window 1000000\n```\n"
+    "## Example\n\n```bash\n# xQTL-only (default)\nsos run pipeline/colocboost.ipynb colocboost \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --study TEST_STUDY \\\n    --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n    --genes ENSG00000060237\n\n# xQTL + joint-GWAS\nsos run pipeline/colocboost.ipynb colocboost \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --study TEST_STUDY \\\n    --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n    --gwas-sumstats output/GWAS.qcd.rds \\\n    --joint-gwas True \\\n    --genes ENSG00000060237\n```\n"
    ]
   },
   {
@@ -26,7 +26,7 @@
    },
    "outputs": [],
    "source": [
-    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: genes = []\nparameter: cis_window = 1000000\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '1h'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
+    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: modular_script_dir = path('code/script')\n# Mutually exclusive fan-out sources.\nparameter: genes = []\nparameter: regions = []\nparameter: cis_window = 1000000\n# GWAS-side input: a QC'd GwasSumStats RDS (build it upstream).\nparameter: gwas_sumstats = path('.')\n# Pipeline variants.\nparameter: xqtl_coloc = True       # within-study cross-context (default ON)\nparameter: joint_gwas = False      # xQTL + GWAS joint (requires gwas_sumstats)\nparameter: separate_gwas = False   # per-GWAS-study separate-focal (requires gwas_sumstats)\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
    ]
   },
   {
@@ -37,7 +37,7 @@
    },
    "outputs": [],
    "source": [
-    "[colocboost]\ninput: qtl_dataset, for_each = 'genes'\noutput: f\"{cwd}/{study}.{_genes}.colocboost.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${path(\"code/script/pecotmr_integration/colocboost.R\", absolute=True)} \\\n        --qtl-dataset ${_input} \\\n        --gene-id ${_genes} \\\n        --cis-window ${cis_window} \\\n        --output ${_output}\n"
+    "[colocboost]\nif bool(genes) == bool(regions):\n    raise ValueError(\"Specify exactly one of --genes (trait IDs) or --regions (chr:start-end strings).\")\nif (joint_gwas or separate_gwas) and not gwas_sumstats.is_file():\n    raise ValueError(\"--joint-gwas / --separate-gwas require --gwas-sumstats pointing at a QC'd GwasSumStats RDS.\")\nif not (xqtl_coloc or joint_gwas or separate_gwas):\n    raise ValueError(\"Enable at least one of xqtl_coloc, joint_gwas, separate_gwas.\")\nfanout_items = genes if genes else regions\nfanout_kind  = 'gene' if genes else 'region'\ndef _fanout_safe(s):\n    return s.replace(':', '_').replace('-', '_')\ninput: qtl_dataset, for_each = 'fanout_items'\noutput: f\"{cwd}/{study}.{_fanout_safe(_fanout_items)}.colocboost.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/colocboost.R \\\n        --qtl-dataset ${_input} \\\n        ${('--gene-id ' + _fanout_items + ' --cis-window ' + str(cis_window)) if fanout_kind == 'gene' else ('--region ' + _fanout_items)} \\\n        ${'--gwas-sumstats ' + str(gwas_sumstats) if gwas_sumstats.is_file() else ''} \\\n        ${'--no-xqtl-coloc' if not xqtl_coloc else ''} \\\n        ${'--joint-gwas' if joint_gwas else ''} \\\n        ${'--separate-gwas' if separate_gwas else ''} \\\n        --output ${_output}\n"
    ]
   }
  ],
diff --git a/code/SoS/pecotmr_integration/ctwas.ipynb b/code/SoS/pecotmr_integration/ctwas.ipynb
new file mode 100644
index 00000000..4a338241
--- /dev/null
+++ b/code/SoS/pecotmr_integration/ctwas.ipynb
@@ -0,0 +1,102 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "# cTWAS via the three-step pecotmr API\n\n## Description\n\nDrives a multi-LD-block cTWAS analysis through `pecotmr`'s split pipeline:\n\n- `[ctwas_1]` &mdash; `assembleCtwasInputs`: consume the per-region manifest, load each block's `GwasSumStats` + per-gene `TwasWeights` RDS files, produce the assembled ctwas inputs (`z_snp`, `weights`, `region_info`, `snp_map`, `LD_map`, LD/snpInfo loader closures).\n- `[ctwas_2]` &mdash; `estCtwasParam`: estimate the joint group prior + prior variance (`ctwas::est_param`). `--fallback-to-prefit` recovers from accurate-EM NaN divergence by re-running the prefit step only (mirrors the legacy ctwas_2 workaround).\n- `[ctwas_3]` &mdash; `screenCtwasRegions` + `finemapCtwasRegions`: screen regions and fine-map causal genes; optional `--param-override` RDS lets the caller substitute hand-tuned priors before screening.\n\nThe three steps chain via intermediate RDS files (`{name}.ctwas_inputs.rds`, `{name}.ctwas_est.rds`, `{name}.ctwas_finemap.rds`) so individual stages can be re-run independently.\n\n## Inputs\n\n- `--manifest` &mdash; per-region manifest TSV with columns:\n    - `region_id` (unique string)\n    - `gwas_sumstats_rds` (path to per-block `GwasSumStats` RDS from `gwas_sumstats_construct.R`)\n    - `twas_weights_rds` (comma-separated per-gene `TwasWeights` RDS paths for that block; may be empty for SNP-only blocks)\n    - `fine_mapping_result_rds` (optional, comma-separated)\n- Optional weight pre-filters (`--twas-weight-cutoff`, `--cs-min-cor`, `--min-pip-cutoff`, `--max-num-variants`).\n- Optional `--twas-z` precomputed TWAS-Z `GRanges` RDS (output of `twas.ipynb`).\n- Step 2 knobs: `--thin`, `--niter-prefit`, `--niter`, `--min-group-size`, `--min-p-single-effect`, `--fallback-to-prefit`.\n- Step 3 knobs: `--L`, `--no-filter-L`, `--min-nonsnp-pip`, `--param-override`.\n\n## Output\n\n- `{cwd}/{name}.ctwas_inputs.rds` (step 1)\n- `{cwd}/{name}.ctwas_est.rds` (step 2)\n- `{cwd}/{name}.ctwas_finemap.rds` (step 3; ctwas_sumstats-shape result)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "## Example\n\n```bash\nsos run pipeline/ctwas.ipynb ctwas_3 \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --name protocol_example_twas_chr22 \\\n    --manifest /tmp/gwas_smoke/ctwas_manifest_blessed.tsv \\\n    --fallback-to-prefit \\\n    --min-group-size 1\n```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[global]\nparameter: cwd = path('output')\nparameter: name = str\nparameter: manifest = path\n# --- step 1 (assemble) -----------------------------------------------\nparameter: method = ''\nparameter: twas_z = path('.')\nparameter: twas_weight_cutoff = 0.0\nparameter: cs_min_cor = 0.8\nparameter: min_pip_cutoff = 0.0\nparameter: max_num_variants = 'Inf'\n# --- step 2 (est_param) ----------------------------------------------\nparameter: thin = 0.1\nparameter: niter_prefit = 3\nparameter: niter = 30\nparameter: group_prior_var_structure = 'shared_type'\nparameter: min_group_size = 100\nparameter: min_p_single_effect = 0.8\nparameter: fallback_to_prefit = False\n# --- step 3 (screen + finemap) ---------------------------------------\nparameter: L = 5\nparameter: no_filter_L = False\nparameter: min_nonsnp_pip = 0.5\nparameter: param_override = path('.')\n# --- infrastructure --------------------------------------------------\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '1h'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[ctwas_1 (assemble inputs)]\ninput: manifest\noutput: f\"{cwd}/{name}.ctwas_inputs.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/ctwas_assemble.R \\\n        --manifest ${_input} \\\n        ${'--method ' + method if method else ''} \\\n        ${'--twas-z ' + str(twas_z) if twas_z.is_file() else ''} \\\n        --twas-weight-cutoff ${twas_weight_cutoff} \\\n        --cs-min-cor ${cs_min_cor} \\\n        --min-pip-cutoff ${min_pip_cutoff} \\\n        --max-num-variants ${max_num_variants} \\\n        --output ${_output}\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[ctwas_2 (estimate priors)]\noutput: f\"{cwd}/{name}.ctwas_est.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/ctwas_est.R \\\n        --inputs ${_input} \\\n        --thin ${thin} \\\n        --niter-prefit ${niter_prefit} \\\n        --niter ${niter} \\\n        --group-prior-var-structure ${group_prior_var_structure} \\\n        --min-group-size ${min_group_size} \\\n        --min-p-single-effect ${min_p_single_effect} \\\n        ${'--fallback-to-prefit' if fallback_to_prefit else ''} \\\n        --ncore ${numThreads} \\\n        --output ${_output}\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[ctwas_3 (screen + finemap)]\noutput: f\"{cwd}/{name}.ctwas_finemap.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/ctwas_finemap.R \\\n        --est ${_input} \\\n        ${'--param-override ' + str(param_override) if param_override.is_file() else ''} \\\n        --L ${L} \\\n        ${'--no-filter-L' if no_filter_L else ''} \\\n        --min-nonsnp-pip ${min_nonsnp_pip} \\\n        --ncore ${numThreads} \\\n        --output ${_output}\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "SoS",
+   "language": "sos",
+   "name": "sos"
+  },
+  "language_info": {
+   "codemirror_mode": "sos",
+   "file_extension": ".sos",
+   "mimetype": "text/x-sos",
+   "name": "sos",
+   "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
+   "pygments_lexer": "sos"
+  },
+  "sos": {
+   "kernels": [
+    [
+     "Bash",
+     "bash",
+     "Bash",
+     "#E6EEFF",
+     "shell"
+    ],
+    [
+     "SoS",
+     "sos",
+     "",
+     "",
+     "sos"
+    ]
+   ],
+   "version": "0.24.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/code/SoS/pecotmr_integration/enloc.ipynb b/code/SoS/pecotmr_integration/enloc.ipynb
new file mode 100644
index 00000000..26d58dcd
--- /dev/null
+++ b/code/SoS/pecotmr_integration/enloc.ipynb
@@ -0,0 +1,89 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "# xQTL–GWAS enrichment + colocalization\n\n## Description\n\nReplaces the legacy `SuSiE_enloc.ipynb` two-step `[xqtl_gwas_enrichment]` / `[susie_coloc]` flow with two thin SoS steps backed by the new pecotmr S4 API. Pairs an xQTL-side `QtlFineMappingResult` with a GWAS-side `GwasFineMappingResult` (or `GwasSumStats`) and runs the two pipelines back-to-back:\n\n- `[enrichment]` &mdash; `qtlEnrichmentPipeline(gwasFineMappingResult, qtlFineMappingResult, ...)` produces a per-(gwasStudy, qtlContext) enrichment data.frame.\n- `[coloc]` &mdash; `colocPipeline(qtlFineMappingResult, gwasInput, ..., enrichment = <step1 output>)` runs `coloc::coloc.bf_bf` per (QTL tuple, GWAS tuple, CS pair) with enrichment-scaled `p12` priors (the \"enloc\" variant). Pass `--skip-enrichment` to run plain coloc with the default p12.\n\nBecause the S4 FMR collections already encapsulate every (study, context, trait, method) tuple across regions, the workflow is two single-call steps &mdash; no per-gene fan-out, no Python overlap helpers, no manifest. Per-region fan-out (and the per-region coloc loop) lives inside `colocPipeline` itself.\n\n## Inputs\n\n- `--qtl-fine-mapping` &mdash; path to S4 `QtlFineMappingResult` RDS (output of `fine_mapping.ipynb` / `fineMappingPipeline`).\n- `--gwas-fine-mapping` &mdash; path to S4 `GwasFineMappingResult` RDS (also a `fineMappingPipeline` output). `colocPipeline` will also accept a `GwasSumStats` RDS and inline-fine-map it.\n- `--name` &mdash; identifier used in the output filenames.\n- Optional enrichment knobs (`--num-gwas`, `--pi-qtl`, `--lambda`, `--imp-n`).\n- Optional coloc knobs (`--p1`, `--p2`, `--p12`, `--p12-max`, `--filter-lbf-cs`, `--filter-lbf-cs-secondary`, `--no-adjust-pips`).\n- `--skip-enrichment` &mdash; skip step 1 and run coloc with the default p12 (the non-enloc variant).\n\n## Outputs\n\n- `{cwd}/{name}.enrichment.rds` &mdash; data.frame, one row per (gwasStudy, qtlContext) pair (omitted when `--skip-enrichment`).\n- `{cwd}/{name}.coloc.rds` &mdash; data.frame, one row per (QTL tuple, GWAS tuple, CS pair).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": "## Example\n\n```bash\nsos run pipeline/enloc.ipynb susie_enloc \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --name protocol_example_enloc \\\n    --qtl-fine-mapping  output/fine_mapping/protocol_example.qtl_finemap.rds \\\n    --gwas-fine-mapping output/fine_mapping/protocol_example.gwas_finemap.rds\n```\n"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[global]\nparameter: cwd = path('output')\nparameter: name = str\nparameter: qtl_fine_mapping = path\nparameter: gwas_fine_mapping = path\n# --- step 1 (enrichment) --------------------------------------------\nparameter: skip_enrichment = False\nparameter: num_gwas = -1.0    # negative → NULL pass-through\nparameter: pi_qtl   = -1.0    # negative → NULL pass-through\nparameter: lambda_  = 1.0\nparameter: imp_n    = 25\n# --- step 2 (coloc) -------------------------------------------------\nparameter: p1                       = 1e-4\nparameter: p2                       = 1e-4\nparameter: p12                      = 5e-6\nparameter: p12_max                  = 1e-3\nparameter: filter_lbf_cs            = False\nparameter: filter_lbf_cs_secondary  = -1.0   # negative → NULL\nparameter: no_adjust_pips           = False\nparameter: finemapping_methods      = 'susie'\n# --- infrastructure -------------------------------------------------\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[susie_enloc_1 (enrichment)]\nstop_if(skip_enrichment, '--skip-enrichment set; skipping qtlEnrichmentPipeline.')\ninput: qtl_fine_mapping, gwas_fine_mapping\noutput: f\"{cwd}/{name}.enrichment.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/qtl_enrichment.R \\\n        --qtl-fine-mapping ${qtl_fine_mapping} \\\n        --gwas-fine-mapping ${gwas_fine_mapping} \\\n        ${'--num-gwas ' + str(num_gwas) if num_gwas >= 0 else ''} \\\n        ${'--pi-qtl ' + str(pi_qtl) if pi_qtl >= 0 else ''} \\\n        --lambda ${lambda_} \\\n        --imp-n ${imp_n} \\\n        --ncore ${numThreads} \\\n        --output ${_output}\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[susie_enloc_2 (coloc)]\nenrich_path = f\"{cwd}/{name}.enrichment.rds\"\nenrich_arg  = f\"--enrichment {enrich_path}\" if not skip_enrichment else ''\ninput: qtl_fine_mapping, gwas_fine_mapping\noutput: f\"{cwd}/{name}.coloc.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/coloc.R \\\n        --qtl-fine-mapping ${qtl_fine_mapping} \\\n        --gwas-input ${gwas_fine_mapping} \\\n        ${enrich_arg} \\\n        ${'--filter-lbf-cs' if filter_lbf_cs else ''} \\\n        ${'--filter-lbf-cs-secondary ' + str(filter_lbf_cs_secondary) if filter_lbf_cs_secondary >= 0 else ''} \\\n        --p1 ${p1} \\\n        --p2 ${p2} \\\n        --p12 ${p12} \\\n        --p12-max ${p12_max} \\\n        ${'--no-adjust-pips' if no_adjust_pips else ''} \\\n        --finemapping-methods ${finemapping_methods} \\\n        --output ${_output}\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "SoS",
+   "language": "sos",
+   "name": "sos"
+  },
+  "language_info": {
+   "codemirror_mode": "sos",
+   "file_extension": ".sos",
+   "mimetype": "text/x-sos",
+   "name": "sos",
+   "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
+   "pygments_lexer": "sos"
+  },
+  "sos": {
+   "kernels": [
+    [
+     "Bash",
+     "bash",
+     "Bash",
+     "#E6EEFF",
+     "shell"
+    ],
+    [
+     "SoS",
+     "sos",
+     "",
+     "",
+     "sos"
+    ]
+   ],
+   "version": "0.24.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/code/SoS/pecotmr_integration/fine_mapping.ipynb b/code/SoS/pecotmr_integration/fine_mapping.ipynb
index 9886e42d..31f36f25 100644
--- a/code/SoS/pecotmr_integration/fine_mapping.ipynb
+++ b/code/SoS/pecotmr_integration/fine_mapping.ipynb
@@ -5,18 +5,14 @@
    "metadata": {
     "kernel": "SoS"
    },
-   "source": [
-    "# Per-gene fine-mapping against a pre-built QtlDataset\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::fineMappingPipeline(methods = \"susie\", ...)` per gene. Gene-level parallelization is via SoS task fan-out: each task loads the same RDS and fine-maps one gene. No pecotmr-internal accessors are called from this notebook \u2014 the wrapper hands the QtlDataset and a single `--gene-id` to the pipeline and writes one RDS per gene.\n\n## Inputs\n\n- `--qtl-dataset` &mdash; path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes` &mdash; explicit gene-ID list (space-separated CLI args).\n- `--cis-window` &mdash; bp window around each trait's TSS for variant selection. Default 1,000,000.\n- `--coverage` &mdash; SuSiE credible-set coverage. Default 0.95.\n\n## Output\n\n- `{cwd}/{study}.{gene}.finemap.rds` per gene.\n"
-   ]
+   "source": "# SuSiE fine-mapping (QTL or GWAS)\n\n## Description\n\nTwo workflows, both backed by `pecotmr::fineMappingPipeline` (which dispatches on the S4 input class):\n\n- **`fine_mapping`** &mdash; per-gene / per-region fine-mapping over a pre-built `QtlDataset` RDS. Fan-out goes by `--genes` (list of trait IDs, gene mode) **or** `--regions` (list of `chr:start-end` strings, region mode); exactly one must be supplied. Each task loads the same QtlDataset RDS and fine-maps one unit.\n- **`fine_mapping_gwas`** &mdash; per-block fine-mapping over a list of per-block `GwasSumStats` RDS paths (typically produced by `gwas_sumstats_construct.R` / `gwas_sumstats.ipynb`). One task per RDS; each task produces one `GwasFineMappingResult` RDS. Downstream consumers (e.g. `enloc.ipynb`) concatenate the per-block outputs as needed.\n\n## Inputs\n\n### `fine_mapping` (QTL)\n\n- `--qtl-dataset` &mdash; path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes ID1 ID2 …` **or** `--regions chr:start-end …` &mdash; fan-out targets (mutually exclusive).\n- `--cis-window` &mdash; bp window around each gene's TSS (gene mode only). Default 1,000,000.\n- `--methods` &mdash; comma-separated fine-mapping method tokens. Default `susie`.\n- `--coverage` &mdash; SuSiE credible-set coverage. Default 0.95.\n\n### `fine_mapping_gwas` (GWAS)\n\n- `--gwas-sumstats-list S1.rds S2.rds …` &mdash; per-block GwasSumStats RDS paths to fan out over.\n- `--methods` &mdash; comma-separated fine-mapping method tokens. Default `susie`.\n- `--coverage` &mdash; SuSiE credible-set coverage. Default 0.95.\n\n### Both\n\n- `--cwd` &mdash; output directory. Default `output`.\n- `--study` &mdash; study label used in the output filename.\n- `--modular-script-dir` &mdash; directory containing the per-task worker R scripts. Default `code/script`; override when SoS is invoked from a working directory that doesn't contain them.\n\n## Outputs\n\n- `fine_mapping`: `{cwd}/{study}.{gene|region}.finemap.rds` per fan-out unit (region strings are sanitised: `:` and `-` become `_`).\n- `fine_mapping_gwas`: `{cwd}/{study}.{bn-of-gwas-sumstats-rds}.gwas_finemap.rds` per input RDS.\n"
   },
   {
    "cell_type": "markdown",
    "metadata": {
     "kernel": "SoS"
    },
-   "source": [
-    "## Example\n\n```bash\nsos run pipeline/fine_mapping.ipynb fine_mapping \\\n    --cwd output \\\n    --study ROSMAP_DLPFC \\\n    --qtl-dataset output/ROSMAP_DLPFC.qtl_dataset.rds \\\n    --genes ENSG00000060237 ENSG00000234593 ENSG00000168487 \\\n    --cis-window 1000000\n```\n"
-   ]
+   "source": "## Example\n\nQTL (per-gene):\n```bash\nsos run pipeline/fine_mapping.ipynb fine_mapping \\\n    --cwd output \\\n    --modular-script-dir /path/to/xqtl-protocol/code/script \\\n    --study TEST_STUDY \\\n    --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n    --genes ENSG00000060237 ENSG00000234593\n```\n\nGWAS (per-block, list of GwasSumStats RDS):\n```bash\nsos run pipeline/fine_mapping.ipynb fine_mapping_gwas \\\n    --cwd output \\\n    --modular-script-dir /path/to/xqtl-protocol/code/script \\\n    --study TEST_STUDY \\\n    --gwas-sumstats-list output/TEST_STUDY.block1.gwas_sumstats.rds \\\n                         output/TEST_STUDY.block2.gwas_sumstats.rds\n```\n"
   },
   {
    "cell_type": "code",
@@ -25,9 +21,7 @@
     "kernel": "SoS"
    },
    "outputs": [],
-   "source": [
-    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\n# Explicit gene list; for large fan-outs use the standard SoS @file.txt\n# CLI sugar (one gene per line in the file).\nparameter: genes = []\nparameter: cis_window = 1000000\nparameter: coverage = 0.95\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
-   ]
+   "source": "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: modular_script_dir = path('code/script')\n# --- fine_mapping (QTL) ----------------------------------------------\nparameter: qtl_dataset = path('.')\nparameter: genes = []\nparameter: regions = []\nparameter: cis_window = 1000000\n# --- fine_mapping_gwas -----------------------------------------------\nparameter: gwas_sumstats_list = []\n# --- shared ----------------------------------------------------------\nparameter: methods = 'susie'\nparameter: coverage = 0.95\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
   },
   {
    "cell_type": "code",
@@ -36,9 +30,14 @@
     "kernel": "SoS"
    },
    "outputs": [],
-   "source": [
-    "[fine_mapping]\n# One task per gene; every task loads the same QtlDataset RDS once and\n# runs fineMappingPipeline against one traitId.\ninput: qtl_dataset, for_each = 'genes'\noutput: f\"{cwd}/{study}.{_genes}.finemap.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${path(\"code/script/pecotmr_integration/fine_mapping.R\", absolute=True)} \\\n        --qtl-dataset ${_input} \\\n        --gene-id ${_genes} \\\n        --cis-window ${cis_window} \\\n        --coverage ${coverage} \\\n        --output ${_output}\n"
-   ]
+   "source": "[fine_mapping]\nif not qtl_dataset.is_file():\n    raise ValueError(\"fine_mapping requires --qtl-dataset to point at an existing QtlDataset RDS.\")\nif bool(genes) == bool(regions):\n    raise ValueError(\"Specify exactly one of --genes (trait IDs) or --regions (chr:start-end strings).\")\nfanout_items = genes if genes else regions\nfanout_kind  = 'gene' if genes else 'region'\ndef _fanout_safe(s):\n    return s.replace(':', '_').replace('-', '_')\ninput: qtl_dataset, for_each = 'fanout_items'\noutput: f\"{cwd}/{study}.{_fanout_safe(_fanout_items)}.finemap.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/fine_mapping.R \\\n        --qtl-dataset ${_input} \\\n        ${('--gene-id ' + _fanout_items + ' --cis-window ' + str(cis_window)) if fanout_kind == 'gene' else ('--region ' + _fanout_items)} \\\n        --methods ${methods} \\\n        --coverage ${coverage} \\\n        --output ${_output}\n"
+  },
+  {
+   "cell_type": "code",
+   "source": "[fine_mapping_gwas]\nif not gwas_sumstats_list:\n    raise ValueError(\"fine_mapping_gwas requires --gwas-sumstats-list pointing at one or more per-block GwasSumStats RDS files.\")\ngss_paths = [str(p) for p in gwas_sumstats_list]\ninput: gss_paths, group_by = 1\noutput: f\"{cwd}/{study}.{_input:bnn}.gwas_finemap.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/fine_mapping.R \\\n        --gwas-sumstats ${_input} \\\n        --methods ${methods} \\\n        --coverage ${coverage} \\\n        --output ${_output}\n",
+   "metadata": {},
+   "execution_count": null,
+   "outputs": []
   }
  ],
  "metadata": {
diff --git a/code/SoS/pecotmr_integration/gwas_sumstats.ipynb b/code/SoS/pecotmr_integration/gwas_sumstats.ipynb
new file mode 100644
index 00000000..9b908b01
--- /dev/null
+++ b/code/SoS/pecotmr_integration/gwas_sumstats.ipynb
@@ -0,0 +1,80 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "# Per-LD-block GwasSumStats construction\n\n## Description\n\nFor one study's GWAS TSV, fan out per LD block, build a `pecotmr::GwasSumStats` (including `summaryStatsQc()`), and serialize one RDS per `(study, block)`. The RDS files become the per-block GWAS inputs for `twas.ipynb` and `ctwas.ipynb`.\n\n## Inputs\n\n- `--study` &mdash; study identifier.\n- `--gwas-tsv` &mdash; path to the GWAS summary-statistics TSV.\n- `--ld-blocks` &mdash; BED file of LD-block intervals (`chr/chrom`, `start`, `stop/end`).\n- `--ld-meta` &mdash; LD-meta TSV pointing at per-region LD references.\n- `--genome` &mdash; genome build label. Default `hg19`.\n- `--modular-script-dir` &mdash; directory containing the worker R scripts.\n\n## Output\n\n- `{cwd}/{study}.{block_id}.gwas_sumstats.rds` per LD block (`block_id` sanitises `:`\u2192`_`, `-`\u2192`_`).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "## Example\n\n```bash\nsos run pipeline/gwas_sumstats.ipynb gwas_sumstats_construct \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --study TEST_GWAS \\\n    --gwas-tsv input/twas/protocol_example.twas.gwas_sumstats.chr22.tsv.gz \\\n    --ld-blocks input/twas/protocol_example.twas.LD_blocks.chr22.bed \\\n    --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv\n```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: gwas_tsv = path\nparameter: ld_blocks = path\nparameter: ld_meta = path\nparameter: genome = 'hg19'\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[gwas_sumstats_construct]\n# Parse LD-blocks BED (header line uses 'chr/chrom', 'start', 'stop/end').\ndef _read_blocks(path):\n    out = []\n    with open(path) as f:\n        hdr = next(f).rstrip('\\n').lstrip('#').split('\\t')\n        for line in f:\n            row = dict(zip(hdr, line.rstrip('\\n').split('\\t')))\n            chrom = row.get('chr', row.get('chrom'))\n            start = int(row.get('start', row.get('Start', 0)))\n            stop  = int(row.get('stop', row.get('end', row.get('End', 0))))\n            out.append(f'{chrom}:{start}-{stop}')\n    return out\nblocks = _read_blocks(ld_blocks)\nif not blocks:\n    raise ValueError(f\"No LD blocks found in {ld_blocks}\")\ndef _safe(s):\n    return s.replace(':', '_').replace('-', '_')\ninput: gwas_tsv, for_each = 'blocks'\noutput: f\"{cwd}/{study}.{_safe(_blocks)}.gwas_sumstats.rds\"\ntask: trunk_workers = 1, trunk_size = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/gwas_sumstats_construct.R \\\n        --study ${study} \\\n        --gwas-tsv ${_input} \\\n        --ld-block ${_blocks} \\\n        --ld-meta ${ld_meta} \\\n        --genome ${genome} \\\n        --output ${_output}\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "SoS",
+   "language": "sos",
+   "name": "sos"
+  },
+  "language_info": {
+   "codemirror_mode": "sos",
+   "file_extension": ".sos",
+   "mimetype": "text/x-sos",
+   "name": "sos",
+   "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
+   "pygments_lexer": "sos"
+  },
+  "sos": {
+   "kernels": [
+    [
+     "Bash",
+     "bash",
+     "Bash",
+     "#E6EEFF",
+     "shell"
+    ],
+    [
+     "SoS",
+     "sos",
+     "",
+     "",
+     "sos"
+    ]
+   ],
+   "version": "0.24.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/code/SoS/pecotmr_integration/qtl_dataset.ipynb b/code/SoS/pecotmr_integration/qtl_dataset.ipynb
index 9e091099..55e976ce 100644
--- a/code/SoS/pecotmr_integration/qtl_dataset.ipynb
+++ b/code/SoS/pecotmr_integration/qtl_dataset.ipynb
@@ -6,7 +6,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "# Build a QtlDataset per study\n\n## Description\n\nConstruct a single pecotmr `QtlDataset` from one study's genotype + per-context phenotype + per-context phenotype-PC files (plus an optional uniform genotype-PC file), and serialize it to a single RDS. The resulting object is the upstream dependency for every per-gene fineMapping / TWAS / colocboost task \u2014 gene-level parallelization happens against this single object via per-gene `traitId` selectors inside the downstream pipelines.\n\nThis step does NOT iterate over genes or regions. Build the QtlDataset once per study; downstream notebooks fan out per-gene tasks against the resulting RDS.\n\n## Inputs\n\n- `--study` &mdash; study identifier.\n- `--genotype-prefix` &mdash; PLINK2 pgen/pvar/psam prefix (no extension).\n- `--phenotype CONTEXT=PATH ...` &mdash; one entry per QTL context; PATH is a bgzipped BED of phenotype values (columns `#chr`, `start`, `end`, `gene_id`, then per-sample columns).\n- `--phenotype-covariates CONTEXT=PATH ...` &mdash; optional per-context molecular-trait PC TSVs (samples as rows, PCs as columns).\n- `--genotype-covariates PATH` &mdash; optional TSV of genotype-derived covariates (e.g. ancestry PCs) applied uniformly across all contexts (same shape as the phenotype-PC files).\n- `--maf-cutoff` / `--xvar-cutoff` &mdash; pass-through to `QtlDataset()`. These are lazy filters applied inside the accessors at fit time, not at construction.\n\n## Output\n\n- `{cwd}/{study}.qtl_dataset.rds`\n"
+    "# Build a QtlDataset per study\n\n## Description\n\nConstruct a single pecotmr `QtlDataset` from one study's genotype + per-context phenotype + per-context covariate files (plus an optional uniform genotype-PC file), and serialize it to a single RDS. The resulting object is the upstream dependency for every per-gene fineMapping / TWAS / colocboost task \u2014 gene-level parallelization happens against this single object via per-gene `traitId` selectors inside the downstream pipelines.\n\nThis step does NOT iterate over genes or regions. Build the QtlDataset once per study; downstream notebooks fan out per-gene tasks against the resulting RDS.\n\n## Inputs\n\n- `--study` &mdash; study identifier.\n- `--genotype-prefix` &mdash; PLINK1 bed/bim/fam prefix (no extension).\n- `--phenotype-manifest` &mdash; TSV with columns `#chr, start, end, ID, path, cond` (and optionally `cov_path`). One row per (region, context). Relative `path` / `cov_path` entries resolve against the manifest's own directory.\n- `--genotype-covariates` &mdash; optional TSV of genotype-derived covariates (e.g. ancestry PCs) applied uniformly across all contexts.\n- `--transpose-covariates` &mdash; when set, transposes every covariate TSV (phenotype + genotype) after reading. Use this for QTLtools-format inputs where rows are covariate names and columns are samples.\n- `--maf-cutoff` / `--xvar-cutoff` &mdash; pass-through to `QtlDataset()`. Lazy filters applied inside the accessors at fit time.\n\n## Output\n\n- `{cwd}/{study}.qtl_dataset.rds`\n"
    ]
   },
   {
@@ -15,7 +15,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "## Example\n\n```bash\nsos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n    --cwd output \\\n    --study ROSMAP_DLPFC \\\n    --genotype-prefix input/genotype/chr22 \\\n    --phenotype DLPFC=input/pheno/DLPFC.bed.gz AC=input/pheno/AC.bed.gz \\\n    --phenotype-covariates DLPFC=input/cov/DLPFC.pcs.tsv AC=input/cov/AC.pcs.tsv \\\n    --genotype-covariates input/cov/genotype_pcs.tsv\n```\n"
+    "## Example\n\n```bash\nsos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n    --cwd output \\\n    --study TEST_STUDY \\\n    --genotype-prefix input/colocboost/example.chr22 \\\n    --phenotype-manifest input/colocboost/pheno_manifest_multicontext.tsv \\\n    --transpose-covariates\n```\n"
    ]
   },
   {
@@ -26,7 +26,7 @@
    },
    "outputs": [],
    "source": [
-    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: genotype_prefix = path\n# Per-context CONTEXT=PATH entries; SoS parses repeated CLI args as lists.\nparameter: phenotype = []\nparameter: phenotype_covariates = []\nparameter: genotype_covariates = path('.')\nparameter: maf_cutoff = 0.0\nparameter: xvar_cutoff = 0.0\nparameter: container = ''\nparameter: walltime = '15m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
+    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: genotype_prefix = path\nparameter: phenotype_manifest = path\nparameter: genotype_covariates = path('.')\n# Set when covariate TSVs are in QTLtools format (rows = covariates, cols = samples).\nparameter: transpose_covariates = False\nparameter: maf_cutoff = 0.0\nparameter: xvar_cutoff = 0.0\n# Directory holding code/script of the xqtl-protocol checkout; override\n# when SoS is invoked from a working dir that doesn't contain the scripts.\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: walltime = '15m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
    ]
   },
   {
@@ -37,7 +37,7 @@
    },
    "outputs": [],
    "source": [
-    "[qtl_dataset_construct]\n# Build the QtlDataset once. No fan-out; downstream notebooks load this\n# RDS and parallelize per gene.\noutput: f\"{cwd}/{study}.qtl_dataset.rds\"\ntask: trunk_workers = 1, trunk_size = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${path(\"code/script/pecotmr_integration/qtl_dataset_construct.R\", absolute=True)} \\\n        --study ${study} \\\n        --genotype-prefix ${genotype_prefix} \\\n        --phenotype \"${','.join(phenotype)}\" \\\n        --phenotype-covariates \"${','.join(phenotype_covariates)}\" \\\n        --genotype-covariates ${genotype_covariates if genotype_covariates.is_file() else '\"\"'} \\\n        --maf-cutoff ${maf_cutoff} \\\n        --xvar-cutoff ${xvar_cutoff} \\\n        --output ${_output}\n"
+    "[qtl_dataset_construct]\n# Build the QtlDataset once. No fan-out; downstream notebooks load this\n# RDS and parallelize per gene.\noutput: f\"{cwd}/{study}.qtl_dataset.rds\"\ntask: trunk_workers = 1, trunk_size = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/qtl_dataset_construct.R \\\n        --study ${study} \\\n        --genotype-prefix ${genotype_prefix} \\\n        --phenotype-manifest ${phenotype_manifest} \\\n        --genotype-covariates ${genotype_covariates if genotype_covariates.is_file() else '\"\"'} \\\n        ${'--transpose-covariates' if transpose_covariates else ''} \\\n        --maf-cutoff ${maf_cutoff} \\\n        --xvar-cutoff ${xvar_cutoff} \\\n        --output ${_output}\n"
    ]
   }
  ],
diff --git a/code/SoS/pecotmr_integration/twas.ipynb b/code/SoS/pecotmr_integration/twas.ipynb
new file mode 100644
index 00000000..9e0b0871
--- /dev/null
+++ b/code/SoS/pecotmr_integration/twas.ipynb
@@ -0,0 +1,80 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "# Per-gene TWAS Z + Mendelian Randomization\n\n## Description\n\nFan out per row of a per-gene manifest. Each row pairs a gene's `TwasWeights` RDS with the matching `GwasSumStats` RDS for that gene's home LD block; the SoS step runs `twas.R` per row, which calls `pecotmr::causalInferencePipeline()` to compute the gene's TWAS Z + (optional) MR statistics. Optional per-gene `FineMappingResult` is wired through too.\n\n## Inputs\n\n- `--manifest` &mdash; per-gene manifest TSV with columns:\n    - `gene_id` &mdash; gene identifier (unique; used in the output filename).\n    - `twas_weights_rds` &mdash; path to the per-gene `TwasWeights` RDS (single TwasWeightsEntry).\n    - `gwas_sumstats_rds` &mdash; path to the per-LD-block `GwasSumStats` RDS that covers the gene's home block.\n    - `fine_mapping_result_rds` &mdash; optional; path to the matching per-gene `FineMappingResult` RDS (leave blank to skip).\n- `--study` &mdash; study identifier (used in output filenames).\n- `--mr-pip-cutoff` &mdash; pass-through (default 0.5).\n- `--mr-method` &mdash; pass-through (`\"ivwPerVariant\"` or `\"csAware\"`; default `\"ivwPerVariant\"`).\n\n## Output\n\n- `{cwd}/{study}.{gene_id}.twas.rds` per gene.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "source": [
+    "## Example\n\n```bash\nsos run pipeline/twas.ipynb twas \\\n    --cwd output --modular-script-dir /path/to/code/script \\\n    --study protocol_example_chr22 \\\n    --manifest /tmp/gwas_smoke/twas_manifest.tsv\n```\n\nManifest example:\n```\ngene_id\ttwas_weights_rds\tgwas_sumstats_rds\tfine_mapping_result_rds\nENSG00000130538\t/tmp/gwas_smoke/blessed_tw.rds\t/tmp/gwas_smoke/blocks/chr22_10516173_17414263.rds\t\n```\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: manifest = path\nparameter: mr_pip_cutoff = 0.5\nparameter: mr_method = 'ivwPerVariant'\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "kernel": "SoS"
+   },
+   "outputs": [],
+   "source": [
+    "[twas]\n# Read the per-gene manifest. Header may be canonical (`gene_id`,\n# `twas_weights_rds`, `gwas_sumstats_rds`, `fine_mapping_result_rds`) and\n# may carry a leading `#`. fine_mapping_result_rds is optional per-row\n# (empty cell = skip).\ndef _read_manifest(path):\n    rows = []\n    with open(path) as f:\n        hdr = next(f).rstrip('\\n').lstrip('#').split('\\t')\n        required = {'gene_id', 'twas_weights_rds', 'gwas_sumstats_rds'}\n        missing = required - set(hdr)\n        if missing:\n            raise ValueError(\n                f\"Manifest missing required column(s): {sorted(missing)}\")\n        for line in f:\n            parts = line.rstrip('\\n').split('\\t')\n            row = dict(zip(hdr, parts))\n            if not row.get('gene_id'):\n                continue\n            rows.append(row)\n    if not rows:\n        raise ValueError(f\"Manifest {path} is empty.\")\n    ids = [r['gene_id'] for r in rows]\n    if len(set(ids)) != len(ids):\n        raise ValueError(\n            \"Manifest has duplicate gene_id values; each row must be unique.\")\n    return rows\nrows = _read_manifest(manifest)\ngene_ids   = [r['gene_id'] for r in rows]\ntw_paths   = [r['twas_weights_rds'] for r in rows]\ngss_paths  = [r['gwas_sumstats_rds'] for r in rows]\nfmr_paths  = [r.get('fine_mapping_result_rds', '') or '' for r in rows]\ninput: for_each = ['gene_ids', 'tw_paths', 'gss_paths', 'fmr_paths']\nfmr_arg = f\"--fine-mapping-result {_fmr_paths}\" if _fmr_paths else \"\"\noutput: f\"{cwd}/{study}.{_gene_ids}.twas.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/twas.R \\\n        --twas-weights ${_tw_paths} \\\n        --gwas-sumstats ${_gss_paths} \\\n        ${fmr_arg} \\\n        --mr-pip-cutoff ${mr_pip_cutoff} \\\n        --mr-method ${mr_method} \\\n        --output ${_output}\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "SoS",
+   "language": "sos",
+   "name": "sos"
+  },
+  "language_info": {
+   "codemirror_mode": "sos",
+   "file_extension": ".sos",
+   "mimetype": "text/x-sos",
+   "name": "sos",
+   "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
+   "pygments_lexer": "sos"
+  },
+  "sos": {
+   "kernels": [
+    [
+     "Bash",
+     "bash",
+     "Bash",
+     "#E6EEFF",
+     "shell"
+    ],
+    [
+     "SoS",
+     "sos",
+     "",
+     "",
+     "sos"
+    ]
+   ],
+   "version": "0.24.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/code/SoS/pecotmr_integration/twas_weights.ipynb b/code/SoS/pecotmr_integration/twas_weights.ipynb
index 99c817e5..c723922c 100644
--- a/code/SoS/pecotmr_integration/twas_weights.ipynb
+++ b/code/SoS/pecotmr_integration/twas_weights.ipynb
@@ -6,7 +6,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "# Per-gene TWAS weights against a pre-built QtlDataset\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::twasWeightsPipeline(methods = \"default\", ...)` per gene. Gene-level parallelization is via SoS task fan-out: each task loads the same RDS and computes TWAS weights for one gene. The default method preset gives the standard univariate methods (SuSiE / lasso / elastic-net / etc. as defined by `pecotmr::.twasMethodLookup(\"default\")`).\n\nOptionally pass a pre-fit `--fine-mapping-result` RDS; SuSiE-family TWAS methods will reuse those fits instead of refitting.\n\n## Inputs\n\n- `--qtl-dataset` &mdash; path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes` &mdash; explicit gene-ID list.\n- `--cis-window` &mdash; bp window. Default 1,000,000.\n- `--fine-mapping-result` &mdash; optional pre-fit FineMappingResult RDS (e.g. the matching output of `fine_mapping.ipynb`); SuSiE-family TWAS methods will reuse the fits.\n\n## Output\n\n- `{cwd}/{study}.{gene}.twas_weights.rds` per gene.\n"
+    "# Per-gene / per-region TWAS weights against a pre-built QtlDataset\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::twasWeightsPipeline(methods = \"default\", ...)` per fan-out unit (gene or region). The default method preset gives the standard univariate methods (SuSiE / lasso / elastic-net / etc. as defined by `pecotmr::.twasMethodLookup(\"default\")`).\n\nOptionally pass a pre-fit `--fine-mapping-result` RDS; SuSiE-family TWAS methods reuse those fits via the `fineMappingResult` cache.\n\n## Inputs\n\n- `--qtl-dataset` &mdash; path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes ID1 ID2 \u2026` **or** `--regions chr:start-end \u2026` (mutually exclusive).\n- `--cis-window` &mdash; bp window (gene mode). Default 1,000,000.\n- `--fine-mapping-result` &mdash; optional pre-fit FineMappingResult RDS.\n- `--modular-script-dir` &mdash; directory containing the per-gene worker R scripts. Default `code/script`.\n\n## Output\n\n- `{cwd}/{study}.{gene|region}.twas_weights.rds` per fan-out unit.\n"
    ]
   },
   {
@@ -15,7 +15,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "## Example\n\n```bash\nsos run pipeline/twas_weights.ipynb twas_weights \\\n    --cwd output \\\n    --study ROSMAP_DLPFC \\\n    --qtl-dataset output/ROSMAP_DLPFC.qtl_dataset.rds \\\n    --genes ENSG00000060237 ENSG00000234593 \\\n    --cis-window 1000000\n```\n"
+    "## Example\n\n```bash\nsos run pipeline/twas_weights.ipynb twas_weights \\\n    --cwd output \\\n    --modular-script-dir /path/to/xqtl-protocol/code/script \\\n    --study TEST_STUDY \\\n    --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n    --genes ENSG00000060237 ENSG00000234593\n```\n"
    ]
   },
   {
@@ -26,7 +26,7 @@
    },
    "outputs": [],
    "source": [
-    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: genes = []\nparameter: cis_window = 1000000\nparameter: fine_mapping_result = path('.')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '1h'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
+    "[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: modular_script_dir = path('code/script')\n# Mutually exclusive fan-out sources: pass one as a list, leave the other empty.\nparameter: genes = []\nparameter: regions = []\nparameter: cis_window = 1000000\nparameter: fine_mapping_result = path('.')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
    ]
   },
   {
@@ -37,7 +37,7 @@
    },
    "outputs": [],
    "source": [
-    "[twas_weights]\ninput: qtl_dataset, for_each = 'genes'\noutput: f\"{cwd}/{study}.{_genes}.twas_weights.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${path(\"code/script/pecotmr_integration/twas_weights.R\", absolute=True)} \\\n        --qtl-dataset ${_input} \\\n        --gene-id ${_genes} \\\n        --cis-window ${cis_window} \\\n        --fine-mapping-result ${fine_mapping_result if fine_mapping_result.is_file() else '\"\"'} \\\n        --output ${_output}\n"
+    "[twas_weights]\nif bool(genes) == bool(regions):\n    raise ValueError(\"Specify exactly one of --genes (trait IDs) or --regions (chr:start-end strings).\")\nfanout_items = genes if genes else regions\nfanout_kind  = 'gene' if genes else 'region'\ndef _fanout_safe(s):\n    return s.replace(':', '_').replace('-', '_')\ninput: qtl_dataset, for_each = 'fanout_items'\noutput: f\"{cwd}/{study}.{_fanout_safe(_fanout_items)}.twas_weights.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n    Rscript ${modular_script_dir}/pecotmr_integration/twas_weights.R \\\n        --qtl-dataset ${_input} \\\n        ${('--gene-id ' + _fanout_items + ' --cis-window ' + str(cis_window)) if fanout_kind == 'gene' else ('--region ' + _fanout_items)} \\\n        --fine-mapping-result ${fine_mapping_result if fine_mapping_result.is_file() else '\"\"'} \\\n        --output ${_output}\n"
    ]
   }
  ],
diff --git a/code/script/pecotmr_integration/coloc.R b/code/script/pecotmr_integration/coloc.R
new file mode 100644
index 00000000..55aebc7c
--- /dev/null
+++ b/code/script/pecotmr_integration/coloc.R
@@ -0,0 +1,96 @@
+#!/usr/bin/env Rscript
+# coloc.R
+#
+# Colocalization worker. Loads an S4 `QtlFineMappingResult` and either
+# an S4 `GwasFineMappingResult` or a `GwasSumStats` (colocPipeline does
+# inline fine-mapping when given sumstats), and calls
+# `pecotmr::colocPipeline()`. When `--enrichment` points at the output
+# of `qtl_enrichment.R`, the per-pair `p12` prior is scaled by
+# `(1 + enrichment)` (capped at `--p12-max`), the "enloc" variant of
+# coloc.
+#
+# Inputs:
+#   --qtl-fine-mapping     Path to S4 QtlFineMappingResult RDS
+#   --gwas-input           Path to S4 GwasFineMappingResult RDS, or
+#                          S4 GwasSumStats RDS (colocPipeline dispatches)
+#   --enrichment           Optional enrichment data.frame RDS (output of
+#                          qtl_enrichment.R)
+#   --filter-lbf-cs        Flag: keep only effects that produced a CS
+#   --filter-lbf-cs-secondary  Secondary coverage for CS-concentration filter (default NULL)
+#   --p1                   Per-variant QTL prior (default 1e-4)
+#   --p2                   Per-variant GWAS prior (default 1e-4)
+#   --p12                  Per-variant shared prior (default 5e-6)
+#   --p12-max              Cap on enrichment-adjusted p12 (default 1e-3)
+#   --no-adjust-pips       Disable adjustPips (default: PIPs renormalized to overlap)
+#   --finemapping-methods  Comma-separated list for inline GWAS FM when --gwas-input is GwasSumStats (default 'susie')
+#   --output               Output RDS path (data.frame of coloc results)
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+  library(coloc)
+})
+
+parser <- arg_parser("Colocalization via colocPipeline()")
+parser <- add_argument(parser, "--qtl-fine-mapping",
+                       help = "Path to S4 QtlFineMappingResult RDS",
+                       type = "character")
+parser <- add_argument(parser, "--gwas-input",
+                       help = "Path to S4 GwasFineMappingResult OR GwasSumStats RDS",
+                       type = "character")
+parser <- add_argument(parser, "--enrichment",
+                       help = "Optional enrichment data.frame RDS (from qtl_enrichment.R)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--filter-lbf-cs",
+                       help = "Keep only effects that produced a CS",
+                       flag = TRUE)
+parser <- add_argument(parser, "--filter-lbf-cs-secondary",
+                       help = "Secondary coverage for CS-concentration filter (default NA = off)",
+                       type = "numeric", default = NA)
+parser <- add_argument(parser, "--p1",
+                       help = "Per-variant QTL prior",
+                       type = "numeric", default = 1e-4)
+parser <- add_argument(parser, "--p2",
+                       help = "Per-variant GWAS prior",
+                       type = "numeric", default = 1e-4)
+parser <- add_argument(parser, "--p12",
+                       help = "Per-variant shared prior",
+                       type = "numeric", default = 5e-6)
+parser <- add_argument(parser, "--p12-max",
+                       help = "Cap on enrichment-adjusted p12",
+                       type = "numeric", default = 1e-3)
+parser <- add_argument(parser, "--no-adjust-pips",
+                       help = "Disable adjustPips (use FMRs as supplied)",
+                       flag = TRUE)
+parser <- add_argument(parser, "--finemapping-methods",
+                       help = "Comma-separated methods for inline GWAS FM (when --gwas-input is GwasSumStats)",
+                       type = "character", default = "susie")
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+qtlFmr   <- readRDS(argv$qtl_fine_mapping)
+gwasIn   <- readRDS(argv$gwas_input)
+enrich   <- if (nzchar(argv$enrichment) && argv$enrichment != "." &&
+                file.exists(argv$enrichment)) readRDS(argv$enrichment) else NULL
+
+methods <- trimws(strsplit(argv$finemapping_methods, ",", fixed = TRUE)[[1L]])
+
+res <- colocPipeline(
+  qtlFineMappingResult     = qtlFmr,
+  gwasInput                = gwasIn,
+  filterLbfCs              = argv$filter_lbf_cs,
+  filterLbfCsSecondary     = if (is.na(argv$filter_lbf_cs_secondary)) NULL
+                              else argv$filter_lbf_cs_secondary,
+  p1                       = argv$p1,
+  p2                       = argv$p2,
+  p12                      = argv$p12,
+  finemappingMethods       = methods,
+  enrichment               = enrich,
+  p12Max                   = argv$p12_max,
+  adjustPips               = !argv$no_adjust_pips)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(res, argv$output)
+cat(sprintf("Wrote colocPipeline result (%d row(s)) to %s\n",
+            nrow(res), argv$output))
diff --git a/code/script/pecotmr_integration/colocboost.R b/code/script/pecotmr_integration/colocboost.R
index 2508cb57..ba40c6da 100644
--- a/code/script/pecotmr_integration/colocboost.R
+++ b/code/script/pecotmr_integration/colocboost.R
@@ -1,48 +1,136 @@
 #!/usr/bin/env Rscript
 # colocboost.R
 #
-# Per-gene colocboost worker. Loads a pre-built pecotmr::QtlDataset RDS
-# and calls colocboostPipeline() for a single focal trait, colocalizing
-# its signal across the QtlDataset's contexts. Designed to be invoked
-# once per gene by the SoS fan-out in colocboost.ipynb.
+# Per-region colocboost worker. Loads a pre-built pecotmr::QtlDataset RDS
+# (and optionally a QC'd GwasSumStats RDS) and calls colocboostPipeline()
+# for either a single focal trait (gene mode) or a single genomic region
+# (region mode). Designed to be invoked once per fan-out unit by the SoS
+# step in colocboost.ipynb.
+#
+# Modes (mutually exclusive):
+#   gene   : --gene-id ENSG... --cis-window 1000000
+#            traitId = gene-id, focalTrait = gene-id.
+#   region : --region chr22:15000000-16000000
+#            Pipeline receives region = GRanges(...). No focal trait.
+#
+# Pipeline variants (any combination; at least one must be active):
+#   --xqtl-coloc      ON by default; within-study cross-context QTL coloc.
+#                     Pass --no-xqtl-coloc to disable.
+#   --joint-gwas      Off by default; xQTL+GWAS joint coloc. Requires
+#                     --gwas-sumstats.
+#   --separate-gwas   Off by default; one focal-GWAS coloc per GWAS study.
+#                     Requires --gwas-sumstats.
 #
 # Inputs:
 #   --qtl-dataset    Path to a QtlDataset RDS
-#   --gene-id        Focal trait identifier (and the only traitId passed
-#                    to the pipeline; colocboost colocalizes the focal
-#                    trait's signal across the QtlDataset's contexts)
+#   --gene-id        (gene mode) focal trait identifier
 #   --cis-window     cis-window in bp around the trait's TSS
+#   --region         (region mode) chr:start-end string
+#   --gwas-sumstats  Optional path to a QC'd GwasSumStats RDS
+#   --no-xqtl-coloc  Flag: disable xQTL-only cross-context coloc
+#   --joint-gwas     Flag: run xQTL+GWAS joint coloc
+#   --separate-gwas  Flag: run per-GWAS-study separate-focal coloc
 #   --output         Output RDS path (one colocboost-pipeline list)
 
 suppressPackageStartupMessages({
   library(argparser)
   library(pecotmr)
+  library(GenomicRanges)
+  library(IRanges)
 })
 
-parser <- arg_parser("Per-gene cross-context colocboost over a pre-built QtlDataset")
+parser <- arg_parser("Per-gene or per-region colocboost over a pre-built QtlDataset")
 parser <- add_argument(parser, "--qtl-dataset",
                        help = "Path to a QtlDataset RDS",
                        type = "character")
 parser <- add_argument(parser, "--gene-id",
-                       help = "Focal trait identifier",
-                       type = "character")
+                       help = "Focal trait identifier (gene mode); mutually exclusive with --region",
+                       type = "character", default = "")
 parser <- add_argument(parser, "--cis-window",
-                       help = "cis-window in bp around the trait's TSS",
+                       help = "cis-window in bp around the focal trait's TSS",
                        type = "integer", default = 1000000L)
+parser <- add_argument(parser, "--region",
+                       help = "Genomic region as chr:start-end (region mode); mutually exclusive with --gene-id",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--gwas-sumstats",
+                       help = "Path to a QC'd GwasSumStats RDS (required when --joint-gwas or --separate-gwas is set)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--no-xqtl-coloc",
+                       help = "Disable the within-study cross-context xQTL coloc",
+                       flag = TRUE)
+parser <- add_argument(parser, "--joint-gwas",
+                       help = "Run xQTL+GWAS joint colocboost (requires --gwas-sumstats)",
+                       flag = TRUE)
+parser <- add_argument(parser, "--separate-gwas",
+                       help = "Run per-GWAS-study separate-focal coloc (requires --gwas-sumstats)",
+                       flag = TRUE)
 parser <- add_argument(parser, "--output",
                        help = "Output RDS path", type = "character")
 argv <- parse_args(parser)
 
-qd <- readRDS(argv$qtl_dataset)
+parse_region <- function(s) {
+  m <- regmatches(s, regexec("^([^:]+):([0-9]+)-([0-9]+)$", s))[[1L]]
+  if (length(m) != 4L)
+    stop("--region must be in chr:start-end format (got: ", s, ")")
+  GRanges(seqnames = m[[2L]],
+          ranges   = IRanges(start = as.integer(m[[3L]]),
+                             end   = as.integer(m[[4L]])))
+}
+
+# Mode validation
+has_gene   <- nzchar(argv$gene_id)
+has_region <- nzchar(argv$region)
+if (has_gene && has_region)
+  stop("--gene-id and --region are mutually exclusive; pass exactly one.")
+if (!has_gene && !has_region)
+  stop("Specify either --gene-id (with --cis-window) or --region.")
+
+xqtl_coloc    <- !argv$no_xqtl_coloc
+joint_gwas    <- argv$joint_gwas
+separate_gwas <- argv$separate_gwas
 
-res <- colocboostPipeline(
-  qd,
-  traitId    = argv$gene_id,
-  cisWindow  = argv$cis_window,
-  focalTrait = argv$gene_id,
-  xqtlColoc  = TRUE)
+if (!xqtl_coloc && !joint_gwas && !separate_gwas)
+  stop("All colocboost variants disabled. Enable at least one of xQTL coloc, --joint-gwas, --separate-gwas.")
+
+gwas_path <- argv$gwas_sumstats
+has_gwas  <- nzchar(gwas_path) && gwas_path != "." && file.exists(gwas_path)
+if ((joint_gwas || separate_gwas) && !has_gwas)
+  stop("--joint-gwas / --separate-gwas require --gwas-sumstats pointing at a QC'd GwasSumStats RDS.")
+
+qd  <- readRDS(argv$qtl_dataset)
+gss <- if (has_gwas) readRDS(gwas_path) else NULL
+
+res <- if (has_region) {
+  colocboostPipeline(
+    qd,
+    gwasSumStats = gss,
+    region       = parse_region(argv$region),
+    cisWindow    = argv$cis_window,
+    xqtlColoc    = xqtl_coloc,
+    jointGwas    = joint_gwas,
+    separateGwas = separate_gwas)
+} else {
+  colocboostPipeline(
+    qd,
+    gwasSumStats = gss,
+    traitId      = argv$gene_id,
+    cisWindow    = argv$cis_window,
+    focalTrait   = argv$gene_id,
+    xqtlColoc    = xqtl_coloc,
+    jointGwas    = joint_gwas,
+    separateGwas = separate_gwas)
+}
 
 dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
 saveRDS(res, argv$output)
-cat(sprintf("Wrote colocboost result for gene '%s' to %s\n",
-            argv$gene_id, argv$output))
+
+# Brief per-variant summary
+variants <- c(
+  if (xqtl_coloc) "xqtl-coloc" else NULL,
+  if (joint_gwas) "joint-gwas" else NULL,
+  if (separate_gwas) "separate-gwas" else NULL)
+cat(sprintf("Wrote colocboost result for %s [variants: %s] to %s\n",
+            if (has_region) paste0("region '", argv$region, "'")
+            else paste0("gene '", argv$gene_id, "'"),
+            paste(variants, collapse = ", "),
+            argv$output))
diff --git a/code/script/pecotmr_integration/ctwas_assemble.R b/code/script/pecotmr_integration/ctwas_assemble.R
new file mode 100644
index 00000000..747238b1
--- /dev/null
+++ b/code/script/pecotmr_integration/ctwas_assemble.R
@@ -0,0 +1,130 @@
+#!/usr/bin/env Rscript
+# ctwas_assemble.R
+#
+# cTWAS step 1: per-region manifest → assembled cTWAS inputs.
+# Reads the per-region manifest, loads each block's GwasSumStats /
+# TwasWeights RDS, and calls pecotmr::assembleCtwasInputs() to build
+# the named list of ctwas-shape inputs (z_snp, weights, region_info,
+# snp_map, LD_map, LD/snpInfo loader closures). The result is saved
+# to a single RDS that ctwas_est.R consumes downstream.
+#
+# Inputs:
+#   --manifest             Per-region manifest TSV with columns:
+#                            region_id              (string, unique)
+#                            gwas_sumstats_rds      (per-block GwasSumStats RDS)
+#                            twas_weights_rds       (comma-sep per-gene TwasWeights RDS; may be empty)
+#                            fine_mapping_result_rds (optional, comma-sep)
+#   --method               Which TWAS method to feed into ctwas
+#                          (default: NULL — resolves to 'ensemble' if
+#                          present, or sole method, or errors)
+#   --twas-z               Optional TWAS-Z GRanges RDS
+#   --twas-weight-cutoff   Pass-through (default 0)
+#   --cs-min-cor           Pass-through (default 0.8)
+#   --min-pip-cutoff       Pass-through (default 0)
+#   --max-num-variants     Pass-through (default Inf)
+#   --output               Output RDS path (the assembled `inputs` list)
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+  library(S4Vectors)
+})
+
+parser <- arg_parser("cTWAS step 1: assemble inputs from per-region manifest")
+parser <- add_argument(parser, "--manifest",
+                       help = "Per-region manifest TSV (region_id, gwas_sumstats_rds, twas_weights_rds[, fine_mapping_result_rds])",
+                       type = "character")
+parser <- add_argument(parser, "--method",
+                       help = "Which TWAS method to feed into ctwas (defaults to 'ensemble' if present, or sole method)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--twas-z",
+                       help = "Optional TWAS-Z GRanges RDS",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--twas-weight-cutoff",
+                       help = "Drop weight variants with |w| below this",
+                       type = "numeric", default = 0)
+parser <- add_argument(parser, "--cs-min-cor",
+                       help = "CS purity floor for must-keep rescue",
+                       type = "numeric", default = 0.8)
+parser <- add_argument(parser, "--min-pip-cutoff",
+                       help = "PIP rescue threshold",
+                       type = "numeric", default = 0)
+parser <- add_argument(parser, "--max-num-variants",
+                       help = "Per-gene variant cap",
+                       type = "numeric", default = Inf)
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path (assembled cTWAS inputs)",
+                       type = "character")
+argv <- parse_args(parser)
+
+split_paths <- function(s) {
+  if (is.na(s) || !nzchar(s)) return(character(0))
+  trimws(strsplit(s, ",", fixed = TRUE)[[1L]])
+}
+
+manifest <- read.table(argv$manifest, header = TRUE, sep = "\t",
+                       stringsAsFactors = FALSE, check.names = FALSE,
+                       comment.char = "")
+required <- c("region_id", "gwas_sumstats_rds", "twas_weights_rds")
+missing <- setdiff(required, names(manifest))
+if (length(missing) > 0L)
+  stop("Manifest missing required column(s): ",
+       paste(missing, collapse = ", "))
+if (anyDuplicated(manifest$region_id))
+  stop("Manifest has duplicate region_id values.")
+if (nrow(manifest) < 2L)
+  stop("Manifest must list at least two LD blocks. cTWAS's EM cannot ",
+       "converge on a single region.")
+
+gwasSumStatsByRegion <- list()
+twasWeightsByRegion  <- list()
+fmrByRegion          <- list()
+for (i in seq_len(nrow(manifest))) {
+  rid <- manifest$region_id[[i]]
+  gwasSumStatsByRegion[[rid]] <- readRDS(manifest$gwas_sumstats_rds[[i]])
+  tw_paths <- split_paths(manifest$twas_weights_rds[[i]])
+  if (length(tw_paths) > 0L) {
+    tw_list <- lapply(tw_paths, readRDS)
+    twasWeightsByRegion[[rid]] <- if (length(tw_list) == 1L) tw_list[[1L]]
+                                  else Reduce(function(a, b)
+                                                 pecotmr:::.rbindTwasWeights(a, b),
+                                               tw_list)
+  }
+  if ("fine_mapping_result_rds" %in% names(manifest)) {
+    fmr_paths <- split_paths(manifest$fine_mapping_result_rds[[i]])
+    if (length(fmr_paths) > 0L) {
+      fmr_list <- lapply(fmr_paths, readRDS)
+      fmrByRegion[[rid]] <- if (length(fmr_list) == 1L) fmr_list[[1L]]
+                            else Reduce(function(a, b)
+                                          pecotmr:::.rbindFineMappingResult(a, b),
+                                        fmr_list)
+    }
+  }
+}
+if (length(twasWeightsByRegion) == 0L)
+  stop("Manifest yielded zero TwasWeights across all blocks.")
+
+# Optional precomputed TWAS-Z and fineMappingResult (single-object).
+tz <- if (nzchar(argv$twas_z) && argv$twas_z != "." && file.exists(argv$twas_z))
+        readRDS(argv$twas_z) else NULL
+fmr <- if (length(fmrByRegion) > 0L) {
+  if (length(fmrByRegion) == 1L) fmrByRegion[[1L]]
+  else Reduce(function(a, b) pecotmr:::.rbindFineMappingResult(a, b),
+              unname(fmrByRegion))
+} else NULL
+
+inputs <- assembleCtwasInputs(
+  gwasSumStats     = gwasSumStatsByRegion,
+  twasWeights      = twasWeightsByRegion,
+  twasZ            = tz,
+  fineMappingResult = fmr,
+  method           = if (nzchar(argv$method)) argv$method else NULL,
+  twasWeightCutoff = argv$twas_weight_cutoff,
+  csMinCor         = argv$cs_min_cor,
+  minPipCutoff     = argv$min_pip_cutoff,
+  maxNumVariants   = argv$max_num_variants)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(inputs, argv$output)
+cat(sprintf("Wrote assembled cTWAS inputs (%d regions, %d weights) to %s\n",
+            nrow(inputs$region_info), length(inputs$weights), argv$output))
diff --git a/code/script/pecotmr_integration/ctwas_est.R b/code/script/pecotmr_integration/ctwas_est.R
new file mode 100644
index 00000000..2f21218e
--- /dev/null
+++ b/code/script/pecotmr_integration/ctwas_est.R
@@ -0,0 +1,78 @@
+#!/usr/bin/env Rscript
+# ctwas_est.R
+#
+# cTWAS step 2: estimate group prior + prior variance.
+# Reads the assembled inputs from ctwas_assemble.R and calls
+# pecotmr::estCtwasParam() (which wraps ctwas::assemble_region_data +
+# ctwas::est_param). Saves the augmented state — `inputs` + `region_data`
+# + `boundary_genes` + `z_gene` + `param` — for ctwas_finemap.R to
+# consume downstream.
+#
+# Inputs:
+#   --inputs               RDS produced by ctwas_assemble.R
+#   --thin                 Pass-through (default 0.1)
+#   --niter-prefit         Pass-through (default 3)
+#   --niter                Pass-through (default 30)
+#   --group-prior-var-structure  Pass-through (default 'shared_type')
+#   --min-group-size       Pass-through (default 100)
+#   --min-p-single-effect  Pass-through (default 0.8)
+#   --fallback-to-prefit   Flag: on accurate-EM NaN, recover via prefit-only
+#   --ncore                Pass-through (default 1)
+#   --output               Output RDS path (the est state list)
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("cTWAS step 2: estimate group prior + prior variance")
+parser <- add_argument(parser, "--inputs",
+                       help = "RDS produced by ctwas_assemble.R",
+                       type = "character")
+parser <- add_argument(parser, "--thin",
+                       help = "Thin", type = "numeric", default = 0.1)
+parser <- add_argument(parser, "--niter-prefit",
+                       help = "Prefit EM iterations",
+                       type = "integer", default = 3L)
+parser <- add_argument(parser, "--niter",
+                       help = "Accurate EM iterations",
+                       type = "integer", default = 30L)
+parser <- add_argument(parser, "--group-prior-var-structure",
+                       help = "Pass-through (default 'shared_type')",
+                       type = "character", default = "shared_type")
+parser <- add_argument(parser, "--min-group-size",
+                       help = "Minimum genes per ctwas group",
+                       type = "integer", default = 100L)
+parser <- add_argument(parser, "--min-p-single-effect",
+                       help = "Minimum p(single effect) for accurate EM region retention",
+                       type = "numeric", default = 0.8)
+parser <- add_argument(parser, "--fallback-to-prefit",
+                       help = "On accurate-EM NaN, fall back to prefit estimates",
+                       flag = TRUE)
+parser <- add_argument(parser, "--ncore",
+                       help = "Number of cores",
+                       type = "integer", default = 1L)
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path (est state)",
+                       type = "character")
+argv <- parse_args(parser)
+
+inputs <- readRDS(argv$inputs)
+est <- estCtwasParam(
+  inputs,
+  thin                   = argv$thin,
+  niterPrefit            = argv$niter_prefit,
+  niter                  = argv$niter,
+  groupPriorVarStructure = argv$group_prior_var_structure,
+  ncore                  = argv$ncore,
+  fallbackToPrefit       = argv$fallback_to_prefit,
+  min_group_size         = argv$min_group_size,
+  min_p_single_effect    = argv$min_p_single_effect)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(est, argv$output)
+cat(sprintf("Wrote estCtwasParam result to %s\n", argv$output))
+cat("group_prior:    ",
+    paste(format(est$param$group_prior, digits = 4), collapse = " | "), "\n")
+cat("group_prior_var:",
+    paste(format(est$param$group_prior_var, digits = 4), collapse = " | "), "\n")
diff --git a/code/script/pecotmr_integration/ctwas_finemap.R b/code/script/pecotmr_integration/ctwas_finemap.R
new file mode 100644
index 00000000..f3779c4f
--- /dev/null
+++ b/code/script/pecotmr_integration/ctwas_finemap.R
@@ -0,0 +1,87 @@
+#!/usr/bin/env Rscript
+# ctwas_finemap.R
+#
+# cTWAS step 3: screen regions + fine-map.
+# Reads the est state from ctwas_est.R and calls
+# pecotmr::screenCtwasRegions() then pecotmr::finemapCtwasRegions().
+# Optionally accepts a user-supplied param override RDS so the caller
+# can swap in hand-tuned priors when the accurate EM diverged and the
+# defaults aren't usable. Saves the final ctwas_sumstats-shape result.
+#
+# Inputs:
+#   --est                  RDS produced by ctwas_est.R
+#   --param-override       Optional RDS with $group_prior and
+#                          $group_prior_var to substitute before
+#                          screen/finemap. Mirrors the legacy ctwas_3
+#                          escape hatch for NaN-on-iter-2 EM divergence.
+#   --L                    Pass-through (default 5)
+#   --no-filter-L          Flag: disable ctwas's internal L >= 1 region screen
+#   --min-nonsnp-pip       Pass-through (default 0.5)
+#   --ncore                Pass-through (default 1)
+#   --output               Output RDS path (ctwas_sumstats-shape result)
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("cTWAS step 3: screen regions + fine-map")
+parser <- add_argument(parser, "--est",
+                       help = "RDS produced by ctwas_est.R",
+                       type = "character")
+parser <- add_argument(parser, "--param-override",
+                       help = "Optional param-override RDS (overrides $param before screen/finemap)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--L",
+                       help = "Max number of credible sets per region",
+                       type = "integer", default = 5L)
+parser <- add_argument(parser, "--no-filter-L",
+                       help = "Disable ctwas's internal L >= 1 region screen (toy data)",
+                       flag = TRUE)
+parser <- add_argument(parser, "--min-nonsnp-pip",
+                       help = "min_nonSNP_PIP threshold for screen_regions",
+                       type = "numeric", default = 0.5)
+parser <- add_argument(parser, "--ncore",
+                       help = "Number of cores",
+                       type = "integer", default = 1L)
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path (ctwas_sumstats-shape result)",
+                       type = "character")
+argv <- parse_args(parser)
+
+est <- readRDS(argv$est)
+
+# Optional caller-supplied param override (NaN escape hatch).
+if (nzchar(argv$param_override) && argv$param_override != "." &&
+    file.exists(argv$param_override)) {
+  override <- readRDS(argv$param_override)
+  if (!is.null(override$group_prior))
+    est$param$group_prior <- override$group_prior
+  if (!is.null(override$group_prior_var))
+    est$param$group_prior_var <- override$group_prior_var
+  cat("Applied param override from ", argv$param_override, "\n", sep = "")
+}
+
+screened <- screenCtwasRegions(
+  est,
+  L              = argv$L,
+  ncore          = argv$ncore,
+  filter_L       = !argv$no_filter_L,
+  min_nonSNP_PIP = argv$min_nonsnp_pip)
+
+final <- finemapCtwasRegions(
+  screened,
+  L     = argv$L,
+  ncore = argv$ncore)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(final, argv$output)
+cat(sprintf("Wrote finemapCtwasRegions result to %s\n", argv$output))
+fm <- final$finemap_res
+if (!is.null(fm) && nrow(fm) > 0L) {
+  g <- fm[fm$type != "SNP", , drop = FALSE]
+  cat(sprintf("  finemap_res rows: %d (%d gene-level, %d SNP-level)\n",
+              nrow(fm), nrow(g), nrow(fm) - nrow(g)))
+} else {
+  cat("  finemap_res: NULL (no regions surviving filter_L >= 1)\n")
+}
diff --git a/code/script/pecotmr_integration/fine_mapping.R b/code/script/pecotmr_integration/fine_mapping.R
index 17c0c5f1..a9e81d0a 100644
--- a/code/script/pecotmr_integration/fine_mapping.R
+++ b/code/script/pecotmr_integration/fine_mapping.R
@@ -1,32 +1,55 @@
 #!/usr/bin/env Rscript
 # fine_mapping.R
 #
-# Per-gene fine-mapping worker. Loads a pre-built pecotmr::QtlDataset RDS
-# and calls fineMappingPipeline() for a single trait. Designed to be
-# invoked once per gene by the SoS fan-out in fine_mapping.ipynb.
+# SuSiE fine-mapping worker for either QTL (per-gene / per-region over a
+# pre-built QtlDataset) or GWAS (per-block over a GwasSumStats RDS).
+# Designed to be invoked once per fan-out unit by the SoS step in
+# fine_mapping.ipynb. `pecotmr::fineMappingPipeline` dispatches on the
+# input class.
 #
-# Inputs:
-#   --qtl-dataset    Path to a QtlDataset RDS (output of qtl_dataset_construct.R)
-#   --gene-id        Single trait identifier to fine-map
-#   --cis-window     cis-window in bp around the trait's TSS
-#   --coverage       SuSiE credible-set coverage (default 0.95)
-#   --output         Output RDS path (one QtlFineMappingResult)
+# Input modes (exactly one of --qtl-dataset / --gwas-sumstats):
+#
+# QTL — fan-out per gene or per region inside a single QtlDataset:
+#   --qtl-dataset <RDS>             pecotmr::QtlDataset (from qtl_dataset.ipynb)
+#   --gene-id ENSG... --cis-window 1000000   (gene mode)
+#   --region chr22:15000000-16000000          (region mode)
+#
+# GWAS — one call per per-block GwasSumStats RDS (each carrying its own
+# z-scores + LD sketch; no gene/region concept):
+#   --gwas-sumstats <RDS>           pecotmr::GwasSumStats (per LD block,
+#                                   typically from gwas_sumstats_construct.R)
+#
+# Shared:
+#   --methods       Comma-separated method tokens. Default "susie".
+#   --coverage      SuSiE credible-set coverage. Default 0.95.
+#   --output        Output RDS path.
 
 suppressPackageStartupMessages({
   library(argparser)
   library(pecotmr)
+  library(GenomicRanges)
+  library(IRanges)
 })
 
-parser <- arg_parser("Per-gene SuSiE fine-mapping over a pre-built QtlDataset")
+parser <- arg_parser("SuSiE fine-mapping over a pecotmr S4 input (QtlDataset or GwasSumStats)")
 parser <- add_argument(parser, "--qtl-dataset",
-                       help = "Path to a QtlDataset RDS",
-                       type = "character")
+                       help = "Path to a QtlDataset RDS (QTL mode)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--gwas-sumstats",
+                       help = "Path to a GwasSumStats RDS (GWAS mode)",
+                       type = "character", default = "")
 parser <- add_argument(parser, "--gene-id",
-                       help = "Single trait identifier to fine-map",
-                       type = "character")
+                       help = "Trait identifier (QTL gene mode); mutually exclusive with --region",
+                       type = "character", default = "")
 parser <- add_argument(parser, "--cis-window",
-                       help = "cis-window in bp around the trait's TSS",
+                       help = "cis-window in bp around the trait's TSS (QTL gene mode)",
                        type = "integer", default = 1000000L)
+parser <- add_argument(parser, "--region",
+                       help = "Genomic region as chr:start-end (QTL region mode)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--methods",
+                       help = "Comma-separated fine-mapping method tokens",
+                       type = "character", default = "susie")
 parser <- add_argument(parser, "--coverage",
                        help = "SuSiE credible-set coverage",
                        type = "numeric", default = 0.95)
@@ -34,16 +57,61 @@ parser <- add_argument(parser, "--output",
                        help = "Output RDS path", type = "character")
 argv <- parse_args(parser)
 
-qd <- readRDS(argv$qtl_dataset)
+parse_region <- function(s) {
+  m <- regmatches(s, regexec("^([^:]+):([0-9]+)-([0-9]+)$", s))[[1L]]
+  if (length(m) != 4L)
+    stop("--region must be in chr:start-end format (got: ", s, ")")
+  GRanges(seqnames = m[[2L]],
+          ranges   = IRanges(start = as.integer(m[[3L]]),
+                             end   = as.integer(m[[4L]])))
+}
+
+has_qtl  <- nzchar(argv$qtl_dataset)
+has_gwas <- nzchar(argv$gwas_sumstats)
+if (has_qtl && has_gwas)
+  stop("--qtl-dataset and --gwas-sumstats are mutually exclusive; pass exactly one.")
+if (!has_qtl && !has_gwas)
+  stop("Specify either --qtl-dataset (QTL mode) or --gwas-sumstats (GWAS mode).")
+
+methods <- trimws(strsplit(argv$methods, ",", fixed = TRUE)[[1L]])
 
-res <- fineMappingPipeline(
-  qd,
-  methods    = "susie",
-  traitId    = argv$gene_id,
-  cisWindow  = argv$cis_window,
-  coverage   = argv$coverage)
+if (has_gwas) {
+  # ----- GWAS mode -------------------------------------------------------
+  gss <- readRDS(argv$gwas_sumstats)
+  res <- fineMappingPipeline(
+    gss,
+    methods  = methods,
+    coverage = argv$coverage)
+  label <- paste0("GwasSumStats '", basename(argv$gwas_sumstats), "'")
+} else {
+  # ----- QTL mode --------------------------------------------------------
+  has_gene   <- nzchar(argv$gene_id)
+  has_region <- nzchar(argv$region)
+  if (has_gene && has_region)
+    stop("--gene-id and --region are mutually exclusive (QTL mode); pass exactly one.")
+  if (!has_gene && !has_region)
+    stop("QTL mode requires --gene-id (with --cis-window) or --region.")
+  qd <- readRDS(argv$qtl_dataset)
+  res <- if (has_region) {
+    fineMappingPipeline(
+      qd,
+      methods   = methods,
+      region    = parse_region(argv$region),
+      cisWindow = argv$cis_window,
+      coverage  = argv$coverage)
+  } else {
+    fineMappingPipeline(
+      qd,
+      methods    = methods,
+      traitId    = argv$gene_id,
+      cisWindow  = argv$cis_window,
+      coverage   = argv$coverage)
+  }
+  label <- if (has_region) paste0("region '", argv$region, "'")
+           else paste0("gene '", argv$gene_id, "'")
+}
 
 dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
 saveRDS(res, argv$output)
-cat(sprintf("Wrote fineMapping result for gene '%s' (%d row(s)) to %s\n",
-            argv$gene_id, nrow(res), argv$output))
+cat(sprintf("Wrote fineMapping result for %s (%d row(s)) to %s\n",
+            label, nrow(res), argv$output))
diff --git a/code/script/pecotmr_integration/gwas_sumstats_construct.R b/code/script/pecotmr_integration/gwas_sumstats_construct.R
new file mode 100644
index 00000000..04f505c7
--- /dev/null
+++ b/code/script/pecotmr_integration/gwas_sumstats_construct.R
@@ -0,0 +1,189 @@
+#!/usr/bin/env Rscript
+# gwas_sumstats_construct.R
+#
+# Build one pecotmr::GwasSumStats over a single LD block from a GWAS
+# summary-statistics TSV and an LD-reference meta file, run
+# summaryStatsQc(), and serialize to RDS. The resulting RDS is the
+# per-(study, block) input to twasWeightsPipeline downstream consumers,
+# causalInferencePipeline, and ctwasPipeline.
+#
+# Inputs:
+#   --study           Single study identifier
+#   --gwas-tsv        Path to a tabix-indexed (or plain) GWAS TSV
+#                     Standard columns (with aliases):
+#                       #chrom|chrom|chr, pos, variant_id|SNP,
+#                       A1, A2, z|Z, n_sample|N
+#                     Optional: effect_allele_frequency, p, beta, se
+#   --ld-block        Genomic interval as chr:start-end (one LD block)
+#   --ld-meta         LD-meta TSV (#chr, start, end, path)
+#   --genome          Genome build label (e.g. "GRCh38"). Default "GRCh38"
+#   --output          Output RDS path
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+  library(GenomicRanges)
+  library(IRanges)
+  library(S4Vectors)
+})
+
+parser <- arg_parser("Build a per-LD-block GwasSumStats RDS")
+parser <- add_argument(parser, "--study",
+                       help = "Single study identifier",
+                       type = "character")
+parser <- add_argument(parser, "--gwas-tsv",
+                       help = "Path to GWAS summary-statistics TSV",
+                       type = "character")
+parser <- add_argument(parser, "--ld-block",
+                       help = "LD block as chr:start-end",
+                       type = "character")
+parser <- add_argument(parser, "--ld-meta",
+                       help = "Path to LD-meta TSV",
+                       type = "character")
+parser <- add_argument(parser, "--genome",
+                       help = "Genome build label",
+                       type = "character", default = "GRCh38")
+parser <- add_argument(parser, "--skip-qc",
+                       help = "Skip summaryStatsQc() (debug-only; the output GwasSumStats will have qcInfo = list() and fineMappingPipeline / twasWeightsPipeline will reject it)",
+                       flag = TRUE)
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+parse_region <- function(s) {
+  m <- regmatches(s, regexec("^([^:]+):([0-9]+)-([0-9]+)$", s))[[1L]]
+  if (length(m) != 4L)
+    stop("--ld-block must be in chr:start-end format (got: ", s, ")")
+  list(chr = m[[2L]], start = as.integer(m[[3L]]), end = as.integer(m[[4L]]))
+}
+
+block <- parse_region(argv$ld_block)
+
+# ----- Read GWAS TSV and subset to the LD block -----------------------------
+gwas <- read.table(if (grepl("\\.gz$", argv$gwas_tsv)) gzfile(argv$gwas_tsv)
+                   else argv$gwas_tsv,
+                   header = TRUE, sep = "\t",
+                   stringsAsFactors = FALSE, check.names = FALSE,
+                   comment.char = "")
+
+pick <- function(opts, where) intersect(opts, names(where))[1L]
+chr_col <- pick(c("#chrom", "chrom", "chr"), gwas)
+pos_col <- pick(c("pos", "position", "BP"), gwas)
+snp_col <- pick(c("variant_id", "SNP", "rsid"), gwas)
+a1_col  <- pick(c("A1", "a1"), gwas)
+a2_col  <- pick(c("A2", "a2"), gwas)
+z_col   <- pick(c("z", "Z"), gwas)
+n_col   <- pick(c("n_sample", "N", "n"), gwas)
+
+if (any(is.na(c(chr_col, pos_col, snp_col, a1_col, a2_col, z_col, n_col))))
+  stop("--gwas-tsv missing one of required columns ",
+       "(chrom/pos/variant_id/A1/A2/z/n_sample) in: ", argv$gwas_tsv)
+
+# Normalise chromosome label to match the LD block's
+chrom_vals <- as.character(gwas[[chr_col]])
+if (!startsWith(chrom_vals[[1L]], "chr"))
+  chrom_vals <- paste0("chr", chrom_vals)
+
+pos_vals <- as.integer(gwas[[pos_col]])
+keep <- chrom_vals == block$chr & pos_vals >= block$start & pos_vals <= block$end
+sub <- gwas[keep, , drop = FALSE]
+if (nrow(sub) == 0L)
+  stop("No GWAS variants fall in LD block ", argv$ld_block,
+       " (after chromosome normalisation).")
+
+# ----- Build the LD-reference GenotypeHandle for the block ------------------
+# Resolve the LD-meta row covering this block directly: pecotmr's
+# GenotypeHandle(ldMeta=…, region=…) currently round-trips the meta file
+# path as the data path (an upstream bug), so we read the meta TSV
+# ourselves and call the appropriate GenotypeHandle constructor.
+ld_meta_dir <- dirname(normalizePath(argv$ld_meta))
+ld_meta_df <- read.table(argv$ld_meta, header = TRUE, sep = "\t",
+                          stringsAsFactors = FALSE, check.names = FALSE,
+                          comment.char = "")
+# Header may be "#chr", "#chrom", "chr", or "chrom".
+ld_chr_col <- intersect(c("#chr", "#chrom", "chr", "chrom"),
+                        names(ld_meta_df))[1L]
+if (is.na(ld_chr_col))
+  stop("Could not find a chromosome column in ", argv$ld_meta,
+       " (expected one of '#chr' / '#chrom' / 'chr' / 'chrom'); got: ",
+       paste(names(ld_meta_df), collapse = ", "))
+# Normalize both sides to the bare chromosome label (no "chr" prefix).
+ld_chr_norm    <- sub("^chr", "",
+                      as.character(ld_meta_df[[ld_chr_col]]),
+                      ignore.case = TRUE)
+block_chr_norm <- sub("^chr", "", block$chr, ignore.case = TRUE)
+# `start = 0, end = 0` is the meta convention for "whole chromosome" — any
+# block on that chromosome is covered.
+ld_start <- suppressWarnings(as.integer(ld_meta_df$start))
+ld_end   <- suppressWarnings(as.integer(ld_meta_df$end))
+whole_chrom <- !is.na(ld_start) & !is.na(ld_end) &
+               ld_start == 0L & ld_end == 0L
+covers <- which(ld_chr_norm == block_chr_norm &
+                (whole_chrom |
+                 (ld_start <= block$start & ld_end >= block$end)))
+if (length(covers) == 0L)
+  stop("No LD-meta row in ", argv$ld_meta, " fully covers ", argv$ld_block)
+if (length(covers) > 1L)
+  stop("Multiple LD-meta rows cover ", argv$ld_block,
+       "; restrict to a single LD block.")
+ld_prefix <- ld_meta_df$path[covers[[1L]]]
+if (!startsWith(ld_prefix, "/"))
+  ld_prefix <- file.path(ld_meta_dir, ld_prefix)
+
+# Detect data format from companion file extensions
+if (file.exists(paste0(ld_prefix, ".pgen"))) {
+  ld_handle <- GenotypeHandle(plink2Prefix = ld_prefix)
+} else if (file.exists(paste0(ld_prefix, ".bed"))) {
+  ld_handle <- GenotypeHandle(plink1Prefix = ld_prefix)
+} else if (file.exists(paste0(ld_prefix, ".gds"))) {
+  ld_handle <- GenotypeHandle(path = paste0(ld_prefix, ".gds"))
+} else if (file.exists(paste0(ld_prefix, ".vcf.gz"))) {
+  ld_handle <- GenotypeHandle(path = paste0(ld_prefix, ".vcf.gz"))
+} else {
+  stop("Could not find a recognised genotype payload at LD-meta prefix: ", ld_prefix,
+       " (looked for .pgen / .bed / .gds / .vcf.gz)")
+}
+
+# ----- Build GwasSumStats entry GRanges -------------------------------------
+entry_gr <- GRanges(
+  seqnames = chrom_vals[keep],
+  ranges   = IRanges(start = pos_vals[keep], width = 1L))
+mcols(entry_gr) <- DataFrame(
+  SNP = as.character(sub[[snp_col]]),
+  A1  = as.character(sub[[a1_col]]),
+  A2  = as.character(sub[[a2_col]]),
+  Z   = as.numeric(sub[[z_col]]),
+  N   = as.integer(sub[[n_col]]))
+# Optional columns when present
+opt_cols <- c(beta = "beta", se = "se", p = c("p", "pvalue"),
+              maf = c("effect_allele_frequency", "maf", "MAF"),
+              info = c("info", "INFO"))
+for (slot in c("BETA", "SE", "P", "MAF", "INFO")) {
+  src <- intersect(c(slot, tolower(slot),
+                     if (slot == "MAF") c("effect_allele_frequency"),
+                     if (slot == "P")   c("pvalue", "p")),
+                   names(sub))[1L]
+  if (!is.na(src)) {
+    mcols(entry_gr)[[slot]] <- if (slot == "N") as.integer(sub[[src]])
+                                else as.numeric(sub[[src]])
+  }
+}
+
+# ----- Construct + QC + save ------------------------------------------------
+gss <- GwasSumStats(
+  study    = argv$study,
+  entry    = list(entry_gr),
+  genome   = argv$genome,
+  ldSketch = ld_handle)
+gss_out <- if (argv$skip_qc) {
+  message("--skip-qc set; serialising raw GwasSumStats without summaryStatsQc().")
+  gss
+} else {
+  summaryStatsQc(gss)
+}
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(gss_out, argv$output)
+cat(sprintf("Wrote %sGwasSumStats for study '%s' over %s (%d variants in) to %s\n",
+            if (argv$skip_qc) "(skip-QC) " else "QC'd ",
+            argv$study, argv$ld_block, length(entry_gr), argv$output))
diff --git a/code/script/pecotmr_integration/legacy_ctwas_weights_to_s4.R b/code/script/pecotmr_integration/legacy_ctwas_weights_to_s4.R
new file mode 100644
index 00000000..9394f93f
--- /dev/null
+++ b/code/script/pecotmr_integration/legacy_ctwas_weights_to_s4.R
@@ -0,0 +1,157 @@
+#!/usr/bin/env Rscript
+# legacy_ctwas_weights_to_s4.R
+#
+# One-shot extractor: legacy `*.ctwas_weights.*.rds` (the OUTPUT of the
+# legacy [ctwas_1] step) into the new S4 `pecotmr::TwasWeights` shape.
+#
+# Why this exists: the legacy MWE ships
+# `protocol_example.ctwas_weights.protocol_example_twas_chr22.chr22.rds`
+# with strong weights (max|w| ≈ 1.35) that demonstrably drive ctwas to
+# the documented gene-Z = 5.46 result, BUT those weights are not
+# reproducible from the upstream `.reshaped_toy.rds` via the documented
+# pipeline (the legacy code path produces ~1e-5 values, not ~1e-0).
+# The legacy `ctwas_weights.rds` therefore appears to come from a
+# stronger upstream fit that isn't shipped. To smoke-test our new
+# `ctwasPipeline` end-to-end with default thresholds, we extract those
+# already-on-correlation-scale weights into an S4 TwasWeights so the
+# pipeline sees the same numerical input the legacy ctwas_3 step did.
+#
+# The extracted weights are STANDARDIZED (already on the correlation
+# scale — the legacy multiplied by sqrt(variance) when building the
+# file), so the resulting TwasWeightsEntry carries `standardized = TRUE`
+# and our `.ctwasBuildWeights` will skip the sqrt(variance) scaling.
+#
+# Usage:
+#   Rscript legacy_ctwas_weights_to_s4.R \
+#     --legacy <legacy_ctwas_weights.rds> \
+#     --study <study_label> \
+#     --method <method_label>  (default "susie")
+#     --ld-meta <ld_meta.tsv> --ld-block <chr:start-end>   (or --ld-prefix)
+#     --output <out.rds>
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("Convert legacy ctwas_weights.rds to S4 TwasWeights")
+parser <- add_argument(parser, "--legacy",
+                       help = "Path to legacy *.ctwas_weights.*.rds",
+                       type = "character")
+parser <- add_argument(parser, "--study",
+                       help = "Study label to stamp on every entry",
+                       type = "character")
+parser <- add_argument(parser, "--method",
+                       help = "Method label to stamp on every entry (default 'susie')",
+                       type = "character", default = "susie")
+parser <- add_argument(parser, "--ld-meta",
+                       help = "LD-meta TSV path (used with --ld-block)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--ld-block",
+                       help = "LD block as chr:start-end (with --ld-meta)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--ld-prefix",
+                       help = "Explicit LD prefix (bypasses LD-meta lookup)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+resolve_ld_prefix <- function(meta_path, block_str) {
+  m <- regmatches(block_str, regexec("^([^:]+):([0-9]+)-([0-9]+)$", block_str))[[1L]]
+  if (length(m) != 4L)
+    stop("--ld-block must be chr:start-end (got: ", block_str, ")")
+  block <- list(chr = m[[2L]], start = as.integer(m[[3L]]),
+                end = as.integer(m[[4L]]))
+  meta_dir <- dirname(normalizePath(meta_path))
+  meta <- read.table(meta_path, header = TRUE, sep = "\t",
+                     stringsAsFactors = FALSE, check.names = FALSE,
+                     comment.char = "")
+  chr_col <- intersect(c("#chr", "#chrom", "chr", "chrom"), names(meta))[1L]
+  chr_norm   <- sub("^chr", "", as.character(meta[[chr_col]]), ignore.case = TRUE)
+  block_norm <- sub("^chr", "", block$chr, ignore.case = TRUE)
+  s <- suppressWarnings(as.integer(meta$start))
+  e <- suppressWarnings(as.integer(meta$end))
+  whole <- !is.na(s) & !is.na(e) & s == 0L & e == 0L
+  hit <- which(chr_norm == block_norm &
+               (whole | (s <= block$start & e >= block$end)))
+  if (length(hit) != 1L)
+    stop("LD-meta lookup for ", block_str, " returned ", length(hit), " rows.")
+  pfx <- meta$path[hit]
+  if (!startsWith(pfx, "/")) pfx <- file.path(meta_dir, pfx)
+  pfx
+}
+
+open_handle <- function(prefix) {
+  if (file.exists(paste0(prefix, ".pgen"))) GenotypeHandle(plink2Prefix = prefix)
+  else if (file.exists(paste0(prefix, ".bed"))) GenotypeHandle(plink1Prefix = prefix)
+  else if (file.exists(paste0(prefix, ".gds"))) GenotypeHandle(path = paste0(prefix, ".gds"))
+  else if (file.exists(paste0(prefix, ".vcf.gz"))) GenotypeHandle(path = paste0(prefix, ".vcf.gz"))
+  else stop("No genotype payload at LD prefix: ", prefix)
+}
+
+ld_handle <- if (nzchar(argv$ld_prefix)) {
+  open_handle(argv$ld_prefix)
+} else if (nzchar(argv$ld_meta) && nzchar(argv$ld_block)) {
+  open_handle(resolve_ld_prefix(argv$ld_meta, argv$ld_block))
+} else {
+  stop("Provide either --ld-prefix or (--ld-meta AND --ld-block).")
+}
+
+# --- Walk the legacy structure and build TwasWeightsEntry per gene ----
+legacy <- readRDS(argv$legacy)
+if (!is.list(legacy) || length(legacy) == 0L)
+  stop("Legacy ctwas_weights RDS is empty or not a list: ", argv$legacy)
+
+studies  <- character(0)
+contexts <- character(0)
+traits   <- character(0)
+methods  <- character(0)
+entries  <- list()
+
+# Legacy keys: "<molecular_id>|<type>_<context>"; values are a list with
+# wgt (variants × 1 matrix), molecular_id, type, context, etc.
+for (gene_key in names(legacy)) {
+  entry_obj <- legacy[[gene_key]]
+  if (!is.list(entry_obj) || is.null(entry_obj$wgt)) next
+  wgt_mat <- entry_obj$wgt
+  if (nrow(wgt_mat) == 0L) next
+
+  vids <- rownames(wgt_mat)
+  wvec <- as.numeric(wgt_mat[, 1L])
+
+  trait     <- entry_obj$molecular_id %||% sub("\\|.*$", "", gene_key)
+  ctx_label <- entry_obj$context %||% sub("^.*\\|", "", gene_key)
+
+  weights_one_col <- matrix(wvec, ncol = 1L,
+                             dimnames = list(vids, argv$method))
+
+  tw_entry <- TwasWeightsEntry(
+    variantIds    = vids,
+    weights       = weights_one_col,
+    # Already on the correlation scale (legacy multiplied by
+    # sqrt(variance)); skip the sqrt(variance) step downstream.
+    standardized  = TRUE)
+
+  studies  <- c(studies,  argv$study)
+  contexts <- c(contexts, ctx_label)
+  traits   <- c(traits,   trait)
+  methods  <- c(methods,  argv$method)
+  entries  <- c(entries,  list(tw_entry))
+}
+
+if (length(entries) == 0L)
+  stop("Converter produced 0 entries from ", argv$legacy)
+
+tw <- TwasWeights(
+  study    = studies,
+  context  = contexts,
+  trait    = traits,
+  method   = methods,
+  entry    = entries,
+  ldSketch = ld_handle)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(tw, argv$output)
+cat(sprintf("Wrote S4 TwasWeights (%d entries; %d unique traits) to %s\n",
+            length(entries), length(unique(traits)), argv$output))
diff --git a/code/script/pecotmr_integration/legacy_twas_weights_to_s4.R b/code/script/pecotmr_integration/legacy_twas_weights_to_s4.R
new file mode 100644
index 00000000..f224636c
--- /dev/null
+++ b/code/script/pecotmr_integration/legacy_twas_weights_to_s4.R
@@ -0,0 +1,206 @@
+#!/usr/bin/env Rscript
+# legacy_twas_weights_to_s4.R
+#
+# One-shot converter: legacy `*.univariate_twas_weights.rds` (top-level
+# gene IDs -> context names -> `twas_weights` + `twas_cv_result` etc.)
+# into the new S4 `pecotmr::TwasWeights` collection.
+#
+# Usage:
+#   Rscript legacy_twas_weights_to_s4.R \
+#     --legacy <legacy.rds> \
+#     --study <study_label> \
+#     --ld-meta <ld_meta.tsv> \
+#     --ld-block <chr:start-end>  (or --ld-prefix <prefix>) \
+#     --output <out.rds>
+#
+# The LD-block / LD-prefix is required because TwasWeights carries an
+# `ldSketch` slot that must match the GwasSumStats' LD sketch downstream.
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("Convert legacy TwasWeights RDS to S4 TwasWeights")
+parser <- add_argument(parser, "--legacy",
+                       help = "Path to legacy *.univariate_twas_weights.rds",
+                       type = "character")
+parser <- add_argument(parser, "--study",
+                       help = "Study label to stamp on every entry",
+                       type = "character")
+parser <- add_argument(parser, "--ld-meta",
+                       help = "Path to LD-meta TSV (used when --ld-block is given)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--ld-block",
+                       help = "LD block as chr:start-end (resolved against --ld-meta)",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--ld-prefix",
+                       help = "Explicit PLINK2/PLINK1/GDS/VCF prefix; bypasses LD-meta lookup",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+# --- Build an LD GenotypeHandle (same logic as gwas_sumstats_construct.R) ----
+resolve_ld_prefix <- function(meta_path, block_str) {
+  m <- regmatches(block_str, regexec("^([^:]+):([0-9]+)-([0-9]+)$", block_str))[[1L]]
+  if (length(m) != 4L)
+    stop("--ld-block must be chr:start-end (got: ", block_str, ")")
+  block <- list(chr = m[[2L]], start = as.integer(m[[3L]]),
+                end = as.integer(m[[4L]]))
+  meta_dir <- dirname(normalizePath(meta_path))
+  meta <- read.table(meta_path, header = TRUE, sep = "\t",
+                     stringsAsFactors = FALSE, check.names = FALSE,
+                     comment.char = "")
+  chr_col <- intersect(c("#chr", "#chrom", "chr", "chrom"), names(meta))[1L]
+  if (is.na(chr_col))
+    stop("Could not find a chromosome column in ", meta_path)
+  chr_norm    <- sub("^chr", "", as.character(meta[[chr_col]]), ignore.case = TRUE)
+  block_norm  <- sub("^chr", "", block$chr, ignore.case = TRUE)
+  s <- suppressWarnings(as.integer(meta$start))
+  e <- suppressWarnings(as.integer(meta$end))
+  whole <- !is.na(s) & !is.na(e) & s == 0L & e == 0L
+  hit <- which(chr_norm == block_norm &
+               (whole | (s <= block$start & e >= block$end)))
+  if (length(hit) != 1L)
+    stop("LD-meta lookup for ", block_str, " returned ", length(hit), " rows.")
+  pfx <- meta$path[hit]
+  if (!startsWith(pfx, "/")) pfx <- file.path(meta_dir, pfx)
+  pfx
+}
+
+open_handle <- function(prefix) {
+  if (file.exists(paste0(prefix, ".pgen"))) GenotypeHandle(plink2Prefix = prefix)
+  else if (file.exists(paste0(prefix, ".bed"))) GenotypeHandle(plink1Prefix = prefix)
+  else if (file.exists(paste0(prefix, ".gds"))) GenotypeHandle(path = paste0(prefix, ".gds"))
+  else if (file.exists(paste0(prefix, ".vcf.gz"))) GenotypeHandle(path = paste0(prefix, ".vcf.gz"))
+  else stop("No recognised genotype payload at LD prefix: ", prefix)
+}
+
+ld_handle <- if (nzchar(argv$ld_prefix)) {
+  open_handle(argv$ld_prefix)
+} else if (nzchar(argv$ld_meta) && nzchar(argv$ld_block)) {
+  open_handle(resolve_ld_prefix(argv$ld_meta, argv$ld_block))
+} else {
+  stop("Provide either --ld-prefix or (--ld-meta AND --ld-block).")
+}
+
+# --- Walk the legacy structure and build TwasWeightsEntry per (gene, context) ----
+legacy <- readRDS(argv$legacy)
+if (!is.list(legacy) || length(legacy) == 0L)
+  stop("Legacy RDS is empty or not a list: ", argv$legacy)
+
+studies  <- character(0)
+contexts <- character(0)
+traits   <- character(0)
+methods  <- character(0)
+entries  <- list()
+
+for (gene in names(legacy)) {
+  gene_obj <- legacy[[gene]]
+  if (!is.list(gene_obj)) next
+  for (ctx_name in names(gene_obj)) {
+    ctx_obj <- gene_obj[[ctx_name]]
+    if (!is.list(ctx_obj) ||
+        is.null(ctx_obj$twas_weights) ||
+        is.null(ctx_obj$twas_cv_result)) next
+    wts_list <- ctx_obj$twas_weights
+    if (length(wts_list) == 0L) next
+
+    # Variant ids: rownames of the first matrix-shaped weight column. Some
+    # legacy fields are length-N numeric vectors (e.g. `ensemble_weights`);
+    # coerce them to single-column matrices below.
+    first_mat <- Find(is.matrix, wts_list)
+    if (is.null(first_mat))
+      stop("Legacy weights for ", gene, "/", ctx_name,
+           " contain no matrix-shaped entry; cannot recover variant IDs.")
+    vids <- rownames(first_mat)
+    method_names <- sub("_weights$", "", names(wts_list))
+
+    # CV performance per method (subset of methods that actually had CV).
+    perf_list <- ctx_obj$twas_cv_result$performance
+    perf_method_names <- sub("_performance$", "", names(perf_list))
+    cv_lookup <- setNames(perf_list, perf_method_names)
+
+    # context_name follows the legacy convention "<context>_<trait>" — strip
+    # the trailing trait suffix so the new schema's context is a clean label.
+    context_label <- sub(paste0("_", gene, "$"), "", ctx_name)
+
+    # Legacy susie_weights_intermediate carries the SuSiE posterior
+    # tensor (mu, lbf_variable, X_column_scale_factors, etc.) that
+    # ctwasPipeline's alpha renormalization needs when variants get
+    # dropped downstream of the original fit. Attach it on susie-method
+    # entries via the TwasWeightsEntry@fits slot.
+    legacy_intermediate <- ctx_obj$susie_weights_intermediate
+
+    # One TwasWeightsEntry per method (TwasWeights schema requires one row
+    # per (study, context, trait, method) tuple).
+    for (mi in seq_along(method_names)) {
+      m  <- method_names[[mi]]
+      wm <- wts_list[[mi]]
+      wv <- if (is.matrix(wm)) as.numeric(wm[, 1L]) else as.numeric(wm)
+      # Skip a method when every weight is zero or NA — it would never
+      # contribute a TWAS-Z and downstream would silently drop it anyway.
+      if (all(is.na(wv)) || all(wv == 0, na.rm = TRUE)) next
+
+      weights_mat <- matrix(wv, ncol = 1L,
+                            dimnames = list(vids, m))
+
+      perf_row <- cv_lookup[[m]]
+      cv_df <- if (!is.null(perf_row)) {
+        df <- as.data.frame(perf_row, stringsAsFactors = FALSE)
+        names(df)[names(df) == "adj_rsq"] <- "adjusted_rsq"
+        df$method <- m
+        df <- df[, intersect(c("method", "rsq", "adjusted_rsq", "pval",
+                                "corr", "RMSE", "MAE"),
+                              names(df)), drop = FALSE]
+        rownames(df) <- NULL
+        df
+      } else {
+        data.frame(method = m, rsq = NA_real_, adjusted_rsq = NA_real_,
+                   pval = NA_real_, stringsAsFactors = FALSE)
+      }
+
+      # Attach SuSiE intermediate on susie-family methods so
+      # ctwasPipeline's renormalization branch can recompute weights
+      # over a filtered variant set.
+      entry_fits <- if (m %in% c("susie", "susie_inf", "susie_ash") &&
+                       !is.null(legacy_intermediate)) {
+        legacy_intermediate
+      } else {
+        NULL
+      }
+
+      entry <- TwasWeightsEntry(
+        variantIds    = vids,
+        weights       = weights_mat,
+        fits          = entry_fits,
+        cvPerformance = cv_df,
+        standardized  = FALSE)
+
+      studies  <- c(studies, argv$study)
+      contexts <- c(contexts, context_label)
+      traits   <- c(traits, gene)
+      methods  <- c(methods, m)
+      entries  <- c(entries, list(entry))
+    }
+  }
+}
+
+if (length(entries) == 0L)
+  stop("Converter produced 0 entries from ", argv$legacy)
+
+tw <- TwasWeights(
+  study    = studies,
+  context  = contexts,
+  trait    = traits,
+  method   = methods,
+  entry    = entries,
+  ldSketch = ld_handle)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(tw, argv$output)
+cat(sprintf("Wrote S4 TwasWeights (%d entries; %d unique traits) to %s\n",
+            length(entries),
+            length(unique(traits)),
+            argv$output))
diff --git a/code/script/pecotmr_integration/qtl_dataset_construct.R b/code/script/pecotmr_integration/qtl_dataset_construct.R
index de430907..4f879819 100644
--- a/code/script/pecotmr_integration/qtl_dataset_construct.R
+++ b/code/script/pecotmr_integration/qtl_dataset_construct.R
@@ -2,26 +2,36 @@
 # qtl_dataset_construct.R
 #
 # Build one pecotmr::QtlDataset from a single study's data and serialize
-# to RDS. The resulting RDS becomes the upstream dependency for every
-# per-gene fineMappingPipeline / twasWeightsPipeline / colocboostPipeline
-# task; gene-level parallelization happens against this single object.
+# to RDS. The resulting RDS is the upstream dependency for every per-gene
+# fineMappingPipeline / twasWeightsPipeline / colocboostPipeline task;
+# gene-level parallelization happens against this single object.
 #
-# Inputs (per-context phenotype + phenotype-covariate files use
-# CONTEXT=PATH pairs joined with commas, e.g.
-# --phenotype DLPFC=foo.bed.gz,AC=bar.bed.gz):
-#   --study                Study identifier
-#   --genotype-prefix      PLINK2 pgen/pvar/psam prefix
-#   --phenotype            CONTEXT=PATH list of BED.GZ phenotype files
-#   --phenotype-covariates CONTEXT=PATH list of per-context molecular-trait
-#                          PC TSVs (samples as rows, PCs as columns —
-#                          same shape as the genotype-PC TSV).
-#                          Optional; default none.
-#   --genotype-covariates  TSV of genotype-derived covariates (e.g. ancestry
-#                          PCs) applied uniformly across all contexts
-#                          (samples as rows, PCs as columns). Optional.
-#   --maf-cutoff           Pass-through to QtlDataset(mafCutoff = ...)
-#   --xvar-cutoff          Pass-through to QtlDataset(xvarCutoff = ...)
-#   --output               Output RDS path
+# Inputs:
+#   --study                Study identifier (length-1 character).
+#   --genotype-prefix      PLINK1 bed/bim/fam prefix (no extension).
+#   --phenotype-manifest   Path to a TSV manifest. Required columns:
+#                            ID       trait identifier
+#                            path     bgzipped BED phenotype file
+#                            cond     context name
+#                          Optional column:
+#                            cov_path per-context covariate TSV
+#                          One row per (region, context). Phenotype /
+#                          covariate paths may be absolute or relative to
+#                          the manifest's own directory. Multiple rows can
+#                          share the same context; per-context the
+#                          phenotype/cov path must agree.
+#   --genotype-covariates  Optional TSV of genotype-derived covariates
+#                          (e.g. ancestry PCs) applied uniformly across all
+#                          contexts. Same shape as the per-context covariate
+#                          files.
+#   --transpose-covariates When set, transposes every covariate TSV
+#                          (phenotype + genotype) before treating it as
+#                          samples-as-rows. Use this for QTLtools-format
+#                          inputs where rows are covariate names and
+#                          columns are samples.
+#   --maf-cutoff           Pass-through MAF cutoff for QtlDataset().
+#   --xvar-cutoff          Pass-through variance cutoff for QtlDataset().
+#   --output               Output RDS path.
 
 suppressPackageStartupMessages({
   library(argparser)
@@ -35,44 +45,26 @@ parser <- arg_parser("Build a pecotmr QtlDataset for one study and save to RDS")
 parser <- add_argument(parser, "--study",
                        help = "Study identifier", type = "character")
 parser <- add_argument(parser, "--genotype-prefix",
-                       help = "PLINK2 pgen/pvar/psam prefix",
+                       help = "PLINK1 bed/bim/fam prefix", type = "character")
+parser <- add_argument(parser, "--phenotype-manifest",
+                       help = "TSV manifest path (ID, path, cond, cov_path)",
                        type = "character")
-parser <- add_argument(parser, "--phenotype",
-                       help = "Comma-joined CONTEXT=PATH phenotype BED files",
-                       type = "character")
-parser <- add_argument(parser, "--phenotype-covariates",
-                       help = "Comma-joined CONTEXT=PATH PC TSV files",
-                       type = "character", default = "")
 parser <- add_argument(parser, "--genotype-covariates",
                        help = "TSV of uniformly-applied genotype PCs",
                        type = "character", default = "")
+parser <- add_argument(parser, "--transpose-covariates",
+                       help = "Transpose covariate TSVs (QTLtools format)",
+                       flag = TRUE)
 parser <- add_argument(parser, "--maf-cutoff",
-                       help = "Pass-through MAF cutoff for QtlDataset()",
+                       help = "QtlDataset() MAF cutoff",
                        type = "numeric", default = 0)
 parser <- add_argument(parser, "--xvar-cutoff",
-                       help = "Pass-through variance cutoff for QtlDataset()",
+                       help = "QtlDataset() variance cutoff",
                        type = "numeric", default = 0)
 parser <- add_argument(parser, "--output",
                        help = "Output RDS path", type = "character")
 argv <- parse_args(parser)
 
-# ----- CONTEXT=PATH list parser ---------------------------------------------
-parse_kv <- function(s) {
-  if (!nzchar(s)) return(list())
-  out <- list()
-  for (p in strsplit(s, ",", fixed = TRUE)[[1L]]) {
-    kv <- strsplit(p, "=", fixed = TRUE)[[1L]]
-    if (length(kv) != 2L) stop("Expected CONTEXT=PATH, got: ", p)
-    out[[trimws(kv[[1L]])]] <- trimws(kv[[2L]])
-  }
-  out
-}
-
-phenotype_files <- parse_kv(argv$phenotype)
-pheno_cov_files <- parse_kv(argv$phenotype_covariates)
-contexts <- names(phenotype_files)
-if (length(contexts) == 0L) stop("--phenotype must list at least one context")
-
 # ----- File readers ---------------------------------------------------------
 read_pheno_bed <- function(path) {
   read.table(gzfile(path), header = TRUE, sep = "\t",
@@ -80,15 +72,68 @@ read_pheno_bed <- function(path) {
              comment.char = "")
 }
 
-# Both --phenotype-covariates and --genotype-covariates files share this
-# shape: TSV with a sample-id column followed by one column per PC. The
-# returned matrix has samples as rows and PCs as columns.
-read_pcs_tsv <- function(path) {
-  as.matrix(read.table(path, header = TRUE, sep = "\t", row.names = 1,
-                       check.names = FALSE, comment.char = ""))
+# Read a covariate TSV. Default expects samples-as-rows + PC columns.
+# With transpose = TRUE, expects QTLtools-format input (covariates as rows,
+# samples as columns) and transposes to samples-as-rows.
+read_pcs_tsv <- function(path, transpose = FALSE) {
+  raw <- read.table(path, header = TRUE, sep = "\t", row.names = 1,
+                    check.names = FALSE, comment.char = "")
+  m <- as.matrix(raw)
+  if (transpose) m <- t(m)
+  m
+}
+
+# ----- Manifest ingestion ---------------------------------------------------
+manifest <- read.table(argv$phenotype_manifest, header = TRUE, sep = "\t",
+                       stringsAsFactors = FALSE, check.names = FALSE,
+                       comment.char = "")
+required <- c("ID", "path", "cond")
+missingCols <- setdiff(required, names(manifest))
+if (length(missingCols) > 0L) {
+  stop("--phenotype-manifest missing required column(s): ",
+       paste(missingCols, collapse = ", "))
+}
+has_cov_col <- "cov_path" %in% names(manifest)
+
+# Resolve manifest-relative paths against the manifest's own directory.
+manifest_dir <- dirname(normalizePath(argv$phenotype_manifest))
+resolve_path <- function(p) {
+  if (is.na(p) || !nzchar(p)) return("")
+  if (startsWith(p, "/")) return(p)
+  file.path(manifest_dir, p)
+}
+
+# Collapse to one (phenotype, cov_path) pair per context. Multiple rows per
+# context (different traits) must agree on the file paths.
+contexts <- unique(manifest$cond)
+phenotype_files <- list()
+pheno_cov_files <- list()
+for (cx in contexts) {
+  sub <- manifest[manifest$cond == cx, , drop = FALSE]
+  paths_here <- unique(sub$path)
+  if (length(paths_here) > 1L) {
+    stop(sprintf(
+      "Context '%s' references multiple phenotype paths: %s.",
+      cx, paste(paths_here, collapse = ", ")))
+  }
+  phenotype_files[[cx]] <- paths_here[[1L]]
+  if (has_cov_col) {
+    covs_here <- unique(sub$cov_path[!is.na(sub$cov_path) &
+                                      nzchar(sub$cov_path)])
+    if (length(covs_here) > 1L) {
+      stop(sprintf(
+        "Context '%s' references multiple cov_path values: %s.",
+        cx, paste(covs_here, collapse = ", ")))
+    }
+    pheno_cov_files[[cx]] <- if (length(covs_here) == 1L) covs_here[[1L]]
+                              else ""
+  } else {
+    pheno_cov_files[[cx]] <- ""
+  }
 }
 
-build_se <- function(bed_path, pcov_path) {
+# ----- Per-context SummarizedExperiment builder -----------------------------
+build_se <- function(bed_path, pcov_path, transpose_cov) {
   bed <- read_pheno_bed(bed_path)
   chr_col   <- intersect(c("#chr", "chrom", "chr"),    names(bed))[1L]
   start_col <- intersect(c("start", "Start"),          names(bed))[1L]
@@ -113,7 +158,7 @@ build_se <- function(bed_path, pcov_path) {
   names(rr) <- meta$gene_id
 
   cd <- if (nzchar(pcov_path)) {
-    pcov <- read_pcs_tsv(pcov_path)
+    pcov <- read_pcs_tsv(pcov_path, transpose = transpose_cov)
     common <- intersect(rownames(pcov), colnames(expr))
     if (length(common) == 0L)
       stop("No shared samples between phenotype and phenotype-covariate: ",
@@ -130,18 +175,19 @@ build_se <- function(bed_path, pcov_path) {
 
 phenotypes <- setNames(
   lapply(contexts, function(cx)
-    build_se(phenotype_files[[cx]],
-             if (!is.null(pheno_cov_files[[cx]])) pheno_cov_files[[cx]] else "")),
+    build_se(resolve_path(phenotype_files[[cx]]),
+             resolve_path(pheno_cov_files[[cx]]),
+             argv$transpose_covariates)),
   contexts)
 
-# ----- Genotype handle (PLINK2) + uniform genotype covariates ---------------
-geno_handle <- GenotypeHandle(plink2Prefix = argv$genotype_prefix)
+# ----- Genotype handle (PLINK1) + uniform genotype covariates ---------------
+geno_handle <- GenotypeHandle(plink1Prefix = argv$genotype_prefix)
 
 geno_cov_path <- argv$genotype_covariates
 has_geno_cov <- nzchar(geno_cov_path) && geno_cov_path != "." &&
                 file.exists(geno_cov_path)
 genoCov <- if (has_geno_cov) {
-  read_pcs_tsv(geno_cov_path)
+  read_pcs_tsv(geno_cov_path, transpose = argv$transpose_covariates)
 } else {
   matrix(numeric(0), nrow = 0, ncol = 0)
 }
@@ -157,5 +203,7 @@ qd <- QtlDataset(
 
 dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
 saveRDS(qd, argv$output)
-cat(sprintf("Wrote QtlDataset for study '%s' (%d contexts) to %s\n",
-            argv$study, length(phenotypes), argv$output))
+cat(sprintf("Wrote QtlDataset for study '%s' (%d contexts: %s) to %s\n",
+            argv$study, length(phenotypes),
+            paste(names(phenotypes), collapse = ", "),
+            argv$output))
diff --git a/code/script/pecotmr_integration/qtl_enrichment.R b/code/script/pecotmr_integration/qtl_enrichment.R
new file mode 100644
index 00000000..aea5acfb
--- /dev/null
+++ b/code/script/pecotmr_integration/qtl_enrichment.R
@@ -0,0 +1,68 @@
+#!/usr/bin/env Rscript
+# qtl_enrichment.R
+#
+# xQTL-GWAS enrichment worker. Loads an S4 `QtlFineMappingResult` and
+# an S4 `GwasFineMappingResult`, calls
+# `pecotmr::qtlEnrichmentPipeline()`, and saves the resulting
+# per-(gwasStudy, qtlContext) enrichment data.frame. The output feeds
+# downstream into `coloc.R` (via colocPipeline's `enrichment` arg) as
+# the prior-adjustment factor for colocalization.
+#
+# Inputs:
+#   --qtl-fine-mapping     Path to S4 QtlFineMappingResult RDS
+#                          (output of fine_mapping.ipynb / fineMappingPipeline)
+#   --gwas-fine-mapping    Path to S4 GwasFineMappingResult RDS
+#   --num-gwas             Number of GWAS variants per study (pass-through)
+#   --pi-qtl               Optional QTL pi value (pass-through)
+#   --lambda               Pass-through (default 1)
+#   --imp-n                Pass-through (default 25)
+#   --ncore                Number of threads (default 1)
+#   --output               Output RDS path (per-pair enrichment data.frame)
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("xQTL-GWAS enrichment via qtlEnrichmentPipeline()")
+parser <- add_argument(parser, "--qtl-fine-mapping",
+                       help = "Path to S4 QtlFineMappingResult RDS",
+                       type = "character")
+parser <- add_argument(parser, "--gwas-fine-mapping",
+                       help = "Path to S4 GwasFineMappingResult RDS",
+                       type = "character")
+parser <- add_argument(parser, "--num-gwas",
+                       help = "Number of GWAS variants (per study; pass-through)",
+                       type = "numeric", default = NA)
+parser <- add_argument(parser, "--pi-qtl",
+                       help = "Optional QTL pi value (pass-through)",
+                       type = "numeric", default = NA)
+parser <- add_argument(parser, "--lambda",
+                       help = "Pass-through lambda",
+                       type = "numeric", default = 1)
+parser <- add_argument(parser, "--imp-n",
+                       help = "Pass-through impN",
+                       type = "integer", default = 25L)
+parser <- add_argument(parser, "--ncore",
+                       help = "Pass-through ncore",
+                       type = "integer", default = 1L)
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+qtlFmr  <- readRDS(argv$qtl_fine_mapping)
+gwasFmr <- readRDS(argv$gwas_fine_mapping)
+
+res <- qtlEnrichmentPipeline(
+  gwasFineMappingResult = gwasFmr,
+  qtlFineMappingResult  = qtlFmr,
+  numGwas               = if (is.na(argv$num_gwas)) NULL else argv$num_gwas,
+  piQtl                 = if (is.na(argv$pi_qtl))   NULL else argv$pi_qtl,
+  lambda                = argv$lambda,
+  impN                  = argv$imp_n,
+  numThreads            = argv$ncore)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(res, argv$output)
+cat(sprintf("Wrote qtlEnrichmentPipeline result (%d (gwasStudy, qtlContext) pair(s)) to %s\n",
+            nrow(res), argv$output))
diff --git a/code/script/pecotmr_integration/twas.R b/code/script/pecotmr_integration/twas.R
new file mode 100644
index 00000000..bbffce9c
--- /dev/null
+++ b/code/script/pecotmr_integration/twas.R
@@ -0,0 +1,62 @@
+#!/usr/bin/env Rscript
+# twas.R
+#
+# Per-gene TWAS Z-score + Mendelian-Randomization worker. Loads a
+# pre-computed per-gene TwasWeights RDS and a per-LD-block GwasSumStats
+# RDS, optionally a matching per-gene FineMappingResult, and calls
+# pecotmr::causalInferencePipeline().
+#
+# Inputs:
+#   --twas-weights         Per-gene TwasWeights RDS (output of twas_weights.ipynb)
+#   --gwas-sumstats        Per-LD-block GwasSumStats RDS
+#   --fine-mapping-result  Optional per-gene FineMappingResult RDS
+#   --mr-pip-cutoff        Pass-through (default 0.5)
+#   --mr-method            Pass-through; "ivwPerVariant" or "csAware".
+#                          Default "ivwPerVariant".
+#   --output               Output RDS path
+
+suppressPackageStartupMessages({
+  library(argparser)
+  library(pecotmr)
+})
+
+parser <- arg_parser("Per-gene TWAS Z + MR via causalInferencePipeline()")
+parser <- add_argument(parser, "--twas-weights",
+                       help = "Per-gene TwasWeights RDS",
+                       type = "character")
+parser <- add_argument(parser, "--gwas-sumstats",
+                       help = "Per-LD-block GwasSumStats RDS",
+                       type = "character")
+parser <- add_argument(parser, "--fine-mapping-result",
+                       help = "Optional per-gene FineMappingResult RDS",
+                       type = "character", default = "")
+parser <- add_argument(parser, "--mr-pip-cutoff",
+                       help = "Pass-through PIP cutoff for MR",
+                       type = "numeric", default = 0.5)
+parser <- add_argument(parser, "--mr-method",
+                       help = "MR method: ivwPerVariant or csAware",
+                       type = "character", default = "ivwPerVariant")
+parser <- add_argument(parser, "--output",
+                       help = "Output RDS path", type = "character")
+argv <- parse_args(parser)
+
+tw  <- readRDS(argv$twas_weights)
+gss <- readRDS(argv$gwas_sumstats)
+
+fmr_path <- argv$fine_mapping_result
+fmr <- if (nzchar(fmr_path) && fmr_path != "." && file.exists(fmr_path)) {
+  readRDS(fmr_path)
+} else {
+  NULL
+}
+
+res <- causalInferencePipeline(
+  gwasSumStats      = gss,
+  twasWeights       = tw,
+  fineMappingResult = fmr,
+  mrPipCutoff       = argv$mr_pip_cutoff,
+  mrMethod          = argv$mr_method)
+
+dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
+saveRDS(res, argv$output)
+cat(sprintf("Wrote causalInferencePipeline result to %s\n", argv$output))
diff --git a/code/script/pecotmr_integration/twas_weights.R b/code/script/pecotmr_integration/twas_weights.R
index 749149ce..4fe72878 100644
--- a/code/script/pecotmr_integration/twas_weights.R
+++ b/code/script/pecotmr_integration/twas_weights.R
@@ -1,36 +1,46 @@
 #!/usr/bin/env Rscript
 # twas_weights.R
 #
-# Per-gene TWAS-weights worker. Loads a pre-built pecotmr::QtlDataset RDS
-# and calls twasWeightsPipeline() for a single trait. Designed to be
-# invoked once per gene by the SoS fan-out in twas_weights.ipynb.
+# Per-region TWAS-weights worker. Loads a pre-built pecotmr::QtlDataset
+# RDS and calls twasWeightsPipeline() for either a single trait (gene
+# mode) or a single genomic region (region mode). Designed to be
+# invoked once per fan-out unit by the SoS step in twas_weights.ipynb.
+#
+# Modes (mutually exclusive):
+#   gene   : --gene-id ENSG... --cis-window 1000000
+#   region : --region chr22:15000000-16000000
 #
 # Inputs:
 #   --qtl-dataset           Path to a QtlDataset RDS
-#   --gene-id               Single trait identifier
-#   --cis-window            cis-window in bp around the trait's TSS
-#   --fine-mapping-result   Optional pre-fit FineMappingResult RDS; when
-#                           supplied, SuSiE-family TWAS methods reuse the
-#                           fits via the `fineMappingResult` cache instead
-#                           of refitting. Pass an empty string ("") or
+#   --gene-id               (gene mode) trait identifier
+#   --cis-window            (gene mode) cis-window in bp
+#   --region                (region mode) chr:start-end string
+#   --fine-mapping-result   Optional pre-fit FineMappingResult RDS;
+#                           SuSiE-family methods reuse the cached fits
+#                           via the fineMappingResult cache. Pass "" or
 #                           "." to skip.
 #   --output                Output RDS path (one TwasWeights)
 
 suppressPackageStartupMessages({
   library(argparser)
   library(pecotmr)
+  library(GenomicRanges)
+  library(IRanges)
 })
 
-parser <- arg_parser("Per-gene default-preset TWAS weights over a pre-built QtlDataset")
+parser <- arg_parser("Per-gene or per-region default-preset TWAS weights over a pre-built QtlDataset")
 parser <- add_argument(parser, "--qtl-dataset",
                        help = "Path to a QtlDataset RDS",
                        type = "character")
 parser <- add_argument(parser, "--gene-id",
-                       help = "Single trait identifier",
-                       type = "character")
+                       help = "Trait identifier (gene mode); mutually exclusive with --region",
+                       type = "character", default = "")
 parser <- add_argument(parser, "--cis-window",
-                       help = "cis-window in bp around the trait's TSS",
+                       help = "cis-window in bp around the trait's TSS (gene mode)",
                        type = "integer", default = 1000000L)
+parser <- add_argument(parser, "--region",
+                       help = "Genomic region as chr:start-end (region mode); mutually exclusive with --gene-id",
+                       type = "character", default = "")
 parser <- add_argument(parser, "--fine-mapping-result",
                        help = "Optional pre-fit FineMappingResult RDS",
                        type = "character", default = "")
@@ -38,6 +48,22 @@ parser <- add_argument(parser, "--output",
                        help = "Output RDS path", type = "character")
 argv <- parse_args(parser)
 
+parse_region <- function(s) {
+  m <- regmatches(s, regexec("^([^:]+):([0-9]+)-([0-9]+)$", s))[[1L]]
+  if (length(m) != 4L)
+    stop("--region must be in chr:start-end format (got: ", s, ")")
+  GRanges(seqnames = m[[2L]],
+          ranges   = IRanges(start = as.integer(m[[3L]]),
+                             end   = as.integer(m[[4L]])))
+}
+
+has_gene   <- nzchar(argv$gene_id)
+has_region <- nzchar(argv$region)
+if (has_gene && has_region)
+  stop("--gene-id and --region are mutually exclusive; pass exactly one.")
+if (!has_gene && !has_region)
+  stop("Specify either --gene-id (with --cis-window) or --region.")
+
 qd <- readRDS(argv$qtl_dataset)
 
 fmr_path <- argv$fine_mapping_result
@@ -47,14 +73,25 @@ fmr <- if (nzchar(fmr_path) && fmr_path != "." && file.exists(fmr_path)) {
   NULL
 }
 
-res <- twasWeightsPipeline(
-  qd,
-  methods           = "default",
-  traitId           = argv$gene_id,
-  cisWindow         = argv$cis_window,
-  fineMappingResult = fmr)
+res <- if (has_region) {
+  twasWeightsPipeline(
+    qd,
+    methods           = NULL,
+    region            = parse_region(argv$region),
+    cisWindow         = argv$cis_window,
+    fineMappingResult = fmr)
+} else {
+  twasWeightsPipeline(
+    qd,
+    methods           = NULL,
+    traitId           = argv$gene_id,
+    cisWindow         = argv$cis_window,
+    fineMappingResult = fmr)
+}
 
 dir.create(dirname(argv$output), showWarnings = FALSE, recursive = TRUE)
 saveRDS(res, argv$output)
-cat(sprintf("Wrote TWAS weights for gene '%s' (%d row(s)) to %s\n",
-            argv$gene_id, nrow(res), argv$output))
+cat(sprintf("Wrote TWAS weights for %s (%d row(s)) to %s\n",
+            if (has_region) paste0("region '", argv$region, "'")
+            else paste0("gene '", argv$gene_id, "'"),
+            nrow(res), argv$output))
diff --git a/dev/pecotmr_migration_survey.md b/dev/pecotmr_migration_survey.md
new file mode 100644
index 00000000..c8ace96f
--- /dev/null
+++ b/dev/pecotmr_migration_survey.md
@@ -0,0 +1,87 @@
+# pecotmr API migration — survey of legacy callers in xqtl-protocol
+
+MWE inspected: `/Users/danielnachun/Downloads/fungen_xqtl/xqtl-protocol`.
+Out of scope for this round: the four new wrappers under `code/SoS/pecotmr_integration/` (`qtl_dataset`, `fine_mapping`, `twas_weights`, `colocboost`).
+
+## Notebooks examined
+
+User-specified subset:
+
+| Notebook | SoS steps with pecotmr work | Migration shape |
+|---|---|---|
+| `code/SoS/association_scan/quantile_models/qr_and_twas.ipynb` | `[quantile_qtl_twas_weight]` — inline R | new wrapper |
+| `code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb` | `[susie_twas]` only (per user directive) — invokes `code/script/mnm_analysis/mnm_methods/susie_twas.R` | replace step + script |
+| `code/SoS/mnm_analysis/mnm_methods/colocboost.ipynb` | `[colocboost]` — inline R, calls `library(colocboost)` + `library(pecotmr)` (plus the deprecated multi-task loader pattern via the SoS `[get_analysis_regions]` step) | replace with new colocboost.ipynb pattern |
+| `code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb` | `[univariate_rss]` — invokes `code/script/pecotmr_integration/univariate_rss.R` | replace script + step |
+| `code/SoS/pecotmr_integration/twas_ctwas.ipynb` | `[twas]`, `[ctwas_1]`, `[ctwas_2]`, `[ctwas_3]`, `[quantile_twas]` — all inline R | refactor; needs design |
+| `code/SoS/pecotmr_integration/SuSiE_enloc.ipynb` | `[xqtl_gwas_enrichment]`, `[susie_coloc]` — both inline R | refactor; needs design |
+| `code/SoS/pecotmr_integration/intact.ipynb` | `[intact]` — inline R (uses `fastenloc`, `run_intact`, etc.) | might be downstream-only (no pecotmr fitting) — needs check |
+
+## Deprecated pecotmr symbols referenced per step
+
+(Searched both snake_case and camelCase forms; both are in pecotmr's `R/deprecated.R`.)
+
+- `qr_and_twas.ipynb [quantile_qtl_twas_weight]`: **`load_regional_association_data`** + quantile-regression-specific helpers.
+- `mnm_regression.ipynb [susie_twas]` → `susie_twas.R`: **`load_regional_univariate_data`**, **`univariate_analysis_pipeline`**. The legacy script is ~567 lines and wraps both data loading and pipeline fitting into one CLI.
+- `mnm_regression.ipynb [ensemble_twas_weight]`: **`ensemble_weights`** (+ `load_regional_univariate_data` for resume-cache reads). (Out of scope per user directive but logged here for context.)
+- `colocboost.ipynb [colocboost]`: uses `library(colocboost)` and `library(pecotmr)`; relies on the SoS `[get_analysis_regions]` upstream which uses the deprecated multi-task data-loader convention (manifest-driven). The R block itself likely calls `pecotmr` helpers for data assembly and the `colocboost::colocboost()` solver directly.
+- `rss_analysis.ipynb [univariate_rss]` → `univariate_rss.R`: **`load_LD_matrix`** (deprecated → `loadLdMatrix`). The script is ~265 lines.
+- `twas_ctwas.ipynb`:
+  - `[twas]` — **`twas_pipeline`**, **`load_twas_weights`**, **`batch_load_twas_weights`**, **`twas_z`** + various `mr_*` MR helpers.
+  - `[ctwas_1]` — **`get_ctwas_meta_data`**, **`harmonize_gwas`**, **`load_LD_matrix`**, **`twas_analysis`**, plus internal `ctwas_*` data structures.
+  - `[ctwas_2]` — internal `ctwas_param_*` flow.
+  - `[ctwas_3]` — **`ctwas_bimfile_loader`**, **`get_ctwas_meta_data`**, plus `ctwas_*` plotting/postprocess helpers.
+  - `[quantile_twas]` — `twas_pipeline`, `mr_result`.
+- `SuSiE_enloc.ipynb`:
+  - `[xqtl_gwas_enrichment]` — **`xqtl_enrichment_wrapper`** (deprecated → `qtlEnrichmentPipeline`).
+  - `[susie_coloc]` — calls `pecotmr:::get_nested_element` (internal helper) on a per-context fine-mapped RDS; the per-context coloc loop is built around `library(coloc)` directly.
+- `intact.ipynb [intact]` — `fastenloc`, `run_intact`, `intact_pip`, `intact_res`. No direct pecotmr-pipeline call observed; mostly a downstream synthesizer. Needs deeper check before concluding.
+
+## MWE data conventions discovered
+
+- **Genotype**: BOTH PLINK1 (e.g. `input/colocboost/example.chr22.bed/.bim/.fam`) AND VCF / bgzipped formats (`input/genotype/protocol_example.genotype.chr*.vcf.gz`, `*.bgz`). **No PLINK2 (pgen/pvar/psam) files present** in the MWE.
+- **Phenotype BED**: standard QTLtools format with `#chr, start, end, ID, SAMPLE_001, SAMPLE_002, ...`. Matches what our new `qtl_dataset_construct.R` parses.
+- **Covariate file**: **QTLtools convention** — `#id` header + rows like `sex`, `age`, `PC1`, ... with sample columns. **This is the opposite orientation from what the new wrapper expects** (we updated the script to take samples-as-rows since the user described `--phenotype-covariates` as "molecular trait PCs, same shape as genotype PCs").
+- **Per-region manifest TSVs** (`fine_mapping_meta.tsv`, `pheno_manifest_multicontext.tsv`): columns `#chr, start, end, ID, path, cond, cov_path` — one row per (region, context). The same `cov_path` is reused across contexts in the MWE examples.
+- **xQTL meta TSVs** (TWAS / cTWAS / enloc inputs): point at pre-computed per-(study, gene) `*.univariate_susie_twas_weights.rds` files. These workflows do NOT recompute fits; they consume the pre-built RDS files.
+- **GWAS inputs** (`input/rss_analysis/`, `input/twas/`): tabix-indexed `.tsv.gz` plus a YAML column-mapping file.
+
+## Migration sketch by notebook
+
+1. **`mnm_regression.ipynb` `[susie_twas]` step**: delete the step and `susie_twas.R`; replace with the existing `fine_mapping.ipynb` + `twas_weights.ipynb` against a pre-built QtlDataset RDS.
+2. **`rss_analysis.ipynb`**: new wrapper `gwas_finemap.R` that takes a QC'd `GwasSumStats` RDS and calls `fineMappingPipeline(methods = "susie")`. The legacy `[univariate_rss]` step's inputs (sumstats TSV + LD meta) feed a prior `summaryStatsQc` step that builds the GwasSumStats RDS.
+3. **`colocboost.ipynb`** (mnm_analysis): the existing notebook covers individual-level cross-context coloc with `colocboost::colocboost()` invoked through pecotmr's old data assembly. The new `pecotmr_integration/colocboost.ipynb` we wrote covers the QtlDataset focal-gene case; multi-condition / GWAS-jointed colocboost can be added similarly via `colocboostPipeline(jointGwas = TRUE, ...)`.
+4. **`qr_and_twas.ipynb`**: quantile-regression TWAS. Need to confirm whether pecotmr's S4 refactor exposes a new entry point for quantile TWAS or whether the legacy `load_regional_association_data` + a `quantile_*` worker is still the only path.
+5. **`twas_ctwas.ipynb`**: maps to the new `ctwasPipeline` + `causalInferencePipeline` in pecotmr. Each step (`twas`, `ctwas_1/2/3`, `quantile_twas`) becomes a thin wrapper that takes the pre-computed TwasWeights RDS + GWAS RDS + LD meta and calls one pipeline function. The data flow stays the same; only the inline R block is replaced.
+6. **`SuSiE_enloc.ipynb`**:
+   - `[xqtl_gwas_enrichment]` → `qtlEnrichmentPipeline(gwasFineMappingResult, qtlFineMappingResult)`. Input meta TSV stays the same; the wrapper just calls the new pipeline.
+   - `[susie_coloc]` → `colocPipeline(qtlFineMappingResult, gwasFineMappingResult)` or similar. The new pipeline handles the per-(study, region) coloc loop internally; no need to hand-write the loop in the SoS notebook.
+7. **`intact.ipynb`**: downstream of TWAS + enloc; uses `fastenloc` and `run_intact`. These look like INTACT-specific functions, not pecotmr S4 pipeline targets. Likely the inline R only consumes RDS outputs from earlier steps and a thin wrapper that calls the same `run_intact` is enough. Confirm whether `run_intact` is a pecotmr export or an INTACT-package function.
+
+## Status update (2026-06-22)
+
+**colocboost extension done.** `colocboost.R` + `colocboost.ipynb` now accept `--gwas-sumstats <RDS>` plus the three pipeline-variant flags (`--no-xqtl-coloc`, `--joint-gwas`, `--separate-gwas`), with the matching SoS parameter validations. xqtl-only regression and the GWAS-variant validations all smoke-tested against the MWE. Joint/separate-gwas end-to-end testing still needs a QC'd GwasSumStats RDS.
+
+**twas_ctwas.ipynb migration — wrappers + notebooks written.** Three new pairs under `code/SoS/pecotmr_integration/` + `code/script/pecotmr_integration/`:
+
+- `gwas_sumstats_construct.R` + `gwas_sumstats.ipynb` — per-LD-block GwasSumStats builder. Reads a GWAS TSV, subsets to the block, resolves the LD-meta row, builds a PLINK1/PLINK2/GDS/VCF `GenotypeHandle`, calls `GwasSumStats(...)`. Has a `--skip-qc` flag escape hatch because `summaryStatsQc()` triggers MungeSumstats which requires the `SNPlocs.Hsapiens.dbSNP155.GRCh38` BioConductor package. **Smoke-tested against the MWE** — produces a valid `GwasSumStats` over chr22:10516173-17414263 (112 variants) with the PLINK2 LD sketch attached.
+- `twas.R` + `twas.ipynb` — per-gene `causalInferencePipeline`. Notebook is now manifest-driven (rows: `gene_id`, `twas_weights_rds`, `gwas_sumstats_rds`, optional `fine_mapping_result_rds`), matching the `ctwas.ipynb` pattern. Per-row fan-out, one `twas.R` call per gene. Output: `{cwd}/{study}.{gene_id}.twas.rds`.
+- `fine_mapping.R` + `fine_mapping.ipynb` — supports BOTH QTL (`fine_mapping` workflow, `--qtl-dataset` + `--genes`/`--regions` fan-out) AND GWAS (`fine_mapping_gwas` workflow, `--gwas-sumstats-list <RDS>...` per-block fan-out). One wrapper script dispatches on which S4 input was supplied; `pecotmr::fineMappingPipeline` dispatches on input class (QtlDataset / GwasSumStats). Smoked end-to-end on the MWE: produces `QtlFineMappingResult` for ENSG00000130538 and `GwasFineMappingResult` for all 20 chr22 LD blocks.
+- `ctwas_assemble.R` + `ctwas_est.R` + `ctwas_finemap.R` + `ctwas.ipynb` — three-step cTWAS pipeline mirroring the legacy `[ctwas_1/2/3]`. Each step calls one of pecotmr's exported sub-pipelines (`assembleCtwasInputs` / `estCtwasParam` / `screenCtwasRegions` + `finemapCtwasRegions`). The SoS notebook chains them via intermediate RDS files so analysts can re-run only the screen/finemap step when tuning knobs.
+
+**Bugs surfaced in installed pecotmr 0.5.3 during testing**:
+1. `.twasNormalizeMethods("default")` echoes the literal `"default"` as the only token (skips the `methodList`-based expansion that the `is.null` branch uses). Workaround: wrappers pass `methods = NULL`.
+2. `pecotmr:::getRegionalLdMeta` returns the LD-meta file path as the *value* with the data path as the *name* of a single-element character vector, so `GenotypeHandle(ldMeta=…, region=…)` ends up checking the meta file's `.tsv` extension instead of the genotype payload's. Workaround: `gwas_sumstats_construct.R` resolves LD-meta rows directly and calls `GenotypeHandle(plink2Prefix=…)` (or the matching constructor) itself.
+3. `cisWindow` is required in `fineMappingPipeline(... region = ...)` despite the doc saying it's required only with `traitId`. Wrappers pass `cisWindow` in both modes.
+
+## Outstanding questions for the user
+
+1. **Covariate orientation**: MWE files are QTLtools format (covariates × samples). Our new `qtl_dataset_construct.R` expects samples × PCs (transposed). Options: (a) transpose inside the wrapper, (b) make orientation a flag, (c) keep current and require pre-transposition. Which?
+2. **Genotype format**: MWE has PLINK1 + VCF, no PLINK2. Do you want to (a) convert MWE inputs to PLINK2 once, (b) extend `qtl_dataset_construct.R` to accept PLINK1 / VCF, or (c) leave the wrapper PLINK2-only?
+3. **Meta TSV consumption**: the legacy workflow ingests per-region manifest TSVs (`fine_mapping_meta.tsv`) and expands them into per-region data dicts via `[get_analysis_regions]`. Under the new design, where does the meta TSV live? Options: (a) supply CLI args directly to `qtl_dataset_construct.R` (current), (b) add a CLI flag that reads a meta TSV and builds the args internally, (c) leave the TSV→args translation as a one-off pre-step.
+4. **`--region` vs `--window`**: legacy CLI distinguished a per-gene fine-mapping region from a larger association window for variant selection. The new wrappers only expose `--cis-window` (per-trait). Confirm this consolidation is intentional and the wider association-window concept is gone.
+5. **Quantile TWAS**: does pecotmr's new S4 API have a quantile-TWAS pipeline (`quantileTwasPipeline` or similar), or is quantile TWAS still bolted on top of the legacy `load_regional_association_data` flow?
+6. **Pre-computed TWAS weights inputs**: `twas_ctwas` / `SuSiE_enloc` / `intact` all consume pre-computed `*.univariate_susie_twas_weights.rds` files. After the migration of `twas_weights.ipynb`, the RDS schema is the new `TwasWeights` S4 collection — which should be a drop-in replacement everywhere the old `.rds` was consumed via `loadTwasWeights` / `batchLoadTwasWeights`. Confirm: are we OK requiring downstream notebooks to take new-format `TwasWeights` RDS files, or do we need a back-compat shim?
+7. **`ensemble_weights` (mnm_regression `[ensemble_twas_weight]`)**: still wants to be a callable. Has it been moved to a camelCase `ensembleWeights` (still exported), or removed in the S4 refactor? Affects whether the ensemble step is a 1-line wrapper or needs reimplementation.
+8. **Scope confirmation**: per user directive, `mnm_regression.ipynb` is treated as just `[susie_twas]` for this round; `[ensemble_twas_weight]`, `[mnm]`, `[mnm_genes]`, `[fsusie]`, `[mvfsusie]` are deferred. Confirm.
+9. **`intact.ipynb`**: does the `intact` step call into pecotmr (deprecated or otherwise) or is it purely the INTACT R package + RDS file munging? Determines whether this notebook needs migration at all.
diff --git a/dev/python-replacement-scope.md b/dev/python-replacement-scope.md
new file mode 100644
index 00000000..e22ea5fe
--- /dev/null
+++ b/dev/python-replacement-scope.md
@@ -0,0 +1,174 @@
+# Python replacement scope
+
+Inventory of all Python code in xqtl-protocol, categorized by what we plan to
+do with it. Companion to the pecotmr refactor (see
+`StatFunGen/pecotmr/dev/refactor-design.md`) — most of the "internal Python"
+absorption happens inside pecotmr constructors and pipelines after that
+refactor lands.
+
+## Out of scope
+
+The following Python is **not** considered for replacement and is excluded
+from every category below.
+
+- **SoS framework scaffolding** (~3,065 LOC across all SoS notebooks): the
+  `[step_name]` headers, `parameter:`, `input:`, `output:`, `task:`, and
+  `depends:` directives, plus action-block headers (`R:`, `bash:`,
+  `python:`, etc.). These use Python syntax but are SoS framework
+  declarations, not Python you write. Stays implicitly because SoS itself
+  stays.
+- **Snakemake** (`code/snakemake/` directory in its entirety, including
+  `compat/python/sitecustomize.py`, `Snakefile`, and `rules/*`). Out of
+  scope for this work.
+
+## Categories
+
+There are four buckets of Python in xqtl-protocol. Categories 1 contains
+code we keep; categories 2–4 contain code we plan to replace.
+
+### Category 1 — Vendored Python scripts and external tools (KEEP)
+
+Vendored scripts and external tool wrappers we do not plan to replace.
+These are either copies of upstream tools or wrappers around large
+external libraries (ML / GPU frameworks) with no clean R equivalent. They
+are invoked from the workflow layer via shell-out (`bash:` blocks or
+`subprocess`), so the rest of the codebase doesn't care about their
+language.
+
+| File | LOC | Origin / purpose |
+|---|---|---|
+| `code/SoS/molecular_phenotypes/calling/apa/DaPars_Extract_Anno.py` | 152 | Vendored from the [DaPars](https://github.com/ZhengXia/DaPars) APA-calling project. Extracts 3'UTR annotation from gene BED + symbol map. **Python 2.** |
+| `code/SoS/molecular_phenotypes/calling/apa/Dapars2_Multi_Sample.py` | 448 | Vendored DaPars2 multi-sample driver. Same project family as `DaPars_Extract_Anno.py`. **Python 2.** |
+| `code/SoS/molecular_phenotypes/calling/apa/gtf2bed12.py` | 209 | Vendored GTF→BED12 converter (Xudong Zou, 2021). Preprocessing step for APA workflows. **Python 2.** |
+| `code/SoS/xqtl_modifier_score/gems_pipeline.py` | 678 | GEMS (Generalized Expression Modifier Scores) ML pipeline. Uses sklearn, torch, xgboost, optuna. Self-contained CLI with `train` and `predict` subcommands; invoked from `ems_training.ipynb` / `ems_prediction.ipynb` via `bash:` blocks. |
+| `code/script/association_scan/TensorQTL/TensorQTL.py` | 693 | Wrapper around the `tensorqtl` Python library for QTL association testing (cis/trans, nominal + permutation). |
+| **Total** | **2,180** | |
+
+### Category 2 — SoS notebooks with step-body Python (REPLACE)
+
+Notebooks running under the SoS kernel that contain Python written
+**outside** the `R:` / `bash:` action blocks. This includes both:
+
+- Step-body Python at the SoS step level (manifest assembly, region
+  parsing, file-path resolution, conditional logic, output-filename
+  templating)
+- `python:` / `python3:` action block bodies (Python executed as the
+  step's action)
+
+After the pecotmr refactor, the data-construction slice of this Python
+(region parsing, GWAS-meta TSV reading, manifest assembly, per-trait /
+per-context path resolution) is absorbed into pecotmr constructors. The
+remaining step-body Python — per-job argument shaping, output-filename
+templating, control flow — is what we want to replace with bash + R.
+
+**82 SoS notebooks have some step-body Python.** The 16 heavy hitters
+(≥100 LOC) are where the bulk of the work lives:
+
+| Notebook | Step-body LOC |
+|---|---|
+| `code/SoS/pecotmr_integration/twas_ctwas.ipynb` | 673 |
+| `code/SoS/mnm_analysis/mnm_methods/colocboost.ipynb` | 589 |
+| `code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb` | 389 |
+| `code/SoS/commands_generator/eQTL_analysis_commands.ipynb` | 267 |
+| `code/SoS/mnm_analysis/mnm_postprocessing.ipynb` | 247 |
+| `code/SoS/association_scan/quantile_models/qr_and_twas.ipynb` | 243 |
+| `code/SoS/molecular_phenotypes/QC/pseudobulk_expression_aggregation_QC_norm.ipynb` | 243 *(mixed-kernel — see Category 3)* |
+| `code/SoS/pecotmr_integration/SuSiE_enloc.ipynb` | 239 |
+| `code/SoS/enrichment/sldsc_enrichment.ipynb` | 211 |
+| `code/SoS/molecular_phenotypes/calling/RNA_calling.ipynb` | 182 |
+| `code/SoS/data_preprocessing/genotype/genotype_formatting.ipynb` | 151 |
+| `code/SoS/data_preprocessing/genotype/GWAS_QC.ipynb` | 137 |
+| `code/SoS/molecular_phenotypes/QC/pseudobulk_mega_expression_QC_and_normalization.ipynb` | 130 *(mixed-kernel — see Category 3)* |
+| `code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb` | 129 |
+| `code/SoS/association_scan/TensorQTL/TensorQTL.ipynb` | 112 |
+| `code/SoS/data_preprocessing/genotype/PCA.ipynb` | 104 |
+
+**32 more notebooks have 20–99 LOC of step-body Python** (medium size; mostly
+smaller workflows and graveyard files). **34 more have <20 LOC** (incidental
+and cleanable in passing).
+
+Aggregate Python in SoS code cells across all SoS notebooks:
+
+- **5,655 LOC** of step-body Python (outside action blocks)
+- **1,011 LOC** of `python:` action block bodies
+- **106 LOC** of helper `def`/`class` definitions
+
+Total Category 2 LOC: **~6,772**.
+
+### Category 3 — Notebooks with Python kernels (REPLACE)
+
+Notebooks whose notebook-level `kernelspec` is Python (i.e., not SoS
+workflows — standalone Python notebooks). These are mostly analysis,
+plotting, and ML-driver notebooks.
+
+| Notebook | Code LOC | Notes |
+|---|---|---|
+| `code/SoS/cv2f/notebooks/analyze_selection_criteria.ipynb` | 738 | Analysis of cross-validation selection in fine-mapping |
+| `code/SoS/reference_data/rss_ld_sketch.ipynb` | 419 | **Mixed-kernel.** Notebook-level kernel is Python, but most cells are R-implicit (R code in cells with no per-cell kernel). True Python content is small. |
+| `code/SoS/molecular_phenotypes/QC/pseudobulk_expression_aggregation_QC_norm.ipynb` | 282 | **Mixed-kernel.** Also appears in Category 2 with 243 LOC of SoS step-body Python. |
+| `code/SoS/molecular_phenotypes/QC/pseudobulk_mega_expression_QC_and_normalization.ipynb` | 149 | **Mixed-kernel.** Also appears in Category 2 with 130 LOC of SoS step-body Python. |
+| `code/SoS/reference_data/ld_reference_generation.ipynb` | 146 | LD reference data prep |
+| `code/SoS/xqtl_modifier_score/ems_prediction.ipynb` | 117 | Driver for `gems_pipeline.py predict` (Category 1). Mostly shell-out invocations. |
+| `code/SoS/xqtl_modifier_score/ems_training.ipynb` | 76 | Driver for `gems_pipeline.py train`. Mostly shell-out invocations. |
+| `code/SoS/mnm_analysis/mnm_miniprotocol.ipynb` | 62 | Mini-protocol tutorial |
+| `code/SoS/mnm_analysis/multivariate_multigene_fine_mapping_vignette.ipynb` | 21 | Vignette |
+| `code/SoS/mnm_analysis/multivariate_fine_mapping_vignette.ipynb` | 21 | Vignette |
+| `code/SoS/mnm_analysis/univariate_fine_mapping_fsusie_vignette.ipynb` | 11 | Vignette |
+| `code/SoS/mnm_analysis/summary_stats_finemapping_vignette.ipynb` | 9 | Vignette |
+| `code/SoS/pecotmr_integration/twas_vignette.ipynb` | 0 | Markdown-only |
+| **Total** | **~2,051** | |
+
+Two additional notes:
+
+- `website/nature_protocol/conversion_notebook.ipynb` (516 LOC) is a
+  Python-kernel notebook used for the protocol website's doc generation.
+  Listed for completeness but is not analysis code; treat separately if /
+  when website automation gets touched.
+- The three **mixed-kernel notebooks** (`rss_ld_sketch.ipynb`,
+  `pseudobulk_expression_aggregation_QC_norm.ipynb`,
+  `pseudobulk_mega_expression_QC_and_normalization.ipynb`) appear in both
+  Category 2 (their SoS-cell step-body Python) and Category 3 (their
+  notebook-level Python kernel). When tackling these, the work spans
+  both flavors.
+
+### Category 4 — Standalone Python scripts to replace (REPLACE)
+
+`.py` files outside of the vendored / external set, mostly under
+`code/script/`. These are real production code that wraps existing R-side
+libraries (edgeR, limma, tabix, plink) or does data manipulation that
+ports cleanly to R or bash.
+
+| File | LOC | What it does | Replacement target |
+|---|---|---|---|
+| `code/script/association_scan/TensorQTL/TensorQTL.py` | 693 | *(Category 1 — kept as-is; listed here only to flag that we are not replacing it)* | — |
+| `code/script/data_preprocessing/phenotype/gene_annotation.py` | 589 | Maps leafcutter / psichomics splicing clusters and isoforms to genes. Pandas-heavy. | R |
+| `code/script/data_preprocessing/phenotype/phenotype_formatting.py` | 325 | Phenotype file format conversions (BED, tabix prep). | Bash + R |
+| `code/script/molecular_phenotypes/calling/RNA_calling.py` | 316 | Merges STAR / RSEM GCT outputs into a combined expression matrix. | R |
+| `code/script/molecular_phenotypes/QC/bulk_expression_normalization.py` | 256 | TMM / CPM / voom / quantile normalization. Already wraps edgeR / limma via rpy2. | Native R port |
+| `code/script/data_preprocessing/genotype/genotype_formatting.py` | 247 | VCF / PLINK format conversions and splitting. Mostly `subprocess` calls. | Bash |
+| **Total to replace** | **~1,733** | | |
+
+(Note: `TensorQTL.py` is shown in the table for completeness but is part of
+Category 1 — we are explicitly not replacing it. Excluding it, the
+replacement target in Category 4 is ~1,733 LOC.)
+
+## Summary
+
+| Category | Disposition | Approx. LOC | File / notebook count |
+|---|---|---|---|
+| 1. Vendored Python / external tools | Keep | 2,180 | 5 files |
+| 2. SoS notebooks with step-body Python | Replace | 6,772 | 82 notebooks (16 heavy hitters) |
+| 3. Notebooks with Python kernels | Replace | 2,051 | 13 notebooks (3 mixed-kernel) |
+| 4. Standalone Python scripts (excl. TensorQTL) | Replace | 1,733 | 5 files |
+| **Replacement target** | | **10,556** | |
+| **Out of scope** (SoS scaffolding) | Implicit keep | ~3,065 | — |
+| **Out of scope** (Snakemake) | Excluded | ~125+ | `code/snakemake/` tree |
+
+After the pecotmr refactor absorbs the data-construction slice of Category 2
+(rough estimate: 2,000–3,000 LOC of region parsing / manifest assembly /
+GWAS-meta TSV reading), the remaining replacement target is approximately
+**7,500–8,500 LOC** of step-body orchestration Python, Python-kernel analysis
+notebooks, and standalone scripts. Roughly 80% of this is straightforwardly
+portable to R or bash; the other 20% is per-step orchestration that needs to
+be rewritten in the SoS notebooks (as inline bash + R, replacing the
+imperative Python that's there today).
diff --git a/dev/s4_migration_plan.md b/dev/s4_migration_plan.md
new file mode 100644
index 00000000..ee82e25e
--- /dev/null
+++ b/dev/s4_migration_plan.md
@@ -0,0 +1,345 @@
+# S4 migration plan: pecotmr ↔ xqtl-protocol
+
+Working document. Captures the design conversation from 2026-06-13/15 covering the
+S4 migration of pecotmr, the conversion of xqtl-protocol to thin wrappers, and
+the architectural constraints we agreed on.
+
+## Constraints
+
+1. **xqtl-protocol R code must never use `@` or `slot()`** on a pecotmr object. Only
+   accessor functions (`getX()`) are allowed.
+2. **xqtl-protocol R code should rarely use any accessor either.** A per-region
+   script should be: parse args → loader → pipeline → save. If a script needs to
+   call more than a couple of accessors, pecotmr should encapsulate that work.
+3. **Every loader returns an S4** of an appropriate class.
+4. **Every per-region pipeline accepts that S4 directly** via S4 method dispatch.
+   The per-condition loop lives **inside pecotmr**, never in xqtl-protocol.
+5. **pecotmr's own code follows the Bioconductor rule**: `@slot` is allowed only
+   inside accessor method bodies (`setMethod("getX", "Class", ...)`), in
+   `new(...)` / constructor bodies, in `slot<-` assignment, and in validity
+   functions. Everywhere else, slot access must go through accessors.
+6. Genome-aggregating analyses (mash, S-LDSC) stay two-stage. That is a method-
+   level fact, not a smell.
+
+## Where we are today
+
+### Snake → camel rename (done)
+
+68 mechanical function-name renames applied across `code/SoS/` notebooks and
+`code/script/` R files. Scoped to `name(` and `pecotmr::name` patterns inside R
+contexts only; skipped Python contexts in SoS notebooks; skipped local-collision
+cases (`filter_relatedness` in GWAS_QC.R, `meta_analysis_per_cell` /
+`update_mash_model_cov` in mash_posterior.ipynb).
+
+### Mechanical S4 access conversion (done)
+
+63 substitutions across 7 files (`fsusie.R`, `susie_twas.R`, `mnm.R`,
+`mnm_regression.ipynb`, `qr_and_twas.ipynb`, `twas_ctwas.ipynb`,
+`rss_ld_sketch.ipynb`):
+
+- `fdat$residual_X[[r]]` → `getResidualX(fdat, r)`
+- `fdat$residual_Y[[r]]` → `getResidualY(fdat, r)`
+- `fdat$residual_X_scalar[[r]]` → `getResidualXScalar(fdat, r)`
+- `fdat$residual_Y_scalar[[r]]` → `getResidualYScalar(fdat, r)`
+- `fdat$X_variance[[r]]` → `getXVariance(fdat, r)`
+- `fdat$X` → `getGenotypeMatrix(fdat)` (both RegionalData and Multivariate)
+- `fdat$Y[[r]]` on RegionalData → `getPhenotypes(fdat)[[r]]`
+- `fdat$Y` on MultivariateRegionalData → `getY(fdat)`
+- `fdat$maf[[r]]` → `getMaf(fdat)[[r]]` (will work once getMaf for RegionalData lands)
+- `fdat$covar[[r]]` → `getCovariates(fdat)[[r]]`
+- `length/names/seq_along(fdat$residual_Y)` → `length/names/seq_along(getPhenotypes(fdat))`
+  (count- and name-equivalent by construction; not "iterate residuals" semantically,
+  but functionally correct)
+- LdData: `$LD_variants` → `getVariantIds`, `$ref_panel` → `getRefPanel`,
+  `$LD_matrix` → `getCorrelation` or `getGenotypes` (depending on `returnGenotype`
+  flag at the call site)
+
+**Reassignments skipped** (would have produced invalid R syntax):
+- `fdat$residual_X[[r]] <- NA` and `fdat$residual_Y[[r]] <- NA` (intent was
+  memory-clearing; moot under lazy residual computation — should be deleted in
+  cleanup pass).
+- `LD_list$LD_matrix <- LD_list$LD_matrix[-dup_idx, -dup_idx]` (the whole
+  dedup block should be deleted — pecotmr deduplicates internally now).
+
+**Zero `@` introduced** in any of the 63 substitutions.
+
+### Pending S4 access points
+
+30 remaining; after subtracting false positives (`opt$maf` is optparse, not
+pecotmr; `qr_results$maf` is a local list), the real remaining work is ~22 sites
+in four categories:
+
+1. **Slot renames** (~12 sites): `fdat$dropped_samples` → `@droppedSamples`;
+   `fdat$Y_coordinates` / `phenotype_coordiates` → `@coordinates`. These cannot
+   be mechanically rewritten without accessors — see Phase 0 below.
+2. **Reassignments** (~5 sites): delete the lines.
+3. **Wrong access on Multivariate** (~2 sites): `$residual_Y` on
+   `MultivariateRegionalData` → `getY()` (no residualization on that class).
+4. **Judgment calls** (~3 sites): `$X_data`, `$X_variance` without index,
+   `$residual_Y` matrix indexing in `mnm_regression.ipynb`. Need surrounding code
+   review.
+
+### Architectural realization
+
+`fdat` is purely a variable name in xqtl-protocol — it doesn't appear in
+pecotmr. With S4, the type is exposed by the loader name and the class name; the
+variable name is doing zero work.
+
+More importantly, **xqtl-protocol is reaching into low-level pecotmr internals**.
+The current shape of `fsusie.R` / `susie_twas.R` / `mnm.R` is a per-condition
+loop that unpacks `fdat` into pieces and feeds them to pipeline functions that
+were designed for the old list-based API. Two smoking guns:
+
+1. `susie_post_processor` (called from fsusie.R) was **removed from pecotmr in
+   March 2024** (commit d85aca2). It doesn't exist anywhere — the script would
+   fail at runtime. The replacement (`postprocessFinemappingFits`) is meant to be
+   called *inside* `univariateAnalysisPipeline`, not by external callers.
+2. `univariateAnalysisPipeline`, `multivariateAnalysisPipeline`,
+   `twasWeightsPipeline`, `fsusieWrapper` all still take loose pieces (X, Y,
+   maf, ...). With S4, this signature is a smell: the caller is forced to ask
+   the object for its slots and feed them back.
+
+## Target architecture
+
+### pecotmr public API: S4 in, structured out
+
+Each per-region pipeline accepts the loader's S4 directly. No loose-piece public
+signatures. Loose-piece logic survives as private helpers.
+
+| Pipeline | Public signature | Returns |
+|---|---|---|
+| `univariateAnalysisPipeline` | `(regional = RegionalData, ...)` | named list per condition |
+| `multivariateAnalysisPipeline` | `(regional = MultivariateRegionalData, ...)` | single result |
+| `twasWeightsPipeline` | `(regional = RegionalData, ...)` | named list of per-condition `TwasWeights` |
+| `fsusieWrapper` | `(regional = RegionalData, ...)` | named list of per-condition fSuSiE results |
+| `functionalAnalysisPipeline` *(new)* | `(regional = RegionalData, ...)` | named list of per-condition composites (top-PC + SuSiE-on-PC + TWAS + fSuSiE) |
+| `quantileAnalysisPipeline` *(new)* | `(regional = RegionalData, ...)` | named list of per-condition results |
+| `colocboostPipeline` | `(regional = MultitaskRegionalData, focalTrait, ...)` | single result |
+| `rssAnalysisPipeline` | unchanged — already takes `LdData` | unchanged |
+| `susieRssPipeline` | `(qcResult = QcResult, ldData = LdData, ...)` (tighten from loose) | single result |
+| `twasPipeline` (cTWAS) | unchanged — per-LD-block by design | unchanged |
+| `mashPipeline`, `sldscPostprocessingPipeline` | unchanged — genome-scope by design | unchanged |
+
+### Per-gene SoS parallelization compatibility
+
+Confirmed: pecotmr's per-region pipelines are scoped exactly right for "one SoS
+job per gene". Three caveats:
+
+- **cTWAS** is per-LD-block (`twasPipeline(twasWeightsData, ldMetaFilePath,
+  gwasMetaFile, regionBlock, ...)`) — joint inference across genes sharing an LD
+  region. `twas_ctwas.ipynb` already drives this correctly.
+- **mash / S-LDSC** are genome-aggregating; the SoS pipeline is two-stage
+  (per-gene fan-out → genome gather). `mash_preprocessing` → `mash_fit` →
+  `mash_posterior` already has this shape.
+- **`loadMultitaskRegionalData` currently returns a plain `list`**, not S4.
+  Needs S4-ification along with a `MultitaskRegionalData` class.
+
+### xqtl-protocol target shape
+
+Each per-region R script collapses to ~10 lines:
+
+```r
+opt <- parse_args(...)
+regional <- loadRegional*Data(
+  genotype = opt$genotype, phenotype = phenotype_files,
+  covariate = covariate_files, region = opt$region,
+  conditions = conditions, ...
+)
+result <- xxxAnalysisPipeline(regional, ...)
+result$region_info <- list(
+  region_coord = parseRegion(opt$region),
+  grange       = parseRegion(opt$window),
+  region_name  = opt[["region-name"]]
+)
+saveRDS(result, opt$output)
+```
+
+No `fdat$...`. No `@`. No accessors in the success path. No per-condition loop.
+
+## Plan
+
+### Phase 0 — pecotmr accessor foundation
+
+**Must come first** because every later phase needs internal pecotmr code to
+read slots via accessors (Bioconductor rule).
+
+#### Phase 0A — Accessor completion (~25-30 new accessors)
+
+Add missing `setGeneric` + `setMethod("get*", ...)` for every slot read anywhere
+in pecotmr code outside of accessor bodies / constructors / validity.
+
+**`RegionalData`** (4 new):
+- `getScaleResiduals(x)` → `x@scaleResiduals`
+- `getMaf(x)` for `RegionalData` (mirror of the `MultivariateRegionalData` method)
+- `getRegion(x)` → `x@region` (the `GRanges` directly; complementary to existing
+  derived `getChrom`/`getGrange`)
+- `getDroppedSamples(x)` → `x@droppedSamples`
+- `getCoordinates(x)` → `x@coordinates`
+
+**`MultivariateRegionalData`** (3 new, same generics):
+- `getDroppedSamples`, `getRegion`, `getCoordinates`
+
+**`LdData`** (2 new):
+- `getSnpIdx(x)` → `x@snpIdx`
+- `getNRef(x)` → `x@nRef`
+
+**`GenotypeHandle`** (5 new):
+- `getPath`, `getFormat`, `getSnpInfo`, `getNSamples`, `getSampleIds`
+- (`getPgenPtr` only if any non-method code reads it)
+
+**`GwasSumStats`** (3 new):
+- `getSumstats`, `getGenome`, `getTraitName`
+
+**`FineMappingResult`** (2 new):
+- `getMethod`, `getSumstats`
+
+**`H2Estimate`, `LdBlocks`, `LdEigen`, `LdScore`, `AnnotationMatrix`** (~10
+more across the heritability / LD-statistic infrastructure, exact set after
+internal audit)
+
+All exported. Each gets a roxygen page and a test against a fixture object.
+
+#### Phase 0B — Internal `@` → accessor refactor
+
+Scripted pass across all of `R/`. For every `<expr>@<slot>` read outside of:
+
+- `setMethod("get*", "Class", ...)` bodies
+- `new("Class", ...)` and constructor function bodies
+- `validity = function(object) { ... }` bodies (Bioconductor permits `@` here
+  because the object is being constructed)
+- `slot(x, "name") <- value` assignment
+
+…replace with the corresponding accessor call. No behavior change. All tests
+must pass. Baseline for `R CMD BiocCheck` compliance.
+
+Existing scope: ~357 `@` access sites across 43 files. Bulk in `file_utils.R`,
+`twas.R`, `univariate_pipeline.R`, `multivariate_pipeline.R`, `LD.R`,
+`ld_loader.R`, `mash_wrapper.R`, `sumstats_qc.R`.
+
+### Phase 1 — Pipelines accept only S4
+
+**pecotmr PR A**: Refactor `univariateAnalysisPipeline`. Rename current body to
+private `.univariateAnalysisOneCondition` (still uses accessors internally — no
+`@`). New `univariateAnalysisPipeline(regional, ...)` requires `RegionalData`,
+iterates conditions via `seq_along(getPhenotypes(regional))`, calls the private
+helper. Update tests + roxygen examples to construct `RegionalData`.
+
+**pecotmr PR B**: Same refactor for `multivariateAnalysisPipeline`,
+`twasWeightsPipeline`, `fsusieWrapper`.
+
+**pecotmr PR C**: New `MultitaskRegionalData` S4 class. Update
+`loadMultitaskRegionalData` to return it. Update `regionDataToIndInput`,
+`regionDataToRssInput`, `regionDataToColocboostInput` to accept it.
+`colocboostPipeline(regional = MultitaskRegionalData, ...)`.
+
+**pecotmr PR D**: New `functionalAnalysisPipeline(regional, ...)` — absorbs
+the per-condition composite from `fsusie.R` (top-PC + SuSiE-on-PC + TWAS +
+fSuSiE).
+
+**pecotmr PR E**: New `quantileAnalysisPipeline(regional, ...)` — absorbs the
+quantile-regression logic from `qr_and_twas.ipynb`.
+
+**pecotmr PR F**: Tighten `susieRssPipeline` to take `QcResult` + `LdData`.
+
+**pecotmr PR G**: Documentation pass. `NEWS.md`: "Per-region pipelines accept
+S4 inputs only. Loose-piece signatures are no longer public. Construct a
+`RegionalData` via the loader or the constructor." pkgdown index updated.
+
+#### Open call: hard vs. soft deprecation
+
+If the only real consumer of pecotmr is xqtl-protocol + this team's local
+scripts, **hard break** is fine and saves a release cycle. If there are
+external workflows, **soft path** with `.Deprecated()` for a release is safer.
+Decide before Phase 1.
+
+### Phase 2 — xqtl-protocol thin wrappers
+
+Each per-region script becomes ~10 lines (template above). Per file:
+
+- `code/script/mnm_analysis/mnm_methods/susie_twas.R` (~440 lines → ~15)
+- `code/script/mnm_analysis/mnm_methods/fsusie.R` (~250 lines → ~15)
+- `code/script/mnm_analysis/mnm_methods/mnm.R` (~180 lines → ~15)
+- `code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb` (R-in-SoS blocks)
+- `code/SoS/association_scan/quantile_models/qr_and_twas.ipynb`
+- `code/SoS/mnm_analysis/mnm_methods/colocboost.ipynb` (R-block only — the
+  Python side stays for now)
+
+One file at a time. Toy-dataset end-to-end verification between each.
+
+### Phase 3 — Cleanup
+
+- Delete `filter_fdat_except_specific_names` from fsusie.R (replaced by
+  `minMarkers` arg in `loadRegionalFunctionalData`).
+- Delete the cTWAS dedup block in `twas_ctwas.ipynb` (pecotmr's `loadLdMatrix`
+  already deduplicates internally).
+- Delete `fdat$residual_X[[r]] <- NA` and `fdat$residual_Y[[r]] <- NA`
+  reassignments (memory-clearing is moot under lazy compute).
+- Audit thin-wrapper outputs vs. pre-refactor outputs to confirm no behavioral
+  drift.
+
+## Sequencing summary
+
+| Order | Work | Why |
+|---|---|---|
+| 1 | Phase 0A — add missing accessors | Pure addition, no risk. Unblocks 0B and Phase 1. |
+| 2 | Phase 0B — internal `@` → accessor refactor | Establishes Bioconductor compliance. No behavior change. |
+| 3 | Phase 1 PRs A–G — pipelines accept S4 only | Sequentially; each PR is one pipeline family. |
+| 4 | Phase 2 — xqtl-protocol thin wrappers | One file at a time, verified against toy. |
+| 5 | Phase 3 — cleanup | After thin wrappers are confirmed working. |
+
+## Verification strategy
+
+For each Phase 1 pipeline PR:
+
+```r
+test_that("univariateAnalysisPipeline.RegionalData matches one-condition path", {
+  regional <- makeToyRegionalData()
+  out_via_helper <- .univariateAnalysisOneCondition(
+    X = getResidualX(regional, 1),
+    Y = getResidualY(regional, 1),
+    maf = getMaf(regional)[[1]],
+    ...
+  )
+  out_via_public <- univariateAnalysisPipeline(regional, ...)
+  expect_equal(out_via_public[[1]], out_via_helper)
+})
+```
+
+For each xqtl-protocol thin wrapper:
+
+1. Run pre-refactor script on toy dataset; capture output RDS as fixture.
+2. Refactor to thin wrapper.
+3. Run new wrapper on same toy dataset.
+4. Diff RDS structures and key numerics (PIPs, z-scores, weights).
+
+A wrapper is "done" only when its output matches the fixture, not when it
+compiles.
+
+## Per-condition iteration semantic note
+
+For non-indexed `length()` / `names()` / `seq_along()` wrappers around
+`fdat$residual_Y`, the mechanical conversion used `getPhenotypes(fdat)` rather
+than materializing a residuals list. This is functionally equivalent because
+`getResidualY(fdat, r)` returns a residual for each phenotype condition with the
+same name. The conversion is correct for count- and name-based iteration; a
+purist would precompute the residual list once outside the loop and index it
+inside, which is both clearer and faster but a structural change beyond the
+mechanical scope.
+
+## Files modified by the mechanical passes (committed state TBD)
+
+- `code/SoS/association_scan/quantile_models/qr_and_twas.ipynb`
+- `code/SoS/enrichment/sldsc_enrichment.ipynb`
+- `code/SoS/mnm_analysis/mnm_methods/colocboost.ipynb`
+- `code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb`
+- `code/SoS/mnm_analysis/mnm_postprocessing.ipynb`
+- `code/SoS/multivariate_genome/MASH/mash_preprocessing.ipynb`
+- `code/SoS/pecotmr_integration/SuSiE_enloc.ipynb`
+- `code/SoS/pecotmr_integration/twas_ctwas.ipynb`
+- `code/SoS/reference_data/rss_ld_sketch.ipynb`
+- `code/script/mnm_analysis/mnm_methods/fsusie.R`
+- `code/script/mnm_analysis/mnm_methods/mnm.R`
+- `code/script/mnm_analysis/mnm_methods/susie_twas.R`
+
+`pipeline/*.ipynb` are symlinks into `code/SoS/`, so they pick up the changes
+transparently.