StatFunGen · al4225 · Jun 24, 2026 · Jun 24, 2026
diff --git a/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb b/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb
@@ -8,7 +8,7 @@
    "source": [
     "# Advanced regression models for association analysis with individual-level data\n",
     "\n",
-    "This is a new-user-friendly minimal working example (MWE) of the `mnm_regression` pipeline. It runs end-to-end on the bundled chr22 toy dataset and shows how to compute SuSiE fine-mapping results and multi-method TWAS weights for a list of molecular-phenotype regions, then build the ensemble TWAS weights."
+    "This is a new-user-friendly minimal working example (MWE) of the `mnm_regression` pipeline. It runs end-to-end on the bundled chr22 toy dataset (60 samples) and shows how to compute SuSiE fine-mapping results and multi-method TWAS weights for a list of molecular-phenotype regions, then build the ensemble (multi-context) TWAS weights."
    ]
   },
   {
@@ -37,12 +37,12 @@
     "- **Peak / chromatin input** (`input/`, Mode B): the *same* pipeline also accepts a list of chr22 peaks (or any other molecular phenotype) - the command is identical, you just swap in the phenotype BED and its manifest.\n",
     "\n",
     "**Workflows used here:**\n",
-    "- `susie_twas` -- univariate SuSiE fine-mapping plus multi-method TWAS weights (one `*.univariate_twas_weights.rds` per region).\n",
-    "- `ensemble_twas_weight` -- reads the univariate weights and adds the ensemble weight (one `*.ensemble_twas_weights.rds` per region).\n",
+    "- `susie_twas` -- univariate SuSiE fine-mapping plus multi-method TWAS weights (one `*.univariate_twas_weights.rds` per region). Run with `--save-data` so the combined fine-mapping object is written for the next step.\n",
+    "- `mnm` -- multi-context fine-mapping (mvSuSiE + mr.mash) that consumes the `susie_twas --save-data` output through a `--fine-mapping-meta` table and produces the ensemble weights (one `*.multicontext_twas_weights.rds` per gene).\n",
     "\n",
-    "The notebook also documents the `mvsusie` (multivariate) and `fsusie` (functional) workflows that the pipeline supports, but the MWE focuses on the univariate + ensemble path that is runnable on the toy data.\n",
+    "The notebook also documents the `mnm_genes` (multi-gene), `mvfsusie`/`fsusie` (functional), and `mvsusie` (multivariate) workflows that the pipeline supports, but the MWE focuses on the univariate (`susie_twas`) + multi-context (`mnm`) path that is runnable on the toy data.\n",
     "\n",
-    "**Note on data:** this example uses a small synthetic chr22 toy dataset (1,995 variants after subsetting). No access-controlled individual-level human data is used."
+    "**Note on data:** this example uses a small synthetic chr22 toy dataset (60 samples, 1,995 variants after subsetting). No access-controlled individual-level human data is used."
    ]
   },
   {
@@ -78,13 +78,13 @@
    "source": [
     "## Steps\n",
     "\n",
-    "Run the two steps in order. **Step 1** (`susie_twas`) produces the per-region univariate TWAS weights; **Step 2** (`ensemble_twas_weight`) reads those weights and adds the ensemble. Run each command from inside `data/example_10peaks_susie_twas/`; the pipeline notebook is referenced as `../../pipeline/mnm_regression.ipynb`.\n",
+    "Run the two steps in order. **Step 1** (`susie_twas`) produces the per-region univariate TWAS weights; **Step 2** (`mnm`) reads those weights and adds the multi-context ensemble. The pipeline notebook is referenced as `../../pipeline/mnm_regression.ipynb`.\n",
     "\n",
-    "**Step 1.** SuSiE fine-mapping + multi-method TWAS weights for every region in the phenotype list.\n",
+    "**Step 1.** SuSiE fine-mapping + multi-method TWAS weights for every region in the phenotype list. Run with `--save-data` so the combined fine-mapping object is written for Step 2.\n",
     "\n",
-    "**Step 2.** Build the ensemble TWAS weight from the Step-1 output. The ensemble is computed by default on the toy data (the `ensemble_r2_threshold` is relaxed so no method is silently dropped).\n",
+    "**Step 2.** Build the ensemble (multi-context) TWAS weight from the Step-1 output with `mnm`. The ensemble is computed by default on the toy data (the `ensemble_r2_threshold` is relaxed so no method is silently dropped).\n",
     "\n",
-    "_Memory note:_ run the peak loop with `-j1`. Running many regions in parallel (`-j4`) -- or even all 10 serially in one process -- can exhaust memory on heavier regions and trigger `ERROR: One of the local workers has been killed`. If that happens, re-run the remaining heavy regions one at a time using a single-region phenotype list."
+    "_Memory note:_ run multi-region loops with `-j1`. Running many regions in parallel (`-j4`) -- or even all of them serially in one process -- can exhaust memory on heavier regions and trigger `ERROR: One of the local workers has been killed`. If that happens, re-run the remaining heavy regions one at a time using a single-region phenotype list."
    ]
   },
   {
@@ -106,9 +106,9 @@
     "kernel": "SoS"
    },
    "source": [
-    "**Timing** (toy chr22 dataset):\n",
+    "**Timing** (toy chr22 dataset, 60 samples):\n",
     "- `susie_twas`: ~3 min (chr22 toy, 1 gene)\n",
-    "- `ensemble_twas_weight`: ~30 sec\n"
+    "- `mnm`: ~30 sec"
    ]
   },
   {
@@ -119,16 +119,13 @@
    },
    "outputs": [],
    "source": [
-    "# Univariate fine-mapping (susie_twas) - VERIFIED end-to-end on the toy gene-expression data.\n",
-    "# Run from the toy_example root. --save-data writes univariate_data.rds, which supplies\n",
-    "# the combined_data_sumstats that the downstream mnm_genes workflow consumes.\n",
-    "sos run pipeline/mnm_regression.ipynb susie_twas \\\n",
-    "    --name test_uni --cwd output_uni \\\n",
+    "sos run pipeline/mnm_regression.ipynb qtl_dataset_construct+susie_twas \\\n",
+    "    --name protocol_example --cwd output_uni \\\n",
     "    --genoFile input/colocboost/example.chr22.bed \\\n",
-    "    --phenoFile input/colocboost/pheno_manifest.tsv \\\n",
+    "    --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n",
     "    --covFile input/colocboost/example_covariates.tsv \\\n",
     "    --customized-association-windows input/colocboost/association_windows.bed \\\n",
-    "    --save-data -j1"
+    "    --region-name ENSG00000130538 --transpose-covariates --save-data -j1"
    ]
   },
   {
@@ -137,7 +134,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "**Step 2 (gene expression): ensemble TWAS weights.**"
+    "**Step 2 (gene expression): multi-context fine-mapping + ensemble TWAS weights (mnm).**"
    ]
   },
   {
@@ -148,14 +145,14 @@
    },
    "outputs": [],
    "source": [
-    "sos run pipeline/mnm_regression.ipynb ensemble_twas_weight \\\n",
-    "    --name geneexpr_twas \\\n",
-    "    --cwd output/colocboost \\\n",
+    "sos run pipeline/mnm_regression.ipynb mnm \\\n",
+    "    --name protocol_example --cwd output_mnm \\\n",
     "    --genoFile input/colocboost/example.chr22.bed \\\n",
-    "    --phenoFile input/colocboost/pheno_manifest_1gene.tsv \\\n",
+    "    --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n",
     "    --covFile input/colocboost/example_covariates.tsv \\\n",
-    "    --customized-association-windows input/colocboost/association_windows_1gene.bed \\\n",
-    "    -j1"
+    "    --customized-association-windows input/colocboost/association_windows.bed \\\n",
+    "    --fine-mapping-meta input/colocboost/fine_mapping_meta.tsv \\\n",
+    "    --transpose-covariates --save-data -j1"
    ]
   },
   {
@@ -180,7 +177,7 @@
    "outputs": [],
    "source": [
     "sos run pipeline/mnm_regression.ipynb susie_twas \\\n",
-    "    --name example_10peaks_twas \\\n",
+    "    --name protocol_example \\\n",
     "    --cwd output_10peaks \\\n",
     "    --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n",
     "    --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n",
@@ -195,7 +192,7 @@
     "kernel": "SoS"
    },
    "source": [
-    "**Step 2 (peak): ensemble TWAS weights.**"
+    "**Step 2 (peak): multi-context fine-mapping + ensemble TWAS weights (mnm).**"
    ]
   },
   {
@@ -206,14 +203,17 @@
    },
    "outputs": [],
    "source": [
-    "sos run pipeline/mnm_regression.ipynb ensemble_twas_weight \\\n",
-    "    --name example_10peaks_twas \\\n",
+    "# Step 2 (10 peaks): multi-context fine-mapping + ensemble TWAS weights (mnm).\n",
+    "# As above, mnm consumes the Step-1 univariate outputs via --fine-mapping-meta.\n",
+    "sos run pipeline/mnm_regression.ipynb mnm \\\n",
+    "    --name protocol_example \\\n",
     "    --cwd output_10peaks \\\n",
     "    --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n",
     "    --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n",
     "    --covFile input/covariate/protocol_example.covariates.tsv \\\n",
     "    --customized-association-windows input/finemapping/protocol_example.association_windows.bed \\\n",
-    "    -j1\n"
+    "    --fine-mapping-meta output_10peaks/fine_mapping_meta.tsv \\\n",
+    "    -j1"
    ]
   },
   {
@@ -224,9 +224,9 @@
    "source": [
     "## Output Files\n",
     "\n",
-    "Under `--cwd` (e.g. `output_10peaks/` or `output/colocboost/`), in the `twas_weights/` subfolder:\n",
-    "- `{name}.<region>.univariate_twas_weights.rds` -- Step 1 output, one per region: the SuSiE fine-mapping result plus TWAS weights from each method (e.g. lasso, enet, mrash, susie), variant names, region info and CV performance.\n",
-    "- `{name}.<region>.ensemble_twas_weights.rds` -- Step 2 output, one per region: the same content with the added ensemble. The ensemble is stored under an `ensemble` entry as a list of `method_coef`, `ensemble_twas_weights`, and `method_performance`.\n",
+    "Under `--cwd` (e.g. `output_10peaks/` or `output_uni/`):\n",
+    "- `twas_weights/{name}.<region>.univariate_twas_weights.rds` -- Step 1 (`susie_twas`) output, one per region: the SuSiE fine-mapping result plus TWAS weights from each method (e.g. lasso, enet, mrash, susie), variant names, region info and CV performance.\n",
+    "- `multivariate_twas_weights/{name}.<region>.multicontext_twas_weights.rds` -- Step 2 (`mnm`) output, one per gene: the multi-context model with the added ensemble. The ensemble is stored under an `ensemble` entry alongside `method_coef`, the ensemble weights, and `method_performance`.\n",
     "\n",
     "Fine-mapping objects are also written under the `fine_mapping/` subfolder, and per-region `.stdout`/`.stderr` logs accompany each run."
    ]
@@ -239,9 +239,9 @@
    "source": [
     "## Anticipated Results\n",
     "\n",
-    "For the 10-peak run you should get 10 `*.univariate_twas_weights.rds` and 10 `*.ensemble_twas_weights.rds` files (one per peak: C22P107555, 557, 559, 560, 562, 563, 564, 565, 567, 568). Step 2 reports something like `10 items in 10 groups`. For the 1-gene gene-expression run you get one file of each type.\n",
+    "For the 10-peak run you should get 10 `*.univariate_twas_weights.rds` (one per peak) and 10 `*.multicontext_twas_weights.rds` from the `mnm` step. For the 1-gene gene-expression run you get one file of each type.\n",
     "\n",
-    "You can sanity-check an ensemble in R:\n"
+    "You can sanity-check a multi-context ensemble weight in R:"
    ]
   },
   {
@@ -252,13 +252,10 @@
    },
    "outputs": [],
    "source": [
-    "# In R: inspect one ensemble output\n",
-    "a <- readRDS('output_10peaks/twas_weights/example_10peaks_twas.chr22_C22P107568.ensemble_twas_weights.rds')\n",
-    "region <- names(a)[1]; pheno <- names(a[[region]])[1]\n",
-    "en <- a[[region]][[pheno]][['ensemble']]\n",
-    "names(en)                       # method_coef, ensemble_twas_weights, method_performance\n",
-    "length(en[['ensemble_twas_weights']])   # 1995 (one weight per variant)\n",
-    "length(en[['method_coef']])             # number of contributing methods"
+    "# In R: inspect one multi-context (mnm) ensemble output\n",
+    "w <- readRDS('output_uni/multivariate_twas_weights/protocol_example.ENSG00000130538.multicontext_twas_weights.rds')\n",
+    "names(w)                       # available genes / contexts\n",
+    "str(w, max.level = 2)          # weights + per-method performance"
    ]
   },
   {
@@ -293,9 +290,10 @@
     "\n",
     "The same pipeline notebook also provides:\n",
     "- `mvsusie` -- multivariate SuSiE with mr.mash TWAS weights (for jointly analyzing multiple phenotypes that share a genotype).\n",
+    "- `mnm_genes` -- multi-gene multivariate fine-mapping across genes sharing a window.\n",
     "- `fsusie` -- functional SuSiE for functional/high-resolution modalities (e.g. methylation), with a `--fsusie-post-processing` option (`TI`, `HMM`, or `none`; `TI` is recommended).\n",
     "\n",
-    "These follow the same `--genoFile / --phenoFile / --covFile` interface; they are documented here for completeness but the verified MWE above uses `susie_twas` + `ensemble_twas_weight`."
+    "These follow the same `--genoFile` / `--phenoFile` / `--covFile` interface; they are documented here for completeness but the verified MWE above uses `susie_twas` + `mnm`."
    ]
   },
   {
@@ -315,7 +313,7 @@
    "source": [
     "### Multivariate analysis: mvSuSiE and mr.mash (`mnm`)\n",
     "\n",
-    "Fine-maps multiple molecular phenotypes jointly across a shared window using mvSuSiE with mr.mash priors. This step needs **multiple contexts** (e.g. the same genes measured under two or more conditions). The toy set ships a single context, so for this MWE we add a **second, synthetic context** (`example/colocboost_ctx2.bed.gz`, derived from context 1 with added noise — *demo data only, not a real measurement*) and a multi-context phenotype manifest. With that in place the `mnm` step now runs end-to-end on the toy data and produces `multicontext_bvsr.rds`, `multicontext_twas_weights.rds`, and `multicontext_data.rds` per gene."
+    "Fine-maps multiple molecular phenotypes jointly across a shared window using mvSuSiE with mr.mash priors. This step needs **multiple contexts** (e.g. the same genes measured under two or more conditions). The toy set ships a single context, so for this MWE we add a **second, synthetic context** (`example/colocboost_ctx2.bed.gz`, derived from context 1 with added noise \u2014 *demo data only, not a real measurement*) and a multi-context phenotype manifest. With that in place the `mnm` step now runs end-to-end on the toy data and produces `multicontext_bvsr.rds`, `multicontext_twas_weights.rds`, and `multicontext_data.rds` per gene."
    ]
   },
   {
@@ -334,7 +332,7 @@
     "#    SAMPLE_001..) so genotype and phenotype/covariate samples match.\n",
     "#  - Requires the mr.mashr R package (provided by mr.mash.alpha in this env).\n",
     "sos run pipeline/mnm_regression.ipynb mnm \\\n",
-    "    --name test_mnm --cwd output_mnm3 \\\n",
+    "    --name protocol_example --cwd output_mnm3 \\\n",
     "    --genoFile input/colocboost/example.chr22.bed \\\n",
     "    --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n",
     "    --covFile input/colocboost/example_covariates.tsv \\\n",
@@ -370,7 +368,7 @@
     "# On this toy set all regions return a NULL multigene_bvsr (no multi-gene/trans signal),\n",
     "# which is the correct, honest result for independent single-gene toy regions.\n",
     "sos run pipeline/mnm_regression.ipynb mnm_genes \\\n",
-    "    --name test_mnmgenes --cwd output_mnmgenes \\\n",
+    "    --name protocol_example --cwd output_mnmgenes \\\n",
     "    --genoFile input/colocboost/example.chr22.bed \\\n",
     "    --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n",
     "    --covFile input/colocboost/example_covariates.tsv \\\n",
@@ -403,7 +401,7 @@
     "# Functional SuSiE (fsusie) - VERIFIED end-to-end on the toy peak data.\n",
     "# Run from the toy_example root. Produces *.fsusie_mixture_normal_TI__top_pc_weights.rds.\n",
     "sos run pipeline/mnm_regression.ipynb fsusie \\\n",
-    "    --name test_fsusie --cwd output/fsusie \\\n",
+    "    --name protocol_example --cwd output/fsusie \\\n",
     "    --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n",
     "    --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n",
     "    --covFile input/covariate/protocol_example.covariates.tsv \\\n",
@@ -433,7 +431,7 @@
     "# Still under development on the toy dataset (the mvfSuSiE R step errors out).\n",
     "# Command template:\n",
     "sos run pipeline/mnm_regression.ipynb mvfsusie \\\n",
-    "    --name test_mvfsusie --cwd output/mvfsusie \\\n",
+    "    --name protocol_example --cwd output/mvfsusie \\\n",
     "    --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n",
     "    --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n",
     "    --covFile input/covariate/protocol_example.covariates.tsv \\\n",
@@ -452,9 +450,12 @@
     "| Step | Problem | Possible Reason | Solution |\n",
     "| --- | --- | --- | --- |\n",
     "| susie_twas | A region is skipped / no window found | 4th-column `ID` in the association-windows file does not match the phenotype list `ID` | Make the `ID` columns match exactly. |\n",
+    "| susie_twas | `susie_twas: pass --region-name <gene-id>` | No region selected | Pass `--region-name <gene-id>` (e.g. `ENSG00000130538`) or a `--region-list`. |\n",
     "| susie_twas | `ERROR: One of the local workers has been killed` | Out-of-memory: too many regions in one process (`-j4`, or all 10 serially) | Use `-j1`; re-run heavy regions one at a time with a single-region phenotype list. |\n",
-    "| ensemble_twas_weight | `Missing upstream TWAS weights file` | Step 1 was not run for that region | Run `susie_twas` first so the `*.univariate_twas_weights.rds` exists. |\n",
-    "| both | Sample IDs do not overlap between genotype/covariates and phenotype | Genotype `.fam` IDs differ from the phenotype matrix sample names | Rename one side so they match (e.g. the gene-expression mode renames genotype/covariate IDs to `SAMPLE_001..049`). |"
+    "| mnm | `Target unavailable: ...qtl_dataset.rds` | `qtl_dataset_construct` not run first | Chain it: `qtl_dataset_construct+susie_twas`, then `mnm`. |\n",
+    "| mnm | `--phenotype-manifest missing required column(s): cond` | Single-context manifest used | Use the multi-context manifest (`pheno_manifest_multicontext.tsv`). |\n",
+    "| both | `No shared samples between phenotype and phenotype-covariate` | Covariates are in QTLtools (transposed) layout | Add `--transpose-covariates`. |\n",
+    "| both | Sample IDs do not overlap between genotype/covariates and phenotype | Genotype `.fam` IDs differ from the phenotype matrix sample names | Rename one side so they match (e.g. the gene-expression mode renames genotype/covariate IDs to `SAMPLE_001..060`). |"
    ]
   },
   {
@@ -1329,4 +1330,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}