diff --git a/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb b/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb index 70bb2808..075cd8a8 100644 --- a/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb +++ b/code/SoS/mnm_analysis/mnm_methods/mnm_regression.ipynb @@ -8,7 +8,7 @@ "source": [ "# Advanced regression models for association analysis with individual-level data\n", "\n", - "This is a new-user-friendly minimal working example (MWE) of the `mnm_regression` pipeline. It runs end-to-end on the bundled chr22 toy dataset and shows how to compute SuSiE fine-mapping results and multi-method TWAS weights for a list of molecular-phenotype regions, then build the ensemble TWAS weights." + "This is a new-user-friendly minimal working example (MWE) of the `mnm_regression` pipeline. It runs end-to-end on the bundled chr22 toy dataset (60 samples) and shows how to compute SuSiE fine-mapping results and multi-method TWAS weights for a list of molecular-phenotype regions, then build the ensemble (multi-context) TWAS weights." ] }, { @@ -37,12 +37,12 @@ "- **Peak / chromatin input** (`input/`, Mode B): the *same* pipeline also accepts a list of chr22 peaks (or any other molecular phenotype) - the command is identical, you just swap in the phenotype BED and its manifest.\n", "\n", "**Workflows used here:**\n", - "- `susie_twas` -- univariate SuSiE fine-mapping plus multi-method TWAS weights (one `*.univariate_twas_weights.rds` per region).\n", - "- `ensemble_twas_weight` -- reads the univariate weights and adds the ensemble weight (one `*.ensemble_twas_weights.rds` per region).\n", + "- `susie_twas` -- univariate SuSiE fine-mapping plus multi-method TWAS weights (one `*.univariate_twas_weights.rds` per region). Run with `--save-data` so the combined fine-mapping object is written for the next step.\n", + "- `mnm` -- multi-context fine-mapping (mvSuSiE + mr.mash) that consumes the `susie_twas --save-data` output through a `--fine-mapping-meta` table and produces the ensemble weights (one `*.multicontext_twas_weights.rds` per gene).\n", "\n", - "The notebook also documents the `mvsusie` (multivariate) and `fsusie` (functional) workflows that the pipeline supports, but the MWE focuses on the univariate + ensemble path that is runnable on the toy data.\n", + "The notebook also documents the `mnm_genes` (multi-gene), `mvfsusie`/`fsusie` (functional), and `mvsusie` (multivariate) workflows that the pipeline supports, but the MWE focuses on the univariate (`susie_twas`) + multi-context (`mnm`) path that is runnable on the toy data.\n", "\n", - "**Note on data:** this example uses a small synthetic chr22 toy dataset (1,995 variants after subsetting). No access-controlled individual-level human data is used." + "**Note on data:** this example uses a small synthetic chr22 toy dataset (60 samples, 1,995 variants after subsetting). No access-controlled individual-level human data is used." ] }, { @@ -78,13 +78,13 @@ "source": [ "## Steps\n", "\n", - "Run the two steps in order. **Step 1** (`susie_twas`) produces the per-region univariate TWAS weights; **Step 2** (`ensemble_twas_weight`) reads those weights and adds the ensemble. Run each command from inside `data/example_10peaks_susie_twas/`; the pipeline notebook is referenced as `../../pipeline/mnm_regression.ipynb`.\n", + "Run the two steps in order. **Step 1** (`susie_twas`) produces the per-region univariate TWAS weights; **Step 2** (`mnm`) reads those weights and adds the multi-context ensemble. The pipeline notebook is referenced as `../../pipeline/mnm_regression.ipynb`.\n", "\n", - "**Step 1.** SuSiE fine-mapping + multi-method TWAS weights for every region in the phenotype list.\n", + "**Step 1.** SuSiE fine-mapping + multi-method TWAS weights for every region in the phenotype list. Run with `--save-data` so the combined fine-mapping object is written for Step 2.\n", "\n", - "**Step 2.** Build the ensemble TWAS weight from the Step-1 output. The ensemble is computed by default on the toy data (the `ensemble_r2_threshold` is relaxed so no method is silently dropped).\n", + "**Step 2.** Build the ensemble (multi-context) TWAS weight from the Step-1 output with `mnm`. The ensemble is computed by default on the toy data (the `ensemble_r2_threshold` is relaxed so no method is silently dropped).\n", "\n", - "_Memory note:_ run the peak loop with `-j1`. Running many regions in parallel (`-j4`) -- or even all 10 serially in one process -- can exhaust memory on heavier regions and trigger `ERROR: One of the local workers has been killed`. If that happens, re-run the remaining heavy regions one at a time using a single-region phenotype list." + "_Memory note:_ run multi-region loops with `-j1`. Running many regions in parallel (`-j4`) -- or even all of them serially in one process -- can exhaust memory on heavier regions and trigger `ERROR: One of the local workers has been killed`. If that happens, re-run the remaining heavy regions one at a time using a single-region phenotype list." ] }, { @@ -106,9 +106,9 @@ "kernel": "SoS" }, "source": [ - "**Timing** (toy chr22 dataset):\n", + "**Timing** (toy chr22 dataset, 60 samples):\n", "- `susie_twas`: ~3 min (chr22 toy, 1 gene)\n", - "- `ensemble_twas_weight`: ~30 sec\n" + "- `mnm`: ~30 sec" ] }, { @@ -119,16 +119,13 @@ }, "outputs": [], "source": [ - "# Univariate fine-mapping (susie_twas) - VERIFIED end-to-end on the toy gene-expression data.\n", - "# Run from the toy_example root. --save-data writes univariate_data.rds, which supplies\n", - "# the combined_data_sumstats that the downstream mnm_genes workflow consumes.\n", - "sos run pipeline/mnm_regression.ipynb susie_twas \\\n", - " --name test_uni --cwd output_uni \\\n", + "sos run pipeline/mnm_regression.ipynb qtl_dataset_construct+susie_twas \\\n", + " --name protocol_example --cwd output_uni \\\n", " --genoFile input/colocboost/example.chr22.bed \\\n", - " --phenoFile input/colocboost/pheno_manifest.tsv \\\n", + " --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n", " --covFile input/colocboost/example_covariates.tsv \\\n", " --customized-association-windows input/colocboost/association_windows.bed \\\n", - " --save-data -j1" + " --region-name ENSG00000130538 --transpose-covariates --save-data -j1" ] }, { @@ -137,7 +134,7 @@ "kernel": "SoS" }, "source": [ - "**Step 2 (gene expression): ensemble TWAS weights.**" + "**Step 2 (gene expression): multi-context fine-mapping + ensemble TWAS weights (mnm).**" ] }, { @@ -148,14 +145,14 @@ }, "outputs": [], "source": [ - "sos run pipeline/mnm_regression.ipynb ensemble_twas_weight \\\n", - " --name geneexpr_twas \\\n", - " --cwd output/colocboost \\\n", + "sos run pipeline/mnm_regression.ipynb mnm \\\n", + " --name protocol_example --cwd output_mnm \\\n", " --genoFile input/colocboost/example.chr22.bed \\\n", - " --phenoFile input/colocboost/pheno_manifest_1gene.tsv \\\n", + " --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n", " --covFile input/colocboost/example_covariates.tsv \\\n", - " --customized-association-windows input/colocboost/association_windows_1gene.bed \\\n", - " -j1" + " --customized-association-windows input/colocboost/association_windows.bed \\\n", + " --fine-mapping-meta input/colocboost/fine_mapping_meta.tsv \\\n", + " --transpose-covariates --save-data -j1" ] }, { @@ -180,7 +177,7 @@ "outputs": [], "source": [ "sos run pipeline/mnm_regression.ipynb susie_twas \\\n", - " --name example_10peaks_twas \\\n", + " --name protocol_example \\\n", " --cwd output_10peaks \\\n", " --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n", " --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n", @@ -195,7 +192,7 @@ "kernel": "SoS" }, "source": [ - "**Step 2 (peak): ensemble TWAS weights.**" + "**Step 2 (peak): multi-context fine-mapping + ensemble TWAS weights (mnm).**" ] }, { @@ -206,14 +203,17 @@ }, "outputs": [], "source": [ - "sos run pipeline/mnm_regression.ipynb ensemble_twas_weight \\\n", - " --name example_10peaks_twas \\\n", + "# Step 2 (10 peaks): multi-context fine-mapping + ensemble TWAS weights (mnm).\n", + "# As above, mnm consumes the Step-1 univariate outputs via --fine-mapping-meta.\n", + "sos run pipeline/mnm_regression.ipynb mnm \\\n", + " --name protocol_example \\\n", " --cwd output_10peaks \\\n", " --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n", " --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n", " --covFile input/covariate/protocol_example.covariates.tsv \\\n", " --customized-association-windows input/finemapping/protocol_example.association_windows.bed \\\n", - " -j1\n" + " --fine-mapping-meta output_10peaks/fine_mapping_meta.tsv \\\n", + " -j1" ] }, { @@ -224,9 +224,9 @@ "source": [ "## Output Files\n", "\n", - "Under `--cwd` (e.g. `output_10peaks/` or `output/colocboost/`), in the `twas_weights/` subfolder:\n", - "- `{name}..univariate_twas_weights.rds` -- Step 1 output, one per region: the SuSiE fine-mapping result plus TWAS weights from each method (e.g. lasso, enet, mrash, susie), variant names, region info and CV performance.\n", - "- `{name}..ensemble_twas_weights.rds` -- Step 2 output, one per region: the same content with the added ensemble. The ensemble is stored under an `ensemble` entry as a list of `method_coef`, `ensemble_twas_weights`, and `method_performance`.\n", + "Under `--cwd` (e.g. `output_10peaks/` or `output_uni/`):\n", + "- `twas_weights/{name}..univariate_twas_weights.rds` -- Step 1 (`susie_twas`) output, one per region: the SuSiE fine-mapping result plus TWAS weights from each method (e.g. lasso, enet, mrash, susie), variant names, region info and CV performance.\n", + "- `multivariate_twas_weights/{name}..multicontext_twas_weights.rds` -- Step 2 (`mnm`) output, one per gene: the multi-context model with the added ensemble. The ensemble is stored under an `ensemble` entry alongside `method_coef`, the ensemble weights, and `method_performance`.\n", "\n", "Fine-mapping objects are also written under the `fine_mapping/` subfolder, and per-region `.stdout`/`.stderr` logs accompany each run." ] @@ -239,9 +239,9 @@ "source": [ "## Anticipated Results\n", "\n", - "For the 10-peak run you should get 10 `*.univariate_twas_weights.rds` and 10 `*.ensemble_twas_weights.rds` files (one per peak: C22P107555, 557, 559, 560, 562, 563, 564, 565, 567, 568). Step 2 reports something like `10 items in 10 groups`. For the 1-gene gene-expression run you get one file of each type.\n", + "For the 10-peak run you should get 10 `*.univariate_twas_weights.rds` (one per peak) and 10 `*.multicontext_twas_weights.rds` from the `mnm` step. For the 1-gene gene-expression run you get one file of each type.\n", "\n", - "You can sanity-check an ensemble in R:\n" + "You can sanity-check a multi-context ensemble weight in R:" ] }, { @@ -252,13 +252,10 @@ }, "outputs": [], "source": [ - "# In R: inspect one ensemble output\n", - "a <- readRDS('output_10peaks/twas_weights/example_10peaks_twas.chr22_C22P107568.ensemble_twas_weights.rds')\n", - "region <- names(a)[1]; pheno <- names(a[[region]])[1]\n", - "en <- a[[region]][[pheno]][['ensemble']]\n", - "names(en) # method_coef, ensemble_twas_weights, method_performance\n", - "length(en[['ensemble_twas_weights']]) # 1995 (one weight per variant)\n", - "length(en[['method_coef']]) # number of contributing methods" + "# In R: inspect one multi-context (mnm) ensemble output\n", + "w <- readRDS('output_uni/multivariate_twas_weights/protocol_example.ENSG00000130538.multicontext_twas_weights.rds')\n", + "names(w) # available genes / contexts\n", + "str(w, max.level = 2) # weights + per-method performance" ] }, { @@ -293,9 +290,10 @@ "\n", "The same pipeline notebook also provides:\n", "- `mvsusie` -- multivariate SuSiE with mr.mash TWAS weights (for jointly analyzing multiple phenotypes that share a genotype).\n", + "- `mnm_genes` -- multi-gene multivariate fine-mapping across genes sharing a window.\n", "- `fsusie` -- functional SuSiE for functional/high-resolution modalities (e.g. methylation), with a `--fsusie-post-processing` option (`TI`, `HMM`, or `none`; `TI` is recommended).\n", "\n", - "These follow the same `--genoFile / --phenoFile / --covFile` interface; they are documented here for completeness but the verified MWE above uses `susie_twas` + `ensemble_twas_weight`." + "These follow the same `--genoFile` / `--phenoFile` / `--covFile` interface; they are documented here for completeness but the verified MWE above uses `susie_twas` + `mnm`." ] }, { @@ -315,7 +313,7 @@ "source": [ "### Multivariate analysis: mvSuSiE and mr.mash (`mnm`)\n", "\n", - "Fine-maps multiple molecular phenotypes jointly across a shared window using mvSuSiE with mr.mash priors. This step needs **multiple contexts** (e.g. the same genes measured under two or more conditions). The toy set ships a single context, so for this MWE we add a **second, synthetic context** (`example/colocboost_ctx2.bed.gz`, derived from context 1 with added noise — *demo data only, not a real measurement*) and a multi-context phenotype manifest. With that in place the `mnm` step now runs end-to-end on the toy data and produces `multicontext_bvsr.rds`, `multicontext_twas_weights.rds`, and `multicontext_data.rds` per gene." + "Fine-maps multiple molecular phenotypes jointly across a shared window using mvSuSiE with mr.mash priors. This step needs **multiple contexts** (e.g. the same genes measured under two or more conditions). The toy set ships a single context, so for this MWE we add a **second, synthetic context** (`example/colocboost_ctx2.bed.gz`, derived from context 1 with added noise \u2014 *demo data only, not a real measurement*) and a multi-context phenotype manifest. With that in place the `mnm` step now runs end-to-end on the toy data and produces `multicontext_bvsr.rds`, `multicontext_twas_weights.rds`, and `multicontext_data.rds` per gene." ] }, { @@ -334,7 +332,7 @@ "# SAMPLE_001..) so genotype and phenotype/covariate samples match.\n", "# - Requires the mr.mashr R package (provided by mr.mash.alpha in this env).\n", "sos run pipeline/mnm_regression.ipynb mnm \\\n", - " --name test_mnm --cwd output_mnm3 \\\n", + " --name protocol_example --cwd output_mnm3 \\\n", " --genoFile input/colocboost/example.chr22.bed \\\n", " --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n", " --covFile input/colocboost/example_covariates.tsv \\\n", @@ -370,7 +368,7 @@ "# On this toy set all regions return a NULL multigene_bvsr (no multi-gene/trans signal),\n", "# which is the correct, honest result for independent single-gene toy regions.\n", "sos run pipeline/mnm_regression.ipynb mnm_genes \\\n", - " --name test_mnmgenes --cwd output_mnmgenes \\\n", + " --name protocol_example --cwd output_mnmgenes \\\n", " --genoFile input/colocboost/example.chr22.bed \\\n", " --phenoFile input/colocboost/pheno_manifest_multicontext.tsv \\\n", " --covFile input/colocboost/example_covariates.tsv \\\n", @@ -403,7 +401,7 @@ "# Functional SuSiE (fsusie) - VERIFIED end-to-end on the toy peak data.\n", "# Run from the toy_example root. Produces *.fsusie_mixture_normal_TI__top_pc_weights.rds.\n", "sos run pipeline/mnm_regression.ipynb fsusie \\\n", - " --name test_fsusie --cwd output/fsusie \\\n", + " --name protocol_example --cwd output/fsusie \\\n", " --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n", " --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n", " --covFile input/covariate/protocol_example.covariates.tsv \\\n", @@ -433,7 +431,7 @@ "# Still under development on the toy dataset (the mvfSuSiE R step errors out).\n", "# Command template:\n", "sos run pipeline/mnm_regression.ipynb mvfsusie \\\n", - " --name test_mvfsusie --cwd output/mvfsusie \\\n", + " --name protocol_example --cwd output/mvfsusie \\\n", " --genoFile input/genotype/protocol_example.genotype.chr22.bed \\\n", " --phenoFile input/phenotype/protocol_example.pheno_manifest.tsv \\\n", " --covFile input/covariate/protocol_example.covariates.tsv \\\n", @@ -452,9 +450,12 @@ "| Step | Problem | Possible Reason | Solution |\n", "| --- | --- | --- | --- |\n", "| susie_twas | A region is skipped / no window found | 4th-column `ID` in the association-windows file does not match the phenotype list `ID` | Make the `ID` columns match exactly. |\n", + "| susie_twas | `susie_twas: pass --region-name ` | No region selected | Pass `--region-name ` (e.g. `ENSG00000130538`) or a `--region-list`. |\n", "| susie_twas | `ERROR: One of the local workers has been killed` | Out-of-memory: too many regions in one process (`-j4`, or all 10 serially) | Use `-j1`; re-run heavy regions one at a time with a single-region phenotype list. |\n", - "| ensemble_twas_weight | `Missing upstream TWAS weights file` | Step 1 was not run for that region | Run `susie_twas` first so the `*.univariate_twas_weights.rds` exists. |\n", - "| both | Sample IDs do not overlap between genotype/covariates and phenotype | Genotype `.fam` IDs differ from the phenotype matrix sample names | Rename one side so they match (e.g. the gene-expression mode renames genotype/covariate IDs to `SAMPLE_001..049`). |" + "| mnm | `Target unavailable: ...qtl_dataset.rds` | `qtl_dataset_construct` not run first | Chain it: `qtl_dataset_construct+susie_twas`, then `mnm`. |\n", + "| mnm | `--phenotype-manifest missing required column(s): cond` | Single-context manifest used | Use the multi-context manifest (`pheno_manifest_multicontext.tsv`). |\n", + "| both | `No shared samples between phenotype and phenotype-covariate` | Covariates are in QTLtools (transposed) layout | Add `--transpose-covariates`. |\n", + "| both | Sample IDs do not overlap between genotype/covariates and phenotype | Genotype `.fam` IDs differ from the phenotype matrix sample names | Rename one side so they match (e.g. the gene-expression mode renames genotype/covariate IDs to `SAMPLE_001..060`). |" ] }, { @@ -1329,4 +1330,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb b/code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb index 49d8a89a..88c18435 100644 --- a/code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb +++ b/code/SoS/mnm_analysis/mnm_methods/rss_analysis.ipynb @@ -2,22 +2,24 @@ "cells": [ { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "# RSS Fine-mapping with GWAS Summary Statistics\n", "\n", - "End-to-end SuSiE-RSS fine-mapping over GWAS summary statistics, per study \u00d7 per LD block, using\n", + "End-to-end SuSiE-RSS fine-mapping over GWAS summary statistics, per study × per LD block, using\n", "the current pecotmr S4 API (`GwasSumStats` / `summaryStatsQc()` / `fineMappingPipeline()`). Four SoS\n", "steps:\n", "\n", - "1. **`[generate_manifest]`** \u2014 `gwas_rss_manifest.R` resolves `--gwas-meta` + `--gwas-tsv-list` against\n", - " `--region-list` + `--regions` into one manifest TSV (one row per study \u00d7 region).\n", - "2. **`[generate_gwas_sumstats]`** \u2014 `gwas_sumstats_construct.R` builds one `GwasSumStats` RDS per\n", - " study \u00d7 LD block, running `summaryStatsQc()` (optional SLALOM/DENTIST QC + RAISS imputation;\n", + "1. **`[generate_manifest]`** — `gwas_rss_manifest.R` resolves `--gwas-meta` + `--gwas-tsv-list` against\n", + " `--region-list` + `--regions` into one manifest TSV (one row per study × region).\n", + "2. **`[generate_gwas_sumstats]`** — `gwas_sumstats_construct.R` builds one `GwasSumStats` RDS per\n", + " study × LD block, running `summaryStatsQc()` (optional SLALOM/DENTIST QC + RAISS imputation;\n", " harmonization re-keys variants to the LD-panel id convention).\n", - "3. **`[gwas_fine_mapping]`** \u2014 `fine_mapping.R` dispatches to\n", + "3. **`[gwas_fine_mapping]`** — `fine_mapping.R` dispatches to\n", " `pecotmr::fineMappingPipeline(gwasSumStats, methods = ...)` (SuSiE-RSS).\n", - "4. **`[gwas_rss_plot]`** (opt-out via `--no-plot`) \u2014 `gwas_rss_plot.R` renders a PIP plot.\n", + "4. **`[gwas_rss_plot]`** (opt-out via `--no-plot`) — `gwas_rss_plot.R` renders a PIP plot.\n", "\n", "This replaces the legacy two-step workflow (`get_analysis_regions` + `univariate_rss`, which called the\n", "removed `rss_analysis_pipeline()`). The LD reference must be **genotype-backed** (PLINK2/PLINK1/GDS/VCF);\n", @@ -25,59 +27,83 @@ "\n", "## Inputs\n", "\n", - "- `--gwas-meta` \u2014 TSV with columns `study_id`, `path` (relative to the meta file or absolute), and\n", + "- `--gwas-meta` — TSV with columns `study_id`, `path` (relative to the meta file or absolute), and\n", " optional `column_mapping` (per-study YAML). One row per study.\n", - "- `--gwas-tsv-list S1=path1 ...` \u2014 explicit STUDY=PATH pairs (alternative to `--gwas-meta`; no\n", + "- `--gwas-tsv-list S1=path1 ...` — explicit STUDY=PATH pairs (alternative to `--gwas-meta`; no\n", " per-study column mapping).\n", - "- `--region-list` \u2014 BED-like TSV of LD blocks; and/or `--regions chr:start-end ...`.\n", - "- `--ld-meta` \u2014 LD-reference meta TSV (`#chr`, `start`, `end`, `path` -> genotype prefix). One LD block\n", + "- `--region-list` — BED-like TSV of LD blocks; and/or `--regions chr:start-end ...`.\n", + "- `--ld-meta` — LD-reference meta TSV (`#chr`, `start`, `end`, `path` -> genotype prefix). One LD block\n", " per row (or one row with `start=end=0` for a whole chromosome).\n", "\n", "## QC knobs (forwarded to `gwas_sumstats_construct.R` -> `summaryStatsQc()`)\n", "\n", - "- `--qc-method` \u2014 `none` (default), `slalom`, or `dentist`.\n", - "- `--impute` \u2014 enable RAISS sumstat imputation.\n", - "- `--qc-args '{\"key\":value,...}'` \u2014 JSON passthrough for any other `summaryStatsQc()` kwarg (e.g.\n", + "- `--qc-method` — `none` (default), `slalom`, or `dentist`.\n", + "- `--impute` — enable RAISS sumstat imputation.\n", + "- `--qc-args '{\"key\":value,...}'` — JSON passthrough for any other `summaryStatsQc()` kwarg (e.g.\n", " `{\"mafCutoff\":0.01,\"nCutoff\":0,\"alleleFlipKriging\":true}`).\n", "\n", "## Fine-mapping knobs (forwarded to `fine_mapping.R`)\n", "\n", "- `--methods` (default `susie`), `--coverage` (0.95), `--secondary-coverage` (`0.7,0.5`).\n", "- `--min-abs-corr` (0.8), `--median-abs-corr` (off; OR-logic purity), `--pip-cutoff` (0.025).\n", - "- `--L` (20) / `--L-greedy` (5) \u2014 SuSiE fit defaults; `--method-args '{\"susie\":{...}}'` overrides per key.\n", + "- `--L` (20) / `--L-greedy` (5) — SuSiE fit defaults; `--method-args '{\"susie\":{...}}'` overrides per key.\n", "\n", "## Outputs\n", "\n", "- `{cwd}/{manifest_name}.manifest.tsv`\n", "- `{cwd}/sumstats/{study}.{chrN_S_E}.gwas_sumstats.rds`\n", - "- `{cwd}/fine_mapping/{study}.{chrN_S_E}.gwas_finemap.rds` \u2014 one `GwasFineMappingResult` per (study, block).\n", + "- `{cwd}/fine_mapping/{study}.{chrN_S_E}.gwas_finemap.rds` — one `GwasFineMappingResult` per (study, block).\n", "- `{cwd}/plots/{study}.{chrN_S_E}.pip_plot.png` (unless `--no-plot`).\n" ] }, { "cell_type": "markdown", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "## Minimal working example\n", "\n", - "Genotype-LD MWE (AD_Bellenguez chr22 toy data under `input/`). There is no single combined workflow\n", - "name \u2014 chain the four steps with `+`:\n", - "\n", - "```bash\n", + "#### Genotype-LD MWE (AD_Bellenguez chr22 toy data under `input/`). There is no single combined workflow" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "kernel": "SoS" + }, + "outputs": [], + "source": [ + "cd /restricted/projectnb/xqtl/jaempawi/xqtl-protocol\n", "sos run pipeline/rss_analysis.ipynb \\\n", " generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \\\n", " --cwd output/rss_analysis --modular-script-dir code/script \\\n", " --gwas-meta input/rss_analysis/protocol_example.rss_mwe.gwas_meta.tsv \\\n", " --regions chr22:49355984-50799822 \\\n", - " --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv\n", - "```\n", - "\n", - "This fine-maps one study \u00d7 one LD block and writes the manifest, the QC'd `GwasSumStats`, the\n", + " --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "kernel": "SoS" + }, + "source": [ + "This fine-maps one study × one LD block and writes the manifest, the QC'd `GwasSumStats`, the\n", "`GwasFineMappingResult`, and a PIP plot under `output/rss_analysis/`.\n", "\n", - "With SLALOM QC + RAISS imputation + per-method tuning:\n", - "\n", - "```bash\n", + "#### With SLALOM QC + RAISS imputation + per-method tuning:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "kernel": "SoS" + }, + "outputs": [], + "source": [ "sos run pipeline/rss_analysis.ipynb \\\n", " generate_manifest+generate_gwas_sumstats+gwas_fine_mapping+gwas_rss_plot \\\n", " --cwd output/rss_analysis --modular-script-dir code/script \\\n", @@ -85,8 +111,16 @@ " --regions chr22:49355984-50799822 \\\n", " --ld-meta input/ld_reference/protocol_example.ld_meta_file.tsv \\\n", " --qc-method slalom --impute --qc-args '{\"mafCutoff\":0.01}' \\\n", - " --min-abs-corr 0.5 --method-args '{\"susie\":{\"L\":10}}'\n", - "```\n" + " --min-abs-corr 0.5 --method-args '{\"susie\":{\"L\":10}}'" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "kernel": "SoS" + }, + "source": [ + "### Command interface" ] }, { @@ -247,20 +281,22 @@ "file_extension": ".sos", "mimetype": "text/x-sos", "name": "sos", - "pygments_lexer": "python", - "sos": { - "kernels": [ - [ - "SoS", - "sos", - "", - "" - ] - ], - "version": "0.22.4" - } + "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", + "pygments_lexer": "sos" + }, + "sos": { + "kernels": [ + [ + "SoS", + "sos", + "sos", + "", + "" + ] + ], + "version": "" } }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +}