Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions code/SoS/pecotmr_integration/colocboost.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
"kernel": "SoS"
},
"source": [
"# Per-gene colocboost across contexts in a single study\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::colocboostPipeline()` per focal gene. Each task colocalizes the focal gene's signal across the QtlDataset's contexts (the within-study, cross-context case \u2014 no GWAS input). Gene-level parallelization is via SoS task fan-out: every task loads the same RDS and runs one focal gene.\n\n## Inputs\n\n- `--qtl-dataset` — path to the QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes` — explicit gene-ID list (one focal gene per task).\n- `--cis-window` — bp window around each focal gene's TSS for variant selection. Default 1,000,000.\n\n## Output\n\n- `{cwd}/{study}.{gene}.colocboost.rds` per gene.\n"
"# Per-gene / per-region colocboost across contexts (and optionally GWAS) in a single study\n\n## Description\n\nFor a single study's `QtlDataset` RDS, run `pecotmr::colocboostPipeline()` per fan-out unit (gene or region). Three pipeline variants can be enabled independently:\n\n- `--xqtl-coloc` (default ON; opt out with `--no-xqtl-coloc=true` or `xqtl_coloc=False` in SoS): within-study cross-context QTL coloc.\n- `--joint-gwas` (default OFF): xQTL + GWAS joint coloc with the focal outcome from the GWAS. Requires `--gwas-sumstats`.\n- `--separate-gwas` (default OFF): one focal-GWAS coloc per GWAS study. Requires `--gwas-sumstats`.\n\nWhen `--gwas-sumstats` is supplied it must point at a QC'd `GwasSumStats` RDS (output of `summaryStatsQc()` upstream).\n\n## Inputs\n\n- `--qtl-dataset` — QtlDataset RDS produced by `qtl_dataset.ipynb`.\n- `--genes ID1 ID2 \u2026` **or** `--regions chr:start-end \u2026` (mutually exclusive).\n- `--cis-window` — bp window (gene mode default 1e6).\n- `--gwas-sumstats` — QC'd GwasSumStats RDS path (required for joint/separate variants).\n- `--xqtl-coloc` / `--joint-gwas` / `--separate-gwas` — pipeline-variant toggles.\n- `--modular-script-dir` — directory containing the per-gene worker R scripts. Default `code/script`.\n\n## Output\n\n- `{cwd}/{study}.{gene|region}.colocboost.rds` per fan-out unit. The RDS holds the colocboostPipeline list with slots `xqtl_coloc`, `joint_gwas`, `separate_gwas`, `computing_time`; unused-variant slots are NULL.\n"
]
},
{
Expand All @@ -15,7 +15,7 @@
"kernel": "SoS"
},
"source": [
"## Example\n\n```bash\nsos run pipeline/colocboost.ipynb colocboost \\\n --cwd output \\\n --study ROSMAP_DLPFC \\\n --qtl-dataset output/ROSMAP_DLPFC.qtl_dataset.rds \\\n --genes ENSG00000060237 ENSG00000234593 \\\n --cis-window 1000000\n```\n"
"## Example\n\n```bash\n# xQTL-only (default)\nsos run pipeline/colocboost.ipynb colocboost \\\n --cwd output --modular-script-dir /path/to/code/script \\\n --study TEST_STUDY \\\n --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n --genes ENSG00000060237\n\n# xQTL + joint-GWAS\nsos run pipeline/colocboost.ipynb colocboost \\\n --cwd output --modular-script-dir /path/to/code/script \\\n --study TEST_STUDY \\\n --qtl-dataset output/TEST_STUDY.qtl_dataset.rds \\\n --gwas-sumstats output/GWAS.qcd.rds \\\n --joint-gwas True \\\n --genes ENSG00000060237\n```\n"
]
},
{
Expand All @@ -26,7 +26,7 @@
},
"outputs": [],
"source": [
"[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: genes = []\nparameter: cis_window = 1000000\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '1h'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
"[global]\nparameter: cwd = path('output')\nparameter: study = str\nparameter: qtl_dataset = path\nparameter: modular_script_dir = path('code/script')\n# Mutually exclusive fan-out sources.\nparameter: genes = []\nparameter: regions = []\nparameter: cis_window = 1000000\n# GWAS-side input: a QC'd GwasSumStats RDS (build it upstream).\nparameter: gwas_sumstats = path('.')\n# Pipeline variants.\nparameter: xqtl_coloc = True # within-study cross-context (default ON)\nparameter: joint_gwas = False # xQTL + GWAS joint (requires gwas_sumstats)\nparameter: separate_gwas = False # per-GWAS-study separate-focal (requires gwas_sumstats)\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '30m'\nparameter: mem = '8G'\nparameter: numThreads = 1\n"
]
},
{
Expand All @@ -37,7 +37,7 @@
},
"outputs": [],
"source": [
"[colocboost]\ninput: qtl_dataset, for_each = 'genes'\noutput: f\"{cwd}/{study}.{_genes}.colocboost.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n Rscript ${path(\"code/script/pecotmr_integration/colocboost.R\", absolute=True)} \\\n --qtl-dataset ${_input} \\\n --gene-id ${_genes} \\\n --cis-window ${cis_window} \\\n --output ${_output}\n"
"[colocboost]\nif bool(genes) == bool(regions):\n raise ValueError(\"Specify exactly one of --genes (trait IDs) or --regions (chr:start-end strings).\")\nif (joint_gwas or separate_gwas) and not gwas_sumstats.is_file():\n raise ValueError(\"--joint-gwas / --separate-gwas require --gwas-sumstats pointing at a QC'd GwasSumStats RDS.\")\nif not (xqtl_coloc or joint_gwas or separate_gwas):\n raise ValueError(\"Enable at least one of xqtl_coloc, joint_gwas, separate_gwas.\")\nfanout_items = genes if genes else regions\nfanout_kind = 'gene' if genes else 'region'\ndef _fanout_safe(s):\n return s.replace(':', '_').replace('-', '_')\ninput: qtl_dataset, for_each = 'fanout_items'\noutput: f\"{cwd}/{study}.{_fanout_safe(_fanout_items)}.colocboost.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n Rscript ${modular_script_dir}/pecotmr_integration/colocboost.R \\\n --qtl-dataset ${_input} \\\n ${('--gene-id ' + _fanout_items + ' --cis-window ' + str(cis_window)) if fanout_kind == 'gene' else ('--region ' + _fanout_items)} \\\n ${'--gwas-sumstats ' + str(gwas_sumstats) if gwas_sumstats.is_file() else ''} \\\n ${'--no-xqtl-coloc' if not xqtl_coloc else ''} \\\n ${'--joint-gwas' if joint_gwas else ''} \\\n ${'--separate-gwas' if separate_gwas else ''} \\\n --output ${_output}\n"
]
}
],
Expand Down
102 changes: 102 additions & 0 deletions code/SoS/pecotmr_integration/ctwas.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"# cTWAS via the three-step pecotmr API\n\n## Description\n\nDrives a multi-LD-block cTWAS analysis through `pecotmr`'s split pipeline:\n\n- `[ctwas_1]` — `assembleCtwasInputs`: consume the per-region manifest, load each block's `GwasSumStats` + per-gene `TwasWeights` RDS files, produce the assembled ctwas inputs (`z_snp`, `weights`, `region_info`, `snp_map`, `LD_map`, LD/snpInfo loader closures).\n- `[ctwas_2]` — `estCtwasParam`: estimate the joint group prior + prior variance (`ctwas::est_param`). `--fallback-to-prefit` recovers from accurate-EM NaN divergence by re-running the prefit step only (mirrors the legacy ctwas_2 workaround).\n- `[ctwas_3]` — `screenCtwasRegions` + `finemapCtwasRegions`: screen regions and fine-map causal genes; optional `--param-override` RDS lets the caller substitute hand-tuned priors before screening.\n\nThe three steps chain via intermediate RDS files (`{name}.ctwas_inputs.rds`, `{name}.ctwas_est.rds`, `{name}.ctwas_finemap.rds`) so individual stages can be re-run independently.\n\n## Inputs\n\n- `--manifest` — per-region manifest TSV with columns:\n - `region_id` (unique string)\n - `gwas_sumstats_rds` (path to per-block `GwasSumStats` RDS from `gwas_sumstats_construct.R`)\n - `twas_weights_rds` (comma-separated per-gene `TwasWeights` RDS paths for that block; may be empty for SNP-only blocks)\n - `fine_mapping_result_rds` (optional, comma-separated)\n- Optional weight pre-filters (`--twas-weight-cutoff`, `--cs-min-cor`, `--min-pip-cutoff`, `--max-num-variants`).\n- Optional `--twas-z` precomputed TWAS-Z `GRanges` RDS (output of `twas.ipynb`).\n- Step 2 knobs: `--thin`, `--niter-prefit`, `--niter`, `--min-group-size`, `--min-p-single-effect`, `--fallback-to-prefit`.\n- Step 3 knobs: `--L`, `--no-filter-L`, `--min-nonsnp-pip`, `--param-override`.\n\n## Output\n\n- `{cwd}/{name}.ctwas_inputs.rds` (step 1)\n- `{cwd}/{name}.ctwas_est.rds` (step 2)\n- `{cwd}/{name}.ctwas_finemap.rds` (step 3; ctwas_sumstats-shape result)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"kernel": "SoS"
},
"source": [
"## Example\n\n```bash\nsos run pipeline/ctwas.ipynb ctwas_3 \\\n --cwd output --modular-script-dir /path/to/code/script \\\n --name protocol_example_twas_chr22 \\\n --manifest /tmp/gwas_smoke/ctwas_manifest_blessed.tsv \\\n --fallback-to-prefit \\\n --min-group-size 1\n```\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"[global]\nparameter: cwd = path('output')\nparameter: name = str\nparameter: manifest = path\n# --- step 1 (assemble) -----------------------------------------------\nparameter: method = ''\nparameter: twas_z = path('.')\nparameter: twas_weight_cutoff = 0.0\nparameter: cs_min_cor = 0.8\nparameter: min_pip_cutoff = 0.0\nparameter: max_num_variants = 'Inf'\n# --- step 2 (est_param) ----------------------------------------------\nparameter: thin = 0.1\nparameter: niter_prefit = 3\nparameter: niter = 30\nparameter: group_prior_var_structure = 'shared_type'\nparameter: min_group_size = 100\nparameter: min_p_single_effect = 0.8\nparameter: fallback_to_prefit = False\n# --- step 3 (screen + finemap) ---------------------------------------\nparameter: L = 5\nparameter: no_filter_L = False\nparameter: min_nonsnp_pip = 0.5\nparameter: param_override = path('.')\n# --- infrastructure --------------------------------------------------\nparameter: modular_script_dir = path('code/script')\nparameter: container = ''\nparameter: job_size = 1\nparameter: walltime = '1h'\nparameter: mem = '16G'\nparameter: numThreads = 1\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"[ctwas_1 (assemble inputs)]\ninput: manifest\noutput: f\"{cwd}/{name}.ctwas_inputs.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n Rscript ${modular_script_dir}/pecotmr_integration/ctwas_assemble.R \\\n --manifest ${_input} \\\n ${'--method ' + method if method else ''} \\\n ${'--twas-z ' + str(twas_z) if twas_z.is_file() else ''} \\\n --twas-weight-cutoff ${twas_weight_cutoff} \\\n --cs-min-cor ${cs_min_cor} \\\n --min-pip-cutoff ${min_pip_cutoff} \\\n --max-num-variants ${max_num_variants} \\\n --output ${_output}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"[ctwas_2 (estimate priors)]\noutput: f\"{cwd}/{name}.ctwas_est.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n Rscript ${modular_script_dir}/pecotmr_integration/ctwas_est.R \\\n --inputs ${_input} \\\n --thin ${thin} \\\n --niter-prefit ${niter_prefit} \\\n --niter ${niter} \\\n --group-prior-var-structure ${group_prior_var_structure} \\\n --min-group-size ${min_group_size} \\\n --min-p-single-effect ${min_p_single_effect} \\\n ${'--fallback-to-prefit' if fallback_to_prefit else ''} \\\n --ncore ${numThreads} \\\n --output ${_output}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"kernel": "SoS"
},
"outputs": [],
"source": [
"[ctwas_3 (screen + finemap)]\noutput: f\"{cwd}/{name}.ctwas_finemap.rds\"\ntask: trunk_workers = 1, trunk_size = job_size, walltime = walltime, mem = mem, cores = numThreads, tags = f\"{step_name}_{_output:bn}\"\nbash: expand = '${ }', stderr = f\"{_output}.stderr\", stdout = f\"{_output}.stdout\", container = container\n Rscript ${modular_script_dir}/pecotmr_integration/ctwas_finemap.R \\\n --est ${_input} \\\n ${'--param-override ' + str(param_override) if param_override.is_file() else ''} \\\n --L ${L} \\\n ${'--no-filter-L' if no_filter_L else ''} \\\n --min-nonsnp-pip ${min_nonsnp_pip} \\\n --ncore ${numThreads} \\\n --output ${_output}\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "SoS",
"language": "sos",
"name": "sos"
},
"language_info": {
"codemirror_mode": "sos",
"file_extension": ".sos",
"mimetype": "text/x-sos",
"name": "sos",
"nbconvert_exporter": "sos_notebook.converter.SoS_Exporter",
"pygments_lexer": "sos"
},
"sos": {
"kernels": [
[
"Bash",
"bash",
"Bash",
"#E6EEFF",
"shell"
],
[
"SoS",
"sos",
"",
"",
"sos"
]
],
"version": "0.24.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Loading