Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 47 additions & 47 deletions code/SoS/enrichment/sldsc_enrichment.ipynb

Large diffs are not rendered by default.

44 changes: 22 additions & 22 deletions code/SoS/mnm_analysis/multivariate_fine_mapping_vignette.ipynb

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@
"id": "c6344399",
"metadata": {},
"source": [
"- `{region_name}.{chr}_{region_id}.multigene_bvsr.rds` one RDS per region, the fitted multivariate (mvSuSiE / mr.mash) Bayesian variable-selection model jointly fine-mapping all genes in that region.\n",
"- `{region_name}.{chr}_{region_id}.multigene_bvsr.rds` \u2014 one RDS per region, the fitted multivariate (mvSuSiE / mr.mash) Bayesian variable-selection model jointly fine-mapping all genes in that region.\n",
"\n",
"Inspecting the object shows the per-region fit (posterior effect estimates, PVE, credible sets and convergence trace):\n",
"\n",
Expand Down Expand Up @@ -193,27 +193,20 @@
"metadata": {},
"outputs": [],
"source": [
"sos run pipeline/mnm_regression.ipynb mnm_genes \\\n",
" --name ROSMAP_Ast_mega_eQTL \\\n",
" --genoFile data/mnm_genes/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.11.bed \\\n",
" --phenoFile data/mnm_genes/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.region_list.txt \\\n",
" --covFile data/mnm_genes/snuc_pseudo_bulk.Ast.mega.normalized.log2cpm.rosmap_cov.ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.snuc_pseudo_bulk_mega.related.plink_qc.extracted.pca.projected.Marchenko_PC.gz \\\n",
" --customized-association-windows data/mnm_genes/extended_TADB.bed \\\n",
" --phenotype-names Ast_mega_eQTL \\\n",
" --max-cv-variants 5000 \\\n",
" --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \\\n",
" --independent_variant_list data/mnm_genes/ld_pruned_variants.txt.gz \\\n",
" --fine_mapping_meta data/mnm_genes/combined_data_updated.tsv \\\n",
" --phenoIDFile data/mnm_genes/phenoIDFile_extended_TADB.bed \\\n",
" --region-name chr11_77324757_82556425 \\\n",
" --skip-analysis-pip-cutoff 0 \\\n",
" --maf 0.01 \\\n",
" --coverage 0.95 \\\n",
" --pheno_id_map_file data/mnm_genes/pheno_id_map_file.txt \\\n",
" --prior-canonical-matrices \\\n",
" --twas-cv-folds 0 \\\n",
" --trans-analysis \\\n",
" --cwd output/mnm_regression/mnm_genes -s build"
"# Step 1: build the QtlDataset RDS for the joint-gene region.\n",
"sos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n",
" --cwd output/qtl_dataset \\\n",
" --study ROSMAP_Ast_mega_eQTL \\\n",
" --genotype-prefix data/mnm_genes/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.11 \\\n",
" --phenotype-manifest data/mnm_genes/snuc_pseudo_bulk.Ast.mega.pheno_manifest.tsv\n",
"\n",
"# Step 2: per-region mvSuSiE joint fine-mapping. Use a region span that\n",
"# covers multiple genes (multi-trait joint mvSuSiE).\n",
"sos run pipeline/multivariate_fine_mapping.ipynb multivariate_fine_mapping \\\n",
" --cwd output/mnm_genes \\\n",
" --study ROSMAP_Ast_mega_eQTL \\\n",
" --qtl-dataset output/qtl_dataset/ROSMAP_Ast_mega_eQTL.qtl_dataset.rds \\\n",
" --regions chr11:65000000-67000000"
]
},
{
Expand Down Expand Up @@ -286,4 +279,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
18 changes: 9 additions & 9 deletions code/SoS/mnm_analysis/summary_stats_finemapping_vignette.ipynb

Large diffs are not rendered by default.

34 changes: 18 additions & 16 deletions code/SoS/mnm_analysis/univariate_fine_mapping_fsusie_vignette.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -73,8 +73,8 @@
"id": "436a39bb-b182-49c0-9ec9-c4b899c8b684",
"metadata": {},
"source": [
"- `{name}.chr{N}_{start}_{end}.{n}_marks.dataset.rds` the per-region functional dataset (the epigenomic marks, residualized phenotype and genotype matrices) prepared for fSuSiE.\n",
"- `{name}.chr{N}_..._top_pc_weights.rds` the fitted fSuSiE result with top-PC TWAS weights, fine-mapping the functional (epigenomic) signal in that region.\n",
"- `{name}.chr{N}_{start}_{end}.{n}_marks.dataset.rds` \u2014 the per-region functional dataset (the epigenomic marks, residualized phenotype and genotype matrices) prepared for fSuSiE.\n",
"- `{name}.chr{N}_..._top_pc_weights.rds` \u2014 the fitted fSuSiE result with top-PC TWAS weights, fine-mapping the functional (epigenomic) signal in that region.\n",
"\n",
"Inspecting either object shows its internal structure:\n",
"```\n",
Expand Down Expand Up @@ -113,7 +113,7 @@
" .. .. ..- attr(*, \"names\")= chr [1:15857] \"chr7:139302775:AACACACACAC:AACACACACACAC\" \"chr7:139302775:AACACACACAC:AACACACACACACAC\" \"chr7:139304706:G:GGT\" \"chr7:139305695:G:T\" ...\n",
" ..$ grange : chr [1:2] \"139293693\" \"145380632\"\n",
" ..$ Y_coordinates :List of 1\n",
" .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':\t166 obs. of 3 variables:\n",
" .. ..$ :Classes \u2018tbl_df\u2019, \u2018tbl\u2019 and 'data.frame':\t166 obs. of 3 variables:\n",
" .. .. ..$ #chr : chr [1:166] \"chr7\" \"chr7\" \"chr7\" \"chr7\" ...\n",
" .. .. ..$ start: num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...\n",
" .. .. ..$ end : num [1:166] 1.39e+08 1.39e+08 1.39e+08 1.39e+08 1.39e+08 ...\n",
Expand Down Expand Up @@ -146,7 +146,7 @@
" .. .. ..- attr(*, \"names\")= chr [1:5] \"user.self\" \"sys.self\" \"elapsed\" \"user.child\" ...\n",
" .. ..$ fsusie_result :List of 34\n",
" .. .. ..- attr(*, \"class\")= chr \"susiF\"\n",
" .. ..$ Y_coordinates :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':\t166 obs. of 3 variables:\n",
" .. ..$ Y_coordinates :Classes \u2018tbl_df\u2019, \u2018tbl\u2019 and 'data.frame':\t166 obs. of 3 variables:\n",
" .. ..$ fsusie_summary :List of 5\n",
" .. ..$ region_info :List of 3\n",
"```"
Expand Down Expand Up @@ -175,17 +175,19 @@
"metadata": {},
"outputs": [],
"source": [
"sos run pipeline/mnm_regression.ipynb fsusie \\\n",
" --cwd output/fsusie/ \\\n",
" --name Mic \\\n",
" --genoFile data/fsusie/mwe.genotype_by_chrom_files.txt \\\n",
" --phenoFile data/fsusie/mwe.pheno.region_list \\\n",
" --covFile data/fsusie/mwe.chr7_139293693_145380632.Marchenko_PC.anon.gz \\\n",
" --cis-window 0 --max-cv-variants 5000 \\\n",
" --susie_top_pc 0 --phenotype-names ROSMAP_Mic_snATACQTL --maf 0.01 \\\n",
" --save-data \\\n",
" --numThreads 8 \\\n",
" --post_processing \"none\" --small-sample-correction "
"# Step 1: build the QtlDataset RDS for the functional phenotype.\n",
"sos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n",
" --cwd output/qtl_dataset \\\n",
" --study Mic \\\n",
" --genotype-prefix data/fsusie/mwe.chr7 \\\n",
" --phenotype-manifest data/fsusie/mwe.pheno_manifest.tsv\n",
"\n",
"# Step 2: per-region fSuSiE joint fine-mapping over the bins in the LD block.\n",
"sos run pipeline/functional_fine_mapping.ipynb functional_fine_mapping \\\n",
" --cwd output/fsusie \\\n",
" --study Mic \\\n",
" --qtl-dataset output/qtl_dataset/Mic.qtl_dataset.rds \\\n",
" --regions chr7:139293693-145380632"
]
},
{
Expand Down Expand Up @@ -256,4 +258,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
122 changes: 71 additions & 51 deletions code/SoS/mnm_analysis/univariate_fine_mapping_twas_vignette.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
"\n",
"This pipeline performs per-region univariate fine-mapping with SuSiE and computes TWAS\n",
"weights, via the `susie_twas` workflow of `pipeline/mnm_regression.ipynb`. For each\n",
"region the workflow fits SuSiE on the covariate-residualized phenotype × genotype\n",
"region the workflow fits SuSiE on the covariate-residualized phenotype \u00d7 genotype\n",
"matrix and then cross-validates five prediction models (**enet**, **lasso**,\n",
"**bayes_r**, **mrash**, **susie**) to pick TWAS weights.\n",
"\n",
Expand All @@ -32,10 +32,10 @@
"`start/end`). One `susie_twas` call fits one SuSiE + TWAS model per region.\n",
"\n",
"* **Gene-based phenotypes.** Manifest row = one gene; window = gene's containing TAD (or\n",
" ± cis-window around the TSS).\n",
" \u00b1 cis-window around the TSS).\n",
"* **Peak-based phenotypes (caQTL, mQTL).** Manifest row = one peak/CpG; window = the\n",
" peak's containing [extended-TAD](./../reference_data/generalized_TADB.html). A peak is\n",
" 2–5 kb and cannot serve as its own window.\n"
" 2\u20135 kb and cannot serve as its own window.\n"
]
},
{
Expand All @@ -53,9 +53,9 @@
"\n",
"The pipeline expects two derived files per experiment:\n",
"\n",
"* `pheno_manifest.tsv` 5 columns (`#chr start end ID path`), one row per region.\n",
"* `pheno_manifest.tsv` \u2014 5 columns (`#chr start end ID path`), one row per region.\n",
" `path` points to a bgzipped+tabixed phenotype BED restricted to that region.\n",
"* `association_windows.bed` 4 columns (`chr start end ID`), one row per region.\n",
"* `association_windows.bed` \u2014 4 columns (`chr start end ID`), one row per region.\n",
" `start/end` define the cis search window; `ID` must match a row in the manifest.\n",
"\n",
"For gene-based phenotypes these files are produced by the standard phenotype\n",
Expand Down Expand Up @@ -99,18 +99,27 @@
"metadata": {},
"outputs": [],
"source": [
"sos run pipeline/mnm_regression.ipynb susie_twas \\\n",
" --name test_susie_twas \\\n",
" --cwd output/mnm_regression/susie_twas \\\n",
" --genoFile output/genotype_by_chrom/wgs.merged.plink_qc.1.bed \\\n",
" --phenoFile output/phenotype/phenotype_by_chrom_for_cis/bulk_rnaseq.phenotype_by_chrom_files.region_list.txt \\\n",
" --covFile output/covariate/bulk_rnaseq_tmp_matrix.low_expression_filtered.outlier_removed.tmm.expression.covariates.wgs.merged.plink_qc.plink_qc.prune.pca.Marchenko_PC.gz \\\n",
" --customized-association-windows reference_data/TAD/TADB_enhanced_cis.bed \\\n",
" --phenotype-names test_pheno \\\n",
" --max-cv-variants 5000 \\\n",
" --ld_reference_meta_file data/ld_meta_file_with_bim.tsv \\\n",
" --region-name ENSG00000049246 ENSG00000054116 ENSG00000116678 \\\n",
" --save-data\n"
"# Step 1: build the QtlDataset RDS.\n",
"sos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n",
" --cwd output/qtl_dataset \\\n",
" --study test_susie_twas \\\n",
" --genotype-prefix output/genotype_by_chrom/wgs.merged.plink_qc.chr1 \\\n",
" --phenotype-manifest data/test_susie_twas.pheno_manifest.tsv\n",
"\n",
"# Step 2: per-region univariate SuSiE fine-mapping.\n",
"sos run pipeline/fine_mapping.ipynb fine_mapping \\\n",
" --cwd output/fine_mapping \\\n",
" --study test_susie_twas \\\n",
" --qtl-dataset output/qtl_dataset/test_susie_twas.qtl_dataset.rds \\\n",
" --regions chr1:1000000-2000000\n",
"\n",
"# Step 3: TWAS weights (default per-method panel + SR-TWAS ensemble inline).\n",
"sos run pipeline/twas_weights.ipynb twas_weights \\\n",
" --cwd output/twas_weights \\\n",
" --study test_susie_twas \\\n",
" --qtl-dataset output/qtl_dataset/test_susie_twas.qtl_dataset.rds \\\n",
" --regions chr1:1000000-2000000 \\\n",
" --fine-mapping-result output/fine_mapping/test_susie_twas.chr1_1000000_2000000.finemap.rds"
]
},
{
Expand All @@ -126,16 +135,24 @@
"metadata": {},
"outputs": [],
"source": [
"sos run pipeline/mnm_regression.ipynb susie_twas \\\n",
" --name example_10peaks_twas \\\n",
" --cwd output/susie_twas_out \\\n",
" --genoFile input/example.chr22.bed \\\n",
" --phenoFile output/inputs_ready/pheno_manifest.tsv \\\n",
" --covFile input/example_covariates.tsv \\\n",
" --customized-association-windows output/inputs_ready/association_windows.bed \\\n",
" --phenotype-names caQTL \\\n",
" --max-cv-variants 5000 \\\n",
" --save-data\n"
"# Smaller example: same chain, different data.\n",
"sos run pipeline/qtl_dataset.ipynb qtl_dataset_construct \\\n",
" --cwd output/qtl_dataset \\\n",
" --study example_10peaks_twas \\\n",
" --genotype-prefix input/example.chr22 \\\n",
" --phenotype-manifest output/inputs_ready/pheno_manifest.tsv\n",
"\n",
"sos run pipeline/fine_mapping.ipynb fine_mapping \\\n",
" --cwd output/fine_mapping \\\n",
" --study example_10peaks_twas \\\n",
" --qtl-dataset output/qtl_dataset/example_10peaks_twas.qtl_dataset.rds \\\n",
" --regions chr22:15000000-17000000\n",
"\n",
"sos run pipeline/twas_weights.ipynb twas_weights \\\n",
" --cwd output/twas_weights \\\n",
" --study example_10peaks_twas \\\n",
" --qtl-dataset output/qtl_dataset/example_10peaks_twas.qtl_dataset.rds \\\n",
" --regions chr22:15000000-17000000"
]
},
{
Expand All @@ -155,7 +172,7 @@
"...\n",
"INFO: susie_twas output: <cwd>/fine_mapping/<name>.<chrom>_<ID>.univariate_bvsr.rds\n",
" <cwd>/twas_weights/<name>.<chrom>_<ID>.univariate_twas_weights.rds\n",
" ... (2 × N_regions items)\n",
" ... (2 \u00d7 N_regions items)\n",
"INFO: Workflow susie_twas is executed successfully.\n",
"```\n"
]
Expand All @@ -173,7 +190,8 @@
"metadata": {},
"outputs": [],
"source": [
"sos run pipeline/mnm_regression.ipynb susie_twas -h\n"
"sos run pipeline/fine_mapping.ipynb -h\n",
"sos run pipeline/twas_weights.ipynb -h"
]
},
{
Expand All @@ -182,29 +200,29 @@
"source": [
"Flags consumed by `susie_twas`:\n",
"\n",
"* `--genoFile` path to a PLINK1 `.bed` file (pass the `.bed` path, not the prefix).\n",
"* `--genoFile` \u2014 path to a PLINK1 `.bed` file (pass the `.bed` path, not the prefix).\n",
" FID/IID in the `.fam` must match the phenotype/covariate sample IDs.\n",
"* `--phenoFile` 5-column `#chr start end ID path` manifest. One row per region; `path`\n",
"* `--phenoFile` \u2014 5-column `#chr start end ID path` manifest. One row per region; `path`\n",
" points at a bgzipped+tabixed phenotype BED restricted to that region. Pass multiple\n",
" files for multi-condition analyses.\n",
"* `--covFile` tab-delimited, literal `ID` header in column 1; covariate names in rows,\n",
"* `--covFile` \u2014 tab-delimited, literal `ID` header in column 1; covariate names in rows,\n",
" sample IDs across the header row.\n",
"* `--customized-association-windows` 4-column `chr start end ID` BED. Required unless\n",
"* `--customized-association-windows` \u2014 4-column `chr start end ID` BED. Required unless\n",
" `--cis-window` is non-negative.\n",
"* `--cis-window` symmetric cis radius (bp) around each region's `start/end`. Default\n",
"* `--cis-window` \u2014 symmetric cis radius (bp) around each region's `start/end`. Default\n",
" `-1` (forces the customized-windows path).\n",
"* `--phenotype-names` condition name(s), one per `--phenoFile`.\n",
"* `--max-cv-variants` cap on variants used in cross-validation (default 5,000).\n",
"* `--ld_reference_meta_file` summary-stats / RSS mode only; omit for individual-level.\n",
"* `--region-name` restrict to one or more region IDs for sanity checks.\n",
"* `--save-data` also emit `*.univariate_data.rds` with the residualized X/Y matrices.\n"
"* `--phenotype-names` \u2014 condition name(s), one per `--phenoFile`.\n",
"* `--max-cv-variants` \u2014 cap on variants used in cross-validation (default 5,000).\n",
"* `--ld_reference_meta_file` \u2014 summary-stats / RSS mode only; omit for individual-level.\n",
"* `--region-name` \u2014 restrict to one or more region IDs for sanity checks.\n",
"* `--save-data` \u2014 also emit `*.univariate_data.rds` with the residualized X/Y matrices.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Output format\n\nAll results are written under `--cwd` (e.g. `output/mnm_regression/susie_twas/twas_weights/`), one set of files per region:\n\n- `{name}.{chr}_{region_id}.univariate_twas_weights.rds` the fitted SuSiE fine-mapping result plus the trained TWAS weights for that region/gene; this is the main output consumed by downstream TWAS.\n- a companion fine-mapping RDS keyed by region `ID` and phenotype condition (and a third data RDS when `--save-data` is set), holding the per-region SuSiE posterior summaries and credible sets.\n"
"## Output format\n\nAll results are written under `--cwd` (e.g. `output/mnm_regression/susie_twas/twas_weights/`), one set of files per region:\n\n- `{name}.{chr}_{region_id}.univariate_twas_weights.rds` \u2014 the fitted SuSiE fine-mapping result plus the trained TWAS weights for that region/gene; this is the main output consumed by downstream TWAS.\n- a companion fine-mapping RDS keyed by region `ID` and phenotype condition (and a third data RDS when `--save-data` is set), holding the per-region SuSiE posterior summaries and credible sets.\n"
]
},
{
Expand All @@ -213,7 +231,7 @@
"source": [
"## Anticipated Results\n",
"\n",
"Quick sanity check inspect cross-validation performance for the first region:"
"Quick sanity check \u2014 inspect cross-validation performance for the first region:"
]
},
{
Expand All @@ -222,12 +240,14 @@
"metadata": {},
"outputs": [],
"source": [
"# Inspect the new S4 TwasWeights collection via the accessor surface:\n",
"Rscript -e '\n",
"fp <- \"output/mnm_regression/susie_twas/twas_weights/test_susie_twas.chr1_ENSG00000049246.univariate_twas_weights.rds\"\n",
"x <- readRDS(fp)\n",
"cat(\"methods:\", names(x[[1]][[1]]$twas_weights), \"\\n\")\n",
"print(x[[1]][[1]]$twas_cv_result$performance)\n",
"'\n"
"suppressPackageStartupMessages(library(pecotmr))\n",
"tw <- readRDS(\"output/twas_weights/test_susie_twas.chr1_1000000_2000000.twas_weights.rds\")\n",
"cat(\"methods present:\", paste(unique(as.character(tw$method)), collapse = \", \"), \"\\n\")\n",
"cat(\"rows:\", nrow(tw), \"\\n\")\n",
"print(getR2(tw)) # per-method CV R^2 for this region\n",
"'"
]
},
{
Expand All @@ -236,16 +256,16 @@
"source": [
"Interpretation:\n",
"\n",
"* `twas_weights$<method>` use for TWAS (`Z weights · LD · GWAS z-scores`).\n",
"* `twas_cv_result$performance` pick the method with highest `adj_rsq` before applying.\n",
"* `top_loci` in `univariate_bvsr.rds` lead variants and credible-set coverages.\n",
"* `susie_result_trimmed$sets$cs` indices of variants in each 95% credible set.\n",
"* `twas_weights$<method>` \u2014 use for TWAS (`Z \u2248 weights \u00b7 LD \u00b7 GWAS z-scores`).\n",
"* `twas_cv_result$performance` \u2014 pick the method with highest `adj_rsq` before applying.\n",
"* `top_loci` in `univariate_bvsr.rds` \u2014 lead variants and credible-set coverages.\n",
"* `susie_result_trimmed$sets$cs` \u2014 indices of variants in each 95% credible set.\n",
"\n",
"### Common pitfalls\n",
"\n",
"* **Sample-ID drift.** The phenotype BED's header columns (5+), the bfile FID/IID, and\n",
" the covariate TSV header must all be drawn from the same set. Sos does not warn on\n",
" missing samples it silently drops them and may report zero overlap.\n",
" missing samples \u2014 it silently drops them and may report zero overlap.\n",
"* **Empty regions.** If a region has no variants in its association window (e.g. the\n",
" manifest points at a window with no bfile coverage) sos marks it completed with an\n",
" empty output. Verify the expected file count in the `INFO: susie_twas output:` line.\n",
Expand Down
Loading
Loading