diff --git a/code/SoS/enrichment/sldsc_enrichment.ipynb b/code/SoS/enrichment/sldsc_enrichment.ipynb index e022ec8a..8b352789 100644 --- a/code/SoS/enrichment/sldsc_enrichment.ipynb +++ b/code/SoS/enrichment/sldsc_enrichment.ipynb @@ -10,7 +10,7 @@ "\n", "Minimal working-example driver for the S-LDSC functional-enrichment pipeline. The **Steps** section below gives one ready-to-run `sos run` command per workflow, using the toy inputs symlinked under `input/`.\n", "\n", - "> **Environment note.** Steps 1\u20132 (`make_annotation_files_ldscore`, `get_heritability`) wrap the external **polyfun** toolkit (`compute_ldscores.py`, `ldsc.py`, `munge_polyfun_sumstats.py`) and require pre-computed reference-panel files (baseline-LD scores, LD weights, `.frq`, and PLINK `.bed/.bim/.fam`). polyfun is **not installed in this environment** and the reference panel is not shipped with the toy example, so those two steps cannot be executed here; their commands are provided for use on a system where polyfun and a matching panel are available. Steps 3\u20134 (`postprocess`, `meta_subset`) use `pecotmr::sldsc_postprocessing_pipeline` (available here) and read the `.results`/`.log` files produced by Step 2.\n" + "> **Environment note.** Steps 1–2 (`make_annotation_files_ldscore`, `get_heritability`) wrap the external **polyfun** toolkit (`compute_ldscores.py`, `ldsc.py`, `munge_polyfun_sumstats.py`) and require pre-computed reference-panel files (baseline-LD scores, LD weights, `.frq`, and PLINK `.bed/.bim/.fam`). polyfun is **not installed in this environment** and the reference panel is not shipped with the toy example, so those two steps cannot be executed here; their commands are provided for use on a system where polyfun and a matching panel are available. Steps 3–4 (`postprocess`, `meta_subset`) use `pecotmr::sldsc_postprocessing_pipeline` (available here) and read the `.results`/`.log` files produced by Step 2.\n" ] }, { @@ -137,8 +137,8 @@ "\n", "The pipeline has two computational layers:\n", "\n", - "- **Regression layer** \u2014 the S-LDSC regression itself, performed by the [polyfun](https://github.com/omerwe/polyfun) engine. We do not re-implement this.\n", - "- **Post-processing layer** \u2014 standardization, differential per-SNP heritability, binary/continuous detection, and random-effects meta-analysis across traits. Implemented in the [`pecotmr`](https://github.com/StatFunGen/pecotmr) R package (`R/sldsc_wrapper.R`).\n", + "- **Regression layer** — the S-LDSC regression itself, performed by the [polyfun](https://github.com/omerwe/polyfun) engine. We do not re-implement this.\n", + "- **Post-processing layer** — standardization, differential per-SNP heritability, binary/continuous detection, and random-effects meta-analysis across traits. Implemented in the [`pecotmr`](https://github.com/StatFunGen/pecotmr) R package (`R/sldsc_wrapper.R`).\n", "\n", "The notation below tags each modeling quantity as **(polyfun)** or **(pecotmr)**.\n", "\n", @@ -151,7 +151,7 @@ "\n", "#### Reference panel and MAF cutoff\n", "\n", - "All LD-derived quantities \u2014 partitioned LD scores for the 97 baseline annotations and for our $K$ target annotations, the LD-score-regression weights, allele frequencies, and the SNP set \u2014 are computed against our own LD reference panel. We do not mix in pre-computed quantities from external panels (e.g. 1000G); $M_{\\mathrm{ref}}$ throughout this notebook denotes the number of common SNPs in our panel.\n", + "All LD-derived quantities — partitioned LD scores for the 97 baseline annotations and for our $K$ target annotations, the LD-score-regression weights, allele frequencies, and the SNP set — are computed against our own LD reference panel. We do not mix in pre-computed quantities from external panels (e.g. 1000G); $M_{\\mathrm{ref}}$ throughout this notebook denotes the number of common SNPs in our panel.\n", "\n", "By default we restrict to MAF $> 5\\%$ per the sLDSC recommendation: rare-variant LD is unstable and HapMap3-style regression weights are common-variant by construction. The cutoff is exposed as the SoS parameter `maf_cutoff` (default $0.05$); the regression, the standardized $sd_C$, and $M_{\\mathrm{ref}}$ are all evaluated on the same MAF $>$ cutoff SNP set. If allele-frequency files are not available the pipeline fails; the user must explicitly set `maf_cutoff = 0` to opt out (not recommended).\n", "\n", @@ -159,30 +159,30 @@ "\n", "Solving Equation (2) jointly across annotations, with 200-block genomic jackknife for inference, is performed by polyfun's `ldsc.py`. From each polyfun run we obtain, per annotation:\n", "\n", - "- $\\tau_C$ and its standard error \u2014 **(polyfun)**.\n", - "- $\\pi^{h^2}_C$ and $\\pi^{M}_C$ \u2014 **(polyfun)**.\n", - "- $E_C = \\pi^{h^2}_C / \\pi^{M}_C$ and its standard error \u2014 **(polyfun)**.\n", - "- The p-value of the differential per-SNP heritability test (defined below) \u2014 **(polyfun)**, computed internally with the full coefficient covariance matrix.\n", + "- $\\tau_C$ and its standard error — **(polyfun)**.\n", + "- $\\pi^{h^2}_C$ and $\\pi^{M}_C$ — **(polyfun)**.\n", + "- $E_C = \\pi^{h^2}_C / \\pi^{M}_C$ and its standard error — **(polyfun)**.\n", + "- The p-value of the differential per-SNP heritability test (defined below) — **(polyfun)**, computed internally with the full coefficient covariance matrix.\n", "\n", "We also obtain, per run:\n", "\n", - "- The total trait heritability $h^2_g$ \u2014 **(polyfun)**.\n", - "- The 200-block jackknife delete-values of $\\tau_C$ \u2014 **(polyfun)**.\n", + "- The total trait heritability $h^2_g$ — **(polyfun)**.\n", + "- The 200-block jackknife delete-values of $\\tau_C$ — **(polyfun)**.\n", "\n", "#### Quantities from the post-processing layer (pecotmr)\n", "\n", "From the polyfun outputs above plus our reference panel, the post-processing layer computes:\n", "\n", - "- $sd_C$ \u2014 per-annotation standard deviation over MAF $>$ cutoff SNPs \u2014 **(pecotmr: `compute_sldsc_annot_sd`)**.\n", - "- $M_{\\mathrm{ref}}$ \u2014 reference SNP count at the MAF cutoff \u2014 **(pecotmr: `compute_sldsc_M_ref`)**.\n", - "- Whether each annotation is binary or continuous \u2014 **(pecotmr: `is_binary_sldsc_annot`)**.\n", - "- $\\tau^*_C$ point estimate and per-block $\\tau^*_C$ \u2014 **(pecotmr: `standardize_sldsc_trait`)**.\n", - "- EnrichStat point estimate and its standard error (formula below) \u2014 **(pecotmr: `standardize_sldsc_trait`)**.\n", - "- DerSimonian-Laird random-effects meta-analysis of $\\tau^*_C$, $E_C$, or EnrichStat across traits \u2014 **(pecotmr: `meta_sldsc_random`)**.\n", + "- $sd_C$ — per-annotation standard deviation over MAF $>$ cutoff SNPs — **(pecotmr: `compute_sldsc_annot_sd`)**.\n", + "- $M_{\\mathrm{ref}}$ — reference SNP count at the MAF cutoff — **(pecotmr: `compute_sldsc_M_ref`)**.\n", + "- Whether each annotation is binary or continuous — **(pecotmr: `is_binary_sldsc_annot`)**.\n", + "- $\\tau^*_C$ point estimate and per-block $\\tau^*_C$ — **(pecotmr: `standardize_sldsc_trait`)**.\n", + "- EnrichStat point estimate and its standard error (formula below) — **(pecotmr: `standardize_sldsc_trait`)**.\n", + "- DerSimonian-Laird random-effects meta-analysis of $\\tau^*_C$, $E_C$, or EnrichStat across traits — **(pecotmr: `meta_sldsc_random`)**.\n", "\n", "The top-level entry point `pecotmr::sldsc_postprocessing_pipeline` orchestrates all of the above.\n", "\n", - "#### Standardized tau ($\\tau^*$) \u2014 (pecotmr)\n", + "#### Standardized tau ($\\tau^*$) — (pecotmr)\n", "\n", "$\\tau_C$ has units that depend on the scale of the annotation and on the total heritability of the trait, so raw $\\tau$ is not directly comparable across annotations or across traits. We compute the standardized version (Gazal et al. 2017)\n", "\n", @@ -194,7 +194,7 @@ "$$SE^{\\text{jackknife}}(\\tau^*_C) \\;=\\; \\sqrt{\\,\\tfrac{(B-1)^2}{B}\\, \\mathrm{Var}_b(\\tau^*_{C,(b)})\\,}$$\n", "with $B = 200$, used as the per-trait input to cross-trait meta-analysis.\n", "\n", - "#### Differential per-SNP heritability (\"EnrichStat\") \u2014 (polyfun + pecotmr)\n", + "#### Differential per-SNP heritability (\"EnrichStat\") — (polyfun + pecotmr)\n", "\n", "To test whether the per-SNP heritability *inside* annotation $C$ differs from *outside* it (Finucane et al. 2015):\n", "\n", @@ -206,7 +206,7 @@ "\n", "This per-trait point + SE is the input to cross-trait meta-analysis.\n", "\n", - "#### Reporting: binary vs. continuous annotations \u2014 (pecotmr)\n", + "#### Reporting: binary vs. continuous annotations — (pecotmr)\n", "\n", "The estimation machinery applies to both annotation types, but the *headline* quantity to report **within each type** differs.\n", "\n", @@ -222,17 +222,17 @@ ">\n", "> The pipeline always passes `--print-coefficients` to polyfun for this reason.\n", "\n", - "#### Cross-type comparison: always use $\\tau^*_C$ \u2014 (pecotmr)\n", + "#### Cross-type comparison: always use $\\tau^*_C$ — (pecotmr)\n", "\n", - "For an apple-to-apple comparison **across binary and continuous annotations** \u2014 ranking annotations on a single axis, meta-analyzing a mixed set, or reporting a leaderboard that pools both types \u2014 use $\\tau^*_C$. The standardization in Gazal et al. (2017) was designed for exactly this purpose: $sd_C = \\sqrt{p(1-p)}$ for a binary annotation (where $p$ is the proportion in the category) and $sd_C = $ empirical standard deviation for a continuous annotation, so the resulting $\\tau^*_C$ is dimensionless and has the same interpretation in both cases \u2014 additive change in per-SNP heritability per 1 SD increase in the annotation, normalized by the average per-SNP heritability. $E_C$ does not have this property and must not be compared across types.\n", + "For an apple-to-apple comparison **across binary and continuous annotations** — ranking annotations on a single axis, meta-analyzing a mixed set, or reporting a leaderboard that pools both types — use $\\tau^*_C$. The standardization in Gazal et al. (2017) was designed for exactly this purpose: $sd_C = \\sqrt{p(1-p)}$ for a binary annotation (where $p$ is the proportion in the category) and $sd_C = $ empirical standard deviation for a continuous annotation, so the resulting $\\tau^*_C$ is dimensionless and has the same interpretation in both cases — additive change in per-SNP heritability per 1 SD increase in the annotation, normalized by the average per-SNP heritability. $E_C$ does not have this property and must not be compared across types.\n", "\n", "The pipeline emits both $E_C$ and $\\tau^*_C$ for every annotation, with the binary/continuous flag, so callers can pick the right column for the comparison they are making.\n", "\n", - "#### Joint analysis \u2014 (polyfun runs the regression; pecotmr standardizes both modes)\n", + "#### Joint analysis — (polyfun runs the regression; pecotmr standardizes both modes)\n", "\n", "For **joint analysis** (multiple annotations fit together), both $\\tau$ and $E$ are conditional on the other annotations in the model. We report joint $\\tau^*_C$ as the independent contribution of annotation $C$ after controlling for the others. The annotation-prep step exposes two independent toggles, `compute_single` and `compute_joint` (both default `True`), so the user can produce the $N$ single-target outputs, the joint output, or both in one invocation. With both defaults the post-processing layer reads all $N+1$ regression outputs per trait and presents single + joint side-by-side. When the joint subset is decided after looking at single-target results (exploratory $\\rightarrow$ conditional workflow), the user runs the annotation-prep step a second time with `compute_single=False` on the curated subset.\n", "\n", - "### Meta-Analysis across Traits (Random Effects) \u2014 (pecotmr)\n", + "### Meta-Analysis across Traits (Random Effects) — (pecotmr)\n", "\n", "DerSimonian-Laird random-effects meta-analysis of per-annotation estimates across traits, implemented in `pecotmr::meta_sldsc_random` (which delegates the numerics to `rmeta::meta.summaries(..., method = \"random\")`):\n", "\n", @@ -322,11 +322,45 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mmake_annotation_files_ldscore\u001b[0m: \n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=1) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=3) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=2) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=0) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=5) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=6) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=4) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=7) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=9) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=10) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=8) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=11) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=14) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=13) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=12) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=15) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=18) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=16) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=17) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=19) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=21) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m (index=20) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmake_annotation_files_ldscore\u001b[0m output: \u001b[32m/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_ldscore/protocol_example_single_1/protocol_example_single_1.1.annot.gz /restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_ldscore/protocol_example_single_1/protocol_example_single_1.1.l2.ldscore.parquet... (66 items in 22 groups)\u001b[0m\n", + "INFO: Workflow make_annotation_files_ldscore (ID=weae0ca3fdf468fd8) is executed successfully with 1 completed step and 22 completed substeps.\n" + ] + } + ], "source": [ "sos run pipeline/sldsc_enrichment.ipynb make_annotation_files_ldscore \\\n", " --annotation_file input/enrichment/sldsc/colocboost_test_annotation_path.txt \\\n", @@ -335,9 +369,8 @@ " --annotation_name protocol_example \\\n", " --plink_name reference. --baseline_name annotations. --weight_name weights. \\\n", " --python_exec python \\\n", - " --polyfun_path ../polyfun \\\n", - " --cwd output/sldsc_ldscore -j 4\n", - "\n" + " --polyfun_path polyfun \\\n", + " --cwd output/sldsc_ldscore -j 4\n" ] }, { @@ -403,11 +436,32 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 10, "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mget_heritability\u001b[0m: \n", + "maf_cutoff = 0: skipping MAF filter (--not-M-5-50)\n", + "maf_cutoff = 0: skipping MAF filter (--not-M-5-50)\n", + "maf_cutoff = 0: skipping MAF filter (--not-M-5-50)\n", + "python: can't open file '/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/../polyfun/ldsc.py': [Errno 2] No such file or directory\n", + "python: can't open file '/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/../polyfun/ldsc.py': [Errno 2] No such file or directory\n", + "python: can't open file '/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/../polyfun/ldsc.py': [Errno 2] No such file or directory\n", + "INFO: \u001b[32mget_heritability\u001b[0m (index=1) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mget_heritability\u001b[0m (index=0) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mget_heritability\u001b[0m (index=2) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mget_heritability\u001b[0m output: \u001b[32m/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_heritability/protocol_example_single_1/sumstats.parquet.log /restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_heritability/protocol_example_single_1/sumstats.parquet.results... (6 items in 3 groups)\u001b[0m\n", + "INFO: Workflow get_heritability (ID=wa79eac1662f5dd2d) is executed successfully with 1 completed step and 3 completed substeps.\n" + ] + } + ], "source": [ "sos run pipeline/sldsc_enrichment.ipynb get_heritability \\\n", " --target_anno_dirs output/sldsc_ldscore/protocol_example_single_1 \\\n", @@ -449,16 +503,29 @@ "to produce per-trait standardized tables and the default random-effects\n", "meta across all traits.\n", "\n", - "Use `--target-categories-label` (same order as `--target-categories`) to give the target annotations friendly names in the output \u2014 e.g. `--target-categories ANNOT_1_0 ANNOT_2_0 --target-categories-label quantile_eQTL eQTL` makes the `target` column read `quantile_eQTL` / `eQTL` instead of `ANNOT_1_0` / `ANNOT_2_0` (the original names are kept in `params$target_categories_orig`). Omit it to keep the polyfun `.results` names.\n" + "Use `--target-categories-label` (same order as `--target-categories`) to give the target annotations friendly names in the output — e.g. `--target-categories ANNOT_1_0 ANNOT_2_0 --target-categories-label quantile_eQTL eQTL` makes the `target` column read `quantile_eQTL` / `eQTL` instead of `ANNOT_1_0` / `ANNOT_2_0` (the original names are kept in `params$target_categories_orig`). Omit it to keep the polyfun `.results` names.\n" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 15, "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mpostprocess\u001b[0m: \n", + "INFO: \u001b[32mpostprocess\u001b[0m is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mpostprocess\u001b[0m output: \u001b[32m/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_postprocess/protocol_example.sldsc_postprocess.rds\u001b[0m\n", + "INFO: Workflow postprocess (ID=wb64dc2b84958960c) is executed successfully with 1 completed step.\n" + ] + } + ], "source": [ "sos run pipeline/sldsc_enrichment.ipynb postprocess \\\n", " --traits_file input/enrichment/sldsc/sumstats_test_all.txt \\\n", @@ -481,6 +548,7 @@ "\n", "The default meta in Step 2 pools all traits the user supplied. To re-run the meta on a subset (e.g., neurodegenerative traits only, or autoimmune traits only) without re-running the regression layer:\n", "\n", + "\n", "```r\n", "res <- readRDS(\"sldsc_results.rds\")\n", "neuro <- c(\"AD_GWAX\", \"PD_meta\", \"ALS_meta\")\n", @@ -492,16 +560,30 @@ "This step is light-weight and can be run interactively.\n", "\n", "\n", - "The default meta in step 3 pools all traits supplied to `[postprocess]`. Use `[meta_subset]` to re-run the meta on a user-defined trait subset (e.g., neurodegenerative traits only, autoimmune traits only) without re-running the regression or the per-trait standardization. The subset operates on the cached `.sldsc_postprocess.rds` output; it is light-weight and can be run interactively or in batch.\n" + "The default meta in step 3 pools all traits supplied to `[postprocess]`. Use `[meta_subset]` to re-run the meta on a user-defined trait subset (e.g., neurodegenerative traits only, autoimmune traits only) without re-running the regression or the per-trait standardization. The subset operates on the cached `.sldsc_postprocess.rds` output; it is light-weight and can be run interactively or in batch.\n", + "\n" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 17, "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mmeta_subset\u001b[0m: \n", + "INFO: \u001b[32mmeta_subset\u001b[0m is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmeta_subset\u001b[0m output: \u001b[32m/restricted/projectnb/xqtl/jaempawi/xqtl-protocol/output/sldsc_postprocess/protocol_example.category1.meta.rds\u001b[0m\n", + "INFO: Workflow meta_subset (ID=w09a2a0530119f1d2) is executed successfully with 1 completed step.\n" + ] + } + ], "source": [ "sos run pipeline/sldsc_enrichment.ipynb meta_subset \\\n", " --postprocess_rds output/sldsc_postprocess/protocol_example.sldsc_postprocess.rds \\\n", @@ -509,7 +591,7 @@ " --subset_name category1 --target_categories ANNOT_0 \\\n", " --annotation_name protocol_example --python_exec python \\\n", " --polyfun_path ../polyfun \\\n", - " --maf_cutoff 0 --cwd output/sldsc_postprocess -j 4" + " --maf_cutoff 0 --cwd output/sldsc_postprocess -j 4\n" ] }, { @@ -520,7 +602,7 @@ "source": [ "## Output\n", "\n", - "### Output summary (cached artifacts)\n", + "### Output summary\n", "\n", "| Stage | Cached on disk | Recomputable from | Purpose |\n", "|---|---|---|---|\n", @@ -541,10 +623,10 @@ "\n", "Each workflow writes into its `--cwd`:\n", "\n", - "- **make_annotation_files_ldscore** \u2014 polyfun `.annot.gz` files plus per-annotation LD-score directories (`.l2.ldscore.{gz,parquet}`, `.l2.M`, `.l2.M_5_50`). One single-target directory per annotation, plus (when more than one annotation) a joint directory.\n", - "- **get_heritability** \u2014 per trait and per target directory, the S-LDSC regression outputs `.{results,log,part_delete}`. The `.results` `Category` column carries the annotation name with a `_` suffix.\n", - "- **postprocess** \u2014 a single `.sldsc_postprocess.rds` containing per-trait tables (Gazal-style tau*, EnrichStat with back-solved jackknife SE) and three DerSimonian\u2013Laird random-effects meta tables (tau*, E, EnrichStat).\n", - "- **meta_subset** \u2014 a re-meta of the cached `.sldsc_postprocess.rds` over a user-defined trait subset (lightweight; no regression re-run).\n" + "- **make_annotation_files_ldscore** — polyfun `.annot.gz` files plus per-annotation LD-score directories (`.l2.ldscore.{gz,parquet}`, `.l2.M`, `.l2.M_5_50`). One single-target directory per annotation, plus (when more than one annotation) a joint directory.\n", + "- **get_heritability** — per trait and per target directory, the S-LDSC regression outputs `.{results,log,part_delete}`. The `.results` `Category` column carries the annotation name with a `_` suffix.\n", + "- **postprocess** — a single `.sldsc_postprocess.rds` containing per-trait tables (Gazal-style tau*, EnrichStat with back-solved jackknife SE) and three DerSimonian–Laird random-effects meta tables (tau*, E, EnrichStat).\n", + "- **meta_subset** — a re-meta of the cached `.sldsc_postprocess.rds` over a user-defined trait subset (lightweight; no regression re-run).\n" ] }, { @@ -1092,7 +1174,7 @@ "# index 1 = the baseline file -> \"base_1\",\"Coding_UCSC_1\", ... (the 97 baseline annots)\n", "# So in this pipeline the suffix is only ever 0 (target) or 1 (baseline); it would\n", "# continue 0,1,2,... only if you handed `ldsc.py --ref-ld-chr` more than two sources.\n", - "# (Why ANNOT_0 vs L2_0: see the [make_annotation_files_ldscore] header \u2014 ldsc.py's\n", + "# (Why ANNOT_0 vs L2_0: see the [make_annotation_files_ldscore] header — ldsc.py's\n", "# \"n_annot == 1 -> column name 'L2'\" quirk vs compute_ldscores.py keeping the annot\n", "# column name.) [postprocess] auto-detects the target Category; if you instead pass\n", "# --target-categories, the names must match this column exactly.\n", @@ -1222,7 +1304,7 @@ "outputs": [], "source": [ "[postprocess]\n", - "# Post-processing of polyfun outputs via pecotmr::sldscPostprocessingPipeline.\n", + "# Post-processing of polyfun outputs via pecotmr::sldsc_postprocessing_pipeline.\n", "# Reads .results / .log / .part_delete for all traits in `traits_file`, both\n", "# single-target and (when present) joint-target runs, computes Gazal-style\n", "# tau*, EnrichStat with back-solved jackknife SE, and runs the default\n", @@ -1275,15 +1357,15 @@ " trait_joint_prefix <- setNames(rep(NA_character_, length(traits)), traits)\n", " }\n", "\n", - " res <- sldscPostprocessingPipeline(\n", - " traitSinglePrefixes = trait_single_prefixes,\n", - " traitJointPrefix = trait_joint_prefix,\n", - " targetAnnoDir = \"${target_anno_dir}\",\n", - " frqfileDir = \"${frqfile_dir}\",\n", - " plinkName = \"${plink_name}\",\n", - " mafCutoff = ${maf_cutoff},\n", - " targetCategories = if (length(target_cats) > 0) target_cats else NULL,\n", - " targetLabels = if (length(target_lab) > 0) target_lab else NULL\n", + " res <- sldsc_postprocessing_pipeline(\n", + " trait_single_prefixes = trait_single_prefixes,\n", + " trait_joint_prefix = trait_joint_prefix,\n", + " target_anno_dir = \"${target_anno_dir}\",\n", + " frqfile_dir = \"${frqfile_dir}\",\n", + " plink_name = \"${plink_name}\",\n", + " maf_cutoff = ${maf_cutoff},\n", + " target_categories = if (length(target_cats) > 0) target_cats else NULL,\n", + " target_labels = if (length(target_lab) > 0) target_lab else NULL\n", " )\n", "\n", " saveRDS(res, \"${_output[0]}\")\n", @@ -1324,15 +1406,15 @@ "\n", " subset_per_trait <- res$per_trait[subset_traits]\n", "\n", - " # Map wide names (tau_star_single/joint) to bare names metaSldscRandom expects.\n", - " view_single <- pecotmr:::.sldscViewForMeta(subset_per_trait, \"single\")\n", - " view_joint <- pecotmr:::.sldscViewForMeta(subset_per_trait, \"joint\")\n", + " # Map wide names (tau_star_single/joint) to bare names meta_sldsc_random expects.\n", + " view_single <- pecotmr:::.sldsc_view_for_meta(subset_per_trait, \"single\")\n", + " view_joint <- pecotmr:::.sldsc_view_for_meta(subset_per_trait, \"joint\")\n", "\n", " out <- list(\n", - " tau_star_single = setNames(lapply(target_cats, function(c) metaSldscRandom(view_single, c, \"tauStar\")), target_cats),\n", - " tau_star_joint = setNames(lapply(target_cats, function(c) metaSldscRandom(view_joint, c, \"tauStar\")), target_cats),\n", - " enrichment = setNames(lapply(target_cats, function(c) metaSldscRandom(view_single, c, \"enrichment\")), target_cats),\n", - " enrichstat = setNames(lapply(target_cats, function(c) metaSldscRandom(view_single, c, \"enrichstat\")), target_cats)\n", + " tau_star_single = setNames(lapply(target_cats, function(c) meta_sldsc_random(view_single, c, \"tau_star\")), target_cats),\n", + " tau_star_joint = setNames(lapply(target_cats, function(c) meta_sldsc_random(view_joint, c, \"tau_star\")), target_cats),\n", + " enrichment = setNames(lapply(target_cats, function(c) meta_sldsc_random(view_single, c, \"enrichment\")), target_cats),\n", + " enrichstat = setNames(lapply(target_cats, function(c) meta_sldsc_random(view_single, c, \"enrichstat\")), target_cats)\n", " )\n", "\n", " saveRDS(out, \"${_output[0]}\")\n", @@ -1357,11 +1439,18 @@ "sos": { "kernels": [ [ - "Markdown", - "markdown", - "markdown", - "", - "" + "Bash", + "calysto_bash", + "Bash", + "#E6EEFF", + "shell" + ], + [ + "R", + "ir", + "R", + "#DCDCDA", + "r" ], [ "SoS", @@ -1380,4 +1469,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} diff --git a/code/SoS/reference_data/rss_ld_sketch.ipynb b/code/SoS/reference_data/rss_ld_sketch.ipynb index c1ac6c9a..0396c03d 100644 --- a/code/SoS/reference_data/rss_ld_sketch.ipynb +++ b/code/SoS/reference_data/rss_ld_sketch.ipynb @@ -19,19 +19,19 @@ "source": [ "## Description\n", "\n", - "This pipeline generates a stochastic genotype sample **U = W\u1d40G** from whole-genome sequencing VCF files and stores it as a PLINK2 pgen file for use as an LD reference panel with SuSiE-RSS fine-mapping.\n", + "This pipeline generates a stochastic genotype sample **U = WᵀG** from whole-genome sequencing VCF files and stores it as a PLINK2 pgen file for use as an LD reference panel with SuSiE-RSS fine-mapping.\n", "\n", - "**Key idea:** Rather than storing the full genotype matrix G (n \u00d7 p), we compute U = W\u1d40G (B \u00d7 p) using a random projection matrix $W \\sim N(0, 1/\\sqrt{n})$. The approximate LD matrix $R = U^T U / B \\approx G^T G / n$ by the Johnson\u2013Lindenstrauss lemma. G is never stored.\n", + "**Key idea:** Rather than storing the full genotype matrix G (n × p), we compute U = WᵀG (B × p) using a random projection matrix $W \\sim N(0, 1/\\sqrt{n})$. The approximate LD matrix $R = U^T U / B \\approx G^T G / n$ by the Johnson–Lindenstrauss lemma. G is never stored.\n", "\n", "**Matrix dimensions:**\n", - "- G : (n \u00d7 p) \u2014 n individuals \u00d7 p variants\n", - "- W : (n \u00d7 B) \u2014 projection matrix, generated once per cohort\n", - "- U : (B \u00d7 p) \u2014 stochastic genotype sample = W\u1d40G, stored in pgen\n", - "- $\\hat{R}$ : (p \u00d7 p) \u2014 approximate LD matrix, computed on-the-fly by SuSiE-RSS from U\n", + "- G : (n × p) — n individuals × p variants\n", + "- W : (n × B) — projection matrix, generated once per cohort\n", + "- U : (B × p) — stochastic genotype sample = WᵀG, stored in pgen\n", + "- $\\hat{R}$ : (p × p) — approximate LD matrix, computed on-the-fly by SuSiE-RSS from U\n", "\n", "The workflow has three steps run in order: `generate_W` (build the projection matrix), `process_block` (read VCF per LD block and write per-block dosage sketches), and `merge_chrom` (merge per-block dosages into one per-chromosome pgen).\n", "\n", - "**Note on data:** This example runs on a clearly-labeled synthetic toy dataset \u2014 a chr22 VCF (`protocol_example.genotype.chr22.bgz`, 60 individuals) and a 3-block LD-block BED (`protocol_example.ld_blocks.bed`). No access-controlled individual-level human genomic data is used." + "**Note on data:** This example runs on a clearly-labeled synthetic toy dataset — a chr22 VCF (`protocol_example.genotype.chr22.bgz`, 60 individuals) and a 3-block LD-block BED (`protocol_example.ld_blocks.bed`). No access-controlled individual-level human genomic data is used." ] }, { @@ -68,7 +68,9 @@ { "cell_type": "markdown", "id": "4633fe31", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "**Timing:** ~10-20 min (chr22) on typical compute infrastructure." ] @@ -76,17 +78,34 @@ { "cell_type": "markdown", "id": "6ffb31a4-7ad4-479d-8955-ba598a16ef07", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "### Step 1. Generate the projection matrix W (run once per cohort; `--n-samples` must equal the VCF sample count)." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "id": "03612385", - "metadata": {}, - "outputs": [], + "metadata": { + "kernel": "Bash" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mgenerate_W\u001b[0m: \n", + "INFO: \u001b[32mgenerate_W\u001b[0m is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mgenerate_W\u001b[0m output: \u001b[32moutput/rss_ld_sketch/W_B50.npy\u001b[0m\n", + "INFO: Workflow generate_W (ID=w68f63c60d8da4b5e) is executed successfully with 1 completed step.\n" + ] + } + ], "source": [ "sos run pipeline/rss_ld_sketch.ipynb generate_W \\\n", " --n-samples 60 \\\n", @@ -99,19 +118,37 @@ { "cell_type": "markdown", "id": "b62d37b8-da5d-4d5f-a3b7-7633ff5ff70f", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ - "### Step 2. Process all LD blocks for the chromosome \u2014 read the VCF, filter variants, and write per-block dosage sketches U = W\u1d40G.\n" + "### Step 2. Process all LD blocks for the chromosome — read the VCF, filter variants, and write per-block dosage sketches U = WᵀG.\n" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "id": "d2eccffd", "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mprocess_block\u001b[0m: \n", + " 3 LD blocks queued\n", + "INFO: \u001b[32mprocess_block\u001b[0m (index=0) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mprocess_block\u001b[0m (index=1) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mprocess_block\u001b[0m (index=2) is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mprocess_block\u001b[0m output: \u001b[32moutput/rss_ld_sketch/chr22/chr22_16000000_20000000/protocol_example..chr22_16000000_20000000.dosage.gz output/rss_ld_sketch/chr22/chr22_30000000_34000000/protocol_example..chr22_30000000_34000000.dosage.gz... (3 items in 3 groups)\u001b[0m\n", + "INFO: Workflow process_block (ID=w8c1e759d62203ef6) is executed successfully with 1 completed step and 3 completed substeps.\n" + ] + } + ], "source": [ "sos run pipeline/rss_ld_sketch.ipynb process_block \\\n", " --ld-block-file input/rss_ld_sketch/protocol_example.ld_blocks.bed \\\n", @@ -128,19 +165,78 @@ { "cell_type": "markdown", "id": "452c348f-487f-44e4-96c1-75fe118cbc9a", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "### Step 3. Merge the per-block dosage sketches into one per-chromosome PLINK2 pgen." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "id": "81f28809", "metadata": { "kernel": "Bash" }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/restricted/projectnb/xqtl/jaempawi/.pixi/envs/python/lib/python3.12/site-packages/sos/targets.py:22: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.\n", + " import pkg_resources\n", + "INFO: Running \u001b[32mmerge_chrom\u001b[0m: \n", + "PLINK v2.0.0-a.6.9LM 64-bit Intel (29 Jan 2025) cog-genomics.org/plink/2.0/\n", + "(C) 2005-2025 Shaun Purcell, Christopher Chang GNU General Public License v3\n", + "Logging to output/rss_ld_sketch/chr22/protocol_example..chr22.log.\n", + "Options in effect:\n", + " --make-pgen\n", + " --out output/rss_ld_sketch/chr22/protocol_example..chr22\n", + " --pmerge-list output/rss_ld_sketch/chr22/protocol_example..chr22_pmerge_list.txt pfile\n", + " --sort-vars\n", + "\n", + "Start time: Tue Jun 23 09:52:54 2026\n", + "191527 MiB RAM detected, ~187643 available; reserving 95763 MiB for main\n", + "workspace.\n", + "Using up to 32 threads (change this with --threads).\n", + "--pmerge-list: 3 filesets specified.\n", + "--pmerge-list: 50 samples present.\n", + "--pmerge-list: Merged .psam written to\n", + "output/rss_ld_sketch/chr22/protocol_example..chr22-merge.psam .\n", + "--pmerge-list: 3 .pvar files scanned.\n", + "Concatenation job detected.\n", + "Concatenating... 673/673 variants complete.\n", + "Results written to\n", + "output/rss_ld_sketch/chr22/protocol_example..chr22-merge.pgen +\n", + "output/rss_ld_sketch/chr22/protocol_example..chr22-merge.pvar .\n", + "50 samples (0 females, 0 males, 50 ambiguous; 50 founders) loaded from\n", + "output/rss_ld_sketch/chr22/protocol_example..chr22-merge.psam.\n", + "673 variants loaded from\n", + "output/rss_ld_sketch/chr22/protocol_example..chr22-merge.pvar.\n", + "Note: No phenotype data present.\n", + "Writing output/rss_ld_sketch/chr22/protocol_example..chr22.pvar ... done.\n", + "Writing output/rss_ld_sketch/chr22/protocol_example..chr22.psam ... done.\n", + "Writing output/rss_ld_sketch/chr22/protocol_example..chr22.pgen ... done.\n", + "End time: Tue Jun 23 09:52:54 2026\n", + "\n", + "=== Filter Summary for chr22 ===\n", + " value\n", + "n_total 6157.0\n", + "n_passed 673.0\n", + "n_multiallelic 0.0\n", + "n_monomorphic 4701.0\n", + "n_all_na 0.0\n", + "n_high_msng 101.0\n", + "n_low_maf 0.0\n", + "n_low_mac 682.0\n", + "pct_dropped 89.1\n", + "INFO: \u001b[32mmerge_chrom\u001b[0m is \u001b[32mcompleted\u001b[0m.\n", + "INFO: \u001b[32mmerge_chrom\u001b[0m output: \u001b[32moutput/rss_ld_sketch/chr22/protocol_example..chr22.pgen\u001b[0m\n", + "INFO: Workflow merge_chrom (ID=w8e5a670551e06660) is executed successfully with 1 completed step.\n" + ] + } + ], "source": [ "sos run pipeline/rss_ld_sketch.ipynb merge_chrom \\\n", " --output-dir output/rss_ld_sketch \\\n", @@ -548,7 +644,7 @@ "\n", "import os, glob\n", "\n", - "def _chroms_to_process(output_dir, chrom_filter):\n", + "def chromsto_process(output_dir, chrom_filter):\n", " if chrom_filter != 0:\n", " return [f\"chr{chrom_filter}\"]\n", " return sorted(set(\n", @@ -557,12 +653,13 @@ " if os.path.isdir(d)\n", " ))\n", "\n", - "chroms = _chroms_to_process(output_dir, chrom)\n", + "chroms = chromsto_process(output_dir, chrom)\n", "\n", "input: for_each = \"chroms\"\n", "output: f\"{output_dir}/{_chroms}/{cohort_id}.{_chroms}.pgen\"\n", "task: trunk_workers = 1, trunk_size = 1, walltime = walltime, mem = mem, cores = numThreads\n", "bash: expand = \"$[ ]\"\n", + "\n", " set -euo pipefail\n", " shopt -s nullglob\n", "\n", @@ -584,11 +681,13 @@ " map_file=\"${block_dir}/$[cohort_id].${block_tag}.map\"\n", " psam_file=\"${block_dir}/$[cohort_id].${block_tag}.psam\"\n", " meta_file=\"${block_dir}/$[cohort_id].${block_tag}.meta\"\n", + "\n", " B=$(grep \"^B=\" \"${meta_file}\" | cut -d= -f2)\n", " printf '#FID\\tIID\\n' > \"${psam_file}\"\n", " for i in $(seq 1 ${B}); do\n", " printf 'S%d\\tS%d\\n' ${i} ${i} >> \"${psam_file}\"\n", " done\n", + "\n", " $[plink2_bin] \\\n", " --import-dosage \"${dosage_gz}\" format=1 noheader \\\n", " --psam \"${psam_file}\" \\\n", @@ -596,12 +695,14 @@ " --make-pgen \\\n", " --out \"${prefix}_unsorted\" \\\n", " --silent\n", + "\n", " $[plink2_bin] \\\n", " --pfile \"${prefix}_unsorted\" \\\n", " --make-pgen \\\n", " --sort-vars \\\n", " --out \"${prefix}\" \\\n", " --silent\n", + "\n", " rm -f \"${prefix}_unsorted.pgen\" \"${prefix}_unsorted.pvar\" \"${prefix}_unsorted.psam\"\n", " echo \"${prefix}\" >> \"${merge_list}\"\n", " done\n", @@ -613,6 +714,9 @@ " --sort-vars \\\n", " --out \"${final_prefix}\"\n", "\n", + " # Remove PLINK merge intermediates immediately after merge\n", + " rm -f \"${final_prefix}-merge.pgen\" \"${final_prefix}-merge.pvar\" \"${final_prefix}-merge.psam\"\n", + "\n", " # Step 3: Concatenate .afreq\n", " first=1\n", " for f in \"${chrom_dir}\"/*/*.afreq; do\n", @@ -625,6 +729,7 @@ " done\n", "\n", "R: expand = \"$[ ]\"\n", + "\n", " library(data.table)\n", " meta_files <- list.files(\"$[output_dir]/$[_chroms]\",\n", " pattern = \"[.]meta$\", recursive = TRUE,\n", @@ -644,15 +749,17 @@ " cat(\"\\n=== Filter Summary for $[_chroms] ===\\n\")\n", " print(data.frame(value = unlist(summary), row.names = names(summary)))\n", " }\n", + "\n", "bash: expand = \"$[ ]\"\n", + "\n", " # Step 5: Cleanup block intermediates\n", " chrom_dir=\"$[output_dir]/$[_chroms]\"\n", " final_prefix=\"${chrom_dir}/$[cohort_id].$[_chroms]\"\n", + "\n", " rm -f \"${final_prefix}_pmerge_list.txt\"\n", - " rm -f \"${final_prefix}-merge.pgen\" \"${final_prefix}-merge.pvar\" \"${final_prefix}-merge.psam\"\n", " for block_dir in \"${chrom_dir}\"/*/; do\n", " rm -rf \"${block_dir}\"\n", - " done\n" + " done" ] }, { @@ -683,10 +790,10 @@ "## Output\n", "\n", "Per chromosome, under `--cwd`:\n", - "- `{cohort_id}.chr{N}.pgen` \u2014 binary genotype-sketch data (B pseudo-samples \u00d7 p variants)\n", - "- `{cohort_id}.chr{N}.pvar` \u2014 variant information\n", - "- `{cohort_id}.chr{N}.psam` \u2014 sample (sketch) information\n", - "- `{cohort_id}.chr{N}.afreq` \u2014 allele frequencies\n", + "- `{cohort_id}.chr{N}.pgen` — binary genotype-sketch data (B pseudo-samples × p variants)\n", + "- `{cohort_id}.chr{N}.pvar` — variant information\n", + "- `{cohort_id}.chr{N}.psam` — sample (sketch) information\n", + "- `{cohort_id}.chr{N}.afreq` — allele frequencies\n", "\n", "These feed SuSiE-RSS fine-mapping: load with a metadata TSV (one row per chromosome, columns `#chrom start end path`, `path` = pgen prefix). Use the X (genotype) interface for `susie_rss(z, X=X)` or the R (correlation) interface for `susie_rss(z, R=R)`." ] @@ -694,7 +801,9 @@ { "cell_type": "markdown", "id": "ff927c9c", - "metadata": {}, + "metadata": { + "kernel": "SoS" + }, "source": [ "## Anticipated Results\n", "\n", @@ -704,30 +813,33 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" + "display_name": "SoS", + "language": "sos", + "name": "sos" }, "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.13" + "codemirror_mode": "sos", + "file_extension": ".sos", + "mimetype": "text/x-sos", + "name": "sos", + "nbconvert_exporter": "sos_notebook.converter.SoS_Exporter", + "pygments_lexer": "sos" }, "sos": { "kernels": [ + [ + "Bash", + "calysto_bash", + "Bash", + "#E6EEFF", + "" + ], [ "SoS", "sos", "sos", "", - "" + "sos" ] ], "version": "" @@ -735,4 +847,4 @@ }, "nbformat": 4, "nbformat_minor": 5 -} \ No newline at end of file +}