siying-ma · siying-ma · Apr 2, 2026
diff --git a/02_activities/assignments/QQ_plot_A2.png b/02_activities/assignments/QQ_plot_A2.png
diff --git a/02_activities/assignments/assignment_2.html b/02_activities/assignments/assignment_2.html
diff --git a/02_activities/assignments/assignment_2.qmd b/02_activities/assignments/assignment_2.qmd
@@ -9,18 +9,32 @@ You will need **PLINK2** installed and available in your PATH. Please follow the
 
 Before you run any models, first get familiar with the dataset. You may find `data.table::fread()` in R helpful for reading `.bim` and `.fam` files.
 
+```{r setup, include=FALSE}
+# Input files are expected in ./02_activities/data/.
+#knitr::opts_chunk$set(echo = TRUE)
+knitr::opts_knit$set(root.dir = normalizePath("../../"))
+library(data.table)
+library(qqman)
+```
+
 (i) Read the `.fam` file. How many samples does the dataset contain?
 
-```{r}
-# Your code here...
+```{bash}
+head ./02_activities/data/gwa.A2.fam
+wc -l ./02_activities/data/gwa.A2.fam         
 ```
 
+There are 4000 samples in the dataset.
+
 (ii) Read the `.bim` file. How many SNPs does the dataset contain?
 
-```{r}
-# Your code here...
+```{bash}
+head ./02_activities/data/gwa.A2.bim
+wc -l ./02_activities/data/gwa.A2.bim         
 ```
 
+There are 306102 SNPs in the dataset.
+
 
 #### Question 2: Quality control (QC)
 
@@ -29,7 +43,13 @@ Now we will perform QC using PLINK2 for the genotype files in `gwa.A2`.
 (i) Using PLINK2 from the command line (bash), perform basic QC with the following filters: MAF ≥ 0.05, SNP missingness (`--geno`) ≤ 0.01, individual missingness (`--mind`) ≤ 0.10, and HWE p-value ≥ 0.00005, and output the QC’ed dataset as `gwa.qc.A2`.
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.A2 \
+  --maf 0.05 \
+  --geno 0.01 \
+  --mind 0.10 \
+  --hwe 0.00005 \
+  --make-bed \
+  --out ./02_activities/data/gwa.qc.A2
 ```
 
 #### Question 3: Relatedness
@@ -39,19 +59,33 @@ In this question, you will use **PLINK2’s built-in KING-robust kinship** (`--k
 i)  Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters: `--indep-pairwise 500 50 0.05`, and then generate a new dataset containing only the pruned SNPs.
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
+  --indep-pairwise 500 50 0.05 \
+  --out ./02_activities/data/gwa.qc.A2
+```
+
+```{bash}
+PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
+  --extract ./02_activities/data/gwa.qc.A2.prune.in \
+  --make-bed \
+  --out ./02_activities/data/gwa.qc.A2.pruned
 ```
 
 (ii) Use PLINK2 on the LD-pruned dataset to identify a set of unrelated individuals up to (approximately) 2nd-degree relatives (use a kinship cutoff of 0.0884).
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
+  --king-cutoff 0.0884 \
+  --out ./02_activities/data/gwa.qc.A2
 ```
 
 (iii) Using the unrelated individual list from part (ii), create a dataset containing only unrelated individuals and name it `gwa.qc_unrelated.A2`.
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
+  --keep ./02_activities/data/gwa.qc.A2.king.cutoff.in.id \
+  --make-bed \
+  --out ./02_activities/data/gwa.qc_unrelated.A2
 ```
 
 #### Question 4: principal component analysis (PCA)
@@ -61,7 +95,16 @@ i)  Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters
     (Hint: Refer to the PCA section in the GWAS Tutorial II. You should use a `*.prune.in` file to select a set of independent SNPs before performing PCA.)
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.qc_unrelated.A2 \
+  --extract ./02_activities/data/gwa.qc.A2.prune.in \
+  --make-bed \
+  --out ./02_activities/data/gwa_A2_unrelated_pruned
+```
+
+```{bash}
+PLINK2 --bfile ./02_activities/data/gwa_A2_unrelated_pruned \
+  --pca 20 \
+  --out ./02_activities/data/gwa_A2_unrelated_PCA
 ```
 
 #### Question 5: GWAS analyses
@@ -77,19 +120,53 @@ Assume that:
 (i) Using PLINK2, run a linear regression GWAS using `gwa.qc_unrelated.A2` as the input dataset. Adjust for PCs 1–3 as covariates.
 
 ```{bash}
-# Your code here...
+PLINK2 --bfile ./02_activities/data/gwa.qc_unrelated.A2 \
+  --glm \
+  --covar ./02_activities/data/gwa_A2_unrelated_PCA.eigenvec \
+  --covar-name PC1,PC2,PC3 \
+  --ci 0.95 \
+  --adjust \
+  --out ./02_activities/data/gwa.qc_A2_assoc
 ```
 
 (ii) Create a Manhattan plot of the GWAS results.
 
 ```{r}
-# Your code here...
+assoc <- data.table::fread("./02_activities/data/gwa.qc_A2_assoc.PHENO1.glm.linear")
+head(assoc)
+assoc_adjusted <- data.table::fread("./02_activities/data/gwa.qc_A2_assoc.PHENO1.glm.linear.adjusted")
+
+assoc_add <- assoc[TEST == "ADD"]
+
+# qqman expects columns named CHR (chromosome), BP (base pair), SNP, and P (p-value).
+# In PLINK2 output they are #CHROM, POS, ID, and P.
+setnames(assoc_add, c("#CHROM","POS","ID"), c("CHR","BP","SNP"))
+
+# Manhattan plot
+png("./02_activities/data/manhattan_plot_A2.png", width = 1200, height = 800, res = 150)
+
+manhattan(assoc_add,
+          chr = "CHR",
+          bp  = "BP",
+         snp = "SNP",
+          p   = "P",
+          xlab = "",
+          ylab = "",
+          suggestiveline = FALSE,
+          cex.axis = 1.5,
+          col = c("lightblue", "lightslateblue"),
+          annotatePval=5e-5) 
+dev.off()
+
 ```
 
 (iii) Create a QQ plot of the GWAS p-values.
 
 ```{r}
-# Your code here...
+# Q-Q plot
+png("./02_activities/data/QQ_plot_A2.png", width = 1200, height = 800, res = 150)
+qq(assoc_add$P, main = "Q-Q plot of GWAS p-values")
+dev.off()
 ```
 
 ### Criteria

diff --git a/...ents/assignment_2_files/libs/bootstrap/bootstrap-138a6193a3bd40baf1e627da441a4734.min.css b/...ents/assignment_2_files/libs/bootstrap/bootstrap-138a6193a3bd40baf1e627da441a4734.min.css