Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added 02_activities/assignments/QQ_plot_A2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1,071 changes: 1,071 additions & 0 deletions 02_activities/assignments/assignment_2.html

Large diffs are not rendered by default.

101 changes: 89 additions & 12 deletions 02_activities/assignments/assignment_2.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,32 @@ You will need **PLINK2** installed and available in your PATH. Please follow the

Before you run any models, first get familiar with the dataset. You may find `data.table::fread()` in R helpful for reading `.bim` and `.fam` files.

```{r setup, include=FALSE}
# Input files are expected in ./02_activities/data/.
#knitr::opts_chunk$set(echo = TRUE)
knitr::opts_knit$set(root.dir = normalizePath("../../"))
library(data.table)
library(qqman)
```

(i) Read the `.fam` file. How many samples does the dataset contain?

```{r}
# Your code here...
```{bash}
head ./02_activities/data/gwa.A2.fam
wc -l ./02_activities/data/gwa.A2.fam
```

There are 4000 samples in the dataset.

(ii) Read the `.bim` file. How many SNPs does the dataset contain?

```{r}
# Your code here...
```{bash}
head ./02_activities/data/gwa.A2.bim
wc -l ./02_activities/data/gwa.A2.bim
```

There are 306102 SNPs in the dataset.


#### Question 2: Quality control (QC)

Expand All @@ -29,7 +43,13 @@ Now we will perform QC using PLINK2 for the genotype files in `gwa.A2`.
(i) Using PLINK2 from the command line (bash), perform basic QC with the following filters: MAF ≥ 0.05, SNP missingness (`--geno`) ≤ 0.01, individual missingness (`--mind`) ≤ 0.10, and HWE p-value ≥ 0.00005, and output the QC’ed dataset as `gwa.qc.A2`.

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.A2 \
--maf 0.05 \
--geno 0.01 \
--mind 0.10 \
--hwe 0.00005 \
--make-bed \
--out ./02_activities/data/gwa.qc.A2
```

#### Question 3: Relatedness
Expand All @@ -39,19 +59,33 @@ In this question, you will use **PLINK2’s built-in KING-robust kinship** (`--k
i) Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters: `--indep-pairwise 500 50 0.05`, and then generate a new dataset containing only the pruned SNPs.

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
--indep-pairwise 500 50 0.05 \
--out ./02_activities/data/gwa.qc.A2
```

```{bash}
PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
--extract ./02_activities/data/gwa.qc.A2.prune.in \
--make-bed \
--out ./02_activities/data/gwa.qc.A2.pruned
```

(ii) Use PLINK2 on the LD-pruned dataset to identify a set of unrelated individuals up to (approximately) 2nd-degree relatives (use a kinship cutoff of 0.0884).

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
--king-cutoff 0.0884 \
--out ./02_activities/data/gwa.qc.A2
```

(iii) Using the unrelated individual list from part (ii), create a dataset containing only unrelated individuals and name it `gwa.qc_unrelated.A2`.

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.qc.A2 \
--keep ./02_activities/data/gwa.qc.A2.king.cutoff.in.id \
--make-bed \
--out ./02_activities/data/gwa.qc_unrelated.A2
```

#### Question 4: principal component analysis (PCA)
Expand All @@ -61,7 +95,16 @@ i) Perform LD pruning on `gwa.qc.A2` using PLINK2 with the following parameters
(Hint: Refer to the PCA section in the GWAS Tutorial II. You should use a `*.prune.in` file to select a set of independent SNPs before performing PCA.)

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.qc_unrelated.A2 \
--extract ./02_activities/data/gwa.qc.A2.prune.in \
--make-bed \
--out ./02_activities/data/gwa_A2_unrelated_pruned
```

```{bash}
PLINK2 --bfile ./02_activities/data/gwa_A2_unrelated_pruned \
--pca 20 \
--out ./02_activities/data/gwa_A2_unrelated_PCA
```

#### Question 5: GWAS analyses
Expand All @@ -77,19 +120,53 @@ Assume that:
(i) Using PLINK2, run a linear regression GWAS using `gwa.qc_unrelated.A2` as the input dataset. Adjust for PCs 1–3 as covariates.

```{bash}
# Your code here...
PLINK2 --bfile ./02_activities/data/gwa.qc_unrelated.A2 \
--glm \
--covar ./02_activities/data/gwa_A2_unrelated_PCA.eigenvec \
--covar-name PC1,PC2,PC3 \
--ci 0.95 \
--adjust \
--out ./02_activities/data/gwa.qc_A2_assoc
```

(ii) Create a Manhattan plot of the GWAS results.

```{r}
# Your code here...
assoc <- data.table::fread("./02_activities/data/gwa.qc_A2_assoc.PHENO1.glm.linear")
head(assoc)
assoc_adjusted <- data.table::fread("./02_activities/data/gwa.qc_A2_assoc.PHENO1.glm.linear.adjusted")

assoc_add <- assoc[TEST == "ADD"]

# qqman expects columns named CHR (chromosome), BP (base pair), SNP, and P (p-value).
# In PLINK2 output they are #CHROM, POS, ID, and P.
setnames(assoc_add, c("#CHROM","POS","ID"), c("CHR","BP","SNP"))

# Manhattan plot
png("./02_activities/data/manhattan_plot_A2.png", width = 1200, height = 800, res = 150)

manhattan(assoc_add,
chr = "CHR",
bp = "BP",
snp = "SNP",
p = "P",
xlab = "",
ylab = "",
suggestiveline = FALSE,
cex.axis = 1.5,
col = c("lightblue", "lightslateblue"),
annotatePval=5e-5)
dev.off()

```

(iii) Create a QQ plot of the GWAS p-values.

```{r}
# Your code here...
# Q-Q plot
png("./02_activities/data/QQ_plot_A2.png", width = 1200, height = 800, res = 150)
qq(assoc_add$P, main = "Q-Q plot of GWAS p-values")
dev.off()
```

### Criteria
Expand Down

Large diffs are not rendered by default.

Loading