diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
new file mode 100644
index 000000000..1a68db5e5
--- /dev/null
+++ b/.github/pull_request_template.md
@@ -0,0 +1,13 @@
+## Goal
+
+
+## Changes
+-
+
+## Testing
+
+
+## Checklist
+- [ ] Title is a clear sentence (≤ 70 chars)
+- [ ] Commits are signed (`git log --show-signature`)
+- [ ] `submissions/labN.md` updated
diff --git a/labs/lab1.md b/labs/lab1.md
index bb4e226d9..eb319e50f 100644
--- a/labs/lab1.md
+++ b/labs/lab1.md
@@ -91,9 +91,20 @@ git config --global commit.gpgsign true
 git config --global tag.gpgsign true
 ```
 
-Tell the platform your SSH key is a **signing key**:
-- GitHub: Settings → SSH and GPG keys → **New SSH key**, key type **Signing Key**
-- GitLab: Profile → SSH Keys → tick "Usage type: Authentication & signing"
+Now register the key on the platform. GitHub treats **Authentication** and **Signing** as *separate* roles for the same key, so you add it under both:
+
+- **Authentication Key** — lets you `clone` / `fetch` / `push` over SSH (`git@github.com:…`). If you cloned over HTTPS, or have never seen `ssh -T git@github.com` greet you by name, you probably don't have one configured yet — add it now or the `upstream` SSH remote will fail in Lab 2.
+- **Signing Key** — gives your commits the **Verified** badge.
+
+- 🐙 GitHub: Settings → SSH and GPG keys → **New SSH key** → add the **same** `~/.ssh/id_ed25519.pub` **twice**, once with Key type **Authentication Key** and once with **Signing Key**.
+- 🦊 GitLab: Profile → SSH Keys → a single key with **Usage type: Authentication & signing** covers both.
+
+Confirm authentication works before moving on:
+
+```bash
+ssh -T git@github.com
+# expect: Hi YOUR_USERNAME! You've successfully authenticated...
+```
 
 ### 1.4: Make a Signed Commit
 
@@ -303,7 +314,8 @@ In `submissions/lab1.md`:
 ## Common Pitfalls
 
 - 🪤 **PR template doesn't auto-populate** — make sure the template is on `main` *before* opening the PR
-- 🪤 **Commits show "Unverified"** — the SSH key must be added as a *Signing Key* on GitHub (not just an authentication key)
+- 🪤 **Commits show "Unverified"** — the key must also be added as a **Signing Key** on GitHub; an Authentication Key alone won't verify commits (they're separate roles — see §1.3)
+- 🪤 **`git@github.com: Permission denied (publickey)` on clone/fetch/push** — the *reverse* gap: your key is registered for signing but not as an **Authentication Key**. Add it as Authentication too (§1.3) and confirm with `ssh -T git@github.com`. Quick unblock for the *public* upstream: `git remote set-url upstream https://github.com/inno-devops-labs/DevOps-Intro.git`
 - 🪤 **`git push` rejected on `main`** — that's the bonus rule working as designed; push to `feature/lab1` instead
 - 🪤 **`gpg.format=ssh` ignored** — confirm Git ≥ 2.34: `git --version`
 - 🪤 **Pushed to the wrong branch** — `git switch feature/lab1` before `git push`
diff --git a/labs/lab2.md b/labs/lab2.md
index fca7b3f22..aae9acfb3 100644
--- a/labs/lab2.md
+++ b/labs/lab2.md
@@ -223,6 +223,7 @@ git bisect reset
 
 ## Common Pitfalls
 
+- 🪤 **`git@github.com: Permission denied (publickey)` on `git fetch upstream`** — *not* a remote-config bug (the error is at the SSH layer, before Git reads the repo). Your key isn't registered for **authentication** on GitHub — and a **Signing Key** (Lab 1) does *not* count for auth, they're separate roles. Add the same `~/.ssh/id_ed25519.pub` as an **Authentication Key** (Lab 1 §1.3), verify with `ssh -T git@github.com`, then re-run.
To unblock right now, the public upstream fetches over HTTPS with no key: `git remote set-url upstream https://github.com/inno-devops-labs/DevOps-Intro.git`
 - 🪤 **`reset --hard` without committing first** — your *uncommitted* edits really *are* gone (reflog only saves committed work). Always check `git status` first
 - 🪤 **`tag -v` says "no signature"** — you used `git tag NAME` instead of `git tag -a -s NAME -m "..."`
 - 🪤 **Rebase conflicts** — resolve, then `git rebase --continue`. Never `git rebase --skip` unless you know what you're skipping
diff --git a/labs/lab3.md b/labs/lab3.md
index 9f0970b20..87344cfb3 100644
--- a/labs/lab3.md
+++ b/labs/lab3.md
@@ -160,6 +160,23 @@ Tips:
 - GitLab: `parallel:matrix:`
 - Set `fail-fast: false` (GH) or equivalent so a single bad cell doesn't cancel the others — you want to *see* which combo broke
 
+> ⚠️ **The matrix renames your checks — update branch protection (1.6) or your PR blocks forever.** A matrixed `test` job reports as `test (1.23)` and `test (1.24)`; the old required check named `test` will sit at *"Expected — Waiting for status to be reported"* indefinitely, even though every real check is green. Two fixes:
+>
+> 1. **Quick:** in the branch-protection rule, replace `vet`/`test` with the matrixed names (`vet (1.23)`, `vet (1.24)`, `test (1.23)`, `test (1.24)`).
+> 2. **Robust (recommended):** add one aggregation job and require *only* it — then the matrix can change freely without touching protection settings:
+>
+> ```yaml
+> ci-ok:
+>   if: always()
+>   needs: [vet, test, lint]
+>   runs-on: ubuntu-24.04
+>   steps:
+>     - run: |
+>         test "${{ contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') }}" = "false"
+> ```
+>
+> The `if: always()` matters — without it, a failed `needs` job *skips* `ci-ok`, and a skipped required check lets the PR through on some configurations.
+
 ### 2.3: Skip docs-only changes
 
 Edit your trigger so the pipeline runs **only** when something in `app/` or your CI config itself changes. README edits should not burn 4 minutes of CI time.
@@ -179,6 +196,8 @@ Capture wall-clock times from the CI UI for three scenarios:
 
 > 💡 To get a clean baseline, temporarily disable each optimization with a commit, take a screenshot of the run time, then restore.
 
+> 🧪 **Expect the cache rows to be boring — that's the finding, not a failure.** QuickNotes has **zero third-party dependencies** (look at `app/go.mod` — no `require` block, no `go.sum`), so the module cache has nothing to store and total wall-clock barely moves with `cache: true` vs `cache: false`. Most of your 60–80 s is runner provisioning, checkout, and the Go toolchain download — none of which `setup-go`'s cache touches. Report what you measured and *explain why* (that's design question **f** in disguise). To see where caching *would* pay, compare the **per-step** durations (`setup-go`, `go test`) instead of job totals, and note which step a real dependency-heavy project would save on.
+
 ### 2.5: Document
 
 In `submissions/lab3.md`:
@@ -284,6 +303,8 @@ Answer in 4-6 sentences:
 - 🪤 **Forgot `working-directory` (or `cd app`) for Go commands** — Go modules live in `app/`, not the repo root; commands run from the root will fail with "no Go files"
 - 🪤 **`fail-fast: true` (the GH Actions default) in a matrix** — one fail cancels the others; you can't see *which* combo broke
 - 🪤 **Branch protection set on someone else's fork's `main`** — you can only protect *your* fork's `main`. The upstream course repo has its own protection
+- 🪤 **PR stuck on "Expected — Waiting for status to be reported" after adding the matrix** — the matrix renamed `test` → `test (1.23)`/`test (1.24)`, but branch protection still requires the old `test` context, which will never report again.
Update the required-check names or switch to the `ci-ok` aggregation job (see §2.2)
+- 🪤 **"Caching didn't speed anything up"** — on a zero-dependency module that's the *correct* result, not a mistake (see §2.4); don't pad the timing table with numbers you didn't observe
 - 🪤 **`golangci-lint` version not pinned** — "latest" pulls a new release tomorrow that may flag your code with new rules. Pin `v2.5.0` exactly
 - 🪤 **GitLab CI: incorrect anchor syntax** (`<<: *name`) — GitLab is strict; use the in-platform CI Lint tool (`Project → CI/CD → Editor → Validate`)
 - 🪤 **Cache hits expire after 7 days of inactivity on GH** — that's expected; the cache key is what protects you against poisoning
diff --git a/submissions/lab1.md b/submissions/lab1.md
new file mode 100644
index 000000000..33da79068
--- /dev/null
+++ b/submissions/lab1.md
@@ -0,0 +1,93 @@
+# Lab 1 submission
+
+## Task 1
+Request:
+```
+curl -s http://localhost:8080/health | python3 -m json.tool
+```
+
+Answer:
+```
+{
+    "notes": 5,
+    "status": "ok"
+}
+```
+
+Request:
+```
+curl -s http://localhost:8080/notes | python3 -m json.tool
+```
+
+Answer:
+```
+[
+    {
+        "id": 2,
+        "title": "Read app/main.go first",
+        "body": "Start by understanding the entry point \u2014 env vars, signal handling, graceful shutdown.",
+        "created_at": "2026-01-15T10:05:00Z"
+    },
+    {
+        "id": 3,
+        "title": "DevOps mantra",
+        "body": "If it hurts, do it more often.",
+        "created_at": "2026-01-15T10:10:00Z"
+    },
+    {
+        "id": 4,
+        "title": "Endpoint cheat-sheet",
+        "body": "GET /notes GET /notes/{id} POST /notes DELETE /notes/{id} GET /health GET /metrics",
+        "created_at": "2026-01-15T10:15:00Z"
+    },
+    {
+        "id": 1,
+        "title": "Welcome to QuickNotes",
+        "body": "This is the project you'll containerize, deploy, monitor, and harden across all 10 labs.",
+        "created_at": "2026-01-15T10:00:00Z"
+    }
+]
+```
+
+Request:
+```
+curl -s -X POST http://localhost:8080/notes \
+  -H 'Content-Type: application/json' \
+  -d '{"title":"hello","body":"first
POST"}' | python3 -m json.tool
+```
+
+Answer:
+```
+{
+    "id": 5,
+    "title": "hello",
+    "body": "first POST",
+    "created_at": "2026-06-05T10:51:13.503497Z"
+}
+```
+
+```
+git log --show-signature -1
+
+commit 843a27f3ade36ea41d723f168fb3f8c9c1f7b70c (HEAD -> feature/lab1, origin/feature/lab1)
+Good "git" signature for 15dnau@gmail.com with ED25519 key SHA256:k0n7/mx/uRX52s/zu9pxaN+h/IKnBJzcnuybJgthVkM
+Author: Dmitrii <15dnau@gmail.com>
+Date:   Fri Jun 5 14:03:50 2026 +0300
+
+    docs(lab1): start submission
+
+    Signed-off-by: Dmitrii <15dnau@gmail.com>
+```
+
+### Verified commit
+
+![Verified commit](verified.png "Verified commit")
+
+When we work with GitHub, we trust that a commit attributed to Dmitrii was actually made by Dmitrii. However, Git itself does not verify a commit's author: anyone can set any name and email and commit under it, which is why we sign commits and let GitHub mark them as verified.
+
+### GitHub Community
+#### Why starring repositories matters in open source
+For a project, stars are a public signal of trust and relevance; for the person starring, a star also works as a bookmark.
+
+#### How following developers helps in team projects and professional growth
+Following your colleagues on GitHub gives you a low-noise feed of their activity.
\ No newline at end of file
diff --git a/submissions/lab4.md b/submissions/lab4.md
new file mode 100644
index 000000000..e6eb32d18
--- /dev/null
+++ b/submissions/lab4.md
@@ -0,0 +1,294 @@
+## Task 1 — Trace a Request End-to-End (6 pts)
+
+### 1.1 Capture
+
+App started in terminal A with `cd app/ && go run .` (listens on `:8080`).
+Capture in terminal B (sudo pre-authenticated with `sudo -v` so it could run
+backgrounded without a suspended password prompt):
+
+```bash
+sudo tcpdump -i lo0 -s 0 -w lab4-trace.pcap 'tcp port 8080' &
+TCPDUMP_PID=$!
+```
+
+One request fired in terminal C:
+
+```bash
+curl -v -X POST http://localhost:8080/notes \
+  -H 'Content-Type: application/json' \
+  -d '{"title":"trace me","body":"in flight"}'
+```
+
+`curl -v` output (note: connection went over IPv6 loopback `::1`):
+
+```
+* Connected to localhost (::1) port 8080
+> POST /notes HTTP/1.1
+> Host: localhost:8080
+> User-Agent: curl/8.7.1
+> Content-Type: application/json
+> Content-Length: 39
+>
+< HTTP/1.1 201 Created
+< Content-Type: application/json
+< Date: Tue, 16 Jun 2026 20:46:57 GMT
+< Content-Length: 90
+<
+{"id":7,"title":"trace me","body":"in flight","created_at":"2026-06-16T20:46:57.566588Z"}
+* Connection #0 to host localhost left intact
+```
+
+Capture stopped: `sudo kill $TCPDUMP_PID` → tcpdump reported **12 packets captured**.
+
+### 1.2 Decode + annotate
+
+Decoded with `sudo tcpdump -r lab4-trace.pcap -nn -A | tee lab4-trace.txt`.
+The whole transaction is loopback IPv6 (`::1`), client port `55246` ↔ server port `8080`.
+Annotated packet walk:
+
+**① TCP three-way handshake** (packets 1–3):
+
+```
+[S]  ::1.55246 > ::1.8080: Flags [S], seq 1061962572                ← SYN (client → server)
+[S.] ::1.8080 > ::1.55246: Flags [S.], seq 1185686777, ack ...573  ← SYN/ACK (server → client)
+[.]  ::1.55246 > ::1.8080: Flags [.], ack 1                         ← ACK (client → server)
+```
+Connection established. (Plus an extra server `[.]` ack — normal on loopback.)
+
+**② HTTP request line + JSON body** (packet 5, `[P.]` PUSH, length 174):
+
+```
+::1.55246 > ::1.8080: Flags [P.], seq 1:175 ... HTTP: POST /notes HTTP/1.1
+  POST /notes HTTP/1.1
+  Host: localhost:8080
+  User-Agent: curl/8.7.1
+  Content-Type: application/json
+  Content-Length: 39
+
+  {"title":"trace me","body":"in flight"}
+```
+
+**③ HTTP response line + JSON body** (packet 7, `[P.]` PUSH, length 203):
+
+```
+::1.8080 > ::1.55246: Flags [P.], seq 1:204 ...
HTTP: HTTP/1.1 201 Created
+  HTTP/1.1 201 Created
+  Content-Type: application/json
+  Date: Tue, 16 Jun 2026 20:46:57 GMT
+  Content-Length: 90
+
+  {"id":7,"title":"trace me","body":"in flight","created_at":"2026-06-16T20:46:57.566588Z"}
+```
+
+**④ Connection close** (FIN handshake, packets 9–12):
+
+```
+[F.] ::1.55246 > ::1.8080: Flags [F.], seq 175  ← client FIN
+[.]  ::1.8080 > ::1.55246: Flags [.], ack 176   ← server ACKs the FIN
+[F.] ::1.8080 > ::1.55246: Flags [F.], seq 204  ← server FIN
+[.]  ::1.55246 > ::1.8080: Flags [.], ack 205   ← client ACKs → connection closed
+```
+
+Graceful four-way FIN close (no `RST`). Full raw capture in `lab4-trace.txt`.
+
+### 1.3 Five debugging commands
+
+**1. What's listening on :8080?** (`ss` does not exist on macOS → `lsof`)
+
+```
+$ lsof -nP -iTCP:8080 -sTCP:LISTEN
+COMMAND     PID          USER   FD   TYPE             DEVICE NODE NAME
+quicknote 34042 dmitrijnaumov    5u  IPv6 0x19c0...877      TCP *:8080 (LISTEN)
+```
+→ The `quicknotes` process (PID 34042; `lsof` truncates the name to 9 characters) owns `*:8080` on a dual-stack
+IPv6 socket that accepts both `::1` and `127.0.0.1`, so the `::1` in curl's output reflects curl's address choice for `localhost`, not an IPv6-only bind.
+
+**2. Routes from this host** (`ip route` → `netstat -rn`, relevant rows):
+
+```
+Destination        Gateway            Flags        Netif
+default            192.168.0.1        UGScIg       en0     ← default route via Wi-Fi gateway
+127                127.0.0.1          UCS          lo0     ← all 127/8 stays on loopback
+127.0.0.1          127.0.0.1          UH           lo0
+::1                ::1                UHL          lo0     ← IPv6 loopback (used by this request)
+```
+(Full table — incl. many VPN `utun4` host routes — omitted for brevity.)
+
+**3. Reachability on loopback** (`mtr` → `traceroute`):
+
+```
+$ traceroute localhost
+traceroute to localhost (127.0.0.1), 64 hops max
+ 1  localhost (127.0.0.1)  1.687 ms  1.040 ms  0.702 ms
+```
+→ Single hop, sub-2 ms: loopback never leaves the host. (This probe went over IPv4 `127.0.0.1`; `traceroute6 ::1` would follow the `::1` path curl actually used.)
+
+**4. DNS works** (`dig` — same on macOS):
+
+```
+$ dig +short example.com @1.1.1.1
+172.66.147.243
+104.20.23.154
+```
+→ Resolution against Cloudflare `1.1.1.1` succeeds.
+
+**5. Service logs** — no `journalctl`/journald on macOS.
QuickNotes was run in the
+foreground via `go run .`, so its log line
+(`quicknotes listening on :8080 (notes loaded: N)`) prints to terminal A's stderr (the timestamp format is Go's standard `log` output, which defaults to stderr);
+that terminal *is* the log sink here.
+
+### 1.4 Reflection — what would I check first on a 502?
+
+A 502 Bad Gateway is a **proxy/upstream** error: the front proxy (nginx, Caddy,
+a load balancer) accepted my connection but got no valid response from the
+backend. So I would not start at the client — I'd work the upstream link. First,
+is the QuickNotes process actually alive and **listening** on the expected port
+(`lsof -nP -iTCP:8080 -sTCP:LISTEN`)? If it's not listening, it crashed or never
+bound — check its logs for a panic or `bind` error. If it *is* listening,
+is the proxy pointed at the right host:port, and is the backend closing connections
+before answering (for example, a server-side write timeout)? A merely slow backend usually trips the proxy's own timeout and surfaces as 504 instead. In short: 502 means "the thing
+behind the gateway is broken or unreachable," so I trace proxy → upstream
+process → port → logs, in that order.
+
+---
+
+## Task 2 — Outside-In Debugging on a Broken Deploy (4 pts)
+
+### 2.1 Reproduce the break (port conflict)
+
+```bash
+cd app/
+ADDR=:8080 go run . &                                  # first instance
+PID1=$!
+sleep 2
+ADDR=:8080 go run . 2>&1 | tee /tmp/qn-broken.log &    # second instance
+```
+
+First instance bound cleanly:
+
+```
+[1] 40010
+2026/06/16 23:54:30 quicknotes listening on :8080 (notes loaded: 7)
+```
+
+Second instance failed — **exact root-cause error**:
+
+```
+2026/06/16 23:54:54 quicknotes listening on :8080 (notes loaded: 7)
+2026/06/16 23:54:54 listen: listen tcp :8080: bind: address already in use
+exit status 1
+[2]  + exit 1     ADDR=:8080 go run . 2>&1 | tee /tmp/qn-broken.log
+```
+
+Root cause: **`listen tcp :8080: bind: address already in use`** (`EADDRINUSE`):
+two processes cannot listen on the same `host:port`. Note that the failed instance still logged `listening on :8080` first, so that line alone does not prove the bind succeeded.
+
+### 2.2 Outside-in chain (command → output → decision)
+
+**1.
Is it running?** (`ps`)
+
+```
+$ ps -ef | grep -E "go run|quicknotes" | grep -v grep
+  501 40010 39989 ... go run .
+  501 40014 40010 ... /Users/.../go-build/.../quicknotes
+```
+→ Yes, and there are **two** processes: the `go run .` wrapper (PID 40010) and
+its **child compiled binary** `quicknotes` (PID 40014, parent = 40010). The child
+is the real server. *Decision: process alive; move inward.*
+
+**2. Is it listening?** (`lsof`, replaces `ss -tlnp`)
+
+```
+$ lsof -nP -iTCP:8080
+COMMAND     PID          USER   FD   TYPE             DEVICE NODE NAME
+quicknote 40014 dmitrijnaumov    5u  IPv6 0x3677...01c0      TCP *:8080 (LISTEN)
+```
+→ The **child** PID 40014 (not the `go run` wrapper) owns `*:8080`. *Decision:
+exactly one listener — the second instance never bound, consistent with EADDRINUSE.*
+
+**3. Reachable from host?** (`curl` health probe)
+
+```
+$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/health
+200
+```
+→ The surviving first instance serves fine. *Decision: not a connectivity/app
+fault; the failure is purely the second instance trying to bind a port that was already taken.*
+
+**4. Firewall blocking?** (`pfctl`, replaces `iptables`/`nft`)
+
+```
+$ sudo pfctl -sr
+scrub-anchor "com.apple/*" all fragment reassemble
+anchor "com.apple/*" all
+```
+→ Only default macOS anchors, no rules blocking 8080. *Decision: not a firewall
+problem — network layer ruled out.*
+
+**5. DNS?** (`dig`)
+
+```
+$ dig +short localhost
+(empty)
+```
+→ Empty is **expected**: `localhost` resolves from `/etc/hosts`, not DNS, and
+`dig` only queries DNS. *Decision: name resolution not involved — "it's not DNS."*
+
+**Conclusion of the chain:** running ✓, listening ✓ (one instance), reachable ✓,
+firewall clear, DNS irrelevant → the only fault is the **second process trying to
+bind an already-occupied port**.
+
+### 2.3 Repair + re-verify — and a process-tree gotcha
+
+```bash
+kill $PID1          # PID1 = 40010
+sleep 1
+ADDR=:8080 go run .
&
+sleep 1
+curl -s http://localhost:8080/health
+```
+
+Result — the new instance **still failed to bind**, yet `/health` returned 200:
+
+```
+[1]  + terminated  ADDR=:8080 go run .
+2026/06/16 23:56:03 listen: listen tcp :8080: bind: address already in use
+exit status 1
+$ curl -s http://localhost:8080/health
+{"notes":7,"status":"ok"}
+```
+
+**Why:** `kill $PID1` killed only the `go run` **wrapper** (40010), not its child
+listener `quicknotes` (40014). The child was **orphaned and kept holding `:8080`**,
+so the fresh `go run` hit the same `EADDRINUSE`, and the 200 from `/health` is the
+*old orphaned listener* still answering — not a clean restart. A classic false-green.
+
+**Correct repair** — kill the actual listener found via `lsof`, then restart:
+
+```bash
+lsof -nP -iTCP:8080 -sTCP:LISTEN   # find the real listener PID (the quicknotes child, 40014)
+kill 40014                         # or: pkill quicknotes, or: kill $(lsof -t -iTCP:8080 -sTCP:LISTEN)
+sleep 1
+ADDR=:8080 go run . &              # now binds cleanly
+curl -s http://localhost:8080/health
+```
+
+### 2.4 Blameless mini-postmortem (≤200 words)
+
+A second instance exited immediately with `bind: address already in use`. No one "did it
+wrong" — the system simply *permitted* two processes to contend for one `host:port`
+with nothing enforcing single-instance ownership. The incident exposed a second,
+sharper systemic gap: killing the `go run` wrapper did **not** stop its child
+server. Wrapper and listener are different PIDs, and a plain `kill` of the parent
+orphaned the child, which kept holding the port. The misleading `200` from the
+health check then made a *failed* restart look successful.
+
+What's systemic: process *lifecycle* and *ownership* were both unmanaged. A real
+supervisor (launchd, systemd, or a container runtime) tracks the whole process
+group/cgroup, so stopping the unit stops every child and frees the port atomically,
+and it refuses to start a duplicate.
Preventive tooling: readiness/liveness probes
+that distinguish "bound and serving" from "stale orphan answering," a deploy
+pre-check that the port is free (`lsof`/`ss`) before launch, and dynamic port
+allocation or per-container network namespaces so two instances can never collide
+in the first place.
+***
diff --git a/submissions/verified.png b/submissions/verified.png
new file mode 100644
index 000000000..e7f1e43dd
Binary files /dev/null and b/submissions/verified.png differ