Chandra179/rocks

Infrastructure Engineering for Software Engineers

Learning path: SWE -> Infra Engineer. Ordered by dependency. ~16 weeks.

Phase 1: Foundation (Week 1-4)

Linux Internals

  • Processes: fork/exec, signals, PID namespaces
  • cgroups v2: cpu/memory/io limits
  • Filesystems: ext4, overlayfs, inodes, /proc, /sys
  • System calls and tracing: strace, lsof, perf
  • Exercise: Write a minimal container in ~50 lines bash using unshare
  • Tools: strace, lsof, bpftrace, unshare
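
The fork/exec item above can be sketched in a few lines. This is a minimal illustration, assuming a Linux or other POSIX host; the helper name `run_child` is hypothetical and not part of any tool listed here. It forks, replaces the child's process image with the requested program, and decodes the wait status in the parent.

```python
import os


def run_child(argv):
    """Fork, exec argv in the child, and return the child's exit code.

    Returns a negative signal number if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            # exec failed (for example, command not found).
            # 127 mirrors the convention shells use for that case.
            os._exit(127)
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

Calling `run_child(["sh", "-c", "exit 3"])` under `strace -f` shows the underlying process-creation, `execve`, and wait system calls, which ties this item to the tracing tools in the same list.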

Networking

  • OSI layers 3-7, TCP handshake/congestion control, UDP
  • DNS resolution chain, dig, tcpdump
  • HTTP/1.1, HTTP/2, TLS 1.3 handshake
  • Load balancers (L4 vs L7), reverse proxies, NAT
  • Exercise: Capture & analyze a TLS handshake with tcpdump + Wireshark
  • Tools: tcpdump, Wireshark, dig, curl, netcat

Shell & Scripting

  • bash scripting, jq, awk, sed
  • Process management, signal trapping
  • Exercise: Write a deployment health-check script polling an API with retries
  • Tools: bash, jq, awk, sed, xargs
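
The health-check exercise asks for bash; the retry logic itself is language-independent, so here is a rough Python sketch of it for reference. The function names `http_ok` and `wait_healthy`, the default attempt count, and the doubling backoff are all assumptions made for illustration, not requirements of the exercise.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if url answers with a 2xx status, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures, timeouts.
        return False


def wait_healthy(probe, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or attempts run out.

    The delay doubles after each failed attempt (1s, 2s, 4s, ...).
    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

A deployment script might call `wait_healthy(lambda: http_ok("http://localhost:8080/healthz"))` and exit non-zero when the result is `None`; the URL is a placeholder. Passing `sleep` as a parameter keeps the function testable without real waiting.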

Phase 2: Core Infra Primitives (Week 5-8)

Containers

  • Docker internals: overlayfs layers, OCI runtime spec, runc
  • Image building: multi-stage builds, layer caching, distroless
  • Container networking: bridge, host, CNI
  • Exercise: Build multi-service app with Docker Compose, then break networking
  • Tools: Docker, Docker Compose, dive (image layer inspector)

Infrastructure as Code (IaC)

  • Declarative vs imperative. State management.
  • Terraform: providers, resources, data sources, modules, remote state
  • Pulumi: general-purpose language for infra
  • Exercise: Provision VPC + subnet + EC2, destroy, re-provision
  • Tools: Terraform, Pulumi, OpenTofu

CI/CD

  • Pipeline stages: build, test, scan, deploy
  • Artifact management, container registries
  • Deployment strategies: rolling, blue-green, canary
  • Exercise: GitHub Actions pipeline builds + tests + pushes Docker image
  • Tools: GitHub Actions, GitLab CI, ArgoCD (intro)

GitOps

  • Declarative config in git as source of truth
  • Pull-based reconciliation (ArgoCD/Flux)
  • Branching models for infra repos
  • Exercise: Deploy app via ArgoCD with automatic sync from git
  • Tools: ArgoCD, Flux, Helm

Phase 3: Orchestration & Runtime (Week 9-12)

Kubernetes

  • Architecture: control plane, kubelet, etcd, scheduler
  • Core objects: Pod, Deployment, Service, Ingress, ConfigMap, Secret
  • Scheduling: affinity, taints/tolerations, topology spread
  • Storage: PV, PVC, StorageClass, CSI
  • RBAC: roles, bindings, service accounts
  • Exercise: Run 3-tier app (frontend, API, DB) on kind/k3s with proper probes
  • Tools: kind, k3s, kubectl, k9s, kubectx

Cloud Services (AWS focus)

  • Compute: EC2, Auto Scaling Groups, spot instances
  • Networking: VPC, subnets, security groups, NAT gateway
  • Storage: S3, EBS, EFS
  • IAM: policies, roles, trust relationships
  • Managed services: RDS, EKS, SQS, Lambda
  • Exercise: Architect & deploy same 3-tier app using AWS managed services
  • Tools: AWS CLI, aws-vault, LocalStack

Service Mesh & Networking

  • Sidecar pattern, Envoy proxy
  • Traffic management: routing, retries, circuit breaking, timeouts
  • mTLS, authentication policies
  • Exercise: Install Istio on k8s cluster, configure traffic splitting
  • Tools: Istio, Linkerd, Envoy, Cilium (intro)

Phase 4: Production Practices (Week 13-16)

Observability

  • Pillars: logs, metrics, traces
  • Structured logging (JSON), log aggregation (Loki/ELK)
  • Metrics: RED (Rate/Errors/Duration), USE (Utilization/Saturation/Errors)
  • Prometheus: metrics types, PromQL, alerting rules
  • Distributed tracing: OpenTelemetry, Jaeger
  • Exercise: Instrument 3-tier app with OpenTelemetry, dashboards in Grafana
  • Tools: Prometheus, Grafana, Loki, OpenTelemetry, Jaeger
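
To make the RED method concrete, the sketch below computes rate, error ratio, and a p95 duration from a batch of request records. It is a simplified illustration: the `Request` record, the "status >= 500 counts as an error" rule, and the nearest-rank percentile are assumptions. Prometheus itself derives these from counters and histograms rather than raw records.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int         # HTTP status code
    duration_ms: float  # time taken to serve the request


def red_summary(requests, window_seconds):
    """Summarize a window of requests as Rate, Errors, Duration."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    if total:
        # Nearest-rank p95, using integer math to avoid float rounding.
        rank = (95 * total + 99) // 100
        p95 = durations[rank - 1]
    else:
        p95 = None
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": p95,
    }
```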

SRE & Reliability

  • SLI -> SLO -> Error Budget, with the SLA as the external contract
  • Incident management: runbooks, blameless postmortems
  • Toil reduction: automate repetitive ops work
  • Exercise: Define SLIs for 3-tier app, simulate failures, write postmortem
  • Tools: PagerDuty (concept), incident.io (concept)
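
The error budget is simple arithmetic on the SLO: whatever the target leaves over is the amount of failure you may spend. A small sketch, assuming an availability-style SLO expressed as a fraction (for example 0.999); the function names and the 30-day window are illustrative choices.

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of full downtime allowed by an availability SLO over the window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo, total_requests, failed_requests):
    """Fraction of the request-based error budget still unspent.

    1.0 means untouched, 0.0 means exhausted, negative means overspent.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures
```

With a 99.9% SLO over 30 days, the budget works out to about 43.2 minutes. With the same SLO, 1,000,000 requests and 250 failures leave roughly 75% of the budget.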

Security

  • Least privilege: IAM roles, k8s RBAC, network policies
  • Secrets management: Vault, Sealed Secrets, SOPS
  • Supply chain: image signing (cosign), SBOM generation (syft), vulnerability scanning (grype)
  • Vulnerability scanning in CI
  • Exercise: Scan container images, enforce network policies, rotate secrets with Vault
  • Tools: HashiCorp Vault, cosign, trivy, kyverno

Capacity & Cost

  • Resource requests/limits, HPA, VPA, cluster autoscaler
  • Right-sizing, idle resource detection
  • FinOps: tagging, cost allocation, reserved instances
  • Exercise: Set up HPA based on custom Prometheus metrics, analyze cost with Kubecost
  • Tools: Kubecost, kube-state-metrics, Goldilocks
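
Before wiring up an HPA, it helps to know the core scaling rule from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below implements only that rule plus min/max clamping. The function name and default bounds are assumptions, and the real controller also handles stabilization windows, unready pods, and missing metrics.

```python
import math


def desired_replicas(current, metric, target,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA core formula for a single metric.

    current: current replica count
    metric:  observed metric value (e.g. average requests/s per pod)
    target:  desired metric value per pod
    """
    ratio = metric / target
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    desired = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas observing 200 against a target of 100 scale to 8, while 105 against 100 falls inside the tolerance and stays at 4.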

Learning Methodology

  1. Read docs, then break things. Understanding comes from fixing failures.
  2. For every tool: install, configure, break, fix, then read the source code.
  3. Build from scratch before using managed services. Know what each abstraction hides.
  4. Adopt a production-debugging mindset: learn strace, tcpdump, and perf early.
  5. Write about it. Blogging forces clarity.

Key References
