This repository is the single source of truth for the blueberry-k3s K3S cluster running on Raspberry Pi 4.
Architecture Philosophy: Minimal, deterministic, reproducible GitOps for edge/SBC environments. See AGENTS.md for detailed guardrails and constraints.
- Cluster Name: blueberry-k3s
- Hardware: Raspberry Pi 4 (8GB RAM)
- Architecture: aarch64
- Kubernetes: K3S (single server node, may scale to +2 agent nodes)
- Storage: USB-attached
- GitOps: FluxCD v2.4.0
```
.
├── clusters/
│   └── blueberry-k3s/           # Cluster-specific entrypoint
│       ├── flux-system/         # Flux controllers and GitRepository source
│       ├── infrastructure.yaml  # Infrastructure Kustomization
│       └── kustomization.yaml   # Root composition
├── infrastructure/
│   └── monitoring/              # Prometheus + Grafana observability stack
├── apps/                        # Application workloads (empty initially)
├── .github/
│   └── workflows/               # CI validation (lint, kubeconform, policy checks)
└── AGENTS.md                    # Repository guardrails and architectural philosophy
```
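The `infrastructure.yaml` entrypoint is a Flux Kustomization that points the cluster at the `infrastructure/` directory. A minimal sketch of what such a Kustomization looks like (the interval and other values are illustrative; the committed file is authoritative):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m           # illustrative value
  path: ./infrastructure
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```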
Before bootstrapping Flux, ensure:
- K3S installed on blueberry-k3s
  - K3S should be configured with your desired settings
  - `kubectl` access to the cluster
- Flux CLI installed (v2.4.0)

  ```bash
  curl -s https://fluxcd.io/install.sh | sudo bash -
  ```

- Git repository access
  - SSH key or personal access token configured
  - Write access to this repository
- No port conflicts
  - Cockpit runs on port 9090
  - Prometheus configured to use port 9091
  - Grafana uses port 3000
1. Fork or clone this repository

2. Update the GitRepository URL in `clusters/blueberry-k3s/flux-system/gotk-sync.yaml`:

   ```yaml
   spec:
     url: ssh://git@github.com/YOUR_ORG/YOUR_REPO
   ```

3. Bootstrap Flux:

   ```bash
   flux bootstrap git \
     --url=ssh://git@github.com/YOUR_ORG/YOUR_REPO \
     --branch=main \
     --path=clusters/blueberry-k3s \
     --private-key-file=/path/to/ssh/key
   ```

   Or, if using GitHub directly:

   ```bash
   flux bootstrap github \
     --owner=YOUR_ORG \
     --repository=YOUR_REPO \
     --branch=main \
     --path=clusters/blueberry-k3s \
     --personal
   ```

4. Verify reconciliation:

   ```bash
   flux get kustomizations
   flux get helmreleases -A
   ```
5. Verify deployment:

   ```bash
   # Check Flux reconciliation
   flux get kustomizations
   flux get helmreleases -A

   # Verify all monitoring pods are running
   kubectl get pods -n monitoring

   # Expected pods:
   # - kube-prometheus-stack-operator-*
   # - prometheus-kube-prometheus-stack-prometheus-0
   # - grafana-*
   # - blackbox-exporter-*
   # - speedtest-exporter-*
   # - node-exporter-* (one per node)
   ```

6. Validate internet monitoring:

   ```bash
   # Check Prometheus targets (all should be "UP")
   kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
   # Visit http://localhost:9091/targets
   # Look for: blackbox-http (3 targets), speedtest (1 target), node (1 target)
   ```

7. Access Kubernetes Dashboard:

   ```bash
   # Get read-only access token (valid for 1 year)
   kubectl -n kubernetes-dashboard create token dashboard-viewer --duration=8760h
   # Access dashboard at https://<node-ip>:30800
   # Login with the token from above
   # Note: Browser will warn about self-signed cert - this is expected
   ```

8. Access Grafana:

   ```bash
   kubectl port-forward -n monitoring svc/grafana 3000:80
   ```

   - URL: http://localhost:3000
   - Default credentials: `admin` / `admin` (change immediately)

9. Access Prometheus (optional):

   ```bash
   kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
   ```
Prometheus (kube-prometheus-stack v67.4.0):
- Prometheus Operator + Prometheus server
- Port: 9091 (to avoid Cockpit conflict on 9090)
- Retention: 30 days / 10GB (increased for internet monitoring historical data)
- Scrape interval: 60s (tuned for edge IO constraints)
- Resource limits: 1 CPU / 1.5GB RAM
- Disabled: Alertmanager, built-in node-exporter, kube-state-metrics (can enable later)
- No persistence (emptyDir) - can be added later if needed
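For orientation, the settings above correspond roughly to HelmRelease values along these lines. This is an illustrative sketch only; the exact keys and values are pinned in `infrastructure/monitoring/prometheus-helmrelease.yaml`.

```yaml
# Sketch of kube-prometheus-stack values (illustrative)
alertmanager:
  enabled: false
nodeExporter:
  enabled: false          # node-exporter is deployed separately (see below)
kubeStateMetrics:
  enabled: false
prometheus:
  service:
    port: 9091            # avoid Cockpit conflict on 9090
  prometheusSpec:
    retention: 30d
    retentionSize: 10GB
    scrapeInterval: 60s
    resources:
      limits:
        cpu: "1"
        memory: 1536Mi
```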
Grafana (chart v8.5.2 / image 11.4.0):
- Pre-configured Prometheus datasource
- Default dashboards: Kubernetes cluster overview, pod monitoring, internet connection, node metrics
- Resource limits: 500m CPU / 512MB RAM
- No persistence (emptyDir)
- Default credentials: `admin` / `admin` (⚠️ change after first login)
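The pre-configured datasource corresponds to Grafana chart values roughly like the following. This is a sketch under the assumption that the datasource is provisioned through the chart's `datasources` values; the Grafana HelmRelease in `infrastructure/monitoring/` is authoritative.

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9091
        access: proxy
        isDefault: true
persistence:
  enabled: false            # emptyDir only
resources:
  limits:
    cpu: 500m
    memory: 512Mi
```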
Internet Monitoring Exporters:
Internet monitoring tracks connectivity quality (bandwidth, latency, uptime) to detect ISP issues or network degradation.
Blackbox Exporter (prom/blackbox-exporter:v0.25.0):
- HTTP/ICMP probing for uptime and latency monitoring
- Default targets: google.com, github.com, cloudflare.com (customizable via ConfigMap)
- Scrape interval: 30s
- Resource usage: 50m CPU / 64Mi RAM (limits: 200m / 128Mi)
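Probing behaviour is defined by the exporter's `blackbox.yml` modules. A minimal sketch of typical HTTP/ICMP modules, assuming the standard blackbox-exporter config format (the ConfigMap in `infrastructure/monitoring/exporters/blackbox-exporter.yaml` is authoritative):

```yaml
modules:
  http_2xx:
    prober: http
    timeout: 10s
    http:
      preferred_ip_protocol: ip4
  icmp:
    prober: icmp
    timeout: 10s
    icmp:
      preferred_ip_protocol: ip4
```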
Speedtest Exporter (miguelndecarvalho/speedtest-exporter:v0.5.1):
- Bandwidth testing via Speedtest.net
- Scrape interval: 60m
- ⚠️ Bandwidth consumption: ~500MB/day (not suitable for metered connections)
- To reduce bandwidth, increase the scrape interval in `prometheus-helmrelease.yaml`
- Resource usage: 100m CPU / 128Mi RAM (limits: 500m / 256Mi, spikes during test)
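The 60-minute interval is enforced on the Prometheus side. A sketch of what the scrape job might look like as an additional scrape config (the service name, port, and timeout are assumptions; 9798 is the exporter's default port):

```yaml
# Illustrative entry under prometheus.prometheusSpec.additionalScrapeConfigs
- job_name: speedtest
  scrape_interval: 60m
  scrape_timeout: 90s          # a speedtest run takes far longer than the 10s default
  static_configs:
    - targets:
        - speedtest-exporter.monitoring.svc:9798
```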
Node Exporter (prom/node-exporter:v1.8.2):
- System metrics (CPU, memory, disk, network)
- Deployed as DaemonSet (runs on all nodes)
- Security note: Requires `privileged: true` and `hostNetwork: true` for full system access (standard node-exporter requirement)
- Deployed separately from kube-prometheus-stack's built-in node-exporter for explicit configuration control and version independence
- Important: Do not enable `nodeExporter.enabled: true` in the Prometheus HelmRelease - it will conflict with this deployment
- Scrape interval: 15s
- Resource usage: 100m CPU / 128Mi RAM (limits: 250m / 256Mi)
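For reference, the security-relevant parts of the node-exporter DaemonSet look roughly like this. It is a sketch only (the `hostPID` setting and rootfs mount path are assumptions); `infrastructure/monitoring/exporters/node-exporter.yaml` is authoritative.

```yaml
spec:
  template:
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          securityContext:
            privileged: true
          volumeMounts:
            - name: rootfs
              mountPath: /host
              readOnly: true
      volumes:
        - name: rootfs
          hostPath:
            path: /
```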
Grafana Dashboards:
- Internet connection - Bandwidth graphs, latency gauges, uptime timeline (in "Internet Monitoring" folder)
- Node Exporter Full (gnetId 1860) - System metrics visualization
- Note: Speedtest metrics appear after first 60-minute scrape cycle
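Dashboards such as Node Exporter Full are typically provisioned through the Grafana chart's dashboard values; a sketch assuming gnetId-based provisioning (the provider name and folder layout are illustrative):

```yaml
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        type: file
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:
    node-exporter-full:
      gnetId: 1860
      datasource: Prometheus
```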
Resource Usage (approximate):
- Total CPU: ~1.6 cores (requests) / ~2.5 cores (limits)
- Total RAM: ~1.3GB (requests) / ~2.9GB (limits)
- Network: ~500MB/day (speedtest-exporter only)
- Storage growth: ~500MB/week with all exporters enabled
- Acceptable for 8GB Raspberry Pi 4 with headroom for workloads
```bash
# Overall health
flux check

# Reconciliation status
flux get sources git
flux get kustomizations

# HelmRelease status
flux get helmreleases -A

# Reconcile infrastructure
flux reconcile kustomization infrastructure --with-source

# Reconcile specific HelmRelease
flux reconcile helmrelease -n monitoring kube-prometheus-stack
flux reconcile helmrelease -n monitoring grafana

# Flux controller logs
flux logs --level=error --all-namespaces

# Kubernetes Dashboard logs
kubectl logs -n kubernetes-dashboard -l app.kubernetes.io/name=kong
kubectl logs -n kubernetes-dashboard -l app.kubernetes.io/name=kubernetes-dashboard

# Prometheus operator logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-operator

# Grafana logs
kubectl logs -n monitoring -l app.kubernetes.io/name=grafana
```

HelmRelease stuck or failing:

```bash
kubectl describe helmrelease -n monitoring kube-prometheus-stack
kubectl describe helmrelease -n monitoring grafana
kubectl describe helmrelease -n kubernetes-dashboard kubernetes-dashboard
```

Kubernetes Dashboard not accessible:
- Verify pod status: `kubectl get pods -n kubernetes-dashboard`
- Check service: `kubectl get svc -n kubernetes-dashboard kubernetes-dashboard-kong`
- Check NodePort is 30800: `kubectl get svc -n kubernetes-dashboard -o yaml | grep nodePort`
- Browser warning about certificate is expected (self-signed)
Dashboard token issues:
```bash
# Create new token (1 year validity)
kubectl -n kubernetes-dashboard create token dashboard-viewer --duration=8760h

# Verify service account exists
kubectl get sa -n kubernetes-dashboard dashboard-viewer

# Check ClusterRoleBinding
kubectl get clusterrolebinding dashboard-viewer
```

Prometheus not scraping:

```bash
# Check Prometheus targets
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9091:9091
# Visit http://localhost:9091/targets
```

Grafana datasource issues:
- Verify the Prometheus service name: `kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9091`
- Check the Grafana datasource config in the Grafana UI
Speedtest Exporter failures:
Common causes:
- DNS resolution failure (check `/etc/resolv.conf` in the pod)
- Speedtest.net outage or rate limiting
- Network connectivity issues
- First scrape takes 60 minutes - dashboard gauges remain empty until first test completes
Diagnostics:
```bash
kubectl logs -n monitoring -l app=speedtest-exporter
kubectl exec -it -n monitoring deploy/speedtest-exporter -- ping -c3 www.speedtest.net
```

Reducing bandwidth usage:

If ~500MB/day is too high, edit `infrastructure/monitoring/prometheus-helmrelease.yaml`:

```yaml
scrape_interval: 120m  # Reduces to ~250MB/day
# or
scrape_interval: 180m  # Reduces to ~170MB/day
```

Node Exporter not showing metrics:
- Verify privileged security context is allowed
- Check hostPath mounts are accessible
- Ensure no port conflict with built-in node-exporter (should be disabled)
To add or change HTTP probe targets:
1. Edit `infrastructure/monitoring/prometheus-targets-configmap.yaml`:

   ```yaml
   data:
     blackbox-targets.yaml: |
       - targets:
           - http://www.google.com/
           - https://github.com/
           - https://www.cloudflare.com/
           - https://your-isp-homepage.com/  # Add custom target
           - http://192.168.1.1/             # Monitor local gateway
   ```

2. Commit and push:

   ```bash
   git add infrastructure/monitoring/prometheus-targets-configmap.yaml
   git commit -m "chore(monitoring): add custom probe targets"
   git push
   ```

3. Prometheus auto-reloads the configuration within 30 seconds (the scrape job that consumes these targets is sketched below)
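For reference, the scrape job that turns these targets into probes follows the standard blackbox-exporter relabeling pattern; a sketch in which the file path and service address are assumptions about this repository's layout:

```yaml
- job_name: blackbox-http
  metrics_path: /probe
  scrape_interval: 30s
  params:
    module: [http_2xx]
  file_sd_configs:
    - files:
        - /etc/prometheus/targets/blackbox-targets.yaml   # assumed ConfigMap mount path
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: blackbox-exporter.monitoring.svc:9115   # assumed service name
```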
All upgrades must be done via Git commits (PRs recommended).
- Update the chart version in `infrastructure/monitoring/*-helmrelease.yaml`
- Review the upstream changelog
- Test reconciliation: `flux reconcile helmrelease -n monitoring <name>`
- Monitor for errors: `flux logs`
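The chart version is pinned in the HelmRelease spec; the field to bump looks roughly like this (the HelmRepository name is an assumption, and 67.4.0 is simply the currently pinned version):

```yaml
spec:
  chart:
    spec:
      chart: kube-prometheus-stack
      sourceRef:
        kind: HelmRepository
        name: prometheus-community   # assumed repository name
      version: 67.4.0                # bump this line to upgrade
```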
Exporters are deployed as raw Kubernetes manifests (not Helm).
To upgrade an exporter:
1. Check for a new version in the upstream repository

2. Review the CHANGELOG for breaking changes:
   - ConfigMap structure changes (blackbox-exporter)
   - Metrics format changes (all exporters)
   - New resource requirements
   - Security updates

3. Update the image tag and digest in `infrastructure/monitoring/exporters/<exporter>.yaml`:

   ```yaml
   # Example: Upgrading blackbox-exporter
   image: prom/blackbox-exporter:v0.26.0@sha256:NEW_DIGEST_HERE
   ```

4. Get the ARM64 digest (for Prometheus official images):

   ```bash
   docker manifest inspect prom/blackbox-exporter:v0.26.0 | \
     jq -r '.manifests[] | select(.platform.architecture == "arm64") | .digest'
   ```

5. Update the ConfigMap if needed (blackbox-exporter only):

   ```bash
   # If blackbox.yml config format changed
   vim infrastructure/monitoring/exporters/blackbox-exporter.yaml
   ```

6. Commit and push:

   ```bash
   git add infrastructure/monitoring/exporters/
   git commit -m "chore(monitoring): upgrade blackbox-exporter to v0.26.0"
   git push
   ```

7. Verify deployment:

   ```bash
   kubectl get pods -n monitoring -w
   kubectl logs -n monitoring -l app=blackbox-exporter

   # Check metrics endpoint
   kubectl port-forward -n monitoring svc/blackbox-exporter 9115:9115
   curl http://localhost:9115/metrics
   ```
Rollback: Revert Git commit if issues arise:
```bash
git revert HEAD
git push
```

Note: Prometheus data is stored in emptyDir (ephemeral). Rolling back exporter versions does not affect historical data, but data will be lost if the Prometheus pod is deleted.
Dashboard is deployed via HelmRelease (version upgrades are simpler than raw manifests).
- Check new version at https://artifacthub.io/packages/helm/k8s-dashboard/kubernetes-dashboard
- Review release notes for breaking changes
- Update the version in `infrastructure/dashboard/kubernetes-dashboard-helmrelease.yaml`:

  ```yaml
  spec:
    chart:
      spec:
        version: 7.15.0  # Update this line
  ```

- Commit and push:

  ```bash
  git add infrastructure/dashboard/kubernetes-dashboard-helmrelease.yaml
  git commit -m "chore(dashboard): upgrade kubernetes-dashboard to v7.15.0"
  git push
  ```

- Monitor the upgrade:

  ```bash
  flux logs --follow
  kubectl get pods -n kubernetes-dashboard -w
  ```
Rollback: Same as Helm charts - revert commit or update to previous version.
```bash
# Check current version
flux version

# Upgrade Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Upgrade controllers
flux install --export > clusters/blueberry-k3s/flux-system/gotk-components.yaml
git add clusters/blueberry-k3s/flux-system/gotk-components.yaml
git commit -m "chore: upgrade Flux to vX.Y.Z"
git push
```

To roll back to a previous state:

```bash
# Find known-good commit
git log --oneline

# Revert to commit
git revert <commit-sha>
git push

# Or hard reset (use with caution)
git reset --hard <commit-sha>
git push --force
```

Flux will reconcile to the reverted state automatically.
Note: CRD changes and stateful components may not roll back cleanly. Always test upgrades in a non-production environment first.
- Create a component directory under `infrastructure/` or `apps/`
- Add manifests or a HelmRelease
- Update the parent kustomization.yaml to reference the new component (see the sketch after this list)
- Commit and push
- Verify reconciliation: `flux get kustomizations`
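A sketch of what the parent `infrastructure/kustomization.yaml` reference looks like after adding a component; the `ingress` entry is the hypothetical new component from the example below, and any existing entries (such as `monitoring`) stay in place:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - monitoring
  - ingress     # newly added component (hypothetical)
```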
Example:
```bash
mkdir -p infrastructure/ingress
# Add manifests...
echo " - ingress" >> infrastructure/kustomization.yaml
git add infrastructure/
git commit -m "feat: add ingress-nginx"
git push
```

Pull requests are automatically validated with:
- yamllint: YAML syntax and formatting
- kustomize build: Ensure manifests build successfully
- kubeconform: Kubernetes schema validation
- Policy checks: No `:latest` tags, explicit namespaces
See .github/workflows/validate.yaml for details.
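As a rough idea of the workflow's shape, a sketch is shown below; the job names, tool installation steps, and exact invocations are assumptions, and the committed `.github/workflows/validate.yaml` is authoritative:

```yaml
name: validate
on:
  pull_request:
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install tools
        run: |
          pip install yamllint
          go install github.com/yannh/kubeconform/cmd/kubeconform@latest
          echo "$HOME/go/bin" >> "$GITHUB_PATH"
      - name: Lint YAML
        run: yamllint .
      - name: Build and validate manifests
        run: kustomize build clusters/blueberry-k3s | kubeconform -strict -ignore-missing-schemas
      - name: Policy check - no :latest tags
        run: "! grep -RIn 'image:.*:latest' clusters/ infrastructure/"
```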
This cluster runs on a Raspberry Pi 4 with limited resources:
- RAM: 8GB total (K3S + system overhead ~1-2GB)
- CPU: 4 cores (ARM Cortex-A72)
- Storage: USB-attached (limited IO bandwidth, avoid write-heavy workloads)
Design Principles:
- Conservative resource requests/limits
- Conservative (less frequent) Prometheus scrape intervals to reduce IO
- No persistent storage by default (can be added later)
- Disabled non-essential exporters and controllers
- Single-replica deployments (no HA)
See AGENTS.md for full architectural constraints.
Grafana ships with the default credentials `admin` / `admin`. Change these immediately after first login.
For production use, consider:
- Implementing SOPS encryption for secrets (see Flux SOPS guide)
- Setting up proper ingress with TLS
- Configuring authentication for Prometheus/Grafana
- Enabling RBAC policies
See AGENTS.md for contribution guidelines and architectural guardrails.
Key principles:
- Keep changes minimal and justified
- Pin all versions (charts, images)
- Test in CI before merging
- Document resource impact
- Ensure reproducibility
See LICENSE