Add ghproxy caching server for GitHub API GET traffic#838
Conversation
There was a problem hiding this comment.
1 issue found across 11 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="internal/source/etag_transport_test.go">
<violation number="1" location="internal/source/etag_transport_test.go:268">
P2: This test uses separate ETag transport instances per client, so it cannot verify cache-key separation by upstream header. Share one ETag transport across both clients to make the assertion meaningful.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
gjkim42
left a comment
There was a problem hiding this comment.
Add a ghproxy as well to the repository.
/kelos pick-up
cmd/kelos-controller/main.go
Outdated
| flag.StringVar(&tokenRefresherImagePullPolicy, "token-refresher-image-pull-policy", "", "The image pull policy for the token refresher sidecar (e.g., Always, Never, IfNotPresent).") | ||
| flag.StringVar(&tokenRefresherResourceRequests, "token-refresher-resource-requests", "", "Resource requests for token refresher sidecars as comma-separated name=value pairs (e.g., cpu=100m,memory=128Mi).") | ||
| flag.StringVar(&tokenRefresherResourceLimits, "token-refresher-resource-limits", "", "Resource limits for token refresher sidecars as comma-separated name=value pairs (e.g., cpu=200m,memory=256Mi).") | ||
| flag.StringVar(&spawnerGitHubReadProxyBaseURL, "spawner-github-read-proxy-base-url", "", "Optional ghproxy base URL for routing spawner GitHub GET traffic through a centralized cache.") |
There was a problem hiding this comment.
The new field should be added to the taskspawner crd.
|
🤖 Kelos Task Status Task |
|
/kelos needs-input 🤖 Kelos Agent @gjkim42 All review feedback addressed:
All CI checks pass (build, test, test-integration, verify). Ready for re-review. |
There was a problem hiding this comment.
3 issues found across 12 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cmd/ghproxy/Dockerfile">
<violation number="1" location="cmd/ghproxy/Dockerfile:1">
P2: Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.
(Based on your team's feedback about pinning Docker images to avoid silent external version changes.) [FEEDBACK_USED]</violation>
</file>
<file name="cmd/ghproxy/main.go">
<violation number="1" location="cmd/ghproxy/main.go:136">
P2: The cache is unbounded and never evicted, so long-running proxy pods can accumulate response bodies until memory pressure/OOM.</violation>
<violation number="2" location="cmd/ghproxy/main.go:147">
P1: Forward the GitHub `Link` header; dropping it breaks pagination for issue/comment/PR discovery behind the proxy.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| @@ -0,0 +1,5 @@ | |||
| FROM gcr.io/distroless/static:nonroot | |||
There was a problem hiding this comment.
P2: Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.
(Based on your team's feedback about pinning Docker images to avoid silent external version changes.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cmd/ghproxy/Dockerfile, line 1:
<comment>Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.
(Based on your team's feedback about pinning Docker images to avoid silent external version changes.) </comment>
<file context>
@@ -0,0 +1,5 @@
+FROM gcr.io/distroless/static:nonroot
+WORKDIR /
+COPY bin/ghproxy .
</file context>
There was a problem hiding this comment.
2 issues found across 8 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="api/v1alpha1/taskspawner_types.go">
<violation number="1" location="api/v1alpha1/taskspawner_types.go:443">
P1: Renaming the CRD JSON field to `githubReadBaseURL` introduces a breaking API change without backward compatibility. Existing manifests using `githubReadProxyBaseURL` will silently stop configuring the read proxy.</violation>
</file>
<file name="cmd/ghproxy/main.go">
<violation number="1" location="cmd/ghproxy/main.go:131">
P2: Link header rewriting hardcodes `http://`, causing incorrect pagination URLs when ghproxy is served over HTTPS.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="cmd/ghproxy/metrics_test.go">
<violation number="1" location="cmd/ghproxy/metrics_test.go:72">
P2: Avoid calling `t.Fatal` inside the HTTP handler goroutine; it can cause nondeterministic test failures. Use a non-`FailNow` failure path (e.g., `t.Errorf`) and return.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Add a centralized ghproxy caching server that all TaskSpawner pods use for GitHub GET traffic, while keeping reporting writes and token minting on the direct upstream API. ghproxy: - Lightweight reverse proxy with two-tier caching: TTL-based freshness (--cache-ttl, default 15s) avoids upstream round trips, falling back to ETag revalidation when stale - Cache key varies by upstream host, path, Accept, and Authorization (SHA-256 hashed) to prevent cross-user and cross-host collisions - Link headers rewritten so pagination URLs route back through the proxy - --allowed-upstreams flag (configurable via ghproxy.allowedUpstreams Helm value and --ghproxy-allowed-upstreams install flag) prevents SSRF - ghproxy_github_api_requests_total Prometheus counter with method, status_code, resource, and cache labels on --metrics-address (:9090) Spawner changes: - Always routes discovery GETs through ghproxy at a well-known in-cluster URL (http://ghproxy.kelos-system:8888) - UpstreamHeaderTransport injects X-Kelos-GitHub-Upstream-Base-URL on GETs so ghproxy knows which upstream to forward to (derived from the workspace repo URL via --github-api-base-url) - Client-side ETagTransport removed — ghproxy handles all caching Deployment: - ghproxy Deployment + Service added to the Helm chart (always deployed) - PodMonitoring resource added to deploy-dev workflow - ghproxy image loaded into kind for e2e tests - ghproxy restarted alongside controller/spawners in deploy-dev
Add three new agent conventions from recent PR review feedback: 1. Per-TaskSpawner configuration should be CRD fields, not controller flags (PR #838 - gjkim42 review) 2. CRD API backward compatibility - never rename JSON field tags (PR #838 - P1 review finding) 3. Gate optional CRDs behind Helm values flags (PR #821 - PodMonitoring broke installs on clusters without monitoring.googleapis.com) Also includes previously proposed conventions from PR #786: - Consistent guidance across surfaces - Provider-agnostic API design - Idiomatic Helm values - Deploy-dev workflow sync - Controller-driven migration - Release note user action requirements Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
/kind feature
What this PR does / why we need it:
Adds a centralized ghproxy caching server that all TaskSpawner pods use for GitHub GET traffic, while keeping reporting writes and GitHub App token minting on the direct upstream API.
When multiple TaskSpawner pods watch the same repos, each currently hits GitHub independently. This feature deploys a shared caching proxy as part of the Helm chart that all spawner pods use for discovery GETs, reducing duplicate GitHub API requests.
Key changes:
cmd/ghproxy: a lightweight caching reverse proxy for GitHub API GET requests, with ETag-based caching keyed by upstream host, request path, Accept, and Authorization headershttp://ghproxy.kelos-system:8888UpstreamHeaderTransportinjectsX-Kelos-GitHub-Upstream-Base-URLon GET requests so ghproxy knows which upstream to forward to (derived from the workspace repo URL via--github-api-base-url)--cache-ttl, default 15s) serves fresh entries without upstream revalidation--allowed-upstreamsconfigurable viaghproxy.allowedUpstreamsHelm value (default:https://api.github.com)Which issue(s) this PR is related to:
Fixes #837
Special notes for your reviewer:
UpstreamHeaderTransport > MetricsTransport > DefaultTransportX-Kelos-GitHub-Upstream-Base-URLheader + path + Accept + Authorization hashX-Forwarded-Proto)ghproxy.allowedUpstreamsHelm value restricts which upstream hosts ghproxy will forward to, preventing SSRFghproxy_github_api_requests_totalPrometheus counter withmethod,status_code,resource,cachelabels; served on--metrics-address(:9090)fresh_hit(served within TTL),revalidated_hit(upstream 304),miss(fresh fetch),skip(non-GET or error)Does this PR introduce a user-facing change?