Skip to content

Add ghproxy caching server for GitHub API GET traffic#838

Merged
gjkim42 merged 1 commit intomainfrom
kelos-task-837
Mar 29, 2026
Merged

Add ghproxy caching server for GitHub API GET traffic#838
gjkim42 merged 1 commit intomainfrom
kelos-task-837

Conversation

@kelos-bot
Copy link
Copy Markdown

@kelos-bot kelos-bot bot commented Mar 29, 2026

/kind feature

What this PR does / why we need it:

Adds a centralized ghproxy caching server that all TaskSpawner pods use for GitHub GET traffic, while keeping reporting writes and GitHub App token minting on the direct upstream API.

When multiple TaskSpawner pods watch the same repos, each currently hits GitHub independently. This feature deploys a shared caching proxy as part of the Helm chart that all spawner pods use for discovery GETs, reducing duplicate GitHub API requests.

Key changes:

  • New cmd/ghproxy: a lightweight caching reverse proxy for GitHub API GET requests, with ETag-based caching keyed by upstream host, request path, Accept, and Authorization headers
  • ghproxy Deployment + Service added to the Helm chart (always deployed)
  • Spawner always routes discovery GETs through ghproxy at http://ghproxy.kelos-system:8888
  • New UpstreamHeaderTransport injects X-Kelos-GitHub-Upstream-Base-URL on GET requests so ghproxy knows which upstream to forward to (derived from the workspace repo URL via --github-api-base-url)
  • Cache TTL (--cache-ttl, default 15s) serves fresh entries without upstream revalidation
  • --allowed-upstreams configurable via ghproxy.allowedUpstreams Helm value (default: https://api.github.com)
  • Token refresher and reporting continue using the direct upstream API
  • Client-side ETagTransport removed — ghproxy handles all caching

Which issue(s) this PR is related to:

Fixes #837

Special notes for your reviewer:

  • The spawner transport chain: UpstreamHeaderTransport > MetricsTransport > DefaultTransport
  • POST/PATCH requests pass through the upstream header transport unmodified (header only added on GETs)
  • The ghproxy server caches GET responses with ETags and keys by X-Kelos-GitHub-Upstream-Base-URL header + path + Accept + Authorization hash
  • Link headers from upstream are rewritten so pagination URLs route back through the proxy (scheme-aware via X-Forwarded-Proto)
  • ghproxy.allowedUpstreams Helm value restricts which upstream hosts ghproxy will forward to, preventing SSRF
  • ghproxy_github_api_requests_total Prometheus counter with method, status_code, resource, cache labels; served on --metrics-address (:9090)
  • Cache label values: fresh_hit (served within TTL), revalidated_hit (upstream 304), miss (fresh fetch), skip (non-GET or error)
  • PodMonitoring resource added to deploy-dev workflow

Does this PR introduce a user-facing change?

Add ghproxy caching server to reduce duplicate GitHub API requests across TaskSpawner pods. Deployed automatically as part of the Helm chart. Configure allowed upstream hosts via ghproxy.allowedUpstreams Helm value.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 11 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="internal/source/etag_transport_test.go">

<violation number="1" location="internal/source/etag_transport_test.go:268">
P2: This test uses separate ETag transport instances per client, so it cannot verify cache-key separation by upstream header. Share one ETag transport across both clients to make the assertion meaningful.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@kelos-bot kelos-bot bot added kind/feature Categorizes issue or PR as related to a new feature and removed needs-kind Indicates an issue or PR lacks a kind/* label labels Mar 29, 2026
Copy link
Copy Markdown
Collaborator

@gjkim42 gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a ghproxy as well to the repository.
/kelos pick-up

flag.StringVar(&tokenRefresherImagePullPolicy, "token-refresher-image-pull-policy", "", "The image pull policy for the token refresher sidecar (e.g., Always, Never, IfNotPresent).")
flag.StringVar(&tokenRefresherResourceRequests, "token-refresher-resource-requests", "", "Resource requests for token refresher sidecars as comma-separated name=value pairs (e.g., cpu=100m,memory=128Mi).")
flag.StringVar(&tokenRefresherResourceLimits, "token-refresher-resource-limits", "", "Resource limits for token refresher sidecars as comma-separated name=value pairs (e.g., cpu=200m,memory=256Mi).")
flag.StringVar(&spawnerGitHubReadProxyBaseURL, "spawner-github-read-proxy-base-url", "", "Optional ghproxy base URL for routing spawner GitHub GET traffic through a centralized cache.")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new field should be added to the taskspawner crd.

@kelos-bot
Copy link
Copy Markdown
Author

kelos-bot bot commented Mar 29, 2026

🤖 Kelos Task Status

Task kelos-pr-responder-838 has succeeded. ✅

@kelos-bot kelos-bot bot changed the title Add controller-wide ghproxy support for TaskSpawner GitHub GET traffic Add ghproxy server and per-TaskSpawner proxy CRD field for GitHub GET traffic Mar 29, 2026
@kelos-bot
Copy link
Copy Markdown
Author

kelos-bot bot commented Mar 29, 2026

/kelos needs-input

🤖 Kelos Agent @gjkim42

All review feedback addressed:

  1. Moved githubReadProxyBaseURL from controller flag to TaskSpawner CRD field (spec.githubReadProxyBaseURL) — each TaskSpawner can now configure its own proxy independently. Removed the controller flag, Helm value, and DeploymentBuilder field.

  2. Added cmd/ghproxy — a lightweight caching reverse proxy that:

    • Caches GitHub API GET responses using ETags
    • Keys cache entries by X-Kelos-GitHub-Upstream-Base-URL header + request path (supports multi-host routing)
    • Passes through non-GET requests without caching
    • Uses the shared source.UpstreamBaseURLHeader constant
    • Includes Dockerfile and is registered in Makefile IMAGE_DIRS

All CI checks pass (build, test, test-integration, verify). Ready for re-review.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/ghproxy/Dockerfile">

<violation number="1" location="cmd/ghproxy/Dockerfile:1">
P2: Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.

(Based on your team's feedback about pinning Docker images to avoid silent external version changes.) [FEEDBACK_USED]</violation>
</file>

<file name="cmd/ghproxy/main.go">

<violation number="1" location="cmd/ghproxy/main.go:136">
P2: The cache is unbounded and never evicted, so long-running proxy pods can accumulate response bodies until memory pressure/OOM.</violation>

<violation number="2" location="cmd/ghproxy/main.go:147">
P1: Forward the GitHub `Link` header; dropping it breaks pagination for issue/comment/PR discovery behind the proxy.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@@ -0,0 +1,5 @@
FROM gcr.io/distroless/static:nonroot
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.

(Based on your team's feedback about pinning Docker images to avoid silent external version changes.)

View Feedback

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At cmd/ghproxy/Dockerfile, line 1:

<comment>Pin the distroless base image to an immutable digest instead of a mutable tag so rebuilds are deterministic and don’t silently pull changed upstream contents.

(Based on your team's feedback about pinning Docker images to avoid silent external version changes.) </comment>

<file context>
@@ -0,0 +1,5 @@
+FROM gcr.io/distroless/static:nonroot
+WORKDIR /
+COPY bin/ghproxy .
</file context>
Fix with Cubic

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 8 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="api/v1alpha1/taskspawner_types.go">

<violation number="1" location="api/v1alpha1/taskspawner_types.go:443">
P1: Renaming the CRD JSON field to `githubReadBaseURL` introduces a breaking API change without backward compatibility. Existing manifests using `githubReadProxyBaseURL` will silently stop configuring the read proxy.</violation>
</file>

<file name="cmd/ghproxy/main.go">

<violation number="1" location="cmd/ghproxy/main.go:131">
P2: Link header rewriting hardcodes `http://`, causing incorrect pagination URLs when ghproxy is served over HTTPS.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 5 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="cmd/ghproxy/metrics_test.go">

<violation number="1" location="cmd/ghproxy/metrics_test.go:72">
P2: Avoid calling `t.Fatal` inside the HTTP handler goroutine; it can cause nondeterministic test failures. Use a non-`FailNow` failure path (e.g., `t.Errorf`) and return.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@gjkim42 gjkim42 changed the title Add ghproxy server and per-TaskSpawner proxy CRD field for GitHub GET traffic Add ghproxy caching server for GitHub API GET traffic Mar 29, 2026
Add a centralized ghproxy caching server that all TaskSpawner pods use
for GitHub GET traffic, while keeping reporting writes and token minting
on the direct upstream API.

ghproxy:
- Lightweight reverse proxy with two-tier caching: TTL-based freshness
  (--cache-ttl, default 15s) avoids upstream round trips, falling back
  to ETag revalidation when stale
- Cache key varies by upstream host, path, Accept, and Authorization
  (SHA-256 hashed) to prevent cross-user and cross-host collisions
- Link headers rewritten so pagination URLs route back through the proxy
- --allowed-upstreams flag (configurable via ghproxy.allowedUpstreams
  Helm value and --ghproxy-allowed-upstreams install flag) prevents SSRF
- ghproxy_github_api_requests_total Prometheus counter with method,
  status_code, resource, and cache labels on --metrics-address (:9090)

Spawner changes:
- Always routes discovery GETs through ghproxy at a well-known
  in-cluster URL (http://ghproxy.kelos-system:8888)
- UpstreamHeaderTransport injects X-Kelos-GitHub-Upstream-Base-URL on
  GETs so ghproxy knows which upstream to forward to (derived from the
  workspace repo URL via --github-api-base-url)
- Client-side ETagTransport removed — ghproxy handles all caching

Deployment:
- ghproxy Deployment + Service added to the Helm chart (always deployed)
- PodMonitoring resource added to deploy-dev workflow
- ghproxy image loaded into kind for e2e tests
- ghproxy restarted alongside controller/spawners in deploy-dev
@gjkim42 gjkim42 merged commit 4c7ba61 into main Mar 29, 2026
9 checks passed
@gjkim42 gjkim42 deleted the kelos-task-837 branch March 29, 2026 15:36
kelos-bot bot pushed a commit that referenced this pull request Mar 29, 2026
Add three new agent conventions from recent PR review feedback:

1. Per-TaskSpawner configuration should be CRD fields, not controller
   flags (PR #838 - gjkim42 review)
2. CRD API backward compatibility - never rename JSON field tags
   (PR #838 - P1 review finding)
3. Gate optional CRDs behind Helm values flags (PR #821 - PodMonitoring
   broke installs on clusters without monitoring.googleapis.com)

Also includes previously proposed conventions from PR #786:
- Consistent guidance across surfaces
- Provider-agnostic API design
- Idiomatic Helm values
- Deploy-dev workflow sync
- Controller-driven migration
- Release note user action requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Integration: Add controller-wide ghproxy support for TaskSpawner GitHub GET traffic

1 participant