Overview: reliability, diagnostics, and performance foundations for GooseRelayVPN #153

Draft
poulcarlsen53 wants to merge 1 commit into Kianmhz:main from poulcarlsen53:optimization/overview-bundle

@poulcarlsen53
Contributor

Context

This is intentionally a large overview PR. It is meant to show the full shape of a performance and reliability branch, not to ask for a single large merge as-is.

The branch grew out of field testing on difficult Apps Script/fronted routes where direct connections are not always available. The goal was to make the tunnel easier to operate under quota pressure, mobile network pauses, slow Google fronting paths, and ambiguous relay failures.

Main themes

Apps Script and fronted POST reliability

  • Apps Script is treated as a first-class transport path rather than just a fallback.
  • Client defaults, examples, and diagnostics are adjusted around Apps Script deployments and account buckets.
  • Endpoint failure handling distinguishes transient failures, quota-like failures, and non-batch relay responses more clearly.
  • The Apps Script forwarder is hardened so that obvious non-tunnel payloads and relay loops are rejected early instead of consuming quota.
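
To illustrate the three-way failure split described above, here is a minimal sketch of a response classifier. It is not the code in this branch: `FailureKind`, `classifyResponse`, and the matching rules (HTTP 429 or quota-like wording in a text body means quota, an HTML body means a non-batch response) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// FailureKind is a hypothetical classification; the names are not taken from the branch.
type FailureKind int

const (
	FailureNone      FailureKind = iota // response looks like a normal batch
	FailureTransient                    // worth retrying soon
	FailureQuota                        // back off or quarantine the account
	FailureNonBatch                     // endpoint answered, but not with a relay batch
)

func (k FailureKind) String() string {
	return [...]string{"none", "transient", "quota", "non-batch"}[k]
}

// classifyResponse guesses a failure kind from the status code, content type,
// and body. Assumption: quota errors arrive as text containing quota-like
// wording, and error pages arrive as HTML even when the status is 200.
func classifyResponse(status int, contentType, body string) FailureKind {
	isText := strings.HasPrefix(contentType, "text/")
	lower := strings.ToLower(body)
	switch {
	case status == http.StatusTooManyRequests:
		return FailureQuota
	case isText && (strings.Contains(lower, "quota") || strings.Contains(lower, "too many times")):
		return FailureQuota
	case status >= 500:
		return FailureTransient
	case strings.HasPrefix(contentType, "text/html"):
		return FailureNonBatch
	case status != http.StatusOK:
		return FailureTransient
	default:
		return FailureNone
	}
}

func main() {
	fmt.Println(classifyResponse(502, "text/plain", "bad gateway"))
	fmt.Println(classifyResponse(200, "text/html", "<html>Service invoked too many times</html>"))
	fmt.Println(classifyResponse(200, "text/html; charset=utf-8", "<html>Sign in</html>"))
	fmt.Println(classifyResponse(200, "application/octet-stream", "\x00\x01"))
}
```

Keeping quota separate from transient failures lets a caller retry the latter quickly while moving a quota-limited account out of rotation for longer.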

Mobile and lossy-network recovery

  • Adds downstream ACK/replay support so stateless POST response loss can be recovered instead of silently corrupting active streams.
  • Adds startup/run reset behavior so a restarted client can ask the server to clear stale sessions from an older run.
  • Extends shutdown diagnostics and best-effort cleanup so Ctrl+C and restarts leave fewer stale server-side sessions.
  • Adds tests around replay behavior, pending opens, reconnect cases, and cleanup paths.
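
The downstream ACK/replay idea can be sketched as a bounded buffer of unacknowledged frames. This is only an illustration under assumed semantics (monotonic sequence numbers, cumulative ACKs, a fixed replay window); `replayBuffer`, `frame`, and their methods are hypothetical names, not the types used in this branch.

```go
package main

import "fmt"

// frame is a hypothetical downstream frame tagged with a sequence number.
type frame struct {
	seq     uint64
	payload []byte
}

// replayBuffer keeps downstream frames until the client acknowledges them,
// so a lost POST response can be resent instead of leaving a gap in the stream.
type replayBuffer struct {
	nextSeq uint64  // sequence number assigned to the next pushed frame
	frames  []frame // unacknowledged frames, oldest first
	limit   int     // replay window: maximum number of retained frames
}

func newReplayBuffer(limit int) *replayBuffer {
	return &replayBuffer{nextSeq: 1, limit: limit}
}

// Push stores a copy of the payload and returns its sequence number.
// It reports false when the replay window is full.
func (b *replayBuffer) Push(p []byte) (uint64, bool) {
	if len(b.frames) >= b.limit {
		return 0, false
	}
	seq := b.nextSeq
	b.nextSeq++
	// Clone the payload so later reuse of the caller's slice cannot alter it.
	b.frames = append(b.frames, frame{seq: seq, payload: append([]byte(nil), p...)})
	return seq, true
}

// Ack drops every frame whose sequence number is <= acked (cumulative ACK).
func (b *replayBuffer) Ack(acked uint64) {
	i := 0
	for i < len(b.frames) && b.frames[i].seq <= acked {
		i++
	}
	b.frames = b.frames[i:]
}

// Pending returns the frames that would be resent on the next response.
func (b *replayBuffer) Pending() []frame {
	return b.frames
}

func main() {
	buf := newReplayBuffer(8)
	for _, s := range []string{"a", "b", "c"} {
		buf.Push([]byte(s))
	}
	buf.Ack(1) // the client confirmed frame 1; frames 2 and 3 remain replayable
	for _, f := range buf.Pending() {
		fmt.Println(f.seq, string(f.payload))
	}
}
```

A full window makes `Push` fail rather than evict old frames, since dropping an unacknowledged frame would reintroduce the silent corruption the mechanism is meant to prevent.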

Performance foundations

  • Adds tunable response caps and a tunable maximum number of drained frames per session, so field testing can balance interactive page-load latency against bulk download throughput.
  • Adds direct stream plumbing and benchmarks for environments where direct routing is reachable.
  • Adds fronting probes and more timing information to explain slow startup and high TTFB cases.
  • Adds benchmark harness improvements and refreshed baseline support so performance changes can be compared more systematically.
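
As a rough sketch of how a response cap and a per-session drain limit interact, the function below takes frames from one session's queue until either limit is reached. The name `drain`, its parameters, and the rule that the first frame is always admitted are assumptions for illustration, not the branch's actual logic.

```go
package main

import "fmt"

// drain splits queue into the frames that fit into one response and the rest.
// It stops after maxFrames frames or once adding a frame would exceed capBytes.
// Assumption: the first frame is always admitted so that a single frame larger
// than the cap cannot stall its session forever.
func drain(queue [][]byte, maxFrames, capBytes int) (batch, rest [][]byte) {
	total, n := 0, 0
	for n < len(queue) && n < maxFrames {
		size := len(queue[n])
		if n > 0 && total+size > capBytes {
			break
		}
		total += size
		n++
	}
	return queue[:n], queue[n:]
}

func main() {
	queue := [][]byte{make([]byte, 10), make([]byte, 10), make([]byte, 10), make([]byte, 10)}

	// A small cap keeps each response short, which favors interactive traffic.
	batch, rest := drain(queue, 3, 25)
	fmt.Println(len(batch), len(rest))

	// A large cap lets the frame limit decide, which favors bulk transfers.
	batch, rest = drain(queue, 3, 1000)
	fmt.Println(len(batch), len(rest))
}
```

Lower limits shorten each response and spread capacity across sessions, while higher limits reduce per-request overhead for a single large download.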

Observability and supportability

  • Adds cmd/analyze for log/diagnostic review.
  • Adds client/server diagnostics improvements, terminal run logging, config summaries, and stats JSON fields for endpoint health, queue wait, replay, compression, and quota behavior.
  • Adds docs describing real-world testing, expected ceilings, and architecture tradeoffs.

Safety and hardening

  • Adds stricter config validation and unknown-field handling.
  • Adds server-side and client-side tests around session ownership, frame handling, payload cloning, timer usage, replay windows, and shutdown behavior.
  • Adds pprof/debug helpers behind explicit configuration.
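
For the stricter config validation, one common Go approach is `json.Decoder.DisallowUnknownFields`, which turns a misspelled key into a load-time error. The sketch below assumes a JSON config; the `Config` struct, its field names, and `loadConfig` are hypothetical and do not reflect the project's real schema.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Config is a hypothetical client config; the field names are illustrative only.
type Config struct {
	ScriptURLs     []string `json:"script_urls"`
	MaxDrainFrames int      `json:"max_drain_frames"`
}

// loadConfig decodes raw JSON, rejecting unknown fields and invalid values.
func loadConfig(raw string) (Config, error) {
	var cfg Config
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields() // a typo in a key becomes an error, not a silent default
	if err := dec.Decode(&cfg); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if len(cfg.ScriptURLs) == 0 {
		return Config{}, errors.New("config: script_urls must not be empty")
	}
	if cfg.MaxDrainFrames < 0 {
		return Config{}, errors.New("config: max_drain_frames must not be negative")
	}
	return cfg, nil
}

func main() {
	// "max_drain_frame" is misspelled, so decoding fails instead of ignoring it.
	_, err := loadConfig(`{"script_urls": ["https://example.invalid/exec"], "max_drain_frame": 4}`)
	fmt.Println(err != nil)

	cfg, err := loadConfig(`{"script_urls": ["https://example.invalid/exec"], "max_drain_frames": 4}`)
	fmt.Println(cfg.MaxDrainFrames, err)
}
```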

Suggested review approach

I do not expect this overview branch to be the easiest merge unit. The useful path is probably to split it into smaller PRs, for example:

  1. Apps Script forwarder hardening and static tests.
  2. Quota/account quarantine persistence and endpoint health diagnostics.
  3. Diagnostics/analyzer improvements.
  4. Downstream ACK/replay with focused protocol and recovery tests.
  5. Benchmark harness and docs updates.
  6. Direct stream support if that direction is useful for the project.

I am opening this large PR first so the maintainer can see how the pieces fit together before reviewing smaller slices.

Validation

  • go test -count=1 ./... (the -count=1 flag bypasses the test result cache, so every package is actually re-run)
