Skip to content

Improve Apps Script relay and example configs#129

Merged
Kianmhz merged 2 commits into
Kianmhz:mainfrom
poulcarlsen53:apps-script-config-hardening
May 20, 2026
Merged

Improve Apps Script relay and example configs#129
Kianmhz merged 2 commits into
Kianmhz:mainfrom
poulcarlsen53:apps-script-config-hardening

Conversation

@poulcarlsen53
Copy link
Copy Markdown
Contributor

Summary

This PR improves the Apps Script forwarder and example configs without changing the tunnel protocol.

Changes:

  • Adds RELAY_URLS support for multiple VPS tunnel endpoints.
  • Adds a relay loop guard so Apps Script cannot accidentally relay to another Apps Script URL.
  • Makes invocation counting optional to avoid a PropertiesService write on every tunnel request.
  • Makes example configs safer to copy by avoiding invalid placeholder values.

Verification

  • go test -count=1 ./...
  • go vet ./...

Compatibility

The wire protocol is unchanged. Existing client/server deployments continue to work after redeploying Code.gs.

@Kianmhz
Copy link
Copy Markdown
Owner

Kianmhz commented May 17, 2026

Thanks for this. The Apps Script changes here are solid and I'd like to land them. Before merging, I'd like to trim the config-example pieces, because most of the new fields they introduce aren't read by internal/config yet. They belong with the larger perf bundle in #131. Right now encoding/json silently drops them, so users copying the example would think they're configuring something when they're not.

Keep as-is:

  • All of apps_script/Code.gs: RELAY_URLS, the GAS_RELAY_LOOP_RE guard, and ENABLE_INVOCATION_COUNTING are all wins, no Go-side dependency.
  • In server_config.example.json: the upstream_proxy: "" fix and the _comment_upstream_proxy note. The old "OPTIONAL_socks5://…" placeholder fails to parse if copied verbatim, good catch.
  • In client_config.example.json: trimming script_keys to a single entry plus the _comment_script_keys note for adding more.

Please drop from this PR (move to #131 alongside the code that reads them):

  • client_config.example.json: transport_mode, direct_stream_urls, performance_mode, auto_tune, workers_per_endpoint, poll_idle_sleep_ms, endpoint_blacklist_base_ms, endpoint_blacklist_max_ms, endpoint_outage_grace_ms, max_request_bytes_pre_encode, stream_connect_timeout_ms, stream_ping_interval_ms, stream_reconnect_backoff_ms, and the _comment_performance / _comment_advanced_performance blocks.
  • server_config.example.json: auto_tune, performance_mode, active_drain_window_ms, long_poll_window_ms, upstream_dial_timeout_ms, coalesce_window_ms, coalesce_window_busy_ms, max_sessions, max_request_body_bytes, max_response_bytes_pre_encode.

tunnel_key placeholder — please also update the validator:
The placeholder swap to "REPLACE_WITH_64_HEX_CHARACTER_RANDOM_KEY" is fine in spirit (scripts/gen-key.sh doesn't actually exist in the repo — only deploy.sh, goose-relay.service, and goose-server.sh are under scripts/, so the existing hint sends users to a missing script), but it desyncs from the validator at internal/config/client.go:326, which still matches on the old string and still tells users to run scripts/gen-key.sh in its error messages (lines 327, 330, 334).

Could you include a small follow-up in this PR: update line 326 to match the new placeholder, and replace the three 'bash scripts/gen-key.sh' references in the error strings with 'openssl rand -hex 32' (which is what the README already instructs)? Keeps validator and example in sync and fixes the dangling-script reference at the same time.

Re: deadline removal in UrlFetchApp.fetch: was that intentional? Was a tradeoff explored? Apps Script's default deadline is shorter than the old 30s. If it was deliberate (e.g. so the call hangs up before Apps Script's 6-min execution cap rather than after a fixed 30s), worth a one-line comment in the .gs file.

Happy to merge as soon as the example-config diff is trimmed. Thanks again.

@poulcarlsen53
Copy link
Copy Markdown
Contributor Author

Updated this PR based on your feedback:

  • trimmed example config fields that are not read by internal/config yet
  • kept the Apps Script relay changes
  • restored the Apps Script fetch deadline
  • synced the tunnel_key placeholder with the client config validator
  • replaced the missing gen-key.sh guidance with openssl rand -hex 32

Verification:
go test -count=1 ./...
go vet ./...

@Kianmhz Kianmhz merged commit a7dd3bd into Kianmhz:main May 20, 2026
6 checks passed
Kianmhz added a commit that referenced this pull request May 23, 2026
…eam failure (#149)

v1.7.0 Code.gs added a try/catch around UrlFetchApp.fetch that, on
exception or non-200 from the VPS, swallowed the failure and returned
the error as plain text with HTTP 200. The GooseRelay client trusts
200 to mean "the body is a base64 batch" — its non-batch sniffer
(isLikelyNonBatchRelayPayload) only recognises HTML quota pages, JSON
status, and raw HTTP/. So the new error strings fell through to
DecodeBatch, which tried to base64-decode them and failed at the first
non-alphabet byte.

The user-visible symptom in #149 ("batch: base64 decode: illegal
base64 data at input byte 9") is the literal length of "Exception"
before the colon — "Exception:" was the most common shape, emitted
by Code.gs's catch block whenever the VPS was transiently unreachable
from Google's UrlFetchApp egress.

This was misattributed in the v1.7.0 release notes to #130 (the
EncodeBatch → AppendMarshal migration). The marshaler is fine; the
break was in #129 (Code.gs rewrite).

Fixes both sides:

Code.gs:
  - All terminal-failure paths now `throw` instead of returning 200
    with diagnostic text. Apps Script then surfaces its native HTML
    error page with HTTP 500, which the client's existing non-200
    code path already handles cleanly (mark endpoint failure, rotate
    to next deployment, no log noise).
  - Success path unchanged: still 200 with the upstream's base64
    batch verbatim.
  - The per-RELAY_URLS failover loop is preserved — we only throw
    after every URL has failed.

Carrier (defense-in-depth for users who upgrade the client but
forget to redeploy Code.gs):
  - isLikelyNonBatchRelayPayload now sniffs "Exception:",
    "relay_loop_detected:", "upstream status ", and
    "upstream fetch error:" prefixes.
  - classifyRelayErrorBody maps each sentinel to an actionable log
    message: relay_loop_detected → hard (config error), the rest →
    soft (transient VPS reachability or non-200 from server).
  - Four sniffer test cases added.

Net effect for affected users: the "illegal base64 data at input
byte 9" log line goes away. Transient VPS-unreachable conditions now
show up as one clear line per occurrence ("Code.gs could not reach
your VPS — check VPS is up, the server_port…") and the carrier
rotates to a healthy endpoint instead of retrying-then-dropping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants