Add Go-based backend #29
Conversation
- Add main.go for server and sync-realtime functionality
- Create db.go for SQLite database handling and schema initialization
- Introduce static.go for managing static GTFS data
- Define spec types for agencies, routes, stops, trips, scheduled stop times, service calendars, and exceptions
- Enhance routes.go to include endpoints for syncing static data
- Update go.mod and go.sum for new dependencies
- Updated go.sum to include AWS SDK dependencies.
- Enhanced FetchAndParseStatic function to log download progress and handle shapes.txt.
- Introduced new provider for CTA with static and real-time URLs.
- Modified StaticFeed struct to include Shapes data.
- Added ShapePoint struct to represent shape geometry.
- Implemented GeoJSON building from shapes and routes.
- Created tile generation function using tippecanoe.
- Added S3 upload functionality for tiles with environment variable configuration.
- Developed a viewer for tiles using MapLibre GL.
Pull request overview
This WIP PR introduces a new Go backend foundation for ingesting GTFS static + realtime data from multiple providers, persisting static data to SQLite, and generating/uploading transit route tiles (PMTiles), plus a simple MapLibre-based tile viewer.
Changes:
- Added a provider registry with a shared GTFS/GTFS-RT “base provider” plus initial provider configs (Amtrak custom realtime, Brightline, CTA, Metra, Metro Transit, Tri-Rail).
- Implemented GTFS static ZIP parsing + GTFS-RT protobuf parsing into shared `spec` models, and added SQLite schema + transactional static feed persistence.
- Added a `sync-static` CLI (and Dockerfile) to fetch static feeds, write to DB, generate PMTiles via tippecanoe, and optionally upload to S3-compatible storage; plus a minimal server and a debug tile viewer.
Reviewed changes
Copilot reviewed 27 out of 29 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| apps/api/tiles/viewer.html | New MapLibre + PMTiles debug viewer with route toggle panel and click popups |
| apps/api/tiles/upload.go | S3-compatible uploader for generated artifacts |
| apps/api/tiles/tippecanoe.go | Tippecanoe wrapper to generate PMTiles from GeoJSON |
| apps/api/tiles/geojson.go | Builds GeoJSON features from GTFS shapes/trips/routes for tiling |
| apps/api/spec/static.go | New static GTFS domain models (agency/route/stop/trip/etc.) |
| apps/api/spec/realtime.go | New realtime domain models (positions/stop times) + helpers |
| apps/api/routes/routes.go | HTTP route registration and debug/sync endpoints |
| apps/api/providers/trirail/trirail.go | Tri-Rail provider configuration |
| apps/api/providers/providers.go | Provider interface + registry + feed container types |
| apps/api/providers/metrotransit/metrotransit.go | Metro Transit provider configuration |
| apps/api/providers/metra/metra.go | Metra provider configuration (API key env support) |
| apps/api/providers/cta/cta.go | CTA provider configuration |
| apps/api/providers/brightline/brightline.go | Brightline provider configuration |
| apps/api/providers/base/base.go | Shared provider implementation using GTFS + GTFS-RT fetch/parse helpers |
| apps/api/providers/amtrak/types.go | Custom Amtrak JSON decoding types for decrypted map payload |
| apps/api/providers/amtrak/amtrak.go | Custom Amtrak realtime fetch + decryption + mapping to spec models |
| apps/api/gtfs/static.go | GTFS static ZIP download + CSV parsing into spec models |
| apps/api/gtfs/realtime.go | GTFS-RT download + protobuf parsing into spec models |
| apps/api/go.mod | Module path update + dependency declarations |
| apps/api/go.sum | Dependency lock updates |
| apps/api/Dockerfile.sync-static | Container build for tippecanoe + sync-static runtime |
| apps/api/db/static.go | Transactional “replace all” static feed persistence into SQLite |
| apps/api/db/db.go | SQLite connection wrapper + schema initialization |
| apps/api/cmd/sync-static/main.go | CLI tool to fetch provider static feeds, persist, generate/upload tiles |
| apps/api/cmd/server/main.go | Minimal HTTP server wiring DB + providers + routes |
| apps/api/cmd/functions/sync-realtime/main.go | DigitalOcean Functions entrypoint stub for realtime sync |
| apps/api/cmd/api/main.go | Removed old minimal API stub |
| apps/api/.env.example | Added env vars for provider keys and tile upload |
| .gitignore | Ignore generated binaries, DB files, pmtiles, and debug dumps |
```go
log.Printf("starting server on :%s", port)
if err := http.ListenAndServe(":"+port, mux); err != nil {
	log.Fatal(err)
```
Server uses http.ListenAndServe without setting timeouts. This leaves the process more vulnerable to slowloris-style attacks and can tie up resources under bad clients. Prefer configuring an http.Server with at least ReadHeaderTimeout (and usually ReadTimeout/WriteTimeout/IdleTimeout) and calling server.ListenAndServe().
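A minimal sketch of that hardening, assuming the `mux` and `port` values from the snippet above; the timeout values are illustrative defaults, not project-mandated settings:

```go
package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	mux := http.NewServeMux() // stand-in for the real router
	port := "8080"            // illustrative; the real code reads this from the environment

	srv := &http.Server{
		Addr:              ":" + port,
		Handler:           mux,
		ReadHeaderTimeout: 5 * time.Second, // bounds how long a client may take to send headers
		ReadTimeout:       15 * time.Second,
		WriteTimeout:      15 * time.Second,
		IdleTimeout:       60 * time.Second, // reclaims idle keep-alive connections
	}
	log.Printf("starting server on :%s", port)
	if err := srv.ListenAndServe(); err != nil {
		log.Fatal(err)
	}
}
```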
…re environment configuration
Co-authored-by: Copilot <copilot@github.com>
- Added FetchStaticConditional function to handle conditional GET requests with If-None-Match and If-Modified-Since headers.
- Introduced FetchResult struct to encapsulate the result of the conditional fetch.
- Enhanced ParseStaticBytes function to parse GTFS zip files and return structured data.
- Updated parseStops and parseTrips functions to handle additional fields (stop_code and trip_short_name).

feat: add rate limiting middleware

- Implemented RateLimit middleware to limit requests per IP address.
- Introduced token bucket algorithm for managing request rates and bursts.
- Added clientIP function to extract the client's IP address from the request.

feat: extend provider interface for static URL retrieval

- Added StaticURL method to the Provider interface and its implementations.
- Updated Amtrak provider to return its GTFS static feed URL.

feat: register static routes for API

- Created static.go to define and register API endpoints for static data retrieval.
- Implemented handlers for fetching providers, stops, routes, trips, departures, and connections.

refactor: clean up static data structures

- Adjusted struct field comments for clarity and consistency in the spec package.
- Ensured proper JSON and database tags are maintained across all static data structures.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit review settings.
📝 Walkthrough

Adds a modular Go backend (server, collector, sync-static), PostgreSQL schema/access, GTFS parsers, providers, WS hub and processor, HTTP routes, S3 drainer, tiles tooling, CI, and a Cloudflare Worker hosting a containerized collector with R2 interception and cron wake.

Changes

Go Backend: Data, Ingestion, API, Realtime
Cloudflare Worker Collector
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ate tile generation, and improve viewer interactivity
…runs for a provider
Actionable comments posted: 4
Note: Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (25)
apps/api/middleware/ratelimit.go-86-96 (1)
86-96: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Unconditional trust of `X-Forwarded-For` allows client-side spoofing.

A direct client can set `X-Forwarded-For` and rotate arbitrary IPs, bypassing per-IP throttling. Only honor forwarded headers from trusted proxy sources (or disable XFF parsing unless explicitly configured).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/middleware/ratelimit.go` around lines 86 - 96, The clientIP function currently unconditionally trusts X-Forwarded-For and is vulnerable to client-side spoofing; update clientIP to only honor X-Forwarded-For when the immediate peer (parsed from r.RemoteAddr via net.SplitHostPort) is in a configured trusted proxy list (e.g., a function like isTrustedProxy(remoteIP) or a trustedProxies map), otherwise ignore X-Forwarded-For and return the peer IP; keep the existing logic to extract the first hop from XFF (trimSpace(xff[:i])) but gate that branch behind the trusted-proxy check so only requests coming from known proxies can influence clientIP.
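A minimal sketch of the trusted-proxy gating described above; the `trustedProxies` allowlist is a hypothetical stand-in for whatever configuration the middleware actually uses:

```go
package middleware

import (
	"net"
	"net/http"
	"strings"
)

// trustedProxies is a hypothetical allowlist; in practice it would be loaded
// from configuration rather than hard-coded.
var trustedProxies = map[string]bool{"10.0.0.1": true}

// clientIP returns the peer IP, honoring X-Forwarded-For only when the
// immediate peer is a trusted proxy.
func clientIP(r *http.Request) string {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	if !trustedProxies[host] {
		return host // untrusted peer: ignore forwarded headers entirely
	}
	xff := r.Header.Get("X-Forwarded-For")
	if xff == "" {
		return host
	}
	// Take the first hop of the comma-separated chain, as the review suggests.
	if i := strings.IndexByte(xff, ','); i >= 0 {
		xff = xff[:i]
	}
	return strings.TrimSpace(xff)
}
```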
apps/api/gtfs/realtime.go-185-188 (1)

185-188: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't turn invalid `start_date` into a zero run date.

Ignoring the parse error here collapses every bad or missing `start_date` into `0001-01-01`, which can merge unrelated runs under the same downstream keys. Return an error from this helper and let the callers skip or fail the entity explicitly.

Suggested fix

```diff
-func parseStartDate(s string) time.Time {
-	t, _ := time.Parse("20060102", s)
-	return t
+func parseStartDate(s string) (time.Time, error) {
+	if s == "" {
+		return time.Time{}, fmt.Errorf("missing start_date")
+	}
+	t, err := time.Parse("20060102", s)
+	if err != nil {
+		return time.Time{}, fmt.Errorf("parse start_date %q: %w", s, err)
+	}
+	return t, nil
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/gtfs/realtime.go` around lines 185 - 188, The helper parseStartDate currently swallows parse errors and returns time.Time zero, causing invalid/missing start_date to coalesce; change parseStartDate(s string) to return (time.Time, error) and propagate the parsing error from time.Parse instead of ignoring it, then update all callers (e.g., code paths that call parseStartDate) to handle the error by skipping or failing the GTFS-RT entity as appropriate rather than using a zero time; ensure callers check the returned error and do not treat the zero time as a valid run date.
apps/api/spec/realtime.go-67-75 (1)

67-75: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Arrival-only stop helpers misclassify origin and dwell stops.

`IsPassed()` flips true as soon as `ActualArr` exists, so an intermediate stop looks passed before departure, while an origin stop with only `ActualDep` never becomes passed. `IsLive()` has the same arrival-only bias. These helpers need to use departure when the stop has one, and fall back to arrival for terminal stops.

Suggested fix

```diff
 func (s *TrainStopTime) IsPassed() bool {
-	return s.ActualArr != nil
+	hasDeparture := s.ScheduledDep != nil || s.EstimatedDep != nil || s.ActualDep != nil
+	if hasDeparture {
+		return s.ActualDep != nil
+	}
+	return s.ActualArr != nil
 }

 // IsLive returns true if this stop still has a pending estimate.
 func (s *TrainStopTime) IsLive() bool {
-	return s.ActualArr == nil && s.EstimatedArr != nil
+	if s.IsPassed() {
+		return false
+	}
+	return s.EstimatedArr != nil || s.EstimatedDep != nil || s.ActualArr != nil
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/spec/realtime.go` around lines 67 - 75, TrainStopTime.IsPassed and IsLive wrongly check only arrival fields, causing origin/dwell stops to be misclassified; update both methods to prefer departure timestamps/estimates when present (check ActualDep then ActualArr for IsPassed, and EstimatedDep then EstimatedArr for IsLive, falling back to arrival for terminal stops) and ensure nil checks use the corresponding fields (ActualDep/ActualArr and EstimatedDep/EstimatedArr) so origin/through stops are classified by departure while terminal stops still use arrival.
apps/api/db/realtime_write.go-25-42 (1)

25-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject older stop-time snapshots during upsert.

This query always overwrites estimates and resets `last_updated` to `now()`. During backlog replay or any out-of-order ingest, an older snapshot can rewind a newer row and then look fresh because the database timestamp was rewritten. Persist `s.LastUpdated` and guard the `DO UPDATE` branch on recency.

Suggested fix

```diff
 const q = `
 INSERT INTO train_stop_times (
 	provider, trip_id, run_date, stop_sequence, stop_code,
 	estimated_arr, estimated_dep, actual_arr, actual_dep, last_updated
-) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, now())
+) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
 ON CONFLICT (provider, trip_id, run_date, stop_sequence) DO UPDATE SET
 	stop_code = EXCLUDED.stop_code,
 	estimated_arr = EXCLUDED.estimated_arr,
 	estimated_dep = EXCLUDED.estimated_dep,
 	actual_arr = COALESCE(EXCLUDED.actual_arr, train_stop_times.actual_arr),
 	actual_dep = COALESCE(EXCLUDED.actual_dep, train_stop_times.actual_dep),
-	last_updated = now()`
+	last_updated = EXCLUDED.last_updated
+WHERE train_stop_times.last_updated IS NULL
+   OR train_stop_times.last_updated <= EXCLUDED.last_updated`

 batch := &pgx.Batch{}
 for _, s := range sts {
 	batch.Queue(q,
 		s.Provider, s.TripID, s.RunDate, s.StopSequence, s.StopCode,
-		s.EstimatedArr, s.EstimatedDep, s.ActualArr, s.ActualDep,
+		s.EstimatedArr, s.EstimatedDep, s.ActualArr, s.ActualDep, s.LastUpdated,
 	)
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/realtime_write.go` around lines 25 - 42, The upsert currently always overwrites values and resets last_updated; change the SQL and batch.Queue to persist the incoming s.LastUpdated and only apply the DO UPDATE when the incoming snapshot is newer. Modify q to include last_updated in VALUES (add a $10 placeholder), add last_updated = EXCLUDED.last_updated to the SET, and add a WHERE clause to the DO UPDATE like WHERE EXCLUDED.last_updated >= train_stop_times.last_updated so older snapshots are ignored; then update the batch.Queue call to pass s.LastUpdated as the final argument. Ensure references: the q string, train_stop_times table, and the batch.Queue(...) invocation are updated accordingly.
apps/collector/src/index.ts-17-21 (1)

17-21: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat failed wake attempts as actual cron failures.

`stub.fetch()` does not reject for HTTP 4xx/5xx responses—it returns a Response with a status code that must be checked. The code logs the response but never checks `res.ok`, so wake failures silently succeed. Additionally, the catch block swallows errors without rethrowing, which masks network failures as successful cron executions. Check `res.ok` and rethrow after logging to ensure failed wakes are actually reported as failed.

Suggested fix

```diff
 try {
 	const res = await stub.fetch("https://container.internal/wake");
-	console.log("wake response:", res.body ? await res.text() : "no body");
+	const body = res.body ? await res.text() : "no body";
+	if (!res.ok) {
+		throw new Error(`wake returned ${res.status}: ${body}`);
+	}
+	console.log("wake response:", body);
 } catch (e) {
 	console.error("wake failed:", e);
+	throw e;
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/collector/src/index.ts` around lines 17 - 21, The wake call currently treats any HTTP response as success and swallows exceptions; update the try/catch around stub.fetch so you check the returned Response's res.ok (and include res.status/res.statusText in the log) and throw an Error when res.ok is false, and also rethrow the caught exception after logging in the catch block; target the stub.fetch call and the surrounding try/catch in apps/collector/src/index.ts and ensure failed wakes propagate as errors rather than being silently ignored.
apps/api/gtfs/realtime.go-124-130 (1)

124-130: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't collapse stop-id-only updates onto sequence `0`.

In GTFS-Realtime, `StopTimeUpdate` can identify a stop by either `stop_sequence` or `stop_id` alone. This parser uses `stu.GetStopSequence()`, which returns `0` when the field is unset. Any update without an explicit sequence gets stored as `0`, causing distinct stops (identified by `stop_id`) to overwrite each other on the `(provider, trip_id, run_date, stop_sequence)` key. The `GetStopSequence()` getter cannot distinguish "unset" from "explicitly 0"; check the pointer field instead. Validate that `StopSequence` is present before persisting, or resolve it from static GTFS when only `stop_id` is available.

Suggested fix

```diff
 for _, stu := range tu.StopTimeUpdate {
+	if stu.StopSequence == nil {
+		continue // or resolve stop_sequence from static GTFS using stop_id
+	}
+
 	st := spec.TrainStopTime{
 		Provider: providerID,
 		TripID:   tripID,
 		RunDate:  runDate,
 		StopCode: providerID + ":" + stu.GetStopId(),
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/gtfs/realtime.go` around lines 124 - 130, The parser is incorrectly treating unset StopSequence values (stu.GetStopSequence() returns 0) as a real sequence and collapsing stop-id-only updates; update the logic that builds spec.TrainStopTime (in the loop over StopTimeUpdate) to detect whether a sequence was actually provided before setting TrainStopTime.StopSequence — if stu does not contain an explicit stop_sequence, leave StopSequence unset/zero in the struct and use StopCode (providerID + ":" + stu.GetStopId()) as the unique key, or attempt to resolve the sequence from static GTFS data (lookup by stop_id) before assigning; ensure the code that persists/keys TrainStopTime uses this presence/lookup logic rather than blindly using stu.GetStopSequence().
apps/api/collector/storage_emitter.go-30-42 (1)

30-42: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent panic on nil `snap` / nil `Client` in `Emit`.

This method can crash on `snap.Key()` or `e.Client.Do(req)` when either pointer is nil.

💡 Proposed fix

```diff
 func (e *StorageEmitter) Emit(ctx context.Context, snap *Snapshot) error {
+	if snap == nil {
+		return fmt.Errorf("storage emitter: nil snapshot")
+	}
+
+	client := e.Client
+	if client == nil {
+		client = &http.Client{Timeout: 10 * time.Second}
+	}
+
 	body, err := json.Marshal(snap)
 	if err != nil {
 		return fmt.Errorf("storage emitter: marshal: %w", err)
 	}
 @@
-	resp, err := e.Client.Do(req)
+	resp, err := client.Do(req)
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/collector/storage_emitter.go` around lines 30 - 42, The Emit method can panic when snap or e.Client is nil; add an early nil-check for snap in StorageEmitter.Emit and return a descriptive error before calling snap.Key(), and handle a nil e.Client by either returning an error or defaulting to http.DefaultClient before calling e.Client.Do(req); ensure the URL construction and req creation happen after the snap nil-check and use the validated client when invoking e.Client.Do to avoid nil-pointer dereferences.
apps/api/collector/emitter.go-19-27 (1)

19-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard against nil snapshots to avoid collector panic.

Line 19 assumes `snap` is always non-nil; a bad caller crashes this path.

💡 Proposed fix

```diff
 import (
 	"context"
+	"fmt"
 	"log"
 )
 @@
 func (MockEmitter) Emit(_ context.Context, snap *Snapshot) error {
+	if snap == nil {
+		return fmt.Errorf("mock emitter: nil snapshot")
+	}
 	positions, stopTimes := 0, 0
 	if snap.Feed != nil {
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/collector/emitter.go` around lines 19 - 27, The MockEmitter.Emit function currently dereferences snap (and snap.Feed) without nil checks which can panic on bad callers; update MockEmitter.Emit to first check if snap == nil and return a clear error (or nil per project convention), then guard access to snap.Feed (e.g., only read snap.Feed.Positions and snap.Feed.StopTimes if snap.Feed != nil) before using snap.ProviderID and snap.Timestamp in the log so the method never panics when given a nil snapshot.
apps/api/Dockerfile.sync-static-18-27 (1)

18-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the runtime image as a non-root user.

The final stage currently runs as root, which unnecessarily increases blast radius if `sync-static` or `tippecanoe` is compromised.

🔒 Proposed hardening patch

```diff
 FROM debian:bookworm-slim
 RUN apt-get update && apt-get install -y --no-install-recommends \
 	ca-certificates libsqlite3-0 zlib1g \
 	&& rm -rf /var/lib/apt/lists/*
+RUN useradd --system --uid 10001 --create-home --shell /usr/sbin/nologin appuser \
+	&& mkdir -p /data \
+	&& chown -R appuser:appuser /data
 COPY --from=tippecanoe-build /tippecanoe/tippecanoe /usr/local/bin/tippecanoe
 COPY --from=go-build /sync-static /usr/local/bin/sync-static
 WORKDIR /data
+USER appuser
 ENTRYPOINT ["sync-static", "--upload"]
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/Dockerfile.sync-static` around lines 18 - 27, The final Docker stage runs as root (ENTRYPOINT ["sync-static", "--upload"]) which increases blast radius; create a non-root user and group (e.g., "syncuser"), chown the WORKDIR (/data) and any binaries (e.g., /usr/local/bin/sync-static and /usr/local/bin/tippecanoe) to that user, and add a USER directive before the ENTRYPOINT so the container executes sync-static as the non-root account; ensure the created user has a fixed UID/GID and a home/shell as appropriate and that file permissions allow execution.
apps/api/routes/static.go-55-57 (1)

55-57: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid exposing internal error details to clients.

`serverError` writes `err.Error()` directly to the HTTP response, which could leak sensitive implementation details (database errors, file paths, stack traces) to external users.

🔒 Proposed fix

```diff
 func serverError(w http.ResponseWriter, err error) {
+	log.Printf("internal error: %v", err)
-	writeError(w, http.StatusInternalServerError, err.Error())
+	writeError(w, http.StatusInternalServerError, "internal server error")
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/routes/static.go` around lines 55 - 57, serverError currently sends err.Error() to clients which can leak internals; update serverError to return a generic message (e.g., "internal server error") to the client via writeError while logging the full error internally (using your existing logger or log package) for diagnostics; locate the serverError function and replace the use of err.Error() in the writeError call with a constant generic message and add an internal log of err before/after calling writeError.
apps/api/db/static.go-27-67 (1)

27-67: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and normalize the feed before deleting the existing provider rows.

`SaveStaticFeed` is scoped by the `providerID` argument, but every COPY pulls `provider_id` back out of the feed rows. That means a nil or mixed-provider feed can either panic here or delete provider A and repopulate the transaction under provider B/blank IDs. Please reject nil/mismatched feeds up front, or stamp `providerID` onto every row before COPY.

Suggested guard at the transaction boundary

```diff
 func (d *DB) SaveStaticFeed(ctx context.Context, providerID string, feed *providers.StaticFeed) (SyncCounts, error) {
+	if feed == nil {
+		return SyncCounts{}, fmt.Errorf("save static feed: nil feed")
+	}
+	if err := validateStaticFeedProvider(providerID, feed); err != nil {
+		return SyncCounts{}, err
+	}
+
 	var counts SyncCounts
 	err := pgx.BeginFunc(ctx, d.pool, func(tx pgx.Tx) error {
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/static.go` around lines 27 - 67, SaveStaticFeed currently deletes rows for providerID before validating/stamping the feed rows, which allows nil/mismatched provider_id values from feed.Agencies / feed.Routes / copyAgencies / copyRoutes etc. to cause panics or repopulate under the wrong provider; fix by validating and normalizing the feed at the start of SaveStaticFeed: ensure feed is non-nil, verify each feed record's provider_id matches the providerID (or uniformly overwrite/stamp providerID onto every row in feed.Agencies, feed.Routes, feed.Stops, feed.Trips, feed.StopTimes, feed.Calendars, feed.Exceptions) before running the DELETE/COPY transaction, so the DELETE + subsequent copy* functions operate only on rows for the intended provider.
apps/api/routes/ingest.go-56-64 (1)

56-64: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject bodies that exceed `maxIngestBody` instead of silently truncating them.

`io.LimitReader(src, maxIngestBody)` stops at 32 MiB, but it does not tell you whether the payload was larger. An oversized request can be truncated and still be processed as valid JSON if the extra bytes come after the document.

Suggested limit check

```diff
-	body, err := io.ReadAll(io.LimitReader(src, maxIngestBody))
+	body, err := io.ReadAll(io.LimitReader(src, maxIngestBody+1))
 	if err != nil {
 		http.Error(w, "read body: "+err.Error(), http.StatusBadRequest)
 		return
 	}
+	if len(body) > maxIngestBody {
+		http.Error(w, "body too large", http.StatusRequestEntityTooLarge)
+		return
+	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/routes/ingest.go` around lines 56 - 64, The handler currently uses io.LimitReader(src, maxIngestBody) which silently truncates larger requests; change the read to detect overflow by reading up to maxIngestBody+1 bytes (or use io.LimitedReader with N=maxIngestBody+1) from src into body, and if the read returns more than maxIngestBody bytes return an HTTP 413 (Payload Too Large) instead of proceeding to json.Unmarshal into collector.Snapshot; keep existing error handling paths (e.g., http.Error) but add this explicit size check referencing maxIngestBody, io.LimitReader/src/body before decoding.
apps/api/cmd/server/main.go-81-81 (1)

81-81: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set timeouts on the public `http.Server`.

This listener is internet-facing, but the server is created without `ReadHeaderTimeout`/`ReadTimeout`/`IdleTimeout`. A slow client can hold connections open indefinitely and tie up the process.

Suggested hardening

```diff
-	srv := &http.Server{Addr: ":" + port, Handler: handler}
+	srv := &http.Server{
+		Addr:              ":" + port,
+		Handler:           handler,
+		ReadHeaderTimeout: 5 * time.Second,
+		ReadTimeout:       15 * time.Second,
+		WriteTimeout:      15 * time.Second,
+		IdleTimeout:       60 * time.Second,
+	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/cmd/server/main.go` at line 81, The public http.Server is created without timeouts (srv := &http.Server{Addr: ":" + port, Handler: handler}); add sensible ReadHeaderTimeout, ReadTimeout, WriteTimeout and IdleTimeout fields to the server struct to protect against slow clients (for example ReadHeaderTimeout ~10s, ReadTimeout/WriteTimeout ~15s, IdleTimeout ~60s) so that srv enforces connection/request time limits for the internet-facing listener.
apps/api/ws/handler.go-19-24 (1)

19-24: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict `CheckOrigin` before shipping to production.

The current implementation returns `true` unconditionally (lines 22–23), which disables origin validation for WebSocket upgrades. This allows any website to establish a WebSocket connection to this endpoint from a visitor's browser. Lock this down to same-origin or an explicit allowlist before merge.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/ws/handler.go` around lines 19 - 24, The upgrader's CheckOrigin currently returns true unconditionally; update the websocket.Upgrader.CheckOrigin function (the CheckOrigin closure on the upgrader variable) to validate r.Header.Get("Origin") instead of allowing all origins — either implement a same-origin check by comparing the request Origin host to r.Host or enforce an explicit allowlist of trusted origins and only return true when Origin matches an entry. Ensure the logic safely handles empty/missing Origin headers and documents the chosen allowlist or same-origin behavior.
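For reference, a minimal allowlist-based sketch, assuming the gorilla/websocket upgrader named in the comment; the `allowedOrigins` set is a hypothetical placeholder for real configuration:

```go
package ws

import (
	"net/http"

	"github.com/gorilla/websocket"
)

// allowedOrigins is a hypothetical allowlist; in practice it would come from
// configuration rather than being hard-coded.
var allowedOrigins = map[string]bool{
	"https://app.example.com": true,
}

var upgrader = websocket.Upgrader{
	CheckOrigin: func(r *http.Request) bool {
		origin := r.Header.Get("Origin")
		if origin == "" {
			// Non-browser clients send no Origin; decide explicitly whether
			// to allow them instead of falling through silently.
			return false
		}
		return allowedOrigins[origin]
	},
}
```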
apps/api/cmd/sync-static/main.go-173-231 (1)

173-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep tile generation failures provider-scoped.

`buildTiles` exits the whole process on any GeoJSON/tippecanoe/upload error, and the unchecked `os.Stat` can still panic after `GenerateTiles`. Since this helper runs inside the per-provider loop, one bad provider or missing local dependency stops every later provider from syncing. Return an `error` here, check the `Stat` result, and let `main` log/continue for that provider.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/cmd/sync-static/main.go` around lines 173 - 231, The buildTiles helper currently calls log.Fatalf and can panic on an unchecked os.Stat, killing the whole process; change its signature to return error (func buildTiles(...) error) and replace all log.Fatalf calls in buildTiles (including errors from tiles.BuildRouteGeoJSON, tiles.BuildStopGeoJSON, os.CreateTemp, file Write, tiles.GenerateTiles, tiles.Upload) with returned errors (fmt.Errorf with context and providerID), check the result/error from os.Stat before using stat.Size(), and ensure callers in main handle the returned error by logging the provider-scoped failure and continuing to the next provider instead of exiting the process.
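A compact sketch of the provider-scoped shape this suggests; `generateTiles` is a hypothetical stand-in for the real GeoJSON/tippecanoe/upload pipeline:

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// generateTiles is a hypothetical stand-in for the real GeoJSON build,
// tippecanoe invocation, and upload steps.
func generateTiles(providerID, outPath string) error { return nil }

// buildTiles returns an error instead of calling log.Fatalf, and checks the
// os.Stat result before using it.
func buildTiles(providerID, outPath string) error {
	if err := generateTiles(providerID, outPath); err != nil {
		return fmt.Errorf("generate tiles for %s: %w", providerID, err)
	}
	stat, err := os.Stat(outPath)
	if err != nil {
		return fmt.Errorf("stat %s: %w", outPath, err)
	}
	log.Printf("tiles for %s: %d bytes", providerID, stat.Size())
	return nil
}

func main() {
	for _, p := range []string{"cta", "metra"} { // illustrative provider IDs
		if err := buildTiles(p, p+".pmtiles"); err != nil {
			log.Printf("skipping %s: %v", p, err) // one bad provider no longer kills the run
			continue
		}
	}
}
```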
apps/api/tiles/viewer.html-390-417 (1)

390-417: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid `setHTML()` with feed-backed fields.

`name`, `provider_id`, `route_id`, `stop_id`, and `color` all come from upstream tile data and are interpolated into HTML/CSS verbatim here. A malformed or hostile feed can inject markup into the popup. Build the popup content with DOM nodes and `textContent` instead of `setHTML()`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/tiles/viewer.html` around lines 390 - 417, The popup creation currently uses maplibregl.Popup().setHTML(...) with feed-backed fields (name, provider_id, route_id, stop_id, color) which allows HTML/JS injection; replace the setHTML usage in both click handlers for "transit-routes" and "transit-stops" by constructing the popup content via DOM APIs: create container DIVs, create a route-swatch element and assign its style.backgroundColor = color (after validating/sanitizing the color), set text nodes via textContent for name, shortName, provider_id, route_id, code and stopID (falling back to "Unknown stop" or "unknown" as before), append children to the container and pass that element to Popup.setDOMContent(...) so no untrusted strings are inserted as HTML. Ensure you reference the existing maplibregl.Popup calls and variables (name, shortName, props, color, code, stopID) when making the change.
apps/api/drainer/drainer.go-64-66 (1)

64-66: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

One failing provider can starve every later provider backlog.

`fetchAll()` loads keys for every `backlog/{provider}/...` prefix, but `Replay()` stops on the first `ProcessErr`. That means a persistent failure in an earlier provider prefix can prevent later providers from ever being replayed or deleted. Replay per provider prefix instead of across the combined backlog.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/drainer/drainer.go` around lines 64 - 66, Replay currently processes the combined slice returned by fetchAll() and stops on the first ProcessErr, allowing one failing provider to block later providers; change the logic to group keys/items by provider prefix (the backlog/{provider}/... key namespace from fetchAll()) and call Replay(ctx, providerItems, d.Processor) for each provider separately, ensuring you still log per-provider counts and handle/collect errors per provider rather than aborting the entire combined replay flow.
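A sketch of per-provider replay along these lines; `groupByProvider` and the `replay` callback are illustrative, not the drainer's actual API:

```go
package drainer

import (
	"context"
	"log"
	"strings"
)

// groupByProvider splits backlog keys shaped like "backlog/{provider}/..."
// into one slice per provider.
func groupByProvider(keys []string) map[string][]string {
	groups := make(map[string][]string)
	for _, k := range keys {
		parts := strings.SplitN(k, "/", 3)
		if len(parts) < 3 || parts[0] != "backlog" {
			continue
		}
		groups[parts[1]] = append(groups[parts[1]], k)
	}
	return groups
}

// replayAll replays each provider's backlog independently, so a persistent
// failure in one provider no longer blocks the others.
func replayAll(ctx context.Context, keys []string, replay func(context.Context, []string) error) {
	for provider, group := range groupByProvider(keys) {
		if err := replay(ctx, group); err != nil {
			log.Printf("drainer: replay %s failed: %v", provider, err)
		}
	}
}
```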
apps/api/tiles/geojson.go-140-151 (1)

140-151: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't collapse every shared shape down to one route.

`routeByShape` stores a single `spec.Route` per `shapeKey` and ignores later trips that reuse the same shape. GTFS feeds do reuse shapes across multiple route IDs, so the generated tiles can miss routes or show the wrong name/color depending on trip order. Emit features per `(shapeKey, routeID)` pair, or model shared geometry separately.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/tiles/geojson.go` around lines 140 - 151, The current logic collapses multiple routes that share the same shape by using routeByShape keyed only by shapeKey, causing later routes with the same shape to be ignored; change the keying so features are emitted per (shapeKey, routeID) instead of just shapeKey — e.g., include t.RouteID in the map key or use a nested map keyed by shapeKey then routeID (modify routeByShape usage and the shapeKey construction where trips are iterated), and ensure downstream consumers that read routeByShape are updated to use the new composite key (or iterate nested map) so every distinct routeId/shape pair produces a feature.
apps/api/providers/amtrak/amtrak.go-92-95 (1)

92-95: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout to the HTTP client for the Amtrak realtime fetch.

The request is created with `http.NewRequestWithContext(ctx, ...)` at line 86, which respects the caller's context deadline. However, it then uses `http.DefaultClient.Do(req)`, which has no built-in timeout. If the caller doesn't provide a deadline on `ctx`, this call can hang indefinitely on a slow upstream response. Create an `http.Client` with a bounded timeout (e.g., 10–30 seconds) and use it here instead of `http.DefaultClient`.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/providers/amtrak/amtrak.go` around lines 92 - 95, The code calls http.DefaultClient.Do(req) after creating the request with http.NewRequestWithContext(ctx, ...), which can hang if ctx has no deadline; replace use of http.DefaultClient with a dedicated http.Client that has a bounded Timeout (e.g., 10–30s) and call client.Do(req) instead; update the fetch code in amtrak.go (the block around http.NewRequestWithContext and the resp, err := http.DefaultClient.Do(req) line) to instantiate an http.Client with Timeout and use that client to perform the request.
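A minimal sketch of the suggested client change; the `fetch` helper and the 15-second timeout are illustrative:

```go
package amtrak

import (
	"context"
	"net/http"
	"time"
)

// client bounds the whole request even when the caller's context carries no
// deadline, unlike http.DefaultClient.
var client = &http.Client{Timeout: 15 * time.Second}

// fetch is a minimal sketch; the real code builds the Amtrak realtime URL and
// decrypts the payload after this call.
func fetch(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return client.Do(req) // instead of http.DefaultClient.Do(req)
}
```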
apps/api/tiles/viewer.html-137-139 (1)

137-139: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize `?tiles=` overrides to a `pmtiles://` URL.

The comment documents that callers can pass a CDN URL, but raw `https://...pmtiles` values assigned directly to the vector source `url` property will fail. MapLibre GL treats the `url` as a TileJSON endpoint unless the URL uses a registered protocol scheme. Since `pmtiles://` is registered as a protocol handler, plain HTTP(S) URLs must be prefixed with it to work correctly.

Suggested fix

```diff
-const tilesUrl = params.get("tiles") || "pmtiles://http://localhost:3456/tiles/transit.pmtiles";
+const rawTilesUrl = params.get("tiles") || "http://localhost:3456/tiles/transit.pmtiles";
+const tilesUrl = rawTilesUrl.startsWith("pmtiles://")
+	? rawTilesUrl
+	: `pmtiles://${rawTilesUrl}`;
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/tiles/viewer.html` around lines 137 - 139, Normalize the tiles override so plain HTTP(S) URLs are wrapped with the pmtiles:// scheme and existing pmtiles:// values are left unchanged: read the query param via params.get("tiles"), then if it startsWith("http://") or "https://" set tilesUrl = "pmtiles://" + tilesParam, else if it already startsWith("pmtiles://") set tilesUrl = tilesParam, otherwise fall back to the existing default ("pmtiles://http://localhost:3456/tiles/transit.pmtiles"); update the code around params and tilesUrl to perform this normalization and avoid producing double-prefixed values.
apps/api/db/static_read.go-255-259 (1)

255-259: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't fabricate `LastUpdated` when no realtime row exists.

`COALESCE(tst.last_updated, now())` makes untouched stops look freshly updated even when `train_stop_times` has no matching row. That erases the difference between "no realtime yet" and "realtime just arrived".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/static_read.go` around lines 255 - 259, The query currently uses COALESCE(tst.last_updated, now()) which fabricates a fresh LastUpdated timestamp when no realtime row exists; change this to preserve NULL by removing the COALESCE and using tst.last_updated directly (or explicitly CASE WHEN tst.id IS NULL THEN NULL ELSE tst.last_updated END) so that missing train_stop_times rows remain NULL rather than being replaced with now(); update the expression replacing COALESCE(tst.last_updated, now()) accordingly in the SELECT that references tst.last_updated.
apps/api/gtfs/static.go-47-118 (1)

47-118: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when required GTFS files are missing.

Right now `routes.txt`, `stops.txt`, `trips.txt`, and `stop_times.txt` are effectively optional: if a member is absent, parsing still succeeds and returns empty slices. In the sync path that can turn a partial/bad ZIP into a "successful" refresh and overwrite previously-ingested static data with an incomplete feed.

Suggested direction

```diff
+func requireFiles(files map[string]*zip.File, names ...string) error {
+	for _, name := range names {
+		if _, ok := files[name]; !ok {
+			return fmt.Errorf("gtfs: missing required file %s", name)
+		}
+	}
+	return nil
+}
+
 func ParseStaticBytes(
 	providerID string,
 	data []byte,
 ) (
 @@
 	files := indexZip(zr)
+	if err := requireFiles(files, "routes.txt", "stops.txt", "trips.txt", "stop_times.txt"); err != nil {
+		return nil, nil, nil, nil, nil, nil, nil, nil, err
+	}
```

Also applies to: 205-268
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/gtfs/static.go` around lines 47 - 118, The code currently treats routes.txt, stops.txt, trips.txt and stop_times.txt as optional because it only parses them when present; update the logic in the function that builds `files` (use the same scope where `files` and `providerID` are used) to verify presence of these required files up front and return a descriptive error if any are missing, rather than proceeding and returning empty slices; specifically, before calling `parseRoutes`, `parseStops`, `parseTrips`, or `parseStopTimes` (and their associated log lines), check for keys "routes.txt", "stops.txt", "trips.txt", and "stop_times.txt" in `files` and return an error that mentions the missing filename and `providerID` so the sync fails fast and does not overwrite existing data with a partial feed.
apps/api/db/static_read.go-592-601 (1)

592-601: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

`GetTrainService` ignores feeds that use only `calendar_dates.txt`.

The parser explicitly treats `calendar.txt` as optional, but this query derives the range only from `service_calendars`. For providers backed solely by `calendar_dates.txt`, this method will always return `ErrNotFound`, and even mixed feeds will miss added/removed exception dates.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/static_read.go` around lines 592 - 601, GetTrainService currently derives service date range only from service_calendars, so providers that use only calendar_dates (or have exception dates) are ignored; update the SQL in the d.pool.QueryRow call (the query that references service_calendars, trips, providerID, trainNumber and uses nullIfEmpty(from)/nullIfEmpty(to)) to compute MIN/MAX over a union of service_calendars and calendar_dates for the matching service_ids (i.e., include dates from calendar_dates where service_id is in the matching set and provider_id = $1), then apply the existing COALESCE/to_char logic so added/removed exception dates are considered and GetTrainService no longer returns ErrNotFound for calendar_dates-only feeds.
apps/api/gtfs/static.go-403-404 (1)

403-404: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop silently coercing malformed GTFS values to zero.

These conversions drop `ParseFloat`, `Atoi`, and `time.Parse` errors, so a bad feed row becomes `(0,0)` coordinates, `stop_sequence = 0`, or `0001-01-01` dates. That silently corrupts the stored feed instead of failing the import with a useful error.

Also applies to: 453, 476-477, 502-503, 521-523
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/gtfs/static.go` around lines 403 - 404, The code is silently ignoring parsing errors for GTFS fields (e.g., strconv.ParseFloat for stop_lat/stop_lon, strconv.Atoi for stop_sequence, time.Parse for dates) which converts bad values to zero; update each parsing site (the ParseFloat calls producing lat/lon, the Atoi calls for stop_sequence, and time.Parse uses) to check the returned error and propagate a descriptive error instead of discarding it—return or bubble up fmt.Errorf that includes the field name and raw value (e.g., "invalid stop_lat: '%s': %w") so the import fails with actionable diagnostics rather than storing corrupted defaults; apply this change to all affected parsing locations in static.go (the lat/lon ParseFloat calls and the other mentioned ParseFloat/Atoi/time.Parse occurrences).
apps/api/db/static_read.go-200-212 (1)

200-212: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use Earth radius in the haversine filter.

`2 * asin(sqrt(...))` returns an angular distance in radians, so multiplying it by `111000` is incorrect here. That underestimates the real distance by about 57x, which means `ListStopsNearby` can return stops well outside the requested radius.

Suggested fix

```diff
-	AND 111000.0 * 2 * asin(sqrt(
+	AND 6371000.0 * 2 * asin(sqrt(
 		power(sin(radians(lat - $5)/2), 2) +
 		cos(radians($5)) * cos(radians(lat)) *
 		power(sin(radians(lon - $6)/2), 2)
 	)) <= $7
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/static_read.go` around lines 200 - 212, The haversine expression in the SQL uses 111000.0 * 2 * asin(...) which incorrectly treats the angular distance as degrees; replace the magic constant with Earth's radius in meters so distance = earthRadiusMeters * 2 * asin(sqrt(...)). Update the SQL in q (and any comparisons using $7) to multiply the angular distance by a named constant (e.g., earthRadiusMeters = 6371000.0) or a parameter, and adjust any callers of ListStopsNearby (or the function that builds q) to pass/expect the radius in meters consistently.
🟡 Minor comments (5)
apps/collector/.gitignore-4-5 (1)
4-5: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Wildcard ignore patterns are malformed and won't match expected files.

Several entries use `_` or an escaped `\*` where `*` is needed, so logs/cache/artifacts may still be committed.

Suggested fix (representative lines)

```diff
-_.log
+*.log
-npm-debug.log_
+npm-debug.log*
-report.[0-9]_.[0-9]_.[0-9]_.[0-9]_.json
+report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json
-\*.pid.lock
+*.pid.lock
-\*.lcov
+*.lcov
-\*.tsbuildinfo
+*.tsbuildinfo
-\*.tgz
+*.tgz
-.pnp.\*
+.pnp.*
```

Also applies to: 13, 18-20, 29, 62, 89, 159

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/collector/.gitignore` around lines 4 - 5, Several .gitignore entries use underscores or escaped asterisks instead of real glob wildcards (e.g., "_.log" and "npm-debug.log_") so logs/cache/artifacts still get committed; update those entries to use proper glob patterns (replace leading/trailing underscores with * and remove backslashes before *), e.g., change patterns like "_.log", "npm-debug.log_", and any "\*" occurrences to use "*.log", "npm-debug.log*", etc., and apply the same fixes to the other affected entries referenced (lines 13, 18-20, 29, 62, 89, 159) so the ignore rules actually match the intended files.
apps/api/middleware/ratelimit.go-16-21 (1)

16-21: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate `rate` and `burst` at middleware construction.

Without guards, non-positive values can create invalid bucket state and unpredictable allow/deny behavior.

Suggested fail-fast guard

```diff
 func RateLimit(rate float64, burst int) func(http.Handler) http.Handler {
+	if rate <= 0 || burst <= 0 {
+		panic("RateLimit: rate and burst must be > 0")
+	}
 	rl := &limiter{
 		rate:   rate,
 		burst:  float64(burst),
 		bucket: make(map[string]*tokenBucket),
 	}
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/middleware/ratelimit.go` around lines 16 - 21, Validate inputs in RateLimit before constructing the limiter: check that rate > 0 and burst > 0 and fail fast (panic or return an error) with a clear message if invalid; perform this check before converting burst to float64 and before initializing bucket/tokenBucket state so you never create an invalid limiter instance. Reference the RateLimit function and the limiter struct fields (rate, burst, bucket) when adding the guard and error message.
apps/api/cmd/server/.env.example-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a trailing newline.

The file is missing a newline at the end. Some tools and POSIX conventions expect files to end with a newline.

🔧 Proposed fix

```diff
 R2_BUCKET=
 R2_ENDPOINT=
 R2_ACCESS_KEY_ID=
 R2_SECRET_ACCESS_KEY=
+
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/cmd/server/.env.example` at line 13, The .env.example file ends without a newline which can break POSIX tools; open the file containing the R2_SECRET_ACCESS_KEY entry and add a single trailing newline (blank line at end of file) so the file ends with a newline character, then save the file.
README.md-143-165 (1)

143-165: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Account for `cmd/sync-static` in the backend overview.

This section says the backend is split into two Go binaries, but the README also documents `./cmd/sync-static` as a third backend binary. That mismatch makes the setup story easy to misread.

Suggested fix

```diff
-The backend is split into two Go binaries plus a thin Cloudflare Worker that hosts the collector container.
+The backend is split into three Go binaries plus a thin Cloudflare Worker that hosts the collector container.
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 143 - 165, The README backend overview omits the third Go binary cmd/sync-static; update the paragraph that currently states "The backend is split into two Go binaries" to mention the third binary and add a short bullet (parallel to the entries for apps/api/cmd/server and apps/api/cmd/collector) describing cmd/sync-static: what it does, where it lives (cmd/sync-static), and how to run it (example go run command and any env vars), and adjust the introductory sentence to "three Go binaries" so readers see the full setup story.
apps/api/cmd/collector/main.go-42-45 (1)

42-45: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Guard against zero providers to prevent panic.

If the registry is empty (e.g., all providers commented out during development), `len(all)` would be 0, causing a division-by-zero panic on line 43.

🛡️ Proposed fix

```diff
 all := registry.All()
+if len(all) == 0 {
+	log.Printf("collector: no providers registered, exiting")
+	return
+}
 stagger := pollInterval / time.Duration(len(all))
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/cmd/collector/main.go` around lines 42 - 45, The code computes stagger := pollInterval / time.Duration(len(all)) and will panic if registry.All() returns an empty slice; guard by checking count := len(all) before dividing: if count == 0, log a clear message (using log.Printf or process logger) and either return/exit or set a safe default (e.g., stagger = pollInterval or 0) to avoid division-by-zero; update the code around the call to registry.All(), the len(all) usage, and the stagger assignment in main (where pollInterval and stagger are defined) accordingly.
🧹 Nitpick comments (7)
apps/api/db/postgres.go (1)
41-44: ⚡ Quick win

Add error context for schema application failures.

The error from `applySchema` is returned without context. When schema application fails, the error message won't indicate what operation failed.

♻️ Proposed fix

```diff
 if err := applySchema(ctx, pool); err != nil {
 	pool.Close()
-	return nil, err
+	return nil, fmt.Errorf("db: apply schema: %w", err)
 }
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/postgres.go` around lines 41 - 44, The call to applySchema returns errors without context; update the error handling around applySchema(ctx, pool) in the function where pool.Close() is invoked so the returned error wraps or annotates the original (e.g., using fmt.Errorf("applySchema failed: %w", err) or errors.Wrap) to indicate the schema application step failed, and ensure pool.Close() is still called before returning the wrapped error; reference applySchema and pool.Close() to locate the block to modify.
apps/api/tiles/upload.go (1)

64-70: 💤 Low value

Consider setting `ContentLength` explicitly for S3-compatible storage.

While the AWS SDK can often determine the content length from seekable readers like `*os.File`, some S3-compatible services (including R2) may benefit from an explicit `ContentLength` to avoid potential issues with chunked transfers or retries.

♻️ Proposed fix

```diff
 _, err = client.PutObject(ctx, &s3.PutObjectInput{
 	Bucket:        aws.String(bucket),
 	Key:           aws.String(objectKey),
 	Body:          f,
+	ContentLength: aws.Int64(stat.Size()),
 	ContentType:   aws.String(contentType),
 	CacheControl:  aws.String(cacheControl),
 })
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/tiles/upload.go` around lines 64 - 70, The PutObject call should explicitly set ContentLength for S3-compatible targets; before calling client.PutObject (the call that passes Bucket, Key, Body, ContentType, CacheControl), obtain the size of the file reader (e.g., use os.File.Stat() to get fileInfo.Size() or f.Seek to compute length and then seek back) and add ContentLength: aws.Int64(size) to the s3.PutObjectInput so uploads avoid chunked-transfer issues with services like R2.
apps/collector/src/collector-container.ts (1)

31-40: ⚡ Quick win

Consider adding error handling for R2 put failures.

If `BACKLOG_BUCKET.put()` fails, the error bubbles up as an unhandled rejection. Wrapping in try-catch and returning a 500 response would provide clearer failure semantics to the container.

♻️ Proposed fix

```diff
 CollectorContainer.outboundByHost = {
 	[R2_BACKLOG_HOST]: async (req, env, _ctx) => {
 		if (req.method !== "PUT") return new Response("method not allowed", { status: 405 });
 		const url = new URL(req.url);
 		const key = url.pathname.replace(/^\//, "");
 		if (!key.startsWith("backlog/")) return new Response("forbidden key prefix", { status: 403 });
-		await env.BACKLOG_BUCKET.put(key, await req.arrayBuffer());
-		return new Response(null, { status: 204 });
+		try {
+			await env.BACKLOG_BUCKET.put(key, await req.arrayBuffer());
+			return new Response(null, { status: 204 });
+		} catch (err) {
+			console.error("R2 put failed:", err);
+			return new Response("internal error", { status: 500 });
+		}
 	}
 };
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/collector/src/collector-container.ts` around lines 31 - 40, The R2 PUT handler assigned to CollectorContainer.outboundByHost for R2_BACKLOG_HOST currently calls env.BACKLOG_BUCKET.put(...) without error handling; wrap the body of that async handler (the function assigned to [R2_BACKLOG_HOST]) in a try-catch that catches errors from await req.arrayBuffer() and env.BACKLOG_BUCKET.put(), log the caught error (or attach it to an existing logger) and return a Response with status 500 and an explanatory message on failure, while preserving the existing 405/403 checks and returning 204 on success.
apps/api/db/schema.sql (1)

159-160: 💤 Low value

Potentially redundant index.

The index `idx_tst_run` on `(provider, trip_id, run_date)` duplicates the leading columns of the primary key `(provider, trip_id, run_date, stop_sequence)`. PostgreSQL can efficiently use the PK index for queries filtering on these three columns. Consider removing this index unless you have specific query patterns that benefit from it (e.g., index-only scans where you need only these columns).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/db/schema.sql` around lines 159 - 160, The added index idx_tst_run on table train_stop_times duplicates the leading columns of the table's primary key (provider, trip_id, run_date, stop_sequence) and is likely redundant; either remove the CREATE INDEX statement for idx_tst_run or justify/retain it only if you have specific query patterns (e.g., index-only scans on provider, trip_id, run_date) that benefit from a narrower index—locate the CREATE INDEX IF NOT EXISTS idx_tst_run ... and delete it (or document the query use-case) to avoid unnecessary index maintenance costs.
apps/api/Dockerfile.sync-static (1)

6-7: ⚡ Quick win

Pin `tippecanoe` to an immutable ref for reproducible builds.

Cloning the default branch head makes image output non-deterministic and weakens supply-chain control. The latest stable release is 2.79.0; pin by commit SHA or release tag for deterministic builds.

📌 Suggested ref pinning approach

```diff
+ARG TIPPECANOE_REF=<commit-sha-or-release-tag>
-RUN git clone --depth 1 https://github.com/felt/tippecanoe.git /tippecanoe \
+RUN git clone --depth 1 --branch "${TIPPECANOE_REF}" https://github.com/felt/tippecanoe.git /tippecanoe \
 	&& cd /tippecanoe && make -j$(nproc)
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/Dockerfile.sync-static` around lines 6 - 7, The Dockerfile's git clone for tippecanoe should be pinned to an immutable release to ensure reproducible builds; replace the current "git clone --depth 1 https://github.com/felt/tippecanoe.git /tippecanoe" with a clone that checks out a specific tag or commit (for example use "--branch v2.79.0 --depth 1" or clone then "git checkout <COMMIT_SHA>") so the subsequent "cd /tippecanoe && make -j$(nproc)" builds a deterministic version of tippecanoe.
69-75: ⚡ Quick win. Consider restricting image push to main branch only.

With `push: true`, images are pushed on every PR, not just main branch merges. This clutters the registry with unreviewed code and may not be intentional.

♻️ Proposed fix to push only on main

```diff
       - name: Build & push
         uses: docker/build-push-action@v6
         with:
           context: ./apps/api
           file: ./apps/api/Dockerfile.server
-          push: true
+          push: ${{ github.ref == 'refs/heads/main' }}
           tags: ${{ steps.meta.outputs.tags }}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/api-server.yml around lines 69 - 75, The "Build & push" Docker step is currently pushing images for every run because push: true; change it so images are only pushed from the main branch by making the push conditional (e.g., compute push as an expression or add an if: condition on the "Build & push" step that checks github.ref == 'refs/heads/main') and keep tags and build context unchanged; update the step that uses docker/build-push-action@v6 and the push property (or add the step-level if) so the action only pushes when on main.apps/api/collector/fallback_emitter.go (1)
18-27: 💤 Low value. Add nil guards for exported emitter fields.

`Primary` and `Fallback` are exported fields, so external code could construct a `FallbackEmitter` with nil values, causing a panic when `Emit` is called. While `buildEmitter` in `main.go` correctly ensures both are set, defensive checks would make this safer.

🛡️ Proposed defensive fix

```diff
 func (e *FallbackEmitter) Emit(ctx context.Context, snap *Snapshot) error {
+	if e.Primary == nil {
+		if e.Fallback == nil {
+			return ErrNoEmitter
+		}
+		return e.Fallback.Emit(ctx, snap)
+	}
 	if err := e.Primary.Emit(ctx, snap); err != nil {
 		log.Printf("collector[%s]: primary emit failed, falling back: %v", snap.ProviderID, err)
+		if e.Fallback == nil {
+			return err
+		}
 		if fbErr := e.Fallback.Emit(ctx, snap); fbErr != nil {
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/collector/fallback_emitter.go` around lines 18 - 27, The Emit method of FallbackEmitter must defensively check for nil exported fields Primary and Fallback before calling them: in FallbackEmitter.Emit, if Primary is nil skip calling it (or return an error) and attempt to call Fallback if non-nil; if Primary.Emit returns an error and Fallback is nil return a clear wrapped error (e.g., "primary failed (%v) and no fallback"), and if both Primary and Fallback are nil return a descriptive error; update the error messages to reference the failing component and wrap underlying errors using fmt.Errorf to preserve context.
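For context, a self-contained sketch of the fully guarded `Emit` path the truncated diff above points toward; the `ErrNoEmitter` sentinel and the `Emitter`/`Snapshot` definitions here are illustrative assumptions, not the repo's actual types:

```go
package collector

import (
	"context"
	"errors"
	"log"
)

// ErrNoEmitter is a hypothetical sentinel for the all-nil case.
var ErrNoEmitter = errors.New("fallback emitter: no primary or fallback configured")

// Emitter and Snapshot stand in for the real types in this package.
type Emitter interface {
	Emit(ctx context.Context, snap *Snapshot) error
}

type Snapshot struct{ ProviderID string }

type FallbackEmitter struct {
	Primary  Emitter
	Fallback Emitter
}

// Emit tries Primary first and falls back to Fallback, tolerating nil fields.
func (e *FallbackEmitter) Emit(ctx context.Context, snap *Snapshot) error {
	if e.Primary == nil {
		if e.Fallback == nil {
			return ErrNoEmitter
		}
		return e.Fallback.Emit(ctx, snap)
	}
	err := e.Primary.Emit(ctx, snap)
	if err == nil {
		return nil
	}
	log.Printf("collector[%s]: primary emit failed, falling back: %v", snap.ProviderID, err)
	if e.Fallback == nil {
		return err
	}
	if fbErr := e.Fallback.Emit(ctx, snap); fbErr != nil {
		return errors.Join(err, fbErr)
	}
	return nil
}
```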
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fbcf9df3-26c0-457b-97df-8689c8cd1866
⛔ Files ignored due to path filters (3)

- apps/api/go.sum is excluded by `!**/*.sum`
- apps/collector/package-lock.json is excluded by `!**/package-lock.json`
- pnpm-lock.yaml is excluded by `!**/pnpm-lock.yaml`
📒 Files selected for processing (72)

- .github/workflows/api-server.yml
- .gitignore
- README.md
- apps/api/.dockerignore
- apps/api/.env.example
- apps/api/Dockerfile.server
- apps/api/Dockerfile.sync-static
- apps/api/cmd/api/main.go
- apps/api/cmd/collector/.env.example
- apps/api/cmd/collector/main.go
- apps/api/cmd/server/.env.example
- apps/api/cmd/server/main.go
- apps/api/cmd/sync-static/main.go
- apps/api/collector/emitter.go
- apps/api/collector/fallback_emitter.go
- apps/api/collector/http_emitter.go
- apps/api/collector/poller.go
- apps/api/collector/snapshot.go
- apps/api/collector/storage_emitter.go
- apps/api/config/env.go
- apps/api/db/feed_versions.go
- apps/api/db/migrations.go
- apps/api/db/postgres.go
- apps/api/db/realtime_write.go
- apps/api/db/schema.sql
- apps/api/db/static.go
- apps/api/db/static_read.go
- apps/api/drainer/drainer.go
- apps/api/drainer/replay.go
- apps/api/go.mod
- apps/api/gtfs/realtime.go
- apps/api/gtfs/static.go
- apps/api/middleware/ratelimit.go
- apps/api/providers/amtrak/amtrak.go
- apps/api/providers/amtrak/types.go
- apps/api/providers/base/base.go
- apps/api/providers/brightline/brightline.go
- apps/api/providers/cta/cta.go
- apps/api/providers/metra/metra.go
- apps/api/providers/metrotransit/metrotransit.go
- apps/api/providers/providers.go
- apps/api/providers/trirail/trirail.go
- apps/api/realtime/process.go
- apps/api/routes/ingest.go
- apps/api/routes/routes.go
- apps/api/routes/static.go
- apps/api/spec/realtime.go
- apps/api/spec/static.go
- apps/api/tiles/geojson.go
- apps/api/tiles/tippecanoe.go
- apps/api/tiles/upload.go
- apps/api/tiles/viewer.html
- apps/api/ws/handler.go
- apps/api/ws/hub.go
- apps/api/ws/poller.go
- apps/collector/.editorconfig
- apps/collector/.gitignore
- apps/collector/.prettierrc
- apps/collector/AGENTS.md
- apps/collector/container/Dockerfile
- apps/collector/package.json
- apps/collector/src/collector-container.ts
- apps/collector/src/index.ts
- apps/collector/test/env.d.ts
- apps/collector/test/index.spec.ts
- apps/collector/test/tsconfig.json
- apps/collector/tsconfig.json
- apps/collector/vitest.config.mts
- apps/collector/worker-configuration.d.ts
- apps/collector/wrangler.jsonc
- package.json
- pnpm-workspace.yaml
💤 Files with no reviewable changes (2)
- apps/api/.env.example
- apps/api/cmd/api/main.go
Actionable comments posted: 6
🧹 Nitpick comments (1)
apps/api/cmd/sync-static/main.go (1)
191-215: 💤 Low value. Consider deferring `Close()` immediately after `CreateTemp` for safer resource handling.

The current pattern manually closes files on error paths and before the next operation, which works correctly for normal execution. However, deferring `Close()` immediately after `CreateTemp` is more idiomatic Go and provides safety against panics or future code changes that might introduce new error paths.

Note: Calling `Close()` twice is safe in Go (subsequent calls return an error but don't panic).

♻️ Suggested refactor

```diff
 routesTmpFile, err := os.CreateTemp("", fmt.Sprintf("tracky-%s-routes-*.geojson", providerID))
 if err != nil {
 	return fmt.Errorf("%s: creating routes temp file: %w", providerID, err)
 }
 routesTmpPath := routesTmpFile.Name()
+defer routesTmpFile.Close()
 defer os.Remove(routesTmpPath)
 if _, err := routesTmpFile.Write(routesGeoJSON); err != nil {
-	routesTmpFile.Close()
 	return fmt.Errorf("%s: writing routes GeoJSON: %w", providerID, err)
 }
-routesTmpFile.Close()
 stopsTmpFile, err := os.CreateTemp("", fmt.Sprintf("tracky-%s-stops-*.geojson", providerID))
 if err != nil {
 	return fmt.Errorf("%s: creating stops temp file: %w", providerID, err)
 }
 stopsTmpPath := stopsTmpFile.Name()
+defer stopsTmpFile.Close()
 defer os.Remove(stopsTmpPath)
 if _, err := stopsTmpFile.Write(stopsGeoJSON); err != nil {
-	stopsTmpFile.Close()
 	return fmt.Errorf("%s: writing stops GeoJSON: %w", providerID, err)
 }
-stopsTmpFile.Close()
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/cmd/sync-static/main.go` around lines 191 - 215, The temp files created with os.CreateTemp (routesTmpFile and stopsTmpFile) should have Close() deferred immediately after successful creation to ensure they're closed on all control paths; update main.go so that right after creating routesTmpFile and stopsTmpFile you call defer routesTmpFile.Close() and defer stopsTmpFile.Close(), keep the existing os.Remove defers for routesTmpPath/stopsTmpPath, and remove the explicit routesTmpFile.Close() and stopsTmpFile.Close() calls later in the function (or keep them only if you want to ignore their returned errors) so you don't rely on manual closes in multiple places.
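A standalone sketch of the defer-after-`CreateTemp` pattern this nitpick describes; the `writeTemp` helper and its file-name pattern are placeholders, and the explicit `Close` before handing the path off is one reasonable variation (the deferred `Close` then acts as a safety net, relying on double-`Close` being panic-free):

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// writeTemp writes data to a temp file and returns its path. Close is
// deferred immediately after creation, so the file is closed on every
// control path, including panics.
func writeTemp(pattern string, data []byte) (string, error) {
	f, err := os.CreateTemp("", pattern)
	if err != nil {
		return "", fmt.Errorf("creating temp file: %w", err)
	}
	defer f.Close() // safety net; a second Close only returns an error

	if _, err := f.Write(data); err != nil {
		os.Remove(f.Name())
		return "", fmt.Errorf("writing temp file: %w", err)
	}
	if err := f.Close(); err != nil { // flush before another process reads the path
		os.Remove(f.Name())
		return "", fmt.Errorf("closing temp file: %w", err)
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTemp("demo-*.geojson", []byte(`{"type":"FeatureCollection","features":[]}`))
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(path)
	fmt.Println("wrote", path)
}
```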
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/db/realtime_write.go`:
- Around line 30-37: The upsert conflict key in the INSERT/ON CONFLICT for
train_stop_times uses (provider, trip_id, run_date, stop_sequence) which allows
multiple stop-id-only updates (where stop_sequence is zero/missing) to collide
and overwrite each other; change the conflict target to include the actual stop
identity used by ingestion (e.g., replace or add stop_id or stop_code to the
unique key) and update the ON CONFLICT clause to match that unique constraint,
and add a schema/index migration to create the corresponding UNIQUE index (e.g.,
UNIQUE(provider, trip_id, run_date, stop_id)) so the database enforces
uniqueness by stop identity rather than sequence.
In `@apps/api/drainer/drainer.go`:
- Around line 39-43: The Run method currently calls time.NewTicker(d.Interval)
which will panic if d.Interval <= 0; in Drainer.Run validate d.Interval (or set
a sensible default, e.g. time.Second) before creating the ticker. Locate the Run
method on type Drainer and add a guard that checks d.Interval <= 0 and assigns a
default positive duration (or return an error) prior to calling time.NewTicker
so the ticker creation cannot panic.
In `@apps/api/gtfs/realtime.go`:
- Around line 189-197: The code currently uses http.DefaultClient.Do(req) and
io.ReadAll(resp.Body) which lack an explicit timeout and response-size cap;
replace usage of http.DefaultClient with a configured http.Client that sets a
sensible Timeout (e.g., 10s) and when reading the response wrap resp.Body with a
size-limiting reader (e.g., io.LimitReader or http.MaxBytesReader) before
calling io.ReadAll to avoid unbounded memory use; update the call sites around
the req/resp variables (where http.DefaultClient.Do is invoked and
io.ReadAll(resp.Body) is called) to use the new client and bounded reader,
following the pattern used in storage_emitter.go / http_emitter.go / amtrak.go
and the bounded-read approach used in routes/ingest.go.
In `@apps/api/spec/realtime.go`:
- Around line 70-75: The IsPassed() implementation treats any ActualArr as
terminal-equivalent and thus marks through-stops passed when arrival is known
before departure; change IsPassed() on TrainStopTime so it returns true when
ActualDep != nil OR when ActualArr != nil only if ActualDep is nil AND the stop
is a terminal (or otherwise explicitly indicates no departure). Concretely,
update TrainStopTime.IsPassed() to prefer ActualDep, and only fall back to
ActualArr when ActualDep is nil or when the stop is terminal (use the
stop/segment terminal flag available on the TrainStopTime or parent stop
record), and make the same adjustment for IsLive() logic that currently uses
ActualArr as terminal-equivalent.
In `@apps/api/ws/hub.go`:
- Around line 114-116: The snapshots map is storing raw []byte slices which
allows callers to mutate cached data; make defensive copies using
append([]byte(nil), payload...) whenever storing or exposing payloads: when
assigning h.snapshots[msg.topic] in the Publish/DeliverSnapshot path (replace
direct assignment of msg.payload with a copy), when returning data in the
Snapshot method (return a copy of the stored slice), and when iterating to
publish to clients in DeliverSnapshot/Publish (send a copied slice to each
client rather than the stored slice). Use the existing symbols h.snapshotMu,
h.snapshots, Publish(), Snapshot(), DeliverSnapshot(), and msg.payload to locate
and update these three spots.
- Around line 94-99: Hub.Run currently exits on ctx.Done() without closing
channels or signalling producers, and Publish()/DeliverSnapshot() perform
unconditional blocking sends to h.deliver/h.publish which can deadlock producers
during shutdown; add a hub shutdown signal (e.g., a done chan on Hub) and change
Publish() and DeliverSnapshot() to return bool and use a non-blocking select
that sends on h.deliver/h.publish or fails fast if <-h.done or ctx is done,
ensure Hub.Run closes all client send channels in a deferred cleanup before
returning, and make snapshot access return copies (or copy the []byte before
returning) so callers can mutate without changing the cached payload; update
references to Hub.Run, Publish, DeliverSnapshot, h.deliver, h.publish, and
client send channels accordingly.
---
Nitpick comments:
In `@apps/api/cmd/sync-static/main.go`:
- Around line 191-215: The temp files created with os.CreateTemp (routesTmpFile
and stopsTmpFile) should have Close() deferred immediately after successful
creation to ensure they're closed on all control paths; update main.go so that
right after creating routesTmpFile and stopsTmpFile you call defer
routesTmpFile.Close() and defer stopsTmpFile.Close(), keep the existing
os.Remove defers for routesTmpPath/stopsTmpPath, and remove the explicit
routesTmpFile.Close() and stopsTmpFile.Close() calls later in the function (or
keep them only if you want to ignore their returned errors) so you don't rely on
manual closes in multiple places.
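Before the autofix notes, two illustrative sketches for the heavier items above. First, the `gtfs/realtime.go` bounded-fetch item: a minimal sketch of a client with an explicit timeout plus a hard response-size cap. The 10s timeout, 50 MiB cap, and the `fetchBounded` name are illustrative assumptions, not the repo's actual values:

```go
package gtfs

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// httpClient replaces http.DefaultClient so every request has a deadline.
var httpClient = &http.Client{Timeout: 10 * time.Second}

const maxFeedBytes = 50 << 20 // 50 MiB cap; size to the largest expected feed

// fetchBounded downloads url with a timeout and a bounded read.
func fetchBounded(ctx context.Context, url string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := httpClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch %s: unexpected status %d", url, resp.StatusCode)
	}
	// Read one byte past the cap so "exactly at the limit" and
	// "truncated" are distinguishable.
	body, err := io.ReadAll(io.LimitReader(resp.Body, maxFeedBytes+1))
	if err != nil {
		return nil, err
	}
	if len(body) > maxFeedBytes {
		return nil, fmt.Errorf("fetch %s: response exceeds %d bytes", url, maxFeedBytes)
	}
	return body, nil
}
```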
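Second, the `ws/hub.go` defensive-copy item: a sketch of the copy-on-store/copy-on-read discipline, with a simplified `Hub` standing in for the real struct (field and method names here are assumptions):

```go
package ws

import "sync"

// Hub caches the latest snapshot per topic. Stored and returned slices are
// always private copies, so neither producers nor consumers can mutate the
// cached payload out from under each other.
type Hub struct {
	mu        sync.RWMutex
	snapshots map[string][]byte
}

func NewHub() *Hub {
	return &Hub{snapshots: make(map[string][]byte)}
}

// Store keeps a defensive copy of payload.
func (h *Hub) Store(topic string, payload []byte) {
	cp := append([]byte(nil), payload...)
	h.mu.Lock()
	h.snapshots[topic] = cp
	h.mu.Unlock()
}

// Snapshot returns a copy of the cached payload, if any.
func (h *Hub) Snapshot(topic string) ([]byte, bool) {
	h.mu.RLock()
	cached, ok := h.snapshots[topic]
	h.mu.RUnlock()
	if !ok {
		return nil, false
	}
	return append([]byte(nil), cached...), true
}
```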
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 45f3e628-8a58-4e18-91e1-a42e63299cde
📒 Files selected for processing (15)

- apps/api/cmd/sync-static/main.go
- apps/api/collector/emitter.go
- apps/api/collector/storage_emitter.go
- apps/api/db/realtime_write.go
- apps/api/db/static_read.go
- apps/api/drainer/drainer.go
- apps/api/gtfs/realtime.go
- apps/api/gtfs/static.go
- apps/api/providers/amtrak/amtrak.go
- apps/api/routes/ingest.go
- apps/api/routes/static.go
- apps/api/spec/realtime.go
- apps/api/tiles/geojson.go
- apps/api/ws/handler.go
- apps/api/ws/hub.go
✅ Files skipped from review due to trivial changes (1)
- apps/api/collector/emitter.go
🚧 Files skipped from review as they are similar to previous changes (8)
- apps/api/routes/ingest.go
- apps/api/collector/storage_emitter.go
- apps/api/ws/handler.go
- apps/api/gtfs/static.go
- apps/api/providers/amtrak/amtrak.go
- apps/api/routes/static.go
- apps/api/tiles/geojson.go
- apps/api/db/static_read.go
@copilot resolve the merge conflicts in this pull request
Actionable comments posted: 3
🧹 Nitpick comments (1)
apps/api/routes/ingest_test.go (1)
68-82: ⚡ Quick win. Strengthen negative-path assertions by checking that the hub remains unchanged.
The 401/400 tests only check status codes; they should also assert no snapshot was published to the hub.
Proposed test hardening
```diff
 func TestIngest_RejectsBadSecret(t *testing.T) {
-	srv, _ := setup(t, "shh")
+	srv, hub := setup(t, "shh")
 	...
 	if resp.StatusCode != http.StatusUnauthorized {
 		t.Errorf("status = %d, want 401", resp.StatusCode)
 	}
+	if _, ok := hub.Snapshot("amtrak"); ok {
+		t.Fatal("unauthorized request should not publish to hub")
+	}
 }
 ...
 func TestIngest_RejectsBadJSON(t *testing.T) {
-	srv, _ := setup(t, "")
+	srv, hub := setup(t, "")
 	...
 	if resp.StatusCode != http.StatusBadRequest {
 		t.Errorf("status = %d, want 400", resp.StatusCode)
 	}
+	if _, ok := hub.Snapshot("amtrak"); ok {
+		t.Fatal("invalid JSON should not publish to hub")
+	}
 }
```
Also applies to: 99-109
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/api/routes/ingest_test.go` around lines 68 - 82, TestIngest_RejectsBadSecret only asserts the HTTP status; change the test to capture the second return value from setup (srv, hub := setup(t, "shh")) instead of discarding it, then after performing the request check that the hub has no published snapshots (e.g. assert hub.PublishCount == 0 or len(hub.PublishedSnapshots()) == 0 depending on the mock hub API) to ensure no snapshot was published; keep references to TestIngest_RejectsBadSecret, setup, and snapshotJSON to locate the test and the related helpers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/api/collector/poller_test.go`:
- Around line 106-107: The test currently calls cancel() then blocks
indefinitely on <-done which can hang if StartPoller never exits; update the
test to wait with a bounded timeout (e.g., use a select between receiving from
done and time.After) so the test fails fast if the poller doesn't shut down;
locate the shutdown sequence around StartPoller, the done channel receive and
the cancel() call and replace the plain <-done with a select that reports/fails
on timeout.
- Around line 31-33: The helper currently indexes f.feeds[len(f.feeds)-1] when
f.idx >= len(f.feeds), which will panic if f.feeds is empty; change the branch
in the method that uses f.feeds and f.idx to first check if len(f.feeds) == 0
and return a safe zero value plus an error (e.g., nil, fmt.Errorf("no feeds"))
or the appropriate test-friendly default instead of indexing into the slice;
update the callers in the test to handle the returned error or default value
accordingly.
In `@apps/api/collector/storage_emitter_test.go`:
- Around line 62-67: The test's HTTP handler in storage_emitter_test.go
currently checks only path and header but not the verb; add an assertion to
verify r.Method == "POST" inside the httptest.NewServer handler (the closure
that reads r.URL.Path and r.Header.Get(ingestSecretHeader) and sets gotSecret)
and call t.Errorf (or t.Fatalf) if the method is not POST so the test fails on
unexpected HTTP methods.
---
Nitpick comments:
In `@apps/api/routes/ingest_test.go`:
- Around line 68-82: TestIngest_RejectsBadSecret only asserts the HTTP status;
change the test to capture the second return value from setup (srv, hub :=
setup(t, "shh")) instead of discarding it, then after performing the request
check that the hub has no published snapshots (e.g. assert hub.PublishCount == 0
or len(hub.PublishedSnapshots()) == 0 depending on the mock hub API) to ensure
no snapshot was published; keep references to TestIngest_RejectsBadSecret,
setup, and snapshotJSON to locate the test and the related helpers.
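One sketch for the `poller_test.go` timeout item above: a bounded shutdown wait using `select` with `time.After`. The `startFakePoller` helper is a stand-in for the real `StartPoller`, and the 2s budget is an arbitrary choice:

```go
package collector

import (
	"context"
	"testing"
	"time"
)

// startFakePoller stands in for StartPoller: it blocks until ctx is
// cancelled, then signals done.
func startFakePoller(ctx context.Context) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		<-ctx.Done()
	}()
	return done
}

func TestPollerShutsDown(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	done := startFakePoller(ctx)

	cancel()

	// Wait with a bounded timeout instead of a bare <-done, so a stuck
	// poller fails the test quickly rather than hanging the suite.
	select {
	case <-done:
	case <-time.After(2 * time.Second):
		t.Fatal("poller did not shut down within 2s after cancel")
	}
}
```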
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4871d8f4-68ca-4a82-ae37-e982b30634d6
📒 Files selected for processing (6)

- apps/api/collector/poller_test.go
- apps/api/collector/storage_emitter_test.go
- apps/api/drainer/replay_test.go
- apps/api/realtime/process_test.go
- apps/api/routes/ingest_test.go
- apps/api/ws/hub.go
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/api/ws/hub.go
# Conflicts:
#	pnpm-lock.yaml
#	pnpm-workspace.yaml

Co-authored-by: Mr-Technician <26885142+Mr-Technician@users.noreply.github.com>
Done. Resolved the merge conflicts with
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Infrastructure
Documentation
Tests