
baywire/baywire.app


Baywire

The live wire for Tampa Bay.

A unified guide to live music, festivals, food, and family fun across the Tampa Bay area: Tampa, St. Petersburg, Clearwater, Brandon, Bradenton, Safety Harbor, and Dunedin, plus an "Other" catch-all for edge cases. Listings are aggregated daily from multiple sources, deduplicated, and ranked for readability. Curated Places (beaches, venues, food, and similar) ship on /places via a separate discovery pipeline. The site lives at baywire.app.

Baywire runs 18 event adapters (src/lib/scrapers/index.ts). The daily matrix includes only sources that are enabled in the database and resolve through the adapter registry (scripts/ci/scrape-matrix.ts). Each job tries structured data first (JSON-LD, Tribe REST, Ticketmaster Discovery API, iCal) and falls back to OpenAI extraction when needed; venue rows are upserted into Places as events are processed. Taxonomy (config/taxonomy.json) drives tags, vibes, discovery verticals, and editorial re-classification — see ARCHITECTURE.md for the full ingestion design.
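A minimal sketch of the structured-first, LLM-fallback order described above. The `RawEvent` shape, the `Strategy` type, and `extractWithFallback` are hypothetical illustrations, not the actual interfaces in src/lib/scrapers or src/lib/extract.

```typescript
// Hypothetical normalized event; field names are illustrative only.
interface RawEvent {
  title: string;
  startsAt: string; // ISO-8601
  venueName?: string;
  url?: string;
}

// A structured strategy returns events, or null when its format is absent.
type Strategy = (html: string) => RawEvent[] | null;

interface ExtractResult {
  events: RawEvent[];
  via: string; // name of the strategy that produced the events
}

// Try each structured strategy in order (e.g. JSON-LD, Tribe REST, iCal) and
// only call the comparatively expensive LLM extractor when none yields events.
async function extractWithFallback(
  html: string,
  strategies: Array<[name: string, run: Strategy]>,
  llmFallback: (html: string) => Promise<RawEvent[]>,
): Promise<ExtractResult> {
  for (const [name, run] of strategies) {
    const events = run(html);
    if (events && events.length > 0) return { events, via: name };
  }
  return { events: await llmFallback(html), via: "llm" };
}
```

Keeping the LLM as the last resort is what lets the cost posture hold: pages with usable structured data never reach the model.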

Vercel hosts the read-only Next.js app — scrapes, discovery, and backfill run on GitHub Actions, not on Vercel.

  taxonomy.json ──publish──▶ DB taxonomy + backfill_jobs
                                    │
  GHA scrape (daily) ───────────────┼──▶ events / canonical_events
  GHA discover (weekly) ────────────┼──▶ places
  GHA backfill (every 6h) ──────────┘    (classify, sanitize, new profiles)
                                    │
                                    ▼
                         Prisma Postgres + Accelerate
                                    │
                                    ▼
                         Vercel Next.js (read-only)
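The scrape job writes both per-source events and canonical_events. One way to picture the dedup step is a canonical key built from normalized title, venue, and start date. This is a hypothetical sketch; the real canonicalization in src/lib/pipeline may use different fields or fuzzy matching.

```typescript
// Hypothetical per-source row as it arrives from an adapter.
interface ScrapedEvent {
  source: string;
  title: string;
  venue: string;
  startsAt: string; // ISO-8601 with offset
}

// Lowercase, strip diacritics, spell out "&", collapse punctuation/whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/&/g, " and ")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// Assumed key: normalized title | normalized venue | local date as written by the source.
function canonicalKey(e: ScrapedEvent): string {
  const day = e.startsAt.slice(0, 10); // YYYY-MM-DD
  return [normalize(e.title), normalize(e.venue), day].join("|");
}

// Group per-source rows under one canonical key.
function groupCanonical(events: ScrapedEvent[]): Map<string, ScrapedEvent[]> {
  const groups = new Map<string, ScrapedEvent[]>();
  for (const e of events) {
    const key = canonicalKey(e);
    const bucket = groups.get(key) ?? [];
    bucket.push(e);
    groups.set(key, bucket);
  }
  return groups;
}
```

With this key, "Jazz & Blues Night" at "The Attic" and "jazz and blues night!" at "the attic" on the same day collapse into one canonical event.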

Stack

  • Next.js 16 (App Router, React 19, RSC by default) + Serwist (@serwist/turbopack) for the PWA shell
  • proxy.ts — anonymous guest profile cookie bootstrap
  • Tailwind CSS v4 + custom coastal palette
  • Prisma ORM + Prisma Postgres + Prisma Accelerate; URL in prisma.config.ts for Prisma 7 CLI
  • OpenAI gpt-4.1-mini with Zod-typed structured outputs (OPENAI_BASE_URL for compatible proxies)
  • Google Places API (New) + Vercel Blob for place discovery imagery
  • Stytch for SMS sign-in (optional locally)
  • cheerio, p-limit, Playwright for browser/WAF adapters
  • GitHub Actions — scrape matrix, places discovery, taxonomy backfill, cleanup

Sources

| Slug | Site | Path | Notes |
|---|---|---|---|
| eventbrite | eventbrite.com | JSON-LD | Geo-search across metro cities, 2 pages each |
| ticketmaster | ticketmaster.com/discover/tampa | Discovery API | DMA 635 (Tampa-St. Pete-Sarasota) |
| visit_tampa_bay | visittampabay.com/events | JSON-LD | Official tourism |
| visit_st_pete_clearwater | visitstpeteclearwater.com | JSON-LD | /events + /events-festivals |
| tampa_gov | tampa.gov/calendar | JSON-LD + ICS | City calendar |
| ilovetheburg | ilovetheburg.com | Tribe REST API | St. Pete blog |
| thats_so_tampa | thatssotampa.com | Tribe REST API | Tampa-side blog |
| tampa_bay_times | tampabay.com/things-to-do | HTML + LLM | Editorial weekend picks |
| tampa_bay_markets | tampabaymarkets.com | HTML + LLM | Farmers' markets |
| safety_harbor | cityofsafetyharbor.com | RSS + LLM | CivicPlus feed |
| side_splitters | sidesplitterscomedy.com | HTML + LLM | Comedy club |
| dont_tell_comedy | donttellcomedy.com | HTML + LLM | Pop-up comedy |
| funny_bone_tampa | tampa.funnybone.com | HTML + LLM | DataDome; optional cookie secret in CI |
| straz_center | strazcenter.org | HTML + LLM | Playwright / Incapsula |
| tampa_theatre | tampatheatre.org | HTML + LLM | Live events + detail pages |
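The ticketmaster row queries the Discovery API scoped to DMA 635. A minimal sketch of building that request and paging through results, assuming the key comes from TICKETMASTER_API_KEY; `discoveryUrl`, `hasNextPage`, the page size, and the sort order are illustrative choices, not taken from the Baywire adapter.

```typescript
// Public Discovery API v2 events endpoint; apikey, dmaId, size, page and sort
// are documented query parameters. Page size and sort here are assumptions.
const DISCOVERY_EVENTS = "https://app.ticketmaster.com/discovery/v2/events.json";
const TAMPA_DMA_ID = "635"; // Tampa-St. Pete-Sarasota

function discoveryUrl(apiKey: string, page: number, size = 100): string {
  const params = new URLSearchParams({
    apikey: apiKey,
    dmaId: TAMPA_DMA_ID,
    size: String(size),
    page: String(page),
    sort: "date,asc",
  });
  return `${DISCOVERY_EVENTS}?${params.toString()}`;
}

// Subset of the response shape needed for paging (events live under _embedded).
interface DiscoveryPage {
  _embedded?: { events?: Array<{ id: string; name: string }> };
  page?: { number: number; totalPages: number };
}

// Pages are zero-based, so another page exists while number + 1 < totalPages.
function hasNextPage(p: DiscoveryPage): boolean {
  return p.page !== undefined && p.page.number + 1 < p.page.totalPages;
}
```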

Browser-powered sources: dunedin_gov, unation, feverup, straz_center, funny_bone_tampa, visit_tampa_bay (listing) — see ARCHITECTURE.md.
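Several rows use the JSON-LD path. The sketch below shows the general idea of reading schema.org events out of `<script type="application/ld+json">` blocks, including `@graph` wrappers. The project lists cheerio in its stack; this standard-library regex version is a simplification, and `extractJsonLdEvents` and `LdEvent` are hypothetical names.

```typescript
interface LdEvent {
  name: string;
  startDate: string;
  location?: string;
}

const LD_SCRIPT = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

// Accept Event and its subtypes (MusicEvent, ComedyEvent, ...) plus Festival.
function isEventType(t: unknown): boolean {
  const types = Array.isArray(t) ? t : [t];
  return types.some((x) => typeof x === "string" && (x.endsWith("Event") || x === "Festival"));
}

// Flatten arrays and @graph containers into a list of candidate nodes.
function collectNodes(data: unknown, out: Record<string, unknown>[]): void {
  if (Array.isArray(data)) {
    for (const item of data) collectNodes(item, out);
  } else if (data && typeof data === "object") {
    const node = data as Record<string, unknown>;
    out.push(node);
    if (Array.isArray(node["@graph"])) collectNodes(node["@graph"], out);
  }
}

function extractJsonLdEvents(html: string): LdEvent[] {
  const events: LdEvent[] = [];
  for (const match of html.matchAll(LD_SCRIPT)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // malformed block: skip it
    }
    const nodes: Record<string, unknown>[] = [];
    collectNodes(parsed, nodes);
    for (const n of nodes) {
      const name = n["name"];
      const startDate = n["startDate"];
      if (!isEventType(n["@type"]) || typeof name !== "string" || typeof startDate !== "string") continue;
      const locName = (n["location"] as { name?: unknown } | null | undefined)?.name;
      events.push({ name, startDate, location: typeof locName === "string" ? locName : undefined });
    }
  }
  return events;
}
```

When this returns an empty list, a structured-first adapter would move on to its next strategy or to LLM extraction.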

Local setup

npm install
cp .env.example .env.local
# Required: DATABASE_URL, OPENAI_API_KEY
# Optional: TICKETMASTER_API_KEY, GOOGLE_MAPS_API_KEY, BLOB_READ_WRITE_TOKEN, STYTCH_*, CRON_SECRET

npm run db:migrate:dev    # or db:push for a quick schema sync
npm run ingestion:taxonomy-publish   # seed taxonomy tables from config/taxonomy.json

npm run ingestion:scrape            # full scrape (or: npm run ingestion:scrape -- eventbrite)
# Optional: INLINE_CLASSIFY=1 for synchronous editorial during scrape

npm run dev

Open http://localhost:3000.

Prisma Postgres

  1. Create a database at console.prisma.io.
  2. Set DATABASE_URL to the prisma+postgres://accelerate... URL.
  3. npm run db:migrate:dev (or db:push in early dev).

Useful scripts

| Command | What it does |
|---|---|
| npm run dev | Next.js dev server |
| npm run build | Production build (postinstall runs prisma generate) |
| npm run typecheck | tsc --noEmit |
| npm run lint | ESLint |
| npm run db:migrate:dev | Create/apply a dev migration |
| npm run db:migrate | Apply migrations (production) |
| npm run db:studio | Prisma Studio |
| npm run ingestion:scrape [-- <slug>] | Event scrape (one source or all enabled) |
| npm run ingestion:discover | Google Places discovery (--help) |
| npm run ingestion:refresh | Re-verify existing discovery places |
| npm run ingestion:backfill | Drain backfill queue (--limit, --kind) |
| npm run ingestion:taxonomy-publish | Publish config/taxonomy.json → DB + enqueue diff jobs |
| npm run ingestion:cleanup | Delete stale events / places (--skip-places) |
| npm run ingestion:matrix | Emit GHA scrape matrix JSON (CI only) |
| npm run ops:blob-purge | Purge Vercel Blob prefix (--execute to delete) |
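ingestion:matrix keeps only sources that are enabled in the database and resolve through the adapter registry. A hedged sketch of that intersection; `SourceRow`, `buildScrapeMatrix`, and the `include`/`source` key names are assumptions, not the actual output of scripts/ci/scrape-matrix.ts.

```typescript
// Hypothetical DB row for a source.
interface SourceRow {
  slug: string;
  enabled: boolean;
}

// Enabled in the DB AND known to the adapter registry; sorted for stable output.
function buildScrapeMatrix(
  rows: SourceRow[],
  registry: ReadonlySet<string>,
): { include: { source: string }[] } {
  const include = rows
    .filter((r) => r.enabled && registry.has(r.slug))
    .map((r) => r.slug)
    .sort()
    .map((source) => ({ source }));
  return { include };
}

// A workflow step could print JSON.stringify(...) of this object, write it to
// $GITHUB_OUTPUT, and feed it to strategy.matrix via fromJSON (assumed wiring).
```

A slug that is enabled but has no adapter (or has an adapter but is disabled) simply never gets a matrix job.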

Taxonomy (quick reference)

Edit config/taxonomy.json — terms, aliases, discovery profiles, prompt bundles, rankingGuides. Bump version (and promptRevision when prompts change), then:

npm run ingestion:taxonomy-publish
npm run ingestion:backfill -- --limit 500

  • Async classify (default): scrape/discover enqueue classify_* jobs; backfill runs on a schedule.
  • INLINE_CLASSIFY=1: run editorial inline during scrape/discover/refresh (local debugging).

Details: ARCHITECTURE.md — Taxonomy and Classification fingerprints.
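The classification fingerprint combines content, taxonomy version, and prompt revision, so bumping either version invalidates prior editorial results. A minimal sketch, assuming both versions are numbers and SHA-256 is the hash; `classificationFingerprint` and `needsReclassify` are hypothetical names, not the code in src/ingestion/kernel.

```typescript
import { createHash } from "node:crypto";

// Hash of (content hash, taxonomy version, prompt revision). NUL separators
// keep "1"+"23" and "12"+"3" from colliding. Number types are an assumption.
function classificationFingerprint(
  contentHash: string,
  taxonomyVersion: number,
  promptRevision: number,
): string {
  return createHash("sha256")
    .update(`${contentHash}\u0000${taxonomyVersion}\u0000${promptRevision}`)
    .digest("hex");
}

// Editorial is skipped only when the stored fingerprint still matches.
function needsReclassify(
  stored: string | null,
  contentHash: string,
  taxonomyVersion: number,
  promptRevision: number,
): boolean {
  return stored !== classificationFingerprint(contentHash, taxonomyVersion, promptRevision);
}
```

Because the fingerprint does not involve fetching anything, a version bump only requires the backfill queue to re-classify stored rows, not a fresh scrape.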

Deployment

Production splits Vercel (HTTP) from GitHub Actions (scheduled writes).

Vercel

  1. Import repo; set DATABASE_URL (and optional CRON_SECRET, Stytch, Blob, Google keys).
  2. No Vercel crons for scrapes — vercel.json is empty.

GitHub Actions

See .github/workflows/README.md for the full index.

| Workflow | Schedule (UTC) | Command |
|---|---|---|
| ingestion-scrape.yml | Daily 12:00 | ingestion:scrape (matrix) |
| ingestion-discover.yml | Sun 08:00 | ingestion:discover |
| ingestion-backfill.yml | Every 6h | ingestion:backfill |
| ingestion-taxonomy-publish.yml | Push to main that changes config/taxonomy.json | ingestion:taxonomy-publish |
| ingestion-cleanup.yml | Sun 09:00 | ingestion:cleanup |

Scrape secrets: DATABASE_URL, OPENAI_API_KEY; optional OPENAI_BASE_URL, OPENAI_EXTRACT_MODEL, TICKETMASTER_API_KEY, FUNNYBONE_SCRAPE_COOKIE.

Discover secrets: add GOOGLE_MAPS_API_KEY, BLOB_READ_WRITE_TOKEN.

Backfill secrets: DATABASE_URL, OPENAI_API_KEY (for classify jobs).

Manual scrape trigger

  • Run the scrape workflow via workflow_dispatch, optionally passing a single source slug.
  • POST /api/cron/scrape with Authorization: Bearer $CRON_SECRET. The endpoint returns 202 and continues the scrape in the background (via after()).
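The cron endpoint accepts only `Authorization: Bearer $CRON_SECRET`. A minimal sketch of a constant-time check such a route could perform; `isAuthorizedCron` is a hypothetical helper, not the project's route code.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hash both values to equal-length digests so timingSafeEqual never throws on
// a length mismatch and the comparison time does not leak the secret length.
function isAuthorizedCron(authHeader: string | null, secret: string | undefined): boolean {
  if (!secret || !authHeader?.startsWith("Bearer ")) return false;
  const presented = authHeader.slice("Bearer ".length);
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(secret).digest();
  return timingSafeEqual(a, b);
}
```

In a Next.js route handler this would run before anything else: reject with 401 when it returns false, otherwise respond 202 and hand the long-running scrape to a background mechanism such as `after()` from next/server.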

Project layout

ARCHITECTURE.md           Ingestion, taxonomy, pipelines
config/taxonomy.json      Taxonomy draft (publish to DB)
proxy.ts                  Guest profile cookie bootstrap
src/
  app/                    Next.js routes, UI, metrics, cron API
  ingestion/              Taxonomy, backfill queue, pipeline entrypoints, adapters
    taxonomy/             Snapshot, publish, diff, sanitize, validate
    queue/                enqueue + process backfill jobs
    pipelines/            events/scrape, places/discover|refresh, maintenance
    adapters/             resolveAdapter (tribe, jsonld, custom, …)
    kernel/               classification fingerprint
  lib/
    pipeline/             Scrape, canonicalize, editorial orchestration
    scrapers/             Per-source adapters
    extract/              OpenAI extraction + editorial
    db/                   Prisma client + queries (+ queriesTaxonomy)
    places/               Google Places + discovery helpers
prisma/schema.prisma      baywire schema
.github/workflows/        Scheduled ingestion — see .github/workflows/README.md
.github/actions/          Composite steps (install, playwright-chromium)
scripts/                  CLI entrypoints — see scripts/README.md
  ingestion/              scrape, discover, refresh, backfill, taxonomy-publish
  maintenance/            cleanup-expired
  ci/                     scrape-matrix (GHA)
  ops/                    blob utilities
  _lib/                   shared runCli + arg helpers

Cost & rate posture

  • Per-host pacing ~1 req / 1.1s; extraction concurrency 4.
  • Structured-first adapters avoid LLM calls when JSON-LD/ICS/API succeeds.
  • Content-hash skips re-extraction when upstream payload unchanged.
  • Classification fingerprint skips editorial only when content + taxonomy version + prompt revision match — bumping taxonomy version triggers re-classify via backfill (not a full re-scrape).
  • Reduced HTML capped at 16k chars before gpt-4.1-mini.
  • Read helpers use Accelerate cacheStrategy where noted in query modules.
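The pacing and concurrency bullets can be pictured as two small primitives: a per-host slot reservation and a concurrency limiter, plus the 16k-character cap. This is a dependency-free sketch under those stated numbers; the project lists p-limit in its stack, and `paceHost`, `createLimiter`, and `capForLlm` here are hypothetical stand-ins.

```typescript
// Assumed constants mirroring the bullets above.
const HOST_INTERVAL_MS = 1100; // ~1 request per 1.1 s per host
const EXTRACT_CONCURRENCY = 4;
const MAX_REDUCED_HTML = 16_000; // characters sent to the model

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// host -> earliest time (ms since epoch) the next request may start
const nextSlot = new Map<string, number>();

// Reserve the host's next slot synchronously, then wait for it, so concurrent
// callers for the same host end up spaced HOST_INTERVAL_MS apart.
async function paceHost(url: string): Promise<void> {
  const host = new URL(url).host;
  const now = Date.now();
  const start = Math.max(now, nextSlot.get(host) ?? 0);
  nextSlot.set(host, start + HOST_INTERVAL_MS);
  if (start > now) await sleep(start - now);
}

// Minimal limiter in the spirit of p-limit: at most `concurrency` tasks at
// once. A finishing task hands its slot straight to the next waiter, so a new
// caller can never slip in and exceed the cap.
function createLimiter(concurrency: number) {
  let active = 0;
  const waiters: Array<() => void> = [];
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active < concurrency) active++;
    else await new Promise<void>((resolve) => waiters.push(resolve));
    try {
      return await task();
    } finally {
      const next = waiters.shift();
      if (next) next(); // transfer the slot; `active` stays the same
      else active--;
    }
  };
}

// Cap reduced HTML before it is sent for LLM extraction.
const capForLlm = (reducedHtml: string) => reducedHtml.slice(0, MAX_REDUCED_HTML);

const limitExtract = createLimiter(EXTRACT_CONCURRENCY);
```

A fetch-and-extract step would wrap its work as `limitExtract(async () => { await paceHost(url); /* fetch, reduce, capForLlm, extract */ })`.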

Attribution & ToS

This project respects each source's robots.txt and only fetches public listing pages. Event cards link to originals; the footer lists enabled sources from the database. For removal requests, disable the adapter or open an issue.

About

A modern, AI-curated guide to live music, festivals, food, and family fun across the Tampa Bay area — Tampa, St. Petersburg, Clearwater, Brandon, and Bradenton.
