feat(ci): add release infrastructure with CI gates, versioning, and publishing#6

Merged
heyjawrsh merged 55 commits into main from feature/release-infra on Apr 4, 2026

heyjawrsh commented Apr 4, 2026

Summary

  • Add Release Please manifest-based config with version sync across package.json and manifest.json
  • Add CI workflow with lint, typecheck, test, and build quality gates on every PR to main
  • Add Release Please workflow with auto-merge via branch protection
  • Add release publish workflow chained via workflow_run with all 5 implementation guardrails
  • Add extension zip packaging script with dist pre-check and stale zip cleanup
  • Add Chrome Web Store draft upload job (Phase 2, gated on CWS_EXTENSION_ID variable)

Test plan

  • Verify pnpm typecheck passes from repo root
  • Verify pnpm test passes (73/73 tests)
  • Verify pnpm --filter @vxrtx/extension package:zip creates vxrtx-extension-0.1.0.zip
  • Verify all config and workflow YAML files are valid
  • Configure branch protection on main after merge
  • First release smoke test after a feat: commit merges to main

heyjawrsh and others added 30 commits March 26, 2026 13:52
* Updating gitignore

* Split into concerns of a newly created monorepo
- Introduce LockedTabGroup type, storage key, and persistence helpers
- Add queryTabGroups, getLockedTabIds, and resolveStaleLockedGroups
  utilities (including stale group re-matching by name+color)
- Add get-locked-tab-groups, lock-tab-group, unlock-tab-group message
  actions and service worker handlers
- Integrate locking into organize, apply, and undo flows — locked tabs
  are excluded from all AI-driven reorganization
- Surface lock/unlock toggles in TabOrganizer idle state, with dormant
  lock management for groups no longer open
- Replace hardcoded indigo-* colors with a custom brand color scale
  (cyan) and coal background token via Tailwind @theme
- Update extension icons
Add a 1-5 granularity control for tab and bookmark organization that
lets users choose between broad and fine-grained grouping. Introduce
port-based messaging for long-running operations with progress updates,
elapsed timers, and session storage persistence for preview state.
…d folder filter

Persistent snapshot system that auto-captures state before every
destructive apply and supports manual named snapshots. Includes
rename, per-snapshot and bulk export to JSON, import with dedup,
selective restore (tabs/bookmarks/both), and auto-pruning at 20
snapshots. Adds type-to-filter input on the bookmark folders list.
…re 0, and retry logic

- Add system/user message separation for Claude (was single user message)
- Reduce temperature to 0.0 across all providers (was 0.3 or unset)
- Add withRetry() wrapper that retries parse failures with error context
- Unify system message across Claude, OpenAI, and OpenRouter providers
withRetry retries on any error thrown by parseFn, which can mask real programming/runtime issues (not just parse/validation failures) and trigger an unnecessary second LLM call. Consider only retrying for known parse/validation error types (e.g., ZodError, SyntaxError, or your own No valid JSON found... error) and rethrowing everything else immediately.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Move duplicated SYSTEM_MESSAGE constant from three provider files
into ai/types.ts to prevent drift and simplify future updates.
…ction

Replace brittle string-match check in isRetryableParseError with
instanceof against a dedicated error class thrown by extractJson.
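The retry narrowing described above can be sketched roughly as follows. The class name JsonExtractionError and the callModel parameter are hypothetical (the commits do not name the error class or show the withRetry signature), and the ZodError case from the review comment is left out to keep the sketch dependency-free.

```typescript
// Hypothetical name: the commit only says "a dedicated error class thrown by extractJson".
class JsonExtractionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "JsonExtractionError";
  }
}

// Pulls the outermost {...} span out of a model response and parses it.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new JsonExtractionError("No valid JSON found in response");
  }
  return JSON.parse(text.slice(start, end + 1)); // malformed JSON throws SyntaxError
}

// Only parse-type failures justify a second LLM call; anything else is a real bug.
function isRetryableParseError(err: unknown): boolean {
  return err instanceof JsonExtractionError || err instanceof SyntaxError;
}

// Assumed shape: callModel optionally receives the previous error as context.
async function withRetry<T>(
  callModel: (errorContext?: string) => Promise<string>,
  parseFn: (raw: string) => T,
): Promise<T> {
  const first = await callModel();
  try {
    return parseFn(first);
  } catch (err) {
    if (!isRetryableParseError(err)) throw err; // rethrow programming errors immediately
    const context = err instanceof Error ? err.message : String(err);
    return parseFn(await callModel(context)); // a second failure propagates to the caller
  }
}
```

With this shape, a TypeError thrown inside parseFn surfaces immediately instead of triggering a second model call.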
…on messaging

All fetch() calls to LLM providers now use fetchWithTimeout() with a
60-second AbortController. The client-side long-running message channel
adds a 90-second safety timeout. Previously, a hung API connection
would cause the spinner to run indefinitely.
Users saw a spinner with no status updates for minutes with no way to
tell if the operation was working or hung.

Changes:
- Thread StatusCallback through AIProvider → withRetry → service worker
- Report 4 phases: preparing → sending to AI → waiting/retrying → done
- withRetry reports: "Waiting for AI response...", "Processing...",
  "Response invalid, retrying..."
- Service worker sends progress with item counts at each phase
- Reduce API timeout from 60s to 30s (single call should never need more)
- Client safety timeout resets on each progress message (45s silence = hung)
- Progress bar now visible during AI wait (total=4 phases)
For 118 tabs, the 30s fixed timeout was too tight — OpenRouter
legitimately needs more time for large sets.

Changes:
- Timeout now scales: 30s base + 0.25s per item, capped at 90s
  (118 tabs → ~60s timeout)
- Providers thread item count to fetchWithTimeout for per-call scaling
- Fix "Batch X of Y" → "Step X of Y" (these are phases, not batches)
- Client silence timeout increased to 60s (resets on each progress msg)
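As a rough illustration of the scaling rule above (30s base + 0.25s per item, capped at 90s), the timeout could be computed as below. The function name scaledTimeoutMs is an assumption; the commit only says the item count is threaded through to fetchWithTimeout.

```typescript
const BASE_TIMEOUT_MS = 30_000; // 30s base
const PER_ITEM_MS = 250;        // 0.25s per tab or bookmark
const MAX_TIMEOUT_MS = 90_000;  // hard cap at 90s

// Hypothetical helper: returns the per-call timeout for a given item count.
function scaledTimeoutMs(itemCount: number): number {
  const items = Math.max(0, itemCount);
  return Math.min(BASE_TIMEOUT_MS + items * PER_ITEM_MS, MAX_TIMEOUT_MS);
}
```

For 118 tabs this gives 59,500 ms, which matches the "~60s" figure in the commit message. The value would then presumably drive the AbortController timer inside fetchWithTimeout.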
Console logs now show: model, item count, prompt size, timeout, elapsed
time, and response size for every OpenRouter call. Status messages show
the specific model name so users can see which model is being used.
This helps diagnose whether timeouts are due to the model, prompt size,
or network issues.
Log port connections, message actions, and OpenRouter request/response
details to the service worker console. Helps diagnose timeout and
connectivity issues without guessing.
Refactor monolithic prompt builders into composable modules:
rules(), granularityInstruction(), fewShotExamples(), dataBlock(),
schemaBlock(). Each is a pure function returning a string segment.

Add 3 diverse few-shot examples per feature:
- Tabs: dev/docs mix, shopping/social/media, stale+duplicates edge case
- Bookmarks: dev/learning, finance/travel, work+duplicates

Few-shot examples use the exact same JSON schema as real output
so the model learns the expected format from examples.
…n apply

Empty or whitespace-only group names from the AI now fail Zod
validation (triggering a retry). If an empty name somehow gets
through, createTabGroup falls back to "Untitled Group".

- Add .trim().min(1) to TabGroupSuggestionSchema.name
- Add .min(1) to TabGroupSuggestionSchema.tabIds
- Add .trim().min(1) to BookmarkFolderSchema.name
- Add .min(1) to BookmarkFolderSchema.bookmarkIds
- Defensive fallback in createTabGroup for empty names
Pinned tabs are now filtered out before AI analysis, stale detection,
and duplicate detection — same treatment as locked groups. The apply
handler also protects pinned tabs from being grouped, closed, or
deduplicated.

- Add pinned:boolean to TabInfo, populated from Chrome API
- Filter pinned tabs in handleOrganizeTabs before AI call
- Build protectedTabIds (pinned + locked) in handleApplyTabSuggestions
- Report pinned tab count in reasoning string
- Handle edge case where all tabs are pinned/locked
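A minimal sketch of the protected-set idea, assuming a simplified TabInfo shape and that locked tab IDs are already resolved (for example by getLockedTabIds). The helper names are illustrative, not taken from the code.

```typescript
// Simplified: the real TabInfo presumably carries more fields (url, groupId, ...).
interface TabInfo {
  id: number;
  title: string;
  pinned: boolean;
}

// Union of pinned tab IDs and locked tab IDs.
function buildProtectedTabIds(tabs: TabInfo[], lockedTabIds: number[]): Set<number> {
  const protectedIds = new Set<number>(lockedTabIds);
  for (const tab of tabs) {
    if (tab.pinned) protectedIds.add(tab.id);
  }
  return protectedIds;
}

// Tabs that may be sent to the AI or touched by apply.
function organizableTabs(tabs: TabInfo[], protectedIds: Set<number>): TabInfo[] {
  return tabs.filter((tab) => !protectedIds.has(tab.id));
}
```

When organizableTabs returns an empty array, every tab is pinned or locked, which is the edge case the last bullet refers to.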
All newly created tab groups are collapsed after applying suggestions,
reducing tab bar clutter. Groups are collapsed as a post-processing
step after all groups are created, not during creation.
heyjawrsh added 25 commits April 3, 2026 10:36
Both tab and bookmark organizer UIs now have bulk toggle buttons:
- Tab stale section: "Select all" / "Deselect all" with count (X/Y)
- Tab duplicates section: same toggle
- Bookmark duplicates section: same toggle

Adds allStale[] to TabOrganizer preview state for select-all support.
…coring

52 tests covering the full AI pipeline — zero production code changes:

Parser tests (20):
- extractJson: valid JSON, code-block wrapped, prose embedded, no-JSON
- parseTabOrganization: valid, empty name, whitespace name, empty tabIds,
  invalid color fallback, optional field defaults
- parseBookmarkOrganization: valid, empty folder name, empty bookmarkIds
- withRetry: success, retry-on-parse-fail, both-fail, API error propagation,
  onStatus callback phases

Prompt builder tests (22):
- Tab and bookmark builders: all sections present (rules, granularity,
  few-shot, data, schema), URL inclusion/exclusion, granularity levels 1-5,
  3 few-shot examples, example ordering before real data

Scoring utility + tests (10):
- scoreTabGrouping / scoreBookmarkGrouping: precision calculation,
  group count delta, mismatch detection
- Perfect match, partial match, complete mismatch, unassigned items

Golden fixtures: 3 tab sets + 2 bookmark sets with labeled groupings.
…ntent

Split prompt builders into cached (rules + few-shot + schema) and
dynamic (granularity + data) parts via new buildXxxPromptParts() exports.

Claude providers now send user messages as content blocks with
cache_control: { type: "ephemeral" } on the static prefix. On cache
hit, Anthropic charges 90% less for the cached portion (~1.5K tokens
of rules/examples/schema identical across every call).

OpenAI caches automatically for prompts of 1,024 tokens or more, so no changes are needed.
OpenRouter passes through provider caching.
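For illustration, the Claude user message could be assembled as two text content blocks, with cache_control set only on the static prefix. The function name and the TextBlock type are assumptions; only the cache_control: { type: "ephemeral" } marker comes from the commit message.

```typescript
// Minimal local type for a text content block in a Claude user message.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// cachedPrefix: rules + few-shot examples + schema (identical on every call).
// dynamicSuffix: granularity instruction + the actual tab or bookmark data.
function buildClaudeUserContent(cachedPrefix: string, dynamicSuffix: string): TextBlock[] {
  return [
    { type: "text", text: cachedPrefix, cache_control: { type: "ephemeral" } },
    { type: "text", text: dynamicSuffix },
  ];
}
```

Keeping the static part first matters because caching applies to the prompt prefix up to the marked block; anything that changes per call has to come after it.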
…for large

Items <= 30 use claude-haiku-4-5 (faster, cheaper), items > 30 use
claude-sonnet-4 (more capable for complex grouping). Routing is
transparent — the user sees the same results, just faster and cheaper
for small tab/bookmark sets.

- Add selectClaudeModel(itemCount) to ai/types.ts
- Replace hardcoded model strings in relaxed + yolo providers
- Add diagnostic logging showing routed model name
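The routing rule is small enough to sketch directly. The model strings are copied from the commit message and may not match the exact identifiers the provider API expects; the threshold constant name is an assumption.

```typescript
const SMALL_SET_THRESHOLD = 30; // items <= 30 go to the faster, cheaper model

function selectClaudeModel(itemCount: number): string {
  return itemCount <= SMALL_SET_THRESHOLD ? "claude-haiku-4-5" : "claude-sonnet-4";
}
```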
When a user edits AI-suggested tab groups (renames, disables, removes
tabs) and clicks Apply, the diff between AI suggestions and applied
result is extracted as domain-level correction signals.

Corrections are stored in chrome.storage.local and injected into the
prompt's dynamic section as USER PREFERENCES on subsequent organize
calls. The AI sees "github.com tabs → prefer 'Dev' (corrected 3x)"
and learns from past user feedback.

- CorrectionSignal type: domain, preferredGroup/rejectedGroup, count, lastSeen
- extractCorrections(): diffs AI groups vs applied groups
- mergeCorrections(): upserts by domain+group, prunes to max 50
- rankCorrections(): sorts by count × recency (14-day decay half-life)
- correctionsBlock(): formats top 10 as structured prompt text
- OrganizeTabsOptions replaces positional args on AIProvider.organizeTabs
- 16 new tests (68 total, all passing)
When a user accepts AI-suggested groupings without edits, domain→group
pairs are now captured as "acceptance" signals alongside explicit
corrections. This builds domain affinity over time — the AI learns
which groupings work well even when the user doesn't edit anything.

- Add source field to CorrectionSignal: "correction" | "acceptance"
- extractCorrections() now captures accepted groupings as acceptance signals
- Acceptances weighted at 0.5x vs corrections (1x) in decay ranking
- Merge upgrades source to "correction" if either signal is explicit
- correctionsBlock() formats acceptances as '"Dev" works well (confirmed 5x)'
- 5 new tests (73 total, all passing)
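One way to read "count × recency (14-day decay half-life)" together with the 0.5x acceptance weight is the score below. This is a sketch under that reading, not the actual rankCorrections implementation; the exponential form and the signalScore helper are assumptions.

```typescript
interface CorrectionSignal {
  domain: string;
  preferredGroup?: string;
  rejectedGroup?: string;
  count: number;
  lastSeen: number; // epoch milliseconds
  source: "correction" | "acceptance";
}

const HALF_LIFE_MS = 14 * 24 * 60 * 60 * 1000; // 14 days

// score = count * sourceWeight * 0.5^(age / halfLife)
function signalScore(signal: CorrectionSignal, now: number): number {
  const ageMs = Math.max(0, now - signal.lastSeen);
  const decay = Math.pow(0.5, ageMs / HALF_LIFE_MS);
  const weight = signal.source === "acceptance" ? 0.5 : 1;
  return signal.count * weight * decay;
}

// Highest-scoring signals first; the input array is not mutated.
function rankCorrections(signals: CorrectionSignal[], now: number = Date.now()): CorrectionSignal[] {
  return signals.slice().sort((a, b) => signalScore(b, now) - signalScore(a, now));
}
```

Under this formula an explicit correction seen 4 times two weeks ago scores the same as an acceptance seen 4 times today, since each loses half its weight.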
Every AI organize call is now logged with: variant ID, model, item
count, latency, and group count. At apply time, the edit count
(corrections vs acceptances) is recorded. Undo marks the log entry
as undone.

This enables offline A/B analysis: compare prompt variants by
measuring edit rate (lower = better), undo rate, latency, etc.
Currently all requests use variant "default" — adding variants
is a config change, not a code change.

- ExperimentLog type with all fields for analysis
- appendExperimentLog/updateExperimentLog in storage
- Organize handler logs experiment at AI response time
- Apply handler records edit count on the same log entry
- Undo handler marks latest log as undone
- Capped at 200 entries (oldest pruned)
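A small sketch of the capped log and one of the offline metrics it enables. The field names on ExperimentLog and the undoRate helper are assumptions based on the description above; the real type may differ.

```typescript
interface ExperimentLog {
  id: string;
  variantId: string;
  model: string;
  itemCount: number;
  latencyMs: number;
  groupCount: number;
  editCount?: number; // recorded at apply time
  undone?: boolean;   // set by the undo handler
}

const MAX_LOG_ENTRIES = 200;

// Appends an entry and drops the oldest ones once the cap is exceeded.
function appendCapped(logs: ExperimentLog[], entry: ExperimentLog): ExperimentLog[] {
  const next = logs.concat(entry);
  return next.length > MAX_LOG_ENTRIES ? next.slice(next.length - MAX_LOG_ENTRIES) : next;
}

// Share of a variant's runs that the user undid (lower is better).
function undoRate(logs: ExperimentLog[], variantId: string): number {
  const runs = logs.filter((log) => log.variantId === variantId);
  if (runs.length === 0) return 0;
  return runs.filter((log) => log.undone === true).length / runs.length;
}
```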
…ame UX

Users can now steer AI results with custom text guidance:
- Collapsible "Guidance" section in both tab and bookmark organizers
- Quick-preset chips: "By project", "By domain", "By activity", etc.
- Guidance persisted in settings, injected into AI prompts as USER GUIDANCE
- Debounced auto-save (500ms) to avoid excessive writes

Group/folder rename improvements:
- Tab GroupCard: pencil icon appears on hover to signal editability
- Bookmark folder cards: now support inline rename (was read-only)
- Edit input gets brand-400 focus border for better visual feedback
- New BookmarkFolderCard component with same UX pattern as GroupCard

Backend wiring:
- Settings gains tabGuidance/bookmarkGuidance fields
- OrganizeBookmarksOptions mirrors OrganizeTabsOptions pattern
- Prompt builders accept guidance in options, inject as USER GUIDANCE block
- All 4 providers + service worker thread guidance through
Root cause: bookmark organize was failing silently because max_tokens
was hardcoded to 4096. For 200+ bookmarks, the output JSON (every
bookmark ID assigned to a folder + reasoning) exceeds 4096 tokens,
causing truncated JSON → parse failure → retry → same truncation →
error after 2 attempts.

Fix:
- aiMaxTokens(itemCount): 2048 base + 20 tokens per item, capped at 16384
- Applied to all Claude, OpenAI, and OpenRouter calls
- Added diagnostic logging to bookmark organize path and OpenAI calls
- Console now shows: item count, prompt size, max_tokens, timeout

Scaling examples:
  20 items  → 2448 max_tokens
  100 items → 4048 max_tokens
  500 items → 12048 max_tokens
Some models via OpenRouter (including Claude Sonnet) cap output tokens
lower than 16384. Reducing cap to 8192 for broad compatibility.
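The token budget formula, with the cap exposed as a parameter so both the original 16384 limit and the reduced 8192 limit can be checked. Making the cap a parameter is an assumption for illustration; the commits describe a single-argument aiMaxTokens(itemCount).

```typescript
// 2048 base + 20 tokens per item, clamped to the cap (8192 after the compatibility fix).
function aiMaxTokens(itemCount: number, cap: number = 8192): number {
  return Math.min(2048 + 20 * Math.max(0, itemCount), cap);
}
```

With the 8192 cap, the 20-item and 100-item examples are unchanged (2448 and 4048), while 500 items now yields 8192 instead of 12048.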
2,410 bookmarks in a single 304K char prompt was never going to work.
Now splits into batches of 100 bookmarks when the collection exceeds
100 items. Each batch is processed independently and results are
merged by folder name (case-insensitive consolidation).

For the user's 2,410 bookmarks: 25 batches of ~100, each with a
~13K char prompt that any model can handle. Progress shows
"Batch 1/25: analyzing 100 bookmarks..." per batch.

- Collections <= 100: single call (unchanged behavior)
- Collections > 100: batched with per-batch progress
- Failed batches logged but don't abort the whole operation
- Folder names consolidated across batches (title-cased)
- Duplicates and reasoning merged across all batches
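The batching and merge steps could look roughly like this. The type and function names are hypothetical, and this sketch keeps the first-seen casing of each folder name rather than title-casing it.

```typescript
interface FolderSuggestion {
  name: string;
  bookmarkIds: string[];
}

// Splits a collection into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Consolidates folders from all batches, matching names case-insensitively.
function mergeFoldersByName(batchResults: FolderSuggestion[][]): FolderSuggestion[] {
  const merged = new Map<string, FolderSuggestion>();
  for (const folders of batchResults) {
    for (const folder of folders) {
      const key = folder.name.trim().toLowerCase();
      const existing = merged.get(key);
      if (existing) {
        existing.bookmarkIds.push(...folder.bookmarkIds);
      } else {
        merged.set(key, { name: folder.name.trim(), bookmarkIds: folder.bookmarkIds.slice() });
      }
    }
  }
  return Array.from(merged.values());
}
```

For 2,410 bookmarks, chunk(bookmarks, 100) yields 25 batches, the last one holding 10 items. A failed batch would simply contribute no entry to batchResults.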
Pages were conditionally rendered ({page === "tabs" && <TabOrganizer />})
which unmounted the component on tab switch — killing port connections,
losing loading state, and resetting analysis progress.

Now all pages are always mounted but hidden via CSS (className="hidden").
Active page gets animate-fade-in. This means:
- Running analyses continue when switching to Settings and back
- Preview state survives tab switches without session storage hacks
- Port connections for long-running operations stay alive
- Loading spinners and progress indicators persist
Bookmark auto-snapshots weren't appearing because bookmark organize
was failing before reaching the apply step. Now that batching works,
auto-snapshots should fire normally. Added console log to confirm
when the snapshot is created.
Wrap session snapshot and auto-snapshot writes in try/catch with
console.error — if chrome.storage fails (quota, serialization), we'll
see the actual error instead of silent failure.

Logs: move/folder/removal counts, snapshot entry count, and explicit
success/failure messages for both session and persistent snapshots.
TabSnapshot only stored {id, groupId, windowId} — no group metadata.
When restoring, chrome.tabs.group() creates new unnamed groups because
the old groupId is gone. Groups were structurally correct (right tabs
together) but had no titles or colors.

Fix:
- Add TabGroupSnapshot type: { groupId, title, color }
- Add optional tabGroups[] to Snapshot interface (backward compatible)
- Capture live group metadata at snapshot time (auto + manual)
- Store group metadata in session for undo
- Restore: after chrome.tabs.group(), call chrome.tabGroups.update()
  with saved title + color
- Both undo and full restore now recover group names and colors
Bookmark organization now creates nested folder structures. The AI
returns path-based names like "Dev/Frontend" and the apply handler
creates the folder hierarchy automatically.

Prompt changes:
- Rules allow "/" for hierarchy, max 3 levels deep
- Granularity scales nesting: level 1 = flat only, level 5 = deep
- Few-shot examples demonstrate nested paths
- Schema comment documents "/" convention

Service worker:
- New createFolderPath() splits on "/", creates intermediate folders
  with a cache to avoid duplicates (e.g., "Dev" created once for
  both "Dev/Frontend" and "Dev/Backend")
- Batch merge preserves first-seen casing instead of broken title-case
- Fixed dead nameMap variable that was never populated

Parser:
- Reject leading/trailing slashes and double slashes in folder names

UI:
- BookmarkFolderCard renders hierarchy: "Dev / Frontend" with parent
  segments muted and leaf segment bright
- Folders sorted alphabetically so shared parents appear adjacent
- "Nested" guidance preset added, "Flat" preset updated to explicitly
  suppress "/" nesting
Adds a .refine() check that rejects folder paths with more than 3
segments (e.g., "A/B/C/D" fails). If the AI returns deeper nesting,
Zod validation fails and triggers a retry with the error context.
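The path rules above (no leading or trailing slash, no double slash, at most 3 segments) can be expressed as a plain validator. This is a dependency-free stand-in for the Zod .refine() check, not the actual schema code; validateFolderPath is a hypothetical name.

```typescript
const MAX_FOLDER_DEPTH = 3;

// Returns an error message for an invalid path, or null when the path is acceptable.
function validateFolderPath(path: string): string | null {
  const trimmed = path.trim();
  if (trimmed.length === 0) return "folder name is empty";
  if (trimmed.startsWith("/") || trimmed.endsWith("/")) return "leading or trailing slash";
  if (trimmed.includes("//")) return "empty segment (double slash)";
  if (trimmed.split("/").length > MAX_FOLDER_DEPTH) {
    return `more than ${MAX_FOLDER_DEPTH} levels of nesting`;
  }
  return null;
}
```

Returning the message rather than a boolean fits the retry flow described above, where the error context is fed back to the model on the second attempt.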
Pinned tabs were always silently excluded with no user control. Now
a toggle in the Tab Organizer idle state lets users choose:

- OFF (default): pinned tabs excluded, same as before
- ON: pinned tabs included in AI organization

Design:
- Compact toggle row with iOS-style switch (brand-400 when active)
- Pin icon on the right, label changes color with state
- Border highlights when active (brand-400/30)
- Setting persisted in chrome.storage.local
- Loaded on mount, saved immediately on toggle

Wiring:
- includePinnedTabs added to Settings type
- Passed through organize-tabs message payload
- Service worker conditionally filters based on the flag
The secure tier now has functional bookmark organization instead of
returning empty results. Same pattern as the existing tab rule-based
fallback.

Bookmark organization (secure tier):
- groupBookmarksByDomain(): groups bookmarks by URL domain
- ruleBasedBookmarkOrganize(): creates folder suggestions from domain
  groups with granularity-aware minimum group size
- Ungrouped bookmarks collected into a "Misc" folder
- Duplicate detection included (already worked, now surfaced)
- Folders sorted largest-first

Bookmark location suggestions (secure tier):
- suggestFolderByDomain(): matches bookmark's domain against existing
  folder names/paths, returns up to 3 suggestions with confidence
- Replaces the previous hard error with actual suggestions

Before: "Local AI not yet available. Showing current bookmarks."
After: Actual folder suggestions grouped by domain, ready to apply.
Rule-based tab organization now uses a two-pass strategy:

Pass 1: Domain grouping (existing behavior)
  - github.com tabs → "Github" group
  - youtube.com tabs → "Youtube" group

Pass 2: Keyword themes for ungrouped tabs
  - Tabs not captured by domain groups get keyword-extracted
  - Common title words (3+ chars, stop words filtered) that appear
    in 2+ tabs form theme groups
  - Example: "React Docs" + "React Hooks — SO" → "React" group
    even though they're on different domains

Strategy: domain groups first (high confidence), then keyword themes
for leftovers (greedy, largest keyword clusters first). Tabs are never
assigned to both — domain takes priority.
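A compact sketch of the two-pass strategy, assuming a minimum group size of 2 and a deliberately tiny stop-word list. All names here are illustrative, and the domain-to-name rule (first hostname label, capitalized) is a guess that reproduces the "Github" and "Youtube" examples.

```typescript
interface TabLike { id: number; title: string; url: string; }
interface Group { name: string; tabIds: number[]; }

const STOP_WORDS = new Set(["the", "and", "for", "with", "from", "how", "you", "your"]);

// Hostname without "www.", or "" when the URL has no scheme://host part.
function domainOf(url: string): string {
  const match = /^[a-z]+:\/\/([^/:?#]+)/i.exec(url);
  return match ? match[1].replace(/^www\./i, "").toLowerCase() : "";
}

function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function ruleBasedGroups(tabs: TabLike[], minSize: number = 2): Group[] {
  const groups: Group[] = [];
  const assigned = new Set<number>();

  // Pass 1: domain grouping (high confidence).
  const byDomain = new Map<string, number[]>();
  for (const tab of tabs) {
    const domain = domainOf(tab.url);
    if (!domain) continue;
    byDomain.set(domain, (byDomain.get(domain) ?? []).concat(tab.id));
  }
  byDomain.forEach((ids, domain) => {
    if (ids.length < minSize) return;
    groups.push({ name: capitalize(domain.split(".")[0]), tabIds: ids });
    ids.forEach((id) => assigned.add(id));
  });

  // Pass 2: keyword themes among tabs that no domain group captured.
  const byKeyword = new Map<string, number[]>();
  for (const tab of tabs.filter((t) => !assigned.has(t.id))) {
    const words = new Set(
      tab.title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length >= 3 && !STOP_WORDS.has(w)),
    );
    words.forEach((word) => byKeyword.set(word, (byKeyword.get(word) ?? []).concat(tab.id)));
  }
  // Greedy: largest keyword clusters claim their tabs first.
  const clusters = Array.from(byKeyword.entries()).sort((a, b) => b[1].length - a[1].length);
  for (const [word, ids] of clusters) {
    const free = ids.filter((id) => !assigned.has(id));
    if (free.length < minSize) continue;
    groups.push({ name: capitalize(word), tabIds: free });
    free.forEach((id) => assigned.add(id));
  }
  return groups;
}
```

Because the assigned set is shared between passes, a tab claimed by a domain group can never reappear in a keyword group.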
The secure tier now supports three local AI engines:

1. Rule-based (default): domain + keyword grouping, no AI model needed
2. Ollama: connects to local Ollama instance via OpenAI-compatible API
   - Configurable server URL and model (defaults: localhost:11434, llama3.2)
   - Diagnostic logging with timing
   - Uses relaxed-tier data (titles only, no URLs)
3. Chrome AI: uses Chrome's built-in Gemini Nano via the Prompt API
   - Available in Chrome 138+ extensions
   - No install, no network, fully on-device
   - Checks LanguageModel.availability() before use
   - Session created per request and destroyed after

Architecture:
- New OllamaProvider and ChromeAIProvider classes
- LocalAIProvider type: "rule-based" | "ollama" | "chrome-ai"
- Settings: localAIProvider, ollamaUrl, ollamaModel
- Provider factory routes secure tier based on localAIProvider setting
- Service worker checks localAIProvider before falling back to rule-based

Settings UI:
- 3-way segmented control: Rules / Ollama / Chrome AI
- Ollama: server URL input + model selector (Llama 3.2, Mistral, Gemma 2, Phi-3, custom)
- Chrome AI: description text (no config needed)
- Contextual descriptions for each engine
…ublishing

Add complete release automation pipeline for the Chrome extension:

- Release Please manifest-based config with version sync across
  package.json and manifest.json (bump-minor-pre-major enabled)
- CI workflow with lint, typecheck, test, and build quality gates
- Release Please workflow with auto-merge via branch protection
- Release publish workflow chained via workflow_run with all 5
  implementation guardrails (success filter, explicit ref:main,
  early-exit, explicit permissions, visible auto-merge failure)
- Extension zip packaging script with dist pre-check and cleanup
- Chrome Web Store draft upload job (Phase 2, gated on CWS config)
Merge origin/main into feature/release-infra, keeping the feature
branch's more complete implementations while incorporating main's
monorepo restructure and additional features.
- Remove ESLint, add Biome v2 with recommended rules tuned for
  Chrome extension context (a11y rules as warnings)
- Add lefthook with pre-commit hook that auto-fixes staged files
- Auto-fix import ordering, formatting across all extension source
heyjawrsh merged commit 73c637c into main on Apr 4, 2026
1 check passed
heyjawrsh deleted the feature/release-infra branch on April 4, 2026 19:20