feat(ci): add release infrastructure with CI gates, versioning, and publishing by heyjawrsh · Pull Request #6 · MakeGoodShip/vxrtx

heyjawrsh · 2026-04-04T12:37:59Z

Summary

- Add Release Please manifest-based config with version sync across package.json and manifest.json
- Add CI workflow with lint, typecheck, test, and build quality gates on every PR to main
- Add Release Please workflow with auto-merge via branch protection
- Add release publish workflow chained via workflow_run with all 5 implementation guardrails
- Add extension zip packaging script with dist pre-check and stale zip cleanup
- Add Chrome Web Store draft upload job (Phase 2, gated on CWS_EXTENSION_ID variable)

Test plan

- Verify pnpm typecheck passes from repo root
- Verify pnpm test passes (73/73 tests)
- Verify pnpm --filter @vxrtx/extension package:zip creates vxrtx-extension-0.1.0.zip
- Verify all config and workflow YAML files are valid
- Configure branch protection on main after merge
- First release smoke test after a feat: commit merges to main

* Updating gitignore
* Split into concerns of a newly created monorepo

@theme

- Introduce LockedTabGroup type, storage key, and persistence helpers
- Add queryTabGroups, getLockedTabIds, and resolveStaleLockedGroups utilities (including stale group re-matching by name+color)
- Add get-locked-tab-groups, lock-tab-group, unlock-tab-group message actions and service worker handlers
- Integrate locking into organize, apply, and undo flows; locked tabs are excluded from all AI-driven reorganization
- Surface lock/unlock toggles in TabOrganizer idle state, with dormant lock management for groups no longer open
- Replace hardcoded indigo-* colors with a custom brand color scale (cyan) and coal background token via Tailwind @theme
- Update extension icons

Add a 1-5 granularity control for tab and bookmark organization that lets users choose between broad and fine-grained grouping. Introduce port-based messaging for long-running operations with progress updates, elapsed timers, and session storage persistence for preview state.

…d folder filter Persistent snapshot system that auto-captures state before every destructive apply and supports manual named snapshots. Includes rename, per-snapshot and bulk export to JSON, import with dedup, selective restore (tabs/bookmarks/both), and auto-pruning at 20 snapshots. Adds type-to-filter input on the bookmark folders list.

… for review, case study, etc

…re 0, and retry logic
- Add system/user message separation for Claude (was single user message)
- Reduce temperature to 0.0 across all providers (was 0.3 or unset)
- Add withRetry() wrapper that retries parse failures with error context
- Unify system message across Claude, OpenAI, and OpenRouter providers

withRetry retries on any error thrown by parseFn, which can mask real programming/runtime issues (not just parse/validation failures) and trigger an unnecessary second LLM call. Consider only retrying for known parse/validation error types (e.g., ZodError, SyntaxError, or your own No valid JSON found... error) and rethrowing everything else immediately.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Move duplicated SYSTEM_MESSAGE constant from three provider files into ai/types.ts to prevent drift and simplify future updates.

…ction Replace brittle string-match check in isRetryableParseError with instanceof against a dedicated error class thrown by extractJson.
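A minimal sketch of the retry narrowing described in the last few commit messages. The names JsonExtractionError, extractJson, isRetryableParseError, and withRetry come from those messages; the bodies, the callModel parameter, and the error text are assumptions for illustration, not the repository's code. A schema validator's error class (such as ZodError) would presumably be added to the retryable set as well.

```typescript
// Hypothetical sketch: identifiers mirror the commit messages, bodies are assumed.
class JsonExtractionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "JsonExtractionError";
    // Keeps instanceof reliable even when compiled down to ES5.
    Object.setPrototypeOf(this, JsonExtractionError.prototype);
  }
}

// Pull the outermost {...} span out of a model response, or fail with a typed error.
function extractJson(raw: string): string {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new JsonExtractionError("No valid JSON found in response");
  }
  return raw.slice(start, end + 1);
}

// Only parse-shaped failures justify a second LLM call.
function isRetryableParseError(err: unknown): boolean {
  return err instanceof JsonExtractionError || err instanceof SyntaxError;
}

async function withRetry<T>(
  callModel: (errorContext?: string) => Promise<string>, // assumed provider hook
  parseFn: (raw: string) => T,
): Promise<T> {
  // Errors from callModel itself (network, API) are not caught and propagate.
  const first = await callModel();
  try {
    return parseFn(first);
  } catch (err) {
    // Anything that is not a parse failure is rethrown immediately.
    if (!isRetryableParseError(err)) throw err;
    const second = await callModel((err as Error).message);
    return parseFn(second);
  }
}
```

Checking the error class with instanceof avoids coupling the retry decision to the wording of an error message, which is the brittleness the refactor commit mentions.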

…on messaging
All fetch() calls to LLM providers now use fetchWithTimeout() with a 60-second AbortController. The client-side long-running message channel adds a 90-second safety timeout. Previously, a hung API connection would cause the spinner to run indefinitely.

Users saw a spinner with no status updates for minutes with no way to tell if the operation was working or hung.

Changes:
- Thread StatusCallback through AIProvider → withRetry → service worker
- Report 4 phases: preparing → sending to AI → waiting/retrying → done
- withRetry reports: "Waiting for AI response...", "Processing...", "Response invalid, retrying..."
- Service worker sends progress with item counts at each phase
- Reduce API timeout from 60s to 30s (single call should never need more)
- Client safety timeout resets on each progress message (45s silence = hung)
- Progress bar now visible during AI wait (total=4 phases)

For 118 tabs, the 30s fixed timeout was too tight: OpenRouter legitimately needs more time for large sets.

Changes:
- Timeout now scales: 30s base + 0.25s per item, capped at 90s (118 tabs → ~60s timeout)
- Providers thread item count to fetchWithTimeout for per-call scaling
- Fix "Batch X of Y" → "Step X of Y" (these are phases, not batches)
- Client silence timeout increased to 60s (resets on each progress msg)
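The scaling rule above (30s base, 0.25s per item, 90s cap) can be sketched roughly as follows. fetchWithTimeout is named in the commit messages, but its signature here, the timeoutForItems helper, and the constant names are assumptions.

```typescript
// Hypothetical constants taken from the numbers in the commit message.
const BASE_TIMEOUT_MS = 30_000;
const PER_ITEM_MS = 250;
const MAX_TIMEOUT_MS = 90_000;

// Scale the timeout with the number of tabs or bookmarks, up to the cap.
function timeoutForItems(itemCount: number): number {
  return Math.min(BASE_TIMEOUT_MS + PER_ITEM_MS * itemCount, MAX_TIMEOUT_MS);
}

// Assumed signature: the real helper may take different arguments.
async function fetchWithTimeout(
  url: string,
  init: { method?: string; headers?: Record<string, string>; body?: string },
  itemCount: number,
) {
  const controller = new AbortController();
  // Abort the request if it outlives the scaled timeout.
  const timer = setTimeout(() => controller.abort(), timeoutForItems(itemCount));
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}
```

For 118 items this gives 30,000 + 118 × 250 = 59,500 ms, which matches the "~60s" figure quoted above.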

Console logs now show: model, item count, prompt size, timeout, elapsed time, and response size for every OpenRouter call. Status messages show the specific model name so users can see which model is being used. This helps diagnose whether timeouts are due to the model, prompt size, or network issues.

Log port connections, message actions, and OpenRouter request/response details to the service worker console. Helps diagnose timeout and connectivity issues without guessing.

Refactor monolithic prompt builders into composable modules: rules(), granularityInstruction(), fewShotExamples(), dataBlock(), schemaBlock(). Each is a pure function returning a string segment.

Add 3 diverse few-shot examples per feature:
- Tabs: dev/docs mix, shopping/social/media, stale+duplicates edge case
- Bookmarks: dev/learning, finance/travel, work+duplicates

Few-shot examples use the exact same JSON schema as real output so the model learns the expected format from examples.

…n apply
Empty or whitespace-only group names from the AI now fail Zod validation (triggering a retry). If an empty name somehow gets through, createTabGroup falls back to "Untitled Group".
- Add .trim().min(1) to TabGroupSuggestionSchema.name
- Add .min(1) to TabGroupSuggestionSchema.tabIds
- Add .trim().min(1) to BookmarkFolderSchema.name
- Add .min(1) to BookmarkFolderSchema.bookmarkIds
- Defensive fallback in createTabGroup for empty names

Pinned tabs are now filtered out before AI analysis, stale detection, and duplicate detection, the same treatment as locked groups. The apply handler also protects pinned tabs from being grouped, closed, or deduplicated.
- Add pinned:boolean to TabInfo, populated from Chrome API
- Filter pinned tabs in handleOrganizeTabs before AI call
- Build protectedTabIds (pinned + locked) in handleApplyTabSuggestions
- Report pinned tab count in reasoning string
- Handle edge case where all tabs are pinned/locked

All newly created tab groups are collapsed after applying suggestions, reducing tab bar clutter. Groups are collapsed as a post-processing step after all groups are created, not during creation.

Both tab and bookmark organizer UIs now have bulk toggle buttons:
- Tab stale section: "Select all" / "Deselect all" with count (X/Y)
- Tab duplicates section: same toggle
- Bookmark duplicates section: same toggle

Adds allStale[] to TabOrganizer preview state for select-all support.

…coring
52 tests covering the full AI pipeline, with zero production code changes.

Parser tests (20):
- extractJson: valid JSON, code-block wrapped, prose embedded, no-JSON
- parseTabOrganization: valid, empty name, whitespace name, empty tabIds, invalid color fallback, optional field defaults
- parseBookmarkOrganization: valid, empty folder name, empty bookmarkIds
- withRetry: success, retry-on-parse-fail, both-fail, API error propagation, onStatus callback phases

Prompt builder tests (22):
- Tab and bookmark builders: all sections present (rules, granularity, few-shot, data, schema), URL inclusion/exclusion, granularity levels 1-5, 3 few-shot examples, example ordering before real data

Scoring utility + tests (10):
- scoreTabGrouping / scoreBookmarkGrouping: precision calculation, group count delta, mismatch detection
- Perfect match, partial match, complete mismatch, unassigned items

Golden fixtures: 3 tab sets + 2 bookmark sets with labeled groupings.

…ntent Split prompt builders into cached (rules + few-shot + schema) and dynamic (granularity + data) parts via new buildXxxPromptParts() exports. Claude providers now send user messages as content blocks with cache_control: { type: "ephemeral" } on the static prefix. On cache hit, Anthropic charges 90% less for the cached portion (~1.5K tokens of rules/examples/schema identical across every call). OpenAI caches automatically for prompts >1,024 tokens — no changes needed. OpenRouter passes through provider caching.
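As an illustration of the cached/dynamic split, the user message for Claude could be assembled along these lines. The cache_control: { type: "ephemeral" } marker is taken from the commit message; the PromptParts shape and the buildClaudeUserContent helper are hypothetical names, not the repository's exports.

```typescript
// Shape of a text content block carrying an optional cache marker.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical return shape of a buildXxxPromptParts() call.
interface PromptParts {
  cached: string; // rules + few-shot examples + schema, identical across calls
  dynamic: string; // granularity + the user's actual data
}

// Put the static prefix first and mark it cacheable; the dynamic part follows unmarked.
function buildClaudeUserContent(parts: PromptParts): TextBlock[] {
  return [
    { type: "text", text: parts.cached, cache_control: { type: "ephemeral" } },
    { type: "text", text: parts.dynamic },
  ];
}
```

Ordering matters in this sketch: the cache marker covers the prompt prefix up to and including the marked block, so anything that changes per call has to come after it.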

…for large
Items <= 30 use claude-haiku-4-5 (faster, cheaper), items > 30 use claude-sonnet-4 (more capable for complex grouping). Routing is transparent: the user sees the same results, just faster and cheaper for small tab/bookmark sets.
- Add selectClaudeModel(itemCount) to ai/types.ts
- Replace hardcoded model strings in relaxed + yolo providers
- Add diagnostic logging showing routed model name
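The routing rule is small enough to sketch directly. The function name and the two model strings are copied from the commit message; the threshold constant name is an assumption, and the exact identifiers sent to the API may differ from these short names.

```typescript
// Assumed constant name; the threshold value comes from the commit message.
const SMALL_SET_THRESHOLD = 30;

// Route small sets to the faster model and larger sets to the more capable one.
function selectClaudeModel(itemCount: number): string {
  return itemCount <= SMALL_SET_THRESHOLD ? "claude-haiku-4-5" : "claude-sonnet-4";
}
```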

When a user edits AI-suggested tab groups (renames, disables, removes tabs) and clicks Apply, the diff between AI suggestions and applied result is extracted as domain-level correction signals.

Corrections are stored in chrome.storage.local and injected into the prompt's dynamic section as USER PREFERENCES on subsequent organize calls. The AI sees "github.com tabs → prefer 'Dev' (corrected 3x)" and learns from past user feedback.
- CorrectionSignal type: domain, preferredGroup/rejectedGroup, count, lastSeen
- extractCorrections(): diffs AI groups vs applied groups
- mergeCorrections(): upserts by domain+group, prunes to max 50
- rankCorrections(): sorts by count × recency (14-day decay half-life)
- correctionsBlock(): formats top 10 as structured prompt text
- OrganizeTabsOptions replaces positional args on AIProvider.organizeTabs
- 16 new tests (68 total, all passing)

When a user accepts AI-suggested groupings without edits, domain→group pairs are now captured as "acceptance" signals alongside explicit corrections. This builds domain affinity over time: the AI learns which groupings work well even when the user doesn't edit anything.
- Add source field to CorrectionSignal: "correction" | "acceptance"
- extractCorrections() now captures accepted groupings as acceptance signals
- Acceptances weighted at 0.5x vs corrections (1x) in decay ranking
- Merge upgrades source to "correction" if either signal is explicit
- correctionsBlock() formats acceptances as '"Dev" works well (confirmed 5x)'
- 5 new tests (73 total, all passing)
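One plausible reading of "count × recency (14-day decay half-life)" with the 0.5x acceptance weight is sketched below. The field names follow the CorrectionSignal description above; the exponential form of the decay, the signalScore helper, and the explicit now parameter are assumptions rather than the repository's implementation.

```typescript
interface CorrectionSignal {
  domain: string;
  preferredGroup?: string;
  rejectedGroup?: string;
  count: number;
  lastSeen: number; // epoch milliseconds
  source: "correction" | "acceptance";
}

const HALF_LIFE_DAYS = 14;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Score = count x source weight x exponential decay that halves every 14 days.
function signalScore(signal: CorrectionSignal, now: number): number {
  const ageDays = Math.max(0, now - signal.lastSeen) / MS_PER_DAY;
  const decay = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
  const weight = signal.source === "correction" ? 1 : 0.5;
  return signal.count * weight * decay;
}

// Return a new array sorted from strongest to weakest signal.
function rankCorrections(signals: CorrectionSignal[], now: number): CorrectionSignal[] {
  return [...signals].sort((a, b) => signalScore(b, now) - signalScore(a, now));
}
```

Under these assumptions a correction seen three times today outranks an acceptance seen five times today (3 vs 2.5), and a correction seen four times two weeks ago scores 2.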

Every AI organize call is now logged with: variant ID, model, item count, latency, and group count. At apply time, the edit count (corrections vs acceptances) is recorded. Undo marks the log entry as undone.

This enables offline A/B analysis: compare prompt variants by measuring edit rate (lower = better), undo rate, latency, etc. Currently all requests use variant "default"; adding variants is a config change, not a code change.
- ExperimentLog type with all fields for analysis
- appendExperimentLog/updateExperimentLog in storage
- Organize handler logs experiment at AI response time
- Apply handler records edit count on the same log entry
- Undo handler marks latest log as undone
- Capped at 200 entries (oldest pruned)

…ame UX
Users can now steer AI results with custom text guidance:
- Collapsible "Guidance" section in both tab and bookmark organizers
- Quick-preset chips: "By project", "By domain", "By activity", etc.
- Guidance persisted in settings, injected into AI prompts as USER GUIDANCE
- Debounced auto-save (500ms) to avoid excessive writes

Group/folder rename improvements:
- Tab GroupCard: pencil icon appears on hover to signal editability
- Bookmark folder cards: now support inline rename (was read-only)
- Edit input gets brand-400 focus border for better visual feedback
- New BookmarkFolderCard component with same UX pattern as GroupCard

Backend wiring:
- Settings gains tabGuidance/bookmarkGuidance fields
- OrganizeBookmarksOptions mirrors OrganizeTabsOptions pattern
- Prompt builders accept guidance in options, inject as USER GUIDANCE block
- All 4 providers + service worker thread guidance through

Root cause: bookmark organize was failing silently because max_tokens was hardcoded to 4096. For 200+ bookmarks, the output JSON (every bookmark ID assigned to a folder + reasoning) exceeds 4096 tokens, causing truncated JSON → parse failure → retry → same truncation → error after 2 attempts.

Fix:
- aiMaxTokens(itemCount): 2048 base + 20 tokens per item, capped at 16384
- Applied to all Claude, OpenAI, and OpenRouter calls
- Added diagnostic logging to bookmark organize path and OpenAI calls
- Console now shows: item count, prompt size, max_tokens, timeout

Scaling examples:
- 20 items → 2448 max_tokens
- 100 items → 4048 max_tokens
- 500 items → 12048 max_tokens

Some models via OpenRouter (including Claude Sonnet) cap output tokens lower than 16384. Reducing cap to 8192 for broad compatibility.
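The token budget rule from these two commits can be written as a one-line function. aiMaxTokens is the name used in the commit message; exposing the cap as a second parameter is an assumption made here so both the original 16384 cap and the reduced 8192 cap can be shown.

```typescript
// 2048 base + 20 tokens per item, limited by a cap (8192 after the compatibility fix).
function aiMaxTokens(itemCount: number, cap: number = 8192): number {
  return Math.min(2048 + 20 * itemCount, cap);
}
```

With the original cap, aiMaxTokens(500, 16384) gives 12048 as in the scaling examples; with the reduced cap, the same 500 items are limited to 8192.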

2,410 bookmarks in a single 304K char prompt was never going to work. Now splits into batches of 100 bookmarks when the collection exceeds 100 items. Each batch is processed independently and results are merged by folder name (case-insensitive consolidation).

For the user's 2,410 bookmarks: 25 batches of ~100, each with a ~13K char prompt that any model can handle. Progress shows "Batch 1/25: analyzing 100 bookmarks..." per batch.
- Collections <= 100: single call (unchanged behavior)
- Collections > 100: batched with per-batch progress
- Failed batches logged but don't abort the whole operation
- Folder names consolidated across batches (title-cased)
- Duplicates and reasoning merged across all batches
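A rough sketch of the batching and merge steps, under stated assumptions: the FolderSuggestion shape, chunk, and mergeFolders are hypothetical names, and this version keeps the first-seen casing of each folder name rather than title-casing it, purely to keep the example short.

```typescript
interface FolderSuggestion {
  name: string;
  bookmarkIds: string[];
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Merge per-batch results, treating folder names as equal when they differ only by case.
function mergeFolders(batchResults: FolderSuggestion[][]): FolderSuggestion[] {
  const merged = new Map<string, FolderSuggestion>();
  for (const folders of batchResults) {
    for (const folder of folders) {
      const key = folder.name.trim().toLowerCase();
      const existing = merged.get(key);
      if (existing) {
        existing.bookmarkIds.push(...folder.bookmarkIds);
      } else {
        // Copy the ids so the caller's arrays are not mutated by later merges.
        merged.set(key, { name: folder.name.trim(), bookmarkIds: [...folder.bookmarkIds] });
      }
    }
  }
  return Array.from(merged.values());
}
```

Chunking 2,410 items by 100 yields 24 full batches plus one batch of 10, which is where the figure of 25 batches comes from.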

Pages were conditionally rendered ({page === "tabs" && <TabOrganizer />}) which unmounted the component on tab switch, killing port connections, losing loading state, and resetting analysis progress.

Now all pages are always mounted but hidden via CSS (className="hidden"). Active page gets animate-fade-in. This means:
- Running analyses continue when switching to Settings and back
- Preview state survives tab switches without session storage hacks
- Port connections for long-running operations stay alive
- Loading spinners and progress indicators persist

Bookmark auto-snapshots weren't appearing because bookmark organize was failing before reaching the apply step. Now that batching works, auto-snapshots should fire normally. Added console log to confirm when the snapshot is created.

Wrap session snapshot and auto-snapshot writes in try/catch with console.error — if chrome.storage fails (quota, serialization), we'll see the actual error instead of silent failure. Logs: move/folder/removal counts, snapshot entry count, and explicit success/failure messages for both session and persistent snapshots.

TabSnapshot only stored {id, groupId, windowId}, with no group metadata. When restoring, chrome.tabs.group() creates new unnamed groups because the old groupId is gone. Groups were structurally correct (right tabs together) but had no titles or colors.

Fix:
- Add TabGroupSnapshot type: { groupId, title, color }
- Add optional tabGroups[] to Snapshot interface (backward compatible)
- Capture live group metadata at snapshot time (auto + manual)
- Store group metadata in session for undo
- Restore: after chrome.tabs.group(), call chrome.tabGroups.update() with saved title + color
- Both undo and full restore now recover group names and colors

Bookmark organization now creates nested folder structures. The AI returns path-based names like "Dev/Frontend" and the apply handler creates the folder hierarchy automatically.

Prompt changes:
- Rules allow "/" for hierarchy, max 3 levels deep
- Granularity scales nesting: level 1 = flat only, level 5 = deep
- Few-shot examples demonstrate nested paths
- Schema comment documents "/" convention

Service worker:
- New createFolderPath() splits on "/", creates intermediate folders with a cache to avoid duplicates (e.g., "Dev" created once for both "Dev/Frontend" and "Dev/Backend")
- Batch merge preserves first-seen casing instead of broken title-case
- Fixed dead nameMap variable that was never populated

Parser:
- Reject leading/trailing slashes and double slashes in folder names

UI:
- BookmarkFolderCard renders hierarchy: "Dev / Frontend" with parent segments muted and leaf segment bright
- Folders sorted alphabetically so shared parents appear adjacent
- "Nested" guidance preset added, "Flat" preset updated to explicitly suppress "/" nesting
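The cached path creation described under "Service worker" might look something like the sketch below. createFolderPath is the name from the commit message; its parameter list, the injected createFolder callback, and the prefix-keyed cache are assumptions chosen so the logic can run without the Chrome API.

```typescript
// Hypothetical callback: creates one folder under parentId and resolves to the new folder's id.
type CreateFolder = (parentId: string, title: string) => Promise<string>;

// Walk the path segment by segment, creating only folders not seen before.
async function createFolderPath(
  path: string,
  rootId: string,
  createFolder: CreateFolder,
  cache: Map<string, string>, // maps a path prefix such as "Dev" to its folder id
): Promise<string> {
  let parentId = rootId;
  let prefix = "";
  for (const segment of path.split("/")) {
    prefix = prefix ? `${prefix}/${segment}` : segment;
    const cached = cache.get(prefix);
    if (cached !== undefined) {
      // This level already exists, so descend into it without creating anything.
      parentId = cached;
      continue;
    }
    parentId = await createFolder(parentId, segment);
    cache.set(prefix, parentId);
  }
  return parentId; // id of the leaf folder
}
```

In the extension the callback would presumably wrap chrome.bookmarks.create({ parentId, title }) and return the created node's id. Sharing one cache across all paths in an apply run is what lets "Dev" be created once for both "Dev/Frontend" and "Dev/Backend".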

Adds a .refine() check that rejects folder paths with more than 3 segments (e.g., "A/B/C/D" fails). If the AI returns deeper nesting, Zod validation fails and triggers a retry with the error context.
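The path rules from this commit and the parser change before it (no leading or trailing slash, no double slash, at most 3 segments) can be expressed as a plain predicate. This is a sketch under assumptions: isValidFolderPath and MAX_FOLDER_DEPTH are hypothetical names, and the actual schema may split the checks differently.

```typescript
const MAX_FOLDER_DEPTH = 3;

// Returns true only for paths like "Dev" or "Dev/Frontend/React".
function isValidFolderPath(path: string): boolean {
  const trimmed = path.trim();
  if (trimmed.length === 0) return false;
  if (trimmed.startsWith("/") || trimmed.endsWith("/")) return false;
  const segments = trimmed.split("/");
  // An empty segment means the path contained "//".
  if (segments.some((segment) => segment.trim().length === 0)) return false;
  return segments.length <= MAX_FOLDER_DEPTH;
}
```

A predicate of this shape is the kind of function a Zod `.refine()` call accepts, so a failed check would surface as a validation error and feed the existing retry path.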

Pinned tabs were always silently excluded with no user control. Now a toggle in the Tab Organizer idle state lets users choose:
- OFF (default): pinned tabs excluded, same as before
- ON: pinned tabs included in AI organization

Design:
- Compact toggle row with iOS-style switch (brand-400 when active)
- Pin icon on the right, label changes color with state
- Border highlights when active (brand-400/30)
- Setting persisted in chrome.storage.local
- Loaded on mount, saved immediately on toggle

Wiring:
- includePinnedTabs added to Settings type
- Passed through organize-tabs message payload
- Service worker conditionally filters based on the flag

The secure tier now has functional bookmark organization instead of returning empty results. Same pattern as the existing tab rule-based fallback.

Bookmark organization (secure tier):
- groupBookmarksByDomain(): groups bookmarks by URL domain
- ruleBasedBookmarkOrganize(): creates folder suggestions from domain groups with granularity-aware minimum group size
- Ungrouped bookmarks collected into a "Misc" folder
- Duplicate detection included (already worked, now surfaced)
- Folders sorted largest-first

Bookmark location suggestions (secure tier):
- suggestFolderByDomain(): matches bookmark's domain against existing folder names/paths, returns up to 3 suggestions with confidence
- Replaces the previous hard error with actual suggestions

Before: "Local AI not yet available. Showing current bookmarks."
After: Actual folder suggestions grouped by domain, ready to apply.

Rule-based tab organization now uses a two-pass strategy:

Pass 1: Domain grouping (existing behavior)
- github.com tabs → "Github" group
- youtube.com tabs → "Youtube" group

Pass 2: Keyword themes for ungrouped tabs
- Tabs not captured by domain groups get keyword-extracted
- Common title words (3+ chars, stop words filtered) that appear in 2+ tabs form theme groups
- Example: "React Docs" + "React Hooks - SO" → "React" group even though they're on different domains

Strategy: domain groups first (high confidence), then keyword themes for leftovers (greedy, largest keyword clusters first). Tabs are never assigned to both; domain takes priority.
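The two-pass strategy can be sketched as follows. Everything here is illustrative: the TabLite and RuleGroup types, the stop-word list, the minimum domain group size of 2, and the way a group name is derived from the hostname are assumptions, not the extension's actual rules.

```typescript
interface TabLite { id: number; title: string; url: string; }
interface RuleGroup { name: string; tabIds: number[]; }

// Assumed stop-word list; the real one is likely longer.
const STOP_WORDS = new Set(["the", "and", "for", "with", "how", "you", "are", "new"]);

function hostOf(url: string): string {
  try { return new URL(url).hostname.replace(/^www\./, ""); } catch { return ""; }
}

function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function ruleBasedTabOrganize(tabs: TabLite[]): RuleGroup[] {
  const groups: RuleGroup[] = [];
  const assigned = new Set<number>();

  // Pass 1: hosts with at least two tabs become domain groups.
  const byHost = new Map<string, TabLite[]>();
  for (const tab of tabs) {
    const host = hostOf(tab.url);
    if (host) byHost.set(host, [...(byHost.get(host) ?? []), tab]);
  }
  byHost.forEach((hostTabs, host) => {
    if (hostTabs.length < 2) return;
    const labels = host.split(".");
    const label = labels.length >= 2 ? labels[labels.length - 2] : labels[0];
    groups.push({ name: capitalize(label), tabIds: hostTabs.map((t) => t.id) });
    hostTabs.forEach((t) => assigned.add(t.id));
  });

  // Pass 2: index leftover tabs by title keyword (3+ chars, stop words removed).
  const byKeyword = new Map<string, number[]>();
  for (const tab of tabs) {
    if (assigned.has(tab.id)) continue;
    const words = new Set(
      tab.title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length >= 3 && !STOP_WORDS.has(w)),
    );
    words.forEach((w) => byKeyword.set(w, [...(byKeyword.get(w) ?? []), tab.id]));
  }

  // Greedy: take the largest keyword clusters first; a tab joins at most one group.
  const clusters = Array.from(byKeyword.entries()).sort((a, b) => b[1].length - a[1].length);
  for (const [keyword, ids] of clusters) {
    const free = ids.filter((id) => !assigned.has(id));
    if (free.length < 2) continue;
    groups.push({ name: capitalize(keyword), tabIds: free });
    free.forEach((id) => assigned.add(id));
  }
  return groups;
}
```

Because the assigned set is filled during pass 1 and consulted in pass 2, a tab already in a domain group can never also land in a keyword group, which is the priority rule stated above.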

The secure tier now supports three local AI engines:

1. Rule-based (default): domain + keyword grouping, no AI model needed
2. Ollama: connects to local Ollama instance via OpenAI-compatible API
   - Configurable server URL and model (defaults: localhost:11434, llama3.2)
   - Diagnostic logging with timing
   - Uses relaxed-tier data (titles only, no URLs)
3. Chrome AI: uses Chrome's built-in Gemini Nano via the Prompt API
   - Available in Chrome 138+ extensions
   - No install, no network, fully on-device
   - Checks LanguageModel.availability() before use
   - Session created per request and destroyed after

Architecture:
- New OllamaProvider and ChromeAIProvider classes
- LocalAIProvider type: "rule-based" | "ollama" | "chrome-ai"
- Settings: localAIProvider, ollamaUrl, ollamaModel
- Provider factory routes secure tier based on localAIProvider setting
- Service worker checks localAIProvider before falling back to rule-based

Settings UI:
- 3-way segmented control: Rules / Ollama / Chrome AI
- Ollama: server URL input + model selector (Llama 3.2, Mistral, Gemma 2, Phi-3, custom)
- Chrome AI: description text (no config needed)
- Contextual descriptions for each engine
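For the Ollama engine, a request against the OpenAI-compatible endpoint could be built roughly like this. The ollamaUrl and ollamaModel setting names and their defaults come from the commit message; the buildOllamaRequest helper, the http scheme, and the temperature and stream values are assumptions.

```typescript
interface OllamaSettings {
  ollamaUrl: string;
  ollamaModel: string;
}

// Defaults named in the commit message; the http:// scheme is assumed.
const DEFAULT_OLLAMA: OllamaSettings = {
  ollamaUrl: "http://localhost:11434",
  ollamaModel: "llama3.2",
};

// Build the URL and fetch options for Ollama's OpenAI-compatible chat endpoint.
function buildOllamaRequest(
  systemMessage: string,
  userPrompt: string,
  settings: OllamaSettings = DEFAULT_OLLAMA,
) {
  // Tolerate a trailing slash in the configured server URL.
  const base = settings.ollamaUrl.replace(/\/+$/, "");
  return {
    url: `${base}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: settings.ollamaModel,
        messages: [
          { role: "system", content: systemMessage },
          { role: "user", content: userPrompt },
        ],
        temperature: 0, // assumed to match the other providers
        stream: false,
      }),
    },
  };
}
```

Keeping request construction separate from the network call makes it easy to log the prompt size and model before sending, in line with the diagnostic logging mentioned above.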

…ublishing
Add complete release automation pipeline for the Chrome extension:
- Release Please manifest-based config with version sync across package.json and manifest.json (bump-minor-pre-major enabled)
- CI workflow with lint, typecheck, test, and build quality gates
- Release Please workflow with auto-merge via branch protection
- Release publish workflow chained via workflow_run with all 5 implementation guardrails (success filter, explicit ref:main, early-exit, explicit permissions, visible auto-merge failure)
- Extension zip packaging script with dist pre-check and cleanup
- Chrome Web Store draft upload job (Phase 2, gated on CWS config)

Merge origin/main into feature/release-infra, keeping the feature branch's more complete implementations while incorporating main's monorepo restructure and additional features.

- Remove ESLint, add Biome v2 with recommended rules tuned for Chrome extension context (a11y rules as warnings)
- Add lefthook with pre-commit hook that auto-fixes staged files
- Auto-fix import ordering, formatting across all extension source

heyjawrsh and others added 30 commits March 26, 2026 13:52

Wrapping phase 3

57f2fc3

a little phase 4 pregame; adding in openrouter options

1a8162d

Saving initial pass at bookmarks org

013477c

Added folder cleanup to Bookmarks section

64e39d4

Monorepo (#2)

d87184c


Bookmark locking

7f90050

Updated to new direction branding

64b74e3

Ignoring .screens dir

c916d0c

Adding screenshot directive so that we can always obtain good screens…

e4241dc

… for review, case study, etc

Big UI hugs

d97d4ca

Adding in the current state of case study

78b8384

Ignoring a couple of additional generated dirs

13e84b1

Spiffing up the place

29ff342

Updating gitignore to ignore root docs folder

5e1f16b

refactor(ai): consolidate SYSTEM_MESSAGE into shared types module

83ac082


refactor(ai): use dedicated JsonExtractionError for stable retry dete…

667b21b


fix(ai): add diagnostic logging to service worker message handlers

317e2c7


feat(tabs): auto-collapse tab groups after applying organization

2f3706d


heyjawrsh added 25 commits April 3, 2026 10:36

Updating turbothangs

7b02621

fix(ai): cap max_tokens at 8192 for OpenRouter model compatibility

0fd1cdf


fix(bookmarks): enforce 3-level max nesting depth in Zod schema

499a0bb


merge: resolve conflicts with main branch

b6d9946


heyjawrsh merged commit 73c637c into main Apr 4, 2026
1 check passed

heyjawrsh deleted the feature/release-infra branch April 4, 2026 19:20

heyjawrsh mentioned this pull request Apr 4, 2026

chore(main): release extension 0.2.0 #7

Merged

heyjawrsh merged 55 commits into main from feature/release-infra
