feat(ci): add release infrastructure with CI gates, versioning, and publishing #6
Merged
* Updating gitignore
* Split into concerns of a newly created monorepo
- Introduce LockedTabGroup type, storage key, and persistence helpers
- Add queryTabGroups, getLockedTabIds, and resolveStaleLockedGroups utilities (including stale group re-matching by name+color)
- Add get-locked-tab-groups, lock-tab-group, unlock-tab-group message actions and service worker handlers
- Integrate locking into organize, apply, and undo flows; locked tabs are excluded from all AI-driven reorganization
- Surface lock/unlock toggles in TabOrganizer idle state, with dormant lock management for groups no longer open
- Replace hardcoded indigo-* colors with a custom brand color scale (cyan) and coal background token via Tailwind @theme
- Update extension icons
Add a 1-5 granularity control for tab and bookmark organization that lets users choose between broad and fine-grained grouping. Introduce port-based messaging for long-running operations with progress updates, elapsed timers, and session storage persistence for preview state.
…d folder filter
Persistent snapshot system that auto-captures state before every destructive apply and supports manual named snapshots. Includes rename, per-snapshot and bulk export to JSON, import with dedup, selective restore (tabs/bookmarks/both), and auto-pruning at 20 snapshots. Adds type-to-filter input on the bookmark folders list.
… for review, case study, etc
…re 0, and retry logic
- Add system/user message separation for Claude (was single user message)
- Reduce temperature to 0.0 across all providers (was 0.3 or unset)
- Add withRetry() wrapper that retries parse failures with error context
- Unify system message across Claude, OpenAI, and OpenRouter providers
withRetry retries on any error thrown by parseFn, which can mask real programming/runtime issues (not just parse/validation failures) and trigger an unnecessary second LLM call. Consider retrying only for known parse/validation error types (e.g., ZodError, SyntaxError, or your own "No valid JSON found..." error) and rethrowing everything else immediately.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Move duplicated SYSTEM_MESSAGE constant from three provider files into ai/types.ts to prevent drift and simplify future updates.
…ction
Replace brittle string-match check in isRetryableParseError with instanceof against a dedicated error class thrown by extractJson.
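A minimal sketch of the retry narrowing described in the last few commits, assuming a hypothetical JsonExtractionError class and a simplified withRetry signature (the real names and parameters are not shown in this log). ZodError is left out to keep the sketch dependency-free.

```typescript
// Hypothetical dedicated error class; the log only says extractJson throws one.
class JsonExtractionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "JsonExtractionError";
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, JsonExtractionError.prototype);
  }
}

// Only parse/validation failures are worth a second LLM call.
function isRetryableParseError(err: unknown): boolean {
  return err instanceof JsonExtractionError || err instanceof SyntaxError;
}

// Assumed shape: callModel optionally receives the previous parse error as context.
async function withRetry<T>(
  callModel: (errorContext?: string) => Promise<string>,
  parseFn: (raw: string) => T,
): Promise<T> {
  const first = await callModel();
  try {
    return parseFn(first);
  } catch (err) {
    // Programming/runtime errors are rethrown immediately, without a retry.
    if (!isRetryableParseError(err)) throw err;
    const context = err instanceof Error ? err.message : String(err);
    const second = await callModel(context);
    return parseFn(second); // a second failure propagates to the caller
  }
}
```

With this shape, a TypeError thrown inside parseFn surfaces after a single model call instead of being hidden behind a retry.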
…on messaging
All fetch() calls to LLM providers now use fetchWithTimeout() with a 60-second AbortController. The client-side long-running message channel adds a 90-second safety timeout. Previously, a hung API connection would cause the spinner to run indefinitely.
Users saw a spinner with no status updates for minutes with no way to tell if the operation was working or hung.
Changes:
- Thread StatusCallback through AIProvider → withRetry → service worker
- Report 4 phases: preparing → sending to AI → waiting/retrying → done
- withRetry reports: "Waiting for AI response...", "Processing...", "Response invalid, retrying..."
- Service worker sends progress with item counts at each phase
- Reduce API timeout from 60s to 30s (single call should never need more)
- Client safety timeout resets on each progress message (45s silence = hung)
- Progress bar now visible during AI wait (total=4 phases)
For 118 tabs, the 30s fixed timeout was too tight: OpenRouter legitimately needs more time for large sets.
Changes:
- Timeout now scales: 30s base + 0.25s per item, capped at 90s (118 tabs → ~60s timeout)
- Providers thread item count to fetchWithTimeout for per-call scaling
- Fix "Batch X of Y" → "Step X of Y" (these are phases, not batches)
- Client silence timeout increased to 60s (resets on each progress msg)
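The scaling rule above can be sketched as a pure function. The name scaledTimeoutMs is hypothetical; only the constants (30s base, 0.25s per item, 90s cap) come from the commit message.

```typescript
// Hypothetical helper name; the constants are taken from the commit message.
const BASE_TIMEOUT_MS = 30000; // 30s base
const PER_ITEM_MS = 250; // 0.25s per tab or bookmark
const MAX_TIMEOUT_MS = 90000; // 90s cap

function scaledTimeoutMs(itemCount: number): number {
  const items = Math.max(0, itemCount); // guard against negative counts
  return Math.min(BASE_TIMEOUT_MS + PER_ITEM_MS * items, MAX_TIMEOUT_MS);
}
```

For 118 items this gives 59,500 ms, which matches the "~60s" figure quoted in the commit; the cap is reached at 240 items.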
Console logs now show: model, item count, prompt size, timeout, elapsed time, and response size for every OpenRouter call. Status messages show the specific model name so users can see which model is being used. This helps diagnose whether timeouts are due to the model, prompt size, or network issues.
Log port connections, message actions, and OpenRouter request/response details to the service worker console. Helps diagnose timeout and connectivity issues without guessing.
Refactor monolithic prompt builders into composable modules: rules(), granularityInstruction(), fewShotExamples(), dataBlock(), schemaBlock(). Each is a pure function returning a string segment.
Add 3 diverse few-shot examples per feature:
- Tabs: dev/docs mix, shopping/social/media, stale+duplicates edge case
- Bookmarks: dev/learning, finance/travel, work+duplicates
Few-shot examples use the exact same JSON schema as real output so the model learns the expected format from examples.
…n apply
Empty or whitespace-only group names from the AI now fail Zod validation (triggering a retry). If an empty name somehow gets through, createTabGroup falls back to "Untitled Group".
- Add .trim().min(1) to TabGroupSuggestionSchema.name
- Add .min(1) to TabGroupSuggestionSchema.tabIds
- Add .trim().min(1) to BookmarkFolderSchema.name
- Add .min(1) to BookmarkFolderSchema.bookmarkIds
- Defensive fallback in createTabGroup for empty names
Pinned tabs are now filtered out before AI analysis, stale detection, and duplicate detection, the same treatment as locked groups. The apply handler also protects pinned tabs from being grouped, closed, or deduplicated.
- Add pinned:boolean to TabInfo, populated from Chrome API
- Filter pinned tabs in handleOrganizeTabs before AI call
- Build protectedTabIds (pinned + locked) in handleApplyTabSuggestions
- Report pinned tab count in reasoning string
- Handle edge case where all tabs are pinned/locked
All newly created tab groups are collapsed after applying suggestions, reducing tab bar clutter. Groups are collapsed as a post-processing step after all groups are created, not during creation.
Both tab and bookmark organizer UIs now have bulk toggle buttons:
- Tab stale section: "Select all" / "Deselect all" with count (X/Y)
- Tab duplicates section: same toggle
- Bookmark duplicates section: same toggle
Adds allStale[] to TabOrganizer preview state for select-all support.
…coring
52 tests covering the full AI pipeline, with zero production code changes:
Parser tests (20):
- extractJson: valid JSON, code-block wrapped, prose embedded, no-JSON
- parseTabOrganization: valid, empty name, whitespace name, empty tabIds, invalid color fallback, optional field defaults
- parseBookmarkOrganization: valid, empty folder name, empty bookmarkIds
- withRetry: success, retry-on-parse-fail, both-fail, API error propagation, onStatus callback phases
Prompt builder tests (22):
- Tab and bookmark builders: all sections present (rules, granularity, few-shot, data, schema), URL inclusion/exclusion, granularity levels 1-5, 3 few-shot examples, example ordering before real data
Scoring utility + tests (10):
- scoreTabGrouping / scoreBookmarkGrouping: precision calculation, group count delta, mismatch detection
- Perfect match, partial match, complete mismatch, unassigned items
Golden fixtures: 3 tab sets + 2 bookmark sets with labeled groupings.
…ntent
Split prompt builders into cached (rules + few-shot + schema) and
dynamic (granularity + data) parts via new buildXxxPromptParts() exports.
Claude providers now send user messages as content blocks with
cache_control: { type: "ephemeral" } on the static prefix. On cache
hit, Anthropic charges 90% less for the cached portion (~1.5K tokens
of rules/examples/schema identical across every call).
OpenAI caches automatically for prompts >1,024 tokens — no changes needed.
OpenRouter passes through provider caching.
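A minimal sketch of the cached/dynamic split as Claude content blocks. The helper name buildCachedUserContent and the TextBlock type are hypothetical; the cache_control: { type: "ephemeral" } marker on the static prefix is the part described in the commit.

```typescript
// Shape of a text content block in a Claude user message.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical helper: static prefix (rules + few-shot + schema) first and marked
// cacheable, dynamic part (granularity + data) second and left unmarked.
function buildCachedUserContent(staticPrefix: string, dynamicPart: string): TextBlock[] {
  return [
    { type: "text", text: staticPrefix, cache_control: { type: "ephemeral" } },
    { type: "text", text: dynamicPart },
  ];
}
```

The static block has to come first and stay byte-identical across calls for the prefix to be reused, which is why the granularity and data segments are kept out of it.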
…for large
Items <= 30 use claude-haiku-4-5 (faster, cheaper), items > 30 use claude-sonnet-4 (more capable for complex grouping). Routing is transparent: the user sees the same results, just faster and cheaper for small tab/bookmark sets.
- Add selectClaudeModel(itemCount) to ai/types.ts
- Replace hardcoded model strings in relaxed + yolo providers
- Add diagnostic logging showing routed model name
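The routing rule is small enough to sketch directly. The function name and model strings come from the commit message; the constant name is an assumption.

```typescript
// Assumed constant name; the threshold of 30 is from the commit message.
const SMALL_SET_THRESHOLD = 30;

function selectClaudeModel(itemCount: number): string {
  // Small sets go to the faster, cheaper model; larger sets to the more capable one.
  return itemCount <= SMALL_SET_THRESHOLD ? "claude-haiku-4-5" : "claude-sonnet-4";
}
```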
When a user edits AI-suggested tab groups (renames, disables, removes tabs) and clicks Apply, the diff between AI suggestions and applied result is extracted as domain-level correction signals. Corrections are stored in chrome.storage.local and injected into the prompt's dynamic section as USER PREFERENCES on subsequent organize calls. The AI sees "github.com tabs → prefer 'Dev' (corrected 3x)" and learns from past user feedback.
- CorrectionSignal type: domain, preferredGroup/rejectedGroup, count, lastSeen
- extractCorrections(): diffs AI groups vs applied groups
- mergeCorrections(): upserts by domain+group, prunes to max 50
- rankCorrections(): sorts by count × recency (14-day decay half-life)
- correctionsBlock(): formats top 10 as structured prompt text
- OrganizeTabsOptions replaces positional args on AIProvider.organizeTabs
- 16 new tests (68 total, all passing)
When a user accepts AI-suggested groupings without edits, domain→group pairs are now captured as "acceptance" signals alongside explicit corrections. This builds domain affinity over time: the AI learns which groupings work well even when the user doesn't edit anything.
- Add source field to CorrectionSignal: "correction" | "acceptance"
- extractCorrections() now captures accepted groupings as acceptance signals
- Acceptances weighted at 0.5x vs corrections (1x) in decay ranking
- Merge upgrades source to "correction" if either signal is explicit
- correctionsBlock() formats acceptances as '"Dev" works well (confirmed 5x)'
- 5 new tests (73 total, all passing)
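A sketch of the decay ranking described in this and the preceding commit: count × recency with a 14-day half-life, and acceptances weighted at 0.5x. The field names follow the commit text; treating lastSeen as epoch milliseconds, passing now explicitly, and the helper signalScore are assumptions.

```typescript
interface CorrectionSignal {
  domain: string;
  preferredGroup?: string;
  rejectedGroup?: string;
  count: number;
  lastSeen: number; // assumed to be epoch milliseconds
  source: "correction" | "acceptance";
}

const HALF_LIFE_MS = 14 * 24 * 60 * 60 * 1000; // 14 days

// Hypothetical helper: score halves every 14 days since the signal was last seen.
function signalScore(signal: CorrectionSignal, now: number): number {
  const ageMs = Math.max(0, now - signal.lastSeen);
  const decay = Math.pow(0.5, ageMs / HALF_LIFE_MS);
  const weight = signal.source === "acceptance" ? 0.5 : 1;
  return signal.count * decay * weight;
}

// Returns a new array sorted from strongest to weakest signal.
function rankCorrections(signals: CorrectionSignal[], now: number): CorrectionSignal[] {
  return [...signals].sort((a, b) => signalScore(b, now) - signalScore(a, now));
}
```

Under these assumptions, a correction seen 4 times two weeks ago scores 2, which still outranks an acceptance seen 3 times today (1.5).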
Every AI organize call is now logged with: variant ID, model, item count, latency, and group count. At apply time, the edit count (corrections vs acceptances) is recorded. Undo marks the log entry as undone.
This enables offline A/B analysis: compare prompt variants by measuring edit rate (lower = better), undo rate, latency, etc. Currently all requests use variant "default"; adding variants is a config change, not a code change.
- ExperimentLog type with all fields for analysis
- appendExperimentLog/updateExperimentLog in storage
- Organize handler logs experiment at AI response time
- Apply handler records edit count on the same log entry
- Undo handler marks latest log as undone
- Capped at 200 entries (oldest pruned)
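A sketch of the capped log, written as pure functions over an array rather than against chrome.storage. The field names in ExperimentLog and the helper markLatestUndone are assumptions; only the logged quantities and the 200-entry cap come from the commit message.

```typescript
// Assumed field names for the quantities listed in the commit message.
interface ExperimentLog {
  variant: string;
  model: string;
  itemCount: number;
  latencyMs: number;
  groupCount: number;
  editCount?: number; // recorded at apply time
  undone?: boolean; // set by undo
}

const MAX_EXPERIMENT_LOGS = 200;

// Appends an entry and prunes the oldest entries beyond the cap.
function appendExperimentLog(logs: ExperimentLog[], entry: ExperimentLog): ExperimentLog[] {
  return [...logs, entry].slice(-MAX_EXPERIMENT_LOGS);
}

// Hypothetical helper: marks the most recent entry as undone without mutating the input.
function markLatestUndone(logs: ExperimentLog[]): ExperimentLog[] {
  if (logs.length === 0) return logs;
  const last = logs.length - 1;
  return logs.map((log, i) => (i === last ? { ...log, undone: true } : log));
}
```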
…ame UX
Users can now steer AI results with custom text guidance:
- Collapsible "Guidance" section in both tab and bookmark organizers
- Quick-preset chips: "By project", "By domain", "By activity", etc.
- Guidance persisted in settings, injected into AI prompts as USER GUIDANCE
- Debounced auto-save (500ms) to avoid excessive writes
Group/folder rename improvements:
- Tab GroupCard: pencil icon appears on hover to signal editability
- Bookmark folder cards: now support inline rename (was read-only)
- Edit input gets brand-400 focus border for better visual feedback
- New BookmarkFolderCard component with same UX pattern as GroupCard
Backend wiring:
- Settings gains tabGuidance/bookmarkGuidance fields
- OrganizeBookmarksOptions mirrors OrganizeTabsOptions pattern
- Prompt builders accept guidance in options, inject as USER GUIDANCE block
- All 4 providers + service worker thread guidance through
Root cause: bookmark organize was failing silently because max_tokens was hardcoded to 4096. For 200+ bookmarks, the output JSON (every bookmark ID assigned to a folder + reasoning) exceeds 4096 tokens, causing truncated JSON → parse failure → retry → same truncation → error after 2 attempts.
Fix:
- aiMaxTokens(itemCount): 2048 base + 20 tokens per item, capped at 16384
- Applied to all Claude, OpenAI, and OpenRouter calls
- Added diagnostic logging to bookmark organize path and OpenAI calls
- Console now shows: item count, prompt size, max_tokens, timeout
Scaling examples:
- 20 items → 2448 max_tokens
- 100 items → 4048 max_tokens
- 500 items → 12048 max_tokens
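The scaling formula above, sketched as a pure function. The function name and constants are from the commit message; exposing the cap as a second parameter is an assumption, made because the next commit lowers the cap to 8192.

```typescript
// 2048 base + 20 tokens per item, capped. The cap parameter is an assumption
// so the same sketch covers the later reduction from 16384 to 8192.
function aiMaxTokens(itemCount: number, cap: number = 16384): number {
  const items = Math.max(0, itemCount);
  return Math.min(2048 + 20 * items, cap);
}
```

This reproduces the scaling examples in the commit (2448, 4048, and 12048 for 20, 100, and 500 items), and with the default cap the ceiling is first hit at 717 items.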
Some models via OpenRouter (including Claude Sonnet) cap output tokens lower than 16384. Reducing cap to 8192 for broad compatibility.
2,410 bookmarks in a single 304K char prompt was never going to work. Now splits into batches of 100 bookmarks when the collection exceeds 100 items. Each batch is processed independently and results are merged by folder name (case-insensitive consolidation).
For the user's 2,410 bookmarks: 25 batches of ~100, each with a ~13K char prompt that any model can handle. Progress shows "Batch 1/25: analyzing 100 bookmarks..." per batch.
- Collections <= 100: single call (unchanged behavior)
- Collections > 100: batched with per-batch progress
- Failed batches logged but don't abort the whole operation
- Folder names consolidated across batches (title-cased)
- Duplicates and reasoning merged across all batches
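A sketch of the split-and-merge step, assuming hypothetical helper names (chunk, mergeFolders) and a simplified FolderSuggestion shape. Folders are merged on a lowercased key; this sketch keeps the first-seen casing, which is what a later commit in this list switches to, rather than the title-casing described here.

```typescript
// Simplified, assumed shape of one AI folder suggestion.
interface FolderSuggestion {
  name: string;
  bookmarkIds: string[];
}

// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Merges per-batch results by case-insensitive folder name.
function mergeFolders(batchResults: FolderSuggestion[][]): FolderSuggestion[] {
  const byKey = new Map<string, FolderSuggestion>();
  for (const folders of batchResults) {
    for (const folder of folders) {
      const key = folder.name.trim().toLowerCase();
      const existing = byKey.get(key);
      if (existing) {
        existing.bookmarkIds.push(...folder.bookmarkIds);
      } else {
        // Copy so the caller's batch results are not mutated.
        byKey.set(key, { name: folder.name.trim(), bookmarkIds: [...folder.bookmarkIds] });
      }
    }
  }
  return Array.from(byKey.values());
}
```

Splitting 2,410 items at a batch size of 100 yields 25 batches, the last one holding 10 items, consistent with the figure in the commit.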
Pages were conditionally rendered ({page === "tabs" && <TabOrganizer />})
which unmounted the component on tab switch — killing port connections,
losing loading state, and resetting analysis progress.
Now all pages are always mounted but hidden via CSS (className="hidden").
Active page gets animate-fade-in. This means:
- Running analyses continue when switching to Settings and back
- Preview state survives tab switches without session storage hacks
- Port connections for long-running operations stay alive
- Loading spinners and progress indicators persist
Bookmark auto-snapshots weren't appearing because bookmark organize was failing before reaching the apply step. Now that batching works, auto-snapshots should fire normally. Added console log to confirm when the snapshot is created.
Wrap session snapshot and auto-snapshot writes in try/catch with console.error — if chrome.storage fails (quota, serialization), we'll see the actual error instead of silent failure. Logs: move/folder/removal counts, snapshot entry count, and explicit success/failure messages for both session and persistent snapshots.
TabSnapshot only stored {id, groupId, windowId} — no group metadata.
When restoring, chrome.tabs.group() creates new unnamed groups because
the old groupId is gone. Groups were structurally correct (right tabs
together) but had no titles or colors.
Fix:
- Add TabGroupSnapshot type: { groupId, title, color }
- Add optional tabGroups[] to Snapshot interface (backward compatible)
- Capture live group metadata at snapshot time (auto + manual)
- Store group metadata in session for undo
- Restore: after chrome.tabs.group(), call chrome.tabGroups.update()
with saved title + color
- Both undo and full restore now recover group names and colors
Bookmark organization now creates nested folder structures. The AI returns path-based names like "Dev/Frontend" and the apply handler creates the folder hierarchy automatically.
Prompt changes:
- Rules allow "/" for hierarchy, max 3 levels deep
- Granularity scales nesting: level 1 = flat only, level 5 = deep
- Few-shot examples demonstrate nested paths
- Schema comment documents "/" convention
Service worker:
- New createFolderPath() splits on "/", creates intermediate folders with a cache to avoid duplicates (e.g., "Dev" created once for both "Dev/Frontend" and "Dev/Backend")
- Batch merge preserves first-seen casing instead of broken title-case
- Fixed dead nameMap variable that was never populated
Parser:
- Reject leading/trailing slashes and double slashes in folder names
UI:
- BookmarkFolderCard renders hierarchy: "Dev / Frontend" with parent segments muted and leaf segment bright
- Folders sorted alphabetically so shared parents appear adjacent
- "Nested" guidance preset added, "Flat" preset updated to explicitly suppress "/" nesting
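A sketch of the cached path creation described under "Service worker". The real createFolderPath() signature is not shown in this log; here the folder-creating call is injected as a hypothetical createFolder function (standing in for the bookmarks API) so the caching logic can be shown on its own.

```typescript
// Hypothetical injected dependency: creates one folder under parentId, resolves to its id.
type CreateFolder = (parentId: string, title: string) => Promise<string>;

// Walks "Dev/Frontend" segment by segment, creating only folders not yet in the cache.
// The cache maps a path prefix ("Dev", "Dev/Frontend") to the created folder id and is
// assumed to be scoped to a single root and a single apply run.
async function createFolderPath(
  path: string,
  rootId: string,
  createFolder: CreateFolder,
  cache: Map<string, string>,
): Promise<string> {
  let parentId = rootId;
  let prefix = "";
  for (const segment of path.split("/")) {
    prefix = prefix ? `${prefix}/${segment}` : segment;
    let id = cache.get(prefix);
    if (id === undefined) {
      id = await createFolder(parentId, segment);
      cache.set(prefix, id);
    }
    parentId = id;
  }
  return parentId; // id of the leaf folder
}
```

Sharing one cache across calls is what makes "Dev" get created once for both "Dev/Frontend" and "Dev/Backend".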
Adds a .refine() check that rejects folder paths with more than 3 segments (e.g., "A/B/C/D" fails). If the AI returns deeper nesting, Zod validation fails and triggers a retry with the error context.
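A dependency-free sketch of the path rules from this and the preceding commit (no leading, trailing, or double slashes; at most 3 segments). The commits implement these inside the Zod schema via .refine(); the standalone function below, including its name and error strings, is only an illustration of the same checks.

```typescript
// Hypothetical standalone validator; returns an error message, or null if the path is valid.
function folderPathError(name: string): string | null {
  const trimmed = name.trim();
  if (trimmed.length === 0) return "folder name is empty";
  if (trimmed.startsWith("/") || trimmed.endsWith("/")) return "leading or trailing slash";
  if (trimmed.includes("//")) return "double slash";
  if (trimmed.split("/").length > 3) return "more than 3 levels deep";
  return null;
}
```

Returning a message rather than a boolean mirrors the retry flow described earlier in the log, where the validation error is passed back to the model as context.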
Pinned tabs were always silently excluded with no user control. Now a toggle in the Tab Organizer idle state lets users choose:
- OFF (default): pinned tabs excluded, same as before
- ON: pinned tabs included in AI organization
Design:
- Compact toggle row with iOS-style switch (brand-400 when active)
- Pin icon on the right, label changes color with state
- Border highlights when active (brand-400/30)
- Setting persisted in chrome.storage.local
- Loaded on mount, saved immediately on toggle
Wiring:
- includePinnedTabs added to Settings type
- Passed through organize-tabs message payload
- Service worker conditionally filters based on the flag
The secure tier now has functional bookmark organization instead of returning empty results. Same pattern as the existing tab rule-based fallback.
Bookmark organization (secure tier):
- groupBookmarksByDomain(): groups bookmarks by URL domain
- ruleBasedBookmarkOrganize(): creates folder suggestions from domain groups with granularity-aware minimum group size
- Ungrouped bookmarks collected into a "Misc" folder
- Duplicate detection included (already worked, now surfaced)
- Folders sorted largest-first
Bookmark location suggestions (secure tier):
- suggestFolderByDomain(): matches bookmark's domain against existing folder names/paths, returns up to 3 suggestions with confidence
- Replaces the previous hard error with actual suggestions
Before: "Local AI not yet available. Showing current bookmarks."
After: Actual folder suggestions grouped by domain, ready to apply.
Rule-based tab organization now uses a two-pass strategy:
Pass 1: Domain grouping (existing behavior)
- github.com tabs → "Github" group
- youtube.com tabs → "Youtube" group
Pass 2: Keyword themes for ungrouped tabs
- Tabs not captured by domain groups get keyword-extracted
- Common title words (3+ chars, stop words filtered) that appear
in 2+ tabs form theme groups
- Example: "React Docs" + "React Hooks — SO" → "React" group
even though they're on different domains
Strategy: domain groups first (high confidence), then keyword themes
for leftovers (greedy, largest keyword clusters first). Tabs are never
assigned to both — domain takes priority.
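The two-pass strategy can be sketched as follows. All names here (Tab, Group, ruleBasedOrganize, the stop-word list, the minimum of 2 tabs per domain group, and naming a domain group after the first hostname label) are assumptions for illustration; only the overall strategy comes from the commit message.

```typescript
interface Tab { id: number; url: string; title: string; }
interface Group { name: string; tabIds: number[]; }

// Illustrative stop-word list; the real list is not shown in the commit.
const STOP_WORDS = new Set(["the", "and", "for", "with", "how", "you"]);

function hostnameOf(url: string): string {
  return url.replace(/^[a-z]+:\/\//i, "").split("/")[0].replace(/^www\./, "");
}

function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Unique title words of 3+ characters, stop words filtered.
function titleWords(title: string): string[] {
  const words = title.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return Array.from(new Set(words.filter((w) => w.length >= 3 && !STOP_WORDS.has(w))));
}

function ruleBasedOrganize(tabs: Tab[]): Group[] {
  const groups: Group[] = [];
  const assigned = new Set<number>();

  // Pass 1: domain grouping (assumed minimum of 2 tabs per group).
  const byDomain = new Map<string, number[]>();
  for (const tab of tabs) {
    const host = hostnameOf(tab.url);
    byDomain.set(host, [...(byDomain.get(host) ?? []), tab.id]);
  }
  byDomain.forEach((ids, host) => {
    if (ids.length < 2) return;
    groups.push({ name: capitalize(host.split(".")[0]), tabIds: ids });
    ids.forEach((id) => assigned.add(id));
  });

  // Pass 2: greedy keyword themes over the leftovers, largest cluster first.
  let leftovers = tabs.filter((t) => !assigned.has(t.id));
  for (;;) {
    const byWord = new Map<string, number[]>();
    for (const tab of leftovers) {
      for (const word of titleWords(tab.title)) {
        byWord.set(word, [...(byWord.get(word) ?? []), tab.id]);
      }
    }
    const candidates = Array.from(byWord.entries())
      .filter((entry) => entry[1].length >= 2)
      .sort((a, b) => b[1].length - a[1].length);
    if (candidates.length === 0) break;
    const [word, ids] = candidates[0];
    groups.push({ name: capitalize(word), tabIds: ids });
    leftovers = leftovers.filter((t) => !ids.includes(t.id));
  }
  return groups;
}
```

Because pass 2 only ever sees tabs left over from pass 1, a tab cannot land in both a domain group and a keyword group.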
The secure tier now supports three local AI engines:
1. Rule-based (default): domain + keyword grouping, no AI model needed
2. Ollama: connects to local Ollama instance via OpenAI-compatible API
   - Configurable server URL and model (defaults: localhost:11434, llama3.2)
   - Diagnostic logging with timing
   - Uses relaxed-tier data (titles only, no URLs)
3. Chrome AI: uses Chrome's built-in Gemini Nano via the Prompt API
   - Available in Chrome 138+ extensions
   - No install, no network, fully on-device
   - Checks LanguageModel.availability() before use
   - Session created per request and destroyed after
Architecture:
- New OllamaProvider and ChromeAIProvider classes
- LocalAIProvider type: "rule-based" | "ollama" | "chrome-ai"
- Settings: localAIProvider, ollamaUrl, ollamaModel
- Provider factory routes secure tier based on localAIProvider setting
- Service worker checks localAIProvider before falling back to rule-based
Settings UI:
- 3-way segmented control: Rules / Ollama / Chrome AI
- Ollama: server URL input + model selector (Llama 3.2, Mistral, Gemma 2, Phi-3, custom)
- Chrome AI: description text (no config needed)
- Contextual descriptions for each engine
…ublishing
Add complete release automation pipeline for the Chrome extension:
- Release Please manifest-based config with version sync across package.json and manifest.json (bump-minor-pre-major enabled)
- CI workflow with lint, typecheck, test, and build quality gates
- Release Please workflow with auto-merge via branch protection
- Release publish workflow chained via workflow_run with all 5 implementation guardrails (success filter, explicit ref:main, early-exit, explicit permissions, visible auto-merge failure)
- Extension zip packaging script with dist pre-check and cleanup
- Chrome Web Store draft upload job (Phase 2, gated on CWS config)
Merge origin/main into feature/release-infra, keeping the feature branch's more complete implementations while incorporating main's monorepo restructure and additional features.
- Remove ESLint, add Biome v2 with recommended rules tuned for Chrome extension context (a11y rules as warnings)
- Add lefthook with pre-commit hook that auto-fixes staged files
- Auto-fix import ordering, formatting across all extension source