Design browser-triggered rebuilds as a single out-of-process rebuild supervisor

## Context

We are choosing the simpler v1 contract now:

- The browser can edit settings.
- Runtime-only settings apply immediately.
- Rebuild-requiring settings are saved, but the user runs the rebuild from the CLI.
- The CLI remains the source of truth for setup, ingestion, and rebuilds.

That is the right boundary for the current homelab release. It keeps the app predictable and avoids turning the web process into an operational control plane.

At the same time, browser-triggered rebuilds are still a reasonable future convenience feature if we build them with a much tighter contract than the current in-process job approach.

## Problem

The current in-process rebuild model is the wrong foundation for long-running rebuilds:

- rebuild execution is tied to the Gunicorn worker lifecycle
- job state lives in process memory
- config reloads and rebuild execution can interfere with each other
- page refresh/navigation can lose rebuild state in the UI
- long rebuilds interact poorly with worker timeouts and restarts
- failure details are hard to surface cleanly and durably

This issue is **not** asking for a generic background job system. That would add too much complexity for a single-user homelab app.

## Goal

If we reintroduce browser-triggered rebuilds later, build them as **one special rebuild supervisor** with a narrow contract:

- only one rebuild may run at a time
- rebuild runs in a separate process, not inside the web worker/thread model
- rebuild state is durable across page refreshes
- queries continue using the last completed profile/index while rebuild runs
- rebuilt artifacts become active only after a successful rebuild
- failed rebuilds leave the previous active artifacts unchanged
- rebuilds are always explicit; saving settings must not auto-start a rebuild

## Proposed Product Contract

### Allowed while rebuild runs

- users may continue to browse and query recommendations
- queries use the **last completed active snapshot** of the watch index and taste profile
- settings may still be edited, but the UI must make clear when those edits mean another rebuild will be required after the current one finishes

### Not allowed while rebuild runs

- starting a second rebuild
- overwriting active artifacts mid-run
- treating an in-progress rebuild as partially active

### User-visible states

- idle
- rebuild required
- rebuilding
- rebuild succeeded
- rebuild failed

The UI should remain accurate across refreshes and navigation.
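As a minimal sketch, the states above could be modelled as a string-valued enum so the same values can be written to disk and rendered in the UI. The names `RebuildStatus` and `can_start_rebuild` are hypothetical and do not come from the existing codebase.

```python
from enum import Enum


class RebuildStatus(str, Enum):
    """User-visible rebuild states (hypothetical names, values from the list above)."""

    IDLE = "idle"
    REBUILD_REQUIRED = "rebuild required"
    REBUILDING = "rebuilding"
    SUCCEEDED = "rebuild succeeded"
    FAILED = "rebuild failed"


def can_start_rebuild(status: RebuildStatus) -> bool:
    # The contract forbids starting a second rebuild while one is running.
    return status is not RebuildStatus.REBUILDING
```

Because the enum subclasses `str`, its members serialize to plain strings with `json.dumps` and can be parsed back with `RebuildStatus("rebuilding")`.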

## Minimal Technical Design

### 1. Replace in-process rebuild execution

Use `subprocess.Popen(...)` to launch the CLI rebuild command from the web app instead of calling `run_setup()` inside the web process.

Examples:

- profile-only rebuild: `./recommend setup --refresh-profile`
- data + profile rebuild: `./recommend setup --refresh-data`

The subprocess should write stdout/stderr to a dedicated rebuild log file.
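A rough sketch of the launch step, assuming a POSIX host. The function name `launch_rebuild`, the `REBUILD_COMMANDS` mapping, and the log location are illustrative assumptions; only the two CLI invocations come from the examples above.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from rebuild mode to the CLI invocations listed above.
REBUILD_COMMANDS = {
    "profile": ["./recommend", "setup", "--refresh-profile"],
    "data": ["./recommend", "setup", "--refresh-data"],
}


def launch_rebuild(command: list[str], log_path: Path) -> subprocess.Popen:
    """Start the rebuild CLI as a child process and send its output to a log file."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The child gets its own copy of the file descriptor, so the parent can
    # close its handle as soon as Popen returns.
    with open(log_path, "wb") as log_file:
        return subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # interleave stderr into the same log
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only: detach from the worker's session
        )
```

A caller would pass something like `REBUILD_COMMANDS["profile"]` and a path under the app's local state directory. The returned `Popen` handle belongs to the worker that launched it, so recording the exit code after a worker restart needs a separate mechanism that this sketch does not cover.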

### 2. Persist rebuild state to disk

Store rebuild state in a small JSON file under app-managed local state.

Suggested fields:

- `status`
- `mode` (`profile` or `data`)
- `pid`
- `started_at`
- `finished_at`
- `exit_code`
- `log_path`
- `error_summary`
- `requested_from_config_generation` or equivalent stale marker if useful

This state file becomes the source of truth for the UI and survives page refreshes.
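One way the state file could be read and written is sketched below. It assumes a single JSON document whose keys match the suggested fields; the names `RebuildState`, `write_state`, and `read_state`, and the use of ISO 8601 strings for timestamps, are assumptions. Writing to a temporary file and renaming it over the target means a reader should never observe a half-written file.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class RebuildState:
    status: str = "idle"
    mode: Optional[str] = None         # "profile" or "data"
    pid: Optional[int] = None
    started_at: Optional[str] = None   # ISO 8601 string (an assumption)
    finished_at: Optional[str] = None
    exit_code: Optional[int] = None
    log_path: Optional[str] = None
    error_summary: Optional[str] = None


def write_state(path: Path, state: RebuildState) -> None:
    """Write to a temp file in the same directory, then rename it over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(asdict(state), tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # do not leave a stray temp file behind
        raise


def read_state(path: Path) -> RebuildState:
    """Return the persisted state, or an idle default if no file exists yet."""
    try:
        return RebuildState(**json.loads(path.read_text(encoding="utf-8")))
    except FileNotFoundError:
        return RebuildState()
```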

### 3. Enforce single rebuild semantics

Use a lock file or equivalent process-level guard.

Behavior:

- if a rebuild is already running, `POST /rebuild` returns the existing rebuild state instead of launching another one
- the UI should show that a rebuild is already in progress
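A lock file guard could look roughly like the following. The function names are hypothetical, and the sketch deliberately ignores stale locks left behind by a crashed rebuild, which a real implementation would need to detect.

```python
import os
from pathlib import Path


def try_acquire_lock(lock_path: Path) -> bool:
    """Return True if this caller created the lock file, False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails atomically when the file is already present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(str(os.getpid()))  # record the owner for operator inspection
    return True


def release_lock(lock_path: Path) -> None:
    """Remove the lock file; a missing file is not an error."""
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass
```

With this shape, the `POST /rebuild` handler would launch a rebuild only when `try_acquire_lock` returns `True` and otherwise respond with the state already on disk.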

### 4. Keep runtime reads on the previous active snapshot

Do not mutate the active recommendation context while the rebuild is in progress.

Only after successful completion:

- invalidate the cached runtime context
- reload artifacts on the next request

Failure must leave the old active context untouched.
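The invalidate-on-success rule might be sketched as a lazily reloading cache. `ActiveSnapshot`, `on_rebuild_finished`, and the `loader` callable are hypothetical names standing in for whatever currently builds the runtime context. Note that this cache is per process, so with several Gunicorn workers each worker would need to learn about completion on its own, for example from the durable state.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ActiveSnapshot(Generic[T]):
    """Caches the loaded artifacts and reloads them only after invalidate()."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._loaded = False

    def get(self) -> T:
        with self._lock:
            if not self._loaded:
                # Reload happens on the next request, not at invalidation time.
                self._value = self._loader()
                self._loaded = True
            return self._value

    def invalidate(self) -> None:
        with self._lock:
            self._loaded = False


def on_rebuild_finished(exit_code: int, snapshot: ActiveSnapshot) -> None:
    # Only a successful rebuild may change what queries see;
    # a failure leaves the cached context untouched.
    if exit_code == 0:
        snapshot.invalidate()
```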

### 5. Poll durable state, not in-memory job objects

The rebuild status endpoint should read the rebuild state file and render the current rebuild state from durable data.

This must survive:

- page refresh
- navigating away and back
- worker restart
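A stateless status read could be as small as the sketch below, which any worker can serve because it holds nothing in memory. The `stale` flag is an addition of this sketch, not part of the proposal: it marks a `rebuilding` record whose process no longer exists, leaving the policy decision to the caller. The liveness probe is POSIX-only and can be fooled by PID reuse or unreaped child processes.

```python
import json
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def current_rebuild_state(state_path: Path) -> dict:
    """Read the durable state and flag a 'rebuilding' record with no live process."""
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {"status": "idle", "stale": False}
    pid = state.get("pid")
    rebuilding = state.get("status") == "rebuilding"
    # A missing pid is treated as stale; never probe with a non-positive pid.
    state["stale"] = rebuilding and (
        not isinstance(pid, int) or pid <= 0 or not pid_is_alive(pid)
    )
    return state
```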

### 6. Surface meaningful failure output

The UI should show a concise failure summary derived from the rebuild subprocess output, not just `Process exited with status 1`.

The full log can remain in a file for operator inspection.
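One simple heuristic for the summary, offered as an assumption rather than a requirement, is to take the last non-empty line of the rebuild log, since a Python traceback ends with the exception type and message. `summarize_failure` and its `max_chars` parameter are hypothetical.

```python
from pathlib import Path


def summarize_failure(log_path: Path, exit_code: int, max_chars: int = 300) -> str:
    """Return the last non-empty log line, falling back to the exit status."""
    fallback = f"Process exited with status {exit_code}"
    try:
        lines = log_path.read_text(encoding="utf-8", errors="replace").splitlines()
    except FileNotFoundError:
        return fallback
    for line in reversed(lines):
        text = line.strip()
        if text:
            return text[:max_chars]  # keep the UI message short
    return fallback
```

The result would be stored in `error_summary`, while the untruncated log stays at `log_path`.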

## Suggested Implementation Steps

1. Remove the current in-process rebuild path from the web layer.
2. Add a small rebuild-state module responsible for:
   - state file read/write
   - lock acquisition/release
   - process metadata
3. Implement rebuild process launch via subprocess.
4. Change rebuild polling to read the durable state file instead of in-memory job objects.
5. Invalidate runtime context only after successful completion.
6. Preserve and display concise failure messages.
7. Add tests for the rebuild supervisor contract.

## Acceptance Criteria

- `POST /rebuild` starts at most one rebuild.
- Refreshing `/settings` during rebuild still shows correct progress.
- Query requests continue working during rebuild and use the last completed active data.
- A failed rebuild does not break queries or replace active artifacts.
- Artifacts from a successful rebuild become active on the next request after completion.
- Saving rebuild-requiring settings never auto-starts rebuild work.
- The UI shows a meaningful failure reason when rebuild fails.

## Explicit Non-Goals

Do **not** turn this into a general-purpose job framework.

Out of scope:

- multiple concurrent job types
- cancellation
- retries
- queueing multiple rebuilds
- resumable jobs after reboot
- real-time log streaming
- generalized worker orchestration

## Why this scope is correct

This keeps the convenience of browser-triggered rebuilds without drifting into SaaS-style control-plane complexity. It fits the product philosophy: single-user, explicit operations, predictable behavior, and low operational burden.
