Context
We are choosing the simpler v1 contract now:
- The browser can edit settings.
- Runtime-only settings apply immediately.
- Rebuild-requiring settings are saved, but the user runs the rebuild from the CLI.
- The CLI remains the source of truth for setup, ingestion, and rebuilds.
That is the right boundary for the current homelab release. It keeps the app predictable and avoids turning the web process into an operational control plane.
At the same time, browser-triggered rebuilds are still a reasonable future convenience feature if we build them with a much tighter contract than the current in-process job approach.
Problem
The current in-process rebuild model is the wrong foundation for long-running rebuilds:
- rebuild execution is tied to the Gunicorn worker lifecycle
- job state lives in process memory
- config reloads and rebuild execution can interfere with each other
- page refresh/navigation can lose rebuild state in the UI
- long rebuilds interact poorly with worker timeouts and restarts
- failure details are hard to surface cleanly and durably
This issue is not asking for a generic background job system. That would add too much complexity for a single-user homelab app.
Goal
If we reintroduce browser-triggered rebuilds later, build them as one special rebuild supervisor with a narrow contract:
- only one rebuild may run at a time
- rebuild runs in a separate process, not inside the web worker/thread model
- rebuild state is durable across page refreshes
- queries continue using the last completed profile/index while rebuild runs
- rebuilt artifacts become active only after a successful rebuild
- failed rebuilds leave the previous active artifacts unchanged
- rebuilds are always explicit; saving settings must not auto-start a rebuild
Proposed Product Contract
Allowed while rebuild runs
- users may continue to browse and query recommendations
- queries use the last completed active snapshot of the watch index and taste profile
- settings may still be edited, but the UI must make it clear whether a newer rebuild is now required after the current one finishes
Not allowed while rebuild runs
- starting a second rebuild
- overwriting active artifacts mid-run
- treating an in-progress rebuild as partially active
User-visible states
- idle
- rebuild required
- rebuilding
- rebuild succeeded
- rebuild failed
The UI should remain accurate across refreshes and navigation.
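A minimal sketch of how the five states above could be derived from a persisted rebuild record. The function name `ui_state`, the status strings, and the idea of comparing a stored config generation against the current one are assumptions for illustration, not the existing implementation.

```python
from enum import Enum


class RebuildUiState(Enum):
    IDLE = "idle"
    REBUILD_REQUIRED = "rebuild required"
    REBUILDING = "rebuilding"
    SUCCEEDED = "rebuild succeeded"
    FAILED = "rebuild failed"


def ui_state(state, config_generation):
    """Map a persisted rebuild record (a dict) to one user-visible state.

    `state` is the durable record; `config_generation` is a hypothetical
    counter that increases whenever a rebuild-requiring setting is saved.
    """
    status = state.get("status", "idle")
    if status == "rebuilding":
        return RebuildUiState.REBUILDING
    if status == "failed":
        return RebuildUiState.FAILED
    built_from = state.get("requested_from_config_generation")
    # Settings changed after the last rebuild was requested: artifacts are stale.
    if built_from is not None and built_from != config_generation:
        return RebuildUiState.REBUILD_REQUIRED
    if status == "succeeded":
        return RebuildUiState.SUCCEEDED
    return RebuildUiState.IDLE
```

Because the result depends only on durable data, a page refresh or a different worker produces the same answer.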
Minimal Technical Design
1. Replace in-process rebuild execution
Use subprocess.Popen(...) to launch the CLI rebuild command from the web app instead of calling run_setup() inside the web process.
Examples:
- profile-only rebuild: ./recommend setup --refresh-profile
- data + profile rebuild: ./recommend setup --refresh-data
The subprocess should write stdout/stderr to a dedicated rebuild log file.
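The launch step might look roughly like the sketch below. The helper name `launch_rebuild` and the `REBUILD_COMMANDS` mapping are hypothetical; the two command lines are the ones listed above. `start_new_session=True` is a POSIX-only option that puts the child in its own session so it is not tied to the worker's process group.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping from rebuild mode to the CLI invocations named above.
REBUILD_COMMANDS = {
    "profile": ["./recommend", "setup", "--refresh-profile"],
    "data": ["./recommend", "setup", "--refresh-data"],
}


def launch_rebuild(command, log_path):
    """Start `command` as a separate process with stdout/stderr sent to log_path."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The child inherits its own copy of the file descriptor, so the parent
    # can close its handle as soon as Popen returns.
    with open(log_path, "ab") as log_file:
        return subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # interleave stderr into the same log
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only: detach from the worker's session
        )
```

A caller would use something like `launch_rebuild(REBUILD_COMMANDS["profile"], log_path)` and record the returned `pid`. How the exit code gets recorded if the launching worker is restarted before the rebuild finishes is left open here.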
2. Persist rebuild state to disk
Store rebuild state in a small JSON file under app-managed local state.
Suggested fields:
- status
- mode (profile or data)
- pid
- started_at
- finished_at
- exit_code
- log_path
- error_summary
- requested_from_config_generation or equivalent stale marker if useful
This state file becomes the source of truth for the UI and survives page refreshes.
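One way to keep the state file readable at all times is to write a temporary file and rename it over the real one, so a reader never sees a half-written document. This is a sketch under that assumption; `write_state` and `read_state` are hypothetical names, and treating a missing file as idle is an illustrative choice.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path, state):
    """Atomically replace the JSON state file at `path` with `state`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic rename over the old file
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def read_state(path):
    """Return the persisted state, or an idle record if none has been written."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"status": "idle"}
```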
3. Enforce single rebuild semantics
Use a lock file or equivalent process-level guard.
Behavior:
- if a rebuild is already running, POST /rebuild returns the existing rebuild state instead of launching another one
- the UI should show that a rebuild is already in progress
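The guard could be as small as an exclusively created lock file, as in the sketch below. All names are hypothetical, and `current_state` and `start_rebuild` are caller-supplied callables standing in for the state-file read and the subprocess launch. Stale-lock recovery after a crash is deliberately left out and would need a decision of its own.

```python
import os


def try_acquire_lock(lock_path):
    """Return True if this caller created the lock file, False if it already exists."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one caller can win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_lock(lock_path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass


def handle_rebuild_request(lock_path, current_state, start_rebuild):
    """Return (state, started): start a rebuild only if none holds the lock."""
    if not try_acquire_lock(lock_path):
        return current_state(), False
    try:
        return start_rebuild(), True
    except BaseException:
        # The launch failed, so nothing is running: give the lock back.
        release_lock(lock_path)
        raise
```

In this sketch the lock outlives the request on purpose; whatever records the final rebuild status would also call `release_lock`.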
4. Keep runtime reads on the previous active snapshot
Do not mutate the active recommendation context while the rebuild is in progress.
Only after successful completion:
- invalidate the cached runtime context
- reload artifacts on the next request
Failure must leave the old active context untouched.
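A sketch of how the cached context could follow these rules by keying the cache on the last successful rebuild recorded in the state file. `RuntimeContextCache`, the `load_context` callable, and the `"succeeded"` status string are assumptions for illustration. Since each worker compares against durable state on every request, no cross-worker signalling is needed.

```python
import threading


class RuntimeContextCache:
    """Cache the active context and reload it only after a successful rebuild."""

    def __init__(self, load_context):
        self._load_context = load_context  # hypothetical loader for the active artifacts
        self._context = None
        self._loaded_for = None  # finished_at of the successful rebuild we reflect
        self._lock = threading.Lock()

    def get(self, state):
        """Return the context to serve, given the current persisted rebuild state."""
        with self._lock:
            if state.get("status") == "succeeded":
                marker = state.get("finished_at")
            else:
                # Running, failed, or idle: keep whatever is currently active.
                marker = self._loaded_for
            if self._context is None or marker != self._loaded_for:
                self._context = self._load_context()
                self._loaded_for = marker
            return self._context
```

This covers only the in-memory side; it assumes the rebuild does not overwrite the active artifact files while it is running.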
5. Poll durable state, not in-memory job objects
The rebuild status endpoint should read the rebuild state file and render the current rebuild state from durable data.
This must survive:
- page refresh
- navigating away and back
- worker restart
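A status read that survives these cases could be a pure function of the state file, with one reconciliation step for a rebuild that died without reporting. This is a sketch that assumes the rebuild process (or a thin wrapper) records its final status before exiting. The function names are hypothetical, and the liveness probe relies on POSIX `os.kill(pid, 0)` semantics, so it should not be used on Windows.

```python
import json
import os


def pid_alive(pid):
    """POSIX-only probe: signal 0 checks that a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def current_rebuild_status(state_path):
    """Build the status response from durable state only, never from in-memory jobs."""
    try:
        with open(state_path, encoding="utf-8") as f:
            state = json.load(f)
    except FileNotFoundError:
        return {"status": "idle"}
    if state.get("status") == "rebuilding" and not pid_alive(state["pid"]):
        # The record says running but the process is gone: report a failure
        # instead of showing "rebuilding" forever.
        state = dict(
            state,
            status="failed",
            error_summary="Rebuild process is no longer running.",
        )
    return state
```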
6. Surface meaningful failure output
The UI should show a concise failure summary derived from the rebuild subprocess output, not just "Process exited with status 1".
The full log can remain in a file for operator inspection.
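A simple heuristic for the summary is to take the last non-empty line of the rebuild log, which for an uncaught Python exception is the exception type and message. The sketch below assumes that heuristic is good enough; `summarize_failure` and the `max_chars` limit are hypothetical, and it reads the whole log into memory, which may need revisiting for very large logs.

```python
from pathlib import Path


def summarize_failure(log_path, exit_code, max_chars=300):
    """Return a short, user-facing reason for a failed rebuild."""
    try:
        text = Path(log_path).read_text(encoding="utf-8", errors="replace")
    except FileNotFoundError:
        text = ""
    meaningful = [line.strip() for line in text.splitlines() if line.strip()]
    if not meaningful:
        # Nothing useful was logged, so fall back to the exit status.
        return f"Rebuild exited with status {exit_code} and produced no output."
    return meaningful[-1][:max_chars]
```

The returned string is what would be stored in `error_summary`; the log file itself stays untouched for operator inspection.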
Suggested Implementation Steps
- Remove the current in-process rebuild path from the web layer.
- Add a small rebuild-state module responsible for:
  - state file read/write
  - lock acquisition/release
  - process metadata
- Implement rebuild process launch via subprocess.
- Change the current rebuild polling to read durable state.
- Invalidate runtime context only after successful completion.
- Preserve and display concise failure messages.
- Add tests for the rebuild supervisor contract.
Acceptance Criteria
- POST /rebuild starts at most one rebuild.
- Refreshing /settings during rebuild still shows correct progress.
- Query requests continue working during rebuild and use the last completed active data.
- A failed rebuild does not break queries or replace active artifacts.
- A successful rebuild becomes active on the next request after completion.
- Saving rebuild-requiring settings never auto-starts rebuild work.
- The UI shows a meaningful failure reason when rebuild fails.
Explicit Non-Goals
Do not turn this into a general-purpose job framework.
Out of scope:
- multiple concurrent job types
- cancellation
- retries
- queueing multiple rebuilds
- resumable jobs after reboot
- real-time log streaming
- generalized worker orchestration
Why this scope is correct
This keeps the convenience of browser-triggered rebuilds without drifting into SaaS-style control-plane complexity. It fits the product philosophy: single-user, explicit operations, predictable behavior, and low operational burden.