feat(setup): progress bars, RichHandler logging, log-level cleanup, TMDB audit dump by abhichandra21 · Pull Request #52 · abhichandra21/streamline

abhichandra21 · 2026-05-13T02:36:54Z

Context

This is the same work that was in PR #47, which was merged then reverted (#50) because it bypassed the project's normal review flow. Reopening here for a proper review pass. Branch is identical to #47's HEAD at the time of merge.

Summary

Four user-facing improvements to ./recommend setup --refresh-data:

Rich progress bars on all three long steps — TMDB metadata fetch, enrichment, and taste profile build. Each shows count, percent, elapsed, and ETA. Enrichment additionally tracks a running cache-hit tally.
RichHandler-routed logging on a shared Console. Log lines render cleanly above an active progress bar instead of stomping on the spinner line. recommender.log, setup.py, and main.py all share the same Console instance.
Demoted noisy INFO logs to DEBUG. Temp-path traces (Netflix/Prime/AppleTV/Disney "reading "), LLM provider model echoes, ingestion-side per-platform count restatements. --debug and the rotating file log still capture everything. One-shot operational summaries (dedup counts, cache cleanup) stay at INFO.
Full TMDB-match audit dumped to file. _audit_cache_mismatches now writes the complete audit (unmatched + content-type/title/year/runtime mismatches + weak matches; no truncation) to recommender/cache/logs/tmdb_audit.txt. Console still truncates to 10 per category for readability.

⚠️ Known bug — must be fixed before merging

tests/test_main.py:1018, 1044, 1136, 1162, 1187 pass tmp_path as cache_dir but the new audit-dump code writes to config.TMDB_AUDIT_PATH unconditionally. Running pytest tests/ clobbers the real audit file with a near-empty one (this actually happened in the session that produced these commits — wiped a 400-line audit down to 15 lines).

Two fixes available:

Preferred: add an audit_output_path parameter to _audit_cache_mismatches (default config.TMDB_AUDIT_PATH); update the five tests to pass tmp_path / "audit.txt".
Alternative: monkeypatch config.TMDB_AUDIT_PATH in those five tests.

I'd recommend resolving this before any reviewer runs the test suite.

Out of scope

This PR does not address the TMDB matching quality problem itself (see #51). The audit dump in this PR makes that problem visible — it doesn't fix it.

Files

File	Why
`recommender/setup.py`	Progress bars for all three steps; audit dump to file
`recommender/enricher.py`	`on_progress` callback; `%d`→`%s` bug fix on None `tmdb_id`
`recommender/taste_profile_builder.py`	`on_batch_progress` callback
`recommender/log.py`	RichHandler on shared Console (stderr)
`recommender/main.py`	Use shared console
`recommender/ingestion/{netflix,prime,apple_tv,disney}.py`	INFO → DEBUG
`recommender/llm.py`	INFO → DEBUG (provider init lines)
`config.py`	`TMDB_AUDIT_PATH` constant
`tests/test_log.py`	`.out` → `.err` after the stderr move

Net: 13 files, +162 / −51.

Test plan

pytest tests/ passes (370 tests)
Fix the test-clobber bug above before reviewer runs the suite
Reviewer runs ./recommend setup --refresh-data and verifies progress bars + log rendering
Verifies audit file at recommender/cache/logs/tmdb_audit.txt contains all six sections in full

The stdout StreamHandler couldn't coordinate with rich.progress/status, so DEBUG/INFO log lines emitted during the enrichment step appended to the spinner's render line instead of breaking onto their own line. Switch the stream handler to RichHandler bound to a module-level Console(stderr=True) exported from recommender.log, and have setup.py and main.py import that same Console so Progress and the logger share live state. RichHandler pauses the active Progress while it writes, so log records render cleanly above the bar. Side effects: * Stream logs now go to stderr (matched what most CLIs do already). Three test_log.py capsys assertions updated from .out to .err. * Fixed a pre-existing %d-on-None TypeError in enricher.enrich() that was hidden by previous logger configuration ordering in the test suite.

Apply the same Progress widget to the other two long-running setup steps: * TMDB metadata fetch — was an indeterminate console.status spinner with a periodic "i/N processed" line. Replaced with an inline bar driven by the existing enumerate() index. * Taste profile build — was a spinner that gave no feedback for the full duration of a 10-batch LLM job. Added an on_batch_progress callback to taste_profile_builder.build() so setup can drive a bar that fills as each batch completes (cached batches advance the bar immediately). Factored the Progress construction into a small _progress_bar() helper so all three steps use the same column layout (spinner, label, bar, M/N, optional extra, elapsed, eta).

User-facing setup output was dominated by lines that just echo internal state — temp directories, extracted CSV paths, provider model IDs, raw row counts before transforms. Each ingestion module also re-announced the per-platform parse count immediately before setup's own "<platform>: ok N events" console line restated the same number more cleanly. Reclassified to DEBUG so a normal run shows only the meaningful human-readable summaries; --debug or the rotating file log still captures everything for diagnostics. Affected: * netflix/prime/disney "reading <temp path>" / "read N raw rows" * apple_tv "Extracting from", "Found", "N watch events parsed" * llm "Using <provider> (fast=..., reason=...)" for all three providers Left at INFO: watch_index dedup/cleanup counts (one-shot summaries with unique information not duplicated by console output).

_audit_cache_mismatches() previously printed the first 10 entries per category and dropped the rest on the floor. With several hundred mismatches it was impossible to triage them or to feed the full list to an LLM-driven override-generator. Now writes the complete audit (no truncation) to recommender/cache/logs/tmdb_audit.txt with one section per category: unmatched titles, content-type mismatches, title mismatches, year mismatches, runtime mismatches, and weak matches. Console output still truncates to 10 per category so the run summary stays readable. Added a new "unmatched titles (no TMDB ID)" section so titles that report_unmatched flags are also captured in the same file, giving a single triage surface.

Addresses two P1/P2 findings from review of #52: P1 — Tests could overwrite the real TMDB audit artifact. _audit_cache_mismatches() now accepts an optional audit_output_path that callers can override; default remains config.TMDB_AUDIT_PATH so the production caller is unchanged. The five direct tests in test_main.py pass tmp_path / "audit.txt". Belt-and-suspenders: a new tests/conftest.py autouse fixture also patches config.TMDB_AUDIT_PATH globally so any future test that runs the setup pipeline indirectly (e.g. via run_setup) cannot clobber the real artifact either. Without this, the audit-via-run_setup test in test_main.py was still mutating the real file even with the param fix applied to the direct callers. P2 — Stale findings could survive a clean rerun. The previous code only wrote the file when has_any was true. A clean refresh after a previous bad run would leave the prior audit text intact and misrepresent the current state. Always rewrite, with an explicit "No mismatches detected." marker when sections are empty.

Addresses P2 finding from review of #52: the rotating file log was hard-coded to INFO regardless of level_override, so messages demoted to DEBUG (provider init in llm.py, temp-path traces in the ingestion parsers) never made it to logs/app.log even when the operator ran with --debug. The PR description claimed otherwise; this commit makes the claim true. setup_logging now derives file_level from stream_level: DEBUG when explicitly requested, INFO otherwise (matching prior behavior for default and WARNING configurations). _configure_file_handler takes the target level as a parameter so the werkzeug logger stays at INFO (no DEBUG-level Werkzeug traffic worth persisting).

abhichandra21 · 2026-05-13T03:14:18Z

All three review findings addressed in 2 commits:

P1 (tests can overwrite the real audit artifact) — 773117d
- _audit_cache_mismatches() now takes an optional audit_output_path; the five direct tests pass tmp_path / "audit.txt".
- Added tests/conftest.py with an autouse fixture that patches config.TMDB_AUDIT_PATH globally. This was load-bearing: even with the param fix applied to the direct tests, the audit-via-run_setup test in test_main.py was still clobbering the real file because run_setup uses the default. Verified with cp /tmp/audit_before.txt → run full pytest → diff (audit file unchanged after suite).
P2 (stale findings survive a clean rerun) — 773117d
- Audit now rewrites unconditionally with an explicit No mismatches detected. marker when sections are empty.
P2 (DEBUG-demoted logs don't reach logs/app.log) — a4f02f9
- setup_logging now derives file handler level from level_override instead of hard-coding INFO. With --debug, both stream and file handlers capture DEBUG. Werkzeug stays at INFO (no DEBUG-level Werkzeug traffic worth persisting). The PR description claim is now accurate.

Full suite: 370 passed. No new tests for the conftest fixture itself; the proof is in the cross-suite diff check shown above.

abhichandra21 added 7 commits May 12, 2026 18:35

chore(enricher): drop per-title cache-hit debug log

a0337e1

The progress bar now shows the running cache-hit count, so the per-title log line is redundant. It also interleaves with the rich Progress/spinner render and breaks the terminal display under DEBUG logging.

Revert "chore(enricher): drop per-title cache-hit debug log"

bb85cc7

This reverts commit a0337e1.

abhichandra21 added the enhancement New feature or request label May 13, 2026

abhichandra21 added 2 commits May 12, 2026 22:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(setup): progress bars, RichHandler logging, log-level cleanup, TMDB audit dump#52

feat(setup): progress bars, RichHandler logging, log-level cleanup, TMDB audit dump#52
abhichandra21 wants to merge 9 commits into
masterfrom
feat/enrichment-progress-bar

abhichandra21 commented May 13, 2026

Uh oh!

abhichandra21 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

abhichandra21 commented May 13, 2026

Context

Summary

⚠️ Known bug — must be fixed before merging

Out of scope

Files

Test plan

Related

Uh oh!

abhichandra21 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant