feat: unify rerank/embed degrade-logging; default scaffold to Gitee embed+rerank#251
Conversation
… to Gitee embed+rerank Read-path resilience + observability, fail-fast invariants preserved: - transient query-embedding failure degrades the hybrid query to FTS-only (vec leg dropped) instead of 500; hybrid-only, single-leg vector/bm25 ablations re-raise (eval purity); permanent query-embed still 500s. - transient rerank-degrade + exhausted embed-batch-skip now log at ERROR (a configured leg that failed; was WARNING). In-progress retries stay WARNING. - enabled-but-unconfigured reranker logs one WARNING per retrieve; a write path that defers embedding (no embedder wired / version drift) warns instead of silently shipping a vector-less doc (ingest/wisdom/synth). - permanent provider errors (401/403/404, bad key/model) still fail fast on both read (->500) and write (ingest aborts) paths — invariant unchanged. default_config(): embedder -> Gitee bge-m3 (dim 1024) AND ships a Gitee BAAI/bge-reranker-v2-m3 reranker, both keyed by one GITEE_API_KEY, so a fresh dikw init reranks out of the box (OpenAI has no /rerank endpoint). LLM default stays Anthropic. Tests pin the historical openai identity + clear rerank via init_test_base so the suite is insulated from the shipping default. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gate synth deferred-embed warning Address codex review of the prior commit: - default_config(): set embedding_batch_size=16. The default embedder is now Gitee (bge-m3), and Gitee's /v1/embeddings caps the input array at 25 (HTTP 400 above) — the inherited OpenAI-tuned 64 would fail a fresh ingest out of the box (docs/providers.md gotcha #2). The 64 field default still applies to bases that omit the field. - api_synth: gate the "embedder wired but no active text version" warning on there being sources to synth, so a no-op synth on an empty base stays quiet (codex minor false-positive). - tests/fakes.init_test_base: pin embedding_batch_size=64 (historical test value) so the suite is insulated from the new Gitee scaffold default — the streaming tests string-replace `embedding_batch_size: 64` to force tiny batches. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…der default, synth drift guard, eval-purity flag - rerank_model: use the Gitee-served BARE `bge-reranker-v2-m3` (not the HuggingFace-prefixed `BAAI/...`, which 404s on Gitee → permanent error → 500 on every fresh-base retrieve). The eval-verified baseline + cookbook + wiring test all use the bare id. Fixed default + test + docs + the Gitee-citing validator/docstring examples. - DRY the two diverging provider defaults: `default_config()` (dikw init) now reuses `_default_provider_config()` (the DikwConfig.provider field factory), which is updated to the Gitee profile — one source of truth so a bare DikwConfig() and an init'd base no longer ship different embedders. - synth: resolve the active text version via the shared drift-aware `_resolve_active_text_version_for_inline_embed` (same as wisdom / lint apply) instead of a raw get_active_embed_version, so a cfg-identity drift defers inline embed instead of embedding under the wrong model/space (which would deactivate the freshly-synthesized page). Honors the documented invariant. - eval purity: add `degrade_query_embed_on_transient` (default True for the production read path); eval/runner passes False so a transient query-embed blip fails the eval loud rather than silently degrading a hybrid query and biasing the --against gate. - docs: quickstart `export OPENAI_API_KEY` → `GITEE_API_KEY`; README config block → Gitee, matching the new scaffold. Tests: +test_default_provider_factory_matches_scaffold, +test_synth_defers_inline_embed_on_version_drift, +test_query_embed_transient_propagates_when_degrade_disabled. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 41 minutes and 14 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughSwitches the default ChangesGitee defaults, read/write-path resilience, and observability
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/dikw_core/api_synth.py`:
- Around line 144-158: The warning in api_synth.py is gated on sources, but
sources still includes items skipped by the already logic, so it can fire even
when no synth work happens. Move the warning gate to the post-skip candidate set
used by the synth path, or trigger it only after a source actually enters the
rewrite flow in the same function where embed_deferred and already are
evaluated. Keep the message and logger.warning call, but ensure it reflects only
documents that will באמת be synthesized without inline embeddings.
In `@src/dikw_core/domains/info/search.py`:
- Around line 460-463: The fallback log in the text-embedding failure path is
misleading because the query may still use multimodal or graph signals, so it
should not be described as FTS-only. Update the error message in the search flow
around the text embedding failure handling in the relevant search method to use
neutral wording that reflects partial degradation without implying only FTS
remains, while keeping the existing exception logging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6e1a80e0-bde1-4878-b63c-d0961de00730
📒 Files selected for processing (21)
CHANGELOG.mdCLAUDE.mdREADME.mddocs/getting-started.mddocs/providers.mdsrc/dikw_core/api_ingest.pysrc/dikw_core/api_retrieve.pysrc/dikw_core/api_synth.pysrc/dikw_core/api_wisdom.pysrc/dikw_core/config.pysrc/dikw_core/domains/info/embed.pysrc/dikw_core/domains/info/search.pysrc/dikw_core/eval/runner.pysrc/dikw_core/providers/rerank.pytests/fakes.pytests/test_config.pytests/test_embed_resilient.pytests/test_rerank_api_wiring.pytests/test_search.pytests/test_synthesize_pipeline.pytests/test_write_path_embed_warnings.py
| sources = list(await storage.list_documents(layer=Layer.SOURCE, active=True)) | ||
| # Heads-up only when there's real work: an embedder was wired but no | ||
| # usable active text version (none yet, or cfg drifted from the active | ||
| # identity), so any pages synthesized below land without inline vectors | ||
| # (a future ingest's resume scan reconciles them) — warn rather than | ||
| # silently shipping vector-less K pages. Gated on ``sources`` so a no-op | ||
| # synth on an empty base stays quiet. | ||
| if embed_deferred and sources: | ||
| logger.warning( | ||
| "synth: embedder wired but no usable active text version (none " | ||
| "yet, or cfg embedding identity drifted from the active one); " | ||
| "%d source(s) will be synthesized without inline embeddings (a " | ||
| "future ingest will reconcile their vectors)", | ||
| len(sources), | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Gate this warning on actual synth work.
sources includes documents that the existing already logic will skip. On a default rerun of an up-to-date base, this still logs that sources "will be synthesized without inline embeddings" even though no page is rewritten. Please key the warning off the post-skip candidate set, or emit it only once a source actually enters the synth path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/dikw_core/api_synth.py` around lines 144 - 158, The warning in
api_synth.py is gated on sources, but sources still includes items skipped by
the already logic, so it can fire even when no synth work happens. Move the
warning gate to the post-skip candidate set used by the synth path, or trigger
it only after a source actually enters the rewrite flow in the same function
where embed_deferred and already are evaluated. Keep the message and
logger.warning call, but ensure it reflects only documents that will באמת be
synthesized without inline embeddings.
| logger.error( | ||
| "query text embedding failed transiently; degrading to " | ||
| "FTS-only for this query", | ||
| exc_info=True, |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Avoid calling this fallback FTS-only.
If multimodal retrieval or the graph leg is active, this path still has non-FTS signals after the text-vector leg drops out. The current message will mislead debugging.
Suggested wording
- logger.error(
- "query text embedding failed transiently; degrading to "
- "FTS-only for this query",
- exc_info=True,
- )
+ logger.error(
+ "query text embedding failed transiently; running this "
+ "query without the text-vector leg",
+ exc_info=True,
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.error( | |
| "query text embedding failed transiently; degrading to " | |
| "FTS-only for this query", | |
| exc_info=True, | |
| logger.error( | |
| "query text embedding failed transiently; running this " | |
| "query without the text-vector leg", | |
| exc_info=True, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/dikw_core/domains/info/search.py` around lines 460 - 463, The fallback
log in the text-embedding failure path is misleading because the query may still
use multimodal or graph signals, so it should not be described as FTS-only.
Update the error message in the search flow around the text embedding failure
handling in the relevant search method to use neutral wording that reflects
partial degradation without implying only FTS remains, while keeping the
existing exception logging.
codecov/patch flagged 6 new lines without coverage. Add tests for the read-path multimodal (asset-leg) query-embed transient degrade — both the hybrid degrade-to-FTS path and the eval-opt-out propagate path (search.py) — and the asset embed-batch exhausted-retry ERROR log (embed.py). Mirrors the existing text-leg / chunk-leg cases. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Cut the 0.6.5 release and bump every user-facing version reference ahead of tagging. Since 0.6.4 the engine gained three changes, all already merged and CI-green on main: - #251 (feat) — default `dikw init` scaffold now ships a Gitee `bge-m3` embedder + `bge-reranker-v2-m3` reranker keyed by one `GITEE_API_KEY`, and rerank/embed degrade-logging is unified (transient leg failures log at ERROR, enabled-but-unconfigured / deferred-embed log at WARNING; permanent provider errors still fail fast). - #250 (fix) — eval snapshot cache key now includes the ingest-time `cjk_tokenizer` and reads every search-time retrieval knob from the live config, so retrieval ablations on a shared `cache_root` under the default `--cache read_write` stop silently reusing the prior run's config. - #249 (feat) — eval per-query + negative rows surface `top1_score` and `top1_vec_cosine` (reranker-independent, via an eval-internal probe) so expect_none / OOD robustness is measurable from absolute relevance. Mechanics (mirrors the prepare-0.6.4 PR #247): - Version bump `0.6.4 -> 0.6.5` (`pyproject.toml` + `uv.lock` self-entry). - CHANGELOG: rename `## Unreleased` -> `## 0.6.5 — …` (Added/Changed/Fixed), open a fresh empty `## Unreleased`. The 0.6.5 heading is awk-extractable by `release.yml`'s notes extractor. - Docs/examples: bump the GHCR image tags (`docs/deployment-docker.md`), the pip-install pin (`docs/getting-started.md`), the compose `.env` default + error-message example (`examples/docker/.env.example`, `docker-compose.yml`), and the bug-report version placeholder. - `examples/docker/Dockerfile` intentionally NOT bumped — `release.yml`'s `sync-dockerfile` job auto-bumps it after the tag publishes to PyPI; its current 0.6.4 is published, so the Dockerfile guard stays green. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Unifies rerank/embedding failure semantics on the retrieval read + write paths into one observable, never-silently-broken contract, and flips the
dikw initdefault scaffold to a single-key Gitee embed + rerank profile.Degrade / log contract
Concretely:
vector/bm25ablations re-raise. A transient rerank failure degrades to the fused order. Both now log at ERROR. An enabled-but-unconfigured reranker logs one WARNING per retrieve.Default scaffold → Gitee (one
GITEE_API_KEYdrives both legs)dikw initnow defaults the embedder to Giteebge-m3(dim 1024, batch 16 — Gitee caps the input array at 25) and ships a Giteebge-reranker-v2-m3reranker, so a fresh base reranks out of the box (OpenAI has no/rerankendpoint). LLM default stays Anthropic. The field default factory (_default_provider_config) and the scaffold are now a single source of truth.Why
Before, a transient model blip 500'd the read path while a misconfig and an unconfigured leg were indistinguishable in logs; the embed write path classified transient-vs-permanent but the "configured-but-failed" and "not-configured" cases were silent. This makes every degrade observable at the right level while keeping the deliberate fail-fast-on-misconfig invariant.
Notable review catches (fixed)
bge-reranker-v2-m3— the HuggingFace-prefixedBAAI/...404s on Gitee → permanent error → 500 on every fresh-base retrieve.embedding_batch_size=16in the default — Gitee's 25-item cap would 400 a fresh ingest at the inherited OpenAI-tuned 64.degrade_query_embed_on_transient=False) so the--againstgate isn't false-flagged.Eval gate
no-baseline-needed: under healthy providers the change is ranking-neutral — the degrade fires only on injected failure, the rest is log levels + a default-config swap (eval uses its own dataset config, notdefault_config). Therecall@100invariant is structurally preserved.Delivery receipt
Verify (step 3):
tools/check.pypgvector/pgvector:pg18; storage + task-store contract, 186 passed, 0 PG-skipsdikw init→ Gitee bge-m3 + barebge-reranker-v2-m3+ batch 16; validator OKno-baseline-needed(ranking-neutral)Codex review (step 4): 1 round — TP: Gitee batch size (fixed) + synth deferred-embed warning false-positive (gated).
Fresh-review (step 5): PASS — no blocking findings (fail-fast preserved, mode-gating correct, layering clean).
/code-review xhigh (step 5, 31 agents): 6 findings triaged vs source — 4 CONFIRMED + 1 PLAUSIBLE fixed (rerank id, provider-default divergence, README/getting-started drift, synth drift, eval purity); 1 nit rejected (synth-without-any-embedder is an intentional LLM-only mode).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation