2 changes: 1 addition & 1 deletion docs/configuration.md
@@ -156,4 +156,4 @@ Common variables (see `app.rag.*` defaults in `application.properties`):
 - `RAG_TOP_K`
 - `RAG_RETURN_K`
 - `RAG_CITATIONS_K`
-- `RAG_RERANKER_TIMEOUT` (default `12s`) — timeout budget for LLM reranking calls
+- `RAG_RERANKER_TIMEOUT` (default `30s`) — timeout budget for LLM reranking calls
4 changes: 2 additions & 2 deletions docs/retrieval-pipeline.md
@@ -172,7 +172,7 @@ Each document is presented as `[index] title | url` followed by the first 500 ch

 - Model: same provider as chat (OpenAI/GitHub Models)
 - Temperature: `0.0` (deterministic)
-- Timeout: configurable via `app.rag.reranker-timeout` (default 12s)
+- Timeout: configurable via `app.rag.reranker-timeout` (default 30s)
 - Response format: `{"order": [0, 3, 1, 2, ...]}` (0-based indices)

 ### Response parsing
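
To make the timeout budget concrete, here is a minimal sketch of how a reranking call could be bounded by `app.rag.reranker-timeout` and its `{"order": [...]}` result applied to the candidate list. Everything in it is hypothetical: the class and method names, the use of `CompletableFuture.orTimeout`, and the fallback to the original order on timeout or failure are assumptions of this sketch. The diff does not show how the project handles these cases.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper; not taken from the project under review. */
final class RerankTimeoutSketch {

    /**
     * Runs the rerank call asynchronously and waits at most {@code timeout}.
     * On timeout or failure the candidates are returned in their original
     * order (an assumption of this sketch). Note that orTimeout only fails
     * the future; it does not interrupt the underlying call.
     */
    static <T> List<T> rerankWithTimeout(List<T> candidates,
                                         Supplier<List<Integer>> rerankCall,
                                         Duration timeout) {
        try {
            List<Integer> order = CompletableFuture
                    .supplyAsync(rerankCall)
                    .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
                    .join();
            return applyOrder(candidates, order);
        } catch (CompletionException e) {
            // Covers both a TimeoutException and an exception thrown by the call.
            return candidates;
        }
    }

    /**
     * Reorders candidates by 0-based indices. Out-of-range and duplicate
     * indices are skipped; candidates the order omits are appended in their
     * original relative order so nothing is lost.
     */
    static <T> List<T> applyOrder(List<T> candidates, List<Integer> order) {
        List<T> result = new ArrayList<>();
        boolean[] used = new boolean[candidates.size()];
        for (Integer idx : order) {
            if (idx != null && idx >= 0 && idx < candidates.size() && !used[idx]) {
                used[idx] = true;
                result.add(candidates.get(idx));
            }
        }
        for (int i = 0; i < candidates.size(); i++) {
            if (!used[i]) {
                result.add(candidates.get(i));
            }
        }
        return result;
    }
}
```

With candidates `[a, b, c, d]` and an order of `[0, 3, 1, 2]`, `applyOrder` returns `[a, d, b, c]`. Raising the budget from 12s to 30s only changes the `timeout` argument. The trade-off is that a slow reranker can now hold a request for up to 30 seconds before any fallback applies.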
@@ -292,7 +292,7 @@ All four must be non-blank and distinct (validated on startup).
 |---|---|---|---|
 | `app.rag.search-top-k` | `12` | Must be > 0 | Candidates fetched from hybrid search before reranking |
 | `app.rag.search-return-k` | `6` | Must be > 0, must be <= `search-top-k` | Results returned to the LLM after reranking |
-| `app.rag.reranker-timeout` | `12s` | Must be positive | Timeout for LLM reranking call |
+| `app.rag.reranker-timeout` | `30s` | Must be positive | Timeout for LLM reranking call |
 | `app.rag.search-citations` | `3` | Must be >= 0 | Citation references included in the response |
 | `app.rag.search-mmr-lambda` | `0.5` | Must be in [0.0, 1.0] | MMR lambda (higher = relevance, lower = diversity) |
 | `app.rag.chunk-max-tokens` | `900` | Must be > 0 | Max tokens per ingested chunk (see [ingestion.md](ingestion.md)) |
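
The constraints column of the table above can be read as a set of startup checks. The following is a rough sketch of those checks, assuming the values have already been bound to plain Java types. The class name, method name, and error messages are hypothetical and do not come from the project; only the rules and defaults are taken from the table.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical validator mirroring the constraints listed for `app.rag.*`. */
final class RagSettingsSketch {

    /** Returns one message per violated constraint; an empty list means valid. */
    static List<String> violations(int searchTopK,
                                   int searchReturnK,
                                   Duration rerankerTimeout,
                                   int searchCitations,
                                   double searchMmrLambda,
                                   int chunkMaxTokens) {
        List<String> errors = new ArrayList<>();
        if (searchTopK <= 0) {
            errors.add("search-top-k must be > 0");
        }
        if (searchReturnK <= 0) {
            errors.add("search-return-k must be > 0");
        }
        if (searchReturnK > searchTopK) {
            errors.add("search-return-k must be <= search-top-k");
        }
        if (rerankerTimeout == null || rerankerTimeout.isZero() || rerankerTimeout.isNegative()) {
            errors.add("reranker-timeout must be positive");
        }
        if (searchCitations < 0) {
            errors.add("search-citations must be >= 0");
        }
        // Written as a negated range check so that NaN is rejected as well.
        if (!(searchMmrLambda >= 0.0 && searchMmrLambda <= 1.0)) {
            errors.add("search-mmr-lambda must be in [0.0, 1.0]");
        }
        if (chunkMaxTokens <= 0) {
            errors.add("chunk-max-tokens must be > 0");
        }
        return errors;
    }
}
```

Calling `RagSettingsSketch.violations(12, 6, Duration.ofSeconds(30), 3, 0.5, 900)` with the documented defaults returns an empty list. Collecting every violation, instead of throwing on the first one, lets a startup failure report all misconfigured properties at once.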