Open
109 commits
0e24a87
Switch from buf.build to local generated protobuf + OpenRouter support
claude Mar 19, 2026
9f26f20
Improve agent with 7 benchmark-driven enhancements (U1-U7)
claude Mar 19, 2026
c444000
гз
claude Mar 19, 2026
cb9c5bb
Improve agent with 4 benchmark-driven fixes (U8-U11): 100% score
claude Mar 20, 2026
3fd5fd3
Improve agent with 6 deterministic fixes (t02/t03/t05): smart refs, A…
claude Mar 20, 2026
899601d
Improve agent to 100% score: Fix-21 through Fix-27 for qwen3.5:9b
claude Mar 21, 2026
2db3f40
Improve agent to 100% score: Fix-54 through Fix-61 for qwen3.5:4b
claude Mar 22, 2026
67f8e25
гз
claude Mar 22, 2026
b46b648
up
claude Mar 22, 2026
3afb726
Improve agent to 100% score: Fix-62, Fix-62b, Fix-28b for qwen3.5:2b
claude Mar 22, 2026
f81702d
Add qwen3.5:2b benchmark results and update RESULT.md
claude Mar 22, 2026
448de0c
гз
claude Mar 24, 2026
087ab35
up
claude Mar 24, 2026
a04530c
Add FIX-63, stats table, and pac1-py fixes documentation
claude Mar 24, 2026
dcc6759
Switch to Anthropic SDK + Ollama fallback, add 3-min task timeout, fi…
claude Mar 25, 2026
7560c9c
Fix JSON parse fallback bug and move inline imports to module level i…
claude Mar 25, 2026
3f3ecc1
Add capability detection for structured output with fallback for unsu…
claude Mar 26, 2026
c8602e8
Add FIX-75/76: LLM-based task classification and multi-model routing …
claude Mar 26, 2026
9bc5996
fix(main): FIX-85/86B — add cloud Ollama MODEL_CONFIGS + MODEL_CLASSI…
claude Mar 27, 2026
ed27c66
fix(classifier): FIX-83/84/86A — Ollama model routing, think param, c…
claude Mar 27, 2026
c87850e
fix(dispatch): move is_ollama_model before call_llm_raw to resolve fo…
claude Mar 27, 2026
90e8661
chore: update fix counter to FIX-86
claude Mar 27, 2026
d80ed01
docs(readme): rewrite model configuration guide for normal and multi-…
claude Mar 27, 2026
5ce0000
fix(classifier): FIX-87 — adaptive token budget for thinking models i…
claude Mar 27, 2026
22569f7
fix(main): FIX-88 — always use ModelRouter so classification logs and…
claude Mar 27, 2026
b75ce6b
fix(stats): remove Думать(~tok) column and thinking_tokens tracking
claude Mar 27, 2026
a270a19
up
claude Mar 27, 2026
bee60bb
feat(classifier): FIX-98 — structured rule engine in classify_task()
claude Mar 28, 2026
079a795
feat(classifier): FIX-97 — keyword-fingerprint cache in ModelRouter
claude Mar 28, 2026
bd4ade7
feat(classifier): FIX-99 — two-phase LLM re-class with vault context
claude Mar 28, 2026
9a2fd0f
refactor(prompt): AB — discovery-first prompt audit (all P0–P3 fixes)
claude Mar 28, 2026
25544c3
merge(prompt): discovery-first prompt audit — AB branch → main
claude Mar 28, 2026
438051d
fix(dispatch): FIX-104 — plain-text Ollama retry when json_object fails
claude Mar 28, 2026
3550f70
fix(prompt): FIX-103 + FIX-104 — seq.json semantics and inbox non-email
claude Mar 28, 2026
8cb71b5
fix(classifier): FIX-103 — disable think + max_tokens=64 for Ollama c…
claude Mar 28, 2026
8addf1b
feat(stats): Ollama-native tok/s metrics + model config update
claude Mar 28, 2026
950c208
fix(classifier): FIX-103 — use _cls_cfg max_completion_tokens, not ha…
claude Mar 28, 2026
7b58137
docs(claude): update fix counter to FIX-107 (FIX-105..107 classifier …
claude Mar 28, 2026
00088c8
feat(logging): FIX-110 — LOG_LEVEL + auto-tee to logs/ + step/call st…
claude Mar 29, 2026
5e0f022
feat(classifier): FIX-111..112 — done_operations ledger + skip LLM cl…
claude Mar 29, 2026
840c6f8
feat(agent): FIX-113..116 — inbox channel trust rules + dynamic docs …
claude Mar 30, 2026
3846c56
feat(routing): FIX-117..118 — single-pass routing + ollama_options wi…
claude Mar 30, 2026
0142d43
feat(routing): FIX-119 — named profiles for task-adaptive ollama_options
claude Mar 30, 2026
02f8bb0
chore: merge pending changes — docs, env, gitignore, cleanup
claude Mar 30, 2026
b4987e3
feat(classifier): FIX-120 — regex pre-check fast-path in classify_tas…
claude Mar 30, 2026
a5f0228
fix(classifier): FIX-121 — reliable classifier under GPU load
claude Mar 30, 2026
750be55
fix(classifier): FIX-122 — remove max_tokens from Ollama tier in call…
claude Mar 30, 2026
3bad9ef
feat(loop): FIX-123..125 — context deduplication: tool result compact…
claude Mar 30, 2026
15f4964
fix(loop): FIX-123/125 review — two bugs in context compaction helpers
claude Mar 30, 2026
7ebc9cc
up
claude Mar 31, 2026
6c5d04c
fix(loop): FIX-127..130 — SGR-based verification cycles
claude Mar 31, 2026
a508a2e
fix(prompt): align FIX-113 contact rule with FIX-129 search cycle
claude Mar 31, 2026
80d84ec
fix(loop): verify — 4 post-/verify fixes (FIX-127..130)
claude Mar 31, 2026
c5f0897
fix(loop): FIX-131 — repair ReadRequest.path and remove false-positiv…
claude Mar 31, 2026
6b42074
fix(loop): FIX-128 router max_completion_tokens 256→512
claude Mar 31, 2026
c76aec6
fix(agent): FIX-134..139 — repair regression 66.67%→~93% (6 logic fixes)
claude Apr 1, 2026
39e81df
feat(models): add field validators for delete, EmailOutbox, search/find
Apr 1, 2026
ba3b9e6
refactor(dispatch): rename _TRANSIENT_KWS_RAW to public TRANSIENT_KWS…
Apr 1, 2026
3383f35
refactor(prompt): simplify system_prompt — remove duplicate contact r…
Apr 1, 2026
b504b9c
refactor(loop): Unit 2 — extract 5 inline blocks from run_loop() into…
Apr 1, 2026
52b9d0b
fix(models): restore wildcard check in Req_Delete validator
Apr 1, 2026
ce503fe
fix(models): remove wildcard check from Req_Delete Pydantic validator
Apr 1, 2026
6377fba
feat(agent): FIX-133 — code_eval sandbox + TASK_CODER type with MODEL…
Apr 1, 2026
7860f0d
refactor(loop): remove FIX-N labels from comments and print statements
Apr 1, 2026
243d82e
up
Apr 1, 2026
587568a
docs(env): add MODEL_CODER and new task-type model vars to .env.examp…
Apr 1, 2026
52fd622
up
claude Apr 1, 2026
979eb63
up
Apr 1, 2026
c850694
up
Apr 1, 2026
c67d89f
fix(loop): FIX-134 — replace hardcoded "qwen2.5:7b" with model variab…
claude Apr 1, 2026
0f43788
fix(loop): FIX-135 + FIX-136 — routing false-CLARIFY and JSON decode …
claude Apr 1, 2026
306b048
fix(loop): FIX-137 — use json_object for Ollama tier, not json_schema
claude Apr 1, 2026
3489d75
fix(prompt): FIX-138 — inbox injection scan before format detection
claude Apr 1, 2026
e561673
fix(prompt): FIX-139 — explicit inbox injection criteria, data-not-in…
claude Apr 1, 2026
93be7f5
fix(prompt): FIX-140 — split inbox security into explicit steps 1.5 a…
claude Apr 1, 2026
e81e7e1
fix(prompt): FIX-141 — null-field rule for structured file creation
claude Apr 1, 2026
c644dc4
fix(loop): FIX-142 — _verify_json_write injects correction on parse f…
claude Apr 1, 2026
aa6a019
fix(prompt+loop): FIX-143 + FIX-144 — invoice total and null-field ve…
claude Apr 1, 2026
ee66f9e
fix(prompt): FIX-145 — code_eval modules are pre-loaded, no import ne…
claude Apr 1, 2026
28f8e1f
fix(loop): FIX-146/147 — prefer richest JSON in extraction; widen rea…
claude Apr 1, 2026
8ace541
fix(loop): FIX-148 — pre-dispatch empty-path guard for write/delete/m…
claude Apr 1, 2026
235b7ad
fix(loop): FIX-149 — mutations rank above report_completion in JSON e…
claude Apr 1, 2026
e837cba
up
claude Apr 1, 2026
34be301
fix(loop): FIX-150 — infer tool from Req_XXX prefix; prefer bare tool…
claude Apr 1, 2026
dd73163
fix(prompt): FIX-151 — make reschedule +8 constant impossible to miss
claude Apr 1, 2026
5c6dc6c
fix(classifier): FIX-152 — route reschedule/postpone tasks to MODEL_C…
claude Apr 1, 2026
7657131
fix(classifier): FIX-152r — numeric duration pattern routes to MODEL_…
claude Apr 1, 2026
2f43662
fix(loop): FIX-153 — skip EmailOutbox schema check for seq.json and n…
claude Apr 1, 2026
2c94472
fix(prompt): FIX-154 — explicit OTP delete checklist in inbox step 2.6B
claude Apr 1, 2026
4d3b512
fix(loop): FIX-155 — hint-echo guard in _call_openai_tier
claude Apr 1, 2026
73c1fc3
fix(prompt): FIX-156 — close step 2.5 security check loopholes for re…
claude Apr 1, 2026
0399da4
fix(prompt,loop): FIX-157/158 — admin channel security exemption + DE…
claude Apr 1, 2026
14f6cb7
feat(arch): FIX-159–167 — coder sub-agent architecture + code_eval pa…
claude Apr 2, 2026
46750e0
fix(prompt,classifier,loop): FIX-168–176 — inbox security, routing, c…
claude Apr 2, 2026
a9eee16
fix(prompt): FIX-178 — precision instruction rule for "Return only" l…
claude Apr 2, 2026
cf5d3b0
merge(dev): incorporate main branch (FIX-133..139) into dev
claude Apr 2, 2026
dbafa3f
fix(prompt): FIX-180 — email body anti-contamination rule
claude Apr 2, 2026
8d8c7f5
fix(dispatch): FIX-181 — plain_text mode for coder model calls
claude Apr 2, 2026
c73b3df
fix(dispatch): FIX-182 — move context_vars guard before path injection
claude Apr 2, 2026
56fda02
u
Apr 3, 2026
20fb8a0
up
claude Apr 3, 2026
255fda5
up
Apr 3, 2026
24f25b7
fix(sampling): FIX-187 — add seed=42 to Ollama profiles, pass tempera…
claude Apr 3, 2026
32651e0
гз
claude Apr 3, 2026
1c2f377
fix(loop): FIX-188 — semantic router caching + conservative fallback
claude Apr 3, 2026
c824789
fix(prompt): FIX-189..194 — resolve audit 2.4 contradictions and ambi…
claude Apr 3, 2026
0389cb6
refactor(loop): FIX-195 — decompose run_loop() God Function
claude Apr 3, 2026
00ec655
Merge pull request #2 from ikeniborn/fix/195-decompose-run-loop
ikeniborn Apr 3, 2026
9b98220
fix(classifier): FIX-196..198 — classifier determinism and coder rout…
claude Apr 3, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,3 +1,7 @@
.DS_Store
.envrc
.idea/
.claude/plans
.secrets.backup
.secrets
tmp/
9 changes: 9 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,9 @@
# Constraints

1. The agent's target directory is pac1-py
2. Do not modify pac1-py/.secrets

# Development

Never use hardcoding patterns when modifying the agent.
Work through the logic instead.
755 changes: 755 additions & 0 deletions docs/pac1-py-architecture-audit.md

Large diffs are not rendered by default.

30 changes: 30 additions & 0 deletions pac1-py/.env
@@ -0,0 +1,30 @@
# pac1-py/.env: do not commit to git
# Settings without credentials. Credentials go in .secrets
#
# Load priority in dispatch.py (highest first):
# 1. environment variables (env)
# 2. .secrets
# 3. .env (this file; loaded first, overridden by .secrets and env)

# ─── Benchmark ───────────────────────────────────────────────────────────────
BENCHMARK_HOST=https://api.bitgn.com
BENCHMARK_ID=bitgn/pac1-dev
TASK_TIMEOUT_S=900

# ─── Routing by task type ────────────────────────────────────────────────────
# Types:
# classifier - lightweight model used only for task classification
# default - all execution tasks (capture, create, delete, move, etc.)
# think - analysis and reasoning (distill, analyze, compare, summarize)
# longContext - batch operations (all/every/batch + large vault)
#
MODEL_CLASSIFIER=minimax-m2.7:cloud
MODEL_DEFAULT=minimax-m2.7:cloud
MODEL_THINK=minimax-m2.7:cloud
MODEL_LONG_CONTEXT=minimax-m2.7:cloud
MODEL_CODER=qwen3-coder-next:cloud

# ─── Ollama (local / cloud via Ollama-compatible endpoint) ───────────────────
OLLAMA_BASE_URL=http://localhost:11434/v1

LOG_LEVEL=DEBUG
46 changes: 46 additions & 0 deletions pac1-py/.env.example
@@ -0,0 +1,46 @@
# pac1-py/.env: do not commit to git
# Settings without credentials. Credentials go in .secrets
#
# Load priority in dispatch.py (highest first):
# 1. environment variables (env)
# 2. .secrets
# 3. .env (this file; loaded first, overridden by .secrets and env)

# ─── Benchmark ───────────────────────────────────────────────────────────────
BENCHMARK_HOST=https://api.bitgn.com
BENCHMARK_ID=bitgn/pac1-dev
TASK_TIMEOUT_S=300

# ─── Default model ───────────────────────────────────────────────────────────
# Used as the fallback for any MODEL_* variable below that is not set.
MODEL_ID=anthropic/claude-sonnet-4.6

# ─── Routing by task type ────────────────────────────────────────────────────
# Required variables (the agent will not start without them):
# MODEL_CLASSIFIER - lightweight model used only for task classification
# MODEL_DEFAULT - all execution tasks (capture, create, delete, move, etc.)
# MODEL_THINK - analysis and reasoning (distill, analyze, compare, summarize)
# MODEL_LONG_CONTEXT - batch operations (all/every/batch + large vault)
#
# Optional (fall back to default/think when unset):
# MODEL_EMAIL - compose/send email (fallback: MODEL_DEFAULT)
# MODEL_LOOKUP - contact search, read-only queries (fallback: MODEL_DEFAULT)
# MODEL_INBOX - processing of incoming messages (fallback: MODEL_THINK)
# MODEL_CODER - calculations, date arithmetic, aggregation via code_eval
# (fallback: MODEL_DEFAULT; recommended: a deterministic model)
#
MODEL_CLASSIFIER=anthropic/claude-haiku-4.5
MODEL_DEFAULT=anthropic/claude-sonnet-4.6
MODEL_THINK=anthropic/claude-sonnet-4.6
MODEL_LONG_CONTEXT=anthropic/claude-sonnet-4.6
# MODEL_EMAIL=anthropic/claude-haiku-4.5
# MODEL_LOOKUP=anthropic/claude-haiku-4.5
# MODEL_INBOX=anthropic/claude-sonnet-4.6
# MODEL_CODER=qwen3.5:cloud # or any model with a coder profile (temperature=0.1)

# ─── Ollama (local / cloud via Ollama-compatible endpoint) ───────────────────
# Used automatically for models in name:tag format (no slash).
# Examples: qwen3.5:9b, qwen3.5:cloud, deepseek-v3.1:671b-cloud
#
OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_MODEL=qwen3.5:cloud
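
The comments in this file describe two rules: four `MODEL_*` variables are required, the optional ones fall back to `MODEL_DEFAULT` or `MODEL_THINK`, and a model name in `name:tag` form without a slash is treated as an Ollama model. The routing code itself is not part of this diff, so the following is a hedged sketch of those rules. `resolve_models` and `looks_like_ollama_model` are hypothetical names, and the `MODEL_ID` fallback is deliberately left out.

```python
# Required and optional variables, as listed in the .env.example comments.
REQUIRED = ("MODEL_CLASSIFIER", "MODEL_DEFAULT", "MODEL_THINK", "MODEL_LONG_CONTEXT")
OPTIONAL_FALLBACKS = {
    "MODEL_EMAIL": "MODEL_DEFAULT",
    "MODEL_LOOKUP": "MODEL_DEFAULT",
    "MODEL_INBOX": "MODEL_THINK",
    "MODEL_CODER": "MODEL_DEFAULT",
}


def resolve_models(settings: dict[str, str]) -> dict[str, str]:
    """Return one model per task type, applying the documented fallbacks."""
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError(f"missing required model settings: {', '.join(missing)}")
    resolved = {name: settings[name] for name in REQUIRED}
    for name, fallback in OPTIONAL_FALLBACKS.items():
        # An unset or empty optional variable inherits its fallback's model.
        resolved[name] = settings.get(name) or resolved[fallback]
    return resolved


def looks_like_ollama_model(model: str) -> bool:
    """Heuristic from the comment above: name:tag form with no slash."""
    return "/" not in model and ":" in model
```

With the example values from this file, `MODEL_INBOX` would resolve to the `MODEL_THINK` model, and `qwen3.5:cloud` would be routed to Ollama while `anthropic/claude-sonnet-4.6` would not.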
5 changes: 5 additions & 0 deletions pac1-py/.gitignore
@@ -0,0 +1,5 @@
__pycache__
*.egg-info
**/.claude/plans
**/.env
**/logs
1 change: 1 addition & 0 deletions pac1-py/.python-version
@@ -0,0 +1 @@
3.12
12 changes: 12 additions & 0 deletions pac1-py/.secrets.example
@@ -0,0 +1,12 @@
# pac1-py/.secrets: do not commit to git
#
# LLM providers (priority when choosing the backend in dispatch.py):
# 1. ANTHROPIC_API_KEY → Anthropic SDK directly (Claude models only)
# 2. OPENROUTER_API_KEY → OpenRouter (Claude + open-source models via the cloud)
# 3. Neither → Ollama only (local / cloud-via-Ollama models)

# ─── Anthropic (console.anthropic.com/settings/api-keys) ───────────────────
# ANTHROPIC_API_KEY=sk-ant-...

# ─── OpenRouter (openrouter.ai/settings/keys) ──────────────────────────────
# OPENROUTER_API_KEY=sk-or-...
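
The provider priority listed at the top of this file amounts to a short decision chain. The sketch below assumes the choice depends only on which keys are present. `pick_backend` and the returned labels are hypothetical, and the real `dispatch.py` likely also considers the model name, since the Anthropic SDK path is stated to serve Claude models only.

```python
def pick_backend(secrets: dict[str, str]) -> str:
    """Choose a backend label from the available API keys, in priority order."""
    if secrets.get("ANTHROPIC_API_KEY"):
        return "anthropic"   # 1. Anthropic SDK directly
    if secrets.get("OPENROUTER_API_KEY"):
        return "openrouter"  # 2. OpenRouter
    return "ollama"          # 3. no keys: Ollama only
```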