feat: auto-detect image aspect ratio from prompt #125

Open

TriTue2011 wants to merge 958 commits into basketikun:main from TriTue2011:main

TriTue2011 commented May 6, 2026

Automatically detects the image aspect ratio (for example 16:9, 1:1, 4:3) from the prompt text and uses it to override the default configuration (for cases where Home Assistant or another client sends a default size). This lets keywords found in the user's original request override the `size` parameter.
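A minimal sketch of the idea, not the code in this PR: scan the prompt for a ratio token and map it to a size string, falling back to the client-supplied size. The function name `detect_size` and the ratio-to-size table are hypothetical, and the pixel sizes are illustrative assumptions.

```python
import re

# Hypothetical mapping; the pixel sizes are illustrative assumptions.
RATIO_TO_SIZE = {
    "1:1": "1024x1024",
    "16:9": "1792x1024",
    "4:3": "1024x768",
}

# One or two digits, a colon, one or two digits, not embedded in a longer number.
_RATIO_RE = re.compile(r"(?<!\d)(\d{1,2})\s*:\s*(\d{1,2})(?!\d)")


def detect_size(prompt: str, default_size: str) -> str:
    """Return the size for the first known aspect ratio found in the prompt."""
    for match in _RATIO_RE.finditer(prompt):
        ratio = f"{int(match.group(1))}:{int(match.group(2))}"
        if ratio in RATIO_TO_SIZE:
            return RATIO_TO_SIZE[ratio]
    # No recognized ratio: keep whatever the client sent.
    return default_size
```

Tokens that look like ratios but are not in the table (for example a clock time such as 12:30) leave the default untouched.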

Author

TriTue2011 commented May 7, 2026

I have pushed two additional commits to this PR:

Fix: Changed `_build_tool_prompt` to check for `properties` instead of `required`. This fixes a critical bug where tools with only optional arguments (like Home Assistant's `HassTurnOn`) were forced to be called with `{}` because they lacked required parameters. Now the model can see and use optional arguments correctly.
Update: Translated the hardcoded image size hints from Chinese to Vietnamese to better support Vietnamese text-to-speech and users. (Note: Feel free to ask if you'd prefer this part to remain in Chinese or be changed to English for internationalization!)
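
To illustrate the first fix, here is a rough sketch of describing a tool's arguments from `properties` rather than `required`. It assumes an OpenAI-style function schema; `describe_tool_arguments` is a hypothetical helper, not the actual `_build_tool_prompt`.

```python
def describe_tool_arguments(tool: dict) -> str:
    """Summarize a tool's arguments from an OpenAI-style function schema."""
    params = tool.get("function", {}).get("parameters", {})
    properties = params.get("properties") or {}
    required = set(params.get("required") or [])
    # Decide on `properties`, so tools with only optional arguments are
    # still described instead of being treated as argument-less.
    if not properties:
        return "no arguments; call with {}"
    parts = []
    for name in properties:
        flag = "required" if name in required else "optional"
        parts.append(f"{name} ({flag})")
    return ", ".join(parts)
```

With a schema that lists `name` and `area` under `properties` and has no `required` key, this returns `name (optional), area (optional)` rather than telling the model to pass `{}`.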

TriTue2011 force-pushed the main branch from 3a4540c to 9aa7d7d

May 7, 2026 12:06

TriTue2011 and others added 28 commits

May 22, 2026 19:18


          feat: Add playwright + cloudscraper for Turnstile bypass

35531dc

Dockerfile: Chromium system libs (libatk, libcups, etc.)
pyproject.toml: playwright>=1.40, cloudscraper>=1.2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          feat(phat_nguoi): Playwright headless browser for Turnstile bypass

fd426fa

- Use playwright (Chromium) to load phatnguoi.vn, auto-solve Turnstile
- Fill form, submit, wait for result, parse violation data
- Removes dependency on overloaded api.phatnguoi.vn endpoint

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(Dockerfile): Fix Chromium lib package names for Debian Bookworm

daf9fff

t64 suffix not available on python:3.13-slim (Bookworm).
Use standard package names.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(Dockerfile): Use playwright install --with-deps for auto dep resolution

1c432eb

--with-deps automatically installs correct system packages for Chromium.
Avoids manual package name errors across Debian versions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(telegram): Increase AI timeout 90s → 180s for slow models

54c68ba

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(telegram): Return webhook immediately, process AI in background thread

0f3b03f

Webhook was hanging for 90-180s waiting for AI response.
Now responds instantly (200 OK) and processes AI in daemon thread.
Timeout increased to 300s for slow models.
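
The pattern described here can be sketched as follows. This is a framework-agnostic illustration, not the handler in this commit; `handle_webhook` and `process_update` are hypothetical names.

```python
import threading


def handle_webhook(update: dict, process_update) -> tuple[int, str]:
    """Acknowledge the webhook at once and run the slow work in a daemon thread."""
    worker = threading.Thread(target=process_update, args=(update,), daemon=True)
    worker.start()
    # Respond immediately; the AI call continues in the background.
    return 200, "OK"
```

A daemon thread does not block interpreter shutdown, so in-flight work may be lost if the process exits before it finishes.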

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(phat_nguoi): Fix button selector - use #tracuu form + visible fallback

09cc155

button[type=submit] matched 4 elements, first one invisible.
Narrow to #tracuu form context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix: Replace playwright with cloudscraper for lighter build

3cb5b15

- Remove playwright + Chromium (~200MB) from Docker image
- Use cloudscraper to call api.phatnguoi.vn with retry
- 5 retries with increasing delay for rate-limited/overloaded API
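
A minimal sketch of the retry loop, assuming a linear back-off (the commit only says the delay increases). `call_with_retry` and its parameters are hypothetical; `fetch` stands in for the actual cloudscraper request.

```python
import time


def call_with_retry(fetch, retries: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fetch() up to `retries` times, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < retries:
                # Linear back-off: base_delay, 2 * base_delay, 3 * base_delay, ...
                sleep(base_delay * attempt)
    raise last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests.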

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          feat(telegram): Add photo + PDF file handling

6ef972a

- Download photos/documents via Telegram getFile API
- PDF: extract text with markitdown + pdftotext, AI summarize
- Photo: acknowledge receipt (vision support coming)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          feat(telegram): Send photos/documents + image OCR

0c8f811

- send_photo(chat_id, bytes): Send images via multipart upload
- send_document(chat_id, bytes, filename): Send files
- Image OCR with pytesseract for received photos
- PDF processing with markitdown + AI summarize

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          chore: Trigger rebuild for cloudscraper deployment

89d2471

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Auto-update installed MCP URLs on hub discover

727ba21

When user clicks Connect Hub with a new URL, update all
already-installed MCPs to use the new hub address.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Handle FastMCP plain JSON response, not just SSE

d22caba

_call() only parsed SSE data: lines. FastMCP Streamable HTTP
returns plain JSON. Added fallback to parse JSON directly.
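
The fallback can be sketched like this. `parse_mcp_body` is a hypothetical stand-in for the parsing inside `_call()`, and taking the last SSE event as the result is an assumption.

```python
import json


def parse_mcp_body(body: str):
    """Parse a response that is either SSE ('data:' lines) or plain JSON."""
    messages = []
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                messages.append(json.loads(payload))
    if messages:
        # Assumption: the last event carries the JSON-RPC response.
        return messages[-1]
    # Fallback: no SSE framing, so treat the whole body as one JSON document.
    return json.loads(body)
```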

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Also handle plain JSON in HTTPError response handler

a89922a

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          debug(mcp): Add mcp_debug log to trace why tools count is 0

b186299

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Remove 30s failure cooldown preventing retry

The cooldown on _last_fail prevented MCP sessions from retrying
after an initial connection failure. Now always retries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Correct protocol version + add initialized notification

b1140a6

- Change protocolVersion from 0.1.0 to 2024-11-05 (MCP spec)
- Add notifications/initialized after initialize (required by spec)
- Without these FastMCP streamable HTTP ignores the session
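
For reference, the two handshake messages could look like the sketch below. The `clientInfo` values and the helper name `build_handshake` are placeholders; only the protocol version and the method names come from the commit and the MCP spec.

```python
def build_handshake(client_name: str = "example-client", client_version: str = "0.0.0"):
    """Return the initialize request and the follow-up initialized notification."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # A JSON-RPC notification has no "id", so the server sends no reply to it.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return initialize, initialized
```

The notification is sent only after the server has answered the `initialize` request.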

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          debug(mcp): Add start log to trace _inject_mcp_tools call path

778a403

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Sync both files in single commit - protocol + debug

ee818db

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Convert dict.values() to list - isinstance check fails

fc85c09

dict.values() returns dict_values, not list. isinstance check fails
and function returns [] before reaching debug log. Root cause of
mcp_count:0 bug.
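
The pitfall is easy to reproduce. In this sketch, `as_tool_list` is a hypothetical helper showing one way to normalize the input before the `isinstance` check.

```python
from collections.abc import ValuesView


def as_tool_list(tools):
    """Return tools as a list, accepting a dict.values() view as input."""
    if isinstance(tools, ValuesView):
        # dict.values() is a view object, so convert it before the list check.
        tools = list(tools)
    if not isinstance(tools, list):
        return []
    return tools
```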

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Add missing return statement after if

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(mcp): Deduplicate tool names for DeepSeek compatibility

2bae853

DeepSeek requires unique tool names across all MCP servers.
Filter duplicates keeping first occurrence.
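
A first-occurrence filter might look like this sketch. It assumes OpenAI-style tool dicts with a `function.name` field; `dedupe_tools` is a hypothetical name.

```python
def dedupe_tools(tools: list[dict]) -> list[dict]:
    """Drop tools whose name was already seen, keeping the first occurrence."""
    seen = set()
    unique = []
    for tool in tools:
        # Assumption: every tool carries a name under function.name.
        name = tool.get("function", {}).get("name")
        if name in seen:
            continue
        seen.add(name)
        unique.append(tool)
    return unique
```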

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          perf(mcp): Parallel MCP tool execution + deduplicate tool names

a68abd1

- Execute multiple MCP tool calls in parallel via ThreadPoolExecutor
- Single tool calls remain sequential (no overhead)
- Deduplicate tool names for DeepSeek compatibility
- Timeout 30s for parallel batch
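
Roughly, the execution strategy above can be sketched as follows. `run_tool_calls` and `execute` are hypothetical names; the 30-second default mirrors the batch timeout mentioned in the commit.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(calls, execute, timeout: float = 30.0):
    """Run tool calls; a single call stays sequential, several run in parallel."""
    if len(calls) <= 1:
        # No thread pool overhead for zero or one call.
        return [execute(call) for call in calls]
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        # map() yields results in input order and raises TimeoutError if a
        # result is not ready within `timeout` seconds of the map() call.
        return list(pool.map(execute, calls, timeout=timeout))
```

One caveat with this shape: leaving the `with` block waits for workers that are already running, so a timed-out batch only surfaces its error once those workers return.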

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          feat(ha): Expand context domains + improve friendly_name guidance

739bf9d

- Add sensor, binary_sensor, scene, script, vacuum, camera, etc.
- Reduce per-domain max to 20 (balanced with 16 domains)
- Guide AI to search by friendly_name (natural language)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          perf(ha): Dynamic entity discovery — 90% token reduction

25f055b

MCP Assist approach: instead of dumping all entities (12K tokens),
provide compact domain summary. AI uses ha_search_entities for
on-demand lookup. Only actionable devices listed directly.
Sensors/binary_sensors shown as name preview only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          perf(ha): Fast path - skip 77 MCP tools for HA commands

3f86f63

HA requests only need 3 tools (ha_call_service, ha_search, ha_get_state).
Skip loading 77 MCP tools to reduce context and latency.
Target: 16s → 2-3s for simple commands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          perf(ha): Auto-detect HA requests — skip MCP tools for control+status

176bdad

Keywords: bật, tắt, đèn, quạt, nhiệt độ, trạng thái, etc.
HA requests only get 3 tools instead of 80+ → 3-5x faster.
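
A keyword check of this kind could be as simple as the sketch below. It uses only the keywords listed above (the real list is longer, per the "etc."), and `is_ha_request` is a hypothetical name.

```python
import unicodedata

# Only the keywords quoted in the commit message; the full list is not shown.
HA_KEYWORDS = tuple(
    unicodedata.normalize("NFC", k)
    for k in ("bật", "tắt", "đèn", "quạt", "nhiệt độ", "trạng thái")
)


def is_ha_request(text: str) -> bool:
    """Return True when the text contains any Home Assistant keyword."""
    # Normalize so composed and decomposed Vietnamese accents compare equal.
    lowered = unicodedata.normalize("NFC", text).lower()
    return any(keyword in lowered for keyword in HA_KEYWORDS)
```

Plain substring matching is cheap but can misfire on unrelated text that happens to contain a keyword.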

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          perf(ha): Cache pre-computed context + configurable refresh interval

048649a

- Pre-compute context string once, reuse until TTL expires
- Configurable refresh_interval in HA settings (default 60s)
- On first request: fetch states once, build context, cache result
- Subsequent requests within TTL: return cached string instantly
- Settings key: home_assistant.refresh_interval (seconds)
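
The caching behaviour can be sketched as a small TTL cache. `ContextCache` is a hypothetical class; `build` stands in for whatever fetches the states and formats the context string.

```python
import time


class ContextCache:
    """Cache a pre-computed context string until its TTL expires."""

    def __init__(self, build, refresh_interval: float = 60.0, clock=time.monotonic):
        self._build = build
        self._ttl = refresh_interval
        self._clock = clock
        self._value = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # First request or TTL expired: rebuild once and remember the result.
            self._value = self._build()
            self._expires_at = now + self._ttl
        return self._value
```

The injectable `clock` is there only to make expiry testable without sleeping.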

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

TriTue2011 and others added 30 commits

May 27, 2026 01:13


          fix: file upload for tool results (GetLiveContext output can be 100KB+)

Extended file upload bypass to ALL message roles (user, system, tool).
When GetLiveContext returns >80KB of device states, the tool result is
uploaded as a file instead of being head+tail compressed.

Previously only user and system messages were checked. Tool results
from GetLiveContext could be silently truncated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix: parse XML tool calls from ChatGPT web response + remove 3000-char truncation

a39c95c

The ChatGPT web backend returns tool calls as XML text in content
(```xml <tool_call>), not native function-call objects. Added XML
parsing in _execute_mcp_tools_in_response so server-side tools
(GetLiveContext, ha_*) are executed instead of passed through.

Also removed the 3000-char truncation on tool results — file upload
bypass now handles large content preservation during re-query.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix: parse XML tool calls in streaming path (_wrap_mcp_stream)

92580c3

The chatgpt web backend (free accounts) outputs tool calls as XML text
in content deltas, not native tool_calls objects. _wrap_mcp_stream was
only checking delta.tool_calls, so XML calls were streamed through to
HA as raw text.

Now parses ```xml <tool_call> from the accumulated content, converts to
native tool_calls, strips the XML fence, and feeds into the agentic
execution loop.

This is the PRIMARY code path for chatgpt/auto → free JWT accounts.
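
As a rough illustration of the conversion step: the commit does not show the XML schema, so this sketch assumes `<name>` and `<arguments>` child elements inside `<tool_call>`, with the arguments as a JSON string. `extract_xml_tool_calls` and the `call_N` ids are hypothetical.

```python
import re

FENCE = "`" * 3  # the Markdown code fence marker

# Assumed layout: a fenced xml block wrapping one <tool_call> element.
_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"xml\s*(<tool_call>.*?</tool_call>)\s*" + re.escape(FENCE),
    re.DOTALL,
)
_NAME_RE = re.compile(r"<name>(.*?)</name>", re.DOTALL)
_ARGS_RE = re.compile(r"<arguments>(.*?)</arguments>", re.DOTALL)


def extract_xml_tool_calls(content: str):
    """Return (native-style tool_calls, content with the XML fences removed)."""
    calls = []
    for index, match in enumerate(_BLOCK_RE.finditer(content)):
        block = match.group(1)
        name = _NAME_RE.search(block)
        args = _ARGS_RE.search(block)
        if name is None:
            continue  # skip blocks that do not match the assumed layout
        calls.append({
            "id": f"call_{index}",
            "type": "function",
            "function": {
                "name": name.group(1).strip(),
                "arguments": args.group(1).strip() if args else "{}",
            },
        })
    cleaned = _BLOCK_RE.sub("", content).strip()
    return calls, cleaned
```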

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          Fix: correctly stream XML-extracted tool calls for native HA tools

70beddb


          Fix: correctly extract function name and arguments from XML tool calls

922cbb7


          Fix: correct XML extraction in _execute_mcp_tools_in_response as well

fe29e2c


          Fix: initialize xml_calls to prevent UnboundLocalError

ed78d45


          Fix: include tool and assistant roles in web_proxy _last_user_text for multi-step agentic loops

7c4d94b


          Fix: correctly translate tool messages and tool_calls for chatgpt native backend compatibility

f767c58


          Fix root cause: chatgpt/auto Docker path bypassed _wrap_mcp_stream - thread route through handler chain

685168c


          Fix 400 error: normalize role=tool to role=user in _dispatch for chatgpt.com provider re-dispatch

a68e5bc


          Redesign: pre-fetch GetLiveContext before LLM call for chatgpt/auto (single-response HA constraint)

a2d57db


          Fix chatgpt free payload limit: upload large HA context as file using _file_upload_store

71f4f60


          Fix hallucination: inject HA context directly instead of file upload for chatgpt free, truncate to 70KB

9c287cc


          Fix GetLiveContext to use format_states_context instead of raw unfiltered dump

39edc7f


          Remove _MAX_PER_DOMAIN limit (set to 9999) to allow listing all devices in HA context

140d537


          Strengthen AI prompt instructions for GetLiveContext to prevent hallucination on random entities

05df7c5


          Filter HA Context by _CONTEXT_DOMAINS to remove weather/calendar and avoid AI distraction

e672d3e


          Add weather back to _CONTEXT_DOMAINS

bb0b451


          Add 'thời tiết' to HA prefetch keywords

2f6ebfa


          Update system prompt to explicitly enforce comprehensive device summary

2d5a17d


          Inject HA context directly into the user message to prevent prompt drowning in GPT-4o-mini

81701d1


          Fix float conversion error for ISO expires_at strings in token refresh

410b8c8
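
One way to make such a conversion tolerant of both formats is sketched below. The commit body is not shown, so `expires_at_to_epoch` is a hypothetical helper, and treating timezone-less strings as UTC is an assumption.

```python
from datetime import datetime, timezone


def expires_at_to_epoch(value) -> float:
    """Convert a numeric or ISO 8601 expires_at value to epoch seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip()
    try:
        return float(text)  # numeric string such as "1700000000"
    except ValueError:
        pass
    # fromisoformat() on older Python versions rejects a trailing "Z".
    parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.timestamp()
```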


          fix: prevent duplicate HA context injection + add mandatory listing instructions

6e206ba

- Skip _prefetch_ha_context_if_needed when inject_ha_context() already
  inserted the device registry (detected by "Device Registry" marker).
  Double context (~98KB) overwhelmed the LLM, causing it to only read
  one device group per response.
- Add SYSTEM OVERRIDE instructions to _build_context() forcing the LLM
  to summarize ALL controllable device groups (lights, fans, ACs, doors,
  switches, locks) instead of cherry-picking 1-2 entities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix: keep read-only HA tools (GetLiveContext) for HA-native clients

0b75273

Previously HA clients with HassTurn* tools got ha_tools=[] and had to
rely on injected context (49KB) which the LLM often only partially read.
Now they get GetLiveContext, ha_search_entities, ha_get_state so the LLM
can query live device state via tool calls, matching Gemini pipeline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

c6d5ebb

fix(account): hard-pin chatgpt/auto to free-only pool, block codex leaking

get_text_access_token(account_type="free") now skips "free,codex" hybrid
accounts so Codex quota never burns on free-tier traffic (HA voice,
ai_task, n8n). add_accounts_with_type no longer merges "free"+"codex",
keeping the two pools strictly separate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2a5d984

fix(ui): reset model numbering per provider section

Compute order number from regularModels (already scoped to current
provider) instead of enabled_models[provider] which may contain entries
from other providers, causing chatgpt numbering to continue from codex.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

29bd56a

fix(ha): require GetLiveContext tool call instead of answering from static registry

Change _build_context() instructions so the LLM MUST call GetLiveContext
to fetch live device state from HA, matching Gemini pipeline behavior.
The registry is still injected for entity_id lookup (control commands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

5f79009

fix(build): use bun install instead of npm install for web-build stage

npm does not read bun.lock by default, causing sporadic failures when
a dependency is yanked or version resolution drifts. Switch the
web-build stage to bun (canonical lock file format for this project).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


          fix(build): remove --frozen-lockfile from bun install

b1a29bc

