feat: auto-detect image aspect ratio from prompt #125

Open
TriTue2011 wants to merge 958 commits into basketikun:main from TriTue2011:main

@TriTue2011

Allows the image aspect ratio (for example 16:9, 1:1, 4:3) to be detected automatically from the prompt text and used to override the default configuration, for cases where Home Assistant or another client sends only the default size. The `size` parameter is overridden by ratio keywords found in the user's original request.
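
A minimal sketch of the detection idea, not the code in this PR. The function name `detect_size_from_prompt`, the `RATIO_TO_SIZE` table, and the pixel sizes in it are all assumptions made for illustration; the PR description only says that ratio keywords in the prompt override `size`.

```python
import re

# Hypothetical mapping from a ratio keyword to a size string. The real
# sizes used by the project are not shown in this PR.
RATIO_TO_SIZE = {
    "1:1": "1024x1024",
    "16:9": "1792x1024",
    "9:16": "1024x1792",
    "4:3": "1024x768",
    "3:4": "768x1024",
}

# Matches "16:9" or "16 : 9"; the lookarounds reject longer digit runs
# such as "116:9".
_RATIO_RE = re.compile(r"(?<!\d)(\d{1,2})\s*:\s*(\d{1,2})(?!\d)")


def detect_size_from_prompt(prompt: str, default_size: str) -> str:
    """Return the size implied by the first known ratio in the prompt."""
    for match in _RATIO_RE.finditer(prompt):
        ratio = f"{int(match.group(1))}:{int(match.group(2))}"
        if ratio in RATIO_TO_SIZE:
            return RATIO_TO_SIZE[ratio]
    # No recognised ratio keyword: keep whatever the client sent.
    return default_size
```

Unknown ratios (a time such as "12:30", for instance) fall through to the client-supplied default, so a false match cannot change the size.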

@TriTue2011
Author

I have pushed two additional commits to this PR:

  1. Fix: Changed `_build_tool_prompt` to check for `properties` instead of `required`. This fixes a critical bug where tools with only optional arguments (like Home Assistant's `HassTurnOn`) were forced to be called with `{}` because they lacked required parameters. Now the model can see and use optional arguments correctly.
  2. Update: Translated the hardcoded image size hints from Chinese to Vietnamese to better support Vietnamese text-to-speech and users. (Note: Feel free to ask if you'd prefer this part to remain in Chinese or be changed to English for internationalization!)
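
The first fix can be illustrated with a small sketch. This is not the project's `_build_tool_prompt`; the function below, its output format, and the OpenAI-style tool schema it reads are assumptions. It only shows why keying on `properties` rather than `required` keeps optional arguments visible.

```python
def build_tool_prompt(tools: list[dict]) -> str:
    """Render a plain-text tool listing that includes optional parameters."""
    lines = []
    for tool in tools:
        fn = tool.get("function", tool)
        params = fn.get("parameters") or {}
        properties = params.get("properties") or {}
        required = set(params.get("required") or [])
        lines.append(f"- {fn['name']}: {fn.get('description', '')}")
        # Key on `properties`, not `required`: a tool whose arguments are
        # all optional still has arguments the model needs to see.
        if properties:
            for name, schema in properties.items():
                flag = "required" if name in required else "optional"
                lines.append(f"    {name} ({schema.get('type', 'any')}, {flag})")
        else:
            lines.append("    (no arguments; call with {})")
    return "\n".join(lines)
```

With a check on `required` instead, a tool such as `HassTurnOn` that declares only optional properties would take the "no arguments" branch.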

TriTue2011 and others added 28 commits May 22, 2026 19:18
Dockerfile: Chromium system libs (libatk, libcups, etc.)
pyproject.toml: playwright>=1.40, cloudscraper>=1.2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Use playwright (Chromium) to load phatnguoi.vn, auto-solve Turnstile
- Fill form, submit, wait for result, parse violation data
- Removes dependency on overloaded api.phatnguoi.vn endpoint

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
t64 suffix not available on python:3.13-slim (Bookworm).
Use standard package names.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lution

--with-deps automatically installs correct system packages for Chromium.
Avoids manual package name errors across Debian versions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hread

Webhook was hanging for 90-180s waiting for AI response.
Now responds instantly (200 OK) and processes AI in daemon thread.
Timeout increased to 300s for slow models.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
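
A rough sketch of the respond-then-process pattern described in this commit, assuming a framework-independent handler. `handle_webhook` and `process_update` are hypothetical names, and the `results` list stands in for whatever the real worker does with the AI response.

```python
import threading


def process_update(update: dict, results: list) -> None:
    """Stand-in for the slow AI call (hypothetical)."""
    results.append(f"processed {update['id']}")


def handle_webhook(update: dict, results: list) -> tuple[int, str, threading.Thread]:
    """Acknowledge immediately and do the slow work in a daemon thread."""
    worker = threading.Thread(
        target=process_update, args=(update, results), daemon=True
    )
    worker.start()
    # The HTTP response does not wait for the worker to finish.
    return 200, "OK", worker
```

A daemon thread is abandoned if the process exits, so work that must not be lost would need a queue or a persistent job store instead.
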
…lback

button[type=submit] matched 4 elements, first one invisible.
Narrow to #tracuu form context.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove playwright + Chromium (~200MB) from Docker image
- Use cloudscraper to call api.phatnguoi.vn with retry
- 5 retries with increasing delay for rate-limited/overloaded API

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
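
The retry behaviour can be sketched independently of cloudscraper. The helper below is an assumption, not the code in this commit: `call_with_retry`, its linear delay schedule, and the injectable `sleep` are illustrative choices, since the commit only says "5 retries with increasing delay".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `retries` times, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose for a sketch
            last_error = exc
            if attempt < retries:
                # Linear increase: base_delay, 2 * base_delay, ...
                sleep(base_delay * attempt)
    raise RuntimeError(f"all {retries} attempts failed") from last_error
```

In real use `fetch` would wrap the HTTP call and raise on a rate-limited or overloaded response.
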
- Download photos/documents via Telegram getFile API
- PDF: extract text with markitdown + pdftotext, AI summarize
- Photo: acknowledge receipt (vision support coming)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- send_photo(chat_id, bytes): Send images via multipart upload
- send_document(chat_id, bytes, filename): Send files
- Image OCR with pytesseract for received photos
- PDF processing with markitdown + AI summarize

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When user clicks Connect Hub with a new URL, update all
already-installed MCPs to use the new hub address.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_call() only parsed SSE data: lines. FastMCP Streamable HTTP
returns plain JSON. Added fallback to parse JSON directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
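
A small sketch of the fallback this commit describes, assuming the response body is available as a string. `parse_mcp_response` is a hypothetical helper, not the project's `_call()`.

```python
import json


def parse_mcp_response(body: str) -> dict:
    """Parse a response that is either SSE-framed or plain JSON."""
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload:
                return json.loads(payload)
    # Fallback: no SSE `data:` line was found, so treat the body as JSON.
    return json.loads(body)
```

This returns only the first `data:` event, which is enough for a single request/response exchange but not for a multi-event stream.
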
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cooldown on _last_fail prevented MCP sessions from retrying
after an initial connection failure. Now always retries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Change protocolVersion from 0.1.0 to 2024-11-05 (MCP spec)
- Add notifications/initialized after initialize (required by spec)
- Without these FastMCP streamable HTTP ignores the session

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
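
For reference, the two handshake messages look roughly like this. The helper name `build_handshake` and the `clientInfo` values are placeholders; the method names and the `2024-11-05` version string come from the commit message and the MCP spec.

```python
def build_handshake(
    client_name: str = "example-client", client_version: str = "0.0.0"
) -> list[dict]:
    """Return the initialize request and the follow-up notification."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # A JSON-RPC notification carries no "id" and expects no response.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return [initialize, initialized]
```

The notification is sent only after the server has answered the `initialize` request.
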
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dict.values() returns dict_values, not list. isinstance check fails
and function returns [] before reaching debug log. Root cause of
mcp_count:0 bug.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
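
The pitfall is easy to reproduce. The sketch below uses a hypothetical `as_list` helper rather than the project's function; it shows why an `isinstance(x, list)` guard rejects `dict.values()` and one way to normalise the input first.

```python
def as_list(items) -> list:
    """Normalise a dict, dict view, or other iterable into a real list."""
    if isinstance(items, dict):
        return list(items.values())
    # dict_values, tuples, and generators all pass through list().
    return list(items)


servers = {"hub": {"url": "http://example.invalid"}}
values = servers.values()

# dict_values is a view object, not a list, so this guard is False.
is_list_before = isinstance(values, list)
is_list_after = isinstance(as_list(values), list)
```

Here `is_list_before` is `False` and `is_list_after` is `True`.
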
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DeepSeek requires unique tool names across all MCP servers.
Filter duplicates keeping first occurrence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
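
A first-occurrence filter can be sketched as follows, assuming OpenAI-style tool dicts with the name under `function.name`. `dedupe_tools` is an illustrative name.

```python
def dedupe_tools(tools: list[dict]) -> list[dict]:
    """Drop tools whose name was already seen, keeping the first one."""
    seen = set()
    unique = []
    for tool in tools:
        name = tool["function"]["name"]
        if name in seen:
            continue
        seen.add(name)
        unique.append(tool)
    return unique
```

Order is preserved, so the server listed first wins when two servers expose the same tool name.
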
- Execute multiple MCP tool calls in parallel via ThreadPoolExecutor
- Single tool calls remain sequential (no overhead)
- Deduplicate tool names for DeepSeek compatibility
- Timeout 30s for parallel batch

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
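
One way the parallel path could look, as a sketch only. `run_tool_calls`, the `execute` callback, and the error dict shape are assumptions; the 30-second batch timeout and the sequential single-call path follow the commit message.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_tool_calls(calls: list, execute, timeout: float = 30.0) -> list:
    """Run tool calls, in parallel when there is more than one."""
    if not calls:
        return []
    if len(calls) == 1:
        # A single call stays on the current thread: no pool overhead.
        return [execute(calls[0])]
    pool = ThreadPoolExecutor(max_workers=len(calls))
    try:
        futures = [pool.submit(execute, call) for call in calls]
        done, _ = wait(futures, timeout=timeout)
        results = []
        for future in futures:  # keep results in call order
            if future not in done:
                results.append({"error": "timeout"})
            elif future.exception() is not None:
                results.append({"error": str(future.exception())})
            else:
                results.append(future.result())
        return results
    finally:
        # Do not block on stragglers; cancel_futures needs Python 3.9+.
        pool.shutdown(wait=False, cancel_futures=True)
```

Threads that are already running cannot be interrupted, so a timed-out call keeps running in the background until it returns.
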
- Add sensor, binary_sensor, scene, script, vacuum, camera, etc.
- Reduce per-domain max to 20 (balanced with 16 domains)
- Guide AI to search by friendly_name (natural language)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MCP Assist approach: instead of dumping all entities (12K tokens),
provide compact domain summary. AI uses ha_search_entities for
on-demand lookup. Only actionable devices listed directly.
Sensors/binary_sensors shown as name preview only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HA requests only need 3 tools (ha_call_service, ha_search, ha_get_state).
Skip loading 77 MCP tools to reduce context and latency.
Target: 16s → 2-3s for simple commands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keywords: bật, tắt, đèn, quạt, nhiệt độ, trạng thái, etc.
HA requests only get 3 tools instead of 80+ → 3-5x faster.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
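
A sketch of keyword-based tool routing along these lines. The keyword list is the subset quoted in the commit message and the three tool names come from the preceding commit; `select_tools`, the flat `{"name": ...}` tool shape, and the substring matching are assumptions.

```python
import unicodedata


def _norm(text: str) -> str:
    """Lowercase and NFC-normalise so composed and decomposed forms match."""
    return unicodedata.normalize("NFC", text).lower()


HA_KEYWORDS = tuple(
    _norm(k) for k in ("bật", "tắt", "đèn", "quạt", "nhiệt độ", "trạng thái")
)
HA_TOOL_NAMES = {"ha_call_service", "ha_search", "ha_get_state"}


def select_tools(message: str, all_tools: list[dict]) -> list[dict]:
    """Return only the HA tools when the message looks like an HA request."""
    text = _norm(message)
    if any(keyword in text for keyword in HA_KEYWORDS):
        return [t for t in all_tools if t["name"] in HA_TOOL_NAMES]
    return all_tools
```

Substring matching is cheap but can misfire on unrelated words that contain a keyword, so a production version might match on word boundaries.
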
- Pre-compute context string once, reuse until TTL expires
- Configurable refresh_interval in HA settings (default 60s)
- On first request: fetch states once, build context, cache result
- Subsequent requests within TTL: return cached string instantly
- Settings key: home_assistant.refresh_interval (seconds)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
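
The caching behaviour in the list above can be sketched as a small TTL cache. `ContextCache` and its injectable `clock` are hypothetical; only the 60-second default and the build-once-then-reuse behaviour come from the commit message.

```python
import time
from typing import Callable, Optional


class ContextCache:
    """Cache a computed context string until refresh_interval elapses."""

    def __init__(
        self,
        build: Callable[[], str],
        refresh_interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build = build
        self._ttl = refresh_interval
        self._clock = clock
        self._value: Optional[str] = None
        self._built_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._built_at >= self._ttl:
            # First request or expired TTL: rebuild and remember when.
            self._value = self._build()
            self._built_at = now
        return self._value
```

A monotonic clock avoids spurious expiry when the system time is adjusted. This sketch is not thread-safe; concurrent callers would need a lock around `get`.
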
TriTue2011 and others added 30 commits May 27, 2026 01:13
Extended file upload bypass to ALL message roles (user, system, tool).
When GetLiveContext returns >80KB of device states, the tool result is
uploaded as a file instead of being head+tail compressed.

Previously only user and system messages were checked. Tool results
from GetLiveContext could be silently truncated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
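
A sketch of the role-independent size check, under stated assumptions: the threshold is taken as 80 * 1024 bytes of UTF-8 (the commit only says ">80KB"), and `needs_file_upload` and `split_messages` are hypothetical helper names.

```python
UPLOAD_THRESHOLD_BYTES = 80 * 1024  # assumed reading of ">80KB"


def needs_file_upload(message: dict) -> bool:
    """True when the content is too large to send inline, for any role."""
    content = message.get("content") or ""
    return len(content.encode("utf-8")) > UPLOAD_THRESHOLD_BYTES


def split_messages(messages: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition messages into (inline, upload) without looking at role."""
    inline, upload = [], []
    for message in messages:
        (upload if needs_file_upload(message) else inline).append(message)
    return inline, upload
```

Because the check ignores `role`, a large `tool` result is routed to upload the same way a large `user` or `system` message is.
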
…r truncation

The ChatGPT web backend returns tool calls as XML text in content
(```xml <tool_call>), not native function-call objects. Added XML
parsing in _execute_mcp_tools_in_response so server-side tools
(GetLiveContext, ha_*) are executed instead of passed through.

Also removed the 3000-char truncation on tool results — file upload
bypass now handles large content preservation during re-query.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The chatgpt web backend (free accounts) outputs tool calls as XML text
in content deltas, not native tool_calls objects. _wrap_mcp_stream was
only checking delta.tool_calls, so XML calls were streamed through to
HA as raw text.

Now parses ```xml <tool_call> from the accumulated content, converts to
native tool_calls, strips the XML fence, and feeds into the agentic
execution loop.

This is the PRIMARY code path for chatgpt/auto → free JWT accounts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
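
A sketch of the XML-to-native conversion. The inner layout of `<tool_call>` is not shown in this PR, so the `<name>` and `<arguments>` child elements below are an assumption, as are the function name `extract_tool_calls` and the `call_N` ids. The output follows the OpenAI-style `tool_calls` shape.

```python
import re

FENCE = "`" * 3  # built at runtime to avoid a literal fence in this example
_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"xml\s*(.*?)" + re.escape(FENCE), re.DOTALL
)
# Assumed element layout; the real backend format may differ.
_CALL_RE = re.compile(
    r"<tool_call>\s*<name>(.*?)</name>\s*"
    r"<arguments>(.*?)</arguments>\s*</tool_call>",
    re.DOTALL,
)


def extract_tool_calls(content: str) -> tuple[str, list[dict]]:
    """Return (content without tool-call fences, native tool_calls)."""
    tool_calls: list[dict] = []

    def _collect(match: re.Match) -> str:
        found = _CALL_RE.findall(match.group(1))
        if not found:
            return match.group(0)  # an unrelated xml fence: keep it
        for name, raw_args in found:
            tool_calls.append({
                "id": f"call_{len(tool_calls)}",
                "type": "function",
                "function": {
                    "name": name.strip(),
                    "arguments": raw_args.strip() or "{}",
                },
            })
        return ""  # strip the fence from the visible text

    cleaned = _BLOCK_RE.sub(_collect, content).strip()
    return cleaned, tool_calls
```

In a streaming wrapper this would run on the accumulated content once the closing fence has arrived, since a partial fence cannot be parsed.
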
…nstructions

- Skip _prefetch_ha_context_if_needed when inject_ha_context() already
  inserted the device registry (detected by "Device Registry" marker).
  Double context (~98KB) overwhelmed the LLM, causing it to only read
  one device group per response.
- Add SYSTEM OVERRIDE instructions to _build_context() forcing the LLM
  to summarize ALL controllable device groups (lights, fans, ACs, doors,
  switches, locks) instead of cherry-picking 1-2 entities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously HA clients with HassTurn* tools got ha_tools=[] and had to
rely on injected context (49KB) which the LLM often only partially read.
Now they get GetLiveContext, ha_search_entities, ha_get_state so the LLM
can query live device state via tool calls, matching Gemini pipeline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(account): hard-pin chatgpt/auto to free-only pool, block codex leaking

get_text_access_token(account_type="free") now skips "free,codex" hybrid
accounts so Codex quota never burns on free-tier traffic (HA voice,
ai_task, n8n). add_accounts_with_type no longer merges "free"+"codex",
keeping the two pools strictly separate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
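
The strict pool separation can be sketched like this. The account dict shape with a comma-separated `type` field is inferred from the "free,codex" string in the commit message; `eligible_accounts` is a hypothetical helper, not `get_text_access_token`.

```python
def eligible_accounts(accounts: list[dict], account_type: str) -> list[dict]:
    """Keep only accounts whose type set is exactly {account_type}."""
    eligible = []
    for account in accounts:
        types = {t.strip() for t in account.get("type", "").split(",") if t.strip()}
        # A hybrid such as "free,codex" is excluded from the free-only pool.
        if types == {account_type}:
            eligible.append(account)
    return eligible
```

Comparing the whole set, rather than testing membership, is what keeps a hybrid account out of the free pool.
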
fix(ui): reset model numbering per provider section

Compute order number from regularModels (already scoped to current
provider) instead of enabled_models[provider] which may contain entries
from other providers, causing chatgpt numbering to continue from codex.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(ha): require GetLiveContext tool call instead of answering from static registry

Change _build_context() instructions so the LLM MUST call GetLiveContext
to fetch live device state from HA, matching Gemini pipeline behavior.
The registry is still injected for entity_id lookup (control commands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(build): use bun install instead of npm install for web-build stage

npm does not read bun.lock by default, causing sporadic failures when
a dependency is yanked or version resolution drifts. Switch the
web-build stage to bun (canonical lock file format for this project).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
