feat: auto-detect image aspect ratio from prompt #125
Open
TriTue2011 wants to merge 958 commits into
I have pushed two additional commits to this PR:
Dockerfile: Chromium system libs (libatk, libcups, etc.)
pyproject.toml: playwright>=1.40, cloudscraper>=1.2

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Use playwright (Chromium) to load phatnguoi.vn, auto-solve Turnstile
- Fill form, submit, wait for result, parse violation data
- Removes dependency on overloaded api.phatnguoi.vn endpoint

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
t64 suffix not available on python:3.13-slim (Bookworm). Use standard package names. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lution --with-deps automatically installs correct system packages for Chromium. Avoids manual package name errors across Debian versions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…hread Webhook was hanging for 90-180s waiting for AI response. Now responds instantly (200 OK) and processes AI in daemon thread. Timeout increased to 300s for slow models. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
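The acknowledge-first pattern described in this commit can be sketched as follows. This is a minimal illustration, not the PR's actual handler: `acknowledge_and_process` and `slow_handler` are hypothetical names, and the real webhook presumably wraps this in its web framework's response object.

```python
import threading
from typing import Any, Callable


def acknowledge_and_process(
    update: dict, slow_handler: Callable[[dict], Any]
) -> tuple[int, threading.Thread]:
    """Start the slow AI work in a daemon thread and return 200 right away.

    `slow_handler` is a hypothetical stand-in for the AI call; the thread is
    returned only so callers (or tests) can join it if they want to.
    """
    worker = threading.Thread(target=slow_handler, args=(update,), daemon=True)
    worker.start()
    # The HTTP layer can answer immediately; the worker keeps running.
    return 200, worker
```

Because the thread is a daemon, it does not block interpreter shutdown, which also means in-flight work is dropped if the process exits.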
…lback button[type=submit] matched 4 elements, first one invisible. Narrow to #tracuu form context. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Remove playwright + Chromium (~200MB) from Docker image
- Use cloudscraper to call api.phatnguoi.vn with retry
- 5 retries with increasing delay for rate-limited/overloaded API

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
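A retry loop with increasing delay might look like the sketch below. The linear delay schedule, the `call_with_retry` name, and the injectable `sleep` are assumptions for illustration; the commit does not say how the delay grows, and the actual HTTP call (made with cloudscraper in the PR) is left to the caller as `fetch`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fetch: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` up to `retries` times, waiting longer after each failure.

    Assumed schedule: base_delay * attempt_number (1s, 2s, 3s, ...).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error: Exception | None = None
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(base_delay * attempt)
    assert last_error is not None
    raise last_error
```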
- Download photos/documents via Telegram getFile API
- PDF: extract text with markitdown + pdftotext, AI summarize
- Photo: acknowledge receipt (vision support coming)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- send_photo(chat_id, bytes): Send images via multipart upload
- send_document(chat_id, bytes, filename): Send files
- Image OCR with pytesseract for received photos
- PDF processing with markitdown + AI summarize

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When user clicks Connect Hub with a new URL, update all already-installed MCPs to use the new hub address. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_call() only parsed SSE data: lines. FastMCP Streamable HTTP returns plain JSON. Added fallback to parse JSON directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
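The fallback described here can be sketched roughly as below. `parse_mcp_body` is a hypothetical helper, not the PR's `_call()`; it assumes the response body is already decoded to text and that the first `data:` line carries the JSON-RPC message.

```python
import json
from typing import Any


def parse_mcp_body(body: str) -> Any:
    """Parse an MCP HTTP response that is either SSE-framed or plain JSON."""
    for line in body.splitlines():
        if line.startswith("data:"):
            # SSE framing: the payload follows the "data:" prefix.
            return json.loads(line[len("data:"):].strip())
    # No SSE data line found: treat the whole body as a JSON document.
    return json.loads(body)
```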
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cooldown on _last_fail prevented MCP sessions from retrying after an initial connection failure. Now always retries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Change protocolVersion from 0.1.0 to 2024-11-05 (MCP spec)
- Add notifications/initialized after initialize (required by spec)
- Without these FastMCP streamable HTTP ignores the session

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
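For reference, the two handshake messages this commit refers to could be built like this. The `build_handshake` helper and the `clientInfo` values are placeholders for illustration; only the `protocolVersion` string and the `notifications/initialized` method name come from the commit message.

```python
def build_handshake(
    client_name: str = "example-client", client_version: str = "0.0.0"
) -> list[dict]:
    """Return the initialize request and the follow-up initialized notification."""
    initialize = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Placeholder identity; a real client sends its own name/version.
            "clientInfo": {"name": client_name, "version": client_version},
        },
    }
    # A JSON-RPC notification carries no "id", so no response is expected.
    initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
    return [initialize, initialized]
```

The notification is sent only after the server has answered the `initialize` request.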
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
dict.values() returns dict_values, not list. isinstance check fails and function returns [] before reaching debug log. Root cause of mcp_count:0 bug. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
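The pitfall is easy to reproduce in isolation. The sketch below uses a hypothetical `to_list` helper (not the PR's function) to show one way of accepting any iterable instead of checking for `list` specifically.

```python
from collections.abc import Iterable
from typing import Any


def to_list(items: Any) -> list:
    """Accept lists, tuples, dict views, and generators; reject non-iterables."""
    if isinstance(items, (str, bytes)) or not isinstance(items, Iterable):
        return []
    return list(items)


servers = {"a": {"name": "a"}}
# A dict view is iterable but is not a list, so a list check rejects it.
view_is_list = isinstance(servers.values(), list)  # False
normalized = to_list(servers.values())  # [{"name": "a"}]
```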
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DeepSeek requires unique tool names across all MCP servers. Filter duplicates keeping first occurrence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
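A first-occurrence filter of this kind can be sketched as follows. The tool shape (a dict with a top-level `name` key) and the `dedupe_tools_by_name` name are assumptions; the PR's tool objects may nest the name differently.

```python
def dedupe_tools_by_name(tools: list[dict]) -> list[dict]:
    """Keep the first tool seen for each name and drop later duplicates."""
    seen: set = set()
    unique: list[dict] = []
    for tool in tools:
        name = tool.get("name")  # assumed location of the tool name
        if name in seen:
            continue
        seen.add(name)
        unique.append(tool)
    return unique
```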
- Execute multiple MCP tool calls in parallel via ThreadPoolExecutor
- Single tool calls remain sequential (no overhead)
- Deduplicate tool names for DeepSeek compatibility
- Timeout 30s for parallel batch

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
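The batching behaviour listed above might be structured like this sketch. `run_tool_calls` and `execute` are hypothetical names, and using `Executor.map` with a batch-wide timeout is one possible reading of "Timeout 30s for parallel batch", not necessarily how the PR does it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Sequence


def run_tool_calls(
    calls: Sequence[Any], execute: Callable[[Any], Any], timeout: float = 30.0
) -> list:
    """Run a single call inline; run two or more calls in a thread pool."""
    if len(calls) <= 1:
        # No pool overhead for the common single-call case.
        return [execute(call) for call in calls]
    pool = ThreadPoolExecutor(max_workers=len(calls))
    try:
        # map() preserves input order; iterating raises TimeoutError if a
        # result is not ready within `timeout` seconds of the map() call.
        return list(pool.map(execute, calls, timeout=timeout))
    finally:
        # Do not block on stragglers; queued-but-unstarted work is cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
```

Threads that are already running cannot be interrupted, so a timed-out tool call may still finish in the background.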
- Add sensor, binary_sensor, scene, script, vacuum, camera, etc.
- Reduce per-domain max to 20 (balanced with 16 domains)
- Guide AI to search by friendly_name (natural language)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MCP Assist approach: instead of dumping all entities (12K tokens), provide compact domain summary. AI uses ha_search_entities for on-demand lookup. Only actionable devices listed directly. Sensors/binary_sensors shown as name preview only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HA requests only need 3 tools (ha_call_service, ha_search, ha_get_state). Skip loading 77 MCP tools to reduce context and latency. Target: 16s → 2-3s for simple commands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keywords: bật (turn on), tắt (turn off), đèn (light), quạt (fan), nhiệt độ (temperature), trạng thái (status), etc. HA requests only get 3 tools instead of 80+ → 3-5x faster. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
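A keyword gate of this kind could be sketched as below. The `is_ha_request` name, the substring matching, and the Unicode normalization step are assumptions; only the six keywords are taken from the commit message, which lists more ("etc.").

```python
import unicodedata

# Subset of keywords named in the commit message; the full list is longer.
HA_KEYWORDS = tuple(
    unicodedata.normalize("NFC", word)
    for word in ("bật", "tắt", "đèn", "quạt", "nhiệt độ", "trạng thái")
)


def is_ha_request(text: str) -> bool:
    """Return True when the message mentions any Home Assistant keyword."""
    # Normalize so composed and decomposed Vietnamese diacritics compare equal.
    lowered = unicodedata.normalize("NFC", text).lower()
    return any(keyword in lowered for keyword in HA_KEYWORDS)
```

Plain substring matching can produce false positives inside longer words, so a real router may want token-level matching.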
- Pre-compute context string once, reuse until TTL expires
- Configurable refresh_interval in HA settings (default 60s)
- On first request: fetch states once, build context, cache result
- Subsequent requests within TTL: return cached string instantly
- Settings key: home_assistant.refresh_interval (seconds)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
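The caching behaviour listed above amounts to a small TTL cache, sketched here. `ContextCache`, `build`, and the injectable `clock` are hypothetical; in the PR the TTL would come from `home_assistant.refresh_interval`.

```python
import time
from typing import Callable


class ContextCache:
    """Build a context string on first use and reuse it until the TTL expires."""

    def __init__(
        self,
        build: Callable[[], str],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build = build
        self._ttl = ttl
        self._clock = clock
        self._has_value = False
        self._value = ""
        self._built_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if not self._has_value or now - self._built_at >= self._ttl:
            # First request or expired entry: rebuild and restart the TTL.
            self._value = self._build()
            self._built_at = now
            self._has_value = True
        return self._value
```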
Extended file upload bypass to ALL message roles (user, system, tool). When GetLiveContext returns >80KB of device states, the tool result is uploaded as a file instead of being head+tail compressed. Previously only user and system messages were checked. Tool results from GetLiveContext could be silently truncated. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r truncation The ChatGPT web backend returns tool calls as XML text in content (```xml <tool_call>), not native function-call objects. Added XML parsing in _execute_mcp_tools_in_response so server-side tools (GetLiveContext, ha_*) are executed instead of passed through. Also removed the 3000-char truncation on tool results — file upload bypass now handles large content preservation during re-query. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The chatgpt web backend (free accounts) outputs tool calls as XML text in content deltas, not native tool_calls objects. _wrap_mcp_stream was only checking delta.tool_calls, so XML calls were streamed through to HA as raw text. Now parses ```xml <tool_call> from the accumulated content, converts to native tool_calls, strips the XML fence, and feeds into the agentic execution loop. This is the PRIMARY code path for chatgpt/auto → free JWT accounts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r multi-step agentic loops
…ive backend compatibility
…thread route through handler chain
…gpt.com provider re-dispatch
…single-response HA constraint)
… _file_upload_store
…for chatgpt free, truncate to 70KB
…cination on random entities
…avoid AI distraction
…owning in GPT-4o-mini
…nstructions

- Skip _prefetch_ha_context_if_needed when inject_ha_context() already inserted the device registry (detected by "Device Registry" marker). Double context (~98KB) overwhelmed the LLM, causing it to only read one device group per response.
- Add SYSTEM OVERRIDE instructions to _build_context() forcing the LLM to summarize ALL controllable device groups (lights, fans, ACs, doors, switches, locks) instead of cherry-picking 1-2 entities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously HA clients with HassTurn* tools got ha_tools=[] and had to rely on injected context (49KB) which the LLM often only partially read. Now they get GetLiveContext, ha_search_entities, ha_get_state so the LLM can query live device state via tool calls, matching Gemini pipeline. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(account): hard-pin chatgpt/auto to free-only pool, block codex leaking

get_text_access_token(account_type="free") now skips "free,codex" hybrid accounts so Codex quota never burns on free-tier traffic (HA voice, ai_task, n8n). add_accounts_with_type no longer merges "free"+"codex", keeping the two pools strictly separate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(ui): reset model numbering per provider section

Compute order number from regularModels (already scoped to current provider) instead of enabled_models[provider] which may contain entries from other providers, causing chatgpt numbering to continue from codex.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(ha): require GetLiveContext tool call instead of answering from static registry

Change _build_context() instructions so the LLM MUST call GetLiveContext to fetch live device state from HA, matching Gemini pipeline behavior. The registry is still injected for entity_id lookup (control commands).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(build): use bun install instead of npm install for web-build stage

npm does not read bun.lock by default, causing sporadic failures when a dependency is yanked or version resolution drifts. Switch the web-build stage to bun (canonical lock file format for this project).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Automatically detects the image aspect ratio (for example 16:9, 1:1, 4:3) from the prompt text and uses it to override the default configuration, which matters when Home Assistant or another client always sends a default size. In effect, keywords found in the user's original request override the `size` parameter.
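As a rough illustration of the idea, the sketch below scans a prompt for a `W:H` token and, if it is a supported ratio, swaps in a matching size. Everything here is an assumption for illustration: the function names, the set of supported ratios, and the caller-supplied ratio-to-size mapping are not taken from the PR's code.

```python
import re
from typing import Mapping, Optional

# Assumed set of ratios worth honoring; restricting to a known set avoids
# treating clock times such as "10:30" as an aspect ratio.
SUPPORTED_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4"}

_RATIO_RE = re.compile(r"(?<!\d)(\d{1,2})\s*:\s*(\d{1,2})(?!\d)")


def detect_aspect_ratio(prompt: str) -> Optional[str]:
    """Return the first supported W:H ratio mentioned in the prompt, if any."""
    for width, height in _RATIO_RE.findall(prompt):
        ratio = f"{int(width)}:{int(height)}"
        if ratio in SUPPORTED_RATIOS:
            return ratio
    return None


def resolve_size(
    prompt: str, requested_size: str, ratio_to_size: Mapping[str, str]
) -> str:
    """Prefer a size implied by the prompt; otherwise keep the client's size."""
    ratio = detect_aspect_ratio(prompt)
    if ratio is None:
        return requested_size
    return ratio_to_size.get(ratio, requested_size)
```

For example, with a hypothetical mapping `{"16:9": "1792x1024"}`, a prompt containing "16:9" would replace a client default of `1024x1024`, while a prompt with no ratio leaves the requested size untouched.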