Native Tool Calling and Model Registry Cleanup and Better Runtime Info#769
Ninot1Quyi wants to merge 61 commits
Conversation
Updated local model recommendations and installation instructions.
Added a note to avoid using Ollama with local models.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
typo?
Merge PR 756 directly so its stale-provider removal commits remain visible before the branch continues with provider registry cleanup.
Constraint: Preserve visible PR merge history for PR 756
Confidence: high
Scope-risk: narrow
Related: #756
Tested: GLHF deletion conflict resolved by accepting the PR deletion
Not-tested: Synthetic provider replacement
Co-authored-by: OmX <omx@oh-my-codex.dev>
Model selection now resolves provider ids through a project registry instead of baking endpoint and key details into many one-off model classes. The example registry, profiles, shared transports, and tests move the integration surface toward OpenClaw-style format names while keeping local key material ignored.
Constraint: Real provider credentials must stay in ignored llm_providers.json, not in committed examples
Rejected: Keep per-provider class copies | duplicated OpenAI-compatible transports made stale providers harder to remove
Confidence: medium
Scope-risk: broad
Directive: Do not reintroduce provider-specific OpenAI-compatible classes unless a provider needs a genuinely different protocol
Tested: git diff --cached --check; git grep --cached secret-pattern scan only found placeholders/test refresh_token strings; npm test passed before commit split
Not-tested: Live provider matrix after commit split
Co-authored-by: OmX <omx@oh-my-codex.dev>
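To illustrate the registry approach described above, here is a minimal sketch of id-based provider resolution. The registry shape, field names, and the `resolveProvider` helper are illustrative assumptions, not the branch's actual code; the real schema lives in `settings_llm_providers.example.json`.

```javascript
// Hypothetical registry-based provider resolution. A single shared transport
// is selected by the "format" field instead of one class per provider.
const registry = {
  providers: {
    openrouter: {
      format: "openai_chat",                  // shared OpenAI-compatible transport
      baseUrl: "https://openrouter.ai/api/v1",
      keyName: "OPENROUTER_API_KEY"           // real keys stay in ignored local config
    }
  }
};

// Look up a provider entry by id; unknown ids fail loudly.
function resolveProvider(registry, providerId) {
  const entry = registry.providers[providerId];
  if (!entry) throw new Error(`Unknown provider id: ${providerId}`);
  return { id: providerId, ...entry };
}

const provider = resolveProvider(registry, "openrouter");
console.log(provider.format, provider.baseUrl);
```

The point of the design is that adding a provider becomes a data change (one registry record) rather than a code change (a new transport class).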
Native-tool-capable model responses now flow through command schemas instead of relying on assistant text commands. Default prompts move into markdown files so native-mode hygiene tests can keep legacy command examples out of model-facing context.
Constraint: Human !command parsing must remain available while AI responses use native tool calls
Rejected: Teach models both native tools and text command syntax | mixed examples caused providers to emit legacy text commands in native mode
Confidence: high
Scope-risk: moderate
Directive: Keep prompt markdown and schema generation aligned when adding or renaming commands
Tested: npm test passed 49/49; rg glhf/GLHF/GHLF/glhf.chat returned no matches; git diff --cached --check
Not-tested: Live Minecraft server interaction
Co-authored-by: OmX <omx@oh-my-codex.dev>
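The schema-generation idea above can be sketched as follows. The command definition shape and the `toToolSchema` helper are assumptions for illustration; the output follows the OpenAI-style function-tool shape rather than the project's exact serializer.

```javascript
// Hypothetical command definition; the real project derives these from its
// command table rather than hand-written objects.
const commands = [
  {
    name: "goToPlayer",
    description: "Walk to the named player.",
    params: {
      player_name: { type: "string", description: "Target player name." }
    }
  }
];

// Convert one command definition into a native tool schema, so the model
// emits structured tool calls instead of legacy !command text.
function toToolSchema(cmd) {
  return {
    type: "function",
    function: {
      name: cmd.name,
      description: cmd.description,
      parameters: {
        type: "object",
        properties: cmd.params,
        required: Object.keys(cmd.params)
      }
    }
  };
}

const tools = commands.map(toToolSchema);
console.log(tools[0].function.name);
```

Because the schema is generated from the same source as human `!command` parsing, renaming a command only has to happen in one place, which is what the Directive above is protecting.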
Merge PR 752 first so its original provider-addition commit remains visible before the branch adapts those providers into the shared registry style.
Constraint: Preserve visible PR merge history for PR 752
Confidence: high
Scope-risk: moderate
Related: #752
Tested: PR provider files merged before registry adaptation
Not-tested: Live provider calls at merge point
Co-authored-by: OmX <omx@oh-my-codex.dev>
The PR added eight OpenAI-compatible providers as separate model classes. This branch now keeps that contribution as provider registry entries so the shared OpenAI completions transport handles them without reintroducing boilerplate adapters.
Constraint: New provider support should use llm_providers registry records rather than one class per OpenAI-compatible endpoint
Rejected: Keep the PR 752 class files | they duplicate the shared OpenAI-compatible transport and weaken the new configuration style
Confidence: high
Scope-risk: moderate
Related: #752
Tested: npm test passed 50/50; JSON parse check for llm_providers example/local config; git diff --cached --check
Not-tested: Live calls to the eight added providers
Co-authored-by: OmX <omx@oh-my-codex.dev>
The provider migration notes are useful locally but should not ship with the PR. Stop tracking the docs directory and ignore it so future local notes do not reappear as untracked branch content.
Constraint: User requested removing docs from git while preserving the rest of the current branch
Confidence: high
Scope-risk: narrow
Tested: git diff --cached --name-status confirmed only docs removal and .gitignore change were staged
Not-tested: Full test suite; no runtime code changed in this commit
Co-authored-by: OmX <omx@oh-my-codex.dev>
Native tool calls now stay in structured history with matching tool result turns so follow-up requests can replay the conversation across OpenAI, Responses, Anthropic, Gemini, Codex, and Replicate adapters without losing call/result pairing.
Constraint: Model-facing history must preserve native tool call/result structure while keeping human text-command syntax separate
Rejected: Store tool results as plain system messages | protocol adapters cannot reliably reconstruct required tool result fields from prose
Confidence: high
Scope-risk: moderate
Directive: When adding provider adapters, route native tool history through the shared native_tools serializers instead of ad hoc message shaping
Tested: npm test passed 62/62; git grep --cached -i openclaw returned no matches; staged secret/path checks excluded local key files
Not-tested: Live Minecraft server with every provider
Co-authored-by: OmX <omx@oh-my-codex.dev>
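The call/result pairing invariant described above can be sketched with the OpenAI Chat Completions message shape (field names from that protocol; the history contents and the `pairingIsValid` check are illustrative, not the project's serializer):

```javascript
// Structured history keeps each assistant tool_call paired with a tool-role
// result turn, so any adapter can replay the conversation losslessly.
const history = [
  { role: "user", content: "Collect some wood." },
  {
    role: "assistant",
    content: null,
    tool_calls: [
      { id: "call_1", type: "function",
        function: { name: "collectBlocks", arguments: '{"type":"oak_log","num":4}' } }
    ]
  },
  { role: "tool", tool_call_id: "call_1", content: "Collected 4 oak_log." }
];

// Replay is only safe when every tool_call id has a matching tool result.
function pairingIsValid(history) {
  const callIds = history.flatMap(m => (m.tool_calls ?? []).map(c => c.id));
  const resultIds = history
    .filter(m => m.role === "tool")
    .map(m => m.tool_call_id);
  return callIds.every(id => resultIds.includes(id));
}

console.log(pairingIsValid(history)); // true
```

This is also why the Rejected option above fails: flattening results into prose system messages destroys the `tool_call_id` linkage that adapters need to rebuild.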
The default runtime now starts from the root Andy profile and keeps the larger provider catalog as commented presets at the bottom of settings.js. Profiles expose empty embedding/code/vision model selectors so users can discover the shape without enabling extra transports, while empty selectors still fall back to the main model. The same pass removes the obsolete Google Translate dependency and adds JSONL chat-history tracing so prompt, tool-call, tool-result, and compression behavior can be inspected after a run.
Constraint: Codex ChatGPT auth must remain project-local and the example auth placeholder stays shortened for first-run login testing.
Rejected: Keep five smoke profiles enabled by default | too noisy for normal startup.
Rejected: Preserve per-profile Andy reasoning prompts | duplicated global prompt templates and diverged memory language.
Confidence: high
Scope-risk: moderate
Directive: Keep defaults/tasks profile fragments free of selectable model placeholders; only root selectable profiles should expose them.
Tested: npm test (65/65); eslint on touched JS/tests; git diff --check
Not-tested: Manual Minecraft startup after commit
Co-authored-by: OmX <omx@oh-my-codex.dev>
This checkpoint preserves the current native-tool provider registry, project Codex login storage, prompt markdown migration, runtime chat observability, token accounting, state update persistence, and internal coding request isolation before the next message-lifecycle cleanup.
Constraint: User requested committing the current working state before further refactor work.
Rejected: Continue refactoring before a checkpoint | would make regression isolation harder across the broad native-tool changes.
Confidence: medium
Scope-risk: broad
Directive: Keep subsequent ReAct message-management changes small and append-only-cache focused.
Tested: npm test (113/113 passing)
Not-tested: Live Minecraft server and paid provider matrix after this checkpoint
The ReAct request assembly and Runtime trace rendering had grown across several call sites, making it hard to tell whether UI duplication, internal coding requests, or transport metadata were mutating the active conversation prefix. This commit centralizes ReAct message assembly, separates internal coding cache scope from conversation cache scope, records instruction contexts in trace, and moves chat trace projection into a dedicated display-only module.
Constraint: Prompt-cache debugging requires trace evidence without logging auth secrets or mutating model history.
Constraint: newAction coding requests must remain tool-internal and not become top-level ReAct history.
Rejected: Keep fixing individual UI/request edge cases inline | scattered patches made cache-prefix behavior harder to reason about.
Rejected: Store runtime instructions only in notepad | user explicitly needed them in trace.
Confidence: high
Scope-risk: moderate
Directive: Do not add transient state or coding requests directly to ReAct history; route outbound user context through ReactMessageManager and keep chat_trace_projector display-only.
Tested: node --check src/agent/history.js
Tested: node --check src/mindcraft/public/chat_trace_projector.js
Tested: npm test (119 passing)
Tested: git diff --check
A human !stop command could stop the Mineflayer action while the native tool call remained open in chat history and Runtime UI. Track active native tool call ids, close them with an explicit interrupted tool result before waiting for the action loop to unwind, and synthesize a result for tool calls interrupted before execution starts.
Constraint: Native tool protocols require every assistant tool_call to be followed by a tool result before later requests can preserve prefix/cache stability.
Rejected: Waiting for ActionManager.stop to finish before writing the result | the UI can keep showing running and the next user message may be sent before closure is visible.
Confidence: high
Scope-risk: narrow
Directive: Do not add alternate UI-only running-state patches; close native tool calls in history so model history, trace UI, and cache prefix all agree.
Tested: node --check src/agent/agent.js; node --check src/agent/commands/actions.js; npm test; git diff --check
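A minimal sketch of the interruption-closing step, assuming a mutable history array and a tracked list of open call ids (the helper name, message wording, and data shapes are hypothetical — the real implementation lives in the agent's action/command code):

```javascript
// When !stop interrupts an action, write an explicit "interrupted" tool
// result for every open native tool call. This keeps model history, the
// trace UI, and the cache prefix in agreement instead of leaving a
// dangling tool_call with no result turn.
function closeInterruptedToolCalls(history, openCallIds) {
  for (const id of openCallIds) {
    history.push({
      role: "tool",
      tool_call_id: id,
      content: "Action interrupted by user !stop before completion."
    });
  }
  openCallIds.length = 0; // nothing left dangling for the next request
  return history;
}

// Example: one open call gets closed before the action loop unwinds.
const history = [
  { role: "assistant", content: null,
    tool_calls: [{ id: "call_9", type: "function",
                   function: { name: "newAction", arguments: "{}" } }] }
];
const openCallIds = ["call_9"];
closeInterruptedToolCalls(history, openCallIds);
```

Writing the result *before* waiting on the action loop is the key ordering choice: the next user message can arrive at any time, and it must see a closed pair.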
Kimi's Anthropic-compatible endpoint resolved AAAA records locally, but Node's node-fetch path timed out on IPv6 while curl and IPv4 requests succeeded. Add an explicit provider option for Anthropic transports to force IPv4 and enable it for the Kimi preset so native tool calls can reach the coding endpoint.
Constraint: Do not add proxy requirements or global networking side effects; the fix must be provider-scoped and config-driven.
Rejected: Adding local proxy settings | would make other users' setups fail and does not address the DNS family mismatch.
Confidence: high
Scope-risk: narrow
Tested: node --check src/models/anthropic_messages.js; npm test; LIVE_FUNCTION_CALL_INCLUDE='kimi' LIVE_FUNCTION_CALL_TIMEOUT_MS=60000 node scripts/smoke/live_function_call_smoke.js; git diff --check
This reverts commit a452546.
Kimi's coding endpoint accepts Chat Completions-style requests when called with coding-agent headers, while the Anthropic SDK path timed out in this environment. The provider registry now models Kimi as an OpenAI-compatible provider and the shared transport can opt into curl without forcing IPv4 or requiring proxy settings.
Constraint: Kimi coding endpoint rejects generic clients without a coding-agent User-Agent
Rejected: Keep the IPv4 workaround | Kimi CLI does not use forced IPv4 and the user requested reverting it
Confidence: high
Scope-risk: narrow
Tested: node --check src/models/openai_compatible.js
Tested: npm test -- tests/llm_providers_config.test.js
Tested: LIVE_FUNCTION_CALL_INCLUDE='kimi' LIVE_FUNCTION_CALL_TIMEOUT_MS=60000 node scripts/smoke/live_function_call_smoke.js
The Kimi profile already selected kimi-k2.6, so the shared provider registry now matches that default instead of relying on the endpoint's internal normalization to kimi-for-coding.
Constraint: Kimi remains routed through the OpenAI-compatible coding endpoint with coding-agent headers
Confidence: high
Scope-risk: narrow
Tested: git diff --check -- settings_llm_providers.example.json tests/llm_providers_config.test.js
Tested: npm test -- tests/llm_providers_config.test.js
Thinking/reasoning output is now captured from each supported protocol, attached to native tool-call turns, replayed where providers require it, and surfaced in runtime chat history without mixing internal coding turns into the ReAct stream. Kimi's OpenAI-compatible endpoint requires reasoning_content to be present when thinking is enabled, so assistant tool-call history now round-trips that field.
Constraint: Kimi rejects replayed assistant tool-call messages when thinking is enabled but reasoning_content is absent
Constraint: Anthropic thinking blocks must remain signed, so unsigned text is displayed/traced but not synthesized as Anthropic replay blocks
Rejected: Store thinking only in UI trace | would not fix provider replay errors or saved history fidelity
Confidence: high
Scope-risk: moderate
Directive: Keep thinking metadata append-only on history turns; do not mutate previous request messages for display-only UI changes
Tested: node --check changed JS files
Tested: npm test -- tests/native_tools.test.js tests/openai_compatible.test.js tests/codex_chatgpt.test.js tests/chat_history_trace.test.js tests/agent_native_text_policy.test.js tests/memory_summary_tool_history.test.js (runs full suite via package script; 127 passed)
Tested: LIVE_FUNCTION_CALL_INCLUDE='kimi' LIVE_FUNCTION_CALL_TIMEOUT_MS=60000 node scripts/smoke/live_function_call_smoke.js
Tested: Live two-turn Kimi tool replay smoke with captured reasoning_content
Not-tested: Browser visual inspection of the new Thinking details block
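The reasoning_content round-trip above can be sketched as a replay serializer. The helper name and turn shape are assumptions; only the `reasoning_content` field name comes from the commit (it is Kimi's OpenAI-compatible field for thinking output).

```javascript
// Convert a stored history turn back into a replayable assistant message.
// When the provider returned reasoning_content on a tool-call turn, the
// replay must carry it, because Kimi rejects thinking-enabled replays
// that drop the field.
function toReplayMessage(turn) {
  const msg = {
    role: "assistant",
    content: turn.content ?? null,
    tool_calls: turn.tool_calls
  };
  if (turn.reasoning_content !== undefined) {
    msg.reasoning_content = turn.reasoning_content; // append-only metadata
  }
  return msg;
}

const storedTurn = {
  content: null,
  reasoning_content: "I should collect wood first.",
  tool_calls: [{ id: "call_1", type: "function",
                 function: { name: "collectBlocks", arguments: "{}" } }]
};
console.log(toReplayMessage(storedTurn).reasoning_content);
```

Note the asymmetry stated in the Constraints: this round-trip is safe for Kimi's field, but Anthropic thinking blocks are signed, so the same trick must not synthesize Anthropic replay blocks from unsigned text.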
Runtime already recorded provider thinking on llm_response and tool_call events, but native tool activity only rendered response-level thinking. Showing tool-call thinking beside the action output makes the field visible where users inspect native tool work without mutating history or request payloads.
Constraint: Existing traces only show thinking when the provider returns reasoning content; Codex/Gemini runs may legitimately have none.
Confidence: high
Scope-risk: narrow
Directive: Keep Runtime UI display-only; do not mutate trace/history to synthesize thinking fields.
Tested: npm test -- tests/agent_native_text_policy.test.js (package script ran full suite; 127 passed)
Not-tested: Manual browser refresh after commit
Concurrent chat messages could enter ReAct handling before the previous LLM response had closed its assistant/tool turns. The second request then used an unstable prefix such as user1,user2, while the eventual history became user1,assistant1,user2, breaking prompt-cache matching. Conversation handling is now FIFO for normal messages while urgent control commands can still bypass to stop the agent.

Compaction also counted compact boundary and summary turns against the next context budget, causing immediate repeated compaction after only one short exchange. The model context now excludes the boundary and the trigger counts only turns added after the latest compact summary.

Constraint: !stop/!stfu/!restart must remain responsive even while a long action or request is active.
Rejected: Cancel in-flight requests on every newer message | providers may not support cancellation and it would leave partially persisted user turns.
Confidence: high
Scope-risk: moderate
Directive: Keep all ReAct model requests serialized unless pending turns are made transactionally invisible to shared history.
Tested: node --check src/agent/agent.js && node --check src/agent/history.js
Tested: npm test -- tests/agent_native_text_policy.test.js tests/chat_history_trace.test.js (package script ran full suite; 129 passed)
Not-tested: Manual browser/game repro after restart
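The FIFO-with-urgent-bypass serialization above can be sketched as a promise chain. The class name and API are hypothetical; the real agent serializes inside its ReAct handling rather than through a standalone queue object.

```javascript
// Serialize normal message handling behind a promise chain so a second
// message cannot build its request on an unstable history prefix, while
// urgent control commands (!stop/!stfu/!restart) bypass the queue.
class MessageQueue {
  constructor() {
    this.tail = Promise.resolve();
  }
  enqueue(task, { urgent = false } = {}) {
    if (urgent) return Promise.resolve().then(task); // runs without waiting
    const run = this.tail.then(task, task);          // FIFO behind in-flight work
    this.tail = run.catch(() => {});                 // keep the chain alive on errors
    return run;
  }
}

// Example: msg2 waits for msg1's (slow) handling, but !stop jumps ahead.
const order = [];
const q = new MessageQueue();
q.enqueue(async () => {
  await new Promise(r => setTimeout(r, 10)); // simulate a slow LLM round-trip
  order.push("msg1");
});
q.enqueue(async () => { order.push("msg2"); });
q.enqueue(() => { order.push("!stop"); }, { urgent: true });
```

This matches the Rejected note: serializing is cheaper and safer than cancelling in-flight provider requests, which many providers do not support cleanly.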
Runtime history loading used only one candidate trace file, so the web UI showed a short slice after restarts or compaction. The loader now merges all per-session trace jsonl files, dedupes the live latest trace, and expands compact archives when trace events reference archived turns.

The per-agent Disconnect button now sends a targeted stop event to the selected agent socket before falling back to process termination, avoiding broad shutdown behavior from UI disconnects.

Constraint: load_memory=false must still start a fresh UI session without loading saved history.
Rejected: Teach the front-end to recursively fetch archive files | server-side aggregation keeps the browser simple and avoids exposing arbitrary file paths.
Confidence: high
Scope-risk: moderate
Directive: Keep chat history aggregation read-only; do not mutate memory or trace files while rendering Runtime.
Tested: node --check src/mindcraft/mindserver.js && node --check src/agent/mindserver_proxy.js
Tested: npm test -- tests/mindserver_chat_history.test.js tests/agent_native_text_policy.test.js (package script ran full suite; 132 passed)
Not-tested: Manual multi-agent browser disconnect after commit
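The merge-and-dedupe step above can be sketched as a pure function over file contents. The `id` field used as the dedupe key is an assumption about the trace event shape; the real loader also expands compact archives, which is omitted here.

```javascript
// Merge several per-session .jsonl trace files into one event list,
// skipping blank lines and deduplicating events by id (falling back to
// the raw line when no id is present). Read-only: inputs are untouched.
function mergeTraces(fileContents) {
  const seen = new Set();
  const events = [];
  for (const text of fileContents) {
    for (const line of text.split("\n")) {
      if (!line.trim()) continue;
      const event = JSON.parse(line);
      const key = event.id ?? line;
      if (seen.has(key)) continue;
      seen.add(key);
      events.push(event);
    }
  }
  return events;
}

// Example: the live trace overlaps the archived one; the overlap is dropped.
const archived = '{"id":"e1","type":"llm_response"}\n{"id":"e2","type":"tool_call"}\n';
const live = '{"id":"e2","type":"tool_call"}\n{"id":"e3","type":"tool_result"}\n';
const merged = mergeTraces([archived, live]);
console.log(merged.map(e => e.id));
```

Doing this server-side (per the Rejected note) keeps the browser from fetching arbitrary archive paths.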
uukelele
left a comment
I only managed to view 50/118 files, but overall I think it is good. I have left some comments for improvement, and I will approve or request further changes once I am able to test this locally.
Co-authored-by: uukelele <robustrobot11@gmail.com>
Thanks for your review!!!
The provider example should not ship with a project-specific Azure resource or
implicit deployment names. The Azure chat and embedding entries now use clear
placeholders and comments that tell users to supply their own resource endpoint
and deployment names.
Constraint: Review feedback asked that the Azure embedding example not be left ambiguous or pre-filled incorrectly
Confidence: high
Scope-risk: narrow
Tested: node -e "JSON.parse(require('fs').readFileSync('settings_llm_providers.example.json','utf8'))"
Tested: git diff --check -- settings_llm_providers.example.json
Not-tested: Live Azure request with a real deployment
Co-authored-by: OmX <omx@oh-my-codex.dev>
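A placeholder-style entry consistent with the description above might look like the following. This is an illustrative sketch, not the actual contents of `settings_llm_providers.example.json`: the `baseUrl`/`defaultModel` field names appear elsewhere in this PR, but the `format` and `keyName` values here are assumptions.

```json
{
  "azure-openai": {
    "format": "azure_openai",
    "baseUrl": "https://<your-resource-name>.openai.azure.com",
    "defaultModel": "<your-chat-deployment-name>",
    "keyName": "AZURE_OPENAI_API_KEY"
  }
}
```

The point of the change is that nothing project-specific ships in the example: users must substitute their own resource endpoint and deployment names before the entry works.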
The Azure profile should demonstrate the matching embedding provider instead
of leaving embedding blank. It now points to the Azure embedding provider and
uses a deployment-name placeholder that users replace with their own Azure
OpenAI embedding deployment.
Constraint: User explicitly requested profiles/azure.json prefill the embedding provider
Confidence: high
Scope-risk: narrow
Tested: node -e "JSON.parse(require('fs').readFileSync('profiles/azure.json','utf8'))"
Tested: git diff --check -- profiles/azure.json
Not-tested: Live Azure embedding request with a real deployment
Co-authored-by: OmX <omx@oh-my-codex.dev>
Bring in the Andy 4.2 profile and README guidance from the upstream PR while
converting the profile to this branch's provider-registry schema. LM Studio is
registered as a local OpenAI-compatible provider/embedding provider, and the
OpenAI-compatible adapter now respects an explicit null keyName for no-key
local servers.
Constraint: Preserve the PR merge record and contributor commits instead of squashing
Constraint: Profiles must use the local provider/model schema, not the PR's legacy api/model shorthand
Rejected: Keep the PR profile format unchanged | it bypasses the provider registry used by this branch
Confidence: high
Scope-risk: moderate
Tested: node -e "for (const f of ['settings_llm_providers.example.json','settings_llm_providers.json','profiles/andy-4.2.json']) if (require('fs').existsSync(f)) JSON.parse(require('fs').readFileSync(f,'utf8'))"
Tested: MINDCRAFT_LLM_PROVIDERS_PATH=settings_llm_providers.example.json node --input-type=module provider resolution smoke
Tested: npx eslint src/models/_model_map.js src/models/openai_compatible.js
Tested: git diff --check
Not-tested: Live LM Studio Andy 4.2 request; requires local LM Studio server and model
Co-authored-by: OmX <omx@oh-my-codex.dev>
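The explicit-null `keyName` behavior described above can be sketched as follows. The helper name and error message are illustrative; only the distinction between `keyName: null` (no-key local server) and a missing key (configuration error) reflects the change.

```javascript
// Build auth headers for an OpenAI-compatible provider entry.
// keyName === null  -> local server such as LM Studio: send no auth header.
// keyName set       -> look up the key; failing loudly beats a confusing 401.
function buildAuthHeaders(provider, env) {
  if (provider.keyName === null) return {};
  const key = env[provider.keyName];
  if (!key) throw new Error(`Missing API key: set ${provider.keyName}`);
  return { Authorization: `Bearer ${key}` };
}

// LM Studio-style local provider: explicitly no key required.
console.log(buildAuthHeaders({ keyName: null }, {}));
// Hosted provider: key pulled from the environment.
console.log(buildAuthHeaders({ keyName: "OPENAI_API_KEY" },
                             { OPENAI_API_KEY: "sk-placeholder" }));
```

Making "no key" an explicit `null` rather than an absent field lets the adapter distinguish intentional key-less local servers from misconfigured hosted providers.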
The dev native-tool loop is a local smoke utility and should not be versioned with the shared repository. Removing it from the index keeps existing local copies available without making the script part of future checkouts.
Constraint: User requested removing scripts/smoke/dev_native_tool_loop.js from git without adding a gitignore rule
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep this script local unless it becomes a documented shared smoke test
Tested: Verified cached diff contains only scripts/smoke/dev_native_tool_loop.js deletion
Not-tested: No runtime tests needed for index-only removal
Co-authored-by: OmX <omx@oh-my-codex.dev>
The Azure profile now points users at the provider registry template for chat and embedding deployment endpoint configuration. This keeps profile selection concise while making the Azure baseUrl/defaultModel setup path visible where users first inspect the profile.
Constraint: profiles are parsed as strict JSON, so guidance must be a JSON field instead of a comment token
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep Azure deployment names and endpoints in settings_llm_providers*.json rather than hardcoding user-specific values in profiles/azure.json
Tested: node -e "JSON.parse(require('fs').readFileSync('profiles/azure.json','utf8'))"
Not-tested: Live Azure request; documentation-only profile guidance
Co-authored-by: OmX <omx@oh-my-codex.dev>
Happy to test this if needed.
Yeah, you can try testing this and report back with the results. See if there are any bugs, or any edge cases that worked before but regressed.
Thanks so much! If you find any issues, please feel free to pull this branch and make fixes on top of it. If you need test keys, you can reach me on Discord or by email, and I’d be happy to provide access to some model subscriptions for testing.
Verified functionality
Update
- `settings_llm_providers.json`, including Kimi, MiniMax, OpenRouter, Gemini, Codex, Replicate, and other presets.
- Default prompts moved into `profiles/defaults/prompts/` for easier review and prompt maintenance.
- `newAction`/coding requests made independent from the main conversation request path, so coding does not pollute conversation cache/state.
- A responder (`botResponder`) that forks the current conversation context instead of rebuilding or replacing the system prompt.

This branch has merged the following 4 PRs:
#680
#756
#752
#744
Merging this branch will resolve a total of 5 PRs, including the current one.
This branch was assisted by Codex GPT-5.5 xhigh & Oh-My-Codex, and has already gone through code-quality checks and optimization.
Stronger compaction and UI improvements