Skip to content

Jarvis 2026-05 hardening + personalization#369

Open
machj8968-lab wants to merge 12 commits into
open-jarvis:mainfrom
machj8968-lab:feat/jarvis-hardening-2026-05
Open

Jarvis 2026-05 hardening + personalization#369
machj8968-lab wants to merge 12 commits into
open-jarvis:mainfrom
machj8968-lab:feat/jarvis-hardening-2026-05

Conversation

@machj8968-lab
Copy link
Copy Markdown

Summary

Two-phase work landed on this branch:

Phase A — Hardening (P0/P1 fixes from a full health audit)

  • jarvis ask now respects config.agent.default_agent instead of silently bypassing the configured agent / system prompt / tools / memory when --agent is omitted.
  • Extracted prompt middleware (DateTimeInjector) from BaseAgent._build_messages; timezone configurable, no longer hard-coded Asia/Taipei; no more double SYSTEM message when context already supplies one.
  • Server streaming now routes through the agent bridge when the server-side agent has tools (the frontend never sends tools, so previously every Web/Desktop chat silently bypassed tools / skills / web_search). Direct stream also injects default_system_prompt + middleware when no SYSTEM is present.
  • Centralised runtime_tools factory used by serve.py and SystemBuilder; tool resolution order is now override -> config.tools.enabled -> config.agent.tools -> defaults, and SkillManager is mounted by default.
  • /v1/skills is real: list returns manifest metadata, install writes manifests and re-discovers, delete removes the file.
  • jarvis chat agent mode now carries conversation history, so multi-turn follow-ups stay coherent.
  • Skill arguments_template rendering is JSON-safe (quotes / newlines in values no longer break json.loads).
  • Hooked up the Rust extension via brew toolchain + maturin (memory / security / sqlite test failures gone).
  • frontend/ npm audit fix — 13 vulns (7 high) → 0.
  • whatsapp_baileys_bridge/ npm audit fix + Baileys 7.x API compat — 5 vulns (1 critical) → 0.
  • .gitignore cleanup; dropped stray test artifacts (Inline, :memory:, MagicMock/, ~/); ignored JarvisNative/ (Swift HUD subproject with nested .git).

Phase B — Personalization ("more aware of who it's talking to")

  • ~/.openjarvis/USER.md — structured markdown profile, Identity / Preferences / Facts / Relations / Notes sections, key-prefix routing (user.*, pref.*, fact.*, relation.*).
  • ProfileConsolidator — folds memory backend rows AND the legacy user_facts SQLite table into the profile; latest-write-wins per key.
  • Three prompt-middleware injectors auto-wired into every agent call:
    • ProfileInjector — appends `[我知道關於你]` block from USER.md.
    • ToolAffinityInjector — appends `[你常用的工具]` block from a SQLite usage tracker.
    • SessionRecallInjector — pulls relevant past turns from SessionStore (CJK-bigram + ASCII token overlap scoring; per-query opt-in).
  • ToolAffinityTracker subscribes to TOOL_CALL_END events on every ask / chat / serve entry point so the tracker self-populates.
  • New jarvis profile group: show / rebuild / edit / clear / tools.

Privacy: every personalisation path is local-only. No cloud calls. Profile lives in ~/.openjarvis/; tracker DB in ~/.openjarvis/tool_affinity.db.

Test plan

  • uv run ruff check src/ tests/ — all checks passed.
  • uv run pytest tests/ (excluding live / cloud / docker / connectors) — 6211 passed, 53 skipped, 0 failed (was 25 failed before this branch).
  • npm audit in frontend/0 vulnerabilities (was 13).
  • npm audit in whatsapp_baileys_bridge/0 vulnerabilities (was 5).
  • npm run build in frontend/ — green.
  • npm run build in whatsapp_baileys_bridge/ — green.
  • uv run maturin develop --release for the Rust extension — green.
  • jarvis profile rebuild against a real memory.db with seeded user_facts — picks up rows correctly.
  • jarvis profile show — renders USER.md.
  • jarvis profile tools — renders ranked tool usage table.

What this PR intentionally does NOT touch

  • Pre-existing frontend UI WIP (ChatArea.tsx, InputArea.tsx, MessageBubble.tsx, Layout.tsx, Sidebar.tsx, index.css, tsconfig.tsbuildinfo, JarvisCore.tsx) — left for the UI PR.
  • CSP unsafe-inline / unsafe-eval tightening — separate follow-up.
  • Bundle code-splitting (main chunk 926KB) — separate follow-up.
  • _build_messages wiring SessionRecallInjector to live query — recaller is implemented + tested, just needs the runtime hand-off (next PR).

Commits (12)

d4a3b3e8 feat(personalization): wire injectors into prompt middleware + jarvis profile CLI
13c9e9cb feat(personalization): UserProfile + memory.db consolidator (local-only)
7ced7d2a chore(deps): npm audit fix for frontend and WhatsApp Baileys bridge
0db64a9d fix(skills): JSON-safe placeholder rendering in arguments_template
c6d6bb75 fix(cli): jarvis chat carries conversation history into the agent
b2727fae feat(server): wire /v1/skills routes to the live SkillManager
883c8562 fix(server): route streaming chat through agent bridge when tools are configured
17f05048 refactor(system): shared runtime tool/skill factory used by serve and SystemBuilder
6b3bc740 feat(cli): jarvis ask falls back to config.agent.default_agent when --agent omitted
9b7c7a6e feat(agents): extract prompt middleware with configurable datetime injection
d18c81fd fix(cli): drop extraneous f-string prefix and wrap long line in hints
37254fec chore: gitignore stray test artifacts, JarvisNative subproject, drop Inline

🤖 Generated with Claude Code

machj8968-lab and others added 12 commits May 20, 2026 16:40
…Inline

Adds .gitignore patterns for repo-root junk that accumulated from
misconfigured tests/scripts (":memory:", MagicMock/, ~/) and ignores
the JarvisNative/ Swift HUD subproject (which carries its own nested
.git). The orphaned Inline entry was already staged for deletion; this
commit lands both together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves ruff F541 and E501 in src/openjarvis/cli/hints.py:23.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…jection

BaseAgent._build_messages used to hard-code Asia/Taipei datetime
injection and stacked a second SYSTEM message even when the upstream
context already supplied one. That made open-source / multi-timezone
use awkward and broke a handful of unit tests after the user customised
their default_system_prompt.

This commit:

* Adds src/openjarvis/agents/prompt_middleware.py with a composable
  chain (DateTimeInjector is the only default). Timezone is read from
  config.agent.datetime_timezone (default Asia/Taipei), and the chain
  short-circuits on None so we never append a synthetic SYSTEM message
  on top of a context-supplied one.
* Adds AgentConfig.inject_datetime (default True) and
  AgentConfig.datetime_timezone fields.
* Updates BaseAgent to call apply_chain only when emitting its own
  SYSTEM message.
* Adds 5 unit tests for the middleware and converts the existing
  base-agent tests to mock load_config so they no longer fail when the
  developer's ~/.openjarvis/config.toml diverges from JarvisConfig
  defaults.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-agent omitted

Previously, omitting --agent always took the direct-to-engine path,
which silently bypassed the configured agent (and therefore the system
prompt, tools, memory context, capability policy). Users had to remember
to pass --agent simple every time they wanted their persona honoured.

New contract: --agent unspecified -> use config.agent.default_agent;
--agent "" -> opt out into direct mode explicitly.

Updates:

* src/openjarvis/cli/ask.py: documents the contract; preserves direct
  mode behind --agent "".
* tests/cli/test_ask_router.py: rewrites the four existing model-
  resolution tests to mock load_config with default_agent="" so the
  direct path is what's under test, plus adds TestAskAgentDefault
  (two tests) covering the new fallback and explicit override.
* tests/cli/test_ask_e2e.py / tests/telemetry/test_energy_wiring.py:
  set cfg.agent.default_agent = "" in the shared test config builder
  so the engine-wiring assertions stay valid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… SystemBuilder

serve.py had two duplicate ~40-line blocks that instantiated tools
from config.agent.tools, ignored config.tools.enabled (the canonical
source), and never loaded SkillManager. SystemBuilder had a third
variant that did read config.tools.enabled but with subtle
differences. Three call-sites, three behaviours, three opportunities
to drift apart.

This commit:

* Adds src/openjarvis/system/runtime_tools.py exposing
  resolve_tool_names() and build_runtime_tools(). Resolution order:
  explicit override -> config.tools.enabled -> config.agent.tools
  -> DEFAULT_TOOL_NAMES. Tool instances get correct dependency
  injection (engine/model for llm, memory_backend for retrieval,
  channel for channel_*).
* When config.skills.enabled, discovers Skills from
  config.skills.skills_dir (+ ./skills workspace overlay) and
  surfaces them as additional BaseTool wrappers.
* serve.py: both the API agent and the channel agent now go through
  this factory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… configured

The frontend sends streaming /v1/chat/completions requests with only
{model, messages, stream, temperature, max_tokens} — no tools field —
so the previous condition (request_body.tools truthy) sent every Web /
Desktop chat directly to engine.stream(), silently bypassing the agent
even when the server was started with one. That meant tools, skills,
memory and web_search never reached the main chat path.

This commit changes the streaming router to take the agent bridge
whenever the server-side agent itself has tools wired, regardless of
whether the request carries a tools array. Yes, the bridge runs
agent.run() synchronously and word-splits the result, but losing
capability is worse than losing token-level smoothness.

When no agent tools are present and we keep the direct stream, the
handler now prepends config.agent.default_system_prompt (with the
prompt-middleware chain applied) if the request lacks a SYSTEM
message — so plain chat is still grounded in the configured persona
and current date/time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /v1/skills routes were placeholders — list returned bare registry
keys (no manifest metadata), install returned not_implemented, delete
returned not_implemented.

This commit:

* Threads SkillManager through create_app(skill_manager=...) and
  stores it on app.state.skill_manager. serve.py passes the manager
  built by the runtime_tools factory.
* GET /v1/skills returns name + description + step count from the
  manager, falling back to SkillRegistry if no manager is wired.
* POST /v1/skills accepts {"name": ..., "manifest": ..., "format":
  "toml"|"md"}, writes the file into config.skills.skills_dir, and
  re-runs discover() so the new skill becomes immediately resolvable.
  Rejects path-traversal-shaped names.
* DELETE /v1/skills/{name} removes the manifest file from skills_dir
  and drops the entry from the live manager.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chat REPL kept extending its local 'history' list but called
agent.run(user_input) without context, so the agent treated every turn
as the first turn. A follow-up question like "and what about the other
one?" was indistinguishable from a fresh prompt — the agent had to
guess from scratch.

This commit packages the history (minus the just-appended user turn,
which the agent appends itself) into AgentContext.conversation and
passes it via agent.run(user_input, context=ctx). Multi-turn chat
remains coherent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SkillExecutor._render_template returned raw string values into a JSON
template, so a value containing a double quote or newline produced
invalid JSON and the subsequent json.loads(rendered_args) crashed
silently — the skill step would fail with "Template rendering error".

This commit:

* Switches every substitution to json.dumps(val, ensure_ascii=False).
  For string values we strip the outer quotes because templates wrap
  string placeholders in their own quotes by convention
  ('{"query": "{query}"}'). For non-strings the encoded form (numbers,
  arrays, objects) is emitted as-is.
* Preserves the original placeholder when the key is missing from
  context, so callers can detect unset keys.
* Adds 5 regression tests in tests/skills/test_executor_template.py
  covering double quotes, newlines, numbers, arrays, and missing keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both Node subprojects shipped with known-vulnerable transitive deps.
This commit lands the audit-fix bump and the one source change Baileys
7.x required.

frontend/ (was 13 vulns — 7 high, 6 moderate; now 0):
* Vite 6 -> 7 (GHSA-4w7w-66w2-5vf9 path traversal in .map handling,
  GHSA-p9ff-h696-f583 arbitrary file read over dev-server WS).
* @rollup/plugin-terser bump pulls serialize-javascript out of the
  RCE-via-RegExp.flags / DoS via array-like range.
* postcss 8.5.10 (XSS via unescaped </style> in CSS stringify).
* @hono/node-server 1.19.13 (middleware bypass via repeated slashes).

src/openjarvis/channels/whatsapp_baileys_bridge/ (was 5 vulns — 1
critical, 2 high, 2 moderate; now 0):
* @whiskeysockets/baileys 7.0.0-rc11 — pulls in a non-vulnerable
  libsignal-node / protobufjs (GHSA-xq3m-2v4x-88gg arbitrary code
  execution and seven other protobuf.js advisories).
* ws 8.20.x (GHSA-58qx-3vcg-4xpx uninitialised memory disclosure).
* src/bridge.ts: Baileys 7.x removed DisconnectReason.unknown; the
  close handler now falls back to statusCode 0 (transient -> reconnect)
  when no statusCode is present, matching the previous behaviour.

Both subprojects' build commands (vite build, tsc) pass after the
upgrade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the foundation for "Jarvis remembers you": a structured USER.md
file persisted at ~/.openjarvis/USER.md, plus a consolidator that
folds memory backend rows (and the legacy user_facts SQLite table)
into it.

* src/openjarvis/personalization/profile.py — UserProfile dataclass
  with Identity / Preferences / Facts / Relations / Notes sections,
  markdown round-trip parsing, and key-prefix routing
  (user.* -> Identity, pref.* -> Preferences, etc).
* src/openjarvis/personalization/consolidator.py — scans either a
  MemoryBackend.all_documents() enumerable, falls back to broad
  prefix-token retrieves, and additionally probes ``user_facts``
  rows in memory.db so existing setup-seeded facts surface. Latest
  write wins per key.
* src/openjarvis/core/config.py — adds AgentConfig.inject_profile and
  profile_path knobs (defaults to enabled, path
  ~/.openjarvis/USER.md).
* tests/personalization/test_profile.py + test_consolidator.py — 11
  unit tests covering profile parsing, key-prefix routing, latest-
  wins dedup, retrieval-result handling, and missing-DB fallback.

Privacy: nothing leaves the box. The consolidator only reads, never
writes back to memory.db, and never touches the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… profile CLI

Builds on the previous commit by injecting personalisation into every
agent invocation:

* prompt_middleware.build_default_middleware() now appends
  ProfileInjector (USER.md "[我知道關於你]" block) and
  ToolAffinityInjector ("[你常用的工具]" block) when their respective
  config.agent.inject_* flags are on (default true).
* Adds wire_tool_affinity(bus) hooks in ask.py, chat_cmd.py, and
  serve.py so TOOL_CALL_END events feed the affinity tracker.
* New ``jarvis profile`` group:
    show     — render USER.md
    rebuild  — re-consolidate from memory.db
    edit     — open USER.md in $EDITOR
    clear    — wipe contents
    tools    — display ranked tool usage with success rate
* Updates base-agent tests' isolated_config fixture to also disable
  inject_profile / inject_tool_affinity so SYSTEM message assertions
  stay focused on _build_messages itself.
* Splits the prompt-middleware flag test so each injector is verified
  independently.

This makes Jarvis materially "more aware" of who it's talking to:
every chat carries the user's known facts, preferences, and tool
habits into the SYSTEM message — without any of that ever leaving
the box.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants