feat(apple-silicon): Neural Memory native macOS support #3

Open
itsXactlY wants to merge 1 commit into master from feat/apple-silicon-support

Conversation

@itsXactlY
Owner

Adds first-class Apple Silicon (M1-M4) support for Neural Memory embeddings and C++ SIMD acceleration. Previously, Macs fell back to the weaker TF-IDF+SVD embeddings (384d); they now use bge-m3 (1024d) on the Metal GPU.

embed_provider.py:

  • MPS (Metal Performance Shaders) GPU detection, device priority CUDA > MPS > CPU
  • EMBED_MODEL env var for model selection
  • Unified memory: no VRAM check needed on Mac
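
The device priority and the EMBED_MODEL override described above could look roughly like the sketch below. It assumes PyTorch is the backend that exposes CUDA and MPS; the function names and the default model ID `BAAI/bge-m3` are illustrative and not taken from embed_provider.py.

```python
import os


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the priority CUDA > MPS > CPU to two availability flags."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch if it is installed; otherwise report CPU."""
    try:
        import torch  # assumption: the embedder runs on PyTorch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # absent on old PyTorch builds
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


def model_name(default: str = "BAAI/bge-m3") -> str:
    """EMBED_MODEL overrides the default; the default ID is an assumption."""
    return os.environ.get("EMBED_MODEL", default)
```

Keeping the priority rule in a pure function makes it testable on machines without a GPU.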

cpp_bridge.py:

  • Platform-aware .dylib/.so library loading
  • Graceful fallback with macOS build instructions
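
A minimal sketch of platform-aware loading with a graceful fallback, assuming the bridge uses `ctypes`. The library stem `libneural_memory` and the helper names are hypothetical; the real names in cpp_bridge.py may differ.

```python
import ctypes
import platform
from pathlib import Path
from typing import Optional


def library_filename(system: str, stem: str = "libneural_memory") -> str:
    """Map platform.system() output to a shared-library filename."""
    suffix = ".dylib" if system == "Darwin" else ".so"
    return stem + suffix


def load_native(build_dir: Path) -> Optional[ctypes.CDLL]:
    """Return the loaded library, or None so callers can use a Python path."""
    path = build_dir / library_filename(platform.system())
    try:
        return ctypes.CDLL(str(path))
    except OSError:
        # Graceful fallback: report the problem instead of raising.
        print(f"Native library not found at {path}; build it first.")
        return None
```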

neural_memory.py:

  • Dynamic dim from embedder (replaces hardcoded dim=384)
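
One way to derive the dimension from the embedder rather than hardcoding 384 is sketched here. The `dim` attribute, the `embed` method and the `FakeEmbedder` class are assumptions for illustration only.

```python
class FakeEmbedder:
    """Stand-in embedder; a real one would wrap a model such as bge-m3."""

    def __init__(self, dim: int):
        self.dim = dim

    def embed(self, text: str):
        return [0.0] * self.dim


def resolve_dim(embedder, probe: str = "dimension probe") -> int:
    """Prefer an advertised dimension; otherwise measure one embedding."""
    dim = getattr(embedder, "dim", None)
    if dim is None:
        dim = len(embedder.embed(probe))
    return int(dim)
```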

C++ SIMD (simd.h, simd_engine.cpp):

  • ARM NEON intrinsics for all SIMD functions (dot_product, cosine_similarity, l2_norm, add, hadamard, scale, fmadd, weighted_add, zero)
  • 3-way dispatch: AVX2 (x86) > NEON (ARM64) > scalar fallback
  • CPUID guarded behind x86 platform check
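
The real dispatch lives in the C++ sources; the Python sketch below only mirrors its ordering (AVX2, then NEON, then scalar) so the selection rule is easy to read. The function name and the machine strings it accepts are assumptions.

```python
def select_backend(machine: str, has_avx2: bool = False) -> str:
    """Mirror the 3-way priority: AVX2 (x86) > NEON (ARM64) > scalar."""
    arch = machine.lower()
    if arch in ("x86_64", "amd64") and has_avx2:
        # Only x86 consults the AVX2 flag, as CPUID exists only there.
        return "avx2"
    if arch in ("arm64", "aarch64"):
        # NEON is a mandatory part of ARMv8-A, so no runtime probe is needed.
        return "neon"
    return "scalar"
```

For example, `select_backend("arm64")` selects NEON on Apple Silicon, while an x86 machine without AVX2 falls through to the scalar path.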

CMakeLists.txt:

  • Apple Silicon detection (arm64 AND APPLE)
  • ARM: -march=armv8-a+simd
  • x86: -march=x86-64-v2 -mavx2 -mfma
  • Conditional linking (skip pthread/stdc++ on macOS)

ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request May 1, 2026
Reviewer #3 caught a recurring bug: my fix in bcd72db was a forward-
guard only. Hermes (running at PID 55181 since 12:47, was 19835 at
session start) hadn't reloaded the updated memory_client.py module, so
its in-memory copy still inserted entity rows into FTS5. Result: 8
stale entity rows in the FTS index by review-time (was 6 at original
audit; +2 from continued hermes saves).

The forward-guard is right but insufficient when long-running processes
hold stale code. This commit adds a self-healing defensive cleanup:
SchemaUpgrade._ensure_fts5() now DELETEs any kind='entity' rows from
memories_fts on every invocation. Combined with the SQLiteStore.__init__
hook from P7C2, this means every fresh NeuralMemory() instance cleans
the index. Backfill is also now kind-aware (skips entities).

python/schema_upgrade.py: _ensure_fts5() extended with defensive DELETE
+ kind-filtered backfill. ~10 LOC added.
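
A minimal sketch of the defensive DELETE plus kind-filtered backfill, assuming a `memories(id, kind, content)` table whose ids match the FTS rowids. The schema and function names are illustrative, not the actual _ensure_fts5() code.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, kind TEXT, content TEXT)"
    )
    try:
        conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")
    except sqlite3.OperationalError:
        # SQLite built without FTS5: a plain table keeps the demo runnable.
        conn.execute("CREATE TABLE memories_fts (content TEXT)")
    return conn


def ensure_fts_clean(conn: sqlite3.Connection) -> None:
    # Defensive cleanup: drop index rows that belong to entity memories.
    conn.execute(
        "DELETE FROM memories_fts WHERE rowid IN "
        "(SELECT id FROM memories WHERE kind = 'entity')"
    )
    # Kind-aware backfill: index only non-entity rows that are missing.
    indexed = {row[0] for row in conn.execute("SELECT rowid FROM memories_fts")}
    missing = [
        (mem_id, content)
        for mem_id, content in conn.execute(
            "SELECT id, content FROM memories WHERE kind <> 'entity'"
        )
        if mem_id not in indexed
    ]
    conn.executemany(
        "INSERT INTO memories_fts (rowid, content) VALUES (?, ?)", missing
    )
    conn.commit()
```

Both steps are idempotent, so running the cleanup on every init is safe.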

Verified on live DB: ran schema_upgrade.py against ~/.neural_memory/
memory.db; sync delta dropped 8 → 0. Tests still pass (5/5 schema +
10/10 sparse_temporal).

Trade-off: defensive DELETE on every init is O(entity_count) extra
work — negligible at AE scale (few entities ever).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request May 2, 2026
Per Sonnet code review of HEAD~6..HEAD this session.

CRITICAL #1 (memory_client.py:710): HNSW skip-reload now also requires
disk file mtime to match what we tracked at last load. Without this,
a separate process (MCP server, ingest cron, bench) that persists
new HNSW state would be ignored — we'd silently serve stale dense
results across plugin/cron/MCP-server boundaries.
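
The mtime guard can be sketched as follows. `IndexCache` and its attributes are hypothetical names; the actual reload logic in memory_client.py is not shown in this commit message.

```python
import os


class IndexCache:
    """Tracks the on-disk mtime seen at the last load of an index file."""

    def __init__(self, path: str):
        self.path = path
        self.loaded_mtime_ns = None
        self.load_count = 0

    def _disk_mtime_ns(self):
        try:
            return os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None

    def ensure_fresh(self) -> bool:
        """Reload unless the file still has the mtime recorded at last load."""
        current = self._disk_mtime_ns()
        if self.loaded_mtime_ns is not None and current == self.loaded_mtime_ns:
            return False  # safe to skip: nobody rewrote the file
        # A real implementation would deserialize the HNSW index here.
        self.loaded_mtime_ns = current
        self.load_count += 1
        return True
```

A second process that persists a new index changes the mtime, so the next `ensure_fresh()` call reloads instead of serving stale results.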

CRITICAL #2 (cleanup_onedrive_dupes.py:53): added WHERE h IS NOT NULL
to GROUP BY query. Without it, all rows missing content_hash get
lumped into one "NULL group" and treated as duplicates. Live
2026-05-02 run was lucky; future re-runs against partially-tagged
sources would silently delete distinct content.
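
The NULL-group hazard is easy to reproduce. This sketch assumes a `memories(id, content_hash)` table; the real query in cleanup_onedrive_dupes.py aliases the hash as `h` and may select other columns.

```python
import sqlite3

# GROUP BY puts all NULL hashes into one group, so the guard must come first.
DUPES_SQL = """
SELECT content_hash AS h, COUNT(*) AS n
FROM memories
WHERE content_hash IS NOT NULL
GROUP BY content_hash
HAVING COUNT(*) > 1
"""


def find_duplicate_hashes(conn: sqlite3.Connection):
    """Return hashes shared by more than one row, ignoring untagged rows."""
    return [row[0] for row in conn.execute(DUPES_SQL)]
```

Without the `IS NOT NULL` filter, two untagged rows would come back as a "duplicate" group keyed by NULL.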

MEDIUM #3 (nm_recall_mcp.py): flipped use_hnsw=False → True. Without
HNSW the MCP server linear-scans 12k+ memories per dense channel
call, blowing up p50. R@5=0.82 bench config requires HNSW.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request May 2, 2026
Per first-run reconciliation reviewer findings on python/mssql_store.py:

LOW #1 [smell]: get_all() is unbounded
Added default limit=100_000 with explicit TOP N in SQL. Different
shape from SQLiteStore.get_all() (row-iterator) but MSSQL's pyodbc
fetchall() materializes all rows into RAM, so the cap matters for
big substrates. Caller can bump limit explicitly if needed.
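
A bounded query builder might look like this sketch. The table name `memories`, the `ORDER BY id` clause and the function name are assumptions; only the default of 100_000 and the explicit TOP come from the commit message.

```python
def build_get_all_query(limit: int = 100_000) -> str:
    """Return a T-SQL statement capped with an explicit TOP (N)."""
    # Reject bools too, since bool is a subclass of int in Python.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    # The value is a validated int, so inlining it cannot inject SQL.
    return f"SELECT TOP ({limit}) * FROM memories ORDER BY id"
```

Because `fetchall()` materializes the whole result set, the cap bounds memory use; callers that really need more rows pass a larger `limit` explicitly.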

LOW #2 [dead]: CREATE DATABASE block in SCHEMA_SQL
Removed the IF NOT EXISTS CREATE DATABASE NeuralMemory + USE
NeuralMemory + GO blocks. _ensure_schema explicitly skipped them
via 'GO not in stmt and CREATE DATABASE not in stmt' guard, so
they never executed. Misleading dead code; simplified.

LOW #3 [smell]: Now-redundant filter conditions
Since CREATE DATABASE + GO are gone from SCHEMA_SQL, the filter
becomes just `if stmt:`. Simplified _ensure_schema accordingly.

Plus inline note about the autocommit=True / .commit() pattern
(reviewer's other [smell] finding) — kept the .commit() calls as
readability anchors but documented they're no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request May 2, 2026
…ctlY#3)

Per holistic-reviewer-round-1 finding: nm_recall_mcp.py was 365 LOC
of cross-agent JSON-RPC surface (exposed to Codex + Hermes + Claude
Code) shipped with zero tests. Closes the coverage gap.

Smoke contracts:
- test_initialize_returns_protocol_version
- test_tools_list_returns_five_tools (nm_recall, nm_sparse_search,
  nm_remember, nm_status, nm_audit)
- test_nm_status_tool_call_returns_substrate_stats (verifies new
  memories_fts_count + non_entity_memories_count fields from
  reviewer-round-6 reshape)
- test_unknown_tool_returns_jsonrpc_error (-32601)
- test_unknown_method_returns_jsonrpc_error (-32601)

Tests skip cleanly if substrate/script not present (CI-safe).
Subprocess-based: spawns the actual MCP server stdio protocol and
verifies real responses, not just imported logic.
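
The subprocess-based approach can be illustrated with a toy server. This sketch assumes newline-delimited JSON-RPC over stdio; `TOY_SERVER` is a stand-in written for this example and is not nm_recall_mcp.py.

```python
import json
import subprocess
import sys

# A stand-in server: answers "initialize", rejects everything else.
TOY_SERVER = r'''
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    if req.get("method") == "initialize":
        resp = {"jsonrpc": "2.0", "id": req["id"],
                "result": {"protocolVersion": "demo"}}
    else:
        resp = {"jsonrpc": "2.0", "id": req["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    sys.stdout.write(json.dumps(resp) + "\n")
    sys.stdout.flush()
'''


def rpc_roundtrip(requests):
    """Spawn the server, send each request, and collect one reply per request."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TOY_SERVER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        responses = []
        for req in requests:
            proc.stdin.write(json.dumps(req) + "\n")
            proc.stdin.flush()
            responses.append(json.loads(proc.stdout.readline()))
        return responses
    finally:
        proc.stdin.close()  # EOF ends the server loop
        proc.wait(timeout=10)
        proc.stdout.close()
```

Driving the real process over its pipes exercises framing and error codes that an in-process import would bypass.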

Closes the cross-agent surface coverage gap before peers actually
adopt nm-recall in their MCP configs (Valiendo just ACK'd registering
it in her Hermes profile via approval_7bf71a60).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request May 3, 2026
…ion (S2 + Opus race fix)

Canonical DB had PRAGMA user_version=0 and no evidence-identity authority.
record_evidence_artifact previously deduped via JSON-scan of metadata —
not a real DB-level guard.

S2 packet (NM-builder lane, dispatched by Opus):
- python/schema_upgrade.py: additive evidence_ledger table
  (evidence_id PK, memory_id, evidence_type, source_system,
  source_record_id, status, inserted_at, updated_at, metadata_hash) with
  UNIQUE indexes on (source_system, source_record_id) and
  (evidence_type, source_record_id). PRAGMA user_version 0 → 1.
  Migration is idempotent (CREATE IF NOT EXISTS, no DROP/ALTER).
- python/ae_workflow_helpers.py:
    * _ledger_reserve uses INSERT OR IGNORE for atomic claim.
    * Winner: mem.remember() then _ledger_set_memory_id patches in.
    * Loser: re-reads ledger; if memory_id NULL still, falls back to
      legacy json_extract path (or fresh remember as last resort).
    * Helper return shape {memory_id, evidence_id, inserted} preserved
      exactly across all 4 evidence helpers.
    * Pre-upgrade DBs and non-SQLite stores transparently fall back to
      legacy json_extract scan.
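
The atomic claim can be sketched with a reduced ledger schema. Column types, the index name and the helper names here are assumptions; only the table name, the listed columns and the use of INSERT OR IGNORE come from the commit message.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Idempotent DDL: safe to run on every start.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS evidence_ledger (
               evidence_id TEXT PRIMARY KEY,
               memory_id TEXT,
               source_system TEXT NOT NULL,
               source_record_id TEXT NOT NULL)"""
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS ux_ledger_source "
        "ON evidence_ledger (source_system, source_record_id)"
    )
    return conn


def ledger_reserve(conn, evidence_id, source_system, source_record_id) -> bool:
    """Return True for the caller that wins the claim, False for a loser."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO evidence_ledger "
        "(evidence_id, source_system, source_record_id) VALUES (?, ?, ?)",
        (evidence_id, source_system, source_record_id),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE index caused the insert to be ignored.
    return cur.rowcount == 1
```

The UNIQUE index makes the database the arbiter, so the first writer wins without a prior read.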

Opus race-fix follow-up (commit-time): S2's loser path returned None
when memory_id was still NULL (winner mid-flight), which made all 8
threads in the race test fall through to mem.remember() — exposing a
non-thread-safe iteration in HNSW/connection_graph internals (RuntimeError:
dictionary changed size during iteration). Wrapping the full pipeline
in store._lock would deadlock since mem.remember re-acquires the same
non-reentrant Lock internally. Fix: loser polls _ledger_lookup with
40 × 25ms (1s budget), releasing store._lock between polls so the winner
can complete; falls through to a fresh remember only if the budget
exhausts. Race test now passes consistently.
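
The loser's polling loop (40 attempts at 25 ms, roughly a 1 s budget) could be sketched like this. `poll_for_memory_id` and the `lookup` callable are hypothetical stand-ins for the helper's use of _ledger_lookup.

```python
import threading
import time


def poll_for_memory_id(lookup, lock, attempts: int = 40, delay: float = 0.025):
    """Poll until the winner has recorded a memory_id or the budget runs out."""
    for _ in range(attempts):
        with lock:
            memory_id = lookup()
        if memory_id is not None:
            return memory_id
        # The lock is released while sleeping so the winner can finish.
        time.sleep(delay)
    return None  # caller falls through to a fresh remember
```

Holding the lock only around each lookup avoids the deadlock described above, since the winner needs the same non-reentrant lock to complete.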

Tests: 12 schema + 26 evidence (incl. 8-thread race test) +
26 sent-pdf consumer = 64/64 pass.

Closes LIVE_FEED Active P0 #3 (REPO_DB_CONTRACT_GAP: evidence
identity DB guard).

Schema upgrade is NOT auto-applied to canonical DB. Tito ACK gates
that. Pre-existing JSON-scan path remains for un-upgraded DBs.

Synth contract: LIVE_FEED 2026-05-03T10:41:47Z, S2 dispatch.
Evidence packet: ~/.neural_memory/sonnet-packets/2026-05-03/S2-result.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>