feat(apple-silicon): Neural Memory native macOS support by itsXactlY · Pull Request #3 · itsXactlY/mazemaker

itsXactlY · 2026-04-17T09:26:12Z

Adds first-class Apple Silicon (M1-M4) support for Neural Memory embeddings and C++ SIMD acceleration. Previously, Macs fell back to a weaker TF-IDF+SVD embedder (384d); they now use bge-m3 (1024d) on the Metal GPU.

embed_provider.py:

- MPS (Metal Performance Shaders) GPU detection; device priority CUDA > MPS > CPU
- EMBED_MODEL env var for model selection
- Unified memory: no VRAM check needed on Mac
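
A minimal sketch of the device- and model-selection rules listed above. It takes availability flags as plain booleans so it runs without PyTorch, and the default model id "BAAI/bge-m3" is an assumption (the PR only says "bge-m3"). This is not the code in embed_provider.py.

```python
import os
from typing import Mapping, Optional

# Assumption: default model id; the PR only names "bge-m3".
DEFAULT_EMBED_MODEL = "BAAI/bge-m3"


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the first available device in priority order CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def resolve_embed_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the model name from EMBED_MODEL, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("EMBED_MODEL") or DEFAULT_EMBED_MODEL


def needs_vram_check(device: str) -> bool:
    # Apple Silicon GPUs share unified memory with the CPU, so only a
    # discrete CUDA device gets a VRAM check.
    return device == "cuda"
```

With PyTorch installed, the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.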

cpp_bridge.py:

- Platform-aware .dylib/.so library loading
- Graceful fallback with macOS build instructions
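
A sketch of platform-aware loading with a graceful fallback, using `ctypes`. The library name `libneural_simd` and the build-hint text are hypothetical; only the .dylib/.so split and the fall-back-instead-of-crash behavior come from the description.

```python
import ctypes
import sys
from pathlib import Path
from typing import Optional

# Hypothetical library base name.
LIB_BASENAME = "libneural_simd"

# Assumption: generic CMake steps, not the exact text printed by cpp_bridge.py.
MACOS_BUILD_HINT = (
    "C++ SIMD library not found; using the pure-Python fallback.\n"
    "Build it with: mkdir -p build && cd build && cmake .. && make"
)


def library_filename(base: str = LIB_BASENAME, platform: str = sys.platform) -> str:
    """.dylib on macOS, .so elsewhere (Windows .dll is out of scope here)."""
    return base + (".dylib" if platform == "darwin" else ".so")


def load_simd_library(search_dir: Path, platform: str = sys.platform) -> Optional[ctypes.CDLL]:
    """Load the shared library, or return None so callers use the scalar path."""
    path = Path(search_dir) / library_filename(platform=platform)
    try:
        return ctypes.CDLL(str(path))
    except OSError:
        # Missing or unloadable library: explain how to build it, don't crash.
        if platform == "darwin":
            print(MACOS_BUILD_HINT, file=sys.stderr)
        return None
```

A caller would typically do `lib = load_simd_library(Path(__file__).parent / "build")` and branch on `lib is None`.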

neural_memory.py:

- Dynamic dim from embedder (replaces hardcoded dim=384)
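
Why the hardcoded 384 had to go: a 1024d bge-m3 vector does not fit an index sized for 384d. This sketch derives the dimension from the embedder, assuming a hypothetical embedder that either exposes a `dim` attribute or can be probed with one `embed()` call; `VectorStore` is an illustrative stand-in, not the project's class.

```python
from typing import List, Sequence


def resolve_dim(embedder: object) -> int:
    """Prefer an explicit `dim` attribute; otherwise probe with one embedding."""
    dim = getattr(embedder, "dim", None)
    if isinstance(dim, int) and dim > 0:
        return dim
    return len(embedder.embed("dimension probe"))  # type: ignore[attr-defined]


class VectorStore:
    """Hypothetical dense store that rejects vectors of the wrong width."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []

    def add(self, vec: Sequence[float]) -> None:
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim}d vector, got {len(vec)}d")
        self.vectors.append(list(vec))
```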

C++ SIMD (simd.h, simd_engine.cpp):

- ARM NEON intrinsics for all SIMD functions (dot_product, cosine_similarity, l2_norm, add, hadamard, scale, fmadd, weighted_add, zero)
- 3-way dispatch: AVX2 (x86) > NEON (ARM64) > scalar fallback
- CPUID probing guarded behind an x86 platform check
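
The dispatch order can be sketched as a pure decision function. This illustrates the priority rules only: the hypothetical `has_avx2` callable stands in for the CPUID check, and the real dispatch lives in simd_engine.cpp.

```python
import platform
from typing import Callable, Optional

X86_MACHINES = {"x86_64", "amd64", "i386", "i686"}
ARM64_MACHINES = {"arm64", "aarch64"}


def select_simd_backend(machine: str, has_avx2: Optional[Callable[[], bool]] = None) -> str:
    """Pick AVX2 > NEON > scalar for the given machine string."""
    m = machine.lower()
    if m in X86_MACHINES:
        # The CPUID-style probe runs only on x86; other architectures never call it.
        if has_avx2 is not None and has_avx2():
            return "avx2"
        return "scalar"
    if m in ARM64_MACHINES:
        # NEON is baseline on arm64 targets such as Apple Silicon: no runtime probe.
        return "neon"
    return "scalar"


if __name__ == "__main__":
    print(select_simd_backend(platform.machine()))
```

Passing the probe as a callable makes the guard testable: on ARM the probe is never invoked, mirroring how CPUID must not be compiled or executed there.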

CMakeLists.txt:

- Apple Silicon detection (arm64 AND APPLE)
- ARM: -march=armv8-a+simd
- x86: -march=x86-64-v2 -mavx2 -mfma
- Conditional linking (skip pthread/stdc++ on macOS)
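
The same build rules restated as a Python decision table, which makes it easy to check which flags a given host should get. The function name is hypothetical, and applying the ARM `-march` flag to Linux aarch64 as well as Apple arm64 is an assumption the description does not state.

```python
from typing import List, Tuple


def simd_build_config(system: str, processor: str) -> Tuple[bool, List[str], List[str]]:
    """Return (is_apple_silicon, compile_flags, link_libs) per the rules above."""
    proc = processor.lower()
    is_apple = system == "Darwin"            # CMake's APPLE on macOS
    is_arm64 = proc in ("arm64", "aarch64")
    is_apple_silicon = is_apple and is_arm64
    if is_arm64:
        flags = ["-march=armv8-a+simd"]
    elif proc in ("x86_64", "amd64"):
        flags = ["-march=x86-64-v2", "-mavx2", "-mfma"]
    else:
        flags = []  # assumption: unknown architectures build scalar-only
    # macOS links libc++ and its pthreads via libSystem, so skip both there.
    libs: List[str] = [] if is_apple else ["pthread", "stdc++"]
    return is_apple_silicon, flags, libs
```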


Reviewer #3 caught a recurring bug: my fix in bcd72db was a forward-guard only. Hermes (running at PID 55181 since 12:47; it was PID 19835 at session start) had not reloaded the updated memory_client.py module, so its in-memory copy still inserted entity rows into FTS5. Result: 8 stale entity rows in the FTS index at review time (6 at the original audit, +2 from continued Hermes saves). The forward-guard is correct but insufficient when long-running processes hold stale code.

This commit adds a self-healing defensive cleanup: SchemaUpgrade._ensure_fts5() now DELETEs any kind='entity' rows from memories_fts on every invocation. Combined with the SQLiteStore.__init__ hook from P7C2, this means every fresh NeuralMemory() instance cleans the index. Backfill is now also kind-aware (it skips entities).

python/schema_upgrade.py: _ensure_fts5() extended with the defensive DELETE + kind-filtered backfill (~10 LOC added).

Verified on the live DB: ran schema_upgrade.py against ~/.neural_memory/memory.db; the sync delta dropped 8 → 0. Tests still pass (5/5 schema + 10/10 sparse_temporal).

Trade-off: the defensive DELETE on every init is O(entity_count) extra work, negligible at AE scale (few entities ever).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
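
A minimal `sqlite3` sketch of the self-healing step, assuming a hypothetical schema in which memories_fts rows share rowids with `memories(id, kind, content)`. It is not the actual _ensure_fts5() body; it only shows the defensive DELETE followed by a kind-aware backfill, and that a second run is a no-op.

```python
import sqlite3


def ensure_fts_clean(conn: sqlite3.Connection) -> int:
    """Drop entity rows from memories_fts, then backfill missing non-entity rows.

    Returns the number of stale entity rows removed.
    """
    stale_sql = "FROM memories_fts WHERE rowid IN (SELECT id FROM memories WHERE kind = 'entity')"
    stale = conn.execute("SELECT COUNT(*) " + stale_sql).fetchone()[0]
    # Defensive DELETE on every call: rows leaked by a process still running
    # old code are removed the next time any instance initializes.
    conn.execute("DELETE " + stale_sql)
    # Kind-aware backfill: index only non-entity memories not yet indexed.
    conn.execute(
        "INSERT INTO memories_fts(rowid, content) "
        "SELECT id, content FROM memories "
        "WHERE kind != 'entity' AND id NOT IN (SELECT rowid FROM memories_fts)"
    )
    conn.commit()
    return stale


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, kind TEXT, content TEXT)")
    try:
        conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")
    except sqlite3.OperationalError:
        # Some SQLite builds lack FTS5; a plain rowid table keeps the demo runnable.
        conn.execute("CREATE TABLE memories_fts (content TEXT)")
    conn.executemany(
        "INSERT INTO memories (id, kind, content) VALUES (?, ?, ?)",
        [(1, "note", "alpha"), (2, "entity", "Bob"), (3, "note", "gamma"), (4, "entity", "Acme")],
    )
    # Simulate a stale process: one correct row plus one leaked entity row.
    conn.executemany("INSERT INTO memories_fts (rowid, content) VALUES (?, ?)", [(1, "alpha"), (2, "Bob")])
    conn.commit()
    return conn
```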

Per Sonnet code review of HEAD~6..HEAD this session.

CRITICAL #1: memory_client.py:710: HNSW skip-reload now also requires the on-disk file mtime to match the one tracked at last load. Without this, a separate process (MCP server, ingest cron, bench) that persists new HNSW state would be ignored, and we would silently serve stale dense results across plugin/cron/MCP-server boundaries.

CRITICAL #2: cleanup_onedrive_dupes.py:53: added WHERE h IS NOT NULL to the GROUP BY query. Without it, all rows missing content_hash are lumped into one "NULL group" and treated as duplicates. The live 2026-05-02 run was lucky; future re-runs against partially tagged sources would silently delete distinct content.

MEDIUM #3: nm_recall_mcp.py: flipped use_hnsw=False → True. Without HNSW the MCP server linear-scans 12k+ memories per dense-channel call, inflating p50 latency. The R@5=0.82 bench config requires HNSW.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
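
Sketches of the two CRITICAL fixes, under stated assumptions: the HNSW check compares a tracked `st_mtime_ns` with the file on disk (function name and arguments are hypothetical), and the duplicate query runs against a hypothetical `files(id, content_hash)` table. The second half shows why the NULL filter matters: GROUP BY collects every NULL into a single group.

```python
import os
import sqlite3
from typing import List, Optional, Tuple


def should_skip_hnsw_reload(index_loaded: bool, tracked_mtime_ns: Optional[int], path: str) -> bool:
    """Skip only if an index is loaded AND the file on disk is the one we loaded.

    Another process that rewrites the file changes its mtime, which forces a
    reload instead of serving stale dense results.
    """
    if not index_loaded or tracked_mtime_ns is None:
        return False
    return os.stat(path).st_mtime_ns == tracked_mtime_ns


def duplicate_groups(conn: sqlite3.Connection, skip_null: bool = True) -> List[Tuple[Optional[str], int]]:
    """Return (content_hash, count) for hashes seen more than once."""
    where = "WHERE content_hash IS NOT NULL " if skip_null else ""
    sql = (
        "SELECT content_hash, COUNT(*) FROM files "
        + where
        + "GROUP BY content_hash HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()
```

With `skip_null=False`, two untagged rows come back as a "duplicate" group keyed by NULL, which is exactly the pattern that would delete distinct content.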

Per first-run reconciliation reviewer findings on python/mssql_store.py:

LOW #1 [smell]: get_all() is unbounded. Added a default limit=100_000 with an explicit TOP N in the SQL. This differs in shape from SQLiteStore.get_all() (a row iterator), but pyodbc's fetchall() on MSSQL materializes all rows into RAM, so the cap matters for big substrates. Callers can raise limit explicitly if needed.

LOW #2 [dead]: CREATE DATABASE block in SCHEMA_SQL. Removed the IF NOT EXISTS CREATE DATABASE NeuralMemory + USE NeuralMemory + GO blocks. _ensure_schema explicitly skipped them via the 'GO not in stmt and CREATE DATABASE not in stmt' guard, so they never executed. Misleading dead code; simplified.

LOW #3 [smell]: now-redundant filter conditions. Since CREATE DATABASE + GO are gone from SCHEMA_SQL, the filter becomes just `if stmt:`. Simplified _ensure_schema accordingly. Also added an inline note about the autocommit=True / .commit() pattern (the reviewer's other [smell] finding): the .commit() calls are kept as readability anchors but documented as no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
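
A sketch of the two simplifications, assuming hypothetical column names and that SCHEMA_SQL is split into statements on ';'. The limit is validated before it is interpolated into TOP (N).

```python
from typing import List

DEFAULT_GET_ALL_LIMIT = 100_000


def build_get_all_sql(limit: int = DEFAULT_GET_ALL_LIMIT) -> str:
    """Bounded get_all(): an explicit TOP (N) caps how much fetchall() materializes."""
    # Validate before interpolating; bool is rejected even though it subclasses int.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        raise ValueError(f"limit must be a positive int, got {limit!r}")
    # Hypothetical column list.
    return f"SELECT TOP ({limit}) id, kind, content FROM memories ORDER BY id"


def schema_statements(schema_sql: str) -> List[str]:
    """With CREATE DATABASE / USE / GO gone, the only filter left is 'non-empty'."""
    statements = [stmt.strip() for stmt in schema_sql.split(";")]
    return [stmt for stmt in statements if stmt]
```

SQL Server also accepts a parameter inside `TOP ( )`, so `cursor.execute("SELECT TOP (?) ...", limit)` with pyodbc would avoid string interpolation entirely.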

…ctlY#3) Per holistic-reviewer-round-1 finding: nm_recall_mcp.py was 365 LOC of cross-agent JSON-RPC surface (exposed to Codex + Hermes + Claude Code) shipped with zero tests. This closes the coverage gap.

Smoke contracts:
- test_initialize_returns_protocol_version
- test_tools_list_returns_five_tools (nm_recall, nm_sparse_search, nm_remember, nm_status, nm_audit)
- test_nm_status_tool_call_returns_substrate_stats (verifies the new memories_fts_count + non_entity_memories_count fields from the reviewer-round-6 reshape)
- test_unknown_tool_returns_jsonrpc_error (-32601)
- test_unknown_method_returns_jsonrpc_error (-32601)

Tests skip cleanly if the substrate/script is not present (CI-safe). They are subprocess-based: they spawn the actual MCP server over the stdio protocol and verify real responses, not just imported logic. This closes the cross-agent surface coverage gap before peers actually adopt nm-recall in their MCP configs (Valiendo just ACK'd registering it in her Hermes profile via approval_7bf71a60).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
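
An in-process sketch of the JSON-RPC contract those smoke tests check: `initialize` returns a protocol version, `tools/list` returns five tools, and unknown tools or methods yield error -32601. The protocol version string and the stub `tools/call` result are assumptions; the real tests exercise nm_recall_mcp.py as a subprocess over stdio, where each message is one line of JSON.

```python
import json
from typing import Any, Dict

PROTOCOL_VERSION = "2024-11-05"  # assumption: the server's actual value is not stated
TOOLS = ["nm_recall", "nm_sparse_search", "nm_remember", "nm_status", "nm_audit"]
METHOD_NOT_FOUND = -32601


def _error(req_id: Any, code: int, message: str) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def _result(req_id: Any, result: Dict[str, Any]) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": req_id, "result": result}


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    req_id = request.get("id")
    method = request.get("method")
    if method == "initialize":
        return _result(req_id, {"protocolVersion": PROTOCOL_VERSION})
    if method == "tools/list":
        return _result(req_id, {"tools": [{"name": name} for name in TOOLS]})
    if method == "tools/call":
        name = (request.get("params") or {}).get("name")
        if name not in TOOLS:
            return _error(req_id, METHOD_NOT_FOUND, f"Unknown tool: {name}")
        return _result(req_id, {"content": []})  # stub: real tools do work here
    return _error(req_id, METHOD_NOT_FOUND, f"Method not found: {method}")


def handle_line(line: str) -> str:
    """One stdio frame: a JSON request line in, a JSON response line out."""
    return json.dumps(handle(json.loads(line)))
```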

…ion (S2 + Opus race fix) The canonical DB had PRAGMA user_version=0 and no evidence-identity authority. record_evidence_artifact previously deduped via a JSON scan of metadata, which is not a real DB-level guard.

S2 packet (NM-builder lane, dispatched by Opus):
- python/schema_upgrade.py: additive evidence_ledger table (evidence_id PK, memory_id, evidence_type, source_system, source_record_id, status, inserted_at, updated_at, metadata_hash) with UNIQUE indexes on (source_system, source_record_id) and (evidence_type, source_record_id). PRAGMA user_version 0 → 1. The migration is idempotent (CREATE IF NOT EXISTS, no DROP/ALTER).
- python/ae_workflow_helpers.py:
  * _ledger_reserve uses INSERT OR IGNORE for an atomic claim.
  * Winner: mem.remember(), then _ledger_set_memory_id patches the memory_id in.
  * Loser: re-reads the ledger; if memory_id is still NULL, falls back to the legacy json_extract path (or a fresh remember as a last resort).
  * The helper return shape {memory_id, evidence_id, inserted} is preserved exactly across all 4 evidence helpers.
  * Pre-upgrade DBs and non-SQLite stores transparently fall back to the legacy json_extract scan.

Opus race-fix follow-up (commit-time): S2's loser path returned None when memory_id was still NULL (winner mid-flight), which made all 8 threads in the race test fall through to mem.remember(). That exposed a non-thread-safe iteration in the HNSW/connection_graph internals (RuntimeError: dictionary changed size during iteration). Wrapping the full pipeline in store._lock would deadlock, since mem.remember re-acquires the same non-reentrant Lock internally. Fix: the loser polls _ledger_lookup 40 × 25 ms (1 s budget), releasing store._lock between polls so the winner can complete; it falls through to a fresh remember only if the budget is exhausted. The race test now passes consistently.

Tests: 12 schema + 26 evidence (incl. 8-thread race test) + 26 sent-pdf consumer = 64/64 pass. Closes LIVE_FEED Active P0 #3 (REPO_DB_CONTRACT_GAP: evidence-identity DB guard).

The schema upgrade is NOT auto-applied to the canonical DB; Tito's ACK gates that. The pre-existing JSON-scan path remains for un-upgraded DBs.

Synth contract: LIVE_FEED 2026-05-03T10:41:47Z, S2 dispatch. Evidence packet: ~/.neural_memory/sonnet-packets/2026-05-03/S2-result.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
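
A minimal `sqlite3` sketch of the reserve-then-poll pattern. The helper names (`ledger_reserve`, `ledger_set_memory_id`, `ledger_wait_for_winner`) are hypothetical stand-ins for _ledger_reserve, _ledger_set_memory_id and _ledger_lookup, and the column types are assumptions; the column list and UNIQUE indexes follow the commit. INSERT OR IGNORE against the UNIQUE indexes decides the winner atomically, and the loser holds the lock only for each short lookup.

```python
import sqlite3
import threading
import time
from typing import Optional

# Columns follow the commit message; types and defaults are assumptions.
LEDGER_DDL = """
CREATE TABLE IF NOT EXISTS evidence_ledger (
    evidence_id TEXT PRIMARY KEY, memory_id INTEGER,
    evidence_type TEXT NOT NULL, source_system TEXT NOT NULL,
    source_record_id TEXT NOT NULL, status TEXT,
    inserted_at TEXT DEFAULT CURRENT_TIMESTAMP, updated_at TEXT, metadata_hash TEXT
);
CREATE UNIQUE INDEX IF NOT EXISTS ux_ledger_source ON evidence_ledger (source_system, source_record_id);
CREATE UNIQUE INDEX IF NOT EXISTS ux_ledger_type ON evidence_ledger (evidence_type, source_record_id);
PRAGMA user_version = 1;
"""


def ledger_migrate(conn: sqlite3.Connection) -> None:
    """Additive, idempotent migration (IF NOT EXISTS everywhere, no DROP/ALTER)."""
    conn.executescript(LEDGER_DDL)


def ledger_reserve(conn: sqlite3.Connection, lock: threading.Lock, evidence_id: str,
                   evidence_type: str, source_system: str, source_record_id: str) -> bool:
    """Atomic claim: True for the winner, False if a UNIQUE index already holds the key."""
    with lock:
        cur = conn.execute(
            "INSERT OR IGNORE INTO evidence_ledger "
            "(evidence_id, evidence_type, source_system, source_record_id, status) "
            "VALUES (?, ?, ?, ?, 'reserved')",
            (evidence_id, evidence_type, source_system, source_record_id),
        )
        conn.commit()
        return cur.rowcount == 1


def ledger_set_memory_id(conn: sqlite3.Connection, lock: threading.Lock,
                         source_system: str, source_record_id: str, memory_id: int) -> None:
    """Winner path: patch in the memory_id once remember() has finished."""
    with lock:
        conn.execute(
            "UPDATE evidence_ledger SET memory_id = ?, status = 'linked', "
            "updated_at = CURRENT_TIMESTAMP WHERE source_system = ? AND source_record_id = ?",
            (memory_id, source_system, source_record_id),
        )
        conn.commit()


def ledger_wait_for_winner(conn: sqlite3.Connection, lock: threading.Lock,
                           source_system: str, source_record_id: str,
                           attempts: int = 40, interval: float = 0.025) -> Optional[int]:
    """Loser path: poll for the winner's memory_id (40 x 25 ms = 1 s by default)."""
    for _ in range(attempts):
        with lock:  # held only for the lookup, never across the sleep
            row = conn.execute(
                "SELECT memory_id FROM evidence_ledger WHERE source_system = ? AND source_record_id = ?",
                (source_system, source_record_id),
            ).fetchone()
        if row is not None and row[0] is not None:
            return row[0]
        time.sleep(interval)
    return None  # budget exhausted: the caller falls back to a fresh remember
```

Releasing the lock between polls is the point of the design: the winner needs the same non-reentrant Lock to finish its write, so a loser that held it for the whole wait would stall the very thread it is waiting on.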

itsXactlY wants to merge 1 commit into master from feat/apple-silicon-support