feat(apple-silicon): Neural Memory native macOS support#3
Open
itsXactlY wants to merge 1 commit into
Adds first-class Apple Silicon (M1-M4) support for Neural Memory embeddings and C++ SIMD acceleration. Previously, embeddings on Mac fell back to the weaker TF-IDF+SVD (384d); they now use bge-m3 (1024d) via the Metal GPU.

embed_provider.py:
- MPS (Metal Performance Shaders) GPU detection, device priority CUDA > MPS > CPU
- EMBED_MODEL env var for model selection
- Unified memory: no VRAM check needed on Mac

cpp_bridge.py:
- Platform-aware .dylib/.so library loading
- Graceful fallback with macOS build instructions

neural_memory.py:
- Dynamic dim from embedder (replaces hardcoded dim=384)

C++ SIMD (simd.h, simd_engine.cpp):
- ARM NEON intrinsics for all SIMD functions (dot_product, cosine_similarity, l2_norm, add, hadamard, scale, fmadd, weighted_add, zero)
- 3-way dispatch: AVX2 (x86) > NEON (ARM64) > scalar fallback
- CPUID guarded behind x86 platform check

CMakeLists.txt:
- Apple Silicon detection (arm64 AND APPLE)
- ARM: -march=armv8-a+simd
- x86: -march=x86-64-v2 -mavx2 -mfma
- Conditional linking (skip pthread/stdc++ on macOS)
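The device priority and the platform-aware library naming described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in embed_provider.py or cpp_bridge.py: the function names `pick_device`, `detect_device` and `shared_library_name` are hypothetical, and PyTorch is treated as optional so the sketch still runs without it.

```python
import platform
from typing import Optional

try:
    import torch  # optional: without PyTorch the sketch reports "cpu"
except ImportError:
    torch = None


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the CUDA > MPS > CPU priority to two availability flags."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch (if installed) and return the preferred device name."""
    if torch is None:
        return "cpu"
    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


def shared_library_name(stem: str, system: Optional[str] = None) -> str:
    """Return 'lib<stem>.dylib' on macOS and 'lib<stem>.so' on other systems."""
    system = system or platform.system()  # "Darwin" on macOS
    suffix = ".dylib" if system == "Darwin" else ".so"
    return f"lib{stem}{suffix}"
```

Keeping the priority rule in a pure function (`pick_device`) makes it testable on machines that have neither CUDA nor Metal.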
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request on May 1, 2026
Reviewer itsXactlY#3 caught a recurring bug: my fix in bcd72db was a forward-guard only. Hermes (running at PID 55181 since 12:47; it was PID 19835 at session start) hadn't reloaded the updated memory_client.py module, so its in-memory copy still inserted entity rows into FTS5. Result: 8 stale entity rows in the FTS index by review time (6 at the original audit, +2 from continued Hermes saves).

The forward-guard is right but insufficient when long-running processes hold stale code. This commit adds a self-healing defensive cleanup: SchemaUpgrade._ensure_fts5() now DELETEs any kind='entity' rows from memories_fts on every invocation. Combined with the SQLiteStore.__init__ hook from P7C2, this means every fresh NeuralMemory() instance cleans the index. Backfill is also now kind-aware (skips entities).

python/schema_upgrade.py: _ensure_fts5() extended with defensive DELETE + kind-filtered backfill. ~10 LOC added.

Verified on live DB: ran schema_upgrade.py against ~/.neural_memory/memory.db; sync delta dropped 8 → 0. Tests still pass (5/5 schema + 10/10 sparse_temporal).

Trade-off: the defensive DELETE on every init is O(entity_count) extra work, which is negligible at AE scale (few entities ever).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
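A minimal sketch of the defensive DELETE plus kind-filtered backfill described in this commit message follows. The table names memories_fts and the kind='entity' filter come from the message; everything else (the `memories` table layout, the `content` and `kind` columns on the index, the helper names) is assumed for illustration and may differ from the real _ensure_fts5().

```python
import sqlite3


def create_demo_schema(conn: sqlite3.Connection) -> None:
    """Create an assumed, simplified schema for the demonstration."""
    conn.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, kind TEXT)"
    )
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE memories_fts USING fts5(content, kind UNINDEXED)"
        )
    except sqlite3.OperationalError:
        # SQLite built without FTS5: a plain table is enough to show the cleanup logic.
        conn.execute("CREATE TABLE memories_fts (content TEXT, kind TEXT)")


def ensure_fts_clean(conn: sqlite3.Connection) -> int:
    """Remove stale entity rows from the index, then backfill non-entity rows.

    Returns the number of entity rows that were removed.
    """
    stale = conn.execute(
        "SELECT COUNT(*) FROM memories_fts WHERE kind = 'entity'"
    ).fetchone()[0]
    # Defensive cleanup: runs on every call, even if a forward-guard already exists.
    conn.execute("DELETE FROM memories_fts WHERE kind = 'entity'")
    # Kind-aware backfill: index only non-entity memories that are missing.
    conn.execute(
        """
        INSERT INTO memories_fts (rowid, content, kind)
        SELECT m.id, m.content, m.kind
        FROM memories AS m
        WHERE m.kind != 'entity'
          AND m.id NOT IN (SELECT rowid FROM memories_fts)
        """
    )
    conn.commit()
    return stale
```

Because the function is idempotent, a second call on a clean index removes nothing and inserts nothing, which is what makes it safe to run on every init.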
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request on May 2, 2026
Per Sonnet code review of HEAD~6..HEAD this session.

CRITICAL itsXactlY#1 (memory_client.py:710): HNSW skip-reload now also requires the disk file mtime to match what we tracked at last load. Without this, a separate process (MCP server, ingest cron, bench) that persists new HNSW state would be ignored, and we'd silently serve stale dense results across plugin/cron/MCP-server boundaries.

CRITICAL itsXactlY#2 (cleanup_onedrive_dupes.py:53): added WHERE h IS NOT NULL to the GROUP BY query. Without it, all rows missing content_hash get lumped into one "NULL group" and treated as duplicates. The live 2026-05-02 run was lucky; future re-runs against partially-tagged sources would silently delete distinct content.

MEDIUM itsXactlY#3 (nm_recall_mcp.py): flipped use_hnsw=False → True. Without HNSW the MCP server linear-scans 12k+ memories per dense channel call, blowing up p50. The R@5=0.82 bench config requires HNSW.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
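The "NULL group" hazard from CRITICAL itsXactlY#2 can be reproduced with a small sketch. The `memories` table, its columns and the function name below are assumptions made for the demonstration; only the content_hash grouping and the IS NOT NULL guard come from the commit message.

```python
import sqlite3
from typing import List, Optional, Tuple


def find_duplicate_groups(
    conn: sqlite3.Connection, skip_null_hashes: bool = True
) -> List[Tuple[Optional[str], int]]:
    """Return (hash, row_count) for every hash shared by more than one row."""
    # The guard excludes rows with no hash; without it they form one big group,
    # because GROUP BY treats all NULLs as equal.
    guard = "WHERE content_hash IS NOT NULL" if skip_null_hashes else ""
    sql = f"""
        SELECT content_hash AS h, COUNT(*) AS n
        FROM memories
        {guard}
        GROUP BY content_hash
        HAVING COUNT(*) > 1
        ORDER BY h
    """
    return [(row[0], row[1]) for row in conn.execute(sql)]


def build_demo_db() -> sqlite3.Connection:
    """Two true duplicates (hash 'a') and two distinct untagged rows (NULL hash)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, content_hash TEXT)"
    )
    conn.executemany(
        "INSERT INTO memories (content, content_hash) VALUES (?, ?)",
        [
            ("report v1", "a"),
            ("report v1", "a"),
            ("untagged note", None),
            ("different untagged note", None),
            ("unique doc", "b"),
        ],
    )
    conn.commit()
    return conn
```

With the guard, only the genuine duplicate hash is reported; without it, the two unrelated untagged rows also show up as a "duplicate" group and would be candidates for deletion.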
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request on May 2, 2026
Per first-run reconciliation reviewer findings on python/mssql_store.py:

LOW itsXactlY#1 [smell]: get_all() is unbounded
Added default limit=100_000 with explicit TOP N in SQL. Different shape from SQLiteStore.get_all() (row-iterator), but MSSQL's pyodbc fetchall() materializes all rows into RAM, so the cap matters for big substrates. Caller can bump limit explicitly if needed.

LOW itsXactlY#2 [dead]: CREATE DATABASE block in SCHEMA_SQL
Removed the IF NOT EXISTS CREATE DATABASE NeuralMemory + USE NeuralMemory + GO blocks. _ensure_schema explicitly skipped them via 'GO not in stmt and CREATE DATABASE not in stmt' guard, so they never executed. Misleading dead code; simplified.

LOW itsXactlY#3 [smell]: Now-redundant filter conditions
Since CREATE DATABASE + GO are gone from SCHEMA_SQL, the filter becomes just `if stmt:`. Simplified _ensure_schema accordingly.

Plus an inline note about the autocommit=True / .commit() pattern (reviewer's other [smell] finding): kept the .commit() calls as readability anchors but documented that they're no-ops.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request on May 2, 2026
…ctlY#3)

Per holistic-reviewer-round-1 finding: nm_recall_mcp.py was 365 LOC of cross-agent JSON-RPC surface (exposed to Codex + Hermes + Claude Code) shipped with zero tests. Closes the coverage gap.

Smoke contracts:
- test_initialize_returns_protocol_version
- test_tools_list_returns_five_tools (nm_recall, nm_sparse_search, nm_remember, nm_status, nm_audit)
- test_nm_status_tool_call_returns_substrate_stats (verifies new memories_fts_count + non_entity_memories_count fields from reviewer-round-6 reshape)
- test_unknown_tool_returns_jsonrpc_error (-32601)
- test_unknown_method_returns_jsonrpc_error (-32601)

Tests skip cleanly if substrate/script not present (CI-safe). Subprocess-based: spawns the actual MCP server stdio protocol and verifies real responses, not just imported logic.

Closes the cross-agent surface coverage gap before peers actually adopt nm-recall in their MCP configs (Valiendo just ACK'd registering it in her Hermes profile via approval_7bf71a60).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
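The -32601 contract checked by the last smoke test is the JSON-RPC 2.0 "Method not found" error. A minimal in-process sketch of that behavior is shown below; it is not the nm_recall_mcp.py implementation (the real tests spawn the server over stdio), and `handle_request` and the `ping` method are hypothetical names.

```python
import json
from typing import Any, Callable, Dict

METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 reserved code for "Method not found"


def handle_request(raw: str, methods: Dict[str, Callable[[dict], Any]]) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw)
    request_id = request.get("id")
    handler = methods.get(request.get("method"))
    if handler is None:
        # Unknown method: answer with an error object instead of raising.
        response = {
            "jsonrpc": "2.0",
            "id": request_id,
            "error": {"code": METHOD_NOT_FOUND, "message": "Method not found"},
        }
    else:
        response = {
            "jsonrpc": "2.0",
            "id": request_id,
            "result": handler(request.get("params") or {}),
        }
    return json.dumps(response)
```

A smoke test in the same spirit sends an unregistered method name and asserts on `error.code` rather than on the message text, since only the code is fixed by the specification.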
ernes-toe added a commit to ernes-toe/neural-memory that referenced this pull request on May 3, 2026
…ion (S2 + Opus race fix)
Canonical DB had PRAGMA user_version=0 and no evidence-identity authority.
record_evidence_artifact previously deduped via JSON-scan of metadata —
not a real DB-level guard.
S2 packet (NM-builder lane, dispatched by Opus):
- python/schema_upgrade.py: additive evidence_ledger table
(evidence_id PK, memory_id, evidence_type, source_system,
source_record_id, status, inserted_at, updated_at, metadata_hash) with
UNIQUE indexes on (source_system, source_record_id) and
(evidence_type, source_record_id). PRAGMA user_version 0 → 1.
Migration is idempotent (CREATE IF NOT EXISTS, no DROP/ALTER).
- python/ae_workflow_helpers.py:
* _ledger_reserve uses INSERT OR IGNORE for atomic claim.
* Winner: mem.remember() then _ledger_set_memory_id patches in.
* Loser: re-reads ledger; if memory_id NULL still, falls back to
legacy json_extract path (or fresh remember as last resort).
* Helper return shape {memory_id, evidence_id, inserted} preserved
exactly across all 4 evidence helpers.
* Pre-upgrade DBs and non-SQLite stores transparently fall back to
legacy json_extract scan.
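For illustration, an additive and idempotent migration of the kind described for python/schema_upgrade.py might be sketched as follows. The column names and the two UNIQUE index column pairs come from the commit message; the column types, the index names and the function name `upgrade_to_v1` are assumptions.

```python
import sqlite3


def upgrade_to_v1(conn: sqlite3.Connection) -> int:
    """Apply the additive evidence_ledger migration; safe to run repeatedly.

    Returns the PRAGMA user_version after the upgrade.
    """
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    # Additive only: CREATE ... IF NOT EXISTS, no DROP or ALTER.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS evidence_ledger (
            evidence_id      TEXT PRIMARY KEY,
            memory_id        TEXT,
            evidence_type    TEXT,
            source_system    TEXT,
            source_record_id TEXT,
            status           TEXT,
            inserted_at      TEXT,
            updated_at       TEXT,
            metadata_hash    TEXT
        )
        """
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_ledger_system_record "
        "ON evidence_ledger (source_system, source_record_id)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS idx_ledger_type_record "
        "ON evidence_ledger (evidence_type, source_record_id)"
    )
    if version < 1:
        conn.execute("PRAGMA user_version = 1")  # 0 -> 1, never downgraded
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

The UNIQUE indexes are what turn evidence identity into a DB-level guard: a second insert for the same (source_system, source_record_id) pair is rejected by SQLite itself rather than by a JSON scan in application code.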
Opus race-fix follow-up (commit-time): S2's loser path returned None
when memory_id was still NULL (winner mid-flight), which made all 8
threads in the race test fall through to mem.remember() — exposing a
non-thread-safe iteration in HNSW/connection_graph internals (RuntimeError:
dictionary changed size during iteration). Wrapping the full pipeline
in store._lock would deadlock since mem.remember re-acquires the same
non-reentrant Lock internally. Fix: loser polls _ledger_lookup with
40 × 25ms (1s budget), releasing store._lock between polls so the winner
can complete; falls through to a fresh remember only if the budget
exhausts. Race test now passes consistently.
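The claim-then-poll pattern described above can be sketched as below. This is an approximation under assumptions, not the code in ae_workflow_helpers.py: the function names are hypothetical, the table is reduced to the columns the sketch needs, and no store._lock is modeled (the real loser releases that lock between polls).

```python
import sqlite3
import time
from typing import Callable, Optional


def create_ledger(conn: sqlite3.Connection) -> None:
    """Reduced, assumed ledger schema with the uniqueness guard."""
    conn.execute(
        "CREATE TABLE evidence_ledger ("
        "evidence_id TEXT PRIMARY KEY, memory_id TEXT, "
        "source_system TEXT, source_record_id TEXT, "
        "UNIQUE (source_system, source_record_id))"
    )


def ledger_reserve(conn: sqlite3.Connection, evidence_id: str,
                   source_system: str, source_record_id: str) -> bool:
    """Atomically claim an evidence identity; True means this caller won."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO evidence_ledger "
        "(evidence_id, source_system, source_record_id) VALUES (?, ?, ?)",
        (evidence_id, source_system, source_record_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows changed means the insert was ignored


def ledger_lookup(conn: sqlite3.Connection, source_system: str,
                  source_record_id: str) -> Optional[str]:
    """Return the memory_id recorded for this identity, or None if not set yet."""
    row = conn.execute(
        "SELECT memory_id FROM evidence_ledger "
        "WHERE source_system = ? AND source_record_id = ?",
        (source_system, source_record_id),
    ).fetchone()
    return row[0] if row else None


def wait_for_winner(lookup: Callable[[], Optional[str]], attempts: int = 40,
                    delay_s: float = 0.025,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Loser path: poll up to attempts x delay_s (40 x 25 ms = 1 s by default)."""
    for _ in range(attempts):
        memory_id = lookup()
        if memory_id is not None:
            return memory_id
        sleep(delay_s)
    return None  # budget exhausted; the caller decides on a fallback
```

Passing `sleep` as a parameter keeps the polling loop testable without real delays; a `None` result corresponds to the case where the commit message falls through to a fresh remember.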
Tests: 12 schema + 26 evidence (incl. 8-thread race test) +
26 sent-pdf consumer = 64/64 pass.
Closes LIVE_FEED Active P0 itsXactlY#3 (REPO_DB_CONTRACT_GAP — evidence
identity DB guard).
Schema upgrade is NOT auto-applied to canonical DB. Tito ACK gates
that. Pre-existing JSON-scan path remains for un-upgraded DBs.
Synth contract: LIVE_FEED 2026-05-03T10:41:47Z, S2 dispatch.
Evidence packet: ~/.neural_memory/sonnet-packets/2026-05-03/S2-result.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>