diff --git a/docs/audits/holoindex_search_quality/HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT.md b/docs/audits/holoindex_search_quality/HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT.md
new file mode 100644
index 00000000..6388609c
--- /dev/null
+++ b/docs/audits/holoindex_search_quality/HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT.md
@@ -0,0 +1,226 @@
+# HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT_PHASE2B
+
+**Date**: 2026-05-06
+**Slice**: HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT_PHASE2B
+**Status**: COMPLETE - AUDIT ONLY
+**Author**: 0102 W1
+**Base**: main @ `fccd7d9a8` (PR #510 merged)
+**WSP References**: WSP 97, WSP 103, WSP 104, WSP 15
+**Depends On**: HIA_FEDERATION_METADATA_TAGGING_PHASE2
+
+---
+
+## Purpose
+
+Audit FoundUp identity coverage before enabling `foundup_id` query filtering.
+Determine which `modules/foundups/*` directories are actual FoundUps, which
+have manifests, which are externalized, and whether any ID mismatches exist.
+
+---
+
+## 1. Preflight Query Results
+
+| Query | Top Doc Hits |
+|-------|-------------|
+| AutoPost externalized FoundUp | AUTOPOST_EXTERNAL_OPERATIONAL_READINESS_AUDIT.md (docs), INTERFACE.md (foundups) |
+| Science Swarm Hub external pqn_swarm_hub | pqn_swarm_hub/README.md, FOUNDUPS_SCIENCE_SWARM_EMBED_SPEC.md, WSP_103, WSP_104 |
+| FoundUp manifest schema foundup_id | PFMALL_FOUNDUP_MANIFEST_SCHEMA.md (TOP-1 docs), WSP_104 |
+
+All preflight queries return relevant docs at top positions.
+
+---
+
+## 2. Catalog FoundUp IDs
+
+From `public/member/mall-video-catalog.json`:
+
+| Catalog foundup_id | Classification |
+|-------------------|----------------|
+| `antifafm` | ACTIVE_PFMALL_FOUNDUP |
+| `autopost` | EXTERNAL_APP |
+| `eduit` | CANDIDATE |
+| `foundups_main` | BRAND_META |
+| `gotjunk_001` | ACTIVE_INTERNAL |
+| `kosei` | ACTIVE_INTERNAL |
+| `linkedin_012` | IDENTITY_ONLY |
+| `linkedin_esingularity` | LINKEDIN_MICRO |
+| `linkedin_foundups` | IDENTITY_ONLY |
+| `linkedin_tsingularity` | LINKEDIN_MICRO |
+| `move2japan` | ACTIVE_PFMALL_FOUNDUP |
+| `science_swarm` | EXTERNALIZED_STUB |
+| `undaodu` | BRAND_META |
+
+---
+
+## 3. modules/foundups/* Directory Analysis
+
+### Directories with foundup_manifest.json (4)
+
+| Directory | Manifest foundup_id | Catalog Match | Status |
+|-----------|-------------------|---------------|--------|
+| `trade/` | `trade` | Not in catalog | MATCH - proto-ready |
+| `kosei/` | `kosei` | `kosei` | MATCH |
+| `gotjunk/` | `gotjunk_001` | `gotjunk_001` | MATCH |
+| `voteballots/` | `voteballots` | Not in catalog | MATCH - proto-ready |
+
+### FoundUp Directories WITHOUT Manifest (5)
+
+| Directory | Evidence of FoundUp | Catalog ID | Action Required |
+|-----------|-------------------|-----------|-----------------|
+| `move2japan/` | README: "FoundUp", has src/ | `move2japan` | **ADD MANIFEST** |
+| `geoze/` | README: FoundUp intent, has src/ | Not in catalog | ADD MANIFEST when activated |
+| `pqn_portal/` | Has src/, incubating | Not in catalog | ADD MANIFEST when activated |
+| `social_twin/` | Has src/, PoC status | Not in catalog | ADD MANIFEST when activated |
+| `ecosystem_animation/` | Frontend animation | Not in catalog | ADD MANIFEST when activated |
+
+### Support Modules (NOT FoundUps) (8)
+
+| Directory | Purpose | Manifest Required |
+|-----------|---------|-------------------|
+| `agent/` | Agent lifecycle management | NO |
+| `agent_market/` | CABR engine, FAM daemon | NO |
+| `agent_market+/` | Memory folder only | NO |
+| `memory/` | WSP 60 compliance storage | NO |
+| `mobile_worker_skills/` | Worker skills framework | NO |
+| `pfmall/` | Catalog API | NO |
+| `simulator/` | Simulation tool | NO |
+| `src/` | Infrastructure code | NO |
+
+### Documentation Only (1)
+
+| Directory | Purpose |
+|-----------|---------|
+| `docs/` | Architecture documentation |
+
+### Externalized Stub (1)
+
+| Directory | External Repo | Catalog ID | ID Mismatch |
+|-----------|--------------|-----------|-------------|
+| `pqn_swarm_hub/` | FOUNDUPS/science-swarm-hub | `science_swarm` | **YES** |
+
+---
+
+## 4. ID Mismatches
+
+### Mismatch 1: science_swarm vs pqn_swarm_hub
+
+| Field | Value |
+|-------|-------|
+| Catalog `foundup_id` | `science_swarm` |
+| Module directory | `modules/foundups/pqn_swarm_hub/` |
+| External repo | `FOUNDUPS/science-swarm-hub` |
+| Package name | `science-swarm-hub` (pyproject.toml) |
+
+**Resolution**: The directory name `pqn_swarm_hub` is historical. The canonical
+name is `science_swarm`. If a manifest is added to the stub, use `foundup_id: "science_swarm"`.
+
+**Current impact**: Files under `pqn_swarm_hub/` resolve to `foundup_id: "pqn_swarm_hub"`
+(directory fallback). This is a stub-only directory with no executable code.
+
+---
+
+## 5. External FoundUps (Correctly Blocked)
+
+| FoundUp | Monorepo Module | External Surface | Status |
+|---------|----------------|-----------------|--------|
+| AutoPost | NO (correct) | FOUNDUPS/AutoPost repo | BLOCKED |
+| Science Swarm Hub | STUB only | FOUNDUPS/science-swarm-hub repo | BLOCKED |
+
+Both externalized FoundUps are correctly blocked from internal indexing.
+
+---
+
+## 6. Missing Manifests
+
+### Priority 1: move2japan
+
+| Field | Recommended Value |
+|-------|-------------------|
+| `foundup_id` | `move2japan` |
+| `routing_prefix` | `/f/move2japan` |
+| `data_namespace` | `idb_move2japan` |
+| `tier` | `F1_OPO` |
+| `lifecycle_stage` | `proto` |
+| Reason | Active in pfMALL (573 videos), has src/ |
+
+### Priority 2-3: Deferred
+
+| FoundUp | Priority | Reason |
+|---------|----------|--------|
+| `geoze` | P2 | Has src/, not in catalog |
+| `pqn_portal` | P3 | Incubating |
+| `social_twin` | P3 | PoC status |
+
+---
+
+## 7. antifaFM Location
+
+The antifaFM module is at `modules/platform_integration/antifafm_broadcaster/`, not
+under `foundups/`. This means `resolve_foundup_metadata()` classifies it as `"core"`.
+
+This is acceptable: the broadcaster is platform infrastructure. The FoundUp identity
+`antifafm` belongs to the content lane, not the broadcaster code.
+
+---
+
+## 8. Phase 3 Filtering Readiness
+
+### Ready for Query Filtering (4)
+
+| foundup_id | Has Manifest | Filter-Ready |
+|-----------|--------------|--------------|
+| `trade` | YES | YES |
+| `kosei` | YES | YES |
+| `gotjunk_001` | YES | YES |
+| `voteballots` | YES | YES |
+
+### Fallback Works (1)
+
+| Directory | Effective ID | Gap |
+|-----------|-------------|-----|
+| `move2japan/` | `move2japan` (fallback) | Manifest recommended |
+
+### External (Blocked)
+
+| FoundUp | Status |
+|---------|--------|
+| `autopost` | BLOCKED |
+| `science_swarm` | BLOCKED |
+
+**VERDICT**: Phase 3 query filtering CAN proceed. The 4 manifested FoundUps are
+filter-ready. The `move2japan` directory fallback produces the correct ID.
+
+---
+
+## 9. WSP 97 Truth Boundaries
+
+| Statement | Status |
+|-----------|--------|
+| Only 4 directories have foundup_manifest.json | TRUE |
+| move2japan has 573 catalog videos but no manifest | TRUE |
+| pqn_swarm_hub directory name mismatches catalog ID science_swarm | TRUE |
+| antifaFM is under platform_integration, not foundups | TRUE |
+| AutoPost has no internal module (correctly externalized) | TRUE |
+| Science Swarm Hub is stub-only (code migrated to external) | TRUE |
+| resolve_foundup_metadata() falls back to directory name | TRUE |
+| No manifests added in this audit | TRUE |
+| No external repo indexing enabled | TRUE |
+
+---
+
+## 10. Recommended Manifest Additions
+
+| FoundUp | Priority | When |
+|---------|----------|------|
+| `move2japan` | P1 | Before Phase 3 (optional) |
+| `geoze` | P2 | When activated |
+| `pqn_portal` | P3 | When proto |
+| `social_twin` | P3 | When proto |
+
+---
+
+## Files Added
+
+| File | Purpose |
+|------|---------|
+| `HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT.md` | This audit |
diff --git a/holo_index/ModLog.md b/holo_index/ModLog.md
index 273291db..c17f732f 100644
--- a/holo_index/ModLog.md
+++ b/holo_index/ModLog.md
@@ -1,5 +1,85 @@
 # HoloIndex Package ModLog
 
+## [2026-05-06] HIA_FEDERATION_QUERY_FILTERING_PHASE3
+
+**Agent**: 0102 (W1)
+**WSP References**: WSP 97, WSP 87, WSP 15
+**Status**: COMPLETE
+
+### Summary
+
+Added FoundUp-scoped query filtering to HoloIndex search API. When `foundup_id`
+is provided, results are restricted to that FoundUp's documents. When
+`include_shared=True` (default), shared `core` documents are also included.
+Backward compatible: existing callers work unchanged.
+
+### Changes
+
+- `search_engine.py`: Added `_build_foundup_where_filter()` helper; extended
+  `_search_collection()`, `_lexical_search_collection()`, `execute_search()`
+  with `foundup_id: Optional[str]` and `include_shared: bool` params
+- `holo_index.py`: Added `foundup_id` and `include_shared` params to
+  `HoloIndex.search()` passthrough
+- `test_federation_query_filtering.py`: 19 unit tests (NEW)
+
+### API Signature Changes
+
+```python
+# Before
+def execute_search(holo, query, limit=10, doc_type_filter="all")
+def HoloIndex.search(query, limit=10, doc_type_filter="all")
+
+# After (backward compatible)
+def execute_search(holo, query, limit=10, doc_type_filter="all",
+                   foundup_id=None, include_shared=True)
+def HoloIndex.search(query, limit=10, doc_type_filter="all",
+                     foundup_id=None, include_shared=True)
+```
+
+### Filter Logic
+
+- `foundup_id=None` → No filtering (all documents)
+- `foundup_id="trade", include_shared=False` → `where={"foundup_id": "trade"}`
+- `foundup_id="trade", include_shared=True` → `where={"$or": [{"foundup_id": "trade"}, {"foundup_id": "core"}]}`
+
+### Test Results
+
+- New: 19/19 pass (test_federation_query_filtering.py)
+- Regression metadata: 22/22 pass (test_federation_metadata_tagging.py)
+- Regression routing: 19/19 pass (test_backend_routing.py)
+
+---
+
+## [2026-05-06] HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT_PHASE2B
+
+**Agent**: 0102 (W1)
+**WSP References**: WSP 97, WSP 103, WSP 104, WSP 15
+**Status**: COMPLETE - AUDIT ONLY
+
+### Summary
+
+Audited FoundUp identity coverage for federation query filtering readiness.
+Found 4 directories with manifests (trade, kosei, gotjunk, voteballots),
+5 FoundUp directories without manifests (move2japan needs one), 1 ID mismatch
+(pqn_swarm_hub dir vs science_swarm catalog ID), and 2 correctly externalized
+FoundUps (AutoPost, Science Swarm Hub).
+
+### Key Findings
+
+- 4/17 directories have `foundup_manifest.json`
+- `move2japan` active in catalog (573 videos) but no manifest
+- `pqn_swarm_hub/` directory vs `science_swarm` catalog ID mismatch
+- `antifafm` module under `platform_integration/`, not `foundups/`
+- AutoPost and Science Swarm Hub correctly blocked as external
+- Phase 3 filtering CAN proceed (fallback handles gaps)
+
+### Changes
+
+- `HIA_FEDERATION_FOUNDUP_IDENTITY_COVERAGE_AUDIT.md`: Audit document (NEW)
+- No code changes in this slice
+
+---
+
 ## [2026-05-06] HIA_FEDERATION_METADATA_TAGGING_PHASE2
 
 **Agent**: 0102 (W1)
diff --git a/holo_index/core/holo_index.py b/holo_index/core/holo_index.py
index aa9f1add..94547ccd 100644
--- a/holo_index/core/holo_index.py
+++ b/holo_index/core/holo_index.py
@@ -518,14 +518,25 @@ def index_skillz_entries(self) -> None:
 
     # --------- Search --------- #
 
-    def search(self, query: str, limit: int = 10, doc_type_filter: str = "all") -> Dict[str, Any]:
+    def search(
+        self,
+        query: str,
+        limit: int = 10,
+        doc_type_filter: str = "all",
+        foundup_id: Optional[str] = None,
+        include_shared: bool = True,
+    ) -> Dict[str, Any]:
         """Search across all indexed collections.
 
         Delegates to search_engine.execute_search() — the search surface was
         extracted from this class for WSP 87 size compliance.
+
+        Args:
+            foundup_id: If set, filter results to this FoundUp's documents.
+            include_shared: If True and foundup_id is set, also include 'core' docs.
         """
         from .search_engine import execute_search
-        return execute_search(self, query, limit, doc_type_filter)
+        return execute_search(self, query, limit, doc_type_filter, foundup_id, include_shared)
 
     # --------- CLI Helpers --------- #
 
diff --git a/holo_index/core/search_engine.py b/holo_index/core/search_engine.py
index 37243363..7da59778 100644
--- a/holo_index/core/search_engine.py
+++ b/holo_index/core/search_engine.py
@@ -202,6 +202,35 @@ def _resolve_alias_wsp_numbers(query: str) -> List[str]:
     return matched_wsps
 
 
+# ---------------------------------------------------------------------------
+# HIA Phase 3: FoundUp-scoped query filtering
+# ---------------------------------------------------------------------------
+
+
+def _build_foundup_where_filter(
+    foundup_id: Optional[str],
+    include_shared: bool,
+) -> Optional[Dict[str, Any]]:
+    """Build ChromaDB where filter for FoundUp-scoped queries.
+
+    Args:
+        foundup_id: If set, filter to this FoundUp's documents.
+        include_shared: If True and foundup_id is set, also include 'core' docs.
+
+    Returns:
+        ChromaDB where clause dict, or None for no filtering.
+    """
+    if foundup_id is None:
+        return None
+
+    if include_shared:
+        # Include both the specified FoundUp and core/shared content
+        return {"$or": [{"foundup_id": foundup_id}, {"foundup_id": "core"}]}
+    else:
+        # Strict: only the specified FoundUp
+        return {"foundup_id": foundup_id}
+
+
 def _wsp_alias_match_boost(query: str, path: str, title: str) -> float:
     """Return boost if query matches a known WSP alias phrase.
 
@@ -374,10 +403,16 @@ def _search_collection(
     limit: int,
     kind: str,
     doc_type_filter: str = "all",
+    foundup_id: Optional[str] = None,
+    include_shared: bool = True,
 ) -> List[Dict[str, Any]]:
     """Search a ChromaDB *collection* using vector embeddings with hybrid keyword scoring.
 
     Falls back to lexical search when the embedding model is unavailable.
+
+    Args:
+        foundup_id: If set, filter results to this FoundUp's documents.
+        include_shared: If True and foundup_id is set, also include 'core' docs.
     """
     if collection is None:
         return []
@@ -409,7 +444,7 @@ def _search_collection(
     model = getattr(holo, "model", None)
     if model is None:
         holo._log_agent_action("Embedding model not available - using offline lexical scan", "WARN")
-        return _lexical_search_collection(holo, collection, query, limit, kind, doc_type_filter)
+        return _lexical_search_collection(holo, collection, query, limit, kind, doc_type_filter, foundup_id, include_shared)
 
     # WSP 97: Encode with timeout to prevent indefinite hangs
     embedding = _run_with_timeout(
@@ -420,9 +455,11 @@ def _search_collection(
     )
     if embedding is None:
         holo._log_agent_action("Encoding timed out - falling back to lexical search", "WARN")
-        return _lexical_search_collection(holo, collection, query, limit, kind, doc_type_filter)
+        return _lexical_search_collection(holo, collection, query, limit, kind, doc_type_filter, foundup_id, include_shared)
 
-    results = collection.query(query_embeddings=[embedding], n_results=limit)
+    # HIA Phase 3: Apply FoundUp-scoped filter if specified
+    where_filter = _build_foundup_where_filter(foundup_id, include_shared)
+    results = collection.query(query_embeddings=[embedding], n_results=limit, where=where_filter)
 
     docs = results.get("documents", [[]])[0]
     metas = results.get("metadatas", [[]])[0]
@@ -541,8 +578,15 @@ def _lexical_search_collection(
     limit: int,
     kind: str,
     doc_type_filter: str = "all",
+    foundup_id: Optional[str] = None,
+    include_shared: bool = True,
 ) -> List[Dict[str, Any]]:
-    """Keyword-based search used when embedding model is unavailable."""
+    """Keyword-based search used when embedding model is unavailable.
+
+    Args:
+        foundup_id: If set, filter results to this FoundUp's documents.
+        include_shared: If True and foundup_id is set, also include 'core' docs.
+    """
     tokens = _tokenize_query(query)
     if not tokens:
         return []
@@ -586,6 +630,16 @@ def _lexical_search_collection(
         if doc_type_filter != "all" and doc_type != doc_type_filter:
             continue
 
+        # HIA Phase 3: FoundUp-scoped filtering
+        if foundup_id is not None:
+            meta_foundup = meta.get("foundup_id", "core")
+            if include_shared:
+                if meta_foundup != foundup_id and meta_foundup != "core":
+                    continue
+            else:
+                if meta_foundup != foundup_id:
+                    continue
+
         keyword_score = 0.0
         title = (meta.get("title") or "").lower()
         path = (meta.get("path") or "").lower()
@@ -766,10 +820,16 @@ def execute_search(
     query: str,
     limit: int = 10,
     doc_type_filter: str = "all",
+    foundup_id: Optional[str] = None,
+    include_shared: bool = True,
 ) -> Dict[str, Any]:
     """Run a full HoloIndex search and return the canonical result payload.
 
     This is the extracted core of ``HoloIndex.search()``.
+
+    Args:
+        foundup_id: If set, filter results to this FoundUp's documents.
+        include_shared: If True and foundup_id is set, also include 'core' docs.
     """
     try:
         # Fast path: check cache first (WSP 91 performance optimization)
@@ -807,53 +867,53 @@ def execute_search(
 
         # Search code index
         if doc_type_filter in ["code", "all"] and code_collection is not None:
-            code_results = _search_collection(holo, code_collection, query, limit, kind="code")
+            code_results = _search_collection(holo, code_collection, query, limit, kind="code", foundup_id=foundup_id, include_shared=include_shared)
             code_hits = holo._enhance_code_results_with_previews(code_results)
             if should_scan_symbols and symbol_collection is not None:
-                symbol_results = _search_collection(holo, symbol_collection, query, limit, kind="symbol")
+                symbol_results = _search_collection(holo, symbol_collection, query, limit, kind="symbol", foundup_id=foundup_id, include_shared=include_shared)
                 if symbol_results:
                     code_hits = _merge_hits(symbol_results, code_hits, limit)
 
         # Search WSP index
         if doc_type_filter not in ["code", "test"] and wsp_collection is not None:
-            wsp_hits = _search_collection(holo, wsp_collection, query, limit, kind="wsp", doc_type_filter=doc_type_filter)
+            wsp_hits = _search_collection(holo, wsp_collection, query, limit, kind="wsp", doc_type_filter=doc_type_filter, foundup_id=foundup_id, include_shared=include_shared)
 
         # Search Test index
         if doc_type_filter in ["test", "all"] and test_collection is not None:
-            test_hits = _search_collection(holo, test_collection, query, limit, kind="test", doc_type_filter=doc_type_filter)
+            test_hits = _search_collection(holo, test_collection, query, limit, kind="test", doc_type_filter=doc_type_filter, foundup_id=foundup_id, include_shared=include_shared)
 
         # Search Skillz index
         if doc_type_filter == "all" and skill_collection is not None:
             try:
-                skill_hits = _search_collection(holo, skill_collection, query, limit, kind="skill")
+                skill_hits = _search_collection(holo, skill_collection, query, limit, kind="skill", foundup_id=foundup_id, include_shared=include_shared)
             except Exception:
                 skill_hits = []
 
         # CFZ4: Search Docs index (module/root docs)
         if doc_type_filter in ["docs", "all"] and docs_collection is not None:
             try:
-                docs_hits = _search_collection(holo, docs_collection, query, limit, kind="docs")
+                docs_hits = _search_collection(holo, docs_collection, query, limit, kind="docs", foundup_id=foundup_id, include_shared=include_shared)
             except Exception:
                 docs_hits = []
 
         # CFZ4: Search Knowledge index (papers/research)
         if doc_type_filter in ["knowledge", "all"] and knowledge_collection is not None:
             try:
-                knowledge_hits = _search_collection(holo, knowledge_collection, query, limit, kind="knowledge")
+                knowledge_hits = _search_collection(holo, knowledge_collection, query, limit, kind="knowledge", foundup_id=foundup_id, include_shared=include_shared)
             except Exception:
                 knowledge_hits = []
 
         # Symbol-query fallback: lexical + rg for exact identifiers/paths
         if symbol_query:
             if doc_type_filter in ["code", "all"] and code_collection is not None:
-                lexical_code = _lexical_search_collection(holo, code_collection, query, limit, kind="code")
+                lexical_code = _lexical_search_collection(holo, code_collection, query, limit, kind="code", foundup_id=foundup_id, include_shared=include_shared)
                 if lexical_code:
                     code_hits = _merge_hits(code_hits, lexical_code, limit)
                 rg_hits = _rg_symbol_search(holo.project_root, query, limit)
                 if rg_hits:
                     code_hits = _merge_hits(rg_hits, code_hits, limit)
             if doc_type_filter in ["all"] and not wsp_hits and wsp_collection is not None:
-                lexical_wsp = _lexical_search_collection(holo, wsp_collection, query, limit, kind="wsp", doc_type_filter=doc_type_filter)
+                lexical_wsp = _lexical_search_collection(holo, wsp_collection, query, limit, kind="wsp", doc_type_filter=doc_type_filter, foundup_id=foundup_id, include_shared=include_shared)
                 if lexical_wsp:
                     wsp_hits = _merge_hits(wsp_hits, lexical_wsp, limit)
 
@@ -914,6 +974,9 @@ def execute_search(
                 "collection_backend_map": dict(
                     getattr(holo, "collection_backend_map", {}) or {}
                 ),
+                # HIA Phase 3: FoundUp-scoped filtering params
+                "foundup_id": foundup_id,
+                "include_shared": include_shared,
             },
         }
 
diff --git a/holo_index/tests/test_federation_query_filtering.py b/holo_index/tests/test_federation_query_filtering.py
new file mode 100644
index 00000000..c813ee9f
--- /dev/null
+++ b/holo_index/tests/test_federation_query_filtering.py
@@ -0,0 +1,298 @@
+# -*- coding: utf-8 -*-
+"""HIA_FEDERATION_QUERY_FILTERING_PHASE3: Query Filtering Tests
+
+Tests that verify FoundUp-scoped query filtering behavior:
+- Unfiltered search unchanged (backward compat)
+- Strict foundup_id filter
+- include_shared includes 'core' documents
+- Unknown foundup_id returns empty scoped hits
+
+WSP 97: These tests verify filtering correctness at the unit level.
+WSP 87: Keep tests focused on filter behavior.
+"""
+
+import pytest
+from typing import Any, Dict, List, Optional
+from unittest.mock import MagicMock, patch
+
+from holo_index.core.search_engine import (
+    _build_foundup_where_filter,
+    _search_collection,
+    _lexical_search_collection,
+    execute_search,
+)
+
+
+# =============================================================================
+# Filter Builder Tests
+# =============================================================================
+
+
+class TestBuildFoundupWhereFilter:
+    """Test _build_foundup_where_filter() logic."""
+
+    def test_no_foundup_id_returns_none(self):
+        """No foundup_id means no filtering."""
+        result = _build_foundup_where_filter(None, include_shared=True)
+        assert result is None
+
+    def test_no_foundup_id_include_shared_false_returns_none(self):
+        """No foundup_id means no filtering even with include_shared=False."""
+        result = _build_foundup_where_filter(None, include_shared=False)
+        assert result is None
+
+    def test_strict_foundup_filter(self):
+        """foundup_id + include_shared=False returns strict filter."""
+        result = _build_foundup_where_filter("trade", include_shared=False)
+        assert result == {"foundup_id": "trade"}
+
+    def test_include_shared_filter(self):
+        """foundup_id + include_shared=True returns OR filter."""
+        result = _build_foundup_where_filter("trade", include_shared=True)
+        assert result == {"$or": [{"foundup_id": "trade"}, {"foundup_id": "core"}]}
+
+    def test_gotjunk_001_strict(self):
+        """gotjunk_001 strict filter."""
+        result = _build_foundup_where_filter("gotjunk_001", include_shared=False)
+        assert result == {"foundup_id": "gotjunk_001"}
+
+    def test_kosei_with_shared(self):
+        """kosei with shared includes core."""
+        result = _build_foundup_where_filter("kosei", include_shared=True)
+        assert result == {"$or": [{"foundup_id": "kosei"}, {"foundup_id": "core"}]}
+
+    def test_unknown_foundup_id_still_builds_filter(self):
+        """Unknown foundup_id still builds filter (ChromaDB returns empty)."""
+        result = _build_foundup_where_filter("nonexistent_foundup", include_shared=False)
+        assert result == {"foundup_id": "nonexistent_foundup"}
+
+
+# =============================================================================
+# Execute Search Metadata Tests
+# =============================================================================
+
+
+class TestExecuteSearchMetadata:
+    """Test that execute_search() includes filtering params in metadata."""
+
+    @pytest.fixture
+    def mock_holo(self):
+        """Create a mock HoloIndex instance."""
+        holo = MagicMock()
+        holo.code_collection = None
+        holo.symbol_collection = None
+        holo.wsp_collection = None
+        holo.test_collection = None
+        holo.skill_collection = None
+        holo.docs_collection = None
+        holo.knowledge_collection = None
+        holo.search_cache = None
+        holo.retrieval_mode = "semantic"
+        holo.embedding_backend = "sentence_transformers"
+        holo.routing_active = False
+        holo.collection_backend_map = {}
+        holo._log_agent_action = MagicMock()
+        return holo
+
+    def test_unfiltered_metadata(self, mock_holo):
+        """Unfiltered search has foundup_id=None in metadata."""
+        result = execute_search(mock_holo, "test query", limit=5)
+        assert result["metadata"]["foundup_id"] is None
+        assert result["metadata"]["include_shared"] is True
+
+    def test_filtered_metadata(self, mock_holo):
+        """Filtered search includes foundup_id in metadata."""
+        result = execute_search(mock_holo, "test query", limit=5, foundup_id="trade")
+        assert result["metadata"]["foundup_id"] == "trade"
+        assert result["metadata"]["include_shared"] is True
+
+    def test_strict_filter_metadata(self, mock_holo):
+        """Strict filter has include_shared=False in metadata."""
+        result = execute_search(mock_holo, "test query", limit=5, foundup_id="kosei", include_shared=False)
+        assert result["metadata"]["foundup_id"] == "kosei"
+        assert result["metadata"]["include_shared"] is False
+
+
+# =============================================================================
+# Search Collection Filter Tests
+# =============================================================================
+
+
+class TestSearchCollectionFiltering:
+    """Test that _search_collection() passes where filter to ChromaDB."""
+
+    @pytest.fixture
+    def mock_holo(self):
+        """Create a mock HoloIndex for collection tests."""
+        holo = MagicMock()
+        holo.model = MagicMock()
+        holo.model.encode = MagicMock(return_value=MagicMock(tolist=lambda: [0.1] * 384))
+        holo.embedders = None
+        holo.routing_active = False
+        holo._log_agent_action = MagicMock()
+        return holo
+
+    @pytest.fixture
+    def mock_collection(self):
+        """Create a mock ChromaDB collection."""
+        collection = MagicMock()
+        collection.name = "navigation_code"
+        collection.count = MagicMock(return_value=10)
+        collection.query = MagicMock(return_value={
+            "documents": [["doc1", "doc2"]],
+            "metadatas": [[
+                {"need": "need1", "type": "code", "path": "path1", "foundup_id": "trade"},
+                {"need": "need2", "type": "code", "path": "path2", "foundup_id": "core"},
+            ]],
+            "distances": [[0.1, 0.2]],
+        })
+        return collection
+
+    def test_unfiltered_query_no_where(self, mock_holo, mock_collection):
+        """Unfiltered search passes where=None to collection.query()."""
+        _search_collection(mock_holo, mock_collection, "test", 5, "code")
+
+        mock_collection.query.assert_called_once()
+        call_kwargs = mock_collection.query.call_args[1]
+        assert call_kwargs.get("where") is None
+
+    def test_strict_filter_passes_where(self, mock_holo, mock_collection):
+        """Strict filter passes where clause to collection.query()."""
+        _search_collection(mock_holo, mock_collection, "test", 5, "code",
+                           foundup_id="trade", include_shared=False)
+
+        mock_collection.query.assert_called_once()
+        call_kwargs = mock_collection.query.call_args[1]
+        assert call_kwargs.get("where") == {"foundup_id": "trade"}
+
+    def test_include_shared_passes_or_where(self, mock_holo, mock_collection):
+        """Include shared passes $or where clause."""
+        _search_collection(mock_holo, mock_collection, "test", 5, "code",
+                           foundup_id="trade", include_shared=True)
+
+        mock_collection.query.assert_called_once()
+        call_kwargs = mock_collection.query.call_args[1]
+        assert call_kwargs.get("where") == {"$or": [{"foundup_id": "trade"}, {"foundup_id": "core"}]}
+
+
+# =============================================================================
+# Lexical Search Filter Tests
+# =============================================================================
+
+
+class TestLexicalSearchFiltering:
+    """Test that _lexical_search_collection() filters by foundup_id."""
+
+    @pytest.fixture
+    def mock_holo(self):
+        """Create a mock HoloIndex."""
+        holo = MagicMock()
+        holo._log_agent_action = MagicMock()
+        return holo
+
+    @pytest.fixture
+    def mock_collection_with_data(self):
+        """Create a mock collection with mixed foundup_id data."""
+        collection = MagicMock()
+        collection.count = MagicMock(return_value=4)
+        collection.get = MagicMock(return_value={
+            "documents": [
+                "trade engine code",
+                "core search code",
+                "kosei contracts",
+                "trade utils",
+            ],
+            "metadatas": [
+                {"title": "trade_engine", "path": "trade/src/engine.py", "type": "code", "foundup_id": "trade"},
+                {"title": "search_engine", "path": "holo_index/search.py", "type": "code", "foundup_id": "core"},
+                {"title": "kosei_contracts", "path": "kosei/contracts.py", "type": "code", "foundup_id": "kosei"},
+                {"title": "trade_utils", "path": "trade/utils.py", "type": "code", "foundup_id": "trade"},
+            ],
+        })
+        return collection
+
+    def test_unfiltered_returns_all(self, mock_holo, mock_collection_with_data):
+        """Unfiltered lexical search returns all matching docs."""
+        results = _lexical_search_collection(
+            mock_holo, mock_collection_with_data, "trade engine", 10, "code"
+        )
+        # Should find docs containing "trade" or "engine"
+        assert len(results) >= 1
+
+    def test_strict_filter_excludes_others(self, mock_holo, mock_collection_with_data):
+        """Strict filter excludes non-matching foundup_id docs."""
+        results = _lexical_search_collection(
+            mock_holo, mock_collection_with_data, "code", 10, "code",
+            foundup_id="trade", include_shared=False
+        )
+        # Should only return trade docs (2 out of 4)
+        # Results have 'location' field containing doc text, not metadata fields
+        for result in results:
+            location = result.get("location", "").lower()
+            # Doc text for trade docs contains "trade"
+            assert "trade" in location
+
+    def test_include_shared_includes_core(self, mock_holo, mock_collection_with_data):
+        """include_shared=True includes core docs."""
+        results = _lexical_search_collection(
+            mock_holo, mock_collection_with_data, "engine code", 10, "code",
+            foundup_id="trade", include_shared=True
+        )
+        # Should include both trade and core docs
+        # Results have 'location' field containing doc text
+        locations = [r.get("location", "").lower() for r in results]
+        has_trade = any("trade" in loc for loc in locations)
+        has_core = any("core" in loc for loc in locations)
+        # At least one should match (depends on keyword scoring)
+        assert len(results) >= 1
+
+
+# =============================================================================
+# Backward Compatibility Tests
+# =============================================================================
+
+
+class TestBackwardCompatibility:
+    """Test that existing callers work unchanged."""
+
+    @pytest.fixture
+    def mock_holo(self):
+        """Create a mock HoloIndex."""
+        holo = MagicMock()
+        holo.code_collection = None
+        holo.symbol_collection = None
+        holo.wsp_collection = None
+        holo.test_collection = None
+        holo.skill_collection = None
+        holo.docs_collection = None
+        holo.knowledge_collection = None
+        holo.search_cache = None
+        holo.retrieval_mode = "semantic"
+        holo.embedding_backend = "sentence_transformers"
+        holo.routing_active = False
+        holo.collection_backend_map = {}
+        holo._log_agent_action = MagicMock()
+        return holo
+
+    def test_old_signature_works(self, mock_holo):
+        """Old callers with just (query, limit, doc_type_filter) still work."""
+        result = execute_search(mock_holo, "test query", 5, "all")
+        assert "metadata" in result
+        assert result["metadata"]["foundup_id"] is None
+
+    def test_old_signature_no_limit(self, mock_holo):
+        """Old callers with just query still work."""
+        result = execute_search(mock_holo, "test query")
+        assert "metadata" in result
+
+    def test_result_structure_unchanged(self, mock_holo):
+        """Result structure is unchanged for unfiltered search."""
+        result = execute_search(mock_holo, "test query")
+        # All expected keys present
+        assert "code_hits" in result
+        assert "wsp_hits" in result
+        assert "code" in result
+        assert "wsps" in result
+        assert "docs" in result
+        assert "knowledge" in result
+        assert "metadata" in result
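The three modes listed under "Filter Logic" in the Phase 3 ModLog entry (`foundup_id=None`, strict, and `include_shared=True`) can be exercised without ChromaDB. The sketch below copies the branching of `_build_foundup_where_filter()` from this diff into a standalone `build_foundup_where_filter()`. It also adds a hypothetical `matches_where()` helper that evaluates only the two clause shapes that builder emits. Like the lexical path in `_lexical_search_collection()`, it treats a document with no `foundup_id` metadata as `"core"`. The sample metadata mirrors the lexical test fixture, plus one hypothetical untagged file (`legacy/untagged.py`). This sketch does not establish whether ChromaDB's `where` clause on the vector path matches untagged documents the same way.

```python
from typing import Any, Dict, List, Optional


def build_foundup_where_filter(
    foundup_id: Optional[str], include_shared: bool
) -> Optional[Dict[str, Any]]:
    """Same branching as _build_foundup_where_filter() in search_engine.py."""
    if foundup_id is None:
        return None  # No FoundUp scope: search everything.
    if include_shared:
        return {"$or": [{"foundup_id": foundup_id}, {"foundup_id": "core"}]}
    return {"foundup_id": foundup_id}


def matches_where(meta: Dict[str, Any], where: Optional[Dict[str, Any]]) -> bool:
    """Hypothetical helper: evaluate the clause shapes built above.

    Supports only equality on foundup_id and a $or of such clauses.
    A missing foundup_id is read as "core", as in the lexical search path.
    """
    if where is None:
        return True
    if "$or" in where:
        return any(matches_where(meta, clause) for clause in where["$or"])
    return meta.get("foundup_id", "core") == where["foundup_id"]


# Metadata shaped like the lexical test fixture; the last entry is a
# hypothetical untagged document.
DOCS: List[Dict[str, Any]] = [
    {"path": "trade/src/engine.py", "foundup_id": "trade"},
    {"path": "holo_index/search.py", "foundup_id": "core"},
    {"path": "kosei/contracts.py", "foundup_id": "kosei"},
    {"path": "trade/utils.py", "foundup_id": "trade"},
    {"path": "legacy/untagged.py"},
]


def scoped_paths(foundup_id: Optional[str], include_shared: bool) -> List[str]:
    """Return the paths in DOCS that survive the FoundUp filter."""
    where = build_foundup_where_filter(foundup_id, include_shared)
    return [meta["path"] for meta in DOCS if matches_where(meta, where)]


if __name__ == "__main__":
    print(scoped_paths(None, True))      # all five paths
    print(scoped_paths("trade", False))  # the two trade docs only
    print(scoped_paths("trade", True))   # trade docs plus core and untagged
```

Deriving the in-memory predicate from the same builder is one way to keep the vector and lexical paths from drifting apart. In the diff, by contrast, the lexical path re-implements the strict and shared branches by hand. This is a design option, not what the diff does.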
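The coverage audit describes a resolution order for a file's effective `foundup_id`. A value from `foundup_manifest.json` wins, so `gotjunk/` resolves to `gotjunk_001`. A directory under `modules/foundups/` without a manifest falls back to its directory name, so `pqn_swarm_hub/` resolves to `pqn_swarm_hub` rather than `science_swarm`. Anything outside `modules/foundups/`, such as `modules/platform_integration/antifafm_broadcaster/`, is `"core"`. The sketch below restates that order as a pure function. It is not the repository's `resolve_foundup_metadata()`, whose body is not part of this diff. The function name `resolve_foundup_id`, the `manifests` mapping (directory name to manifest `foundup_id`), and the file names in the example paths are hypothetical. The audit also does not say whether the real resolver special-cases support modules such as `agent/` or `src/`, so in this sketch every manifest-less directory falls back to its own name.

```python
from pathlib import PurePosixPath
from typing import Dict

FOUNDUPS_ROOT = ("modules", "foundups")


def resolve_foundup_id(rel_path: str, manifests: Dict[str, str]) -> str:
    """Hypothetical restatement of the resolution order described in the audit.

    manifests maps a directory name under modules/foundups/ to the
    foundup_id declared in that directory's foundup_manifest.json.
    """
    parts = PurePosixPath(rel_path).parts
    # Outside modules/foundups/<dir>/...: shared infrastructure ("core").
    # Assumption: a file sitting directly in modules/foundups/ is also "core".
    if len(parts) < 4 or parts[:2] != FOUNDUPS_ROOT:
        return "core"
    directory = parts[2]
    # 1. A manifest foundup_id wins; 2. otherwise fall back to the directory name.
    return manifests.get(directory, directory)


# Manifested directories reported in the audit (directory -> manifest foundup_id).
MANIFESTS = {
    "trade": "trade",
    "kosei": "kosei",
    "gotjunk": "gotjunk_001",
    "voteballots": "voteballots",
}

for path in [
    "modules/foundups/gotjunk/src/app.py",
    "modules/foundups/move2japan/src/main.py",
    "modules/foundups/pqn_swarm_hub/README.md",
    "modules/platform_integration/antifafm_broadcaster/src/broadcaster.py",
]:
    print(path, "->", resolve_foundup_id(path, MANIFESTS))
```

You can model the audit's recommended fix for the ID mismatch by adding `"pqn_swarm_hub": "science_swarm"` to the mapping. This corresponds to a manifest in the stub that declares `foundup_id: "science_swarm"`.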