
refactor(scrape): Remove vestigial code.symbol dual-writes from language processors #63

@nicabarnimble


Summary

Remove redundant add_symbol() calls for entities that already have richer fact types. This is Phase 2 cleanup following the FTS5 deduplication fix (commit ef6f59c).

Background

The scrape pipeline has vestigial code from an abandoned architecture (Aug 2025):

  • Original design (b7c1ddf2): code_search table for text search alongside code_fingerprints for fingerprint similarity
  • Current state: code_search is NEVER QUERIED, but language processors still write to it via dual add_symbol() + add_function() calls

The FTS5 fix filters these at query time, but the writes still happen, wasting storage and cycles.

Current Behavior

Every function gets written 4 times per scrape:

  1. code.function event → eventlog
  2. function_facts table → materialized view
  3. code.symbol event → eventlog (REDUNDANT)
  4. code_search table → materialized view (NEVER QUERIED)

Proposed Changes

Phase 2a: Remove symbol writes for functions

  • Files: src/commands/scrape/code/languages/*.rs (~20 sites)
  • Remove data.add_symbol() calls that follow data.add_function()
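The Phase 2a change can be sketched as a before/after pair. This is a minimal illustration only: `ExtractedData`, its fields, the method signatures, and the `process_function_*` helpers are hypothetical stand-ins, since the issue names `add_symbol()` and `add_function()` but does not show their real definitions.

```rust
// Hypothetical stand-in for the scrape accumulator; the real type and
// method signatures in the language processors may differ.
#[derive(Default)]
struct ExtractedData {
    // Assumed to feed code.function events and the function_facts table.
    functions: Vec<String>,
    // Assumed to feed code.symbol events and the code_search table.
    symbols: Vec<(String, String)>,
}

impl ExtractedData {
    fn add_function(&mut self, name: &str) {
        self.functions.push(name.to_string());
    }

    fn add_symbol(&mut self, name: &str, kind: &str) {
        self.symbols.push((name.to_string(), kind.to_string()));
    }
}

// Before: every function is recorded twice, once as a rich fact and once
// as a generic symbol.
fn process_function_before(data: &mut ExtractedData, name: &str) {
    data.add_function(name);
    data.add_symbol(name, "function"); // redundant dual-write
}

// After: only the richer fact type is recorded.
fn process_function_after(data: &mut ExtractedData, name: &str) {
    data.add_function(name);
}
```

Under these assumptions, processing one function with the "after" variant leaves `symbols` empty while `functions` still holds the entry, which is the effect each of the call-site edits is meant to have.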

Phase 2b: Remove symbol writes for types

  • Same files, similar pattern for structs/enums/traits

Phase 2c: Remove dead code

  • Remove code_search table from schema (src/commands/scrape/code/database.rs)
  • Remove insert_symbols() writes to code_search
  • Consider deprecating code.symbol event type

Affected Files

File            Sites
rust.rs         ~3 (function + struct + enum)
go.rs           ~3
python.rs       ~3
typescript.rs   ~4
javascript.rs   ~3
c.rs            ~3
cpp.rs          ~4
cairo.rs        ~2
solidity.rs     ~4

Validation

  • Run patina scrape - should complete without errors
  • Run patina bench retrieval - MRR should not regress
  • Verify code_fts has no code.symbol entries (redundant check after Phase 1)
  • Storage reduction: eventlog should shrink ~40%
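The last two checks could be automated along these lines. This is a rough sketch, assuming the event types of a scrape have already been loaded into memory as strings; `count_event_type` and `shrink_percent` are hypothetical helpers, not existing patina functions, and how the rows are read from the eventlog is left out.

```rust
// Counts how many rows carry a given event type. The slice is a hypothetical
// in-memory view of the eventlog, one event type string per row.
fn count_event_type(event_types: &[&str], wanted: &str) -> usize {
    event_types.iter().filter(|t| **t == wanted).count()
}

// Percentage reduction in row count between two scrapes.
// Returns 0.0 for an empty baseline to avoid dividing by zero.
fn shrink_percent(before: usize, after: usize) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (before as f64 - after as f64) / before as f64 * 100.0
}
```

After the cleanup, `count_event_type(rows, "code.symbol")` is expected to be zero for function and type entities, and `shrink_percent` over the eventlog row counts from before and after can be compared against the ~40% estimate.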

References

  • Spec: layer/surface/build/spec-fts-deduplication.md
  • Session: 20251222-191614 (git archaeology)
  • Phase 1 fix: ef6f59c

Labels: enhancement (New feature or request)
