Summary
Remove redundant add_symbol() calls for entities that already have richer fact types. This is Phase 2 cleanup following the FTS5 deduplication fix (commit ef6f59c).
Background
The scrape pipeline has vestigial code from an abandoned architecture (Aug 2025):
- Original design (b7c1ddf2): code_search table for text search alongside code_fingerprints for fingerprint similarity
- Current state: code_search is NEVER QUERIED, but language processors still write to it via dual add_symbol() + add_function() calls
The FTS5 fix filters these at query time, but the writes still happen, wasting storage and cycles.
Current Behavior
Every function gets written 4 times per scrape:
1. code.function event → eventlog
2. function_facts table → materialized view
3. code.symbol event → eventlog (REDUNDANT)
4. code_search table → materialized view (NEVER QUERIED)
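The four writes listed above can be modeled with a small sketch. ScrapeData here is a hypothetical stand-in for the pipeline's collector, with in-memory vectors in place of the real eventlog and tables. Only the method names add_function() and add_symbol() come from this issue; their bodies are assumptions.

```rust
/// Hypothetical stand-in for the scrape collector (not the real type).
#[derive(Default)]
struct ScrapeData {
    eventlog: Vec<(String, String)>, // (event type, entity name)
    function_facts: Vec<String>,     // stands in for the function_facts table
    code_search: Vec<String>,        // stands in for the code_search table
}

impl ScrapeData {
    /// Assumed behavior: one event plus one materialized row.
    fn add_function(&mut self, name: &str) {
        self.eventlog.push(("code.function".to_string(), name.to_string()));
        self.function_facts.push(name.to_string());
    }

    /// Assumed behavior: one event plus one row in the never-queried table.
    fn add_symbol(&mut self, name: &str) {
        self.eventlog.push(("code.symbol".to_string(), name.to_string()));
        self.code_search.push(name.to_string());
    }

    /// Total number of writes across the eventlog and both tables.
    fn total_writes(&self) -> usize {
        self.eventlog.len() + self.function_facts.len() + self.code_search.len()
    }
}

fn main() {
    let mut data = ScrapeData::default();
    data.add_function("parse_file");
    data.add_symbol("parse_file"); // the redundant call this issue targets
    println!("writes for one function: {}", data.total_writes());
}
```

Under these assumptions, dropping the add_symbol() call halves the writes per function from four to two.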
Proposed Changes
Phase 2a: Remove symbol writes for functions
- Files: src/commands/scrape/code/languages/*.rs (~20 sites)
- Remove data.add_symbol() calls that follow data.add_function()
Phase 2b: Remove symbol writes for types
- Same files, similar pattern for structs/enums/traits
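A minimal sketch of what a call site might look like after Phases 2a and 2b. The Node enum, the process() function, and the add_type() method are hypothetical names chosen for illustration. The real language processors walk parser output and may use different fact methods for structs, enums, and traits.

```rust
/// Hypothetical, simplified stand-ins for parsed entities.
enum Node {
    Function(String),
    Struct(String),
    Enum(String),
    Trait(String),
}

/// Hypothetical collector; only add_function/add_symbol are named in the issue.
#[derive(Default)]
struct ScrapeData {
    functions: Vec<String>,
    types: Vec<String>,
    symbols: Vec<String>,
}

impl ScrapeData {
    fn add_function(&mut self, name: &str) {
        self.functions.push(name.to_string());
    }

    // add_type is an assumed name for the richer type fact.
    fn add_type(&mut self, name: &str) {
        self.types.push(name.to_string());
    }

    // Still defined, but no longer called for functions or types.
    #[allow(dead_code)]
    fn add_symbol(&mut self, name: &str) {
        self.symbols.push(name.to_string());
    }
}

fn process(nodes: &[Node], data: &mut ScrapeData) {
    for node in nodes {
        match node {
            Node::Function(name) => {
                data.add_function(name);
                // Phase 2a: the follow-up data.add_symbol(name) call is removed.
            }
            Node::Struct(name) | Node::Enum(name) | Node::Trait(name) => {
                data.add_type(name);
                // Phase 2b: the same removal applies to types.
            }
        }
    }
}

fn main() {
    let nodes = vec![
        Node::Function("parse_file".to_string()),
        Node::Struct("Parser".to_string()),
    ];
    let mut data = ScrapeData::default();
    process(&nodes, &mut data);
    println!(
        "functions={} types={} symbols={}",
        data.functions.len(),
        data.types.len(),
        data.symbols.len()
    );
}
```

Keeping the richer fact call as the single write per entity means each removal is a one-line deletion that can be reviewed per language file.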
Phase 2c: Remove dead code
- Remove code_search table from schema (src/commands/scrape/code/database.rs)
- Remove insert_symbols() writes to code_search
- Consider deprecating code.symbol event type
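If code.symbol is deprecated rather than deleted outright, existing eventlog entries still need to parse. One possible shape, sketched below, marks the variant with Rust's built-in deprecated attribute and skips it on replay. The EventType enum and the live_events() helper are hypothetical. The real pipeline may represent event types as plain strings.

```rust
/// Hypothetical event-type enum (an assumption, not the project's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventType {
    Function,
    /// Kept so that older eventlog entries still parse.
    #[deprecated(note = "redundant with richer fact events; do not emit")]
    Symbol,
}

impl EventType {
    /// Parse the event-type string stored in the eventlog.
    #[allow(deprecated)]
    fn parse(s: &str) -> Option<EventType> {
        match s {
            "code.function" => Some(EventType::Function),
            "code.symbol" => Some(EventType::Symbol),
            _ => None,
        }
    }

    /// True for event types that readers should skip during replay.
    #[allow(deprecated)]
    fn is_deprecated(self) -> bool {
        matches!(self, EventType::Symbol)
    }
}

/// Keep only events that are known and not deprecated.
fn live_events<'a>(log: &[&'a str]) -> Vec<&'a str> {
    log.iter()
        .copied()
        .filter(|s| EventType::parse(s).map_or(false, |e| !e.is_deprecated()))
        .collect()
}

fn main() {
    let log = ["code.function", "code.symbol", "code.function"];
    println!("{:?}", live_events(&log));
}
```

With the attribute in place, any remaining code that emits the variant produces a compiler warning, which can help locate call sites missed in Phases 2a and 2b.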
Affected Files
| File | Sites |
|------|-------|
| rust.rs | ~3 (function + struct + enum) |
| go.rs | ~3 |
| python.rs | ~3 |
| typescript.rs | ~4 |
| javascript.rs | ~3 |
| c.rs | ~3 |
| cpp.rs | ~4 |
| cairo.rs | ~2 |
| solidity.rs | ~4 |
Validation
- patina scrape - should complete without errors
- patina bench retrieval - MRR should not regress
- code_fts has no code.symbol entries (redundant check after Phase 1)
References
- Spec: layer/surface/build/spec-fts-deduplication.md
- Session: 20251222-191614 (git archaeology)
- Phase 1 fix: ef6f59c