feat(migrate): lodedb migrate toolkit for existing vector stores #36
Merged
Add a plan-first, non-destructive `lodedb migrate` sub-app that moves an existing LangChain, LlamaIndex, or mem0 store, or a direct pgvector table, onto a local LodeDB path. The flow is inspect, plan, dry-run, run, then validate; the source store is only ever read.

- Detection routes a framework owner (LangChain/LlamaIndex/mem0) ahead of any direct provider beneath it; direct pgvector is the first provider-first path. Ambiguous projects stop and ask for --framework/--provider.
- Source exporters: LangChain InMemoryVectorStore, LlamaIndex SimpleVectorStore, mem0 Qdrant, and direct pgvector (read-only, keyset-paginated, with information_schema/atttypmod column and dimension discovery).
- run writes to a temp dir, reopens it read-only, validates count/sample/text and the persisted-index audit, then renames into place. The default is a dry run; --write performs the migration. An existing target is never clobbered.
- Plans and the migration.json manifest are payload-free; connection strings are redacted and never persisted (re-supply with --source at run time).
- Switch snippets cover vector-preserve and text-owned SDK usage.
- Public agent pages docs/migrate-agent.md (framework) and docs/install-agent.md (provider-first router), linked from docs/integrations.md and the README.

audit_persisted_index_snapshots and the index loader now skip migration.json so a manifest can live in a migrated store directory, mirroring collection.json.

Closes #34
Closes #35
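The commit message describes the direct pgvector exporter as read-only and keyset-paginated. As a rough illustration of that pagination style only, here is a minimal sketch that uses the standard-library `sqlite3` module as a stand-in for a Postgres connection. The table name `items`, the columns `id` and `body`, and both function names are hypothetical and are not taken from the LodeDB code.

```python
import sqlite3
from typing import Iterator, Tuple


def make_demo_table() -> sqlite3.Connection:
    """Build a tiny in-memory table to paginate over (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO items (id, body) VALUES (?, ?)",
        [(i, f"doc-{i}") for i in range(1, 6)],
    )
    return conn


def iter_rows_keyset(
    conn: sqlite3.Connection, table: str, page_size: int = 2
) -> Iterator[Tuple[int, str]]:
    """Yield rows in id order, one bounded page at a time.

    Each page resumes strictly after the last id seen, so no OFFSET is
    needed and only SELECT statements are issued. The table name is
    interpolated directly, so it must come from trusted input in this sketch.
    """
    last_id = None
    while True:
        if last_id is None:
            cur = conn.execute(
                f"SELECT id, body FROM {table} ORDER BY id LIMIT ?",
                (page_size,),
            )
        else:
            cur = conn.execute(
                f"SELECT id, body FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            )
        page = cur.fetchall()
        if not page:
            return
        yield from page
        last_id = page[-1][0]
```

Keyset pagination keeps each query bounded regardless of how deep into the table the export is, which is presumably why it suits a large source table better than OFFSET-based paging.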
…eopen, batch writes

Address review findings on the migration runner:

- run no longer publishes a failed migration. After validation it records the manifest in the temp dir, and only moves the store into the target when validation passed. A failed run raises and leaves any existing target unchanged, so a failed --overwrite-target run cannot replace a valid store.
- Validation reopens vector-preserve targets with the effective dimension (the source-discovered dimension when the plan did not pin --embedding-dim) rather than the dim-8 fallback, so direct pgvector migrations that rely on dimension discovery validate correctly.
- Writes are buffered into bounded batches and flushed through the batch SDK APIs (add_many, add_vectors_many, adapter batch inserts), so a large migration pays one commit and one embedding pass per batch instead of per row.
- Validation compares a bounded sample of source rows against the target (id presence, scalar-metadata subset, stored text after reopen) and counts via count() instead of materializing every document.
- migration.json is written via durable_replace so a crash mid-write cannot leave a partial manifest.
- The published agent pages cross-link the website URLs rather than repo files.

Tests cover failed-validation-does-not-publish, failed-overwrite-leaves-target, discovered-dimension validation, and batched write behavior.
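The publish-only-after-validation behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the runner's actual code: `run_migration`, `write_json_atomically`, `MigrationFailed`, and the `rows.json` payload file are all hypothetical names, and the atomic write here only approximates whatever `durable_replace` does internally.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable, List


class MigrationFailed(RuntimeError):
    """Raised when validation of the staged store does not pass."""


def write_json_atomically(path: Path, payload: dict) -> None:
    """Write to a sibling temp file, fsync it, then atomically replace."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(payload, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # readers see the old file or the new one, never half


def run_migration(
    rows: List[dict], target: Path, validate: Callable[[Path], bool]
) -> None:
    """Stage the store in a temp dir; move it into place only if valid."""
    if target.exists():
        raise FileExistsError(f"refusing to clobber existing target: {target}")

    # Stage next to the target so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=target.name + ".", dir=target.parent))
    (staging / "rows.json").write_text(json.dumps(rows), encoding="utf-8")

    ok = validate(staging)
    # Decide the status before writing the manifest, so a failed run is
    # never recorded as "migrated" in the staging directory.
    status = "migrated" if ok else "failed"
    write_json_atomically(
        staging / "migration.json", {"status": status, "count": len(rows)}
    )

    if not ok:
        raise MigrationFailed(f"validation failed; staged copy kept at {staging}")
    os.rename(staging, target)  # publish only after validation passed
```

A failed run in this sketch leaves the staging directory behind for inspection and never touches the target path.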
…d overlap threshold

Follow-up review nits:

- The unpublished temp manifest of a failed run now reports status "failed". The run status is set before the inspection manifest is written, so target.tmp/migration.json no longer says "migrated" when validation failed.
- Drop the query-overlap / query-sample thresholds from the plan and the plan Markdown, and stop advertising a representative query-overlap check in the migrate docs. Representative-query overlap is not enforced yet, so the plan and docs only state the checks the runner actually runs (count parity, the id/metadata/text sample, stored-text recovery, persisted-index audit).
…son tables

Add a short call-out beneath the comparison tables pointing a project's coding assistant at the public install-agent page to migrate an existing store onto LodeDB.
Summary
- `lodedb migrate` sub-app: a plan-first, non-destructive toolkit that moves an existing LangChain, LlamaIndex, or mem0 store, or a direct pgvector table, onto a local LodeDB path. The flow is inspect, plan, dry-run, run, then validate. The source store is only ever read.
- Detection routes a framework owner (LangChain/LlamaIndex/mem0) ahead of any direct provider beneath it; direct pgvector is the first provider-first path. Ambiguous projects stop and ask for `--framework`/`--provider`.
- Source exporters: LangChain `InMemoryVectorStore`, LlamaIndex `SimpleVectorStore`, mem0 Qdrant, and direct pgvector (read-only, keyset-paginated, with `information_schema`/`atttypmod` column and dimension discovery).
- `run` writes to a temp dir, reopens it read-only, validates count/sample/text and the persisted-index audit, then renames into place. The default is a dry run; `--write` performs the migration. An existing target is never clobbered.
- Plans and the `migration.json` manifest are payload-free; connection strings are redacted and never persisted (re-supply with `--source` at run time).
- Switch snippets cover vector-preserve and text-owned SDK usage.
- Public agent pages `docs/migrate-agent.md` (framework) and `docs/install-agent.md` (provider-first router), linked from `docs/integrations.md` and the README. These still need publishing to `egoistmachines.com/lodedb/{migrate-agent,install-agent}`.
- `audit_persisted_index_snapshots` and the index loader skip `migration.json` so a manifest can live in a migrated store directory, mirroring `collection.json`.

Deferred to follow-ups: direct exporters for Qdrant (non-mem0), Chroma, LanceDB, sqlite-vec, and FAISS (detected and reported today, export pending); vector-preserve mode for LangChain/LlamaIndex; representative-query overlap during validate; live-server integration tests.
Closes #34
Closes #35
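The summary says connection strings are redacted and never persisted in plans or the manifest. One way such redaction could look is sketched below with the standard-library `urllib.parse`; the `redact_dsn` helper is hypothetical and is not the toolkit's actual function. It assumes a URL-style DSN, and it drops the query string entirely because query parameters can also carry credentials.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn: str) -> str:
    """Return a URL-style DSN with userinfo masked and the query removed.

    Hypothetical helper for illustration. IPv6 hosts would need their
    brackets restored, which this sketch does not handle.
    """
    parts = urlsplit(dsn)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Mask username and password together so neither is written to disk.
    has_userinfo = parts.username is not None or parts.password is not None
    netloc = f"***@{host}" if has_userinfo else host
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))


# A payload-free plan entry keeps only the redacted form; the real DSN
# would be supplied again at run time.
plan_entry = {
    "provider": "pgvector",
    "source": redact_dsn("postgresql://alice:s3cret@db.example.com:5432/app"),
}
```

Because only the redacted form is stored, a plan file written this way can be shared or committed without exposing credentials.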
Testing
- `uv run pytest tests/test_migrate_toolkit.py tests/test_migrate_pgvector_and_cli.py -q`: 34 passed
- `uv run pytest -q` (full suite): 535 passed, 2 skipped
- `uv run ruff check .`: clean