35 changes: 25 additions & 10 deletions docs/indexing/reindexing.mdx
@@ -14,13 +14,16 @@ As data is being added and a reindex operation is running, LanceDB will combine

Rather than dropping an existing index entirely and reindexing from scratch, LanceDB supports **incremental indexing**.

## Incremental Indexing
## Incremental Reindexing

<Badge color="green">OSS</Badge>
You can manually trigger an incremental indexing operation on updated data
using the `optimize()` method on a table.

In LanceDB OSS, you can manually trigger an incremental indexing operation using the `optimize()`
method on a table. This will perform compaction, pruning and updating of the index on the specified
table.
Table optimization performs three maintenance operations:

1. **Compaction**: merges small fragments into larger ones to improve read performance
2. **Pruning/Cleanup**: removes files from versions older than a retention window (7 days by default)
3. **Index update**: adds newly-ingested data to existing indexes

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
@@ -36,11 +39,23 @@ LanceDB Cloud/Enterprise support incremental reindexing through an automated bac
- While indexes are being rebuilt, queries use brute force methods on unindexed rows, which may temporarily increase latency. To avoid this, set `fast_search=True` to search only indexed data.
- Use `index_stats()` to view the number of unindexed rows. This will be zero when indexes are fully up-to-date.


<Tip>
**Performance and simplicity**

The benefit of using LanceDB Cloud & Enterprise is that they automate the reindexing process
and operate continuously in the background, minimizing the impact on latency under high loads.
In OSS, you must manually manage the reindexing cadence based on your data growth and performance needs.
</Tip>
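
One way to manage the OSS reindexing cadence is to compare the number of unindexed rows against the table size and call `optimize()` only once the gap grows past a threshold. The sketch below is a minimal illustration of that policy, not LanceDB code: `IndexSnapshot`, `should_optimize`, and the 10% default threshold are hypothetical names and values chosen for this example, and the row counts are assumed to come from `index_stats()`.

```python
from dataclasses import dataclass


@dataclass
class IndexSnapshot:
    """Hypothetical stand-in for the row counts reported by `index_stats()`."""
    num_indexed_rows: int
    num_unindexed_rows: int


def should_optimize(snapshot: IndexSnapshot, max_unindexed_fraction: float = 0.1) -> bool:
    """Return True when the unindexed share of rows exceeds the threshold.

    The threshold is an assumption for illustration; tune it to your own
    data growth and latency requirements.
    """
    total = snapshot.num_indexed_rows + snapshot.num_unindexed_rows
    if total == 0:
        # An empty table has nothing to reindex.
        return False
    return snapshot.num_unindexed_rows / total > max_unindexed_fraction


# 200 of 1,000 rows are unindexed (20%), so this policy would trigger optimize().
needs_work = should_optimize(IndexSnapshot(num_indexed_rows=800, num_unindexed_rows=200))
```

A scheduled job could evaluate a check like this after each ingestion batch and call `table.optimize()` only when it returns `True`.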

## Disk utilization

Compaction by itself does not immediately free disk space. It can temporarily increase disk usage, because new
compacted files are written before old-version files are deleted. Disk space is reclaimed when old versions
are pruned during cleanup. Set the retention window only as low as your rollback and time-travel requirements allow.

If you need to reclaim space more aggressively in OSS, use a shorter retention window:
<CodeGroup>
```python Python icon=Python
from datetime import timedelta

table.optimize(cleanup_older_than=timedelta(days=1))
```
</CodeGroup>
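
To see why disk usage can rise before it falls, consider a toy model in which every retained table version references a set of data files, and disk usage is the total size of all files referenced by any retained version. This is a simplified sketch for illustration only; the file names, sizes, and the `disk_usage` helper are hypothetical and do not reflect how Lance stores data internally.

```python
def disk_usage(versions, sizes):
    """Total size of every file referenced by at least one retained version."""
    live_files = set().union(*versions)
    return sum(sizes[name] for name in live_files)


# Hypothetical file sizes in arbitrary units.
sizes = {"a": 10, "b": 10, "c": 10, "abc": 30}

# Version 1 references three small fragments.
versions = [{"a", "b", "c"}]
before = disk_usage(versions, sizes)

# Compaction writes one merged file and commits it as version 2.
# Version 1 still references the small fragments, so nothing is deleted yet.
versions.append({"abc"})
after_compaction = disk_usage(versions, sizes)

# Cleanup prunes version 1, so its files are no longer referenced.
versions = versions[1:]
after_cleanup = disk_usage(versions, sizes)

print(before, after_compaction, after_cleanup)
```

In this model, usage doubles after compaction and returns to its original level only after the old version is pruned.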


14 changes: 8 additions & 6 deletions docs/lance.mdx
@@ -73,13 +73,15 @@ throughput (i.e., keep latencies down to a minimum). Compaction is the process o
together to reduce the amount of metadata that needs to be managed, and to reduce the number of files
that need to be opened while scanning the dataset.

### Performance Optimization Through Compaction
Running compaction on a Lance dataset will do the following:

Compaction performs the following tasks in the background:
- Remove deleted rows from fragments
- Remove dropped columns from fragments
- Merge small fragments into larger ones

- Removes deleted rows from fragments
- Removes dropped columns from fragments
- Merges small fragments into larger ones
Compaction focuses on read performance, not immediate disk reclamation. During compaction, Lance writes
new compacted files while older files are still referenced by previous table versions. This means disk
usage can increase temporarily until old versions are cleaned up.

### Data deletion and recovery

@@ -97,4 +99,4 @@ exists based on your backup policy.
href="https://lance.org/quickstart"
>
Lance is a separate open source project. Check out its documentation to learn more.
</Card>
</Card>