Bad Vector Fill Behavior: Docs Say Only Bad Elements Are Replaced, But Code Replaces the Entire Vector #134

@oqoqo-bot

Documentation Gap

The documentation claims that [1.0, NaN, 3.0] becomes [1.0, 0.0, 3.0] (element-wise replacement), but the code replaces the entire vector with [0.0, 0.0, 0.0] (whole-vector replacement).

Description

The docs describe on_bad_vectors='fill' as replacing only the NaN elements of a vector. The code instead replaces every element of any vector flagged as bad with fill_value.

  • With fill_value=0.0, the docs claim [1.0, NaN, 3.0] becomes [1.0, 0.0, 3.0], but the code at table.py:3177-3181 produces [0.0, 0.0, 0.0], because the is_bad flag is computed per vector, not per element
  • Users silently lose all valid elements of a partially bad vector
  • All-zero fill vectors cause downstream problems: cosine similarity is undefined (division by zero), and under L2 distance the filled vectors all collapse onto the origin
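
The difference between the two behaviors can be sketched in plain Python. This is an illustration only, not LanceDB's code: the helper names (`is_bad`, `fill_elementwise`, `fill_whole_vector`, `cosine_similarity`) are hypothetical, and treating "bad" as "contains at least one NaN" is an assumption made for the example.

```python
import math


def is_bad(vector):
    """Per-vector flag (assumed definition): True if any element is NaN."""
    return any(math.isnan(x) for x in vector)


def fill_elementwise(vector, fill_value=0.0):
    """What the docs describe: replace only the NaN elements."""
    return [fill_value if math.isnan(x) else x for x in vector]


def fill_whole_vector(vector, fill_value=0.0):
    """What the issue reports: replace every element of a bad vector."""
    if is_bad(vector):
        return [fill_value] * len(vector)
    return list(vector)


def cosine_similarity(a, b):
    """Cosine similarity; returns NaN when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else float("nan")


vec = [1.0, float("nan"), 3.0]
documented = fill_elementwise(vec)   # valid elements 1.0 and 3.0 survive
observed = fill_whole_vector(vec)    # every element becomes fill_value

print("documented:", documented)
print("observed:  ", observed)
print("cosine vs query:", cosine_similarity(observed, [1.0, 2.0, 3.0]))
```

With fill_value=0.0 the whole-vector result has zero norm, so its cosine similarity to any query is undefined. The sketch returns NaN in that case rather than raising an error.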

How to Validate

Affected Files

  • python/python/lancedb/table.py
  • python/python/lancedb/db.py
  • docs/tables/consistency.mdx

Created by Oqoqo
