Native split-level data skipping with PartitionFilter and column statistics #102

@schenksj

Description

Summary

Extend the existing PartitionFilter infrastructure to support split-level data skipping: evaluate column statistics (min/max/count) against query filters entirely in Rust, returning only the IDs of splits that cannot be pruned.

Priority: P1

Motivation

Data skipping currently runs on the JVM driver in Scala (PartitionPruning.scala, ExpressionSimplifier.scala):

  • Recursive filter tree traversal with Spark's Expression.eval()
  • Per-partition string → typed value conversion (creates LocalDate/Instant objects per value)
  • zipWithIndex + groupBy + flatMap chains on large collections
  • Serial evaluation on the driver for all splits × all partitions

For highly partitioned tables (100+ partitions × 1000+ splits), this is a measurable driver-side bottleneck.

Proposed Approach

  1. Extend PartitionFilter to support column statistics predicates (min ≤ value, max ≥ value)
  2. New method (strawman):
    // Given split metadata (partition values + column stats) and filters,
    // return indices of splits that survive pruning
    int[] evaluateDataSkipping(
        SplitMetadata[] splits,          // partition values + column stats per split
        PartitionFilter partitionFilter, // partition pruning
        ColumnStatsFilter statsFilter    // column-level data skipping
    );
  3. Alternatively, if transaction log state is already in Rust (depends on the P0 transaction log reader), pruning can happen as part of state materialization: filter during the read rather than read-then-filter.
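A minimal Rust sketch of step 1's stats-based pruning, to make the semantics concrete. All names here (`SplitStats`, `ColumnStatsFilter`, `evaluate_data_skipping`) are hypothetical, not the actual indextables API; the point is only that a split survives when its min/max bounds could still contain a matching row.

```rust
// Hypothetical sketch of split-level data skipping; not the real API.
// A split is kept when its column statistics cannot rule out the predicate.

#[derive(Debug)]
struct SplitStats {
    min: i64,        // column minimum observed in this split
    max: i64,        // column maximum observed in this split
    null_count: u64, // number of null values in this split
}

/// A few illustrative column-stats predicates.
enum ColumnStatsFilter {
    /// col = v: keep only if min <= v <= max
    Eq(i64),
    /// col > v: keep only if max > v
    Gt(i64),
    /// col IS NULL: keep only if the split contains nulls
    IsNull,
}

/// Return indices of splits that survive pruning (cannot be skipped).
fn evaluate_data_skipping(splits: &[SplitStats], filter: &ColumnStatsFilter) -> Vec<usize> {
    splits
        .iter()
        .enumerate()
        .filter(|(_, s)| match filter {
            ColumnStatsFilter::Eq(v) => s.min <= *v && *v <= s.max,
            ColumnStatsFilter::Gt(v) => s.max > *v,
            ColumnStatsFilter::IsNull => s.null_count > 0,
        })
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let splits = [
        SplitStats { min: 0, max: 9, null_count: 0 },
        SplitStats { min: 10, max: 19, null_count: 2 },
        SplitStats { min: 20, max: 29, null_count: 0 },
    ];
    // col = 15 can only match the middle split.
    println!("{:?}", evaluate_data_skipping(&splits, &ColumnStatsFilter::Eq(15))); // [1]
}
```

Returning surviving indices (rather than filtered metadata) keeps the JNI surface small: the JVM side hands over packed stats and gets back an `int[]`, as in the strawman signature above.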

Expected Impact

  • 5-10x faster scan planning on highly-partitioned tables
  • Moves filter evaluation from interpreted Spark expressions to compiled Rust code
  • Enables combined partition + column stats pruning in a single native pass

Dependencies

  • P0: Native transaction log state reader (if pruning during state read)
  • Existing PartitionFilter infrastructure (equality, range, IN, IS NULL, AND/OR)
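For reference, the predicate shapes the existing PartitionFilter supports (equality, range, IN, IS NULL, AND/OR) compose naturally as a recursive enum on the Rust side. A sketch under assumed names; this is illustrative, not the actual indextables type:

```rust
// Illustrative model of the PartitionFilter predicate vocabulary;
// not the real indextables definition. Partition values are the
// string-encoded forms stored in the transaction log.
use std::collections::HashSet;

enum PartitionFilter {
    Eq(String),
    Range { lo: Option<String>, hi: Option<String> }, // inclusive bounds
    In(HashSet<String>),
    IsNull,
    And(Vec<PartitionFilter>),
    Or(Vec<PartitionFilter>),
}

/// Evaluate a filter against one partition value (None = null partition).
fn matches(f: &PartitionFilter, value: Option<&str>) -> bool {
    match f {
        PartitionFilter::Eq(v) => value == Some(v.as_str()),
        PartitionFilter::Range { lo, hi } => match value {
            Some(v) => lo.as_deref().map_or(true, |l| v >= l)
                && hi.as_deref().map_or(true, |h| v <= h),
            None => false,
        },
        PartitionFilter::In(set) => value.map_or(false, |v| set.contains(v)),
        PartitionFilter::IsNull => value.is_none(),
        PartitionFilter::And(fs) => fs.iter().all(|f| matches(f, value)),
        PartitionFilter::Or(fs) => fs.iter().any(|f| matches(f, value)),
    }
}

fn main() {
    // date >= "2024-01-01" AND (date = "2024-06-01" OR date IS NULL)
    let f = PartitionFilter::And(vec![
        PartitionFilter::Range { lo: Some("2024-01-01".into()), hi: None },
        PartitionFilter::Or(vec![
            PartitionFilter::Eq("2024-06-01".into()),
            PartitionFilter::IsNull,
        ]),
    ]);
    println!("{}", matches(&f, Some("2024-06-01"))); // true
}
```

Extending this shape with column-stats leaves (as proposed above) would let partition pruning and data skipping share one evaluation pass.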

Related

  • indextables/indextables_spark integration issue (to be linked)
  • PartitionPruning.scala, ExpressionSimplifier.scala (current JVM implementation)
  • SparkPredicateToPartitionFilter.scala (existing Spark → PartitionFilter conversion)
