feat(reader): lazy DateTimeParts reassembly #49
Merged
…Array
The pre-existing DateTimePartsEncodingDecoder returned a GenericArray
wrapping the three children (days, seconds, subseconds), but no consumer
in the extension-decode path (ExtensionStorage,
TimestampExtensionDecoder, DateExtensionDecoder) knew how to reassemble
that shape into the epoch count their accessors expect. The path was
effectively dead at scan time: the encoder tests round-tripped the
children individually but never reconstructed an epoch.
Add LazyDateTimePartsLongArray (record, implements LongArray) that holds
the three children plus the precomputed unitsPerDay / unitsPerSecond
multipliers and reassembles on demand:
    getLong(i) = days[i] * unitsPerDay
               + seconds[i] * unitsPerSecond
               + subseconds[i]
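As a rough illustration of the reassembly formula above, here is a minimal sketch. The names SketchLongArray, PlainLongArray and LazyPartsSketch are hypothetical stand-ins, not the project's actual types; the real LongArray interface presumably carries more (dtype, validity, forEachLong).

```java
import java.util.function.LongBinaryOperator;

// Hypothetical stand-in for the project's LongArray interface.
interface SketchLongArray {
    int length();

    long getLong(int i);

    // Fold built on the per-row accessor, so a lazy array needs no copy.
    default long fold(long identity, LongBinaryOperator op) {
        long acc = identity;
        for (int i = 0; i < length(); i++) {
            acc = op.applyAsLong(acc, getLong(i));
        }
        return acc;
    }
}

// Eager array over a plain long[]; used here for the three children.
record PlainLongArray(long[] values) implements SketchLongArray {
    public int length() { return values.length; }

    public long getLong(int i) { return values[i]; }
}

// Lazy view: keeps the children and multipliers, reassembles on each call.
record LazyPartsSketch(SketchLongArray days,
                       SketchLongArray seconds,
                       SketchLongArray subseconds,
                       long unitsPerDay,
                       long unitsPerSecond) implements SketchLongArray {
    public int length() { return days.length(); }

    public long getLong(int i) {
        return days.getLong(i) * unitsPerDay
             + seconds.getLong(i) * unitsPerSecond
             + subseconds.getLong(i);
    }
}
```

With millisecond multipliers (unitsPerSecond = 1_000, unitsPerDay = 86_400_000), a row of 1 day, 3661 seconds and 250 subsecond units reassembles to 86_400_000 + 3_661_000 + 250 = 90_061_250.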
DateTimePartsArrays (package-private) centralises the per-row read so
each child can use whichever signed-integer ptype the encoder picked
(Byte / Short / Int / Long Array, optionally wrapped in MaskedArray).
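One way such a centralised per-row read could look is sketched below. The Child types and PartsReadSketch are hypothetical; the project's real Byte / Short / Int / Long Array and MaskedArray types are not shown in this PR, and passing a masked read straight through to the inner array is an assumption.

```java
// Hypothetical child array types standing in for the project's real ones.
interface Child {
    int length();
}

record ByteChild(byte[] values) implements Child {
    public int length() { return values.length; }
}

record ShortChild(short[] values) implements Child {
    public int length() { return values.length; }
}

record IntChild(int[] values) implements Child {
    public int length() { return values.length; }
}

record LongChild(long[] values) implements Child {
    public int length() { return values.length; }
}

// Masked wrapper: valid[i] == false marks row i as null.
record MaskedChild(Child inner, boolean[] valid) implements Child {
    public int length() { return inner.length(); }
}

final class PartsReadSketch {
    private PartsReadSketch() {}

    // Reads row i as a signed long, widening from whichever ptype is stored.
    static long readLong(Child child, int i) {
        if (child instanceof MaskedChild m) {
            // Assumption: read the underlying slot; null handling is the caller's job.
            return readLong(m.inner(), i);
        }
        if (child instanceof ByteChild b) return b.values()[i];
        if (child instanceof ShortChild s) return s.values()[i];
        if (child instanceof IntChild n) return n.values()[i];
        if (child instanceof LongChild l) return l.values()[i];
        throw new IllegalArgumentException(
                "unsupported child type: " + child.getClass().getName());
    }
}
```

Keeping the dispatch in one helper means the lazy record never needs to know which width the encoder chose for each child.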
DateTimePartsEncodingDecoder parses the parent Extension dtype's
TimeUnit metadata byte, computes unitsPerSecond = TimeUnit.divisor()
(falling back to 1 for the Days unit, whose seconds and subseconds
children are zero) and unitsPerDay = 86_400 × unitsPerSecond, then
constructs the lazy record. No buffer allocation, no per-row copy.
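The multiplier computation described above might be sketched as follows. SketchTimeUnit and MultiplierSketch are hypothetical; the project's real TimeUnit enum, its metadata byte layout, and the exact contract of divisor() are not shown here. The Days fallback of 1 simply mirrors the description.

```java
// Hypothetical time-unit enum; not the project's real TimeUnit.
enum SketchTimeUnit { NANOSECONDS, MICROSECONDS, MILLISECONDS, SECONDS, DAYS }

final class MultiplierSketch {
    static final long SECONDS_PER_DAY = 86_400L;

    private MultiplierSketch() {}

    // Units of the target resolution per second.
    static long unitsPerSecond(SketchTimeUnit unit) {
        return switch (unit) {
            case NANOSECONDS -> 1_000_000_000L;
            case MICROSECONDS -> 1_000_000L;
            case MILLISECONDS -> 1_000L;
            case SECONDS -> 1L;
            // Fallback described in the commit message: the seconds and
            // subseconds children are zero for this unit.
            case DAYS -> 1L;
        };
    }

    // unitsPerDay = 86_400 x unitsPerSecond, as stated above.
    static long unitsPerDay(SketchTimeUnit unit) {
        return SECONDS_PER_DAY * unitsPerSecond(unit);
    }
}
```

Even at nanosecond resolution the per-day multiplier is 86_400_000_000_000, which fits comfortably in a long.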
Now the extension-decode pipeline composes correctly: scanning a
vortex.datetimeparts-encoded column under a vortex.timestamp extension
produces a LongArray of reassembled epoch counts, which feeds into
TimestampExtensionDecoder.instant exactly like a Materialized child.
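For orientation, the last step of that pipeline (epoch count to java.time.Instant) could work roughly like the sketch below. InstantSketch is hypothetical and is not the project's TimestampExtensionDecoder; it assumes unitsPerSecond evenly divides one billion.

```java
import java.time.Instant;

final class InstantSketch {
    private InstantSketch() {}

    // Converts an epoch count at the given resolution into an Instant.
    // Assumption: unitsPerSecond is one of 1, 1_000, 1_000_000, 1_000_000_000.
    static Instant toInstant(long epochCount, long unitsPerSecond) {
        // floorDiv / floorMod keep the remainder non-negative for pre-1970 values.
        long seconds = Math.floorDiv(epochCount, unitsPerSecond);
        long remainder = Math.floorMod(epochCount, unitsPerSecond);
        long nanosPerUnit = 1_000_000_000L / unitsPerSecond;
        return Instant.ofEpochSecond(seconds, remainder * nanosPerUnit);
    }
}
```

Because the lazy array exposes the same getLong(i) accessor as a materialized one, a conversion like this does not need to know whether the epoch count was stored or reassembled.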
Updated DateTimePartsEncodingEncoderTest to assert the reassembled
epoch value instead of the (now hidden) per-child structure — the
behaviour the encoder is actually guaranteeing.
Three new unit tests in LazyDateTimePartsLongArrayTest cover
millisecond reassembly, widening from narrower child ptypes, and the
fold reduction. ./mvnw verify is green (13 modules, integration suite 40s).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Convert vortex.datetimeparts decoding from a non-functional GenericArray wrapper into a working lazy LazyDateTimePartsLongArray.
Background
The pre-existing decoder returned a GenericArray holding (days, seconds, subseconds) children, but no consumer in the extension-decode path (ExtensionStorage, TimestampExtensionDecoder, DateExtensionDecoder) knew how to reassemble that shape back into the epoch count their accessors expect. The path was effectively dead at scan time: the encoder tests round-tripped the children individually but never reconstructed an epoch.
Changes
LazyDateTimePartsLongArray (record, implements LongArray) holds the three children plus precomputed unitsPerDay / unitsPerSecond multipliers. getLong(i) = days[i] * unitsPerDay + seconds[i] * unitsPerSecond + subseconds[i]. forEachLong / fold use the same per-row path.
DateTimePartsArrays (package-private) centralises the per-row signed-long read so each child can use whichever ptype the encoder picked (Byte / Short / Int / Long, optionally wrapped in MaskedArray).
DateTimePartsEncodingDecoder parses the parent Extension's TimeUnit metadata byte, computes unitsPerSecond = TimeUnit.divisor() (Days unit falls back to 1; seconds/subseconds children are zero in that case) and unitsPerDay = 86_400 × unitsPerSecond, then constructs the lazy record.
DateTimePartsEncodingEncoderTest asserts the reassembled epoch values instead of the (now hidden) per-child breakdown, testing the behaviour the encoder actually guarantees.
Pattern
Same top-level record shape as the rest of the lazy-decode session. The lazy record's dtype() is the parent Extension dtype, so it slots transparently into the existing ExtensionStorage.epochInteger → TimestampExtensionDecoder.instant(...) pipeline (which already pattern-matches the LongArray case).
Test plan
./mvnw verify: 13 modules SUCCESS, integration suite 40s green.
LazyDateTimePartsLongArrayTest pass.
DateTimePartsEncodingEncoderTest round-trip pass (asserting reassembled epoch).
🤖 Generated with Claude Code