Rewrite BAM auxiliary data by jakobnissen · Pull Request #70 · BioJulia/XAM.jl

jakobnissen · 2024-03-10T20:03:57Z

This is the first step towards improving the API and performance of XAM.jl.

This PR is not breaking. It introduces a new data type representing BAM auxiliary data. When this PR is done, it will introduce a new function that allows users to manipulate auxiliary data using this new type.
The old type may then be removed in a future, breaking version of BAM, if one such is ever to happen. So, if this PR lands, and I never get around to making larger changes to XAM, or we disagree on what to do, this PR should still be a win :)

The advantages of this new AUXData type over the old AuxData are:

- It allows both reading and writing, whereas the old type only allowed reading.
- It's faster.
- It's more type stable.
- It conforms more closely to the AbstractDict interface.
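
To illustrate what conforming to the `AbstractDict` interface buys, here is a minimal sketch. It is not the PR's implementation: `ToyAux` is a hypothetical type that stores only integer-valued tags in a plain vector, whereas real auxiliary data is encoded as bytes and holds several value types. Defining `length`, `iterate`, `get`, and `setindex!` is enough for Base's generic fallbacks such as `getindex`, `haskey`, and `keys` to work.

```julia
# Hypothetical toy type (not from XAM.jl): tag => value pairs in insertion order.
struct ToyAux <: AbstractDict{String,Int}
    pairs::Vector{Pair{String,Int}}
end
ToyAux() = ToyAux(Pair{String,Int}[])

# Reading side of the interface.
Base.length(aux::ToyAux) = length(aux.pairs)

function Base.iterate(aux::ToyAux, state::Int=1)
    state > length(aux.pairs) && return nothing
    return (aux.pairs[state], state + 1)
end

function Base.get(aux::ToyAux, key, default)
    for (k, v) in aux.pairs
        k == key && return v
    end
    return default
end

# Writing side: overwrite an existing tag, or append a new one.
function Base.setindex!(aux::ToyAux, value, key)
    i = findfirst(p -> first(p) == key, aux.pairs)
    if i === nothing
        push!(aux.pairs, key => value)
    else
        aux.pairs[i] = key => value
    end
    return aux
end

aux = ToyAux()
aux["NM"] = 3
aux["AS"] = 42
aux["NM"] = 5   # overwrites the earlier value for "NM"
```

With only these four methods, `aux["NM"]`, `haskey(aux, "AS")`, and `collect(keys(aux))` all work through the generic `AbstractDict` code in Base.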

The design of this currently relies on there never being any extra noncoding bytes in a BAM record's data vector. However, there currently sometimes are.
Why the change? In older versions of Julia, resizing vectors was always slow. So, significant time could be saved by having the vector be overly long, and then storing the true length as an integer inside your mutable struct. However, from Julia 1.11, simply resizing the vectors is now even faster :)
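
The two buffer strategies can be sketched side by side. Both types below (`PaddedRecord`, `ExactRecord`) and the helper `set_payload!` are hypothetical names used only for illustration; neither is the actual `BAM.Record` layout.

```julia
# Older pattern: an over-allocated buffer plus a separately stored true length.
mutable struct PaddedRecord
    data::Vector{UInt8}
    filled::Int   # number of meaningful bytes; anything after that is padding
end

function set_payload!(rec::PaddedRecord, bytes::Vector{UInt8})
    n = length(bytes)
    # Grow only when the buffer is too small; never shrink it.
    length(rec.data) < n && resize!(rec.data, n)
    copyto!(rec.data, 1, bytes, 1, n)
    rec.filled = n
    return rec
end

# Pattern this PR relies on: the vector's own length is the true length.
struct ExactRecord
    data::Vector{UInt8}
end

function set_payload!(rec::ExactRecord, bytes::Vector{UInt8})
    resize!(rec.data, length(bytes))
    copyto!(rec.data, bytes)
    return rec
end

padded = PaddedRecord(zeros(UInt8, 8), 0)
set_payload!(padded, UInt8[1, 2, 3])

exact = ExactRecord(UInt8[])
set_payload!(exact, UInt8[1, 2, 3])
```

In the padded variant every consumer has to remember to slice with `filled`; in the exact variant `rec.data` can be handed out as is, which is the property the new type depends on.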

TODO

- Change BAM record to never have unused bytes at the end (Do not allow unused trailing bytes in BAM records #71)
- Integrate with existing BAM, and provide a good API
- Add tests
- Add actual useful error messages
- Review TODOs in code

The new AUXData type also supports writing, conforms closer to the `AbstractDict` interface, and is more efficient and type stable.

kescobo · 2024-03-11T19:10:56Z

> introduces a new data type representing BAM auxiliary data. When this PR is done, it will introduce a new function that allows users to manipulate auxiliary data using this new type.
> The old type may then be removed in a future, breaking version of BAM, if one such is ever to happen.

Based on this description, this modification is strictly additive. To me, having an AuxData and an AUXData is confusing, and since we're still pre-1.0, I'd kinda prefer just making another breaking release (even though the last such release was in January).

Aside from the name change, how much is different between the old and new types?

CiaranOMara · 2024-03-20T15:23:25Z

Don't changes in the length of BAM.Record.data need to be reflected in BAM.Record.block_size?

CiaranOMara · 2024-03-20T15:52:38Z

Naively, once the span is found, couldn't splice! be used to adjust BAM.Record.data?

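function adjust(record::Record, indices, newdata::Vector{UInt8})
    # Replace the bytes at `indices` with `newdata`; this resizes `record.data` in place.
    splice!(record.data, indices, newdata)
    # Keep the stored block size in sync with the new data length.
    record.block_size = recalculate_block_size(record.data)
    return record
end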
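
A self-contained version of this idea might look like the sketch below. `ToyRecord` is a hypothetical stand-in for `BAM.Record` reduced to two fields, and `recalculate_block_size` is filled in under the assumption that `block_size` equals the 32 bytes of fixed-length fields defined by the BAM format plus the length of `data`, which only holds when `data` carries no unused trailing bytes.

```julia
# Bytes taken by the fixed-length fields of a BAM alignment record
# (refID through tlen), as laid out in the SAM/BAM specification.
const FIXED_FIELDS_BYTES = 32

# Hypothetical stand-in for BAM.Record, reduced to the fields discussed here.
mutable struct ToyRecord
    block_size::Int32
    data::Vector{UInt8}
end

# Assumption: every byte of `data` is meaningful (no trailing padding).
recalculate_block_size(data::Vector{UInt8}) =
    Int32(FIXED_FIELDS_BYTES + length(data))

function adjust!(record::ToyRecord, indices, newdata::Vector{UInt8})
    # splice! removes the bytes at `indices` and inserts `newdata` in their place.
    splice!(record.data, indices, newdata)
    record.block_size = recalculate_block_size(record.data)
    return record
end

rec = ToyRecord(Int32(FIXED_FIELDS_BYTES + 6), UInt8[1, 2, 3, 4, 5, 6])
adjust!(rec, 2:3, UInt8[9, 9, 9, 9])   # replace 2 bytes with 4
```

Replacing two bytes with four grows `data` from 6 to 8 bytes, so `block_size` goes from 38 to 40.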

jakobnissen · 2024-10-18T09:31:06Z

This has been implemented as https://github.com/BioJulia/XAMAuxData.jl.

Rewrite BAM auxiliary data

dc1adba

jakobnissen closed this Oct 18, 2024

jakobnissen wants to merge 1 commit into BioJulia:develop from jakobnissen:auxdata
