Rewrite BAM auxiliary data #70
The new AUXData type also supports writing, conforms closer to the `AbstractDict` interface, and is more efficient and type stable.
Based on this description, this modification is strictly additive. To me, having an Aside from the name change, how much is different between the old and new types?
Don't changes in the length of
Naively, once the span is found, couldn't `splice!` be used to adjust the record's data?

```julia
function adjust(record::Record, indices, newdata::Vector{UInt8})
    splice!(record.data, indices, newdata)
    record.block_size = recalculate_block_size(record.data)
end
```
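To make the `splice!` suggestion above concrete, here is a minimal self-contained sketch. `ToyRecord`, `adjust!`, and the way `block_size` is recomputed are stand-ins invented for illustration; they are not XAM.jl's actual record layout or API.

```julia
# Hypothetical stand-in for a BAM record; not XAM.jl's real type.
mutable struct ToyRecord
    block_size::Int
    data::Vector{UInt8}
end

# Replace the bytes at `indices` with `newdata`, which may have a different length.
function adjust!(record::ToyRecord, indices::UnitRange{Int}, newdata::Vector{UInt8})
    splice!(record.data, indices, newdata)
    # Assumption: the block size simply tracks the length of the data vector.
    record.block_size = length(record.data)
    return record
end

record = ToyRecord(6, UInt8[1, 2, 3, 4, 5, 6])
# Swap a 2-byte span for 3 bytes; the vector grows by one element.
adjust!(record, 3:4, UInt8[0xaa, 0xbb, 0xcc])
```

Note that `splice!` shifts the tail of the vector in place, so the cost of each edit grows with the number of bytes that follow the replaced span.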
This has been implemented as https://github.com/BioJulia/XAMAuxData.jl.
This is the first step towards improving the API and performance of XAM.jl.
This PR is not breaking. It introduces a new data type representing BAM auxiliary data. When this PR is done, it will introduce a new function that allows users to manipulate auxiliary data using this new type.
The old type may then be removed in a future, breaking version of BAM, if one such is ever to happen. So, if this PR lands, and I never get around to making larger changes to XAM, or we disagree on what to do, this PR should still be a win :)
The advantages of this new AUXData type over the old AuxData are:
- It supports writing.
- It conforms closer to the `AbstractDict` interface.
- It is more efficient and type stable.

The design of this currently relies on there never being any extra noncoding bytes in a BAM record's data vector. However, there currently sometimes are.
Why the change? In older versions of Julia, resizing vectors was always slow. So, significant time could be saved by having the vector be overly long and storing the true length as an integer inside your mutable struct. However, from Julia 1.11, simply resizing the vector is even faster :)
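As a rough illustration of the two strategies described above, the sketch below contrasts an over-allocated buffer that stores its true length separately with a vector that is resized to fit exactly. `PaddedBuffer` and `set_payload!` are hypothetical names made up for this example, not part of XAM.jl.

```julia
# Old strategy (hypothetical sketch): keep the vector overly long and
# track how many bytes are actually meaningful in a separate field.
mutable struct PaddedBuffer
    data::Vector{UInt8}
    filled::Int
end

function set_payload!(buf::PaddedBuffer, payload::Vector{UInt8})
    n = length(payload)
    # Only grow the backing vector when it is too small; never shrink it.
    n > length(buf.data) && resize!(buf.data, n)
    copyto!(buf.data, 1, payload, 1, n)
    buf.filled = n
    return buf
end

# New strategy: the vector's own length is the single source of truth.
function set_payload!(data::Vector{UInt8}, payload::Vector{UInt8})
    resize!(data, length(payload))
    copyto!(data, payload)
    return data
end

padded = PaddedBuffer(zeros(UInt8, 8), 0)
set_payload!(padded, UInt8[1, 2, 3])

exact = zeros(UInt8, 8)
set_payload!(exact, UInt8[1, 2, 3])
```

With the padded buffer, every reader of `data` has to remember to stop at `filled`; the bytes past that point are presumably the kind of extra noncoding bytes the new design assumes are absent.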
TODO