Add merged structure IO and mmCIF parity #74
heathcliff233 wants to merge 2 commits into steineggerlab:master
- add Python split compression and merged fragment database reads
- expose source fragment indices for merged entries
- support format-selectable decompression in Python and CLI (pdb|mmcif|cif)
- add shared mmCIF atom writer in C++ output path
- harden tar/db output path handling with parent-directory checks
- expand tests and docs using existing multichain fixture
Hi authors, thanks for the great tool and for the effort spent maintaining it. I have checked the error log and followed the black formatting requirements. The remaining errors appear to be on the GitHub server side, where apt install failed. This PR aims to add protein multimer support based on the current storage format. Segment storage already seems to be supported, so I reuse it for multi-chain support and add a layer on top that allows sample-level iteration (with an additional mmCIF write option via gemmi). Hope it helps. Best,
Thanks a lot for the work. However, we have been exploring a different approach to storing full complexes. We have started to implement a container format that stores multiple models, chains, etc. directly in the Foldcomp codebase (and not in the Python API). It's still incomplete, but the work is here:
Thanks for the clarification and for sharing the new direction. Glad to know that a container format for full complexes is being developed. I’ll keep an eye on the progress, and I’d be happy to contribute once the new format stabilizes. Best,
Summary
This PR adds practical multi-chain support while keeping FCZ chunk storage unchanged (single-chain per chunk).
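To make the "single-chain per chunk" idea concrete, here is a minimal, self-contained sketch of splitting a multi-chain PDB into per-chain fragments and merging them back. It is only an illustration under stated assumptions, not the PR's implementation: the function names (`atom_line`, `split_by_chain`, `merge_fragments`) are hypothetical, plain text stands in for FCZ-compressed chunks, and the shape of the returned source indices is a guess at what "source fragment indices" could look like.

```python
def atom_line(serial, chain):
    """Build a minimal fixed-width PDB ATOM record (hypothetical helper).

    Only the columns needed here are filled in; the chain ID sits in
    column 22 of the PDB format, i.e. string index 21.
    """
    return f"ATOM  {serial:>5}  CA  GLY {chain}{serial:>4}"


def split_by_chain(pdb_text):
    """Group ATOM/HETATM records by chain ID, preserving first-seen order.

    Each returned (chain_id, text) pair stands in for one single-chain chunk.
    """
    fragments = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            fragments.setdefault(line[21], []).append(line)
    return [(chain, "\n".join(lines) + "\n") for chain, lines in fragments.items()]


def merge_fragments(fragments):
    """Concatenate per-chain fragments into one entry.

    Also returns the index of every fragment that contributed, as an
    assumed stand-in for the "source fragment indices" the PR exposes.
    """
    merged = "".join(text for _, text in fragments)
    source_indices = list(range(len(fragments)))
    return merged, source_indices


if __name__ == "__main__":
    pdb = "\n".join([atom_line(1, "A"), atom_line(2, "A"), atom_line(3, "B")]) + "\n"
    chunks = split_by_chain(pdb)
    merged, indices = merge_fragments(chunks)
    print([chain for chain, _ in chunks], indices, merged == pdb)
```

In this sketch the round trip is lossless only because the chains are contiguous in the input and non-atom records (headers, TER, CONECT) are ignored; a real implementation would need to handle both.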
Highlights
- Python: compress(..., split=True) and open(..., merge_fragments=True)
- Output formats: pdb | mmcif | cif
- CLI: foldcomp decompress --output-format ...
Compatibility
Validation
- conda run -n foldcomp pytest test -q → 12 passed
- conda run -n foldcomp ./build.sh test → pass