Add merged structure IO and mmCIF parity by heathcliff233 · Pull Request #74 · steineggerlab/foldcomp

heathcliff233 · 2026-02-19T06:06:37Z

Summary

This PR adds practical multi-chain support while keeping FCZ chunk storage unchanged (single-chain per chunk).

Highlights

- Added split/merge workflow for multi-chain/discontinuous structures:
  - Python: compress(..., split=True) and open(..., merge_fragments=True)
- Added Python/CLI format parity for decompression output:
  - pdb | mmcif | cif
  - CLI: foldcomp decompress --output-format ...
- Added shared mmCIF writer path in C++ output utilities.
- Improved robustness by creating/checking parent directories for tar/db outputs.
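
To illustrate the split/merge idea described above, here is a minimal pure-Python sketch. It is not the code from this PR and does not call foldcomp: the helper names `atom_line`, `split_by_chain`, and `merge_fragments` are hypothetical, and it assumes fixed-column PDB records in which the chain ID sits in column 22. It splits a multi-chain structure into one fragment per chain (mirroring the single-chain-per-chunk storage) and then concatenates the fragments back together while recording which fragment index each piece came from.

```python
from collections import OrderedDict


def atom_line(serial, chain, resseq):
    """Build a minimal fixed-column PDB ATOM record (CA atom of ALA at the origin)."""
    return "ATOM  {:>5d}  CA  ALA {}{:>4d}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, chain, resseq, 0.0, 0.0, 0.0
    )


def split_by_chain(pdb_text):
    """Group ATOM/HETATM records by chain ID, preserving first-seen chain order."""
    chains = OrderedDict()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            chain_id = line[21]  # column 22 in the PDB format (0-based index 21)
            chains.setdefault(chain_id, []).append(line)
    # One (chain_id, text) fragment per chain, each a single-chain block.
    return [(cid, "\n".join(lines) + "\n") for cid, lines in chains.items()]


def merge_fragments(fragments):
    """Concatenate fragments in order and report their source fragment indices."""
    merged = "".join(text for _, text in fragments)
    source_indices = list(range(len(fragments)))
    return merged, source_indices


# Hypothetical two-chain input: two residues in chain A, one in chain B.
example = "\n".join(
    [atom_line(1, "A", 1), atom_line(2, "A", 2), atom_line(3, "B", 1)]
) + "\n"
fragments = split_by_chain(example)
merged_text, indices = merge_fragments(fragments)
```

In this sketch each fragment could be compressed independently, and the recorded indices play the role of the "source fragment indices" mentioned in the commit message; how the PR actually names and stores fragments is not shown here.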

Compatibility

Backward compatible with existing FCZ/DB files.
Single-chain behavior remains unchanged.

Validation

conda run -n foldcomp pytest test -q → 12 passed
conda run -n foldcomp ./build.sh test → pass

- add Python split compression and merged fragment database reads
- expose source fragment indices for merged entries
- support format-selectable decompression in Python and CLI (pdb|mmcif|cif)
- add shared mmCIF atom writer in C++ output path
- harden tar/db output path handling with parent-directory checks
- expand tests and docs using existing multichain fixture
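
As a rough illustration of the parent-directory hardening mentioned in the commit message, the following Python sketch shows one common way to make sure an output file's parent directory exists before writing. It is an assumption about the general technique, not the PR's implementation; the helper name `ensure_parent_dir` and the example file name are hypothetical.

```python
from pathlib import Path


def ensure_parent_dir(output_path):
    """Create the parent directory of output_path if it is missing.

    Raises FileExistsError if the parent path exists but is not a directory.
    """
    parent = Path(output_path).parent
    # parents=True creates intermediate directories; exist_ok=True makes
    # the call a no-op when the directory is already there.
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

A caller would run this once before opening the tar or database output for writing, so that a missing directory produces a clear early result instead of a failed write later.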

heathcliff233 · 2026-02-23T07:20:51Z

Hi authors, thanks for the great tool and for your efforts in maintaining it. I have checked the error log and followed the black formatting requirements. The other errors appear to come from the GitHub server side, where apt install failed.

This PR aims to add protein multimer support based on the current storage format. It seems that segment storage is already supported, so I reuse it for multi-chain support and add a layer on top to allow sample-level iteration (with an additional mmCIF write option via gemmi). I hope it helps.

Best,
Liang

milot-mirdita · 2026-03-09T08:30:30Z

Thanks a lot for the work. However, we have been exploring a different approach to storing full complexes. We have started to implement a container format that stores multiple models, chains, etc. directly in the Foldcomp codebase (and not in the Python API).

It's still incomplete, but the work is here:
https://github.com/milot-mirdita/foldcomp

heathcliff233 · 2026-03-10T05:57:21Z

Thanks for the clarification and for sharing the new direction. Glad to know that a container format for full complexes is being developed.

I’ll keep an eye on the progress, and I’d be happy to contribute once the new format stabilizes.

Best,
Liang

heathcliff233 added 2 commits February 19, 2026 05:55

Format test file with black

cfddb60

heathcliff233 closed this Mar 10, 2026

heathcliff233 wants to merge 2 commits into steineggerlab:master from heathcliff233:feature/merged-structure-io