Sync only latest version of collections by default#2471
Sync only latest version of collections by default#2471
Conversation
Assisted By: Claude Code
|
I am uncomfortable with changing default behavior. |
|
@mdellweg, @gerrod3 - I appreciate the feed back! I have confidence that memory issues will be better with the recent patches. Those are welcome changes! I was hesitant to submit this PR because memory consumption was the most pressing issue, but I changed my mind after discussing it with some colleagues. I think it is a good way to start a conversation if the current behavior is the best default behavior. I don't believe old collections are reaped from the current repositories for collections. This means the amount of data that is synchronized continues to grow. New organizations starting fresh syncs will discover ever-growing synchronization times paired with increased storage requirements. Is it reasonable to assume that an organization will want a local copy of every version of every collection available since the beginning of time? Is it more reasonable to sync all the collections but only the latest version? The additive nature of syncs will continue to pick up the latest version. Remember, the original behavior can be achieved with an explicit requirements file. I have reached out to a few additional people to get their feedback on whether a change of this nature is worth the effort. If not - all good - at least we had the conversation! |
|
We probably won't accept changing the default behavior, but I think we would accept a new field on the remote As for reducing disk space usage on initial sync, this is what on-demand syncs are for. I think we are pretty close to being able to solve this in pulp-ansible now #712. |
The behavior of the current collection sync without a requirements file is to pull all collections and every version available of each collection. This can lead to failures in the collection sync process due to memory inefficiencies in how collection syncs are performed.
This PR switches the default behavior to only sync the latest version of each collection that is available.
The current behavior can be achieved by providing a requirements file with a version set to a wildcard or version range.
Assisted By: Claude Code
📜 Checklist