Separate state retention settings from consensus data#4279
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces separate retention policies and pruning logic for Merklized state and consensus data, adding new configuration options and updating storage traits to support state-specific pruned height tracking. The SQL storage pruner has been refactored to handle these categories independently, with state pruning prioritized. Review feedback points out an inconsistency in timestamp usage during pruner initialization and suggests enhancing error messages by including the data category to improve debuggability.
| })?; | ||
|
|
||
| if pruned_height.is_some_and(|h| height <= h as usize) { | ||
| if pruned_height.is_some_and(|h| created <= h as i64) { |
There was a problem hiding this comment.
Unless I'm misunderstanding, this was a latent bug. We were considering the requested state snapshot to exist as long as height (the height of the most recently created snapshot) was not pruned. This would be a false positive whenever we were querying for an older snapshot that had been pruned.
Ultimately this ends up being ok, because if we load a path that is not actually present, we will end up computing a different commitment than the actual root of the tree, but it leads to an unintuitive edge case in the get_path verification logic and a confusing error message.
|
Claude finished @jbearer's task in 3m 48s —— View job Review of PR #4279: Separate state retention settings from consensus data
Overall AssessmentThis is a well-structured refactor that cleanly separates state pruning from consensus data pruning. The decomposition of the old monolithic The new test Issues FoundBug: Archive mode doesn't clear state pruned heightIn query("DELETE FROM pruned_height WHERE id = 1")
.execute(conn.as_mut())
.await?;This only clears This line isn't in the diff so I couldn't leave an inline comment — suggest changing to: query("DELETE FROM pruned_height")
.execute(conn.as_mut())
.await?;Minor: Stale comment in
|
This allows state to be pruned more aggressively than consensus data. This may be desired because state takes up a lot of space, archival state is rarely accessed, and old state can always be re-derived from headers. Thus, pruning state aggressively may enable a node to keep more history of consensus data while using less space. The logic for pruning is substantially reworked to separate (a) choosing which batch to prune based on retention settings and usage from (b) actually pruning a batch. This makes it much simpler to implement a complex policy such as having different retention policies for state vs consensus data.
This allows state to be pruned more aggressively than consensus data. This may be desired because state takes up a lot of space, archival state is rarely accessed, and old state can always be re-derived from headers. Thus, pruning state aggressively may enable a node to keep more history of consensus data while using less space.
The logic for pruning is substantially reworked to separate (a) choosing which batch to prune based on retention settings and usage from (b) actually pruning a batch. This makes it much simpler to implement a complex policy such as having different retention policies for state vs consensus data.