This repository was archived by the owner on Oct 2, 2025. It is now read-only.

Conversation


@KnightMurloc KnightMurloc commented Sep 18, 2025

Commit bbbd801 fixed inefficient SQL for restoring statistics. For new backups, this was done by replacing the IN operator with = in the queries that delete existing statistics. For existing backups, it was done by enabling nested loop join for the range of gpbackup versions whose backups contain the inefficient SQL. The version check relied on the patchset number; however, the gpbackup binaries shipped to customers do not include a patchset number, so the optimization was never activated.
This PR reverts that commit and splits the fix into two new commits.
The first commit fixes the SQL for new backups by replacing the IN operator with = in the queries that delete statistics.
The second commit fixes the issue for existing backups by enabling nested loop join in gprestore.


do not squash.

This patch fixed inefficient SQL when restoring statistics by replacing the IN
operator with =. To fix existing backups, nested loop join was enabled when
restoring statistics, but only for a certain range of patchsets. It turned out
that the binaries supplied to customers do not have a patchset number, so this
optimization was never activated. Therefore, the patch has been reworked in the
following commits.

This reverts commit bbbd801.
Statistics are restored in three stages. First, we update reltuples in
pg_class. Second, we delete the existing statistics for a specific attribute
from pg_statistic. Finally, we insert the new statistics for that attribute.
In the second stage, the attribute number is looked up in pg_attribute using a
subquery with the IN operator. This led to an inefficient plan containing a seq
scan on pg_statistic, which in turn can significantly slow down the statistics
restore.

Fix this by replacing the IN operator with = in the attribute statistics
deletion query. We can be sure that the subquery will return no more than one
row, since pg_attribute has a unique index on the attrelid and attname
attributes.
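
For illustration, the shape of the deletion query before and after the change might look like the sketch below. The table name `public.foo` and attribute name `bar` are placeholders, and the exact SQL emitted by gpbackup may differ.

```sql
-- Old form: IN with a subquery could produce a seq scan on pg_statistic.
DELETE FROM pg_statistic
WHERE starelid = 'public.foo'::regclass
  AND staattnum IN (
    SELECT attnum FROM pg_attribute
    WHERE attrelid = 'public.foo'::regclass AND attname = 'bar'
  );

-- New form: because the unique index on pg_attribute(attrelid, attname)
-- guarantees the subquery returns at most one row, = is safe and lets the
-- planner use an index scan on pg_statistic.
DELETE FROM pg_statistic
WHERE starelid = 'public.foo'::regclass
  AND staattnum = (
    SELECT attnum FROM pg_attribute
    WHERE attrelid = 'public.foo'::regclass AND attname = 'bar'
  );
```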
@silent-observer

The third commit's description contains an invalid commit hash. I assume it was meant to be the second commit's hash; however, since the PR will be rebased, the commit hashes will change anyway. I suggest replacing the hash with "previous commit" instead.

Other than that, the patch seems good and doesn't have any performance degradation compared to the previous version.

Statistics backups created with gpbackup versions starting from
1.30.5_arenadata16 contain inefficient SQL in the statements that delete
existing statistics. For new backups, this was fixed in the previous commit.
This patch fixes the issue for existing backups by enabling nested loop join in
gprestore, which should lead to a more efficient plan with an index scan
instead of a seq scan.
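
For context, the switch amounts to the statement below. This is a minimal sketch; whether gprestore sets the GUC for the whole session or only around the statistics statements is an assumption here.

```sql
-- Enable nested loop joins (typically disabled by default in Greenplum) so
-- the IN-subquery deletes contained in old backups can use an index scan on
-- pg_statistic instead of a seq scan.
SET enable_nestloop TO on;
```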
@dkovalev1

In general it looks good, performance remains the same, but:

  • Please format description to 80 columns
  • Perhaps by pathset you meant patchset?
  • Do you think all tests under "Restore statistic" are not relevant anymore?

@KnightMurloc
Author

> In general it looks good, performance remains the same, but:
>
> * Please format description to 80 columns
> * Perhaps by `pathset` you meant `patchset`?
> * Do you think all tests under "Restore statistic" are not relevant anymore?

1. The PR will be merged via rebase, so it is not necessary to wrap its description to 80 columns.
2. Fixed.
3. Those tests covered enabling nested loop join only for certain gpbackup versions. That logic has been removed, so the tests are no longer relevant.

@KnightMurloc KnightMurloc merged commit ab9477f into master Sep 22, 2025
3 checks passed
@KnightMurloc KnightMurloc deleted the ADBDEV-8282 branch September 22, 2025 12:51