Conversation
3124131 to 30a9774
cost_tidrangescan() was setting the disabled_nodes value correctly, and then immediately resetting it to zero, due to poor code editing on my part. materialize_finished_plan correctly set matpath.parent to zero, but forgot to also set matpath.parallel_workers = 0, causing an access to uninitialized memory in cost_material. (This shouldn't result in any real problem, but it makes valgrind unhappy.) reparameterize_path was dereferencing a variable before verifying that it was not NULL. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (issue #1) Reported-by: Michael Paquier <michael@paquier.xyz> (issue #1) Diagnosed-by: Lukas Fittl <lukas@fittl.com> (issue #1) Reported-by: Zsolt Parragi <zsolt.parragi@percona.com> (issue #2) Reported-by: Richard Guo <guofenglinux@gmail.com> (issue #3) Discussion: http://postgr.es/m/CAN4CZFPvwjNJEZ_JT9Y67yR7C=KMNa=LNefOB8ZY7TKDcmAXOA@mail.gmail.com Discussion: http://postgr.es/m/aXrnPgrq6Gggb5TG@paquier.xyz
Use the proper constant InvalidXLogRecPtr instead of literal 0 when assigning XLogRecPtr variables and struct fields. This improves code clarity by making it explicit that these are invalid LSN values rather than ambiguous zero literals. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aRtd2dw8FO1nNX7k@ip-10-97-1-34.eu-west-3.compute.internal
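As a hedged illustration of the style this commit standardizes on: InvalidXLogRecPtr and XLogRecPtrIsInvalid() below are the existing definitions from access/xlogdefs.h, while the variable and function are hypothetical.

```c
#include "postgres.h"
#include "access/xlogdefs.h"

/* hypothetical example variable, illustrating the style change */
static XLogRecPtr last_flushed_lsn = InvalidXLogRecPtr;	/* not: = 0 */

static void
reset_flush_progress(void)
{
	/* the named constant makes "no valid LSN" explicit */
	last_flushed_lsn = InvalidXLogRecPtr;
}

static bool
have_flush_progress(void)
{
	/* reads should likewise use the matching test macro, not "== 0" */
	return !XLogRecPtrIsInvalid(last_flushed_lsn);
}
```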
The leaks were hard to reach in practice and the impact was low. The callers provide a buffer the same number of bytes as the source string (plus one for NUL terminator) as a starting size, and libc never increases the number of characters. But, if the byte length of one of the converted characters is larger, then it might need a larger destination buffer. Previously, in that case, the working buffers would be leaked. Even in that case, the call typically happens within a context that will soon be reset. Regardless, it's worth fixing to avoid such assumptions, and the fix is simple so it's worth backporting. Discussion: https://postgr.es/m/e2b7a0a88aaadded7e2d19f42d5ab03c9e182ad8.camel@j-davis.com Backpatch-through: 18
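A hedged sketch of the leak pattern being fixed, with hypothetical names (the real code performs encoding conversions between char and wchar_t strings; error handling is elided here): the point is simply that the undersized working buffer must be freed before retrying with a larger one.

```c
#include "postgres.h"
#include <stdlib.h>

/* hypothetical retry loop illustrating the fix described above */
static wchar_t *
to_wchar_with_retry(const char *src, size_t srclen)
{
	/* starting guess: one wide char per source byte, plus terminator */
	size_t		bufsize = srclen + 1;
	wchar_t    *buf = palloc(bufsize * sizeof(wchar_t));

	while (mbstowcs(buf, src, bufsize) == bufsize)
	{
		/* destination too small: previously this buffer was leaked */
		pfree(buf);
		bufsize *= 2;
		buf = palloc(bufsize * sizeof(wchar_t));
	}
	return buf;
}
```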
30a9774 to c930d75
Commit 6ceef94 was still one brick shy of a load, because it caused any usage at all of PGIOAlignedBlock or PGAlignedXLogBlock to fail under older g++. Notably, this broke "headerscheck --cplusplus". We can permit references to these structs as abstract structs though; only actual declaration of such a variable needs to be forbidden. Discussion: https://www.postgresql.org/message-id/3119480.1769189606@sss.pgh.pa.us
In fcb9c97 I included an assertion in BufferLockConditional() to detect if a conditional lock acquisition is done on a buffer that we already have locked. The assertion was added in the course of adding other assertions. Unfortunately I failed to realize that some of our code relies on such lock acquisitions to silently fail. E.g. spgist and nbtree may try to conditionally lock an already locked buffer when acquiring an empty buffer. LWLockAcquireConditional(), which was previously used to implement ConditionalLockBuffer(), does not have such an assert. Instead of just removing the assert, and relying on the lock acquisition to fail due to the buffer already being locked, this commit changes the behaviour of conditional content lock acquisition to fail if the current backend has any pre-existing lock on the buffer, even if the lock modes would not conflict. The reason for that is that we currently do not have space to track multiple lock acquisitions on a single buffer. Allowing multiple locks on the same buffer by a backend also seems likely to lead to bugs. There is only one non-self-exclusive conditional content lock acquisition, in GetVictimBuffer(), but it is only used if the target buffer is not pinned and thus can't already be locked by the current backend. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/90bd2cbb-49ce-4092-9f61-5ac2ab782c94@gmail.com
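For illustration, a hedged sketch of the caller pattern described above; ConditionalLockBuffer() and LockBuffer() are the real bufmgr entry points, everything else is invented.

```c
#include "postgres.h"
#include "storage/bufmgr.h"

/* illustrative only: opportunistically try to use a page */
static bool
try_use_page(Buffer buf)
{
	/*
	 * With this change, the attempt also fails cleanly when the current
	 * backend already holds any lock on the buffer, rather than tripping
	 * an assertion.
	 */
	if (!ConditionalLockBuffer(buf))
		return false;			/* busy -- caller picks another page */

	/* ... inspect and use the page here ... */

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);
	return true;
}
```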
Previously, the CheckXidAlive check was performed within the table_scan*next* functions. This caused the check to be executed for every fetched tuple, an unnecessary overhead. To fix, move the check to table_beginscan* so it is performed once per scan rather than once per row. Note: table_tuple_fetch_row_version() does not use a scan descriptor; therefore, the CheckXidAlive check is retained in that function. The overhead is unlikely to be relevant for the existing callers. Reported-by: Andres Freund <andres@anarazel.de> Author: Dilip Kumar <dilipbalaut@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Suggested-by: Amit Kapila <akapila@postgresql.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/tlpltqm5jjwj7mp66dtebwwhppe4ri36vdypux2zoczrc2i3mp%40dhv4v4nikyfg
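A hedged, much-simplified sketch of the restructuring; none of these names are the real tableam entry points, they only stand in for the before/after placement of the check.

```c
typedef struct ScanStateSketch ScanStateSketch;

extern void check_xid_alive(void);	/* stands in for the CheckXidAlive test */
extern ScanStateSketch *scan_start(void);
extern bool scan_next(ScanStateSketch *scan);

/* Before: the check ran on every fetched tuple. */
static bool
getnext_before(ScanStateSketch *scan)
{
	check_xid_alive();
	return scan_next(scan);
}

/* After: the check runs once, when the scan begins. */
static ScanStateSketch *
beginscan_after(void)
{
	check_xid_alive();
	return scan_start();
}
```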
Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Discussion: https://postgr.es/m/20260128120056.b2a3e8184712ab5a537879eb@sraoss.co.jp
This makes the arrays somewhat easier to read. Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Discussion: https://postgr.es/m/202601281204.sdxbr5qvpunk@alvherre.pgsql
These changes should have been done by 2f96613, but were overlooked. I noticed while reviewing the code for commit b8926a5. Author: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org
For readability. It was a slight modularity violation to have fields in PGShmemHeader that were only used by the allocator code in shmem.c. And it was inconsistent that ShmemLock was nevertheless not stored there. Moving all the allocator-related fields to a separate struct makes it more consistent and modular, and removes the need to allocate and pass ShmemLock separately via BackendParameters. Merge InitShmemAccess() and InitShmemAllocation() into a single function that initializes the struct when called from postmaster, and when called from backends in EXEC_BACKEND mode, re-establishes the global variables. That's similar to all the *ShmemInit() functions that we have. Co-authored-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/CAExHW5uNRB9oT4pdo54qAo025MXFX4MfYrD9K15OCqe-ExnNvg@mail.gmail.com
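A hedged sketch of what such a consolidated struct might look like; the struct and field names are guesses, and only the fact that the spinlock and the allocator bookkeeping now live together comes from the commit message.

```c
#include "postgres.h"
#include "storage/s_lock.h"

/* hypothetical consolidated allocator state, kept in shared memory */
typedef struct ShmemAllocatorSketch
{
	slock_t		lock;			/* was the separately-allocated ShmemLock */
	Size		free_offset;	/* fields formerly kept in PGShmemHeader */
	Size		total_size;
} ShmemAllocatorSketch;
```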
BackgroundPsql needs to wait for all the output from an interactive psql command to come back. To make sure that's happened, it issues the command, then issues \echo and \warn psql commands that echo a "banner" string (which we assume won't appear in the command's output), then waits for the banner strings to appear. The hazard in this approach is that the banner will also appear in the echoed psql commands themselves, so we need to distinguish those echoes from the desired output. Commit 8b886a4 tried to do that by positing that the desired output would be directly preceded and followed by newlines, but it turns out that that assumption is timing-sensitive. In particular, it tends to fail in builds made --without-readline, wherein the command echoes will be made by the pty driver and may be interspersed with prompts issued by psql proper. It does seem safe to assume that the banner output we want will be followed by a newline, since that should be the last output before things quiesce. Therefore, we can improve matters by putting quotes around the banner strings in the \echo and \warn psql commands, so that their echoes cannot include banner directly followed by newline, and then checking for just banner-and-newline in the match pattern. While at it, spruce up the pump() call in sub query() to look like the neater version in wait_connect(), and don't die on timeout until after printing whatever we got. Reported-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> Diagnosed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com> Discussion: https://postgr.es/m/db6fdb35a8665ad3c18be01181d44b31@postgrespro.ru Backpatch-through: 14
Similarly to the preceding commit, 030_pager.pl was assuming that patterns it looks for in interactive psql output would appear by themselves on a line, but that assumption tends to fall over in builds made --without-readline: the output we get might have a psql prompt immediately followed by the expected line of output. For several of these tests, just checking for the pattern followed by newline seems sufficient, because we could not get a false match against the command echo, nor against the unreplaced command output if the pager fails to be invoked when expected. However, that's fairly scary for the test that was relying on information_schema.referential_constraints: "\d+" could easily appear at the end of a line in that view. Let's get rid of that hazard by making a custom test view instead of using information_schema.referential_constraints. This test script is new in v19, so no need for back-patch. Reported-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> Author: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com> Discussion: https://postgr.es/m/db6fdb35a8665ad3c18be01181d44b31@postgrespro.ru
The build generates four files based on the wait event contents stored in wait_event_names.txt:
- wait_event_types.h
- pgstat_wait_event.c
- wait_event_funcs_data.c
- wait_event_types.sgml
The SGML file is generated as part of a documentation build, with its data stored in doc/src/sgml/ for meson and configure. The three others are handled differently for meson and configure:
- In configure, all the files are created in src/backend/utils/activity/. A link to wait_event_types.h is created in src/include/utils/.
- In meson, all the files are created in src/include/utils/. The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are then included in respectively wait_event.c and wait_event_funcs.c, without the "utils/" path.
For configure, this does not present a problem. For meson, this has to be combined with a trick in src/backend/utils/activity/meson.build, where include_directories needs to point to include/utils/ to make the inclusion of the C files work properly; since src/include/utils/ would take priority, builds could pull in PostgreSQL headers rather than system headers in some build paths. In order to fix this issue, this commit reworks the way the C/H files are generated, becoming consistent with guc_tables.inc.c:
- For meson, basically nothing changes. The files are still generated in src/include/utils/. The trick with include_directories is removed.
- For configure, the files are now generated in src/backend/utils/, with links in src/include/utils/ pointing to the ones in src/backend/. This requires extra rules in src/backend/utils/activity/Makefile so that a make command in this sub-directory is able to work.
- The three files now fall under header-stamp, which is actually simpler as guc_tables.inc.c does the same.
- wait_event_funcs_data.c and pgstat_wait_event.c are now included with "utils/" in their path.
This problem has not been an issue in the buildfarm; it has been noted on AIX, via a conflict with float.h. This issue could, however, create conflicts in the buildfarm depending on the environment, with unexpected headers pulled in, so this fix is backpatched down to where the generation of the wait-event files was introduced. While at it, this commit simplifies wait_event_names.txt to mention just the names of the generated files: the paths previously listed there had become incorrect, and the SGML file's path was wrong. This change has been tested in the CI, down to v17. Locally, I have run tests with configure (with and without VPATH), as well as meson, on the three branches. Combo oversight in fa88928 and 1e68e43. Reported-by: Aditya Kamath <aditya.kamath1@ibm.com> Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com Backpatch-through: 17
A failing unlink() was reporting an incorrect error message, referring to stat(). Author: Man Zeng <zengman@halodbtech.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/tencent_3BBE865C5F49D452360FF190@qq.com Backpatch-through: 17
Up to now we've used GNU-style local labels for branch targets in s_lock.h's assembly blocks. But there's an alternative style, which I for one didn't know about till recently: use regular assembler labels, and insert a per-asm-block number in them using %= to ensure they are distinct across multiple TAS calls within one source file. gcc has had %= since gcc 2.0, and I've verified that clang knows it too. While the immediate motivation for changing this is that AIX's assembler doesn't do local labels, it seems to me that this is a superior solution anyway. There is nothing mnemonic about "1:", while a regular label can convey something useful, and at least to me it feels less error-prone. Therefore let's standardize on this approach, also converting the one other usage in s_lock.h. Discussion: https://postgr.es/m/399291.1769998688@sss.pgh.pa.us
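A hedged x86 illustration of the two label styles (the real conversions are in s_lock.h's TAS implementations for various architectures; this standalone test-and-set exists only to demonstrate the syntax).

```c
/* GNU local-label style: numeric "1:" targets, referenced as 1f/1b. */
static inline int
tas_local_label(volatile char *lock)
{
	char		res = 1;

	__asm__ __volatile__(
		"	cmpb	$0,%1		\n"	/* cheap test before the locked op */
		"	jne		1f			\n"
		"	lock; xchgb %0,%1	\n"
		"1:						\n"
		: "+q"(res), "+m"(*lock)
		:
		: "memory", "cc");
	return (int) res;			/* 0 on success */
}

/* %= style: a mnemonic regular label, made unique per asm block by %=. */
static inline int
tas_numbered_label(volatile char *lock)
{
	char		res = 1;

	__asm__ __volatile__(
		"	cmpb	$0,%1			\n"
		"	jne		tas_failed_%=	\n"
		"	lock; xchgb %0,%1		\n"
		"tas_failed_%=:				\n"
		: "+q"(res), "+m"(*lock)
		:
		: "memory", "cc");
	return (int) res;
}
```

If such a function is inlined several times into one translation unit, %= still yields a distinct number for each asm expansion, which is exactly the uniqueness property the local labels provided.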
Separate att_align_nominal() into two macros, similarly to what was already done with att_align_datum() and att_align_pointer(). The inner macro att_nominal_alignby() is really just TYPEALIGN(), while att_align_nominal() retains its previous API by mapping TYPALIGN_xxx values to numbers of bytes to align to and then calling att_nominal_alignby(). In support of this, split out tupdesc.c's logic to do that mapping into a publicly visible function typalign_to_alignby(). Having done that, we can replace performance-critical uses of att_align_nominal() with att_nominal_alignby(), where the typalign_to_alignby() mapping is done just once outside the loop. In most places I settled for doing typalign_to_alignby() once per function. We could in many places pass the alignby value in from the caller if we wanted to change function APIs for this purpose; but I'm a bit loath to do that, especially for exported APIs that extensions might call. Replacing a char typalign argument by a uint8 typalignby argument would be an API change that compilers would fail to warn about, thus silently breaking code in hard-to-debug ways. I did revise the APIs of array_iter_setup and array_iter_next, moving the element type attribute arguments to the former; if any external code uses those, the argument-count change will cause visible compile failures. Performance testing shows that ExecEvalScalarArrayOp is sped up by about 10% by this change, when using a simple per-element function such as int8eq. I did not check any of the other loops optimized here, but it's reasonable to expect similar gains. Although the motivation for creating this patch was to avoid a performance loss if we add some more typalign values, it evidently is worth doing whether that patch lands or not. Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
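A hedged sketch of the hoisting pattern; typalign_to_alignby() and att_nominal_alignby() are the names introduced by this commit, while the loop, the function, and the assumed header location are illustrative.

```c
#include "postgres.h"
#include "access/tupmacs.h"

static char *
align_walk_sketch(char *ptr, int nelems, char typalign, int elemlen)
{
	/* map the TYPALIGN_xxx char to a byte count once, outside the loop */
	uint8		alignby = typalign_to_alignby(typalign);

	for (int i = 0; i < nelems; i++)
	{
		/* the inner macro is now just TYPEALIGN() on a plain number */
		ptr = (char *) att_nominal_alignby(ptr, alignby);
		ptr += elemlen;
	}
	return ptr;
}
```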
Oversight in commit 5373bc2. Author: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/20260202173156.GB17962%40p46.dedyn.io%3Blightning.p46.dedyn.io
…porary table. The test relies on VACUUM being able to mark a page all-visible, but this can fail when autovacuum in other sessions prevents the visibility horizon from advancing. Making the test table temporary isolates its horizon from other sessions, including catalog table vacuums, ensuring reliable test behavior. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/2b09fba6-6b71-497a-96ef-a6947fcc39f6%40gmail.com
This commit introduces a new prompt escape %i for psql, which shows whether the connected server is operating in hot standby mode. It expands to standby if the server reports in_hot_standby = on, and primary otherwise. This is useful for distinguishing standby servers from primary ones at a glance, especially when working with multiple connections in replicated environments where libpq's multi-host connection strings are used. Author: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Greg Sabino Mullane <htamfids@gmail.com> Reviewed-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/016f6738-f9a9-4e98-bb5a-e1e4b9591d46@uni-muenster.de
…changes. Previously, when synchronous_standby_names was changed (for example, by reducing the number of required synchronous standbys or modifying the standby list), backends waiting for synchronous replication were not released immediately, even if the new configuration no longer required them to wait. They could remain blocked until additional messages arrived from standbys and triggered their release. This commit improves walsender so that backends waiting for synchronous replication are released as soon as the updated configuration takes effect and the new settings no longer require them to wait, by calling SyncRepReleaseWaiters() when configuration changes are processed. As part of this change, the duplicated code that handles configuration changes in walsender has been refactored into a new helper function, which is now used at the three existing call places. Since this is an improvement rather than a bug fix, it is applied only to the master branch. Author: Shinya Kato <shinya11.kato@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAOzEurSRii0tEYhu5cePmRcvS=ZrxTLEvxm3Kj0d7_uKGdM23g@mail.gmail.com
This routine has an option to bypass an error if a WAL summary file is opened for read but is missing (missing_ok=true). However, the code incorrectly checked for EEXIST, which matters only when using O_CREAT and O_EXCL, rather than for ENOENT in this case. There are currently only two callers of OpenWalSummaryFile() in the tree, and both use missing_ok=false, meaning that the check based on the errno is currently dead code. This issue could matter for out-of-core code or future backpatches that would like to use missing_ok set to true. Issue spotted while monitoring this area of the code, after a9afa02. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aYAf8qDHbpBZ3Rml@paquier.xyz Backpatch-through: 17
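A hedged paraphrase of the corrected check; OpenTransientFile() is the real fd.c wrapper, but the function shape and message are simplified, not the committed code.

```c
#include "postgres.h"
#include "storage/fd.h"
#include <errno.h>
#include <fcntl.h>

static int
open_summary_sketch(const char *path, bool missing_ok)
{
	int			fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);

	if (fd < 0)
	{
		/*
		 * ENOENT is what a read-only open() sets for a missing file;
		 * EEXIST can only arise with O_CREAT | O_EXCL.
		 */
		if (missing_ok && errno == ENOENT)
			return -1;
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", path)));
	}
	return fd;
}
```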
Two wait events are added to the COPY FROM/TO code: * COPY_FROM_READ: reading data from a copy_file. * COPY_TO_WRITE: writing data to a copy_file. In the COPY code, copy_file can be set when a command runs in pipe mode (for the non-DestRemote case), program mode, or file mode; the new wait events are reported around the fread() and fwrite() calls on it. Author: Nikolay Samokhvalov <nik@postgres.ai> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAM527d_iDzz0Kqyi7HOfqa-Xzuq29jkR6AGXqfXLqA5PR5qsng@mail.gmail.com
This keeps run-time assertions and static assertions clearly separate. Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/2273bc2a-045d-4a75-8584-7cd9396e5534%40eisentraut.org
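For context, a hedged example of the distinction being organized here; the macros are the existing ones from c.h, while the checks themselves are arbitrary.

```c
#include "postgres.h"

/* static assertion at file scope: verified entirely at compile time */
StaticAssertDecl(sizeof(uint64) == 8, "uint64 must be 8 bytes");

static void
assertion_styles(int nitems)
{
	/* static assertion at statement scope, also compile time */
	StaticAssertStmt(sizeof(uint32) == 4, "uint32 must be 4 bytes");

	/* run-time assertion: checked only in assert-enabled builds */
	Assert(nitems >= 0);
}
```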
When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint with an explicit name, we have to ensure that if the column is already marked NOT NULL, the provided name matches the existing constraint name. Failing to do so could lead to confusion regarding which constraint object actually enforces the rule. This patch adds a check to throw an error if the user tries to add a named not-null constraint to a column that already has one with a different name. Reported-by: yanliang lei <msdnchina@163.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com> Backpatch-through: 18 Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org
I don't understand how to reach errdetail_abort() with MyProc->recoveryConflictPending set. If a recovery conflict signal is received, ProcessRecoveryConflictInterrupt() raises an ERROR or FATAL error to cancel the query or connection, and abort processing clears the flag. The error message from ProcessRecoveryConflictInterrupt() is very clear that the query or connection was terminated because of recovery conflict. The only way to reach it AFAICS is with a race condition, if the startup process sends a recovery conflict signal when the transaction has just entered aborted state for some other reason. And in that case the detail would be misleading, as the transaction was already aborted for some other reason, not because of the recovery conflict. errdetail_abort() was the only user of the recoveryConflictPending flag in PGPROC, so we can remove that and all the related code too. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
The pg_dump documentation had repetitive notes for the --schema, --table, and --extension switches, noting that dependent database objects are not automatically included in the dump. This commit removes these notes and replaces them with a consolidated paragraph in the "Notes" section. pg_restore had a similar note for -t but lacked one for -n; do likewise. Also, add a note to --extension in pg_dump to note that ancillary files (such as shared libraries and control files) are not included in the dump and must be present on the destination system. Author: Florents Tselai <florents.tselai@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/284C4D55-4F90-4AA0-84C8-1E6A28DDF271@gmail.com
A failure while closing pg_wal/summaries/ incorrectly generated a report about pg_wal/archive_status/. While at it, this commit adds #undefs for the macros used in KillExistingWALSummaries() and KillExistingArchiveStatus() to prevent those values from being misused in an incorrect function context. Oversight in dc21234. Author: Tianchen Zhang <zhang_tian_chen@163.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/SE2P216MB2390C84C23F428A7864EE07FA19BA@SE2P216MB2390.KORP216.PROD.OUTLOOK.COM Backpatch-through: 17
Currently, when the argument of copyObject() is const-qualified, the return type is also, because the use of typeof carries over all the qualifiers. This is incorrect, since the point of copyObject() is to make a copy to mutate. But apparently no code ran into it. The new implementation uses typeof_unqual, which drops the qualifiers, making this work correctly. typeof_unqual is standardized in C23, but all recent versions of all the usual compilers support it even in non-C23 mode, at least as __typeof_unqual__. We add a configure/meson test for typeof_unqual and __typeof_unqual__ and use it if it's available, else we use the existing fallback of just returning void *. Reviewed-by: David Geier <geidav.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
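A hedged sketch of the change (copyObjectImpl() is the real underlying function; the suffixed macro names exist only for side-by-side comparison).

```c
#include <stddef.h>

extern void *copyObjectImpl(const void *from);

/*
 * Before: typeof() keeps qualifiers, so copyObject() of a "const Node *"
 * yielded a "const Node *" -- wrong for a copy made to be mutated.
 */
#define copyObject_before(obj) ((typeof(obj)) copyObjectImpl(obj))

/*
 * After: typeof_unqual (C23; often spelled __typeof_unqual__ in pre-C23
 * compiler modes) drops the qualifiers from the inferred type.
 */
#define copyObject_after(obj) ((typeof_unqual(obj)) copyObjectImpl(obj))

/* Fallback when neither spelling is available: an untyped pointer. */
#define copyObject_fallback(obj) copyObjectImpl(obj)
```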
We can immediately make use of it in pg_signal_backend(), which previously fetched the process type from the backend status array with pgstat_get_backend_type_by_proc_number(). That was correct but felt a little questionable to me: backend status should be for observability purposes only, not for permission checks. Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/b77e4962-a64a-43db-81a1-580444b3e8f5@iki.fi
Mostly this involves checking for NULL pointer before doing operations that add a non-zero offset. The exception is an overflow warning in heap_fetch_toast_slice(). This was caused by unneeded parentheses forcing an expression to be evaluated to a negative integer, which then got cast to size_t. Per clang 21 undefined behavior sanitizer. Backpatch to all supported versions. Co-authored-by: Alexander Lakhin <exclusion@gmail.com> Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/777bd201-6e3a-4da0-a922-4ea9de46a3ee@gmail.com Backpatch-through: 14
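A hedged, generic sketch of the pointer-arithmetic fix; the actual call sites are scattered across the tree, and this only shows the pattern the sanitizer flags.

```c
#include <stddef.h>

static const char *
advance_sketch(const char *base, size_t offset)
{
	/*
	 * before: "return base + offset;" -- undefined behavior when base is
	 * NULL and offset is non-zero, per the C standard, even though most
	 * platforms happen to tolerate it
	 */
	if (base == NULL)
		return NULL;
	return base + offset;
}
```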
Commit 5f13999 added a TAP test for GUC settings passed via the CONNECTION string in logical replication, but the buildfarm member sungazer reported test failures. The test incorrectly used the subscriber's log file position as the starting offset when reading the publisher's log. As a result, the test failed to find the expected log message in the publisher's log and erroneously reported a failure. This commit fixes the test to use the publisher's own log file position when reading the publisher's log. Also, to avoid similar confusion in the future, this commit splits the single $log_location variable into $log_location_pub and $log_location_sub, clearly distinguishing publisher and subscriber log positions. Backpatched to v15, where commit 5f13999 introduced the test. Per buildfarm member sungazer. This issue was reported and diagnosed by Alexander Lakhin. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/966ec3d8-1b6f-4f57-ae59-fc7d55bc9a5a@gmail.com Backpatch-through: 15
Instead of assigning the backend type in the Main function of each postmaster child, do it right after fork(), by which time it is already known by postmaster_child_launch(). This reduces the time frame during which MyBackendType is incorrect. Before this commit, ProcessStartupPacket would overwrite MyBackendType to B_BACKEND for dead-end backends, which is quite dubious. Stop that. We may now see MyBackendType == B_BG_WORKER before setting up MyBgworkerEntry. As far as I can see this is only a problem if we try to log a message and %b is in log_line_prefix, so we now have a constant string to cover that case. Previously, it would print "unrecognized", which seems strictly worse. Author: Euler Taveira <euler@eulerto.com> Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
This affects two command patterns, showing information about relations: * oid2name -x -d DBNAME, applying to all relations on a database. * oid2name -x -d DBNAME -t TABNAME [-t ..], applying to a subset of defined relations on a database. The relative path of a relation is added to the information provided, using pg_relation_filepath(). Author: David Bidoc <dcbidoc@gmail.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Guillaume Lelarge <guillaume.lelarge@dalibo.com> Reviewed-by: Euler Taveira <euler@eulerto.com> Reviewed-by: Mark Wong <markwkm@gmail.com> Discussion: https://postgr.es/m/CABour1v2CU1wjjoM86wAFyezJQ3-+ncH43zY1f1uXeVojVN8Ow@mail.gmail.com
Commit 29d0a77 improved pg_upgrade to allow migrating logical slots provided that all logical slots have caught up (i.e., they have no pending decodable WAL records). Previously, this verification was done by checking each slot individually, which could be time-consuming if there were many logical slots to migrate. This commit optimizes the check to avoid reading the same WAL stream multiple times. It performs the check only for the slot with the minimum confirmed_flush_lsn and applies the result to all other slots in the same database. This limits the check to at most one logical slot per database. During the check, we identify the last decodable WAL record's LSN to report any slots with unconsumed records, consistent with the existing error reporting behavior. Additionally, the maximum confirmed_flush_lsn among all logical slots on the database is used as an early scan cutoff; finding a decodable WAL record beyond this point implies that no slot has caught up. Performance testing demonstrated that the execution time remains stable regardless of the number of slots in the database. Note that we do not distinguish slots based on their output plugins. A hypothetical plugin might use a replication origin filter that filters out changes from a specific origin. In such cases, we might get a false positive (erroneously considering a slot caught up). However, this is safe from a data integrity standpoint, such scenarios are rare, and the impact of a false positive is minimal. This optimization is applied only when the old cluster is version 19 or later. Bump catalog version. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com
The attribute storing the statistics data for a set of expressions in pg_statistic_ext_data is stxdexpr. stxdexprs does not exist. Extracted from a larger patch by the same author. Incorrect as of efbebb4. Author: Corey Huinker <corey.huinker@gmail.com> Discussion: https://postgr.es/m/CADkLM=fPcci6oPyuyEZ0F4bWqAA7HzaWO+ZPptufuX5_uWt6kw@mail.gmail.com
synchronized_standby_slots is defined in guc_parameter.dat as part of the REPLICATION_PRIMARY group and is listed under the "Primary Server" section in postgresql.conf.sample. However, in the documentation its description was previously placed under the "Sending Servers" section. Since synchronized_standby_slots only takes effect on the primary server, this commit moves its documentation to the "Primary Server" section to match its behavior and other references. Backpatch to v17 where synchronized_standby_slots was added. Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Shinya Kato <shinya11.kato@gmail.com> Discussion: https://postgr.es/m/CAHGQGwE_LwgXgCrqd08OFteJqdERiF3noqOKu2vt7Kjk4vMiGg@mail.gmail.com Backpatch-through: 17
Provide a way to disable the use of posix_fallocate() for relation files. It was introduced by commit 4d330a6. The new setting file_extend_method=write_zeros can be used as a workaround for problems reported from the field: * BTRFS compression is disabled by the use of posix_fallocate() * XFS could produce spurious ENOSPC errors in some Linux kernel versions, though that problem is reported to have been fixed The default is file_extend_method=posix_fallocate if available, as before. The write_zeros option is similar to PostgreSQL < 16, except that now it's multi-block. Backpatch-through: 16 Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reported-by: Dimitrios Apostolou <jimis@gmx.net> Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
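A hedged sketch of the branch this GUC selects; only the GUC name and its values come from the commit message, while the enum spelling, variable, and helper are hypothetical, and the multi-block write loop is elided.

```c
#include "postgres.h"
#include <fcntl.h>
#include <unistd.h>

typedef enum
{
	FILE_EXTEND_POSIX_FALLOCATE,	/* "posix_fallocate": default if available */
	FILE_EXTEND_WRITE_ZEROS			/* "write_zeros": the new workaround */
} FileExtendMethodSketch;

static FileExtendMethodSketch file_extend_method = FILE_EXTEND_POSIX_FALLOCATE;

static void
extend_file_sketch(int fd, off_t offset, off_t nbytes)
{
	if (file_extend_method == FILE_EXTEND_POSIX_FALLOCATE)
	{
		/* reserve the range without writing data blocks */
		(void) posix_fallocate(fd, offset, nbytes);
	}
	else
	{
		/*
		 * Write zero-filled blocks, several at a time (unlike pre-16,
		 * which wrote one block per call); looping over the remainder
		 * is elided here.
		 */
		static const char zeros[8192 * 8];

		(void) pwrite(fd, zeros, Min((size_t) nbytes, sizeof(zeros)), offset);
	}
}
```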
These errors are very unlikely to show up, but in the event that they happen, some incorrect information would have been provided: - In pg_rewind, a stat() failure was reported as an open() failure. - In pg_combinebackup, a check for the new directory of a tablespace mapping was referred to as the old directory. - In pg_combinebackup, a failure in reading a source file when copying blocks referred to the destination file. The changes for pg_combinebackup affect v17 and newer versions. For pg_rewind, all the stable branches are affected. Author: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/tencent_1EE1430B1E6C18A663B8990F@qq.com Backpatch-through: 14
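A hedged frontend-style sketch of the pg_rewind fix; pg_fatal() is the real common/logging.h helper, but the function and message placement are illustrative.

```c
#include "postgres_fe.h"
#include "common/logging.h"
#include <sys/stat.h>

static void
check_source_file(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		/* previously the message here claimed that open() had failed */
		pg_fatal("could not stat file \"%s\": %m", path);
}
```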
This routine's internals directly used MyProcNumber to choose which object ID to assign for the hash key of a backend's stats entry, while the value to use is given as input argument of the function. The original intention was to pass MyProcNumber as an argument of pgstat_create_backend() when called in pgstat_bestart_final(), with pgstat_beinit() ensuring that MyProcNumber has been set, rather than using it directly in the function. This commit addresses this inconsistency by using the procnum given by the caller of pgstat_create_backend(), not MyProcNumber. This issue is not a cause of bugs currently. However, let's keep the code in sync across all the branches where this code exists, as it could matter in a future backpatch. Oversight in 4feba03. Reported-by: Ryo Matsumura <matsumura.ryo@fujitsu.com> Discussion: https://postgr.es/m/TYCPR01MB11316AD8150C8F470319ACCAEE866A@TYCPR01MB11316.jpnprd01.prod.outlook.com Backpatch-through: 18
First, split the Protocol Versions table in two, and lead with the list of versions that are supported today. Reserved and unsupported version numbers go into the second table. Second, in anticipation of a new (reserved) protocol extension, document the extension negotiation process alongside version negotiation, and add the corresponding tables for future extension parameter registrations. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: David G. Johnston <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
The main reason that libpq doesn't request protocol version 3.2 by
default is that other proxy/server implementations don't implement
the negotiation. This is a bit of a chicken-and-egg problem: We don't
bump the default version that libpq requests, but other implementations
may not be incentivized to implement version negotiation if their users
never run into issues.
One established practice to combat this is to flip Postel's Law on its
head, by sending parameters that the server cannot possibly support. If
the server fails the handshake instead of correctly negotiating, then
the problem is surfaced naturally. If the server instead claims to
support the bogus parameters, then we fail the connection to make the
lie obvious. This is called "grease" (or "greasing"), after the GREASE
mechanism in TLS that popularized the concept:
https://www.rfc-editor.org/rfc/rfc8701.html
This patch reserves 3.9999 as an explicitly unsupported protocol version
number and `_pq_.test_protocol_negotiation` as an explicitly unsupported
protocol extension. A later commit will send these by default in order
to stress-test the ecosystem during the beta period; that commit will
then be reverted before 19 RC1, so that we can decide what to do with
whatever data has been gathered.
The _pq_.test_protocol_negotiation change here is intentionally docs-
only: after its implementation is reverted, the parameter should remain
reserved.
Extracted/adapted from a patch by Jelte Fennema-Nio.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
Commit 4020b37 supposed that it would be a good idea to handle testing PGS_CONSIDER_NONPARTIAL within cost_material to save callers the trouble, but that turns out not to be such a good idea. One concern is that it makes cost_material() dependent on the caller having initialized certain fields in the MaterialPath, which is a bit awkward for materialize_finished_plan, which wants to use a dummy path. Another problem is that it can result in generating materialized nested loops where the Materialize node is disabled, contrary to the intention of joinpath.c's logic in match_unsorted_outer() and consider_parallel_nestloop(), which aims to consider such paths only when they would not need to be disabled. In the previous coding, it was possible for the pgs_mask on the joinrel to have PGS_CONSIDER_NONPARTIAL set, while the inner rel had the same bit clear. In that case, we'd generate and then disable a Materialize path. That seems wrong, so instead, pull up the logic to test the PGS_CONSIDER_NONPARTIAL bit into joinpath.c, restoring the historical behavior that we either don't generate a given materialized nested loop at all, or we don't disable it.
Commit 94f3ad3 failed to do this because I couldn't think of a use for the information, but this has proven to be short-sighted. Best to fix it before this code is officially released. Now, the only argument to standard_planner that isn't passed to planner_setup_hook is boundParams, but that is accessible via glob->boundParams, and so doesn't need to be passed separately.
Suppose that we're currently planning a query and, when that same query was previously planned and executed, we learned something about how a certain table within that query should be planned. We want to take note when that same table is being planned during the current planning cycle, but this is difficult to do, because the RTI of the table from the previous plan won't necessarily be equal to the RTI that we see during the current planning cycle. This is because each subquery has a separate range table during planning, but these are flattened into one range table when constructing the final plan, changing RTIs. Commit 8c49a48 allows us to match up subqueries seen in the previous planning cycles with the subqueries currently being planned just by comparing textual names, but that's not quite enough to let us deduce anything about individual tables, because we don't know where each subquery's range table appears in the final, flattened range table. To fix that, store a list of SubPlanRTInfo objects in the final planned statement, each including the name of the subplan, the offset at which it begins in the flattened range table, and whether or not it was a dummy subplan -- if it was, some RTIs may have been dropped from the final range table, but also there's no need to control how a dummy subquery gets planned. The toplevel subquery has no name and always begins at rtoffset 0, so we make no entry for it. This commit teaches pg_overexplain's RANGE_TABLE option to make use of this new data to display the subquery name for each range table entry. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
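A hedged guess at the node's shape, based purely on the description above; the field names are illustrative, not the committed definition.

```c
#include "postgres.h"
#include "nodes/nodes.h"

typedef struct SubPlanRTInfoSketch
{
	NodeTag		type;
	char	   *subquery_name;	/* textual name, matched across planning
								 * cycles; the unnamed toplevel subquery
								 * gets no entry */
	Index		rtoffset;		/* where this subquery's RTEs begin in the
								 * flattened range table */
	bool		is_dummy;		/* dummy subplans may have had RTIs dropped
								 * from the final range table */
} SubPlanRTInfoSketch;
```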
An extension (or core code) might want to reconstruct the planner's choice of join order from the final plan. To do so, it must be possible to find all of the RTIs that were part of the join problem in that plan. The previous commit, together with the earlier work in 8c49a48, is enough to let us match up RTIs we see in the final plan with RTIs that we see during the planning cycle, but we still have a problem if the planner decides to drop some RTIs out of the final plan altogether. To fix that, when setrefs.c removes a SubqueryScan, single-child Append, or single-child MergeAppend from the final Plan tree, record the type of the removed node and the RTIs that the removed node would have scanned in the final plan tree. It would be natural to record this information on the child of the removed plan node, but that would require adding an additional pointer field to type Plan, which seems undesirable. So, instead, store the information in a separate list that the executor need never consult, and use the plan_node_id to identify the plan node with which the removed node is logically associated. Also, update pg_overexplain to display these details. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
An extension (or core code) might want to reconstruct the planner's decisions about whether and where to perform partitionwise joins from the final plan. To do so, it must be possible to find all of the RTIs of partitioned tables appearing in the plan. But when an AppendPath or MergeAppendPath pulls up child paths from a subordinate AppendPath or MergeAppendPath, the RTIs of the subordinate path do not appear in the final plan, making this kind of reconstruction impossible. To avoid this, propagate the RTI sets that would have been present in the 'apprelids' field of the subordinate Append or MergeAppend nodes that would have been created into the surviving Append or MergeAppend node, using a new 'child_append_relid_sets' field for that purpose. The value of this field is a list of Bitmapsets, because each relation whose append-list was pulled up had its own set of RTIs: just one, if it was a partitionwise scan, or more than one, if it was a partitionwise join. Since our goal is to see where partitionwise joins were done, it is essential to avoid losing the information about how the RTIs were grouped in the pulled-up relations. This commit also updates pg_overexplain so that EXPLAIN (RANGE_TABLE) will display the saved RTI sets. Co-authored-by: Robert Haas <rhaas@postgresql.org> Co-authored-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Up until now, the only way for a loadable module to disable the use of a particular index was to use get_relation_info_hook to remove it from the index list. While that works, it has some disadvantages. First, the index becomes invisible for all purposes, and can no longer be used for optimizations such as self-join elimination or left join removal, which can severely degrade the resulting plan. Second, if the module attempts to compel the use of a certain index by removing all other indexes from the index list and disabling other scan types, but the planner is unable to use the chosen index for some reason, it will fall back to a sequential scan, because that is only disabled, whereas the other indexes are, from the planner's point of view, completely gone. While this situation ideally shouldn't occur, it's hard for a loadable module to be completely sure whether the planner will view a certain index as usable for a certain query. If it isn't, it's more desirable to fall back to the next-cheapest plan than to be forced into a sequential scan.
Provide a facility that (1) can be used to stabilize certain plan choices so that the planner cannot reverse course without authorization and (2) can be used by knowledgeable users to insist on plan choices contrary to what the planner believes best. In both cases, terrible outcomes are possible: users should think twice and perhaps three times before constraining the planner's ability to do as it thinks best; nevertheless, there are problems that are much more easily solved with these facilities than without them. This patch takes the approach of analyzing a finished plan to produce textual output, which we call "plan advice", that describes key decisions made during planning; if that plan advice is provided during future planning cycles, it will force those key decisions to be made in the same way. Not all planner decisions can be controlled using advice; for example, decisions about how to perform aggregation are currently out of scope, as is choice of sort order. Plan advice can also be edited by the user, or even written from scratch in simple cases, making it possible to generate outcomes that the planner would not have produced. Partial advice can be provided to control some planner outcomes but not others. Currently, plan advice is focused only on specific outcomes, such as the choice to use a sequential scan for a particular relation, and not on estimates that might contribute to those outcomes, such as a possibly-incorrect selectivity estimate. While it would be useful to users to be able to provide plan advice that affects selectivity estimates or other aspects of costing, that is out of scope for this commit. For more details, see contrib/pg_plan_advice/README or the SGML documentation. NOTE: There are still some known problems with this code, which are called out by XXX. Reviewed-by: Lukas Fittl <lukas@fittl.com> Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Haibo Yan <tristan.yim@gmail.com> Reviewed-by: Dian Fay <di@nmfay.com> Reviewed-by: Ajay Pal <ajay.pal.k@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: http://postgr.es/m/CA+TgmoZ-Jh1T6QyWoCODMVQdhTUPYkaZjWztzP1En4=ZHoKPzw@mail.gmail.com
Commit e222534 adjusted the logic in add_path() to keep the path list sorted by disabled_nodes and then by total_cost, but failed to make the corresponding adjustment to add_partial_path. As a result, add_partial_path might sort the path list just by total cost, which could lead to later planner misbehavior. In principle, this should be back-patched to v18, but we are typically reluctant to back-patch planner fixes for fear of destabilizing working installations, and it is unclear to me that this has sufficiently serious consequences to justify an exception, so for now, no back-patch.
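A hedged sketch of the ordering rule add_partial_path must now follow; disabled_nodes and total_cost are real Path fields, but this comparator is illustrative, since the real code compares fuzzily while maintaining the list in place.

```c
#include "postgres.h"
#include "nodes/pathnodes.h"

/* does path a sort before path b in the partial path list? */
static bool
partial_path_precedes(const Path *a, const Path *b)
{
	/*
	 * disabled_nodes dominates: a path with fewer disabled nodes sorts
	 * earlier no matter how its cost compares
	 */
	if (a->disabled_nodes != b->disabled_nodes)
		return a->disabled_nodes < b->disabled_nodes;
	return a->total_cost < b->total_cost;
}
```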
Previously, the comments stated that there was no purpose to considering startup cost for partial paths, but this is not the case: it's perfectly reasonable to want a fast-start path for a plan that involves a LIMIT (perhaps over an aggregate, so that there is enough data being processed to justify parallel query, yet we don't want all the result rows). Accordingly, rewrite add_partial_path and add_partial_path_precheck to consider startup costs. This also fixes an independent bug in add_partial_path_precheck: commit e222534 failed to update it to do anything with the new disabled_nodes field. That bug fix is formally separate from the rest of this patch and could be committed separately, but I think it makes more sense to fix both issues together, because then we can (as this commit does) just make add_partial_path_precheck do the cost comparisons in the same way as compare_path_costs_fuzzily, which hopefully reduces the chances of ending up with something that's still incorrect. This patch is based on earlier work on this topic by Tomas Vondra, but I have rewritten a great deal of it. Co-authored-by: Robert Haas <rhaas@postgresql.org> Co-authored-by: Tomas Vondra <tomas@vondra.me>
The TAP test included in this new module runs the regression tests with pg_plan_advice loaded. It arranges for each query to be planned twice. The first time, we generate plan advice. The second time, we replan the query using the resulting advice string. If the tests fail, that means that using pg_plan_advice to tell the planner to do what it was going to do anyway breaks something, which indicates a problem either with pg_plan_advice or with the planner.
c930d75 to 987560f