-
Notifications
You must be signed in to change notification settings - Fork 413
Fix C++ VLA and template keyword issues for Apple Clang 17 #10654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Apple Clang 17 enforces stricter C++ compliance:
1. Variable Length Arrays (VLAs) - Replaced C99/Clang extension VLAs
with std::vector for standard C++ compliance:
- AggregateFunctionArray.h, AggregateFunctionForEach.h,
AggregateFunctionNull.h
- JSONEachRowRowInputStream.cpp
- FunctionsGeo.h
- EncodingUtil.h, EncodingUtil.cpp
- AESCTRCipher.cpp, AESCTRCipherStream.cpp
- ColumnFileTiny.cpp
- Various gtest files
2. Missing includes - Added explicit #include <memory> and <vector>
where std::shared_ptr and std::vector are used:
- CPUAffinityManager.h
- ColumnRange_fwd.h
3. Unnecessary template keyword - Removed template keyword before
non-template member functions:
- Aggregator.cpp
- JoinPartition.cpp
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
📝 WalkthroughWalkthroughReplaced variable-length and fixed-size stack arrays with std::vector in multiple places, removed unnecessary explicit Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
📜 Recent review detailsConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
dbms/src/Functions/FunctionsGeo.h (2)
27-30: Add<vector>include looks correct; consider replacing<math.h>with<cmath>.
<vector>is needed for the new container; separately,<cmath>is generally preferable in C++ to avoid global C symbols/leakage.
234-238: Update the “array on stack” comment / rationale for the 10k-argument cap.The cap in
getReturnTypeImpl()is still OK as a general guardrail, but the comment implies stack allocation; the main allocation is nowstd::vector<Ellipse>on the heap.Proposed comment tweak
- /// For array on stack, see below. + /// Guardrail against excessive argument counts (memory/time). if (arguments.size() > 10000) { throw Exception("Number of arguments of function " + getName() + " is too large."); }Also applies to: 259-283
dbms/src/DataStreams/JSONEachRowRowInputStream.cpp (1)
101-106:read_columnsmigration tostd::vector<bool>is safe for this access pattern. (index-only reads/writes; preserves default-false init)Optional: avoid `std::vector` specialization
- std::vector<bool> read_columns(num_columns, false); + std::vector<UInt8> read_columns(num_columns, 0); ... - if (read_columns[index]) + if (read_columns[index]) throw Exception(...); ... - read_columns[index] = true; + read_columns[index] = 1; ... - if (!read_columns[i]) + if (!read_columns[i]) header.getByPosition(i).type->insertDefaultInto(*columns[i]);Also applies to: 148-165
dbms/src/IO/Compression/EncodingUtil.cpp (2)
357-384: Good VLA removal; consider a defensive size assertion to lock in the assumptiondest_size <= required_size.Proposed diff
const auto required_size = round_size * TYPE_BYTE_SIZE + TYPE_BYTE_SIZE; std::vector<char> tmp_buffer(required_size, 0); + assert(dest_size <= required_size); // copy the first value to the temporary buffer memcpy(tmp_buffer.data(), src, TYPE_BYTE_SIZE);
386-413: Same comment for the UInt64 specialization (nice cleanup).Proposed diff
const auto required_size = round_size * TYPE_BYTE_SIZE + TYPE_BYTE_SIZE; std::vector<char> tmp_buffer(required_size, 0); + assert(dest_size <= required_size); // copy the first value to the temporary buffer memcpy(tmp_buffer.data(), src, TYPE_BYTE_SIZE);dbms/src/Storages/Page/V3/tests/gtest_blob_store.cpp (2)
354-430: Tests: prefermemcmpoverstrncmpfor binary payload validation.Proposed diff
- ASSERT_EQ(strncmp(c_buff.data() +index * buff_size, c_buff_read.data() +index * buff_size, record.entry.size), 0); + ASSERT_EQ(memcmp(c_buff.data() + index * buff_size, c_buff_read.data() + index * buff_size, record.entry.size), 0); ... - ASSERT_EQ(strncmp(c_buff.data() +index * buff_size, page.data.begin(), page.data.size()), 0); + ASSERT_EQ(memcmp(c_buff.data() + index * buff_size, page.data.begin(), page.data.size()), 0); ... - ASSERT_EQ(strncmp(c_buff.data() +index * buff_size, page.data.begin(), page.data.size()), 0); + ASSERT_EQ(memcmp(c_buff.data() + index * buff_size, page.data.begin(), page.data.size()), 0);
541-609: Good: shared buffer switched tostd::vector<char>; consider.data()for the other&c_buff[...]uses for consistency.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
dbms/src/AggregateFunctions/AggregateFunctionArray.hdbms/src/AggregateFunctions/AggregateFunctionForEach.hdbms/src/AggregateFunctions/AggregateFunctionNull.hdbms/src/Common/CPUAffinityManager.hdbms/src/Common/tests/gtest_bitpacking_primitives.cppdbms/src/DataStreams/JSONEachRowRowInputStream.cppdbms/src/Functions/FunctionsGeo.hdbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cppdbms/src/IO/Compression/EncodingUtil.cppdbms/src/IO/Compression/EncodingUtil.hdbms/src/IO/Encryption/AESCTRCipher.cppdbms/src/IO/Encryption/AESCTRCipherStream.cppdbms/src/IO/Encryption/tests/gtest_encryption_test.cppdbms/src/Interpreters/Aggregator.cppdbms/src/Interpreters/JoinPartition.cppdbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTiny.cppdbms/src/Storages/DeltaMerge/Filter/ColumnRange_fwd.hdbms/src/Storages/Page/V2/tests/gtest_page_util.cppdbms/src/Storages/Page/V3/tests/gtest_blob_store.cppdbms/src/Storages/Page/V3/tests/gtest_page_storage.cppdbms/src/Storages/Page/V3/tests/gtest_wal_log.cpp
🧰 Additional context used
🧬 Code graph analysis (3)
dbms/src/AggregateFunctions/AggregateFunctionNull.h (1)
dbms/src/Columns/IColumn.h (1)
IColumn(52-73)
dbms/src/IO/Encryption/AESCTRCipherStream.cpp (1)
dbms/src/IO/Encryption/tests/gtest_encryption_test.cpp (1)
iv(191-191)
dbms/src/IO/Compression/EncodingUtil.h (1)
dbms/src/IO/Compression/EncodingUtil.cpp (10)
applyFrameOfReference(164-201)applyFrameOfReference(164-164)applyFrameOfReference(203-203)applyFrameOfReference(204-204)applyFrameOfReference(205-205)applyFrameOfReference(206-206)applyFrameOfReference(207-207)applyFrameOfReference(208-208)applyFrameOfReference(209-209)applyFrameOfReference(210-210)
🪛 Clang (14.0.6)
dbms/src/Storages/Page/V3/tests/gtest_blob_store.cpp
[error] 372-372: variable 'edit' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 645-645: variable 'edit' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 801-801: variable 'buff' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (29)
dbms/src/Storages/DeltaMerge/Filter/ColumnRange_fwd.h (1)
17-26: Make the header self-contained by including<vector>(good fix).
ColumnRangesusesstd::vector, so adding#include <vector>avoids relying on transitive includes and fixes stricter compiler builds.dbms/src/Common/CPUAffinityManager.h (1)
20-20: Correctly adds missing include for std::shared_ptr.The file uses
LoggerPtr(astd::shared_ptrtypedef) at line 37 and declaresLoggerPtr logat line 133, so the<memory>include is necessary. This change aligns with the PR objective of adding missing includes for stricter Apple Clang 17 compliance.dbms/src/Functions/FunctionsGeo.h (1)
260-262:std::vector<Ellipse>+ellipses.data()migration is correct.This removes the non-standard VLA-style storage while preserving the
isPointInEllipses(const Ellipse*)API.Also applies to: 315-321, 330-336
dbms/src/Interpreters/JoinPartition.cpp (1)
1414-1414: LGTM! Correct removal of unnecessarytemplatekeyword.The
push_backmethod is a regular member function, not a template member function, so the explicittemplatedisambiguator is unnecessary and causes compilation errors with Apple Clang 17's stricter standards. This change correctly makes the code standards-compliant without altering behavior.dbms/src/Interpreters/Aggregator.cpp (2)
678-696: LGTM! Correct removal of unnecessarytemplatekeywords.The
findKeyandemplaceKeymethods are regular (non-template) member functions, so the explicittemplatedisambiguator is unnecessary. This change correctly addresses Apple Clang 17's stricter enforcement of C++ standards without altering the logic.
698-715: LGTM! Consistent with the previous overload.This overload correctly removes the unnecessary
templatekeywords from calls to non-template member functionsfindKeyandemplaceKey, maintaining consistency with the changes in the first overload above.dbms/src/IO/Checksum/tests/gtest_dm_checksum_buffer.cpp (1)
339-339: LGTM! Correct removal of unnecessary template keyword.The explicit
templatedisambiguator is only required for dependent names in template contexts. Sinceslicesis a local variable with a fully-instantiated type, thetemplatekeyword is unnecessary here and Apple Clang 17 correctly rejects it.dbms/src/IO/Encryption/AESCTRCipher.cpp (1)
163-164: LGTM! Correct replacement of VLA with fixed-size array.The replacement is accurate since both AES and SM4 use 128-bit (16-byte) block sizes. The added comment clearly documents this constraint. This change eliminates the non-standard C99/GNU VLA extension while maintaining correct behavior.
dbms/src/Storages/Page/V3/tests/gtest_wal_log.cpp (1)
211-214: LGTM! Proper VLA replacement with std::vector.The migration from a C-style variable-length array to
std::vectorensures standard C++ compliance. All pointer accesses correctly use.data(), maintaining both memory safety and the original behavior.dbms/src/AggregateFunctions/AggregateFunctionArray.h (2)
25-26: LGTM! Appropriate include addition.The
<vector>header is correctly added to support the VLA replacement below.
75-75: LGTM! Clean VLA replacement with std::vector.The replacement of the variable-length array with
std::vector<const IColumn *>ensures standard C++ compliance. The usage correctly passes the raw pointer via.data()to the nested function, maintaining the original interface contract.Also applies to: 99-99
dbms/src/IO/Compression/EncodingUtil.h (1)
137-140: LGTM! Correct VLA replacement in conditional path.The migration to
std::vector<unsigned char>eliminates the non-standard VLA while maintaining correct behavior. All pointer operations properly use.data(), and the allocation only occurs whenround_size != count, preserving the optimization for the common case.dbms/src/DataStreams/JSONEachRowRowInputStream.cpp (1)
20-21: Good: add explicit<vector>include for the new container usage.dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTiny.cpp (2)
25-27: Good:<vector>include added alongside new usage.
373-391: Verify thedata_size == 0encryption path with a specific test case.The loop validates each column's serialized size is non-zero, but if the block contains zero columns,
data_sizewould be zero. Whilestd::vector<char>(0)andreadStrict(_, 0)are safely handled, the subsequent call tofile_provider->encryptPage()withdata_size == 0needs verification. ThecipherDataandCipherfunctions do not explicitly validatedata_size > 0before encryption, so confirm they handle empty payloads correctly.dbms/src/Storages/Page/V3/tests/gtest_page_storage.cpp (1)
95-120: Nice test hardening: move large buffers and batch containers off the stack;.data()usage is correct.dbms/src/Storages/Page/V3/tests/gtest_blob_store.cpp (4)
171-210: Good: remove stack buffer +const_castby switching tostd::vector<char>and.data().
432-540: Good: vectorized IO limiter test buffers/batches; pointer arithmetic via.data()looks correct.
862-1205: Bulk replacements tostd::vector+.data()look consistent; LGTM.Also applies to: 1634-1931
789-860: Verify the exact clang-tidy warning from CI output—the check "init-variables" does not exist in the repository's clang-tidy configuration.The .clang-tidy configuration does not include a check named
init-variables. The relevant uninitialized variable checks enabled areclang-analyzer-core.uninitialized.*andcppcoreguidelines-pro-type-member-init. Code inspection shows variables at lines 372, 645, and 801 are properly initialized. Without actual CI warning messages, it is unclear whether this refers to a stale warning, a misnamed check, or a false positive that has already been resolved.dbms/src/AggregateFunctions/AggregateFunctionNull.h (2)
38-38: LGTM: Standard library include added for std::vector.The include is necessary for the VLA replacement on line 477.
477-477: LGTM: VLA replaced with std::vector for Apple Clang 17 compatibility.The change correctly replaces the variable-length array with
std::vector<const IColumn *>, which is standard C++ compliant. The vector is sized appropriately (bounded by MAX_ARGS = 8), and passingnested_columns.data()maintains the same interface contract with the nested aggregate function.Also applies to: 497-497
dbms/src/AggregateFunctions/AggregateFunctionForEach.h (2)
28-28: LGTM: Standard library include added for std::vector.The include is necessary for the VLA replacement on line 157.
157-157: LGTM: VLA replaced with std::vector for Apple Clang 17 compatibility.The change correctly replaces the variable-length array with
std::vector<const IColumn *>and passesnested.data()to maintain the same interface. This follows the same pattern as the fix in AggregateFunctionNull.h.Also applies to: 185-185
dbms/src/Storages/Page/V2/tests/gtest_page_util.cpp (2)
39-39: LGTM: VLA replaced with std::vector in test.The change correctly replaces stack-allocated char buffers with
std::vector<char>and uses.data()to pass pointers to PageUtil functions. The test logic and assertions remain valid.Also applies to: 49-49, 58-61
84-85: LGTM: VLA replaced with std::vector in test.Consistent with the ReadWriteFile test, this change correctly replaces stack-allocated buffers with
std::vector<char>for the larger buffer size test case.Also applies to: 95-95, 105-106
dbms/src/Common/tests/gtest_bitpacking_primitives.cpp (1)
184-185: LGTM: VLA replaced with std::vector in test.The change correctly replaces stack-allocated buffers with
std::vector<unsigned char>and uses.data()to pass pointers to BitpackingPrimitives functions. The test logic, including pointer arithmetic on line 192, remains correct.Also applies to: 188-189, 192-192
dbms/src/IO/Encryption/AESCTRCipherStream.cpp (1)
46-49: LGTM: VLA replaced with fixed-size array for IV.The change correctly replaces the variable-length IV array with a fixed 16-byte array. Both AES and SM4 always use 128-bit (16-byte) IVs for CTR mode:
blockSize(method)returnsAES_BLOCK_SIZE(16 bytes) for all AES variants and 16 for SM4. TheinitIVfunction writes exactly 16 bytes (2 ×UInt64viamemcpy). The explicit size makes the code more self-documenting and is applied consistently across both theencryptanddecryptmethods.dbms/src/IO/Encryption/tests/gtest_encryption_test.cpp (1)
70-72: LGTM! Consistent VLA-to-vector migration for Apple Clang 17 compatibility.All fixed-size stack arrays have been correctly replaced with
std::vector, and pointer usage has been consistently updated to use.data(). The changes:
- Preserve the original semantics (raw pointers from
.data()behave identically to array names)- Improve memory safety through RAII
- Ensure C++ standard compliance by removing VLA/C99 extensions
- Maintain correct buffer sizes throughout
The transformations are mechanical and correct across all test cases in this file.
Also applies to: 191-197, 248-261, 278-295, 311-335, 364-376, 408-417, 431-443, 467-476, 498-503, 524-534, 559-566
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JinheLin, wshwsh12 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold |
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
Signed-off-by: JaySon-Huang <tshent@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
dbms/src/IO/Encryption/AESCTRCipher.cpp (1)
255-255: Consider replacing this VLA for consistency.Line 255 still uses a variable-length array (
unsigned char partial_block[block_size]), similar to what was fixed in the OpenSSLCipher function. While this may not cause issues if USE_GM_SSL is not defined on Apple Clang builds, replacing it with a fixed-size array (e.g.,unsigned char partial_block[16]) would maintain consistency with the OpenSSLCipher fix and improve overall code portability.♻️ Suggested fix for consistency
- unsigned char partial_block[block_size]; + // Block size is always 16 bytes (128-bit) for SM4 + static_assert(16 == 16, "SM4 block size must be 16 bytes"); + unsigned char partial_block[16];Alternatively, if SM4_BLOCK_SIZE is available, use that constant instead of the magic number 16.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
dbms/src/DataStreams/JSONEachRowRowInputStream.cppdbms/src/IO/Compression/EncodingUtil.hdbms/src/IO/Encryption/AESCTRCipher.cppdbms/src/IO/Encryption/AESCTRCipherStream.cppdbms/src/Storages/Page/V3/tests/gtest_blob_store.cppdbms/src/Storages/Page/V3/tests/gtest_page_storage.cppdbms/src/Storages/Page/V3/tests/gtest_wal_log.cpp
🚧 Files skipped from review as they are similar to previous changes (4)
- dbms/src/Storages/Page/V3/tests/gtest_wal_log.cpp
- dbms/src/IO/Compression/EncodingUtil.h
- dbms/src/Storages/Page/V3/tests/gtest_page_storage.cpp
- dbms/src/DataStreams/JSONEachRowRowInputStream.cpp
🧰 Additional context used
🪛 Clang (14.0.6)
dbms/src/Storages/Page/V3/tests/gtest_blob_store.cpp
[error] 371-371: variable 'edit' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 644-644: variable 'edit' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
[error] 802-802: variable 'buff' is not initialized
(cppcoreguidelines-init-variables,-warnings-as-errors)
🔇 Additional comments (6)
dbms/src/IO/Encryption/AESCTRCipher.cpp (2)
18-18: LGTM! Necessary include for AES_BLOCK_SIZE.The addition of
<openssl/aes.h>is required to use the AES_BLOCK_SIZE constant introduced at line 166.
164-166: Excellent VLA replacement with defensive assertion.The change from a variable-length array to a fixed-size array correctly addresses Apple Clang 17 compatibility. The runtime assertion at line 165 is good defensive programming that validates the assumption that all supported ciphers use a 16-byte block size.
dbms/src/Storages/Page/V3/tests/gtest_blob_store.cpp (1)
175-1924: LGTM! VLA removal is correctly implemented throughout the test file.The refactoring successfully replaces all Variable Length Arrays (VLAs) with
std::vector, making the code compliant with strict C++ standards required by Apple Clang 17. Key observations:
- All stack-allocated character buffers now use
std::vector<char>with runtime sizing- Pointer access correctly uses
.data()method for C-API compatibility (ReadBufferFromMemory, strncmp, memset)- Fixed-size WriteBatch and PageEntriesEdit arrays also converted to vectors (lines 448-449)
- Pointer arithmetic on vector data is handled correctly (e.g.,
c_buff.data() + offset)- Test semantics and memory lifetimes are preserved
Regarding the static analysis hints: All three warnings are false positives. Lines 371, 644, and 802 show variables initialized via assignment or factory functions, which clang-tidy incorrectly flags.
dbms/src/IO/Encryption/AESCTRCipherStream.cpp (3)
58-60: VLA replaced with fixed-size array (consistent with encrypt).The same VLA-to-fixed-array transformation applied in
encrypt()is correctly mirrored here indecrypt(). The same verification regarding block size assumptions applies (see previous comment).
17-17: Include addition is correct and necessary.The
#include <openssl/aes.h>is required for theAES_BLOCK_SIZEconstant used in the fixed-size array declarations on lines 49 and 60. AES_BLOCK_SIZE is defined as 16 in openssl/aes.h and matches the block sizes of all supported encryption methods (AES variants and SM4 both use 16-byte blocks).
47-49: The block size assumption is already validated byblockSize().The
blockSize()function (AESCTRCipher.cpp lines 80–105) explicitly guarantees that all supported encryption methods return 16 bytes: Aes128Ctr, Aes192Ctr, and Aes256Ctr returnAES_BLOCK_SIZE, while SM4Ctr returns 16 (as documented in the code comment). The assert validates this internal invariant and will fail if an unknown method is passed, which is appropriate for production code since the method originates from trusted EncryptionInfo sources.The VLA replacement is correct and fixes the Apple Clang 17 compatibility issue.
Likely an incorrect or invalid review comment.
Signed-off-by: JaySon-Huang <tshent@qq.com>
|
/unhold |
|
/retest |
What problem does this PR solve?
Issue Number: close #10653
Problem Summary:
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note
Summary by CodeRabbit
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.