Blocked in use by harrylin98 · Pull Request #17 · JimB123/valkey

harrylin98 · 2026-02-19T18:13:43Z

No description provided.

### This PR adds GoogleTest (gtest) support to Valkey to enable writing modern unit tests.

**Motivation**: GoogleTest provides richer assertions, test fixtures, mocking support, and improved diagnostics, helping improve test coverage and maintainability over time.

**Summary**: The change is limited to test infrastructure, and existing C unit tests remain unchanged. This PR focuses only on integrating the framework and includes a small set of example tests to demonstrate usage.

For more details, see `src/gtest/README.md`.

---------

Signed-off-by: Harry Lin <harrylhl@amazon.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Harry Lin <49881386+harrylin98@users.noreply.github.com>
Co-authored-by: Harry Lin <harrylhl@amazon.com>
Co-authored-by: Jim Brunner <brunnerj@amazon.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Jacob Murphy <jkmurphy@google.com>

This PR fixes the issue where changes to src files did not trigger a rebuild of libvalkey.a when rerunning `make test-gtest`. It also resolves the problem where unchanged .cpp test files were unnecessarily recompiled during `make test-gtest`.

Now:

If nothing changed:
- Nothing rebuilds, no relinking occurs.

If only C++ test files changed:
- Both libvalkey.a and C++ test files recompile (because release.o changes)
- Test executable relinks

If only src code changed:
- Only libvalkey.a rebuilds
- C++ test files do not recompile
- Test executable relinks

If both changed:
- Both libvalkey.a and C++ test files rebuild
- Test executable relinks

Signed-off-by: Harry Lin <harrylhl@amazon.com>
Co-authored-by: Harry Lin <harrylhl@amazon.com>

Resolved conflicts by merging both gtest and libbacktrace features:
- ci.yml: Combined gtest dependencies with libbacktrace support
- Makefile: Added libbacktrace configuration alongside gtest-parallel
- daily.yml: Merged g++-multilib with libbacktrace build steps

All build configurations now support both USE_LIBBACKTRACE and gtest unit tests.

…alkey-io#3217)

## Add `ALLOW_BUSY` flag to `SELECT` command

### Motivation

When a client decides to switch databases (e.g., from DB 0 to DB 1), it must issue a `SELECT` command. However, the server may be running a very long script when the command arrives, causing it to respond with a `-BUSY` error. This leads to several issues:

1. Pipelining clients can face inconsistency when the `SELECT` is followed by a pipeline of commands that write to this database. If only the `SELECT` gets the `-BUSY` error and some of the remaining commands are processed after the long script finishes, those commands are processed in the wrong database context.
2. Although clients can maintain logic to handle the prior issue, doing so complicates the client and introduces a "head of line" delay, since the client must wait for the `SELECT` reply before it can continue pipelining commands, potentially forcing it to allocate a different connection per database. This is also probably not handled by any known client library at the moment.
3. With the introduction of multi-database support in cluster mode, clients managing connections across multiple shards need to maintain a consistent database selection across all their connections. When a client decides to switch databases (e.g., from DB 0 to DB 1), it must issue `SELECT` to every shard in the cluster. Currently, if any shard is busy (e.g., executing a long-running Lua script or module command), the `SELECT` call to that shard is rejected with a `-BUSY` error. This creates a **split-brain scenario** where some connections have switched to the new database while others remain on the old one.

### Why `ALLOW_BUSY` is safe for `SELECT`

`SELECT` is a **connection-local, metadata-only operation**. It simply changes which database index the current client connection points to. It:

- **Does not read or write any keys** - there is no data conflict with a running script.
- **Is already flagged as `fast`** - it runs in O(1) and will not contribute to or extend any busy condition.
- **Is already flagged as `loading`** - the server already recognizes that `SELECT` is safe to execute during sensitive server states (dataset loading from disk), establishing a precedent that this command is harmless to run when other commands are being rejected.
- **Is already flagged as `stale`** - similarly, `SELECT` is allowed on stale replicas, further confirming it is treated as a safe, non-data operation.

### Changes

- Added `allow_busy` to the `command_flags` for `SELECT` in `src/commands/select.json`.

Signed-off-by: Gabi Ganam <ggabi@amazon.com>
Co-authored-by: Gabi Ganam <ggabi@amazon.com>

The `test_quicklistCompressAndDecompressQuicklistListpackNode` unit test was failing with OOM errors under ASan in CI. The test allocated and compressed up to 1GB of data (32 iterations × 32MB), which exceeded the available memory in CI environments. The test is now skipped for ASan builds.

This PR addresses the unit test failures in `test-sanitizer-address-large-memory` for both GCC and Clang.

Resolves valkey-io#3221

---------

Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>

## Overview:

This PR converts the existing C unit tests to GoogleTest, building on the introduction of the GoogleTest framework in valkey-io#2956, as mentioned in valkey-io#2878.

## Details:

1. Kept the previous test logic as much as possible, and kept all original comments from the C unit tests.
2. Deleted the C unit tests.
3. Changed headers: include "generated_wrappers.hpp" and remove all header files already included by "generated_wrappers.hpp".
4. Used extern "C" to wrap the C include files to prevent name mangling.
5. Added a test fixture with SetUp/TearDown.

## Notes:

1. `ustime` is declared in `util.h`, so some duplicated `ustime` definitions were deleted.
2. Many C tests include `.c` files directly to use static functions, but we don't want to copy static functions from `.c` to `.cpp`, so wrapper functions (prefix: `testOnly`) were added to the `.c` files for use from the `.cpp` files.
3. Some C tests have shared state, which is incompatible with gtest-parallel, so they were changed to isolated tests (e.g. in `test_dict.cpp`, `_dict` is now an instance variable created fresh in SetUp() for each test, and the necessary setup code was added back to each test so they can run independently, in any order or in parallel).

## Testing:

* GTEST: 253 tests + 32 disabled tests (285 tests in total) all pass

---------

Signed-off-by: Alina Liu <liusalisa6363@gmail.com>
Signed-off-by: Harry Lin <harrylhl@amazon.com>
Signed-off-by: Harry Lin <49881386+harrylin98@users.noreply.github.com>
Signed-off-by: Alina Liu <alinalq@dev-dsk-alinalq-2b-2db84246.us-west-2.amazon.com>
Co-authored-by: Harry Lin <harrylhl@amazon.com>
Co-authored-by: Harry Lin <49881386+harrylin98@users.noreply.github.com>
Co-authored-by: Alina Liu <alinalq@dev-dsk-alinalq-2b-2db84246.us-west-2.amazon.com>

Signed-off-by: Harry Lin <harrylhl@amazon.com> Signed-off-by: Harry Lin <49881386+harrylin98@users.noreply.github.com> Signed-off-by: Alina Liu <liusalisa6363@gmail.com> Signed-off-by: Alina Liu <alinalq@dev-dsk-alinalq-2b-2db84246.us-west-2.amazon.com> Co-authored-by: Harry Lin <harrylhl@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Co-authored-by: Alina Liu <liusalisa6363@gmail.com> Co-authored-by: Alina Liu <alinalq@dev-dsk-alinalq-2b-2db84246.us-west-2.amazon.com>

Signed-off-by: Harry Lin <harrylhl@amazon.com>

Update deps/libvalkey to version 0.4.0

Squashed 'deps/libvalkey/' changes from b012f8e85..45c2ed15c

45c2ed15c Release 0.4.0 (valkey-io#286)
40d6590d7 Implement runtime dynamic loading for RDMA libraries (valkey-io#284)
62e757d17 Release 0.3.0 (valkey-io#283)
a554f0942 Fix potential uint32_t underflow issue (valkey-io#280)
8f9051ae0 Correcting command parser bug (valkey-io#277)
29023eb36 Add valkey-json, valkey-bloom, valkey-search to cmddef.h
ae756bc89 Update cmddef.h to Valkey 9.0.0
21abd737e Replace problematic alloca() with fixed stack alloc
38191079c Fix compilation on Solaris with Sun/Solaris Studio
ef5de0312 Make libvalkey initialization thread-safe
ae341dea5 Support slotmap updates using CLUSTER NODES in RESP3 (valkey-io#262)
36f6e2292 Fix the long-blocking read for Valkey RDMA. (valkey-io#233)
c090c28be Use a uintptr_t hop for casting pointers to ints
daa7f11ac Avoid heap buffer overflow in valkeyAsyncFormattedCommand (valkey-io#245)
15974930d Add option to select a logical database (valkey-io#244)
983d67e4f Install the macosx adapter on Apple platforms only
...

git-subtree-dir: deps/libvalkey
git-subtree-split: 45c2ed15cab9fa0ea1a6cabc8460f5eea6240de5

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>

…onfigured (valkey-io#2846)

When dual-channel-replication is enabled and replica-announce-ip is set, the RDB/AOF channel does not announce itself at this endpoint. It defaults instead to the IP address behind the NAT, or the Kubernetes Pod IP in our case.

This means that if Sentinel is polling the primary for connected replicas, it will first see the ephemeral pod IP, then revert to the announce-ip, leaving behind the pod IP as a down replica.

This PR configures the RDB/AOF channel to also announce itself at the announce-ip to prevent the stale replica.

## Testing

I evaluated writing unit tests for this, but I am not sure how to test with an IP address other than localhost (127.0.0.1) in a way that would fail without the fix.

I also tested on Kubernetes against the 9.0 tag and verified the fix there.

### Status quo

On 9.0 image tag:

```
$ kubectl get pods -n valkey-baseline -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP
NAME                              POD-IP
valkey-primary-5bd78c8566-llb6k   10.244.0.25
valkey-replica-0                  10.244.0.17
valkey-replica-1                  10.244.0.13

$ kubectl get services -n valkey-baseline -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.147.28
valkey-replica-0   10.96.66.233
valkey-replica-1   10.96.57.230
```

The logs below show that the pod IP for valkey-primary-5bd78c8566-llb6k, `10.244.0.25:6379`, is being used for dual-channel replication. This should be its cluster IP `10.96.147.28`, as this is what is set in replica-announce-ip.

```
1:M 14 Nov 2025 17:57:51.750 * Replica 10.96.147.28:6379 asks for synchronization
1:M 14 Nov 2025 17:57:51.751 * Replica 10.244.0.25:6379 asks for synchronization
1:M 14 Nov 2025 17:57:56.135 * Dual channel replication: Sending to replica 10.244.0.25:6379 RDB end offset 1763269 and client-id 35
1:M 14 Nov 2025 17:57:56.140 * Replica 10.96.147.28:6379 asks for synchronization
```

### This fix

```
$ kubectl get pods -n valkey-test -o custom-columns=NAME:.metadata.name,POD-IP:.status.podIP
NAME                              POD-IP
valkey-primary-594c9597b5-qqvdk   10.244.0.26
valkey-replica-0                  10.244.0.10
valkey-replica-1                  10.244.0.18

$ kubectl get services -n valkey-test -o custom-columns=NAME:.metadata.name,CLUSTER-IP:.spec.clusterIP
NAME               CLUSTER-IP
valkey-primary     10.96.125.142
valkey-replica     None
valkey-replica-0   10.96.155.74
valkey-replica-1   10.96.64.111
valkey-sentinel    None
```

The logs show that the cluster IP is now being used for dual-channel replication.

```
1:M 14 Nov 2025 17:57:49.923 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:49.924 * Replica 10.96.125.142:6379 asks for synchronization
1:M 14 Nov 2025 17:57:54.913 * Dual channel replication: Sending to replica 10.96.125.142:6379 RDB end offset 1771247 and client-id 36
1:M 14 Nov 2025 17:57:54.916 * Replica 10.96.125.142:6379 asks for synchronization
```

Fixes valkey-io#2338

Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>

…ast key (valkey-io#3197)

This issue was encountered while processing valkey-io#3121.

Currently, in all our commands with KSPEC_FK_KEYNUM, the key step is 1, so this bug does not affect any core commands. If a command used a different key step value, calculating the last key here would cause problems, since the step is not included in the calculation.

Signed-off-by: Binbin <binloveplay1314@qq.com>
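
To make the role of the key step concrete, here is a minimal sketch of the arithmetic. The function name and signature are hypothetical and are not taken from the Valkey source; the sketch only assumes that keys start at argument index `first`, that `numkeys` keys follow, and that consecutive keys are `step` arguments apart.

```c
#include <assert.h>

/* Hypothetical helper (not Valkey's actual code): index of the last key
 * argument when the number of keys comes from a "keynum" argument.
 * Returns -1 when there are no keys. */
int toy_last_key_index(int first, int numkeys, int step) {
    assert(first >= 0 && step > 0);
    if (numkeys <= 0) return -1;
    /* Omitting the multiplication by step is only correct when step == 1. */
    return first + (numkeys - 1) * step;
}
```

With `first = 2` and `numkeys = 3`, a step of 1 gives index 4, while a step of 2 gives index 6; a calculation that ignores the step would report 4 in both cases.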

… message Signed-off-by: Roshan Khatri <rvkhatri@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>

Remove -encoding binary from fconfigure as it is no longer supported in Tcl 9 (Fedora Rawhide and Daily / test-fedoralatest-jemalloc (pull_request)). -translation binary alone is sufficient. ``` https://github.com/valkey-io/valkey/actions/runs/22324297258/job/64590589777?pr=3225 ===== Start of server log (pid 17617) ===== ### Starting server for test 17617:M 23 Feb 2026 21:21:47.846 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see jemalloc/jemalloc#1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 17617:M 23 Feb 2026 21:21:47.846 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo 17617:M 23 Feb 2026 21:21:47.846 * Valkey version=9.0.3, bits=64, commit=00000000, modified=0, pid=17617, just started 17617:M 23 Feb 2026 21:21:47.846 * Configuration loaded 17617:M 23 Feb 2026 21:21:47.847 * monotonic clock: POSIX clock_gettime .+^+. .+#########+. .+########+########+. Valkey 9.0.3 (00000000/0) 64 bit .+########+' '+########+. .########+' .+. '+########. Running in cluster mode |####+' .+#######+. '+####| Port: 25121 |###| .+###############+. |###| PID: 17617 |###| |#####*'' ''*#####| |###| |###| |####' .-. '####| |###| |###| |###( (@@@) )###| |###| https://valkey.io/ |###| |####. '-' .####| |###| |###| |#####*. .*#####| |###| |###| '+#####| |#####+' |###| |####+. 
+##| |#+' .+####| '#######+ |##| .+########' '+###| |##| .+########+' '| |####+########+' +#########+' '+v+' 17617:M 23 Feb 2026 21:21:47.849 * No cluster configuration found, I'm 5501916ccdf76dee6b652cab10402cab3a8f9152 17617:M 23 Feb 2026 21:21:47.852 * Server initialized 17617:M 23 Feb 2026 21:21:47.852 * Ready to accept connections tcp 17617:M 23 Feb 2026 21:21:47.852 * Ready to accept connections unix 17617:M 23 Feb 2026 21:21:47.963 - Accepted 127.0.0.1:37791 17617:M 23 Feb 2026 21:21:47.963 - Client closed connection id=2 addr=127.0.0.1:37791 laddr=127.0.0.1:25121 fd=12 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37504 events=r cmd=ping user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=7 tot-net-out=7 tot-cmds=1 17617:M 23 Feb 2026 21:21:47.969 - Accepted 127.0.0.1:39251 17617:M 23 Feb 2026 21:21:47.970 * configEpoch set to 1 via CLUSTER SET-CONFIG-EPOCH 17617:M 23 Feb 2026 21:21:47.975 # Missing implement of connection type tls 17617:M 23 Feb 2026 21:21:47.976 # DEBUG LOG: ========== I am primary 0 ========== 17617:M 23 Feb 2026 21:21:49.872 * Cluster state changed: ok ### Starting test Packet with missing gossip messages don't cause invalid read in tests/unit/cluster/packet.tcl 17617:M 23 Feb 2026 21:21:49.879 - Accepting cluster node connection from 127.0.0.1:33701 17617:M 23 Feb 2026 21:21:49.880 - Client closed connection id=3 addr=127.0.0.1:39251 laddr=127.0.0.1:25121 fd=12 name= age=2 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=426 obl=0 oll=0 omem=0 tot-mem=22144 events=r cmd=cluster|info user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=1363 tot-net-out=17324 tot-cmds=47 17617:signal-handler (1771881709) Received SIGTERM scheduling shutdown... 17617:M 23 Feb 2026 21:21:49.973 * User requested shutdown... 
17617:M 23 Feb 2026 21:21:49.973 * Removing the pid file. 17617:M 23 Feb 2026 21:21:49.973 * Saving the cluster configuration file before exiting. 17617:M 23 Feb 2026 21:21:49.978 * Removing the unix socket file. 17617:M 23 Feb 2026 21:21:49.978 # Valkey is now ready to exit, bye bye... ===== End of server log (pid 17617) ===== ===== Start of server stderr log (pid 17617) ===== ===== End of server stderr log (pid 17617) ===== [exception]: Executing test client: unknown encoding "binary": No longer supported. please use either "-translation binary" or "-encoding iso8859-1". unknown encoding "binary": No longer supported. please use either "-translation binary" or "-encoding iso8859-1" while executing "fconfigure $sock -translation binary -encoding binary -buffering none -blocking 1" ("uplevel" body line 18) invoked from within "uplevel 1 $code" (procedure "test" line 62) invoked from within "test "Packet with missing gossip messages don't cause invalid read" { set base_port [srv 0 port] set cluster_port [expr {$base_port + ..." ("uplevel" body line 2) invoked from within "uplevel 1 $code" (procedure "cluster_setup" line 41) invoked from within "cluster_setup 1 0 1 continuous_slot_allocation default_replica_allocation { test "Packet with missing gossip messages don't cause invalid read" { ..." ("uplevel" body line 1) invoked from within "uplevel 1 $code " (procedure "start_server" line 2) invoked from within "start_server {overrides {cluster-enabled yes cluster-ping-interval 100 cluster-node-timeout 3000 cluster-databases 16 cluster-slot-stats-enabled yes} ..." ("uplevel" body line 1) invoked from within "uplevel 1 $code" (procedure "start_multiple_servers" line 5) invoked from within "start_multiple_servers $node_count $options $code" (procedure "start_cluster" line 17) invoked from within "start_cluster 1 0 {tags {external:skip cluster tls:skip}} { test "Packet with missing gossip messages don't cause invalid read" { set base..." 
(file "tests/unit/cluster/packet.tcl" line 84) invoked from within "source $path" (procedure "execute_test_file" line 4) invoked from within "execute_test_file $data" (procedure "test_client_main" line 10) invoked from within "test_client_main $::test_server_port " Killing still running Valkey server 10700 Killing still running Valkey server 11659 Killing still running Valkey server 11705 Killing still running Valkey server 12735 Killing still running Valkey server 12775 Killing still running Valkey server 12808 Killing still running Valkey server 12858 Killing still running Valkey server 12901 Killing still running Valkey server 12940 Killing still running Valkey server 12972 Killing still running Valkey server 13004 Killing still running Valkey server 13036 Killing still running Valkey server 13073 Killing still running Valkey server 14952 Killing still running Valkey server 16631 Killing still running Valkey server 16670 Killing still running Valkey server 16686 Killing still running Valkey server 16702 Killing still running Valkey server 16718 Killing still running Valkey server 16734 Killing still running Valkey server 16753 Killing still running Valkey server 16771 Killing still running Valkey server 17094 Killing still running Valkey server 17112 Killing still running Valkey server 17133 Killing still running Valkey server 17262 Killing still running Valkey server 17278 Killing still running Valkey server 17316 Killing still running Valkey server 17349 Killing still running Valkey server 17379 Killing still running Valkey server 17410 Killing still running Valkey server 17430 Killing still running Valkey server 17443 Killing still running Valkey server 17459 Killing still running Valkey server 17478 Killing still running Valkey server 17491 Killing still running Valkey server 17507 Killing still running Valkey server 17526 Killing still running Valkey server 17545 Killing still running Valkey server 17561 Killing still running Valkey server 17577 Killing 
still running Valkey server 17716 Killing still running Valkey server 17778 Killing still running Valkey server 17825 Killing still running Valkey server 17848 Killing still running Valkey server 17896 Killing still running Valkey server 17931 Killing still running Valkey server 17953 Killing still running Valkey server 17973 Killing still running Valkey server 17991 Killing still running Valkey server 18010 Killing still running Valkey server 18064 Killing still running Valkey server 18082 Killing still running Valkey server 18134 Killing still running Valkey server 18118 Killing still running Valkey server 18150 Killing still running Valkey server 18166 Killing still running Valkey server 18188 Killing still running Valkey server 18209 ``` Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

…3192)

When a cluster reset is performed on a replica node, a new shard ID is generated because the node is about to become an empty primary node, see valkey-io#2283. However, the log added in valkey-io#2510 caused some confusion.

In clusterSetNodeAsPrimary we print:

```
serverLog(LL_NOTICE, "Reconfiguring node %.40s (%s) as primary for shard %.40s", n->name, humanNodename(n), n->shard_id);
```

In clusterReset, we first call clusterSetNodeAsPrimary and then generate a new shard ID, so the log prints the old, wrong shard ID first. For example, when a replica node performs a cluster reset, we print:

```
xxx * Cluster reset (user request from 'xxx').
xxx * Reconfiguring node af76a3e0ffcd77bd14fa47ce4d07ab2bdc78702f (xxx) as primary for shard ea528667634af8beed83adac2b9af8360769a1b4
```

But the node's shard ID is actually:

```
xxx> cluster myshardid
"52ede26d1554dd203161ba09011af14574b2cc84"
```

Now we print a log after the new shard ID is generated, and we also move the call to clusterSetNodeAsPrimary after the new shard ID is generated, so that it logs the right one. After this PR:

```
xxx * Cluster reset (user request from 'xxx').
xxx * Moving myself to a new shard bd31870ce73f5977084e6a46e337a4a1ad38fc66.
xxx * Reconfiguring node 1d54b904efd30cd9d7d1abbfd63c8fafbb62e1c8 (xxx) as primary for shard bd31870ce73f5977084e6a46e337a4a1ad38fc66
```

This is part of valkey-io#2989, but since the extension fix is unlikely to be merged soon, this change is extracted separately as a log fix (or improvement).

Signed-off-by: Binbin <binloveplay1314@qq.com>

…lkey-io#3091)

## Summary

Fixes valkey-io#2620

Skip loading expired hash fields when a non-preamble RDB is being loaded on a primary server. Propagate `HDEL` to replicas when expired fields are skipped.

## Changes

- Updated `rdbLoadObject` signature to accept `rdbflags` and `now` parameters
- Added logic to skip expired hash fields during RDB load on primary
- Propagate `HDEL` to replicas when `RDBFLAGS_FEED_REPL` is set
- Updated all callers of `rdbLoadObject`
- Added unit test

---------

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>

…rink rehashing (valkey-io#3175)

During hashtable shrinking, all keys are inserted into ht1. This PR adds a mechanism to the hashtable: when there are severe hash collisions in the new ht1 during the shrink rehashing process, the hashtable stops shrinking by swapping ht0 and ht1, to avoid excessive performance degradation of the new hashtable during this period.

In extreme cases, for example if ht0 is very large and has been reduced to only one entry, and the new ht1 is very small after the resize, the ht0 rehash process is very slow because there are a lot of empty buckets to walk. If a large number of elements are added to ht1 at this time, it leads to severe hash collisions in ht1.

When an entry is added while the hashtable is shrinking, we check the fill percentage of ht1. If it exceeds MAX_FILL_PERCENT_HARD, we swap ht0 and ht1, abort the shrinking process, set rehash_idx back to 0, and restart the rehash.

This PR also adds a new debug hashtable-can-abort-shrink subcommand to control this behavior.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
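
The abort-shrink check described above can be sketched on a toy two-table structure. Everything here is an illustrative assumption rather than Valkey's implementation: the `toy_hashtable` type, both function names, the value 500 for `MAX_FILL_PERCENT_HARD`, and the 7 entries per bucket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILL_PERCENT_HARD 500 /* assumed value, for illustration only */
#define ENTRIES_PER_BUCKET 7      /* assumed bucket capacity */

/* Toy model: index 0 is the table being rehashed from (ht0),
 * index 1 is the rehash target (ht1). */
typedef struct {
    size_t num_buckets[2];
    size_t used[2];
    long rehash_idx; /* -1 when no rehash is in progress */
} toy_hashtable;

/* True when a shrink is in progress and ht1 is filled beyond the hard limit. */
bool toy_should_abort_shrink(const toy_hashtable *ht) {
    assert(ht != NULL);
    if (ht->rehash_idx < 0) return false;                       /* not rehashing */
    if (ht->num_buckets[1] >= ht->num_buckets[0]) return false; /* not a shrink */
    size_t capacity = ht->num_buckets[1] * ENTRIES_PER_BUCKET;
    return ht->used[1] * 100 > capacity * MAX_FILL_PERCENT_HARD;
}

/* Swap the two tables and restart the rehash from bucket 0, so the small,
 * overfull table is now rehashed into the large one. */
void toy_abort_shrink(toy_hashtable *ht) {
    assert(ht != NULL);
    size_t buckets = ht->num_buckets[0];
    size_t used = ht->used[0];
    ht->num_buckets[0] = ht->num_buckets[1];
    ht->used[0] = ht->used[1];
    ht->num_buckets[1] = buckets;
    ht->used[1] = used;
    ht->rehash_idx = 0;
}
```

For example, with 1024 old buckets holding one entry and 4 new buckets holding 200 entries, the check fires (200 entries against a capacity of 28 is well over 500%), and after the swap the rehash runs from the 4-bucket table into the 1024-bucket table.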

…y test in crash report (valkey-io#3029)

The memory test was commented out in valkey-io#2858 and should have been reenabled.

On further investigation I found that the server hangs during shutdown inside the `bioDrainWorker(BIO_LAZY_FREE)` call. This causes a deadlock because the lock was acquired for shutdown, but lazy free jobs require the GIL too:

- main thread: `serverCron()` acquires GIL via `afterSleep()` then calls `finishShutdown()`, which eventually calls our script module unload code that calls `bioDrainWorker()`.
- bio threads: Pending lazy free jobs such as `lazyFreeEvalScripts()` call `scriptingEngineCallFreeFunction()` which requires the GIL.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

…tablished psync test (valkey-io#3242) Fix valkey-io#3231: Increase wait timeout for RDB load log detection in test Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>

…ey-io#3294)

The test "Blocking keyspace notification with pipelining hset after hget" was recently failing intermittently with two different errors:

1. `Expected [expr {114 * 10 < 1114}]` - timing assertion failed under Valgrind
2. `Timeout waiting for blocked clients` - race condition on normal runs

The test used wall-clock timing to verify that hget (non-blocking) completed faster than hset (blocking). This is unreliable because:

- Valgrind slows execution 10-50x, making timing ratios meaningless
- Fast systems may complete both operations in <10ms, causing ratio failures

This fix replaces timing assertions with blocked client count checks, which directly verify the blocking mechanism rather than inferring it from timing. The test now confirms hget's response is available before hset blocks, then waits for the blocked client count to transition through the expected states.

Signed-off-by: Rain Valentine <rsg000@gmail.com>

Closes valkey-io#3077

### Overview

URI in SAN is used to represent client identities in modern mTLS deployments where CN may be empty or deprecated. See the valkey-io#3077 (comment) for more details.

When `tls-auth-clients-user URI` is configured, during the TLS handshake, the server iterates through the URIs in the client certificate and authenticates the client as the first enabled user whose name matches one of those URIs.

### Implementation

- Introduced a new value `URI` for `tls-auth-clients-user`
- Added new function `getCertSanUri` that:
  - Extracts URI entries from the certificate's SAN extension
  - Checks each URI against existing Valkey users
  - Returns the first URI that matches an enabled user
- Renamed `getCertFieldByName` → `getCertSubjectFieldByName` for clarity
- Modified `tlsGetPeerUsername` to support both CN and URI authentication modes

### Example behavior

Common setup

```
# client certificate
X509v3 Subject Alternative Name
    URI:urn:valkey:user:first, URI:urn:valkey:user:second

# valkey.conf
tls-auth-clients-user URI
hide-user-data-from-log no
```

Use case 1: multiple enabled users

```
user urn:valkey:user:first on >clientpass allcommands allkeys
user urn:valkey:user:second on >clientpass allcommands allkeys

39762:M 26 Jan 2026 22:06:25.122 - TLS: Auto-authenticated client as urn:valkey:user:first
```

Use case 2: first URI disabled, second enabled

```
user urn:valkey:user:first off >clientpass allcommands allkeys
user urn:valkey:user:second on >clientpass allcommands allkeys

39792:M 26 Jan 2026 22:07:08.006 - TLS: Auto-authenticated client as urn:valkey:user:second
```

Use case 3: all matching users disabled or no matching user

```
user urn:valkey:user:first off >clientpass allcommands allkeys
user urn:valkey:user:second off >clientpass allcommands allkeys

39812:M 26 Jan 2026 22:07:34.174 * TLS: No matching user found in certificate SAN URI fields

127.0.0.1:6379> acl whoami
"default"
127.0.0.1:6379> acl log
1)  1) "count"
    2) (integer) 1
    3) "reason"
    4) "tls-cert"
    5) "context"
    6) "toplevel"
    7) "object"
    8) ""
    9) "username"
   10) "urn:valkey:user:second"
   11) "age-seconds"
   12) "17.381"
   13) "client-info"
   14) "id=3 addr=127.0.0.1:57236 laddr=127.0.0.1:6379 fd=8 name= age=0 idle=0 flags=N capa= db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=17024 events=r cmd=NULL user=default redir=-1 resp=2 lib-name= lib-ver= tot-net-in=0 tot-net-out=0 tot-cmds=0"
   15) "entry-id"
   16) (integer) 0
   17) "timestamp-created"
   18) (integer) 1771963041866
   19) "timestamp-last-updated"
   20) (integer) 1771963041866
127.0.0.1:6379>
```

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>
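
The selection rule in the three use cases (pick the first certificate URI that names an enabled user) can be sketched as follows. This is an illustration over plain arrays, not the `getCertSanUri` implementation: the `toy_user` type and `toy_first_enabled_uri` function are hypothetical, and no OpenSSL or ACL APIs are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ACL user entry. */
typedef struct {
    const char *name;
    bool enabled;
} toy_user;

/* Returns the first URI (in certificate order) that matches an enabled
 * user, or NULL when no URI names an enabled user. */
const char *toy_first_enabled_uri(const char **uris, size_t nuris,
                                  const toy_user *users, size_t nusers) {
    assert(uris != NULL && users != NULL);
    for (size_t i = 0; i < nuris; i++) {
        for (size_t j = 0; j < nusers; j++) {
            if (users[j].enabled && strcmp(uris[i], users[j].name) == 0) {
                return uris[i];
            }
        }
    }
    return NULL;
}
```

With the two URIs from the common setup, this returns `urn:valkey:user:first` when both users are on, `urn:valkey:user:second` when only the second is on, and `NULL` when both are off, mirroring use cases 1 to 3.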

…o#3310)

The CodeQL workflow is currently throwing a deprecation warning regarding use of v3.

> CodeQL Action v3 will be deprecated in December 2026. Please update all occurrences of the CodeQL Action in your workflow files to v4.

This PR introduces the following changes:

* References to CodeQL v3 have been updated to the SHA of the latest CodeQL release, v4.32.5.

Signed-off-by: Kurt McKee <contactme@kurtmckee.org>

…r daily tests (valkey-io#3303)

`SSL_get0_peer_certificate()` was introduced in OpenSSL 3.0. The recent commit 7e110ae (Support TLS authentication using SAN URI) used it in `tlsGetPeerUser()` without a version guard, breaking builds against OpenSSL 1.1.x.

Use `SSL_get_peer_certificate()` on OpenSSL < 3.0, with the corresponding `X509_free()`, since the older API increments the reference count.

Fixes build failure: implicit declaration of function `SSL_get0_peer_certificate` [-Werror=implicit-function-declaration]

Also fixes the version mismatch for almalinux 9 daily tests. Closes valkey-io#3304.

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

As part of valkey-io#2508 we introduced a defrag optimization to avoid matching the replaced defrag entry. Even though defrag replacements do not impact the skip list order, when scores are equal we MUST compare elements lexicographically to maintain correct skip list ordering. Otherwise we might fail to locate the entry.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
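
The ordering rule the message relies on (score first, element bytes as the tie-breaker) can be sketched with a small comparator. This is an illustration only: the function name is hypothetical, and it uses NUL-terminated strings with `strcmp`, whereas a binary-safe string comparison would be needed for real element data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical comparator: negative if (score_a, ele_a) sorts before
 * (score_b, ele_b), positive if after, zero if both are equal. */
int toy_zsl_compare(double score_a, const char *ele_a,
                    double score_b, const char *ele_b) {
    assert(ele_a != NULL && ele_b != NULL);
    if (score_a < score_b) return -1;
    if (score_a > score_b) return 1;
    /* Equal scores: fall back to lexicographic order of the elements.
     * Skipping this step would treat all equal-score entries as
     * interchangeable and could stop a search at the wrong node. */
    return strcmp(ele_a, ele_b);
}
```

For two entries with score 1.0 and elements "a" and "b", the comparator orders "a" before "b"; a search that compared scores alone could not tell them apart.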

This PR fixes a Codecov workflow misconfiguration introduced when upgrading codecov/codecov-action from v4 to v5 (in valkey-io#3185).

In v5, the action expects `files` (plural), but the workflow still used `file`. The coverage shown is 0 right now: https://app.codecov.io/gh/valkey-io/valkey

Documentation from https://github.com/codecov/codecov-action/tree/v5?tab=readme-ov-file#arguments

```
The following arguments have been changed
file (this has been deprecated in favor of files)
plugin (this has been deprecated in favor of plugins)
```

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

Before we added Lua as a module, we had logic to NOT register the luaHook when the `busy-reply-threshold` config is set to 0. Now we ALWAYS register the hook, and to stay aligned with the old behavior we let the execution of the script continue from the interrupt hook when `busy-reply-threshold` is set to 0.

Also, valkey.conf says the config can be negative, but in fact its minimum is 0, so fix valkey.conf as well.

```
# The default is 5 seconds. It is possible to set it to 0 or a negative value
# to disable this mechanism (uninterrupted execution)
```

It was introduced in valkey-io#2858.

Signed-off-by: Alon Arenberg <alonare@amazon.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>

In valkey-io#3260 we tried to deflake it by using an assert_range, but the upper limit of the range is too low, so the test is still flaky. Increase the upper limit to 200ms more than the expected latency, e.g. from 550 to 700 when the expected latency is 500.

```
*** [err]: LATENCY GRAPH can output the event graph in tests/unit/latency-monitor.tcl
Expected '625' to be between to '450' and '550' (context: type source line 143 file /Users/runner/work/valkey/valkey/tests/unit/latency-monitor.tcl cmd {assert_range $high 450 550} proc ::test)
```

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>

…io#3317) Since there is a mismatch between the `ar` tool already installed on a macOS runner and Clang 22 installed by brew, let's use the brew-installed `llvm-ar`. This is expected to fix the issue in the CI job `build-macos-latest`.

---------

Signed-off-by: Björn Svensson <bjorn.a.svensson@est.tech>

valkey-io#3311) Closes valkey-io#3286 --------- Signed-off-by: Yang Zhao <zymy701@gmail.com>

Replace integer mono_ticksPerMicrosecond with fixed-point arithmetic in the x86 TSC monotonic clock for better accuracy (and better performance, since the conversion becomes a multiplication and shift instead of a division). Calibration now uses double precision to compute the multiplier.

**Comparison with TSC @ 2400.5 ticks/us**

**Old Method:**
Calibration: 2400 (truncated from 2400.5)
Converting 1 second of ticks: 2,400,500,000 / 2400 = 1,000,208 us
Error: +208 us per second

**New Method:**
Calibration: sample_ticks_per_us = 2400.5 (a double stores this exactly in this case)
Multiplier: 2^24 / 2400.5 = 6989.05 -> stored as 6989 in uint64_t
Converting 1 second: (2,400,500,000 * 6989) >> 24 = 999,992 us
Error: -8 us per second

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
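The comparison above can be reproduced with a few lines of integer arithmetic. This is a sketch of the technique only, not the Valkey code: the function names are hypothetical, and the 24-bit shift is taken from the worked example.

```python
FIXED_POINT_SHIFT = 24  # shift used in the worked example above

def old_ticks_to_us(ticks, ticks_per_us):
    """Old approach: truncate the calibration to an integer, then divide."""
    return ticks // int(ticks_per_us)

def compute_multiplier(ticks_per_us):
    """New approach, calibration step: 2^24 / ticks_per_us, truncated."""
    return int((1 << FIXED_POINT_SHIFT) / ticks_per_us)

def new_ticks_to_us(ticks, multiplier):
    """New approach, conversion step: one multiply and one shift."""
    return (ticks * multiplier) >> FIXED_POINT_SHIFT

ticks_per_us = 2400.5
one_second_of_ticks = 2_400_500_000

old_us = old_ticks_to_us(one_second_of_ticks, ticks_per_us)
multiplier = compute_multiplier(ticks_per_us)
new_us = new_ticks_to_us(one_second_of_ticks, multiplier)
```

The remaining error in the fixed-point path comes from truncating the multiplier to an integer, which is why it is far smaller than the error from truncating the calibration itself.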

Currently, after changing cluster-replica-no-failover, we call clusterUpdateMyselfFlags to trigger CLUSTER_TODO_SAVE_CONFIG. But when the sender's NOFAILOVER flag changes in gossip, we do not trigger the save, which causes nodes.conf to miss the latest nofailover flag.

Signed-off-by: Binbin <binloveplay1314@qq.com>

Implemented CLUSTERSCAN command for topology-aware scanning

Unlike `SCAN`, which is local to a single node, `CLUSTERSCAN` provides a mechanism that helps clients iterate across slot boundaries and handles `MOVED` redirections.

**Key details**
* Global cluster iteration via `fingerprint-{hashtag}-cursor`
* Scan one slot at a time
* Start the CLUSTERSCAN with 0
* SLOT argument for parallel scanning of multiple slots
* Re-use scanGenericCommand for the response

**Cursor format:** `fingerprint-{hashtag}-localcursor`
- Fingerprint is a hash of the node's DB seed that identifies the current memory layout. On mismatch, scan restarts from cursor 0 rather than returning an error.
- Fingerprint 0 indicates a cross slot cursor (e.g., initial cursor or slot transition) where validation is skipped.
- Hashtag encodes the target slot
- Local cursor tracks position within the slot

**Usage:**
```
CLUSTERSCAN <cursor> [MATCH pattern] [COUNT count] [TYPE type] [SLOT number]
```
```
CLUSTERSCAN 0                  # Start scanning from slot 0
CLUSTERSCAN <cursor>           # Continue from cursor
CLUSTERSCAN 0 SLOT 1000        # Start scanning specific slot
CLUSTERSCAN <cursor> MATCH user:* COUNT 100
```

---------

Signed-off-by: nmvk <r@nmvk.com>
Signed-off-by: Raghav <r@nmvk.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
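The cursor rules above can be sketched as a small parser. Several things here are assumptions made for illustration and are not confirmed by the commit message: that the cursor is a text string with literal braces around the hashtag, that the fingerprint and local cursor are decimal integers, and that "restarts from cursor 0" means local cursor 0 within the slot. `parse_cursor` and `resume_position` are hypothetical names.

```python
import re

# Assumed textual layout: <fingerprint>-{<hashtag>}-<localcursor>
CURSOR_RE = re.compile(r"^(\d+)-\{([^{}]*)\}-(\d+)$")

def parse_cursor(cursor):
    """Split a cursor into (fingerprint, hashtag, local_cursor).

    Returns None for the plain initial cursor "0".
    """
    if cursor == "0":
        return None
    match = CURSOR_RE.match(cursor)
    if match is None:
        raise ValueError(f"malformed cursor: {cursor!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))

def resume_position(cursor, current_fingerprint):
    """Return the local cursor to resume from on the node serving the slot."""
    parsed = parse_cursor(cursor)
    if parsed is None:
        return 0  # initial cursor: start from the beginning
    fingerprint, _hashtag, local_cursor = parsed
    if fingerprint == 0:
        return local_cursor  # cross slot cursor: validation is skipped
    if fingerprint != current_fingerprint:
        return 0  # layout changed: restart instead of returning an error
    return local_cursor
```

Restarting on a mismatch trades some repeated keys for never failing the scan, which matches the "rather than returning an error" wording above.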

Before this change, if you built `valkey-server` (e.g. `make valkey-server`), the Lua module was not built. With this change, the Lua module is now a direct dependency of `valkey-server` (unless `BUILD_LUA=no` is passed).

Added some colors to the Lua module and Lua lib `Makefile`s so they blend nicely in the build output.

Signed-off-by: Eran Ifrah <eifrah@amazon.com>

…time_t (valkey-io#3252)

Addresses issue valkey-io#2350

As noted in the issue, much of the expire code uses raw long long for timestamps, which provides no semantic meaning about the unit or purpose of the value. Valkey already defines the mstime_t (milliseconds) and ustime_t (microseconds) typedefs. This PR replaces bare long long declarations with the appropriate typedef wherever the value represents an expiration timestamp or time duration.

This PR only fixes a small subset of the codebase, but it is an incremental step toward fully replacing the bare long long references.

---------

Signed-off-by: curious-george-rk <r.ebu@gmail.com>
Co-authored-by: curious-george-rk <r.ebu@gmail.com>

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

Now we will be able to add a `run-cluster-benchmark` label to run a benchmark against a valkey-server with cluster mode enabled.

It will use the config https://github.com/valkey-io/valkey/blob/unstable/.github/benchmark_configs/benchmark-config-arm.json, modified for cluster mode with a single cluster-mode-enabled instance of valkey.

It uses the same single instance for the benchmark as run-benchmark does. If both labels are used, the runs are sequential in the same concurrency group `group: ec2-al-2023-pr-benchmarking-arm64`.

---------

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

…ey-io#2746)

This PR introduces a new config `cluster-message-gossip-perc`, which allows an operator to modify the amount of gossip node information sent per ping/pong/meet message. It can be modified dynamically (related to valkey-io#2291).

The default value is 10%, i.e. 10% of peer node information is gossiped along with each ping/pong/meet packet. Users can tune this configuration: setting the value higher allows faster information dissemination, whereas setting it lower leads to direct PING messages if no information was received about a node within the `server.cluster_node_timeout/2` period.

Note: the behavior for partially failed gossip nodes remains intact, where all the `pfail` nodes are part of the message for faster propagation of information and faster transition of `PFAIL` to `FAIL`.

---------

Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>
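As a rough illustration of what the percentage means, the sketch below computes how many peer entries a single message would carry. It assumes plain floor arithmetic on the number of known nodes; the actual implementation may round differently or clamp the result (for example to a minimum count), and `gossip_entry_count` is a hypothetical name.

```python
def gossip_entry_count(known_nodes, gossip_perc=10):
    """Number of peer entries per ping/pong/meet message (floor of perc%).

    Illustrative assumption only: real code may apply a minimum or a cap.
    """
    return (known_nodes * gossip_perc) // 100
```

For a 100-node cluster this gives 10 entries per message at the default and 25 entries at a setting of 25, which shows the trade between message size and how quickly information spreads.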

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

…o#3336)

This fixes a flaky dual-channel replication integration test: https://github.com/valkey-io/valkey/actions/runs/22810251608/job/66165776198#step:8:7701

The `INFO memory` field `used_memory_overhead` and the `MEMORY STATS` field `overhead.total` can change during dual-channel sync if the replica's pending replication buffer is still changing. This is probably more visible in slower environments.

The test now collects `INFO` and `MEMORY STATS` in a single `MULTI/EXEC` on both the primary and the replica, so the compared values come from the same snapshot.

Passing here: https://github.com/sarthakaggarwal97/valkey/actions/runs/22864585326/job/66327772967

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

harrylin98 and others added 9 commits January 14, 2026 12:28

Blocked_inuse

329ac61

Combine the unblocked client lists

28aa8c8

Merge branch 'valkey-io:unstable' into blockedInUse

4160328

Merge remote-tracking branch 'origin/unstable' into feature-gtest

30ef060

Format code

476c5c8

github-actions Bot assigned harrylin98 Feb 19, 2026

Nikhil-Manglore and others added 17 commits February 19, 2026 20:34

Merge branch 'unstable' into Feb20_final_merge

4aff5f9

Signed-off-by: Harry Lin <harrylhl@amazon.com>

Fix for [CVE-2026-21863] Remote DoS with malformed Valkey Cluster bus message

61f990f

Signed-off-by: Roshan Khatri <rvkhatri@amazon.com> Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>

Fix for [CVE-2025-67733] RESP Protocol Injection via Lua error_reply

0d713a4

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>

Merge branch 'unstable' into Feb20_final_merge

cec912c

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

Reset request type after handling empty requests

2565d44

Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>

Merge branch 'unstable' into Feb20_final_merge

e9b10e5

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

harrylin98 force-pushed the blockedInUse branch 2 times, most recently from a6666f2 to 0e42a61 Compare February 24, 2026 22:19

Fix flake dual-channel-replication primary gets cob overrun before established psync test (valkey-io#3242)

4dee322

Fix valkey-io#3231: Increase wait timeout for RDB load log detection in test

Signed-off-by: Hanxi Zhang <hanxizh@amazon.com>

rainsupreme and others added 9 commits March 3, 2026 15:56

Fix compatibility issue for death test with Valgrind (valkey-io#3301)

896e9c8

harrylin98 force-pushed the blockedInUse branch from 1d5e104 to 00965ea Compare March 5, 2026 23:31

bjosv and others added 3 commits March 6, 2026 12:37

Recommend passwordless users for mTLS certificate-based authentication (valkey-io#3311)

acb0b9b

Closes valkey-io#3286

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>

harrylin98 force-pushed the blockedInUse branch 2 times, most recently from dc1510e to 829f5a6 Compare March 7, 2026 04:48

enjoy-binbin and others added 3 commits March 8, 2026 00:01

harrylin98 force-pushed the blockedInUse branch from 8f07cba to e63d25a Compare March 9, 2026 18:01

curious-george-rk and others added 2 commits March 9, 2026 20:03

Add connections check in unit test

b112f8a

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

harrylin98 force-pushed the blockedInUse branch 2 times, most recently from c5682fb to a48c3dd Compare March 9, 2026 19:22

roshkhatri and others added 3 commits March 9, 2026 22:48

Fixing memory leak in unit test

8dee168

Signed-off-by: harrylin98 <harrylin980107@gmail.com>

harrylin98 force-pushed the blockedInUse branch from a48c3dd to 8dee168 Compare March 9, 2026 22:12

harrylin98 and others added 3 commits March 9, 2026 15:29

Merge branch 'valkey-io:unstable' into blockedInUse

cddeeb2

Merge branch 'valkey-io:unstable' into blockedInUse

9dcc5a3

Blocked in use #17
harrylin98 wants to merge 72 commits into JimB123:jims-forkless from harrylin98:blockedInUse

harrylin98 commented Feb 19, 2026

20 participants
