valkey-flash

A Valkey module that tiers key/value data to NVMe storage, letting Valkey serve a working set larger than RAM. Hot entries live in an in-memory cache; cold entries reside on NVMe and are promoted back on read. The NVMe I/O path uses io_uring and runs on background threads — the Valkey event loop never blocks on disk.

valkey-flash ships full cluster support, durable writes via a per-record WAL, and native RDB + AOF persistence. Strings, hashes, lists, and sorted sets are all supported as tiered types in v1.0.0 — see the command reference below and CHANGELOG for details.

Quick start

# Run the Docker image from a checkout of this repository, so that the relative
# path docker/seccomp-flash.json resolves (requires Linux kernel ≥5.6 for io_uring)
docker run --rm -p 6379:6379 \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

# From another shell
valkey-cli FLASH.SET hello world
valkey-cli FLASH.GET hello
# → "world"
valkey-cli INFO flash

See docs/QUICK_START.md for a hands-on walkthrough including cluster mode.

Why valkey-flash

The practical ceiling on RAM-only Valkey deployments is either cost (RAM is expensive) or physical memory per node. valkey-flash extends the addressable working set by tiering colder data to NVMe, trading a microsecond-scale RAM access for a ~100 µs NVMe read on cold-path fetches. Hot-path reads (in-cache) stay RAM-speed.

Use cases where valkey-flash fits well:

  • Session stores with long-tailed access patterns where 10–20% of sessions drive 80% of traffic
  • Feature stores serving ML inference where the working set is orders of magnitude larger than RAM but per-request latency budget is generous
  • Content caches where LRU eviction would thrash an under-sized cache — tiering keeps cold items addressable instead of evicting them
  • User-state stores during reconnection storms where rehydration from upstream is expensive

It is not the right fit if every request is latency-critical at tail percentiles and the working set comfortably fits in RAM — native Valkey is cheaper and simpler.
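
To get a rough feel for the trade-off described above, the blended read latency can be estimated from the hot-tier hit ratio. This is only an illustrative sketch: the 1 µs RAM figure is an assumed stand-in for "microsecond-scale", the 100 µs NVMe figure is the approximate cold-path cost quoted above, and the function name is hypothetical rather than part of valkey-flash.

```python
def expected_read_latency_us(hit_ratio: float,
                             ram_us: float = 1.0,      # assumed stand-in for "microsecond-scale"
                             nvme_us: float = 100.0    # ~100 µs cold-path read, as quoted above
                             ) -> float:
    """Mean read latency for a given hot-tier hit ratio (simple weighted average)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * ram_us + (1.0 - hit_ratio) * nvme_us


if __name__ == "__main__":
    for ratio in (0.80, 0.95, 0.99):
        latency = expected_read_latency_us(ratio)
        print(f"hit ratio {ratio:.2f}: ~{latency:.1f} µs mean read latency")
```

The mean hides the tail: every cold read still pays the full NVMe cost, which is why tail-sensitive workloads that fit in RAM are better served by native Valkey.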

Will it help my workload? — flash-sizer

Before installing anything, point flash-sizer at an existing Valkey instance to find out. The tool is read-only (it issues only SCAN, MEMORY USAGE, and OBJECT IDLETIME), samples up to 100 000 keys, and prints a Markdown report with a 95% Wilson-score confidence interval on the fraction of your working set that is cold enough to tier.

uvx valkey-flash-sizer valkey://my-host:6379

No install, no write path, no phone-home. Sample output: tests/golden/report.md.
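
For readers unfamiliar with the Wilson-score interval mentioned above, here is a minimal sketch of the textbook formula. It is not taken from flash-sizer's source; the function name and the example counts are illustrative assumptions.

```python
import math


def wilson_interval(cold: int, sampled: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion cold/sampled (z=1.96 is roughly 95%)."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    if not 0 <= cold <= sampled:
        raise ValueError("cold must be between 0 and sampled")
    p = cold / sampled
    z2 = z * z
    denom = 1.0 + z2 / sampled
    centre = (p + z2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z2 / (4 * sampled * sampled)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical sample: 62 000 of 100 000 sampled keys look cold.
    low, high = wilson_interval(62_000, 100_000)
    print(f"cold fraction: {low:.4f} to {high:.4f}")
```

Unlike the simpler normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed fraction is close to 0 or 1.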

Installation

Pre-built binaries

Releases publish .so artifacts for linux-x86_64 and linux-aarch64. Download from GitHub Releases and load with:

valkey-server --loadmodule /path/to/libvalkey_flash.so \
              flash.path /data/flash.bin \
              flash.capacity-bytes 10737418240

Docker / Podman / Kubernetes

Multi-arch images are published to GHCR at ghcr.io/mbocevski/valkey-flash. See Running in containers below for the seccomp requirement and platform-specific notes.

Build from source

# Full pipeline: fmt check, clippy, unit tests, integration tests
SERVER_VERSION=unstable ./build.sh

# Module .so only
cargo build --release
# → target/release/libvalkey_flash.so

Building requires Rust ≥1.90 locally (check with cargo --version); the crate uses edition 2024, and the Dockerfile pins rust:1.95.0-trixie.

Commands

All FLASH.* commands are opt-in per key — existing native Valkey data types are unchanged.

Strings (FlashString)

| Command | Purpose |
| --- | --- |
| FLASH.SET key value [EX\|PX\|EXAT\|PXAT ...] [NX\|XX] | Set value, optional TTL and conditional flags |
| FLASH.GET key | Get value (nil if missing/expired) |
| FLASH.DEL key [key ...] | Delete one or more keys; returns count of deleted |

Hashes (FlashHash)

| Command | Purpose |
| --- | --- |
| FLASH.HSET key field value [field value ...] [EX\|PX\|EXAT\|PXAT ...] [KEEPTTL] | Set fields with optional TTL on the key |
| FLASH.HGET key field | Get field value |
| FLASH.HGETALL key | Return all field-value pairs |
| FLASH.HDEL key field [field ...] | Delete fields; empty hash deletes the key |
| FLASH.HEXISTS key field | 0 or 1 |
| FLASH.HLEN key | Number of fields |

Lists (FlashList)

| Command | Purpose |
| --- | --- |
| FLASH.LPUSH key value [value ...] / FLASH.RPUSH key value [value ...] | Push values to the head / tail |
| FLASH.LPUSHX key value [value ...] / FLASH.RPUSHX key value [value ...] | Push only if the key exists |
| FLASH.LPOP key [count] / FLASH.RPOP key [count] | Pop one or more values from the head / tail |
| FLASH.LLEN key | Number of elements |
| FLASH.LRANGE key start stop | Slice (inclusive, supports negative indices) |
| FLASH.LINDEX key index | Element at index |
| FLASH.LSET key index value | Overwrite element at index |
| FLASH.LINSERT key BEFORE\|AFTER pivot value | Insert relative to pivot |
| FLASH.LREM key count value | Remove occurrences |
| FLASH.LTRIM key start stop | Keep only [start, stop] slice |
| FLASH.LMOVE src dst LEFT\|RIGHT LEFT\|RIGHT | Atomic move between lists |
| FLASH.RPOPLPUSH src dst | Shorthand for LMOVE RIGHT LEFT |
| FLASH.BLPOP key [key ...] timeout / FLASH.BRPOP ... timeout | Blocking pop; 0 = block indefinitely |
| FLASH.BLMOVE src dst LEFT\|RIGHT LEFT\|RIGHT timeout | Blocking variant of LMOVE |

Sorted sets (FlashZSet)

| Command | Purpose |
| --- | --- |
| FLASH.ZADD key [NX\|XX] [GT\|LT] [CH] [INCR] score member [score member ...] | Insert or update members |
| FLASH.ZREM key member [member ...] | Remove members |
| FLASH.ZINCRBY key increment member | Increment a member's score (NaN-guarded) |
| FLASH.ZPOPMIN key [count] / FLASH.ZPOPMAX key [count] | Pop lowest / highest-scored members |
| FLASH.BZPOPMIN key [key ...] timeout / FLASH.BZPOPMAX ... timeout | Blocking pop variants |
| FLASH.ZSCORE key member | Member's score (nil if absent) |
| FLASH.ZRANK key member [WITHSCORE] / FLASH.ZREVRANK key member [WITHSCORE] | Rank by score ascending / descending |
| FLASH.ZCARD key | Number of members |
| FLASH.ZCOUNT key min max / FLASH.ZLEXCOUNT key min max | Count by score / lex range |
| FLASH.ZRANGE key start stop [BYSCORE\|BYLEX] [REV] [LIMIT offset count] [WITHSCORES] | Unified range query |
| FLASH.ZRANGEBYSCORE / FLASH.ZREVRANGEBYSCORE / FLASH.ZRANGEBYLEX / FLASH.ZREVRANGEBYLEX | Legacy range aliases |
| FLASH.ZSCAN key cursor [MATCH pattern] [COUNT count] | Incremental iteration |
| FLASH.ZUNIONSTORE / FLASH.ZINTERSTORE / FLASH.ZDIFFSTORE / FLASH.ZRANGESTORE | Set arithmetic with WEIGHTS / AGGREGATE SUM\|MIN\|MAX |

Admin / debug

| Command | Purpose |
| --- | --- |
| FLASH.DEBUG.STATE | Return module state (recovering, ready, error) |
| FLASH.DEBUG.DEMOTE key | Force a Hot→Cold demotion (test-only; @admin @dangerous) |
| FLASH.COMPACTION.TRIGGER | Manually run NVMe compaction (test-only) |
| FLASH.COMPACTION.STATS | Current free-list state |
| FLASH.MIGRATE.PROBE [host port] | Query local or remote node state, capacity, path |
| FLASH.MIGRATE | Extended MIGRATE hook for FLASH.* keys: bundles DUMP/RESTORE with tier state, capacity-probe gated |
| FLASH.CONVERT key | Convert one FLASH.* key to its native Valkey counterpart in-place, preserving TTL. Prerequisite for MODULE UNLOAD flash |
| FLASH.DRAIN [MATCH pat] [COUNT n] [FORCE] | Scan the keyspace and FLASH.CONVERT every matching FLASH.* key. Reply array [converted, skipped, errors, scanned]. See Unloading the module |

Configuration

Module arguments take the form flash.<knob> <value> and are appended after the module path on the valkey-server --loadmodule line.

| Knob | Default | Mutable at runtime | Description |
| --- | --- | --- | --- |
| flash.path | /tmp/valkey-flash.bin | No | NVMe backing file path |
| flash.capacity-bytes | 1073741824 (1 GiB) | No | Backing file size |
| flash.cache-size-bytes | 268435456 (256 MiB) | Yes | Hot tier RAM cap |
| flash.sync | everysec | Yes | WAL fsync mode: always, everysec, no |
| flash.io-threads | num_cpus() | No | Async I/O worker pool size |
| flash.io-uring-entries | 256 | No | io_uring submission queue depth per worker |
| flash.compaction-interval-sec | 60 | Yes | NVMe compaction cadence |
| flash.demotion-batch | 0 (auto: io-threads / 2, min 1) | Yes | Max demotions submitted per tick. Keep below io-threads × 4 (pool queue depth) so client write-through always has headroom |
| flash.demotion-max-inflight | 0 (auto: io-threads × 2, min 2) | Yes | Cap on outstanding demotion submits. Bounds transient NVMe footprint and pool contention across ticks |
| flash.replica-tier-enabled | no | No | Opt-in symmetric tiering on replicas (cluster recommendation: yes) |
| flash.cluster-mode-enabled | auto | No | Cluster-mode detection: auto, yes, or no |
| flash.migration-bandwidth-mbps | 100 (0 = unlimited) | Yes | Slot-migration rate cap |
| flash.migration-max-key-bytes | 67108864 (64 MiB) | No | Per-key pre-warm cap; larger keys migrate via NVMe-read path |
| flash.migration-chunk-timeout-sec | 30 | Yes | Per-chunk migration timeout |
| flash.migration-probe-cache-sec | 60 | No | FLASH.MIGRATE.PROBE result cache TTL |

Runtime changes for mutable knobs use the standard CONFIG SET interface:

valkey-cli CONFIG SET flash.cache-size-bytes 2147483648

Immutable knobs reject CONFIG SET with a clear error identifying the restart requirement.
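
The "auto" defaults for flash.demotion-batch and flash.demotion-max-inflight can be worked out by hand when sizing a node. The sketch below mirrors the rules stated in the knob table; the function names are hypothetical, and treating "io-threads / 2" as integer division is an assumption, since the table does not say how odd thread counts are rounded.

```python
def auto_demotion_batch(io_threads: int, configured: int = 0) -> int:
    """0 means auto: io-threads / 2 with a floor of 1 (integer division is assumed)."""
    if configured != 0:
        return configured
    return max(1, io_threads // 2)


def auto_demotion_max_inflight(io_threads: int, configured: int = 0) -> int:
    """0 means auto: io-threads × 2 with a floor of 2."""
    if configured != 0:
        return configured
    return max(2, io_threads * 2)


def batch_leaves_headroom(io_threads: int, demotion_batch: int) -> bool:
    """True when the batch stays below io-threads × 4, the pool queue depth named in the table."""
    return demotion_batch < io_threads * 4


if __name__ == "__main__":
    for threads in (1, 4, 16):
        batch = auto_demotion_batch(threads)
        inflight = auto_demotion_max_inflight(threads)
        print(f"io-threads={threads}: batch={batch}, max-inflight={inflight}, "
              f"headroom={batch_leaves_headroom(threads, batch)}")
```

Under these rules the auto batch always stays below the io-threads × 4 bound, so the headroom check matters mainly when the knob is set by hand.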

Deployment

Unloading the module

MODULE UNLOAD flash is refused by Valkey while any FLASH.* custom-type keys exist. To unload cleanly, convert the flash tier back to native types first with FLASH.DRAIN:

> FLASH.DRAIN
1) (integer) 1234     # converted
2) (integer) 0        # skipped (non-flash / already native)
3) (integer) 0        # errors
4) (integer) 1234     # scanned
> MODULE UNLOAD flash
OK

After FLASH.DRAIN, each former FLASH.* key is a native Valkey value with its original name and TTL. The AOF and replication stream only ever see plain native commands (DEL / SET / HSET / RPUSH / ZADD / PEXPIREAT), so AOF replay and replica state stay module-independent.

Draining a large flash tier

A naive FLASH.DRAIN materialises every Cold-tier value into RAM before writing it as a native key. If the NVMe tier is larger than free RAM this will OOM. The default guard refuses the command when used_memory + storage_used_bytes > maxmemory:

> FLASH.DRAIN
(error) ERR FLASH.DRAIN would exceed maxmemory (used_memory=... + projected_cold=... > maxmemory=...); pass FORCE to override

Recover with one of:

  • FLASH.DRAIN COUNT <n>: chunk the work. Each invocation converts up to n keys, so you can loop externally:

    while true; do
        N=$(valkey-cli FLASH.DRAIN COUNT 1000 | head -1)
        # Stop when nothing was converted, or when the reply is not a positive count (e.g. an error).
        [[ "$N" =~ ^[1-9][0-9]*$ ]] || break
    done
    
  • FLASH.DRAIN MATCH <pat>: drain a subset first. Useful for draining in priority order, or for targeting only one key-prefix per host.

  • FLASH.DRAIN FORCE: bypass the guard. Use only when you know the materialised working set fits in RAM, or when maxmemory is not configured (maxmemory=0) but you are confident the data fits.
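
The chunked approach from the first option can also be driven from a client library. The following is a minimal sketch, not project code: drain_in_chunks and its parameters are hypothetical names, and run_command stands for any callable that sends a command and returns the four-integer reply [converted, skipped, errors, scanned], for example the execute_command method of a valkey-py client (an assumption about your client, not something the module requires).

```python
from typing import Callable, Sequence


def drain_in_chunks(run_command: Callable[..., Sequence[int]],
                    chunk: int = 1000,
                    max_rounds: int = 1_000_000) -> dict:
    """Call FLASH.DRAIN COUNT <chunk> until a round converts nothing.

    Returns the summed [converted, skipped, errors, scanned] counters.
    max_rounds is a safety cap so a misbehaving server cannot loop forever.
    """
    totals = {"converted": 0, "skipped": 0, "errors": 0, "scanned": 0}
    for _ in range(max_rounds):
        converted, skipped, errors, scanned = run_command("FLASH.DRAIN", "COUNT", chunk)
        totals["converted"] += converted
        totals["skipped"] += skipped
        totals["errors"] += errors
        totals["scanned"] += scanned
        if converted == 0:
            break
    return totals
```

With a client that raises an exception on error replies, a refusal from the maxmemory guard ends the loop immediately instead of spinning. Inspect the returned errors total before proceeding to MODULE UNLOAD.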

Progress after each invocation is visible in INFO flash:

flash_drain_in_progress:no
flash_drain_last_converted:1000
flash_drain_last_skipped:0
flash_drain_last_errors:0
flash_drain_last_scanned:1000
flash_convert_total:42000
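
If you want to check these fields from a script, the key:value lines are simple to parse. This is a hedged sketch: parse_info and drain_had_errors are hypothetical helper names, and the sample text simply repeats the fields shown above.

```python
def parse_info(text: str) -> dict[str, str]:
    """Parse 'key:value' lines from INFO output, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def drain_had_errors(info_text: str) -> bool:
    """True when the most recent FLASH.DRAIN reported at least one failed conversion."""
    return int(parse_info(info_text).get("flash_drain_last_errors", "0")) > 0


SAMPLE = """\
flash_drain_in_progress:no
flash_drain_last_converted:1000
flash_drain_last_skipped:0
flash_drain_last_errors:0
flash_drain_last_scanned:1000
flash_convert_total:42000
"""

if __name__ == "__main__":
    print(parse_info(SAMPLE)["flash_convert_total"])
    print(drain_had_errors(SAMPLE))
```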

Cluster-wide drain

In a cluster, FLASH.DRAIN runs per-node on the connected primary. Each primary's sub-calls replicate to its replicas as native commands, so after the primary returns you can MODULE UNLOAD on both primary and replicas once they've caught up:

  1. FLASH.DRAIN on every primary (loop or parallelise).
  2. Wait for replication offsets to catch up (INFO replication).
  3. MODULE UNLOAD flash on each replica, then on each primary.
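
Step 2 above can be automated by comparing offsets in the primary's INFO replication output. The sketch below is an assumption-laden illustration: replicas_caught_up is a hypothetical helper, and it assumes the primary reports master_repl_offset:<n> plus one slaveN: (or replicaN:) line per replica containing offset=<n>. Verify that layout against your Valkey version before relying on it.

```python
import re


def replicas_caught_up(info_replication: str) -> bool:
    """True when every listed replica reports an offset >= the primary's offset.

    A replica line without an offset field (e.g. still syncing) counts as behind.
    With no replica lines at all, the result is True.
    """
    primary_offset = None
    replica_offsets = []
    for line in info_replication.splitlines():
        line = line.strip()
        if line.startswith("master_repl_offset:"):
            primary_offset = int(line.split(":", 1)[1])
        elif re.match(r"^(slave|replica)\d+:", line):
            match = re.search(r"offset=(\d+)", line)
            replica_offsets.append(int(match.group(1)) if match else -1)
    if primary_offset is None:
        raise ValueError("master_repl_offset not found in INFO replication output")
    return all(offset >= primary_offset for offset in replica_offsets)
```

Poll this on each primary after its drain has returned and while writes are paused, since the primary's offset keeps advancing under traffic. Run MODULE UNLOAD flash on the replicas only once it returns True.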

If you unload asymmetrically (primary keeps the module, replicas drop it), new FLASH.* writes on the primary can't be applied on a replica that no longer knows the custom type — the replication stream will stall. Unload everywhere, or not at all.

Post-drain caveat

FLASH.CONVERT is not transactional. After the internal DEL of the flash key, if the native create sub-call fails (e.g. the server hits maxmemory mid-convert) the key is lost. The deny-oom flag causes Valkey to refuse the command up front when maxmemory is already exceeded, which is the dominant failure mode. When running FORCE on a tightly-sized node, watch for flash_drain_last_errors > 0 in INFO.

Running in containers

Why io_uring requires a seccomp override

valkey-flash's NVMe I/O path uses io_uring (syscalls io_uring_setup, io_uring_enter, io_uring_register). The default seccomp profiles shipped by Docker and Podman block these syscalls. A plain docker run will fail at module load with an io_uring setup error unless you override the seccomp policy.

Kernel requirement: Linux ≥5.6. Earlier kernels lack the required io_uring APIs entirely.

Seccomp profile

docker/seccomp-flash.json is the recommended profile. It is Docker's default syscall allowlist extended with only the three io_uring syscalls (io_uring_setup, io_uring_enter, io_uring_register; the syscalls themselves date from kernel 5.1, though the module needs ≥5.6). All other restrictions from the default profile remain in place.

For quick-start / CI, --security-opt seccomp=unconfined also works but removes all syscall filtering.
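
To illustrate what "the default allowlist plus three syscalls" means in practice, here is a sketch of how such a profile could be derived from a copy of Docker's default profile JSON. It is not how docker/seccomp-flash.json was necessarily produced, and the shipped file may be laid out differently. extend_profile is a hypothetical helper that relies only on the standard Docker seccomp schema (a top-level syscalls list whose entries carry names and action).

```python
import copy

IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]


def extend_profile(default_profile: dict) -> dict:
    """Return a copy of a Docker-style seccomp profile that also allows io_uring.

    The input is left untouched; every existing rule is carried over as-is.
    """
    profile = copy.deepcopy(default_profile)
    profile.setdefault("syscalls", []).append({
        "names": list(IO_URING_SYSCALLS),
        "action": "SCMP_ACT_ALLOW",
    })
    return profile


if __name__ == "__main__":
    import json

    # Tiny stand-in for Docker's default profile, for demonstration only.
    base = {"defaultAction": "SCMP_ACT_ERRNO",
            "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}]}
    print(json.dumps(extend_profile(base), indent=2))
```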

Docker

With the custom profile (recommended):

docker run --rm \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash \
  -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

Docker Compose — the bundled docker/compose.single.yml already uses the profile:

security_opt:
  - seccomp:./seccomp-flash.json

Cluster Compose: docker/compose.cluster.yml gives each of the six nodes (three primaries, three replicas) a separate named volume, satisfying the requirement that every flash-tier node has a unique flash.path. If you override FLASH_PATH, ensure each container maps to a different host path or volume — two nodes sharing the same file will silently corrupt each other's NVMe tier.

To revert to unconfined for quick iteration, overlay with the dev override:

docker compose -f docker/compose.single.yml -f docker/compose.single.dev.yml up

Podman

Rootful Podman uses the same flag:

sudo podman run --rm \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash \
  -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

Rootless Podman — same flag, with additional caveats:

  • Kernel <5.11: the kernel blocks io_uring inside user namespaces; upgrade to ≥5.11 for rootless io_uring support.
  • SELinux (enforcing): add --security-opt label=disable, or write a policy allowing io_uring from the container's label.
  • AppArmor: if the default AppArmor profile is loaded, also pass --security-opt apparmor=unconfined.
  • systemd user units with NoNewPrivileges=yes: override with NoNewPrivileges=no in the unit's [Service] section.
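
The kernel thresholds mentioned in this section (≥5.6 for io_uring, ≥5.11 for rootless containers) can be checked up front on a host. A minimal sketch, with kernel_at_least as a hypothetical helper name:

```python
import platform
import re


def kernel_at_least(release: str, minimum: tuple[int, int]) -> bool:
    """Compare the major.minor prefix of a kernel release string with a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    release = platform.release()  # kernel release string of the current host
    print("io_uring baseline (>=5.6):", kernel_at_least(release, (5, 6)))
    print("rootless containers (>=5.11):", kernel_at_least(release, (5, 11)))
```

A version check is only a first filter: seccomp, SELinux, AppArmor, or distribution settings can still block io_uring on a sufficiently new kernel.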

podman-compose — same security_opt syntax as Docker Compose.

Kubernetes

Copy docker/seccomp-flash.json to each node's seccomp profile directory (typically /var/lib/kubelet/seccomp/profiles/) and reference it:

# recommended for production
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/seccomp-flash.json

# dev / staging only (use instead of the block above, not alongside it)
securityContext:
  seccompProfile:
    type: Unconfined

Pod Security Standards note: the restricted profile mandates seccompProfile.type: RuntimeDefault or Localhost. The Localhost + seccomp-flash.json approach satisfies the restricted standard once the profile file is deployed to nodes.

Compatibility

| Component | Versions |
| --- | --- |
| Valkey | unstable, 8.1, 9.0 |
| Rust (build) | ≥1.90 (edition 2024) |
| Linux kernel | ≥5.6 (io_uring), ≥5.11 recommended for rootless containers |
| Platforms (shipped binaries) | linux-x86_64, linux-aarch64 |
| Clients | Any RESP-compliant client; cluster-mode required for cluster deployments (tested against valkey-py) |

Stability and versioning

valkey-flash follows strict Semantic Versioning. Starting at v1.0.0:

  • All FLASH.* command names, argument shapes, and reply formats are stable
  • All flash.* configuration knobs and their value semantics are stable
  • RDB and AOF on-disk formats are stable (encoding_version byte supports forward-compatible additions)
  • WAL on-disk format is stable

Breaking changes to any of the above require a major version bump (v2.0.0). Additive changes (new commands, new optional args, new config knobs with safe defaults) land in minor releases.

Stretch v1.x additions planned for follow-on minor releases (not breaking): full-Rust ASAN instrumentation (CI-level), import-side migration byte tracking, chunked streaming for keys larger than flash.migration-max-key-bytes.

Documentation index

| Document | Audience |
| --- | --- |
| docs/ARCHITECTURE.md | Implementers / contributors: design consolidation across 11 spec decisions |
| docs/cluster.md | Operators: deployment, sizing, failover, troubleshooting |
| docs/cluster-migration-runbook.md | Operators: step-by-step live resharding |
| docs/docker-tests.md | Developers: local Docker stacks + tests |
| docs/ci.md | Developers: CI workflows and local reproduction |
| CHANGELOG.md | All: per-release change log (Keep a Changelog format) |
| SECURITY.md | Security researchers: vulnerability disclosure via GitHub Security Advisories |
| CONTRIBUTING.md | Contributors: workflow and Conventional Commits |
| CODE_OF_CONDUCT.md | Contributors: community standards |

Contributing

Pull requests welcome. See CONTRIBUTING.md for the workflow. All commits use Conventional Commits; the release process is automated from the tag history.

Local test loop:

./build.sh               # fmt + clippy + unit + integration tests
cargo llvm-cov --html    # coverage report
make docker-test-single  # integration tests against a Docker stack
make docker-test-cluster # 3-primary + 3-replica Compose stack

Security

Report vulnerabilities through GitHub Security Advisories — see SECURITY.md for the full policy.

License

BSD-3-Clause
