valkey-flash

A Valkey module that tiers key/value data to NVMe storage, letting Valkey serve a working set larger than RAM. Hot entries live in an in-memory cache; cold entries reside on NVMe and are promoted back on read. The NVMe I/O path uses io_uring and runs on background threads — the Valkey event loop never blocks on disk.

valkey-flash ships full cluster support, durable writes via a per-record WAL, and native RDB + AOF persistence. Strings, hashes, lists, and sorted sets are all supported as tiered types in v1.0.0 — see the command reference below and CHANGELOG for details.

Quick start

# Run the docker image (requires Linux kernel ≥5.6 for io_uring)
docker run --rm -p 6379:6379 \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

# From another shell
valkey-cli FLASH.SET hello world
valkey-cli FLASH.GET hello
# → "world"
valkey-cli INFO flash

See docs/QUICK_START.md for a hands-on walkthrough including cluster mode.

Why valkey-flash

The practical ceiling on RAM-only Valkey deployments is either cost (RAM is expensive) or physical memory per node. valkey-flash extends the addressable working set by tiering colder data to NVMe, trading a microsecond-scale RAM access for a ~100 µs NVMe read on cold-path fetches. Hot-path reads (in-cache) stay RAM-speed.
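As a rough illustration of that trade-off, average read latency can be estimated from the cache hit ratio. This is a back-of-the-envelope model, not a benchmark: only the ~100 µs cold-read figure comes from the paragraph above, while the 1 µs hot-path figure and the hit ratios are illustrative assumptions.

```python
def expected_read_latency_us(hit_ratio: float,
                             hot_us: float = 1.0,     # assumed hot-path latency (illustrative)
                             cold_us: float = 100.0   # ~100 µs NVMe read, per the text above
                             ) -> float:
    """Weighted average of hot-path and cold-path read latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hot_us + (1.0 - hit_ratio) * cold_us


if __name__ == "__main__":
    for ratio in (0.80, 0.95, 0.99):
        print(f"hit ratio {ratio:.0%}: {expected_read_latency_us(ratio):.2f} µs average")
```

The average hides the tail: every cold read still pays the full NVMe cost, which is why the latency-budget caveats below matter.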

Use cases where valkey-flash fits well:

Session stores with long-tailed access patterns where 10–20% of sessions drive 80% of traffic
Feature stores serving ML inference where the working set is orders of magnitude larger than RAM but per-request latency budget is generous
Content caches where LRU eviction would thrash an under-sized cache — tiering keeps cold items addressable instead of evicting them
User-state stores during reconnection storms where rehydration from upstream is expensive

It is not the right fit if every request is latency-critical at tail percentiles and the working set comfortably fits in RAM — native Valkey is cheaper and simpler.

Will it help my workload? — `flash-sizer`

Before installing anything, point flash-sizer at an existing Valkey and find out. The tool is read-only (only SCAN + MEMORY USAGE + OBJECT IDLETIME), samples up to 100 000 keys, and prints a Markdown report with a 95 % Wilson-score confidence interval on the fraction of your working set that's cold enough to tier.

uvx valkey-flash-sizer valkey://my-host:6379

No install, no write path, no phone-home. Sample output: tests/golden/report.md.
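For readers unfamiliar with the Wilson-score interval mentioned above, here is a minimal sketch of the standard formula. It is not flash-sizer's own code; the function name and the sample counts are made up for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.959964 gives ~95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical sample: 62 000 of 100 000 sampled keys look cold.
    low, high = wilson_interval(62_000, 100_000)
    print(f"cold fraction: {low:.4f} .. {high:.4f}")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly when the observed fraction is close to 0 or 1.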

Installation

Pre-built binaries

Releases publish .so artifacts for linux-x86_64 and linux-aarch64. Download from GitHub Releases and load with:

valkey-server --loadmodule /path/to/libvalkey_flash.so \
              flash.path /data/flash.bin \
              flash.capacity-bytes 10737418240

Docker / Podman / Kubernetes

Multi-arch images are published to GHCR at ghcr.io/mbocevski/valkey-flash. See Running in containers below for the seccomp requirement and platform-specific notes.

Build from source

# Full pipeline: fmt check, clippy, unit tests, integration tests
SERVER_VERSION=unstable ./build.sh

# Module .so only
cargo build --release
# → target/release/libvalkey_flash.so

Building requires a Rust toolchain: the Dockerfile pins rust:1.95.0-trixie, and a local cargo --version should report ≥1.90 for edition-2024 support.

Commands

All FLASH.* commands are opt-in per key — existing native Valkey data types are unchanged.

Strings (`FlashString`)

Command	Purpose
`FLASH.SET key value [EX\|PX\|EXAT\|PXAT ...] [NX\|XX]`	Set value, optional TTL and conditional flags
`FLASH.GET key`	Get value (nil if missing/expired)
`FLASH.DEL key [key ...]`	Delete one or more keys; returns count of deleted
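The FLASH.SET argument shape in the table can be assembled programmatically. The helper below is a hypothetical sketch (`flash_set_args` is not part of any client library); it only builds the argument list in the order the table shows, TTL option first and NX/XX second.

```python
from typing import Optional

TTL_UNITS = {"EX", "PX", "EXAT", "PXAT"}


def flash_set_args(key: str, value: str,
                   ttl_unit: Optional[str] = None, ttl: Optional[int] = None,
                   condition: Optional[str] = None) -> list:
    """Build the argument list for FLASH.SET (hypothetical helper)."""
    args = ["FLASH.SET", key, value]
    if ttl_unit is not None:
        if ttl_unit not in TTL_UNITS or ttl is None:
            raise ValueError("ttl_unit must be EX, PX, EXAT or PXAT and needs a ttl value")
        args += [ttl_unit, str(ttl)]
    if condition is not None:
        if condition not in ("NX", "XX"):
            raise ValueError("condition must be NX or XX")
        args.append(condition)
    return args


if __name__ == "__main__":
    print(" ".join(flash_set_args("hello", "world", "EX", 60, "NX")))
```

The resulting list can be handed to whatever generic command call your client exposes, for example `client.execute_command(*args)` in clients that follow the redis-py interface.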

Hashes (`FlashHash`)

Command	Purpose
`FLASH.HSET key field value [field value ...] [EX\|PX\|EXAT\|PXAT ...] [KEEPTTL]`	Set fields with optional TTL on the key
`FLASH.HGET key field`	Get field value
`FLASH.HGETALL key`	Return all field-value pairs
`FLASH.HDEL key field [field ...]`	Delete fields; empty hash deletes the key
`FLASH.HEXISTS key field`	0 or 1
`FLASH.HLEN key`	Number of fields

Lists (`FlashList`)

Command	Purpose
`FLASH.LPUSH key value [value ...]` / `FLASH.RPUSH key value [value ...]`	Push values to the head / tail
`FLASH.LPUSHX key value [value ...]` / `FLASH.RPUSHX key value [value ...]`	Push only if the key exists
`FLASH.LPOP key [count]` / `FLASH.RPOP key [count]`	Pop one or more values from the head / tail
`FLASH.LLEN key`	Number of elements
`FLASH.LRANGE key start stop`	Slice (inclusive, supports negative indices)
`FLASH.LINDEX key index`	Element at index
`FLASH.LSET key index value`	Overwrite element at index
`FLASH.LINSERT key BEFORE\|AFTER pivot value`	Insert relative to pivot
`FLASH.LREM key count value`	Remove occurrences
`FLASH.LTRIM key start stop`	Keep only `[start, stop]` slice
`FLASH.LMOVE src dst LEFT\|RIGHT LEFT\|RIGHT`	Atomic move between lists
`FLASH.RPOPLPUSH src dst`	Shorthand for LMOVE RIGHT LEFT
`FLASH.BLPOP key [key ...] timeout` / `FLASH.BRPOP ... timeout`	Blocking pop; `0` = block indefinitely
`FLASH.BLMOVE src dst LEFT\|RIGHT LEFT\|RIGHT timeout`	Blocking variant of LMOVE
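The "inclusive, supports negative indices" rule for FLASH.LRANGE can be modelled on a plain Python list. This sketch assumes FLASH.LRANGE follows the same index rules as native LRANGE, which the table implies but does not spell out.

```python
def lrange(items: list, start: int, stop: int) -> list:
    """Inclusive range with negative indices counted from the tail."""
    n = len(items)
    if start < 0:
        start = max(n + start, 0)   # clamp far-negative starts to the head
    if stop < 0:
        stop = n + stop             # may stay negative, which yields an empty result
    if start > stop or start >= n:
        return []
    stop = min(stop, n - 1)         # clamp past-the-end stops to the tail
    return items[start:stop + 1]


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    print(lrange(data, 0, -1))
    print(lrange(data, -2, -1))
```

Note the difference from Python slicing: both ends are inclusive, so `0 -1` returns the whole list.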

Sorted sets (`FlashZSet`)

Command	Purpose
`FLASH.ZADD key [NX\|XX] [GT\|LT] [CH] [INCR] score member [score member ...]`	Insert or update members
`FLASH.ZREM key member [member ...]`	Remove members
`FLASH.ZINCRBY key increment member`	Increment a member's score (NaN-guarded)
`FLASH.ZPOPMIN key [count]` / `FLASH.ZPOPMAX key [count]`	Pop lowest / highest-scored members
`FLASH.BZPOPMIN key [key ...] timeout` / `FLASH.BZPOPMAX ... timeout`	Blocking pop variants
`FLASH.ZSCORE key member`	Member's score (nil if absent)
`FLASH.ZRANK key member [WITHSCORE]` / `FLASH.ZREVRANK key member [WITHSCORE]`	Rank by score ascending / descending
`FLASH.ZCARD key`	Number of members
`FLASH.ZCOUNT key min max` / `FLASH.ZLEXCOUNT key min max`	Count by score / lex range
`FLASH.ZRANGE key start stop [BYSCORE\|BYLEX] [REV] [LIMIT offset count] [WITHSCORES]`	Unified range query
`FLASH.ZRANGEBYSCORE` / `FLASH.ZREVRANGEBYSCORE` / `FLASH.ZRANGEBYLEX` / `FLASH.ZREVRANGEBYLEX`	Legacy range aliases
`FLASH.ZSCAN key cursor [MATCH pattern] [COUNT count]`	Incremental iteration
`FLASH.ZUNIONSTORE` / `FLASH.ZINTERSTORE` / `FLASH.ZDIFFSTORE` / `FLASH.ZRANGESTORE`	Set arithmetic with `WEIGHTS` / `AGGREGATE SUM\|MIN\|MAX`

Admin / debug

Command	Purpose
`FLASH.DEBUG.STATE`	Return module state (`recovering`, `ready`, `error`)
`FLASH.DEBUG.DEMOTE key`	Force a Hot→Cold demotion (test-only; `@admin @dangerous`)
`FLASH.COMPACTION.TRIGGER`	Manually run NVMe compaction (test-only)
`FLASH.COMPACTION.STATS`	Current free-list state
`FLASH.MIGRATE.PROBE [host port]`	Query local or remote node state, capacity, path
`FLASH.MIGRATE`	Extended MIGRATE hook for FLASH.* keys — bundles DUMP/RESTORE with tier state, capacity-probe gated
`FLASH.CONVERT key`	Convert one `FLASH.*` key to its native Valkey counterpart in-place, preserving TTL. Prerequisite for `MODULE UNLOAD flash`
`FLASH.DRAIN [MATCH pat] [COUNT n] [FORCE]`	Scan the keyspace and `FLASH.CONVERT` every matching `FLASH.*` key. Reply array `[converted, skipped, errors, scanned]`. See Unloading the module

Configuration

Module arguments take the form flash.<knob> <value> and are appended to the valkey-server --loadmodule line.

Knob	Default	Mutable at runtime	Description
`flash.path`	`/tmp/valkey-flash.bin`	No	NVMe backing file path
`flash.capacity-bytes`	`1073741824` (1 GiB)	No	Backing file size
`flash.cache-size-bytes`	`268435456` (256 MiB)	Yes	Hot tier RAM cap
`flash.sync`	`everysec`	Yes	WAL fsync mode: `always`, `everysec`, `no`
`flash.io-threads`	`num_cpus()`	No	Async I/O worker pool size
`flash.io-uring-entries`	`256`	No	io_uring submission queue depth per worker
`flash.compaction-interval-sec`	`60`	Yes	NVMe compaction cadence
`flash.demotion-batch`	`0` (auto: `io-threads / 2`, min 1)	Yes	Max demotions submitted per tick. Keep below `io-threads × 4` (pool queue depth) so client write-through always has headroom
`flash.demotion-max-inflight`	`0` (auto: `io-threads × 2`, min 2)	Yes	Cap on outstanding demotion submits. Bounds transient NVMe footprint and pool contention across ticks
`flash.replica-tier-enabled`	`no`	No	Opt-in symmetric tiering on replicas (cluster recommendation: `yes`)
`flash.cluster-mode-enabled`	`auto`	No	`auto`/`yes`/`no` — cluster-mode detection
`flash.migration-bandwidth-mbps`	`100` (0 = unlimited)	Yes	Slot-migration rate cap
`flash.migration-max-key-bytes`	`67108864` (64 MiB)	No	Per-key pre-warm cap; larger keys migrate via NVMe-read path
`flash.migration-chunk-timeout-sec`	`30`	Yes	Per-chunk migration timeout
`flash.migration-probe-cache-sec`	`60`	No	`FLASH.MIGRATE.PROBE` result cache TTL
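The `0` (auto) defaults for the two demotion knobs are derived from `flash.io-threads`. The sketch below restates the table's formulas; integer division for `io-threads / 2` is an assumption, and the function names are illustrative rather than taken from the module.

```python
def auto_demotion_batch(io_threads: int) -> int:
    """Auto value for flash.demotion-batch: io-threads / 2, minimum 1 (integer division assumed)."""
    return max(io_threads // 2, 1)


def auto_demotion_max_inflight(io_threads: int) -> int:
    """Auto value for flash.demotion-max-inflight: io-threads x 2, minimum 2."""
    return max(io_threads * 2, 2)


def batch_leaves_headroom(batch: int, io_threads: int) -> bool:
    """True if the batch stays below the pool queue depth (io-threads x 4)."""
    return batch < io_threads * 4


if __name__ == "__main__":
    for threads in (1, 4, 16):
        print(threads, auto_demotion_batch(threads), auto_demotion_max_inflight(threads))
```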

Runtime changes for mutable knobs use the standard CONFIG SET interface:

valkey-cli CONFIG SET flash.cache-size-bytes 2147483648

Immutable knobs reject CONFIG SET with a clear error identifying the restart requirement.
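The byte-valued knobs are easy to mistype. A small conversion helper, shown here as an illustrative sketch (`to_bytes` is a made-up name), reproduces the values used in this README.

```python
UNITS = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def to_bytes(amount: int, unit: str) -> int:
    """Convert a binary-prefixed size to bytes for the *-bytes knobs."""
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit}")
    return amount * UNITS[unit]


if __name__ == "__main__":
    # Same value as the CONFIG SET example above (2 GiB).
    print(f"CONFIG SET flash.cache-size-bytes {to_bytes(2, 'GiB')}")
```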

Deployment

Single-node deployment — the default; see the Quick start above.
Cluster deployment — sizing, unique flash.path constraint per node, replica topologies, slot migration, failover, and troubleshooting.
Migration runbook — operator step-by-step for resharding a live cluster.
Unloading the module — FLASH.DRAIN to native types, then MODULE UNLOAD.
Running in containers — Docker, Podman, Kubernetes with the io_uring seccomp profile.
Developer workflow with Docker — local Compose stacks and integration test runner.

Unloading the module

MODULE UNLOAD flash is refused by Valkey while any FLASH.* custom-type keys exist. To unload cleanly, convert the flash tier back to native types first with FLASH.DRAIN:

> FLASH.DRAIN
1) (integer) 1234     # converted
2) (integer) 0        # skipped (non-flash / already native)
3) (integer) 0        # errors
4) (integer) 1234     # scanned
> MODULE UNLOAD flash
OK

After FLASH.DRAIN, each former FLASH.* key is a native Valkey value with its original name and TTL. The AOF and replication stream only ever see plain native commands (DEL / SET / HSET / RPUSH / ZADD / PEXPIREAT), so AOF replay and replica state stay module-independent.

Draining a large flash tier

A naive FLASH.DRAIN materialises every Cold-tier value into RAM before writing it as a native key. If the NVMe tier is larger than free RAM this will OOM. The default guard refuses the command when used_memory + storage_used_bytes > maxmemory:

> FLASH.DRAIN
(error) ERR FLASH.DRAIN would exceed maxmemory (used_memory=... + projected_cold=... > maxmemory=...); pass FORCE to override
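The guard condition can be checked ahead of time from the same three numbers. This sketch restates the inequality above. It assumes a non-zero maxmemory, because how the guard treats maxmemory 0 is not described here, and the function name is illustrative.

```python
def drain_would_exceed(used_memory: int, storage_used_bytes: int, maxmemory: int) -> bool:
    """True if used_memory + storage_used_bytes > maxmemory (the documented guard)."""
    if maxmemory <= 0:
        # Not modelled: the README does not say how the guard treats maxmemory 0.
        raise ValueError("this sketch only covers a non-zero maxmemory")
    return used_memory + storage_used_bytes > maxmemory


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical node: 6 GiB in RAM, 10 GiB on NVMe, maxmemory 8 GiB.
    print(drain_would_exceed(6 * gib, 10 * gib, 8 * gib))
```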

Recover with one of:

FLASH.DRAIN COUNT <n> — chunk the work. Each invocation converts up to n keys, so you can loop externally:

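while true; do
    # The first reply element is the number of keys converted in this batch.
    N=$(valkey-cli FLASH.DRAIN COUNT 1000 | head -1)
    # Stop on an error reply or empty output instead of looping forever.
    [[ "$N" =~ ^[0-9]+$ ]] || { echo "FLASH.DRAIN failed: $N" >&2; break; }
    [[ "$N" == "0" ]] && break
done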

FLASH.DRAIN MATCH <pat> — drain a subset first. Useful for draining in priority order, or for targeting only one key-prefix per host.
FLASH.DRAIN FORCE bypasses the guard. Use it only when you know the materialised working set fits in RAM, or when maxmemory is not configured (maxmemory=0) and you are confident the host has enough free memory.
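The chunked approach can also be driven from a client library. The sketch below takes the command executor as a parameter so it stays client-agnostic; `drain_in_chunks` is a hypothetical helper, and stopping on the first reported error is a design choice of this example rather than documented module behaviour.

```python
from typing import Callable, Sequence


def drain_in_chunks(execute: Callable[..., Sequence[int]],
                    count: int = 1000, max_rounds: int = 1_000_000) -> int:
    """Call FLASH.DRAIN COUNT <count> until a round converts nothing; return total converted."""
    total = 0
    for _ in range(max_rounds):
        # Reply shape per the command reference: [converted, skipped, errors, scanned].
        converted, skipped, errors, scanned = execute("FLASH.DRAIN", "COUNT", count)
        if errors:
            raise RuntimeError(f"FLASH.DRAIN reported {errors} errors after {total} conversions")
        total += converted
        if converted == 0:
            return total
    raise RuntimeError("drain did not finish within max_rounds")
```

With a client that exposes a generic command call in the redis-py style, this could be invoked as `drain_in_chunks(client.execute_command)`. An error reply such as the maxmemory guard would then surface as that client's exception.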

Progress after each invocation is visible in INFO flash:

flash_drain_in_progress:no
flash_drain_last_converted:1000
flash_drain_last_skipped:0
flash_drain_last_errors:0
flash_drain_last_scanned:1000
flash_convert_total:42000
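For scripted monitoring, the `key:value` lines above are straightforward to parse. A minimal sketch, assuming the raw INFO text has already been fetched as a string; the `parse_info` name is illustrative.

```python
def parse_info(text: str) -> dict:
    """Parse INFO-style 'key:value' lines into a dict, skipping blanks and '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


if __name__ == "__main__":
    sample = "flash_drain_in_progress:no\nflash_drain_last_errors:0\nflash_convert_total:42000\n"
    info = parse_info(sample)
    if int(info["flash_drain_last_errors"]) > 0:
        print("last drain reported errors")
    print("converted so far:", info["flash_convert_total"])
```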

Cluster-wide drain

In a cluster, FLASH.DRAIN runs per-node on the connected primary. Each primary's sub-calls replicate to its replicas as native commands, so after the primary returns you can MODULE UNLOAD on both primary and replicas once they've caught up:

1. FLASH.DRAIN on every primary (loop or parallelise).
2. Wait for replication offsets to catch up (INFO replication).
3. MODULE UNLOAD flash on each replica, then on each primary.
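The wait-for-replication step above can be automated by comparing offsets in a primary's INFO replication output. The sketch assumes the conventional field layout (`master_repl_offset:<n>` plus one `slaveN:ip=...,offset=<n>,...` line per replica); check it against your server version before relying on it.

```python
def replicas_caught_up(info_replication: str) -> bool:
    """True if every replica's acknowledged offset has reached the primary's offset."""
    master_offset = None
    replica_offsets = []
    for line in info_replication.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and "offset=" in value:
            attrs = dict(pair.split("=", 1) for pair in value.split(","))
            replica_offsets.append(int(attrs["offset"]))
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    # With no replicas attached this returns True (nothing to wait for).
    return all(offset >= master_offset for offset in replica_offsets)
```

Poll this on each primary after its FLASH.DRAIN returns, and only proceed to MODULE UNLOAD once it reports True everywhere.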

If you unload asymmetrically (primary keeps the module, replicas drop it), new FLASH.* writes on the primary can't be applied on a replica that no longer knows the custom type — the replication stream will stall. Unload everywhere, or not at all.

Post-drain caveat

FLASH.CONVERT is not transactional. After the internal DEL of the flash key, if the native create sub-call fails (e.g. the server hits maxmemory mid-convert) the key is lost. The deny-oom flag causes Valkey to refuse the command up front when maxmemory is already exceeded, which is the dominant failure mode. When running FORCE on a tightly-sized node, watch for flash_drain_last_errors > 0 in INFO.

Running in containers

Why io_uring requires a seccomp override

valkey-flash's NVMe I/O path uses io_uring (syscalls io_uring_setup, io_uring_enter, io_uring_register). The default seccomp profiles shipped by Docker and Podman block these syscalls. A plain docker run will fail at module load with an io_uring setup error unless you override the seccomp policy.

Kernel requirement: Linux ≥5.6. Earlier kernels lack the required io_uring APIs entirely.
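A quick pre-flight check of the kernel version might look like the sketch below (the helper name is illustrative). Containers share the host kernel, so running it inside a container reports the host's version.

```python
import platform
import re


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """Compare the major.minor part of a kernel release string against a (major, minor) minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: io_uring requirement met = {kernel_at_least(release, (5, 6))}")
```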

Seccomp profile

docker/seccomp-flash.json is the recommended profile. It is Docker's default syscall allowlist extended with only the three io_uring syscalls (io_uring_setup, io_uring_enter, io_uring_register). These syscalls first appeared in kernel 5.1, but valkey-flash itself still needs ≥5.6. All other restrictions from the default profile remain in place.

For quick-start / CI, --security-opt seccomp=unconfined also works but removes all syscall filtering.
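To show what "default allowlist plus three syscalls" means in practice, the sketch below adds one allow rule to a profile in Docker's seccomp JSON format (`syscalls` entries with `names` and `action`). It illustrates the idea only; the shipped docker/seccomp-flash.json may have been produced differently, and `allow_io_uring` is a made-up name.

```python
import copy

IO_URING_SYSCALLS = ["io_uring_setup", "io_uring_enter", "io_uring_register"]


def allow_io_uring(profile: dict) -> dict:
    """Return a copy of a seccomp profile with an extra rule allowing the io_uring syscalls."""
    extended = copy.deepcopy(profile)  # leave the caller's profile untouched
    extended.setdefault("syscalls", []).append(
        {"names": list(IO_URING_SYSCALLS), "action": "SCMP_ACT_ALLOW"}
    )
    return extended


if __name__ == "__main__":
    # Tiny stand-in for a real default profile, for demonstration only.
    base = {"defaultAction": "SCMP_ACT_ERRNO",
            "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}]}
    print(allow_io_uring(base)["syscalls"][-1])
```

In real use the input would be the default profile loaded with `json.load`, and the result written back out with `json.dump`.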

Docker

With the custom profile (recommended):

docker run --rm \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash \
  -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

Docker Compose — the bundled docker/compose.single.yml already uses the profile:

security_opt:
  - seccomp:./seccomp-flash.json

Cluster Compose: docker/compose.cluster.yml gives each of the six nodes (three primaries, three replicas) a separate named volume, satisfying the requirement that every flash-tier node has a unique flash.path. If you override FLASH_PATH, ensure each container maps to a different host path or volume — two nodes sharing the same file will silently corrupt each other's NVMe tier.

To revert to unconfined for quick iteration, overlay with the dev override:

docker compose -f docker/compose.single.yml -f docker/compose.single.dev.yml up

Podman

Rootful Podman uses the same flag:

sudo podman run --rm \
  --security-opt seccomp=docker/seccomp-flash.json \
  -e FLASH_PATH=/data/flash \
  -e FLASH_CAPACITY_BYTES=1073741824 \
  -v flash-data:/data \
  ghcr.io/mbocevski/valkey-flash:1.0.0

Rootless Podman — same flag, with additional caveats:

Kernel <5.11: the kernel blocks io_uring inside user namespaces; upgrade to ≥5.11 for rootless io_uring support.
SELinux (enforcing): add --security-opt label=disable, or write a policy allowing io_uring from the container's label.
AppArmor: if the default AppArmor profile is loaded, also pass --security-opt apparmor=unconfined.
systemd user units with NoNewPrivileges=yes: override with NoNewPrivileges=no in the unit's [Service] section.

podman-compose — same security_opt syntax as Docker Compose.

Kubernetes

Copy docker/seccomp-flash.json to each node's seccomp profile directory (typically /var/lib/kubelet/seccomp/profiles/) and reference it:

# recommended for production
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/seccomp-flash.json

# dev / staging only
securityContext:
  seccompProfile:
    type: Unconfined

Pod Security Standards note: the restricted profile mandates seccompProfile.type: RuntimeDefault or Localhost. The Localhost + seccomp-flash.json approach satisfies the restricted standard once the profile file is deployed to nodes.

Compatibility

Component	Versions
Valkey	unstable, 8.1, 9.0
Rust (build)	≥1.90 (edition 2024)
Linux kernel	≥5.6 (io_uring), ≥5.11 recommended for rootless containers
Platforms (shipped binaries)	linux-x86_64, linux-aarch64
Clients	Any RESP-compliant client; a cluster-aware client is required for cluster deployments (tested against valkey-py)

Stability and versioning

valkey-flash follows strict Semantic Versioning. Starting at v1.0.0:

All FLASH.* command names, argument shapes, and reply formats are stable
All flash.* configuration knobs and their value semantics are stable
RDB and AOF on-disk formats are stable (encoding_version byte supports forward-compatible additions)
WAL on-disk format is stable

Breaking changes to any of the above require a major version bump (v2.0.0). Additive changes (new commands, new optional args, new config knobs with safe defaults) land in minor releases.

Non-breaking stretch goals planned for later v1.x minor releases: full-Rust ASAN instrumentation (CI-level), import-side migration byte tracking, chunked streaming for keys larger than flash.migration-max-key-bytes.

Documentation index

Document	Audience
docs/ARCHITECTURE.md	Implementers / contributors — design consolidation across 11 spec decisions
docs/cluster.md	Operators — deployment, sizing, failover, troubleshooting
docs/cluster-migration-runbook.md	Operators — step-by-step live resharding
docs/docker-tests.md	Developers — local Docker stacks + tests
docs/ci.md	Developers — CI workflows and local reproduction
CHANGELOG.md	All — per-release change log (Keep a Changelog format)
SECURITY.md	Security researchers — vulnerability disclosure via GitHub Security Advisories
CONTRIBUTING.md	Contributors — workflow and Conventional Commits
CODE_OF_CONDUCT.md	Contributors — community standards

Contributing

Pull requests welcome. See CONTRIBUTING.md for the workflow. All commits use Conventional Commits; the release process is automated from the tag history.

Local test loop:

./build.sh               # fmt + clippy + unit + integration tests
cargo llvm-cov --html    # coverage report
make docker-test-single  # integration tests against a Docker stack
make docker-test-cluster # 3-primary + 3-replica Compose stack

Security

Report vulnerabilities through GitHub Security Advisories — see SECURITY.md for the full policy.

License

BSD-3-Clause
