Add PegaFlow external KV cache blog post #211

Open
Alex-yang00 wants to merge 2 commits into vllm-project:main from Alex-yang00:codex/pegaflow-blog

Alex-yang00 commented May 19, 2026

Summary

This PR adds a new blog post for the vLLM x Novita AI collaboration on PegaFlow, an external KV cache service for production LLM serving.

The post covers:

  • Why moving KV cache lifetime out of the inference process improves restart behavior and failure isolation
  • How PegaFlow pools KV cache across local instances, TP ranks, and remote nodes
  • Benchmark results for startup time, local cache sharing, MLA KV deduplication, and RDMA remote reads
  • The three-level cache hierarchy: pinned DRAM, remote RDMA-accessible DRAM, and SSD
  • vLLM integration through the external KV connector interface
  • Quick-start commands and a public reference benchmark from the PegaFlow repository

Assets

Adds figures for:

  • PegaFlow architecture
  • Startup time comparison
  • Rust/Python tail-latency comparison
  • Local sharing result summary
  • Cross-node RDMA throughput
  • Cache-policy comparison

Review notes

The public PegaFlow install commands, connector configuration, P2P flags, and reference benchmark were checked against the novitalabs/pegaflow README and docs. The internal production benchmark numbers should still be confirmed by the Novita AI team before marking this PR ready for merge.

Signed-off-by: Alex-wuhu <yanglongwei06@gmail.com>
Comment thread _posts/2026-05-18-pegaflow.md Outdated
--metaserver-addr http://metaserver-host:50056
```

Connect vLLM without modifying vLLM source code:

esmeetu (Member) commented May 19, 2026

We can specify the vLLM version used.

Alex-yang00 (Author) replied

Thanks! Added a note in the quick-start section that the examples in this post use vllm>=0.20.0.

esmeetu (Member) commented May 19, 2026

LGTM! There's a small suggestion; please take a look.

Signed-off-by: Alex-wuhu <yanglongwei06@gmail.com>