[Sandbox] Curvine #438

@szbr9486

Project summary

Curvine is a high-performance distributed multi-tier cache system written in Rust. Its main purpose is to relieve the I/O and bandwidth bottlenecks of cloud storage, which are a key constraint on cloud computing performance.

Project description

Curvine is a high-performance, highly concurrent distributed caching system released under the Apache 2.0 open-source license. By leveraging caching acceleration, it provides unified path-based access for multi-cloud storage systems while maintaining POSIX compatibility, enabling cloud storage paths to be mounted to local directories for read and write operations.

Curvine is designed to close the gap between cloud storage I/O performance and the growing throughput demands of compute workloads. It offers CSI-based cloud-native access mechanisms and also supports integration through the Fluid Runtime framework.

How it works:
As a high-performance distributed caching system, Curvine primarily consists of the following components: Master, Worker, and JournalNode. The Master is responsible for managing workers and caching metadata, while the Worker handles read/write operations and management of cached data blocks. To ensure cluster stability, the Master role is instantiated across multiple nodes, with consistency among them maintained via the Raft consensus protocol. The JournalNode is tasked with synchronizing data between Master instances.
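
The division of labor between Master and Worker described above can be pictured with a minimal sketch. It is illustrative only and is not Curvine's actual API: the class names, method names, and the "fewest blocks" placement policy are all assumptions made for this example, and Raft replication and the JournalNode are left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker: stores cached data blocks keyed by block ID."""
    name: str
    blocks: dict[str, bytes] = field(default_factory=dict)

    def write_block(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


@dataclass
class Master:
    """Hypothetical master: tracks workers and which worker holds each block."""
    workers: dict[str, Worker] = field(default_factory=dict)
    block_locations: dict[str, str] = field(default_factory=dict)

    def register_worker(self, worker: Worker) -> None:
        self.workers[worker.name] = worker

    def allocate(self, block_id: str) -> Worker:
        # Assumed placement policy: pick the worker currently holding the fewest blocks.
        worker = min(self.workers.values(), key=lambda w: len(w.blocks))
        self.block_locations[block_id] = worker.name
        return worker

    def locate(self, block_id: str) -> Worker:
        # Metadata lookup only; the data itself is read from the worker.
        return self.workers[self.block_locations[block_id]]


master = Master()
for name in ("worker-1", "worker-2"):
    master.register_worker(Worker(name))

# Write path: the master chooses a worker, the worker stores the block.
master.allocate("block-a").write_block("block-a", b"hello")
master.allocate("block-b").write_block("block-b", b"world")

# Read path: the master resolves the location, the worker serves the data.
data = master.locate("block-a").read_block("block-a")
```

The point of the sketch is that the Master only holds metadata (which worker owns which block), while block contents live on Workers.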

Curvine can utilize memory, SSD, and HDD as caching media, thereby constituting a multi-tier caching architecture. As a caching layer, Curvine supports multiple data caching modes:

Proactive Caching: Users can proactively load data from the underlying object storage into the Curvine cache layer via commands.

Reactive Caching: Paths from the underlying storage system can be mounted onto Curvine. If a file under such a path is accessed for the first time, it will be automatically loaded into Curvine for caching. Subsequent accesses will read the data directly from the cache.
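
The difference between the two caching modes can be sketched as follows. This is a toy model under stated assumptions, not Curvine's implementation: `CacheLayer`, `load`, and `read` are hypothetical names, a plain dictionary stands in for the underlying object storage, and the `s3://` paths are made up for the example.

```python
from __future__ import annotations


class CacheLayer:
    """Toy cache in front of an underlying store (a dict standing in for object storage)."""

    def __init__(self, underlying: dict[str, bytes]) -> None:
        self.underlying = underlying
        self.cache: dict[str, bytes] = {}
        self.underlying_reads = 0  # counts how often the slow store is touched

    def _fetch(self, path: str) -> bytes:
        self.underlying_reads += 1
        return self.underlying[path]

    def load(self, path: str) -> None:
        """Proactive caching: the user explicitly preloads a path."""
        self.cache[path] = self._fetch(path)

    def read(self, path: str) -> bytes:
        """Reactive caching: the first access fills the cache, later accesses hit it."""
        if path not in self.cache:
            self.cache[path] = self._fetch(path)
        return self.cache[path]


store = {"s3://bucket/a.parquet": b"A", "s3://bucket/b.parquet": b"B"}
cache = CacheLayer(store)

cache.load("s3://bucket/a.parquet")             # proactive: one underlying read
first_a = cache.read("s3://bucket/a.parquet")   # served from cache
first_b = cache.read("s3://bucket/b.parquet")   # reactive: miss, one underlying read
second_b = cache.read("s3://bucket/b.parquet")  # served from cache
```

After these four calls the underlying store has been read only twice, once for the preload and once for the first reactive miss.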

Core features of Curvine:

Multi-Cloud Support: Curvine is compatible with object storage services from multiple cloud providers as its underlying storage layer, enabling transparent data migration across different vendors' object storage platforms.

Cloud-Native: Curvine supports CSI-based cloud-native integration with Kubernetes, enabling deployment and management of Curvine clusters via Helm charts.

POSIX Semantic Support: Curvine delivers comprehensive POSIX semantic compatibility, implementing a high-performance FUSE layer to facilitate the manipulation of distributed cached data as if it were local disk storage.

Compatibility with S3 and HDFS Protocols: The system supports both S3 and HDFS read/write interfaces, facilitating seamless integration with artificial intelligence and big data technology ecosystems.

High Performance: Curvine applies zero-copy techniques at multiple points in its data read/write pipeline and relies on asynchronous operations. Its core engine is written in Rust.

Org repo URL (provide if all repos under the org are in scope of the application)

https://github.com/CurvineIO

Project repo URL in scope of application

https://github.com/CurvineIO/curvine

Additional repos in scope of the application

No response

Website URL

https://curvineio.github.io/

Roadmap

CurvineIO/curvine#29

Roadmap context

Milestones for 2025

| Version number | Release date | Core features |
| --- | --- | --- |
| 0.0.1-beta | 2025-07 | A high-performance distributed caching framework has been established, enabling the cluster to operate normally. |
| 0.1.1-beta | 2025-10 | Shuffle function completion + Spark integration test; S3 protocol support; HDFS protocol support |
| 0.2.1-beta | 2025-12 | Cloud-native CSI driver support; use Curvine FUSE as Spark Shuffle local disk (through TPCH test) |

Major version planning

| Version number | Target date | Target scenario |
| --- | --- | --- |
| 1.0.0-base | 2025-12-30 | Comprehensive support for big data scenarios: hot data cache acceleration; comprehensive POSIX semantic compatibility; Spark reduce task local spill support; HDFS protocol support |
| 2.0.0-base | 2026-06-30 | AI scenario enhancement: training acceleration framework integration; RDMA/GDS support |

The above outlines the roadmap for 2025 and the first half of 2026. The roadmap for 2026-2027 will be developed in collaboration with community partners.
The roadmap may be adjusted based on the priority of user needs.

Contributing guide

https://github.com/CurvineIO/curvine/blob/main/CONTRIBUTING.md

Code of Conduct (CoC)

https://github.com/CurvineIO/curvine/blob/main/COMMIT_CONVENTION.md

Adopters

No response

Maintainers file

https://github.com/CurvineIO/curvine/blob/main/MAINTAINERS.md

Security policy file

https://github.com/CurvineIO/curvine/blob/main/SECURITY.md

Standard or specification?

Curvine is not yet regarded as the benchmark for cloud-native distributed caching, but its performance and ongoing ecosystem integration work are expected to drive its adoption. Currently, Curvine is compatible with the following protocols and software: POSIX, S3, HDFS, Fluid, Spark, and StarRocks, among others.
The project adheres to cloud-native best practices and standards but does not introduce new specifications of its own.

Business product or service to project separation

This project is unrelated to any product or service.

Why CNCF?

Curvine was conceived with the explicit goal of addressing I/O performance challenges in cloud storage, being deeply rooted in cloud-native principles and designed to serve the cloud computing ecosystem. The Cloud Native Computing Foundation (CNCF), as the most influential open-source foundation for cloud-native technologies, represents a key incubation target for many projects in this domain. Admission into CNCF would attract more contributors to participate in its development, amplify Curvine's influence, and ultimately provide the industry with a more efficient distributed caching solution to accelerate the advancement of cloud computing.

Independence: By joining CNCF, this project will become an independent, neutral open-source initiative, ensuring its long-term development is no longer subject to changes within its original sponsoring company.

Operational Support: Through joining CNCF, Curvine will gain operational backing from the Foundation, accelerating its promotion and enhancing the vitality of the project.

Ecosystem Integration: Joining CNCF will facilitate deeper integration of Curvine into the cloud-native technology ecosystem.

Benefit to the landscape

Curvine addresses I/O acceleration and bandwidth limitation challenges in cloud storage. As a data caching layer positioned between cloud storage and Kubernetes pod nodes, Curvine effectively bridges the performance gap between the two. Currently, there is no active open-source project of similar scope in the industry, including within the CNCF community (the open-source version of Alluxio is largely no longer maintained). The emergence of Curvine thus fills a critical gap in the ecosystem.

Cloud native 'fit'

From its initial design phase, Curvine has been architected with the dynamic nature of cloud-native environments in mind. Each component role includes mechanisms that adapt automatically after elastic scaling, preserving both the stability and the performance of data access. Curvine also aims to bridge the performance gap between storage and compute in the cloud-native ecosystem, improving data access performance and convenience in cloud-native big data computing and in the training and inference of large models.
In practice, users can leverage Curvine’s FUSE capabilities through the CSI cloud-native standard interface, which is pluggable, stateless, and elastically scalable.

Cloud native 'integration'

In scenarios such as distributed big data computing, distributed model inference, and model training, Curvine can integrate with various CNCF projects, including but not limited to Volcano, KServe, and Fluid.

Cloud native overlap

N/A

Similar projects

N/A

Landscape

No

Trademark and accounts

  • If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF

IP policy

  • If the project is accepted, I agree the project will follow the CNCF IP Policy

Will the project require a license exception?

Curvine was conceived with the explicit goal of addressing I/O performance challenges in cloud storage, being deeply rooted in cloud-native principles and designed to serve the cloud computing ecosystem.
The Cloud Native Computing Foundation (CNCF), as the most influential open-source foundation for cloud-native technologies, represents a key incubation target for many projects in this domain. Admission into CNCF would attract more contributors to participate in its development, amplify Curvine's influence, and ultimately provide the industry with a more efficient distributed caching solution to accelerate the advancement of cloud computing.

Independence: By joining CNCF, this project will become an independent, neutral open-source initiative, ensuring its long-term development is no longer subject to changes within its original sponsoring company.

Operational Support: Through joining CNCF, Curvine will gain operational backing from the Foundation, accelerating its promotion and enhancing the vitality of the project.

Ecosystem Integration: Joining CNCF will facilitate deeper integration of Curvine into the cloud-native technology ecosystem.

Project "Domain Technical Review"

N/A

Application contact email(s)

curvine86@gmail.com

Contributing or sponsoring entity signatory information

Or, if an individual or individual(s):

| Name | Country | Email address |
| --- | --- | --- |
| David Fu | China | curvine86@gmail.com |

CNCF contacts

No response

Additional information

No response

Projects

Status

Contributor Agreement Unsigned

Status

New - Sandbox Pending Review
