[Sandbox] Curvine #438

### Project summary

Curvine is a high-performance distributed multi-tier cache system written in Rust. Its main purpose is to relieve the I/O and bandwidth bottlenecks of cloud storage, which are a key limiting factor in cloud computing performance.

### Project description

Curvine is a high-performance, highly concurrent distributed caching system released under the Apache 2.0 open-source license. By leveraging caching acceleration, it provides unified path-based access for multi-cloud storage systems while maintaining POSIX compatibility, enabling cloud storage paths to be mounted to local directories for read and write operations.
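To illustrate what mounting to a local directory implies for applications, here is a minimal sketch that uses only Python's standard file APIs and no Curvine-specific client. The mount point `/mnt/curvine` and the file name are hypothetical; a temporary directory stands in for the mount so the snippet runs anywhere.

```python
import os
import tempfile


def write_then_read(mount_point: str, relative_path: str, payload: bytes) -> bytes:
    """Write a file and read it back using only ordinary file calls."""
    full_path = os.path.join(mount_point, relative_path)
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "wb") as f:
        f.write(payload)
    with open(full_path, "rb") as f:
        return f.read()


# On a real deployment, mount_point would be the local directory where the
# cloud storage path is mounted (for example /mnt/curvine, a hypothetical path).
# A temporary directory stands in for it here.
with tempfile.TemporaryDirectory() as mount_point:
    result = write_then_read(mount_point, "datasets/sample.bin", b"hello")
    print(result)
```

The point of the sketch is that the calling code is identical whether the path is a local disk or a mounted cache path.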

Curvine is designed to bridge the gap between cloud storage I/O performance and the growing throughput demands of compute workloads. It offers CSI-based cloud-native access mechanisms and also supports integration through the Fluid Runtime framework.

**How it works:**
As a high-performance distributed caching system, Curvine primarily consists of the following components: Master, Worker, and JournalNode. The Master is responsible for managing workers and caching metadata, while the Worker handles read/write operations and management of cached data blocks. To ensure cluster stability, the Master role is instantiated across multiple nodes, with consistency among them maintained via the Raft consensus protocol. The JournalNode is tasked with synchronizing data between Master instances.
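The fault tolerance gained by running several Master instances under Raft follows from Raft's majority rule. The sketch below is generic Raft arithmetic, not Curvine code, and the function names are illustrative.

```python
def raft_quorum(masters: int) -> int:
    """Smallest number of Master nodes that forms a majority."""
    if masters < 1:
        raise ValueError("a cluster needs at least one Master")
    return masters // 2 + 1


def tolerated_failures(masters: int) -> int:
    """How many Masters can fail while a majority remains available."""
    return masters - raft_quorum(masters)


for n in (1, 3, 5):
    print(f"masters={n} quorum={raft_quorum(n)} tolerated={tolerated_failures(n)}")
```

This is why Raft-based deployments usually run an odd number of nodes: going from three to four Masters raises the quorum without tolerating any additional failure.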

Curvine can utilize memory, SSD, and HDD as caching media, thereby constituting a multi-tier caching architecture. As a caching layer, Curvine supports multiple data caching modes:

**Proactive Caching:** Users can proactively load data from the underlying object storage into the Curvine cache layer via commands.

**Reactive Caching:** Paths from the underlying storage system can be mounted onto Curvine. If a file under such a path is accessed for the first time, it will be automatically loaded into Curvine for caching. Subsequent accesses will read the data directly from the cache.
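The two caching modes and the tiered layout can be sketched with a toy in-process model. This is an illustration under simplifying assumptions, not Curvine's implementation: the class name, tier capacities, LRU demotion policy, and `s3://bucket/...` paths are all hypothetical, and a plain dictionary stands in for the underlying object storage.

```python
from collections import OrderedDict


class TieredCache:
    """Toy multi-tier cache: fastest tier first, LRU demotion to slower tiers."""

    def __init__(self, backing_store, capacities):
        # capacities maps tier name -> max entries, ordered fastest to slowest,
        # e.g. {"memory": 1, "ssd": 1, "hdd": 1} (hypothetical sizes).
        self.backing_store = backing_store
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities.items()]
        self.backend_reads = 0

    def _insert(self, level, path, data):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: evicted from the cache
        _, cap, entries = self.tiers[level]
        entries[path] = data
        if len(entries) > cap:
            old_path, old_data = entries.popitem(last=False)
            self._insert(level + 1, old_path, old_data)  # demote oldest entry

    def _fetch(self, path):
        data = self.backing_store[path]
        self.backend_reads += 1
        self._insert(0, path, data)
        return data

    def locate(self, path):
        """Return the name of the tier holding path, or None if uncached."""
        for name, _, entries in self.tiers:
            if path in entries:
                return name
        return None

    def load(self, path):
        """Proactive caching: pull a path into the cache before it is read."""
        if self.locate(path) is None:
            self._fetch(path)

    def read(self, path):
        """Reactive caching: a miss fetches from the backing store and caches it."""
        for _, _, entries in self.tiers:
            if path in entries:
                return entries[path]
        return self._fetch(path)


store = {"s3://bucket/a": b"A", "s3://bucket/b": b"B", "s3://bucket/c": b"C"}
cache = TieredCache(store, {"memory": 1, "ssd": 1, "hdd": 1})
cache.load("s3://bucket/a")   # proactive: cached before any read
cache.read("s3://bucket/a")   # served from cache
cache.read("s3://bucket/b")   # reactive: first access goes to the backing store
cache.read("s3://bucket/b")   # served from cache
print(cache.backend_reads, cache.locate("s3://bucket/a"), cache.locate("s3://bucket/b"))
```

In this model only two reads reach the backing store, and the older entry is demoted from the memory tier to the SSD tier when the newer one arrives.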

**Core features of Curvine:**

**Multi-Cloud Support:** Curvine is compatible with object storage services from multiple cloud providers as its underlying storage layer, enabling transparent data migration across different vendors' object storage platforms.

**Cloud-Native:** Curvine supports CSI-based cloud-native integration with Kubernetes, enabling deployment and management of Curvine clusters via Helm charts.

**POSIX Semantic Support:** Curvine delivers comprehensive POSIX semantic compatibility, implementing a high-performance FUSE layer to facilitate the manipulation of distributed cached data as if it were local disk storage.

**Compatibility with S3 and HDFS Protocols:** The system supports both S3 and HDFS read/write interfaces, facilitating seamless integration with artificial intelligence and big data technology ecosystems.

**High Performance:** Curvine applies zero-copy techniques at multiple points in its data read/write pipeline and relies on asynchronous operations. Its core engine is written in Rust for performance.
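The general idea behind zero-copy can be shown in a few lines. This sketch uses Python's `memoryview` purely to contrast a copied slice with a view onto the same memory; it does not represent Curvine's Rust implementation, and the buffer contents are made up.

```python
buffer = bytearray(b"header:payload-bytes")

# Slicing and converting to bytes copies the payload into a new object.
copied = bytes(buffer[7:])

# A memoryview slice is a window onto the same underlying memory: no copy.
view = memoryview(buffer)[7:]

# Mutate one byte of the underlying buffer in place.
buffer[7] = ord("P")

print(copied[:7])        # the copy still holds the old bytes
print(bytes(view[:7]))   # the view reflects the change
```

Because the view shares memory with the buffer, handing it to a consumer costs no extra allocation or byte copying, which is the property zero-copy I/O paths rely on.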

### Org repo URL (provide if all repos under the org are in scope of the application)

https://github.com/CurvineIO

### Project repo URL in scope of application

https://github.com/CurvineIO/curvine

### Additional repos in scope of the application

_No response_

### Website URL

https://curvineio.github.io/

### Roadmap

https://github.com/CurvineIO/curvine/issues/29

### Roadmap context

### Milestones for 2025
| Version number          | Release date   | Core features                          |
|-----------------|------------|-----------------------------------|
| `0.0.1-beta`    | 2025-07    | A high-performance distributed caching framework has been established, enabling the cluster to operate normally. |
| `0.1.1-beta`    | 2025-10   | Shuffle function completion + Spark integration test  |
|                 |            | S3 protocol support            |
|                 |            | HDFS protocol support                  |
| `0.2.1-beta`    | 2025-12    | Cloud-native CSI driver support                   |
|                 |            | Use Curvine FUSE as Spark Shuffle local disk (through TPCH test)    |

### Major version planning
| Version number  | Target date | Target scenario                   |
|-----------------|------------|-----------------------------------|
| `1.0.0-base`    | 2025-12-30 | Comprehensive support for big data scenarios: |
|                 |            | - Hot data cache acceleration |
|                 |            | - Comprehensive POSIX semantic compatibility |
|                 |            | - Spark reduce task local spill support |
|                 |            | - HDFS protocol support |
| `2.0.0-base`    | 2026-06-30 | AI scenario enhancement: |
|                 |            | - Training acceleration framework integration |
|                 |            | - RDMA/GDS support |

The above outlines the roadmap for 2025 and the first half of 2026. The roadmap for 2026-2027 will be developed in collaboration with community partners.
Note that the roadmap may be adjusted based on the priority of user needs.

### Contributing guide

https://github.com/CurvineIO/curvine/blob/main/CONTRIBUTING.md

### Code of Conduct (CoC)

https://github.com/CurvineIO/curvine/blob/main/COMMIT_CONVENTION.md

### Adopters

_No response_

### Maintainers file

https://github.com/CurvineIO/curvine/blob/main/MAINTAINERS.md

### Security policy file

https://github.com/CurvineIO/curvine/blob/main/SECURITY.md

### Standard or specification?

Curvine is not yet regarded as a de facto standard for cloud-native distributed caching, though its performance and ecosystem integrations are expected to drive its adoption. Currently, Curvine is compatible with the following protocols and software: POSIX, S3, HDFS, Fluid, Spark, and StarRocks, among others.
The project adheres to cloud-native best practices and standards but does not introduce new specifications of its own.

### Business product or service to project separation

This project is unrelated to any product or service.

### Why CNCF?

Curvine was conceived with the explicit goal of addressing I/O performance challenges in cloud storage, being deeply rooted in cloud-native principles and designed to serve the cloud computing ecosystem. The Cloud Native Computing Foundation (CNCF), as the most influential open-source foundation for cloud-native technologies, represents a key incubation target for many projects in this domain. Admission into CNCF would attract more contributors to participate in its development, amplify Curvine's influence, and ultimately provide the industry with a more efficient distributed caching solution to accelerate the advancement of cloud computing.

**Independence:** By joining CNCF, this project will become an independent, neutral open-source initiative, ensuring its long-term development is no longer subject to changes within its original sponsoring company.

**Operational Support:** Through joining CNCF, Curvine will gain operational backing from the Foundation, accelerating its promotion and enhancing the vitality of the project.

**Ecosystem Integration:** Joining CNCF will facilitate deeper integration of Curvine into the cloud-native technology ecosystem.

### Benefit to the landscape

Curvine addresses I/O acceleration and bandwidth limitation challenges in cloud storage. As a data caching layer positioned between cloud storage and Kubernetes pod nodes, Curvine effectively bridges the performance gap between the two. Currently, there is no active open-source project of similar scope in the industry, including within the CNCF community (the open-source version of Alluxio is largely no longer maintained). The emergence of Curvine thus fills a critical gap in the ecosystem.

### Cloud native 'fit'

From its initial design phase, Curvine has been architected for the dynamic nature of cloud-native environments. Each component role includes mechanisms that automatically adapt after elastic scaling events, preserving both the stability and the performance of data access. Curvine also aims to bridge the performance gap between storage and compute in the cloud-native ecosystem, improving data access performance and convenience for cloud-native big data computing and for the training and inference of large models.
In practice, users can leverage Curvine’s FUSE capabilities through the standard CSI interface, which is pluggable, stateless, and elastically scalable.

### Cloud native 'integration'

In scenarios such as distributed big data computing, distributed model inference, and model training, Curvine can integrate with various CNCF projects, including but not limited to Volcano, KServe, and Fluid.

### Cloud native overlap

N/A

### Similar projects

N/A

### Landscape

No


### Trademark and accounts

- [x] If the project is accepted, I agree to donate all project trademarks and accounts to the CNCF

### IP policy

- [x] If the project is accepted, I agree the project will follow the CNCF IP Policy

### Will the project require a license exception?

Curvine was conceived with the explicit goal of addressing I/O performance challenges in cloud storage, being deeply rooted in cloud-native principles and designed to serve the cloud computing ecosystem. 
The Cloud Native Computing Foundation (CNCF), as the most influential open-source foundation for cloud-native technologies, represents a key incubation target for many projects in this domain. Admission into CNCF would attract more contributors to participate in its development, amplify Curvine's influence, and ultimately provide the industry with a more efficient distributed caching solution to accelerate the advancement of cloud computing.

**Independence:** By joining CNCF, this project will become an independent, neutral open-source initiative, ensuring its long-term development is no longer subject to changes within its original sponsoring company.

**Operational Support:** Through joining CNCF, Curvine will gain operational backing from the Foundation, accelerating its promotion and enhancing the vitality of the project.

**Ecosystem Integration:** Joining CNCF will facilitate deeper integration of Curvine into the cloud-native technology ecosystem.

### Project "Domain Technical Review"

N/A


### Application contact email(s)

curvine86@gmail.com

### Contributing or sponsoring entity signatory information

Or, if an individual or individual(s):
| Name | Country | Email address |
|-----------|-----------|-----------|
| David Fu | China | curvine86@gmail.com |

### CNCF contacts

_No response_

### Additional information

_No response_
