Nebula Drive AI

AI-native object storage and intelligent file knowledge base powered by a C++ Gateway, Redis Stream, pgvector and RAG.

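Overview

Nebula Drive AI is an engineering system built around file storage, content addressing, asynchronous indexing, hybrid retrieval, and RAG question answering. It is not a conventional cloud-drive UI; it is an object storage and file knowledge base backend centered on a C++ Gateway.

The system covers an end-to-end flow from file upload, download, and sharing to text content indexing, retrieval, question answering, and session auditing. The Web Demo is the interactive entry point. The core path runs through the C++ Gateway, the Content-Addressable Chunk Store, the Redis Stream AI Pipeline, PostgreSQL / pgvector, and the RAG service.

After an upload completes, the file is written to the local Chunk Store and the metadata tables. An indexing job is delivered to the Python ai_worker through Redis Stream. Text content is parsed, chunked, and written to document_chunks. The search API offers keyword / vector / hybrid retrieval. The RAG API generates answers with citations from authorized file chunks and saves a session audit record.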

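Core Features

Object Storage Capabilities

C++17 + Boost.Asio / Boost.Beast Gateway.
JWT authentication.
Folder and file metadata.
Multipart Upload: init, part upload, status, complete, abort.
Content-Addressable Chunk Store.
SHA-256 chunk deduplication.
objects / object_versions / object_chunks / chunks metadata model.
Full download and Range Download.
File deletion, reference counting, and GC pending.
Share links and share downloads.
Share passwords, expiration times, access count limits, and revocation.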
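
The interplay of content addressing, SHA-256 deduplication, reference counting, and GC pending can be illustrated with a small in-memory model. This is a minimal sketch, not the Gateway's C++ implementation: the `ChunkStore` class, its method names, and the fixed chunk size are all hypothetical and chosen only for illustration.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for a content-addressable chunk store (hypothetical)."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks = {}         # sha256 hex digest -> chunk bytes
        self.refcount = {}       # sha256 hex digest -> number of references
        self.gc_pending = set()  # digests whose reference count dropped to zero

    def put_object(self, data: bytes) -> list:
        """Split data into fixed-size chunks and return the ordered digests."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Identical content maps to the same key, so it is stored once.
            if digest not in self.chunks:
                self.chunks[digest] = chunk
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            self.gc_pending.discard(digest)
            digests.append(digest)
        return digests

    def get_object(self, digests: list) -> bytes:
        """Reassemble an object from its ordered chunk digests."""
        return b"".join(self.chunks[d] for d in digests)

    def delete_object(self, digests: list) -> None:
        """Drop one reference per chunk; unreferenced chunks become GC pending."""
        for digest in digests:
            self.refcount[digest] -= 1
            if self.refcount[digest] == 0:
                self.gc_pending.add(digest)


store = ChunkStore(chunk_size=4)
first = store.put_object(b"aaaabbbbaaaa")   # "aaaa" is stored only once
second = store.put_object(b"bbbbcccc")      # "bbbb" is shared with the first object
store.delete_object(first)                  # "aaaa" loses its last reference
print(len(store.chunks), len(store.gc_pending))
```

In this model a deleted object's chunks are not removed immediately; they are only marked, which mirrors the idea of a separate garbage-collection pass.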

AI Knowledge Base Capabilities

Redis Stream asynchronous indexing pipeline.
Python ai_worker.
Text parsing, encoding fallback, and chunking.
document_chunks persistence.
Mock embedding.
pgvector vector search.
keyword / vector / hybrid search.
RAG with citations.
LLM Provider abstraction.
Mock LLM Provider by default.
Optional OpenAI-compatible Provider.
Provider fallback.
RAG sessions and audit.
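
The provider abstraction and fallback items above can be sketched as follows. This is only an assumption about the shape of such a design: the `LLMProvider` protocol, the two provider classes, and `generate_with_fallback` are hypothetical names, and the failing provider simply simulates an unreachable OpenAI-compatible endpoint.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Minimal interface every provider is assumed to implement (hypothetical)."""
    name: str

    def generate(self, prompt: str) -> str: ...


class MockProvider:
    """Deterministic provider that needs no network access."""
    name = "mock"

    def generate(self, prompt: str) -> str:
        return f"[mock answer] {prompt[:40]}"


class FailingProvider:
    """Simulates a remote OpenAI-compatible provider that cannot be reached."""
    name = "openai-compatible"

    def generate(self, prompt: str) -> str:
        raise ConnectionError("provider unavailable")


def generate_with_fallback(providers, prompt: str):
    """Try providers in order and return (provider_name, answer) from the first success."""
    errors = []
    for provider in providers:
        try:
            return provider.name, provider.generate(prompt)
        except Exception as exc:  # any provider failure triggers the next candidate
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


used, answer = generate_with_fallback(
    [FailingProvider(), MockProvider()], "What is a chunk store?"
)
print(used, answer)
```

Returning the provider name alongside the answer makes it possible to record which provider actually served a request, which fits the session audit idea.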

Web Demo

Login / Register.
Files: upload, download, sharing, and index status display.
Search: hybrid retrieval.
AI Ask: question answering with cited sources.
Sessions: session history.
Shares: share link management.
Observability: metrics display.

Engineering Capabilities

Docker Compose.
scripts/smoke_test.sh.
scripts/demo_prepare.sh.
scripts/benchmark_basic.sh.
scripts/release_check.sh.
GitHub Actions CI.
/metrics Prometheus-style metrics.
request_id.
Structured logs.

System Architecture

flowchart LR
    Web[React Web Demo] --> Gateway[C++ Gateway]
    Gateway --> Postgres[(PostgreSQL + pgvector)]
    Gateway --> Redis[(Redis Stream)]
    Gateway --> ChunkStore[Local Chunk Store]

    Redis --> Worker[Python AI Worker]
    Worker --> ChunkStore
    Worker --> Postgres

    Gateway --> Search[Keyword / Vector / Hybrid Search]
    Search --> Rag[RAG with Citations]
    Rag --> Sessions[Sessions Audit]

Core data flow:

Upload -> Chunk Store -> Redis Stream -> AI Worker
       -> document_chunks -> Hybrid Search
       -> RAG with citations -> Sessions audit

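Once an upload's complete step finishes, the object metadata and chunk mappings are written and an AI index job is created. The ai_worker consumes the job, parses the text, and generates chunks and mock embeddings. Search / RAG apply permission filtering in the JWT user context so that file content cannot be read across users.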
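
The worker-side steps (decode with an encoding fallback, split into chunks, attach a mock embedding) can be sketched in a few functions. This is a minimal illustration under stated assumptions: the candidate encoding list, the chunk size and overlap, and the hash-based `mock_embedding` are hypothetical and not taken from the actual ai_worker.

```python
import hashlib


def decode_with_fallback(raw: bytes, encodings=("utf-8", "gbk", "latin-1")) -> str:
    """Try each candidate encoding in order; fall back to lossy UTF-8 decoding."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


def split_text(text: str, size: int = 200, overlap: int = 40) -> list:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
        start += size - overlap
    return chunks


def mock_embedding(text: str, dim: int = 8) -> list:
    """Deterministic pseudo-embedding derived from SHA-256 (dim must be <= 32)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


raw = "Nebula Drive AI 索引示例".encode("gbk")
text = decode_with_fallback(raw)
rows = [
    {"chunk_index": i, "text": chunk, "embedding": mock_embedding(chunk)}
    for i, chunk in enumerate(split_text(text, size=10, overlap=2))
]
print(len(rows))
```

A hash-based embedding carries no semantic meaning; it only gives the pipeline stable vectors to store and query while a real embedding provider is absent.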

Tech Stack

Gateway: C++17, Boost.Asio, Boost.Beast, CMake.
Storage: Local Chunk Store, PostgreSQL, pgvector.
Queue: Redis Stream.
AI Worker: Python, asyncio.
Search / RAG: keyword search, mock embedding, pgvector, LLM Provider abstraction.
Frontend: React, Vite, TypeScript, Tailwind CSS.
DevOps: Docker Compose, GitHub Actions, Smoke Test, Prometheus-style metrics.

Quick Start

Start the backend core services:

docker compose up -d gateway postgres redis
bash scripts/apply_migrations.sh
curl http://localhost:18080/api/v1/health

Start the Web Demo:

cd web
npm install
VITE_API_BASE_URL=http://localhost:18080 npm run dev

Open:

http://localhost:5173

Start the Web Demo with Docker Compose:

docker compose up -d web

Open:

http://localhost:18081

Run the AI Worker manually:

docker compose run --rm ai_worker python main.py --once

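Typical Workflow

Register and log in.
Upload Markdown / TXT / code files.
Run ai_worker to index them.
Search in hybrid mode on the Search page.
Ask questions about file content on the AI Ask page.
Review the citation sources.
Review past questions and answers on the Sessions page.
Create a share link on the Shares page.
Check health and metrics on the Observability page.

API Overview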

Auth

POST /api/v1/auth/register
POST /api/v1/auth/login
GET /api/v1/users/me

Upload / Files

POST /api/v1/uploads/init
PUT /api/v1/uploads/{upload_id}/parts/{part_no}
GET /api/v1/uploads/{upload_id}/status
POST /api/v1/uploads/{upload_id}/complete
DELETE /api/v1/uploads/{upload_id}
GET /api/v1/files
GET /api/v1/files/{file_id}/download
DELETE /api/v1/files/{file_id}
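
A client of the multipart upload and download endpoints has to decide how to split a file into parts and how to request a byte range. The sketch below shows that arithmetic only. It assumes, without confirmation from the Gateway's documentation, that part numbers start at 1 and that Range Download uses the standard HTTP `Range: bytes=start-end` header; `plan_parts` and `range_header` are hypothetical helper names.

```python
def plan_parts(total_size: int, part_size: int) -> list:
    """Return (part_no, start, end_inclusive) tuples covering total_size bytes."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    parts = []
    part_no = 1  # assumption: part numbering starts at 1
    for start in range(0, total_size, part_size):
        end = min(start + part_size, total_size) - 1
        parts.append((part_no, start, end))
        part_no += 1
    return parts


def range_header(start: int, end: int) -> dict:
    """Build a standard HTTP Range header for an inclusive byte range."""
    if start < 0 or end < start:
        raise ValueError("require 0 <= start <= end")
    return {"Range": f"bytes={start}-{end}"}


for part_no, start, end in plan_parts(total_size=10, part_size=4):
    print(part_no, range_header(start, end))
```

The same plan can serve both directions: each tuple identifies the bytes to send for one upload part, or the bytes to request when resuming a download.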

Share

POST /api/v1/shares
GET /api/v1/shares
GET /api/v1/shares/{token}
GET /api/v1/shares/{token}/download
DELETE /api/v1/shares/{share_id}
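
The share features listed earlier (password, expiration time, access count limit, revocation) imply a sequence of checks before a shared file is served. The following is a minimal sketch of such a check, not the Gateway's logic: the `Share` fields, the order of the checks, and the result strings are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Share:
    """Hypothetical share record; field names are illustrative only."""
    token: str
    password_sha256: Optional[str] = None   # None means no password required
    expires_at: Optional[datetime] = None   # None means the share never expires
    max_access_count: Optional[int] = None  # None means unlimited accesses
    access_count: int = 0
    revoked: bool = False


def check_share_access(share: Share, password: Optional[str], now: datetime) -> str:
    """Return "ok" and count the access, or a reason string when access is denied."""
    if share.revoked:
        return "revoked"
    if share.expires_at is not None and now >= share.expires_at:
        return "expired"
    if share.max_access_count is not None and share.access_count >= share.max_access_count:
        return "limit_reached"
    if share.password_sha256 is not None:
        # Plain SHA-256 keeps the sketch short; a real system would use a salted KDF.
        supplied = hashlib.sha256((password or "").encode("utf-8")).hexdigest()
        if not hmac.compare_digest(supplied, share.password_sha256):
            return "bad_password"
    share.access_count += 1
    return "ok"


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
share = Share(
    token="demo-token",
    password_sha256=hashlib.sha256(b"s3cret").hexdigest(),
    expires_at=now + timedelta(hours=1),
    max_access_count=1,
)
print(check_share_access(share, "wrong", now))
print(check_share_access(share, "s3cret", now))
print(check_share_access(share, "s3cret", now))
```

Only a successful access increments the counter here, so failed password attempts do not consume the access quota; whether the real system counts them is not stated in this document.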

Search / RAG

GET /api/v1/files/{file_id}/index/status
GET /api/v1/search?q=...&mode=hybrid
POST /api/v1/ai/ask
GET /api/v1/ai/sessions
GET /api/v1/ai/sessions/{session_id}
DELETE /api/v1/ai/sessions/{session_id}
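
How keyword and vector scores are combined in hybrid mode is not specified in this document. One common approach is a weighted sum, sketched below together with the per-user permission filter described in the architecture section. The `alpha` weight, the term-overlap keyword score, the cosine similarity, and the chunk dictionary layout are all assumptions made for illustration.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that occur in the text (case-insensitive)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered) / len(terms)


def hybrid_search(query, query_vec, chunks, user_id, alpha=0.5, top_k=3):
    """Rank the caller's own chunks by alpha * keyword + (1 - alpha) * vector score."""
    results = []
    for chunk in chunks:
        if chunk["owner_id"] != user_id:
            continue  # permission filter: never score another user's content
        score = (alpha * keyword_score(query, chunk["text"])
                 + (1 - alpha) * cosine(query_vec, chunk["embedding"]))
        results.append((score, chunk["chunk_id"]))
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top_k]


chunks = [
    {"chunk_id": "c1", "owner_id": 1, "text": "Redis Stream pipeline", "embedding": [1.0, 0.0]},
    {"chunk_id": "c2", "owner_id": 1, "text": "Range download", "embedding": [0.0, 1.0]},
    {"chunk_id": "c3", "owner_id": 2, "text": "Redis Stream secrets", "embedding": [1.0, 0.0]},
]
print(hybrid_search("redis stream", [1.0, 0.0], chunks, user_id=1))
```

Filtering by owner before scoring means another user's chunk can never appear in the ranking, regardless of how well it matches the query.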

Observability

GET /api/v1/health
GET /metrics
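
Output from a Prometheus-style `/metrics` endpoint is plain text, with one sample per line and comment lines starting with `#`. The sketch below parses such text into a dictionary. It is a simplified reader that assumes samples carry no trailing timestamp, and the metric names in the sample text are hypothetical, not the Gateway's actual metrics.

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus-style text into {"name{labels}": value}.

    Simplification: assumes each sample line is "<name-with-labels> <value>"
    with no trailing timestamp.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP / TYPE comments
        name, _, value = line.rpartition(" ")
        if not name:
            continue  # no separator found, so this is not a sample line
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return metrics


sample = """\
# HELP http_requests_total Total HTTP requests (hypothetical metric).
# TYPE http_requests_total counter
http_requests_total{route="/api/v1/health"} 12
uploads_in_progress 0
"""
for name, value in parse_metrics(sample).items():
    print(name, value)
```

Splitting on the last space keeps label values that contain spaces intact, since the numeric value is always the final field under the stated assumption.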

Testing and Regression

bash scripts/smoke_test.sh

The smoke test covers authentication, upload, download, Range Download, sharing, AI indexing, Search, RAG, Sessions, permission isolation, and deletion filtering.

Benchmark

bash scripts/benchmark_basic.sh

See docs/benchmark.md for details. The script is a lightweight engineering check and does not represent production load-test results.

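Deployment

Default ports:

Gateway: 18080 -> 8080
Web: 18081 -> 5173
Prometheus: 19090 -> 9090

See docs/deployment.md for deployment documentation. In production, manage sensitive configuration through environment variables, and do not expose Redis / PostgreSQL to the public internet.

Current Limitations

AI indexing currently supports mainly text files such as Markdown, TXT, code, JSON, and YAML.
Files of any format can be uploaded, downloaded, and shared as stored objects, but content understanding for PDF / DOCX / images / video is not yet implemented.
Mock embedding is used by default.
The mock LLM Provider is used by default.
The OpenAI-compatible Provider is an optional configuration.
In demo mode, ai_worker is triggered manually.
The Web Demo is intended for demonstration and is not a production-grade file hosting frontend.
SSE streaming output is not yet supported.
No MinIO / S3 Adapter is integrated yet.
The benchmark is a lightweight engineering check and does not represent production load-test results.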

Roadmap

Real embedding provider.
PDF / DOCX parser.
OCR pipeline.
SSE streaming RAG responses.
MinIO / S3 object store adapter.
Grafana dashboard.
Distributed tracing.
Production deployment hardening.

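Screenshots

Dashboard Overview

Shows the system capability cards and the overall product entry points.

File Upload and Object Management

Shows file upload, index status, and the download and share entry points.

Hybrid Search

Shows keyword / vector / hybrid search results and chunk-level citations.

RAG Q&A and Citations

Shows answers generated from the user's files together with their cited sources.

RAG Sessions

Shows past questions and answers with citation snapshots.

Observability

Shows health, metrics, and run commands.

Login / Register

Shows the Web Demo login and registration entry points.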
