diff --git a/docs/proposal/kernel_native_mode_restart.md b/docs/proposal/kernel_native_mode_restart.md new file mode 100644 index 000000000..9f85e983d --- /dev/null +++ b/docs/proposal/kernel_native_mode_restart.md @@ -0,0 +1,96 @@ +# Kernel-Native Mode Restart Configuration Persistence + +## Overview + +Kmesh supports uninterrupted acceleration capabilities during restarts and automatically manages related configurations. + +## Motivation + +Using Kmesh for network acceleration in a K8s cluster, ensuring that acceleration capabilities remain uninterrupted during Kmesh restart scenarios provides a significant competitive advantage through seamless restarts and upgrades. + +### Objectives + +1. Persistently save related configurations locally during Kmesh restart and automatically manage them to maintain uninterrupted acceleration capabilities. +2. Automatically restore related configurations and update them after a restart. + +### Proposal + +Implement configuration persistence management and service continuity. + +#### Configuration Persistence Management + +- Determine if Kmesh is shutting down normally or restart. If it's a restart, persist relevant configurations to a specified directory. +- Persist the eBPF programs used for service traffic management, allowing independent traffic governance even after Kmesh shuts down, thus ensuring service continuity. +- Persist other relevant functional configurations: + - Management features: Automatically pull the latest configurations and refresh them after each restart. + - Certificate subscription: Reacquire certificates after each restart. + +#### Configuration Restoration and Update + +- On Kmesh startup, determine if it is a fresh start or a restart. If it’s a restart, restore configurations from the specified directory and compare them with the latest received configurations for updates. + +### Limitations + +Currently, upgrade scenarios are not supported but will be in the future. + +If Kmesh experiences a coredump leading to a restart, configuration persistence capabilities cannot be achieved. + +## Design Details + +### Configuration Persistence Management + +
+ +![kernel_native_mode_restart](pics/kernel_native_mode_restart.svg) + +
+ +- `ebpf_prog` is used for traffic governance operations after Kmesh shuts down. +- `ebpf_map` records configurations for traffic governance operations after Kmesh shuts down. +- `hashName` records the hash value of each XDS configuration tree for comparison to detect changes. +- `Kmesh_version` records the version information of Kmesh for comparing whether Kmesh is in a restart or upgrade state. +- `tail_call_map` tracks eBPF tail call information, recording tail call programs. + +#### Persistence Operations + +- `ebpf_prog` with `sockconn`, `sockops`, and `tracepoint` ensures that eBPF traffic governance capabilities remain uninterrupted after being solidified. + + - Use `bpf_link` to pin the attached `sockconn`/`sockops` programs. + + - Directly pin the tracepoint programs to the specified directory. +- Pin other eBPF maps directly to the specified directory. + + - Special design: The `ebpf_tail_call` map requires special handling; it must be separately pinned to the file directory. + +This functional eBPF ensures that after Kmesh shuts down, traffic governance rules continue to be applied based on existing rules. + +- Pin `Kmesh_version` to the specified directory. + + - This `ebpf_map` primarily records the Kmesh version and serves as a basis for determining whether to read from the specified directory or treat it as a new start. + +- `hashName` serializes and saves the hash of the cached XDS configuration to a file in the `/mnt` directory for comparison after a restart. + +### Configuration Restoration and Update + +1. Load eBPF programs after a restart. + + 1. Restore `Kmesh_version` from the specified directory to determine if it is a restart scenario and whether configuration restoration is needed. + + 2. Restore `inner_map_mng` information from the specified directory and update fd information based on the `inner_map` id. + + 3. Restore `bpf_map` from the pinned specified directory and restore the tail call map. + + 4. Start new eBPF programs for `sockconn`/`sockops`, attach them, update and replace the programs in `bpf_link`, and refresh `tail_call_map`, replacing the old tail call programs for seamless replacement. + + 5. After starting the new tracepoint eBPF programs, attach them and then delete the old ones to complete the replacement. + +2. Compare saved old data with newly acquired data and refresh. + + During the Kmesh startup process, all XDS configurations will be fully subscribed, and `bpf_map` will be updated through a complete overwrite. Therefore, for new and updated XDS configurations, they will be loaded into `bpf_map`, and we only need to consider deletions during the restart process. + + Restore the XDS configuration tree from the `/mnt` directory to a variable and compare it with the latest subscribed XDS configuration cache. Remove records that exist in the persisted file but not in the cache to ensure the accuracy of the `bpf_map` configuration. + +### Outstanding Issues + +1. The current refresh granularity for the XDS tree is at the top-level config. Future iterations will refine this granularity. +2. Coordination between other single-point features and restart functionality remains to be considered. \ No newline at end of file diff --git a/docs/proposal/kernel_native_mode_restart_zh.md b/docs/proposal/kernel_native_mode_restart_zh.md new file mode 100644 index 000000000..77a92d0bf --- /dev/null +++ b/docs/proposal/kernel_native_mode_restart_zh.md @@ -0,0 +1,104 @@ +## kernel-native模式重启配置持久化 + +### 概要 + +kmesh支持在重启过程中加速能力不中断,且自动管理相关配置 + +### 动机 + +在K8s集群中使用kmesh进行网络加速,在kmesh重启场景下加速能力不中断,提供无感重启/升级会给kmesh带来很大的竞争力 + +#### 目标 + +1. 在kmesh重启场景下将相关配置保存到本地并自动管理,保持加速能力不中断 +2. 重启后自动恢复相关配置并更新 + +### 提议 + +实现配置持久化管理与服务不中断 + +配置持久化管理: + +- 在kmesh关闭的时候判断是正常关闭还是重启场景,是重启场景则将相关配置持久化保存到指定目录 +- 将具体用于服务流量的eBPF程序持久化,在kmesh关闭后依然可以根据配置独立提供流量治理服务,实现服务不中断 +- 其他相关功能配置持久化 + - 纳管功能:在每次重启后自动拉取最新配置并刷新 + - 证书订阅:在每次重启后重新获取证书 + + +配置恢复与更新: + +- 在kmesh启动的时候判断是新启动还是重启场景,是重启场景则从指定目录恢复配置,与接收到的最新配置做差异比较并更新 + +### 限制 + +当前未支持升级场景,后续会支持 + +kmesh如果进程coredump导致重启,无法实现正常的配置持久化能力 + +## 设计细节 + +### 配置持久化管理 + +
+ +![kernel_native_mode_restart](pics/kernel_native_mode_restart.svg) + +
+ +- ebpf_prog用于kmesh关闭后进行流量治理操作 +- ebpf_map用于记录kmesh关闭后提供流量治理操作的配置 +- hashName用于记录每棵XDS配置树的hash值,用于比较XDS配置树是否发生变化 +- kmesh_version用于记录kmesh的版本信息,用于比较kmesh是否是重启/升级 +- tail_call_map用于记录ebpf_tail_call信息,记录tail_call prog + +#### 持久化操作 + +- ebpf_prog有sockconn/sockops/tracepoint,固化后可保证eBPF流量治理能力不中断 + + - 使用bpf_link将attach过的sockconn/sockops的bpf程序固化, + + - tracepoint程序直接pin到目录上 +- 固化其他ebpf map,直接pin到指定目录上 + + - 特别设计:ebpf_tail_call的map需要特别处理,需要单独将tail_call的map pin到文件目录中 + + +以上是功能性的eBPF,固化之后可以保证kmesh关闭时,依然依据现有的流量治理规则进行治理 + +- 将kmesh_version pin到指定目录上 + - 该ebpf_map主要用于记录kmesh版本,并作为启动时判断是新启动还是从指定目录中读取ebpf_map的依据 + +- hashName是将记录xds配置的cache中的hash序列化后固化为文件保存于/mnt目录下,用于重启后数据对比 + +### 配置恢复与更新 + +1. 重启后加载eBPF程序 + + 1. 从指定目录恢复kmesh_version,判断是否是重启场景,判断是否需要进行配置恢复 + + 2. 从指定目录恢复inner_map_mng信息,并根据根据inner_map的id更新fd信息 + + 3. 从pin的指定目录恢复bpf_map,从pin的指定目录恢复tail_call 的map + + 4. 启动新的eBPF程序sockconn/sockops,attach后更新替换掉bpf_link中的prog,并且刷新tail_call_map,替换掉旧的tail_call_prog,从而完成无缝替换 + + 5. tracepoint的eBPF程序启动新的之后,attach上再删除旧的,完成替换 + + + + +2. 将保存的旧数据和新获取的数据进行对比并刷新 + + 在kmesh启动过程中会全量订阅所有的XDS配置,且会对bpf_map进行覆盖式的更新,所以对于新增和更新情况的XDS配置,均已刷入bpf_map中,我们只需要考虑在重启过程中删除的xds配置。 + + 将/mnt目录下的XDS配置树从文件恢复到变量中,与最新订阅获取到的XDS配置的Cache作对比,将固化的文件中存在,但是cache中不存在的记录,进行删除,保证bpf_map配置的准确性。 + + + + + +### 遗留事项 + +1. 当前XDS树的刷新粒度为顶层一整个config,后续将会细化刷新粒度 +1. 其他单点功能与重启功能的配合依然有待考虑 \ No newline at end of file diff --git a/docs/proposal/pics/kernel_native_mode_restart.svg b/docs/proposal/pics/kernel_native_mode_restart.svg new file mode 100644 index 000000000..ad309fafd --- /dev/null +++ b/docs/proposal/pics/kernel_native_mode_restart.svg @@ -0,0 +1,4 @@ + + + +
istiod

kmesh-daemon

XDS parsing layer

ListenerCache
ClusterCache
RouteCache
ebpf prog/map
ebpf prog
cgroup connect
sockops
tracepoint
ebpf map
kmesh_version
outer_map
cluster
route
listener
cluster
hashName
tail_call_map
store when Stop kmesh
deserialization
\ No newline at end of file