feat(db): Make PostgreSQL a first-class citizen (independent migration tree + JSONB + integration tests) #427

Open
ncw1992120 wants to merge 5 commits into mateaix:dev from ncw1992120:feat/postgres-independent-migration-tree

@ncw1992120

Contributor

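Closes #426. Related: #244

Background

PostgreSQL used to be a half-finished feature riding on the KingbaseES migration tree: it had no migration tree of its own, had never been verified against a real database, stored all JSON as TEXT, and had no deployment docs. This PR makes it a first-class target that is actually usable, verified, and documented.

Changes (5 commits, each with a single concern, reviewable one by one)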

| Commit | Content |
| --- | --- |
| chore(db): fork kingbase migration tree into independent postgresql tree | Copy the kingbase tree to db/migration/postgresql (152 files, byte-identical); point application-postgres.yml at the new tree; add the mate.wiki.watcher-* settings that were missing from application-kingbase.yml. Zero behavior change: switching the location is transparent to existing PG deployments (Flyway decides by version + checksum rather than path, and identical scripts have identical checksums). |
| feat(db): upgrade high-frequency JSON columns from TEXT to JSONB on PostgreSQL | 18 migration files, about 40 JSON columns TEXT → JSONB (6 NOT NULL columns gain DEFAULT '{}' / '[]'); params_schema / output_schema / mate_message.metadata deliberately stay TEXT; the datasource and Flyway URLs gain stringtype=unspecified; V53 is rewritten as a JSONB-native `\|\|` merge (it originally used TRIM/SUBSTRING/CONCAT/REPLACE, which are invalid on jsonb). |
| test(db): add Testcontainers PostgreSQL integration tests | PostgresE2EBaseTest + a migration smoke test + a JSONB CRUD round-trip test; @Testcontainers(disabledWithoutDocker=true) skips automatically when Docker is absent. |
| ops(docker): add docker-compose.pg.yml override | One command switches the backend to PostgreSQL; depends_on.mysql: !reset null removes the dependency on MySQL; MySQL is gated behind a profile. |
| docs(db): document PostgreSQL as a first-class database target | New database-postgresql.md in Chinese and English; updates to README / CLAUDE.md / config / docker-deploy. |

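Design notes

  • JSONB writes are handled by a single setting, stringtype=unspecified (it covers both JacksonTypeHandler and plain String JSON columns), with zero Java changes; this is the officially recommended PG JDBC approach.
  • No GIN indexes in this PR: the code uses ->> / @> in 0 places, so an index would sit unused; the docs describe how to add one later.
  • New PG-only migrations start at V160.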

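Verification

On a real PostgreSQL 16:

  • ✅ All 153 migrations apply cleanly (including every seed INSERT, the ${...} placeholders in V85, and the V53 rewrite)
  • ✅ Upgraded columns have physical type jsonb; excluded columns stay text
  • ✅ JacksonTypeHandler write/read round-trips correctly, and the JSONB is queryable with ->>
  • ✅ Invalid JSON is rejected by jsonb columns (write-time validation works)
  • docker compose -f docker-compose.yml -f docker-compose.pg.yml config validates
  • ✅ Testcontainers suite: Tests run: 4, Failures: 0, Errors: 0 / BUILD SUCCESS

Run the tests: mvn -Dtest='PostgresMigrationSmokeTest,CronJobDeliveryConfigPgTest' test (requires local Docker).

If the maintainers would rather have this split into 5 separate PRs merged individually, I can split it.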

PostgreSQL previously ran on the KingbaseES migration tree
(db/migration/kingbase) because the two share a SQL dialect. This couples
PostgreSQL-specific evolution to KingbaseES and blocks PG-only optimizations.

Fork db/migration/kingbase -> db/migration/postgresql (152 files, V1-V158,
byte-identical) and point application-postgres.yml's flyway.locations at the
new tree. The switch is transparent for existing PostgreSQL deployments:
Flyway tracks version + checksum, not the classpath location, and the
identical scripts keep identical checksums, so nothing re-runs.

Also align application-kingbase.yml's mate.wiki block with the postgres
profile (allowed-source-roots / watcher-enabled / watcher-interval-ms), which
were missing.

No column definitions change in this commit; PG schema output is identical to
what the kingbase tree produced. JSONB optimizations land in a follow-up.
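
The transparency argument above rests on a migration being identified by version + checksum, with the classpath location playing no part. A minimal Java sketch of that idea follows. It uses a plain CRC32 over the script text as a stand-in (Flyway's real checksum computation differs in detail), and the class name, the `identity` helper, and the sample script are hypothetical, not taken from this PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ChecksumSketch {

    /** CRC32 over the script text; the directory the script came from is deliberately not an input. */
    static long checksum(String scriptBody) {
        CRC32 crc = new CRC32();
        crc.update(scriptBody.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Hypothetical identity of an applied migration: version plus checksum, no path. */
    static String identity(String version, String scriptBody) {
        return version + ":" + checksum(scriptBody);
    }

    public static void main(String[] args) {
        // Same bytes, imagined as loaded once from db/migration/kingbase and once from db/migration/postgresql.
        String body = "CREATE TABLE demo (id BIGINT PRIMARY KEY);\n";
        String fromKingbase = identity("1", body);
        String fromPostgresql = identity("1", body);
        System.out.println("identical identity: " + fromKingbase.equals(fromPostgresql));

        // Editing the script changes the checksum, which is what a later JSONB rewrite of a file would do.
        System.out.println("edited script differs: " + !identity("1", body + "-- edited").equals(fromKingbase));
    }
}
```

Under this model, a deployment that already recorded the kingbase-tree scripts sees the same identities after the location switch, so nothing is re-run.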
…ostgreSQL

The forked postgresql migration tree stored all JSON payloads as TEXT
(config_json / headers_json / settings_json / delivery_config / ... ~40
columns), inherited from the kingbase/h2/mysql dialect. On PostgreSQL these
are better modelled as JSONB: writes are validated as well-formed JSON at the
database boundary, and the door is open to GIN indexing / JSON queries later.

Changes (postgresql tree only; kingbase/h2/mysql untouched):
- Convert 46 columns across 18 migrations from TEXT to JSONB. Columns were
  whitelisted by name + verified against their entities and seed inserts; free
  text columns (description, source_code, encrypted_value, ...) keep TEXT.
- Six NOT NULL columns get a JSONB default ('{}'::jsonb, or '[]'::jsonb for the
  array-typed steps_json) so a missing write can't break the NOT NULL contract.
- Three columns stay TEXT on purpose: mate_tool.params_schema and
  mate_wiki_transformation.output_schema (arbitrary JSON-Schema text) and
  mate_message.metadata (frequently truncated half-structured blob).
- Rewrite V53's connection_mode recovery to JSONB-native ops. The TEXT version
  used TRIM/POSITION/SUBSTRING/CONCAT/REPLACE on config_json, which are invalid
  on jsonb; the JSONB merge operator `||` does the same idempotent key-set in
  one step while preserving the other keys.
- Add stringtype=unspecified to the datasource and flyway JDBC URLs so the
  driver sends String-bound JSON values as `unknown`, letting PostgreSQL coerce
  them into jsonb (covers both the JacksonTypeHandler path and plain String
  columns). Without it, setString -> jsonb fails at runtime.

Verified against a real PostgreSQL 16 container: all 153 migrations apply
cleanly, converted columns report data_type=jsonb (params_schema / metadata
stay text), invalid JSON is rejected by a jsonb column, and the V53 merge sets
connection_mode=websocket while preserving sibling keys.
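
As a rough illustration of the URL change described above, the following Java sketch appends the two pgjdbc connection parameters mentioned in this PR (`currentSchema` and `stringtype`) to a JDBC URL. The host, port, database name, and the `withParam` helper are assumptions made for the example; the real values live in application-postgres.yml.

```java
final class PgJdbcUrlSketch {

    /** Appends key=value to a JDBC URL, using '?' for the first parameter and '&' afterwards. */
    static String withParam(String url, String key, String value) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + key + "=" + value;
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and database name.
        String base = "jdbc:postgresql://localhost:5432/mateclaw";

        // currentSchema selects the schema; stringtype=unspecified makes the driver send
        // String parameters untyped so the server can coerce them into jsonb.
        String url = withParam(withParam(base, "currentSchema", "mateclaw"), "stringtype", "unspecified");
        System.out.println(url);
    }
}
```

The same parameter has to be present on both the datasource URL and the Flyway URL, since seed inserts and application writes go through separate connections.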
…tgresql tree

The postgresql migration tree (JSONB columns, JSONB-native V53) can only be
exercised on a real PostgreSQL server — H2/MySQL/Kingbase profiles never touch
it, so nothing in CI proved it actually applies. Add a shared base class and
two Testcontainers test classes against postgres:16-alpine:

- PostgresE2EBaseTest: shared base that points the datasource + Flyway at a
  throwaway PG container with the postgresql tree, postgre_sql dialect, the
  mateclaw schema (init script), and currentSchema/stringtype URL params
  (mirrors application-postgres.yml). @Testcontainers(disabledWithoutDocker = true)
  so a normal `mvn test` skips, rather than fails, where no Docker daemon exists.
- PostgresMigrationSmokeTest: asserts all 150+ migrations apply with no failed
  flyway_schema_history rows, and that the upgraded columns are physically
  jsonb while the excluded ones (params_schema, message.metadata) stay text.
- CronJobDeliveryConfigPgTest: drives the JacksonTypeHandler path end-to-end
  (CronJobEntity.deliveryConfig insert -> select round-trip), confirms the
  column is jsonb and queryable via ->> on the server, and that a jsonb column
  rejects malformed JSON.

testcontainers junit-jupiter + postgresql deps added at test scope (versions
managed by Spring Boot's testcontainers-bom).

Verified: both classes green (4 tests) against postgres:16-alpine. This also
exercises the DatabaseBootstrapRunner seed path (data-kingbase-zh.sql is the
PostgreSQL-family seed), which the migration tree alone doesn't cover.
…ostgreSQL

The base docker-compose.yml already defines a postgres service but wires
mateclaw-server to MySQL (SPRING_PROFILES_ACTIVE=mysql, depends_on: mysql), so
the postgres service is dead weight. Add an override that re-points the app at
PostgreSQL:

  docker compose -f docker-compose.yml -f docker-compose.pg.yml up -d

- mateclaw-server: switch to the postgres profile and DB_HOST=postgres; drop
  the inherited mysql dependency with `depends_on.mysql: !reset null` (compose
  merges depends_on maps, so the base mysql healthcheck dependency would
  otherwise survive and block startup). Requires Compose v2.20+.
- mysql: gate behind a `mysql` profile so it doesn't start under the override.
  (Compose still interpolates the base mysql service's ${VAR:?} at config-load
  regardless of profiles, so .env must still carry DB_PASSWORD /
  DB_ROOT_PASSWORD as placeholders — documented inline and in .env.example.)
- DB creds: the app reads DB_NAME / DB_USERNAME / DB_PASSWORD; map them to the
  base postgres service's PGSQL_DB_* values so one .env drives both.
- .env.example: add the PGSQL_DB_* block with usage notes.

The mateclaw schema is created by Flyway on startup (application-postgres.yml
init-sqls), so the container needs no init script.

Verified: `docker compose ... config` resolves cleanly (app depends only on
postgres + searxng, profile=postgres); postgres:16 with the app creds comes up
healthy and accepts the CREATE SCHEMA init. End-to-end app-on-PostgreSQL boot
is covered by the Testcontainers suite in the previous commit.
Docs only mentioned MySQL/H2 even though PostgreSQL is now a fully supported,
tested target. Bring the docs in line with the code:

- New docs/{zh,en}/database-postgresql.md: quick start (Docker + manual),
  connection string (currentSchema + the required stringtype=unspecified),
  JSONB design (why stringtype is required, which columns stay TEXT, how to add
  a GIN index later), MySQL differences, pg_dump backups, the transparent
  upgrade path from the old parasitic-kingbase-tree setup, and how to run the
  Testcontainers verification.
- CLAUDE.md: database section now lists MySQL/PostgreSQL/KingbaseES and the four
  migration trees (h2/mysql/kingbase/postgresql) with the "keep all four in
  sync" rule and the PG specifics.
- README.md: prose + capability table mention PostgreSQL 14+ / KingbaseES 8+.
- docs/{zh,en}/config.md: profile table gains postgres + kingbase rows, a
  PostgreSQL datasource block, the four-tree migration list, and a "switching
  to PostgreSQL" section.
- docs/{zh,en}/docker-deploy.md: a "use PostgreSQL instead of MySQL" section
  pointing at docker-compose.pg.yml.

Self-checked: all ./database-postgresql links resolve, referenced test classes
exist, stringtype is actually in application-postgres.yml, and no stale
"MySQL-only" claims remain.
@ncw1992120 ncw1992120 force-pushed the feat/postgres-independent-migration-tree branch from 9346028 to 7e26d15 Compare June 26, 2026 09:13
@mateaix

mateaix commented Jun 28, 2026

Owner

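Thank you very much for the work on making PostgreSQL a first-class citizen: an independent migration tree, JSONB, and integration tests. It is a large change (169 files), and we agree with the direction.

Precisely because it is large and touches the DB runtime layer shared by all dialects (migration tree layout, Flyway locations, TypeHandlers, and so on), the merge risk needs a more careful assessment. We are not merging it for now and will follow up after arranging a more thorough review. A few points to confirm first:

  1. How the new PG migration tree relates to the existing kingbase/ tree (also PostgreSQL-compatible): is it a 4th tree that coexists, or does it replace kingbase? Are the Flyway locations / profile wiring updated consistently for every dialect, without breaking the existing startup of h2/mysql/kingbase?
  2. How the JSONB columns map to the TEXT/JSON columns in the other dialects, and the round-trip consistency of the MyBatis TypeHandler.
  3. Whether the migration version numbers conflict with other PRs currently being merged (such as V161/V162/V163).

We will merge once these are confirmed and startup passes on all three dialects. Thanks for the hard work 🙏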

@ncw1992120

Contributor Author

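Thanks for the detailed review! Replies to the three points, one by one:

1. Relationship between postgresql/ and kingbase/

They coexist; nothing is replaced. postgresql/ is a 4th independent tree, and kingbase is unaffected:

| Profile | Flyway locations | Source |
| --- | --- | --- |
| default (dev/H2) | db/migration/h2 | application.yml:53 |
| mysql | db/migration/mysql | application-mysql.yml:22 |
| kingbase | db/migration/kingbase | application-kingbase.yml:42 (unchanged) |
| postgres | db/migration/postgresql | application-postgres.yml:38 (re-pointed from kingbase to postgresql) |

At the fork (commit 25c6ed31), postgresql/ and kingbase/ were byte-identical across 153 files. Flyway decides by version + checksum, not by path, so an existing PG deployment that pointed at kingbase sees unchanged checksums after the switch: zero behavior change. After the JSONB commit (a3a85e7f), 18 files diverge and the remaining 135 are still identical.

This does not break h2/mysql/kingbase startup: this PR does not touch those three trees or their profile configuration at all.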

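2. Mapping of JSONB columns to TEXT in the other dialects + TypeHandler round-trip

  • 18 migration files, 46 columns TEXT → JSONB (6 NOT NULL columns gain DEFAULT '{}'::jsonb / '[]'::jsonb)
  • The mysql/ and h2/ trees are untouched; those dialects still use TEXT
  • 3 columns deliberately stay TEXT: mate_tool.params_schema, mate_wiki_transformation.output_schema (arbitrary JSON-Schema text), and mate_message.metadata (a frequently truncated, half-structured blob)
  • V53 is rewritten as a JSONB-native || merge (it previously used TRIM/SUBSTRING/CONCAT, which are invalid on jsonb)
  • Zero Java changes: the JDBC URL gains &stringtype=unspecified as the bridge (officially recommended for PG JDBC), and the existing JacksonTypeHandler (for example CronJobEntity.deliveryConfig) round-trips writes and reads correctly (verified by the Testcontainers tests)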

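3. Migration version conflicts

No conflict. PR #427 branched from dev at d12e959a, where all four trees end at V159. The postgresql/ tree contains no V160/V161/V162/V163.

V160/V161 live on another branch (feat/agent-model-pref-migration, PR #435) and are not part of this PR. Flyway runs each tree independently per profile, so a version conflict can only occur within a single tree. When later migration PRs land on dev, they need to be synced to all four trees; that is a maintenance convention, not an issue with this PR.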
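
To make the "conflicts only exist within one tree" point concrete, here is a small Java sketch that flags duplicate versions among the file names of a single tree. The file names, the class name, and the simplified `V<version>__<description>.sql` pattern are illustrative assumptions; this is not Flyway's own validation logic.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MigrationVersionCheck {

    // Simplified versioned-migration name: V<version>__<description>.sql
    private static final Pattern VERSIONED = Pattern.compile("V(\\d+(?:[._]\\d+)*)__.+\\.sql");

    /** Returns the versions that appear more than once within ONE tree's file list. */
    static Set<String> duplicateVersions(List<String> fileNames) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (String name : fileNames) {
            Matcher m = VERSIONED.matcher(name);
            if (!m.matches()) {
                continue; // not a versioned migration
            }
            String version = m.group(1).replace('_', '.'); // treat 1_2 and 1.2 as the same version
            if (!seen.add(version)) {
                duplicates.add(version);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Hypothetical file names: the same version number in two different trees is fine.
        List<String> postgresql = List.of("V159__base.sql", "V160__pg_only_change.sql");
        List<String> mysql = List.of("V159__base.sql", "V160__other_change.sql");
        System.out.println("postgresql duplicates: " + duplicateVersions(postgresql));
        System.out.println("mysql duplicates: " + duplicateVersions(mysql));

        // Only two files with the same version inside one tree would collide.
        List<String> sameTree = List.of("V160__pg_only_change.sql", "V160__other_change.sql");
        System.out.println("same-tree duplicates: " + duplicateVersions(sameTree));
    }
}
```

A check along these lines, run per tree, could serve as a cheap guard when several migration PRs are in flight.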


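I have already run startup on all three dialects (H2 default / MySQL profile / Testcontainers PG 16), and all passed cleanly. If more verification is needed or there are other concerns, please let me know 🙏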

@ncw1992120

Contributor Author

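A supplementary note on why an independent postgresql/ tree is needed instead of simply reusing kingbase/. The core motivation is to unlock PostgreSQL's native data structures.

Why not reuse the kingbase tree

Although KingbaseES is built on the PG kernel, its SQL dialect and type system differ (JSONB operator support, default-value expression syntax, some DDL behavior). Staying on the kingbase tree leaves two options, and both have a cost:

  • Do not upgrade to JSONB: every JSON column stays TEXT, and PG's native JSON query and merge operators such as ->> / @> / || are unusable, which wastes PG's core strength.
  • Mix both in the kingbase tree: the same migration has to pass on KingbaseES and also use JSONB on native PG, which requires many dialect checks or compromise syntax and costs more to maintain than an independent tree.

Independent trees let each dialect take its own best path: PG uses JSONB and native operators, KingbaseES stays on TEXT (compatibility first), and neither constrains the other.

Practical benefits of the fork

1. Data integrity: validation at the database layer rather than the application layer

A TEXT column does no validation of JSON content, so bad data can be written silently. A JSONB column has the database engine validate the JSON on write, and malformed JSON is rejected outright (verified by tests). This moves the data-quality guardrail from the application layer down to the database layer, without relying on defensive parsing in Java code.

2. Query capability: structured JSON queries out of the box

JSONB columns can be queried directly with native operators such as ->> (extract a field), @> (containment), and || (merge), without pulling the whole column into the application and parsing it there. The current code uses these operators in 0 places (which is why this PR deliberately adds no GIN index; it would sit unused), but with an independent tree any future JSON query only needs a new PG migration, with no concern about breaking other dialects. This is an architectural "unlock", not an immediate benefit.

3. Storage and performance: binary storage + compression + an index path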

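| Dimension | TEXT | JSONB |
| --- | --- | --- |
| Storage format | Raw string | Parsed binary tree |
| Duplicate keys | Kept (may bloat) | Last one wins (deduplicated automatically) |
| Parsing at query time | Parsed on every query | Pre-parsed; queries traverse the tree directly |
| Indexing | Full-text indexes only | GIN supported (@>, ?, ...) |
| Size | Same as input | Usually smaller (whitespace removed + duplicate keys removed + compression) |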

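For frequently read JSON columns (such as mate_channel.config_json, mate_mcp_server.tools_cache_json, and mate_workflow_revision.graph_json), JSONB's pre-parsed storage means server-side filtering is possible without pulling each value into the application and running ObjectMapper.readValue on it. MyBatis currently still reads these columns as String, but the database-level query, index, and operator capabilities are now in place, so there is a clear path for future optimization.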

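4. The forced rewrite of V53 is the best example

The original V53 used TRIM/SUBSTRING/CONCAT/REPLACE string splicing to inject connection_mode into the config, which was the only option with a TEXT column. Once the column is JSONB, all of those string functions are invalid, so it was rewritten as a native || merge:

-- TEXT era (string splicing, fragile)
SET config_json = REPLACE(config_json, ..., CONCAT(...))
-- JSONB era (native merge, safe)
SET config_json = COALESCE(config_json, '{}'::jsonb) || '{"connection_mode":"websocket"}'::jsonb

This is exactly the value of an independent PG tree: PG gets to do things the PG way instead of being forced down to the lowest common denominator.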
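
For readers less familiar with the jsonb `||` operator, the Java sketch below models what the rewritten statement does for a flat top-level object: a NULL left side is treated as an empty object, right-hand keys overwrite, and other keys survive. It only models the semantics with in-memory maps and is not the migration itself; the class name, the `app_id` key, and the `webhook` value are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JsonbMergeSketch {

    /**
     * Models COALESCE(left, '{}'::jsonb) || right for top-level objects:
     * a shallow merge in which keys from the right-hand side win.
     * (Real jsonb does not preserve key order; the LinkedHashMap is only for readable output.)
     */
    static Map<String, Object> merge(Map<String, Object> left, Map<String, Object> right) {
        Map<String, Object> result = new LinkedHashMap<>();
        if (left != null) {
            result.putAll(left); // COALESCE: NULL behaves like an empty object
        }
        result.putAll(right);    // right-hand keys overwrite existing ones
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("app_id", "demo");              // hypothetical sibling key
        existing.put("connection_mode", "webhook");  // hypothetical old value
        Map<String, Object> patch = Map.of("connection_mode", "websocket");

        Map<String, Object> once = merge(existing, patch);
        Map<String, Object> twice = merge(once, patch);

        System.out.println("after merge: " + once);
        System.out.println("idempotent: " + once.equals(twice));
        System.out.println("null config: " + merge(null, patch));
    }
}
```

Applying the patch a second time leaves the value unchanged, which is the idempotent key-set behavior the commit message describes.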

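Impact on the architecture

With independent trees, the database dialects move from "one set of migrations that barely fits several engines" to "each engine has its own best-fit migrations":

h2/        → development and testing (in-memory database, simple types)
mysql/     → production MySQL (TEXT/LONGTEXT)
kingbase/  → Xinchuang (domestic-IT) KingbaseES (compatibility first, TEXT)
postgresql/ → production PostgreSQL (native JSONB, performance first)

New PG-only migrations start at V160 and do not reach back into the other three trees. This turns "dialect differences" from a runtime compromise into a correctness guarantee at compile time (migration time).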
