[ML] Add per-PR changelog YAML entries with schema validation#2920

Open
edsavage wants to merge 8 commits into elastic:main from edsavage:changelog-yaml-per-pr


@edsavage
Contributor

Summary

Replaces the monolithic docs/CHANGELOG.asciidoc approach with per-PR YAML changelog files, modelled after the Elasticsearch repository's changelog system.

Each PR that changes user-visible behaviour adds a small YAML file (docs/changelog/<PR_NUMBER>.yaml) with structured metadata:

pr: 2914
summary: Split build and test into separate pipeline steps
area: Build
type: enhancement
issues: []

What's included

  • docs/changelog/ — directory for per-PR YAML entries, with a README explaining the format and a JSON schema for validation
  • dev-tools/validate_changelogs.py — Python script that validates entries against the schema (filename convention, field types, enum values, PR number cross-check)
  • dev-tools/bundle_changelogs.py — Python script that generates consolidated release notes (Markdown or AsciiDoc) from individual YAML entries, grouped by type and area
  • Gradle tasks: validateChangelogs and bundleChangelogs for local developer use
  • Buildkite CI step — added to format_and_validation.yml.sh as a soft_fail step during rollout, with automatic skip for PRs labelled >test, >refactoring, >docs, or >build
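
The checks listed for validate_changelogs.py (filename convention, field types, enum values, PR number cross-check, extra fields) could look roughly like the sketch below. This is an illustration only, not the script in this PR: the function name validate_entry, the ALLOWED_TYPES and ALLOWED_AREAS sets, and the assumption that the YAML has already been parsed into a dict are all hypothetical.

```python
import re

# Hypothetical enum values for illustration only; the JSON schema in
# docs/changelog/ holds the authoritative lists.
ALLOWED_TYPES = {"bug", "enhancement", "feature", "breaking", "deprecation"}
ALLOWED_AREAS = {"Build", "Machine Learning"}
REQUIRED_FIELDS = {"pr": int, "summary": str, "area": str, "type": str}
OPTIONAL_FIELDS = {"issues": list}


def validate_entry(filename, entry):
    """Return a list of error strings for one already-parsed changelog entry."""
    errors = []

    # Filename convention: <PR_NUMBER>.yaml
    match = re.fullmatch(r"(\d+)\.yaml", filename)
    if match is None:
        errors.append(f"{filename}: name must be <PR_NUMBER>.yaml")

    # Required fields must be present and have the expected type.
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            errors.append(f"{filename}: missing field '{field}'")
        elif not isinstance(entry[field], expected):
            errors.append(f"{filename}: '{field}' must be {expected.__name__}")

    # Optional fields are type-checked only when present.
    for field, expected in OPTIONAL_FIELDS.items():
        if field in entry and not isinstance(entry[field], expected):
            errors.append(f"{filename}: '{field}' must be {expected.__name__}")

    # Anything outside the known field set is rejected.
    for field in entry:
        if field not in REQUIRED_FIELDS and field not in OPTIONAL_FIELDS:
            errors.append(f"{filename}: unexpected field '{field}'")

    # Enum values.
    if isinstance(entry.get("type"), str) and entry["type"] not in ALLOWED_TYPES:
        errors.append(f"{filename}: invalid type '{entry['type']}'")
    if isinstance(entry.get("area"), str) and entry["area"] not in ALLOWED_AREAS:
        errors.append(f"{filename}: invalid area '{entry['area']}'")

    # The pr field must agree with the number in the filename.
    if match and isinstance(entry.get("pr"), int) and int(match.group(1)) != entry["pr"]:
        errors.append(f"{filename}: pr {entry['pr']} does not match filename")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a CI step report every error in a file at once.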

Benefits

  • No more merge conflicts in docs/CHANGELOG.asciidoc
  • Simpler backports — changelog entry travels with the PR, no separate file to conflict
  • Structured data — enables automated release notes generation
  • Schema validation — catches errors early in CI

Rollout plan

The CI step is set to soft_fail: true initially, giving the team time to adopt the new workflow before making it mandatory.

Test plan

  • Validated validate_changelogs.py locally with valid and invalid YAML files
  • Verified error messages for: wrong filename, missing fields, invalid enums, PR number mismatch, extra fields
  • Tested bundle_changelogs.py markdown and asciidoc output
  • Confirmed Gradle validateChangelogs task wiring
  • CI build passes with the new Buildkite step

Made with Cursor

@prodsecmachine

prodsecmachine commented Feb 26, 2026

Snyk checks have passed. No issues have been found so far.

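| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
|--------|----------------------|----------|------|--------|-----|-----------|
|        | Open Source Security | 0        | 0    | 0      | 0   | 0 issues  |
|        | Licenses             | 0        | 0    | 0      | 0   | 0 issues  |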


@edsavage
Contributor Author

Review: Interaction with the existing monolithic changelog

The new per-PR YAML changelog system and the existing docs/CHANGELOG.asciidoc are completely independent — there's no integration between them. A few things to consider before merging:

No migration or replacement plan

The existing docs/CHANGELOG.asciidoc is untouched. There's no code to append bundled entries into it, nor any plan to deprecate it. Contributors could end up maintaining both systems in parallel.

No deduplication

Nothing prevents the same change from appearing in both the monolithic file and a per-PR YAML entry.

No release workflow

bundle_changelogs.py can output AsciiDoc, but there's no automation to merge its output into docs/CHANGELOG.asciidoc at release time, nor to clean out processed YAML files after a release.

Format mismatch

The existing changelog uses AsciiDoc macros like {ml-pull}2863[#2863] for links, while the bundler generates raw GitHub URLs. They wouldn't be stylistically consistent if combined.
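
To make the mismatch concrete, here is a small sketch of the two link styles. The helper names are hypothetical, and the raw URL shape (https://github.com/elastic/ml-cpp/pull/N) is an assumption about what the bundler emits rather than something taken from its source.

```python
def asciidoc_pr_link(pr):
    """Render a PR reference in the {ml-pull} attribute style used by CHANGELOG.asciidoc."""
    # Doubled braces in an f-string produce literal braces.
    return f"{{ml-pull}}{pr}[#{pr}]"


def raw_pr_link(pr, repo="elastic/ml-cpp"):
    """Render a plain GitHub URL (assumed shape of the bundler's current output)."""
    return f"https://github.com/{repo}/pull/{pr}"


print(asciidoc_pr_link(2863))  # {ml-pull}2863[#2863]
print(raw_pr_link(2863))       # https://github.com/elastic/ml-cpp/pull/2863
```

Switching the AsciiDoc renderer to the first form would keep bundled output consistent with the existing file.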

Grouping mismatch

The existing file groups by Elasticsearch version (== {es} version 9.4.0), while the YAML schema has no version field — entries are just grouped by type and area.
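
For reference, grouping by type and then area amounts to something like the following sketch. It is not the code in bundle_changelogs.py; the function names and the Markdown heading layout are hypothetical. Note that nothing in the grouping key carries a version, which is the gap described above.

```python
from collections import defaultdict


def group_entries(entries):
    """Group parsed changelog entries as grouped[type][area] -> list of entries."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        grouped[entry["type"]][entry["area"]].append(entry)
    return grouped


def render_markdown(entries):
    """Render grouped entries as Markdown, sorted by type, area, then PR number."""
    grouped = group_entries(entries)
    lines = []
    for entry_type in sorted(grouped):
        lines.append(f"## {entry_type}")
        for area in sorted(grouped[entry_type]):
            lines.append(f"### {area}")
            for entry in sorted(grouped[entry_type][area], key=lambda e: e["pr"]):
                lines.append(f"* {entry['summary']} (#{entry['pr']})")
    return "\n".join(lines)
```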

Suggestions

To be production-ready, this would need:

  • A decision on whether the monolithic file is being replaced or supplemented
  • A release-time workflow to merge YAML entries into the existing format (or replace it)
  • Cleanup of processed YAML files after each release
  • Consistent link/reference formatting between the two systems

@valeriy42
Contributor

@edsavage, I am very excited about this change. However, this should be the first step towards resolving #2217, and we need to decide on the complete plan for how to integrate the ML changelog into the ES release docs process. I expect that some design decisions will extend/adjust the YAML schema that you are using now. Once we have this, we should ditch CHANGELOG.asciidoc completely and only use the single schema.

I think a couple of things changed since 2022, which makes #2217 more approachable and relevant:

  1. Introduction of AI code assistants significantly reduced the implementation costs, and hence, the ROI argument of Dave Roberts does not have the same validity anymore.
  2. We have many new developers who are more comfortable with ES processes. Aligning the ML-CPP documentation process with ES will simplify their work and reduce errors.

Can you please plan the required changes for the complete integration of the release doc processes, and identify the open questions we still need to answer before moving forward?

@edsavage
Contributor Author

edsavage commented Mar 9, 2026

Design Plan: Integrating ml-cpp Changelogs into the ES Release Notes Pipeline

Following up on @valeriy42's request to plan the complete integration of ml-cpp changelog entries into the Elasticsearch release documentation process, resolving #2217.

Current State

  • ml-cpp maintains docs/CHANGELOG.asciidoc manually
  • At ES release time, someone manually copies relevant entries into the Elasticsearch release notes
  • Elasticsearch uses per-PR YAML files (docs/changelog/<PR>.yaml) validated against a JSON schema, bundled and rendered by Gradle tasks (generateReleaseNotes)
  • Machine Learning is already a valid area in the ES changelog schema

Proposed Design

Phase 1: Per-PR YAML changelogs in ml-cpp

Developers add structured changelog entries with each ml-cpp PR. The schema should align with the ES changelog schema (build-tools-internal/src/main/resources/changelog-schema.json) as closely as possible:

pr: 2914
summary: "Split build and test into separate pipeline steps"
area: Machine Learning
type: enhancement
issues: []

Key schema decisions:

  • Use the ES area enum — most entries would use Machine Learning, but some could use Inference or other valid ES areas
  • Use the ES type enum: bug, enhancement, feature, breaking, deprecation, etc.
  • Support highlight and breaking objects — same structure as ES, for entries warranting release highlights or breaking change notices
  • pr field — references the ml-cpp PR number (not an ES PR). This diverges from ES where pr is always an ES PR number
  • Add optional es-pr field — for cross-repo changes where a corresponding ES PR exists

Auto-generation of changelog entries

In the ES repo, elasticsearchmachine automatically generates changelog YAML files for PRs. When a PR is opened, the bot:

  1. Creates docs/changelog/<PR_NUMBER>.yaml with fields derived from the PR metadata (title, labels, linked issues)
  2. Pushes a commit to the PR branch (attributed to the PR author) with the message Update docs/changelog/<PR_NUMBER>.yaml
  3. Comments on the PR: "Hi @author, I've created a changelog YAML for you."
  4. If the PR title or labels change, the bot updates the file and comments: "I've updated the changelog YAML for you."

Developers can then customise the generated file if needed (e.g. adjusting the summary wording).

This automation likely runs via Homer or another internal Elastic tool configured in elastic/elasticsearch-infra. Since ml-cpp doesn't have this integration, we should replicate and build on it with a GitHub Action:

Proposed mechanism — changelog-check GitHub Action (runs on pull_request):

  1. On PR open/edit/label: the workflow checks whether docs/changelog/<PR_NUMBER>.yaml exists in the PR branch
  2. If missing and required: it auto-generates a changelog YAML file from PR metadata:
    • pr — from the PR number
    • summary — from the PR title
    • area — defaults to Machine Learning (can be overridden by PR labels)
    • type — inferred from PR labels (>bug → bug, >enhancement → enhancement, >feature → feature, >breaking → breaking, >deprecation → deprecation, default → enhancement)
    • issues — extracted from any Fixes #NNN / Closes #NNN references in the PR body
  3. Commit the generated file directly to the PR branch, so the developer can review and adjust it
  4. If PR metadata changes: update the generated file (matching the ES bot behaviour)
  5. If already manually edited: validate the existing file against the schema but don't overwrite the developer's changes
  6. If not required: skip silently (based on skip labels)

This replicates the ES workflow while being self-contained — no dependency on Homer or external tooling.
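
The generation step (step 2 above) could be sketched as below. This only covers deriving the fields from PR metadata; fetching the metadata and committing the file are left out. The names build_entry, LABEL_TO_TYPE and ISSUE_REF are hypothetical, and picking the first matching label when several type labels are present is an assumption.

```python
import re

# Label-to-type mapping as listed in the proposal.
LABEL_TO_TYPE = {
    ">bug": "bug",
    ">enhancement": "enhancement",
    ">feature": "feature",
    ">breaking": "breaking",
    ">deprecation": "deprecation",
}

# Matches "Fixes #123" / "Closes #123" (case-insensitive) in the PR body.
ISSUE_REF = re.compile(r"\b(?:Fixes|Closes)\s+#(\d+)", re.IGNORECASE)


def build_entry(pr_number, title, labels, body):
    """Derive a changelog entry dict from PR metadata."""
    # First recognised type label wins; fall back to "enhancement".
    entry_type = next(
        (LABEL_TO_TYPE[label] for label in labels if label in LABEL_TO_TYPE),
        "enhancement",
    )
    # De-duplicate and sort issue numbers so the output is stable.
    issues = sorted({int(number) for number in ISSUE_REF.findall(body or "")})
    return {
        "pr": pr_number,
        "summary": title.strip(),
        "area": "Machine Learning",  # default area from the proposal
        "type": entry_type,
        "issues": issues,
    }
```

Keeping this as a pure function of the PR metadata makes it easy to reuse from both a GitHub Action and the CLI helper mentioned below.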

As an alternative (or complement), we could provide a CLI helper:

# Generate changelog YAML from PR metadata
./dev-tools/generate_changelog.sh 2914

Validation: CI validates entries against the schema on every PR (soft-fail initially, then hard-fail).

Location: docs/changelog/<PR_NUMBER>.yaml in the ml-cpp repo.

Skip logic: PRs labelled >test, >refactoring, >docs, >build, or >non-issue would not require a changelog entry.
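
As a sketch, the skip check is a set-disjointness test on the PR's labels. The names SKIP_LABELS and changelog_required are hypothetical; the label list is the one given above.

```python
# Labels that exempt a PR from needing a changelog entry.
SKIP_LABELS = {">test", ">refactoring", ">docs", ">build", ">non-issue"}


def changelog_required(labels):
    """Return True unless the PR carries at least one skip label."""
    return SKIP_LABELS.isdisjoint(labels)
```

For example, changelog_required([">bug"]) is True, while changelog_required([">docs", ">bug"]) is False because a single skip label is enough to exempt the PR.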

Phase 2: Integration with the ES release notes pipeline

Three possible approaches, in order of preference:

Option A — ES build pulls ml-cpp changelogs at bundle time (recommended)

Extend BundleChangelogsTask in elastic/elasticsearch to read changelogs from ml-cpp in addition to the local docs/changelog/ directory:

  1. Add a Gradle configuration for external changelog sources (repo + path)
  2. At bundle time, fetch ml-cpp's docs/changelog/ directory (via git clone or GitHub API)
  3. Merge ml-cpp entries into the bundle, adjusting PR links to point to elastic/ml-cpp

Requires a PR to elastic/elasticsearch build-tools-internal and buy-in from the ES build/release team.

Option B — CI pushes ml-cpp entries to the ES repo

When an ml-cpp PR is merged, a GitHub Actions workflow creates a corresponding YAML file in the ES repo via PR:

  1. ml-cpp CI creates docs/changelog/ml-cpp-<PR>.yaml in the ES repo
  2. Uses the standard ES schema with a naming convention to avoid PR number collisions
  3. The pr field would need special handling (or an external_pr / source_repo field)

Simpler to implement but adds cross-repo coupling and noise to the ES repo.

Option C — Release-time script (interim)

A script collects ml-cpp changelogs, converts them to ES-compatible format, and creates a single PR in the ES repo at release time. Less automation but lowest risk — good as an interim step while working toward Option A.
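
A rough sketch of the collection step for Option C, under stated assumptions: entries are copied unchanged into a local ES checkout, and the ml-cpp- filename prefix from Option B is reused to avoid PR number collisions. The function name export_entries and its parameters are hypothetical, and any format conversion or PR creation is out of scope here.

```python
import shutil
from pathlib import Path


def export_entries(source_dir, target_dir, prefix="ml-cpp-", dry_run=False):
    """Copy every *.yaml entry from source_dir into target_dir with a filename prefix.

    Returns the destination filenames; with dry_run=True nothing is written.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    exported = []
    for path in sorted(source.glob("*.yaml")):
        destination = target / f"{prefix}{path.name}"
        exported.append(destination.name)
        if not dry_run:
            shutil.copyfile(path, destination)
    return exported
```

A dry-run mode is useful here because the target is a different repository, so it is worth previewing the file list before touching it.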

Phase 3: Deprecate CHANGELOG.asciidoc

Once the YAML system is integrated:

  1. Stop updating docs/CHANGELOG.asciidoc
  2. Replace its contents with a pointer to the ES release notes
  3. Add a pruneChangelogs equivalent that removes YAML files after they are included in a release

Phase 4: Backport considerations

Changelog YAML files travel with the PR — they are just files in the repo. When backporting:

  • The YAML file is cherry-picked along with the code change
  • The same entry appears on the version branch, which is correct for that version's release notes
  • No special handling needed (this is an advantage of per-file changelogs vs a monolithic file)

Open Questions

  1. PR number linkage — The ES schema uses ES PR numbers. ml-cpp entries reference ml-cpp PRs. How should these appear in the generated release notes? Options:

    • Use {ml-pull} macro format (existing convention in CHANGELOG.asciidoc)
    • Extend the ES schema with a source_repo field
    • Use a filename convention (e.g., ml-cpp-2914.yaml)
  2. Cross-repo changes — When a change spans both ES and ml-cpp (e.g., new ML feature with a Java API surface), where does the changelog entry live? Both repos? One with a cross-reference? The current convention is to mark the ES PR as >non-issue and reference both {ml-pull} and {es-pull} in the ml-cpp changelog.

  3. ES build-tools ownership — Extending BundleChangelogsTask requires buy-in from the ES build/release team. Should we propose this, or start with a simpler integration path (Option C)?

  4. Homer / elasticsearchmachine integration: elasticsearchmachine auto-generates changelog YAML files for ES PRs (creates the file, commits it to the PR branch, and updates it when PR metadata changes). This automation likely runs via Homer or similar internal tooling in elastic/elasticsearch-infra. Key questions:

    • Could this automation be extended to also handle ml-cpp PRs?
    • Would the ES team prefer ml-cpp to use the same tooling, or is an independent GitHub Action acceptable?
    • Who owns the elasticsearchmachine changelog automation and can we request changes?
  5. Version scoping — ES changelogs are pruned per release. How do we handle the version boundary in ml-cpp? Should we prune after each ES release that includes ml-cpp changes?

  6. Which PRs need entries? — Should every ml-cpp PR have a changelog entry, or only user-facing changes? What labels indicate "no changelog needed"?

  7. Historical entries — Should we backfill existing CHANGELOG.asciidoc entries as YAML, or draw a line and only use YAML going forward?

Suggested Next Steps

  1. Align on the schema — confirm that using the ES area/type enums works for ml-cpp
  2. Investigate elasticsearchmachine — determine how the ES changelog auto-generation works and whether it can be extended to ml-cpp, or whether an independent GitHub Action is preferred (question 4)
  3. Answer question 3 — reach out to the ES build/release team about the preferred integration method
  4. Implement Phase 1 — update this PR to use the ES-compatible schema, including the changelog-check GitHub Action for auto-generation
  5. Start with Option C — build a release-time script as an interim integration while pursuing Option A

@valeriy42
Contributor

Your plan looks good to me.

Cross-repo changes — When a change spans both ES and ml-cpp (e.g., new ML feature with a Java API surface), where does the changelog entry live? Both repos? One with a cross-reference? The current convention is to mark the ES PR as >non-issue and reference both {ml-pull} and {es-pull} in the ml-cpp changelog.

IMO, if we do this, we shouldn't rely on unwritten convention, and both PRs should be bundled together.

Version scoping — ES changelogs are pruned per release. How do we handle the version boundary in ml-cpp? Should we prune after each ES release that includes ml-cpp changes?

I don't understand this question. ES and ml-cpp have the same version boundary.

Which PRs need entries? — Should every ml-cpp PR have a changelog entry, or only user-facing changes? What labels indicate "no changelog needed"?

We handle it the same way as it is handled in ES and ml-cpp today.

Historical entries — Should we backfill existing CHANGELOG.asciidoc entries as YAML, or draw a line and only use YAML going forward?

We should backfill for the 4 branches we are supporting: main/9.4.0, 9.3.2, 9.2.7, and 8.19.13. This shouldn't be much work.

With regard to #2217, @pugnascotia do you see any problems with Phase 2 Option A (ES build pulls ml-cpp changelogs at bundle time) in Ed's comment above?

@pugnascotia

With regard to #2217, @pugnascotia do you see any problems with Phase 2 Option A (ES build pulls ml-cpp changelogs at bundle time) in Ed's comment above?

I think this is the most appropriate option.

edsavage and others added 5 commits April 9, 2026 13:06
Replace the monolithic CHANGELOG.md with per-PR YAML changelog files
in docs/changelog/. Each PR that changes user-visible behaviour adds
a small YAML file (<PR_NUMBER>.yaml) with structured metadata (area,
type, summary). This eliminates merge conflicts in CHANGELOG.md and
simplifies backports.

Includes:
- JSON schema for validating changelog entries
- Python validation script (validate_changelogs.py)
- Python bundler script (bundle_changelogs.py) for release notes
- Gradle tasks: validateChangelogs, bundleChangelogs
- Buildkite CI step (soft-fail during rollout)
- Skip validation via >test, >refactoring, >docs, >build labels

Made-with: Cursor

The validate-changelogs step ran on ml-check-style:2 (Alpine with
only clang/bash/git, no Python), causing "pip: command not found".

Switch the step to python:3.11-slim and install git on demand. Use
python3 -m pip with --break-system-packages for PEP 668 compat.

Made-with: Cursor

Replaces the ml-cpp-specific changelog schema with the exact
Elasticsearch changelog schema so that entries can be consumed
directly by the ES release notes pipeline (Phase 2 Option A).

Key changes:
- area enum: ES-wide values (most entries use "Machine Learning")
- type enum: adds breaking-java, known-issue, new-aggregation,
  security, upgrade
- Adds highlight, breaking, and deprecation sub-objects
- pr/area not required for known-issue and security types
- Validator allows descriptive filenames for entries without a pr
- Bundler handles all new types and entries without pr/area
- AsciiDoc output uses {ml-pull} macros for consistency

Made-with: Cursor

Adds structured changelog entries for all changes in the active
release branches: main/9.4.0, 9.3.x, 9.2.x, and 8.19.x.

Also includes entries for recent hardening PRs (elastic#3008, elastic#3015)
and the flaky test fix (elastic#3017) that were not yet in
CHANGELOG.asciidoc.

Made-with: Cursor

@edsavage
Contributor Author

edsavage commented Apr 9, 2026

Update: Investigation findings and next steps

Step 1: Schema alignment — done ✅

The changelog schema has been updated to be an exact copy of the Elasticsearch changelog schema. This means ml-cpp entries are directly consumable by the ES release notes pipeline with no transformation needed.

Key changes:

  • area enum now uses ES values (Machine Learning, Inference, etc.) instead of ml-cpp-specific values
  • type enum expanded to include all ES types (breaking-java, known-issue, security, upgrade, etc.)
  • Added highlight, breaking, and deprecation sub-objects matching the ES schema exactly
  • pr/area conditionally required (not needed for known-issue/security types)
  • Validator updated: allows descriptive filenames for entries without a PR number
  • Bundler updated: handles all new types, AsciiDoc output uses {ml-pull} macros
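
The conditional requirement on pr and area can be sketched as follows. This assumes type and summary are always required and that known-issue and security are the only exempt types, as described above; the helper names are hypothetical and the real rule lives in the JSON schema.

```python
# Types for which pr and area are not required, per the list above.
TYPES_WITHOUT_PR_OR_AREA = {"known-issue", "security"}


def required_fields(entry_type):
    """Return the set of fields required for an entry of the given type."""
    required = {"type", "summary"}  # assumed to be required for every entry
    if entry_type not in TYPES_WITHOUT_PR_OR_AREA:
        required |= {"pr", "area"}
    return required


def missing_fields(entry):
    """Return the sorted names of required fields absent from a parsed entry."""
    return sorted(required_fields(entry.get("type")) - set(entry))
```

This is also why the validator has to accept descriptive filenames: an entry without a pr has no number to name the file after.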

Step 2: Backfill — done ✅

Created changelog entries for all changes on active branches: main/9.4.0, 9.3.x, 9.2.x, 8.19.x (10 entries). Also includes recent hardening PRs (#3008, #3015) and flaky test fix (#3017).

Step 3: Homer bot investigation — done ✅

Correcting my earlier assumption: changelog YAML files are auto-generated in the ES repo by Homer (elastic/elasticsearch-infra/homer).

How it works:

  1. Homer is a Spring Boot app running in the elastic-apps Kubernetes cluster
  2. It listens for GitHub PR webhook events (OPENED, EDITED, LABELED, etc.)
  3. On PR creation, it generates a changelog YAML from PR metadata (title → summary, labels → area/type)
  4. It pushes the commit to the PR branch attributed to the PR author (using OAuth tokens), not as elasticsearchmachine
  5. It comments "Hi @user, I've created a changelog YAML for you" via the elasticsearchmachine account
  6. When labels change, it updates the file and comments "I've updated the changelog YAML for you"
  7. Users can also comment @elasticmachine generate changelog to regenerate

Configuration is per-repository in homer/k8s/production.yaml under github.changelogs:

changelogs:
  - repository: elastic/elasticsearch
    applicableBranches: [...]
    ignoredLabels: ['>test', '>refactoring', ...]
    filteredLabels: [...]
    areaOverrides: {...}

To enable Homer for ml-cpp, someone with access to elastic/elasticsearch-infra would need to add an entry for elastic/ml-cpp. The ES delivery team (@pugnascotia) owns Homer.

Proposed next steps

  1. Request Homer integration — ask the ES delivery team to add elastic/ml-cpp to Homer's github.changelogs config. This would give us auto-generation for free. @pugnascotia is the original author of both the changelog system and Homer — since he's already endorsed Option A on this PR, this seems like a natural ask.

  2. Phase 2 Option A — extend BundleChangelogsTask in elastic/elasticsearch to pull ml-cpp changelogs at bundle time. Since our schema is now identical to ES, this should be straightforward. This resolves #2217 (Generate changelog YAML files for ML?).

  3. Interim (Option C) — while the above are being implemented, we can ship a release-time script that collects ml-cpp changelogs and creates a PR in the ES repo.

@valeriy42 @pugnascotia — thoughts on requesting Homer integration? Is this something the delivery team would be open to?

edsavage force-pushed the changelog-yaml-per-pr branch from e2d58c8 to 3050d2f on April 9, 2026 01:24
@edsavage
Contributor Author

edsavage commented Apr 9, 2026

Correction: Homer changelog configuration location

The changelog config is in homer/src/main/resources/application.yml (Spring Boot defaults), not in production.yaml. Currently only elastic/elasticsearch is configured:

github:
  changelogs:
    - repository: 'elastic/elasticsearch'
      applicableBranches: [master, main, '8\.(x|\d+)', '7\.17']
      ignoredLabels: ['>non-issue', '>refactoring', '>docs', '>test', '>test-failure', '>test-mute', '>tech-debt', ':Delivery/Build', ':Delivery/Cloud', ':Delivery/Tooling', 'backport', 'WIP']
      filteredLabels: ['>new-field-mapper']
      areaOverrides:
        ml: Machine Learning
        # ...

To add ml-cpp, we'd need:

  1. A new entry in this changelogs list (in application.yml or overridden in production.yaml)
  2. A GitHub webhook on elastic/ml-cpp pointing to homer.app.elstc.co/webhook/github for pull_request events

Example config for ml-cpp:

    - repository: 'elastic/ml-cpp'
      applicableBranches: [main, '9\.\d+', '8\.\d+']
      ignoredLabels: ['>non-issue', '>refactoring', '>docs', '>test', '>build']
      areaOverrides: {}

Since almost all ml-cpp entries will use area: Machine Learning, the areaOverrides would likely be empty. The ChangelogUtils.createChangelogEntry() derives the area from PR labels — we'd need to check how ml-cpp labels map to the ES area enum.
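
To sanity-check the proposed applicableBranches patterns, here is a quick sketch. It assumes each entry is a regular expression that must match the whole branch name; that is my reading of the existing config, not something confirmed from Homer's source, and the function name is hypothetical.

```python
import re

# Patterns from the example ml-cpp config above.
APPLICABLE_BRANCHES = ["main", r"9\.\d+", r"8\.\d+"]


def branch_is_applicable(branch, patterns=APPLICABLE_BRANCHES):
    """Return True if the branch name fully matches any configured pattern."""
    return any(re.fullmatch(pattern, branch) for pattern in patterns)


for name in ["main", "9.3", "8.19", "7.17", "feature/9.3"]:
    print(name, branch_is_applicable(name))
```

Under that assumption main, 9.3 and 8.19 match, while 7.17 and feature/9.3 do not.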

elasticsearchmachine added 3 commits April 9, 2026 14:19
Interim tool (Option C) to bridge ml-cpp changelogs into the ES
release notes pipeline until the full BundleChangelogsTask
integration (Option A) is implemented.

The script copies changelog YAML entries from docs/changelog/ to
the ES repo's docs/changelog/ with a 'ml-cpp-' filename prefix
to avoid PR number collisions.

Supports:
- --dry-run to preview what would be exported
- --target to specify the ES docs/changelog/ directory
- --create-pr to automatically create a PR in the ES repo
- --prune to delete source entries after a successful release
- --version to label the export with a version number

Made-with: Cursor

- Validate all entries against the JSON schema before exporting
- Verify the target directory is inside an ES checkout (checks for
  build.gradle, settings.gradle, and docs/changelog/)
- Detect pre-existing files at the destination:
  - Identical files are silently skipped
  - Different files show a unified diff and prompt the user to
    overwrite, skip, or abort the entire export
- Use the verified ES repo root for --create-pr instead of fragile
  parent-of-parent path assumption

Made-with: Cursor

Adds the optional source_repo field to the changelog schema, matching
the corresponding change in the Elasticsearch repo. This field tells
the ES release notes generator which GitHub repo to use for PR links.

The export script now injects source_repo: elastic/ml-cpp into
exported entries automatically, so they link correctly in the ES
release notes.

Made-with: Cursor

4 participants