[ML] Automate version bump in CI pipeline #3018

Open
edsavage wants to merge 2 commits into elastic:main from edsavage:feature/version-bump-automation


edsavage commented Apr 7, 2026

Summary

Replaces the manual block step in the version-bump pipeline with automated version bump logic.

Flow

  1. Bump version — checks out the target $BRANCH, updates elasticsearchVersion in gradle.properties to $NEW_VERSION, commits as elasticsearchmachine, pushes directly to the branch
  2. Fetch DRA Artifacts — polls artifact URLs until the new version is available (unchanged)
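
The polling in step 2 is unchanged by this PR and is done by the pipeline's existing tooling. Purely to illustrate the poll-until-available idea, and not as a description of how the pipeline actually does it, a minimal shell sketch could look like this. The function name `wait_for_url`, its arguments, and the use of `curl` are all assumptions.

```shell
# Hypothetical helper, not part of the PR: poll a URL until it responds
# successfully or the attempt budget runs out.
# $1 = URL to check, $2 = max attempts, $3 = delay between attempts (seconds)
wait_for_url() {
    wfu_attempt=1
    while [ "$wfu_attempt" -le "$2" ]; do
        # --fail makes curl exit non-zero on HTTP errors such as 404
        if curl --silent --fail --head --output /dev/null "$1"; then
            return 0
        fi
        # Only wait if another attempt is still to come
        if [ "$wfu_attempt" -lt "$2" ]; then
            sleep "$3"
        fi
        wfu_attempt=$((wfu_attempt + 1))
    done
    echo "gave up waiting for $1 after $2 attempts" >&2
    return 1
}
```

A caller would pass the artifact URL for the new version, for example `wait_for_url "$ARTIFACT_URL" 60 30`, where `ARTIFACT_URL` is a placeholder.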

Pattern

Follows the established Elasticsearch repo pattern for automated commits from CI:

  • elasticsearchmachine / infra-root+elasticsearchmachine@elastic.co as committer
  • HTTPS push using the Buildkite agent's checkout token
  • git diff-index --quiet HEAD for idempotency
  • git pull --ff-only before push to handle concurrent commits
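
As a rough sketch of how the committer identity and the `git diff-index` idempotency check from the list above fit together (the function name `commit_if_changed`, the `-a` flag, and the index refresh are assumptions; the real bump_version.sh may be structured differently):

```shell
# Hypothetical helper sketching the committer identity and idempotency
# check described above; the real bump_version.sh may differ.
# $1 = commit message
commit_if_changed() {
    # Local (repository) scope: no --global, so other checkouts are untouched
    git config user.name "elasticsearchmachine"
    git config user.email "infra-root+elasticsearchmachine@elastic.co"

    # Refresh cached stat info so diff-index only reports real content changes
    git update-index -q --refresh

    # Exit status 0 means tracked files match HEAD: nothing to commit
    if git diff-index --quiet HEAD --; then
        echo "nothing to do"
        return 0
    fi

    # -a stages every modified tracked file before committing
    git commit --quiet -a -m "$1"
    echo "committed"
}
```

Running it twice in a row after a single edit to gradle.properties should commit once and then report "nothing to do", which is the behaviour the idempotency test in the test plan checks for.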

New file

  • dev-tools/bump_version.sh — standalone script with DRY_RUN=true support for safe testing

Pipeline changes

  • Removed the block step (no manual approval needed)
  • Removed the "blocked" Slack notification
  • Fetch DRA Artifacts now depends on bump-version instead of the block
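
The resulting step dependency can be pictured with the following sketch. It is written as a shell function that prints JSON only to keep the examples on this page in one language; the real steps are generated by job-version-bump.json.py, and the labels, commands, and placeholder polling command here are assumptions. `key` and `depends_on` are standard Buildkite step attributes.

```shell
# Hypothetical sketch of the two-step structure; labels, commands and the
# echo placeholder are illustrative, not the generator's real output.
emit_version_bump_steps() {
    cat <<'EOF'
{
  "steps": [
    {
      "label": "Bump version",
      "key": "bump-version",
      "command": "dev-tools/bump_version.sh"
    },
    {
      "label": "Fetch DRA Artifacts",
      "key": "fetch-dra-artifacts",
      "depends_on": "bump-version",
      "command": "echo 'placeholder for the existing artifact polling'"
    }
  ]
}
EOF
}
```

With the block step gone, nothing in this structure waits for a human: the second step starts as soon as the first succeeds.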

Portability

  • sed -i handles both macOS (BSD) and Linux (GNU) variants
  • git config uses local scope (not --global)
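
One common way to make `sed -i` work on both variants is to detect GNU sed and fall back to the BSD form otherwise. The sketch below assumes that approach and assumes gradle.properties holds a line of the form `elasticsearchVersion=<version>`; the function name `set_es_version` is hypothetical and the script in this PR may do it differently.

```shell
# Hypothetical helper: rewrite the elasticsearchVersion line in place.
# $1 = properties file, $2 = new version
set_es_version() {
    if sed --version >/dev/null 2>&1; then
        # GNU sed accepts --version; its -i takes no separate argument
        sed -i "s/^elasticsearchVersion=.*/elasticsearchVersion=$2/" "$1"
    else
        # BSD sed (macOS) rejects --version; its -i needs a suffix argument,
        # and an empty one means "no backup file"
        sed -i '' "s/^elasticsearchVersion=.*/elasticsearchVersion=$2/" "$1"
    fi
}
```

The version is interpolated straight into the sed expression, so this sketch is only safe for plain version strings such as 9.4.0 that contain no `/` or `&`.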

Test plan

  • CI passes
  • Dry-run: NEW_VERSION=99.99.99 BRANCH=test/version-bump-dry-run DRY_RUN=true — commit created with correct author/message, no push
  • Real push: same test without DRY_RUN — commit pushed to throwaway branch successfully
  • Idempotency: re-running with the same version produces "nothing to do" and no new commit
  • Edge cases: missing NEW_VERSION / BRANCH fail with clear errors; non-existent branch fails at checkout
  • Pipeline JSON: job-version-bump.json.py generates correct step structure with the bump-version → fetch-dra-artifacts dependency
  • Throwaway branch cleaned up after testing
  • Repo-level branch protection: elasticsearchmachine added to bypass list on main, 9.3, 9.2, 9.1, 8.18
  • Blocker: Elastic org-level ruleset [org] Require a PR applies to main and versioned branches (via glob patterns like refs/heads/[0-9].[0-9]) and has no bypass actors. Only org admins can update this. Other teams automating version bumps will hit the same issue — coordinate with Release Engineering.
  • End-to-end Buildkite pipeline test (requires PR merged to main and org-level bypass resolved)
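
For the edge cases in the test plan, a fail-fast check on required parameters might look like the sketch below. The helper name `require_var` and the error wording are assumptions; the PR only states that missing NEW_VERSION or BRANCH fails with a clear error.

```shell
# Hypothetical helper: fail with a message on stderr when the named
# environment variable is unset or empty.
# $1 = variable name
require_var() {
    # Indirect lookup of the variable whose name is in $1
    eval "rv_value=\${$1:-}"
    if [ -z "$rv_value" ]; then
        echo "ERROR: $1 must be set" >&2
        return 1
    fi
}
```

A script would typically call `require_var NEW_VERSION && require_var BRANCH || exit 1` before touching the checkout.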

Replace the manual block step in the version-bump pipeline with an
automated step that:
1. Checks out the target branch
2. Updates elasticsearchVersion in gradle.properties to $NEW_VERSION
3. Commits as elasticsearchmachine
4. Pushes directly to the branch (no PR needed)

Follows the same pattern as Elasticsearch's automated Lucene snapshot
updates (.buildkite/scripts/lucene-snapshot/update-es-snapshot.sh).

The Fetch DRA Artifacts step now depends on the bump step, ensuring
the version is updated before polling for artifacts at the new version.

Made-with: Cursor

prodsecmachine commented Apr 7, 2026

Snyk checks have passed. No issues have been found so far.

| Status | Scan Engine | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
|  | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |
|  | Licenses | 0 | 0 | 0 | 0 | 0 issues |

edsavage commented Apr 7, 2026

⚠️ Branch protection blocker

All target branches (main, 9.3, 8.19, etc.) have branch protection requiring 1 PR approval. elasticsearchmachine has write access to this repo, which is insufficient to bypass PR requirements and push directly.

In the elasticsearch repo, this works because elasticsearchmachine has admin access (with enforce_admins: false), allowing it to bypass PR review requirements for automated commits like the Lucene snapshot updates.

Action needed (repo admin): Either:

  1. Upgrade elasticsearchmachine to admin on elastic/ml-cpp (matches the elasticsearch repo setup), or
  2. Add elasticsearchmachine to the branch protection "Allow specified actors to bypass required pull requests" list

Without this, the git push in bump_version.sh will be rejected with a 403.

edsavage commented Apr 7, 2026

Gap analysis vs version bump automation spec

Reviewed the full Version Bump Automation PSI spec. This PR covers the patch workflow basics. Here's what's done and what's still needed:

Done

  • Patch version bump (update gradle.properties, commit, push)
  • Idempotency (skip if version already matches)
  • DRA artifact polling (json-watcher plugin)
  • Slack notifications (#machine-learn-build)
  • Parameters (NEW_VERSION, BRANCH)

Gaps to address

1. Minor version workflow not implemented
The spec requires a minor workflow (feature freeze day) that:

  • Creates a new minor branch from upstream (e.g., 9.3 from main)
  • Sets the version on the new branch
  • Triggers DRA snapshot + staging builds on the new branch
  • In parallel: bumps the upstream branch to the next version
  • Requires 3 artifact set verifications (not the current 2)

2. WORKFLOW parameter ignored
The pipeline receives ${WORKFLOW} (minor or patch) but the bump script doesn't use it. Needs branching logic.

3. DRA builds not explicitly triggered
The script pushes the version commit but doesn't explicitly trigger DRA snapshot/staging builds. Need to confirm: does the version bump commit automatically trigger the ml-cpp-snapshot-builds pipeline, or do we need an explicit Buildkite API trigger?

4. No retry loop for push failures
The spec says team pipelines should handle their own retries. If another commit lands between checkout and push, the push fails. Should add pull-rebase-retry logic.

5. Branch protection blocker
Already flagged separately — elasticsearchmachine needs admin access (or bypass list) to push to protected branches.

6. SLSA 0.1 compliance check needed
The spec notes some repos may require human approvals. Need to confirm whether ml-cpp has this requirement — if so, we'd need a PR-based approach instead of direct push.

7. Team failure notifications
Spec requires immediate team notification on failure. Current Slack notification goes to #machine-learn-build but doesn't ping @ml-team.

8. Skip ITs/E2E for version bump commits
Spec recommends skipping integration/E2E tests for version bumps. Since we push directly, the commit triggers normal CI. No mechanism to flag it as a version-bump-only commit that should skip heavy tests.
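
For gap 4, a pull-rebase-retry loop could be sketched as follows. This is not in the PR; the function name `push_with_retry`, the default of three attempts, and the `RETRY_DELAY` variable are all assumptions.

```shell
# Hypothetical helper: rebase onto the remote branch and push, retrying
# when another commit lands between the pull and the push.
# $1 = branch, $2 = max attempts (default 3)
push_with_retry() {
    pwr_max=${2:-3}
    pwr_attempt=1
    while [ "$pwr_attempt" -le "$pwr_max" ]; do
        # Replay the local bump commit on top of whatever landed upstream,
        # then try the push again
        if git pull --rebase origin "$1" && git push origin "HEAD:$1"; then
            return 0
        fi
        echo "push attempt $pwr_attempt/$pwr_max failed" >&2
        # Only wait if another attempt is still to come
        if [ "$pwr_attempt" -lt "$pwr_max" ]; then
            sleep "${RETRY_DELAY:-5}"
        fi
        pwr_attempt=$((pwr_attempt + 1))
    done
    return 1
}
```

A retry only helps with races against concurrent commits. A push rejected by branch protection (gap 5) fails the same way on every attempt, so the attempt budget should stay small.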

Not applicable

  • Auto-approval/auto-merge — we push directly, no PR (assuming branch protection is resolved)
  • GitHub workflow triggering — we use Buildkite natively

Adds a DRY_RUN=true option that performs all steps (checkout, sed,
commit) but skips the final git push. Useful for testing the pipeline
and for local verification.

Also makes sed portable across macOS/Linux and uses local git config
instead of --global.

Made-with: Cursor

edsavage commented Apr 8, 2026

Status update — testing complete

Testing results

All local and remote testing has been completed successfully:

| Test | Result |
| --- | --- |
| Dry run (DRY_RUN=true) | Commit created locally with correct author (elasticsearchmachine), message ([ML] Bump version to 99.99.99), and diff; no push |
| Real push (throwaway branch) | Commit pushed to test/version-bump-dry-run branch successfully |
| Idempotency (same version twice) | Second run exits 0 with "nothing to do", no new commit |
| Missing NEW_VERSION | Fails immediately with clear error message |
| Missing BRANCH | Fails immediately with clear error message |
| Non-existent branch | Fails at git checkout |
| Pipeline JSON | job-version-bump.json.py generates correct step structure: bump-version step, fetch-dra-artifacts depends on it, block notification removed |
| Throwaway branch | Cleaned up |

Changes since initial commit

Added DRY_RUN=true env var support to bump_version.sh:

  • Performs all steps (checkout, pull, sed, verify, commit) but skips the final git push
  • Reports commit details (hash, author, message) for inspection
  • Provides undo instructions
  • Useful for CI testing and local verification
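
The dry-run guard can be pictured as a thin wrapper around the final push. This is a sketch under assumptions: the function name `push_unless_dry_run`, the message wording, and the suggested undo command are illustrative and may not match what bump_version.sh prints.

```shell
# Hypothetical helper: push the current commit unless DRY_RUN=true.
# $1 = branch to push
push_unless_dry_run() {
    if [ "${DRY_RUN:-false}" = "true" ]; then
        echo "DRY_RUN=true: skipping push to $1"
        # Example undo hint; drops the most recent local commit
        echo "To undo the local commit: git reset --hard HEAD~1"
        return 0
    fi
    git push origin "HEAD:$1"
}
```

Defaulting the variable to `false` means a real push only happens when DRY_RUN is unset or set to anything other than the literal string `true`.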

Also fixed:

  • Portable sed -i (handles both macOS BSD and Linux GNU)
  • git config uses local scope instead of --global (safer in CI)

Branch protection status

Repo-level bypass — configured elasticsearchmachine on all protected branches:

  • main, 9.3, 9.2, 9.1, 8.18

Org-level ruleset blocker — the Elastic org has an [org] Require a PR ruleset that applies to main and versioned branches via glob patterns (e.g. refs/heads/[0-9].[0-9]). This ruleset requires 1 approving review and has no bypass actors. Only org admins can add bypass actors to this ruleset.

This is a shared blocker — any team automating version bumps via direct push will hit the same issue. Recommend coordinating with Release Engineering to add elasticsearchmachine as a bypass actor on the org ruleset (or confirming their planned approach for this).
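
To make the glob example concrete, the sketch below uses shell `case` patterns as a stand-in for the ruleset's matching. This is an approximation: GitHub evaluates ruleset patterns with its own fnmatch-style rules, the full pattern list is not shown in this thread, and the function name is hypothetical.

```shell
# Hypothetical helper: approximate which refs the example patterns cover.
# $1 = full ref name, e.g. refs/heads/9.3
ref_matches_example_patterns() {
    case "$1" in
        refs/heads/main) return 0 ;;
        # One digit, a literal dot, one digit: 9.3 matches, 8.18 does not
        refs/heads/[0-9].[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under these two patterns alone, `refs/heads/9.3` and `refs/heads/main` match, while a two-digit minor such as `refs/heads/8.18` would need a further pattern like `refs/heads/[0-9].[0-9][0-9]`. The comment above says "patterns like", so the real ruleset presumably contains more than the one quoted.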

What's ready to merge

The patch version bump workflow is functionally complete and tested. It can be merged once the org-level bypass is resolved, or sooner if we decide to accept the blocker and address it separately: until the bypass is in place, the pipeline will simply fail at the push step with a clear error.

Remaining gaps (from earlier analysis)

See the gap analysis comment above for items like minor workflow, DRA trigger confirmation, and retry logic — these are follow-up work, not blockers for the initial patch workflow.
