feat(BA-5419): support percentage values for rolling update max_surge and max_unavailable #10563

Open: jopemachine wants to merge 13 commits into main from feat/BA-5419-percentage-rolling-update

jopemachine (Member) commented Mar 26, 2026

Resolves BA-5419.

Summary

  • Add support for float fraction values (e.g., 0.25 for 25%) for max_surge and max_unavailable rolling update parameters, in addition to existing absolute integer values
  • Float values are resolved to absolute counts at execution time based on desired replica count, with max_surge rounding up and max_unavailable rounding down (matching Kubernetes semantics)
  • Validation ensures float fractions are between 0.0 and 1.0, and existing integer-only configurations continue to work without changes
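The validation rule above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name `validate_int_or_fraction` is hypothetical, the bounds are assumed to be inclusive (the summary only says "between 0.0 and 1.0"), and rejecting negative integers is likewise an assumption.

```python
from __future__ import annotations


def validate_int_or_fraction(value: int | float, field: str) -> int | float:
    """Hypothetical validator: ints are absolute counts, floats are fractions."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError(f"{field}: expected int or float, got bool")
    if isinstance(value, int):
        # Assumption: absolute counts must be non-negative.
        if value < 0:
            raise ValueError(f"{field}: absolute count must be >= 0")
        return value
    if isinstance(value, float):
        # Assumption: both ends of the range are inclusive.
        # NaN fails this comparison and is rejected as well.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{field}: fraction must be within [0.0, 1.0]")
        return value
    raise ValueError(f"{field}: expected int or float")
```

Under this sketch the Python type carries the meaning: the integer `1` means one replica, while the float `1.0` means 100% of the desired replicas.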

Rounding semantics (follows Kubernetes convention)

| Field | Rounding | Reason |
|---|---|---|
| max_surge | ceil (rounds up) | Determines how many extra replicas can be created; rounding up ensures that any non-zero fraction yields at least 1 extra replica (for desired_replicas ≥ 1), so the update can make progress |
| max_unavailable | floor (rounds down) | Determines how many replicas can be taken down; rounding down preserves availability conservatively |

Example (desired_replicas = 10, both set to 25%):

  • max_surge: ceil(10 * 0.25) = ceil(2.5) = 3 → up to 13 replicas simultaneously
  • max_unavailable: floor(10 * 0.25) = floor(2.5) = 2 → at least 8 replicas always available

Both directions round toward safety: surge allows more creation headroom, unavailable restricts how many can go down.
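The arithmetic in the example above can be reproduced with a short sketch. The function names `resolve_max_surge` and `resolve_max_unavailable` are hypothetical and are not taken from the PR; the only behavior assumed is what the description states (ceil for surge, floor for unavailable, integers passed through).

```python
from __future__ import annotations

import math


def resolve_max_surge(value: int | float, desired: int) -> int:
    """Floats are fractions of `desired`, rounded up; ints pass through."""
    if isinstance(value, float):
        return math.ceil(desired * value)
    return value


def resolve_max_unavailable(value: int | float, desired: int) -> int:
    """Floats are fractions of `desired`, rounded down; ints pass through."""
    if isinstance(value, float):
        return math.floor(desired * value)
    return value


desired = 10
surge = resolve_max_surge(0.25, desired)              # ceil(2.5) = 3
unavailable = resolve_max_unavailable(0.25, desired)  # floor(2.5) = 2
print(desired + surge)        # 13: upper bound on simultaneous replicas
print(desired - unavailable)  # 8: lower bound on available replicas
```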

Test plan

  • DTO validation tests: float fraction input accepted, boundary values, negative rejected
  • RollingUpdateSpec tests: fraction resolve with rounding, passthrough for integers, deadlock rejection for 0+0
  • Strategy evaluation tests: correct route creation/termination with fractional specs
  • GQL mutation tests pass unchanged
  • CI passes lint/check/test
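The "deadlock rejection for 0+0" case in the test plan can be illustrated with a sketch. It assumes the check runs on the resolved counts, which the description does not confirm; `resolve_budgets` is a hypothetical name, and `ValueError` stands in for whatever exception the PR actually raises.

```python
from __future__ import annotations

import math


def resolve_budgets(
    max_surge: int | float, max_unavailable: int | float, desired: int
) -> tuple[int, int]:
    """Resolve both fields to absolute counts and reject a 0+0 result."""
    surge = math.ceil(desired * max_surge) if isinstance(max_surge, float) else max_surge
    unavailable = (
        math.floor(desired * max_unavailable)
        if isinstance(max_unavailable, float)
        else max_unavailable
    )
    # With no surge and no unavailability budget, the update can neither
    # create a new replica nor take an old one down, so it never progresses.
    if surge == 0 and unavailable == 0:
        raise ValueError("max_surge and max_unavailable both resolve to 0")
    return surge, unavailable
```

One consequence of flooring is worth noting under this assumption: a small non-zero max_unavailable fraction such as 0.05 resolves to 0 for 10 replicas, so combining it with max_surge = 0 would hit the same deadlock as an explicit 0+0.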


📚 Documentation preview 📚: https://sorna--10563.org.readthedocs.build/en/10563/


📚 Documentation preview 📚: https://sorna-ko--10563.org.readthedocs.build/ko/10563/

Copilot AI review requested due to automatic review settings March 26, 2026 07:07
@github-actions github-actions bot added size:L 100~500 LoC comp:manager Related to Manager component comp:common Related to Common component labels Mar 26, 2026
jopemachine added a commit that referenced this pull request Mar 26, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine marked this pull request as draft March 26, 2026 07:08
Copilot AI (Contributor) left a comment


Pull request overview

Adds support for percentage-based max_surge / max_unavailable for rolling updates (Kubernetes-style rounding), expanding validation/DTO/GQL surfaces and updating strategy evaluation to resolve percentages at runtime.

Changes:

  • Introduce shared validate_int_or_percent() / resolve_int_or_percent() helpers and apply them to rolling update config/spec.
  • Update rolling update strategy to resolve surge/unavailable against desired replicas at execution time.
  • Expand DTO/GQL types and add unit tests covering percentage parsing and rounding behavior.
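For the percent-string form that this review describes, input normalization might look like the sketch below. It is illustrative only: `normalize_int_or_percent` is a hypothetical name and is not the signature of the `validate_int_or_percent()` helper mentioned above, and the accepted range of 0% to 100% is an assumption.

```python
from __future__ import annotations

import re

# ASCII digits followed by a percent sign, e.g. "25%".
_PERCENT_RE = re.compile(r"(\d{1,3})%", re.ASCII)


def normalize_int_or_percent(value: int | str) -> int | str:
    """Return an int for absolute counts or a canonical "N%" string."""
    if isinstance(value, bool):
        raise ValueError("expected int or percent string, got bool")
    if isinstance(value, int):
        if value < 0:
            raise ValueError("absolute count must be >= 0")
        return value
    if isinstance(value, str):
        text = value.strip()
        # Numeric strings such as "3" normalize to the integer 3.
        if text.isascii() and text.isdigit():
            return int(text)
        match = _PERCENT_RE.fullmatch(text)
        # Assumption: percentages above 100% are rejected.
        if match and int(match.group(1)) <= 100:
            return f"{int(match.group(1))}%"
    raise ValueError(f"invalid int-or-percent value: {value!r}")
```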

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| tests/unit/manager/sokovan/deployment/strategy/test_rolling_update.py | Adds strategy/spec unit tests for percentage inputs, rounding, and deadlock prevention cases. |
| tests/unit/common/dto/manager/v2/deployment/test_request.py | Adds DTO validation tests ensuring percent strings are accepted/rejected as expected and numeric strings normalize to int. |
| src/ai/backend/manager/sokovan/deployment/strategy/rolling_update.py | Switches from raw spec fields to desired-based resolution helpers before computing budgets. |
| src/ai/backend/manager/models/deployment_policy/row.py | Updates RollingUpdateSpec schema to accept int-or-percent and adds resolver helpers. |
| src/ai/backend/manager/api/gql/deployment/types/policy.py | Broadens GQL fields to JSON to carry either int or percent string. |
| src/ai/backend/common/types.py | Adds shared int-or-percent validation and resolution helpers. |
| src/ai/backend/common/dto/manager/v2/deployment/types.py | Broadens response DTO fields to `int |
| src/ai/backend/common/dto/manager/v2/deployment/request.py | Broadens request DTO to accept int-or-percent with a shared validator. |
| changes/10563.feature.md | Adds changelog entry for the new percentage-based behavior. |

@github-actions github-actions bot added the area:docs Documentations label Mar 26, 2026
@jopemachine jopemachine force-pushed the feat/BA-5419-percentage-rolling-update branch 8 times, most recently from c874cd8 to 15ed8b7 Compare March 26, 2026 07:51
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Mar 26, 2026
@jopemachine jopemachine force-pushed the feat/BA-5419-percentage-rolling-update branch from 2768df4 to ce69b1e Compare March 26, 2026 08:08
@jopemachine jopemachine added this to the 26.4 milestone Mar 26, 2026
@jopemachine jopemachine force-pushed the feat/BA-5419-percentage-rolling-update branch from 7ce41a4 to 2dbf998 Compare March 26, 2026 08:12
@github-actions github-actions bot added the require:db-migration Automatically set when alembic migrations are added or updated label Mar 26, 2026
jopemachine and others added 5 commits March 27, 2026 10:37
… and max_unavailable

Add support for percentage-based values (e.g., "25%") in addition to
absolute integers for max_surge and max_unavailable rolling update
parameters.  Percentage values are resolved to absolute counts at
execution time based on the desired replica count, with max_surge
rounding up and max_unavailable rounding down (matching Kubernetes
semantics).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and max_unavailable

Add support for float fraction values (e.g., 0.25 for 25%) in addition
to absolute integers for max_surge and max_unavailable rolling update
parameters.  Float values are resolved to absolute counts at execution
time based on the desired replica count, with max_surge rounding up and
max_unavailable rounding down (matching Kubernetes semantics).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: octodog <mu001@lablup.com>
Update test files to use IntOrPercent instead of plain int for
RollingUpdateSpec max_surge/max_unavailable after type change.
Also merge diverged alembic heads from main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jopemachine jopemachine force-pushed the feat/BA-5419-percentage-rolling-update branch from 4946496 to 7647381 Compare March 27, 2026 01:37
…-check logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jopemachine and others added 7 commits March 27, 2026 10:53
… Field descriptions

- Create IntOrPercentTypeGQL as a standalone enum.Enum class with @gql_enum
  decorator instead of wrapping the DTO enum via gql_enum() function call
- Move JSON examples from description to examples in RollingUpdateConfigInput fields
- Rename _iop helper to _int_or_percent in tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ng update tests

- Add docstring and use match statement in RollingUpdateSpec._resolve
- Refactor TestRollingUpdateConfigInput with RollingUpdateValidScenario dataclass
- Add max_unavailable invalid input test cases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ate in tests

- Replace hardcoded added_version="26.4.0" with NEXT_RELEASE_VERSION
- Use InvalidEndpointState instead of bare Exception in test assertions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… values

- Add field_validator to coerce plain int to IntOrPercent for legacy DB rows
- Fix type annotation in migration eb9441fcf90a that broke SQLAlchemy 2.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The legacy int values in strategy_spec won't exist in practice,
so no backward compatibility shim or data migration is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Field(default=None, description=...) with plain None default
for extra_mounts in CreateRevisionInput and AddRevisionInput, and
remove unused pydantic Field import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: octodog <mu001@lablup.com>
@jopemachine jopemachine requested a review from a team March 27, 2026 05:48
@jopemachine jopemachine marked this pull request as ready for review March 27, 2026 05:48
Labels

  • area:docs (Documentations)
  • comp:common (Related to Common component)
  • comp:manager (Related to Manager component)
  • require:db-migration (Automatically set when alembic migrations are added or updated)
  • size:XL (500~ LoC)


2 participants