Skip to content

Let metrics degrade gracefully when no deploy workflow is configured #58

Description

@coygeek

Summary

deploybot metrics currently fails hard when the configured deployment workflow does not resolve to one active workflow. That makes metrics unusable for repositories that want merge-queue timing but do not have a deployment workflow yet.

For a first pilot or library-only repo, queue metrics such as request-to-queue and queue-to-merge are still useful even when merge-to-live cannot be computed.

Evidence

Running metrics against this repository with the example policy fails because the example policy configures a Deploy workflow but the repository only has CI active:

error: configured workflow 'Deploy' did not resolve to one active workflow

The command attempted:

deploybot --repository Forward-Future/DeployBot \
  --config .mergequeue.example.toml \
  metrics --limit 10 --json

The active workflow list contains only:

[
  {
    "name": "CI",
    "path": ".github/workflows/ci.yml",
    "state": "active"
  }
]

Impact

Metrics become all-or-nothing:

  • A repo without deployment cannot view queue throughput.
  • A repo with a renamed deploy workflow gets no partial timing report.
  • During onboarding, the user cannot use metrics to prove merge-queue improvements until deployment is fully wired.

This is especially awkward because DeployBot can be valuable before production deployment is configured: it can still coordinate ready PRs and report queue behavior.

Proposed behavior

If deployment workflows cannot be resolved, deploybot metrics should still return partial samples:

  • request_to_queue_seconds
  • queue_to_merge_seconds
  • request_to_live_seconds = null
  • merge_to_live_seconds = null
  • a warning field explaining that deploy workflow timing was unavailable

For example:

{
  "sample_count": 10,
  "warnings": [
    "configured deploy workflow 'Deploy' did not resolve to one active workflow; live timing omitted"
  ],
  "queue_to_merge_seconds": {
    "count": 8,
    "p50": 42.0,
    "p95": 120.0,
    "max": 120.0
  },
  "merge_to_live_seconds": {
    "count": 0,
    "p50": null,
    "p95": null,
    "max": null
  }
}

The existing hard failure can remain for commands that actually need deployment workflow dispatch or verification. This issue is only about read-only metrics.

Acceptance criteria

  • deploybot metrics --json returns queue timing when deploy workflows are missing or unresolved.
  • The output includes a machine-readable warning when live/deploy timing is omitted.
  • Text mode prints the same warning without a traceback.
  • Existing behavior remains fail-closed for merge/follow/react paths that require deployment workflow identity.
  • Tests cover missing deploy workflow, renamed deploy workflow, and no merged PR samples.

Verification

  • Add/update unit tests for delivery_metrics.
  • Run python -m unittest discover -s tests.
  • Run ruff check ..

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions