From 5b1c63a8cf9eb9620a545e3ca8527b40c397499f Mon Sep 17 00:00:00 2001
From: kj-podonos
Date: Thu, 18 Jun 2026 16:23:20 +0900
Subject: [PATCH 1/2] ci(publish): retry testpypi-smoke install to tolerate index propagation lag

TestPyPI's /simple/ index is eventually-consistent (Fastly CDN): a just-
uploaded version can take tens of seconds to a couple minutes to become
installable. The smoke step ran a single pip install seconds after the
test-pypi upload, racing the index and failing spuriously with "No
matching distribution".

Wrap the install in a retry loop (10x30s, ~5min ceiling, early-exit on
success) and add --no-cache-dir so pip does not replay a cached negative
index response. Route the version through env: for script-injection
safety.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .github/workflows/publish.yml | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
index 637cfa0..a940757 100644
--- a/.github/workflows/publish.yml
+++ b/.github/workflows/publish.yml
@@ -121,12 +121,30 @@ jobs:
     steps:
       - uses: actions/setup-python@v6
         with: { python-version: '3.12' }
-      - run: |
-          pip install \
-            --index-url https://test.pypi.org/simple/ \
-            --extra-index-url https://pypi.org/simple/ \
-            onepin==${{ needs.build.outputs.version }} \
-          && onepin --version
+      - name: Install just-published version from TestPyPI (tolerate index propagation lag)
+        env:
+          VERSION: ${{ needs.build.outputs.version }}
+          # TestPyPI's /simple/ index is eventually-consistent (Fastly CDN): a version can
+          # take tens of seconds to a couple minutes to become installable after the
+          # test-pypi job uploads it. This job runs seconds later, so a single pip install
+          # races the index and spuriously fails with "No matching distribution". Retry
+          # until it propagates; --no-cache-dir avoids replaying a cached negative response.
+        run: |
+          set -euo pipefail
+          for attempt in $(seq 1 10); do
+            if pip install --no-cache-dir \
+                --index-url https://test.pypi.org/simple/ \
+                --extra-index-url https://pypi.org/simple/ \
+                "onepin==${VERSION}"; then
+              echo "installed onepin==${VERSION} on attempt ${attempt}"
+              onepin --version
+              exit 0
+            fi
+            echo "attempt ${attempt}/10: onepin==${VERSION} not on TestPyPI index yet; sleeping 30s"
+            sleep 30
+          done
+          echo "::error::onepin==${VERSION} never became installable from TestPyPI after 10 attempts (~5 min)"
+          exit 1
   notify-failure:
     if: failure()
     needs: [build, smoke-install, test-pypi, testpypi-smoke]

From b07013b1293738326df28671a8c85b38c8b1d934 Mon Sep 17 00:00:00 2001
From: kj-podonos
Date: Thu, 18 Jun 2026 16:47:48 +0900
Subject: [PATCH 2/2] docs(publish): add testpypi-smoke index-propagation race to pre-mortem

Document the failure mode fixed in the prior commit (TestPyPI /simple/
index propagation lag racing the post-publish smoke install) and its
retry mitigation as row 8 of the pre-mortem table.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/PUBLISH.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/PUBLISH.md b/docs/PUBLISH.md
index a8a81e5..dc80f18 100644
--- a/docs/PUBLISH.md
+++ b/docs/PUBLISH.md
@@ -119,6 +119,7 @@ a manual settings change in the PyPI project — the workflow cannot self-config
 | 5 | **Double-publish to the immutable PyPI index** | A re-dispatch / re-run / rollback re-fires the promote for an already-published version | `promote-prod.yml` **preflight**: `GET https://pypi.org/pypi/onepin//json`; HTTP `200` ⇒ `::error::` abort before upload. `concurrency: { group: promote-prod, cancel-in-progress: false }` serializes promotes. PyPI itself is the final backstop (rejects re-uploads). |
 | 6 | **Wrong-version / wrong-sha promoted** to customers | Promote builds off a branch HEAD (a `.devN`), or promotes a tag whose API is ahead of the deployed spec | The build checks out `refs/tags/` (qualified — never a same-named branch) and **asserts a clean `^[0-9]+\.[0-9]+\.[0-9]+$`** (a `.devN`/local aborts). **Per-sha pinning** (shipped): on a prod dispatch carrying spec commit S, the resolver iterates tags newest-first and promotes the newest tag whose `.spec-sha` is an ancestor-or-equal of S in the spec repo (`compare` base...head → `ahead`/`identical` = safe); any tag ahead of prod is skipped. Any API error during classification aborts the whole resolve (fail closed) — the immutable index is never touched with an uncertain result. |
 | 7 | **Rollback re-dispatch republishes/downgrades** | A non-forward dispatch (e.g. a rollback) reaches the receiver | The PyPI lane only acts on `environment == 'prod'` dispatches; the immutable-index preflight (row 5) blocks a re-publish of an existing version. (The backend additionally gates `notify-sdk-repos` on `github.event_name == 'push'` so a rollback `workflow_dispatch` doesn't re-dispatch.) |
+| 8 | **`testpypi-smoke` flakes on a fresh `.devN`** — TestPyPI publish succeeds but the post-publish smoke install fails with `No matching distribution` | TestPyPI's `/simple/` index is eventually-consistent (Fastly CDN); the smoke job runs seconds after `test-pypi` uploads and **races the index** before the new version propagates | `testpypi-smoke` **retries** the install (10×30s, ~5 min ceiling, early-exit on success) with `--no-cache-dir` so pip never replays a cached negative index response. A genuinely uninstallable artifact still fails after the budget (the install never succeeds); the version flows via `env:` for script-injection safety. Note: `--no-cache-dir` covers pip's *client* cache, not Fastly *edge* caching — the time budget, not the flag, is what outlasts CDN lag. |
 
 ## Test plan (4 layers)
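Reviewer note: the bounded-retry pattern used by the `testpypi-smoke` step can be sketched as a standalone POSIX shell helper. This is an illustrative sketch under stated assumptions, not code from either patch: the `retry` function name, its argument order, and the `retry_*` variable names are all hypothetical. One deliberate difference from the workflow step is that this version does not sleep after the final failed attempt.

```shell
# Hypothetical helper (not part of the patch):
#   retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is exhausted.
retry() {
  retry_max="$1"
  retry_delay="$2"
  shift 2
  retry_attempt=1
  while [ "$retry_attempt" -le "$retry_max" ]; do
    # Early exit on the first success, mirroring the workflow's `exit 0`.
    if "$@"; then
      echo "succeeded on attempt ${retry_attempt}/${retry_max}"
      return 0
    fi
    # Only sleep when another attempt remains (the workflow step sleeps every time).
    if [ "$retry_attempt" -lt "$retry_max" ]; then
      echo "attempt ${retry_attempt}/${retry_max} failed; sleeping ${retry_delay}s" >&2
      sleep "$retry_delay"
    fi
    retry_attempt=$((retry_attempt + 1))
  done
  echo "giving up after ${retry_max} attempts" >&2
  return 1
}
```

With such a helper, the smoke install might be written as `retry 10 30 pip install --no-cache-dir --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "onepin==${VERSION}"`, followed by `onepin --version`. As row 8 notes, the time budget (attempts times delay) is what outlasts CDN lag, so those two numbers are the ones to tune.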