ScienceLiveHub · annefou · Jun 13, 2026 · Jun 13, 2026
diff --git a/.github/workflows/wayback.yml b/.github/workflows/wayback.yml
@@ -0,0 +1,64 @@
+name: Archive web sources in the Wayback Machine
+
+# On release (and via workflow_dispatch), submit this project's web sources to the
+# Internet Archive Wayback Machine ("Save Page Now") for an immutable, timestamped
+# snapshot. Two kinds of source are captured:
+#   1. The deployed MyST / Jupyter Book site. The repo *source* is preserved in
+#      Software Heritage (swh-save.yml) and Zenodo (docker.yml), but the rendered
+#      *site* is not — this captures it.
+#   2. Any URLs listed in `wayback-urls.txt` at the repo root (one per line, '#'
+#      comments allowed). Use this for Mode-B / paperless claim sources — blogs,
+#      design notes, README pages — that Software Heritage cannot archive because
+#      they are prose, not code.
+#
+# Uses anonymous Save Page Now (no secrets), which is rate-limited. If you hit the
+# limits, switch to the authenticated SPN2 API: add Internet Archive S3-style keys
+# as the secrets IA_ACCESS_KEY / IA_SECRET_KEY, map them into the step's `env:`,
+# and POST --data-urlencode "url=${url}" to https://web.archive.org/save with
+#   -H "Accept: application/json" -H "Authorization: LOW ${IA_ACCESS_KEY}:${IA_SECRET_KEY}"
+#
+# STATUS: written, NOT yet executed. Validate via Actions -> Run workflow and check
+# the run log + the resulting web.archive.org snapshot URLs.
+
+on:
+  release:
+    types: [published]
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  wayback:
+    runs-on: ubuntu-latest
+    continue-on-error: true   # best-effort archival must never fail the release
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build URL list
+        run: |
+          repo="${{ github.repository }}"          # owner/name
+          owner="${repo%%/*}"
+          name="${repo#*/}"
+          pages="https://${owner}.github.io/${name}/"
+          {
+            echo "${pages}"
+            if [ -f wayback-urls.txt ]; then
+              sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' wayback-urls.txt | grep -vE '^(#|$)' || true
+            fi
+          } > /tmp/wayback-urls.txt
+          echo "URLs to archive:"; cat /tmp/wayback-urls.txt
+
+      - name: Submit to Wayback Machine (Save Page Now)
+        run: |
+          while IFS= read -r url; do
+            [ -z "${url}" ] && continue
+            echo "::group::Archiving ${url}"
+            code=$(curl -sS --max-time 120 -o /dev/null -w '%{http_code}' \
+              -A "forrt-replication-template wayback workflow" \
+              "https://web.archive.org/save/${url}") || code="000"
+            echo "HTTP ${code} for ${url}"
+            echo "Latest snapshot: https://web.archive.org/web/2/${url}"
+            echo "::endgroup::"
+            sleep 5   # be polite to the Internet Archive endpoint
+          done < /tmp/wayback-urls.txt
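Since the workflow is marked as not yet executed, it can help to reproduce its "Build URL list" step locally and see exactly what a release would submit. A minimal sketch: `build_wayback_urls` is a hypothetical helper, not part of the workflow. It assumes that a repo named `<owner>.github.io` is a user/organisation site served at the root (the workflow itself always uses `/<name>/`), and it trims stray whitespace and CRLF line endings from `wayback-urls.txt` before filtering.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the "Build URL list" step in wayback.yml.
# build_wayback_urls is a hypothetical helper name, not part of the workflow.

# Usage: build_wayback_urls OWNER/NAME [URL_FILE]
# Prints the deployed-site URL, then the cleaned entries of URL_FILE.
build_wayback_urls() {
  local repo="$1" file="${2:-wayback-urls.txt}"
  local owner="${repo%%/*}" name="${repo#*/}"
  # Assumption: a repo named <owner>.github.io is a user/organisation site,
  # which GitHub Pages serves at the root rather than under /<name>/.
  if [ "${name,,}" = "${owner,,}.github.io" ]; then
    echo "https://${name}/"
  else
    echo "https://${owner}.github.io/${name}/"
  fi
  if [ -f "${file}" ]; then
    # Trim surrounding whitespace (including a trailing CR from CRLF files),
    # then drop blank lines and '#' comments.
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "${file}" \
      | grep -vE '^(#|$)' || true
  fi
}
```

Run it from the repo root as `build_wayback_urls owner/name` (placeholder values). If the site uses a custom domain, the derived `github.io` URL only redirects, so listing the custom-domain URL in `wayback-urls.txt` is probably the safer choice.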
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -122,8 +122,9 @@ Exit: `nanopubs/drafts/05_outcome.md` is written with the conclusion sentence, t
 - A GitHub release is cut with a Zenodo-facing description (no internal ops detail, no bot signatures — `docs/cicd-conventions.md` § Release notes are Zenodo descriptions).
 - Zenodo mints a concept DOI; the value is written into `CITATION.cff` and `codemeta.json`.
 - The Docker image is pushed to GHCR via `.github/workflows/docker.yml` and (optionally) archived on Zenodo.
+- On release, two further archival workflows fire automatically (best-effort, never block the release): `swh-save.yml` requests **Software Heritage** Save Code Now so the released revision gets a permanent, forge-agnostic **SWHID**, and `wayback.yml` snapshots the deployed Jupyter Book site plus any URLs in `wayback-urls.txt` (Mode-B / paperless claim sources) in the **Internet Archive Wayback Machine**. See `docs/cicd-conventions.md` § Preservation.
 
-Exit: the release page is live, the Zenodo record exists, and `nanopubs/PUBLISHED.md` lists the source + image DOIs.
+Exit: the release page is live, the Zenodo record exists, and `nanopubs/PUBLISHED.md` lists the source + image DOIs. (Software Heritage + Wayback archival are best-effort and may complete asynchronously.)
 
 ### Phase 5 — FORRT nanopublication chain
 

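Because the Software Heritage and Wayback captures described above are best-effort and asynchronous, a quick check that they actually landed is useful before citing a SWHID or a snapshot. A minimal sketch, assuming `curl` is on `PATH`: the helper names are hypothetical, and the endpoints are the public Software Heritage REST API (`/api/1/revision/<sha1_git>/`) and the Wayback Availability API (`https://archive.org/wayback/available`). The Availability API can lag behind a fresh capture, so a miss shortly after a release is not conclusive.

```shell
#!/usr/bin/env bash
# Sketch: one-shot checks that the asynchronous, best-effort archival landed.
# Helper names are hypothetical; requires curl.

# Succeeds (exit 0) if Software Heritage has archived the given git commit.
swh_has_revision() {
  local sha="$1" code
  code=$(curl -sS --max-time 30 -o /dev/null -w '%{http_code}' \
    "https://archive.softwareheritage.org/api/1/revision/${sha}/") || code="000"
  [ "${code}" = "200" ]
}

# Succeeds if the Wayback Machine reports at least one snapshot of the URL.
wayback_has_snapshot() {
  local url="$1" body
  body=$(curl -sS --max-time 30 --get --data-urlencode "url=${url}" \
    "https://archive.org/wayback/available") || return 1
  # Unarchived URLs come back with "archived_snapshots": {} (no "closest" key).
  printf '%s' "${body}" | grep -q '"closest"'
}
```

For example, after cutting a release tagged `v1.0.0` (a hypothetical tag name), `swh_has_revision "$(git rev-parse 'v1.0.0^{commit}')"` reports whether that revision is in the archive yet.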
diff --git a/docs/chain-decision-tree.md b/docs/chain-decision-tree.md
@@ -55,6 +55,21 @@ Don't treat PCC as a generic "research question" template that you can use whene
 
 The mirror mistake — using Quote-with-comment to anchor a question-rooted chain — happens when there's a *related* paper but no specific sentence we're testing. In that case, cite the related paper at the AIDA step's *Supported by other publications* group, but anchor the chain in PICO/PCC.
 
+## Paperless claims (Mode-B) — a claim stated in code / README / blog, not a paper
+
+Not every testable claim lives in a paper. A tool's README, a design note, or a blog post can state a falsifiable claim about how a system behaves. These sources are first-class: *not everyone who advances knowledge writes a paper, and they shouldn't have to write one to make a testable, citable claim.* But a paperless source has no DOI, which constrains how the chain can start:
+
+- **The Quote-with-comment `Cited DOI` field is DOI-only** (it expects a bare `10.x/y`, not a URL). So you cannot quote a raw GitHub / blog / SWHID URL there.
+
+Two clean ways to handle a paperless claim:
+
+1. **Deposit the source to get a DOI, then go paper-rooted.** Archive the code (Software Heritage → SWHID; and/or Zenodo → DOI) or the prose (Zenodo deposit → DOI; Wayback for fixity). Only the Zenodo deposit yields a DOI; a SWHID or a Wayback snapshot adds fixity but cannot go in the `Cited DOI` field. Once the source has a DOI, use the normal Quote-with-comment start.
+2. **Go question-rooted (PICO/PCC) and cite the source by URL at the CiTO step.** When there's no DOI to quote, frame the claim as an answered research question (PICO/PCC), then at the CiTO Citation step cite the artifact by URL — the CiTO *"DOI **or other URL**"* field accepts any resolvable URI. This is usually the right shape for a claim that isn't quoting a paper anyway.
+
+**Anchor the source on the most durable artifact identifier available**, in order: **SWHID** (code, forge-agnostic) > **Zenodo DOI** > repo URL > Wayback-snapshotted page URL.
+
+**Credit the original author by any resolvable URI** inside the nanopub (`prov:wasAttributedTo`): an ORCID if they have one, else an institutional profile or `https://github.com/<user>` — never force an ORCID on a non-academic author (that would re-impose the gatekeeping Mode-B exists to bypass). Note the *signer* of the nanopub stays the Science Live user's ORCID; only the *referenced* source and its author may be a non-DOI / non-ORCID URI.
+
 ## What happens after Phase 5
 
 Once a single chain is published, you have three optional layers:

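The two routes and the identifier precedence in the Mode-B section above can be sketched as small shell helpers, for example as a pre-flight check before drafting a chain. A minimal sketch: the helper names are hypothetical, and the DOI test is only a loose syntactic check for a bare `10.<registrant>/<suffix>` string, not a check that the DOI resolves.

```shell
#!/usr/bin/env bash
# Sketch of the Mode-B decision: bare DOI -> Quote-with-comment start;
# anything else -> question-rooted (PICO/PCC) chain, source cited at CiTO by URL.
# Helper names are hypothetical.

# Loose syntactic check for a bare DOI such as 10.5281/zenodo.1234 (no URL prefix).
is_bare_doi() {
  [[ "$1" =~ ^10\.[0-9]+(\.[0-9]+)*/[^[:space:]]+$ ]]
}

# Prints the chain start for a source identifier.
chain_start() {
  if is_bare_doi "$1"; then
    echo "quote-with-comment"
  else
    echo "pico-pcc+cito-url"
  fi
}

# Prints the first non-empty identifier. Pass them in the documented order:
# SWHID, Zenodo DOI, repo URL, Wayback snapshot URL ("" for any that is missing).
pick_anchor() {
  local id
  for id in "$@"; do
    if [ -n "${id}" ]; then
      echo "${id}"
      return 0
    fi
  done
  return 1
}
```

Note that a DOI wrapped as a URL (`https://doi.org/10.…`) fails `is_bare_doi`, mirroring the rule that the `Cited DOI` field wants the bare form.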
diff --git a/docs/cicd-conventions.md b/docs/cicd-conventions.md
@@ -168,6 +168,26 @@ If a bad description is already on Zenodo: edit in place via `zenodo.org/records
 
 ---
 
+## Preservation: Zenodo (release), Software Heritage (code), Wayback (web sources)
+
+Three release-time archival paths, each with a distinct job. They are complementary, not redundant — capture all three where applicable.
+
+| Workflow | Archives | Identifier | Coverage |
+|---|---|---|---|
+| `docker.yml` (Zenodo) | the release source tarball + (optionally) the Docker image | Zenodo concept DOI | GitHub-only auto-archival |
+| `swh-save.yml` (Software Heritage) | the source tree at the released revision | **SWHID** (ISO/IEC standard) | forge-agnostic — GitHub, GitLab.com, self-hosted GitLab, any git |
+| `wayback.yml` (Internet Archive) | the deployed Jupyter Book site + the URLs in `wayback-urls.txt` | timestamped `web.archive.org` snapshot | web pages (prose), not code |
+
+Conventions:
+
+- **Code → Software Heritage (SWHID).** SWH is the universal, forge-agnostic anchor: it also covers repositories on GitLab.com and self-hosted forges, which Zenodo's GitHub-only integration misses. `swh-save.yml` requests Save Code Now on each release. Zenodo gives the *citable release + metadata DOI*; SWH gives the *immutable code identity*. Capture both.
+- **Prose / web sources → Wayback.** Blogs, design notes, README pages that state a claim are not code, so Software Heritage cannot archive them. List them in `wayback-urls.txt`; `wayback.yml` snapshots them (plus the deployed book site) on release. Pair with a Zenodo deposit if a citable DOI is also wanted.
+- **Never anchor on a conda package.** Software Heritage's conda loader is not in production; built conda-forge / bioconda *artifacts* are not archived. The recipes (feedstock GitHub repos) and upstream source repos *are* archived (as git). So anchor reproducibility on **pinned `pixi.toml` / `pixi.lock` + the source repo's SWHID + the container image on Zenodo** — not the conda artifact.
+
+All three workflows trigger only on `release` (plus manual `workflow_dispatch`), so they never run on an uninitialised template or on routine pushes.
+
+---
+
 ## Long-running experiments — don't poll
 
 If an analysis takes more than ~5 minutes:

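Software Heritage identifiers for git revisions and directories reuse git's SHA-1 object hashes, so the SWHIDs a release should receive can be computed locally from the tagged commit, then cited once `swh-save.yml` has run. A minimal sketch: the helper name is hypothetical, it assumes a default (SHA-1) git repository, and the identifiers only resolve after Software Heritage has actually archived that revision.

```shell
#!/usr/bin/env bash
# Sketch: derive the SWHIDs for a git ref from its git object hashes.
# SWHID core ids for revisions (commits) and directories (trees) are
# git-compatible SHA-1 hashes. Helper name is hypothetical.

# Usage: release_swhids [REF]   (default HEAD; e.g. a release tag)
release_swhids() {
  local ref="${1:-HEAD}" rev dir
  rev=$(git rev-parse --verify --quiet "${ref}^{commit}") || return 1
  dir=$(git rev-parse --verify --quiet "${ref}^{tree}") || return 1
  printf 'swh:1:rev:%s\n' "${rev}"
  printf 'swh:1:dir:%s\n' "${dir}"
}
```

The Software Heritage web archive should resolve such an identifier directly as `https://archive.softwareheritage.org/<SWHID>`, which gives a stable, forge-independent link to use alongside the Zenodo DOI.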
diff --git a/wayback-urls.txt b/wayback-urls.txt
@@ -0,0 +1,16 @@
+# wayback-urls.txt — external web sources to snapshot in the Internet Archive
+# Wayback Machine on each release (see .github/workflows/wayback.yml).
+#
+# One URL per line. Lines starting with '#' and blank lines are ignored.
+#
+# Use this for Mode-B / paperless claim sources that Software Heritage cannot
+# archive because they are prose, not code: blog posts, design notes, README
+# pages, or documentation that states a claim your replication tests. A Wayback
+# snapshot gives the source timestamped, immutable fixity as it stood when the
+# release was cut, which the nanopub can cite as provenance.
+#
+# The deployed Jupyter Book site is archived automatically — you do NOT need to
+# list it here.
+#
+# Example:
+# https://example.org/blog/the-claim-we-are-testing
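If anonymous Save Page Now starts hitting rate limits, the authenticated SPN2 call that the `wayback.yml` header comment points to could replace the anonymous `curl`. A minimal sketch, assuming the `IA_ACCESS_KEY` / `IA_SECRET_KEY` secrets are mapped into the step's environment and that the endpoint and headers follow the public SPN2 API documentation (verify against the current docs before relying on it); `spn2_save` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of an authenticated Save Page Now 2 (SPN2) capture request.
# Assumes IA_ACCESS_KEY / IA_SECRET_KEY are set from repository secrets.
# Helper name is hypothetical.

# Usage: spn2_save URL   -> prints the JSON response from the SPN2 endpoint
spn2_save() {
  local url="$1"
  if [ -z "${IA_ACCESS_KEY:-}" ] || [ -z "${IA_SECRET_KEY:-}" ]; then
    echo "spn2_save: IA_ACCESS_KEY / IA_SECRET_KEY not set" >&2
    return 2
  fi
  # POST the target as form data; --data-urlencode keeps '&' and '?' intact.
  curl -sS --max-time 120 -X POST \
    -A "forrt-replication-template wayback workflow" \
    -H "Accept: application/json" \
    -H "Authorization: LOW ${IA_ACCESS_KEY}:${IA_SECRET_KEY}" \
    --data-urlencode "url=${url}" \
    "https://web.archive.org/save"
}
```

On success the JSON response should include a `job_id` for the capture. Because the workflow is best-effort, logging that response per URL is probably enough; there is no need to block on the capture finishing.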