Merged
31 commits
731746c
chore: ignore local worktrees
Golden-Promise Mar 25, 2026
44c9a24
docs: add long-task continuity suite spec
Golden-Promise Mar 25, 2026
f87d846
docs: normalize long-task eval tokens
Golden-Promise Mar 25, 2026
d5bc772
feat: scaffold long-task continuity skill packages
Golden-Promise Mar 25, 2026
5dee7d4
docs: add continuity reference placeholders
Golden-Promise Mar 25, 2026
48a4a7f
test: add skill-context-keeper contract checks
Golden-Promise Mar 25, 2026
a28c2c4
feat: add skill-context-keeper package
Golden-Promise Mar 25, 2026
f29454a
fix: tighten skill-context-keeper package docs
Golden-Promise Mar 25, 2026
62ffd7c
test: add skill-phase-gate contract checks
Golden-Promise Mar 25, 2026
553b3d0
feat: add skill-phase-gate package
Golden-Promise Mar 25, 2026
fa6eb4b
test: cover skill-phase-gate package surface
Golden-Promise Mar 25, 2026
b08ba08
test: add skill-handoff-summary contract checks
Golden-Promise Mar 25, 2026
b7ad09c
feat: add skill-handoff-summary package
Golden-Promise Mar 25, 2026
9b4c08e
fix: tighten handoff-summary reference contract
Golden-Promise Mar 25, 2026
d953483
test: add skill-task-continuity bootstrap checks
Golden-Promise Mar 25, 2026
9c58514
feat: add skill-task-continuity package
Golden-Promise Mar 25, 2026
7e5ad08
test: cover downstream installed bootstrap layout
Golden-Promise Mar 25, 2026
746767e
fix: narrow skill-task-continuity bootstrap guard
Golden-Promise Mar 25, 2026
5c29086
test: cover copied public library bootstrap guard
Golden-Promise Mar 25, 2026
4d7d202
test: define long-task eval contract
Golden-Promise Mar 25, 2026
40706c4
feat: add static long-task eval harness
Golden-Promise Mar 25, 2026
7135ae2
test: fix copied public library regression fixture
Golden-Promise Mar 25, 2026
e896dfa
fix: detect public library bootstrap targets by footprint
Golden-Promise Mar 25, 2026
c10522e
test: tighten eval scoring contract
Golden-Promise Mar 25, 2026
f847028
feat: make long-task evals case-aware
Golden-Promise Mar 25, 2026
8a84191
test: tighten long-task eval coverage
Golden-Promise Mar 25, 2026
3155a09
feat: tighten long-task eval scoring
Golden-Promise Mar 25, 2026
4888621
ci: add pull request checks for published packages
Golden-Promise Mar 25, 2026
d08eea0
docs: harden long-task suite release flow
Golden-Promise Mar 25, 2026
8aacf78
ci: retrigger pull request checks
Golden-Promise Mar 25, 2026
38d0ff9
test: isolate governance helpers from host CI env
Golden-Promise Mar 25, 2026
31 changes: 31 additions & 0 deletions .github/workflows/pull-request-checks.yml
@@ -0,0 +1,31 @@
name: pull-request-checks

on:
  pull_request:
  workflow_dispatch:

jobs:
  tests:
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Run package test suites
        run: |
          set -eu
          for test_dir in skills/*/tests; do
            python3 -m unittest discover -s "$test_dir" -p 'test_*.py' -v
          done

      - name: Run eval unit tests
        run: python3 -m unittest discover -s evals -p 'test_*.py' -v

      - name: Run eval seed cases
        run: python3 evals/run_evals.py
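
Reviewer note: to reproduce the package-test loop outside CI, a minimal Python sketch of the discovery step could look like the following. It assumes only the `skills/*/tests` layout used by the workflow; `find_test_dirs`, `build_commands`, and `run_all` are illustrative names and are not part of this pull request.

```python
import subprocess
from pathlib import Path


def find_test_dirs(repo_root: Path) -> list[Path]:
    """Return every skills/<package>/tests directory, sorted for stable output."""
    return sorted(p for p in repo_root.glob("skills/*/tests") if p.is_dir())


def build_commands(repo_root: Path) -> list[list[str]]:
    """Build one unittest-discover command per package, mirroring the CI loop."""
    return [
        [
            "python3", "-m", "unittest", "discover",
            "-s", test_dir.relative_to(repo_root).as_posix(),
            "-p", "test_*.py", "-v",
        ]
        for test_dir in find_test_dirs(repo_root)
    ]


def run_all(repo_root: Path) -> None:
    """Run each command from the repository root and stop at the first failure."""
    for command in build_commands(repo_root):
        # check=True raises CalledProcessError on a failing suite, like `set -e`.
        subprocess.run(command, check=True, cwd=repo_root)
```

Calling `run_all(Path("."))` from a checkout would then run the same suites the `Run package test suites` step runs, one package at a time.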
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@ __pycache__/
*.pyd

.venv/
.worktrees/

.pytest_cache/
.mypy_cache/
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,24 @@

All notable changes to `codex-skill-library` should be documented in this file.

## [Unreleased]

### Added

- Publish four long-task continuity packages under `skills/`: `skill-context-keeper`, `skill-phase-gate`, `skill-handoff-summary`, and `skill-task-continuity`.
- Add bilingual package entry docs, routing-first `SKILL.md` files, OpenAI agent metadata, reader-facing references, downstream template assets, and package contract tests for the new continuity packages.
- Add the continuity-suite bootstrap helper and downstream template set for `AGENTS.md` plus `.agent-state/*.md` files without turning the repository root into a consumer repo.
- Add `docs/long-task-suite.md` and `docs/long-task-suite.zh-CN.md` so maintainers and readers can understand the suite architecture without opening package internals.
- Add a static continuity eval harness under `evals/` with seed cases, per-package artifact checks, routing checks, exact workflow-token checks, and optional guardrail metadata validation.
- Add a pull-request workflow for published package tests plus continuity eval checks.
- Add bilingual release checklist guidance for the continuity-suite publication flow.

### Changed

- Update root docs, skills indexes, and publishing guides so all four continuity packages are discoverable, install guidance stays aligned with `skill-installer`, and maintainers can find smoke-test and release-checklist steps quickly.
- Tighten continuity package README install guidance with direct, copyable install examples for `main` and the planned `v0.6.0` release.
- Treat the continuity eval contract as a release-facing surface: routing now depends on published trigger guidance, workflow tokens must match exact package and polarity contracts, and optional guardrail metadata must be valid when present.

## [0.5.1] - 2026-03-25

### Changed
40 changes: 29 additions & 11 deletions README.md
@@ -26,47 +26,60 @@ This repository is designed for people who want to:
| Skill | Best For | Docs |
| --- | --- | --- |
| `skill-governance` | Governing skill assets with task-first add, enable, doctor, repair, audit, and document flows | [EN](skills/skill-governance/README.md) / [中文](skills/skill-governance/README.zh-CN.md) |
| `skill-context-keeper` | Refreshing current task state without drifting into checkpoints or handoff ownership | [EN](skills/skill-context-keeper/README.md) / [中文](skills/skill-context-keeper/README.zh-CN.md) |
| `skill-phase-gate` | Adding preflight and postflight checkpoints around meaningful edits | [EN](skills/skill-phase-gate/README.md) / [中文](skills/skill-phase-gate/README.zh-CN.md) |
| `skill-handoff-summary` | Writing compact continuation handoffs when work pauses or changes owners | [EN](skills/skill-handoff-summary/README.md) / [中文](skills/skill-handoff-summary/README.zh-CN.md) |
| `skill-task-continuity` | Bootstrapping and composing the continuity suite without replacing the atomic packages | [EN](skills/skill-task-continuity/README.md) / [中文](skills/skill-task-continuity/README.zh-CN.md) |

## Quick Start

1. Open the package list in [skills/README.md](skills/README.md).
2. Choose a skill and read its package `README.md`.
3. Install it with `skill-installer`, usually into the default Codex shared library.
4. Use the package references for examples, prompts, and deeper guidance.
3. Install the package you want with `skill-installer`, using either `main` or a tagged release.
4. For the continuity workflow, start with `skill-task-continuity` when you need suite bootstrap or composition guidance, or install the narrower atomic package directly.
5. Use the package reference pages for boundary notes now, and later for examples, prompts, and deeper guidance.

## Install Example
## Install Examples

Install `skill-governance` from this repository:
Install the suite entry package from the repository default branch:

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--repo Golden-Promise/codex-skill-library \
--path skills/skill-governance
--path skills/skill-task-continuity
```

Install the current release:
Pin the upcoming continuity-suite release:

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--repo Golden-Promise/codex-skill-library \
--path skills/skill-governance \
--ref v0.5.1
--path skills/skill-task-continuity \
--ref v0.6.0
```

Install from a GitHub tree URL:
Install from a GitHub tree URL when you want the public package page directly:

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--url https://github.com/Golden-Promise/codex-skill-library/tree/main/skills/skill-governance
--url https://github.com/Golden-Promise/codex-skill-library/tree/main/skills/skill-task-continuity
```

For maintainer smoke-test commands covering all four continuity packages, use [docs/publishing.md](docs/publishing.md).

## Reading Guide

- English skill index: [skills/README.md](skills/README.md)
- 中文技能索引: [skills/README.zh-CN.md](skills/README.zh-CN.md)
- `skill-governance` package: [EN](skills/skill-governance/README.md) / [中文](skills/skill-governance/README.zh-CN.md)
- `skill-context-keeper` package: [EN](skills/skill-context-keeper/README.md) / [中文](skills/skill-context-keeper/README.zh-CN.md)
- `skill-phase-gate` package: [EN](skills/skill-phase-gate/README.md) / [中文](skills/skill-phase-gate/README.zh-CN.md)
- `skill-handoff-summary` package: [EN](skills/skill-handoff-summary/README.md) / [中文](skills/skill-handoff-summary/README.zh-CN.md)
- `skill-task-continuity` package: [EN](skills/skill-task-continuity/README.md) / [中文](skills/skill-task-continuity/README.zh-CN.md)
- Repository publishing guide: [docs/publishing.md](docs/publishing.md)
- 中文发布说明: [docs/publishing.zh-CN.md](docs/publishing.zh-CN.md)
- Release checklist for the continuity suite: [docs/release-checklist-long-task-suite.md](docs/release-checklist-long-task-suite.md)
- 中文连续性套件发布清单: [docs/release-checklist-long-task-suite.zh-CN.md](docs/release-checklist-long-task-suite.zh-CN.md)

## Repository Layout

@@ -80,9 +93,14 @@ codex-skill-library/
README.md
README.zh-CN.md
skill-governance/
skill-context-keeper/
skill-phase-gate/
skill-handoff-summary/
skill-task-continuity/
```

## For Maintainers

Repository versioning, release flow, and validation steps are documented in [docs/publishing.md](docs/publishing.md).
If you are publishing this repository for the first time, start there instead of the package runtime docs.
The continuity-suite release checklist lives in [docs/release-checklist-long-task-suite.md](docs/release-checklist-long-task-suite.md).
If you are publishing this repository for the first time, start with those maintainer docs instead of the package runtime docs.
38 changes: 28 additions & 10 deletions README.zh-CN.md
@@ -26,47 +26,60 @@
| Skill | Best For | Docs |
| --- | --- | --- |
| `skill-governance` | Governing skill assets through task-first entry points, including add, enable, doctor, repair, audit, and document flows | [EN](skills/skill-governance/README.md) / [中文](skills/skill-governance/README.zh-CN.md) |
| `skill-context-keeper` | Refreshing current task state without expanding into checkpoint or handoff responsibilities | [EN](skills/skill-context-keeper/README.md) / [中文](skills/skill-context-keeper/README.zh-CN.md) |
| `skill-phase-gate` | Adding preflight / postflight checkpoints around substantial changes | [EN](skills/skill-phase-gate/README.md) / [中文](skills/skill-phase-gate/README.zh-CN.md) |
| `skill-handoff-summary` | Producing compact, continuation-oriented handoff summaries when work pauses or changes owners | [EN](skills/skill-handoff-summary/README.md) / [中文](skills/skill-handoff-summary/README.zh-CN.md) |
| `skill-task-continuity` | Handling bootstrap and composition of the continuity suite without replacing the atomic packages | [EN](skills/skill-task-continuity/README.md) / [中文](skills/skill-task-continuity/README.zh-CN.md) |

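## Quick Start

1. Start with [skills/README.zh-CN.md](skills/README.zh-CN.md) to browse the currently available skills.
2. Open the specific skill package's `README.md` to see whether it fits your scenario.
3. Install it with `skill-installer`, usually directly into the default Codex shared library.
4. When you need more detailed examples, continue with the package's `references/`.
3. Install the target package with `skill-installer`, either tracking `main` directly or pinning to a tag.
4. If you need bootstrap or composition guidance for the whole continuity workflow, start with `skill-task-continuity`; if you only need a single action, install the corresponding atomic package directly.
5. For now, read the package's reference pages for boundary notes; in later phases, continue with the examples and prompt material added there.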

## Install Examples

Install `skill-governance` from this repository
Install the continuity-suite entry package from the repository default branch

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--repo Golden-Promise/codex-skill-library \
--path skills/skill-governance
--path skills/skill-task-continuity
```

Pin the install to the current release
Pin the install to the upcoming continuity-suite release

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--repo Golden-Promise/codex-skill-library \
--path skills/skill-governance \
--ref v0.5.1
--path skills/skill-task-continuity \
--ref v0.6.0
```

You can also use a GitHub tree URL directly:
You can also use a GitHub tree URL that points directly at the public package page

```bash
python3 <path-to-skill-installer>/scripts/install-skill-from-github.py \
--url https://github.com/Golden-Promise/codex-skill-library/tree/main/skills/skill-governance
--url https://github.com/Golden-Promise/codex-skill-library/tree/main/skills/skill-task-continuity
```

For maintainer smoke tests covering all four continuity packages, go directly to [docs/publishing.zh-CN.md](docs/publishing.zh-CN.md).

## Reading Guide

- English skill index: [skills/README.md](skills/README.md)
- 中文技能索引: [skills/README.zh-CN.md](skills/README.zh-CN.md)
- `skill-governance` package: [EN](skills/skill-governance/README.md) / [中文](skills/skill-governance/README.zh-CN.md)
- `skill-context-keeper` package: [EN](skills/skill-context-keeper/README.md) / [中文](skills/skill-context-keeper/README.zh-CN.md)
- `skill-phase-gate` package: [EN](skills/skill-phase-gate/README.md) / [中文](skills/skill-phase-gate/README.zh-CN.md)
- `skill-handoff-summary` package: [EN](skills/skill-handoff-summary/README.md) / [中文](skills/skill-handoff-summary/README.zh-CN.md)
- `skill-task-continuity` package: [EN](skills/skill-task-continuity/README.md) / [中文](skills/skill-task-continuity/README.zh-CN.md)
- English publishing guide: [docs/publishing.md](docs/publishing.md)
- 中文发布说明: [docs/publishing.zh-CN.md](docs/publishing.zh-CN.md)
- English continuity-suite release checklist: [docs/release-checklist-long-task-suite.md](docs/release-checklist-long-task-suite.md)
- 中文连续性套件发布清单: [docs/release-checklist-long-task-suite.zh-CN.md](docs/release-checklist-long-task-suite.zh-CN.md)

## Repository Layout

@@ -80,9 +93,14 @@ codex-skill-library/
README.md
README.zh-CN.md
skill-governance/
skill-context-keeper/
skill-phase-gate/
skill-handoff-summary/
skill-task-continuity/
```

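## For Maintainers

Repository-level versioning, release flow, and validation notes are all kept in [docs/publishing.zh-CN.md](docs/publishing.zh-CN.md).
If you are publishing this repository for the first time, read that document first instead of starting from the package runtime docs.
The continuity-suite release checklist is in [docs/release-checklist-long-task-suite.zh-CN.md](docs/release-checklist-long-task-suite.zh-CN.md).
If you are publishing this repository for the first time, read these maintainer docs first instead of starting from the package runtime docs.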
98 changes: 98 additions & 0 deletions docs/long-task-suite.md
@@ -0,0 +1,98 @@
# Long-Task Continuity Suite

## Problem Statement

Long threads usually do not fail all at once. They fail by degrees: the shared state gets stale, the workflow loses its shape, and handoffs become too thin for the next agent to trust.

This suite exists to make those failure modes explicit. It treats long-task degradation as three separate problems:

- state drift, where the working picture no longer matches reality
- workflow drift, where the task stops following a deliberate sequence of phases and checkpoints
- handoff friction, where another agent cannot resume without guessing

The goal is not to add more ceremony. The goal is to make continuity measurable so that long tasks stay resumable, inspectable, and transferable.

## State Drift, Workflow Drift, And Handoff Friction

These three failure modes overlap, but they are not the same thing.

State drift appears when summaries, context, or task memory lag behind the actual work. The risk is silent divergence: the thread sounds confident while carrying the wrong assumptions.

Workflow drift appears when a task that needs staged execution starts behaving like a single shot. The work may still move forward, but it loses checkpoints, decision points, and clear boundaries.

Handoff friction appears when a pause or transfer leaves too little signal for the next agent. The work is not necessarily wrong, just expensive to resume.

The suite uses these distinctions to decide which package should trigger and which one should stay out of the way.

The evaluation matrix uses normalized artifact paths and event tokens so the runner can validate results consistently.
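
As a rough illustration of what "normalized" could mean in practice, the sketch below checks token shape and canonicalizes paths. The `namespace:action` pattern and the path rules are assumptions inferred from the seed matrix in this document, not the runner's actual contract, and the function names are hypothetical.

```python
import re

# Assumed token shape: lowercase namespace, a colon, then a lowercase action
# that may contain underscores (for example "bootstrap:agents_md").
EVENT_TOKEN = re.compile(r"[a-z]+:[a-z_]+")


def is_event_token(token: str) -> bool:
    """Return True when the whole string matches the assumed token shape."""
    return EVENT_TOKEN.fullmatch(token) is not None


def normalize_artifact_path(raw: str) -> str:
    """Use forward slashes and drop leading './' so paths compare consistently."""
    path = raw.strip().replace("\\", "/")
    while path.startswith("./"):
        path = path[2:]
    return path
```

Dot-prefixed directories such as `.agent-state/` are left intact, because only a literal leading `./` is stripped.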

## Package Map

| Package | Responsibility | Trigger Shape |
| --- | --- | --- |
| `skill-context-keeper` | Preserve and reconstruct working state across long threads, especially after interruptions or stale summaries. | Resume, refresh, or reconcile context. |
| `skill-phase-gate` | Decide when work needs phase boundaries, checkpoints, or a deliberate pause before execution continues. | Split, gate, or stage the work. |
| `skill-handoff-summary` | Produce a clean transfer note when work is paused or handed to another agent. | Summarize status, blockers, and next steps. |
| `skill-task-continuity` | Orchestrate the three atomic packages when the task itself is about maintaining long-thread continuity. | Bootstrap the suite, coordinate boundaries, and keep the flow coherent. |

## Repository Boundary Rules

This repository is a public installable skill library, so the suite docs must stay reader-facing and maintainable.

- Keep the suite spec in `docs/`, not in live agent state files.
- Do not create a root `AGENTS.md`, `.agent-state/`, or public-package `.agents/skills` content for this task.
- Treat `evals/cases.csv` as the source of truth for trigger coverage, but keep the prose docs understandable on their own.
- Describe package boundaries in plain language; do not require readers to open package implementation files first.
- Prefer the narrowest package that matches the prompt. The composition package should not steal work that belongs to an atomic package.
- Make ambiguity explicit in the matrix so maintainers can see when a keyword match is not a real trigger.
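
The rule against bootstrapping consumer files into the library root can be sketched as a footprint check. This is a minimal illustration under stated assumptions: the marker paths in `LIBRARY_FOOTPRINT` and the function names are hypothetical, and the real bootstrap helper may detect the library differently.

```python
from pathlib import Path

# Hypothetical footprint: paths that, taken together, suggest the target is the
# public skill library rather than a downstream consumer repository.
LIBRARY_FOOTPRINT = ("skills", "CHANGELOG.md", "docs/publishing.md")

# Consumer-side files named in the seed matrix for the bootstrap case.
BOOTSTRAP_OUTPUTS = ("AGENTS.md", ".agent-state/TASK_STATE.md", ".agent-state/HANDOFF.md")


def looks_like_public_library(target: Path) -> bool:
    """Return True only when every footprint marker exists under target."""
    return all((target / marker).exists() for marker in LIBRARY_FOOTPRINT)


def plan_bootstrap(target: Path) -> list[Path]:
    """List the files a bootstrap would create, refusing library-shaped targets."""
    if looks_like_public_library(target):
        raise ValueError(f"refusing to bootstrap inside the public library: {target}")
    return [target / name for name in BOOTSTRAP_OUTPUTS if not (target / name).exists()]
```

Checking a footprint rather than a directory name means a copied or renamed checkout of the library is still refused.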

## Success Criteria

### Outcome

- Long-thread work can be resumed, paused, or transferred without losing intent.
- The suite catches both false positives and false negatives for the four target packages.
- A maintainer can understand the architecture and boundaries without opening package source files.

### Process

- The eval matrix includes positive trigger cases and negative trigger cases for every atomic package.
- The matrix includes at least one composition-package bootstrap case and one boundary-protection case.
- Each case records the expected artifacts and the expected workflow event or command shape when that matters.
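
The first process criterion can be checked mechanically. The sketch below is one possible shape for that check; the column names `case_id`, `package`, and `should_trigger` are assumptions for illustration and may not match the real header row of `evals/cases.csv`.

```python
import csv
import io
from collections import defaultdict

ATOMIC_PACKAGES = ("skill-context-keeper", "skill-phase-gate", "skill-handoff-summary")

# Inline sample with assumed column names; the case ids come from the seed matrix.
SAMPLE_CSV = """case_id,package,should_trigger
context_resume,skill-context-keeper,yes
context_resume_not_needed,skill-context-keeper,no
phase_gate_before_multi_step,skill-phase-gate,yes
tiny_edit_not_gate,skill-phase-gate,no
handoff_before_pause,skill-handoff-summary,yes
handoff_not_needed,skill-handoff-summary,no
"""


def missing_polarities(csv_text: str) -> dict[str, set[str]]:
    """Map each atomic package to the polarities ("yes"/"no") it still lacks."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        seen[row["package"]].add(row["should_trigger"].strip().lower())
    gaps = {}
    for package in ATOMIC_PACKAGES:
        lacking = {"yes", "no"} - seen[package]
        if lacking:
            gaps[package] = lacking
    return gaps
```

An empty result means every atomic package has at least one positive and one negative case; any entry names the package and the polarity still to be added.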

### Style

- The docs stay concise, public-reader-friendly, and easy to scan.
- English and Chinese versions share the same major section order.
- Trigger notes read like maintainer guidance, not like internal scratchpad text.

### Efficiency

- Maintainers can validate the suite from the docs and CSV without reverse-engineering package code.
- The matrix is small enough to extend without becoming noisy.
- Ambiguous prompts are documented once, then reused as regression coverage.

## Initial Evaluation Matrix

The seed matrix lives in `evals/cases.csv`. The table below shows the initial coverage shape and the kinds of expected artifacts and workflow events to look for.

| Case | Package | Trigger | Prompt Shape | Expected Artifacts | Expected Events |
| --- | --- | --- | --- | --- | --- |
| `context_resume` | `skill-context-keeper` | Yes | Resume the last known state and carry forward unresolved work. | `state/context.snapshot`, `state/continuity.note` | `context:reload`, `context:reconstruct`, `context:summary` |
| `context_resume_not_needed` | `skill-context-keeper` | No | Answer a one-off question with no continuity risk. | `none` | `context:skip`, `direct:answer` |
| `phase_gate_before_multi_step` | `skill-phase-gate` | Yes | Split a multi-step task into phases before coding starts. | `plan/phase.plan`, `plan/checkpoints.md`, `plan/exit-criteria.md` | `phase:split`, `phase:checkpoint`, `phase:gate` |
| `tiny_edit_not_gate` | `skill-phase-gate` | No | Make a tiny local edit with no staged workflow. | `none` | `phase:skip`, `direct:edit` |
| `handoff_before_pause` | `skill-handoff-summary` | Yes | Pause work and hand it to another agent. | `handoff/HANDOFF.md`, `handoff/blockers.md`, `handoff/next-steps.md` | `handoff:capture`, `handoff:pause`, `handoff:transfer` |
| `handoff_not_needed` | `skill-handoff-summary` | No | Give a final answer without transfer notes. | `none` | `handoff:skip`, `direct:answer` |
| `suite_bootstrap` | `skill-task-continuity` | Yes | Coordinate the long-task suite across the atomic packages. | `AGENTS.md`, `.agent-state/TASK_STATE.md`, `.agent-state/HANDOFF.md` | `bootstrap:agents_md`, `bootstrap:task_state`, `bootstrap:handoff` |
| `suite_boundary_clean` | `skill-task-continuity` | No | A trivial edit that merely mentions all the keywords. | `none` | `bootstrap:skip`, `direct:edit` |
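
To make the table concrete, here is a minimal sketch of how one row might be scored against an observed run. The `SeedCase` structure, `score_case`, and the exact-set comparison are assumptions for illustration; the harness under `evals/` may model cases and scoring differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedCase:
    case_id: str
    package: str
    should_trigger: bool
    expected_artifacts: frozenset
    expected_events: frozenset


def score_case(case: SeedCase, triggered: bool, artifacts, events) -> dict:
    """Compare one observed run against a seed case using exact set matches."""
    observed_artifacts = frozenset(artifacts)
    observed_events = frozenset(events)
    return {
        "trigger_ok": triggered == case.should_trigger,
        "missing_artifacts": sorted(case.expected_artifacts - observed_artifacts),
        "missing_events": sorted(case.expected_events - observed_events),
        "unexpected_events": sorted(observed_events - case.expected_events),
    }


def passed(result: dict) -> bool:
    """A case passes only when the trigger matches and no list reports a gap."""
    return result["trigger_ok"] and not (
        result["missing_artifacts"]
        or result["missing_events"]
        or result["unexpected_events"]
    )


# The handoff_before_pause row from the table, expressed as a SeedCase.
HANDOFF_CASE = SeedCase(
    case_id="handoff_before_pause",
    package="skill-handoff-summary",
    should_trigger=True,
    expected_artifacts=frozenset(
        {"handoff/HANDOFF.md", "handoff/blockers.md", "handoff/next-steps.md"}
    ),
    expected_events=frozenset({"handoff:capture", "handoff:pause", "handoff:transfer"}),
)
```

A negative row such as `handoff_not_needed` would use an empty artifact set for `none` and `should_trigger=False`, so an unwanted trigger shows up as `trigger_ok` being false.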

## Phase Plan

The current task is the bootstrap phase: define the suite, seed the matrix, and make the boundaries legible.

Phase 1 should keep the documentation stable while the package implementations are still being shaped. That means adding new cases only when they improve coverage, not when they repeat the same trigger in different words.

Phase 2 should expand the matrix with more realistic long-thread scenarios, especially ones where the wrong package could plausibly trigger.

Phase 3 should use the suite as a regression harness for future package changes, so trigger behavior stays narrow and intentional.