Skip to content

allow resuming#205

Merged
sethconvex merged 5 commits intomainfrom
ian/resume
Mar 6, 2026
Merged

allow resuming#205
sethconvex merged 5 commits intomainfrom
ian/resume

Conversation

@ianmacartney
Copy link
Copy Markdown
Member

@ianmacartney ianmacartney commented Feb 19, 2026

Add workflow retry functionality

Fixes #38

Fixes #191

This change introduces the ability to retry previously-failed workflows from a specific step or from the beginning.

Key Features

  • Retry from specific step: Support retrying from a step number, step name, or function reference
  • Step cleanup: Automatically delete steps from the retry point onwards before restarting
  • Event cleanup: Remove associated events when deleting event steps during retry
  • Generation tracking: Increment workflow generation number and clear previous run results
  • Async execution: Support both immediate execution and workpool enqueueing via startAsync option

API Changes

  • Added workflow.retry() method with options for specifying retry point and execution mode
  • Enhanced cleanup functionality with optional force parameter

Validation

  • Prevents retrying workflows that are still running
  • Validates step numbers and names exist before attempting retry
  • Maintains workflow state consistency through generation number tracking

Summary by CodeRabbit

  • New Features

    • Added workflow retry functionality allowing users to re-run failed workflows from a specific step (by number, name, or function reference).
    • Introduced startAsync option for asynchronous retry execution.
  • Documentation

    • Added comprehensive "Retrying a failed workflow" guide with usage examples and best practices.
  • Tests

    • Added extensive test coverage for retry scenarios and edge cases.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

This pull request implements a workflow retry mechanism enabling workflows to resume from specific steps after failure. It includes client-facing API, server-side mutations, step-management utilities, validation updates, documentation, and comprehensive tests.

Changes

Cohort / File(s) Summary
Documentation
README.md
Added two "Retrying a failed workflow" sections documenting usage of workflow.retry() with from parameter (step number, name, or function reference), startAsync option, transaction behavior, and code examples; updated "Cleaning up a workflow" section to clarify manual cleanup requirement.
Client API
src/client/index.ts
Added public retry() method to WorkflowManager that normalizes the optional from field (numbers/strings passed as-is, function references converted via safeFunctionName) and invokes the internal workflow retry mutation.
Validation & Schema Updates
src/component/journal.ts
Tightened startSteps mutation parameter validation by changing workflowId from v.string() to v.id("workflows") to enforce proper workflow ID references.
Internal Model Refactoring
src/component/model.ts
Updated getWorkflow() signature to accept Id<"workflows"> instead of string with removed normalization logic; deleted getJournalEntry() exported function; simplified database retrieval and validation.
Workflow Engine
src/component/workflow.ts
Added new retry mutation with retryHandler to validate workflow completion, delete steps conditionally, increment generation, clear run results, and trigger via workpool or direct mutation; introduced internal deleteSteps() helper for cascading cleanup of steps and nested workflows; extended cleanup mutation with optional force flag for handling in-progress workflows.
Pool & Error Handling
src/component/pool.ts
Replaced getWorkflow import with direct database lookup; added explicit existence checks for journal entries with early return and error logging; added workflow existence assertion; improved generation mismatch error messaging.
Test Suite
src/component/workflow.test.ts
Added comprehensive test coverage for retry functionality including: basic retry with generation increment, error on running workflows, retrying from step number/name/unknown reference, step deletion verification, and cascading cleanup scenarios.

Sequence Diagram

sequenceDiagram
    participant Client as Application Code
    participant API as WorkflowManager API
    participant Mutation as Retry Mutation
    participant DB as Database
    participant Engine as Workflow Engine
    participant Pool as Workpool

    Client->>API: retry(ctx, workflowId, {from, startAsync})
    API->>Mutation: Invoke retry mutation with normalized args
    Mutation->>DB: Get workflow & validate completion
    activate DB
    DB-->>Mutation: Workflow with runResult
    deactivate DB
    
    alt from specified
        Mutation->>DB: Delete steps from specified point
        activate DB
        DB-->>Mutation: Cascade cleanup nested workflows
        deactivate DB
    end
    
    Mutation->>DB: Increment generationNumber & clear runResult
    activate DB
    DB-->>Mutation: Updated workflow
    deactivate DB
    
    alt startAsync true
        Mutation->>Pool: Enqueue via workpool
        Pool-->>Engine: Async execution
    else startAsync false
        Mutation->>Engine: Direct runMutation
    end
    
    Engine-->>Client: Workflow resumed from retry point
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • sethconvex

Poem

🐰 Hop, hop, rewind and try again!
From step N, let workflows begin,
No need to leap through earlier hops,
Just retry right where progress stops.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'allow resuming' is vague and generic, using a non-descriptive term that doesn't convey the specific mechanism (workflow retry) or key features implemented. Consider a more specific title like 'Add workflow retry functionality' or 'Support resuming workflows from specific steps' to clearly communicate the primary change.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR implementation addresses both linked issues: Issue #38 (workflow retry mechanism allowing resume from failed point) and Issue #191 (cleanup handling for cancelled workflows with graceful missing journal entry handling).
Out of Scope Changes check ✅ Passed All changes are scoped to implementing workflow retry functionality and improving cleanup handling; no unrelated modifications detected beyond the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ian/resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Member Author

ianmacartney commented Feb 19, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Feb 19, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@convex-dev/workflow@205

commit: 44c6c60

@ianmacartney ianmacartney marked this pull request as ready for review February 19, 2026 04:27
@ianmacartney ianmacartney force-pushed the ian/resume branch 2 times, most recently from acee365 to b7fdb37 Compare March 3, 2026 23:38
Base automatically changed from ian/cleanup-nested-workflow-and-events to ian/use-convex-size March 3, 2026 23:45
@ianmacartney ianmacartney changed the base branch from ian/use-convex-size to graphite-base/205 March 6, 2026 01:38
@ianmacartney ianmacartney changed the base branch from graphite-base/205 to main March 6, 2026 01:38
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/component/workflow.ts (1)

479-492: Consider extracting getWorkpool outside the loop.

getWorkpool is called inside the loop for each workflow step. If there are multiple nested workflow steps to clean up, this results in repeated workpool instantiation. Consider moving it outside the loop for efficiency.

♻️ Suggested refactor
 async function deleteSteps(ctx: MutationCtx, steps: Doc<"steps">[]) {
+  let workpool: Awaited<ReturnType<typeof getWorkpool>> | undefined;
   for (const entry of steps) {
     await ctx.db.delete(entry._id);
     if (entry.step.kind === "event" && entry.step.eventId) {
       await ctx.db.delete(entry.step.eventId);
     } else if (entry.step.kind === "workflow" && entry.step.workflowId) {
-      const workpool = await getWorkpool(ctx, {});
+      workpool ??= await getWorkpool(ctx, {});
       await workpool.enqueueMutation(ctx, api.workflow.cleanup, {
         workflowId: entry.step.workflowId,
         force: true,
       });
     }
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/workflow.ts` around lines 479 - 492, deleteSteps currently
calls getWorkpool inside the for-loop causing repeated instantiation when
multiple steps are workflow kind; refactor by calling getWorkpool once before
the loop (e.g., const workpool = await getWorkpool(ctx, {})) and reuse that
instance when handling entry.step.kind === "workflow" so
workpool.enqueueMutation(ctx, api.workflow.cleanup, {...}) uses the pre-fetched
workpool; ensure you still only call getWorkpool when at least one workflow step
may be present (lazy-init or null-check) and keep deletion of entry._id and
event handling unchanged.
src/client/index.ts (1)

230-240: Consider validating that safeFunctionName returns a non-empty string.

If safeFunctionName(options.from) returns an empty string (possible edge case with malformed function references), the retry mutation would receive from: "", which passes validation but would fail with a confusing error message like Step "" not found.

💡 Optional defensive check
       } else {
-        from = safeFunctionName(options.from);
+        from = safeFunctionName(options.from);
+        if (!from) {
+          throw new Error("Could not extract function name from reference");
+        }
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/index.ts` around lines 230 - 240, The code sets from =
safeFunctionName(options.from) without ensuring the returned string is
non-empty, which can pass validation but cause confusing runtime errors (e.g.,
Step "" not found); update the logic around safeFunctionName (used where
options.from is handled and later passed into the retry mutation) to validate
that the result is a non-empty string and handle the empty case (e.g., throw a
clear error or treat as undefined) so the retry mutation never receives an empty
"" value for from. Ensure references: options.from, safeFunctionName(...), the
local variable from, and the retry mutation input are covered by this check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/client/index.ts`:
- Around line 230-240: The code sets from = safeFunctionName(options.from)
without ensuring the returned string is non-empty, which can pass validation but
cause confusing runtime errors (e.g., Step "" not found); update the logic
around safeFunctionName (used where options.from is handled and later passed
into the retry mutation) to validate that the result is a non-empty string and
handle the empty case (e.g., throw a clear error or treat as undefined) so the
retry mutation never receives an empty "" value for from. Ensure references:
options.from, safeFunctionName(...), the local variable from, and the retry
mutation input are covered by this check.

In `@src/component/workflow.ts`:
- Around line 479-492: deleteSteps currently calls getWorkpool inside the
for-loop causing repeated instantiation when multiple steps are workflow kind;
refactor by calling getWorkpool once before the loop (e.g., const workpool =
await getWorkpool(ctx, {})) and reuse that instance when handling
entry.step.kind === "workflow" so workpool.enqueueMutation(ctx,
api.workflow.cleanup, {...}) uses the pre-fetched workpool; ensure you still
only call getWorkpool when at least one workflow step may be present (lazy-init
or null-check) and keep deletion of entry._id and event handling unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0b6bcfa4-98ca-4358-a3f6-f2a65ac494e0

📥 Commits

Reviewing files that changed from the base of the PR and between ace49ae and 28c53f9.

⛔ Files ignored due to path filters (1)
  • src/component/_generated/component.ts is excluded by !**/_generated/**
📒 Files selected for processing (7)
  • README.md
  • src/client/index.ts
  • src/component/journal.ts
  • src/component/model.ts
  • src/component/pool.ts
  • src/component/workflow.test.ts
  • src/component/workflow.ts

Copy link
Copy Markdown
Contributor

@sethconvex sethconvex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@sethconvex sethconvex merged commit 0e48e34 into main Mar 6, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Do not log a warning/error when calling workflowManager.cleanup() in onComplete() for a cancelled workflow Workflow retry mechanism

2 participants