
chore: executor benchmarks, scripts, docs, and visualization #209

Open
sethconvex wants to merge 7 commits into graphite-base/209 from workflow-v2-perf

Conversation

sethconvex (Contributor) commented Feb 24, 2026

Benchmarks, Scripts, and Documentation for Executor Mode

Depends on #210 (executor mode core implementation).

Contents

Benchmark suite (example/convex/benchmark.ts)

  • Side-by-side comparison of standard (workpool) vs executor mode
  • Configurable workflow count, step count, simulated vs real LLM calls
  • Atomic batch creation with startBenchmark / startBenchmarkBatch
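
For a sense of the entry points, a minimal sketch of driving a comparison run programmatically; it mirrors how scripts/benchmark_compare.sh invokes benchmark:startBenchmark via npx convex run, and assumes startBenchmark is registered as a mutation (use client.action if it is an action) and that the import paths match the example app layout:

// Sketch, with assumptions noted above: start one run per mode and let the
// dashboard/scripts poll for completion. Mode strings follow
// scripts/benchmark_compare.sh ("standard" vs "batched").
import { ConvexHttpClient } from "convex/browser";
import { api } from "./convex/_generated/api";

const client = new ConvexHttpClient(process.env.CONVEX_URL!);
for (const mode of ["standard", "batched"] as const) {
  await client.mutation(api.benchmark.startBenchmark, { mode, count: 100 });
}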

Timeline visualization (example/convex/http.ts)

  • Real-time dashboard served from HTTP action
  • Per-workflow timeline showing step start/completion

Benchmark results (benchmark_results/)

  • Raw logs, status snapshots, and timeline data from runs at various scales

Scripts (scripts/)

  • benchmark_workflow_small.sh — run benchmarks and collect results
  • benchmark_compare.sh — compare standard vs executor mode
  • Various debug/soak test scripts

Documentation (docs/)

  • executor-mode.md — usage guide
  • executor-mode-spec.md — clean-room implementation spec
  • tuning-guide.md — tuning constants and benchmark results

coderabbitai Bot commented Feb 24, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This PR introduces executor mode, a sharded task queue architecture for high-throughput workflow execution. It adds new components (task queue, coordinator, executor actions), extends the client and schema, includes comprehensive benchmarks and documentation, and modifies routing logic in existing modules to support batch actions and executor-based step processing.
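
As orientation, the shard routing is deterministic: per the executor-mode docs note later in this review, a task's shard is derived from its workflow ID (hash(workflowId) % numShards), so a workflow's steps always land on the same shard. A minimal illustrative sketch, using a stand-in hash rather than the component's actual one:

// Illustrative stand-in for the deterministic shard mapping
// (hash(workflowId) % numShards); not the component's real hash function.
function shardForWorkflow(workflowId: string, numShards: number): number {
  let h = 0;
  for (let i = 0; i < workflowId.length; i++) {
    h = (h * 31 + workflowId.charCodeAt(i)) >>> 0; // 32-bit rolling hash
  }
  return h % numShards;
}

// The same workflow always maps to the same shard:
shardForWorkflow("wf_abc123", 8) === shardForWorkflow("wf_abc123", 8); // true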

Changes

Executor Mode Core (src/component/taskQueue.ts, src/component/coordinator.ts, src/component/schema.ts)
  • New sharded task queue with claim/record/replay operations, coordinator for batch workflow processing, and schema additions (taskQueue, executorEpoch, executorHandoff, coordinatorState tables; executorShards/readyToRun/batchBridgeHandle workflow fields)

Client API Extensions (src/client/index.ts, src/client/step.ts, src/client/workflowMutation.ts)
  • Added executor and batch action support: executor() method, setExecutorRef(), action() registration, batchBridge(), startExecutors(), batchActionNames propagation through step execution

Routing & Execution Logic (src/component/journal.ts, src/component/pool.ts, src/component/event.ts, src/component/workflow.ts)
  • Conditional routing for batch actions and executor-managed steps; removed workpool dependencies in favor of coordinator/task queue paths; executor-aware cancel logic cleaning taskQueue entries; replaced direct workpool invocation with scheduled directRunWorkflow; new timeline and counting queries

Benchmark Infrastructure (example/convex/benchmark.ts, example/convex/http.ts, example/convex/schema.ts, scripts/benchmark_*.sh, scripts/run_joke_batched_trace.sh, scripts/soak_workflow_20.sh, scripts/watch_workflow_debug.sh, scripts/check_regular.cjs)
  • New LLM-based workflow benchmarks (standard vs. executor modes), HTTP visualization page with timeline rendering, benchmark orchestration scripts, status aggregation, and tail latency diagnostics

Configuration & Setup (.mcp.json, package.json, example/convex/convex.config.ts)
  • MCP server configuration, Anthropic SDK dependency (@anthropic-ai/sdk ^0.75.0), workpool middleware registration

Documentation (docs/executor-mode-spec.md, docs/executor-mode.md, docs/tuning-guide.md, screenshots/d1024.md)
  • Comprehensive executor mode design specification, user guide with architecture/lifecycle/API details, tuning guide with knobs and profiles for different workloads, and benchmark observations

Benchmark Results (benchmark_results/*.json, benchmark_results/*.txt, benchmark_results/workflow_*.txt, benchmark_results/workflow_results_archive*.txt)
  • Data files from benchmark runs (batched 1000/100 items, workflow benchmarks with regular/batched mode comparisons, timeline metrics, soak test results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • Implement workflow.list and workflow.listByName #189: Modifies workflow listing APIs and query surfaces in src/component/workflow.ts; this PR significantly extends those same query surfaces with new timeline/counting/pagination operations, creating overlapping touchpoints in the listing/querying infrastructure.

Suggested reviewers

  • ianmacartney

Poem

🐰 A sharded queue hops into view,
With executors claiming tasks on cue,
Epochs align, handoffs flow smooth and true,
No more contention—the coordinator's breakthrough!
From benchmark to benchmark, the speedups ring loud, 🎉

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 8.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title 'chore: executor benchmarks, scripts, docs, and visualization' accurately describes the main additions in the PR: benchmarks, test scripts, documentation files, and HTTP visualization tooling for the executor mode feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


pkg-pr-new Bot commented Feb 24, 2026

Open in StackBlitz

npm i https://pkg.pr.new/get-convex/workflow/@convex-dev/workflow@209

commit: a974d24

coderabbitai Bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/component/journal.ts (1)

130-266: ⚠️ Potential issue | 🟠 Major

Executor path drops per-step retry/scheduling.
Lines 130-266: when executorShards is set, inserts use DEFAULT_QM_RETRY (or none for actions) and ignore stepArgs.retry/schedulerOptions, so retry: false, custom backoff, or runAt/runAfter are silently lost vs the workpool path. Please map retries into taskQueue and explicitly reject unsupported scheduling (or implement delayed tasks).

🛠️ Suggested normalization/guard
-        const { retry, schedulerOptions } = stepArgs;
+        const { retry, schedulerOptions } = stepArgs;
+        const retryConfig =
+          retry === false
+            ? undefined
+            : retry === true || retry === undefined
+              ? DEFAULT_QM_RETRY
+              : retry;
+        if (workflow.executorShards && schedulerOptions) {
+          throw new Error("schedulerOptions not supported in executor mode");
+        }
-                await ctx.db.insert("taskQueue", {
+                await ctx.db.insert("taskQueue", {
                   shard,
                   functionType: "query",
                   handle: step.handle,
                   args: step.args,
                   stepId,
                   workflowId: workflow._id,
                   generationNumber,
-                  retry: DEFAULT_QM_RETRY,
+                  retry: retryConfig,
                 });

Apply retryConfig similarly in mutation/action taskQueue inserts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/journal.ts` around lines 130 - 266, The executor path (when
workflow.executorShards is set) currently hardcodes DEFAULT_QM_RETRY (or omits
retry for actions) and ignores per-step retry/scheduling (stepArgs.retry and
schedulerOptions) leading to lost retry/backoff/runAt semantics; update the
taskQueue inserts (the ctx.db.insert("taskQueue") calls in the switch branches
for "query","mutation","action") to map stepArgs.retry into the inserted retry
configuration (and include any supported schedulerOptions fields like
runAt/runAfter/backoff), and for any unsupported scheduling options explicitly
throw or return an error so callers know they’re unsupported (use
shardForWorkflow, stepId, stepArgs.batchActionName and workflow.executorShards
to locate the relevant branches). Ensure behavior mirrors
workpool.enqueueQuery/Mutation/Action parameter handling or documents/rejects
divergences.
♻️ Duplicate comments (3)
benchmark_results/batched_1000_timeline_20260215_112330.json (1)

1-43: Same unpopulated timeline fields as batched_1000_timeline_20260215_111510.json.

durationMs: 0, elapsedMs: 0, concurrency: [], itemRows: [], steps: [] — see the note raised on batched_1000_timeline_20260215_111510.json. The pattern is consistent across both timeline snapshots, confirming a systematic recording gap rather than a one-off artifact.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/batched_1000_timeline_20260215_112330.json` around lines 1
- 43, The batched timeline object is being left unpopulated (durationMs,
elapsedMs, concurrency, itemRows, steps are empty/zero) — update the timeline
recording/finalization logic so the "batched" timeline is fully populated before
persisting: in the code paths that build the batched object (look for functions
or methods that construct or finalize the "batched" timeline or call a
finalizeTimeline/recordTimeline routine), calculate and set timelineStart and
timelineEnd (use run start/stop timestamps), compute durationMs and elapsedMs,
populate concurrency and steps arrays with the recorded events, and fill
itemRows with per-item timing/metadata; ensure this update runs for the batched
mode (batchedMaxWorkers/batchedMaxParallelism paths) and is invoked both on
normal completion and on retry/error code paths so the JSON snapshot contains
the expected fields.
benchmark_results/batched_1000_status_20260215_112330.json (1)

1-13: Same concern as the other benchmark JSON files — consider excluding from repo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/batched_1000_status_20260215_112330.json` around lines 1 -
13, The benchmark results file
benchmark_results/batched_1000_status_20260215_112330.json should not be
committed; remove it from the repository and stop tracking future similar files
by deleting or moving this file from source control (use git rm --cached on the
file to remove from the index while keeping local copy) and add an appropriate
pattern for benchmark_results/*.json (or the specific naming convention) to
.gitignore so future benchmark JSONs are not committed; ensure CI/artifacts or a
designated storage location is documented for keeping these outputs instead.
benchmark_results/batched_1000_timeline_20260215_115704.json (1)

1-43: Same concern as the other benchmark JSON file — consider excluding from repo.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/batched_1000_timeline_20260215_115704.json` around lines 1
- 43, This JSON (batched_1000_timeline_20260215_115704.json) is a generated
benchmark artifact that should be removed from the repo; delete the committed
file, add a .gitignore entry to exclude generated benchmark timeline files (e.g.
a wildcard like batched_*_timeline_*.json or a dedicated pattern for benchmark
outputs), and update any CI/docs to upload or store these artifacts outside
source control instead of committing them.
🧹 Nitpick comments (13)
scripts/check_regular.cjs (1)

2-2: Consider accepting the file path as a CLI argument instead of hardcoding /tmp/jokebattle_data.json.

The hardcoded path makes the script fragile across different machines or CI environments. process.argv[2] with a sensible default is an easy improvement.

♻️ Proposed refactor
-const d = readFileSync("/tmp/jokebattle_data.json", "utf8");
+const filePath = process.argv[2] ?? "/tmp/jokebattle_data.json";
+const d = readFileSync(filePath, "utf8");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check_regular.cjs` at line 2, The script currently hardcodes the
input file path in the call to fs.readFileSync (const d =
fs.readFileSync("/tmp/jokebattle_data.json", "utf8")), making it brittle; change
this to read the path from process.argv[2] with a sensible default (e.g. use
process.argv[2] || "/tmp/jokebattle_data.json"), update any related variable
names that consume the file contents (e.g. d) and ensure the script prints a
helpful usage message or default note when no argument is provided; keep the
fs.readFileSync call but replace the literal path with the variable so the file
path can be passed via CLI.
scripts/watch_workflow_debug.sh (1)

7-22: Extract json_last to a shared helper function.

This function is duplicated identically across three scripts: scripts/watch_workflow_debug.sh, scripts/benchmark_workflow_small.sh, and scripts/benchmark_joke_battle.sh. Create a shared helper (e.g., scripts/lib/common.sh) and source it from each script to eliminate duplication.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/watch_workflow_debug.sh` around lines 7 - 22, Move the json_last
function out of each script into a single shared helper file (e.g., create
scripts/lib/common.sh) that defines json_last exactly once, then update
scripts/watch_workflow_debug.sh, scripts/benchmark_workflow_small.sh, and
scripts/benchmark_joke_battle.sh to source that helper (source
"scripts/lib/common.sh") and remove the duplicated function bodies; ensure the
helper is executable/readable and that the function name json_last remains
unchanged so callers keep working.
scripts/benchmark_workflow_small.sh (1)

37-50: Avoid clobbering the fixed /tmp error log.

A shared /tmp/workflow_bench_err.log can be overwritten if runs overlap, making debugging harder. Consider a per-run temp log (PID-based or mktemp).

♻️ Proposed refactor
 run_one() {
   local mode="$1"
   local topic="$2"
+  local err_log="/tmp/workflow_bench_err.${mode}.$$"
   if [[ "$SKIP_CLEAR" != "1" ]]; then
     run_convex llmSimulation:clearAll "{}" >/dev/null || true
     sleep 1
   fi
@@
-    st="$(run_convex llmSimulation:benchmarkStatus "{\"simulationId\":\"$simulation_id\"}" 2>/tmp/workflow_bench_err.log)"
+    st="$(run_convex llmSimulation:benchmarkStatus "{\"simulationId\":\"$simulation_id\"}" 2>"$err_log")"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_workflow_small.sh` around lines 37 - 50, The script's
run_one function currently writes stderr to a fixed /tmp/workflow_bench_err.log
which can be clobbered by concurrent runs; update run_one to create a per-run
temporary error log (e.g., use mktemp or include $$/PID in the filename), store
that temp path in a local variable (e.g., err_log), and replace all references
to /tmp/workflow_bench_err.log with that variable (including cleanup at the end
of run_one); ensure the temp file is created before the loop and removed on exit
so concurrent runs do not overwrite each other's logs.
scripts/benchmark_compare.sh (3)

36-39: Three separate python3 invocations to parse the same JSON are wasteful.

You could parse all three fields in a single call:

Proposed consolidation
-    local completed failed running
-    completed="$(echo "$raw" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("completed",0))' 2>/dev/null || echo 0)"
-    failed="$(echo "$raw" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("failed",0))' 2>/dev/null || echo 0)"
-    running="$(echo "$raw" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("running",0))' 2>/dev/null || echo 0)"
+    local parsed
+    parsed="$(echo "$raw" | python3 -c '
+import json, sys
+d = json.load(sys.stdin)
+print(d.get("completed", 0), d.get("failed", 0), d.get("running", 0))
+' 2>/dev/null || echo "0 0 0")"
+    local completed failed running
+    read -r completed failed running <<< "$parsed"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_compare.sh` around lines 36 - 39, Replace the three
separate python3 invocations that extract "completed", "failed", and "running"
from "$raw" with a single python3 call that reads the JSON once and prints the
three values (e.g., space-separated or newline), then read those into the shell
variables; update the code around variables completed, failed, running to use a
single python3 invocation that returns defaults of 0 when keys are missing and
assign with read -r completed failed running (or mapfile) to avoid repeated JSON
parsing.

46-48: eval for dynamic variable assignment is fragile — consider alternatives.

While the mode argument is currently hardcoded ("batched"/"standard"), eval is a common source of injection bugs if that ever changes. A safer pattern uses declare or an associative array.

Example using declare
-      eval "${mode}_elapsed=$elapsed"
-      eval "${mode}_completed=$completed"
-      eval "${mode}_failed=$failed"
+      declare -g "${mode}_elapsed=$elapsed"
+      declare -g "${mode}_completed=$completed"
+      declare -g "${mode}_failed=$failed"

Apply the same change at lines 54-56.

Also applies to: 54-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_compare.sh` around lines 46 - 48, The script uses eval to
assign dynamic variables (e.g., eval "${mode}_elapsed=$elapsed", eval
"${mode}_completed=$completed", eval "${mode}_failed=$failed") which is fragile
and can lead to injection; replace these eval assignments with a safer approach
such as using declare to create the dynamic variable names (declare
"${mode}_elapsed=$elapsed" etc.) or, better, refactor to store results in an
associative array (e.g., results[${mode}_elapsed]=$elapsed) and update the code
to read from that array; apply the same change for the corresponding eval calls
at the other location (lines 54-56).

24-25: start_out is captured but never used.

The return value of startBenchmark is assigned to start_out but never referenced. Either remove the variable or use it (e.g., to log the run ID).

Proposed fix
-  local start_out
-  start_out="$(convex_run "benchmark:startBenchmark" "{\"mode\":\"$mode\",\"count\":$COUNT}")"
+  convex_run "benchmark:startBenchmark" "{\"mode\":\"$mode\",\"count\":$COUNT}" >/dev/null
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/benchmark_compare.sh` around lines 24 - 25, The script assigns the
result of convex_run "benchmark:startBenchmark" to start_out but never uses it;
either remove start_out and just call convex_run for side effects, or capture
and use it (e.g., parse and echo a run ID/info). Update the invocation around
start_out and ensure the symbol start_out (and the convex_run call to
"benchmark:startBenchmark") is either removed or its value is logged/consumed
(for example echoing the returned run ID together with mode and COUNT) so the
assignment is meaningful.
src/client/step.ts (1)

191-205: Base-name matching could produce false positives across modules.

Splitting on :/ and taking only the last segment means two functions like "moduleA:process" and "moduleB:process" would both match if "process" is in batchActionNames. This is probably intentional given how batch actions are registered, but worth noting if module-qualified names become necessary in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/step.ts` around lines 191 - 205, The current base-name matching
logic (involving safeFunctionName(target.function), splitting on /[:\/]/ and
comparing the last segment against this.batchActionNames) can produce false
positives across modules; to fix, narrow matching by preferring full-qualified
names first and falling back to base-name only when necessary: update the
detection in the code that sets batchActionName to first check
this.batchActionNames.has(fnName) (the full safeFunctionName), then if not found
check the existing base-name fallback (parts[parts.length-1]) and only set
batchActionName to that when explicitly required; ensure this preserves behavior
for target.kind === "function" and target.functionType === "action" and keep the
variable name batchActionName unchanged.
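
A hedged sketch of the narrowed matching described above, checking the fully qualified name before falling back to the base name; only safeFunctionName's output shape, batchActionNames, and the split on ":" or "/" come from the code under review, the rest is assumed:

// Sketch: prefer an exact, module-qualified match over the base-name fallback.
function resolveBatchActionName(
  fnName: string, // e.g. "moduleA:process", as produced by safeFunctionName
  batchActionNames: Set<string>,
): string | undefined {
  if (batchActionNames.has(fnName)) return fnName; // full match wins
  const base = fnName.split(/[:/]/).pop(); // last segment, e.g. "process"
  return base && batchActionNames.has(base) ? base : undefined;
}
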
scripts/soak_workflow_20.sh (2)

22-25: json_field interpolates the field name directly into Python code — safe here but fragile.

Since json_field is only called with hardcoded field names, there's no current injection risk. However, a safer pattern uses sys.argv:

Safer alternative
 json_field() {
   local field="$1"
-  python3 -c "import json,sys; d=json.load(sys.stdin); print(d['$field'])"
+  python3 -c "import json,sys; d=json.load(sys.stdin); print(d[sys.argv[1]])" "$field"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/soak_workflow_20.sh` around lines 22 - 25, The json_field function
currently interpolates the field name directly into the inline Python code
(json_field), which is fragile; change it to pass the field name as an argument
via sys.argv to avoid injecting shell content into the Python snippet, e.g. call
python3 with the field as an argv parameter and use sys.argv[1] inside the
Python code, and update any callers of json_field if needed to continue passing
the field name as the first parameter.

41-67: Line 67 is dead code — the while true loop always sets done=1 before break.

Both exit paths from the loop (lines 55-56 and lines 62-63) set done=1 before break, and there is no other way to exit the while true loop. The guard on line 67 will therefore never increment failures.

Proposed fix — remove the dead code
-  done=0
   while true; do
     ...
-      done=1
       break
     fi
     if [[ "$elapsed_s" -ge "$TIMEOUT_SECS" ]]; then
       echo "timeout run=$i"
       run_secs+=("$elapsed_s")
       failures=$((failures + 1))
-      done=1
       break
     fi
     sleep "$POLL_SECS"
   done
-  [[ "$done" -eq 1 ]] || failures=$((failures + 1))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/soak_workflow_20.sh` around lines 41 - 67, The guard at the end using
the done flag is dead code because the while true loop always sets done=1 before
breaking; remove the final check/increment line ("[[ \"$done\" -eq 1 ]] ||
failures=$((failures + 1))") so failures is only updated where the loop exits
(inside the non-running and timeout branches). Locate the while true loop that
calls run_convex llmSimulation:benchmarkStatus and updates status, elapsed_s,
run_secs and failures, and delete the trailing dead-code guard referencing done.
benchmark_results/batched_1000_timeline_20260215_115155.json (1)

1-43: Consider excluding benchmark result artifacts from the repository.

These JSON files are ephemeral, machine-generated snapshots that will accumulate over time and bloat the repository history. They also contain run-specific IDs and timestamps with no reuse value.

Consider adding benchmark_results/ to .gitignore and storing these artifacts externally (CI artifacts, cloud storage, etc.) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/batched_1000_timeline_20260215_115155.json` around lines 1
- 43, Add the benchmark_results/ directory to .gitignore and stop committing
ephemeral JSON artifacts: update .gitignore to include the literal entry
"benchmark_results/" and commit that change, then remove any already-tracked
files from git (e.g., using git rm --cached on files under benchmark_results/)
and commit the removal so history no longer grows; finally, configure CI or
external storage to persist these artifacts instead of the repository (ensure
any pipeline or job names that produce these files push them to CI artifacts or
cloud storage).
scripts/run_joke_batched_trace.sh (1)

5-5: DEBUG_TRACE is interpolated as a raw JSON value — non-boolean inputs will produce invalid JSON.

DEBUG_TRACE defaults to "false" which works since it's a valid JSON boolean literal. However, inputs like DEBUG_TRACE=1 or DEBUG_TRACE=yes would produce malformed JSON in the argument string on line 29. Consider validating or coercing:

Optional guard
+[[ "$DEBUG_TRACE" == "true" ]] || DEBUG_TRACE="false"
+
 echo "debugTrace=$DEBUG_TRACE count=$COUNT"

Also applies to: 29-29

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/run_joke_batched_trace.sh` at line 5, The DEBUG_TRACE variable is
inserted as a raw JSON literal and accepts arbitrary input; coerce/validate it
to a JSON boolean before use. Replace the simple default assignment of
DEBUG_TRACE with a small normalization block that maps common truthy values
(e.g., 1, "1", yes, y, true, TRUE) to the literal true and everything else to
false (for example using a case/regex check or if-statement), and then use that
normalized DEBUG_TRACE variable in the JSON argument construction so the
resulting JSON contains only the literal true or false; reference the
DEBUG_TRACE variable and the JSON argument construction where the variable is
interpolated.
src/client/workflowMutation.ts (1)

45-49: Consider grouping optional parameters into an options object for better ergonomics.

Two trailing optional positional parameters mean a caller who needs only batchActionNames must write:

workflowMutation(component, registered, undefined, batchActionNames)

Consolidating into a single options bag avoids this:

♻️ Suggested refactor
 export function workflowMutation<ArgsValidator extends PropertyValidators>(
   component: WorkflowComponent,
   registered: WorkflowDefinition<ArgsValidator>,
-  defaultWorkpoolOptions?: WorkpoolOptions,
-  batchActionNames?: Set<string>,
+  options?: {
+    workpoolOptions?: WorkpoolOptions;
+    batchActionNames?: Set<string>;
+  },
 ): RegisteredMutation<...> {
   const workpoolOptions = {
-    ...defaultWorkpoolOptions,
+    ...options?.workpoolOptions,
     ...registered.workpoolOptions,
   };
   // ...
   const executor = new StepExecutor(
     // ...
-    batchActionNames,
+    options?.batchActionNames,
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/workflowMutation.ts` around lines 45 - 49, Replace the two
trailing optional positional parameters on workflowMutation
(defaultWorkpoolOptions and batchActionNames) with a single optional options
object to improve ergonomics: change the signature of workflowMutation to accept
an options?: { defaultWorkpoolOptions?: WorkpoolOptions; batchActionNames?:
Set<string> } (or similar), update the body to read
options.defaultWorkpoolOptions and options.batchActionNames, and update all call
sites to pass an options object instead of using positional undefined
placeholders; optionally add a lightweight overload or compatibility branch in
workflowMutation to accept the old positional form and map it to the new options
shape while emitting a deprecation note to callers.
src/component/workflow.ts (1)

330-357: Sort steps by stepNumber in timelinePage for stable ordering.
The DB collect order isn’t guaranteed; sorting avoids shuffled bars in the timeline viz.

Suggested fix
-        const stepDocs = await ctx.db
+        const stepDocs = await ctx.db
           .query("steps")
           .withIndex("workflow", (q) => q.eq("workflowId", wf._id))
           .collect();
+        stepDocs.sort((a, b) => a.stepNumber - b.stepNumber);
         return {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/workflow.ts` around lines 330 - 357, The timeline steps can
appear in nondeterministic DB order; before mapping stepDocs into the steps
array in the Promise.all mapping (the async wf => { ... } block that builds
page), sort the collected stepDocs by their stepNumber (or sort the produced
steps array by stepNumber) to ensure stable ordering for the timeline
visualization; update the code around the stepDocs handling (the stepDocs
variable and the steps: stepDocs.map(...) block) to perform a numeric sort by
stepNumber prior to mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.mcp.json:
- Around line 1-13: The .mcp.json currently contains user-specific absolute
paths (the "args" entry pointing to
"/Users/magicseth/Projects/claudemanager/dist/main/mcp-server.js" and the "env"
value for "MCP_SOCKET_PATH"), so replace those with non-user-specific defaults
and move this file to a template: create a .mcp.example.json that uses
repo-relative paths (e.g., "args": ["./dist/main/mcp-server.js"]) and a
placeholder or env interpolation for MCP_SOCKET_PATH (e.g., use an explicitly
documented placeholder value or reference an env var like "${MCP_SOCKET_PATH}"),
update README with instructions for creating a local .mcp.json from
.mcp.example.json, and add the real .mcp.json to .gitignore so developers/CI can
supply machine-specific paths without committing them.

In `@1npm`:
- Around line 1-108: The file contains a duplicated CLI help output for "convex
dev" (the entire help block is repeated); remove the duplicate so only one
instance of the help text remains (or delete the file if this output should not
be committed), ensuring the single retained block includes the full options list
and usage lines for "convex dev".

In `@benchmark_results/batched_1000_timeline_20260215_111510.json`:
- Around line 1-43: The batched run timeline fields (batched.jokesDone,
judgesDone, picksDone, maxConcurrency, concurrency, itemRows, durationMs,
elapsedMs, steps, result) are not being populated for large batches; inspect and
fix the timeline aggregation path (e.g., finalizeBatchedRun / recordTimeline /
aggregateWorkerTimeline) to ensure it iterates over all items/workers instead of
being truncated by a hardcoded cap or page size, correctly accumulates counts
and maxConcurrency, appends all itemRows and concurrency samples, computes
durationMs/elapsedMs from startedAt/completedAt using safe numeric types, and
sets result when aggregation completes; update any early-return conditions or
try/catch that swallow errors so the batched object is assigned the computed
values for the runId.

In `@benchmark_results/workflow_benchmark_small_1000_2026-02-15_00-08-07.txt`:
- Around line 1-4: This benchmark run shows errors from the CONVEX cleanup step
(CONVEX ?(llmSimulation:clearAll)) reporting "Simulation not found" for IDs
'j9712dprjeky0709w2098prf65816c0h' and 'j97507jy904tpv1erkkkzgt1yx8172qh';
before archiving, either move this file out of the successful archive into a
failed/partial folder or update the file/metadata to annotate the run as
incomplete and include the exact error lines and the simulation IDs so consumers
won't treat this as a successful workflow_small benchmark (section_count=1000).

In `@benchmark_results/workflow_results_archive_latest.txt`:
- Around line 2-4: The file contains an absolute local path in the repo metadata
(the "repo=" line) which leaks local filesystem info; change the "repo=" value
in benchmark_results/workflow_results_archive_latest.txt to a repo-relative
identifier or a sanitized placeholder (e.g., repository name or commit hash)
instead of an absolute /Users/... path so the artifact is portable and
non-identifying.

In `@docs/executor-mode-spec.md`:
- Around line 42-66: Several fenced code blocks (ASCII diagrams such as the
"User mutation └─ workflow.start() └─ workflow.create mutation (component) ..."
and other similar blocks) are missing language identifiers which triggers
markdownlint MD040; update each of those fenced blocks (the diagram blocks
referencing executor(shard=0/1/.../N-1), taskQueue, flush loop, and the other
listed blocks) by adding a language tag like ```text (or ```typescript where
appropriate) at the opening fence so the blocks become e.g. ```text ... ```;
ensure you apply this change to all reported blocks (the diagram shown plus the
other occurrences) so every fenced block includes a language identifier.
- Line 1035: Fix the typo in the docs for the POLL_BACKOFF_MS entry: change
"budge" to "budget" in the table cell describing POLL_BACKOFF_MS so the sentence
reads "wastes query budget" (refer to the `POLL_BACKOFF_MS` table row in
executor-mode-spec.md).

In `@docs/executor-mode.md`:
- Around line 196-202: The constants in the executor-mode table (CLAIM_LIMIT,
MAX_CONCURRENCY, POLL_BACKOFF_MS, MAX_EMPTY_POLLS, RESCHEDULE_MS) conflict with
the values in the tuning guide; update the table to match the authoritative
tuning-guide values (e.g., set CLAIM_LIMIT to 800, POLL_BACKOFF_MS to 200,
MAX_CONCURRENCY to 500, MAX_EMPTY_POLLS to 150) and ensure RESCHEDULE_MS matches
the implementation/guide as well, adjusting the table row text and any
explanatory text so both docs present the same defaults.
- Around line 93-97: Replace the inaccurate description that tasks are assigned
via Math.random with the current deterministic shard mapping: explain that
executor mode uses a hash-based assignment (e.g., hash(workflowId) % numShards)
so the same workflowId always maps to the same shard; update any examples or
tuning guidance that assume uniform random distribution to reflect that skew and
affinity are determined by workflowId hashing rather than Math.random.

In `@example/convex/benchmark.ts`:
- Around line 140-150: The file defines a BenchmarkMode type alias that's
unused, so update function signatures to use that alias instead of inline
unions: change doWork(mode: "simulated" | "real", ...) to use mode:
BenchmarkMode, and similarly update any other functions mentioned (e.g.,
simulateWork, callClaude or the function at the 162-163 region) to accept
BenchmarkMode; alternatively, if you prefer removal, delete the BenchmarkMode
alias and keep the inline unions—make the change consistently where mode is
typed to eliminate the no-unused-vars warning.

In `@example/convex/http.ts`:
- Around line 101-114: The code reads params via URLSearchParams and casts
params.get("after") to a Number in createdAfter without validation, so guard the
parsed value (createdAfter) using Number.isFinite (or Number.isNaN/isFinite
checks) and, if invalid, replace the current friendly missing-parameter UI (the
document.body.innerHTML block) with a similar message that the ?after= value is
not a valid timestamp and instructs the user how to obtain a numeric startedAt;
after showing the message, throw an Error to stop execution. Ensure you locate
and update the logic around URLSearchParams/params, the createdAfter assignment,
and the existing document.body.innerHTML error block so invalid non-numeric
input is handled the same way as a missing parameter.

In `@package.json`:
- Around line 65-67: The package.json currently lists "@anthropic-ai/sdk" in
"dependencies" but it's only used by the example (example/convex/benchmark.ts);
move "@anthropic-ai/sdk": "^0.75.0" from "dependencies" into "devDependencies"
in package.json, keep the same version spec, update the lockfile (npm/yarn/pnpm
install) so consumers won't receive this runtime dependency, and verify the
example still resolves the package during local development/test runs.

In `@screenshots/d1024.md`:
- Line 12: The Viz URL in screenshots/d1024.md (the string
"https://cautious-quail-607.convex.site/benchmark-viz?after=1771390151616") is
deployment-specific and may break; update the markdown to either (a) replace the
live link with an archived/static asset (commit a screenshot image into the repo
and link to that file), or (b) keep the live link but add an explicit
ephemeral/disposable-note next to it (e.g., "ephemeral deployment — may be
removed") so readers know it can become a dead link; ensure the referenced URL
string and any alt text reflect the change.

In `@scripts/check_regular.cjs`:
- Around line 2-3: Wrap the synchronous file read and JSON parse in a try/catch
around the fs.readFileSync("/tmp/jokebattle_data.json", "utf8") and
JSON.parse(d) calls (variables d and j) to handle ENOENT and SyntaxError: on
ENOENT log a clear message that the file is missing, on SyntaxError log that the
JSON is malformed (including the error.message), and for any other error log it
and exit non‑zero; alternatively check fs.existsSync before reading and still
catch JSON.parse failures to provide actionable messages rather than letting the
raw exceptions bubble.
- Around line 1-3: Rename the file from check_regular.cjs to check_regular.mjs
and convert CommonJS to ESM by replacing the require("fs") usage with an ESM
import (e.g., import fs from "fs" or import { readFileSync } from "fs"), keep
using readFileSync("/tmp/jokebattle_data.json", "utf8") and JSON.parse(d) as
before, and ensure package.json ("type": "module") is compatible; alternatively,
if you prefer to keep .cjs, add an eslint.config.js override to mark scripts/
with the node environment or exclude it from linting so no ESLint
no-undef/require errors occur.

In `@scripts/watch_workflow_debug.sh`:
- Around line 43-57: The inline Python block invoked with python3 -c that reads
json into d and prints fields (d["pendingTotal"], d["pendingBySlot"], and
iterates d["simulations"]) can raise KeyError/Exception and, under set -euo
pipefail, will exit the whole script; update that block to guard the parse/print
work with a try/except that catches Exception (including
KeyError/JSONDecodeError), writes a short diagnostic to stderr, and exits with
code 0 so the surrounding while true loop continues; locate the python3 -c '...
d = json.load(sys.stdin) ... print("pending_total=...") ...' block and add the
exception handling around the parsing/printing logic.

In `@src/client/index.ts`:
- Around line 6-7: Replace the permissive "type BatchWorkpool = any" with a
minimal local interface named BatchWorkpool that declares only the
methods/properties your code actually calls (e.g., the run/submit/close/stop
signatures or iterator/length properties used elsewhere); update
src/client/index.ts to export or use that interface in place of any so callers
get type safety, and add a TODO comment to remove this local interface once the
official BatchWorkpool type is exported by `@convex-dev/workpool`. Ensure the
interface method names/signatures exactly match uses in functions that accept a
BatchWorkpool so the compiler will catch API drift.

In `@src/component/coordinator.ts`:
- Around line 10-21: The ensureCoordinatorRunning function can create duplicate
coordinatorState rows under concurrency; change the logic to enforce a singleton
state record: use a fixed unique key/ID for the coordinator state (instead of
blind insert) and perform an atomic upsert/conditional update so only one record
is created and its scheduled flag is set (replace the current ctx.db.insert call
with an upsert or a transaction that patches-or-inserts by the fixed _id); also
add a small cleanup step (query all coordinatorState rows and collapse/delete
duplicates, keeping the single canonical record) to guard existing databases.
Target symbols: ensureCoordinatorRunning, coordinatorState query/first(),
ctx.db.patch, ctx.db.insert and the scheduler call
internal.coordinator.coordinator.

In `@src/component/pool.ts`:
- Around line 97-99: The onCompleteHandler currently creates a logger with a
hard-coded DEFAULT_LOG_LEVEL (const console = createLogger(DEFAULT_LOG_LEVEL)),
which ignores any per-workflow/workpool logLevel; update onCompleteHandler to
read an optional log level from the provided context (e.g.,
args.context.logLevel or args.context.onComplete?.logLevel) and pass that into
createLogger instead of DEFAULT_LOG_LEVEL, falling back to DEFAULT_LOG_LEVEL
only when the context value is absent; adjust variable references around
createLogger, console and any callers in onCompleteHandler to use the resolved
log level so verbosity remains configurable.

In `@src/component/taskQueue.ts`:
- Line 3: The import statement in taskQueue.ts currently imports MutationCtx but
it is unused; remove MutationCtx from the named imports in the import from
"./_generated/server.js" (i.e., update the import that includes mutation and
query so it no longer imports MutationCtx) to satisfy the linter and eliminate
the unused-symbol warning.
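
Returning to the src/component/coordinator.ts singleton note above, a hedged sketch of the patch-or-insert shape it suggests; the coordinatorState fields beyond a scheduled flag are assumptions, not code from this PR:

// Sketch: keep a single coordinatorState row by patching the existing row
// and inserting only when none exists. Convex mutations are transactional,
// which keeps this read-then-write consistent within one call.
import type { MutationCtx } from "./_generated/server.js";

async function ensureCoordinatorRunning(ctx: MutationCtx): Promise<void> {
  const existing = await ctx.db.query("coordinatorState").first();
  if (existing) {
    if (!existing.scheduled) {
      await ctx.db.patch(existing._id, { scheduled: true });
    }
    return;
  }
  await ctx.db.insert("coordinatorState", { scheduled: true });
}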


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1552ab and b5cd3ea.

⛔ Files ignored due to path filters (5)
  • example/convex/_generated/api.d.ts is excluded by !**/_generated/**
  • package-lock.json is excluded by !**/package-lock.json
  • src/component/_generated/api.ts is excluded by !**/_generated/**
  • src/component/_generated/component.ts is excluded by !**/_generated/**
  • src/component/_generated/server.ts is excluded by !**/_generated/**
📒 Files selected for processing (78)
  • .mcp.json
  • 1cd
  • 1npm
  • benchmark_results/batched100_logs_20260215_103112.jsonl
  • benchmark_results/batched100_logs_filtered_20260215_103112.txt
  • benchmark_results/batched100_status_20260215_103112.json
  • benchmark_results/batched100_timeline_20260215_103112.json
  • benchmark_results/batched100_trace_20260215_103112.txt
  • benchmark_results/batched_1000_logs_20260215_104803.txt
  • benchmark_results/batched_1000_logs_20260215_105644.txt
  • benchmark_results/batched_1000_logs_20260215_111119.txt
  • benchmark_results/batched_1000_logs_20260215_111510.txt
  • benchmark_results/batched_1000_logs_20260215_112033.txt
  • benchmark_results/batched_1000_logs_20260215_112330.txt
  • benchmark_results/batched_1000_logs_20260215_113347.txt
  • benchmark_results/batched_1000_logs_20260215_115155.txt
  • benchmark_results/batched_1000_logs_20260215_115704.txt
  • benchmark_results/batched_1000_status_20260215_104803.json
  • benchmark_results/batched_1000_status_20260215_104803.txt
  • benchmark_results/batched_1000_status_20260215_105644.json
  • benchmark_results/batched_1000_status_20260215_105644.txt
  • benchmark_results/batched_1000_status_20260215_111119.json
  • benchmark_results/batched_1000_status_20260215_111119.txt
  • benchmark_results/batched_1000_status_20260215_111510.json
  • benchmark_results/batched_1000_status_20260215_111510.txt
  • benchmark_results/batched_1000_status_20260215_112033.json
  • benchmark_results/batched_1000_status_20260215_112033.txt
  • benchmark_results/batched_1000_status_20260215_112330.json
  • benchmark_results/batched_1000_status_20260215_112330.txt
  • benchmark_results/batched_1000_status_20260215_113347.json
  • benchmark_results/batched_1000_status_20260215_113347.txt
  • benchmark_results/batched_1000_status_20260215_115155.json
  • benchmark_results/batched_1000_status_20260215_115155.txt
  • benchmark_results/batched_1000_status_20260215_115704.json
  • benchmark_results/batched_1000_status_20260215_115704.txt
  • benchmark_results/batched_1000_timeline_20260215_104803.json
  • benchmark_results/batched_1000_timeline_20260215_105644.json
  • benchmark_results/batched_1000_timeline_20260215_111119.json
  • benchmark_results/batched_1000_timeline_20260215_111510.json
  • benchmark_results/batched_1000_timeline_20260215_112033.json
  • benchmark_results/batched_1000_timeline_20260215_112330.json
  • benchmark_results/batched_1000_timeline_20260215_113347.json
  • benchmark_results/batched_1000_timeline_20260215_115155.json
  • benchmark_results/batched_1000_timeline_20260215_115704.json
  • benchmark_results/workflow_benchmark_small_1000_2026-02-14.txt
  • benchmark_results/workflow_benchmark_small_1000_2026-02-15_00-08-07.txt
  • benchmark_results/workflow_benchmark_small_1000_2026-02-15_00-12-56_skipclear.txt
  • benchmark_results/workflow_benchmark_small_500_2026-02-14.txt
  • benchmark_results/workflow_benchmark_small_50_2026-02-14.txt
  • benchmark_results/workflow_results_archive_2026-02-15_00-06-51.txt
  • benchmark_results/workflow_results_archive_latest.txt
  • docs/executor-mode-spec.md
  • docs/executor-mode.md
  • docs/tuning-guide.md
  • example/convex/benchmark.ts
  • example/convex/convex.config.ts
  • example/convex/http.ts
  • example/convex/schema.ts
  • package.json
  • screenshots/d1024.md
  • scripts/benchmark_compare.sh
  • scripts/benchmark_joke_battle.sh
  • scripts/benchmark_workflow_small.sh
  • scripts/check_regular.cjs
  • scripts/run_joke_batched_trace.sh
  • scripts/soak_workflow_20.sh
  • scripts/watch_workflow_debug.sh
  • src/client/index.ts
  • src/client/step.ts
  • src/client/workflowMutation.ts
  • src/component/batch.ts
  • src/component/coordinator.ts
  • src/component/event.ts
  • src/component/journal.ts
  • src/component/pool.ts
  • src/component/schema.ts
  • src/component/taskQueue.ts
  • src/component/workflow.ts

Comment thread .mcp.json
Comment on lines +1 to +13
{
"mcpServers": {
"hive": {
"type": "stdio",
"command": "node",
"args": [
"/Users/magicseth/Projects/claudemanager/dist/main/mcp-server.js"
],
"env": {
"MCP_SOCKET_PATH": "/Users/magicseth/Library/Application Support/Hive/mcp.sock"
}
}
}
}

⚠️ Potential issue | 🟠 Major

Avoid committing user-specific absolute paths in .mcp.json.

Lines 7 and 10 hardcode /Users/magicseth/..., which will break for other devs/CI and leaks a local machine path. Consider moving this to a template (e.g., .mcp.example.json + .gitignore) or switching to repo‑relative paths and externally supplied environment variables.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.mcp.json around lines 1 - 13, The .mcp.json currently contains
user-specific absolute paths (the "args" entry pointing to
"/Users/magicseth/Projects/claudemanager/dist/main/mcp-server.js" and the "env"
value for "MCP_SOCKET_PATH"), so replace those with non-user-specific defaults
and move this file to a template: create a .mcp.example.json that uses
repo-relative paths (e.g., "args": ["./dist/main/mcp-server.js"]) and a
placeholder or env interpolation for MCP_SOCKET_PATH (e.g., use an explicitly
documented placeholder value or reference an env var like "${MCP_SOCKET_PATH}"),
update README with instructions for creating a local .mcp.json from
.mcp.example.json, and add the real .mcp.json to .gitignore so developers/CI can
supply machine-specific paths without committing them.
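
For illustration, a minimal .mcp.example.json along the lines suggested above (the ./dist path and the ${MCP_SOCKET_PATH} placeholder are hypothetical stand-ins, not values taken from this repo):

{
  "mcpServers": {
    "hive": {
      "type": "stdio",
      "command": "node",
      "args": ["./dist/main/mcp-server.js"],
      "env": {
        "MCP_SOCKET_PATH": "${MCP_SOCKET_PATH}"
      }
    }
  }
}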

Comment thread 1npm Outdated
Comment on lines +1 to +108
error: too many arguments for 'dev'. Expected 0 arguments but got 2.
error: too many arguments for 'dev'. Expected 0 arguments but got 2.


Usage: convex dev [options]

Develop against a dev deployment, watching for changes

1. Configures a new or existing project (if needed)
2. Updates generated types and pushes code to the configured dev deployment
3. Runs the provided command (if `--run` or `--run-sh` is used)
4. Watches for file changes, and repeats step 2


Options:
-v, --verbose Show full listing of changes
--typecheck <mode> Check TypeScript files with `tsc --noEmit`.
(choices: "enable", "try", "disable", default:
"try")
--typecheck-components Check TypeScript files within component
implementations with `tsc --noEmit`. (default:
false)
--codegen <mode> Regenerate code in `convex/_generated/`
(choices: "enable", "disable", default:
"enable")
--once Execute only the first 3 steps, stop on any
failure (default: false)
--until-success Execute only the first 3 steps, on failure
watch for local and remote changes and retry
steps 2 and 3 (default: false)
--run <functionName> The identifier of the function to run in step
3, like `api.init.createData` or
`myDir/myFile:myFunction`
--run-component <functionName> If --run is used and the function is in a
component, the path the component tree defined
in convex.config.ts. Components are a beta
feature. This flag is unstable and may change
in subsequent releases.
--run-sh <command> A shell command to run in step 3, like `node
myScript.js`. If you just want to run a Convex
function, use `--run` instead.
--tail-logs [mode] Choose whether to tail Convex function logs in
this terminal (choices: "always",
"pause-on-deploy", "disable", default:
"pause-on-deploy")
--configure [choice] Ignore existing configuration and configure
new or existing project, interactively or set
by --team <team_slug>, --project
<project_slug>, and --dev-deployment
local|cloud (choices: "new", "existing")
--env-file <envFile> Path to a custom file of environment
variables, for choosing the deployment, e.g.
CONVEX_DEPLOYMENT or CONVEX_SELF_HOSTED_URL.
Same format as .env.local or .env files, and
overrides them.
-h, --help display help for command
[second, verbatim copy of the `convex dev` help output elided]

⚠️ Potential issue | 🟡 Minor

Deduplicate or drop this CLI output artifact.

This file appears to be duplicated help output (entire help block repeated). If it’s meant to be checked in, trim to a single occurrence; otherwise consider removing the file to avoid noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@1npm` around lines 1 - 108, The file contains a duplicated CLI help output
for "convex dev" (the entire help block is repeated); remove the duplicate so
only one instance of the help text remains (or delete the file if this output
should not be committed), ensuring the single retained block includes the full
options list and usage lines for "convex dev".

Comment thread benchmark_results/batched_1000_timeline_20260215_111510.json
Comment on lines +1 to +43
{
"batched": {
"batchedMaxConcurrencyPerWorker": 1000,
"batchedMaxParallelism": 100,
"batchedMaxWorkers": 2,
"completedAt": 1771183136898,
"completedWorkflows": 1000,
"concurrency": [],
"count": 1000,
"durationMs": 0,
"elapsedMs": 0,
"failedWorkflows": 0,
"itemRows": [],
"jokesDone": 0,
"jokesTotal": 2000,
"judgesDone": 0,
"judgesTotal": 1000,
"maxConcurrency": 0,
"mode": "batched",
"picksDone": 0,
"picksTotal": 1000,
"regularMaxParallelism": 50,
"result": null,
"retry": {
"base": 2,
"initialBackoffMs": 200,
"maxAttempts": 4
},
"runId": "jh79gh7s96qn90zp247mh68s5d81691f",
"startedAt": 1771182912234,
"status": "completed",
"steps": [],
"tieCount": 316,
"timelineEnd": 1771182912234,
"timelineStart": 1771182912234,
"winner1Count": 315,
"winner2Count": 369
},
"count": 1000,
"globalTimelineEnd": 1771182912234,
"globalTimelineStart": 1771182912234,
"regular": null
}

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed:

# Search for code that writes or generates timeline snapshots
rg -n --type=ts --type=js 'timelineStart|timelineEnd|durationMs|itemRows|concurrency' -A 3 -B 3 | head -100

# Find files in benchmark_results to check if this is a pattern
find . -name "*timeline*.json" -type f | head -20

# Search for code that generates or populates timeline data
rg -n --type=ts --type=js 'jokesDone|judgesDone|picksDone|completedWorkflows' -B 2 -A 2 | head -100

# Look for visualization or chart code that uses these timeline fields
rg -n --type=ts --type=jsx --type=tsx 'concurrency.*render|itemRows.*render|steps.*render|chart' -B 2 -A 2 | head -80

# Search for code that writes JSON files named timeline
rg -n --type=ts --type=js 'timeline.*json|writeFileSync|writeFile' -B 3 -A 3 | head -120

# Look for test or benchmark directories
find . -type d -name '*test*' -o -type d -name '*benchmark*' | head -20

# Check if there's benchmark code that generates these timeline files
fd -t f -e ts -e js . | grep -i benchmark | head -20

# Examine the actual structure of a couple timeline files to compare
cat ./benchmark_results/batched100_timeline_20260215_103112.json

# Look for where completedAt, startedAt, durationMs are used in timeline context
rg -n --type=ts 'completedAt|startedAt' -B 2 -A 2 | grep -A 10 -B 10 'timeline' | head -80

# Look at the benchmark code that generates these files
cat -n example/convex/benchmark.ts | head -200

# Search for timeline snapshot generation in the benchmark
rg -n 'timeline.*json|writeFile|snapshot' example/convex/benchmark.ts -A 5 -B 5

# Check if there's any batch-size-dependent logic
rg -n 'batched.*1000|count.*1000|batchedMaxWorkers\|batchedMaxParallelism' example/convex/benchmark.ts -B 3 -A 3

# Search for where timeline snapshots are written to JSON
rg -n 'writeFileSync|writeFile|JSON.stringify' example/convex/ -B 5 -A 5 | head -150

# Look for Timeline type or snapshot-related code in the entire codebase
rg -n 'Timeline|timeline.*json|concurrency.*record|itemRows' src/ -B 2 -A 2 | head -100

# Check if there's test/bench code that generates these files
fd -t f -e ts -e js . | xargs grep -l 'timeline.*json\|writeFile.*timeline' 2>/dev/null | head -10

# Search more broadly for code that populates the fields mentioned in the review
rg -n 'durationMs.*0|jokesDone|judgesDone|picksDone|maxConcurrency' -B 3 -A 3 | head -80


Timeline snapshot fields are not being populated for larger batched runs — confirmed recording bug.

The batched 100-item run properly records timeline data (jokesDone: 200, judgesDone: 100, picksDone: 100, maxConcurrency: 200), but all 1000-item batched runs consistently show zero values for these fields (jokesDone: 0, judgesDone: 0, picksDone: 0, maxConcurrency: 0), empty arrays for concurrency and itemRows, and durationMs and elapsedMs both zero. All 1000-item runs also return result: null, indicating incomplete or failed state.

This pattern is 100% reproducible across 7 separate 1000-item benchmark runs. Any visualization code consuming concurrency, itemRows, steps, durationMs, elapsedMs, or maxConcurrency will receive no data for these larger batches, potentially causing silent failures or empty charts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/batched_1000_timeline_20260215_111510.json` around lines 1
- 43, The batched run timeline fields (batched.jokesDone, judgesDone, picksDone,
maxConcurrency, concurrency, itemRows, durationMs, elapsedMs, steps, result) are
not being populated for large batches; inspect and fix the timeline aggregation
path (e.g., finalizeBatchedRun / recordTimeline / aggregateWorkerTimeline) to
ensure it iterates over all items/workers instead of being truncated by a
hardcoded cap or page size, correctly accumulates counts and maxConcurrency,
appends all itemRows and concurrency samples, computes durationMs/elapsedMs from
startedAt/completedAt using safe numeric types, and sets result when aggregation
completes; update any early-return conditions or try/catch that swallow errors
so the batched object is assigned the computed values for the runId.
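
As a rough sketch of the aggregation shape the prompt asks for (table name, index, and fields are hypothetical, and a real fix may also need to spread the scan across multiple scheduled invocations to stay under Convex's per-transaction read limits):

// Sketch: accumulate stats by paginating with a cursor instead of one capped
// read, so 1000-item runs are not silently truncated to empty aggregates.
let cursor: string | null = null;
let jokesDone = 0;
const itemRows: Array<Record<string, unknown>> = [];
for (;;) {
  const page = await ctx.db
    .query("benchmarkItems") // hypothetical table
    .withIndex("by_run", (q) => q.eq("runId", args.runId)) // hypothetical index
    .paginate({ numItems: 500, cursor });
  for (const row of page.page) {
    if (row.kind === "joke" && row.completedAt !== undefined) jokesDone += 1;
    itemRows.push(row);
  }
  if (page.isDone) break;
  cursor = page.continueCursor;
}
// ...then assign jokesDone, itemRows, durationMs, etc. onto the batched
// result only after the loop completes, rather than from an early return.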

Comment thread benchmark_results/workflow_benchmark_small_1000_2026-02-15_00-08-07.txt
Comment on lines +1 to +4
=== workflow small benchmark ===
section_count=1000
[CONVEX ?(llmSimulation:clearAll)] [ERROR] 'Simulation not found: j9712dprjeky0709w2098prf65816c0h'
[CONVEX ?(llmSimulation:clearAll)] [ERROR] 'Simulation not found: j97507jy904tpv1erkkkzgt1yx8172qh'

⚠️ Potential issue | 🟡 Minor

Clarify failed benchmark output before archiving.

Lines 3-4 show “Simulation not found” errors. If this run is incomplete, consider moving it to a failed/partial folder or annotating the archive so it isn’t mistaken for a successful benchmark.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/workflow_benchmark_small_1000_2026-02-15_00-08-07.txt`
around lines 1 - 4, This benchmark run shows errors from the CONVEX cleanup step
(CONVEX ?(llmSimulation:clearAll)) reporting "Simulation not found" for IDs
'j9712dprjeky0709w2098prf65816c0h' and 'j97507jy904tpv1erkkkzgt1yx8172qh';
before archiving, either move this file out of the successful archive into a
failed/partial folder or update the file/metadata to annotate the run as
incomplete and include the exact error lines and the simulation IDs so consumers
won't treat this as a successful workflow_small benchmark (section_count=1000).

Comment thread benchmark_results/workflow_results_archive_latest.txt
Comment on lines +2 to +4
generated_at=2026-02-15 00:23:43 PST
repo=/Users/magicseth/Projects/workflow
branch=llm-simulation-demo

⚠️ Potential issue | 🟡 Minor

Avoid committing absolute local paths.

Line 3 includes /Users/..., which leaks a local filesystem path and makes the artifact less portable. Consider stripping it or replacing it with a repo-relative identifier.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmark_results/workflow_results_archive_latest.txt` around lines 2 - 4,
The file contains an absolute local path in the repo metadata (the "repo=" line)
which leaks local filesystem info; change the "repo=" value in
benchmark_results/workflow_results_archive_latest.txt to a repo-relative
identifier or a sanitized placeholder (e.g., repository name or commit hash)
instead of an absolute /Users/... path so the artifact is portable and
non-identifying.

Comment thread scripts/watch_workflow_debug.sh
Comment on lines +43 to +57
python3 -c '
import json, sys
d = json.load(sys.stdin)
print("pending_total={} workflow_running={} workflow_done={}".format(
d["pendingTotal"], d["workflowRunning"], d["workflowDone"]
))
print("pending_by_slot:", ", ".join("s{}={}".format(p["slot"], p["pending"]) for p in d["pendingBySlot"]))
print("--- simulations ---")
for s in d["simulations"]:
elapsed = s["elapsedMs"] / 1000.0
print("{} mode={} status={} elapsed={:.1f}s outline={} sections={}".format(
s["id"], s["mode"], s["status"], elapsed, s["outlineCount"], s["sectionCount"]
))
' <<< "$json"
sleep "$INTERVAL_SECS"

⚠️ Potential issue | 🟠 Major

Python parse failure under set -e kills the entire monitoring loop.

With set -euo pipefail active, any KeyError (e.g., debugPoolView schema change or partial response) exits the python3 command with a non-zero code, which propagates out of the while true loop and terminates the script rather than retrying on the next interval.

🛡️ Proposed fix – guard the parse step and continue on failure
-  python3 -c '
+  if ! python3 -c '
 import json, sys
 d = json.load(sys.stdin)
 print("pending_total={} workflow_running={} workflow_done={}".format(
     d["pendingTotal"], d["workflowRunning"], d["workflowDone"]
 ))
 print("pending_by_slot:", ", ".join("s{}={}".format(p["slot"], p["pending"]) for p in d["pendingBySlot"]))
 print("--- simulations ---")
 for s in d["simulations"]:
     elapsed = s["elapsedMs"] / 1000.0
     print("{} mode={} status={} elapsed={:.1f}s outline={} sections={}".format(
         s["id"], s["mode"], s["status"], elapsed, s["outlineCount"], s["sectionCount"]
     ))
-' <<< "$json"
+' <<< "$json"; then
+    echo "parse error – unexpected JSON shape"
+    echo "$json"
+  fi
   sleep "$INTERVAL_SECS"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/watch_workflow_debug.sh` around lines 43 - 57, The inline Python
block invoked with python3 -c that reads json into d and prints fields
(d["pendingTotal"], d["pendingBySlot"], and iterates d["simulations"]) can raise
KeyError/Exception and, under set -euo pipefail, will exit the whole script;
update that block to guard the parse/print work with a try/except that catches
Exception (including KeyError/JSONDecodeError), writes a short diagnostic to
stderr, and exits with code 0 so the surrounding while true loop continues;
locate the python3 -c '... d = json.load(sys.stdin) ...
print("pending_total=...") ...' block and add the exception handling around the
parsing/printing logic.

Comment thread src/client/index.ts Outdated
Comment on lines +6 to +7
// BatchWorkpool not yet exported from @convex-dev/workpool@0.3.1
type BatchWorkpool = any;

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed:

# First, check the current state of src/client/index.ts around lines 6-7
cat -n src/client/index.ts | head -20

# Check package.json to see workpool version
cat package.json | grep -A 5 -B 5 "workpool"

# Search for any workpool imports or usages in the codebase
rg "workpool" --type ts --type js -i

🌐 Web query:

@convex-dev/workpool BatchWorkpool export TypeScript type 0.3.1 latest version

💡 Result:

  • @convex-dev/workpool v0.3.1 is a real published version (its package.json shows "version": "0.3.1").
  • In v0.3.1, the client API exports a Workpool class (with batching methods like enqueueActionBatch, enqueueMutationBatch, statusBatch, etc.), but there is no exported TypeScript symbol named BatchWorkpool in dist/client/index.d.ts.
  • The latest version is not 0.3.1: UNPKG's version selector shows 0.4.0 as latest and also lists 0.3.2.


Define a local BatchWorkpool interface instead of any.
BatchWorkpool is not exported from @convex-dev/workpool@0.3.1. Using any loses type safety and hides API drift. Define a minimal local interface based on actual usage, or replace it when the type becomes available in a future workpool release.

Suggested minimal local interface
-// BatchWorkpool not yet exported from `@convex-dev/workpool`@0.3.1
-type BatchWorkpool = any;
+// BatchWorkpool not yet exported from `@convex-dev/workpool`@0.3.1
+type BatchWorkpool = {
+  action: <Args extends DefaultFunctionArgs = any, Returns = any>(
+    name: string,
+    opts: {
+      args: Record<string, Validator<any, any, any>>;
+      handler: (
+        ctx: GenericActionCtx<GenericDataModel>,
+        args: Args,
+      ) => Promise<Returns>;
+    },
+  ) => RegisteredAction<"internal", Args, Returns>;
+  enqueueByHandle: (
+    ctx: GenericMutationCtx<GenericDataModel>,
+    name: string,
+    args: DefaultFunctionArgs,
+    options: { onComplete: { fnHandle: string; context: unknown } },
+  ) => Promise<string>;
+};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/client/index.ts` around lines 6 - 7, Replace the permissive "type
BatchWorkpool = any" with a minimal local interface named BatchWorkpool that
declares only the methods/properties your code actually calls (e.g., the
run/submit/close/stop signatures or iterator/length properties used elsewhere);
update src/client/index.ts to export or use that interface in place of any so
callers get type safety, and add a TODO comment to remove this local interface
once the official BatchWorkpool type is exported by `@convex-dev/workpool`. Ensure
the interface method names/signatures exactly match uses in functions that
accept a BatchWorkpool so the compiler will catch API drift.

Comment thread src/component/coordinator.ts
Comment on lines +10 to +21
export async function ensureCoordinatorRunning(ctx: MutationCtx) {
const state = await ctx.db.query("coordinatorState").first();
if (state?.scheduled) {
return;
}
if (state) {
await ctx.db.patch(state._id, { scheduled: true });
} else {
await ctx.db.insert("coordinatorState", { scheduled: true });
}
await ctx.scheduler.runAfter(0, internal.coordinator.coordinator);
}

⚠️ Potential issue | 🟠 Major

Guard against multiple coordinatorState rows.
Lines 10-19 insert coordinatorState if missing; concurrent callers can insert duplicates and leave a stale scheduled: true, which can block future scheduling or spawn multiple coordinators. Consider enforcing a singleton (fixed ID/config entry) or cleaning duplicates.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/coordinator.ts` around lines 10 - 21, The
ensureCoordinatorRunning function can create duplicate coordinatorState rows
under concurrency; change the logic to enforce a singleton state record: use a
fixed unique key/ID for the coordinator state (instead of blind insert) and
perform an atomic upsert/conditional update so only one record is created and
its scheduled flag is set (replace the current ctx.db.insert call with an upsert
or a transaction that patches-or-inserts by the fixed _id); also add a small
cleanup step (query all coordinatorState rows and collapse/delete duplicates,
keeping the single canonical record) to guard existing databases. Target
symbols: ensureCoordinatorRunning, coordinatorState query/first(), ctx.db.patch,
ctx.db.insert and the scheduler call internal.coordinator.coordinator.
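
One way the singleton guard could look, as a sketch only (the key field and by_key index are hypothetical additions to the schema):

// Enforce a single coordinatorState row via a fixed key.
const state = await ctx.db
  .query("coordinatorState")
  .withIndex("by_key", (q) => q.eq("key", "singleton")) // hypothetical index
  .unique(); // throws if duplicates ever exist, surfacing the bug early
if (state?.scheduled) return;
if (state) {
  await ctx.db.patch(state._id, { scheduled: true });
} else {
  await ctx.db.insert("coordinatorState", { key: "singleton", scheduled: true });
}
await ctx.scheduler.runAfter(0, internal.coordinator.coordinator);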

Comment thread src/component/pool.ts
Comment on lines +97 to 99
const console = createLogger(DEFAULT_LOG_LEVEL);
const stepId =
"stepId" in args.context && typeof args.context.stepId === "string"

⚠️ Potential issue | 🟡 Minor

Preserve configurable log levels in onCompleteHandler.
Line 97 hard-codes DEFAULT_LOG_LEVEL, which bypasses any per-workflow/workpool logLevel settings. Consider plumbing the optional log level from the onComplete context so verbosity stays configurable.

Suggested fix
-  const console = createLogger(DEFAULT_LOG_LEVEL);
+  const context = args.context as Partial<OnCompleteContext>;
+  const console = createLogger(
+    context.workpoolOptions?.logLevel ?? DEFAULT_LOG_LEVEL,
+  );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/pool.ts` around lines 97 - 99, The onCompleteHandler currently
creates a logger with a hard-coded DEFAULT_LOG_LEVEL (const console =
createLogger(DEFAULT_LOG_LEVEL)), which ignores any per-workflow/workpool
logLevel; update onCompleteHandler to read an optional log level from the
provided context (e.g., args.context.logLevel or
args.context.onComplete?.logLevel) and pass that into createLogger instead of
DEFAULT_LOG_LEVEL, falling back to DEFAULT_LOG_LEVEL only when the context value
is absent; adjust variable references around createLogger, console and any
callers in onCompleteHandler to use the resolved log level so verbosity remains
configurable.

Comment thread src/component/taskQueue.ts Outdated
@@ -0,0 +1,454 @@
import { v } from "convex/values";
import { vResultValidator } from "@convex-dev/workpool";
import { mutation, query, type MutationCtx } from "./_generated/server.js";

⚠️ Potential issue | 🟡 Minor

Remove unused MutationCtx import (lint warning).

Suggested fix
-import { mutation, query, type MutationCtx } from "./_generated/server.js";
+import { mutation, query } from "./_generated/server.js";
🧰 Tools
🪛 GitHub Check: Test and lint

[warning] 3-3:
'MutationCtx' is defined but never used. Allowed unused vars must match /^_/u

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/component/taskQueue.ts` at line 3, The import statement in taskQueue.ts
currently imports MutationCtx but it is unused; remove MutationCtx from the
named imports in the import from "./_generated/server.js" (i.e., update the
import that includes mutation and query so it no longer imports MutationCtx) to
satisfy the linter and eliminate the unused-symbol warning.

@sethconvex sethconvex changed the base branch from main to graphite-base/209 February 24, 2026 07:17
@sethconvex sethconvex changed the base branch from graphite-base/209 to executor-mode-core February 24, 2026 07:18
Contributor Author

sethconvex commented Feb 24, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@sethconvex sethconvex changed the title from "Workflow v2 perf" to "chore: executor benchmarks, scripts, docs, and visualization" Feb 24, 2026
@sethconvex sethconvex force-pushed the workflow-v2-perf branch 3 times, most recently from 3cb55b4 to a6f0186 February 24, 2026 07:40
@sethconvex sethconvex force-pushed the workflow-v2-perf branch 2 times, most recently from 932d4cc to ae2a0a9 February 25, 2026 18:31
Benchmark results, tuning docs, example benchmark/viz endpoints,
and helper scripts for the executor mode implementation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sethconvex and others added 2 commits February 25, 2026 22:36
- Replace fire-and-forget scheduler.runAfter safety net with durable replayQueue table
- recordResultBatch: insert replay entry → inline replay → delete on success
- processReplayBatch runs sequentially in flush loop (OCC-free)
- Remove eager replayQueue cleanup from completeHandler (avoids OCC with executors)
- Add bumpEpoch (stop executors), clearReplayQueue, clearTaskQueue utilities
- Lower MAX_CONCURRENCY=50, CLAIM_LIMIT=200 for real API calls
- Simplify Haiku benchmark to "hello world" (minimal tokens)
- 19825/20000 real Haiku benchmark: 0 stuck, 175 failed (API transients)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
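
The recordResultBatch flow in the first bullet, sketched (replayWorkflow and the replayQueue fields are illustrative, not the actual signatures):

// Sketch: the step result commits together with a durable replay entry; if
// the inline replay throws, the entry survives for processReplayBatch to
// retry in the flush loop, replacing the fire-and-forget runAfter safety net.
const replayId = await ctx.db.insert("replayQueue", { workflowId });
try {
  await replayWorkflow(ctx, workflowId); // inline replay (fast path)
  await ctx.db.delete(replayId); // delete on success
} catch {
  // entry remains queued for processReplayBatch
}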
- CLAIM_LIMIT=50 (matches MAX_CONCURRENCY) so executors re-query the
  task queue frequently, picking up later steps for earlier workflows
  instead of always grabbing step-0 tasks for newer workflows.

- Task queue indexed by [shard, workflowCreatedAt] (ascending) so
  tasks for earlier-created workflows are always claimed first.
  This gives FIFO completion ordering: the first 2k of 20k workflows
  finish in ~4 min median while the last 2k take ~20 min.

- 50 concurrency × 100 shards = 5000 concurrent slots. This works
  because real-world throughput is gated by the LLM API (Anthropic),
  not local compute. Higher per-shard concurrency wastes V8 memory
  (64 MB limit) holding idle HTTP connections.

- failPendingTasks mutation: force-fails all queued tasks in a shard,
  marks steps as failed, and inserts replay entries so workflows
  complete (as failures) rather than getting stuck forever.

- failAllPendingTasks action: iterates all 100 shards to nuke the
  entire task queue — useful for clearing stale work after crashes.

- priorityAnalysis action: measures whether FIFO ordering is working
  by bucketing workflows into creation-time deciles and comparing
  median completion times.

Benchmark results (20k real Claude Haiku workflows):
  19,886 completed, 114 failed (0.57%), 0 stuck
  p50=12.1min, p90=20.8min, p99=22.1min, slowest=26.6min
  Earliest 2k workflows: median 4.3 min to complete
  Latest 2k workflows: median 20.1 min to complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
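
Roughly what the ordering above implies at the schema and claim sites (a sketch; the real definitions in src/component/schema.ts and src/component/taskQueue.ts may differ):

// schema sketch: ascending index by creation time within each shard
taskQueue: defineTable({
  shard: v.number(),
  workflowCreatedAt: v.number(),
  // ...task payload fields
}).index("by_shard_created", ["shard", "workflowCreatedAt"]),

// claim sketch: index order is ascending, so tasks for the oldest workflows
// come back first, producing the FIFO completion pattern described above
const tasks = await ctx.db
  .query("taskQueue")
  .withIndex("by_shard_created", (q) => q.eq("shard", shard))
  .take(CLAIM_LIMIT); // 50, matching MAX_CONCURRENCY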
@sethconvex sethconvex changed the base branch from executor-mode-core to graphite-base/209 February 26, 2026 16:30
sethconvex and others added 4 commits February 26, 2026 08:36
Self-rescheduling watchdog mutation checks each shard's task/replay queues
every 30s. If tasks are older than 60s, the shard's executor is presumed
dead and a replacement is scheduled at the current epoch. Watchdog starts
automatically with startExecutors and stops on bumpEpoch.

Also adds mode/count badge to benchmark viz (green for real, gray for
simulated) via URL query params.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
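
A sketch of the watchdog's self-rescheduling shape (handler path, staleness math, and the stop check are illustrative):

import { internalMutation } from "./_generated/server.js";
import { internal } from "./_generated/api.js";

export const watchdog = internalMutation({
  args: {},
  handler: async (ctx) => {
    const staleBefore = Date.now() - 60_000;
    // ...for each shard in this tick: if the oldest queued task predates
    // staleBefore, presume the executor dead and schedule a replacement at
    // the current epoch; bumpEpoch clears the flag that keeps this loop alive.
    await ctx.scheduler.runAfter(30_000, internal.taskQueue.watchdog, {});
  },
});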
… name

- Schedule successor at RESCHEDULE_MS (no jitter) so it's running before
  the original stops claiming at RESCHEDULE_MS + jitter. Eliminates the
  ~2min gap when rate-limited tasks block drainInFlight.
- Reduce RESCHEDULE_MS from 5min to 3min for headroom under 10min timeout.
- Batch watchdog into 25-shard ticks to stay under 32k doc read limit.
- Viz: derive WF_NAME from mode URL param so standard runs render correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>