Add executor benchmarks, viz, and diagnostic tools #212
sethconvex wants to merge 1 commit into executor-perf-impl
- benchmark.ts: 20k-workflow benchmark with simulated and real Claude Haiku modes. Includes workflow creation, executor management, cancel/cleanup, priority analysis, and tail diagnostics.
- http.ts: benchmark-viz HTML endpoint with an interactive timeline visualization showing per-workflow step progression, concurrent step counts, and shard-sorted views.
- failAllPendingTasks action for operational cleanup.
- priorityAnalysis action to verify FIFO completion ordering.
- diagnoseTail action for analyzing straggler workflows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
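The concurrent step counts shown in the viz can be derived from per-step start and end timestamps with a sweep over sorted events. Below is a minimal sketch, assuming a hypothetical `StepSpan` record with `startMs`/`endMs` fields; the data shape actually served by http.ts may differ.

```typescript
// Hypothetical step record; field names are illustrative, not the PR's schema.
interface StepSpan {
  startMs: number;
  endMs: number; // exclusive end
}

// Returns [timeMs, concurrentSteps] points: the number of steps running
// from each event time until the next one. Ends at time t are applied
// before starts at t, so back-to-back steps are not double-counted.
function concurrencySeries(steps: StepSpan[]): Array<[number, number]> {
  const events: Array<[number, number]> = [];
  for (const s of steps) {
    events.push([s.startMs, +1]);
    events.push([s.endMs, -1]);
  }
  // Sort by time; at equal times, -1 (end) sorts before +1 (start).
  events.sort((a, b) => a[0] - b[0] || a[1] - b[1]);

  const series: Array<[number, number]> = [];
  let running = 0;
  for (let i = 0; i < events.length; i++) {
    running += events[i][1];
    // Emit one point per distinct timestamp, after all its events are applied.
    const next = events[i + 1];
    if (!next || next[0] !== events[i][0]) {
      series.push([events[i][0], running]);
    }
  }
  return series;
}

const demo = concurrencySeries([
  { startMs: 0, endMs: 10 },
  { startMs: 5, endMs: 15 },
  { startMs: 10, endMs: 20 },
]);
console.log(demo); // [[0,1],[5,2],[10,2],[15,1],[20,0]]
```

Applying ends before starts at the same timestamp keeps a step that begins exactly when another finishes from inflating the count; the resulting points map directly onto a stepped area chart.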
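The summary below says failAllPendingTasks walks all 100 shards. Here is a minimal in-memory sketch of that loop; the `Task` type, its `state` values, and the `Map`-based storage are hypothetical stand-ins for the real tables, and only the shard count comes from the PR.

```typescript
// Shard count taken from the PR summary; everything else is illustrative.
const NUM_SHARDS = 100;

type TaskState = "pending" | "running" | "succeeded" | "failed";

interface Task {
  id: string;
  shard: number;
  state: TaskState;
  error?: string;
}

// Walks every shard in order and force-fails each task still waiting in the
// queue ("pending"). Running or finished tasks are left alone.
// Returns the number of tasks it failed.
function failAllPendingTasks(tasksByShard: Map<number, Task[]>, reason: string): number {
  let failed = 0;
  for (let shard = 0; shard < NUM_SHARDS; shard++) {
    const tasks = tasksByShard.get(shard) ?? [];
    for (const task of tasks) {
      if (task.state === "pending") {
        task.state = "failed";
        task.error = reason;
        failed++;
      }
    }
  }
  return failed;
}

// Usage: two queued tasks on different shards plus one running task.
const queue = new Map<number, Task[]>([
  [0, [{ id: "a", shard: 0, state: "pending" }, { id: "b", shard: 0, state: "running" }]],
  [99, [{ id: "c", shard: 99, state: "pending" }]],
]);
console.log(failAllPendingTasks(queue, "operational cleanup")); // 2
```

In a database-backed version, handling one shard (or one page of a shard) per call is one way to keep each transaction small; whether the PR does this is not stated.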
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
Review skipped: auto reviews are disabled on base/target branches other than the default branch.

Summary
Benchmark and diagnostic tooling for executor-mode workflows:
- benchmark.ts: End-to-end benchmark with simulated and real Claude Haiku modes. Creates 10k-20k workflows with a 4-step pipeline (extract → analyze-a ∥ analyze-b → summarize), manages the executor lifecycle, and provides cancel/cleanup operations.
- http.ts: Interactive benchmark-viz HTML endpoint with a timeline visualization showing per-workflow step progression, a concurrent-step area chart, and a shard-sorted view.
- priorityAnalysis: Verifies FIFO completion ordering by bucketing workflows into creation-time deciles and comparing median completion times.
- diagnoseTail: Analyzes straggler workflows by computing queue-wait, execution-time, and inter-step-gap percentiles for the slowest N workflows.
- failAllPendingTasks: Iterates over all 100 shards to force-fail every queued task for operational cleanup.

Test plan
- priorityAnalysis confirms FIFO ordering

🤖 Generated with Claude Code
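The 4-step pipeline (extract → analyze-a ∥ analyze-b → summarize) is a fan-out/fan-in shape: both analyses depend only on the extract output, and summarize waits for both. A sketch of that shape in simulated mode, where each step just sleeps and tags its input; the `Pipeline` interface, the `simulatedStep` helper, and the latencies are illustrative assumptions, not the PR's executor API.

```typescript
// Illustrative sketch: step names come from the PR, everything else is assumed.
type StepFn = (input: string) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Simulated mode: each step waits a fixed latency and wraps its input in its name.
function simulatedStep(name: string, latencyMs: number): StepFn {
  return async (input) => {
    await sleep(latencyMs);
    return `${name}(${input})`;
  };
}

interface Pipeline {
  extract: StepFn;
  analyzeA: StepFn;
  analyzeB: StepFn;
  summarize: StepFn;
}

async function runWorkflow(p: Pipeline, doc: string): Promise<string> {
  const extracted = await p.extract(doc);
  // Fan out: both analyses need only the extract output, so run them concurrently.
  const [a, b] = await Promise.all([p.analyzeA(extracted), p.analyzeB(extracted)]);
  // Fan in: summarize starts only after both branches finish.
  return p.summarize(`${a}+${b}`);
}

const sim: Pipeline = {
  extract: simulatedStep("extract", 5),
  analyzeA: simulatedStep("analyze-a", 5),
  analyzeB: simulatedStep("analyze-b", 5),
  summarize: simulatedStep("summarize", 5),
};

runWorkflow(sim, "doc1").then(console.log);
// prints "summarize(analyze-a(extract(doc1))+analyze-b(extract(doc1)))"
```

Because the two analyses overlap, a simulated workflow's critical path is three step latencies rather than four, which is the kind of overlap the concurrent-step chart makes visible.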
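A sketch of the decile check described for priorityAnalysis, assuming each workflow exposes hypothetical `createdAtMs` and `completedAtMs` fields. It compares absolute completion timestamps; the PR might compare latencies instead. Under FIFO, median completion times should not decrease from the earliest-created decile to the latest.

```typescript
// Hypothetical workflow record; field names are assumptions.
interface WorkflowTiming {
  createdAtMs: number;
  completedAtMs: number;
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Sorts workflows by creation time, splits them into 10 (near-)equal-count
// buckets, and returns the median completion time of each non-empty bucket.
function completionMediansByDecile(ws: WorkflowTiming[]): number[] {
  const sorted = [...ws].sort((a, b) => a.createdAtMs - b.createdAtMs);
  const medians: number[] = [];
  for (let d = 0; d < 10; d++) {
    const lo = Math.floor((d * sorted.length) / 10);
    const hi = Math.floor(((d + 1) * sorted.length) / 10);
    const bucket = sorted.slice(lo, hi).map((w) => w.completedAtMs);
    if (bucket.length > 0) medians.push(median(bucket));
  }
  return medians;
}

// FIFO-consistent if each decile's median is >= the previous decile's.
function looksFifo(medians: number[]): boolean {
  return medians.every((m, i) => i === 0 || m >= medians[i - 1]);
}
```

Comparing medians per bucket, rather than checking every pair, tolerates a few out-of-order completions from parallel execution while still catching systematic LIFO or starvation patterns.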
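A sketch of the straggler breakdown described for diagnoseTail, assuming hypothetical per-step `enqueuedAtMs`/`startedAtMs`/`finishedAtMs` timestamps and nearest-rank percentiles. It treats each workflow's steps as a linear list, so the gap around the parallel analyze-a ∥ analyze-b stage can come out negative; the real action may define gaps against step dependencies instead.

```typescript
// Hypothetical per-step timing record; the PR's actual schema may differ.
interface StepTiming {
  enqueuedAtMs: number;
  startedAtMs: number;
  finishedAtMs: number;
}

interface WorkflowTrace {
  id: string;
  steps: StepTiming[]; // non-empty, in execution order
}

// Nearest-rank percentile (p in 0..100) over an unsorted array.
function percentile(xs: number[], p: number): number {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * s.length);
  return s[Math.min(s.length - 1, Math.max(0, rank - 1))];
}

// End-to-end time: earliest enqueue to latest finish (robust to parallel steps).
function endToEnd(w: WorkflowTrace): number {
  const start = Math.min(...w.steps.map((s) => s.enqueuedAtMs));
  const end = Math.max(...w.steps.map((s) => s.finishedAtMs));
  return end - start;
}

// Picks the N slowest workflows and reports p50/p90/p99 of queue wait
// (enqueue → start), execution (start → finish), and inter-step gap
// (previous step's finish → next step's enqueue).
function diagnoseTail(ws: WorkflowTrace[], n: number) {
  const slowest = [...ws].sort((a, b) => endToEnd(b) - endToEnd(a)).slice(0, n);
  const queueWait: number[] = [];
  const exec: number[] = [];
  const gap: number[] = [];
  for (const w of slowest) {
    w.steps.forEach((s, i) => {
      queueWait.push(s.startedAtMs - s.enqueuedAtMs);
      exec.push(s.finishedAtMs - s.startedAtMs);
      if (i > 0) gap.push(s.enqueuedAtMs - w.steps[i - 1].finishedAtMs);
    });
  }
  const pct = (xs: number[]) => ({ p50: percentile(xs, 50), p90: percentile(xs, 90), p99: percentile(xs, 99) });
  return { ids: slowest.map((w) => w.id), queueWait: pct(queueWait), exec: pct(exec), gap: pct(gap) };
}
```

Splitting the tail this way separates "waited for a slot" (queue wait), "the step itself was slow" (execution), and "the scheduler was slow to enqueue the next step" (gap), which point at different fixes.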