feat: export eval artifacts as json by jlewi · Pull Request #186 · runmedev/web

jlewi · 2026-04-25T19:18:05Z

Summary

write one JSON artifact per eval run plus a summary manifest
print TTFM and TurnTime in the eval runner output
ignore generated eval artifact directories
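
To make the artifact layout in the summary concrete, here is a minimal sketch of "one JSON file per run plus a manifest". The PR's actual schema is not shown in this thread, so the type names, field names (`runId`, `ttfmMs`, `turnTimeMs`), the millisecond units, and the `manifest.json` file name are all assumptions for illustration.

```typescript
// Hypothetical result shape; the real eval runner's fields may differ.
interface EvalRunResult {
  runId: string;
  passed: boolean;
  ttfmMs: number; // TTFM as printed by the runner, assumed to be milliseconds
  turnTimeMs: number; // TurnTime as printed by the runner, assumed to be milliseconds
}

// A file to be written by the caller (path and serialized JSON contents).
interface ArtifactFile {
  path: string;
  contents: string;
}

// Builds one JSON artifact per run, followed by a manifest that indexes them.
function buildArtifacts(runs: EvalRunResult[], dir: string): ArtifactFile[] {
  const files: ArtifactFile[] = runs.map((run) => ({
    path: `${dir}/${run.runId}.json`,
    contents: JSON.stringify(run, null, 2),
  }));

  const manifest = {
    total: runs.length,
    passed: runs.filter((r) => r.passed).length,
    runs: runs.map((r) => ({
      runId: r.runId,
      file: `${r.runId}.json`,
      passed: r.passed,
    })),
  };

  // The manifest is always the last entry in the returned list.
  files.push({
    path: `${dir}/manifest.json`,
    contents: JSON.stringify(manifest, null, 2),
  });
  return files;
}
```

Returning the files instead of writing them keeps the sketch independent of any particular filesystem API; the caller decides where and how they are persisted.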

Validation

runme run build test

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 82479c38d8


chatgpt-codex-connector · 2026-04-25T19:25:24Z

+    target.handle &&
+    typeof target.handle.uri === "string" &&
+    target.handle.uri.trim()


Reject explicit invalid notebook targets

This resolver now falls back to the current notebook whenever target is present but has an empty uri/handle.uri, which changes the previous behavior (returning null) into implicit mutation of whatever notebook is currently open. If a tool call or generated script passes { uri: "" } (or an empty handle URI), runme.* operations can act on the wrong notebook instead of failing fast, which can silently corrupt eval results and user state.
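
One way to get the fail-fast behavior described above is to separate "no target supplied" from "target supplied but unusable". The following is only a sketch: `NotebookTarget`, `resolveNotebookUri`, and the `currentUri` parameter are hypothetical names, and the real resolver in this PR may be structured differently.

```typescript
// Hypothetical target shape, modeled on the fields named in the review comment.
interface NotebookTarget {
  uri?: string;
  handle?: { uri?: string };
}

// Returns the URI to operate on, or null when an explicit target is unusable.
function resolveNotebookUri(
  target: NotebookTarget | undefined,
  currentUri: string | null,
): string | null {
  // No target at all: falling back to the current notebook is intentional.
  if (target === undefined) {
    return currentUri;
  }

  // An explicit target must carry a non-empty URI in one of its fields.
  const candidates = [target.uri, target.handle?.uri];
  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate.trim();
    }
  }

  // Explicit but empty or invalid: fail fast instead of touching the open notebook.
  return null;
}
```

With this split, `{ uri: "" }` yields `null` and the caller can surface an error, while an omitted target still resolves to the current notebook.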


chatgpt-codex-connector · 2026-04-25T19:25:25Z

+  runOrThrow(`agent-browser open ${FRONTEND_URL}`, 30000);
+  run("agent-browser wait 2500", 5000);
+  evaluateJson<boolean>(
+    `JSON.stringify(await window.__runmeEval.waitUntilReady())`,


Wait for eval bridge registration before calling it

The runner invokes window.__runmeEval.waitUntilReady() immediately after a fixed sleep, but __runmeEval is installed from a React useEffect and is not guaranteed to exist by then. On slower page loads this throws before readiness polling even starts, causing flaky hard failures in the eval pipeline; the script should first poll for window.__runmeEval itself, then call waitUntilReady.
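
The two-phase wait suggested here could look roughly like the sketch below. It is written as a generic helper so it can be read on its own: `pollUntil`, `waitForEvalBridge`, the `EvalBridge` interface, and the default timeout and interval are assumptions, and in the real runner the `getBridge` probe would have to be evaluated in the page (for example through the runner's existing `evaluateJson` helper) rather than called directly.

```typescript
// Generic polling helper: resolves true once probe() is truthy, false on timeout.
async function pollUntil(
  probe: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await probe()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical shape of the bridge, based only on the method named in the review.
interface EvalBridge {
  waitUntilReady(): Promise<boolean>;
}

// Phase 1: wait for the bridge object to exist. Phase 2: ask it for readiness.
async function waitForEvalBridge(
  getBridge: () => EvalBridge | undefined,
  timeoutMs = 30000,
  intervalMs = 100,
): Promise<boolean> {
  const registered = await pollUntil(
    () => getBridge() !== undefined,
    timeoutMs,
    intervalMs,
  );
  const bridge = getBridge();
  if (!registered || bridge === undefined) {
    throw new Error("window.__runmeEval was not registered before the timeout");
  }
  return bridge.waitUntilReady();
}
```

Polling for registration replaces the fixed 2500 ms sleep as the synchronization point, so a slow page load delays the run instead of failing it, and a bridge that never registers produces a clear timeout error.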


jlewi added 6 commits April 22, 2026 07:05

docs: update minimal evals design guidance

32bea37

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

feat: add minimal browser eval harness

dcabc7f

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

chore: ignore generated eval artifacts

02603f1

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

chore: default eval driver to dedicated frontend port

0f0aea7

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

fix: support live codex wasm evals

6915014

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

feat: export eval artifacts as json

82479c3

Signed-off-by: Jeremy lewi <jeremy@lewi.us>

jlewi requested a review from sourishkrout as a code owner April 25, 2026 19:18

chatgpt-codex-connector Bot reviewed Apr 25, 2026

jlewi wants to merge 6 commits into main from dev/jlewi/evals2
