Starlark is designed for hermetic, deterministic execution; go.starlark.net/starlark executes source from a string and has no filesystem, network, or clock access unless the host exposes it.
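Implement tool contract and handler for code.sandbox.starlark.run, which executes source from memory: create internal/tools/starlarkrun/handler.go exposing Name() string { return "code.sandbox.starlark.run" } and Call(ctx, json.RawMessage) (ToolResult, error), which accepts {source:string,input:string,limits:{wall_ms:int,output_kb:int},caps:{}}; parse the JSON, then run the Starlark source with ExecFile using a Thread and the predeclared functions read_input()->str and emit(str); enforce wall time via context.WithTimeout (ExecFile does not take a context, so cancel the Thread with Thread.Cancel when the context expires) and cap emitted bytes with a bounded buffer; return {stdout:string}; DoD: unit tests pass showing that a script emit(read_input()) returns its input, that a non-terminating script such as while True: pass times out (go.starlark.net rejects while statements and top-level loops unless the corresponding syntax options are enabled, so either enable them for this test or use a long for loop inside a function), and that output over the limit is truncated and reported with a clear error.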
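The request parsing and output cap from the handler item can be sketched with the standard library alone. This is a minimal sketch, not the handler itself: the type names Request, Limits, and BoundedBuffer, the ErrOutputLimit sentinel, and the rule that both limits must be positive are assumptions made for illustration. Only the JSON field names come from the contract above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Limits mirrors the "limits" object of the request contract.
type Limits struct {
	WallMS   int `json:"wall_ms"`
	OutputKB int `json:"output_kb"`
}

// Request mirrors the tool input; the Go type name is hypothetical.
type Request struct {
	Source string          `json:"source"`
	Input  string          `json:"input"`
	Limits Limits          `json:"limits"`
	Caps   json.RawMessage `json:"caps"`
}

// ErrOutputLimit is a hypothetical sentinel for the output cap.
var ErrOutputLimit = errors.New("output limit exceeded")

// BoundedBuffer accumulates emitted text up to max bytes.
type BoundedBuffer struct {
	buf []byte
	max int
}

// NewBoundedBuffer returns a buffer that holds at most max bytes.
func NewBoundedBuffer(max int) *BoundedBuffer {
	return &BoundedBuffer{max: max}
}

// WriteString appends s. If s does not fit, it keeps the prefix that fits
// and returns ErrOutputLimit. Truncation is byte-based, so it may split a
// multi-byte UTF-8 sequence.
func (b *BoundedBuffer) WriteString(s string) error {
	room := b.max - len(b.buf)
	if len(s) > room {
		b.buf = append(b.buf, s[:room]...)
		return ErrOutputLimit
	}
	b.buf = append(b.buf, s...)
	return nil
}

// String returns everything retained so far.
func (b *BoundedBuffer) String() string { return string(b.buf) }

// ParseRequest decodes the raw JSON and rejects non-positive limits
// (the positivity rule is an assumption, not part of the contract).
func ParseRequest(raw json.RawMessage) (Request, error) {
	var req Request
	if err := json.Unmarshal(raw, &req); err != nil {
		return Request{}, fmt.Errorf("decode request: %w", err)
	}
	if req.Limits.WallMS <= 0 || req.Limits.OutputKB <= 0 {
		return Request{}, errors.New("limits.wall_ms and limits.output_kb must be positive")
	}
	return req, nil
}
```

In this sketch the emit builtin would call WriteString on a buffer sized at output_kb * 1024 and abort the script when ErrOutputLimit comes back, so the truncated output and the error can be returned together.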
Add dependency and wiring: append require go.starlark.net vX to go.mod, create internal/tools/starlarkrun/module.go registering the tool into the tool registry (constructor + dependency-free init), and update README.md “Tools” table with usage and example input/output; DoD: go build ./... succeeds locally, registry lists the tool, README shows a runnable curl example.
Harden capabilities (deny-by-default): ensure only emit and read_input are available (no FS, net, clock); do not bind any os, time, or custom builtins; add negative tests that attempt to import or access such capabilities and expect failure; DoD: tests demonstrate no ambient side effects are reachable and only declared builtins exist.
Determinism test: add a table test running the same source+input 100× and asserting identical output and errors; DoD: flaky rate 0/100 locally; test name includes “deterministic”; comment cites Starlark determinism; tests green locally; DoD documented in test.
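The repeated-run assertion can be factored into a small helper that a test named with "deterministic" would call once per table row. This is a sketch under assumptions: runFunc stands in for whatever entry point the handler ends up exposing, and errors are compared by message text, which is a choice made here rather than something the task specifies.

```go
package main

import "fmt"

// runFunc is a hypothetical stand-in for the sandbox entry point.
type runFunc func(source, input string) (stdout string, err error)

// errText flattens an error to a comparable string ("" for nil).
func errText(err error) string {
	if err == nil {
		return ""
	}
	return err.Error()
}

// checkDeterministic runs the same source and input n times and reports
// the first run whose output or error text differs from the first run.
func checkDeterministic(run runFunc, source, input string, n int) error {
	wantOut, wantErr := run(source, input)
	for i := 1; i < n; i++ {
		gotOut, gotErr := run(source, input)
		if gotOut != wantOut || errText(gotErr) != errText(wantErr) {
			return fmt.Errorf("run %d diverged: got (%q, %q), want (%q, %q)",
				i+1, gotOut, errText(gotErr), wantOut, errText(wantErr))
		}
	}
	return nil
}
```

Calling checkDeterministic(run, source, input, 100) for each table entry and failing on a non-nil result gives the 0/100 flake criterion directly.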
Structured errors & shared schema: return standardized errors {code:string,message:string,details?:object} for timeouts (TIMEOUT), output limit exceeded (OUTPUT_LIMIT), and evaluation failures (EVAL_ERROR); update shared error schema doc and example in README; DoD: unit tests assert JSON shape for each failure mode and README section is present.
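One possible Go shape for the standardized error is sketched below. The three code strings and the JSON field names come from the item above; the ToolError type, its methods, and the fallback string are assumptions for illustration.

```go
package main

import "encoding/json"

// Error codes named in the task list.
const (
	CodeTimeout     = "TIMEOUT"
	CodeOutputLimit = "OUTPUT_LIMIT"
	CodeEvalError   = "EVAL_ERROR"
)

// ToolError is a hypothetical Go type for {code, message, details?}.
type ToolError struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"` // omitted when empty
}

// Error makes ToolError usable as a regular Go error.
func (e *ToolError) Error() string { return e.Code + ": " + e.Message }

// JSON renders the error in the shared schema. If the details cannot be
// marshaled, it falls back to a fixed EVAL_ERROR payload.
func (e *ToolError) JSON() string {
	b, err := json.Marshal(e)
	if err != nil {
		return `{"code":"EVAL_ERROR","message":"unserializable error details"}`
	}
	return string(b)
}
```

A unit test per failure mode can compare the JSON() result against the documented example, which keeps the README and the code from drifting apart.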
Observability: add structured logs (trace id, tool name, wall_ms, bytes_out) and emit OpenTelemetry span attributes; DoD: local run shows JSON logs with those fields and a span named tools.starlark.run.
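The log half of this item can be sketched with the standard library's log/slog package (Go 1.21 or later). The helper name renderRunLog, the message text, and the keys trace_id and tool are assumptions; wall_ms and bytes_out follow the item above. The span attributes are left out because they depend on the OpenTelemetry SDK wiring in the host project.

```go
package main

import (
	"bytes"
	"log/slog"
)

// renderRunLog is a hypothetical helper that writes one JSON log record for
// a tool run and returns it as a string, which makes the field set easy to
// check in tests. A real handler would log to its configured writer instead.
func renderRunLog(traceID string, wallMS int64, bytesOut int) string {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Info("tool run",
		slog.String("trace_id", traceID),
		slog.String("tool", "code.sandbox.starlark.run"),
		slog.Int64("wall_ms", wallMS),
		slog.Int("bytes_out", bytesOut),
	)
	return buf.String()
}
```

Each record is a single JSON object that also carries slog's own time, level, and msg fields, so a local run can be inspected with any JSON-aware log viewer.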
Contract examples: add docs/interfaces/code.sandbox.starlark.run.md with request/response examples (valid, timeout, error), security notes, and performance caveats; DoD: doc renders and is linked from main docs.