feat: add e2b-haystack integration for E2B cloud sandbox tools#3195
bogdankostic merged 18 commits into `main`
Conversation
Introduces `e2b-haystack`, a new integration that provides E2B cloud sandbox tools for Haystack agents. Migrated from deepset-ai/haystack-experimental#448.

Exposes four tools sharing a single `E2BSandbox` instance:

- `RunBashCommandTool` - execute bash commands
- `ReadFileTool` - read sandbox filesystem files
- `WriteFileTool` - write sandbox filesystem files
- `ListDirectoryTool` - list directory contents

Plus `E2BToolset` as a convenience Toolset bundling all four tools. Includes 38 unit tests, two usage examples, and full serialisation round-trip support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add mypy override to ignore missing stubs for the `e2b` package (which doesn't ship a `py.typed` marker or type stubs)
- Quote `$GITHUB_OUTPUT` in the workflow to fix actionlint/shellcheck SC2086

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Gate the "Run integration tests" CI step on the E2B_API_KEY env var being present, matching the pattern used by other integrations (e.g. cohere). Without this the step exits with code 5 (no tests collected) because there are no integration-marked tests and no API key is configured yet. Also exposes E2B_API_KEY from secrets at the workflow env level so it will be available once a maintainer adds the secret to the repo. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds real end-to-end integration tests (marked `@pytest.mark.integration`) that exercise all four tools against a live E2B sandbox:

- `RunBashCommandTool`: echo, non-zero exit code, stderr capture
- `WriteFileTool` + `ReadFileTool`: round-trip, nested directory creation
- `ListDirectoryTool`: list /tmp, list after write
- `E2BToolset`: warm_up/close lifecycle, shared sandbox state across tools

Also suppresses S108 (/tmp path warning) in the test per-file-ignores — /tmp is correct and intentional inside a sandboxed environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The e2b SDK raises CommandExitException (with exit_code/stdout/stderr attributes) instead of returning a result for non-zero exit codes. Detect this via duck-typing and return the formatted result string so the LLM can see and react to the exit status, rather than propagating a ToolInvocationError. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
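The duck-typed handling described above can be sketched as follows. This is an illustrative reconstruction, not the PR's code: the attribute names (`exit_code`, `stdout`, `stderr`) come from the commit message, while the helper name and result formatting are assumptions.

```python
# Illustrative sketch: fold non-zero exit codes into the tool's result string
# instead of letting an exception propagate as a ToolInvocationError.

def run_with_exit_capture(run_command, command: str) -> str:
    """Run a sandbox command; report non-zero exits in the result string."""
    try:
        result = run_command(command)
        return f"exit_code=0\nstdout:\n{result.stdout}\nstderr:\n{result.stderr}"
    except Exception as exc:
        # Duck-type: any exception carrying exit_code/stdout/stderr is treated
        # as a command result, so the LLM can see and react to the failure.
        if all(hasattr(exc, a) for a in ("exit_code", "stdout", "stderr")):
            return (
                f"exit_code={exc.exit_code}\n"
                f"stdout:\n{exc.stdout}\nstderr:\n{exc.stderr}"
            )
        raise  # anything else remains a genuine tool invocation error
```

The key point is that the exception type itself is never imported; only the presence of the three attributes matters, which keeps the tool decoupled from the SDK's exception hierarchy.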
Toolset was introduced in haystack-ai 2.19.0. The previous lower bound of 2.12.0 caused an ImportError on the "lowest direct dependencies" CI run. This matches the floor already used by the mcp integration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e2b 1.x does not have the Sandbox.create() classmethod — it was introduced in 2.0.0. The lowest-direct-dependency CI job resolves e2b>=1.0.0 to 1.0.0, causing AttributeError when mock.patch tries to patch Sandbox.create. Bumping the floor to >=2.0.0 fixes the lowest- direct run while keeping Python 3.9+ compatibility (e2b 2.x requires >=3.9). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
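The failure mode described here is easy to reproduce in isolation: `mock.patch` raises `AttributeError` when the target attribute does not exist on the class. The stand-in classes below are hypothetical, just to illustrate the mechanism the lowest-deps CI run hit.

```python
# Minimal illustration (not the PR's test code): patching a classmethod that
# does not exist fails, which is what happened when e2b resolved to 1.0.0
# and Sandbox had no create() classmethod.
from unittest import mock

class SandboxV1:  # stands in for e2b 1.x: no create() classmethod
    pass

class SandboxV2:  # stands in for e2b 2.x
    @classmethod
    def create(cls):
        return cls()

try:
    with mock.patch.object(SandboxV1, "create"):
        pass
except AttributeError:
    print("patching SandboxV1.create fails")  # what CI hit on e2b 1.0.0

with mock.patch.object(SandboxV2, "create", return_value="mocked"):
    print(SandboxV2.create())  # prints "mocked"
```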
bogdankostic left a comment
Looking forward to having this integration soon in Haystack!
I've left a few minor comments, mostly focused on polishing the docstrings. The only major blocker we should address IMO is the deserialization logic for individual tools to ensure the sandbox environments remain consistent within one pipeline.
```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RunBashCommandTool":
```
If a user builds an Agent with the four tools passed individually (sharing one E2BSandbox) rather than via E2BToolset, and then serializes/deserializes the pipeline, each tool's from_dict constructs its own independent E2BSandbox, breaking the shared state of the tools.
I tried this out by adapting `build_pipeline` in `e2b_pipeline_example`:
```python
def build_pipeline() -> Pipeline:
    sandbox = E2BSandbox(sandbox_template="base", timeout=120)
    tools = [
        WriteFileTool(sandbox=sandbox),
        RunBashCommandTool(sandbox=sandbox),
    ]
    agent = Agent(
        chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
        tools=tools,
        system_prompt=(
            "You are a helpful coding assistant with access to a live Linux sandbox. "
            "Use the available tools freely to write files and run commands."
        ),
        max_agent_steps=10,
    )
    pipeline = Pipeline()
    pipeline.add_component("agent", agent)
    return pipeline
```

Executing the script with this version of `build_pipeline`, we get the following error:

```
e2b_pipeline_example.py", line 88, in verify_roundtrip
    assert len(sandbox_ids) == 1, "Tools should share a single sandbox after round-trip"
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError: Tools should share a single sandbox after round-trip
```
@bogdankostic good point - do we have a similar pattern somewhere else in haystack that we can follow here for serializing a shared object?
I don't think there's a straightforward way to preserve a shared E2BSandbox across individually-serialized tools. The round-trip would need some form of cross-tool deduplication.
A quick solution would be to log a warning in each tool's to_dict pointing users at E2BToolset for the serialize/deserialize path.
Something like:
```python
def to_dict(self) -> dict[str, Any]:
    logger.warning(
        "Serializing %s standalone will not preserve a shared E2BSandbox "
        "across tools after deserialization. If you need to serialize an "
        "agentic pipeline (e.g. to YAML) with multiple E2B tools sharing "
        "one sandbox, use E2BToolset instead.",
        type(self).__name__,
    )
    return {
        "type": generate_qualified_class_name(type(self)),
        "data": {"sandbox": self._e2b_sandbox.to_dict()},
    }
```

WDYT?
Hm ok ... I guess the above warning would be the simplest approach.
I explored one alternative to make the round-trip work (see latest commit):
- Identity-based dedup inside `E2BSandbox.from_dict` — give each `E2BSandbox` a UUID at construction, serialize it, and have `from_dict` consult a class-level `weakref.WeakValueDictionary` to return the already-restored instance for the same UUID. Transparent to all callers (tools, toolset, ad-hoc usage).
Pro: Enables production / platform use cases, where you regularly have to serialize / deserialize pipelines. Showing a warning there is not really useful; for such cases we should rather disable the standalone tools and always push users directly to `E2BToolset`.
Con: Makes the code more complex.
What do you think? @sjrl @bogdankostic
I just tried it out, works like a charm 🙌🏼
Co-authored-by: bogdankostic <bogdankostic@web.de>
…ox.py Co-authored-by: bogdankostic <bogdankostic@web.de>
- drop nested py.typed (parent tools/py.typed already covers the namespace) - drop e2b.md API reference (auto-generated post-merge) - remove duplicate e2b>=2.0.0 from test env (already a project dep) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each E2BSandbox now carries a stable instance_id. E2BSandbox.from_dict consults a process-wide WeakValueDictionary keyed on that id, so multiple tools that shared one sandbox before serialization keep sharing it after round-trip — addresses the case where users pass tools individually (WriteFileTool, RunBashCommandTool, ...) instead of via E2BToolset.

A cache hit is only honored when the full serialized config (api_key, template, timeout, environment_vars) matches the cached entry. A crafted YAML reusing another tenant's id but with a different api_key falls through to a fresh instance and never observes the cached one — closes the cross-tenant escalation path that a naive id-only cache would open. On config mismatch the cache entry is preserved (no DoS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
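The mechanism in this commit can be sketched roughly as below. This is an illustrative stand-in, not the PR's code: the class name `DedupSandbox` and the exact config fields are assumptions, but the registry, the `instance_id` key, and the config-match check mirror the commit message.

```python
# Sketch of identity-based dedup via a weak-reference registry, assuming a
# simplified config (template/timeout) in place of the real E2BSandbox fields.
import uuid
import weakref
from typing import Any, Dict, Optional


class DedupSandbox:
    # Process-wide registry of live instances, keyed by instance_id.
    _instances: "weakref.WeakValueDictionary" = weakref.WeakValueDictionary()

    def __init__(self, template: str = "base", timeout: int = 120,
                 instance_id: Optional[str] = None) -> None:
        self.template = template
        self.timeout = timeout
        self.instance_id = instance_id or str(uuid.uuid4())
        # setdefault preserves an existing cache entry, so a config-mismatched
        # reconstruction never evicts the original instance (no DoS).
        DedupSandbox._instances.setdefault(self.instance_id, self)

    def to_dict(self) -> Dict[str, Any]:
        return {"template": self.template, "timeout": self.timeout,
                "instance_id": self.instance_id}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DedupSandbox":
        cached = cls._instances.get(data["instance_id"])
        # Honor the cache only when the full serialized config matches; a
        # payload reusing an id with a different config gets a fresh instance.
        if cached is not None and cached.to_dict() == data:
            return cached
        return cls(**data)
```

Because the registry holds only weak references, it never keeps a sandbox alive past its last strong reference, so the cache cannot leak instances across unrelated pipelines.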
E2BToolset accepts api_key: Secret | None but E2BSandbox.__init__ requires
Secret. Fall back to Secret.from_env_var("E2B_API_KEY") when None.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests _require_sandbox auto-calls warm_up, so the previous tests expecting "E2B sandbox is not running" were silently hitting the live E2B API with a fake key and failing on the 401 response. Mock Sandbox.create to fail and assert each tool wraps the warm_up RuntimeError as ToolInvocationError instead. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
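The test pattern described in this commit looks roughly like the sketch below. Everything here is a hypothetical stand-in (`FakeTool`, the local `ToolInvocationError`, the failing `create` callable); only the shape, mock creation to fail and assert the wrap, comes from the commit message.

```python
# Illustrative sketch: the sandbox-creation call is made to fail, and the
# tool is expected to wrap the warm_up RuntimeError in its invocation-error
# type instead of ever reaching the live API with a fake key.

class ToolInvocationError(Exception):  # stand-in for Haystack's error type
    pass


class FakeTool:
    def __init__(self, create):
        self._create = create
        self._sandbox = None

    def warm_up(self):
        try:
            self._sandbox = self._create()
        except Exception as exc:
            raise RuntimeError("failed to start E2B sandbox") from exc

    def invoke(self):
        if self._sandbox is None:
            try:
                self.warm_up()  # auto warm-up, as in _require_sandbox
            except RuntimeError as exc:
                raise ToolInvocationError(str(exc)) from exc
        return "ok"


def broken_create():
    raise ConnectionError("401 Unauthorized")  # what a fake key would produce
```

With `FakeTool(broken_create)`, calling `invoke()` raises `ToolInvocationError` rather than surfacing the raw 401, which is the behavior the updated tests assert.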
Use the same `Secret = Secret.from_env_var("E2B_API_KEY", strict=True)`
default as E2BSandbox.__init__ for consistency. Drops the now-unreachable
`or` fallback in the E2BSandbox call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bogdankostic left a comment
Nice, looking good now, just found one minor improvement related to changing the default value of api_key that I will directly change.
partially addresses #3227
Summary
Migrates the E2B sandbox toolset prototype from deepset-ai/haystack-experimental#448 into a proper `e2b-haystack` integration package.

- `E2BSandbox` for managing the lifecycle of an E2B cloud sandbox (lazy `warm_up`, `close`, full serialisation round-trip)
- Four `Tool` subclasses that share a single sandbox instance: `RunBashCommandTool`, `ReadFileTool`, `WriteFileTool`, `ListDirectoryTool`
- `E2BToolset` (a `Toolset` subclass) that bundles all four tools as a convenience — just pass `E2BToolset()` to any Haystack `Agent`
- Follows the `haystack_integrations/tools/` module path convention (same as `mcp-haystack` and `github-haystack`)

Test plan

- Unit tests (`hatch run test:unit`) — all sandbox calls are mocked, no API key needed
- Lint / format checks (`hatch run fmt-check`)
- Checked the scripts in `integrations/e2b/examples/` for correctness
- Integration tests need a live sandbox (`E2B_API_KEY`)
- TODO: add the `E2B_API_KEY` secret to the GitHub repo for CI integration tests

🤖 Generated with Claude Code