
feat: add e2b-haystack integration for E2B cloud sandbox tools #3195

Merged
bogdankostic merged 18 commits into main from add-e2b-integration on Apr 30, 2026

@tholor
Member

@tholor tholor commented Apr 21, 2026

partially addresses #3227

Summary

Migrates the E2B sandbox toolset prototype from deepset-ai/haystack-experimental#448 into a proper e2b-haystack integration package.

  • Adds E2BSandbox for managing the lifecycle of an E2B cloud sandbox (lazy warm_up, close, full serialisation round-trip)
  • Adds four Haystack Tool subclasses that share a single sandbox instance: RunBashCommandTool, ReadFileTool, WriteFileTool, ListDirectoryTool
  • Adds E2BToolset (a Toolset subclass) that bundles all four tools as a convenience — just pass E2BToolset() to any Haystack Agent
  • Follows the haystack_integrations/tools/ module path convention (same as mcp-haystack and github-haystack)
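To make the sharing model above concrete, here is a minimal stdlib-only sketch. `FakeSandbox`, `FakeWriteFile`, `FakeReadFile` and `FakeToolset` are hypothetical stand-ins, not the e2b-haystack API (the real classes wrap the E2B SDK and Haystack's `Tool`/`Toolset`). It only models two properties the summary describes: the sandbox starts lazily on first use, and every tool holds a reference to the same sandbox, so a file written by one tool is visible to another.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for E2BSandbox: an in-memory filesystem started lazily."""

    files: dict[str, str] = field(default_factory=dict)
    running: bool = False
    start_count: int = 0

    def warm_up(self) -> None:
        # Idempotent: only the first call actually "starts" the sandbox.
        if not self.running:
            self.running = True
            self.start_count += 1

    def close(self) -> None:
        self.running = False


class FakeWriteFile:
    """Hypothetical tool: writes into the shared sandbox's filesystem."""

    def __init__(self, sandbox: FakeSandbox) -> None:
        self.sandbox = sandbox

    def invoke(self, path: str, content: str) -> str:
        self.sandbox.warm_up()  # lazy: the sandbox starts on the first tool call
        self.sandbox.files[path] = content
        return f"Wrote {len(content)} characters to {path}"


class FakeReadFile:
    """Hypothetical tool: reads from the same shared sandbox."""

    def __init__(self, sandbox: FakeSandbox) -> None:
        self.sandbox = sandbox

    def invoke(self, path: str) -> str:
        self.sandbox.warm_up()  # no-op if another tool already started it
        return self.sandbox.files[path]


class FakeToolset:
    """Hypothetical bundle: every tool is built around one shared sandbox."""

    def __init__(self, sandbox: Optional[FakeSandbox] = None) -> None:
        self.sandbox = sandbox if sandbox is not None else FakeSandbox()
        self.tools = [FakeWriteFile(self.sandbox), FakeReadFile(self.sandbox)]


toolset = FakeToolset()
writer, reader = toolset.tools
writer.invoke("/tmp/hello.txt", "hi")
print(reader.invoke("/tmp/hello.txt"))  # hi
print(toolset.sandbox.start_count)      # 1
```

Bundling construction in the toolset is what guarantees the shared reference; building the tools individually works too, as long as the caller passes the same sandbox object to each.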

Test plan

  • 38 unit tests pass (hatch run test:unit) — all sandbox calls are mocked, no API key needed
  • Lint clean (hatch run fmt-check)
  • Review examples in integrations/e2b/examples/ for correctness
  • Manual smoke test with a real E2B API key (integration test, requires E2B_API_KEY)
  • Maintainer to add E2B_API_KEY secret to GitHub repo for CI integration tests

🤖 Generated with Claude Code

Introduces `e2b-haystack`, a new integration that provides E2B cloud
sandbox tools for Haystack agents. Migrated from
deepset-ai/haystack-experimental#448.

Exposes four tools sharing a single `E2BSandbox` instance:
- `RunBashCommandTool` - execute bash commands
- `ReadFileTool` - read sandbox filesystem files
- `WriteFileTool` - write sandbox filesystem files
- `ListDirectoryTool` - list directory contents

Plus `E2BToolset` as a convenience Toolset bundling all four tools.
Includes 38 unit tests, two usage examples, and full serialisation
round-trip support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the topic:CI and type:documentation labels Apr 21, 2026
- Add mypy override to ignore missing stubs for `e2b` package (which
  doesn't ship a py.typed marker or type stubs)
- Quote $GITHUB_OUTPUT in workflow to fix actionlint/shellcheck SC2086

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions Bot commented Apr 21, 2026

Coverage report (e2b)


Lines missing coverage, per file in integrations/e2b/src/haystack_integrations/tools/e2b:
  • bash_tool.py: 53-55
  • e2b_sandbox.py: none listed
  • list_directory_tool.py: 72, 80-81
  • read_file_tool.py: none listed
  • sandbox_toolset.py: 74, 78, 82, 90-92
  • write_file_tool.py: 67, 75-76

This report was generated by python-coverage-comment-action

tholor and others added 6 commits April 22, 2026 10:30
Gate the "Run integration tests" CI step on the E2B_API_KEY env var
being present, matching the pattern used by other integrations (e.g.
cohere). Without this the step exits with code 5 (no tests collected)
because there are no integration-marked tests and no API key is
configured yet.

Also exposes E2B_API_KEY from secrets at the workflow env level so
it will be available once a maintainer adds the secret to the repo.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds real end-to-end integration tests (marked @pytest.mark.integration)
that exercise all four tools against a live E2B sandbox:
- RunBashCommandTool: echo, non-zero exit code, stderr capture
- WriteFileTool + ReadFileTool: round-trip, nested directory creation
- ListDirectoryTool: list /tmp, list after write
- E2BToolset: warm_up/close lifecycle, shared sandbox state across tools

Also suppresses S108 (/tmp path warning) in test per-file-ignores — /tmp
is correct and intentional inside a sandboxed environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The e2b SDK raises CommandExitException (with exit_code/stdout/stderr
attributes) instead of returning a result for non-zero exit codes. Detect
this via duck-typing and return the formatted result string so the LLM can
see and react to the exit status, rather than propagating a ToolInvocationError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
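A minimal sketch of the duck-typing approach in the commit above, assuming only what the message states: the SDK raises an exception carrying `exit_code`, `stdout` and `stderr` attributes for non-zero exits. `FakeCommandExitException`, `FakeToolInvocationError`, `run_bash`, `fake_execute` and the result format are hypothetical; the real tool's formatting may differ.

```python
from typing import Callable, Tuple

Result = Tuple[int, str, str]  # (exit_code, stdout, stderr)


class FakeCommandExitException(Exception):
    """Hypothetical stand-in for the SDK exception described in the commit."""

    def __init__(self, exit_code: int, stdout: str, stderr: str) -> None:
        super().__init__(f"Command exited with code {exit_code}")
        self.exit_code = exit_code
        self.stdout = stdout
        self.stderr = stderr


class FakeToolInvocationError(Exception):
    """Hypothetical stand-in for Haystack's ToolInvocationError."""


def format_result(exit_code: int, stdout: str, stderr: str) -> str:
    return f"exit_code: {exit_code}\nstdout:\n{stdout}\nstderr:\n{stderr}"


def run_bash(execute: Callable[[str], Result], command: str) -> str:
    try:
        exit_code, stdout, stderr = execute(command)
    except Exception as exc:
        # Duck-typing: anything carrying exit_code/stdout/stderr is a command that
        # ran and finished non-zero, so return it as a normal result the LLM can read.
        if all(hasattr(exc, name) for name in ("exit_code", "stdout", "stderr")):
            return format_result(exc.exit_code, exc.stdout, exc.stderr)
        # Anything else (network, auth, ...) is a genuine tool failure.
        raise FakeToolInvocationError(f"Failed to run {command!r}: {exc}") from exc
    return format_result(exit_code, stdout, stderr)


def fake_execute(command: str) -> Result:
    """Hypothetical executor standing in for the sandbox's command runner."""
    if command == "ls /missing":
        raise FakeCommandExitException(2, "", "ls: cannot access '/missing': No such file or directory")
    if command == "boom":
        raise ConnectionError("sandbox unreachable")
    return 0, "hello\n", ""


print(run_bash(fake_execute, "ls /missing").splitlines()[0])  # exit_code: 2
```

Checking attributes instead of importing the concrete exception class keeps the tool decoupled from where the SDK defines that class.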
Toolset was introduced in haystack-ai 2.19.0. The previous lower bound
of 2.12.0 caused an ImportError on the "lowest direct dependencies" CI
run. This matches the floor already used by the mcp integration.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e2b 1.x does not have the Sandbox.create() classmethod — it was
introduced in 2.0.0. The lowest-direct-dependency CI job resolves
e2b>=1.0.0 to 1.0.0, causing AttributeError when mock.patch tries to
patch Sandbox.create. Bumping the floor to >=2.0.0 fixes the lowest-
direct run while keeping Python 3.9+ compatibility (e2b 2.x requires >=3.9).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@socket-security

socket-security Bot commented Apr 22, 2026

Review the following changes in direct dependencies.

  • Added e2b@2.20.2: Supply Chain Security 77, Vulnerability 100, Quality 100, Maintenance 100, License 100


@tholor tholor marked this pull request as ready for review April 22, 2026 12:09
@tholor tholor requested a review from a team as a code owner April 22, 2026 12:09
@tholor tholor requested review from bogdankostic and removed request for a team April 22, 2026 12:09
Contributor

@bogdankostic bogdankostic left a comment


Looking forward to having this integration soon in Haystack!

I've left a few minor comments, mostly focused on polishing the docstrings. The only major blocker we should address IMO is the deserialization logic for individual tools to ensure the sandbox environments remain consistent within one pipeline.

Comment thread .github/labeler.yml Outdated
Comment thread integrations/e2b/examples/e2b_agent_example.py
Comment thread integrations/e2b/examples/e2b_pipeline_example.py
Comment thread integrations/e2b/pydoc/config_docusaurus.yml
Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/e2b_sandbox.py
Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/e2b_sandbox.py Outdated
Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/e2b_sandbox.py Outdated
Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/e2b_sandbox.py Outdated
Comment thread integrations/e2b/README.md
}

@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RunBashCommandTool":
Contributor


If a user builds an Agent with the four tools passed individually (sharing one E2BSandbox) rather than via E2BToolset, and then serializes/deserializes the pipeline, each tool's from_dict constructs its own independent E2BSandbox, breaking the shared state of the tools.

I tried this out by adapting build_pipeline in e2b_pipeline_example:

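from haystack import Pipeline
from haystack.components.agents import Agent
from haystack.components.generators.chat import OpenAIChatGenerator
# Assumed import path, based on the haystack_integrations/tools/e2b package layout
from haystack_integrations.tools.e2b import E2BSandbox, RunBashCommandTool, WriteFileTool

def build_pipeline() -> Pipeline: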
    sandbox = E2BSandbox(sandbox_template="base", timeout=120)
    tools = [
        WriteFileTool(sandbox=sandbox),
        RunBashCommandTool(sandbox=sandbox),
    ]
    agent = Agent(
        chat_generator=OpenAIChatGenerator(model="gpt-4o-mini"),
        tools=tools,
        system_prompt=(
            "You are a helpful coding assistant with access to a live Linux sandbox. "
            "Use the available tools freely to write files and run commands."
        ),
        max_agent_steps=10,
    )
    pipeline = Pipeline()
    pipeline.add_component("agent", agent)
    return pipeline

Executing the script with this version of build_pipeline, we get the following error:

e2b_pipeline_example.py", line 88, in verify_roundtrip
    assert len(sandbox_ids) == 1, "Tools should share a single sandbox after round-trip"
           ^^^^^^^^^^^^^^^^^^^^^
AssertionError: Tools should share a single sandbox after round-trip
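The failure can be reproduced without E2B at all. In this hypothetical stdlib sketch (`NaiveSandbox` and `NaiveTool` are stand-ins, not the integration's classes), each tool serializes its own copy of the sandbox config and each `from_dict` builds a fresh sandbox from it, so object identity is lost on the round-trip, which is exactly what the assertion above detects.

```python
from typing import Any, Dict


class NaiveSandbox:
    def __init__(self, template: str, timeout: int) -> None:
        self.template = template
        self.timeout = timeout

    def to_dict(self) -> Dict[str, Any]:
        return {"template": self.template, "timeout": self.timeout}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NaiveSandbox":
        # Naive: every call constructs a brand-new instance.
        return cls(**data)


class NaiveTool:
    def __init__(self, sandbox: NaiveSandbox) -> None:
        self.sandbox = sandbox

    def to_dict(self) -> Dict[str, Any]:
        # Each tool embeds its own copy of the sandbox config.
        return {"sandbox": self.sandbox.to_dict()}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "NaiveTool":
        return cls(sandbox=NaiveSandbox.from_dict(data["sandbox"]))


shared = NaiveSandbox(template="base", timeout=120)
tools = [NaiveTool(shared), NaiveTool(shared)]
restored = [NaiveTool.from_dict(t.to_dict()) for t in tools]

print(len({id(t.sandbox) for t in tools}))     # 1
print(len({id(t.sandbox) for t in restored}))  # 2
```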

Member Author


@bogdankostic good point - do we have a similar pattern somewhere else in haystack that we can follow here for serializing a shared object?

Contributor


I don't think there's a straightforward way to preserve a shared E2BSandbox across individually-serialized tools. The round-trip would need some form of cross-tool deduplication.

A quick solution would be to log a warning in each tool's to_dict pointing users at E2BToolset for the serialize/deserialize path.

Something like:

def to_dict(self) -> dict[str, Any]:
    logger.warning(
        "Serializing %s standalone will not preserve a shared E2BSandbox "
        "across tools after deserialization. If you need to serialize an "
        "agentic pipeline (e.g. to YAML) with multiple E2B tools sharing "
        "one sandbox, use E2BToolset instead.",
        type(self).__name__,
    )
    return {
        "type": generate_qualified_class_name(type(self)),
        "data": {"sandbox": self._e2b_sandbox.to_dict()},
    }

WDYT?

Member Author


Hm, ok... I guess the above warning would be the simplest approach.
I explored one alternative to make the round-trip work (see latest commit):

  • Identity-based dedup inside E2BSandbox.from_dict — give each E2BSandbox a UUID at construction, serialize it, and have from_dict consult a class-level weakref.WeakValueDictionary to return the already-restored instance for the same UUID. Transparent to all callers (tools, toolset, ad-hoc usage).

Pro: Enables production / platform use cases, where you regularly have to serialize / deserialize pipelines. Showing a warning there is not really useful; with the warning-only approach, we should rather disable the standalone tools for such cases and always push users directly to E2BToolset.

Con: Makes the code more complex.

What do you think? @sjrl @bogdankostic

Contributor


I just tried it out, works like a charm 🙌🏼

tholor and others added 4 commits April 24, 2026 15:54
Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/py.typed Outdated
Comment thread integrations/e2b/e2b.md Outdated
Comment thread integrations/e2b/pyproject.toml Outdated
tholor and others added 4 commits April 29, 2026 15:43
- drop nested py.typed (parent tools/py.typed already covers the namespace)
- drop e2b.md API reference (auto-generated post-merge)
- remove duplicate e2b>=2.0.0 from test env (already a project dep)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each E2BSandbox now carries a stable instance_id. E2BSandbox.from_dict
consults a process-wide WeakValueDictionary keyed on that id so multiple
tools that shared one sandbox before serialization keep sharing it after
round-trip — addresses the case where users pass tools individually
(WriteFileTool, RunBashCommandTool, ...) instead of via E2BToolset.

A cache hit is only honored when the full serialized config (api_key,
template, timeout, environment_vars) matches the cached entry. A crafted
YAML reusing another tenant's id but with a different api_key falls
through to a fresh instance and never observes the cached one — closes
the cross-tenant escalation path that a naive id-only cache would open.
On config mismatch the cache entry is preserved (no DoS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
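A minimal stdlib sketch of the mechanism this commit describes. `DedupSandbox`, its fields, the use of `uuid4` for the id, and `api_key_env` (modelling a serialized Secret as an env var name, never the key itself) are assumptions for illustration, not the integration's code. It shows the two stated rules: a cache hit requires the full serialized config to match, and a mismatching payload neither receives nor evicts the cached instance. Whether the real implementation also registers instances at construction is not stated; this sketch registers only in `from_dict`.

```python
import uuid
import weakref
from typing import Any, Dict, Optional


class DedupSandbox:
    """Hypothetical sandbox whose from_dict reuses live instances by id."""

    # Process-wide cache: instance_id -> live instance. Weak values let an entry
    # vanish once nothing references that sandbox any more.
    _live: "weakref.WeakValueDictionary[str, DedupSandbox]" = weakref.WeakValueDictionary()

    def __init__(self, api_key_env: str, template: str = "base", timeout: int = 120,
                 instance_id: Optional[str] = None) -> None:
        self.api_key_env = api_key_env  # stands in for a serialized Secret
        self.template = template
        self.timeout = timeout
        self.instance_id = instance_id or str(uuid.uuid4())

    def config(self) -> Dict[str, Any]:
        return {"api_key_env": self.api_key_env, "template": self.template, "timeout": self.timeout}

    def to_dict(self) -> Dict[str, Any]:
        return {**self.config(), "instance_id": self.instance_id}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DedupSandbox":
        data = dict(data)  # don't mutate the caller's dict
        instance_id = data.pop("instance_id")
        cached = cls._live.get(instance_id)
        if cached is not None and cached.config() == data:
            return cached  # same id AND same config: share the live instance
        fresh = cls(instance_id=instance_id, **data)
        if cached is None:
            # Register only into a free slot: a mismatching payload must not
            # evict the entry that legitimate holders of this id rely on.
            cls._live[instance_id] = fresh
        return fresh


original = DedupSandbox(api_key_env="E2B_API_KEY")
payload = original.to_dict()  # what each tool would embed in its own to_dict
first = DedupSandbox.from_dict(payload)
second = DedupSandbox.from_dict(payload)
print(first is second)  # True: tools share one sandbox after the round-trip

forged = {**payload, "api_key_env": "OTHER_TENANT_KEY"}
print(DedupSandbox.from_dict(forged) is first)   # False: config mismatch
print(DedupSandbox.from_dict(payload) is first)  # True: cache entry survived
```

Weak values mean the cache never keeps a sandbox alive on its own; once the last tool holding it is gone, the id slot frees up.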
E2BToolset accepts api_key: Secret | None but E2BSandbox.__init__ requires
Secret. Fall back to Secret.from_env_var("E2B_API_KEY") when None.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests

_require_sandbox auto-calls warm_up, so the previous tests expecting
"E2B sandbox is not running" were silently hitting the live E2B API
with a fake key and failing on the 401 response. Mock Sandbox.create
to fail and assert each tool wraps the warm_up RuntimeError as
ToolInvocationError instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
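The testing pattern from this commit can be sketched with the standard library's `unittest.mock`. Everything here is a hypothetical stand-in: `FakeSDK.create` plays the role of the SDK's `Sandbox.create`, `FakeToolInvocationError` plays `ToolInvocationError`, and the error messages are invented. Patching the factory to raise keeps the test offline, so it exercises the warm-up error path instead of silently reaching a live API.

```python
from typing import Optional
from unittest import mock


class FakeSDK:
    """Hypothetical SDK entry point; the real one would hit the network."""

    @classmethod
    def create(cls, api_key: str) -> "FakeSDK":
        raise AssertionError("a unit test must never reach the live API")


class FakeToolInvocationError(Exception):
    """Hypothetical stand-in for ToolInvocationError."""


class FakeSandbox:
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key
        self._handle: Optional[FakeSDK] = None

    def warm_up(self) -> None:
        if self._handle is None:
            try:
                self._handle = FakeSDK.create(api_key=self.api_key)
            except Exception as exc:
                raise RuntimeError(f"Failed to start E2B sandbox: {exc}") from exc

    def _require_sandbox(self) -> FakeSDK:
        self.warm_up()  # auto-starts, which is why tests must mock create()
        assert self._handle is not None
        return self._handle


class FakeReadFileTool:
    def __init__(self, sandbox: FakeSandbox) -> None:
        self.sandbox = sandbox

    def invoke(self, path: str) -> str:
        try:
            self.sandbox._require_sandbox()
        except RuntimeError as exc:
            raise FakeToolInvocationError(str(exc)) from exc
        return f"contents of {path}"  # a real tool would read the file here


def test_tool_wraps_warm_up_failure() -> None:
    tool = FakeReadFileTool(FakeSandbox(api_key="fake"))
    with mock.patch.object(FakeSDK, "create", side_effect=Exception("401 Unauthorized")):
        try:
            tool.invoke("/tmp/x")
        except FakeToolInvocationError as exc:
            assert "401 Unauthorized" in str(exc)
            assert isinstance(exc.__cause__, RuntimeError)
        else:
            raise AssertionError("expected FakeToolInvocationError")


test_tool_wraps_warm_up_failure()
```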
Use the same `Secret = Secret.from_env_var("E2B_API_KEY", strict=True)`
default as E2BSandbox.__init__ for consistency. Drops the now-unreachable
`or` fallback in the E2BSandbox call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@bogdankostic bogdankostic left a comment


Nice, looks good now! I just found one minor improvement related to the default value of api_key, which I will change directly.

Comment thread integrations/e2b/src/haystack_integrations/tools/e2b/e2b_sandbox.py Outdated
@bogdankostic bogdankostic merged commit d847cc0 into main Apr 30, 2026
18 checks passed
@bogdankostic bogdankostic deleted the add-e2b-integration branch April 30, 2026 10:18

Labels

integration:e2b, topic:CI, type:documentation


3 participants