Inconsistent response Content in evaluate Function When Toggling Python Evaluator Sandbox #371

@hellowoe23

Describe the bug
When using the Python evaluator, enabling or disabling the sandbox causes the response parameter passed into the evaluate function to contain different content. This results in inconsistent evaluation behavior depending on the sandbox setting. The structure of response does not match what is described in the [official documentation](https://chainforge.ai/docs/evaluation/).

To Reproduce
Steps to reproduce the behavior:

  1. Create a Python evaluator.
  2. Disable the sandbox environment.
  3. Print out various attributes of the response object in the evaluate function.
  4. You will observe a set of integers being printed, rather than the documented .text, .raw, etc.
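The inspection in step 3 could be done with something like the sketch below. It is only a diagnostic aid, not ChainForge's own code: `describe_response`, `PROBED_ATTRS`, and `MISSING` are hypothetical names, and only `text` and `raw` are probed because those are the attributes named in this issue. Run the same evaluator once with the sandbox enabled and once with it disabled, then compare the printed summaries.

```python
from types import SimpleNamespace

# Attribute names to probe; taken from the ones mentioned in this issue.
PROBED_ATTRS = ("text", "raw")

# Sentinel used to tell "attribute absent" apart from "attribute is None".
MISSING = object()


def describe_response(response):
    """Return a dict with the type of `response` and each probed attribute."""
    summary = {"type": type(response).__name__}
    for name in PROBED_ATTRS:
        # getattr with a default avoids AttributeError on unexpected shapes.
        value = getattr(response, name, MISSING)
        summary[name] = "<missing>" if value is MISSING else repr(value)
    return summary


def evaluate(response):
    # Print the summary so it appears in the evaluator's output.
    print(describe_response(response))
    return True


if __name__ == "__main__":
    # Local stand-in object (an assumption), only for running outside ChainForge.
    evaluate(SimpleNamespace(text="hello"))
```

If the two runs print a different `type`, or `<missing>` for attributes that are present in the other mode, that confirms the mismatch described above.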

Expected behavior
The response parameter should have a consistent structure, regardless of whether the sandbox is enabled or not. It should follow the documented interface.

Screenshots
[Image: screenshot attached to the original issue]

Environment:

  • OS: Ubuntu 11 (server), Windows 11 (client)
  • Browser: Edge 138.0.3351.95
  • Python: 3.12

Additional context
This inconsistency makes it difficult to write portable evaluator scripts. It would be helpful to unify the response object structure between sandboxed and non-sandboxed execution.
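Until the structure is unified, a defensive helper along these lines might serve as a stopgap. This is a sketch under stated assumptions: the issue does not say exactly what shape `response` takes with the sandbox disabled, so the `str` and `dict` branches are guesses, and `response_text` is a hypothetical helper name.

```python
def response_text(response):
    """Best-effort extraction of the output text from `response`."""
    # Assumption: the response may already be a plain string.
    if isinstance(response, str):
        return response
    # Documented shape: an object exposing a `.text` attribute.
    text = getattr(response, "text", None)
    if text is not None:
        return text
    # Assumption: the response may be a dict carrying a "text" key.
    if isinstance(response, dict) and "text" in response:
        return response["text"]
    # Last resort: fall back to the string form of whatever was passed in.
    return str(response)


def evaluate(response):
    # Example metric: length of the extracted text.
    return len(response_text(response))
```

This only hides the difference for evaluators that need the text alone; anything relying on other attributes would still behave differently between the two modes.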
