Inconsistent response Content in evaluate Function When Toggling Python Evaluator Sandbox #371

**Describe the bug**
When using the Python evaluator, toggling the sandbox changes what the `response` parameter passed to the `evaluate` function contains. As a result, the same evaluator behaves differently depending on the sandbox setting. With the sandbox disabled, the structure of `response` does not match what is described in the [official documentation](https://chainforge.ai/docs/evaluation/).

**To Reproduce**
Steps to reproduce the behavior:

1. Create a Python evaluator.
2. Disable the sandbox environment.
3. Print various attributes of the `response` object inside the `evaluate` function.
4. Observe that a set of integers is printed, rather than the documented `.text`, `.raw`, etc.
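The steps above can be sketched as a small diagnostic evaluator. This is only an illustration: `describe_response` is a hypothetical helper, the attribute names are the ones mentioned in this issue (`text`, `raw`), and the `SimpleNamespace` object is a local stand-in for testing outside ChainForge, not the real response class.

```python
from types import SimpleNamespace


def describe_response(response, names=("text", "raw")):
    """Report which of the expected attributes exist on the object.

    Hypothetical helper for debugging; `names` lists the attributes
    mentioned in this issue, not a complete interface.
    """
    return {name: hasattr(response, name) for name in names}


def evaluate(response):
    # Print the runtime type and attribute availability so the output
    # can be compared with the sandbox enabled and disabled.
    report = describe_response(response)
    print(type(response).__name__, report)
    # Return True only when every expected attribute is present.
    return all(report.values())


if __name__ == "__main__":
    # Local stand-in object (an assumption, for running outside ChainForge).
    evaluate(SimpleNamespace(text="hello", raw="hello"))
```

Pasting the same `evaluate` function into the evaluator node and running it once per sandbox setting should make any difference in the object's type and attributes visible in the printed output.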

**Expected behavior**
The `response` parameter should have a consistent structure, regardless of whether the sandbox is enabled or not. It should follow the documented interface.

**Screenshots**

<img width="3758" height="1889" alt="Image" src="https://github.com/user-attachments/assets/f19e2d55-c3d3-4418-ab4e-234f755e8d19" />

**Environment:**

* OS: Ubuntu 11 (server), Windows 11 (client)
* Browser: Edge 138.0.3351.95
* Python: 3.12

**Additional context**
This inconsistency makes it difficult to write portable evaluator scripts. It would be helpful to unify the `response` object structure between sandboxed and non-sandboxed execution.
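Until the structure is unified, a defensive evaluator might look roughly like the sketch below. It is a workaround idea, not a fix: `response_text` is a hypothetical helper, and falling back to `str(response)` when `.text` is missing is an assumption that may not suit every evaluator.

```python
from types import SimpleNamespace


def response_text(response):
    """Return `response.text` if it is a string, else `str(response)`.

    Hypothetical helper: the fallback is an assumption about what is
    acceptable when the documented `.text` attribute is not available.
    """
    text = getattr(response, "text", None)
    return text if isinstance(text, str) else str(response)


def evaluate(response):
    # Example metric: length of the response text, computed the same way
    # regardless of which object shape was passed in.
    return len(response_text(response))


if __name__ == "__main__":
    # Local stand-in object (an assumption, for running outside ChainForge).
    print(evaluate(SimpleNamespace(text="abcd")))
```

Routing every attribute access through one helper keeps the evaluator body identical for both sandbox settings, so only the helper needs to change once the underlying behavior is clarified.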

