Conversation

@neubig neubig commented Feb 9, 2026

Summary

This PR upgrades the project to Python 3.13 and pins libtmux to neubig/libtmux@fix/new-session-race-condition, which contains the fix for the race condition reported in libtmux#624.

Context

See upstream PR: tmux-python/libtmux#625

The issue was that new_session() in libtmux would:

  1. Run tmux new-session -P -F '#{session_id}' to create the session
  2. Immediately run tmux list-sessions to fetch the full session data

This created a race condition in Python 3.13 environments (especially with PyInstaller + Docker) where list-sessions might not see the newly created session yet, causing TmuxObjectDoesNotExist errors.

The fix expands the -F format string to include all Session fields and parses the output directly, eliminating the separate list-sessions query entirely.
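The single-query approach can be sketched as follows. This is an illustrative simplification, not the actual libtmux code: the field names and separator below are assumptions, while the real fix enumerates every field defined on libtmux's Session object.

```python
import subprocess

# Illustrative subset of session fields; the real fix covers all of them.
SESSION_FIELDS = ["session_id", "session_name", "session_created"]
SEP = "\x1f"  # unit separator, unlikely to appear in field values

def build_format() -> str:
    """Expanded -F string that requests every field in one query."""
    return SEP.join("#{%s}" % field for field in SESSION_FIELDS)

def parse_new_session_output(line: str) -> dict[str, str]:
    """Parse the -P output of `tmux new-session` directly, so no
    follow-up `list-sessions` call (and no race window) is needed."""
    return dict(zip(SESSION_FIELDS, line.split(SEP)))

def new_session(name: str) -> dict[str, str]:
    out = subprocess.run(
        ["tmux", "new-session", "-d", "-s", name, "-P", "-F", build_format()],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return parse_new_session_output(out)
```

Because the session data comes back atomically from the same command that creates the session, there is no window in which the server's session list can lag behind.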

Changes

  • Update target-version from py312 to py313 in root pyproject.toml (ruff)
  • Update pythonVersion from 3.12 to 3.13 in root pyproject.toml (pyright)
  • Update Python version in server.yml build matrix from 3.12 to 3.13
  • Update Python version in pypi-release.yml from 3.12 to 3.13
  • Update Python version in pr-review action from 3.12 to 3.13
  • Pin libtmux to neubig's branch: libtmux @ git+https://github.com/neubig/libtmux.git@fix/new-session-race-condition

Testing

This PR needs integration tests to verify the libtmux fix works correctly in our CI environment. The integration-test label should trigger those tests.

Note

This is a draft PR to test the libtmux fix. Once the upstream PR is merged and released to PyPI, we should update the dependency to the released version.



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:bcad0e2-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-bcad0e2-python \
  ghcr.io/openhands/agent-server:bcad0e2-python

All tags pushed for this build

ghcr.io/openhands/agent-server:bcad0e2-golang-amd64
ghcr.io/openhands/agent-server:bcad0e2-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:bcad0e2-golang-arm64
ghcr.io/openhands/agent-server:bcad0e2-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:bcad0e2-java-amd64
ghcr.io/openhands/agent-server:bcad0e2-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:bcad0e2-java-arm64
ghcr.io/openhands/agent-server:bcad0e2-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:bcad0e2-python-amd64
ghcr.io/openhands/agent-server:bcad0e2-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:bcad0e2-python-arm64
ghcr.io/openhands/agent-server:bcad0e2-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:bcad0e2-golang
ghcr.io/openhands/agent-server:bcad0e2-java
ghcr.io/openhands/agent-server:bcad0e2-python

About Multi-Architecture Support

  • Each variant tag (e.g., bcad0e2-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., bcad0e2-python-amd64) are also available if needed

- Update target-version and pythonVersion to 3.13 in root pyproject.toml
- Update Python version in server.yml build matrix to 3.13
- Update Python version in pypi-release.yml to 3.13
- Update Python version in pr-review action to 3.13
- Pin libtmux to neubig/libtmux#fix/new-session-race-condition branch
  which fixes the race condition in new_session() that causes
  TmuxObjectDoesNotExist errors in Python 3.13 environments

The libtmux fix avoids the race condition by eliminating the separate
list-sessions query after session creation, instead parsing the session
data directly from the -P output of new-session.

Fixes the Python 3.13 + PyInstaller + Docker compatibility issue
reported in libtmux#624.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig added the integration-test Runs the integration tests and comments the results label Feb 9, 2026 — with OpenHands AI

github-actions bot commented Feb 9, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.


github-actions bot commented Feb 9, 2026

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.90
Models Tested: 4
Timestamp: 2026-02-09 21:08:59 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Tests Passed | Skipped | Total | Cost | Tokens
litellm_proxy_deepseek_deepseek_reasoner | 100.0% | 7/7 | 1 | 8 | $0.03 | 586,389
litellm_proxy_moonshot_kimi_k2_thinking | 100.0% | 7/7 | 1 | 8 | $0.15 | 225,951
litellm_proxy_gemini_3_pro_preview | 100.0% | 8/8 | 0 | 8 | $0.31 | 230,266
litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 8/8 | 0 | 8 | $0.41 | 236,012

📋 Detailed Results

litellm_proxy_deepseek_deepseek_reasoner

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.03
  • Token Usage: prompt: 571,330, completion: 15,059, cache_read: 529,280, reasoning: 6,086
  • Run Suffix: litellm_proxy_deepseek_deepseek_reasoner_122e797_deepseek_v3_2_reasoner_run_N8_20260209_210520
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_moonshot_kimi_k2_thinking

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.15
  • Token Usage: prompt: 220,509, completion: 5,442, cache_read: 172,544
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_122e797_kimi_k2_thinking_run_N8_20260209_210544
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

litellm_proxy_gemini_3_pro_preview

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.31
  • Token Usage: prompt: 223,711, completion: 6,555, cache_read: 120,260, reasoning: 4,076
  • Run Suffix: litellm_proxy_gemini_3_pro_preview_122e797_gemini_3_pro_run_N8_20260209_210520

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (8/8)
  • Total Cost: $0.41
  • Token Usage: prompt: 229,436, completion: 6,576, cache_read: 158,218, cache_write: 70,849, reasoning: 1,996
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_122e797_claude_sonnet_4_5_20250929_run_N8_20260209_210519


github-actions bot commented Feb 9, 2026

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
TOTAL | 18219 | 4880 | 73% |
report-only-changed-files is enabled. No files were changed during this commit :)

@neubig neubig added the test-examples Run all applicable "examples/" files. Expensive operation. label Feb 9, 2026

github-actions bot commented Feb 9, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-02-09 21:45:19 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 28.7s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 21.6s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 11.9s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 48.3s $0.04
01_standalone_sdk/09_pause_example.py ✅ PASS 19.9s $0.02
01_standalone_sdk/10_persistence.py ✅ PASS 27.5s $0.02
01_standalone_sdk/11_async.py ✅ PASS 34.3s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 16.5s $0.02
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 21.5s $0.01
01_standalone_sdk/14_context_condenser.py ✅ PASS 4m 25s $0.53
01_standalone_sdk/17_image_input.py ✅ PASS 18.7s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 27.9s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 15.8s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 22.3s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 9.7s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 22.2s $0.02
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 27s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 5m 5s $0.41
01_standalone_sdk/25_agent_delegation.py ✅ PASS 2m 16s $0.18
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 20.6s $0.02
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 38.4s $0.03
01_standalone_sdk/29_llm_streaming.py ✅ PASS 36.6s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 11.2s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 7s $0.21
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 19.7s $0.02
01_standalone_sdk/34_critic_example.py ❌ FAIL (Exit code 1) 3.8s --
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 9.8s $0.00
01_standalone_sdk/37_llm_profile_store.py ✅ PASS 4.0s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 58.3s $0.04
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 1m 4s --
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 16.8s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (Exit code 1) 1m 1s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 31.8s $0.02
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ❌ FAIL (Exit code 1) 4.1s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 18.7s $0.02
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 44.4s $0.08
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 12.5s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 7.8s $0.01

❌ Some tests failed

Total: 38 | Passed: 33 | Failed: 5 | Total Cost: $1.94

Failed examples:

  • examples/01_standalone_sdk/34_critic_example.py: Exit code 1
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1

View full workflow run

…HUB_SHA

GitHub Actions sets GITHUB_SHA to the merge commit by default, which
differs from the PR head commit. Use a custom variable AGENT_SERVER_SHA
to explicitly pass the PR head SHA to example scripts for Docker image
selection.

Co-authored-by: openhands <openhands@all-hands.dev>
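The selection logic described above can be sketched as follows. This is a hedged illustration: resolve_image_sha and the fallback ordering are assumptions, not code quoted from the example scripts.

```python
import os

def resolve_image_sha() -> str:
    """Pick the SHA used to select the agent-server Docker image.
    Prefer AGENT_SERVER_SHA (explicitly set to the PR head commit by
    the workflow) over GITHUB_SHA, which on pull_request events points
    at the merge commit rather than the PR head."""
    sha = os.environ.get("AGENT_SERVER_SHA") or os.environ.get("GITHUB_SHA")
    if not sha:
        raise RuntimeError("Neither AGENT_SERVER_SHA nor GITHUB_SHA is set")
    return sha[:7]  # image tags use the short SHA, e.g. bcad0e2-python
```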
@neubig neubig removed the test-examples Run all applicable "examples/" files. Expensive operation. label Feb 10, 2026
@neubig neubig added the test-examples Run all applicable "examples/" files. Expensive operation. label Feb 10, 2026 — with OpenHands AI

github-actions bot commented Feb 10, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-02-10 09:37:15 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 25.3s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 17.9s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 11.1s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 41.2s $0.04
01_standalone_sdk/09_pause_example.py ✅ PASS 13.5s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 27.9s $0.02
01_standalone_sdk/11_async.py ✅ PASS 28.7s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 11.2s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 20.8s $0.01
01_standalone_sdk/14_context_condenser.py ✅ PASS 4m 22s $0.54
01_standalone_sdk/17_image_input.py ✅ PASS 17.2s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 25.1s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 13.2s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 19.5s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 14.8s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 18.1s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 1m 10s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 2m 54s $0.22
01_standalone_sdk/25_agent_delegation.py ✅ PASS 2m 4s $0.17
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 18.6s $0.02
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 28.3s $0.02
01_standalone_sdk/29_llm_streaming.py ✅ PASS 38.6s $0.03
01_standalone_sdk/30_tom_agent.py ✅ PASS 12.2s $0.01
01_standalone_sdk/31_iterative_refinement.py ❌ FAIL (Timed out after 600 seconds) 10m 0s --
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 23.7s $0.02
01_standalone_sdk/34_critic_example.py ❌ FAIL (Exit code 1) 3.9s --
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 21.3s $0.01
01_standalone_sdk/37_llm_profile_store.py ✅ PASS 4.1s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 1m 5s $0.04
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.7s --
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 5.7s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (Exit code 1) 5m 11s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 28.3s $0.03
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ❌ FAIL (Exit code 1) 4.7s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 31.8s $0.04
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 1m 10s $0.05
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 10.5s $0.01
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 7.5s $0.01

❌ Some tests failed

Total: 38 | Passed: 32 | Failed: 6 | Total Cost: $1.52

Failed examples:

  • examples/01_standalone_sdk/31_iterative_refinement.py: Timed out after 600 seconds
  • examples/01_standalone_sdk/34_critic_example.py: Exit code 1
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1

View full workflow run

- Regenerate uv.lock with pinned libtmux git dependency
- Simplify Generator[T, None, None] to Generator[T] in test files

Co-authored-by: openhands <openhands@all-hands.dev>
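The Generator simplification relies on Python 3.13's type-parameter defaults (PEP 696): Generator's SendType and ReturnType now default to None, so the one-argument spelling is equivalent to the old three-argument form. A small illustration (count_up is a hypothetical example, not code from this repo):

```python
from collections.abc import Generator

def count_up(n: int) -> Generator[int]:
    # Equivalent to Generator[int, None, None] on Python 3.13+,
    # where the send and return type parameters default to None.
    yield from range(n)
```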
@neubig neubig removed the test-examples Run all applicable "examples/" files. Expensive operation. label Feb 10, 2026
@neubig neubig added the test-examples Run all applicable "examples/" files. Expensive operation. label Feb 10, 2026 — with OpenHands AI

github-actions bot commented Feb 10, 2026

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Generated: 2026-02-10 09:54:23 UTC

Example Status Duration Cost
01_standalone_sdk/02_custom_tools.py ✅ PASS 25.7s $0.03
01_standalone_sdk/03_activate_skill.py ✅ PASS 20.0s $0.03
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 14.0s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 30.2s $0.02
01_standalone_sdk/09_pause_example.py ✅ PASS 18.5s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 35.4s $0.03
01_standalone_sdk/11_async.py ✅ PASS 31.4s $0.03
01_standalone_sdk/12_custom_secrets.py ✅ PASS 20.0s $0.02
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 21.1s $0.02
01_standalone_sdk/14_context_condenser.py ✅ PASS 6m 29s $0.84
01_standalone_sdk/17_image_input.py ✅ PASS 16.4s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 23.3s $0.01
01_standalone_sdk/19_llm_routing.py ✅ PASS 15.6s $0.02
01_standalone_sdk/20_stuck_detector.py ✅ PASS 15.8s $0.02
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 11.4s $0.00
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 16.7s $0.01
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 57.8s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 51.3s $0.05
01_standalone_sdk/25_agent_delegation.py ✅ PASS 1m 44s $0.19
01_standalone_sdk/26_custom_visualizer.py ✅ PASS 22.6s $0.03
01_standalone_sdk/28_ask_agent_example.py ✅ PASS 35.2s $0.03
01_standalone_sdk/29_llm_streaming.py ✅ PASS 44.6s $0.04
01_standalone_sdk/30_tom_agent.py ✅ PASS 10.6s $0.01
01_standalone_sdk/31_iterative_refinement.py ✅ PASS 3m 7s $0.22
01_standalone_sdk/32_configurable_security_policy.py ✅ PASS 16.6s $0.02
01_standalone_sdk/34_critic_example.py ❌ FAIL (Exit code 1) 3.8s --
01_standalone_sdk/36_event_json_to_openai_messages.py ✅ PASS 13.1s $0.00
01_standalone_sdk/37_llm_profile_store.py ✅ PASS 4.1s $0.00
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 59.4s $0.04
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.8s --
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ❌ FAIL (Exit code 1) 4.9s --
02_remote_agent_server/04_convo_with_api_sandboxed_server.py ❌ FAIL (Exit code 1) 5m 11s --
02_remote_agent_server/07_convo_with_cloud_workspace.py ✅ PASS 28.6s $0.02
02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py ❌ FAIL (Exit code 1) 5.6s --
04_llm_specific_tools/01_gpt5_apply_patch_preset.py ✅ PASS 20.8s $0.03
04_llm_specific_tools/02_gemini_file_tools.py ✅ PASS 57.7s $0.07
05_skills_and_plugins/01_loading_agentskills/main.py ✅ PASS 14.0s $0.02
05_skills_and_plugins/02_loading_plugins/main.py ✅ PASS 7.6s $0.01

❌ Some tests failed

Total: 38 | Passed: 33 | Failed: 5 | Total Cost: $1.90

Failed examples:

  • examples/01_standalone_sdk/34_critic_example.py: Exit code 1
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/04_convo_with_api_sandboxed_server.py: Exit code 1
  • examples/02_remote_agent_server/08_convo_with_apptainer_sandboxed_server.py: Exit code 1

View full workflow run


openhands-ai bot commented Feb 10, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run Examples Scripts
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1978 at branch `upgrade-python313-with-libtmux-fix`

Feel free to include any additional details that might help me get this PR into a better state.
