Add CoreX BI-V150 compatibility support #393

Open
lxzlxzliuxuzhao wants to merge 2 commits into alibaba:main from lxzlxzliuxuzhao:corex-biv150-adapt

@lxzlxzliuxuzhao

No description provided.

@CLAassistant

CLAassistant commented Mar 20, 2026

CLA assistant check
All committers have signed the CLA.

@lxzlxzliuxuzhao
Author

Summary

This PR adds compatibility support for CoreX BI-V150 environments.

Main changes:

  • add CoreX platform detection
  • support NVML-compatible memory queries through libixml.so
  • improve Ray GPU resource registration on CoreX
  • harden platform initialization when CUDA is available but no visible device is exposed
  • adapt Megatron optimizer integration to vendor-patched signatures
  • add vLLM 0.11.2 compatibility for Ray distributed executor selection
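
For readers unfamiliar with the NVML-compatible path, here is a minimal sketch of how a memory query through libixml.so could look. It is not the code in this PR. It assumes that libixml.so exports the standard NVML C symbols (nvmlInit_v2, nvmlDeviceGetHandleByIndex_v2, nvmlDeviceGetMemoryInfo, nvmlShutdown) with NVML's struct layout; the helper names load_first_available and query_device_memory are hypothetical.

```python
import ctypes
from typing import Optional, Sequence, Tuple

NVML_SUCCESS = 0  # NVML return code for success


class NvmlMemory(ctypes.Structure):
    # Mirrors NVML's nvmlMemory_t: three unsigned 64-bit byte counts.
    _fields_ = [
        ("total", ctypes.c_ulonglong),
        ("free", ctypes.c_ulonglong),
        ("used", ctypes.c_ulonglong),
    ]


def load_first_available(names: Sequence[str]) -> Optional[ctypes.CDLL]:
    """Return the first shared library that loads, or None if none do."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def query_device_memory(
    index: int = 0,
    lib_names: Sequence[str] = ("libixml.so", "libnvidia-ml.so.1"),
) -> Optional[Tuple[int, int]]:
    """Return (free_bytes, total_bytes) for a device, or None on any failure.

    Assumption: the loaded library is symbol-compatible with NVML.
    """
    lib = load_first_available(lib_names)
    if lib is None:
        return None
    try:
        if lib.nvmlInit_v2() != NVML_SUCCESS:
            return None
    except AttributeError:
        # The library loaded but does not export the NVML entry point.
        return None
    try:
        handle = ctypes.c_void_p()
        rc = lib.nvmlDeviceGetHandleByIndex_v2(
            ctypes.c_uint(index), ctypes.byref(handle)
        )
        if rc != NVML_SUCCESS:
            return None
        mem = NvmlMemory()
        rc = lib.nvmlDeviceGetMemoryInfo(handle, ctypes.byref(mem))
        if rc != NVML_SUCCESS:
            return None
        return int(mem.free), int(mem.total)
    except AttributeError:
        return None
    finally:
        # Balance the successful init; tolerate a missing symbol.
        shutdown = getattr(lib, "nvmlShutdown", None)
        if shutdown is not None:
            shutdown()
```

Calling query_device_memory(0) returns None on a machine with neither library installed, so a caller can fall back to another memory source instead of failing at import time.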

Validation

Passed:

  • pytest -q tests/platforms/test_platform_init.py
  • pytest -q tests/platforms/test_platform_memory.py
  • pytest -q tests/third_party/megatron/test_optimizer_compat.py
  • pytest -q tests/third_party/vllm/test_versioning.py

Total:

  • 13 passed, 17 warnings

Additionally smoke-tested on a CoreX BI-V150 machine with vendor-patched Torch / Megatron / vLLM builds.
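
To illustrate what adapting to vendor-patched signatures can mean in practice, here is a minimal sketch of one common approach: inspect the callee and pass only the keyword arguments it accepts. It is not necessarily what this PR does. The two build_optimizer_* functions and the use_gloo_process_groups parameter are hypothetical stand-ins, not real Megatron APIs.

```python
import inspect
from typing import Any, Callable

_KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def call_with_supported_kwargs(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all accepts everything, so pass all keywords through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(*args, **kwargs)
    accepted = {
        name: value
        for name, value in kwargs.items()
        if name in params and params[name].kind in _KEYWORD_KINDS
    }
    return func(*args, **accepted)


# Hypothetical stand-ins for an upstream and a vendor-patched entry point.
def build_optimizer_upstream(config, model_chunks, use_gloo_process_groups=True):
    return ("upstream", config, use_gloo_process_groups)


def build_optimizer_vendor(config, model_chunks):
    return ("vendor", config)
```

With this helper, the same call site works against both stand-ins: the extra keyword reaches build_optimizer_upstream and is dropped for build_optimizer_vendor. Silently dropping arguments can hide real misconfiguration, so logging the dropped names is probably worth considering.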

Notes

Known limitation:

  • colocated RL with vLLM sleep/offload is not yet a supported path on the current CoreX software stack.

2 participants