Linux support: build, runtime, and importer fixes (177/177 tests passing)#46
Conversation
fc94bb1 to
d546e22
Compare
|
Awesome! Thanks for this. Pulled it locally (and set up a linux test project) and it works end-to-end: 177/177 tests passing, editor launches cleanly without Could you rebase onto main before a final review? There's concrete overlap with main since the branch diverged (we cleaned up some of the builds prior:
Easier to review on top of current main than to guess at the merge resolution (everything looks great already though). A few things came up while testing - some potential changes (happy to discuss):
Let me know your thoughts, and thanks again! |
Six small Linux-only changes that unblock building the plugin against UE 5.7's Linux toolchain (clang 20 + libc++) and let it locate MuJoCo / CoACD / libzmq at runtime. - URLab.Build.cs: drop _WIN32=0 / __linux__=1 defines on Linux. With _WIN32 defined (even as 0), MuJoCo's mjexport.h takes the __declspec(dllimport) branch, which clang on Linux can't parse. __linux__ / __unix__ are already defined by the compiler. - URLab.h: gate the _WIN32 redefine block on PLATFORM_WINDOWS so it no longer fires on Linux. Same root cause as Build.cs above. - MjCamera.h: replace tcp://*:5558 with tcp://0.0.0.0:5558 in a /** doc comment **/ — the literal `://*` triggers -Wcomment which is -Werror in UE's build. - CoacdInterface.h: case-correct include "Coacd/coacd.h" -> "CoACD/coacd.h" to match the installed header path. Linux fs is case-sensitive. - URLab.cpp + MjPhysicsEngine.cpp: branch the third-party shared-lib names on PLATFORM_WINDOWS / PLATFORM_LINUX so dlopen can find them by their actual SONAMEs (libmujoco.so.3.7.0, lib_coacd.so, libzmq.so.5) and the per-package install/<pkg>/lib subdirectory. With these patches the URLab.* automation suite on Linux runs to 172/177 passing. The 5 remaining failures cluster on a separate MJCF default-class inheritance issue tracked in URLab-Sim#45. Refs: URLab-Sim#45
Mirrors build_and_test.sh's summary-block format but uses the Linux UnrealBuildTool / UnrealEditor-Cmd binaries, builds the host project's Editor target on the Linux platform, and exports LD_LIBRARY_PATH at the third-party install dirs (RuntimeDependencies aren't staged on Linux editor builds, so the plugin's libmujoco / lib_coacd / libzmq aren't otherwise dlopen-able). Note: the existing build_and_test.sh hardcodes Win64 binary names and a Win64 build platform, so the Linux invocation in CONTRIBUTING.md would currently fail. Whether this should be a parallel _linux.sh or a unified platform-detecting script is an open question; see URLab-Sim#45. Refs: URLab-Sim#45
URLab stored TCHAR_TO_UTF8 macro output in a long-lived `const char*`,
which dangles immediately because the macro expands to a pointer into
a temporary FTCHARToUTF8 whose lifetime ends at the end of the full
expression.
On Windows/MSVC the stack memory often still held the original bytes
when the pointer was used a few lines later, so the bug went
unnoticed. On Linux/clang the stack is reclaimed aggressively and the
pointer dereferences garbage. Two consequences observed on Linux:
- FMujocoSpecWrapper::AddDefault registered each <default> class
under a corrupted name, so subsequent mjs_findDefault lookups by
the joint/geom/actuator import paths failed silently and MuJoCo
fell back to its built-in defaults. This caused all five
Linux-only test failures from the previous run:
URLab.Import.DefaultClassJointAxis
URLab.Import.DefaultFromTo
URLab.Import.RoundTrip_Defaults
URLab.Muscle.Arm26_ActuatorParams
URLab.Muscle.Arm26_Counts
After this fix all 177 URLab.* tests pass on Linux.
- UMjTendon::RegisterToSpec did the same thing with the side-site
string for <tendon><geom .../></tendon> wrapping. No failing test
caught it but the pattern is identical so it's fixed here too.
Both call sites now construct an explicit FTCHARToUTF8 with
function-scope lifetime so the converted bytes outlive the
mjs_addDefault / mjs_wrapGeom call.
Refs: URLab-Sim#45
…RPATH
UBT's auto-computed RPATH for the URLab plugin .so on Linux resolves
incorrectly when the plugin lives outside the host project tree (e.g.
symlinked in from a separate clone). The relative path it emits is
${ORIGIN}/../../../../../UnrealEngine/../URLabTest/Plugins/.../lib
which resolves to a non-existent /home/<user>/URLabTest/... rather than
the plugin's real on-disk location. Result: the loader can't find
libmujoco / lib_coacd / libzmq at editor startup, so the plugin module
fails to load without LD_LIBRARY_PATH.
Side-step the relative-path issue by symlinking third-party shared libs
into the plugin's own Binaries/Linux/ directory after build. UBT already
adds ${ORIGIN} to the plugin .so's RPATH (correctly, this time), so the
loader finds the libs in the same dir as the plugin .so. No
LD_LIBRARY_PATH needed at editor or packaging time.
Changes:
- URLab.Build.cs (Linux branch): drop the broken
PublicRuntimeLibraryPaths.Add(<absolute path>) entries (they end up
generating bad RPATH); use RuntimeDependencies with $(BinaryOutputDir)
target so packaging stages the libs alongside the plugin .so. Glob
*.so* (not just *.so) so SONAME-versioned real files like
libmujoco.so.3.7.0 also get staged.
- Scripts/setup_runtime_linux.sh: new helper that symlinks
third_party/install/<pkg>/lib/*.so* into Binaries/Linux/ for editor
builds (where RuntimeDependencies aren't auto-staged on Linux).
Idempotent.
- Scripts/build_and_test_linux.sh: invoke setup_runtime_linux.sh after
the plugin builds, drop the LD_LIBRARY_PATH fallback (no longer
needed).
After: editor launches and the URLab.* automation suite runs to
177/177 passing with LD_LIBRARY_PATH unset.
Refs: URLab-Sim#45
… libzmq.a Stop directly editing the CoACD submodule's public/coacd.h. Instead overlay a fixed copy via the existing CoACD_custom/ pattern that URLab already uses for CMakeLists.txt + cmake/. The fixed copy relaxes `#if _WIN32` to `#if defined(_WIN32)` so consumers compiled with `-Wundef -Werror` (UE on Linux) accept the header without warnings becoming errors. Both build.sh and build.ps1 grow a parallel block to copy CoACD_custom/public/* onto src/public/. Also disable libzmq's static archive on Linux (BUILD_STATIC=OFF). The static archive's mailbox_safe.cpp pulls libc++ condition_variable_any::wait_until -> pthread_cond_clockwait via inlining, which UE's link sysroot can't resolve. The .so version links its own runtime deps internally and works fine, so the static lib is unused and only causes the URLab plugin link to fail (undefined symbol pthread_cond_clockwait). Restricted to Linux via a case on uname so the Windows path is unchanged. After both changes: 177/177 URLab.* automation tests still pass on Linux from a clean third-party rebuild. Refs: URLab-Sim#45
Replace the "Linux is experimental" line in the README requirements with a pointer to docs/linux_setup.md, mention the Linux build script alongside the Windows one in the install section, and add the new doc to the mkdocs nav under Guides. The new docs/linux_setup.md captures: - Prerequisites: UE 5.7 Linux binary, CMake 3.24+, host UE5 C++ project with the plugin in Plugins/. - Why a special build flow: UE's bundled clang + libc++ require ABI-compatible third-party libs, so the system gcc + libstdc++ default of build_all.sh produces unlinkable binaries. - The full env-var sandwich (CC/CXX/AR/RANLIB/CFLAGS/CXXFLAGS/LDFLAGS) pointing the third-party CMake builds at UE's clang. - The role of Scripts/setup_runtime_linux.sh in staging the third- party .so files into Binaries/Linux/ so $ORIGIN-RPATH resolves them without LD_LIBRARY_PATH. - How to run the automation suite via Scripts/build_and_test_linux.sh, including the editor-must-be-closed gotcha. - A short list of known caveats (plugin must be inside the host project's Plugins/, first-launch shader compile time, etc.). Refs: URLab-Sim#45
Accept --engine <UE_ROOT> on third_party/build_all.sh. When given on
Linux, glob Engine/Extras/ThirdPartyNotUE/SDKs/HostLinux/Linux_x64/
v*_clang-*/x86_64-unknown-linux-gnu so the script survives UE version
bumps (v26 -> v27) without a script edit, then export
CC / CXX / AR / RANLIB and the matching CFLAGS / CXXFLAGS / LDFLAGS
internally so child build.sh invocations inherit them.
Replaces the explicit env-var sandwich docs/linux_setup.md previously
asked the user to type. The new flow is a single command:
./third_party/build_all.sh --engine $UE_ROOT
Without --engine the script falls through to the host toolchain
(unchanged behaviour for Windows / macOS, and for any Linux user who
deliberately wants the system gcc + libstdc++).
Why this matters: with the wrong toolchain, system gcc + libstdc++
either produces .so files whose C++ ABI doesn't match UE's plugin
link (wall of std::* undefined-symbol errors), or — worse, with some
combinations — emits LTO bitcode objects that masquerade as .so but
fail to load with a cryptic "file too short" at editor startup.
Both are real failure modes that cost real debugging time.
Each MuJoCo / CoACD / libzmq build.sh now invokes Scripts/setup_runtime_linux.sh after install on Linux. This catches the partial-rebuild case: bumping a single submodule SHA and running only that dep's build.sh (skipping build_all.sh and build_and_test_linux.sh) would otherwise leave Binaries/Linux/ holding stale symlinks pointing into the wiped install/<dep>/lib/ — the editor fails to load with no obvious cause. setup_runtime_linux.sh now warn-and-skips with exit 0 when Binaries/Linux/ doesn't exist yet, so a first-time fresh checkout (third-party built before the plugin .so) doesn't error out. The next plugin build creates Binaries/Linux/, the next setup_runtime_linux.sh call (from any of the build paths) populates it. Linux-only block — guarded with `if [ "$(uname -s)" = "Linux" ]` — so Windows / macOS paths are unchanged.
linux_setup.md previously assumed a working $UE_ROOT. Add a short "Unreal Engine 5" subsection at the top of Prerequisites covering: - where to get the precompiled Linux binary (epicgames.com/linux) - the .zip flow (~25 GB compressed, ~43 GB extracted) - disk-space rule of thumb (~70 GB free for initial setup including shader / DDC cache) - display options for headless servers (NICE DCV / X2Go / VNC) - common apt prereqs (libsdl2, libvulkan1) for fresh Ubuntu 22.04 Mentions the source-build path exists but is not required (most users land on this page from a fresh distro and don't realise the binary distribution exists).
d546e22 to
0993efe
Compare
|
Thanks for the review. Pushed updates addressing all four:
Also folded in your follow-up flag (#5): Build + test summary, post-rebase + post-changes: Ready for another look. |
jonathanembleyriches
left a comment
There was a problem hiding this comment.
Thanks for turning these around so quickly!
One question before merge: have you packaged a host project on Linux (RunUAT.sh BuildCookRun ... -package -stage) and then launched the resulting binary to confirm MuJoCo loads and runs correctly? The Build.cs declarations look right (RuntimeDependencies.Add("$(BinaryOutputDir)/...", LibFile, NonUFS) for the lib branch), but worth a runtime smoke test rather than just a successful package step.
I've left two inline fix suggestions and a few nitpicks.
Thanks for your contribution!
Per review feedback: Linux no longer adds the third-party .so to
PublicDelayLoadDLLs (RPATH staging supersedes the delay-load
workaround), so URLab_InstallMujocoCallbacks doesn't need to
GetDllHandle("libmujoco.so.X.Y.Z") + GetDllExport at all on Linux.
The mju_user_error / mju_user_warning symbols are declared
`MJAPI extern void (*...)(const char*)` in mujoco.h and link
directly through the .so we're already linked against, so direct
assignment is enough.
Drops the hardcoded libmujoco.so.3.7.0 SONAME literal — one fewer
spot to update on the next MuJoCo pin bump.
Windows path is unchanged: still GetDllHandle-based because
mujoco.dll genuinely is delay-loaded there.
Per review feedback: bash globs sort lexicographically, so `for cand in $SDK_BASE/v*_clang-*/...; do ... break; done` picks v25 over v26 when both exist. Latent today since UE ships exactly one Linux toolchain per engine version, but it would silently regress on a future UE bump if a transitional engine ever shipped two. Replace the loop with `ls -d ... | sort -V | tail -1` (version-aware sort, picks the highest) and keep the executability check.
Per review feedback on docs/linux_setup.md:
1. Add a source-build option (UnrealEngine clone + Setup.sh +
GenerateProjectFiles.sh + make UnrealEditor) under Prerequisites.
The precompiled binary is still recommended as the path of least
friction; the source build is for engine-level debugging.
2. Split the doc into clearly-labelled sections:
- One-time setup (the existing 4-step walkthrough)
- Day-to-day workflow (just `git pull` + build_and_test_linux.sh)
- Troubleshooting / Advanced (build flags reference, env-var
sandwich for manual per-dep rebuilds, known caveats).
The Reviewer's read was that the doc currently looked like a
first-setup guide with no signposting for what the loop is after
that. Now there's a header pointer at the top and the day-to-day
section is its own anchor.
3. Move the "Build flags applied internally" reference list out of
step 1 (where most users won't need it) and into Troubleshooting /
Advanced — it's debugging context, not setup material.
No content removed, just reshelved.
|
Thanks for the careful review. All four addressed: Inline fixes1. 2. Doc nitpicks3.
Packaging smoke testPackaged the host project on Linux and confirmed the binary loads MuJoCo at runtime: Build ran clean ( Launched the packaged binary with So the Editor regression checkRe-ran the automation suite after the three fixes, still 177 / 177 on Linux: (Summary is from before the three new commits; will re-run after this push and update if anything regressed, but the fixes are scoped narrowly enough I'd be surprised.) Let me know if there is anything that I missed... I hope not. Thank you for your patience! |
Per the second half of jonathanembleyriches's nitpick on linux_setup.md: setup_runtime_linux.sh is now invoked automatically by every per-dep build.sh and by build_and_test_linux.sh, so users never need to run it manually during normal setup or iteration. Keeping it as Step 3 of the one-time setup walkthrough was misleading. Removed it from the One-time setup section. Added a "How runtime staging works" subsection in Troubleshooting / Advanced that explains what the script does, when it auto-fires, and how to run it manually in the rare case you need to. Renumbered "Launch the editor" from step 4 to step 3. The brief mention is preserved at the end of step 2 for users who want the link to the explanation, but the action is gone.
|
Quick follow-up — re-reading my previous reply, I missed the first half of your line-83 nitpick. I had moved the build-flags reference list to Troubleshooting / Advanced, but I left Step 3 "Stage runtime libs" sitting in the One-time setup walkthrough even though Fixed in
Sorry for the half-fix. |
|
Re-ran the suite at the latest HEAD to make sure the four review-addressing commits didn't regress anything: Still green. |
jonathanembleyriches
left a comment
There was a problem hiding this comment.
Great work. Thanks for the contribution!
LGTM
Summary
URLab.*automation tests passing on Linux from a clean rebuild.TCHAR_TO_UTF8macro output stored in a long-livedconst char*) that was hidden on MSVC by stack-reuse luck and surfaced only on Linux/clang..uassetchanges, no Windows behaviour changes.Motivation
Fixes #45. The README hedged "Linux is experimental"; in practice URLab didn't compile cleanly on Linux against UE's bundled clang and didn't load at runtime even after building. With these patches the editor launches cleanly without
LD_LIBRARY_PATH, the MJCF importer registers, and the full automation suite passes.Linked issue
Fixes #45
Build + test evidence
Run via
./Scripts/build_and_test_linux.sh --engine $UE_ROOT --project /path/to/HostProject.uproject. Editor closed beforehand.LD_LIBRARY_PATHunset (RPATH staging from commitf7b8cb1dmakes it unnecessary).What changed, by commit
fix(linux): make URLab plugin build and dlopen on UE Linux(876048b)Six small build-time blockers:
_WIN32=0define on Linux (forced MuJoCo down the__declspec(dllimport)branch),URLab.h_WIN32redefine fired on every platform, atcp://*:5558doc comment that tripped-Wcomment, aCoacd/coacd.hinclude with the wrong case, andLoadDependencyDLL/GetDllHandlecalls hardcoded tomujoco.dll. All gated onPLATFORM_WINDOWS/PLATFORM_LINUXso the Windows path is unchanged.feat(linux): add Scripts/build_and_test_linux.sh(ea4738c)Linux-platform parallel of
build_and_test.sh. Same summary-block format. UsesBuild.sh+UnrealEditor-Cmd(existing script hardcodes Win64.exepaths, so the Linux invocation in CONTRIBUTING.md would currently fail — see open question 2 below).fix(linux): keep TCHAR_TO_UTF8 conversions alive across mjs_* calls(312b077)Root cause of the 5 Linux test failures from the original 172/177 run:
TCHAR_TO_UTF8(*FString)returns a pointer into a temporaryFTCHARToUTF8whose lifetime ends at the full expression.MjSpecWrapper::AddDefaultandMjTendon::RegisterToSpecstored that pointer in long-livedconst char*variables; on MSVC the freed stack still held the original bytes by coincidence, on clang/Linux the stack was reclaimed andmjs_addDefault/mjs_wrapGeomgot garbage class names. Fix isFTCHARToUTF8with explicit function-scope lifetime — ~4 lines total. Tests jumped from 172/177 to 177/177.feat(linux): stage third-party libs into Binaries/Linux/ for $ORIGIN RPATH(f7b8cb1)UBT's auto-computed RPATH for plugins not living inside the host project tree resolves to invented
/home/<user>/<project>/...paths that don't exist. Sidestep that by symlinkingthird_party/install/<pkg>/lib/*.so*into the plugin's ownBinaries/Linux/directory, where${ORIGIN}(which UBT does add correctly) finds them. New helper:Scripts/setup_runtime_linux.sh(idempotent).build_and_test_linux.shinvokes it after build. After this patch the editor launches withLD_LIBRARY_PATHunset.build(linux): apply CoACD _WIN32 patch via CoACD_custom overlay; drop libzmq.a(6a125d9)Replaces a direct edit of CoACD's submodule header with an overlay file at
third_party/CoACD_custom/public/coacd.h, copied ontosrc/public/by the existing build-script overlay pattern (build.shandbuild.ps1both updated). Also disables libzmq's static archive on Linux (BUILD_STATIC=OFF) — the staticmailbox_safe.cpppulls libc++'spthread_cond_clockwaitwhich UE's link sysroot can't resolve; the.soworks fine and is what the plugin actually uses.docs: add Linux setup guide and drop the experimental hedge(fc94bb1)New
docs/linux_setup.mdwalks through prerequisites, why UE's bundled clang + libc++ require an env-var sandwich for the third-party builds, the runtime staging step, and how to run the automation suite. README updated to point at it; mkdocs nav updated.Open questions / decisions made (with reasoning)
Build-script split: parallel
_linux.sh, not unified. Per your reply on Linux support: build, runtime, and importer issues (172/177 tests passing) #45 ("A parallel is good for now"). I did not collapsebuild_all.shandbuild_and_test.shinto platform-detecting versions — happy to do that as a follow-up if you'd prefer; it's a separate, larger change and would touch the existing Windows scripts. But the existingbuild_and_test.shis currently hardcoded to Win64 binary names and platform tag, so any Linux user following CONTRIBUTING.md verbatim would fail. Worth flagging in case you want me to fixbuild_and_test.shitself in this PR or a follow-up.CoACD overlay, not upstream PR. Per your reply on Linux support: build, runtime, and importer issues (172/177 tests passing) #45. Happy to also send an upstream PR to https://github.com/SarahWeiii/CoACD for the
_WIN32fix if you want — say the word and I'll do it after this PR lands.RPATH approach. Investigated UBT's auto-RPATH and found it produces a path like
${ORIGIN}/../../../../../UnrealEngine/../URLabTest/Plugins/UnrealRoboticsLab/third_party/install/MuJoCo/libthat resolves to/home/<user>/URLabTest/...(with the<user>segment dropped) — broken for any plugin not living inside the host project tree. The symlinks-into-${ORIGIN}approach in commit 4 sidesteps that entirely and works for both editor builds and a packaged plugin (sinceRuntimeDependencieswith$(BinaryOutputDir)target also stages the libs there at packaging time). If you'd prefer a different convention I'll switch.urlab_bridgenot tested on Linux yet. Out of scope for this PR (separate repo). Mentioned as a caveat indocs/linux_setup.md.Manual verification steps
linux-supportbranch into your project'sPlugins/(or symlink in).git submodule update --init --recursive.docs/linux_setup.md(UE's clang +-stdlib=libc++etc.).Engine/Build/BatchFiles/Linux/Build.sh <YourProject>Editor Linux Development -Project=..../Scripts/setup_runtime_linux.sh(or useScripts/build_and_test_linux.shwhich calls it for you).UnrealEditoragainst your project —URLabplugin should load with noLogURLab: Error: ... not foundlines.third_party/MuJoCo/src/model/humanoid/humanoid.xml) into Content Browser → MJCF importer should fire.Checklist
URLab.*automation suite passes (177/177)README.md, newdocs/linux_setup.md,mkdocs.yml)