Run the VS2013 (MSVC 18.00) cl.exe/ml64.exe toolchain under wibo, byte-identical to native #127
Open
jeffmcjunkin wants to merge 9 commits into
FlsAlloc/FlsGetValue/FlsSetValue stored values in a single process-global
array, so every thread shared one cell per FLS index. Fiber-local storage
without fibers is thread-local storage on Windows: each thread is exactly
one fiber, and FlsGetValue must return the value the calling thread set.
The shared cell breaks msvcr120.dll (VS2013 CRT) thread creation. The CRT
stores its per-thread data block (_ptd) -- which carries the
_beginthreadex entry point and argument -- via FlsSetValue, and the new
thread's _threadstartex/_callthreadstartex re-read it via FlsGetValue.
With a process-global cell, two concurrently-starting threads overwrite
each other's _ptd and can both start with the same argument.
Observed with VS2013 cl.exe: c2.dll's parallel codegen pool creates four
worker threads back-to-back; ~1-2% of compiles deadlocked forever. An API
trace of a hung process shows two workers waiting on the same per-worker
dispatch event while another worker's event has a signal and no waiter:
    CreateEvent h=7c / 88 / 94 / a0          (per-worker "go" events)
    t251103 WaitForSingleObject h=88         <- worker 1
    t251102 WaitForSingleObject h=88         <- different thread, SAME event
    boss    SetEvent 7c, 88, 94, a0          <- 94 never gets a waiter
    boss    WaitForMultipleObjects({done events}, bWaitAll, INFINITE)
            -> hangs forever: the orphaned worker never runs its work item,
               so its "done" event is never set.
Fix: keep the index allocation map process-wide (FLS indices are
process-wide on Windows) but store the values in a thread_local array;
wibo maps guest threads 1:1 onto host threads, so thread_local is exactly
per-guest-thread. New threads observe zero-initialized values, matching
Windows. Index alloc/free is guarded by a mutex.
Caveat (unchanged behavior): FLS destructor callbacks are still never
invoked, and freeing then reallocating an index does not clear other
threads' stale values for it. The VS2013 CRT allocates its index once per
process, so neither is reachable for it.
With this fix the cl.exe hang rate dropped from ~1-2% to ~0.1-0.3% over
1000-run stress batches; the remainder was a separate CRITICAL_SECTION
issue fixed in the following commit.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
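The split described in the FLS commit above (a process-wide index map, per-thread value cells) can be sketched roughly as follows. This is a minimal illustration, not wibo's code: the `fls_sketch` namespace, the function names, and the 128-slot capacity are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

namespace fls_sketch {

constexpr std::size_t kMaxSlots = 128;                 // assumed capacity, not wibo's real limit
constexpr std::uint32_t kOutOfIndexes = 0xFFFFFFFFu;   // same value as FLS_OUT_OF_INDEXES

std::mutex g_indexMutex;                               // guards the allocation map only
std::array<bool, kMaxSlots> g_inUse{};                 // process-wide: which indices are taken
thread_local std::array<void *, kMaxSlots> t_values{}; // per-thread cells, zero-initialized

// Allocate a process-wide index; every thread sees the same index space.
std::uint32_t allocIndex() {
    std::lock_guard<std::mutex> lock(g_indexMutex);
    for (std::size_t i = 0; i < kMaxSlots; ++i) {
        if (!g_inUse[i]) {
            g_inUse[i] = true;
            return static_cast<std::uint32_t>(i);
        }
    }
    return kOutOfIndexes;
}

// Free an index. Only the calling thread's cell is cleared; other threads
// keep their stale values, which is the caveat the commit message mentions.
bool freeIndex(std::uint32_t index) {
    std::lock_guard<std::mutex> lock(g_indexMutex);
    if (index >= kMaxSlots || !g_inUse[index]) return false;
    g_inUse[index] = false;
    t_values[index] = nullptr;
    return true;
}

// Read the calling thread's cell; a thread that never stored a value sees nullptr.
void *getValue(std::uint32_t index) {
    return index < kMaxSlots ? t_values[index] : nullptr;
}

// Write the calling thread's cell; no other thread can observe or overwrite it.
bool setValue(std::uint32_t index, void *value) {
    if (index >= kMaxSlots) return false;
    t_values[index] = value;
    return true;
}

} // namespace fls_sketch
```

With this layout, a value stored by one thread under a given index is invisible to a second thread reading the same index, so two concurrently starting threads can no longer overwrite each other's per-thread data block.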
Two divergences from the real Windows state machine, both observed
breaking VS2013 cl.exe (c2.dll multithreaded codegen):
1. Contended EnterCriticalSection waited until OwningThread was observed
   to be 0 and then claimed the section with a plain store. Two waiters
   can both observe 0 after a single Leave and both enter the critical
   section simultaneously. The fingerprint -- a free section left with
   LockCount == 0 instead of -1, created when the second "owner"'s Leave
   failed the ownership check and returned without decrementing -- was
   captured with gdb in hung c2.dll worker pools.
2. LeaveCriticalSection bailed out when the calling thread did not match
   OwningThread. Real Windows Leave performs no caller validation at all:
   it unconditionally decrements RecursionCount/LockCount and releases
   exactly one waiter; mutual exclusion is carried entirely by the
   LockCount/semaphore state machine. Guest lock usage that Windows
   tolerates therefore silently strands wibo's lock state instead.
   Instrumented runs caught exactly this on c2.dll's work-queue section:
       leave-not-owner cs=<work queue CS> tid=B owner=A lock=1 rec=1
   after which the queue's "active worker" counter and the LockCount were
   each stranded one too high, the queue's drain condition became
   unsatisfiable, and the compiler deadlocked (~0.1-0.3% of compiles even
   after the FLS fix).
Fix, mirroring the Windows protocol:
- Model LockSemaphore as a ticket count: every contended Leave posts
  exactly one ticket (InterlockedIncrement + WakeByAddressSingle); every
  blocked Enter consumes exactly one ticket (CAS decrement, WaitOnAddress
  otherwise) and then owns the section by construction. There is no claim
  race on OwningThread. TryEnterCriticalSection cannot steal while
  waiters exist because each waiter's LockCount increment persists until
  it owns and leaves.
- Leave validates nothing, like Windows: if --RecursionCount != 0,
  decrement LockCount; otherwise clear the owner, decrement, and release
  one waiter if the result is >= 0.
Measured with VS2013 cl.exe under stress (18KB C unit, 25s timeout =
hang): ~1-2% hangs before, 0 hangs in 5,100 runs with this commit plus
the FLS fix (2x1000 + 2000 at 24-way parallelism, 300 sequential, 800 on
two other source units). Compiler output stays byte-identical to native
Windows. Side effect: heavily contended compiles got ~6x faster (1000
parallel compiles: 26s -> 4s wall) because the old wait loop woke every
waiter to race on each release.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
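The ticket protocol from the CRITICAL_SECTION commit above can be sketched as below. This is an illustration under stated assumptions, not wibo's implementation: the commit uses InterlockedIncrement/WakeByAddressSingle and a CAS decrement with WaitOnAddress, whereas this sketch swaps in a mutex and condition variable for the ticket count so that it builds with plain standard C++. The `cs_sketch` namespace and all names in it are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

namespace cs_sketch {

// Counting "ticket" semaphore standing in for LockSemaphore.
class Tickets {
public:
    void post() {                       // called by a contended Leave: exactly one ticket
        { std::lock_guard<std::mutex> g(m_); ++count_; }
        cv_.notify_one();
    }
    void take() {                       // called by a blocked Enter: consumes exactly one ticket
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [this] { return count_ > 0; });
        --count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    long count_ = 0;
};

struct CriticalSection {
    std::atomic<long> lockCount{-1};        // -1 means free, as on Windows
    long recursionCount = 0;                // only touched by the current owner
    std::atomic<std::uintptr_t> owner{0};   // 0 means unowned
    Tickets tickets;
};

// Per-thread identity: the address of a thread_local object is unique and non-zero.
inline std::uintptr_t selfId() {
    thread_local char marker;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

void enter(CriticalSection &cs) {
    const std::uintptr_t self = selfId();
    if (cs.lockCount.fetch_add(1) + 1 == 0) {
        // Uncontended: the increment itself claimed the section.
    } else if (cs.owner.load() == self) {
        ++cs.recursionCount;                // recursive entry by the owner
        return;
    } else {
        cs.tickets.take();                  // holding a ticket means owning the section
    }
    cs.owner.store(self);
    cs.recursionCount = 1;
}

bool tryEnter(CriticalSection &cs) {
    const std::uintptr_t self = selfId();
    long expected = -1;
    if (cs.lockCount.compare_exchange_strong(expected, 0)) {
        cs.owner.store(self);
        cs.recursionCount = 1;
        return true;
    }
    if (cs.owner.load() == self) {          // recursive try by the owner
        cs.lockCount.fetch_add(1);
        ++cs.recursionCount;
        return true;
    }
    return false;                           // cannot steal: waiters keep lockCount >= 0
}

void leave(CriticalSection &cs) {
    // No caller validation, mirroring the behavior described for Windows.
    if (--cs.recursionCount != 0) {
        cs.lockCount.fetch_sub(1);
        return;
    }
    cs.owner.store(0);
    if (cs.lockCount.fetch_sub(1) - 1 >= 0) {
        cs.tickets.post();                  // a waiter exists: hand over exactly one ticket
    }
}

} // namespace cs_sketch
```

Because a waiter proceeds only after consuming a ticket, and each contended Leave posts exactly one, two waiters can never both claim the section after a single release, which is the race the old plain store on OwningThread allowed.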
Resource-only DLLs (e.g. MSVC's clui.dll) are linked /NOENTRY with their relocations stripped. When their preferred image base is already occupied they must be mapped elsewhere, at which point loadPE bailed with "relocation required but no relocation directory present". Such images have no executable section and address their resources via the actual mapped base, so it is safe to continue without applying relocations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
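The loader decision described in the commit above can be sketched as a small pure function. The names here (`reloc_sketch`, `ImageFacts`, `Action`, `decide`) are hypothetical, and using "no executable section" as the test for a resource-only image is an assumption taken from the commit text, not a statement of what loadPE actually checks.

```cpp
#include <cassert>

namespace reloc_sketch {

enum class Action { MapAsIs, ApplyRelocations, SkipRelocations, Fail };

// Facts a loader would gather from the PE headers and the mapping attempt.
struct ImageFacts {
    bool atPreferredBase;        // image was mapped at its preferred base
    bool hasRelocationDirectory; // base relocation directory is present
    bool hasExecutableSection;   // at least one section contains code
};

Action decide(const ImageFacts &f) {
    if (f.atPreferredBase) return Action::MapAsIs;                 // nothing to fix up
    if (f.hasRelocationDirectory) return Action::ApplyRelocations; // normal rebase
    // Rebased without relocations: tolerable only when there is no code whose
    // absolute addresses would need patching (the resource-only case).
    if (!f.hasExecutableSection) return Action::SkipRelocations;
    return Action::Fail; // "relocation required but no relocation directory present"
}

} // namespace reloc_sketch
```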
Backed by wibo's module registry. The GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS form (moduleInfoFromAddress) is how the MSVC CRT/compiler finds its own module to build the localized-resource (1033\clui.dll) path.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds InterlockedPushEntrySList / InterlockedPopEntrySList / InterlockedFlushSList / QueryDepthSList (wibo already had SLIST_HEADER and InitializeSListHead), serialized with a mutex.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
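A mutex-serialized singly linked list with the same return-value conventions as the four SList calls named above might look like the following. This is only a sketch: `Entry` and `Header` are simplified stand-ins for SLIST_ENTRY and SLIST_HEADER, and the single global mutex is an assumption about how "serialized with a mutex" could be done, not a description of wibo's code.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

namespace slist_sketch {

struct Entry { Entry *next = nullptr; };                              // stand-in for SLIST_ENTRY
struct Header { Entry *first = nullptr; std::uint16_t depth = 0; };   // stand-in for SLIST_HEADER

std::mutex g_slistMutex; // one lock for all lists: simple, not lock-free like Windows

// Like InterlockedPushEntrySList: returns the previous first entry (nullptr if empty).
Entry *push(Header &h, Entry *e) {
    std::lock_guard<std::mutex> lock(g_slistMutex);
    Entry *previous = h.first;
    e->next = previous;
    h.first = e;
    ++h.depth;
    return previous;
}

// Like InterlockedPopEntrySList: returns the removed entry, or nullptr if empty.
Entry *pop(Header &h) {
    std::lock_guard<std::mutex> lock(g_slistMutex);
    Entry *e = h.first;
    if (e == nullptr) return nullptr;
    h.first = e->next;
    --h.depth;
    return e;
}

// Like InterlockedFlushSList: detaches and returns the whole chain.
Entry *flush(Header &h) {
    std::lock_guard<std::mutex> lock(g_slistMutex);
    Entry *all = h.first;
    h.first = nullptr;
    h.depth = 0;
    return all;
}

// Like QueryDepthSList: number of entries currently on the list.
std::uint16_t queryDepth(Header &h) {
    std::lock_guard<std::mutex> lock(g_slistMutex);
    return h.depth;
}

} // namespace slist_sketch
```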
…tubs
LCMapStringEx/CompareStringEx forward to LCMapStringW/CompareStringW. A small compat TU adds conservative stubs the MSVC CRT/compiler probe at startup (GetEnabledXStateFeatures, Get/SetThreadPreferredUILanguages, IsValidLocaleName, InitializeSRWLock, WaitForSingleObjectEx/MultipleEx).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…icodeString
The MSVC frontend opens source/output files through the NT file API. NtCreateFile maps OBJECT_ATTRIBUTES onto kernel32 CreateFileW (with FILE_FLAG_BACKUP_SEMANTICS for FILE_DIRECTORY_FILE); Rtl*UnicodeString use the process heap; NtQueryDirectoryFile enumerates via std::filesystem.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
General threading fix (not MSVC-specific). Two defects left __declspec(thread) data broken on guest-created threads (NULL ThreadLocalStoragePointer), crashing MSVC c2.dll's parallel-codegen workers:
1. initializeTib() set up the module-TLS array via ensureModuleArrayCapacityLocked(g_moduleArrayCapacity), which early-returns when required <= current capacity, so a thread created after the TLS-bearing DLLs loaded got no array. Allocate it directly for the new TIB.
2. notifyDllThreadAttach() only allocated static TLS for modules passing shouldDeliverThreadNotifications(); a DLL that calls DisableThreadLibraryCalls (c2.dll) is excluded, yet Windows still allocates its static TLS. Allocate static TLS for every hasTls module.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
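The intended post-fix behavior from the TLS commit above can be modeled in a few lines. Everything here is hypothetical (`tls_sketch`, `Module`, `ThreadState`, `attachThread`) and deliberately simplified: the point is only that the per-thread array is sized for the new thread itself, and that static TLS allocation is decoupled from whether the module receives thread-attach notifications.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace tls_sketch {

struct Module {
    bool hasTls;                            // module carries a TLS directory
    bool deliversThreadNotifications;       // false after DisableThreadLibraryCalls
    std::size_t tlsIndex;                   // slot in the per-thread array
    std::vector<std::uint8_t> tlsTemplate;  // initial contents of the static TLS block
};

struct ThreadState {
    std::vector<std::vector<std::uint8_t>> tlsArray; // stands in for ThreadLocalStoragePointer
    int attachCallbacks = 0;                         // DLL_THREAD_ATTACH calls that would be made
};

ThreadState attachThread(const std::vector<Module> &modules) {
    ThreadState t;

    // Fix 1: size the array for this thread directly, instead of comparing
    // against a process-wide capacity that is already large enough.
    std::size_t needed = 0;
    for (const Module &m : modules) {
        if (m.hasTls && m.tlsIndex + 1 > needed) needed = m.tlsIndex + 1;
    }
    t.tlsArray.resize(needed);

    for (const Module &m : modules) {
        // Fix 2: every TLS-bearing module gets its block, notifications or not.
        if (m.hasTls) t.tlsArray[m.tlsIndex] = m.tlsTemplate;
        // Only the callback itself depends on the notification setting.
        if (m.deliversThreadNotifications) ++t.attachCallbacks;
    }
    return t;
}

} // namespace tls_sketch
```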
…n path
VS2013 c1.dll selects its source-file resolution strategy from the OS version reported by RtlGetVersion. 6.2 (Windows 8) steers it into a directory-canonicalization path that dead-ends on wibo's Z:-mapped volumes (C1083 without ever opening the source); 6.1 makes it use the direct CreateFileW(source) path. Windows 7 (6.1, build 7601) was VS2013's contemporary host.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
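As a rough sketch of what reporting Windows 7 through an RtlGetVersion-style call amounts to, assuming a struct that mirrors the RTL_OSVERSIONINFOW field order: the `version_sketch` names are hypothetical, the service-pack string is left empty because the commit does not say what wibo reports there, and the platform id is the standard VER_PLATFORM_WIN32_NT value rather than anything taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

namespace version_sketch {

// Field order mirrors RTL_OSVERSIONINFOW; char16_t stands in for WCHAR.
struct OsVersionInfo {
    std::uint32_t dwOSVersionInfoSize;
    std::uint32_t dwMajorVersion;
    std::uint32_t dwMinorVersion;
    std::uint32_t dwBuildNumber;
    std::uint32_t dwPlatformId;
    char16_t szCSDVersion[128];
};

constexpr std::uint32_t kPlatformWin32Nt = 2; // VER_PLATFORM_WIN32_NT

// Fills in the Windows 7 identity (6.1, build 7601) named in the commit.
// Returns 0, the numeric value of STATUS_SUCCESS.
std::int32_t getVersion(OsVersionInfo &info) {
    info.dwMajorVersion = 6;
    info.dwMinorVersion = 1;      // 6.1 selects c1.dll's direct CreateFileW path
    info.dwBuildNumber = 7601;
    info.dwPlatformId = kPlatformWin32Nt;
    info.szCSDVersion[0] = u'\0'; // left empty here; not specified by the commit
    return 0;
}

} // namespace version_sketch
```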
Depends on #126; please merge that first. This branch is stacked on fix-fls-per-thread-and-cs-handoff, so its first two commits (per-thread FLS; Windows-faithful CRITICAL_SECTION) belong to #126 and will drop out of this diff once #126 lands and I rebase onto main. Review the commits above those two.

What this does

Makes wibo run the VS2013 Update 5 (MSVC 18.00.40629) 32-bit-host cross toolchain, VC/bin/x86_amd64/cl.exe and ml64.exe (both PE32 / i386), well enough to compile C/C++ and assemble MASM into COFF objects that are byte-identical to native Windows output. wibo loads the real msvcr120.dll/msvcp120.dll from disk; this PR fixes the wibo-side gaps underneath it (kernel32 / ntdll / loader / TLS).

This is unrelated to the older msvc-wip/msvc-broken branches (which implemented the builtin legacy msvcrt.dll). VS2013 imports msvcr120, so this is a different and complete approach.

Commits (above the #126 base)

- … clui.dll)
- GetModuleHandleExA/W (the FROM_ADDRESS form locates the compiler's own module to build the 1033\clui.dll path)
- … InterlockedPush/Pop/FlushEntrySList, QueryDepthSList)
- LCMapStringEx/CompareStringEx + a small compat TU of CRT/locale/SRW/wait stubs
- NtCreateFile/NtClose/NtQueryDirectoryFile + Rtl{Create,Free}UnicodeString
- … c1.dll takes its direct CreateFileW source-open path

Validation

- … .text (incl. all /Gy COMDATs), .xdata, .pdata, .rdata, .drectve and relocations are byte-identical to native x86_amd64 cl.exe; only .debug$S (embedded absolute paths) and the COFF TimeDateStamp differ.
- … .asm byte-identically too.
- … release-clang wibo); Wine 9.0 (old-WoW64) is the byte-identical baseline this matches.

Out of scope (not needed for compile/assemble → obj)

link.exe/lib.exe/rc.exe, PDB generation (/Zi → the mspdbsrv.exe IPC server), /MP, and the 64-bit-host (PE32+) tools (wibo is a 32-bit loader; the x86_amd64 32-bit-host tools emit x64 objects and are sufficient). cl.exe + ml64.exe cover the matching-decompilation workflow this targets.
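
The Validation comparison above (everything byte-identical except .debug$S and the COFF TimeDateStamp) could be checked with something like the sketch below. It is not the tooling used for this PR: the `coff_sketch` names are hypothetical, it compares only the machine type, section names, raw section data and raw relocation records, and it ignores the symbol table, long section names, and the relocation-count overflow flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

namespace coff_sketch {

using Bytes = std::vector<std::uint8_t>;

struct Section {
    std::string name;   // raw 8-byte section name, NUL padding removed
    Bytes data;         // raw section contents (empty for uninitialized data)
    Bytes relocations;  // raw 10-byte relocation records
};

// Little-endian read of `width` bytes; callers guarantee the range is in bounds.
static std::uint32_t readLe(const Bytes &b, std::size_t off, std::size_t width) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < width; ++i) v |= std::uint32_t(b[off + i]) << (8 * i);
    return v;
}

// Bounds-checked copy of b[off, off + len) into out.
static bool slice(const Bytes &b, std::size_t off, std::size_t len, Bytes &out) {
    if (off > b.size() || len > b.size() - off) return false;
    out.assign(b.begin() + off, b.begin() + off + len);
    return true;
}

bool parseSections(const Bytes &obj, std::vector<Section> &out) {
    constexpr std::size_t kFileHeader = 20, kSectionHeader = 40, kRelocation = 10;
    if (obj.size() < kFileHeader) return false;
    const std::size_t count = readLe(obj, 2, 2);                 // NumberOfSections
    const std::size_t table = kFileHeader + readLe(obj, 16, 2);  // skip SizeOfOptionalHeader
    if (table > obj.size() || count * kSectionHeader > obj.size() - table) return false;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t sh = table + i * kSectionHeader;
        char raw[9] = {};
        std::memcpy(raw, &obj[sh], 8);
        Section s;
        s.name = raw;
        const std::size_t rawSize = readLe(obj, sh + 16, 4);     // SizeOfRawData
        const std::size_t rawPtr = readLe(obj, sh + 20, 4);      // PointerToRawData
        const std::size_t relocPtr = readLe(obj, sh + 24, 4);    // PointerToRelocations
        const std::size_t relocCount = readLe(obj, sh + 32, 2);  // NumberOfRelocations
        if (rawPtr != 0 && !slice(obj, rawPtr, rawSize, s.data)) return false;
        if (relocPtr != 0 && !slice(obj, relocPtr, relocCount * kRelocation, s.relocations)) return false;
        out.push_back(s);
    }
    return true;
}

// True when both objects match section by section, skipping .debug$S contents.
// TimeDateStamp is never compared because the file header is not compared as a whole.
bool equivalentIgnoringDebug(const Bytes &a, const Bytes &b) {
    std::vector<Section> sa, sb;
    if (!parseSections(a, sa) || !parseSections(b, sb) || sa.size() != sb.size()) return false;
    if (readLe(a, 0, 2) != readLe(b, 0, 2)) return false;        // Machine must match
    for (std::size_t i = 0; i < sa.size(); ++i) {
        if (sa[i].name != sb[i].name) return false;
        if (sa[i].name == ".debug$S") continue;                  // embedded absolute paths
        if (sa[i].data != sb[i].data || sa[i].relocations != sb[i].relocations) return false;
    }
    return true;
}

} // namespace coff_sketch
```

Comparing section by section rather than masking bytes in place keeps the check valid when the two .debug$S sections differ in length and shift the file offsets of everything after them.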