Race condition in SingletonConstant causes "Circular initialization" crash under parallel execution #436

@lagergren

Summary

The parallel test runner (manualTests:runParallel) intermittently crashes with IllegalStateException: Circular initialization "ecstasy.xtclang.org" in the TestNesting module. The root cause is unsynchronized mutable state in SingletonConstant, which is shared across Lightweight containers.

Crash trace

Exception: Run-time error: java.lang.IllegalStateException: Circular initialization "ecstasy.xtclang.org"
    at ^ecstasy.xtclang.org (Check)
        =========
    at reflect.Class.displayName.get() (Class.x:150)
    at reflect.Class.estimateStringLength() (Class.x:591)
    at text.Stringable.toString() (Stringable.x:35)
    at Object.toString() (Object.x:39)
    at BOuter.InnerC.foo() (nesting.x:23)
    at BOuter.bar() (nesting.x:18)
    at testSimple() (nesting.x:11)
    at run() (nesting.x:5)

Root cause

SingletonConstant (in org.xvm.asm.constants) uses two plain (non-volatile, unsynchronized) fields to manage singleton lifecycle:

private transient ObjectHandle m_handle;        // line 236
private transient boolean      m_fInitializing; // line 241

The three methods that operate on these fields have no synchronization:

  • markInitializing() — check-then-act on m_fInitializing (classic TOCTOU race)
  • setHandle() — writes m_handle then m_fInitializing (no memory fence)
  • getHandle() — plain read of m_handle (may see stale/intermediate value)

When the parallel runner spawns multiple Lightweight containers, they share the same ConstantPool and thus the same SingletonConstant instances. If two containers concurrently trigger resolution of the ecstasy.xtclang.org module singleton (e.g., through toString() → Class.displayName), the following race window exists:

  1. Container A calls markInitializing() → sets m_fInitializing = true, begins constructing the module singleton.
  2. Container B calls markInitializing() → sees m_fInitializing == true, installs an InitializingHandle sentinel as m_handle.
  3. Container B (or any fiber resolving the singleton) calls getHandle() → gets the InitializingHandle.
  4. Any operation on the InitializingHandle (e.g., cloneAs, getComposition, getType) calls assertInitialized() → sees instanceof InitializingHandle → throws IllegalStateException("Circular initialization ...").
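
The four steps above can be replayed on a deliberately simplified model. The sketch below is not the real SingletonConstant source: the class name RacySingletonModel, the INITIALIZING sentinel object, and the method bodies are assumptions reconstructed only from the description in this issue. It replays steps 1 to 4 in order on a single thread to show that the described interleaving is enough to reach the "Circular initialization" path.

```java
// Hypothetical, simplified model of the lifecycle described in this issue.
// This is NOT the actual org.xvm.asm.constants.SingletonConstant code.
final class RacySingletonModel {
    /** Stand-in for the InitializingHandle sentinel. */
    static final Object INITIALIZING = new Object();

    private Object  handle;        // plain field, like m_handle
    private boolean initializing;  // plain field, like m_fInitializing

    /** Unsynchronized check-then-act; returns true if the caller should construct. */
    boolean markInitializing() {
        if (handle != null) {
            return false;
        }
        if (initializing) {
            // A second caller is treated as a recursive one and gets the sentinel installed.
            handle = INITIALIZING;
            return false;
        }
        initializing = true;
        return true;
    }

    /** Two plain writes with no ordering guarantee between them for other threads. */
    void setHandle(Object newHandle) {
        handle = newHandle;
        initializing = false;
    }

    /** Plain read; may observe the sentinel. */
    Object getHandle() {
        return handle;
    }

    static void assertInitialized(Object h) {
        if (h == INITIALIZING) {
            throw new IllegalStateException("Circular initialization");
        }
    }

    /** Replays steps 1 to 4 in order and returns the resulting message. */
    static String replaySteps() {
        RacySingletonModel constant = new RacySingletonModel();
        constant.markInitializing();         // step 1: container A claims construction
        constant.markInitializing();         // step 2: container B installs the sentinel
        Object seen = constant.getHandle();  // step 3: the sentinel is returned
        try {
            assertInitialized(seen);         // step 4: sentinel is rejected
            return "no error";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

In this model the second caller cannot be told apart from a genuinely recursive one, so the failure follows from the interleaving alone. The missing memory fences would only widen the window in a real multi-threaded run.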

The specific trigger in this crash: nesting.x:23 does string concatenation ("inner foo of B; this=" + this), which calls toString() → Class.estimateStringLength() → displayName.get() (Class.x:150), which needs the ecstasy.xtclang.org module singleton to compute a relative path.

Impact

  • Parallel runner (runParallel): Affected. All Lightweight containers share a ConstantPool. Any module that triggers toString() / Class.displayName during the singleton initialization window can crash. This is intermittent and timing-dependent.

  • Sequential runner (runSequential): Not affected in practice. Each module runs as a separate Gradle XtcRunTask in its own forked JVM process with an isolated ConstantPool. No concurrent access occurs.

  • Single-module execution: Not affected. A single module has no concurrent fiber racing the singleton initialization path.

Relation to existing known race

The build file already documents a related race at manualTests/build.gradle.kts:490:

// TODO: Re-enable TestIO here after the intermittent TypeSystem.implicitTypes initialization
// race is fixed.

This is the same category of bug: unsynchronized shared mutable state in the constant pool accessed concurrently by parallel Lightweight containers.

Architectural concern: toString() triggering singleton resolution

Beyond the thread-safety fix, this crash exposes a deeper design concern: toString() on any Ecstasy object can trigger singleton resolution as a side effect.

The call chain is: Object.toString() → Class.estimateStringLength() → displayName.get() → pathWithin(this:service.typeSystem) → resolves the ecstasy.xtclang.org module singleton.

A well-behaved toString() should be a pure read with no observable side effects — no state transitions, no lazy initialization of shared singletons, no risk of throwing due to unrelated initialization ordering. The current design violates this expectation: a simple string concatenation ("this=" + this) can crash the runtime if a shared singleton happens to be mid-initialization on another thread.

This is also a debuggability concern. In any JVM-based runtime, developers expect to inspect objects via toString() in a debugger, in log statements, and in exception messages without perturbing execution. If toString() can trigger singleton resolution (or any other state-changing operation), then merely inspecting an object in a debugger could alter runtime behavior — making bugs harder to reproduce and diagnose.

Dangerous toString() / appendTo() sites in the standard library

The root of the problem is Class.displayName, which accesses this:service.typeSystem — a service-level singleton — to compute a human-readable relative path. Every standard library type whose string representation depends on Class.displayName inherits this danger. The full list of affected sites:

| File | Lines | Method | Dangerous Call |
|---|---|---|---|
| Class.x | 150 | displayName.get() | pathWithin(this:service.typeSystem) (the root of all chains) |
| Class.x | 591 | estimateStringLength() | displayName.size |
| Class.x | 617 | appendTo() | displayName.appendTo(buf) |
| Annotation.x | 10 | estimateStringLength() | annoClass.displayName.size |
| Annotation.x | 18 | appendTo() | .addAll(annoClass.displayName) |
| Mixin.x | 8 | estimateStringLength() | mixinClass.displayName.size |
| Mixin.x | 13 | appendTo() | .addAll(mixinClass.displayName) |
| Type.x | 673 | estimateStringLength() | clz.displayName.size (in Class case) |
| Type.x | 770 | appendTo() | clz.displayName.appendTo(buf) (in Class case) |

All of these converge on the same dangerous root: Class.displayName.get() → pathWithin(this:service.typeSystem). The Annotation, Mixin, and Type sites are dangerous because they access displayName on a Class object, which triggers the same chain.

Note that ClassTemplate.displayName and ModuleTemplate.displayName are safe — they use local pathWithin() implementations that access containingFile.mainModule directly without going through a service-level TypeSystem singleton.

How this should work in a correct architecture

toString() / appendTo() / estimateStringLength() should be pure reads that never trigger lazy initialization, singleton resolution, or service access. The Class.displayName property should be computable from already-resolved local state. Options:

  1. Eager caching: Compute and cache displayName at class initialization time (before any container can race on it).
  2. Local-only fallback: If the TypeSystem singleton is not yet available, fall back to a fully-qualified name or a locally-derivable short name — never block or throw.
  3. Follow the ClassTemplate pattern: ClassTemplate.displayName already does this correctly by using a local pathWithin() that doesn't access service-level state. Class.displayName should follow the same approach.

This would make toString() safe to call from any context — parallel execution, debuggers, logging, exception formatting — without risk of triggering initialization races.
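
As a rough illustration of options 1 and 2 combined, the Java sketch below computes the display name once at construction time and falls back to the qualified name when no root is available. Everything in it is hypothetical: the class NamedClassSketch, the shorten helper, and the dot-separated name format are illustrative assumptions, not the Ecstasy Class.displayName implementation or its actual naming rules.

```java
// Hypothetical illustration only; the real Class.displayName lives in Class.x
// and uses pathWithin(), which is not modeled here.
final class NamedClassSketch {
    /** Cached eagerly so that toString() is a pure read. */
    private final String displayName;

    /**
     * @param qualifiedName  a dot-separated name (assumed format)
     * @param rootOrNull     an already-resolved root name, or null if none is available
     */
    NamedClassSketch(String qualifiedName, String rootOrNull) {
        this.displayName = shorten(qualifiedName, rootOrNull);
    }

    /** Local-only computation: never resolves anything lazily and never throws. */
    static String shorten(String qualifiedName, String rootOrNull) {
        if (rootOrNull != null && qualifiedName.startsWith(rootOrNull + ".")) {
            return qualifiedName.substring(rootOrNull.length() + 1);
        }
        // Fallback: the fully-qualified name is always derivable locally.
        return qualifiedName;
    }

    @Override
    public String toString() {
        return displayName;
    }
}
```

With this shape, formatting an object touches only a final field, so it cannot observe a singleton that is mid-initialization elsewhere.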

Suggested fix

The SingletonConstant initialization fields need thread-safe access. Options:

  1. Minimal: Make m_handle volatile and use synchronized on markInitializing() / setHandle().
  2. Lock-free: Replace m_handle with AtomicReference<ObjectHandle> and use CAS in markInitializing().
  3. Structural: Ensure each Lightweight container gets its own singleton resolution scope (may be too invasive).
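
A minimal sketch of option 2 follows, under stated assumptions. The class CasSingletonSketch, the Claim enum, the Pending marker, and the owner parameter are all hypothetical names introduced for illustration; the real SingletonConstant API takes no owner argument, and how the runtime would identify a container or fiber is left open. The point of the sketch is that a single compareAndSet decides who constructs, and that recording the claimant lets a concurrent caller be distinguished from a genuinely circular one.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free variant; not the actual SingletonConstant code.
final class CasSingletonSketch {
    /** Result of trying to claim initialization. */
    enum Claim { WON, IN_PROGRESS_ELSEWHERE, CIRCULAR, READY }

    /** Marker stored while construction is in progress; remembers who claimed it. */
    private static final class Pending {
        final Object owner;
        Pending(Object owner) { this.owner = owner; }
    }

    /** Holds null (untouched), a Pending marker, or the final non-null handle. */
    private final AtomicReference<Object> state = new AtomicReference<>();

    /** Atomically claims construction, or reports why the claim was not granted. */
    Claim markInitializing(Object owner) {
        while (true) {
            Object cur = state.get();
            if (cur == null) {
                if (state.compareAndSet(null, new Pending(owner))) {
                    return Claim.WON;
                }
                continue; // lost the CAS; re-read the state
            }
            if (cur instanceof Pending) {
                return ((Pending) cur).owner == owner
                        ? Claim.CIRCULAR
                        : Claim.IN_PROGRESS_ELSEWHERE;
            }
            return Claim.READY;
        }
    }

    /** Publishes the handle; only the claimant may do so. */
    void setHandle(Object owner, Object handle) {
        Object cur = state.get();
        if (!(cur instanceof Pending)
                || ((Pending) cur).owner != owner
                || !state.compareAndSet(cur, handle)) {
            throw new IllegalStateException("setHandle without a matching claim");
        }
    }

    /** Returns the handle, or null while it is absent or still being constructed. */
    Object getHandle() {
        Object cur = state.get();
        return cur instanceof Pending ? null : cur;
    }
}
```

What a caller should do on IN_PROGRESS_ELSEWHERE (wait, retry, or yield the fiber) is a runtime design decision that this sketch does not address.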

Additionally, decouple Class.displayName (and by extension all toString() paths) from singleton resolution, so that string formatting never triggers lazy initialization of shared state.

Reproducer

This is a timing-dependent race and cannot be reproduced deterministically. Running manualTests:runParallel repeatedly will eventually trigger it. The crash was observed with TestNesting as the failing module, but any module exercising toString() on an Ecstasy object could be the victim depending on scheduling.
