This repository is a work in progress. The intent is to create a lightweight library that provides a generalized implementation of deterministic simulation for Java using virtual threads, with the goal of enabling mostly idiomatic Java code to be tested deterministically. Most current implementations of deterministic simulation constrain the API through which non-determinism reaches the core logic to be as small as possible, then build a bespoke simulation that stubs out that API. This works, but it is highly customized and may miss interactions outside the core logic.
The following table is an attempt to exhaustively describe the issues that need to be resolved in order to do deterministic simulation testing in Java.
| Java Non-determinism | Why | TLDR | Mitigations |
|---|---|---|---|
| JVM Garbage Collection | GC pauses can reorder threads beyond what we can simulate. | Ignore | It doesn't have a huge effect. In the future it may be possible to implement a GC, though. |
| JIT | Can change instruction ordering at runtime and cause very subtle bugs in multithreaded applications. See the JCStress project for more details. | Ignore/Disable | Disable it or turn it down during simulation if it is causing issues, and leave that state space testing to tools like JCStress. |
| Classloading | It has to read the file system. | Ignore | Classloading doesn't happen all that often at runtime, so we can ignore it. |
| volatile | Not all possible interleaved updates to volatile variables can be simulated. | Ignore/Avoid | Specifically avoid mutable volatile fields of type int, float, double, and long. Boolean and immutable objects are okay. Not all interleavings can be tested unless a Thread.yield() is added via instrumentation before and after each access. |
| Thread.sleep(), Object.wait() | Since they sleep based on system time, executions get randomly inserted back into the simulation loop, causing non-determinism. | Instrument | Currently requires an agent to do bytecode manipulation of system time variables. |
| Calendar, Date, System.nanoTime(), System.currentTimeMillis(), VM.getNanoTimeAdjustment() | Return values based on system time. | Instrument | Instrument with an agent that replaces the underlying calls that read system time. |
| LocalDate.now(), LocalTime.now(), Instant.now(), LocalDateTime.now(), ZonedDateTime.now() | Return values based on system time. | Instrument | Anything that can control logic will need to be replaced by a variant that accepts a replaceable clock, e.g. Instant.now(Clock). |
| Thread.ofPlatform(), ThreadPoolExecutor or Thread.ofVirtual() | Can only control interleaving of threads with virtual threads. | Instrument | Replace with SchedulableVirtualThreadFactory during simulation. Requires Java 24 or higher. |
| ScheduledExecutorService | Uses system time to schedule tasks. | Instrument | Replace with SimulationScheduledExecutor during simulation or instrument system time. |
| Timer | | Alternative | Use ScheduledExecutorService instead. |
| java.io.File* | Synchronous calls capture the simulation thread. They also get inserted back into the simulation loop randomly. | TBD | Use java.nio.file instead. |
| java.nio.file | Errors have to be simulated. Threads blocked on IO can only be scheduled after they are unblocked, so the simulation loop iteration in which they are acted on is non-deterministic. | TBD | Stub FileSystem with something like JimFS during simulation. |
| java.net.Socket | Errors have to be simulated. Threads blocked on IO can only be scheduled after they are unblocked, so the simulation loop iteration in which they are acted on is non-deterministic. | TBD | Use Netty instead. With Netty 4.2+ we can pass SchedulableVirtualThreadFactory into the event loop and use local transport to simulate the network entirely within the JVM. |
| Native system calls | | Not Supported | |
| Random, SecureRandom | Randomness has to be made deterministic. | Instrument | Instances of Random and SecureRandom are replaced with one instance of Random seeded from the simulation. |
| ForkJoinPool, .stream().parallel() | Can only control interleaving of threads with virtual threads. | TBD | Need to look into scheduling virtual threads on custom fork join pool and replacing the default system fork join pool. |
| External calls: HTTP/2, gRPC, AMQP, MQTT, HTTP, STOMP, etc. | Have to stub out all external calls to drive the simulation. | Instrument | Netty stubs? |
| External database calls | Have to stub out all external calls to drive the simulation. | TBD | Stub with H2? |
| ThreadLocal | The simulation is single threaded so they don't work as expected. | Alternative | Replace with Scoped Values that are supported by virtual threads. |
| Singleton or static blocks | The simulation takes place in a single VM, so anything that is static and host specific will be shared during simulation. | Not Supported | It's possible to load multiple instances of classes that contain singletons or static blocks, but it has to be done in separate classloaders. |
| Object.wait()/Object.notify() | The order in which notify wakes threads paused on Object.wait() is JVM specific, and there isn't a way to test other possible orderings. | Ignore | Can't be fully tested using this method, but should be deterministic within a given JVM. |
| ImmutableCollections (SALT32L and REVERSE fields) | The iteration order is seeded by a random number on VM start. | Instrument | Replace with a seed value from our singular random instance. |
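
As a minimal sketch of the `Instant.now(Clock)` and seeded `Random` mitigations from the table, application code can take its clock and random source as constructor arguments so that a simulation can substitute controlled ones. `SimulatedClock` and `LeaseManager` are hypothetical names used only for illustration and are not part of this repository.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.util.Random;

// Hypothetical helper: a Clock whose time only moves when the simulation advances it.
final class SimulatedClock extends Clock {
    private Instant now;
    private final ZoneId zone;

    SimulatedClock(Instant start, ZoneId zone) {
        this.now = start;
        this.zone = zone;
    }

    // Called by the simulation loop instead of letting wall-clock time pass.
    void advance(Duration amount) {
        now = now.plus(amount);
    }

    @Override
    public ZoneId getZone() {
        return zone;
    }

    @Override
    public Clock withZone(ZoneId newZone) {
        return new SimulatedClock(now, newZone);
    }

    @Override
    public Instant instant() {
        return now;
    }
}

// Hypothetical application code: time and randomness are injected, never read globally.
final class LeaseManager {
    private final Clock clock;
    private final Random random;

    LeaseManager(Clock clock, Random random) {
        this.clock = clock;
        this.random = random;
    }

    // Expiry is 30 seconds from "now" plus up to one second of jitter.
    Instant nextExpiry() {
        long jitterMillis = random.nextInt(1000);
        return Instant.now(clock).plusSeconds(30).plusMillis(jitterMillis);
    }
}
```

Two `LeaseManager` instances built from clocks with the same start instant and from `Random` instances with the same seed return the same expiry, which is the property a simulation relies on. Code that calls `Instant.now()` or `new Random()` directly would still need instrumentation.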
The main takeaways from the above table to implement deterministic simulation for an application are:
- IO will block and resume continuations non-deterministically in a future simulation tick. This is the main open issue and still needs research.
- System time needs to be avoided or instrumented at runtime with bytecode weaving.
- IO operations have to be stubbed out to be able to introduce errors.
- Dependencies that use Alternative/Instrument/Not Supported features from the above table have to be stubbed out or instrumented.
- External calls have to be stubbed out.
- Synchronous file IO captures the simulation's virtual threads, so it won't simulate thread interleaving without adding Thread.yield() afterwards.
- Everything has to be run within the simulation's virtual threads, even initializing the system.
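
To illustrate why everything has to run under a scheduler the simulation controls, here is a toy sketch in which a single seeded `Random` decides which task runs its next step. `SeededInterleaving` is a hypothetical class: it models tasks as a fixed number of steps rather than real virtual threads, and it is an assumption for illustration, not how this library schedules continuations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical toy scheduler: one seeded Random picks which task steps next.
final class SeededInterleaving {

    // Returns the order in which task steps were executed for the given seed.
    static List<String> run(long seed, List<String> taskNames, int stepsPerTask) {
        Random random = new Random(seed);
        List<String> trace = new ArrayList<>();
        if (stepsPerTask <= 0) {
            return trace; // nothing to schedule
        }

        // Parallel lists: runnable.get(i) still has remaining.get(i) steps to run.
        List<String> runnable = new ArrayList<>(taskNames);
        List<Integer> remaining = new ArrayList<>();
        for (int i = 0; i < runnable.size(); i++) {
            remaining.add(stepsPerTask);
        }

        while (!runnable.isEmpty()) {
            // The only source of scheduling choice is the seeded Random.
            int i = random.nextInt(runnable.size());
            trace.add(runnable.get(i));
            int left = remaining.get(i) - 1;
            if (left == 0) {
                // i is an int, so both calls remove by index.
                runnable.remove(i);
                remaining.remove(i);
            } else {
                remaining.set(i, left);
            }
        }
        return trace;
    }
}
```

Calling `SeededInterleaving.run` twice with the same seed yields the same trace, while different seeds explore different interleavings. Any step that ran outside this loop, such as initialization on a platform thread, would not be covered by the seed and could not be replayed.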
Some bespoke implementations of deterministic simulation in Java can be found in Cassandra and Kafka's KRaft.
Some good videos and articles on deterministic simulation:
- The FASTEST and SAFEST Database
- Testing a Single-Node, Single Threaded, Distributed System Written in 1985 By Will Wilson
- What's the big deal about Deterministic Simulation Testing?
- Using Java's Project Loom to build more reliable distributed systems
- Deterministic Simulation: A New Era of Distributed System Testing (Part 1 of 2)
- Applying Deterministic Simulation: The RisingWave Story (Part 2 of 2)
- Fray: General-Purpose Concurrency Testing
- Fray GitHub