WASM linear memory never shrinks after resource instance cleanup (deleteUUID) #1186

@hubyrod

Summary

Resource instances created via getStreamUUID and destroyed via deleteUUID never release their WASM ArrayBuffer memory back to the OS. The WASM linear memory grows monotonically: even after Skip's internal GC frees objects, the backing ArrayBuffer stays at its high-water mark. Over time this leads to out-of-memory failures on memory-constrained instances.

Environment

  • @skipruntime/wasm@0.0.19, @skipruntime/server@0.0.19, @skip-adapter/postgres@0.0.19
  • Node.js on Clever Cloud (XS instance, --max-old-space-size=644)
  • Platform: wasm (default)

Reproduction

  1. Start a Skip service with several external Postgres resources (syncHistoricData: true)
  2. Observe baseline ArrayBuffer memory (~1 GB for our dataset — photos, comments, notifications, reactions, etc.)
  3. Open SSE connections — each getStreamUUID call instantiates a resource with .map() chains (enrichers, filters), allocating additional WASM memory
  4. Close SSE connections: deleteUUID is called and the resource instance is destroyed (see the client-side churn sketch after this list)
  5. Observe ArrayBuffer memory: it never decreases, even with 0 active SSE connections
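
For concreteness, here is a client-side sketch of the churn that triggers the pattern. The endpoint URL is hypothetical (substitute your service's SSE stream endpoint); only the open/close pattern matters, with each open instantiating a resource via getStreamUUID and each close triggering deleteUUID:

```ts
// Hypothetical client-side churn script; STREAM_URL is a placeholder.
const STREAM_URL = "http://localhost:8080/v1/streams/my-resource";

// Open a burst of SSE connections, hold them open, then close them all.
async function burst(n: number, holdMs: number): Promise<void> {
  const responses = await Promise.all(
    Array.from({ length: n }, () =>
      fetch(STREAM_URL, { headers: { Accept: "text/event-stream" } }),
    ),
  );
  await new Promise((resolve) => setTimeout(resolve, holdMs));
  // Cancelling each body closes the connection, which should trigger
  // deleteUUID server-side.
  await Promise.all(responses.map((res) => res.body?.cancel()));
}

// Burst of 17 connections, matching the production data below.
await burst(17, 30_000);
```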

Data from production

Memory samples from process.memoryUsage().arrayBuffers (MB):

Uptime (s) | ArrayBuffers (MB) | SSE conns | Notes
-----------|-------------------|-----------|------
        63 |              1025 |         0 | Fresh start, baseline after syncHistoricData
      9843 |              1077 |         3 | Stable with 3 connections
     10263 |              1077 |         0 | Connections closed, memory unchanged
     10803 |              1077 |         0 | Still 0 connections, still 1077 MB
     9063* |              1704 |         6 | After burst of 17 SSE connections
     9783* |              1715 |         3 | Connections dropped, memory stayed at 1715
    10683* |              1715 |         3 | Never came back down

* Second deployment, same pattern
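
The samples come from process.memoryUsage().arrayBuffers, logged on an interval roughly like this (the 60-second interval and log format are our own choices, not Skip APIs):

```ts
// Periodic sampler behind the numbers above. Interval and log format are
// illustrative; only process.memoryUsage().arrayBuffers matters.
const startedAt = Date.now();

setInterval(() => {
  const mb = Math.round(process.memoryUsage().arrayBuffers / (1024 * 1024));
  const uptimeS = Math.round((Date.now() - startedAt) / 1000);
  console.log(`[MEMORY-TREND] uptime=${uptimeS}s arrayBuffers=${mb}MB`);
}, 60_000);
```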

The [MEMORY-TREND] watchdog flagged 368 MB of ArrayBuffer growth per SSE connection (alert threshold: 10 MB). After the connections closed and deleteUUID succeeded, that memory remained permanently allocated.

Eventually the process becomes unresponsive: event loop lag reaches 10+ seconds, DB queries time out, and all health checks fail. Only a process restart recovers it.

Expected behavior

After deleteUUID successfully destroys a resource instance, the WASM memory allocated for that instance's reactive graph (mapped/filtered collections) should be reclaimable — either by shrinking the WASM linear memory or by reusing freed pages for subsequent allocations without growing further.
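
For context, the JS-level constraint: WebAssembly.Memory exposes grow() but no shrink operation, so once the linear memory grows, its backing ArrayBuffer never gets smaller. A minimal Node.js illustration:

```ts
// WebAssembly pages are 64 KiB. grow() is the only size operation; there
// is no API to release pages back to the OS.
const mem = new WebAssembly.Memory({ initial: 1, maximum: 100 });
console.log(mem.buffer.byteLength); // 65536  (1 page)

mem.grow(10); // returns the previous size, in pages
console.log(mem.buffer.byteLength); // 720896 (11 pages)

// No shrink counterpart exists; the memory.discard proposal mentioned in
// question 4 targets exactly this gap.
```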

Questions

  1. Is there a way to configure WASM memory limits or trigger compaction?
  2. Would switching more resources to syncHistoricData: false significantly reduce baseline memory?
  3. Is the native platform option (runService with platform: "native") expected to have better memory reclamation behavior?
  4. Are there plans to support WASM memory shrinking (e.g., via memory.discard proposals)?

Workaround

We're implementing a self-healing watchdog that calls process.exit(1) when ArrayBuffer exceeds a threshold, relying on the hosting platform to restart the process. This works but causes brief downtime on each restart.
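
A minimal sketch of that watchdog (the threshold and interval below are deployment-specific assumptions, not recommendations):

```ts
// Exit when ArrayBuffer usage crosses a threshold and rely on the
// hosting platform's supervisor to restart the process.
const MAX_ARRAY_BUFFERS_MB = 1500; // illustrative value

const timer = setInterval(() => {
  const usedMb = process.memoryUsage().arrayBuffers / (1024 * 1024);
  if (usedMb > MAX_ARRAY_BUFFERS_MB) {
    console.error(
      `[WATCHDOG] arrayBuffers=${Math.round(usedMb)}MB exceeds ${MAX_ARRAY_BUFFERS_MB}MB; exiting`,
    );
    process.exit(1);
  }
}, 30_000);
timer.unref(); // don't let the watchdog alone keep the process alive
```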
