Block-based immutable list implementation (GSoC proposal) by Zayd-R · Pull Request #809 · typelevel/cats-collections

Zayd-R · 2026-03-23T15:10:05Z

Summary

This is an early-stage implementation of the block-based immutable list
proposed in #634, submitted as part of GSoC work. The goal of this PR is to share the implementation and benchmark data to explore the design space before committing to a final approach.

Two implementations explored

BlockedList — copy-on-write
Every prepend into dead space copies the valid portion of the block before
writing. Fully persistent and safe for branching use cases.

FastBlockedList — write-direct
Prepend writes directly into dead space (offset - 1) without copying,
since that slot has never been pointed at by any existing node and is
invisible to all observers. When the block is full, a fresh block is allocated and
the current node becomes the tail.

Both implementations store BlockSize per node so that the benchmark file can test
many block sizes without recompilation.
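
To make the difference between the two strategies concrete, here is a minimal sketch of both prepend paths. It is not the PR's code: the names Node, single, prependCow, prependDirect and freshBlock are hypothetical, elements are fixed to Int to avoid ClassTag plumbing, and the block size is read from the array length rather than stored as a separate field.

```scala
object PrependSketch {
  // Elements of a node live in block(offset) .. block(block.length - 1).
  // Slots below `offset` are dead space. `tail` is null at the end of the list.
  final class Node(val block: Array[Int], val offset: Int, val tail: Node) {
    def blockSize: Int = block.length // assumption: size taken from the array itself

    def toList: List[Int] = {
      val here = block.slice(offset, block.length).toList
      if (tail == null) here else here ++ tail.toList
    }
  }

  // One-element list whose element sits in the last slot of a new block.
  def single(x: Int, blockSize: Int): Node = {
    val b = new Array[Int](blockSize)
    b(blockSize - 1) = x
    new Node(b, blockSize - 1, null)
  }

  // Copy-on-write: copy the valid portion into a new block, then write the new head.
  def prependCow(x: Int, n: Node): Node =
    if (n.offset == 0) freshBlock(x, n)
    else {
      val b = new Array[Int](n.blockSize)
      System.arraycopy(n.block, n.offset, b, n.offset, n.blockSize - n.offset)
      b(n.offset - 1) = x
      new Node(b, n.offset - 1, n.tail)
    }

  // Write-direct: write into the shared block at offset - 1 without copying.
  // This sketch does not track whether another prepend on the same node has
  // already claimed that slot.
  def prependDirect(x: Int, n: Node): Node =
    if (n.offset == 0) freshBlock(x, n)
    else {
      n.block(n.offset - 1) = x
      new Node(n.block, n.offset - 1, n.tail)
    }

  // Block is full: allocate a new block and make the current node the tail.
  private def freshBlock(x: Int, n: Node): Node = {
    val b = new Array[Int](n.blockSize)
    b(n.blockSize - 1) = x
    new Node(b, n.blockSize - 1, n)
  }
}
```

In this sketch, prepending twice onto the same node with prependCow yields two independent lists, which is the branching case the copy-on-write description refers to. With prependDirect, both results would share one slot, so the sketch only models linear (non-branching) use.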

Results

All times in ns/op. Lower is better.

prepend (build a list of 10k elements from empty)

blockSize	BlockedList	FastBlockedList	scala.List
4	138,526 ± 4,810	52,328 ± 218	40,419 ± 114
8	167,084 ± 1,601	54,333 ± 228	40,460 ± 296
16	193,820 ± 6,885	56,385 ± 341	39,556 ± 100
32	246,475 ± 2,368	54,803 ± 337	39,600 ± 311
64	378,280 ± 74,722	53,403 ± 226	39,592 ± 164

Copy-on-write prepend grows roughly linearly with blockSize due to the arraycopy
cost (about 2.7x slower at blockSize=64 than at blockSize=4). Write-direct prepend
is flat across block sizes and roughly 30-40% slower than scala.List.
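
The arraycopy cost can be estimated by counting copied elements. The sketch below assumes, as in the description above, that each copy-on-write prepend copies exactly the valid portion of the current block and that a prepend into a full block copies nothing. The name copiedElements is hypothetical, and the count ignores allocation and other fixed per-prepend costs, so it shows the trend rather than predicting the measured times.

```scala
object CopyCount {
  // Total elements copied while building a list of n elements one prepend at a
  // time, with blocks of size b, under the copy-on-write assumption above.
  def copiedElements(n: Int, b: Int): Long = {
    var copied = 0L
    var valid = 0 // valid elements in the current head block
    var i = 0
    while (i < n) {
      if (valid == 0 || valid == b) {
        valid = 1 // first element of a new block: nothing to copy
      } else {
        copied += valid // copy the valid portion before writing
        valid += 1
      }
      i += 1
    }
    copied
  }
}
```

For n = 10,000 this gives 15,000 copied elements at blockSize=4 and 314,616 at blockSize=64, close to n * (b - 1) / 2 in both cases, which is linear in the block size.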

foreach (visit every element)

blockSize	BlockedList (cow)	FastBlockedList	scala.List
4	15,149 ± 652	13,001 ± 808	22,114 ± 259
8	9,592 ± 283	10,498 ± 504	22,076 ± 322
16	8,442 ± 154	8,379 ± 137	21,762 ± 147
32	6,960 ± 128	6,864 ± 231	21,880 ± 91
64	5,786 ± 44	5,688 ± 40	21,685 ± 157

Both implementations beat scala.List by roughly 3.8x at blockSize=64
(5,786 and 5,688 ns/op vs 21,685 ns/op). This is the cache locality benefit
the proposal predicted: larger blocks mean longer tight array loops with
fewer pointer jumps.
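
The "tight array loop with fewer pointer jumps" shape can be sketched as follows. This is an illustration only: Node and foreach here are hypothetical names, elements are fixed to Int, and a null tail marks the end of the list.

```scala
object ForeachSketch {
  // Valid elements of a node are block(offset) .. block(block.length - 1).
  final class Node(val block: Array[Int], val offset: Int, val tail: Node)

  def foreach(start: Node)(f: Int => Unit): Unit = {
    var cur = start
    while (cur != null) {
      val arr = cur.block
      var i = cur.offset
      // Inner loop walks a contiguous array: no pointer chasing per element.
      while (i < arr.length) {
        f(arr(i))
        i += 1
      }
      // One pointer jump per block instead of one per element.
      cur = cur.tail
    }
  }
}
```

With blockSize=64 a traversal like this follows one tail pointer per 64 elements, whereas a cons list follows one per element.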

foldLeft (sum all elements)

blockSize	BlockedList (cow)	FastBlockedList	scala.List
4	39,273 ± 654	32,715 ± 2,828	28,207 ± 972
8	38,107 ± 941	29,910 ± 4,056	27,148 ± 174
16	36,720 ± 302	32,043 ± 3,591	27,969 ± 589
32	36,689 ± 445	30,048 ± 3,890	27,263 ± 294
64	34,953 ± 457	28,918 ± 4,981	28,865 ± 261

FastBlockedList.foldLeft ties scala.List at blockSize=64
(28,918 vs 28,865 ns/op). The larger error margins suggest JIT
variance — more iterations would tighten these numbers.

uncons (element-by-element traversal)

blockSize	BlockedList (cow)	FastBlockedList	scala.List
4	82,950 ± 6,042	70,404 ± 488	16,328 ± 202
8	84,576 ± 8,152	76,493 ± 545	16,232 ± 97
16	82,985 ± 14,101	74,295 ± 313	16,284 ± 144
32	83,485 ± 17,600	73,708 ± 1,606	16,470 ± 171
64	83,291 ± 15,568	72,358 ± 409	16,255 ± 208

uncons is slower than scala.List as expected — each call allocates
one Some and one Tuple2. As noted in the proposal, uncons is not
the intended traversal API. The foreach/foldLeft results above are
the relevant comparison.
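
A sketch of where the per-call allocations come from, under stated assumptions: BList, Empty, Node and sum are hypothetical names, elements are fixed to Int, and this version also allocates a new Node for the remainder of a block, which may or may not match the PR's implementation.

```scala
import scala.annotation.tailrec

object UnconsSketch {
  sealed trait BList {
    def uncons: Option[(Int, BList)]
  }

  case object Empty extends BList {
    def uncons: Option[(Int, BList)] = None
  }

  final class Node(block: Array[Int], offset: Int, tail: BList) extends BList {
    def uncons: Option[(Int, BList)] = {
      // Rest of the list: same block with offset advanced, or the tail when
      // this was the last element of the block.
      val rest: BList =
        if (offset + 1 < block.length) new Node(block, offset + 1, tail) else tail
      // One Tuple2 and one Some are allocated on every call.
      Some((block(offset), rest))
    }
  }

  // Element-by-element traversal driven entirely by uncons.
  @tailrec
  def sum(list: BList, acc: Long = 0L): Long = list.uncons match {
    case Some((h, t)) => sum(t, acc + h)
    case None         => acc
  }
}
```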

map (apply a function to every element)

blockSize	BlockedList	scala.List
4	57,963 ± 1,036	55,880 ± 2,542
8	46,509 ± 9,182	58,043 ± 3,081
16	40,872 ± 5,839	58,697 ± 686
32	32,573 ± 357	58,086 ± 1,057
64	30,755 ± 831	58,367 ± 534

BlockedList.map is ~47% faster than scala.List at blockSize=64 (30,755 ns/op vs 58,367 ns/op).
The improvement grows with blockSize: larger blocks mean more elements processed
per block, which is consistent with a cache locality advantage.
scala.List is flat across all rows, as expected, since it has no block structure.
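
One way a block-wise map could look is sketched below. This is an assumption about the general shape, not the PR's implementation: Node, map and toList are hypothetical names, elements are fixed to Int, and blocks are rebuilt from the last one to the first so that each new node can point at its already-mapped tail.

```scala
object MapSketch {
  // Valid elements of a node are block(offset) .. block(block.length - 1);
  // a null tail marks the end of the list.
  final class Node(val block: Array[Int], val offset: Int, val tail: Node)

  def map(start: Node)(f: Int => Int): Node = {
    // Collect nodes; prepending to a List leaves the last block at the head.
    var nodes = List.empty[Node]
    var cur = start
    while (cur != null) {
      nodes = cur :: nodes
      cur = cur.tail
    }
    // Rebuild from the last block to the first, one output array per block.
    var mapped: Node = null
    for (n <- nodes) {
      val out = new Array[Int](n.block.length)
      var i = n.offset
      while (i < out.length) {
        out(i) = f(n.block(i)) // tight loop over one contiguous block
        i += 1
      }
      mapped = new Node(out, n.offset, mapped)
    }
    mapped
  }

  def toList(start: Node): List[Int] = {
    val buf = List.newBuilder[Int]
    var cur = start
    while (cur != null) {
      buf ++= cur.block.slice(cur.offset, cur.block.length)
      cur = cur.tail
    }
    buf.result()
  }
}
```

Because blocks are processed last to first, f is not applied in list order here; that only matters if f has side effects.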

Key findings

- foreach supports the proposal's cache locality claim: roughly 3.8x faster than scala.List at blockSize=64 for both implementations
- FastBlockedList.foldLeft is within measurement error of scala.List at larger block sizes
- Copy-on-write prepend is not practical at large block sizes due to the linear arraycopy cost
- Write-direct prepend is flat across block sizes and roughly 30-40% slower than scala.List
- blockSize=32 or 64 appears best for bulk traversal operations

Benchmark methodology

Tool: JMH (Java Microbenchmark Harness) via sbt-jmh plugin
Mode: Average time (AverageTime)
Units: nanoseconds per operation (ns/op) — lower is better
Warmup: 5 iterations
Measurement: 10 iterations
Forks: 1
Threads: 1

Environment:
JVM: OpenJDK 25.0.2 (2026-01-20, LTS)
CPU: Intel Core 5 210H
RAM: 16 GB
OS: Ubuntu 22.04

Lists are pre-built in @Setup(Level.Trial) so construction cost is
excluded from traversal measurements. The benchmark suite is included
in bench/src/main/scala/cats/bench/BlockedListBenchmark.scala and
can be reproduced with:

sbt "bench/jmh:run -i 10 -wi 5 -f 1 -t 1 .*BlockedList.*"

Questions

Is the write-direct approach in FastBlockedList acceptable, or should only the copy-on-write version be pursued?

Transparency note

English is not my first language. I used an LLM to help
with grammar and formatting in this PR description, and to generate the
initial benchmark boilerplate code. All implementation decisions, the
identification of bugs, the analysis of benchmark results, and the core
data structure logic were worked out by me. The AI was used as a writing
and tooling aid, not as a substitute for understanding.

Introduces BlockedList (copy-on-write) and BlockedLostCopy (write-direct) as proposed in typelevel#634. Includes JMH benchmarks comparing both implementations against scala.List across prepend, uncons, foldLeft, and foreach.

Zayd-R · 2026-03-23T15:26:29Z

I just noticed I named the implementation that writes directly with a Copy suffix. The name was just to differentiate it from my original copy-on-write implementation. Sorry for the confusion.

gemelen · 2026-03-24T20:12:57Z

@Zayd-R thank you for working on this.

There are a few things that I'd like you to fix in your changeset:

- revisit the PR description and fix the typos and misnames (like BloackedLoistCopy, etc.)
- provide a description of the benchmarks: what tools you used, what the methodology is, how to repeat the measurements, and what the units are in the results you provided (time, space, op/s, etc.)
- please fix the issue raised by the CI about the missing headers

Add block-based list implementation with benchmarks

261b997

Zayd-R marked this pull request as ready for review March 23, 2026 15:20

Zayd-R added 4 commits March 25, 2026 13:41

Fixed the typos, added benchmark details, and added missing license h…

988d115

…eaders. Please let me know if anything else needs attention

Fixing headers

1113036

Fixing format

0d9ade1

Added map implementation and updated the benchmark results

440066d

Zayd-R wants to merge 5 commits into typelevel:master from Zayd-R:immutable-blocked-list-proposal
