This is a meta-issue for what Dask needs in order to declare compatibility with Python 3.14t:
The above are the prerequisites for advertising a good experience to end users, i.e. at worst it will behave like the GIL-enabled version, some percent slower due to known bottlenecks that are actively being ironed out upstream.
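When triaging reports against a 3.14t environment, it helps to confirm first that the interpreter is a free-threaded build and that the GIL is actually off at runtime. The snippet below is a minimal sketch, not something Dask ships. The helper name `free_threading_status` is hypothetical; it relies only on `sysconfig.get_config_var("Py_GIL_DISABLED")` and `sys._is_gil_enabled()`, the latter available from Python 3.13.

```python
import sys
import sysconfig


def free_threading_status():
    """Report whether this is a free-threaded build and whether the GIL is on.

    Hypothetical helper for illustration only.
    """
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() was added in Python 3.13; on older interpreters
    # the GIL is always enabled, so fall back to True.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if is_gil_enabled is None else bool(is_gil_enabled())

    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": gil_enabled,
    }


status = free_threading_status()
print(status)
```

If `free_threaded_build` is true but `gil_enabled` is also true, the GIL was turned back on for this process. One possible cause is an imported extension module that does not declare free-threading support.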
In addition, real-world users will be blocked by key optional packages that are still missing; notably:
After the above, there is ample margin for performance tweaking:
- pickling/unpickling runs on a special thread pool with a single thread. With free-threading, you could have as many threads as you have CPUs.
- network I/O runs on a single thread; it could be moved to one thread per peer.
- spilling is single-threaded, blocks the worker state machine and networking, and has always been very painful.
- the only thing that must remain single-threaded is the state machine, but I expect it not to be a bottleneck in and of itself.
- more generally, a lot of profiling is needed now that it becomes possible to have workers with a huge number of threads, which was previously inadvisable even at the default chunk size (128 MiB). My latest benchmarks at scale (on Coiled, circa 2022) showed severe performance degradation already when moving from 4 to 8 threads per worker, given the same total number of CPUs on the cluster.
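To make the first bullet concrete, here is a minimal sketch of fanning (un)pickling out over a pool sized to the CPU count instead of a single thread. This is not how `distributed` is implemented. The names `roundtrip` and `roundtrip_many` are hypothetical, and on a GIL-enabled interpreter the extra threads would mostly just contend with each other.

```python
import os
import pickle
from concurrent.futures import ThreadPoolExecutor


def roundtrip(obj):
    """Serialize and deserialize one object (stand-in for a real message)."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def roundtrip_many(objs, max_workers=None):
    """Run pickle round-trips on a thread pool.

    max_workers=1 mimics a single-threaded offload pool; None uses one
    thread per CPU, which only pays off without the GIL.
    """
    n_threads = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(
        max_workers=n_threads, thread_name_prefix="pickle"
    ) as executor:
        # executor.map preserves input order in its results.
        return list(executor.map(roundtrip, objs))


payloads = [{"chunk": i, "data": list(range(1000))} for i in range(16)]
single = roundtrip_many(payloads, max_workers=1)
multi = roundtrip_many(payloads)
```

Both calls return the same objects in the same order. Whether the multi-threaded variant is actually faster is exactly the kind of question the profiling mentioned above would need to answer.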
- `PYTHON_GIL=0`. This has the disadvantage of silencing any other package with the same issue.
- `MSGPACK_PUREPYTHON=1 pip install msgpack --no-binary msgpack --no-cache --force-reinstall -v`