191023#45

Open
amchess wants to merge 1504 commits into amchess:master from official-stockfish:master

@amchess amchess commented Oct 19, 2023

No description provided.

AliceRoselia and others added 30 commits December 21, 2025 15:43
Passed non regression STC:
https://tests.stockfishchess.org/tests/view/693e642c46f342e1ec20f68d
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 107968 W: 28080 L: 27937 D: 51951
Ptnml(0-2): 381, 12708, 27626, 12925, 344

Passed non regression LTC:
https://tests.stockfishchess.org/tests/view/693ff10c46f342e1ec20fa6a
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 334266 W: 85271 L: 85370 D: 163625
Ptnml(0-2): 179, 36395, 94086, 36292, 181

closes #6484

Bench: 2987379
Passed STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 81888 W: 21336 L: 21166 D: 39386
Ptnml(0-2): 284, 9438, 21342, 9584, 296
https://tests.stockfishchess.org/tests/view/692ada47b23dfeae38cffce5

Passed LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 107328 W: 27534 L: 27404 D: 52390
Ptnml(0-2): 55, 11390, 30659, 11490, 70
https://tests.stockfishchess.org/tests/view/692d7a01b23dfeae38d011ab

closes #6467

Bench: 3006182
We did quite a few tests because this is a pretty involved change with
unknown scaling behavior, but results are decent.

[STC 10+0.1 1th, non-regression](https://tests.stockfishchess.org/tests/live_elo/6941ce3b46f342e1ec210180)
```
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 83200 W: 21615 L: 21452 D: 40133
Ptnml(0-2): 247, 9064, 22844, 9169, 276
```

[STC 5+0.05 8th](https://tests.stockfishchess.org/tests/live_elo/693dc38346f342e1ec20f555)
```
LLR: 3.48 (-2.94,2.94) <0.00,2.00>
Total: 58536 W: 15067 L: 14688 D: 28781
Ptnml(0-2): 87, 6474, 15781, 6825, 101
```

[LTC 20+0.2 8th](https://tests.stockfishchess.org/tests/live_elo/693f2afb46f342e1ec20f847)
```
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 27716 W: 7211 L: 6925 D: 13580
Ptnml(0-2): 8, 2674, 8207, 2962, 7
```

[LTC 10+0.1 64th](https://tests.stockfishchess.org/tests/live_elo/694003aa46f342e1ec20fac4):
```
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 16918 W: 4439 L: 4182 D: 8297
Ptnml(0-2): 3, 1493, 5213, 1744, 6
```

[NUMA test, 5+0.05 256th](https://tests.stockfishchess.org/tests/view/6941ee4e46f342e1ec210203)
```
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 7124 W: 1910 L: 1678 D: 3536
Ptnml(0-2): 0, 560, 2211, 790, 1
```

[LTC 60+0.6 64th](https://tests.stockfishchess.org/tests/live_elo/6940a85346f342e1ec20fcde):
```
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 15504 W: 4045 L: 3826 D: 7633
Ptnml(0-2): 0, 1002, 5530, 1219, 1
```

Bonus (courtesy of Viz): The 1 double kill in this last test was master
blundering a cool mate in 3: https://lichess.org/jyNZuRl4

Basically the idea here is to share correction history between threads.
That way, T1 can use the correction values produced by T2, which already
searched positions with that pawn structure etc., so that T1 can search
more efficiently. The table size per thread is about the same, so we
shouldn't get a large increase in hash collisions; in fact, I'd expect a
lower collision rate overall.

Although I came up with and implemented the idea independently,
[Caissa](https://github.com/Witek902/Caissa) was the first engine to
implement corrhist sharing (and corrhist in the first place) – this idea
is not completely novel.

The table size is rounded to a power of two. In particular, it's `65536
* nextPowerOfTwo(threadCount)`. That way, the indexing operation becomes
an AND of the key bits with a mask, rather than something more expensive
(e.g., a `mul_hi64`-style approach or a modulo).
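
A minimal sketch of the sizing and indexing scheme described above. The helper names `next_power_of_two`, `shared_table_size` and `table_index` are illustrative assumptions, not the identifiers used in the patch; only the `65536 * nextPowerOfTwo(threadCount)` formula comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Smallest power of two >= n, for n >= 1. Hypothetical helper.
constexpr std::size_t next_power_of_two(std::size_t n) {
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Entry count from the text: 65536 * nextPowerOfTwo(threadCount).
constexpr std::size_t shared_table_size(std::size_t threadCount) {
    return 65536 * next_power_of_two(threadCount);
}

// Because the size is a power of two, indexing is a single AND with a mask
// instead of a modulo or a multiply-high.
inline std::size_t table_index(std::uint64_t key, std::size_t tableSize) {
    assert((tableSize & (tableSize - 1)) == 0);  // must be a power of two
    return static_cast<std::size_t>(key & (tableSize - 1));
}
```

With 6 threads, for example, the count is rounded up to 8, so the table holds `65536 * 8` entries and the mask is `65536 * 8 - 1`.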

The updates are racy, like the TT, but because `entry` is hoisted into a
register, there's no risk of writing back a value that's out of the
designated range `[-D, D]`. Various attempts at rewriting using atomics
led to substantial slowdowns, so we begrudgingly ignored the functions
in thread sanitizer, but at some point we'd like to make this better.
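
To illustrate why hoisting matters, here is a sketch of an update that reads the shared entry exactly once and writes it exactly once. The bound `D = 1024`, the function name `update_entry` and the gravity-style formula are assumptions for illustration; the text only states that the stored value stays within `[-D, D]`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int D = 1024;  // illustrative bound; the real limit is not given in the text

// Read the shared entry once, compute on the local copy, write once.
// Even if another thread overwrites `entry` in between, the stored value is
// derived from a single in-range snapshot, so it stays inside [-D, D].
// (Formally still a data race in the C++ memory model; this only sketches
// the arithmetic, not a race-free implementation.)
inline void update_entry(std::int16_t& entry, int bonus) {
    const int b       = std::clamp(bonus, -D, D);
    const int v       = entry;  // single read ("hoisted into a register")
    const int updated = v + b - v * std::abs(b) / D;
    assert(updated >= -D && updated <= D);
    entry = static_cast<std::int16_t>(updated);  // single write
}
```

If the entry were instead re-read from memory for each term of the expression, two different snapshots could be mixed and the in-range guarantee would no longer hold.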

We allocate one shared correction history per NUMA node, because the
penalty associated with crossing nodes is substantial – I get a 40% hit
with NPS=4 and 256 threads, which is intolerable. With separate tables
per NUMA node I get a 6% penalty in nodes per second, which isn't ideal
but is apparently compensated for by the gain from sharing.

closes #6478

Bench: 2690604

Co-authored-by: Disservin <disservin.social@gmail.com>
These moves, since they come late in move ordering, probably already have pretty bad stats anyway.
Passed STC:
https://tests.stockfishchess.org/tests/view/6943bcd546f342e1ec210e25
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 96704 W: 25206 L: 24798 D: 46700
Ptnml(0-2): 357, 11244, 24767, 11602, 382
Passed LTC:
https://tests.stockfishchess.org/tests/view/6946a8723c8768ca450722f0
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 89814 W: 23193 L: 22770 D: 43851
Ptnml(0-2): 53, 9532, 25321, 9941, 60
bench 2717363

closes #6485

Bench: 2791988
Passed STC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 211776 W: 54939 L: 54911 D: 101926
Ptnml(0-2): 714, 24971, 54484, 25011, 708
https://tests.stockfishchess.org/tests/view/6938971875b70713ef796b70

Passed LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 216774 W: 55346 L: 55326 D: 106102
Ptnml(0-2): 105, 23599, 60980, 23577, 126
https://tests.stockfishchess.org/tests/view/693fc91f46f342e1ec20f9f6

closes #6486

Bench: 3267755
Init threat offsets at compile time. Avoid another global init function call.
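
As a generic illustration of the technique (not the actual threat tables, which are not shown here), an offset table can be built by the compiler with a `constexpr` lambda, so no runtime init function is needed. The `Counts` values below are made up.

```cpp
#include <array>
#include <cstddef>

// Hypothetical per-slot counts; the real threat data is not part of the text.
constexpr std::array<int, 6> Counts = {6, 12, 10, 8, 14, 4};

// Prefix sums evaluated at compile time: Offsets[i] is where slot i starts.
constexpr auto Offsets = [] {
    std::array<int, Counts.size() + 1> o{};
    for (std::size_t i = 0; i < Counts.size(); ++i)
        o[i + 1] = o[i] + Counts[i];
    return o;
}();

static_assert(Offsets[0] == 0 && Offsets[6] == 54);
```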

Passed STC Non-Regression:
https://tests.stockfishchess.org/tests/view/694971a83c8768ca4507275c
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 43296 W: 11284 L: 11077 D: 20935
Ptnml(0-2): 152, 4611, 11924, 4800, 161

closes #6487

No functional change
Passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 80128 W: 20879 L: 20498 D: 38751
Ptnml(0-2): 274, 9318, 20496, 9705, 271
https://tests.stockfishchess.org/tests/view/6945d11f3c8768ca45072218

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 134298 W: 34497 L: 33983 D: 65818
Ptnml(0-2): 81, 14334, 37812, 14834, 88
https://tests.stockfishchess.org/tests/view/6947bf033c8768ca45072491

closes #6488

Bench: 2325401
Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 38208 W: 10076 L: 9754 D: 18378
Ptnml(0-2): 139, 4390, 9742, 4676, 157
https://tests.stockfishchess.org/tests/view/6945bb6446f342e1ec211d93

Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 64086 W: 16529 L: 16157 D: 31400
Ptnml(0-2): 34, 6808, 17990, 7174, 37
https://tests.stockfishchess.org/tests/view/69479d303c8768ca45072446

closes #6489

Bench: 2442415
closes #6490

No functional change
update to current

closes #6491

No functional change
Passed simplification STC
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 92096 W: 23888 L: 23728 D: 44480
Ptnml(0-2): 336, 10796, 23608, 10988, 320
https://tests.stockfishchess.org/tests/view/694b6b9d572093c1986d6ae0

Passed simplification LTC
LLR: 2.96 (-2.94,2.94) <-1.75,0.25>
Total: 50064 W: 12789 L: 12598 D: 24677
Ptnml(0-2): 24, 5350, 14103, 5521, 34
https://tests.stockfishchess.org/tests/view/694d49aa572093c1986d7021

closes #6493

Bench: 2494221
Fix incorrect nonPawnKey update

Passed non-reg SMP STC:
```
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 139424 W: 35792 L: 35690 D: 67942
Ptnml(0-2): 197, 15783, 37665, 15855, 212
```
https://tests.stockfishchess.org/tests/view/694b7b7e572093c1986d6b0d

Passed non-reg SMP LTC:
```
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 88880 W: 22863 L: 22718 D: 43299
Ptnml(0-2): 16, 8947, 26401, 9028, 48
```
https://tests.stockfishchess.org/tests/view/694d2ceb572093c1986d6fc8

fixes #6492

closes #6494

Bench: 2475788
Passed STC:
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 141120 W: 36860 L: 36390 D: 67870
Ptnml(0-2): 470, 16441, 36314, 16819, 516
https://tests.stockfishchess.org/tests/view/694978e93c8768ca45072763

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 66576 W: 17078 L: 16700 D: 32798
Ptnml(0-2): 45, 7093, 18628, 7483, 39
https://tests.stockfishchess.org/tests/view/694bb608572093c1986d6ba6

closes #6496

Bench: 2503391
[Passed STC SMP](https://tests.stockfishchess.org/tests/view/694e506c572093c1986d7276):
```
LLR: 2.97 (-2.94,2.94) <0.00,2.00>
Total: 14992 W: 3924 L: 3653 D: 7415
Ptnml(0-2): 20, 1547, 4090, 1820, 19
```

[Passed LTC SMP](https://tests.stockfishchess.org/tests/live_elo/694ead61572093c1986d7365):
```
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 41146 W: 10654 L: 10342 D: 20150
Ptnml(0-2): 17, 3999, 12225, 4319, 13
```

[Passed a sanity check STC SMP post-refactoring](https://tests.stockfishchess.org/tests/view/69503997572093c1986d763a):
```
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 46728 W: 12178 L: 11863 D: 22687
Ptnml(0-2): 82, 5093, 12685, 5436, 68
```

(The large gain of the first STC was probably a fluke, and this result
is more reasonable!)

After shared correction history, Viz suggested we try sharing other
histories, especially `pawnHistory`. As far as we're aware, sharing
history besides correction history (like Caissa does) is novel. The
implementation follows the same pattern as shared correction history –
the size of the history table is scaled with
`next_power_of_two(threadsInNumaNode)` and the entry is prefetched in
`do_move`.

A bit of refactoring was done to accommodate this new history. Note that
we prefetch `&history->pawn_entry(*this)[pc][to]` rather than
`&history->pawn_entry(*this)` because unlike the other entries, each
entry contains multiple cache lines.

closes #6498

Bench: 2503391

Co-authored-by: Michael Chaly <Vizvezdenec@gmail.com>
Clang pretends to be GCC, but is enraged by `-Wstack-usage`:

closes #6499

No functional change
Fixes #6505

Missing initialization seemingly resulting in side effects, as discussed in the issue.

Credit to Sopel for spotting the bug.

PR used as a testcase for CoPilot, doing the right thing #6478 (comment)

closes #6511

No functional change
Use `_POSIX_C_SOURCE` to check for `PTHREAD_MUTEX_ROBUST` support. The latter is an enum value, not a preprocessor macro, so it cannot be tested with `#ifdef`.
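
The pitfall can be reproduced in isolation. In this sketch `FEATURE_ROBUST` stands in for an enumerator such as `PTHREAD_MUTEX_ROBUST` and `FEATURE_VERSION` for a version macro such as `_POSIX_C_SOURCE`. Both names and the threshold `200809L` are assumptions for illustration, not the exact condition used in the patch.

```cpp
// Hypothetical stand-ins, see the paragraph above.
enum { FEATURE_ROBUST = 1 };
#define FEATURE_VERSION 200809L

// The preprocessor never sees enumerators, so this branch is not taken even
// though FEATURE_ROBUST is a perfectly valid name in the compiled code.
#ifdef FEATURE_ROBUST
constexpr bool detectedViaEnum = true;
#else
constexpr bool detectedViaEnum = false;
#endif

// Testing a version macro works, because macros are visible to #if.
#if defined(FEATURE_VERSION) && FEATURE_VERSION >= 200809L
constexpr bool detectedViaVersion = true;
#else
constexpr bool detectedViaVersion = false;
#endif

static_assert(!detectedViaEnum && detectedViaVersion,
              "enumerators are invisible to #ifdef, version macros are not");
```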

closes #6510

No functional change
This patch dampens main history to 3/4 of its value for all possible moves
at the start of ID loop, making it partially refresh with every new root
position.
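
A sketch of the dampening step, assuming a table indexed by color and by a combined from/to square index. The `MainHistory` layout and the function name `dampen` are illustrative; only the 3/4 factor comes from the description.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative layout: 2 colors x (64 * 64) from/to pairs.
using MainHistory = std::array<std::array<std::int16_t, 64 * 64>, 2>;

// Scale every entry to 3/4 of its value (integer division truncates toward zero).
inline void dampen(MainHistory& history) {
    for (auto& perColor : history)
        for (auto& entry : perColor)
            entry = static_cast<std::int16_t>(entry * 3 / 4);
}

// Usage: called once before the iterative deepening loop starts.
inline void dampen_example() {
    MainHistory h{};
    h[0][0] = 1000;
    dampen(h);
    assert(h[0][0] == 750);
}
```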

Passed STC:
https://tests.stockfishchess.org/tests/view/694e33ff572093c1986d7234
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 115520 W: 30164 L: 29735 D: 55621
Ptnml(0-2): 395, 13192, 30192, 13551, 430

Passed LTC:
https://tests.stockfishchess.org/tests/view/6950cbe6572093c1986d816c
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 63672 W: 16480 L: 16114 D: 31078
Ptnml(0-2): 46, 6524, 18329, 6892, 45

closes #6504

bench 2710946
passed STC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 71968 W: 18833 L: 18489 D: 34646
Ptnml(0-2): 192, 7310, 20643, 7640, 199
https://tests.stockfishchess.org/tests/view/69509e5c572093c1986d7a0a

closes #6512

No functional change
WINE_PATH started as a Wine-specific knob, but it’s now used more generally
as a command prefix to run the built engine under wrappers
like Intel SDE, qemu-user, etc.

- Add RUN_PREFIX as the supported “run wrapper/prefix” variable in Makefile
- Set WINE_PATH as a deprecated alias
- Update CI and scripts to use RUN_PREFIX

closes #6500

No functional change
This code path is never taken for vector sizes >= 512, so we can simplify it.

closes #6501

No functional change
closes #6509

No functional change
Happy New Year!

closes #6514

No functional change
Ensure that thread-local data is created within the correct NUMA
context, so that thread stacks or thread-local storage are allocated
to proper NUMA nodes.

refs #6516

closes #6518

No functional change
closes #6523

No functional change
The bestValue can sometimes go down. This happens 2% of the time or so.
This fix stops it from decreasing.

Failed gainer STC:
LLR: -2.94 (-2.94,2.94) <0.00,2.00>
Total: 146176 W: 37930 L: 37976 D: 70270
Ptnml(0-2): 480, 17422, 37366, 17304, 516
https://tests.stockfishchess.org/tests/view/6953be19572093c1986da66a

Passed Non-regression LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 257796 W: 65662 L: 65683 D: 126451
Ptnml(0-2): 164, 28247, 72087, 28246, 154
https://tests.stockfishchess.org/tests/view/69554ff0d844c1ce7cc7e333

closes #6520
fixes #6519

Bench: 2477446
Recent changes to the Square enum (reducing it from int32_t to int8_t)
now allow the compiler to vectorize loops that were previously too wide
for targets below AVX-512. However, this vectorization which Clang
performs is not correct and causes a miscompilation.

Disable this vectorization.

This particular issue was noticeable with Clang 15 and Clang 19,
on avx2 as well as apple-silicon.

Ref: #6063
Original Clang Issue: llvm/llvm-project#80494

First reported by #6528, though misinterpreted.

closes #6529

No functional change
Only the one on line 158 is actually required, but it doesn't hurt to add constexpr where applicable here.
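
Assuming the change refers to `if constexpr`, the pattern looks roughly like this: for an unsigned `T`, a plain `value < 0` comparison is always false and draws a warning, while `if constexpr` discards that branch before it is instantiated. The helper `is_negative` is hypothetical.

```cpp
#include <type_traits>

// Hypothetical helper: true if `value` is negative.
// For unsigned T the signed branch is discarded at compile time, so the
// always-false comparison is never compiled.
template<typename T>
constexpr bool is_negative(T value) {
    if constexpr (std::is_signed_v<T>)
        return value < 0;
    else
        return false;
}

static_assert(is_negative(-1));
static_assert(!is_negative(1u));
```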

Warning was

"comparison of unsigned expression in '< 0' is always false"

closes #6530

No functional change
Compiles and Runs Stockfish on all supported gcc & clang compilers.
Only linux and avx2 currently.

closes #6533

No functional change
Disservin and others added 30 commits June 8, 2026 19:54
Example of the `using` aliases, which avoid mixed usage of std::uint/std::int and uint/int...
```cpp
using u64 = std::uint64_t;
using u32 = std::uint32_t;
using u16 = std::uint16_t;
using u8  = std::uint8_t;

using i64 = std::int64_t;
using i32 = std::int32_t;
using i16 = std::int16_t;
using i8  = std::int8_t;

using usize = std::size_t;
using isize = std::ptrdiff_t;

#if defined(__GNUC__) && defined(IS_64BIT)
__extension__ using u128 = unsigned __int128;
__extension__ using i128 = signed __int128;
#endif
```

closes #6874

No functional change
Lichess maintains some patches on top of SF dev to get it working with Emscripten. This PR moves some of these patches into SF and adds WASM to CI. It also adds a few changes in places where the x86 intrinsics don't cleanly map onto WebAssembly SIMD instructions; otherwise, we use Emscripten's x86 compatibility layer and take SSE4.1 code paths.

Summary of the compatibility changes:

- Define `wasm32` and `wasm32-relaxed-simd` targets.
    - We don't support wasm without SIMD; it'd be a waste of time.
- Add option to disable TBs
    - This is required because `tbprobe.cpp` pulls in `mmap`. This option can be used on any target, of course, but it's only enabled by default for wasm.
- Add compilation job + test to CI

And the changes for performance:

- Disable atomics for shared history on wasm
    - Atomics are always `seq_cst` there, which can be quite slow (even on x86, such stores are a locked `xchg [mem], reg`)
- Add SSE code path to `get_changed_pieces`, modeled after the AVX2 path
- `_mm_mulhi_epi16` has a complicated emulation sequence, so for the pairwise multiplication, use an approach similar to the NEON impl.
- `__int128` gets lowered to runtime functions on wasm, so use the fallback impl for `mul_hi64`
- V8 does a poor job with the NNZ finding, so use a slightly different sequence there
- Add relaxed simd support for `m128_dpbusd`.
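
For reference, a fallback `mul_hi64` that avoids `__int128` can be written with four 32x32-bit partial products. This is the standard schoolbook decomposition, shown as a sketch; it is not necessarily identical to the fallback in the source tree.

```cpp
#include <cstdint>

// High 64 bits of the 128-bit product a * b, using only 64-bit arithmetic.
constexpr std::uint64_t mul_hi64(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t aL = a & 0xFFFFFFFFu, aH = a >> 32;
    const std::uint64_t bL = b & 0xFFFFFFFFu, bH = b >> 32;
    const std::uint64_t c1 = (aL * bL) >> 32;               // carry out of the low word
    const std::uint64_t c2 = aH * bL + c1;                  // first cross term plus carry
    const std::uint64_t c3 = aL * bH + (c2 & 0xFFFFFFFFu);  // second cross term
    return aH * bH + (c2 >> 32) + (c3 >> 32);
}

static_assert(mul_hi64(1ULL << 63, 2) == 1);
```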

Some local perf figures (single-threaded speedtest):

```
wasm
Nodes/second               : 902523
sse4.1
Nodes/second               : 1155380
avx512icl
Nodes/second               : 1676184
```

Further avenues to explore:
- Optimize for performance under V8's experimental AVX revectorizer (currently it's about +10% in my testing, but could definitely be more)
- Branch hinting. For example, run bench while collecting branch frequency info, then inject it late in the WASM compilation pipeline. I tried this locally and it didn't help much, but maybe I'm missing something.
- PGO. Gives +1.5% NPS locally, but is hard to integrate with WASM compilation workflows

closes #6875

No functional change
This PR introduces the additional `RootMove` attribute `previousPV` so that scores and PVs we send to the GUI in MultiPV analysis always match. This allows us in particular to extend our guarantee of exact mate (and TB win/loss) scores having a complete PV (leading to checkmate in the correct number of plies) to all PV lines. Recall that master fails here, since partially searched root moves may send to the GUI the previous score with the current/modified PV. See #6784.

The PR also uses the new attribute to extend the followPV logic to the analysis of sidelines, building on the idea in #6813 by @joergoster.

Passed non-reg STC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 166880 W: 42357 L: 42282 D: 82241
Ptnml(0-2): 394, 18685, 45177, 18820, 364
https://tests.stockfishchess.org/tests/view/6a0dea55818cacc1db0abb6a

Failed non-reg LTC:
LLR: -2.97 (-2.94,2.94) <-1.75,0.25>
Total: 890520 W: 224168 L: 225282 D: 441070
Ptnml(0-2): 390, 91902, 261789, 90790, 389
https://tests.stockfishchess.org/tests/view/6a1143ad818cacc1db0ac14c

Opening as draft for discussion on how to proceed. In SinglePV analysis, the patch makes no functional change. But it may be a (small?) slowdown because of the increased size of `RootMove`. I am not sure if there is an elegant way to enrich the class only for MultiPV analysis (but the switch can happen at any time through the UCI interface), or to mitigate the speed penalty in some other way.

A local speedup test shows only a small slowdown on my system (but still high error bars):
```
sf_base =  1156928 +/-   1459 (95%)
sf_test =  1155885 +/-   1283 (95%)
diff    =    -1043 +/-   1777 (95%)
speedup = -0.09021% +/- 0.154% (95%)
```

The PR also adds the new MultiPV mate PV correctness check to the CI.

closes #6886

No functional change
Fixes #6881.

`timeout_decorator()` used a `ThreadPoolExecutor` context manager around blocking output waits. When `future.result(timeout=...)` timed out, leaving the context manager still waited for the worker thread to finish, so a blocked stdout read could keep the instrumented tests hanging past the configured timeout.

This change removes that executor wrapper for interactive Stockfish output waits. The harness now drains process output on a daemon reader thread, queues received lines, and applies the deadline directly while waiting for the next queued line. `TimeoutException` also initializes the base exception message so failures show useful text.

Validation:
- `python3 -m py_compile tests/testing.py tests/instrumented.py`
- local timeout smoke test: a 0.2s no-output wait raises in ~0.204s
- Stockfish smoke test: startup/`uciok` read succeeds, deliberate no-output wait raises in ~0.205s, engine exits 0
- `make -C src -j4 build`
- `../tests/signature.sh` -> `2814421`

closes #6882

No functional change
…s at compile time

Adds a `RelaxedAtomic` wrapper around either `T` or `std::atomic<T>` and a `USE_SLOPPY_ATOMICS` preprocessor define. The intent of this flag is to allow easy disabling of atomics on WASM, where even relaxed atomics are expensive because all atomics have `seq_cst` semantics.
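
A rough sketch of what such a wrapper could look like. The class name and the `USE_SLOPPY_ATOMICS` define come from the description; the member names `load` and `store` and everything else about the interface are assumptions.

```cpp
#include <atomic>
#include <cassert>

// Sketch only: the real interface is not shown in the text.
template<typename T>
class RelaxedAtomic {
   public:
    RelaxedAtomic() = default;
    RelaxedAtomic(T v) :
        value(v) {}

#ifdef USE_SLOPPY_ATOMICS
    // Plain storage: no atomic instructions at all (racy by design).
    T    load() const { return value; }
    void store(T v) { value = v; }

   private:
    T value{};
#else
    // Real atomics, but always with relaxed ordering.
    T    load() const { return value.load(std::memory_order_relaxed); }
    void store(T v) { value.store(v, std::memory_order_relaxed); }

   private:
    std::atomic<T> value{};
#endif
};

// Usage is identical in both configurations.
inline void relaxed_atomic_example() {
    RelaxedAtomic<int> counter(3);
    counter.store(counter.load() + 4);
    assert(counter.load() == 7);
}
```

Keeping the choice behind one compile-time switch means call sites do not need their own `#ifdef` blocks.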

Passed non regression STC
LLR: 2.99 (-2.94,2.94) <-1.75,0.25>
Total: 50624 W: 12976 L: 12776 D: 24872
Ptnml(0-2): 112, 5445, 14005, 5631, 119
https://tests.stockfishchess.org/tests/view/6a1f690e818cacc1db0ad2c7

Passed non-regression STC SMP
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 163696 W: 41514 L: 41438 D: 80744
Ptnml(0-2): 162, 18272, 44904, 18348, 162
https://tests.stockfishchess.org/tests/view/6a21fb97351b79f679cc44b3

Using this class for the TT also allows us to remove the TSAN suppressions, since the UB is fixed.

closes #6877

No functional change
Passed STC (https://tests.stockfishchess.org/tests/view/6a2893ad7c758d82accea129):
LLR: 3.20 (-2.94,2.94) <0.00,2.00>
Total: 23328 W: 6145 L: 5838 D: 11345
Ptnml(0-2): 50, 2463, 6346, 2740, 65

Instead of repeatedly doing the sum HalfKA + threats at the end, it's profitable to simply store one accumulator per side that combines them. This also avoids an extra load/store of an accumulator, and halves the cache footprint of the accumulators.

For full refreshes, we always compute both halfka and threats simultaneously. Any threat full refresh is always a halfka refresh because it occurs when the king crosses the center line, while halfka refreshes are required for ANY king move, so we don't need a separate detection path for threats.

I get about a 2.5% speedup locally with this, but I'd appreciate other ppl's measurements.

closes #6890

No functional change
Passed STC
https://tests.stockfishchess.org/tests/view/6a2682ce351b79f679cc47c5
LLR: 2.93 (-2.94,2.94) <0.00,2.00>
Total: 57248 W: 14730 L: 14399 D: 28119
Ptnml(0-2): 145, 6159, 15698, 6464, 158

Reordering operations in `do_move` allows us to effectively prefetch the TT entry earlier, since the piece moving helpers don't actually modify the position key.  I suspect that with threat inputs, `put_piece` and friends got a lot more expensive, and so this helps us a lot.

vondele's machine:

==== master ====
1 Nodes/second : 294311526
2 Nodes/second : 297068312
3 Nodes/second : 297418763
Average (over 3):  296266200
==== pfearly ====
1 Nodes/second : 303986449
2 Nodes/second : 304221719
3 Nodes/second : 305302969
Average (over 3):  304503712 (+2.78%)

Locally, `bench`:

Result of 200 runs
speedup         = +0.0158
P(speedup > 0) =  1.0000

As expected it helps even more in a large-hash, NUMA setting.

closes #6891

No functional change
In a recent CCC event, Stockfish (probably through no fault of its own) lost some games on time when it was winning and had already found the move that delivers checkmate.

This patch stops the search when TM is active and the main thread can be certain that it is impossible to find a better move, that is, if it has found (i) a mate-in-1, (ii) a mate-in-2 or (iii) a mated-in-1.

patch:
```
position fen 5K2/8/2qk4/2nPp3/3r4/6B1/B7/3R4 w - e6
go wtime 100000000 winc 100000000
info string Available processors: 0-7
info string Using 1 thread
info string NNUE evaluation using nn-71d6d32cb962.nnue (106MiB, (83248, 1024, 31, 32, 1))
info string Network replica 1: Shared memory.
info depth 1 seldepth 3 multipv 1 score mate 1 nodes 30 nps 30000 hashfull 0 tbhits 0 time 1 pv d5e6
bestmove d5e6
```

master:
```
position fen 5K2/8/2qk4/2nPp3/3r4/6B1/B7/3R4 w - e6
go wtime 100000000 winc 100000000
info string Available processors: 0-7
info string Using 1 thread
info string NNUE evaluation using nn-71d6d32cb962.nnue (106MiB, (83248, 1024, 31, 32, 1))
info string Network replica 1: Shared memory.
info depth 1 seldepth 3 multipv 1 score mate 1 nodes 30 nps 15000 hashfull 0 tbhits 0 time 2 pv d5e6
<snip>
info depth 245 seldepth 2 multipv 1 score mate 1 nodes 5886 nps 367875 hashfull 0 tbhits 0 time 16 pv d5e6
bestmove d5e6
```

Note: In MultiPV analysis (extremely rare with TM active), we take the point of view that the user would like to continue to search until none of the PVs can be improved anymore. This means we only stop if the worst searched line is at least a mate-in-2, or if the best searched line is a mated-in-1.
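
The stopping conditions can be sketched as two predicates. The score encoding (`VALUE_MATE = 32000`, mate scores counted in plies from the root) and all function names are assumptions for illustration; under that encoding mate-in-1 is 1 ply, mate-in-2 is 3 plies and mated-in-1 is 2 plies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Value = int;
constexpr Value VALUE_MATE = 32000;  // assumed encoding, plies counted from the root
constexpr Value mate_in(int ply) { return VALUE_MATE - ply; }
constexpr Value mated_in(int ply) { return -VALUE_MATE + ply; }

// Single-PV case: stop on mate-in-1, mate-in-2 or mated-in-1.
constexpr bool cannot_improve(Value best) {
    return best >= mate_in(3) || best <= mated_in(2);
}

// MultiPV case: stop only if the worst searched line is at least a
// mate-in-2, or if the best searched line is a mated-in-1.
inline bool cannot_improve_multipv(const std::vector<Value>& scores) {
    assert(!scores.empty());
    const auto [worst, best] = std::minmax_element(scores.begin(), scores.end());
    return *worst >= mate_in(3) || *best <= mated_in(2);
}
```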

closes #6879

No functional change
The FEN validation check intended to reject pawns on the first or eighth rank uses the `Rank` enum values in a bitwise OR operation:

`if (pieces(PAWN) & (RANK_1 | RANK_8))`

`RANK_1 | RANK_8` evaluates to the integer `0 | 7 == 7` instead of a bitboard, so the expression only tests squares A1, B1 and C1. As a result, unsupported positions with pawns elsewhere on the first or eighth rank are silently accepted. For instance, `position fen 3P3k/8/8/8/8/8/8/3K4 w - - 0 1` is accepted even though the pawn on d8 makes the position unsupported.

Use the `Rank1BB | Rank8BB` bitboard constants so that any pawn on the first or eighth rank is correctly rejected.
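
The difference between the two masks can be checked at compile time. This sketch assumes the usual square numbering (a1 = 0, b1 = 1, ..., h8 = 63) and uses trimmed stand-ins for the real types. `square_bb` here takes a file and a rank, which is a simplification.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;
enum Rank : int { RANK_1, RANK_2, RANK_3, RANK_4, RANK_5, RANK_6, RANK_7, RANK_8 };

constexpr Bitboard Rank1BB = 0xFFULL;
constexpr Bitboard Rank8BB = Rank1BB << (8 * 7);

// Simplified helper: bitboard with the single square (file, rank) set.
constexpr Bitboard square_bb(int file, int rank) { return 1ULL << (8 * rank + file); }

constexpr Bitboard pawnOnD8 = square_bb(3, RANK_8);

// Buggy mask: 0 | 7 == 7 == 0b111, i.e. only the squares a1, b1 and c1.
constexpr Bitboard buggyMask = Bitboard(RANK_1 | RANK_8);
// Fixed mask: all 16 squares of the first and eighth ranks.
constexpr Bitboard fixedMask = Rank1BB | Rank8BB;

static_assert(buggyMask == 7);
static_assert((pawnOnD8 & buggyMask) == 0);  // the pawn on d8 slips through
static_assert((pawnOnD8 & fixedMask) != 0);  // the pawn on d8 is rejected
```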

closes #6887

No functional change
Fixes an error introduced in 278a755 #6891

Also enables more checking in debug builds

closes #6894

No functional change
This version handles aborting engine processes more gracefully. It
also tests the engine prior to use, as the process is nevertheless not
fully robust.

closes #6893

No functional change
As pointed out in
0111d11#r188197707,
the parameter type should be `T` not `int`

closes #6896

No functional change
Somehow a few size_t slipped through #6874

closes #6897

No functional change
After merging the HalfKA and Threats accumulators (7c7fe32) and the
subsequent removal of the double-incremental/fused update, a number of
NNUE helpers and fields became unreachable. Each was verified to have
zero callers/readers across the source tree:

- FusedUpdateData logic in FullThreats: the fused-update branch of
  append_changed_indices and the FusedUpdateData parameter are unused;
  the accumulator update no longer passes fused data.
- FullThreats::requires_refresh: never called. The live king-bucket
  refresh check is HalfKAv2_hm::requires_refresh (PSQFeatureSet), used
  in nnue_accumulator.
- HalfKAv2_hm::append_active_indices: never called. The live
  active-index builder is FullThreats::append_active_indices
  (ThreatFeatureSet).
- DirtyThreats::us, prevKsq and ksq: written in do_move but only read by
  the now-removed FullThreats::requires_refresh. Removing them also
  drops three stores from the do_move path.
- Unused feature Name constants and the unused FtOneVal / HiddenMaxVal
  constants in nnue_common.h.
- Two stale feature-header banner comments.

closes #6898

No functional change
Reuse the already-computed ray instead of calling ray_pass_bb a second time with identical arguments.

closes #6899

No functional change
Passed 8th 5+0.05 https://tests.stockfishchess.org/tests/view/6a0bfdb46524d21ee79b879b
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 23472 W: 6103 L: 5823 D: 11546
Ptnml(0-2): 26, 2501, 6403, 2779, 27

Passed 8th 20+0.2 https://tests.stockfishchess.org/tests/view/6a0c87196524d21ee79b885c
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 44692 W: 11640 L: 11319 D: 21733
Ptnml(0-2): 12, 4418, 13169, 4731, 16

Passed 16th 5+0.05 https://tests.stockfishchess.org/tests/view/6a20533f818cacc1db0ad32b
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 14720 W: 3910 L: 3642 D: 7168
Ptnml(0-2): 6, 1434, 4223, 1680, 17

Passed 64th 10+0.1 https://tests.stockfishchess.org/tests/view/6a20ae8e818cacc1db0ad369

LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 34096 W: 8889 L: 8607 D: 16600
Ptnml(0-2): 4, 2974, 10808, 3260, 2

Continuation history is fixed size, so there's much more false sharing and a larger speed loss here at high thread counts, unfortunately.

vondele's machine, 4x70

```
==== master ====
Average (over 3):  294528128
==== shared-conthist ====
Average (over 3):  282364217 (~4% slowdown)
```

my machine, 8x32
```
Nodes/second               : 243157385
Nodes/second               : 228374554 (~6% slowdown)
```

Evidently it still gains at 64th, but here are a few follow-up ideas to try to get the speed back:
- Add padding in `PieceToHistory` stats so there's less false sharing.
- Subdivide continuation history more finely than shared correction history.
- Scramble the `PieceToHistory` indexing so there's less false sharing
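
As an illustration of the padding idea only, the sketch below aligns a hypothetical entry type to a cache line so that two threads writing to neighbouring entries never touch the same line. The 64-byte line size and the `PaddedEntry` type are assumptions; nothing here reflects the actual `PieceToHistory` layout.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t CacheLineSize = 64;  // assumption: typical x86-64 line size

// Hypothetical padded entry: alignas rounds the size up to a full cache line.
struct alignas(CacheLineSize) PaddedEntry {
    std::int16_t value = 0;
};

static_assert(sizeof(PaddedEntry) == CacheLineSize);
static_assert(alignof(PaddedEntry) == CacheLineSize);
```

Padding every 16-bit entry this way would multiply memory use by 32, so in practice the padding would more plausibly be applied to larger units such as whole rows of the table.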

closes #6905

No functional change
This PR fixes the remaining corner cases in the treatment of MultiPV
mated-in PVs, as well as an oversight in #6886. See the discussion in

In particular:
1. `previousScore` and `previousPV` can only be trusted, if that
   rootmove was indeed fully searched in the previous iteration.
2. A move beyond `pvIdx` (that was hence not fully searched) may have an
   exact loss score that cannot be trusted. So if a MultiPV search gets
   aborted while searching `pvIdx`, we mark all the following loss
   scores as bounds.
3. The forgotten mate logic also got broken in #6886, because the
   `previousPV` of the forgotten mate's bestmove can only be trusted if
   that move was fully searched in the previous iteration, something
   that is not guaranteed. So we now store both `lastBestMoveScore` and
   `lastBestMovePV`.

Here are some scenarios for MultiPV = 8 that explain how master was broken:
1. Move A with an inexact mated-in-2 score from the previous iteration
   (so outside the top8 moves) gets flushed into the top8 moves for the
   current iteration, because the previous top8 move B is now scored as
   a mated-in-1. Hence we cannot trust `previousScore` or `previousPV`
   for move A, if the search gets aborted while it is being searched.
2. In the scenario above, move B has `Score != -VALUE_INFINITE` and a
   mated-in-1 score, which cannot be trusted as it was not fully
   searched.
3. Iteration N has bestmove A with mated-in-10, which gets recorded in
   `lastBestMoveScore` (renamed from `lastIterationScore`). Iteration N+1
   forgets the mate and has bestmove B with a cp score; move A may have
   an incomplete PV, and may even have a non-mate score. Iteration N+2
   gets aborted, and in trying to remember the forgotten mate, master
   recovers the `previousScore` and `previousPV` of move A, which may be
   neither mate nor complete.

Passed STC non-reg:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 69728 W: 17748 L: 17573 D: 34407
Ptnml(0-2): 143, 7571, 19274, 7720, 156
https://tests.stockfishchess.org/tests/view/6a2c40c60d5d4b19d08052f2

closes #6906

No functional change
We can straightforwardly parallelize the `check_universal.sh` script, which takes quite a bit of time for the x86 builds.

closes #6919

No functional change
Worker::do_move computes the successor hash key via the new
Position::key_after(m) and prefetches the TT entry one full do_move
earlier than the existing prefetch in Position::do_move. key_after does
not model castling, en passant or promotion keys exactly; for these rare
moves the prefetch simply lands on an unused line.
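
The approximate successor key can be sketched as a handful of XORs. The parameter names are hypothetical: the caller is assumed to have already looked up the Zobrist keys for the moving piece on its origin and destination squares, for any captured piece, and for the side to move.

```cpp
#include <cstdint>

using Key = std::uint64_t;

// movedFrom / movedTo : keys of the moving piece on its origin / destination square
// captured            : key of the captured piece on the destination square, or 0
// side                : key toggled whenever the side to move changes
// Castling, en passant and promotion are deliberately not modelled.
constexpr Key key_after(Key current, Key movedFrom, Key movedTo, Key captured, Key side) {
    return current ^ side ^ movedFrom ^ movedTo ^ captured;
}

static_assert(key_after(0x10, 0x1, 0x2, 0, 0x8) == 0x1B);
```

Since the result is only used as a prefetch hint, an inexact key costs a wasted prefetch but never affects correctness.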

`key_after` had been around since 2014 (82d065b0) and was removed in #5770. Adding back `prefetch_key` helps in common, normal moves at the cost of extra compute.

Speedup (PGO vs PGO, interleaved paired bench, n=48 pairs, Apple M2
Pro / apple-silicon): +0.69% [0.47, 0.91]

Passed STC:
https://tests.stockfishchess.org/tests/view/6a291f8d7c758d82accea17f
LLR: 4.24 (-2.94,2.94) <0.00,2.00>
Total: 473504 W: 121250 L: 120228 D: 232026
Ptnml(0-2): 1112, 51137, 131251, 52121, 1131

No functional change

closes #6911

No functional change
If `improving` is true, reduce the depth of the probCut search by 1.
Passed STC:
https://tests.stockfishchess.org/tests/view/6a2a6ceb17167cbe7100a909
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 119136 W: 30591 L: 30158 D: 58387
Ptnml(0-2): 323, 13831, 30845, 14228, 341
Passed LTC:
https://tests.stockfishchess.org/tests/view/6a2d65a40d5d4b19d080537f
LLR: 2.96 (-2.94,2.94) <0.50,2.50>
Total: 186552 W: 47377 L: 46755 D: 92420
Ptnml(0-2): 89, 20047, 52395, 20643, 102

closes #6914

Bench: 2422062
This network is further trained on a new BT4 distillation stage, fine tuning on ~2 billion positions relabeled with the value head output of `BT4-tf13tune.pb.gz`. The dataset can be found at https://huggingface.co/datasets/xushawn/test80-bt4-relabel. A modified branch of lc0 was used to derive this data: https://github.com/xu-shawn/lc0/tree/relabel_dual_stream_test

2 billion positions represent a tiny subset of the total training data, and BT4 relabeling is inherently computationally expensive. I expect a lot more gains as more data are relabeled, but it will likely require coordinated community effort. Everyone is welcome to contribute, and yl25946 has made a spreadsheet to track progress: https://docs.google.com/spreadsheets/d/1yanofhusEzDg8ZnurAw799ikoTY6GcqsNMYfpswOIbw/edit.

Special thanks to Viren6, who performed policy/value distillation experiments on Monty, and created the lc0 distillation fork that the current relabeler is based on; yl25946 for proposing the idea of large network distillations back in February 2025, running distillation experiments on the HL4096 network, and working on fine tuning attempts; vondele for nettest and suggesting the fine-tuning approach; and many others on the knowledge distillation thread in the SF Discord #ideas channel.

nettest PR: vondele/nettest#369

Ongoing STC:
LLR: -0.01 (-2.94,2.94) <0.00,2.00>
Total: 72224 W: 18891 L: 18784 D: 34549
Ptnml(0-2): 336, 8437, 18332, 8798, 209
https://tests.stockfishchess.org/tests/view/6a3ae7913036e45021aeb4a0

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 25110 W: 6566 L: 6288 D: 12256
Ptnml(0-2): 27, 2625, 6957, 2935, 11
https://tests.stockfishchess.org/tests/view/6a3b73513036e45021aeb51e

Passed VLTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 18544 W: 4924 L: 4658 D: 8962
Ptnml(0-2): 5, 1730, 5533, 2002, 2
https://tests.stockfishchess.org/tests/view/6a3bbe233036e45021aeb56e

closes #6924

Bench: 2710209

Co-authored-by: Li Ying <121075683+yl25946@users.noreply.github.com>
Co-authored-by: Viren6 <94880762+Viren6@users.noreply.github.com>
When the king is in check, a pseudo-legal move must be a valid evasion. Instead of duplicating the checking rules for king and non-king moves, we can leverage the existing MoveList<EVASIONS> class.
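
A toy sketch of that delegation, assuming nothing about the real classes: `ToyPosition`, `MOVE_NONE`, and the two move vectors are hypothetical stand-ins. The `evasions` vector plays the role of iterating `MoveList<EVASIONS>`, and `nonEvasions` stands in for the usual piece-by-piece checks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

using Move = int;              // hypothetical stand-in for the engine's move type
constexpr Move MOVE_NONE = 0;  // hypothetical sentinel

struct ToyPosition {
    bool              inCheck;
    std::vector<Move> evasions;     // stands in for MoveList<EVASIONS>
    std::vector<Move> nonEvasions;  // stands in for the regular checks
};

static bool contains(const std::vector<Move>& list, Move m) {
    return std::find(list.begin(), list.end(), m) != list.end();
}

bool pseudo_legal(const ToyPosition& pos, Move m) {
    assert(m != MOVE_NONE);
    // In check: the move is acceptable only if the evasion generator produces it,
    // so no separate king and non-king checking rules are needed here.
    if (pos.inCheck)
        return contains(pos.evasions, m);
    return contains(pos.nonEvasions, m);
}
```

The point of the design is that the evasion rules live in one place, the generator, and the validity test simply asks whether the generator would have produced the move.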

Passed non-reg STC:
LLR: 2.98 (-2.94,2.94) <-1.75,0.25>
Total: 187360 W: 47579 L: 47524 D: 92257
Ptnml(0-2): 418, 20266, 52263, 20309, 424
https://tests.stockfishchess.org/tests/view/6a28aa5b7c758d82accea13c

closes #6902

No functional change
The following have zero call sites repo-wide:

    SearchManager::id: never initialized, read, or written.
    Search::Worker::elapsed_time(): never called. PV output uses tm.elapsed_time() (TimeManager) directly. (callers removed in 25361e5)
    MovePicker::begin()/end(): unused private accessors. (callers removed in 8c2d21f)

closes #6909

No functional change
do_null_move copies the whole StateInfo from the previous state, which leaves capturedPiece holding the piece captured by the last real move, so inside the null-move subtree captured_piece() reports a stale capture. The priorCapture consumers in search are all guarded by prevSq != SQ_NONE or (ss-1)->currentMove.is_ok(), which are false at the null-move child, but the stalemate verification gate in qsearch reads captured_piece() unguarded and can be spuriously triggered by the stale value.

Clear the field, since a null move captures nothing.
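
The effect can be reproduced in a stripped-down model. This is a sketch only: `ToyPosition` and the reduced `StateInfo` below are assumptions that keep just the two fields needed to show the stale value, and they are not the engine's real definitions.

```cpp
#include <cassert>

enum Piece { NO_PIECE, W_PAWN, B_KNIGHT };  // tiny subset, for illustration only

struct StateInfo {
    Piece      capturedPiece = NO_PIECE;
    StateInfo* previous      = nullptr;
};

struct ToyPosition {
    StateInfo* st;

    Piece captured_piece() const { return st->capturedPiece; }

    void do_null_move(StateInfo& newSt) {
        assert(&newSt != st);
        newSt               = *st;       // whole-struct copy: inherits the stale capture
        newSt.previous      = st;
        newSt.capturedPiece = NO_PIECE;  // the fix: a null move captures nothing
        st                  = &newSt;
    }

    void undo_null_move() {
        assert(st->previous != nullptr);
        st = st->previous;  // the parent state is left untouched
    }
};
```

Without the `NO_PIECE` assignment, `captured_piece()` in the null-move child would still report the piece taken by the last real move, which is the stale value an unguarded reader would pick up.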

Passed non-regression STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 82784 W: 21172 L: 21011 D: 40601
Ptnml(0-2): 194, 8976, 22923, 9073, 226
https://tests.stockfishchess.org/tests/view/6a2b5b356b4aa63ddbf31518

Passed non-regression LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 118134 W: 30007 L: 29891 D: 58236
Ptnml(0-2): 66, 11856, 35108, 11970, 67
https://tests.stockfishchess.org/tests/view/6a2c4c9c0d5d4b19d0805301

closes #6910

No functional change
Passed simplification STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 206368 W: 52592 L: 52560 D: 101216
Ptnml(0-2): 537, 24201, 53669, 24247, 530
https://tests.stockfishchess.org/tests/view/6a2c5ab00d5d4b19d0805326

Passed simplification LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 109944 W: 27839 L: 27709 D: 54396
Ptnml(0-2): 61, 11938, 30844, 12068, 61
https://tests.stockfishchess.org/tests/view/6a2dc6e70d5d4b19d08053aa

closes #6913

Bench: 2767133
Deduplicating Color-Specific Piece Validation.

The validation checks for the number of pawns and additional promoted pieces are duplicated for WHITE and BLACK. We can combine this logic into a single range-based for loop over both colors.
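
The shape of that refactor can be sketched with a toy example. `PieceCounts`, `counts_are_plausible`, and the exact limits used here are assumptions for illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <initializer_list>

enum Color { WHITE, BLACK };

struct PieceCounts {
    int pawns, knights, bishops, rooks, queens;
};

// One loop body serves both colors instead of two copy-pasted blocks.
bool counts_are_plausible(const std::array<PieceCounts, 2>& byColor) {
    for (Color c : {WHITE, BLACK})
    {
        const PieceCounts& n = byColor[c];
        assert(n.pawns >= 0);

        // Pieces beyond the starting set must have come from promotions.
        int promoted = std::max(n.knights - 2, 0) + std::max(n.bishops - 2, 0)
                     + std::max(n.rooks - 2, 0) + std::max(n.queens - 1, 0);

        // Each promotion consumes a pawn, so pawns plus promoted pieces cannot exceed 8.
        if (n.pawns > 8 || promoted > 8 - n.pawns)
            return false;
    }
    return true;
}
```

Any later change to the rule then has to be made once rather than once per color, which removes the risk of the two copies drifting apart.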

closes #6922

No functional change
closes #6926

No functional change
Add `test80-2024-01-jan-2tb7p.min-v2.v6.relabel.binpack` to the distillation fine tuning stage, an additional 3.5B (2.9B non-skipped) positions.

nettest PR: vondele/nettest#375

Passed STC (vs #6924):
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 57952 W: 15000 L: 14656 D: 28296
Ptnml(0-2): 164, 6651, 15003, 6993, 165
https://tests.stockfishchess.org/tests/view/6a3cca103036e45021aeb6f8
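
As a rough cross-check of summaries like the one above, the raw score and a logistic Elo estimate can be recomputed from the W/L/D or Ptnml counts. This is a sketch with hypothetical helper names. It is not fishtest's code, and it does not reproduce the LLR or any error bars. It assumes that Ptnml(0-2) lists the number of game pairs scoring 0, 0.5, 1, 1.5, and 2 points.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Fraction of available points scored, from win/loss/draw counts.
double score_from_wld(long w, long l, long d) {
    assert(w + l + d > 0);
    return (w + 0.5 * d) / double(w + l + d);
}

// Same fraction from pentanomial counts: index i is a game pair worth i/2 points.
double score_from_ptnml(const std::array<long, 5>& p) {
    long   pairs  = 0;
    double points = 0.0;
    for (int i = 0; i < 5; ++i)
    {
        pairs += p[i];
        points += 0.5 * i * p[i];
    }
    assert(pairs > 0);
    return points / (2.0 * pairs);  // each pair offers 2 points
}

// Standard logistic conversion from expected score to an Elo difference.
double logistic_elo(double score) {
    assert(score > 0.0 && score < 1.0);
    return -400.0 * std::log10(1.0 / score - 1.0);
}
```

For the STC above (W: 15000 L: 14656 D: 28296, Ptnml 164, 6651, 15003, 6993, 165), both routes give the same score, and the logistic estimate comes out at roughly +2 Elo.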

Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 81456 W: 21265 L: 20858 D: 39333
Ptnml(0-2): 52, 8630, 22958, 9035, 53
https://tests.stockfishchess.org/tests/view/6a3dbe203036e45021aeb828

closes #6929

Bench: 2703604
This network is trained by adding the following binpacks, relabeled by @vondele and totaling 74B positions, to the distillation fine-tuning stage:
```
  - vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_0.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_1.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_2.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_3.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_4.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_0.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_1.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_2.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_3.relabel-BT4-tf13tune.binpack
  - vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_4.relabel-BT4-tf13tune.binpack
```

The relabeling effort was completed while this patch was being tested, and a full training run is under way. Thanks to @vondele, @anematode, @Disservin, @yl25946, and everyone else who contributed to the process.

Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 24512 W: 6500 L: 6201 D: 11811
Ptnml(0-2): 69, 2772, 6300, 3021, 94
https://tests.stockfishchess.org/tests/view/6a40b7083036e45021aebbd6

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 15708 W: 4232 L: 3960 D: 7516
Ptnml(0-2): 8, 1572, 4420, 1848, 6
https://tests.stockfishchess.org/tests/view/6a413d293036e45021aebc84

nettest PR: vondele/nettest#388

closes #6932

Bench: 2102535

Co-authored-by: Joost VandeVondele <Joost.VandeVondele@gmail.com>
Simplify the eval by replacing the ratio between the psqt and positional terms with a plain addition.
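
In outline, the change has the following shape. The function names and the weight parameters are hypothetical; the commit message only states that a ratio between the two terms was replaced by a sum, so no actual weights are assumed here.

```cpp
#include <cassert>

// Before (shape only): a weighted blend of the two network outputs.
// wPsqt, wPositional, and denom are placeholders, not values from the patch.
int blended_eval(int psqt, int positional, int wPsqt, int wPositional, int denom) {
    assert(denom != 0);
    return (wPsqt * psqt + wPositional * positional) / denom;
}

// After: the two terms are simply added.
int simplified_eval(int psqt, int positional) {
    return psqt + positional;
}
```

When both weights equal the denominator, the blended form reduces to the plain sum, so the simplification amounts to dropping a small reweighting between the two terms.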

Passed Nonreg STC:
https://tests.stockfishchess.org/tests/view/6a3eaac63036e45021aeb937
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 207392 W: 53521 L: 53489 D: 100382
Ptnml(0-2): 585, 24412, 53748, 24288, 663

Passed Nonreg LTC:
https://tests.stockfishchess.org/tests/view/6a423666f97ff95f7879508e
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 27198 W: 7200 L: 6989 D: 13009
Ptnml(0-2): 12, 2794, 7779, 2999, 15

closes #6934

Bench: 2067208