191023#45
Open
amchess wants to merge 1504 commits into
Passed non regression STC: https://tests.stockfishchess.org/tests/view/693e642c46f342e1ec20f68d LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 107968 W: 28080 L: 27937 D: 51951 Ptnml(0-2): 381, 12708, 27626, 12925, 344 Passed non regression LTC: https://tests.stockfishchess.org/tests/view/693ff10c46f342e1ec20fa6a LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 334266 W: 85271 L: 85370 D: 163625 Ptnml(0-2): 179, 36395, 94086, 36292, 181 closes #6484 Bench: 2987379
Passed STC: LLR: 2.93 (-2.94,2.94) <-1.75,0.25> Total: 81888 W: 21336 L: 21166 D: 39386 Ptnml(0-2): 284, 9438, 21342, 9584, 296 https://tests.stockfishchess.org/tests/view/692ada47b23dfeae38cffce5 Passed LTC: LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 107328 W: 27534 L: 27404 D: 52390 Ptnml(0-2): 55, 11390, 30659, 11490, 70 https://tests.stockfishchess.org/tests/view/692d7a01b23dfeae38d011ab closes #6467 Bench: 3006182
We did quite a few tests because this is a pretty involved change with unknown scaling behavior, but results are decent. [STC 10+0.1 1th, non-regression](https://tests.stockfishchess.org/tests/live_elo/6941ce3b46f342e1ec210180) ``` LLR: 2.93 (-2.94,2.94) <-1.75,0.25> Total: 83200 W: 21615 L: 21452 D: 40133 Ptnml(0-2): 247, 9064, 22844, 9169, 276 ``` [STC 5+0.05 8th](https://tests.stockfishchess.org/tests/live_elo/693dc38346f342e1ec20f555) ``` LLR: 3.48 (-2.94,2.94) <0.00,2.00> Total: 58536 W: 15067 L: 14688 D: 28781 Ptnml(0-2): 87, 6474, 15781, 6825, 101 ``` [LTC 20+0.2 8th](https://tests.stockfishchess.org/tests/live_elo/693f2afb46f342e1ec20f847) ``` LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 27716 W: 7211 L: 6925 D: 13580 Ptnml(0-2): 8, 2674, 8207, 2962, 7 ``` [LTC 10+0.1 64th](https://tests.stockfishchess.org/tests/live_elo/694003aa46f342e1ec20fac4): ``` LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 16918 W: 4439 L: 4182 D: 8297 Ptnml(0-2): 3, 1493, 5213, 1744, 6 ``` [NUMA test, 5+0.05 256th](https://tests.stockfishchess.org/tests/view/6941ee4e46f342e1ec210203) ``` LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 7124 W: 1910 L: 1678 D: 3536 Ptnml(0-2): 0, 560, 2211, 790, 1 ``` [LTC 60+0.6 64th](https://tests.stockfishchess.org/tests/live_elo/6940a85346f342e1ec20fcde): ``` LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 15504 W: 4045 L: 3826 D: 7633 Ptnml(0-2): 0, 1002, 5530, 1219, 1 ``` Bonus (courtesy of Viz): The 1 double kill in this last test was master blundering a cool mate in 3: https://lichess.org/jyNZuRl4 Basically the idea here is to share correction history between threads. That way, T1 can use the correction values produced by T2, which already searched positions with that pawn structure etc., so that T1 can search more efficiently. The table size per thread is about the same, so we shouldn't get a large increase in hash collisions; in fact, I'd expect a lower collision rate overall. 
Although I came up with and implemented the idea independently, [Caissa](https://github.com/Witek902/Caissa) was the first engine to implement corrhist sharing (and corrhist in the first place) – this idea is not completely novel. The table size is rounded to a power of two. In particular, it's `65536 * nextPowerOfTwo(threadCount)`. That way, the indexing operation becomes an AND of the key bits with a mask, rather than something more expensive (e.g., a `mul_hi64`-style approach or a modulo). The updates are racy, like the TT, but because `entry` is hoisted into a register, there's no risk of writing back a value that's out of the designated range `[-D, D]`. Various attempts at rewriting using atomics led to substantial slowdowns, so we begrudgingly suppressed these functions in ThreadSanitizer, but at some point we'd like to make this better. We allocate one shared correction history per NUMA node, because the penalty associated with crossing nodes is substantial – I get a 40% hit with NPS=4 (four NUMA nodes per socket) and 256 threads, which is intolerable. With separate tables per NUMA node I get a 6% penalty in nodes per second, which isn't ideal but is apparently compensated for. closes #6478 Bench: 2690604 Co-authored-by: Disservin <disservin.social@gmail.com>
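To make the sizing and indexing concrete, here is a minimal C++ sketch, assuming a plain `int16_t` table and a gravity-style update bounded by a hypothetical `D = 1024`. The class name, `D` and the update formula are illustrative assumptions, not Stockfish's code; the sketch only shows the `65536 * nextPowerOfTwo(threadCount)` sizing, the AND-mask indexing, and a write computed from a local copy and clamped to `[-D, D]`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical sketch; names, D and the update formula are assumptions.
constexpr int         D         = 1024;   // assumed bound: entries stay in [-D, D]
constexpr std::size_t kBaseSize = 65536;  // per-thread base size from the text

constexpr std::size_t next_power_of_two(std::size_t n) {
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

class SharedCorrHist {
   public:
    explicit SharedCorrHist(std::size_t threadCount) :
        table(kBaseSize * next_power_of_two(threadCount), 0),
        mask(table.size() - 1) {}

    // Power-of-two size: indexing is a single AND instead of a modulo.
    std::size_t index(std::uint64_t key) const { return key & mask; }

    int get(std::uint64_t key) const { return table[index(key)]; }

    // No synchronization (mirrors the racy TT-style updates described above).
    // The stored value is computed from a local copy and clamped to [-D, D].
    void update(std::uint64_t key, int bonus) {
        std::int16_t& slot  = table[index(key)];
        int           entry = slot;  // read once into a local
        const int     b     = std::clamp(bonus, -D, D);
        entry += b - entry * std::abs(b) / D;  // gravity-style update
        slot = static_cast<std::int16_t>(std::clamp(entry, -D, D));
    }

    std::size_t size() const { return table.size(); }

   private:
    std::vector<std::int16_t> table;
    std::size_t               mask;
};
```

Because the size is a power of two, `key & mask` selects the same slot that `key % size` would, without a division. The sketch adds no synchronization, so it carries the same data-race caveat the commit message describes.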
These moves, being late in move ordering, probably already have pretty bad stats anyway. Passed STC: https://tests.stockfishchess.org/tests/view/6943bcd546f342e1ec210e25 LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 96704 W: 25206 L: 24798 D: 46700 Ptnml(0-2): 357, 11244, 24767, 11602, 382 Passed LTC: https://tests.stockfishchess.org/tests/view/6946a8723c8768ca450722f0 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 89814 W: 23193 L: 22770 D: 43851 Ptnml(0-2): 53, 9532, 25321, 9941, 60 bench 2717363 closes #6485 Bench: 2791988
Passed STC: LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 211776 W: 54939 L: 54911 D: 101926 Ptnml(0-2): 714, 24971, 54484, 25011, 708 https://tests.stockfishchess.org/tests/view/6938971875b70713ef796b70 Passed LTC: LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 216774 W: 55346 L: 55326 D: 106102 Ptnml(0-2): 105, 23599, 60980, 23577, 126 https://tests.stockfishchess.org/tests/view/693fc91f46f342e1ec20f9f6 closes #6486 Bench: 3267755
Init threat offsets at compile time. Avoid another global init function call. Passed STC Non-Regression: https://tests.stockfishchess.org/tests/view/694971a83c8768ca4507275c LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 43296 W: 11284 L: 11077 D: 20935 Ptnml(0-2): 152, 4611, 11924, 4800, 161 closes #6487 No functional change
Passed STC: LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 80128 W: 20879 L: 20498 D: 38751 Ptnml(0-2): 274, 9318, 20496, 9705, 271 https://tests.stockfishchess.org/tests/view/6945d11f3c8768ca45072218 Passed LTC: LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 134298 W: 34497 L: 33983 D: 65818 Ptnml(0-2): 81, 14334, 37812, 14834, 88 https://tests.stockfishchess.org/tests/view/6947bf033c8768ca45072491 closes #6488 Bench: 2325401
Passed STC: LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 38208 W: 10076 L: 9754 D: 18378 Ptnml(0-2): 139, 4390, 9742, 4676, 157 https://tests.stockfishchess.org/tests/view/6945bb6446f342e1ec211d93 Passed LTC: LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 64086 W: 16529 L: 16157 D: 31400 Ptnml(0-2): 34, 6808, 17990, 7174, 37 https://tests.stockfishchess.org/tests/view/69479d303c8768ca45072446 closes #6489 Bench: 2442415
closes #6490 No functional change
update to current closes #6491 No functional change
Passed simplification STC LLR: 2.93 (-2.94,2.94) <-1.75,0.25> Total: 92096 W: 23888 L: 23728 D: 44480 Ptnml(0-2): 336, 10796, 23608, 10988, 320 https://tests.stockfishchess.org/tests/view/694b6b9d572093c1986d6ae0 Passed simplification LTC LLR: 2.96 (-2.94,2.94) <-1.75,0.25> Total: 50064 W: 12789 L: 12598 D: 24677 Ptnml(0-2): 24, 5350, 14103, 5521, 34 https://tests.stockfishchess.org/tests/view/694d49aa572093c1986d7021 closes #6493 Bench: 2494221
Fix incorrect nonPawnKey update Passed non-reg SMP STC: ``` LLR: 2.93 (-2.94,2.94) <-1.75,0.25> Total: 139424 W: 35792 L: 35690 D: 67942 Ptnml(0-2): 197, 15783, 37665, 15855, 212 ``` https://tests.stockfishchess.org/tests/view/694b7b7e572093c1986d6b0d Passed non-reg SMP LTC: ``` LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 88880 W: 22863 L: 22718 D: 43299 Ptnml(0-2): 16, 8947, 26401, 9028, 48 ``` https://tests.stockfishchess.org/tests/view/694d2ceb572093c1986d6fc8 fixes #6492 closes #6494 Bench: 2475788
Passed STC: LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 141120 W: 36860 L: 36390 D: 67870 Ptnml(0-2): 470, 16441, 36314, 16819, 516 https://tests.stockfishchess.org/tests/view/694978e93c8768ca45072763 Passed LTC: LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 66576 W: 17078 L: 16700 D: 32798 Ptnml(0-2): 45, 7093, 18628, 7483, 39 https://tests.stockfishchess.org/tests/view/694bb608572093c1986d6ba6 closes #6496 Bench: 2503391
[Passed STC SMP](https://tests.stockfishchess.org/tests/view/694e506c572093c1986d7276): ``` LLR: 2.97 (-2.94,2.94) <0.00,2.00> Total: 14992 W: 3924 L: 3653 D: 7415 Ptnml(0-2): 20, 1547, 4090, 1820, 19 ``` [Passed LTC SMP](https://tests.stockfishchess.org/tests/live_elo/694ead61572093c1986d7365): ``` LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 41146 W: 10654 L: 10342 D: 20150 Ptnml(0-2): 17, 3999, 12225, 4319, 13 ``` [Passed a sanity check STC SMP post-refactoring](https://tests.stockfishchess.org/tests/view/69503997572093c1986d763a): ``` LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 46728 W: 12178 L: 11863 D: 22687 Ptnml(0-2): 82, 5093, 12685, 5436, 68 ``` (The large gain of the first STC was probably a fluke, and this result is more reasonable!) After shared correction history, Viz suggested we try sharing other histories, especially `pawnHistory`. As far as we're aware, sharing histories other than correction history (which Caissa already shares) is novel. The implementation follows the same pattern as shared correction history – the size of the history table is scaled with `next_power_of_two(threadsInNumaNode)` and the entry is prefetched in `do_move`. A bit of refactoring was done to accommodate this new history. Note that we prefetch `&history->pawn_entry(*this)[pc][to]` rather than `&history->pawn_entry(*this)` because, unlike the entries of the other histories, each pawn history entry spans multiple cache lines. closes #6498 Bench: 2503391 Co-authored-by: Michael Chaly <Vizvezdenec@gmail.com>
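A small sketch of why the prefetch address matters, assuming a pawn history entry laid out as `int16_t[16][64]` and 64-byte cache lines (both assumptions): such an entry spans 32 cache lines, so prefetching its start only brings in the first one. `PawnEntry`, `line_of` and `prefetch_slot` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int         PIECE_NB  = 16;  // assumed piece-index range
constexpr int         SQUARE_NB = 64;
constexpr std::size_t CacheLine = 64;  // assumed cache line size in bytes

using PawnEntry = std::int16_t[PIECE_NB][SQUARE_NB];

// Number of cache lines one entry spans (2048 bytes / 64 = 32 here).
constexpr std::size_t lines_per_entry = sizeof(PawnEntry) / CacheLine;

// Cache line, relative to the start of the entry, that holds entry[pc][to].
constexpr std::size_t line_of(int pc, int to) {
    return (static_cast<std::size_t>(pc) * SQUARE_NB + static_cast<std::size_t>(to))
         * sizeof(std::int16_t) / CacheLine;
}

// Prefetch the line that will actually be read, not the start of the entry.
inline void prefetch_slot(const PawnEntry& e, int pc, int to) {
#if defined(__GNUC__)
    __builtin_prefetch(&e[pc][to]);
#else
    (void) e;
    (void) pc;
    (void) to;
#endif
}
```

For example, `[pc = 15][to = 63]` sits in line 31, so a prefetch of the entry base would not help that lookup at all.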
Clang pretends to be GCC, but is enraged by `-Wstack-usage`: closes #6499 No functional change
Fixes #6505 Missing initialization seemingly resulting in side effects, as discussed in the issue. Credit to Sopel for spotting the bug. PR used as a testcase for Copilot, doing the right thing #6478 (comment) closes #6511 No functional change
Use _POSIX_C_SOURCE to check for PTHREAD_MUTEX_ROBUST support. The latter is an enum constant, not a preprocessor macro, so it cannot be detected with `#ifdef`. closes #6510 No functional change
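The pitfall can be shown without using any pthread function: the preprocessor runs before enumerators exist, so `#ifdef` on an enum constant is always false. `DEMO_MUTEX_ROBUST` is a hypothetical stand-in, and the `200809L` (POSIX.1-2008) threshold in the second check is an assumption of this sketch, not necessarily the exact condition the commit uses.

```cpp
#include <pthread.h>  // on glibc, also pulls in the feature-test macros

// Hypothetical enumerator standing in for PTHREAD_MUTEX_ROBUST.
enum { DEMO_MUTEX_ROBUST = 1 };

// The preprocessor does not know about enumerators, so this is always false
// even though DEMO_MUTEX_ROBUST exists as a C++ constant.
#ifdef DEMO_MUTEX_ROBUST
constexpr bool enum_seen_by_ifdef = true;
#else
constexpr bool enum_seen_by_ifdef = false;
#endif

// A feature-test macro *is* a macro, so it can gate the code instead.
// The 200809L threshold is an assumption for this sketch.
#if defined(_POSIX_C_SOURCE) && _POSIX_C_SOURCE >= 200809L
constexpr bool robust_mutex_guard = true;
#else
constexpr bool robust_mutex_guard = false;
#endif
```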
This patch dampens the main history down to 3/4 of its value for all possible moves at the start of the iterative deepening (ID) loop, so that it partially refreshes with every new root position. Passed STC: https://tests.stockfishchess.org/tests/view/694e33ff572093c1986d7234 LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 115520 W: 30164 L: 29735 D: 55621 Ptnml(0-2): 395, 13192, 30192, 13551, 430 Passed LTC: https://tests.stockfishchess.org/tests/view/6950cbe6572093c1986d816c LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 63672 W: 16480 L: 16114 D: 31078 Ptnml(0-2): 46, 6524, 18329, 6892, 45 closes #6504 bench 2710946
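A minimal sketch of the dampening step, assuming a main history stored as an `int16_t` table indexed by color and a 12-bit from/to move index; the `MainHistory` shape and the function name are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical main history: [color][from-to index] of int16 scores.
using MainHistory = std::array<std::array<std::int16_t, 4096>, 2>;

// Called once at the start of the iterative deepening loop: each entry keeps
// 3/4 of its value, so old statistics fade as new root positions arrive.
void dampen_main_history(MainHistory& h) {
    for (auto& perColor : h)
        for (auto& v : perColor)
            v = static_cast<std::int16_t>(v * 3 / 4);
}
```

Integer division truncates toward zero, so small entries such as `1` or `-1` drop to `0` after one dampening pass.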
passed STC: LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 71968 W: 18833 L: 18489 D: 34646 Ptnml(0-2): 192, 7310, 20643, 7640, 199 https://tests.stockfishchess.org/tests/view/69509e5c572093c1986d7a0a closes #6512 No functional change
WINE_PATH started as a Wine-specific knob, but it’s now used more generally as a command prefix to run the built engine under wrappers like Intel SDE, qemu-user, etc.
- Add RUN_PREFIX as the supported “run wrapper/prefix” variable in Makefile
- Set WINE_PATH as a deprecated alias
- Update CI and scripts to use RUN_PREFIX

closes #6500 No functional change
This code path is never taken for vector sizes >= 512, so we can simplify it. closes #6501 No functional change
closes #6503 No functional change
closes #6509 No functional change
Happy New Year! closes #6514 No functional change
closes #6523 No functional change
The bestValue can sometimes go down. This happens 2% of the time or so. This fix stops it from decreasing. Failed gainer STC: LLR: -2.94 (-2.94,2.94) <0.00,2.00> Total: 146176 W: 37930 L: 37976 D: 70270 Ptnml(0-2): 480, 17422, 37366, 17304, 516 https://tests.stockfishchess.org/tests/view/6953be19572093c1986da66a Passed Non-regression LTC: LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 257796 W: 65662 L: 65683 D: 126451 Ptnml(0-2): 164, 28247, 72087, 28246, 154 https://tests.stockfishchess.org/tests/view/69554ff0d844c1ce7cc7e333 closes #6520 fixes #6519 Bench: 2477446
Recent changes to the Square enum (reducing it from int32_t to int8_t) now allow the compiler to vectorize loops that were previously too wide for targets below AVX-512. However, the vectorization Clang performs here is incorrect and causes a miscompilation. Disable this vectorization. This particular issue was noticeable with Clang 15 and Clang 19, on avx2 as well as apple-silicon. Ref: #6063 Original Clang Issue: llvm/llvm-project#80494 First reported by #6528, though misinterpreted. closes #6529 No functional change
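One way to opt a single loop out of Clang's auto-vectorizer is `#pragma clang loop vectorize(disable)`. The commit message does not say which mechanism was used, so treat this as an illustrative sketch; `sum_square_indices` is a hypothetical loop over `int8_t` squares.

```cpp
#include <cstdint>

// Hypothetical loop over int8_t square indices, the kind that became
// vectorizable once Square shrank to int8_t.
int sum_square_indices(const std::int8_t* sq, int n) {
    int total = 0;
#if defined(__clang__)
    // Ask Clang not to vectorize the loop that immediately follows.
#pragma clang loop vectorize(disable)
#endif
    for (int i = 0; i < n; ++i)
        total += sq[i];
    return total;
}
```

Other compilers never see the pragma because of the `__clang__` guard, so the loop compiles unchanged there.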
Only the one on line 158 is actually required, but it doesn't hurt to add constexpr where applicable here. The warning was "comparison of unsigned expression in '< 0' is always false". closes #6530 No functional change
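A sketch of the pattern, assuming the warning came from a template instantiated with an unsigned type: with `if constexpr`, the `v < 0` test is discarded for unsigned `T`, so the always-false comparison is never instantiated. `in_range` is a hypothetical helper, not the function touched by the commit.

```cpp
#include <type_traits>

// Hypothetical helper: is v in [0, limit)?
template<typename T>
constexpr bool in_range(T v, T limit) {
    // For unsigned T this branch is discarded at compile time, so the
    // "v < 0 is always false" comparison is never generated.
    if constexpr (std::is_signed_v<T>)
    {
        if (v < 0)
            return false;
    }
    return v < limit;
}
```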
Compiles and runs Stockfish with all supported GCC and Clang compilers. Currently only Linux and AVX2. closes #6533 No functional change
Example of `using` aliases, to avoid mixing `std::uint*_t`/`std::int*_t` with plain `uint*_t`/`int*_t`:
```cpp
#include <cstddef>
#include <cstdint>

using u64 = std::uint64_t;
using u32 = std::uint32_t;
using u16 = std::uint16_t;
using u8  = std::uint8_t;

using i64 = std::int64_t;
using i32 = std::int32_t;
using i16 = std::int16_t;
using i8  = std::int8_t;

using usize = std::size_t;
using isize = std::ptrdiff_t;

#if defined(__GNUC__) && defined(IS_64BIT)
__extension__ using u128 = unsigned __int128;
__extension__ using i128 = signed __int128;
#endif
```
closes #6874 No functional change
Lichess maintains some patches on top of SF dev to get it working with Emscripten. This PR moves some of these patches into SF and adds WASM to CI. It also adds a few changes in places where the x86 intrinsics don't cleanly map onto WebAssembly SIMD instructions; otherwise, we use Emscripten's x86 compatibility layer and take SSE4.1 code paths.
Summary of the compatibility changes:
- Define `wasm32` and `wasm32-relaxed-simd` targets.
  - We don't support wasm without SIMD; it'd be a waste of time.
- Add option to disable TBs
  - This is required because `tbprobe.cpp` pulls in `mmap`. This option can be used on any target, of course, but it's only enabled by default for wasm.
- Add compilation job + test to CI
And the changes for performance:
- Disable atomics for shared history on wasm
  - Atomics are always `seq_cst` there, which can be quite slow (even on x86, a `seq_cst` store becomes `xchg [mem], reg`, which is implicitly locked)
- Add SSE code path to `get_changed_pieces`, modeled after the AVX2 path
- `_mm_mulhi_epi16` has a complicated emulation sequence, so for the pairwise multiplication, use an approach similar to the NEON impl.
- `__int128` gets lowered to runtime functions on wasm, so use the fallback impl for `mul_hi64`
- V8 does a poor job with the NNZ finding, so use a slightly different sequence there
- Add relaxed SIMD support for `m128_dpbusd`.
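For reference, the usual portable replacement for a `__int128`-based `mul_hi64` splits each operand into 32-bit halves and recombines the partial products. This is a generic sketch of that technique, not necessarily the fallback Stockfish ships; `mul_hi64_portable` is a hypothetical name.

```cpp
#include <cstdint>

// High 64 bits of the 128-bit product a * b, using only 64-bit arithmetic.
inline std::uint64_t mul_hi64_portable(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t aL = a & 0xFFFFFFFFu, aH = a >> 32;
    const std::uint64_t bL = b & 0xFFFFFFFFu, bH = b >> 32;

    const std::uint64_t ll = aL * bL;  // contributes to bits 0..63
    const std::uint64_t lh = aL * bH;  // contributes to bits 32..95
    const std::uint64_t hl = aH * bL;  // contributes to bits 32..95
    const std::uint64_t hh = aH * bH;  // contributes to bits 64..127

    // Bits 32..63 of the middle column, plus the carry out of the low product.
    // At most 3 * (2^32 - 1), so this sum cannot overflow.
    const std::uint64_t mid = (ll >> 32) + (lh & 0xFFFFFFFFu) + (hl & 0xFFFFFFFFu);

    return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}
```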
Some local perf figures (single-threaded speedtest):
```
wasm
Nodes/second : 902523
sse4.1
Nodes/second : 1155380
avx512icl
Nodes/second : 1676184
```
Further avenues to explore:
- Optimize for performance under V8's experimental AVX revectorizer (currently it's about +10% in my testing, but it could definitely be more)
- Branch hinting. For example, run bench while collecting branch frequency info, then inject it late in the WASM compilation pipeline. I tried this locally and it didn't help much, but maybe I'm missing something.
- PGO. Gives +1.5% NPS locally, but hard to integrate with WASM compilation workflows
closes #6875
No functional change
This PR introduces the additional `RootMove` attribute `previousPV` so that scores and PVs we send to the GUI in MultiPV analysis always match. This allows us in particular to extend our guarantee of exact mate (and TB win/loss) scores having a complete PV (leading to checkmate in the correct number of plies) to all PV lines. Recall that master fails here, since partially searched root moves may send the previous score together with the current (modified) PV to the GUI. See #6784. The PR also uses the new attribute to extend the followPV logic to the analysis of sidelines, building on the idea in #6813 by @joergoster. Passed non-reg STC: LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 166880 W: 42357 L: 42282 D: 82241 Ptnml(0-2): 394, 18685, 45177, 18820, 364 https://tests.stockfishchess.org/tests/view/6a0dea55818cacc1db0abb6a Failed non-reg LTC: LLR: -2.97 (-2.94,2.94) <-1.75,0.25> Total: 890520 W: 224168 L: 225282 D: 441070 Ptnml(0-2): 390, 91902, 261789, 90790, 389 https://tests.stockfishchess.org/tests/view/6a1143ad818cacc1db0ac14c Opening as draft for discussion on how to proceed. In SinglePV analysis, the patch is completely nonfunctional, but it may cause a (small?) slowdown because of the increased size of `RootMove`. I am not sure if there is an elegant way to enrich the class only for MultiPV analysis (but the switch can happen at any time through the UCI interface), or to mitigate the speed penalty in some other way. A local speedup test shows only a small slowdown on my system (but still high error bars): ``` sf_base = 1156928 +/- 1459 (95%) sf_test = 1155885 +/- 1283 (95%) diff = -1043 +/- 1777 (95%) speedup = -0.09021% +/- 0.154% (95%) ``` The PR also adds the new MultiPV mate PV correctness check to the CI. closes #6886 No functional change
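The invariant can be sketched as follows: a partially searched root move reports its previous score together with its previous PV, never mixing the two. The `RootMove` below is a hypothetical, stripped-down stand-in (moves encoded as `int`, an explicit `fullySearched` flag), not the real class.

```cpp
#include <vector>

// Hypothetical, simplified RootMove; the real class has more fields.
struct RootMove {
    int              score         = 0;
    int              previousScore = 0;
    std::vector<int> pv;                     // current PV (moves as ints)
    std::vector<int> previousPV;             // PV matching previousScore
    bool             fullySearched = false;  // completed in this iteration?
};

struct Report {
    int              score;
    std::vector<int> pv;
};

// What to send to the GUI for one MultiPV line: score and PV always come
// from the same iteration.
Report line_to_report(const RootMove& rm) {
    if (rm.fullySearched)
        return {rm.score, rm.pv};
    return {rm.previousScore, rm.previousPV};
}
```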
Fixes #6881. `timeout_decorator()` used a `ThreadPoolExecutor` context manager around blocking output waits. When `future.result(timeout=...)` timed out, leaving the context manager still waited for the worker thread to finish, so a blocked stdout read could keep the instrumented tests hanging past the configured timeout. This change removes that executor wrapper for interactive Stockfish output waits. The harness now drains process output on a daemon reader thread, queues received lines, and applies the deadline directly while waiting for the next queued line. `TimeoutException` also initializes the base exception message so failures show useful text. Validation: - `python3 -m py_compile tests/testing.py tests/instrumented.py` - local timeout smoke test: a 0.2s no-output wait raises in ~0.204s - Stockfish smoke test: startup/`uciok` read succeeds, deliberate no-output wait raises in ~0.205s, engine exits 0 - `make -C src -j4 build` - `../tests/signature.sh` -> `2814421` closes #6882 No functional change
…s at compile time Adds a `RelaxedAtomic` wrapper around either `T` or `std::atomic<T>`, plus a `USE_SLOPPY_ATOMICS` preprocessor define that selects between the two. The intent of this flag is to allow easy disabling of atomics on WASM, where even relaxed atomics are expensive because all atomics have `seq_cst` semantics. Passed non regression STC LLR: 2.99 (-2.94,2.94) <-1.75,0.25> Total: 50624 W: 12976 L: 12776 D: 24872 Ptnml(0-2): 112, 5445, 14005, 5631, 119 https://tests.stockfishchess.org/tests/view/6a1f690e818cacc1db0ad2c7 Passed non-regression STC SMP LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 163696 W: 41514 L: 41438 D: 80744 Ptnml(0-2): 162, 18272, 44904, 18348, 162 https://tests.stockfishchess.org/tests/view/6a21fb97351b79f679cc44b3 Using this class for the TT also allows us to remove the TSAN suppressions, since the UB is fixed. closes #6877 No functional change
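A minimal sketch of such a wrapper, assuming only `load`/`store` are needed; the member names are hypothetical and the real class may expose a different interface. With `USE_SLOPPY_ATOMICS` defined, it degrades to plain accesses.

```cpp
#include <atomic>

// Either a plain T or a std::atomic<T> with relaxed ordering, chosen at
// compile time. Interface names are assumptions for this sketch.
template<typename T>
class RelaxedAtomic {
   public:
    RelaxedAtomic(T v = T{}) : value(v) {}

#ifdef USE_SLOPPY_ATOMICS
    T    load() const { return value; }
    void store(T v) { value = v; }

   private:
    T value;  // plain accesses: cheapest, but a data race under concurrency
#else
    T    load() const { return value.load(std::memory_order_relaxed); }
    void store(T v) { value.store(v, std::memory_order_relaxed); }

   private:
    std::atomic<T> value;  // relaxed: no ordering guarantees, but no data race
#endif
};
```

Relaxed ordering is enough for hash-table style data where each value is read and written independently; it removes the undefined behaviour without the fences a `seq_cst` access would add on most targets.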
Passed STC (https://tests.stockfishchess.org/tests/view/6a2893ad7c758d82accea129): LLR: 3.20 (-2.94,2.94) <0.00,2.00> Total: 23328 W: 6145 L: 5838 D: 11345 Ptnml(0-2): 50, 2463, 6346, 2740, 65 Instead of repeatedly doing the sum HalfKA + threats at the end, it's profitable to simply store one accumulator per side that combines them. This also avoids an extra load/store of an accumulator, and halves the cache footprint of the accumulators. For full refreshes, we always compute both halfka and threats simultaneously. Any threat full refresh is always a halfka refresh because it occurs when the king crosses the center line, while halfka refreshes are required for ANY king move, so we don't need a separate detection path for threats. I get about a 2.5% speedup locally with this, but I'd appreciate other people's measurements. closes #6890 No functional change
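Because accumulator updates are plain additions of weight columns, adding the HalfKA and threat contributions into one accumulator gives the same result as keeping two and summing them at the end. A toy sketch with hypothetical 8-wide `int16_t` columns (the real code uses SIMD and far wider accumulators):

```cpp
#include <array>
#include <cstdint>

// Toy accumulator width; the real NNUE accumulators are much wider.
constexpr int N = 8;
using Acc = std::array<std::int16_t, N>;

// Add one active feature's weight column to an accumulator.
void add_feature(Acc& acc, const Acc& weights) {
    for (int i = 0; i < N; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] + weights[i]);
}

// Remove one feature's weight column (e.g. when a piece or threat disappears).
void sub_feature(Acc& acc, const Acc& weights) {
    for (int i = 0; i < N; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] - weights[i]);
}
```

Applying a HalfKA column and a threat column to the same `Acc` yields exactly the element-wise sum of two separate accumulators, so the final summing pass and its extra load/store disappear.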
Passed STC https://tests.stockfishchess.org/tests/view/6a2682ce351b79f679cc47c5 LLR: 2.93 (-2.94,2.94) <0.00,2.00> Total: 57248 W: 14730 L: 14399 D: 28119 Ptnml(0-2): 145, 6159, 15698, 6464, 158 Reordering operations in `do_move` allows us to effectively prefetch the TT entry earlier, since the piece moving helpers don't actually modify the position key. I suspect that with threat inputs, `put_piece` and friends got a lot more expensive, and so this helps us a lot. vondele's machine: ==== master ==== 1 Nodes/second : 294311526 2 Nodes/second : 297068312 3 Nodes/second : 297418763 Average (over 3): 296266200 ==== pfearly ==== 1 Nodes/second : 303986449 2 Nodes/second : 304221719 3 Nodes/second : 305302969 Average (over 3): 304503712 (+2.78%) Locally, `bench`: Result of 200 runs speedup = +0.0158 P(speedup > 0) = 1.0000 As expected it helps even more in a large-hash, NUMA setting. closes #6891 No functional change
In a recent CCC event, Stockfish (probably through no fault of its own) lost some games on time when it was winning and when it had already found the move that delivers checkmate. This patch stops the search when TM is active, and when the main thread can be certain that it is impossible to find a better move. That is, if (i) it has found a mate-in-1, (ii) it has found a mate-in-2, or (iii) it has found a mated-in-1. patch: ``` position fen 5K2/8/2qk4/2nPp3/3r4/6B1/B7/3R4 w - e6 go wtime 100000000 winc 100000000 info string Available processors: 0-7 info string Using 1 thread info string NNUE evaluation using nn-71d6d32cb962.nnue (106MiB, (83248, 1024, 31, 32, 1)) info string Network replica 1: Shared memory. info depth 1 seldepth 3 multipv 1 score mate 1 nodes 30 nps 30000 hashfull 0 tbhits 0 time 1 pv d5e6 bestmove d5e6 ``` master: ``` position fen 5K2/8/2qk4/2nPp3/3r4/6B1/B7/3R4 w - e6 go wtime 100000000 winc 100000000 info string Available processors: 0-7 info string Using 1 thread info string NNUE evaluation using nn-71d6d32cb962.nnue (106MiB, (83248, 1024, 31, 32, 1)) info string Network replica 1: Shared memory. info depth 1 seldepth 3 multipv 1 score mate 1 nodes 30 nps 15000 hashfull 0 tbhits 0 time 2 pv d5e6 <snip> info depth 245 seldepth 2 multipv 1 score mate 1 nodes 5886 nps 367875 hashfull 0 tbhits 0 time 16 pv d5e6 bestmove d5e6 ``` Note: In MultiPV analysis (extremely rare with TM active), we take the point of view that the user would like to continue to search until none of the PVs can be improved anymore. This means we only stop if the worst searched line is at least a mate-in-2, or if the best searched line is a mated-in-1. closes #6879 No functional change
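A sketch of the stopping rule, assuming Stockfish-style mate scores where `mate_in(ply)` is `VALUE_MATE - ply` and `mated_in(ply)` is `-VALUE_MATE + ply`, with `VALUE_MATE = 32000`. In plies, a mate-in-1 is 1, a mate-in-2 is 3 and a mated-in-1 is 2. The `can_stop_early*` helpers are hypothetical.

```cpp
// Assumed score conventions for this sketch.
constexpr int VALUE_MATE = 32000;

constexpr int mate_in(int ply) { return VALUE_MATE - ply; }
constexpr int mated_in(int ply) { return -VALUE_MATE + ply; }

// Single-PV rule: stop on mate-in-1 or mate-in-2 (score >= mate in 3 plies)
// or on mated-in-1 (score <= mated in 2 plies).
constexpr bool can_stop_early(int bestValue) {
    return bestValue >= mate_in(3) || bestValue <= mated_in(2);
}

// MultiPV rule: stop only if even the worst searched line is at least a
// mate-in-2, or if the best searched line is already a mated-in-1.
constexpr bool can_stop_early_multipv(int bestValue, int worstValue) {
    return worstValue >= mate_in(3) || bestValue <= mated_in(2);
}
```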
The FEN validation check intended to reject pawns on the first or eighth rank uses the `Rank` enum values in a bitwise OR operation: `if (pieces(PAWN) & (RANK_1 | RANK_8))` `RANK_1 | RANK_8` evaluates to the integer `0 | 7 == 7` instead of a bitboard, so the expression only tests squares A1, B1 and C1. As a result, unsupported positions with pawns elsewhere on the first or eighth rank are silently accepted. For instance, `position fen 3P3k/8/8/8/8/8/8/3K4 w - - 0 1` is accepted even though the pawn on d8 makes the position unsupported. Use the `Rank1BB | Rank8BB` bitboard constants so that any pawn on the first or eighth rank is correctly rejected. closes #6887 No functional change
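The bug is easy to reproduce in isolation. This sketch assumes the usual layout with a1 = 0 and h8 = 63 and restates minimal `Rank`, `Rank1BB` and `Rank8BB` definitions so it compiles on its own; the two `pawn_on_back_rank_*` functions are hypothetical.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Minimal restatement of the relevant constants (a1 = 0 ... h8 = 63).
enum Rank { RANK_1 = 0, RANK_8 = 7 };
constexpr Bitboard Rank1BB = 0xFFULL;
constexpr Bitboard Rank8BB = Rank1BB << (8 * 7);

constexpr Bitboard square_bb(int sq) { return Bitboard(1) << sq; }

// Buggy: RANK_1 | RANK_8 is the integer 7, i.e. the bits for a1, b1 and c1.
constexpr bool pawn_on_back_rank_buggy(Bitboard pawns) { return pawns & (RANK_1 | RANK_8); }

// Fixed: test against the full rank bitboards.
constexpr bool pawn_on_back_rank_fixed(Bitboard pawns) { return pawns & (Rank1BB | Rank8BB); }
```

A pawn on d8 (square 59) slips past the buggy check but is caught by the fixed one.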
This version handles aborting engine processes more gracefully. This also tests the engine prior to use, as the process is nevertheless not fully robust. closes #6893 No functional change
As pointed out in 0111d11#r188197707, the parameter type should be `T` not `int` closes #6896 No functional change
After merging the HalfKA and Threats accumulators (7c7fe32) and the subsequent removal of the double-incremental/fused update, a number of NNUE helpers and fields became unreachable. Each was verified to have zero callers/readers across the source tree: - FusedUpdateData logic in FullThreats: the fused-update branch of append_changed_indices and the FusedUpdateData parameter are unused; the accumulator update no longer passes fused data. - FullThreats::requires_refresh: never called. The live king-bucket refresh check is HalfKAv2_hm::requires_refresh (PSQFeatureSet), used in nnue_accumulator. - HalfKAv2_hm::append_active_indices: never called. The live active-index builder is FullThreats::append_active_indices (ThreatFeatureSet). - DirtyThreats::us, prevKsq and ksq: written in do_move but only read by the now-removed FullThreats::requires_refresh. Removing them also drops three stores from the do_move path. - Unused feature Name constants and the unused FtOneVal / HiddenMaxVal constants in nnue_common.h. - Two stale feature-header banner comments. closes #6898 No functional change
Reuse the already-computed ray instead of calling ray_pass_bb a second time with identical arguments. closes #6899 No functional change
Passed 8th 5+0.05 https://tests.stockfishchess.org/tests/view/6a0bfdb46524d21ee79b879b LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 23472 W: 6103 L: 5823 D: 11546 Ptnml(0-2): 26, 2501, 6403, 2779, 27 Passed 8th 20+0.2 https://tests.stockfishchess.org/tests/view/6a0c87196524d21ee79b885c LLR: 2.95 (-2.94,2.94) <0.50,2.50> Total: 44692 W: 11640 L: 11319 D: 21733 Ptnml(0-2): 12, 4418, 13169, 4731, 16 Passed 16th 5+0.05 https://tests.stockfishchess.org/tests/view/6a20533f818cacc1db0ad32b LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 14720 W: 3910 L: 3642 D: 7168 Ptnml(0-2): 6, 1434, 4223, 1680, 17 Passed 64th 10+0.1 https://tests.stockfishchess.org/tests/view/6a20ae8e818cacc1db0ad369 LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 34096 W: 8889 L: 8607 D: 16600 Ptnml(0-2): 4, 2974, 10808, 3260, 2 Continuation history is fixed-size, so there's much more false sharing and a larger speed loss here at high thread counts, unfortunately. vondele's machine, 4x70 ``` ==== master ==== Average (over 3): 294528128 ==== shared-conthist ==== Average (over 3): 282364217 (~4% slowdown) ``` my machine, 8x32 ``` Nodes/second : 243157385 Nodes/second : 228374554 (~6% slowdown) ``` Evidently it still gains at 64th, but here are a few follow-up ideas to try to get the speed back: - Add padding in `PieceToHistory` stats so there's less false sharing. - Subdivide continuation history more finely than shared correction history. - Scramble the `PieceToHistory` indexing so there's less false sharing. closes #6905 No functional change
This PR fixes the remaining corner cases in the treatment of MultiPV mated-in PVs, as well as an oversight in #6886. See the discussion in In particular: 1. `previousScore` and `previousPV` can only be trusted if that root move was indeed fully searched in the previous iteration. 2. A move beyond `pvIdx` (that was hence not fully searched) may have an exact loss score that cannot be trusted. So if a MultiPV search gets aborted while searching `pvIdx`, we mark all the following loss scores as bounds. 3. The forgotten mate logic also got broken in #6886, because the `previousPV` of the forgotten mate's bestmove can only be trusted if that move was fully searched in the previous iteration, something that is not guaranteed. So we now store both `lastBestMoveScore` and `lastBestMovePV`. Here are some scenarios for MultiPV = 8 that explain how master was broken: 1. Move A with an inexact mated-in-2 score from the previous iteration (so outside the top8 moves) gets flushed into the top8 moves for the current iteration, because the previous top8 move B is now scored as a mated-in-1. Hence we cannot trust `previousScore` or `previousPV` for move A if the search gets aborted while it is being searched. 2. In the scenario above, move B has `Score != -VALUE_INFINITE` and a mated-in-1 score, which cannot be trusted as it was not fully searched. 3. Iteration N has bestmove A with mated-in-10, which gets recorded in `lastBestMoveScore` (renamed from `lastIterationScore`). Iteration 11 forgets the mate and has bestmove B with a cp score; move A may have an incomplete PV, and may even have a non-mate score. Iteration 12 gets aborted, and in trying to remember the forgotten mate, master recovers the `previousScore` and `previousPV` of move A, which may be neither mate nor complete. 
Passed STC non-reg: LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 69728 W: 17748 L: 17573 D: 34407 Ptnml(0-2): 143, 7571, 19274, 7720, 156 https://tests.stockfishchess.org/tests/view/6a2c40c60d5d4b19d08052f2 closes #6906 No functional change
We can straightforwardly parallelize the `check_universal.sh` script, which takes quite a bit of time for the x86 builds. closes #6919 No functional change
Worker::do_move computes the successor hash key via the new Position::key_after(m) and prefetches the TT entry one full do_move earlier than the existing prefetch in Position::do_move. key_after does not model castling, en passant or promotion keys exactly; for these rare moves the prefetch lands on an unused line. `key_after` has been around since 2014 (82d065b0) and was removed in #5770. Adding back `prefetch_key` helps in common, normal moves at the cost of extra compute. Speedup (PGO vs PGO, interleaved paired bench, n=48 pairs, Apple M2 Pro / apple-silicon): +0.69% [0.47, 0.91] Passed STC: https://tests.stockfishchess.org/tests/view/6a291f8d7c758d82accea17f LLR: 4.24 (-2.94,2.94) <0.00,2.00> Total: 473504 W: 121250 L: 120228 D: 232026 Ptnml(0-2): 1112, 51137, 131251, 52121, 1131 closes #6911 No functional change
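To illustrate the idea behind `key_after`, here is a toy Zobrist example (not Stockfish's tables, piece codes or move encoding) that predicts the key after a normal move or capture without making the move. Like the real helper as described, it ignores castling, en passant and promotion. `Zobrist` and `ToyPosition` are hypothetical.

```cpp
#include <array>
#include <cstdint>
#include <random>

constexpr int PIECE_NB  = 13;  // toy encoding: 0 = empty, 1..12 = pieces
constexpr int SQUARE_NB = 64;

struct Zobrist {
    std::array<std::array<std::uint64_t, SQUARE_NB>, PIECE_NB> psq{};
    std::uint64_t side = 0;

    Zobrist() {
        std::mt19937_64 rng(1070372);  // fixed seed: reproducible keys
        for (auto& row : psq)
            for (auto& k : row)
                k = rng();
        side = rng();
    }
};

struct ToyPosition {
    std::array<int, SQUARE_NB> board{};  // piece code per square, 0 = empty
    bool blackToMove = false;

    // Key computed from scratch, used here only to check the prediction.
    std::uint64_t full_key(const Zobrist& z) const {
        std::uint64_t k = blackToMove ? z.side : 0;
        for (int s = 0; s < SQUARE_NB; ++s)
            if (board[s])
                k ^= z.psq[board[s]][s];
        return k;
    }

    // Predicted key after from->to: move the piece, remove any captured
    // piece on `to`, flip the side to move. No board state is changed.
    std::uint64_t key_after(const Zobrist& z, std::uint64_t key, int from, int to) const {
        const int pc = board[from], captured = board[to];
        key ^= z.psq[pc][from] ^ z.psq[pc][to] ^ z.side;
        if (captured)
            key ^= z.psq[captured][to];
        return key;
    }

    void do_move(int from, int to) {
        board[to]   = board[from];
        board[from] = 0;
        blackToMove = !blackToMove;
    }
};
```

The predicted key is available before any board update, which is what lets the TT prefetch be issued a full `do_move` earlier.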
If improving is true, reduce the depth of the probCut search by 1. Passed STC: https://tests.stockfishchess.org/tests/view/6a2a6ceb17167cbe7100a909 LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 119136 W: 30591 L: 30158 D: 58387 Ptnml(0-2): 323, 13831, 30845, 14228, 341 Passed LTC: https://tests.stockfishchess.org/tests/view/6a2d65a40d5d4b19d080537f LLR: 2.96 (-2.94,2.94) <0.50,2.50> Total: 186552 W: 47377 L: 46755 D: 92420 Ptnml(0-2): 89, 20047, 52395, 20643, 102 closes #6914 Bench: 2422062
This network is further trained on a new BT4 distillation stage, fine tuning on ~2 billion positions relabeled with the value head output of `BT4-tf13tune.pb.gz`. The dataset can be found at https://huggingface.co/datasets/xushawn/test80-bt4-relabel. A modified branch of lc0 was used to derive this data: https://github.com/xu-shawn/lc0/tree/relabel_dual_stream_test 2 billion positions represent a tiny subset of the total training data, and BT4 relabeling is inherently computationally expensive. I expect a lot more gains as more data are relabeled, but it will likely require coordinated community effort. Everyone is welcome to contribute, and yl25946 has made a spreadsheet to track progress: https://docs.google.com/spreadsheets/d/1yanofhusEzDg8ZnurAw799ikoTY6GcqsNMYfpswOIbw/edit. Special thanks to Viren6, who performed policy/value distillation experiments on Monty, and created the lc0 distillation fork that the current relabeler is based on; yl25946 for proposing the idea of large network distillations back in February 2025, running distillation experiments on the HL4096 network, and working on fine tuning attempts; vondele for nettest and suggesting the fine-tuning approach; and many others on the knowledge distillation thread in the SF Discord #ideas channel. 
nettest PR: vondele/nettest#369 Ongoing STC: LLR: -0.01 (-2.94,2.94) <0.00,2.00> Total: 72224 W: 18891 L: 18784 D: 34549 Ptnml(0-2): 336, 8437, 18332, 8798, 209 https://tests.stockfishchess.org/tests/view/6a3ae7913036e45021aeb4a0 Passed LTC: LLR: 2.94 (-2.94,2.94) <0.00,2.00> Total: 25110 W: 6566 L: 6288 D: 12256 Ptnml(0-2): 27, 2625, 6957, 2935, 11 https://tests.stockfishchess.org/tests/view/6a3b73513036e45021aeb51e Passed VLTC: LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 18544 W: 4924 L: 4658 D: 8962 Ptnml(0-2): 5, 1730, 5533, 2002, 2 https://tests.stockfishchess.org/tests/view/6a3bbe233036e45021aeb56e closes #6924 Bench: 2710209 Co-authored-by: Li Ying <121075683+yl25946@users.noreply.github.com> Co-authored-by: Viren6 <94880762+Viren6@users.noreply.github.com>
When the king is in check, a pseudo-legal move must be a valid evasion. Instead of duplicating the checking rules for king and non-king moves, we can leverage the existing MoveList<EVASIONS> class. Passed non-reg STC: LLR: 2.98 (-2.94,2.94) <-1.75,0.25> Total: 187360 W: 47579 L: 47524 D: 92257 Ptnml(0-2): 418, 20266, 52263, 20309, 424 https://tests.stockfishchess.org/tests/view/6a28aa5b7c758d82accea13c closes #6902 No functional change
The following have zero call sites repo-wide:
- `SearchManager::id`: never read, written, or initialized.
- `Search::Worker::elapsed_time()`: never called. PV output uses `tm.elapsed_time()` (TimeManager) directly. Its callers were removed in 25361e5.
- `MovePicker::begin()`/`end()`: unused private accessors. Their callers were removed in 8c2d21f.
closes #6909
No functional change
`do_null_move` copies the whole `StateInfo` from the previous state. This leaves `capturedPiece` holding the piece captured by the last real move, so inside the null-move subtree `captured_piece()` reports a stale capture.

The `priorCapture` consumers in search are all guarded by `prevSq != SQ_NONE` or `(ss-1)->currentMove.is_ok()`, and both conditions are false at the null-move child. However, the stalemate verification gate in qsearch reads `captured_piece()` unguarded, so the stale value can spuriously trigger it. Clear the field, since a null move captures nothing.

Passed non-regression STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 82784 W: 21172 L: 21011 D: 40601
Ptnml(0-2): 194, 8976, 22923, 9073, 226
https://tests.stockfishchess.org/tests/view/6a2b5b356b4aa63ddbf31518

Passed non-regression LTC:
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 118134 W: 30007 L: 29891 D: 58236
Ptnml(0-2): 66, 11856, 35108, 11970, 67
https://tests.stockfishchess.org/tests/view/6a2c4c9c0d5d4b19d0805301

closes #6910

No functional change
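The fix can be shown with a heavily simplified sketch. The `StateInfo`, `Piece`, and `ToyPosition` types below are hypothetical toys that keep only the fields needed here; this is not Stockfish's actual `do_null_move`. The sketch shows why copying the previous state inherits `capturedPiece`, and where the reset belongs.

```cpp
// Hypothetical, heavily simplified stand-ins for the engine's types.
enum Piece { NO_PIECE, W_PAWN, W_KNIGHT };

struct StateInfo {
    Piece      capturedPiece;
    StateInfo* previous;
};

struct ToyPosition {
    StateInfo* st;

    // A null move starts its new state as a full copy of the current one,
    // so without the reset below, capturedPiece would be inherited.
    void do_null_move(StateInfo& newSt) {
        newSt          = *st;
        newSt.previous = st;
        st             = &newSt;
        st->capturedPiece = NO_PIECE;  // the fix: a null move captures nothing
    }

    void undo_null_move() { st = st->previous; }

    Piece captured_piece() const { return st->capturedPiece; }
};
```

After `undo_null_move()`, the parent state is restored untouched. The reset therefore affects only the null-move subtree, which is consistent with the "No functional change" note above for the guarded consumers.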
Passed simplification STC:
LLR: 2.93 (-2.94,2.94) <-1.75,0.25>
Total: 206368 W: 52592 L: 52560 D: 101216
Ptnml(0-2): 537, 24201, 53669, 24247, 530
https://tests.stockfishchess.org/tests/view/6a2c5ab00d5d4b19d0805326

Passed simplification LTC:
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 109944 W: 27839 L: 27709 D: 54396
Ptnml(0-2): 61, 11938, 30844, 12068, 61
https://tests.stockfishchess.org/tests/view/6a2dc6e70d5d4b19d08053aa

closes #6913

Bench: 2767133
Deduplicate color-specific piece validation. The checks on the number of pawns and additional promoted pieces are duplicated for WHITE and BLACK. Combine this logic into a single range-based for loop over both colors.

closes #6922

No functional change
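A minimal sketch of the deduplicated check follows. It assumes a hypothetical per-color piece-count table and is not the exact Stockfish code. The "extra pieces" rule is also an assumption: it uses the common legality heuristic that any pieces beyond the starting set must be covered by promoted pawns.

```cpp
#include <algorithm>
#include <array>
#include <initializer_list>

enum Color { WHITE, BLACK, COLOR_NB };
enum PieceType { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, PIECE_TYPE_NB };

// Hypothetical piece counts indexed by [color][piece type].
using PieceCounts = std::array<std::array<int, PIECE_TYPE_NB>, COLOR_NB>;

// Validate pawn and promoted-piece counts once per color,
// instead of writing the same checks separately for WHITE and BLACK.
bool counts_ok(const PieceCounts& cnt) {
    for (Color c : {WHITE, BLACK})
    {
        const auto& n = cnt[c];
        if (n[PAWN] > 8)
            return false;

        // Pieces beyond the initial set must have come from promotions,
        // and each promotion consumed one of the 8 starting pawns.
        int extra = std::max(0, n[KNIGHT] - 2) + std::max(0, n[BISHOP] - 2)
                  + std::max(0, n[ROOK] - 2) + std::max(0, n[QUEEN] - 1);

        if (extra > 8 - n[PAWN])
            return false;
    }
    return true;
}
```

Looping over `{WHITE, BLACK}` keeps one copy of each rule. Any future change to the validation then applies to both sides automatically.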
closes #6926

No functional change
Add `test80-2024-01-jan-2tb7p.min-v2.v6.relabel.binpack` to the distillation fine-tuning stage, an additional 3.5B (2.9B non-skipped) positions.

nettest PR: vondele/nettest#375

Passed STC (vs #6924):
LLR: 2.94 (-2.94,2.94) <0.00,2.00>
Total: 57952 W: 15000 L: 14656 D: 28296
Ptnml(0-2): 164, 6651, 15003, 6993, 165
https://tests.stockfishchess.org/tests/view/6a3cca103036e45021aeb6f8

Passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 81456 W: 21265 L: 20858 D: 39333
Ptnml(0-2): 52, 8630, 22958, 9035, 53
https://tests.stockfishchess.org/tests/view/6a3dbe203036e45021aeb828

closes #6929

Bench: 2703604
This network is trained by adding the following binpacks, relabeled by @vondele and totaling 74B positions, to the distillation fine-tuning stage:

```
- vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_0.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_1.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_2.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_3.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_1_relabel/leela96-filt-v2.min.split_4.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_0.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_1.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_2.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_3.relabel-BT4-tf13tune.binpack
- vondele/from_kaggle_2_relabel/T60T70wIsRightFarseerT60T74T75T76.split_4.relabel-BT4-tf13tune.binpack
```

The relabeling effort was completed while this patch was being tested, and a full training run is underway. Thanks to @vondele, @anematode, @Disservin, @yl25946, and everyone who contributed to the process.

Passed STC:
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 24512 W: 6500 L: 6201 D: 11811
Ptnml(0-2): 69, 2772, 6300, 3021, 94
https://tests.stockfishchess.org/tests/view/6a40b7083036e45021aebbd6

Passed LTC:
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 15708 W: 4232 L: 3960 D: 7516
Ptnml(0-2): 8, 1572, 4420, 1848, 6
https://tests.stockfishchess.org/tests/view/6a413d293036e45021aebc84

nettest PR: vondele/nettest#388

closes #6932

Bench: 2102535

Co-authored-by: Joost VandeVondele <Joost.VandeVondele@gmail.com>
Simplify the weighted ratio between the PSQT and positional components of the evaluation to a plain sum.

Passed Nonreg STC:
https://tests.stockfishchess.org/tests/view/6a3eaac63036e45021aeb937
LLR: 2.94 (-2.94,2.94) <-1.75,0.25>
Total: 207392 W: 53521 L: 53489 D: 100382
Ptnml(0-2): 585, 24412, 53748, 24288, 663

Passed Nonreg LTC:
https://tests.stockfishchess.org/tests/view/6a423666f97ff95f7879508e
LLR: 2.95 (-2.94,2.94) <-1.75,0.25>
Total: 27198 W: 7200 L: 6989 D: 13009
Ptnml(0-2): 12, 2794, 7779, 2999, 15

closes #6934

Bench: 2067208
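The arithmetic behind this simplification can be sketched as follows. Several details are assumptions, not the patch's actual code: the weights `wPsqt` and `wPositional` are hypothetical parameters, not the constants the patch removed, and the 1/128 scaling is assumed for illustration. The point is that the weighted form and the plain sum agree when both weights equal the divisor.

```cpp
// Before (sketch): a weighted ratio of the two terms, scaled by 1/128.
// wPsqt and wPositional are hypothetical parameters, not the patch's constants.
constexpr int blend_weighted(int psqt, int positional, int wPsqt, int wPositional) {
    return (wPsqt * psqt + wPositional * positional) / 128;
}

// After: a plain sum of the two terms.
constexpr int blend_sum(int psqt, int positional) {
    return psqt + positional;
}

// With both weights equal to the divisor, the weighted form reduces exactly to the sum.
static_assert(blend_weighted(100, -40, 128, 128) == blend_sum(100, -40),
              "equal weights of 128 reproduce the plain sum");
```

Dropping the weights removes two tunable parameters and one integer division. The non-regression tests above indicate that this costs no measurable strength.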