Closed
60 commits
f4bd736
Windows asm/ABI and runtime fixes
hoffmang9 Feb 7, 2026
e7f37d7
CMake: Windows + perf experiment options
hoffmang9 Feb 7, 2026
003ecf3
CI: perf experiments and Windows benchmarking
hoffmang9 Feb 7, 2026
b1d9057
ci: streamline test workflow
hoffmang9 Feb 7, 2026
71121ab
build: remove perf-only CMake toggles
hoffmang9 Feb 7, 2026
679a387
core: gate AVX logging behind env flag
hoffmang9 Feb 7, 2026
70c50ac
asm: document Windows ABI restore
hoffmang9 Feb 7, 2026
e89c805
docs: remove Windows perf notes
hoffmang9 Feb 7, 2026
980b697
Nit note in README and clean up some language in test.yaml
hoffmang9 Feb 7, 2026
d04e5fb
Fix step name
hoffmang9 Feb 7, 2026
ad87252
various workflow clean ups and fix AVX512 flag
hoffmang9 Feb 9, 2026
5a7ee42
fix windows runners in test.yaml
hoffmang9 Feb 9, 2026
880d11f
more cmake instruction issues
hoffmang9 Feb 9, 2026
f3a5229
harden finding boost on Mac intel
hoffmang9 Feb 9, 2026
d713c5e
now cmake isn't always there...
hoffmang9 Feb 9, 2026
8fc91e6
more install cmake
hoffmang9 Feb 9, 2026
313c353
fix ASAN, remove TSAN for windows runners
hoffmang9 Feb 9, 2026
a9c401f
initiate windows dev env on ASAN runner
hoffmang9 Feb 9, 2026
e4d821e
more ASAN windows issues - harden brew handling in cibuildwheel
hoffmang9 Feb 9, 2026
7df51ce
giving up on ASAN
hoffmang9 Feb 9, 2026
f0e2ce6
Add required linker flag for Windows asm - fix unchecked vector access
hoffmang9 Feb 9, 2026
dc36ee8
potentially fix bug in the 2weso test
hoffmang9 Feb 9, 2026
a72c490
consolidate 2weso fail-hunting phases 40-43
hoffmang9 Feb 11, 2026
78092a2
2weso fail hunting 44
hoffmang9 Feb 11, 2026
5f9df6c
2weso fail hunting 45
hoffmang9 Feb 11, 2026
3a5dae4
2weso fail hunting 46
hoffmang9 Feb 11, 2026
5b74a88
cleanup: strip 2weso debug scaffolding after Windows root-cause isola…
hoffmang9 Feb 11, 2026
b364b60
ci - add an asm path to the c++ test path, fix HW headers
hoffmang9 Feb 11, 2026
651ee88
ci - bring windows back around to main ubuntu/macos testing
hoffmang9 Feb 11, 2026
34223d3
address cursor review issues
hoffmang9 Feb 11, 2026
ba2c55c
everything on windows should take the asm path
hoffmang9 Feb 11, 2026
e5db303
something about the windows asm path is broken
hoffmang9 Feb 11, 2026
f9b2efa
and we are off again searching for the asm issue
hoffmang9 Feb 11, 2026
14bf670
cache all the things and chase the asm issue
hoffmang9 Feb 11, 2026
52e9b56
look closer at the ubuntu vs windows asm changes
hoffmang9 Feb 11, 2026
bde8394
Consolidate Windows asm investigation commits 1-17.
hoffmang9 Feb 11, 2026
0e07a15
Fix Windows asm CI and align optimized test coverage (#304)
hoffmang9 Feb 12, 2026
9e88b65
cleanup: finalize Windows asm follow-ups and include hygiene
hoffmang9 Feb 12, 2026
3f22222
fix windows avx512 add table rip-relative access
hoffmang9 Feb 12, 2026
73f2585
Remove unused max_test_iteration in 2weso_test.
hoffmang9 Feb 12, 2026
5645703
fix gcd_unsigned compare against end_index
hoffmang9 Feb 12, 2026
afcd784
Harden callback state synchronization and update build/runtime tuning…
hoffmang9 Feb 12, 2026
c3e10d2
Guard POSIX-only emulator hardware targets on Windows.
hoffmang9 Feb 12, 2026
16b5466
Initialize FastAlgorithmCallback forms_capacity.
hoffmang9 Feb 12, 2026
96abd39
Harden macOS/Windows gcd_unsigned dispatch bounds check.
hoffmang9 Feb 12, 2026
f5a7eef
Define CHIAOSX for macOS CMake asm generation.
hoffmang9 Feb 12, 2026
01e8da9
Refine Windows asm argument handling and remove unused SEH include.
hoffmang9 Feb 12, 2026
cd7e60e
Preserve non-Windows asm behavior and gate fallback logging.
hoffmang9 Feb 13, 2026
80a6820
Reply to Opus review with dispatch, bounds, and const-correctness fixes.
hoffmang9 Feb 13, 2026
f7fd57e
Avoid synchronous fallback in TwoWesolowski prover start.
hoffmang9 Feb 13, 2026
46bd58e
Gate AVX512 IFMA dispatch on OS XSAVE/XCR0 state.
hoffmang9 Feb 13, 2026
b3508d0
Narrow test workflow to three optimized runners and exercise forced A…
hoffmang9 Feb 13, 2026
e006351
Temporarily disable all CI jobs and tighten AVX diagnostics.
hoffmang9 Feb 13, 2026
bf8f039
Re-enable test workflow on the original three optimized runners.
hoffmang9 Feb 13, 2026
2d18128
Restore all optimized test runners, including Windows.
hoffmang9 Feb 13, 2026
04f40ca
Use lock-free publication for two-weso forms and stop forcing AVX2.
hoffmang9 Feb 13, 2026
7208595
Merge main into avx2-force-ci-minimal-runners
hoffmang9 Feb 13, 2026
c5ae28e
Restore vdf_bench benchmark depth in optimized CI.
hoffmang9 Feb 14, 2026
31a8226
Add temporary Windows perf investigation instrumentation.
hoffmang9 Feb 14, 2026
2c93d8e
Fix PowerShell interpolation in Windows perf harness.
hoffmang9 Feb 14, 2026
1 change: 1 addition & 0 deletions .github/workflows/build-c-libraries.yml
@@ -21,6 +21,7 @@ permissions:

jobs:
build-c-libraries:
if: ${{ false }}
name: C Libraries - ${{ matrix.os.name }} ${{ matrix.arch.name }}
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}
strategy:
2 changes: 2 additions & 0 deletions .github/workflows/build-riscv64.yml
@@ -21,6 +21,7 @@ permissions:

jobs:
build_wheels:
if: ${{ false }}
name: ${{ matrix.os.emoji }} 📦 Build ${{ matrix.python.major-dot-minor }}
runs-on: ${{ matrix.os.runs-on }}
strategy:
@@ -80,6 +81,7 @@ jobs:
path: ./dist
if-no-files-found: error
upload:
if: ${{ false }}
name: Upload to Chia PyPI
runs-on: ubuntu-latest
needs:
4 changes: 4 additions & 0 deletions .github/workflows/build.yml
@@ -21,6 +21,7 @@ permissions:

jobs:
build-wheels:
if: ${{ false }}
name: Wheel - ${{ matrix.os.name }} ${{ matrix.python.major-dot-minor }} ${{ matrix.arch.name }}
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}
strategy:
@@ -123,6 +124,7 @@ jobs:
path: ./dist

build-sdist:
if: ${{ false }}
name: sdist - ${{ matrix.os.name }} ${{ matrix.python.major-dot-minor }} ${{ matrix.arch.name }}
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}
strategy:
@@ -166,6 +168,7 @@ jobs:
path: ./dist

check:
if: ${{ false }}
name: Check - ${{ matrix.os.name }} ${{ matrix.python.major-dot-minor }} ${{ matrix.arch.name }}
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}
strategy:
@@ -208,6 +211,7 @@ jobs:
mypy --config-file mypi.ini setup.py tests

upload:
if: ${{ false }}
name: Upload to PyPI - ${{ matrix.os.name }} ${{ matrix.python.major-dot-minor }} ${{ matrix.arch.name }}
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}
needs:
1 change: 1 addition & 0 deletions .github/workflows/check-commit-signing.yml
@@ -16,6 +16,7 @@ concurrency:

jobs:
check-commit-signing:
if: ${{ false }}
name: Check commit signing
runs-on: [ubuntu-latest]
timeout-minutes: 5
1 change: 1 addition & 0 deletions .github/workflows/codeql-analysis.yml
@@ -22,6 +22,7 @@ on:

jobs:
analyze:
if: ${{ false }}
name: Analyze
runs-on: ubuntu-latest
permissions:
2 changes: 1 addition & 1 deletion .github/workflows/dependency-review.yml
@@ -13,7 +13,7 @@ permissions:

jobs:
dependency-review:
if: github.repository_owner == 'Chia-Network'
if: ${{ false }}
runs-on: ubuntu-latest
steps:
- name: "Checkout Repository"
1 change: 1 addition & 0 deletions .github/workflows/hw-build.yml
@@ -22,6 +22,7 @@ permissions:

jobs:
build-hw:
if: ${{ false }}
name: Build HW VDF Client
runs-on: [ubuntu-22.04]
steps:
4 changes: 4 additions & 0 deletions .github/workflows/rust.yml
@@ -20,6 +20,7 @@ permissions:

jobs:
fuzz_targets:
if: ${{ false }}
name: Run fuzzers (${{ matrix.target }})
runs-on: ubuntu-latest
env:
@@ -59,6 +60,7 @@ jobs:
cargo +nightly fuzz run ${{ matrix.target }} -- -max_total_time=600

lint:
if: ${{ false }}
name: Lint
runs-on: ubuntu-latest
steps:
@@ -76,6 +78,7 @@ jobs:
run: cargo clippy

test:
if: ${{ false }}
name: Test (${{ matrix.os.name }} ${{ matrix.arch.name }})
runs-on: ${{ matrix.os.runs-on[matrix.arch.matrix] }}

@@ -161,6 +164,7 @@ jobs:
run: cargo test && cargo test --release

build_crate:
if: ${{ false }}
name: Build crate
needs: [lint, test]
runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions .github/workflows/stale-issue.yml
@@ -6,6 +6,7 @@ on:

jobs:
stale:
if: ${{ false }}
runs-on: ubuntu-latest
steps:
- uses: chia-network/stale@main
106 changes: 84 additions & 22 deletions .github/workflows/test.yaml
@@ -19,13 +19,15 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [macos-13-intel, macos-13-arm64, ubuntu-latest, windows-latest]
config: [optimized=1, TSAN=1, ASAN=1]
exclude:
include:
- os: macos-13-intel
config: optimized=1
- os: macos-13-arm64
config: optimized=1
- os: ubuntu-latest
config: optimized=1
- os: windows-latest
config: ASAN=1
- os: windows-latest
config: TSAN=1
config: optimized=1

steps:
- name: Checkout code
@@ -193,14 +195,11 @@ jobs:
cd src
echo "Running 1weso_test"
./1weso_test
echo "Running 2weso_test"
./2weso_test
# PERF_INVESTIGATION_TEMP: keep non-target tests at smoke level for faster perf investigation turnaround.
echo "Running 2weso_test with 10 iterations (smoke)"
./2weso_test 10
echo "Running prover_test"
if [[ "${{ matrix.os }}" == ubuntu* ]]; then
./prover_test
else
CHIAVDF_PROVER_TEST_FAST=1 ./prover_test
fi
CHIAVDF_PROVER_TEST_FAST=1 ./prover_test

- name: Run vdf tests (short)
if: matrix.config != 'optimized=1' && !startsWith(matrix.os, 'windows')
@@ -246,8 +245,9 @@ jobs:
}
Write-Host "Running 1weso_test"
Invoke-TestExe "1weso_test"
Write-Host "Running 2weso_test"
Invoke-TestExe "2weso_test"
# PERF_INVESTIGATION_TEMP: keep non-target tests at smoke level for faster perf investigation turnaround.
Write-Host "Running 2weso_test with 10 iterations (smoke)"
Invoke-TestExe "2weso_test" @("10")
Write-Host "Running prover_test"
$env:CHIAVDF_PROVER_TEST_FAST = "1"
Invoke-TestExe "prover_test"
@@ -256,20 +256,82 @@ jobs:
if: matrix.config == 'optimized=1' && !startsWith(matrix.os, 'windows')
run: |
cd src
echo "Benchmarking vdf_bench with 2,000,000 iterations of square_asm"
./vdf_bench square_asm 2000000
# PERF_INVESTIGATION_TEMP: lower non-target benchmark load to speed overall CI turnaround.
echo "Benchmarking vdf_bench with 250,000 iterations of square_asm"
CHIAVDF_PERF_TRACE=1 ./vdf_bench square_asm 250000

- name: Benchmark vdf_bench square (Windows)
- name: Benchmark vdf_bench square (Windows, perf investigation)
if: matrix.config == 'optimized=1' && startsWith(matrix.os, 'windows')
shell: pwsh
run: |
# PERF_INVESTIGATION_TEMP: repeated benchmark harness for Windows regression isolation.
cd build
$dllPaths = @()
if ($env:MPIR_ROOT -and (Test-Path "$env:MPIR_ROOT\bin")) { $dllPaths += "$env:MPIR_ROOT\bin" }
if (Test-Path "$env:GITHUB_WORKSPACE\mpir_gc_x64") { $dllPaths += "$env:GITHUB_WORKSPACE\mpir_gc_x64" }
if ($dllPaths.Count -gt 0) { $env:PATH = ($dllPaths -join ';') + ';' + $env:PATH }
Write-Host "Benchmarking vdf_bench with 2,000,000 iterations of square_asm"
& .\vdf_bench.exe square_asm 2000000
Write-Host "vdf_bench exit code: $LASTEXITCODE"
if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }
$env:CHIAVDF_PERF_TRACE = "1"

Write-Host "PERF_INVESTIGATION_TEMP warmup: square_asm 10000"
& .\vdf_bench.exe square_asm 10000
if ($LASTEXITCODE -ne 0) {
Write-Host "warmup exit code: $LASTEXITCODE"
exit $LASTEXITCODE
}

$repetitions = 5
$iterations = 1000000
$ipsValues = @()
$metricLines = @()
$metricsPath = Join-Path $env:RUNNER_TEMP "PERF_INVESTIGATION_TEMP_windows_metrics.txt"
"PERF_INVESTIGATION_TEMP begin" | Out-File -FilePath $metricsPath -Encoding utf8
for ($i = 1; $i -le $repetitions; $i++) {
Write-Host "PERF_INVESTIGATION_TEMP run $i/${repetitions}: square_asm $iterations"
$runOutput = & .\vdf_bench.exe square_asm $iterations 2>&1
$exitCode = $LASTEXITCODE
$runOutput | ForEach-Object { Write-Host $_ }
if ($exitCode -ne 0) {
Write-Host "vdf_bench run $i exit code: $exitCode"
exit $exitCode
}
$metricLine = $runOutput | Select-String -Pattern 'PERF_INVESTIGATION_TEMP mode=square_asm .*ips=([0-9]+(?:\.[0-9]+)?)' | Select-Object -Last 1
if (-not $metricLine) {
throw "Missing PERF_INVESTIGATION_TEMP metric line in run $i output"
}
$metricText = $metricLine.ToString()
$metricLines += $metricText
$metricText | Out-File -FilePath $metricsPath -Encoding utf8 -Append
$ips = [double]$metricLine.Matches[0].Groups[1].Value
$ipsValues += $ips
}

$avg = ($ipsValues | Measure-Object -Average).Average
$min = ($ipsValues | Measure-Object -Minimum).Minimum
$max = ($ipsValues | Measure-Object -Maximum).Maximum
$variance = 0.0
foreach ($v in $ipsValues) {
$delta = $v - $avg
$variance += $delta * $delta
}
$stddev = [Math]::Sqrt($variance / $ipsValues.Count)
$joinedIps = ($ipsValues | ForEach-Object { "{0:N3}" -f $_ }) -join ", "

Write-Host ("PERF_INVESTIGATION_TEMP summary runs={0} iterations={1} ips_values=[{2}] avg={3:N3} stddev={4:N3} min={5:N3} max={6:N3}" -f $repetitions, $iterations, $joinedIps, $avg, $stddev, $min, $max)
("PERF_INVESTIGATION_TEMP summary runs={0} iterations={1} ips_values=[{2}] avg={3:N3} stddev={4:N3} min={5:N3} max={6:N3}" -f $repetitions, $iterations, $joinedIps, $avg, $stddev, $min, $max) | Out-File -FilePath $metricsPath -Encoding utf8 -Append
"## PERF_INVESTIGATION_TEMP Windows square_asm`n" | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- runs: {0}" -f $repetitions) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- iterations_per_run: {0}" -f $iterations) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- ips_values: {0}" -f $joinedIps) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- avg_ips: {0:N3}" -f $avg) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- stddev_ips: {0:N3}" -f $stddev) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- min_ips: {0:N3}" -f $min) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append
("- max_ips: {0:N3}" -f $max) | Out-File -FilePath $env:GITHUB_STEP_SUMMARY -Encoding utf8 -Append

- name: Upload Windows perf investigation metrics
if: matrix.config == 'optimized=1' && startsWith(matrix.os, 'windows')
uses: actions/upload-artifact@v4
with:
name: PERF_INVESTIGATION_TEMP-windows-square-asm-metrics
path: ${{ runner.temp }}/PERF_INVESTIGATION_TEMP_windows_metrics.txt
if-no-files-found: error
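
The summary arithmetic in the Windows harness above (average, minimum, maximum, and a standard deviation that divides by the number of runs rather than by runs minus one) can be checked outside PowerShell. The following is a minimal C++ sketch of the same arithmetic, not code from this PR; `Summary` and `summarize` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical container for the statistics the harness reports.
struct Summary {
    double avg;
    double stddev;  // population standard deviation (divides by N, not N - 1)
    double min;
    double max;
};

// Mirrors the harness arithmetic: accumulate squared deltas from the mean,
// divide by the sample count, then take the square root.
Summary summarize(const std::vector<double>& ips_values) {
    if (ips_values.empty()) {
        throw std::invalid_argument("summarize: no samples");
    }
    const double count = static_cast<double>(ips_values.size());

    double sum = 0.0;
    for (double v : ips_values) {
        sum += v;
    }
    const double avg = sum / count;

    double variance = 0.0;
    for (double v : ips_values) {
        const double delta = v - avg;
        variance += delta * delta;
    }
    const double stddev = std::sqrt(variance / count);

    const double min = *std::min_element(ips_values.begin(), ips_values.end());
    const double max = *std::max_element(ips_values.begin(), ips_values.end());
    return Summary{avg, stddev, min, max};
}
```

With only five repetitions, dividing by N gives a slightly smaller spread than the sample (N - 1) estimator would, which may matter when comparing runs across runners.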

1 change: 0 additions & 1 deletion README.md
@@ -72,7 +72,6 @@ AVX runtime flags:

- `CHIAVDF_LOG_AVX=1`: emit AVX detection logs at startup
- `CHIA_DISABLE_AVX2=1`: disable AVX2 path even when supported
- `CHIA_FORCE_AVX2=1`: force AVX2 path
- `CHIA_DISABLE_AVX512_IFMA=1`: disable AVX-512 IFMA path
- `CHIA_ENABLE_AVX512_IFMA=1`: enable AVX-512 IFMA path when CPUID support is present
- `CHIA_FORCE_AVX512_IFMA=1`: force AVX-512 IFMA path
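
The flags listed above are environment variables set to `1`. How chiavdf parses them is not shown in this diff, so the following is only a sketch of one common way to read such a flag in C++. `env_flag_enabled` is a hypothetical helper, and treating only the exact string `1` as enabled is an assumption.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the PR): returns true only when the named
// environment variable exists and is set to exactly "1".
bool env_flag_enabled(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

A caller could, for example, query `env_flag_enabled("CHIAVDF_LOG_AVX")` once at startup and cache the result. The precedence between the disable, enable, and force flags is not covered by this sketch.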
50 changes: 31 additions & 19 deletions src/callback.h
@@ -6,7 +6,6 @@
#include <algorithm>
#include <atomic>
#include <limits>
#include <mutex>
#include <stdexcept>

// Applies to n-weso.
@@ -134,32 +133,44 @@ class TwoWesolowskiCallback: public WesolowskiCallback {
forms_capacity = space_needed;
forms.reset(new form[space_needed]);
forms[0] = f;
kl = 10;
switch_iters = -1;
switch_iters.store(0, std::memory_order_relaxed);
switch_index.store(0, std::memory_order_relaxed);
large_constants.store(false, std::memory_order_relaxed);
// forms[0] is valid immediately at construction.
max_published_power.store(0, std::memory_order_relaxed);
}

void IncreaseConstants(uint64_t num_iters) {
std::lock_guard<std::mutex> lk(forms_mutex);
kl = 100;
switch_iters = num_iters;
switch_index = num_iters / 10;
// Publish the switch in a single direction: false -> true.
switch_iters.store(num_iters, std::memory_order_release);
switch_index.store(num_iters / 10, std::memory_order_release);
large_constants.store(true, std::memory_order_release);
}

int GetPosition(uint64_t power) {
std::lock_guard<std::mutex> lk(forms_mutex);
return GetPositionUnlocked(power);
}

bool IsPublished(uint64_t power) const {
return max_published_power.load(std::memory_order_acquire) >= power;
}

int GetPositionUnlocked(uint64_t power) const {
if (switch_iters == -1 || power < switch_iters) {
if (!large_constants.load(std::memory_order_acquire)) {
return power / 10;
} else {
return (switch_index + (power - switch_iters) / 100);
}
const uint64_t switch_iters_local = switch_iters.load(std::memory_order_acquire);
if (power < switch_iters_local) {
return power / 10;
}
const uint64_t switch_index_local = switch_index.load(std::memory_order_acquire);
return (switch_index_local + (power - switch_iters_local) / 100);
}

form GetFormCopy(uint64_t power) {
std::lock_guard<std::mutex> lk(forms_mutex);
if (!IsPublished(power)) {
throw std::runtime_error("TwoWesolowskiCallback::GetFormCopy not yet published");
}
const int pos = GetPositionUnlocked(power);
if (pos < 0 || static_cast<size_t>(pos) >= forms_capacity) {
throw std::runtime_error("TwoWesolowskiCallback::GetFormCopy out of bounds");
@@ -168,28 +179,29 @@ class TwoWesolowskiCallback: public WesolowskiCallback {
}

bool LargeConstants() {
std::lock_guard<std::mutex> lk(forms_mutex);
return kl == 100;
return large_constants.load(std::memory_order_acquire);
}

void OnIteration(int type, void *data, uint64_t iteration) {
iteration++;
std::lock_guard<std::mutex> lk(forms_mutex);
const uint32_t kl = large_constants.load(std::memory_order_acquire) ? 100 : 10;
if (iteration % kl == 0) {
const int pos = GetPositionUnlocked(iteration);
if (pos < 0 || static_cast<size_t>(pos) >= forms_capacity) {
throw std::runtime_error("TwoWesolowskiCallback::OnIteration out of bounds");
}
form* mulf = &forms[static_cast<size_t>(pos)];
SetForm(type, data, mulf);
// Publish this completed checkpoint after writing the form data.
max_published_power.store(iteration, std::memory_order_release);
}
}

private:
uint64_t switch_index;
int64_t switch_iters;
uint32_t kl;
std::mutex forms_mutex;
std::atomic<uint64_t> switch_index;
std::atomic<uint64_t> switch_iters;
std::atomic<bool> large_constants;
std::atomic<uint64_t> max_published_power;
};
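
The index arithmetic in `GetPositionUnlocked` is easier to follow with concrete numbers. Below is a minimal sketch that restates it as a free function. `checkpoint_position` and its parameter list are hypothetical; the constants 10 and 100 and the `switch_index = num_iters / 10` relationship are taken from the diff above.

```cpp
#include <cstdint>

// Hypothetical free-function restatement of the checkpoint index mapping.
// Before the switch, one checkpoint is stored every 10 iterations; from
// switch_iters onward, one every 100, offset by switch_index.
std::uint64_t checkpoint_position(std::uint64_t power,
                                  bool large_constants,
                                  std::uint64_t switch_iters,
                                  std::uint64_t switch_index) {
    if (!large_constants || power < switch_iters) {
        return power / 10;
    }
    return switch_index + (power - switch_iters) / 100;
}
```

For example, if `IncreaseConstants(1000)` has been called, `switch_iters` is 1000 and `switch_index` is 100. Power 990 then maps to slot 99, power 1000 to slot 100, and power 1500 to slot 105, so slot numbering stays contiguous across the switch.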

class FastAlgorithmCallback : public WesolowskiCallback {
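
The change to `TwoWesolowskiCallback` replaces the mutex with a publish-then-read scheme: the writer fills a slot and then advances `max_published_power` with a release store, and a reader checks that value with an acquire load before touching the slot. The sketch below shows that pattern in isolation. `CheckpointStore` is a hypothetical class, not the PR's code, and it assumes a single writer that publishes strictly increasing indices and never rewrites a slot once it is published.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>

// Hypothetical single-writer checkpoint store illustrating release/acquire
// publication. Slot 0 is treated as valid from construction, as in the diff.
class CheckpointStore {
public:
    explicit CheckpointStore(std::size_t capacity)
        : slots_(new std::uint64_t[capacity]()), capacity_(capacity) {}

    // Writer side: write the slot first, then publish the index. The release
    // store orders the slot write before the index becomes visible.
    void publish(std::size_t index, std::uint64_t value) {
        if (index >= capacity_) {
            throw std::out_of_range("CheckpointStore::publish out of bounds");
        }
        slots_[index] = value;
        max_published_.store(index, std::memory_order_release);
    }

    // Reader side: the acquire load pairs with the writer's release store.
    bool is_published(std::size_t index) const {
        return max_published_.load(std::memory_order_acquire) >= index;
    }

    // Refuses to read a slot that has not been published yet.
    std::uint64_t read(std::size_t index) const {
        if (!is_published(index)) {
            throw std::runtime_error("CheckpointStore::read not yet published");
        }
        if (index >= capacity_) {
            throw std::out_of_range("CheckpointStore::read out of bounds");
        }
        return slots_[index];
    }

private:
    std::unique_ptr<std::uint64_t[]> slots_;
    std::size_t capacity_;
    std::atomic<std::size_t> max_published_{0};
};
```

The scheme relies on the single-writer assumption: with more than one writer, or with slots that can be rewritten after publication, a reader could observe a partially written slot, and a lock or a per-slot sequence counter would be needed instead.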