
Commit ad2d771

ChanceSiyuan, claude, and GiggleLiu authored

Add noisy circuit dataset and documentation (Issue #12) (#25)
* Add noisy circuit dataset for BP decoding demonstration
  - Add Stim circuits for rotated surface code (d=3) memory experiments with circuit-level depolarizing noise (p=0.01) at 3, 5, 7 rounds
  - Add generation script (scripts/generate_noisy_circuits.py)
  - Add comprehensive README with BP decoding tutorial and examples
  - Add visualization images (qubit layout, parity check matrix, syndrome stats)
  - Update .gitignore to exclude .venv/

* Refactor to proper Python package structure
  - Convert scripts/ to src/bpdecoderplus/ package following Python best practices
  - Add pyproject.toml with uv/hatchling build system and dependencies
  - Add comprehensive test suite (32 tests) for circuit.py and cli.py
  - Update .gitignore with Python-specific patterns
  - Update README to use the new CLI entry point via uv
  Addresses PR feedback from @GiggleLiu.

* Add Makefile and uv support for automated workflow
  - Add Makefile with targets for install, setup, generate-dataset, test, and clean
  - Update pyproject.toml with uv dev-dependencies configuration
  - Addresses issue #12 requirements for automation and uv package management

* Add GitHub Actions CI/CD workflow for automated testing
  - Add test.yml workflow to run tests on push and PR
  - Test on Python 3.10, 3.11, and 3.12
  - Use uv for dependency management in CI
  - Addresses PR #14 review comment for CI/CD setup

* Add test coverage reporting and README badges
  - Update CI workflow to generate coverage reports with pytest-cov
  - Upload coverage to Codecov for tracking
  - Add test status and coverage badges to README
  - Add `make test-cov` target for local coverage reports
  - Update .gitignore to exclude coverage files

* Fix CI: allow uv cache without lock file
  Set ignore-nothing-to-cache to true to allow CI to proceed when uv.lock is not present in the repository.

* Fix CI: disable uv caching
  Remove enable-cache to avoid the lock file requirement. Caching can be re-enabled later with a proper lock file.

* Remove PNG visualization files from dataset
  - Delete all PNG files (layout, parity check matrix, syndrome stats)
  - Update README to remove image references
  - Keep focus on circuit files and code examples

* Add syndrome database generation (Issue #5)
  - Add syndrome.py module for sampling and saving syndromes
  - Integrate syndrome generation into CLI with --generate-syndromes flag
  - Add comprehensive test suite for syndrome operations
  - Add make generate-syndromes target for easy database creation
  - Support npz format with metadata for efficient storage
  Features:
  - Sample detection events from circuits
  - Save/load syndrome databases with metadata
  - Generate databases directly from circuit files
  - CLI integration for automated workflow

* Add detector error model generation (Issue #4)
  - Add dem.py module for DEM extraction and manipulation
  - Extract DEM from circuits with decomposition support
  - Save/load DEMs in stim native format
  - Convert DEM to JSON for analysis
  - Build parity check matrix H for BP decoding
  - Integrate DEM generation into CLI with --generate-dem flag
  - Add comprehensive test suite for DEM operations
  - Add make generate-dem target
  Features:
  - Extract detector error models from circuits
  - Save in .dem format (stim native)
  - Export to JSON with structured error information
  - Build parity check matrix for the BP decoder
  - CLI integration for automated workflow

* Fix CI: accept bool dtype in syndrome tests
  Stim returns boolean arrays by default, not uint8. Update the test to accept both bool and uint8 dtypes.

* Add comprehensive syndrome dataset documentation
  - Add SYNDROME_DATASET.md with complete API documentation
  - Add validate_dataset.py for dataset generation and validation
  - Document data format, API interface, and validation checks
  - Include usage examples and statistics
  - Provide evidence of dataset validity

* Add minimum working example and pipeline illustration
  - Add minimal_example.py with complete end-to-end demonstration
  - Add PIPELINE_ILLUSTRATION.md with visual pipeline diagrams
  - Include detailed explanations of each step
  - Show data flow and file formats
  - Provide conceptual understanding of the pipeline

* Add getting started guide and demo dataset generator
  Rewrote PIPELINE_ILLUSTRATION.md as a practical getting-started guide focused on the data generation workflow. Added generate_demo_dataset.py to provide a working example that generates, validates, and saves a small syndrome dataset. These changes make it easier for new users to understand and use the package.

* Organize datasets into subdirectories and complete Issues #4 and #5
  This commit reorganizes the dataset structure and ensures proper file placement for circuits, DEMs, and syndromes.
  Changes:
  - Reorganize datasets/ into circuits/, dems/, and syndromes/ subdirectories
  - Update CLI default output to datasets/circuits/
  - Update DEM generation to save files in datasets/dems/
  - Update syndrome generation to save files in datasets/syndromes/
  - Fix test to reflect the new default output path
  - Add demo DEM and syndrome files for all three circuit variants
  Resolves #4: Detector error model generation now saves .dem files
  Resolves #5: Syndrome database generation now saves .npz files

* Add UAI format support for probabilistic inference (Issue #4)
  This commit adds support for generating UAI (Uncertainty in Artificial Intelligence) format files from detector error models, enabling probabilistic inference with tools like TensorInference.jl.
  Changes:
  - Add dem_to_uai() to convert a DEM to UAI format
  - Add save_uai() to save UAI files
  - Add generate_uai_from_circuit() for CLI integration
  - Add --generate-uai flag to CLI
  - Generate UAI files for all demo circuits
  - Add comprehensive test coverage for UAI functionality
  The UAI format represents the DEM as a Markov network where each detector is a binary variable, each error mechanism is a factor/clique, and factor tables encode error probabilities.
  Addresses #4

* Consolidate documentation into unified getting started guide
  This commit merges SYNDROME_DATASET.md and PIPELINE_ILLUSTRATION.md into a single comprehensive GETTING_STARTED.md guide in the examples folder.
  Changes:
  - Create examples/GETTING_STARTED.md with unified content
  - Add UAI format introduction for beginners
  - Update all file paths to reflect the new dataset organization
  - Remove redundant datasets/SYNDROME_DATASET.md
  - Remove redundant examples/PIPELINE_ILLUSTRATION.md
  The new guide provides quick start instructions, a step-by-step pipeline explanation, detailed format documentation (.stim, .dem, .uai, .npz), code examples for all use cases, and troubleshooting and best practices.

* Organize UAI files into separate datasets/uais/ directory
  This commit reorganizes the dataset structure to keep UAI files separate from DEM files for better organization.
  Changes:
  - Move .uai files from datasets/dems/ to datasets/uais/
  - Update generate_uai_from_circuit() to save in datasets/uais/
  - Update documentation to reflect the new folder structure
  - Update datasets/README.md with a dataset organization section
  - Update examples/GETTING_STARTED.md with correct paths
  Dataset structure:
  - datasets/circuits/ - Circuit files (.stim)
  - datasets/dems/ - Detector error models (.dem)
  - datasets/uais/ - UAI format files (.uai)
  - datasets/syndromes/ - Syndrome databases (.npz)
  All tests passing (62/62)

* Organize demonstration code into examples/ directory
  - Move generate_demo_dataset.py to examples/
  - Move validate_dataset.py to examples/
  - Update GETTING_STARTED.md with clarifications
  This keeps the root directory clean and groups all example/demo code in a dedicated folder for better project organization.

* Update settings.local.json to expand allowed Bash commands and modify syndrome dataset file

* Add a notebook

* Organize scripts into dedicated scripts/ directory
  Move script files from examples/ to scripts/: generate_demo_dataset.py and validate_dataset.py. This separates demonstration scripts from API usage examples.

* Set up MkDocs documentation with GitHub Pages deployment
  - Add mkdocs.yml configuration with Material theme
  - Create docs/index.md as main documentation page
  - Move GETTING_STARTED.md to docs/getting_started.md
  - Add GitHub Actions workflow for automatic deployment
  - Add docs dependencies to pyproject.toml
  - Add Makefile targets for building and serving docs
  Documentation will be available at GitHub Pages after merge to main.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: GiggleLiu <cacate0129@gmail.com>
1 parent b7a2a7f · commit ad2d771

31 files changed: 25,374 additions & 9 deletions

.github/workflows/docs.yml

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+name: Deploy Documentation
+
+on:
+  push:
+    branches:
+      - main
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install dependencies
+        run: |
+          pip install mkdocs-material mkdocstrings[python] pymdown-extensions
+
+      - name: Deploy to GitHub Pages
+        run: mkdocs gh-deploy --force

.gitignore

Lines changed: 1 addition & 1 deletion

@@ -38,4 +38,4 @@ uv.lock
 *.blg
 *.fdb_latexmk
 *.synctex.gz
-note/belief_propagation_qec_plan.pdf
+note/belief_propagation_qec_plan.pdf
+.claude/settings.local.json

Makefile

Lines changed: 25 additions & 7 deletions

@@ -1,13 +1,17 @@
-.PHONY: help install setup test test-cov generate-dataset clean
+.PHONY: help install setup test test-cov generate-dataset generate-dem generate-syndromes docs docs-serve clean
 
 help:
 	@echo "Available targets:"
-	@echo "  install          - Install uv package manager"
-	@echo "  setup            - Set up development environment with uv"
-	@echo "  generate-dataset - Generate noisy circuit dataset"
-	@echo "  test             - Run tests"
-	@echo "  test-cov         - Run tests with coverage report"
-	@echo "  clean            - Remove generated files and caches"
+	@echo "  install            - Install uv package manager"
+	@echo "  setup              - Set up development environment with uv"
+	@echo "  generate-dataset   - Generate noisy circuit dataset"
+	@echo "  generate-dem       - Generate detector error models"
+	@echo "  generate-syndromes - Generate syndrome database (1000 shots)"
+	@echo "  test               - Run tests"
+	@echo "  test-cov           - Run tests with coverage report"
+	@echo "  docs               - Build documentation"
+	@echo "  docs-serve         - Serve documentation locally"
+	@echo "  clean              - Remove generated files and caches"
 
 install:
 	@command -v uv >/dev/null 2>&1 || { \
@@ -21,12 +25,26 @@ setup: install
 generate-dataset:
 	uv run generate-noisy-circuits --distance 3 --p 0.01 --rounds 3 5 7 --task z --output datasets/noisy_circuits
 
+generate-dem:
+	uv run generate-noisy-circuits --distance 3 --p 0.01 --rounds 3 5 7 --task z --output datasets/noisy_circuits --generate-dem
+
+generate-syndromes:
+	uv run generate-noisy-circuits --distance 3 --p 0.01 --rounds 3 5 7 --task z --output datasets/noisy_circuits --generate-syndromes 1000
+
 test:
 	uv run pytest
 
 test-cov:
 	uv run pytest --cov=bpdecoderplus --cov-report=html --cov-report=term
 
+docs:
+	pip install mkdocs-material mkdocstrings[python] pymdown-extensions
+	mkdocs build
+
+docs-serve:
+	pip install mkdocs-material mkdocstrings[python] pymdown-extensions
+	mkdocs serve
+
 clean:
 	rm -rf .pytest_cache
 	rm -rf __pycache__

datasets/README.md

Lines changed: 222 additions & 0 deletions

@@ -0,0 +1,222 @@

# Noisy Circuit Dataset (Surface Code, d=3)

Circuit-level surface-code memory experiments generated with Stim for **Belief Propagation (BP) decoding** demonstrations.

## Dataset Organization

The dataset is organized into subdirectories by file type:

```
datasets/
├── circuits/    # Noisy quantum circuits (.stim)
├── dems/        # Detector error models (.dem)
├── uais/        # UAI format for probabilistic inference (.uai)
└── syndromes/   # Syndrome databases (.npz)
```
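The syndrome databases are plain NumPy `.npz` archives, so they can be inspected without knowing their internal layout. A minimal sketch is shown below; the filename follows the circuit naming scheme and is an assumption, so substitute whatever file `make generate-syndromes` actually produced.

```python
import numpy as np

# Inspect a syndrome database without assuming its internal key names.
# NOTE: the path below is hypothetical; use the .npz file you generated.
db = np.load("datasets/syndromes/sc_d3_r3_p0010_z.npz", allow_pickle=True)
for key in db.files:
    entry = db[key]
    print(key, getattr(entry, "shape", None), getattr(entry, "dtype", None))
```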
## Overview

| Parameter | Value |
|-----------|-------|
| Code | Rotated surface code |
| Distance | d = 3 |
| Noise model | i.i.d. depolarizing |
| Error rate | p = 0.01 |
| Task | Z-memory experiment |
| Rounds | 3, 5, 7 |

### Noise Application Points

- Clifford gates (`after_clifford_depolarization`)
- Data qubits between rounds (`before_round_data_depolarization`)
- Resets (`after_reset_flip_probability`)
- Measurements (`before_measure_flip_probability`)
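These parameter names match Stim's built-in circuit generator, so a circuit with the same noise model can be produced directly with `stim.Circuit.generated`. The sketch below is illustrative; whether the packaged `.stim` files were produced by exactly this call rather than by the package CLI is an assumption.

```python
import stim

# Rotated surface code, d=3, 3 rounds, Z-memory, with p=0.01 noise at
# all four application points listed above.
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3,
    rounds=3,
    after_clifford_depolarization=0.01,
    before_round_data_depolarization=0.01,
    after_reset_flip_probability=0.01,
    before_measure_flip_probability=0.01,
)
circuit.to_file("my_sc_d3_r3_p0010_z.stim")  # local copy, not the packaged file
```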
## Files

| File | Description |
|------|-------------|
| `sc_d3_r3_p0010_z.stim` | 3 rounds, p=0.01, Z-memory |
| `sc_d3_r5_p0010_z.stim` | 5 rounds, p=0.01, Z-memory |
| `sc_d3_r7_p0010_z.stim` | 7 rounds, p=0.01, Z-memory |

## Using This Dataset for BP Decoding

### Step 1: Load Circuit and Extract Detector Error Model (DEM)

The Detector Error Model is the key input for BP decoding. It describes which errors trigger which detectors.

```python
import stim
import numpy as np

# Load circuit
circuit = stim.Circuit.from_file("datasets/circuits/sc_d3_r3_p0010_z.stim")

# Extract DEM - this is what BP needs
dem = circuit.detector_error_model(decompose_errors=True)
print(f"Detectors: {dem.num_detectors}")      # 24
print(f"Error mechanisms: {dem.num_errors}")  # 286
print(f"Observables: {dem.num_observables}")  # 1
```
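For a concrete sense of what the DEM contains, the first few flattened `error` instructions can be printed; each one carries a probability plus the detectors (`D...`) and logical observables (`L...`) it flips. The printed values below are illustrative, not exact.

```python
# Peek at the first few error mechanisms in the flattened DEM,
# e.g. something like: p=0.00133  D0 D3 L0
shown = 0
for inst in dem.flattened():
    if inst.type == "error":
        p = inst.args_copy()[0]
        targets = " ".join(str(t) for t in inst.targets_copy())
        print(f"p={p:.5f}  {targets}")
        shown += 1
        if shown == 5:
            break
```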
### Step 2: Build Parity Check Matrix H

BP operates on the parity check matrix, where `H[i,j] = 1` means error `j` triggers detector `i`.

```python
def build_parity_check_matrix(dem):
    """Convert DEM to parity check matrix H and prior probabilities."""
    errors = []
    for inst in dem.flattened():
        if inst.type == 'error':
            prob = inst.args_copy()[0]
            dets = [t.val for t in inst.targets_copy() if t.is_relative_detector_id()]
            obs = [t.val for t in inst.targets_copy() if t.is_logical_observable_id()]
            errors.append({'prob': prob, 'detectors': dets, 'observables': obs})

    n_detectors = dem.num_detectors
    n_errors = len(errors)

    # Parity check matrix
    H = np.zeros((n_detectors, n_errors), dtype=np.uint8)
    # Prior error probabilities (for BP initialization)
    priors = np.zeros(n_errors)
    # Which errors flip the logical observable
    obs_flip = np.zeros(n_errors, dtype=np.uint8)

    for j, e in enumerate(errors):
        priors[j] = e['prob']
        for d in e['detectors']:
            H[d, j] = 1
        if e['observables']:
            obs_flip[j] = 1

    return H, priors, obs_flip

H, priors, obs_flip = build_parity_check_matrix(dem)
print(f"H shape: {H.shape}")  # (24, 286)
```
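As a quick sanity check on `H` (purely illustrative, not part of the package API), one can draw a random error pattern from the priors and confirm that the detectors it fires are exactly `H · e mod 2`:

```python
rng = np.random.default_rng(0)

# Each error mechanism fires independently with its prior probability.
e = (rng.random(len(priors)) < priors).astype(np.uint8)

# Detectors fire according to the parity of the error mechanisms that touch them.
s = (H.astype(int) @ e) % 2
print(f"errors fired: {int(e.sum())}, detectors fired: {int(s.sum())}")
```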
### Step 3: Sample Syndromes (Detection Events)

```python
# Compile sampler
sampler = circuit.compile_detector_sampler()

# Sample detection events + observable flip
n_shots = 1000
samples = sampler.sample(n_shots, append_observables=True)

# Split into syndrome and observable
syndromes = samples[:, :-1]        # shape: (n_shots, n_detectors)
actual_obs_flips = samples[:, -1]  # shape: (n_shots,)

print(f"Syndrome shape: {syndromes.shape}")
print(f"Example syndrome: {syndromes[0]}")
```
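A quick look at the detection-event statistics helps confirm the noise level is sensible. Stim returns boolean arrays, which NumPy averages directly; exact numbers vary from run to run.

```python
# Per-detector firing probability and overall activity across the shots.
det_rate = syndromes.mean(axis=0)
print(f"mean detector firing rate: {det_rate.mean():.4f}")
print(f"shots with at least one detection event: {syndromes.any(axis=1).mean():.2%}")
```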
### Step 4: BP Decoding (Pseudocode)

```python
def bp_decode(H, syndrome, priors, max_iter=50, damping=0.5):
    """
    Belief Propagation decoder (min-sum variant).

    Args:
        H: Parity check matrix (n_detectors, n_errors)
        syndrome: Detection events (n_detectors,)
        priors: Prior error probabilities (n_errors,)
        max_iter: Maximum BP iterations
        damping: Message damping factor

    Returns:
        estimated_errors: Most likely error pattern (n_errors,)
        soft_output: Log-likelihood ratios (n_errors,)
    """
    n_checks, n_vars = H.shape

    # Initialize LLRs from priors: LLR = log((1-p)/p)
    llr_prior = np.log((1 - priors) / priors)

    # Messages: check-to-variable and variable-to-check
    # ... BP message-passing iterations produce soft_output ...

    # Hard decision
    estimated_errors = (soft_output < 0).astype(int)

    return estimated_errors, soft_output

# Decode each syndrome
for i in range(n_shots):
    syndrome = syndromes[i]
    estimated_errors, _ = bp_decode(H, syndrome, priors)

    # Predict observable flip
    predicted_obs_flip = np.dot(estimated_errors, obs_flip) % 2

    # Check if decoding succeeded
    success = (predicted_obs_flip == actual_obs_flips[i])
```
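The elided message-passing loop can be filled in many ways. Below is a minimal, self-contained min-sum sketch with a flooding schedule and damping, matching the signature above. It is illustrative only, not the package's decoder, and it omits refinements such as serial schedules, message normalization, or OSD post-processing.

```python
import numpy as np

def bp_decode_minsum(H, syndrome, priors, max_iter=50, damping=0.5):
    """Dense min-sum BP sketch. Returns (estimated_errors, soft_output)."""
    mask = np.asarray(H, dtype=bool)            # edges of the Tanner graph
    n_checks, n_vars = mask.shape
    s = np.asarray(syndrome, dtype=np.uint8)
    sign_s = 1.0 - 2.0 * s                      # +1 for s_i = 0, -1 for s_i = 1

    p = np.clip(np.asarray(priors, dtype=float), 1e-12, 1 - 1e-12)
    llr_prior = np.log((1 - p) / p)

    q = np.where(mask, llr_prior[None, :], 0.0)  # variable -> check messages
    m = np.zeros_like(q)                         # check -> variable messages
    soft_output = llr_prior.copy()

    for _ in range(max_iter):
        # Check-to-variable (min-sum): sign from the syndrome and the other
        # incoming signs, magnitude from the smallest other incoming |q|.
        sgn = np.where(q < 0, -1.0, 1.0)
        sgn[~mask] = 1.0
        row_sign = sgn.prod(axis=1, keepdims=True) * sign_s[:, None]

        absq = np.where(mask, np.abs(q), np.inf)
        min1 = absq.min(axis=1, keepdims=True)
        tmp = absq.copy()
        tmp[np.arange(n_checks), absq.argmin(axis=1)] = np.inf
        min2 = tmp.min(axis=1, keepdims=True)
        other_min = np.where(absq == min1, min2, min1)

        # Cap magnitudes so degree-1 checks stay finite.
        m_new = np.where(mask, row_sign / sgn * np.minimum(other_min, 30.0), 0.0)
        m = damping * m + (1.0 - damping) * m_new

        # Variable-to-check update and posterior LLRs.
        soft_output = llr_prior + m.sum(axis=0)
        q = np.where(mask, soft_output[None, :] - m, 0.0)

        estimated_errors = (soft_output < 0).astype(np.uint8)
        if np.array_equal((mask.astype(int) @ estimated_errors) % 2, s):
            break  # syndrome satisfied, stop early

    return (soft_output < 0).astype(np.uint8), soft_output

# Example: decode the first sampled syndrome.
est, llrs = bp_decode_minsum(H, syndromes[0], priors)
print("predicted observable flip:", int(np.dot(est, obs_flip) % 2))
```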
### Step 5: Evaluate Decoder Performance

After decoding, compare predicted vs. actual observable flips to measure the logical error rate.

```python
def evaluate_decoder(decoder_fn, circuit, n_shots=10000):
    """Evaluate decoder logical error rate."""
    dem = circuit.detector_error_model(decompose_errors=True)
    H, priors, obs_flip = build_parity_check_matrix(dem)

    sampler = circuit.compile_detector_sampler()
    samples = sampler.sample(n_shots, append_observables=True)
    syndromes = samples[:, :-1]
    actual_obs = samples[:, -1]

    errors = 0
    for i in range(n_shots):
        est_errors, _ = decoder_fn(H, syndromes[i], priors)
        pred_obs = np.dot(est_errors, obs_flip) % 2
        if pred_obs != actual_obs[i]:
            errors += 1

    return errors / n_shots

# logical_error_rate = evaluate_decoder(bp_decode, circuit)
```
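For instance, with a working decoder such as the min-sum sketch above, the logical error rate can be compared across the three provided circuits; exact numbers depend on the decoder and shot count.

```python
# Compare logical error rates across the 3-, 5-, and 7-round circuits.
for rounds in (3, 5, 7):
    c = stim.Circuit.from_file(f"datasets/circuits/sc_d3_r{rounds}_p0010_z.stim")
    ler = evaluate_decoder(bp_decode_minsum, c, n_shots=2000)
    print(f"rounds={rounds}: logical error rate ~ {ler:.4f}")
```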
## Regenerating the Dataset

```bash
# Install the package with uv
uv sync

# Generate circuits using the CLI
python -m bpdecoderplus.cli \
    --distance 3 \
    --p 0.01 \
    --rounds 3 5 7 \
    --task z \
    --generate-dem \
    --generate-uai \
    --generate-syndromes 10000
```

## Extending the Dataset

```bash
# Different error rates
python -m bpdecoderplus.cli --p 0.005 --rounds 3 5 7 --generate-dem --generate-uai

# Different distances
python -m bpdecoderplus.cli --distance 5 --rounds 5 7 9 --generate-dem --generate-uai

# X-memory experiment
python -m bpdecoderplus.cli --task x --rounds 3 5 7 --generate-dem --generate-uai
```

## References

- [Stim Documentation](https://github.com/quantumlib/Stim)
- [BP+OSD Decoder Paper](https://arxiv.org/abs/2005.07016)
- [Surface Code Decoding Review](https://quantum-journal.org/papers/q-2024-10-10-1498/)