`CLAUDE.md` (new file, 76 lines)

# CLAUDE.md — LED-Display_G6_Firmware_Panel

Guidance for Claude Code working in this repo: RP2350 (Pico 2 / RP2354B) firmware for
the 20×20 panels of the G6 LED arena.

## Architecture (orientation)

Dual-core RP2354B. **Core 0** = `Messenger` — SPI ingest on the PL022 hardware SSP in
**slave** mode (polled, custom blocking read). **Core 1** = `Display` — PIO-driven BCM.
Multicore-safe SPSC queues (pico-sdk `queue_t`, guarded by short spin-lock sections) sit between them; because each role owns a whole core, the display can never starve SPI of CPU.

SPI wire format: **Mode 3** (CPOL=1, CPHA=1), MSB-first, 8-bit, CS-framed.
GS2 = 53-byte messages (cmd `0x10`), GS16 = 203-byte messages (cmd `0x30`).
CIPO is **shared** across panels on the arena bus → the panel drives a 3-byte
confirmation slot `{header, cmd, CRC-8}` only inside its own CS-active window.
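The two message sizes follow from the panel geometry if GS2 packs 1 bit per pixel and GS16 packs 4 bits per pixel. That packing is an inference: 400 pixels give 50- and 200-byte payloads, which leaves 3 bytes of per-message overhead in both cases. The sketch below checks that arithmetic and builds a confirmation slot. The CRC-8 parameters (polynomial 0x07, init 0x00, CRC computed over header and cmd) and all helper names are assumptions for illustration. They are not taken from the firmware.

```python
# Sketch only. ASSUMPTIONS: GS2 = 1 bit/pixel, GS16 = 4 bits/pixel; the CRC-8
# uses polynomial 0x07, init 0x00, no reflection, and covers {header, cmd}.
PANEL_PIXELS = 20 * 20  # 400 pixels per panel

MESSAGE_LENGTH = {0x10: 53, 0x30: 203}  # documented lengths: GS2, GS16
BITS_PER_PIXEL = {0x10: 1, 0x30: 4}     # assumed pixel packing per command

def payload_bytes(bits_per_pixel: int) -> int:
    """Packed pixel payload in bytes for one 20x20 frame."""
    return PANEL_PIXELS * bits_per_pixel // 8

def overhead_bytes(cmd: int) -> int:
    """Non-pixel bytes implied by the documented message length."""
    return MESSAGE_LENGTH[cmd] - payload_bytes(BITS_PER_PIXEL[cmd])

def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise MSB-first CRC-8 (CRC-8/SMBUS parameters by default)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def confirmation_slot(header: int, cmd: int) -> bytes:
    """Hypothetical 3-byte CIPO reply {header, cmd, CRC-8}."""
    return bytes([header, cmd, crc8(bytes([header, cmd]))])
```

Both commands come out at exactly 3 overhead bytes. Re-running this check is a quick sanity test if another grayscale mode is ever added.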

Two hardware revs via `-DPANEL_REV` (see `platformio.ini`):
- **v0.2.1** (`pico_v021`): SPI0 on GP32–35, PSRAM CS GP0.
- **v0.3.1** (`pico_v031`): SPI1 on GP40–43, PSRAM CS GP47.

## Build & flash

Run from `panel/`. `pio` is at `/opt/homebrew/bin/pio`.

Environments:
- `pico_v021` / `pico_v031` — **production**.
- `pico_v021_spidiag` / `pico_v031_spidiag` — production + `-DSPI_DIAG=1`: silent in-RAM
reception counters + per-command histogram + ring of the last 32 failures (`got` vs
`expected`). Serial `z` = zero the window, `d` = dump. **Identical streaming timing to
production** (counters are cheap RAM increments; no serial output while streaming).
- `pico_v021_bcmtest` / `pico_v031_bcmtest` — display self-test, **NO SPI ingest. Never deploy.**

Build: `pio run -e <env>` (e.g. `pio run -e pico_v031_spidiag`)

### Flashing autonomously (BOOTSEL via 1200-baud touch) — no need to prompt the user

The earlephilhower core reboots into BOOTSEL when its USB CDC port is opened at 1200 baud.
**You may flash without asking the user to press the BOOTSEL button.** Sequence:

1. 1200-baud touch on the panel's CDC port (typically `/dev/cu.usbmodem1101`; the arena
   master enumerates separately, e.g. `…121699401`):
   ```python
   import serial, time
   s = serial.Serial("/dev/cu.usbmodem1101", 1200)  # open the CDC port at 1200 baud
   s.dtr = False; time.sleep(0.3); s.close()        # drop DTR and close -> BOOTSEL reboot
   ```
2. Wait for `/Volumes/RP2350` to mount (poll up to ~10 s).
3. `cp -X panel/.pio/build/<env>/firmware.uf2 /Volumes/RP2350/` (run from the repo root; `-X` skips extended attributes).
4. Wait for the CDC port to re-enumerate (~a few seconds).

Use `/usr/bin/python3` (has pyserial). If `/Volumes/RP2350` is already present the board is
already in BOOTSEL — skip step 1. After flashing, verify with an idle `d` dump: under
`SPI_DIAG` the read has a 50 ms idle timeout, so `d`/`z` are answered even with the master
idle (the dump banner reports `PANEL_REV=21` or `=31` — use it to confirm the right build).
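The four steps above can be scripted roughly as follows. This is a sketch: the function names, the `repo_root` parameter and the 10 s re-enumeration timeout are assumptions. The port, volume and build paths are the ones given in this file. `shutil.copyfile` copies file data only, with no extended attributes, which mirrors `cp -X`.

```python
import os
import shutil
import time

BOOTSEL_VOLUME = "/Volumes/RP2350"
PANEL_PORT = "/dev/cu.usbmodem1101"  # typical; the arena master enumerates too

def wait_for_path(path: str, timeout_s: float, poll_s: float = 0.1) -> bool:
    """Poll until `path` exists; False if it does not appear within timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_s)
    return os.path.exists(path)

def touch_1200(port: str) -> None:
    """Step 1: 1200-baud touch so the earlephilhower core reboots into BOOTSEL."""
    import serial  # pyserial; imported here so the other helpers work without it
    s = serial.Serial(port, 1200)
    s.dtr = False
    time.sleep(0.3)
    s.close()

def flash(env: str, port: str = PANEL_PORT, volume: str = BOOTSEL_VOLUME,
          repo_root: str = ".") -> None:
    """Hypothetical helper: touch (if needed), copy the UF2, wait for the port."""
    uf2 = os.path.join(repo_root, "panel", ".pio", "build", env, "firmware.uf2")
    if not os.path.isdir(volume):  # volume already mounted => already in BOOTSEL
        touch_1200(port)
    if not wait_for_path(volume, timeout_s=10.0):  # step 2
        raise RuntimeError(f"{volume} did not mount")
    shutil.copyfile(uf2, os.path.join(volume, "firmware.uf2"))  # step 3
    if not wait_for_path(port, timeout_s=10.0):  # step 4
        raise RuntimeError(f"{port} did not re-enumerate")
```

Typical use from the repo root with `/usr/bin/python3` would be `flash("pico_v031_spidiag")`, followed by the idle `d` dump to confirm `PANEL_REV`.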

## SPI reliability benchmark (audio-cued contained burst)

The arena master (separate repo, `LED-Display_G6_Firmware_Arena`) has runtime SPI-clock
control + a free-running frames-sent counter. To measure panel-received vs master-sent:
`z` the panel window, have the operator clear the master counter and stream a fixed burst,
then `d`. `missed = sent − received`; `reject_any` = within-frame corruption. Synchronise
start/stop with audio cues (`afplay /System/Library/Sounds/{Glass,Basso}.aiff` + `say`) so
the panel's z..d window brackets the master's burst. ~30 s windows are plenty.
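One z..d window can be driven from a script. The sketch below makes four assumptions: an open pyserial-style port object (anything with `write`, `read`, `in_waiting` and `reset_input_buffer`), macOS `afplay`/`say`, and an `SPI_DIAG` build. The spoken prompts, the settle delay and the function names are illustrative choices. The dump text is returned raw because its format is not specified here.

```python
import subprocess
import time

GLASS = "/System/Library/Sounds/Glass.aiff"  # start cue
BASSO = "/System/Library/Sounds/Basso.aiff"  # stop cue

def cue(sound: str, speech: str, run=subprocess.run) -> None:
    """Play a system sound, then speak a prompt for the operator."""
    run(["afplay", sound], check=False)
    run(["say", speech], check=False)

def read_until_quiet(ser, empty_polls: int = 5, poll_s: float = 0.1,
                     sleep=time.sleep) -> bytes:
    """Collect serial output until `empty_polls` consecutive polls see nothing."""
    out, empty = bytearray(), 0
    while empty < empty_polls:
        n = ser.in_waiting
        if n:
            out += ser.read(n)
            empty = 0
        else:
            empty += 1
            sleep(poll_s)
    return bytes(out)

def burst_window(ser, window_s: float = 30.0, settle_s: float = 2.0,
                 run=subprocess.run, sleep=time.sleep) -> str:
    """z the panel, cue the operator's burst, wait, cue stop, then d."""
    ser.reset_input_buffer()
    ser.write(b"z")            # zero the panel's reception window first
    cue(GLASS, "Clear the counter and start the burst", run)
    sleep(window_s)
    cue(BASSO, "Stop the burst", run)
    sleep(settle_s)            # let the last frames land before closing the window
    ser.reset_input_buffer()   # drop any reply to z
    ser.write(b"d")            # dump the window's counters
    return read_until_quiet(ser, sleep=sleep).decode(errors="replace")
```

For example, open the port with `serial.Serial("/dev/cu.usbmodem1101", 115200, timeout=0)`. Do not open it at 1200 baud, because that triggers the BOOTSEL reboot described above. The `run`/`sleep` parameters exist only so the sequence can be dry-run without hardware.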

## Gotchas / do-not-break

- **The per-valid-frame DOUBLE SSE toggle is load-bearing.** `panel_spi_read()` clears the
confirmation (toggle #1) and `Messenger::update()` arms it (toggle #2). Collapsing to a
single reload was A/B-tested and **regressed 15 MHz from 0% to 2.4% byte-drops** — the
extra SSP disable/enable keeps the marginal PL022 slave RX aligned. See the warning in
`panel_spi_custom.cpp` `panel_spi_read()`. Do not "optimize" it away.
- **PE03 last-byte-drop** is fixed by draining the RX FIFO after CS-high (`RX_DRAIN_SETTLE`
in `custom_spi_read_blocking`). Sub-`MESSAGE_MINIMUM_SIZE` runts are ignored (no glyph).
- **Reliability ceiling:** clean to **18 MHz** on **both** revs, with a sharp corruption cliff at
**20 MHz** (~6.4%, `got=exp−1`; parity failures occur only alongside length failures). The cliff
is inherent PL022 slave sampling and **rev-independent** (not pin-routing SI). The deployed clock is
~5 MHz, roughly 3–4× below the clean limit. Going past ~18 MHz needs PL022+DMA, then a PIO+DMA SPI
slave (see the plan in the SPI bench docs).

`panel/bench/SPI-BRINGUP-SUMMARY.md` (new file, 118 lines)

# G6 panel SPI bring-up — investigation summary

**Date:** 2026-05-30 · **Hardware:** arena master (Teensy 4.1, G6-ArenaSlim) + single G6
panel · **Panel firmware:** branch `spi-bringup-step0` (PR #5) · **Author:** bring-up with
Frank Loesche (`floesche`).

This is the wrap-up of the first real-hardware SPI bring-up of the G6 arena: two merged
PRs, a root-caused-and-fixed flicker bug, a full clock/rate/rev reliability sweep, an
abandoned "optimization," and the resulting decision on the future high-speed path.

---

## 1. What was wrong, and what we shipped

| Item | Problem | Fix | Status |
|---|---|---|---|
| PR #2 | Parity used `std::bitset<sizeof(uint8_t)>` = a **1-bit** set → counted only the LSB | `std::bitset<CHAR_BIT*sizeof(uint8_t)>` (full 8-bit popcount; spec-correct) | merged |
| PR #3 | `PE03`/`PE04` at ≤5 MHz: slave TX FIFO assumed empty between transactions | SSE toggle clears both FIFOs before re-loading the CIPO confirmation | merged |
| Flicker / PE03 | Display dimming = panel **rejecting** frames | see §2 | fixed (PR #5) |
| Core-0 stall | Blocking 11-line serial heartbeat every 1000 msgs could stall core 0 between transactions → missed leading bytes | non-blocking heartbeat (`availableForWrite` guard) | merged (`171d614`) |
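The PR #2 bug is easy to reproduce in a model. `std::bitset<N>` constructed from an integer keeps only its low N bits, so `std::bitset<sizeof(uint8_t)>` (N = 1) sees just the LSB. The Python stand-in below is illustrative, and its function names are hypothetical. Whether the protocol uses even or odd parity does not change how many bytes disagree.

```python
CHAR_BIT = 8
SIZEOF_UINT8 = 1  # sizeof(uint8_t)

def bitset_count(value: int, nbits: int) -> int:
    """Stand-in for std::bitset<nbits>(value).count(): only the low nbits survive."""
    return bin(value & ((1 << nbits) - 1)).count("1")

def parity_before_fix(byte: int) -> int:
    return bitset_count(byte, SIZEOF_UINT8) & 1             # bitset<1>: LSB only

def parity_after_fix(byte: int) -> int:
    return bitset_count(byte, CHAR_BIT * SIZEOF_UINT8) & 1  # bitset<8>: full popcount

# Byte values whose parity bit the buggy version gets wrong.
wrong = sum(parity_before_fix(b) != parity_after_fix(b) for b in range(256))
```

`wrong` comes out as 128. For half of all byte values, the LSB alone predicts the wrong parity.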

## 2. Flicker / PE03 — root cause and fix

The flicker was the panel **rejecting** otherwise-good frames. `SPI_DIAG` (silent in-RAM
counters + a ring of the last 32 failures with `got` vs `expected`) showed: **parity never
failed alone — only length**, almost always `got = expected − 1` (one byte short).

**Root cause = last-byte drop.** `custom_spi_read_blocking()` broke out of its loop on
`gpio_get(cs_pin)` reading high *before* the final byte had cleared the RP2350 input
synchronizer (~4 sysclk) + the SSP RX pipeline → returned one byte short → `check_length()`
failed → PE03 → frame dropped → visible dimming. **Not** signal integrity.

**Fix** (`c35b0cc`, `panel_spi_custom.cpp`): after CS goes high, **drain the RX FIFO** for a
straggler byte (bounded by `RX_DRAIN_SETTLE` empty polls so it can never hang; CS is high so
no new bytes can arrive). Plus: **ignore sub-`MESSAGE_MINIMUM_SIZE` runts** (a 1–2-byte CS
glitch shouldn't flash a 3 s error glyph). Validated (rejects / messages): GS2 at 200 Hz went from
10/24130 to **0/24101**, GS16 **0/24117**; **0 rejects across 229,726 messages**.
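The model below shows the failure and the fix. Each poll of the read loop sees the CS level and at most one byte from the RX FIFO. It is a simulation under stated assumptions: the poll-stream representation, the names and the `RX_DRAIN_SETTLE` value are all illustrative. It is not the firmware's `custom_spi_read_blocking()`.

```python
RX_DRAIN_SETTLE = 8  # illustrative bound on empty polls, not the firmware's value

def read_frame(polls, drain: bool = True) -> list:
    """Collect bytes until CS reads high, then optionally drain stragglers.

    `polls` yields (cs_high, rx_byte_or_None), one observation per loop pass.
    """
    polls = iter(polls)
    out = []
    for cs_high, byte in polls:
        if byte is not None:
            out.append(byte)
        if cs_high:  # the original loop stopped here, possibly one byte early
            break
    if drain:
        empty = 0
        for _cs_high, byte in polls:  # CS is high: no new bytes can arrive
            if byte is None:
                empty += 1
                if empty >= RX_DRAIN_SETTLE:  # bounded, so it can never hang
                    break
            else:
                out.append(byte)  # the straggler that cleared the pipeline late
    return out
```

Without the drain, a frame whose last byte surfaces after CS reads high comes back one byte short, which is exactly the `got = expected − 1` signature.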

## 3. Reliability sweep (the headline result)

Method: panel `SPI_DIAG` build + arena master with runtime SPI-clock control and a
free-running frames-sent counter. Audio-cued contained burst — `z` the panel, operator
clears the master counter + streams a fixed ~30 s burst, then `d`; `missed = sent − received`,
`corrupted = rejected`. GS16 (203 B/frame).

| SPI clock | rate | v0.3.1 received / corrupted | v0.2.1 received / corrupted |
|---:|---:|:---|:---|
| 5 MHz | 200 Hz | — | 6064 / **0** |
| 10 MHz | 200 Hz | 3261 / **0** | — |
| 15 MHz | 200 Hz | 3186 / **0** | 6724 / **0** |
| 15 MHz | 500 fps | 14713 / **0** | 15753 / **0** |
| 18 MHz | 200 Hz | 3073 / **0** | 6908 / **0** |
| 18 MHz | 500 fps | 29144 / 858 (**2.9 %**) | — |
| 18 MHz | 1 kHz | 52622 / 5144 (**9.7 %**) | — |
| 20 MHz | 200 Hz | 3884 / 250 (**6.4 %**) | 6064 / 388 (**6.40 %**) |

**`missed = 0` in every run** (`sent == received`) — whole-frame *delivery* is robust even at
the cliff. The failures are within-frame byte corruption (`got = exp − 1`, sometimes −2/−3).
Parity fails are a ~30–34 % *subset* of length fails, never parity-only (a short frame
mismatches the header popcount about a third of the time).
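The percentages in the table are `corrupted / received`. A minimal helper reproduces them from one window's counters. The function name is hypothetical: `sent` comes from the master's frame counter, and `received` and `rejected` come from the panel's `d` dump.

```python
def burst_stats(sent: int, received: int, rejected: int) -> dict:
    """missed = sent - received; corruption % is rejected over received."""
    if not 0 <= rejected <= received:
        raise ValueError("rejected frames must be a subset of received frames")
    return {
        "missed": sent - received,
        "corrupted_pct": 100.0 * rejected / received if received else 0.0,
    }
```

For instance, the 18 MHz / 500 fps v0.3.1 row, `burst_stats(29144, 29144, 858)`, gives `missed == 0` and about 2.94 % corruption.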

### Findings

1. **Reliable through 18 MHz on both revs; sharp cliff at 20 MHz (~6.4 %).** Deployed ~5 MHz
has ~3–4× margin and is rock-solid.
2. **The cliff is rev-independent → PL022, not board SI.** v0.2.1 (SPI0 GP32–35, interleaved
between the row-driver halves — predicted *worse* by the crosstalk hypothesis) corrupts at
the **same** clock, rate, and signature as v0.3.1 (20 MHz → 6.4 %, `got=202`). The
pin-routing/crosstalk hypothesis is **falsified**; the ceiling is inherent **PL022
SPI-slave sampling marginality** (input-synchronizer / shifter). The earlier impression
that "v0.3.1 is more stable" was the last-byte-drop bug (now fixed), not a real rev delta.
3. **Cadence matters only near the cliff.** At 18 MHz, rate walks corruption up
(0 % → 2.9 % → 9.7 % at 200 Hz / 500 fps / 1 kHz). At 15 MHz, **500 fps is still 0 %** —
and notably the per-frame transaction is *longer* at 15 MHz than 18 MHz, so it leaves
*less* inter-transaction slack yet stays clean. ⇒ the limit is per-byte sampling, **not**
core-0 turnaround.
4. **Brightness/duty-cycle was not swept** — the only remaining open signal-integrity
question, now low priority (both revs clean to 18 MHz at these patterns).
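The arithmetic behind finding 3 is sketched below. It assumes one 203-byte GS16 transaction per frame with back-to-back bytes, no inter-byte gaps, and CS setup/hold ignored.

```python
GS16_BYTES = 203

def transaction_us(n_bytes: int, sck_hz: float) -> float:
    """SCK time for one CS-framed transaction (8 clocks per byte), in us."""
    return n_bytes * 8 / sck_hz * 1e6

def slack_us(n_bytes: int, sck_hz: float, frame_rate_hz: float) -> float:
    """Idle time left between transactions at a given frame rate, in us."""
    return 1e6 / frame_rate_hz - transaction_us(n_bytes, sck_hz)

for sck in (15e6, 18e6):
    print(f"{sck / 1e6:.0f} MHz: {transaction_us(GS16_BYTES, sck):.1f} us busy, "
          f"{slack_us(GS16_BYTES, sck, 500):.1f} us slack at 500 fps")
```

This prints about 108.3 us busy / 1891.7 us slack at 15 MHz and 90.2 us / 1909.8 us at 18 MHz. The difference is only ~18 µs per 2 ms frame. It still points the wrong way for a turnaround-limited design: the run with less slack was the clean one.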

## 4. Abandoned: the "turnaround optimization"

Hypothesis: each valid frame does **two** TX-FIFO reloads — `panel_spi_read()` clears to the
sentinel (SSE toggle #1), then `Messenger::update()` arms the real confirmation a few µs
later (toggle #2) — and the intermediate clear looked like wasted work. Tried collapsing to a
single late arm-or-clear.

**A/B at 15 MHz / 500 fps, same warm bench, back-to-back: original 2-reload = 0/14713;
optimized 1-reload = 391/16086 (2.4 %).** Reverted, re-tested original → 0/14713 again
(thermal ruled out). **The double SSE toggle is load-bearing** — the extra SSP disable/enable
per frame keeps the marginal PL022 slave RX aligned transaction-to-transaction. Abandoned;
warning comment left in `panel_spi_custom.cpp` so it is not re-attempted.

## 5. Future high-speed path (gated, not started)

The 25–30 MHz spec aspiration is beyond the PL022 slave cliff. Order of attack (cheapest
first), per the plan and the Codex review:

1. **(optional) Brightness-vs-error bench** with a rail/ground probe — close the last SI
question. Low priority.
2. **PL022 + DMA spike** — drive RX (and TX filler) via DMA + CS IRQ instead of the polled
loop; may fix the per-byte latency without a from-scratch slave.
3. **PIO + DMA SPI slave** — only if 1–2 prove the PL022 shifter is the wall. PIO-SPI ceiling
≈ 25 MHz; levers: `INPUT_SYNC_BYPASS`, clock-recovery via a 2nd PIO, overclock sysclk.
Must double-buffer RX, keep confirmation-arming in `Messenger` after CRC, tri-state CIPO
when CS is high (shared bus), and serve both revs (relative-pin-compatible, GPIO base 16).

## 6. Artifacts & references

- **Panel firmware:** branch `spi-bringup-step0` (PR #5). Key files: `panel_spi_custom.cpp`
(RX drain, double-toggle note, `SPI_DIAG` idle timeout), `messenger.cpp` (`SPI_DIAG`
counters + runt-ignore), `platformio.ini` (`pico_vXX_spidiag` envs).
- **Controller firmware** (separate repo): `LED-Display_G6_Firmware_Arena` branch
`runtime-spi-clock-and-frame-counter` — runtime SPI clock + frames-sent counter.
- **Controller-side ceilings + this sweep:**
`Modular-LED-Display/docs/development/g6_performance-benchmarks.md`.
- **Bench protocol:** `panel/bench/handoff-spi-highspeed-bench.md`.
- **Open issue:** startup first-message PE03 (one-time at master init; separate from
steady-state; GitHub issue #6).

`panel/bench/handoff-spi-highspeed-bench.md` (new file, 102 lines)

# High-Speed SPI Bench Protocol — arena-master bring-up

**Status:** DRAFT — written 2026-05-29, ahead of the arena master arriving on the bench.
**Hardware:** Saleae Logic Pro 8 (SPI decode + timing) + Digilent AD3 (analog rail probe) + production panel + **arena master** (real SPI controller; `panel_master` is retired).
**Goal:** Produce the two measurements that *gate* the SPI rework decision (see
`~/.claude/plans/please-find-the-2-iridescent-squirrel.md`, Steps 1–2):
- **Capture A** — the arena master's real **SCK clock** and **CS-high (inter-transaction) gap** distribution. Feeds every per-transaction-reset design decision.
- **Capture B** — **brightness vs. SPI-error** correlation at a fixed >10 MHz clock, with a rail/ground probe, to decide whether the >10 MHz failures are **signal/power integrity** (fix the board / rate) or the **PL022 shifter** (justifies PIO+DMA).

Uses the `instruments` skill helpers. **Two Python envs:** AD3 needs `/opt/homebrew/bin/python3.14` (DWF ctypes); Saleae uses system Python with `logic2-automation`. Run them as separate processes and merge results in analysis.

> **Context from the canonical spec** (`Modular-LED-Display/docs/development/`):
> - The **slim G4.1 controller currently clocks SPI at ~5 MHz** (`g6_07-arena-firmware-interface.md` line 178). The G6 SPI-clock target is an explicitly **unmeasured bring-up item** (`g6_03-controller.md` § "Timing measurements still needed" → "SPI clock + framing latency … must be measured"); `g6_01-panel-protocol.md` line 131's "up to 30 MHz" is **aspirational, not validated**. ⇒ Capture A's first job is to learn the arena master's *actual* clock. If it's ~5 MHz, PR #3 likely already covers the deployed system and the >10 MHz work is future-proofing.
> - **Topology:** 2 SPI buses (B0=P1–5, B1=P6–10), **20 CS lines (4 per panel column)**, with **SN74HCS08 column-buffer / fan-out chips** between the Teensy CS and the panel CS (~5–10 ns prop delay each). **CIPO is shared** on these topologies (`g6_03` line 258: broadcast forbidden on shared-CIPO to avoid MISO contention) → a PIO TX backend must tri-state CIPO. The extra buffering in the path is also a signal-integrity factor for Capture B.

---

## 1. Channel assignment (SPI pins differ by rev)

| Signal | v0.2.1 GPIO | v0.3.1 GPIO | Probe |
|---|---|---|---|
| SCK | GP34 | GP42 | Saleae **D0** |
| CS | GP33 | GP41 | Saleae **D1** |
| COPI (MOSI) | GP32 | GP40 | Saleae **D2** |
| CIPO (MISO) | GP35 | GP43 | Saleae **D3** |
| Panel logic/+5V rail (near MCU) | — | — | **AD3 Ch1+** (differential), Ch1− to **panel GND at the same point** |
| SCK time-reference tee (optional) | | | AD3 **Ch2+** (align droop to transactions) |

Probe the rail **at the panel**, not at the supply — we want to see droop/bounce the MCU's SPI pins actually reference. Differential Ch1 (across the local decoupling cap) captures ground bounce, not just rail sag.

---

## 2. Capture A — clock + CS-high gap (Saleae)

30 MHz SCK needs heavy oversampling for clean edges: run Saleae digital at **500 MS/s** (≤4 channels), which gives about 16 samples per SCK period at 30 MHz. Use a timed capture of a few seconds of steady master traffic.

```python
# system python; pip install logic2-automation; enable the Logic 2 automation server
from saleae import automation

mgr = automation.Manager.connect(port=10430)
dev = automation.LogicDeviceConfiguration(
    enabled_digital_channels=[0, 1, 2, 3],  # SCK, CS, COPI, CIPO
    digital_sample_rate=500_000_000,        # 500 MS/s
    digital_threshold_volts=1.65,           # 3V3 logic
)
cap = mgr.start_capture(
    device_configuration=dev,
    capture_configuration=automation.CaptureConfiguration(
        capture_mode=automation.TimedCaptureMode(duration_seconds=3.0)))
cap.wait()
# Setting names/values must match the Logic 2 SPI analyzer dialog text exactly.
spi = cap.add_analyzer("SPI", label="arena", settings={
    "MISO": 3, "MOSI": 2, "Clock": 0, "Enable": 1,
    "Bits per Transfer": "8 Bits per Transfer (Standard)",
    "Clock State": "Clock is High when inactive (CPOL = 1)",
    "Clock Phase": "Data is Valid on Clock Trailing Edge (CPHA = 1)",
    "Significant Bit": "Most Significant Bit First (Standard)",
})
cap.export_data_table("/tmp/spiA_frames.csv", analyzers=[spi])
cap.export_raw_data_binary(directory="/tmp/spiA_raw/", digital_channels=[0, 1])
cap.close()
mgr.close()
```

**Measurements (Python):**
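- **SCK frequency:** from D0 raw edges, `1 / median(diff(rising_edges_seconds))`; also report min/max to catch master jitter, but compute them only over edge pairs inside the same CS-low window (a diff that spans a CS-high gap is the gap, not a clock period).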
- **CS-high gap histogram:** on D1, gap = each CS **rising** → next CS **falling**; report min / p1 / median / max. **The `min` (or p1) is the hard budget** any per-transaction reset path must beat. **Measure the gap between consecutive CS-active windows (per-transaction), NOT the frame-to-frame cadence** — Frank's first number (~3333 µs) was the 300 fps frame period, not the inter-transaction gap. Working target: the master should **guarantee ≥0.5 ms** between transactions to the same panel; the panel's per-transaction work is single-digit µs, so confirm the measured min clears 0.5 ms by a wide margin (it almost certainly does).
- **Bytes/transaction & framing:** from the SPI analyzer table — confirm 3..300-byte frames decode and the CIPO confirmation slot `{header, cmd, CRC-8}` lands at bytes 0–2.
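The first two measurements can be computed from sorted edge timestamps (in seconds) for D0 and D1, however they are extracted from the Saleae export. The sketch below uses hypothetical function names. It pairs each CS falling edge with the next rising edge, so SCK periods are only taken inside a CS-low window, and it uses a nearest-rank percentile for p1. The 0.5 ms budget is the working target stated above.

```python
import bisect
import math
import statistics

def cs_low_windows(cs_falling, cs_rising):
    """(start, end) per CS-active window: falling edge -> next rising edge."""
    windows = []
    for f in cs_falling:
        i = bisect.bisect_right(cs_rising, f)
        if i < len(cs_rising):
            windows.append((f, cs_rising[i]))
    return windows

def sck_periods(sck_rising, windows):
    """Rising-to-rising SCK periods that never span a CS-high gap."""
    periods = []
    for start, end in windows:
        lo = bisect.bisect_left(sck_rising, start)
        hi = bisect.bisect_right(sck_rising, end)
        edges = sck_rising[lo:hi]
        periods += [b - a for a, b in zip(edges, edges[1:])]
    return periods

def cs_high_gaps(cs_falling, cs_rising):
    """Per-transaction gap: each CS rising edge -> the next CS falling edge."""
    gaps = []
    for r in cs_rising:
        i = bisect.bisect_right(cs_falling, r)
        if i < len(cs_falling):
            gaps.append(cs_falling[i] - r)
    return gaps

def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def capture_a_summary(sck_rising, cs_falling, cs_rising, budget_s=0.5e-3):
    periods = sck_periods(sck_rising, cs_low_windows(cs_falling, cs_rising))
    gaps = sorted(cs_high_gaps(cs_falling, cs_rising))
    return {
        "sck_hz_median": 1 / statistics.median(periods),
        "sck_hz_min": 1 / max(periods),
        "sck_hz_max": 1 / min(periods),
        "gap_min_s": gaps[0],
        "gap_p1_s": percentile(gaps, 1),
        "gap_median_s": statistics.median(gaps),
        "gap_max_s": gaps[-1],
        "gap_min_clears_budget": gaps[0] >= budget_s,
    }
```

The returned dict maps directly onto `captureA_measurements.json` (§4).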

---

## 3. Capture B — brightness vs. SPI error sweep (AD3 rail + panel serial)

Fix the arena master at a chosen clock. Run the sweep first at **~5 MHz (the controller's actual rate today — the result that decides whether anything beyond PR #3 is even needed for deployment)**, then **10, 25, 30 MHz** as future-proofing toward the 30 MHz aspiration. At each clock, step `duty_cycle` through `{1, 32, 64, 128, 192, 255}` while the master streams a steady mix (e.g. Gray_16 patterns carrying that duty byte). At each step, record three things over a fixed window (e.g. 5 s):

1. **SPI error rate** — from the panel's serial heartbeat (`messenger.cpp` prints every 1000 msgs): take Δ(`err_displayed`+`err_suppressed`) / Δ`msg_count`, and the parity/length-OK flags. (Capture B is the motivation for the Step-0 task of also exposing a cumulative PE03/PE04 counter — easier to read than booleans.)
2. **Rail droop / ground bounce** — AD3 analog on Ch1.
3. *(optional, harder)* **Saleae SPI decode error count** for an independent error oracle.

**AD3 rail capture** (`/opt/homebrew/bin/python3.14`, raw ctypes per skill §1): for a per-step droop magnitude, **record mode** at single-channel **5 MHz** for the 5 s window catches the envelope (`V_nominal − min(Ch1)` = worst droop; std = bounce). Range ~500 mV around the rail, AC-ish. If you want to *see* droop coincident with a specific transaction, use **triggered single-shot** at 50–100 MHz with the **detector trigger on Ch1 falling below a droop threshold** (skill §1.4) and Ch2 = SCK tee for alignment. That view is optional; the step-level envelope from record mode is the decisive cheap test.
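Per step, the three recorded quantities reduce to a few lines of analysis. The sketch assumes the heartbeat counters have already been parsed into dicts keyed by the names above (`err_displayed`, `err_suppressed`, `msg_count`); the heartbeat text format is not specified here. It also assumes Ch1 has been read back as a list of volts. The CSV columns follow the `captureB_sweep.csv` layout in §4, and `clock_mhz` is an assumed column name.

```python
import csv
import statistics

def _errors(hb: dict) -> int:
    return hb["err_displayed"] + hb["err_suppressed"]

def step_error_rate(before: dict, after: dict) -> float:
    """Δ(err_displayed + err_suppressed) / Δmsg_count between two heartbeats."""
    d_msg = after["msg_count"] - before["msg_count"]
    return (_errors(after) - _errors(before)) / d_msg if d_msg else float("nan")

def rail_metrics_mv(ch1_volts, v_nominal):
    """Worst droop (V_nominal - min) and bounce (population std) of Ch1, in mV."""
    droop = (v_nominal - min(ch1_volts)) * 1e3
    bounce = statistics.pstdev(ch1_volts) * 1e3
    return droop, bounce

SWEEP_FIELDS = ["clock_mhz", "duty", "rev", "error_rate", "droop_mV", "bounce_mV"]

def write_sweep_row(fh, row: dict, header: bool = False) -> None:
    """Append one clock x duty x rev row to an open CSV file."""
    w = csv.DictWriter(fh, fieldnames=SWEEP_FIELDS)
    if header:
        w.writeheader()
    w.writerow(row)
```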

**Decision rule:**
- Error rate **climbs with duty_cycle** and droop/bounce events line up with bit errors → **signal/power integrity.** PIO won't fix marginal edges; pursue decoupling / termination / drive-strength / ground-return / lower rate first. (Strongly expected given v0.2.1's SPI pins are interleaved with the row drivers; compare v0.2.1 vs v0.3.1 here.)
- Error rate **flat vs. duty_cycle** but high at >10 MHz regardless → points at the **PL022 shifter**; do the PL022+DMA spike, then PIO+DMA if needed.

Run the **same sweep on both v0.2.1 and v0.3.1** — the rev delta is itself a strong SI signal.

---

## 4. Outputs

Per the existing convention, write to `panel/bench/results/<timestamp>-spi-highspeed/`:
- `captureA_frames.csv`, `captureA_raw/` + `captureA_measurements.json` (SCK freq, CS-gap stats)
- `captureB_sweep.csv` (rows: clock × duty × rev → error_rate, droop_mV, bounce_mV) + `captureB_plot.png` (error-rate & droop vs duty, per rev/clock)
- `serial.log` per step
- `SUMMARY.md` — the measured CS-high min/median, real clock, and the SI-vs-PL022 verdict that gates Steps 2–3.

---

## 5. Pre-flight checklist
- [ ] Arena master on the bus; panel running **production** firmware (SPI ingest, not `_bcmtest`).
- [ ] Saleae D0–D3 on SCK/CS/COPI/CIPO for the correct rev (§1); Logic 2 automation enabled.
- [ ] AD3 Ch1 differential across the panel-local rail cap; `python3.14` env verified (skill §1.1).
- [ ] Panel USB serial captured for the heartbeat counters.
- [ ] Run Capture A first (need the real clock/gap before interpreting B).
- [ ] Capture B on **both** revs, at ~5 MHz first, then 10/25/30 MHz.