From 171d6148a75421055ce57a63320cfa565b81ad4c Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Fri, 29 May 2026 09:47:35 -0400 Subject: [PATCH 1/7] panel: make DEVEL serial heartbeat non-blocking (avoid core-0 stall -> PE03) The per-1000-message heartbeat in Messenger::update() did 11 blocking `Serial <<` writes (~280 B). This runs on core 0 between SPI transactions; if the USB host isn't draining, a full TX buffer could stall core 0 for milliseconds, so the panel wasn't back in custom_spi_read_blocking() when the master started the next transaction -> missed leading bytes -> num_rx undercount -> PE03 length error. Build one compact line and emit it only when Serial.availableForWrite() has room for the whole line, so write() can never block. The heartbeat is diagnostic and safely droppable. Builds clean for pico_v021 and pico_v031. Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/src/messenger.cpp | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/panel/src/messenger.cpp b/panel/src/messenger.cpp index 71c309b..4fe3746 100644 --- a/panel/src/messenger.cpp +++ b/panel/src/messenger.cpp @@ -8,6 +8,7 @@ #include "predef_patterns_table.h" #include "display.h" #include +#include // S2.2: Display lives in main.cpp; we read frames_skipped_ for the heartbeat. extern Display display; @@ -164,20 +165,31 @@ void Messenger::update() { // else: buffer stays as the previous valid confirmation OR the empty // sentinel (already loaded into the TX FIFO between transactions). - // DEVEL serial heartbeat + // DEVEL serial heartbeat — NON-BLOCKING. // ----------------------------------------------------------- + // Emitted once per 1000 messages, but only when the USB-CDC TX FIFO has + // room for the whole line. The old version did 11 blocking `Serial <<` + // writes (~280 B); at 115200 baud a full host-side buffer could stall + // core 0 for milliseconds *between* SPI transactions, so the panel wasn't + // back in custom_spi_read_blocking() when the master started the next + // transaction -> missed leading bytes -> num_rx undercount -> PE03. The + // heartbeat is diagnostic and safely droppable: build one compact line and + // skip it entirely if there isn't guaranteed room (so write() never blocks). if (msg_count_ % 1000 == 0) { - Serial << "msg_count: " << msg_count_ << endl; - Serial << "parity_ok: " << parity_ok << endl; - Serial << "length_ok: " << length_ok << endl; - Serial << "protocol_ok: " << protocol_ok << endl; - Serial << "cmd_ok: " << cmd_ok << endl; - Serial << "comm_check_ok: " << comm_check_ok_ << endl; - Serial << "queue_drops: " << queue_drops_ << endl; - Serial << "frames_skipped: " << display.frames_skipped() << endl; - Serial << "err_displayed: " << error_displayed_count_ << endl; - Serial << "err_suppressed: " << error_suppressed_count_ << endl; - Serial << endl; + char hb[192]; + int n = snprintf(hb, sizeof(hb), + "HB msg=%llu par=%d len=%d proto=%d cmd=%d cc=%d " + "qdrop=%llu fskip=%llu errd=%llu errs=%llu\r\n", + (unsigned long long)msg_count_, + (int)parity_ok, (int)length_ok, (int)protocol_ok, (int)cmd_ok, + (int)comm_check_ok_, + (unsigned long long)queue_drops_, + (unsigned long long)display.frames_skipped(), + (unsigned long long)error_displayed_count_, + (unsigned long long)error_suppressed_count_); + if (n > 0 && n < (int)sizeof(hb) && Serial.availableForWrite() >= n) { + Serial.write(reinterpret_cast(hb), (size_t)n); + } } // ----------------------------------------------------------- } From fd45378599b886dcd05a9cd260a695af074f8f74 Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Fri, 29 May 2026 09:51:14 -0400 Subject: [PATCH 2/7] Retire panel_master harness; add high-speed SPI bench protocol panel_master was a ~250 kHz GPIO-bitbang stand-in for a real arena master; it cannot validate the 10-30 MHz SPI operation that matters now. With the real arena master available it is superseded, so remove the subproject and point the bench docs at the arena master. - Remove panel_master/ (README, platformio.ini, src/main.cpp). - handoff-v1-bench-testing.md: panel_master -> arena master; add a retirement note. - Add handoff-spi-highspeed-bench.md: AD3/Saleae protocol to measure the arena master's real SCK clock + CS-high gap and sweep brightness vs SPI error rate, to decide signal-integrity vs PL022-shifter as the >10 MHz limit. Notes the canonical spec facts (controller ~5 MHz today; 30 MHz aspirational/unmeasured; shared CIPO; column-buffer topology). Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/bench/handoff-spi-highspeed-bench.md | 102 ++++++ panel/bench/handoff-v1-bench-testing.md | 24 +- panel_master/README.md | 108 ------- panel_master/platformio.ini | 58 ---- panel_master/src/main.cpp | 350 --------------------- 5 files changed, 118 insertions(+), 524 deletions(-) create mode 100644 panel/bench/handoff-spi-highspeed-bench.md delete mode 100644 panel_master/README.md delete mode 100644 panel_master/platformio.ini delete mode 100644 panel_master/src/main.cpp diff --git a/panel/bench/handoff-spi-highspeed-bench.md b/panel/bench/handoff-spi-highspeed-bench.md new file mode 100644 index 0000000..c6e250a --- /dev/null +++ b/panel/bench/handoff-spi-highspeed-bench.md @@ -0,0 +1,102 @@ +# High-Speed SPI Bench Protocol — arena-master bring-up + +**Status:** DRAFT — written 2026-05-29, ahead of the arena master arriving on the bench. +**Hardware:** Saleae Logic Pro 8 (SPI decode + timing) + Digilent AD3 (analog rail probe) + production panel + **arena master** (real SPI controller; `panel_master` is retired). +**Goal:** Produce the two measurements that *gate* the SPI rework decision (see +`~/.claude/plans/please-find-the-2-iridescent-squirrel.md`, Steps 1–2): +- **Capture A** — the arena master's real **SCK clock** and **CS-high (inter-transaction) gap** distribution. Feeds every per-transaction-reset design decision. +- **Capture B** — **brightness vs. SPI-error** correlation at a fixed >10 MHz clock, with a rail/ground probe, to decide whether the >10 MHz failures are **signal/power integrity** (fix the board / rate) or the **PL022 shifter** (justifies PIO+DMA). + +Uses the `instruments` skill helpers. **Two Python envs:** AD3 needs `/opt/homebrew/bin/python3.14` (DWF ctypes); Saleae uses system Python with `logic2-automation`. Run them as separate processes and merge results in analysis. + +> **Context from the canonical spec** (`Modular-LED-Display/docs/development/`): +> - The **slim G4.1 controller currently clocks SPI at ~5 MHz** (`g6_07-arena-firmware-interface.md` line 178). The G6 SPI-clock target is an explicitly **unmeasured bring-up item** (`g6_03-controller.md` § "Timing measurements still needed" → "SPI clock + framing latency … must be measured"); `g6_01-panel-protocol.md` line 131's "up to 30 MHz" is **aspirational, not validated**. ⇒ Capture A's first job is to learn the arena master's *actual* clock. If it's ~5 MHz, PR #3 likely already covers the deployed system and the >10 MHz work is future-proofing. +> - **Topology:** 2 SPI buses (B0=P1–5, B1=P6–10), **20 CS lines (4 per panel column)**, with **SN74HCS08 column-buffer / fan-out chips** between the Teensy CS and the panel CS (~5–10 ns prop delay each). **CIPO is shared** on these topologies (`g6_03` line 258: broadcast forbidden on shared-CIPO to avoid MISO contention) → a PIO TX backend must tri-state CIPO. The extra buffering in the path is also a signal-integrity factor for Capture B. + +--- + +## 1. Channel assignment (SPI pins differ by rev) + +| Signal | v0.2.1 GPIO | v0.3.1 GPIO | Probe | +|---|---|---|---| +| SCK | GP34 | GP42 | Saleae **D0** | +| CS | GP33 | GP41 | Saleae **D1** | +| COPI (MOSI) | GP32 | GP40 | Saleae **D2** | +| CIPO (MISO) | GP35 | GP43 | Saleae **D3** | +| Panel logic/+5V rail (near MCU) | — | — | **AD3 Ch1+** (differential), Ch1− to **panel GND at the same point** | +| SCK time-reference tee (optional) | | | AD3 **Ch2+** (align droop to transactions) | + +Probe the rail **at the panel**, not at the supply — we want to see droop/bounce the MCU's SPI pins actually reference. Differential Ch1 (across the local decoupling cap) captures ground bounce, not just rail sag. + +--- + +## 2. Capture A — clock + CS-high gap (Saleae) + +30 MHz SCK needs heavy oversampling for clean edges: run Saleae digital at **500 MS/s** (≤4 channels) — ~16 samples/SCK-bit at 30 MHz. Timer capture, a few seconds of steady master traffic. + +```python +# system python; pip install logic2-automation; enable Logic 2 automation server +from saleae import automation +mgr = automation.Manager.connect(port=10430) +dev = automation.LogicDeviceConfiguration( + enabled_digital_channels=[0, 1, 2, 3], # SCK, CS, COPI, CIPO + digital_sample_rate=500_000_000, # 500 MS/s + digital_threshold_volts=1.65, # 3V3 logic +) +cap = mgr.start_capture(device_configuration=dev, + capture_configuration=automation.CaptureConfiguration( + capture_mode=automation.TimerCaptureMode(duration_seconds=3.0))) +cap.wait() +spi = cap.add_analyzer("SPI", label="arena", settings={ + "MISO": 3, "MOSI": 2, "Clock": 0, "Enable": 1, + "Bits per Transfer": "8 Bits per Transfer", + "Clock State (CPOL)": "Clock is High when inactive", # CPOL=1 + "Clock Phase (CPHA)": "Data is Valid on Clock Trailing Edge", # CPHA=1 + "Significant Bit": "Most Significant Bit First", +}) +cap.export_data_table("/tmp/spiA_frames.csv", analyzers=[spi]) +cap.export_raw_data_binary(directory="/tmp/spiA_raw/", digital_channels=[0, 1]) +``` + +**Measurements (Python):** +- **SCK frequency:** from D0 raw edges — `1 / median(diff(rising_edges_seconds))`; also report min/max to catch master jitter. +- **CS-high gap histogram:** on D1, gap = each CS **rising** → next CS **falling**; report min / p1 / median / max. **The `min` (or p1) is the hard budget** any per-transaction reset path must beat. +- **Bytes/transaction & framing:** from the SPI analyzer table — confirm 3..300-byte frames decode and the CIPO confirmation slot `{header, cmd, CRC-8}` lands at bytes 0–2. + +--- + +## 3. Capture B — brightness vs. SPI error sweep (AD3 rail + panel serial) + +Fix the arena master at a chosen clock. Run the sweep first at **~5 MHz (the controller's actual rate today — the result that decides whether anything beyond PR #3 is even needed for deployment)**, then **10, 25, 30 MHz** as future-proofing toward the 30 MHz aspiration. At each clock, step `duty_cycle` through `{1, 32, 64, 128, 192, 255}` while the master streams a steady mix (e.g. Gray_16 patterns carrying that duty byte). At each step, record three things over a fixed window (e.g. 5 s): + +1. **SPI error rate** — from the panel's serial heartbeat (`messenger.cpp` prints every 1000 msgs): take Δ(`err_displayed`+`err_suppressed`) / Δ`msg_count`, and the parity/length-OK flags. (Capture B is the motivation for the Step-0 task of also exposing a cumulative PE03/PE04 counter — easier to read than booleans.) +2. **Rail droop / ground bounce** — AD3 analog on Ch1. +3. *(optional, harder)* **Saleae SPI decode error count** for an independent error oracle. + +**AD3 rail capture** (`/opt/homebrew/bin/python3.14`, raw ctypes per skill §1): for a per-step droop magnitude, **record mode** at single-channel **5 MHz** for the 5 s window catches the envelope (`V_nominal − min(Ch1)` = worst droop; std = bounce). Range ~500 mV around the rail, AC-ish. If you want to *see* droop coincident with a specific transaction, use **triggered single-shot** at 50–100 MHz with the **detector trigger on Ch1 falling below a droop threshold** (skill §1.4) and Ch2 = SCK tee for alignment — but the step-level envelope below is the decisive cheap test. + +**Decision rule:** +- Error rate **climbs with duty_cycle** and droop/bounce events line up with bit errors → **signal/power integrity.** PIO won't fix marginal edges; pursue decoupling / termination / drive-strength / ground-return / lower rate first. (Strongly expected given v0.2.1's SPI pins are interleaved with the row drivers; compare v0.2.1 vs v0.3.1 here.) +- Error rate **flat vs. duty_cycle** but high at >10 MHz regardless → points at the **PL022 shifter**; do the PL022+DMA spike, then PIO+DMA if needed. + +Run the **same sweep on both v0.2.1 and v0.3.1** — the rev delta is itself a strong SI signal. + +--- + +## 4. Outputs + +Per the existing convention, write to `panel/bench/results/-spi-highspeed/`: +- `captureA_frames.csv`, `captureA_raw/` + `captureA_measurements.json` (SCK freq, CS-gap stats) +- `captureB_sweep.csv` (rows: clock × duty × rev → error_rate, droop_mV, bounce_mV) + `captureB_plot.png` (error-rate & droop vs duty, per rev/clock) +- `serial.log` per step +- `SUMMARY.md` — the measured CS-high min/median, real clock, and the SI-vs-PL022 verdict that gates Steps 2–3. + +--- + +## 5. Pre-flight checklist +- [ ] Arena master on the bus; panel running **production** firmware (SPI ingest, not `_bcmtest`). +- [ ] Saleae D0–D3 on SCK/CS/COPI/CIPO for the correct rev (§1); Logic 2 automation enabled. +- [ ] AD3 Ch1 differential across the panel-local rail cap; `python3.14` env verified (skill §1.1). +- [ ] Panel USB serial captured for the heartbeat counters. +- [ ] Run Capture A first (need the real clock/gap before interpreting B). +- [ ] Capture B on **both** revs, at 10/25/30 MHz. diff --git a/panel/bench/handoff-v1-bench-testing.md b/panel/bench/handoff-v1-bench-testing.md index fe45e28..7f13961 100644 --- a/panel/bench/handoff-v1-bench-testing.md +++ b/panel/bench/handoff-v1-bench-testing.md @@ -5,6 +5,14 @@ **Goal:** Automated validation of the V1 firmware feature-complete release (error codes + Triggered + Gated) **Forward-looking:** This run's automation scripts should be factored into a reusable bench-test skill in a follow-up session. +> **Note (2026-05-29): `panel_master` is retired.** It was a stop-gap that turned a +> second G6 panel into a ~250 kHz bitbang SPI controller in the absence of a real +> arena. We now have the actual **arena master**, which is the sole source of +> protocol-level SPI stimulus going forward (and the only thing that can validate +> 10–30 MHz). All "arena master" references below were formerly `panel_master`; the +> bench role is identical, just driven by real hardware. See +> `handoff-spi-highspeed-bench.md` for the high-speed SPI capture protocol. + --- ## 1. Context for the next session @@ -59,7 +67,7 @@ Suggested channel assignment (USER TO CONFIRM PIN-TO-PAD MAPPING — varies by p | D0 | EINT (GP45) | Trigger source; sample at ≥1 MS/s for ns-level edge timing | | D1 | One row pin (e.g., ROW[0] = GP20 on v0.3.1) | LOW = row ON; gives us "row drive window" timing | | D2 | A second row pin (e.g., ROW[10] = GP30) | Lets us verify row sequencing across the panel without moving probes | -| D3-D6 | SPI MOSI/MISO/SCK/CS (if testing protocol-level Triggered/Gated via panel_master) | Optional | +| D3-D6 | SPI MOSI/MISO/SCK/CS (if testing protocol-level Triggered/Gated via the arena master) | Optional | | A0 | Photodiode amplifier output | LED-on ground truth; see below | ### Photodiode (the new piece) @@ -148,7 +156,7 @@ For tests that sweep a parameter (e.g., Triggered at 100 Hz / 1 kHz / 8 kHz / 22 | 1 | **Firmware boots, predef blob valid** | Power-on + USB | Serial | `predef: ok, 200 slots (1994 B)` printed within 2 s of boot | | 2 | **EINT polarity — rising-edge fires** | AD3 single rising pulse (1 ms width, 3.3 V) after `T` | Saleae D0 (EINT) + D1 (row[0]) + A0 (photodiode) | Row LOW + photodiode rise both follow EINT rising edge within 5 µs; no LED transition correlated with EINT falling edge | | 3 | **Triggered consumption: 20 edges = 1 frame** | AD3 burst of 20 rising edges at 1 kHz after `T` | Saleae all rows + photodiode | Each edge produces one row drive (row LOW window); 20 edges total; panel dark after edge 21 | -| 4 | **Triggered at 8 kHz with operational duty_cycle** | Modify `T` cmd to push duty=85 (or panel_master 0x12 at duty=85), AD3 burst at 8 kHz | Saleae | Per-row LED-on window 12-18 µs (~10-15% of 125 µs); no edges missed across 100 bursts | +| 4 | **Triggered at 8 kHz with operational duty_cycle** | Modify `T` cmd to push duty=85 (or the arena master 0x12 at duty=85), AD3 burst at 8 kHz | Saleae | Per-row LED-on window 12-18 µs (~10-15% of 125 µs); no edges missed across 100 bursts | | 5 | **All glyphs render** | `e 0; e 1; e 2; e 3; e 4; e 5; e 100; e 50` via serial, one per ~1.5 s | Camera frame or photodiode array | Eyeball check (manual) — automated version requires multi-pixel sensor | ### P1 — would degrade UX @@ -158,18 +166,18 @@ For tests that sweep a parameter (e.g., Triggered at 100 Hz / 1 kHz / 8 kHz / 22 | 6 | **Gated brightness matches Persistent** | `g` (Gated checkerboard duty=192) with AD3 holding GP45 HIGH for 100 ms, then compare to Persistent at same pattern | A0 photodiode integration | Mean photodiode value during HIGH window within 10% of Persistent reference | | 7 | **Gated drop latency** | AD3 generates HIGH (100 ms) → LOW step | Saleae D0 (EINT) + A0 (photodiode) | Photodiode drops below threshold within 60 µs of EINT falling (50 µs row-drive worst case + 10 µs margin) | | 8 | **Error display interrupts + restores Triggered** | `T` + AD3 firing edges at 100 Hz; midway through (e.g., after 10 edges), inject `e 2` | Saleae all rows + photodiode | PE02 glyph visible for ~1 s; then Triggered resumes from edge 11 | -| 9 | **Error rate-limit (1 per 5 s)** | panel_master injects 10 bad-parity messages over 1 s | Saleae photodiode + serial heartbeat | Exactly 1 error glyph displayed; serial heartbeat shows `err_displayed:1, err_suppressed:9` | -| 10 | **CIPO unchanged during error window** | panel_master sends valid 0x10, captures CIPO; injects error; sends another 0x10 during error window, captures CIPO | Saleae SPI channels | Second CIPO capture identical to first | +| 9 | **Error rate-limit (1 per 5 s)** | the arena master injects 10 bad-parity messages over 1 s | Saleae photodiode + serial heartbeat | Exactly 1 error glyph displayed; serial heartbeat shows `err_displayed:1, err_suppressed:9` | +| 10 | **CIPO unchanged during error window** | the arena master sends valid 0x10, captures CIPO; injects error; sends another 0x10 during error window, captures CIPO | Saleae SPI channels | Second CIPO capture identical to first | ### P2 — verify if time permits | # | Test | Stimulus | Capture | Pass criterion | |---|---|---|---|---| | 11 | **Triggered frequency sweep** | AD3 sweeps 100 Hz → 50 kHz, 20-edge bursts | Saleae | First missed edge identifies the actual per-row drive limit; compare to spec table | -| 12 | **duty_cycle sweep — measure LED-on width** | duty=1, 16, 64, 85, 128, 192, 255 (via modified `T` command rebuilt each time, or panel_master) | Saleae A0 photodiode | Per-row LED-on width measured; update spec table with empirical values | +| 12 | **duty_cycle sweep — measure LED-on width** | duty=1, 16, 64, 85, 128, 192, 255 (via modified `T` command rebuilt each time, or the arena master) | Saleae A0 photodiode | Per-row LED-on width measured; update spec table with empirical values | | 13 | **865 ns trigger-to-LED latency** (spec line 775) | AD3 single rising edge | Saleae D0 + A0 (high time resolution) | edge-to-photodiode-rise delay measured; expected ~1 µs (includes photodiode response time) | | 14 | **EINT signal quality** (spec line 852) | AD3 sources rising edge; capture both source and panel-side EINT | Saleae D0 + D7 (panel-side EINT via probe) | No falling-edge ringing that would re-fire a row | -| 15 | **Cross-core soak** | panel_master streams mixed valid + invalid traffic for 60 s | Saleae photodiode + serial | No panel hangs; `frames_skipped` near 0; visible state matches commanded state | +| 15 | **Cross-core soak** | the arena master streams mixed valid + invalid traffic for 60 s | Saleae photodiode + serial | No panel hangs; `frames_skipped` near 0; visible state matches commanded state | --- @@ -237,7 +245,7 @@ Already in firmware (build `pico_v031_bcmtest`): | `g` | Push Gated checkerboard (duty=192) | | `t` | Scan-period timing benchmark (existing) | -**To inject duty_cycle-varied Triggered patterns for sweep tests**: either (a) modify the `T` handler in `panel/src/main.cpp` and rebuild, or (b) use panel_master to send real 0x12 commands with arbitrary duty_cycle bytes. Approach (b) is more realistic but requires panel_master to be on the SPI bus (production firmware needed, not bcmtest). +**To inject duty_cycle-varied Triggered patterns for sweep tests**: either (a) modify the `T` handler in `panel/src/main.cpp` and rebuild, or (b) use the arena master to send real 0x12 commands with arbitrary duty_cycle bytes. Approach (b) is more realistic but requires the arena master to be on the SPI bus (production firmware needed, not bcmtest). --- @@ -265,7 +273,7 @@ These came up during implementation and need empirical answers Tuesday: 1. **EINT polarity on production hardware**: spec says rising-edge; prototype rig showed falling-edge fire due to ringing (spec line 852). Production board may or may not reproduce. 2. **Actual per-row drive time at each (duty_cycle, gray_level)**: the spec table values are derived from `base_T = 3 µs` and the BCM weight sum. Confirm against photodiode measurements; update the spec doc with empirical numbers. 3. **865 ± 17 ns trigger-to-LED latency** (spec line 775): re-confirm on production board (prototype measurement). -4. **Cross-core stability under mixed traffic**: implementation review found no race, but soak testing under realistic panel_master load is the real proof. +4. **Cross-core stability under mixed traffic**: implementation review found no race, but soak testing under realistic arena-master load is the real proof. 5. **EINT input impedance behavior with pull-down**: confirm the function-gen source impedance + pull-down combo doesn't degrade rising-edge sharpness. If it does, switch to floating + external pull (or just no pull) and document. --- diff --git a/panel_master/README.md b/panel_master/README.md deleted file mode 100644 index 0191f14..0000000 --- a/panel_master/README.md +++ /dev/null @@ -1,108 +0,0 @@ -# panel_master — bench harness - -Turns a G6 panel into a fake SPI controller (master) that drives a second -panel running the slave firmware in `../panel/`. Used for V1 wire-protocol -bench validation in the absence of an actual arena controller. - -## Build - -```sh -# v0.2.1 master + slave pair -pio run -d panel_master -e pico_master_v021 -pio run -d panel -e pico_v021 - -# v0.3.1 master + slave pair -pio run -d panel_master -e pico_master_v031 -pio run -d panel -e pico_v031 -``` - -## Wiring pre-flight - -**Before powering up anything**, verify the following on the panel schematics -(`reiserlab/LED-Display_G6_Hardware_Panel` v0.2.1 / v0.3.1): - -1. **Direction:** On both panels, the inter-panel J2 header carries MOSI/MISO/SCK - from the panel-MCU's perspective (master mode and slave mode use the same - pin names; the MCU's role is what flips). Master MOSI/MISO/SCK pins ↔ slave - MOSI/MISO/SCK pins, **straight-through** — no crossover. -2. **CS path:** Each panel hardwires J3 pin 5 → MCU CS0. For two-panel bench - testing, wire master J3 pin 5 to slave J3 pin 5 with a single jumper. The - CS-shift behavior of J3↔J5 (vertical stacking topology) does NOT apply when - using J3 on both sides. -3. **Power (single-USB setup):** Plug USB only into the **master** panel. - Power the slave from the master via **J2 pin 5 (+5V) and J2 pin 4 (GND)**. - This mirrors the arena topology (which routes +5V through J2 to fan out - to multiple panels) and is electrically supported, but watch for two - gotchas: - - **USB-port current limit (~500 mA).** Two panels at high brightness - (duty_cycle ≈ 255 with many LEDs lit) can pull > 500 mA and brown out. - Bench sequence below uses moderate duty_cycle values; if you see USB - dropouts, drop the duty_cycle sweep range or use a powered USB hub. - - **No backfeed.** Only one panel may be USB-connected at a time. Plugging - a second USB into the slave while +5V is also bridged via J2 would cross - two USB power rails. **Do not do that.** - If you do have two USB cables: independent USB on each + do NOT connect - J2 pin 5 between panels (then there is no backfeed risk). -4. **Ground continuity:** wire J2 pin 4 (GND) through to the slave. In the - single-USB setup this is the slave's only ground reference. - -## Slave serial output - -In the single-USB setup, **the slave's USB serial is not accessible** — the -slave runs without a host connection. All bench observations come from the -master side: master serial logs the CIPO bytes for each transaction, which is -sufficient to infer slave state for every test step in the sequence below. -Visual inspection of the slave's LEDs confirms display behavior. - -If you need slave-side diagnostics (`msg_count`, `parity_ok`, `queue_drops`), -temporarily swap which panel gets the USB cable. - -## Hardware pin reference - -### v0.2.1 — SPI0 on GP32–35 - -| Header pin | Function | Master GPIO | Slave GPIO | -|---|---|---|---| -| J2 pin 1 | MISO | GP35 (TX in master mode) | GP35 (TX in slave mode) | -| J2 pin 2 | MOSI | GP32 | GP32 | -| J2 pin 3 | SCK | GP34 | GP34 | -| J2 pin 4 | GND | GND | GND | -| J2 pin 5 | +5V | (USB-powered) | **connect to master J2 pin 5** (single-USB) | -| J3 pin 5 | CS0 | GP33 | GP33 | - -### v0.3.1 — SPI1 on GP40–43 - -| Header pin | Function | Master GPIO | Slave GPIO | -|---|---|---|---| -| J2 pin 1 | MISO | GP43 | GP43 | -| J2 pin 2 | MOSI | GP40 | GP40 | -| J2 pin 3 | SCK | GP42 | GP42 | -| J2 pin 4 | GND | GND | GND | -| J2 pin 5 | +5V | (USB-powered) | **connect to master J2 pin 5** (single-USB) | -| J3 pin 5 | CS0 | GP41 | GP41 | - -## Bench sequence (per iteration, ~1 Hz) - -The master loops through: - -1. **COMM_CHECK** (canonical payload 0..199) — expect CIPO `[0x81 0x00 0x00]` - on first run (slave empty buffer) or `[parity|0x01, prev_cmd, prev_crc]` thereafter. - CRC byte is CRC-8/AUTOSAR per `g6_01-panel-protocol.md` § CRC-8 algorithm. - Canonical 202-byte COMM_CHECK CRC = `0x8B`. -2. **Gray_2 cross**, duty_cycle=192 — visual: row+column-10 cross at high brightness. -3. **Gray_16 gradient**, duty_cycle=128 — visual: left-to-right brightness ramp at half intensity. -4. **Duty-cycle sweep** on Gray_16 gradient — brightness ramps each iteration. -5. **COMM_CHECK with one byte flipped** + follow-up COMM_CHECK — expect - `[parity|0x01, 0xFF, 0x00]` on the follow-up (the COMM_CHECK-fail sentinel). -6. **Truncated frame** (3 bytes, length check fails) — CIPO unchanged from prev valid. -7. **Parity-corrupted frame** (no parity recompute) — CIPO unchanged. - -Master prints CIPO bytes to USB serial. Slave prints `msg_count / parity_ok / -length_ok / protocol_ok / cmd_ok / comm_check_ok / queue_drops` every 1000 -messages. - -## SPI clock - -Starts at **1 MHz** — well below the marginal hardware ceiling. The spec -targets 25 MHz with margin; max is 30 MHz. Bump `MASTER_SPI_HZ` in -`src/main.cpp` after the bench passes once. diff --git a/panel_master/platformio.ini b/panel_master/platformio.ini deleted file mode 100644 index fe4aa9e..0000000 --- a/panel_master/platformio.ini +++ /dev/null @@ -1,58 +0,0 @@ -; PlatformIO Project Configuration File — panel_master (bench harness) -; -; Turns a G6 panel into a fake controller (SPI master) for V1 wire-protocol -; bench validation. One panel runs `panel/` slave firmware; another panel -; runs this `panel_master/` firmware and drives it. -; -; Shared protocol/message/pattern code is pulled in from ../panel/src/ via -; build_src_filter (cross-platform; no symlinks). PANEL_REV selects which -; SPI peripheral and pins the master drives. - - -[common] -platform = https://github.com/maxgerhardt/platform-raspberrypi.git -board = generic_rp2350 -framework = arduino -board_build.core = earlephilhower -lib_deps = - ArduinoEigen - Streaming - -; PSRAM not used by the master, but the board has it on the same pins so we -; still set the chip-select correctly to avoid stray bus activity. -board_upload.psram_length = 8388608 -board_build.filesystem_size = 0m - -; USB identity -board_build.arduino.earlephilhower.usb_manufacturer = Reiser Lab -board_build.arduino.earlephilhower.usb_product = G6 Panel-as-Master Bench Harness -board_build.arduino.earlephilhower.usb_vid = 0x2E8A -board_build.arduino.earlephilhower.usb_pid = 0x0009 - -; Pull in shared protocol/message/pattern sources from the slave firmware tree. -; This is the cross-platform alternative to symlinks (works on Windows + CI). -build_src_filter = - +<*> - +<../../panel/src/constants.cpp> - +<../../panel/src/message.cpp> - +<../../panel/src/pattern.cpp> - +<../../panel/src/protocol.cpp> -build_flags = - -I../panel/src - - -[env:pico_master_v021] -extends = common -; Same panel hardware as the slave; just configured as master instead of slave. -build_flags = - ${common.build_flags} - -DPANEL_REV=21 - -DRP2350_PSRAM_CS=0 - - -[env:pico_master_v031] -extends = common -build_flags = - ${common.build_flags} - -DPANEL_REV=31 - -DRP2350_PSRAM_CS=47 diff --git a/panel_master/src/main.cpp b/panel_master/src/main.cpp deleted file mode 100644 index 6baed0e..0000000 --- a/panel_master/src/main.cpp +++ /dev/null @@ -1,350 +0,0 @@ -// ============================================================================= -// G6 panel firmware — panel-as-master bench harness (Stage 1) -// ============================================================================= -// -// This firmware turns a G6 panel into a fake controller (SPI master). It emits -// V1 wire-protocol frames to a second panel running the `panel/` slave firmware -// and reads back the 3-byte CIPO confirmation, printing both to USB serial. -// -// IMPORTANT WIRING NOTE — bitbang master (not hardware SPI): -// The panel's PCB hardwires GP32 to the J2 pin labeled "MOSI" (master -// perspective) and GP35 to the J2 "MISO" pin. The RP2350 hardware SPI -// peripheral fixes pin roles: GP32 = spi0 RX (input), GP35 = spi0 TX -// (output). That's correct for slave operation, but in MASTER mode the -// peripheral would want GP32 as input-from-MISO and GP35 as output-to-MOSI -// — opposite of how the PCB is wired. Using hw-SPI on the master would -// require a crossover cable. -// -// To avoid the crossover-cable requirement, this firmware bitbangs SPI -// in software on plain GPIOs. Pin roles can then be assigned freely: -// MOSI_OUT = GP32 (drives the J2 "MOSI" wire to slave's GP32 RX) -// MISO_IN = GP35 (reads the J2 "MISO" wire from slave's GP35 TX) -// SCK_OUT = GP34 (drives the J2 "SCK" wire to slave's GP34 SCK) -// CS_OUT = GP33 (drives the J3 pin 5 CS wire to slave's GP33 CSn) -// A straight-through J2↔J2 cable + J3pin5↔J3pin5 jumper works. -// -// Slave still uses its hardware SPI peripheral; spec-compliant SPI mode 3. -// -// Test sequence (1 Hz cadence): -// 1. COMM_CHECK with canonical payload 0..199 -// 2. Gray_2 cross pattern (V1 Persistent 0x11), duty_cycle=192 -// 3. Gray_16 gradient pattern (V1 Persistent 0x31), duty_cycle=128 -// 4. Duty cycle sweep on Gray_16 Persistent -// 4b. V1 Oneshot burst (500 frames at ~500 Hz) -// 5. COMM_CHECK with one byte deliberately flipped (validation should fail) -// 6. Truncated frame (length check should fail) -// 7. Parity-corrupted frame -// ============================================================================= - -#include -#include -#include -#include "constants.h" -#include "message.h" -#include "pattern.h" -#include "protocol.h" - -// Start at ~250 kHz — easy on the bitbang loop, well within slave hw-SPI spec. -// Per-bit delay = roughly 2 µs total per bit (1 µs per half-period). The slave's -// hw-SPI peripheral handles this comfortably; spec target is 25 MHz so we have -// 100x margin for bench-test. -static constexpr uint32_t BITBANG_HALF_PERIOD_US = 2; // -> ~250 kHz bit rate - -// Pin assignments (master perspective). With bitbang we are NOT bound by -// hardware-SPI funcsel pin roles, so we can map outputs to the GPIOs whose -// PCB traces go to the "MOSI" wire and "SCK" wire, and configure GP35 as an -// input to read the slave's MISO drive. See constants.cpp for the per-rev -// values of these macros (v0.2.1: 32/35/34/33; v0.3.1: 40/43/42/41). -#define MASTER_MOSI_OUT_PIN SPI_MOSI_PIN // GP32 on v0.2.1 -#define MASTER_MISO_IN_PIN SPI_MISO_PIN // GP35 on v0.2.1 -#define MASTER_SCK_OUT_PIN SPI_SCK_PIN // GP34 on v0.2.1 -#define MASTER_CS_OUT_PIN SPI_CS_PIN // GP33 on v0.2.1 - -static inline void cs_high() { gpio_put(MASTER_CS_OUT_PIN, 1); } -static inline void cs_low() { gpio_put(MASTER_CS_OUT_PIN, 0); } -static inline void sck_high(){ gpio_put(MASTER_SCK_OUT_PIN, 1); } -static inline void sck_low() { gpio_put(MASTER_SCK_OUT_PIN, 0); } - -// SPI Mode 3 (CPOL=1, CPHA=1, MSB-first) bitbang. -// Idle: SCK = HIGH. Each bit: -// - SCK falls (leading edge); master drives MOSI bit -// - half period -// - SCK rises (trailing edge); master samples MISO -// - half period -static inline uint8_t bitbang_xfer_byte(uint8_t tx_byte) { - uint8_t rx_byte = 0; - for (int bit = 7; bit >= 0; bit--) { - // Drive MOSI (master shifts on first edge of cycle) - gpio_put(MASTER_MOSI_OUT_PIN, (tx_byte >> bit) & 1u); - // First edge: SCK HIGH -> LOW - sck_low(); - busy_wait_us(BITBANG_HALF_PERIOD_US); - // Second edge: SCK LOW -> HIGH; both sides sample now - sck_high(); - if (gpio_get(MASTER_MISO_IN_PIN)) { - rx_byte |= (1u << bit); - } - busy_wait_us(BITBANG_HALF_PERIOD_US); - } - return rx_byte; -} - -// One SPI transaction: assert CS, exchange `len` bytes, deassert CS. -static void spi_xfer(const uint8_t *tx, uint8_t *rx, size_t len) { - // CS setup - cs_low(); - busy_wait_us(5); // CS-setup time (slave parser reset) - for (size_t i = 0; i < len; i++) { - rx[i] = bitbang_xfer_byte(tx[i]); - } - busy_wait_us(5); - cs_high(); - busy_wait_us(10); // inter-transaction idle -} - -// Print the first 3 RX bytes (the CIPO confirmation slot) hex-formatted. -static void print_cipo(const char *label, const uint8_t *rx) { - Serial.print(label); - Serial.print(" CIPO: [0x"); - if (rx[0] < 0x10) Serial.print('0'); - Serial.print(rx[0], HEX); - Serial.print(" 0x"); - if (rx[1] < 0x10) Serial.print('0'); - Serial.print(rx[1], HEX); - Serial.print(" 0x"); - if (rx[2] < 0x10) Serial.print('0'); - Serial.print(rx[2], HEX); - Serial.println("]"); -} - - -void setup() { - Serial.begin(115200); - - // Configure bitbang GPIOs (all SIO function; no hardware SPI peripheral). - // Outputs: MOSI, SCK, CS — driven by the master. - // Input: MISO — driven by the slave's hw-SPI TX pin. - gpio_init(MASTER_MOSI_OUT_PIN); - gpio_init(MASTER_SCK_OUT_PIN); - gpio_init(MASTER_CS_OUT_PIN); - gpio_init(MASTER_MISO_IN_PIN); - gpio_set_function(MASTER_MOSI_OUT_PIN, GPIO_FUNC_SIO); - gpio_set_function(MASTER_SCK_OUT_PIN, GPIO_FUNC_SIO); - gpio_set_function(MASTER_CS_OUT_PIN, GPIO_FUNC_SIO); - gpio_set_function(MASTER_MISO_IN_PIN, GPIO_FUNC_SIO); - gpio_set_dir(MASTER_MOSI_OUT_PIN, GPIO_OUT); - gpio_set_dir(MASTER_SCK_OUT_PIN, GPIO_OUT); - gpio_set_dir(MASTER_CS_OUT_PIN, GPIO_OUT); - gpio_set_dir(MASTER_MISO_IN_PIN, GPIO_IN); - // Idle states: SCK = HIGH (CPOL=1), CS = HIGH (deasserted), MOSI = 0. - sck_high(); - cs_high(); - gpio_put(MASTER_MOSI_OUT_PIN, 0); - - // Banner — printed AFTER GPIO setup so the GPIO init can't stall pre-USB. - delay(2000); // give USB CDC time to enumerate - Serial.println("=== G6 panel_master bench harness (bitbang SPI) ==="); - Serial.print("PANEL_REV: "); Serial.println(PANEL_REV); - Serial.print("MOSI->slave on GP"); Serial.println(MASTER_MOSI_OUT_PIN); - Serial.print("MISO<-slave on GP"); Serial.println(MASTER_MISO_IN_PIN); - Serial.print("SCK on GP"); Serial.println(MASTER_SCK_OUT_PIN); - Serial.print("CS on GP"); Serial.println(MASTER_CS_OUT_PIN); - Serial.print("bit half-period: "); Serial.print(BITBANG_HALF_PERIOD_US); Serial.println(" us"); - - // Startup delay so the slave panel completes boot + FIFO prime before - // our first transaction. - delay(1000); - Serial.println("setup() done; entering test loop"); -} - - -// Helper: build a Gray_2 row+column cross pattern and serialize it. -// `protocol_version` selects which header byte (V1 = 0x01 only for now). -// `cmd_id` selects mode: 0x10 (Oneshot) or 0x11 (Persistent). -static void build_gray_2_cross(uint8_t *out, size_t out_len, uint8_t duty_cycle, - uint8_t protocol_version, uint8_t cmd_id) { - Pattern pat; - pat.set_gray_level(GrayLevel::Gray_2); - pat.set_duty_cycle(duty_cycle); - // Cross through center: row 10 and column 10 all ON - for (size_t k = 0; k < PANEL_SIZE; k++) { - pat.matrix()(10, k) = 1; - pat.matrix()(k, 10) = 1; - } - Message msg; - msg.from_pattern(pat, protocol_version); - // Override cmd byte if caller wants the V1 Persistent opcode (0x11) - // instead of V1 Oneshot (0x10). from_pattern() sets it to 0x10 by - // default; we patch byte 1 and recompute parity. - if (cmd_id != CMD_ID_DISPLAY_GRAY_2) { - msg.set_command_byte(cmd_id); - msg.set_parity_bit(); - } - for (size_t i = 0; i < msg.num_bytes() && i < out_len; i++) { - out[i] = msg.data_ptr()[i]; - } -} - -// Helper: build a Gray_16 gradient pattern (intensity = col index ramps 0..15). -// `protocol_version` selects header byte; `cmd_id` selects 0x30 (Oneshot) or -// 0x31 (Persistent). -static void build_gray_16_gradient(uint8_t *out, size_t out_len, uint8_t duty_cycle, - uint8_t protocol_version, uint8_t cmd_id) { - Pattern pat; - pat.set_gray_level(GrayLevel::Gray_16); - pat.set_duty_cycle(duty_cycle); - for (size_t i = 0; i < PANEL_SIZE; i++) { - for (size_t j = 0; j < PANEL_SIZE; j++) { - pat.matrix()(i, j) = uint8_t(j * 15 / (PANEL_SIZE - 1)); // 0..15 - } - } - Message msg; - msg.from_pattern(pat, protocol_version); - if (cmd_id != CMD_ID_DISPLAY_GRAY_16) { - msg.set_command_byte(cmd_id); - msg.set_parity_bit(); - } - for (size_t i = 0; i < msg.num_bytes() && i < out_len; i++) { - out[i] = msg.data_ptr()[i]; - } -} - -// Helper: build the canonical COMM_CHECK message (0..199 payload). -static void build_comm_check(uint8_t *out, size_t out_len) { - Message msg; - msg.to_comms_check(); - for (size_t i = 0; i < msg.num_bytes() && i < out_len; i++) { - out[i] = msg.data_ptr()[i]; - } -} - - -void loop() { - static uint32_t iter = 0; - static uint8_t duty_cycle_sweep = 0; - iter++; - - // -- Step 1: COMM_CHECK ------------------------------------------------- - { - uint8_t tx[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - build_comm_check(tx, sizeof(tx)); - spi_xfer(tx, rx, sizeof(tx)); - Serial.print("[iter "); - Serial.print(iter); - Serial.print("] COMM_CHECK -> "); - print_cipo("", rx); - } - - // -- Step 2: Gray_2 cross V1 PERSISTENT (0x11), duty_cycle=192 -------------- - // Persistent: one send → display stays visible until next message. Easy - // to verify visually at 1 Hz cadence (steps 2-4 each replace the prior - // pattern, so the panel cycles through them). - { - uint8_t tx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - build_gray_2_cross(tx, sizeof(tx), 192, - CMD_PROTOCOL_V1, CMD_ID_DISPLAY_GRAY_2_PERSIST); - spi_xfer(tx, rx, sizeof(tx)); - print_cipo("Gray_2 cross PERSIST s=192", rx); - } - - // -- Step 3: Gray_16 gradient V1 PERSISTENT (0x31), duty_cycle=128 --------- - { - uint8_t tx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_16] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_16] = {0}; - build_gray_16_gradient(tx, sizeof(tx), 128, - CMD_PROTOCOL_V1, CMD_ID_DISPLAY_GRAY_16_PERSIST); - spi_xfer(tx, rx, sizeof(tx)); - print_cipo("Gray_16 gradient PERSIST s=128", rx); - } - - // -- Step 4: duty_cycle sweep (V1 Persistent gradient, one step per iter) -- - { - uint8_t tx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_16] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_16] = {0}; - build_gray_16_gradient(tx, sizeof(tx), duty_cycle_sweep, - CMD_PROTOCOL_V1, CMD_ID_DISPLAY_GRAY_16_PERSIST); - spi_xfer(tx, rx, sizeof(tx)); - Serial.print("Gray_16 sweep duty_cycle="); - Serial.print(duty_cycle_sweep); - Serial.print(" -> "); - print_cipo("", rx); - duty_cycle_sweep = (duty_cycle_sweep + 13) & 0xFF; // ~20 steps over 0..255 - } - - // -- Step 4b: V1 ONESHOT burst (0x10 streamed at ~500 Hz for 1 second) -- - // Oneshot semantics: panel displays for one scan, then idles. Without - // streaming, the pattern would just flash once. We send the Gray_2 cross - // 500 times in quick succession; if Oneshot is working, the slave should - // show the cross continuously during this burst, then go dark after. - { - uint8_t tx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - build_gray_2_cross(tx, sizeof(tx), 192, - CMD_PROTOCOL_V1, CMD_ID_DISPLAY_GRAY_2); - Serial.println("V1 Oneshot burst: 500x at ~2ms cadence"); - for (int i = 0; i < 500; i++) { - spi_xfer(tx, rx, sizeof(tx)); - // 2 ms gap = ~500 Hz "frame rate" from the master side - delayMicroseconds(1500); - } - print_cipo("Oneshot last CIPO", rx); - // After this burst the slave should go dark (Oneshot timed out; - // queue_drops_ may be nonzero in slave serial — many of the 500 - // frames will land while a previous frame is still on the queue). - delay(1500); // observe the slave go dark - } - - // -- Step 5: COMM_CHECK with deliberate byte-flip ---------------------- - { - uint8_t tx[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - build_comm_check(tx, sizeof(tx)); - tx[HEADER_SIZE + 5] ^= 0xAA; // flip a payload byte; parity will no longer match - // We must recompute parity so it's still a valid V1 message at the - // wire layer — otherwise we'd be testing parity_ok=false, not COMM_CHECK fail. - // Build a Message wrapper, rewrite the byte, recompute parity. - Message msg; - msg.to_comms_check(); - msg.data_ptr()[HEADER_SIZE + 5] = uint8_t(5) ^ 0xAA; // canonical[5]=5, flipped - msg.set_parity_bit(); - // Copy back into tx - for (size_t i = 0; i < msg.num_bytes(); i++) tx[i] = msg.data_ptr()[i]; - spi_xfer(tx, rx, sizeof(tx)); - // Next COMM_CHECK's CIPO carries the previous (failed) confirmation. - // We send another COMM_CHECK to retrieve it. - uint8_t tx2[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - uint8_t rx2[HEADER_SIZE + PAYLOAD_COMMS_CHECK] = {0}; - build_comm_check(tx2, sizeof(tx2)); - spi_xfer(tx2, rx2, sizeof(tx2)); - Serial.print("COMM_CHECK fail probe -> "); - print_cipo("", rx2); // expect {parity|0x01, 0xFF, 0x00} - } - - // -- Step 6: truncated frame (header+cmd only, length check should fail) -- - { - uint8_t tx[3] = { 0x01, CMD_ID_DISPLAY_GRAY_2, 0x00 }; // 3 bytes, not 53 - uint8_t rx[3] = {0}; - spi_xfer(tx, rx, sizeof(tx)); - Serial.print("Truncated frame -> "); - print_cipo("", rx); // CIPO buffer unchanged from prev valid - } - - // -- Step 7: parity-corrupted valid-length frame ------------------------- - { - uint8_t tx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - uint8_t rx[HEADER_SIZE + PAYLOAD_DISPLAY_GRAY_2] = {0}; - build_gray_2_cross(tx, sizeof(tx), 255, - CMD_PROTOCOL_V1, CMD_ID_DISPLAY_GRAY_2); - // Flip a payload bit WITHOUT recomputing parity - tx[HEADER_SIZE + 2] ^= 0x01; - spi_xfer(tx, rx, sizeof(tx)); - Serial.print("Parity-corrupt frame -> "); - print_cipo("", rx); // CIPO buffer unchanged - } - - Serial.println("--- iteration done ---"); - Serial.println(); - delay(1000); // 1 Hz cadence -} From c35b0cce98ebb5bd91866034154c25f51493befe Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Sat, 30 May 2026 00:56:37 -0400 Subject: [PATCH 3/7] =?UTF-8?q?panel:=20fix=20PE03=20rejected=20frames=20?= =?UTF-8?q?=E2=80=94=20drain=20RX=20FIFO=20after=20CS-high?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit custom_spi_read_blocking() broke out of the receive loop the instant gpio_get(cs_pin) read high. The final data byte's last SCK edge can precede the CS rising edge by less than the RP2350 RX-path latency (input synchronizer + SSP RX pipeline), so that byte was not yet spi_is_readable() when CS was sampled high -> the loop returned one byte short -> check_length() failed -> PE03 (visible as occasional flicker / error glyph during streaming). After detecting CS-high, drain any straggler bytes still arriving in the RX FIFO before returning, bounded by RX_DRAIN_SETTLE consecutive empty polls (reset on each read) so a finished or truncated transaction still exits promptly and can never hang. CS is high during the drain, so no bytes from the next transaction can enter the FIFO. Validated on v0.3.1 + arena master via SPI_DIAG: GS2/200Hz rejects 10/24130 -> 0/24101; GS16 0/24117; 0 rejects across 229,726 messages (GS2+GS16). Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/src/panel_spi_custom.cpp | 27 ++++++++++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/panel/src/panel_spi_custom.cpp b/panel/src/panel_spi_custom.cpp index 000d913..ceee43c 100644 --- a/panel/src/panel_spi_custom.cpp +++ b/panel/src/panel_spi_custom.cpp @@ -98,6 +98,10 @@ static int __not_in_flash_func(custom_spi_read_blocking)( { invalid_params_if(HARDWARE_SPI, 0 > (int)len); const size_t fifo_depth = 8; + // After CS-high, poll up to this many times for a straggler RX byte still + // propagating through the input synchronizer + SSP RX pipeline before + // concluding the transaction is drained (counter resets on each read). + const uint32_t RX_DRAIN_SETTLE = 256; size_t rx_remaining = len, tx_remaining = len; size_t tx_index = 0; int num_rx = 0; @@ -128,8 +132,29 @@ static int __not_in_flash_func(custom_spi_read_blocking)( --rx_remaining; num_rx++; } - // Check if master de-asserted CS (transaction ended) + // Check if master de-asserted CS (transaction ended). if (gpio_get(cs_pin)) { + // The final data byte's last SCK edge can precede the CS rising + // edge by less than the RX-path latency (input synchronizer + SSP + // RX pipeline), so that byte may not be spi_is_readable() yet at + // the instant gpio_get() reports CS high. Breaking immediately + // returns one byte short -> check_length() fails -> PE03. Drain any + // straggler(s) first, bounded by RX_DRAIN_SETTLE consecutive empty + // polls (reset on each read) so a finished or truncated transaction + // still exits promptly and can never hang. CS is high, so the + // master is not clocking — no bytes from the next transaction can + // enter the FIFO during this drain. + uint32_t settle = 0; + while (rx_remaining && settle < RX_DRAIN_SETTLE) { + if (spi_is_readable(spi)) { + *dst++ = (uint8_t) spi_get_hw(spi)->dr; + --rx_remaining; + num_rx++; + settle = 0; + } else { + ++settle; + } + } break; } } From ab206a001ca80b80d1488848597d715a738f3f60 Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Sat, 30 May 2026 00:56:37 -0400 Subject: [PATCH 4/7] panel: add SPI_DIAG diagnostic build + ignore sub-minimum runts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit SPI_DIAG (-DSPI_DIAG=1; new pico_v021_spidiag / pico_v031_spidiag envs): behaves identically to production during streaming but silently tallies reception stats in RAM — per-check fail counts, per-command histogram, and a ring of the last 32 failures with got-vs-expected byte counts. No serial output during streaming; 'd' dumps everything in one burst, 'z' zeros the window. This localized the PE03 cause (one-byte-short last-byte drop) without perturbing timing, and is kept for the >10 MHz SPI work. Also ignore sub-minimum runts: a transaction shorter than MESSAGE_MINIMUM_SIZE (a 1-2 byte CS glitch / aborted clock, not a message attempt) no longer raises a PE03/PE04 glyph that would blank a streaming display. SPI_DIAG still counts them, so visibility is retained. Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/platformio.ini | 28 ++++++++++ panel/src/messenger.cpp | 120 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 147 insertions(+), 1 deletion(-) diff --git a/panel/platformio.ini b/panel/platformio.ini index 255ac99..d7fc5b4 100644 --- a/panel/platformio.ini +++ b/panel/platformio.ini @@ -93,3 +93,31 @@ extends = env:pico_v031 build_flags = ${env:pico_v031.build_flags} -DSTAGE2_SELFTEST=1 + + +; ---------------------------------------------------------------------- +; SPI diagnostics builds (-DSPI_DIAG=1). +; +; Identical to production (full SPI ingest, same streaming timing) PLUS +; silent in-RAM reception counters and a one-shot dump. There is NO serial +; output during streaming — send 'd' over USB serial (WHILE the master is +; still streaming, e.g. slowed to 100 Hz) to dump all counters in one burst, +; or 'z' to zero the window. Use to localize the rejected-frame flicker: +; parity vs length, per-command histogram, and got-vs-expected byte counts. +; +; Build: pio run -e pico_v021_spidiag (or pico_v031_spidiag) +; Flash: cp -X .pio/build//firmware.uf2 /Volumes/RP2350/ +; ---------------------------------------------------------------------- + +[env:pico_v021_spidiag] +extends = env:pico_v021 +build_flags = + ${env:pico_v021.build_flags} + -DSPI_DIAG=1 + + +[env:pico_v031_spidiag] +extends = env:pico_v031 +build_flags = + ${env:pico_v031.build_flags} + -DSPI_DIAG=1 diff --git a/panel/src/messenger.cpp b/panel/src/messenger.cpp index 4fe3746..e52deca 100644 --- a/panel/src/messenger.cpp +++ b/panel/src/messenger.cpp @@ -7,6 +7,7 @@ #include "predef_patterns.h" #include "predef_patterns_table.h" #include "display.h" +#include "protocol.h" #include #include @@ -14,6 +15,83 @@ extern Display display; +#if SPI_DIAG +// --------------------------------------------------------------------------- +// SPI_DIAG — silent reception diagnostics (build flag -DSPI_DIAG=1). +// +// Behaves identically to production during streaming: counters are cheap +// in-RAM increments and there is NO serial output. On the one-shot 'd' +// command the whole picture is dumped in a single burst; 'z' zeros the +// window. Goal: localize the ~7% rejected-frame flicker — is it parity (bit +// corruption) vs length (dropped/shifted bytes), which command, and is the +// got-vs-expected byte count "short by N" (framing) or random? +// --------------------------------------------------------------------------- +namespace { +struct DiagFail { uint8_t cmd; uint16_t got; uint16_t expected; uint8_t flags; }; +// flags: bit0=parity bit1=length bit2=protocol bit3=unknown-cmd +constexpr size_t DIAG_RING = 32; +uint32_t diag_msgs = 0, diag_reject_any = 0; +uint32_t diag_parity_fail = 0, diag_length_fail = 0, diag_protocol_fail = 0, diag_unknown_cmd = 0; +uint32_t diag_cmd_hist[256] = {0}; +DiagFail diag_ring[DIAG_RING]; +size_t diag_ring_head = 0, diag_ring_count = 0; + +uint16_t diag_expected_len(uint8_t cmd) { + auto it = PAYLOAD_SIZE_UMAP.find(cmd); + return (it != PAYLOAD_SIZE_UMAP.end()) ? (uint16_t)(it->second + HEADER_SIZE) : 0; +} +void diag_record(uint8_t cmd, uint16_t got, uint8_t flags) { + DiagFail &f = diag_ring[diag_ring_head]; + f.cmd = cmd; f.got = got; f.expected = diag_expected_len(cmd); f.flags = flags; + diag_ring_head = (diag_ring_head + 1) % DIAG_RING; + if (diag_ring_count < DIAG_RING) diag_ring_count++; +} +void diag_reset() { + diag_msgs = diag_reject_any = 0; + diag_parity_fail = diag_length_fail = diag_protocol_fail = diag_unknown_cmd = 0; + diag_ring_head = diag_ring_count = 0; + for (int i = 0; i < 256; i++) diag_cmd_hist[i] = 0; +} +inline void diag_emit(const char *b, int n) { + if (n > 0) Serial.write(reinterpret_cast(b), + (size_t)(n < (int)160 ? n : 159)); +} +// Blocking burst — only ever runs on an explicit 'd', never during the +// measurement window, so blocking on the USB-CDC FIFO is fine here. +void diag_dump() { + char b[160]; + diag_emit(b, snprintf(b, sizeof(b), + "\r\n=== SPI_DIAG (PANEL_REV=%d) ===\r\n", (int)PANEL_REV)); + unsigned long pml = diag_msgs + ? (unsigned long)((uint64_t)diag_reject_any * 1000u / diag_msgs) : 0; + diag_emit(b, snprintf(b, sizeof(b), "msgs=%lu reject_any=%lu (%lu.%lu%%)\r\n", + (unsigned long)diag_msgs, (unsigned long)diag_reject_any, pml / 10, pml % 10)); + diag_emit(b, snprintf(b, sizeof(b), + "parity_fail=%lu length_fail=%lu protocol_fail=%lu unknown_cmd=%lu\r\n", + (unsigned long)diag_parity_fail, (unsigned long)diag_length_fail, + (unsigned long)diag_protocol_fail, (unsigned long)diag_unknown_cmd)); + diag_emit(b, snprintf(b, sizeof(b), "cmd_hist:")); + for (int c = 0; c < 256; c++) + if (diag_cmd_hist[c]) + diag_emit(b, snprintf(b, sizeof(b), " 0x%02X=%lu", c, + (unsigned long)diag_cmd_hist[c])); + Serial.write(reinterpret_cast("\r\n"), 2); + diag_emit(b, snprintf(b, sizeof(b), + "last %u fails (cmd got/exp PLRU):\r\n", (unsigned)diag_ring_count)); + for (size_t i = 0; i < diag_ring_count; i++) { + size_t idx = (diag_ring_head + DIAG_RING - diag_ring_count + i) % DIAG_RING; + const DiagFail &f = diag_ring[idx]; + diag_emit(b, snprintf(b, sizeof(b), " 0x%02X got=%u exp=%u %c%c%c%c\r\n", + f.cmd, f.got, f.expected, + (f.flags & 1) ? 'P' : '.', (f.flags & 2) ? 'L' : '.', + (f.flags & 4) ? 'R' : '.', (f.flags & 8) ? 'U' : '.')); + } + Serial.write(reinterpret_cast("=== end ===\r\n"), 13); +} +} // namespace +#endif + + Messenger::Messenger(queue_t &display_queue, queue_t &error_request_queue) : display_queue_(display_queue), error_request_queue_(error_request_queue) @@ -126,10 +204,36 @@ void Messenger::update() { } } +#if SPI_DIAG + // Silent per-message accounting. Count each failing check INDEPENDENTLY + // (a dropped-bytes frame fails both parity and length — we want to see + // both, not just the first-wins one the error-raise below picks). + { + diag_msgs++; + diag_cmd_hist[cmd_id]++; + uint8_t flags = 0; + if (!parity_ok) { diag_parity_fail++; flags |= 1; } + if (!length_ok) { diag_length_fail++; flags |= 2; } + if (!protocol_ok) { diag_protocol_fail++; flags |= 4; } + if (parity_ok && length_ok && protocol_ok && cmd_umap_.count(cmd_id) == 0) { + diag_unknown_cmd++; flags |= 8; + } + if (flags) { diag_reject_any++; diag_record(cmd_id, (uint16_t)msg.num_bytes(), flags); } + } +#endif + // Trigger PE codes on validity-gate failures. First-detected wins: // parity > length > protocol > unknown opcode. Suppressed while the // panel is already showing an error glyph. - if (!err_active) { + // + // Sub-minimum transactions (num_bytes < MESSAGE_MINIMUM_SIZE) are runts — + // a 1-2 byte CS glitch or aborted clock, not a message attempt. Flashing a + // 3 s PE03/PE04 glyph for what is line noise blanks a streaming display, so + // treat runts as "ignore" (SPI_DIAG still counts them, so visibility is + // retained). The genuine framing bug behind 'got = expected-1' rejects is + // fixed in panel_spi_custom.cpp (drain RX FIFO after CS-high). + bool is_runt = msg.num_bytes() < MESSAGE_MINIMUM_SIZE; + if (!err_active && !is_runt) { if (!parity_ok) { raise_error(PREDEF_SLOT_PE02); } else if (!length_ok) { @@ -165,6 +269,19 @@ void Messenger::update() { // else: buffer stays as the previous valid confirmation OR the empty // sentinel (already loaded into the TX FIFO between transactions). +#if SPI_DIAG + // SPI_DIAG: stay SILENT during streaming (no TX). React only to one-shot + // input commands — 'd' dumps all counters in one burst, 'z' zeros the + // window. Reading input does not transmit, so this adds no traffic to the + // measurement window. NOTE: commands are serviced only between transactions, + // so send 'd' WHILE the master is still streaming (slowly, e.g. 100 Hz) — + // if the stream stops, update() blocks in panel_spi_read and won't see it. + while (Serial.available()) { + int c = Serial.read(); + if (c == 'd') diag_dump(); + else if (c == 'z') diag_reset(); + } +#else // DEVEL serial heartbeat — NON-BLOCKING. // ----------------------------------------------------------- // Emitted once per 1000 messages, but only when the USB-CDC TX FIFO has @@ -192,6 +309,7 @@ void Messenger::update() { } } // ----------------------------------------------------------- +#endif } From 1029d0335b56ec06222a51372946d6ca7ba06cbe Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Sat, 30 May 2026 00:56:37 -0400 Subject: [PATCH 5/7] bench: fold canonical-spec facts into high-speed SPI protocol doc Per Modular-LED-Display/docs/development: the slim G4.1 controller clocks SPI at ~5 MHz today (g6_07 line 178) and the G6 SPI clock is an unmeasured bring-up item (the 30 MHz in g6_01 is aspirational). Put 5 MHz first in the sweep (the deployed operating point), note the shared-CIPO + SN74HCS08 column-buffer topology, and target a master-guaranteed >=0.5 ms CS-high gap. Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/bench/handoff-spi-highspeed-bench.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/panel/bench/handoff-spi-highspeed-bench.md b/panel/bench/handoff-spi-highspeed-bench.md index c6e250a..d87e7de 100644 --- a/panel/bench/handoff-spi-highspeed-bench.md +++ b/panel/bench/handoff-spi-highspeed-bench.md @@ -60,7 +60,7 @@ cap.export_raw_data_binary(directory="/tmp/spiA_raw/", digital_channels=[0, 1]) **Measurements (Python):** - **SCK frequency:** from D0 raw edges — `1 / median(diff(rising_edges_seconds))`; also report min/max to catch master jitter. -- **CS-high gap histogram:** on D1, gap = each CS **rising** → next CS **falling**; report min / p1 / median / max. **The `min` (or p1) is the hard budget** any per-transaction reset path must beat. +- **CS-high gap histogram:** on D1, gap = each CS **rising** → next CS **falling**; report min / p1 / median / max. **The `min` (or p1) is the hard budget** any per-transaction reset path must beat. **Measure the gap between consecutive CS-active windows (per-transaction), NOT the frame-to-frame cadence** — Frank's first number (~3333 µs) was the 300 fps frame period, not the inter-transaction gap. Working target: the master should **guarantee ≥0.5 ms** between transactions to the same panel; the panel's per-transaction work is single-digit µs, so confirm the measured min clears 0.5 ms by a wide margin (it almost certainly does). - **Bytes/transaction & framing:** from the SPI analyzer table — confirm 3..300-byte frames decode and the CIPO confirmation slot `{header, cmd, CRC-8}` lands at bytes 0–2. --- From d60443b304e0b683061b298ec495c45a3df4f113 Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Sat, 30 May 2026 11:32:58 -0400 Subject: [PATCH 6/7] panel: SPI_DIAG idle timeout + note that the double SSE toggle is load-bearing Add an idle timeout to the SPI_DIAG diagnostic build so the panel can answer 'd'/'z' host commands while the arena master is idle between bursts: custom_spi_read_blocking() returns 0 after 50 ms of CS-high idle, and Messenger::update() early-returns + services commands on a 0-byte read (factored into diag_service_commands()). This is what makes the contained-burst reliability benchmark choreography work (z before the burst, d after). The production path is untouched: the timeout is guarded by SPI_DIAG and never fires during streaming (frames arrive every few ms << 50 ms), so streaming timing is identical to production. Also document, in panel_spi_read(), that the per-valid-frame DOUBLE SSE toggle (clear in panel_spi_read + arm in messenger) is NOT redundant: an A/B test that collapsed it to a single reload regressed 15 MHz from 0% to 2.4% byte-drops. The extra SSP disable/enable keeps the marginal PL022 slave RX aligned. Do not "optimize" it away. Co-Authored-By: Claude Opus 4.8 (1M context) --- panel/src/messenger.cpp | 59 +++++++++++++++++++++++----------- panel/src/panel_spi_custom.cpp | 27 +++++++++++++++- 2 files changed, 66 insertions(+), 20 deletions(-) diff --git a/panel/src/messenger.cpp b/panel/src/messenger.cpp index e52deca..d50ff86 100644 --- a/panel/src/messenger.cpp +++ b/panel/src/messenger.cpp @@ -88,6 +88,14 @@ void diag_dump() { } Serial.write(reinterpret_cast("=== end ===\r\n"), 13); } +// Service one-shot host commands (input only, no TX): 'd' dump, 'z' zero. +void diag_service_commands() { + while (Serial.available()) { + int c = Serial.read(); + if (c == 'd') diag_dump(); + else if (c == 'z') diag_reset(); + } +} } // namespace #endif @@ -179,6 +187,17 @@ void Messenger::update() { static Message msg; panel_spi_read(msg); +#if SPI_DIAG + // Idle timeout (no transaction): panel_spi_read returned 0 bytes because + // the master is between bursts. Service host commands and return without + // counting, so 'd'/'z' work while idle and 0-byte reads never pollute the + // diagnostic counters. (Never happens during streaming — see the + // SPI_DIAG idle timeout in custom_spi_read_blocking.) + if (msg.num_bytes() == 0) { + diag_service_commands(); + return; + } +#endif msg_count_ += 1; // S1.4: reset COMM_CHECK byte-validation flag at the start of every @@ -245,14 +264,21 @@ void Messenger::update() { } } - // S1.3: arm the CIPO confirmation buffer per the plan's buffer-update rule. - // - Valid + COMM_CHECK passed: arm {header, cmd, checksum} - // - Valid COMM_CHECK that byte-mismatched (comm_check_ok_ == false): - // arm {header, 0xFF, 0x00} (sentinel) - // - Any other invalidity: do NOT touch the buffer (per spec) - // - During error-display window: do NOT touch the buffer (matches the - // spec rule for invalid messages; observable behavior matches "panel - // silently rejected the command", which is the truth) + // CIPO confirmation buffer — exactly ONE TX-FIFO reload per transaction. + // (Formerly two SSE toggles per valid frame: panel_spi_read() cleared to the + // sentinel, then this block armed the real confirmation a few µs later. The + // clear is now folded in here so each transaction does a single reload.) + // + // - Valid + dispatched: arm {header, cmd, checksum} + // (valid COMM_CHECK that byte-mismatched -> arm {header,0xFF,0x00}) + // - >=3 bytes clocked but invalid (or during the error-display window): + // CLEAR to the empty sentinel, so the confirmation just sent on CIPO + // during this transaction is deleted and not re-sent (per spec + // "buffer deleted, each confirmation sent only once"). + // - Runt (<3 bytes clocked, is_runt): leave the buffer ARMED — no full + // 3-byte CIPO slot was sent, so the confirmation gets another window. + // (Replicates the old `num_bytes >= 3` guard that used to live in + // panel_spi_read().) if (!err_active && parity_ok && length_ok && protocol_ok && cmd_ok) { uint8_t in_version = msg.header_byte() & 0b01111111; // 0x01 (V1 only) if (cmd_id == CMD_ID_COMMS_CHECK && !comm_check_ok_) { @@ -270,17 +296,12 @@ void Messenger::update() { // sentinel (already loaded into the TX FIFO between transactions). #if SPI_DIAG - // SPI_DIAG: stay SILENT during streaming (no TX). React only to one-shot - // input commands — 'd' dumps all counters in one burst, 'z' zeros the - // window. Reading input does not transmit, so this adds no traffic to the - // measurement window. NOTE: commands are serviced only between transactions, - // so send 'd' WHILE the master is still streaming (slowly, e.g. 100 Hz) — - // if the stream stops, update() blocks in panel_spi_read and won't see it. - while (Serial.available()) { - int c = Serial.read(); - if (c == 'd') diag_dump(); - else if (c == 'z') diag_reset(); - } + // SPI_DIAG: stay SILENT during streaming (no TX). Service one-shot input + // commands ('d' dump, 'z' zero) between transactions; reading input doesn't + // transmit, so no traffic is added to the measurement window. The idle + // timeout in custom_spi_read_blocking + the early-return above also service + // commands while the master is idle, so 'd'/'z' work between bursts too. + diag_service_commands(); #else // DEVEL serial heartbeat — NON-BLOCKING. // ----------------------------------------------------------- diff --git a/panel/src/panel_spi_custom.cpp b/panel/src/panel_spi_custom.cpp index ceee43c..92a9909 100644 --- a/panel/src/panel_spi_custom.cpp +++ b/panel/src/panel_spi_custom.cpp @@ -2,6 +2,9 @@ #include #include "constants.h" #include "panel_spi_custom.h" +#if SPI_DIAG +#include // time_us_32() for the idle timeout (SPI_DIAG only) +#endif // ---------------------------------------------------------------------------- // V1 CIPO confirmation buffer @@ -106,8 +109,24 @@ static int __not_in_flash_func(custom_spi_read_blocking)( size_t tx_index = 0; int num_rx = 0; - // Wait until spi is readable (first bit clocked in by master) + // Wait until spi is readable (first bit clocked in by master). +#if SPI_DIAG + // SPI_DIAG only: if the master is idle (CS high, no byte) for >50 ms, bail + // and return 0 so the caller can service host commands ('d'/'z') and report + // between bursts. Never fires during streaming — frames arrive every few ms + // << 50 ms — so streaming timing is identical to production. An XIP stall in + // time_us_32() here is harmless: nothing is being clocked while idle. + { + uint32_t t0 = time_us_32(); + while (!spi_is_readable(spi)) { + if (gpio_get(cs_pin) && (uint32_t)(time_us_32() - t0) > 50000u) { + return 0; + } + } + } +#else while(!spi_is_readable(spi)) {}; +#endif while (rx_remaining || tx_remaining) { if (tx_remaining && spi_is_writable(spi) && @@ -173,6 +192,12 @@ void panel_spi_read(Message &msg) { // Per spec: "After sending it successfully, the temporary buffer is deleted" // — but only if the master clocked at least 3 bytes (= a full CIPO slot). // Fragmented transactions (< 3 bytes) leave the buffer armed. + // + // NOTE (2026-05-30): this early clear here + the later arm in + // Messenger::update() means each valid frame does TWO SSE toggles. That is + // NOT redundant — collapsing to a single reload was A/B-tested and REGRESSED + // 15 MHz from 0% to 2.4% byte-drops. The per-frame double SSE toggle is + // load-bearing for PL022-slave RX reliability. Do not "optimize" it away. if (msg.num_bytes_ >= 3) { panel_spi_clear_confirmation(); } From 006dcba9024fd7508750c9cb8e56ebb05398979f Mon Sep 17 00:00:00 2001 From: Michael B Reiser Date: Sat, 30 May 2026 11:32:58 -0400 Subject: [PATCH 7/7] docs: add CLAUDE.md + SPI reliability bring-up summary - CLAUDE.md: repo orientation (dual-core, PL022 slave, two HW revs), build/flash including AUTONOMOUS BOOTSEL flashing via the 1200-baud-touch reset (no need to prompt the user to press BOOTSEL), the SPI_DIAG benchmark workflow, and do-not-break notes (load-bearing double toggle, RX-drain PE03 fix, 18 MHz reliable / 20 MHz cliff). - panel/bench/SPI-BRINGUP-SUMMARY.md: full SPI bring-up wrap-up -- merged PRs #2/#3, flicker/PE03 root-cause (last-byte drop) + fix, the clock/rate/rev reliability sweep, the rev-delta=0 finding (the >18 MHz cliff is PL022 sampling, not board SI/pin-routing), the abandoned turnaround optimization, and the gated future PL022+DMA -> PIO+DMA path. Co-Authored-By: Claude Opus 4.8 (1M context) --- CLAUDE.md | 76 +++++++++++++++++++ panel/bench/SPI-BRINGUP-SUMMARY.md | 118 +++++++++++++++++++++++++++++ 2 files changed, 194 insertions(+) create mode 100644 CLAUDE.md create mode 100644 panel/bench/SPI-BRINGUP-SUMMARY.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..7e63816 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,76 @@ +# CLAUDE.md — LED-Display_G6_Firmware_Panel + +Guidance for Claude Code working in this repo: RP2350 (Pico 2 / RP2354B) firmware for +the 20×20 panels of the G6 LED arena. + +## Architecture (orientation) + +Dual-core RP2354B. **Core 0** = `Messenger` — SPI ingest on the PL022 hardware SSP in +**slave** mode (polled, custom blocking read). **Core 1** = `Display` — PIO-driven BCM. +Lock-free SPSC queues (`queue_t`) between them; the display can never starve SPI of CPU. + +SPI wire format: **Mode 3** (CPOL=1, CPHA=1), MSB-first, 8-bit, CS-framed. +GS2 = 53-byte messages (cmd `0x10`), GS16 = 203-byte messages (cmd `0x30`). +CIPO is **shared** across panels on the arena bus → the panel drives a 3-byte +confirmation slot `{header, cmd, CRC-8}` only inside its own CS-active window. + +Two hardware revs via `-DPANEL_REV` (see `platformio.ini`): +- **v0.2.1** (`pico_v021`): SPI0 on GP32–35, PSRAM CS GP0. +- **v0.3.1** (`pico_v031`): SPI1 on GP40–43, PSRAM CS GP47. + +## Build & flash + +Run from `panel/`. `pio` is at `/opt/homebrew/bin/pio`. + +Environments: +- `pico_v021` / `pico_v031` — **production**. +- `pico_v021_spidiag` / `pico_v031_spidiag` — production + `-DSPI_DIAG=1`: silent in-RAM + reception counters + per-command histogram + ring of the last 32 failures (`got` vs + `expected`). Serial `z` = zero the window, `d` = dump. **Identical streaming timing to + production** (counters are cheap RAM increments; no serial output while streaming). +- `pico_v021_bcmtest` / `pico_v031_bcmtest` — display self-test, **NO SPI ingest. Never deploy.** + +Build: `pio run -e ` (e.g. `pio run -e pico_v031_spidiag`) + +### Flashing autonomously (BOOTSEL via 1200-baud touch) — no need to prompt the user + +The earlephilhower core reboots into BOOTSEL when its USB CDC port is opened at 1200 baud. +**You may flash without asking the user to press the BOOTSEL button.** Sequence: + +1. 1200-baud touch on the panel's CDC port (typically `/dev/cu.usbmodem1101`; the arena + master enumerates separately, e.g. `…121699401`): + ```python + import serial, time + s = serial.Serial("/dev/cu.usbmodem1101", 1200); s.setDTR(False); time.sleep(0.3); s.close() + ``` +2. Wait for `/Volumes/RP2350` to mount (poll up to ~10 s). +3. `cp -X panel/.pio/build//firmware.uf2 /Volumes/RP2350/` +4. Wait for the CDC port to re-enumerate (~a few seconds). + +Use `/usr/bin/python3` (has pyserial). If `/Volumes/RP2350` is already present the board is +already in BOOTSEL — skip step 1. After flashing, verify with an idle `d` dump: under +`SPI_DIAG` the read has a 50 ms idle timeout, so `d`/`z` are answered even with the master +idle (the dump banner reports `PANEL_REV=21` or `=31` — use it to confirm the right build). + +## SPI reliability benchmark (audio-cued contained burst) + +The arena master (separate repo, `LED-Display_G6_Firmware_Arena`) has runtime SPI-clock +control + a free-running frames-sent counter. To measure panel-received vs master-sent: +`z` the panel window, have the operator clear the master counter and stream a fixed burst, +then `d`. `missed = sent − received`; `reject_any` = within-frame corruption. Synchronise +start/stop with audio cues (`afplay /System/Library/Sounds/{Glass,Basso}.aiff` + `say`) so +the panel's z..d window brackets the master's burst. ~30 s windows are plenty. + +## Gotchas / do-not-break + +- **The per-valid-frame DOUBLE SSE toggle is load-bearing.** `panel_spi_read()` clears the + confirmation (toggle #1) and `Messenger::update()` arms it (toggle #2). Collapsing to a + single reload was A/B-tested and **regressed 15 MHz from 0% to 2.4% byte-drops** — the + extra SSP disable/enable keeps the marginal PL022 slave RX aligned. See the warning in + `panel_spi_custom.cpp` `panel_spi_read()`. Do not "optimize" it away. +- **PE03 last-byte-drop** is fixed by draining the RX FIFO after CS-high (`RX_DRAIN_SETTLE` + in `custom_spi_read_blocking`). Sub-`MESSAGE_MINIMUM_SIZE` runts are ignored (no glyph). +- **Reliability ceiling:** clean to **18 MHz** on **both** revs; sharp corruption cliff at + **20 MHz** (~6.4%, `got=exp−1`, parity rides along) — inherent PL022 slave sampling, + **rev-independent** (not pin-routing SI). Deployed clock is ~5 MHz (huge margin). Past + ~18 MHz needs PL022+DMA, then a PIO+DMA SPI slave (see the plan in the SPI bench docs). diff --git a/panel/bench/SPI-BRINGUP-SUMMARY.md b/panel/bench/SPI-BRINGUP-SUMMARY.md new file mode 100644 index 0000000..e268ce4 --- /dev/null +++ b/panel/bench/SPI-BRINGUP-SUMMARY.md @@ -0,0 +1,118 @@ +# G6 panel SPI bring-up — investigation summary + +**Date:** 2026-05-30 · **Hardware:** arena master (Teensy 4.1, G6-ArenaSlim) + single G6 +panel · **Panel firmware:** branch `spi-bringup-step0` (PR #5) · **Author:** bring-up with +Frank Loesche (`floesche`). + +This is the wrap-up of the first real-hardware SPI bring-up of the G6 arena: two merged +PRs, a root-caused-and-fixed flicker bug, a full clock/rate/rev reliability sweep, an +abandoned "optimization," and the resulting decision on the future high-speed path. + +--- + +## 1. What was wrong, and what we shipped + +| # | Problem | Fix | Status | +|---|---|---|---| +| PR #2 | Parity used `std::bitset` = a **1-bit** set → counted only the LSB | `std::bitset` (full 8-bit popcount; spec-correct) | merged | +| PR #3 | `PE03`/`PE04` at ≤5 MHz: slave TX FIFO assumed empty between transactions | SSE toggle clears both FIFOs before re-loading the CIPO confirmation | merged | +| Flicker / PE03 | Display dimming = panel **rejecting** frames | see §2 | fixed (PR #5) | +| Core-0 stall | Blocking 11-line serial heartbeat every 1000 msgs could stall core 0 between transactions → missed leading bytes | non-blocking heartbeat (`availableForWrite` guard) | merged (`171d614`) | + +## 2. Flicker / PE03 — root cause and fix + +The flicker was the panel **rejecting** otherwise-good frames. `SPI_DIAG` (silent in-RAM +counters + a ring of the last 32 failures with `got` vs `expected`) showed: **parity never +failed alone — only length**, almost always `got = expected − 1` (one byte short). + +**Root cause = last-byte drop.** `custom_spi_read_blocking()` broke out of its loop on +`gpio_get(cs_pin)` reading high *before* the final byte had cleared the RP2350 input +synchronizer (~4 sysclk) + the SSP RX pipeline → returned one byte short → `check_length()` +failed → PE03 → frame dropped → visible dimming. **Not** signal integrity. + +**Fix** (`c35b0cc`, `panel_spi_custom.cpp`): after CS goes high, **drain the RX FIFO** for a +straggler byte (bounded by `RX_DRAIN_SETTLE` empty polls so it can never hang; CS is high so +no new bytes can arrive). Plus: **ignore sub-`MESSAGE_MINIMUM_SIZE` runts** (a 1–2-byte CS +glitch shouldn't flash a 3 s error glyph). Validated: GS2/200 Hz 10/24130 → **0/24101**, +GS16 **0/24117**, **0 rejects across 229,726 messages**. + +## 3. Reliability sweep (the headline result) + +Method: panel `SPI_DIAG` build + arena master with runtime SPI-clock control and a +free-running frames-sent counter. Audio-cued contained burst — `z` the panel, operator +clears the master counter + streams a fixed ~30 s burst, then `d`; `missed = sent − received`, +`corrupted = rejected`. GS16 (203 B/frame). + +| SPI clock | rate | v0.3.1 received / corrupted | v0.2.1 received / corrupted | +|---:|---:|:---|:---| +| 5 MHz | 200 Hz | — | 6064 / **0** | +| 10 MHz | 200 Hz | 3261 / **0** | — | +| 15 MHz | 200 Hz | 3186 / **0** | 6724 / **0** | +| 15 MHz | 500 fps | 14713 / **0** | 15753 / **0** | +| 18 MHz | 200 Hz | 3073 / **0** | 6908 / **0** | +| 18 MHz | 500 fps | 29144 / 858 (**2.9 %**) | — | +| 18 MHz | 1 kHz | 52622 / 5144 (**9.7 %**) | — | +| 20 MHz | 200 Hz | 3884 / 250 (**6.4 %**) | 6064 / 388 (**6.40 %**) | + +**`missed = 0` in every run** (`sent == received`) — whole-frame *delivery* is robust even at +the cliff. The failures are within-frame byte corruption (`got = exp − 1`, sometimes −2/−3). +Parity fails are a ~30–34 % *subset* of length fails, never parity-only (a short frame +mismatches the header popcount about a third of the time). + +### Findings + +1. **Reliable through 18 MHz on both revs; sharp cliff at 20 MHz (~6.4 %).** Deployed ~5 MHz + has ~3–4× margin and is rock-solid. +2. **The cliff is rev-independent → PL022, not board SI.** v0.2.1 (SPI0 GP32–35, interleaved + between the row-driver halves — predicted *worse* by the crosstalk hypothesis) corrupts at + the **same** clock, rate, and signature as v0.3.1 (20 MHz → 6.4 %, `got=202`). The + pin-routing/crosstalk hypothesis is **falsified**; the ceiling is inherent **PL022 + SPI-slave sampling marginality** (input-synchronizer / shifter). The earlier impression + that "v0.3.1 is more stable" was the last-byte-drop bug (now fixed), not a real rev delta. +3. **Cadence matters only near the cliff.** At 18 MHz, rate walks corruption up + (0 % → 2.9 % → 9.7 % at 200 Hz / 500 fps / 1 kHz). At 15 MHz, **500 fps is still 0 %** — + and notably the per-frame transaction is *longer* at 15 MHz than 18 MHz, so it leaves + *less* inter-transaction slack yet stays clean. ⇒ the limit is per-byte sampling, **not** + core-0 turnaround. +4. **Brightness/duty-cycle was not swept** — the only remaining open signal-integrity + question, now low priority (both revs clean to 18 MHz at these patterns). + +## 4. Abandoned: the "turnaround optimization" + +Hypothesis: each valid frame does **two** TX-FIFO reloads — `panel_spi_read()` clears to the +sentinel (SSE toggle #1), then `Messenger::update()` arms the real confirmation a few µs +later (toggle #2) — and the intermediate clear looked like wasted work. Tried collapsing to a +single late arm-or-clear. + +**A/B at 15 MHz / 500 fps, same warm bench, back-to-back: original 2-reload = 0/14713; +optimized 1-reload = 391/16086 (2.4 %).** Reverted, re-tested original → 0/14713 again +(thermal ruled out). **The double SSE toggle is load-bearing** — the extra SSP disable/enable +per frame keeps the marginal PL022 slave RX aligned transaction-to-transaction. Abandoned; +warning comment left in `panel_spi_custom.cpp` so it is not re-attempted. + +## 5. Future high-speed path (gated, not started) + +The 25–30 MHz spec aspiration is beyond the PL022 slave cliff. Order of attack (cheapest +first), per the plan and the Codex review: + +1. **(optional) Brightness-vs-error bench** with a rail/ground probe — close the last SI + question. Low priority. +2. **PL022 + DMA spike** — drive RX (and TX filler) via DMA + CS IRQ instead of the polled + loop; may fix the per-byte latency without a from-scratch slave. +3. **PIO + DMA SPI slave** — only if 1–2 prove the PL022 shifter is the wall. PIO-SPI ceiling + ≈ 25 MHz; levers: `INPUT_SYNC_BYPASS`, clock-recovery via a 2nd PIO, overclock sysclk. + Must double-buffer RX, keep confirmation-arming in `Messenger` after CRC, tri-state CIPO + when CS is high (shared bus), and serve both revs (relative-pin-compatible, GPIO base 16). + +## 6. Artifacts & references + +- **Panel firmware:** branch `spi-bringup-step0` (PR #5). Key files: `panel_spi_custom.cpp` + (RX drain, double-toggle note, `SPI_DIAG` idle timeout), `messenger.cpp` (`SPI_DIAG` + counters + runt-ignore), `platformio.ini` (`pico_vXX_spidiag` envs). +- **Controller firmware** (separate repo): `LED-Display_G6_Firmware_Arena` branch + `runtime-spi-clock-and-frame-counter` — runtime SPI clock + frames-sent counter. +- **Controller-side ceilings + this sweep:** + `Modular-LED-Display/docs/development/g6_performance-benchmarks.md`. +- **Bench protocol:** `panel/bench/handoff-spi-highspeed-bench.md`. +- **Open issue:** startup first-message PE03 (one-time at master init; separate from + steady-state; GitHub issue #6).