Feat/ota web #17

Merged
agessaman merged 6 commits into mqtt-bridge-implementation-flex from feat/ota-web
Jun 23, 2026

@agessaman

Summary

Adds a pull-based OTA path for observer (MQTT-bridge) builds: a node fetches its own
firmware from the web-flasher manifest and flashes itself over its existing WiFi
connection — no cable, no manual upload, triggerable remotely over the mesh.

Kept separate from start ota (the ElegantOTA web-upload SoftAP), which is unchanged,
so nobody expecting to hand-upload a binary triggers a silent online update.

Commands (observer builds only)

| Command | Behavior |
| --- | --- |
| ota check | Fetch manifest, report current -> available hash + partition-change status. No flash. Synchronous. |
| ota update | Replies Beginning update... immediately, then flashes and reboots. |
| start ota | Unchanged: raises the ElegantOTA SoftAP for a manual web upload. |

On non-observer builds ota check/ota update return ERR: online OTA not supported on this build.
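
The build gating can be sketched as a compile-time switch. This is a minimal illustration, assuming the observer feature flag is the WITH_MQTT_BRIDGE define mentioned under Build config; the function name `otaCommandReply` and the placeholder reply for ota check are hypothetical and not taken from the PR's code.

```cpp
#include <string>

// Hypothetical dispatcher: returns the reply text for an online-OTA command.
// Assumption: WITH_MQTT_BRIDGE is defined only for observer builds.
std::string otaCommandReply(const std::string& cmd) {
  if (cmd != "ota check" && cmd != "ota update") {
    return "";  // not an online-OTA command; other handlers (e.g. start ota) take it
  }
#ifdef WITH_MQTT_BRIDGE
  if (cmd == "ota update") {
    return "Beginning update...";  // ack goes out first; the flash itself is deferred
  }
  return "(result of manifest check)";  // placeholder: real text comes from the fetch
#else
  return "ERR: online OTA not supported on this build";
#endif
}
```

Compiling the handler into every build, rather than omitting the commands entirely, is what lets a non-observer node answer with an explicit error instead of an unknown-command response.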

How it works

The observer already has everything needed — a live WiFi STA link (for MQTT), the embedded
root-CA bundle, dual OTA partitions, and ArduinoJson — so no new dependencies.

  1. Fetches the manifest (OTA_MANIFEST_URL, baked in at compile time) over TLS verified
    against the embedded CA bundle.
  2. Finds the flash-update (app-only) .bin whose name matches this build's variant
    (OTA_VARIANT, the PlatformIO env name injected by build.sh).
  3. Compares the manifest's build hash against the running firmware's (embedded in
    FIRMWARE_VERSION), by shared prefix so a 7-char CI hash and an 8-char local hash for
    the same commit aren't seen as different. Reports / skips if already up to date.
  4. Streams the binary into the inactive OTA slot via HTTPUpdate and reboots.
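
The shared-prefix comparison in step 3 can be sketched as follows. This is an illustration, not the PR's implementation: the name `sameBuildHash` and the minimum length of 7 (chosen to match the 7-char CI hash mentioned above) are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// True when the shorter hash is a prefix of the longer one.
// minLen stops very short or empty strings from matching everything;
// the default of 7 is an assumption, not a value from the PR.
bool sameBuildHash(const std::string& a, const std::string& b,
                   std::size_t minLen = 7) {
  const std::size_t n = std::min(a.size(), b.size());
  if (n < minLen) return false;
  return a.compare(0, n, b, 0, n) == 0;
}
```

With this rule a 7-char hash and an 8-char hash of the same commit compare equal, while two unrelated hashes of different lengths still compare as different.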

No operator-supplied URLs — host and target are resolved entirely from baked-in config +
the manifest.

Robustness (the non-obvious parts, all hardware-driven)

Getting this reliable on a no-PSRAM ESP32-S3 (Heltec V3) surfaced several issues worth
flagging for review:

  • Stack: the TLS + JSON / HTTPUpdate work overflows the ~8 KB Arduino loop-task stack
    when reached via the deep mesh-receive call chain (canary panic). It now runs in a
    dedicated 24 KB-stack FreeRTOS task, spawned per-operation and freed after.
  • Cloudflare transfer encoding: the manifest host is fronted by Cloudflare, which answers
    HTTP/1.1 requests with Transfer-Encoding: chunked and no Content-Length; feeding the raw
    chunked stream to the parser corrupts the JSON. Fixed with http.useHTTP10(true), which
    yields an unframed, Connection: close body that is stream-parsed with an ArduinoJson
    filter so peak RAM is the kept subset (~12 KB), not the ~40 KB manifest.
  • Heap: standing up a third TLS connection (the fetch) alongside the two live MQTT TLS
    sessions drives free heap to a few hundred bytes and truncates the read. Both ota check
    and ota update stop the MQTT bridge first (its TLS contexts + task) for headroom; the
    WiFi STA link survives end().
  • Deferred update for the ack: the flash blocks the loop and then reboots, so a reply
    can't go out inline. ota update schedules the flash ~2.5 s out (via the app loop) so the
    Beginning update... confirmation transmits over LoRa first.
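
The deferred-update pattern in the last bullet can be sketched as a one-shot timer polled from the app loop. The class name `DeferredAction` and its interface are hypothetical; on the device the timestamps would come from a millisecond clock such as millis(), which is an assumption about how the PR schedules the flash.

```cpp
#include <cstdint>

// Hypothetical one-shot timer polled from the main loop.
class DeferredAction {
 public:
  // Arm the action to become due delayMs after nowMs.
  void schedule(uint32_t nowMs, uint32_t delayMs) {
    dueAt_ = nowMs + delayMs;  // unsigned wraparound is intentional
    armed_ = true;
  }

  // Returns true exactly once, on the first poll at or after the due time.
  bool poll(uint32_t nowMs) {
    if (!armed_) return false;
    // The signed difference stays correct across the 32-bit clock rollover.
    if (static_cast<int32_t>(nowMs - dueAt_) < 0) return false;
    armed_ = false;
    return true;
  }

 private:
  uint32_t dueAt_ = 0;
  bool armed_ = false;
};
```

In this sketch the command handler would call `schedule(now, 2500)` and return the ack immediately; the loop then calls `poll(now)` on each pass and starts the flash when it returns true.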

Bridge restart fixes (in MQTTBridge)

Stopping/restarting the bridge for OTA exposed two pre-existing bugs in
initializeWiFiInTask(), now fixed (these also benefit set mqtt… / restartBridge):

  • Skip WiFi.begin() when already connected. end() leaves WiFi up, so the restart was
    forcing a needless disconnect/reconnect that also raced the MQTT task's first DNS lookup
    (getaddrinfo() returns 202 / esp-tls 0x8001). Slot setup still fires because
    _ntp_synced persists across end().
  • Register the WiFi.onEvent handler once. It was re-registered on every restart and
    never removed — a handler leak that duplicated every connect/disconnect log line.

Result: after an ota check, the bridge comes back with only the MQTT sessions reconnecting
— no WiFi flap, no NTP re-sync, no DNS errors, single log lines.
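
Both fixes can be illustrated off-target with a stand-in for the WiFi driver. Everything here (`FakeWiFi`, `BridgeSketch`, the member names) is hypothetical scaffolding that mirrors the described behavior; it is not the MQTTBridge code.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in for the WiFi driver so the sketch runs without hardware.
struct FakeWiFi {
  bool connected = false;
  int beginCalls = 0;
  std::vector<std::function<void(int)>> handlers;

  void begin() { ++beginCalls; connected = true; }
  void onEvent(std::function<void(int)> h) { handlers.push_back(std::move(h)); }
};

class BridgeSketch {
 public:
  explicit BridgeSketch(FakeWiFi& wifi) : wifi_(wifi) {}

  void start() {
    if (!handlerRegistered_) {  // register the event handler only once
      wifi_.onEvent([](int) {});
      handlerRegistered_ = true;
    }
    if (!wifi_.connected) {     // skip begin() when the link is already up
      wifi_.begin();
    }
  }

  // Tears down the MQTT side only; the WiFi link is left connected.
  void end() {}

 private:
  FakeWiFi& wifi_;
  bool handlerRegistered_ = false;
};
```

After a start, end, start sequence the fake driver records a single begin() call and a single registered handler, which corresponds to the "no WiFi flap, single log lines" result.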

Safety

  • ota update is admin + ACL gated in the mesh receive path.
  • TLS verified against the embedded CA bundle (no setInsecure()) for both manifest and
    binary.
  • Only the app-only flash-update artifact is fetched — never -merged.bin.
  • Releases flagged partition-change in the manifest are refused (OTA can't rewrite the
    partition table; those still need a cable/erase flash).
  • Rollback-safe: HTTPUpdate/Update writes only the inactive OTA slot and commits the
    boot pointer (otadata) only after a complete, validated write (size + MD5 + image-header
    check). A failed/truncated/wrong-chip download is rejected without rebooting and the bridge
    is resumed, so the node keeps booting the working partition. (ESP-IDF post-boot
    auto-rollback is not enabled — a build that passes validation but is functionally broken
    would not auto-revert; noted as future hardening.)
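
The artifact and partition-change checks above can be sketched as a single pre-flash decision. The manifest schema is not shown in this PR, so `ManifestEntry` is a hypothetical flattened view, and the rule that the file name contains the variant is an assumption about the naming convention.

```cpp
#include <string>

// Hypothetical flattened view of one manifest entry (real schema not shown here).
struct ManifestEntry {
  std::string fileName;
  bool partitionChange;
};

enum class OtaDecision { Flash, WrongArtifact, NeedsCableFlash };

static bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

OtaDecision evaluateArtifact(const ManifestEntry& e, const std::string& variant) {
  // Merged (full-flash) images are never valid OTA payloads.
  if (!endsWith(e.fileName, ".bin") || endsWith(e.fileName, "-merged.bin")) {
    return OtaDecision::WrongArtifact;
  }
  // Assumption: the artifact name contains the PlatformIO env name.
  if (variant.empty() || e.fileName.find(variant) == std::string::npos) {
    return OtaDecision::WrongArtifact;
  }
  // OTA cannot rewrite the partition table, so such releases are refused.
  if (e.partitionChange) return OtaDecision::NeedsCableFlash;
  return OtaDecision::Flash;
}
```

Running these checks before any download starts means a refused release costs only the manifest fetch, and the node never writes to the inactive slot.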

Also in this PR

  • start ota station IP: on a WiFi-connected device, ElegantOTA is now served on the
    station IP (reachable from the LAN) instead of always raising the SoftAP and reporting
    192.168.4.1.

Build config

  • build.sh injects -DOTA_VARIANT='"<env>"' alongside FIRMWARE_VERSION.
  • -D OTA_MANIFEST_URL='"https://observer.gessaman.com/config.json"' added to all 28
    observer envs (1:1 with WITH_MQTT_BRIDGE).
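
On the firmware side, the two injected defines arrive as string literals (the nested quoting in the flags is what makes each macro expand to a quoted string). A minimal sketch of consuming them, assuming empty-string fallbacks that the PR does not necessarily use; `onlineOtaConfigured` is a hypothetical helper:

```cpp
#include <cstring>

// Fallbacks so the file still compiles when the values were not injected.
// These defaults are an assumption for illustration only.
#ifndef OTA_VARIANT
#define OTA_VARIANT ""
#endif
#ifndef OTA_MANIFEST_URL
#define OTA_MANIFEST_URL ""
#endif

// Online OTA is usable only when both values were baked in at compile time.
inline bool onlineOtaConfigured() {
  return std::strlen(OTA_VARIANT) > 0 && std::strlen(OTA_MANIFEST_URL) > 0;
}
```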

Testing

Verified on hardware (Heltec V3 observer, no PSRAM):

  • ota check → update available: dec66838 -> 454afec, reliably, with Min free heap
    ~63 KB during the fetch and a clean bridge restart afterward (no WiFi flap / DNS errors).
  • ota update → Beginning update... ack received over LoRa, then download → flash →
    reboot onto the new build.
  • start ota → ElegantOTA SoftAP, now reporting the correct station IP.
  • Builds confirmed for heltec_v4 and Heltec_v3 observer envs; partition dump confirms
    dual OTA slots.

Not yet exercised: no-op "already up to date" path, partition-change refusal, and a
download-failure (rollback) path.

Files changed

OTA feature: build.sh, src/MeshCore.h, src/helpers/ESP32Board.{cpp,h},
src/helpers/CommonCLI.{cpp,h}, examples/simple_repeater/MyMesh.{cpp,h}, and
OTA_MANIFEST_URL across 12 observer variant platformio.ini files.

Bridge restart fixes: src/helpers/bridges/MQTTBridge.{cpp,h} — slightly outside the OTA
feature but required to make the bridge bounce clean; they improve every restart path.

Added functionality to support pull-based OTA updates by fetching firmware
from a manifest. The new `otaFromManifest` method allows the system to
check for available updates and flash the firmware if necessary. This
enhancement improves the update process for observer builds using the
MQTT bridge, ensuring a more seamless firmware management experience.

Updated the otaFromManifest method to enforce HTTP/1.0 for better
compatibility with CDNs and to handle empty manifest responses. This
ensures that the JSON parser receives a complete body, preventing
errors during firmware update checks.

Updated the otaFromManifest method to stream-parse the firmware manifest
directly from the network, reducing peak RAM usage during OTA checks. This
change also adds a per-read timeout to improve compatibility with slow TLS
links, making the update process more efficient and reliable.

Updated the startOTAUpdate method to serve ElegantOTA on the
station IP when connected to a WiFi network, enhancing accessibility
for OTA updates. If not connected, it defaults to the MeshCore-OTA
SoftAP. This change improves the user experience by allowing easier
access to OTA updates without needing to switch networks.

Added support for deferred OTA updates in the MyMesh class, allowing
the system to schedule firmware updates to occur after a confirmation
reply is sent. This change improves the user experience by ensuring
that the confirmation is transmitted before the flash blocks the main
application loop and reboots the node.
@agessaman agessaman merged commit a06a14d into mqtt-bridge-implementation-flex Jun 23, 2026
1 check passed