[WIP] Fix division-by-zero error in power management#17
Conversation
Co-authored-by: dfeen87 <158860247+dfeen87@users.noreply.github.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
This PR is a safety/robustness hardening sweep across HLV-RAPS, primarily addressing division-by-zero/NaN propagation risks and tightening a few operational guardrails (REST API sizing, production build crypto stubs, rollback store bounds).
Changes:
- Add zero-dt/zero-count guards in power management, PID controller, and PDT Monte Carlo path to prevent division-by-zero and unsafe NaN/Inf propagation.
- Add runtime sanitization of APCU spacetime state (NaN/Inf detection) plus improved rollback store eviction and telemetry truncation handling.
- Add operational/build-time guardrails: REST API request size cap + health version reporting; prevent stub crypto / reference integrator inclusion in production builds.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/telemetry_report.cpp | Detect and flush truncated JSONL lines while scanning telemetry. |
| src/safety/rollback_store.hpp | Evict oldest rollback entry instead of resetting the counter. |
| src/safety/apcu_state_management_and_safety.hpp | Add NaN/Inf state sanitization and emergency-mode trigger. |
| src/platform/platform_hal.cpp | Compile-time error if stub crypto is used in production builds. |
| src/hlv/power_and_resource_management.hpp | Guard division-by-zero on elapsed_ms when computing slew-based power draw. |
| src/hlv/field_coupling_stress_model.hpp | Clamp exponent argument to prevent exp() overflow to +inf. |
| src/hlv/artificial_gravity_control.hpp | Avoid implicit narrowing by casting elapsed_ms before float math. |
| src/control/pid_controller.hpp | Change PID elapsed_ms type to uint64_t and handle zero-dt safely. |
| src/api/rest_api_server.cpp | Add request size limit and expose RAPSVersion in /health. |
| reference/hlv_reference_integrator.hpp | Mark reference integrator as non-production via compile guard. |
| include/safety/state_sanitizer.hpp | New helper to validate key state floats are finite. |
| include/raps/pdt/hlv_pdt_engine.hpp | Early return INVALID when monte_carlo_runs == 0. |
| include/raps/hlv/hlv_field_dynamics.hpp | Remove duplicated constants, rely on shared constants header. |
| include/raps/core/raps_core_types.hpp | Add RAPSVersion constants. |
| include/apcu/advanced_propulsion_control_unit.hpp | Fix include paths to modular headers. |
| VERSION | Add repository version marker (2.4.0). |
| CHANGELOG.md | Add changelog entry for 2.4.0 changes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| int c; | ||
| while ((c = std::fgetc(f)) != '\n' && c != EOF) {} | ||
| std::fprintf(stderr, "warning: truncated line at record %llu\n", | ||
| (unsigned long long)total); |
There was a problem hiding this comment.
The truncated-line handling flushes the remainder of the long line, but then still increments counters and scans buf (which contains only a partial JSON record). This can skew severity/type counts and dropped_total detection. After flushing, skip processing the truncated record (e.g., continue;), and consider whether the warning’s record number should be 1-based (total + 1) since total is incremented later.
| (unsigned long long)total); | |
| (unsigned long long)(total + 1)); | |
| // Count this line but skip processing the partial JSON record | |
| ++total; | |
| continue; |
| uint64_t elapsed_ms) { | ||
|
|
||
| // Integral term with anti-windup | ||
| float dt_s = 0.0f; | ||
| if (elapsed_ms > 0.0f) { | ||
| dt_s = elapsed_ms / 1000.0f; | ||
| if (elapsed_ms == 0) { | ||
| // Zero-time tick: return proportional-only to avoid division by zero | ||
| return kp * error; | ||
| } | ||
| float dt_s = static_cast<float>(elapsed_ms) / 1000.0f; |
There was a problem hiding this comment.
compute_pid_output now takes a uint64_t elapsed_ms, but this header doesn’t include <cstdint>. Please make the header self-contained by including <cstdint> (or another header that guarantees uint64_t is available) here.
| // Guard against zero-time delta to prevent division by zero (startup/rollover) | ||
| float dt_ms = static_cast<float>(std::max(elapsed_ms, uint64_t{1})); | ||
| state.power_draw_GW = compute_power_draw( | ||
| warp_change_request / static_cast<float>(elapsed_ms), | ||
| flux_change_request / static_cast<float>(elapsed_ms) | ||
| warp_change_request / dt_ms, | ||
| flux_change_request / dt_ms |
There was a problem hiding this comment.
This header now uses uint64_t{1} in the new zero-dt guard, but only includes <algorithm>. To avoid relying on transitive includes, add <cstdint> so uint64_t is always defined when this header is included on its own.
| // Reject requests that exceed the size limit | ||
| if (static_cast<size_t>(bytes_read) >= MAX_REQUEST_SIZE) { | ||
| std::string rejection = json_error(413, "Request Too Large"); | ||
| ::send(client_sock, rejection.c_str(), rejection.length(), 0); | ||
| return; | ||
| } |
There was a problem hiding this comment.
The size-limit rejection has two concrete problems: (1) bytes_read can’t be > MAX_REQUEST_SIZE because recv is capped at MAX_REQUEST_SIZE, so >= MAX_REQUEST_SIZE will also reject requests that are exactly 8192 bytes (within the stated limit). (2) Calling json_error(413, ...) won’t actually send a 413 status line because json_response() maps any non-200/non-404 status to the HTTP_500_HEADER. Consider changing the check to detect truncation/overflow reliably (e.g., loop until end-of-headers or limit exceeded, or peek for more data), and update json_response() to generate correct status lines for arbitrary codes (including 400/405/413).
| ════════════════════════ | ||
| // ⚠️ REFERENCE ONLY — This file is a standalone amalgamation for review/documentation. | ||
| // DO NOT include in production builds. Use the modular headers under include/. | ||
| #ifdef RAPS_PRODUCTION_BUILD |
There was a problem hiding this comment.
The production-build guard here uses #ifdef RAPS_PRODUCTION_BUILD, which will fire even if the macro is defined as 0. Elsewhere in this PR (e.g., src/platform/platform_hal.cpp) production gating is done with #if defined(RAPS_PRODUCTION_BUILD) && RAPS_PRODUCTION_BUILD == 1. To keep semantics consistent and avoid accidental build breaks when the macro is defined-but-false, consider using the same defined(...) && ...==1 pattern here.
| #ifdef RAPS_PRODUCTION_BUILD | |
| #if defined(RAPS_PRODUCTION_BUILD) && RAPS_PRODUCTION_BUILD == 1 |
Original prompt
Summary
This PR addresses critical safety, robustness, and architectural issues identified during a comprehensive security review of the HLV-RAPS codebase.
1. Security Fixes
1.1 — Division-by-Zero in Power & Resource Management (CRITICAL)
File:
src/hlv/power_and_resource_management.hpp(lines 17-19)state.power_draw_GW = compute_power_draw( warp_change_request / static_cast<float>(elapsed_ms), flux_change_request / static_cast<float>(elapsed_ms) );When
elapsed_ms == 0(startup, timer rollover, or clock anomaly), this produces a floating-point division by zero →inforNaNpropagation through the entire power subsystem.Fix: Guard against zero-time deltas:
1.2 — PID Controller Silent Failure on Zero dt (HIGH)
File:
src/control/pid_controller.hpp(lines 14-22)elapsed_msis typedfloatin the function signature but callers passuint64_t. Whenelapsed_ms == 0, the derivative term is silently zeroed and the integral is frozen—but no diagnostic is emitted. In a safety-critical PID loop, this masks timing failures.Fix: Change the parameter type to
uint64_t, add a proportional-only fallback for zero dt, and add a comment:1.3 — Monte Carlo Division by Zero in PDT Engine (HIGH)
File:
include/raps/pdt/hlv_pdt_engine.hpp(lines 34-63)If
monte_carlo_runs == 0is passed, multiple divisions by zero occur. No input validation exists.Fix: Add an early return with an INVALID result at the top of the
predict()method:1.4 — Exponential Blowup in Field Coupling Stress (HIGH)
File:
src/hlv/field_coupling_stress_model.hpp(lines 28-29)With extreme inputs, the exponent argument can be enormous →
exp()returns+inf. This propagates into stability checks and emergency mode decisions.Fix: Clamp the exponent argument before calling
std::exp():1.5 — REST API: Add Request Size Limit (MEDIUM)
File:
src/api/rest_api_server.cppThe REST API uses raw POSIX sockets with no visible request size limit. Add a
constexpr size_t MAX_REQUEST_SIZE = 8192;and reject oversized requests. Also add a read timeout to prevent slowloris-style stalls. Document theAccess-Control-Allow-Origin: *CORS header as a conscious risk acceptance in a code comment.1.6 — Gate Stub Cryptography from Production Builds (INFORMATIONAL)
File:
src/platform/platform_hal.cppSHA-256 and Ed25519 are stubs. Add a compile-time guard:
2. Robustness & Resilience
2.1 — Add NaN/Inf Sanitization for State Transitions
Create a new header
include/safety/state_sanitizer.hpp: