Skip to content

Use monotonic clock for total_runtime to prevent negative view_runtime#97

Open
krzysztofjablonski wants to merge 2 commits into
aserafin:masterfrom
krzysztofjablonski:dev-10665-monotonic-clock
Open

Use monotonic clock for total_runtime to prevent negative view_runtime#97
krzysztofjablonski wants to merge 2 commits into
aserafin:masterfrom
krzysztofjablonski:dev-10665-monotonic-clock

Conversation

@krzysztofjablonski
Copy link
Copy Markdown

@krzysztofjablonski krzysztofjablonski commented May 12, 2026

Problem

In production we are seeing grape_logging entries where view_runtime is negative and db_runtime exceeds total_runtime:

{
  "data": {
    "db": 14951.37,
    "view": -624.44,
    "duration": 14326.93
  }
}

It happens sporadically, but at API scale it consistently pollutes our metrics.

Diagnosis

db_runtime is collected from ActiveSupport::Notifications::Event#duration, which since Rails 7+ is measured with the monotonic clock (Process.clock_gettime(Process::CLOCK_MONOTONIC)).

total_runtime was measured with Time.now, i.e. the wall clock. Any wall-clock correction (NTP step, VM suspend/resume, drift between host and container) makes the Time.now - Time.now delta smaller than the request's actual elapsed time measured monotonically. If SQL queries happened to fire in between, db_runtime (monotonic) can be larger than total_runtime (wall-clock), and view = total - db ends up negative.

For the record - this is not the classic shared-instance-variable race discussed in #78. That thread was resolved by @samsonjs: Grape::Middleware::Base#call does dup.call!, so @start_time / @stop_time / @env are per-request. The issue here is mixing two different clocks inside a single log entry.

Fix

start_time and stop_time now use Process.clock_gettime(Process::CLOCK_MONOTONIC). Both readings are on the same scale as event.duration, so total_runtime >= db_runtime always holds, regardless of wall-clock corrections.

The change is minimal and does not alter the public API - parameters[:time] still exposes the same keys and units (milliseconds).

Regression test

The new spec stubs Time.now to return values that move backwards in time between the request's start and end. On the old code total_runtime is negative and the test fails. After the fix the monotonic clock no longer depends on Time.now, so total_runtime is positive.

Test plan

  • bundle exec rspec - full suite green (36 examples)
  • bundle exec rubocop - only 2 preexisting offenses in lib/grape_logging/multi_io.rb, unrelated to this change
  • The new spec fails on master and passes after the fix

🤖 Generated with Claude Code

`ActiveSupport::Notifications::Event#duration` (used to compute
`db_runtime`) is measured with `Process.clock_gettime(CLOCK_MONOTONIC)`,
while `total_runtime` was measured with `Time.now`. Wall-clock skew
(e.g. NTP corrections) could make the wall-clock delta smaller than the
monotonic database time, producing a negative `view_runtime` in the
logs.

Switching `start_time`/`stop_time` to the monotonic clock keeps both
quantities on the same scale and eliminates the negative timings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@krzysztofjablonski krzysztofjablonski marked this pull request as ready for review May 12, 2026 07:39
The previous spec stubbed `db_runtime` to 0.0 and then asserted
`view >= 0`, which collapsed to checking `total >= 0` twice. Replace
with two focused specs:

- A wall-clock skew scenario that asserts `total_runtime` stays
  non-negative when `Time.now` jumps backwards.
- A controlled-monotonic-clock scenario that stubs
  `Process.clock_gettime` and asserts that `view_runtime` is
  `total_runtime - db_runtime` with a non-zero db value.

Both fail on the pre-fix `Time.now` implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant