Skip to content

Ensure tasks ID seed is always greater than existing storage files#255

Open
Abishekcs wants to merge 1 commit intorage-rb:mainfrom
Abishekcs:fix/file-storage-timestamp-clock-skew
Open

Ensure tasks ID seed is always greater than existing storage files#255
Abishekcs wants to merge 1 commit intorage-rb:mainfrom
Abishekcs:fix/file-storage-timestamp-clock-skew

Conversation

@Abishekcs
Copy link
Copy Markdown

What this PR does

Ensure task IDs remain monotonic across restarts by seeding task_id_seed from the maximum timestamp found in existing storage files.

  • Scans current and recovered storage files to find the highest task timestamp
  • Initializes task_id_seed as:
[Time.now.to_i, max_timestamp].max + 1
  • Prevents new task IDs from being lower than previously written ones (handles clock skew / restarts)
  • Ensures recovered storage files are rewound before processing

Why

Using only Time.now can generate smaller timestamps than those already present in Storage files, leading to ordering issues. This change guarantees monotonic task ID generation.

References

Screenrecording:

I played around with a script to reproduce the lower timestamp issue: https://github.com/Abishekcs/my-rage-app/blob/master/reproduce_bug.rb

Not 100% sure how realistic it is, but it helped me observe the behavior locally.

Using this repo: https://github.com/Abishekcs/my-rage-app (Note: rage gem is linked locally so changes reflect right away.)

  • Before
Before.mp4
  • After
After.mp4

When a server restarts after a crash, the system clock may be behind the timestamps already present in the storage file due to clock skew caused by NTP synchronization, VM resume etc. This could result in new task IDs having lower timestamps than existing file entries, breaking ordering guarantees and risking duplicate task execution.

Fix by scanning all acquired storage files (including recovered ones from crashed workers) on boot to find the highest existing timestamp, then applying Lamport's Implementation Rule IR2(b) to ensure the new seed is strictly greater than anything already in the storage file. This guarantees new task IDs are always ordered after existing ones regardless of clock state at boot time.

Ref: "Time, Clocks, and the Ordering of Events in a Distributed System" Leslie Lamport, CACM Vol. 21 No. 7, July 1978, Section 2 (IR2b)
@Abishekcs
Copy link
Copy Markdown
Author

While trying to understand the codebase, I came across this # TODO: ensure timestamps in the file are not higher in disk.rb. So I tried to give it a shot at resolving it in this PR.

Copy link
Copy Markdown
Member

@rsamoilov rsamoilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Abishekcs,

Thank you for the contribution - this looks great!

The implementation looks correct and handles the edge case well.

Could you please add some tests? Some suggestions:

  • Startup with storage file containing timestamps in the future
  • Startup with multiple recovered storage files with varying timestamps
  • Startup with empty storage file
  • Startup with only rem entries in a storage file

@rsamoilov
Copy link
Copy Markdown
Member

Also, any chance you could check how much time this loop takes for a 2 MB WAL?

@Abishekcs
Copy link
Copy Markdown
Author

Could you please add some tests?

Sure no problem will add the tests. Thanks for the review.

@Abishekcs
Copy link
Copy Markdown
Author

Also, any chance you could check how much time this loop takes for a 2 MB WAL?

Yup I can do that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants