Idempotence is not the same as Request Deduplication #29

@ctapobep

There's a common confusion about Idempotence among practitioners of message-oriented architecture - it often gets mistaken for Deduplication.

This confusion started a long time ago. E.g. the book Enterprise Integration Patterns (EIP) already mixes up these concepts, and it was published in 2003! See the Idempotent Receiver chapter.

What's Idempotence?

Idempotence / Idempotency is a simple property with a fancy name: the result doesn't change if you repeat the operation. Idem - the same, potence - power. Math examples: multiply by 1 and then multiply by 1 again - the result is the same. Or a more typical one: project a vector onto a line, and then project the result onto the very same line - the projection will stay where it was.
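
The projection example can be checked numerically. This is only a small sketch: the function name `project` and the sample vectors are chosen for illustration.

```python
def project(v: tuple[float, float], d: tuple[float, float]) -> tuple[float, float]:
    """Project 2D vector v onto the line through the origin with direction d."""
    scale = (v[0] * d[0] + v[1] * d[1]) / (d[0] * d[0] + d[1] * d[1])
    return (scale * d[0], scale * d[1])


direction = (1.0, 1.0)
once = project((3.0, 1.0), direction)
twice = project(once, direction)   # projecting the projection again

print(once, twice)  # (2.0, 2.0) (2.0, 2.0)
```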

In programming Idempotence can drastically simplify the architecture. Compare these:

  1. Remove the last element from the list - this is dangerous. After we do it once, another element becomes "last". We have to ensure we can't repeat this operation accidentally.
  2. Remove element ID=xxx from the list - this is safe and can be repeated many times.
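
A minimal Python sketch of the two operations above. The function names `remove_last` and `remove_by_id` and the list contents are made up for illustration.

```python
def remove_last(items: list[str]) -> None:
    """Not idempotent: each call removes a different element."""
    if items:
        items.pop()


def remove_by_id(items: list[str], item_id: str) -> None:
    """Idempotent: once item_id is gone, repeating the call changes nothing."""
    if item_id in items:
        items.remove(item_id)


a = ["x", "y", "z"]
remove_last(a)
remove_last(a)          # an accidental retry removes a second element
print(a)                # ['x']

b = ["x", "y", "z"]
remove_by_id(b, "z")
remove_by_id(b, "z")    # the retry is a no-op
print(b)                # ['x', 'y']
```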

Idempotence in HTTP

Notably, the HTTP spec defines some of the methods as idempotent, among them GET, PUT and DELETE. Why? Well, if you DELETE something, you can delete it multiple times - the result is the same. If you replace the full content of the resource (PUT) - same thing.

But if you create a new resource (POST) - the server will create a new thing each time, so it's not idempotent. How do we turn POST into PUT? If the ID of the newly created resource is generated on the client side, then we can replace dangerous POST with a safe PUT. This removes the distinction between creation and updating.
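
The difference can be sketched with an in-memory stand-in for a server. The `Store` class and its `post` / `put` methods are hypothetical and only mimic the semantics described above; no real HTTP is involved.

```python
import uuid


class Store:
    """Hypothetical in-memory resource store."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def post(self, body: dict) -> str:
        """POST-like: the server picks the ID, so every call creates a new resource."""
        new_id = str(uuid.uuid4())
        self.resources[new_id] = body
        return new_id

    def put(self, resource_id: str, body: dict) -> None:
        """PUT-like: the client supplies the ID; create-or-replace."""
        self.resources[resource_id] = body


post_store = Store()
post_store.post({"name": "alice"})
post_store.post({"name": "alice"})       # a retry creates a duplicate
print(len(post_store.resources))         # 2

put_store = Store()
client_id = str(uuid.uuid4())            # generated once on the client, reused on retry
put_store.put(client_id, {"name": "alice"})
put_store.put(client_id, {"name": "alice"})
print(len(put_store.resources))          # 1
```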

Notice that the spec defines guarantees, not how we implement them. We'll soon see the difference.

Deduplication is applicable when there's NO Idempotence

Let's say we "delete last element". How do we ensure we can't repeat it? Deduplication could be a solution! E.g. we could send a unique RequestID with every "delete last" request, store the RequestID somewhere, and the next time check whether that ID has already been seen.

So deduplication isn't a means to idempotence, it's a solution if our operation is NOT idempotent. And it's a complicated solution that requires storage. You don't want it. If possible, we want to redesign the operation to be idempotent instead.
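
A sketch of RequestID-based deduplication for the "delete last" operation. The class `DedupList` and the request IDs are invented for illustration. A real system would also need persistent storage and some expiry policy for the seen IDs, which is exactly the cost mentioned above.

```python
class DedupList:
    """Hypothetical list whose non-idempotent delete_last is guarded by RequestIDs."""

    def __init__(self, items: list[str]) -> None:
        self.items = list(items)
        self.seen_request_ids: set[str] = set()   # the extra storage deduplication needs

    def delete_last(self, request_id: str) -> bool:
        """Return True if the request was applied, False if it was dropped as a duplicate."""
        if request_id in self.seen_request_ids:
            return False
        self.seen_request_ids.add(request_id)
        if self.items:
            self.items.pop()
        return True


d = DedupList(["x", "y", "z"])
first = d.delete_last("req-1")    # applied
retry = d.delete_last("req-1")    # same RequestID: dropped
print(first, retry, d.items)      # True False ['x', 'y']
```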

Idempotent API != Deduplicating API

Now suppose that we talk to a service, and it guarantees the operation can be safely repeated. But we don't know the implementation details: it could be a data store of RequestIDs, or it could be a truly idempotent operation. Can this guarantee be called Idempotence in both cases? Not necessarily! Consider a Delete(ID) operation. If it's a soft-delete (we mark the record as deleted), it's possible to un-delete the record in the future. Then:

  1. An idempotent operation repeated against the un-deleted resource will delete it again.
  2. Request Deduplication will drop the repeated request entirely, and the resource won't be deleted!
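
The two outcomes above can be reproduced in a few lines. Everything here (`Record`, `idempotent_delete`, `DedupDeleter`, the request ID) is a hypothetical sketch of a soft-delete, not any particular service's API.

```python
class Record:
    def __init__(self) -> None:
        self.deleted = False      # soft-delete flag


def idempotent_delete(record: Record) -> None:
    """Truly idempotent: always drives the record to the 'deleted' state."""
    record.deleted = True


class DedupDeleter:
    """Deduplicating delete: remembers RequestIDs and drops repeats."""

    def __init__(self) -> None:
        self.seen_request_ids: set[str] = set()

    def delete(self, record: Record, request_id: str) -> None:
        if request_id in self.seen_request_ids:
            return                # repeat of a known request: ignored
        self.seen_request_ids.add(request_id)
        record.deleted = True


# Scenario: delete, then someone un-deletes, then the same request is repeated.
r1 = Record()
idempotent_delete(r1)
r1.deleted = False                # un-delete
idempotent_delete(r1)             # the repeat deletes it again
print(r1.deleted)                 # True

r2 = Record()
deleter = DedupDeleter()
deleter.delete(r2, "req-1")
r2.deleted = False                # un-delete
deleter.delete(r2, "req-1")       # the repeat is dropped
print(r2.deleted)                 # False
```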

Idempotent POST (Content-Addressable Storage)

Okay, but what if we do want to use POST, and the client can't supply the ID during resource creation? Can we make POST idempotent? Or do we have to resort to Deduplication?

Git is a good example here: when you add a file with the same content, Git won't actually store the same content twice. This is because it calculates the hash from the content - and that hash becomes the object ID. This is called Content-Addressable Storage because we can find where the object is by its content.

So if you can make objects Content-Addressable, then POST becomes Idempotent and you don't have to worry about Deduplication. The ID itself becomes a sort of Deduplication Key. The difference is: it's not generated by the client, the server calculates it from the content.
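
A minimal sketch of a content-addressable POST. The `ContentStore` class is hypothetical, and hashing the raw bytes with SHA-256 is a simplification chosen for this example, not Git's actual object format.

```python
import hashlib


class ContentStore:
    """Hypothetical store where the server derives the object ID from the content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def post(self, content: bytes) -> str:
        object_id = hashlib.sha256(content).hexdigest()
        self.objects[object_id] = content    # same content -> same key, nothing new is stored
        return object_id


cas = ContentStore()
id1 = cas.post(b"hello world")
id2 = cas.post(b"hello world")               # a retry of the same POST
print(id1 == id2, len(cas.objects))          # True 1
```

Posting different content yields a different ID and a second object, so the hash plays the role of the deduplication key without the client having to generate anything.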

Summary

Unfortunately, because this mistake in terminology is a part of popular tools like Kafka, it will stay with us for a long time, if not forever. The confusion is getting propagated even further:

  • The HTTP core spec is vague about the term - the current wording doesn't differentiate between Idempotence and Deduplication. At the time people didn't have a reason to confuse it with Deduplication anyway.
  • And new HTTP specs are starting to show up (see the Idempotency-Key HTTP Header) that keep spreading the confusion.

To get notifications about new posts, Watch this repository.
