Idempotence is not the same as Request Deduplication

There's a common confusion among the adepts of message-oriented architecture: they often mix up **Idempotence** with Deduplication.

This confusion started a long time ago. E.g. the book Enterprise Integration Patterns (EIP) already mixes up these concepts, and it was published in 2003! See the [Idempotent Receiver](https://www.enterpriseintegrationpatterns.com/patterns/messaging/IdempotentReceiver.html) chapter.

# What's Idempotence?

Idempotence / Idempotency is a simple property with a fancy name: the result doesn't change if you repeat the operation. _Idem_ - the same, _potence_ - power. Math examples: multiply by 1, and then multiply by 1 again - the result is the same. Or a more typical one: project a vector onto a line, and then project the result onto the very same line - the projection will stay where it was.

In programming, Idempotence can drastically simplify the architecture. Compare these:
1. Remove the last element from the list - this is dangerous. After we do it once, another element becomes "last". We have to ensure we can't repeat this operation accidentally.
2. Remove the element with ID=xxx from the list - this is safe and can be repeated many times.
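
The two operations above can be sketched in a few lines of Python. This is only an illustration: the function names `remove_last` and `remove_by_id` are made up for the example, and the list holds the IDs directly.

```python
def remove_last(items: list[str]) -> list[str]:
    """Not idempotent: each call removes whatever is last at that moment."""
    return items[:-1]


def remove_by_id(items: list[str], item_id: str) -> list[str]:
    """Idempotent: once item_id is gone, repeating the call changes nothing."""
    return [item for item in items if item != item_id]


items = ["a", "b", "c"]

# An accidental retry of remove_last removes a second element.
print(remove_last(remove_last(items)))  # ['a']

# An accidental retry of remove_by_id is harmless.
print(remove_by_id(remove_by_id(items, "c"), "c"))  # ['a', 'b']
```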

# Idempotence in HTTP

Notably, the HTTP spec defines some of the methods as idempotent, including GET, PUT and DELETE. Why? Well, if you DELETE something, you can delete it multiple times - the result is the same. If you replace the full content of the resource (PUT) - same thing.

But if you create a new resource (POST) - the server will create a new thing each time, so it's not idempotent. How do we turn POST into PUT? If the ID of the newly created resource is generated on the client side, then we can replace the dangerous POST with an idempotent PUT. This removes the distinction between creation and updating.
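
Here is a minimal in-memory sketch of that difference. It doesn't use real HTTP: `ResourceStore`, `post` and `put` are hypothetical names that only mimic the semantics of the two methods, and the client-side ID is assumed to be a UUID.

```python
import uuid


class ResourceStore:
    """Hypothetical in-memory stand-in for a server-side collection."""

    def __init__(self) -> None:
        self.resources: dict[str, dict] = {}

    def post(self, body: dict) -> str:
        """POST-like: the server picks a fresh ID on every call."""
        new_id = str(uuid.uuid4())
        self.resources[new_id] = body
        return new_id

    def put(self, resource_id: str, body: dict) -> None:
        """PUT-like: create or fully replace the resource at resource_id."""
        self.resources[resource_id] = body


# A retried POST leaves two resources behind.
post_store = ResourceStore()
post_store.post({"name": "order"})
post_store.post({"name": "order"})
print(len(post_store.resources))  # 2

# A retried PUT with a client-generated ID leaves exactly one.
put_store = ResourceStore()
client_id = str(uuid.uuid4())
put_store.put(client_id, {"name": "order"})
put_store.put(client_id, {"name": "order"})
print(len(put_store.resources))  # 1
```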

Notice that the spec defines _guarantees_, not how we implement them. We'll soon see the difference.

# Deduplication is applicable when there's NO Idempotence

Let's say we "delete last element". How do we ensure we can't repeat it? Deduplication could be a solution! E.g. we could send a unique RequestID with every "delete last" request, store the RequestID somewhere, and on each subsequent request check whether that ID has already been seen.

So deduplication isn't a means to idempotence, it's a solution for when our operation is NOT idempotent. And it's a complicated solution that requires storage. You don't want it. If possible, we want to redesign the operation to be idempotent instead.
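
To show what the RequestID scheme costs, here is a rough sketch. The class `DedupList` and its method names are invented for the example, and the set of seen IDs is kept in memory. A real service would need durable storage and some expiry policy for it.

```python
class DedupList:
    """A list whose non-idempotent "delete last" is guarded by RequestIDs."""

    def __init__(self, items: list[str]) -> None:
        self.items = list(items)
        # Extra state that exists only to make retries harmless.
        self.seen_request_ids: set[str] = set()

    def delete_last(self, request_id: str) -> bool:
        """Return True if the request was applied, False if it was a duplicate."""
        if request_id in self.seen_request_ids:
            return False
        self.seen_request_ids.add(request_id)
        if self.items:
            self.items.pop()
        return True


lst = DedupList(["a", "b", "c"])
print(lst.delete_last("req-1"))  # True
print(lst.delete_last("req-1"))  # False, the retry is dropped
print(lst.items)                 # ['a', 'b']
```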

# Idempotent API != Deduplicating API

Now suppose that we talk to a service, and it guarantees the operation can be safely repeated. But we don't know the implementation details: it could be a data store with RequestIDs, or it could be a truly idempotent operation. Can this _guarantee_ be called Idempotence in both cases? Not necessarily! Consider a Delete(ID) operation. If it's a soft-delete (we mark the record as deleted), it's possible to un-delete the record in the future. If the same Delete request is then repeated:

1. An idempotent operation applied to the un-deleted resource will delete it again.
2. Request Deduplication will drop the request entirely, and the resource won't be deleted!
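
The soft-delete scenario can be played out with two toy services that each hold a single record. Both classes and the RequestID `"req-1"` are hypothetical. The only difference between them is how `delete` treats a repeated request.

```python
class IdempotentService:
    """Delete simply sets the flag, however many times it is called."""

    def __init__(self) -> None:
        self.deleted = False

    def delete(self, request_id: str) -> None:
        self.deleted = True  # request_id is ignored on purpose

    def undelete(self) -> None:
        self.deleted = False


class DeduplicatingService:
    """Delete is applied only the first time a given RequestID is seen."""

    def __init__(self) -> None:
        self.deleted = False
        self.seen_request_ids: set[str] = set()

    def delete(self, request_id: str) -> None:
        if request_id in self.seen_request_ids:
            return  # duplicate request: dropped entirely
        self.seen_request_ids.add(request_id)
        self.deleted = True

    def undelete(self) -> None:
        self.deleted = False


for service in (IdempotentService(), DeduplicatingService()):
    service.delete("req-1")
    service.undelete()
    service.delete("req-1")  # the same request arrives again
    print(type(service).__name__, service.deleted)
# IdempotentService True
# DeduplicatingService False
```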

# Idempotent POST (Content-Addressable Storage)

Okay, but what if we do want to use POST, and the client can't supply the ID during resource creation? Can we make POST idempotent? Or do we have to resort to Deduplication?

Git is a good example here: when you add a file whose content is already stored, Git doesn't store that content a second time. This is because it calculates a hash from the content - and that hash becomes the object ID. This is called Content-Addressable Storage because we can find the object by its content.

So if you can make objects Content-Addressable, then POST becomes Idempotent and you don't have to worry about Deduplication. The ID itself becomes a sort of Deduplication Key. The difference is: it's not generated by the client, the server calculates it from the content.
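
A minimal sketch of such a store, assuming SHA-256 over the raw bytes as the ID. This is not Git's actual object format, and `ContentAddressableStore` and `post` are names invented for the example.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical store where the object ID is derived from the content."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def post(self, content: bytes) -> str:
        """POST-like create: the server computes the ID, the client sends none."""
        object_id = hashlib.sha256(content).hexdigest()
        # Writing the same content again lands on the same key.
        self.objects[object_id] = content
        return object_id


store = ContentAddressableStore()
first_id = store.post(b"hello")
retry_id = store.post(b"hello")  # a repeated POST with the same content
print(first_id == retry_id)      # True
print(len(store.objects))        # 1
```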

# Summary

Unfortunately, because this mistake in terminology is a part of popular tools like Kafka, it will stay with us for a long time, if not forever. The confusion keeps propagating even further:

* The HTTP core spec [is vague about the term](https://github.com/httpwg/http-core/issues/1123) - the current wording doesn't differentiate between Idempotence and Deduplication. At the time, people didn't have a reason to confuse the two anyway.
* And some new HTTP specs are starting to show up (see [Idempotency-Key HTTP Header](https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/)) that keep spreading the confusion.

---
To get notifications about new posts, Watch this repository.
