Description
@SimonWoolf's feedback:
I guess my main feedback at this stage is that it doesn't really feel like a protocol, but rather like a framework on which someone could build a protocol. Like, if I was someone trying to write a library to implement this, I'm not sure there's much I could actually practically implement until I knew a lot more details about the transport, storage, history, diff algorithm, what datatype the data will be, etc. Or at least a lot more concrete details on the minimum capabilities that all of those things have to conform to.
I know it's not finished yet - I guess I'm worried that even the bits which don't seem obviously unfinished seem kinda underspecified. For example, as an implementer I need to know if I should write code in the lib to reorder messages that I get from the transport, using the serial. The spec says "Ordering is important". OK, sure, but is it required? Well, that's up to the transport. But I'm trying to write the core library, independent of the transport, and the spec doesn't give any way for the transport to expose whether it guarantees ordering.
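To make the reordering question concrete, here is a rough sketch of what the core library would have to carry if the transport gives no ordering guarantee. It assumes the serial is an integer that increases by exactly 1 per message, which the spec does not currently state; `ReorderBuffer`, `receive` and `demo_reorder` are hypothetical names, not part of any existing library.

```python
from typing import Any, Callable, Dict, List, Tuple


class ReorderBuffer:
    """Delivers messages in serial order, buffering any that arrive early.

    Assumption (not from the spec): serials are integers increasing by 1.
    """

    def __init__(self, deliver: Callable[[int, Any], None], first_serial: int = 0) -> None:
        self._deliver = deliver
        self._next = first_serial            # next serial allowed to be delivered
        self._pending: Dict[int, Any] = {}   # early arrivals, keyed by serial

    def receive(self, serial: int, payload: Any) -> None:
        if serial < self._next or serial in self._pending:
            return  # already delivered or already buffered: drop the duplicate
        self._pending[serial] = payload
        # Flush every buffered message that is now contiguous with what was delivered.
        while self._next in self._pending:
            self._deliver(self._next, self._pending.pop(self._next))
            self._next += 1


def demo_reorder() -> List[Tuple[int, Any]]:
    delivered: List[Tuple[int, Any]] = []
    buf = ReorderBuffer(lambda serial, payload: delivered.append((serial, payload)))
    # Out-of-order arrival, with one duplicate at the end.
    for serial, payload in [(1, "b"), (0, "a"), (3, "d"), (2, "c"), (2, "c")]:
        buf.receive(serial, payload)
    return delivered
```

If the transport already guarantees ordering, all of this is dead weight, which is why the spec needs some way for the transport to say so.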
Or say I want to implement checksumming. I read:
However, given the MD5 checksum needs to be applied to an encoded object, the type attribute may take the form utf-8/MD5 which indicates that the object is first serialized to a utf-8 string, and then encoded as MD5
First stumbling block is that as a lib implementer, I have no clue what the data is, apart from "any type of Object or language primitive" -- it's just an opaque thing I'm supposed to get from the transport and pass straight to the diffing plugin. That doesn't work if I have to serialize it, I need to know what it is.
Second, "serialized to a utf-8 string" how? Presumably JSON? Even if you specify that, that'd still be too vague to be used for checksumming purposes -- JSON doesn't require any particular order of field serialization, allows optional whitespace, has a whole bunch of characters which may be escaped but don't have to be, etc. You could clarify all of these, but my guess is you'd end up with a ginormous how-to-uniquely-serialize-any-object spec (that probably still wouldn't cover all edge cases), that no-one'd ever implement.
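A small illustration of the serialization problem, using only Python's standard `json` and `hashlib` modules. The sample object is made up for the example, and `canonical_md5` is a hypothetical helper showing one way the choices could be pinned down; it is not something the spec defines.

```python
import hashlib
import json

doc = {"b": 1, "a": "é"}

# Three valid JSON encodings of the same object.
variants = [
    json.dumps(doc),                                             # insertion order, non-ASCII escaped
    json.dumps(doc, sort_keys=True),                             # keys reordered
    json.dumps(doc, ensure_ascii=False, separators=(",", ":")),  # compact, unescaped
]


def md5_hex(text: str) -> str:
    """MD5 of the UTF-8 bytes of a string, as a hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Every variant decodes to the same object, yet each has a different checksum.
digests = {md5_hex(v) for v in variants}


def canonical_md5(obj) -> str:
    """Checksum under one pinned-down serialization (an assumption, not from the spec)."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return md5_hex(text)
```

Fixing key order, whitespace and escaping like this makes publisher and subscriber agree only if both follow exactly the same rules, and it still leaves open things like how numbers are formatted.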
Feedback from @mezis on this:
JSON has (AFAIK) at least one means of being deterministically encoded into a string: JWT.
A way to checksum a JSON blob is to JWT-encode it with a well-known (e.g. nil) key and hash that.
The checksum-type header could still have a few reserved values, which we could specify for delta formats ("algorithms") we also reserve.
For instance: type=jwt+md5 could be used with json-patch, type=sha256 for binary diffs.
Re delta algorithms: Myers is a string diff algorithm; it doesn't identify an output format. The diff utility uses Myers to calculate diffs, but can output in several different formats (--ed, --rcs, --unified, --normal, ...). And once a patch is generated, the subscriber doesn't care what algorithm was used to construct it, whether Myers or anything else, as long as they can apply it.
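The algorithm/format split is easy to see with Python's standard `difflib`, used here purely as an illustration. Note that `difflib` is built on its own `SequenceMatcher` rather than Myers, which if anything reinforces the point: the same comparison is rendered in two different patch formats, and a consumer only needs to know which format it is being handed. The sample lines are made up.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# Same inputs, same underlying matcher, two different output formats.
unified = list(difflib.unified_diff(old, new, fromfile="old", tofile="new"))
context = list(difflib.context_diff(old, new, fromfile="old", tofile="new"))

print("".join(unified))
print("".join(context))
```

So what the spec needs to name is the patch format the subscriber must be able to apply, not the algorithm that produced it.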
I'll stop there -- on rereading this it's coming out way more negative than I meant it, considering it's just an early draft of ideas, so sorry about that :simple_smile: