Thanks for taking the care to review things in such detail.
On the tagging, you're right about ASN.1. And I should have said "So these tags are not particularly useful", since I do see and understand the use you point out. I just think it's not important enough to add a whole additional layer of complexity.
On the integer formats: I do not find the intended interpretation of "[a byte string ZZ ZZ] stores a big-endian signed integer" at all ambiguous. Obviously, in a document written in 2010, this means two's complement. Conversely, the MessagePack spec's negative fixnums explicitly say that "111YYYYY is 8-bit signed integer", so the fact that there the sign bit is outside the value field is quite explicit.
You seem to be suggesting that the spec is ambiguous as to whether a uint16 and an int16 should be treated as the same type, but that interpretation is explicitly contradicted by the type system set out right at the top of the document. You seem to have noticed this, but then weirdly don't seem to regard it as sufficiently conclusive. The parts of the spec that you seem to regard as casting doubt are encoding descriptions. You say "111YYYYY is 8-bit signed integer" is "gnomic". Perhaps you are simply struggling with the grammar? (Did you notice that the spec author is not a native English speaker?) It is obvious to me that this is a statement about how the 5-bit value field in the negative fixnum is to be interpreted. (Frankly that statement even seems not entirely necessary to me, since it seems that "111YYYYY stores a 5-bit negative integer" is unambiguous anyway.)
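Concretely, the reading I'm defending can be written down in a couple of lines of C (function names are mine, not the spec's): "8-bit signed integer" means you interpret the whole negative-fixnum byte as an ordinary two's-complement octet, so 0xE0 through 0xFF decode directly to -32 through -1.

```c
#include <stdint.h>

/* A byte is a negative fixnum iff its top three bits are 111. */
static int is_negative_fixnum(uint8_t b)
{
    return (b & 0xE0) == 0xE0;
}

/* Decode by treating the whole byte as an 8-bit two's-complement
 * integer, exactly as the spec's "8-bit signed integer" says.
 * 0xE0 -> -32, ..., 0xFF -> -1. */
static int8_t decode_negative_fixnum(uint8_t b)
{
    return (int8_t)b;
}
```

Note there is nothing to get wrong here: because the tag bits 111 coincide with the sign extension of the 5-bit value, the "interpret the value field" reading and the "interpret the whole byte" reading give identical results.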
Range checking is not necessary to select the right encoding format. Choosing the shortest encoding is not mandatory (neither is it in CBOR). An encoder can simply take its int16_t and stuff a 0xd1 byte on the front. As for the reader, I was imagining a reader which knows it is expecting an int16_t. With MessagePack a decoder which sees an int16 can simply use it. With CBOR there are two 16-bit integer formats, which (if you like to look at it that way) combine to make a single 17-bit signed integer format with a weird encoding for the sign bit - so the decoder must check that the value is not out of range. One might dismiss this as an irrelevant optimisation and say that a decoder must be able to handle all kinds of integer. But (i) the straightforward encoding can provide a fast path and (ii) in a system where the writers are cooperating to use a supported subset, a limited decoder can be a lot simpler than the corresponding limited CBOR decoder here.
I did not see any ambiguity in the MessagePack specification when I read it, and I still don't, having seen your comments. Certainly it is not written in the very defensive style common to modern standards bodies: very carefully worded to avoid even the possibility of disingenuous defences of implementation errors, or of the "implementor from Mars" problem (see also: AIX; PP's and Exchange's SMTP implementations). CBOR is written that way and the result is extremely turgid prose. Conversely, MessagePack's specification is written assuming a reader with a roughly contemporary programming background and depends on good faith to avoid misinterpretations. Paradoxically, this is the style of early RFCs.
Perhaps part of what's going on here is a difference in philosophy about how standards docs should be written. I prefer documents that have not been armour-plated against weird or disingenuous readings.