You seem to be suggesting that the spec is ambiguous as to whether a uint16 and an int16 should be treated as the same type, but that interpretation is explicitly contradicted by the type system set out right at the top of the document. You appear to have noticed this, yet oddly don't regard it as conclusive. The parts of the spec that you cite as casting doubt are encoding descriptions.
Hardly. I cited the actual description of the 'Type system', which is indeed right at the top of the document, and which draws no such distinction -- in fact, it doesn't mention signedness at all. The first suggestion that there might be a distinction to be drawn is in 'Formats: Overview'. There's a difference between a value's type and its representation, so the latter certainly shouldn't be considered to override the former. (On the one hand, Lisp systems will select different representations for integers depending on their size and context[1]. On the other hand, an ISO 8859-1 string and a raw octet vector have the same representation, but are probably best considered as different types.)
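To make the type-versus-representation point concrete, here is a minimal sketch (Python; the helper name is mine, not anything from the spec) showing that two different MessagePack wire representations can carry the very same value:

```python
import struct

def decode_int(buf: bytes) -> int:
    """Minimal sketch of a MessagePack integer decoder (helper name is mine).

    Per the spec's format table, 0xcd introduces a big-endian uint 16
    payload and 0xd1 a big-endian int 16 payload.
    """
    tag, payload = buf[0], buf[1:3]
    if tag == 0xCD:                    # uint 16 format
        return struct.unpack(">H", payload)[0]
    if tag == 0xD1:                    # int 16 format
        return struct.unpack(">h", payload)[0]
    raise ValueError("format byte %#x not handled in this sketch" % tag)

# Two different wire representations of the same value, 300 (0x012c):
assert decode_int(b"\xcd\x01\x2c") == 300   # uint 16 representation
assert decode_int(b"\xd1\x01\x2c") == 300   # int 16 representation
```

If the representation determined the type, these two byte strings would have to denote different things; on my reading they are simply two encodings of the integer 300.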
Anyway, I'm suggesting that the spec is ambiguous precisely because there are multiple interpretations of it, out there in the world. You formed one; I formed another, and I have justified my interpretation with extensive citations, not only of the specification itself, but of bug-tracker discussions about it, and 'blessed' implementations. There are also implementations that support your interpretation, which just reinforces my point that the specification is bad.
Range checking is not necessary to select the right encoding format. Choosing the shortest encoding is not mandatory (neither is it in CBOR).
Indeed it's not mandatory, but it is strongly recommended. It's literally the only occurrence of RFC2119 SHOUTING in the document.
One might dismiss this as an irrelevant optimisation and say that a decoder must be able to handle all kinds of integer. But (i) the straightforward encoding can provide a fast path and (ii) in a system where the writers cooperate to use a supported subset, a limited MessagePack decoder can be a lot simpler than the corresponding limited CBOR decoder.
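To illustrate point (ii), here is a sketch of such a limited decoder, assuming writers restricted to the one-byte fixint formats (the function name and the choice of subset are mine):

```python
def decode_fixint(b: int) -> int:
    """Sketch of a deliberately limited MessagePack decoder (name is mine).

    It accepts only the single-byte fixint formats:
      positive fixint 0x00-0x7f -> 0..127
      negative fixint 0xe0-0xff -> -32..-1
    Cooperating writers that stay within this subset need nothing more,
    and there is no length or payload parsing at all.
    """
    if b <= 0x7F:          # positive fixint: the byte is the value
        return b
    if b >= 0xE0:          # negative fixint: interpret the byte as two's complement
        return b - 0x100
    raise ValueError("format byte %#x outside the supported subset" % b)

assert decode_fixint(0x2A) == 42
assert decode_fixint(0xFF) == -1
assert decode_fixint(0xE0) == -32
```

The corresponding CBOR subset still needs the major-type/additional-information split, which is where the extra complexity creeps in.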
This is a fair point.
I did not see any ambiguity in the MessagePack specification when I read it, and I still don't having seen your comments.
How do you reconcile this position with the observed fact that the 'msgpack-c' implementation (among a number of others) erases the distinction between positive signed integers and unsigned integers?
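Concretely, an encoder in the style I'm describing picks the format from the value alone, so a non-negative value produces the same bytes whether the source type was int16 or uint16. A sketch (illustrative Python over a small format subset; this is not msgpack-c's actual code or API):

```python
import struct

def encode_int(n: int) -> bytes:
    """Sketch of a value-driven MessagePack integer encoder (illustrative;
    covers only a small subset of the formats, and is not msgpack-c's code).

    The format is chosen from the value alone, so the signedness of the
    source-language type is erased on the wire.
    """
    if 0 <= n <= 0x7F:
        return bytes([n])                       # positive fixint
    if 0 <= n <= 0xFFFF:
        return b"\xcd" + struct.pack(">H", n)   # uint 16
    if -0x8000 <= n < 0:
        return b"\xd1" + struct.pack(">h", n)   # int 16
    raise ValueError("outside this illustrative subset")

# A value that fits comfortably in an int16 still comes out as uint 16:
assert encode_int(300) == b"\xcd\x01\x2c"
assert encode_int(-2) == b"\xd1\xff\xfe"
```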
Certainly it is not written in the very defensive style common to modern standards bodies [...]. CBOR is written that way and the result is extremely turgid prose.
Adding the words 'two's complement' occasionally will not turn this document into the terrible Vroomfondel standardese of modern RFCs.
[1] It's true that Common Lisp, for example, partitions its integer type into fixnum for small-magnitude integers and bignum for large-magnitude integers. But everything is more subtle in actual implementations, which will store a value unboxed in a machine word or register, even if it's slightly too large to fit in a fixnum with the usual type tagging, if type and range inference shows that this will work.