Someone wrote in [personal profile] diziet 2020-07-14 08:13 pm (UTC)

CBOR features vs bugs

I've been researching such formats as well, recently. I have none of the history here, and only know what I've found about the formats themselves. For my application (streaming over the network), I ended up choosing CBOR, because it has features I need. I'm a little confused about some of the things you've held up as problems with CBOR.

If you're storing data you have fully available, then certainly using a length-prefixed (definite-length) string/byte type makes sense. But people do use CBOR to stream over a network, and it's helpful not to have to fully buffer data before sending it out. Suppose you're reading bytes generated by a program into a buffer, and want to stream those out over the network. Using CBOR, you can send out the result of each read() or each 4096-byte buffer as a chunk of an indefinite-length string, and send a break when you hit EOF. As far as I can tell, with MessagePack, either you have to buffer all the data and send it only once you have it all, or you have to create your own equivalent to the CBOR indefinite-length string atop MessagePack by saying "semantically, in this protocol, when you get this series of strings at this point in the protocol, you should interpret them as one logical buffer".
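To make the streaming case concrete, here's a rough sketch of the sending side in Python, as I understand the wire format: an indefinite-length byte string opens with the byte 0x5f, each chunk is an ordinary definite-length byte string, and the break is 0xff. The names encode_head and stream_byte_string are my own for illustration, not from any particular CBOR library.

```python
import io
import struct


def encode_head(major_type, value):
    """Encode a CBOR initial byte plus the shortest argument that holds value."""
    prefix = major_type << 5
    if value < 24:
        return bytes([prefix | value])
    if value < 1 << 8:
        return bytes([prefix | 24, value])
    if value < 1 << 16:
        return bytes([prefix | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([prefix | 26]) + struct.pack(">I", value)
    return bytes([prefix | 27]) + struct.pack(">Q", value)


def stream_byte_string(source, chunk_size=4096):
    """Yield the wire bytes of one indefinite-length byte string, piece by piece."""
    yield b"\x5f"  # major type 2, additional info 31: indefinite-length byte string
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means EOF
            break
        # Each chunk is a self-contained definite-length byte string.
        yield encode_head(2, len(chunk)) + chunk
    yield b"\xff"  # break: ends the indefinite-length string


# Hypothetical usage: an in-memory source and a tiny chunk size for readability.
wire = b"".join(stream_byte_string(io.BytesIO(b"hello world"), chunk_size=4))
print(wire.hex())
```

Nothing here needs to know the total length up front, which is the whole point: each piece can go out on the socket as soon as the read returns.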

Valid implementations are allowed to reject indefinite-length encodings if they don't want to deal with them, or for that matter they could specify a length limit; for instance, a tiny device that doesn't have malloc could reject them, or could set a maximum limit based on a buffer size.
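As a sketch of what that could look like on the receiving side, here's a minimal byte-string reader in Python that either refuses indefinite-length strings outright or accepts them up to a fixed budget. The function names, the Rejected exception, and the max_len and allow_indefinite parameters are all my own assumptions for illustration; a real constrained device would do this in C against a static buffer.

```python
import io


class Rejected(ValueError):
    """Input this decoder chooses not to accept."""


def read_exact(stream, n):
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated input")
    return data


def read_length(stream, info):
    """Decode the length argument selected by the 5-bit additional info."""
    if info < 24:
        return info
    if info > 27:
        raise Rejected("a definite length is required here")
    width = 1 << (info - 24)  # 1, 2, 4 or 8 bytes, big-endian
    return int.from_bytes(read_exact(stream, width), "big")


def read_byte_string(stream, max_len, allow_indefinite):
    initial = read_exact(stream, 1)[0]
    if initial >> 5 != 2:
        raise Rejected("expected a byte string")
    info = initial & 0x1F
    if info != 31:
        # Definite-length: check the declared size before reading anything.
        n = read_length(stream, info)
        if n > max_len:
            raise Rejected("byte string exceeds buffer size")
        return read_exact(stream, n)
    if not allow_indefinite:
        raise Rejected("indefinite-length strings not supported")
    out = bytearray()
    while True:
        head = read_exact(stream, 1)[0]
        if head == 0xFF:  # break
            return bytes(out)
        if head >> 5 != 2:
            raise Rejected("chunk is not a byte string")
        n = read_length(stream, head & 0x1F)  # also refuses nested indefinite chunks
        if len(out) + n > max_len:
            raise Rejected("accumulated chunks exceed buffer size")
        out += read_exact(stream, n)


wire = bytes.fromhex("5f4468656c6c446f20776f43726c64ff")  # "hello world" in three chunks
print(read_byte_string(io.BytesIO(wire), max_len=64, allow_indefinite=True))
```

Either policy costs a couple of comparisons, and the size check happens before any chunk data is copied.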

You mentioned that MessagePack has encoding space reserved for extensions. CBOR does as well: it has space for coordinated/standardized extensions (where there must be a standard), and space for less-coordinated extensions (where the extension number can just be reserved for a given usage without requiring substantial justification). I'm trying to figure out what you're suggesting MessagePack has done there that CBOR hasn't.
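For reference, here's a small Python sketch of how a CBOR tag sits on the wire, as far as I understand it: a tag is just a major type 6 head carrying the tag number, followed by the tagged item. The helper names are mine. Tag 1 (epoch-based date/time) and tag 55799 (self-described CBOR) are both defined in the CBOR RFC; which registration policy applies to which numeric range is something to check in the IANA registry rather than take from this sketch.

```python
import struct


def encode_head(major_type, value):
    """Encode a CBOR initial byte plus the shortest argument that holds value."""
    prefix = major_type << 5
    if value < 24:
        return bytes([prefix | value])
    if value < 1 << 8:
        return bytes([prefix | 24, value])
    if value < 1 << 16:
        return bytes([prefix | 25]) + struct.pack(">H", value)
    if value < 1 << 32:
        return bytes([prefix | 26]) + struct.pack(">I", value)
    return bytes([prefix | 27]) + struct.pack(">Q", value)


def encode_uint(n):
    return encode_head(0, n)  # major type 0: unsigned integer


def encode_tagged(tag_number, encoded_item):
    # Major type 6: the tag number is the head's argument, the item follows.
    return encode_head(6, tag_number) + encoded_item


low_tag = encode_tagged(1, encode_uint(1363896240))  # epoch timestamp
high_tag = encode_tagged(55799, encode_uint(0))      # self-described CBOR wrapping 0
print(low_tag.hex(), high_tag.hex())
```

A low-numbered tag and a high-numbered one differ only in how many bytes the tag number takes, so a decoder that doesn't recognize a tag can still skip or reject it without special cases.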

Given that both formats are schemaless (with optional schema support available), you can generally just reject anything you're not expecting, as long as you fail cleanly when it shows up. So, if your protocol doesn't expect (say) 16-bit floats, or doesn't expect floats at all, you could treat them like any other unknown type and reject them. A general-purpose decoding library would need to handle them, but that's just one more branch in a switch statement and one more variant in an enum/ADT.
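Here's roughly what I mean, sketched in Python: classify the initial byte into an enum, then let the application decide which kinds it accepts. The Kind enum, classify, and check_expected are names I made up for this example, and the sketch only looks at the initial byte rather than decoding the value.

```python
from enum import Enum, auto


class Kind(Enum):
    UINT = auto()
    NEGINT = auto()
    BYTES = auto()
    TEXT = auto()
    ARRAY = auto()
    MAP = auto()
    TAG = auto()
    SIMPLE = auto()
    FLOAT16 = auto()
    FLOAT32 = auto()
    FLOAT64 = auto()
    BREAK = auto()


# Major types 0 through 6, in order.
_MAJOR_KINDS = [Kind.UINT, Kind.NEGINT, Kind.BYTES, Kind.TEXT,
                Kind.ARRAY, Kind.MAP, Kind.TAG]


def classify(initial_byte):
    """Map a CBOR initial byte to the kind of item it starts."""
    major, info = initial_byte >> 5, initial_byte & 0x1F
    if 28 <= info <= 30:
        raise ValueError("additional info 28 to 30 has no assigned meaning")
    if major < 7:
        return _MAJOR_KINDS[major]
    # Major type 7: simple values, floats, and the break marker.
    if info == 25:
        return Kind.FLOAT16
    if info == 26:
        return Kind.FLOAT32
    if info == 27:
        return Kind.FLOAT64
    if info == 31:
        return Kind.BREAK
    return Kind.SIMPLE  # false, true, null, undefined, other simple values


def check_expected(initial_byte, allowed):
    """Return the kind if the application accepts it, else fail cleanly."""
    kind = classify(initial_byte)
    if kind not in allowed:
        raise ValueError(f"unexpected {kind.name} item")
    return kind


# Hypothetical protocol that accepts integers, strings and containers, but no floats.
NO_FLOATS = {Kind.UINT, Kind.NEGINT, Kind.BYTES, Kind.TEXT, Kind.ARRAY, Kind.MAP}
print(check_expected(0x18, NO_FLOATS))
```

The half-float case really is one extra branch and one extra variant; whether you then decode it or reject it is a policy decision for the application, not the format.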

It sounds to me like both MessagePack and CBOR will work well for many purposes, and there are other purposes for which one or the other may work better. If you specifically want fixed-size buffers and a simpler implementation, MessagePack sounds helpful; if you need indefinite-length buffers or some additional data types, CBOR could make that easier.
