Compressing Matrix, part 2
In my previous post, I pondered that protocols consisting of a sequence of fairly repetitively-structured JSON documents could probably be compressed better with a bit of rearrangement. To wit: separate the structure (keys, the equivalent of []{}, and so forth) from the data, then compress the two using separate contexts.
Let's elaborate on that.
In principle, this could be done by creating a custom version of JSON which can represent a few more things, namely where the data was moved to (a separate area for incompressible values seems advisable, so I can't just leave it implicit). However, there is already a format that's a superset of JSON: CBOR.
And as I write that I realise that I don't really NEED a superset. If every value is moved out-of-line, then an ordinary integer can stand as the placeholder. Which means we could do this with good old-fashioned JSON. I know from other experiments and a priori reasoning¹ that CBOR doesn't actually compress much better than JSON unless you have a lot of base64 data (which for Matrix is plausible, admittedly), in which case you don't need the compression for it to be a win. However, JSON libraries are really easy to get at, and we don't need maximum efficiency to validate the principle.
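To make that concrete, here's a made-up sketch (not the format defined below): every value is hoisted into a separate list in order of appearance, and an integer placeholder is left behind in the structure.

```
original:  {"type": "m.room.message", "content": {"body": "hi"}}
structure: {"type": 1, "content": {"body": 1}}
values:    ["m.room.message", "hi"]
```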
So, let's describe something that might work.
JSON Message Stream Encoding, version 1
This protocol compresses a unidirectional stream of JSON objects. If duplex is required, run two copies.
Transport
Two things are required from the transport: the ability to negotiate parameters, and message-oriented framing.
Parameters
Two parameters are defined: the number of contexts, and the compression format to use. The defaults are 3 (the minimum) and DEFLATE respectively.
Implementations should negotiate appropriate values for these.
Contexts
There must be at least 3 of these negotiated. Each but the last has its own instance of the compressor or decompressor associated with it. Contexts are numbered starting from zero, and ending with one less than the total number of contexts.
Context 0 contains the structure of messages.
The highest-numbered context is never compressed, as it is used for incompressible values.
Context 0
This is a JSON document, either an array or object, in which array and object values are exclusively integers. Each integer represents a data item which has been sent in another context; which context is given by the value of the integer.
Contexts other than 0
These are JSON documents, specifically arrays. The nth item in the array corresponds to the nth appearance of that context's number in context 0.
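For illustration (my own made-up example, using the default 3 contexts; how an encoder allocates values to contexts is up to it), a Matrix-flavoured message might come out as:

```
context 0 (structure):     {"type": 1, "event_id": 2, "content": {"body": 1, "url": 2}}
context 1 (compressed):    ["m.room.message", "hello there"]
context 2 (uncompressed):  ["$madeUpEventId123", "mxc://example.org/abcdef"]
```

Keys stay in context 0; only values are replaced, each by the number of the context that now holds it, and they are matched up by order of appearance.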
Decoding
This section describes an imperative algorithm for decoding a message.
- Read NUM_CONTEXTS messages from the transport. Let these be the array ctx.
- Decompress each of these but the last, and decode the JSON within.
- Associate with each member of ctx a counter starting at zero; call that array cc.
- Walk the document in ctx[0], in wire order, stopping at each integer that is not an object key. For each such integer:
  - Let that integer be i.
  - Let idx be cc[i].
  - Replace the current item with ctx[i][idx].
  - Increment cc[i].
- Return ctx[0].
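A minimal decoder sketch of the above in Python, assuming per-message zlib-wrapped DEFLATE and a transport that has already delivered the NUM_CONTEXTS messages as a list of byte strings. A real implementation would keep one decompressor per context alive across messages, as implied by the Contexts section.

```python
import json
import zlib


def decode_message(frames: list[bytes]):
    """Decode one message; `frames` holds NUM_CONTEXTS transport messages in context order."""
    # Decompress every context except the last, which is sent uncompressed.
    ctx = [json.loads(zlib.decompress(f)) for f in frames[:-1]]
    ctx.append(json.loads(frames[-1]))

    # One counter per context: how many items we have consumed from it so far.
    cc = [0] * len(ctx)

    def resolve(value):
        # Integers that are not object keys are placeholders naming a context.
        # (The bool check is defensive: bool is a subclass of int in Python.)
        if isinstance(value, int) and not isinstance(value, bool):
            i = value
            idx = cc[i]
            cc[i] += 1
            return ctx[i][idx]
        if isinstance(value, list):
            return [resolve(v) for v in value]
        if isinstance(value, dict):
            # Keys stay in context 0; only the values are placeholders.
            return {k: resolve(v) for k, v in value.items()}
        return value

    return resolve(ctx[0])
```

Called as decode_message(frames) with the frames straight off the transport, it returns the reconstructed JSON value.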
Websocket binding
Disable Websocket's own compression. Transport messages in the above are websocket
messages. The subprotocol is net.berigora.jmse.1
Negotiation takes the form of the client sending a text websocket message like:
{"type": "offer","encodings": ["DEFLATE"],"max_up_contexts": 10,"max_down_contexts": 10}
The server responds (also text) like:
{"type": "accept","encoding": "DEFLATE","up_contexts": 3,"down_contexts": 3}
In an offer, the encodings array specifies which compression formats the client is able to use. max_up_contexts and max_down_contexts specify the maximum number of contexts the client is willing to use for encoding and decoding respectively.
In an accept, the server indicates what parameters it has actually accepted. up still means encoded by the client and decoded by the server, and down the opposite direction. If the server is unable to fit within the requested parameters, it should send an error:
{"v": 1,"type": "error","error": "not_enough_contexts"}
Errors:
- not_enough_contexts: Server's encoder requires more contexts than the client is willing to accept.
- unsupported_encoding: None of the encodings the client is willing to use are supported by the server.
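To make the negotiation concrete, here is a client-side sketch using the third-party Python websockets library. The URL is illustrative, the offer matches the example above, and compression=None turns off the websocket layer's own compression as required.

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def negotiate(url: str = "wss://example.org/jmse"):
    async with websockets.connect(
        url,
        subprotocols=["net.berigora.jmse.1"],
        compression=None,  # the binding requires websocket compression to be off
    ) as ws:
        # Offer: which encodings we can use, and how many contexts we accept.
        await ws.send(json.dumps({
            "type": "offer",
            "encodings": ["DEFLATE"],
            "max_up_contexts": 10,
            "max_down_contexts": 10,
        }))

        reply = json.loads(await ws.recv())
        if reply.get("type") != "accept":
            raise RuntimeError(f"negotiation failed: {reply}")
        return reply  # contains "encoding", "up_contexts", "down_contexts"


# asyncio.run(negotiate())
```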
Security considerations
A server that blindly uses max_up_contexts or max_down_contexts as the actual values could be made to exhaust its memory. Incautious choice of encodings will make this problem much worse.
The obvious implementation—all values in one context—is vulnerable to compression oracle attacks. This can be mitigated by an encoder that uses a separate context for sensitive information.
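As a sketch of that mitigation (the key names, the choice of four contexts, and the per-message compression are my own illustration, not part of the format): an encoder could route values found under designated sensitive keys into their own context, so they never share a DEFLATE window with attacker-influenced data.

```python
import json
import zlib

SENSITIVE_KEYS = {"access_token", "password"}  # illustrative choice


def encode_message(doc, num_contexts: int = 4):
    """Split doc into contexts: 0 = structure, 1 = ordinary values,
    2 = sensitive values, last = incompressible values (unused here).
    Per-message zlib DEFLATE; a real encoder would keep a streaming
    compressor per context."""
    values = [[] for _ in range(num_contexts)]  # values[0] stays empty

    def strip(value, sensitive=False):
        if isinstance(value, dict):
            return {k: strip(v, sensitive or k in SENSITIVE_KEYS)
                    for k, v in value.items()}
        if isinstance(value, list):
            return [strip(v, sensitive) for v in value]
        target = 2 if sensitive else 1
        values[target].append(value)
        return target  # integer placeholder naming the context

    structure = strip(doc)
    frames = [zlib.compress(json.dumps(structure).encode())]
    frames += [zlib.compress(json.dumps(v).encode()) for v in values[1:-1]]
    frames.append(json.dumps(values[-1]).encode())  # last context: uncompressed
    return frames
```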
There are a few obvious shortcomings in this: no next-protocol field (à la ALPN in TLS or Sec-WebSocket-Protocol in websocket); the negotiation adds an RTT; any protocol that uses JSON objects as dictionaries will find the keys stuck in context 0 rather than someplace more sensible, potentially worsening compression (oddly enough, Matrix is one of these). And the encoder and decoder have to buffer the entirety of large messages!
However, it does have the advantage of being a very simple-to-implement first attempt, enough to show whether this is a sensible approach. I do have reason to believe it is, as EXI does something similar, including the “buffer the whole message” problem. It's actually EXI4JSON I have to beat, preferably while being much simpler, or at least confining the complexity to very widely available parts.
¹ CBOR doesn't magically replace the bulkiest parts of your data with a more compact representation. In fact, strings can get bigger by being encoded as a whole collection of smaller strings. Any string longer than 23 bytes doesn't shrink either, except by the replacement of escapes with literal bytes. Arrays and objects do shrink, since the comma and colon separators are implied, but those characters make up only a small fraction of the representation in the first place.