Compressing Matrix, part 2
In my previous post, I pondered that protocols consisting of a sequence of fairly repetitively-structured JSON documents could probably be compressed better with a bit of rearrangement. To wit: separate the structure (keys, the equivalent of []{}, and so forth) from the data, then compress the two using separate contexts.
Let's elaborate on that.
In principle, this could be done by creating a custom version of JSON which can represent a few more things, namely where the data was moved to (a separate area for incompressible values seems advisable, so I can't just leave it implicit). However, there is already a format that's a superset of JSON: CBOR.
And as I write that I realise that I don't really NEED a superset. If every value is moved out-of-line, then an ordinary integer can stand as the placeholder. Which means we could do this with good old-fashioned JSON. I know from other experiments and a priori reasoning¹ that CBOR doesn't actually compress much better than JSON unless you have a lot of base64 data (which for Matrix is plausible, admittedly), in which case you don't need the compression for it to be a win. However, JSON libraries are really easy to get at, and we don't need maximum efficiency to validate the principle.
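To make that concrete, here's a made-up sketch (not the format defined below): every value is hoisted into a separate list in order of appearance, and an integer placeholder is left behind in the structure.

```
original:  {"type": "m.room.message", "content": {"body": "hi"}}
structure: {"type": 1, "content": {"body": 1}}
values:    ["m.room.message", "hi"]
```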
So, let's describe something that might work.
JSON Message Stream Encoding, version 1
This protocol compresses a unidirectional stream of JSON objects. If duplex is required, run two copies.
Transport
Two things are required from the transport: the ability to negotiate parameters, and message-oriented framing.
Parameters
Two parameters are defined: the number of contexts, and the compression format to use. The defaults are 3 (the minimum) and DEFLATE respectively.
Implementations should negotiate appropriate values for these.
Contexts
There must be at least 3 of these negotiated. Each but the last has its own instance of the compressor or decompressor associated with it. Contexts are numbered starting from zero, and ending with one less than the total number of contexts.
Context 0 contains the structure of messages.
The highest-numbered context is never compressed, as it is used for incompressible values.
Context 0
This is a JSON document, either an array or object, in which array and object values are exclusively integers. Each integer represents a data item which has been sent in another context; which context is given by the value of the integer.
Contexts other than 0
These are JSON documents, specifically arrays. The nth item in the array corresponds to the nth appearance of that context's number in context 0.
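For illustration (my own made-up example, using the default 3 contexts; how an encoder allocates values to contexts is up to it), a Matrix-flavoured message might come out as:

```
context 0 (structure):     {"type": 1, "event_id": 2, "content": {"body": 1, "url": 2}}
context 1 (compressed):    ["m.room.message", "hello there"]
context 2 (uncompressed):  ["$madeUpEventId123", "mxc://example.org/abcdef"]
```

Keys stay in context 0; only values are replaced, each by the number of the context that now holds it, and they are matched up by order of appearance.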
Decoding
This section describes an imperative algorithm for decoding a message.
- Read NUM_CONTEXTS messages from the transport. Let these be the array ctx.
- Decompress each of these but the last, and decode the JSON within.
- Associate with each member of ctx a counter starting at zero; call that array cc.
- Walk the document in ctx[0], in wire order, stopping at each integer that is not an object key. For each such integer:
  - Let that integer be i.
  - Let idx be cc[i].
  - Replace the current item with ctx[i][idx].
  - Increment cc[i].
- Return ctx[0].
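A minimal decoder sketch of the above in Python, assuming per-message zlib-wrapped DEFLATE and a transport that has already delivered the NUM_CONTEXTS messages as a list of byte strings. A real implementation would keep one decompressor per context alive across messages, as implied by the Contexts section.

```python
import json
import zlib


def decode_message(frames: list[bytes]):
    """Decode one message; `frames` holds NUM_CONTEXTS transport messages in context order."""
    # Decompress every context except the last, which is sent uncompressed.
    ctx = [json.loads(zlib.decompress(f)) for f in frames[:-1]]
    ctx.append(json.loads(frames[-1]))

    # One counter per context: how many items we have consumed from it so far.
    cc = [0] * len(ctx)

    def resolve(value):
        # Integers that are not object keys are placeholders naming a context.
        # (The bool check is defensive: bool is a subclass of int in Python.)
        if isinstance(value, int) and not isinstance(value, bool):
            i = value
            idx = cc[i]
            cc[i] += 1
            return ctx[i][idx]
        if isinstance(value, list):
            return [resolve(v) for v in value]
        if isinstance(value, dict):
            # Keys stay in context 0; only the values are placeholders.
            return {k: resolve(v) for k, v in value.items()}
        return value

    return resolve(ctx[0])
```

Called as decode_message(frames) with the frames straight off the transport, it returns the reconstructed JSON value.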
Websocket binding
Disable Websocket's own compression. Transport messages in the above are websocket
messages. The subprotocol is net.berigora.jmse.1
Negotiation takes the form of the client sending a text websocket message like:
{"type": "offer","encodings": ["DEFLATE"],"max_up_contexts": 10,"max_down_contexts": 10}
The server responds (also text) like:
{"type": "accept","encoding": "DEFLATE","up_contexts": 3,"down_contexts": 3}
In an offer, the encodings array specifies which compression formats the client is able to use. max_up_contexts and max_down_contexts specify the maximum number of contexts the client is willing to use for encoding and decoding respectively.
In an accept, the server indicates what parameters it has actually accepted. up still means encoded by the client and decoded by the server, and down the opposite direction. If the server is unable to fit within the requested parameters, it should send an error:
{"v": 1,"type": "error","error": "not_enough_contexts"}
Errors:
- not_enough_contexts: Server's encoder requires more contexts than the client is willing to accept.
- unsupported_encoding: None of the encodings the client is willing to use are supported by the server.
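To make the negotiation concrete, here is a client-side sketch using the third-party Python websockets library. The URL is illustrative, the offer matches the example above, and compression=None turns off the websocket layer's own compression as required.

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def negotiate(url: str = "wss://example.org/jmse"):
    async with websockets.connect(
        url,
        subprotocols=["net.berigora.jmse.1"],
        compression=None,  # the binding requires websocket compression to be off
    ) as ws:
        # Offer: which encodings we can use, and how many contexts we accept.
        await ws.send(json.dumps({
            "type": "offer",
            "encodings": ["DEFLATE"],
            "max_up_contexts": 10,
            "max_down_contexts": 10,
        }))

        reply = json.loads(await ws.recv())
        if reply.get("type") != "accept":
            raise RuntimeError(f"negotiation failed: {reply}")
        return reply  # contains "encoding", "up_contexts", "down_contexts"


# asyncio.run(negotiate())
```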
Security considerations
A server that blindly uses max_up_contexts or max_down_contexts as the actual values could be made to exhaust its memory. Incautious choice of encodings will make this problem much worse.
The obvious implementation—all values in one context—is vulnerable to compression oracle attacks. This can be mitigated by an encoder that uses a separate context for sensitive information.
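As a sketch of that mitigation (the key names, the choice of four contexts, and the per-message compression are my own illustration, not part of the format): an encoder could route values found under designated sensitive keys into their own context, so they never share a DEFLATE window with attacker-influenced data.

```python
import json
import zlib

SENSITIVE_KEYS = {"access_token", "password"}  # illustrative choice


def encode_message(doc, num_contexts: int = 4):
    """Split doc into contexts: 0 = structure, 1 = ordinary values,
    2 = sensitive values, last = incompressible values (unused here).
    Per-message zlib DEFLATE; a real encoder would keep a streaming
    compressor per context."""
    values = [[] for _ in range(num_contexts)]  # values[0] stays empty

    def strip(value, sensitive=False):
        if isinstance(value, dict):
            return {k: strip(v, sensitive or k in SENSITIVE_KEYS)
                    for k, v in value.items()}
        if isinstance(value, list):
            return [strip(v, sensitive) for v in value]
        target = 2 if sensitive else 1
        values[target].append(value)
        return target  # integer placeholder naming the context

    structure = strip(doc)
    frames = [zlib.compress(json.dumps(structure).encode())]
    frames += [zlib.compress(json.dumps(v).encode()) for v in values[1:-1]]
    frames.append(json.dumps(values[-1]).encode())  # last context: uncompressed
    return frames
```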
There are a few obvious shortcomings in this: no next-protocol field (à la ALPN in TLS or Sec-WebSocket-Protocol in websocket); the negotiation adds an RTT; any protocol that uses JSON objects as dictionaries will find the keys stuck in context 0 rather than someplace more sensible, potentially worsening compression (oddly enough, Matrix is one of these). And the encoder and decoder have to buffer the entirety of large messages!
However, it does have the advantage of being a very simple-to-implement first attempt, enough to show whether this is a sensible approach. I do have reason to believe it is, as EXI does something similar, including the “buffer the whole message” problem. It's actually EXI4JSON I have to beat, preferably while being much simpler, or at least confining the complexity to very widely available parts.
¹ CBOR doesn't magically replace the bulkiest parts of your data with a more compact representation. In fact, strings can get bigger by being encoded as a whole collection of smaller strings. Any string longer than 23 bytes doesn't shrink either, except by the replacement of escapes with literal bytes. Arrays and objects do shrink, since the comma and colon separators are implied, but those characters make up only a small fraction of the representation in the first place.