Re: Interplatform (interprocess, interlanguage) communication
- From: BGB <cr88192@xxxxxxxxxxx>
- Date: Wed, 08 Feb 2012 00:55:53 -0700
On 2/7/2012 6:31 PM, Martin Gregorie wrote:
> On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:
>> in general, I agree (sockets generally make the most sense), although
>> there are cases where file-based communications can make sense, although
>> probably not in the form as described in the OP.
> Yes, for small amounts of data or message passing between processes I
> tend to like sockets - as others have said, the fact that they are
> agnostic about the location of the communicating processes is often very
> useful.
yep.
>> usually, for passing messages over sockets, I have used "compact"
>> specialized binary formats,
> Yep. ASN.1 has to be about the most compact way of encoding structured,
> multi-field messages with XML occupying the other end of the scale.
I disagree partly WRT ASN.1:
a disadvantage of ASN.1 is that a lot of times it tends to use fixed-width integer encodings (and often sends structures in a "reasonably raw" form), whereas one can shave more bytes using a variable-length-integer scheme (why encode an integer in 4 bytes if you only need 1 byte in a given case?). it is also possible to shave more bytes if one makes the format use an adaptive/context-sensitive encoding scheme and maybe a variant of Huffman coding or similar (and possibly encode integer values using a similar scheme to that used in Deflate). it is in fact not particularly difficult to outperform ASN.1 in these regards.
granted, yes, custom Huffman-based data encodings are probably not "the norm" for network protocols (though some programs, such as the Quake 3 engine, have used Huffman-compressed network protocols).
there is also "arithmetic coding" and "range coding", but with these it is a lot harder to make the codec be acceptably fast (whereas there are some tricks to allow optimizing Huffman codecs).
in cases where I have used XML, I have typically used a custom binary XML variant, which can greatly reduce the overhead vs textual XML. in terms of saving bytes, my encoding can be more compact than WBXML or XML+Deflate, but is arguably more "esoteric", and as-is doesn't make use of schemas (it is instead a basic adaptive coding, and is vaguely similar to an LZ-Markov coding, attempting to exploit repeating patterns in tag-structure and similar via prediction, but like most adaptive codings initially transmits the data in a less dense form as it needs to build up a new context for each message). the coding in question doesn't use Huffman coding (for sake of simplicity, and because I don't always particularly need "maximum compactness"), but a Huffman-based variant could be created if needed.
there is also EXI, but I don't know how my encoding compares (EXI probably does better though, given that IIRC it uses binary universal codes and schemas).
for something else of mine I am using S-Expression based messages (currently between components within the same process), and had considered using a vaguely similar binary coding if/when I get around to it.
> That said, for short, list of fields messages I often use a CSV string
> preceded by an unsigned binary byte value containing the string length:
> this type of message is both easy to transfer, even if the connection
> wants to fragment it during transmission, and by having a printable text
> payload, its also convenient for trouble shooting.
yes, this is possible.
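a minimal Python sketch of the length-prefixed CSV message described in the quote above. the function names are made up for illustration, and it assumes ASCII fields that contain no commas and a payload of at most 255 bytes (the most a single length byte can describe):

```python
def pack_message(fields):
    """Join fields with commas and prefix the result with a 1-byte length."""
    payload = ",".join(fields).encode("ascii")
    if len(payload) > 255:
        raise ValueError("payload does not fit a 1-byte length prefix")
    return bytes([len(payload)]) + payload


def unpack_messages(buffer):
    """Split complete messages off the front of buffer.

    Returns (messages, remainder); remainder holds any partial message
    that is still waiting for more bytes from the socket.
    """
    messages = []
    pos = 0
    while pos < len(buffer):
        length = buffer[pos]
        end = pos + 1 + length
        if end > len(buffer):
            break  # fragmented message: keep the bytes and wait for the rest
        messages.append(buffer[pos + 1:end].decode("ascii").split(","))
        pos = end
    return messages, buffer[pos:]
```

the receiver keeps appending incoming bytes to the remainder and calls unpack_messages again, which is what makes fragmentation harmless.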
also possible would be a TLV encoding (say, possibly doing something similar to the Matroska MKV file-format).
say, the integer values are encoded something like (range, encoding):
0-127 0xxxxxxx
128-16383 10xxxxxx xxxxxxxx
16384-2097151 110xxxxx xxxxxxxx xxxxxxxx
2097152-... ...
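the table above can be sketched in Python roughly as follows. the rows past 3 bytes are elided in the table, so continuing the same pattern (one more leading 1 bit per extra byte, up to 8 bytes) is an assumption, as are the function names:

```python
def encode_vli(value):
    """Encode a non-negative integer; n-1 leading one bits mark an n-byte value."""
    if value < 0:
        raise ValueError("unsigned values only")
    for nbytes in range(1, 9):
        if value < 1 << (7 * nbytes):
            # (nbytes - 1) one bits followed by a zero, in the top of the first byte
            prefix = (0xFF << (9 - nbytes)) & 0xFF
            raw = value.to_bytes(nbytes, "big")
            return bytes([raw[0] | prefix]) + raw[1:]
    raise ValueError("value too large for this sketch")


def decode_vli(data, pos=0):
    """Decode one integer starting at data[pos]; return (value, next_pos)."""
    first = data[pos]
    nbytes, mask = 1, 0x80
    while first & mask:          # count the leading one bits
        nbytes += 1
        mask >>= 1
    if nbytes > 8:
        raise ValueError("unsupported length prefix")
    if pos + nbytes > len(data):
        raise ValueError("truncated integer")
    value = first & (mask - 1)   # payload bits left in the first byte
    for byte in data[pos + 1:pos + nbytes]:
        value = (value << 8) | byte
    return value, pos + nbytes
```

each extra byte adds 7 payload bits, so the boundaries fall at 128, 16384, 2097152, and so on.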
likewise, one can get a signed variant by folding the sign into the LSB, forming a pattern like: 0, -1, 1, -2, 2, ...
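a small sketch of that sign folding, with hypothetical function names. non-negative values map to even numbers and negative values to odd ones, which yields exactly the 0, -1, 1, -2, 2 ordering:

```python
def fold_sign(n):
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unfold_sign(u):
    """Inverse of fold_sign: the low bit says whether the value was negative."""
    return u >> 1 if u & 1 == 0 else -((u + 1) >> 1)
```

the folded value can then go through the same unsigned variable-length encoding, so small magnitudes of either sign stay at 1 byte.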
then, one defines tags as:
{
VLI tag;
VLI length;
byte data[length];
}
where tags can hold either data or messages (and, the smallest tag size needs 2 bytes, or 3 bytes if one has 1 byte of payload for the tag).
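a rough Python sketch of this tag layout. the VLI helpers use the byte layout from the table earlier (extended to 8 bytes, which is an assumption), and all function names are hypothetical:

```python
def put_vli(value):
    """Encode a non-negative integer; n-1 leading one bits mark an n-byte value."""
    nbytes = 1
    while value >= 1 << (7 * nbytes):
        nbytes += 1
    if nbytes > 8:
        raise ValueError("value too large for this sketch")
    raw = value.to_bytes(nbytes, "big")
    return bytes([raw[0] | ((0xFF << (9 - nbytes)) & 0xFF)]) + raw[1:]


def get_vli(data, pos):
    """Decode one integer at data[pos]; return (value, next_pos)."""
    first, nbytes, mask = data[pos], 1, 0x80
    while first & mask:
        nbytes, mask = nbytes + 1, mask >> 1
    if nbytes > 8:
        raise ValueError("unsupported length prefix")
    value = first & (mask - 1)
    for byte in data[pos + 1:pos + nbytes]:
        value = (value << 8) | byte
    return value, pos + nbytes


def encode_tag(tag, payload):
    """Emit {VLI tag; VLI length; byte data[length]}."""
    return put_vli(tag) + put_vli(len(payload)) + payload


def decode_tags(data):
    """Split a byte string into a list of (tag, payload) pairs."""
    tags, pos = [], 0
    while pos < len(data):
        tag, pos = get_vli(data, pos)
        length, pos = get_vli(data, pos)
        tags.append((tag, data[pos:pos + length]))
        pos += length
    return tags
```

a tag that holds a message is just a tag whose payload is itself a run of encoded tags, so calling decode_tags on the payload walks one level down.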
if the length is optional (presence depends on tag), one can reduce the typical tag size to 1 byte. likewise, tags can be combined with an MTF/MRU scheme such that any recently used tags have a small value (and can thus be encoded in a single byte). (many of my formats define tags inline, rather than relying on some large hard-coded tag-list).
more bytes can be saved if more of the message structure is known, say that not only does the tag encode a particular tag-type, but also may carry information about what follows after it (various combinations of attributes, and if it contains sub-tags and what they might be, ...).
if a new tag is defined, it is added to the MRU, but if not used frequently may move "backwards" (towards higher index numbers) or eventually be forgotten (falls off the end of the list).
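a minimal move-to-front sketch of such a tag list. the class name, the capacity, and the choice to return None for an unknown tag (where a real format would emit one of its reserved control tags plus the tag name) are all assumptions for illustration:

```python
class TagMRU:
    """Most-recently-used tag list; recently used tags get small indices."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.tags = []

    def index_for(self, name):
        """Return the current index of name, then move it to the front.

        Returns None if the tag is unknown (or was forgotten) and therefore
        has to be defined inline before it can be referenced by index.
        """
        try:
            idx = self.tags.index(name)
        except ValueError:
            idx = None
        else:
            del self.tags[idx]
        self.tags.insert(0, name)
        del self.tags[self.capacity:]   # least recently used tags fall off the end
        return idx
```

the encoder and decoder each keep their own TagMRU and apply the same updates in the same order, so the indices stay in sync without sending the list itself.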
note that some hard-coded tag-numbers will be needed for basic control purposes (encoding new/unfamiliar tags, ...).
a Huffman-based variant could be similar, just one may encode integers differently. an example scheme is to use a prefix value (Huffman coded) and a suffix bit pattern (similar to Deflate). a simpler (but less compact) scheme was used in JPEG, and IIRC I had before "compromised" between them by having the Huffman table be stored using Rice codes.
example (prefix range, value range, suffix bits):
0-15 0-15 0
16-23 16-31 1
24-31 32-63 2
32-39 64-127 3
40-47 128-255 4
48-55 256-511 5
56-63 512-1023 6
64-71 1024-2047 7
72-79 2048-4095 8
80-87 4096-8191 9
....
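a sketch of how a value could be split into the prefix symbol (the part that would be Huffman coded) and the raw suffix bits. it assumes each group of 8 prefixes covers one power-of-two range (16-31, 32-63, 64-127, ...) as in the first rows of the table; the function names are hypothetical, and the Huffman stage itself is left out:

```python
def split_value(value):
    """Return (prefix, suffix_bits, suffix) for a non-negative integer."""
    if value < 16:
        return value, 0, 0                  # small values are their own prefix
    nbits = value.bit_length() - 4          # 16-31 -> 1 bit, 32-63 -> 2 bits, ...
    # 8 prefixes per range, selected by the 3 bits just below the leading 1 bit
    prefix = 16 + (nbits - 1) * 8 + ((value >> nbits) & 7)
    return prefix, nbits, value & ((1 << nbits) - 1)


def join_value(prefix, suffix):
    """Rebuild the value from a prefix symbol and its suffix bits."""
    if prefix < 16:
        return prefix
    nbits = (prefix - 16) // 8 + 1
    top = 8 + (prefix - 16) % 8             # leading 1 bit plus the 3 selector bits
    return (top << nbits) | suffix
```

only the prefix goes through the entropy coder; the suffix bits are written as-is, which keeps the symbol alphabet small even for large values.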
also note that a nifty thing (also used in Deflate) is to compress the Huffman table itself using Huffman coding.
likewise, one can save a few bytes if the encoder is smart enough to recognize when tags encode numeric data (mostly specific to XML, with S-Expressions or similar one knows when they are dealing with numeric data).
likewise, one can encode floats as a pair of integer values (although floats present a few of their own complexities). one can also devise special encodings for things like numeric vectors, quaternions, ... if needed as well.
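one possible way to do the float-as-two-integers split, sketched in Python. the choice of a base-2 mantissa/exponent pair and the function names are assumptions; NaN, infinities and the sign of -0.0 are among the complexities mentioned above and are not handled here:

```python
import math


def float_to_ints(x):
    """Split a finite float into (mantissa, exponent), x == mantissa * 2**exponent."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinities would need their own escape codes")
    if x == 0.0:
        return 0, 0
    frac, exp = math.frexp(x)           # x == frac * 2**exp with 0.5 <= |frac| < 1
    mantissa = int(frac * (1 << 53))    # exact: a double carries 53 significant bits
    exp -= 53
    while mantissa % 2 == 0:            # drop trailing zero bits so simple values stay small
        mantissa //= 2
        exp += 1
    return mantissa, exp


def ints_to_float(mantissa, exponent):
    """Inverse of float_to_ints."""
    return math.ldexp(float(mantissa), exponent)
```

both integers can then be written as signed variable-length integers, so values like 1.0 or 0.5 cost only a couple of bytes.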
likewise, either an LZ77 or LZ-Markov scheme can be used for encoding strings (an example would be to use a fixed-size rotating window like in Deflate, and essentially using the same basic encoding for strings, albeit likely with the use of an "End-Of-String" marker).
say (range, meaning):
0-255: literal byte values
258: End Of String
259-321: LZ77 Run (encodes length, followed by window offset).
String encoding would be used, say, for encoding both literal text, and also for escaping things like tag and attribute names.
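a toy sketch of turning a string into such a symbol stream. the text does not say how symbols 259-321 map to run lengths or how large the window is, so the mapping used here (symbol 259 + k means a run of length 3 + k), the 4096-byte window, and the function names are all assumptions. the output is a plain list of integers; a real codec would entropy-code the symbols and write the offsets as variable-length integers:

```python
EOS = 258
RUN_BASE, RUN_LAST = 259, 321
MIN_RUN = 3                                  # assumption: shortest run worth encoding
MAX_RUN = MIN_RUN + (RUN_LAST - RUN_BASE)    # 63 run symbols -> lengths 3..65
WINDOW = 4096                                # assumption: window size is not given


def encode_string(data):
    """Greedy LZ77 over bytes: literals, (run symbol, offset) pairs, then EOS."""
    out, pos = [], 0
    while pos < len(data):
        best_len, best_off = 0, 0
        for start in range(max(0, pos - WINDOW), pos):
            length = 0
            while (length < MAX_RUN and pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - start
        if best_len >= MIN_RUN:
            out += [RUN_BASE + best_len - MIN_RUN, best_off]
            pos += best_len
        else:
            out.append(data[pos])
            pos += 1
    out.append(EOS)
    return out


def decode_string(symbols):
    """Rebuild the bytes from a symbol list produced by encode_string."""
    out = bytearray()
    it = iter(symbols)
    for sym in it:
        if sym == EOS:
            break
        if sym < 256:
            out.append(sym)
        elif RUN_BASE <= sym <= RUN_LAST:
            offset = next(it)
            for _ in range(sym - RUN_BASE + MIN_RUN):
                out.append(out[-offset])     # byte-by-byte copy allows overlapping runs
        else:
            raise ValueError("unknown symbol")
    return bytes(out)
```

the brute-force match search is only there to keep the sketch short; a practical encoder would use hash chains or similar.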
....
the main variability is mostly in terms of the type of payload being transmitted:
be it XML-based, S-Expression based, or potentially object-based (similar to either JSON, or a sort of "heap pickling" style system).
for most structured data, it shouldn't be needed to change the "fundamentals" too much. the main difference is between tree-structured and heap-like / graph-structured data, as graph-structured data is often better sent as a flat list of objects with a certain entry being a "root node" than as a tree (this can be accomplished either by building a list, or using an algorithm to detect and break-up cycles when needed).
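a sketch of the flat-list approach for graph-structured data, using Python dicts and lists as stand-ins for heap objects. the ("ref", index) / ("val", value) record shape and the function names are made up for illustration, the root is assumed to be a dict or list and ends up at index 0, and very deep structures would hit Python's recursion limit:

```python
def flatten(root):
    """Turn a graph of dicts/lists into a flat list of records; root is index 0."""
    index = {}      # id(object) -> position in the flat list
    flat = []

    def visit(obj):
        if not isinstance(obj, (dict, list)):
            return ("val", obj)              # plain value, stored inline
        if id(obj) not in index:
            slot = index[id(obj)] = len(flat)
            flat.append(None)                # reserve the slot first so cycles terminate
            if isinstance(obj, dict):
                flat[slot] = {k: visit(v) for k, v in obj.items()}
            else:
                flat[slot] = [visit(v) for v in obj]
        return ("ref", index[id(obj)])

    visit(root)
    return flat


def unflatten(flat):
    """Rebuild the object graph, including shared and cyclic references."""
    objs = [{} if isinstance(rec, dict) else [] for rec in flat]

    def resolve(item):
        kind, value = item
        return objs[value] if kind == "ref" else value

    for obj, rec in zip(objs, flat):
        if isinstance(rec, dict):
            obj.update((k, resolve(v)) for k, v in rec.items())
        else:
            obj.extend(resolve(v) for v in rec)
    return objs[0]
```

because every object is emitted once and referenced by index, cycles and shared sub-objects need no special casing on the wire.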
granted, for most use-cases something like this is likely to be overkill.
or such...