Re: 8088 LZSS+RLE decompression: Can this be improved?



Terje Mathisen wrote:
If you can afford to spend some more space to store the "compressed"
image, then you can use a completely different approach, most often used
in the form called "compiled sprites".

That was my first idea, but the cost of literal expansion was too high.
For example:

a) Literal:
mov ax,1234h
stosw

Total = 6 bytes (4 code + 2 data)

Burning 4 bytes to indicate a (2-byte) literal didn't sit well with me.
By using an LZSS scheme, the only data expansion of a literal is a
single bit, since I pack 16 bits into a "control block" word. So
that's the primary reason I abandoned the idea.

b) Run:
mov ax,2345h
mov cx,5
rep stosw

Total = 8 code bytes + 2 per word stored
2 words -> 12 -> 6/word
3 words -> 14 -> 4.7/word
4 words -> 16 -> 4/word
8 words -> 24 -> 3/word

c) Back reference:
lea si,[di-1234h]
mov cx,6
rep movsw

Total = 9 code bytes + 4 per word copied
4 words -> 25 -> 6.25/word
5 words -> 29 -> 5.80/word
9 words -> 45 -> 5/word

I.e. runs are good, back refs to previously generated data is pretty bad
due to the very expensive copying, while literals take 24 cycles/word.

Yeah, this is what I found as well -- that runs are the only thing to
truly benefit from this. I have no doubt that I will write an RLE-only
compressor that outputs x86 code, but whenever match/offsets or
literals are involved, the size expansion cost is too great.
Considering that "pure" LZSS (where the tokens are not run through a
statistical coder) isn't terribly efficient to begin with --
*especially* since my tokens are word-aligned instead of byte-aligned
-- I couldn't justify it.

.