Re: dual core and c



In article <4f2cd564-c9e1-412c-8c01-778b4c581d7f@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Paul Hsieh <websnarf@xxxxxxxxx> wrote:

To get more performance, you will need 1) To paralellize the algorithm
(its not obvious that can even be done for zip, which is LZ based
compression/decompression)

[OT]

It turns out that parallelizing LZ compression or decompression is
not difficult. The LZ algorithms reset the code dictionary from
time to time, in order to adapt to changes in the data. This is
done by emitting a "flush" token and flushing out all pending
encodings, starting the encoded data again on a byte boundary.
This is the same token that is used at end of file. (A consequence
of this is that you can simply concatenate together several LZ
encoded files and the decompression of the result would be the same
as if the original files had been appended together before compression.)
Different LZ implementations use different strategies for deciding
-when- to flush, but this flushing behaviour is a fairly consistant
property of the family of algorithms.

Thus, in order to parallelize LZ compression, all that has to be
done is to split the file up into pieces, do an LZ compression
independantly on the pieces, and concatenate the encoded pieces
together to form the output. There may be a small loss of encoding
efficiency in doing so (corresponding to the extra output from
flushing at each boundary and rebuilding the data dictionary); on
the other hand, it could also happen that the result was smaller than
it would otherwise be, if it happened that the boundaries chosen
lay near places where the statistical properties changed noticably
and the LZ algorithm would have otherwise delayed resetting the
dictionary until it was more sure that a dictionary reset was
warranted.

For parallel decompression, you have to find the reset tokens
in the encoded data, split the data streams there, and decode
each chunk, concatenating the results.
--
"We may gather out of history a policy, by the comparison and
application of other men's forepassed miseries with our own like
errors and ill deservings." -- Sir Walter Raleigh
.



Relevant Pages

  • bzip2 concatenated files
    ... I have an application that writes out many small gzip files. ... documentation specifically states you can concatenate two or more gzip ... I am now planning to implement bzip2 compression, ... concatenated bzip2 files are still valid bzip2 files. ...
    (comp.compression)
  • Re: Universal Similarity Metric
    ... concatenate A and B and run it through a compressor, ... establish whether a recently-discovered work by a famous composer ... Bear in mind the trap of dependence on compression technique. ...
    (comp.programming)