Re: dual core and c
- From: roberson@xxxxxxxxxxxxxxxxxx (Walter Roberson)
- Date: Fri, 29 Feb 2008 18:49:15 +0000 (UTC)
In article <4f2cd564-c9e1-412c-8c01-778b4c581d7f@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
Paul Hsieh <websnarf@xxxxxxxxx> wrote:
To get more performance, you will need 1) To paralellize the algorithm
(its not obvious that can even be done for zip, which is LZ based
compression/decompression)
[OT]
It turns out that parallelizing LZ compression or decompression is
not difficult. The LZ algorithms reset the code dictionary from
time to time, in order to adapt to changes in the data. This is
done by emitting a "flush" token and flushing out all pending
encodings, starting the encoded data again on a byte boundary.
This is the same token that is used at end of file. (A consequence
of this is that you can simply concatenate together several LZ
encoded files and the decompression of the result would be the same
as if the original files had been appended together before compression.)
Different LZ implementations use different strategies for deciding
-when- to flush, but this flushing behaviour is a fairly consistant
property of the family of algorithms.
Thus, in order to parallelize LZ compression, all that has to be
done is to split the file up into pieces, do an LZ compression
independantly on the pieces, and concatenate the encoded pieces
together to form the output. There may be a small loss of encoding
efficiency in doing so (corresponding to the extra output from
flushing at each boundary and rebuilding the data dictionary); on
the other hand, it could also happen that the result was smaller than
it would otherwise be, if it happened that the boundaries chosen
lay near places where the statistical properties changed noticably
and the LZ algorithm would have otherwise delayed resetting the
dictionary until it was more sure that a dictionary reset was
warranted.
For parallel decompression, you have to find the reset tokens
in the encoded data, split the data streams there, and decode
each chunk, concatenating the results.
--
"We may gather out of history a policy, by the comparison and
application of other men's forepassed miseries with our own like
errors and ill deservings." -- Sir Walter Raleigh
.
- References:
- dual core and c
- From: kerravon
- Re: dual core and c
- From: Paul Hsieh
- dual core and c
- Prev by Date: calculating length of an substring
- Next by Date: Re: calculating length of an substring
- Previous by thread: Re: dual core and c
- Next by thread: Re: dual core and c
- Index(es):
Relevant Pages
|