Re: Is .zip compression lossless?

From: Michael Wojcik (mwojcik_at_newsguy.com)
Date: 03/15/04


Date: 15 Mar 2004 17:28:38 GMT


In article <405505B2.D5A404CA@yahoo.com>, CBFalconer <cbfalconer@yahoo.com> writes:
>
> If your objective is maximal compression for archival storage,
> look into bzip2. I believe it has the highest compression of
> anything available today.

It appears that PPM (Prediction by Partial Match, which AIUI uses
Markov chains to build a statistical model of the uncompressed data)
generally outperforms BWT (Burrows-Wheeler Transform) implementations,
such as bzip2, in terms of total size reduction for arbitrary input
data. And it looks like CTW (Context Tree Weighting) may beat PPM
algorithms.
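For anyone unfamiliar with the BWT side of that comparison, here is a minimal Python sketch of the transform. It assumes a NUL character never appears in the input, so NUL can serve as an end-of-string sentinel. The function names are my own, not bzip2's. The sketch sorts every rotation naively, which is far too slow for real data. bzip2 itself applies the transform to large blocks with a much faster sort, then follows it with move-to-front and Huffman coding stages.

```python
def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler Transform: sort all rotations of s+sentinel
    and return the last column. O(n^2 log n); illustration only."""
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    s = s + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last, sentinel="\0"):
    """Naive inverse: repeatedly prepend the last column and re-sort,
    rebuilding the sorted rotation table one column at a time."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


if __name__ == "__main__":
    out = bwt("banana")
    print(repr(out))          # repeated characters end up adjacent
    print(inverse_bwt(out))   # the transform is fully reversible
```

The transform doesn't shrink anything by itself. It only reorders characters so that repeats cluster together, which the later bzip2 stages then exploit.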

Additional pre-processing can improve things further for some kinds
of input (e.g. most large text corpora). LIPT (Length Index Preserving
Transform), a kind of star encoding, resulted in ~5% improvements for
BWT and PPM in one set of tests, for example.
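The exact LIPT codeword format isn't described here, so the following Python sketch shows only the general idea of star encoding, using a hypothetical scheme of my own. Each dictionary word is replaced by a codeword of the same length, made of '*' characters plus a few letters that distinguish words of equal length. The sketch assumes the input contains no '*' characters and that dictionary words are lowercase ASCII. Words not in the dictionary pass through unchanged. The intent is a reversible transform that turns many different words into highly repetitive strings, which a BWT or PPM compressor may then model more easily.

```python
import re


def build_codebook(words):
    """Hypothetical star-encoding codebook (NOT the published LIPT format):
    within each word length, the i-th word gets '*' padding followed by i
    written in base 26 with 'a'..'z' as digits (i = 0 gives all stars)."""
    by_length = {}
    for w in dict.fromkeys(words):  # de-duplicate, keep order
        by_length.setdefault(len(w), []).append(w)
    codebook = {}
    for n, group in by_length.items():
        for i, w in enumerate(group):
            digits, k = "", i
            while k > 0:
                digits = chr(ord("a") + k % 26) + digits
                k //= 26
            if len(digits) >= n:
                break  # no room left for a '*'; remaining words pass through
            codebook[w] = "*" * (n - len(digits)) + digits
    return codebook


def star_encode(text, codebook):
    # Replace each maximal run of ASCII letters that is a dictionary word.
    # Assumes the input contains no '*' characters.
    return re.sub(r"[A-Za-z]+",
                  lambda m: codebook.get(m.group(0), m.group(0)), text)


def star_decode(encoded, codebook):
    # Every maximal run of '*'/lowercase letters containing a '*' is a codeword.
    reverse = {code: word for word, code in codebook.items()}
    return re.sub(r"[*a-z]*\*[*a-z]*", lambda m: reverse[m.group(0)], encoded)


if __name__ == "__main__":
    cb = build_codebook(["the", "and", "fox", "dog", "quick", "brown"])
    text = "The quick brown fox jumps over the lazy dog."
    enc = star_encode(text, cb)
    print(enc)
    print(star_decode(enc, cb) == text)
```

Because codewords keep the original word lengths, the encoded text has exactly the same length as the input. Any gain comes from the compressor seeing more repetition, not from the transform itself.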

But bzip2 is convenient, free, widely used, and apparently robust, so
for most people it's a fine solution. The relatively small advantages
that a few other schemes have over it won't matter unless you're
compressing a *lot* of data, and it's data that the other schemes
actually compress better (e.g. plain text).
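On the original question of losslessness: Python's standard zlib module implements DEFLATE, the method most .zip archives use (here in a zlib wrapper, not a .zip container), and the bz2 module implements bzip2. A quick sketch like the following round-trips a sample through both, confirms the bytes come back identical, and reports compressed sizes. The function name and sample text are purely illustrative. Results on your own data will differ, so measure before choosing.

```python
import bz2
import zlib


def roundtrip_sizes(data: bytes) -> dict:
    """Compress `data` with zlib (DEFLATE) and bz2, check that each one
    decompresses to exactly the original bytes, and return the sizes."""
    codecs = {
        "zlib (DEFLATE)": (zlib.compress, zlib.decompress),
        "bz2 (BWT)": (bz2.compress, bz2.decompress),
    }
    sizes = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(packed)
    return sizes


if __name__ == "__main__":
    sample = b"The quick brown fox jumps over the lazy dog. " * 200
    for name, size in roundtrip_sizes(sample).items():
        print(f"{name:15} {size:6d} bytes (from {len(sample)})")
```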

-- 
Michael Wojcik                  michael.wojcik@microfocus.com
The guy who's fast in the mountain pass is the coolest.
-- _Initial D: Second Stage_

