Re: Determine the compressibility of a data buffer .. how to do this fast ?
From: gswork (gswork_at_mailcity.com)
Date: 10/15/03
- Next message: gswork: "Re: Computer Program help"
- Previous message: Chris Dollin: "Re: A syntax good for kids learning programming?"
- In reply to: SuperFly: "Determine the compressibility of a data buffer .. how to do this fast ?"
- Next in thread: Floris van den Berg: "Re: Determine the compressibility of a data buffer .. how to do this fast ?"
- Reply: Floris van den Berg: "Re: Determine the compressibility of a data buffer .. how to do this fast ?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 15 Oct 2003 01:11:26 -0700
SuperFly <mail@mail.com> wrote in message news:<38ooov4mabr0ijprkjfvnt8tm729eqg991@4ax.com>...
> Hi all,
>
> I need a fast way to determine the compressibility of a data buffer in
> memory without using an actual compression scheme. I think the best
> way to do this is to determine the entropy (so i was told), but maybe
> there are faster and/or better ways. I would really appreciate it if
> somebody could point me to some source/pseudo code or a good tutorial
> on how to do this.
You could run through the buffer doing a compression, say an RLE type
count, i.e. count the number of sequences of the same byte value (e.g.
if there's 1000 255's in a row). If you end up with a relatively
small number of long runs the buffer is likely to compress well, if
there'e lots of small runs it probably won't. Not sure that's going
to be much faster than just doing it though!
I made a predictive utility for files once, very crude but
surprisingly predictive of whether the file would compress well.
Without heed to file structure or run-lengths i looped through the
buffer reading the value of each byte and 'counted' that in a 'counter
array'. If, in the end, there was a fairly even distribution of
values then the file would probably not compress well, if there wasn't
(i.e. there were 50,000 zeros and 10 of 255) then it probably would.
I've no doubt there are plenty of reasons not to use such a crude
approach, but it was good for discerning things like jpg files, which
generally won't compress much more than they already are and pdf
files, which sometimes do, sometimes don't, and bmp files, which
normally compress well.
the main part would go, from memory, something like:
for n:=1 to BufferLength do begin
b:= ByteArray[n] // assign the value of the byte from the buffer to
b, a byte
bCounter[b]:= bCounter[b] + 1 //increment count of each byte value
end;
bCounter is an array of integers from 0 to 255, they all start at 0
and increment by one each time the corresponding byte value is
encountered. In the end you have knowledge only of how many of each
value there is, and that seems to be be ok, and quite fast. You can
choose a number of ways to analyse that. I'm sure there are much
cleverer ways of predicting compressibility, and faster too. Infact,
i'll gladly benefit from such.
- Next message: gswork: "Re: Computer Program help"
- Previous message: Chris Dollin: "Re: A syntax good for kids learning programming?"
- In reply to: SuperFly: "Determine the compressibility of a data buffer .. how to do this fast ?"
- Next in thread: Floris van den Berg: "Re: Determine the compressibility of a data buffer .. how to do this fast ?"
- Reply: Floris van den Berg: "Re: Determine the compressibility of a data buffer .. how to do this fast ?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|