Re: Random data cannot be compressed



In article <13tbpulfs7tbvb7@xxxxxxxxxxxxxxxxxx>,
tim <TimJ@xxxxxxxxxxxx> wrote:

On Mon, 10 Mar 2008 10:27:07 -0800, Ron Garret wrote:

In article <13t9v8vlcivje0d@xxxxxxxxxxxxxxxxxx>,
tim <TimJ@xxxxxxxxxxxx> wrote:

Below is an algorithm that will compress *25%* of all randomly generated
bit strings of length > 1.

I think I have an algorithm that can get close to 50% but I haven't
written it out yet.

Of course 25% is a considerable improvement on the figure of
0.00000....00001 quoted above. Such strings are likely to appear well
before the heat death of the universe.

00XXX... -> 0XXX...
01XXX... -> 101XXX...
1XXXX... -> 11XXXX...

Tested code follows:


No. You are cheating in two ways.

I did not claim to compress random strings on average. In fact I clearly
stated in the *same* posting that is impossible. At best you will break
even.

Nor is this really cheating. The algorithm is fixed in size and can
compressive any of the 25% of random strings that meet its criteria.

It is cheating. You are cherry-picking which strings "count". The
result is that the data you are successfully compressing is no longer
random.

My point is that algorithms exist that, when given a completely random
string, can produce a shorter string in a large percentage of cases. This
was supposed to be impossible.

No, that is not what is impossible. What is impossible is producing an
algorithm so that the amount of compression you get when you apply the
algorithm to random data is more than the length of the algorithm. You
can choose how much random data you want to apply it to, but having
chosen the amount you do not get to cherry-pick the data you are
expected to compress. If you do then it's no longer random.

The only caveat is that it is possible to get lucky and happen to get a
random string that hits the sweet spot for your algorithm, and be able
to compress *that* random string by more than the length of your
algorithm. But the probability of that happening is vanishingly small.

rg
.



Relevant Pages

  • Re: Best Job Skill --> .NET or Java
    ... strings, ... But the same basic brute-force algorithm was ... It compiled a histogram of trigrams, ... finds one random trigram that is unique, it expands that one, ...
    (comp.programming)
  • Re: How to efficiently determine if a string contains any one of many strings
    ... If you are looking to apply an algorithm similar to determining what is ... the algorithm that is used in most heuristic spam filters. ... kinds of classifications lend themselves to searches for string literals: ... Of course, assuming more input strings to match, you'd have a lot more ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Data mining algorithm
    ... the "best" descriptive strings for data. ... That sounds like the first step of a compression algorithm. ... In a similar data set this column will show similar repetativeness at ...
    (comp.programming)
  • Re: Reversing a number
    ... )> Lemme ask you a question: Suppose I have an algorithm that solves the ... )> Return the base-10 digits of the input number, ... limited the problem space to specific strings and use numbers to ... I'm not working with numerals. ...
    (comp.programming)
  • Re: GA ver. Parallel tempering
    ... the bulk of processing time will be spent in evaluating potential ... algorithm itself [of course this may not be supportable for search spaces ... sense that evaluations are of necessity performed serially. ... numbers of strings in those regions. ...
    (comp.ai.genetic)