Re: compression API available in Java & C++?



On 2005-12-07, Bjorn Abelli penned:
>
> "Monique Y. Mudama" wrote...
>> On 2005-12-07, Chris Smith penned:
>
>>> 3. Out-of-stream dictionary. If you can devise a dictionary of
>>> pre-known patterns that are likely to occur frequently in the
>>> data, then you could pre-build that dictionary and devise a
>>> compression scheme that assumes them.
>>
>> I would have to think about this. I'm pretty sure the data is only
>> ASCII characters, and of those only the common ones that show up on
>> a US keyboard; is that predictable enough for a dictionary
>> approach?
>
> If the characters are somewhat evenly distributed, I don't think you
> could gain so much by what I guess Chris meant, as it would mean
> that the number of "patterns" would be just about the number of keys
> on the keyboard, hence almost as many as can fit into a byte
> itself...

Yeah, that's kind of what I came to realize earlier tonight ...

> Although, there could be some more "compression" made if you're
> *really* sure that it's only ASCII.
>
> Then you can simply drop the unused high bit (make each character
> 7 bits) and pack 12 characters into 11 bytes instead of 12. But then
> you've only gained one byte.
>
> If you can investigate what the data really consists of, you
> could probably build an even better mapping table to serve as a
> kind of dictionary.
>
> Let's say that the data sent is only alphanumerical characters (0-9,
> a-z and A-Z), which "could" be the case, then you can make your own
> schema for mapping each character to 6 bits instead of 8. After
> packing those 12 characters in that schema, you would end up with
> sending 9 bytes instead of 12.
>
> There would actually be room to map 2 additional characters
> (6 bits give 64 codes, and 0-9, a-z and A-Z use 62 of them)...
> ;-)

We also have some punctuation to deal with, so we'd probably end up
needing the full 6 bits, or maybe even all the way to seven. Bleh!
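
For reference, the fixed-width packing discussed above can be sketched
roughly as follows. This is only an illustration, not code from anyone in
the thread: the class name FixedWidthPacker, the methods pack and unpack,
and the 64-symbol ALPHABET (digits, letters, space and period) are all
made up for the example, and a real alphabet would have to come from
looking at the actual data.

```java
// Sketch: pack symbol codes into a fixed number of bits each.
// All names here are hypothetical; the alphabet is an assumed example.
class FixedWidthPacker {
    // Assumed 64-symbol alphabet: 0-9, a-z, A-Z, plus space and period.
    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ .";

    // Write each code using `bits` bits, most significant bit first.
    static byte[] pack(int[] codes, int bits) {
        byte[] out = new byte[(codes.length * bits + 7) / 8];
        int pos = 0; // current bit position in the output
        for (int code : codes) {
            if (code < 0 || code >= (1 << bits)) {
                throw new IllegalArgumentException("code out of range: " + code);
            }
            for (int b = bits - 1; b >= 0; b--, pos++) {
                if (((code >> b) & 1) != 0) {
                    out[pos / 8] |= (byte) (0x80 >> (pos % 8));
                }
            }
        }
        return out;
    }

    // Read back `count` codes of `bits` bits each.
    static int[] unpack(byte[] packed, int count, int bits) {
        int[] codes = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int code = 0;
            for (int b = 0; b < bits; b++, pos++) {
                int bit = (packed[pos / 8] >> (7 - pos % 8)) & 1;
                code = (code << 1) | bit;
            }
            codes[i] = code;
        }
        return codes;
    }

    public static void main(String[] args) {
        String text = "Hello World1"; // 12 characters, all in ALPHABET
        int[] six = new int[text.length()];
        int[] seven = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            six[i] = ALPHABET.indexOf(text.charAt(i)); // 6-bit table lookup
            seven[i] = text.charAt(i);                 // plain 7-bit ASCII
        }
        System.out.println(pack(six, 6).length + " bytes at 6 bits/char");
        System.out.println(pack(seven, 7).length + " bytes at 7 bits/char");
    }
}
```

For a 12-character record this gives 9 bytes at 6 bits per character and
11 bytes at 7 bits, matching the figures quoted above. The receiver has to
know the character count (or the record must be fixed-length), because the
padding bits in the last byte are otherwise indistinguishable from data.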

Thanks to you and everyone else who has made some suggestions. I am
going to think about it. It may be that our data just isn't
well-suited to compression; maybe I can come up with ways to
reorganize the data for compression, but I don't think that would be
well received because we already have apps coded to the existing
formats.

--
monique

Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html