Re: Java text compression



Bonus question for OP: what is the size of the data sets, and how are they used? In particular, where are they stored?

Multi-terabyte in size, split across multiple machines. A single machine generally holds no more than a few hundred GB. One or two disks per machine, SATA, no RAID.

At compression time, the data is streamed from an external source, transformed in memory, and written to disk.
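The post does not say how the on-disk blocks are formed. One common way to get a layout you can seek into is to compress each block independently and keep an index of where each one starts. Below is a minimal sketch of that idea using `java.util.zip.Deflater`. The class name `BlockCompressingWriter`, the record-oriented `append` method and the block size are all hypothetical, not the poster's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Hypothetical writer: groups incoming records into blocks of roughly
// blockSize bytes and deflates each block independently, so a reader can
// later seek to one block and inflate it without touching its neighbours.
class BlockCompressingWriter {
    private final OutputStream out;
    private final int blockSize;                        // assumed tunable
    private final List<Long> index = new ArrayList<>(); // start offset of each block
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private long position = 0;                          // compressed bytes written so far

    BlockCompressingWriter(OutputStream out, int blockSize) {
        this.out = out;
        this.blockSize = blockSize;
    }

    // Appends one record; records are never split across blocks.
    void append(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        pending.write(bytes, 0, bytes.length);
        if (pending.size() >= blockSize) {
            flushBlock();
        }
    }

    // Flushes the last partial block and returns the index: entry i is the
    // start offset of block i, and the final entry is the end of the data.
    List<Long> finish() {
        flushBlock();
        List<Long> result = new ArrayList<>(index);
        result.add(position);
        return result;
    }

    private void flushBlock() {
        if (pending.size() == 0) {
            return;
        }
        byte[] compressed = deflate(pending.toByteArray());
        try {
            out.write(compressed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        index.add(position);
        position += compressed.length;
        pending.reset();
    }

    private static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED); // favour throughput while streaming
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        } finally {
            deflater.end();
        }
    }
}
```

Block size is the main knob. Smaller blocks mean less data to inflate per lookup. The trade-off is a worse compression ratio, because each independently deflated block starts with an empty dictionary.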

At decompression time, the app seeks to the particular block of text of interest and decompresses it. Seek time dominates decompression time, *except* when we do heavy caching, in which case decompression becomes the bottleneck. Storing the decompressed text in memory takes up too much space, so it has to be cached in compressed form.
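Caching blocks in compressed form and inflating them on every hit could look roughly like the sketch below. It assumes each block was deflated independently with `java.util.zip`. `CompressedBlockCache` and its byte budget are hypothetical. The `loader` argument stands in for whatever seek-and-read code fetches a block from disk. The cache is not thread-safe.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical LRU cache that keeps blocks compressed in memory and
// inflates them on every hit. Not thread-safe.
class CompressedBlockCache {
    private final long maxBytes;   // budget for the compressed bytes held
    private long currentBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<Long, byte[]> blocks = new LinkedHashMap<>(16, 0.75f, true);

    CompressedBlockCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns the text of a block, calling loader (e.g. seek + read) on a miss.
    String get(long blockId, LongFunction<byte[]> loader) {
        byte[] compressed = blocks.get(blockId);
        if (compressed == null) {
            compressed = loader.apply(blockId);
            blocks.put(blockId, compressed);
            currentBytes += compressed.length;
            evictIfOverBudget();
        }
        return inflate(compressed); // the CPU cost paid on every hit
    }

    // Drops least recently used blocks until the budget is respected.
    private void evictIfOverBudget() {
        Iterator<Map.Entry<Long, byte[]>> it = blocks.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<Long, byte[]> eldest = it.next();
            currentBytes -= eldest.getValue().length;
            it.remove();
        }
    }

    private static String inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
    }
}
```

This matches the trade-off described above. For a given amount of heap, the cache holds more text than an uncompressed cache would. The cost is that every hit pays for an inflate, which is why decompression speed becomes the bottleneck once most reads are served from memory.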



