Re: perl multithreading performance



On 2008-08-28 17:49, Leon Timmermans <fawaka@xxxxxxxxx> wrote:
On Wed, 27 Aug 2008 14:25:32 -0700, dniq00 wrote:
Thanks for the link - trying to figure out whattahellisgoingon there :)
Looks like he basically mmaps the input and begins reading it starting
at different points. Thing is, I'm using <> as input, which can contain
hundreds of gigabytes of data, so I'm not sure how that's going to work
out...

Is your computer 64 or 32 bits? In the former case mmap will work for
such large files, but in the latter it won't, since a 32-bit address
space (4 GB at most) cannot map a file of that size in one piece.

Assuming <> is actually referring to a single file (if it isn't, you
can just process several files in parallel), the same approach can be
used even without mmap:

Fork $num_cpu worker processes. Let process number $i (counting from 0)
seek to position $i * $length / $num_cpu, where $length is the size of
the file, and search for the start of the next line. Then process lines
until you get to position ($i+1) * $length / $num_cpu. Finally, report
the result to the parent process and let it aggregate the results.
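
A rough sketch of that scheme, in case it helps. It is untested against
real data and makes several assumptions: the input is one regular,
seekable file; the per-line work is just counting lines (process_line is
a placeholder); and each worker reports a single number over a pipe. All
sub names here are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Placeholder for the real per-line work (assumption: we only count lines).
sub process_line {
    my ($line, $state) = @_;
    $state->{lines}++;
}

# Process every line that STARTS in the byte range [$start, $end).
sub process_chunk {
    my ($file, $start, $end) = @_;
    open my $fh, '<', $file or die "open $file: $!";
    my $state = { lines => 0 };
    if ($start > 0) {
        # Seek one byte early and discard up to the next newline. A line
        # starting exactly at $start is kept; a line straddling $start
        # belongs to the previous worker.
        seek $fh, $start - 1, 0 or die "seek: $!";
        my $discard = <$fh>;
    }
    while (tell($fh) < $end) {
        my $line = <$fh>;
        last unless defined $line;
        process_line($line, $state);
    }
    close $fh;
    return $state->{lines};
}

# Fork $num_cpu workers, one per chunk, and add up their results.
sub count_lines_parallel {
    my ($file, $num_cpu) = @_;
    my $length = (-s $file) || 0;
    my @workers;
    for my $i (0 .. $num_cpu - 1) {
        my $start = int($i * $length / $num_cpu);
        my $end   = int(($i + 1) * $length / $num_cpu);
        pipe(my $r, my $w) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work, report the result, leave without
            # running the parent's END blocks.
            close $r;
            print {$w} process_chunk($file, $start, $end), "\n";
            close $w;
            POSIX::_exit(0);
        }
        close $w;
        push @workers, [$pid, $r];
    }
    my $total = 0;
    for my $worker (@workers) {
        my ($pid, $r) = @$worker;
        my $result = <$r>;
        close $r;
        waitpid $pid, 0;
        die "worker $pid returned nothing\n" unless defined $result;
        $total += $result;
    }
    return $total;
}

# Run as a script: perl chunks.pl FILE [NUM_WORKERS]
if (!caller && @ARGV) {
    my ($file, $num_cpu) = @ARGV;
    print count_lines_parallel($file, $num_cpu || 4), "\n";
}
```

The seek to $start - 1 is what keeps the chunks from overlapping: every
line is handled by exactly the worker whose range contains its first
byte. For results bigger than a single number you would have to
serialize them over the pipe (Storable, for example) instead of printing
one line.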

hp