Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question)



From: mike3 <mike4...@xxxxxxxxx>
Alright, I've tried implementing the merge transposition algorithm.

OK, the first question is whether you did it entire in actual RAM,
or did it entirely in virtual RAM with disk as backing store for
pages of RAM, or built merge runs using actual RAM then merged
using disk, or what? It makes a whale of a difference.

For a starter as to my overall advice how best to do it, I need to
know how much physical (actual) RAM you have available for data
(discount anything already used for system and application), and
how much virtual memory you are allowed, and how many disk or tape
drives you have available for merge runs.

But it still accesses the disk very heavily

What does that mean? You explicitly did an external merge, so you
control the disk access, or you tried to use virtual memory to do
the merge and swapping is using the disk? If you did an explicit
external merge, did you put each merge run on a separate drive,
such that each drive is accessing sequentually most of the time,
and all you hear is a slow click click click from each drive as it
moves to the next cylender of that disk? Or did you try to use a
single drive for all data, whereby the disk is constantly seeking
back and forth between the various input runs and the single output
run making a horrible vibrating noise.

some merge passes take like 4x longer than the quickest ones,
with very heavy disk access.

That sounds like you're trying to do the whole job in virtual
memory, with swapping causing that disk activity. Is that correct?
If so, don't do it that way!!!

I don't think there's any more "textbook" a definition of
"thrashing" than that -- it's wasting time accessing the resource
(the disk), so it's "thrashing" the resource!

Yeah, if you're using virtual memory for a task whose effective
working set is large than your real memory, it'll thrash, every
time, you can be sure it'll thrash, you just can't be sure how
*much* it'll thrash because of race conditions between 'chron' jobs
and other system utilties that swap in a system page at
unpredictable times messing up the nice mathematical behaviour of
the threshing for your task.

The test was done with a 4096 x 4096 transpose and 128 MB of
memory allocated to doing it(!).

How many bytes per single element of the matrix? From what you said
earlier, I'm assuming 4 bytes per element, is that correct?
But from what you say now, 4096*4096=16777216 elements,
128MB=128*1024*1024=134217728 bytes total, that's
134217728/16777216=8 bytes per element, twice what I guessed from
your earlier remarks. Please clarify which is the case.
Maybe you mean you have 64 MB to actually hold the array and
another 64 MB to hold a copy of it during merging, merge back and
forth between two blocks of RAM, 64 MB total runs being merged
and 64 MB total larger runs result of merge with each pass?
That makes sense. Please confirm that's what you mean.

But the key question: 128 MB of actual RAM, or 128 MB of virtual memory,
with only some lesser amount of actual RAM available.

On a good run it'll get maybe 13 sec. total, on a bad one it'll
get 22.

Hey, your computer seems to be quite a bit faster than I estimated.
Is that 13 seconds total per merge run, or per complete sort (25
2-way merge passes from single-element merge runs at the start), or
per complete matrix operation (3 complete sorts)?

Since those pi programs I've seen (QuickPi, PiFast, etc.) do not
thrash the hard drive anywhere near this much there's _got_ to be
a way to do this stuff without thrashing, I just can't figure out
what it is!

- Buy enough actual RAM.
- Localize memory use more carefully.
- Make merge runs within your smaller amount of *actual* RAM, then
do an external disk merge using eight separate external drives,
four drives for the separate sets of input merge runs, to do a 4-way
merge, and four drives for artificially distributed set of
output merge runs. These don't have to be 8 dedicated drives
since you're using only a tiny portion of each, but you really do
need completely separate heads which usually requires completely
separate disk drives. How much does a 1 GB drive cost nowadays?
How much discount for 8 of them in group/batch/lot purchase?

Here's the code:
http://www.mediafire.com/download.php?0y3amgzhde5

That requires setting up an account for each person who wants to
view your files, which is absurd!! Please copy your files to
5gbfree where it takes one account to store the files but viewing
is public with no account needed.
.



Relevant Pages

  • DiskBoost 1.0
    ... Speed up hard drive operations with a virtual RAM disk! ... DiskBoost supports both volatile and non-volatile ... RAM drives. ...
    (comp.software.shareware.announce)
  • Re: FFT Multiplication on Disk?
    ... efficient algorithm to do those? ... I want to minimize the amount of disk operations, ... Buy more RAM. ... RPM drives can achieve. ...
    (comp.programming)
  • Re: Win XP takes more space on SATA than IDE
    ... > Accessories, System Tools, Disk CleanUp, More Options, System Restore and ... > remove all but the latest System Restore points? ... >> the RAM which is different in both PCs. ... >>> Are the drives formatted as FAT32 or NTFS? ...
    (microsoft.public.windowsxp.help_and_support)
  • Re: SortMerge avoids thrashing (was: Baileys "two pass" FFT algorithm question)
    ... you had enough RAM to do the whole task in RAM. ... data sequentially from disk, do FFT in RAM, sequentially dump FFT ... you read a whole disk block into RAM, ... I've only got two hard drives, ...
    (comp.programming)
  • Re: file access via read/write vs mmap() interface
    ... Memory] can't support any file access with mmap() whose size is more ... using any virtual memory and you are not using any RAM, ... pool is allocated and the disk page loaded into the RAM. ...
    (comp.unix.programmer)