Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question)



On Oct 24, 12:22 pm, rem6...@xxxxxxxxx (Robert Maas, see http://tinyurl.com/uh3t)
wrote:
From: mike3 <mike4...@xxxxxxxxx>
Alright, I've tried implementing the merge transposition algorithm.

OK, the first question is whether you did it entire in actual RAM,
or did it entirely in virtual RAM with disk as backing store for
pages of RAM, or built merge runs using actual RAM then merged
using disk, or what? It makes a whale of a difference.


I was doing the merge completely on disk.

For a starter as to my overall advice how best to do it, I need to
know how much physical (actual) RAM you have available for data
(discount anything already used for system and application), and
how much virtual memory you are allowed, and how many disk or tape
drives you have available for merge runs.


I have two drives. The matrix data is stored on one. I can't use
the second drive since it's not formatted for Linux use.

But it still accesses the disk very heavily

What does that mean? You explicitly did an external merge, so you
control the disk access, or you tried to use virtual memory to do
the merge and swapping is using the disk? If you did an explicit
external merge, did you put each merge run on a separate drive,
such that each drive is accessing sequentually most of the time,
and all you hear is a slow click click click from each drive as it
moves to the next cylender of that disk? Or did you try to use a
single drive for all data, whereby the disk is constantly seeking
back and forth between the various input runs and the single output
run making a horrible vibrating noise.


I tried using a single drive. That explains it. If I only put the
"rows" I wanted to merge in the merge chunks on different
drives... However like I said, I need the transpose done on
one drive: trying to run the program on the other drive,
NTFS formatted, fails under Linux. I can't do it on two.

some merge passes take like 4x longer than the quickest ones,
with very heavy disk access.

That sounds like you're trying to do the whole job in virtual
memory, with swapping causing that disk activity. Is that correct?
If so, don't do it that way!!!


That's right -- all on disk. However what about the large
merge passes at the end where the size of the data
streams being merged exceeds the size of the RAM
buffers? Even if I were to put the first few passes in RAM,
you still get those other passes after it and that "awful
vibrating noise" as you called it.

I don't think there's any more "textbook" a definition of
"thrashing" than that -- it's wasting time accessing the resource
(the disk), so it's "thrashing" the resource!

Yeah, if you're using virtual memory for a task whose effective
working set is large than your real memory, it'll thrash, every
time, you can be sure it'll thrash, you just can't be sure how
*much* it'll thrash because of race conditions between 'chron' jobs
and other system utilties that swap in a system page at
unpredictable times messing up the nice mathematical behaviour of
the threshing for your task.

The test was done with a 4096 x 4096 transpose and 128 MB of
memory allocated to doing it(!).

How many bytes per single element of the matrix? From what you said
earlier, I'm assuming 4 bytes per element, is that correct?
But from what you say now, 4096*4096=16777216 elements,
128MB=128*1024*1024=134217728 bytes total, that's
134217728/16777216=8 bytes per element, twice what I guessed from
your earlier remarks. Please clarify which is the case.
Maybe you mean you have 64 MB to actually hold the array and
another 64 MB to hold a copy of it during merging, merge back and
forth between two blocks of RAM, 64 MB total runs being merged
and 64 MB total larger runs result of merge with each pass?
That makes sense. Please confirm that's what you mean.


That is correct. I have 64 MB for one buffer, 64 MB for the
merge algorithm's scratchpad.

But the key question: 128 MB of actual RAM, or 128 MB of virtual memory,
with only some lesser amount of actual RAM available.


I've actually got more than 128 MB.

On a good run it'll get maybe 13 sec. total, on a bad one it'll
get 22.

Hey, your computer seems to be quite a bit faster than I estimated.
Is that 13 seconds total per merge run, or per complete sort (25
2-way merge passes from single-element merge runs at the start), or
per complete matrix operation (3 complete sorts)?


That's 13 to 22 seconds total for the entire transposition: 12 2-way
merge
passes, with the first pass doing 2048 merges of pairs of data of
length
4096 elements (corresponding to one row of the matrix), second pass
doing 1024 merges of pairs of data of length 8192 elements, third
doing 512 merges of pairs of data of length 16384 els, etc.

Since those pi programs I've seen (QuickPi, PiFast, etc.) do not
thrash the hard drive anywhere near this much there's _got_ to be
a way to do this stuff without thrashing, I just can't figure out
what it is!

- Buy enough actual RAM.

Why should I spend money when I can optimize the program? Like
I said, Quickpi, etc. do this stuff a lot faster than my program. So
it would seem a waste to just spend lots of money.

Anyway, I just noticed I hadn't had my program set for the entire
maximum amount of physical RAM in my system. Now that I
look at the prog some more, I could have done the first sets of
passes in physical RAM, but wanted to time the disk I/O. In
fact I probably could have done it all in RAM. Yay! I was
using less physical RAM since PiFast, the program I want at
least 30% of the speed of, does a 64 meg pi calculation
with around 200 MB of RAM if I recall correctly. So I wanted
to see if I could make my program use as little of the RAM as
Pifast did. You don't know how much I'd love to have a peek
at the source code of that program! :)

I know, I should've divulged all these details. I guess I wasn't
thinking. Sorry :(

- Localize memory use more carefully.
- Make merge runs within your smaller amount of *actual* RAM, then
do an external disk merge using eight separate external drives,
four drives for the separate sets of input merge runs, to do a 4-way
merge, and four drives for artificially distributed set of
output merge runs. These don't have to be 8 dedicated drives
since you're using only a tiny portion of each, but you really do
need completely separate heads which usually requires completely
separate disk drives. How much does a 1 GB drive cost nowadays?
How much discount for 8 of them in group/batch/lot purchase?


I would not have room in my case for 8 drives nor the controllers
on the mobo to handle it. I'd have to get more IDE controllers (ie.
cards.) Why throw all this money to get lots of hardware when
you can write a better program (Pifast, etc. prove it's possible),
which costs the utterly unbeatable price of $0? It just seems like
such an extreme thing to do. And just to compute pi no less!

Here's the code:
http://www.mediafire.com/download.php?0y3amgzhde5

That requires setting up an account for each person who wants to
view your files, which is absurd!! Please copy your files to
5gbfree where it takes one account to store the files but viewing
is public with no account needed.

WHAT? It downloads just fine for me even when I'm not logged
into my account. You can set the files to "public" access mode
and they will indeed be just that: public.

Ooooh, baby. It looks like I posted a link to the wrong file :(
That's my disk math package, not my transpose thingy. Here:

http://www.mediafire.com/?60c0cx4o6gn

.



Relevant Pages

  • DiskBoost 1.0
    ... Speed up hard drive operations with a virtual RAM disk! ... DiskBoost supports both volatile and non-volatile ... RAM drives. ...
    (comp.software.shareware.announce)
  • Re: FFT Multiplication on Disk?
    ... efficient algorithm to do those? ... I want to minimize the amount of disk operations, ... Buy more RAM. ... RPM drives can achieve. ...
    (comp.programming)
  • Re: Win XP takes more space on SATA than IDE
    ... > Accessories, System Tools, Disk CleanUp, More Options, System Restore and ... > remove all but the latest System Restore points? ... >> the RAM which is different in both PCs. ... >>> Are the drives formatted as FAT32 or NTFS? ...
    (microsoft.public.windowsxp.help_and_support)
  • Re: SortMerge avoids thrashing (was: Baileys "two pass" FFT algorithm question)
    ... OK, the first question is whether you did it entire in actual RAM, ... using disk, or what? ... how much virtual memory you are allowed, and how many disk or tape ... drives you have available for merge runs. ...
    (comp.programming)
  • Re: SortMerge avoids thrashing (was: Baileys "two pass" FFT algorithm question)
    ... you had enough RAM to do the whole task in RAM. ... data sequentially from disk, do FFT in RAM, sequentially dump FFT ... you read a whole disk block into RAM, ... I've only got two hard drives, ...
    (comp.programming)