Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question)



Alert: For weeks, right up through Friday aftermoon, Google Groups
was showing search results only through Oct.13, so I had no way to
find more recent articles that mentionned my name, such as
followups to stuff I previously posted. Just Friday night GG
finally enabled me to find:
Date: Wed, 24 Oct 2007 14:35:00 -0700
From: mike3 <mike4...@xxxxxxxxx>
Since Friday I've been trying to catch up with backlog.
I'm less than a week behind at the moment.
I was doing the merge completely on disk.

Then I'm confused. From what you said elsewhere, it sounded like
you had enough RAM to do the whole task in RAM. Just load original
data sequentially from disk, do FFT in RAM, sequentially dump FFT
result to disk. So why would you do it any other way?

I have two drives. The matrix data is stored on one. I can't use
the second drive since it's not formatted for Linux use.

If your one usable drive has only a single seek mechanism (a single
set of heads that move all as a single unit), then you don't want
to do disk-to-disk merge (unless the whole dataset and all
temporary files reside in a single cylinder), because even if you
are reading each input file sequentially, and writing the output
file sequentially, you are jumping back and forth between the
various cylinders that are active for your current read/write
points in those three files at any one time.

Do you have a disk partition that you can completely empty of all
files then put your dataset there, to assure that it occupies
contiguous cylinders from the start of the partition, not
intermixed with any old files still scattered around that
partition?

what about the large merge passes at the end where the size of
the data streams being merged exceeds the size of the ram buffers?

(Assuming you have at least three separate drives, preferably four:)
When you are merging, logically you need only one record from each
input stream in RAM, and then the smallest record gets written to
output stream and a new input record from that input stream is read
in to replace it. If records are smaller than disk blocks, then
you read a whole disk block into RAM, extract records from it until
it's exhausted, then replace that disk block in RAM from the next
block on the disk. For overlapped I/O, you use double buffering,
where you have the currently-being-processed input block from each
input file in RAM, and also you have a spare input block in RAM
being loaded in background from the next disk block so it'll be all
finished loading and sitting in RAM by the time you are ready to
start extracting records from it. So you need enough RAM for six
disk blocks (2+2 input and 2 output) plus two logical records plus
pointers/indexes/etc.

In fact I probably could have done it all in RAM.

In that case, you just load the input data once, do everything in
RAM, then write the output data once, so you only need one disk
drive. (For safety you write output as a new file, not overwriting
input file, in case an error crashes your system in the middle of
writing output. But you knew that, right?)

So you need to optimize paging between fast cache and RAM, whereby
for each merge pass the input files in RAM are swept just once
each, and output is written back into RAM sequentially, with your
working set of RAM pages small enough to fit into fast cache.
The logic is the same except you don't need multiple RAM drives. :-)
(Although if you *do* have multiple RAM devices, such that you can
read from one during the same clock cycle as writing to another,
that does speed up the algorithm by nearly a factor of
approximately 2.)

Ooooh, baby. It looks like I posted a link to the wrong file :(
That's my disk math package, not my transpose thingy. Here:
http://www.mediafire.com/?60c0cx4o6gn

I get a similar problem as with the other URL. I get a Web page
that looks like this, and I have no idea what to do next:
Processing..

Create a Free Account | Login
|Logout

Free File Hosting Made Simple

free file hosting
* Upload Files
* My Files
* My Account

[whatsnew.gif]

X
Please enter your email address and password to login to your account:

Email Address:
____________________
Password:
____________________
[_] Remember me on this computer

Login to MediaFire
Forgot your password?
X
Create a free account to easily manage your uploaded files:

Email Address:
____________________
Password: (minimum 5 characters)
____________________
Confirm Password:
____________________
We respect your privacy and will not spam, sell, or share your email
address

Create a Free Account
Tell us what you think

Root Folder Explanation

Add Text


You requested mergexpose.c (4.8 KB)
Browse public files
[facebook2.gif]
MediaFire App is now available on Facebook!
Share all your MediaFire files with your Facebook friends
[ajax-loader-small-whitebg.gif]
- Password Protected File -
Please enter the password below to download this file
____________________
[BUTTON]
Preparing download...
This file has been virus scanned for your protection
Scan your own computer for viruses


[upload_share.gif]
* Info
* Email
* Share by IM
* Embed




[copy_button_small.gif] HTML Embed Code
____________________
[copy_button_small.gif] Sharing URL
____________________
click "Embed in Website" tab for more options

Report this file
This form is for reporting files that abuse MediaFire's Terms of
Service or Acceptable Use Policy.
Please enter the reason for reporting this file:

______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Report File

Contact
About
FAQs
Blog
Tell a Friend
Link to MediaFire
Bookmark
Support
Save to del.icio.us

Copyright <A9> 2007 MediaFire. All rights reserved.

Acceptable Use Policy | Terms of Service | Privacy Policy

IFRAME: userwork

IFRAME: emailwork
.


Quantcast