Re: New benchmarks (was Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question))
- From: moi <root@xxxxxxxxxxxxxxxxxxx>
- Date: Sat, 03 Nov 2007 10:46:35 +0100
On Sat, 03 Nov 2007 00:59:54 -0700, mike3 wrote:
On Oct 31, 3:40 am, moi <r...@xxxxxxxxxxxxxxxxxxx> wrote:
On Wed, 31 Oct 2007 01:27:15 -0700, Robert Maas, see
http://tinyurl.com/uh3twrote:
[good stuff snipped]
Also I mentionned the sort-merge algorithm as a general took that
frequently works for rearranging lots of data. It would be reasonably
fast for your task. It's a good thing to learn how to do efficiently
in any case. But somebody else suggested a completely different
algorithm which is speciic to this task performed on square matrices,
and that sounded like it might be faster than my more general method.
So IMO you
Yes, that someone was me. I ran some tests, and my mmap() +
tile-flipping approach is about 10 times faster then Mike's
disk-exerciser. Basically because it does 10 times fewer disk-I/O.
Given your large amount of RAM and your smaller but still substantial
fast cache, it sounds like you can do the following: Read input from
one disk drive, sort en route using RAM and fast cache, write to
second disk drive, so each disk drive is running sequentially, no
thrashing there, and your in-RAM/cache algorithm was hopefully not
thrashing the fast cache either.
It is hard (and silly, IMHO) to tune cache effects for a program that
spends about .12 sec user + .24 sec sys CPU in some 3..30 sec walltime.
The CPU is idling anyway.
Note that the system's LRU caching effectively results in double
buffering. While Mike is juggling his tiny buffers, the system has
probably all of the file cached into system buffers. Constantly
hammering the LRU with writes will probably cause some 'write through'
to disk by the system.
AvK
Alright, I gave yours a try, to compare it to my so-called "disk
exerciser", using the code you provided. On my system, I got the
following timings, with a system having 768 MB of physical memory:
1024x1024 (4 MB):
real 0m0.035s
user 0m0.012s
sys 0m0.020s
2048x2048 (16 MB):
real 0m0.126s
user 0m0.064s
sys 0m0.060s
4096x4096 (64 MB):
real 0m0.552s
user 0m0.244s
sys 0m0.248s
8192x8192 (256 MB):
real 0m2.334s
user 0m0.964s
sys 0m0.936s
16384x16384 (1024 MB -- now too big to fit in physical memory): real
4m16.226s
user 0m3.284s
sys 0m4.664s
Now, the question is, why the huge hike on the last one (the ratio of
CPU to wall time is a whopping factor of 50!)? The machine has 768 MB of
RAM, and 400 or more is often free, so it should be able to get a
I hope it is not "free" but dedicated to the disk-buffer cache.
significant piece of the data it's working on into memory, shouldn't it?No. that is not how it works. Pages are read into memory only when needed
('faulted in'). Since every page is needed exactly once, you need exactly
N*N reads and writes to disk.
But for some reason, the hard drive is being thrashed horribly angrily!
Why? It's like once it can't get it ALL into RAM, any performance gains
just go out the window and it turns into a disk hog and the drive
No you got confused.
By first writing the file from within the same program,
you actually primed the systems buffercache. If everything fits into
buffercache, it will still be there once you start transposing.
If the matrix is bigger than available bufferspace, the first (oldest)
part will already have been pushed out once you start accessing it.
When you access the first block (or earlier, or later) , the system needs
to free a buffer for it, kicking out an older block. Once you need that,
it will also have vanished. Perfectly normal.
thrashes like a fish that has just been plucked from the sea -- gives
that disk a workout like running a marathon! What gives? How does it do
on your system? Maybe my computer has a problem?
No, that's the sound of a 1GB file being accessed.
This is the testing routine that I hooked up to your snippet (I changed
float to modint since that's how my pi program works):
They are the same size, so that should work ok.
AvK
.
- Follow-Ups:
- References:
- Prev by Date: Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question)
- Next by Date: Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question)
- Previous by thread: New benchmarks (was Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question))
- Next by thread: Re: New benchmarks (was Re: SortMerge avoids thrashing (was: Bailey's "two pass" FFT algorithm question))
- Index(es):
Relevant Pages
|