Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis (hugo_at_pe_NO_rFact.in_SPAM_fo)
Date: 05/24/04


Date: Mon, 24 May 2004 02:09:47 +0200

On Sun, 23 May 2004 00:13:06 GMT, Nick Landsberg wrote:

>
>[ Everything Snipped ]
>
>OK folks, I have found this discussion fascinating
>and educational. However I would like to add
>one more item into the equation, and it is not
>theory (although I notice that c.d.theory is
>in the distribution list) ---
>
>Given the possible implementations, how long
>would this query run on a 1 million record
>database? (That's actually small by today's standards)
>And then for a 10 million record database.
>
>Pick your own hardware, e.g. raid arrays,
>2.5 GHZ CPU's etc., but tell me how long
>this query would run, please.
>
>Theory is nice, but, in my experience,
>performance is what sells systems to
>customers.

Hi Nick,

Fair question. In between answering Neo's latest messages, I was
constantly switching to my Query Analyzer window. I've been busy writing
out a script that will generate millions of rows in the things and add
random dependencies in the "leader" hierarchy, so I could test some
things.

Some disclaimers to start with:
1. I think I'm pretty good at designing tables and writing SQL. But I have
second to none experience in tuning heavy-duty systems. I can (and
probably will) tweak some things here and there and gain some performance,
but it's not what I do best.
2. I don't have the money to spare on the kind of hardware that would be
present in a shop that runs this kind of queries on 1 million record
databases. I'll do some tests with large numbers, but all I can offer is a
Desktop PC, powered with one 1.3GHz AMD Athlon, 256 MB of RAM and equipped
with two 40GB 7200rpm IDE harddisks (I have the log on one of the HDs and
the data on the other - for this kind of queries, I'd sure like to plant
tempdb on a third HD).

With only this computer to my disposal, I'll have to trim down the number
of rows considerably. It's not that my hardware can't handle a table with
1 million (or even 10 million) rows - but while waiting for the results of
my preliminary tests (using "only" roughly 0.5 million things), it
suddenly occured to me that the report would be based on the Carthesian
product of the things table with itself - almost 250,000,000,000 rows for
my preliminary test, more than 121,000,000,000,000 rows if I would run the
complete test set I prepared (sporting over 11 million things). Just
sending the report to the screen would probably take days!! And I'd need
to add some really beefy hardware to store it.

I just saw that it's past 2AM already, so I'll have to get back to you
later with test results from a "slightly" smaller test set.

Two final notes:
1. I did already catch Neo's reply to you. I'll still try to run the tests
I intended, even though Neo probably won't. Just out of curiosity.
2. I expect my implementation to be quite slow when dealing with a large
numbers of rows. In fact, I was quite surprised when my first attempt
already managed to beat XDb1's execution time. Some time ago, I saw some
messages by Joe Celko in another newsgroup about an alternate way to store
trees. I don't recall the details, but from what I still know, I expect it
to be lots (and lots and lots) faster.
After finishing the tests on my own implementation, I might go try and
find his method, see what performance that one will yield. Just out of
curiosity.

Best, Hugo

-- 
(Remove _NO_ and _SPAM_ to get my e-mail address)


Relevant Pages

  • Re: Nearest Common Ancestor Report (XDb1s $1000 Challenge)
    ... Hugo Kornelis wrote: ... >>And then for a 10 million record database. ... >>this query would run, please. ... I don't have the money to spare on the kind of hardware that would be ...
    (comp.object)
  • Re: How much time does a query take?
    ... how much time goes by until a query to a database is ... if you use the same hardware as i and the same queries, ...
    (microsoft.public.sqlserver.msde)
  • Re: reactivation when nothing changed
    ... IF Microsoft's hardware database for each PC were coded ... Now take out the pencil and have that someone record again what is in the pencil cup. ... You think Microsoft should record everything that was ever installed and allow you to remove and add the same devices without tracking those changes? ...
    (microsoft.public.windowsxp.general)
  • Re: Wiki software for genealogy (long)
    ... What types of reports, displays, or analyses than can be generated from such a data source is up to the desktop genealogical programs rather than the underlying database, i.e. there's still room for a competitive edge for the vendors, and for cheaper-and-easier versus expensive-and-powerful differentiations. ... Thus a better data model not only has to ... If you can visualize how a hardware engineer will improve the hardware, or how a software engineer will create a new OS for the new hardware, you're probably not writing programs. ...
    (soc.genealogy.britain)
  • Re: Time being reset every hour.
    ... > notice that the transition times listed by the database reveled several ... > If the time problem occurs - This would eliminate both xntpd and all added ... > isolate hardware or OS issue. ... > hardware, possibly with xntpd. ...
    (comp.unix.solaris)