Re: Nearest Common Ancestor Report (XDb1's $1000 Challenge)

From: Hugo Kornelis (hugo_at_pe_NO_rFact.in_SPAM_fo)
Date: 05/22/04


Date: Sat, 22 May 2004 01:38:01 +0200

On 20 May 2004 14:22:24 -0700, Neo wrote:

Hi Neo,

>> > > This indicates an elapsed time of less than 3 ms ...
>> >
>> > Among other things, the difference in normalization between the
>> > implementations is quite different.
>> > In an initial attempt to make them more comparable...
>>
>> Now you're changing the rules after the competition has started.
>> Not a sign of good sportsmanship, Neo!
>
>The provided implementation was not similarly generic

Yes, I have to admit that my implementation was not "similarly" generic,
but even MORE generic than the XDb1 implementation, as I didn't limit the
possible hierarchies to a few predefined hierarchy types (as XDb1 does),
but to allow ANY hierarchies (as required in the challenge).

> or normalized.

This argument is getting stale. In another message, I quoted Date's
definition of normal forms, as well as a description of Domain Key Normal
Form (DKNF), said to be <<(theoretically, at least) the "ultimate normal
form">>. Instead of pointing out which of theses normal forms you think my
implementation violates, your message only repeats uninteresting stuff
about XDb1's internals, followed by an unbacked claim that there is
redundancy in my table and that it's not normalized.

Backup your claim, if you can. But please leave XDb1 out of it - the
challenge was to use the relational model, not XDb1's model.

>To make a very rough comparision, I simplified my implementation to
>that which was provided.

Are you referring to this quote from another message you wrote: "I
modified XDb1 algorithm so as to not resolve things' names and simply
prints their IDs."

If so, I don't accept this simplification. The challenge was to create a
report (that contains names) from some input data (that contains names).
There were no IDs in either the input or the report. If there are IDs
under the cover is irrelevant.

If you want to start a new challenge to create a report showing only IDs
from input data containing names, by all means go ahead. Heck, I might
even try and modify my winning entry for this competition to see if I can
rid you of another $1000.

> In general, XDb1 can't compete with RM in the
>scope of a single table's worth of data since XDb1 has the overhead of
>variable storage structures for each table cell. An yes, IDs are
>faster than text so the comparision is far from perfect. XDb1 should
>gain advantage as more tables are joined. The provided solution
>doesn't join any tables since it is less generic and unnormalized.

The provided solution is not less, but even more generic. And it is
completely normalized.

The reason that it doesn't join any tables is that I made proper use of my
RDBMS' tools, instead of the brainless automatic insertion of an
autonumber/identity/guid table_ID column in every table that so many of my
fellow RDBMS developers fall victim to.

>As I said earlier, in the simplfied case, it takes XDb1 2 or 3 ms on a
>500 Mhz, 512MB Dell PowerEdge Server.

As I said earlier, this is irrelevant to the challenge. In the original
form, XDb1 took 16 ms on your machine. Interesting, it also took 16 ms on
my machine (AMD Athlon 1.3 GHz, 256MB)

> I ran the provided SQL Server
>script on the same machine and the report generation takes 65 ms
>(based on the avg of the 3 runs below):
>
>Starting time Ending time Time Elapsed
>------------- ------------ ------------
>14:44:57.670 14:44:57.733 67 ms
>15:02:26.233 15:02:26.297 64 ms
>15:07:57.780 15:07:57.843 63 ms
>
>What might I be doing wrong to get results different than yours?

I assume that you copied and pasted my script straight into Query Analyzer
and executed it without any modification.

It's hard to diagnose the cause of this difference without access to your
computer. But I did think of one thing that might cause this difference.

When I installed SQL Server, I chose the collation "Latin1_General_BIN" as
deafult. That means that character data comparisons are done by straight
comparing of ASCII values. This is the quickest, but it has the
disadvantage that 'john', 'John' and 'JOHN' are treated as three distinct
values. If I wanted to, I could have a person named 'john' and a dog named
'John'.

I don't know what the default collation is, but I do know that it is one
of the case insensitive collations. One that treats 'john', 'John' and
'JOHN' as equal. This will of course introduce some overhead for string
comparisons - which are used quite extensively in my implementation.

Since this is the default collation, I assume that your database will use
case insensitive comparisons. There is a simple way to find out: change
one of the occurences of 'fido' in the insert statements for hierarchies
to 'Fido'. I accept that your database will accept this, showing that it
uses a case sensitive collation. If it is not accepted, your databse uses
a case insensitive collation; it might be a binary collation but it need
not - it might still be accent insensitive or even double-byte character
set!

Another possible explanation might be the other activity on the machine
that runs the SQL Server process. I have SQL Server on my desktop; when I
timed the queries I first shut down all other applications. (SQL Server is
a "friendly" process - it will grab extra memory or processor time if it
needs more, but only if that won't hinder other processes. Reducing other
processes will generally improve SQL Server's performance). Earlier
testing (with several other applications active) resulted in elapsed times
between 10 and 20 ms.

If you have SQL Server running on another server, make sure that server is
dedicated to SQL Server only, if you want the best performance. If you
have SQL Server running on your desktop, close all applications except
Query Analyzer when running this query.

Best, Hugo

-- 
(Remove _NO_ and _SPAM_ to get my e-mail address)


Relevant Pages

  • Re: selecting case sensitive values from MSAccess or SQL Server
    ... but with SQL Server you can specify a collation. ... the performance will most likely be slower when specifying the collation, so if you know the collation of the server you're hitting, to improve performance you may want to leave off the collation, if it suits your query. ... I have a bit of code which selects data from either a SQL Server or MS ... jOhN ...
    (microsoft.public.fox.programmer.exchange)
  • Re: View Type of Protocol
    ... Thanks a lot for the feedback John. ... information about the type of authentication being used. ... some reason Kerberos is not available. ... they are not authenticated and the SQL Server login fails. ...
    (microsoft.public.dotnet.framework.aspnet)
  • Re: Installing SQL_Latin1_General_CP1_CI_AS collation order??
    ... Tibor Karaszi, SQL Server MVP ... "Ward Horsfall" wrote in message ... >> rebuildm.exe) interfaces and a collation designator, ...
    (microsoft.public.sqlserver.setup)
  • Re: Russian Language support in SQL SERVER
    ... SQL Server supports languages on a couple of levels. ... specifying a collation defines what language rules are used for ... Cyrillic_general Windows collation that supports the Russian language. ...
    (microsoft.public.sqlserver.server)
  • Re: update index in background not running.
    ... John, ... event log that SQL Server 2000 FT Indexing success & failures are recorded. ... exec sp_defaultdb N'NT Authority\System', N'master' ...
    (microsoft.public.sqlserver.fulltext)