Re: Get the First record

From: Tony Marston (tony_at_NOSPAM.demon.co.uk)
Date: 03/03/04


Date: Wed, 3 Mar 2004 11:18:51 -0000


"Geoff Berrow" <blthecat@ckdog.co.uk> wrote in message
news:tfbb40te45clb5887d9q67cn49fv3c0uae@4ax.com...
> I noticed that Message-ID: <c23itq$1ngrqm$1@ID-30799.news.uni-berlin.de>
> from Agelmar contained the following:
>
> >As for ordering records on a disk
> >using a hashing algorithm - I've never seen that before... I don't think
you
> >would ever want to do that, either. Ordering on the attributes that are a
> >function of the hash should have the same effect, and still provide some
> >utility for range predicates. The entire purpose of physically storing
the
> >tuples in a sorted order is so that when you retrieve a set of tuples
with a
> >particular (range of) values, you perform sequential I/O, issuing one
> >request to grab all the blocks, rather than one request per tuple and
> >thrashing the disk.
> >
> >P.s. this is my area of research :-) (DBMSs)
>
> I could tell, the only other person I ever heard call them tuples was my
> university lecturer. I'll have to look out my old notes, but the
> hashing stuff sticks in my memory. Maybe I misunderstood it. I did
> find this on the 'net though:-
>
> File Organization Techniques
> Hashing - Each record is placed in the database at a location whose
> address is computed as some function (hash function) of that record
> (hash field). To store the record, the DBMS computes the hash address
> for the new record and instructs the file manager to place the record at
> that position

I worked on a database system many years ago on the HP3000 minicomputer
called IMAGE (later called TurboIMAGE) which used a hashing algorithm to
locate records in a table. This algorithm used only the primary key, not the
whole record, and produced a record number or address. As each record in a
table was fixed to the same length (variable length fields were not
supported, and empty fields were filled with null values) each record number
could be used to work out a physical address on the disk. A capacity was
defined for each table so that it could assign the space necessary to hold
that number of records. The capacity value could be changed, but this meant
rebuilding the table.

There were two approaches in this hashing algorithm:

(a) If the primary key was numeric and the key value was less than the
capacity of the table then the key values was used as the address.
Consequently each record had a fixed location in the table, and a sequential
read would retrieve the records in primary key sequence.

(b) If the primary key was numeric but greater in value than the capacity of
the table, or the primary key was not numeric, then the hashing algorithm
would produce an address somewhere else in the table space, and this address
would be used instead of the primary key value. The output from the hashing
algorithm was known as the "primary address".

(c) It was possible that different primary key values could hash to the same
"primary address", in which case the first record at that address was
allowed to stay at that address, but all others were moved to the nearest
available empty slots, known as a "secondary address". A chain of pointers
to these "secondary addresses" was maintained at the primary address.

(d) When performing a lookup using a primary key the hashing algorithm would
compute a primary address, and the DBMS would read the record at that
address. If the primary key of that record did not match the given primary
key then the DBMS would read down the chain of secondaries until it found
it, or would return "key not found".

(e) A problem arose when inserting a record whose primary key hashed to an
address which was already taken up by a "secondary". In this case the
secondary record would have to be moved to a different empty slot so that
the space could be used by the primary record, and the chain of secondary
addresses would have to be updated to reflect this movement. The insertion
of records could therefore be slowed down by this movement of secondary
records, which was given the term "migrating secondaries".

The whole point of this is to say that unless you know the inner workings of
your DBMS you cannot predict where each record will be written in the table
space, therefore you cannot predict the order in which records will be
presented to you when doing a sequential read. The only thing that you can
predict is that, barring deletions and insertions, the order will be exactly
the same whenever you perform another sequential read.

Here endeth the lesson.

-- 
Tony Marston
http://www.tonymarston.net


Relevant Pages

  • Is an URL a good primary key? (or what else?)
    ... I'll have tables to store sites attributes as well as specific pages ... as primary key, for the reason that they can be very long. ... So I tried with a surrogate key, the sha1hash of the URL. ... For the sake of completeness I'm designing for MySQL 5.1, ...
    (comp.databases.theory)
  • Re: storing byte values
    ... constraint which includes the column with the byte values? ... RAW column in a table (don't confuse RAW with LONG RAW, ... won't work unless you use large block size and a composite primary key ... and then hash the data using SHA-1 ...
    (comp.databases.oracle.misc)
  • Re: storing byte values
    ... RAW column in a table (don't confuse RAW with LONG RAW, ... won't work unless you use large block size and a composite primary key ... ensure uniqueness of your binary data. ... and then hash the data using SHA-1 ...
    (comp.databases.oracle.misc)
  • Re: SQL TOP 50,000 Help
    ... A lot of the time, people use AutoNumber as their primary key, and the ... the column used to generate the hash number contains many equal ... to ensure equal distribution of values. ... Of course, there might be a few duplicate numbers generated, ...
    (microsoft.public.access.formscoding)
  • Re: Hashing article title, will it always be unique?
    ... Your table has two fields, the primary key, an autonumber automatically generated, and the second field, a nvarcharwith an index unique but with ignore_dup_key on. ... INSERT INTO hashTableSELECT titles FROM otherTable ... That returns the long integer value that satisfy your definition, even if it is not a hash value 'per se'. ...
    (microsoft.public.dotnet.languages.csharp)