Re: Updating the SQL key value
- From: Jerry Stuckle <jstucklex@xxxxxxxxxxxxx>
- Date: Sat, 31 Mar 2007 17:46:07 -0500
Toby A Inkster wrote:
Jerry Stuckle wrote:Toby A Inkster wrote:Erwin Moller wrote:True, but you also have to look at performance issues. When searching an index, comparing an int is always faster than comparing a varchar. And comparing a single column is always faster than comparing multiple columns. And the index file itself is smaller.
Example data in table licences:
GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...
Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.
Firstly, I said char, not varchar. char columns are fixed width storage
and consequently a lot faster to search than varchar columns. On a 32-bit
processor, char(4) is particularly fast as it corresponds to the
processor's word length.
True, but unless it's a binary column the search must be case-insensitive.
Also, even if you specify char(4) the field may not end up that way. For instance, in MySQL if you have a mixture of both char and varchar columns, the result is generally all char columns are converted to varchar.
Finally, even if none of the above are true, an int column will almost always be word aligned in memory. A char field - even if it's char(4), is not always word aligned.
Additionally, the PK should not be dependent on data which may change -
i.e. if part of your key was the license (code), what would happen if
they changed the licensing terms?
Firstly, in this example, the licence code is an arbitrary string chosen by
the creator of the data, so it needs to change no more often than an
integer key would.
So why not just use an auto-number field. It seems you're unnecessarily creating an unnecessary bottleneck.
Secondly, it is perfectly acceptable for a primary key to change. A decent
database can even cascade changes to dependent tables for you.
Yes, it's acceptable. But it's not recommended, and good database design would argue against it.
And before you say that this is a waste of space as char(8) takes upNot generally, they don't, because of potential problems. For instance,
eight bytes rather than 4 bytes for an integer, you're second-guessing
the database engine there. Database engines really are dead clever.
if the data is stored as a pointer to the license table, the system
has to do an additional file lookup to fetch the license data. And what
happens if the entry from the license table is altered - or worse yet,
deleted? There is no referential integrity built in here. Deleting an
item from the license table would require all other tables which point
to that entry be updated - that is, the varchar data would have to be
reinserted into each row in every table which pointed to the license
table.
Of course there's referential integrity built-in, via foreign key
constraints. If foreign keys are correctly set up, then depending on
how you configured it, one of the following actions would happen when
you tried to delete a licence from the licences table which was being
used in the articles table:
1. the RDBMS would not allow you to delete it;
2. the articles which used that licence would also be deleted; or
3. the articles which used that licence would have their licence
set to the default licence.
Sure. But you don't have to use referential integrity. And no, databases do not just point to another table, as I indicated. Look at the data in your tables. You will find the char fields.
It's all about cascading.
And in your case, the data would be stored in 4 bytes ("GPL" + 1 byte
length). So it would take the same 4 bytes - but comparisons would
still be slower.
Greater than / less than comparisons would be slower. But (case sensitive)
equal to / not equal to comparisons should be of equivalent speed, as it's
a simple matter of choosing if one 32-bit value is the same as another
32-bit value -- exactly the same process used if they were integers.
Probably the same CPU instructions used.
But char values are not necessarily word aligned, and they aren't necessarily handled as a 32 bit int.
In fact, even if they were - the database would have to make the conscious decision that it is valid, and load the fields on a word boundary. Both take additional time over a simple int field.
That's true. But you're also taking up more storage space and slowing
down searches.
More storage space on, say, the articles table, for storing the user's
login in a column instead of an integer; but less space in the user table,
as there's no need to store an extra integer column.
And the result is more space required. And every additional table requires that much extra space.
Slower to perform a join between the article and user tables, but fewer
occasions when you'll need to perform a join, as you have some useful user
information (the login) in the article table already.
Even searching for the row identified by the PK will be slower.
On one of my current projects, I've got twelve tables, only one ofI can imagine. Using MAX() like this can cause concurrency problems.
which has a surrogate integer primary key (technically it doesn't
autonumber, but I use MAX() to simulate autonumbering when creating a
new record). Guess which table is causing me the most problems?
That's why RDB designers came out with the auto-numbering columns.
Not heard of transactions?
Sure. And in this case transactions cause their own problems. You need at least 4 calls to the database:
START TRANSACTION
SELECT MAX(colId)+1 FROM mytable
INSERT INTO mytable ...
COMMIT
Significantly slower than a single call which returns an auto-numbered column. And worse yet, it will tie up the table for additional changes until the COMMIT. Very bad for concurrency.
Better would have been to design my table so that it had a primary keyBut then your comment table wouldn't point to it anyway if you use (url,
like (url, revision).
revision). Or, if you just use url, your comment table would be
pointing at two different entries - which is not good foreign key
design.
It would be pointing to a partial key. Which I think is against 4NF or
5NF, but I mostly ignore those two as I've found them of very little
practical use.
It's against foreign key constraints, also. The referenced column(s) must be either the primary key or have a unique index. Non-unique entries are not allowed.
Anyhow, it's mostly a matter of taste. I was being somewhat tongue inHere, I agree. But I tend to lean more towards the performance side.
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.
Design the database with efficiency in mind. That's why all databases
aren't 5NF form.
I tend to err on the side of my own sanity. As far as performance is
concerned, I'm of the opinion that it's premature optimisation, and that
the RDBMS programmers will have already done a much better job than I
would have done.
It depends on the database and your needs. For something running 10K hits a on a database with 50K rows, this is fine.
But I've worked on some databases with hundreds of millions of rows (not all one table, of course) with size well into the terabyte region. Number of requests range upwards of 10's of thousands per second.
Performance here is important. And I carry these same techniques down to smaller databases. I don't consider it 'premature optimization' to use good rdb design techniques - including 3NF.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@xxxxxxxxxxxxx
==================
.
- Follow-Ups:
- Re: Updating the SQL key value
- From: Toby A Inkster
- Re: Updating the SQL key value
- From: Gordon Burditt
- Re: Updating the SQL key value
- Prev by Date: session value and radio button
- Next by Date: Re: session value and radio button
- Previous by thread: Re: Updating the SQL key value
- Next by thread: Re: Updating the SQL key value
- Index(es):