Re: Updating the SQL key value
- From: Erwin Moller <since_humans_read_this_I_am_spammed_too_much@xxxxxxxxxxxxxxxx>
- Date: Fri, 30 Mar 2007 19:25:47 +0200
Toby A Inkster wrote:
Erwin Moller wrote:
Why picking 'natural candidates' if you can make it work ALWAYS with a
simple autonumbering PK?
Hi Toby,
As someone who normally spends quite some time refining my database schema
before creating the database, by the time I've come to pick a primary key,
I've normally already decided on one or two UNIQUE constraints, so it's a
simple matter of deciding which of the UNIQUE constraints is fit to be the
primary key.
Besides, there are other techniques that can be said to "always work". For
example, the creation of a varchar column "code" as a primary key. It's
still a surrogate key really, but it can take on more meaning than an
auto-numbered surrogate.
For example, in one of my current projects, I need to store several
articles, each of which must be issued under a particular licence (e.g.
GPL, FDL, Creatice Commons). The "auto-number everything" solution would
be:
======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence integer
======================================
======================================
table: licences
--------------------------------------
licence_id integer
licence_name varchar
licence_link varchar
======================================
Example data in table licences:
1 GNU General Public Licence http://www.gnu...
2 GNU Free Documentation Licence http://www.gnu...
3 Creative Commons Licence http://www.cre...
Using a manually-named varchar surrogate primary key, you could have
======================================
table: articles
--------------------------------------
article_id integer, autonumbered
title varchar
body varchar
licence char(8)
======================================
======================================
table: licences
--------------------------------------
licence_code char(8)
licence_name varchar
licence_link varchar
======================================
Example data in table licences:
GPL GNU General Public Licence http://www.gnu...
FDL GNU Free Documentation Licence http://www.gnu...
CC Creative Commons Licence http://www.cre...
Usage is fairly similar, apart from the fact that now, when you look at
the table 'articles' without doing any joins, you can still infer a bit of
information about which licence each article is under, without having to
inner join onto the licences table.
This isn't *always* a good approach, but it's often a lot better than an
auto-numbered key.
OK, clear example.
This has the advantage you have more descriptive PKs.
And before you say that this is a waste of space as char(8) takes up eight
bytes rather than 4 bytes for an integer, you're second-guessing the
database engine there.
I won't say that. :-)
I never care too much about a byte or two extra if it increases readability
or structure, etc.
Database engines really are dead clever. Most will
only store the full char(8) string in the licences table (likely to be
quite small compared to the articles table), and when storing the licence
column of the articles table will actually use a pointer back to the same
string data from the licence table -- very fast. Database engines really
are dead clever. (And as far as sorting is concerned, a short, indexed
char field is just as fast as an integer.)
Yes, B-tree index lookups (and the like) are extremely fast.
What if your database must be upgraded and the logic changes?
Do you want to check all the columns again to be sure the PK still makes
sense? (Or watch it fail in a production environment when the UNIQUE
constraint is hit you didn't see coming beforehand)
When the logic changes in some major way, you're probably going to need to
make adjustments to several tables anyway. I don't see this as a major
problem.
Besides which, it's often quite easy to choose a column that will always
be unique. For example for a table of users, instead of:
user_id auto_increment (primary key)
login varchar
password varchar
realname varchar
email_address varchar
All you need is:
login varchar (primary key)
password varchar
realname varchar
email_address varchar
Whatsmore, say you then have another table which has a column that has a
foreign key for your user table, looking down that column you don't see a
bunch of numbers like "12, 14, 71, 14" -- you see "brian, dave, greg,
dave".
Well, a simple join solves that.
Logic changing is never going to be a problem -- you're never going to
have two users with the same login name.
Well, you gave me excactly an example where your approach is weaker than
using autonumbering PKs, I think.
Consider this example:
1) Your tbluser as described above by you.
2) a collection of articles written by that user
CREATE TABLE tblarticle(
articleid serial PRIMARY KEY,
writtenby login REFERENCES tbluser(login),
content VARCHAR(10000)
)
I made a FK as you see.
Now what happens if that user wants a new loginname?
You are forced to update this new loginname everwhere in your database where
you use it.
Or do you use cascading?
(I dislike cascading though all FK constraints, but it is of course
possible)
I have been using autonumbering PK my whole programming carrier, and
never had any problems with it.
No doubt -- the problem occurs when people blindly use surrogate keys
without thinking about whether the rest of their columns could be keys --
which I'm sure you'd never do Erwin! :-)
;-)
By doing that, they put less thought into UNIQUE constraints, and end up
with lots of unwanted rows which would be duplicates, except for their
primary key. If they hadn't added that extra primary key number, then they
wouldn't have a database full of duplicate values.
Well, that is normalization.
The more databases you design, the better you get at grouping stuff and
making entities.
I never store a piece of information double.
Almost never, unless the joining will take that much calculation that I
think it is better to denormalize a little.
Who seriously cares about the few extra bytes needed?
Not me.
I am not alone with that thought.
Postgres even makes an OID for each row, something you don't even see but
can use if you want.
It's perfectly easy to disable OIDs in PostgreSQL on a case-by-case basis
or permanently. Most of the time, I disable OIDs.
On one of my current projects, I've got twelve tables, only one of which
has a surrogate integer primary key (technically it doesn't autonumber,
but I use MAX() to simulate autonumbering when creating a new record).
What happened to good old nextval()?
Guess which table is causing me the most problems?
The other ones? ;-)
It's a table of articles. A table of comments references the articles via
its numeric key. Now what happens if I write an updated version of an
article, but want to keep a copy of the old one? I mark a status flag on
the old one to hide it, then create another article with the same URL but
a different ID number. Unfortunately the comments still point to the old
ID number, so are not seen when you visit the new page.
Better would have been to design my table so that it had a primary key
like (url, revision).
Yes, that would be a nice solution.
I would typically do something like this:
CREATE TABLE revisions(
revisionid SERIAL PRIMARY KEY,
content VARCHAR(10000),
createdate timestamp
)
CREATE TABLE tblarticle(
articleid SERIAL PRIMARY KEY,
url varchar(1000),
activerevisionid integer REFERENCES tblrevisions(revisionid)
)
CREATE TABLE comments(
commentid SERIAL PRIMARY KEY,
comment VARCHAR(1000),
articleid integer REFERENCES tblarticle(articleid)
)
So you have your comments pointing at the article, and not the revision.
But, as you say, this is a matter of taste.
Anyhow, it's mostly a matter of taste. I was being somewhat tongue in
cheek when describing surrogate keys as "the root of all evil", but I
can't stand to see a good candidate key go to waste.
:-)
Clear.
I liked this discussion.
Once I tauch myself to work with autonumbering PK and good normalization and
FKs, I never gave the Primary Key much second thought.
Always good to hear other opinions.
Best of luck!
Regards,
Erwin Moller
.
- References:
- Updating the SQL key value
- From: bill
- Re: Updating the SQL key value
- From: Erwin Moller
- Re: Updating the SQL key value
- From: bill
- Re: Updating the SQL key value
- From: Erwin Moller
- Re: Updating the SQL key value
- From: bill
- Re: Updating the SQL key value
- From: Toby A Inkster
- Re: Updating the SQL key value
- From: Erwin Moller
- Re: Updating the SQL key value
- From: Toby A Inkster
- Updating the SQL key value
- Prev by Date: Re: Php / Java grids .NET grids
- Next by Date: Re: In case you need it....
- Previous by thread: Re: Updating the SQL key value
- Next by thread: Re: Updating the SQL key value
- Index(es):
Relevant Pages
|