Re: What gives data meaning?
- From: "topmind" <topmind@xxxxxxxxxxxxxxxx>
- Date: 4 Apr 2007 14:35:27 -0700
frebe wrote:
But to the point, if a program was able to store improperly-formatted
zipcode inside the DB then whose fault is that?
what's an improperly formatted zipcode? In the US, you have 5 digits,
in the netherlands you have 4 digits and 2 characters. A zipcode of
1234 AA is properly formatted for a dutch user of the application, but
not correctly formatted for the US user of the program. Hence: context.
This thus means that if the db stores '1234AA', it can do so, and the
dutch user will happily use it. The US user can't because for the US
user it's just data, 1234 and 2 characters, it's not information
(zipcode).
If you think about it, neither zip code means anything to the database.
They're simply characters in a field.
With a proper type system or check contraints, a RDBMS could guarantee
that no invalid zip codes are stored into the database. If we want to
store zip codes for multiple countries, one solution is to have
different relations for different countries with different domains/
types for the zip code attributes.
address(id, country, street, ...)
dutch_address(id, country, dutch_zipcode)
us_address(id, country, us_zipcode)
I personally would take a different approach, partly because I have
more of a dynamic-typing (or type-free) preference than frebe.
I would have a general Address or Contact table, rather than one per
country, with all columns that any country would need. Validation
could be done by a trigger. A validation table could look something
like this:
table: contactColumnValidation
-------
countryRef // foreign key to country code
columnRef // column name (such as "postal_code")
isRequired // Boolean
regEx // regular expression used to validate field
function // function name or reference for any further validation
formatNote // optional note or hint to help data entry clerk
The function name (or even a code snippet stored in the DB) points to
an optional function that can be used for validation when regular
expressions are not capable of it by themselves. (Regular expressions
hopefully handle most cases.)
Whether existing trigger implementations can handle this or not, I
don't know. But this would be the ideal IMO. If not, then at least
these tables can be used on the app side to perform validation. In
practice there is usually only one Address Entry Screen anyhow, since
maintaining multiple that do the same thing is usually not worth it.
Thus, there is usually only one "entry point" for address changes/
additions anyhow.
The country columns in duch_address and us_address should be constants
(enforced by a check constraint), and there should be two foreign keys
between dutch_address/us_address (id, country) referencing address(id,
country). Doing this, it would not be possible to have both a dutch
and us address. The format of dutch_zipcode and us_zipcode could be
enforced by a check constraint using regular expressions, or by
defining a custom type.
To make it simpler to use, we would create a view like this:
create view address_all
select id, country, street, dutch_zipcode as zip
from address a join dutch_address da on da.id=a.id
union
select id, country, street, us_zipcode
from address a join us_address ua on ua.id=a.id
So that takes us back to whether or not the database is correct beyond
its integrity checks. If it stores exactly what it should store then it
is correct. An incorrect or improperly formatted zip code isn't the
database's responsibility if that's what a human or application told it
to store.
A RDBMS is capable of enforcing such constraints. We want all data in
the database to be valid.
This reinforces the DB can be responsible for structural, type, and
referential integrity, but it can not give meaning to its data.
A RDBMS can give meaning to its data in the same way as an application
can give "meaning" to its data.
So this takes us back to our responsibility as programmers and designers
to guard our database's integrity.
I am afraid that you say that "programmers" should guard the database
because you want to use stored procedures as proxies to the database.
The data integrity is enforced by primary and foreign key constraints,
check contraints and triggers, in that preferred order. Hiding the
database behind procedures is almost never necessary.
/Fredrik
-T-
.
- References:
- Re: What gives data meaning?
- From: frebe
- Re: What gives data meaning?
- Prev by Date: Re: What gives data meaning?
- Next by Date: Re: What gives data meaning?
- Previous by thread: Re: What gives data meaning?
- Index(es):
Relevant Pages
|