Re: Relational database & OO
- From: "H. S. Lahman" <h.lahman@xxxxxxxxxxx>
- Date: Sun, 12 Nov 2006 17:50:53 GMT
Responding to Frebe73...
The RDM normalization
can be applied beyond the RDB's table/tuple paradigm.
What is the "RDB's table/tuple paradigm"?
Say, what?!? Are you saying you don't know what an RDB table is or what
a tuple is within the table? Or that the tables, keyed tuples, and
relationships in an RDB represent a specific implementation of the
relational data model?
Tuples are a fundamental part of the relational model. Table is another
word for relation. Relations are also a fundamental part of the
relational model. These two concepts do NOT represent a specific
implementation of the relational model. If tuples or relations do not
exist, relational calculus is not possible, nor is normalization.
But no existing production database qualifies as a relational database.
That is why some people prefer using the name "SQL database" instead of
RDB.
Which is pretty much my point. RDM and RDB are not synonyms and an RDB is a specific implementation of the RDM (however imperfect some people may view it).
And OO Class Models are routinely normalized as
part of the basic paradigm methodology.
Many class diagrams would break 1NF. I also see a problem with applying
2NF and 3NF because the id of the object is not a value itself, but a
pointer. Because objects may be easily cloned, I suppose that would
break 2NF.
Actually, 1NF is much more commonly broken in RDBs than in Class Models.
A classic example is a telephone number, which will almost always be
stored in the RDB as a single number but if the elements of the number
(e.g., area code) are important to the problem in hand, they will always
be broken out as distinct attributes in a Class.
That is why you should not store a telephone number as a single number.
And when was the last time you saw an RDB that stored a telephone number as individual fields? If the fields are separated they need to be in their own table to avoid 3NF problems. That's because the simple domains are individually dependent solely on the identity of the telephone number, which /is/ the telephone number. When was the last time you saw a table where every field was part of a compound key (i.e., no non-key attributes)? That is inherently inefficient and most DBAs will deliberately denormalize to avoid that inefficiency.
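To illustrate the Class Model side of this, here is a minimal Python sketch in which the elements of the number are separate attributes. The class name, the field names, and the North American number layout are assumptions made for the example; they are not taken from any real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TelephoneNumber:
    """Hypothetical class: each element of the number is its own attribute."""
    country_code: str
    area_code: str
    exchange: str
    line_number: str

    def as_single_value(self) -> str:
        # Recombine the elements into the single value an RDB column
        # would typically hold.
        return (self.country_code + self.area_code
                + self.exchange + self.line_number)


phone = TelephoneNumber(country_code="1", area_code="617",
                        exchange="555", line_number="0123")

# The area code is directly addressable; nothing is sliced out of a string.
area = phone.area_code
single = phone.as_single_value()
```

Whether the elements deserve their own attributes depends on whether the problem in hand cares about them; if it does not, a single attribute would be just as reasonable in the class.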
Objects abstract uniquely identifiable problem space entities. An
address in process memory is unique, so that satisfies the mapping.
What mapping?
The mapping of object abstractions to identifiable problem space entities.
It is actually more versatile than the RDB paradigm.
What is the difference between the RDB paradigm and the RM paradigm?
The relational data model is a mathematical model. The RDB paradigm is a way of applying that model to practical data storage.
Consider 6-32 screws
in an inventory. They are effectively clones without explicit identity
values but they are still uniquely identifiable in the problem space.
So long as the object corresponding to each screw has a unique address,
it is identifiable in the same sense that the physical screws in the
problem space are. The only way you can avoid 2/3NF problems for that
situation in an RDB is by adding an artificial explicit identity (e.g.,
autonumber) to the tuple itself.
And what is the normalization problem with that?
I said specifically that one /avoids/ the normalization problem through the kludge of providing an explicit tuple identity that does not exist in the problem space.
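A small Python sketch of the screw example may help. It assumes a hypothetical Screw class with made-up attributes, and it models the RDB side as a list of dicts with an invented autonumber-style screw_id column; neither comes from a real inventory system.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Screw:
    """Hypothetical abstraction of one physical 6-32 screw."""
    thread: str = "6-32"
    length_in: float = 0.5  # assumed attribute, for illustration only


a = Screw()
b = Screw()

same_value = (a == b)    # the two screws are clones: every attribute matches
same_object = (a is b)   # but they remain two distinct, identifiable objects

inventory = [a, b]

# RDB-style storage: the rows would be indistinguishable without an
# artificial identity, so an autonumber-like key is added to each tuple.
next_id = count(1)
rows = [
    {"screw_id": next(next_id), "thread": s.thread, "length_in": s.length_in}
    for s in inventory
]
```

Here object identity plays the role that the surrogate screw_id plays in the rows: it distinguishes the clones without adding an identity value that does not exist in the problem space.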
However, I don't see that as being very relevant. My point is that the
application's problem solution doesn't care how the data is stored.
Neither does the relational model or SQL.
The RDM, yes. But try using SQL on flat sequential files or an OODB.
SQL using flat files would be possible and is done all the time. Most
SQL databases use flat files for persistence. Actually JDO claims that
a hybrid SQL language (JDO-SQL) can be used on non-relational
databases. Obviously the underlying database needs to support relational
calculus or the JDO product has to implement it on top.
I'm talking about SQL and flat sequential files where there is no embedded tuple identity and the files are read by character or by block. SQL is meaningless in that context.
Note that RDM and RDB are not synonyms. An RDB is a special case of the
RDM.
In fact, no current production database qualifies as an RDB. But if we had
a database qualifying as a relational database, that would be an
implementation of the relational model, not a special case.
I don't agree with that assertion, but let's not go there; this is an OO forum.
Lastly, your database is language-neutral. It shouldn't matter what
language the application sitting in front of the database is written in,
or even what paradigm it's born from. Flexibility starts with a good
database design and extends through the application--not the other way
around.
That's true enough but I would make it even stronger. RDBs are designed
to be problem-independent, not just language independent, which is
pretty much my point.
The relational model is used for modelling data, problem-independent or
not. Just because some data could be considered "problem-dependent", it
may very well be modelled using the RM.
I said RDBs, not the RDM. An RDB is one of many possible
implementations of the RDM.
Current SQL databases have some limitations that make them not qualify
as relational databases. But in what way do these limitations force
them to be problem-independent?
RDBs are designed for optimization in a multi-client environment around ad hoc queries where one cannot anticipate why the data is being accessed (i.e., what problem a particular client is solving).
So
if one is solving a non-CRUD/USER problem where special optimization is
usually required, one wants to separate the views of the solution from
those of the RDB.
Using low-level collection classes is not a good idea for modern
enterprise applications. There are a lot of issues like concurrency or
transactions, that you have to solve by yourself in that case.
By "enterprise applications" do you mean server-side layers?
I mean applications for accounting, payroll processing, logistics,
requirement management, etc. Not necessarily on the server-side.
OK; been there and done that. I've never seen one where relationships instantiated at the object level would not be more efficient than relationships instantiated at the class level. In effect one completely eliminates an index search in almost all situations. No matter how you gussy up the index one is looking at at least O(NlnN) overhead for class-based indexes. That's because the OO relationships are instantiated to optimize the specific problem in hand rather than ad hoc queries.
Any concurrency relevant to client-side applications is
completely different than the concurrency related to processing parallel
transactions in the DBMS.
I guess the client-side of an application does not have very much
concurrency to deal with at all.
Only if the application is employed in batch mode. Client applications like accounts payable, accounts receivable, inventory control, and GL are usually available interactively for multiple <programmatic> users nowadays.
The tuple-based relationships of the OO
paradigm work quite well in concurrent environments.
Do you have any pointers to some website explaining "tuple-based
relationships of the OO paradigm"?
Any OOA/D book will do. In an OO context an object maps relationally to a tuple in a class set relation. Relationships are defined at the class level but they are instantiated at the object level.
    1     R1      *
[A] --------------- [B]
If we have
A1 related to B2, B3
A2 related to B1
A3 related to B4, B5, B6
then A1 has a relationship collection of {B2, B3}; A2 has a relationship collection of {B1}; and A3 has a relationship collection of {B4, B5, B6}. When the relationship is navigated one only accesses the members of the [B] set that are specifically related to the A in hand. IOW, the navigation always accesses /only/ the relevant members of [B] for the A in hand.
That contrasts with the RDB-style class relationship where one potentially accesses every member of [B] to locate the members of [B] related to the A in hand. One can reduce the exhaustive search by providing a special index on the [B] set that is ordered for the specific query, but one still has an O(NlnN) search. One could also provide a "custom" index for every member of the [A] set just like the OO paradigm does routinely, but that would run out of resources pretty quickly.
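The contrast can be sketched in a few lines of Python using the A1..A3 and B1..B6 instances listed above. The class layout, the a_name referential attribute, and the two helper functions are hypothetical names invented for this illustration, and the class-level search shown is the unindexed, exhaustive case.

```python
class A:
    def __init__(self, name):
        self.name = name
        self.r1 = []          # R1 instantiated at the object level


class B:
    def __init__(self, name, a_name):
        self.name = name
        self.a_name = a_name  # foreign-key-style referential attribute


links = {"A1": ["B2", "B3"], "A2": ["B1"], "A3": ["B4", "B5", "B6"]}

a_by_name = {}
all_bs = []                   # the whole [B] set
for a_name, b_names in links.items():
    a = A(a_name)
    a_by_name[a_name] = a
    for b_name in b_names:
        b = B(b_name, a_name)
        all_bs.append(b)
        a.r1.append(b)


def find_by_class_search(a, b_set):
    """Class-level search: every member of [B] is examined."""
    examined = 0
    found = []
    for b in b_set:
        examined += 1
        if b.a_name == a.name:
            found.append(b.name)
    return found, examined


def find_by_navigation(a):
    """Object-level navigation: only the Bs related to this A are touched."""
    return [b.name for b in a.r1], len(a.r1)


searched, searched_cost = find_by_class_search(a_by_name["A1"], all_bs)
navigated, navigated_cost = find_by_navigation(a_by_name["A1"])
```

Both calls yield B2 and B3 for A1, but the search examines all six members of [B] while the navigation touches only the two in A1's own collection. An index on a_name would cut the cost of the search, at the price of maintaining that index for the whole [B] set.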
When designing a large OO application one /always/ looks for ways to eliminate searches, especially class-based searches. Using a class-based search without very good justification is a good way to get burned at the stake by OO reviewers.
[Apocryphal anecdote. Once upon a time we were tasked with speeding up an application that was horrendously slow. Eventually we improved performance by more than three orders of magnitude. However, the single biggest improvement was to eliminate searches of a single class index to find individual objects. When replaced with proper relationship instantiation there was improvement in overall performance of nearly two orders of magnitude.]
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@xxxxxxxxxxxxxxxxx for your copy.
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH