OO versus RDB



I have a proposal for when to choose OO and when to choose RDB.


Consider the following class

class String
{
public:
    // ... various accessor methods ...

private:
    char* buffer;
    int size;
};

This is an example of where OO shines. An instance of a string is a
linear array of characters - nothing more, nothing less.

To represent strings relationally would be hideous. For example (using
PROLOG notation)

stringCharacterAtPos(StringId, Index, Char) :-
Char is the character at position Index in string StringId

denotes a predicate (also called relation) that is able to
simultaneously represent the contents of any number of strings.
However it is woefully inefficient, and simple operations like
inserting into a string require many records to be changed.
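
To make the cost concrete, here is a minimal C++ sketch of the stringCharacterAtPos relation held as a set of (StringId, Index, Char) records. The names CharRelation, StoreString, InsertChar and ReadString are my own hypothetical choices, and an in-memory std::map merely stands in for a real table.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// One record per character: key is (StringId, Index), value is Char.
// Hypothetical stand-in for a stringCharacterAtPos table.
using CharRelation = std::map<std::pair<int, int>, char>;

// Store a whole string, one record per character.
void StoreString(CharRelation& rel, int stringId, const std::string& text)
{
    for (std::size_t i = 0; i < text.size(); ++i)
        rel[std::make_pair(stringId, static_cast<int>(i))] = text[i];
}

// Reassemble the string by scanning its records in Index order.
std::string ReadString(const CharRelation& rel, int stringId)
{
    std::string out;
    for (auto it = rel.lower_bound(std::make_pair(stringId, 0));
         it != rel.end() && it->first.first == stringId; ++it)
        out += it->second;
    return out;
}

// Insert ch at position pos and return the number of records written.
// Every record at or after pos has to be rewritten with Index + 1.
int InsertChar(CharRelation& rel, int stringId, int pos, char ch)
{
    const int len = static_cast<int>(ReadString(rel, stringId).size());
    assert(pos >= 0 && pos <= len);
    int written = 0;
    for (int i = len - 1; i >= pos; --i) {
        rel[std::make_pair(stringId, i + 1)] = rel[std::make_pair(stringId, i)];
        ++written;
    }
    rel[std::make_pair(stringId, pos)] = ch;
    return written + 1;
}
```

Inserting one character at the front of a five-character string touches six records here, whereas the String class above only has to shift bytes within a single buffer.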


Now consider this class, where OO falls over!

class SalariedEmployee
{
public:
    virtual float GetPay() const { return salary; }
private:
    String name;
    float salary;
};

This claims that a SalariedEmployee equals the composite of a string
name and float salary. Well that is a lie - a SalariedEmployee is no
such thing! Actually a SalariedEmployee is a human in the real world,
not a simple data structure stored in computer memory or persisted on
disk.

Now that is obvious and we all know that OO implicitly models real
world entities all the time. Nevertheless, the above class is strictly
speaking a lie and should instead be renamed something like
SalariedEmployeeModel, to make it clear that it is merely a model of
something else.

A central idea in OO is that objects have identity. This is where
problems begin! The identity of an instance of a SalariedEmployeeModel
is distinct from the identity of the associated human being modeled.
This is clear when we see that we can simultaneously have two different
models of the *same* human. Eg

class SalesEmployeeModel
{
public:
    virtual float GetPay() const
    {
        return numSales*commissionPerSale;
    }
private:
    String name;
    int numSales;
    float commissionPerSale;
};

Now OO designs avoid at all costs using multiple run time instances to
simultaneously model the one entity. Otherwise we can't have a single
polymorphic GetPay() function on a single object that adds together all
the sources of income of a given individual. Hence OO uses various
techniques to allow this to be achieved. Foremost is the idea to use
inheritance. For example, we can make a SalesEmployeeModel inherit
from SalariedEmployeeModel, like this

class SalesEmployeeModel : public SalariedEmployeeModel
{
public:
    virtual float GetPay() const
    {
        return SalariedEmployeeModel::GetPay() +
            numSales*commissionPerSale;
    }
private:
    int numSales;
    float commissionPerSale;
};

Unfortunately there are significant problems with this approach:

1. We very quickly need multiple inheritance, and this presents a
number of problems. Many OO languages, such as Java and C#, don't even
support multiple inheritance of implementation.

2. Static typing is incompatible with dynamic changes. Eg what happens
if an individual quits his job? Statically typed OO languages don't
allow an object to change its class at run time.

3. What happens when an individual has more than one job at a time?

Some of these problems are eased by template mixins. However, that
doesn't change the fact that static typing approaches don't (easily)
allow for circumstances to change over time.
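
Problem 2 can be sketched in C++ as follows. This is only an illustration under my own assumptions: the abstract EmployeeModel base, the MoveToSales helper and the constructors are hypothetical, and the hierarchy is simplified compared to the classes above. The point is that "changing type" really means building a replacement object and throwing the original away.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common base, introduced only for this sketch.
class EmployeeModel
{
public:
    explicit EmployeeModel(std::string name) : name_(std::move(name)) {}
    virtual ~EmployeeModel() = default;
    virtual float GetPay() const = 0;
    const std::string& Name() const { return name_; }
private:
    std::string name_;
};

class SalariedEmployeeModel : public EmployeeModel
{
public:
    SalariedEmployeeModel(std::string name, float salary)
        : EmployeeModel(std::move(name)), salary_(salary) {}
    float GetPay() const override { return salary_; }
private:
    float salary_;
};

class SalesEmployeeModel : public EmployeeModel
{
public:
    SalesEmployeeModel(std::string name, int numSales, float commissionPerSale)
        : EmployeeModel(std::move(name)),
          numSales_(numSales), commissionPerSale_(commissionPerSale) {}
    float GetPay() const override { return numSales_ * commissionPerSale_; }
private:
    int numSales_;
    float commissionPerSale_;
};

// The object cannot change its class, so we construct a new model of the
// same human and let the old one be destroyed when 'old' goes out of scope.
std::unique_ptr<EmployeeModel> MoveToSales(std::unique_ptr<EmployeeModel> old,
                                           int numSales, float commissionPerSale)
{
    assert(old != nullptr);
    return std::make_unique<SalesEmployeeModel>(old->Name(), numSales,
                                                commissionPerSale);
}
```

Anything else that held a pointer or reference to the old model is now left dangling, even though the human being modeled has not changed identity at all.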

As we model more and more information about people, the class hierarchy
(ie OO approach) quickly becomes untenable. A person can be all sorts
of things at once. Eg a father, son, employee, mechanic, thief,
manager and lotto winner.

By contrast the relational approach is great at storing arbitrary
amounts of knowledge about humans. For example, the following
predicates allow family trees to be represented:

parent(Parent,Child) :- Parent is a parent of Child
male(Person) :- Person is a male
female(Person) :- Person is a female
born(Person,Date) :- Person was born on Date
died(Person,Date) :- Person died on Date
married(Person1, Person2, Date) :-
Person1 and Person2 were married on Date
divorced(Person1, Person2, Date) :-
Person1 and Person2 were divorced on Date

Independently of this, the following predicate

salariedEmployee(Person,Company,Salary) :-
Person is an employee of Company with Salary

can be defined, allowing a person to be a salaried employee of zero or
more companies.
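
As a rough C++ sketch of the same idea, the salariedEmployee relation can be held as a plain list of facts. The struct and function names (SalariedEmployeeFact, AddJob, JobCount, TotalSalary) are hypothetical, and a std::vector stands in for the table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One record of salariedEmployee(Person, Company, Salary).
struct SalariedEmployeeFact
{
    std::string person;
    std::string company;
    double salary;
};

using SalariedEmployeeTable = std::vector<SalariedEmployeeFact>;

// Assert a new fact. A person may appear in any number of records.
void AddJob(SalariedEmployeeTable& table, const std::string& person,
            const std::string& company, double salary)
{
    assert(salary >= 0.0);
    table.push_back({person, company, salary});
}

// Number of companies the person is a salaried employee of (may be zero).
int JobCount(const SalariedEmployeeTable& table, const std::string& person)
{
    int count = 0;
    for (const auto& fact : table)
        if (fact.person == person) ++count;
    return count;
}

// Sum of the person's salaries across all companies.
double TotalSalary(const SalariedEmployeeTable& table, const std::string& person)
{
    double total = 0.0;
    for (const auto& fact : table)
        if (fact.person == person) total += fact.salary;
    return total;
}
```

Someone with two jobs is just two records, and someone with no job is simply absent from the table. No type has to change in either case.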

Note that by having lots of fine-grained predicates, we distribute
knowledge about a person amongst many records in many tables. This is
in contrast to the OO approach that strives to collect all attributes
about a person into a single object. This is its own undoing, because
knowledge about external entities is inevitably open ended and dynamic
in nature, and therefore not amenable to the static type analysis used
by OO.

Fine-grained predicates deal naturally with partial information. Eg if
we don't know the date when someone was born we simply don't put a
record into the born(Person,Date) table. Despite this, that person can
still get married and have children!
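
A minimal C++ sketch of this, assuming two in-memory tables for born and parent (FamilyFacts, AddParent, BirthDate and ChildrenOf are hypothetical names of mine, and dates are kept as plain strings for brevity):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct FamilyFacts
{
    std::map<std::string, std::string> born;                  // born(Person, Date)
    std::vector<std::pair<std::string, std::string>> parent;  // parent(Parent, Child)
};

// Record parent(Parent, Child). Neither person needs a born record.
void AddParent(FamilyFacts& facts, const std::string& parent,
               const std::string& child)
{
    assert(parent != child);  // nobody is their own parent
    facts.parent.emplace_back(parent, child);
}

// Returns the recorded birth date, or nullptr when there is no
// born(Person, Date) fact for this person.
const std::string* BirthDate(const FamilyFacts& facts, const std::string& person)
{
    auto it = facts.born.find(person);
    return it == facts.born.end() ? nullptr : &it->second;
}

// All Child values for which parent(person, Child) holds.
std::vector<std::string> ChildrenOf(const FamilyFacts& facts,
                                    const std::string& person)
{
    std::vector<std::string> children;
    for (const auto& fact : facts.parent)
        if (fact.first == person) children.push_back(fact.second);
    return children;
}
```

An unknown birth date is the absence of a record, not a null field inside some Person object, and the parent table works regardless.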

Records in relational databases are declarative in nature. In the
terminology of PROLOG every record represents a fact. For example

father(abraham, isaac).

states the declarative fact that abraham is the father of isaac. In a
sense, a relational database is nothing more than a big collection of
facts. It is easy to reason about the truth of every record,
independently of all other tables, or even other records in the same
table. By contrast, an OO model enjoys no such simple semantic basis.
The declarative meaning of an OO program is far more intangible. It
requires analysis of the design documentation, the comments and the
methods. Not surprisingly, it is easy for OO models to create
confusion or to hide subtle errors.

The strong semantics behind the relational model allow the database
engine to support advanced forms of query. By contrast, an OO class is
usually thought of as encapsulating its internal state, so the system
has no way of knowing what that state means and cannot query it in
general.
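
As a rough illustration of the kind of query that falls out of the relational view, here is a C++ sketch of a "grandparent" query written as a self-join over the parent relation. ParentTable and GrandparentsOf are hypothetical names, and a real database engine would of course plan and execute the join itself rather than use nested loops like these.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each element is one parent(Parent, Child) fact.
using ParentTable = std::vector<std::pair<std::string, std::string>>;

// Everyone G such that parent(G, P) and parent(P, person) both hold.
std::set<std::string> GrandparentsOf(const ParentTable& parent,
                                     const std::string& person)
{
    assert(!person.empty());
    std::set<std::string> result;
    for (const auto& lower : parent) {
        if (lower.second != person) continue;      // lower.first is a parent
        for (const auto& upper : parent)
            if (upper.second == lower.first)       // upper.first is a grandparent
                result.insert(upper.first);
    }
    return result;
}
```

The query needs nothing except the facts themselves. No class had to anticipate it by exposing a GetGrandparents() method.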

Proponents of OO models say that OO's support for inheritance is a
significant advantage over relational models (which don't have any
concept of a record in one table inheriting from a record in a
different table). However, at least as far as the state representation
goes, the reverse seems to be the case. An arbitrarily complex
classification taxonomy can be represented simply, easily and
efficiently with a suitable set of predicates. This representation is
immune from questions of whether classifications are dynamic or static
in nature.
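
Here is one way this might look, sketched in C++ under my own assumptions: two hypothetical predicates, subclassOf(Sub, Super) and isA(Entity, Class), are stored as sets of facts, and classification is answered by following subclassOf transitively.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Taxonomy
{
    std::set<std::pair<std::string, std::string>> subclassOf;  // subclassOf(Sub, Super)
    std::set<std::pair<std::string, std::string>> isA;         // isA(Entity, Class)
};

// True when cls equals ancestor or reaches it through subclassOf facts.
bool IsSubclassOrSame(const Taxonomy& t, const std::string& cls,
                      const std::string& ancestor)
{
    std::set<std::string> visited;
    std::vector<std::string> pending{cls};
    while (!pending.empty()) {
        const std::string current = pending.back();
        pending.pop_back();
        if (current == ancestor) return true;
        if (!visited.insert(current).second) continue;  // already explored
        for (const auto& fact : t.subclassOf)
            if (fact.first == current) pending.push_back(fact.second);
    }
    return false;
}

// True when some isA(entity, C) fact exists with C at or below cls.
bool Classified(const Taxonomy& t, const std::string& entity,
                const std::string& cls)
{
    assert(!entity.empty() && !cls.empty());
    for (const auto& fact : t.isA)
        if (fact.first == entity && IsSubclassOrSame(t, fact.second, cls))
            return true;
    return false;
}
```

An entity can carry any number of classifications at once, and reclassifying it at run time is just inserting or erasing a fact.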

Some OO proponents have wrongly stated that the relational model is
inefficient because it needs to use big tables (with many fields), and
for a given record many of the fields are null to indicate that they
are not relevant to that record. However this is often a misuse of the
relational model. A better design is to use lots of fine-grained
predicates, and remember that table records are not supposed to
represent objects, but instead merely represent facts. Keeping this
in mind makes it clear that information about a single entity can and
often should be distributed across many records in many different
tables.

Conclusions...

I mostly seem to be attacking OO, but that is only because it is
obvious that relational models shouldn't be expected to model
everything - such as characters in a string or pixels in an image.
However it is more difficult to argue with the OO extremists because OO
is rather versatile!

The central thing I'm saying is that OO is great for creating "things"
that reside in memory and combine state and behaviour. A good example
is a GUI element like a button. The combination of hardware + software
allows a GUI element to be a real "device", a thing that is what it is,
and not pretending to be something else.

But when OO tries to model external entities, it loses a lot of its
appeal.


David Barrett-Lennard
