Re: OO vs. RDB challenge

kurbylogic_at_hotmail.com
Date: 03/22/05


Date: 22 Mar 2005 02:56:16 -0800

Costin Cozianu wrote:
> kurbylogic@hotmail.com wrote:
> > I'm kinda comming into the discussion a bit late, so I appologize
if I
> > reiterate points already made, but here goes.
> >
> > First I completely disagree with the idea that if we only had
enough
> > persistant ram we wouldn't want or need a relational store. Its
rather
> > about choosing the correct tool for the task at hand. If you have
a
> > nail use a hammer if you have a screw use a screw-driver. SQL is
> > designed to store and query large sets of structured data.
Howerver, it
> > isn't ideally suited for complex calculations or polymorphic
behavior.
> >
> >
>
> Ahem, you're right about complex calculation, not about "polymorphic
> behavior" -- there's no such thing.
>
> > So you've figured out how to prersist users/groups and their
> > relationships in a database without any OO code... well done what
do
> > you indend to do with it? I assume you are storing this data
because
> > you want to do something with it, correct? Perhaps this data is
used
> > to "authorize" a user for a specific task? perhaps they need they
need
> > to be a member of the "Lender" group before they are allowed to
approve
> > a loan application? Simple enough we can query if the user is IN
> > ('Lender') before allowing them do anything with Loan tables. But
lets
> > complicate things a bit shall we?
>
> Sure, let's. On the other hand, I shall not commit the imprudence of
> debating complex things until the simple ones have not been settled.
>
> > We decide we that simple groups are
> > not sufficient we need a bit more granularity so we want to assign
> > "rights" such as 'Approve Loan' to a user OR a group and allow
groups
> > to contain groups to make management of user rights easier.
> >
> > Now give me a simple query that can be used to determine if the
user
> > has the "Approve Loan" right granted directly to the user or any
groups
> > he is a member of and any groups those groups might be a member
of...
> > also remember that deny overrides any grant at any level).
> >
>
> Well, that is rather for the more sophisticated and well read DB
guru.
> However, technically it's not big deal after all.
>
> It boils down to
> SELECT right FROM TC_GROUPS tc JOIN GROUP_RIGHTS gr ON
(tc.in_group_id=
> gr.group_id)
> WHERE tc.group_id IN
> (SELECT group_id from users_groups ug WHERE ug.user_id= :user_id))
> AND right='Approve Loan';
>
> Where TC_GROUPS(group_id, in_group_id) is the transitive closure of
> group membership, the only relevant problem remaining being the
> maintenance of transitive closure. This is left as an exercise for
the
> reader.
>
> But before asking me to post a complete solution, could I see the OO
> counterpart ?
>
> > Lets add a bit more type dependant (polymorphic) functionality
shall
> > we... Our fictitious lender offers many different types of loans,
home
> > loans have greater risk so not all users that can approve a loan
can
> > approve all types of loans we have decided to create permissions
for
> > different types of loans and/or the value of the loan... home loans
> > require explicit approval by a user with the "Approve Home Loan"
right,
> > otherwise "Approve Loan" permission is good enough.
> >
> >
>
> That is rather trivial.
>
> > While your at it I (roleplaying customer) need an amoritization
> > schedule (the loan applicaiton has the principal loan amount,
annual
> > interest rate, loan duration and payment schedule (monthly,
quarterly,
> > weekly, etc)). If you can write an sproc or PL/SQL script for me
to do
> > that would be cool then I don't need to waste our presious network
> > resources to transmit the loan objects to the application tier.
> >
>
> Surely you don't mean to say that loan rates, amortizations, etc. are

> "complex calculations" ?
>
> >
> > I think your on to something with this SQL approach, we can give
all
> > our loan reps a Query Analyzer and if they want to create a loan
> > application they can just type exec sp_createLoan('... oh wait
customer
> > first exec sp_createCustomer('first.. wait I think last name goes
> > first... well nevermind guess we to create a gui of some sort
> > perhaps... what language did they use to create Query Analyzer (or
the
> > database engine for that matter)? I guess we need a high level
> > language for somethings, ah well.
>
> I guess you might be missing here that PL/SQL is a pretty high level
> language, and especially with regards to distinguishing positional
> parameters is better designed than java or C# or C++, because in
PL_SQL
> you can just say
> sp_xxx(
> ...
> first_name => 'Costin',
> last_name => 'Cozianu',
> ...
> );
>
> whereas in most OO languages you simply cannot name your parameters.
I
> guess this is an instance of ignorant irony that backfires.
>
>
> > The amoritization schedule you can
> > still do for me right?
>
> Right, but only after I'll have decided you're not an incompetent
troll.
> Given the quality of portions of your post, I am rather suspicious.
>
> > and you've already got me convinced the db is
> > the best place for this user/group thing, the additional complexity
> > isn't to much for you is it?
> >
> >
> > And about this thread saftey issue... I guess we don't need to
worry to
> > much about thread synchronization at the database level but I'm a
bit
> > concerned about multiple users working with the same data at the
same
> > time. I was reading something the other day about 'dirty data' and
> > 'phantom data' such as when a user inserts a row and then another
user
> > reads that row and then the user that inserted that row aborts his
> > transaction, are you familiar with this? I checked the docs and by
> > default SQL does not prevent phantom data! I guess they have
provided
> > an issolation level called serializable that prevents phantom data
but
> > the documentation says "use only when necessary." When do you
suppose
> > this is necessary? I don't want to generate a report and send it
off
> > for funding that that says we have approved a loan when that loan
> > approval hasn't even been commited yet! I'm torn about this
> > user/group/right scenario though, do you think we need to be
concerned
> > about dirty data and phantom reads?
> > Suppose we create this gui application and this application allows
the
> > administrator to change the user right assignments by checking a
few
> > boxes or something and then save his changes, I guess it probably
would
> > replace the existing options with the ones he is viewing right? so
if
> > he perhaps checks the deny on the 'approve home loan' box and
another
> > administrator was told to grant him permission to 'approve auto
loan',
> > last person to save wins right and he would still be authorized to
> > approve home loans? I guess we could just keep the transaction
open
> > with serializable issolation the entire time the administrator is
> > viewing those options so that the second admin isn't even allowed
to
> > view the data until the first admin finishes... but if we do that
we
> > may not be able to approve other users actions either and are
entire
> > system could come to a halt waiting on this one admin. I was
looking
> > into some alternatives, what do you think about optimistic
concurrency
> > control? this doesn't seem to be a 'native' database feature. Its
> > rather simplistic in theory though, when you read the data you also
> > include a version number and then when you do the update you
include
> > that same version number if it changed then you know its been
modified
> > by someone else, so the only lock that we need in this scenerio is
> > during the update.
>
>
> I guess you are more confused about the above scenario.
>
> If an admin takes away one user's right after that user has already
> performed an action it was previously authorized for, but no longer
> subsequently, then the action stays. The session that needs to be
> working under "serializable" is that of the user, not that of the
admin.
>
> Just because you've read about "optimistic concurrency" in some
> "popularization of science" books (I know how some OO circles felt in

> love with the idea) doesn't mean you understand the issues. And given

> how you framed the above problem I suspect a reading from a more
serious
> book (like "Transactional Information Systems")would help you think
> about these types of problems in a much sounder way.
>
>
> > What was it again in high-level languages you found
> > difficult (ArrayList.Synchronized method)?
> >
> >
>
> What was that you found easy ?
>
> Did you find it easy to deadlock, or did you find it easy to arrive
at
> inconsistent results inspite going through all kinds of locks ?
>
>
> >
> > Anyways, relational storage wont be going away anytime soon perhaps
> > never, the decomposition of the data makes it possible to construct
> > very effecient queries over large sets of data. Currently searching
> > object graphs is more difficult and less efficient then it could
be,
> > for example searching for all customers located in 'AZ' its not to
> > write a foreach loop, however it gets old and its somewhat
hard-coded
> > making arbitrary queries more difficult to implement efficiently, I
> > suspect someday technologies like XQuery might become a native part
of
> > the language allowing easy creation of arbitrary queries on object
> > graphs. Memory managers probably need some work in this area as
well,
> > ts not very efficient to query object graphs compared to a
database,
> > primarily because of the way that memory is allocated, objects are
> > allocated onto the heap in a somewhat arbitrary location based on
how
> > much data you want to allocate and what it has available, there is
no
> > gaurantee that objects of the same type will be located remotely
close
> > to each other in memory or in memory at all (paged to disk), so a
> > statment like foreach customer in getcustomers() { if
> > customer.address.state = "AZ" list.add(customer); } could result in
> > lots of memory paging. The database manages the memory much
> > differently loading records of the same type close together in
"pages"
> > with fixed record lengths allowing quick navigation from one
address
> > record to the next address record. I suspect someday heap
allocation
> > for objects might become more optimized for querying done more like
a
> > database where objects of the same type will be allocated in the
same
> > location (inheritance I'm sure complicates this but perhaps some
form
> > of replication of base class properties as one might denormalize a
> > database could work but I'm not an expert on that kinda thing
anyway).
> > Databases however are learning to work better with (not to
eliminate)
> > OO apps, in a way that is more object graph friendly, SQL 2005 will
> > allow multiple active result sets or xml resultsets provide ways
make
> > it easier to read an object and its children in one step from
multiple
> > tables simultaniously rather then needing to read all customer data
> > then go back to each customer previously read to add the address
> > record. Asynchronous notification makes it possible to be informed
> > when things change rather then needing to repeadedly read the
database
> > to ensure you are most up to date. SQL 2005 has some useful
features
> > to help bridge the gap between OO and relational, each tool has a
> > purpose neither trying to elimiate the other just the gap, the goal
is
> > rather to get "seamless integration" so they play nice giving the
> > developer the most productive tool for the current task without
having
> > to pick one or the other and/or become a bridge builder.
> >
> > - Kurt
> >
>
> Nice deluge of irrelevant 2c that I cannot make any sense of. I'd
rather
> recommend to you and all contributors to this thread to stay strictly

> technical and within your areas of competence, refraining from
> unwarranted speculations.
>
> For example it would have helped if rather than being unsubtly ironic

> about stuff you think databases cannot handle, you would have tried a

> more prudent tone along the lines "In my experience, I found
difficult
> addressing this kind of issues". Because it may turn out that irony
> backfires and bites you somewhere unpleasant .
>
> And again, before complicating this challenge to no end, I'd rather
see
> the response to the simple challenge, and it would also help if
> proponents of new challenges would have already worked out one
solution
> using their technology/approach of their choice so that they can post
it
> and/or estimate the complexity, rather than playing it for a sport.

The point I wanted to make is that one should choose the correct tool
for the correct task... I don't store users and groups and/or maintain
relationships just to have users and groups I do so only because I want
to do something with them, such as authorize an action, in OO the
complexity of mapping database tables to objects is useful only if the
object has behavior, such as Loan.Approve() that involves more then
just updating a field. If all you want to do is create a list of users
and groups and maintain some relationship between them then an OO
solution has little value and given the choice between a database or OO
solution to the problem I too would use the database, nearly every
application I write anymore involves a database as part of the
solution, but a database alone cannot provide a complete end-to-end
solution.
The amoritization schedule is a recursive operation and why I presented
it not because it is inherantly "complex" the resulting code would be a
more procedural rather then set oriented operation, so why not use a
procedural language.
The database might be "thread-safe" but thread-saftey does not prevent
data in all cases from becoming inconsistant. I phrased my response
roleplaying a somewhat ignorant customer that has concerns... prompting
you the expert to explain how "thread-saftey" ensures options cannot be
overwritten as may sometimes happen if data is copied (perhaps to a
temp table) and then updates are made based on stale data, or how
thread-saftey prevents non-commited records from appearing in another
users query (such as reporting a new loanapproval record that has not
yet been commited). It is not thread-saftey alone that ensures data
concistancy, it must be coupled with issolation, issolation is the
easiest and probably most common OO approach to concurrency issues in
object solutions, instead of locking making deadlock detection a
useless feature. User u = new User(); u.SetName(name); is a completely
thread safe operation because that user object is completely invisible
to other threads. Replacing one object instance with another object
instance is completely atomic operation, this doesn't have to be a
simple object it can be a composite object with as much or as little
state as you need to ensure that an all or nothing set operation
results in a consistant system state without requireing any locks at
all. This is not true with the database where inserts that have not
yet commited can be seen by other or users, data possibly used to
perform an operation such as transferring reducing the total funds
availabe for lending based on data that has possibly been rolled back
leaving the system in a inconsistant state.

So in response to your initial challenge, the following addresses ONLY
your solution as you insist and is thread safe (policy class does not
need to be thread safe because a thread can only get an instance of it
via new Policy() or from the SecurityManager which returns a new
instance with a copy of the current state therefore it is issolated
from other threads and does not require any locking)

public class SecurityManager
{
  static Policy policy = new Policy(); // initially empty
  public static Policy GetPolicy()
  { return policy.Clone(); }
  public static SetPolicy(Policy p)
  { policy = value.Clone(); } // atomic operation, no lock required

  public static bool IsUserInGroup(string user, string group)
  {
    return policy.IsUserInGroup(user, group);
  }
}
public class Policy
{
  ArrayList users = new ArrayList();
  ArrayList groups = new ArrayList();
  ArrayList usergroups = new ArrayList();

  public void AddUser(string user) { users.Add(user); }
  public void AddGroup(string group) { groups.Add(group); }
  public void RemoveUser(string user) { users.Remove(user); }
  public void RemoveGroup(string group) { groups.Remove(group); }

  public void AssignUserToGroup(string user, string group)
  {
    if(users.Contains(user) && groups.Contains(group))
      usergroups.Add(new UserGroupAssignment(user, group));
  }
  public void RemoveUserFromGroup(string user, string group)
  {
    usergroups.Remove(new UserGroupAssignment(user, group));
  }
  public bool IsUserInGroup(string user, string group)
  {
    return usergroups.Contains(new UserGroupAssignment(user, group));
  }
  public Policy Clone()
  {
    Policy p = new Policy();
    p.users = (ArrayList)users.Clone();
    p.groups = (ArrayList)groups.Clone();
    p.usergroups = (ArrayList)usergroups.Clone();
    return p;
  }
  struct UserGroupAssignment
  {
    string user;
    string group;
    public UserGroupAssignment(string user, string group)
    {
      this.user = user;
      this.group = group;
    }
  }
}

You refuse to add any more complexity to this task, yet bring up yet
bring up "deadlock detection" which isn't even remotely necessary in a
solution that requires no locks. You claim the OO solution requies
more code for THIS - ONLY THIS problem compared to a create table
creation script, come-on really?
It is only slightly longer than your table creation script, but you did
not include any procedure creation scripts therefore I assume that you
must expect the client to query the user/group tables directly...
If/When we decied to add groups into groups all changes are issolated
to the policy class, you would have to update all of your "only one
statement..." in dozens if not hundreds of locations, I have only one
location to make changes to, the code used by my client remains the
same. You could and probably would create an sproc or whatever but by
the time you have written the Create Procedure ... As ... for each of
those "in one statement...". If you include the SQL to create
procedures (especially if you add the standard if exist drop statments
to your table and procedure creation scripts) than your code would be
just as long as mine! I suppose I could eliminate the methods on my
class and expose the private members to reduce the lines in my code as
you have to some extent done, we've learned the lessons about exposing
implementation details, so if you want to compare apples to apples (to
the extent possible) add the code for the procedures, how many lines
have we now? Is it more then 35 (my total solution not including the
{}s).

Relational databases aren't going away, but not because they are less
complex. I guess you can call SQL is a "high-level" language, its
purpose is however different, it is a set oriented language it works
well with sets of structured data, OO languages provide polymorphic
behavior which you deny even exists, polymorphic behavior is a large
part of why OO languages were created in the first place and why they
have become "mainstream" languages. So if you are going to complain an
many-many relationship in OO is *too complex* (which of the 35 lines
was complex?) then abandon OO write all your business logic in in SQL,
write a real application that does something with those users and
groups you have created and then try and maintain it for the next 5 or
so years, enhance and extend it for new requirements so it can last
another 5 or so years. Then maybe you too will be asking why can't I
just save the current state of my system into the database without
having to to shread an order object it into line items, billing and
shipping address records, why can't I say load the current state of the
system without having to reconstruct object graphs from dozens of
seperate tables. If persistance was the only useful service provided
by a relational database then I suppose more persistant memory is all
one would need. But obviously you are not interested in the
big-picture just the overwelming complexity in the 35 lines of code to
implement a many-many user group relationship. So perhaps you should
"stay strictly technical and within your areas of competence,
refraining from unwarranted speculations" regarding the limitations or
usefulness of object databases and/or object relational mappers.
Someone who does not see value in an OO language either because he
refuses to or cannot face the inevitable complexity of any real
application, or because the "complexity" of a many-many relationship is
too much to handle, than I couldn't possible expect him to see any
value in an object database either. You clearly have no appreciation
for its need. I'm not really sure what the purpose of your thread is,
your OO solution has been given not only by me and requires only a few
more (if any in practice) lines of code and isn't remotely complex. I
would be interested in an discussion about the value of object database
solutions and limitations of object databases, but as you openly
admitted you saw no relevance and have no understanding of the progress
or issues involved. Unless you have a more substantial argument
against them, I don't see any value in your opinion on this subject.

- Kurt