Re: OO vs. RDB challenge

alex99_at_medcentral.com.au
Date: 03/20/05


Date: 20 Mar 2005 04:24:51 -0800

frebe wrote:
> alex99@medcentral.com.au wrote:
> > In essence I would have layers:-
> > Layer 1. Get info from diverse source(s),
> > Layer 2. Apply filters,
> > Layer 3. Format and make pretty enough for presentation.
>
> If you filter your data after you retrieve it, you will have to
> traverse every object and you will have a linear search. Doing this is
> the same as dismissing 40 years of computer science.
>

I'm sorry about the length of my replies but it's a huge topic.

Allow me to paste back some important bits that got snipped:-

I said:
"In the end, I only use DB's if my client demands the extra sex appeal
that comes by having one and not because they do anything for me."

You asked:
"How would you do without a RDBMS?"

* A fair question, which I aim to answer. However since there are many
non-database technologies I can't get into great technical detail.

So instead I chose to walk through some real world scenarios in order
to at least show how you can do without a database ...

I also said:
"The information might be in a CSV file, XML, dbmfile, CORBA, MQ, EJB,
LDAP, database, web page, spreadsheet ... and therefore the solution
will vary."

The filtering approach is not universal; it depends on the technology.
It would not be required with Corba or MQ, but it is one approach I find
myself using often.

For example, if the supplier of my information only gives me an XML or
CSV file, what am I to do? A linear scan of course.
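To make the three layers concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the sample data, the column names (id, name, state) and the function names are all made up, and a real source could just as well be XML or something else.

```python
import csv
import io

def read_rows(source):
    """Layer 1: pull rows from the source (here a CSV file-like object)."""
    yield from csv.DictReader(source)

def apply_filters(rows, predicates):
    """Layer 2: keep a row only if every predicate accepts it (one linear pass)."""
    for row in rows:
        if all(pred(row) for pred in predicates):
            yield row

def format_rows(rows, columns):
    """Layer 3: turn the surviving rows into printable lines."""
    for row in rows:
        yield " | ".join(row[col] for col in columns)

# Made-up sample data standing in for a supplier's CSV file.
SAMPLE = "id,name,state\n1,Ann,NSW\n2,Bob,VIC\n3,Cy,NSW\n"

def report(text=SAMPLE):
    rows = read_rows(io.StringIO(text))
    kept = apply_filters(rows, [lambda r: r["state"] == "NSW"])
    return list(format_rows(kept, ["id", "name"]))
```

Because each layer is a generator, the file is streamed once and never held in memory as a whole, which is about the best a linear scan can do.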

> > SQL might help with Layer 2 but only if the query is simple enough.
> > If not you need procedural or OO language help after all.
>
> I have seen many examples of this case, where a procedural language
> has to implement a filter because SQL can't do it. But in every case
> it was possible to redesign the database schema to enable SQL to do
> its job.

I've worked in organizations that have millions of customers and
hundreds of thousands of employees, with huge databases - no redesign
was going to happen.

At any rate there are some difficult queries that are best done with
procedural help.

Or they can be computed much more quickly if done externally to the
database.

> If you do a linear procedural filtering, your performance will be
> bad.

Maybe, then again maybe not.

I have many examples when performance was actually much better.

We were supplied a report that took 21 hours to run. It used the
database vendor's report writer and SQL to make all the appropriate
joins and queries. 21 hours!

We dumped all the required tables to CSV, which only took minutes. Then
we performed the join ourselves using AWK and computed the same result
in 19 minutes! Then for fun we did the same job in C, and it took
seconds. Our performance was very good.
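For readers who have not done a join outside the database, here is a minimal sketch of one common way to do it, a hash join over two CSV dumps. It is written in Python purely for illustration; it is not the AWK or C code from that job, the table and column names are made up, and it makes no claim about timings.

```python
import csv
import io

def hash_join(left_rows, right_rows, key):
    """Join two row iterables on `key`: index the right side, stream the left."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    for left in left_rows:
        for right in index.get(left[key], []):
            # Merge the matching rows; left-hand values win on a name clash.
            yield {**right, **left}

# Made-up dumps standing in for two exported tables.
CUSTOMERS = "cust_id,name\n1,Ann\n2,Bob\n3,Cy\n"
ORDERS = "order_id,cust_id,total\n10,1,5.00\n11,3,7.50\n12,1,2.25\n"

def joined():
    customers = csv.DictReader(io.StringIO(CUSTOMERS))
    orders = csv.DictReader(io.StringIO(ORDERS))
    return [
        (r["order_id"], r["name"], r["total"])
        for r in hash_join(orders, customers, "cust_id")
    ]
```

The smaller table is read once into a dictionary and the larger one is streamed past it, so each file is scanned exactly once.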

The thing is, database optimizers, indexes and joins can easily be
beaten if you think about the specific job at hand.

> > The DBA team will not allow you to make any ad-hoc queries.
>
> I did not claim this to be an ad-hoc query. And why would the DBA
> team not allow that?

Every DBA I ever met "owns the database"; the rest of us can only dream
of touching it.

Funny thing is when I was a DBA I did the same thing (blush).

> > You have 25 million customers and they don't like programmers poking
> > with production systems. They eventually agree to FTP to your machine
> > a compressed CSV dump of the customer table, they do that at 3AM
> > Sunday mornings.

> You have a database which is only accessible by an export utility? I
> did not say I should be poking with the database. I'm talking about a
> new feature in your application.

I didn't realize you're talking about a new feature in an application.

> I assume that you have a development
> database to work with.

Actually I may have no database at all, maybe only a Corba object or an
MQ....

> > You create a very nice page, but it's too long, it hangs the browser,
> > we need to paginate the result. There were 7 million rows so the
> > network guys are screaming.
> > How do we do the paginated query?
>
> If you look at Butler (http://butler.sourceforge.net), you can see how
> a paging query (ScrollQuery), based on any other query, is done. It is
> a little bit complicated, but very deterministic. Any good SQL
> framework should have a function for creating paging queries.

Yes I had a brief look at your work, I liked what I saw.
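For anyone following along, here is a minimal sketch of one common way to page a query in both directions (keyset paging), using Python's sqlite3 module. It is not Butler's ScrollQuery, which I have not reproduced here; the customer table, its columns and the function names are made up, and ids are assumed to be positive integers.

```python
import sqlite3

def fetch_page(conn, page_size, after_id=None, before_id=None):
    """Return one page of (id, name) rows in ascending id order.

    after_id  -> next page: rows with id > after_id
    before_id -> previous page: rows with id < before_id
    """
    if before_id is not None:
        # Going backwards: flip the operator and the sort order, then
        # reverse the rows so the page still reads in ascending order.
        rows = conn.execute(
            "SELECT id, name FROM customer WHERE id < ? ORDER BY id DESC LIMIT ?",
            (before_id, page_size),
        ).fetchall()
        return rows[::-1]
    start = -1 if after_id is None else after_id  # assumes ids are >= 0
    return conn.execute(
        "SELECT id, name FROM customer WHERE id > ? ORDER BY id ASC LIMIT ?",
        (start, page_size),
    ).fetchall()

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(i, f"c{i}") for i in range(1, 8)],
    )
    first = fetch_page(conn, 3)
    second = fetch_page(conn, 3, after_id=first[-1][0])
    back = fetch_page(conn, 3, before_id=second[0][0])
    conn.close()
    return first, second, back
```

Each page is found from the last key seen rather than from a row offset, so the database never has to ship or skip the full result set.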

> > After a while they realize they want this query all the time with
> > live data, so you get permission to run SQL on the database. Great
> > but our driver only allows us to move forward, we also need to go
> > page backwards.
>
> Switching between forward and backward paging is just about changing
> operator ("<" to ">") and sort order ("asc" to "desc").
>
> > The new super-e-business-guru has decided to put all customer
> > information in an EJB and the manager has agreed.
>
> EJB is not a place to "put information". It is a layer on top of a
> database. Why would the manager only allow database access through EJB?

To encapsulate the legacy system and distribute the information in a
database neutral fashion. There are many advantages: isolation of
schema changes springs to mind, as well as security, division of labor,
autonomy and performance. Middleware is very powerful if used wisely.

> What would he do with all his Crystal reports?

No, the manager is frightened by CR and he wants a web page so he can
show it off to his friends ;-)

> > The new web page version of the query is getting really funky now,
> > he wants to ensure customers have a matching record in the Radius
> > Authentication Server.
>
> A good RDBMS could integrate an external authentication server.

Maybe, but what if they don't allow it? I can't imagine the security
team allowing integration with ... well anyone else ;-|

> > ... and while you're at it check to ensure they exist in the all
> > important billing system. Accounts interface with CORBA but
> > internally they use Oracle (which is behind 3 firewalls).
>
> Maybe you could use the replication feature in Oracle?
>
Don't think so. They have explicitly hidden behind firewalls - no
access. Their position is: you want access to our information, here is
a limited view via a CORBA object, take it or leave it.

> > Hmmm a join across different kinds of systems. Now what?
>
> Most RDBMSs can join tables in other catalogs and schemas, as
> long as they are in the same RDBMS instance.

But I'm talking about information in different _kinds_ of systems.
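As a rough sketch of what such a join looks like in code (Python, purely illustrative): the customer rows come from a CSV dump, and the other two systems are reached only through lookup functions. Everything here is an assumption for the example; the names are made up, and the two sets stand in for a Radius lookup and a CORBA call, neither of which is shown.

```python
import csv
import io

def cross_system_join(customers, in_radius, in_billing):
    """Yield each customer row with existence flags from two other systems.

    `in_radius` and `in_billing` are plain callables taking a customer id.
    In real life they would wrap the middleware each system exposes.
    """
    for cust in customers:
        cid = cust["cust_id"]
        yield {**cust, "radius": in_radius(cid), "billing": in_billing(cid)}

# Made-up stand-ins for the three systems.
CUSTOMER_DUMP = "cust_id,name\n1,Ann\n2,Bob\n3,Cy\n"
RADIUS_IDS = {"1", "2"}    # hypothetical: ids known to the Radius server
BILLING_IDS = {"1", "3"}   # hypothetical: ids known to the billing system

def mismatches():
    """Return ids of customers missing from at least one other system."""
    customers = csv.DictReader(io.StringIO(CUSTOMER_DUMP))
    joined = cross_system_join(
        customers, RADIUS_IDS.__contains__, BILLING_IDS.__contains__
    )
    return [r["cust_id"] for r in joined if not (r["radius"] and r["billing"])]
```

The join logic only sees "give me an answer for this id", so it does not care whether the other side is a database, a CORBA object or a flat file.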

> > I hope you can see my point ;-)
>
> You are just showing a lot of stupid limitations that your colleagues
> give you. None of these limitations has anything to do with a RDBMS.

True, they may appear stupid, but they are real world examples and
actually they are not limitations.

In large organizations departments do not allow their databases to be
hacked by everyone.

For ad-hoc work, if you're lucky, they might give you a CSV dump.

If it's an on-going need you will get access via their preferred
middleware (Corba, MQ, EJB...). One way or the other you will not
get direct access to their database. Nor should you want it!

This is encapsulation at the system level.

Actually these are not limitations but powerful enabling technologies.

> And the limitations you show are very rare. The only argument that I
> ever heard about before, is about authentication. But that is also
> caused by an obsolete and stupid idea about "connection pooling". To
> create an existence argument for application server software, you
> need to invent a fake problem with the RDBMS.

Fredrik, I'm not inventing fake problems, though I must admit they
probably sound that way. I'm just trying to walk through a non-database
world in response to your question "How would you do without a RDBMS?"

I know it can seem bizarre that there is no database (or there is but
it's hidden behind something else) but that is just how it is.

We needed to integrate across 60 large systems. Can you imagine the
complete and utter chaos if everyone was allowed to access everyone's
database directly?

How many schemas would you have to understand? How many schema changes
would you want to avoid? How many fights about getting indexes built?
No thanks, I'd rather have the encapsulated approach.

> Fredrik Bertilsson
> http://butler.sourceforge.net

Once again sorry about the long post.

Hopefully I've answered how and why there is no database and a little
bit of what I might do without one.

Cheers and thanks for the discussion,
Alex Kay.