Re: Business objects, subset of collection



Responding to Frebe...

select i.invoiceid
from invoice i join payment p on i.invoiceid=p.invoiceid
group by i.invoiceid
having sum(p.amount) < i.amount and datediff(now(), i,duedate) >= 10
This is the kind of code I write every. Even though the number of
invoices and payments are very high, the queries perform within a
number of millis. The customer is happy, I am happy.
Once again, if that is all you are doing, that is CRUD/USER processing;
you are just moving piles of data back and forth between the UI and the RDB.

Maybe you can show your code performing the same task.
You want code for inventory forecasts? A Linear programs to allocate
advertising budget to various markets' media? A simulation model of
atmospheric diffusion?

No. I was asking for the OO equivalence for the SQL statement above.

What's the point? Sure I could write AAL or OOPL code to do that. But the main point here is that one would not want to solve the problem that way unless there was no other alternative.

IOW, you are providing a common generic data access query problem and asking for an OO equivalent. But I'm talking about solving customer specific problems, not generic data storage access problems. The OO solution would be tailored to a specific problem so it would very rarely have need of that approach.


That paradigm is fine for generic data storage and
access but searching large sets sucks for algorithmic processing.
SQL databases sucks for searching large data sets, come on...
You don't deny my assertion that I can perform the same O(log N)
optimization in the implementation of a <reusable> collection class.
Of course not. The difference is that you have to do it by yourself. I
can reuse existing tools instead. That is the main difference between
using a database and not using a database.
And, as I have said at least twice, the price one pays for superior
performance is hand crafting of the optimization.

There are areas in which B-trees are not sufficient. But if you are
claiming superior performance, I think you have to be very careful
about telling in what context. O(log 1000) is not superior to O(log
10000).

Say, what?!? See you specific example below.

In CRUD/USER
processing that is largely irrelevant because (a) data is accessed once

If the complexity of the alogrithm is the same, the numer of access
are irrelevant.

That's kind of the point. The "algorithms" for CRUD/USER processing are very similar because they all solve the same problem: converting RDB views of the data to UI views and vice versa.

OTOH, once the software itself is solving problems for the customer, one has essentially infinite variety in the sorts of "algorithms" utilized. Hence the need to tailor solutions to the specific problem in hand.


So one /must/ be able to achieve more efficient searches
given OO's object-level instantiation. The price one pays for that
efficiency is that the object-level instantiation has to be hand-crafted
based on the particular problem context.
Yes, I agree that by writing the corresponing code by your self, you
might get faster applications than you would if you had used a SQL
database.
But "usually" development time, cost more than CPU time. And the only
thing you have showed is that databases has complexity O(log A) and a
tailored solution O(log B), there B < A. This is not a very big deal,
compared to the "extra tons of keystrokes", you would have to do.
It /is/ a big deal when the same data aggregate is accessed many times.

log 100000 = 5, log 1000 = 3, it is not that big deal

You are ignoring the processing. The search has to access the individual tuples and perform a comparison. Accessing 5 tuples rather than 3 every time the solution needs to navigate the relationship can make a huge difference. I have seen applications with two orders of magnitude performance hits overall because of a single context switch for a procedure call. Increase the number of times the procedure is called by 66% and you can count on somebody noticing.

Caching is managed by the database. Caching is not the concern for the
application designer.
Nonsense. Every non-CRUD/USER application I have ever seen had a
client-side cache in some form. By the time the requests get to the
DBMS, it is too late. The DBMS caches just manage server access and
resources (unless you are talking about memory mapped OODBs). One caches
on the client side to minimize DB access.

Another way to elminiate the inter-process communication overhead is
to use stored procedures for "complex processing". That is why most of
TPC (www.tpc.org) implementations rely heavily on stored procedures.
Obviously you could also use a memory mapped SQL database.

The web site didn't have a whole lot to say about stored procedures (but I didn't plow through all the papers). However, I don't think it has much to do with the point. There are four contexts of stored procedures:

(1) Enforcing data integrity constraints on the data storage itself. Such stored procedures are triggered directly by write operations. That makes a whole lot of sense since data integrity is pretty important to data storage. It also makes sense to optimize those procedures to the DB engine.

(2) Application modules that are executed on the client side but are stored on the server side. This is common in RAD environments because the event code needs to be stored and managed transparently somewhere /within/ the IDE. However, it actually adds overhead to the application because the stored procedure needs to be read from the DB into the client execution space rather than being there initially (e.g., as in a compiled image).

(3) Application modules that are executed on the server side at the request of the client. These are also common in RAD environments and usually consist of stored queries or joins. For these to execute on the server side all they can do is extract/update data in the DB because there is no mechanism for interacting with the client's data structures _as they execute_. IOW, what one is storing are CRUD/USER operations on the stored data whose timing is dictated by the application.

(4) Application models that capture business rules and policies and are triggered by other data accesses on the server without the application's knowledge. This is, indeed, more efficient because they can be optimized around the DB engine. It is also a maintainability nightmare for large applications. Any application developer that creates such stored procedures should have their thumbs broken.

In summary, (1) is part of the DBMS franchise; (2) actually adds overhead; (3) is only useful for CRUD/USER processing (or to handle DB access for the application); and (4) should be grounds for summary dismissal. So what is the benefit of stored procedures for a non-CRUD/USER application?

If one used the
same accessing paradigm internally in the solution as the RDB uses the
application would brought to its knees.
There are scenarios, there mainstream SQL databases wouldn't perform
well. But the do excel in many application areas. The major problem
with current SQL databases, is the limited set of index types that is
used. B-trees is used as a one-size-fits-all solution. Obviously
better support for other index types is wanted. I can imagine
scenarios there foreign keys implemented as pointers would be a good
thing too. As a matter of fact, I think such databases exists, even if
I don't have time to find references to support this claim.
SQL databases excel in CRUD/USER processing, which is what I keep
saying. Conversely, they suck once one gets out of that realm.

Since your definition of CRUD/USER since to be extremly wide, I am
quite happy with this statement.

How is it extremely wide? At a fundamental level CRUD/USER applications only solve one problem: data view conversions. They are limited to the very fundamental tasks of creating, extracting, updating, sorting, reporting, and deleting data stored in persistent storage. The crucial thing that they do not do is solve specific customer problems (i.e., manipulate data to produce nontrivial results).


As an example, I play an MMORPG. The game employs an RDB, a star
client/server model, and thin clients. That is a great architecture for
airline reservations systems but it makes the MMORPG non-scalable.
That's because the same data is constantly being accessed and updated by
groups of clients involved in the same interactions (i.e., many times
per second).

I don't know anything about MMORPG, but doesn't a airline reservation
system constantly access the data too?

Yes, but not the same data. To make a reservation each client needs to access one flight's information and provide one seat reservation to solve the customer's problem. Different customers might be interested in the same flight, but usually not at the same time. And only one gets to lock a seat down for update. That is ideally suited to thin clients (display flight data and a simple selection), a star client/server configuration (independent problems on each client), and RDB-style transaction processing for the relatively rare updates. (Not to mention that the airline reservation system has a very simple schema while an MMORPG has hundreds of tables and many thousands of relationships.)

More to the point, each reservation customer's access is independent; the guy in Topeka making a reservation for flight 982 could care less about the guy in Tombstone who wants to fly the same flight. The scope of each problem and its processing is completely independent (except for locking the particular seat). But in the MMORPG, all the clients are directly interacting with each other; each player is directly responding to what other players are doing in real time. And great volumes of information from many tables are relevant to those interactions and are modified nearly continuously (imagine the RDB updates flowing in at the same frequency as Windows WM_MOUSE_MOVE messages). Trying to make the RDB the coordinator of all that is a disaster for bandwidth. Given the number of relationships, an OODB would probably have been a better choice (albeit somewhat moot). But the combination of thin clients and a star configuration is hopeless.


So the cardinal rule of complex application
development is to read the data once and write it once, no matter how
many times it must be accessed in the solution.
Read and write once from what, disk? Or RAM?
The DBMS. Disk seek is the big problem but the table-level searches are
still an important problem when the same data is accessed repeatedly in
a single problem solution. (Or there are many possible relationships
among the data or the data is constantly being updated.) It is also
important to be able to convert identity into formats where one can use
more efficient data structures, like arrays.

For business applications anyway, I think it would be difficult to
find scenarios there it would really be necessary to use such low-
level data structure as an array.

You are joking, right?


But the 'n' in
O(log n) will usually be much smaller in the OO application because the
collections are object-based rather than class-based.
Lets say you want to find all unpaid invoices. Why would the n be much
smaller in a OO solution?
I said, "usually'. You are postulating a class-based search as a problem
requirement.
You might think that my example is too extreme, but isn't it good to
use a method/tools that doesn't limit you to work on small amounts of
data?
Who is limited to working on small amounts of data?!?
You complained about my example, since it was a class-based search,
and not a search which could be solved with a limited number of
objects in a collection. So why don't show how you would find all
unpaind invoices?
I've already point out that you are postulating a requirement that
/requires/ a table level search.

I just took an example from reallity. The point was to show that your
"usually" disclaimer is debatable. In reallity, there exists a lot of
scenarios there "table level" searches as necessary. You may like it
or not.

From CRUD/USER reality. Let me put it this way. In that past two decades I cannot recall a time when I actually used a WHERE clause in AAL. I am sure there were a few times, but it is a clearly pretty rare event outside of CRUD/USER "reality".


[I have also pointed out that even in
that situation there are possible ways to avoid it, such as a collection
dedicated to just unpaid invoices.]

I could show the complete solution with 4 lines of code. The reason
you are only pointing your solutions, is that it needs much more lines
of code. That is what I wanted to point out. Besides, how would you
solve this part: "datediff(now() - duedate) > 10"?

What does LOC have to do with it? I don't care how one implements the design at the 3GL level. What counts is the way the application is designed. In application design it is fundamental practice to avoid searches, especially class-level searches.


--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@xxxxxxxxxxxxxxxxx for your copy.
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
.



Relevant Pages

  • Re: Portable stored procedures
    ... > JDatastore lets you store binary streams outside the table scema. ... Other databases usually do not store BLOB's ... > you will have to write the function in the client in Java (or whatever ... >> stored procedures written in some proprietary language. ...
    (comp.lang.java.databases)
  • Re: Please Help
    ... My client is looking for someone with 15+ years experience ... with multivalue/multidimensional databases. ... sales awareness, business acumen, evangelism, mentoring and training. ... IBM DB2, and Oracle- specifically the most current OLAP addition to ...
    (comp.databases.pick)
  • Re: Dont understand what version of SQL to install
    ... I'm talking about developing and testing databases for my client and ... To my XP, I can add SQL Developer 2008 to develop (I guess I should say, ... So I can use SQL Developer 2008 to create a database in standard 2005, ... installed on Client OSs such as Windows Vista. ...
    (microsoft.public.sqlserver.setup)
  • Re: Please Help
    ... recruiters, such as myself, and this post will spur a flood of ... My client is looking for someone with 15+ years experience ... with multivalue/multidimensional databases. ... sales awareness, business acumen, evangelism, mentoring and training. ...
    (comp.databases.pick)
  • Re: starting freelance
    ... In the past I just rebuilt the databases fresh like you ... Maybe you have advice on how to estimate time spent for creating the ... Of course your client wants a price. ... You can end up rebuilding structures over and over because the customer "forgot" to tell you about some huge exception or odd rule until the day before you go live. ...
    (comp.databases.filemaker)