Re: Business objects, subset of collection



Responding to Frebe...

invoiceSet = this -> R1 WHERE (date=20080113)
FOREACH invoice IN invoiceSet
    // process invoice objects

or

select invoiceid from invoice
where customerid=? and date=20080113

paymentSet = this -> R1 -> R2 WHERE (date=20080113)
FOREACH payment IN paymentSet
    // process payment objects

or

select paymentid from invoice i join payment p on
    i.invoiceid=p.invoiceid
where customerid=? and p.date=20080113

Yes, they are quite similar. But apropos of your other message, that
just reflects that both are based on the relational model. However, the
models are quite different because the collection sets are object-based
in the AAL but they are table-based for the SQL. The set of invoices and
the set of payments examined in the AAL case will usually be much
smaller than the corresponding invoice and payment sets in the RDB.
That is a limitation you have to accept, because your solution would
perform too badly otherwise. A relational database, on the contrary,
isn't limited to operating only on small amounts of data.
We do this because using the RDB query model with table-based indices
would be very inefficient for memory-based computing when solving
particular problems.

I solve particular problems using "the RDB query model" every day, and
I have not noticed it being "very inefficient".

That's because you are either doing CRUD/USER applications or you haven't benchmarked your solutions...

That paradigm is fine for generic data storage and
access but searching large sets sucks for algorithmic processing.

SQL databases suck for searching large data sets? Come on...

By your own admission, queries rely on the DBMS providing O(log N) search performance. You also have not denied that RDB indices are instantiated at the n-ary relation (table) level. So N is <almost> always the number of tuples in the table.

You don't deny my assertion that I can perform the same O(log N) optimization in the implementation of a <reusable> collection class. But if OO relationships are instantiated at the object (tuple) level, then the N in OO searches will usually be much smaller than the N in table level searches. So one /must/ be able to achieve more efficient searches given OO's object-level instantiation. The price one pays for that efficiency is that the object-level instantiation has to be hand-crafted based on the particular problem context.
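A minimal Python sketch of the point about N, assuming hypothetical data (1,000 customers with 30 invoices each, integer date stamps as in the examples above). The names `table_index`, `per_customer`, `find_in_table` and `find_in_collection` are invented for illustration, and the dictionary merely stands in for already holding a reference to the customer object. Both searches are binary searches; only the size of the set being searched differs.

```python
import bisect
from collections import defaultdict

# Hypothetical data: (customer_id, date) tuples, 1,000 customers x 30 invoices.
invoices = [(c, 20080101 + d) for c in range(1000) for d in range(30)]

# Table-level index: one sorted structure covering every tuple in the table.
table_index = sorted(invoices)

# Object-level instantiation: each customer holds only its own sorted dates.
per_customer = defaultdict(list)
for customer_id, date in invoices:
    per_customer[customer_id].append(date)
for dates in per_customer.values():
    dates.sort()


def find_in_table(customer_id, date):
    """Binary search over all N tuples in the table."""
    i = bisect.bisect_left(table_index, (customer_id, date))
    return i < len(table_index) and table_index[i] == (customer_id, date)


def find_in_collection(customer_id, date):
    """Binary search over only the invoices related to one customer."""
    dates = per_customer[customer_id]
    i = bisect.bisect_left(dates, date)
    return i < len(dates) and dates[i] == date


table_n = len(table_index)            # N for the table-level search
collection_n = len(per_customer[42])  # N for one customer's collection
total_related = sum(len(d) for d in per_customer.values())
```

Under these assumptions the table-level search runs over 30,000 tuples and the per-customer search over 30, while the total number of related tuples is the same either way. The sketch says nothing about disk access, query planning, or composite indices in a real DBMS.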

OTOH, the DBMS provides storage and generic access that is reusable across different problem solutions no matter who "owns" the data. The benefit of that is that all client solutions face the same access paradigm, which is very general. But the price is that it will be uniformly less efficient than if it were tailored on a case-by-case basis.

Why do you think that designers of large, complex applications spend so much effort trying to minimize persistence access, regardless of whether they are doing OO or not? They introduce elaborate caching schemes, deliberately present themselves with major data integrity problems, and whatnot because persistence access is almost always the performance bottleneck of such applications.

When solving complex problems the same data is quite often accessed and processed many times during the course of the solution. If one used the same accessing paradigm internally in the solution as the RDB uses, the application would be brought to its knees. (More precisely, it would not be competitive in the marketplace with other applications that optimized for the problem in hand.) So the cardinal rule of complex application development is to read the data once and write it once, no matter how many times it must be accessed in the solution. There's a reason for that rule and seek time is just part of the problem.
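The read-once/write-once rule can be sketched roughly as follows. `CountingStore` is a hypothetical stand-in for a persistence layer, not any real database API, and the "processing" is a placeholder that just increments balances on each pass.

```python
class CountingStore:
    """Hypothetical persistence layer that counts round trips."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0
        self.writes = 0

    def load_all(self):
        self.reads += 1
        return dict(self._rows)

    def save_all(self, rows):
        self.writes += 1
        self._rows = dict(rows)


def run_solution(store, passes):
    balances = store.load_all()      # read the data once
    for _ in range(passes):          # access it in memory many times
        for key in balances:
            balances[key] += 1       # placeholder for the real processing
    store.save_all(balances)         # write the data once
    return balances


store = CountingStore({"A": 0, "B": 10})
result = run_solution(store, passes=5)
```

However many passes the solution makes over the data, the store sees one read and one write. The caching and integrity problems mentioned above come from what the sketch leaves out: other clients changing the store between the read and the write.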


But the 'n' in
O(log n) will usually be much smaller in the OO application because the
collections are object-based rather than class-based.
Let's say you want to find all unpaid invoices. Why would the n be much
smaller in an OO solution?
I said, "usually". You are postulating a class-based search as a problem
requirement.

You might think that my example is too extreme, but isn't it good to
use a method/tools that doesn't limit you to work on small amounts of
data?

Who is limited to working on small amounts of data?!? In an OO application a single relationship between classes in a Class Model is just implemented as many small relationships between class members rather than as a single relationship for all members. The number of members (tuples) that are related remains exactly the same.

[Note that I also never said one shouldn't take advantage of the RAD
facilities. One should do that in a client-side data access subsystem
where the RDB schema is relevant. But when customizing the problem
solution data structures one needs to provide the access to those data
structures anyway and one needs to do it efficiently through custom
tailoring.]

Your main argument seems to be that the performance of a relational
database is insufficient compared to your pointer-based solution.
There are probably many scenarios where a pointer-based solution would
perform in O(1) and a relational one in O(log n). But history (and
your previous posts in this thread) has shown numerous disadvantages
of data management using pointers (network databases), and the
benefits of the relational model are obvious. These days nobody is
proposing pointer-based databases; only when it comes to data in RAM
are some people still proposing this. What if RAM isn't big enough for
the data needed in the processing? Recently I worked with an (OO)
application that loaded data from the database at startup. I guess I
don't have to tell you the problems the application faced. One was a
startup time of more than 6 hours. Another was dirty reads.

Pointers are a red herring. They just provide a more efficient mechanism for object (tuple) identity than embedded identifiers. The situation is analogous to using consecutive integers for identity rather than, say, an ASCII name; it just enables more efficient access like arrays (i.e., a simple offset computation does not depend on the number of elements while an element-by-element search/compare does).
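A rough Python illustration of that analogy, using hypothetical `Payment`, `InvoiceById` and `InvoiceByRef` classes invented for this sketch. One invoice records its payments by embedded identifier and must search for them; the other holds direct references. Both navigate the same relationship and produce the same result.

```python
from dataclasses import dataclass, field


@dataclass
class Payment:
    payment_id: str
    amount: int


@dataclass
class InvoiceById:
    # Embedded identifiers: the relationship stores keys, so navigation
    # needs a lookup against the full set of payments.
    payment_ids: list = field(default_factory=list)


@dataclass
class InvoiceByRef:
    # Direct references: the relationship stores the related objects.
    payments: list = field(default_factory=list)


all_payments = [Payment(f"P{i}", i) for i in range(1000)]


def total_by_id(invoice):
    total = 0
    for pid in invoice.payment_ids:
        # Element-by-element compare: cost grows with len(all_payments).
        match = next(p for p in all_payments if p.payment_id == pid)
        total += match.amount
    return total


def total_by_ref(invoice):
    # Following a reference does not depend on how many payments exist.
    return sum(p.amount for p in invoice.payments)


by_id = InvoiceById(payment_ids=["P10", "P20"])
by_ref = InvoiceByRef(payments=[all_payments[10], all_payments[20]])
```

The linear scan in `total_by_id` is the worst case; an index over `payment_id` would bring it down to a hash or O(log N) lookup, which is the table-level search discussed earlier.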

My main problem is with table-based searches. Table-based searches are a very general access mechanism that provides uniform access regardless of usage context (exactly what one wants in enterprise data storage). But much of the way the OO paradigm manages relationships is focused on eliminating searches by tailoring the solution structure to the specific problem. [The WHERE clause I used in an example is actually very rarely seen in OO applications; it is a kind of last resort when the requirements allow no more efficient alternative. And using a WHERE is very likely to get close scrutiny by OOA/D reviewers.]


A SQL database allows him to construct CRUD/USER applications quickly
because the only problem being solved is data view conversion. That is
exactly what the Form/Query/Table RAD paradigm is designed to largely
automate and it does a good job of it.

But when you have a problem to solve that requires complex processing of
the data, you have to optimize to the problem and provide custom data
structures. IOW, you are going to have to provide unique access because
the mappings are not the same as in the RDB schema. That's what
application design is all about outside of CRUD/USER contexts.

Is producing invoices "complex processing"? There are scenarios of
data processing that are not supported very well by the existing index
types in current mainstream (SQL) databases. But the major part of all
processing done in common business applications performs very well
using SQL databases, without any custom data structures. What about
all the COBOL applications out there with extremely simple data
structures, relying heavily on SQL statements to do the job? Don't
they perform well enough?

In most situations producing invoices is pretty straightforward. Order + Lading Bill = Invoice. IOW, I would normally expect that to be CRUD/USER processing. That's why companies buy OTS Accounts Receivables packages rather than reinventing the wheel.

Your claim that a relational schema is designed for "data storage that
is independent of the applications", is debatable. When I design
schemas, I design them to fit the specific customer problem. Why
wouldn't I? If you look at the schemas for three different invoicing
applications, you will find three different schemas. There doesn't
exist any application-independent schema for invoices. It all depends
on the specific customer problem.
IMO, that is very bad practice for a DBA because it precludes reuse of
the data to solve other, different problems.

My databases solve problems for my applications. Between
applications, data are exported and imported.

So you live with the reuse problems and the need to keep revising schemas as new applications come online.


When one creates a Data Model, it should be based on the overall problem
domain rather than specific problems within the domain. If you do that,
then the same schema should be reusable by all the core accounting
applications.

One could also say that there only needs to be one core accounting
application. The question is: How do you define the "overall problem",
and the "specific problem". You base almost all your argumentation on
such fuzzy terms, which makes everything derived from it rather fuzzy
too.

I didn't say overall problem; my use of 'problem' was as an adjective. Problem domains are quite different things from problems. The point was that Data Models should be constructed based on how the overall business environment works, not how individual problems are solved.



--
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@xxxxxxxxxxxxxxxxx for your copy.
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH