Re: Transaction Oriented Architecture (TOA)
- From: "H. S. Lahman" <h.lahman@xxxxxxxxxxx>
- Date: Thu, 14 Dec 2006 20:03:40 GMT
Responding to Gagne...
The basic problem is that the RDB paradigm is designed for generic, ad hoc access of data. IOW, it is optimized to be independent of what problem is being solved with the data. Once one is outside the CRUD/USER arena, applications are optimized to solve very specific problems. That optimization involves different approaches to things like relationship navigation (e.g., object-level instantiation vs. table-level). So the views of the data should be /expected/ to be different. Therefore one solves the business problem first and then worries about how one maps that solution view into the DB view.
So you're agreeing the data structures are different in the application than in the DB? The biggest difference between what you say above and the article is that the article recommends starting with the DB, proving it correct, then developing the application.
That is where we differ. Solve the problem first and then figure out how to talk to the database. If one builds the application around the database view, one risks a suboptimal solution. [I would also argue that one risks maintainability problems simply because the DB view is necessarily static while the solution view needs to be both static and dynamic. But that is a more complex argument and we probably shouldn't go there.]
To put it another way, the database is there and there is nothing you can do about it. The problem in hand, though, is entirely under your control and you have an opportunity for creative design. Do you want the DBA to design your application instead of you?
I sense, perhaps incorrectly, a disparaging view of CRUD/USER applications. Your comments in both this thread and the thread "Relational Database & OO" seem to indicate an opinion that CRUD/USER applications are too simple to be representative of sophisticated OO designs.
Not disparaging. CRUD/USER processing is a major segment of IT and it is going to be around for as long as /people/ analyze gobs of data through pattern recognition. But CRUD/USER processing is not a good application for the OO paradigm because there is really only one problem being solved: converting back and forth between the database and UI views of the data. IOW, CRUD/USER applications are pipelines between the DB and the UI while the software user is solving a particular problem externally. The UI and RDB paradigms are very well defined so that pipelining has already been largely automated in RAD IDEs and canned layered model infrastructures. [BTW, I also feel there is a lot of excellent design in the automation for CRUD/USER processing. Some very smart people put a lot of work into coming up with that stuff.]
IMO, though, OO is overkill in that context. The OO paradigm is focused on providing maintainability in large, complex applications. When most of the necessary software comes from third party automation, the solution is no longer large and complex. Thus, by its very nature RAD is reducing large, complex problems into smaller, simpler ones through automation. [That's not to say that one would not use the OO paradigm to, say, construct a RAD IDE. B-)]
Bottom line: I use CRUD/USER to simply identify a class of applications where I don't think the OO paradigm is very useful. (OTOH, I see basic application partitioning as being the main justification for separating the concerns of the problem solution from the concerns of persistence.)
In addition, a big advantage of encapsulating in a subsystem is large scale reuse. Once one abstracts the invariants of the DB paradigm du jour properly, then the subsystem can be reused across applications. Thus the RDB paradigm is abstracted exactly the same way regardless of the semantics of the data. That is, one encodes the invariants of the paradigm and leaves the details to data (e.g., mapping of table names, field names, etc.). To do that one just applies basic problem space abstraction to the RDB subject matter, so it really costs very little extra to achieve that reuse.
Reuse is valuable and admirable, but the creation of an "encapsulating .. subsystem" may be unnecessary, and touting the benefits of reuse sounds like making virtue of necessity--except it may not be necessary in the first place.
I submit that encapsulation of the database mechanisms is always necessary except for trivial applications. It is basic separation of concerns. The problem solution doesn't care if the data is stored in flat files, an RDB, an OODB, or clay tablets. The problem solution should not have to know about mechanisms like SQL query construction, optimizations like anticipatory caches, or encoding/decoding of dataset formats.
Note that the CRUD/USER environments already provide exactly that encapsulation by providing a Data Layer that is isolated from the rest of the application through an interface. IOW, providing that encapsulation is a fundamental element of CRUD/USER structure. I'm just generalizing beyond CRUD/USER. It is just basic modularization that really doesn't have much to do with the OO paradigm. The OO paradigm only enters the picture through specific mechanisms for the interface like DAO and inheritance composition.
A database API is reused everywhere applications interface with DBs. The mapping of the DB (like VW Smalltalk's EXDI or Java's JDBC) is reused--it makes the DB's API more language-malleable, and that's reused everywhere--regardless of the application domain (I'm not aware of many people that have created their own replacements for either EXDI or JDBC).
You are talking about reuse _of the database_ across applications. IOW, any application talks to the given database using the same interface. But it is a particular database paradigm (e.g., RDB vs. OODB vs. ISAM...) using particular computing environment technologies (JDBC, etc.). Thus that API is only reusable across applications if the same database and supporting technologies are available in a particular computing environment.
I submit that the problem solution should be exactly the same regardless of what computing environment one is in. For example, at the OOA/D level the design should be implementable without change on any platform with any storage mechanisms. That portability can only be achieved if one separates the persistence access concerns from the problem solution and decouples them through an interface. Only then can one substitute a new environment without touching the problem solution in any way.
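[A minimal sketch of that substitution, in Python. Everything named here (PileStore, MemoryStore, SqliteStore, apply_discount, the pile table) is a hypothetical illustration and not something from the article or this thread. The point is only that the solution code talks to an interface, so the storage mechanism behind it can be swapped without touching the solution.]

```python
import sqlite3
from typing import Protocol


class PileStore(Protocol):
    """Hypothetical interface: all the problem solution knows about persistence."""

    def save(self, name: str, data: dict[str, str]) -> None: ...

    def load(self, name: str) -> dict[str, str]: ...


class MemoryStore:
    """One environment: piles of data kept in a plain dict."""

    def __init__(self) -> None:
        self._piles: dict[str, dict[str, str]] = {}

    def save(self, name: str, data: dict[str, str]) -> None:
        self._piles[name] = dict(data)

    def load(self, name: str) -> dict[str, str]:
        return dict(self._piles[name])


class SqliteStore:
    """Another environment: the same piles kept in an RDB table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS pile ("
            "pile_name TEXT, attr TEXT, val TEXT, "
            "PRIMARY KEY (pile_name, attr))"
        )

    def save(self, name: str, data: dict[str, str]) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute("DELETE FROM pile WHERE pile_name = ?", (name,))
            self._db.executemany(
                "INSERT INTO pile (pile_name, attr, val) VALUES (?, ?, ?)",
                [(name, attr, val) for attr, val in data.items()],
            )

    def load(self, name: str) -> dict[str, str]:
        rows = self._db.execute(
            "SELECT attr, val FROM pile WHERE pile_name = ?", (name,)
        )
        return dict(rows.fetchall())


def apply_discount(store: PileStore, order_id: str, percent: int) -> None:
    """Problem solution: no SQL and no knowledge of the storage mechanism."""
    order = store.load(order_id)
    total = int(order["total_cents"])
    order["total_cents"] = str(total * (100 - percent) // 100)
    store.save(order_id, order)
```

[apply_discount runs unchanged against either store. A flat-file or OODB store would only need the same two methods.]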
You seem to be advocating the creation of a layer between the application and the language-DB interface that maps the application domain objects to the DB--I'm guessing classes to tables, instances to rows, and fields to columns. The /infrastructure/ you're describing (I'm guessing) gives OO instances behaviors that allow them to either instantiate themselves from or /persist/ themselves to the DB. It is this behavior, the endowing of DB-awareness (or persistence--a rose is a rose...) into our classes and their instances, which is characteristic of the denial I was talking about. It is the premise of object databases and their lesser incarnations of object-relational products that the boundary between application and DB is better buried than exposed. The simple fact that it /can/ be done proves to many that it /should/ be done. This is a Jedi mind trick.
I am advocating inserting an interface that separates the problem solution from the persistence access mechanisms. There is nothing new in this. The existing RAD layered models already do exactly what I am advocating. Consider the classic model:
Presentation
--------------
Business
--------------
Data
In the RAD world the Data Layer actually has two pieces; one on the client side and the rest on the server side. That division just gets hidden because all the networking is hidden. At a minimum one has to link in a bunch of infrastructure modules into the application for the Business objects to be able to talk to the server. When the computing environment technologies change, one just links in a new set of infrastructure modules.
Now some layered models take this a step further and have more layers. In such models the client-side and server-side Data Layers are explicitly separated. In that case the client-side Data Layer has the responsibility for mundane tasks like forming SQL queries and encoding/decoding SQL datasets. What I am talking about is analogous to that layer that decouples the Business Layer through an interface so the Business objects don't have to know about the specific mechanisms.
(I use subsystems rather than layers simply because once one is out of the pipeline business the layers are more complicated and one needs to partition laterally as well as vertically.)
If I'm right in understanding what you're saying (and what countless vendors, analysts, and pundits sell, present, and report), this is exactly where our designs part company and where Transaction Oriented Processing (or TOA--but I dislike the word architecture when used with software) proposes an alternative, thinner, and simpler model.
It's thinner and simpler because it doesn't separate concerns. Failing to separate concerns means that one must touch the solution logic whenever something changes in the persistence realm and that is a potential maintainability problem.
[Though I would bet that the total executable code size will be less if one does encapsulate the persistence access concerns because of encoding invariants. That's because the same executable code works for all queries. So the more tables and joins needed, the more embedded code is littered throughout the solution if one does queries on a context-by-context basis.]
One needs to design the interface to the problem solution's needs. Thus it is the interface that needs to be replaced during large scale reuse (think: Facade pattern). To be successful the interface needs to be a pure message-based interface so that each side can map the message ID and data packet into its own unique view. That, in turn, means a consistent mapping of identity on each side of the interface.
The belief that there needs to be symmetrical mappings on either side of the interface assumes there needs to be a mapping in the first place. That's surprisingly similar to petitio principii, the fallacy of assumption more commonly known as "begging the question." TOA/TOP proposes (and I know I haven't gotten that far in the article) the database and its application domain stored procedures are the only persistence mechanism necessary, and that the benefits of a focused, single, data-permeable gateway between application and database far exceed the benefits of O/R mappings--regardless of abstraction--and that its lightweight appearance shouldn't be dismissed as missing heavyweight kick.
Sorry, but I don't follow this. What other persistence mechanism do you think is necessary in the approach I advocate? All I am doing is decoupling the problem solution from the persistence mechanism du jour.
Fortunately, that is usually easy for data, especially for paradigms like the RDM that are designed to provide generic identity. Half the work is done if the DBMS schema is available to the application. So one just needs a bunch of table lookups to map data packet elements into Table/Field identifiers to construct SQL strings. Those lookup tables get defined from external configuration data that maps the interface messages into the RDB schema.
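[The lookup-table idea can be sketched roughly as follows, in Python. The message ID, the table and column names, and the MESSAGE_MAP layout are all invented for illustration; a real subsystem would load the map from external configuration rather than hard-code it. Identifiers come only from that trusted configuration, while packet values are bound as query parameters.]

```python
# Hypothetical configuration data: interface message -> RDB schema.
# In practice this would be read from an external file, not hard-coded.
MESSAGE_MAP = {
    "GET_CUSTOMER_CONTACT": {
        "table": "CUST_MASTER",
        # data packet element -> column returned to the solution side
        "fields": {"name": "CUST_NM", "email": "EMAIL_ADDR"},
        # data packet element -> column used to identify the row
        "key": {"customer_id": "CUST_ID"},
    },
}


def build_select(message_id: str, packet: dict) -> tuple[str, list]:
    """Map a message ID and its data packet into a parameterized SELECT.

    The same statements serve every message; only the configuration differs.
    """
    entry = MESSAGE_MAP[message_id]
    columns = ", ".join(
        f"{col} AS {elem}" for elem, col in entry["fields"].items()
    )
    where = " AND ".join(f"{col} = ?" for col in entry["key"].values())
    params = [packet[elem] for elem in entry["key"]]
    sql = f"SELECT {columns} FROM {entry['table']} WHERE {where}"
    return sql, params
```

[If the schema renames CUST_NM, only the configuration entry changes; neither build_select nor the code that sends the message is touched.]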
The price of this is encode/decode of message data packets on each side of the interface. For a SQL RDB, that effectively means duplicated dataset encode/decode. Fortunately DB access is in milliseconds while encoding/decoding data packets is in microseconds so nobody is likely to notice anything except the developer's extra keystrokes, which don't count in the overall scheme of things.
It may be early in the AM (for me) but I'm not following what you're talking about above.
Which paragraph is the problem? (I don't want to elaborate on both if only one is a problem.)
When there actually is a 1:1 mapping between the solution and DB views, that message indirection is redundant. But that decoupling pays huge bonuses when the mapping is not 1:1, when the mapping changes in the future, and for large scale reuse across applications. The thing is that once one is outside the CRUD/USER realm when solving some customer problem, one can't know when the mapping will be 1:1 or when it might change.
So one should solve the customer problem without worrying about what the DB looks like. If some objects happen to map 1:1 that is transparent because the interface is designed to the solution's data needs (e.g., "Save this pile of data I call 'X' and give it back to me when I ask for 'X' later"). IOW, one should be able to solve the customer problem and design the DB access interface without knowing or caring how the stored data is ultimately organized.
But the customer problem can be solved without using OO. It can be solved in FORTRAN, COBOL, C, LISP, or JavaWOO (much of the Java I've seen isn't very OO at all so I wanted to include it in my list of non-OO languages--Java Without Object Orientation). The customer's problem and the DB are constants. Only languages and idioms lie between and may be interchangeable. In fact, if the DB domain requirements are consistent (its domain-API expressed through procedures) then suitable applications can be written in multiple languages without compromising the DB's design.
The first sentence is why I don't think encapsulating persistence access in a subsystem is an OO issue. OO just provides useful conceptual terminology like 'encapsulation' and 'decoupling'. It is basic modularization and I've done the same thing in FORTRAN and PL/I long before OO. [Tougher to do in COBOL because the notion of 'record' pretty much married data structures to the database view. B-)]
For the rest, the customer's problem requirements don't say anything about persistence mechanisms; they just define what data needs to be persisted, when it should be stored, and what the data integrity rules are. Persistence paradigms, technologies, and mechanisms are pure computing space issues and those <nonfunctional> requirements are defined by Systems Engineering. Which is another reason for separating the concerns. B-) Similarly, the Data Modeling the DBA uses to define RDB schemas is an exercise in database design that is quite distinct from a particular application's problem solution.
At another level, I don't see how you can advise not embedding SQL in the problem solution without providing the indirection of a message-based interface.
Use stored procedures. No mapping necessary.
Alas, this is a Major Hot Button for me. I think stored procedures are one of the most abused mechanisms in IT. They are a maintenance nightmare if they are triggered by the DBMS or call one another. (I assume in your case they are triggered only by the application, but I think there are still potential problems.)
It depends on what is in your stored procedures. If they are devoted solely to accessing the database, then fine. The stored procedures are effectively providing a generic API to the persistence mechanisms. IOW, when your application invokes the stored procedure via a method call, it is just sending a message to the Data Layer (or my DB access subsystem).
However, I think that is a slippery slope because it is tough to provide an interface at that level of abstraction that does not reflect the DB view of data. That is, it will be very tempting to make the granularity of the interface map into individual queries or table accesses. As soon as the interface maps the DB view rather than the solution's needs, I think one gets into trouble with the DB driving the solution.
OTOH, if you define a stored procedure like getDataPileX where the actual data collection returned can come from an arbitrary number of tables (i.e., the stored procedure defines the necessary join) and the returned values can be distributed to an arbitrary set of objects' attributes on the solution side, then that is pretty much what I am advocating. That is, the interface is designed to the problem solution's needs.
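[A rough sketch of the getDataPileX shape, in Python with SQLite. SQLite has no stored procedures, so a plain function stands in for one here; the schema, the Shipment and Invoice classes, and load_order are all hypothetical. What matters is that the join lives behind the call and the returned values are distributed to whatever solution objects need them.]

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Shipment:  # hypothetical solution-side object
    destination: str
    weight_kg: float


@dataclass
class Invoice:  # hypothetical solution-side object
    amount_due: float


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny three-table schema purely for the demonstration."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, address_id INTEGER,
                             amount_due REAL);
        CREATE TABLE items (order_id INTEGER, weight_kg REAL);
        INSERT INTO addresses VALUES (1, 'Boston');
        INSERT INTO orders VALUES (7, 1, 99.5);
        INSERT INTO items VALUES (7, 2.5);
        """
    )
    return db


def get_data_pile_x(db: sqlite3.Connection, order_id: int) -> dict:
    """Stand-in for a getDataPileX stored procedure: it owns the join."""
    row = db.execute(
        "SELECT a.city, i.weight_kg, o.amount_due "
        "FROM orders o "
        "JOIN addresses a ON a.id = o.address_id "
        "JOIN items i ON i.order_id = o.id "
        "WHERE o.id = ?",
        (order_id,),
    ).fetchone()
    return {"city": row[0], "weight_kg": row[1], "amount_due": row[2]}


def load_order(db: sqlite3.Connection, order_id: int) -> tuple[Shipment, Invoice]:
    """Solution side: spread one data pile over two unrelated objects."""
    pile = get_data_pile_x(db, order_id)
    shipment = Shipment(pile["city"], pile["weight_kg"])
    invoice = Invoice(pile["amount_due"])
    return shipment, invoice
```

[Neither Shipment nor Invoice corresponds to a table; the three tables behind the pile could be reorganized without either class changing.]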
However, when each stored procedure accesses the DB directly I still think it would sacrifice benefits. One is decoding the returned values. If they come back as a predefined server dataset, then one is not fully decoupled from the mechanisms because one must decode that particular format _within the problem solution_.
Another problem I think would be redundancy. If each stored procedure constructs its own unique joins and SQL queries, one is repeating a lot of low level processing. If that same interface were to a subsystem, then one would only need one code set in the subsystem implementation to process any query or join. IOW, at the subject matter's level of abstraction, all queries and joins are processed the same way through the same executable statements.
Yet another problem is global optimization. With stored procedures at the getDataPileX level that talk directly to the DB, it would be tougher to provide global optimizations like anticipatory caching. But that is relatively easy to do within a subsystem where one has local state variables and other infrastructures.
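[For instance, a subsystem might do anticipatory caching along these lines. This is a sketch in Python under stated assumptions: CachingAccess, the fetch callable, and the follows table are hypothetical names, the prefetch is synchronous for simplicity, and cache invalidation is ignored entirely.]

```python
from typing import Callable


class CachingAccess:
    """Hypothetical DB access subsystem with anticipatory caching."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        follows: dict[str, list[str]],
    ) -> None:
        self._fetch = fetch      # performs the real (slow) DB access
        self._follows = follows  # pile -> piles usually requested next
        self._cache: dict[str, dict] = {}  # local state the subsystem owns

    def get(self, pile: str) -> dict:
        """Return a data pile, loading it and its likely successors once."""
        if pile not in self._cache:
            self._cache[pile] = self._fetch(pile)
        # Anticipate: pull in piles that tend to be requested after this one.
        for nxt in self._follows.get(pile, []):
            if nxt not in self._cache:
                self._cache[nxt] = self._fetch(nxt)
        return self._cache[pile]
```

[The caller only ever asks for a pile by name. Whether the answer came from the cache or the DB is invisible on the solution side, which is what makes the optimization global rather than per-query.]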
It will also be tempting to bend the solution around those surrogates, which can lead to a less optimal and maintainable solution. That is, if one has a preconceived notion of what the data looks like, one will be tempted to build the solution around that view. So the generic, ad hoc tail is wagging the solution dog.
But preconceived notions exist on both sides of the interface. As the saying (almost) goes, two preconceived notions don't make a right (notion). The database must be language neutral--but it can't be domain neutral. If it were domain neutral then we'd be implying its design didn't support the domain--which would defeat the purpose, no?
I think this comes down to the difference between a customer business domain and a problem space.
The views on each side of the interface are tailored to their specific subject matters. IOW, one is abstracting solutions from different problem spaces. Each subject matter has its own unique functional requirements that do not overlap. Those solutions need to be abstracted differently to provide proper optimization in each problem space.
While the problem solution and the data model both have their roots in the same customer domain, they are quite different. The data model is restricted to be a generic static view that is suitable for ad hoc access (i.e., problem-independent access). The problem solution is inherently dynamic in nature and needs to be tailored to a specific problem context.
IOW, I think maintainability will be maximized when the subject matter concerns are clearly decoupled. The strongest way to do that is by encapsulating the DB access in a subsystem behind a pure message-based interface.
If you believe message-based interfaces are valuable, I propose TOA/TOP is a more faithful realization of it.
Message-based interfaces are a necessary condition for decoupling subject matters. I think the real issue here is separating subject matters in the first place.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@xxxxxxxxxxxxxxxxx for your copy.
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH