Re: Help! Difficulty understanding DB -> Object mapping
- From: "H. S. Lahman" <h.lahman@xxxxxxxxxxx>
- Date: Mon, 09 May 2005 18:25:41 GMT
Responding to Usenet.news.account...
data structure in the DB:
Order ===== Order_ID Customer_ID Customer_Address_ID ...
OrderLineItem ============= Order_ID Product_ID Qty Price Tax TotalPrice
I have a number of questions. Is it normal to:
The first thing to do is forget about what the RDB looks like. That is only relevant to accessing persistence. Solve the customer's problem first and then worry about how objects in that solution will map into tables in the RDB.
1. Store the full chosen billing and shipping address with the Order, though the addresses are stored in a separate table?
The RDB schema you suggest above violates 3NF. The customer address
depends on the Customer identity, not the Order identity. If addresses are stored in a separate table, then the Customer_Address_ID should be in the Customer table, not the Order table.
Asked my boss about this. Billing address details are stored against the customer order, in case the customer deletes addresses from their account. Reviewing old orders would still require the address to be shown, and so linking would not be validity. From other mailings it would seem maybe a Order to OrderShipping table could be used, to remove the duplication (Order_Id, Customer_ID, Customer_Address_ID)?
It still violates 3NF. I also don't buy the address-deleted argument. Presumably the address is deleted because it is no longer valid, so why would one want to use that rather that the address currently associated with the Customer?
[However, one could make a good case for a Shipping Address being associated with the Order because the Customer might want the Order shipped to some place special. But the implication here was that this was the Customer's business address.]
Note that reviewing orders is not relevant. Typically one needs more information to provide context for such reviews so one will be accessing the RDB using table joins anyway. The issue here is how to store the data in the RDB for general access unambiguously and in a manner best suited to ensuring data integrity, not how to display it for special purposes.
However, this illustrates why one should solve the software problem
first. One can conceive of contexts where Customer and Address are not separate objects. That is, if one processes one order at a time in a POS client, then the customer name and address could be unique to the Order and both are just attributes of that one entity. That's because the fact that the customer may have 14 different shipping addresses in 10 states doesn't matter; the POS system only "sees" one shipping
address for the order.
This is eCommerce site using object orientation.
OK. But the number of objects still depends on how one needs to abstract the notion of Order for the particular business context. The problem in hand may only need a simplistic view of Order data compared to the view of an enterprise database. If the problem solution never has any need to think of "a customer" as anything more than a name and address on an Order, then one does not need to abstract a Customer entity; there is no ambiguity if the name and address are regarded as fully dependent on the Order_ID. That's because at that level of abstraction they are.
[Or it may need a more complex view. A telephone number is a classic example. Most DBAs will define the level of abstraction of the RDB schema so that a telephone number is a simple data domain for 1NF. However, some applications need to view the number in terms of {area code, exchange, number}. In that case the number is not a simple domain and must be expressed as multiple attributes in the application.]
2. Store the full product details with a Line Item? I currently do not store the title of the product, just its id.
In the RDB there will undoubtedly be a separate table for Product.
That's because things like product descriptions are unique to the
Product, not the OrderLineItem. Some Product attributes, like
description, would appear redundantly in OrderLineItems from different Orders while other Product attributes, like screw thread size, might be unique to a particular Product or class of products and wouldn't appear at all in some OrderLineItems.
Yes, this is the case. The products are simple, and there are no class of products in this system. If the order references a product whose name changes, the old orders need to show the actual name used when ordered. The Version concept by Fredrick sounds good do you think?
I think Bertilsson was addressing a different problem. If things like the Product description change over time AND you have a need to keep track of that when referring the Orders, then some kind of versioning will be necessary. However, that seems unusual; usually such basic changes would be reflected in a new Product_ID with some linkage for equivalent Products that may be substituted when fulfilling an Order. That is one would have
Product: Product_ID Description .... Equivalent_Product_ID // or Obsolete_Product_Id or whatever
That is, one would have a conditional reflexive relationship on the Product table to find equivalents, replacements, etc.
In your context here the correct RDB approach is to just provide the foreign key for Product_ID in the OrderLineItem table. But apropos of my original point, in the /application/ one may need the joined view with information from both tables. Whether one expresses that in the application with separate, related objects or a single object reflecting the join depends on what problem one is trying to solve.
OTOH, in the client application there might be good reason for having at least some of the Product information in OrderLineItem. For example, for display in a Shopping Cart one would want the details and they would be associated uniquely with the OrderLineItem. That's OK because the Shopping Cart abstraction <usually> constrains the problem so that there would be only one OrderLineItem per Product (i.e., multiples are handled solely through Quantity). IOW, {Order_ID, Product_ID} is equivalent to {Order_ID, Order_Item_ID}.
My Cart has CartItem objects. CartItem has Product object, and quantity. This is same but different problem. Viewing old orders, I must produce product related data for the order. If I begin believnig I dont need to store the product items in the order, I must hook up OrderLineItem objects to Product objects, and this is now the essence of my posting earlier. For performance, how do I do this as best as possible using Objects?
This is the problem Bertilsson was addressing. Once again, the way one handles it (e.g., versioning vs. reflexive relationship) in the RDB and the way one handles it in the application may be different. The RDB view will be created <hopefully> to optimize the RDB data storage and general purpose access. The application view, though, will be created to optimize the solution to the problem in hand. One must then provide a mapping between them that allows the RDB to accessed efficiently, given its view. Solve the application problem first, then worry about the mapping, and finally worry about efficient RDB access from the subsystem.
Note that your application may not need to know anything about changes to the database. From its viewpoint one may only want to display what the customer /originally/ ordered. Then it would be up to the RDB access subsystem to properly access the RDB to obtain the information, regardless of how the RDB actually stored it.
OTOH, it seems more likely that one needs both the original and current views because, in effect, a substitution has been made from the customer's viewpoint and that is probably of interest when communicating with the customer. In that case the application will have some way to organize both views _that is convenient for the processing in hand_. The job of the DB access subsystem is to map those views into whatever views the DBA decided to use in the RDB schema.
If I store all of these details in the DB structure, I must still load them into objects. My problem currently is in loading a customers
order. I dont know where to join the product in the line items to the actual product descriptions. Do I do a DB join to get the ProductTitle back, and put the product title in my OrderLineItem object? Or do I get back the order line items, and do a secondary search for the product based on its ID, load the product, and associate teh product to the
order obejct? I'm confused, and concerned about performances problem.
Deal with persistence in a subsystem or layer that understands the
mapping rules (i.e., the mapping of object/attribute identity to
table/field identity). That allows you to address performance problems in RDB access in that subsystem or layer so that it does not affect your problem solution logic.
OK..Fine. The defaulting of this today is that datasets come back mixed with OrderLineItem info. and Product info. The ProductMapper is requested to return the Product object for the Order, but the fearing I have is that ProductMapper gets a resultset full of OrdeRLineItem and Product data, and the SQL query for Orders has to know all about Products. I realized that I change the Product table, I need to change Order related queries! Does this at all sonud fine?
Optimizing things like SQL queries depends upon the specific RDB schemas and the way the particular applications needs to access data. (The design issues for this could occupy an entire college-level course.) So there is no single magic answer. However, one can go a long way by realizing a few simple things:
(1) Fewer DB accesses are better than many.
(2) Virtual memory is cheap.
(3) DB access time will usually be the major bottleneck in the application unless it is a scientific application or does dazzling graphics.
(4) DBAs hate long transactions (multiple reads/writes while locking the data) because they tie up resources.
(5) Accesses using joins should be minimized.
Basically what this means is that one should grab as much data as one is /likely/ to need all at once even if sometimes it isn't all used. The bottleneck is getting it into memory as datasets; extracting particular information from the datasets in memory is the fast part. Prefer single complex queries to multiple simple queries. If possible, create specialized indices and store "compiled" joins for queries that are commonly used. When mapping to the problem solution's needs when performance is a big problem, look for ways to use write caching or anticipatory reads. Cache requests rather than opening long transactions whenever possible, especially if the data is from user keyboard entry.
The interface to the subsystem is designed around the problem solution's view of the data. So if the problem solution has a single object for Order with attributes for {customer_address, customer_city, etc.) it
would request, say, saving that data via saveOrder (order_ID,
customer_name, customer_address, ....) with all the attributes in the Order object. The persistence access subsystem needs to map that notion of Order into the actual tables in the RDB.
Objects in upper system, other data (and IDs) below this system? What I passion for is not to return to the DB more than once, even though other data might be needed in this lower system. How does this work?
What I am talking about here is a variation on the layered models used for RAD processing. One deals with accessing the RDB in a dedicated layer or subsystem that is encapsulated behind a generic data transfer interface. One way to think of the interface is that it is a GoF Facade pattern. That interface class takes the problem solution's requests and dispatches them to the objects in the subsystem/layer that model the RDB view.
To do that the persistence access subsystem needs to know more about
identity. For example, it needs a Customer_ID. One way to do that is via a table lookup in the persistence access subsystem between
customer_name and customer_ID. Such a table could be created
efficiently at startup with a single RDB query. That would save asking the RDB for the customer_ID given the customer_name just so that one can write a SQL query using customer_ID.
In this subsystem are you in the DB or in a lower part of the OO side of the design? Confusion.
The subsystem is part of the application. (However, when done properly such subsystems are often reusable across applications.) However, its subject matter is the RDB paradigm rather than the business paradigm being applied in the rest of the application. So while the problem solution in the rest of the application abstracts in terms of Customer, Order, and Product objects, the RDB access subject matter abstracts in terms of Table, Tuple, and Dataset objects. IOW, though the subsystem lives in the application, its mission in life is to understand the RDB paradigm and efficiently map the application's business semantics into that view.
The notion of "lower" is not the same as in the RAD layered models. Here it really reflects a level of service. The RDB access subsystem provides a low-level service for the problem solution: talking to the database. Generally such subsystems will be at the same level of abstraction as UI subsystems, whose mission is the low-level service of talking to the user through the hardware. So in a RAD layered model the UI and the RDB would be on opposite ends of the model, for more complex applications that are on roughly the same level. Both encapsulate very unique, low level computing space paradigms (e.g., GUI/browser/smoke signals or RDB/OODB/flat files) so that the actual mechanisms are completely transparent to the application solution.
Alternatively, if the customer data usually comes from the RDB
originally (e.g., the RDB is read to get get the default customer
information to populate the form), then it might be more convenient for the Order object to store customer_ID when the data is read. In that case it is a handle just like a Window handle in a GUI -- just something one hands back to the persistence access subsystem to keep identity
synchronized.
Order is linked to Account. Order goes down into OrderMapper and OrderMapper knows Account.ID is the customers ID to be used for the table linkaging.
Alas, OrderMapper sounds suspicious to me. It seems to be some sort of god object that coordinates other objects. If so, that is not a very OO way to do things. Object should abstract intrinsic knowledge and behaviors from identifiable problem space (business) entities. One then connects the dots for the overall problem solution by connecting those responsibilities with peer-to-peer messages.
OrderMapper sounds like a role that a person would play if there was no software rather than an identifiable business entity. But the software is replacing the person by distributing what the person does among the objects we abstract for the solution. When we abstract business entities that are inanimate we anthropomorphize them with the behaviors a person would have executed.
If we simply create an object to do everything the person would do we concentrate the entire solution in that object. A major goal of OO development is to manage complexity by distributing and encapsulating the behaviors across many objects.
OTOH, the Facade pattern I suggested above does a very similar thing as what you describe. The Facade would provide the mapping that knows that an Account_ID in the Order is mapped into a Customer_ID in the Customer table of the RDB when doing joins and whatnot. That is, the Facade provides a mapping between identity in the application solution semantics and the RDB semantics. So if that is all OrderMapper is doing, then I applaud it. B-)
************* There is nothing wrong with me that could not be cured by a capful of Drano.
H. S. Lahman hsl@xxxxxxxxxxxxxxxxx Pathfinder Solutions -- Put MDA to Work http://www.pathfindermda.com blog: http://pathfinderpeople.blogs.com/hslahman (888)OOA-PATH
.
- Follow-Ups:
- Re: Help! Difficulty understanding DB -> Object mapping
- From: Daniel Parker
- Re: Help! Difficulty understanding DB -> Object mapping
- References:
- Help! Difficulty understanding DB -> Object mapping
- From: usenet . news . account
- Re: Help! Difficulty understanding DB -> Object mapping
- From: H. S. Lahman
- Re: Help! Difficulty understanding DB -> Object mapping
- From: usenet . news . account
- Help! Difficulty understanding DB -> Object mapping
- Prev by Date: Re: Rose & Code generation in C++
- Next by Date: Re: Lahman, how ya doing?
- Previous by thread: Re: Help! Difficulty understanding DB -> Object mapping
- Next by thread: Re: Help! Difficulty understanding DB -> Object mapping
- Index(es):