Re: Searching OO Associations with RDBMS Persistence Models



Responding to Richie...

Give me all companies that start with "Micro" in Seattle, Washington

Class Diagram Below (The DB Diagram is also the same, where Location
has a companyID referencing a Company)
+---------+
| Company |
+---------+
|1
|
|1..*
+----------+
| Location |
+----------+


"Company is located at location X. Company has at least one location."
"Location belongs to Company X"

What you really have here at the OOA/D level is:

[Client]
| 1
|
| R1
|
| gets locations from
| *
[Company]
| 1
|
| R2
|
| has
| *
[Location]

and the Client needs access to locations for a particular Company. To do that the Client must navigate the R1 -> R2 relationship path. [I am clarifying this here because relationships are crucial to the answer to your question.]



The company class implements the association through

Class Company
{
String companyName;
Set locations;

public void addLocation(location)
{};
}

What language is this? In particular, is Set a reference or an embedded object?

In this case I would argue that it should be a reference. The addLocation method is really a responsibility of Set, not Company. Any client should collaborate directly with Set to maintain the relationship.

class Company
{
public:
    Set* getCompanyLocationSet() { return myLocations; }

private:
    String companyName;
    Set* myLocations; // C++ style reference
};

This is better but not good for reasons I will get to below.

Here is where relationships come into the picture. The key thing to understand is that relationships are really orthogonal to class semantics. (In fact, commercial code generators for UML models treat them like aspects.) So one wants to provide decoupling by treating relationships separately from the class semantics and routine collaborations needed to solve the problem in hand.

There are three aspects to relationships: implementation, instantiation, and navigation. Here Set* represents a particular implementation of the relationship at OOP time (one of several possible) through a collection class and a reference. Thus at OOP time we have expanded the OOA/D model to:

[Client]
| 1
|
| R1
|
| gets locations from
| *
[Company]
| 1
| organizes locations for
|
| R3
|
| 1
[Set]
| 1
|
| R2
|
| collects
| *
[Location]

The relationship is instantiated in a complex fashion requiring multiple steps. It starts when a [Set] instance is created for the Company in hand and the R3 relationship is instantiated by setting the myLocations pointer. It continues as [Location] objects are added to that collection.
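
To make those steps concrete, here is a minimal C++ sketch of the instantiation. It is only an illustration: LocationSet stands in for [Set], and setLocationSet, add, size, instantiateExample and the sample names are hypothetical; only myLocations and getCompanyLocationSet come from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for [Location]; only a name is modeled.
struct Location {
    std::string name;
};

// Hypothetical stand-in for [Set]: it, not Company, maintains the R2 links.
class LocationSet {
public:
    void add(Location* location) {
        assert(location != nullptr);
        members.push_back(location);   // instantiates one R2 link
    }
    std::size_t size() const { return members.size(); }

private:
    std::vector<Location*> members;    // non-owning references
};

class Company {
public:
    explicit Company(std::string name) : companyName(std::move(name)) {}

    // Instantiates R3 by pointing the Company at its collection.
    void setLocationSet(LocationSet* set) { myLocations = set; }

    LocationSet* getCompanyLocationSet() const { return myLocations; }

private:
    std::string companyName;
    LocationSet* myLocations = nullptr;  // C++ style reference (R3)
};

// Walks through the steps: create the Set, link it (R3), then add Locations (R2).
inline std::size_t instantiateExample() {
    Location seattle{"Seattle"};
    Location redmond{"Redmond"};
    LocationSet set;
    Company company("Micro Widgets");

    company.setLocationSet(&set);                    // R3
    company.getCompanyLocationSet()->add(&seattle);  // R2, first link
    company.getCompanyLocationSet()->add(&redmond);  // R2, second link
    return company.getCompanyLocationSet()->size();
}
```

Note that Company never adds a Location itself; whoever builds the relationship collaborates with the collection directly.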

The navigation of the relationship path is now more complicated. A Client object must traverse R1 -> R3 -> R2. To do that the Client must obtain the R3 collection reference from the Company in hand via getCompanyLocationSet. It then navigates R2 by asking [Set] to provide a <new and temporary> collection of [Location] references that satisfy the selection criteria (starting with "Micro"). So what we really have is:

1 current locations for
[Client] --------------------------+
| 1 |
| |
| R1 |
| |
| gets locations from |
| * |
[Company] |
| 1 |
| organizes locations for | R4
| |
| R3 |
| |
| 1 | 0..1
[Set] [Set]
| 1 | 1
| |
| R2 |
| | R5
| collects |
| * |
[Location] ------------------------+
* collects

where the Set for R3 instantiates the Set for R4 and its relationships. Typically, that will be implemented and instantiated by simply having [Set] return a Set* to the Client when asked to select locations.
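
A rough C++ sketch of that navigation follows. The names LocationSet, selectByPrefix and countMatches are hypothetical, and the temporary collection is returned as a std::unique_ptr rather than a raw Set* purely so that its temporary nature is visible in the code; that substitution is an assumption of the sketch, not part of the design above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Location {
    std::string name;
};

// Hypothetical stand-in for [Set].
class LocationSet {
public:
    void add(const Location* location) {
        assert(location != nullptr);
        members.push_back(location);
    }
    std::size_t size() const { return members.size(); }

    // Navigates R2 and instantiates the new, temporary collection (R4/R5).
    std::unique_ptr<LocationSet> selectByPrefix(const std::string& prefix) const {
        auto selected = std::make_unique<LocationSet>();
        for (const Location* location : members) {
            // True only when the name begins with the whole prefix.
            if (location->name.compare(0, prefix.size(), prefix) == 0) {
                selected->add(location);
            }
        }
        return selected;
    }

private:
    std::vector<const Location*> members;  // non-owning references
};

class Company {
public:
    explicit Company(LocationSet* set) : myLocations(set) {}
    LocationSet* getCompanyLocationSet() const { return myLocations; }

private:
    LocationSet* myLocations;  // R3
};

// Client-side navigation for the Company in hand (R1): R3, then R2.
inline std::size_t countMatches(const Company& company, const std::string& prefix) {
    LocationSet* all = company.getCompanyLocationSet();                  // R3
    std::unique_ptr<LocationSet> current = all->selectByPrefix(prefix);  // R2 -> R4
    return current->size();
}
```

The Client holds the selected collection only for as long as it needs the current locations; the R3 collection itself is untouched by the selection.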

Now this is all quite tedious to draw in UML, so we would only have the first diagram and the rest would be done at OOP time directly in the code. However, it is important to understand what is really going on. In particular, [Set] becomes a peer class at OOP time to [Client], [Company], and [Location]. That allows us to decouple the relationship path semantics from the class semantics. Why is that important?

Suppose during future maintenance one decides to collect locations into sales regions. Now we have:

[Client]
| 1
|
| R1
|
| gets locations from
| *
[Company]
| 1
| organizes regions for
|
| R3
|
| 1
[Set]
| 1
|
| R6
|
| collects
| *
[Region]
| 1
| organizes locations for
|
| R7
|
| 1
[Set]
| 1
|
| R2
|
| collects
| *
[Location]

Since [Client] is still collaborating on a peer-to-peer basis with [Location], all we need to change is the way the path is navigated. In principle, that should be trivial to do as all we have to add is the navigation of the region set. However, when we do that we discover that Company::getCompanyLocationSet is not properly named. It should really be Company::getCompanyRegionSet now.

The problem here is that we have hard-wired the relationship organization into the semantics of the Company class and that is reflected in the name of the interface method. This example is pretty trivial and it would be no big deal to fix. One reason it would be no big deal is that the various [Set] collections are not hard-wired into the implementation of [Company] and [Region]. In principle we do not need to touch the implementation of [Company] semantics; we just need to reassign pointers.

This kind of problem with the name can be manifested in very subtle ways in more complex situations. For example, suppose the maintainer decided to leave getCompanyLocationSet and have it do the navigation R6 -> R7 -> R2. This might be justified because it "hides" the complexity of regions and locations from [Client]. The problem is that it trashes the cohesion of [Company]. That's because now we have hard-wired that navigation into the implementation of the [Company] method. So if the path changes again we will have to go into the implementation of that method in [Company] to fix things up. But [Company] really shouldn't know or care anything about what is going on 2-3 objects away.

So what one really wants is something like Company::getR1SetReference (or something similarly generic to reflect Client's overall goal) to access the proper [Set] for that step of the navigation. Now one doesn't need to touch anything in the [Company] implementation when the navigation path changes. The only place where the new organization needs code changes is in [Client], where the original collaboration is initiated. And even there, it is essentially orthogonal to [Client] semantics and is treated as an idiom.
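
A sketch of what that might look like in C++ for the region version of the model. Everything except getR1SetReference is a hypothetical name: the Set template, the R1Set alias, the all accessor and the locationsFor function are assumptions made for illustration. The alias is one way to keep the text of Company unchanged when the collection at the end of R3 changes from locations to regions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generic collection standing in for every [Set] in the diagram.
template <typename T>
class Set {
public:
    void add(T* member) {
        assert(member != nullptr);
        members.push_back(member);
    }
    const std::vector<T*>& all() const { return members; }

private:
    std::vector<T*> members;  // non-owning references
};

struct Location {
    std::string name;
};

struct Region {
    Set<Location>* myLocations = nullptr;  // R7
};

// Whatever collection sits at the far end of R3. Before regions were
// introduced this alias would have named Set<Location> instead.
using R1Set = Set<Region>;

class Company {
public:
    explicit Company(R1Set* set) : myR1Set(set) {}

    // Named for the relationship, not for what the collection holds.
    R1Set* getR1SetReference() const { return myR1Set; }

private:
    R1Set* myR1Set;  // R3
};

// The Client's navigation idiom: R3 -> R6 -> R7 -> R2.
// This is the only code that knows the shape of the path.
inline std::vector<Location*> locationsFor(const Company& company) {
    std::vector<Location*> result;
    for (Region* region : company.getR1SetReference()->all()) {    // R3, R6
        for (Location* location : region->myLocations->all()) {    // R7, R2
            result.push_back(location);
        }
    }
    return result;
}
```

If the path changes again, only the alias and the loop in locationsFor need attention; the Company class is not edited.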

However, there is another reason one wants to make relationship navigation orthogonal to class semantics, which segues to...





******The part that gets me is this.

The typical search in this scenario is

Give me all companies that start with "Micro" in Seattle, Washington

This means that the search is across two classes - Company AND Location

************************

In the database world, this is a simple join across two tables.

*************************

OO applications solve particular customer problems while RDBs are designed to provide persistence access that is independent of why the data is needed. As a result one employs quite different paradigms for dealing with relationships. So there are no joins in OO applications. Instead one "walks" individual links in relationship paths. Why?

One answer is performance. Unlike RDBs, where relationships are defined and instantiated at the Table level for all tuples of the table, OO relationships are defined at the object level (tuple, if you will). That means that when searching for "Micro" you will have a smaller set for any given Company to search.

In fact, the OO paradigm for instantiating relationships often allows one to avoid searches altogether. That's because the solution is tailored to a particular problem. So if the only subset of [Location] that is ever of interest to the problem solution is those locations starting with "Micro", one would create a separate collection for those entries that begin with "Micro". When needed one would have only those locations available for collaboration. That collection could easily be created by testing each new member in Set::add to see if it began with "Micro" and, if so, adding it to the subset collection. Then no search is needed at all. This sort of instantiation is ubiquitous in OO applications, so one very rarely sees a FIND WHERE construct. Again, that is only possible because the application is tailored to a particular problem.
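
A small C++ sketch of that idea, assuming a hypothetical LocationSet whose add method does the test once, when the relationship is instantiated. The member names (all, interesting, prefixOfInterest) are invented for the illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct Location {
    std::string name;
};

// Hypothetical collection that keeps a second, pre-filtered set of links.
class LocationSet {
public:
    explicit LocationSet(std::string prefixOfInterest)
        : prefix(std::move(prefixOfInterest)) {}

    void add(Location* location) {
        assert(location != nullptr);
        everything.push_back(location);
        // Test once, at instantiation time, instead of searching later.
        if (location->name.compare(0, prefix.size(), prefix) == 0) {
            ofInterest.push_back(location);
        }
    }

    const std::vector<Location*>& all() const { return everything; }

    // No search here: the subset was built as members were added.
    const std::vector<Location*>& interesting() const { return ofInterest; }

private:
    std::string prefix;
    std::vector<Location*> everything;
    std::vector<Location*> ofInterest;
};
```

A client that only ever cares about the "Micro" subset asks for interesting() and never iterates over the full collection.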

However, the most important reason why one wants to manage relationships as binary steps in a path is for managing state variables. By limiting access to objects that can be reached to those that are related to a particular object, one greatly reduces access to the state variables (knowledge attributes) in the application. In effect, instantiating relationships at the object level is how the OO paradigm eliminates the problems of global data from procedural development. One effectively provides static structure to enforce business rules and policies about data access.

One can't do that very well in an RDB because the constraints one wants to enforce are usually limited to particular problems and that would compromise the generic access that is crucial to RDB persistence.

The price one pays for the decoupling of relationships and using static structure to enforce business rules on access is that one needs to explicitly write the code to navigate the relationships paths piecemeal. Fortunately, doing that is essentially idiomatic once one is used to it, so experienced developers rarely even think about it. [It also makes writing code generators for UML models much easier. B-)]


I've spent several hours on the lists, and have read a number of posts
that reference the same issue, but came across none that answered it
in practical terms.

So I'd like to preface the question by saying that if the only
persistence model available to use was an RDBMS and OO was the only
design methodology. Given that, what is the best methodology for
solving the problem.

What is the correct correlation of this in an Object Oriented World?

1) Do I create another class called CompanyLocations that does the
search and creates the objects as needed. This doesn't work too well in
my opinion as the relationship does extend further to Locations have
Employees...would this mean that I would need a CompanyLocationEmployee
Class for searches?

Possibly. It depends on how important finding locations is to the overall problem, how likely it is that things may change in the future, and how complicated the search criteria are. If it is important, complex, and/or the paths may be volatile, then a separate object to navigate the paths may be justified on the basis of encapsulating important and complex business rules.

For example, a common situation is:

[Client]
| *
|
| R1
|
| accesses
| 1
[Tree]
| 0..1
|
| R2
|
| rooted at
| 1 0..1 child of
[Node] -------------------+
| 0..* |
| parent of |
| | R3
+----------------------+

[Node] and its relationships represent a classic model of any hierarchical tree fanning out from a root node. For various reasons it may be desirable to hide the navigation rules (e.g., binary tree vs. B-Tree) from [Client]. So one introduces [Tree] to do the grunt work of navigating the relationships.
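
As a rough C++ illustration of [Tree] doing the grunt work, assuming a Node that carries only an integer value and a Tree that exposes a single pre-order walk. The member names (attach, values, collect) are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Node {
    int value = 0;
    Node* parent = nullptr;        // R3: child of (0..1)
    std::vector<Node*> children;   // R3: parent of (0..*)
};

class Tree {
public:
    explicit Tree(Node* rootNode) : root(rootNode) {
        assert(rootNode != nullptr);
    }

    // Links child under parent, keeping both ends of R3 consistent.
    void attach(Node* parent, Node* child) {
        assert(parent != nullptr && child != nullptr);
        child->parent = parent;
        parent->children.push_back(child);
    }

    // Depth-first, pre-order walk. The Client never navigates R3 itself.
    std::vector<int> values() const {
        std::vector<int> out;
        collect(root, out);
        return out;
    }

private:
    static void collect(const Node* node, std::vector<int>& out) {
        out.push_back(node->value);
        for (const Node* child : node->children) {
            collect(child, out);
        }
    }

    Node* root;  // R2: rooted at
};
```

If the tree were later reorganized, only attach and collect would change; a Client calling values() would not.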

However, I suspect here your answer would lie in providing collection classes that were peers of [Company] and [Location] at the OOP level.



2) Do I create a method search() in Company that returns all Companies
that start with "Micro" and then loop through these to search for
locations that match Seattle, WA?

Collection classes make ideal repositories for such searches. That is, the search will usually depend upon the organization of the collection, so the collection itself is the logical "owner" of such algorithms.
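
For the "Micro" in Seattle, Washington query, a sketch in C++ might look like the following. It assumes two hypothetical collection classes, CompanySet and LocationSet, each owning the search over its own members; selectByNamePrefix, containsCity and findCompanies are invented names, and Location is reduced to a city and a state.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Location {
    std::string city;
    std::string state;
};

// Hypothetical collection of locations; it owns the search over its members.
class LocationSet {
public:
    void add(const Location* location) {
        assert(location != nullptr);
        members.push_back(location);
    }
    bool containsCity(const std::string& city, const std::string& state) const {
        for (const Location* location : members) {
            if (location->city == city && location->state == state) return true;
        }
        return false;
    }

private:
    std::vector<const Location*> members;
};

struct Company {
    std::string companyName;
    LocationSet* myLocations = nullptr;  // reference to this company's collection
};

// Hypothetical collection of companies, again owning its own search.
class CompanySet {
public:
    void add(const Company* company) {
        assert(company != nullptr);
        members.push_back(company);
    }
    std::vector<const Company*> selectByNamePrefix(const std::string& prefix) const {
        std::vector<const Company*> selected;
        for (const Company* company : members) {
            if (company->companyName.compare(0, prefix.size(), prefix) == 0) {
                selected.push_back(company);
            }
        }
        return selected;
    }

private:
    std::vector<const Company*> members;
};

// The Client walks the path one link at a time instead of joining tables.
inline std::vector<const Company*> findCompanies(const CompanySet& companies,
                                                 const std::string& prefix,
                                                 const std::string& city,
                                                 const std::string& state) {
    std::vector<const Company*> result;
    for (const Company* company : companies.selectByNamePrefix(prefix)) {
        if (company->myLocations->containsCity(city, state)) {
            result.push_back(company);
        }
    }
    return result;
}
```

Each company's location search only touches that company's own locations, which is the smaller per-object set described earlier.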


3) Do I fudge the Object Orientation, in Company and create a
search(companyName, city, state) method that searches across the
tables? And creates the necessary location instances as well?

No. [Company] should know as little as possible about [Location] objects and how they are organized, to reduce implementation dependencies. It may need to collaborate with [Location], but that will be within the context of its own responsibilities, not [Client]'s.


4) Am I making a mountain out of a mole hill?

Not really. One could argue that relationships are one of the most fundamental distinguishing characteristics of the OO paradigm. Your questions strike at the heart of issues like cohesion, decoupling, and encapsulation. (Which is why I belabored the basics so much.)


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
(888)OOA-PATH


