Re: Passing by reference



Responding to McGill...

You have raised some interesting issues. So grab a six-pack and get your feet up on the desk...

In your list of the degrees of coupling, you mentioned
data-by-reference arguments as being distinct from object-by-reference
arguments. What distinguishes the two? I am by no means an OO expert,
but when I read about Alan Kay's original vision of Smalltalk, for
example, I don't see where the concept of 'data' fits into the picture.
My understanding is that even the integers were envisioned as instances
of objects, like tiny computers, sending and receiving messages. That
the basic data types in Java exist was, I thought, essentially a
performance-driven compromise. Or do you mean data as in 'value type,'
like Java's String, Integer, etc.? If so, what allows us to treat these
objects any differently from other objects in our design? (I don't mean
to suggest that we *should* treat them the same, I just don't
understand why an immutable object is all *that* different from a
mutable one)

The problem here is that OOA/D has come a long way since Kay & Co. I am still amazed, though, that Kay got things pretty much right with the first OOPL out of the box. A lot of really bad OOPLs have followed that didn't get things right. Smalltalk is still the best OOPL I know of for first using the OO paradigm because it does a much better job of hiding the compromises with the hardware computational models that are inherent in all 3GLs.

The modern OO paradigm itself is really defined by the /way/ one constructs software rather than the mechanisms used in construction. That's why we have an entirely different 4GL notation for OOA/D in UML. In addition, the paradigm distinguishes different levels of construction:

OOA: a solution for only functional requirements that is independent of any specific computing environment.

OOD: an elaboration of OOA to deal with nonfunctional requirements at a strategic level.

OOP: an elaboration of OOD to deal tactically with a specific computing environment (3GL, technologies, infrastructures, etc.).

While there are processes for OO development that operate almost exclusively at the OOP level, the developer is still expected to be able to apply OOA/D when abstracting the problem space. In addition, OOA/D is distilled down in the form of various programming guidelines, such as Fowler's "Refactoring" and Martin's Principles of Dependency Management. So the OOA/D is still there.

When I am talking about things like relationship navigation, separating message and method, and separating the concerns of instantiation and collaboration I am really talking about OOA. That happens to be implemented at the 3GL level through mechanisms like pointers as referential attributes and separate setters for those attributes in the object's type interface. (Why there are separate setters I'll get to in a while.)

The need to do that is obvious at the OOA/D level because the abstract action languages that describe dynamics have dedicated constructs for doing things like instantiating relationships. So one needs to go out of one's way to combine instantiation with collaboration. That is less obvious at the OOP level because the OOPL type systems combine message and method through the method signature and because they usually do not have dedicated mechanisms for relationship instantiation. (In part that is because of another compromise; referential attributes are data stores in the hardware computational models and they were accessed directly in the early OOPLs.)

My point here is that the notion of an integer being an object (i.e., a fundamental type) is a pure OOPL designer issue. That is a different realm from the OO software construction paradigm. It only exists because the 3GLs employ type systems and OOPLs employ very sophisticated type systems. It is simply a conceptual view that makes it easier to implement the OOPL type system itself.

When one is doing OOP one is really at a higher level of abstraction. It really doesn't matter if the language designer thinks integers are objects. The OO software developer is essentially mapping the OOA/D into the suite of syntactic mechanisms that the OOPL provides. The fact that the OOPL designer needed to have a unique mapping in mind to provide a consistent suite of syntactic elements is not really relevant. IOW, the OOP developer doesn't really care how the language designer came up with a suite of consistent syntactic elements; the developer just cares that they are consistent enough to allow an unambiguous mapping from the OOA/D to OOPL code.

Thus my main point here is not to get hung up on how OOPLs implement OO features. Today programming in OOPLs is not a good way to learn what the OO paradigm is about. That's because the OOPLs have to make substantial compromises with the hardware computational models, which are inherently procedural in nature. IOW, the OO paradigm has come a long way since Kay & Co. and the OO developer needs to understand OOA/D and how that maps into OOPL constructs. [Technically, not even the mapping. Any OOA/D solution can be mapped directly into a language like C. Full code generators for OOA models do this routinely for performance reasons because the code does not need to be maintained manually.]

So we come to the notion of what an object is at the OOA/D level. It is an abstraction of some identifiable problem space entity. (The problem space may be the computing space for things like String and Array; a typical application involves multiple problem spaces.) That entity is abstracted in terms of responsibilities that it has. The OO paradigm only allows two classes of responsibilities: responsibility for knowing something and responsibility for doing something. So there is no direct way to abstract a problem space quality like Purpose; the OO developer must recast that notion into something known or something done (or a combination).

Therefore, when one passes data in a message, that is quite different than passing an object reference in a message. The reason is that all the receiver can do with data is use it. If it is by value, then the receiver can change it without affecting the client who can still access the original source of the value. But an object reference is quite different because now the receiver can also trigger any behavior responsibility the object has and that can lead to lots of other data being changed, perhaps the client's.

Think of it this way. A by-value message data packet is often represented as an object with only knowledge attributes. Such objects are commonly known as "dumb data holders". They are dumb because they can't do anything. But the object whose reference is passed can have quite arbitrary behaviors with essentially unlimited side effects.
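To make the distinction concrete, here is a minimal Python sketch. All of the names (Reading, Thermostat, receive_data, receive_reference) are hypothetical and invented for illustration; the point is only that a receiver handed data can do nothing but use it, while a receiver handed an object reference can trigger behavior that changes the client's state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Reading:
    """Hypothetical dumb data holder: knowledge attributes only, no behavior."""
    celsius: float


class Thermostat:
    """Hypothetical object with a behavior responsibility that has side effects."""

    def __init__(self) -> None:
        self.setpoint = 20.0
        self.heater_on = False

    def raise_setpoint(self, delta: float) -> None:
        # Behavior: changes this object's state; in a real system it could
        # also send further messages to other objects.
        self.setpoint += delta
        self.heater_on = True


def receive_data(reading: Reading) -> Reading:
    # The receiver can only use the value. "Changing" it produces a new
    # copy; the client's original is untouched.
    return replace(reading, celsius=reading.celsius + 1.0)


def receive_reference(thermostat: Thermostat) -> None:
    # The receiver can trigger any behavior the referenced object exposes.
    thermostat.raise_setpoint(2.0)


original = Reading(celsius=18.5)
adjusted = receive_data(original)      # original is still 18.5

client_thermostat = Thermostat()
receive_reference(client_thermostat)   # the client's object has now changed
```

After these calls the client still sees its original Reading, but its Thermostat has a new setpoint and the heater is on.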

And this all segues to...

I would also be very interested to hear more about 'knowledge
accessors.' What sorts of characteristics and/or constraints
distinguish a knowledge accessor from a more typical method? Living in
the Java world, I naturally think of 'getter' methods as 'knowledge
accessors,' and have lately come to the conclusion that the presence of
getter methods (the way they are typically used) is in most cases
violating encapsulation and a symptom of misplaced responsibilities. In
my current project at work, I have only found the need for 'getter'
methods on the lowest-level domain objects in the problem space, and
these objects are so far little more than data containers with some
basic validation and consistency checks. The presence of the 'getters'
and the lack of any substance in these low-level classes makes me think
I must have committed some sort of design sin.

Historically the getter/setter debate arose because the early OOPLs didn't get things quite right. They mapped knowledge responsibilities directly to hardware data stores and accessed them directly in the syntax. The problem with that was that the data store type itself reflected a specific implementation (e.g., integer vs. float, bit size, etc.). The compiler needed to understand that implementation to produce correct code in the accessing client. So if one changed the implementation, one had to recompile all the clients.

<aside>
BTW, this is one of the more common manifestations of physical coupling in the OOPLs. The OOPLs do a great job on logical decoupling but, because they are 3GLs, they do a lousy job on physical coupling. As a result, to provide maintainable code one has to go to substantial effort to refactor the OOPL code so that physical coupling is minimized. Much of dependency management refactoring is directed at the developer's problem of maintainability rather than solving the customer's problem. That problem does not exist at all in OOA/D solutions.
</aside>

To avoid this problem getter/setter methods were introduced that would allow the client to continue to "see" the implementation it expected even when the actual implementation changed. IOW, the getter/setter method properly encapsulated the implementation, just as procedures encapsulated behavior implementations. So all knowledge attributes were declared as 'private' and getters/setters were supplied as a Good Practice. However, this just created another problem. Now one could not look at the object type itself and know which knowledge responsibilities were actually public (i.e., intrinsic to what the object was in the problem space) or private (a convenient data store to support private behavior implementations).

Fortunately some of the more modern OOPLs have addressed this by making direct attribute access and indirect access through getters/setters syntactically interchangeable. IOW, if the developer provides a getter/setter, that is automatically substituted when the client code is compiled. Now one can declare and access public knowledge directly in the syntax as originally intended while the implementation hiding is done behind the scenes.
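Python's built-in property mechanism is one example of this kind of interchangeability. The sketch below is only an illustration; the Account class and its integer-cents storage are assumptions I made up for the example. The client writes plain attribute access while the getter/setter hides the actual data store.

```python
class Account:
    """Hypothetical class: 'balance' is a public knowledge responsibility."""

    def __init__(self, balance: float) -> None:
        # Hidden implementation choice (an assumption for this sketch):
        # the value is stored as integer cents, not as a float.
        self._cents = round(balance * 100)

    @property
    def balance(self) -> float:
        # Getter substituted automatically for attribute reads.
        return self._cents / 100

    @balance.setter
    def balance(self, value: float) -> None:
        # Setter substituted automatically for attribute writes.
        self._cents = round(value * 100)


account = Account(10.0)
account.balance = 12.5      # looks like direct attribute access
current = account.balance   # the client never sees the integer-cents store
```

If the storage were later changed back to a plain float, the client lines at the bottom would not need to change.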

Alas, there is a Catch-22. That's because of the nature of responsibilities. The responsibility for knowing something can be quite complex. In OOA/D we commonly use ADTs to describe knowledge. Those ADTs are scalars at a subsystem's level of abstraction but they may represent quite complicated data stores. Or they may be computed on the fly by a complex calculation when accessed. (I worked on a system where a single scalar ADT attribute in one subsystem actually mapped into more than a dozen classes with hundreds of instances and on the order of 10**9 data values in another subsystem.)

So we can have an object like Matrix with operations like Transpose and Invert, which are algorithmically nontrivial and result in wholesale data store changes. Yet I submit such operations are simply "smart" knowledge setters rather than behavior responsibilities. IOW, a Matrix object is just a dumb data holder. So how does one distinguish between a knowledge responsibility and a behavior responsibility? The criteria are fairly simple:

(1) Only knowledge attributes of the "owning" object are affected. The accessor does not change attributes in other objects.

(2) The accessor does not instantiate objects or relationships.

(3) The accessor does not issue messages to other objects that trigger behaviors.

(4) The accessor does not involve any business rules or policies that are unique to the problem in hand. Thus Matrix.transpose provides a mathematically defined algorithm beyond the scope of the problem needing a matrix for its solution.

Any method that satisfies these criteria is just a knowledge accessor. Thus any object whose methods all satisfy these criteria is just a dumb data holder.
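As a rough illustration of the four criteria, here is a small Python sketch of a Matrix that qualifies as a dumb data holder. The class layout and method names are my own assumptions, not taken from any particular library; only the transpose operation comes from the discussion above.

```python
from typing import List


class Matrix:
    """Hypothetical dumb data holder: every method touches only its own data."""

    def __init__(self, rows: List[List[float]]) -> None:
        self._rows = [list(row) for row in rows]

    def rows(self) -> List[List[float]]:
        # Plain knowledge getter; returns a copy so callers cannot alter
        # the internal store behind the object's back.
        return [list(row) for row in self._rows]

    def transpose(self) -> None:
        # "Smart" knowledge setter. It rewrites the whole data store, yet:
        # (1) only this object's own attribute changes,
        # (2) no problem-space objects or relationships are instantiated,
        # (3) no messages that trigger behaviors are sent to other objects,
        # (4) the rule is pure mathematics, not a policy of the problem in hand.
        self._rows = [list(column) for column in zip(*self._rows)]


m = Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
m.transpose()
transposed = m.rows()
```

However complicated transpose gets internally, nothing outside the Matrix can tell it ran except by reading the Matrix's own knowledge.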

The really interesting criterion above is the third. That's because in OOA/D knowledge and behavior responsibilities are treated quite differently. One crafts the sequence of operations in a problem solution by essentially connecting the dots of individual, self-contained behavior responsibilities. So the _sequence of operations_ in an OO solution is defined at a different level of abstraction in OOA/D (e.g., in a UML Interaction Diagram) when one defines messages between objects.

However, that sequence must be quite general because one cannot know what the 3GL computing environment will be like. In particular, in the OOA one does not know if the computing environment will be inherently asynchronous and/or concurrent. The most general way to represent behavior sequences is with an asynchronous communication model. That's because one can always map an asynchronous solution into a synchronous environment but one cannot always map a synchronous solution into an asynchronous environment.

So well-formed OOA/D solutions are usually constructed using an asynchronous communication model for sequencing behaviors. To do that one can rigorously use DbC to match execution preconditions to postconditions for other methods to determine where triggering messages should be generated. [In fact developers rarely do that in practice except in unusually complex situations because they already have the Big Picture of the flow of control in their heads. But it can be useful for dealing with complex situations and debugging.]

Now behaviors can only do two things that will affect the solution results: modify state variables (knowledge attributes) and send messages to other behaviors. IOW, regardless of how algorithmically complex a behavior is, one will never know it executed unless it changes data or invokes another method that changes data. The Catch-22 is that methods need to "read" data to do their thing. So with an asynchronous communication model one has a data integrity problem: the data processed must be timely.

If the messages that a method uses to access data are also asynchronous, the developer's mind will quickly turn to mush trying to line up all the preconditions properly. The solution in the OOA/D is that knowledge responsibilities are assumed to be accessed synchronously. IOW, it is assumed the method needing attribute data gets it before doing anything with it on an as-needed basis. If the data is distributed, it is up to the OOP developer to provide infrastructure to ensure that the accessing method "sees" the data as-if it were synchronously available. Since procedure calls are inherently synchronous in 3GLs, that is usually pretty easy to ensure (e.g., by pausing a thread or something) and distributed infrastructures commonly do this transparently.

This has some interesting implications. The main one is that one needs to make sure one keeps straight which responsibilities are knowledge and which are behavior. One then also needs to understand which methods are simple knowledge accessors and which implement true business rules and policies. That's because one needs to keep the two modes of communication separated.

Note that in OOA/D separating relationship instantiation from collaboration is very naturally done -- even beyond the presence of dedicated abstract action language constructs. If a relationship is implemented with a referential attribute, that attribute is a knowledge attribute (i.e., the object has a responsibility to know who it is related to). The difference in communication modes alone would drive one towards a separate accessor for the referential attribute because it is a different responsibility than the behavior method invoked in a collaboration. [Ain't it great when a Plan comes together?]
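At the OOP level that separation might look something like the following Python sketch. Car, Driver, assign_driver and start_trip are hypothetical names chosen for illustration. The referential attribute gets its own knowledge setter, and the collaboration method only navigates a relationship that already exists.

```python
class Driver:
    """Hypothetical collaborator with one behavior responsibility."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.trips = 0

    def dispatch(self) -> None:
        # Behavior responsibility triggered by a message from Car.
        self.trips += 1


class Car:
    def __init__(self) -> None:
        # Referential attribute: the knowledge of who we are related to.
        self._driver = None

    def assign_driver(self, driver: Driver) -> None:
        # Knowledge setter: instantiates the relationship and nothing else.
        # It triggers no behavior in the Driver.
        self._driver = driver

    def start_trip(self) -> None:
        # Collaboration: navigate the existing relationship and send a message.
        # It does not decide who the driver is.
        if self._driver is None:
            raise RuntimeError("no driver assigned")
        self._driver.dispatch()


car = Car()
pat = Driver("Pat")
car.assign_driver(pat)   # relationship instantiated; pat.trips is still 0
car.start_trip()         # collaboration; pat.trips is now 1
```

Whoever calls assign_driver owns the rules about which Driver goes with which Car, so start_trip can stay ignorant of them.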

Another, more subtle implication is that in a well-formed OOA/D model messages very rarely contain data packets. That's because the synchronous knowledge communication model dictates that a method should navigate to the data it needs directly on an as-needed basis. Because of the asynchronous behavior communication model, it would be risky to pass data in messages because there is a theoretically arbitrary delay between when a message is sent and when it is consumed. (IOW, there is an implicit message queue underlying the asynchronous behavior messaging model.) So the data in the message could be out of date by the time the responding method actually executes.

So one usually only has message data packets for other reasons, such as a "snapshot" for sensor readings that must be processed as a group within a time box.
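The staleness risk is easy to demonstrate with a toy event queue. The Python sketch below is only an illustration under my own assumptions: Sensor, Logger and on_sample are hypothetical names, and a simple deque stands in for the implicit message queue. The same event is handled two ways, once using data carried in the message and once by synchronously navigating to the knowledge attribute when the event is consumed.

```python
from collections import deque


class Sensor:
    def __init__(self, value: int) -> None:
        self.value = value   # knowledge attribute, read synchronously


class Logger:
    def __init__(self, sensor: Sensor) -> None:
        self._sensor = sensor        # relationship navigated at consumption time
        self.from_packet = []        # values that arrived inside messages
        self.from_navigation = []    # values fetched on an as-needed basis

    def on_sample(self, packet_value: int) -> None:
        self.from_packet.append(packet_value)
        self.from_navigation.append(self._sensor.value)


queue = deque()                  # stands in for the implicit message queue
sensor = Sensor(10)
logger = Logger(sensor)

queue.append((logger.on_sample, sensor.value))  # message sent with a data packet
sensor.value = 42                                # state changes before consumption

while queue:                     # arbitrary delay, then the message is consumed
    handler, payload = queue.popleft()
    handler(payload)
```

The packet still carries the old value of 10, while the synchronous read sees the current value of 42. For a deliberate snapshot, of course, the old value is exactly what one wants.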

Yet another subtle implication applies when one chooses to deal with the asynchronous communication model the elegant way by using object state machines to deal with behavior responsibilities. Now the knowledge and behavior responsibilities are very clearly separated. All behaviors will appear in state actions while all knowledge accessors, however complex they may be, will reside in methods outside the state machines (aka synchronous services).

A corollary is that fairly often state machine actions are rather trivial (e.g., a single statement like "totalSalary = baseSalary * (1 + burdenRate)"). That's because they may look like a setter for totalSalary but they really involve a business rule that is unique to the problem in hand so they /must/ be in a state machine action. Thus distinguishing behavior from knowledge is more a matter of semantics than complexity or form.
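Here is a minimal Python sketch of that arrangement. The Employee class, its states ("Hired", "Costed") and the "rate_set" event are all hypothetical; only the totalSalary rule comes from the example above. The business rule lives in a state action, while the accessor for the result is a synchronous service outside the state machine.

```python
class Employee:
    # Hypothetical transition table: (current state, event) -> new state.
    TRANSITIONS = {("Hired", "rate_set"): "Costed"}

    def __init__(self, base_salary: float, burden_rate: float) -> None:
        self.base_salary = base_salary
        self.burden_rate = burden_rate
        self.total_salary = None
        self.state = "Hired"

    def get_total_salary(self):
        # Synchronous service: a plain knowledge accessor, no business rule.
        return self.total_salary

    def consume(self, event: str) -> None:
        # State machine dispatch: move to the new state, then run its action.
        new_state = self.TRANSITIONS[(self.state, event)]
        self.state = new_state
        getattr(self, "_enter_" + new_state.lower())()

    def _enter_costed(self) -> None:
        # State action: trivial in form, but it encodes a business rule
        # unique to the problem in hand, so it belongs in the state machine.
        self.total_salary = self.base_salary * (1 + self.burden_rate)


employee = Employee(base_salary=1000.0, burden_rate=0.25)
employee.consume("rate_set")
total = employee.get_total_salary()
```

The action is one line and looks just like a setter, but moving it into get_total_salary or a plain setter would bury a business rule in what clients assume is a synchronous knowledge access.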


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
info@xxxxxxxxxxxxxxxxx for your copy.
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH


