Re: Object identity
- From: "H. S. Lahman" <h.lahman@xxxxxxxxxxx>
- Date: Tue, 27 Jun 2006 17:49:40 GMT
Responding to Barrett-Lennard...
Firstly I will characterise the misconception more carefully: Rather
informally it states that it is implicit to OO that object instances
represent entities from the problem space. Therefore object identity
is fundamentally associated with these entities. This can be
formalised by introducing the concept of an *interpretation*. This is
simply a mapping I from run time object instances to entities in the
problem space.
Clarification 1: objects abstract entities from /some/ problem space.
Complex applications typically abstract from multiple problem spaces,
including the computing space for OOD/P.
Clarification 2: It is not an interpretation. It is a rule that objects
must have unique identity. It is also a rule that identity, however
it is represented, must be unambiguously traceable: problem space entity
-> object abstraction -> run-time instance. However, that doesn't mean
that identity must be represented the same way for each; it just has to
be unique for each one and traceable between them.
What is not an interpretation? Note that an interpretation, formally
defined as a mathematical function, is the standard way to deal with
the relationship between model and what is modelled. I've seen it in
texts on mathematical logic, automated theorem proving, relational
modelling etc.
Now we also have different definitions of 'interpretation' to add to the mix. B-) I don't see the application of a rule as being any kind of interpretation of what to do. IOW, rules are deterministic while interpretations are not.
Even using your definition of 'interpretation' in Section 2 to be a mapping function, I think it would be a stretch to use that in this context. The rules for how one constructs objects and instances /enable/ a mapping between problem space entities, objects, and object instances, but I don't think they are the mapping itself.
Clarification 3: What is a matter of interpretation is what constitutes
an identifiable problem space entity. Basically what that comes down to
is that a non-developer domain expert would agree that the entity is
known and identifiable in some way. But this is a methodological issue.
That is, reviewers of UML OOA/D models are very rarely confused about
what the model semantics are; their issues are about whether the author's
view of the problem space is correct.
My clarification: An interpretation is assumed to be correct by the
(pure) computer scientist, and the problem domain expert must validate
the interpretation.
OK, but I think this just puts a different spin on verification vs. validation. The developer interprets the <natural language> requirements. Then the developer can verify that the software resolves the <interpreted> requirements but someone must validate whether the actual requirements were resolved.
However, I don't see the rules (my clarification 2) the developer uses in structuring a solution for those <interpreted> requirements as being an interpretation. The paradigm methodology does not give the developer a choice.
Using your mapping function interpretation, the mapping function between actual requirements and those the developer interprets is not rigorously defined. Since it is not deterministic, the reviewers and the developer have a negotiation. [If the reviewer is a domain expert, the reviewer should win. But that assumes that the reviewer properly understands the developer's solution. To have that understanding the reviewer must understand the rules used by the developer to create the solution.]
OTOH, the mapping function from the interpreted requirements to the solution is rigorously defined and that allows traceability _provided the developer followed the methodological construction rules_.
Now it is important that two run time object instances are not mapped
under the interpretation to the same entity in the problem space. In
other words, I must be 1-1. This can be written as
I(x) = I(y) => x = y
It is important to understand that the definition of 1-1 is tied to the
assumed domain of the function I. Making the domain smaller helps to
make an interpretation 1-1.
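The 1-1 condition can be checked mechanically once the interpretation is written down as a table. The following is only a sketch under stated assumptions: `InstanceId`, `Entity`, `isOneToOne` and `interpretationExample` are hypothetical names that do not appear in the thread, instances are stood in for by integer handles, and entities by their names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins: an instance is an integer handle, an entity is a name.
using InstanceId = int;
using Entity = std::string;

// Returns true when no two instances in the map share an entity,
// i.e. I(x) = I(y) implies x = y over the map's domain.
bool isOneToOne(const std::map<InstanceId, Entity>& interpretation) {
    std::set<Entity> seen;
    for (const auto& pair : interpretation) {
        if (!seen.insert(pair.second).second) {
            return false;  // a second instance maps to an entity already seen
        }
    }
    return true;
}

void interpretationExample() {
    // Over the wider domain, instances 1 and 3 map to the same entity.
    std::map<InstanceId, Entity> wide = {
        {1, "Albert Einstein"}, {2, "Kurt Godel"}, {3, "Albert Einstein"}};
    assert(!isOneToOne(wide));

    // Shrinking the domain (dropping instance 3) makes the same mapping 1-1.
    std::map<InstanceId, Entity> narrow = {
        {1, "Albert Einstein"}, {2, "Kurt Godel"}};
    assert(isOneToOne(narrow));
}
```

The second map illustrates the remark about domains: nothing about the mapping rule changed, only the set of instances it is applied to.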
The only way to allow a single process to model an external entity in
more than one way is to divide the total system into sub-systems, each
with its own independent interpretation mapping. Each such sub-system
is said to work at a single, self-consistent "level of abstraction".
I mentioned subsystems because it happens to be a convenient way to
introduce the notion of scope. That allows one to abstract the same
entity differently in different scope so that the multiple abstractions
can't cause confusion. One can do the same thing with layers. Though
rarely done for other reasons, one even can do the same thing within
object implementations. And in your code example you show yet another
means of defining scope below.
The key idea is just that only one abstraction for a given entity is
visible within a particular scope. So long as they are in different
scope, one can have as many abstractions for a given problem space
entity as one wants.
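In C++ terms, one rough way to picture "one abstraction per scope" is a pair of namespaces, each holding its own abstraction of the same person. This is an illustrative sketch only: the `payroll` and `security` subsystems, their attributes, and the helper names are assumptions made for the example and are not part of the discussion.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical subsystems; each scope sees exactly one Employee abstraction.
namespace payroll {
    struct Employee {
        std::string name;   // identity
        double salary;      // what payroll cares about
    };
}

namespace security {
    struct Employee {
        std::string name;   // identity
        int badgeNumber;    // what security cares about
    };
}

// The two abstractions are unrelated types, so they cannot be confused.
static_assert(!std::is_same<payroll::Employee, security::Employee>::value,
              "each scope has its own abstraction");

// Both abstractions remain traceable to the same problem-space entity by name.
bool sameEntity(const payroll::Employee& p, const security::Employee& s) {
    return p.name == s.name;
}

void scopeExample() {
    payroll::Employee paid{"Albert Einstein", 25000.0};
    security::Employee badged{"Albert Einstein", 42};
    assert(sameEntity(paid, badged));
}
```

Within either namespace the unqualified name `Employee` is unambiguous, which is the property the scoping argument relies on.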
So just to be certain - are you basically agreeing with the point of
view expressed by the definition I gave in section 2, and which I
believe to be "misconceived"?
Yes.
---- Section 3: Why is it a misconception?
This is demonstrated with the following code
#include <string>
using std::string;

class Employee
{
public:
    string GetName() const { return name; }
    void SetName(string newName) { name = newName; }
    float GetSalary() const { return salary; }
    void SetSalary(float newSalary) { salary = newSalary; }
private:
    string name;
    float salary;
};

void foo()
{
    Employee* e = new Employee;
    e->SetName("Albert Einstein");
    e->SetSalary(25000);
    e->SetName("Kurt Godel");
    e->SetSalary(29000);
    delete e;
}
This is a good example of the difference between object and instance.
What you have is two objects but only one instance.
Hmmm. I don't like that terminology at all. I think there is only one
object (pointed to by e). I don't distinguish the object from the
instance at all. I regard these as perfect synonyms (unless the class
instance is a value type, in which case there is no object at all).
Au contraire. The Einstein and Godel objects are quite explicitly defined and one can inspect those definitions whether the code is compiled and executed or not. That a member of the Employee set exists with identity of "Albert Einstein" and salary of 25000 is quite clear. And it is equally clear a different object of the Employee set exists with the identity "Kurt Godel" and salary of 29000.
The trick is that
only one object is instantiated at a time.
Your terminology is inconsistent, in the sense that you say there is
only one instance yet there have been two (object) "instantiations".
The instance, 'e', is identified by its address in memory. Because of the imperfections of the OOPL in its zeal to provide low level control over performance optimization, that presents a conundrum because each instance of the two objects would have the same address identity. The only way around that is to ensure that only one instance of the two objects can exist at one time, which the implementation mechanics of overwriting of memory locations ensures in a simple-minded way without regard to other issues like relationship management.
While technically ensuring unique identity for each object instance in the language implementation, that mechanism opens up a host of referential integrity problems that are pushed off onto the developer. That's why the abstract action languages for OOA/D don't allow that to happen; instance creation is a fundamental operation and the instance identity mechanism is not exposed to the developer.
Again, don't get hung up on the vagaries of OOPL implementations. They all make compromises with the hardware computational models and they often have explicit goals that are at odds with OOA/D (e.g., C++'s emphasis on performance).
The Einstein object's
instance ceases to exist when the Godel object's instance is initialized
just as surely as if one had written:
void foo()
{
    Employee* e = new Employee;
    e->SetName("Albert Einstein");
    e->SetSalary(25000);
    delete e;
    Employee* e = new Employee;
    e->SetName("Kurt Godel");
    e->SetSalary(29000);
    delete e;
}
I've never read or heard anyone say that before!
BTW this won't compile because there are two declarations of variable e
in the same scope. In any case I know what you're saying.
Hey, I'm a translationist. B-) I probably haven't written 10 KLOC of 3GL in the past fifteen years. I don't even like to look at it anymore!
[In part the OOPL is confusing things by allowing the object to be
instantiated without proper initialization of identity. (One could not
do that in any of the abstract action languages used for OOA/D that I
know of.) In part the OOPL is confusing things by allowing an
optimization by reusing the memory for the object without the overhead
of heap operations. IOW, the OOPL designer is offering the developer an
opportunity for foot-shooting by washing his hands of referential
integrity issues and pushing them all on the developer. This is why I
argued that looking at OOPL code is not a good place to learn OO.]
IMO the only reason the OOPL is confusing things is because you want to
deal with object identity at the level of the entities in the problem
space. This problem disappears if you simply associate objects (and
their identity) with nothing other than the class instances that reside
in memory. In my mind this is a case of "less is more".
The problem is that objects, unlike RDB tuples, usually do not have explicit identity. Instead it is often defined referentially, which maps conveniently to a memory address in hardware so the OOPLs provide infrastructures around that paradigm.
However, when the objects do have explicit identity -- as in your Einstein/Godel example -- there is a problem. That's because the OOPLs provide no infrastructure for identity attributes. Unlike designated keys in an RDB table, there is nothing special about such attributes. That makes it perfectly legal to change the name "Albert Einstein" (25000) to "Kurt Godel" (29000) in the Einstein instance within the OOPL syntax rules. IOW, explicit identity is purely in the mind of the developer and all we can do is break the thumbs of developers who do things like changing object identity attributes on the fly.
Thus the OOPLs fail to support object identity mapping fully. However, to avoid referential integrity chaos, the developers /must/ methodologically know what object identity is at OOA/D time and treat it with the respect it deserves. (If they get it right in the OOA/D, then it doesn't matter what sort of foot-shooting the OOPL in hand allows.)
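A developer who wants the compiler to help with that discipline can approximate a designated identity attribute by convention. The sketch below is one possible approach, not something proposed in the thread: it reuses the Employee shape from the example, but the const member, the missing SetName(), the deleted copy operations, and the name `identityExample` are all assumptions made for illustration. It remains a coding convention; the language still has no notion of a key.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

class Employee {
public:
    Employee(std::string name, float salary)
        : name_(std::move(name)), salary_(salary) {}

    // Copying would duplicate identity and assignment would overwrite it,
    // so both are disabled in this sketch.
    Employee(const Employee&) = delete;
    Employee& operator=(const Employee&) = delete;

    const std::string& GetName() const { return name_; }
    float GetSalary() const { return salary_; }
    void SetSalary(float newSalary) { salary_ = newSalary; }

private:
    const std::string name_;  // identity: fixed for the life of the instance
    float salary_;            // ordinary knowledge attribute: may change
};

static_assert(!std::is_copy_constructible<Employee>::value,
              "identity must not be duplicated");
static_assert(!std::is_copy_assignable<Employee>::value,
              "identity must not be overwritten");

void identityExample() {
    Employee einstein("Albert Einstein", 25000.0f);
    einstein.SetSalary(26000.0f);                     // allowed: not identity
    assert(einstein.GetName() == "Albert Einstein");  // identity is unchanged
}
```

With this shape, turning the Einstein instance into a Godel instance requires destroying one object and constructing another, which is the sequence the second version of foo() spells out.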
That is clearly in conflict with the "misconception" that was defined
in section 2. I am of course assuming that the interpretation map
makes use of the 'name' member of an Employee instance. Under this
interpretation, the definition of object identity changes from one call
to the next in function foo() above.
This poor definition of object identity makes it impossible for an
object to provide a Clone() method, such as
Employee* Employee::Clone() const
{
    Employee* e = new Employee;
    e->name = name;
    e->salary = salary;
    return e;
}
The reason is that under the interpretation the clone would map to the
same entity (in the problem space), in conflict with the assumption
that the interpretation is 1-1.
This is a different problem. A true clone function creates different
instances with unique identity (address) but the instances all map to
the same object abstraction Employee{name,...}. This creates an even
nastier set of problems for referential integrity for relationships when
one changes the salary. The solution here is to break the thumbs of any
developer who does something like this.
So you say the Clone() function is valid, but a developer who calls it
would be wise to first get insurance on his/her thumbs? :)
Exactly. It is valid at the 3GL level because of imperfections in the OOPLs, but it is not valid at the OOA/D level.
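The referential integrity hazard can be made concrete with a small sketch. It assumes a simplified Employee (a struct with public members, and a Clone() returning std::unique_ptr rather than a raw pointer); the helper names `sameEntityDifferentInstance` and `cloneExample` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Simplified stand-in for the thread's Employee class.
struct Employee {
    std::string name;
    float salary;

    // Mirrors the quoted Clone(): a new instance with the same attribute values.
    std::unique_ptr<Employee> Clone() const {
        return std::unique_ptr<Employee>(new Employee{name, salary});
    }
};

// True when two distinct instances carry the same identity attribute.
bool sameEntityDifferentInstance(const Employee& a, const Employee& b) {
    return &a != &b && a.name == b.name;
}

void cloneExample() {
    Employee original{"Albert Einstein", 25000.0f};
    std::unique_ptr<Employee> copy = original.Clone();

    // Two addresses, one problem-space entity.
    assert(sameEntityDifferentInstance(original, *copy));

    // Updating one instance silently leaves the other stale.
    original.salary = 29000.0f;
    assert(copy->salary == 25000.0f);
}
```

After the salary update, any relationship that still navigates to the clone sees the old value, which is the kind of inconsistency the thumb-breaking remark is aimed at.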
Ok, as promised above I will now state how I believe object identity
works. This is a mixture of formal and informal (because I'm too
lazy). I think this is the prevailing view amongst the majority of OO
programmers...
It really is just a set of definitions...
Definition: An *entity* is a thing that can be modelled by a computer.
An entity generally exists independently of the computer that models
it. An example of an entity is a human. Another example is the
integer 2189.
OK, but I really don't like the last sentence. Fundamental units of computation really shouldn't be regarded as equivalent to problem space abstractions, however convenient that might be to 3GL design. Sure, there is some mathematical concept behind the notion of Integer Number that is abstracted, but that notion is really only of interest to implementing software on a hardware computer. IOW, once one is out of the realms of computers or pure mathematics, the notion of Integer as an entity is a pretty alien concept. There 2189 is just a value of some bit of knowledge. Thus entities can be abstracted with knowledge but the bits of knowledge aren't at the same level of abstraction.
Definition: A *problem space* is a (mathematical) set of entities that
are relevant to solving a given problem using a computer. Entities in
a single problem space are allowed to form has-a relationships. For
example, Albert Einstein is an entity, and Albert Einstein's left
eyeball is an entity as well. Entities in a single problem space are
allowed to be at different "levels of abstraction". Basically there
are no restrictions!
Relative to your first definition, I think there are restrictions. For example, entities must have abstractable properties that are relevant to the solution. I would go further and argue that entities should have multiple properties. (Only one may be relevant to the problem in hand, but then a reviewer should want a lot of justification that was so.)
Remember that the OO paradigm provides for three fundamental levels of abstraction: subsystem; object; and responsibility. (Nesting, as in subsystems, is allowed and objects can embed other objects, but mechanisms like implementation hiding mitigate that.) It is subsystems and objects that map to problem space entities. By implication problem space entities are necessarily complex to support further subdivision into properties.
Definition: An *entity-type* is a set of entities used for
classification purposes. For example, the set of all humans, or the
set of integers Z. It is permissible for an entity-type to be
regarded as an entity.
Definition: A *class* is associated with a class definition written in
some OOPL that defines both state and methods. For our purposes we
are only interested in concrete classes.
Let's keep it to OOA/D sets rather than the baggage of 3GL type systems. Entities belong to sets (aka classes). Set membership is based on all members having the same property set (defined by the class).
Definition: A *process* is associated with a particular execution of
the program on a given computer. A process has its own address space.
Not relevant. One of the tests of a well-formed OOA model is that it can be unambiguously implemented as a manual system in the customer's environment (however inefficient that might be). [Obviously computing space applications like DBMS engines are an exception.] Thus any notion of object identity has to be independent of the computing environment. (Instance identity, OTOH, is dependent on the implementation environment.)
However, I would buy a similar definition related to scope _within an application_. In OOA/D that scope is a subsystem.
Definition: A *class-instance* is a particular instantiation of a
(concrete) class in a particular process at a particular location in
memory. For our purposes we don't care whether the instance is
associated with a global variable, a frame variable or was allocated on
the heap.
I don't like the term "class instance". An object instance is implemented in memory, not a class. The class is instantiated within some scope by defining a set of identified object members. Relative to the point above, that class instantiation clearly does not have to be in memory; it is just a set of identified objects.
Because of the OOPL problems with instantiation discussed in the first part of the message, there needs to be a time constraint as well. That is, object instances are born and die in time during the execution. Whether there are referential integrity issues /may/ depend on when various objects are instantiated, as in your overwrite example.
Definition: A *concrete-type* is either a concrete class or a simple
type like int, float.
Definition: A *value-type* is a concrete type used for program
variables, that represents an entity in the problem space. This
involves some given interpretation (ie map) from its state (ie bytes in
memory) to entity. When a value-type variable is copied both the
original value and the new value will represent the same entity.
Examples of value-types are int, float, as well as certain classes like
std::string (the STL string for C++). It is generally wrong to take
the address of a value-type variable for the purposes of pointer
comparisons.
Definition: At any given point in time a value-type variable is said
to contain a particular *value*. A value is associated with the entity
that the variable represents (under the interpretation as a
value-type).
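The value-type definitions above can be illustrated with std::string, the example they name. This is a minimal sketch; the helper names `sameValue`, `sameInstance` and `valueVersusInstanceExample` are made up for the illustration.

```cpp
#include <cassert>
#include <string>

// Value comparison: do the two variables represent the same entity?
bool sameValue(const std::string& a, const std::string& b) {
    return a == b;
}

// Address comparison: are the two references bound to the same variable?
bool sameInstance(const std::string& a, const std::string& b) {
    return &a == &b;
}

void valueVersusInstanceExample() {
    std::string original = "Kurt Godel";
    std::string copy = original;             // copying a value-type variable
    assert(sameValue(original, copy));       // both represent the same entity
    assert(!sameInstance(original, copy));   // yet they are distinct variables
}
```

For a value-type only the first comparison is meaningful, which is why the definition warns against taking addresses for pointer comparisons.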
I really don't like introducing type systems. They are a 3GL implementation mechanism for OOA/D class systems. As such it becomes difficult to separate the local implementation issues from the fundamental OOA/D issues. The OOA/D set-oriented view is much simpler, more general, and less dependent on the vagaries of implementation. Among other things type systems lead to silliness like making an Integer an object of equal stature to a complex notion like Employee. It also leads to a notion of 'value' that is quite different than one employs when abstracting property data domains from a problem space.
Definition: An *object* is a class-instance that is regarded as having
identity tied to that class-instance. For the purposes of identity,
the object does *not* represent an entity under some interpretation.
Although it certainly may (in the mind of the programmer) that is
entirely irrelevant to the semantics of object identity.
OK, here is where we part company in a big way.
One can argue that any formalism tends to be stilted and that the OO paradigm must have an underlying basis in mathematics where unique definitions of things like /value/ prevail. However, I leave that to the boffins who design OO methodologies, OOA/D notations, and OOPLs. That has all been resolved at the level of solving problems using an OO approach and the OO paradigm has methodological constraints on solution construction...
Objects abstract problem space entities and they must have unique identity that is unambiguously traceable to that of the problem space entity that they abstract. I am not making this up; it is fundamental to the OO paradigm because it enables an explicit mapping between the problem space and the software solution that was missing in previous approaches. You can postulate an A&D system where that is not necessarily true and where other mappings to the problem space are provided (P/R and FP being obvious examples), but it would not be an OO methodology.
*************
There is nothing wrong with me that could
not be cured by a capful of Drano.
H. S. Lahman
hsl@xxxxxxxxxxxxxxxxx
Pathfinder Solutions -- Put MDA to Work
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
Pathfinder is hiring: http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH