Re: Object identity




H. S. Lahman wrote:
Responding to Barrett-Lennard...

Firstly I will characterise the misconception more carefully: Rather
informally it states that it is implicit to OO that object instances
represent entities from the problem space. Therefore object identity
is fundamentally associated with these entities. This can be
formalised by introducing the concept of an *interpretation*. This is
simply a mapping I from run time object instances to entities in the
problem space.

Clarification 1: objects abstract entities from /some/ problem space.
Complex applications typically abstract from multiple problem spaces,
including the computing space for OOD/P.

Clarification 2: It is not an interpretation. It is a rule that objects
must have unique identity. It is also a rule that identity, however
it is represented, must be unambiguously traceable: problem space entity
-> object abstraction -> run-time instance. However, that doesn't mean
that identity must be represented the same way for each; it just has to
be unique for each one and traceable between them.


What is not an interpretation? Note that an interpretation, formally
defined as a mathematical function, is the standard way to deal with
the relationship between model and what is modelled. I've seen it in
texts on mathematical logic, automated theorem proving, relational
modelling etc.

Now we also have different definitions of 'interpretation' to add to the
mix. B-) I don't see the application of a rule as being any kind of
interpretation of what to do. IOW, rules are deterministic while
interpretations are not.

Even using your definition of 'interpretation' in Section 2 to be a
mapping function, I think it would be a stretch to use that in this
context. The rules for how one constructs objects and instances
/enable/ a mapping between problem space entities, objects, and object
instances, but I don't think they are the mapping itself.

You are going to have to be more formal for me to comment further. For
example, I don't know what you mean by "rule".

Clarification 3: What is a matter of interpretation is what constitutes
an identifiable problem space entity. Basically what that comes down to
is that a non-developer domain expert would agree that the entity is
known and identifiable in some way. But this is a methodological issue.
That is, reviewers of UML OOA/D models are very rarely confused about
what the model semantics are; their issues are about whether the author's
view of the problem space is correct.


My clarification: An interpretation is assumed to be correct by the
(pure) computer scientist, and the problem domain expert must validate
the interpretation.

OK, but I think this just puts a different spin on verification vs.
validation. The developer interprets the <natural language>
requirements. Then the developer can verify that the software resolves
the <interpreted> requirements but someone must validate whether the
actual requirements were resolved.

However, I don't see the rules (my clarification 2) the developer uses
in structuring a solution for those <interpreted> requirements as being
an interpretation. The paradigm methodology does not give the developer
a choice.

Using your mapping function interpretation, the mapping function between
actual requirements and those the developer interprets is not rigorously
defined. Since it is not deterministic, the reviewers and the developer
have a negotiation. [If the reviewer is a domain expert, the reviewer
should win. But that assumes that the reviewer properly understands the
developer's solution. To have that understanding the reviewer must
understand the rules used by the developer to create the solution.]

OTOH, the mapping function from the interpreted requirements to the
solution is rigorously defined and that allows traceability _provided
the developer followed the methodological construction rules_.

Now it is important that two run time object instances are not mapped
under the interpretation to the same entity in the problem space. In
other words, I must be 1-1. This can be written as

I(x) = I(y) => x = y

It is important to understand that the definition of 1-1 is tied to the
assumed domain of the function I. Making the domain smaller helps to
make an interpretation 1-1.
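
The 1-1 condition and its dependence on the domain can be sketched in code. This is only an illustration under stated assumptions: the `Employee` struct, the `interpret` function (standing in for I, with the entity represented by its name string) and `isOneToOne` are hypothetical names introduced here, not anything defined in the thread.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a run time instance; only the name matters here.
struct Employee {
    std::string name;
};

// Assumed interpretation I: maps an instance to the entity it represents.
// The entity is stood in for by the name string.
std::string interpret(const Employee& e) { return e.name; }

// Returns true if I is 1-1 on the given domain of distinct instances,
// i.e. no two instances in the domain map to the same entity.
bool isOneToOne(const std::vector<const Employee*>& domain) {
    std::set<std::string> seen;
    for (const Employee* e : domain) {
        // insert().second is false when that entity was already mapped to
        if (!seen.insert(interpret(*e)).second) return false;
    }
    return true;
}

// Shows that shrinking the domain can restore the 1-1 property.
bool demoDomainRestriction() {
    Employee a{"Albert Einstein"};
    Employee b{"Kurt Godel"};
    Employee c{"Albert Einstein"};  // second instance for the same entity

    bool whole = isOneToOne({&a, &b, &c});  // a and c collide
    bool smaller = isOneToOne({&a, &b});    // restricted domain
    assert(!whole);
    assert(smaller);
    return !whole && smaller;
}
```

On the full domain the instances `a` and `c` map to the same entity, so I(x) = I(y) holds with x != y; dropping `c` from the domain removes the collision.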

The only way to allow a single process to model an external entity in
more than one way is to divide the total system into sub-systems, each
with its own independent interpretation mapping. Each such sub-system
is said to work at a single, self-consistent "level of abstraction".

I mentioned subsystems because it happens to be a convenient way to
introduce the notion of scope. That allows one to abstract the same
entity differently in different scope so that the multiple abstractions
can't cause confusion. One can do the same thing with layers. Though
rarely done for other reasons, one even can do the same thing within
object implementations. And in your code example below you show yet
another means of defining scope.

The key idea is just that only one abstraction for a given entity is
visible within a particular scope. So long as they are in different
scope, one can have as many abstractions for a given problem space
entity as one wants.
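
As a rough sketch of that idea in C++, namespaces can play the role of scope. The `payroll` and `security` subsystems and their fields are assumptions made up for this illustration; the thread itself names no particular subsystems.

```cpp
#include <cassert>
#include <string>

// Hypothetical payroll subsystem: abstracts a person as someone who is paid.
namespace payroll {
struct Employee {
    std::string name;
    double salary;
};
}  // namespace payroll

// Hypothetical security subsystem: abstracts the same person as a badge holder.
namespace security {
struct Employee {
    std::string name;
    int badgeLevel;
};
}  // namespace security

// Two abstractions of one problem space entity coexist without confusion,
// because each is visible only under its own scope name.
bool demoScopes() {
    payroll::Employee paid{"Albert Einstein", 25000.0};
    security::Employee badged{"Albert Einstein", 3};
    assert(paid.name == badged.name);
    return paid.name == badged.name;
}
```

Within either namespace only one `Employee` abstraction is in view, which is the constraint described above.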


So just to be certain - are you basically agreeing with the point of
view expressed by the definition I gave in section 2, and which I
believe to be "misconceived"?

Yes.

---- Section 3: Why is it a misconception?

This is demonstrated with the following code

#include <string>
using std::string;

// A minimal Employee class; nothing marks 'name' as an identity attribute.
class Employee
{
public:
    string GetName() const { return name; }
    void SetName(string newName) { name = newName; }

    float GetSalary() const { return salary; }
    void SetSalary(float newSalary) { salary = newSalary; }

private:
    string name;
    float salary;
};

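void foo()
{
    Employee* e = new Employee;

    // The instance first describes Einstein...
    e->SetName("Albert Einstein");
    e->SetSalary(25000);

    // ...and then the very same instance is changed to describe Godel.
    e->SetName("Kurt Godel");
    e->SetSalary(29000);

    delete e;
}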

This is a good example of the difference between object and instance.
What you have is two objects but only one instance.


Hmmm. I don't like that terminology at all. I think there is only one
object (pointed to by e). I don't distinguish the object from the
instance at all. I regard these as perfect synonyms (unless the class
instance is a value type, in which case there is no object at all).

Au contraire. The Einstein and Godel objects are quite explicitly
defined and one can inspect those definitions whether the code is
compiled and executed or not. That a member of the Employee set exists
with identity of "Albert Einstein" and salary of 25000 is quite clear.
And it is equally clear a different object of the Employee set exists
with the identity "Kurt Godel" and salary of 29000.

The humans Einstein and Godel are entities, *not* objects. I think it
is entirely non-standard of you to call them objects (in the context of
a discussion about OO, and in particular object identity).

Let's use some consistent terminology and *always* use the word entity
for particular things that exist independently of the computer. Or do
you somehow distinguish between entity and object, even before you
compile and execute the code? If you do please provide a sufficiently
formal definition so I can understand what you mean.


The trick is that
only one object is instantiated at a time.


Your terminology is inconsistent, in the sense that you say there is
only one instance yet there have been two (object) "instantiations".

The instance, 'e', is identified by its address in memory. Because of
the imperfections of the OOPL in its zeal to provide low level control
over performance optimization, that presents a conundrum because each
instance of the two objects would have the same address identity. The
only way around that is to ensure that only one instance of the two
objects can exist at one time, which the implementation mechanics of
overwriting of memory locations ensures in a simple-minded way without
regard to other issues like relationship management.

IMO you attribute a strong implicit semantic to OO that simply isn't
there in the first place.

I have no problem with an object (ie an instance of a class in memory)
that can have fields changed, making it suddenly represent a different
entity under some interpretation. This is allowed because unlike RM,
OO doesn't itself come ready made with semantics that relates back to
an interpretation. I'm somewhat with Bob Badour when he likens OO to a
methodology for "constructing large unpredictable state machines out of
small predictable state machines". Now I wouldn't go quite that far,
but it does seem clear that the onus is on the programmer to formally
prove that an OO program will solve the problem at hand. Using OO can
be as dangerous as assembly. You have complete, unrestricted access to
a Turing machine. You can express logical, correct solutions as well
as incorrect ones that sound right but require careful analysis to
reveal subtle errors. This promiscuity allows the OO developer to
create wholly new algorithms and techniques. But that power and
generality comes at the price of only low level implicit semantics.

You can't write a general purpose program that can look at a snapshot
of the run time state of any given running OO system, in order to
deduce truths (in the form of predicates about entities), even if it's
provided with the source code, and is also able to find all the global
variables and navigate every thread's frame stack. For a start it is
faced with the problem of finding a consistent cut. It can't hope to
always "understand" some given source code because of the halting
problem. How does it know which objects in memory to trust, and which
not to? Eg is an object just for temporary purposes for some
algorithm? How does it know what the algorithm is for?

Thinking of OO merely in terms of simple class diagrams and modelling of
relationships is at best an over-simplification. More to the point,
that limited view emphasises exactly what OO is poor at:
classification and storing relationships about entities.

The above "imperfections" of the OOPL don't exist at all, because OO
works at a lower level semantic than you prescribe.


While technically ensuring unique identity for each object instance in
the language implementation, that mechanism opens up a host of
referential integrity problems that are pushed off onto the developer.
That's why the abstract action languages for OOA/D don't allow that to
happen; instance creation is a fundamental operation and the instance
identity mechanism is not exposed to the developer.

I don't agree. Just find a Clone() method on a class in an OOD.

Again, don't get hung up on the vagaries of OOPL implementations. They
all make compromises with the hardware computational models and they
often have explicit goals that are at odds with OOA/D (e.g., C++'s
emphasis on performance).

The Einstein object's
instance ceases to exist when the Godel object's instance is initialized
just as surely as if one had written:

void foo()
{
Employee* e = new Employee;

e->SetName("Albert Einstein");
e->SetSalary(25000);

delete e;
Employee* e = new Employee

e->SetName("Kurt Godel");
e->SetSalary(29000);

delete e;
}


I've never read or heard anyone say that before!

BTW this won't compile because there are two declarations of variable e
in the same scope. In any case I know what you're saying.

Hey, I'm a translationist. B-) I probably haven't written 10 KLOC of
3GL in the past fifteen years. I don't even like to look at it anymore!

[In part the OOPL is confusing things by allowing the object to be
instantiated without proper initialization of identity. (One could not
do that in any of the abstract action languages used for OOA/D that I
know of.) In part the OOPL is confusing things by allowing an
optimization by reusing the memory for the object without the overhead
of heap operations. IOW, the OOPL designer is offering the developer an
opportunity for foot-shooting by washing his hands of referential
integrity issues and pushing them all on the developer. This is why I
argued that looking at OOPL code is not a good place to learn OO.]


IMO the only reason the OOPL is confusing things is because you want to
deal with object identity at the level of the entities in the problem
space. This problem disappears if you simply associate objects (and
their identity) with nothing other than the class instances that reside
in memory. In my mind this is a case of "less is more".

The problem is that objects, unlike RDB tuples, usually do not have
explicit identity. Instead it is often defined referentially, which
maps conveniently to a memory address in hardware so the OOPLs provide
infrastructures around that paradigm.

We don't share the same definition of "object". For you it is an
entity (I think). For me it is an instance in memory.

However, when the objects do have explicit identity -- as in your
Einstein/Godel example -- there is a problem. That's because the OOPLs
provide no infrastructure for identity attributes. Unlike designated
keys in an RDB table, there is nothing special about such attributes.
That makes it perfectly legal to change the name "Albert Einstein"
(25000) to "Kurt Godel" (29000) in the Einstein instance within the OOPL
syntax rules. IOW, explicit identity is purely in the mind of the
developer and all we can do is break the thumbs of developers who do
things like changing object identity attributes on the fly.

Thus the OOPLs fail to support object identity mapping fully. However,
to avoid referential integrity chaos, the developers /must/
methodologically know what object identity is at OOA/D time and treat it
with the respect it deserves. (If they get it right in the OOA/D, then
it doesn't matter what sort of foot-shooting the OOPL in hand allows.)
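
One way a developer might enforce that discipline by hand in C++ is sketched below. It is an assumption of this illustration, not something either poster proposes: the identity attribute is a `const` member fixed at construction, and there is no `SetName()`. Note that `const` is a general language feature rather than dedicated support for identity attributes, so the point above still stands.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical variant of Employee in which the name cannot change
// after construction, while the salary remains an ordinary attribute.
class Employee {
public:
    Employee(std::string name, float salary)
        : name_(std::move(name)), salary_(salary) {}

    const std::string& GetName() const { return name_; }
    float GetSalary() const { return salary_; }
    void SetSalary(float newSalary) { salary_ = newSalary; }

private:
    const std::string name_;  // identity attribute: fixed at construction
    float salary_;            // ordinary attribute: free to change
};

// The const member also deletes the implicit copy assignment, so one
// Employee cannot be overwritten wholesale with another's state.
static_assert(!std::is_copy_assignable<Employee>::value,
              "Employee must not be assignable");

bool demoFixedIdentity() {
    Employee e("Albert Einstein", 25000.0f);
    e.SetSalary(26000.0f);  // allowed: not part of identity
    assert(e.GetName() == "Albert Einstein");
    return e.GetName() == "Albert Einstein" && e.GetSalary() == 26000.0f;
}
```

With this shape the overwrite in foo() no longer compiles, since there is no way to turn the Einstein instance into the Godel instance.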

IMO if the code expressed in the OOPL doesn't map simply to the OOA/D
then OO is being misused. Don't confuse ER diagram and class diagram.


That is clearly in conflict with the "misconception" that was defined
in section 2. I am of course assuming that the interpretation map
makes use of the 'name' member of an Employee instance. Under this
interpretation, the definition of object identity changes from one call
to the next in function foo() above.

This poor definition of object identity makes it impossible for an
object to provide a Clone() method, such as

Employee* Employee::Clone() const
{
    Employee* e = new Employee;
    e->name = name;
    e->salary = salary;
    return e;
}

The reason is that under the interpretation the clone would map to the
same entity (in the problem space), in conflict with the assumption
that the interpretation is 1-1.
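
The conflict can be checked directly. The sketch below is illustrative only: `interpret` is a hypothetical stand-in for the interpretation I (using the name as the entity), and `Clone()` returns a `std::unique_ptr` rather than the raw pointer used above, purely to keep the example free of manual `delete` calls.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class Employee {
public:
    Employee(std::string n, float s) : name(std::move(n)), salary(s) {}

    // Creates a second instance with identical state.
    std::unique_ptr<Employee> Clone() const {
        return std::make_unique<Employee>(name, salary);
    }

    const std::string& GetName() const { return name; }

private:
    std::string name;
    float salary;
};

// Assumed interpretation I: instance -> entity, keyed on the name member.
std::string interpret(const Employee& e) { return e.GetName(); }

// Returns true when the clone is a distinct instance (x != y) that
// nevertheless maps to the same entity (I(x) = I(y)).
bool cloneBreaksOneToOne() {
    Employee original("Albert Einstein", 25000.0f);
    std::unique_ptr<Employee> copy = original.Clone();

    bool distinctInstances = (&original != copy.get());
    bool sameEntity = (interpret(original) == interpret(*copy));
    assert(distinctInstances && sameEntity);
    return distinctInstances && sameEntity;
}
```

Here I(x) = I(y) holds while x = y does not, which is exactly the failure of the 1-1 condition.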

This is a different problem. A true clone function creates different
instances with unique identity (address) but the instances all map to
the same object abstraction Employee{name,...}. This creates an even
nastier set of problems for referential integrity for relationships when
one changes the salary. The solution here is to break the thumbs of any
developer who does something like this.


So you say the Clone() function is valid, but a developer who calls it
would be wise to first get insurance on his/her thumbs? :)

Exactly. It is valid at the 3GL level because of imperfections in the
OOPLs, but it is not valid at the OOA/D level.

All I can say is YUK

Ok, as promised above I will now state how I believe object identity
works. This is a mixture of formal and informal (because I'm too
lazy). I think this is the prevailing view amongst the majority of OO
programmers...

It really is just a set of definitions...

Definition: An *entity* is a thing that can be modelled by a computer.
An entity generally exists independently of the computer that models
it. An example of an entity is a human. Another example is the
integer 2189.

OK, but I really don't like the last sentence. Fundamental units of
computation really shouldn't be regarded as equivalent to problem space
abstractions, however convenient that might be to 3GL design.




Sure,
there is some mathematical concept behind the notion of Integer Number
that is abstracted, but that notion is really only of interest to
implementing software on a hardware computer.

Yes, but that is after all the topic of this discussion.

Note that your sentence betrays an aversion to wanting to treat numbers
as real in any sense whatsoever. You use words like "concept",
"notion", "abstracted". I suppose you say most numbers don't exist
because no one has written them down. I on the other hand am a
Platonist and don't lose any sleep over this, or force myself to
pollute my sentences with lots of additional but meaningless words to
indicate that numbers aren't real.

I consider that we are doing computer science here, and it is
ultimately a branch of (applied) mathematics. Your aversion to
treating numbers as real strikes me as a sure indicator that you are
lacing your arguments with metaphysical viewpoints that are outside the
scope of computer science.

IOW, once one is out of
the realms of computers or pure mathematics, the notion of Integer as an
entity is a pretty alien concept.

I don't disagree

There 2189 is just a value of some
bit of knowledge.

No. Outside the realm of mathematics, 2189 doesn't even exist, and
neither does computer science.

Thus entities can be abstracted with knowledge but
the bits of knowledge aren't at the same level of abstraction.

Huh?


Definition: A *problem space* is a (mathematical) set of entities that
are relevant to solving a given problem using a computer. Entities in
a single problem space are allowed to form has-a relationships. For
example, Albert Einstein is an entity, and Albert Einstein's left
eyeball is an entity as well. Entities in a single problem space are
allowed to be at different "levels of abstraction". Basically there
are no restrictions!

Relative to your first definition, I think there are restrictions. For
example, entities must have abstractable properties that are relevant to
the solution. I would go further and argue that entities should have
multiple properties. (Only one may be relevant to the problem in hand,
but then a reviewer should want a lot of justification that was so.)

Remember that the OO paradigm provides for three fundamental levels of
abstraction: subsystem; object; and responsibility. (Nesting, as in
subsystems, is allowed and objects can embed other objects, but
mechanisms like implementation hiding mitigate that.) It is subsystems
and objects that map to problem space entities. By implication problem
space entities are necessarily complex to support further subdivision
into properties.

Entities exist independently of the computer. So I don't see what any
of the above has to do with it.


Definition: An *entity-type* is a set of entities used for
classification purposes. For example, the set of all humans, or the
set of integers Z. It is permissible for an entity-type to be
regarded as an entity.

Definition: A *class* is associated with a class definition written in
some OOPL that defines both state and methods. For our purposes we
are only interested in concrete classes.

Let's keep it to OOA/D sets rather than the baggage of 3GL type systems.
Entities belong to sets (aka classes). Set membership is based on all
members having the same property set (defined by the class).

No. Please don't change my definitions. Classes are not sets of
entities.


Definition: A *process* is associated with a particular execution of
the program on a given computer. A process has its own address space.

Not relevant. One of the tests of a well-formed OOA model is that it
can be unambiguously implemented as a manual system in the customer's
environment (however inefficient that might be). [Obviously computing
space applications like DBMS engines are an exception.] Thus any notion
of object identity has to be independent of the computing environment.
(Instance identity, OTOH, is dependent on the implementation environment.)

However, I would buy a similar definition related to scope _within an
application_. In OOA/D that scope is a subsystem.

It is relevant because "process" appears in the definition of class
instance. You want to abstract the run time system away. I do not.
You are interested in a higher level semantic. I am not.


Definition: A *class-instance* is a particular instantiation of a
(concrete) class in a particular process at a particular location in
memory. For our purposes we don't care whether the instance is
associated with a global variable, a frame variable or was allocated on
the heap.

I don't like the term "class instance". An object instance is
implemented in memory, not a class. The class is instantiated within
some scope by defining a set of identified object members. Relative to
the point above, that class instantiation clearly does not have to be in
memory; it is just a set of identified objects.

Good point. But I wanted to distinguish between instances that act as
values versus objects. So I'll just say "instance" instead.

Because of the OOPL problems with instantiation discussed in the first
part of the message, there needs to be a time constraint as well. That
is, object instances are born and die in time during the execution.
Whether there are referential integrity issues /may/ depend on when
various objects are instantiated, as in your overwrite example.

Definition: A *concrete-type* is either a concrete class or a simple
type like int, float.

Definition: A *value-type* is a concrete type used for program
variables, that represents an entity in the problem space. This
involves some given interpretation (ie map) from its state (ie bytes in
memory) to entity. When a value-type variable is copied both the
original value and the new value will represent the same entity.
Examples of value-types are int, float, as well as certain classes like
std::string (the STL string for C++). It is generally wrong to take
the address of a value-type variable for the purposes of pointer
comparisons.

Definition: At any given point in time a value-type variable is said
to contain a particular *value*. A value is associated with the entity
that the variable represents (under the interpretation as a
value-type).
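
The distinction between comparing values and comparing addresses can be sketched as follows. The `Account` struct is a hypothetical class introduced for this illustration and treated here as having object identity; `std::string` plays the value-type, as in the definition above.

```cpp
#include <cassert>
#include <string>

// Hypothetical class whose instances are treated as objects with identity.
struct Account {
    double balance;
};

bool demoValueVersusObject() {
    // Value-type: a copy represents the same entity, so values are
    // compared with ==, and the addresses of a and b are irrelevant.
    std::string a = "2189";
    std::string b = a;
    bool sameValue = (a == b);

    // Object: identity is tied to the instance, so a copy is a different
    // object even though its state is identical.
    Account x{100.0};
    Account y = x;
    bool distinctObjects = (&x != &y);

    assert(sameValue);
    assert(distinctObjects);
    return sameValue && distinctObjects;
}
```

Under these definitions `a` and `b` represent one entity, while `x` and `y` are two objects that merely happen to share state.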

I really don't like introducing type systems. They are a 3GL
implementation mechanism for OOA/D class systems. As such it becomes
difficult to separate the local implementation issues from the
fundamental OOA/D issues. The OOA/D set-oriented view is much simpler,
more general, and less dependent on the vagaries of implementation.

I want to talk about the semantics of object identity with respect to
some given source code, not some UML diagram or whatever, particularly
if it is abstracted away from the source code.


Among other things type systems lead to silliness like making an Integer
an object of equal stature to a complex notion like Employee. It also
leads to a notion of 'value' that is quite different than one employs
when abstracting property data domains from a problem space.


Definition: An *object* is a class-instance that is regarded as having
identity tied to that class-instance. For the purposes of identity,
the object does *not* represent an entity under some interpretation.
Although it certainly may (in the mind of the programmer) that is
entirely irrelevant to the semantics of object identity.

OK, here is where we part company in a big way.

One can argue that any formalism tends to be stilted and that the OO
paradigm must have an underlying basis in mathematics where unique
definitions of things like /value/ prevail. However, I leave that to
the boffins who design OO methodologies, OOA/D notations, and OOPLs.
That has all been resolved at the level of solving problems using an OO
approach and the OO paradigm has methodological constraints on solution
construction...

I don't think you're being honest to *actual* OO (ie actual source
code). You hide behind abstractions of the source code like UML class
diagrams.


Objects abstract problem space entities and they must have unique
identity that is unambiguously traceable to that of the problem space
entity that they abstract.

I think that's just an unnecessary limitation, revealed in the enormous
amounts of real code that don't follow that rule at all - such as a
Clone() method.

I am not making this up; it is fundamental
to the OO paradigm because it enables an explicit mapping between the
problem space and the software solution that was missing in previous
approaches.

I don't know why you think that. I see plenty of disadvantages. If
that is fundamental to OO, then OO is living a lie.

You can postulate an A&D system where that is not
necessarily true and where other mappings to the problem space are
provided (P/R and FP being obvious examples), but it would not be an OO
methodology.

My definitions are not at odds with OO, as long as you don't blur the
distinction between instance and entity. Most software using OO
doesn't blur the distinction.

Cheers,
David Barrett-Lennard.
