Re: Object identity
- From: "David Barrett-Lennard" <davidbl@xxxxxxxxxxxx>
- Date: 25 Jun 2006 10:00:34 -0700
H. S. Lahman wrote:
Responding to Barrett-Lennard...
Firstly I will characterise the misconception more carefully: Rather
informally it states that it is implicit to OO that object instances
represent entities from the problem space. Therefore object identity
is fundamentally associated with these entities. This can be
formalised by introducing the concept of an *interpretation*. This is
simply a mapping I from run time object instances to entities in the
problem space.
Clarification 1: objects abstract entities from /some/ problem space.
Complex applications typically abstract for multiple problem spaces,
including the computing space for OOD/P.
Clarification 2: It is not an interpretation. It is a rule that objects
must have unique identity. It is also a rule that identity, however
it is represented, must be unambiguously traceable: problem space entity
-> object abstraction -> run-time instance. However, that doesn't mean
that identity must be represented the same way for each; it just has to
be unique for each one and traceable between them.
What is not an interpretation? Note that an interpretation, formally
defined as a mathematical function, is the standard way to deal with
the relationship between model and what is modelled. I've seen it in
texts on mathematical logic, automated theorem proving, relational
modelling etc.
Clarification 3: What is a matter of interpretation is what constitutes
an identifiable problem space entity. Basically what that comes down to
is that a non-developer domain expert would agree that the entity is
known and identifiable in some way. But this is a methodological issue.
That is, reviewers of UML OOA/D models are very rarely confused about
what the model semantics are; their issues are about whether the authors'
view of the problem space is correct.
My clarification: An interpretation is assumed to be correct by the
(pure) computer scientist, and the problem domain expert must validate
the interpretation.
Now it is important that two run time object instances are not mapped
under the interpretation to the same entity in the problem space. In
other words, I must be 1-1. This can be written as
I(x) = I(y) => x = y
It is important to understand that the definition of 1-1 is tied to the
assumed domain of the function I. Making the domain smaller helps to
make an interpretation 1-1.
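
The 1-1 condition can be sketched as a check over a lookup table. This is only an illustration: it assumes the interpretation I is stored as a table from instance addresses to entity names, and the names `Interpretation`, `isOneToOne` and `demoOneToOne` are hypothetical, not taken from the thread.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the interpretation I: instance address -> entity name.
using Interpretation = std::map<const void*, std::string>;

// Returns true when no two distinct instances map to the same entity,
// i.e. I(x) = I(y) implies x = y over the domain stored in the table.
bool isOneToOne(const Interpretation& interp)
{
    std::set<std::string> seen;
    for (const auto& entry : interp) {
        // insert().second is false when the entity was already mapped.
        if (!seen.insert(entry.second).second) {
            return false;
        }
    }
    return true;
}

// Builds a two-instance domain; the flag decides whether both map to one entity.
bool demoOneToOne(bool shareEntity)
{
    int a = 0, b = 0;  // two distinct instances (assumed placeholders)
    Interpretation interp;
    interp[&a] = "Albert Einstein";
    interp[&b] = shareEntity ? "Albert Einstein" : "Kurt Godel";
    return isOneToOne(interp);
}
```

Dropping one of two clashing instances from the table restores the property, which is the sense in which shrinking the domain helps.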
The only way to allow a single process to model an external entity in
more than one way is to divide the total system into sub-systems, each
with its own independent interpretation mapping. Each such sub-system
is said to work at a single, self-consistent "level of abstraction".
I mentioned subsystems because it happens to be a convenient way to
introduce the notion of scope. That allows one to abstract the same
entity differently in different scope so that the multiple abstractions
can't cause confusion. One can do the same thing with layers. Though
rarely done for other reasons, one even can do the same thing within
object implementations. And in your code example you show yet another
means of defining scope below.
The key idea is just that only one abstraction for a given entity is
visible within a particular scope. So long as they are in different
scope, one can have as many abstractions for a given problem space
entity as one wants.
So just to be certain - are you basically agreeing with the point of
view expressed by the definition I gave in section 2, and which I
believe to be "misconceived"?
---- Section 3: Why is it a misconception?
This is demonstrated with the following code
#include <string>
using std::string;

class Employee
{
public:
    string GetName() const { return name; }
    void SetName(string newName) { name = newName; }
    float GetSalary() const { return salary; }
    void SetSalary(float newSalary) { salary = newSalary; }
private:
    string name;
    float salary;
};

void foo()
{
    Employee* e = new Employee;
    e->SetName("Albert Einstein");
    e->SetSalary(25000);
    e->SetName("Kurt Godel");
    e->SetSalary(29000);
    delete e;
}
This is a good example of the difference between object and instance.
What you have is two objects but only one instance.
Hmmm. I don't like that terminology at all. I think there is only one
object (pointed to by e). I don't distinguish the object from the
instance at all. I regard these as perfect synonyms (unless the class
instance is a value type, in which case there is no object at all).
The trick is that
only one object is instantiated at a time.
Your terminology is inconsistent, in the sense that you say there is
only one instance yet there have been two (object) "instantiations".
The Einstein object's
instance ceases to exist when the Godel object's instance is initialized
just as surely as if one had written:
void foo()
{
    Employee* e = new Employee;
    e->SetName("Albert Einstein");
    e->SetSalary(25000);
    delete e;
    Employee* e = new Employee;
    e->SetName("Kurt Godel");
    e->SetSalary(29000);
    delete e;
}
I've never read or heard anyone say that before!
BTW this won't compile because there are two declarations of variable e
in the same scope. In any case I know what you're saying.
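
For reference, one way to make that variant compile is to give each Employee its own block scope. This is a sketch, not the code either poster wrote: the `Employee` class is repeated so the block is self-contained, and `foo` returns the names it saw (an addition made only so the behaviour can be observed).

```cpp
#include <string>
#include <vector>

class Employee
{
public:
    std::string GetName() const { return name; }
    void SetName(std::string newName) { name = newName; }
    float GetSalary() const { return salary; }
    void SetSalary(float newSalary) { salary = newSalary; }
private:
    std::string name;
    float salary = 0.0f;
};

// Each inner block is its own scope, so the two declarations of e do not clash.
std::vector<std::string> foo()
{
    std::vector<std::string> names;  // assumption: recorded only for inspection
    {
        Employee* e = new Employee;
        e->SetName("Albert Einstein");
        e->SetSalary(25000);
        names.push_back(e->GetName());
        delete e;
    }
    {
        Employee* e = new Employee;
        e->SetName("Kurt Godel");
        e->SetSalary(29000);
        names.push_back(e->GetName());
        delete e;
    }
    return names;
}
```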
[In part the OOPL is confusing things by allowing the object to be
instantiated without proper initialization of identity. (One could not
do that in any of the abstract action languages used for OOA/D that I
know of.) In part the OOPL is confusing things by allowing an
optimization by reusing the memory for the object without the overhead
of heap operations. IOW, the OOPL designer is offering the developer an
opportunity for foot-shooting by washing his hands of referential
integrity issues and pushing them all on the developer. This is why I
argued that looking at OOPL code is not a good place to learn OO.]
IMO the only reason the OOPL is confusing things is because you want to
deal with object identity at the level of the entities in the problem
space. This problem disappears if you simply associate objects (and
their identity) with nothing other than the class instances that reside
in memory. In my mind this is a case of "less is more".
The identity of the Einstein object is "Albert Einstein" and the
identity of the Godel object is "Kurt Godel". The attributes of those
abstractions are defined to depend on that identity. The identity of
the Einstein instance is the address of 'e', as is the identity of the
Godel instance. That's fine so long as they do not both exist at the
same time in the same scope. What the language has done is to butcher
the notion of scope and make it the developer's responsibility. That
is, the developer is going to be solely responsible to ensure nobody
tries to access the Einstein instance outside the <artificial> scope of
pairs of initializers.
That is a particularly nasty problem in a language where most
relationships are instantiated with address pointers. Now when
instantiating relationships one has a potential nightmare for
referential integrity. But those problems largely evaporate if one
writes the code as I did and creates a separate heap instance with a
unique address for each object. That's because scope is terminated
explicitly at the delete rather than implicitly at some other object's
initializer.
You are logically self-consistent - there is clearly no way I could
prove you wrong. Similarly my definition of object identity (which I
will discuss more formally below) is also logically self-consistent.
They represent two alternative "formalisms" for the semantics of OO.
However, I claim that my definition is simpler, easier to understand
and more natural.
This code is not intended to be an illustration of good design. That
is not its purpose!
It is generally assumed that when an object is created it has identity,
and it keeps that identity until it is destroyed. It is *not*
allowable for an object to change its identity over time. It can only
change its attributes. This allows pointer comparisons to be used for
object identity tests.
The objects' identities -- Einstein and Godel -- don't change in your
example. And an instance of both objects has a unique identity at any
given time by virtue of the fact that they do not coexist. The OOPL
allows the mapping of object identity to instance identity to be
ambiguous for purposes of relationship instantiation, but that is really
a deficiency in the OOPL to allow the instance identity (address) to be
reused for different objects anyway. Even then, technically the
language defines what happens when the instance is re-initialized so the
referential integrity problem (resolving the relationship ambiguity in
all cases) is just moved to the developer's shoulders rather than being
enforced in the language.
That is clearly in conflict with the "misconception" that was defined
in section 2. I am of course assuming that the interpretation map
makes use of the 'name' member of an Employee instance. Under this
interpretation, the definition of object identity changes from one call
to the next in function foo() above.
This poor definition of object identity makes it impossible for an
object to provide a Clone() method, such as
Employee* Employee::Clone() const
{
    Employee* e = new Employee;
    e->name = name;
    e->salary = salary;
    return e;
}
The reason is that under the interpretation the clone would map to the
same entity (in the problem space), in conflict with the assumption
that the interpretation is 1-1.
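
The conflict can be made concrete with a small sketch. It assumes, as stated above, that the interpretation reads only the 'name' member; the `interpret` and `cloneViolatesOneToOne` functions are hypothetical helpers added for illustration.

```cpp
#include <string>

class Employee
{
public:
    std::string GetName() const { return name; }
    void SetName(std::string newName) { name = newName; }
    Employee* Clone() const
    {
        Employee* e = new Employee;
        e->name = name;
        e->salary = salary;
        return e;
    }
private:
    std::string name;
    float salary = 0.0f;
};

// Hypothetical name-based interpretation: instance -> entity name.
std::string interpret(const Employee* e) { return e->GetName(); }

// True when the clone is a distinct instance that maps to the same entity,
// which is the case I(x) = I(y) with x != y.
bool cloneViolatesOneToOne()
{
    Employee* original = new Employee;
    original->SetName("Albert Einstein");
    Employee* copy = original->Clone();
    bool violated = (original != copy)
                 && (interpret(original) == interpret(copy));
    delete copy;
    delete original;
    return violated;
}
```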
This is a different problem. A true clone function creates different
instances with unique identity (address) but the instances all map to
the same object abstraction Employee{name,...}. This creates an even
nastier set of problems for referential integrity for relationships when
one changes the salary. The solution here is to break the thumbs of any
developer who does something like this.
So you say the Clone() function is valid, but a developer who calls it
would be wise to first get insurance on his/her thumbs? :)
Ok, as promised above I will now state how I believe object identity
works. This is a mixture of formal and informal (because I'm too
lazy). I think this is the prevailing view amongst the majority of OO
programmers...
It really is just a set of definitions...
Definition: An *entity* is a thing that can be modelled by a computer.
An entity generally exists independently of the computer that models
it. An example of an entity is a human. Another example is the
integer 2189.
Definition: A *problem space* is a (mathematical) set of entities that
are relevant to solving a given problem using a computer. Entities in
a single problem space are allowed to form has-a relationships. For
example, Albert Einstein is an entity, and Albert Einstein's left
eyeball is an entity as well. Entities in a single problem space are
allowed to be at different "levels of abstraction". Basically there
are no restrictions!
Definition: An *entity-type* is a set of entities used for
classification purposes. For example, the set of all humans, or the
set of integers Z. It is permissible for an entity-type to be
regarded as an entity.
Definition: A *class* is associated with a class definition written in
some OOPL that defines both state and methods. For our purposes we
are only interested in concrete classes.
Definition: A *process* is associated with a particular execution of
the program on a given computer. A process has its own address space.
Definition: A *class-instance* is a particular instantiation of a
(concrete) class in a particular process at a particular location in
memory. For our purposes we don't care whether the instance is
associated with a global variable, a frame variable or was allocated on
the heap.
Definition: A *concrete-type* is either a concrete class or a simple
type like int, float.
Definition: A *value-type* is a concrete type, used for program
variables, that represents an entity in the problem space. This
involves some given interpretation (i.e. map) from its state (i.e. bytes
in memory) to entity. When a value-type variable is copied both the
original value and the new value will represent the same entity.
Examples of value-types are int, float, as well as certain classes like
std::string (the STL string for C++). It is generally wrong to take
the address of a value-type variable for the purposes of pointer
comparisons.
Definition: At any given point in time a value-type variable is said
to contain a particular *value*. A value is associated with the entity
that the variable represents (under the interpretation as a
value-type).
Definition: An *object* is a class-instance that is regarded as having
identity tied to that class-instance. For the purposes of identity,
the object does *not* represent an entity under some interpretation.
Although it certainly may (in the mind of the programmer) that is
entirely irrelevant to the semantics of object identity.
Definition: An *object-type* is a concrete class whose instances are
regarded as objects, not values.
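
The distinction between value-types and object-types drawn in these definitions can be sketched as follows. This is a minimal illustration under my own assumptions: the stripped-down `Employee` struct and the helpers `sameValue`, `sameObject` and `demoValueVersusObject` are hypothetical names.

```cpp
#include <string>

struct Employee { std::string name; };  // treated here as an object-type

// Values are compared by content; where they live in memory is irrelevant.
bool sameValue(const std::string& a, const std::string& b) { return a == b; }

// Objects are compared by address; their content is irrelevant.
bool sameObject(const Employee* a, const Employee* b) { return a == b; }

bool demoValueVersusObject()
{
    std::string s1 = "Albert Einstein";
    std::string s2 = s1;              // the copy represents the same entity
    Employee e1{"Albert Einstein"};
    Employee e2 = e1;                 // the copy is a different object

    return sameValue(s1, s2)          // equal values despite distinct addresses
        && !sameObject(&e1, &e2)      // equal state, yet distinct objects
        && sameObject(&e1, &e1);      // an object is identical only to itself
}
```

Under this reading, comparing `&s1` with `&s2` would answer a question nobody meant to ask, while comparing `&e1` with `&e2` is exactly the object identity test.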
Some advantages of my approach (compared to yours) are:-
1. Pointers to objects can reliably be compared to test object
identity
2. Objects are just instances of classes.
3. When an object is created it has identity, and it keeps that
identity until it is destroyed.
4. It actually encompasses your approach; paradoxically your
definition of object identity corresponds to classes that are treated
as value types.
5. Classes are allowed to have Clone() methods (without being
suspicious)
6. The "confusion caused by the OOPL" is no longer a factor
7. If your approach is formalised to the same degree as mine, above,
it will be found to be more complicated, difficult to explain and
difficult to understand [not proven].
Point 4 makes me wonder whether there is something stronger I can say
about problems with your approach. It seems that you don't treat
object-types any differently from value-types for the purposes of
object identity. That just doesn't make sense.
Cheers,
David Barrett-Lennard