Re: Object identity




Frans Bouma wrote:
David Barrett-Lennard wrote:

----- Section 1: Introduction

This post concerns the semantics of object identity. IMO there is a
fairly wide-spread misconception about the purpose and meaning of
object identity amongst the OO community.

No, it's rather clear actually.

I have no idea what
proportion of OO developers have this misconception, and the extent to
which they are aware of the (invalid) semantics that they are
attributing to object identity in some of their designs.

I think we're heading to the same conclusion as in your previous
thread: you mix semantical interpretation of live data with definitions
of the type of the data. But let's argue over that further below ->

In the previous thread we rarely headed to the same conclusion! :)


Most importantly, I believe I have a new way of stating my claims that
are very difficult to counter.

Is this a competition of some sort? I always wonder why some people
waste so much time over proving something which is irrelevant to
everyone else. Isn't it so that arguing about deep theoretical details
which have no meaning in practice is perhaps fun, but not something
people who want to solve real-life problems have to worry about?

I.o.w.: it should be an open discussion, not a competition over who
can formulate the statement that can't be countered.

By difficult to counter I meant that my argument has simplicity and
clarity. That is surely better than the alternative.


Ultimately they clarify more strongly
the problems with the semantic lie associated with an object that
pretends to be something else, in the sense of object identity. This
IMO has previously only been countered with meta-physical arguments
that lack a scientific basis.

haha :) No, I think you're wasting time here. Let me put forward a
simple C# example:

bool a = (foo1 == foo2);
bool b = foo1.Equals(foo2);

Now, if a is true, b has to be true as well, as that means that foo1
and foo2 point to the same object. If a is false, b can still be true.
In that case, foo1 and foo2 contain *semantically* the same values,
though in different objects. (Let's assume neither is null.)
'Equals' is a virtual method on the root type object meant for testing
object equality on data.

What does this small example tell us? Simply that there are 2
different meanings to 'the objects are equal'. If you mean: are they
the same instance, (which thus implies the same data, but it's not
about the data, it's about the instance) then you need a different
operator than when you mean: are they representing the same thing. Then
you need a data comparison, and not an object comparison.
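The same distinction exists in C++, the language of the later examples in this thread. A minimal sketch, assuming a hypothetical `Customer` struct: comparing pointers asks "same instance?", while comparing fields asks "same data?". The names `Customer`, `sameObject` and `sameValue` are illustrative only and stand in for C#'s reference `==` and an overridden `Equals`.

```cpp
#include <cassert>
#include <string>

// Hypothetical value type used only for this illustration.
struct Customer {
    std::string name;
    std::string city;
};

// Value comparison: do the two objects hold the same data?
// (Plays the role an overridden Equals plays in the C# snippet.)
bool sameValue(const Customer& x, const Customer& y) {
    return x.name == y.name && x.city == y.city;
}

// Identity comparison: do the two pointers refer to the same object?
// (Plays the role of == on references in the C# snippet.)
bool sameObject(const Customer* x, const Customer* y) {
    return x == y;
}
```

Whenever `sameObject` holds, `sameValue` holds too, which is the a => b relationship from the C# snippet; the reverse does not follow.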

I understand you. I agree that a => b.

I call this identity comparison versus value comparison. This is
indeed very relevant to the thread.


But let's read on, we're not done yet :)


In what follows I need to talk about the domain of mathematical
functions (or "maps"). Therefore to avoid confusion I avoid the term
"problem domain" and instead use "problem space" to mean the same
thing.

I fail to see why it has to be formulated so formally.

In my experience it pays to be as precise as possible. Simple
misunderstandings in these discussion groups occur more often than not.


----- Section 2: Definition of the misconception
Firstly I will characterise the misconception more carefully: Rather
informally it states that it is implicit to OO that object instances
represent entities from the problem space.

No! I already told you earlier there's a difference between an entity
instance and an instance of an entity definition's physical
representation. The _data_ is the entity. The container (object, table
row, view row, resultset row etc. ) is NOT.

You speak a different tongue to me :).

Do you realise that section 2 is all about (carefully) defining a point
of view that I disagree with?

The first thing that confuses me is that you say "No!". What does
"No!" to a definition mean? Are you saying that you agree with the
misconception, or are you saying that the misconception is wrong, which
ultimately means that you agree with me?

When you say "entity instance" do you just mean "entity"? What is an
"entity definition"? What is an "entity definition's physical
representation"? What data are you talking about that "is the
entity"? What do you mean when you say an object is a container?

I honestly haven't a clue what you're saying here.

Object instances represent containers for entities. Please David, this
is the root issue of your confusion.

You're right I'm completely confused! I have no idea what that means.

Because the object instances
aren't the entities themselves but just containers, there is THUS a
problem with identity, but only on the level of ENTITIES, not on the
level of OBJECTS. See my previous example. Bool a contains the test on
identity of OBJECTS, b on ENTITIES.

Ah, now I think I understand what you're saying, and I basically agree,
although calling an object instance a "container" for the entity from
the problem space that it represents under an interpretation seems at
best unconventional.

Yes, you are agreeing with me that the point of view defined in section
2 really is a misconception about the nature of object identity. You
evidently don't realise that you are agreeing with me because you claim
that I am confused.

This is rather curious because you seemed to be disagreeing with me
about the nature of object identity in the thread called "OO versus
RDB". I thought you were agreeing with Lahman who stated that 1)
objects *always* represent abstractions (of entities from the problem
space), and therefore 2) it is semantically valid for object identity
to be associated with entities.


Identity is a concept which is important in consuming data in an OO
system. It is, however, also a root cause of a lot of confusion, exactly
like what you expressed in your posting.

Therefore object identity
is fundamentally associated with these entities. This can be
formalised by introducing the concept of an interpretation. This is
simply a mapping I from run time object instances to entities in the
problem space.

Pardon me for being practical, but is this all necessary to simply
describe the concept of information vs. data?

Now it is important that two run time object instances are not mapped
under the interpretation to the same entity in the problem space. In
other words, the mapping I must be 1-1 (injective). This can be written as

I(x) = I(y) => x = y
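To make the condition concrete, here is a small sketch that checks I(x) = I(y) => x = y over a finite set of objects. It assumes, as section 3 later does, a hypothetical interpretation that identifies an employee by name, and it assumes the input holds distinct objects; `interpret` and `isOneToOne` are illustrative names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical run-time object: only the field the interpretation reads.
struct Employee {
    std::string name;
};

// Hypothetical interpretation I: maps an object to the problem-space entity
// it is taken to represent, here identified by the person's name.
std::string interpret(const Employee& e) { return e.name; }

// Checks I(x) = I(y) => x = y, assuming `objects` holds distinct objects:
// no two of them may map to the same entity.
bool isOneToOne(const std::vector<const Employee*>& objects) {
    std::set<std::string> seen;
    for (const Employee* obj : objects) {
        if (!seen.insert(interpret(*obj)).second) {
            return false;  // a second, distinct object maps to an entity already seen
        }
    }
    return true;
}
```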

Nice in theory, but in practice it's something you don't want to work
with. Imagine your server farm running a large n-tier web application
having to work with a single customer object. The inter-process
communication alone will burn it down to the ground.

Remember that *everything* in section 2 is about defining a point of
view that I disagree with!


I always use the term 'in-memory copy'. You have a central persistent
storage, say a database, and in there you have entity instances (==
data) of entity definitions, like customer, order etc. To work with the
entities in-memory, you create in-memory copies of them (you fetch the
data, which effectively makes a copy of the entity) and store these
copies in in-memory containers, i.e. the objects.
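A sketch of the "in-memory copy" idea, assuming a `std::map` as a stand-in for the central persistent store and a hypothetical `CustomerRow` type: each fetch copies the data into a new object, so two fetches of the same entity yield two independent containers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical row of the "customer" entity definition in persistent storage.
struct CustomerRow {
    std::string name;
    double creditLimit;
};

// Stand-in for the central persistent store: primary key -> row.
using Store = std::map<int, CustomerRow>;

// Fetching copies the row's data into a new in-memory container.
// Later changes to the returned object do not touch the store.
CustomerRow fetch(const Store& store, int id) {
    return store.at(id);
}
```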

It is fairly conventional to reserve the word "entity" to *only*
represent things (like real humans) from the problem space. Entities
generally exist independently of the computer and its data model.

You evidently have a very liberal use of the word "entity", allowing it
to mean different things at different times. Now that can be useful -
sometimes a word with an overloaded meaning is just what you want.
Nevertheless, I would prefer it if you could limit your usage to the
convention I've seen used by a number of authors, only because it works
very well in practice and it will greatly help me understand what
you're saying.


It's perfectly fine to work with multiple in-memory copies of the same
entity.

I interpret this as saying it is fine for an interpretation to not be
1-1. Agreed.

BTW, this was one of the main points I made in my first post from the
previous thread. I stated that it was semantically valid for a single
human to be simultaneously modelled by multiple, distinct run time
object instances - such as a SalariedEmployeeModel object as well as a
SalesEmployeeModel object.
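A sketch of that point, assuming hypothetical fields (only the two class names come from the thread): two distinct objects model one human at the same time and agree on which person they describe via a shared `personId`.

```cpp
#include <cassert>
#include <string>

// Hypothetical fields; only the class names appear in the thread.
struct SalariedEmployeeModel {
    std::string personId;  // which human this model is about
    double annualSalary;
};

struct SalesEmployeeModel {
    std::string personId;  // same kind of key, so the two models can be related
    double commissionRate;
};

// Distinct objects, one entity: both models may describe the same human
// simultaneously, each from its own point of view.
bool describeSamePerson(const SalariedEmployeeModel& s, const SalesEmployeeModel& m) {
    return s.personId == m.personId;
}
```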

The curious thing is that many OO designs go to a lot of trouble to
make sure that only one object is associated with each entity from the
problem space.

After all, they're all *stale*: not the real entities, but copies.
This means that if you have 2 processes, A and B, and both update an
entity E in memory (which thus means they both update their in-memory
copy) and then both persist their copy of E to the persistent storage,
you have to have concurrency rules in place to make sure A and B won't
overwrite each other's work. (Or better: make sure you won't have 2
processes doing double work; scheduling work is more efficient.)
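The post does not say which concurrency rules to use. One common choice, shown here purely as an illustration, is optimistic concurrency with a version counter: a save succeeds only if the stored row still has the version the in-memory copy was fetched with. `Row`, `Store` and `save` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical persisted row carrying a version number.
struct Row {
    std::string name;
    int version;
};

using Store = std::map<int, Row>;

// Optimistic concurrency: the save succeeds only if the row has not changed
// since this in-memory copy was fetched (same version). Otherwise the caller
// must re-fetch and retry, so one process cannot silently overwrite another.
bool save(Store& store, int id, const Row& copy) {
    Row& current = store.at(id);
    if (current.version != copy.version) {
        return false;  // someone else saved first; this copy is stale
    }
    current.name = copy.name;
    current.version = copy.version + 1;
    return true;
}
```

If A and B both fetch version 0 and A saves first, the stored version becomes 1 and B's save is rejected, so B must re-fetch instead of silently overwriting A's work.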

The only way to allow a single process to model an external entity in
more than one way is to divide the total system into sub-systems, each
with its own independent interpretation mapping. Each such
sub-system is said to work at a single, self-consistent "level of
abstraction".

This is complexity which is unnecessary, simply because whatever you
cook up, having data in process A and process B where B is the central
store for the data means that A holds a stale copy of the data
contained in B. So whether there are many A's or just one, it doesn't
matter.

----- Section 3: Why is it a misconception?

This is demonstrated with the following code

#include <string>

using std::string;

// Deliberately minimal: plain getters and setters, no invariants.
class Employee
{
public:
    string GetName() const { return name; }
    void SetName(string newName) { name = newName; }

    float GetSalary() const { return salary; }
    void SetSalary(float newSalary) { salary = newSalary; }

private:
    string name;
    float salary;
};

void foo()
{
    Employee* e = new Employee;

    // Same object throughout; only its attributes change.
    e->SetName("Albert Einstein");
    e->SetSalary(25000);

    e->SetName("Kurt Godel");
    e->SetSalary(29000);

    delete e;
}

This code is not intended to be an illustration of good design. That
is not its purpose!

It is generally assumed that when an object is created it has
identity, and it keeps that identity until it is destroyed.

The object, not its contents.

Huh? You speak a strange tongue.

It is
not allowable for an object to change its identity over time. It can
only change its attributes. This allows pointer comparisons to be
used for object identity tests.

On the object, not its contents.

That is clearly in conflict with the "misconception" that was defined
in section 2.

No, I don't think it is.

What? I thought we agreed that the definition in section 2 was
problematic?

I am of course assuming that the interpretation map
makes use of the 'name' member of an Employee instance. Under this
interpretation, the definition of object identity changes from one
call to the next in function foo() above.
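The two notions of identity can be observed side by side. A sketch using a trimmed copy of the `Employee` class (salary omitted) and a hypothetical `observe` helper: across the two `SetName` calls in foo() the object's address stays the same, while the entity a name-based interpretation assigns to it changes.

```cpp
#include <cassert>
#include <string>

// Trimmed copy of the Employee class above (salary omitted).
class Employee {
public:
    std::string GetName() const { return name; }
    void SetName(std::string newName) { name = newName; }
private:
    std::string name;
};

// Hypothetical snapshot: the object's address (object identity) and the
// entity that a name-based interpretation maps the object to at that moment.
struct Observation {
    const Employee* address;
    std::string interpretedAs;
};

Observation observe(const Employee& e) { return {&e, e.GetName()}; }
```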

And here we are: the perfect proof that you mix two things. The object
itself != the data it contains, once you interpret the semantic aspects
of the data, such as treating 'Name' as the uniquely identifying
attribute of the data contained by the object.

No this is the perfect proof that you didn't read my post carefully
enough. We agree, but you don't realise it. Section 3 is all about
illustrating the problems with the misconception. I say that the
definition of object identity according to the misconception implies
that the identity will change from one call to another in foo().

This poor definition of object identity makes it impossible for an
object to provide a Clone() method, such as

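// Requires the declaration `Employee* Clone() const;` inside class Employee.
Employee* Employee::Clone() const
{
    Employee* e = new Employee;  // a new, distinct object...
    e->name = name;              // ...with the same state
    e->salary = salary;
    return e;
}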

I don't think it does. You just copy an in-memory copy to another
container. Which is the same action as copying the persistent store
real instance from its container (row in table) into memory in the
initial object container.

Yes, again we agree, but you don't realise it.

Did you even question why I would spoon-feed you an example that would
show you I was wrong?


The reason is that under the interpretation the clone would map to the
same entity (in the problem space), in conflict with the assumption
that the interpretation is 1-1.

The interpretation is 1:1, but the containers aren't. In my small
foo1==foo2 example above, I mentioned the method Equals. Say you
override that method in Employee to compare two instances based on the
Name value and return true if the names are equal and false if they're
not.
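In C++ terms, that override might look like the following sketch. The constructor and the `Equals` member are assumptions added for illustration (the `Employee` class above has neither); `Equals` deliberately ignores salary and object identity.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Employee {
public:
    Employee(std::string n, float s) : name(std::move(n)), salary(s) {}
    std::string GetName() const { return name; }
    float GetSalary() const { return salary; }

    // C++ analogue of overriding Equals: two Employee objects are taken to
    // represent the same entity when their names match, regardless of salary
    // and regardless of whether they are the same object.
    bool Equals(const Employee& other) const { return name == other.name; }

private:
    std::string name;
    float salary;
};
```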

No, the interpretation is *not* 1:1. After cloning an Employee object,
we have two distinct objects with the same state (and therefore map
under the interpretation to the same entity), and therefore the
interpretation is not 1-1.
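A sketch of that argument, using a version of `Employee` with public members so that `Clone()` can be defined inline (an assumption for brevity; the logic is the same as the `Clone()` above): the clone is a different object, yet a name-based interpretation maps it to the same person.

```cpp
#include <cassert>
#include <string>

// Public members only to keep the sketch short.
class Employee {
public:
    std::string name;
    float salary;

    // Same logic as the Clone() above: a new object with identical state.
    Employee* Clone() const {
        Employee* e = new Employee;
        e->name = name;
        e->salary = salary;
        return e;
    }
};
```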

Your definition of Equals is indeed appropriate for testing whether two
distinct objects represent the same human.

You clearly agree with my post.


At that moment, you have solved the problem: you can then find another
in-memory copy of the same entity in a collection you receive from some
process without a problem.

You should read about the Identity Map pattern:
http://www.martinfowler.com/eaaCatalog/identityMap.html
It's a pattern for solving identity mismatches in memory, if you want to.
The 'if you want to' is mine, as I find that Fowler states it too
strongly: an identity map is sometimes needed, but in a lot of cases you
don't need it, as it can limit the scalability of your application. It
does, though, illustrate the difference between an object instance and
the in-memory copy (which is also an instance) of an entity, i.e. the
data inside the object instance.
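A minimal sketch of an identity map, assuming a hypothetical `Customer` type and a stand-in `loadFromStore` function in place of a real database fetch; it is not Fowler's code, just the core idea that repeated lookups of the same key return the same object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Customer {
    int id;
    std::string name;
};

// Hypothetical loader standing in for a database fetch.
std::shared_ptr<Customer> loadFromStore(int id) {
    return std::make_shared<Customer>(Customer{id, "customer " + std::to_string(id)});
}

// Minimal identity map: at most one in-memory object per key, so repeated
// lookups of the same entity hand back the same object.
class IdentityMap {
public:
    std::shared_ptr<Customer> get(int id) {
        auto it = loaded.find(id);
        if (it != loaded.end()) {
            return it->second;         // already loaded: reuse the object
        }
        auto obj = loadFromStore(id);  // first request: fetch and remember
        loaded.emplace(id, obj);
        return obj;
    }

private:
    std::map<int, std::shared_ptr<Customer>> loaded;
};
```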

I believe more in context-specific identity maps. This means that within
a semantic context (== a small domain in your program, like the logic
controlling one or two screens) it can be useful to have one object
instance for every entity instance. You then use an object ('the
context') to track whether an entity is already held in some object
instance within that context, and if so, let the context object provide
it for you.
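A sketch of a context-scoped identity map under the same kind of assumptions (hypothetical `Customer` and `Context` types, with an in-line stand-in for the database fetch): within one context a key always yields the same object, while two contexts each hold their own in-memory copy.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Customer {
    int id;
    std::string name;
};

// Hypothetical "context": an identity map whose scope is one semantic context
// (e.g. the logic behind one or two screens) rather than the whole process.
class Context {
public:
    std::shared_ptr<Customer> get(int id) {
        auto& slot = instances[id];  // empty shared_ptr on first request
        if (!slot) {
            // Stand-in for fetching the entity's data from persistent storage.
            slot = std::make_shared<Customer>(Customer{id, "customer " + std::to_string(id)});
        }
        return slot;
    }

private:
    std::map<int, std::shared_ptr<Customer>> instances;
};
```

Because each context's map lives only as long as the context, the one-object-per-entity guarantee stays local, which avoids the process-wide scalability cost mentioned above.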

I hope to post on my blog later this weekend the essay I wrote for
Jimmy Nilsson's latest book, Applying Domain-Driven Design and Patterns,
which describes the differences between entity instance and object
instance in more detail.

Cheers,
David Barrett-Lennard



