Re: Object identity
- From: "David Barrett-Lennard" <davidbl@xxxxxxxxxxxx>
- Date: 29 Jun 2006 21:59:44 -0700
H. S. Lahman wrote:
Responding to Barrett-Lennard...
This is a good example of the difference between object and instance.
What you have is two objects but only one instance.
Hmmm. I don't like that terminology at all. I think there is only one
object (pointed to by e). I don't distinguish the object from the
instance at all. I regard these as perfect synonyms (unless the class
instance is a value type, in which case there is no object at all).
Au contraire. The Einstein and Godel objects are quite explicitly
defined and one can inspect those definitions whether the code is
compiled and executed or not. That a member of the Employee set exists
with identity of "Albert Einstein" and salary of 25000 is quite clear.
And it is equally clear a different object of the Employee set exists
with the identity "Kurt Godel" and salary of 29000.
The humans Einstein and Godel are entities, *not* objects. I think it
is entirely non-standard of you to call them objects (in the context of
a discussion about OO, and in particular object identity).
Yes, the humans that the Einstein and Godel objects abstract are problem
space entities. But objects are abstractions that represent them with
properties like Name and Salary. Those objects are defined in your code
example (and the class definition for Employee).
Problem space:
    Conceptual entity: Employee
    Concrete entity: Einstein
    Concrete entity: Godel
OO software space:
    Class: Employee {Name, Salary}
    Object: Einstein {Name = Einstein; Salary = 25000}
    Object: Godel {Name = Godel; Salary = 29000}
And those objects exist conceptually independently from the instance, 'e'.
That is value semantics, not object semantics. The moment you decouple
the content of an instance from its location in memory, you are using
value semantics. A value-type can be copied around as much as you
like, and it always represents the same underlying entity.
I don't follow this argument. Objects aren't values; at best they are
multi-valued because they usually have multiple properties that have
values.
Value types can have multiple fields as well. Consider a C++ Date
class that stores year, month and day fields, and this is used as a
value type throughout the program. It tends to be passed by value
into methods. It tends to be stored by value in other classes. No
code is interested in comparing pointers to Date instances. This is
because it is a value-type, not an object-type.
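To make the Date point concrete, here is a minimal sketch of such a value type. The field names, the comparison operators and the firstOfMonth() helper are assumptions for illustration only; the thread does not define them. Equality and ordering look only at the stored fields, and nothing compares addresses.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical Date value type: it has no identity beyond its fields.
struct Date {
    int year;
    int month;
    int day;
};

// Equality compares content, never addresses.
inline bool operator==(const Date& a, const Date& b) {
    return std::tie(a.year, a.month, a.day) == std::tie(b.year, b.month, b.day);
}

inline bool operator!=(const Date& a, const Date& b) { return !(a == b); }

// Ordering is likewise defined purely on the stored fields.
inline bool operator<(const Date& a, const Date& b) {
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

// Hypothetical helper: takes and returns a Date by value, like int or float.
inline Date firstOfMonth(Date d) {
    d.day = 1;  // modifies only the local copy
    return d;
}
```

A copy of a Date compares equal to the original even though the two variables live at different addresses, and passing one to firstOfMonth() leaves the caller's value untouched.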
And your use of "value-type" seems more like the 3GL
implementation of a class as a type than an object implementation.
Unlike Java or Smalltalk, C++ lets you create your own value types,
such as a Date class. Instances of a Date class are used like int and
float. There are no objects directly associated with instances of
Date. There are only variables that store date values. A date value
is mapped under an interpretation to the "real" notion of date that
exists independently of the computer.
Last time I looked I could create a Date class, overload operators to
manipulate it, and instantiate it in almost all OOPLs. The notion of
"date" is a well-defined problem space concept so it can be abstracted
as an object.
We agree that it can be treated in both ways. But as an object-type,
according to your rule, there must be a 1-1 mapping between instances
and entities (in some scope). That forces a design to only allow at
most one instance to represent a given date value. That is rather
difficult to enforce in practice. That would be like ensuring there
is only one representation of the integer 0 in a given scope. In a
pure OO language like Smalltalk you are going to have some pain!
Would you be happy to simply compare pointers to date objects and
assume different pointer values imply different date values? If a user
enters a date, how are you going to ensure there isn't another Date
object in the model with the same value, breaking your rule? What
horrible impact is this going to have on the schema?
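One way to see the cost of that rule is to sketch what enforcing it would take. The DateObject and DateRegistry names below are hypothetical, and the registry is only one possible approach (interning), not something proposed in the thread. Every Date has to be obtained through the registry, so that pointer equality coincides with value equality within that registry's scope.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical Date treated as an object type: identity is the address.
class DateObject {
public:
    int year() const { return year_; }
    int month() const { return month_; }
    int day() const { return day_; }

private:
    friend class DateRegistry;  // only the registry may construct instances
    DateObject(int y, int m, int d) : year_(y), month_(m), day_(d) {}
    int year_;
    int month_;
    int day_;
};

// Hypothetical registry guaranteeing at most one instance per date value
// within its own scope.
class DateRegistry {
public:
    const DateObject* get(int y, int m, int d) {
        auto key = std::make_tuple(y, m, d);
        auto it = pool_.find(key);
        if (it == pool_.end()) {
            // First request for this value: create the single instance.
            it = pool_.emplace(key, std::unique_ptr<DateObject>(new DateObject(y, m, d))).first;
        }
        return it->second.get();
    }

    std::size_t size() const { return pool_.size(); }

private:
    std::map<std::tuple<int, int, int>, std::unique_ptr<DateObject>> pool_;
};
```

Under these assumptions a date entered by a user must be looked up before it can be used, and any code that constructs a date outside the registry silently breaks the guarantee.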
What you seem to be describing here are attribute ADTs, which are fully
supported in OOA/D and partially in most OOPLs. But attribute ADTs
aren't first class objects (though a particular language compiler may
implement them that way) -- they describe specialized data structures.
Your definition of "object" has nothing to do with (pure) OO.
You can keep saying this, but it doesn't make it true. Challenge: Cite
an established OOA/D author who defines an object as anything other than
an abstraction of an identifiable problem space entity and who does not
make a distinction between that abstraction and its instantiation at run
time as a memory image.
I've already said that OO texts are too informal.
For starters, from the Dictionary of Object Technology...
Instance n. 1 (a) Any object instantiated according to the definition
provided by the given class.
Object n. 1 (a) Any abstraction that models a single thing.
Yes, I think that's silly.
Remember that I claim that the confusion about the semantics of object
identity is rather widespread. People who use OO to model entities
come out with quite different views than people who use OO to develop
complex state machines.
Let's use some consistent terminology and *always* use the word entity
for particular things that exist independently of the computer. Or do
you somehow distinguish between entity and object, even before you
compile and execute the code? If you do please provide a sufficiently
formal definition so I can understand what you mean.
When have I used 'entity' to mean anything except things that exist
independently of the computer? When have I not fully qualified them as
'problem space entity'? In the quoted text I am specifically talking
about /objects/, which are design abstractions that /represent/ problem
space entities. The terminology I am using is standard OO terminology.
You use the standard words but not the standard meaning.
See above.
Objects obviously exist outside the execution context because they are
models that the developer provides during design. (In a model-driven
development that will be long before the 3GL code is even written, much
less executed.) That's why I pointed out a couple of messages ago that
there is a difference between 'object' and 'instance'. There are three
levels here:
entity -> object -> instance
which can be paraphrased as:
reality -> abstraction -> execution instantiation
There are only two levels for me : Entity and object. Object is a
synonym for instance.
And that view is not consistent with the OO view, as demonstrated above.
As a practical matter one needs to define instances separately simply
because computer programs take finite time to execute and the
computation model is fundamentally serial. So one must be able to
distinguish life cycles of instantiation (i.e., times when the object is
available and times when it is not). That is important to things like
heap reuse. It can also affect relationships (e.g., dangling pointers
when the instance is deleted). IOW, one can't have a rigorous execution
model unless one can characterize when objects are instantiated and when
they are not.
I prove an implementation correct by showing that the run time system
is isomorphic to a hypothetical system that is simpler and easier to
reason about. That is far more rigorous than the above.
I have no problem with an object (ie an instance of a class in memory)
that can have fields changed, making it suddenly represent a different
entity under some interpretation. This is allowed because unlike RM,
OO doesn't itself come ready made with semantics that relates back to
an interpretation. I'm somewhat with Bob Badour when he likens OO to a
methodology for "constructing large unpredictable state machines out of
small predictable state machines". Now I wouldn't go quite that far,
but it does seem clear that the onus is on the programmer to formally
prove that an OO program will solve the problem at hand. Using OO can
be as dangerous as assembly. You have complete, unrestricted access to
a Turing machine. You can express logical, correct solutions as well
as incorrect ones that sound right but require careful analysis to
reveal subtle errors. This promiscuity allows the OO developer to
create wholly new algorithms and techniques. But that power and
generality comes at the price of only low level implicit semantics.
The onus is on someone to prove that /any/ software works correctly. As
it happens, OO software is unique because one can demonstrate
correctness at the design level since well-formed OOA models are
executable in themselves.
You're just talking about design by contract, which is always needed
for imperative programs. Any (well written) C program can also be
proven true at the design level.
No, I am talking about the OOA model being executable and, therefore,
testable.
I imagine you think that a class diagram always captures most of the
design. I use OO for systems programming, and in that domain a class
diagram is not so useful. The problems I solve are mostly algorithmic
in nature. For example, I'm currently working on a sub-system to
partition objects into spaces that are independently garbage collected,
and support shared read and exclusive lock modes. I'm definitely using
OO, but the class diagram is simple and not very interesting. Most of
the detail is in the locking and garbage collection algorithms. I find
C++ to be a reasonable (but far from ideal) way to directly express the
algorithms. Separate documentation is required to formally prove
correctness of the algorithms.
I'm afraid your assumption is incorrect (first sentence). A Class Model
only defines static structure. Problems cannot be solved and models
cannot be executable without a dynamic description.
Good, we agree.
Whether OO is even appropriate for highly algorithmic problems depends
on how much of the algorithm is defined outside the specific problem
context and how much is unique to the problem in hand. For example, if
I need to solve a problem with linear programming, I am not going to
write my own RDS code using OO. I am going to use a commercial package
to do the grunt work and all I will use OO for is setting up the
constraint matrices, providing a basic feasible solution, and I/O. If a
commercial package wasn't available so I was forced to write it myself,
I would probably use C or an FPL rather than OOA/D/P.
OTOH, if one is doing OOA/D, then one is better off with an AAL rather
than an implementation language like C++ to describe what methods do.
The problem with implementation languages is that they are at a much
lower level of abstraction so it is much easier to pollute the design
with low level detail.
Another advantage is that one's access to the hardware computational
models is actually severely restricted in OOA/D. For example, one must
resolve functional requirements in an OOA model without /any/ pollution
from the computing space or else the reviewers will get out the
crucifixes and garlic cloves. (As I think I already pointed out in this
thread, one should be able to implement an OOA model unambiguously in
the customer space as a manual system and one can't do that if the
solution is explicitly twiddling the hardware.) That allows one to
separate the concerns of the computing space from those of the problem
space and focus on each individually better than one can do in any other
approach to software development that I know of.
I don't know what you mean. Eg, what is a "hardware computation
model"?
Turing's, von Neumann's, etc. Though more abstract than Assembly, 3GLs
are still very closely married to hardware conventions, such as call
stacks. For example, none of the popular 3GLs provide direct support
for concurrent execution because the hardware models are all based on
serial processing (sequences of discrete operations). To provide
concurrency one needs additional software infrastructures like threads
to be boot-strapped.
The point of my paragraph, though, was that one can resolve functional
requirements without knowing anything about the hardware. If one
isolates that solution, one has better focus by eliminating the
computing space concerns. Conversely, when addressing non-functional
requirements one can focus on resolving computing space issues.
In addition to managing complexity by divide-and-conquer, one has
additional benefits. Specialization is possible because the skills for the
problem spaces are different. One can also achieve design reuse by
reusing solutions for nonfunctional requirements (e.g., design patterns
for things like write caching) across multiple problems. In the
extreme, with translation one gets full reuse of /everything/ in the
computing space so the application developer solving a specific problem
only needs to worry about resolving functional requirements.
Seems a little off-topic :)
You can't write a general purpose program that can look at a snapshot
of the run time state of any given running OO system, in order to
deduce truths (in the form of predicates about entities), even if it's
provided with the source code, and is also able to find all the global
variables and navigate every thread's frame stack. For a start it is
faced with the problem of finding a consistent cut. It can't hope to
always "understand" some given source code because of the halting
problem. How does it know which objects in memory to trust, and which
not to? Eg is an object just for temporary purposes for some
algorithm? How does it know what the algorithm is for?
But you can't do that (your first sentence) for /any/ program. You can
only demonstrate correctness at run time by either testing it or through
logical reduction from the initial design down to the Assembly to
demonstrate that the design was correctly implemented at each level
under the rules for each refinement in those level transformations. IOW,
you would have to prove that things like the 3GL compiler did the right
thing. There is no way you are going to apply RM or anything else to a
bunch of 1s and 0s in memory to directly determine correctness after
optimizing compilers, linkers, loaders, and the OS have gotten through
chewing on the design.
I wasn't comparing technologies. My point is that many OO systems
are best described as complex state machines (rather than models of
problem space entities), and this conflicts with your semantics of
object identity.
As it happens, I use an OO methodology where all behavior is described
with state machines -- which is a lot more emphasis than most OO
methodologies place on state machines. But they are object state
machines. One does not attempt to describe behavior spanning objects
with large, complex state machines. As such they are usually quite
simple and self-contained at the object level. So I don't know where
this notion of OO /systems/ being state machines comes from.
You can emulate that with the proper tools, like a 3GL source debugger
or a UML model simulator. But you are depending on the tools to provide
the necessary mapping of the 1s and 0s produced to the high level
constructs you are looking at in the tool. IOW, the tool provides the
logical reduction from the user view to the machine view. So if the
code generator or compiler screwed up and didn't follow the
transformation rules the tool depends upon, that will be manifested in
the debugger/simulator as inexplicable results. Been there; done that.
So?
My point is that you are offering a test of mapping that doesn't exist.
You can't prove the mapping from the run-time execution image. You
can only demonstrate the mapping at the design level where the level of
abstraction is sufficiently high. A 3GL debugger or model simulator
gives you the illusion of looking at the execution image, but it depends
upon the same rules compilers and whatnot use to ensure correct logical
reduction of the design abstractions when creating the execution image.
So if the mapping is broken in the execution image because the rules
weren't followed, you don't know how because all you see manifested are
garbled results.
I only require there to be an isomorphism between design and
implementation. If the design is correct then so must be the
implementation.
You on the other hand are constraining the design/implementation with a
rule that I think is quite restrictive and unnatural.
Thinking of OO merely in terms of simple class diagrams and modelling of
relationships is at best an over-simplification. More to the point,
that limited view emphasises exactly what OO is poor at :
classification and storing relationships about entities.
First, where do you get the idea that this is all I see in OO? One
defines /solutions/ and those solutions can be validated at the UML
model for functional requirements.
Complex algorithms often can't be validated merely with a UML model.
Eg, an algorithm may require proof by induction. How do you prove that
a multithreaded design can't dead-lock? UML is certainly useful, but
hardly the be all and end all.
Your first sentence is not correct. Translationists do this all the
time. I spent a decade solving NP-complete problems on machines that
were too small.
I think of UML models as incomplete. They are merely a way to document
aspects of the design. You evidently do not.
Since the models are executable, one can simply test the application.
However, one also has the same techniques for logical correctness proofs
as one uses in any other software context. They are just a lot easier
to use because the representation and the tools to navigate that
representation are at a higher level of abstraction.
Remember, I am a translationist. Code generators do what you say, not
what you meant. So the solution specification must be complete,
precise, and unambiguous. So long as it is, logical reduction is possible.
However, the second sentence really boggles my mind. Relationships are
crucial to OO development. Among other things, they are critical to
managing access to data. OO relationships are implemented at the object
level rather than the class (table) level in order to restrict access to
data. OO relationships are also crucial to tailoring solutions to the
problem in hand so one usually has much better performance than one
would have if restricted to the RDB view of relationships. I could
probably do several more paragraphs on how unique and important OO
relationships are to the paradigm.
OO is good at classifying and storing relationships about objects
(using my definition of object - which is directly associated with
instances). OO is relatively poor at classifying and storing
relationships about entities - as I said above.
I think your interpretation is the problem. If you would accept the OO
notion of an object that problem goes away because object relationships
are defined among objects based upon their counterparts between problem
space entities. IOW, one can model exactly the same complexity one has
in most problem spaces.
I'm saying tangible things. Your language is imprecise and can't even
express what I'm saying, so of course you will never agree. For
example, you don't comment on my assertion that OO is good at
classifying and representing relations about instances, and not
entities.
So long as instances map 1:1 to objects at any
moment in time, correct instantiation of the relationships is fairly
trivial.
This 1:1 mapping can be more difficult to achieve than you imagine.
You don't draw the distinction even though it exists and is easy to
define. I don't even know where instances feature in your formalism.
You think of objects as values that implicitly represent entities.
That may be true when OO is (mis)used in business applications. It is
not true in problem domains where OO works really well.
I don't think of objects as values at all. Objects are design
abstractions of identifiable problem space entities, not values. Only
object properties can be represented by values.
You do but you don't realise it. That was obvious when you decoupled
the content of the Employee instance from its location in memory, and
called that state an "object".
In systems programming you deal with things like queues, stacks, maps,
sets, threads, mutexes, semaphores, caches, smart pointers. A design
is expressed in terms of these building blocks. When I push an
element onto a queue, I know the queue instance has changed, not some
(external) entity. Identity is associated with the instances in
memory. You say the instances in memory are just abstractions for
mathematical notions of stack etc, and don't feel any need to associate
identity with the instance in memory. How can you possibly validate
the source code? More specifically, the imperatively expressed
algorithms.
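A small sketch of the queue case, using std::queue<int> as a stand-in for whatever queue the design uses; the enqueue() and sameInstance() helpers are hypothetical. Two queues can hold equal contents and still be different instances, and a push changes only the instance it is applied to.

```cpp
#include <cassert>
#include <queue>

// Pushes onto the queue instance the caller names; nothing outside
// that instance changes.
inline void enqueue(std::queue<int>& q, int value) {
    q.push(value);
}

// Identity in this sketch is simply the address of the instance.
inline bool sameInstance(const std::queue<int>& a, const std::queue<int>& b) {
    return &a == &b;
}
```

Two empty queues compare equal by content with operator== while sameInstance() reports them as distinct; after a push onto one of them, they no longer compare equal.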
And every item in your first sentence is a well-defined problem space
entity that can be abstracted as an object in software and instantiated
in an execution environment. The problem space just happens to be
mathematics or computing rather than something like banking.
Your lack of formalism makes you say things that seem right but are
rather more subtle. Consider an algorithm that deals with multiple
threads, stacks, mutexes etc. I'm sure you agree you can't map each
independently to a corresponding mathematical notion divorced from
identity. Otherwise you can't validate the algorithm. Instead, only
the whole sub-system can be mapped under interpretation to an
isomorphic mathematical system that represents none other than the
original state machine, with all its interacting parts. By proving
that the mathematical state machine works correctly, we end up
validating the implementation. However, because the entire state
machine has been mapped isomorphically to another state machine, it
seems easier to simply say that the instances *are* objects and don't
bother to draw a distinction.
Now do this in the banking domain and a Person instance is mapped to a
real human. Is this a reasonable isomorphism for the state machine?
Not for me! Real humans aren't simple mathematical entities with
well-defined state and behaviour.
Behaviors like pushing onto a stack are fully defined for the problem
space entity and they are abstracted as necessary in the object
definition. The object has unique identity that maps directly to the
problem space entity's identity. And a run-time instance of the object
has unique identity that maps directly to the object. Everything maps
unambiguously, so where is the problem with validating the application?
As long as you introduce one object per instance (as I did in the sense
of an isomorphism above) then we agree. But then I don't see any
point in distinguishing object and instance.
But you evidently don't do that. You allow an instance to map to a
different object as the program runs.
One would only have problems with validation if the mapping of
identity was ambiguous, such as clones.
In Business applications there are lots of entities that exist
independently of the computer. It doesn't surprise me that OO is good
for systems programming and not so good for business applications.
Say, what?!? Where do you get these ideas? There is probably at least
an order of magnitude more OO applications in the IT arena than in
R-T/E. Are you saying those developers are all seriously misguided?
It depends on whether they use OO to build an application around data
that is stored using RM, or they throw away RM completely and use OO to
store information about entities that is recorded very well in the form
of relations.
Bottom line: if there is one single thing that would give OO an
advantage over other approaches to software development (where it is
applicable), it would be the way the OO paradigm deals with
relationships. It is a far more versatile approach than the RDB-style RDM.
You say RM is inferior to OO for storing relationships, even though R
stands for "relational" in RM. That's a bold statement! I think it's
too generic and lacks any substance. I neither agree nor disagree.
You've referred to "storing relationships" a couple of times. What do
you mean by that? In an OO application one defines, implements,
instantiates, and navigates relationships but one doesn't store them.
I certainly don't mean persist. I'm just talking about state being
used to represent information.
The relationships in the RDM for RDBs (which you will note I have been
careful to specify whenever "RM" comes up because that was the context
of the original thread that triggered all this) are highly structured
and quite limited because that is all that is necessary to describe
static data structures in a manner that is independent of usage. OO
relationships need to be more versatile because they are constructed to
support dynamic (behavioral) collaborations in addition to providing
static structure for data.
I don't want to talk about differences between RM and OO anymore :)
While technically ensuring unique identity for each object instance in
the language implementation, that mechanism opens up a host of
referential integrity problems that are pushed off onto the developer.
That's why the abstract action languages for OOA/D don't allow that to
happen; instance creation is a fundamental operation and the instance
identity mechanism is not exposed to the developer.
I don't agree. Just find a Clone() method on a class in an OOD.
None of the AALs I know of for OOA/D will allow you to do that. They
all treat object instantiation as a fundamental operation where one must
fully initialize the instance as part of the creation. That operation
will always produce a unique instance (i.e., one can't "reuse" a memory
instance as you can in C++). The model simulators and debuggers will
also flag duplicate identity attributes as an error. [If an object is
identified by an embedded attribute, then that attribute must be
designated as such in the Class Model, just like in a Data Model. So
the model debugger will detect the duplicate, just like a DBMS would in
a properly normalized RDB.]
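The behaviour described for the AALs can be approximated in C++, though only by convention. The sketch below is an assumption-laden illustration, not the API of any real action language or model simulator: EmployeePopulation is a hypothetical container, and name is assumed to be the designated identifying attribute. Creation and initialization happen in one step, and a duplicate identifier is rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

struct Employee {
    std::string name;  // assumed to be the designated identifying attribute
    int salary;
};

// Hypothetical population of Employee objects keyed by identity.
class EmployeePopulation {
public:
    // Creates and fully initializes a member in one step. A duplicate
    // identifying attribute is rejected, much as a DBMS rejects a
    // duplicate key.
    Employee& create(const std::string& name, int salary) {
        auto result = members_.emplace(name, Employee{name, salary});
        if (!result.second) {
            throw std::invalid_argument("duplicate identity: " + name);
        }
        return result.first->second;
    }

    std::size_t size() const { return members_.size(); }

private:
    std::map<std::string, Employee> members_;
};
```

Unlike an AAL, nothing here stops a caller from later assigning to name and creating a duplicate behind the container's back, which is the kind of hole being attributed to the OOPLs.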
IMO you are talking about some small offshoot of OO used specifically
to model entities. I think this is a misuse of OO. For you it is
what OO is all about!
I don't think it is an offshoot of anything. I am talking about the
standard notation for doing all OOA/D: UML. UML provides a semantic
meta model for abstract action languages that describe behavior in OOA/D
models. That was necessary to make UML a full 4GL. (More precisely, it
enables creating instances of UML that are 4GLs.)
[In part the OOPL is confusing things by allowing the object to be
instantiated without proper initialization of identity. (One could not
do that in any of the abstract action languages used for OOA/D that I
know of.) In part the OOPL is confusing things by allowing an
optimization by reusing the memory for the object without the overhead
of heap operations. IOW, the OOPL designer is offering the developer an
opportunity for foot-shooting by washing his hands of referential
integrity issues and pushing them all on the developer. This is why I
argued that looking at OOPL code is not a good place to learn OO.]
IMO the only reason the OOPL is confusing things is because you want to
deal with object identity at the level of the entities in the problem
space. This problem disappears if you simply associate objects (and
their identity) with nothing other than the class instances that reside
in memory. In my mind this is a case of "less is more".
The problem is that objects, unlike RDB tuples, usually do not have
explicit identity. Instead it is often defined referentially, which
maps conveniently to a memory address in hardware so the OOPLs provide
infrastructures around that paradigm.
We don't share the same definition of "object". For you it is an
entity (I think). For me it is an instance in memory.
Right. This is becoming clear. However, in the OO paradigm an object
is a uniquely identified abstraction, not an instance in memory. One
uses 'instance' for the memory image of the object.
Can you find a reference that states that?
I did above from the DOT. I could provide dozens of others.
I don't care about the terminology, as long as there is an isomorphism
between instances and objects, because that shows there is no real
difference. Do any of the references discuss that?
That is clearly in conflict with the "misconception" that was defined
in section 2. I am of course assuming that the interpretation map
makes use of the 'name' member of an Employee instance. Under this
interpretation, the definition of object identity changes from one call
to the next in function foo() above.
This poor definition of object identity makes it impossible for an
object to provide a Clone() method, such as
Employee* Employee::Clone() const
{
    Employee* e = new Employee;
    e->name = name;
    e->salary = salary;
    return e;
}
The reason is that under the interpretation the clone would map to the
same entity (in the problem space), in conflict with the assumption
that the interpretation is 1-1.
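The conflict can be shown in a few lines. The Employee struct below is a guess at the class behind the Clone() quoted above, and interpret() is a hypothetical stand-in for the interpretation map, assumed to use only the name member. After a clone there are two instances at different addresses that interpret to the same entity, so the map is no longer 1-1.

```cpp
#include <cassert>
#include <string>

// Assumed shape of the Employee class used in the thread.
struct Employee {
    std::string name;
    int salary;

    // Same logic as the Clone() quoted above; the caller owns the result.
    Employee* Clone() const {
        Employee* e = new Employee;
        e->name = name;
        e->salary = salary;
        return e;
    }
};

// Hypothetical interpretation map: an instance is taken to denote the
// problem space entity with this name.
inline const std::string& interpret(const Employee& e) {
    return e.name;
}
```

Changing the salary of the clone leaves the original untouched, which is where the referential integrity worry comes from: two instances now disagree about one entity.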
This is a different problem. A true clone function creates different
instances with unique identity (address) but the instances all map to
the same object abstraction Employee{name,...}. This creates an even
nastier set of problems for referential integrity for relationships when
one changes the salary. The solution here is to break the thumbs of any
developer who does something like this.
So you say the Clone() function is valid, but a developer who calls it
would be wise to first get insurance on his/her thumbs? :)
Exactly. It is valid at the 3GL level because of imperfections in the
OOPLs, but it is not valid at the OOA/D level.
All I can say is YUK
Why? What need is there for clones? Give me a plausible example
Ok. A user creates a CAD drawing. This uses the composite pattern for
grouping the basic graphics elements. Therefore a drawing has a tree
structure. Semantically a node object (= instance) owns all its
descendent graphic elements. A node object isn't semantically bound to
some particular external entity. Therefore it is semantically valid to
clone any node (which implicitly means a whole sub-tree of the drawing
is cloned). The clone takes on an independent identity, and can be
inserted somewhere else in the tree. This is needed to implement copy
and paste - quite a useful thing in a drawing program, particularly
when the clone will then be modified to suit a specific requirement
needed by the user.
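A minimal sketch of the copy and paste case, assuming a bare-bones composite in which every node just carries a label and owns its children. The Node class and its member names are hypothetical and are not taken from any real CAD code. Cloning a node deep-copies the whole sub-tree, and the copy can then be inserted elsewhere and modified independently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical composite node: owns all of its descendants.
class Node {
public:
    explicit Node(std::string label) : label_(std::move(label)) {}

    // Takes ownership of the child and returns a reference to it.
    Node& add(std::unique_ptr<Node> child) {
        children_.push_back(std::move(child));
        return *children_.back();
    }

    // Deep copy: the clone and all of its descendants are new instances.
    std::unique_ptr<Node> clone() const {
        std::unique_ptr<Node> copy(new Node(label_));
        for (const auto& child : children_) {
            copy->add(child->clone());
        }
        return copy;
    }

    const std::string& label() const { return label_; }
    void setLabel(const std::string& label) { label_ = label; }
    std::size_t childCount() const { return children_.size(); }
    Node& child(std::size_t i) { return *children_.at(i); }

private:
    std::string label_;
    std::vector<std::unique_ptr<Node>> children_;
};
```

Immediately after the paste the two sub-trees have identical state but different identity; editing the pasted copy does not affect the original.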
The node object is semantically bound to a specific element of the CAD
drawing. IOW, the CAD drawing is the problem space
That's vague! What happens as the drawing changes? Does the problem
space change as well?
and each element in
the CAD drawing is an entity that is uniquely identified by its position
(and other context) within the CAD drawing. (Which, in turn, is a
representation of something real like an electronic circuit, but from
the software's viewpoint the CAD representation is the problem space.)
Each member of the Node set in the composite abstracts one of those
drawing entities and the Node identity is mapped directly to a specific
CAD drawing element.
Why not just say there is an isomorphism between a mathematical CAD
drawing and the one represented in memory? Who cares?
What you are cloning are the property values of the Node set members
because they all happen to look alike in the CAD diagram. So if your
CAD diagram happens to be an electronic circuit schematic, you will have
lots of Resistor nodes with a value of 100K. But each resistor is still
a unique (identifiable) element of the circuit and it is also a unique
(identifiable) circuit element in the CAD drawing. And it will be a
unique (identifiable) member of the Node set in the application design,
regardless of the fact that its elements have the same values as some or
all of the other Nodes.
Yes obviously when you clone an object you copy its state. Do you
still say cloning is not required?
My example demonstrates that it is not merely the state within an
object that matters. It is fair and reasonable for two different
objects to have identical state.
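
As a rough illustration of that point, the Python sketch below separates equality of state from identity. The `Resistor` class is a hypothetical example, not code from the thread.

```python
from dataclasses import dataclass


@dataclass
class Resistor:
    ohms: int


r1 = Resistor(100_000)
r2 = Resistor(100_000)

same_state = (r1 == r2)    # generated __eq__ compares field values
same_object = (r1 is r2)   # `is` compares object identity

r2.ohms = 47_000           # changing one object leaves the other untouched
```

Here the two resistors start out equal in state yet are never the same object, which is why they can later diverge.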
In relational terms the same mapping prevails. You would have a [Node]
table with a tuple for each element in the CAD drawing. Each Node
object becomes a tuple that maps to a specific CAD drawing element. The
fact that all the tuples happen to have the same non-key attribute
values is serendipity. (But, unlike the object model, you would have to
provide an embedded key, such as {x,y} position in the diagram, that was
unique to the CAD drawing element.) IOW, you can't have clones in the
OO model any more than you can have clones in the relational model.
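
The relational mapping described here can be sketched with Python's standard `sqlite3` module. The table name, column names, and the choice of `(x, y)` as the key are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node ("
    " x INTEGER NOT NULL, y INTEGER NOT NULL,"
    " kind TEXT NOT NULL, value TEXT NOT NULL,"
    " PRIMARY KEY (x, y))"
)

# Two resistors with identical non-key attributes but different keys.
conn.execute("INSERT INTO node VALUES (10, 20, 'Resistor', '100K')")
conn.execute("INSERT INTO node VALUES (30, 20, 'Resistor', '100K')")

# A second tuple with the same key is rejected by the key constraint.
try:
    conn.execute("INSERT INTO node VALUES (10, 20, 'Resistor', '100K')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
```

In this sketch a copy of a tuple can be stored only once it is given a different key, which is the relational counterpart of a clone taking on a new identity.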
No. Clones have different identity so they can coexist.
I wonder what terminology you use instead of "clone". Do you have a
suggestion?
from
some problem space where one would have use for clones (i.e., problem
space entities exist without unique identity). Or an example of a
design where one needs to have multiple instances with the same identity
based on a single entity.
It seems you assume OO run time state *always* binds (under
interpretation) to specific entities in some problem space. That is
simply not true in general.
It had better be true in an OO application or one risks serious
maintainability problems down the road. It had also better be true if
one is to have any reasonable requirements traceability.
I say this in the spirit of examples like the CAD drawing. Of course
you can always create a trivial isomorphism to a mathematical notion of
the CAD drawing. That is hardly relevant to "requirements
traceability".
Sure,
there is some mathematical concept behind the notion of Integer Number
that is abstracted, but that notion is really only of interest to
implementing software on a hardware computer.
Yes, but that is after all the topic of this discussion.
Then we have a big disconnect. I have been talking about solving
problems using an OO methodology, not designing tools.
Note that your sentence betrays an aversion to wanting to treat numbers
as real in any sense whatsoever. You use words like "concept",
"notion", "abstracted". I suppose you say most numbers don't exist
because no one has written them down. I on the other hand am a
Platonist and don't lose any sleep over this, or force myself to
pollute my sentences with lots of additional but meaningless words to
indicate that numbers aren't real.
I have no problem with numbers as a concept. I don't even have a
problem with them being treated as objects in the implementation of an
OOPL. What I have a problem with is making them first class objects
having equal stature to those that abstract problem space entities.
Saying a number is an object just seems absurd to me. Objects by
definition have identity, state and behaviour and are associated with
instances in memory.
Pretty much my point. You were the one who was arguing that Integer is
a problem space entity and, therefore, could be abstracted as an object
in an OO context.
No, actually if you look at my definitions you will discover that I
only talk about entities with regard to value-types. I'm silent on
some mapping between objects and identities.
Note that RM happily stores relations about humans or numbers in the
same database. They are all just "named" entities. Eg < is a
relation on numbers. PROLOG makes it clear that you treat < as just
another predicate, as if you had listed all the unit clauses defining
the relation yourself.
I find this to be a simple and elegant way to think.
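
A small Python sketch of the same idea, treating a relation over people and a relation over numbers uniformly as sets of tuples. The names `employee_salary`, `less_than`, and `holds` are hypothetical, and `less_than` covers only a finite fragment of `<`.

```python
# A relation is a set of tuples, whether it is about people or numbers.
employee_salary = {("Einstein", 25000), ("Godel", 29000)}

# Finite fragment of the < relation, listed like unit clauses.
less_than = {(a, b) for a in range(5) for b in range(5) if a < b}


def holds(relation, *args):
    # A fact holds if its tuple is a member of the relation.
    return args in relation


einstein_fact = holds(employee_salary, "Einstein", 25000)
two_lt_three = holds(less_than, 2, 3)
three_lt_two = holds(less_than, 3, 2)
```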
That's fine, but it is a different paradigm than OO.
It's merely a definition of the word "entity". Don't call it a
paradigm.
IOW, once one is out of
the realms of computers or pure mathematics, the notion of Integer as an
entity is a pretty alien concept.
I don't disagree.
There 2189 is just a value of some
bit of knowledge.
No. Outside the realm of mathematics, 2189 doesn't even exist, and
neither does computer science.
Try telling that to a customer who is looking at their bank balance.
B-) Numeric values do exist ubiquitously in most customer problem
spaces. That's why computers are ubiquitous and knowledge is one of the
two ways one can abstract problem space entities in the OO paradigm.
Commenting on this just seems rather pointless. Do you consider
computer science to be a branch of applied mathematics?
No, it is a hybrid with a dominant hardware component (if one views
Software Engineering as a complementary discipline). My alma mater
still has CS as part of the EE department, BTW!
Thus entities can be abstracted with knowledge but
the bits of knowledge aren't at the same level of abstraction.
Huh?
An object abstracts a problem space entity. Part of that abstraction
are bits of knowledge that the entity is responsible for knowing. But
those bits of knowledge are properties of the entity, not the entity
itself. My point above about numbers is that all first class objects
should abstract entities, not properties.
I find the lack of formalism in what you say excruciating.
An object is an abstraction.
An object abstracts some unique problem space entity.
An object has unique identity.
Object identity maps 1:1 to problem space identity.
An object is composed of properties.
Object properties are defined as responsibilities.
Object responsibilities can only be defined as behavior (to do
something) or knowledge (to know something).
Where is the excruciating lack of formalism in this? The semantic meta
model for UML even provides nice diagrams to describe it.
What does "abstracts" mean? What exactly is a property? What do you
mean by "responsibility"? How does all this relate to actual source
code?
Every sentence above invites more questions.
Definition: A *problem space* is a (mathematical) set of entities that
are relevant to solving a given problem using a computer. Entities in
a single problem space are allowed to form has-a relationships. For
example, Albert Einstein is an entity, and Albert Einstein's left
eyeball is an entity as well. Entities in a single problem space are
allowed to be at different "levels of abstraction". Basically there
are no restrictions!
Relative to your first definition, I think there are restrictions. For
example, entities must have abstractable properties that are relevant to
the solution. I would go further and argue that entities should have
multiple properties. (Only one may be relevant to the problem in hand,
but then a reviewer should want a lot of justification that this was so.)
Remember that the OO paradigm provides for three fundamental levels of
abstraction: subsystem; object; and responsibility. (Nesting, as in
subsystems, is allowed and objects can embed other objects, but
mechanisms like implementation hiding mitigate that.) It is subsystems
and objects that map to problem space entities. By implication problem
space entities are necessarily complex to support further subdivision
into properties.
Entities exist independently of the computer. So I don't see what any
of the above has to do with it.
But subsystems and objects are solution constructs. The scale of the
construct is important. Subsystems and objects abstract problem space
entities. As such they are first class objects and they have unique
identity. However, numbers are only relevant as values of knowledge
properties. So they don't need to be first class objects with unique
identity. Their identity is already expressed by the identity of the
property and identity of the owning object.
YUK. Are we even in the same profession?
Hey, you are the one espousing the relational view. In that view an
object is a tuple and object properties are tuple attributes. Only the
tuple attributes have values. The tuple attributes do not have identity
other than the tuple identity and their data domain. All I am arguing
here is that the OO paradigm has the same sorts of restrictions that the
relational paradigm has in response to your assertion that there are no
restrictions on mapping the problem space (last sentence in your 1st
quoted paragraph in this subsection). And those restrictions preclude
making Integer a first class object (your definition of a value-type).
The
only things that have values in an OO context are object properties (and
object identity when it is not an explicit attribute). But object
properties do not have unique identity; their identity is attached to
the object. So there is no separable instance of a value unless one is
talking about computer hardware implementation, which is only relevant
if one is implementing an OOPL (e.g., computing offsets from the
object's address).
Why should a value-type be limited to a member of an object?
Because that is how the OO paradigm limits the way one can abstract
problem space entities? B-)
Such limitations are necessary so that the OOA specification can be
mapped unambiguously into OOD and, in turn, an OOD specification can be
mapped unambiguously into an OOP program. There is a fundamental
problem in spanning the gap from most customer problem spaces to the
computing space. Such semantic constraints are necessary to migrating
across that gap.
For example, knowledge may be an abstract notion in OOA/D represented by
some neat-sounding ADT, but ultimately it must be represented as a data
store somewhere in memory. In turn, that data store needs to be
unambiguously referenced. That may translate into offset table
addressing. But for that to work conveniently, one can't have a bunch
of special cases. One needs a nice, simple mapping:
object identity -> object address -> offset table -> attribute value.
but that only works cleanly if one restricts values to object
properties. IOW, it is an issue of providing a simple but very general
mapping from problem space entities to computing space artifacts.
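
The chain "object identity -> object address -> offset table -> attribute value" can be sketched in Python with the standard `struct` module. The record layout, the `OFFSETS` table, and the helper names are assumptions chosen for illustration; real OOPL implementations differ in detail.

```python
import struct

# Hypothetical layout of an Employee record: 16-byte name, 32-bit salary.
OFFSETS = {"name": (0, "<16s"), "salary": (16, "<i")}
RECORD_SIZE = 20

memory = bytearray(2 * RECORD_SIZE)  # stand-in for the heap


def address_of(object_id: int) -> int:
    # object identity -> object address
    return object_id * RECORD_SIZE


def write_attr(object_id: int, attr: str, value) -> None:
    # offset table -> location of the attribute within the object
    offset, fmt = OFFSETS[attr]
    struct.pack_into(fmt, memory, address_of(object_id) + offset, value)


def read_attr(object_id: int, attr: str):
    offset, fmt = OFFSETS[attr]
    (value,) = struct.unpack_from(fmt, memory, address_of(object_id) + offset)
    return value


write_attr(0, "name", b"Einstein")
write_attr(0, "salary", 25000)
write_attr(1, "name", b"Godel")
write_attr(1, "salary", 29000)
```

In this sketch a value is only ever reached through an object's identity plus an attribute offset, which is the restriction being argued for.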
Definition: An *object* is a class-instance that is regarded as having
identity tied to that class-instance. For the purposes of identity,
the object does *not* represent an entity under some interpretation.
Although it certainly may (in the mind of the programmer) that is
entirely irrelevant to the semantics of object identity.
OK, here is where we part company in a big way.
One can argue that any formalism tends to be stilted and that the OO
paradigm must have an underlying basis in mathematics where unique
definitions of things like /value/ prevail. However, I leave that to
the boffins who design OO methodologies, OOA/D notations, and OOPLs.
That has all been resolved at the level of solving problems using an OO
approach and the OO paradigm has methodological constraints on solution
construction...
I don't think you're being honest to *actual* OO (ie actual source
code). You hide behind abstractions of the source code like UML class
diagrams.
Of course. You can abuse any paradigm. To use my favorite paraphrase
of G. B. Shaw on Christianity, the only trouble with OO is that it
hasn't been tried. Most supposedly "OO" software today is really just
FORTRAN and C programs with strong typing. But I am assuming that one
wants to construct software in a methodologically correct manner. The
methodologies allow one to construct well-formed OO software without a
lot of angst over the underlying mathematics. Those OO methodologies
lead to the following quoted point because they /define/ the constraint
to ensure the underlying mathematics is not abused.
But end up making a mathematical mess.
How?
Forcing the 1-1 interpretation is just wrong unless you define a
trivial isomorphism between instances and entities, and then it's
saying nothing important at all. The whole idea is a waste of time.
Objects abstract problem space entities and they must have unique
identity that is unambiguously traceable to that of the problem space
entity that they abstract.
I think that's just an unnecessary limitation, revealed in the enormous
amounts of real code that don't follow that rule at all - such as a
Clone() method.
And they break all the time because of referential integrity problems
that they wouldn't have had if they played by the methodological rules.
I don't think that's where OO comes off the rails at all. The biggest
problem is using it in the first place to solve a problem that it
wasn't intended for.
You are right, it isn't where OO falls off the rails. It is the abuse
of OO by doing things like cloning that falls off the rails. Don't use
clones and you eliminate a whole class of referential integrity problems.
LOL
If all we wanted was to write quick, elegant programs in a short amount
of time we would all be doing functional programming. But one does OO
because one wants the software to be robust and maintainable in the face
of volatile requirements.
You often revert to generic justifications of OO that seem irrelevant
to the topic.
Maintaining 1:1 identity mapping from problem
space entity to object to instance is done to ensure that one won't have
to do a massive shotgun refactoring when the requirements change and one
can't clone because the CAD nodes no longer all look exactly the same.
It will *always* make sense and be useful to copy parts of drawings.
Similarly for text documents, source code and scene graphs.
If one has properly emulated the actual uniqueness of CAD nodes, then
modifying those that do need to change becomes <relatively> easy and the
relationship infrastructure will be unchanged, reducing the likelihood
that side effects will break something in another part of the application.
You can postulate an A&D system where that is not
necessarily true and where other mappings to the problem space are
provided (P/R and FP being obvious examples), but it would not be an OO
methodology.
My definitions are not at odds with OO, as long as you don't blur the
distinction between instance and entity. Most software using OO
doesn't blur the distinction.
I think the blurring lies in essentially eliminating the OO view of
'object' by trying to go directly from problem space entity to memory
instance. The gulf between problem space and computing space is just
too broad to do that reliably. So the OO notion of an object as an
abstraction between problem space entity and memory instance is an
important stepping stone in a conceptual chain of representation transformation.
I was hoping you would accept the definitions for what they are (merely
definitions), and use that as a basis for properly understanding my
perspective, and then comment on whether there are logical
inconsistencies with what I'm saying, or else tangible disadvantages,
preferably using an example to illustrate your point. Instead you
have only given me general, hand waving arguments that mostly only
repeat your own alternative definitions. I made some specific points
about why my definitions are better than yours and you didn't comment
on them.
Note that your original post in this thread referred to a
"misconception" about the OO paradigm. You then provided a set of
definitions to demonstrate where that misconception went wrong. All I
have been pointing out is that your view of entity/instance is not the
established OO view and your definitions, while consistent with some
poorly formed OOPL designs, are not consistent with good OOA/D practice.
Specifically, practices like cloning objects with the same identity
and instance reuse will inevitably lead to referential integrity problems.
The definitions I have provided are not "alternative definitions"; they
are the standard OO definitions that can be found in any popular book on
OOA/D. The point being that there is no "misconception" about the OO
paradigm.
I disagree.
Finally, with the exception of getting into the muck of 3GL type system
definitions, I believe I have responded to all the points you have made.
I think we have reached an impasse. Let's agree to disagree.
Cheers,
David Barrett-Lennard