Re: Observer pattern limitations



Dmitry A. Kazakov wrote:
On 15 Jul 2006 18:05:09 -0700, David Barrett-Lennard wrote:

Dmitry A. Kazakov wrote:

It is an independent issue. I prefer to do it at once to minimize
distributed overhead of dereferencings. Lazy objects are implemented in a
different way. The object stays resident in the DB, only its small proxy
becomes memory-resident. That proxy can do cache-on-demand, with the same
effect as lazy references. It is more flexible, IMO.

Here is an attempt to understand you...

By "at once" I assume you mean synchronously or eager (rather than
postponed). I assume you are talking about some traversal over the
modified objects so that all the changes are written synchronously to
disk (apart from write caching). Furthermore this discovers new
objects that haven't been written to disk before. I presume they need
to be given some kind of persistable object identifier (OID).

Yes.

Somehow,
you are able to ensure that a consistent snapshot is written to disk.
You call this whole process "finalisation".

Finalization triggers the process. A weak reference held by the storage
object becomes a notification, before actual finalization starts. This
gives the storage object an opportunity to sync the memory object (which is
still fully functional), just before it gets finalized and then freed.

Here's another attempt to understand...

An OODBMS typically has special internal data structures that allow it
to keep track of what has been faulted into memory. For simplicity I
won't assume a page based approach. Basically this takes the form of
an invertible mapping between OIDs and virtual address (ie memory)
locations called a Resident Object Table (ROT). The map doesn't
contain entries for objects that don't reside in memory. Therefore
failure to find an entry for given OID tells you that the object
doesn't reside in memory. The ROT is used for swizzling/unswizzling.

An important concept is that the ROT only holds weak references to the
objects in memory. This allows objects in memory to be evicted, say
using an LRU policy. Note however that objects that have been changed
must be pinned in memory.


You seem to be breaking the system up into separate "spaces" in which
objects reside. A space may be associated with the virtual address
space of a process, or it may be associated with an OODB on disk. This
allows you to say that an object "resides on disk".

I think you assume an object is usually tied to the space in which it
resides. ie it can't generally migrate. Nevertheless, you allow an
object in one space to act as a proxy for an object in another space.
In particular, for when an object in memory is regarded as a proxy for
an object on disk.

You say that storage objects (ie objects on disk) contain weak
references to their associated proxies (if any) in memory. Surely
this is only conceptual because you don't want objects on disk to
directly store memory address values. The idea is simply that
conceptually a proxy is not kept alive just because of its association
to a storage object. Otherwise all proxies would be forced into memory
at all times.

However, you surely must have some kind of ROT, and that would indeed
really contain weak references to the (proxy) objects in memory. Is
that what you mean?

Now let's talk about finalization. Everything begins with a GC of the
objects in memory. In particularly this includes the proxy objects.
The ROT only contains weak references so proxies may easily become
unreachable even though their associated objects on disk are not
garbage. This was discussed in your example with objects A,B,C.

A relevant question is whether this is a stop the world or incremental
collector and how it relates to the status of all the threads in the
process. Depending on the approach taken, it may be the case that
threads are in the middle of running their respective "algorithms" when
they are blocked by the tracing collector. This raises the difficult
question of how to find a consistent cut. IMO the only efficient,
reasonable solution is to force threads to use explicit transactions in
order to access persistent objects. A transaction defines an atomic
unit of work, and is a basis for allowing the persistent objects to
recover after power failure to a consistent state.

This issue relates to the ideology of orthogonal persistence. I
believe the idea is flawed. It doesn't recognise the fundamental
dichotomy between independents and dependents. It can't be made
efficient. It is incompatible with schema evolution. FYI, an
interesting paper that addresses part of the problem is called
"Concurrency -= the fly in the ointment". Look it up in google if
you haven't read it already! What is you position on this?

Anyway, after performing a trace the collector has found some objects
in memory that are unreachable. You say these are then "finalized" in
the manner used by Java. The proxy objects are treated specially. At
this point, finalization (at least conceptually) triggers GC in the
storage space, possibly causing objects to be deleted from disk. This
is also where modifications of (proxy) objects in memory are
"synced" (ie written to disk).

Can you please explain in more detail how you ensure the system will
recover to a consistent cut?


Are you familiar with stub-scion chains? Is that what you're doing?

In some sense yes. But the concept is IMO wrong from a wider perspective of
type system. I consider references and pointers as subtypes of the target
type. So a stub to me is just a subtype, which implements the behavior of
the target type through marshaling, caching, whatsoever. In this sense it
is merely an implementation detail, not to be exposed.

Because you might wish to collect them.

I assume you mean GC.

Consider A, B -> C.

I assume you mean A,B,C are objects and A,B independently contain
references to C. [You are *not* talking about a dependency, ie that C
is calculated from A,B]

Right

Then A gets
memory resident it forces C as well.

I assume you mean the reference to C from A is followed, causing C to
become memory resident.

Later on, A is destroyed. Then, you
have to decide if C should vanish only from the memory or from DB as well.
You cannot do it because B is out.

I assume you mean B is not resident in memory.

Yes

In my system memory-resident C is
finalized. Upon this the DB GC is run to determine whether C has to be
removed from the DB. If B exists, C is synced to DB. If not, C is removed
from DB.

IMO there are two important views of the system, and therefore two
types of GC. There is a "transient view" which considers all objects
in memory (both independents and dependents), and a "persistent view"
which only considers the independents.

If you take the pure functional dependency perspective on the dichotomy
of dependents versus independents, then *by definition*, dependents
contain references to independents (or other dependents) but never the
reverse. Dependents only exist for the purpose of caching the results
of calculations. The ability to navigate from independents to
dependences simply exists to allow cached results to be found or to be
marked as dirty.

There is a theoretical sense in which all the dependents can be
completely eliminated from the system without impacting the program
functionality. After all, caching is only for performance. For that
reason, it is possible to persist the independents without the
dependents, and it is also possible to GC the independents without the
dependents. ie by specifically not following the navigation links from
independents toward dependents. These are exactly the "observer
pattern" links - ie when a subject is able to notify its observers by
storing pointers to them.

Yes. But note that the same object can migrate from one camp to another.
Unfortunately, it is impossible to stay fully functional, because some
objects are quite large for "cut'n'paste."

I see your argument, but nevertheless I don't agree. Can you back it
up with a more concrete example?

Consider a component that calculates a (sampled) Mandelbrot image. At
each c = x+iy, it repeats z := z^2 + c starting at z = c to count the
number of times to leave the rectangle [-2,2] x [-2,2] in the complex
plane (or gives up). This is rather CPU intensive.

Let the component be able to inserted into a word processing document.
Now to reduce the storage requirements on disk it may be useful to
avoid persisting the calculated image. This means that each time the
document is loaded, the Mandelbrot image will need to be calculated.
This is done in the background by a worker thread so it doesn't cause
an hour glass to appear and the application becomes unresponsive.

The Mandelbrot object internally contains independents and dependents.
The independents include things like the "window" - as a floating
point rectangle in the complex plane. An example of a dependent is
the calculated monochrome image where for each pixel it gives the
number of iterations of z := z^2 + c. This monochrome image is mapped
through a colour LUT when it is rendered on the screen to make it look
nice.

From the "independents" POV the Mandelbrot is actually a very small
lightweight object that can be persisted very easily and efficiently.

Now back to your claim that the dependents can migrate from one camp to
another: I don't think that's reasonable. As I see it, if the
user really wants an editable Mandelbrot image (eg that can be
"touched up") then the user needs to explicitly copy the calculated
image as a value type, into an Image component that represents a
"real" sampled image. In the Image component the pixel values now
represent independent values that can be edited by the user and need to
persist.


Firstly, that is a requirement. Secondly, to me it is not obvious why
persistence should slow down anything. A persistent object gets one weak
reference more. It is not a big deal.

I don't understand you.

A memory resident object that persists has a weak reference from the
storage object where it persists. This has no [sufficient] overhead.

What I mean is that dependents include things like screen pixel values.
I assume you agree it would be very inefficient to persist snapshots
of the screen. How do you reconcile that?

I'm not sure what you mean. But bit blit is that thing. This is a different
theme, but I designed a widget library for real-time wave forms and list
views using this approach.

What about the calculated Mandelbrot image, in the example above? Do
you agree you may not want that to persist?

Do you mean writing modified
persistent objects back to disk?

Yes, if the finalized object persists and was modified.

How do you ensure integrity if objects are only written to disk when
they are finalised? What if there is a power failure, so your data is
in an inconsistent state?

It is not, because if a reference to a proxy object is held , that means
that this object is potentially being modified. To sync it prematurely
would mean to bring the system in an inconsistent state.

I don't see that as being the problem.


I'm talking about the problem of a
consistent cut or snapshot.

It is driven by user actions. The objects never know if a system they
comprise is consistent. It would be bad OO design. Each object is
responsible for only itself. A memory-resident object maintains its
invariant. The interplay of invariants is user's responsibility. He can
combine objects into larger objects to enforce some higher-order
invariants. That's OO design is all about.

Yes I agree, but the underlying persistent system is ignorant of the
high level design and invariants. How does it ensure the system always
recovers to a consistent cut after a power failure?


There is no "durability", no such thing. There
is a larger system which includes the DBMS, that system is never off. If
you don't need to update the object, then that means that logically it is
not in the scope of the larger system. Quite simple, isn't it? ]

I'm struggling with understanding what you mean. In that sense I'm not
finding this simple at all!

Why? Each object has a scope which determines its life time. The scope of a
persistent object is one of the system where it persists in. What is called
"persistent" object is just an object that exists in the scope of DBMS, DSN
or call it as you wish. For software design it is no matter whether that
system is a crystalline material exhibiting ferromagnetism with some junk
SQL interface or anything else.

I have. For all intensive purposes it is just as fast as a system
semaphore.

That's interesting. Do you use Windows or POSIX? Can you outline the
algorithm?

I use the semaphore in Win32. I last measured it some time ago on a
2GHz Pentium IV. It can be locked then unlocked in 2us. I used the
standard algorithm for exclusive write or shared read locks and
obtained the same performance because the time is dominated by the
semaphore. Note that once a shared read lock has already been gained,
subsequent shared read locks are granted far more quickly because they
only lock a Win32 critical section to protect the read count, and a
critical section is about 25 times faster than a semaphore on that
machine.

This is what I'am doing too. First take a critical section, then look at
the write count, increase the read count then release it. If write count is
set, then release and wait for an event. If write is required, then spin
for a mutex, look for the read count ... It is much slower than plain
critical sections. Especially because there are tricky places where race
conditions need to be pounded out. Someday I'll give it another try in Ada.

I don't have an explicit write count. I presume that exists within
the Win32 semaphore. This is only used to avoid having a thread with
exclusive access block itself if it nests calls to gain an exclusive
lock.

Here is the algo in pseudo C :-

int readCount = 0; // Number of readers
CriticalSection cs; // Protect readCount
Semaphore wr = 1; // exclusive writer or else multiple
readers

void write()
{
wait(wr); // lock out all readers, or another writer
< allow exclusive access >
signal(wr); // up for grabs
}

void read()
{
{
Lock lock(cs);
if (readCount++ == 0) wait(wr);
}
< allow read access >
{
Lock lock(cs);
if (--readCount == 0) signal(wr);
}
}

Note that a semaphore must be used, not a Win32 mutex because the
latter assumes that the thread that releases the mutex was the same
thread that locked it. That doesn't generally happen in shared read
modes because the first thread to get the read lock won't always be
the last thread that releases the read lock.

That is the case for intervals and fuzzy numbers. The problem there is
aliasing, not side effects. For both the result of a*b depends on whether a
and b are *independent*. For example intervals: [-1,2]*[-1,2]=[-2,4], yet
[-1,2]^2=[0,4].

Hmmm. I think the functional programming perspective is too important
to ignore.

I just described my perspective in some detail in a post to Leslie
Sanford. I would greatly appreciate it if you could have a read and
comment.

I don't ignore it, I just point out its limitations. Some of them are
fundamental, like need of independence analysis above.

I don't see that!

That's a fact of your own biography, then thing is a hard mathematical
reality, you're platonic, after all! (:-))

In the case of intervals the result of a*b depends on the joint
distribution of a and b over R. When they are independent, then the
distribution can be represented as a multiplicative of individual
distributions (min is the multiplication operation in that case). If a and
b are not independent (as in the case of x*x), then the result of * is
inaccurate (it remains an estimation from above, in some sense, which makes
interval arithmetic sound, but that's another story).

I don't know what point you're making here.


Cheers,
David Barrett-Lennard

.



Relevant Pages

  • Re: Observer pattern limitations
    ... to keep track of what has been faulted into memory. ... or it may be associated with an OODB on disk. ... dichotomy between independents and dependents. ...
    (comp.object)
  • Re: Observer pattern limitations
    ... object becomes a notification, before actual finalization starts. ... to keep track of what has been faulted into memory. ... dichotomy between independents and dependents. ...
    (comp.object)
  • Re: Observer pattern limitations
    ... memory resident it forces C as well. ... I assume you mean the reference to C from A is followed, ... in memory (both independents and dependents), ... which only considers the independents. ...
    (comp.object)
  • Re: Observer pattern limitations
    ... Further I distinguish references in memory to ones into the store. ... objects that haven't been written to disk before. ... I don't really see why you would want dependents to persist on disk, ... which only considers the independents. ...
    (comp.object)
  • Re: Observer pattern limitations
    ... Further I distinguish references in memory to ones into the store. ... I don't really see why you would want dependents to persist on disk, ... The object encapsulates a connection (and a transaction). ...
    (comp.object)