Re: Observer pattern limitations
- From: "David Barrett-Lennard" <davidbl@xxxxxxxxxxxx>
- Date: 14 Jul 2006 15:38:35 -0700
Dmitry A. Kazakov wrote:
On 13 Jul 2006 21:03:24 -0700, David Barrett-Lennard wrote:
Dmitry A. Kazakov wrote:[...]
On 13 Jul 2006 06:36:55 -0700, David Barrett-Lennard wrote:
In fact,
dependents are generally transient and don't persist at all. All
behaviour in the objects is ultimately derived from an implicit
"liveliness" contract derived from main(). To help make the system
"operate", main() may directly or indirectly open a persistent store,
socket connections, create threads etc. A dependency graph results
from a liveliness contract in certain "observables" such as the
display. This requires that the display be redrawn at regular
intervals (say).
I am unsure what you've meant here. Normally it is a bad idea to traverse
dependencies backwards.
I have found this approach to be very fast, efficient and easy to
reason about.
My approach is different. The closure is calculated each time upon
finalization which causes an object to persist. Because most of the
referred objects already persist traversal stops almost immediately.
Further I distinguish references in memory to ones into the store. This
allows to have in-the-store objects, which never become memory resident.
That's important to me, because some objects might be very large.
Most OODBMS's fault objects into memory on demand. Many in fact are
page based (like Object Store), but the idea is the same.
Swizzling refers to the idea to convert persistent references to memory
pointers, and is done when an object or page is faulted into memory.
Unswizzling is required when a page is dirty and needs to be written
back to disk.
I have written my own OODB that uses software swizzling and faults
objects into memory as smart pointers are dereferenced. The smart
pointer holds an OID (which is what persists) as well as the (mutable)
memory pointer.
I don't really see why you would want dependents to persist on disk,
and therefore why the dependency graph has much to do with the solution
for persistence.
The dependency graph can be complex or simple; it
depends on the problem at hand. Normally it won't contain cycles, so
for a given snapshot of all the independents, all the dependent values
(including the screen) are *precisely* defined as pure functional
dependencies. Now the independents typically correspond to the
persistent sub-graph of objects (ie a sub-graph of all the objects that
reside in memory). This sub-graph has ACID properties and requires
transactional access.
That's problematic. Isolated graphs might become connected. I am using a
storage object. All objects persisting there depend on the storage object.
The object encapsulates a connection (and a transaction). When a persistent
object refers to some other object, the latter will be forced to persist.
In effect, dependencies across connections cause copying of objects.
My own implementation of persistence doesn't suffer from this
problem. Although I implement a generic persistence by reachability, I
distinguish between objects that can persist from those that can't.
This is type intrusive, but I don't mind because I think orthogonal
persistence is a nice idea, but fundamentally wrong. Making all
reachable objects persist is just too inefficient in practice, and even
worse makes schema evolution a nightmare.
Firstly, that is a requirement. Secondly, to me it is not obvious why
persistence should slow down anything. A persistent object gets one weak
reference more. It is not a big deal.
I don't understand you.
This ensures atomicity of the persistent state.
It also allows for reasoning about the dependency graph : we are only
interested in its "atomic" recalculation *between* transactions.
Atomicity (and integrity) of the persistent objects leads to atomicity
and integrity in all the dependents, including the output display.
This conceptually eliminates the display of confusing, partly
calculated results, and aligns well with double buffered graphics that
atomically switch from one depiction to the next.
As I said before, persistence is a much lesser a problem, provided, you
don't want concurrent access to the persistent objects, while they are
inactive. I don't see how the above solves the problem of
publisher-subscriber *during* a transaction.
During an update transaction you are only allow to propagate dirtiness.
There is no problem.
Under transaction I meant all the time an object is memory-resident. What
you mean is a short period of time during finalization of a memory-resident
object.
What do you mean by finalization? Do you mean writing modified
persistent objects back to disk? I'm talking about a short period for
which an update transaction is performed. In a database without the
'D' for ACID properties there isn't even a requirement that the
modified objects be written back to disk as part of the transaction.
I wonder if the concept of 'stale' is a source of confusion between us.
I am referring to pure calculated values (ie dependents) that are
cached and may need to be marked as dirty. I wonder if you are
talking about the persistent state that represents the logical state of
the system, and this is marked as dirty to indicate that it needs to be
written back to disk.
I think these are totally different concepts.
When this object is still in use in the storage, then it is synced
with it. At this time there of course is no any problem whatsoever, because
the object is not referenced by any memory-resident object and thus
invisible. Otherwise, the former couldn't be finalized.
Sorry that may as well have been written in Greek. I don't understand
you.
The two-phase
approach to processing the dependency graph aligns well with this
perspective. During a mutative, transactional update to the persistent
objects, dirtiness is propagated eagerly through all affected
dependents, but they are *not* recalculated immediately. This allows
(mutative) transactions to take the shortest possible time, maximising
concurrency on the persistent objects. By contrast, recalculation of
dependents, which is CPU intensive only requires shared read mode
locking on the persistent state, leading to the possibility of
excellent parallelism.
Shared read has problems as well. I designed several implementations of
mutexes which allow shared read / exclusive write. It is not a proper place
to discuss it, but in short, their performance is disappointing (as
compared with simple locks), then there are deadlock problems in
transitions between read and write. I'll definitely make one more attempt
based on Ada protected objects.
I have implemented such a mutex myself. It works very well, perhaps
because I make no attempt to upgrade a lock.
It has been hammered with dozens of threads for hours without showing
any problems.
You must compare it with a simple lock (critical section or system mutex,
depending on the context you are talking about).
I have. For all intensive purposes it is just as fast as a system
semaphore.
The implementation I refer
is an industrial middleware, the number of processes and threads depends on
the application, hundred is a medium size. We did some measures and found
that read access (~1µs) is about 1000 times slower than plain memory read
(~1ns). The context is system wide, i.e. one have to cross process borders,
which makes things worse.
Ok. I'm talking more about an in-process system.
I favour a replication approach to achieve cross process parallelism.
Inevitably you get divergence if you avoid centralised, pessimistic
locking. I'm interested in the mathematics of achieving convergence
again, such as with operational transform.
I'm going to switch to protected objects, because they are more efficient
than mutexes (even plain ones.)
If I understand you I actually do this as well. I *don't* think
fine-grained locking and strict 2PL is best. As you say, locking a
system mutex is 1000 times slower than plain memory read.
I instead prefer very course grained locks, conservative 2PL, shared
read modes, and no need for dead-lock detection.
I don't know why you say there is an ordering problem. The only rule
is that a,b are calculated before c.
Provided that calculation of a does not influence the value of b. Typical
example: c := x^2. Funny as it may appear, but this cannot be always
implemented as c := x*x, because x^2 = x*x does not hold, when x is an
interval, x is fuzzy number, x is random, x is clock etc.
Yes, but I *assume* no such side effects.
That is the case for intervals and fuzzy numbers. The problem there is
aliasing, not side effects. For both the result of a*b depends on whether a
and b are *independent*. For example intervals: [-1,2]*[-1,2]=[-2,4], yet
[-1,2]^2=[0,4].
Hmmm. I think the functional programming perspective is too important
to ignore.
I just described my perspective in some detail in a post to Leslie
Sanford. I would greatly appreciate it if you could have a read and
comment.
BTW I'm enjoying this slightly cryptic but interesting discussion.
Cheers,
David Barrett-Lennard
.
- Follow-Ups:
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- References:
- Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Observer pattern limitations
- Prev by Date: Re: Observer pattern limitations
- Next by Date: Re: Observer pattern limitations
- Previous by thread: Re: Observer pattern limitations
- Next by thread: Re: Observer pattern limitations
- Index(es):
Relevant Pages
|