Re: Observer pattern limitations
- From: "David Barrett-Lennard" <davidbl@xxxxxxxxxxxx>
- Date: 15 Jul 2006 18:05:09 -0700
Dmitry A. Kazakov wrote:
On 14 Jul 2006 15:38:35 -0700, David Barrett-Lennard wrote:
Dmitry A. Kazakov wrote:
On 13 Jul 2006 21:03:24 -0700, David Barrett-Lennard wrote:
Dmitry A. Kazakov wrote:[...]
On 13 Jul 2006 06:36:55 -0700, David Barrett-Lennard wrote:
In fact,
dependents are generally transient and don't persist at all. All
behaviour in the objects is ultimately derived from an implicit
"liveliness" contract derived from main(). To help make the system
"operate", main() may directly or indirectly open a persistent store,
socket connections, create threads etc. A dependency graph results
from a liveliness contract in certain "observables" such as the
display. This requires that the display be redrawn at regular
intervals (say).
I am unsure what you've meant here. Normally it is a bad idea to traverse
dependencies backwards.
I have found this approach to be very fast, efficient and easy to
reason about.
My approach is different. The closure is calculated each time upon
finalization which causes an object to persist. Because most of the
referred objects already persist traversal stops almost immediately.
Further I distinguish references in memory to ones into the store. This
allows to have in-the-store objects, which never become memory resident.
That's important to me, because some objects might be very large.
Most OODBMS's fault objects into memory on demand. Many in fact are
page based (like Object Store), but the idea is the same.
It is an independent issue. I prefer to do it at once to minimize
distributed overhead of dereferencings. Lazy objects are implemented in a
different way. The object stays resident in the DB, only its small proxy
becomes memory-resident. That proxy can do cache-on-demand, with the same
effect as lazy references. It is more flexible, IMO.
Here is an attempt to understand you...
By "at once" I assume you mean synchronously or eager (rather than
postponed). I assume you are talking about some traversal over the
modified objects so that all the changes are written synchronously to
disk (apart from write caching). Furthermore this discovers new
objects that haven't been written to disk before. I presume they need
to be given some kind of persistable object identifier (OID). Somehow,
you are able to ensure that a consistent snapshot is written to disk.
You call this whole process "finalisation".
Independently of that, there is a concept of late binding to objects.
I presume a "lazy reference" is a smart pointer that only binds when it
is first dereferenced. That raises the question of when it unbinds (if
ever). Anyway you distinguish between a proxy and what it binds to.
Therefore you don't really consider objects to be "faulted into
memory". Instead it is interpreted as binding.
Are you familiar with stub-scion chains? Is that what you're doing?
Swizzling refers to the idea to convert persistent references to memory
pointers, and is done when an object or page is faulted into memory.
Unswizzling is required when a page is dirty and needs to be written
back to disk.
I have written my own OODB that uses software swizzling and faults
objects into memory as smart pointers are dereferenced. The smart
pointer holds an OID (which is what persists) as well as the (mutable)
memory pointer.
I don't really see why you would want dependents to persist on disk,
and therefore why the dependency graph has much to do with the solution
for persistence.
Because you might wish to collect them.
I assume you mean GC.
Consider A, B -> C.
I assume you mean A,B,C are objects and A,B independently contain
references to C. [You are *not* talking about a dependency, ie that C
is calculated from A,B]
Then A gets
memory resident it forces C as well.
I assume you mean the reference to C from A is followed, causing C to
become memory resident.
Later on, A is destroyed. Then, you
have to decide if C should vanish only from the memory or from DB as well.
You cannot do it because B is out.
I assume you mean B is not resident in memory.
In my system memory-resident C is
finalized. Upon this the DB GC is run to determine whether C has to be
removed from the DB. If B exists, C is synced to DB. If not, C is removed
from DB.
IMO there are two important views of the system, and therefore two
types of GC. There is a "transient view" which considers all objects
in memory (both independents and dependents), and a "persistent view"
which only considers the independents.
If you take the pure functional dependency perspective on the dichotomy
of dependents versus independents, then *by definition*, dependents
contain references to independents (or other dependents) but never the
reverse. Dependents only exist for the purpose of caching the results
of calculations. The ability to navigate from independents to
dependences simply exists to allow cached results to be found or to be
marked as dirty.
There is a theoretical sense in which all the dependents can be
completely eliminated from the system without impacting the program
functionality. After all, caching is only for performance. For that
reason, it is possible to persist the independents without the
dependents, and it is also possible to GC the independents without the
dependents. ie by specifically not following the navigation links from
independents toward dependents. These are exactly the "observer
pattern" links - ie when a subject is able to notify its observers by
storing pointers to them.
Firstly, that is a requirement. Secondly, to me it is not obvious why
persistence should slow down anything. A persistent object gets one weak
reference more. It is not a big deal.
I don't understand you.
A memory resident object that persists has a weak reference from the
storage object where it persists. This has no [sufficient] overhead.
What I mean is that dependents include things like screen pixel values.
I assume you agree it would be very inefficient to persist snapshots
of the screen. How do you reconcile that?
This ensures atomicity of the persistent state.
It also allows for reasoning about the dependency graph : we are only
interested in its "atomic" recalculation *between* transactions.
Atomicity (and integrity) of the persistent objects leads to atomicity
and integrity in all the dependents, including the output display.
This conceptually eliminates the display of confusing, partly
calculated results, and aligns well with double buffered graphics that
atomically switch from one depiction to the next.
As I said before, persistence is a much lesser a problem, provided, you
don't want concurrent access to the persistent objects, while they are
inactive. I don't see how the above solves the problem of
publisher-subscriber *during* a transaction.
During an update transaction you are only allow to propagate dirtiness.
There is no problem.
Under transaction I meant all the time an object is memory-resident. What
you mean is a short period of time during finalization of a memory-resident
object.
What do you mean by finalization?
Actions before freeing the memory allocated by the object.
Ok. Java terminology.
Do you mean writing modified
persistent objects back to disk?
Yes, if the finalized object persists and was modified.
How do you ensure integrity if objects are only written to disk when
they are finalised? What if there is a power failure, so your data is
in an inconsistent state? I'm talking about the problem of a
consistent cut or snapshot.
I'm talking about a short period for
which an update transaction is performed. In a database without the
'D' for ACID properties there isn't even a requirement that the
modified objects be written back to disk as part of the transaction.
[ OK, DB terminology is mostly rubbish. (I have checked it, this is not
cross-posted to c.d.t (:-)).
LOL
There is no "durability", no such thing. There
is a larger system which includes the DBMS, that system is never off. If
you don't need to update the object, then that means that logically it is
not in the scope of the larger system. Quite simple, isn't it? ]
I'm struggling with understanding what you mean. In that sense I'm not
finding this simple at all!
I wonder if the concept of 'stale' is a source of confusion between us.
I am referring to pure calculated values (ie dependents) that are
cached and may need to be marked as dirty. I wonder if you are
talking about the persistent state that represents the logical state of
the system, and this is marked as dirty to indicate that it needs to be
written back to disk.
It is probably a terminology confusion between us, see my rant above.
When this object is still in use in the storage, then it is synced
with it. At this time there of course is no any problem whatsoever, because
the object is not referenced by any memory-resident object and thus
invisible. Otherwise, the former couldn't be finalized.
Sorry that may as well have been written in Greek. I don't understand
you.
Syncing with the DB happens upon finalization of memory-resident [proxy]
objects, as I described above.
You must compare it with a simple lock (critical section or system mutex,
depending on the context you are talking about).
I have. For all intensive purposes it is just as fast as a system
semaphore.
That's interesting. Do you use Windows or POSIX? Can you outline the
algorithm?
I use the semaphore in Win32. I last measured it some time ago on a
2GHz Pentium IV. It can be locked then unlocked in 2us. I used the
standard algorithm for exclusive write or shared read locks and
obtained the same performance because the time is dominated by the
semaphore. Note that once a shared read lock has already been gained,
subsequent shared read locks are granted far more quickly because they
only lock a Win32 critical section to protect the read count, and a
critical section is about 25 times faster than a semaphore on that
machine.
The implementation I refer
is an industrial middleware, the number of processes and threads depends on
the application, hundred is a medium size. We did some measures and found
that read access (~1µs) is about 1000 times slower than plain memory read
(~1ns). The context is system wide, i.e. one have to cross process borders,
which makes things worse.
Ok. I'm talking more about an in-process system.
I favour a replication approach to achieve cross process parallelism.
Inevitably you get divergence if you avoid centralised, pessimistic
locking.
The problem is that apart from memory consumption, replicas are very slow
for communication. Consider a system which uses your thing as a middleware.
I.e. one process writes an object, while another reacts on the change. In a
commercial system for Windows I developed the latencies are 10µs + process
and threads switching overhead. With replicas it would be much slower.
I'm going to switch to protected objects, because they are more efficient
than mutexes (even plain ones.)
If I understand you I actually do this as well.
Unlikely. Protected objects require language support. They are faster than
mutexes, because they allow to perform protected actions on an arbitrary
context. That can sufficiently reduce the number of context switches (such
as threads). The overhead of taking a mutex is same, you can't get rid of
it anyway.
OK, replica is a different way = you marshal everything. The problem is, as
I said, with 1) mutable things, 2) a large number of subscribers. Basically
replicas are kind of broadcasting, which is "functional", in some sense.
I *don't* think fine-grained locking and strict 2PL is best.
Yes
As you say, locking a
system mutex is 1000 times slower than plain memory read.
That is more than just locking. It is total read access overhead. This
includes multiple locking and fetching the data.
I don't know why you say there is an ordering problem. The only rule
is that a,b are calculated before c.
Provided that calculation of a does not influence the value of b. Typical
example: c := x^2. Funny as it may appear, but this cannot be always
implemented as c := x*x, because x^2 = x*x does not hold, when x is an
interval, x is fuzzy number, x is random, x is clock etc.
Yes, but I *assume* no such side effects.
That is the case for intervals and fuzzy numbers. The problem there is
aliasing, not side effects. For both the result of a*b depends on whether a
and b are *independent*. For example intervals: [-1,2]*[-1,2]=[-2,4], yet
[-1,2]^2=[0,4].
Hmmm. I think the functional programming perspective is too important
to ignore.
I just described my perspective in some detail in a post to Leslie
Sanford. I would greatly appreciate it if you could have a read and
comment.
I don't ignore it, I just point out its limitations. Some of them are
fundamental, like need of independence analysis above.
I don't see that!
Cheers,
David Barrett-Lennard
.
- Follow-Ups:
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- References:
- Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Re: Observer pattern limitations
- From: David Barrett-Lennard
- Re: Observer pattern limitations
- From: Dmitry A. Kazakov
- Observer pattern limitations
- Prev by Date: Re: Observer pattern limitations
- Next by Date: Re: Observer pattern limitations
- Previous by thread: Re: Observer pattern limitations
- Next by thread: Re: Observer pattern limitations
- Index(es):
Relevant Pages
|