Re: Observer pattern limitations



Dmitry A. Kazakov wrote:
On 13 Jul 2006 06:36:55 -0700, David Barrett-Lennard wrote:

Dmitry A. Kazakov wrote:

No, as with observer, GC does not solve this problem. The perspective must
be not "graph theoretic." I should be *just* theoretic. The problem is that
you don't have formally specified the requirements on what we call cache,
notification, dependency etc. Once you had, you would quickly discover that
the requirements are contradictory. Then the actual work begins: to remove
contradictions from there.

I like the following perspective : You have a graph of objects that
are persisted on disk (perhaps using an OODBMS). Conceptually these
objects are behaviourless before main() begins and after main() exits.
Therefore the dependency graph has no need to persist.

No it does with the objects, they know what they depend on. Note also that
there are good reasons why dependencies should be kept separately from the
objects. One of them is GC in the persistent storage. I used this approach
in my implementation of object persistence.

I associate most of the persistent state with the "independents".
Making a calculation result persist is (only) useful if it takes less
time to load it from disk than it takes to calculate it. Hard-disk
seeks are slow compared to the CPU so it is often undesirable to
persist calculation results. It increases the burden of what needs to
be written to disk and the scope of a transaction which hurts
concurrency.

Of course sometimes results of calculations need to be written to disk,
but I regard that as another problem entirely. I tend to solve that by
using a persistent map from inputs to outputs that caches the last n
calculations. The map is not a dependent in the dependency graph
because there is no sense in which it needs to be marked as dirty,
because the calculated value is keyed on the inputs on which the
calculation was based. A map entry is never "wrong". Entries are
evicted according to the likelihood of seeing the same input values
again.


In fact,
dependents are generally transient and don't persist at all. All
behaviour in the objects is ultimately derived from an implicit
"liveliness" contract derived from main(). To help make the system
"operate", main() may directly or indirectly open a persistent store,
socket connections, create threads etc. A dependency graph results
from a liveliness contract in certain "observables" such as the
display. This requires that the display be redrawn at regular
intervals (say).

I am unsure what you've meant here. Normally it is a bad idea to traverse
dependencies backwards.

The dependency graph is assumed to be a DAG. When independents are
changed (during an update transaction on the persistent objects), all
affected dependents are eagerly marked as dirty. This is analogous to
the notification from subject towards observer in the observer pattern.
However the notification handler does nothing more than mark itself
as dirty (if not already dirty) and forwards dirtiness onwards to all
downstream dependents.

I call this "phase 1". It only happens during an update
transaction on the persistent objects / independents. No recalculation
of the (transient) dependents is allowed to occur in phase 1. This
allows the update transaction to be performed in the smallest possible
amount of time, because propagating dirtiness is very fast. It also
allows the "dust to settle" from the perspective of recalculating
dependents, and avoids problems like dirty reads.

Any number of update transactions can be applied in sequence. A
dependent can be marked as dirty again and again. It doesn't matter
how many times. Dirtiness only has to be propagated onwards when a
node first transitions from clean to dirty.

Later, in a separate phase (I'll call it phase 2) we may want to
update the display. We open a shared read transaction on the
persistent objects. This ensures that we see a consistent snapshot of
all the persistent objects (ie independents). If necessary it will
block further update transactions. In practice an OODBMS may use
strict 2PL and need to abort transactions in case of dead-lock.

The purpose of phase 2 is to recalculate dependents, including the
display. We know that we won't need to upgrade the lock to support
updates, because we know we will not be doing any mutative work to the
independents. Only the dependents need to be recalculated and they
don't persist so they are outside the scope of the objects protected
by the transaction.

Phase 2 begins with a call to recalculate some dependent. This is
typically a "leaf" in the DAG. This is coded in the manner of a
functional programming language. Ie "get" functions are called on
objects that are *assumed* to not have side affects (apart of course
from recalculating the cached value and marking it as clean again).
The result is that recalculation is naturally done in the order
dictated by the functional dependencies. This corresponds to the
reverse order of the dependency edges, because functions make calls in
a depth first manner and only generate results as the stack unwinds.

It is possible to take advantage of shared read locking modes. For
example, some of the scene graph engines currently on the market
perform the work in parallel on the assumption that multiple threads
have shared read access to the scene graph.

As a general rule, update transactions don't need to perform
calculation so they can avoid holding exclusive locks for long periods.
During shared read access, it is possible to concurrently perform
query, recalculate dependents, perform a tracing GC etc.

I have found this approach to be very fast, efficient and easy to
reason about.


The dependency graph can be complex or simple; it
depends on the problem at hand. Normally it won't contain cycles, so
for a given snapshot of all the independents, all the dependent values
(including the screen) are *precisely* defined as pure functional
dependencies. Now the independents typically correspond to the
persistent sub-graph of objects (ie a sub-graph of all the objects that
reside in memory). This sub-graph has ACID properties and requires
transactional access.

That's problematic. Isolated graphs might become connected. I am using a
storage object. All objects persisting there depend on the storage object.
The object encapsulates a connection (and a transaction). When a persistent
object refers to some other object, the latter will be forced to persist.
In effect, dependencies across connections cause copying of objects.

My own implementation of persistence doesn't suffer from this
problem. Although I implement a generic persistence by reachability, I
distinguish between objects that can persist from those that can't.
This is type intrusive, but I don't mind because I think orthogonal
persistence is a nice idea, but fundamentally wrong. Making all
reachable objects persist is just too inefficient in practice, and even
worse makes schema evolution a nightmare.


This ensures atomicity of the persistent state.
It also allows for reasoning about the dependency graph : we are only
interested in its "atomic" recalculation *between* transactions.
Atomicity (and integrity) of the persistent objects leads to atomicity
and integrity in all the dependents, including the output display.
This conceptually eliminates the display of confusing, partly
calculated results, and aligns well with double buffered graphics that
atomically switch from one depiction to the next.

As I said before, persistence is a much lesser a problem, provided, you
don't want concurrent access to the persistent objects, while they are
inactive. I don't see how the above solves the problem of
publisher-subscriber *during* a transaction.

During an update transaction you are only allow to propagate dirtiness.
There is no problem.


The two-phase
approach to processing the dependency graph aligns well with this
perspective. During a mutative, transactional update to the persistent
objects, dirtiness is propagated eagerly through all affected
dependents, but they are *not* recalculated immediately. This allows
(mutative) transactions to take the shortest possible time, maximising
concurrency on the persistent objects. By contrast, recalculation of
dependents, which is CPU intensive only requires shared read mode
locking on the persistent state, leading to the possibility of
excellent parallelism.

Shared read has problems as well. I designed several implementations of
mutexes which allow shared read / exclusive write. It is not a proper place
to discuss it, but in short, their performance is disappointing (as
compared with simple locks), then there are deadlock problems in
transitions between read and write. I'll definitely make one more attempt
based on Ada protected objects.

I have implemented such a mutex myself. It works very well, perhaps
because I make no attempt to upgrade a lock.

It has been hammered with dozens of threads for hours without showing
any problems.


No, that won't solve the problem when you have pre-written components
in a third party library and you need to add the objects that represent
the "wiring". What deletes the wiring?

GC. You have a handle to it.

In my example only the subject does.

So the problem.

I'm talking about examples like "c := a.get() + b.get()". The
act of executing that code causes a and b to be calculated *before* c
is calculated. Lazy evaluation is great for solving most order of
calculation problems.

It allows enforcing order on operands, but that does not solve problem when
get() has side effects.

But it shouldn't. As I said, I'm taking a functional programming
perspective.

Then you wouldn't need laziness. You are talking about caching, that's
*not* functional.

Agreed. But I claim that there is a nice way of thinking that combines
the functional perspective together with caching to get the best of
both worlds.

I have implemented this. It works well.

a and b are stateful and there is an independent source
which changes their states. There are some premises required to make your
proposition true. In general case, these premises don't hold.

This is why I suggested you to formalize the problem you have.

You will also have aliasing problem and one of
ordering operands of "+".

I don't know why you say there is an ordering problem. The only rule
is that a,b are calculated before c.

Provided that calculation of a does not influence the value of b. Typical
example: c := x^2. Funny as it may appear, but this cannot be always
implemented as c := x*x, because x^2 = x*x does not hold, when x is an
interval, x is fuzzy number, x is random, x is clock etc.

Yes, but I *assume* no such side effects. The function is only allowed
to have the side effect of recalculating the cache and marking itself
as clean again.


Note also that when there is no side effects, there is also no need to have
it lazy. Marshal values of a and b to c, that's it. Lazy can be great for
dealing with side-effects, but it does not eliminate the ordering problem.
It is a tool.

The problem space has to be analysed to find a solution of the *narrower*
problem.

Hmmm. I don't understand what you mean.

See posts of other people. From Lahman to Martin (usually I disagree with
both (:-)), they all make the point about applicability of the observer
pattern. Don't mix the problem and solution spaces.

[ As for me, I'm just trying to illustrate this point on war stories,
because publisher-subscriber stuff is my daily business. ]

My daily business as well. It is good to compare notes!

Cheers,
David Barrett-Lennard

.



Relevant Pages

  • Re: Observer pattern limitations
    ... Further I distinguish references in memory to ones into the store. ... I don't really see why you would want dependents to persist on disk, ... The object encapsulates a connection (and a transaction). ...
    (comp.object)
  • Re: Observer pattern limitations
    ... "liveliness" contract derived from main. ... Further I distinguish references in memory to ones into the store. ... The object encapsulates a connection (and a transaction). ... and integrity in all the dependents, ...
    (comp.object)
  • Re: New Orleans police to be pulled off streets
    ... >dependents, and how much would that be worth in court. ... The calculation goes ... Plutocracy prevails in practice, but when it prevails in principle, ... We need more honest schoolbooks. ...
    (soc.retirement)