[Zope-dev] Suggestions towards implementing an experimental protocol for cross-database persistent object references and multi-database Zopelications

Phillip J. Eby pje@telecommunity.com
Tue, 16 Nov 1999 23:58:48 -0500


After studying Jeffrey Shell's ZLDAP package, and the current ZODB system,
in the light of recent conversations with Jim Fulton, a few lightbulbs went
on with respect to the usefulness of multi-database Zopelications.  For
example, wouldn't it be keen if regular Zope objects could 'store' object
attributes that were actually LDAP entries?  Or SQL database records?  That
would be pretty awesome.

The cool thing about the ZLDAP stuff is that the LDAP Connection object,
itself a database, is actually a persistent object stored in the regular
ZODB.  That suggests a clean and sensible way to integrate multi-database
Zopes: any given Zope installation must store connections to other
databases as persistent objects within its "root" database.  That is, any
Persistent object in a particular Zopelication should have a _p_jar
attribute which either points to the REQUEST-owned Connection, or to a jar
which meets this criterion (recursively).  This means that one could, at
least in theory, reach any database in a multidatabase system by following
an ever-expanding tree of database references.
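This "rooted jar" property can be sketched with plain Python stand-ins (a sketch only: real jars would be ZODB Connection objects, and the `Obj`/`rooted_in` names here are illustrative, not any Zope API):

```python
# Sketch of the "rooted jar" property: every object's _p_jar chain
# should eventually reach the REQUEST-owned root Connection.
class Obj:
    pass

def rooted_in(obj, root_connection):
    """True if obj's _p_jar chain eventually reaches root_connection."""
    jar = getattr(obj, '_p_jar', None)
    while jar is not None:
        if jar is root_connection:
            return True
        jar = getattr(jar, '_p_jar', None)
    return False

root = Obj()              # stands in for the REQUEST-owned Connection
ldap_conn = Obj()         # a Connection-like object stored in the root DB
ldap_conn._p_jar = root
entry = Obj()             # an object living in the LDAP "database"
entry._p_jar = ldap_conn

print(rooted_in(entry, root))   # -> True
```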

Now, if this property is followed, then it is possible for an object in any
database to refer to any object which is located "downstream" in the tree.
That is, an object O1 in database DB1 can reference object O2 in database
DB2 so long as DB2 is reached by way of a persistent object stored in DB1,
or a database thus referenced by DB1, recursively.  (Upstream references
are not possible without a global database naming system; however, there is
nothing about my suggested implementation that prevents a global naming
scheme from later being used either together with, or in place of, this
model.)


Due to this tree-oriented nature, this multidatabase model is most
appropriate to Zopelications which provide for local needs in a local
database, but need to reference other, shared databases or legacy systems.
(Note: this model does not support multi-database undo, which requires a
global naming mechanism for databases and transactions, so that data
integrity can be maintained by refusing to undo transactions unless all
databases which were involved can undo it.)

Anyway...  making references work.  Initially, to test this concept, a
simple mode of implementation at the application level is to provide a
getReference function.  If I want to store an object in one of my
attributes, I would say:

self.attribute = getReference(self,object)

The function would look something like:

def getReference(source,target):
    tgt_jar = target._p_jar  # note: fails if we can't make a reference
    if tgt_jar is source._p_jar or tgt_jar is None:
        return target  # same database (or not yet stored): use directly
    # Otherwise, reference the target's jar too, recursively.
    return RemoteReference(getReference(source,tgt_jar),target._p_oid)

This recursively builds a dereferencing object which, when retrieved from
my self.attribute later, will return the object from the correct database.
The RemoteReference class is as follows (or equivalent in C):

class RemoteReference(ExtensionClass.Base):
    def __init__(self,jar,oid):
        self.jar, self.oid = jar,oid

    def __of__(self,parent):
        # Fetch the real object from the (possibly remote) jar...
        object = self.jar[self.oid]
        # ...and acquire it in our place, if it supports acquisition.
        if hasattr(object,'__of__'):
            return object.__of__(parent)
        return object

The RemoteReference class simply refers to a jar (which must be a
persistent object) and an oid to be retrieved from the jar.  When a
RemoteReference is retrieved from an object, it will replace itself with
the result of retrieving that oid from that jar, and call __of__ on that
result.  (Note that the jar itself may be referenced by a RemoteReference;
in that case it will be unpacked when we access self.jar to use it.  Thus,
a reference "two databases deep" (or more) will be properly unpacked.)
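A toy version of the whole round trip can be exercised with dictionaries standing in for jars (assumptions: real jars are persistent Connection-like objects, and the real protocol unpacks via `__of__` rather than the explicit `isinstance` check used here):

```python
# Toy stand-ins: plain dicts act as jars, and dereference() plays the
# role that __of__ plays in the real protocol.
class RemoteReference:
    def __init__(self, jar, oid):
        self.jar, self.oid = jar, oid

    def dereference(self):
        jar = self.jar
        if isinstance(jar, RemoteReference):
            # The jar itself is remote: unpack it first, so references
            # "two databases deep" (or more) resolve correctly.
            jar = jar.dereference()
        return jar[self.oid]

db2 = {'rec1': 'a record in DB2'}
db1 = {'conn2': db2}        # DB2's connection is stored in DB1

# A reference, stored somewhere in DB1, to DB2's 'rec1':
ref = RemoteReference(RemoteReference(db1, 'conn2'), 'rec1')
print(ref.dereference())    # -> a record in DB2
```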


Notice that this works even if a portion of a database tree is isolated and
used as a root unto itself, since anything stored in a given database can
only reference objects in itself, or in databases referenced from it.

In order for this protocol to work, one need only do two things:

* Any database which wishes to be referenceable must be able to have
Connection-like objects stored as Persistent objects.

* When storing a reference to another object, one must call
getReference(self,object) and store the result, AND, self must already be
assigned a _p_jar.

The first requirement burdens database implementors, but it is not that far
out of the question.  It merely needs a Persistent object which can
delegate its behavior to a "real" Connection object of some kind.  The
second requirement burdens those who would store foreign references, and
seems a bit more severe, although in practice one will usually know when
one is doing this.
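The first requirement amounts to something like the following sketch, with a plain dict standing in for the "real" connection (the class and attribute names are made up for illustration, not part of any Zope API):

```python
# Hypothetical delegating jar: a would-be Persistent object that forwards
# retrieval to an underlying "real" connection (LDAP, SQL, whatever).
class DelegatingJar:
    def __init__(self, real_connection):
        self._real = real_connection

    def __getitem__(self, oid):
        # The only jar behavior dereferencing needs is item access.
        return self._real[oid]

ldap_like = {'cn=fred': {'cn': 'fred', 'mail': 'fred@example.com'}}
jar = DelegatingJar(ldap_like)
print(jar['cn=fred']['mail'])   # -> fred@example.com
```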

This application-level restriction could be eased by extending databases'
persistent_id mechanism (used w/cPickle) to return a RemoteReference as the
oid of an object stored in a foreign jar.  When they are asked for an
object whose oid is a RemoteReference object, they can simply return the
RemoteReference itself, or automatically dereference it.  The latter has
the potential problem of perhaps unnecessarily waking up currently dormant
databases, but I suspect it is unlikely to be a real problem in practice.
(Note that any such waking-up will be bounded by the depth of the database
tree which is currently in use, and also that this mechanism does not
preclude the future use of RemoteReferences based on a global naming
scheme, or of cyclical references under such a scheme.)
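The standard-library pickle hooks show the shape of this mechanism (a sketch only: ZODB's actual persistent_id machinery lives inside its Connection and cPickle usage, and the `(jar_name, oid)` tuple here is just a stand-in for a RemoteReference):

```python
import io
import pickle

class ForeignAwarePickler(pickle.Pickler):
    """Pickles registered foreign objects as (jar_name, oid) tokens."""
    def __init__(self, file, foreign):
        super().__init__(file)
        self.foreign = foreign   # maps id(obj) -> (jar_name, oid)

    def persistent_id(self, obj):
        # Returning None means "pickle this object normally".
        return self.foreign.get(id(obj))

class ForeignAwareUnpickler(pickle.Unpickler):
    """Automatically dereferences (jar_name, oid) tokens on load."""
    def __init__(self, file, jars):
        super().__init__(file)
        self.jars = jars

    def persistent_load(self, pid):
        jar_name, oid = pid
        return self.jars[jar_name][oid]

# A "foreign" record, known to live in the 'ldap' jar as 'entry1':
record = {'cn': 'fred'}
buf = io.BytesIO()
ForeignAwarePickler(buf, {id(record): ('ldap', 'entry1')}).dump(
    {'attribute': record})

jars = {'ldap': {'entry1': record}}
loaded = ForeignAwareUnpickler(io.BytesIO(buf.getvalue()), jars).load()
print(loaded['attribute'] is record)   # -> True
```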

To sum up, this seems like a reasonably workable approach to cross-database
references in Zope where such references proceed from private "roots" to
shared "leaves" of a database tree.  It is incrementally implementable, and
does not initially require changing any part of the existing Zope
framework.  But, with additional effort, it can be scaled up to provide
better ease-of-use and generality.  Creating a "database" that can take
advantage of the protocol could be almost as simple as making a Persistent
object whose __getitem__ method calls an SQL method to retrieve something
from a database, then sets object._p_jar=self and
object._p_oid=retrieval_key.  Presto, you now have an SQL record which can
be "pointed to" by ZODB objects, which need not concern themselves with the
details of SQL involved.
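That last idea can be sketched with sqlite3 standing in for the SQL method (the table, columns, and class names are invented for illustration; a real version would be a Persistent object using a Zope SQL method):

```python
import sqlite3

class SQLRecord:
    """A plain record object that can be tagged with _p_jar/_p_oid."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

class SQLJar:
    """A would-be Persistent "database" whose __getitem__ runs SQL."""
    def __init__(self, conn):
        self.conn = conn

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT name, qty FROM items WHERE id = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        record = SQLRecord(name=row[0], qty=row[1])
        record._p_jar = self    # tag it so it looks persistent...
        record._p_oid = key     # ...and so getReference() could use it
        return record

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'widget', 42)")

jar = SQLJar(conn)
rec = jar[1]
print(rec.name, rec._p_oid)   # -> widget 1
```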

At this point, all sorts of application ideas begin bubbling through my
head, ranging from having counter-type objects stored in suitable storages,
to having "storage-managed object pools", a concept Ty and I have been
batting around for some time as a means of reducing certain types of
write-contention in large applications, and for taking advantage of
BerkeleyDB and other databases' native indexing facilities.  Anyway,
further applications are left as an exercise for the reader.  :)

Comments?