[Zope-dev] Zope components and revision control with cvs

Shane Hathaway shane@zope.com
Thu, 1 Aug 2002 23:46:08 -0400 (EDT)


On 1 Aug 2002, Gary Poster wrote:

> Given a hypothetical folder-like instance called "myFLI", we would
> presumably want, in CVS (or Subversion, or whatever) a folder named
> "myFLI" containing the children and a file named, to borrow your
> example, "myFLI.properties.zexp" that *only* contains the
> non-ObjectManager-children properties, whatever they are.  But, as I
> understand it, when you pickle an object for storage as a zexp--in the
> way the ZCVSFolder does it, for instance--you are pickling the object
> *and* its (ObjectManager) children: not what we want.
>
> This is the bigger stumbling block for me.  Is this fixable?  Overriding
> __getstate__ (I assume?) just for this seems fragile (can we guarantee
> the source of the ObjectManager children in the object, for instance?  I
> don't think so).  So that was my concern.

AdaptableStorage, which I just presented at OSCON 2002, faced the same
problem, and I wrestled with it until I found a solution.  A lot of it is
based on ideas from ZODB.  Here's my train of thought.  I'm trying to
refine my explanation each time I give it.

1) You don't have to store or version pickles.  Instead, you can ask a
series of "serializers" to convert your object to a simpler
representation, then you can store or version the simpler format.

2) The simpler representation can encode inter-object references just like
ZODB does.

3) As the serializers convert the object to a simpler representation, they
can report to a "serialization tracker" which parts of the object have
been recorded.

4) To finish idea 2, when the serializers record an inter-object
reference, they are required to report the reference to the serialization
tracker.  This tells the controlling software to look at the referenced
object, and if it is a new object not previously recorded, it is added to
the stack of objects to record.  This is similar to what
ZODB.Connection.commit() does.

5) One kind of serializer is the "leftover pickle" serializer, which
records the state of attributes and subobjects not recorded by other
serializers.  This serializer makes it safe to version any kind of object,
with the caveat that the leftover data is stored in a binary format.

6) Serializers should also be deserializers, helping ensure that anything
that can be serialized can also be deserialized.

7) The thing that I spent months (maybe even years) pondering was the
simple format.  I knew that object serializers could be very useful for
storage anywhere, as well as versioning and merging, if we could just come
up with a common format that would let us connect serializers and storage
adapters together.  I tried XML and DOM, but they were too cumbersome, and I
tried several kinds of custom classes.  Nothing really fit the bill.

I finally gave up, but by giving up I found the solution.  Sometimes
having a deadline really helps!  I started using sequences of records or,
in most cases, tuples of tuples.  I rediscovered what RDBMS experts have
known all along--that data can be easily represented as sequences of
schema-bound records.  So now all the serializers in AdaptableStorage
serialize to a sequence of records.  I admit that the requirement is a
stretch in some places, since I ended up writing code like "return
((value,),)", but there are great benefits to having a common, simple
serialization format, and Python provides just the right ingredients.

8) Serializers are configurable components.  Through the configuration
mechanism, we will be able to add serializers to the component system in
Zope 3.  We will configure which serializers should apply to which kinds
of objects.  Many applications will use them, including object persistence
mechanisms, version control adapters, merging tools, etc.  It will open up
a lot of possibilities, I think.
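
To make ideas 1 through 5 and 7 a little more concrete, here is a rough
sketch in plain Python.  It is not AdaptableStorage's actual API: the names
Tracker, Folder, serialize_title, serialize_children, serialize_leftovers
and record_all are all made up for illustration, and the integer object ids
are just a stand-in for real ZODB-style references.

```python
import pickle


class Tracker:
    """Hypothetical serialization tracker (ideas 3 and 4)."""

    def __init__(self, oid_of):
        self.recorded = set()   # attribute names some serializer handled
        self._oid_of = oid_of   # callback supplied by the controlling code

    def mark(self, name):
        self.recorded.add(name)

    def refer(self, obj):
        # Reporting a reference lets the controller queue unseen objects.
        return self._oid_of(obj)


class Folder:
    """Stand-in for a folder-like object such as "myFLI"."""

    def __init__(self, title):
        self.title = title
        self.children = {}   # name -> child object
        self.extra = None    # no specific serializer knows about this


def serialize_title(obj, tracker):
    tracker.mark("title")
    return ((obj.title,),)   # one record with one column


def serialize_children(obj, tracker):
    tracker.mark("children")
    # Record (name, oid) pairs instead of the children themselves.
    return tuple((name, tracker.refer(child))
                 for name, child in sorted(obj.children.items()))


def serialize_leftovers(obj, tracker):
    # Idea 5: pickle only what the other serializers did not record.
    rest = {k: v for k, v in vars(obj).items() if k not in tracker.recorded}
    return ((pickle.dumps(rest),),)


# Order matters in this sketch: the leftover serializer must run last.
SERIALIZERS = [
    ("title", serialize_title),
    ("children", serialize_children),
    ("leftover", serialize_leftovers),
]


def record_all(root, serializers=SERIALIZERS):
    """Walk from root, recording each reachable object exactly once."""
    oids = {id(root): 0}
    stack = [root]
    store = {}

    def oid_of(obj):
        if id(obj) not in oids:
            oids[id(obj)] = len(oids)
            stack.append(obj)   # new object: add it to the stack to record
        return oids[id(obj)]

    while stack:
        obj = stack.pop()
        tracker = Tracker(oid_of)
        store[oids[id(obj)]] = {name: ser(obj, tracker)
                                for name, ser in serializers}
    return store


root = Folder("myFLI")
root.children["a"] = Folder("child")
root.extra = 42
store = record_all(root)
```

In this sketch the folder's own record holds only (name, oid) pairs for its
children, so each child ends up as a separate entry in the store rather than
being embedded in the parent, and only the "extra" attribute lands in the
leftover pickle.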

I'm sure I left a few things out, so ask questions about the unclear
parts.  It's probably more info than you were expecting. ;-)

Shane