[Zope] - Hints and strategical info for database migration needed (longish)

Jim Fulton jim.fulton@digicool.com
Fri, 08 Jan 1999 14:38:14 -0500


Stefan Franke wrote:
> 
> After some days of evaluation I decided that Zope is (surprise!-)
> exactly what I need for my current application.
> I'm trying to publish some thousand elements of strictly hierarchical
> data with an advanced search interface (for the user) and a management
> interface for which Zope's standard interface would be just fine.
> 
> Currently I generate my data structure - an attributed tree with cyclic
> links back from each element to its father - with a python program from
> a flat file. I'm a bit uncertain about the best way to migrate this
> structure into Zope and would appreciate some opinions about it.
> 
> Since I would like to recycle Zope's management interface, it seems
> better to have the data element-wise included into the persistent
> storage with a separate product class for each node or leaf type in
> my structure (the tree itself is very heterogeneous) than to write an
> interface to my existing data management (not speaking of the other
> benefits BoboPOS provides).
> 
> The questions I'm concerned about are
> - The performance of the query functions: At the moment, the whole
>   data fits well into memory (few thousand elements / ~10 MB). How
>   big is the overhead in speed and size if I turn each element into
>   a persistent product instance.

I don't think that persistence will add significantly
to the memory usage.

Memory usage depends on the "state" of the object.

Persistent objects can be in one of three states
wrt persistence:

  - Not in memory

    The storage required for objects that are not in memory depends
    on the storage manager used.  The current storage manager
    consumes about 6 bytes per persistent object whether
    or not the object is in memory.

    In the next generation of the database, there will be storage
    managers that do not impose per-object memory costs.

  - In memory and active

    In addition to the storage used by the object, there is
    about 26 bytes of overhead on 32-bit machines.  On 64-bit
    machines, the overhead is probably a little less than 
    twice this.

  - In memory but not active

    Inactive objects have the same persistence overhead as active
    objects, but they usually consume much less memory because their
    state (e.g. instance dictionary items) is not in memory.

An important point to keep in mind is that most of the time, 
only a small percentage of your database is in memory.

Depending on your access patterns, the memory consumed by the
persistent objects should be much less than that required
to load the entire non-persistent network in memory.    
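
As a rough back-of-the-envelope check, the figures above can be turned
into a small calculation. This is only a sketch: the 6 and 26 byte
values are the approximate 32-bit numbers quoted above, and the object
counts (5000 elements, 500 of them loaded) are assumptions chosen for
illustration.

```python
# Approximate per-object costs quoted above (32-bit machines).
STORAGE_BYTES_PER_OBJECT = 6    # paid for every object, loaded or not
IN_MEMORY_OVERHEAD_BYTES = 26   # paid only for objects currently in memory


def persistence_overhead(total_objects, objects_in_memory):
    """Estimate the persistence bookkeeping cost in bytes.

    The object's own state is not included, only the overhead.
    """
    if not 0 <= objects_in_memory <= total_objects:
        raise ValueError("objects_in_memory must be between 0 and total_objects")
    return (total_objects * STORAGE_BYTES_PER_OBJECT
            + objects_in_memory * IN_MEMORY_OVERHEAD_BYTES)


# Hypothetical database of 5000 elements.
everything_loaded = persistence_overhead(5000, 5000)   # 160000 bytes
one_tenth_loaded = persistence_overhead(5000, 500)     # 43000 bytes
print(everything_loaded, one_tenth_loaded)
```

Even with every object loaded, the estimate is about 160 KB, which is
small next to a data set of roughly 10 MB.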

> - Are Products the right way to go?

Yes.
 
> - Are there any problems with the inherent cyclic structure of the
>   data in conjunction with the persistent storage?

Actually, the persistence machinery buys you a lot here.
The cache manager automagically breaks circular references
when it deactivates objects.
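
The effect can be illustrated with a toy model. This is not BoboPOS
code: the Node class and its deactivate() method are hypothetical, and
deactivation is modelled simply as dropping the instance state. A real
deactivated object would reload its state on the next access, which
this sketch does not attempt.

```python
import weakref


class Node:
    """Toy tree node with a link back to its parent (not BoboPOS code)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def deactivate(self):
        # Assumption: deactivation is modelled as dropping the instance
        # state, which also drops every reference held in that state.
        self.__dict__.clear()


def demo():
    root = Node("root")
    child = Node("child", parent=root)
    child_ref = weakref.ref(child)
    del child  # the child is now reachable only through root.children

    alive_before = child_ref() is not None
    root.deactivate()  # drops root.children, breaking the root <-> child cycle
    alive_after = child_ref() is not None
    return alive_before, alive_after


print(demo())
```

On CPython, which frees an object as soon as its reference count drops
to zero, the child is alive before the deactivation and gone after it,
without any help from a cycle collector.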
 
> - Let's say I would subclass Zope's folder class for my inner nodes.
>   I think the links to the containing objects are provided anyway due
>   to the acquisition structure?

Right.  You don't need to subclass folder to get this.
This is a feature of acquisition.
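
The idea behind acquisition can be sketched in a few lines of plain
Python. This is a toy model only, not the Acquisition machinery that
Zope uses: the Acquirer class and the _container attribute are
hypothetical names chosen for illustration.

```python
class Acquirer:
    """Toy object that looks up missing attributes in its container."""

    def __init__(self, container=None):
        self._container = container

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup has failed.
        if name.startswith("_"):
            raise AttributeError(name)
        container = self.__dict__.get("_container")
        if container is None:
            raise AttributeError(name)
        # Walk up the containment chain, one container at a time.
        return getattr(container, name)


root = Acquirer()
root.site_title = "Catalog"

inner = Acquirer(container=root)
leaf = Acquirer(container=inner)

# The leaf has no site_title of its own, so the value comes from root.
print(leaf.site_title)
```

In this model each object only needs a link to its container, and the
link is supplied when the object is placed in the hierarchy rather than
by the class it inherits from.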
 
> - And most important: How do I migrate my data into the Z database? It
>   would be cool for me using a HTTP file upload to completely replace
>   the interned database with the data from the uploaded file (I could
>   retain my existing flat file parser).

This should be straightforward.

>   But how do I access the persistent storage from (let's say) the
>   external method performing the upload?

A major goal of the persistence mechanism used by Zope is transparency.
You access persistent storage by simply modifying objects.

>   Of course, I have seen the
>   BoboPOS docs, but what are the concrete instances I can access from
>   an external method?

If your method has a 'self' argument, it will be passed the folder
in which the method was invoked.

Your upload method will look something like this:

  def myupload(
       self,       # The folder that will contain the tree
       id,         # The id of the tree in the folder (i.e. attr name)
       title,      # An optional descriptive title
       data_file   # The raw data
       ):

     # Compute the tree object.  This is pretty much the
     # same thing you have now, except that the various instances
     # in the tree now mix in BoboPOS.Persistent and follow
     # the few basic rules for dealing with subobjects.
     tree=myOldParserFunction(data_file)

     tree.id=id
     tree.title=title

     # You could just:
     #   setattr(self, id, tree)
     # to poke the tree in the folder and make it persistent, 
     #
     # but you probably want to manage the tree through the
     # Zope interface, so instead you'll:
     self._setObject(id, tree)
     
Note that none of the code above has anything to do with 
persistence. :)
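
Because the upload function only touches plain objects, it can be
tried outside Zope with stand-ins. The following is a sketch under
stated assumptions: Persistent here is an empty placeholder for
BoboPOS.Persistent, FakeFolder mimics only the _setObject call used
above, and parse_flat_file with its one-name-per-line format is a
hypothetical replacement for the existing parser.

```python
import io


class Persistent:
    """Placeholder for BoboPOS.Persistent so the sketch runs outside Zope."""


class TreeNode(Persistent):
    def __init__(self, name):
        self.name = name
        self.children = []


class FakeFolder:
    """Hypothetical stub that mimics only the _setObject call."""

    def __init__(self):
        self.objects = {}

    def _setObject(self, id, obj):
        self.objects[id] = obj


def parse_flat_file(data_file):
    """Hypothetical parser: one child name per non-empty line."""
    root = TreeNode("root")
    for line in data_file:
        name = line.strip()
        if name:
            root.children.append(TreeNode(name))
    return root


def myupload(self, id, title, data_file):
    tree = parse_flat_file(data_file)
    tree.id = id
    tree.title = title
    self._setObject(id, tree)


folder = FakeFolder()
myupload(folder, "catalog", "My catalog", io.StringIO("alpha\nbeta\n\n"))
stored = folder.objects["catalog"]
print(stored.title, [child.name for child in stored.children])
```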

> Though being generally a little bit overwhelmed by the whole Z
> documentation and confused by the Bobo/Principia legacy naming, the
> latter is one of the most unclear parts for a newbie like me: What is
> the surrounding API in a published module. What
> functions/modules/globals
> can I access?

We're working on improving the documentation.  A lot of it is
there.  There's a lot of good development documentation at:

  http://www.zope.org/Documentation/Reference
  
Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email
address may not be added to any commercial mail list with out my
permission.  Violation of my privacy with advertising or SPAM will
result in a suit for a MINIMUM of $500 damages/incident, $1500 for
repeats.