[Grok-dev] Notes from working on export/reimport of data

Sebastian Ware sebastian at urbantalk.se
Mon Aug 25 05:20:38 EDT 2008


I just want to write a short note about my experience working with
export/reimport of data in Grok/ZODB. As you might know, I am not a
stellar developer, but what I did might help someone, or maybe inspire
someone to come up with a better solution.

My problem was that I needed a solution that allowed me to move data
between two versions of my application: a production version and a
refactored (moved modules etc.) development version. In a relational
database, you can export a table to a tab-separated file and import it
into any other table. This is the flexibility I wanted to achieve.

For the simple case (export/reimport between compatible versions of an
application, without any data manipulation) I found that pickling
worked if one overrides the __parent__ attribute of the root object
in the state returned by __getstate__(). This solution was also used
by Kevin Smith (I believe); however, it probably has drawbacks for
normal operations.

    def __getstate__(self):
        state = dict(self.__dict__)
        state['__parent__'] = None
        return state
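
A minimal sketch of how that override behaves with plain pickle,
outside Grok and the ZODB. The Node class and its attribute names are
hypothetical stand-ins for a real model class; the sketch only shows
that the parent reference is dropped from the pickled state:

```python
import pickle


class Node(object):
    """Hypothetical stand-in for a Grok model; not a real Grok class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.__parent__ = parent

    def __getstate__(self):
        # Copy the instance dict and cut the link to the parent, so
        # pickling stops at this object instead of walking up the tree.
        state = dict(self.__dict__)
        state['__parent__'] = None
        return state


site = Node('site')
app = Node('app', parent=site)

# Round trip through a pickle string, as an export/reimport would do.
restored = pickle.loads(pickle.dumps(app))
```

The live object keeps its parent; only the pickled copy loses it, so
the importing side has to set __parent__ again when it adds the object.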

When it comes to moving data between refactored versions of my
application, I need to instantiate each object as a new class. I was
given hints earlier about how to move modules by means of module
aliases, but I was stumbling a bit, so I chose an approach that I found
easier to get my head around.
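
For comparison, this is roughly what the module alias idea looks like
with plain pickle. It is only a sketch of the general technique, not
what I ended up using, and the module names (oldapp_models,
newapp_content) and classes are made up for the illustration:

```python
import pickle
import sys
import types

# Simulate the old layout: a module 'oldapp_models' holding class Page.
old_module = types.ModuleType('oldapp_models')


class Page(object):
    def __init__(self, title):
        self.title = title


Page.__module__ = 'oldapp_models'
old_module.Page = Page
sys.modules['oldapp_models'] = old_module

# The pickle records the class as 'oldapp_models.Page'.
data = pickle.dumps(Page('Home'))

# Simulate the refactoring: the old module is gone and the class now
# lives in 'newapp_content'.
del sys.modules['oldapp_models']
new_module = types.ModuleType('newapp_content')


class RefactoredPage(object):
    """The same model after the move (hypothetical)."""


RefactoredPage.__module__ = 'newapp_content'
new_module.Page = RefactoredPage
sys.modules['newapp_content'] = new_module

# The alias: lookups of the old module name now find the new module.
sys.modules['oldapp_models'] = new_module

restored = pickle.loads(data)
```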

Instead of storing a pickle of all the objects, I stored a pickle of a
list of dictionaries describing the data in the objects. Since the
attributes can contain objects, I replaced these references with a
tag <objectref id="###"> holding a file-specific object id, which
allows me to replace this string with the recreated object during
import. Something a bit like this:

    record = {}
    record['type'] = type(obj)
    record['id'] = counter  # file-specific object id
    # all attributes except __parent__
    record['model'] = [(k, v) for k, v in obj.__dict__.items()
                       if k != '__parent__']
    record['container'] = list(obj.items())
    record['annotations'] = list(obj.__annotations__.items())
    objList.append(record)
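
A self-contained sketch of the export step, including the reference
substitution. The Item class and the export_objects() helper are
hypothetical, and storing the class name as a string (rather than the
type itself) is an assumption made here so that the pickle contains
only plain data:

```python
import pickle


class Item(object):
    """Hypothetical model class used only for this sketch."""

    def __init__(self, title, related=None):
        self.title = title
        self.related = related  # may point at another Item
        self.__parent__ = None


def export_objects(objects):
    """Describe each object as a plain dict and swap object references
    for '<objectref id="N">' tags that are only valid within this file."""
    ids = dict((id(obj), n) for n, obj in enumerate(objects))
    records = []
    for obj in objects:
        model = []
        for key, value in obj.__dict__.items():
            if key == '__parent__':
                continue
            if id(value) in ids:
                # The attribute refers to another exported object.
                value = '<objectref id="%d">' % ids[id(value)]
            model.append((key, value))
        records.append({
            'type': type(obj).__name__,  # assumption: class name only
            'id': ids[id(obj)],
            'model': model,
        })
    return pickle.dumps(records)


first = Item('first')
second = Item('second', related=first)
data = export_objects([first, second])
```

Because the file holds only lists, dictionaries and strings, it can be
loaded without any of the application's classes being importable.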

During import, I create a new object using the type identifier and an
"if this then that" kind of selector... (I guess this could be done
using adapters, but I wanted a low-tech solution that just works first).
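
The import side could look something like the sketch below. It works
in two passes so that a reference can point at an object that appears
later in the file. Article, Folder, create() and import_records() are
made-up names, and the records are assumed to carry the class name as
a string:

```python
import re


class Article(object):
    """Hypothetical class in the refactored application."""


class Folder(object):
    """Hypothetical class in the refactored application."""


OBJECTREF = re.compile(r'^<objectref id="(\d+)">$')


def create(type_name):
    # The low-tech "if this then that" selector.
    if type_name == 'Article':
        return Article()
    elif type_name == 'Folder':
        return Folder()
    raise ValueError('unknown type: %r' % type_name)


def import_records(records):
    # First pass: create every object so references can be resolved.
    created = {}
    for record in records:
        created[record['id']] = create(record['type'])
    # Second pass: set attributes, replacing reference tags by objects.
    for record in records:
        obj = created[record['id']]
        for key, value in record['model']:
            match = OBJECTREF.match(value) if isinstance(value, str) else None
            if match:
                value = created[int(match.group(1))]
            setattr(obj, key, value)
    return created


records = [
    {'type': 'Folder', 'id': 0, 'model': [('title', 'Docs')]},
    {'type': 'Article', 'id': 1,
     'model': [('title', 'Hello'), ('folder', '<objectref id="0">')]},
]
objects = import_records(records)
```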

One of the problems I ran into was how to handle containers. I can't  
add stuff to the containers until the object has been added. So, I  
need to fire a

   grok.notify(grok.ObjectAddedEvent(obj))

for each object and store the contents of the container in a temporary  
attribute. This attribute is read by the code triggered by the event  
and is used to update the container.
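
To show the pattern without needing Grok installed, here is a plain
Python sketch. ObjectAddedEvent, notify() and the subscribers list are
simplified stand-ins for the Grok event machinery, and the
_pending_items attribute name is made up:

```python
class ObjectAddedEvent(object):
    """Stand-in for grok.ObjectAddedEvent; only carries the object."""

    def __init__(self, obj):
        self.object = obj


subscribers = []


def notify(event):
    """Stand-in for grok.notify: call every registered handler."""
    for handler in subscribers:
        handler(event)


class Folder(dict):
    """Hypothetical container; a real one would be a grok.Container."""


def fill_container(event):
    # Runs once the object has been added: move the parked contents
    # from the temporary attribute into the container itself.
    obj = event.object
    pending = getattr(obj, '_pending_items', None)
    if pending is None:
        return
    for name, child in pending:
        obj[name] = child
    del obj._pending_items


subscribers.append(fill_container)

folder = Folder()
folder._pending_items = [('a', 'first child'), ('b', 'second child')]
notify(ObjectAddedEvent(folder))
```

In a real application the handler would be registered through Grok's
own subscription mechanism instead of a module-level list.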

When it comes to annotations, I use a bit of custom code to
recreate them. I have used "hurry.workflow", so instead of just adding
the annotations I call hurry.workflow's setState() with the
state value stored in the annotations dictionary.

All in all, I have found this to be a painful (but quite interesting)
experience and I hope the data export/import story will be improved. I
think this is important because nobody should have to pay for a CMS or
web app where you can't easily export your data in case the database
gets corrupted and/or you need to upgrade the application.

Mvh Sebastian


