[Zope] Uploading textfiles into Zope using LoadSite

Jean Jordaan jean@mosaicsoftware.com
Thu, 9 Nov 2000 14:27:35 +0200


Hi Vimmers

I'd like to do a show-and-tell about how I

    - uploaded a bunch of textfiles into Zope (by making use of
      'load_site.py')

    - and turned them into ZClasses (using a DTML method).

This is not a flashy "here's how to do it" piece -- it's really a
request for advice on how to do things better, and a step-by-step from a
relative beginner, for other relative beginners.

OK, to business. I used 'loadsite-1.4.0-test.tgz'. From the README::

    LoadSite version 1.3
    Based on load_site.py from Digital Creations

    Modified by Oleg Broytmann <phd2@earthling.net>
    Modified by Itamar Shtull-Trauring <itamars@ibm.net>
    wxLoadSite uses code written by Amos Latteier, and uses Fredrik
    Lundh's xmlrpclib.

(Is there a newer version available?)

I started from a flattext database, with a structure like::

    ID|penname|pensurname|notes|title|vol|nr|year|printed|text|name|surname|date|nick|email

That is: 15 fields, pipe-delimited, containing different kinds of data
(strings, multiline text, dates, email addresses, ...); line breaks inside
a field are written as '~nl~'. Here is a shortened sample record::

    66|Zandra|Bezuidenhout||A room of my own||||Nee|A room of my own~nl~- garden view,~nl~computer -,|||1998/7/23|zandra|zbez@netactive.co.za
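For anyone who wants to handle such records outside htmlpp, here is a minimal Python sketch that reads one record into a dictionary keyed by the header names. It assumes that no value contains a literal '|' and that '~nl~' is the only escape in use; both hold for the sample, but they are assumptions about the rest of the data. 'parse_record' is a hypothetical helper, not part of LoadSite.

```python
# Field names in file order, taken from the header line above.
FIELDS = ["ID", "penname", "pensurname", "notes", "title", "vol", "nr",
          "year", "printed", "text", "name", "surname", "date", "nick",
          "email"]


def parse_record(line):
    """Split one pipe-delimited record into a dict keyed by field name.

    Assumes no value contains '|' and that line breaks inside a field
    were encoded as '~nl~' (as in the sample record).
    """
    values = line.rstrip("\n").split("|")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELDS), len(values)))
    # Decode the newline marker back into real line breaks.
    values = [v.replace("~nl~", "\n") for v in values]
    return dict(zip(FIELDS, values))


sample = ("66|Zandra|Bezuidenhout||A room of my own||||Nee|"
          "A room of my own~nl~- garden view,~nl~computer -,|||"
          "1998/7/23|zandra|zbez@netactive.co.za")
record = parse_record(sample)
print(record["nick"], record["date"])
print(record["text"])
```

The length check makes a record with a stray '|' fail loudly instead of silently shifting every later field by one.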

I wanted to get the individual fields into Zope, for example as
properties of the DTML Documents that LoadSite creates. To do this, I
had to hack on LoadSite slightly. I noticed that its parser (in
'htmlutils.py') picks up the attributes of the <body> tag and joins all
of them into a single string, which is then stored as one property. I
guessed that I could import each attribute as a separate property
instead. To do so, I made the following changes to 'htmlutils.py'.
Originally::

    *** 95,98 ****
            if not self.seen_startbody:
                self.seen_startbody = 1
                self.accumulator = ""
                self.bodyattribs = join_attrs(attrs)

I changed this as follows::

    --- 95,99 ----
            if not self.seen_startbody:
                self.seen_startbody = 1
                self.accumulator = ""
                # njj: self.bodyattribs = join_attrs(attrs)
                self.bodyattribs = attrs

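This way, bodyattribs is not joined into a string like 'name="value"
[...]', but remains the list of (name, value) pairs that the parser
passes in, e.g. [('penname', 'Zandra'), ('pensurname', 'Bezuidenhout'),
...]. That is what the 'for attrname, attrvalue in bodyattribs' loop
below relies on. Then I had to change how they were fed into Zope. To do
this, I changed 'Uploader.py' from::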

    *** 113,115 ****
          if bodyattribs:
            self.call(object.manage_addProperty,
                 id="loadsite_bodyattribs", type="string", value=bodyattribs)

To::

    --- 113,128 ----
          if bodyattribs:
              # njj: start
              for attrname, attrvalue in bodyattribs:
                  if attrname == 'text' or attrname == 'notes':
                      self.call(object.manage_addProperty,
                          id=attrname, type="text", value=attrvalue)
                  elif attrname == 'date':
                      self.call(object.manage_addProperty,
                          id=attrname, type="date", value=attrvalue)
                  else:
                      self.call(object.manage_addProperty,
                          id=attrname, type="string", value=attrvalue)
              # njj: end

            # njj: self.call(object.manage_addProperty,
            # njj:     id="loadsite_bodyattribs", type="string", value=bodyattribs)
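To try the two changes together outside Zope, here is a minimal Python 3 sketch. It swaps in the standard html.parser module for sgmllib (which no longer exists in Python 3); like sgmllib, html.parser hands a start-tag handler the attributes as a list of (name, value) pairs with lowercased names. The if/elif chain is replaced by a lookup table, so supporting another field type means editing a dictionary rather than the loop. 'BodyAttrsParser', 'PROPERTY_TYPES' and 'property_specs' are made-up names, and nothing here talks to Zope.

```python
from html.parser import HTMLParser

# Hypothetical table: attribute name -> Zope property type.
# Anything not listed falls back to "string", as in the hack above.
PROPERTY_TYPES = {"text": "text", "notes": "text", "date": "date"}


class BodyAttrsParser(HTMLParser):
    """Keep the attributes of the first <body> tag as (name, value) pairs.

    Stand-in for LoadSite's sgmllib-based parser: the pairs are kept
    as-is instead of being joined into one string.
    """

    def __init__(self):
        super().__init__()
        self.bodyattribs = None

    def handle_starttag(self, tag, attrs):
        # Only the first <body> counts, like seen_startbody in htmlutils.py.
        if tag == "body" and self.bodyattribs is None:
            self.bodyattribs = attrs


def property_specs(bodyattribs, types=PROPERTY_TYPES, default="string"):
    """Map (name, value) pairs to (id, type, value) triples.

    Each triple corresponds to one manage_addProperty(id=..., type=...,
    value=...) call in the modified Uploader.py.
    """
    return [(name, types.get(name, default), value)
            for name, value in bodyattribs]


parser = BodyAttrsParser()
parser.feed('<html><body penname="Zandra" date="1998/7/23"\n'
            '    text="line one\nline two"></body></html>')
parser.close()
for spec in property_specs(parser.bodyattribs):
    print(spec)
```

In Uploader.py, the loop body could then shrink to one self.call(object.manage_addProperty, id=..., type=..., value=...) per triple.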

This works :)  Unfortunately, it's a hardcoded hack that will only work
for this particular database. To be generally useful, LoadSite should
let you snarf the attributes of <meta> tags as properties, paying
attention to Zope's ':type' convention for indicating the type of a
value (as in 'date:date'). But I don't understand sgmllib well enough to
do that.
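Here is a minimal sketch of what that could look like, assuming a convention (hypothetical, not an existing LoadSite feature) where a <meta> tag's name carries the property type after a colon, e.g. <meta name="date:date" content="1998/7/23">. It uses Python 3's html.parser rather than sgmllib; 'MetaPropertyParser' is a made-up name.

```python
from html.parser import HTMLParser


class MetaPropertyParser(HTMLParser):
    """Collect <meta name="id:type" content="..."> tags as property specs.

    Hypothetical convention: the part after the last ':' in the name is
    the property type; a name without ':' becomes a 'string' property.
    """

    def __init__(self):
        super().__init__()
        self.properties = []  # (id, type, value) triples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return  # e.g. <meta http-equiv=...>: not a property
        prop_id, sep, prop_type = name.rpartition(":")
        if not sep:  # no ':type' suffix given
            prop_id, prop_type = name, "string"
        self.properties.append((prop_id, prop_type, attrs.get("content") or ""))


parser = MetaPropertyParser()
parser.feed('<head><meta name="penname" content="Zandra">'
            '<meta name="date:date" content="1998/7/23">'
            '<meta http-equiv="Content-Type" content="text/html"></head>')
parser.close()
print(parser.properties)
# [('penname', 'string', 'Zandra'), ('date', 'date', '1998/7/23')]
```

Self-closing tags such as <meta ... /> are covered too, because HTMLParser's default handle_startendtag calls handle_starttag. The resulting triples could be fed to manage_addProperty one by one, as in the Uploader.py change.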

I ran the flattext database through a pre-processor
('http://www.htmlpp.org', for the curious) to get files (named '1.html'
and so on) that look like this::

    <HTML><HEAD>
    <TITLE>A room of my own -- Zandra Bezuidenhout</TITLE>
    </HEAD>
    <body ID="66"
        penname="Zandra"
        pensurname="Bezuidenhout"
        notes=""
        titel="A room of my own"
        vol=""
        nr=""
        year=""
        printed="Nee"
        text="A room of my own
    - garden view,
    computer -,"
        name=""
        surname=""
        date="1998/7/23"
        nick="zandra"
        email="zbez@netactive.co.za" >
    </BODY></HTML>
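The htmlpp step can also be approximated in plain Python. This is a minimal sketch, not a description of what htmlpp does: it assumes the attribute names of the sample page above (including 'titel'), builds the <TITLE> from title, penname and pensurname as in the sample, decodes '~nl~' to a newline, and HTML-escapes every value so that a stray quote cannot end an attribute early. 'record_to_html' is a hypothetical helper.

```python
from html import escape

# Attribute names exactly as in the sample page above (note 'titel').
ATTRS = ["ID", "penname", "pensurname", "notes", "titel", "vol", "nr",
         "year", "printed", "text", "name", "surname", "date", "nick",
         "email"]


def record_to_html(line):
    """Render one pipe-delimited record as a page shaped like the sample."""
    # Decode the '~nl~' newline marker back into real line breaks.
    values = [v.replace("~nl~", "\n") for v in line.rstrip("\n").split("|")]
    if len(values) != len(ATTRS):
        raise ValueError("expected %d fields, got %d"
                         % (len(ATTRS), len(values)))
    fields = dict(zip(ATTRS, values))
    page_title = "%s -- %s %s" % (fields["titel"], fields["penname"],
                                  fields["pensurname"])
    # escape(..., quote=True) also turns '"' into '&quot;', so a value
    # can never close the double-quoted attribute early.
    body_attrs = "\n    ".join('%s="%s"' % (name, escape(value, quote=True))
                               for name, value in zip(ATTRS, values))
    return ("<HTML><HEAD>\n"
            "<TITLE>%s</TITLE>\n"
            "</HEAD>\n"
            "<body %s >\n"
            "</BODY></HTML>\n" % (escape(page_title), body_attrs))


sample = ("66|Zandra|Bezuidenhout||A room of my own||||Nee|"
          "A room of my own~nl~- garden view,~nl~computer -,|||"
          "1998/7/23|zandra|zbez@netactive.co.za")
print(record_to_html(sample))
```

Writing each page to a file for LoadSite to pick up is left out; how the files are named is up to you.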

When uploaded by LoadSite, this creates DTML Documents with 15
properties each. Now to get them into ZClasses. Here is the DTML Method
that does this::

    <dtml-with DTML_texts>
      <dtml-call "manage_addFolder('ZClass_texts')">
      <dtml-in "objectItems(['DTML Document'])">
        <dtml-call "REQUEST.set('ID'         , ID         )">
        <dtml-call "REQUEST.set('penname'    , penname    )">
        <dtml-call "REQUEST.set('pensurname' , pensurname )">
        <dtml-call "REQUEST.set('notes'      , notes      )">
        <dtml-call "REQUEST.set('title'      , title      )">
        <dtml-call "REQUEST.set('vol'        , vol        )">
        <dtml-call "REQUEST.set('nr'         , nr         )">
        <dtml-call "REQUEST.set('year'       , year       )">
        <dtml-call "REQUEST.set('printed'    , printed    )">
        <dtml-call "REQUEST.set('text'       , text       )">
        <dtml-call "REQUEST.set('name'       , name       )">
        <dtml-call "REQUEST.set('surname'    , surname    )">
        <dtml-call "REQUEST.set('date'       , date       )">
        <dtml-call "REQUEST.set('nick'       , nick       )">
        <dtml-call "REQUEST.set('email'      , email      )">
        <dtml-let folder_str=nick>
          <dtml-var folder_str>
          <dtml-let id_str="_.string.split(_['sequence-key'], '.')[0]">
            <dtml-var id_str>
            <dtml-with "_.getitem('ZClass_texts')">
              <dtml-try>
                <dtml-call "_.getitem(folder_str)">
              <dtml-except KeyError>
                <dtml-call "manage_addFolder(folder_str)">
              </dtml-try>
              <dtml-with "_.getitem(folder_str)">
                <dtml-call "REQUEST.set('id', id_str)">
                <dtml-with "manage_addProduct['TextProduct']">
                  <dtml-call "Text_add(_.None, _, NoRedir=1)">
                </dtml-with>
              </dtml-with>
            </dtml-with>
          </dtml-let>
        </dtml-let>
      </dtml-in>
    </dtml-with>

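It creates a 'ZClass_texts' folder inside the 'DTML_texts' folder and
then iterates over all the DTML Documents in 'DTML_texts'. For each
document, it looks for a folder named after the author's 'nick' inside
'ZClass_texts' and creates one if it doesn't exist yet. Inside that
folder it creates an instance of the 'Text' ZClass, named after the
document's id minus the extension ('1', '2' and so on, discarding the
'.html' bit). The fields are copied into REQUEST first because that is
where the 'Text_add' constructor looks for the new object's id and
property values. In this way, each author's texts are collected in
their own folder as ZClass instances. Nice!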
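The folder-per-author logic itself has nothing Zope-specific about it. Here is a minimal plain-Python sketch of the same grouping, with a dictionary of document properties standing in for objectItems(['DTML Document']); 'plan_text_objects' and the sample documents are hypothetical.

```python
from collections import defaultdict


def plan_text_objects(documents):
    """Group documents by author nick, as the DTML method above does.

    'documents' maps a DTML Document id (e.g. '1.html') to its properties.
    The result maps each nick (one folder per author) to the ids of the
    Text objects to create in it.
    """
    plan = defaultdict(list)  # a missing nick gets a fresh list, like the KeyError branch
    # Sorted only to make the output deterministic; Zope keeps its own order.
    for doc_id, props in sorted(documents.items()):
        # Same rule as _.string.split(_['sequence-key'], '.')[0]:
        # keep everything before the first dot.
        object_id = doc_id.split(".")[0]
        plan[props["nick"]].append(object_id)
    return dict(plan)


docs = {
    "1.html": {"nick": "zandra"},
    "2.html": {"nick": "another_nick"},  # hypothetical second author
    "3.html": {"nick": "zandra"},
}
print(plan_text_objects(docs))
# {'zandra': ['1', '3'], 'another_nick': ['2']}
```

Planning first and creating folders afterwards also makes it easy to spot problem data (an empty 'nick', say) before anything is written to the ZODB.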

The author folders will probably become instances of a ZClass that
subclasses 'ZClasses:ObjectManager'.

TODO: for each author, a user with a default password should be created.
An author should have permission to administer their own texts.

I hope someone finds this interesting!

Regards,
Jean Jordaan