[Zope] Indexing and Searching through XML files

Giuseppe Bonelli giuseppe.bonelli at tiscali.it
Fri Sep 12 19:20:52 EDT 2003


Under Zope it is definitely possible; under Plone I don't know, as I
have no direct experience with it.

I definitely did this (and much more ... ;-)) from Zope.

Your mileage may vary, but this is the route I have followed with _very_
satisfactory results.

1. use LocalFS or something similar to map your XML document base from
the filesystem into the ZODB
2. install the standard Python XML plumbing, i.e. PyXML and/or 4Suite
3. install the Zope XML plumbing of your choice. I use the
ZopeXMLMethods product. This gives you an easy and very reliable way to
perform XSLT (and much more).
4. depending on the structure of your XML files, you may find it useful
to write an import routine which splits the XML files into chunks and
creates a structure of custom Zope classes; this is not strictly
necessary, but I think it is a best practice as it is
performance-friendly; you will need at least one container class
derived from Folder and a content class derived from SimpleItem, and
both need to be catalog aware. This will definitely help you in
indexing and in building a navigation path through the HTML produced by
the XSLT. Obviously, having one or more DTDs describing your XML content
would be very advisable.
5. for indexing the XML content you need some XML stripping code which
extracts the content as unicode and feeds the text indexes you need (I
use the TextIndexNG product instead of the standard Zope TextIndex).
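
A minimal sketch of the chunking and text-stripping idea from steps 4
and 5. It assumes Python's standard xml.etree.ElementTree (the
descendant of the elementtree module) rather than PyXML or 4Suite, and a
made-up document layout with <chapter> elements that carry a <title>
child. The names SAMPLE, extract_text and split_chapters are
hypothetical; creating the Zope objects and feeding the catalog is not
shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, given as UTF-8 bytes as it would be read
# from disk. The layout (book/chapter/title/para) is an assumption.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<book>
  <chapter id="c1"><title>Intro</title><para>Hello <b>XML</b> world.</para></chapter>
  <chapter id="c2"><title>Usage</title><para>Index the caf\xc3\xa9 menu.</para></chapter>
</book>"""


def extract_text(element):
    """Return all character data below element as one unicode string.

    Markup is dropped and runs of whitespace are collapsed, which is
    usually what a full-text index wants.
    """
    return " ".join(" ".join(element.itertext()).split())


def split_chapters(xml_bytes):
    """Split a document into one dict per <chapter> element.

    Each dict holds the values one might store on a catalog-aware content
    object: an id, a title property and the indexable text.
    """
    root = ET.fromstring(xml_bytes)  # the parser decodes bytes to unicode
    chunks = []
    for chapter in root.iter("chapter"):
        chunks.append({
            "id": chapter.get("id"),
            "title": chapter.findtext("title", default=""),
            "text": extract_text(chapter),
        })
    return chunks
```

Calling split_chapters(SAMPLE) gives one dict per chapter; in a real
import routine each dict would presumably become one content object
inside a container, with "text" feeding the text index.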

All this gives you a tremendous amount of flexibility and a very
scalable infrastructure.

Lessons I learned developing all this:

- use XSLT only when really necessary, i.e. only for rendering HTML (or
other formats).
- import the XML into the ZODB using custom classes
- when importing, transform some of the high-level structure present in
the XML content into Python properties (for example chapter titles,
section headings, etc.).
- remember that DTML/TAL is a faster templating system than plain XSLT,
as XML parsing has a significant performance overhead. This is really
the old "separate logic from presentation" mantra: if you need to
apply logic to your content, parse once from XML to native Python
structures and use Python methods to do whatever you need. In this
respect you may find two remarkable Python modules useful: elementtree
and pyRXP. Both give you an easy path from XML to native Python
structures. Remember also that it is very easy (and fun) to create XML
streams from Python lists/tuples.
- pay _extreme_ attention to unicode-related issues: this means
converting XML strings to unicode types as soon as you read the XML
content into Python
- use ZCatalog as much as you can (but this should be standard Zope
practice).
- put everything behind Apache and you will have a wonderful three-level
caching system: level 0 = XSLT caching done by ZopeXMLMethods,
level 1 = the standard Zope cache system, level 2 = Apache
- _always_ use absolute URLs!!!
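
As a rough illustration of the "parse once to native Python structures"
and "create XML from lists/tuples" points above, here is a minimal
sketch using the standard xml.etree.ElementTree module. The
(tag, attributes, children) tuple layout and the names DOC, to_tuple and
from_tuple are made up for this example; they are not the format of any
particular library.

```python
import xml.etree.ElementTree as ET

# Hypothetical input, as UTF-8 bytes; the element names are assumptions.
DOC = (b'<section id="s1"><title>Caching</title>'
       b'<para>Use <em>ZCatalog</em> often.</para></section>')


def to_tuple(element):
    """Convert an element into a (tag, attributes, children) tuple.

    children is a list of unicode strings and nested tuples, in document
    order. Whitespace-only text between elements is dropped.
    """
    children = []
    if element.text and element.text.strip():
        children.append(element.text)
    for child in element:
        children.append(to_tuple(child))
        if child.tail and child.tail.strip():
            children.append(child.tail)
    return (element.tag, dict(element.attrib), children)


def from_tuple(node):
    """Build an Element back from a (tag, attributes, children) tuple."""
    tag, attrs, children = node
    element = ET.Element(tag, attrs)
    last = None  # most recently appended child element, if any
    for child in children:
        if isinstance(child, str):
            if last is None:
                # Text before the first child element.
                element.text = (element.text or "") + child
            else:
                # Text that follows a child element is stored as its tail.
                last.tail = (last.tail or "") + child
        else:
            last = from_tuple(child)
            element.append(last)
    return element


# Parse once; from here on the content is plain Python data (unicode
# strings, dicts, lists, tuples) that ordinary methods can work on.
tree = to_tuple(ET.fromstring(DOC))

# Going the other way: serialise the native structure back to XML.
xml_text = ET.tostring(from_tuple(tree), encoding="unicode")
```

Because the parser is handed bytes and returns unicode strings, the
conversion to unicode happens at the moment the content enters Python,
which is the point of the unicode advice above.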

All this seems complicated, but in reality it isn't, thanks to the
standard services Python/Zope give you and to the remarkable products
developed by the bright folks on this list!!!

Hope this helps,

__peppo


> -----Original Message-----
> From: zope-bounces at zope.org [mailto:zope-bounces at zope.org]On Behalf Of
> FNk
> Sent: Thursday, 11 September 2003 10:24
> To: Zope
> Subject: [Zope] Indexing and Searching through XML files
>
>
> Hi,
>
> I've got lots of XML files (about 100,000) in a directory tree on my
> file system. I want to publish those files using Zope/Plone.
>
> I need to be able to index them in their native format without having
> to upload them into the ZODB, let users search their contents through
> Zope, and have the results displayed.
>
> Is it possible? Did anybody ever do this? Any suggestions?
>
> I'm running zope-2.6.2 and Plone-1.1.
>
> Thanks,
>
> Fab.
>
>
> _______________________________________________
> Zope maillist  -  Zope at zope.org
> http://mail.zope.org/mailman/listinfo/zope
> **   No cross posts or HTML encoding!  **
> (Related lists -
>  http://mail.zope.org/mailman/listinfo/zope-announce
>  http://mail.zope.org/mailman/listinfo/zope-dev )
>



