[Zope] Problems with ZCatalog-indexing

Michel Pelletier michel@digicool.com
Tue, 28 Mar 2000 08:20:41 -0800


Tobias Kiesling wrote:
> 
> Hello all,
> 
> I've got some problems with indexing in ZCatalog.
> 
> First of all it doesn't seem possible to index words containing any
> non-ASCII characters (most importantly German umlauts). On zope.org the
> only info on character sets I found was that there are some supported
> ones, but I couldn't find out which these are.
> 
> How is it possible to have a correctly indexed catalog on a German site?

You must set your locale environment variable correctly.  See 'man locale'
on Linux, for example.  Also, you can pass z2.py the -L argument to
explicitly set a locale.  This will allow you to index German
characters.
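
Before passing a locale name to -L, it can help to check that the name is
actually installed on the machine.  The following is only a sketch: the
helper name locale_is_available and the candidate names are made up for
illustration, and it assumes that a name accepted by Python's own
locale.setlocale() is also one that z2.py will accept.

```python
import locale


def locale_is_available(name):
    """Return True if the C library accepts `name` as a character-type locale.

    Hypothetical helper, not part of Zope.
    """
    # Calling setlocale() with no second argument only queries the setting.
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    # Candidate names are guesses; the spelling differs between systems.
    for candidate in ("de_DE", "de_DE.ISO8859-1", "de_DE.UTF-8"):
        print(candidate, locale_is_available(candidate))
```

On Linux, 'locale -a' lists the installed names if none of the guesses
above is accepted.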
 
> The second problem is related to the first one, but not restricted to
> German characters.
> Words containing a hyphen, e.g. names like 'Hans-Dietrich', aren't
> indexed at all: in the example, 'Hans-Dietrich'
> cannot be found on the site, and 'Hans' or 'Dietrich' cannot be found,
> either.

Hyphens are hard-coded in the text index as word separators.  This is
because the text splitter is not very flexible and is very
English-centric.

To fix this, you must either 1) create your own splitter, or 2) update to
the latest CVS, create your own vocabulary, and provide a new
splitter.
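
To illustrate the difference a splitter makes, here is a small sketch in
plain Python.  It is not the Zope splitter and does not use any Zope API;
the function split_words and both patterns are invented for this example.
It contrasts treating the hyphen as a separator with keeping hyphenated
names as one word.

```python
import re

# Runs of letters only, so a hyphen acts as a word separator.
# [^\W\d_] means "word character that is not a digit or underscore",
# which in Python 3 also covers letters such as German umlauts.
SPLIT_ON_HYPHEN = re.compile(r"[^\W\d_]+")

# Runs of letters joined by single internal hyphens stay one word.
KEEP_HYPHEN = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def split_words(text, keep_hyphens=False):
    """Return the lowercased words of `text` (hypothetical helper)."""
    pattern = KEEP_HYPHEN if keep_hyphens else SPLIT_ON_HYPHEN
    return [word.lower() for word in pattern.findall(text)]


if __name__ == "__main__":
    sample = "Hans-Dietrich Müller"
    print(split_words(sample))
    print(split_words(sample, keep_hyphens=True))
```

A splitter for a German site could reasonably index both forms, so that a
search for 'Hans', 'Dietrich' or 'Hans-Dietrich' each finds the document.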

The CVS version of Catalog is much more language-neutral, and there are
now two kinds of vocabulary object, English and Globbing-English.  Soon
someone I'm in contact with will be providing a Japanese vocabulary.  If
you have the patience, you should develop a German vocabulary to
address these issues.

-Michel