unicode and ZCTextIndex, WAS:RE: [Zope] strange unicode behaviour

Giuseppe Bonelli giuseppe.bonelli@tiscali.it
Thu, 24 Jul 2003 16:21:11 +0200


Thanks to all who responded to my original post, particularly to Toby
who pointed me in the right direction: I was mistakenly (and stupidly
...) using:

<meta http-equiv="content-type"
content="text/html;charset=&dtml-encoding;">
instead of:
<dtml-call
"RESPONSE.setHeader('content-type','text/html;charset=utf-8')">

in my standard_html_header, so I was declaring the encoding to the
browser inside the page, but not in the HTTP headers!
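
As a rough illustration of the difference (a sketch in present-day
Python, not ZPublisher's actual code): the charset named in the
Content-Type header decides how unicode text becomes bytes on the wire.
The helper names charset_from_content_type and encode_body, and the
latin-1 fallback, are hypothetical and exist only for this example.

```python
def charset_from_content_type(header_value, default="latin-1"):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper; the fallback charset is an assumption.
    """
    # Everything after the first ';' is a list of name=value parameters.
    for param in header_value.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"').lower()
    return default


def encode_body(text, content_type):
    """Encode unicode text with the charset the header announces."""
    return text.encode(charset_from_content_type(content_type))


header = "text/html;charset=utf-8"
body = encode_body("perch\u00e9", header)  # text containing a non-ASCII char
```

With the header above, the body is sent as UTF-8 bytes, so the header
and the payload agree. A meta tag alone only tells the browser what to
expect; it does not change how the bytes are produced.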

This solved the original problem, but one issue remains:

I started fiddling with encodings when I wanted to full-text index my
utf-8 encoded unicode content with ZCTextIndex and the lexicon gave me
the usual "ordinal not in range" decoding error while building the index.

Now I have a clean unicode setup (i.e. no locale when starting Zope and
no sys.setdefaultencoding when starting Python 2.1.3), and the lexicon
has started giving me errors again, for example when indexing a string
containing "isn't" (the errors are raised at line 133 in lexicon.py).
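
For reference, the "ordinal not in range" error is what the ASCII codec
reports when it meets a byte above 127. The sketch below reproduces it
in present-day Python with explicit decode calls (the Python 2 series
could trigger the same decoding implicitly when byte strings and unicode
strings were mixed). It does not use the ZCTextIndex lexicon code, and
the function names are hypothetical.

```python
# UTF-8 bytes for a word containing one non-ASCII character (U+00E9).
utf8_bytes = "perch\u00e9".encode("utf-8")


def decode_strict_ascii(data):
    """Decode as ASCII; on failure return the codec's error reason."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "error: %s" % exc.reason


def decode_utf8(data):
    """Decode with the encoding the bytes were actually written in."""
    return data.decode("utf-8")


ascii_result = decode_strict_ascii(utf8_bytes)  # fails on the 0xc3 byte
utf8_result = decode_utf8(utf8_bytes)           # round-trips cleanly
plain_result = decode_strict_ascii(b"isn't")    # pure ASCII decodes fine
```

A pure-ASCII "isn't" decodes without trouble, so an error on such a
string may point to a non-ASCII byte elsewhere in it, for example a
typographic apostrophe. That is a guess, not something checked against
lexicon.py.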

I searched the mailing list archives and found references to an old
ZCTextIndex bug (597 in the collector), whose resolution seems to
require starting Zope with a -L option.

Now I am a little confused, so I am asking whether someone has a firm
understanding of the status of Zope find/search support for unicode
strings containing high (non-ASCII) characters.

Specifically:
1. Does the standard ZCTextIndex that comes with Zope 2.6.1 support this?
2. If yes, do I need to start Zope with a particular locale?
3. Regarding these issues, is the recently released TextIndexNG ver. 2 a
better solution?

NB: if this matters, I have utf-8 encoded content in various languages,
so I would prefer not to have to use any -L setting when starting Zope
as I do not need to support TTW content editing.

TIA,

--peppo

> -----Original Message-----
> From: Toby Dickenson [mailto:tdickenson@geminidataloggers.com]
> Sent: Thursday, 24 July 2003 9:33
> To: giuseppe.bonelli@tiscali.it; Giuseppe Bonelli; zope@zope.org
> Subject: Re: [Zope] strange unicode behaviour
>
> [...snip...]

> > I have utf-8 as sys.defaultencoding and I do not load any locale
> > when starting Zope.
>
> That is old advice that predates Zope 2.6. It was never a particularly
> good idea, because it affects all of Python's internals. You only need
> to encode your unicode as utf-8 (or other encoding) before sending it
> over the network, and ZPublisher is capable of doing that itself if
> you tell it the encoding in the header.
>
> --
> Toby Dickenson - http://www.geminidataloggers.com/people/tdickenson