[Zope] How to convert Zope instance charset?

Daniel Dekany ddekany at freemail.hu
Sun Apr 24 11:45:27 EDT 2005


Sunday, April 24, 2005, 4:22:10 PM, Andreas Jung wrote:

>
>
> --On Sonntag, 24. April 2005 16:03 Uhr +0200 Daniel Dekany 
> <ddekany at freemail.hu> wrote:
>
>> Sunday, April 24, 2005, 2:36:24 PM, Andreas Jung wrote:
>>
>>> --On Sonntag, 24. April 2005 14:18 Uhr +0200 Daniel Dekany
>>> <ddekany at freemail.hu> wrote:
>>>
>>>> I have a Zope instance that uses utf-8 for everything. Since
>>>> Python/Zope/etc practically doesn't support utf-8,
>>>
>>> Please explain in which sense Zope would not support utf-8. For your
>>> information:
>>
>> It can't sort strings alphabetically *anywhere* (concretely: the
>> accented letters will go to the end of the list -- I guess because 0x80
>> is mathematically greater than the code of the US-ASCII characters).
>
> This is neither a problem of Zope nor of Python! A Python string has no
> notion of an encoding. The sort method cannot smell the encoding.

First of all, in this thread I don't care whose mistake it is. My
concern is whether I can use Zope (in fact, Plone) with UTF-8 in
practice or not. Assume that I'm using a few non-US-ASCII characters,
and that I sometimes want to show things sorted alphabetically...

Then, of course, if something wants to collate strings for human
reading, it will use locale.strcoll, which does consider charset and
locale. That locale.strcoll gets UTF-8 wrong is certainly the mistake of
Python, right?
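
To make the symptom concrete, here is a minimal sketch. It is written
for a current Python, where bytes stands in for the 8-bit strings
discussed here; the word list and both function names are made up for
illustration. The first function shows the plain byte-wise order, the
second tries locale-aware collation and gives up if the locale is not
installed.

```python
import locale


def sort_bytewise(words):
    """Sort by raw UTF-8 bytes, as a plain sort() on 8-bit strings does."""
    encoded = sorted(w.encode("utf-8") for w in words)
    return [b.decode("utf-8") for b in encoded]


def sort_with_locale(words, locale_name="hu_HU.UTF-8"):
    """Sort using the collation of locale_name.

    Returns None if that locale is not available on this machine.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm builds a key whose ordering matches locale.strcoll
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


words = ["zebra", "alma", "\u00e1gy", "bor"]

# Every non-ASCII character encodes to bytes >= 0x80, so the accented
# word lands after "zebra".
print(sort_bytewise(words))

# What this prints depends on the C library and the installed locales.
print(sort_with_locale(words))
```

The byte-wise result is the same on every machine. The locale-aware
result is not, and that platform dependence is exactly what I am asking
about.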

> Instead use Python unicode strings and depend on the sorting order
> defined by the Unicode standard.

I take that advice, but unfortunately it's not about my Python code, but
about other people's Python code.

> This is an application-level problem but not a server-side problem.

Zope itself provides a method for sorting strings:
DocumentTemplate.sequence.sort. Many of the products rely on that for
sorting. And that sorts UTF-8 incorrectly (I guess because
locale.strcoll does it incorrectly). ZCatalog, which is also part of the
standard Zope distribution, sorts incorrectly too (surely for the same
reason).

>>> Plone has UTF8 as default charset.
>>
>> Believe me, I really hope I'm wrong. So how could I achieve that strings
>> are sorted correctly? If it works for someone, how? (I have locale
>> hu_HU.UTF-8 in zope.conf, I have even printed
>> locale.getlocale(locale.LC_COLLATE) from products and external methods,
>> and it was hu_HU.UTF-8. Note that at least on Python level sorting with
>> hu_HU.ISO-8859-2 works... so I hope it would work with Plone as well.)
>>
>
> see above. Also, the standard sort() method of Python does not care
> about your locale (why should it?). Strings are streams of bytes,
> nothing else.

I know, and I have referred to locale.strcoll, which does care about
encoding and locale. It seems many products use that (indirectly) when
they want to sort something.

> sort() accepts a user-defined comparison method to implement
> user-specific sorting.

Yes, but this doesn't help, unless I write a UTF-8 comparison method,
and then find all sort() and locale.sort() calls in Zope, Plone, and in
other products, and patch them all...
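
For illustration, such a comparison method could look roughly like the
sketch below. It is written for a current Python, with bytes standing
in for UTF-8 encoded 8-bit strings, and utf8_sort_key is a made-up
name. It is only an approximation: it decodes the bytes, compares the
base letters first, and uses the accents as a tie-breaker. It is not
real locale collation and ignores language-specific rules such as the
Hungarian digraphs.

```python
import unicodedata


def utf8_sort_key(raw):
    """Approximate, locale-independent sort key for UTF-8 encoded bytes."""
    text = raw.decode("utf-8")
    # NFD splits an accented letter into base letter + combining mark(s)
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Primary: accent-free, case-folded text. Secondary: the full text.
    return (base.lower(), decomposed)


names = [s.encode("utf-8") for s in ["zebra", "\u00e1gy", "alma", "bor"]]
names.sort(key=utf8_sort_key)

# The accented word now sorts among the other a-words, not after "zebra".
print([n.decode("utf-8") for n in names])
```

Writing the key is the easy part. The problem remains that every sort
call in every product would have to be changed to use it.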

> And there are also methods in Python "locale" module to perform
> locale-dependent comparison.

Which I can't get working with UTF-8: it puts non-US-ASCII letters at
the end of the list. Has anybody managed it? How? I'm all ears. I guess
the Plone site would suddenly sort correctly then, at least in the
places where the programmer of the Zope product was wise enough not to
use raw sort().

> Once again: you must solve your problem on the application layer...

(Anyway, string collation is not an application-level problem in
principle. It is the same for a book store application and for a
first-person shooter; there is nothing application-specific in it. If
Python is not mature enough to take on this task, that's a different
question.)

> Zope does not help you at this point because it can't.

So however I formulate it, the upshot is that you *practically* can't
use UTF-8 with Zope, unless you are using a language that doesn't use
non-US-ASCII characters, in which case you gain nothing from UTF-8.
Hence, I said it is "not supported"... It doesn't mean that it is the
mistake of Zope, it just means that you can't use it.

So, back to the topic... Since UTF-8 is not working (it seems), how
could I convert that already filled instance to use ISO-8859-2 instead
of UTF-8? Is there a tool that would make this relatively easy?
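
At the level of a single string the conversion itself is simple. The
sketch below is written for a current Python, with bytes standing in
for the stored 8-bit strings, and both helper names are made up. What I
don't know is how to apply something like this safely to every string
stored in the instance, so that part is deliberately left out.

```python
def utf8_to_latin2(raw, errors="strict"):
    """Re-encode UTF-8 bytes as ISO-8859-2.

    With errors="strict", a character that ISO-8859-2 cannot represent
    raises UnicodeEncodeError.
    """
    return raw.decode("utf-8").encode("iso-8859-2", errors)


def find_unconvertible(raw):
    """List the characters in raw (UTF-8 bytes) that ISO-8859-2 lacks."""
    bad = []
    for ch in raw.decode("utf-8"):
        try:
            ch.encode("iso-8859-2")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# U+0151 (o with double acute) takes two bytes in UTF-8, one in ISO-8859-2.
converted = utf8_to_latin2("\u0151r".encode("utf-8"))

# The euro sign has no ISO-8859-2 code point, so it gets reported.
problems = find_unconvertible("5 \u20ac".encode("utf-8"))
```

Running find_unconvertible over the content first would at least show
whether the conversion would lose anything.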

> -aj

-- 
Best regards,
 Daniel Dekany


