[Zope] Re: Zope iso-8859-1 to utf-8

Tue Sep 13 14:10:08 EDT 2005

Pascal Peregrina wrote at 2005-9-13 14:21 +0100:
>I see...  And what python function would you use for conversion ?

   unicode(iso_string, 'iso-8859-1').encode('utf-8')
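
As a small worked example of that one-liner (a sketch only: the sample
value is made up, and it is spelled with .decode() so that it runs on
byte strings in both Python 2 and Python 3):

```python
# Hypothetical sample: "cafe" with an e-acute, as iso-8859-1 bytes.
# In iso-8859-1 the e-acute is the single byte 0xE9.
iso_string = b'caf\xe9'

# Step 1: decode the bytes to unicode.  Step 2: encode the unicode as UTF-8.
utf8_string = iso_string.decode('iso-8859-1').encode('utf-8')

# In UTF-8 the same character takes two bytes: 0xC3 0xA9.
assert utf8_string == b'caf\xc3\xa9'
```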

>I made some tests and was surprised of the results... 
>I switched ZMI to UTF-8 (management_page_charset) and edited some of my
>documents / properties and all went fine.

Strange. I would have expected non-ASCII characters to be
displayed incorrectly.

>The generated documents are still sent to browsers as iso-8859-1, and are
>not broken.

If you switched to "utf-8", then *you* should ensure that
they are sent as "utf-8".
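
To illustrate why the declared charset matters, here is a rough sketch
(with a made-up sample value) of what a browser would show if UTF-8
bytes were delivered but the page were declared as iso-8859-1:

```python
# Hypothetical sample text containing one non-ASCII character (e-acute).
text = u'caf\xe9'

# The bytes actually sent once the content is stored as UTF-8.
utf8_bytes = text.encode('utf-8')

# What the receiver sees if it is told the bytes are iso-8859-1:
# each of the two UTF-8 bytes is shown as a separate character.
misread = utf8_bytes.decode('iso-8859-1')
assert misread == u'caf\xc3\xa9'   # renders as "caf" + A-tilde + copyright sign

# Declared correctly as UTF-8, the original text comes back.
assert utf8_bytes.decode('utf-8') == text
```

In Zope the declaration is usually made through the Content-Type
response header, for example
RESPONSE.setHeader('Content-Type', 'text/html; charset=utf-8');
where exactly to put that call depends on your setup.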

>So my question would be : which valid UTF-8 characters (for typical Western
>languages like English, French, Spanish, ...) would be invalid in iso-8859-1

This is a strange question...
The problem does not lie with the characters themselves but with
the byte codes used to represent them.

The two encodings agree only for the ASCII characters (unicode
chars 0-127). Unicode characters 128-255 take 2 bytes in UTF-8 but
1 byte in "iso-8859-1". Unicode characters 256 and up can be
encoded in "UTF-8" but cannot be represented in "iso-8859-1" at all.
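
The three ranges can be checked with a short sketch like the following
(the helper name encoded_lengths and the sample characters are made up
for illustration):

```python
def encoded_lengths(char):
    """Return (iso-8859-1 length, UTF-8 length) in bytes.

    The first item is None when the character cannot be
    represented in iso-8859-1.
    """
    try:
        iso_len = len(char.encode('iso-8859-1'))
    except UnicodeEncodeError:
        iso_len = None
    return iso_len, len(char.encode('utf-8'))

ascii_char = u'A'       # U+0041, in the range 0-127
latin_char = u'\xe9'    # U+00E9 (e-acute), in the range 128-255
euro_char = u'\u20ac'   # U+20AC (euro sign), above 255

print(encoded_lengths(ascii_char))   # (1, 1)
print(encoded_lengths(latin_char))   # (1, 2)
print(encoded_lengths(euro_char))    # (None, 3)

# For ASCII the bytes themselves are identical in both encodings.
assert ascii_char.encode('iso-8859-1') == ascii_char.encode('utf-8')
```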

> ...
>Last thing, if ZMI is switched to UTF-8, then what is the difference between
>ustring/string, etc properties ?

"ustring" is a unicode string: stored inside Zope as unicode,
sent to the browser UTF-8 encoded and expected to come back
UTF-8 encoded.

"string" is a plain (non-unicode) string. It should use
the encoding of your page (UTF-8, once you have switched to UTF-8).
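
At the Python level the difference looks roughly like this (only a
sketch with made-up values; the variable names are illustrative and
this is not how Zope stores properties internally):

```python
# The page encoding chosen in this thread.
management_page_charset = 'utf-8'

# A "ustring" value is unicode: a sequence of characters.
ustring_value = u'caf\xe9'

# A "string" value is a plain byte string in the page encoding.
string_value = ustring_value.encode(management_page_charset)

assert len(ustring_value) == 4   # counts characters
assert len(string_value) == 5    # counts bytes (e-acute takes two in UTF-8)

# Decoding the plain string with the page charset gives the unicode value.
assert string_value.decode(management_page_charset) == ustring_value
```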

-- 
Dieter