[Zope] Character set problems

Fri Sep 9 13:43:53 EDT 2005

Niklas Saers wrote at 2005-9-8 18:59 +0200:
> ...
>What is the standard way of ensuring that I get the correct data?
>
>I tried writing a little converter, but the string
>
>tekst = tekst.replace("\xc3\xa6", "ae");
>
>gives me the error UnicodeDecodeError: 'ascii' codec can't decode byte
>0xc3 in position 0: ordinal not in range(128)

Apparently, "tekst" contains a unicode string.
Mixing unicode and non-unicode (as you do above) is tricky
in Python (avoid it, if you can).
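One way to avoid the mix is to decode byte strings once, at the boundary, and to use unicode literals for everything afterwards. The following is only a sketch: it assumes the incoming bytes are UTF-8, and the function name "replace_ae" and the sample word are made up for illustration.

```python
# Sketch: keep both sides of the replace() call unicode.
# Assumption: the incoming byte string is UTF-8 encoded.

def replace_ae(tekst):
    """Replace the letter U+00E6 with "ae" in a unicode string."""
    return tekst.replace(u"\u00e6", u"ae")

raw = b"bl\xc3\xa6r"          # sample bytes (hypothetical input)
tekst = raw.decode("utf-8")   # bytes -> unicode, done once at the boundary
result = replace_ae(tekst)    # unicode in, unicode out: no implicit decoding
```

Because both arguments to replace() are unicode, Python never has to guess an encoding.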

If you know that all non-unicode text uses the same encoding,
you can set Python's "default encoding". It can be
set with "sys.setdefaultencoding(encoding)" -- but only
during startup (usually in "sitecustomize.py").
Whenever unicode and non-unicode strings come together, Python uses
its default encoding to convert the non-unicode string to unicode.
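A "sitecustomize.py" along these lines might do it. This is a sketch, not a recommendation: it assumes that every byte string in the application really is UTF-8, and the constant name "ENCODING" is made up. The guard is there because "sys.setdefaultencoding" is removed once startup has finished and does not exist at all in Python 3.

```python
# sitecustomize.py: imported automatically at interpreter startup
# if it is found on the import path.
import sys

# Assumption: all non-unicode text in the application is UTF-8.
ENCODING = "utf-8"

# The function is only available during startup on Python 2,
# so check for it before calling.
if hasattr(sys, "setdefaultencoding"):
    sys.setdefaultencoding(ENCODING)
```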

If you do nothing, the "default encoding" is "ascii" -- resulting
in the above error.
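The failure can be reproduced by doing explicitly what the implicit conversion does. A small sketch; the variable names are made up, and the code is written so that it also runs on current Python versions, where the conversion always has to be explicit:

```python
raw = b"\xc3\xa6"   # the two UTF-8 bytes for the letter U+00E6

# Decoding with "ascii" is what the default encoding amounts to
# when nothing has been configured; it fails on the first byte.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with the encoding the bytes really use succeeds.
decoded = raw.decode("utf-8")
```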

The ideal way would be to have all your data and templates unicode and
let ZPublisher (more precisely "ZPublisher.HTTPResponse.HTTPResponse")
convert to the output encoding (defined via
the "charset" parameter of the "Content-Type" response header).

Unfortunately, only a few parts of Zope are fully unicode aware
so far: actually, only XML PageTemplates, but neither HTML PageTemplates
nor Python Scripts. This will change with Zope 3.

Until then, you can live with either setting
Python's default encoding or converting unicode explicitly
to your encoding as soon as you get it.
In either case, you *MUST* "declare" the encoding you are using
in the "charset" parameter of the "Content-Type" response header.
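The explicit variant could look roughly like this. It is a sketch only: "build_response" is a made-up helper, the "text/html" content type is an assumption, and in Zope itself the header would normally be set on the response object (for example with "RESPONSE.setHeader") rather than returned in a dictionary.

```python
def build_response(text, charset="utf-8"):
    """Encode unicode text and declare the same charset in the header."""
    payload = text.encode(charset)   # unicode -> bytes, explicitly
    headers = {"Content-Type": "text/html; charset=%s" % charset}
    return headers, payload

# Hypothetical sample text containing the letter U+00E6.
headers, payload = build_response(u"bl\u00e6r")
```

The point is that the encoding used for the bytes and the one named in "charset" come from the same variable, so they cannot drift apart.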

-- 
Dieter