[Zope-Coders] Re: [Zope-dev] Unicode treatment in 2.6b1

Toby Dickenson tdickenson@geminidataloggers.com
Fri, 27 Sep 2002 07:57:47 +0100


cc: zope-coders

On Thursday 26 Sep 2002 10:58 pm, Florent Guillaume wrote:

> For PageTemplates, the various blocks produced by the template and
> python are sent to a StringIO-like object, which is responsible for
> converting them into a coherent thing when its getvalue() method is
> called. At the moment it doesn't deal very well with mixed Unicode and
> non-Unicode strings so the reported failures don't surprise me. WE NEED
> TO FIX THIS BEFORE THE NEXT BETA,

I agree. Is someone committed to working on this?


> probably also by providing an explicit
> native encoding.

That's not what dtml currently does, and I don't see an obvious reason why
page templates should be different. The dtml semantics have been worked out
carefully over the last few years.

The problem with this proposed approach is that it confuses the encoding of
the *document* with the encoding of the *attributes* of the objects which are
used to create the document. Page templates often deal with diverse objects
from different sources; how is a template to know that all objects use the
same character encoding for 8-bit strings?
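
A small sketch of that problem, written for a current Python where bytes
plays the role of the 8-bit string. The two attribute values and the
decode_all helper are made up for illustration; nothing here is Zope code.

```python
# Two hypothetical attribute values holding the same text, stored by
# different objects as 8-bit strings in different encodings.
title_latin1 = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
title_utf8 = "caf\u00e9".encode("utf-8")       # b'caf\xc3\xa9'


def decode_all(chunks, encoding):
    """Decode every 8-bit chunk with one assumed encoding."""
    return [chunk.decode(encoding, "replace") for chunk in chunks]


# Whichever single "native encoding" the template picks, it is right for
# at most one of the two objects; the other one comes out damaged.
as_latin1 = decode_all([title_latin1, title_utf8], "latin-1")
as_utf8 = decode_all([title_latin1, title_utf8], "utf-8")
```

With latin-1 assumed, the UTF-8 value turns into two wrong characters; with
UTF-8 assumed, the latin-1 value cannot be decoded cleanly at all.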

New objects should be exposing these attributes as unicode objects, and
legacy objects would have had to expose them as latin-1 if they wanted them
rendered correctly in the ZMI.

A legacy page template using legacy non-latin-1 properties will continue to
work unchanged as long as it does not encounter a unicode object while the
template is being rendered.
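
Roughly, the joining rule described above could be sketched like this. It is
an illustration in current Python, with bytes standing in for the 8-bit
string type; join_rendered is a hypothetical name, not the actual dtml or
page template code.

```python
def join_rendered(chunks):
    """Join rendered chunks; bytes stands in for the legacy 8-bit string."""
    if all(isinstance(c, bytes) for c in chunks):
        # No unicode seen: pass the bytes through untouched, whatever
        # encoding they happen to be in.
        return b"".join(chunks)
    # At least one unicode chunk: treat every 8-bit chunk as latin-1.
    return "".join(
        c.decode("latin-1") if isinstance(c, bytes) else c for c in chunks
    )


# Legacy template with koi8-r properties and no unicode: bytes in, bytes out.
legacy = join_rendered([b"<p>", "\u041f\u0440\u0438\u0432\u0435\u0442".encode("koi8-r"), b"</p>"])

# Latin-1 property mixed with a unicode value: the result is unicode.
mixed = join_rendered([b"caf\xe9", " \u20ac"])
```

The first call never decodes anything, which is why the legacy case keeps
working; only the second call has to make an assumption about the 8-bit data.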

I don't believe it is possible to introduce unicode without some form of
pain. The scheme implemented for dtml puts all the pain on users who are
MIXING non-ascii non-latin-1 8-bit string objects with unicode objects,
which is perhaps not a bad thing.