[Zope-Coders] Re: [Zope-dev] Unicode treatment in 2.6b1

Toby Dickenson tdickenson@geminidataloggers.com
Mon, 30 Sep 2002 07:50:33 +0100


On Friday 27 Sep 2002 5:02 pm, Florent Guillaume wrote:

> But how much feedback from international users did you have until
> recently?

A fair amount of 'it works for me' in the early days. I am honestly surprised
that this objection has been raised today.

The patches were labelled as 'unicode patches', which probably had some effect
on the selection of users who applied them. I don't expect anyone who
applied bleeding-edge unicode patches to be in the scenario you described,
where most things are stored in a favorite encoding and a unicode value is
only encountered accidentally.

> > how is it to know that all objects
> > use the same character encoding for 8-bit strings?
>
> Because if, say, some Greek guy puts 8-bit strings in the source of his
> pages (and believe me he does it all the time :-), and in the attributes
> of objects, they're all likely to be in *his* native default encoding,
> which happens to be latin-7. Until Unicode was in, he just had to slap a
> content-type: text/html; charset=iso-8859-7 and all was well. Same thing
> for Russian, Japanese, etc. My point is that it is very likely that
> there was a uniformity of encodings (otherwise the application would
> already render weirdly on the browsers).

Yes. I can see that a lot of legacy Zope code had to work that way. Until now
I had assumed that this practice would die out once unicode support was
available throughout the framework and in the books. Any thoughts on this?

> Enters Unicode. For some reason, now part of the strings he generates
> are in Unicode

In Zope 2.5 this would likely cause a UnicodeError exception. The fix for
this application bug would be to manually encode the unicode string in
latin-7. In 2.6 the effect of this bug is different, but (IMO) the
recommended solution should be the same.
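The manual fix can be sketched in Python 3, where `bytes` stands in for a Zope 2.x 8-bit string and `str` for a unicode object. This is a minimal sketch, not Zope code: the variable names and the `render` helper are hypothetical. Note that the Greek charset the thread calls latin-7 is ISO-8859-7 in Python's codec registry (Python's own `latin7` alias names ISO-8859-13, a different charset). Python 3 raises TypeError when the types are mixed, where Python 2 raised a UnicodeError.

```python
# Python 3 sketch: bytes ~ Zope 2.x 8-bit string, str ~ unicode object.
GREEK = "iso-8859-7"  # the Greek charset referred to as latin-7 above

legacy_title = "Καλημέρα".encode(GREEK)  # 8-bit string from an old property
new_value = "κόσμε"                      # unicode value that slipped in

def render(parts):
    """Hypothetical page assembler: every fragment must be 8-bit."""
    return b" ".join(parts)

# Mixing the two types fails (TypeError in Python 3).
try:
    render([legacy_title, new_value])
except TypeError as exc:
    print("mixed types:", exc)

# The fix: explicitly encode the unicode value in the site's charset,
# so every fragment is an iso-8859-7 8-bit string.
page = render([legacy_title, new_value.encode(GREEK)])
print(page.decode(GREEK))  # Καλημέρα κόσμε
```

The point of encoding explicitly is that the application, not the framework, decides which charset its 8-bit output uses.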

> > New objects should be exposing these attributes as unicode objects, and
> > legacy objects would have had to expose them as latin-1 if they wanted them
> > rendered correctly in the ZMI.
>
> (testing manage_propertiesForm)
>
> In Zope 2.5.1, the ZMI doesn't set any charset encoding, so the scenario
> above would send 8-bit character strings, and the user would have his
> browser autodetect (or not) that the encoding should be latin-7. This is
> how it works today.
>
> When migrating to Zope 2.6, all the preexisting string properties
> containing latin-7 will be sent as 8-bit strings in the ZMI, which would
> be decoded by DTML into Unicode as latin-1 (fixed encoding) so would
> render unexpectedly.
>
> Providing an explicit charset for conversions
> (maybe simply as an environment variable, that's for legacy after all)
> would correct that.
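The difference between a fixed latin-1 conversion and an explicit, configurable charset can be illustrated with a small Python 3 sketch. The environment variable name `ZOPE_LEGACY_CHARSET` and the `to_unicode` helper are hypothetical, invented for illustration; they are not Zope 2.6 settings or APIs.

```python
import os

# Hypothetical variable name; not an actual Zope 2.6 setting.
CHARSET_VAR = "ZOPE_LEGACY_CHARSET"

def to_unicode(value, default="latin-1"):
    """Coerce an 8-bit value to unicode before mixing it with unicode text,
    using the charset named in the environment, else the fixed default."""
    if isinstance(value, str):
        return value  # already unicode, nothing to do
    charset = os.environ.get(CHARSET_VAR, default)
    return value.decode(charset)

stored = "Αθήνα".encode("iso-8859-7")  # legacy Greek string property

os.environ.pop(CHARSET_VAR, None)
print(to_unicode(stored))  # fixed latin-1: accented Latin letters, not Greek

os.environ[CHARSET_VAR] = "iso-8859-7"
print(to_unicode(stored))  # Αθήνα
```

With the fixed latin-1 conversion, every byte decodes successfully, so there is no error at all; the Greek text is silently turned into accented Latin letters, which is why it "renders unexpectedly" rather than failing.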

An environment variable would be better than per-document settings.

Suppose this Greek developer wants to publish a product, and the rest of the
world finds that it only works if this environment variable is set to
latin-7. A more likely scenario is that the Greek developer will find that
products only work if the environment variable is unset, or set to latin-1.
Is this, overall, a better solution?
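The portability concern can be made concrete: with one site-wide charset, 8-bit text shipped by products written in different charsets cannot all decode correctly. A minimal Python 3 sketch, assuming two hypothetical products (`greek_product`, `french_product`) that each carry one 8-bit string in their author's charset:

```python
# Hypothetical products, each shipping 8-bit text in its author's charset.
products = {
    "greek_product": ("Αθήνα", "iso-8859-7"),
    "french_product": ("café", "latin-1"),
}

def renders_correctly(site_charset):
    """For each product, report whether its 8-bit text survives being
    decoded with a single site-wide charset."""
    results = {}
    for name, (text, product_charset) in products.items():
        raw = text.encode(product_charset)
        results[name] = raw.decode(site_charset) == text
    return results

print(renders_correctly("iso-8859-7"))
# {'greek_product': True, 'french_product': False}
print(renders_correctly("latin-1"))
# {'greek_product': False, 'french_product': True}
```

Whichever value the site chooses, one of the two products breaks, which is the trade-off the question above is pointing at.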