[Zope-Coders] Re: [Zope-dev] Unicode treatment in 2.6b1

Toby Dickenson tdickenson@geminidataloggers.com
Mon, 7 Oct 2002 08:09:47 +0100


On Sunday 06 Oct 2002 9:26 pm, Florent Guillaume wrote:
> I've been toying with ideas quite a bit.
>
> To recap the problem: many users have page templates containing 8-bit
> strings, and code generating 8-bit strings. Just because they now also
> want to output Unicode strings, it's a bit harsh to ask them to remove
> all their 8-bit strings and revert to ascii (which practically speaking
> isn't feasible for the page templates).

Who is asking them to do that?  Surely their best option is to manually
encode their unicode strings into their favorite character encoding, and
carry on as before. I agree this is an area for improved automation - but I
don't think it's fair to characterise it as a problem.
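
A minimal sketch of that manual step, written in present-day Python 3 terms
(where a "unicode string" is str and an "8-bit string" is bytes). The names
SITE_ENCODING and to_site_bytes are made up for illustration, and latin-1 is
only an assumed site encoding:

```python
# Sketch only: SITE_ENCODING and to_site_bytes are hypothetical names.
SITE_ENCODING = "latin-1"  # assumption: the encoding the site already uses


def to_site_bytes(value, encoding=SITE_ENCODING):
    """Encode unicode text to the site encoding; pass 8-bit data through."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# One legacy 8-bit fragment and one unicode fragment, joined as 8-bit output.
fragments = [b"<p>caf\xe9</p>", "<p>na\u00efve</p>"]
page = b"".join(to_site_bytes(f) for f in fragments)
```

Everything downstream keeps seeing 8-bit strings in one encoding, so existing
code carries on unchanged.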


Are these filesystem page templates, or through-the-web page templates?

For anything filesystem based, there needs to be some encoding specification
inside the file (or directory) if there is any chance of product portability.
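
As a rough illustration of an in-file specification, the sketch below looks
for a "coding: NAME" marker on the first line and decodes the rest of the file
with it. The marker syntax and the function name decode_with_declaration are
assumptions for this example, not an existing page template convention:

```python
import re

# Hypothetical marker, e.g. "<!-- coding: latin-1 -->" on the first line.
_DECLARATION = re.compile(rb"coding[:=]\s*([-\w.]+)")


def decode_with_declaration(raw, default="ascii"):
    """Decode file bytes using the encoding declared on the first line.

    Falls back to `default` when no declaration is present, so undeclared
    non-ascii content fails loudly instead of being guessed at.
    """
    first_line = raw.split(b"\n", 1)[0]
    match = _DECLARATION.search(first_line)
    encoding = match.group(1).decode("ascii") if match else default
    return raw.decode(encoding)


template = decode_with_declaration(b"<!-- coding: latin-1 -->\n<p>caf\xe9</p>\n")
```

A file with no declaration and non-ascii bytes raises UnicodeDecodeError here,
which is the portable behaviour: the product cannot depend on whatever
encoding the installing site happens to use.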

For through-the-web page templates: why can't the page template be processed
in unicode? Forget how the page is *stored* - let's assume it is an opaque
pickle inside data.fs.

> One solution would be for our derivative of StringIO in the page
> template code to check for the case where ''.join fails (UnicodeError),
> and then recode by hand any non-ascii 8-bit string into Unicode using
> some assumed encoding. But this is very slow

Have you measured the performance? This is how dtml currently works, and I
couldn't measure the difference.
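
The fallback described in the quoted text can be sketched as follows. This is
a Python 3 rendering, not the actual dtml or page template code: in Python 3
a mixed join raises TypeError where the Python 2 ''.join raised UnicodeError.
The names join_output and ASSUMED_ENCODING are hypothetical:

```python
ASSUMED_ENCODING = "latin-1"  # assumption: the site's legacy 8-bit encoding


def join_output(parts, encoding=ASSUMED_ENCODING):
    """Join output fragments, recoding only when the plain join fails."""
    try:
        # Fast path: every fragment is 8-bit, so nothing is decoded.
        return b"".join(parts)
    except TypeError:
        # Slow path: at least one fragment is unicode text, so decode the
        # 8-bit fragments with the assumed encoding and join as text.
        return "".join(
            p.decode(encoding) if isinstance(p, bytes) else p
            for p in parts
        )


legacy_only = join_output([b"caf\xe9", b" au lait"])
mixed = join_output([b"caf\xe9", " \u20ac"])
```

The recoding cost is only paid on pages that actually mix the two string
types; the standard timeit module is one way to compare the two paths on
realistic page sizes.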

> If we make the quite reasonable assumption that there will be only one
> legacy 8-bit encoding throughout the site

No chance. I use a range of third party products that use a mix of latin-1,
latin-9, and utf-8. However I don't expect to be able to have mixes handled
automatically. Handling these data type boundaries is the responsibility of
the programmer.
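
A small sketch of handling such a boundary by hand, in Python 3 terms. The
product names and the PRODUCT_ENCODINGS table are invented for illustration;
the point is that the same byte means different things depending on which
product produced it, so only the programmer can pick the right codec:

```python
# Hypothetical products mapped to the encoding each is known to emit.
PRODUCT_ENCODINGS = {
    "product_a": "latin-1",
    "product_b": "iso-8859-15",  # latin-9
    "product_c": "utf-8",
}


def from_product(product, data):
    """Decode 8-bit output at the boundary, using the product's encoding."""
    return data.decode(PRODUCT_ENCODINGS[product])


texts = [
    from_product("product_a", b"\xa4"),          # currency sign in latin-1
    from_product("product_b", b"\xa4"),          # euro sign in latin-9
    from_product("product_c", b"\xe2\x82\xac"),  # euro sign in utf-8
]
```

Byte 0xA4 decodes to two different characters under latin-1 and latin-9,
which is why a single assumed site-wide encoding would silently corrupt one
of them.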