[Zope-Coders] Re: [Zope-Checkins] CVS: Zope27/lib/python/TAL - TALInterpreter.py:1.68.26.2

09 Sep 2002 16:52:06 +0200

On Mon, 2002-09-09 at 16:28, Guido van Rossum wrote:
> > > Depends on what you want.  Since Python has no standard API for
> > > writing Unicode to files, this is indeed nontrivial.  I think the
> > > Python StringIO.py might accidentally support Unicode.
> >
> > It doesn't (python 2.2):
> > from StringIO import StringIO
> > s = StringIO()
> > s.write('é')
> > s.write(u'a')
> > s.getvalue()
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> >   File "/usr/lib/python2.2/StringIO.py", line 169, in getvalue
> >     self.buf += ''.join(self.buflist)
> > UnicodeError: ASCII decoding error: ordinal not in range(128)
>
> Depends on what you call "support".  :-)

Well, yes :-)

This doesn't necessarily mean StringIO has to be changed, but rather
that those who call it have to ensure that they always pass the same
kind of strings to it.

> > But then it's not clear what should be done in any case. In this
> > example the first 'é' is in a "native" coding and shouldn't be
> > allowed by the application. But because TALES can get its values
> > from python code, it's conceivable that we can receive native
> > strings and have to decide what to do with them.
> >
> > Localizer's choice is to convert all Unicode strings to standard
> > strings in the desired output charset, and leave "native" strings
> > alone (supposing the application has generated them in the correct
> > way).
>
> OK, but then I don't understand why it needs to solve the problem you
> show above.  Either it converts everything to an 8-bit encoding before
> it hits the StringIO object, and then you don't need a Unicode-aware
> StringIO object, or it *only* writes Unicode to the StringIO object
> (in which case StringIO.py is just fine).

To do the conversion before it hits the StringIO, a number of places
in Zope would have to be changed. So it was decided that it was
simpler to replace only the StringIO part and make it do the
conversions.
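
A rough sketch of what "make the StringIO do the conversions" could look
like. This is not the actual Zope code. It uses Python 3's bytes/str as
stand-ins for the 8-bit and Unicode strings of Python 2.2, and the class
name EncodingStringIO and its charset parameter are hypothetical:

```python
from io import BytesIO


class EncodingStringIO(BytesIO):
    """Hypothetical drop-in buffer that encodes text chunks on write."""

    def __init__(self, charset="utf-8"):
        super().__init__()
        self.charset = charset

    def write(self, chunk):
        # Unicode text is converted to the output charset; 8-bit
        # "native" chunks are assumed to be correct and left alone.
        if isinstance(chunk, str):
            chunk = chunk.encode(self.charset)
        return super().write(chunk)


out = EncodingStringIO(charset="latin-1")
out.write(b"\xe9")   # 8-bit "native" string (é in Latin-1)
out.write("a")       # Unicode string
data = out.getvalue()
```

Callers keep writing whatever they have, and only the buffer needs to
know the output charset.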

> > 2.6's choice is to allow building a complete response using Unicode
> > strings, and do the conversion only upon publishing to the
> > client. But then we have to convert a mix of non-unicode strings and
> > unicode strings, which can cause the problems outlined above.
>
> I don't understand how your monkey patch helps you solve this
> problem.
>
> (And if you have a patch for StringIO.py, maybe you can make it
> available for the Python standard library?  Others might need this.)

For the Unicode-aware StringIO part I'm just experimenting here; it's
too early for that. Probably, a Unicode-aware StringIO would be one that
takes an additional join function as a parameter, this function being
responsible for joining strings of arbitrary type and returning a sane
result. Basically join_unicode, with ''.join as the default.
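
A minimal sketch of that idea, under some assumptions: Python 3's
bytes/str stand in for the 8-bit and Unicode strings of Python 2.2, the
class name JoiningBuffer is made up, and this join_unicode body (decode
8-bit chunks with an assumed charset) is only one guess at what such a
function might do:

```python
class JoiningBuffer:
    """Minimal write-only buffer; `join` decides how chunks are combined."""

    def __init__(self, join="".join):
        self._chunks = []
        self._join = join

    def write(self, chunk):
        self._chunks.append(chunk)

    def getvalue(self):
        return self._join(self._chunks)


def join_unicode(chunks, encoding="latin-1"):
    # Hypothetical helper: decode 8-bit chunks using an assumed charset,
    # pass text chunks through unchanged, and return a single text string.
    return "".join(
        c.decode(encoding) if isinstance(c, bytes) else c for c in chunks
    )


buf = JoiningBuffer(join=join_unicode)
buf.write(b"\xe9")   # 8-bit "native" string (é in Latin-1)
buf.write("a")       # Unicode string
result = buf.getvalue()
```

With the default ''.join the same two writes would still fail at
getvalue() time, so the policy for mixed input lives entirely in the
join function that the caller passes in.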

> > > > What do you think about it now. Should I revert them?
> > >
> > > Ask the Zope Pope.
> >
> > Jim, do you want those reverted?
> >
> > Again, for the record, my argument to leave those in is: they don't
> > harm, they'll be removed later, and in the meantime third-party products
> > can still function.
>
> At the very best, I propose something like this instead:
>
> from StringIO import StringIO
>
> CustomStringIOClass = StringIO
>
> def setCustomStringIO(C):
>     CustomStringIOClass = C
>
> class C: # This is the class you patched
>
>     def StringIO(self):
>         return self.CustomStringIOClass()

Yes, that's much cleaner.
I can do that change if I'm given a go-ahead.
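
For reference, a runnable variant of that hook might look like the
sketch below. It is not the code that was eventually committed. To run,
the setter needs a `global` statement (otherwise the assignment only
binds a local name) and the method has to read the module-level name
rather than an attribute on self. io.StringIO is used here as a Python 3
stand-in for StringIO.StringIO, and RecordingStringIO is a hypothetical
third-party replacement:

```python
from io import StringIO

CustomStringIOClass = StringIO  # module-level default factory


def setCustomStringIO(cls):
    # Rebind the module-level factory; `global` makes the assignment
    # affect the module name instead of creating a local variable.
    global CustomStringIOClass
    CustomStringIOClass = cls


class C:  # stand-in for the class that was monkey-patched

    def StringIO(self):
        # Look the factory up at call time so later overrides take effect.
        return CustomStringIOClass()


class RecordingStringIO(StringIO):
    """Hypothetical replacement a third-party product might install."""


setCustomStringIO(RecordingStringIO)
stream = C().StringIO()
```

Third-party products then call setCustomStringIO() once instead of
patching the class itself.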

Florent

-- 
Florent Guillaume, Nuxeo (Paris, France)
+33 1 40 33 79 87  http://nuxeo.com  mailto:fg@nuxeo.com