[ZPT] Re: Fix for UnicodeError: ASCII decoding error: ordinal not in range(128)

vlado vlado@vintech.bg
24 Jan 2003 18:42:30 +0200


Hi,

I tried LOCALIZER_USE_ZOPE_UNICODE=1 and hit the same UnicodeError
again.
I'm using Zope 2.6.1b1, Localizer 1.0 and TranslationService 0.2.
This is the traceback:

    * Module ZPublisher.Publish, line 150, in publish_module
    * Module Products.Localizer, line 55, in new_publish
    * Module ZPublisher.Publish, line 114, in publish
    * Module Zope.App.startup, line 182, in zpublisher_exception_hook
    * Module ZPublisher.Publish, line 98, in publish
    * Module ZPublisher.mapply, line 88, in mapply
    * Module ZPublisher.Publish, line 39, in call_object
    * Module Shared.DC.Scripts.Bindings, line 252, in __call__
    * Module Shared.DC.Scripts.Bindings, line 283, in _bindAndExec
    * Module Products.CMFCore.FSPageTemplate, line 189, in _exec
    * Module Products.CMFCore.FSPageTemplate, line 122, in pt_render
    * Module Products.PageTemplates.PageTemplate, line 95, in pt_render
      <FSPageTemplate at /Sites/test/login_form used for /Sites/test>
    * Module TAL.TALInterpreter, line 186, in __call__
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 689, in do_useMacro
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 622, in do_loop_tal
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 400, in do_optTag_tal
    * Module TAL.TALInterpreter, line 385, in do_optTag
    * Module TAL.TALInterpreter, line 380, in no_tag
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 655, in do_condition
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 400, in do_optTag_tal
    * Module TAL.TALInterpreter, line 385, in do_optTag
    * Module TAL.TALInterpreter, line 380, in no_tag
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 689, in do_useMacro
    * Module TAL.TALInterpreter, line 230, in interpret
    * Module TAL.TALInterpreter, line 745, in do_onError_tal
    * Module StringIO, line 160, in getvalue

UnicodeError: ASCII decoding error: ordinal not in range(128) (Also, an
error occurred while attempting to render the standard error message.)

On Fri, 2003-01-24 at 04:05, Florent Guillaume wrote:
> Hi Folks,
>
> Sorry for the crosspost but this really covers ZPT and Localizer, and is
> of great interest to the Plone i18n users. Please keep your answers to
> the lists where they are legitimate -- and I'd appreciate being kept as
> Cc.
>
>
> Ok, I got down to the reason for the infamous "UnicodeError: ASCII
> decoding error: ordinal not in range(128)". Thanks to all who cooperated
> in that matter.
>
> Readers wanting the quick solution without the rest of the discussion
> can skip to the part bracketed by #######.
>
> First a reminder of the problem for those not familiar with it.
>
> In many situations, in a multilingual Plone site using Localizer, people
> got the above error.
>
> This in fact happened in the following circumstances:
>
> - A page template like:
>         <h1 i18n:translate="edit_type_header">
>         Edit an object of type
>           <span i18n:name="type">
>             <span i18n:translate=""
>                   tal:content="python:here.getTypeInfo().Title()"
>                   tal:omit-tag="">Type</span>
>             </span>
>         </h1>
>
> - A translation for edit_type_header of the form
>         Éditer un objet de type ${type}
>   where the translation contains non-ASCII characters ("É" here),
>
> - A substituted string for ${type} that itself has non-ASCII characters,
>   for instance "déjà".
>
> What happens behind the scenes during the template evaluation is complex,
> but at some point the <span i18n:translate> gets evaluated, the message
> catalog gets consulted and a u'déjà', as Unicode, is returned.
>
> At that point Localizer has a mechanism to convert all Unicode
> strings to their final browser encoding, as a plain string of bytes,
> so for instance using UTF-8 it would substitute 'd\xc3\xa9j\xc3\xa0'.
>
> The problem here is that this string is not destined to go to the
> browser yet, but will first be used further in the ZPT processing to be
> substituted for ${type}. So later in the processing, we have to
> substitute
>      u'Éditer un objet de type ${type}'
> using the mapping
>      {u'type': 'd\xc3\xa9j\xc3\xa0'}
>
> At that point, we have a mix of Unicode (which is legitimate) and some
> plain string encoded in the final output. This encoding came too soon!
> We would still like to have Unicode here... If we still had it it would
> work.
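
A minimal sketch of that failing substitution, written for Python 3 as
an illustration only: bytes stands in for the plain string of Python 2
and str for its unicode type. The helper substitute() is hypothetical
and is not the TAL or Localizer code; it only makes explicit the ASCII
decode that Python 2 performed implicitly when the two types were mixed.

```python
# Python 3 illustration: bytes plays the role of a Python 2 plain string,
# str the role of a Python 2 unicode string.
template = 'Éditer un objet de type ${type}'
early_mapping = {'type': 'déjà'.encode('utf-8')}  # encoded too soon


def substitute(template, mapping):
    """Replace ${name} placeholders (hypothetical helper, not the TAL code)."""
    result = template
    for name, value in mapping.items():
        if isinstance(value, bytes):
            # Python 2 did this ASCII decode implicitly when mixing types.
            value = value.decode('ascii')
        result = result.replace('${%s}' % name, value)
    return result


try:
    substitute(template, early_mapping)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # the message ends in "ordinal not in range(128)"

# If the value is still Unicode, the same substitution succeeds.
ok = substitute(template, {'type': 'déjà'})
```
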
>
> Fortunately, I kind of foresaw this sort of problem a few months ago,
> and I included in Localizer a way to turn off its early conversion to
> browser output encoding.
>
> #######
>
> To do that, you have to launch Zope with the LOCALIZER_USE_ZOPE_UNICODE
> environment variable set to something not empty, for instance "yes".
>
> #######
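
For reference, "set to something not empty" can be read as the check
sketched below. This is an assumption about the semantics only: the
function name use_zope_unicode() is hypothetical, and how Localizer
actually reads the variable is not shown in this thread.

```python
import os


def use_zope_unicode(environ=os.environ):
    """Return True if LOCALIZER_USE_ZOPE_UNICODE is set and non-empty.

    Hypothetical helper: it only models "set to something not empty".
    """
    return bool(environ.get('LOCALIZER_USE_ZOPE_UNICODE'))
```

Under this reading both "yes" and "1" switch the early conversion off,
while an unset or empty variable leaves it on.
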
>
> Now, why did Localizer choose to do early encoding by default? The
> problem is the following: during ZPT parsing, we're building something
> from the concatenation of a list of strings, some of which are Unicode if
> they come from a message catalog (or some TALES returning Unicode), some
> of which are plain strings like most of the page template itself.
>
> If all the plain strings are only ever pure ASCII, then there's no
> problem doing a join of all of them with something Unicode, and the
> result will be Unicode. That's what pure Zope 2.6 does by default. It
> then, in ZPublisher, proceeds to encode that resulting Unicode string in
> the preferred browser encoding and sends that. This mode is what you get
> if you define LOCALIZER_USE_ZOPE_UNICODE.
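
A rough sketch of that default mode, again in Python 3 with bytes
standing in for Python 2 plain strings. py2_join() is a hypothetical
helper that imitates the implicit ASCII coercion of a Python 2 join; it
is not the ZPT or ZPublisher code, and UTF-8 is just an example of a
browser encoding.

```python
def py2_join(parts):
    """Join parts the way Python 2 would: byte strings are decoded as ASCII."""
    return ''.join(
        p.decode('ascii') if isinstance(p, bytes) else p
        for p in parts
    )


# Pure-ASCII template fragments mixed with Unicode from the message catalog.
parts = [b'<h1>', 'Éditer un objet de type déjà', b'</h1>']

page = py2_join(parts)        # the result is Unicode
body = page.encode('utf-8')   # encoded once, at publishing time
```
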
>
> But when Localizer was introduced, it was to be used by people who had
> localized their page templates by hand and thus included a lot of
> non-ASCII characters in them, in their preferred encoding, say, UTF-8,
> together with a RESPONSE.setHeader('Content-Type') with that encoding.
> So because of those non-ASCII characters, the strategy of the previous
> paragraph wouldn't work. So Localizer decided to encode all Unicode
> strings to the preferred encoding (assumed to be the same as the browser
> encoding) as soon as it saw them inside the ZPT parsing.
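
The early-encoding strategy can be sketched as follows (Python 3, bytes
for Python 2 plain strings). early_encode_join() is a hypothetical name
and only models the idea described above, not Localizer's actual
implementation; the preferred encoding is assumed to be UTF-8.

```python
def early_encode_join(parts, encoding='utf-8'):
    """Encode every Unicode part as soon as it is seen, then join bytes."""
    return b''.join(
        p.encode(encoding) if isinstance(p, str) else p
        for p in parts
    )


# A hand-localized template fragment already stored as UTF-8 bytes,
# followed by a Unicode string coming from the message catalog.
parts = ['<h1>Résumé</h1>'.encode('utf-8'), 'déjà']
output = early_encode_join(parts)
```

This works as long as every part goes straight to the output, which is
exactly what is no longer true once a part is reused as an i18n:name
value.
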
>
> Unfortunately, as we saw at the beginning, this can't work in the
> presence of i18n:name substitutions.
>
> As a conclusion, I recommend that Localizer use the standard Zope
> behavior by default, and only enable its early conversion when some new
> environment variable, for instance LOCALIZER_UNICODE_CONVERSION, is set.
> This will only be useful to people who have half-translated their site
> (some Unicode from the message catalog, and still some non-ASCII in the
> templates).
>
>
>
> A final digression about ZPT:
>
> I think the correct way to build the result of a ZPT would be to build a
> Unicode string as soon as TALInterpreter detects a non-ASCII string. It
> would then decode the non-ASCII strings to Unicode using some kind of site- or
> page-default encoding. This would avoid most of our problems, and would
> anyway be more robust. It would simply mean replacing StringIO's
> (actually FasterStringIO's) getvalue method with an intelligent join
> that does the conversion I just outlined if needed.
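
One possible shape for such an "intelligent join", as a sketch only
(Python 3, bytes for Python 2 plain strings). smart_join() and its
default_encoding parameter are hypothetical names. As a simplification
it decodes every byte string rather than switching to Unicode only once
a non-ASCII string is detected, which gives the same result for any
ASCII-compatible default encoding.

```python
def smart_join(parts, default_encoding='utf-8'):
    """Join mixed parts into Unicode, decoding byte strings on the way.

    default_encoding stands for the site- or page-default encoding,
    whose choice is the open question.
    """
    return ''.join(
        p.decode(default_encoding) if isinstance(p, bytes) else p
        for p in parts
    )


# Non-ASCII template bytes and Unicode catalog strings now mix safely.
utf8_page = smart_join(['<h1>Résumé</h1>'.encode('utf-8'), 'déjà'])
latin1_page = smart_join(['<h1>Résumé</h1>'.encode('latin-1'), 'déjà'],
                         default_encoding='latin-1')
```
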
>
> There remains the problem of deciding which is the default encoding to
> use...
>
>
>
> Thanks for any comments (and please watch where you send them!).
>
>
> Florent
>
>
> --
> Florent Guillaume, Nuxeo (Paris, France)
> +33 1 40 33 79 87  http://nuxeo.com  mailto:fg@nuxeo.com
>
--
Vladimir Iliev