[ZPT] Re: Fix for UnicodeError: ASCII decoding error: ordinal not in range(128)

Juan David Ibáñez Palomar j-david@noos.fr
Fri, 24 Jan 2003 13:09:09 +0100




Hi,

Thanks Florent for the report.

I'm not willing to "fix" something just to break something else.
I prefer to go deeper into the solution you explain at the end,
"A final digression about ZPT": even if ZPT does not do it by itself,
maybe Localizer could with a dynamic patch.

Unfortunately I don't have time right now to research it myself.
This means that either you or somebody else finds a solution that
works in both situations, or you wait for me to find the time to
find one.

Just for the record I attach a zexp file. It is a folder with two
page templates (case1 and case2): case1 works when the variable
LOCALIZER_USE_ZOPE_UNICODE is set, case2 works when it isn't. To
test it use Localizer 1.0 and TranslationService 0.2. I want
something that works for both cases; nothing else is an option.


Best regards,
david


Florent Guillaume wrote:

>Hi Folks,
>
>Sorry for the crosspost but this really covers ZPT and Localizer, and is
>of great interest to the Plone i18n users. Please keep your answers to
>the lists where they are legitimate -- and I'd appreciate being kept in
>Cc.
>
>
>Ok, I got down to the reason for the infamous "UnicodeError: ASCII
>decoding error: ordinal not in range(128)". Thanks to all who cooperated
>in that matter.
>
>Readers wanting the quick solution without the rest of the discussion
>can skip to the part bracketed by #######.
>
>First a reminder of the problem for those not familiar with it.
>
>In many situations, in a multilingual Plone site using Localizer, people
>got the above error.
>
>This in fact happened in the following circumstances:
>
>- A page template like:
>        <h1 i18n:translate="edit_type_header">
>        Edit an object of type
>          <span i18n:name="type">
>            <span i18n:translate=""         
>                  tal:content="python:here.getTypeInfo().Title()" 
>                  tal:omit-tag="">Type</span>
>            </span> 
>        </h1>
>
>- A translation for edit_type_header of the form
>        Éditer un objet de type ${type}
>  where the translation contains non-ascii characters ("É" here),
>
>- A substituted string for ${type} that itself has non-ascii characters,
>  for instance "déjà".
>
>What happens behind the scenes during the template evaluation is complex,
>but at some point the <span i18n:translate> gets evaluated, the message
>catalog gets consulted and a u'déjà', as Unicode, is returned.
>
>At that point Localizer has a mechanism to convert all Unicode
>strings to their final browser encoding, as a plain string of bytes,
>so for instance using UTF-8 it would substitute 'd\xc3\xa9j\xc3\xa0'.
>
>The problem here is that this string is not destined to go to the
>browser yet, but will first be used further in the ZPT processing to be
>substituted for ${type}. So later in the processing, we have to
>substitute
>     u'Éditer un objet de type ${type}'
>using the mapping
>     {u'type': 'd\xc3\xa9j\xc3\xa0'}
>
>At that point, we have a mix of Unicode (which is legitimate) and a
>plain string already encoded for the final output. This encoding came
>too soon! We would still like to have Unicode here... If we still had
>it, it would work.
>
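
A minimal sketch of the failure described above, written for Python 3,
where `str` plays the role of Python 2's `unicode` and `bytes` the role
of its plain `str`. The helpers `py2_style_coerce` and `substitute` are
hypothetical and only imitate the implicit ASCII decoding that Python 2
performs when the two types are mixed; they are not ZPT or Localizer code.

```python
def py2_style_coerce(value):
    """Imitate Python 2: a byte string meeting Unicode is decoded as ASCII."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError as soon as a byte is >= 128.
        return value.decode("ascii")
    return value


def substitute(template, mapping):
    """Replace each ${name} in a Unicode template with its coerced value."""
    for name, value in mapping.items():
        template = template.replace("${%s}" % name, py2_style_coerce(value))
    return template


# u'Éditer un objet de type ${type}', written with escapes.
translation = "\u00c9diter un objet de type ${type}"

# Early encoding: the value has already been turned into UTF-8 bytes.
early_value = "d\u00e9j\u00e0".encode("utf-8")
try:
    substitute(translation, {"type": early_value})
    early_failed = False
except UnicodeDecodeError:
    early_failed = True

# Late encoding: the value is still Unicode, so the substitution succeeds.
late_result = substitute(translation, {"type": "d\u00e9j\u00e0"})
```

With the value kept as Unicode the substitution goes through, and the
encoding to the browser charset can happen once, at the very end.
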
>Fortunately, I kind of foresaw this sort of problem a few months ago,
>and I included in Localizer a way to turn off its early conversion to
>browser output encoding.
>
>#######
>
>To do that, you have to launch Zope with the LOCALIZER_USE_ZOPE_UNICODE
>environment variable set to something not empty, for instance "yes".
>
>#######
>
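
As an illustration only, a flag like this could be read along the
following lines. The helper name `use_zope_unicode` is hypothetical and
this is not Localizer's actual code; the sketch only shows the "set to
something not empty" rule.

```python
import os


def use_zope_unicode(environ=None):
    """Return True when LOCALIZER_USE_ZOPE_UNICODE is set to a non-empty value."""
    if environ is None:
        environ = os.environ
    # A missing variable and an empty one are both treated as "off".
    return bool(environ.get("LOCALIZER_USE_ZOPE_UNICODE", ""))
```
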
>Now, why did Localizer choose to do early encoding by default? The
>problem is the following: during ZPT parsing, we're building something
>from the concatenation of a list of strings, some of which are Unicode if
>they come from a message catalog (or some TALES returning Unicode), some
>of which are plain strings like most of the page template itself.
>
>If all the plain strings are only ever pure ASCII, then there's no
>problem doing a join of all of them with something Unicode, and the
>result will be Unicode. That's what pure Zope 2.6 does by default. It
>then, in ZPublisher, proceeds to encode that resulting Unicode string in
>the preferred browser encoding and sends that. This mode is what you get
>if you define LOCALIZER_USE_ZOPE_UNICODE.
>
>But when Localizer was introduced, it was to be used by people who had
>localized their page templates by hand and thus included a lot of
>non-ASCII characters in them, in their preferred encoding, say, UTF-8,
>together with a RESPONSE.setHeader('Content-Type') with that encoding.
>So because of those non-ASCII characters, the strategy of the previous
>paragraph wouldn't work. So Localizer decided to encode all Unicode
>strings to the preferred encoding (assumed to be the same as the browser
>encoding) as soon as it saw them inside the ZPT parsing.
>
>Unfortunately, as we saw at the beginning, this can't work in the
>presence of i18n:name substitutions.
>
>As a conclusion, I recommend that Localizer use the standard Zope
>behavior by default, and only enable its early conversion when some new
>environment variable, for instance LOCALIZER_UNICODE_CONVERSION, is set.
>This will only be useful to people who have half-translated their site
>(some Unicode from the message catalog, and still some non-ASCII in the
>templates).
>
>
>
>A final digression about ZPT:
>
>I think the correct way to build the result of a ZPT would be to build a
>Unicode string as soon as TALInterpreter detects a non-ASCII string. It
>would then decode the non-ASCII string to Unicode using some kind of site-
>or page-default encoding. This would avoid most of our problems, and would
>anyway be more robust. It would simply mean replacing StringIO's
>(actually FasterStringIO's) getvalue method with an intelligent join
>that does the conversion I just outlined if needed.
>
>There remains the problem of deciding which is the default encoding to
>use...
>
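
A rough sketch of such an intelligent join, in Python 3 terms (`bytes`
standing in for Python 2 plain strings, `str` for Unicode). The name
`smart_join` and its `default_encoding` parameter are assumptions for
illustration; nothing here is taken from TALInterpreter or
FasterStringIO.

```python
def smart_join(parts, default_encoding="utf-8"):
    """Join str and bytes parts into a single str.

    Pure-ASCII bytes are decoded as ASCII. Any other bytes are decoded
    with default_encoding, which stands in for a site- or page-level
    setting.
    """
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            try:
                part = part.decode("ascii")
            except UnicodeDecodeError:
                part = part.decode(default_encoding)
        decoded.append(part)
    return "".join(decoded)


# Template fragments as bytes, a catalog message as Unicode, and a value
# that was written by hand in UTF-8.
parts = [
    b"<h1>",
    "\u00c9diter un objet de type ",
    b"d\xc3\xa9j\xc3\xa0",
    b"</h1>",
]
page = smart_join(parts)

# A single encode at the very end, for the browser.
body = page.encode("utf-8")
```

The open question from the message stays open in the sketch: the result
is only right if `default_encoding` matches the encoding the template
author actually used.
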
>
>
>Thanks for any comments (and please watch where you send them!).
>
>
>Florent
>
>
>  
>


-- 
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software


[Attachment: utests.zexp, application/octet-stream]