[Zope3-Users] Re: Unicode for Stupid Americans (like me)?

Jeff Shell eucci.group at gmail.com
Wed Feb 28 16:06:39 EST 2007


On 2/28/07, Philipp von Weitershausen <philipp at weitershausen.de> wrote:
> Jeff Shell wrote:
> > - Not have any encode / decode errors. 'ascii codec doesn't recognize
> > character ... at position ...'. I don't want to keep on bullying
> > through whenever this pops up.
>
> You can't just simply do str(some_unicode) or unicode(some_str), unless
> you really know that you're only dealing with the ASCII subset in both
> cases. Use explicit encodings to convert.
>
> Now, the trick is obviously to know the encoding. A 'str' object is
> worth squat if you don't know the encoding that goes along with it. In
> other words, (some_str, encoding) is isomorphic to a unicode object.

Ahh. I finally get this now. I was casting back and forth with wild
abandon in some key places - in one particular place I was doing wild
encoding somersaults when I really meant to be doing a small set of
decode tries. I think this is why I was seeing customer garbage: I was
turning unicode into strs and back again long before the final
response was all built up.
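
To make the "(some_str, encoding) is a unicode object" point concrete,
here is a minimal sketch. It is written for Python 3, where `bytes`
plays the role of the `str` discussed in this thread and `str` plays
the role of `unicode`. The sample word is just an illustration.

```python
# Python 3 naming: bytes ~ the thread's 'str', str ~ the thread's 'unicode'.
text = "caf\u00e9"  # illustrative sample with one non-ASCII character

# The same text produces different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Bytes only turn back into the right text if you know their encoding.
round_trip_utf8 = utf8_bytes.decode("utf-8")
round_trip_latin1 = latin1_bytes.decode("latin-1")

# Decoding with the wrong encoding "works" but yields garbage characters.
garbled = utf8_bytes.decode("latin-1")

# Relying on ASCII (what an implicit conversion effectively does in
# Python 2) fails as soon as a non-ASCII character shows up.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Doing the encode/decode step once at each boundary, rather than
repeatedly in the middle of the pipeline, avoids the double-encoding
garbage shown in `garbled`.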

> >  - HOW do I know what a browser has sent me? There doesn't seem to be
> > a real way of handling this. Do I guess?
>
> That's sorta what zope.publisher does. Actually, it figures that if the
> browser sends an Accept-Charset header, the stuff that it's sending to us
> would be encoded in one of those encodings, so it tries the ones in
> Accept-Charset until it's lucky. It falls back to UTF-8.
>
> This seems to work. But yeah, it's relying on implementation details of
> the browser and it's weird.

Ugh. I don't know how I missed that header. I was always looking for a
content-type on the post, hoping that it had the information.

I was finally able to confirm that Zope was handing me the data
properly; it was some of my HTML generation code that was mangling
data on output.
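
The "try the Accept-Charset encodings until one works, then fall back
to UTF-8" idea can be sketched roughly as follows. This is not
zope.publisher's actual code; the function names and the header
parsing are my own assumptions, written for Python 3.

```python
def parse_accept_charset(header):
    """Return charset names from an Accept-Charset value, best q first."""
    candidates = []
    for position, item in enumerate(header.split(",")):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name or name == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            candidates.append((-quality, position, name))
    return [name for _, _, name in sorted(candidates)]


def decode_request_bytes(raw, accept_charset=""):
    """Decode raw form bytes by guessing from Accept-Charset (hypothetical)."""
    for charset in parse_accept_charset(accept_charset) + ["utf-8"]:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: try the next candidate.
            continue
    # Last resort: keep going, marking undecodable bytes.
    return raw.decode("utf-8", errors="replace")
```

One caveat with any guessing scheme like this: latin-1 can decode
every possible byte sequence, so if it is tried first it will always
"succeed", even on UTF-8 input.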

> > But again,
> > how do I know when to decode from latin-1 and when to decode from
> > UTF-8? When or why should I encode to one or the other at response
> > time? Should I worry at all?
>
> If you're using Zope, you don't have to encode outgoing text at all,
> unless you're setting a non-text content-type on the outgoing response.
> If the content-type is text/*, you can just return unicode from your
> browser view and zope.publisher will use the best encoding that the
> browser prefers (from Accept-Charset). "Best" meaning that if the
> browser accepts latin-1,utf-8 and your page contains Korean text, it'll
> use utf-8, not latin-1. utf-8 is always the fallback anyway, so
> encoding can never fail.

This finally made sense to me as well. I had a form with a widget
rendered by my own HTML generation code, and with a zope.app.widgets
text field. I pasted Sam Ruby's "Internationalization" diacritic-heavy
string into both fields. When I saw that the zope.app.widget was
rendering properly while my own field was not, that sealed it.
Unfortunately, all of my prior tests had involved my own widget, since
that is where I had seen the junk characters.
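
The output side described above (use the first encoding the browser
prefers that can actually represent the page, otherwise UTF-8) might
look something like this. Again, this is only a sketch under my own
assumptions, in Python 3, and `encode_response` is a hypothetical
name rather than anything in zope.publisher.

```python
def encode_response(text, accepted_charsets):
    """Return (charset, body_bytes) for a text response (hypothetical)."""
    for charset in accepted_charsets:
        try:
            return charset, text.encode(charset)
        except (UnicodeEncodeError, LookupError):
            # This charset cannot represent the text, or is unknown.
            continue
    # UTF-8 can encode any text, so this fallback cannot fail.
    return "utf-8", text.encode("utf-8")


# A latin-1 capable page goes out as latin-1; Korean text forces UTF-8.
western = encode_response("caf\u00e9", ["iso-8859-1", "utf-8"])
korean = encode_response("\ud55c\uad6d\uc5b4", ["iso-8859-1", "utf-8"])
```

The view code itself never encodes anything; it hands text to the
publisher and the encoding decision happens once, at the very end.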

Now I ensure that my HTML generator is all unicode. Any basic string
that it encounters, which typically comes from source code, is decoded
into unicode immediately. As mentioned above, I had been wildly and
inappropriately encoding to strings with some forceful settings so
that I could join elements together.
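
A stripped-down illustration of that "decode immediately, join only
text" rule, in Python 3 terms. The helper names and the assumption
that byte strings from source code are UTF-8 are mine, not part of
the actual generator.

```python
from html import escape

# Assumption: byte strings coming from source files are UTF-8.
SOURCE_ENCODING = "utf-8"


def to_text(value):
    """Turn bytes into text right away; leave text as it is."""
    if isinstance(value, bytes):
        return value.decode(SOURCE_ENCODING)
    return str(value)


def render_element(tag, *children):
    """Build one HTML element from text only (hypothetical helper)."""
    name = to_text(tag)
    parts = ["<", name, ">"]
    parts.extend(escape(to_text(child)) for child in children)
    parts.extend(["</", name, ">"])
    # Every part is text by now, so joining needs no encoding tricks.
    return "".join(parts)


paragraph = render_element(b"p", b"caf\xc3\xa9 ", "na\u00efve & co")
```

Because nothing is encoded inside the generator, the result can be
returned as-is and encoded exactly once when the response goes out.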

> I'm wondering if I make this clear enough in my book. It's always hard
> to tell by myself since these things seem obvious to me. If you got any
> constructive feedback regarding this, I'll be more than happy to hear it
> and consequently improve the book for you "Stupid Americans" :).

At a quick glance, I didn't see where this might have been described.
There's no mention of unicode in the back index, and from the table of
contents I didn't see much besides the chapter on internationalization
(which we're completely avoiding until we absolutely need to do it).

But this helps. Between all of the answers I've received thus far, I
finally have a grasp of what I'm doing. I'll try to codify it into a
useful document.

Thanks!

-- 
Jeff Shell

