[Grok-dev] Problem with character encoding

Sebastian Ware sebastian at urbantalk.se
Tue Jul 8 17:49:21 EDT 2008


On 8 Jul 2008, at 23:41, Luciano Ramalho wrote:

> On Tue, Jul 8, 2008 at 6:29 PM, Sebastian Ware  
> <sebastian at urbantalk.se> wrote:
>> I know this is slightly off topic but maybe there is a simple answer.
>>
>> I have a unicode attribute [message] of an object stored in the ZODB and I
>> want to encode it to "iso-8859-1" and then use urllib.urlencode to create
>> parameters for an HTTP POST operation.
>>
>> The problem is that the characters "åäöÅÄÖ" are encoded to this:
>>
>> '%E5%E4%F6%C5%C4%D6'
>>
>> but should be encoded to this:
>>
>> '%C3%A5%C3%A4%C3%B6%C3%85%C3%84%C3%96'
>>
>> I notice the following (the first one is what I want):
>>
>> >>> u'å'.encode('iso-8859-1')
>> '\xc3\xa5'
>> >>> self.context.message[0].encode('iso-8859-1')
>> '\xe5'
>>
>> Any hints?
>
> Something is wrong in this picture, Sebastian, because you say you want
> to encode with iso-8859-1 but then you say the correct encoding is one
> with two bytes per character. However, iso-8859-1 uses only one byte
> per character. It is UTF-8 which uses 2 or more bytes for non-ASCII
> characters. Did I misunderstand your message, or are you working too
> late?
>
> Cheers,
>
> Luciano


Many thanks for your patience Luciano! I wish I was just tired, but  
unfortunately it is the character encoding that confuses me :(

I was expecting

    u'å'.encode('iso-8859-1')

to encode the unicode string to an 'iso-8859-1' encoded string, but as
you are pointing out, it returns a two-byte encoding. However, that
two-byte form is eventually encoded properly by urllib.urlencode and
allows me to (in this case) send an SMS with non-ASCII characters.
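
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the interpreter session received UTF-8 bytes from the terminal but decoded them as iso-8859-1, so the literal u'å' actually held two characters. The sketch below reproduces both results. It is written for Python 3 (the thread uses Python 2), and all variable names are illustrative.

```python
# Sketch in Python 3; the thread itself uses Python 2.
# Variable names are illustrative, not taken from the original code.

correct = "\u00e5"  # 'å' as a single code point

# iso-8859-1 maps every code point below 256 to exactly one byte.
latin1_bytes = correct.encode("iso-8859-1")   # one byte: 0xE5
# UTF-8 needs two bytes for this character.
utf8_bytes = correct.encode("utf-8")          # two bytes: 0xC3 0xA5

# Assumed cause of the interpreter result: UTF-8 input bytes were
# decoded as iso-8859-1, producing a two-character string.
misdecoded = utf8_bytes.decode("iso-8859-1")
# Encoding that string back to iso-8859-1 returns the UTF-8 bytes.
roundtrip = misdecoded.encode("iso-8859-1")

print(latin1_bytes, utf8_bytes, len(misdecoded), roundtrip)
```

If this assumption holds, the value stored in the ZODB is the correct one-character string, and the interpreter literal is the one that was misread.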

The spec I need to meet is:

  - perform an HTTP POST with an 'iso-8859-1' encoded string

I can do it in the Python interpreter, but once I use a string stored
in the ZODB, non-ASCII characters go bonkers...
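
For comparison, here is a minimal sketch of how the two percent-encoded forms quoted earlier can be produced explicitly. It assumes Python 3, where urllib.parse.urlencode accepts an encoding argument (the Python 2 urllib.urlencode in this thread does not). The helper build_post_body and the field name "message" are hypothetical.

```python
from urllib.parse import urlencode


def build_post_body(message, encoding="iso-8859-1"):
    """Return a form-encoded POST body as bytes.

    The text is percent-encoded using `encoding`. The field name
    "message" is a placeholder, not taken from any real API.
    """
    query = urlencode({"message": message}, encoding=encoding)
    # The percent-encoded query is pure ASCII at this point.
    return query.encode("ascii")


text = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"  # the characters åäöÅÄÖ

latin1_body = build_post_body(text)                  # one byte per character
utf8_body = build_post_body(text, encoding="utf-8")  # two bytes per character

print(latin1_body)
print(utf8_body)
```

The first body matches the '%E5%E4%F6%C5%C4%D6' form and the second the '%C3%A5...' form. Which one the receiving service expects depends on that service.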

Regards, Sebastian

