I can second this. With CVS Zope (I did the last cvs up just now) I'm seeing a very curious thing: displaying .../index_html is fine, but return context.index_html(context,request) produces broken characters instead of ISO-Latin-1 umlauts. In my case (Konqueror on Linux) it seems that the text/html;charset=UTF-8 header breaks the page, because the byte values for the umlauts are correct. This is further confirmed by the fact that forcing Konqueror to display the page as iso8859-1 fixes it.
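A standalone sketch (plain Python, no Zope involved) of the mismatch described above: bytes that are really ISO-8859-1 but labelled UTF-8 either fail to decode or come out as mojibake, while forcing the Latin-1 interpretation works. The example string is mine, chosen only to contain an umlaut.

```python
# Bytes are really ISO-8859-1, but the page header claims UTF-8.
latin1_bytes = u'Grün'.encode('latin-1')     # b'Gr\xfcn' on the wire

try:
    latin1_bytes.decode('utf-8')             # what a UTF-8 browser attempts
except UnicodeDecodeError:
    print('0xFC is not valid UTF-8')         # lenient browsers show garbage instead

# Forcing the browser back to iso8859-1 (as with Konqueror) recovers the text:
print(latin1_bytes.decode('latin-1'))        # Grün
```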
Hmm, you may want to check out http://collector.zope.org/Zope/517, though it could be the same difficulty we experienced earlier.
The problem here was that Zope thought it was returning UTF-8, while it was really returning ISO-8859-1. This was due to the <dtml-var "u''"> statement not having the desired effect. <dtml-var "u' '"> (notice the space) seemed to work brilliantly.
So how are these Unicode changes supposed to work? Are non-ascii characters forbidden now? And how do I get UTF-8 text into Zope?
There are converters inside Zope. UTF-8 is simply a transport format, although it may also be used for storage to save space. There is lots of software that supports UTF-8 today. This is the future.
While I'm quite sure that this will help Zope in the Asian region, it seems quite inconvenient for the ISO-Latin-1 world :(
This will be a win in Europe as well, especially for multilingual sites. IIRC there are 15 variants of ISO 8859.
I18N is *very* important, and Unicode is an essential ingredient.
Arnar Lundesgaard
Hi
I'm Japanese, so I'm sorry that I don't write English well.
While I'm quite sure that this will help Zope in the Asian region, it seems quite inconvenient for the ISO-Latin-1 world :(
This will be a win in Europe as well, especially for multilingual sites. IIRC there are 15 variants of ISO 8859.
I18N is *very* important, and Unicode is an essential ingredient.
[1] These 3 files (3 lines in total): change "encode('latin1')" to "encode('utf-8')".
$ find . -name '*.py' -exec grep -l 'encode.*latin1' {} \;
./lib/python/ZPublisher/Converters.py
./lib/python/ZPublisher/HTTPRequest.py
./lib/python/ZPublisher/HTTPResponse.py
[2] This line in lib/python/App/dtml/manage_page_header.dtml: change 'iso-8859-1' to 'utf-8'.
<dtml-call "REQUEST.set('management_page_charset','iso-8859-1')">
These changes seem to work well for me, but I have not tested them enough.
Is there some reason to treat unicode strings not as 'utf-8' but as 'latin1' (iso-8859-1)?
Additionally, Japanese has several encodings: euc-jp, shift_jis, iso-2022-jp and utf-8 (and others?). I want a mechanism to change the encoding dynamically.
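As a sketch of why a per-request encoding switch is feasible in principle: Python's codec machinery can already convert the same text between the Japanese encodings mentioned above (iso-2022-jp is the registered codec name for the ISO 2022 Japanese encoding). The sample text is mine.

```python
# The same text round-trips losslessly through each Japanese encoding,
# so choosing an encoding dynamically is just a codec lookup.
text = u'日本語'  # "Japanese language"
for enc in ('euc-jp', 'shift_jis', 'iso-2022-jp', 'utf-8'):
    data = text.encode(enc)            # bytes in the requested encoding
    assert data.decode(enc) == text    # decoding restores the original
    print(enc, len(data), 'bytes')
```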
[2] This line in lib/python/App/dtml/manage_page_header.dtml: change 'iso-8859-1' to 'utf-8'.
<dtml-call "REQUEST.set('management_page_charset','iso-8859-1')">
Bad news. That will cause all management form submissions to encode strings in UTF-8. 99% of the methods to which the strings are being submitted will not be expecting this, and will corrupt characters whose Unicode code point is >127.
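A standalone sketch of the corruption just described: a method that assumes Latin-1 receives UTF-8 bytes and turns every code point above 127 into mojibake. The example value is mine.

```python
# The form now submits UTF-8 bytes...
submitted = u'Grün'.encode('utf-8')              # b'Gr\xc3\xbcn'

# ...but a method written for the old behaviour treats them as Latin-1:
as_seen_by_method = submitted.decode('latin-1')
print(as_seen_by_method)                         # GrÃ¼n -- the umlaut is corrupted
```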
If you have a ZMI form that *is* expecting this then you need to make some other changes to avoid breakage. Essentially just adding :utf8: marshalling tags, possibly some :strings: and :ustring: too.
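A rough sketch, outside Zope, of what a utf8 plus ustring tag pair asks the marshaller to do. The function name and the simplified tag parsing here are hypothetical; the real ZPublisher supports many more converters than this.

```python
# Hypothetical, simplified imitation of ZPublisher field-name marshalling.
def marshal_field(name, raw):
    """Split 'base:tag1:tag2' and apply the utf8/ustring conversion only."""
    base, _, rest = name.partition(':')
    tags = rest.split(':') if rest else []
    if 'utf8' in tags and 'ustring' in tags:
        return base, raw.decode('utf-8')   # bytes arrive UTF-8 encoded
    return base, raw                       # untagged fields stay as raw bytes

field, value = marshal_field('title:utf8:ustring', u'Grün'.encode('utf-8'))
print(field, value)                        # title Grün
```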
(yes, this sucks. The problem is that browsers don't specify the character encoding used in form submissions. At some point we need to discuss the way forward on this issue....)
[1] These 3 files (3 lines in total): change "encode('latin1')" to "encode('utf-8')".
$ find . -name '*.py' -exec grep -l 'encode.*latin1' {} \;
./lib/python/ZPublisher/Converters.py
./lib/python/ZPublisher/HTTPRequest.py
./lib/python/ZPublisher/HTTPResponse.py
Even more bad news. Suppose a dtml page is not yet prepared to handle unicode (because it hasn't had the changes described above) but it 'accidentally' encounters a unicode attribute. This happens often in the ZMI when objects have a unicode 'title', because many products render the title attribute of *other* objects. We can't force the response to UTF-8 because this will cause the same breakage described above.
For more details see:
http://www.zope.org/Members/htrd/howto/unicode http://www.zope.org/Members/htrd/howto/unicode-zdg-changes
But I have not tested them enough.
obviously. ;-)
I want a mechanism to change the encoding dynamically.
The manage_properties page that Arnar Lundesgaard has been working with is a good example. It switches between latin-1 and utf-8 automatically, depending on whether any unicode properties have been defined (to support *really* old browsers that don't understand UTF-8).
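A sketch of that kind of automatic switching. The function and its selection rule are mine, not the actual manage_properties implementation: serve the legacy charset whenever every value fits in it, and fall back to UTF-8 otherwise.

```python
# Pick the narrowest charset that can represent every property value, so
# old browsers get latin-1 and only pages with wider characters get UTF-8.
def pick_charset(values):
    for v in values:
        try:
            v.encode('latin-1')
        except UnicodeEncodeError:
            return 'utf-8'        # at least one property needs Unicode
    return 'iso-8859-1'           # everything fits the legacy charset

print(pick_charset([u'Grün', u'title']))   # iso-8859-1
print(pick_charset([u'日本語']))           # utf-8
```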
In article 200209270735.46452.tdickenson@geminidataloggers.com you write:
If you have a ZMI form that *is* expecting this then you need to make some other changes to avoid breakage. Essentially just adding :utf8: marshalling tags, possibly some :strings: and :ustring: too.
(yes, this sucks. The problem is that browsers don't specify the character encoding used in form submissions. At some point we need to discuss the way forward on this issue....)
There is a standard accept-charset attribute of forms, which says what encodings are accepted by the form handler (Zope here). I think we should use it and set it to UTF-8 in those cases.
Florent
(yes, this sucks. The problem is that browsers don't specify the character encoding used in form submissions. At some point we need to discuss the way forward on this issue....)
There is a standard accept-charset attribute of forms, which says what encodings are accepted by the form handler (Zope here). I think we should use it and set it to UTF-8 in those cases.
Just to be clear, this is an HTML attribute of the <form> tag. For instance:
<form action="foo" ... accept-charset="UTF-8"> ... </form>
This instructs the browser that it should send the content of the form in the accepted charset. As a default, it is recommended that user agents use the encoding of the document, but this is not a strict requirement in HTML4.
Florent
On Saturday 28 Sep 2002 4:38 pm, Florent Guillaume wrote:
(yes, this sucks. The problem is that browsers don't specify the character encoding used in form submissions. At some point we need to discuss the way forward on this issue....)
Just to be clear, this is an HTML attribute of the <form> tag. For instance:
<form action="foo" ... accept-charset="UTF-8"> ... </form>
This instructs the browser that it should send the content of the form in the accepted charset.
Yes, accept-charset could be part of a full solution to this problem, but I don't think it is the whole solution....
Are you suggesting that a method could assume its form submissions would always be made in UTF-8? That would cause problems if a submission was made from:
* some other form that didn't have an accept-charset
* some non-browser code that synthesizes HTTP requests
A further problem is that we want this decoding to be performed in ZPublisher, but at that point in the publishing process it doesn't know which method is going to be called. That means the UTF-8 assumption can't be made independently for each method.
One answer to this problem applies when browsers include the charset attribute in "multipart/form-data" POST requests: ZPublisher then knows unambiguously what encoding the browser used.
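A sketch of that unambiguous case, using the standard library's email message parsing to pull the charset parameter out of a part's Content-Type header (the header value below is an illustrative example, not captured browser traffic):

```python
# Extract the charset parameter from a form part's Content-Type header;
# with it, the publisher can decode the submitted bytes unambiguously.
from email.message import Message

part = Message()
part['Content-Type'] = 'text/plain; charset=UTF-8'
charset = part.get_param('charset')
print(charset)                     # UTF-8

raw = u'Grün'.encode(charset)      # the bytes the browser would send
print(raw.decode(charset))         # Grün -- decoded with certainty
```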
Sadly I can't see a nice way to do the same for GET requests.
On Mon, 2002-09-30 at 09:17, Toby Dickenson wrote:
On Saturday 28 Sep 2002 4:38 pm, Florent Guillaume wrote:
<form action="foo" ... accept-charset="UTF-8"> ... </form>
This instructs the browser it should send the content of the form in the accepted charset.
Yes, accept-charset could be part of a full solution to this problem, but I don't think it is the whole solution....
Are you suggesting that a method could assume its form submissions would always be made in UTF-8? That would cause problems if a submission was made from:
- some other form that didn't have an accept-charset
- some non-browser code that synthesizes HTTP requests
Yes, there is no good way.
A further problem is that we want this decoding to be performed in ZPublisher, but at that point in the publishing process it doesn't know which method is going to be called. That means the UTF-8 assumption can't be made independently for each method.
One answer to this problem applies when browsers include the charset attribute in "multipart/form-data" POST requests: ZPublisher then knows unambiguously what encoding the browser used.
This really sucks; you'd think that by 2002 all recent browsers would send a content-type: text/plain;charset=foobar in multipart/form-data, as the spec (from 1998) recommends... But even Mozilla 1.1 doesn't do it.
Sadly I can't see a nice way to do the same for GET requests.
As an aside, one interesting tidbit about Mozilla: if you paste Unicode into a field of a form without an explicit encoding (accept-charset or document-charset), it encodes Unicode characters into &#xxxx; and sends that on the wire.
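Those &#xxxx; numeric character references can be decoded on the server side. As an illustration (using the html module of modern Python, an assumption; it was not what 2002-era Zope would have used):

```python
# Decode the &#xxxx; numeric character references Mozilla puts on the wire.
import html

wire_value = 'caf&#233;'           # what arrives for a pasted 'é' (U+00E9)
print(html.unescape(wire_value))   # café
```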
Anyway, in the near future I see no alternative to putting :utf8: into field names, and using accept-charset="UTF-8" or a UTF-8 encoding for the document.
Florent