[ZPT] Making PageTemplate's edit pages Unicode aware

Stuart Bishop stuart.b at commonground.com.au
Mon Mar 29 03:13:06 EST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On 27/03/2004, at 9:57 PM, Dieter Maurer wrote:

> Stuart Bishop wrote at 2004-3-25 12:27 +1100:
>> Currently, if you enter non-ascii text into the title or contents
>> fields on a PageTemplate's edit page, the data ends up stored as
>> an encoded string (using management_page_charset, if it is set.
>> Unknown encoding if it is not).
>>
>> This should be easy to fix using the foo:charset:ustring notation
>> to have Zope convert the encoded strings to Unicode. However, the
>> file upload feature is more problematic. Should the file upload
>> try converting the file to Unicode from UTF-8 and raise an exception
>> if this is not possible? I personally feel this is preferable to
>> ending up with arbitrarily encoded document source, with no idea
>> of the character set used.
>
> I do not think that Zope should convert when it does not know the
> encoding. I am unaware that a missing "management_page_charset"
> can be interpreted as "UTF-8". If this were the case, conversion
> to Unicode might be correct. By the way: the HTML specification
> says that uploaded files should come with a "content-type" declaration.
> In this case, the charset specified there (if any) should be used
> to determine the encoding.

Yes - A missing management_page_charset should probably be interpreted
as either US-ASCII or ISO-8859-1. US-ASCII is probably more correct,
but I would guess that most browsers will be configured to use
ISO-8859-1 as their default (and this might be specified in the HTML
spec?)

I guess using the charset type the browser tells us for file uploads
means we can blame the browser. I don't know how this could be reliable
(since text files themselves don't encode their character set unless
they happen to be UTF-16 or have a BOM). I am wondering if having a
file upload function is incompatible with a Unicode aware page
templates product.
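
To make the trade-off concrete, here is a rough sketch of what "trust the
browser, otherwise refuse" could look like. It is not Zope code:
decode_upload and its arguments are names I made up for illustration, and
it is written for Python 3, where bytes is the traditional string and str
is the Unicode string. It looks for a BOM first, then the charset parameter
of the Content-Type header, and refuses non-ASCII data when neither is
available.

```python
import codecs
from email.message import Message

# Byte-order marks, checked longest-first so UTF-32 is not mistaken for UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_upload(data, content_type=None):
    """Decode uploaded bytes to text (hypothetical helper, not a Zope API).

    Raises ValueError when the data is not ASCII and no charset is known.
    """
    # 1. A BOM in the data itself is the strongest evidence we have.
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return data.decode(codec)
    # 2. Otherwise use the charset parameter of the Content-Type header, if any.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return data.decode(charset)
    # 3. Pure ASCII is unambiguous; anything else is refused rather than guessed.
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError(
            "upload contains non-ASCII bytes but no charset was declared")
```

Whether browsers send a usable charset parameter at all is exactly the
reliability question raised above; the sketch only shows where each source
of information would sit in the order of preference.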

If management_page_charset is not set, it is unknown what charset
is being used. The only way of knowing the character set of data that
has been submitted is to know the character set of the form that it
was submitted from. All other mechanisms do not work due to
incompatibilities in how the browsers work.

Currently, if you create a page template that contains non-ASCII
characters, any tal:content or tal:replace expressions that return
Unicode will raise a Unicode error when the template is rendered.
This can be demonstrated simply:
     <html>
       <div>My 2¢</div>
       <div tal:content="python:u'My 2\N{CENT SIGN}'">My 2¢</div>
     </html>
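
My understanding of the failure, as a sketch rather than a trace through
the ZPT code: when an 8-bit string and a Unicode string are joined, Python 2
first decodes the 8-bit string as ASCII, and that fails on the encoded cent
sign. The snippet below mimics this in Python 3, where the decode has to be
written out explicitly. The UTF-8 encoding of the stored source and the
name join_like_python2 are assumptions made for the example.

```python
# The template source as stored today: an encoded byte string (UTF-8 assumed).
source_bytes = "<div>My 2\N{CENT SIGN}</div>".encode("utf-8")

# What the tal:content expression returns: a Unicode string.
expr_result = "My 2\N{CENT SIGN}"


def join_like_python2(byte_part, unicode_part):
    """Mimic Python 2's str + unicode: the byte string is decoded as ASCII."""
    return byte_part.decode("ascii") + unicode_part


try:
    join_like_python2(source_bytes, expr_result)
    outcome = "joined"
except UnicodeDecodeError as exc:
    # The encoded cent sign is not valid ASCII, so the join fails here.
    outcome = "UnicodeDecodeError: %s" % exc.reason

print(outcome)
```

If the source were held as a Unicode string as well, the join would not need
any implicit decoding and the error would not arise.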
	
These are the things I think need to be fixed in Zope's Page Templates
implementation to make them Unicode aware. There may be more (?):

    - It should be possible for the actual page template source to
      be stored as a Unicode string. Currently, there is an assert
      ensuring it is a traditional string.

    - The title property should be a Unicode string.

    - PageTemplateFile should grow an optional charset parameter,
      defaulting to US-ASCII.

    - PageTemplate.write(text) should raise an exception if text
      is neither a Unicode string nor an ASCII string.

    - The ZopePageTemplate edit page should use Zope's
      :charset:ustring notation so Unicode strings get passed
      to its handler.

    - The file upload widget needs to either be removed, or grow
      a charset box. I don't think either of these solutions is
      ideal :-(
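
For the PageTemplateFile item, the behaviour I have in mind is roughly the
following. This is a standalone sketch in Python 3, not a patch:
read_template_source is a hypothetical helper, and only the charset
parameter and its US-ASCII default come from the list above.

```python
import io
import os
import tempfile


def read_template_source(path, charset="us-ascii"):
    """Read a template file and return its source as a Unicode string.

    Hypothetical helper, not part of Zope. Raises UnicodeDecodeError when
    the file contains bytes that are invalid in the declared charset.
    """
    with io.open(path, "r", encoding=charset) as f:
        return f.read()


# Usage: a UTF-8 file loads when its charset is given, and is refused
# under the US-ASCII default instead of being stored undecoded.
fd, demo_path = tempfile.mkstemp(suffix=".pt")
with os.fdopen(fd, "wb") as f:
    f.write("<div>My 2\N{CENT SIGN}</div>".encode("utf-8"))

loaded = read_template_source(demo_path, charset="utf-8")
try:
    read_template_source(demo_path)
    default_ok = True
except UnicodeDecodeError:
    default_ok = False
os.remove(demo_path)
```

Defaulting to US-ASCII means existing ASCII-only templates keep working
unchanged, while anything else has to declare its encoding.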

Note that when I say 'Unicode string', we can still store ASCII
text using a traditional string to save space.

My application is currently using a ZopePageTemplate subclass that
has been modified to use Unicode strings for the document source
and title, and it seems to be functioning just fine. Does anyone
know if that "assert type(text) == type('')" in PageTemplate.write
is there for a reason?
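
What I would like write() to check in place of that assert is something
like the sketch below. It is written for Python 3, where str is the Unicode
string and bytes is the traditional string, and check_template_text is a
made-up name rather than anything in PageTemplate.

```python
def check_template_text(text):
    """Return text as a Unicode string, or raise if its encoding is unknown.

    Hypothetical replacement for the type assert, not actual Zope code.
    """
    if isinstance(text, str):
        # Already Unicode: nothing to guess.
        return text
    if isinstance(text, bytes):
        try:
            # ASCII-only byte strings are unambiguous and can be accepted.
            return text.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError(
                "encoded byte strings must be ASCII-only; "
                "decode to Unicode before calling write()")
    raise TypeError(
        "template source must be a string, got %s" % type(text).__name__)
```

This sketch always returns a Unicode string for simplicity; keeping
ASCII-only text in a traditional string to save space would be an equally
valid choice.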

- --  
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (Darwin)

iD8DBQFAZ9qWAfqZj7rGN0oRAkNUAJ9DzbEUOSsLSbhl4lAwbi0vTxVxdQCdFHHh
K4vCbBjEusbPI+iuu8E+7oY=
=nLPs
-----END PGP SIGNATURE-----



