[Zope-dev] Problem with XML in Zope 220b1

Brian Lloyd Brian@digicool.com
Tue, 13 Jun 2000 15:43:16 -0400


> I have a lot of Chinese XML files stored in Zope. The internal encoding is
> UTF-8. Everything was fine in 214, 216 and 220a1. Now with 220b1, some
> characters are (apparently?) randomly turned into lt;, gt; and the like.
> Now this looks like some unwanted HTML escaping, but the leading '&' is
> missing, and the characters are definitely all in the range above 127
> (this is a property of UTF-8), so there is no direct relationship to the
> code points of >, < and co.
> 
> Any ideas what could have gone wrong here?

Yes - during the alpha period we got a bug report concerning the fact
that Netscape browsers honor the Windows "extended Latin-1" characters
\213 and \233 (the single angle quotation marks ‹ and ›) as < and >.
That means that if you don't filter those as part of html_quote-ing,
some Netscape versions are open to the same sort of script-kiddie
attacks that they would be if the HTML were not quoted at all :(
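To see why UTF-8 text is hit, consider that every UTF-8 continuation byte lies in the range \200-\277 (0x80-0xBF), so the byte values \213 (0x8B) and \233 (0x9B) turn up inside many Chinese characters. For example, U+4E0B (下) encodes as E4 B8 8B. The minimal Python sketch below assumes a quoting function that blindly replaces those two bytes with &lt; and &gt;. It is not Zope's actual html_quote: the name html_quote_bytes and the exact replacement table are hypothetical. Replacing the final byte splits the character and leaves an entity in its place, which plausibly accounts for the stray lt; and gt; in the report.

```python
# Hypothetical byte-level quoting, modelled on the filtering described above.
# NOT Zope's real html_quote; the name and replacement table are assumptions.

REPLACEMENTS = [
    (b"&", b"&amp;"),    # must run first so the entities added below survive
    (b"<", b"&lt;"),
    (b">", b"&gt;"),
    (b'"', b"&quot;"),
    (b"\213", b"&lt;"),  # 0x8B: cp1252 single left-pointing angle quotation mark
    (b"\233", b"&gt;"),  # 0x9B: cp1252 single right-pointing angle quotation mark
]


def html_quote_bytes(data: bytes) -> bytes:
    """Apply each replacement to the raw bytes, ignoring the text encoding."""
    for old, new in REPLACEMENTS:
        data = data.replace(old, new)
    return data


if __name__ == "__main__":
    # U+4E0B (下) ends in byte 8B; U+4E1B ends in byte 9B.
    for char in ("\u4e0b", "\u4e1b"):
        raw = char.encode("utf-8")
        quoted = html_quote_bytes(raw)
        print(raw.hex(), "->", quoted)
        try:
            quoted.decode("utf-8")
        except UnicodeDecodeError as exc:
            # The split character is no longer valid UTF-8.
            print("  invalid UTF-8:", exc.reason)
```

Because the replacement works on bytes rather than on decoded characters, any encoding whose multi-byte sequences can contain 0x8B or 0x9B is affected, not just UTF-8.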

I'm not quite sure what the right answer is here. How are you using
the html_quote format in your application?


Brian Lloyd        brian@digicool.com
Software Engineer  540.371.6909              
Digital Creations  http://www.digicool.com