[Zope] Squishdot 1.1.0 (stripogram.py): HTML filtering problems

J M Cerqueira Esteves jmce@artenumerica.com
Tue, 8 May 2001 20:25:00 +0100


On Tue, May 08, 2001 at 12:48:10AM +0100, Chris Withers wrote:
>    - HTML parsing now done using the Strip-o-Gram library.

A few minutes ago I ran a few tests on the html2safehtml function in
stripogram.py and found that it is possible to force arbitrary tags
into the output text.

  html2safehtml ('Roses <b>are</B> red,<br>violets <i>are</i> blue', 
                 valid_tags=['b', 'i', 'br'])

returns

    'Roses <b>are</b> red,<br>violets <i>are</i> blue'

as expected, but

  html2safehtml ('Roses <b>are</B> red,<br/>violets <i>are</i> blue', 
                 valid_tags=['b','i','br'])

returns

    'Roses <b>are</b> red,<br>>violets <i>are<i> blue'

Notice that '<br/>' (valid in XHTML) becomes '<br>>',
and that the closing '</i>' at the end comes out as... '<i>'.
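
To make this kind of breakage easier to spot in further tests, here is a
rough sketch of a checker that tallies opening and closing tags in the
filter output. The names tag_balance and void_tags are my own, not part of
stripogram, and the regular expression is only a crude approximation of
tag syntax.

```python
import re

# Hypothetical helper, not part of stripogram: crude matcher for <tag> and </tag>.
TAG_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>')

def tag_balance(html, void_tags=('br',)):
    """Return {tag: opens - closes} for every tag whose counts differ.

    Tags listed in void_tags (assumed to need no closing tag) are ignored.
    """
    balance = {}
    for slash, name in TAG_RE.findall(html):
        name = name.lower()
        if name in void_tags:
            continue
        balance[name] = balance.get(name, 0) + (-1 if slash else 1)
    return dict((name, n) for name, n in balance.items() if n != 0)

# The two outputs quoted above.
good = 'Roses <b>are</b> red,<br>violets <i>are</i> blue'
bad = 'Roses <b>are</b> red,<br>>violets <i>are<i> blue'

print(tag_balance(good))  # {}
print(tag_balance(bad))   # {'i': 2}
```

The second result shows two opening '<i>' tags and no closing one. The stray
'>' after '<br>' is not detected by this helper, since it only counts tags.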

But it gets more interesting: the result of

  html2safehtml ('Roses <b>are</B> red,<br/QUACK>violets <i>are</i> blue', 
                 valid_tags=['b','i','br'])

is

    'Roses <b>are</b> red,<br>QUACK>violets <i>are<i> blue'

inspiring one to write

  html2safehtml ('Roses <b>are</B> red,<br/<QUACK>violets <i>are</i> blue', 
                 valid_tags=['b','i','br'])     

and getting

    'Roses <b>are</b> red,<br><QUACK>violets <i>are<i> blue'

or even 

  html2safehtml ('Roses <b>are</B> red,<br/<blink>QUACK<//blink> violets '
                 '<i>are</i> blue', 
                 valid_tags=['b','i','br'])

successfully smuggling a <blink>...</blink> inside the result:

    'Roses <b>are</b> red,<br><blink>QUACK</blink> violets <i>are</i> blue'

(Notice that the closing '</i>' is now OK again, and that I had to use
'<//blink>' in order to get '</blink>'.)
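
For anyone who wants to try other inputs, here is a rough sketch of a check
that flags any tag name in the filter output that is not in the whitelist.
The name unexpected_tags is my own, not part of stripogram; the strings
below are simply the results quoted in this message, and the regular
expression is a crude approximation rather than a real HTML parser.

```python
import re

# Hypothetical helper, not part of stripogram: find the name after '<' or '</'.
TAG_NAME_RE = re.compile(r'</?\s*([A-Za-z][A-Za-z0-9]*)')

def unexpected_tags(html, valid_tags):
    """Return the sorted, lower-cased tag names in html that are not whitelisted."""
    allowed = set(tag.lower() for tag in valid_tags)
    found = set(name.lower() for name in TAG_NAME_RE.findall(html))
    return sorted(found - allowed)

# Outputs of html2safehtml as reported in this message, in order.
reported_outputs = [
    'Roses <b>are</b> red,<br>violets <i>are</i> blue',
    'Roses <b>are</b> red,<br>>violets <i>are<i> blue',
    'Roses <b>are</b> red,<br>QUACK>violets <i>are<i> blue',
    'Roses <b>are</b> red,<br><QUACK>violets <i>are<i> blue',
    'Roses <b>are</b> red,<br><blink>QUACK</blink> violets <i>are</i> blue',
]

for output in reported_outputs:
    print(unexpected_tags(output, ['b', 'i', 'br']))
```

Only the last two outputs are flagged (for 'quack' and 'blink'); the earlier
ones are malformed but contain no tag outside the whitelist.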


Maybe a problem with sgmllib?  I have no time for further tests now...

-- 
 jmce:  +351 919838775   ~  http://jmce.artenumerica.org/