[Zope-dev] TALParser barfing on byte-order marked utf8 XML files.

Romain Slootmaekers romain at zzict.nl
Fri Jul 9 09:21:16 EDT 2004


Yo,

We are using TAL for things other than ZPT, but we are having problems with
files that include a byte order mark (BOM) preamble.

The problem is that although the underlying XML parser is capable of
parsing these kinds of files, TALParser initialises its parent without an
encoding (XMLParser.__init__(self) in TALParser.py, line 27).
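To illustrate the claim that the parser underneath copes with a BOM, here is a minimal sketch that uses Python's pyexpat binding directly rather than TAL, assuming expat is the parser TAL's XMLParser wraps. The sample document is made up, and the code is written for Python 3 (the attached test is Python 2). Given the raw bytes, expat recognises the UTF-8 BOM itself and hands back already-decoded text.

```python
import xml.parsers.expat

# Made-up document: UTF-8 BOM (EF BB BF), then <doc>ä</doc> encoded as UTF-8.
raw = b"\xef\xbb\xbf<doc>\xc3\xa4</doc>"

names = []   # element names reported by expat
chunks = []  # character data, already decoded to text by expat

parser = xml.parsers.expat.ParserCreate()  # no encoding given: expat autodetects
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.CharacterDataHandler = chunks.append
parser.Parse(raw, True)  # True marks this as the final chunk of input

# The BOM is consumed by expat: it neither breaks parsing nor shows up as data.
print(names)
print(u"".join(chunks) == u"\u00e4")
```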

Anyway,
I have attached a small example (test.py + test.xml) that illustrates
the problem with Zope 2.7.1.

Running the test gives:

UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in
position 0: ordinal not in range(128)

which is perfectly logical: U+FEFF (the BOM itself) is not ASCII.
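The first error can be reproduced without Zope. Under Python 2, a unicode string handed to an API that expects a byte string is converted with the default encoding (normally 'ascii', which is what the test's print of sys.getdefaultencoding() shows); presumably that is what happens inside TALParser.parseString(). The sketch below makes the conversion explicit. The file contents are made up, and the code is written for Python 3.

```python
import codecs

# Made-up file contents: a UTF-8 BOM followed by a small document.
raw = codecs.BOM_UTF8 + u"<doc>\u00e4</doc>".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")

error_start = None
try:
    # Python 2 performs this conversion implicitly (with the default
    # encoding) when a unicode string is passed where bytes are expected.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error_start = exc.start  # index of the first character ASCII cannot hold

print(hex(ord(text[0])))  # 0xfeff
print(error_start)        # 0
```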

Chipping away the preamble (data=data[4:]) gives problems further on in
the file, as the test example has some German characters (ä):

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
position 50: ordinal not in range(128)

which is also perfectly logical: ä is U+00E4 (228), outside the 0-127
ASCII range.
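Removing the BOM only moves the failure to the next non-ASCII character. A minimal sketch with made-up contents, written for Python 3: it strips the BOM with the "utf-8-sig" codec (added in Python 2.5, so not available in the Python 2.3 used by Zope 2.7), then shows that the ASCII conversion still fails on ä. It also shows where the number 132 comes from: it is the byte for ä in DOS code page 437, not its Unicode value.

```python
import codecs

# Made-up contents: UTF-8 BOM + <doc>ä</doc>.
raw = codecs.BOM_UTF8 + u"<doc>\u00e4</doc>".encode("utf-8")

# "utf-8-sig" strips a leading BOM while decoding.
text = raw.decode("utf-8-sig")

first_bad = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    first_bad = exc.start  # now the position of the a-umlaut, not the BOM

umlaut = text[first_bad]
print(first_bad)                                # 5
print(ord(umlaut))                              # 228, i.e. U+00E4
print(bytearray(umlaut.encode("cp437"))[0])     # 132 in code page 437

# UTF-8 has no such limit: encoding the text back to UTF-8 succeeds.
utf8 = text.encode("utf-8")
print(repr(utf8))
```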


My question is simply: why is TALParser not taking the encoding into
account? Is this deliberate, or is it an oversight?


Romain Slootmaekers.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.xml
Type: text/xml
Size: 93 bytes
Desc: not available
Url : http://mail.zope.org/pipermail/zope-dev/attachments/20040709/cd31fbf0/test.xml
-------------- next part --------------
# test.py: reproduces the TALParser problem with BOM-prefixed
# UTF-8 XML files (Zope 2.7.1, Python 2).
import sys
import codecs
import StringIO
from xml.dom.minidom import parseString

from TAL.TALParser import TALParser
from TAL.TALInterpreter import TALInterpreter
from TAL.DummyEngine import DummyEngine

# Python 2's default encoding, normally 'ascii'
print sys.getdefaultencoding()

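def readData():
    # Open in binary mode; the codecs reader does the decoding.
    f = open('test.xml', 'rb')
    readerClass = codecs.getreader('utf8')
    print readerClass
    reader = readerClass(f)
    # The plain utf8 codec keeps the BOM, so data[0] == u'\ufeff'.
    data = reader.read()
    f.close()
    print "size = %s" % len(data)
    return data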


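def expand(xml):
    parser = TALParser()
    # Workaround attempt: chop off the preamble. After utf8 decoding the
    # BOM is the single character u'\ufeff', so xml[1:] would suffice.
    xml = xml[4:]
    # Compile the template; the UnicodeEncodeError is most likely raised here.
    parser.parseString(xml)
    program, macros = parser.getCode()
    # Run the compiled program with a dummy expression engine.
    engine = DummyEngine(0)
    out = StringIO.StringIO()
    interpreter = TALInterpreter(program, macros, engine, stream=out)
    interpreter()
    result = out.getvalue()
    return result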

data = readData()
expanded = expand(data)
# Parse the TAL output with minidom to check that it is well-formed.
document = parseString(expanded)

print "ok"
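A possible workaround in the test script, sketched under the assumption that TALParser.parseString() passes its argument unchanged to expat: either read test.xml as raw bytes and skip the codecs reader, or encode the decoded text back to UTF-8 before calling parser.parseString(xml). The helper below (as_utf8_bytes is a hypothetical name) does the latter. It is written for Python 3 and exercised against pyexpat directly rather than TAL, so it has not been tried with Zope 2.7.1, and it does not settle whether TALParser itself should accept an encoding.

```python
import xml.parsers.expat


def as_utf8_bytes(data):
    """Hypothetical helper: return the document as UTF-8 encoded bytes.

    Byte strings are passed through untouched; text is encoded to UTF-8, so
    a leading U+FEFF turns back into the UTF-8 BOM, which expat skips.
    """
    if isinstance(data, bytes):
        return data
    return data.encode("utf-8")


# Decoded text as readData() would return it: BOM still at index 0.
decoded = u"\ufeff<doc>\u00e4</doc>"

chunks = []
parser = xml.parsers.expat.ParserCreate()
parser.CharacterDataHandler = chunks.append
parser.Parse(as_utf8_bytes(decoded), True)
print(u"".join(chunks) == u"\u00e4")  # True
```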

