[Zope-Checkins] SVN: Zope/branches/2.11/ LP/#324876: tightened regex for detecting the charset

Andreas Jung andreas at andreas-jung.com
Wed Feb 4 05:57:46 EST 2009


Log message for revision 96069:
  LP/#324876: tightened regex for detecting the charset
  from a meta http-equiv header
  

Changed:
  U   Zope/branches/2.11/doc/CHANGES.txt
  U   Zope/branches/2.11/lib/python/Products/PageTemplates/utils.py

-=-
Modified: Zope/branches/2.11/doc/CHANGES.txt
===================================================================
--- Zope/branches/2.11/doc/CHANGES.txt	2009-02-04 10:56:35 UTC (rev 96068)
+++ Zope/branches/2.11/doc/CHANGES.txt	2009-02-04 10:57:45 UTC (rev 96069)
@@ -19,6 +19,9 @@
 
     Bugs Fixed
 
+      - LP/#324876: tighened regex for detecting the charset
+        from a meta-equiv header
+
       - configure script: setting ZOPE_VERS to '2.11'
 
       - Acquisition wrappers now correctly proxy __iter__.

Modified: Zope/branches/2.11/lib/python/Products/PageTemplates/utils.py
===================================================================
--- Zope/branches/2.11/lib/python/Products/PageTemplates/utils.py	2009-02-04 10:56:35 UTC (rev 96068)
+++ Zope/branches/2.11/lib/python/Products/PageTemplates/utils.py	2009-02-04 10:57:45 UTC (rev 96069)
@@ -20,7 +20,15 @@
 
 
 xml_preamble_reg = re.compile(r'^<\?xml.*?encoding="(.*?)".*?\?>', re.M)
-http_equiv_reg = re.compile(r'(<meta.*?http\-equiv.*?content-type.*?>)', re.I|re.M|re.S)
+# This regular expression is defined extremely carelessly. It starts
+#  with a tag beginning with 'meta' and extends until an arbitrary
+#  'content-type' (maybe in a completely unrelated element).
+#  Tighten the expression a bit.
+#  Note that using a regular expression at all is unreliable as it does
+#  not know about e.g. HTML comments. A robust solution would need to
+#  use an HTML parser to locate the 'meta' tag.
+#http_equiv_reg = re.compile(r'(<meta.*?http\-equiv.*?content-type.*?>)', re.I|re.M|re.S)
+http_equiv_reg = re.compile(r'(<meta\s+[^>]*?http\-equiv[^>]*?content-type.*?>)', re.I|re.M|re.S)
 http_equiv_reg2 = re.compile(r'charset.*?=.*?(?P<charset>[\w\-]*)', re.I|re.M|re.S)
 
 def encodingFromXMLPreamble(xml):
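
One way to see what the tightening changes is to run both patterns, copied from the diff, against two small documents. The helper `charset_from_meta` and the sample strings are hypothetical, and the sketch assumes the two patterns are applied in sequence (locate the tag with `http_equiv_reg`, then read the value with `http_equiv_reg2`); the calling code in `utils.py` is not shown in this diff.

```python
import re

# Pattern as it stood before revision 96069 (copied from the diff).
old_http_equiv_reg = re.compile(
    r'(<meta.*?http\-equiv.*?content-type.*?>)', re.I | re.M | re.S)

# Tightened pattern from revision 96069: "http-equiv" and "content-type"
# must both appear before the first '>' that follows '<meta'.
new_http_equiv_reg = re.compile(
    r'(<meta\s+[^>]*?http\-equiv[^>]*?content-type.*?>)', re.I | re.M | re.S)

# Unchanged second-stage pattern that extracts the charset value.
http_equiv_reg2 = re.compile(
    r'charset.*?=.*?(?P<charset>[\w\-]*)', re.I | re.M | re.S)


def charset_from_meta(html, meta_reg):
    """Hypothetical helper: find the meta tag with meta_reg, then read its
    charset with http_equiv_reg2. Returns None if either step fails."""
    mo = meta_reg.search(html)
    if mo is None:
        return None
    mo2 = http_equiv_reg2.search(mo.group(1))
    if mo2 is None:
        return None
    return mo2.group('charset')


# A well-formed declaration: both patterns agree.
good = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# A <meta> without http-equiv, followed by prose that happens to mention
# http-equiv and content-type inside a different element.
tricky = ('<meta name="generator" content="Zope">\n'
          '<p>Use http-equiv content-type with charset=latin-1.</p>')

for name, doc in [('good', good), ('tricky', tricky)]:
    print(name,
          charset_from_meta(doc, old_http_equiv_reg),
          charset_from_meta(doc, new_http_equiv_reg))
# good utf-8 utf-8
# tricky latin-1 None
```

With `re.S`, the old pattern's `.*?` runs from the first `<meta` across the closing `>` and into the `<p>` element, so it reports a charset that no meta tag declares. The `[^>]*?` pieces in the new pattern keep the match inside a single tag.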

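The new comment in `utils.py` notes that a robust solution would locate the `meta` tag with an HTML parser. Below is a minimal Python 3 sketch of that idea using the standard-library `html.parser.HTMLParser`; it is not what this commit does, and `MetaCharsetFinder` and `charset_from_html` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: remembers the charset of the first
    <meta http-equiv="Content-Type" content="...; charset=..."> tag.
    Comments are routed to handle_comment (a no-op here), so a
    commented-out <meta> is never seen by handle_starttag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if self.charset is not None or tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "content-type":
            return
        content = attrs.get("content") or ""
        # Parse "text/html; charset=utf-8" into its parameters.
        for param in content.split(";")[1:]:
            name, sep, value = param.partition("=")
            if sep and name.strip().lower() == "charset":
                self.charset = value.strip().strip("\"'") or None
                return
    # The inherited handle_startendtag calls handle_starttag, so
    # self-closing <meta ... /> tags are handled too.


def charset_from_html(html):
    finder = MetaCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


commented = ('<!-- <meta http-equiv="Content-Type" '
             'content="text/html; charset=latin-1"> -->\n'
             '<meta http-equiv="Content-Type" '
             'content="text/html; charset=utf-8">')
print(charset_from_html(commented))  # utf-8
```

On the same input, the tightened `http_equiv_reg` from the diff, followed by `http_equiv_reg2`, would still match the commented-out tag first and yield `latin-1`, which is the limitation the comment in `utils.py` points out.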