[Zope-CVS] CVS: Products/ZCTextIndex - ZCTextIndex.py:1.1.2.15

Fred L. Drake, Jr. fdrake@acm.org
Tue, 7 May 2002 17:38:00 -0400


Update of /cvs-repository/Products/ZCTextIndex
In directory cvs.zope.org:/tmp/cvs-serv11800

Modified Files:
      Tag: TextIndexDS9-branch
	ZCTextIndex.py 
Log Message:
Splitter:  Pre-compile the regex so SRE doesn't need to look it up in
    its pattern cache on every call.  Not a biggie, but it seems to be a
    good idea.

StopWordRemover:  Instead of testing the length of the word and
    inclusion in the stop words dict as two separate tests, add all
    1-character 8-bit strings to the dict and just use one test for
    each word.
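A minimal sketch of the Splitter change, adapted to Python 3 so it runs today. The pattern is compiled once, when the class body executes, and `process` reuses it. Before the change, the pattern string was passed to `re.findall`, which has to consult the `re` module's internal cache on each call. `process_uncompiled` and the sample `docs` list are illustrative names, not part of ZCTextIndex.

```python
import re


class Splitter:
    """Sketch of the precompiled-pattern Splitter (Python 3 adaptation)."""

    # Compiled once when the class body runs; every call reuses it.
    rx = re.compile(r"\w+")

    def process(self, lst):
        result = []
        for s in lst:
            # Same tokens as re.findall(r"\w+", s), minus the per-call
            # lookup in the re module's pattern cache.
            result += self.rx.findall(s)
        return result


def process_uncompiled(lst):
    """Hypothetical helper reproducing the pre-change behaviour."""
    result = []
    for s in lst:
        result += re.findall(r"\w+", s)
    return result


docs = ["Hello, world!", "full-text indexing in ZODB"]
print(Splitter().process(docs))
# ['Hello', 'world', 'full', 'text', 'indexing', 'in', 'ZODB']
```

Both versions return identical tokens. The only difference is that the cache lookup is skipped, so any gain is a small constant per call rather than a change in behaviour.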


=== Products/ZCTextIndex/ZCTextIndex.py 1.1.2.14 => 1.1.2.15 ===
 
 import re
+import string
 
 import ZODB
 from Persistence import Persistent
@@ -67,10 +68,12 @@
 
 class Splitter:
 
+    rx = re.compile(r"\w+")
+
     def process(self, lst):
         result = []
         for s in lst:
-            result += re.findall(r"\w+", s)
+            result += self.rx.findall(s)
         return result
 
 class CaseNormalizer:
@@ -80,8 +83,10 @@
 
 class StopWordRemover:
 
-    dict = get_stopdict()
+    dict = get_stopdict().copy()
+    for c in range(256):
+        dict[chr(c)] = None
 
     def process(self, lst):
-        d = self.dict
-        return [w for w in lst if len(w) > 1 and not d.has_key(w)]
+        has_key = self.dict.has_key
+        return [w for w in lst if not has_key(w)]
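The single-lookup StopWordRemover can be sketched in Python 3 as follows. Python 3 dropped `dict.has_key`, so the bound `__contains__` method stands in for it. `get_stopdict` here is a hypothetical stand-in that returns a tiny stop list, because the real one is not shown in this diff. The loop uses `range(256)` so that every 1-character 8-bit string is covered, including `chr(255)`. The `del c` keeps the loop variable from lingering as a class attribute.

```python
def get_stopdict():
    # Hypothetical stand-in: the real stop-word list is not in this diff.
    return {w: None for w in ("the", "and", "of", "in")}


class StopWordRemover:
    # Copy so the extra entries don't leak into the shared stop dict.
    dict = get_stopdict().copy()
    # Fold the old "len(w) > 1" test into the dict by registering every
    # 1-character 8-bit string, chr(0) through chr(255), as a stop word.
    for c in range(256):
        dict[chr(c)] = None
    del c  # otherwise the loop variable survives as StopWordRemover.c

    def process(self, lst):
        # Fetch the bound method once; then one membership test per word.
        contains = self.dict.__contains__
        return [w for w in lst if not contains(w)]


def process_two_tests(lst):
    """Hypothetical helper reproducing the pre-change logic."""
    d = get_stopdict()
    return [w for w in lst if len(w) > 1 and w not in d]


words = ["the", "a", "x", "indexing", "of", "ZODB", "I"]
print(StopWordRemover().process(words))  # ['indexing', 'ZODB']
```

The two versions agree on anything the `\w+` splitter can produce, because it never yields an empty string. They would differ on `""`, which the old length test dropped. In Python 3 they would also differ on 1-character strings above `chr(255)`, which the 8-bit table does not cover.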