[Zope-dev] ZCatalog text index search bugs?

R. David Murray bitz@bitdance.com
Mon, 12 Jun 2000 19:59:20 -0400 (EDT)


I am very confused.

I'm looking at the SearchIndex source under 2.1.4 (2.1.6 seems to be
the same).  In Lexicon.py the 'query' method defines the default_operator
to be 'or'.  I can't see that TextIndex overrides this when it calls
it.

But the response to PR 1141 (against 2.1.6) in the collector says:

          The TextIndex search does an AND, not an OR, of the search
          words: if you ask it to find "foo bar", it returns only
          objects matching *both* "foo" and "bar", rather than object
          matching *either* "foo" or "bar" (which Jason expected).

Indeed, if you do a search that includes a word that is not on an
item, the item is not returned.  So how is that working?

A possible answer is:  if you do a search like 'something or
somethingelse', this *also* does not return the object if one of
those words is not on the object.  So is 'or' searching broken?

Note that if you do a search like "something or with", this returns
the object, "with" being a stop word.   So does "something with".
On the other hand, "something and with" does *not* return the
object.

So I think 'or' searching is broken, and that text indexes being
a default 'and' search is just an accident <grin>.

Following up on the 'something and with', though:  Since "with" is
a stop word, it can never be on the object.  Since the user entering
search words into the search form doesn't know what the list of
stop words is, this strikes me as broken behavior.  Anyone disagree?
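
To make the stop-word complaint concrete, here is a toy model in Python.
It is not the SearchIndex code: STOP_WORDS, INDEX and search_and are
hypothetical names invented for illustration.  It contrasts treating a
stop word as a term that matches nothing with dropping the stop word
from the query before the 'and' is evaluated.

```python
# Toy model only; none of these names come from the Zope source.
STOP_WORDS = {"with", "the", "a"}

# word -> set of object ids containing that word.
# Stop words are never indexed, so they never appear as keys.
INDEX = {"something": {1}, "somethingelse": {2}}


def search_and(words, drop_stop_words):
    """AND together the result sets for each query word."""
    result = None
    for word in words:
        word = word.lower()
        if drop_stop_words and word in STOP_WORDS:
            continue  # ignore the stop word entirely
        hits = INDEX.get(word, set())  # unindexed words match nothing
        result = hits if result is None else result & hits
    return result if result is not None else set()


# Stop word treated as an ordinary term: the 'and' can never succeed.
print(search_and(["something", "with"], drop_stop_words=False))  # set()

# Stop word dropped before evaluation: the object is found.
print(search_and(["something", "with"], drop_stop_words=True))   # {1}
```

In this sketch the first call mirrors the "something and with" result
described above, and the second is the behavior a user who does not
know the stop word list would presumably expect.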

I also have a problem with a word such as "T-shirt".  If I search
on "T-shirt", my object that has that word in its text index does
not show up.  The splitter should be breaking that into "t" and
"shirt", right?  Is the problem that single letters are discarded
by the Splitter, therefore T is like a stop word (but it isn't in
the stopword table), therefore the implicit 'and' search(*) fails?
To corroborate this, a search for "something t" finds that
record, but "something and t" does not.  This can't be the whole
answer, though, since searching on just 'shirt' does *not*
return the object.
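
The single-letter hypothesis can be sketched with a toy splitter in
Python.  This is only a guess at the behavior, not the real Splitter:
toy_split and its min_length parameter are hypothetical, and the
assumption that words shorter than two characters are discarded is
exactly the unverified part.

```python
import re


def toy_split(text, min_length=2):
    """Lowercase, break on non-alphanumerics, drop very short words.

    Assumption: words shorter than min_length are silently discarded,
    which would make a lone "t" behave like a stop word.
    """
    words = re.split(r"[^a-z0-9]+", text.lower())
    return [w for w in words if len(w) >= min_length]


print(toy_split("T-shirt"))      # ['shirt']
print(toy_split("something t"))  # ['something']
```

Under this model a search for just 'shirt' ought to match, which is
not what happens, so the sketch covers only part of the observed
behavior.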

(*) I recall reading that the 'near' operator, which is used if
the splitter breaks up a word in the search string, is not really
supported and that the 'and' operator is used instead.

I can't tell yet if this bug is (these bugs are?) fixed in 2.2.0b1
since I can't see the source release yet.  Looking at the 2.2.0a1 source,
things have moved around a bit.  But I see that "default_operator"
is still set to 'or', so I suspect these bugs may remain...

If I can reproduce this in 2.2.0b I'll file it in the collector.

--RDM