[Zope-dev] Hey Chris, question for you

Casey Duncan cduncan@kaivo.com
Wed, 27 Jun 2001 08:37:52 -0600


Chris McDonough wrote:
> 
> Hi casey,
> 
> Changes were recently made to Field/Keyword Indexes so that they will
> store empty items.  An equivalent change could be made to TextIndexes...
> we'd need to think about that a bit.
> 
> But for your purposes, you might want to start out attempting to write
> your operator implementation using Field and Keyword indexes...
> 
> - C
> 
> Michel Pelletier wrote:
> >
> >
> > Hmm, the reason for the current behavior was optimization: saving space
> > by not indexing empty values.  The problem with your latter approach is
> > that "all objects in the catalog" may include objects that don't have a
> > title attribute at all.
> >
> > I'm not against indexing empty values though.
> >
> > -Michel
> >

My implementation does not modify the behavior of the indexes in any
way, and I would like to keep it that way if possible. I have been able
to (thus far) pull this off without compromises, which was my hope in
the beginning.

I guess the question here is given the query:

spam != 'eggs'

Should objects be returned that do not have an attribute "spam" at all?
For the behavior to be intuitive, I would say yes, but that is just my
opinion. I also thought of an optimization that could eventually be
included if this behavior is adopted. For example, take the following
query expression:

title == 'foo' and spam != 'eggs'

As implemented, my query engine does the following:

1. Find items where title matches 'foo' (exact behavior depends on
index type)
2. Find items where spam matches 'eggs'
3. Take the difference of all items in the spam index and the result of
#2
4. Return the intersection of #3 and #1

To be "intuitive" (I use that term loosely) I think it should be:

1. Find items where title matches 'foo'
2. Find items where spam matches 'eggs'
3. Take the difference of all items in the catalog and the result of #2
4. Return the intersection of #3 and #1

Which can be optimized as:

1. Find items where title matches 'foo'
2. Find items where spam matches 'eggs'
3. Return the difference of #1 and #2 (items in #1 that are not in #2)

If an "or" is used in place of the "and", then the optimization doesn't
apply though.
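
To make the difference between these plans concrete, here is a rough
sketch using plain Python sets of record ids in place of real index
results. The ids and the names catalog_ids, spam_indexed, title_foo and
spam_eggs are made up for illustration; this is not the actual query
engine code or the catalog API.

```python
# Hypothetical record ids; plain sets stand in for real index result sets.
catalog_ids = {1, 2, 3, 4, 5}   # every object in the catalog
spam_indexed = {1, 2, 3}        # objects that have a "spam" value indexed
title_foo = {1, 2, 4}           # step 1: title matches 'foo'
spam_eggs = {1}                 # step 2: spam matches 'eggs'

# As implemented: negate against the contents of the spam index only.
as_implemented = title_foo & (spam_indexed - spam_eggs)

# "Intuitive": negate against the whole catalog.
intuitive = title_foo & (catalog_ids - spam_eggs)

# Optimized: a single difference, no catalog-wide set needed.
optimized = title_foo - spam_eggs

# With "or", the negation still needs the catalog-wide set.
intuitive_or = title_foo | (catalog_ids - spam_eggs)

print(sorted(as_implemented))  # object 4 has no spam, so it is dropped
print(sorted(intuitive))       # object 4 is kept
print(sorted(optimized))       # same result as the "intuitive" plan
print(sorted(intuitive_or))
```

The shortcut relies on the result of #1 being a subset of the catalog,
so intersecting with "everything except #2" is the same as removing #2
from #1. With "or" there is no such subset to lean on, which is why the
catalog-wide set is still needed there.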

One other thing:

I noticed (with a colleague) that passing a list of values to a
FieldIndex and a TextIndex results in nearly opposite behavior. The
FieldIndex does a union on the results of querying against each item in
the list, whereas the TextIndex does an intersection. This seemed highly
inconsistent to me. Another thread, perhaps...
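
A toy model of the two behaviors, just to show how far apart they are.
The index dict and the functions query_union and query_intersection are
hypothetical stand-ins, not the real FieldIndex or TextIndex code:

```python
# Hypothetical toy index: maps each value (or word) to a set of record ids.
index = {
    "foo": {1, 2},
    "bar": {2, 3},
}

def query_union(index, values):
    """Records matching ANY value in the list (what we saw from FieldIndex)."""
    result = set()
    for value in values:
        result |= index.get(value, set())
    return result

def query_intersection(index, values):
    """Records matching ALL values in the list (what we saw from TextIndex)."""
    sets = [index.get(value, set()) for value in values]
    return set.intersection(*sets) if sets else set()

print(sorted(query_union(index, ["foo", "bar"])))
print(sorted(query_intersection(index, ["foo", "bar"])))
```

The same list of two values gives three records one way and a single
record the other way.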

-- 
| Casey Duncan
| Kaivo, Inc.
| cduncan@kaivo.com
`------------------>