Catalog Query Feature Request, was: RE: [Zope-dev] An idea for UniqueValuesFor

sean.upton@uniontrib.com
Mon, 30 Jul 2001 11:16:33 -0700


For what it's worth, here is the code I am using to parse the query and add
the wildcards. It seems to leave all the query operators, including
parentheses, intact, and it has been moderately tested, though it may be
slightly inefficient. Feel free to borrow and/or improve on it if it is useful.
Sean

import re

def queryExtender(self, query):
    """
    Takes, as input, a query for the Text index of a ZCatalog and
    makes it more intelligent by parsing it and rewriting it to
    wrap words in wildcards, so that we can search sub-words; in
    other words, a search for something like "engineer" should
    yield results for "*engineer*", so that terms like "engineers"
    and "engineering" are also considered matches.

    Obviously, we have to be careful not to parse the query
    incorrectly: boolean operators (and, or, andnot, near) and
    punctuation such as parentheses are left alone, and we don't
    touch words that already start or end with a wildcard, because
    we don't want to end up with something like "*engineer**".
    """

    ### Pattern for everything that is not part of a search term;
    ### '*' counts as a term character so existing wildcards stay
    ### attached to their words
    everythingButSearchTerms = r'[^A-Za-z0-9*]+'

    ### Boolean operators, compared case-insensitively as whole words
    booleanops = ('and', 'or', 'andnot', 'near')

    ### Create the word list, dropping the empty strings re.split()
    ### returns when the query starts or ends with punctuation
    result = [w for w in re.split(everythingButSearchTerms, query) if w]

    ### Get rid of boolean operators (filtering into a new list,
    ### since removing items from a list while iterating over it
    ### skips elements)
    result = [w for w in result if w.lower() not in booleanops]

    ### Now, result is a list of just the words that are meaningful
    ### to the search, but we need to eliminate any that already
    ### start or end with a wildcard, because they are likely more
    ### specific than our rewrite here
    result = [w for w in result
              if not (w.startswith('*') or w.endswith('*'))]

    ### Now the list of words we need to modify is final, so we can
    ### rewrite the query one word at a time. The lookarounds only
    ### match a whole term that has not been wrapped yet, so a term
    ### repeated in the query, or one that also occurs inside a
    ### longer word ("engineer" in "engineering"), is wrapped exactly
    ### once per occurrence. re.escape() keeps any '*' inside a term
    ### from being read as a regex operator.
    for item in result:
        pattern = (r'(?<![A-Za-z0-9*])' + re.escape(item) +
                   r'(?![A-Za-z0-9*])')
        query = re.sub(pattern, '*' + item + '*', query, count=1)

    return query
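Since the word-by-word rewrite above makes one regex pass per term, a single pass over the query may be cheaper. The sketch below is an alternative technique (re.sub() with a replacement callback), assuming the same term characters [A-Za-z0-9*] and the same operator list; the name query_extender_single_pass is hypothetical.

```python
import re

# Runs of term characters; '*' is included so existing wildcards
# stay attached to their word (assumed same set as queryExtender)
TERM = re.compile(r'[A-Za-z0-9*]+')

# Boolean operators to leave untouched (assumed operator list)
BOOLEAN_OPS = ('and', 'or', 'andnot', 'near')


def query_extender_single_pass(query):
    """Wrap every plain search term in '*' in one pass over the query.

    Operators, terms that already start or end with '*', and all
    punctuation (parentheses, quotes, spaces) are passed through as-is.
    """
    def wrap(match):
        word = match.group(0)
        if word.lower() in BOOLEAN_OPS:
            return word
        if word.startswith('*') or word.endswith('*'):
            return word
        return '*' + word + '*'

    # re.sub() calls wrap() once per term, left to right
    return TERM.sub(wrap, query)


print(query_extender_single_pass('engineer AND (design or build*)'))
# prints: *engineer* AND (*design* or build*)
```

Because each term is visited exactly once, repeated terms and terms that are substrings of longer words cannot be wrapped twice.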


-----Original Message-----
From: Casey Duncan [mailto:cduncan@kaivo.com]
Sent: Monday, July 30, 2001 10:36 AM
To: sean.upton@uniontrib.com; zope-dev@zope.org
Subject: Re: Catalog Query Feature Request, was: RE: [Zope-dev] An idea for UniqueValuesFor


sean.upton@uniontrib.com wrote:
> 
> I could definitely see the value of a unique-values query into ZCatalog,
> especially for creating things using <dtml-tree> using keywords, etc...

I'm wondering what the best way to implement this on the API side would
be, since it would change the output from catalog results to just
attribute values. Any thoughts?
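One way to picture the API question: the search itself stays unchanged, and a separate step reduces its results to the distinct values of a single attribute. This is a minimal sketch, assuming results are objects that expose the attribute directly (as catalog result records expose metadata columns); the unique_values() helper and the Record stand-in class are hypothetical, not part of ZCatalog or CatalogQuery.

```python
class Record(object):
    """Hypothetical stand-in for a catalog result record."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def unique_values(results, attr):
    """Return the sorted distinct values of `attr` across `results`.

    Records that lack the attribute (or have it set to None) are skipped.
    Keyword-style attributes holding a list or tuple contribute each item.
    """
    seen = set()
    for record in results:
        value = getattr(record, attr, None)
        if isinstance(value, (list, tuple)):
            seen.update(value)
        elif value is not None:
            seen.add(value)
    return sorted(seen)


results = [Record(subject='news'), Record(subject='sports'),
           Record(subject='news'), Record(title='no subject')]
print(unique_values(results, 'subject'))
# prints: ['news', 'sports']
```

Returning plain values rather than result records keeps the change in output type explicit in the function's name and signature.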

> 
> On a slightly related (well, not really) note, CatalogQuery looks like it
> would solve a lot of problems I have had with a very Catalog-intensive
> application.  One thought I had - I might suggest the possibility of
> adding a fuzzy matching operator to CatalogQuery that performs the
> function of wrapping wildcard searches on search terms for Text Indexes,
> supposing the Catalog is using a globbing vocabulary:
> 
> ~=      as an operator would mean an approximate (substring) match
> 
> So a search for 'title ~= "engineer"' would perform a search for
> '*engineer*' and return results containing words like engineer, engineers,
> engineering, etc.
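A minimal sketch of how a clause such as 'title ~= "engineer"' might be turned into an ordinary globbed search term, assuming a clause of the form name ~= "term" and a Catalog whose Text index uses a globbing vocabulary. The expand_fuzzy_clause() helper and its (index, value) return shape are hypothetical, not CatalogQuery's actual syntax or API.

```python
import re

# name ~= "term"; name is an index name, term is the text to match
FUZZY_CLAUSE = re.compile(r'^\s*(\w+)\s*~=\s*"([^"]*)"\s*$')


def expand_fuzzy_clause(clause):
    """Turn 'name ~= "term"' into (name, '*term*'), or return None."""
    match = FUZZY_CLAUSE.match(clause)
    if match is None:
        return None
    index, term = match.groups()
    # Wrap each word of the term so multi-word terms are globbed too
    globbed = ' '.join('*%s*' % word for word in term.split())
    return index, globbed


print(expand_fuzzy_clause('title ~= "engineer"'))
# prints: ('title', '*engineer*')
```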

That sounds like a good idea. Would a simple split/join work, something
like:

ops = ('and', 'or')
words = query_string.split()
for i, word in enumerate(words):
    # compare case-insensitively, but keep the user's original casing
    if word.lower() not in ops:
        words[i] = '*%s*' % word
query_string = ' '.join(words)
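Whether a plain split/join is enough depends on how the query is tokenized. This minimal comparison, using a made-up query, shows that str.split() leaves parentheses attached to words, while matching runs of the term characters used in queryExtender ([A-Za-z0-9*]) yields the bare terms.

```python
import re

query = 'engineer and (design or build)'

# Whitespace splitting keeps punctuation attached to the words,
# so wrapping these tokens would produce '*(design*' and '*build)*'
whitespace_tokens = query.split()

# Matching runs of term characters isolates the bare search terms
term_tokens = re.findall(r'[A-Za-z0-9*]+', query)

print(whitespace_tokens)  # ['engineer', 'and', '(design', 'or', 'build)']
print(term_tokens)        # ['engineer', 'and', 'design', 'or', 'build']
```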

I can look at adding this capability.

> 
> Right now, I attempt to safely rewrite REQUEST['someFieldThatIamSearching']
> with a Python class method that uses a zillion re.sub() calls to wrap search
> terms in * characters; I wonder if there is a way to alternatively implement
> something like this at a lower level, perhaps in CatalogQuery; I get the
> feeling it would be quicker and much simpler.
> 
> If something like that were implemented as well as some equivalent to
> sort_on, I'd stop pulling my hair out with traditional workarounds and
> definitely switch all my stuff to use CatalogQuery instead...

Yeah, I definitely want to add a sort_on capability. I think I will
implement it as an optional argument, like it is for ZCatalog, rather
than as part of the query string, at least for now.
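A minimal sketch of sort_on as an optional keyword argument kept separate from the query itself, assuming results are records exposing the sort attribute. The run_query() name, its parameters, and the predicate used in place of a real query string are hypothetical placeholders, not CatalogQuery's API.

```python
from operator import attrgetter


class Record(object):
    """Hypothetical stand-in for a catalog result record."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def run_query(records, query, sort_on=None, reverse=False):
    """Filter `records` with `query`, then optionally sort the hits.

    `query` is a stand-in predicate here; a real implementation would
    parse and evaluate a query string against the catalog indexes.
    """
    hits = [r for r in records if query(r)]
    if sort_on is not None:
        # Sorting is a separate step, so the query language itself
        # needs no sorting syntax
        hits.sort(key=attrgetter(sort_on), reverse=reverse)
    return hits


records = [Record(title='b', year=2001), Record(title='a', year=1999),
           Record(title='c', year=2000)]
newer = run_query(records, lambda r: r.year >= 2000, sort_on='year')
print([r.title for r in newer])
# prints: ['c', 'b']
```

Keeping sort_on out of the query string means the query parser does not change when sorting options are added later.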

> 
> Thoughts?
> 
> Sean
> 
> -----Original Message-----
> From: Casey Duncan [mailto:cduncan@kaivo.com]
> Sent: Monday, July 30, 2001 8:20 AM
> To: Chris Withers
> Cc: zope-dev@zope.org; Anthony Baxter
> Subject: Re: [Zope-dev] An idea for UniqueValuesFor
> 
> Chris Withers wrote:
> >
> > Casey Duncan wrote:
> > >
> > > possibly, yes. I'll look to add this to my CatalogQuery product. I
> > > believe the btrees can be pressed into service here...
> >
> > Hadn't heard of this CatalogQuery product... where can I find out more?
> >
> > I think I may have been about to develop something similar, so maybe we
> > can help each other out?
> >
> > cheers,
> >
> > Chris
> >
> 
> http://www.zope.org/Members/Kaivo/CatalogQuery
> 
> This is my first stab at this. I foresee a much more general query
> mechanism in the future, but this works better than the stock stuff (for
> me) and it works today!
> 
> Let me know what your ideas are...
> 
> --
> | Casey Duncan
> | Kaivo, Inc.
> | cduncan@kaivo.com
> `------------------>
> 
> _______________________________________________
> Zope-Dev maillist  -  Zope-Dev@zope.org
> http://lists.zope.org/mailman/listinfo/zope-dev
> **  No cross posts or HTML encoding!  **
> (Related lists -
>  http://lists.zope.org/mailman/listinfo/zope-announce
>  http://lists.zope.org/mailman/listinfo/zope )

-- 
| Casey Duncan
| Kaivo, Inc.
| cduncan@kaivo.com
`------------------>
