[Zope] Re: relevance ranking in ZCTextIndex or equivalent

Miles Waller miles at jamkit.com
Fri Jun 2 09:43:04 EDT 2006


Hi,

Thanks for the help.  From my investigations, it seems it's not possible 
to meet the requirements in a super-straightforward way - a query that 
uses several text indexes adds each individual score together, so the 
only output available is the total score.

Trying to separate the scores out (for example so it's a tuple 
(title_score, description_score, body_text_score) that I can sort on) 
looks quite hard - it looks like it would mean changing the indexes to 
return the scores in this different format.
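
To make the idea concrete, here is a minimal sketch of sorting on such a tuple. The function name is made up for illustration, and the plain dicts {docid: score} are hypothetical stand-ins: the real indexes do not hand back their scores in this form.

```python
def rank_by_field_scores(title, description, body):
    """Order docids by (title_score, description_score, body_score), highest first.

    Each argument is a plain dict {docid: score}. These are hypothetical
    stand-ins, not what the text indexes actually return.
    """
    docids = set(title) | set(description) | set(body)

    def key(docid):
        # a document missing from a result set scores 0 for that field
        return (title.get(docid, 0), description.get(docid, 0), body.get(docid, 0))

    # pre-sort by docid; the stable sort then keeps ties in docid order
    return sorted(sorted(docids), key=key, reverse=True)
```

With title={1: 3, 2: 1}, description={2: 5, 3: 4} and body={1: 1, 3: 9, 4: 2}, the tuple ordering puts document 1 first, whereas adding the scores together would put document 3 first.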

My latest approach is to do something like the following (untested):

from BTrees.IIBTree import difference
from Products.ZCatalog.Lazy import LazyMap

def specialSearch(words):

    # i'm going to manipulate the indexes directly
    catalog = portal_catalog._catalog
    getIndex = catalog.getIndex

    # each call returns (result set mapping docid -> score, index ids used)
    r1, id1 = getIndex('Title')._apply_index( {'Title':words} )
    r2, id2 = getIndex('Description')._apply_index( {'Description':words} )
    r3, id3 = getIndex('SearchableText')._apply_index( {'SearchableText':words} )

    # de-dupe: drop from each set the docids already in a higher-priority set
    r3 = difference(difference(r3, r1), r2)
    r2 = difference(r2, r1)

    # now i have 3 result sets mapping docid -> score
    # byValue(0) turns each into a list of (score, docid) tuples,
    # highest score first
    r1 = r1.byValue(0)
    r2 = r2.byValue(0)
    r3 = r3.byValue(0)

    # concatenate them, preserving the order, and keep just the docids
    rs = [docid for score, docid in r1 + r2 + r3]

    # return something catalog brain-like
    return LazyMap(catalog.__getitem__, rs, len(rs))

My debug-prompt tests seem to indicate that this should work.  I don't 
know if anyone who knows more about lists and btrees can comment if 
there's a better way to do the sorting and concatenation of the 
different result sets.
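
The same de-dupe, sort and concatenate logic can be sketched with plain dicts standing in for the result sets, which makes it easy to check at a prompt without a catalog. This is only an illustration under that assumption; the function name is made up, and it tracks the docids already seen instead of calling difference().

```python
def tiered_docids(*tiers):
    """Concatenate result sets in priority order, dropping duplicates.

    Each tier is a plain dict {docid: score}, a stand-in for the result
    set returned by an index; earlier tiers outrank later ones.
    """
    seen = set()
    ordered = []
    for tier in tiers:
        # keep only docids not already claimed by a higher-priority tier
        fresh = [(score, docid) for docid, score in tier.items()
                 if docid not in seen]
        # highest score first; ties broken by lowest docid
        fresh.sort(key=lambda pair: (-pair[0], pair[1]))
        ordered.extend(docid for score, docid in fresh)
        seen.update(tier)
    return ordered
```

Called as tiered_docids(title_hits, description_hits, body_hits), it returns a flat list of docids in which every title match comes before every description-only match, and so on.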

Thanks,

Miles




Jonathan wrote:
> 
> ----- Original Message ----- From: "Miles Waller" 
> <miles-HeBKeAamoVjQT0dZR+AlfA at public.gmane.org>
> To: <zope-CWUwpEBWKX0 at public.gmane.org>
> Sent: Wednesday, May 31, 2006 10:59 AM
> Subject: [Zope] relevance ranking in ZCTextIndex or equivalent
> 
> 
>> Hi,
>>
>> I'm planning to implement a text search where
>>
>> (match against the title)
>>  ranks more highly than
>> (match in the description)
>>  ranks more highly than
>> (matches against the body text).
>>
>> Titles and descriptions are short bits of text, so results in these
>> categories can be ranked just by the frequency that the word appears in
>> that part of the text.  Matches against the body text should ideally be
>> ranked more like ZCTextIndex rather than plain frequency.
>>
>> My ideas are:
>>
>> - do three separate searches, and then concatenate the result sets
>> together.
>> problem: making sure there are no duplicates in the list without parsing
>> all the results in their entirety.
>>
>> - hijack the 'scoring' part of the index, so those results with matches
>> in the title can have their scores artificially heightened to achieve
>> the ordering i want
>> problem: it's completely opaque without a lot of study whether this
>> would achieve what i want.  i'd also need to index the items so the
>> index knew what was in the title, which could be a problem.
>>
>> - index title, description and text separately, and then use dieter's
>> AdvancedQuery product to do the query and combine results
>> problem: is it possible to get at the scores when the documents are
>> returned from the index to be able to order them?  are the scores
>> returned separately, or will each query overwrite the last one?
>>
>> Has anyone ever tried to do this - or got any pointers - at all?
> 
> 
> A definitely non-trivial task, but here are some ideas to get you 
> pointed in the right (I hope) direction:
> 
> Try googling, or looking in the zope source for:
> 
> data_record_normalized_score_
> BaseIndex.py
> OkapiIndex.py
> SetOps.py
> okascore.c
> 
> 
> Good Luck!
> 
> Jonathan


