[Zope] Search with partial words on own ZClass

Rik Hoekstra rik.hoekstra@inghist.nl
Tue, 4 Apr 2000 23:11:06 +0200


>> Is that just partial searching, or also wildcard and even regexp
>> searching?
>
>I define 'Partial' and 'wildcard' as the same thing, but I'm using my
>own terminology so I could be wrong.  I define partial as 'finding part
>or all of a word', which can be accomplished with wildcards '*part*'.
>


Yes
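
To make the 'partial = wildcard' idea concrete, here is a minimal sketch in
plain Python. It does not use Zope's globbing lexicon; the word list and the
glob_words helper are made up for illustration, and the standard fnmatch
module stands in for the lexicon's own pattern matching.

```python
from fnmatch import fnmatchcase

# Hypothetical word list standing in for the contents of a lexicon.
words = ["part", "partial", "apartment", "depart", "wildcard", "zope"]


def glob_words(pattern, vocabulary):
    """Return the words in `vocabulary` that match a shell-style pattern."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]


# '*part*' finds the word itself and any word containing it.
partial = glob_words("*part*", words)
# 'part*' restricts the match to words that start with 'part'.
prefix = glob_words("part*", words)
# '?' stands for exactly one character.
single = glob_words("?ope", words)
```

Since '*' also matches the empty string, '*part*' covers both the whole word
and every word that merely contains it.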

[snip explanation]

>
>Regular expressions are not feasible in any searching system.  Although
>it may be possible, with the existing lexical analysis that globbing
>lexicons do, to implement a larger subset of regexp than just * and ?,
>it is not feasible to implement the entire regexp language.
>

No, of course not. I was thinking more along the lines of (to stick with
your example) fl[e|a|u]c*, but it doesn't really matter.
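
A small sketch of what such a 'slightly larger subset' could look like,
using Python's fnmatch module rather than anything in Zope itself. The word
list and helper names are hypothetical. One detail worth noting: inside
square brackets the '|' is an ordinary character, so fl[e|a|u]c* also
accepts a literal '|', whereas fl[eau]c* does not.

```python
import re
from fnmatch import translate

# Hypothetical vocabulary; 'fl|ck' is included only to show the '|' effect.
words = ["flec", "flack", "fluctuate", "flock", "fl|ck", "flecked"]


def compile_glob(pattern):
    """Turn a glob pattern (with *, ? and [...] sets) into a compiled regex."""
    return re.compile(translate(pattern))


def matches(pattern, vocabulary):
    """Return the words in `vocabulary` matched in full by the glob pattern."""
    rx = compile_glob(pattern)
    return [w for w in vocabulary if rx.match(w)]


# Character set without separators: e, a or u in third position.
clean = matches("fl[eau]c*", words)
# Same set written with '|': the bar itself becomes a member of the set.
literal = matches("fl[e|a|u]c*", words)
```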

>> And since you keep locations of the words, is proximity searching also
>> possible?
>
>The location in the document is not kept, just the score.  There are
>TextIndex methods however for finding the positions of words in a
>document, this is used to support the 'Near' operator, which is '...'
>This operator exists in TextIndexes now (it always has, since I took
>over the indexing realm), I tested them a few months ago but couldn't
>get the concept to work.  I suspect it's buggy, the code holds over from
>ZTables.


Ok, that's clear
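
For reference, the general idea behind a 'Near' operator can be sketched as
follows. This is not the TextIndex code: the function names, the tokenizing
rule and the default distance of three words are all assumptions made for
the example.

```python
import re


def word_positions(text):
    """Map each lowercased word to the list of its positions in `text`."""
    positions = {}
    for i, word in enumerate(re.findall(r"\w+", text.lower())):
        positions.setdefault(word, []).append(i)
    return positions


def near(text, first, second, max_distance=3):
    """True if `first` and `second` occur within `max_distance` words."""
    positions = word_positions(text)
    return any(
        abs(a - b) <= max_distance
        for a in positions.get(first.lower(), [])
        for b in positions.get(second.lower(), [])
    )


doc = "The catalog stores a score for each word, not its location in the document."

close_pair = near(doc, "score", "word")        # positions 4 and 7
far_pair = near(doc, "catalog", "document")    # positions 1 and 13
absent = near(doc, "missing", "word")          # 'missing' never occurs
```

Because positions are computed from the document text at query time, this
fits an index that keeps only scores, at the cost of rescanning each
candidate document.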

>>
>> Another question: how do I retrieve a list of unique words from a
>> full-text catalog?
>
>In 2.1, you need to hack the lexicon from Python.  In 2.2, you call a
>Vocabulary object's 'words' method, or you can call the Vocabulary with
>a pattern '*' to match all words, or a more restrictive pattern if you
>only want all the unique words that match a pattern, like '*ing', all
>the words that end in ing.
>

That's very nice, and opens up (even more) possibilities for an already
great product.


>> Now, I know there is no standard way, but is it possible at all?
>
>In 2.2 it is standard (and documented in the Interfaces Wiki).
>
>> Can I use the items, keys etc interfaces of the text index (perhaps with
>> some python hacking)?
>
>TextIndexes do not store the word, they store an integer that the
>lexicon maps to a word.  This is so text indexes can be language
>independent.



right
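
The two points above (the index stores integers, and the unique words come
from the lexicon, optionally filtered by a pattern) can be illustrated with
a toy model. ToyLexicon and ToyTextIndex are hypothetical classes written
for this sketch; they are not the Zope Vocabulary or TextIndex API, and only
mimic the behaviour described in the thread.

```python
from fnmatch import fnmatchcase


class ToyLexicon:
    """Hypothetical lexicon: assigns each distinct word an integer ID."""

    def __init__(self):
        self._word_to_id = {}
        self._id_to_word = []

    def word_id(self, word):
        """Return the ID for `word`, assigning a new one if needed."""
        if word not in self._word_to_id:
            self._word_to_id[word] = len(self._id_to_word)
            self._id_to_word.append(word)
        return self._word_to_id[word]

    def word(self, wid):
        """Map an integer ID back to its word."""
        return self._id_to_word[wid]

    def words(self, pattern="*"):
        """Return the unique words matching a glob pattern, sorted."""
        return sorted(w for w in self._id_to_word if fnmatchcase(w, pattern))


class ToyTextIndex:
    """Hypothetical index: stores word IDs only, never the words."""

    def __init__(self, lexicon):
        self.lexicon = lexicon
        self.postings = {}  # word ID -> set of document IDs

    def index(self, doc_id, text):
        for w in text.lower().split():
            wid = self.lexicon.word_id(w)
            self.postings.setdefault(wid, set()).add(doc_id)


lexicon = ToyLexicon()
index = ToyTextIndex(lexicon)
index.index(1, "searching and indexing text")
index.index(2, "indexing text in zope")

all_words = lexicon.words()        # every unique word
ing_words = lexicon.words("*ing")  # only the words that end in 'ing'
```

Asking the index itself for its keys would only yield integers here, which
is why the word list has to come from the lexicon.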

Rik