[Zope] Knowledge Base type of product

Chris McDonough chrism@zope.com
06 Jul 2002 11:57:44 -0400


Hi,

You should try Zope 2.6, which has ZCTextIndex, a much-improved text
index.  Sounds like it would work very well for this application.

From the ZCTextIndex readme:

- A new query language, supporting both explicit and implicit Boolean
  operators, parentheses, globbing, and phrase searching.  Apart from
  explicit operators and globbing, the syntax is roughly the same as
  that popularized by Google.

- A more refined scoring algorithm, resulting in better selectiveness:
  it's much more likely that you'll find the document you are looking
  for among the first few highest-ranked results.

- Actually, ZCTextIndex gives you a choice of two scoring algorithms
  from recent literature: the Cosine ranking from the Managing
  Gigabytes book, and Okapi from more recent research papers.  Okapi
  usually does better, so it is the default (but your mileage may
  vary).

- A redesigned Lexicon, using a pipeline architecture to split the
  input text into words.  This makes it possible to mix and match
  pipeline components, e.g. you can choose between an HTML-aware
  splitter and a plain text splitter, and additional components can be
  added to the pipeline for case folding, stopword removal, and other
  features.  Enough example pipeline components are provided to get
  you started, and it is very easy to write new components.
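
To give a feel for the pipeline idea, here is a rough sketch in plain
Python.  It does not use the real ZCTextIndex classes; the class names,
the run_pipeline helper, and the regular expressions are made up for
illustration.  The only assumption is that each component takes a list
and returns a list, so components can be chained in any order.

```python
import re


class PlainTextSplitter:
    """Hypothetical component: split raw text into word tokens."""

    def process(self, items):
        words = []
        for text in items:
            words.extend(re.findall(r"\w+", text))
        return words


class HTMLAwareSplitter:
    """Hypothetical component: drop anything that looks like a tag, then split."""

    def process(self, items):
        words = []
        for text in items:
            without_tags = re.sub(r"<[^>]*>", " ", text)
            words.extend(re.findall(r"\w+", without_tags))
        return words


class CaseFolder:
    """Hypothetical component: fold every word to lower case."""

    def process(self, words):
        return [w.lower() for w in words]


class StopWordRemover:
    """Hypothetical component: discard words found in a stopword set."""

    def __init__(self, stopwords):
        self.stopwords = frozenset(stopwords)

    def process(self, words):
        return [w for w in words if w not in self.stopwords]


def run_pipeline(text, components):
    """Feed the text through each component in order and return the words."""
    items = [text]
    for component in components:
        items = component.process(items)
    return items


words = run_pipeline(
    "<p>The Correlation Matrix</p>",
    [HTMLAwareSplitter(), CaseFolder(), StopWordRemover(["the"])],
)
print(words)
```

Swapping HTMLAwareSplitter for PlainTextSplitter, or dropping the
stopword step, changes what ends up in the index without touching the
other components.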

Performance is roughly the same as for TextIndex, and we're expecting
to make tweaks to the code that will make it faster.
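
For the curious, the Okapi ranking mentioned in the readme is usually
written as the BM25 formula.  The sketch below is the textbook form,
not necessarily the exact variant ZCTextIndex implements.  The function
name, the k1 and b defaults (common textbook values), and the
non-negative "log(1 + ...)" flavour of the IDF term are all assumptions
chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of words) against the query terms.

    Textbook BM25; k1 and b are illustrative defaults, not values taken
    from ZCTextIndex.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["correlation", "matrix", "of", "returns"],
    ["matrix", "algebra"],
    ["cooking", "recipes"],
]
scores = bm25_scores(["correlation", "matrix"], docs)
```

With these three toy documents, the one containing both query words
scores highest, the one containing only "matrix" comes second, and the
unrelated one scores zero.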

(Try it out on this maillist archive:  http://saints.homeunix.com:8080/)

On Sat, 2002-07-06 at 10:44, Hung Jung Lu wrote:
> Hi,
> 
> I am looking for some product that can store a
> "knowledge database". Open-source or commercial (the
> cheaper, the better), Zope or otherwise.
> 
> I simply need to store text files, and make them
> searchable. I know that ZCatalog can kind of do the
> job, I used it a few years ago, but back then the
> search features were kind of limited (for instance,
> two-word search was hard to implement, like when
> searching for "correlation matrix": you don't want
> files that contain "correlation" and/or "matrix", you
> want files that contain the two words consecutively.
> Also, back then, ZCatalog did not have "and", "or"
> logical operators.) I don't know whether it's been
> improved recently. (I know search engine is no easy
> matter.)
> 
> Ideally the product should allow some sort of failure
> report (when some user looks up certain keywords
> and can't find anything), and also some basic
> statistics, so that a human editor could improve the
> hit scores, say, once a day or once a week. Anyway, I
> am looking for something that is not 100% automated:
> it would be great if some human editor assistance can
> be incorporated to make the knowledge database's
> output more reasonable.
> 
> I'd appreciate any pointers.
> 
> regards,
> 
> Hung Jung