[Zope3-Users] Tagging content

Jeff Shell eucci.group at gmail.com
Sun Jan 8 12:52:14 EST 2006


Ha! I just implemented this, and I _generally_ like what I came up
with. I'm thinking of writing it up as a document or releasing it if
work and time will allow me to. I can share my basic implementation
here.

The overview: First, I created an ITaggable interface:

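import zope.schema
from zope.interface import Interface

# '_' is the i18n message factory for your package, defined elsewhere.

class ITaggable(Interface):
    """ Tags are a field of keywords """
    tags = zope.schema.Tuple(
        title=u"Tags",
        description=_(u"Keyword Tags"),
        value_type=zope.schema.TextLine(title=u"Tag", required=True)
        )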

(I don't know if the 'required=True' needs to be there in the value_type).

Then for my particular implementation I made a property for my object
that mapped to the dublin core 'subjects' property. I wrote a custom
'DublinCoreProperty' for this. You can do your own thing if you like.
My reasoning - I want tags to be thought of as 'tags', but since I'm
working with a Zope 3 content object that has dublin core, I may as
well take advantage of what's there, so I basically adapt 'subjects'
to 'tags'. This could also be done with an adapter, and you could make
everything taggable, using something like this:

from zope.app import zapi
from zope.interface import implements
from zope.app.dublincore.interfaces import IZopeDublinCore
from zope.app.annotation.interfaces import IAnnotatable

class TaggableSubjects(object):
    """ Adapts annotatable objects to expose the 'subjects' field as 'tags' """
    implements(ITaggable)
    zapi.adapts(IAnnotatable)

    def __init__(self, context):
        self.dc = IZopeDublinCore(context)

    def _getTags(self):
        return self.dc.subjects

    def _setTags(self, value):
        self.dc.subjects = value
    tags = property(_getTags, _setTags)

That's a pretty simplistic adapter - you'd probably want more error
checking. In any case, I just want to point out that in my
implementation, tags are just a tuple of strings on any ITaggable
object.
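
As one example of the kind of error checking an adapter like that might
want, here is a minimal sketch of a tag-cleaning helper. The name
'normalize_tags' and its rules (strip whitespace, lowercase, drop empty
and duplicate tags, keep first-seen order) are assumptions made for
illustration; they are not part of the implementation described above.

```python
def normalize_tags(raw_tags):
    """Return a tuple of cleaned, de-duplicated tag strings.

    Hypothetical helper: the cleaning rules here are assumptions,
    not something the original implementation is known to do.
    """
    # A bare string would otherwise be iterated character by character.
    if isinstance(raw_tags, str):
        raise TypeError("expected a sequence of tags, not a single string")
    seen = set()
    cleaned = []
    for tag in raw_tags:
        tag = tag.strip().lower()
        if tag and tag not in seen:   # skip empties and repeats
            seen.add(tag)
            cleaned.append(tag)
    return tuple(cleaned)


tags = normalize_tags([' Zope ', 'zope', '', 'Catalog'])
```

A setter such as '_setTags' could pass its value through a helper like
this before storing it, so the stored value is always a clean tuple.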

Now - how about querying them? This was hard until recently. I used
zc.catalog out of the Zope subversion repository, because it has
what's known as a SetIndex. A SetIndex allows for querying against,
um, sets of data. So with a catalog or extent catalog (the 'extent
catalog' is part of the 'zc.catalog' package), I create an index,
'tags', as a SetIndex, which uses the 'tags' attribute of the
ITaggable interface. The catalog will try to adapt an indexed object
to ITaggable, so you can use something like the above adapter to give
'tags' to existing objects. Or implement it directly. In any case,
this is a really cool feature, adaptation. There are already event
listeners in place for the catalog / indexes to respond to object
modification and deletion events, so the index keeps itself current as
tagged objects change or go away, and you don't have to do that
bookkeeping yourself. In my scenario, tags are not smart
objects (I did have an earlier implementation that had them as such,
before catalogs and the SetIndex were available).
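
To make the idea of a set index a bit more concrete, here is a toy
model in plain Python. It is only a sketch of the concept (a mapping
from each value to the ids of the documents that carry it); it is not
zc.catalog's SetIndex, and the class and method names are made up for
this example.

```python
from collections import defaultdict


class ToySetIndex:
    """Toy model of a set index; NOT zc.catalog's implementation."""

    def __init__(self):
        self._docs_by_value = defaultdict(set)   # value -> {doc ids}
        self._values_by_doc = {}                 # doc id -> frozenset of values

    def index_doc(self, docid, values):
        # Re-indexing replaces whatever was stored for this document.
        self.unindex_doc(docid)
        values = frozenset(values)
        self._values_by_doc[docid] = values
        for value in values:
            self._docs_by_value[value].add(docid)

    def unindex_doc(self, docid):
        for value in self._values_by_doc.pop(docid, ()):
            docs = self._docs_by_value[value]
            docs.discard(docid)
            if not docs:                 # no document carries this value now
                del self._docs_by_value[value]

    def all_of(self, values):
        """Ids of documents that carry every one of the given values."""
        sets = [self._docs_by_value.get(v, set()) for v in values]
        return set.intersection(*sets) if sets else set()

    def any_of(self, values):
        """Ids of documents that carry at least one of the given values."""
        result = set()
        for v in values:
            result |= self._docs_by_value.get(v, set())
        return result


index = ToySetIndex()
index.index_doc(1, ('zope', 'catalog'))
index.index_doc(2, ('zope', 'tags'))
index.index_doc(3, ('zope', 'catalog', 'setindex'))
both = index.all_of(('zope', 'catalog'))
either = index.any_of(('tags', 'setindex'))
```

Note how a value simply disappears from the mapping once the last
document carrying it is unindexed, which is why there is nothing
separate to clean up for a tag.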

To query them, you just need to search the catalog. I use hurry.query,
and with that it looks like this:

from zope.app import zapi
from hurry.query.interfaces import IQuery
from hurry.query import set as setquery

(inside whatever class / function / method you have for querying):

tags = self.tags   # a tuple of strings: ('zope','tags','tagging','catalog')
# (ICatalog name, index name) - it's how hurry.query works
tagindex = ('catalog', 'tags')
query = setquery.AllOf(tagindex, tags)
queryUtility = zapi.getUtility(IQuery)
results = self._results = queryUtility.searchResults(query)

With hurry.query you can easily add other queries on to this, such as
full text indexing, date sets, and more, depending on your indexes.

Anyways, there's the heart of what I use in this knowledge base
application I've been working on for the past couple of weeks. It's an
application that stores all of its articles in one big container and
generates names automatically (yay INameChooser!). You just add an
article and tag it, and use tags for navigation.

Although... that's where things get a little more tricky. I do have a
system in place that lets me have URLs like:

http://kbase.example.com/@@tags/zope/catalog/setindex

and that will turn into a query for all documents that have the tags
('zope', 'catalog', 'setindex'). I have a solution in place that I
like that actually generates objects on the fly for each point in the
URL. These aren't Tag objects (since tags are just keyword strings),
but TagMatchers. A TagMatcher is a simple interface and object that
holds a list of tags and runs a catalog query for the matching tags.
This query has to be called explicitly. I then have some
IBrowserPublisher adapters and views that are used to manage the
traversal. Basically every time a name is traversed, a new TagMatcher
is made, with the understanding that its parent is a TagMatcher. The
parent's tags plus the new name are passed into the new matcher. At the
end of the path, a default view for ITagMatcher calls 'process' which
causes the query to be run and then it gets and sorts the results for
display.
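
Stripped of the Zope publishing machinery, the chain of matchers could
be sketched roughly like this. This is plain Python under stated
assumptions: the 'traverse' method, the 'resolve' helper, 'toy_search'
and the DOCS data are all hypothetical stand-ins (the real thing uses
IBrowserPublisher adapters and a catalog query), and only the
'process' name comes from the description above.

```python
class TagMatcher:
    """Holds a tuple of tags; runs the search only when asked to."""

    def __init__(self, search, tags=(), parent=None):
        self.search = search      # callable: tuple of tags -> iterable of results
        self.tags = tuple(tags)
        self.parent = parent
        self.results = None       # nothing is queried until process() is called

    def traverse(self, name):
        # Each traversed name yields a new matcher: parent's tags + the new name.
        return TagMatcher(self.search, self.tags + (name,), parent=self)

    def process(self):
        # Run the query explicitly and sort the results for display.
        self.results = sorted(self.search(self.tags))
        return self.results


def resolve(root, path):
    """Walk a path such as 'zope/catalog', one matcher per segment."""
    matcher = root
    for name in path.strip('/').split('/'):
        if name:
            matcher = matcher.traverse(name)
    return matcher


# Stand-in data and search function, in place of a real catalog query.
DOCS = {
    'intro': {'zope'},
    'indexing': {'zope', 'catalog'},
    'setindex-howto': {'zope', 'catalog', 'setindex'},
}


def toy_search(tags):
    wanted = set(tags)
    return [name for name, doc_tags in DOCS.items() if wanted <= doc_tags]


matcher = resolve(TagMatcher(toy_search), 'zope/catalog')
found = matcher.process()
```

Because the intermediate matchers never call 'process', only the one
at the end of the path costs a query.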

One of these days I'm going to make a kick-ass site where I can write
documents about how I did this. It's cool when you get it working, but
figuring out the traversal options and all of that just to have cool
URLs like 'tags/foo/bar/baz' when 'foo', 'bar', and 'baz' are all
objects being created on the fly was not easy.

Anyways, the summary is: I used Catalog with a SetIndex (zc.catalog,
look at http://svn.zope.org/Sandbox/zc/catalog/ ) for
``ITaggable.tags``. I use publishTraverse and views to do tag
traversal. I have an AllTags view that, when called, renders a page
showing tags in the 'cloud' style (showing popularity by size, along
with the number of matching documents). When that view is traversed
through, the TagMatcher sequence starts: an object is created for each
point along the URL, each one collecting the next tag name, and when
the end is reached the default view runs the query and presents the
results. You don't delete 'tags', since they're not first class
objects. The set index deals with everything and seems quite efficient.
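
For the 'cloud' part, the size calculation could look something like
the sketch below. It assumes you already have the tag tuples of all
documents in hand; the function name 'tag_cloud', the linear scaling
and the 100-200 size range (think font-size percentages) are
illustrative choices, not what the AllTags view necessarily does.

```python
from collections import Counter


def tag_cloud(tag_tuples, min_size=100, max_size=200):
    """Return (tag, count, size) triples, sorted by tag name.

    Size is scaled linearly between min_size and max_size according to
    how many documents carry the tag.
    """
    # set(tags) so a tag repeated on one document is counted once.
    counts = Counter(tag for tags in tag_tuples for tag in set(tags))
    if not counts:
        return []
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    cloud = []
    for tag in sorted(counts):
        count = counts[tag]
        if spread:
            size = min_size + (max_size - min_size) * (count - low) // spread
        else:
            size = min_size       # every tag is equally popular
        cloud.append((tag, count, size))
    return cloud


cloud = tag_cloud([
    ('zope', 'catalog'),
    ('zope', 'tags'),
    ('zope', 'catalog', 'setindex'),
])
# cloud == [('catalog', 2, 150), ('setindex', 1, 100),
#           ('tags', 1, 100), ('zope', 3, 200)]
```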

You can do your option #2. My previous implementation, done last
summer, worked that way... kind of. Tags were more of a first class
object and intids and all of that were used. But this new
implementation I'm using seems a lot faster and more efficient. There
were some hard things (namely the traversal stuff for nice URLs), but
as far as indexing and querying goes, the Catalog/SetIndex combination
works great.

On 1/8/06, Igor Stroh <igor at rulim.de> wrote:
> Hi there,
>
> I'm trying to figure out the best way to implement a general
> solution for tags (like tags on flickr or del.icio.us). So far
> I've two different approaches, maybe someone could comment on
> them:
>
> 1) Store the tags as attributes of the particular content types.
>    Searching for tagged content is handled by a catalog index.
>    New tags are created by simply adding them to the object,
>    a collection of tags is the set of all tags from the catalog.
>    Drawbacks: centralized tag management is inefficient, e.g.
>    deleting a tag implies searching for the particular tag and
>    removing it from all objects it was tagged with.
>
> 2) Store tags in a local utility. The utility manages a
>    tag -> intids mapping (similar to a catalog index).
>    Searching for tags is easy - just query the mapping
>    for appropriate keys. New tags are created by adding
>    a new key to the mapping, a (weighted) collection of
>    tags is the list of mapping keys.
>    Drawbacks: I can't think of any right now
>
> Has anyone done that before? Maybe I'm just reinventing the wheel...
>
> Thanks in advance,
> Igor


--
Jeff Shell

