[Zope] Brainstorm: Zope behind proxying cache?

Evan Simpson evan@4-am.com
Thu, 13 Jan 2000 22:19:31 -0600


Paul Everitt wrote:

> Jonathan wrote:
> > Idea: Put a proxying cache with content negotiation, rewriting of
> > requests etc. in front of Zope.

[snip]

> The problem is: what is a page?

[snip]

> The win these days is in smarter caching which is more finely-grained
> than at the page level.

Exactly.  If you're accustomed to thinking of a web site as a collection of
HTML documents in a file system, squid/apache/roll-your-own caching sounds
attractive.  Your pages change only when you tell them to, and that happens
sporadically and infrequently (relative to page hits, if not absolutely).

Once you really absorb the possibilities and practices of a tool like Zope
or PHP, though, you begin to see your site as a collection of templates and
interfaces, combining static snippets with data queries and views.  It
makes no more sense to cache many of the pages from such a site than it
would to cache windows from an accounting program or word processor.
You're moving into the land of the web application.

As you take more and more advantage of the leverage that Zope gives you,
you will realize that there *are* things you would like to cache, but they
aren't pages any more.  They're dribs and drabs of data, such as a menu
generated by walking an object tree, or a database-driven bit of output
which rarely changes but is relatively expensive to run.  Typically, each
such cacheable bit will be the value returned by a single method/object,
possibly varying by parameters/context.
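
To make that concrete, here is a minimal sketch in plain Python (not an
actual Zope interface; the class and function names are invented) of caching
a single callable's return value, keyed on its parameters:

    # Sketch only: cache the value returned by one expensive callable,
    # varying the cached entry by its parameters.
    class MethodCache:
        def __init__(self):
            self._store = {}

        def call(self, func, *args):
            key = (func.__name__,) + args       # vary by parameters
            if key not in self._store:
                self._store[key] = func(*args)  # compute once, then reuse
            return self._store[key]

    def expensive_menu(depth):
        # stands in for walking an object tree or a slow database query
        return ["item %d" % i for i in range(depth)]

    cache = MethodCache()
    cache.call(expensive_menu, 3)   # computed
    cache.call(expensive_menu, 3)   # served from the cache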

Sometimes, as with the entry page of an active message board, you may want
to provide customized views to each user, yet recompute the message summary
only every minute or so, rather than with every hit.  Other times, you may
want to flush an item from the cache only if some 'upstream' object is
modified.  Then again, there are situations where only manual flushing will
do, as it is impractical to try to automatically discover when the cache is
stale.
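
The first and last of those cases are easy to sketch in plain Python (the
names here are invented for illustration, not part of Zope): a cache that
recomputes at most once per interval, plus an explicit flush:

    import time

    # Sketch: recompute an expensive summary at most once per `max_age`
    # seconds, and allow a manual flush for everything else.
    class TimedCache:
        def __init__(self, max_age=60):
            self.max_age = max_age
            self._value = None
            self._stamp = 0.0

        def get(self, compute):
            if self._value is None or time.time() - self._stamp > self.max_age:
                self._value = compute()    # e.g. rebuild the message summary
                self._stamp = time.time()
            return self._value

        def flush(self):
            self._value = None             # force recomputation on the next hit

    summary_cache = TimedCache(max_age=60)
    summary = summary_cache.get(lambda: "42 new messages today")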

One way to handle this is to tag an object whose output you wish to cache
with a set of rules, such as minimum or maximum cache lifetime, and to
provide a 'flush from cache' method.  Trying to automatically track
dependencies is probably not workable, since acquisition and the request
environment provide so many sources for variable data.  On the other hand,
with careful design, it may be possible to specify a set of values or a
formula which can be used as a cache key for particular objects.  This key
could be computed by a user-defined method, or provided as an expression
list if it's simple enough.
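
One hypothetical shape for that, with every name invented purely for
illustration: each cacheable object carries its rules plus either a
user-defined key method or a simple list of names to pull from the
request/context:

    # Sketch only; none of these classes exist in Zope.
    class CacheSettings:
        def __init__(self, min_age=0, max_age=300, key_func=None, key_names=None):
            self.min_age = min_age        # don't recompute more often than this
            self.max_age = max_age        # never serve anything older than this
            self.key_func = key_func      # user-defined method computing the key
            self.key_names = key_names or []  # or a simple 'expression list'

        def cache_key(self, context):
            if self.key_func is not None:
                return self.key_func(context)
            # the degenerate case: just look up each named value in the context
            return tuple(context.get(name) for name in self.key_names)

    settings = CacheSettings(max_age=120, key_names=["skin", "user_role"])
    key = settings.cache_key({"skin": "plain", "user_role": "member"})
    # key == ('plain', 'member'); a flush method would simply drop this
    # object's entries from the cache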

Often several or many objects share common cache characteristics.  They may
depend on the same inputs, or simply have the same 'freshness'
requirements.  Rather than attach cache settings to particular objects, it
might be a good idea to attach them to Cache Policy objects, and simply
assign cacheable objects a Policy.  This is roughly similar to the ZSQL
Method/Database Connector division of labor.  A single call to a Policy
method could clear the cache of all objects with that Policy, and a cache
key method/formula might only need to be calculated once.
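
Roughly, and again with invented names, a Policy might look like this: many
cacheable objects point at one Policy object, and a single call to it clears
them all:

    # Sketch of the Policy idea, loosely analogous to the ZSQL Method /
    # Database Connection split.  Not a real Zope product.
    class CachePolicy:
        def __init__(self, name, max_age=60):
            self.name = name
            self.max_age = max_age
            self._entries = {}             # (object id, cache key) -> value

        def lookup(self, object_id, key, compute):
            slot = (object_id, key)
            if slot not in self._entries:
                self._entries[slot] = compute()
            return self._entries[slot]

        def clear(self):
            self._entries.clear()          # flushes every object on this Policy

    nightly = CachePolicy("nightly-reports", max_age=24 * 3600)
    nightly.lookup("sales_report", ("2000", "01"), lambda: "expensive query result")
    nightly.clear()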

I'm not sure how Policies should best be assigned to objects.  One way
would be to provide Cache containers which encapsulate the objects to be
cached.  Another is to make some classes cache-aware, just as they can be
ZCatalog-aware.  Yet another is to provide Cache Manager objects, which can
control cache Policy assignments for sibling objects.
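
In the Cache Manager variant, the manager might do little more than map its
sibling objects' ids onto Policy names (all names hypothetical):

    # Sketch of a container-level Cache Manager; purely illustrative.
    class CacheManager:
        def __init__(self):
            self.assignments = {}          # sibling object id -> policy name

        def assign(self, object_id, policy_name):
            self.assignments[object_id] = policy_name

        def policy_for(self, object_id):
            return self.assignments.get(object_id)

    manager = CacheManager()
    manager.assign("sales_report", "nightly-reports")
    manager.assign("front_page_menu", "per-minute")
    manager.policy_for("sales_report")     # 'nightly-reports'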

One of these days, I may care enough about performance to set this down in
code, but not yet.  Python underlies Zope, and its philosophy on the
subject is a good one: solve the problem now with clear, well-chosen
algorithms and only worry about 'optimizing' if performance measurably
suffers.  If you try to guess where to optimize in advance, you'll probably
waste your time and produce gnarly, bug-enhanced code.

Cheers,

Evan @ 4-am