[Zope-dev] how bad are per-request-write-transactions

Paul Everitt paul@zope.com
17 Apr 2002 10:04:50 -0400


On Wed, 2002-04-17 at 11:44, Casey Duncan wrote:
> Paul Everitt wrote:
> > 
> > I don't agree that high write is always forbidden.  I think there are 
> > plenty of cases where this can work.  It simply becomes unworkable much 
> > sooner than other data systems (e.g. a relational database or FS-based 
> > solution).
> 
> I agree, but I am loath to approve of any solution that demands a 
> write for every read of an object.

Even if the pertinent objects are only read once a minute?  That's
pretty severe.

> > For instance, think about bloat for a second.  Let's be crazy and say it 
> > takes 100 bytes to store an integer representing a count.  Let's say you 
> > write once a second.  That's 100 bytes x 86,400 seconds, or about 8.6 MB 
> > a day (per counter).  Combined with hourly packing, that might be well 
> > within limits.
> 
> Yes, but the counter is not the only thing written. The whole containing 
> object is written out to the storage. Now that doesn't include binary 
> data (such as image and file data), but it does include any primitive 
> data stored in the object's attributes (strings, lists, dicts, etc.).

That's only if you do it as a property.  It doesn't have to be done
that way.  Shane and I discussed a counter that existed as a central
data structure.  Objects that were being counted would simply have
methods to increment the count and display it.

This data structure would likely be some kind of tree, so that the
structure itself isn't completely rewritten on every change.
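
For example, here's a minimal sketch of that idea, assuming the
standalone ZODB package; HitCounter and its method names are made up
for illustration:

from persistent import Persistent
from BTrees.OIBTree import OIBTree

class HitCounter(Persistent):
    """A central counter keyed by object path."""

    def __init__(self):
        # An OIBTree spreads its entries across many small persistent
        # buckets, so bumping one count rewrites only the affected
        # bucket rather than the whole mapping.
        self._counts = OIBTree()

    def increment(self, path):
        self._counts[path] = self._counts.get(path, 0) + 1

    def count(self, path):
        return self._counts.get(path, 0)

The counted objects would call increment() and count() with their own
paths, rather than storing anything on themselves.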

> Hourly packing seems like a blunderbuss solution to the bloat problem. 
> You can't tell me that won't kill performance...

Again, some people might not care if, once an hour, there is a
20-second performance penalty.  The tradeoff might be worth it.  But I
was being hypothetical here.  It's better to get down to a once-a-day
pack, which people should do anyway.

Of course, blunderbuss is in the eye of the beholder.  Writing a cron
job to wake up every N seconds, scan a log, and update the count of
pages seems a bit blunderbuss-y to me as well. :^)

> > Let's take the next step and say that you can live with a little 
> > volatility in the data.  You write an object that caches ten seconds' 
> > worth of writes.  Whenever a write comes in past the ten-second mark, 
> > you flush the _v_ attribute to the persistent attribute.  There's an 
> > order of magnitude improvement.
> 
> Only if you run single-threaded. For multi-threaded Zope apps (the 
> default), you would need to use a transient object, which introduces 
> its own complexities.

Correct.  The ideal is a data structure built for this kind of problem. 
Fortunately this isn't unknown territory.
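
For instance, here's a rough sketch of the write-batching counter
described above, again assuming the standalone ZODB package and (per
Casey's point) a single-threaded client; BatchedCounter is a made-up
name:

import time
from persistent import Persistent

class BatchedCounter(Persistent):
    FLUSH_INTERVAL = 10  # seconds of hits absorbed per real write

    def __init__(self):
        self.count = 0  # persistent, written at most once per interval

    def increment(self):
        # _v_ attributes are volatile: ZODB never stores them, and
        # they vanish when the object is deactivated, so a little
        # undercounting is possible.  That's the volatility we said
        # we could live with.
        self._v_pending = getattr(self, '_v_pending', 0) + 1
        now = time.time()
        if now - getattr(self, '_v_last_flush', 0) > self.FLUSH_INTERVAL:
            self.count += self._v_pending  # the only real write
            self._v_pending = 0
            self._v_last_flush = now

Only the assignment to self.count marks the object as changed, so at
one hit per second this turns sixty writes a minute into six.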

> > Finally, you store all your counters in a non-versioned storage.  Now 
> > you have *no* bloat problem. :^)
> 
> Right, the transient object or something else that writes to disk. Now 
> you have to make sure the counters can be related to the object 
> robustly. Bookkeeping... This is certainly a possibility, though I 
> would hesitate to argue that it is less complex.

Hmm, I thought this was a fairly common pattern courtesy of the
catalog.  An object changes.  Something else is told to update itself.
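
In toy form (Document is hypothetical; HitCounter is the sketch from
earlier):

class Document:
    def __init__(self, path, counter):
        self.path = path
        self._counter = counter

    def render(self):
        # Tell the central bookkeeper about the hit instead of
        # mutating our own persistent state.  The bookkeeping Casey
        # mentions is keeping self.path in sync if the object is
        # moved or renamed.
        self._counter.increment(self.path)
        return '...rendered content...'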

> > Regarding performance, maybe his application isn't doing 50 
> > requests/second and he'd be willing to trade the slight performance hit 
> > and bloat for a decrease in system complexity.
> 
> That could be a good trade, I just wanted to make sure the issues were 
> known.

Completely agreed.  My disagreement is with portraying the counter
problem as impossible with the ZODB.  I think some people, as evidenced
by some of the responses, are willing to live with the tradeoffs.
Other people will find managing a log file on disk to be a more
workable solution.

> > All of the above has downsides as well.  My point, though, is that we 
> > shouldn't automatically dismiss the ZODB as inappropriate for *all* 
> > high-write situations.  In fact, with Andreas and Matt Hamilton's 
> > TextIndexNG, you might even be able to write to catalogued applications 
> > at a faster rate than one document per minute. :^)
> 
> Of course not, but the obvious and easiest solution (just incrementing 
> a counter on each object on every read) is probably not the best one.

If people can live within the limitations (e.g. they have a small number
of infrequently-changing things to count), then it's unlikely to be much
of a problem.

All in all, an interesting discussion from which not much is likely to
change, as _I'm_ certainly not going to implement what I describe. :^)

--Paul