[Zope-dev] RE: [Zope] Idea: ZTest: Integrated Use-case based web site testing.

Fri, 27 Aug 1999 09:20:19 -0500

There's a much simpler way to do this.  Use a ZCatalog with an index on
'event_scheduled_date' or some similar property.  Have your worker process
query the catalog for objects that have an 'event_scheduled_date' less than
or equal to the current time and whose 'run_state' is not 'running'.
Update each one's run_state in the catalog to 'running', then execute them.
Update the catalog again to remove them when they're done.  Poof.  Simple.
You could even make it part of the protocol that the ZCatalog is named
'Schedule' and lives in the root of the site.
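A minimal sketch of that polling loop, in Python.  It assumes a hypothetical in-memory list of records standing in for the 'Schedule' ZCatalog (a real worker would query and reindex the catalog instead) and an `execute` callback that runs the object at a given path.  None of these names are ZCatalog API.

```python
# Hypothetical stand-in for the 'Schedule' ZCatalog: each record holds the
# indexed properties named above plus the path of the object to call.
schedule = [
    {"path": "/reports/nightly", "event_scheduled_date": 100.0, "run_state": "waiting"},
    {"path": "/mail/digest", "event_scheduled_date": 200.0, "run_state": "waiting"},
]


def due_events(catalog, now):
    """Records scheduled at or before `now` whose run_state is not 'running'."""
    return [r for r in catalog
            if r["event_scheduled_date"] <= now and r["run_state"] != "running"]


def run_due(catalog, now, execute):
    """Mark each due event 'running', execute it, then drop it from the catalog."""
    for record in due_events(catalog, now):
        record["run_state"] = "running"
        execute(record["path"])
        catalog.remove(record)
```

In a real worker, `now` would simply be `time.time()` on each pass.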

This would allow you to use any callable object/method as a scheduled
event, as long as it's accessible via a Zope path.  And an object that
wants to schedule a method of itself can just make a dummy object with the
right data for the Catalog, and catalog it with a path to one of its methods.
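To illustrate the path convention, here is a sketch that assumes a hypothetical `resolve_path` helper walking attributes from a root object.  Real Zope traversal also involves acquisition and security checks, so treat this only as a model of the idea.  The catalogued "dummy" record carries nothing but the indexed data and a path ending in a method name.

```python
class Folder:
    """Hypothetical minimal container standing in for a Zope folder."""


class Report:
    def __init__(self):
        self.runs = 0

    def regenerate(self):
        self.runs += 1
        return "regenerated"


def resolve_path(root, path):
    """Walk a '/'-separated path from root using plain attribute lookup."""
    obj = root
    for name in path.strip("/").split("/"):
        obj = getattr(obj, name)
    return obj


root = Folder()
root.reports = Folder()
root.reports.nightly = Report()

# The dummy record an object would catalog to schedule one of its own methods.
dummy_event = {"path": "/reports/nightly/regenerate",
               "event_scheduled_date": 100.0, "run_state": "waiting"}

method = resolve_path(root, dummy_event["path"])  # a bound method
result = method()
```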

If you want to get a bit fancier, and perhaps a bit more efficient, you
could always hand-create a Product that uses an IOBTree object to map from
time values to lists of events to execute, and give it a custom API.  But why
go to all that trouble when ZCatalog is available, at least for a
proof-of-concept?
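For comparison, a sketch of the time-keyed structure.  It does not use IOBTree itself: the standard library's `bisect` over a sorted list of integer timestamps stands in for the BTree's ordered integer keys, and `TimeIndex`, `schedule` and `pop_due` are hypothetical names.

```python
import bisect


class TimeIndex:
    """Stand-in for an IOBTree: integer time -> list of event paths, kept ordered."""

    def __init__(self):
        self._keys = []     # sorted integer timestamps
        self._events = {}   # timestamp -> list of event paths

    def schedule(self, when, path):
        when = int(when)
        if when not in self._events:
            bisect.insort(self._keys, when)
            self._events[when] = []
        self._events[when].append(path)

    def pop_due(self, now):
        """Remove and return every event scheduled at or before `now`, oldest first."""
        cut = bisect.bisect_right(self._keys, int(now))
        due_keys, self._keys = self._keys[:cut], self._keys[cut:]
        due = []
        for key in due_keys:
            due.extend(self._events.pop(key))
        return due
```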

The trickiest part of all of the above is dealing with the possibility of
two simultaneous workers trying to run the same event.  Either you have to
use a lock to serialize all event runs, or else you have to commit the
transaction when setting the state to 'running', so that you'll be forced to
retry if another thread has just done the same thing.  But if you commit
the transaction showing it 'running', and then the process crashes, or even
just the rest of the scheduling operation fails, you now have an orphaned
event that can never be run because, in theory, it's already running.
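The orphan problem fits in a few lines.  In this sketch a plain dict plays the part of committed persistent state (an assumption, since there is no real transaction machinery here), and the hypothetical worker dies right after committing 'running'.

```python
# Hypothetical committed state: every change here counts as already committed.
committed = {"/mail/digest": {"event_scheduled_date": 100.0, "run_state": "waiting"}}


def claim_and_commit(store, path):
    """Mark the event 'running' and (conceptually) commit so rivals must retry."""
    store[path]["run_state"] = "running"


def runnable(store, now):
    """Paths that are due and not marked 'running'."""
    return sorted(p for p, r in store.items()
                  if r["event_scheduled_date"] <= now and r["run_state"] != "running")


class WorkerCrash(Exception):
    pass


def worker(store, path):
    claim_and_commit(store, path)
    raise WorkerCrash("process died before the event finished")


try:
    worker(committed, "/mail/digest")
except WorkerCrash:
    pass

# Nothing will ever reset this flag, so no worker will pick the event up again.
orphans = [p for p, r in committed.items() if r["run_state"] == "running"]
```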

None of this is a problem as long as your scheduled event has no effects
which can't be undone by the transaction machinery.  But LDAP, MySQL,
e-mail, and remote ZClient calls are just a few of the things which
currently can't be undone in such fashion, and most of these are of
significant interest for any serious scheduling system.

(Can you tell I've been thinking about this one for a while already?  <grin>)

Hmmm...  wait, there *is* a way to fix this.  An in-memory data structure,
protected by a lock, and with a try...finally block to remove the 'running'
event from the data structure.  That'd work.  Use the object's path as the
dictionary key and the thread running it as the value.  Instead of changing
run_state to 'running', just acquire a lock on the dictionary and check
whether another thread is already running that object.  If so, keep
browsing through the available events until you find one that's not in the
'running' dictionary.  Then claim it for yourself and release the
lock.  Next, remove the event from the catalog and run it, protected by a
finally clause that removes the marker from the dictionary.  If the event
runs successfully, the removal gets committed; if not, it's still in the
catalog.  Either way, the event is no longer on the 'running' list, and
workers are free to try it again if it's still in the catalog.  Voila!  Now
it can support multiple workers without orphaned or conflicting events.
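A sketch of the lock-protected 'running' dictionary, assuming the catalog is modeled as a plain list of event paths.  Without a real transaction manager, it approximates "remove, then commit only on success" by removing the path only after `execute` returns; `claim_next` and `run_one` are hypothetical names.  The dictionary only coordinates threads within a single process.

```python
import threading

running = {}                      # event path -> thread currently running it
running_lock = threading.Lock()   # guards every access to `running`


def claim_next(catalog):
    """Claim the first catalogued event no other thread is running, or None."""
    with running_lock:
        # The catalog snapshot is taken while holding the lock.
        for path in list(catalog):
            if path not in running:
                running[path] = threading.current_thread()
                return path
    return None


def run_one(catalog, execute):
    """Run one unclaimed event; it stays catalogued if execute() raises."""
    path = claim_next(catalog)
    if path is None:
        return None
    try:
        execute(path)
        # Stand-in for "the removal is committed only if the event succeeded".
        catalog.remove(path)
        return path
    finally:
        # Always clear the marker, so a failed event can be retried later.
        with running_lock:
            del running[path]
```

Because the snapshot is taken under the lock, and a finishing worker removes its event from the catalog before it clears the marker, no worker can pick up an event that another worker has already completed.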

At 11:32 AM 8/27/99 +0200, Martijn Pieters wrote:
>
>I still think we should do the scheduler with a queue, we can always 
>convert to multiple threads later. Let's keep it simple and expand.
>
>Scheduled Event objects are to be based on DTML Methods with some extra
>properties to set up frequency, etc.
>
>The Scheduler is a Folder object that only allows Event objects to be 
>added. At least, I haven't thought of any scenario where it would be useful 
>to add anything else.
>
>I am still thinking of a queue implementation, where worker threads pick 
>off events to execute. Events are inserted into the queue based on who's 
>next to be executed.
>
>An event has only one reference in the queue, and is put back on after 
>execution (so long running methods that are scheduled to run every second 
>won't get executed that often).
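A single-worker sketch of the queue described above, using `heapq` ordered by next run time.  The `Event` fields (`name`, `interval`, `action`) and the choice to compute the next run from the time the event finishes are assumptions made to show the "put back on after execution" behaviour.  The injectable clock exists only so the example can be tested.

```python
import heapq
import itertools


class Event:
    """Hypothetical scheduled event: a name, a repeat interval, and a callable."""

    def __init__(self, name, interval, action):
        self.name = name
        self.interval = interval
        self.action = action


class Scheduler:
    """Single-worker queue: each event has exactly one entry, ordered by next run."""

    def __init__(self, clock):
        self._heap = []                 # (next_run, seq, event) entries
        self._seq = itertools.count()   # tie-breaker so Events are never compared
        self.clock = clock              # callable returning the current time

    def add(self, event, first_run):
        heapq.heappush(self._heap, (first_run, next(self._seq), event))

    def run_pending(self):
        """Run every event due now, then put each one back on the queue."""
        now = self.clock()
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        for event in due:
            event.action()
            # Re-queued only after it finishes, timed from the finish.
            heapq.heappush(self._heap,
                           (self.clock() + event.interval, next(self._seq), event))
        return [event.name for event in due]
```

With a one-second interval and an action that takes five seconds, the event starts at 0 and next at 6, rather than piling up duplicate queue entries.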