[Zope-dev] RE: [Zope] Re: Zope hanging (poss. threads-related)

Marcus Collins mcollins@sunesi.com
Fri, 14 Apr 2000 19:58:20 +0200


> -----Original Message-----
> From: Amos Latteier [mailto:Amos@digicool.com]
> Sent: 13 April 2000 20:53
> To: 'Tres Seaver'; Marcus Collins
> Cc: zope-dev@zope.org
> Subject: RE: [Zope] Re: Zope hanging (poss. threads-related)

<snip>
 
> The ZServer zombie stuff is to get rid of zombie client 
> connections, not zombie publishing threads. These are quite 
> different beasts.

Thanks for the clarification. I'm still on my way to understanding how it
all fits together...

> > > Everything before the call to handle() is 100%. Sometimes, 
> > > however, we don't get from handle() to the next stage. This is 
> > > on Zope 2.1.6, which I've been running with up to 100 threads, 
> > > although I unfortunately can't exercise that many! 
 
> In general there is little reason to have so many publishing 
> threads.
> You almost never need that many unless you have a bunch of 
> requests that can take a *long* time.
> 
> If, on the other hand, you are trying to provoke some kind of 
> thread contention issue, I advise you to publish resources that take 
> a long time to return. That way you can easily pile up as many 
> publishing threads as you want to.

Normally, I'd estimate that 20 threads would be sufficient for our site, and
that's what we want to run. We have some search queries that may take a
number of seconds to complete. However, in this instance, I was deliberately
attempting to hang threads. The pages of our site on which I've noted
hanging take much less than a second to publish; no hanging problems have
been noted on pages which take significantly longer (say, 10 or 20 seconds
in the extreme) to publish, although that may simply be chance.
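
A minimal sketch of how simultaneous requests could be piled up against one
URL, written in present-day Python. It is not the tooling used for the tests
described here; the function name hammer, its parameters, and the idea of
passing in a replaceable fetch callable are all assumptions made for
illustration.

```python
import threading
import time
import urllib.request


def hammer(url, n_clients, fetch=None, timeout=30.0):
    """Fire n_clients requests at url at (nearly) the same moment.

    Returns a list of (client_index, elapsed_seconds, outcome), where
    outcome is the HTTP status, or the repr() of the exception raised.
    """
    if fetch is None:
        # Default fetcher: a plain GET that gives up after `timeout` seconds,
        # so a hung publishing thread shows up as a timeout on the client.
        def fetch(target):
            with urllib.request.urlopen(target, timeout=timeout) as resp:
                return resp.status

    results = [None] * n_clients
    gate = threading.Event()

    def worker(index):
        gate.wait()  # hold every client until all threads have started
        started = time.time()
        try:
            outcome = fetch(url)
        except Exception as exc:  # record the failure instead of dying
            outcome = repr(exc)
        results[index] = (index, time.time() - started, outcome)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_clients)]
    for thread in threads:
        thread.start()
    gate.set()  # release all clients at once
    for thread in threads:
        thread.join()
    return results
```

Calling hammer() with a slow page and a client count above NUMBER_OF_THREADS
should keep every publishing thread busy; any entry whose outcome is a
timeout error is a candidate for a hung request.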
 
> > > I've added this logging to the Zope 2.1.3 serving the live 
> > > site, and will report my findings as soon as something untoward 
> > > occurs. Maybe others who are experiencing hanging would also be 
> > > able to do some extra logging and report the results [now, 
> > > there, I see Wiki would be really useful!].

I've described at
http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/DiagnosingHangProblems/DebugLogger
how I've set up DebugLogger to log requests at a finer granularity. Perhaps,
if other users also discover threads hanging, they could post their results
there?

Since extending the logging, I have found that the hanging occurs less
frequently; when I reduced the logging again, the hanging recurred --
specifically, when I reduced the logging in send_response() in
PCGIServer.py. The logs then indicated that the request failed sometime
between found_terminator() ending and the return from HTTPRequest() in
send_response(). Note that this is _different_ to my previous posting, where
I indicated that the request failed before handle() returned. Both of the
reports are true, so it seems we haven't managed to nail it down quite as
well as I thought...
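
The narrowing-down above amounts to finding, for each request id, the last
stage that was logged before the request went quiet. A minimal sketch in
present-day Python follows. It assumes each log line has the form
"<code> <request id> <timestamp> <data>" and that code E marks the end of a
request; both are assumptions about the log layout, and the function name
last_stage_of_unfinished is hypothetical.

```python
def last_stage_of_unfinished(lines, end_code="E"):
    """Map each request id that never logged end_code to its last logged code.

    Assumed line layout: "<code> <request id> <timestamp> <data>".
    Lines with fewer than two fields are skipped.
    """
    last_seen = {}
    finished = set()
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 2:
            continue  # blank or malformed line
        code, request_id = parts[0], parts[1]
        if code == end_code:
            finished.add(request_id)
        last_seen[request_id] = code
    # Keep only requests that never reached the end stage.
    return {rid: code for rid, code in last_seen.items()
            if rid not in finished}
```

Run over a whole log file (for example, over the lines of an open file
object), the result lists the requests still outstanding and the stage at
which each one stopped reporting.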

The hangings that I've now recorded in testing occurred with Zope 2.1.6 when
NUMBER_OF_THREADS was set at 10. It's also hung with 4 and with 20 threads.
To be fair, though, since adding additional logging, I've really had to
pound it (20-30 simultaneous users, abnormally heavy usage) to get it to
hang.

The hangings on the live site (Zope 2.1.3) have occurred under light load
(one or two users), first with 20 threads, and now with 4. 
 
> In my experience when a Zope publishing thread hangs it's 
> almost always a problem with the published resource. Maybe there's 
> something that puts Zope in a loop that never exits, or maybe 
> there's some DA weirdness that hangs the thread.

> My advice is to try and identify which requests hang using 
> debug logging and examine the resources that those requests use.

The specific resources on which this hanging has occurred are:

- /manage_*, but not yet when using manage_main alone (again, that may
  simply be chance)
- a frameset page on our site, which uses:
  * ZSQL methods calling a MySQL database (MySQL 3.22.27/32; ZMySQLDA 1.1.3
    patched with MySQLdb 0.1.0; db connection pool_size equal to
    NUMBER_OF_THREADS)
  * GUF 1.2.2 (using the database)
  * SiteAccess 1.0.1.

A single thread at a time hangs. Recently, however, when /manage was
accessed shortly after a thread hung, ZServer became completely unresponsive
to http or pcgi (didn't check ftp...).

I'll be putting this and further details up at
http://www.zope.org/Members/tseaver/Projects/HighlyAvailableZope/MarcusCollins
as soon as I get a chance. Unfortunately, that's not likely to be before
Monday.

Thanks All.

-- Marcus