[Zope] threading, dtml and performance

Tony McDonald tony.mcdonald@ncl.ac.uk
Mon, 25 Feb 2002 14:33:32 +0000


On 25/2/02 12:49 pm, "oliver.erlewein@sqs.de" <oliver.erlewein@sqs.de>
wrote:

> Hi all 
> 
> I've tried everything that I can imagine to convince my *#~!!& SUN Solaris to
> do Zope/Python with multitasking / threading / LWP /... NO CHANCE!!!!
> 
> After about 6+ Months of on & off trying I'm giving up. I'll install my test
> and production system on xxxBSD or Linux (even NT runns without problems!).
> 
> It's probably easier to convince my ISP of adopting Linux than finding a
> patch! So to all Solaris hosts out there good luck, you'll need it! The
> problem lies in the conjunction of the systems. Solaris, Python and Zope just
> dont mix with Solaris threading.
> 
> My production system runs faster on my laptop on Win2k -with multitaskig- than
> on my high power SUN 19" rack with 4 Processors an 4GB RAM! On that note I'll
> just go and believe that Solaris is the wrong thing to do when thinking about
> Zope!!
> 
> Best regards 
> 
> Oliver Erlewein 
> 

A lurker surfaces....

Oliver and Paul - sorry for the delay in replying to you, we can continue
off list, but hopefully this email exchange can get some people who really
understand what's going on to expose themselves and shed some light on this
bizarre situation.

Recap for those not in the know. Solaris, Python and Zope exhibit strange
behaviour whereby only one Zope thread at a time appears to be serviced,
even though you have explicitly set the number of threads > 1.

Oliver, Paul, myself and Andrew Milton (amongst others) have posted on this
subject before - if you need more context, check things out in the archives.

Try this on a Solaris box. Put this code into a DTML method
<dtml-var standard_html_header>
<dtml-in
"[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
">
 <dtml-in "ZopeFind(this(), '', '', '', '', '' , '', '', '', 1)"
sort=bobobase_modification_time reverse>
 *<br>
 </dtml-in>
</dtml-in>
<dtml-var standard_html_footer>

'view' it - it will take a few seconds to do its work. Whilst Zope is
servicing this request, you can't do anything else on that Zope server on
your Solaris box. This was the standard behaviour on Zope 2.4/python 2.1
installations.

However .... 
Thing is, I can't reproduce this consistently on our own machines. I've
recently upgraded to

Zope Version :(Zope 2.5.0 (source release, python 2.1, linux2), python
2.1.2, sunos5) 
Python Version : 2.1.2 (#1, Jan 16 2002, 21:19:54) [GCC 2.95.3 20010315
(release)] 

On most of our Zope servers on our Solaris 8 box.

Python was built with just ./configure (I understand that threads are built
by default now - is this true?), and my 'suck it and see' test of

% python /usr/local/lib/python2.1/test/test_thread.py

Gives something similar to what I see on my MacOS-X box (which has no
problems with threading under Zope).

I.e. Solaris
creating task 1
creating task 2
task 2 will run for 7.1 sec
creating task 3
creating task 4
creating task 5
creating task 6
creating task 7
task 3 will run for 4.3 sec
task 1 will run for 4.0 sec
task 6 will run for 8.7 sec
creating task 8
creating task 9
creating task 10
waiting for all tasks to complete
task 5 will run for 5.0 sec
task 4 will run for 2.1 sec
task 7 will run for 1.4 sec
task 8 will run for 3.0 sec
task 10 will run for 5.4 sec
task 9 will run for 8.3 sec
task 7 done
task 4 done
task 8 done
task 1 done
task 3 done
task 5 done
task 10 done
task 2 done
task 9 done
task 6 done
all tasks done

MacOS-X (this is python2.1 by the way)
creating task 1
task 1 will run for 9.6 sec
creating task 2
task 2 will run for 6.6 sec
creating task 3
task 3 will run for 0.3 sec
creating task 4
task 4 will run for 7.8 sec
creating task 5
task 5 will run for 6.1 sec
creating task 6
task 6 will run for 8.5 sec
creating task 7
task 7 will run for 4.4 sec
creating task 8
task 8 will run for 1.8 sec
creating task 9
task 9 will run for 7.8 sec
creating task 10
task 10 will run for 4.5 sec
waiting for all tasks to complete
task 3 done
task 8 done
task 7 done
task 10 done
task 5 done
task 2 done
task 4 done
task 9 done
task 6 done
task 1 done
all tasks done

But now things get interesting. If you use 'ab' on the following External
Method

import time

def sleep(self):
    time.sleep(5)

You find this as you increase the concurrency (this is on a server with 4
threads running)

% ab -c1 http://host/sleep
Time taken for tests:   5.030 seconds
% ab -c2 http://host/sleep
Time taken for tests:   5.034 seconds
% ab -c3 http://host/sleep
Time taken for tests:   5.034 seconds
% ab -c4 http://host/sleep
Time taken for tests:   5.040 seconds
% ab -c5 http://host/sleep
Time taken for tests:   5.050 seconds
% ab -c6 http://host/sleep
Time taken for tests:   5.038 seconds

?? The last two tests imply that there's more than one thread running, but
wait - look at the access_log from the server

host - - [25/Feb/2002:13:53:13 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:13:53:13 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:13:53:13 +0000] "GET /sleep HTTP/1.0" 200 44
host - - [25/Feb/2002:13:53:13 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:13:53:18 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:13:53:18 +0000] "GET /sleep HTTP/1.0" 200 45

The first four threads are kicked off, then two more (after 5 seconds).
Apache bench returns after the first four threads have run - I have no idea
why this is.

If you do the same for 26 concurrent accesses ...
% ab -c26 http://host/sleep

Server Software:        Zope/(Zope
Server Hostname:        host
Server Port:            80

Document Path:          /sleep
Document Length:        44 bytes

Concurrency Level:      26
Time taken for tests:   5.092 seconds
Complete requests:      1
Failed requests:        0
Total transferred:      286 bytes
HTML transferred:       44 bytes
Requests per second:    0.20
Transfer rate:          0.06 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        7     7     7
Processing:  5084  5084  5084
Total:       5091  5091  5091

Whilst the access logs show groups of 4 requests being processed in series.

I upped the number of threads on Zserver to 20 (-t20) and this is what
happened
% ab -c26 http://host/sleep

Server Software:        Zope/(Zope
Server Hostname:        host
Server Port:            80

Document Path:          /sleep
Document Length:        45 bytes

Concurrency Level:      26
Time taken for tests:   5.067 seconds
Complete requests:      1
Failed requests:        0
Total transferred:      287 bytes
HTML transferred:       45 bytes
Requests per second:    0.20
Transfer rate:          0.06 kb/s received

Connnection Times (ms)
              min   avg   max
Connect:        7     7     7
Processing:  5060  5060  5060
Total:       5067  5067  5067

And the access_log
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:03:59 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:04 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:09 +0000] "GET /sleep HTTP/1.0" 200 46
host - - [25/Feb/2002:14:04:14 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:14 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:14 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:14 +0000] "GET /sleep HTTP/1.0" 200 45
host - - [25/Feb/2002:14:04:14 +0000] "GET /sleep HTTP/1.0" 200 45

That is, Zope is shifting 7 requests at a time (I think I remember reading
this is hard wired into Zope somewhere). I think this also points out that
it's futile to increase -t to over 7.

What does this all mean? I dunno - but maybe, just *maybe* the Solaris
problem is fixed if you upgrade to python 2.1.2 and Zope 2.5.0

Sorry I can't be conclusive, but hope this helps.

tone
-- 
Dr Tony McDonald,  Assistant Director, FMCC, http://www.fmcc.org.uk/
The Medical School, Newcastle University Tel: +44 191 243 6140
A Zope list for UK HE/FE  http://www.fmcc.org.uk/mailman/listinfo/zope