[Zope-DB] odd slowness in ZEOapp server

Philip Freeman phil@hostway.net
Mon, 30 Sep 2002 14:16:43 -0500


hello all -

i had earlier posted this message to the zope mailing
list, and have been recomended to this list.

since about a week ago we started experiencing excessive delays
in accessing the Zope Management Interface. Accessing the front
door of our website is is still quick as ever. The delays are
happening only in the ZMI and are mostly assosiated with the
manage_menu. Nothing was changed on our production cluster from
the systems side. merely content changes. i've recently re-tested
all network connections to the management server (ZEO app server)
and from the management server to the database. I enabled
detailed logging on the App server and hacked up a quick script
to analyze the log for slow responses :

/home/zope/var/log/zc# ./analyze.py Mdebug.log  5
request 145319924 took 189.0 seconds for GET /manage_menu
request 144738612 took 10.0 seconds for GET /manage_menu
request 144716924 took 191.0 seconds for GET /manage_menu
request 145131396 took 190.0 seconds for GET /p_/rtab
request 145706116 did not finish for  GET /manage_page_style.css
request 146220612 did not finish for  GET /manage_menu
request 144733364 took 192.0 seconds for GET /manage_menu
request 161581820 took 193.0 seconds for GET /manage_menu
request 144670844 took 190.0 seconds for GET /misc_/OFSP/dtmlmethod.gif
...

since i was running out of ideas, i ran the app server under
strace to try and track down the source of the delays. here's
a small snippet :

26243      0.000070 <... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
26243      0.000030 sigreturn()         = ? (mask now [])
26243      0.000183 recv(16, "\0\0\0\362R(U\336((U\24Products.ZMySQLDA.DA"..., 8096, 0) = 246
26243      0.000304 select(26, [12 16 17 18 19 23 24 25], [], [], {30, 0}) = 0 (Timeout)
26243     29.998487 select(26, [12 16 17 18 19 23 24 25], [], [], {30, 0}) = 0 (Timeout)       
26243     29.999949 select(26, [12 16 17 18 19 23 24 25], [], [], {30, 0}) = 0 (Timeout)
26243     30.000054 select(26, [12 16 17 18 19 23 24 25], [], [], {30, 0}) = 1 (in [17], left {1,
+340000})
26243     28.665289 recv(17, "", 4096, 0) = 0
26243      0.000109 close(17)           = 0
26243      0.000257 select(26, [12 16 18 19 23 24 25], [], [], {30, 0}) = 1 (in [24], left {29,
+860000})
26243      0.139021 recv(24, "", 4096, 0) = 0



the ZEO_DB server lives in a private network without access to
a dns server, but this should not be an issue as there are hosts
entries for the machines it needs to talk to. plus it has been
working untouched for a couple of weeks. anyhow, to experiment
further, today i installed a new ZEO_DB server and ZEO_APP server
on the same machine to rule out the networking issue. Oddly enough
it does perform quite a bit better but still times out on the
same sort of requests :

/home/zopetest/log# ./analyze.py Mdebug.log  5                                                 
request 148664300 took 6.0 seconds for GET /manage_menu
request 145424316 took 11.0 seconds for GET /manage_menu
request 149665228 took 7.0 seconds for GET /manage_menu
...


ideas anyone ?


-- 

-phil@hostway.net