[Zope] Urgent help needed: Zope falls over under moderate load

Michael Fraase mfraase@farces.com
Tue, 20 Nov 2001 15:39:26 -0600


Okay, Chris, that's a lot to chew on. Thanks very much for your patience
and time.

--
Michael Fraase
ARTS & FARCES LLC
mfraase@farces.com
www.farces.com
PGP Fingerprint:
3D85 F3F4 9E65 4949 176A  260C CB47 190D C864 9A96

> -----Original Message-----
> From: Chris McDonough [mailto:chrism@zope.com] 
> Sent: Tuesday, November 20, 2001 3:39 PM
> To: mfraase@farces.com; 'Chris Withers'
> Cc: zope@zope.org
> Subject: Re: [Zope] Urgent help needed: Zope falls over under 
> moderate load
> 
> 
> > I guess I'm confused. Everything that *could* be cached *was*
> > cached. And no, I don't run a caching server or a proxy server or
> > anything else in front of Zope. I'm a writer, not a programmer.
> 
> OK, fair enough.
> 
> But your profession still doesn't absolve you from needing to 
> cache more in order to survive a Slashdotting.  ;-)  Either 
> that or you'll need to start developing your site with static 
> pages only.  That'd work too.
> 
> > The /. piece hit about 1:00 AM. By 1:01 AM Zope had folded like a
> > cheap suit. It's still going down about every 40 minutes or so.
> >
> > Now remember, my outbound bandwidth is limited to 512Kb.
> 
> If a 512Kb/s link is filled with as many 300-byte requests as it 
> can carry, that works out (ignoring latency and the bandwidth used 
> by responses) to a potential inbound rate of about 213 requests 
> per second.  That's still a lot of requests.  For comparison, 
> Slashdot itself handles about 180 requests/sec at peak normal 
> load.  So the 512Kb/s pipe isn't much of a throttle.
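> 
> Just to make that arithmetic concrete, here's a back-of-the-envelope 
> sketch in Python (the 300-byte request size is only an estimate):
> 
>     link_bits_per_sec = 512 * 1000       # 512Kb/s pipe
>     bits_per_request = 300 * 8           # assumed 300-byte request
>     print(link_bits_per_sec / bits_per_request)   # ~213 requests/second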
> 
> And that assumes your inbound bandwidth is also limited to 
> 512Kb/s; you only mentioned your outbound limit in this mail.  If 
> inbound is higher, the problem is even worse.
> 
> > Am I correct in my understanding that Zope can't handle even
> > 512Kb of demand without some technical doohickey in front of it
> > so it doesn't fall down?
> 
> Your pipe is fat enough to allow lots of requests in, and 
> what you're serving is probably sufficiently complex to be 
> very slow.  Squishdot is really not known for its speed.
> 
> "Raw" Zope itself could almost certainly handle it, however, 
> if what you were returning is a DTML method that said 
> "<html>this is a simple page</html>".  But this isn't what 
> you're returning; Squishdot has a big say in what shows up.
> 
> > No offense intended, but I think two internal Squishdot pages
> > meet the definition of pretty dang simple.
> 
> Maybe conceptually it's simple, but apps like Squishdot do a lot 
> of work to generate those pages.  For fun, try setting up a 
> "barebones" Squishdot with the default homepage and hitting it 
> repeatedly with a load generator like Apache's "ab".  Then try 
> the same thing with a Zope page that is just 
> "<html>Hello!</html>".  You will see a big difference.  On an 
> 850MHz box at ZC, I can get Zope to serve about 152 requests/s 
> with the simple page.
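> 
> (A real "ab" run looks something like "ab -n 500 -c 10 http://yoursite/". 
> If you don't have ab handy, a crude serial loop in modern Python 
> shows the same effect; this is only a sketch, and the URL and 
> request count below are placeholders:
> 
>     import time
>     import urllib.request
> 
>     URL = "http://localhost:8080/"    # point this at the page to test
>     N = 200                           # number of requests to make
> 
>     start = time.time()
>     for _ in range(N):
>         urllib.request.urlopen(URL).read()
>     elapsed = time.time() - start
>     print("%d requests in %.1fs = %.1f req/s" % (N, elapsed, N / elapsed))
> 
> Run it once against the Squishdot homepage and once against the 
> trivial page, then compare the two rates.)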
> 
> Anybody want to try this with an out of the box Squishdot 
> homepage? Or a Squishdot story page?  The guy from the KDE dot
> (http://dot.kde.org) claimed he could only get about 2 
> requests/second out of a Squishdot home page.  After setting 
> up caching properly, he was able to get about 2000.
> 
> > And why does it fall over anyway? This just doesn't make any
> > sense to me. I can see it getting slow and timing out, but giving
> > up completely and just bailing? What's that about? Explain it to
> > me like I'm an intelligent, non-technical friend. Thanks.
> 
> The big "bang for buck" solution provider is caching.  
> Assuming that you had no problems *before* the slashdotting, 
> that will solve your problem because it will cause Zope to 
> need to serve far fewer requests, closer to the number of 
> requests you normally get.  And this is (I assume) the 
> outcome that you actually want.  I highly recommend setting 
> up a caching proxy in front of Zope if this sort of load will 
> be recurring.  It's way faster and cheaper than trying to 
> understand the problem deeply.  ;-)  Most commercial sites 
> are developed using this principle, AFAICT.
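> 
> To give you a feel for what a caching proxy does, here is a toy 
> reverse proxy in (modern) Python that keeps GET responses in 
> memory for a minute and only bothers the backend on a cache miss.  
> It is purely a conceptual sketch (in practice you would use a 
> dedicated cache such as Squid), and the backend address and TTL 
> below are made up:
> 
>     import time
>     import urllib.request
>     from http.server import BaseHTTPRequestHandler, HTTPServer
> 
>     ZOPE_BACKEND = "http://127.0.0.1:8080"   # assumed Zope address
>     CACHE_TTL = 60                           # seconds to keep a page
>     _cache = {}                              # path -> (expires, body, content type)
> 
>     class CachingProxy(BaseHTTPRequestHandler):
>         def do_GET(self):
>             entry = _cache.get(self.path)
>             if entry and entry[0] > time.time():
>                 _, body, ctype = entry       # cache hit: Zope never sees it
>             else:                            # cache miss: fetch once, remember it
>                 with urllib.request.urlopen(ZOPE_BACKEND + self.path) as resp:
>                     body = resp.read()
>                     ctype = resp.headers.get("Content-Type", "text/html")
>                 _cache[self.path] = (time.time() + CACHE_TTL, body, ctype)
>             self.send_response(200)
>             self.send_header("Content-Type", ctype)
>             self.send_header("Content-Length", str(len(body)))
>             self.end_headers()
>             self.wfile.write(body)
> 
>     HTTPServer(("", 8000), CachingProxy).serve_forever()
> 
> Visitors would hit port 8000, and at most one request per minute 
> per page would actually reach Zope.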
> 
> But if you're as interested in understanding the phenomenon as 
> you are in solving the problem, and you'd like to help the 
> current Squishdot maintainer and ZC improve their products' 
> behavior under load, it'd be necessary to know more about how the 
> site was failing under load and what happened during the 
> failures.  I would be interested in those results.  It could be a 
> memory leak, a Zope bug, a Squishdot bug, just about anything.  
> You need forensic information, and you need to let it fail under 
> load in order to get it.
> 
> Usually, you can get this info by turning on "big M" logging 
> (by passing "-M detailed.log" at the end of your start.bat 
> script, maybe). On Linux, I'd recommend also using the 
> ForensicLogger product (see
> http://www.zope.org/Members/mcdonc) to gather more details 
> such as memory utilization and CPU utilization; it doesn't 
> work on Windows, however.  If you're willing to do this, let 
> it fail under load, then send the log with the failure in it 
> to me and I will try to analyze it.
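> 
> If you want something even cruder than ForensicLogger in the 
> meantime, a few lines of Python on Linux will at least tell you 
> whether memory is climbing toward each crash.  This is only a 
> stand-in sketch, not the ForensicLogger code; the pid and 
> interval are placeholders:
> 
>     import time
> 
>     ZOPE_PID = 1234          # replace with the real Zope process id
>     INTERVAL = 10            # seconds between samples
> 
>     while True:
>         # /proc/<pid>/status has a "VmRSS: <n> kB" line with resident memory
>         with open("/proc/%d/status" % ZOPE_PID) as f:
>             rss = [l.split()[1] for l in f if l.startswith("VmRSS:")][0]
>         with open("memory.log", "a") as log:
>             log.write("%s %s kB\n" % (time.ctime(), rss))
>         time.sleep(INTERVAL)
> 
> A steadily rising VmRSS right up to each failure would point at a 
> leak; a flat line points somewhere else.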
> 
> Note that you *might* be able to make use of the AutoLance 
> product at http://www.zope.org/Members/mcdonc to autorestart 
> your machine for you if you've got a memory leak.
> 
> HTH,
> 
> - C
> 