[Zope] The Principles of Designing a Heavy Load Site with Zope

Matthew T. Kromer matt@zope.com
Mon, 04 Mar 2002 10:48:51 -0500


I think there are some fundamental misunderstandings going on here, but
I thought it would be interesting to try to respond anyway.


iap@y2fun.com wrote:

> Hi,
>
> This issue has been discussed again and again,
>
> I would like to clarify my idea and your comments will be very
> appreciated.
>
> Suppose we want to provide a server which is:
>
> 1) Hosting 1,000,000 member profiles, each with a 5 MB disk quota.
>
> That means we need at least 5,000 GB (5 TB) of disk space.
>
> 2) Assume concurrent access to a URL is 1000 requests per second.
>
> 3) Assume all the requests retrieve dynamic content.
>
> 4) We want to leverage the power of Zope, which means all the pages
> should be rendered by Zope.
>


Having 5 TB of disk space usually means some very high-powered RAID
gear; my personal favorites are EMC's Symmetrix units, and I think you
would probably want at least two of those to provide your coverage.
Estimated cost for this is about $5,000,000 (but *very* dependent on
EMC's pricing strategies).

You could get by for less by distributing the disks among the CPUs, one
disk per machine (the "breadrack" approach).

1000 requests/second isn't terribly high; Zope installations have done
400 requests/sec with no problem. However, those are situations where
Zope is being heavily cached; fewer than 10% of the requests are
actually rendered by Zope. So, if you wanted no caching (i.e., everything
is completely dynamic content), my estimate is that you would need
something like 100 1 GHz Intel Pentium III-class machines to perform
that amount of dynamic rendering, or roughly 10 rendered requests/sec
per machine. If each of those machines had a 50 GB disk drive, you'd
theoretically have your 5 TB of disk space (100 x 50 GB). At a rough
commercial cost of $4,000 per unit (probably a bit high), that's only
$400,000.
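The uncached sizing above can be written out as a small calculation. This is a minimal sketch: the figure of about 10 fully dynamic requests/sec per machine is not stated directly in the post but is implied by spreading 1000 requests/sec across 100 machines, and the function name `size_uncached_cluster` is hypothetical.

```python
import math


def size_uncached_cluster(requests_per_sec, per_machine_rate,
                          disk_gb_per_machine, cost_per_machine):
    """Estimate machines, total disk (GB) and hardware cost when every
    request is rendered by Zope (no caching at all)."""
    # Enough machines to cover the target rate, rounding up.
    machines = math.ceil(requests_per_sec / per_machine_rate)
    total_disk_gb = machines * disk_gb_per_machine
    total_cost = machines * cost_per_machine
    return machines, total_disk_gb, total_cost


if __name__ == "__main__":
    machines, disk_gb, cost = size_uncached_cluster(
        requests_per_sec=1000,   # target load from the original question
        per_machine_rate=10,     # assumption: ~10 dynamic pages/sec per 1 GHz PIII
        disk_gb_per_machine=50,  # one 50 GB drive per machine
        cost_per_machine=4000,   # rough commercial price per unit
    )
    print(machines, disk_gb, cost)  # 100 5000 400000
```

The per-machine rate is the number to measure on your own pages; everything else scales linearly from it.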

As a practical matter, you'd then need some pretty hefty load-balancing
servers: at least two, possibly more.

However, that raises the question of how you organize that much disk
space. It's not an easy task. Whether or not you use an RDBMS is
irrelevant until you can work out a strategy for using the disk space
scattered amongst all of those machines.
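As one illustration of such a strategy (a common general technique, not something proposed in this thread), member profiles could be partitioned across the machines by a stable hash of the member ID, so each box owns a fixed slice of the member base. The `node_for_member` helper and the `node-NNN` host names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 100  # assumption: one storage slice per rendering machine


def node_for_member(member_id: str, num_nodes: int = NUM_NODES) -> str:
    """Map a member ID to the (hypothetical) host storing that profile."""
    # Use a stable digest rather than Python's built-in hash(), which is
    # randomized per process for strings, so every front end agrees.
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % num_nodes
    return f"node-{index:03d}"


if __name__ == "__main__":
    # Check how evenly 100,000 sample members spread over the nodes.
    counts = Counter(node_for_member(f"member{i}") for i in range(100_000))
    print(len(counts), min(counts.values()), max(counts.values()))
```

Plain modulo placement is the simplest choice, but adding or removing a node remaps most members; consistent hashing is the usual refinement when the node count is expected to change. Replication and failover are a separate problem this sketch does not address.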

* * *

Now, if you forget for a moment about requiring each page to be
dynamically rendered each time it is viewed, and you set aside the
storage questions, you could estimate that, with a 90% cache hit rate,
only about 100 requests/sec would reach Zope, so you could serve 1000
requests/sec with only about 14 machines (10 renderers, 2 cache servers,
and 2 load balancers). Estimated cost for that, at $4,000 per machine,
is $56,000.
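The same arithmetic with a cache in front: only cache misses reach Zope, and those are divided among renderers running at the roughly 10 requests/sec per machine implied earlier. This is a minimal sketch; the cache-server and load-balancer counts are the post's own figures, the hit rate is taken as an integer percentage to avoid floating-point rounding, and `size_cached_cluster` is a hypothetical name.

```python
import math


def size_cached_cluster(requests_per_sec, cache_hit_pct, per_renderer_rate,
                        cache_servers, load_balancers, cost_per_machine):
    """Estimate renderer count, total machines and cost when a cache
    absorbs cache_hit_pct percent of all requests."""
    # Only cache misses have to be rendered by Zope.
    misses_per_sec = requests_per_sec * (100 - cache_hit_pct) / 100
    renderers = math.ceil(misses_per_sec / per_renderer_rate)
    machines = renderers + cache_servers + load_balancers
    return renderers, machines, machines * cost_per_machine


if __name__ == "__main__":
    # 1000 req/s, 90% hits, ~10 renders/s per box, 2 caches, 2 balancers
    print(size_cached_cluster(1000, 90, 10, 2, 2, 4000))  # (10, 14, 56000)
```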

The most unrealistic part of this scenario is your assumption about
the member base and its ratio to expected activity. One million users
may generate only 1,000 requests/sec, but they certainly could generate
a lot more. In fact, a critical strategy for large systems like this is
anticipating "peak demand" events. Let's say you send an e-mail out
to all one million people, telling them to log in and check out a
particular URL. That timed event will generate a demand curve that is
not evenly distributed over time; in fact, it is usually very
front-loaded. Within about 5 minutes, more than 10% of the user base
will probably respond. That is a raw rate of about 333 requests/sec
(100,000 requests over 300 seconds), but it presumes that the single
URL is the only thing they load; usually, a page contains images and
other content (style sheets, etc.) which also must be fetched. Pages
with heavy graphic content can have 25 elements or more on them. That
pushes the request rate up to about 8333 requests/sec, well beyond the
1000 requests/sec bound.
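The peak-demand estimate, spelled out as code. This is a minimal sketch using the post's own rough figures (10% of one million members within 5 minutes, about 25 elements per page); `peak_request_rate` is a hypothetical helper. It computes the average rate over the window, so for a front-loaded curve the instantaneous peak will be higher still.

```python
def peak_request_rate(members, response_fraction, window_sec, elements_per_page):
    """Average (pages/sec, requests/sec) during a burst of responses.

    This is the mean over the window; a front-loaded burst peaks higher.
    """
    responders = members * response_fraction
    page_rate = responders / window_sec
    # Every page view also fetches its images, style sheets, etc.
    return page_rate, page_rate * elements_per_page


if __name__ == "__main__":
    pages, requests = peak_request_rate(1_000_000, 0.10, 5 * 60, 25)
    print(round(pages), round(requests))  # 333 8333
```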

> The principles I would like to verify are:
>
> 1) Some database (RDBMS) should be used instead of FileStorage for ZODB.
>
> 2) ZEO should be used to build a computing cluster.
>
> 3) Apache should be the front end instead of ZServer.
>
> 4) PCGI should be the connection between Apache and Zope.
>
> 5) I shouldn't create static instances in the ZODB but should query
> the external database.
>
> 6) Zope's cache is useless since all the responses are dynamically
> rendered.
>
> By the way, how much will this kind of system cost, regardless of the
> hardware?
>
> Iap, Singuan
>


-- 
Matt Kromer
Zope Corporation  http://www.zope.com/