[Zope] ZEO and a front end...

Bill Anderson bill@libc.org
Tue, 18 Jul 2000 16:08:48 -0600


Toby Dickenson wrote:
> 
> On Tue, 18 Jul 2000 04:22:16 -0600, Bill Anderson <bill@libc.org>
> wrote:
> 
> >> I think most people seem to be missing the point here.
> >>
> >> The idea is that ALL servers can serve ALL content.  HOWEVER, the 'load
> >> balancer' will opt for a certain server for a certain URL, in order to
> >> improve cache hits.
> >>
> >> So, for www.contrived-example.com/dir1  it will first try server1, but if
> >> it's busy (or down) it will try others.  This way, the cache on server1 is
> >> more likely to contain objects relevant to /dir1  and thus have a higher hit
> >> rate, therefore improving performance.
> >
> >No, I understand what is being discussed, I doubt the problem. :-)
> 
> You are right, there's no problem in the scenario you described.
> 
> I'll fill in some more details about the fictional example for which I
> still can't see an easy solution....
> 
> Zope is used to store books. Each book object contains:
> 1. The text of the books, each page in a separate object
> 2. Images and diagrams for the book.
> 3. A ZCatalog full-text-index of the book.
> Each book object allows:
> 1. Searching, viewing pages, etc.
> 2. Dynamically rendering a range of pages as pdf, postscript, etc.
> 
> The whole database stores 10,000 books, and is served by a cluster of
> many identical Zope servers.
> 
> A typical usage pattern might be:
> a. A user searches through a book to find the interesting pages
> b. He browses the pdf version of those pages
> c. He tweaks the page range, and double-checks the pdf version.
> d. then downloads a postscript version of that page range for printing
> 
> Assume that no one has accessed this book recently, so it's not in any
> caches.
> 
> The cache has to be filled at step b. This transfers a lot of data -
> possibly the whole content of the book - and introduces a noticeable
> delay.
> 
> The possibility for optimisation comes at steps c and d. There is one
> cache already filled with the right data - if the requests from c and
> d can be directed to the same server as the original then the
> cache-filling delay can be avoided.
> 
> This extra delay might not have a great impact on actual site
> performance, but I've found a catastrophic effect on perceived
> performance in some usability tests. Users seem happy to accept a
> delay when they first access their data, but not if it is repeated in
> a subsequent request.
> 
> Bill wrote...
> 
> > http://my.site.com/sec1 is mapped to: sec1.site.com, which
> > is load balanced across as many machines as possible
> 
> I might be reading more into his words than was intended, but I think
> this demonstrates the problem. Distributing multiple requests for one
> section across multiple servers is (what I consider to be)
> undesirable.

You can actually do it either way. Curtis (AIUI) complained that the
method described meant your site depended upon each of the section's
servers being up, that there was no redundancy. So I described a way of
doing it with redundancy.
 
> I want to move load balancing up one level of abstraction -
> distributing sections across machines (rather than connections).

That's easier :) Make sec1.site.com a single machine, and all requests
for my.site.com/sec1 go to this machine, thus the cache will have it
loaded if it has been accessed at all. The downside, like Curtis
mentioned, is that if sec1 dies, you lose that part of the site.
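
A rough sketch of the middle ground between the two approaches: each
section gets a preferred server, so its cache stays warm, but requests
fall through to another machine if the preferred one is down. This is
only an illustration in Python, not how any particular load balancer
does it; pick_server, is_up and the server names are all hypothetical.

```python
import hashlib


def pick_server(path, servers, is_up):
    """Pick a server for a URL path, preferring one server per section.

    The first path component ("sec1" in "/sec1/page") is hashed to
    choose a preferred server. If that server is reported down, the
    remaining servers are tried in a fixed order, so redundancy is kept.
    Returns None when no server is available.
    """
    section = path.strip("/").split("/", 1)[0]
    digest = hashlib.md5(section.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(servers)
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_up(candidate):
            return candidate
    return None


# Hypothetical cluster: every server can serve every section.
servers = ["zope1", "zope2", "zope3"]
preferred = pick_server("/sec1/page1", servers, lambda s: True)
# Same section, different page: lands on the same (warm) server.
same = pick_server("/sec1/page2", servers, lambda s: True)
# Preferred server down: some other server takes over.
fallback = pick_server("/sec1/page1", servers, lambda s: s != preferred)
```

The trade-off is that after a failover the fallback server starts with
a cold cache for that section, which is the delay Toby describes.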

 
> >If that isn't enough, you can throw eddieware into the mix, which
> >*already* has the ability to redirect based upon the URL.
> 
> I've not seen eddieware before - so it looks like I've got some reading
> to do.
> 
> At a first glance it doesn't have any integrated http caching
> (although it seems to have everything else ;-) and there's no obvious
> place to hang squid. In my example above, I really want to be able to
> cache the rendered pdf files.

EddieWare does do 'intelligent' caching, allowing you to separate out
sections of a site to a server (for example, all images come from this
machine, and text from that one, etc.), and it works at the IP Address
level. You simply plug in squid wherever, AIUI.



--
Do not meddle in the affairs of sysadmins, for they are easy to annoy,
and have the root password.