[ZWeb] Zope.org currently unusable

Andrew Sawyers andrew at zope.com
Thu Mar 10 09:27:55 EST 2005


I need to read up on the robots.txt spec.  Excellent Mark, thanks.
Andrew

--
Zope Managed Hosting
Software Engineer
Zope Corporation
(540) 361-1700 

> -----Original Message-----
> From: zope-web-bounces at zope.org [mailto:zope-web-bounces at zope.org] On
> Behalf Of Mark Pratt
> Sent: Thursday, March 10, 2005 6:16 AM
> To: Jens Vagelpohl
> Cc: zope-web at zope.org
> Subject: Re: [ZWeb] Zope.org currently unusable
> 
> Hi,
> 
> I recommend setting a crawl delay for every bot except Google, something like:
> 
> User-agent: Slurp
> Crawl-delay: 120
> 
> This is for the Yahoo bot, but the same delay should also be applied to msnbot.
> 
> It's crazy how some of these bots love to hit your site at the same
> time. A 120 second delay should be more than enough time between
> hits even if they all come at the same time.
> 
> Cheers,
> 
> Mark
> 
> 
> On Mar 10, 2005, at 10:33 AM, Jens Vagelpohl wrote:
> 
> >
> > On Mar 10, 2005, at 2:18, Andrew Sawyers wrote:
> >
> >> It's a little of both; there's a group of people working on this - we
> >> hope to have something real soon now :) as a fix.  Jens, do you have
> >> the time to check the zope.org robots.txt?  A lot of the problems I've
> >> seen recently were due to several robots spidering zope.org at a time.
> >> I'm working on additional hardware and we should see more traction on
> >> the project sooner than later.
> >
> > I don't believe all that much in robots.txt. The nasty bots completely
> > ignore it, anyway. The only way to deal with them is to block them
> > with e.g. iptables.
> >
> > What's currently there looks odd:
> >
> > """
> > User-agent: wget
> > Disallow: /
> >
> > User-agent: Wget
> > Disallow: /
> >
> > # Ask Google to skip search queries and the like.
> > User-agent: Googlebot
> > Disallow: /*?
> > """
> >
> > Looking at the spec, case-insensitive matching of the User-agent value
> > is only "recommended", but you could shorten that into the following,
> > because multiple User-agent values are allowed per rule set:
> >
> > """
> > User-agent: wget
> > User-agent: Wget
> > Disallow: /
> > """
> >
> > Otherwise there really isn't much in there, and from seeing googlebots
> > myself often enough I have my doubts whether the line "Disallow: /*?"
> > works at all.
> >
> > jens
> >
> > _______________________________________________
> > Zope-web maillist  -  Zope-web at zope.org
> > http://mail.zope.org/mailman/listinfo/zope-web
> >
> >
> 
> _______________________________________________
> Zope-web maillist  -  Zope-web at zope.org
> http://mail.zope.org/mailman/listinfo/zope-web
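
For what it's worth, the rules discussed above can be sanity-checked offline
before they go live. Below is a minimal sketch using Python's standard
urllib.robotparser (crawl_delay() needs Python 3.6 or later). The robots.txt
text is an assumption that combines Jens's merged wget rule set with Mark's
suggested Crawl-delay for Slurp and msnbot; it is not the actual zope.org file,
and the /Products URL is only an illustrative path. Note that this parser
matches agent names case-insensitively, which not every robot is guaranteed
to do, and that Crawl-delay is an extension that only some crawlers honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt assembled from the suggestions in this thread.
ROBOTS_TXT = """\
User-agent: wget
User-agent: Wget
Disallow: /

User-agent: Slurp
User-agent: msnbot
Crawl-delay: 120
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative URL only; any path on the site is covered by "Disallow: /".
url = "http://www.zope.org/Products"

print(parser.can_fetch("Wget", url))       # False: blocked by the merged rule set
print(parser.can_fetch("Googlebot", url))  # True: no rule set names Googlebot
print(parser.crawl_delay("Slurp"))         # 120
print(parser.crawl_delay("msnbot"))        # 120
print(parser.crawl_delay("Googlebot"))     # None: no delay requested
```

This only shows how a well-behaved parser reads the file. As Jens points out,
bots that ignore robots.txt are unaffected and have to be blocked some other way.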


