[Zope] i18n site and search robots

Tino Wildenhain tino@wildenhain.de
Wed, 23 Jul 2003 15:33:08 +0200


Hi Gilles:

Gilles Lenfant wrote:
> ----- Original Message -----
> From: "Tino Wildenhain" <tino@wildenhain.de>
> To: "Dieter Maurer" <dieter@handshake.de>
> Cc: "Gilles Lenfant" <gilles@pilotsystems.net>; <zope@zope.org>
> Sent: Wednesday, July 23, 2003 9:13 AM
> Subject: Re: [Zope] i18n site and search robots
> 
> 
> 
>>Hi,
>>
>>Dieter Maurer wrote:
>>
>>>Gilles Lenfant wrote at 2003-7-22 15:50 +0200:
>>> > This is not strictly speaking a Zope problem, but many of you have
>>> > certainly faced and fixed this.
>>> > I made an i18n site with Localizer that runs fairly well, including its
>>> > i18n search engine.
>>> > But what about external search engine robots (Google, Infoseek...)?
>>> > How do I "tell" them that they may browse and index the pages in French,
>>> > English, Spanish (...), changing their HTTP "Accept-Language" header?
>>>Not sure, whether this is the most elegant way, but:
>>>
>>>  You could have "language access folders", e.g. "en", "fr", "de".
>>>
>>>  Requests that go through these folders select the corresponding
>>>  language. A ("SiteAccess") AccessRule in the folders ensures
>>>  that "Accept-Language" is correctly set in "REQUEST.environ"
>>>  and that even "absolute_url" generates the correct language
>>>  specific URLs.
>>>
>>
>>
>>
>>According to the W3C standard, the server would
>>1.) issue a "Vary: Accept-Language" header on each response
>>2.) if no Accept-Language header is sent, answer with status
>>     300 "Multiple Choices" and provide a list of the available
>>     variants
>>In the Multiple Choices answer, the list could consist of links
>>to the language-access folders Dieter proposed.
>>
>>
>>This would give crawlers a good way to switch between languages.
>>
> 
> 
> Many thanks Tino,
> 
> Could you please give me the full URL of this document?
> I couldn't find it on the W3C site (or didn't search correctly).
> 
> Thanks in advance.
> 
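
Dieter's "language access folders", quoted above, can be sketched outside Zope as a small request rewrite. This is a plain Python sketch, not a SiteAccess AccessRule: it assumes a CGI-style environ dict (PATH_INFO, HTTP_ACCEPT_LANGUAGE), and the names LANGUAGE_FOLDERS, apply_language_folder and language_url are hypothetical.

```python
# Hypothetical "language access folders" at the site root.
LANGUAGE_FOLDERS = ("en", "fr", "de")


def apply_language_folder(environ):
    """If the first path segment names a language folder, force that language.

    `environ` is a dict of CGI-style request variables. The folder stays in
    the path, so URLs built from it keep the language prefix. Returns the
    selected language, or None when the request is left untouched.
    """
    segments = [s for s in environ.get("PATH_INFO", "").split("/") if s]
    if segments and segments[0] in LANGUAGE_FOLDERS:
        lang = segments[0]
        # Override whatever the client (or crawler) sent, so the language
        # negotiation further down only ever sees this language.
        environ["HTTP_ACCEPT_LANGUAGE"] = lang
        return lang
    return None


def language_url(base_url, lang, path):
    """Build the language-specific URL for `path` (hypothetical helper)."""
    return "%s/%s/%s" % (base_url.rstrip("/"), lang, path.lstrip("/"))
```

For example, a request for `/fr/products/index_html` that arrives with `Accept-Language: en-us,en;q=0.5` is handled as French, and a crawler that follows links built with `language_url()` stays inside one language tree without ever sending an Accept-Language header.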

Sorry, it was (of course) not the W3C but an RFC ;))

From RFC 2616, HTTP/1.1 (Fielding, et al., Standards Track, June 1999),
pages 60-61:

10.3.1 300 Multiple Choices

    The requested resource corresponds to any one of a set of
    representations, each with its own specific location, and agent-
    driven negotiation information (section 12) is being provided so that
    the user (or user agent) can select a preferred representation and
    redirect its request to that location.

    Unless it was a HEAD request, the response SHOULD include an entity
    containing a list of resource characteristics and location(s) from
    which the user or user agent can choose the one most appropriate. The
    entity format is specified by the media type given in the Content-
    Type header field. Depending upon the format and the capabilities of
    the user agent, selection of the most appropriate choice MAY be
    performed automatically. However, this specification does not define
    any standard for such automatic selection.

    If the server has a preferred choice of representation, it SHOULD
    include the specific URI for that representation in the Location
    field; user agents MAY use the Location field value for automatic
    redirection. This response is cacheable unless indicated otherwise.

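I think you can also include references to the alternative versions in
the HTML head. HTML 4.01 defines <link rel="alternate" hreflang="..">
for translated versions of a document and mentions search engines as
one use of it; the <meta ..> tags may have something as well.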
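
Generating those <link rel="alternate" hreflang=".."> tags for each page could be as simple as the following sketch. It assumes the same `/lang/path` layout as the language access folders; the function name and its parameters are hypothetical.

```python
from html import escape


def alternate_links(base_url, path, languages, current):
    """Return <link rel="alternate"> tags pointing at the other translations.

    `languages` lists every language the page exists in; `current` is the
    language of the page being rendered, which gets no link to itself.
    """
    tags = []
    for lang in languages:
        if lang == current:
            continue
        url = "%s/%s/%s" % (base_url.rstrip("/"), lang, path.lstrip("/"))
        tags.append('<link rel="alternate" type="text/html" hreflang="%s" href="%s">'
                    % (lang, escape(url, quote=True)))
    return "\n".join(tags)
```

The output goes into the page's <head>, so a robot that lands on the English page is handed plain links to the French and Spanish versions.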

Regards
Tino Wildenhain