[Zope] Prevent recursive and multiple URLs in Zope

Dieter Maurer dieter@handshake.de
Sat, 10 Aug 2002 15:54:00 +0200


Urs van Binsbergen writes:
 > ...
 > There are however 2 things about it I do not like very much:
 > 
 > 1) URL trailing slash handling:
 > 
 > http://example.com/some_doc
 > http://example.com/some_doc/
 > are both valid URLs to access the method or document some_doc in the
 > given root folder. In file-based publishing (like with Apache) the second
 > URL would be invalid, because some_doc is not a folder.
 > 
 > http://example.com/some_folder
 > http://example.com/some_folder/
 > are both valid URLs to access the folder some_folder in the root folder.
 > Apache would allow the first URL, but would redirect to the second,
 > because some_folder is not a document, it is a folder.
 > 
 > 2) recursive acquisition:
 > 
 > http://example.com/some_folder/some_folder/some_folder/some_folder/
 > is a valid URL to access the folder some_folder in the root folder.
 > 
 > ---
 > 
 > WHY do I dislike these two things?
 > 
 > a) Philosophically: As the name "UNIQUE resource locator" already says:
 > it is generally not good to have the same content available via different
 > locators.
Maybe your philosophical argument is weakened when you learn
that URL stands for "*UNIFORM* resource locator", not "unique".

     It's a uniform syntax (!) to locate a resource accessible through
     a wide variety of protocols.


It is quite common to have the same resource accessed through different
URLs: often the same resource can be accessed both via HTTP and FTP,
often the same (local) resource can be accessed with the "file", the "ftp" and
the "http" protocol, often the same resource can be accessed
via both "ftp" and "webdav" (which is HTTP based).

 > b) Technically: Working with relative links becomes unreliable and
 > dangerous. Problem #1 causes a relative URL to sometimes work and
 > sometimes not work, depending on whether the visitor accesses "foo/bar/"
 > or "foo/bar".
Only when you do strange things. Usually, Zope sets the HTML base
tag, such that it does not matter whether the user uses "foo/bar/"
or "foo/bar".
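
A minimal sketch of why the base tag removes the difference, using
Python's standard urljoin to mimic how a browser resolves a relative
link. The helper name "resolve", the link "img.png" and the base value
are assumptions made for illustration; the exact base Zope emits
depends on the published object.

```python
from urllib.parse import urljoin

def resolve(page_url, link, base_href=None):
    """Resolve a relative link as a browser would.

    If the page carries a <base href=...>, that value is used instead
    of the URL the visitor typed.
    """
    return urljoin(base_href or page_url, link)

# Without a base tag the result depends on the trailing slash.
no_slash = resolve("http://example.com/foo/bar", "img.png")
with_slash = resolve("http://example.com/foo/bar/", "img.png")
# no_slash   -> http://example.com/foo/img.png
# with_slash -> http://example.com/foo/bar/img.png

# With a base tag (assumed value) both requests resolve identically.
base = "http://example.com/foo/bar/"
a = resolve("http://example.com/foo/bar", "img.png", base)
b = resolve("http://example.com/foo/bar/", "img.png", base)
# a == b == http://example.com/foo/bar/img.png
```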

 > Problem #2 makes relative links the door to infinite
 > recursion. A simple link like "<a href="foo/">clickme</a>" will be the
 > trap where dumb spiders will lose themselves in an infinite loop (this
 > was discussed briefly on this list under the subject "htdig indexing
 > problem").
When you use relative links in the same way you are forced to do it
in a file system based publishing environment, there will be no
infinite recursion. Simply avoid relative links containing a "/"
not preceded by "..". Use an absolute URL otherwise.
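
The rule above can be sketched roughly as follows. The function names
"follow" and "follows_rule" are hypothetical, and the check is one
strict reading of the rule: every path segment before the last one
must be "..". It is an illustration, not a complete link checker.

```python
from urllib.parse import urljoin, urlsplit

def follow(start_url, link, times):
    """Follow the same relative link repeatedly, as a spider would."""
    url = start_url
    for _ in range(times):
        url = urljoin(url, link)
    return url

def follows_rule(link):
    """True if the link is absolute, or relative with no "/" other
    than those directly following a ".." segment."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        return True  # absolute URL or absolute path: always fine
    segments = parts.path.split("/")
    return all(seg == ".." for seg in segments[:-1])

# The trap from the quoted text: "foo/" on a page inside /foo/.
trap = follow("http://example.com/foo/", "foo/", 3)
# trap -> http://example.com/foo/foo/foo/foo/

ok_links = [follows_rule(l) for l in ("bar", "../bar", "/foo/bar")]
bad_link = follows_rule("foo/")
# ok_links -> [True, True, True]; bad_link -> False
```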

 > Experiences?
 > 
 > Since there are lots of Zope sites out there and I have not found much
 > discussion of this matter so far, am I maybe putting too much weight
 > on it?
I feel you do.

 > Workarounds
 > ...
 > - work with <base href=...>
This is done automatically unless your pages are strange.

 > ...
 > Other workarounds I was told:
 > - (for problem 2): put an access-restricted subfolder with the same name
 > into any folder
 > - (for problem 2): disallow access to any some_folder/some_folder
 > combinations in a robots.txt
You may also learn about SiteAccess AccessRules (--> documentation
on Zope.org).

 > Solution?
 > ...
 > - if the request-URL has a trailing slash, and the invoked object is not
 > a folder: response 404 (even if generic Zope would serve an object then)
While a file system folder is a very narrow concept, there are many
folder variants in Zope. In fact, most objects in Zope can act like
a folder (in the sense that they support a default presentation called
"index_html").

Forget about the trailing "/" problem. Give your pages an HTML "head"
element (as you should anyway) and do not include a "base" tag yourself;
Zope will then insert one whenever it modifies the URL.

 > - if the acquisition path invoked by the request-URL contains an
 > identical object multiple times: response 404
--> SiteAccess AccessRule in your root folder.
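
The check such a rule would need can be sketched in plain Python. The
function name "has_repeated_segment" is hypothetical, and the sketch
compares path names only, which is cruder than detecting the same
object twice: a legitimately repeated name at different levels would
also be flagged. How to wire it into an AccessRule is not shown here;
see the SiteAccess documentation for that.

```python
def has_repeated_segment(path):
    """True if any non-empty path segment occurs more than once."""
    segments = [s for s in path.split("/") if s]
    return len(segments) != len(set(segments))

recursive = has_repeated_segment(
    "/some_folder/some_folder/some_folder/some_folder/")
normal = has_repeated_segment("/some_folder/some_doc")
# recursive -> True; normal -> False
```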

 > Does this make sense?
Maybe for you. I would not go this way.



Dieter