[Zope] keeping Java Servlets session ids based on url rewriting

Chris McDonough chrism@digicool.com
Mon, 11 Sep 2000 23:49:01 -0400


Albert,

I've put this in the collector as a possible bug... hopefully it will
get fixed with the next release if it proves not to be the proper
behavior.  In the meantime, you may want to try adjusting the
path_regex quoted below to get the appropriate behavior for your
environment.

aboulang@ldeo.columbia.edu wrote:
> 
>       I've done a little poking around in ZPublisher's HTTPRequest.py and
>       BaseRequest.py and I don't think that's where the ';*' gets stripped.  I
>       can't find *where* it gets stripped.  It must be possible to make Zope
>       de-ignore things split on a ";", but right now I can't find out where to
>       do so.
> 
>    From looking at the code, I think it may be ZServer, not ZPublisher,
>    doing it. I think there is code which sets up the CGI env vars, and
>    ZPublisher picks them up and works with them, so it is the code that
>    sets those CGI vars that is dropping it. Isn't it true that if you use
>    Apache, they are set by Apache, and if you use ZServer without
>    front-ending it with Apache, something in ZServer has to set them?
> 
> I think this is where the stripping occurs:
> 
> From default_handler in the medusa directory...
> 
> # split a uri
> # <path>;<params>?<query>#<fragment>
> path_regex = regex.compile (
> #        path        params        query       fragment
>         '\\([^;?#]*\\)\\(;[^?#]*\\)?\\(\\?[^#]*\\)?\\(#.*\\)?'
>         )
> 
> def split_path (path):
>         if path_regex.match (path) != len(path):
>                 raise ValueError, "bad path"
>         else:
>                 return map (lambda i,r=path_regex: r.group(i), range(1,5))
> 
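
To see what that split does to a servlet-style URL, here is a minimal
sketch of the same four-part pattern. It assumes the modern `re` module
rather than the old `regex` module used in medusa, so the group syntax
differs. The name `split_uri` and the sample URL are made up for
illustration.

```python
import re

# Same layout as medusa's path_regex, written in `re` syntax:
# <path>;<params>?<query>#<fragment>
PATH_RE = re.compile(r'([^;?#]*)(;[^?#]*)?(\?[^#]*)?(#.*)?')

def split_uri(uri):
    """Return [path, params, query, fragment]; missing parts are None."""
    m = PATH_RE.fullmatch(uri)
    if m is None:
        raise ValueError("bad path")
    return list(m.groups())

# Hypothetical servlet-style URL carrying the session id after ';'
print(split_uri('/servlet/shop;jsessionid=ABC123?item=42'))
# ['/servlet/shop', ';jsessionid=ABC123', '?item=42', None]
```

The ';jsessionid=...' piece ends up in its own slot (params), separate
from the path.
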
> Which is called by HTTPServer.py:
> 
> def get_environment(self, request,
>                         # These are strictly performance hackery...
>                         split=string.split,
>                         strip=string.strip,
>                         join =string.join,
>                         upper=string.upper,
>                         lower=string.lower,
>                         h2ehas=header2env.has_key,
>                         h2eget=header2env.get,
>                         workdir=os.getcwd(),
>                         ospath=os.path,
>                         ):
>         [path, params, query, fragment] = split_path(request.uri)
>         while path and path[0] == '/':
>             path = path[1:]
>         if '%' in path:
>             path = unquote(path)
>         if query:
>             # ZPublisher doesn't want the leading '?'
>             query = query[1:]
> 
>         server=request.channel.server
>         env = {}
>         env['REQUEST_METHOD']=upper(request.command)
>         env['SERVER_PORT']=str(server.port)
>         env['SERVER_NAME']=server.server_name
>         env['SERVER_SOFTWARE']=server.SERVER_IDENT
>         env['SERVER_PROTOCOL']=request.version
>         env['channel.creation_time']=request.channel.creation_time
>         if self.uri_base=='/':
>             env['SCRIPT_NAME']=''
>             env['PATH_INFO']='/' + path
>         else:
>             env['SCRIPT_NAME'] = self.uri_base
>             try:
>                 path_info=split(path,self.uri_base[1:],1)[1]
>             except:
>                 path_info=''
>             env['PATH_INFO']=path_info
>         env['PATH_TRANSLATED']=ospath.normpath(ospath.join(
>                 workdir, env['PATH_INFO']))
>         if query:
>             env['QUERY_STRING'] = query
>         env['GATEWAY_INTERFACE']='CGI/1.1'
>         env['REMOTE_ADDR']=request.channel.addr[0]
> 
>         # If we're using a resolving logger, try to get the
>         # remote host from the resolver's cache.
>         if hasattr(server.logger, 'resolver'):
>             dns_cache=server.logger.resolver.cache
>             if dns_cache.has_key(env['REMOTE_ADDR']):
>                 remote_host=dns_cache[env['REMOTE_ADDR']][2]
>                 if remote_host is not None:
>                     env['REMOTE_HOST']=remote_host
> 
>         env_has=env.has_key
>         for header in request.header:
>             key,value=split(header,":",1)
>             key=lower(key)
>             value=strip(value)
>             if h2ehas(key) and value:
>                 env[h2eget(key)]=value
>             else:
>                 key='HTTP_%s' % upper(join(split(key, "-"), "_"))
>                 if value and not env_has(key):
>                     env[key]=value
>         env.update(self.env_override)
>         return env
> 
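
Note that get_environment above unpacks `params` but never uses it
afterwards, which would explain the lost ';...' part. One possible
workaround is to append the params back onto the path before it becomes
PATH_INFO. The sketch below is a standalone function, not a patch to
HTTPServer.py. It covers only the uri_base == '/' case,
`build_path_info` and `keep_params` are hypothetical names, and I have
not checked how ZPublisher's traversal reacts to a ';' in PATH_INFO.

```python
from urllib.parse import unquote

def build_path_info(path, params, keep_params=True):
    """Build PATH_INFO the way get_environment does, optionally keeping params."""
    # Mirror get_environment: drop leading slashes, then unquote if needed.
    path = path.lstrip('/')
    if '%' in path:
        path = unquote(path)
    # Assumption: re-attaching ';params' is the behavior the servlet side wants.
    if keep_params and params:
        path = path + params
    return '/' + path

print(build_path_info('/servlet/shop', ';jsessionid=ABC123'))
# /servlet/shop;jsessionid=ABC123
print(build_path_info('/servlet/shop', ';jsessionid=ABC123', keep_params=False))
# /servlet/shop
```
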
> Also, from RFC 1738:
> 
> http://rfc.fh-koeln.de/rfc/html/rfc1738.html
> 
> "Reserved:
> 
>    Many URL schemes reserve certain characters for a special meaning:
>    their appearance in the scheme-specific part of the URL has a
>    designated semantics. If the character corresponding to an octet is
>    reserved in a scheme, the octet must be encoded.  The characters ";",
>    "/", "?", ":", "@", "=" and "&" are the characters which may be
>    reserved for special meaning within a scheme. No other characters may
>    be reserved within a scheme.
> 
>    Usually a URL has the same interpretation when an octet is
>    represented by a character and when it encoded. However, this is not
>    true for reserved characters: encoding a character reserved for a
>    particular scheme may change the semantics of a URL.
> 
>    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
>    reserved characters used for their reserved purposes may be used
>    unencoded within a URL.
> 
>    On the other hand, characters that are not required to be encoded
>    (including alphanumerics) may be encoded within the scheme-specific
>    part of a URL, as long as they are not being used for a reserved
>    purpose.
> "
> 
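
The RFC's point about encoding changing the semantics can be seen with
a small sketch. It assumes Python 3's `urllib.parse` and a made-up URL.
Once the ';' is percent-encoded it is ordinary path data and no longer
a delimiter, so a path/params split like the one above would not treat
it as the start of a params part.

```python
from urllib.parse import quote, unquote

raw = '/servlet/shop;jsessionid=ABC123'   # hypothetical URL path

# quote() keeps '/' by default but encodes ';' and '='.
encoded = quote(raw)
print(encoded)                  # /servlet/shop%3Bjsessionid%3DABC123
print(';' in encoded)           # False
print(unquote(encoded) == raw)  # True
```
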
> Hope this helps,
> Albert

-- 
Chris McDonough
Digital Creations, Publishers of Zope
http://www.zope.org