Jan-Wijbrand Kolman wrote:
> Hello,
> we recently realised that mimetype assignment in Zope (e.g. to Zope File
> objects) is inconsistent: it can vary when different clients (browsers)
> upload files with the same file extension.
> Example: when a file called "foobar.rtf" is uploaded to a Zope File
> object from Firefox on Linux, the mimetype assigned is (or can be)
> 'application/rtf'. However, the same file uploaded to the same Zope
> File object in the same Zope instance using IE on Windows 2000 (with MS
> Office installed) will get 'application/msword' assigned.
> The mimetype assignment for uploaded files is done in OFS/Image.py
> (there may be more places, or other Products that do this; I know
> that at least ExtFile does too). Line 463 of OFS/Image.py, Zope
> 2.7.2:
> def _get_content_type(self, file, body, id, content_type=None):
>     headers=getattr(file, 'headers', None)
>     if headers and headers.has_key('content-type'):
>         content_type=headers['content-type']
>     else:
>         if type(body) is not type(''): body=body.data
>         content_type, enc=guess_content_type(
>             getattr(file, 'filename',id), body, content_type)
>     return content_type
> I then realised that the headers sent by the client for this file
> may include a content-type entry, which takes precedence over both the
> mimetypes 'database' and the content_type passed in as an argument.
> We could deal with the inconsistent assignment at the application
> level (in this case Silva), but I'd rather consider changing this
> behaviour at the Zope level. I could imagine changing the way a
> mimetype is 'guessed' for an uploaded File to something like:
> def _get_content_type(self, file, body, id, content_type=None):
>     """
>     Order of precedence:
>     1) see if guess_content_type resolves to a mimetype for the
>        filename
>     2) if not use content_type as sent in the headers if
>        available
>     3) else use argument passed in
>     """
>     headers = getattr(file, 'headers', {})
>     content_type = headers.get('content-type', content_type)
>     if type(body) is not type(''):
>         body = body.data
>     name = getattr(file, 'filename', id)
>     content_type, enc = guess_content_type(name, body, content_type)
>     return content_type
> Does anyone have an opinion on this? Is the current behaviour
> entirely intentional, perhaps even mandated by some specification
> (in which case I should deal with it at the application level)?
> Should I file a collector issue?

-1 for using the "guessed" value over the one from the headers; +1 for
using the argument over the guessed value (so that the application can
"fix" the problem). I agree that having different clients supply
different types is painful, but I don't think that "fixing" it at the
low level is reasonable: Zope should provide the mechanism, while the
choice of type is policy that belongs in the application.
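A minimal sketch of what such application-level policy might look like, in Python 3. The names CANONICAL_TYPES and normalize_upload_type are hypothetical, not part of Zope or Silva, and the sketch assumes the low-level code honours an explicitly passed content_type (which the current header-first logic does not):

```python
import os
from typing import Optional

# Hypothetical site-wide policy table: one canonical MIME type per
# extension, regardless of what the uploading browser claims.
CANONICAL_TYPES = {
    ".rtf": "application/rtf",
    ".doc": "application/msword",
}


def normalize_upload_type(filename: str) -> Optional[str]:
    """Return the policy-mandated type for filename, or None.

    None means "no opinion", leaving the decision to the
    low-level code's own rules (header, then guess).
    """
    _root, ext = os.path.splitext(filename)
    # Compare extensions case-insensitively: "FOOBAR.RTF" == "foobar.rtf".
    return CANONICAL_TYPES.get(ext.lower())
```

The application would hand the result to the low-level code as the content_type argument (e.g. of _get_content_type); returning None leaves the choice to Zope.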

In summary, I would prefer the precedence to be:

   1. Passed value (the content_type argument)

   2. Request header (the Content-Type sent by the client)

   3. Guessed value (from the filename and body)
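The precedence above could be sketched roughly as follows. This is a standalone Python 3 illustration, not a patch against OFS/Image.py: the stdlib mimetypes.guess_type stands in for Zope's guess_content_type (so body sniffing is skipped), and pick_content_type and its default fallback are hypothetical:

```python
import mimetypes
from typing import Mapping, Optional


def pick_content_type(
    passed: Optional[str],
    headers: Optional[Mapping[str, str]],
    filename: str,
    default: str = "application/octet-stream",  # assumed final fallback
) -> str:
    """Resolve a content type with the precedence:
    1. value passed in by the caller (application policy),
    2. Content-Type header sent by the client,
    3. guess based on the filename.
    """
    # 1. An explicit value from the application always wins.
    if passed:
        return passed
    # 2. Otherwise use what the client sent, if anything.
    if headers:
        # Header names are case-insensitive; normalise before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        header_type = lowered.get("content-type")
        if header_type:
            return header_type
    # 3. Fall back to guessing from the file extension.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default
```

With a passed value of 'application/rtf' and a header of 'application/msword', the passed value wins; drop the passed value and the header wins, which is exactly where the Firefox/IE difference comes from.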

