[issue38449] regression - mimetypes guess_type is confused by ; in the filename

Abhilash Raj report at bugs.python.org
Fri Oct 11 17:26:30 EDT 2019


Abhilash Raj <raj.abhilash1 at gmail.com> added the comment:

The bug is interesting due to some of the implementation details of "guess_type". The documentation says that it can parse either a URL or a filename.

Switching from urllib.parse._splittype to urllib.parse.urlparse changed what a valid "path" is. _splittype doesn't care about the rest of the URL except the scheme, but urlparse does. Previously, we used to split things like:

   >>> print(urllib.parse._splittype(';1.tar.gz'))
   (None, ';1.tar.gz')

Then we'd just treat the second part as a filesystem path, which would correctly guess the extension as .tar.gz.

However, parsing the same string with urllib.parse.urlparse gives:

    >>> print(urllib.parse.urlparse(';1.tar.gz'))
    ParseResult(scheme='', netloc='', path='', params='1.tar.gz', query='', fragment='')

We then take the ".path" attribute for further processing; since it is empty, guess_type returns (None, None).

The format of all these parts is:

    scheme://netloc/path;parameters?query#fragment
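
To make that layout concrete, here is a small sketch of how urlparse splits a full URL into those parts. The URL itself is made up for illustration:

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the layout above.
parts = urlparse('http://example.com/dir/index.html;v=1?q=2#top')

print(parts.scheme)    # http
print(parts.netloc)    # example.com
print(parts.path)      # /dir/index.html
print(parts.params)    # v=1
print(parts.query)     # q=2
print(parts.fragment)  # top
```

Note that everything after the ';' in the last path segment ends up in "params", not in "path".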

A simple fix would be to just merge path, parameters, query and fragment together (with appropriate delimiters) and then proceed with further processing. That would fix parsing of filesystem paths but would break (again) parsing of URLs like:

    >>> mimetypes.guess_type('http://example.com/index.html;1.tar.gz')
    ('application/x-tar', 'gzip')

It should return 'text/html' as the type, since this is a URL and everything after the ';' should not be used to determine the mimetype. But, if there is no scheme provided, we should treat it as a filesystem path and in that case 'application/x-tar' is the right type.

I hope I am not confusing everyone here. 

The right fix IMO would be to make "guess_type" not treat URLs and filesystem paths alike.
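
A rough sketch of what I mean, not a proposed patch. The helper name "path_for_guessing" is made up, and the "scheme longer than one character" test is only my assumption for telling a URL apart from something like a Windows drive letter:

```python
from urllib.parse import urlparse

def path_for_guessing(name):
    """Return the part of `name` whose extension should drive the guess."""
    parts = urlparse(name)
    # Assumption: a scheme of two or more characters marks a URL;
    # a single letter is more likely a drive letter such as "C:".
    if len(parts.scheme) > 1:
        # URL: ignore parameters, query and fragment.
        return parts.path
    # No scheme: treat the whole string as a filesystem path,
    # including any ';' it contains.
    return name

print(path_for_guessing('http://example.com/index.html;1.tar.gz'))  # /index.html
print(path_for_guessing(';1.tar.gz'))                               # ;1.tar.gz
```

The extension lookup would then run on the returned string only, so the URL case sees ".html" and the filesystem case sees ".tar.gz".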

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38449>
_______________________________________

