[issue42096] zipfile.is_zipfile incorrectly identifying a gzipped file as a zip archive

Gregory P. Smith report at bugs.python.org
Sat Oct 24 15:05:16 EDT 2020


Gregory P. Smith <greg at krypto.org> added the comment:

For what it's worth: false positives are always going to be possible in any "magic" check such as is_zipfile.

We don't check the start of the file because zip files are defined by their end-of-file central directory, which contains the length information used to determine where within the file the zip archive actually starts.

The issue28494 tests are a demonstration of this: it is somewhat common practice to append a zip file to an executable of various forms for use as application-specific data.
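
As a rough illustration of the point above (not taken from the issue28494 tests), the sketch below builds a small archive in memory and prepends unrelated bytes to it. The shell-script stub and the member name hello.txt are made up for the example.

```python
import io
import zipfile

# Build a small zip archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello")
zip_bytes = buf.getvalue()

# Prepend unrelated bytes, standing in for an executable stub
# (the stub content here is an arbitrary placeholder).
prefixed = b"#!/bin/sh\nexit 0\n" + zip_bytes

# The combined data no longer starts with the local file header signature...
starts_with_pk = prefixed[:4] == b"PK\x03\x04"

# ...but the end-of-file record is still there, so the check succeeds
# and the archive can still be opened and listed.
detected = zipfile.is_zipfile(io.BytesIO(prefixed))
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    names = zf.namelist()

print(starts_with_pk, detected, names)
```

A check based only on the first bytes of the file would reject this data even though the zipfile module can read it.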

If you need a more reliable determination of file type that is not tied to a specific Python release, you might look at what the various file type sniffing "magic" libraries do for you. Some examples include:
 https://pypi.org/project/filetype/
 https://pypi.org/project/puremagic/
 https://pypi.org/project/python-magic/

I _can_ reproduce this issue with the testdata @bckohan provided.

But I can't promise there is anything to fix here.  Even if we make the test slightly more robust by looking at another byte or two, it is always possible for files to appear to be a bunch of things at once based on small data signatures.

If nothing else, we should reinforce in the documentation that is_zipfile is at best a guess.  False means the file is not a zip archive as far as the zipfile module is concerned.  True cannot guarantee that it is one.
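
For callers who want a somewhat stricter answer than is_zipfile alone, one option is to also try opening the archive, which makes the zipfile module parse the central directory. The helper name probably_zip below is hypothetical, and this is only a sketch: it narrows the guess but still cannot guarantee the file was intended to be a zip archive.

```python
import io
import zipfile


def probably_zip(fileobj):
    """Hypothetical helper: stricter than is_zipfile, still only a guess."""
    if not zipfile.is_zipfile(fileobj):
        # False is reliable: the zipfile module cannot treat this as a zip.
        return False
    try:
        fileobj.seek(0)
        # Opening the archive parses the central directory and raises
        # BadZipFile if the entries do not match the end-of-file record.
        with zipfile.ZipFile(fileobj) as zf:
            zf.infolist()
        return True
    except (zipfile.BadZipFile, OSError, NotImplementedError):
        return False


# Usage with in-memory data.
good = io.BytesIO()
with zipfile.ZipFile(good, "w") as zf:
    zf.writestr("a.txt", "data")

is_good = probably_zip(good)
is_text = probably_zip(io.BytesIO(b"plain text, not an archive"))
print(is_good, is_text)
```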

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42096>
_______________________________________

