[issue28494] is_zipfile false positives

Serhiy Storchaka report at bugs.python.org
Sun Nov 27 05:29:35 EST 2016


Serhiy Storchaka added the comment:

No, checking the first bytes of the file is not an appropriate option: zipfile should support the Python zip application format [1], whose archives may begin with a "#!" interpreter line rather than a ZIP signature.
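To show why a first-bytes check fails, here is a minimal sketch that builds an in-memory archive laid out like a zipapp archive created with an interpreter: a "#!" line followed by ordinary ZIP data. The helpers make_zipapp_bytes() and starts_with_local_header() are hypothetical, for illustration only, and the archive is written with zipfile directly rather than with zipapp.create_archive().

```python
import io
import zipfile


def make_zipapp_bytes(shebang: bytes = b"#!/usr/bin/env python3\n") -> bytes:
    """Hypothetical helper: an interpreter line followed by ordinary ZIP data,
    the layout of a zipapp archive created with an interpreter."""
    buf = io.BytesIO()
    buf.write(shebang)  # bytes that come before the ZIP data
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py", "print('hello')\n")
    return buf.getvalue()


def starts_with_local_header(data: bytes) -> bool:
    """The naive check: does the file begin with a local file header signature?"""
    return data[:4] == b"PK\x03\x04"


data = make_zipapp_bytes()
print(starts_with_local_header(data))        # False: the file starts with b"#!"
print(zipfile.is_zipfile(io.BytesIO(data)))  # True
```

A first-bytes test would therefore reject a valid archive that zipfile itself can read.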

I see two options:

1. Make is_zipfile() stricter than the ZipFile constructor. The latter supports ZIP files with data past the comment or with truncated comments, but the former should reject them.

2. Make both is_zipfile() and the ZipFile constructor more robust. They should check not just the EOCD (end of central directory) signature, but also the Zip64 end of central directory record (if it exists) and the first central file header signature (if the ZIP file is not empty).
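Option 1 can be sketched as follows, assuming the whole file is available as bytes. strict_is_zipfile() is a hypothetical helper, not part of zipfile: it accepts a candidate EOCD record only if the record plus its declared comment length ends exactly at the end of the data, so both trailing data and truncated comments are rejected.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD_SIZE = 22        # fixed part of the end of central directory record
MAX_COMMENT = 0xFFFF  # the comment length is a 16-bit field


def strict_is_zipfile(data: bytes) -> bool:
    """Hypothetical stricter check: some EOCD record plus its declared comment
    must end exactly at the end of the data."""
    tail_start = max(0, len(data) - (EOCD_SIZE + MAX_COMMENT))
    pos = data.rfind(EOCD_SIG, tail_start)
    while pos != -1:
        if pos + EOCD_SIZE <= len(data):
            # The comment length is the last 2-byte field of the fixed record.
            (comment_len,) = struct.unpack_from("<H", data, pos + EOCD_SIZE - 2)
            if pos + EOCD_SIZE + comment_len == len(data):
                return True
        # The signature may also appear inside a comment; try earlier matches.
        pos = data.rfind(EOCD_SIG, tail_start, pos)
    return False


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.comment = b"archive comment"
good = buf.getvalue()

print(strict_is_zipfile(good))            # True
print(strict_is_zipfile(good + b"junk"))  # False: data past the comment
print(strict_is_zipfile(good[:-3]))       # False: truncated comment
```

This only tightens the check on the bytes after the EOCD record; a file that happens to end with a well-formed EOCD record would still pass, which is the gap option 2 addresses.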

It may be that PDF files contain PK\005\006 not by accident but because they contain embedded ZIP files (I don't know whether this is the case). In that case, is_zipfile() returning True is correct.
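Option 2 can be sketched as below, again assuming the data is available as bytes. robust_is_zipfile() is a hypothetical helper, not zipfile's implementation; the struct layouts follow the field order of the ZIP format's EOCD, Zip64 locator and Zip64 EOCD records. It locates the central directory by counting back from the EOCD (or Zip64 EOCD) record by the declared directory size, so data prepended to the archive (as in a zipapp) is tolerated. It assumes no Zip64 extensible data sector and does not handle multi-disk archives.

```python
import io
import struct
import zipfile

# Record layouts (little-endian), in ZIP format field order.
EOCD = struct.Struct("<4s4H2LH")          # end of central directory, 22 bytes
ZIP64_LOCATOR = struct.Struct("<4sLQL")   # Zip64 EOCD locator, 20 bytes
ZIP64_EOCD = struct.Struct("<4sQ2H2L4Q")  # Zip64 EOCD record (fixed part), 56 bytes


def robust_is_zipfile(data: bytes) -> bool:
    """Hypothetical sketch of option 2: find the EOCD record, then validate the
    Zip64 EOCD record (if a locator is present) and the first central
    directory file header signature (if the archive is not empty)."""
    tail_start = max(0, len(data) - (EOCD.size + 0xFFFF))
    eocd_pos = data.rfind(b"PK\x05\x06", tail_start)
    if eocd_pos == -1 or eocd_pos + EOCD.size > len(data):
        return False
    _, disk, cd_disk, _, total, cd_size, _, _ = EOCD.unpack_from(data, eocd_pos)
    cd_end = eocd_pos  # the central directory should end where the EOCD starts

    loc_pos = eocd_pos - ZIP64_LOCATOR.size
    if loc_pos >= 0 and data[loc_pos:loc_pos + 4] == b"PK\x06\x07":
        # Assumes no Zip64 extensible data sector, so the Zip64 EOCD record
        # sits immediately before the locator.
        z64_pos = loc_pos - ZIP64_EOCD.size
        if z64_pos < 0:
            return False
        fields = ZIP64_EOCD.unpack_from(data, z64_pos)
        if fields[0] != b"PK\x06\x06":
            return False
        total, cd_size = fields[7], fields[8]
        cd_end = z64_pos

    if total == 0:
        return cd_size == 0  # an empty archive has no central directory
    if disk != cd_disk:
        return False  # multi-disk archives are out of scope for this sketch
    # Count back from the end of the directory instead of trusting the stored
    # offset, so bytes prepended to the archive (e.g. a "#!" line) are tolerated.
    cd_pos = cd_end - cd_size
    if cd_pos < 0:
        return False
    return data[cd_pos:cd_pos + 4] == b"PK\x01\x02"


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
archive = buf.getvalue()
print(robust_is_zipfile(archive))  # True

# A stray EOCD signature near the end of a PDF-like file, with no real
# central directory where the record says one should be.
fake_pdf = b"%PDF-1.4\n" + b"x" * 100 + EOCD.pack(b"PK\x05\x06", 0, 0, 1, 1, 46, 0, 0)
print(robust_is_zipfile(fake_pdf))  # False
```

A file that really embeds a complete ZIP archive near its end would still pass this check, which matches the point that is_zipfile() returning True is correct in that case.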

[1] https://docs.python.org/3/library/zipapp.html

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28494>
_______________________________________


More information about the Python-bugs-list mailing list