[New-bugs-announce] [issue42369] Reading ZipFile not thread-safe

Thomas report at bugs.python.org
Mon Nov 16 06:07:51 EST 2020


New submission from Thomas <thger100 at hhu.de>:

According to the changelog entry for bpo-14099 (https://docs.python.org/3.5/whatsnew/changelog.html#id108), reading from multiple ZipExtFile objects that belong to the same ZipFile should be thread-safe, but it is not.

I created a small example where two threads try to read files from the same ZipFile simultaneously, which fails with a "Bad CRC-32" error. This is especially surprising since every file in the ZipFile contains only zero bytes and all of them have the same CRC.
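For readers without access to the attachment, the scenario can be sketched roughly as follows. This is not the attached test.py; the in-memory archive, the member names and sizes, and the helper `read_everything` are assumptions made for illustration. Whether an error actually shows up depends on the Python version and on thread timing, so the sketch only records what happens.

```python
import io
import threading
import zipfile

# Build a small in-memory archive whose members contain only zero bytes.
# (Assumption: the original report used a real file on disk.)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(20):
        zf.writestr(f"member_{i}.bin", b"\0" * 100_000)

reads = []   # names of members that were read without an error
errors = []  # exceptions raised while reading (e.g. zipfile.BadZipFile)

def read_everything(archive):
    """Read every member of the shared archive in small chunks."""
    for name in archive.namelist():
        try:
            with archive.open(name) as member:
                while member.read(4096):
                    pass
            reads.append(name)
        except Exception as exc:
            errors.append(exc)

# Two threads read from the *same* ZipFile object at the same time.
with zipfile.ZipFile(buf) as shared:
    threads = [
        threading.Thread(target=read_everything, args=(shared,))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(f"{len(reads)} reads succeeded, {len(errors)} failed")
```

On an affected interpreter, `errors` may contain "Bad CRC-32" exceptions; on others it may stay empty.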

My use case is a ZipFile with 82000 files. Creating multiple ZipFiles from the same "physical" zip file is not a satisfactory workaround because it takes several seconds each time. Instead, I open it only once and clone it for each thread:

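import zipfile

# Create an empty placeholder archive once; opening it is cheap.
with zipfile.ZipFile("/tmp/dummy.zip", "w") as dummy:
    pass

def clone_zipfile(z):
    # Open the empty placeholder, then graft the already-parsed directory onto it.
    z_cloned = zipfile.ZipFile("/tmp/dummy.zip")
    z_cloned.fp.close()  # release the placeholder's handle before replacing it
    z_cloned.NameToInfo = z.NameToInfo
    # Give the clone its own file handle so threads do not share a file position.
    z_cloned.fp = open(z.fp.name, "rb")
    return z_cloned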

This is a much better solution for my use case than locking. I am using multiple threads because I want to finish my task faster, but locking defeats that purpose.
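A minimal sketch of how the cloning approach might be used with one clone per thread. The temporary directory, the archive contents, and the names `clone_for_thread` and `worker` are hypothetical and only serve the illustration; the sketch relies on the same ZipFile internals (`NameToInfo`, `fp`) as the hack above and carries the same caveats.

```python
import os
import tempfile
import threading
import zipfile

workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "data.zip")       # hypothetical archive
placeholder_path = os.path.join(workdir, "dummy.zip")  # empty placeholder

# Create the archive to read from and the empty placeholder archive.
with zipfile.ZipFile(archive_path, "w") as zf:
    for i in range(20):
        zf.writestr(f"member_{i}.bin", b"\0" * 100_000)
with zipfile.ZipFile(placeholder_path, "w"):
    pass

def clone_for_thread(z):
    """Same idea as clone_zipfile: reuse the parsed directory, own file handle."""
    clone = zipfile.ZipFile(placeholder_path)
    clone.fp.close()                  # drop the placeholder's handle
    clone.NameToInfo = z.NameToInfo   # share the already-parsed member table
    clone.fp = open(z.fp.name, "rb")  # private handle on the real archive
    return clone

sizes = {}   # member name -> number of bytes read
errors = []  # exceptions raised in any worker

def worker(z, names):
    clone = clone_for_thread(z)
    try:
        for name in names:
            sizes[name] = len(clone.read(name))
    except Exception as exc:
        errors.append(exc)
    finally:
        clone.close()

# Parse the archive once, then let each thread work on its own clone.
with zipfile.ZipFile(archive_path) as original:
    names = original.namelist()
    threads = [
        threading.Thread(target=worker, args=(original, names[i::2]))
        for i in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

print(f"read {len(sizes)} members, {len(errors)} errors")
```

Because every thread reads through its own file handle, no file position is shared and no lock is needed around the reads.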

However, this cloning is somewhat of a dirty hack and will break when the file is not a real file but rather a file-like object.

Unfortunately, I do not have a solution for the general case.

----------
files: test.py
messages: 381090
nosy: Thomas
priority: normal
severity: normal
status: open
title: Reading ZipFile not thread-safe
versions: Python 3.7, Python 3.8
Added file: https://bugs.python.org/file49601/test.py

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42369>
_______________________________________

