Problem with zipfile and newlines

John Machin sjmachin at lexicon.net
Mon Mar 10 17:37:18 EDT 2008


On Mar 10, 11:14 pm, Duncan Booth <duncan.bo... at invalid.invalid>
wrote:
> "Neil Crighton" <neilcrigh... at gmail.com> wrote:
> > I'm using the zipfile library to read a zip file in Windows, and it
> > seems to be adding too many newlines to extracted files. I've found
> > that for extracted text-encoded files, removing all instances of '\r'
> > in the extracted file seems to fix the problem, but I can't find an
> > easy solution for binary files.
>
> > The code I'm using is something like:
>
> > from zipfile import Zipfile
> > z = Zipfile(open('zippedfile.zip'))
> > extractedfile = z.read('filename_in_zippedfile')
>
> > I'm using Python version 2.5.  Has anyone else had this problem
> > before, or know how to fix it?
>
> > Thanks,
>
> Zip files aren't text. Try opening the zipfile file in binary mode:
>
>    open('zippedfile.zip', 'rb')

Good pickup, but that indicates that the OP may have *TWO* problems,
the first of which is not posting the code that was actually executed.

If the OP actually executed the code that he posted, it is highly
likely to have died in a hole long before it got to the z.read()
stage, e.g.

>>> import zipfile
>>> z = zipfile.ZipFile(open('foo.zip'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python25\lib\zipfile.py", line 346, in __init__
    self._GetContents()
  File "C:\python25\lib\zipfile.py", line 366, in _GetContents
    self._RealGetContents()
  File "C:\python25\lib\zipfile.py", line 404, in _RealGetContents
    centdir = struct.unpack(structCentralDir, centdir)
  File "C:\python25\lib\struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 46
>>> z = zipfile.ZipFile(open('foo.zip', 'rb')) # OK
>>> z = zipfile.ZipFile('foo.zip', 'r') # OK

If it somehow made it through the open stage, it surely would have
blown up at the read stage, when trying to decompress a contained
file, because a file opened in text mode on Windows does not hand
back the compressed bytes unchanged.
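
For what it's worth, here is a minimal sketch of the binary-mode
approach. It assumes a Python recent enough (2.7 or 3.x) for ZipFile to
be used in a "with" statement, which 2.5 does not support. The helper
names, demo.zip and member.bin are invented for illustration and are
not from the OP's code. The payload deliberately contains '\r\n' and a
Ctrl-Z byte, the sort of data that text mode is known to interfere with
on Windows.

```python
import os
import tempfile
import zipfile


def read_member(zip_path, member_name):
    """Return the raw bytes of one archive member.

    Passing a path (not a file object) lets zipfile open the
    archive in binary mode itself.
    """
    with zipfile.ZipFile(zip_path, 'r') as z:
        return z.read(member_name)


def read_member_from_handle(zip_path, member_name):
    """Same result, but with an explicitly opened file object.

    The 'rb' is the important part: the archive must be read as bytes.
    """
    with open(zip_path, 'rb') as fh:
        with zipfile.ZipFile(fh) as z:
            return z.read(member_name)


# Build a small throwaway archive (hypothetical names) to read back.
payload = b'line one\r\nline two\r\n\x1a\x00binary tail'
tmpdir = tempfile.mkdtemp()
demo_zip = os.path.join(tmpdir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as z:
    z.writestr('member.bin', payload)

data = read_member(demo_zip, 'member.bin')

# Write the extracted bytes out in binary mode as well ('wb'),
# so nothing is translated on the way to disk.
out_path = os.path.join(tmpdir, 'member.bin')
with open(out_path, 'wb') as out:
    out.write(data)
```

If the extracted data is later written with a plain 'w' instead of
'wb', the same kind of newline translation can happen on the way out,
so it is worth checking both ends.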

Cheers,
John


