Problem with zipfile and newlines

neilcrighton at gmail.com
Tue Mar 11 16:43:32 EDT 2008


Sorry my initial post was muddled. Let me try again.

I've got a zipped archive that I can extract files from with my
standard archive unzipping program, 7-zip. I'd like to extract the
files in python via the zipfile module.  However, when I extract the
file from the archive with ZipFile.read(), it isn't the same as the
7-zip-extracted file. For text files, the zipfile-extracted version has
'\r\n' everywhere the 7-zip-extracted file only has '\n'. I haven't
tried comparing binary files via the two extraction methods yet.
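
For reference, here is a minimal sketch of how the bytes returned by
ZipFile.read() could be checked directly, before anything is written to
disk. It assumes Python 3 rather than the 2.5 used in this thread, and the
member name 'notes.txt' and its contents are hypothetical, made up only for
illustration. The archive is built in memory so the example is
self-contained.

```python
import io
import zipfile

# Hypothetical member contents: Unix-style line endings only.
original = b"line one\nline two\n"

# Build a small archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("notes.txt", original)

# Read the member back. ZipFile.read() returns the raw member bytes.
with zipfile.ZipFile(buf) as zf:
    extracted = zf.read("notes.txt")

same = extracted == original   # were the bytes returned unchanged?
has_cr = b"\r" in extracted    # did any carriage return appear?
print(same, has_cr)
```

If a check like this shows no '\r' in the data returned by read(), any
extra carriage returns would have to be introduced at a later step, such as
when the data is written out.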

Regarding the code I posted: I was writing it from memory and made a
mistake. I didn't use:

z = zipfile.ZipFile(open('foo.zip', 'r'))

I used this:

z = zipfile.ZipFile('foo.zip')

But Duncan's comment was useful, as I generally only ever work with
text files, and I didn't realise you have to use 'rb' or 'wb' options
when reading and writing binary files.
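
A small sketch of the two ways of opening an archive discussed here:
passing the filename, and passing a file object opened with 'rb'. It
assumes Python 3, and the archive 'foo.zip' is created in a temporary
directory with a hypothetical member 'a.bin' purely so the example can run
on its own.

```python
import os
import tempfile
import zipfile

# Create a throwaway archive; 'a.bin' and its payload are illustrative only.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "foo.zip")
payload = bytes(range(256))  # every byte value, including 0x0A and 0x0D
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("a.bin", payload)

# Option 1: pass the filename and let zipfile open the file itself.
with zipfile.ZipFile(path) as zf:
    by_path = zf.read("a.bin")

# Option 2: pass a file object, which must be opened in binary mode ('rb').
with open(path, "rb") as fh:
    with zipfile.ZipFile(fh) as zf:
        by_handle = zf.read("a.bin")

print(by_path == by_handle == payload)
```

Both routes should give identical bytes; the 'rb' only matters when you
open the file yourself.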

To answer John's questions - I was calling '\r' a newline. I should
have said carriage return. I'm not sure what operating system the
original zip file was created on. I didn't fiddle with the extracted
file contents, other than replacing '\r' with ''.  I wrote out all the
files with open('outputfile','w') - it seems that I should have been
using 'wb' when writing out the binary files.
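
To illustrate the difference between 'w' and 'wb', here is a hedged sketch
assuming Python 3. On Windows, text mode translates each '\n' to '\r\n' on
write; to make that visible on any platform, the example passes
newline='\r\n' explicitly, which is my stand-in for the Windows default and
not something from the original code. The file names and data are
hypothetical. This may account for the extra carriage returns, but it is
only one possible explanation.

```python
import os
import tempfile

# Hypothetical extracted data with Unix line endings.
data = b"first\nsecond\n"

tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "out_text.txt")
bin_path = os.path.join(tmpdir, "out_binary.txt")

# Text mode with newline='\r\n' mimics Windows text-mode translation:
# every '\n' written becomes '\r\n' on disk.
with open(text_path, "w", newline="\r\n") as out:
    out.write(data.decode("ascii"))

# Binary mode writes the bytes exactly as given.
with open(bin_path, "wb") as out:
    out.write(data)

# Read both files back in binary mode to see what actually reached the disk.
with open(text_path, "rb") as fh:
    text_bytes = fh.read()
with open(bin_path, "rb") as fh:
    bin_bytes = fh.read()

print(repr(text_bytes))
print(repr(bin_bytes))
```

Only the binary-mode file keeps the original bytes, so 'wb' looks like the
safer choice for anything read out of an archive.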

Thanks for the quick responses - any ideas why the zipfile-extracted
files and 7-zip-extracted files are different?

On Mar 10, 9:37 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Mar 10, 11:14 pm, Duncan Booth <duncan.bo... at invalid.invalid>
> wrote:
>
>
>
> > "Neil Crighton" <neilcrigh... at gmail.com> wrote:
> > > I'm using the zipfile library to read a zip file in Windows, and it
> > > seems to be adding too many newlines to extracted files. I've found
> > > that for extracted text-encoded files, removing all instances of '\r'
> > > in the extracted file seems to fix the problem, but I can't find an
> > > easy solution for binary files.
>
> > > The code I'm using is something like:
>
> > > from zipfile import Zipfile
> > > z = Zipfile(open('zippedfile.zip'))
> > > extractedfile = z.read('filename_in_zippedfile')
>
> > > I'm using Python version 2.5.  Has anyone else had this problem
> > > before, or know how to fix it?
>
> > > Thanks,
>
> > Zip files aren't text. Try opening the zipfile file in binary mode:
>
> >    open('zippedfile.zip', 'rb')
>
> Good pickup, but that indicates that the OP may have *TWO* problems,
> the first of which is not posting the code that was actually executed.
>
> If the OP actually executed the code that he posted, it is highly
> likely to have died in a hole long before it got to the z.read()
> stage, e.g.
>
> >>> import zipfile
> >>> z = zipfile.ZipFile(open('foo.zip'))
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\python25\lib\zipfile.py", line 346, in __init__
>     self._GetContents()
>   File "C:\python25\lib\zipfile.py", line 366, in _GetContents
>     self._RealGetContents()
>   File "C:\python25\lib\zipfile.py", line 404, in _RealGetContents
>     centdir = struct.unpack(structCentralDir, centdir)
>   File "C:\python25\lib\struct.py", line 87, in unpack
>     return o.unpack(s)
> struct.error: unpack requires a string argument of length 46
>
> >>> z = zipfile.ZipFile(open('foo.zip', 'rb')) # OK
> >>> z = zipfile.ZipFile('foo.zip', 'r') # OK
>
> If it somehow made it through the open stage, it surely would have
> blown up at the read stage, when trying to decompress a contained
> file.
>
> Cheers,
> John



