read xml file from compressed file using gzip

John Machin sjmachin at lexicon.net
Sun Jun 10 05:43:40 EDT 2007


On 10/06/2007 3:06 PM, flebber wrote:
> On Jun 10, 3:45 am, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
>> flebber wrote:
>>> I was working on a simple program that would read the content
>>> of a playlist file (in this case "*.k3b") and write it out. The
>>> compressed "*.k3b" file has two files, and the one I was trying to read
>>> was maindata.xml
>> The k3b format is a ZIP archive. Use the zipfile library:
>>
>> file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
>>
>> Stefan
> 
> Thanks for all the help. I have been using the docs at python.org and
> the Magnus Lie Hetland book. Are there any docs that are a little more
> practical or expressive? Most of the module documentation is very
> confusing for a beginner and doesn't provide much in the way of
> examples of how to use the modules.
> 
> Not criticizing the docs as they are probably very good for
> experienced programmers.
> 


Somebody else has already drawn your attention to the/a tutorial. You 
need to read, understand, and work through a *good* introductory book or 
tutorial before jumping into the deep end.

 > class GzipFile([playlist_file[decompress[9, 'rb']]]);

Errr, no, the [] are a documentation device used in most computer 
language documentation to denote optional elements -- you don't type 
them into your program. See below.

Secondly, as Stefan pointed out, your file is a ZIP file (not a gzipped
file). They're quite different animals, so you need the zipfile module,
not the gzip module.
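
If you're ever unsure which kind of file you have, you can ask Python. The following is only a rough sketch in Python 3 syntax: archive_kind is a made-up helper name, the two demo files are throwaway placeholders, and the gzip check just looks at the first two bytes (the gzip magic number 1f 8b).

```python
import gzip
import os
import tempfile
import zipfile


def archive_kind(path):
    """Return 'zip', 'gzip', or 'unknown' for the file at path.

    Hypothetical helper for illustration only.
    """
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        if f.read(2) == b"\x1f\x8b":  # gzip files start with these two bytes
            return "gzip"
    return "unknown"


# Build two throwaway files to try it on (names and contents are placeholders).
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "demo.k3b")
gz_path = os.path.join(tmpdir, "demo.xml.gz")

with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("maindata.xml", "<playlist/>")
with gzip.open(gz_path, "wb") as gz:
    gz.write(b"<playlist/>")

print(archive_kind(zip_path))  # zip
print(archive_kind(gz_path))   # gzip
```

Running archive_kind on your own .k3b file should tell you which module to reach for.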


 > os.system(open("/home/flebber/tmp/maindata.xml"));

The manuals say quite simply and clearly that:
open() returns a file object
os.system's arg is a string (a command, like "grep -i fubar *.pl")
So that's guaranteed not to work.
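
To see the difference for yourself, here is a small sketch (Python 3 syntax; the temporary file and its contents are placeholders I made up so it runs anywhere). It shows that what open() hands back is a file object, not a string, and that you get at the contents by calling read() on it.

```python
import os
import tempfile

# Create a small stand-in file; the path and contents are placeholders.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "maindata.xml")
with open(path, "w") as out:
    out.write("<playlist/>\n")

handle = open(path)                   # a file object ...
is_string = isinstance(handle, str)   # ... which is not a string
text = handle.read()                  # read() gives you the contents
handle.close()

print(type(handle))
print(is_string)  # False
print(repr(text))
```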

From the docs of the zipfile module:
"""
class ZipFile( file[, mode[, compression[, allowZip64]]])

Open a ZIP file, where file can be either a path to a file (a string) or 
a file-like object. The mode parameter should be 'r' to read an existing 
file, 'w' to truncate and write a new file,
or 'a' to append to an existing file.
"""
... and you don't care about the rest of the class docs in your simple 
case of reading.

A class has to be called like a function to give you an object which is 
an instance of that class. You need only the first argument; the second 
has about a 99.999% chance of defaulting to 'r' if omitted, but we'll 
play it safe and explicit:

import zipfile
zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
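
To tie the bracket notation to real calls, here is a sketch of three equivalent ways of opening an archive for reading. It is written in Python 3 syntax and builds its own throwaway archive (demo.k3b and its contents are placeholders, not a real k3b project) so that it runs anywhere.

```python
import os
import tempfile
import zipfile

# Throwaway archive so the calls below have something to open.
# The file name and contents are placeholders.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.k3b")
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("maindata.xml", "<playlist/>")

# Docs: ZipFile( file[, mode[, compression[, allowZip64]]])
# Only `file` is required; each bracketed argument may be left off.
zf1 = zipfile.ZipFile(path)            # mode omitted
zf2 = zipfile.ZipFile(path, "r")       # mode given positionally
zf3 = zipfile.ZipFile(path, mode="r")  # mode given by keyword

names = [z.namelist() for z in (zf1, zf2, zf3)]
for z in (zf1, zf2, zf3):
    z.close()

print(names)
```

All three objects list the same members, and no square brackets are typed anywhere.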

OK, some more useful docs:
"""
namelist( )
   Return a list of archive members by name.
printdir( )
   Print a table of contents for the archive to sys.stdout.
read( name)
   Return the bytes of the file in the archive. The archive must be
   open for read or append.
"""

So give the following a try:

print zf.namelist()
zf.printdir()
xml_string = zf.read('maindata.xml')
zf.close()

# xml_string will be a string which may or may not have line endings
# in it ...
print len(xml_string)

# If you can't imagine what the next two lines will do,
# you'll have to do it once, just to see what happens:
for line in xml_string:
    print line

# Wasn't that fun? How big was that file? Now do this:
lines = xml_string.splitlines()
print len(lines) # number of lines
print len(lines[0]) # length of first line

# Ummm, maybe if it's only one line you don't want to do this either,
# but what the heck:
for line in lines:
    print line
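
For anyone on Python 3, where print is a function and ZipFile.read() returns bytes, the same steps might look roughly like the sketch below. It is not a definitive version: the archive is a stand-in built on the spot, "other.txt" is a placeholder for the second member, and decoding as UTF-8 is an assumption about the XML file.

```python
import os
import tempfile
import zipfile

# Stand-in archive so the sketch runs anywhere; use your real .k3b path instead.
tmpdir = tempfile.mkdtemp()
archive_path = os.path.join(tmpdir, "demo.k3b")
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("other.txt", "placeholder for the second member")
    zf.writestr("maindata.xml", "<?xml version='1.0'?>\n<playlist>\n</playlist>\n")

# The with-statement closes the archive for us, even if something goes wrong.
with zipfile.ZipFile(archive_path, "r") as zf:
    names = zf.namelist()
    xml_bytes = zf.read("maindata.xml")

xml_text = xml_bytes.decode("utf-8")  # assumption: the file is UTF-8 encoded
lines = xml_text.splitlines()

print(names)
print(len(lines))  # number of lines
for line in lines:
    print(line)
```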

HTH,
John


