read xml file from compressed file using gzip

flebber flebber.crue at gmail.com
Sun Jun 10 06:08:02 EDT 2007


On Jun 10, 7:43 pm, John Machin <sjmac... at lexicon.net> wrote:
> On 10/06/2007 3:06 PM, flebber wrote:
>
>
>
> > On Jun 10, 3:45 am, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
> >> flebber wrote:
> >>> I was working on creating a simple program that would read the content
> >>> of a playlist file (in this case "*.k3b") and write it out. The
> >>> compressed "*.k3b" file has two files, and the one I was trying to read
> >>> was maindata.xml
> >> The k3b format is a ZIP archive. Use the zipfile library:
>
> >> file:///usr/share/doc/python2.5-doc/html/lib/module-zipfile.html
>
> >> Stefan
>
> > Thanks for all the help, have been using the docs at python.org and
> > the Magnus Lie Hetland book. Are there any docs that are a little more
> > practical or expressive? Most of the module documentation is very
> > confusing for a beginner and doesn't provide much in the way of
> > examples on how to use the modules.
>
> > Not criticizing the docs as they are probably very good for
> > experienced programmers.
>
> Somebody else has already drawn your attention to the/a tutorial. You
> need to read, understand, and work through a *good* introductory book or
> tutorial before jumping into the deep end.
>
>  > class GzipFile([playlist_file[decompress[9, 'rb']]]);
>
> Errr, no, the [] are a documentation device used in most computer
> language documentation to denote optional elements -- you don't type
> them into your program. See below.
>
> Secondly as Stefan pointed out, your file is a ZIP file (not a gzipped
> file), they're quite different animals, so you need the zipfile module,
> not the gzip module.
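
A minimal sketch of how to tell the two formats apart before choosing a module. It is written in Python 3 syntax (the thread itself uses Python 2), the sample data is made up, and `looks_like_zip` is a hypothetical helper name, not something from the thread.

```python
import gzip
import io
import zipfile

# Build one small ZIP archive and one gzip stream in memory (made-up sample data).
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zf:
    zf.writestr("maindata.xml", "<example/>")
zip_bytes = zip_buffer.getvalue()
gzip_bytes = gzip.compress(b"<example/>")


def looks_like_zip(data):
    """Hypothetical helper: True if the bytes form a ZIP archive."""
    return zipfile.is_zipfile(io.BytesIO(data))


print(looks_like_zip(zip_bytes))   # the ZIP archive
print(looks_like_zip(gzip_bytes))  # the gzip stream
print(zip_bytes[:2], gzip_bytes[:2])  # the leading bytes differ too
```

For a file on disk, `zipfile.is_zipfile` also accepts a path string, so the in-memory buffers are only there to keep the sketch self-contained.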
>
>  > os.system(open("/home/flebber/tmp/maindata.xml"));
>
> The manuals say quite simply and clearly that:
> open() returns a file object
> os.system's arg is a string (a command, like "grep -i fubar *.pl")
> So that's guaranteed not to work.
>
>  From the docs of the zipfile module:
> """
> class ZipFile( file[, mode[, compression[, allowZip64]]])
>
> Open a ZIP file, where file can be either a path to a file (a string) or
> a file-like object. The mode parameter should be 'r' to read an existing
> file, 'w' to truncate and write a new file,
> or 'a' to append to an existing file.
> """
> ... and you don't care about the rest of the class docs in your simple
> case of reading.
>
> A class has to be called like a function to give you an object which is
> an instance of that class. You need only the first argument; the second
> has about a 99.999% chance of defaulting to 'r' if omitted, but we'll
> play it safe and explicit:
>
> import zipfile
> zf = zipfile.ZipFile('/home/flebber/oddalt.k3b', 'r')
>
> OK, some more useful docs:
> """
> namelist( )
>    Return a list of archive members by name.
> printdir( )
>    Print a table of contents for the archive to sys.stdout.
> read( name)
>      Return the bytes of the file in the archive. The archive must be
> open for read or append.
> """
>
> So give the following a try:
>
> print zf.namelist()
> zf.printdir()
> xml_string = zf.read('maindata.xml')
> zf.close()
>
> # xml_string will be a string which may or may not have line endings
> # in it ...
> print len(xml_string)
>
> # If you can't imagine what the next two lines will do,
> # you'll have to do it once, just to see what happens:
> for line in xml_string:
>     print line
>
> # Wasn't that fun? How big was that file? Now do this:
> lines = xml_string.splitlines()
> print len(lines) # number of lines
> print len(lines[0]) # length of first line
>
> # Ummm, maybe if it's only one line you don't want to do this either,
> # but what the heck:
> for line in lines:
>     print line
>
> HTH,
> John
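
For anyone trying the quoted steps on Python 3, here is a rough equivalent. It is a sketch, not the code from the thread: the archive is built in memory with made-up contents so the example runs on its own, and the UTF-8 decoding is an assumption about the file's encoding.

```python
import io
import zipfile

# Stand-in for the .k3b file: a small in-memory ZIP with made-up contents.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("mimetype", "application/x-example")
    zf.writestr("maindata.xml", '<?xml version="1.0"?>\n<example>\n</example>\n')

# Reading side. For a real file, pass its path string instead of `archive`.
with zipfile.ZipFile(archive, "r") as zf:
    names = zf.namelist()
    xml_bytes = zf.read("maindata.xml")  # read() returns bytes in Python 3

xml_string = xml_bytes.decode("utf-8")  # assumption: the XML is UTF-8 encoded
lines = xml_string.splitlines()

print(names)
print(len(lines))
print(lines[0])
```

The `with` block closes the archive automatically, which replaces the explicit `zf.close()` call.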

Thanks, that was so helpful to see how to do it. I have read a lot but
it wasn't sinking in, and sometimes it's better to learn by doing. Some
of the books I have read just seem to go from theory to theory with
the occasional example (which is meant to show us how good the author
is rather than help us).

For the record

>>> ## working on region in file /usr/tmp/python-F_C5sr.py...
['mimetype', 'maindata.xml']
File Name                                             Modified             Size
mimetype                                       2007-05-27 20:36:20           17
maindata.xml                                   2007-05-27 20:36:20        10795
>>> print len(xml_string)
10795
>>> for line in xml_string:
   print line
... ...
<
?
x
m
l

v
e
r
s
i.....(etc ...it went for a while)
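
The one-character-per-line output comes from iterating over a string directly, which yields its characters one at a time. A small sketch in Python 3 syntax, using a made-up two-line sample rather than the real maindata.xml:

```python
xml_string = '<?xml version="1.0"?>\n<example/>\n'  # made-up sample text

# Iterating over a string gives single characters ...
chars = [ch for ch in xml_string]

# ... while splitlines() gives whole lines without their line endings.
lines = xml_string.splitlines()

print(chars[:5])
print(lines)
```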

and

>>> lines = xml_string.splitlines()
>>> print len(lines)
387
>>> print len(lines[0])
38
>>> for line in lines:
... print line
  File "<stdin>", line 2
    print line
        ^
IndentationError: expected an indented block
>>> for line in lines:
    print line
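
The IndentationError in that session is only about the missing indent on the loop body. As a sketch (Python 3 syntax, with `compiles` being a hypothetical helper name), the built-in `compile()` can check both versions of the loop without running them:

```python
good = "for line in lines:\n    print(line)\n"  # loop body indented
bad = "for line in lines:\nprint(line)\n"       # loop body not indented


def compiles(source):
    """Hypothetical helper: True if the source compiles, False on IndentationError."""
    try:
        compile(source, "<stdin>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(good))
print(compiles(bad))
```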



