subprocess problem on WinXP

Simon Forman rogue_pedro at yahoo.com
Wed Jul 26 14:37:48 EDT 2006


Wolfgang wrote:
> Hi Simon,
>
> I did not know that library! I'm still new to python and I still have
> problems to find the right commands.

Welcome. : )  Python comes with "batteries included".  I'm always
finding cool new modules myself, and I've been using it for years. In
fact, I didn't notice the bz2 module until about a week ago.

Browse the standard library docs for fun:
http://docs.python.org/lib/lib.html
There's all kinds of cool stuff in there.  Whenever you say to
yourself, "Hmm, somebody must have had this problem before," reach for
the standard library.  The solution's likely already in there.

>
> But I suppose this library is mainly for partially
> compressing/decompressing of files. How can I use that library to
> compress/decompress full files without reading them into memory? And
> what about performance?

Read the docs.  There seems to be an API for (de)compressing both
"streams" of data and whole files.

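As a rough sketch of those two styles (the sample bytes and the 4096-byte
chunk size are made up for illustration, and this is not tuned for your
data): bz2.compress / bz2.decompress handle a whole byte string at once,
while a bz2.BZ2Compressor object accepts the data piece by piece.

```python
import bz2

data = b"some sample bytes to compress\n" * 1000  # made-up sample data

# One-shot: compress and decompress a whole byte string in memory.
packed = bz2.compress(data)
restored = bz2.decompress(packed)

# Incremental ("stream"): feed chunks to a compressor object, then flush.
compressor = bz2.BZ2Compressor()
pieces = []
for i in range(0, len(data), 4096):  # 4096 is an arbitrary chunk size
    pieces.append(compressor.compress(data[i:i + 4096]))
pieces.append(compressor.flush())  # flush() emits whatever is still buffered
streamed = b"".join(pieces)
```

For whole files on disk, bz2.BZ2File gives you a file-like object to write
to, so nothing forces you to hold the full contents in memory.
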
I don't know about performance, as I've never tried to use the module
before, but I would bet that it's good.  It almost certainly uses the
same bzip2 library as the bzip2 program itself and it avoids the
overhead of creating a new process for each file.

But if you're in doubt (and performance really matters for this
application) test and measure it.
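
A minimal way to measure, assuming made-up sample data (swap in one of
your real files to get a number that means something); compress_once is
just a hypothetical helper name for this sketch:

```python
import bz2
import timeit

sample = b"a line of fairly repetitive log text\n" * 20000  # made-up test data

def compress_once():
    # Hypothetical helper: compress the whole sample in one call.
    return bz2.compress(sample)

# Run the compression three times and keep the best wall-clock time.
best = min(timeit.repeat(compress_once, number=1, repeat=3))
ratio = len(compress_once()) / float(len(sample))
print("best of 3: %.4f s, compressed to %.1f%% of original" % (best, ratio * 100))
```

Timing the external bzip2 program on the same file would give you the
other half of the comparison.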

I think your script could be rewritten as follows with good speed and
memory performance, but I haven't tested it (and the output filepaths
may not be what you want...):

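import os
import bz2

dir_ = r"g:\messtech"


for root, dirs, files in os.walk(dir_):
    for file_ in files:
        f = os.path.join(root, file_)
        # Append the extension; os.path.join(f, '.bz2') would treat f as a directory.
        bzf = f + '.bz2'

        # Binary mode matters on Windows, or line endings get mangled.
        F = open(f, 'rb')
        try:
            BZF = bz2.BZ2File(bzf, 'w')
            try:
                # Copy in fixed-size chunks so large files never sit in memory.
                while True:
                    chunk = F.read(64 * 1024)
                    if not chunk:
                        break
                    BZF.write(chunk)
            finally:
                BZF.close()
        finally:
            F.close()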


Also, note that I changed 'dir' and 'file' to 'dir_' and 'file_'.  Both
dir and file are Python built-ins, so you shouldn't reuse those names
for your variables.
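
A small illustration of what goes wrong when a built-in name gets reused
(shadowing_demo is a hypothetical name, only for this sketch):

```python
def shadowing_demo():
    dir = r"g:\messtech"  # local name now hides the built-in dir()
    try:
        dir([])  # we meant the built-in, but dir is a string here
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)
```

Inside that function the built-in is unreachable under its usual name,
which is the sort of surprise the trailing underscore avoids.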


Peace,
~Simon



