Calling a C program from a Python Script
Matt Gerrans
matt.gerrans at hp.com
Thu Dec 9 17:06:13 EST 2004
"Brad Tilley" <bradtilley at gmail.com> wrote:
>>>I'm dealing with a terabyte of files. Perhaps I should have mentioned
>>>that.
I wouldn't automatically assume that recursing the directories with a Python
script that calls a C program for each file is faster than doing the
processing in Python. For example, I found that using zlib.crc32()
directly in Python was no slower than calling a C program that calculates
CRCs of files. (For huge files, it was important to find the right buffer
size and not try to read the whole thing at once, of course -- but
the C program had to do the same thing.) However, if all the processing
is done in Python code (instead of a C extension), there probably would be a
big performance difference. It is just a question of whether the overhead
of starting a separate process for each file is more time consuming than the
difference between the Python and C implementations.
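To make the buffered zlib.crc32() approach concrete, here is a minimal sketch. The function name file_crc32 and the 1 MB default chunk size are my own assumptions for illustration, not measured values; the right buffer size would need tuning on the actual files.

```python
import zlib

def file_crc32(path, chunk_size=1024 * 1024):
    """Return the CRC-32 of a file, reading it in fixed-size chunks.

    chunk_size is an assumed starting point, not a tuned value.
    """
    crc = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break  # end of file
            # Feed the running CRC back in so the result covers the whole file.
            crc = zlib.crc32(chunk, crc)
    # Mask so the result is an unsigned 32-bit value on every Python version.
    return crc & 0xFFFFFFFF
```

Since the checksum itself is computed inside zlib's C code, the Python loop only runs once per chunk, which is presumably why this kind of approach can keep up with a standalone C program.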
The pure Python implementation is probably easier to write, so you can do it
that way and you'll have something that works. *Then* if the performance is
not acceptable, try the other route.
Additionally, depending on how much directory crawling you are doing, you
can just do the whole darned thing in C and save another minute or so.
Anyway, I didn't see the simple answer to your question in this thread (that
doesn't mean it wasn't there). I think you could do something like this:
    import os
    for root, dirs, files in os.walk(path):
        for f in files:
            try:
                os.system("cprog %s" % os.path.join(root, f))
            except OSError:
                pass  # skip files that cannot be processed
I prefer naming like this, though:
    for directory, subdirs, filenames in os.walk(startpath):
        for filename in filenames:
            ...
(particularly, since "root" will not be the root directory, except maybe
once).
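If file names may contain spaces or shell metacharacters, a variant using the subprocess module avoids the shell's quoting problems. This is only a sketch: run_on_tree is a hypothetical helper name, "cprog" is the placeholder program name used above, and I am assuming the program signals failure with a non-zero exit status.

```python
import os
import subprocess

def run_on_tree(startpath, command=("cprog",)):
    """Run `command <file>` for every file under startpath.

    Returns a list of (path, reason) pairs for files that failed.
    "cprog" is a placeholder; pass the real program and its arguments.
    """
    failures = []
    for directory, subdirs, filenames in os.walk(startpath):
        for filename in filenames:
            fullpath = os.path.join(directory, filename)
            try:
                # Passing a list means no shell is involved, so the path
                # needs no quoting.
                status = subprocess.call(list(command) + [fullpath])
            except OSError as exc:
                # Raised, for example, when the program cannot be found.
                failures.append((fullpath, str(exc)))
                continue
            if status != 0:
                # Assumption: a non-zero exit status means failure.
                failures.append((fullpath, status))
    return failures
```

Calling run_on_tree(startpath) would then start one process per file, so the per-process overhead discussed above still applies.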
- Matt