Calling a C program from a Python Script

Matt Gerrans matt.gerrans at hp.com
Thu Dec 9 17:06:13 EST 2004


"Brad Tilley" <bradtilley at gmail.com> wrote:
>>>I'm dealing with a terabyte of files. Perhaps I should have mentioned 
>>>that.

I wouldn't automatically assume that recursing the directories with a Python 
script that calls a C program for each file is faster than doing the 
processing in Python. For example, I found that using zlib.crc32() 
directly in Python was no slower than calling a C program that calculates 
CRCs of files. (For huge files, it was important to find the right buffer 
size and not try to read the whole thing at once, of course, but the C 
program had to do the same thing.) However, if all the processing is done 
in pure Python code (instead of a C extension such as zlib), there probably 
would be a big performance difference. It is just a question of whether the 
overhead of starting a separate process for each file costs more time than 
the difference between the Python and C implementations.
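
To make the zlib.crc32() point concrete, here is a minimal sketch of a chunked CRC calculation. The function name file_crc32 and the 1 MB default buffer are my own assumptions for illustration, not anything measured; the right buffer size is something you would have to find by experiment on your own files.

```python
import zlib

def file_crc32(path, bufsize=1024 * 1024):
    """Return the CRC-32 of a file, reading it in fixed-size chunks.

    bufsize is an assumed starting point (1 MB), not a tuned value.
    """
    crc = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(bufsize)
            if not chunk:
                break
            # Feed the running CRC back in so the result matches a
            # single pass over the whole file.
            crc = zlib.crc32(chunk, crc)
    # Mask so the value is an unsigned 32-bit number on any Python version.
    return crc & 0xFFFFFFFF
```

Since the file is never held in memory all at once, this should behave the same for a small file and a very large one; only the number of read() calls changes.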

The pure Python implementation is probably easier to write, so you can do it 
that way and you'll have something that works.  *Then* if the performance is 
not acceptable, try the other route.

Additionally, depending on how much directory crawling you are doing, you 
can just do the whole darned thing in C and save another minute or so.

Anyway, I didn't see the simple answer to your question in this thread (that 
doesn't mean it wasn't there).   I think you could do something like this:

import os

for root, dirs, files in os.walk(path):
     for f in files:
         # quote the path so file names containing spaces survive the shell
         os.system('cprog "%s"' % os.path.join(root, f))

I prefer naming like this, though:

for directory, subdirs, filenames in os.walk(startpath):
   for filename in filenames:
      ...

(particularly, since "root" will not be the root directory, except maybe 
once).
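
If quoting file names for the shell becomes a nuisance, one possible variation is to skip the shell altogether with the subprocess module. This is only a sketch: run_on_tree is a hypothetical helper name of mine, and "cprog" is still just a stand-in for whatever the real C program is called.

```python
import os
import subprocess

def run_on_tree(startpath, command):
    """Run command (a list of arguments) once per file under startpath.

    The file's full path is appended as the last argument.
    Returns a list of (path, exit_status) pairs.
    """
    results = []
    for directory, subdirs, filenames in os.walk(startpath):
        for filename in filenames:
            path = os.path.join(directory, filename)
            # Passing a list bypasses the shell, so spaces or quotes
            # in file names need no escaping.
            status = subprocess.call(list(command) + [path])
            results.append((path, status))
    return results
```

You would call it as run_on_tree(startpath, ["cprog"]) and then look through the returned exit statuses for any that are not zero. It still starts one process per file, so the overhead question above is unchanged.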

- Matt





