Python vs. Ruby (and os.path.walk)

Peter Hansen peter at engcorp.com
Fri Aug 9 22:04:45 EDT 2002


Steven Atkinson wrote:

> Then I took out all screen IO and really sped the code up. I put 
> the IO back in and now the code is still acceptable in terms of 
> speed (though slightly slower than Ruby).
> 
> The original code was:
> -----------------------------
[snip]

> I changed it to:
> ------------------------
[snip]
> 
> The second is faster. They are both decent. Thanks again.
> 
> Signed perplexed and embarrassed.

A few items:

1. The timing depends heavily on the mix of directories,
files, and matching files.  The os.path.walk routine
runs isdir() on every name it finds, so a tree with a
lot of files can take a long time to walk even if there
are few directories to search.
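
To make the isdir() cost concrete, here is a rough sketch that
counts the calls.  It is not the real os.path.walk source, just a
simplified recursive walk with the same "test every name" shape,
written for Python 3 (where os.path.walk no longer exists).  The
helper names and the sample tree layout are made up for
illustration.

```python
import os
import tempfile


def walk_counting_isdir(top, counter):
    """Simplified recursive walk that calls isdir() on every entry."""
    for name in os.listdir(top):
        path = os.path.join(top, name)
        counter["isdir_calls"] += 1      # one check per name, file or not
        if os.path.isdir(path):
            walk_counting_isdir(path, counter)


def build_sample_tree(root, n_files):
    """Hypothetical layout: one empty subdirectory plus n_files empty files."""
    os.mkdir(os.path.join(root, "sub"))
    for i in range(n_files):
        open(os.path.join(root, "file%d.txt" % i), "w").close()


def count_isdir_calls(n_files):
    """Build a throwaway tree, walk it, and return the number of isdir() calls."""
    with tempfile.TemporaryDirectory() as root:
        build_sample_tree(root, n_files)
        counter = {"isdir_calls": 0}
        walk_counting_isdir(root, counter)
        return counter["isdir_calls"]
```

With a single subdirectory, count_isdir_calls(n) grows in step
with the number of files, which is the effect described above:
the files, not the directories, drive the cost.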

2. The second one is actually significantly faster than
the first one on my particular system.  (I changed the
search pattern of course, and the directory, and I'm
running under either Linux or Win98.)  In both cases
my search actually finds only a few files out of the
many thousands that are there.  The difference in speed
is roughly 2x (most of lister() is skipped most of the
time in my case, so the difference comes almost entirely
from the walk() routine).  YMWV...your mileage _will_
vary :-)

3. Matt's comment about using the profiler is of course
the only right way to go about optimizing.  It's simpler
than you might think, if you haven't used it already:

  import profile
  profile.run('import walk2')

Normally you should try running the code a second time
to allow you to measure and factor in (or out) any extra
effects like hard drive caching.  In that case:

  import walk2
  import profile
  profile.run('reload(walk2)')

..is better.  It's also easier to just repeat the last
line over and over while you tweak the code in walk2.py
in a text editor, for example.
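
If you would rather profile a single function than a whole
module import, something along these lines might work.  It is
only a sketch for Python 3: count_matches() is a made-up
stand-in for the real work in walk2.py, and profile_call() is a
hypothetical helper, not part of the profile module.

```python
import io
import profile
import pstats


def count_matches(names, suffix=".py"):
    """Hypothetical stand-in for the work done in walk2.py."""
    return sum(1 for name in names if name.endswith(suffix))


def profile_call(func, *args):
    """Profile one call of func and return (result, report text)."""
    profiler = profile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats()
    return result, buffer.getvalue()


names = ["walk1.py", "walk2.py", "notes.txt"]

# Run twice and read the second report, so that one-time effects
# from the first run are less likely to distort the numbers.
first_result, first_report = profile_call(count_matches, names)
second_result, second_report = profile_call(count_matches, names)
```

Printing second_report shows the usual profiler table, sorted by
cumulative time.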

4. You probably want to insert a backslash in front of
the dots "." in the file extensions in the regular 
expression.  Otherwise you're matching on any character,
not on a period...
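
A quick way to see the difference (the patterns and file names
here are hypothetical, since the original expression was snipped
above):

```python
import re

# Hypothetical patterns; the original expression was snipped from the post.
unescaped = re.compile(r".py$")   # "." matches any single character
escaped = re.compile(r"\.py$")    # "\." matches only a literal period

names = ["walk2.py", "happy", "notes.txt"]

loose_matches = [n for n in names if unescaped.search(n)]
strict_matches = [n for n in names if escaped.search(n)]
```

Here loose_matches picks up "happy" as well as "walk2.py",
because the unescaped dot is satisfied by the first "p", while
strict_matches contains only "walk2.py".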

-Peter


