iglob performance no better than glob

Sun Jan 31 11:22:42 EST 2010

I have a directory with a large number of files that I need to perform
operations on, but I only need to access a subset of the files, i.e.
the first 100 files.

Using glob is very slow. I then ran across iglob, which returns an
iterator and seemed like just what I wanted: I could iterate over only
the files that I needed, without having to read the entire directory.
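
What I had in mind was roughly the following. This is only a sketch:
first_matches is a name made up for illustration, and itertools.islice
is just there to stop pulling from the iterator after the first N
items. It shows the intended usage, not that it is fast.

```python
import glob
import itertools


def first_matches(pattern, count):
    """Return at most `count` paths matching `pattern`.

    Hypothetical helper, not part of the glob module.
    """
    # islice stops asking the iglob iterator for more items
    # once `count` paths have been produced.
    return list(itertools.islice(glob.iglob(pattern), count))


if __name__ == '__main__':
    # Print up to the first 100 matches in the current directory.
    for path in first_matches('*.*', 100):
        print(path)
```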

The iglob call itself returned right away, but fetching the first file
from the iterator took about as long as a full glob.glob call.

Here's some code to compare glob vs. iglob performance. It prints the
time before and after a glob.iglob('*.*') / files.next() sequence, and
before and after a glob.glob('*.*') call.

#!/usr/bin/env python

import glob, time

# Time creating the iterator and fetching only the first match.
print '\nTest of glob.iglob'
print 'before       iglob:', time.asctime()
files = glob.iglob('*.*')
print 'after        iglob:', time.asctime()
print files.next()
print 'after files.next():', time.asctime()

# Time building the complete list of matches.
print '\nTest of glob.glob'
print 'before        glob:', time.asctime()
files = glob.glob('*.*')
print 'after         glob:', time.asctime()

Here are the results:

Test of glob.iglob
before       iglob: Sun Jan 31 11:09:08 2010
after        iglob: Sun Jan 31 11:09:08 2010
foo.bar
after files.next(): Sun Jan 31 11:09:59 2010

Test of glob.glob
before        glob: Sun Jan 31 11:09:59 2010
after         glob: Sun Jan 31 11:10:51 2010

The results are about the same for the two approaches: each took about
51 seconds. Am I doing something wrong with iglob?

Is there a way to get the first X files from a directory with lots of
files that does not take a long time to run?

thanx, mark