Speeding up glob.glob?
Rob Hooft
rob at hooft.net
Wed Mar 14 10:18:58 EST 2001
I have been playing around a bit with file name globbing, because
I was running into performance problems in one of my programs that repeatedly
counts the number of files matching a certain pattern. Specifically, I was
counting all files matching "*.kcd", "*.kcd.Z", "*.kcd.bz2" and "*.kcd.gz".
To do that, I used to have a glob function in my "compress" module that
looked like this:
table={'.Z':'compress','.gz':'gzip','.bz2':'bzip2'}

def glob(pattern):
    """
    Same as glob.glob, but includes compressed files too.
    """
    import glob
    r=glob.glob(pattern)
    for ext in table.keys():
        r.extend(glob.glob(pattern+ext))
    return r
In a largish directory, this would take 0.3-0.5 seconds.
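The message does not say how the 0.3-0.5 second figure was measured. A rough harness like the one below is one way to get comparable numbers on your own machine. It is a sketch under stated assumptions: `glob_with_compressed`, `make_scratch_dir` and `time_pattern` are hypothetical names, the directory is filled with made-up empty files, and absolute timings will vary with the file system and directory size. Any other variant of the function can be passed to `time_pattern` for comparison.

```python
import glob
import os
import shutil
import tempfile
import timeit

# Assumed suffix table, mirroring the one in the "compress" module.
table = {'.Z': 'compress', '.gz': 'gzip', '.bz2': 'bzip2'}


def glob_with_compressed(pattern):
    """First version: one glob.glob() call, hence one directory scan, per suffix."""
    r = glob.glob(pattern)
    for ext in table:
        r.extend(glob.glob(pattern + ext))
    return r


def make_scratch_dir(n_files):
    """Create a temporary directory holding n_files empty dummy files."""
    d = tempfile.mkdtemp()
    suffixes = ['.kcd', '.kcd.Z', '.kcd.gz', '.kcd.bz2', '.log']
    for i in range(n_files):
        name = 'frame%05d%s' % (i, suffixes[i % len(suffixes)])
        open(os.path.join(d, name), 'w').close()
    return d


def time_pattern(func, pattern, directory, repeat=10):
    """Average wall-clock seconds per call of func(pattern), run inside directory."""
    old = os.getcwd()
    os.chdir(directory)
    try:
        return timeit.timeit(lambda: func(pattern), number=repeat) / repeat
    finally:
        os.chdir(old)


if __name__ == '__main__':
    scratch = make_scratch_dir(2000)
    try:
        seconds = time_pattern(glob_with_compressed, '*.kcd', scratch)
        print('%.4f s per call' % seconds)
    finally:
        shutil.rmtree(scratch)  # remove the dummy files again
```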
After some play, I ended up with:
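def glob(pattern):
    """
    Same as glob.glob, but includes compressed files too.
    Contains optimization in case pattern does not contain a '/'
    """
    import fnmatch,os,re
    # One regex: the pattern, optionally followed by a compression suffix.
    # One translate() alternative per suffix; slicing the trailing "$" off
    # translate()'s output only works with old Python versions.
    alternatives=[fnmatch.translate(pattern)]
    for ext in table.keys():
        alternatives.append(fnmatch.translate(pattern+ext))
    pat=re.compile('|'.join(alternatives))
    r=[]
    if os.sep in pattern:
        # Pattern has a directory part: a single glob.glob() scan.
        import glob
        fl=glob.glob(pattern+'*')
    else:
        # Plain file-name pattern: list the current directory only once.
        fl=os.listdir('.')
    for filename in fl:
        if pat.match(filename):
            r.append(filename)
    return r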
To my surprise, this version runs in 0.01-0.02 seconds in similar
largish directories. I think a large fraction of this gain could also
be obtained in glob.glob itself. Currently, the relevant part of glob.glob reads:
-----------
dirname, basename = os.path.split(pathname)
if has_magic(dirname):
    list = glob(dirname)
else:
    list = [dirname]
-----------
I think this could be something like:
-----------
dirname, basename = os.path.split(pathname)
if not dirname:
    return glob1(os.curdir,basename)
elif has_magic(dirname):
    list = glob(dirname)
else:
    list = [dirname]
-----------
This avoids one os.path.join call per matching name when the directory part is empty.
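To see the shortcut in context, here is a minimal, self-contained sketch of a simplified glob. `simple_glob`, `glob1` and `has_magic` are hypothetical re-implementations written for illustration, loosely following the structure quoted above; they are not the standard library's code and skip several details of the real module. Like glob.glob, names starting with '.' are only matched when the pattern itself starts with '.'.

```python
import fnmatch
import os
import re

# Characters that make a path component a pattern rather than a literal name.
_magic_check = re.compile(r'[*?[]')


def has_magic(s):
    return _magic_check.search(s) is not None


def glob1(dirname, pattern):
    """Bare names in dirname that match pattern (dot-files only if asked for)."""
    try:
        names = os.listdir(dirname)
    except OSError:  # missing directory, or not a directory at all
        return []
    if not pattern.startswith('.'):
        names = [n for n in names if not n.startswith('.')]
    return fnmatch.filter(names, pattern)


def simple_glob(pathname):
    """Hypothetical, simplified glob showing the empty-dirname shortcut."""
    if not has_magic(pathname):
        return [pathname] if os.path.lexists(pathname) else []
    dirname, basename = os.path.split(pathname)
    if not dirname:
        # Proposed shortcut: names in the current directory are returned
        # as-is, with no os.path.join('', name) per match.
        return glob1(os.curdir, basename)
    if has_magic(dirname):
        dirs = simple_glob(dirname)
    else:
        dirs = [dirname]
    result = []
    for d in dirs:
        if has_magic(basename):
            result.extend(os.path.join(d, name) for name in glob1(d, basename))
        elif os.path.lexists(os.path.join(d, basename)):
            result.append(os.path.join(d, basename))
    return result
```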
Regards,
Rob Hooft
--
===== rob at hooft.net http://www.hooft.net/people/rob/ =====
===== R&D, Nonius BV, Delft http://www.nonius.nl/ =====
===== PGPid 0xFA19277D ========================== Use Linux! =========
More information about the Python-list mailing list