Speeding up glob.glob?

Rob Hooft rob at hooft.net
Wed Mar 14 10:18:58 EST 2001


I have been playing around a bit with file name globbing, because 
I was getting performance problems in one of my programs that repeatedly
counts the number of files matching a certain pattern. In fact, I was
counting all files like "*.kcd", "*.kcd.Z", "*.kcd.bz2" and "*.kcd.gz".

To do that, I used to have a glob function in my "compress" module that
looked like this:

table={'.Z':'compress','.gz':'gzip','.bz2':'bzip2'}

def glob(pattern):
    """
    Same as glob.glob, but includes compressed files too.
    """
    import glob
    r=glob.glob(pattern)
    for ext in table.keys():
        r.extend(glob.glob(pattern+ext))
    return r

In a largish directory, this would take 0.3-0.5 seconds.
After some experimenting, I ended up with:

def glob(pattern):
    """
    Same as glob.glob, but includes compressed files too.

    Contains optimization in case pattern does not contain a '/'
    """
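    import fnmatch,os,re
    # Assumes fnmatch.translate() ends its result with "$": strip it, then
    # append an optional group of escaped suffixes, e.g. "(\.Z|\.gz|\.bz2)?$"
    exp=fnmatch.translate(pattern)[:-1]+"("
    for ext in table.keys():
        exp=exp+"\\"+ext+"|"
    exp=exp[:-1]+")?$"
    pat=re.compile(exp)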
    r=[]
    if os.sep in pattern:
        import glob
        fl=glob.glob(pattern+'*')
    else:
        fl=os.listdir('.')
    for filename in fl:
        if pat.match(filename):
            r.append(filename)
    return r
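
For readers on a current Python 3, here is a minimal sketch of the same idea: one directory listing filtered by one compiled pattern. It is not the function as posted. The names build_matcher and glob_compressed are hypothetical, the sketch only handles patterns without a directory part, and it joins complete translated patterns with "|" instead of slicing off the last character, because the exact text returned by fnmatch.translate() differs between Python versions.

```python
import fnmatch
import os
import re

# Suffix table copied from the message above.
table = {'.Z': 'compress', '.gz': 'gzip', '.bz2': 'bzip2'}

def build_matcher(pattern, suffixes=tuple(table)):
    """Compile one regex for `pattern` with or without a known suffix.

    Hypothetical helper: every variant is translated as a whole, so the
    result does not depend on how translate() terminates its output.
    """
    variants = [pattern] + [pattern + ext for ext in suffixes]
    return re.compile("|".join(fnmatch.translate(v) for v in variants))

def glob_compressed(pattern, names=None):
    """Return the names matching `pattern`, compressed or not.

    Like the function above, this does not hide dotfiles and is
    case-sensitive on every platform.
    """
    if names is None:
        names = os.listdir(os.curdir)  # a single directory scan
    matcher = build_matcher(pattern)
    return [name for name in names if matcher.match(name)]
```

Calling glob_compressed('*.kcd') scans the current directory once. No timings were taken for this sketch; the figures below refer to the function as posted.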

To my surprise, this version runs in 0.01-0.02 seconds in similar
largish directories.  I think a large fraction of this gain can also
be obtained in glob.glob itself. Currently the code reads:

-----------
        dirname, basename = os.path.split(pathname)
        if has_magic(dirname):
                list = glob(dirname)
        else:
                list = [dirname]
-----------

I think this could be something like:

-----------
        dirname, basename = os.path.split(pathname)
        if not dirname:
                return glob1(os.curdir,basename)
        elif has_magic(dirname):
                list = glob(dirname)
        else:
                list = [dirname]
-----------

This avoids a lot of os.path.join calls with an empty directory path,
one for every matching file name.
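
The effect of the proposed shortcut can be sketched outside the standard library, as a rough illustration rather than a patch to glob.py. The wrapper name fast_glob and the _MAGIC pattern are hypothetical, and glob1 here is a simplified stand-in for the helper of the same name in glob.py.

```python
import fnmatch
import glob
import os
import re

# Wildcard characters that make a path component a pattern (assumed set).
_MAGIC = re.compile(r'[*?\[]')

def glob1(dirname, pattern):
    """List the entries of one directory that match a wildcard pattern."""
    try:
        names = os.listdir(dirname or os.curdir)
    except OSError:
        return []
    # Like the standard glob, skip dotfiles unless the pattern asks for them.
    if not pattern.startswith('.'):
        names = [n for n in names if not n.startswith('.')]
    return fnmatch.filter(names, pattern)

def fast_glob(pathname):
    """Hypothetical wrapper: shortcut for patterns without a directory part."""
    dirname, basename = os.path.split(pathname)
    if not dirname and _MAGIC.search(basename):
        # Names come straight from one listing of the current directory,
        # so no os.path.join('', name) is needed for each result.
        return glob1(os.curdir, basename)
    # Everything else goes through the regular implementation.
    return glob.glob(pathname)
```

For a pattern such as "*.kcd" this should return the same names as glob.glob, only without the per-name join.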

Regards,

Rob Hooft
-- 
=====   rob at hooft.net          http://www.hooft.net/people/rob/  =====
=====   R&D, Nonius BV, Delft  http://www.nonius.nl/             =====
===== PGPid 0xFA19277D ========================== Use Linux! =========


