parsing directory for certain filetypes

Tim Chase python.list at tim.thechases.com
Mon Mar 10 11:03:59 EDT 2008


> i wrote a function to parse a given directory and make a sorted list
> of  files with .txt,.doc extensions .it works,but i want to know if it
> is too bloated..can this be rewritten in more efficient manner?
> 
> here it is...
> 
> from string import split
> from os.path import isdir,join,normpath
> from os import listdir
> 
> def parsefolder(dirname):
>     filenms=[]
>     folder=dirname
>     isadr=isdir(folder)
>     if (isadr):
>         dirlist=listdir(folder)
>         filenm=""
>         for x in dirlist:
>              filenm=x
> 	     if(filenm.endswith(("txt","doc"))):
>                  nmparts=[]
> 		 nmparts=split(filenm,'.' )
>                  if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
>                       filenms.append(filenm)
>         filenms.sort()
>         filenameslist=[]
>         filenameslist=[normpath(join(folder,y)) for y in filenms]
> 	numifiles=len(filenameslist)
> 	print filenameslist
> 	return filenameslist
> 
> 
> folder='F:/mysys/code/tstfolder'
> parsefolder(folder)

It seems to me that this is awfully baroque, with many superfluous 
variables.  Is this not the same functionality (minus prints, 
unused result-counting, NOPs, and belt-and-suspenders 
extension-checking, and with the match made case-insensitive) as

   def parsefolder(dirname):
     if not isdir(dirname): return
     return sorted([
       normpath(join(dirname, fname))
       for fname in listdir(dirname)
       if fname.lower().endswith('.txt')
       or fname.lower().endswith('.doc')
       ])
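
For anyone wanting to try the condensed version in isolation, here is a self-contained sketch with the imports it relies on. The demo() helper, the temporary directory, and the file names are made up for illustration and are not part of the original post.

```python
import os
import shutil
import tempfile
from os import listdir
from os.path import isdir, join, normpath

def parsefolder(dirname):
    # A bare return yields None when dirname is not a directory.
    if not isdir(dirname):
        return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
        or fname.lower().endswith('.doc')
    ])

def demo():
    # Hypothetical helper: build a throwaway folder and list what matches.
    folder = tempfile.mkdtemp()
    try:
        for name in ('b.TXT', 'a.doc', 'c.xls', 'txt'):
            open(join(folder, name), 'w').close()
        return [os.path.basename(p) for p in parsefolder(folder)]
    finally:
        shutil.rmtree(folder)
```

demo() should return just the a.doc and b.TXT entries, in that order.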

In Python 2.5 (or 2.4 if you implement the any() function, ripped 
from the docs[1]), this could be rewritten to be a little more 
flexible...something like this (untested):

   def parsefolder(dirname, types=('.doc', '.txt')):
     if not isdir(dirname): return
     return sorted([
       normpath(join(dirname, fname))
       for fname in listdir(dirname)
       if any(
         fname.lower().endswith(s)
         for s in types)
       ])

which would allow you to do both

   parsefolder('/path/to/wherever/')

and

   parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])
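
For the 2.4 case, the stand-in for any() is only a few lines. This sketch follows the equivalent loop given in the docs; it is named any_compat here (a made-up name) so as not to shadow the builtin on newer versions, and matches() is likewise just an illustrative helper.

```python
def any_compat(iterable):
    # Return True as soon as one element is true, False if none are.
    for element in iterable:
        if element:
            return True
    return False

def matches(fname, types=('.doc', '.txt')):
    # Hypothetical helper: the same test the list comprehension applies per file.
    return any_compat(fname.lower().endswith(s) for s in types)
```

Swapping any_compat in for any inside the comprehension should leave the behaviour unchanged, since both stop at the first matching extension.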

Neither version defines what should happen when isdir(dirname) 
fails; as written, the function simply returns None.  Caveat 
Implementor.
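
One way to pin that case down, assuming the caller would rather see an exception than a silent None, is sketched below. The name parsefolder_strict and the choice of ValueError are assumptions for illustration; returning an empty list would be an equally defensible choice.

```python
from os import listdir
from os.path import isdir, join, normpath

def parsefolder_strict(dirname, types=('.doc', '.txt')):
    # Fail loudly instead of returning None for a missing or non-directory path.
    if not isdir(dirname):
        raise ValueError('not a directory: %r' % (dirname,))
    return sorted(
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(fname.lower().endswith(s) for s in types)
    )
```

Callers can then tell "no matching files" (an empty list) apart from "bad path" (an exception), which the None-returning versions leave for the caller to check.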

-tkc


[1] http://docs.python.org/lib/built-in-funcs.html






