newb question: file searching

jaysherby at gmail.com jaysherby at gmail.com
Tue Aug 8 21:45:03 EDT 2006


I do appreciate the advice, but I've got a 12 line function that does
all of that.  And it works!  I just wish I understood a particular line
of it.

def getFileList(*extensions):
    import os
    imageList = []
    for dirpath, dirnames, files in os.walk('.'):
        for filename in files:
            name, ext = os.path.splitext(filename)
            if ext.lower() in extensions and not filename.startswith('.'):
                imageList.append(os.path.join(dirpath, filename))
        for dirname in reversed(range(len(dirnames))):
            if dirnames[dirname].startswith('.'):
                del dirnames[dirname]

    return imageList

print(getFileList('.jpg', '.gif', '.png'))

The line I don't understand is:
reversed(range(len(dirnames)))
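
A small experiment that may shed some light on it, using a made-up
dirnames list (not real directories). As far as I can tell, the
expression just yields the list indices from last to first, and the
point of going backwards seems to be that deleting an entry never
disturbs the indices still to be visited. The forward loop is only
there for contrast and is an assumption about what would go wrong.

```python
dirnames = ['.git', '.cache', 'photos', '.thumbs']  # made-up example data

# range(len(dirnames)) counts 0, 1, 2, 3; reversed() walks it as 3, 2, 1, 0.
indices = list(reversed(range(len(dirnames))))

# Deleting from the end first: the indices not yet visited are all
# smaller, so they still point at the same entries after each del.
backward = list(dirnames)
for i in indices:
    if backward[i].startswith('.'):
        del backward[i]

# For contrast: removing while iterating forward shifts later entries
# down by one, so the entry right after a removed one gets skipped.
forward = list(dirnames)
for name in forward:
    if name.startswith('.'):
        forward.remove(name)
```

Here the backward loop leaves only 'photos', while the forward loop
misses '.cache' because it slid into the slot that was just visited.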


Justin  Azoff wrote:
> jaysherby at gmail.com wrote:
> > I've narrowed down the problem.  All the problems start when I try to
> > eliminate the hidden files and directories.  Is there a better way to
> > do this?
> >
>
> Well you almost have it, but your problem is that you are trying to do
> too many things in one function.  (I bet I am starting to sound like a
> broken record :-))  The four distinct things you are doing are:
>
> * getting a list of all files in a tree
> * combining a file's directory with its name to give the full path
> * ignoring hidden directories
> * matching files based on their extension
>
> If you split each of those things into its own function, you will
> end up with smaller, easier-to-test pieces and separate, reusable
> functions.
>
> The core function would be basically what you already have:
>
> import os
>
> def get_files(directory, include_hidden=False):
>     """Yield the full path of every file in a directory tree,
>        optionally descending into hidden directories"""
>     for path, dirs, files in os.walk(directory):
>         for fn in files:
>             full = os.path.join(path, fn)
>             yield full
>
>         if not include_hidden:
>             remove_hidden(dirs)
>
> and remove_hidden is a short but tricky function: os.walk only skips
> a directory if it is removed from the very list it handed out, so the
> directory list needs to be edited in place:
>
> def remove_hidden(dirlist):
>     """For a list containing directory names, remove
>        any that start with a dot"""
>
>     dirlist[:] = [d for d in dirlist if not d.startswith('.')]
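
A sketch of why the slice assignment matters, assuming os.walk's default
top-down mode. remove_hidden is copied from the quoted code;
remove_hidden_rebind and walked_dirs are hypothetical helpers made up
for the comparison, and the directory names are invented.

```python
import os
import shutil
import tempfile

def remove_hidden(dirlist):
    # Slice assignment changes the same list object os.walk is holding.
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def remove_hidden_rebind(dirlist):
    # Hypothetical broken variant: this only rebinds the local name,
    # so the list os.walk holds is left untouched.
    dirlist = [d for d in dirlist if not d.startswith('.')]

def walked_dirs(root, prune):
    """Hypothetical helper: directories os.walk visits, relative to root."""
    seen = []
    for path, dirs, files in os.walk(root):
        seen.append(os.path.relpath(path, root))
        prune(dirs)
    return sorted(seen)

# Throwaway tree: one hidden directory (with a child) and one visible one.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, '.hidden', 'inner'))
os.makedirs(os.path.join(root, 'visible'))

pruned = walked_dirs(root, remove_hidden)
unpruned = walked_dirs(root, remove_hidden_rebind)
shutil.rmtree(root)
```

With the in-place version only '.' and 'visible' are visited; with the
rebinding version os.walk still descends into '.hidden' and its child.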
>
> At this point, you can play with get_files on its own, and test
> whether or not the include_hidden parameter works as expected.
>
> For the final step, I'd use an approach that pulls out the extension
> itself and checks whether it is in a list (or better, a set) of
> allowed extensions.  Globbing (*.foo) works as well, but if you are only
> ever matching on the extension, I believe this will work better.
>
> def get_files_by_ext(directory, ext_list, include_hidden=False):
>     """Return an expanded list of files for a directory tree
>        where the file ends with one of the extensions in ext_list"""
>     ext_list = set(ext_list)
>
>     for fn in get_files(directory, include_hidden):
>         _, ext = os.path.splitext(fn)
>         ext = ext[1:]  # remove the leading dot
>         if ext.lower() in ext_list:
>             yield fn
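
The extension check on its own might look like the sketch below.
has_allowed_ext is a hypothetical helper, not part of the quoted code,
and the file names and extension set are made up.

```python
import os

def has_allowed_ext(filename, allowed):
    """Hypothetical helper: True if filename's extension is in allowed."""
    # splitext splits off only the last extension, dot included.
    _, ext = os.path.splitext(filename)
    # Drop the leading dot and ignore case before the set lookup.
    return ext[1:].lower() in allowed

allowed = set(['jpg', 'png'])  # made-up extension list, stored without dots

results = [
    has_allowed_ext('pics/Cat.JPG', allowed),    # upper-case extension
    has_allowed_ext('notes.txt', allowed),       # extension not in the set
    has_allowed_ext('backup.png.bak', allowed),  # only '.bak' is considered
    has_allowed_ext('README', allowed),          # no extension at all
]
```

Only the first name matches. Note that the set holds extensions without
the dot, unlike the '.jpg' style arguments in my getFileList call.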
>
> Notice that at this point we still haven't said anything about
> images!  The task of finding files by extension is pretty generic, so
> it shouldn't be concerned with the actual extensions.
>
> Once that works, you can simply do
>
> def get_images(directory, include_hidden=False):
>     image_exts = ('jpg','jpeg','gif','png','bmp')
>     return get_files_by_ext(directory, image_exts, include_hidden)
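
A sketch of how the quoted pieces might fit together, run against a
throwaway temp directory. The three functions follow the quoted code
(with remove_hidden inlined to keep it short); the tree layout and file
names are made up. Since the functions are generators, nothing is
walked until the result is iterated, here by sorted().

```python
import os
import shutil
import tempfile

def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            # Prune in place so os.walk skips hidden directories.
            dirs[:] = [d for d in dirs if not d.startswith('.')]

def get_files_by_ext(directory, ext_list, include_hidden=False):
    ext_list = set(ext_list)
    for fn in get_files(directory, include_hidden):
        ext = os.path.splitext(fn)[1][1:]  # extension without the dot
        if ext.lower() in ext_list:
            yield fn

def get_images(directory, include_hidden=False):
    image_exts = ('jpg', 'jpeg', 'gif', 'png', 'bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)

# Made-up tree: a visible image, a text file, and an image inside a
# hidden directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'pics'))
os.makedirs(os.path.join(root, '.thumbs'))
for rel in ('pics/cat.JPG', 'pics/notes.txt', '.thumbs/cat.png'):
    open(os.path.join(root, *rel.split('/')), 'w').close()

visible = sorted(os.path.relpath(p, root) for p in get_images(root))
everything = sorted(os.path.relpath(p, root)
                    for p in get_images(root, include_hidden=True))
shutil.rmtree(root)
```

By default only the image under pics is found; passing
include_hidden=True also picks up the one under .thumbs.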
> 
> Hope this helps :-)
> 
> --
> - Justin



