newb question: file searching
jaysherby at gmail.com
Tue Aug 8 21:45:03 EDT 2006
I do appreciate the advice, but I've got a 12-line function that does
all of that. And it works! I just wish I understood a particular line
of it.
def getFileList(*extensions):
    import os
    imageList = []
    for dirpath, dirnames, files in os.walk('.'):
        for filename in files:
            name, ext = os.path.splitext(filename)
            if ext.lower() in extensions and not filename.startswith('.'):
                imageList.append(os.path.join(dirpath, filename))
        for dirname in reversed(range(len(dirnames))):
            if dirnames[dirname].startswith('.'):
                del dirnames[dirname]
    return imageList

print(getFileList('.jpg', '.gif', '.png'))
The line I don't understand is:
reversed(range(len(dirnames)))
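A short demonstration of what that expression does, using a made-up list of directory names. It produces the list's indices from last to first, and walking backwards is what keeps the `del` calls from disturbing items that haven't been checked yet:

```python
# Hypothetical directory names, as os.walk might report them.
dirnames = ['src', '.git', '.svn', 'docs']

# reversed(range(len(dirnames))) yields the valid indices from last to first.
indices = list(reversed(range(len(dirnames))))
print(indices)  # [3, 2, 1, 0]

# Deleting while walking backwards never moves an item we haven't checked,
# because del only shifts the items *after* the deleted index.
backward = list(dirnames)
for i in reversed(range(len(backward))):
    if backward[i].startswith('.'):
        del backward[i]
print(backward)  # ['src', 'docs']

# Walking forwards goes wrong: deleting '.git' at index 1 slides '.svn'
# into index 1, the loop moves on to index 2 and never checks '.svn',
# and the last index no longer exists because the list has shrunk.
forward = list(dirnames)
try:
    for i in range(len(forward)):
        if forward[i].startswith('.'):
            del forward[i]
except IndexError:
    pass
print(forward)  # ['src', '.svn', 'docs']
```

The `del` is also what makes the pruning work at all: it edits the very list object os.walk handed over, and os.walk only descends into the names still left in that list.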
Justin Azoff wrote:
> jaysherby at gmail.com wrote:
> > I've narrowed down the problem. All the problems start when I try to
> > eliminate the hidden files and directories. Is there a better way to
> > do this?
> >
>
> Well you almost have it, but your problem is that you are trying to do
> too many things in one function. (I bet I am starting to sound like a
> broken record :-)) The four distinct things you are doing are:
>
> * getting a list of all files in a tree
> * combining a file's directory with its name to give the full path
> * ignoring hidden directories
> * matching files based on their extension
>
> If you split up each of those things into their own function you will
> end up with smaller easier to test pieces, and separate, reusable
> functions.
>
> The core function would be basically what you already have:
>
> import os
>
> def get_files(directory, include_hidden=False):
>     """Yield the full path of every file in a directory tree,
>     optionally not ignoring hidden directories"""
>     for path, dirs, files in os.walk(directory):
>         for fn in files:
>             full = os.path.join(path, fn)
>             yield full
>
>         if not include_hidden:
>             # pruning dirs here stops os.walk from descending into them
>             remove_hidden(dirs)
>
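> and remove_hidden is a short but tricky function, since the directory
> list needs to be edited in place (os.walk reads back the same list
> object it handed you to decide which subdirectories to descend into):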
>
> def remove_hidden(dirlist):
>     """For a list containing directory names, remove
>     any that start with a dot"""
>
>     dirlist[:] = [d for d in dirlist if not d.startswith('.')]
>
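To see why the slice assignment matters, here is a small sketch comparing it with a hypothetical broken variant, `remove_hidden_rebind`, that assigns to the name instead of to its contents:

```python
def remove_hidden(dirlist):
    # Slice assignment replaces the *contents* of the caller's list.
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def remove_hidden_rebind(dirlist):
    # Hypothetical broken variant: this only rebinds the local name,
    # so the caller's list (and os.walk's view of it) is unchanged.
    dirlist = [d for d in dirlist if not d.startswith('.')]

dirs = ['src', '.git', 'docs']
original = dirs
remove_hidden(dirs)
print(dirs)              # ['src', 'docs']
print(dirs is original)  # True: still the same list object

other = ['src', '.git', 'docs']
remove_hidden_rebind(other)
print(other)             # ['src', '.git', 'docs']
```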
> At this point you can play with get_files on its own and test
> whether or not the include_hidden parameter works as expected.
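One way to do that check, sketched under the assumption of Python 3 (for `tempfile.TemporaryDirectory` and `os.makedirs(..., exist_ok=True)`): build a tiny throwaway tree with one visible and one hidden directory, then compare the output with and without `include_hidden`. The file layout and the `make_tree` helper are made up for the test.

```python
import os
import tempfile

def remove_hidden(dirlist):
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            remove_hidden(dirs)

def make_tree(root):
    """Hypothetical helper: create a.txt, sub/b.txt and .hidden/c.txt."""
    for rel in ('a.txt', os.path.join('sub', 'b.txt'),
                os.path.join('.hidden', 'c.txt')):
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, 'w') as f:
            f.write('x')

with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    # Relative paths keep the result independent of the temp dir's name.
    visible = sorted(os.path.relpath(p, root) for p in get_files(root))
    everything = sorted(os.path.relpath(p, root)
                        for p in get_files(root, include_hidden=True))

print(visible)
print(everything)
```

Only directories are pruned: a hidden file such as a top-level .DS_Store would still be yielded, unlike the original getFileList, which also skips files whose names start with a dot.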
>
> For the final step, I'd use an approach that pulls out the extension
> itself and checks to see if it is in a list (or better, a set) of
> allowed extensions. Globbing (*.foo) works as well, but if you are only
> ever matching on the extension, I believe this will work better.
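A quick sketch of both approaches on a handful of made-up file names: `os.path.splitext` plus a set lookup, versus one `fnmatch` pattern per extension. `fnmatch.fnmatchcase` is applied to the lowered name so the glob version ignores case on every platform.

```python
import fnmatch
import os.path

names = ['photo.JPG', 'logo.png', 'notes.txt', 'archive.tar.gz', '.bashrc', 'README']

# splitext splits off only the *last* extension, dot included; a single
# leading dot (a hidden file) is not treated as an extension.
for name in names:
    print(name, os.path.splitext(name))
# e.g. 'archive.tar.gz' -> ('archive.tar', '.gz'), '.bashrc' -> ('.bashrc', '')

allowed = {'jpg', 'jpeg', 'gif', 'png', 'bmp'}

def ext_matches(name):
    ext = os.path.splitext(name)[1][1:].lower()  # drop the dot, normalize case
    return ext in allowed                        # one hash lookup

by_ext = [n for n in names if ext_matches(n)]

# Glob-based equivalent: try one pattern per allowed extension.
patterns = ['*.' + e for e in allowed]
by_glob = [n for n in names
           if any(fnmatch.fnmatchcase(n.lower(), p) for p in patterns)]

print(by_ext)   # ['photo.JPG', 'logo.png']
print(by_glob)  # ['photo.JPG', 'logo.png']
```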
>
> def get_files_by_ext(directory, ext_list, include_hidden=False):
>     """Yield the files in a directory tree whose extension
>     (without the dot) is one of those in ext_list"""
>     ext_list = set(ext_list)
>
>     for fn in get_files(directory, include_hidden):
>         _, ext = os.path.splitext(fn)
>         ext = ext[1:]  # remove dot
>         if ext.lower() in ext_list:
>             yield fn
>
> Notice that at this point we still haven't said anything about images! The
> task of finding files by extension is pretty generic, so it shouldn't
> be concerned with the actual extensions.
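Taking that idea one step further, the extension test doesn't even need the file system. Below is a sketch of a hypothetical `filter_by_ext` helper that works on any iterable of names. As a design choice of this sketch, not of the quoted code, it also normalizes `ext_list`, so entries like `'.py'` or `'PNG'` still match; in the quoted `get_files_by_ext` those would silently match nothing, because only the file's extension is lowered and the dot is stripped only from the file side.

```python
import os.path

def filter_by_ext(filenames, ext_list):
    """Hypothetical helper: yield the names in `filenames` whose extension
    is in ext_list. Entries in ext_list may be given with or without a
    leading dot and in any case."""
    wanted = set(e.lower().lstrip('.') for e in ext_list)
    for fn in filenames:
        ext = os.path.splitext(fn)[1][1:].lower()  # extension, no dot, lowercase
        if ext in wanted:
            yield fn

paths = ['docs/index.HTML', 'src/app.py', 'img/logo.png', 'Makefile']
print(list(filter_by_ext(paths, ['.py', 'html'])))  # ['docs/index.HTML', 'src/app.py']
print(list(filter_by_ext(paths, ('PNG',))))         # ['img/logo.png']
```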
>
> Once that works, you can simply do:
>
> def get_images(directory, include_hidden=False):
>     image_exts = ('jpg', 'jpeg', 'gif', 'png', 'bmp')
>     return get_files_by_ext(directory, image_exts, include_hidden)
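To tie the pieces together, here is a sketch that builds a small throwaway tree and runs get_images over it. It assumes Python 3 (`tempfile.TemporaryDirectory`, `os.makedirs(..., exist_ok=True)`); the file names are made up, and the functions are the quoted ones with docstrings trimmed.

```python
import os
import tempfile

def remove_hidden(dirlist):
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            remove_hidden(dirs)

def get_files_by_ext(directory, ext_list, include_hidden=False):
    ext_list = set(ext_list)
    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        ext = ext[1:]  # remove dot
        if ext.lower() in ext_list:
            yield fn

def get_images(directory, include_hidden=False):
    image_exts = ('jpg', 'jpeg', 'gif', 'png', 'bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)

# Hypothetical sample tree: two images, a text file, an image in a hidden dir.
sample = ['cat.JPG', os.path.join('icons', 'home.png'), 'notes.txt',
          os.path.join('.cache', 'thumb.gif')]

with tempfile.TemporaryDirectory() as root:
    for rel in sample:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()  # create an empty file
    found = sorted(os.path.relpath(p, root) for p in get_images(root))
    found_all = sorted(os.path.relpath(p, root)
                       for p in get_images(root, include_hidden=True))

print(found)      # ['cat.JPG', 'icons/home.png'] on POSIX
print(found_all)  # ['.cache/thumb.gif', 'cat.JPG', 'icons/home.png'] on POSIX
```

cat.JPG is found even though the extension tuple is lowercase, because get_files_by_ext lowers the file's extension before the lookup.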
>
> Hope this helps :-)
>
> --
> - Justin