newb question: file searching

Justin Azoff justin.azoff at gmail.com
Tue Aug 8 21:34:02 EDT 2006


jaysherby at gmail.com wrote:
> I've narrowed down the problem.  All the problems start when I try to
> eliminate the hidden files and directories.  Is there a better way to
> do this?
>

Well you almost have it, but your problem is that you are trying to do
too many things in one function.  (I bet I am starting to sound like a
broken record :-))  The four distinct things you are doing are:

* getting a list of all files in a tree
* combining a file's directory with its name to give the full path
* ignoring hidden directories
* matching files based on their extension

If you split each of those things into its own function, you will
end up with smaller, easier-to-test pieces that you can also reuse
elsewhere.

The core function would be basically what you already have:

import os

def get_files(directory, include_hidden=False):
    """Yield the full path of every file in a directory tree,
       skipping hidden directories unless include_hidden is true"""
    for path, dirs, files in os.walk(directory):
        for fn in files:
            full = os.path.join(path, fn)
            yield full

        # prune dirs in place so os.walk does not descend into them
        if not include_hidden:
            remove_hidden(dirs)

remove_hidden is a short but tricky function, since the directory
list needs to be edited in place: os.walk only skips a directory if it
is removed from the very list that os.walk handed out.

def remove_hidden(dirlist):
    """For a list containing directory names, remove
       any that start with a dot"""

    dirlist[:] = [d for d in dirlist if not d.startswith('.')]
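
To see why the slice assignment matters, here is a small sketch that
compares it with a plain reassignment.  remove_hidden_rebind is a
hypothetical broken variant written only for this comparison, and the
directory names are made up:

```python
def remove_hidden(dirlist):
    """Remove dot-directories from dirlist in place."""
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def remove_hidden_rebind(dirlist):
    """Hypothetical broken variant: only rebinds the local name."""
    dirlist = [d for d in dirlist if not d.startswith('.')]

dirs = ['.git', 'src', '.cache', 'docs']

remove_hidden_rebind(dirs)
after_rebind = list(dirs)    # the caller's list is unchanged

remove_hidden(dirs)
after_inplace = list(dirs)   # the caller's list has been pruned

print(after_rebind)
print(after_inplace)
```

os.walk keeps a reference to the list it yielded, so only the in-place
version changes which directories it goes on to visit.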

At this point, you can play with get_files on its own, and test
whether or not the include_hidden parameter works as expected.
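
One way to try that out, as a rough sketch: build a throwaway tree in a
temporary directory and compare the two settings.  This assumes
Python 3 (for tempfile.TemporaryDirectory); the file and directory
names are invented for the test, and the two functions are repeated so
the snippet runs on its own.

```python
import os
import tempfile

def remove_hidden(dirlist):
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            remove_hidden(dirs)

def build_tree(root):
    """Create one visible and one hidden directory, each with a file."""
    os.makedirs(os.path.join(root, 'visible'))
    os.makedirs(os.path.join(root, '.hidden'))
    for rel in ('a.txt',
                os.path.join('visible', 'b.txt'),
                os.path.join('.hidden', 'c.txt')):
        open(os.path.join(root, rel), 'w').close()

with tempfile.TemporaryDirectory() as root:
    build_tree(root)
    # consume the generators while the temporary tree still exists
    without_hidden = sorted(os.path.relpath(p, root)
                            for p in get_files(root))
    with_hidden = sorted(os.path.relpath(p, root)
                         for p in get_files(root, include_hidden=True))

print(without_hidden)
print(with_hidden)
```

The file under .hidden should only show up in the second list.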

For the final step, I'd use an approach that pulls out the extension
itself, and checks to see if it is in a list (or better, a set) of
allowed extensions.  Globbing (*.foo) works as well, but if you are only
ever matching on the extension, I believe this will work better, since
one set lookup covers every extension.

def get_files_by_ext(directory, ext_list, include_hidden=False):
    """Yield the files in a directory tree whose extension
       (compared case-insensitively) is in ext_list"""
    # normalize once so the lower-cased comparison below is consistent
    ext_list = set(e.lower() for e in ext_list)

    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        ext = ext[1:]  # remove the leading dot
        if ext.lower() in ext_list:
            yield fn

Notice that at this point we still haven't said anything about images!
The task of finding files by extension is pretty generic, so it
shouldn't be concerned about the actual extensions.
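
To make the extension check concrete, here is a sketch of how
os.path.splitext treats a few names, next to a case-sensitive glob
match for comparison.  extension_of, the sample names and the patterns
are all made up for illustration:

```python
import fnmatch
import os

def extension_of(filename):
    """Hypothetical helper: lower-cased extension without the dot."""
    _, ext = os.path.splitext(filename)
    return ext[1:].lower()

allowed = set(['txt', 'py'])
names = ['notes.TXT', 'setup.py', 'archive.tar.gz', 'README', '.bashrc']

# set lookup on the extracted extension
by_ext = [n for n in names if extension_of(n) in allowed]

# one case-sensitive pattern test per allowed extension
patterns = ['*.txt', '*.py']
by_glob = [n for n in names
           if any(fnmatch.fnmatchcase(n, p) for p in patterns)]

print(by_ext)
print(by_glob)
```

splitext only returns the last suffix, and gives an empty extension for
names like README or .bashrc, so those never match.  The glob version
here misses notes.TXT because fnmatchcase does not fold case.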

Once that works, you can simply do:

def get_images(directory, include_hidden=False):
    image_exts = ('jpg','jpeg','gif','png','bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)
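
Putting the pieces together, a rough end-to-end check might look like
the following.  It assumes Python 3 (for tempfile.TemporaryDirectory),
repeats the functions so it runs on its own, and uses invented file
names:

```python
import os
import tempfile

def remove_hidden(dirlist):
    dirlist[:] = [d for d in dirlist if not d.startswith('.')]

def get_files(directory, include_hidden=False):
    for path, dirs, files in os.walk(directory):
        for fn in files:
            yield os.path.join(path, fn)
        if not include_hidden:
            remove_hidden(dirs)

def get_files_by_ext(directory, ext_list, include_hidden=False):
    ext_list = set(ext_list)
    for fn in get_files(directory, include_hidden):
        _, ext = os.path.splitext(fn)
        if ext[1:].lower() in ext_list:
            yield fn

def get_images(directory, include_hidden=False):
    image_exts = ('jpg', 'jpeg', 'gif', 'png', 'bmp')
    return get_files_by_ext(directory, image_exts, include_hidden)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'photos'))
    os.makedirs(os.path.join(root, '.thumbnails'))
    for rel in ('readme.txt',
                os.path.join('photos', 'cat.JPG'),
                os.path.join('photos', 'notes.txt'),
                os.path.join('.thumbnails', 'cat.png')):
        open(os.path.join(root, rel), 'w').close()

    # consume the generators while the temporary tree still exists
    found = sorted(os.path.relpath(p, root) for p in get_images(root))
    found_all = sorted(os.path.relpath(p, root)
                       for p in get_images(root, include_hidden=True))

print(found)
print(found_all)
```

Only photos/cat.JPG should be found by default; the thumbnail under
.thumbnails appears once include_hidden is set.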

Hope this helps :-)

--
- Justin



