[Tutor] bash find equivalent in Python

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu Jul 10 19:45:03 2003


On Thu, 10 Jul 2003 tpc@csua.berkeley.edu wrote:

> While this seems to work, it also seems wrong to have to guess how many
> levels deep my directory structure is and copy, paste and indent the code
> accordingly.  Is there a recursive find utility in Python similar to bash
> find ?  For example:
>
> find . -name '*.html'
>
> will return all html files from current working directory and on, along
> with their paths respective to the current working directory.  I looked
> at string module find in the Python documentation and shutils and did
> not find the analog.

Hi tpc,

Your wish is my command.  *grin*


Here's a function --- DFS() --- that'll let you march through each
subdirectory:

###
import os
import os.path

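def DFS(root, skip_symlinks = 1):
    """Depth first search traversal of directory structure.  Children
    are visited in alphabetical order."""
    stack = [root]
    visited = {}
    while stack:
        d = stack.pop()
        if d not in visited:  ## just to prevent any possible recursive
                              ## loops
            visited[d] = 1
            yield d
            ## Only expand a directory the first time we see it, and
            ## push its children in reverse so that pop() hands them
            ## back in alphabetical order.
            children = list(subdirs(d, skip_symlinks))
            children.reverse()
            stack.extend(children)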


def subdirs(root, skip_symlinks = 1):
    """Given a root directory, returns the first-level subdirectories."""
    try:
        dirs = [os.path.join(root, x)
                for x in os.listdir(root)]
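        ## List comprehensions keep 'dirs' a real list, so sort() works.
        dirs = [x for x in dirs if os.path.isdir(x)]
        if skip_symlinks:
            dirs = [x for x in dirs if not os.path.islink(x)]
        dirs.sort()
        return dirs
    ## The parentheses matter: "except OSError, IOError" would catch only
    ## OSError and bind the exception object to the name IOError.
    except (OSError, IOError): return []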
###


DFS() returns an iterable object that marches through the root and all
of its subdirectories.  Your code for going through all the .html
files, then, might look something like:

###
import glob

for subdir in DFS(root):
    ## every .html file directly inside this subdirectory
    htmls = glob.glob(os.path.join(subdir, "*.html"))
    ...
###
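
If you'd rather get the matching paths back directly, the way
"find . -name '*.html'" does, the same stack-based walk can do the
matching itself.  What follows is only a rough sketch and assumes
Python 3; find_files() is a made-up name for illustration, not part of
any library, and it uses fnmatch for the filename pattern instead of
glob:

```python
import fnmatch
import os


def find_files(root, pattern):
    """Yield paths under root whose file name matches pattern.

    Hypothetical helper for illustration: a stack-based depth-first
    walk in the spirit of DFS(), skipping symlinked directories.
    """
    stack = [root]
    while stack:
        d = stack.pop()
        try:
            names = sorted(os.listdir(d))
        except OSError:
            continue  # unreadable directory: skip it
        children = []
        for name in names:
            path = os.path.join(d, name)
            if os.path.isdir(path):
                if not os.path.islink(path):
                    children.append(path)
            elif fnmatch.fnmatch(name, pattern):
                yield path
        # Reverse so that pop() visits subdirectories alphabetically.
        stack.extend(reversed(children))
```

Something like "for path in find_files('.', '*.html'): print(path)"
would then print each match with its path relative to the current
directory.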


Another way to do something like 'find' is to use the os.path.walk()
utility function:

    http://www.python.org/doc/lib/module-os.path.html

but I find DFS() a little easier on the brain... *grin*
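
For anyone on Python 2.3 or later, os.walk() covers much the same
ground as os.path.walk(), and Python 3 keeps only os.walk().  Here is a
minimal sketch of a find-like helper built on it, assuming Python 3;
find_by_name() is a hypothetical name of my own, not a standard
function:

```python
import fnmatch
import os


def find_by_name(root, pattern):
    """Return a sorted list of paths under root whose name matches pattern."""
    matches = []
    # os.walk() yields (dirpath, dirnames, filenames) for each directory
    # it reaches; by default it does not descend into symlinked dirs.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    matches.sort()
    return matches
```

Calling find_by_name(".", "*.html") should give roughly what
"find . -name '*.html'" lists, though in sorted order rather than
traversal order.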


Hope this helps!