dirwalk.py generator version of os.path.walk

Tom Good Tom_Good1 at excite.com
Wed Feb 27 19:21:39 EST 2002


jimd at vega.starshine.org (Jim Dennis) wrote in message news:<a5d36e$1daf$1 at news.idiom.com>...
> This function could probably use a bit of polishing,
>  and it certainly could use some enhancement (some options to
>  control if, and how, we follow symlinks, how to handle
>  exceptions on listdir(), whether to be depth first, and an
>  option to avoid crossing mount boundaries with os.path.ismount(),
>  etc).
> 
>  However, it seems to work.  
> 
>  dirwalk() simply takes an optional top level directory/path name
>  as an argument and instantiates a generator which will walk down
>  that tree and return every filename that it can access.  
> 
>  It's late and I need sleep.  So I'm just going to post this in
>  its rough (and probably buggy) form and let y'all thrash on it
>  a bit.
> 
>  I guess there's some sort of statcache module that might let me
>  cache the stat() tuples.  I guess I'm implicitly incurring a stat()
>  system call for each node by checking islink() and isdir() on it
>  so it seems like I ought to cache that and make it available to 
>  my caller (without forcing them to make an additional stat system
>  call).
> 
>  I hope that something like this (a simple dirwalk() or other 
>  greatly simplified alternative to os.path.walk()) makes it into 
>  Python 2.3 or later.
> 
> #!/usr/bin/env python2.2
> from __future__ import generators 
> import os
> 
> def dirwalk(startdir=None):
> 	if not startdir:
> 		startdir="."
> 	if not os.path.isdir(startdir):
> 		raise ValueError ## Is this the right exception?
> 	stack = [startdir]
> 	while stack:
> 		cwd = stack.pop(0)
> 		try:
> 			current = os.listdir(cwd)
> 		except (OSError):
> 			continue	# Skip it if we don't have access
> 		for each in current:
> 			each = os.path.join(cwd,each)
> 			if os.path.islink(each): 
> 				pass
> 			elif os.path.isdir(each):
> 				stack.append(each)
> 			yield(each)
> 
> if __name__ == "__main__":
> 	# import unittest?
> 	# test suite should consist of:
> 	# 	dirwalk() vs. os.listdir()
> 	# 	dirwalk("/") vs. os.path.walk()
> 	# 	dirwalk("/etc/passwd") (should raise exception)
> 	import sys
> 	for i in sys.argv[1:]:
> 		for j in dirwalk(i):
> 			print j
> 	# should compare this to os.popen("find ....") and
> 	# or to os.path.walk(...)

Hi,

I wrote a different implementation of this general concept at:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873

You don't really need to keep a stack of directories and push/pop
things, because with generators you can recurse instead.
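
To illustrate the idea, here is a minimal sketch of a recursive generator.
It is not the code from the recipe linked above, only an illustration written
for current Python 3; under Python 2.2 the "yield from" line would have to be
spelled as a loop ("for p in dirwalk(path): yield p") after the
"from __future__ import generators" line. It keeps the quoted function's
choices of skipping unreadable directories and not following symlinks, but it
walks depth first rather than level by level, and it silently yields nothing
for a bad start path instead of raising ValueError.

```python
import os


def dirwalk(startdir="."):
    """Yield the path of every entry below startdir, depth first."""
    try:
        names = os.listdir(startdir)
    except OSError:
        # Unreadable, missing, or not a directory: skip it quietly.
        return
    for name in names:
        path = os.path.join(startdir, name)
        yield path
        # Recurse into real directories only; symlinks are yielded
        # above but never followed.
        if os.path.isdir(path) and not os.path.islink(path):
            yield from dirwalk(path)
```

The call stack of the nested generators takes the place of the explicit
list of pending directories, so there is nothing to push or pop by hand.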


Tom


