dirwalk.py generator version of os.path.walk

Jim Dennis jimd at vega.starshine.org
Thu Feb 28 05:00:41 EST 2002


In article <ac677656.0202271621.5a134d44 at posting.google.com>, Tom Good wrote:

>jimd at vega.starshine.org (Jim Dennis) wrote in message news:<a5d36e$1daf$1 at news.idiom.com>...
>> This function could probably use a bit of polishing,
>>  and it certainly could use some enhancement (some options to
>>  control whether and how we follow symlinks, how to handle
>>  exceptions on listdir(), whether to be depth first, and an 
>>  option to avoid crossing mount boundaries with os.path.ismount(), 
>>  etc).
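
The enhancements listed above could look roughly like the following. This is only a sketch, written for Python 3 rather than the 2.2 the post targets, and the function name dirwalk_opts and every keyword argument (followlinks, onerror, depth_first, cross_mounts) are hypothetical names chosen for illustration, not part of the posted code.

```python
import os
from collections import deque


def dirwalk_opts(startdir=".", followlinks=False, onerror=None,
                 depth_first=False, cross_mounts=True):
    """Yield every path below startdir (all option names are hypothetical)."""
    if not os.path.isdir(startdir):
        raise ValueError("not a directory: %r" % (startdir,))
    todo = deque([startdir])
    while todo:
        # pop() takes the newest directory (descend first),
        # popleft() takes the oldest one (level by level).
        cwd = todo.pop() if depth_first else todo.popleft()
        try:
            names = os.listdir(cwd)
        except OSError as err:
            # Hand the error to the caller if asked to, then skip the directory.
            if onerror is not None:
                onerror(err)
            continue
        for name in names:
            path = os.path.join(cwd, name)
            if os.path.isdir(path):
                is_link = os.path.islink(path)
                blocked = not cross_mounts and os.path.ismount(path)
                # Note: followlinks=True can loop forever on symlink cycles.
                if (followlinks or not is_link) and not blocked:
                    todo.append(path)
            # Every entry is yielded, even directories we do not descend into.
            yield path
```

Something like list(dirwalk_opts("/tmp", onerror=print)) would then report unreadable directories instead of skipping them silently.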

>>  However, it seems to work.  

>>  dirwalk() simply takes an optional top level directory/path name
>>  as an argument and instantiates a generator which will walk down
>>  that tree and return every filename that it can access.  

>>  It's late and I need sleep.  So I'm just going to post this in
>>  its rough (and probably buggy) form and let y'all thrash on it
>>  a bit.

>>  I guess there's some sort of statcache module that might let me
>>  cache the stat() tuples.  I guess I'm implicitly incurring a stat()
>>  system call for each node by checking islink() and isdir() on it
>>  so it seems like I ought to cache that and make it available to 
>>  my caller (without forcing them to make an additional stat system
>>  call).
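
One way to hand the cached information to the caller, sketched here for Python 3.6 or later rather than with the statcache module mentioned above: os.scandir() returns os.DirEntry objects that remember the file type reported by the directory listing and cache the result of stat() after the first call. The name dirwalk_entries is a hypothetical one used only for this example.

```python
import os


def dirwalk_entries(startdir="."):
    """Yield os.DirEntry objects for everything below startdir."""
    todo = [startdir]
    while todo:
        cwd = todo.pop(0)
        try:
            with os.scandir(cwd) as it:
                entries = list(it)
        except OSError:
            continue  # skip directories we cannot read
        for entry in entries:
            # With follow_symlinks=False a symlink to a directory is not
            # treated as a directory, so links are listed but never descended.
            if entry.is_dir(follow_symlinks=False):
                todo.append(entry.path)
            yield entry
```

The caller can then use entry.path, entry.is_symlink() or entry.stat() on each result, and repeated entry.stat() calls on the same object reuse the cached value.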

>>  I hope that something like this (a simple dirwalk() or other 
>>  greatly simplified alternative to os.path.walk()) makes it into 
>>  Python 2.3 or later.

>> #!/usr/bin/env python2.2
>> from __future__ import generators 
>> import os

>> def dirwalk(startdir=None):
>> 	if not startdir:
>> 		startdir="."
>> 	if not os.path.isdir(startdir):
>> 		raise ValueError ## Is this the right exception?
>> 	stack = [startdir]
>> 	while stack:
>> 		cwd = stack.pop(0)	# takes the oldest entry, so this is really a queue (breadth first)
>> 		try:
>> 			current = os.listdir(cwd)
>> 		except (OSError):
>> 			continue	# Skip it if we don't have access
>> 		for each in current:
>> 			each = os.path.join(cwd,each)
>> 			if os.path.islink(each): 
>> 				pass
>> 			elif os.path.isdir(each):
>> 				stack.append(each)
>> 			yield(each)

>> if __name__ == "__main__":
>> 	# import unittest?
>> 	# test suite should consist of:
>> 	# 	dirwalk() vs. os.listdir()
>> 	# 	dirwalk("/") vs. os.path.walk()
>> 	# 	dirwalk("/etc/passwd") (should raise exception)
>> 	import sys
>> 	for i in sys.argv[1:]:
>> 		for j in dirwalk(i):
>> 			print j
>> 	# should compare this to os.popen("find ....") and
>> 	# or to os.path.walk(...)

>Hi,

> I wrote a different implementation of this general concept at:

> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/105873

> You don't really need to keep a stack of directories and push/pop
> things, because with generators you can recurse instead.

>Tom
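
For comparison, a recursive generator along the lines Tom describes might look like this. It is a sketch only and does not reproduce the recipe at the URL above; it assumes Python 3.3 or later for "yield from" (in 2.2 the inner generator has to be looped over and re-yielded by hand), and dirwalk_recursive is a hypothetical name.

```python
import os


def dirwalk_recursive(startdir="."):
    """Yield every path below startdir, recursing instead of keeping a list."""
    try:
        names = os.listdir(startdir)
    except OSError:
        return  # skip directories we cannot read
    for name in names:
        path = os.path.join(startdir, name)
        yield path
        # Descend right away, so the order is depth first.
        if os.path.isdir(path) and not os.path.islink(path):
            yield from dirwalk_recursive(path)
```

Each level of nesting keeps its own suspended generator alive until the levels below it are exhausted.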

 But recursion is likely to cost more.  The only state I need
 to keep is my current "todo" list of directories.  Recursion
 would also keep a suspended function frame for every directory
 level it is inside (unless Python supported tail-call
 elimination).  So the append/pop (total cost: 3 lines of code)
 seems like the lightest-weight way to do this.
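
A rough way to compare the bookkeeping of the two approaches on a given tree is to measure how long the "todo" list gets against how deeply a recursive walker would nest. This is a sketch for Python 3; is_real_dir, max_todo_length and max_nesting are hypothetical helper names. It counts entries and frames only and says nothing about run time: a list entry is just a path string while each recursion level is a whole suspended generator frame, but on a wide, shallow tree the list can hold many more entries than the recursion has levels.

```python
import os


def is_real_dir(path):
    """True for directories that are not symlinks."""
    return os.path.isdir(path) and not os.path.islink(path)


def max_todo_length(startdir):
    """Largest size the iterative walker's todo list reaches."""
    todo = [startdir]
    largest = 1
    while todo:
        cwd = todo.pop(0)
        try:
            names = os.listdir(cwd)
        except OSError:
            continue
        for name in names:
            path = os.path.join(cwd, name)
            if is_real_dir(path):
                todo.append(path)
        largest = max(largest, len(todo))
    return largest


def max_nesting(startdir, depth=1):
    """Deepest chain of directories, i.e. how many generator frames a
    recursive walker would have alive at once."""
    deepest = depth
    try:
        names = os.listdir(startdir)
    except OSError:
        return deepest
    for name in names:
        path = os.path.join(startdir, name)
        if is_real_dir(path):
            deepest = max(deepest, max_nesting(path, depth + 1))
    return deepest
```

Separately from the above, list.pop(0) shifts every remaining element, so for very large trees collections.deque with popleft() is the cheaper container for the todo list.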


