os.walk bug?

Adam DePrince adam at cognitcorp.com
Thu Dec 16 21:16:59 EST 2004


On Thu, 2004-12-16 at 14:45, Gabriel Cosentino de Barros wrote: 
> Hi
> 
> I'm new to the list. I found a bad behaviour of os.walk that I can
> reproduce 100% of the time, but I can't find out why it happens.
> 
> I have the following tree:
> 
> t:\dir1
> t:\dir1\2000
> t:\dir1\2001
> t:\dir1\content
> t:\dir2
> t:\dir2\2000
> t:\dir2\2001
> t:\dir2\2002
> t:\dir2\2003
> t:\dir2\2004
> t:\dir2\content
> t:\dir2\templates
> 
> What I wanted was to loop over each dir and skip the years (I don't
> want to touch the archive). So I wrote:
> 
> for root, dirs, files in os.walk('t:\\'):
>         # -- do stuff
>         print "working on", root                
>         # -- stuff done
> 
>         print 'DEBUG: dirs =', dirs
>         for d in dirs:
>                 # -- remove archive
>                 if d[:2] == '20':
>                         print "--- removing:", d
>                         dirs.remove(d)
> 
> And when I run it I get the following output:
> working on t:\
> working on t:\dir1
> DEBUG: dirs = ['2000', '2001', 'content']
> --- removing 2000
> --- removing 2001
> working on t:\dir1\content
> working on t:\dir2
> DEBUG: dirs = ['2000', '2001', '2003', '2002', '2004', 'content']  <-
> why is this in this sequence?
> --- removing 2000
> --- removing 2003
> --- removing 2004
> working on t:\dir2\2001
> .
> .
> .
> 
> Am I nuts?
> 
> Running it as os.walk('t:\dir2') makes it skip the same dirs.
> I'm using Python 2.3 on Windows 2000.
> The t: drive is a WebDAV mount.

Each iteration of os.walk yields three values: root, the path of the
directory currently being visited; dirs, a list of the names of the
subdirectories in root; and files, a list of the names of the files in
root.
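For illustration, here is a small sketch that records what os.walk yields at each step. walk_summary is a hypothetical helper name, and the name lists are sorted only to make the result readable, since the listing order is up to the operating system.

```python
import os


def walk_summary(top):
    """Return a list of (root, dirs, files) tuples, one per directory visited.

    Hypothetical helper for illustration.  The name lists are sorted only to
    make the result readable; os.walk itself returns them in whatever order
    the operating system lists them.
    """
    summary = []
    for root, dirs, files in os.walk(top):
        # root:  path of the directory being visited (top itself comes first)
        # dirs:  names, not full paths, of the subdirectories of root
        # files: names of the files (non-directory entries) in root
        summary.append((root, sorted(dirs), sorted(files)))
    return summary
```

Note that dirs and files hold bare names; os.path.join(root, name) gives the full path.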

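You are right that changing the contents of dirs changes how os.walk
traverses the tree.  When os.walk runs top-down (the default,
topdown=True), it descends only into the subdirectories whose names are
still in dirs when your loop body for that directory has finished,
provided you modify the list in place (remove, del or slice assignment)
rather than binding dirs to a new list.  That is why the year directories
you did remove are never visited, and why this is not a bug in os.walk.

The problem is the inner loop.  Removing items from a list while a for
loop is iterating over that same list makes the loop skip items: the loop
steps through the list by position, and each remove() shifts everything
after the removed item down one place.  With your list
['2000', '2001', '2003', '2002', '2004', 'content'], removing '2000'
moves '2001' into the slot the loop has just visited, so the next item
examined is '2003'.  The same thing happens twice more, and only 2000,
2003 and 2004 are removed.  The order of the list itself is just the
order the directory listing came back in (os.listdir makes no promise
about order), so it is not part of the problem.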
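The skipping is easy to reproduce with a plain list, no file system needed. The sketch below (the function names are made up for illustration) runs the original removal loop and a version that iterates over a snapshot of the list, on the listing from your output:

```python
def remove_while_iterating(names):
    """Reproduce the original loop: remove from the list being iterated."""
    names = list(names)            # work on a private copy of the input
    removed = []
    for d in names:
        if d[:2] == '20':
            removed.append(d)
            names.remove(d)        # later items shift left, so one is skipped
    return removed, names


def remove_from_snapshot(names):
    """Same test, but iterate over a snapshot (names[:]) of the list."""
    names = list(names)
    removed = []
    for d in names[:]:             # the snapshot is not affected by remove()
        if d[:2] == '20':
            removed.append(d)
            names.remove(d)
    return removed, names


listing = ['2000', '2001', '2003', '2002', '2004', 'content']
print(remove_while_iterating(listing))
# -> (['2000', '2003', '2004'], ['2001', '2002', 'content'])
print(remove_from_snapshot(listing))
# -> (['2000', '2001', '2003', '2002', '2004'], ['content'])
```

The first call removes exactly the three directories your DEBUG output shows being removed; the second removes all five year directories.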
 
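To remove the year directories reliably, loop over a copy of the list
(for d in dirs[:]:) so the removals cannot disturb the loop, or rebuild
the list in place in one step with slice assignment:
dirs[:] = [d for d in dirs if d[:2] != '20'].  If all you want is to
skip the work for the year directories, you can instead test root and
continue, bearing in mind that os.walk will still descend into those
directories and anything below them: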

test.py contains:
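import os
for root, dirs, files in os.walk('.'):
	if os.path.basename(root)[:2] == "20": continue	# skip the year directories themselves
	print root	# do stuff; here, just print the path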

[adam@localhost help]$ python test.py
.
./dir1
./dir1/content
./dir2
./dir2/content
./dir2/templates
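To make the difference between pruning dirs and using continue concrete, here is a sketch that walks a tree both ways. The helper names and the "name starts with 20" test are assumptions for this example. walk_pruned prunes dirs in place with slice assignment; walk_with_continue only skips processing the year roots, so a directory nested under a year directory is still reached.

```python
import os


def is_archive(name):
    # Assumption for this sketch: archive directories are those named 20xx.
    return name[:2] == '20'


def walk_pruned(top):
    """Return the roots visited when year directories are pruned from dirs."""
    visited = []
    for root, dirs, files in os.walk(top):   # topdown=True is the default
        visited.append(root)
        # Slice assignment changes the list os.walk holds; writing
        # dirs = [...] would only rebind the local name and prune nothing.
        dirs[:] = [d for d in dirs if not is_archive(d)]
    return visited


def walk_with_continue(top):
    """Return the roots processed when year roots are merely skipped."""
    visited = []
    for root, dirs, files in os.walk(top):
        if is_archive(os.path.basename(root)):
            continue                          # os.walk still descends here
        visited.append(root)
    return visited
```

On the test tree above, which has nothing below the year directories, both functions should return the same set of directories that test.py prints; they differ only once a year directory has subdirectories of its own.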

Oh, and I created my directory structure with:
# bash -c 'mkdir -p dir1/200{0,1} dir1/content \
dir2/200{0,1,2,3,4} dir2/{content,templates}'
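On Windows, where bash brace expansion isn't available, the same tree can be built from Python. make_test_tree is a hypothetical helper for this sketch; unlike mkdir -p, os.makedirs raises OSError if a directory already exists.

```python
import os


def make_test_tree(base):
    """Create dir1/ and dir2/ with their year and content subdirectories.

    Hypothetical helper mirroring the mkdir -p command.  Unlike mkdir -p,
    os.makedirs raises OSError if a directory already exists.
    """
    layout = {
        'dir1': ['2000', '2001', 'content'],
        'dir2': ['2000', '2001', '2002', '2003', '2004',
                 'content', 'templates'],
    }
    for parent, children in layout.items():
        for child in children:
            # makedirs also creates the parent directory on first use
            os.makedirs(os.path.join(base, parent, child))
```

Call it as make_test_tree('.') from an empty directory, then run test.py there.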

Does this help?

- Adam


Adam DePrince 





More information about the Python-list mailing list