os.walk()

Max Erickson maxerickson at gmail.com
Thu Feb 17 10:06:31 EST 2005


<snip>

os.walk() returns a generator. When you iterate over it in a for loop,
such as

    for r, ds, fs in os.walk(...):

r, ds and fs are bound to new values at the start of each iteration.
If you want to end up with a list of files or dirs, rather than
processing them in the bodies of the file and dir for loops, you need
to accumulate the names that os.walk gives you in lists of your own:

>>> import os
>>> dir_skip_list = ['sub2']
>>> file_skip_list = []
>>> keptfiles = list()
>>> keptdirs = list()
>>> for root, ds, fs in os.walk('c:\\bin\\gtest\\'):
	for f in fs:
		if f not in file_skip_list:
			keptfiles.append(f)
	for d in ds:
		if d in dir_skip_list:
			ds.remove(d)
		else:
			keptdirs.append(d)


>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub5', 'sub6']
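
For comparison, here is a minimal sketch of the same idea that prunes
with slice assignment instead of calling ds.remove() inside the loop
over ds. The helper name walk_with_skips is hypothetical and not part
of the session above; it assumes the default top-down walk.

```python
import os

def walk_with_skips(top, dir_skip_list=(), file_skip_list=()):
    """Collect file and directory names under top, pruning skipped dirs.

    Hypothetical helper for illustration; relies on os.walk's default
    topdown=True behaviour.
    """
    keptfiles = []
    keptdirs = []
    for root, ds, fs in os.walk(top):
        # Slice assignment edits the very list os.walk is holding, so
        # skipped directories are not descended into, and no entry is
        # missed the way it can be when ds.remove() runs inside
        # "for d in ds".
        ds[:] = [d for d in ds if d not in dir_skip_list]
        keptdirs.extend(ds)
        keptfiles.extend(f for f in fs if f not in file_skip_list)
    return keptfiles, keptdirs
```

Calling walk_with_skips('c:\\bin\\gtest', ['sub2']) would be the
equivalent of the loop above, with every sibling of a skipped
directory still checked.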

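The first session returns fewer directories than I expected. A likely
cause is that ds.remove(d) changes ds while the inner for loop is still
iterating over it, so the entry right after each removed name is
skipped. Removing a name from ds also stops os.walk from descending
into that directory, so nothing under sub2 is listed. If you can't get
something working with that, this gives you lists of files and dirs
that you can then filter: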

>>> keptfiles = list()
>>> keptdirs = list()
>>> for r, ds, fs in os.walk('c:\\bin\\gtest'):
	keptfiles.extend(fs)
	keptdirs.extend(ds)

>>> keptfiles
['P4064013.JPG', 'P4064015.JPG', 'Thumbs.db', 'P4064026.JPG',
'Thumbs.db', 'Thumbs.db', 'Thumbs.db', 'P4064034.JPG', 'Thumbs.db',
'P3123878.JPG', 'P4064065.JPG', 'Thumbs.db', 'P4064060.JPG',
'P4064061.JPG', 'Thumbs.db', 'PC030088.JPG', 'P4224133.JPG',
'Thumbs.db']
>>> keptdirs
['sub1', 'sub2', 'sub3', 'sub5', 'sub6', 'sub8', 'SubA', 'sub9',
'sub6']
>>> #filter away...
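
The filtering step might look like the sketch below. Because bare
names repeat in the output above ('Thumbs.db' and 'sub6' each appear
more than once), it stores full paths instead. The names collect_paths
and filter_paths are hypothetical helpers, and using 'Thumbs.db' as
the name to skip is only an assumption taken from that listing.

```python
import os

def collect_paths(top):
    """Return (files, dirs) as full paths for everything under top."""
    files, dirs = [], []
    for root, ds, fs in os.walk(top):
        # Join each name with root so identical names stay distinct.
        files.extend(os.path.join(root, f) for f in fs)
        dirs.extend(os.path.join(root, d) for d in ds)
    return files, dirs

def filter_paths(paths, skip_names):
    """Drop every path whose final component is in skip_names."""
    return [p for p in paths if os.path.basename(p) not in skip_names]
```

For example, filter_paths(files, ['Thumbs.db']) keeps every collected
file except the thumbnail caches.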

Hope this helps, 
max



