[Tutor] Re: glob or filter help
Kent Johnson
kent37 at tds.net
Sat Jan 22 15:19:50 CET 2005
Javier Ruere wrote:
> Jay Loden wrote:
>
>>I have the following code in my updates script (gets the five most recent
>>updated files on my site)
>>
>>def get_fles(exts, upd_dir):
>>    '''return list of all the files matching any extensions in list exts'''
>>    fle_list = []
>>    for each in exts:
>>        cmd = upd_dir + "*." + each
>>        ext_ls = glob.glob(cmd)
>>        fle_list = fle_list + ext_ls
>>    return filter(notlink, fle_list)
>>
>>I wanted to just get one list, of all the .htm and .exe files in my upd_dir.
>>I was trying to make a far more elegant solution than what's above, that
>>could generate a list through a filter. Is there a way to trim the code down
>>to something that does ONE sort through the directory and picks up the .htm
>>and .exe files? (note, it is not necessary for this to recurse through
>>subdirectories in the upd_dir). I have cmd defined above because calling
>>"glob.glob(upd_dir + "*." + each) returned the error "cannot concatenate
>>string and list objects" - is this the only way around that, or is there a
>>better way?
Breaking out the expression and assigning it to a variable shouldn't make any difference. Are you
sure you didn't have something like this?
glob.glob(upd_dir + "*." + exts)
That would give the error message you cite.
In general, if you have a question about an error, please post the exact error message, including
the stack trace. It can be very helpful.
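For illustration, here is a minimal sketch of how that kind of error arises. The directory name and the helper try_glob_pattern() are made up for the example and are not part of Jay's script; the exact wording of the TypeError message varies between Python versions.

```python
upd_dir = "updates/"      # hypothetical directory name
exts = ["htm", "exe"]

def try_glob_pattern(suffix):
    """Build the glob pattern; return the error text if concatenation fails."""
    try:
        return upd_dir + "*." + suffix
    except TypeError as err:
        return "TypeError: %s" % err

print(try_glob_pattern(exts[0]))   # one extension (a string): works
print(try_glob_pattern(exts))      # the whole list: string + list raises TypeError
```

The first call concatenates three strings; the second tries to add a list to a string, which is what the error message in the question describes.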
> If the filter criteria is complex, it deserves its own function:
>
>
> >>> import os
> >>> import os.path
> >>> def get_files(exts, upd_dir):
> ...     def criteria(filename):
> ...         return filename.split('.')[-1] in exts \
> ...             and not os.path.islink(filename)
> ...     return filter(criteria, os.listdir(upd_dir))
> ...
> >>> get_files(('gz', 'conf'), '.')
> ['dynfun.pdf.gz', 'twander-3.160.tar.gz', 'PyCon2004DocTestUnit.pdf.gz',
> 'arg23.txt.gz', '.fonts.conf']
Javier's solution is good. You could also make a regular expression to look for the desired
extensions. Here is one way:
import os, re
def get_files(exts, upd_dir):
    extnMatch = re.compile('(%s)$' % '|'.join(map(re.escape, exts)))
    def criteria(filename):
        # os.listdir() returns bare names, so join with upd_dir before testing for a link
        return extnMatch.search(filename) \
            and not os.path.islink(os.path.join(upd_dir, filename))
    return filter(criteria, os.listdir(upd_dir))
print get_files(['.java', '.txt'], '.')
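For anyone reading this under Python 3, the same idea might be sketched as follows. This is an adaptation under assumptions, not the code as posted: print is a function, filter() is lazy so it is wrapped in list(), and each name is joined to upd_dir before the islink() test because os.listdir() returns bare names.

```python
import os
import re

def get_files(exts, upd_dir):
    """Return names in upd_dir that end with one of exts and are not symlinks."""
    extn_match = re.compile('(%s)$' % '|'.join(map(re.escape, exts)))

    def criteria(filename):
        # listdir() gives bare names, so rebuild the path before islink()
        path = os.path.join(upd_dir, filename)
        return bool(extn_match.search(filename)) and not os.path.islink(path)

    # filter() returns an iterator in Python 3; list() forces it
    return list(filter(criteria, os.listdir(upd_dir)))

print(get_files(['.java', '.txt'], '.'))
```

The directory is still read only once, and the extension test and the link test both happen in the single criteria() function.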
I'd better pick apart the extnMatch line a bit. What it does is build a regular expression that
matches any of the extensions if they occur at the end of the filename.
>>> exts = ['.java', '.txt']
First I use map() to apply re.escape() to each extension. This escapes any characters in the
extension that have special meaning in a regex; specifically the '.':
>>> e1 = map(re.escape, exts)
>>> e1
['\\.java', '\\.txt']
I could also have used a list comprehension:
e1 = [ re.escape(ext) for ext in exts ]
but for applying a function like this I usually use map().
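A quick check that the two spellings give the same result. This is a small sketch; the list() wrapper is only needed under Python 3, where map() returns an iterator rather than a list.

```python
import re

exts = ['.java', '.txt']

# map() applies re.escape to each extension in turn
via_map = list(map(re.escape, exts))

# the list comprehension spells out the same loop
via_comprehension = [re.escape(ext) for ext in exts]

print(via_map == via_comprehension)   # prints True
```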
Next I join the individual extensions with '|'. In a regex this selects between alternatives.
>>> e2 = '|'.join(e1)
>>> e2
'\\.java|\\.txt'
Finally I put the alternatives in parentheses to group them and add a '$' at the end, meaning 'match
the end of the string'.
>>> e3 = '(%s)$' % e2
>>> e3
'(\\.java|\\.txt)$'
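To see the finished pattern at work, here is a short sketch that runs it against a few filenames. The names are invented for the example.

```python
import re

e3 = '(\\.java|\\.txt)$'
extn_match = re.compile(e3)

# made-up filenames: only those ending in .java or .txt should match
names = ['Main.java', 'notes.txt', 'notes.txt.bak', 'javatxt', 'archive.tar.gz']
matched = [name for name in names if extn_match.search(name)]

print(matched)   # prints ['Main.java', 'notes.txt']
```

'notes.txt.bak' is rejected because of the '$' anchor, and 'javatxt' is rejected because the escaped '.' must match a literal dot.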
This solution is definitely harder to explain than Javier's :-)
Kent