[Tutor] Re: glob or filter help

Kent Johnson kent37 at tds.net
Sat Jan 22 15:19:50 CET 2005


Javier Ruere wrote:
> Jay Loden wrote:
> 
>>I have the following code in my updates script (gets the five most recent 
>>updated files on my site)
>>
>>def get_fles(exts, upd_dir):
>> '''return list of all the files matching any extensions in list exts'''
>> fle_list = [] 
>> for each in exts:
>>  cmd = upd_dir + "*." + each
>>  ext_ls = glob.glob(cmd)
>>  fle_list = fle_list + ext_ls
>> return filter(notlink, fle_list)
>>
>>I wanted to just get one list, of all the .htm and .exe files in my upd_dir.  
>>I was trying to make a far more elegant solution that what's above, that 
>>could generate a list through a filter.  Is there a way to trim the code down 
>>to something that does ONE sort through the directory and picks up the .htm 
>>and .exe files? (note, it is not necessary for this to recurse through 
>>subdirectories in the upd_dir). I have cmd defined above because calling
>>"glob.glob(upd_dir + "*." + each) returned the error "cannot concatenate 
>>string and list objects" - is this the only way around that, or is there a 
>>better way?

Breaking out the expression and assigning it to a variable shouldn't make any difference. Are you 
sure you didn't have something like this?
   glob.glob(upd_dir + "*." + exts)

That would give the error message you cite.
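
A minimal sketch that reproduces the error, assuming upd_dir and exts hold values like the ones in
your script (the concrete values and the build_pattern helper are made up for illustration). The
exact wording of the message varies between Python versions, so the sketch only captures the
exception rather than comparing text.

```python
upd_dir = "updates/"      # hypothetical directory name
exts = ["htm", "exe"]     # the list of extensions, as passed to get_fles()

def build_pattern(ext):
    """Build a glob pattern by string concatenation."""
    return upd_dir + "*." + ext

# Concatenating with a single extension (a string) works.
good = build_pattern(exts[0])

# Concatenating with the whole list raises TypeError.
try:
    build_pattern(exts)
    error = None
except TypeError as exc:
    error = exc

print(good)
print(type(error).__name__)
```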

In general, if you have a question about an error, please post the exact error message including the 
stack trace; it can be very helpful.

>   If the filter criteria is complex, it deserves it's own function:
> 
> 
>>>>import os
>>>>import os.path
>>>>def get_files(exts, upd_dir):
> 
> ...     def criteria(filename):
> ...             return filename.split('.')[-1] in exts \
> ...                     and not os.path.islink(filename)
> ...     return filter(criteria, os.listdir(upd_dir))
> ...
> 
>>>>get_files(('gz', 'conf'), '.')
> 
> ['dynfun.pdf.gz', 'twander-3.160.tar.gz', 'PyCon2004DocTestUnit.pdf.gz',
> 'arg23.txt.gz', '.fonts.conf']

Javier's solution is good. You could also make a regular expression to look for the desired 
extensions. Here is one way:

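import os, re

def get_files(exts, upd_dir):
    # Match any of the extensions at the end of the filename
    extnMatch = re.compile('(%s)$' % '|'.join(map(re.escape, exts)))

    def criteria(filename):
        # islink() needs the path including upd_dir, not just the bare name
        return extnMatch.search(filename) \
               and not os.path.islink(os.path.join(upd_dir, filename))
    return filter(criteria, os.listdir(upd_dir))

print get_files(['.java', '.txt'], '.')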
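
On Python 3, filter() returns an iterator rather than a list and print is a function. Here is a
sketch of the same idea under those assumptions. The name get_files_py3 is mine, not part of the
thread, and the function joins upd_dir onto each name before calling os.path.islink() so that the
check still works when upd_dir is not the current directory.

```python
import os
import re

def get_files_py3(exts, upd_dir):
    """Return names in upd_dir that end with one of exts, skipping symlinks."""
    extn_match = re.compile('(%s)$' % '|'.join(map(re.escape, exts)))

    def criteria(filename):
        # islink() needs the path including upd_dir, not just the bare name
        full_path = os.path.join(upd_dir, filename)
        return bool(extn_match.search(filename)) and not os.path.islink(full_path)

    # list() consumes the iterator that filter() returns on Python 3
    return list(filter(criteria, os.listdir(upd_dir)))

print(get_files_py3(['.java', '.txt'], '.'))
```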


I'd better pick apart the extnMatch line a bit. It builds a regular expression that matches any of 
the extensions when they occur at the end of the filename.

  >>> exts = ['.java', '.txt']

First I use map() to apply re.escape() to each extension. This escapes any characters in the 
extension that have special meaning in a regex; specifically the '.':
  >>> e1 = map(re.escape, exts)
  >>> e1
['\\.java', '\\.txt']

I could also have used a list comprehension:
  e1 = [re.escape(ext) for ext in exts]
but for applying a single function like this I usually use map().

Next I join the individual extensions with '|'. In a regex this selects between alternatives.

  >>> e2 = '|'.join(e1)
  >>> e2
'\\.java|\\.txt'

Finally I put the alternatives in parentheses to group them and add a '$' at the end, meaning 'match 
the end of the string'.
  >>> e3 = '(%s)$' % e2
  >>> e3
'(\\.java|\\.txt)$'
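
To see the finished pattern at work, here is a small sketch that compiles it and tries it on a few
made-up filenames. It also shows what re.escape() buys: without it, the '.' would match any
character, so a name with no dot at all could slip through.

```python
import re

exts = ['.java', '.txt']
pattern = re.compile('(%s)$' % '|'.join(map(re.escape, exts)))

# Made-up filenames for illustration
names = ['Main.java', 'notes.txt', 'notes.txt.bak', 'mytxt']
matched = [name for name in names if pattern.search(name)]
print(matched)

# Same pattern built without re.escape(): '.' now matches any character
unescaped = re.compile('(%s)$' % '|'.join(exts))
print(bool(unescaped.search('mytxt')))
```

Only the names that really end in '.java' or '.txt' pass the escaped pattern, while the unescaped
one also accepts 'mytxt'.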


This solution is definitely harder to explain than Javier's :-)

Kent


