a list/re problem

Fri Dec 11 16:24:07 EST 2009

Ed Keith wrote:

> I have a problem and I am trying to find a solution to it that is both
> efficient and elegant.
> 
> I have a list, call it 'l':
> 
> l = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
> 
> Notice that some of the items in the list start and end with an '*'. I
> wish to construct a new list, call it 'n', which contains all the members
> of l that start and end with '*', with the '*'s removed.
> 
> So in the case above n would be ['nbh', 'jkjsdfjasd']
> 
> The following works:
> 
> import re
> r = re.compile(r'\*(.+)\*')
> 
> def f(s):
>     m = r.match(s)
>     if m:
>         return m.group(1)
>     else:
>         return ''
> 
> n = [f(x) for x in l if r.match(x)]
> 
> 
> 
> But it is inefficient, because it is matching the regex twice for each
> item, and it is a bit ugly.
> 
> I could use:
> 
> 
> n = []
> for x in l:
>     m = r.match(x)
>     if m:
>         n.append(m.group(1))
> 
> 
> It is more efficient, but much uglier.

It's efficient and easy to understand; maybe you have to readjust your 
taste.
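To make the efficiency point concrete, here is a rough sketch that counts how often the matcher runs in the original list comprehension versus the explicit loop. The wrapper `counted_match` and the global `calls` counter are hypothetical helpers added only for illustration; they are not part of the original code.

```python
import re

items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
pattern = re.compile(r'\*(.+)\*')
calls = 0

def counted_match(s):
    """Hypothetical wrapper around pattern.match that counts its calls."""
    global calls
    calls += 1
    return pattern.match(s)

# Comprehension style: one match in the filter, then a second match
# to extract the group (as f() does in the original code).
calls = 0
n1 = [counted_match(x).group(1) for x in items if counted_match(x)]
double_calls = calls

# Explicit loop: exactly one match per item.
calls = 0
n2 = []
for x in items:
    m = counted_match(x)
    if m:
        n2.append(m.group(1))
single_calls = calls

print(n1 == n2, double_calls, single_calls)  # True 8 6
```

With six items and two hits, the comprehension runs the regex eight times while the loop runs it six times; the gap grows with the number of matching items.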

> Does anyone have a better solution?

In this case an approach based on string slicing is probably best. When the 
regular expression gets more complex, you can use a nested generator 
expression:

>>> import re
>>> items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']
>>> match = re.compile(r"\*(.+)\*").match
>>> [m.group(1) for m in (match(s) for s in items) if m is not None]
['nbh', 'jkjsdfjasd']
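For the simple case in the question, the string-slicing approach might look like the following sketch. It assumes the goal is exactly "starts and ends with '*'" with at least one character in between (mirroring the '.+' in the regex). Note that r.match is not anchored at the end of the string, so the regex version also accepts something like '*a*b' (yielding 'a'), whereas this version rejects it.

```python
items = ['asc', '*nbh*', 'jlsdjfdk', 'ikjh', '*jkjsdfjasd*', 'rewr']

# Keep strings that start and end with '*' and have at least one character
# in between, then slice off the first and last character (the asterisks).
n = [s[1:-1] for s in items
     if len(s) > 2 and s.startswith('*') and s.endswith('*')]

print(n)  # ['nbh', 'jkjsdfjasd']
```

Each item is examined once and no regex engine is involved, which is why slicing tends to be the simplest choice until the pattern stops being a fixed prefix/suffix.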

Peter