returning regex matches as lists

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Fri Feb 15 20:31:37 EST 2008


On Fri, 15 Feb 2008 19:25:59 -0200, Jonathan Lukens
<jonathan.lukens at gmail.com> wrote:

>> What would you like to see instead?
>
> I had mostly just expected that there was some method that would
> return each entire match as an item on a list.  I have this pattern:
>
>>>> import re
>>>> corporate_names = re.compile(u'(?u)\\b([А-Я]{2,}\\s+)([<<"][а-яА-Я]+)(\\s*-?[а-яА-Я]+)*([>>"])')
>>>> terms = corporate_names.findall(sourcetext)
>
> Which matches a specific way that Russian company names are
> formatted.  I was expecting a method that would return this:
>
>>>> terms
> [u'string one', u'string two', u'string three']
>
> ...mostly because I was working it this way in Java and haven't
> learned to do things the Python way yet.  At the suggestion from
> someone on the list, I just used list() on all the tuples like so:

The group() method of match objects does what you want:

terms = [match.group() for match in corporate_names.finditer(sourcetext)]

See http://docs.python.org/lib/match-objects.html
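
To make the difference concrete, here is a minimal sketch. The pattern and
the sample text are simplified stand-ins that I made up for illustration,
not your real corporate_names pattern or data. It shows that findall()
gives one tuple of groups per match when the pattern has several groups,
while finditer() plus group() gives the whole matched string:

```python
import re

# Hypothetical two-group pattern and sample text, standing in for the
# corporate_names pattern and sourcetext from the question.
pattern = re.compile(r'([A-Z]{2,}\s+)("[A-Za-z]+")')
sourcetext = 'We met OOO "Alpha" and ZAO "Beta" yesterday.'

# With more than one group in the pattern, findall() returns a list of
# tuples, one tuple of group strings per match.
tuples = pattern.findall(sourcetext)

# finditer() yields match objects; group() with no argument returns the
# entire text of the match, regardless of how many groups there are.
terms = [match.group() for match in pattern.finditer(sourcetext)]

print(tuples)
print(terms)
```

Here terms is a flat list of strings, which I believe is the shape you
were expecting from findall().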

>>>> detupled_terms = [list(term_tuple) for term_tuple in terms]
>>>> delisted_terms = [''.join(term_list) for term_list in detupled_terms]
>
> which achieves the desired result, but I am not a programmer and so I
> would still be interested to know if there is a more elegant way of
> doing this.

That ''.join(...) works equally well on tuples; you don't have to convert  
tuples to lists first:

delisted_terms = [''.join(term) for term in terms]
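
One caveat, sketched below with a made-up pattern and input (again, not
your real data): a group followed by * keeps only its last repetition, so
joining the groups can silently drop text that group() on the match
object would keep. Whether this affects you depends on whether your names
ever contain more than one hyphenated part.

```python
import re

# Hypothetical pattern with a repeated group, loosely modelled on the
# structure of the original one: prefix, opening quote + word, repeated
# "-word" part, closing quote.
pattern = re.compile(r'([A-Z]{2,}\s+)("[a-z]+)(-[a-z]+)*(")')
text = 'OOO "alpha-beta-gamma"'

# ''.join() accepts any iterable of strings, so tuples work directly.
joined = [''.join(groups) for groups in pattern.findall(text)]

# group() returns the complete matched text.
whole = [match.group() for match in pattern.finditer(text)]

print(joined)  # the repeated group holds only its last repetition
print(whole)
```

In this sketch joined loses the "-beta" part while whole keeps it, which
is another reason to prefer finditer() with group() here.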

-- 
Gabriel Genellina



