reusing parts of a string in RE matches?

Fri May 12 04:48:25 EDT 2006

Mirco Wahab wrote:

> In Python, you have to deconstruct
> the 2D-lists (here: long list of
> short lists [a,2] ...) by
> 'slicing the slice':
>
>    char,num = list[:][:]
>
> in a loop and using the appropriate element then:
>
>    import re
>
>    t = 'a1a2a3Aa4a35a6b7b8c9c';
>    r =  r'(\w)(?=(.)\1)'
>    l = re.findall(r, t)
>
>    for a,b in (l[:][:]) : print  b
>
> In the moment, I find this syntax
> awkward and arbitrary, but my mind
> should change if I'm adopted more
> to this in the end ;-)

in contemporary Python, this is best done by a list comprehension:

   l = [m[1] for m in re.findall(r, t)]
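
as a minimal sketch, here is the comprehension applied to the pattern and
target string from the quoted post (minus the trailing semicolon). it also
shows that l[:][:] is only a copy of a copy, so the slicing does no work;
the tuple indexing does. the names pairs and seconds are mine, and the
output comment is what I get from tracing the pattern by hand:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

# with two groups in the pattern, findall returns a list of 2-tuples
pairs = re.findall(r, t)

# slicing twice just produces an equal shallow copy of the outer list
assert pairs[:][:] == pairs

# pick the second group out of each tuple
seconds = [m[1] for m in pairs]
print(seconds)  # ['1', '2', '4', '7', '9']
```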

or, depending on what you want to do with the result, a generator
expression:

   g = (m[1] for m in re.findall(r, t))

or

   process(m[1] for m in re.findall(r, t))
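
a small sketch of the last form, assuming a made-up process function that
simply joins whatever it is handed (the real one is whatever your program
needs). note that findall still builds its complete list of tuples first;
the generator expression only saves you the second list:

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

def process(items):
    # hypothetical consumer: accepts any iterable of strings
    return ','.join(items)

# the generator expression is passed directly as the only argument,
# so no extra parentheses are needed
result = process(m[1] for m in re.findall(r, t))
print(result)  # 1,2,4,7,9
```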

if you want to avoid creating the list of tuples, you can use finditer,
which yields match objects one at a time, instead:

    l = [m.group(2) for m in re.finditer(r, t)]
    g = (m.group(2) for m in re.finditer(r, t))
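
since finditer hands you match objects rather than plain tuples, you can
also ask each match where it was found. a minimal sketch with the same
pattern and string as in the quoted post (the name found is mine):

```python
import re

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

# collect the start offset and both groups for every match
found = [(m.start(), m.group(1), m.group(2)) for m in re.finditer(r, t)]

for pos, first, second in found:
    print(pos, first, second)
```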

finditer is also a good tool to use if you need to do more things with
each match:

    for m in re.finditer(r, t):
        s = m.group(2)
        # ... process s in some way ...

finditer is lazy: the loop body runs as soon as the RE engine finds each
match, before the rest of the string has been scanned. that can be useful
if you're working on large target strings and only want to process the
first few matches:

    for m in re.finditer(r, t):
        s = m.group(2)
        if s == something:
            break
        # ... process s in some way ...
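
a runnable sketch of the early exit, using the pattern and string from the
quoted post. the value '4' stands in for "something" and appending to a
list stands in for the processing step; both are placeholders of mine.
itertools.islice is another way to take only the first few matches:

```python
import re
from itertools import islice

t = 'a1a2a3Aa4a35a6b7b8c9c'
r = r'(\w)(?=(.)\1)'

seen = []
for m in re.finditer(r, t):
    s = m.group(2)
    if s == '4':        # placeholder stop condition
        break           # matches after this point are never consumed
    seen.append(s)      # placeholder for the real processing

# alternative: consume only the first two matches from the iterator
first_two = [m.group(2) for m in islice(re.finditer(r, t), 2)]

print(seen, first_two)
```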

</F>