split encloser

Alex Martelli aleax at aleax.it
Wed Apr 16 03:53:29 EDT 2003


Chris wrote:

> Alex Martelli <aleax at aleax.it> wrote in message
> news:<7Bgma.8749$LB6.237526 at news1.tin.it>...
>> Chris wrote:
>>    ...
>> > Thanks Inyeol. Be nice if they gave a small example of using finditer.
>> 
>> How small?  E.g., here's how you could rewrite (a simpler version of)
>> findall using finditer (warning, untested):
>> 
>> def findall(are, instring):
>>     return [mo.group(0) for mo in are.finditer(instring)]
>> 
>> Too small?  Then perhaps you could mention a larger one...?
>> 
>> 
>> Alex
> 
> I had some small exposure to "iterators" in C++, and I remember them
> being used in loops - iterating over the items in a container. The
> example you have given appears to create a list from an "iterator"
> with no loop. Something called an "iterator" creating a list with no
> loop is very perplexing to me. But maybe my problem is I don't
> recognize the syntax [ mo.group(0) for ... ]. It looks like a method
> call on an object, followed by an incomplete for loop.

The syntax:

    [ <expression> for <target> in <iterable> ]

is known in Python as a "List Comprehension".  There is no issue of
"with no loop" -- the "for" keyword inside the list comprehension
is indicating exactly the fact that the loop is taking place.  (List
Comprehensions may optionally have other clauses too -- for clauses
to indicate nested loops, if clauses to use only SOME of the items
in an iterable -- but I'm using the simplest form in the above).
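
As a small sketch of those optional clauses (the sample data and variable
names below are made up purely for illustration):

```python
# Hypothetical sample data, for illustration only.
numbers = [1, 2, 3, 4, 5, 6]

# An "if" clause keeps only SOME of the items: here, the even ones.
even_squares = [n * n for n in numbers if n % 2 == 0]

# A second "for" clause nests one loop inside the other.
pairs = [(letter, digit) for letter in "ab" for digit in (1, 2)]

print(even_squares)  # [4, 16, 36]
print(pairs)         # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```

The clauses read left to right in the same order the equivalent nested
statements would be written.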

If you're familiar with Python 1.5.2 but none of the Python changes
in the last 4 years or so, then one way to explain list comprehensions
is to say that

    return [ <expression> for <target> in <iterable> ]

is just the same as:

    templist = []
    for <target> in <iterable> :
        templist.append( <expression> )
    return templist
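
To make the equivalence concrete, here is a minimal sketch with the
placeholders filled in; the function names and the choice of x * x as
the expression are assumptions picked only for the example:

```python
def squares_with_comprehension(items):
    # [ <expression> for <target> in <iterable> ]
    return [x * x for x in items]


def squares_with_loop(items):
    # The same thing spelled out with an explicit loop and append.
    templist = []
    for x in items:
        templist.append(x * x)
    return templist


print(squares_with_comprehension([1, 2, 3]))  # [1, 4, 9]
print(squares_with_loop([1, 2, 3]))           # [1, 4, 9]
```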

The list comprehension notation is a Pythonization of Haskell's, which
uses punctuation rather than the for and in keywords -- it would say:
    [ <expression> | <target> <- <iterable> ]
which reads just fine in Haskell but definitely not in Python; in turn,
Haskell's list comprehension notation is a simple adaptation of the
widespread "set comprehension" notation used in maths, e.g.
    { x*x | x <- S }
to mean "the set of the squares of the elements of set S" (where
the <- set-membership indicator is often written as some glyph
that's more reminiscent of the Greek letter epsilon).


Python iterators are rather different beasts from C++'s, though
the analogy between a Python iterator and a C++ "input iterator"
is reasonably good... where in C++ you might have *it++ to
indicate "fetch the current value of the iterator then advance
the iterator", in Python you'd have it.next() for just the same
purpose [the for loop makes that call intrinsically on your
behalf!]; where in C++ you need to find out whether an iterator
is done by comparing it with some "end marker", in Python an
iterator lets you know it's done by raising StopIteration when
you call its .next method [again, the for loop handles that for
you intrinsically, catching StopIteration and just taking it as
the indicator for normal termination of the iteration/loop].
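
Roughly, what the for loop does on your behalf can be sketched by hand
as below. The function name manual_loop is hypothetical, and the sketch
uses the built-in next(it), which is how current Python versions spell
the it.next() call described above (in Python 3 the method itself is
named __next__):

```python
def manual_loop(iterable):
    """Collect the items of iterable the way a for loop would visit them."""
    results = []
    it = iter(iterable)          # ask the iterable for an iterator
    while True:
        try:
            item = next(it)      # fetch the current value and advance
        except StopIteration:
            break                # normal end of iteration, not an error
        results.append(item)
    return results


print(manual_loop("abc"))  # ['a', 'b', 'c']
```

Note there is no comparison against an "end marker" anywhere: the
exception is the only termination signal.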


So, back to our muttons: are.finditer(instring) returns an iterable
(which is actually already an iterator, but that's not very
important here) whose items are the match-objects for each of
the nonoverlapping matches of compiled regular expression object
are inside string instring.  So, a loop of the form:
    for mo in are.finditer(instring):
        ...
(whether written out like this, or inside a list comprehension,
makes no difference) makes mo assume, one after the other, the
values of RE matchobjects for each of those nonoverlapping
matches, left to right inside instring.  So for example a call
to mo.group(0) gives the substring of instring that corresponds
to the specific nonoverlapping match we're currently at, inside
a loop such as the above.
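
Here is a small sketch of such a loop written out in full. The pattern
and the sample string are assumptions chosen just for the example; only
re.compile, finditer, findall, group and start come from the standard
re module:

```python
import re

# Hypothetical pattern and input, for illustration only.
are = re.compile(r"\d+")
instring = "10 apples, 25 pears, 7 plums"

matches = []
for mo in are.finditer(instring):
    # mo is a match object; group(0) is the matched substring and
    # start() is the index in instring where that match begins.
    matches.append((mo.group(0), mo.start()))

print(matches)  # [('10', 0), ('25', 11), ('7', 21)]

# The list-comprehension form visits the same matches in the same order.
# For a pattern without groups it gives what findall gives.
same = [mo.group(0) for mo in are.finditer(instring)]
print(same == are.findall(instring))  # True
```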


Alex




