Strange re behavior: normal?

Robin Munn rmunn at pobox.com
Mon Aug 18 12:40:51 EDT 2003


Michael Janssen <Janssen at rz.uni-frankfurt.de> wrote:
> 
> What's the good of splitting by boundaries? Someone else wanted this a 
> few days ago on tutor and I can't figure out a reason by now.

Heh. I bet I know the name of the person who was asking about this on
the tutor list. He's a friend of mine, and I've been helping him learn
Python. He E-mailed me about trying to split on word boundaries with
re.split(r'\b', 'some text'), and it was his E-mail that caused me to
discover that splitting by boundaries didn't do what I expected.

What's the good of it? As someone else pointed out, splitting the text
'See Spot. See Spot run.' on word boundaries would let you fetch both
the words and the text separating them, yielding:

    ['See', ' ', 'Spot', '. ', 'See', ' ', 'Spot', ' ', 'run', '.']

which may be useful in certain English-language-parsing situations,
since it would allow you to look "ahead" or "back" from a word to see
what punctuation precedes or follows it.
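
For anyone who wants that list without depending on how re.split treats
zero-width matches like \b, here is a minimal sketch using re.findall
instead. The helper name split_words_and_separators is my own invention
for illustration, not something from the re module, and the pairing loop
assumes the text starts with a word:

```python
import re

def split_words_and_separators(text):
    # Hypothetical helper, not part of the re module.
    # \w+ matches a run of word characters and \W+ a run of anything else,
    # so the alternating matches cover the whole string with nothing dropped.
    return re.findall(r'\w+|\W+', text)

tokens = split_words_and_separators('See Spot. See Spot run.')
print(tokens)
# ['See', ' ', 'Spot', '. ', 'See', ' ', 'Spot', ' ', 'run', '.']

# Look "ahead" from each word to the separator that follows it.
# Assumes the text starts with a word, so words sit at even indexes.
for word, separator in zip(tokens[::2], tokens[1::2]):
    print(repr(word), 'is followed by', repr(separator))
```

Because every character lands in exactly one token, joining the list
back together with ''.join(tokens) gives the original string.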

Anyway, the re.split behavior I described isn't particularly bothering
me, but I do think it should be better documented. Time to submit a doc
patch, methinks...

-- 
Robin Munn <rmunn at pobox.com> | http://www.rmunn.com/ | PGP key 0x6AFB6838
-----------------------------+-----------------------+----------------------
"Remember, when it comes to commercial TV, the program is not the product.
YOU are the product, and the advertiser is the customer." - Mark W. Schumann



