Regexps and lists

John Machin sjmachin at lexicon.net
Sun Feb 11 17:38:40 EST 2007


On Feb 12, 9:08 am, "Paddy" <paddy3... at googlemail.com> wrote:
> I don't know enough to write an R.E. engine so forgive me if I am
> being naive.
> I have had to match text involving lists in the past. These are usually
> comma separated words such as
>  "egg,beans,ham,spam,spam"
> you can match that with:
>  r"(\w+)(,\w+)*"

You *can*, but why do that? What are you trying to achieve? What is
the point of distinguishing the first element from the remainder?

See if any of the following do what you want:

| >>> s = "egg,beans,ham,spam,spam"
| >>> s.split(',')
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> import re
| >>> re.split(r",", s)
| ['egg', 'beans', 'ham', 'spam', 'spam']
| >>> re.split(r"(,)", s)
| ['egg', ',', 'beans', ',', 'ham', ',', 'spam', ',', 'spam']
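
If the elements really are plain \w+ words, re.findall and re.finditer are two more options. This is only a sketch under that assumption; the names `words` and `tail` are made up for illustration. findall returns every non-overlapping match, and finditer lets you pull one group out of each repetition:

```python
import re

s = "egg,beans,ham,spam,spam"

# Every run of word characters, in order of appearance.
words = re.findall(r"\w+", s)

# Only the elements that follow a comma, i.e. everything after the first;
# group(1) is the word without its leading comma.
tail = [m.group(1) for m in re.finditer(r",(\w+)", s)]

print(words)
print(tail)
```

Neither call checks that the string as a whole is a well-formed list; they just collect whatever matches.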

> and when you look at the groups you get the following
> >>> import re
> >>> re.match(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
> ('egg', ',spam')
>
> Notice how you only get the last match as the second groups value.
>
> It would be nice if a repeat operator acting on a group turned that
> group into a sequence returning every match, in order. (or an empty
> sequence for no matches).
>
> The above example would become:
>
>  >>> import re
>  >>> re.newmatch(r"(\w+)(,\w+)*", "egg,beans,ham,spam,spam").groups()
>
> ('egg', ('beans', 'ham', 'spam', ',spam'))

And then what are you going to do with the answer? Something like
this, maybe:

| >>> actual_answer = ('egg', ('beans', 'ham', 'spam', ',spam'))
| >>> [actual_answer[0]] + list(actual_answer[1])
| ['egg', 'beans', 'ham', 'spam', ',spam']
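
If the goal is to confirm that the whole string is a well-formed list and then get every element, one possibility is to let the regex do the validating and let str.split do the extracting. A minimal sketch, not a definitive answer: `LIST_RE` and `parse_list` are hypothetical names, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# Hypothetical helper pattern: one word, then zero or more ",word" repeats.
# The group is non-capturing because we only validate with it.
LIST_RE = re.compile(r"\w+(?:,\w+)*")

def parse_list(text):
    """Return the elements of a comma-separated word list, or None if malformed."""
    if LIST_RE.fullmatch(text) is None:
        return None
    # The pattern has already guaranteed the shape, so a plain split is enough.
    return text.split(",")

print(parse_list("egg,beans,ham,spam,spam"))
print(parse_list("egg,,ham"))
```

That yields a flat list with no leading commas to strip and no first element to glue back on.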


> 1, Is it possible?

Maybe, but I doubt the utility ...

> do any other RE engines do this?

If your Google is not working, then mine isn't either.

> 2, Should it be added to Python?

No.

HTH,

John



