regex: multiple matching for one string

rurpy at yahoo.com rurpy at yahoo.com
Fri Jul 24 00:19:52 EDT 2009


Nick Dumas wrote:
> Agreed. Two string.split()s, first at the semi-colon and then at the
> equal sign, will yield you your value, without having to fool around
> with regexes.
>
> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
>> scriptlearner at gmail.com wrote:
>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>>> will like to take out the values (valuea, valueb, and valuec).  How do
>>> I do that in Python?  The group method will only return the matched
>>> part.  Thanks.
>>>
>>> p = re.compile('#a=*;b=*;c=*;')
>>> m = p.match(line)
>>>         if m:
>>>              print m.group(),
>>
>> IMHO a regex for this is overkill, a combination of string methods such
>> as split and find should suffice.

You're saying that something like the following
is better than the simple regex used by the OP?

[untested]
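values = []
# The trailing ';' leaves an empty fourth element after the split.
parts = line.split(';')
if len(parts) != 4: raise SomeError()
# Check each name=value pair (all but the empty last part) in order.
for p, expected in zip(parts[:-1], ('#a', 'b', 'c')):
    name, x, value = p.partition('=')
    if name != expected or x != '=':
        raise SomeError()
    values.append(value)
print values[0], values[1], values[2]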

Blech, not in my book.  The regex checks the
format of the string, extracts the values, and
does so very clearly.  Further, it is easily
adapted to other similar formats, or to
evolutionary changes in the format.  It is also
(once one is familiar with regexes, a useful
skill outside of Python too) easier to get right,
at least in a simple case like this.
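
For comparison, here is a minimal sketch of the regex version.
It assumes the "=*" in the OP's pattern was meant as "any run
of characters up to the next semicolon"; the pattern itself and
the name extract_values are illustrative, not taken from the
original post.

```python
import re

# Assumed pattern: each value is any run of characters other than ';'.
# One capturing group per value; the literal names enforce the format.
PATTERN = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def extract_values(line):
    """Return a tuple of the three values, or None if the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None

print(extract_values("#a=valuea;b=valueb;c=valuec;"))
# prints ('valuea', 'valueb', 'valuec')
```

Note that match() only anchors at the start of the string, so
trailing text after the last ';' is tolerated; append '$' to the
pattern if that should be rejected.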

The only reason I can think of to prefer
a split-based solution is performance: if this
code were performance-critical, I would expect
the split code to be faster (although I don't
know that for sure).
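
One way to settle the speed question rather than guess is to
time both versions with timeit.  The sketch below is only that:
the function names, the assumed regex pattern, and the use of
ValueError in place of SomeError are all illustrative, and the
numbers will vary by machine and Python version.

```python
import re
import timeit

LINE = "#a=valuea;b=valueb;c=valuec;"
# Assumed pattern: each value is any run of characters other than ';'.
PATTERN = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def with_regex(line):
    # Validate the format and extract the values in one step.
    m = PATTERN.match(line)
    if m is None:
        raise ValueError(line)
    return list(m.groups())

def with_split(line):
    # Split on ';', then check each name=value pair by hand.
    parts = line.split(';')
    if len(parts) != 4:
        raise ValueError(line)
    values = []
    for part, expected in zip(parts[:-1], ('#a', 'b', 'c')):
        name, sep, value = part.partition('=')
        if name != expected or sep != '=':
            raise ValueError(line)
        values.append(value)
    return values

# Both versions should agree before their timings are compared.
assert with_regex(LINE) == with_split(LINE)

for func in (with_regex, with_split):
    seconds = timeit.timeit(lambda: func(LINE), number=100000)
    print(func.__name__, seconds)
```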

This is a perfectly fine use of a regex.


