regex: multiple matching for one string

rurpy at yahoo.com
Fri Jul 24 22:41:50 EDT 2009


Scott David Daniels wrote:
> rurpy at yahoo.com wrote:
>> Nick Dumas wrote:
>>> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
>>>> scriptlearner at gmail.com wrote:
>>>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>>>>> would like to take out the values (valuea, valueb, and valuec).  How do
>>>>> I do that in Python?  The group method will only return the matched
>>>>> part.  Thanks.
>>>>>
>>>>> p = re.compile('#a=*;b=*;c=*;')
>>>>> m = p.match(line)
>>>>>         if m:
>>>>>              print m.group(),
>>>> IMHO a regex for this is overkill, a combination of string methods such
>>>> as split and find should suffice.
>>
>> You're saying that something like the following
>> is better than the simple regex used by the OP?
>> [untested]
>> values = []
>> parts = line.split(';')
>> if len(parts) != 4: raise SomeError()
>> for p, expected in zip (parts[:-1], ('#a','b','c')):
>>     name, x, value = p.partition ('=')
>>     if name != expected or x != '=':
>>         raise SomeError()
>>     values.append (value)
>> print values[0], values[1], values[2]
>
> I call straw man: [tested]
>      line = "#a=valuea;b=valueb;c=valuec;"
>      d = dict(single.split('=', 1)
>               for single in line.split(';') if single)
>      d['#a'], d['b'], d['c']
> If you want checking code, add:
>      if len(d) != 3:
>          raise ValueError('Too many keys: %s in %r' % (
>                               sorted(d), line))

OK, that seems like a good solution.  It certainly
wasn't an obvious solution to me.  I still have no
problem maintaining that

      [tested]
      import re
      line = "#a=valuea;b=valueb;c=valuec;"
      m = re.match('#a=(.*);b=(.*);c=(.*);', line)
      m.groups()
(If you want checking code, nothing else is required: on a
malformed line re.match returns None and the m.groups() call fails.)

is simpler and clearer (with the obvious caveat
that one must be familiar with regexes).
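For comparison, here is a minimal Python 3 sketch of the regex approach with the original pattern's problems fixed. In '#a=*;b=*;c=*;' each `=*` matches only a run of '=' characters rather than the value, and there are no capturing groups, so there is nothing to return but the whole match. The sketch uses [^;]* so a value cannot run past the next semicolon, and raises ValueError on a malformed line rather than relying on None.groups() failing. The names LINE_RE, parse_line and parse_pairs are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical pattern name. Each value is "anything up to the next
# semicolon"; [^;]* cannot cross a delimiter, whereas a greedy .* could
# swallow a later ";b=" if one appeared in the text.
LINE_RE = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')


def parse_line(line):
    """Return (a, b, c) from '#a=...;b=...;c=...;' or raise ValueError.

    parse_line is a hypothetical helper used for illustration.
    """
    m = LINE_RE.fullmatch(line)  # the whole line must match the format
    if m is None:
        # fullmatch returns None on failure; calling .groups() on None
        # would raise AttributeError instead of a clear error message.
        raise ValueError('malformed line: %r' % line)
    return m.groups()  # every captured group, in order


def parse_pairs(line):
    """Hypothetical adapted form: any number of name=value; pairs."""
    return dict(re.findall(r'([^;=]+)=([^;]*);', line))


print(parse_line("#a=valuea;b=valueb;c=valuec;"))
# ('valuea', 'valueb', 'valuec')
print(parse_pairs("#a=valuea;b=valueb;c=valuec;"))
# {'#a': 'valuea', 'b': 'valueb', 'c': 'valuec'}
```

parse_pairs shows the kind of adaptation to similar formats mentioned above: with re.findall the same idea handles any number of name=value pairs. Note that Match.groups() takes an optional default for groups that did not participate in the match, not a list of group numbers; m.group(1, 2, 3) is the form that selects specific groups.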

>> Blech, not in my book.  The regex checks the
>> format of the string, extracts the values, and
>> does so very clearly.  Further, it is easily
>> adapted to other similar formats, or evolutionary
>> changes in format.  It is also (once one is
>> familiar with regexes -- a useful skill outside
>> of Python too) easier to get right (at least in
>> a simple case like this.)
> The posted regex doesn't work; this might be homework, so
> I'll not fix the two problems.  The fact that you did not
> see the failure weakens your claim of "does so very clearly."

Fact? Maybe you should have read the whole thread before
spewing claims that I did not see the regex problem.
The fact that you did not bother to do so weakens any
claims you make in this thread.
(Of course this line of argumentation is stupid anyway --
even had I not noticed the problem, it would say nothing
about the general case.  My advice to you is not to try
to extrapolate when the sample size is one.)
