Help with regular expression in python

rurpy at yahoo.com rurpy at yahoo.com
Fri Aug 19 15:55:01 EDT 2011


On 08/19/2011 11:33 AM, Matt Funk wrote:
> On Friday, August 19, 2011, Alain Ketterlin wrote:
>> Matt Funk <matze999 at gmail.com> writes:
>> > thanks for the suggestion. I guess i had found another way around the
>> > problem as well. But i really wanted to match the line exactly and i
>> > wanted to know why it doesn't work. That is less for the purpose of
>> > getting the thing to work and more because it greatly annoys me that
>> > i can't figure out why it doesn't work. I.e. why the expression does
>> > not match {32} times. I just don't get it.
>>
>> Because a line is not 32 times a number, it is a number followed by 31
>> times "a space followed by a number". Using Jason's regexp, you can
>> build the regexp step by step:
>>
>> number = r"\d\.\d+e\+\d+"
>> numbersequence = r"%s( %s){31}" % (number,number)
> That didn't work either. Using the modified expression (where the (.+)
> matches the end of the line):
>
> number = r"\d\.\d+e\+\d+"
> numbersequence = r"%s( %s){31}(.+)" % (number,number)
> instance_linetype_pattern = re.compile(numbersequence)
>
> The results obtained are:
> results:
> [(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]
> so this matches the last number plus the string at the end of the line, but
> without retaining the previous numbers.

The explanation is buried rather unobtrusively in the re
docs, where it has caught me out in the past.  Specifically,
in the docs for the match object's group() method:

  "If a group is contained in a part of the pattern that
  matched multiple times, the last match is returned."
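
A tiny demonstration of that behaviour, as a sketch on a made-up
four-number string rather than the real 32-number line:

```python
import re

# One capturing group, repeated three times by the {3} quantifier.
m = re.match(r"\d( \d){3}", "1 2 3 4")

whole = m.group(0)   # the entire match
last = m.group(1)    # only the final repetition of the group survives
print(repr(whole), repr(last), m.groups())
```

The pattern contains a single pair of parentheses, so there is a
single group, no matter how many times the quantifier repeats it.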

In addition to the findall solution someone else
posted, another thing you could do is to explicitly
express the groups in your re:

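  number = r"\d\.\d+e\+\d+"
  # 31 separate groups, one per following number (space kept outside)
  groups = (r" (%s)" % number)*31
  # group the first number too, so that groups 1..32 are the numbers
  numbersequence = r"(%s)%s(.+)" % (number,groups)
  ...
  # group() takes the group numbers as separate arguments, not a range
  results = match_object.group(*range(1,33))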

Or (what I would probably do), simply match the
whole string of numbers and pull it apart later:

  number = r"\d\.\d+e\+\d+"
  numbersequence = r"(%s(?: %s){31})(.+)" % (number,number)
  results = (match_object.group(1)).split()

[none of this code is tested but should be close
enough to convey the general idea.]
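
For anyone who wants something they can paste and run, here is a
sketch of both approaches side by side.  The sample line is made up
(32 values formatted with "%.6e", followed by the trailing text from
the results quoted above), and the names sample_line, pattern_a and
pattern_b are mine, not from the original code.  The first number is
captured as well, so that all 32 come back.

```python
import re

number = r"\d\.\d+e\+\d+"

# Hypothetical input: 32 numbers, then the trailing description.
tail_text = " : (instance: 0)\t:\tsome description"
sample_line = " ".join("%.6e" % v for v in range(1, 33)) + tail_text

# Approach A: spell out one capturing group per number.
groups = (r" (%s)" % number) * 31
pattern_a = re.compile(r"(%s)%s(.+)" % (number, groups))
match_a = pattern_a.match(sample_line)
# group() wants the group numbers as separate arguments, hence the *.
numbers_a = match_a.group(*range(1, 33))
tail_a = match_a.group(33)

# Approach B: capture the whole run of numbers, split it afterwards.
pattern_b = re.compile(r"(%s(?: %s){31})(.+)" % (number, number))
match_b = pattern_b.match(sample_line)
numbers_b = match_b.group(1).split()
tail_b = match_b.group(2)

print(len(numbers_a), len(numbers_b))
print(numbers_b[0], numbers_b[-1])
```

Approach A returns a tuple and approach B a list, but the strings in
them should be the same.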


