Regular expressions, help?

azrazer azra at glop.com
Thu Apr 19 09:15:22 EDT 2012


On 19/04/2012 14:02, Sania wrote:
> On Apr 19, 2:48 am, Jussi Piitulainen<jpiit... at ling.helsinki.fi>
[...]
>>>   text="accounts put the death toll at 637 and those missing at
>>> 653 , but the total number is likely to be much bigger"
>>>        dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
>>>        deadnum=dead.group(1)
>>>        deaths.append(deadnum)
>>>        print deaths
>>
>> It's the regexp. The .* after "death toll" eats the input as far as it
>> can without making the whole match fail. The group matches only the
>> last digit in the text.
>>
>> You could allow only non-digits before the number. Or you could look
>> up the variant of * that only matches as much as it must.
>
> Hey Thanks,
> So now my regex is
>
>      dead=re.match(r".*death toll.{0,20}(\d[,\d\.]*)", text)
Hi,
With that change, your regex matches:
<anything>death toll<up to 20 characters of anything>, followed by what
you capture (at least one digit).
There are at least two issues here:
  - the number of characters between "death toll" and the figure may be > 20
  - your {0,20} is greedy => .{0,20} matches as many characters as it can
while still leaving one digit for (\d[,\d\.]*), since your group only
requires a single digit, optionally followed by digits, commas or dots
     =====> so " at 63" is swallowed by .{0,20} and (\d[,\d\.]*) matches
the remaining digit "7"
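
The greedy behaviour described above can be checked directly. This is only
a small sketch in Python 3 syntax: the extra parentheses around .{0,20} and
the names "swallowed" and "number" are mine, added for illustration, and I
am assuming the wrapped text line is joined with a single space.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Extra parentheses around .{0,20} (illustration only) expose what the
# greedy quantifier consumes before the number group gets its turn.
m = re.match(r".*death toll(.{0,20})(\d[,\d\.]*)", text)
swallowed, number = m.group(1), m.group(2)

print(repr(swallowed), repr(number))
```

The first group shows how much .{0,20} took, and the second shows what was
left for the number.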

A solution would be to follow what Jussi suggested and allow only
non-digits before the number:
=> dead=re.match(r".*death toll\D*(\d*)", text)
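
Both of Jussi's suggestions can be tried side by side. Again this is just a
sketch in Python 3 syntax, not the only way to do it: the variable names are
mine, I use re.search so the leading .* is not needed, and I wrote \d+
instead of \d* so that an empty capture cannot count as a match.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Variant 1: allow only non-digits between "death toll" and the number.
by_non_digits = re.search(r"death toll\D*(\d+)", text)

# Variant 2: the lazy quantifier .*? matches as little as it must, and the
# original group still accepts commas and dots inside the number.
by_lazy = re.search(r"death toll.*?(\d[,\d.]*)", text)

print(by_non_digits.group(1), by_lazy.group(1))
```

Variant 1 stops at the first digit by construction; variant 2 keeps the
original capture group, which matters if the figures may contain separators.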
>
> But I only find 7 not 637. How is it that the group is only matching
> the last digit?
=> because .{0,20} is greedy
> The whole thing is in parentheses, not just the last part?
Yes, but only one digit remains by the time your group gets to match...

Good luck understanding regexes, they are a powerful tool! :)

best,
azra.



