Regular expressions, help?

Jussi Piitulainen jpiitula at ling.helsinki.fi
Thu Apr 19 02:48:02 EDT 2012


Sania writes:

> So I am trying to get the number of casualties in a text. The number I
> need comes after 'death toll', as you can see from the variable called
> text. I'm pretty sure my regex is correct; I think it's the group part
> that's the problem.
> I am using NLTK with Python. group() grabs the string in parentheses and
> stores it in deadnum, and I append deadnum to a list. Here is my code:
> 
>  text="accounts put the death toll at 637 and those missing at 653 , but the total number is likely to be much bigger"
>  dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
>  deadnum=dead.group(1)
>  deaths.append(deadnum)
>  print deaths

It's the regexp. The .* after "death toll" eats the input as far as it
can without making the whole match fail, so the group matches only the
last digit in the text (the 3 of 653).
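A minimal sketch reproducing the problem, assuming Python 3 and the quoted string rejoined onto one line (the original snippet uses the Python 2 print statement; print() below works in both):

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# The greedy .* after "death toll" first runs to the end of the string,
# then gives characters back one at a time until (\d[,\d\.]*) can match.
# Scanning backwards, the first digit it reaches is the final "3" of "653".
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text)
print(greedy.group(1))  # prints 3
```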

You could allow only non-digits before the number. Or you could look
up the non-greedy variant of *, written *?, which matches only as much
as it must.
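Both suggestions can be sketched like this, again assuming Python 3 and the rejoined string; only the part of the pattern after "death toll" changes:

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Option 1: allow only non-digits (\D) between "death toll" and the number,
# so the match cannot skip past the first number.
non_digits = re.match(r".*death toll\D*(\d[,\d\.]*)", text)

# Option 2: the non-greedy .*? consumes as little as possible, stopping at
# the first place where (\d[,\d\.]*) can match, i.e. just before "637".
lazy = re.match(r".*death toll.*?(\d[,\d\.]*)", text)

deaths = []
deaths.append(lazy.group(1))
print(non_digits.group(1), lazy.group(1))  # prints 637 637
print(deaths)  # prints ['637']
```

The \D* version states the constraint explicitly; the *? version keeps the original pattern and changes only the quantifier.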


