Regular expressions, help?

Jon Clements joncle at googlemail.com
Thu Apr 19 09:52:33 EDT 2012


On Thursday, 19 April 2012 07:11:54 UTC+1, Sania  wrote:
> Hi,
> So I am trying to get the number of casualties in a text. After 'death
> toll' in the text the number I need is presented as you can see from
> the variable called text. Here is my code
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.
> I am using nltk by python. Group grabs the string in parenthesis and
> stores it in deadnum and I make deadnum into a list.
> 
>  text="accounts put the death toll at 637 and those missing at
> 653 , but the total number is likely to be much bigger"
>       dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
>       deadnum=dead.group(1)
>       deaths.append(deadnum)
>       print deaths
> 
> Any help would be appreciated,
> Thank you,
> Sania

Or just don't rely fully on a regex. For the sake of time, and the little sanity I believe I have left, I would just do something like:

death_toll = re.search(r'death toll.*?\d+', text).group().rsplit(' ', 1)[1]
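
For anyone wanting to stay with a capturing group, here is a minimal sketch of the same idea. It assumes the text sits on a single line and that the figure wanted is the first number after 'death toll'. The helper name death_toll() is made up for illustration. The point is the lazy `.*?`: a greedy `.*` swallows as much as it can and hands only the last digit on the line back to the group, which is why group(1) in the original attempt comes out as a single digit rather than 637.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")


def death_toll(text):
    """Return the first number after 'death toll' as a string, or None.

    Hypothetical helper, for illustration only.
    """
    # .*? is lazy, so it stops at the first digit instead of the last one.
    match = re.search(r"death toll.*?(\d[\d,.]*)", text)
    if match is None:
        return None
    # Drop trailing punctuation picked up by the character class.
    return match.group(1).rstrip(",.")


print(death_toll(text))
```

Checking for None before calling group() avoids an AttributeError on texts that never mention a death toll.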

hth,

Jon.


