Python Regex Question

Gerardo Herzig gherzig at fmed.uba.ar
Thu Sep 20 17:01:38 EDT 2007


joemystery123 at gmail.com wrote:

>I need to extract the number on each <td tags from a html file.
>
>i.e 49.950 from the following:
>
><td align=right width=80><font size=2 face="New Times
>Roman,Times,Serif"> 49.950 </font></td>
>
>The actual number between:  49.950  can be any number of
>digits before decimal and after decimal.
>
><td align=right width=80><font size=2 face="New Times
>Roman,Times,Serif"> ######.#### </font></td>
>
>How can I just extract the real/integer number using regex?
>
>  
>
If every td's content follows the  [value_to_extract]  pattern, 
things get simpler.

[untested]

/<td.* ([^&]*) /

The parentheses form a capture group; in Python, calling group(1) on the 
match object extracts just the part you really want.
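
For what it's worth, here is a rough sketch of the same idea using Python's re module. The slashes in the pattern above are only delimiters and are not part of a Python pattern. The sketch assumes the value is padded with whitespace and/or literal "&nbsp;" entities (the [^&] in the pattern above hints at the latter) and that it is a plain non-negative integer or decimal. The names PAD, NUMBER_IN_TD and extract_numbers are made up for illustration.

```python
import re

# Assumption: the value is padded by whitespace and/or literal "&nbsp;" entities.
PAD = r"(?:&nbsp;|\s)*"

NUMBER_IN_TD = re.compile(
    r"<td\b[^>]*>"         # opening <td ...> tag
    r"(?:\s*<[^>]+>)*"     # any nested tags such as <font ...>, possibly spanning lines
    + PAD +
    r"(\d+(?:\.\d+)?)"     # group 1: digits, optionally followed by .digits
    + PAD +
    r"<",                  # the next tag starts right after the padding
    re.IGNORECASE,
)


def extract_numbers(html):
    """Return the number found in each matching <td> cell, as strings."""
    # With exactly one capture group, findall() returns the group's text per match.
    return NUMBER_IN_TD.findall(html)


sample = ('<td align=right width=80><font size=2 face="New Times\n'
          'Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>')
print(extract_numbers(sample))
```

Convert the results with float() if you need numbers rather than strings. Signed values or thousands separators would need a wider group 1.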

Cheers
Gerardo


