Python Regex Question

Ivo ivonet at gmail.com
Fri Sep 21 15:09:22 EDT 2007


crybaby wrote:
> On Sep 20, 4:12 pm, Tobiah <t... at tobiah.org> wrote:
>> joemystery... at gmail.com wrote:
>>> I need to extract the number on each <td tags from a html file.
>>> i.e 49.950 from the following:
>>> <td align=right width=80><font size=2 face="New Times
>>> Roman,Times,Serif"> 49.950 </font></td>
>>> The actual number between:  49.950  can be any number of
>>> digits before decimal and after decimal.
>>> <td align=right width=80><font size=2 face="New Times
>>> Roman,Times,Serif"> ######.#### </font></td>
>>> How can I just extract the real/integer number using regex?
>> '[0-9]*\.[0-9]*'
>>
>> --
>> Posted via a free Usenet account fromhttp://www.teranews.com
> 
> I am trying to use BeautifulSoup:
> 
>     soup = BeautifulSoup(page)
> 
>     td_tags = soup.findAll('td')
>     i=0
>     for td in td_tags:
>         i = i+1
>         print "td: ", td
>         # re.search('[0-9]*\.[0-9]*', td)
>         price = re.compile('[0-9]*\.[0-9]*').search(td)
> 
> I am getting an error:
> 
>            price= re.compile('[0-9]*\.[0-9]*').search(td)
> TypeError: expected string or buffer
> 
> Does beautiful soup returns array of objects? If so, how do I pass
> "td" instance as string to re.search?  What is the different between
> re.search vs re.compile().search?
> 

I don't know anything about BeautifulSoup, but to the other questions:

var = re.compile(regexpr) compiles the expression once; after that you can
use var as a reference to the compiled expression, which is cheaper when
you search many times.

re.search(expr, string) takes the pattern as a string on every call. The
re module caches recently compiled patterns, so the extra cost is mostly a
lookup, but it can still add up if you use the expression a lot of times.

The way you use it (compiling again on every pass through the loop), it
doesn't matter which one you pick.
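
To make the comparison concrete, here is a small sketch of the two forms side by side. The sample string is modelled on the markup in the original question and is only an assumption for illustration; both calls should find the same text.

```python
import re

# Sample cell text modelled on the original question (illustrative only).
cell = '<font size=2 face="New Times Roman,Times,Serif"> 49.950 </font>'

# Module-level function: the pattern string is passed in on every call.
m1 = re.search(r'[0-9]*\.[0-9]*', cell)

# Compiled form: build the pattern object once, then call its method.
pattern = re.compile(r'[0-9]*\.[0-9]*')
m2 = pattern.search(cell)

# Both are match objects; group(0) is the matched text.
print(m1.group(0), m2.group(0))
```

Either way you get a match object back (or None if nothing matched), so check for None before calling group().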

Do:
import re
pattern = re.compile(r'[0-9]*\.[0-9]*')   # raw string keeps the backslash as-is
result = pattern.findall(text)            # text must be a string

Now you can reuse pattern.
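
A rough sketch of what reuse could look like inside a loop, including the "expected string or buffer" problem. The Cell class below is a hypothetical stand-in for whatever object the parser returns, not a BeautifulSoup class, and I am assuming that str() on the real object gives back its markup.

```python
import re

# Compile once, outside the loop.
pattern = re.compile(r'[0-9]*\.[0-9]*')

class Cell:
    """Hypothetical stand-in for a parsed <td> object (not a real library class)."""
    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup

cells = [
    Cell('<td> 49.950 </td>'),
    Cell('<td> 1234.5678 </td>'),
    Cell('<td> n/a </td>'),
]

prices = []
for cell in cells:
    # search() needs a string, so convert the object first.
    match = pattern.search(str(cell))
    if match:
        prices.append(match.group(0))

print(prices)
```

Passing the object itself instead of str(cell) is what raises the TypeError, because the pattern methods only accept strings.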

Cheers,
Ivo.


