regex help

Chris Rebert clp2 at rebertia.com
Wed Jul 8 18:37:48 EDT 2009


On Wed, Jul 8, 2009 at 3:06 PM, David<david.bramer at googlemail.com> wrote:
> Hi
>
> I have a few regexes I need to write, but I'm struggling to come up with a
> nice way of doing them, and more than anything am here to learn some
> tricks and some neat code rather than getting an answer - although
> that's obviously what I would like to get to.
>
> Problem 1 -
>
> <span class="chg"
>                id="ref_678774_cp">(25.47%)</span><br>
>
> I want to extract 25.47 from here - so far I've tried -
>
> xPer = re.search('<span class="chg" id="ref_"'+str(xID.group(1))+'"_cp
> \">(.*?)%', content)
>
> and
>
> xPer = re.search('<span class=\"chg\" id=\"ref_"+str(xID.group(1))+"_cp
> \">\((\d*)%\)</span><br>', content)
>
> neither of these seem to do what I want - am I not doing this
> correctly? (obviously!)
>
> Problem 2 -
>
> <td> </td>
>
> <td width="1%" class=key>Open:
> </td>
> <td width="1%" class=val>5.50
> </td>
> <td> </td>
> <td width="1%" class=key>Mkt Cap:
> </td>
> <td width="1%" class=val>6.92M
> </td>
> <td> </td>
> <td width="1%" class=key>P/E:
> </td>
> <td width="1%" class=val>21.99
> </td>
>
>
> I want to extract the Open, Mkt Cap and P/E values - but apart from
> writing loads of individual REs, which I think would look messy, I can't
> think of a better and neater-looking way. Any ideas?

Use an actual HTML parser? Like BeautifulSoup
(http://www.crummy.com/software/BeautifulSoup/), for instance.
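
A minimal sketch of the parser-based approach, assuming the page markup
looks like the two fragments quoted above. To stay runnable without
third-party packages it uses the standard library's html.parser instead of
BeautifulSoup; the idea is the same. The names QuoteParser and SAMPLE are
made up for illustration and are not from the original post.

```python
from html.parser import HTMLParser

# Hypothetical sample assembled from the two fragments quoted above.
SAMPLE = """
<span class="chg"
               id="ref_678774_cp">(25.47%)</span><br>
<td> </td>
<td width="1%" class=key>Open:
</td>
<td width="1%" class=val>5.50
</td>
<td> </td>
<td width="1%" class=key>Mkt Cap:
</td>
<td width="1%" class=val>6.92M
</td>
<td> </td>
<td width="1%" class=key>P/E:
</td>
<td width="1%" class=val>21.99
</td>
"""


class QuoteParser(HTMLParser):
    """Collects the percentage change and the key/val table cells."""

    def __init__(self):
        super().__init__()
        self.changes = {}          # span id -> percentage as a float
        self.fields = {}           # key cell text -> val cell text
        self._inside = None        # "chg", "key", "val", or None
        self._span_id = None
        self._pending_key = None   # last key cell seen, awaiting its value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "span" and cls == "chg":
            self._inside = "chg"
            self._span_id = attrs.get("id")
        elif tag == "td" and cls in ("key", "val"):
            self._inside = cls

    def handle_endtag(self, tag):
        if tag in ("span", "td"):
            self._inside = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._inside is None:
            return
        if self._inside == "chg":
            # "(25.47%)" -> 25.47
            self.changes[self._span_id] = float(text.strip("()%"))
        elif self._inside == "key":
            self._pending_key = text.rstrip(":")
        elif self._inside == "val" and self._pending_key is not None:
            self.fields[self._pending_key] = text
            self._pending_key = None


parser = QuoteParser()
parser.feed(SAMPLE)
parser.close()
print(parser.changes)
print(parser.fields)
```

Note that the id attribute in the quoted span sits on its own line after a
run of whitespace. A pattern that expects a single space between the
attributes will not match that, whereas a parser does not care how the
attributes are laid out. The three table values also come out of one pass
instead of three separate patterns.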

I will never understand why so many people try to parse/scrape
HTML/XML with regexes...

Cheers,
Chris
-- 
http://blog.rebertia.com


