Regular expression issue
Marc 'BlackJack' Rintsch
bj_666 at gmx.net
Wed Jul 19 07:02:23 EDT 2006
In <1153304898.226689.254330 at m79g2000cwm.googlegroups.com>, dmbkiwi wrote:
> I'm trying to parse a line of html as follows:
>
> <td style="width:20%" align="left">101.120:( KPA (-)</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
>
> however, sometimes it looks like this:
>
> <td style="width:20%" align="left">N/A</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
>
>
> I want to get either the numerical value 101.120 (which could be a
> different number depending on the data that's been fed into the page),
> or, in the second case, 'N/A'.
>
> The regexp I'm using is:
>
> .*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround
>
> Can someone help me debug this. It's not picking up the number, and
> I'm not sure I've got the syntax for '|' right, but can't find a
> detailed tutorial on how to use |.
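A short illustration of the two points behind the question, using only the standard `re` module. `|` has the lowest precedence of any regex operator, so outside parentheses it splits the entire pattern into alternatives: the `|` inside `(?P<baro>...)` is scoped by its group, but the one before `\sKPA` is not. Also, `\d` matches digits only, so `\d+?` stops at the '.' in 101.120. And in the first sample line the number is followed by `:( KPA`, not by `</td>`, so `(?P<baro>\d+?|N/A)</td>` cannot match it there. The variable `line` is just the first sample line copied from the question.

```python
import re

# "|" has the lowest precedence: r"ab|cd" means "ab" OR "cd",
# not "a", then "b or c", then "d".
print(re.fullmatch(r"ab|cd", "abd"))                  # None
# Parentheses limit how far the alternation reaches.
print(re.fullmatch(r"a(?:b|c)d", "abd") is not None)  # True

# \d matches digits only, so it stops at the decimal point.
print(re.match(r"\d+", "101.120").group())            # 101
# A character class that also allows "." captures the whole value.
print(re.match(r"[\d.]+", "101.120").group())         # 101.120

# The original group is followed by </td>, which never comes right
# after the number in the first sample line.
line = '<td style="width:20%" align="left">101.120:( KPA (-)</td>'
print(re.search(r'"left">(?P<baro>\d+?|N/A)</td>', line))  # None
```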
What about something like
align="left">((?P<baro>[\d.]+):\(\sKPA)|(?P<na>N/A).*Ground\)
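You need the re.DOTALL flag when compiling the regular expression, so
that '.' also matches the newline between the two <td> lines.
re.MULTILINE does no harm, but it only changes how '^' and '$' behave.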
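A small check of which flag matters, on a shortened version of the N/A sample (the strings `text` and `pattern` are illustrative stand-ins). Without re.DOTALL, `.` stops at the newline, so `.*` cannot reach `Ground)` on the next line; re.MULTILINE alone does not change that, because it only affects `^` and `$`.

```python
import re

# Shortened version of the N/A sample: the match has to cross a newline.
text = 'align="left">N/A</td>\n<td align="left">Snow on Ground)0 </td>'
pattern = r"N/A.*Ground\)"

# Default: "." does not match "\n", so ".*" cannot reach the next line.
print(re.search(pattern, text))                # None
# MULTILINE only changes how ^ and $ behave, so it does not help here.
print(re.search(pattern, text, re.MULTILINE))  # None
# DOTALL lets "." match "\n" as well, so the match spans both lines.
print(re.search(pattern, text, re.DOTALL).group())
```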
You'll have to check the 'baro' and 'na' groups to decide if it matched a
numerical value or 'N/A'.
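Putting it together, a minimal sketch that compiles the suggested pattern with re.DOTALL and checks which named group was filled. The helper name `extract_pressure` and the sample strings are illustrative assumptions, not part of the original code.

```python
import re

# The pattern suggested above, compiled with re.DOTALL so that ".*" can
# cross the newline between the two <td> lines.
PATTERN = re.compile(
    r'align="left">((?P<baro>[\d.]+):\(\sKPA)|(?P<na>N/A).*Ground\)',
    re.DOTALL,
)


def extract_pressure(html):
    """Return the pressure as a string, 'N/A', or None if nothing matched.

    extract_pressure is a hypothetical helper used for illustration.
    """
    match = PATTERN.search(html)
    if match is None:
        return None
    # Only one of the two named groups is set, depending on which
    # side of the top-level "|" matched.
    if match.group("baro") is not None:
        return match.group("baro")
    return match.group("na")


with_value = (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)
not_available = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)

print(extract_pressure(with_value))     # 101.120
print(extract_pressure(not_available))  # N/A
```

The value comes back as a string; converting it with float() is left to the caller, since 'N/A' has to be handled separately anyway.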
Ciao,
Marc 'BlackJack' Rintsch
More information about the Python-list mailing list