Regular expression issue

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Wed Jul 19 07:02:23 EDT 2006


In <1153304898.226689.254330 at m79g2000cwm.googlegroups.com>, dmbkiwi wrote:

> I'm trying to parse a line of html as follows:
> 
> <td style="width:20%" align="left">101.120:( KPA (-)</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
> 
> however, sometimes it looks like this:
> 
> <td style="width:20%" align="left">N/A</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
> 
> 
> I want to get either the numerical value 101.120 (which could be a
> different number depending on the data that's been fed into the page),
> or in terms of the second option, 'N/A'.
> 
> The regexp I'm using is:
> 
> .*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround
> 
> Can someone help me debug this.  It's not picking up the number, and
> I'm not sure I've got the syntax for '|' right, but can't find a
> detailed tutorial on how to use |.

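What about something like

   align="left">(?:(?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*?Ground\)

The '|' has the lowest precedence of all regexp operators, so the
alternatives are wrapped in (?:...) to make the align="left"> prefix and
the trailing .*?Ground\) apply to both branches.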

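You need the flag re.DOTALL when compiling the regular expression, so that
'.' also matches the newline between the two <td> lines.  re.MULTILINE
isn't needed here because it only changes the meaning of ^ and $, which
the pattern doesn't use.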
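
A small sketch of what the flags do here.  The sample text is copied from
the question; the variable names are only for illustration.  Only the
'N/A ... Ground)' part of the pattern is used, because that is the part
which has to cross the line break:

```python
import re

# The two lines from the question, joined by a newline.
html = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)

tail = r'N/A.*Ground\)'

# Without DOTALL, '.' stops at the newline, so 'Ground)' is never reached.
without_dotall = re.search(tail, html)

# MULTILINE only affects ^ and $, so it does not help either.
with_multiline = re.search(tail, html, re.MULTILINE)

# With DOTALL, '.' matches the newline too and the search succeeds.
with_dotall = re.search(tail, html, re.DOTALL)

print(without_dotall, with_multiline, with_dotall is not None)
```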

You'll have to check the 'baro' and 'na' groups to decide if it matched a
numerical value or 'N/A'.
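
Checking the groups could look roughly like this.  It's only a sketch:
parse_pressure and PRESSURE_RE are names made up for the example, the
pattern has the alternation wrapped in (?:...) so the prefix and the
trailing part apply to both branches, and returning None for 'N/A' is
just one possible convention:

```python
import re

# Hypothetical names; DOTALL lets '.' cross the newline between the cells.
PRESSURE_RE = re.compile(
    r'align="left">(?:(?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*?Ground\)',
    re.DOTALL,
)


def parse_pressure(html):
    """Return the pressure as a float, or None if the cell says 'N/A'."""
    match = PRESSURE_RE.search(html)
    if match is None:
        raise ValueError('no pressure cell found')
    # Exactly one of the two named groups took part in the match;
    # the other one is None.
    if match.group('baro') is not None:
        return float(match.group('baro'))
    return None


numeric_sample = (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)
na_sample = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)

print(parse_pressure(numeric_sample))
print(parse_pressure(na_sample))
```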

Ciao,
	Marc 'BlackJack' Rintsch


