Why does it NOT work on Linux?

Eddie Corns eddie at holyrood.ed.ac.uk
Fri Feb 1 10:03:32 EST 2002


"A" <printers at sendme.cz> writes:

>I have the following part of program that finds ItemID numbers.
>Here, for example, are two
>146759 and 146700 .
>This program works well under windows but on Linux it does not 
>find any number. Can you please help?
>Thanks.
>Ladislav

>####################
>import re
>Text="""<tr BGCOLOR="#FFFFFF">
>                      <td valign="top" align="left"><a 
>href="lead.asp?ItemID=146759">[CN] Oak, Foiled & Antique 
>Furniture</a></td>
>                      <td valign="top" align="center">18/12/2001</td>
>                    </tr><tr BGCOLOR="#FFFFFF">
>                    <td valign="top" align="left"><a 
>href="lead.asp?ItemID=146700">[CN] Oak, Foiled & Antique 
>Furniture</a></td>
>                      <td valign="top" align="center">18/12/2001</td>
>                    </tr>"""

>IDs=re.compile('.*<a href="lead.asp\?ItemID=(\d{5,10}).*')
>Results=re.findall(IDs,Text)
>print Results

>##############

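Using the text exactly as above, it won't work because the strings you're
searching for have newlines in the middle of them: a line break falls between
'<a' and 'href', while the pattern expects exactly one space there.  Changing
either the compiled pattern (to accept any whitespace) or the text (to remove
the line breaks) makes it work fine with Python 2.1 on Linux.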
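
As a rough illustration of the "change the pattern" option, here is a minimal sketch. It assumes Python 3 rather than the Python 2.1 used in the thread, and the variable names and the trimmed sample text are illustrative, not taken from the original program. The only substantive change is `\s+` in place of the literal space, so that a newline between `<a` and `href` is accepted.

```python
import re

# Trimmed sample with a line break between "<a" and "href",
# mirroring the wrapped text in the quoted message.
text = (
    '<tr BGCOLOR="#FFFFFF">\n'
    '<td valign="top" align="left"><a\n'
    'href="lead.asp?ItemID=146759">[CN] Oak, Foiled & Antique\n'
    'Furniture</a></td>\n'
    '</tr><tr BGCOLOR="#FFFFFF">\n'
    '<td valign="top" align="left"><a\n'
    'href="lead.asp?ItemID=146700">[CN] Oak, Foiled & Antique\n'
    'Furniture</a></td>\n'
    '</tr>'
)

# \s+ matches any run of whitespace, including newlines.
# A raw string keeps the backslashes intact for the regex engine.
item_id_pattern = re.compile(r'<a\s+href="lead\.asp\?ItemID=(\d{5,10})')

results = item_id_pattern.findall(text)
print(results)  # ['146759', '146700']
```

The leading and trailing `.*` from the original pattern are dropped here because `findall` already scans the whole string for every match.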

You probably don't want to hear people saying that this is a really hairy way
of parsing HTML anyway, or talking about htmllib etc., but this is exactly the
sort of problem that makes it better to do it 'properly' :)
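
For what doing it 'properly' might look like, here is a hedged sketch. It uses `html.parser.HTMLParser` from the Python 3 standard library as a stand-in for the `htmllib` module mentioned above (an assumption on my part, since `htmllib` only exists in Python 2). The class name `ItemIDCollector` and the sample text are hypothetical. Because the parser tokenises tags itself, line breaks inside a tag no longer matter.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ItemIDCollector(HTMLParser):
    """Collect ItemID query values from the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.item_ids = []

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Split off the query string and read the ItemID parameter, if any.
        query = parse_qs(urlparse(href).query)
        self.item_ids.extend(query.get("ItemID", []))


# Same shape as the quoted text: a newline sits between "<a" and "href".
html_text = (
    '<tr BGCOLOR="#FFFFFF">\n'
    '<td valign="top" align="left"><a\n'
    'href="lead.asp?ItemID=146759">[CN] Oak, Foiled & Antique\n'
    'Furniture</a></td>\n'
    '</tr><tr BGCOLOR="#FFFFFF">\n'
    '<td valign="top" align="left"><a\n'
    'href="lead.asp?ItemID=146700">[CN] Oak, Foiled & Antique\n'
    'Furniture</a></td>\n'
    '</tr>'
)

collector = ItemIDCollector()
collector.feed(html_text)
collector.close()
print(collector.item_ids)  # ['146759', '146700']
```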

Eddie


