Why it does NOT work on Linux ?

Markus Schönhaber mks99 at t-online.de
Sat Feb 2 14:25:59 EST 2002


> I have the following part of program that finds ItemID numbers.
> Here, for example, are two
> 146759 and 146700 .
> This program works well under windows but on Linux it does not
> find any number. Can you please help?
> Thanks.
> Ladislav
>
> ####################
> import re
> Text="""<tr BGCOLOR="#FFFFFF">
>                       <td valign="top" align="left"><a
> href="lead.asp?ItemID=146759">[CN] Oak, Foiled & Antique
> Furniture</a></td>
>                       <td valign="top" align="center">18/12/2001</td>
>                     </tr><tr BGCOLOR="#FFFFFF">
>                     <td valign="top" align="left"><a
> href="lead.asp?ItemID=146700">[CN] Oak, Foiled & Antique
> Furniture</a></td>
>                       <td valign="top" align="center">18/12/2001</td>
>                     </tr>"""
>
> IDs=re.compile('.*<a href="lead.asp\?ItemID=(\d{5,10}).*')
> Results=re.findall(IDs,Text)
> print Results

The interesting thing is that it works for you at all. It definitely
doesn't on my WinXP machine.

1.) Here
> IDs=re.compile('.*<a href="lead.asp\?ItemID=(\d{5,10}).*')
> Results=re.findall(IDs,Text)

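you call the module-level re.findall with an already compiled regular
expression object as its first parameter. That parameter is normally a
pattern string; current versions of the re module also accept a compiled
object there, but the clearer way is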

Results = IDs.findall(Text)

i.e. call the findall method on the compiled re object you created.


2.) There are two whitespace characters (a space and a newline - the latter
may have been inserted by your mail agent or mine) between "<a" and
"href...", so a pattern containing a single literal space does not match.
You should replace your re with something like this:

IDs = re.compile(r'<a\s+href="lead\.asp\?ItemID=(\d{5,10})')

Since you are using findall, the enclosing ".*" expressions are superfluous.
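
A minimal sketch of both points, assuming the newline between "<a" and
"href" really is part of the text and not a mail artifact. The sample string
is a shortened stand-in for the HTML in the question, and the variable names
are only illustrative. It uses \s+ rather than \s* so that at least one
whitespace character is required, and it is written for a current Python 3
interpreter (the original post used the Python 2 print statement).

```python
import re

# Shortened stand-in for the HTML in the question; note the newline
# between "<a" and "href" (assumed to be present in the real data).
text = """<td valign="top" align="left"><a
href="lead.asp?ItemID=146759">[CN] Oak, Foiled & Antique Furniture</a></td>
<td valign="top" align="left"><a
href="lead.asp?ItemID=146700">[CN] Oak, Foiled & Antique Furniture</a></td>"""

# A literal space between "<a" and "href" cannot match a newline.
strict = re.compile(r'<a href="lead\.asp\?ItemID=(\d{5,10})')

# \s+ matches any run of whitespace, including newlines.
relaxed = re.compile(r'<a\s+href="lead\.asp\?ItemID=(\d{5,10})')

# findall is called as a method on the compiled objects.
strict_ids = strict.findall(text)
relaxed_ids = relaxed.findall(text)

print(strict_ids)
print(relaxed_ids)
```

With this sample the strict pattern finds nothing, while the relaxed one
returns both ID strings, '146759' and '146700'.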


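BTW: Be careful regarding backslashes in REs, since the string gets
interpreted twice: once by Python when it reads the string literal and
once by the re engine. Writing the pattern as a raw string (r'...') avoids
the first interpretation.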
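
To illustrate the double interpretation of backslashes, here is a small
sketch. The pattern and the sample string are made up for the example; the
only point is the difference between a normal and a raw string literal.

```python
import re

# In a normal string literal, Python itself processes backslash escapes
# before the regex engine ever sees the pattern.
plain = '\bItemID'    # '\b' becomes a single backspace character
raw = r'\bItemID'     # backslash and 'b' both survive: regex word boundary

sample = 'lead.asp?ItemID=146759'

plain_hit = re.search(plain, sample)  # looks for backspace + "ItemID"
raw_hit = re.search(raw, sample)      # looks for "ItemID" at a word boundary

print(len(plain))
print(len(raw))
print(plain_hit is None)
print(raw_hit is not None)
```

The normal literal is one character shorter and never matches the sample,
while the raw one does. Escapes such as \d or \? happen to pass through a
normal literal unchanged because Python does not recognise them, but
relying on that is fragile.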

Regards
  mks