Help on regular expression match

Johnny Lee johnnyandfiona at hotmail.com
Fri Sep 23 01:36:30 EDT 2005


Hi,
   I've run into a problem matching a regular expression in Python. I hope
one of you can help me. Here are the details:

   I have many tags like this:
      xxx<a href="http://xxx.xxx.xxx" xxx>xxx
      xxx<a href="wap://xxx.xxx.xxx" xxx>xxx
      xxx<a href="http://xxx.xxx.xxx" xxx>xxx
      .....
   I want to extract all of the "http://xxx.xxx.xxx" URLs, so I wrote
this:
      httpPat = re.compile("(<a )(href=\")(http://.*)(\")")
      result = httpPat.findall(data)
   I print the results with this:
      for i in result:
         print i[2]
   Surprisingly, some of the output looks like this:
      http://xxx.xxx.xxx">xxx</a>xxx
   It actually comes from source text like this:
      <a href="http://xxx.xxx.xxx">xxx</a>xxx"
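A minimal reproduction of the behavior described above, assuming the sample line is representative of the real data. It uses Python 3 syntax (print() and a raw-string pattern) rather than the Python 2 code in the post, but the pattern itself is unchanged. The (.*) group is greedy: it consumes the rest of the line and then backs off only until the following (") can match, which lands on the LAST double quote on the line rather than the first one.

```python
import re

# Same pattern as in the post, written as a raw string.
# (.*) is greedy, so group 3 extends to the last '"' on the line.
httpPat = re.compile(r'(<a )(href=")(http://.*)(")')

# Sample line modeled on the source shown in the post (note the trailing '"').
data = '<a href="http://xxx.xxx.xxx">xxx</a>xxx"'

# findall returns one tuple per match, one element per group.
urls = [groups[2] for groups in httpPat.findall(data)]
for url in urls:
    print(url)
```

Because "." does not match a newline, lines that contain only one double quote after the URL (such as the `xxx<a href="http://xxx.xxx.xxx" xxx>xxx` lines) still come out clean, which would explain why only some of the results are wrong.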
   Some of the results are correct, though. How can I get all of the
answers clean, like "http://xxx.xxx.xxx"? Thanks for your help.
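One possible fix, sketched under the assumption that the tags look like the samples above ("<a", a single space, then href="..." as the first attribute): stop the URL group from running past a double quote, either with the character class [^"]* or with the non-greedy quantifier .*? from the re module. The single-group pattern and the variable names here are illustrative only, not from the original post.

```python
import re

# [^"]* cannot cross a double quote, so the group ends at the first closing '"'.
httpPat = re.compile(r'<a href="(http://[^"]*)"')

# Equivalent alternative: a non-greedy quantifier stops at the first '"' it can.
lazyPat = re.compile(r'<a href="(http://.*?)"')

# Sample input modeled on the tags in the post; the wap:// link is skipped.
data = '''xxx<a href="http://xxx.xxx.xxx" xxx>xxx
xxx<a href="wap://xxx.xxx.xxx" xxx>xxx
<a href="http://xxx.xxx.xxx">xxx</a>xxx"'''

# With exactly one group, findall returns a list of strings, not tuples.
urls = httpPat.findall(data)
lazy_urls = lazyPat.findall(data)
for url in urls:
    print(url)
```

For real-world HTML, where attribute order, spacing, and quoting vary, an HTML parser such as the standard library's html.parser module is likely to be more robust than a regular expression.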


Regards,
Johnny



