spider, why isn't it finding the url?

notnorwegian at yahoo.se notnorwegian at yahoo.se
Thu May 22 20:36:29 EDT 2008


On 23 May, 02:02, notnorweg... at yahoo.se wrote:
> this program doesn't produce any output, however I know from testing
> that the url-regexp matches urls...
>
> import urllib
> import re
>
> site = urllib.urlopen("http://www.python.org")
>
> email = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
> url = re.compile(r"^((ht|f)tp(s?)\:\/\/|~/|/)?([\w]+:\w+@)?([a-zA-Z]{1}"
>                  r"([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?((/?\w+/)+|/?)(\w+\.[\w]{3,4})?"
>                  r"((\?\w+=\w+)?(&\w+=\w+)*)?")
>
> for row in site:
>     obj = url.search(row)
>     if obj != None:
>         print obj.group()

Hmm, OK, it is printing matches row by row, since the loop searches each row separately. Not what I expected.
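
For reference, a minimal sketch of one way to collect every URL on the page instead of one match per row. The quoted pattern starts with `^` and is passed to `search()` once per row, so it can only match at the start of a row and yields at most one hit per row. The sketch below drops the anchor and scans the whole text with `findall()`. Note the assumptions: it is written for Python 3, `URL_PATTERN` is a much simpler stand-in rather than the regexp from the post, and `find_urls` and `sample_page` are hypothetical names, with a hard-coded string standing in for the downloaded page so it runs without network access.

```python
import re

# Simplified, hypothetical pattern (NOT the regexp from the post): an
# http/https/ftp scheme followed by everything up to the next whitespace,
# quote or angle bracket. No "^" anchor, so it can match anywhere in the text.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s\"'<>]+")


def find_urls(text):
    """Return every URL-like substring in text, in order of appearance."""
    # The group in the pattern is non-capturing, so findall() returns
    # the whole match for each hit rather than just the scheme.
    return URL_PATTERN.findall(text)


# Stand-in for the downloaded page (an assumption for illustration only).
sample_page = (
    '<a href="http://www.python.org/about/">About</a> '
    '<a href="http://docs.python.org">Docs</a>\n'
    '<img src="/images/logo.gif">\n'
    '<a href="ftp://ftp.python.org/pub/">FTP</a>\n'
)

for found in find_urls(sample_page):
    print(found)
```

With a real page, the same function could be fed the full text in one go, for example the result of `urllib.urlopen(...).read()` in Python 2 as used in the post, or `urllib.request.urlopen(...).read().decode(...)` in Python 3. Relative links such as `/images/logo.gif` are deliberately skipped by this simplified pattern.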



